# Layers Reference

### Overview

Layers are the building blocks of both `MultiLayerNetwork` and `ComputationGraph`. Each layer has a builder class that follows the same pattern:

```java
new LayerType.Builder()
    .nIn(inputSize)
    .nOut(outputSize)
    .activation(Activation.RELU)
    // ...other options...
    .build()
```

Layers inherit common options (weight init, updater, regularization, dropout) from the global `NeuralNetConfiguration.Builder` configuration, and can override them individually.

***

### Common Builder Options (All Layers)

| Method                        | Description                                                  |
| ----------------------------- | ------------------------------------------------------------ |
| `.nIn(int)`                   | Number of input units / channels                             |
| `.nOut(int)`                  | Number of output units / channels                            |
| `.activation(Activation)`     | Activation function                                          |
| `.weightInit(WeightInit)`     | Weight initialization scheme                                 |
| `.updater(IUpdater)`          | Per-layer optimizer override                                 |
| `.l1(double)` / `.l2(double)` | Per-layer regularization                                     |
| `.dropOut(double)`            | Retain probability for dropout applied to this layer's input |
| `.hasBias(boolean)`           | Whether to include a bias parameter (default: true)          |
| `.dist(Distribution)`         | Weight distribution (used with `WeightInit.DISTRIBUTION`)    |

***

### DenseLayer

**Class:** `org.deeplearning4j.nn.conf.layers.DenseLayer` **Source:** [DenseLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/DenseLayer.java)

A standard fully connected feedforward layer. Computes `output = activation(W * input + b)`.
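
The formula can be traced by hand with a small plain-Java sketch. It does not use DL4J or ND4J; the class name `DenseForwardSketch`, the `[nOut][nIn]` weight layout, and the fixed ReLU activation are assumptions chosen for illustration, not the library's internal implementation.

```java
import java.util.Arrays;

final class DenseForwardSketch {
    /** Computes relu(W * input + b) for one example; w has shape [nOut][nIn]. */
    static double[] forward(double[][] w, double[] b, double[] input) {
        double[] out = new double[w.length];
        for (int o = 0; o < w.length; o++) {
            double sum = b[o];
            for (int i = 0; i < input.length; i++) {
                sum += w[o][i] * input[i];
            }
            out[o] = Math.max(0.0, sum); // ReLU activation
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] w = {{1.0, -1.0}, {0.5, 0.5}};
        double[] b = {0.0, 1.0};
        // Unit 0: relu(2 - 3 + 0) = 0.0; unit 1: relu(1 + 1.5 + 1) = 3.5
        System.out.println(Arrays.toString(forward(w, b, new double[] {2.0, 3.0})));
    }
}
```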

#### Builder Parameters

| Parameter      | Type       | Default  | Description                                          |
| -------------- | ---------- | -------- | ---------------------------------------------------- |
| `nIn`          | int        | required | Number of input features                             |
| `nOut`         | int        | required | Number of output units                               |
| `activation`   | Activation | SIGMOID  | Activation function (inherited from the global configuration unless overridden) |
| `hasBias`      | boolean    | true     | Include bias vector                                  |
| `hasLayerNorm` | boolean    | false    | Apply layer normalization after the linear transform |

#### Example

```java
new DenseLayer.Builder()
    .nIn(256)
    .nOut(128)
    .activation(Activation.RELU)
    .weightInit(WeightInit.XAVIER)
    .hasBias(true)
    .build()
```

#### With Layer Normalization

```java
new DenseLayer.Builder()
    .nIn(256)
    .nOut(128)
    .activation(Activation.RELU)
    .hasLayerNorm(true)   // applies LayerNorm before activation
    .build()
```

***

### OutputLayer

**Class:** `org.deeplearning4j.nn.conf.layers.OutputLayer` **Source:** [OutputLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/OutputLayer.java)

An output layer that combines a fully connected linear transform, an activation, and a loss function. It is typically the final layer of a `MultiLayerNetwork`. Because `OutputLayer` has learnable parameters (weights and bias), it can project from `nIn` inputs to a different number of outputs, `nOut`.

#### Builder Parameters

| Parameter      | Type         | Description                                                                 |
| -------------- | ------------ | --------------------------------------------------------------------------- |
| `lossFunction` | LossFunction | Required. E.g., `NEGATIVELOGLIKELIHOOD`, `MSE`, `MCXENT`, `XENT`            |
| `nIn`          | int          | Input size                                                                  |
| `nOut`         | int          | Number of output units (classes for classification, outputs for regression) |
| `activation`   | Activation   | `SOFTMAX` for multi-class, `SIGMOID` for binary, `IDENTITY` for regression  |

#### Classification Example

```java
new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .nIn(128)
    .nOut(10)
    .activation(Activation.SOFTMAX)
    .build()
```

#### Regression Example

```java
new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
    .nIn(64)
    .nOut(1)
    .activation(Activation.IDENTITY)
    .build()
```

#### Common Loss Functions

| LossFunction            | Use Case                                                |
| ----------------------- | ------------------------------------------------------- |
| `NEGATIVELOGLIKELIHOOD` | Multi-class classification with SOFTMAX                 |
| `MCXENT`                | Multi-class cross-entropy with SOFTMAX (same loss as `NEGATIVELOGLIKELIHOOD`) |
| `XENT`                  | Binary cross-entropy with SIGMOID                       |
| `MSE`                   | Mean squared error for regression                       |
| `MAE`                   | Mean absolute error for regression                      |
| `HINGE`                 | SVM-style hinge loss                                    |
| `COSINE`                | Cosine proximity loss                                   |
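
To make the `SOFTMAX` plus `NEGATIVELOGLIKELIHOOD` pairing concrete, the following plain-Java sketch computes both for a single example. It is independent of DL4J; `SoftmaxNllSketch` and its method names are hypothetical, and the library's own loss classes may average or reduce over the minibatch differently.

```java
final class SoftmaxNllSketch {
    /** Numerically stable softmax over one example's pre-activations. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        double[] p = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max); // subtract the max to avoid overflow
            sum += p[i];
        }
        for (int i = 0; i < z.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Negative log-likelihood of the true class given predicted probabilities. */
    static double negativeLogLikelihood(double[] probabilities, int trueClass) {
        return -Math.log(probabilities[trueClass]);
    }

    public static void main(String[] args) {
        double[] p = softmax(new double[] {2.0, 1.0, 0.1});
        // Loss for this example when class 0 is the true label
        System.out.println(negativeLogLikelihood(p, 0));
    }
}
```

The loss approaches zero as the predicted probability of the true class approaches 1, and grows without bound as that probability approaches 0.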

***

### LossLayer

**Class:** `org.deeplearning4j.nn.conf.layers.LossLayer` **Source:** [LossLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LossLayer.java)

A parameter-free output layer that applies an activation and a loss function to its input without any linear transform. Unlike `OutputLayer`, `LossLayer` has no weights, so its output size always equals its input size.

Use `LossLayer` when you have already projected to the correct output dimension in the previous layer and only need a loss function.

#### Example

```java
// Previous layer outputs 10 raw scores (IDENTITY activation); LossLayer applies softmax, then the loss
new LossLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .activation(Activation.SOFTMAX)
    .build()
```

***

### ActivationLayer

**Class:** `org.deeplearning4j.nn.conf.layers.ActivationLayer` **Source:** [ActivationLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/ActivationLayer.java)

Applies an activation function as a standalone layer with no learned parameters. Useful in `ComputationGraph` when you need to apply an activation after a residual addition, or when building custom architectures where activation needs to be a named vertex.
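
The residual case amounts to "add first, activate second". A minimal plain-Java sketch of that ordering follows; it does not use DL4J, and `ResidualActivationSketch` and `addThenRelu` are hypothetical names used only for illustration.

```java
final class ResidualActivationSketch {
    /** Adds the shortcut and branch outputs element-wise, then applies ReLU to the sum. */
    static double[] addThenRelu(double[] shortcut, double[] branch) {
        double[] out = new double[shortcut.length];
        for (int i = 0; i < shortcut.length; i++) {
            out[i] = Math.max(0.0, shortcut[i] + branch[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] out = addThenRelu(new double[] {1.0, -2.0}, new double[] {-3.0, 5.0});
        System.out.println(out[0] + ", " + out[1]);
    }
}
```

In a graph, the addition would be one vertex and the activation a separate standalone layer fed by it, which is why the activation cannot be folded into either branch.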

#### Example

```java
new ActivationLayer.Builder()
    .activation(Activation.RELU)
    .build()

// Or with an IActivation instance for parameterized activations
new ActivationLayer.Builder()
    .activation(new ActivationPReLU())
    .build()
```

***

### DropoutLayer

**Class:** `org.deeplearning4j.nn.conf.layers.DropoutLayer` **Source:** [DropoutLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/DropoutLayer.java)

Applies dropout as a standalone layer. At training time, activations are randomly zeroed with probability `(1 - retainProbability)`. At test time, activations pass through unchanged.

This differs from the `.dropOut()` option on other layers in that it is an explicit layer in the network (and a named vertex in `ComputationGraph`), whereas `.dropOut()` applies dropout implicitly to the configured layer's own input.
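
The train/test behaviour described above can be sketched in plain Java. This is an illustration, not DL4J's code: `InvertedDropoutSketch` is a hypothetical name, and rescaling kept units by `1 / retainProbability` (so-called inverted dropout) is an assumption of this sketch that keeps the expected activation unchanged.

```java
import java.util.Random;

final class InvertedDropoutSketch {
    /** Applies dropout with the given retain probability; a no-op when training is false. */
    static double[] apply(double[] activations, double retainProbability,
                          boolean training, Random rng) {
        if (!training) {
            return activations.clone(); // test time: pass through unchanged
        }
        double[] out = new double[activations.length];
        for (int i = 0; i < activations.length; i++) {
            boolean keep = rng.nextDouble() < retainProbability;
            // Assumption: kept units are scaled by 1/p; dropped units become zero.
            out[i] = keep ? activations[i] / retainProbability : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] dropped = apply(x, 0.5, true, new Random(42));
        System.out.println(java.util.Arrays.toString(dropped));
    }
}
```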

#### Builder Parameters

| Parameter            | Type   | Description                                                            |
| -------------------- | ------ | ---------------------------------------------------------------------- |
| Constructor `double` | double | Retain probability (e.g., `0.5` means 50% chance of keeping each unit) |

#### Example

```java
// As a standalone layer between two dense layers
.layer(new DenseLayer.Builder().nIn(256).nOut(256).activation(Activation.RELU).build())
.layer(new DropoutLayer.Builder(0.5).build())
.layer(new DenseLayer.Builder().nIn(256).nOut(128).activation(Activation.RELU).build())
```

***

### BatchNormalization

**Class:** `org.deeplearning4j.nn.conf.layers.BatchNormalization` **Source:** [BatchNormalization.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/BatchNormalization.java)

Normalizes layer inputs to zero mean and unit variance per minibatch during training, then applies a learned scale (`gamma`) and shift (`beta`). At inference time, running mean/variance statistics accumulated during training are used.

#### Builder Parameters

| Parameter            | Type    | Default | Description                                         |
| -------------------- | ------- | ------- | --------------------------------------------------- |
| `nIn`                | int     | auto    | Number of input channels/features                   |
| `nOut`               | int     | auto    | Must equal `nIn`                                    |
| `decay`              | double  | 0.9     | Momentum for running statistics update              |
| `eps`                | double  | 1e-5    | Small constant for numerical stability              |
| `isMinibatch`        | boolean | true    | Use minibatch statistics during training            |
| `lockGammaBeta`      | boolean | false   | If true, gamma=1 and beta=0 are fixed (not learned) |
| `cudnnAllowFallback` | boolean | true    | Fall back to non-CuDNN if GPU error occurs          |
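
As a rough illustration of how `eps`, `decay`, `gamma`, and `beta` interact, here is a plain-Java sketch for a single feature column. It is not the DL4J implementation: `BatchNormSketch` is a hypothetical name, and the exponential-moving-average form of the running-statistics update is an assumption about how `decay` is applied.

```java
final class BatchNormSketch {
    /** Normalizes one feature across the minibatch, then applies scale and shift. */
    static double[] normalize(double[] x, double gamma, double beta, double eps) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double variance = 0.0;
        for (double v : x) {
            variance += (v - mean) * (v - mean);
        }
        variance /= x.length; // population variance of the minibatch
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = gamma * (x[i] - mean) / Math.sqrt(variance + eps) + beta;
        }
        return out;
    }

    /** Assumed running-statistic update: an exponential moving average. */
    static double updateRunning(double running, double batchValue, double decay) {
        return decay * running + (1.0 - decay) * batchValue;
    }

    public static void main(String[] args) {
        double[] y = normalize(new double[] {1.0, 2.0, 3.0, 4.0}, 1.0, 0.0, 1e-5);
        System.out.println(java.util.Arrays.toString(y));
    }
}
```

At inference time the same normalization would use the accumulated running mean and variance in place of the minibatch statistics.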

#### Example — After a Dense Layer

```java
.layer(new DenseLayer.Builder().nIn(256).nOut(128).activation(Activation.IDENTITY).build())
.layer(new BatchNormalization.Builder().nIn(128).nOut(128).build())
.layer(new ActivationLayer.Builder().activation(Activation.RELU).build())
```

#### Example — After a Convolutional Layer

BatchNormalization normalizes across all spatial positions per channel when used after convolutional layers:

```java
.layer(new ConvolutionLayer.Builder(3, 3).nIn(64).nOut(128)
    .activation(Activation.IDENTITY).build())
.layer(new BatchNormalization.Builder().build())
.layer(new ActivationLayer.Builder().activation(Activation.RELU).build())
```

When `setInputType()` is used, `nIn`/`nOut` for `BatchNormalization` can be inferred automatically.

***

### EmbeddingLayer

**Class:** `org.deeplearning4j.nn.conf.layers.EmbeddingLayer` **Source:** [EmbeddingLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/EmbeddingLayer.java)

Maps integer indices to dense embedding vectors. Mathematically equivalent to a `DenseLayer` with a one-hot input, but far more efficient for large vocabularies because it performs a direct row lookup rather than a full matrix multiply.

**Restrictions:**

* Can only be the first layer of a network.
* Input shape: `[minibatch, 1]` — a single integer index per example.
* Output shape: `[minibatch, embeddingSize]`.
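
The equivalence between a row lookup and a one-hot matrix multiply can be checked with a small plain-Java sketch. It does not use DL4J; `EmbeddingLookupSketch` and its methods are hypothetical names, and the table is stored as `[vocabSize][embeddingSize]`.

```java
import java.util.Arrays;

final class EmbeddingLookupSketch {
    /** Direct row lookup: O(embeddingSize) work per index. */
    static double[] lookup(double[][] table, int index) {
        return table[index].clone();
    }

    /** Same result via a one-hot vector times the table: O(vocabSize * embeddingSize). */
    static double[] oneHotTimesTable(double[][] table, int index) {
        double[] oneHot = new double[table.length];
        oneHot[index] = 1.0;
        double[] out = new double[table[0].length];
        for (int v = 0; v < table.length; v++) {
            for (int d = 0; d < out.length; d++) {
                out[d] += oneHot[v] * table[v][d];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] table = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        // Both calls return the third row of the table
        System.out.println(Arrays.toString(lookup(table, 2)));
        System.out.println(Arrays.toString(oneHotTimesTable(table, 2)));
    }
}
```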

#### Builder Parameters

| Parameter                          | Type                 | Default  | Description                                                                 |
| ---------------------------------- | -------------------- | -------- | --------------------------------------------------------------------------- |
| `nIn`                              | int                  | required | Vocabulary size (number of distinct tokens)                                 |
| `nOut`                             | int                  | required | Embedding dimension                                                         |
| `hasBias`                          | boolean              | false    | Include per-embedding bias                                                  |
| `activation`                       | Activation           | IDENTITY | Activation applied after lookup                                             |
| `weightInit(INDArray)`             | INDArray             | —        | Initialize from a pre-trained embedding matrix `[vocabSize, embeddingSize]` |
| `weightInit(EmbeddingInitializer)` | EmbeddingInitializer | —        | Initialize from a Word2Vec model or similar                                 |

#### Example — Basic

```java
new EmbeddingLayer.Builder()
    .nIn(10000)    // vocabulary of 10,000 tokens
    .nOut(128)     // 128-dimensional embeddings
    .build()
```

#### Example — Pre-trained Embeddings

```java
INDArray pretrainedVectors = /* shape [vocabSize, 300] loaded from Word2Vec */;

new EmbeddingLayer.Builder()
    .nIn(vocabSize)
    .nOut(300)
    .weightInit(pretrainedVectors)
    .build()
```

***

### EmbeddingSequenceLayer

**Class:** `org.deeplearning4j.nn.conf.layers.EmbeddingSequenceLayer` **Source:** [EmbeddingSequenceLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/EmbeddingSequenceLayer.java)

Sequence-aware version of `EmbeddingLayer`. Accepts a sequence of integer indices per example and outputs a sequence of embedding vectors.

* Input shape: `[minibatch, inputLength]` or `[minibatch, 1, inputLength]`.
* Output shape: `[minibatch, nOut, inputLength]` — a 3D time-series tensor ready for RNN or CNN-1D layers.

**Restrictions:** Can only be the first layer of a network.
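
The shape change from `[minibatch, inputLength]` to `[minibatch, nOut, inputLength]` can be sketched in plain Java as follows. This is an illustration only, not DL4J code; `EmbeddingSequenceSketch` and `embed` are hypothetical names.

```java
final class EmbeddingSequenceSketch {
    /** Maps indices [minibatch][inputLength] to embeddings [minibatch][nOut][inputLength]. */
    static double[][][] embed(double[][] table, int[][] indices) {
        int minibatch = indices.length;
        int inputLength = indices[0].length;
        int nOut = table[0].length;
        double[][][] out = new double[minibatch][nOut][inputLength];
        for (int m = 0; m < minibatch; m++) {
            for (int t = 0; t < inputLength; t++) {
                double[] vector = table[indices[m][t]]; // row lookup per time step
                for (int d = 0; d < nOut; d++) {
                    out[m][d][t] = vector[d];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] table = {{0.0, 0.0}, {1.0, 2.0}, {3.0, 4.0}};
        double[][][] out = embed(table, new int[][] {{1, 2, 1}});
        System.out.println(out.length + " x " + out[0].length + " x " + out[0][0].length);
    }
}
```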

#### Builder Parameters

| Parameter              | Type     | Default  | Description                                 |
| ---------------------- | -------- | -------- | ------------------------------------------- |
| `nIn`                  | int      | required | Vocabulary size                             |
| `nOut`                 | int      | required | Embedding dimension                         |
| `inputLength`          | int      | 1        | Sequence length (indices per example)       |
| `inferInputLength`     | boolean  | false    | Infer sequence length from input at runtime |
| `hasBias`              | boolean  | false    | Include bias                                |
| `weightInit(INDArray)` | INDArray | —        | Pre-trained embedding matrix                |

#### Example

```java
new EmbeddingSequenceLayer.Builder()
    .nIn(5000)          // vocabulary
    .nOut(64)           // embedding dim
    .inputLength(100)   // sequence length
    .build()
```

Use this in conjunction with LSTM or Conv1D layers for text classification:

```java
.layer(new EmbeddingSequenceLayer.Builder().nIn(5000).nOut(64).inputLength(100).build())
.layer(new LSTM.Builder().nIn(64).nOut(128).activation(Activation.TANH).build())
.layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
    .nIn(128).nOut(numClasses).activation(Activation.SOFTMAX).build())
```

***

### GlobalPoolingLayer

**Class:** `org.deeplearning4j.nn.conf.layers.GlobalPoolingLayer` **Source:** [GlobalPoolingLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/GlobalPoolingLayer.java)

Reduces spatial or temporal dimensions to a single value per channel/feature via pooling. Works with 3D (time series/RNN), 4D (CNN), and 5D (CNN3D) inputs.

Default behaviour (`collapseDimensions=true`):

* 3D time series `[mb, features, T]` -> 2D `[mb, features]`
* 4D CNN `[mb, C, H, W]` -> 2D `[mb, C]`
* 5D CNN3D `[mb, C, D, H, W]` -> 2D `[mb, C]`

Supports masking for variable-length sequences.
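
To show what pooling over time with a mask means, the plain-Java sketch below averages only the unmasked time steps of a `[mb, features, T]` array. It is not DL4J's implementation; `GlobalPoolingSketch` is a hypothetical name, and the `[mb][T]` mask layout with `1.0` marking real steps is an assumption of this sketch.

```java
final class GlobalPoolingSketch {
    /** Average-pools [mb][features][T] over time to [mb][features], ignoring masked steps. */
    static double[][] maskedAveragePool(double[][][] x, double[][] mask) {
        int mb = x.length;
        int features = x[0].length;
        int steps = x[0][0].length;
        double[][] out = new double[mb][features];
        for (int m = 0; m < mb; m++) {
            double count = 0.0;
            for (int t = 0; t < steps; t++) {
                count += mask[m][t];
            }
            for (int f = 0; f < features; f++) {
                double sum = 0.0;
                for (int t = 0; t < steps; t++) {
                    sum += x[m][f][t] * mask[m][t];
                }
                out[m][f] = count > 0.0 ? sum / count : 0.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][] x = {{{1.0, 3.0, 100.0}, {2.0, 4.0, 100.0}}};
        double[][] mask = {{1.0, 1.0, 0.0}}; // third step is padding
        double[][] pooled = maskedAveragePool(x, mask);
        System.out.println(pooled[0][0] + ", " + pooled[0][1]);
    }
}
```

Without the mask, the padded step would be included in the average and distort the result for shorter sequences.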

#### Builder Parameters

| Parameter            | Type        | Default | Description                            |
| -------------------- | ----------- | ------- | -------------------------------------- |
| `poolingType`        | PoolingType | MAX     | `MAX`, `AVG`, `SUM`, `PNORM`           |
| `collapseDimensions` | boolean     | true    | If true, pooled dims are removed; if false, they are kept with size 1 |
| `pnorm`              | int         | 2       | P value, only for `PNORM` pooling      |
| `poolingDimensions`  | int\[]      | auto    | Override which dimensions to pool over |

#### Example — Global Average Pooling after CNN

```java
// Replaces Flatten + Dense with a parameter-free global pool
.layer(new ConvolutionLayer.Builder(3, 3).nIn(64).nOut(128).activation(Activation.RELU).build())
.layer(new GlobalPoolingLayer.Builder(PoolingType.AVG).build())
.layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .nIn(128).nOut(numClasses).activation(Activation.SOFTMAX).build())
```

#### Example — Global Max Pooling for sequence classification

```java
// After EmbeddingSequenceLayer + Conv1D: pool across time
.layer(new GlobalPoolingLayer.Builder(PoolingType.MAX).build())
```

***

### LocalResponseNormalization

**Class:** `org.deeplearning4j.nn.conf.layers.LocalResponseNormalization` **Source:** [LocalResponseNormalization.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LocalResponseNormalization.java)

Implements the local response normalization described in the AlexNet paper. Normalizes over `n` adjacent feature maps. Largely superseded by Batch Normalization in modern architectures but included for legacy compatibility.

#### Builder Parameters

| Parameter | Default | Description                    |
| --------- | ------- | ------------------------------ |
| `k`       | 2.0     | Additive constant              |
| `n`       | 5.0     | Number of adjacent kernel maps |
| `alpha`   | 1e-4    | Scaling constant               |
| `beta`    | 0.75    | Exponent                       |
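
The role of each parameter is easier to see in the normalization formula from the AlexNet paper, `b_i = a_i / (k + alpha * sum_j a_j^2)^beta`, where `j` ranges over the `n` channels centred on channel `i`. The plain-Java sketch below applies it to the channel values at one spatial position; `LrnSketch` is a hypothetical name, and DL4J's implementation may differ in detail (for example in how the window is handled at the edges).

```java
final class LrnSketch {
    /** Normalizes the channel values at a single spatial position. */
    static double[] normalize(double[] a, double k, int n, double alpha, double beta) {
        int half = n / 2;
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double sumSquares = 0.0;
            int from = Math.max(0, i - half);
            int to = Math.min(a.length - 1, i + half);
            for (int j = from; j <= to; j++) {
                sumSquares += a[j] * a[j]; // neighbouring channels only
            }
            out[i] = a[i] / Math.pow(k + alpha * sumSquares, beta);
        }
        return out;
    }

    public static void main(String[] args) {
        // Defaults from the table above: k = 2, n = 5, alpha = 1e-4, beta = 0.75
        double[] out = normalize(new double[] {1.0, 2.0, 3.0, 4.0, 5.0, 6.0}, 2.0, 5, 1e-4, 0.75);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```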

***

### ElementWiseMultiplicationLayer

**Class:** `org.deeplearning4j.nn.conf.layers.misc.ElementWiseMultiplicationLayer` **Source:** [ElementWiseMultiplicationLayer.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/misc/ElementWiseMultiplicationLayer.java)

Computes `output = activation(input . w + b)` where `.` is element-wise multiplication and `w` is a learnable weight vector of length `nOut`. Input and output sizes are the same.

Useful for gating mechanisms and attention-like weighting.
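
A minimal plain-Java sketch of the computation for one example is shown below. It is independent of DL4J; `ElementWiseMultiplicationSketch` is a hypothetical name, and ReLU stands in for whatever activation the layer is configured with.

```java
final class ElementWiseMultiplicationSketch {
    /** Computes relu(input[i] * w[i] + b[i]) for each position i. */
    static double[] forward(double[] input, double[] w, double[] b) {
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = Math.max(0.0, input[i] * w[i] + b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] out = forward(new double[] {1.0, 2.0, 3.0},
                               new double[] {2.0, 0.0, -1.0},
                               new double[] {0.0, 1.0, 0.0});
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Each input element has exactly one weight, so a weight of zero suppresses that feature entirely, which is what makes the layer usable as a learned gate.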

***

### RepeatVector

**Class:** `org.deeplearning4j.nn.conf.layers.misc.RepeatVector` **Source:** [RepeatVector.java](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/misc/RepeatVector.java)

Repeats a 2D input `[mb, length]` a specified number of times to produce a 3D output `[mb, n, length]`. Commonly used in sequence-to-sequence encoder-decoder architectures to broadcast the encoder's context vector across all decoder time steps.
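
The shape transformation can be sketched in plain Java as follows. This is an illustration that follows the `[mb, n, length]` output shape stated above, not DL4J's implementation; `RepeatVectorSketch` and `repeat` are hypothetical names.

```java
final class RepeatVectorSketch {
    /** Repeats each row of x [mb][length] n times, giving [mb][n][length]. */
    static double[][][] repeat(double[][] x, int n) {
        double[][][] out = new double[x.length][n][];
        for (int m = 0; m < x.length; m++) {
            for (int r = 0; r < n; r++) {
                out[m][r] = x[m].clone(); // one copy of the context vector per step
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][] out = repeat(new double[][] {{1.0, 2.0}}, 3);
        System.out.println(out.length + " x " + out[0].length + " x " + out[0][0].length);
    }
}
```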

#### Builder Parameters

| Parameter               | Type | Description                   |
| ----------------------- | ---- | ----------------------------- |
| `repetitionFactor(int)` | int  | Number of times to repeat (n) |

#### Example

```java
// Encoder produces [mb, 128]; repeat for 10 decoder time steps -> [mb, 10, 128]
new RepeatVector.Builder()
    .repetitionFactor(10)
    .nIn(128)
    .nOut(128)
    .build()
```

***

### MaskLayer

**Class:** `org.deeplearning4j.nn.conf.layers.util.MaskLayer`

Applies the mask array to both forward pass activations and backward pass gradients. Works with 2D, 3D, and 4D inputs. Use when you need to apply masking logic at a specific point in the graph rather than relying on the implicit masking propagated by `DataSet.featuresMaskArray`.
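
For 3D activations, applying a mask amounts to zeroing the masked time steps. A plain-Java sketch under that reading is shown below; `MaskApplySketch` is a hypothetical name, and the `[mb][T]` mask layout with `1.0` meaning "keep" is an assumption of this sketch rather than DL4J's internal representation.

```java
final class MaskApplySketch {
    /** Multiplies [mb][features][T] values by a [mb][T] mask (1.0 = keep, 0.0 = masked). */
    static double[][][] applyMask(double[][][] x, double[][] mask) {
        double[][][] out = new double[x.length][x[0].length][x[0][0].length];
        for (int m = 0; m < x.length; m++) {
            for (int f = 0; f < x[m].length; f++) {
                for (int t = 0; t < x[m][f].length; t++) {
                    out[m][f][t] = x[m][f][t] * mask[m][t];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][] masked = applyMask(new double[][][] {{{1.0, 2.0, 3.0}}},
                                        new double[][] {{1.0, 1.0, 0.0}});
        System.out.println(java.util.Arrays.toString(masked[0][0]));
    }
}
```

The same multiplication would be applied to the gradients on the backward pass, so masked positions contribute nothing in either direction.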

***

### MaskZeroLayer

**Class:** `org.deeplearning4j.nn.conf.layers.util.MaskZeroLayer`

Wraps a recurrent layer and masks time steps where the input activation equals the specified masking value (default: `0.0`). Input shape: `[batch, inputSize, timesteps]`. Useful for variable-length sequence handling without explicit mask arrays.
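
One way to picture the mask this implies is sketched below in plain Java. It is not DL4J's code: `MaskZeroSketch` is a hypothetical name, and treating a time step as masked only when every feature at that step equals the masking value is an assumption of this sketch.

```java
final class MaskZeroSketch {
    /** Returns [batch][timesteps]: 0.0 where all features equal maskingValue, else 1.0. */
    static double[][] deriveMask(double[][][] input, double maskingValue) {
        int batch = input.length;
        int inputSize = input[0].length;
        int timesteps = input[0][0].length;
        double[][] mask = new double[batch][timesteps];
        for (int b = 0; b < batch; b++) {
            for (int t = 0; t < timesteps; t++) {
                boolean allMasked = true;
                for (int f = 0; f < inputSize; f++) {
                    if (input[b][f][t] != maskingValue) {
                        allMasked = false;
                        break;
                    }
                }
                mask[b][t] = allMasked ? 0.0 : 1.0;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        double[][][] input = {{{1.0, 0.0, 0.0}, {0.0, 2.0, 0.0}}};
        System.out.println(java.util.Arrays.toString(deriveMask(input, 0.0)[0]));
    }
}
```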

***

### LocallyConnected1D / LocallyConnected2D

Locally connected layers resemble convolutions, except that each output position has its own independent filter weights (no weight sharing). They therefore have more parameters than the equivalent convolution, but can learn position-specific features.
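
The parameter cost of dropping weight sharing can be estimated with simple arithmetic. The plain-Java sketch below counts weights only (biases are ignored) for the 1D case, using the standard output-length formula for convolutions; `LocallyConnectedParamSketch` and its method names are hypothetical.

```java
final class LocallyConnectedParamSketch {
    /** Output length of a 1D convolution-style layer. */
    static int outputLength(int inputLength, int kernel, int stride, int padding) {
        return (inputLength + 2 * padding - kernel) / stride + 1;
    }

    /** Weights in a shared-filter 1D convolution. */
    static long sharedConvWeights(int nIn, int nOut, int kernel) {
        return (long) kernel * nIn * nOut;
    }

    /** Weights when every output position has its own filter set. */
    static long locallyConnectedWeights(int nIn, int nOut, int kernel, int outputLength) {
        return (long) outputLength * kernel * nIn * nOut;
    }

    public static void main(String[] args) {
        // Same numbers as a 32 -> 64 channel layer, kernel 3, stride 1, padding 1, input length 100
        int outLen = outputLength(100, 3, 1, 1);
        System.out.println("conv weights: " + sharedConvWeights(32, 64, 3));
        System.out.println("locally connected weights: " + locallyConnectedWeights(32, 64, 3, outLen));
    }
}
```

With these numbers the locally connected layer needs one filter set per output position, so its weight count is the convolution's multiplied by the output length.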

#### LocallyConnected1D

```java
new LocallyConnected1D.Builder()
    .nIn(32).nOut(64)
    .kernelSize(3).stride(1).padding(1)
    .activation(Activation.RELU)
    .setInputSize(100)  // sequence length of the input
    .build()
```

#### LocallyConnected2D

```java
new LocallyConnected2D.Builder()
    .nIn(3).nOut(32)
    .kernelSize(3, 3).stride(1, 1).padding(1, 1)
    .activation(Activation.RELU)
    .setInputSize(28, 28)  // spatial dimensions of input
    .build()
```

***

### Full Configuration Example (MLP Classifier, M2.1)

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.*;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(42)
    .dataType(DataType.FLOAT)
    .updater(new Adam(0.001))
    .weightInit(WeightInit.XAVIER)
    .l2(1e-5)
    .list()
    .layer(new DenseLayer.Builder()
        .nIn(784).nOut(512)
        .activation(Activation.RELU)
        .build())
    .layer(new BatchNormalization.Builder().nIn(512).nOut(512).build())
    .layer(new DropoutLayer.Builder(0.5).build())
    .layer(new DenseLayer.Builder()
        .nIn(512).nOut(256)
        .activation(Activation.RELU)
        .build())
    .layer(new BatchNormalization.Builder().nIn(256).nOut(256).build())
    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(256).nOut(10)
        .activation(Activation.SOFTMAX)
        .build())
    .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
System.out.println(model.summary());
```

