# Activations

Activation functions introduce non-linearity into neural networks. Without them, stacking multiple layers would be equivalent to a single linear transformation, regardless of depth. ND4J provides all activation functions through a common `Activation` enum and the `IActivation` interface.
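
The collapse of stacked linear layers can be checked by hand. The following is a minimal plain-Java sketch with no ND4J dependency; the class name `LinearCollapseDemo`, its helper methods, and the 2x2 shapes are illustrative assumptions, and biases are omitted for brevity.

```java
// Illustrative sketch only: not part of ND4J.
final class LinearCollapseDemo {

    // Multiplies a 2x2 matrix by a length-2 vector.
    static double[] apply(double[][] w, double[] x) {
        return new double[] {
            w[0][0] * x[0] + w[0][1] * x[1],
            w[1][0] * x[0] + w[1][1] * x[1]
        };
    }

    // Multiplies two 2x2 matrices: result = a * b.
    static double[][] matmul(double[][] a, double[][] b) {
        double[][] c = new double[2][2];
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                c[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
            }
        }
        return c;
    }

    // Element-wise ReLU on a length-2 vector.
    static double[] relu(double[] x) {
        return new double[] { Math.max(0.0, x[0]), Math.max(0.0, x[1]) };
    }
}
```

With two weight matrices `w1` and `w2`, `apply(w2, apply(w1, x))` gives the same vector as `apply(matmul(w2, w1), x)`, so the two layers behave as one. Inserting `relu` between the two `apply` calls generally breaks that equality, which is what gives depth its expressive power.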

## Using Activations

### In Layer Configuration

The most common way to use activations is through the `Activation` enum when configuring layers:

```java
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.nd4j.linalg.activations.Activation;

new DenseLayer.Builder()
    .nIn(784).nOut(256)
    .activation(Activation.RELU)
    .build()
```

### As a Standalone Layer

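Use `ActivationLayer` when you want the activation as its own layer, separate from the linear transformation. In the graph-builder fragment below, `"relu"` is the name of the new layer and `"dense1"` is the layer it takes its input from: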

```java
import org.deeplearning4j.nn.conf.layers.ActivationLayer;
import org.nd4j.linalg.activations.Activation;

.addLayer("relu", new ActivationLayer(Activation.RELU), "dense1")
```

### Directly on INDArrays

Apply activations to raw tensors using the `Transforms` class:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

INDArray x = Nd4j.create(new double[]{-2, -1, 0, 1, 2});

INDArray relu = Transforms.relu(x, true);        // dup = true: returns a copy
// [0, 0, 0, 1, 2]

INDArray sigmoid = Transforms.sigmoid(x, true);  // dup = true: returns a copy
// [0.1192, 0.2689, 0.5, 0.7311, 0.8808]

INDArray tanh = Transforms.tanh(x, true);        // dup = true: returns a copy
// [-0.9640, -0.7616, 0, 0.7616, 0.9640]
```

The second argument is the `dup` flag: `true` returns a copy and leaves `x` unchanged, while `false` applies the function to `x` in place.

### Via the IActivation Interface

For programmatic access, get the `IActivation` instance from the enum:

```java
import org.nd4j.linalg.activations.IActivation;

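IActivation reluFn = Activation.RELU.getActivationFunction();
// getActivation transforms its argument in place, so pass a copy to keep `input` intact.
// The second argument is the training flag (relevant for activations such as RReLU).
INDArray activated = reluFn.getActivation(input.dup(), true);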
```

## Available Activations

All activations are in the `org.nd4j.linalg.activations.Activation` enum. Implementations are in `org.nd4j.linalg.activations.impl`.

### ReLU Family

| Activation       | Enum Value        | Formula                                                    | Notes                                                                               |
| ---------------- | ----------------- | ---------------------------------------------------------- | ----------------------------------------------------------------------------------- |
| ReLU             | `RELU`            | f(x) = max(0, x)                                           | Default choice for hidden layers. Fast, effective, but can suffer from "dying ReLU" |
| Leaky ReLU       | `LEAKYRELU`       | f(x) = max(alpha \* x, x), alpha=0.01                      | Avoids dying ReLU by allowing small negative gradients                              |
| RReLU            | `RRELU`           | f(x) = max(alpha \* x, x), alpha \~ U(l,u)                 | Randomized leaky ReLU. l=1/8, u=1/3 by default. Uses (l+u)/2 at test time           |
| ReLU6            | `RELU6`           | f(x) = min(max(0, x), 6)                                   | Capped ReLU for mobile/quantized networks                                           |
| Thresholded ReLU | `THRESHOLDEDRELU` | f(x) = x if x > theta, 0 otherwise. theta=1.0              | Sparse activations                                                                  |
| PReLU            | Via `PReLULayer`  | f(x) = max(alpha \* x, x), alpha learned                   | Parametric ReLU — alpha is a trainable parameter                                    |
| ELU              | `ELU`             | f(x) = x if x >= 0, alpha\*(exp(x)-1) if x < 0. alpha=1.0  | Smooth alternative to ReLU with negative values                                     |
| SELU             | `SELU`            | f(x) = lambda \* (x if x >= 0, alpha\*(exp(x)-1) if x < 0) | Self-normalizing. Use with `WeightInit.LECUN_NORMAL` and `AlphaDropout`             |
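
As a reading aid for the formulas in this table, here is a scalar plain-Java sketch. It is not ND4J's implementation: the class name `ReluFamilySketch` is hypothetical, and the SELU constants are the commonly published values, assumed here rather than taken from the ND4J source.

```java
// Illustrative scalar versions of the table formulas; not ND4J code.
final class ReluFamilySketch {

    // Commonly published SELU constants (assumed, not read from ND4J).
    static final double SELU_ALPHA = 1.6732632423543772;
    static final double SELU_LAMBDA = 1.0507009873554805;

    static double relu(double x) {
        return Math.max(0.0, x);
    }

    // Matches max(alpha * x, x) from the table; assumes 0 <= alpha <= 1.
    static double leakyRelu(double x, double alpha) {
        return Math.max(alpha * x, x);
    }

    static double relu6(double x) {
        return Math.min(Math.max(0.0, x), 6.0);
    }

    // Strict comparison: inputs equal to theta map to 0.
    static double thresholdedRelu(double x, double theta) {
        return x > theta ? x : 0.0;
    }

    // expm1 computes exp(x) - 1 accurately for small |x|.
    static double elu(double x, double alpha) {
        return x >= 0.0 ? x : alpha * Math.expm1(x);
    }

    static double selu(double x) {
        return SELU_LAMBDA * elu(x, SELU_ALPHA);
    }
}
```

For example, `leakyRelu(-2.0, 0.01)` keeps a small negative output where `relu(-2.0)` returns zero, which is the property that avoids dead units.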

### Smooth Activations

| Activation | Enum Value | Formula                                           | Notes                                                              |
| ---------- | ---------- | ------------------------------------------------- | ------------------------------------------------------------------ |
| GELU       | `GELU`     | f(x) = x \* Phi(x), where Phi is the Gaussian CDF | Used in Transformers (BERT, GPT). Smooth approximation of ReLU     |
| Mish       | `MISH`     | f(x) = x \* tanh(softplus(x))                     | Self-regularized, smooth. Good general-purpose alternative to ReLU |
| Swish      | `SWISH`    | f(x) = x \* sigmoid(x)                            | Smooth, non-monotonic. Found by automated search over activations  |
| Softplus   | `SOFTPLUS` | f(x) = log(1 + exp(x))                            | Smooth approximation of ReLU                                       |
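
The formulas above translate directly into scalar code. The sketch below is plain Java under stated assumptions: the class name `SmoothActivationsSketch` is hypothetical, the JDK has no built-in error function so GELU is shown in its widely used tanh approximation rather than the exact x \* Phi(x), and softplus is written in a rearranged form that avoids overflow for large inputs.

```java
// Illustrative scalar versions of the table formulas; not ND4J code.
final class SmoothActivationsSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // log(1 + exp(x)) rewritten as max(x, 0) + log1p(exp(-|x|)) to avoid overflow.
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    static double swish(double x) {
        return x * sigmoid(x);
    }

    static double mish(double x) {
        return x * Math.tanh(softplus(x));
    }

    // Tanh approximation of GELU (an assumption of this sketch, not the exact CDF form).
    static double geluTanh(double x) {
        double inner = Math.sqrt(2.0 / Math.PI) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + Math.tanh(inner));
    }
}
```

All four functions behave like ReLU for large positive inputs and approach zero for large negative inputs; they differ mainly in how they bend around the origin.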

### Sigmoid Family

| Activation   | Enum Value    | Formula                           | Notes                                                    |
| ------------ | ------------- | --------------------------------- | -------------------------------------------------------- |
| Sigmoid      | `SIGMOID`     | f(x) = 1 / (1 + exp(-x))          | Output range (0,1). Use for binary classification output |
| Hard Sigmoid | `HARDSIGMOID` | f(x) = min(1, max(0, 0.2x + 0.5)) | Fast piecewise linear approximation of sigmoid           |
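
To see how the piecewise-linear form tracks the true sigmoid, the two can be compared side by side. This is a plain-Java sketch of the table formulas with a hypothetical class name `SigmoidFamilySketch`; it is not the ND4J implementation.

```java
// Illustrative scalar versions of the table formulas; not ND4J code.
final class SigmoidFamilySketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Linear ramp 0.2x + 0.5, clipped to [0, 1]; saturates exactly at |x| >= 2.5.
    static double hardSigmoid(double x) {
        return Math.min(1.0, Math.max(0.0, 0.2 * x + 0.5));
    }
}
```

Both functions return 0.5 at x = 0. The hard variant reaches exactly 0 and 1 outside [-2.5, 2.5], whereas the true sigmoid only approaches those limits.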

### Tanh Family

| Activation     | Enum Value      | Formula                                                 | Notes                                                                       |
| -------------- | --------------- | ------------------------------------------------------- | --------------------------------------------------------------------------- |
| Tanh           | `TANH`          | f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))          | Output range (-1,1). Alternative to ReLU for hidden layers, especially RNNs |
| Hard Tanh      | `HARDTANH`      | f(x) = -1 if x < -1, 1 if x > 1, x otherwise            | Fast piecewise linear approximation of tanh                                 |
| Rectified Tanh | `RECTIFIEDTANH` | f(x) = max(0, tanh(x))                                  | Combination of ReLU and tanh                                                |
| Rational Tanh  | `RATIONALTANH`  | f(x) = 1.7159 \* tanh(2x/3) with rational approximation | Fast approximation of LeCun's scaled tanh (1998)                            |
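
The variants in this table can be sketched as scalar functions. The code below is plain Java with a hypothetical class name `TanhFamilySketch`. The rational form used for `rationalTanh` is an assumption recalled from the ND4J Javadoc, tanh(y) ≈ sgn(y) \* (1 - 1/(1 + |y| + y^2 + 1.41645 \* y^4)), and should be checked against the ND4J source before being relied on.

```java
// Illustrative scalar versions of the table formulas; not ND4J code.
final class TanhFamilySketch {

    static double hardTanh(double x) {
        return Math.max(-1.0, Math.min(1.0, x));
    }

    static double rectifiedTanh(double x) {
        return Math.max(0.0, Math.tanh(x));
    }

    // Reference value: the scaled tanh that the rational form approximates.
    static double scaledTanh(double x) {
        return 1.7159 * Math.tanh(2.0 * x / 3.0);
    }

    // Assumed rational approximation of tanh(y), with y = 2x/3 (see lead-in).
    static double rationalTanh(double x) {
        double y = 2.0 * x / 3.0;
        double y2 = y * y;
        double approx = Math.signum(y)
                * (1.0 - 1.0 / (1.0 + Math.abs(y) + y2 + 1.41645 * y2 * y2));
        return 1.7159 * approx;
    }
}
```

Under that assumption, `rationalTanh` stays close to `scaledTanh` across typical input ranges while avoiding any call to `exp`.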

### Output Activations

| Activation | Enum Value | Formula                                             | Notes                                                                       |
| ---------- | ---------- | --------------------------------------------------- | --------------------------------------------------------------------------- |
| Softmax    | `SOFTMAX`  | f\_i(x) = exp(x\_i - max) / sum\_j(exp(x\_j - max)) | Multi-class classification output. Outputs sum to 1. Pair with `LossMCXENT` |
| Identity   | `IDENTITY` | f(x) = x                                            | Regression output (linear). Pair with `LossMSE`                             |
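
The `max` term in the softmax formula is there for numerical stability: subtracting the largest input before exponentiating leaves the result unchanged but keeps `exp` from overflowing. A minimal plain-Java sketch of that formula follows; the class name `StableSoftmaxSketch` is hypothetical and this is not the ND4J implementation.

```java
// Illustrative max-shifted softmax over a single vector; not ND4J code.
final class StableSoftmaxSketch {

    static double[] softmax(double[] x) {
        // Find the largest input so every exponent is <= 0.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            max = Math.max(max, v);
        }

        // Exponentiate the shifted inputs and accumulate the normalizer.
        double[] out = new double[x.length];
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - max);
            sum += out[i];
        }

        // Normalize so the outputs sum to 1.
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }
}
```

Inputs such as `{1000, 1001, 1002}` would overflow a naive `exp(x_i)` implementation, but here they produce the same probabilities as `{0, 1, 2}`.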

### Other Activations

| Activation | Enum Value | Formula                | Notes                                                                 |
| ---------- | ---------- | ---------------------- | --------------------------------------------------------------------- |
| Softsign   | `SOFTSIGN` | f(x) = x / (1 + \|x\|) | Alternative to tanh — converges polynomially instead of exponentially |
| Cube       | `CUBE`     | f(x) = x^3             | Rarely used in practice                                               |

## Recommended Pairings

The right activation depends on the layer type and the task:

### Hidden Layers

| Network Type          | Activation | Weight Init               | Why                                                         |
| --------------------- | ---------- | ------------------------- | ----------------------------------------------------------- |
| Feed-forward, CNN     | `RELU`     | `WeightInit.RELU`         | Fast convergence, avoids vanishing gradient                 |
| RNN (LSTM, GRU)       | `TANH`     | `WeightInit.XAVIER`       | Bounded output prevents exploding activations in recurrence |
| Self-normalizing nets | `SELU`     | `WeightInit.LECUN_NORMAL` | Maintains mean=0, variance=1 through layers                 |
| Transformer blocks    | `GELU`     | `WeightInit.XAVIER`       | Smooth, works well with attention mechanisms                |

### Output Layers

| Task                       | Activation | Loss Function          | Why                                          |
| -------------------------- | ---------- | ---------------------- | -------------------------------------------- |
| Multi-class classification | `SOFTMAX`  | `LossMCXENT`           | Outputs are valid probabilities summing to 1 |
| Binary classification      | `SIGMOID`  | `LossBinaryXENT`       | Output is a probability in (0,1)             |
| Multi-label classification | `SIGMOID`  | `LossBinaryXENT`       | Each output is an independent binary decision |
| Regression                 | `IDENTITY` | `LossMSE` or `LossMAE` | Linear output, no bounds                     |
| Bounded regression \[0,1]  | `SIGMOID`  | `LossMSE`              | Output constrained to (0,1)                  |

## Custom Activations

Implement the `IActivation` interface (`org.nd4j.linalg.activations.IActivation`). Extending `BaseActivationFunction`, as in the example below, is the usual starting point:

```java
import org.nd4j.common.primitives.Pair;
import org.nd4j.linalg.activations.BaseActivationFunction;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.ops.transforms.Transforms;

public class CustomActivation extends BaseActivationFunction {

    @Override
    public INDArray getActivation(INDArray in, boolean training) {
        // Contract: modify 'in' in place and return it
        // Example: f(x) = x * sigmoid(x)  (Swish)
        INDArray sigmoid = Transforms.sigmoid(in.dup(), false);  // computed on a copy
        in.muli(sigmoid);
        return in;
    }

    @Override
    public Pair<INDArray, INDArray> backprop(INDArray in, INDArray epsilon) {
        // 'in' is the pre-activation input; 'epsilon' is the upstream gradient
        // Swish derivative: f'(x) = s + x * s * (1 - s), with s = sigmoid(x)
        INDArray s = Transforms.sigmoid(in.dup(), false);
        // rsub returns a new array (1 - s); the chained ops then build s * (1 + x * (1 - s))
        INDArray gradient = s.rsub(1.0).muli(in).addi(1.0).muli(s);
        gradient.muli(epsilon);
        // The second element is null for activations without learnable parameters
        return new Pair<>(gradient, null);
    }
}
```

Use a custom activation in a layer:

```java
new DenseLayer.Builder()
    .nIn(256).nOut(128)
    .activation(new CustomActivation())
    .build()
```
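
The derivative used in `backprop` is the easiest part of a custom activation to get wrong, so it is worth checking against a finite-difference estimate before training. The following is a plain-Java sketch for the Swish example, f(x) = x \* sigmoid(x); the class name `SwishGradientCheck` and its methods are hypothetical helpers, not part of ND4J.

```java
// Illustrative finite-difference check for a scalar activation; not ND4J code.
final class SwishGradientCheck {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    static double swish(double x) {
        return x * sigmoid(x);
    }

    // Analytic derivative: f'(x) = s + x * s * (1 - s), with s = sigmoid(x).
    static double swishDerivative(double x) {
        double s = sigmoid(x);
        return s + x * s * (1.0 - s);
    }

    // Central-difference estimate of f'(x) with step h.
    static double numericalDerivative(double x, double h) {
        return (swish(x + h) - swish(x - h)) / (2.0 * h);
    }

    // Largest absolute gap between the analytic and numerical derivative.
    static double maxError(double[] points, double h) {
        double worst = 0.0;
        for (double x : points) {
            double gap = Math.abs(swishDerivative(x) - numericalDerivative(x, h));
            worst = Math.max(worst, gap);
        }
        return worst;
    }
}
```

Calling `maxError(new double[] {-3, -1, 0, 0.5, 2}, 1e-5)` should return a value many orders of magnitude smaller than the derivatives themselves. A large gap usually points to a sign or factor error in the analytic formula.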

## Activation Function Comparison

For quick reference, key properties of each activation:

| Activation | Range                 | Monotonic | Smooth | Zero-Centered | Computational Cost |
| ---------- | --------------------- | --------- | ------ | ------------- | ------------------ |
| ReLU       | \[0, inf)             | Yes       | No     | No            | Very Low           |
| Leaky ReLU | (-inf, inf)           | Yes       | No     | Yes           | Very Low           |
| ELU        | (-alpha, inf)         | Yes       | Yes    | \~Yes         | Medium             |
| SELU       | (-lambda\*alpha, inf) | Yes       | Yes    | \~Yes         | Medium             |
| GELU       | \~(-0.17, inf)        | No        | Yes    | No            | High               |
| Mish       | \~(-0.31, inf)        | No        | Yes    | No            | High               |
| Swish      | \~(-0.28, inf)        | No        | Yes    | No            | Medium             |
| Sigmoid    | (0, 1)                | Yes       | Yes    | No            | Medium             |
| Tanh       | (-1, 1)               | Yes       | Yes    | Yes           | Medium             |
| Softmax    | (0, 1) per class      | N/A       | Yes    | No            | Medium             |
| Identity   | (-inf, inf)           | Yes       | Yes    | Yes           | None               |
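
The approximate lower bounds listed for GELU, Mish, and Swish can be reproduced with a simple grid scan. This is a plain-Java sketch under stated assumptions: the class name `ActivationRangeScan` is hypothetical, GELU uses the tanh approximation because the JDK has no error function, and the scan is a coarse numerical estimate rather than an exact minimization.

```java
// Illustrative grid scan for activation minima; not ND4J code.
final class ActivationRangeScan {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Overflow-safe form of log(1 + exp(x)).
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    static double swish(double x) {
        return x * sigmoid(x);
    }

    static double mish(double x) {
        return x * Math.tanh(softplus(x));
    }

    // Tanh approximation of GELU (assumed here in place of the exact CDF form).
    static double geluTanh(double x) {
        double inner = Math.sqrt(2.0 / Math.PI) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + Math.tanh(inner));
    }

    // Returns the smallest value of f sampled on [lo, hi] with the given step.
    static double minOver(java.util.function.DoubleUnaryOperator f,
                          double lo, double hi, double step) {
        double min = Double.POSITIVE_INFINITY;
        for (double x = lo; x <= hi; x += step) {
            min = Math.min(min, f.applyAsDouble(x));
        }
        return min;
    }
}
```

For example, `minOver(ActivationRangeScan::swish, -10, 10, 1e-3)` lands near the lower bound given for Swish. Because each of these functions dips below zero and then rises back toward zero as x becomes more negative, none of them is monotonic.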

