# Activations

## What are activations?

At a simple level, an activation function decides whether, and how strongly, a neuron should be activated. This determines whether the information the neuron receives is relevant. The activation function is a (typically non-linear) transformation applied to the neuron's input signal, and the transformed output is sent to the next neuron.
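To make the description above concrete, here is a minimal plain-Java sketch of a single neuron: a weighted sum of the inputs plus a bias, passed through an activation function. The class name `NeuronSketch` and the method `forward` are hypothetical names chosen for illustration and are not part of the library.

```java
final class NeuronSketch {
    // Weighted sum of the inputs plus a bias, passed through the activation.
    static double forward(double[] inputs, double[] weights, double bias,
                          java.util.function.DoubleUnaryOperator activation) {
        double z = bias;
        for (int i = 0; i < inputs.length; i++) {
            z += inputs[i] * weights[i];
        }
        return activation.applyAsDouble(z);
    }

    public static void main(String[] args) {
        // ReLU used here only as an example activation
        java.util.function.DoubleUnaryOperator relu = x -> Math.max(0.0, x);
        double[] inputs = {1.0, -2.0};
        double[] weights = {0.5, 0.25};
        // Pre-activation is 0.1 + 1.0*0.5 + (-2.0)*0.25, which is positive,
        // so ReLU passes it through unchanged.
        System.out.println(forward(inputs, weights, 0.1, relu));
    }
}
```

Swapping the lambda for another function changes how the same pre-activation value is mapped before it reaches the next neuron.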

## Usage

The recommended way to use activations is to add an activation layer to your neural network and configure it with the desired activation:

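```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    // add hyperparameters
    .graphBuilder()
    // add inputs and other layers; "previous_input" is a placeholder for an existing layer name
    .addLayer("softmax", new ActivationLayer.Builder().activation(Activation.SOFTMAX).build(), "previous_input")
    // add more layers and output
    .build();
```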

## Available activations

## ActivationRectifiedTanh

Rectified tanh

f(x) = max(0, tanh(x))

The underlying implementation is in native code.

## ActivationELU

f(x) = alpha (exp(x) - 1.0) for x < 0, f(x) = x for x >= 0

alpha defaults to 1 if not specified.
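As a minimal sketch of the ELU formula in plain Java (not the library's implementation), the two branches can be written as follows. `EluSketch` is a hypothetical class name, and the single-argument overload simply assumes the documented default of alpha = 1.

```java
final class EluSketch {
    // f(x) = alpha * (exp(x) - 1) for x < 0, f(x) = x for x >= 0
    static double elu(double x, double alpha) {
        // Math.expm1(x) computes exp(x) - 1 accurately for x close to 0
        return x >= 0.0 ? x : alpha * Math.expm1(x);
    }

    // Convenience overload using the default alpha = 1
    static double elu(double x) {
        return elu(x, 1.0);
    }

    public static void main(String[] args) {
        for (double x : new double[]{-2.0, -0.5, 0.0, 0.5, 2.0}) {
            System.out.println("elu(" + x + ") = " + elu(x));
        }
    }
}
```

For large negative inputs the output saturates towards -alpha, while positive inputs pass through unchanged.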

## ActivationReLU

f(x) = max(0, x)

## ActivationRationalTanh

Rational tanh approximation, from https://arxiv.org/pdf/1508.01292v3

f(x) = 1.7159 tanh(2x/3), where tanh is approximated as follows:

tanh(y) ~ sgn(y) (1 - 1/(1 + |y| + y^2 + 1.41645 y^4))

The underlying implementation is in native code.
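The approximation can be tried out in plain Java and compared against `Math.tanh`. This is only an illustrative sketch of the formula as written above, not the native implementation; `RationalTanhSketch`, `tanhApprox`, and `rationalTanh` are hypothetical names.

```java
final class RationalTanhSketch {
    // tanh(y) ~ sgn(y) * (1 - 1 / (1 + |y| + y^2 + 1.41645 * y^4))
    static double tanhApprox(double y) {
        double y2 = y * y;
        double denominator = 1.0 + Math.abs(y) + y2 + 1.41645 * y2 * y2;
        return Math.signum(y) * (1.0 - 1.0 / denominator);
    }

    // f(x) = 1.7159 * tanh(2x / 3), using the rational approximation of tanh
    static double rationalTanh(double x) {
        return 1.7159 * tanhApprox(2.0 * x / 3.0);
    }

    public static void main(String[] args) {
        // Print the approximation next to Math.tanh so the gap can be inspected
        for (double y : new double[]{-2.0, -0.5, 0.0, 0.5, 2.0}) {
            System.out.println(y + ": approx=" + tanhApprox(y) + " exact=" + Math.tanh(y));
        }
    }
}
```

The rational form avoids calls to `exp`, which is the usual motivation for this kind of approximation.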

## ActivationThresholdedReLU

Thresholded ReLU

f(x) = x for x > theta, f(x) = 0 otherwise

theta defaults to 1.0.

## ActivationReLU6

f(x) = min(max(x, cutoff), 6)

Here x is the input and cutoff is the lower bound; the standard ReLU6 definition uses a cutoff of 0.

## ActivationHardTanH

```
         ⎧  1, if x >  1
f(x) =   ⎨ -1, if x < -1
         ⎩  x, otherwise
```
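The piecewise definition above is a clamp to the interval [-1, 1]. A minimal plain-Java sketch, with the hypothetical class name `HardTanhSketch`, could look like this:

```java
final class HardTanhSketch {
    // Clamp x to [-1, 1]: values inside the interval pass through unchanged
    static double hardTanh(double x) {
        return Math.max(-1.0, Math.min(1.0, x));
    }

    public static void main(String[] args) {
        for (double x : new double[]{-3.0, -0.4, 0.0, 0.7, 5.0}) {
            System.out.println("hardTanh(" + x + ") = " + hardTanh(x));
        }
    }
}
```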

## ActivationSigmoid

f(x) = 1 / (1 + exp(-x))

## ActivationGELU

Gaussian Error Linear Unit (GELU) activation function: f(x) = x Φ(x), where Φ is the cumulative distribution function of the standard normal distribution.
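Java's standard library has no error function, so a quick way to experiment with GELU in plain Java is the tanh-based approximation given in the GELU paper. The sketch below is illustrative only: it is not necessarily the variant this library computes, and `GeluSketch` and `geluApprox` are hypothetical names.

```java
final class GeluSketch {
    // Tanh-based approximation of GELU:
    // 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
    static double geluApprox(double x) {
        double inner = Math.sqrt(2.0 / Math.PI) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + Math.tanh(inner));
    }

    public static void main(String[] args) {
        for (double x : new double[]{-3.0, -1.0, 0.0, 1.0, 3.0}) {
            System.out.println("gelu(" + x + ") ~ " + geluApprox(x));
        }
    }
}
```

Large positive inputs pass through almost unchanged and large negative inputs are pushed towards 0, with a smooth transition around the origin.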

## ActivationPReLU

Parametrized Rectified Linear Unit (PReLU)

f(x) = alpha x for x < 0, f(x) = x for x >= 0

alpha has the same shape as x and is a learned parameter.
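Because alpha has the same shape as x, each element gets its own negative-side slope. The following plain-Java sketch shows the element-wise forward pass only; learning alpha is out of scope here, and `PReluSketch` is a hypothetical name rather than library code.

```java
final class PReluSketch {
    // Element-wise PReLU: out[i] = alpha[i] * x[i] for x[i] < 0, x[i] otherwise
    static double[] prelu(double[] x, double[] alpha) {
        if (x.length != alpha.length) {
            throw new IllegalArgumentException("alpha must have the same shape as x");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] >= 0.0 ? x[i] : alpha[i] * x[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {-2.0, 3.0, -1.0};
        double[] alpha = {0.1, 0.5, 0.25}; // example values; in training these are learned
        System.out.println(java.util.Arrays.toString(prelu(x, alpha)));
    }
}
```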

## ActivationIdentity

f(x) = x

## ActivationSoftSign

f_i(x) = x_i / (1 + |x_i|)

## ActivationHardSigmoid

f(x) = min(1, max(0, 0.2x + 0.5))

## ActivationSoftmax

f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)
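Subtracting the maximum before exponentiating leaves the result mathematically unchanged but keeps `exp` from overflowing on large inputs. A minimal plain-Java sketch of the formula above, using the hypothetical class name `SoftmaxSketch`:

```java
final class SoftmaxSketch {
    // f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift), with shift = max_i(x_i)
    static double[] softmax(double[] x) {
        double shift = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            shift = Math.max(shift, v);
        }
        double[] out = new double[x.length];
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - shift); // largest exponent is exp(0) = 1
            sum += out[i];
        }
        for (int i = 0; i < x.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Without the shift, exp(1000) would overflow to Infinity
        System.out.println(java.util.Arrays.toString(softmax(new double[]{1000.0, 1000.0})));
        System.out.println(java.util.Arrays.toString(softmax(new double[]{1.0, 2.0, 3.0})));
    }
}
```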

## ActivationCube

f(x) = x^3

## ActivationRReLU

f(x) = max(0, x) + alpha min(0, x)

alpha is drawn from uniform(l, u) during training and is set to (l + u)/2 during testing. l and u default to 1/8 and 1/3 respectively.

Reference: Empirical Evaluation of Rectified Activations in Convolutional Network
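The difference between training and test behaviour can be sketched in plain Java as follows, reading the test-time value as the midpoint (l + u) / 2. This is an illustration under that assumption rather than the library's implementation; `RReluSketch`, `sampleAlpha`, and `testAlpha` are hypothetical names.

```java
final class RReluSketch {
    // f(x) = max(0, x) + alpha * min(0, x)
    static double rrelu(double x, double alpha) {
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    // Training: alpha is drawn uniformly from [l, u)
    static double sampleAlpha(double l, double u, java.util.Random rng) {
        return l + (u - l) * rng.nextDouble();
    }

    // Testing: alpha is fixed to the midpoint of the range
    static double testAlpha(double l, double u) {
        return (l + u) / 2.0;
    }

    public static void main(String[] args) {
        double l = 1.0 / 8.0, u = 1.0 / 3.0; // documented defaults
        java.util.Random rng = new java.util.Random(42);
        System.out.println("train: " + rrelu(-1.0, sampleAlpha(l, u, rng)));
        System.out.println("test:  " + rrelu(-1.0, testAlpha(l, u)));
    }
}
```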

## ActivationTanH

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

## ActivationSELU

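Scaled Exponential Linear Unit (SELU), from https://arxiv.org/pdf/1706.02515.pdf

f(x) = lambda x for x > 0, f(x) = lambda alpha (exp(x) - 1) for x <= 0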
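The paper linked above fixes two constants, alpha ≈ 1.6733 and lambda ≈ 1.0507, and scales an ELU-style function by lambda. The plain-Java sketch below uses the commonly quoted double-precision values for those constants; `SeluSketch` is a hypothetical name and this is not the library's implementation.

```java
final class SeluSketch {
    // Commonly quoted constants from the SELU paper
    static final double ALPHA = 1.6732632423543772;
    static final double LAMBDA = 1.0507009873554805;

    // f(x) = lambda * x for x > 0, lambda * alpha * (exp(x) - 1) for x <= 0
    static double selu(double x) {
        return x > 0.0 ? LAMBDA * x : LAMBDA * ALPHA * Math.expm1(x);
    }

    public static void main(String[] args) {
        for (double x : new double[]{-5.0, -1.0, 0.0, 1.0, 5.0}) {
            System.out.println("selu(" + x + ") = " + selu(x));
        }
    }
}
```

Negative inputs saturate towards -lambda * alpha, roughly -1.758.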

## ActivationLReLU

Leaky ReLU

f(x) = max(0, x) + alpha min(0, x)

alpha defaults to 0.01.

## ActivationSwish

f(x) = x sigmoid(x)

## ActivationSoftPlus

f(x) = log(1+e^x)
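Evaluating log(1 + e^x) literally overflows for large x, because e^x becomes infinite in double precision. A common rewrite, shown here as a plain-Java sketch and not as the library's implementation, is max(x, 0) + log(1 + e^(-|x|)), which is algebraically the same function. `SoftPlusSketch` is a hypothetical name.

```java
final class SoftPlusSketch {
    // Direct transcription of f(x) = log(1 + e^x); overflows for large x
    static double softplusNaive(double x) {
        return Math.log(1.0 + Math.exp(x));
    }

    // Equivalent form that never exponentiates a positive number:
    // max(x, 0) + log(1 + e^(-|x|))
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    public static void main(String[] args) {
        for (double x : new double[]{-1000.0, -1.0, 0.0, 1.0, 1000.0}) {
            System.out.println(x + ": naive=" + softplusNaive(x) + " stable=" + softplus(x));
        }
    }
}
```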