# Activations

Activation functions supported in DL4J and how to configure them.

Note: the activations below do not cover every available choice. If you need more than what is listed here, consider using SameDiff, which offers a much wider range of operations. SameDiff can be embedded in a DL4J network using the layers in:

At a simple level, an activation function helps decide whether a neuron should be activated, that is, whether the signal the neuron receives is relevant enough to pass on. Most activation functions are non-linear transformations applied to the neuron's input signal, and the transformed output is sent to the next layer.

The recommended way to use an activation is to add an activation layer (`ActivationLayer`) to your network and configure it with the desired activation:

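```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.ActivationLayer;
import org.nd4j.linalg.activations.Activation;

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        // add hyperparameters here
        .graphBuilder()
        // add inputs and other layers, including one named "previous_input"
        .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
        // add more layers and set the outputs
        .build();
```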

## Rectified tanh

Essentially:

$$
f(x) = \max(0, \tanh(x))
$$

Underlying implementation is in native code

## ELU

$$
f(x) = \begin{cases} \alpha\,(e^{x} - 1), & x < 0 \\ x, & x \ge 0 \end{cases}
$$

$$\alpha$$ defaults to 1 if not specified.
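As a quick check of the piecewise definition, here is a minimal scalar sketch in plain Java. The class and method names are hypothetical, and this is not DL4J's implementation (which operates on `INDArray`s); it only evaluates the formula above for a single `double`.

```java
/** Hypothetical scalar sketch of ELU; not the DL4J implementation. */
public class EluSketch {

    /** f(x) = alpha * (exp(x) - 1) for x < 0, x otherwise. */
    public static double elu(double x, double alpha) {
        // Math.expm1 computes exp(x) - 1 accurately for x close to 0
        return x < 0 ? alpha * Math.expm1(x) : x;
    }

    /** Uses the default alpha = 1 described above. */
    public static double elu(double x) {
        return elu(x, 1.0);
    }

    public static void main(String[] args) {
        for (double x : new double[] {-3.0, -0.5, 0.0, 2.0}) {
            System.out.printf("elu(%.1f) = %.4f%n", x, elu(x));
        }
    }
}
```

For large negative inputs the output approaches -alpha instead of growing without bound.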

## ReLU

$$
f(x) = \max(0, x)
$$

## Rational tanh

$$
f(x) = 1.7159\,\tanh\left(\frac{2x}{3}\right)
$$

where tanh is approximated as follows:

$$
\tanh(y) \approx \operatorname{sgn}(y)\left(1 - \frac{1}{1 + |y| + y^{2} + 1.41645\,y^{4}}\right)
$$

Underlying implementation is in native code
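The approximation can be compared against the exact `Math.tanh` with a short plain-Java sketch. This is a hypothetical scalar re-implementation of the two formulas above for illustration, not the native code DL4J actually runs.

```java
/** Hypothetical scalar sketch of rational tanh; not DL4J's native implementation. */
public class RationalTanhSketch {

    /** tanh(y) ~ sgn(y) * (1 - 1 / (1 + |y| + y^2 + 1.41645 * y^4)). */
    public static double approxTanh(double y) {
        double a = Math.abs(y);
        double y2 = y * y;
        double denom = 1.0 + a + y2 + 1.41645 * y2 * y2;
        return Math.signum(y) * (1.0 - 1.0 / denom);
    }

    /** f(x) = 1.7159 * tanh(2x / 3), using the approximation above. */
    public static double rationalTanh(double x) {
        return 1.7159 * approxTanh(2.0 * x / 3.0);
    }

    public static void main(String[] args) {
        // Print the approximation next to the exact value for a few inputs
        for (double y : new double[] {0.1, 0.5, 1.0, 2.0, 4.0}) {
            System.out.printf("y=%.1f approx=%.4f exact=%.4f%n",
                    y, approxTanh(y), Math.tanh(y));
        }
        System.out.printf("rationalTanh(1.5) = %.4f%n", rationalTanh(1.5));
    }
}
```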

## Thresholded ReLU

$$
f(x) = \begin{cases} x, & x > \theta \\ 0, & \text{otherwise} \end{cases}
$$

$$\theta$$ defaults to 1.0.
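A hypothetical scalar sketch in plain Java (not the DL4J implementation) makes the strict inequality explicit: an input exactly equal to theta maps to 0.

```java
/** Hypothetical scalar sketch of thresholded ReLU; not the DL4J implementation. */
public class ThresholdedReluSketch {

    /** Passes x through only when it is strictly above theta. */
    public static double thresholdedRelu(double x, double theta) {
        return x > theta ? x : 0.0;
    }

    /** Uses the default theta = 1.0 described above. */
    public static double thresholdedRelu(double x) {
        return thresholdedRelu(x, 1.0);
    }

    public static void main(String[] args) {
        for (double x : new double[] {-0.5, 0.5, 1.0, 1.5}) {
            System.out.printf("f(%.1f) = %.1f%n", x, thresholdedRelu(x));
        }
    }
}
```

Unlike plain ReLU, inputs in (0, theta] are also zeroed, so the output jumps from 0 to theta at the threshold.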

## ReLU6

$$
f(x) = \min(\max(x, \text{cutoff}), 6)
$$
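A minimal scalar sketch in plain Java with hypothetical names, not DL4J's implementation. The demo assumes `cutoff = 0.0`, which gives the usual clip to [0, 6]; the last call uses a non-zero cutoff purely for illustration.

```java
/** Hypothetical scalar sketch of ReLU6; not the DL4J implementation. */
public class Relu6Sketch {

    /** f(x) = min(max(x, cutoff), 6): clip from below at cutoff and from above at 6. */
    public static double relu6(double x, double cutoff) {
        return Math.min(Math.max(x, cutoff), 6.0);
    }

    public static void main(String[] args) {
        double cutoff = 0.0; // assumed value for this demo
        for (double x : new double[] {-3.0, 3.0, 10.0}) {
            System.out.printf("relu6(%.1f) = %.1f%n", x, relu6(x, cutoff));
        }
        System.out.printf("relu6(-3.0, cutoff=-1.0) = %.1f%n", relu6(-3.0, -1.0));
    }
}
```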

## Hard tanh

$$
f(x) = \begin{cases} 1, & x > 1 \\ -1, & x < -1 \\ x, & \text{otherwise} \end{cases}
$$
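The three cases amount to clipping the input to [-1, 1], as this hypothetical plain-Java sketch shows (it is not DL4J's implementation):

```java
/** Hypothetical scalar sketch of hard tanh; not the DL4J implementation. */
public class HardTanhSketch {

    /** Clipping to [-1, 1] is equivalent to the three-case definition. */
    public static double hardTanh(double x) {
        return Math.max(-1.0, Math.min(1.0, x));
    }

    public static void main(String[] args) {
        for (double x : new double[] {-2.0, -0.5, 0.0, 0.75, 3.0}) {
            System.out.printf("hardTanh(%.2f) = %.2f%n", x, hardTanh(x));
        }
    }
}
```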

## Sigmoid

$$
f(x) = \frac{1}{1 + e^{-x}}
$$

## GELU

GELU activation function: Gaussian Error Linear Units.

## PReLU

Parametrized Rectified Linear Unit (PReLU):

$$
f(x) = \begin{cases} \alpha x, & x < 0 \\ x, & x \ge 0 \end{cases}
$$

$$\alpha$$ has the same shape as $$x$$ and is a learned parameter.
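Because alpha has the same shape as x, PReLU applies a separate slope to each element. The plain-Java sketch below (hypothetical names, not DL4J's `INDArray`-based code) takes alpha as a fixed array and does not show how it is learned during training.

```java
import java.util.Arrays;

/** Hypothetical element-wise sketch of PReLU; not the DL4J implementation. */
public class PReluSketch {

    /** f(x_i) = alpha_i * x_i for x_i < 0, x_i otherwise. */
    public static double[] prelu(double[] x, double[] alpha) {
        if (x.length != alpha.length) {
            throw new IllegalArgumentException("alpha must have the same shape as x");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] < 0 ? alpha[i] * x[i] : x[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {-2.0, -1.0, 0.0, 3.0};
        double[] alpha = {0.1, 0.25, 0.5, 0.9}; // example slopes; in a real network these are learned
        System.out.println(Arrays.toString(prelu(x, alpha)));
    }
}
```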

## Identity

$$
f(x) = x
$$

## Softsign

$$
f_i(x) = \frac{x_i}{1 + |x_i|}
$$
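A hypothetical scalar sketch in plain Java, not DL4J's implementation. Like tanh, softsign maps into (-1, 1), but it approaches the bounds only polynomially: for x > 0, 1 - f(x) = 1/(1 + x).

```java
/** Hypothetical scalar sketch of softsign; not the DL4J implementation. */
public class SoftsignSketch {

    /** f(x) = x / (1 + |x|). */
    public static double softsign(double x) {
        return x / (1.0 + Math.abs(x));
    }

    public static void main(String[] args) {
        // Compare with tanh to see the slower approach to +/-1
        for (double x : new double[] {-3.0, 0.0, 1.0, 10.0}) {
            System.out.printf("x=%.1f softsign=%.4f tanh=%.4f%n", x, softsign(x), Math.tanh(x));
        }
    }
}
```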

## Hard sigmoid

$$
f(x) = \min(1, \max(0, 0.2x + 0.5))
$$
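Solving 0.2x + 0.5 = 0 and 0.2x + 0.5 = 1 shows that the output saturates at 0 for x ≤ -2.5 and at 1 for x ≥ 2.5. A hypothetical plain-Java sketch (not DL4J's implementation):

```java
/** Hypothetical scalar sketch of hard sigmoid; not the DL4J implementation. */
public class HardSigmoidSketch {

    /** Piecewise-linear stand-in for sigmoid, clipped to [0, 1]. */
    public static double hardSigmoid(double x) {
        return Math.min(1.0, Math.max(0.0, 0.2 * x + 0.5));
    }

    public static void main(String[] args) {
        for (double x : new double[] {-3.0, -1.0, 0.0, 1.0, 3.0}) {
            System.out.printf("x=%.1f hard=%.3f sigmoid=%.3f%n",
                    x, hardSigmoid(x), 1.0 / (1.0 + Math.exp(-x)));
        }
    }
}
```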

## Softmax

$$
f_i(x) = \frac{\exp(x_i - s)}{\sum_j \exp(x_j - s)}, \qquad s = \max_k x_k
$$
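The shift cancels between numerator and denominator, so it does not change the result, but it keeps every exponent at or below 0 so `exp` cannot overflow. The plain-Java sketch below (hypothetical names, not DL4J's implementation) applies the formula to a single vector.

```java
import java.util.Arrays;

/** Hypothetical sketch of a max-shifted softmax over one vector; not the DL4J implementation. */
public class SoftmaxSketch {

    public static double[] softmax(double[] x) {
        // shift = max_k x_k
        double shift = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            shift = Math.max(shift, v);
        }
        double[] out = new double[x.length];
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - shift); // the largest term becomes exp(0) = 1
            sum += out[i];
        }
        for (int i = 0; i < x.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Without the shift, Math.exp(1000.0) overflows to Infinity and the ratio is NaN
        double[] logits = {1000.0, 1001.0, 1002.0};
        System.out.println(Arrays.toString(softmax(logits)));
    }
}
```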

## Cube

$$
f(x) = x^{3}
$$

## RReLU

$$
f(x) = \max(0, x) + \alpha \min(0, x)
$$

$$\alpha$$ is drawn from $$U(l, u)$$ during training and set to $$(l + u)/2$$ at test time. $$l$$ and $$u$$ default to 1/8 and 1/3, respectively.
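A hypothetical plain-Java sketch of the training/test behaviour described above, using the default l = 1/8 and u = 1/3. It assumes a fresh alpha is drawn for every call (that is, per element in this scalar version); it is not DL4J's implementation.

```java
import java.util.Random;

/** Hypothetical scalar sketch of RReLU; not the DL4J implementation. */
public class RReluSketch {

    public static final double LOWER = 1.0 / 8.0; // default l
    public static final double UPPER = 1.0 / 3.0; // default u

    /** f(x) = max(0, x) + alpha * min(0, x). */
    public static double apply(double x, double alpha) {
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    /** Training: alpha ~ Uniform(l, u), drawn anew on each call (assumption). */
    public static double training(double x, Random rng) {
        double alpha = LOWER + (UPPER - LOWER) * rng.nextDouble();
        return apply(x, alpha);
    }

    /** Test time: alpha fixed to the midpoint (l + u) / 2. */
    public static double inference(double x) {
        return apply(x, (LOWER + UPPER) / 2.0);
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so the demo is repeatable
        for (int i = 0; i < 3; i++) {
            System.out.printf("training  f(-1.0) = %.4f%n", training(-1.0, rng));
        }
        System.out.printf("inference f(-1.0) = %.4f%n", inference(-1.0));
    }
}
```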

## Tanh

$$
f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
$$

## Leaky ReLU

$$
f(x) = \max(0, x) + \alpha \min(0, x)
$$

$$\alpha$$ defaults to 0.01.
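A hypothetical scalar sketch in plain Java, not DL4J's implementation. For 0 ≤ alpha ≤ 1 the formula is equivalent to max(x, alpha * x), another common way to write it.

```java
/** Hypothetical scalar sketch of leaky ReLU; not the DL4J implementation. */
public class LeakyReluSketch {

    /** f(x) = max(0, x) + alpha * min(0, x). */
    public static double leakyRelu(double x, double alpha) {
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    /** Uses the default alpha = 0.01 described above. */
    public static double leakyRelu(double x) {
        return leakyRelu(x, 0.01);
    }

    public static void main(String[] args) {
        for (double x : new double[] {-2.0, 0.0, 5.0}) {
            System.out.printf("leakyRelu(%.1f) = %.3f%n", x, leakyRelu(x));
        }
    }
}
```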

## Swish

$$
f(x) = x \cdot \mathrm{sigmoid}(x)
$$
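A hypothetical scalar sketch in plain Java (not DL4J's implementation). Unlike ReLU, swish is smooth and dips slightly below zero for negative inputs before returning toward 0.

```java
/** Hypothetical scalar sketch of swish; not the DL4J implementation. */
public class SwishSketch {

    /** Logistic sigmoid, 1 / (1 + exp(-x)). */
    public static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** f(x) = x * sigmoid(x). */
    public static double swish(double x) {
        return x * sigmoid(x);
    }

    public static void main(String[] args) {
        for (double x : new double[] {-5.0, -1.0, 0.0, 1.0, 5.0}) {
            System.out.printf("swish(%.1f) = %.4f%n", x, swish(x));
        }
    }
}
```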

## Softplus

$$
f(x) = \log(1 + e^{x})
$$
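Evaluating log(1 + e^x) literally overflows for large x, because `Math.exp(1000.0)` is infinite. The sketch below, with hypothetical names, contrasts that with the algebraically equivalent form max(x, 0) + log(1 + e^(-|x|)), a standard stable rewrite; it is not necessarily what DL4J does internally.

```java
/** Hypothetical scalar sketch of softplus; not the DL4J implementation. */
public class SoftplusSketch {

    /** Direct translation of log(1 + e^x); overflows for large x. */
    public static double softplusNaive(double x) {
        return Math.log(1.0 + Math.exp(x));
    }

    /** Equivalent form max(x, 0) + log1p(exp(-|x|)); exp never sees a positive argument. */
    public static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    public static void main(String[] args) {
        for (double x : new double[] {-20.0, 0.0, 20.0, 1000.0}) {
            System.out.printf("x=%.1f naive=%s stable=%s%n",
                    x, softplusNaive(x), softplus(x));
        }
    }
}
```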
