Activations

Non-linear functions that transform a neuron's output before it is passed to the next layer.

What are activations?

At a simple level, activation functions help decide whether a neuron should be activated, i.e. whether the information the neuron is receiving is relevant for the given input. The activation function is a non-linear transformation applied to the input signal, and the transformed output is passed on to the next neuron.
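
As a toy illustration (plain Java with made-up numbers, not the library's own classes), a single neuron computes a weighted sum of its inputs and then applies a non-linear activation, here the sigmoid listed further below, before passing the result on:

double[] inputs  = {0.5, -1.2, 3.0};   // hypothetical incoming signals
double[] weights = {0.4,  0.1, 0.7};   // hypothetical connection weights

// Weighted sum: the neuron's raw, linear pre-activation
double preActivation = 0.0;
for (int i = 0; i < inputs.length; i++) {
    preActivation += inputs[i] * weights[i];
}

// Non-linear transformation: sigmoid, f(x) = 1 / (1 + exp(-x))
double output = 1.0 / (1.0 + Math.exp(-preActivation));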

Usage

The recommended way to use activations is to add an activation layer to your neural network and configure the desired activation:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    // add hyperparameters here
    .graphBuilder()
    // add graph inputs and any earlier layers
    .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
    // add more layers and set the graph outputs
    .build();
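
Activations can also be set directly on an individual layer instead of adding a separate activation layer. The sketch below assumes a simple DenseLayer; the sizes and the choice of ReLU are illustrative only:

DenseLayer hidden = new DenseLayer.Builder()
    .nIn(784)                     // illustrative input size
    .nOut(128)                    // illustrative layer size
    .activation(Activation.RELU)  // activation applied to this layer's output
    .build();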

Available activations

ActivationRectifiedTanh

Rectified tanh

Essentially max(0, tanh(x))

Underlying implementation is in native code

ActivationELU

f(x) = alpha (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0

alpha defaults to 1, if not specified
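
A plain-Java sketch of the definition above, for illustration only (the library ships its own implementation):

// ELU: alpha * (exp(x) - 1) for x < 0, x for x >= 0; alpha defaults to 1
static double elu(double x, double alpha) {
    return x < 0 ? alpha * (Math.exp(x) - 1.0) : x;
}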

ActivationReLU

f(x) = max(0, x)

ActivationRationalTanh

Rational tanh approximation from https://arxiv.org/pdf/1508.01292v3

f(x) = 1.7159 tanh(2x/3), where tanh is approximated as follows: tanh(y) ~ sgn(y) {1 - 1/(1 + |y| + y^2 + 1.41645 y^4)}

Underlying implementation is in native code
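
A plain-Java sketch of the approximation above, for illustration only (the actual implementation is native code):

static double rationalTanh(double x) {
    double y = 2.0 * x / 3.0;                        // argument of the approximated tanh
    double denom = 1.0 + Math.abs(y) + y * y + 1.41645 * Math.pow(y, 4);
    double tanhApprox = Math.signum(y) * (1.0 - 1.0 / denom);
    return 1.7159 * tanhApprox;
}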

ActivationThresholdedReLU

Thresholded ReLU

f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0

ActivationReLU6

f(x) = min(max(x, cutoff), 6)

ActivationHardTanH

f(x) = 1 for x > 1; f(x) = -1 for x < -1; f(x) = x otherwise

ActivationSigmoid

f(x) = 1 / (1 + exp(-x))

ActivationGELU

GELU activation function (Gaussian Error Linear Unit): f(x) = x Φ(x), where Φ is the cumulative distribution function of the standard normal distribution

ActivationPReLU

Parametrized Rectified Linear Unit (PReLU)

f(x) = alpha x for x < 0, f(x) = x for x >= 0

alpha has the same shape as x and is a learned parameter.

ActivationIdentity

f(x) = x

ActivationSoftSign

f_i(x) = x_i / (1 + |x_i|)

ActivationHardSigmoid

f(x) = min(1, max(0, 0.2x + 0.5))

ActivationSoftmax

f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)
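
A plain-Java sketch of the formula above, for illustration only. Subtracting the maximum before exponentiating does not change the result but keeps exp from overflowing:

static double[] softmax(double[] x) {
    double shift = Double.NEGATIVE_INFINITY;
    for (double v : x) shift = Math.max(shift, v);   // shift = max_i(x_i)

    double[] out = new double[x.length];
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
        out[i] = Math.exp(x[i] - shift);
        sum += out[i];
    }
    for (int i = 0; i < x.length; i++) {
        out[i] /= sum;                               // normalize so the outputs sum to 1
    }
    return out;
}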

ActivationCube

f(x) = x^3

ActivationRReLU

f(x) = max(0,x) + alpha min(0, x)

alpha is drawn from uniform(l, u) during training and is fixed at (l + u)/2 at test time; l and u default to 1/8 and 1/3, respectively. See Empirical Evaluation of Rectified Activations in Convolutional Network.
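
A plain-Java sketch of the behaviour described above, for illustration only (sampling and the bounds l, u are handled by the library in practice):

static double rrelu(double x, boolean training, java.util.Random rng) {
    double l = 1.0 / 8.0, u = 1.0 / 3.0;             // documented defaults
    double alpha = training
            ? l + rng.nextDouble() * (u - l)         // drawn from uniform(l, u) during training
            : (l + u) / 2.0;                         // fixed at the average for testing
    return Math.max(0, x) + alpha * Math.min(0, x);
}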

ActivationTanH

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

ActivationSELU

Scaled Exponential Linear Unit (SELU), from https://arxiv.org/pdf/1706.02515.pdf

f(x) = lambda x for x > 0; f(x) = lambda alpha (exp(x) - 1) for x <= 0, with lambda ≈ 1.0507 and alpha ≈ 1.6733

ActivationLReLU

Leaky ReLU: f(x) = max(0, x) + alpha min(0, x). alpha defaults to 0.01

ActivationSwish

f(x) = x sigmoid(x)

ActivationSoftPlus

f(x) = log(1+e^x)

