NN

CReLU

INDArray CReLU(INDArray x)

SDVariable CReLU(SDVariable x)
SDVariable CReLU(String name, SDVariable x)

Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.

  • x (NUMERIC) - Input variable
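
For example, a minimal SameDiff sketch of the depth doubling (assuming the op is reached via the sd.nn() namespace accessor; shapes are illustrative only):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class CReluExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; shapes are illustrative
        SameDiff sd = SameDiff.create();
        // 2 examples with 4 activation values each
        SDVariable x = sd.var("x", Nd4j.rand(DataType.FLOAT, 2, 4));
        // CReLU concatenates relu(x) with relu(-x), so the output holds
        // twice as many activation values as the input
        SDVariable out = sd.nn().CReLU("out", x);
        INDArray result = out.eval();
        System.out.println(java.util.Arrays.toString(result.shape()));
    }
}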

batchNorm

INDArray batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int[] axis)

SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)
SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)

Neural network batch normalization operation.

For details, see https://arxiv.org/abs/1502.03167

  • input (NUMERIC) - Input variable.

  • mean (NUMERIC) - Mean value. For 1d axis, this should match input.size(axis)

  • variance (NUMERIC) - Variance value. For 1d axis, this should match input.size(axis)

  • gamma (NUMERIC) - Gamma value. For 1d axis, this should match input.size(axis)

  • beta (NUMERIC) - Beta value. For 1d axis, this should match input.size(axis)

  • epsilon - Epsilon constant for numerical stability (to avoid division by 0)

  • axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations.

    For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC

    For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
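
As an illustration, a SameDiff sketch for NCHW activations (assuming the sd.nn() namespace accessor; the statistics below are placeholders rather than real running statistics):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class BatchNormExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; values are illustrative
        SameDiff sd = SameDiff.create();
        int channels = 3;
        // NCHW activations: [minibatch, channels, height, width]
        SDVariable input = sd.var("input", Nd4j.rand(DataType.FLOAT, 2, channels, 8, 8));
        // Per-channel statistics and scale/shift parameters, length = channels
        SDVariable mean = sd.var("mean", Nd4j.zeros(DataType.FLOAT, channels));
        SDVariable variance = sd.var("variance", Nd4j.ones(DataType.FLOAT, channels));
        SDVariable gamma = sd.var("gamma", Nd4j.ones(DataType.FLOAT, channels));
        SDVariable beta = sd.var("beta", Nd4j.zeros(DataType.FLOAT, channels));
        // axis = 1 because the channel dimension of NCHW data is dimension 1
        SDVariable out = sd.nn().batchNorm("out", input, mean, variance, gamma, beta, 1e-5, new int[]{1});
        System.out.println(java.util.Arrays.toString(out.eval().shape()));
    }
}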

biasAdd

Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector

  • input (NUMERIC) - 4d input variable

  • bias (NUMERIC) - 1d bias

  • nchw - The format - nchw=true means [minibatch, channels, height, width] format; nchw=false means [minibatch, height, width, channels] format.

    Unused for 2d inputs
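
A short SameDiff sketch (assuming an sd.nn().biasAdd(input, bias, nchw) method matching the parameters above; shapes are illustrative):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class BiasAddExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; shapes are illustrative
        SameDiff sd = SameDiff.create();
        // NCHW activations: [minibatch, channels, height, width]
        SDVariable input = sd.var("input", Nd4j.rand(DataType.FLOAT, 2, 3, 4, 4));
        // One bias value per channel
        SDVariable bias = sd.var("bias", Nd4j.createFromArray(0.1f, 0.2f, 0.3f));
        // nchw = true because the channel dimension comes before height/width
        SDVariable out = sd.nn().biasAdd("out", input, bias, true);
        System.out.println(java.util.Arrays.toString(out.eval().shape()));
    }
}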

dotProductAttention

This operation performs dot product attention on the given timeseries input with the given queries

out = sum(similarity(k_i, q) * v_i)

similarity(k, q) = softmax(k * q), where k * q is the dot product of k and q

Optionally with normalization step:

similarity(k, q) = softmax(k * q / sqrt(size(q)))

See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)

Note: This supports multiple queries at once; if only one query is available, the queries array still has to be 3D but can have queryCount = 1

Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it twice.

Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays. Mixing them doesn't work. The

output rank will depend on the input rank.

  • queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]

    or 4D array of shape [batchSize, numHeads, featureKeys, queryCount]

  • keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]

    or 4D array of shape [batchSize, numHeads, featureKeys, timesteps]

  • values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]

    or 4D array of shape [batchSize, numHeads, featureValues, timesteps]

  • mask (NUMERIC) - OPTIONAL; array of shape [batchSize, timesteps] that defines which values should be skipped

  • scaled - normalization, false -> do not apply normalization, true -> apply normalization
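
As a shape-focused sketch (assuming an sd.nn().dotProductAttention(queries, keys, values, mask, scaled) method matching the parameters above, with a null mask; all sizes are illustrative):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class DotProductAttentionExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; sizes are illustrative
        SameDiff sd = SameDiff.create();
        int batchSize = 2, featureKeys = 8, featureValues = 5, timesteps = 10, queryCount = 1;
        SDVariable queries = sd.var("queries", Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, queryCount));
        SDVariable keys = sd.var("keys", Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, timesteps));
        // keys and values are usually the same array; a separate array is used here for clarity
        SDVariable values = sd.var("values", Nd4j.rand(DataType.FLOAT, batchSize, featureValues, timesteps));
        // null mask = attend over all timesteps; scaled = true applies the 1/sqrt(size) normalization
        SDVariable out = sd.nn().dotProductAttention("out", queries, keys, values, null, true);
        // Expected output shape: [batchSize, featureValues, queryCount]
        System.out.println(java.util.Arrays.toString(out.eval().shape()));
    }
}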

dropout

Dropout operation

  • input (NUMERIC) - Input array

  • inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
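
A brief sketch (assuming the sd.nn() namespace accessor; with retain probability 0.5 roughly half of the activations are zeroed):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class DropoutExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; values are illustrative
        SameDiff sd = SameDiff.create();
        SDVariable input = sd.var("input", Nd4j.rand(DataType.FLOAT, 2, 10));
        // Retain each activation with probability 0.5 (i.e. zero it with probability 1 - 0.5)
        SDVariable out = sd.nn().dropout("out", input, 0.5);
        System.out.println(out.eval());
    }
}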

elu

Element-wise exponential linear unit (ELU) function:

out = x if x > 0

out = a * (exp(x) - 1) if x <= 0

with constant a = 1.0

See: https://arxiv.org/abs/1511.07289

  • x (NUMERIC) - Input variable

gelu

GELU activation function - Gaussian Error Linear Units

For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415

This method uses the sigmoid approximation

  • x (NUMERIC) - Input variable

hardSigmoid

Element-wise hard sigmoid function:

out[i] = 0 if in[i] <= -2.5

out[i] = 0.2*in[i]+0.5 if -2.5 < in[i] < 2.5

out[i] = 1 if in[i] >= 2.5

  • x (NUMERIC) - Input variable

hardTanh

Element-wise hard tanh function:

out[i] = -1 if in[i] <= -1

out[i] = in[i] if -1 < in[i] < 1

out[i] = 1 if in[i] >= 1

  • x (NUMERIC) - Input variable

hardTanhDerivative

Derivative (dOut/dIn) of the element-wise hard Tanh function - hardTanh(INDArray)

  • x (NUMERIC) - Input variable

layerNorm

Apply Layer Normalization

y = gain * standardize(x) + bias

  • input (NUMERIC) - Input variable

  • gain (NUMERIC) - Gain

  • bias (NUMERIC) - Bias

  • channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data

  • dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
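
For 2d/MLP activations, a minimal sketch (assuming an sd.nn().layerNorm(input, gain, bias, channelsFirst, dimensions) method matching the parameters above):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class LayerNormExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; sizes are illustrative
        SameDiff sd = SameDiff.create();
        int nIn = 4;
        // [minibatch, nIn] activations
        SDVariable input = sd.var("input", Nd4j.rand(DataType.FLOAT, 2, nIn));
        SDVariable gain = sd.var("gain", Nd4j.ones(DataType.FLOAT, nIn));
        SDVariable bias = sd.var("bias", Nd4j.zeros(DataType.FLOAT, nIn));
        // dimension 1 = the feature dimension for 2d/MLP data; channelsFirst is unused for 2d input
        SDVariable out = sd.nn().layerNorm("out", input, gain, bias, true, new int[]{1});
        System.out.println(out.eval());
    }
}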

leakyRelu

Element-wise leaky ReLU function:

out = x if x >= 0.0

out = alpha * x if x < 0.0

Alpha value is most commonly set to 0.01

  • x (NUMERIC) - Input variable

  • alpha - Slope for negative inputs - commonly 0.01

leakyReluDerivative

Leaky ReLU derivative: dOut/dIn given input.

  • x (NUMERIC) - Input variable

  • alpha - Slope for negative inputs - commonly 0.01

linear

Linear layer operation: out = mmul(in,w) + bias

Note that bias array is optional

  • input (NUMERIC) - Input data

  • weights (NUMERIC) - Weights variable, shape [nIn, nOut]

  • bias (NUMERIC) - Optional bias variable (may be null)
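
A minimal sketch of out = mmul(in,w) + bias (assuming the sd.nn() namespace accessor; sizes are illustrative):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class LinearExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; sizes are illustrative
        SameDiff sd = SameDiff.create();
        int nIn = 4, nOut = 3;
        // [minibatch, nIn] input and [nIn, nOut] weights
        SDVariable input = sd.var("input", Nd4j.rand(DataType.FLOAT, 2, nIn));
        SDVariable weights = sd.var("weights", Nd4j.rand(DataType.FLOAT, nIn, nOut));
        // bias may also be passed as null, since it is optional
        SDVariable bias = sd.var("bias", Nd4j.zeros(DataType.FLOAT, nOut));
        SDVariable out = sd.nn().linear("out", input, weights, bias);
        // Expected output shape: [minibatch, nOut]
        System.out.println(java.util.Arrays.toString(out.eval().shape()));
    }
}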

logSigmoid

Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))

  • x (NUMERIC) - Input variable

logSoftmax

Log softmax activation

  • x (NUMERIC) - Input

logSoftmax

Log softmax activation

  • x (NUMERIC) - Input

  • dimension - Dimension along which to apply log softmax

multiHeadDotProductAttention

This performs multi-headed dot product attention on the given timeseries input

out = concat(head_1, head_2, ..., head_n) * Wo

head_i = dot_product_attention(Wq_i * q, Wk_i * k, Wv_i * v)

Optionally with normalization when calculating the attention for each head.

See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")

This makes use of dot_product_attention OP support for rank 4 inputs.

see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean, boolean)

  • queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]

  • keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]

  • values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]

  • Wq (NUMERIC) - input query projection weights of shape [numHeads, projectedKeys, featureKeys]

  • Wk (NUMERIC) - input key projection weights of shape [numHeads, projectedKeys, featureKeys]

  • Wv (NUMERIC) - input value projection weights of shape [numHeads, projectedValues, featureValues]

  • Wo (NUMERIC) - output projection weights of shape [numHeads * projectedValues, outSize]

  • mask (NUMERIC) - OPTIONAL; array of shape [batchSize, timesteps] that defines which values should be skipped

  • scaled - normalization, false -> do not apply normalization, true -> apply normalization
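
A shape-focused sketch (assuming an sd.nn().multiHeadDotProductAttention(queries, keys, values, Wq, Wk, Wv, Wo, mask, scaled) method matching the parameter order above; all sizes are illustrative, and the projection weights would normally be trainable variables):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class MultiHeadAttentionExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; sizes are illustrative
        SameDiff sd = SameDiff.create();
        int batchSize = 2, featureKeys = 8, featureValues = 8, timesteps = 10, queryCount = 1;
        int numHeads = 4, projectedKeys = 4, projectedValues = 4, outSize = 8;
        SDVariable queries = sd.var("queries", Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, queryCount));
        SDVariable keys = sd.var("keys", Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, timesteps));
        SDVariable values = sd.var("values", Nd4j.rand(DataType.FLOAT, batchSize, featureValues, timesteps));
        // Per-head projection weights, shaped as described in the parameter list above
        SDVariable Wq = sd.var("Wq", Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys));
        SDVariable Wk = sd.var("Wk", Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys));
        SDVariable Wv = sd.var("Wv", Nd4j.rand(DataType.FLOAT, numHeads, projectedValues, featureValues));
        SDVariable Wo = sd.var("Wo", Nd4j.rand(DataType.FLOAT, numHeads * projectedValues, outSize));
        // null mask = attend over all timesteps; scaled = true applies the 1/sqrt(size) normalization
        SDVariable out = sd.nn().multiHeadDotProductAttention("out", queries, keys, values, Wq, Wk, Wv, Wo, null, true);
        System.out.println(java.util.Arrays.toString(out.eval().shape()));
    }
}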

pad

Padding operation

  • input (NUMERIC) - Input tensor

  • padding (NUMERIC) - Padding array specifying how much padding to apply to each dimension of the input

  • PadMode - Padding format - default = CONSTANT

  • constant - Padding constant

preciseGelu

GELU activation function - Gaussian Error Linear Units

For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415

This method uses the precise method

  • x (NUMERIC) - Input variable

prelu

PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:

out[i] = in[i] if in[i] >= 0

out[i] = in[i] * alpha[i] otherwise

sharedAxes allows you to share learnable parameters along axes.

For example, if the input has shape [batchSize, channels, height, width]

and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an

alpha with shape [channels].

  • input (NUMERIC) - Input data

  • alpha (NUMERIC) - The cutoff variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha.

  • sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
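
Following the example in the description, a sketch with one learnable alpha per channel (assuming an sd.nn().prelu(input, alpha, sharedAxes) method matching the parameters above):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class PReluExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; shapes/values are illustrative
        SameDiff sd = SameDiff.create();
        int channels = 3;
        // [batchSize, channels, height, width] input, shifted so some values are negative
        SDVariable input = sd.var("input", Nd4j.rand(DataType.FLOAT, 2, channels, 4, 4).subi(0.5));
        // One alpha per channel; height and width (axes 2 and 3) share the same alpha
        SDVariable alpha = sd.var("alpha", Nd4j.ones(DataType.FLOAT, channels).muli(0.1));
        SDVariable out = sd.nn().prelu("out", input, alpha, new int[]{2, 3});
        System.out.println(out.eval());
    }
}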

relu

Element-wise rectified linear function with specified cutoff:

out[i] = in[i] if in[i] >= cutoff

out[i] = 0 otherwise

  • x (NUMERIC) - Input

  • cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
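
A minimal eager ND4J sketch (assuming the NN namespace is reached via Nd4j.nn(); values are illustrative):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class ReluExample {
    public static void main(String[] args) {
        // Assumes the eager Nd4j.nn() namespace accessor
        INDArray x = Nd4j.createFromArray(-2.0f, -0.5f, 0.0f, 1.5f, 3.0f);
        // Cutoff 0.0: negative values become 0, non-negative values pass through
        INDArray out = Nd4j.nn().relu(x, 0.0);
        System.out.println(out); // values: 0, 0, 0, 1.5, 3.0
    }
}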

relu6

Element-wise "rectified linear 6" function with specified cutoff:

out[i] = min(max(in[i], cutoff), 6)

  • x (NUMERIC) - Input

  • cutoff - Cutoff value for ReLU operation. Usually 0

reluLayer

ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)

Note that bias array is optional

  • input (NUMERIC) - Input data

  • weights (NUMERIC) - Weights variable

  • bias (NUMERIC) - Optional bias variable (may be null)

selu

Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks

out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0

Uses default scale and alpha values.

  • x (NUMERIC) - Input variable

sigmoid

Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))

  • x (NUMERIC) - Input variable

sigmoidDerivative

Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut

  • x (NUMERIC) - Input Variable

  • wrt (NUMERIC) - Gradient at the output - dL/dOut. Must have same shape as the input

softmax

Softmax activation, along the specified dimension

  • x (NUMERIC) - Input

  • dimension - Dimension along which to apply softmax - default = -1
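
For example (assuming the sd.nn() namespace accessor; dimension 1 normalizes across the class scores of each row):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class SoftmaxExample {
    public static void main(String[] args) {
        // Assumes the generated sd.nn() namespace accessor; sizes are illustrative
        SameDiff sd = SameDiff.create();
        // [minibatch, numClasses] scores
        SDVariable logits = sd.var("logits", Nd4j.rand(DataType.FLOAT, 2, 5));
        // Normalize along dimension 1 so that each row sums to 1.0
        SDVariable probs = sd.nn().softmax("probs", logits, 1);
        INDArray out = probs.eval();
        System.out.println(out.sum(1)); // each row sums to ~1.0
    }
}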

softmaxDerivative

Softmax derivative function

  • x (NUMERIC) - Softmax input

  • wrt (NUMERIC) - Gradient at output, dL/dx

  • dimension - Softmax dimension

softplus

Element-wise softplus function: out = log(exp(x) + 1)

  • x (NUMERIC) - Input variable

softsign

Element-wise softsign function: out = x / (abs(x) + 1)

  • x (NUMERIC) - Input variable

softsignDerivative

Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)

  • x (NUMERIC) - Input variable

swish

Element-wise "swish" function: out = x _sigmoid(b_x) with b=1.0

See: https://arxiv.org/abs/1710.05941

  • x (NUMERIC) - Input variable

tanh

Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)

  • x (NUMERIC) - Input variable
