Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.
x (NUMERIC) - Input variable
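A minimal usage sketch in SameDiff, assuming these ops are exposed through the sd.nn() namespace accessor (the class name and input values are illustrative, not part of the API):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class CReluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // One positive and one negative activation, shape [1, 2]
        SDVariable x = sd.constant(Nd4j.createFromArray(new float[][]{{1.5f, -2.0f}}));
        SDVariable out = sd.nn().CReLU(x);
        // Concatenating the positive and negative parts doubles the activation depth: shape [1, 4]
        System.out.println(out.eval());
    }
}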
Neural network batch normalization operation.
For details, see "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" - https://arxiv.org/abs/1502.03167
input (NUMERIC) - Input variable.
mean (NUMERIC) - Mean value. For 1d axis, this should match input.size(axis)
variance (NUMERIC) - Variance value. For 1d axis, this should match input.size(axis)
gamma (NUMERIC) - Gamma value. For 1d axis, this should match input.size(axis)
beta (NUMERIC) - Beta value. For 1d axis, this should match input.size(axis)
epsilon - Epsilon constant for numerical stability (to avoid division by 0)
axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations.
For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC
For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
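A sketch of inference-time batch normalization of NCHW activations, again assuming the sd.nn() accessor; the statistics and scale/shift values here are placeholders:

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class BatchNormExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // NCHW activations: [minibatch=2, channels=3, height=4, width=4]
        SDVariable in = sd.constant(Nd4j.rand(DataType.FLOAT, 2, 3, 4, 4));
        // Per-channel statistics and scale/shift, each of length input.size(1) = 3
        SDVariable mean = sd.constant(Nd4j.zeros(DataType.FLOAT, 3));
        SDVariable variance = sd.constant(Nd4j.ones(DataType.FLOAT, 3));
        SDVariable gamma = sd.constant(Nd4j.ones(DataType.FLOAT, 3));
        SDVariable beta = sd.constant(Nd4j.zeros(DataType.FLOAT, 3));
        // axis = 1: the channel dimension for NCHW activations
        SDVariable out = sd.nn().batchNorm(in, mean, variance, gamma, beta, 1e-5, new int[]{1});
        System.out.println(java.util.Arrays.toString(out.eval().shape())); // [2, 3, 4, 4]
    }
}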
Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector
input (NUMERIC) - 4d input variable
bias (NUMERIC) - 1d bias
nchw - The data format: nchw=true for [minibatch, channels, height, width]; nchw=false for [minibatch, height, width, channels]. Unused for 2d inputs
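A short sketch under the same sd.nn() accessor assumption, with illustrative shapes:

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class BiasAddExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // NCHW input: [minibatch=2, channels=3, height=4, width=4]
        SDVariable in = sd.constant(Nd4j.rand(DataType.FLOAT, 2, 3, 4, 4));
        SDVariable bias = sd.constant(Nd4j.createFromArray(0.1f, 0.2f, 0.3f)); // one bias per channel
        SDVariable out = sd.nn().biasAdd(in, bias, true); // nchw = true
        System.out.println(java.util.Arrays.toString(out.eval().shape())); // [2, 3, 4, 4]
    }
}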
This operation performs dot product attention on the given timeseries input with the given queries
out = sum(similarity(k_i, q) * v_i)
similarity(k, q) = softmax(k * q), where k * q is the dot product of k and q
Optionally with normalization step:
similarity(k, q) = softmax(k * q / sqrt(size(q)))
See also "Attention is all you need" (, p. 4, eq. 1)
Note: This supports multiple queries at once; if only one query is available, the queries vector still has to
be 3D but can have queryCount = 1
Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it for
both.
Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays. Mixing them doesn't work. The
output rank will depend on the input rank.
queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]
or 4D array of shape [batchSize, numHeads, featureKeys, queryCount]
keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]
or 4D array of shape [batchSize, numHeads, featureKeys, timesteps]
values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]
or 4D array of shape [batchSize, numHeads, featureValues, timesteps]
mask (NUMERIC) - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps]
scaled - true to apply the sqrt(size(q)) normalization step, false to skip it
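A sketch with illustrative shapes, assuming the sd.nn() accessor; the all-ones mask attends to every timestep:

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class DotProductAttentionExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        int batch = 1, featureKeys = 4, featureValues = 3, timesteps = 5, queryCount = 2;
        SDVariable q = sd.constant(Nd4j.rand(DataType.FLOAT, batch, featureKeys, queryCount));
        SDVariable k = sd.constant(Nd4j.rand(DataType.FLOAT, batch, featureKeys, timesteps));
        SDVariable v = sd.constant(Nd4j.rand(DataType.FLOAT, batch, featureValues, timesteps));
        SDVariable mask = sd.constant(Nd4j.ones(DataType.FLOAT, batch, timesteps));
        SDVariable out = sd.nn().dotProductAttention(q, k, v, mask, true); // scaled = true
        // Expected output shape: [batchSize, featureValues, queryCount]
        System.out.println(java.util.Arrays.toString(out.eval().shape())); // expected [1, 3, 2]
    }
}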
Dropout operation
input (NUMERIC) - Input array
inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
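A minimal sketch under the same sd.nn() assumption; the op is applied as-is when executed, so some printed values will be zeroed:

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class DropoutExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.constant(Nd4j.ones(DataType.FLOAT, 2, 5));
        // Retain each activation with probability 0.8 (i.e. zero it with probability 0.2)
        SDVariable out = sd.nn().dropout(in, 0.8);
        System.out.println(out.eval());
    }
}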
Element-wise exponential linear unit (ELU) function:
out = x if x > 0
out = a * (exp(x) - 1) if x <= 0
with constant a = 1.0
See: "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)" - https://arxiv.org/abs/1511.07289
x (NUMERIC) - Input variable
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the sigmoid approximation
x (NUMERIC) - Input variable
Element-wise hard sigmoid function:
out[i] = 0 if in[i] <= -2.5
out[i] = 0.2*in[i]+0.5 if -2.5 < in[i] < 2.5
out[i] = 1 if in[i] >= 2.5
x (NUMERIC) - Input variable
Element-wise hard tanh function:
out[i] = -1 if in[i] <= -1
out[i] = in[i] if -1 < in[i] < 1
out[i] = 1 if in[i] >= 1
x (NUMERIC) - Input variable
Derivative (dOut/dIn) of the element-wise hard Tanh function - hardTanh(INDArray)
x (NUMERIC) - Input variable
Apply Layer Normalization
y = gain * standardize(x) + bias
input (NUMERIC) - Input variable
gain (NUMERIC) - Gain
bias (NUMERIC) - Bias
channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
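A sketch for 2D/MLP activations (channelsFirst is unused for 2D input, per the note above; sd.nn() accessor assumed):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class LayerNormExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // 2D MLP activations: [minibatch=2, layerSize=4]
        SDVariable in = sd.constant(Nd4j.rand(DataType.FLOAT, 2, 4));
        SDVariable gain = sd.constant(Nd4j.ones(DataType.FLOAT, 4));
        SDVariable bias = sd.constant(Nd4j.zeros(DataType.FLOAT, 4));
        // Normalize over dimension 1 (the layer size)
        SDVariable out = sd.nn().layerNorm(in, gain, bias, true, new int[]{1});
        System.out.println(out.eval());
    }
}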
Element-wise leaky ReLU function:
out = x if x >= 0.0
out = alpha * x if x < 0.0
Alpha value is most commonly set to 0.01
x (NUMERIC) - Input variable
alpha - Scaling factor (slope) for negative inputs - commonly 0.01
Leaky ReLU derivative: dOut/dIn given input.
x (NUMERIC) - Input variable
alpha - Scaling factor (slope) for negative inputs - commonly 0.01
Linear layer operation: out = mmul(in,w) + bias
Note that bias array is optional
input (NUMERIC) - Input data
weights (NUMERIC) - Weights variable, shape [nIn, nOut]
bias (NUMERIC) - Optional bias variable (may be null)
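A short sketch of a fully-connected forward pass (sd.nn() accessor assumed; shapes are illustrative):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class LinearExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        int nIn = 4, nOut = 3;
        SDVariable in = sd.constant(Nd4j.rand(DataType.FLOAT, 2, nIn));   // [minibatch, nIn]
        SDVariable w = sd.var("w", Nd4j.rand(DataType.FLOAT, nIn, nOut)); // [nIn, nOut]
        SDVariable b = sd.var("b", Nd4j.zeros(DataType.FLOAT, nOut));     // [nOut]
        SDVariable out = sd.nn().linear(in, w, b); // out = mmul(in, w) + b
        System.out.println(java.util.Arrays.toString(out.eval().shape())); // [2, 3]
    }
}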
Element-wise log-sigmoid function: out[i] = log(sigmoid(in[i]))
x (NUMERIC) - Input variable
Log softmax activation
x (NUMERIC) - Input
Log softmax activation
x (NUMERIC) - Input
dimension - Dimension along which to apply log softmax
This performs multi-headed dot product attention on the given timeseries input
out = concat(head_1, head_2, ..., head_n) * Wo
head_i = dot_product_attention(Wq_i * q, Wk_i * k, Wv_i * v)
Optionally with normalization when calculating the attention for each head.
See also "Attention is all you need" (, pp. 4,5, "3.2.2 Multi-Head Attention")
This makes use of dot_product_attention OP support for rank 4 inputs.
see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean)
queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]
keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]
values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]
Wq (NUMERIC) - input query projection weights of shape [numHeads, projectedKeys, featureKeys]
Wk (NUMERIC) - input key projection weights of shape [numHeads, projectedKeys, featureKeys]
Wv (NUMERIC) - input value projection weights of shape [numHeads, projectedValues, featureValues]
Wo (NUMERIC) - output projection weights of shape [numHeads * projectedValues, outSize]
mask (NUMERIC) - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps]
scaled - true to apply the sqrt(size(q)) normalization step, false to skip it
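A sketch wiring up all the projection weights with consistent illustrative shapes (sd.nn() accessor assumed):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class MultiHeadAttentionExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        int batch = 1, featureKeys = 4, featureValues = 4, timesteps = 5, queryCount = 2;
        int numHeads = 2, projectedKeys = 3, projectedValues = 3, outSize = 4;
        SDVariable q = sd.constant(Nd4j.rand(DataType.FLOAT, batch, featureKeys, queryCount));
        SDVariable k = sd.constant(Nd4j.rand(DataType.FLOAT, batch, featureKeys, timesteps));
        SDVariable v = sd.constant(Nd4j.rand(DataType.FLOAT, batch, featureValues, timesteps));
        SDVariable Wq = sd.var("Wq", Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys));
        SDVariable Wk = sd.var("Wk", Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys));
        SDVariable Wv = sd.var("Wv", Nd4j.rand(DataType.FLOAT, numHeads, projectedValues, featureValues));
        SDVariable Wo = sd.var("Wo", Nd4j.rand(DataType.FLOAT, numHeads * projectedValues, outSize));
        SDVariable mask = sd.constant(Nd4j.ones(DataType.FLOAT, batch, timesteps));
        SDVariable out = sd.nn().multiHeadDotProductAttention(q, k, v, Wq, Wk, Wv, Wo, mask, true);
        // Expected output shape: [batchSize, outSize, queryCount]
        System.out.println(java.util.Arrays.toString(out.eval().shape())); // expected [1, 4, 2]
    }
}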
Padding operation
input (NUMERIC) - Input tensor
padding (NUMERIC) - Padding amounts; one [before, after] pair per dimension of the input, i.e. shape [rank(input), 2]
PadMode - Padding format - default = CONSTANT
constant - Padding constant value (used when PadMode = CONSTANT)
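A sketch using the default CONSTANT mode overload (sd.nn() accessor assumed):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class PadExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.constant(Nd4j.rand(DataType.FLOAT, 2, 3));
        // One [before, after] pair per dimension: 1 row before/after, 2 columns before/after
        SDVariable padding = sd.constant(Nd4j.createFromArray(new int[][]{{1, 1}, {2, 2}}));
        SDVariable out = sd.nn().pad(in, padding, 0.0); // constant padding with 0.0
        System.out.println(java.util.Arrays.toString(out.eval().shape())); // [4, 7]
    }
}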
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the precise (non-approximate) formulation
x (NUMERIC) - Input variable
PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:
out[i] = in[i] if in[i] >= 0
out[i] = in[i] * alpha[i] otherwise
sharedAxes allows you to share learnable parameters along axes.
For example, if the input has shape [batchSize, channels, height, width]
and you want each channel to have its own alpha, use sharedAxes = [2, 3] and an
alpha with shape [channels].
input (NUMERIC) - Input data
alpha (NUMERIC) - The alpha (negative-slope) variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha.
sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
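A sketch of per-channel alphas shared across the spatial axes, following the example above (sd.nn() accessor assumed):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class PreluExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // Input: [minibatch=2, channels=3, height=4, width=4], shifted to include negative values
        SDVariable in = sd.constant(Nd4j.rand(DataType.FLOAT, 2, 3, 4, 4).subi(0.5));
        // One learnable alpha per channel, shared across spatial axes 2 and 3
        SDVariable alpha = sd.var("alpha", Nd4j.createFromArray(0.01f, 0.05f, 0.1f));
        SDVariable out = sd.nn().prelu(in, alpha, new int[]{2, 3});
        System.out.println(java.util.Arrays.toString(out.eval().shape())); // [2, 3, 4, 4]
    }
}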
Element-wise rectified linear function with specified cutoff:
out[i] = in[i] if in[i] >= cutoff
out[i] = 0 otherwise
x (NUMERIC) - Input
cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
Element-wise "rectified linear 6" function with specified cutoff:
out[i] = min(max(in[i], cutoff), 6)
x (NUMERIC) - Input
cutoff - Cutoff value for ReLU operation. Usually 0
ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)
Note that bias array is optional
input (NUMERIC) - Input data
weights (NUMERIC) - Weights variable
bias (NUMERIC) - Optional bias variable (may be null)
Element-wise SeLU function - Scaled exponential Linear Unit: see "Self-Normalizing Neural Networks" - https://arxiv.org/abs/1706.02515
out[i] = scale * in[i] if in[i] > 0
out[i] = scale * alpha * (exp(in[i]) - 1) if in[i] <= 0
Uses default scale and alpha values.
x (NUMERIC) - Input variable
Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
x (NUMERIC) - Input variable
Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
x (NUMERIC) - Input Variable
wrt (NUMERIC) - Gradient at the output - dL/dOut. Must have same shape as the input
Softmax activation, along the specified dimension
x (NUMERIC) - Input
dimension - Dimension along which to apply softmax - default = -1
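A short sketch showing the effect of the dimension argument (sd.nn() accessor assumed):

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

public class SoftmaxExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.constant(Nd4j.createFromArray(new float[][]{{1f, 2f, 3f}, {1f, 1f, 1f}}));
        // dimension = 1: softmax over each row, so each row sums to 1.0
        SDVariable out = sd.nn().softmax(x, 1);
        System.out.println(out.eval()); // second row -> [0.3333, 0.3333, 0.3333]
    }
}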
Softmax derivative function
x (NUMERIC) - Softmax input
wrt (NUMERIC) - Gradient at the output - dL/dOut
dimension - Softmax dimension
Element-wise softplus function: out = log(exp(x) + 1)
x (NUMERIC) - Input variable
Element-wise softsign function: out = x / (abs(x) + 1)
x (NUMERIC) - Input variable
Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)
x (NUMERIC) - Input variable
Element-wise "swish" function: out = x _sigmoid(b_x) with b=1.0
See:
x (NUMERIC) - Input variable
Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)
x (NUMERIC) - Input variable
Method signatures:

INDArray CReLU(INDArray x)
SDVariable CReLU(SDVariable x)
SDVariable CReLU(String name, SDVariable x)
INDArray batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int[] axis)
SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)
SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)

INDArray biasAdd(INDArray input, INDArray bias, boolean nchw)
SDVariable biasAdd(SDVariable input, SDVariable bias, boolean nchw)
SDVariable biasAdd(String name, SDVariable input, SDVariable bias, boolean nchw)

INDArray dotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray mask, boolean scaled)
SDVariable dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
SDVariable dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)

INDArray dropout(INDArray input, double inputRetainProbability)
SDVariable dropout(SDVariable input, double inputRetainProbability)
SDVariable dropout(String name, SDVariable input, double inputRetainProbability)

INDArray elu(INDArray x)
SDVariable elu(SDVariable x)
SDVariable elu(String name, SDVariable x)

INDArray gelu(INDArray x)
SDVariable gelu(SDVariable x)
SDVariable gelu(String name, SDVariable x)

INDArray hardSigmoid(INDArray x)
SDVariable hardSigmoid(SDVariable x)
SDVariable hardSigmoid(String name, SDVariable x)

INDArray hardTanh(INDArray x)
SDVariable hardTanh(SDVariable x)
SDVariable hardTanh(String name, SDVariable x)

INDArray hardTanhDerivative(INDArray x)
SDVariable hardTanhDerivative(SDVariable x)
SDVariable hardTanhDerivative(String name, SDVariable x)

INDArray layerNorm(INDArray input, INDArray gain, INDArray bias, boolean channelsFirst, int[] dimensions)
INDArray layerNorm(INDArray input, INDArray gain, boolean channelsFirst, int[] dimensions)
SDVariable layerNorm(SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int[] dimensions)
SDVariable layerNorm(SDVariable input, SDVariable gain, boolean channelsFirst, int[] dimensions)
SDVariable layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int[] dimensions)
SDVariable layerNorm(String name, SDVariable input, SDVariable gain, boolean channelsFirst, int[] dimensions)

INDArray leakyRelu(INDArray x, double alpha)
SDVariable leakyRelu(SDVariable x, double alpha)
SDVariable leakyRelu(String name, SDVariable x, double alpha)

INDArray leakyReluDerivative(INDArray x, double alpha)
SDVariable leakyReluDerivative(SDVariable x, double alpha)
SDVariable leakyReluDerivative(String name, SDVariable x, double alpha)

INDArray linear(INDArray input, INDArray weights, INDArray bias)
SDVariable linear(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable linear(String name, SDVariable input, SDVariable weights, SDVariable bias)

INDArray logSigmoid(INDArray x)
SDVariable logSigmoid(SDVariable x)
SDVariable logSigmoid(String name, SDVariable x)

INDArray logSoftmax(INDArray x)
SDVariable logSoftmax(SDVariable x)
SDVariable logSoftmax(String name, SDVariable x)

INDArray logSoftmax(INDArray x, int dimension)
SDVariable logSoftmax(SDVariable x, int dimension)
SDVariable logSoftmax(String name, SDVariable x, int dimension)

INDArray multiHeadDotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray Wq, INDArray Wk, INDArray Wv, INDArray Wo, INDArray mask, boolean scaled)
SDVariable multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
SDVariable multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)

INDArray pad(INDArray input, INDArray padding, PadMode PadMode, double constant)
INDArray pad(INDArray input, INDArray padding, double constant)
SDVariable pad(SDVariable input, SDVariable padding, PadMode PadMode, double constant)
SDVariable pad(SDVariable input, SDVariable padding, double constant)
SDVariable pad(String name, SDVariable input, SDVariable padding, PadMode PadMode, double constant)
SDVariable pad(String name, SDVariable input, SDVariable padding, double constant)

INDArray preciseGelu(INDArray x)
SDVariable preciseGelu(SDVariable x)
SDVariable preciseGelu(String name, SDVariable x)

INDArray prelu(INDArray input, INDArray alpha, int[] sharedAxes)
SDVariable prelu(SDVariable input, SDVariable alpha, int[] sharedAxes)
SDVariable prelu(String name, SDVariable input, SDVariable alpha, int[] sharedAxes)

INDArray relu(INDArray x, double cutoff)
SDVariable relu(SDVariable x, double cutoff)
SDVariable relu(String name, SDVariable x, double cutoff)

INDArray relu6(INDArray x, double cutoff)
SDVariable relu6(SDVariable x, double cutoff)
SDVariable relu6(String name, SDVariable x, double cutoff)

INDArray reluLayer(INDArray input, INDArray weights, INDArray bias)
SDVariable reluLayer(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias)

INDArray selu(INDArray x)
SDVariable selu(SDVariable x)
SDVariable selu(String name, SDVariable x)

INDArray sigmoid(INDArray x)
SDVariable sigmoid(SDVariable x)
SDVariable sigmoid(String name, SDVariable x)

INDArray sigmoidDerivative(INDArray x, INDArray wrt)
SDVariable sigmoidDerivative(SDVariable x, SDVariable wrt)
SDVariable sigmoidDerivative(String name, SDVariable x, SDVariable wrt)

INDArray softmax(INDArray x, int dimension)
INDArray softmax(INDArray x)
SDVariable softmax(SDVariable x, int dimension)
SDVariable softmax(SDVariable x)
SDVariable softmax(String name, SDVariable x, int dimension)
SDVariable softmax(String name, SDVariable x)

INDArray softmaxDerivative(INDArray x, INDArray wrt, int dimension)
SDVariable softmaxDerivative(SDVariable x, SDVariable wrt, int dimension)
SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt, int dimension)

INDArray softplus(INDArray x)
SDVariable softplus(SDVariable x)
SDVariable softplus(String name, SDVariable x)

INDArray softsign(INDArray x)
SDVariable softsign(SDVariable x)
SDVariable softsign(String name, SDVariable x)

INDArray softsignDerivative(INDArray x)
SDVariable softsignDerivative(SDVariable x)
SDVariable softsignDerivative(String name, SDVariable x)

INDArray swish(INDArray x)
SDVariable swish(SDVariable x)
SDVariable swish(String name, SDVariable x)

INDArray tanh(INDArray x)
SDVariable tanh(SDVariable x)
SDVariable tanh(String name, SDVariable x)