NN

CReLU

INDArray CReLU(INDArray x)

SDVariable CReLU(SDVariable x)
SDVariable CReLU(String name, SDVariable x)

Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.

  • x (NUMERIC) - Input variable
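
A minimal usage sketch (assuming the NN namespace is reachable through the ND4J accessor Nd4j.nn()), illustrating the doubled depth:

// CReLU concatenates relu(x) with relu(-x), so the feature dimension doubles.
INDArray x = Nd4j.rand(DataType.FLOAT, 2, 3).subi(0.5);       // values in [-0.5, 0.5)
INDArray out = Nd4j.nn().CReLU(x);
System.out.println(java.util.Arrays.toString(out.shape()));   // expected: [2, 6]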

batchNorm

INDArray batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int[] axis)

SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)
SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)

Neural network batch normalization operation.

For details, see https://arxiv.org/abs/1502.03167

  • input (NUMERIC) - Input variable.

  • mean (NUMERIC) - Mean value. For 1d axis, this should match input.size(axis)

  • variance (NUMERIC) - Variance value. For 1d axis, this should match input.size(axis)

  • gamma (NUMERIC) - Gamma value. For 1d axis, this should match input.size(axis)

  • beta (NUMERIC) - Beta value. For 1d axis, this should match input.size(axis)

  • epsilon - Epsilon constant for numerical stability (to avoid division by 0)

  • axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations.

    For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC

    For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
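
A short sketch for NCHW activations (axis = 1), assuming the Nd4j.nn() namespace accessor; the per-channel statistics and gamma/beta all have length input.size(1):

INDArray input    = Nd4j.rand(DataType.FLOAT, 8, 16, 28, 28);   // [minibatch, channels, height, width]
INDArray mean     = Nd4j.zeros(DataType.FLOAT, 16);             // one value per channel
INDArray variance = Nd4j.ones(DataType.FLOAT, 16);
INDArray gamma    = Nd4j.ones(DataType.FLOAT, 16);
INDArray beta     = Nd4j.zeros(DataType.FLOAT, 16);
INDArray out = Nd4j.nn().batchNorm(input, mean, variance, gamma, beta, 1e-5, new int[]{1});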

biasAdd

INDArray biasAdd(INDArray input, INDArray bias, boolean nchw)

SDVariable biasAdd(SDVariable input, SDVariable bias, boolean nchw)
SDVariable biasAdd(String name, SDVariable input, SDVariable bias, boolean nchw)

Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector

  • input (NUMERIC) - 4d input variable

  • bias (NUMERIC) - 1d bias

  • nchw - The format - nchw=true means [minibatch, channels, height, width] format; nchw=false means [minibatch, height, width, channels] format.

    Unused for 2d inputs
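
For example (a sketch assuming Nd4j.nn()), adding a per-channel bias to NCHW activations:

INDArray act  = Nd4j.rand(DataType.FLOAT, 4, 16, 8, 8);   // [minibatch, channels, height, width]
INDArray bias = Nd4j.rand(DataType.FLOAT, 16);            // one bias value per channel
INDArray out  = Nd4j.nn().biasAdd(act, bias, true);       // nchw = true for this layout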

dotProductAttention

INDArray dotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray mask, boolean scaled)

SDVariable dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
SDVariable dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)

This operation performs dot product attention on the given timeseries input with the given queries

out = sum(similarity(k_i, q) * v_i)

similarity(k, q) = softmax(k · q), where k · q is the dot product of k and q

Optionally with normalization step:

similarity(k, q) = softmax(k · q / sqrt(size(q)))

See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)

Note: This supports multiple queries at once. If only one query is available, the queries array still has to be rank 3, but can have queryCount = 1.

Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it twice.

Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays. Mixing them doesn't work. The

output rank will depend on the input rank.

  • queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]

    or 4D array of shape [batchSize, numHeads, featureKeys, queryCount]

  • keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]

    or 4D array of shape [batchSize, numHeads, featureKeys, timesteps]

  • values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]

    or 4D array of shape [batchSize, numHeads, featureValues, timesteps]

  • mask (NUMERIC) - OPTIONAL; array of shape [batchSize, timesteps] that defines which values should be skipped

  • scaled - normalization, false -> do not apply normalization, true -> apply normalization
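
A shape-oriented sketch (assuming Nd4j.nn(); the dimension sizes are illustrative), using keys as values and a mask of ones:

int batch = 2, featureKeys = 8, timesteps = 10, queryCount = 4;
INDArray queries = Nd4j.rand(DataType.FLOAT, batch, featureKeys, queryCount);
INDArray keys    = Nd4j.rand(DataType.FLOAT, batch, featureKeys, timesteps);
INDArray values  = keys;                                         // commonly the same array as keys
INDArray mask    = Nd4j.ones(DataType.FLOAT, batch, timesteps);  // all ones: no timesteps are skipped
INDArray out = Nd4j.nn().dotProductAttention(queries, keys, values, mask, true);
// out has shape [batch, featureValues, queryCount] = [2, 8, 4]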

dropout

INDArray dropout(INDArray input, double inputRetainProbability)

SDVariable dropout(SDVariable input, double inputRetainProbability)
SDVariable dropout(String name, SDVariable input, double inputRetainProbability)

Dropout operation

  • input (NUMERIC) - Input array

  • inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
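
For instance (a sketch assuming Nd4j.nn()), retaining each activation with probability 0.8:

INDArray in  = Nd4j.rand(DataType.FLOAT, 2, 10);
INDArray out = Nd4j.nn().dropout(in, 0.8);   // each value kept with probability 0.8, otherwise set to 0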

elu

INDArray elu(INDArray x)

SDVariable elu(SDVariable x)
SDVariable elu(String name, SDVariable x)

Element-wise exponential linear unit (ELU) function:

out = x if x > 0

out = a * (exp(x) - 1) if x <= 0

with constant a = 1.0

See: https://arxiv.org/abs/1511.07289

  • x (NUMERIC) - Input variable

gelu

INDArray gelu(INDArray x)

SDVariable gelu(SDVariable x)
SDVariable gelu(String name, SDVariable x)

GELU activation function - Gaussian Error Linear Units

For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415

This method uses the sigmoid approximation

  • x (NUMERIC) - Input variable

hardSigmoid

INDArray hardSigmoid(INDArray x)

SDVariable hardSigmoid(SDVariable x)
SDVariable hardSigmoid(String name, SDVariable x)

Element-wise hard sigmoid function:

out[i] = 0 if in[i] <= -2.5

out[i] = 0.2*in[i]+0.5 if -2.5 < in[i] < 2.5

out[i] = 1 if in[i] >= 2.5

  • x (NUMERIC) - Input variable

hardTanh

INDArray hardTanh(INDArray x)

SDVariable hardTanh(SDVariable x)
SDVariable hardTanh(String name, SDVariable x)

Element-wise hard tanh function:

out[i] = -1 if in[i] <= -1

out[i] = in[i] if -1 < in[i] < 1

out[i] = 1 if in[i] >= 1

  • x (NUMERIC) - Input variable

hardTanhDerivative

INDArray hardTanhDerivative(INDArray x)

SDVariable hardTanhDerivative(SDVariable x)
SDVariable hardTanhDerivative(String name, SDVariable x)

Derivative (dOut/dIn) of the element-wise hard Tanh function - hardTanh(INDArray)

  • x (NUMERIC) - Input variable

layerNorm

INDArray layerNorm(INDArray input, INDArray gain, INDArray bias, boolean channelsFirst, int[] dimensions)
INDArray layerNorm(INDArray input, INDArray gain, boolean channelsFirst, int[] dimensions)

SDVariable layerNorm(SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int[] dimensions)
SDVariable layerNorm(SDVariable input, SDVariable gain, boolean channelsFirst, int[] dimensions)
SDVariable layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int[] dimensions)
SDVariable layerNorm(String name, SDVariable input, SDVariable gain, boolean channelsFirst, int[] dimensions)

Apply Layer Normalization

y = gain * standardize(x) + bias

  • input (NUMERIC) - Input variable

  • gain (NUMERIC) - Gain

  • bias (NUMERIC) - Bias

  • channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data

  • dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
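
A sketch for 2D (MLP) activations, assuming Nd4j.nn(); normalization is over the feature dimension (dimension 1):

INDArray input = Nd4j.rand(DataType.FLOAT, 32, 100);   // [minibatch, features]
INDArray gain  = Nd4j.ones(DataType.FLOAT, 100);
INDArray bias  = Nd4j.zeros(DataType.FLOAT, 100);
INDArray out = Nd4j.nn().layerNorm(input, gain, bias, true, new int[]{1});   // channelsFirst is unused for 2D input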

leakyRelu

INDArray leakyRelu(INDArray x, double alpha)

SDVariable leakyRelu(SDVariable x, double alpha)
SDVariable leakyRelu(String name, SDVariable x, double alpha)

Element-wise leaky ReLU function:

out = x if x >= 0.0

out = alpha * x if x < 0.0

Alpha value is most commonly set to 0.01

  • x (NUMERIC) - Input variable

  • alpha - Slope for the negative part of the activation - commonly 0.01

leakyReluDerivative

INDArray leakyReluDerivative(INDArray x, double alpha)

SDVariable leakyReluDerivative(SDVariable x, double alpha)
SDVariable leakyReluDerivative(String name, SDVariable x, double alpha)

Leaky ReLU derivative: dOut/dIn given input.

  • x (NUMERIC) - Input variable

  • alpha - Slope for the negative part of the activation - commonly 0.01

linear

INDArray linear(INDArray input, INDArray weights, INDArray bias)

SDVariable linear(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable linear(String name, SDVariable input, SDVariable weights, SDVariable bias)

Linear layer operation: out = mmul(in,w) + bias

Note that the bias array is optional

  • input (NUMERIC) - Input data

  • weights (NUMERIC) - Weights variable, shape [nIn, nOut]

  • bias (NUMERIC) - Optional bias variable (may be null)
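
A SameDiff sketch of a single fully connected layer (assuming the sd.nn() namespace accessor; the variable names and sizes are illustrative):

SameDiff sd = SameDiff.create();
SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 784);     // [minibatch, nIn]
SDVariable w  = sd.var("w", Nd4j.rand(DataType.FLOAT, 784, 10));   // [nIn, nOut]
SDVariable b  = sd.var("b", Nd4j.zeros(DataType.FLOAT, 10));       // [nOut], optional (may be null)
SDVariable out = sd.nn().linear("out", in, w, b);                  // out = mmul(in, w) + b, shape [minibatch, nOut]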

logSigmoid

INDArray logSigmoid(INDArray x)

SDVariable logSigmoid(SDVariable x)
SDVariable logSigmoid(String name, SDVariable x)

Element-wise log-sigmoid function: out[i] = log(sigmoid(in[i]))

  • x (NUMERIC) - Input variable

logSoftmax

INDArray logSoftmax(INDArray x)

SDVariable logSoftmax(SDVariable x)
SDVariable logSoftmax(String name, SDVariable x)

Log softmax activation

  • x (NUMERIC) - Input

logSoftmax

INDArray logSoftmax(INDArray x, int dimension)

SDVariable logSoftmax(SDVariable x, int dimension)
SDVariable logSoftmax(String name, SDVariable x, int dimension)

Log softmax activation

  • x (NUMERIC) - Input

  • dimension - Dimension along which to apply log softmax

multiHeadDotProductAttention

INDArray multiHeadDotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray Wq, INDArray Wk, INDArray Wv, INDArray Wo, INDArray mask, boolean scaled)

SDVariable multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
SDVariable multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)

This performs multi-headed dot product attention on the given timeseries input

out = concat(head_1, head_2, ..., head_n) * Wo

head_i = dot_product_attention(Wq_i * q, Wk_i * k, Wv_i * v)

Optionally with normalization when calculating the attention for each head.

See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")

This makes use of dot_product_attention OP support for rank 4 inputs.

see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean)

  • queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]

  • keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]

  • values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]

  • Wq (NUMERIC) - input query projection weights of shape [numHeads, projectedKeys, featureKeys]

  • Wk (NUMERIC) - input key projection weights of shape [numHeads, projectedKeys, featureKeys]

  • Wv (NUMERIC) - input value projection weights of shape [numHeads, projectedValues, featureValues]

  • Wo (NUMERIC) - output projection weights of shape [numHeads * projectedValues, outSize]

  • mask (NUMERIC) - OPTIONAL; array of shape [batchSize, timesteps] that defines which values should be skipped

  • scaled - normalization, false -> do not apply normalization, true -> apply normalization
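
A shape-oriented sketch (assuming Nd4j.nn(); sizes are illustrative), with 4 heads that project 16 input features down to 8 per head:

int batch = 2, featureKeys = 16, featureValues = 16, timesteps = 10, queryCount = 10;
int numHeads = 4, projectedKeys = 8, projectedValues = 8, outSize = 16;
INDArray queries = Nd4j.rand(DataType.FLOAT, batch, featureKeys, queryCount);
INDArray keys    = Nd4j.rand(DataType.FLOAT, batch, featureKeys, timesteps);
INDArray values  = keys;                                       // commonly the same array as keys
INDArray Wq = Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys);
INDArray Wk = Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys);
INDArray Wv = Nd4j.rand(DataType.FLOAT, numHeads, projectedValues, featureValues);
INDArray Wo = Nd4j.rand(DataType.FLOAT, numHeads * projectedValues, outSize);
INDArray mask = Nd4j.ones(DataType.FLOAT, batch, timesteps);   // all ones: no timesteps are skipped
INDArray out = Nd4j.nn().multiHeadDotProductAttention(queries, keys, values, Wq, Wk, Wv, Wo, mask, true);
// out has shape [batch, outSize, queryCount]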

pad

INDArray pad(INDArray input, INDArray padding, PadMode PadMode, double constant)
INDArray pad(INDArray input, INDArray padding, double constant)

SDVariable pad(SDVariable input, SDVariable padding, PadMode PadMode, double constant)
SDVariable pad(SDVariable input, SDVariable padding, double constant)
SDVariable pad(String name, SDVariable input, SDVariable padding, PadMode PadMode, double constant)
SDVariable pad(String name, SDVariable input, SDVariable padding, double constant)

Padding operation

  • input (NUMERIC) - Input tensor

  • padding (NUMERIC) - Array of padding amounts for each dimension of the input

  • PadMode - Padding format - default = CONSTANT

  • constant - Padding constant
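
For example (a sketch assuming Nd4j.nn(), and assuming the padding array follows the usual [before, after] per-dimension layout):

INDArray in      = Nd4j.rand(DataType.FLOAT, 2, 3);
INDArray padding = Nd4j.createFromArray(new int[][]{{1, 1}, {1, 1}});   // pad 1 element on each side of both dimensions
INDArray out = Nd4j.nn().pad(in, padding, 0.0);                         // constant (zero) padding; result shape [4, 5]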

preciseGelu

INDArray preciseGelu(INDArray x)

SDVariable preciseGelu(SDVariable x)
SDVariable preciseGelu(String name, SDVariable x)

GELU activation function - Gaussian Error Linear Units

For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415

This method uses the precise formulation

  • x (NUMERIC) - Input variable

prelu

INDArray prelu(INDArray input, INDArray alpha, int[] sharedAxes)

SDVariable prelu(SDVariable input, SDVariable alpha, int[] sharedAxes)
SDVariable prelu(String name, SDVariable input, SDVariable alpha, int[] sharedAxes)

PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:

out[i] = in[i] if in[i] >= 0

out[i] = in[i] * alpha[i] otherwise

sharedAxes allows you to share learnable parameters along axes.

For example, if the input has shape [batchSize, channels, height, width]

and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an

alpha with shape [channels].

  • input (NUMERIC) - Input data

  • alpha (NUMERIC) - The cutoff variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha.

  • sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
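
A sketch of the per-channel case described above (assuming Nd4j.nn()):

INDArray input = Nd4j.rand(DataType.FLOAT, 4, 16, 8, 8).subi(0.5);   // [minibatch, channels, height, width]
INDArray alpha = Nd4j.ones(DataType.FLOAT, 16).muli(0.01);           // one learnable slope per channel
INDArray out = Nd4j.nn().prelu(input, alpha, new int[]{2, 3});       // share alpha across the spatial axes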

relu

INDArray relu(INDArray x, double cutoff)

SDVariable relu(SDVariable x, double cutoff)
SDVariable relu(String name, SDVariable x, double cutoff)

Element-wise rectified linear function with specified cutoff:

out[i] = in[i] if in[i] >= cutoff

out[i] = 0 otherwise

  • x (NUMERIC) - Input

  • cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0

relu6

INDArray relu6(INDArray x, double cutoff)

SDVariable relu6(SDVariable x, double cutoff)
SDVariable relu6(String name, SDVariable x, double cutoff)

Element-wise "rectified linear 6" function with specified cutoff:

out[i] = min(max(in[i], cutoff), 6)

  • x (NUMERIC) - Input

  • cutoff - Cutoff value for ReLU operation. Usually 0

reluLayer

INDArray reluLayer(INDArray input, INDArray weights, INDArray bias)

SDVariable reluLayer(SDVariable input, SDVariable weights, SDVariable bias)
SDVariable reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias)

ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)

Note that the bias array is optional

  • input (NUMERIC) - Input data

  • weights (NUMERIC) - Weights variable

  • bias (NUMERIC) - Optional bias variable (may be null)

selu

INDArray selu(INDArray x)

SDVariable selu(SDVariable x)
SDVariable selu(String name, SDVariable x)

Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks (https://arxiv.org/abs/1706.02515)

out[i] = scale * in[i] if in[i] > 0

out[i] = scale * alpha * (exp(in[i]) - 1) if in[i] <= 0

Uses the default scale and alpha values (scale ≈ 1.0507, alpha ≈ 1.6733).

  • x (NUMERIC) - Input variable

sigmoid

INDArray sigmoid(INDArray x)

SDVariable sigmoid(SDVariable x)
SDVariable sigmoid(String name, SDVariable x)

Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))

  • x (NUMERIC) - Input variable

sigmoidDerivative

INDArray sigmoidDerivative(INDArray x, INDArray wrt)

SDVariable sigmoidDerivative(SDVariable x, SDVariable wrt)
SDVariable sigmoidDerivative(String name, SDVariable x, SDVariable wrt)

Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut

  • x (NUMERIC) - Input Variable

  • wrt (NUMERIC) - Gradient at the output - dL/dOut. Must have same shape as the input

softmax

INDArray softmax(INDArray x, int dimension)
INDArray softmax(INDArray x)

SDVariable softmax(SDVariable x, int dimension)
SDVariable softmax(SDVariable x)
SDVariable softmax(String name, SDVariable x, int dimension)
SDVariable softmax(String name, SDVariable x)

Softmax activation, along the specified dimension

  • x (NUMERIC) - Input

  • dimension - Dimension along which to apply softmax - default = -1
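
For example (a sketch assuming Nd4j.nn()), converting per-row logits into class probabilities:

INDArray logits = Nd4j.rand(DataType.FLOAT, 4, 10);   // [minibatch, numClasses]
INDArray probs  = Nd4j.nn().softmax(logits, 1);       // softmax over dimension 1; each row sums to 1.0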

softmaxDerivative

INDArray softmaxDerivative(INDArray x, INDArray wrt, int dimension)

SDVariable softmaxDerivative(SDVariable x, SDVariable wrt, int dimension)
SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt, int dimension)

Softmax derivative function

  • x (NUMERIC) - Softmax input

  • wrt (NUMERIC) - Gradient at the output - dL/dOut

  • dimension - Softmax dimension

softplus

INDArray softplus(INDArray x)

SDVariable softplus(SDVariable x)
SDVariable softplus(String name, SDVariable x)

Element-wise softplus function: out = log(exp(x) + 1)

  • x (NUMERIC) - Input variable

softsign

INDArray softsign(INDArray x)

SDVariable softsign(SDVariable x)
SDVariable softsign(String name, SDVariable x)

Element-wise softsign function: out = x / (abs(x) + 1)

  • x (NUMERIC) - Input variable

softsignDerivative

INDArray softsignDerivative(INDArray x)

SDVariable softsignDerivative(SDVariable x)
SDVariable softsignDerivative(String name, SDVariable x)

Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)

  • x (NUMERIC) - Input variable

swish

INDArray swish(INDArray x)

SDVariable swish(SDVariable x)
SDVariable swish(String name, SDVariable x)

Element-wise "swish" function: out = x sigmoid(bx) with b=1.0

See: https://arxiv.org/abs/1710.05941

  • x (NUMERIC) - Input variable

tanh

INDArray tanh(INDArray x)

SDVariable tanh(SDVariable x)
SDVariable tanh(String name, SDVariable x)

Element-wise tanh (hyperbolic tangent) operation: out = tanh(x)

  • x (NUMERIC) - Input variable
