Reference

Activations

Non-linear functions that determine the output of neurons.

What are activations?

At a simple level, activation functions help decide whether a neuron should be activated, i.e. whether the information the neuron is receiving is relevant for the given input. The activation function is a non-linear transformation applied to the input signal, and the transformed output is sent to the next neuron.

Usage

The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    // add hyperparameters here
    .graphBuilder()
    // add inputs and any other layers
    .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
    // add more layers and outputs
    .build();

Available activations

ActivationRectifiedTanh

[source]

Rectified tanh

Essentially max(0, tanh(x))

Underlying implementation is in native code

ActivationELU

[source]

f(x) = alpha * (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0

alpha defaults to 1, if not specified

ActivationReLU

[source]

f(x) = max(0, x)

ActivationRationalTanh

[source]

Rational tanh approximation From https://arxiv.org/pdf/1508.01292v3

f(x) = 1.7159 tanh(2x/3), where tanh is approximated as: tanh(y) ~ sgn(y) * (1 - 1/(1 + |y| + y^2 + 1.41645 y^4))

Underlying implementation is in native code

ActivationThresholdedReLU

[source]

Thresholded RELU

f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0

ActivationReLU6

[source]

f(x) = min(max(input, cutoff), 6)

ActivationHardTanH

[source]

          ⎧  1, if x >  1
 f(x) =   ⎨ -1, if x < -1
          ⎩  x, otherwise

ActivationSigmoid

[source]

f(x) = 1 / (1 + exp(-x))

ActivationGELU

[source]

GELU activation function - Gaussian Error Linear Units

ActivationPReLU

[source]

Parametrized Rectified Linear Unit (PReLU)

f(x) = alpha x for x < 0, f(x) = x for x >= 0

alpha has the same shape as x and is a learned parameter.

ActivationIdentity

[source]

f(x) = x

ActivationSoftSign

[source]

f_i(x) = x_i / (1 + |x_i|)

ActivationHardSigmoid

[source]

f(x) = min(1, max(0, 0.2x + 0.5))

ActivationSoftmax

[source]

f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)

ActivationCube

[source]

f(x) = x^3

ActivationRReLU

[source]

f(x) = max(0,x) + alpha min(0, x)

alpha is drawn from uniform(l, u) during training and is set to (l + u)/2 during testing. l and u default to 1/8 and 1/3 respectively.

Empirical Evaluation of Rectified Activations in Convolutional Network

ActivationTanH

[source]

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

ActivationSELU

[source]

Scaled Exponential Linear Unit (SELU): https://arxiv.org/pdf/1706.02515.pdf

ActivationLReLU

[source]

Leaky ReLU: f(x) = max(0, x) + alpha * min(0, x). alpha defaults to 0.01.

ActivationSwish

[source]

f(x) = x sigmoid(x)

ActivationSoftPlus

[source]

f(x) = log(1+e^x)
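Besides using a dedicated ActivationLayer as shown above, most layers accept an activation directly on their builder; a minimal sketch (layer sizes are illustrative):

// configure an activation as part of any layer, rather than as a separate layer
DenseLayer layer = new DenseLayer.Builder()
    .nIn(128).nOut(64)
    .activation(Activation.RELU) // any of the activations listed above
    .build();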

Auto Encoders

What are autoencoders?

Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.

Where’s Restricted Boltzmann Machine?

RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.

Supported layers

AutoEncoder

[source]

Autoencoder layer. Adds noise to the input and learns a reconstruction function.

corruptionLevel

public Builder corruptionLevel(double corruptionLevel)

Level of corruption - 0.0 (none) to 1.0 (all values corrupted)

sparsity

public Builder sparsity(double sparsity)

Autoencoder sparsity parameter

  • param sparsity Sparsity
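Putting these together, a minimal configuration sketch (layer sizes and values are illustrative):

AutoEncoder autoEncoder = new AutoEncoder.Builder()
    .nIn(784)
    .nOut(250)
    .corruptionLevel(0.3) // corrupt 30% of the input values
    .sparsity(0.1)
    .build();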

VariationalAutoencoder

[source]

Variational Autoencoder layer

See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes - https://arxiv.org/abs/1312.6114

This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.

A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.

encoderLayerSizes

public Builder encoderLayerSizes(int... encoderLayerSizes)

Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers (set via decoderLayerSizes(int...)) is similar to the encoder layers.

setEncoderLayerSizes

public void setEncoderLayerSizes(int... encoderLayerSizes)

Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers (set via decoderLayerSizes(int...)) is similar to the encoder layers.

  • param encoderLayerSizes Size of each encoder layer in the variational autoencoder

decoderLayerSizes

public Builder decoderLayerSizes(int... decoderLayerSizes)

Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int...)).

  • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

setDecoderLayerSizes

public void setDecoderLayerSizes(int... decoderLayerSizes)

Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int...)).

  • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

reconstructionDistribution

public Builder reconstructionDistribution(ReconstructionDistribution distribution)

The reconstruction distribution for the data given the hidden state - i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example:

  • GaussianReconstructionDistribution + {identity or tanh} for real-valued (Gaussian) data

  • BernoulliReconstructionDistribution + sigmoid for binary-valued (0 or 1) data

  • param distribution Reconstruction distribution

lossFunction

public Builder lossFunction(IActivation outputActivationFn, LossFunctions.LossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution.

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

lossFunction

public Builder lossFunction(Activation outputActivationFn, LossFunctions.LossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution.

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

lossFunction

public Builder lossFunction(IActivation outputActivationFn, ILossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution.

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

pzxActivationFn

public Builder pzxActivationFn(IActivation activationFunction)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

  • param activationFunction Activation function for p(z| x)

pzxActivationFunction

public Builder pzxActivationFunction(Activation activation)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

  • param activation Activation function for p(z | x)

nOut

public Builder nOut(int nOut)

Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.

  • param nOut Size of P(Z | data) and output size

numSamples

public Builder numSamples(int numSamples)

Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.

This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”

  • param numSamples Number of samples per data point for pretraining
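Combining the builder methods above, a minimal VariationalAutoencoder sketch (all sizes are illustrative):

VariationalAutoencoder vae = new VariationalAutoencoder.Builder()
    .nIn(784)                    // input size
    .nOut(32)                    // size of the latent state Z
    .encoderLayerSizes(256, 256)
    .decoderLayerSizes(256, 256)
    .pzxActivationFunction(Activation.IDENTITY)
    .reconstructionDistribution(new BernoulliReconstructionDistribution(Activation.SIGMOID))
    .numSamples(1)               // parameter L from Kingma & Welling
    .build();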

Model Zoo

Prebuilt model architectures and weights for out-of-the-box application.

Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.

If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-zoo</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

Getting started

Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel abstract class and uses the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.

Initializing fresh configurations

You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:

import org.deeplearning4j.zoo.model.AlexNet;
import org.deeplearning4j.zoo.*;

...

int numberOfClassesInYourData = 1000;
int randomSeed = 123;

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
Model net = zooModel.init();

If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
MultiLayerConfiguration net = ((AlexNet) zooModel).conf();

Initializing pretrained weights

Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enumerator that outlines different weight types, which includes IMAGENET, MNIST, CIFAR10, and VGGFACE.

For example, you can initialize a VGG-16 model with ImageNet weights like so:

import org.deeplearning4j.zoo.model.VGG16;
import org.deeplearning4j.zoo.*;

...

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.IMAGENET);

And initialize another VGG16 model with weights trained on VGGFace:

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.VGGFACE);

If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.
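For example (a short sketch):

ZooModel zooModel = VGG16.builder().build();
if (zooModel.pretrainedAvailable(PretrainedType.IMAGENET)) {
    Model net = zooModel.initPretrained(PretrainedType.IMAGENET);
}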

Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.

What's in the zoo?

The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.

You can find a complete list of models in the deeplearning4j-zoo repository on GitHub.

This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.

  • AlexNet

  • Darknet19

  • FaceNetNN4Small2

  • InceptionResNetV1

  • LeNet

  • ResNet50

  • SimpleCNN

  • TextGenerationLSTM

  • TinyYOLO

  • VGG16

  • VGG19

Advanced usage

The zoo comes with a couple of additional features if you're looking to use the models for different use cases.

Changing Inputs

Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape().

NOTE: this applies to fresh configurations only, and will not affect pretrained models:

int numberOfClassesInYourData = 10;
int randomSeed = 123;

ZooModel zooModel = ResNet50.builder()
        .numClasses(numberOfClassesInYourData)
        .seed(randomSeed)
        .build();
zooModel.setInputShape(new int[][]{{3, 28, 28}});

Transfer Learning

Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J here.

Workspaces

Initialization methods often have an additional parameter named workspaceMode. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see this section.

Zoo Models

Available models

AlexNet

AlexNet

Dl4j's AlexNet model interpretation based on the original paper ImageNet Classification with Deep Convolutional Neural Networks (http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) and the referenced imagenetExample code (https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt).

The model is built in DL4J based on available functionality, and notes indicate where there are gaps waiting for enhancements.

Bias initialization in the paper is 1 in certain layers but 0.1 in the imagenetExample code. The weight distribution uses 0.1 std for all layers in the paper but 0.005 in the dense layers in the imagenetExample code.

Darknet19

Darknet19. Reference: https://arxiv.org/pdf/1612.08242.pdf. ImageNet weights for this model are available and have been converted from https://pjreddie.com/darknet/imagenet/ using https://github.com/allanzelener/YAD2K.

There are 2 pretrained models, one for 224x224 images and one fine-tuned for 448x448 images. Call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization. The channels of the input images need to be in RGB order (not BGR), with values normalized within [0, 1]. The output labels are as per https://github.com/pjreddie/darknet/blob/master/data/imagenet.shortnames.list.

FaceNetNN4Small2

A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832. Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf

InceptionResNetV1

A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832. Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf

LeNet

LeNet was an early promising achiever in image classification, originally applied to handwritten digit recognition (MNIST). References: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf, https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet.prototxt

MNIST weights for this model are available and have been converted from https://github.com/f00-/mnist-lenet-keras.

NASNet

Implementation of NASNet-A in Deeplearning4j. NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest.

This implementation uses 1056 penultimate filters and an input shape of (3, 224, 224). You can change this.

Paper: https://arxiv.org/abs/1707.07012. ImageNet weights for this model are available and have been converted from https://keras.io/applications/.

ResNet50

Residual networks for deep learning.

Paper: https://arxiv.org/abs/1512.03385. ImageNet weights for this model are available and have been converted from https://keras.io/applications/.

SimpleCNN

A simple convolutional network for generic image classification. Reference: https://github.com/oarriaga/face_classification/

SqueezeNet

An implementation of SqueezeNet. Touts similar accuracy to AlexNet with a fraction of the parameters.

Paper: https://arxiv.org/abs/1602.07360. ImageNet weights for this model are available and have been converted from https://github.com/rcmalli/keras-squeezenet/.

TextGenerationLSTM

LSTM designed for text generation. Can be trained on a corpus of text. For this model, numClasses is

Architecture follows this implementation: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

Walt Whitman weights are available for generating text from his works, adapted from https://github.com/craigomac/InfiniteMonkeys.

TinyYOLO

Tiny YOLO. Reference: https://arxiv.org/pdf/1612.08242.pdf

ImageNet+VOC weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code:

String filename = "tiny-yolo-voc.h5";
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
INDArray priors = Nd4j.create(priorBoxes);

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
    .seed(seed)
    .iterations(iterations)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
    .gradientNormalizationThreshold(1.0)
    .updater(new Adam.Builder().learningRate(1e-3).build())
    .l2(0.00001)
    .activation(Activation.IDENTITY)
    .trainingWorkspaceMode(workspaceMode)
    .inferenceWorkspaceMode(workspaceMode)
    .build();

ComputationGraph model = new TransferLearning.GraphBuilder(graph)
    .fineTuneConfiguration(fineTuneConf)
    .addLayer("outputs", new Yolo2OutputLayer.Builder()
            .boundingBoxPriors(priors)
            .build(), "conv2d_9")
    .setOutputs("outputs")
    .build();

System.out.println(model.summary(InputType.convolutional(416, 416, 3)));

ModelSerializer.writeModel(model, "tiny-yolo-voc_dl4j_inference.v1.zip", false);

The channels of the 416x416 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

UNet

U-Net

An implementation of U-Net, a deep learning network for image segmentation in Deeplearning4j. U-Net is a convolutional network architecture for fast and precise segmentation of images. Up to now it has outperformed the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

Paper: https://arxiv.org/abs/1505.04597. Weights are available for image segmentation, trained on a synthetic dataset.

VGG16

VGG-16, from Very Deep Convolutional Networks for Large-Scale Image Recognition (https://arxiv.org/abs/1409.1556)

VGGFace is described in Deep Face Recognition (http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf)

ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications. CIFAR-10 weights for this model are available and have been converted using "approach 2" from https://github.com/rajatvikramsingh/cifar10-vgg16. VGGFace weights for this model are available and have been converted from https://github.com/rcmalli/keras-vggface.

VGG19

VGG-19, from Very Deep Convolutional Networks for Large-Scale Image Recognition (https://arxiv.org/abs/1409.1556). ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications.

Xception

An implementation of Xception in Deeplearning4j. A novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions.

Paper: https://arxiv.org/abs/1610.02357. ImageNet weights for this model are available and have been converted from https://keras.io/applications/.

YOLO2

YOLOv2. Reference: https://arxiv.org/pdf/1612.08242.pdf, https://pjreddie.com/darknet/yolo

ImageNet+COCO weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code:

String filename = "yolo.h5";
KerasLayer.registerCustomLayer("Lambda", KerasSpaceToDepth.class);
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
INDArray priors = Nd4j.create(priorBoxes);

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
    .seed(seed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
    .gradientNormalizationThreshold(1.0)
    .updater(new Adam.Builder().learningRate(1e-3).build())
    .l2(0.00001)
    .activation(Activation.IDENTITY)
    .trainingWorkspaceMode(workspaceMode)
    .inferenceWorkspaceMode(workspaceMode)
    .build();

ComputationGraph model = new TransferLearning.GraphBuilder(graph)
    .fineTuneConfiguration(fineTuneConf)
    .addLayer("outputs", new Yolo2OutputLayer.Builder()
            .boundingBoxPriors(priors)
            .build(), "conv2d_23")
    .setOutputs("outputs")
    .build();

System.out.println(model.summary(InputType.convolutional(608, 608, 3)));

ModelSerializer.writeModel(model, "yolo2_dl4j_inference.v1.zip", false);

The channels of the 608x608 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

pretrainedUrl

public String pretrainedUrl(PretrainedType pretrainedType)

Default prior boxes for the model

Convolutional Layers

Also known as CNN.

Available layers

Convolution1D

1D convolution layer. Expects input activations of shape [minibatch,channels,sequenceLength]

Convolution2D

2D convolution layer

Convolution3D

3D convolution layer configuration

hasBias

public boolean hasBias()

An optional dataFormat: "NDHWC" or "NCDHW". Defaults to "NCDHW". The data format of the input and output data. For "NCDHW" (also known as 'channels first' format), the data storage order is: [batchSize, inputChannels, inputDepth, inputHeight, inputWidth]. For "NDHWC" ('channels last' format), the data is stored in the order of: [batchSize, inputDepth, inputHeight, inputWidth, inputChannels].

kernelSize

public Builder kernelSize(int... kernelSize)

The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width]. NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels].

stride

public Builder stride(int... stride)

Set stride size for 3D convolutions in (depth, height, width) order

  • param stride stride size

  • return 3D convolution layer builder

padding

public Builder padding(int... padding)

Set padding size for 3D convolutions in (depth, height, width) order

  • param padding padding size

  • return 3D convolution layer builder

dilation

public Builder dilation(int... dilation)

Set dilation size for 3D convolutions in (depth, height, width) order

  • param dilation dilation size

  • return 3D convolution layer builder

dataFormat

public Builder dataFormat(DataFormat dataFormat)

The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width]. NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels].

  • param dataFormat Data format to use for activations

setKernelSize

public void setKernelSize(int... kernelSize)

Set kernel size for 3D convolutions in (depth, height, width) order

  • param kernelSize kernel size

setStride

public void setStride(int... stride)

Set stride size for 3D convolutions in (depth, height, width) order

  • param stride stride size

setPadding

public void setPadding(int... padding)

Set padding size for 3D convolutions in (depth, height, width) order

  • param padding padding size

setDilation

public void setDilation(int... dilation)

Set dilation size for 3D convolutions in (depth, height, width) order

  • param dilation dilation size
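Putting the builder methods together, a minimal Convolution3D sketch (channel counts and sizes are illustrative):

Convolution3D conv3d = new Convolution3D.Builder()
    .nIn(1).nOut(16)
    .kernelSize(3, 3, 3) // (depth, height, width)
    .stride(1, 1, 1)
    .padding(1, 1, 1)
    .dataFormat(Convolution3D.DataFormat.NCDHW) // [minibatch, channels, depth, height, width]
    .build();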

Deconvolution2D

2D deconvolution layer configuration

Deconvolutions are also known as transpose convolutions or fractionally strided convolutions. In essence, deconvolutions swap forward and backward pass with regular 2D convolutions.

See the paper by Matt Zeiler for details: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf

For an intuitive guide to convolution arithmetic and shapes, see: https://arxiv.org/abs/1603.07285v1

hasBias

public boolean hasBias()

Deconvolution2D layer. nIn in the input layer is the number of channels; nOut is the number of filters to be used in the net (in other words, the number of output channels). The builder specifies the filter/kernel size, the stride and the padding.

convolutionMode

public Builder convolutionMode(ConvolutionMode convolutionMode)

Set the convolution mode for the Convolution layer. See ConvolutionMode for more details.

  • param convolutionMode Convolution mode for layer

kernelSize

public Builder kernelSize(int... kernelSize)

Size of the convolution rows/columns

  • param kernelSize the height and width of the kernel
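For example, a minimal Deconvolution2D sketch (channel counts are illustrative):

Deconvolution2D deconv = new Deconvolution2D.Builder()
    .nIn(16).nOut(8)  // input channels, number of filters (output channels)
    .kernelSize(2, 2)
    .stride(2, 2)     // roughly doubles the spatial dimensions
    .build();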

Cropping1D

Cropping layer for convolutional (1d) neural networks. Allows cropping to be done separately for top/bottom

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)

  • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount for top/bottom (in that order). Must be a length 1 or 2 array.

build

public Cropping1D build()

  • param cropping Cropping amount for top/bottom (in that order). Must be a length 1 or 2 array.

Cropping2D

Cropping layer for convolutional (2d) neural networks. Allows cropping to be done separately for top/bottom/left/right

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)

  • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

  • param cropLeftRight Amount of cropping to apply to both the left and the right of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount for top/bottom/left/right (in that order). A length 4 array.

build

public Cropping2D build()

  • param cropping Cropping amount for top/bottom/left/right (in that order). Must be a length 4 array.
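For example (cropping amounts are illustrative):

// crop 2 pixels from top and bottom, 4 pixels from left and right
Cropping2D cropping = new Cropping2D.Builder(2, 2, 4, 4).build();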

Cropping3D

Cropping layer for convolutional (3d) neural networks. Allows cropping to be done separately for upper and lower bounds of depth, height and width dimensions.

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)

  • param cropDepth Amount of cropping to apply to both depth boundaries of the input activations

  • param cropHeight Amount of cropping to apply to both height boundaries of the input activations

  • param cropWidth Amount of cropping to apply to both width boundaries of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount: a length 6 array, i.e. crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

build

public Cropping3D build()

  • param cropping Cropping amount: must be a length 3 or 6 array, i.e. either (crop depth, crop height, crop width) or (crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width)


Multi Layer Network

Simple and sequential network configuration.

The MultiLayerNetwork class is the simplest network configuration API available in Eclipse Deeplearning4j. This class is useful for beginners or users who do not need a complex and branched network graph.

You will not want to use MultiLayerNetwork configuration if you are creating complex loss functions, using graph vertices, or doing advanced training such as a triplet network. This includes popular complex networks such as InceptionV4.

Usage

The example below shows how to build a simple linear classifier using DenseLayer (a basic multilayer perceptron layer).

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .learningRate(learningRate)
    .updater(Updater.NESTEROVS).momentum(0.9)
    .list()
    .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
            .weightInit(WeightInit.XAVIER)
            .activation("relu")
            .build())
    .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
            .weightInit(WeightInit.XAVIER)
            .activation("softmax").weightInit(WeightInit.XAVIER)
            .nIn(numHiddenNodes).nOut(numOutputs).build())
    .pretrain(false).backprop(true).build();

You can also create convolutional configurations:

MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .regularization(true).l2(0.0005)
    .learningRate(0.01)
    .weightInit(WeightInit.XAVIER)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .updater(Updater.NESTEROVS).momentum(0.9)
    .list()
    .layer(0, new ConvolutionLayer.Builder(5, 5)
            //nIn and nOut specify depth. nIn here is the nChannels and nOut is the number of filters to be applied
            .nIn(nChannels)
            .stride(1, 1)
            .nOut(20)
            .activation("identity")
            .build())
    .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
            .kernelSize(2,2)
            .stride(2,2)
            .build())
    .layer(2, new ConvolutionLayer.Builder(5, 5)
            //Note that nIn need not be specified in later layers
            .stride(1, 1)
            .nOut(50)
            .activation("identity")
            .build())
    .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
            .kernelSize(2,2)
            .stride(2,2)
            .build())
    .layer(4, new DenseLayer.Builder().activation("relu")
            .nOut(500).build())
    .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nOut(outputNum)
            .activation("softmax")
            .build());
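Either configuration is then wrapped in a MultiLayerNetwork before training; a minimal sketch (trainData is an assumed DataSetIterator):

MultiLayerNetwork model = new MultiLayerNetwork(builder.build());
model.init();
model.fit(trainData);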

Model Listeners

Adding hooks and listeners on DL4J models.

What are listeners?

Listeners allow users to "hook" into certain events in Eclipse Deeplearning4j. This allows you to collect or print information useful for tasks like training. For example, a ScoreIterationListener allows you to print training scores from the output layer of a neural network.

Usage

To add one or more listeners to a MultiLayerNetwork or ComputationGraph, use the setListeners or addListeners methods:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
// print the score every iteration
model.setListeners(new ScoreIterationListener(1));

Available listeners

EvaluativeListener

[source]

This TrainingListener implementation provides a simple way to evaluate a model during training. It can be launched every Xth iteration/epoch, depending on the frequency and InvocationType constructor arguments.

EvaluativeListener

public EvaluativeListener(@NonNull DataSetIterator iterator, int frequency)

This callback will be invoked after evaluation is finished

iterationDone

public void iterationDone(Model model, int iteration, int epoch)
  • param iterator Iterator to provide data for evaluation

  • param frequency Frequency (in number of iterations/epochs according to the invocation type) to perform evaluation

  • param type Type of value for ‘frequency’ - iteration end, epoch end, etc
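For example, to evaluate the model on a held-out iterator every 10 iterations (testIterator is an assumed DataSetIterator):

model.setListeners(new EvaluativeListener(testIterator, 10));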

ScoreIterationListener

[source]

Score iteration listener. Reports the score (value of the loss function) of the network during training, every N iterations.

ScoreIterationListener

public ScoreIterationListener(int printIterations)
  • param printIterations frequency with which to print scores (i.e., every printIterations parameter updates)

ComposableIterationListener

[source]

A group of listeners

CollectScoresIterationListener

[source]

CollectScoresIterationListener simply stores the model scores internally (along with the iteration) every 1 or N iterations (this is configurable). These scores can then be obtained or exported.

CollectScoresIterationListener

public CollectScoresIterationListener()

Constructor for collecting scores with default saving frequency of 1

iterationDone

public void iterationDone(Model model, int iteration, int epoch)

Constructor for collecting scores with the specified frequency.

  • param frequency Frequency with which to collect/save scores

exportScores

public void exportScores(OutputStream outputStream) throws IOException

Export the scores in tab-delimited (one per line) UTF-8 format.

exportScores

public void exportScores(OutputStream outputStream, String delimiter) throws IOException

Export the scores in delimited (one per line) UTF-8 format with the specified delimiter

  • param outputStream Stream to write to

  • param delimiter Delimiter to use

exportScores

public void exportScores(File file) throws IOException

Export the scores to the specified file in delimited (one per line) UTF-8 format, tab delimited

  • param file File to write to

exportScores

public void exportScores(File file, String delimiter) throws IOException

Export the scores to the specified file in delimited (one per line) UTF-8 format, using the specified delimiter

  • param file File to write to

  • param delimiter Delimiter to use for writing scores
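Typical usage (the file name is illustrative):

CollectScoresIterationListener listener = new CollectScoresIterationListener(10); // collect every 10 iterations
model.setListeners(listener);
// ... fit the model ...
listener.exportScores(new File("scores.tsv"));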

CheckpointListener

[source]

CheckpointListener: The goal of this listener is to periodically save a copy of the model during training. Model saving may be done:

  1. Every N epochs

  2. Every N iterations

  3. Every T time units (every 15 minutes, for example)

Or some combination of the three.

Example 1: Saving a checkpoint every 2 epochs, keeping all model files:

CheckpointListener listener = new CheckpointListener.Builder(new File("/save/directory")) // directory is illustrative
    .keepAll() // don't delete any models
    .saveEveryNEpochs(2)
    .build();

Example 2: Saving a checkpoint every 1000 iterations, but keeping only the last 3 models (all older model files will be automatically deleted)

CheckpointListener listener = new CheckpointListener.Builder(new File("/save/directory"))
    .keepLast(3)
    .saveEveryNIterations(1000)
    .build();

Example 3: Saving a checkpoint every 15 minutes, keeping the most recent 3 and otherwise every 4th checkpoint file:

CheckpointListener listener = new CheckpointListener.Builder(new File("/save/directory"))
    .keepLastAndEvery(3, 4)
    .saveEvery(15, TimeUnit.MINUTES)
    .build();

Note that you can mix these options: for example, you can save every epoch and also every 15 minutes independent of the last save time, or every epoch and every 15 minutes since the last model save. In the latter case the sinceLast parameter is true, meaning the 15-minute counter is reset any time a model is saved.

CheckpointListener

public CheckpointListener build()

List all available checkpoints. A checkpoint is ‘available’ if the file can be loaded. Any checkpoint files that have been automatically deleted (given the configuration) will not be returned here.

  • return List of checkpoint files that can be loaded

SharedGradient

[source]

SleepyTrainingListener

[source]

This TrainingListener implementation provides a way to “sleep” during specific Neural Network training phases. Suitable for debugging/testing purposes only.

PLEASE NOTE: All timers treat time values as milliseconds. PLEASE NOTE: Do not use it in production environment.

onEpochStart

public void onEpochStart(Model model)

In this mode, a parkNanos() call is used to make the process truly idle

CollectScoresListener

[source]

A simple listener that collects scores to a list every N iterations. Can also optionally log the score.

PerformanceListener

[source]

Simple IterationListener that tracks the time spent on training per iteration.

PerformanceListener

public PerformanceListener build()

This method defines whether the iteration number should be reported together with other data

  • param reportIteration

  • return

ParamAndGradientIterationListener

[source]

An iteration listener that provides details on parameters and gradients at each iteration during training. It attempts to provide much of the same information as the UI histogram iteration listener, but in a text-based format (for example, when training on a system accessed via SSH), and is intended to aid network tuning and debugging. This listener calculates the mean, min, max, and mean absolute value of each type of parameter and gradient in the network at each iteration.

TimeIterationListener

[source]

Time iteration listener. This listener logs (at INFO level) the estimated remaining time in minutes and the expected end date of the process. Remaining time is estimated from the training time so far and the total number of iterations specified by the user.

TimeIterationListener

public TimeIterationListener(int iterationCount)

Constructor

  • param iterationCount The total number of iterations for training (all epochs)

Vertices

Computation graph nodes for advanced configuration.

What is a vertex?

In Eclipse Deeplearning4j a vertex is a type of layer that acts as a node in a ComputationGraph. It can accept multiple inputs, provide multiple outputs, and can help construct popular networks such as InceptionV4.
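Vertices are added to a ComputationGraph configuration via addVertex; a minimal sketch (names and sizes are illustrative):

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .graphBuilder()
    .addInputs("in")
    .addLayer("dense", new DenseLayer.Builder().nIn(10).nOut(10).build(), "in")
    .addVertex("norm", new L2NormalizeVertex(), "dense") // L2-normalize the dense activations
    .addLayer("out", new OutputLayer.Builder().nIn(10).nOut(3).build(), "norm")
    .setOutputs("out")
    .build();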

Available Vertices

L2NormalizeVertex

[source]

L2NormalizeVertex performs L2 normalization on a single input.

L2Vertex

[source]

L2Vertex calculates the L2 least squares error of two inputs.

For example, in Triplet Embedding you can input an anchor and a pos/neg class and use two parallel L2 vertices to calculate two real numbers which can be fed into a LossLayer to calculate TripletLoss.

PoolHelperVertex

[source]

A custom layer for removing the first column and row from an input. This is meant to allow importation of Caffe’s GoogLeNet from https://gist.github.com/joelouismarino/a2ede9ab3928f999575423b9887abd14.

ReshapeVertex

[source]

Adds the ability to reshape and flatten the tensor in the computation graph. This is the equivalent to the next layer. ReshapeVertex also ensures the shape is valid for the backward pass.

ScaleVertex

[source]

A ScaleVertex is used to scale the size of activations of a single layer. For example, ResNet activations can be scaled in repeating blocks to keep variance under control.

ShiftVertex

[source]

A ShiftVertex is used to shift the activations of a single layer. One could use it to add a bias or as part of some other calculation. For example, Highway Layers need it in two places. First, it's often useful to have the gate weights have a large negative bias. (Of course for this, we could just initialize the biases that way.) Second, the Highway Layer needs to compute: (1 - sigmoid(W1 x + b1)) ⊙ x + sigmoid(W1 x + b1) ⊙ activation(W2 x + b2), where ⊙ is the Hadamard (element-wise) product. So, here, we could have:

  1. a DenseLayer that does the sigmoid

  2. a ScaleVertex(-1) and

  3. a ShiftVertex(1) to accomplish that.

StackVertex

[source]

StackVertex allows for stacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where shared parameters are not supported by the network.

This vertex will automatically stack all available inputs.

UnstackVertex

[source]

UnstackVertex allows for unstacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where embeddings can be separated and run through subsequent layers.

Works similarly to SubsetVertex, except on dimension 0 of the input. stackSize is explicitly defined by the user to properly calculate the step.

ReverseTimeSeriesVertex

[source]

ReverseTimeSeriesVertex is used in recurrent neural networks to revert the order of time series. As a result, the last time step is moved to the beginning of the time series and the first time step is moved to the end. This allows recurrent layers to process time series in reverse order.

Masks: The input might be masked (to allow for varying time series lengths in one minibatch). In this case the present input (mask array = 1) will be reverted in place and the padding (mask array = 0) will be left untouched at the same place. For a time series of length n, this would normally mean that the first n time steps are reverted and the following padding is left untouched, but more complex masks are supported (e.g. [1, 0, 1, 0, …]).

setBackpropGradientsViewArray

public void setBackpropGradientsViewArray(INDArray backpropGradientsViewArray)

Gets the current mask array from the provided input

  • return The mask or null, if no input was provided

Updaters/Optimizers

Special algorithms for gradient descent.

What are updaters?

The main difference among the updaters is how they treat the learning rate. Stochastic Gradient Descent, the most common learning algorithm in deep learning, relies on Theta (the weights in hidden layers) and alpha (the learning rate). Different updaters help optimize the learning rate until the neural network converges on its most performant state.

Usage

To use the updaters, pass a new class to the updater() method in either a ComputationGraph or MultiLayerNetwork.

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(0.01))
    // add your layers and hyperparameters below
    .build();

Available updaters

NadamUpdater

[source]

The Nadam updater. https://arxiv.org/pdf/1609.04747.pdf

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for

  • param iteration

  • return the gradient

NesterovsUpdater

[source]

Nesterov’s momentum. Keep track of the previous layer’s gradient and use it as a way of updating the gradient.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Get the nesterov update

  • param gradient the gradient to get the update for

  • param iteration

  • return

RmsPropUpdater

[source]

RMS Prop updates:

http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf http://cs231n.github.io/neural-networks-3/#ada

AdaGradUpdater

[source]

Vectorized Learning Rate used per Connection Weight

Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent See also http://cs231n.github.io/neural-networks-3/#ada

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Gets feature-specific learning rates. Adagrad keeps a history of gradients being passed in. Note that each gradient passed in becomes adapted over time, hence the name adagrad.

  • param gradient the gradient to get learning rates for

  • param iteration

AdaMaxUpdater

[source]

The AdaMax updater, a variant of Adam. http://arxiv.org/abs/1412.6980

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for

  • param iteration

  • return the gradient

NoOpUpdater

[source]

NoOp updater: gradient updater that makes no changes to the gradient

AdamUpdater

[source]

The Adam updater. http://arxiv.org/abs/1412.6980

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for

  • param iteration

  • return the gradient

AdaDeltaUpdater

[source]

http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf https://arxiv.org/pdf/1212.5701v1.pdf

AdaDelta updater. A more robust AdaGrad that keeps track of a moving window average of the gradient rather than the ever-decaying learning rates of AdaGrad.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Get the updated gradient for the given gradient and also update the state of ada delta.

  • param gradient the gradient to get the updated gradient for

  • param iteration

  • return the updated gradient

SgdUpdater

[source]

SGD updater applies a learning rate only

GradientUpdater

[source]

Gradient modifications: Calculates an update and tracks related information for gradient changes over time for handling updates.

AMSGradUpdater

[source]

The AMSGrad updater Reference: On the Convergence of Adam and Beyond - https://openreview.net/forum?id=ryQu7f-RZ
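Any of these updaters can be passed to the updater(...) method shown earlier; for example, swapping in SGD with Nesterov momentum (learning rate, momentum, and layer sizes are illustrative):

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Nesterovs(0.01, 0.9)) // learning rate, momentum
    .list()
    .layer(new DenseLayer.Builder().nIn(10).nOut(10).build())
    .layer(new OutputLayer.Builder().nIn(10).nOut(2).build())
    .build();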

Saving and Loading Models

Saving and loading of neural networks.

MultiLayerNetwork and ComputationGraph both have save and load methods.

You can save/load a MultiLayerNetwork using:

MultiLayerNetwork net = ...
net.save(new File("..."));

MultiLayerNetwork net2 = MultiLayerNetwork.load(new File("..."), true);

Similarly, you can save/load a ComputationGraph using:

ComputationGraph net = ...
net.save(new File("..."));

ComputationGraph net2 = ComputationGraph.load(new File("..."), true);

Internally, these methods use the ModelSerializer class, which handles loading and saving models. ModelSerializer provides separate methods for the two network types: one set saves and restores a MultiLayerNetwork, the other a ComputationGraph.

Here is a basic example of saving a computation graph using the ModelSerializer class, as well as an example of using ModelSerializer to save a neural net built using a MultiLayerConfiguration.

RNG Seed

If your model uses probabilities (i.e. DropOut/DropConnect), it may make sense to save the RNG seed separately and apply it after the model is restored, i.e.:

 Nd4j.getRandom().setSeed(12345);
 ModelSerializer.restoreMultiLayerNetwork(modelFile);

This will guarantee equal results between sessions/JVMs.

ModelSerializer

[source]

Utility class suited to save/restore neural net models

writeModel

public static void writeModel(@NonNull Model model, @NonNull File file, boolean saveUpdater) throws IOException

Write a model to a file

  • param model the model to write

  • param file the file to write to

  • param saveUpdater whether to save the updater or not

  • throws IOException

writeModel

public static void writeModel(@NonNull Model model, @NonNull File file, boolean saveUpdater,DataNormalization dataNormalization) throws IOException

Write a model to a file

  • param model the model to write

  • param file the file to write to

  • param saveUpdater whether to save the updater or not

  • param dataNormalization the normalizer to save (optional)

  • throws IOException

writeModel

public static void writeModel(@NonNull Model model, @NonNull String path, boolean saveUpdater) throws IOException

Write a model to a file path

  • param model the model to write

  • param path the path to write to

  • param saveUpdater whether to save the updater or not

  • throws IOException

writeModel

public static void writeModel(@NonNull Model model, @NonNull OutputStream stream, boolean saveUpdater)
            throws IOException

Write a model to an output stream

  • param model the model to save

  • param stream the output stream to write to

  • param saveUpdater whether to save the updater for the model or not

  • throws IOException

writeModel

public static void writeModel(@NonNull Model model, @NonNull OutputStream stream, boolean saveUpdater,DataNormalization dataNormalization)
            throws IOException

Write a model to an output stream

  • param model the model to save

  • param stream the output stream to write to

  • param saveUpdater whether to save the updater for the model or not

  • param dataNormalization the normalizer to save (may be null)

  • throws IOException

restoreMultiLayerNetwork

public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull File file) throws IOException

Load a multi layer network from a file

  • param file the file to load from

  • return the loaded multi layer network

  • throws IOException

restoreMultiLayerNetwork

public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull File file, boolean loadUpdater)
            throws IOException

Load a multi layer network from a file

  • param file the file to load from

  • return the loaded multi layer network

  • throws IOException

restoreMultiLayerNetwork

public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull InputStream is, boolean loadUpdater)
            throws IOException

Load a MultiLayerNetwork from an input stream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

  • param is the inputstream to load from

  • return the loaded multi layer network

  • throws IOException

  • see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)

restoreMultiLayerNetwork

public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull InputStream is) throws IOException

Restore a multi layer network from an input stream Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

  • param is the input stream to restore from

  • return the loaded multi layer network

  • throws IOException

  • see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)

restoreMultiLayerNetwork

public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull String path) throws IOException

Load a MultiLayerNetwork model from a file

  • param path path to the model file

  • return the loaded multi layer network

  • throws IOException

restoreMultiLayerNetwork

public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull String path, boolean loadUpdater)
            throws IOException

Load a MultiLayerNetwork model from a file

  • param path path to the model file

  • return the loaded multi layer network

  • throws IOException

restoreComputationGraph

public static ComputationGraph restoreComputationGraph(@NonNull String path) throws IOException

Load a computation graph from a file path

  • param path path to the model file, to get the computation graph from

  • return the loaded computation graph

  • throws IOException

restoreComputationGraph

public static ComputationGraph restoreComputationGraph(@NonNull String path, boolean loadUpdater)
            throws IOException

Load a computation graph from a file

  • param path path to the model file, to get the computation graph from

  • return the loaded computation graph

  • throws IOException

restoreComputationGraph

public static ComputationGraph restoreComputationGraph(@NonNull InputStream is, boolean loadUpdater)
            throws IOException

Load a computation graph from a InputStream

  • param is the inputstream to get the computation graph from

  • return the loaded computation graph

  • throws IOException

restoreComputationGraph

public static ComputationGraph restoreComputationGraph(@NonNull InputStream is) throws IOException

Load a computation graph from a InputStream

  • param is the inputstream to get the computation graph from

  • return the loaded computation graph

  • throws IOException

restoreComputationGraph

public static ComputationGraph restoreComputationGraph(@NonNull File file) throws IOException

Load a computation graph from a file

  • param file the file to get the computation graph from

  • return the loaded computation graph

  • throws IOException

restoreComputationGraph

public static ComputationGraph restoreComputationGraph(@NonNull File file, boolean loadUpdater) throws IOException

Load a computation graph from a file

  • param file the file to get the computation graph from

  • param loadUpdater Whether to load the updater from the model or not

  • return the loaded computation graph

  • throws IOException

taskByModel

public static Task taskByModel(Model model)
  • param model

  • return

addNormalizerToModel

public static void addNormalizerToModel(File f, Normalizer<?> normalizer)

This method appends a normalizer to a given persisted model.

PLEASE NOTE: File should be model file saved earlier with ModelSerializer

  • param f

  • param normalizer

addObjectToFile

public static void addObjectToFile(@NonNull File f, @NonNull String key, @NonNull Object o)

Add an object to the (already existing) model file using Java Object Serialization. Objects can be restored using getObjectFromFile(File, String).

  • param f File to add the object to

  • param key Key to store the object under

  • param o Object to store using Java object serialization
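A typical save/restore round trip (the file name is illustrative):

File modelFile = new File("my-network.zip");
ModelSerializer.writeModel(net, modelFile, true); // true: also save the updater
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(modelFile, true);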

Computation Graph

How to build complex networks with DL4J computation graph.

Building Complex Network Architectures with Computation Graph

This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.

Overview of Computation Graph

DL4J has two types of networks comprised of multiple layers:

  • The MultiLayerNetwork, which is essentially a stack of neural network layers (with a single input layer and single output layer), and

  • The ComputationGraph, which allows for greater freedom in network architectures

Specifically, the ComputationGraph allows for networks to be built with the following features:

  • Multiple network input arrays

  • Multiple network outputs (including mixed classification/regression architectures)

  • Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)

As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure: MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well - though the configuration may be a little more complicated.

Computation Graph: Some Example Use Cases

Examples of some architectures that can be built using ComputationGraph include:

  • Multi-task learning architectures

  • Recurrent neural networks with skip connections

  • GoogLeNet, a complex type of convolutional neural network for image classification

  • Image caption generation

  • Convolutional networks for sentence classification

  • Residual learning convolutional neural networks

Configuring a Computation Graph

Types of Graph Vertices

The basic idea is that in the ComputationGraph, the core building block is the GraphVertex, instead of layers. Layers (or, more accurately, the LayerVertex objects) are just one type of vertex in the graph. Other types of vertices include:

  • Input Vertices

  • Element-wise operation vertices

  • Merge vertices

  • Subset vertices

  • Preprocessor vertices

These types of graph vertices are described briefly below.

LayerVertex: Layer vertices (graph vertices with neural network layers) are added using the .addLayer(String,Layer,String...) method. The first argument is the label for the layer, and the last arguments are the inputs to that layer. If you need to manually add an InputPreProcessor (usually this is unnecessary - see next section) you can use the .addLayer(String,Layer,InputPreProcessor,String...) method.

InputVertex: Input vertices are specified by the addInputs(String...) method in your configuration. The strings used as inputs can be arbitrary - they are user-defined labels, and can be referenced later in the configuration. The number of strings provided defines the number of inputs; the order of the inputs also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).

ElementWiseVertex: Element-wise operation vertices perform, for example, an element-wise addition or subtraction of the activations from one or more other vertices. The activations used as input to the ElementWiseVertex must therefore all be the same size, and the output size of the element-wise vertex is the same as the input size.

MergeVertex: The MergeVertex concatenates/merges the input activations. For example, if a MergeVertex has 2 inputs of sizes 5 and 10 respectively, then the output size will be 5+10=15 activations. For convolutional network activations, inputs are merged along the depth dimension: so if the activations from one layer have 4 features and the other has 5 features (both with (4 or 5) x width x height activations), then the output will have (4+5) x width x height activations.

SubsetVertex: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use .addVertex("subset1", new SubsetVertex(0,4), "layer1"): this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.

PreProcessorVertex: Occasionally, you might want to use the functionality of an InputPreProcessor without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.

Finally, it is also possible to define custom graph vertices by implementing both a configuration and implementation class for your custom GraphVertex.

Example 1: Recurrent Network with Skip Connections

Suppose we wish to build the following recurrent neural network architecture:

For the sake of this example, let's assume our input data is of size 5. Our configuration would be as follows:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input") //can use any label for this
    .addLayer("L1", new GravesLSTM.Builder().nIn(5).nOut(5).build(), "input")
    .addLayer("L2",new RnnOutputLayer.Builder().nIn(5+5).nOut(5).build(), "input", "L1")
    .setOutputs("L2")    //We need to specify the network outputs and their order
    .build();

ComputationGraph net = new ComputationGraph(conf);
net.init();

Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.

Example 2: Multiple Inputs and Merge Vertex

Consider the following architecture:

Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)), then the output size of the merge vertex is 4+4=8 activations.

To build the above network, we use the following configuration:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input1", "input2")
    .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
    .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
    .addVertex("merge", new MergeVertex(), "L1", "L2")
    .addLayer("out", new OutputLayer.Builder().nIn(4+4).nOut(3).build(), "merge")
    .setOutputs("out")
    .build();

Example 3: Multi-Task Learning

In multi-task learning, a neural network is used to make multiple independent predictions. Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.

In this case, the network configuration is:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addLayer("out1", new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(4).nOut(3).build(), "L1")
        .addLayer("out2", new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.MSE)
                .nIn(4).nOut(2).build(), "L1")
        .setOutputs("out1","out2")
        .build();

Automatically Adding PreProcessors and Calculating nIns

One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the .setInputTypes(InputType...) method in the configuration.

The setInputTypes method has two effects:

  1. It will automatically add any InputPreProcessors as required. InputPreProcessors are necessary to handle the interaction between for example fully connected (dense) and convolutional layers, or recurrent and fully connected layers.

  2. It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the setInputTypes(InputType...) functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If the .nIn(x) is specified for a layer, the network will not override this when using the InputType functionality.

For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use .setInputTypes(InputType.convolutional(height, width, depth), InputType.feedForward(feedForwardInputSize)).
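As a minimal sketch (the layer names and sizes here are hypothetical; the convolutional input is a 28x28 single-channel image, and the feed-forward input has 100 features):

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .graphBuilder()
    .addInputs("imageInput", "vectorInput")
    .addLayer("cnn", new ConvolutionLayer.Builder(5, 5).nOut(16).build(), "imageInput")
    .addLayer("cnnDense", new DenseLayer.Builder().nOut(32).build(), "cnn")
    .addLayer("dense", new DenseLayer.Builder().nOut(32).build(), "vectorInput")
    .addVertex("merge", new MergeVertex(), "cnnDense", "dense")
    .addLayer("out", new OutputLayer.Builder().nOut(3).build(), "merge")
    .setOutputs("out")
    .setInputTypes(InputType.convolutional(28, 28, 1), InputType.feedForward(100))
    .build();

Note that no .nIn(x) values are specified anywhere: with the input types set, DL4J calculates them (and adds the preprocessor between the "cnn" and "cnnDense" layers) automatically.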

Training Data for ComputationGraph

There are two types of data that can be used with the ComputationGraph.

DataSet and the DataSetIterator

The DataSet class was originally designed for use with the MultiLayerNetwork; however, it can also be used with ComputationGraph - but only if that computation graph has a single input and a single output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).

A DataSet object is basically a pair of INDArrays that hold your training data. In the case of RNNs, it may also include masking arrays (discussed later in this document). A DataSetIterator is essentially an iterator over DataSet objects.

MultiDataSet and the MultiDataSetIterator

MultiDataSet is the multiple input and/or multiple output version of DataSet. It may also include multiple mask arrays (one for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator, unless you are dealing with multiple inputs and/or multiple outputs.

There are currently two ways to use a MultiDataSetIterator:

  • By implementing the MultiDataSetIterator interface directly

  • By using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers

The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:

  • Multiple DataVec RecordReaders may be used simultaneously

  • The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader

  • It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output

  • It is possible to convert single columns from a class index to a one-hot representation

Some basic examples on how to use the RecordReaderMultiDataSetIterator follow. You might also find these unit tests to be useful.

Example 1: Regression Data (RecordReaderMultiDataSetIterator)

Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:

int numLinesToSkip = 0;
String fileDelimiter = ",";
RecordReader rr = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String csvPath = "/path/to/my/file.csv";
rr.initialize(new FileSplit(new File(csvPath)));

int batchSize = 4;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("myReader",rr)
        .addInput("myReader",0,2)  //Input: columns 0 to 2 inclusive
        .addOutput("myReader",3,4) //Output: columns 3 to 4 inclusive
        .build();

Example 2: Classification and Multi-Task Learning (RecordReaderMultiDataSetIterator)

Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture with two outputs - one for regression and one for classification. For this example, let's assume the data is as follows:

  • Input file: myInput.csv, and we want to use all columns as input (without modification)

  • Output file: myOutput.csv.

    • Network output 1 - regression: columns 0 to 3

    • Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.

In this case, we can build our iterator as follows:

int numLinesToSkip = 0;
String fileDelimiter = ",";

RecordReader featuresReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String featuresCsvPath = "/path/to/my/myInput.csv";
featuresReader.initialize(new FileSplit(new File(featuresCsvPath)));

RecordReader labelsReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String labelsCsvPath = "/path/to/my/myOutput.csv";
labelsReader.initialize(new FileSplit(new File(labelsCsvPath)));

int batchSize = 4;
int numClasses = 3;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("csvInput", featuresReader)
        .addReader("csvLabels", labelsReader)
        .addInput("csvInput") //Input: all columns from input reader
        .addOutput("csvLabels", 0, 3) //Output 1: columns 0 to 3 inclusive
        .addOutputOneHot("csvLabels", 4, numClasses)   //Output 2: column 4 -> convert to one-hot for classification
        .build();

Recurrent Layers

Recurrent Neural Network (RNN) implementations in DL4J.

This document outlines the specific training features of recurrent neural networks and the practicalities of how to use them in DeepLearning4J. It is not an introduction to recurrent neural networks, and assumes some familiarity with both their use and terminology.

The Basics: Data and Network Configuration

DL4J currently supports the following types of recurrent neural network:

  • RNN ("vanilla" RNN)

  • LSTM (Long Short-Term Memory)

Java documentation for each is available: SimpleRnn, LSTM.

Data for RNNs

Consider for the moment a standard feed-forward network (a multi-layer perceptron or 'DenseLayer' in DL4J). These networks expect input and output data that is two-dimensional: that is, data with "shape" [numExamples,inputSize]. This means that the data into a feed-forward network has ‘numExamples’ rows/examples, where each row consists of ‘inputSize’ columns. A single example would have shape [1,inputSize], though in practice we generally use multiple examples for computational and optimization efficiency. Similarly, output data for a standard feed-forward network is also two dimensional, with shape [numExamples,outputSize].

Conversely, data for RNNs are time series. Thus, they have 3 dimensions: one additional dimension for time. Input data thus has shape [numExamples,inputSize,timeSeriesLength], and output data has shape [numExamples,outputSize,timeSeriesLength]. This means that the data in our INDArray is laid out such that the value at position (i,j,k) is the jth value at the kth time step of the ith example in the minibatch. This data layout is shown below.

When importing time series data using the CSVSequenceRecordReader class, each line in the data files represents one time step, with the earliest time series observation in the first row (or the first row after the header, if present) and the most recent observation in the last row of the csv. Each feature time series is a separate column of the csv file. For example, if you have five features in your time series, each with 120 observations, and a training & test set of size 53, then there will be 106 csv files (53 input, 53 labels). The 53 input csv files will each have five columns and 120 rows. The label csv files will have one column (the label) and one row.
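For instance, the first rows of one such (hypothetical) input file, with five feature columns, comma delimited, and one time step per row, might look like:

0.32,1.14,0.01,5.20,0.75
0.35,1.12,0.03,5.18,0.80
0.37,1.15,0.02,5.11,0.83
...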

RnnOutputLayer

RnnOutputLayer is a type of layer used as the final layer in many recurrent neural network systems (for both regression and classification tasks). RnnOutputLayer handles things like score calculation and error calculation (of prediction vs. actual) for a given loss function. Functionally, it is very similar to the 'standard' OutputLayer class (which is used with feed-forward networks); however, it both outputs, and expects as labels/targets, 3d time series data sets.

Configuration for the RnnOutputLayer follows the same design as other layers: for example, to set the third layer in a MultiLayerNetwork to an RnnOutputLayer for classification:

.layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT).activation(Activation.SOFTMAX)
        .weightInit(WeightInit.XAVIER).nIn(prevLayerSize).nOut(nOut).build())

Use of RnnOutputLayer in practice can be seen in the examples, linked at the end of this document.

RNN Training Features

Truncated Back Propagation Through Time

Training neural networks (including RNNs) can be quite computationally demanding. For recurrent neural networks, this is especially the case when we are dealing with long sequences - i.e., training data with many time steps.

Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. In summary, it allows us to train networks faster (by performing more frequent parameter updates), for a given amount of computational power. It is recommended to use truncated BPTT when your input sequences are long (typically, more than a few hundred time steps).

Consider what happens when training a recurrent neural network with a time series of length 12 time steps. Here, we need to do a forward pass of 12 steps, calculate the error (based on predicted vs. actual), and do a backward pass of 12 time steps:

For 12 time steps, in the image above, this is not a problem. Consider, however, that instead the input time series was 10,000 or more time steps. In this case, standard backpropagation through time would require 10,000 time steps for each of the forward and backward passes for each and every parameter update. This is of course very computationally demanding.

In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. The specific length of these forward/backward pass segments is a parameter set by the user. For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:

Note that the overall complexity for truncated BPTT and standard BPTT is approximately the same - both do the same number of time steps during the forward/backward pass. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort. The cost is not exactly the same, though: there is a small amount of overhead per parameter update.

The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don't flow back far enough to cause the required parameter updates that would store the required information. This tradeoff is usually worth it, and (as long as the truncated BPTT lengths are set appropriately), truncated BPTT works well in practice.

Using truncated BPTT in DL4J is quite simple: just add the following to your network configuration (at the end, before the final .build()):

.backpropType(BackpropType.TruncatedBPTT)
.tBPTTLength(100)

The above code snippet will cause any network training (i.e., calls to MultiLayerNetwork.fit() methods) to use truncated BPTT with segments of length 100 steps.

Some things of note:

  • By default (if a backprop type is not manually specified), DL4J will use BackpropType.Standard (i.e., full BPTT).

  • The tBPTTLength configuration parameter sets the length of the truncated BPTT passes. Typically, this is somewhere on the order of 50 to 200 time steps, though it depends on the application and data.

  • The truncated BPTT length is typically a fraction of the total time series length (e.g., 200 vs. sequence length 1000), but variable length time series in the same minibatch are OK when using TBPTT (for example, a minibatch with two sequences - one of length 100 and another of length 1000 - with a TBPTT length of 200 will work correctly)
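Putting this together, a minimal configuration sketch for a network trained with truncated BPTT (the layer sizes here are hypothetical; nIn and nOut are assumed to be defined):

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .list()
    .layer(new LSTM.Builder().nIn(nIn).nOut(20).build())
    .layer(new RnnOutputLayer.Builder(LossFunction.MCXENT)
        .activation(Activation.SOFTMAX).nIn(20).nOut(nOut).build())
    .backpropType(BackpropType.TruncatedBPTT)
    .tBPTTLength(100)
    .build();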

Masking: One-to-Many, Many-to-One, and Sequence Classification

DL4J supports a number of related training features for RNNs, based on the idea of padding and masking. Padding and masking allow us to support training situations such as one-to-many and many-to-one, as well as variable length time series (in the same mini-batch).

Suppose we want to train a recurrent neural network with inputs or outputs that don't occur at every time step. Examples of this (for a single example) are shown in the image below. DL4J supports training networks for all of these situations:

Without masking and padding, we are restricted to the many-to-many case (above, left): that is, (a) All examples are of the same length, and (b) Examples have both inputs and outputs at all time steps.

The idea behind padding is simple. Consider two time series of lengths 50 and 100 time steps, in the same mini-batch. The training data is a rectangular array; thus, we pad (i.e., add zeros to) the shorter time series (for both input and output), such that the input and output are both the same length (in this example: 100 time steps).

Of course, if this was all we did, it would cause problems during training. Thus, in addition to padding, we use a masking mechanism. The idea behind masking is simple: we have two additional arrays that record whether an input or output is actually present for a given time step and example, or whether the input/output is just padding.

Recall that with RNNs, our minibatch data has 3 dimensions, with shape [miniBatchSize,inputSize,timeSeriesLength] and [miniBatchSize,outputSize,timeSeriesLength] for the input and output respectively. The masking arrays are then 2 dimensional, with shape [miniBatchSize,timeSeriesLength] for both the input and output, with values of 0 ('absent') or 1 ('present') for each time step and example. The masking arrays for the input and output are stored in separate arrays.

For a single example, the input and output masking arrays are shown below:

For the “Masking not required” cases, we could equivalently use a masking array of all 1s, which will give the same result as not having a mask array at all. Also note that it is possible to use zero, one or two masking arrays when learning RNNs - for example, the many-to-one case could have a masking array for the output only.

In practice, these padding and masking arrays are generally created during the data import stage (for example, by the SequenceRecordReaderDataSetIterator, discussed later), and are contained within the DataSet object. If a DataSet contains masking arrays, MultiLayerNetwork.fit will automatically use them during training. If they are absent, no masking functionality is used.

Evaluation and Scoring with Masking

Mask arrays are also important when doing scoring and evaluation (i.e., when evaluating the accuracy of a RNN classifier). Consider for example the many-to-one case: there is only a single output for each example, and any evaluation should take this into account.

The (output) mask arrays can be used during evaluation by passing them to the following method:

Evaluation.evalTimeSeries(INDArray labels, INDArray predicted, INDArray outputMask)

where labels are the actual output (3d time series), predicted is the network predictions (3d time series, same shape as labels), and outputMask is the 2d mask array for the output. Note that the input mask array is not required for evaluation.

Score calculation will also make use of the mask arrays, via the MultiLayerNetwork.score(DataSet) method. Again, if the DataSet contains an output masking array, it will automatically be used when calculating the score (loss function - mean squared error, negative log likelihood etc) for the network.
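As a sketch of evaluation with an output mask (assuming a trained net, 3d features and labels arrays, and a 2d labelsMask as described above; the class count is hypothetical):

Evaluation evaluation = new Evaluation(3);                  //3 output classes, for example
INDArray predicted = net.output(features, false);           //3d time series predictions
evaluation.evalTimeSeries(labels, predicted, labelsMask);   //only unmasked time steps are evaluated
System.out.println(evaluation.stats());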

Masking and Sequence Classification After Training

Sequence classification is one common use of masking. The idea is that although we have a sequence (time series) as input, we only want to provide a single label for the entire sequence (rather than one label at each time step in the sequence).

However, RNNs by design output sequences of the same length as the input sequence. For sequence classification, masking allows us to train the network with this single label at the final time step - we essentially tell the network that there isn't actually label data anywhere except for the last time step.

Now, suppose we've trained our network, and want to get the last time step for predictions, from the time series output array. How do we do that?

To get the last time step, there are two cases to be aware of. First, when we have a single example, we don't actually need to use the mask arrays: we can just get the last time step in the output array:

    INDArray timeSeriesFeatures = ...;
    INDArray timeSeriesOutput = myNetwork.output(timeSeriesFeatures);
    int timeSeriesLength = timeSeriesOutput.size(2);        //Size of time dimension
    INDArray lastTimeStepProbabilities = timeSeriesOutput.get(NDArrayIndex.point(0), NDArrayIndex.all(), NDArrayIndex.point(timeSeriesLength-1));

Assuming classification (same process for regression, however) the last line above gives us probabilities at the last time step - i.e., the class probabilities for our sequence classification.

The slightly more complex case is when we have multiple examples in the one minibatch (features array), where the lengths of each example differ. (If all are the same length: we can use the same process as above).

In this 'variable length' case, we need to get the last time step for each example separately. If we have the time series lengths for each example from our data pipeline, it becomes straightforward: we just iterate over examples, replacing the timeSeriesLength in the above code with the length of that example.

If we don't have the lengths of the time series directly, we need to extract them from the mask array.

If we have a labels mask array (which is a one-hot vector, like [0,0,0,1,0] for each time series):

    INDArray labelsMaskArray = ...;
    INDArray lastTimeStepIndices = Nd4j.argMax(labelsMaskArray,1);

Alternatively, if we have only the features mask, one quick and dirty approach is the following:

    INDArray featuresMaskArray = ...;
    int longestTimeSeries = featuresMaskArray.size(1);
    INDArray linspace = Nd4j.linspace(1,longestTimeSeries,longestTimeSeries);
    INDArray temp = featuresMaskArray.mulColumnVector(linspace);
    INDArray lastTimeStepIndices = Nd4j.argMax(temp,1);

To understand what is happening here, note that originally we have a features mask like [1,1,1,1,0], from which we want to get the last non-zero element. So we map [1,1,1,1,0] -> [1,2,3,4,0], and then get the largest element (which is the last time step).

In either case, we can then do the following:

    int numExamples = timeSeriesFeatures.size(0);
    for( int i=0; i<numExamples; i++ ){
        int thisTimeSeriesLastIndex = lastTimeStepIndices.getInt(i);
        INDArray thisExampleProbabilities = timeSeriesOutput.get(NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(thisTimeSeriesLastIndex));
    }

Combining RNN Layers with Other Layer Types

RNN layers in DL4J can be combined with other layer types. For example, it is possible to combine DenseLayer and LSTM layers in the same network; or combine Convolutional (CNN) layers and LSTM layers for video.

Of course, the DenseLayer and Convolutional layers do not handle time series data - they expect a different type of input. To deal with this, we need to use the layer preprocessor functionality: for example, the CnnToRnnPreProcessor and FeedForwardToRnnPreProcessor classes. See here for all preprocessors. Fortunately, in most situations, the DL4J configuration system will automatically add these preprocessors as required. However, the preprocessors can also be added manually (overriding the automatic addition of preprocessors, for each layer).

For example, to manually add a preprocessor between layers 1 and 2, add the following to your network configuration: .inputPreProcessor(2, new RnnToFeedForwardPreProcessor()).

Inference: Predictions One Step at a Time

As with other types of neural networks, predictions can be generated for RNNs using the MultiLayerNetwork.output() and MultiLayerNetwork.feedForward() methods. These methods can be useful in many circumstances; however, they have the limitation that we can only generate predictions for time series, starting from scratch each and every time.

Consider for example the case where we want to generate predictions in a real-time system, where these predictions are based on a very large amount of history. In this case, it is impractical to use the output/feedForward methods, as they conduct the full forward pass over the entire data history each time they are called. If we wish to make a prediction for a single time step, at every time step, these methods can be both (a) very costly, and (b) wasteful, as they do the same calculations over and over.

For these situations, MultiLayerNetwork provides four methods of note:

  • rnnTimeStep(INDArray)

  • rnnClearPreviousState()

  • rnnGetPreviousState(int layer)

  • rnnSetPreviousState(int layer, Map<String,INDArray> state)

The rnnTimeStep() method is designed to allow forward pass (predictions) to be conducted efficiently, one or more steps at a time. Unlike the output/feedForward methods, the rnnTimeStep method keeps track of the internal state of the RNN layers when it is called. It is important to note that output for the rnnTimeStep and the output/feedForward methods should be identical (for each time step), whether we make these predictions all at once (output/feedForward) or whether these predictions are generated one or more steps at a time (rnnTimeStep). Thus, the only difference should be the computational cost.

In summary, the MultiLayerNetwork.rnnTimeStep() method does two things:

  1. Generate output/predictions (forward pass), using the previous stored state (if any)

  2. Update the stored state, storing the activations for the last time step (ready to be used next time rnnTimeStep is called)

For example, suppose we want to use a RNN to predict the weather, one hour in advance (based on the weather at say the previous 100 hours as input). If we were to use the output method, at each hour we would need to feed in the full 100 hours of data to predict the weather for hour 101. Then to predict the weather for hour 102, we would need to feed in the full 100 (or 101) hours of data; and so on for hours 103+.

Alternatively, we could use the rnnTimeStep method. Of course, if we want to use the full 100 hours of history before we make our first prediction, we still need to do the full forward pass:

The first time we call rnnTimeStep, the only practical difference between the two approaches is that the activations/state of the last time step are stored - this is shown in orange. However, the next time we use the rnnTimeStep method, this stored state will be used to make the next predictions:

There are a number of important differences here:

  1. In the second image (second call of rnnTimeStep) the input data consists of a single time step, instead of the full history of data

  2. The forward pass is thus a single time step (as compared to the hundreds – or more)

  3. After the rnnTimeStep method returns, the internal state will automatically be updated. Thus, predictions for time 103 could be made in the same way as for time 102. And so on.

However, if you want to start making predictions for a new (entirely separate) time series: it is necessary (and important) to manually clear the stored state, using the MultiLayerNetwork.rnnClearPreviousState() method. This will reset the internal state of all recurrent layers in the network.

If you need to store or set the internal state of the RNN for use in predictions, you can use the rnnGetPreviousState and rnnSetPreviousState methods, for each layer individually. This can be useful for example during serialization (network saving/loading), as the internal network state from the rnnTimeStep method is not saved by default, and must be saved and loaded separately. Note that these get/set state methods return and accept a map, keyed by the type of activation. For example, in the LSTM model, it is necessary to store both the output activations, and the memory cell state.
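A short usage sketch (assuming a trained MultiLayerNetwork net with nIn input features; the shapes and variable names here are hypothetical):

INDArray history = Nd4j.rand(new int[]{1, nIn, 100});  //full history: [numExamples, nIn, timeSteps]
INDArray out = net.rnnTimeStep(history);               //forward pass; the last step's state is stored

INDArray nextInput = Nd4j.rand(1, nIn);                //a single new time step: 2d input
INDArray nextOut = net.rnnTimeStep(nextInput);         //uses the stored state from the call above

net.rnnClearPreviousState();                           //reset before starting a new sequence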

Some other points of note:

  • We can use the rnnTimeStep method for multiple independent examples/predictions simultaneously. In the weather example above, we might for example want to make predictions for multiple locations using the same neural network. This works in the same way as training and the forward pass / output methods: multiple rows (dimension 0 in the input data) are used for multiple examples.

  • If no history/stored state is set (i.e., initially, or after a call to rnnClearPreviousState), a default initialization (zeros) is used. This is the same approach as during training.

  • The rnnTimeStep can be used for an arbitrary number of time steps simultaneously – not just one time step. However, it is important to note:

    • For a single time step prediction: the data is 2 dimensional, with shape [numExamples,nIn]; in this case, the output is also 2 dimensional, with shape [numExamples,nOut]

    • For multiple time step predictions: the data is 3 dimensional, with shape [numExamples,nIn,numTimeSteps]; the output will have shape [numExamples,nOut,numTimeSteps]. Again, the final time step activations are stored as before.

  • It is not possible to change the number of examples between calls of rnnTimeStep (in other words, if the first use of rnnTimeStep is for say 3 examples, all subsequent calls must be with 3 examples). After resetting the internal state (using rnnClearPreviousState()), any number of examples can be used for the next call of rnnTimeStep.

  • The rnnTimeStep method makes no changes to the parameters; it is only used after training of the network has been completed.

  • The rnnTimeStep method works with networks containing single and stacked/multiple RNN layers, as well as with networks that combine other layer types (such as Convolutional or Dense layers).

  • The RnnOutputLayer layer type does not have any internal state, as it does not have any recurrent connections.

Loading Time Series Data

Data import for RNNs is complicated by the fact that we have multiple different types of data we could want to use for RNNs: one-to-many, many-to-one, variable length time series, etc. This section will describe the currently implemented data import mechanisms for DL4J.

The methods described here utilize the SequenceRecordReaderDataSetIterator class, in conjunction with the CSVSequenceRecordReader class from DataVec. This approach currently allows you to load delimited (tab, comma, etc) data from files, where each time series is in a separate file. This method also supports:

  • Variable length time series input

  • One-to-many and many-to-one data loading (where input and labels are in different files)

  • Label conversion from an index to a one-hot representation for classification (i.e., '2' to [0,0,1,0])

  • Skipping a fixed/specified number of rows at the start of the data files (i.e., comment or header rows)

Note that in all cases, each line in the data files represents one time step.

(In addition to the examples below, you might find these unit tests to be of some use.)

Example 1: Time Series of Same Length, Input and Labels in Separate Files

Suppose we have 10 time series in our training data, represented by 20 files: 10 files for the input of each time series, and 10 files for the output/labels. For now, assume these 20 files all contain the same number of time steps (i.e., same number of rows).

To use the SequenceRecordReaderDataSetIterator and CSVSequenceRecordReader approaches, we first create two CSVSequenceRecordReader objects, one for input and one for labels:

SequenceRecordReader featureReader = new CSVSequenceRecordReader(1, ",");
SequenceRecordReader labelReader = new CSVSequenceRecordReader(1, ",");

This particular constructor takes the number of lines to skip (1 row skipped here), and the delimiter (comma character used here).

Second, we need to initialize these two readers, by telling them where to get the data from. We do this with an InputSplit object. Suppose that our time series are numbered, with file names "myInput_0.csv", "myInput_1.csv", ..., "myLabels_0.csv", etc. One approach is to use the NumberedFileInputSplit:

featureReader.initialize(new NumberedFileInputSplit("/path/to/data/myInput_%d.csv", 0, 9));
labelReader.initialize(new NumberedFileInputSplit("/path/to/data/myLabels_%d.csv", 0, 9));

In this particular approach, the "%d" is replaced by the corresponding number, and the numbers 0 to 9 (both inclusive) are used.

Finally, we can create our SequenceRecordReaderDataSetIterator:

DataSetIterator iter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression);

This DataSetIterator can then be passed to MultiLayerNetwork.fit() to train the network.

The miniBatchSize argument specifies the number of examples (time series) in each minibatch. For example, with 10 files total, a miniBatchSize of 5 would give us two minibatches (DataSet objects) with 5 time series in each.

Note that:

  • For classification problems: numPossibleLabels is the number of classes in your data set. Use regression = false.

    • Labels data: one value per line, as a class index

    • Label data will be converted to a one-hot representation automatically

  • For regression problems: numPossibleLabels is not used (set it to anything) and use regression = true.

    • The number of values in the input and labels can be anything (unlike classification: can have an arbitrary number of outputs)

    • No processing of the labels is done when regression = true

Example 2: Time Series of Same Length, Input and Labels in Same File

Following on from the last example, suppose that instead of separate files for our input data and labels, we have both in the same file. However, each time series is still in a separate file.

As of DL4J 0.4-rc3.8, this approach has the restriction of a single column for the output (either a class index, or a single real-valued regression output)

In this case, we create and initialize a single reader. Again, we are skipping one header row, and specifying the format as comma delimited, and assuming our data files are named "myData_0.csv", ..., "myData_9.csv":

SequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
reader.initialize(new NumberedFileInputSplit("/path/to/data/myData_%d.csv", 0, 9));
DataSetIterator iterClassification = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, numPossibleLabels, labelIndex, false);

miniBatchSize and numPossibleLabels are the same as the previous example. Here, labelIndex specifies which column the labels are in. For example, if the labels are in the fifth column, use labelIndex = 4 (i.e., columns are indexed 0 to numColumns-1).

For regression on a single output value, we use:

DataSetIterator iterRegression = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, -1, labelIndex, true);

Again, the numPossibleLabels argument is not used for regression.

Example 3: Time Series of Different Lengths (Many-to-Many)

Following on from the previous two examples, suppose that for each example individually, the input and labels are of the same length, but these lengths differ between time series.

We can use the same approach (CSVSequenceRecordReader and SequenceRecordReaderDataSetIterator), though with a different constructor:

DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

The arguments here are the same as in the previous example, with the exception of the AlignmentMode.ALIGN_END addition. This alignment mode tells the SequenceRecordReaderDataSetIterator to expect two things:

  1. That the time series may be of different lengths

  2. To align the input and labels - for each example individually - such that their last values occur at the same time step.

Note that if the features and labels are always of the same length (as is the assumption in example 3), then the two alignment modes (AlignmentMode.ALIGN_END and AlignmentMode.ALIGN_START) will give identical outputs. The alignment mode option is explained in the next section.

Also note that variable length time series always start at time zero in the data arrays: padding, if required, will be added after the time series has ended.

Unlike examples 1 and 2 above, the DataSet objects produced by the above variableLengthIter instance will also include input and label masking arrays, as described earlier in this document.

Example 4: Many-to-One and One-to-Many Data

We can also use the AlignmentMode functionality in example 3 to implement a many-to-one RNN sequence classifier. Here, let us assume:

  • Input and labels are in separate delimited files

  • The labels files contain a single row (time step) (either a class index for classification, or one or more numbers for regression)

  • The input lengths may (optionally) differ between examples

In fact, the same approach as in example 3 can do this:

DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

Alignment modes are relatively straightforward. They specify whether to pad the start or the end of the shorter time series. The diagram below shows how this works, along with the masking arrays (as discussed earlier in this document):

The one-to-many case (similar to the last case above, but with only one input) is done by using AlignmentMode.ALIGN_START.

Note that in the case of training data that contains time series of different lengths, the labels and inputs will be aligned for each example individually, and then the shorter time series will be padded as required:

Available layers

LSTM

[source]

LSTM recurrent neural network layer without peephole connections. Supports CuDNN acceleration - see cuDNN for details

RnnLossLayer

[source]

Recurrent Neural Network Loss Layer. Handles calculation of gradients etc. for various objective (loss) functions. Note that unlike RnnOutputLayer, RnnLossLayer does not have any parameters - there is no time distributed dense component here. Consequently, the output activations size is equal to the input size. Input and output activations are the same as for other RNN layers: 3 dimensions with shape [miniBatchSize,nIn,timeSeriesLength] and [miniBatchSize,nOut,timeSeriesLength] respectively. Note that RnnLossLayer also has the option to configure an activation function

setNIn

public void setNIn(int nIn)
  • param nIn Not supported - RnnLossLayer has no parameters, so the number of inputs cannot be set

RnnOutputLayer

[source]

Recurrent Neural Network Output Layer. Handles activations of shape [minibatch,nOut,sequenceLength] and labels of shape [minibatch,nOut,sequenceLength]. It also supports mask arrays. Note that RnnOutputLayer can also be used for 1D CNN layers, which also have [minibatch,nOut,sequenceLength] activations/labels shape.

build

public RnnOutputLayer build()
  • param lossFunction Loss function for the output layer

Bidirectional

[source]

Bidirectional is a “wrapper” layer: it wraps any uni-directional RNN layer to make it bidirectional. Note that multiple different modes are supported - these specify how the activations should be combined from the forward and backward passes. Note also that the forward and backward copies of the wrapped RNN layer each have separate parameters.

getNOut

public long getNOut()

This Mode enumeration defines how the activations for the forward and backward networks should be combined, where ‘forward’ is the activations for the forward RNN, and ‘backward’ is the activations for the backward RNN:

  • ADD: out = forward + backward (elementwise addition)

  • MUL: out = forward * backward (elementwise multiplication)

  • AVERAGE: out = 0.5 * (forward + backward)

  • CONCAT: concatenate the activations

In all cases except CONCAT, the output activations size is the same as the standard RNN that is being wrapped by this layer. In the CONCAT case, the output activations size (dimension 1) is 2x larger than the standard RNN’s activations array.
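For example, a minimal sketch of wrapping an LSTM (the sizes here are hypothetical):

new Bidirectional(Bidirectional.Mode.CONCAT, new LSTM.Builder().nIn(10).nOut(10).build())

With CONCAT mode, the effective output size of this layer is 20 (2x the nOut of the wrapped layer).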

getUpdaterByParam

public IUpdater getUpdaterByParam(String paramName)

Get the updater for the given parameter. Typically the same updater will be used for all parameters, but this is not necessarily the case

  • param paramName Parameter name

  • return IUpdater for the parameter

LastTimeStep

[source]

LastTimeStep is a “wrapper” layer: it wraps any RNN (or CNN1D) layer, and extracts out the last time step during forward pass, and returns it as a row vector (per example). That is, for 3d (time series) input (with shape [minibatch, layerSize, timeSeriesLength]), we take the last time step and return it as a 2d array with shape [minibatch, layerSize]. Note that the last time step operation takes into account any mask arrays, if present: thus, variable length time series (in the same minibatch) are handled as expected here.
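For example, a brief sketch of wrapping an LSTM so that only the final time step is returned (the sizes here are hypothetical):

new LastTimeStep(new LSTM.Builder().nIn(10).nOut(10).build())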

SimpleRnn

[source]

Simple RNN layer - the so-called “vanilla” RNN, implementing out_t = activationFn(in_t * inWeight + out_(t-1) * recurrentWeights + bias).

Note that other architectures (LSTM, etc.) are usually much more effective, especially for longer time series; however SimpleRnn is very fast to compute, and hence may be considered where the temporal dependencies in the data are only a few time steps long.

Layers

Supported neural network layers.

What are layers?

Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they form a deep neural network.

Using layers

All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork or ComputationGraph. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.

Layers vs. vertices

If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph API and join different branches together using vertices. See the vertices documentation for more information.

General layers

ActivationLayer

[source]

Activation layer is a simple layer that applies the specified activation function to the input activations

clone

public ActivationLayer clone()
  • return A clone of this ActivationLayer configuration

activation

public Builder activation(String activationFunction)

Activation function for the layer

activation

public Builder activation(IActivation activationFunction)
  • param activationFunction Activation function for the layer

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

DenseLayer

[source]

Dense layer: a standard fully connected feed forward layer

hasBias

public Builder hasBias(boolean hasBias)

If true (default): include bias parameters in the model. False: no bias.

hasLayerNorm

public Builder hasLayerNorm(boolean hasLayerNorm)

If true (default = false): enable layer normalization on this layer

DropoutLayer

[source]

Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test time

build

public DropoutLayer build()

Create a dropout layer with standard Dropout, with the specified probability of retaining the input activation. See Dropout for the full details

  • param dropout Activation retain probability.
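For instance, a sketch of a dropout layer that retains each input activation with probability 0.8 at training time:

new DropoutLayer.Builder(0.8).build()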

EmbeddingLayer

[source]

Embedding layer: a feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses-1), instead of the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero). Note: can only be used as the first layer for a network. Note 2: for a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each example. Note also that the embedding layer has an activation function (set to IDENTITY to disable) and an optional bias (which is disabled by default)

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

  • param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

  • param vectors Vectors to initialize the embedding layer with
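A minimal configuration sketch combining the options above (vocabSize, vectorSize and pretrainedVectors are hypothetical; pretrainedVectors is an INDArray of shape [vocabSize, vectorSize]):

EmbeddingLayer embedding = new EmbeddingLayer.Builder()
    .nIn(vocabSize)                 //number of classes/indices
    .nOut(vectorSize)               //embedding dimension
    .activation(Activation.IDENTITY)
    .hasBias(false)
    .weightInit(pretrainedVectors)  //initialize from pre-trained vectors
    .build();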

EmbeddingSequenceLayer

[source]

Embedding layer for sequences: a feed-forward layer that expects a fixed-length sequence of integers/indices (of length inputLength) per example as input, ranging from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Note: can only be used as the first layer for a network. Note 2: for a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index. Note also that the embedding layer has an activation function (set to IDENTITY to disable) and an optional bias (which is disabled by default)

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

inputLength

public Builder inputLength(int inputLength)

Set input sequence length for this embedding layer.

  • param inputLength input sequence length

  • return Builder

inferInputLength

public Builder inferInputLength(boolean inferInputLength)

Set input sequence inference mode for embedding layer.

  • param inferInputLength whether to infer input length

  • return Builder

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

  • param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

  • param vectors Vectors to initialize the embedding layer with

GlobalPoolingLayer

[source]

Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: MAX, AVG, SUM, PNORM

Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass:

  • Time series: mask arrays are shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only

  • CNNs: masks have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.

Behaviour with default settings:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s: this gives

  • [miniBatchSize, vectorSize, 1] for RNN output,

  • [miniBatchSize, channels, 1, 1] for CNN output, and

  • [miniBatchSize, channels, 1, 1, 1] for CNN3D output.
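For example, a sketch of max pooling over the time dimension of RNN activations, with the default dimension collapsing:

new GlobalPoolingLayer.Builder(PoolingType.MAX)
    .collapseDimensions(true)
    .build()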

poolingDimensions

public Builder poolingDimensions(int... poolingDimensions)

Pooling dimensions to use for global pooling

poolingType

public Builder poolingType(PoolingType poolingType)
  • param poolingType Pooling type for global pooling

collapseDimensions

public Builder collapseDimensions(boolean collapseDimensions)

Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true. If true:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

If false:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 4d output [miniBatchSize, channels, 1, 1]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 5d output [miniBatchSize, channels, 1, 1, 1]

  • param collapseDimensions Whether to collapse the dimensions or not

pnorm

public Builder pnorm(int pnorm)

P-norm constant. Only used if using PoolingType.PNORM as the pooling type

  • param pnorm P-norm constant

LocalResponseNormalization

[source]

Local response normalization layer. See section 3.3 of http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

k

public Builder k(double k)

LRN scaling constant k. Default: 2

n

public Builder n(double n)

Number of adjacent kernel maps to use when doing LRN. default: 5

  • param n Number of adjacent kernel maps

alpha

public Builder alpha(double alpha)

LRN scaling constant alpha. Default: 1e-4

  • param alpha Scaling constant

beta

public Builder beta(double beta)

Scaling constant beta. Default: 0.75

  • param beta Scaling constant

cudnnAllowFallback

public Builder cudnnAllowFallback(boolean allowFallback)

When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception in CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for LocalResponseNormalization will be used instead

  • param allowFallback Whether fallback to non-CuDNN implementation should be used
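A configuration sketch using the default hyperparameters described above:

new LocalResponseNormalization.Builder()
    .k(2).n(5).alpha(1e-4).beta(0.75)
    .build()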

LocallyConnected1D

[source]

SameDiff version of a 1D locally connected layer.

nIn

public Builder nIn(int nIn)

Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)
  • param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

kernelSize

public Builder kernelSize(int k)
  • param k Kernel size for the layer

stride

public Builder stride(int s)
  • param s Stride for the layer

padding

public Builder padding(int p)
  • param p Padding for the layer. Not used if ConvolutionMode.Same is set

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)
  • param cm Convolution mode for the layer. See ConvolutionMode for details

dilation

public Builder dilation(int d)
  • param d Dilation for the layer

hasBias

public Builder hasBias(boolean hasBias)
  • param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int inputSize)

Set input filter size for this locally connected 1D layer

  • param inputSize height of the input filters

  • return Builder

LocallyConnected2D

[source]

SameDiff version of a 2D locally connected layer.

setKernel

public void setKernel(int... kernel)

Kernel size for the layer. Must be 2 values (height/width)

setStride

public void setStride(int... stride)
  • param stride Stride for the layer. Must be 2 values (height/width)

setPadding

public void setPadding(int... padding)
  • param padding Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

setDilation

public void setDilation(int... dilation)
  • param dilation Dilation for the layer. Must be 2 values (height/width)

nIn

public Builder nIn(int nIn)
  • param nIn Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)
  • param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

kernelSize

public Builder kernelSize(int... k)
  • param k Kernel size for the layer. Must be 2 values (height/width)

stride

public Builder stride(int... s)
  • param s Stride for the layer. Must be 2 values (height/width)

padding

public Builder padding(int... p)
  • param p Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)
  • param cm Convolution mode for the layer. See ConvolutionMode for details

dilation

public Builder dilation(int... d)
  • param d Dilation for the layer. Must be 2 values (height/width)

hasBias

public Builder hasBias(boolean hasBias)
  • param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int... inputSize)

Set input filter size (h,w) for this locally connected 2D layer

  • param inputSize pair of height and width of the input filters to this layer

  • return Builder

LossLayer

[source]

LossLayer is a flexible output layer that applies a loss function to an input without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn't supported - the output size is the same size as the input activations.

Builder

public Builder(LossFunctions.LossFunction lossFunction)
  • param lossFunction Loss function for the loss layer

OutputLayer

[source]

Output layer used for training via backpropagation, based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it effectively contains a fully-connected DenseLayer internally. This allows the output size to be different from the layer input size.

build

public OutputLayer build()
  • param lossFunction Loss function for the output layer
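
For example, a typical classification output layer might be configured like this (the loss function and sizes are illustrative):

OutputLayer out = new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(100)                          // size of the previous layer
        .nOut(10)                          // number of classes
        .activation(Activation.SOFTMAX)
        .build();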

Pooling1D

[source]

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Pooling2D

[source]

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Subsampling1DLayer

[source]

1D (temporal) subsampling layer. Expects input of shape [minibatchSize, nIn, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.

Supports the following pooling types: MAX, AVG, SUM, PNORM

setKernelSize

public void setKernelSize(int... kernelSize)

Kernel size

  • param kernelSize kernel size

setStride

public void setStride(int... stride)

Stride

  • param stride stride value

setPadding

public void setPadding(int... padding)

Padding

  • param padding padding value

Upsampling1D

[source]

Upsampling 1D layer. Repeats each step of the input size times along the sequence/temporal axis. For input of shape [minibatch, channels, sequenceLength], output has shape [minibatch, channels, size * sequenceLength]. Example:

If input (for a single example, with channels down page, and sequence from left to right) is:
[ A1, A2, A3]
[ B1, B2, B3]
Then output with size = 2 is:
[ A1, A1, A2, A2, A3, A3]
[ B1, B1, B2, B2, B3, B3]

size

public Builder size(int size)

Upsampling size

  • param size upsampling size in single spatial dimension of this 1D layer

size

public Builder size(int[] size)

Upsampling size int array with a single element. Array must be length 1

  • param size upsampling size in single spatial dimension of this 1D layer

Upsampling2D

[source]

Upsampling 2D layer. Repeats each value (or rather, set of depth values) in the height and width dimensions by size[0] and size[1] times, respectively. For example:

Input (slice for one example and channel)
[ A, B ]
[ C, D ]
Size = [2, 2]
Output (slice for one example and channel)
[ A, A, B, B ]
[ A, A, B, B ]
[ C, C, D, D ]
[ C, C, D, D ]

size

public Builder size(int size)

Upsampling size int, used for both height and width

  • param size upsampling size in height and width dimensions

size

public Builder size(int[] size)

Upsampling size array

  • param size upsampling size in height and width dimensions
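
As a minimal sketch (assuming the Upsampling2D.Builder(int size) constructor), the 2x2 upsampling shown above could be configured as:

Upsampling2D upsampling = new Upsampling2D.Builder(2)   // size = [2, 2]: repeat each value twice in height and width
        .build();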

Upsampling3D

[source]

Upsampling 3D layer. Repeats each value (all channel values for each x/y/z location) by size[0], size[1] and size[2] times, respectively. For input of shape [minibatch, channels, depth, height, width], output has shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].

size

public Builder size(int size)

Upsampling size as int, so same upsampling size is used for depth, width and height

  • param size upsampling size in height, width and depth dimensions

size

public Builder size(int[] size)

Upsampling size as an int array. Must be length 3, specifying the upsampling size for the depth, height and width dimensions

  • param size upsampling size in height, width and depth dimensions

ZeroPadding1DLayer

[source]

Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for left and right.

setPadding

public void setPadding(int... padding)

Padding value for left and right. Must be length 2 array

build

public ZeroPadding1DLayer build()
  • param padding Padding for both the left and right

ZeroPadding3DLayer

[source]

Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.

setPadding

public void setPadding(int... padding)

Padding array: [padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]

build

public ZeroPadding3DLayer build()
  • param padding Padding for both the left and right in all three spatial dimensions

ZeroPaddingLayer

[source]

Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right

setPadding

public void setPadding(int... padding)

Padding value for top, bottom, left, and right. Must be length 4 array

build

public ZeroPaddingLayer build()
  • param padHeight Padding for both the top and bottom

  • param padWidth Padding for both the left and right

ElementWiseMultiplicationLayer

[source]

Element-wise multiplication layer with weights: implements out = activationFn(input .* w + b), where:

  • w is a learnable weight vector of length nOut

  • “.*” is element-wise multiplication

  • b is a bias vector

Note that the input and output sizes of the element-wise layer are the same for this layer


getMemoryReport

public LayerMemoryReport getMemoryReport(InputType inputType)

This is a report of the estimated memory consumption for the given layer

  • param inputType Input type to the layer. Memory consumption is often a function of the input type

  • return Memory report for the layer

RepeatVector

[source]

RepeatVector layer configuration.

RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which x is repeated n times.

getRepetitionFactor

public int getRepetitionFactor()

Get the repetition factor for the RepeatVector layer

setRepetitionFactor

public void setRepetitionFactor(int n)

Set repetition factor for RepeatVector layer

  • param n Number of times to repeat each input vector

repetitionFactor

public Builder repetitionFactor(int n)

Set repetition factor for RepeatVector layer

  • param n Number of times to repeat each input vector

Yolo2OutputLayer

[source]

Output (loss) layer for YOLOv2 object detection model, based on the papers: YOLO9000: Better, Faster, Stronger - Redmon & Farhadi (2016) - https://arxiv.org/abs/1612.08242 and You Only Look Once: Unified, Real-Time Object Detection - Redmon et al. (2016) - http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.

Note: Input activations to the Yolo2OutputLayer should have shape: [minibatch, b(5+c), H, W], where: b = number of bounding boxes (determined by config - see papers for details) c = number of classes H = output/label height W = output/label width

Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have an output depth of b(5+c). Thus, if you change the number of bounding boxes or the number of object classes, the number of channels (nOut of the last convolution layer) needs to change as well. Label format: [minibatch, 4+C, H, W], with the labels depth ordered as [x1,y1,x2,y2,(class labels)], where (x1,y1) is the box top-left position and (x2,y2) is the box bottom-right position. Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left and (13,13) is bottom right. Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
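
A short sketch of the channel arithmetic and configuration this implies; the prior values, class count, and priors array here are illustrative assumptions:

// Hypothetical example: 2 bounding box priors and 20 object classes
int b = 2;
int c = 20;
int lastConvDepth = b * (5 + c);   // = 50: required nOut (channels) of the conv layer feeding this output layer

// Priors as fractions of grid size: one [width, height] row per box -> shape [b, 2]
INDArray priors = Nd4j.create(new double[][]{{1.5, 1.5}, {3.0, 3.0}});

ConvolutionLayer lastConv = new ConvolutionLayer.Builder(1, 1)
        .nOut(lastConvDepth)
        .activation(Activation.IDENTITY)
        .build();

Yolo2OutputLayer yoloOutput = new Yolo2OutputLayer.Builder()
        .boundingBoxPriors(priors)
        .build();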

lambdaCoord

public Builder lambdaCoord(double lambdaCoord)

Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5

lambbaNoObj

public Builder lambbaNoObj(double lambdaNoObj)

Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5

  • param lambdaNoObj Lambda value for no-object (confidence) component of the loss function

lossPositionScale

public Builder lossPositionScale(ILossFunction lossPositionScale)

Loss function for position/scale component of the loss function

  • param lossPositionScale Loss function for position/scale

lossClassPredictions

public Builder lossClassPredictions(ILossFunction lossClassPredictions)

Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper); however, LossMCXENT could also be used (which is more common for classification).

  • param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function

boundingBoxPriors

public Builder boundingBoxPriors(INDArray boundingBoxes)

Bounding box priors dimensions [width, height]. For N bounding boxes, input has shape [rows, columns] = [N, 2]. Note that dimensions should be specified as a fraction of the grid size. For example, in a network with a 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.

  • param boundingBoxes Bounding box prior dimensions (width, height)

MaskLayer

[source]

MaskLayer applies the mask array to the forward-pass activations and the backward-pass gradients passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.

MaskZeroLayer

[source]

Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].

Word2vec/Glove/Doc2Vec

Neural word embeddings for NLP in DL4J.

Word2Vec, Doc2vec & GloVe: Neural Word Embeddings for Natural Language Processing

Contents

  • Introduction

  • Neural Word Embeddings

  • Amusing Word2vec Results

  • Just Give Me the Code

  • Anatomy of Word2Vec

  • Setup, Load and Train

  • A Code Example

  • Troubleshooting & Tuning Word2Vec

  • Word2vec Use Cases

  • Foreign Languages

  • GloVe (Global Vectors) & Doc2Vec

Introduction to Word2Vec

Word2vec is a two-layer neural net that processes text. Its input is a text corpus and its output is a set of vectors: feature vectors for words in that corpus. While Word2vec is not a deep neural network, it turns text into a numerical form that deep nets can understand.

Word2vec's applications extend beyond parsing sentences in the wild. It can be applied just as well to genes, code, likes, playlists, social media graphs and other verbal or symbolic series in which patterns may be discerned.

Why? Because words are simply discrete states like the other data mentioned above, and we are simply looking for the transitional probabilities between those states: the likelihood that they will co-occur. So gene2vec, like2vec and follower2vec are all possible. With that in mind, the tutorial below will help you understand how to create neural embeddings for any group of discrete and co-occurring states.

The purpose and usefulness of Word2vec is to group the vectors of similar words together in vectorspace. That is, it detects similarities mathematically. Word2vec creates vectors that are distributed numerical representations of word features, features such as the context of individual words. It does so without human intervention.

Given enough data, usage and contexts, Word2vec can make highly accurate guesses about a word’s meaning based on past appearances. Those guesses can be used to establish a word's association with other words (e.g. "man" is to "boy" what "woman" is to "girl"), or cluster documents and classify them by topic. Those clusters can form the basis of search, sentiment analysis and recommendations in such diverse fields as scientific research, legal discovery, e-commerce and customer relationship management.

The output of the Word2vec neural net is a vocabulary in which each item has a vector attached to it, which can be fed into a deep-learning net or simply queried to detect relationships between words.

Measured by cosine similarity, no similarity is expressed as a 90-degree angle, while total similarity of 1 is a 0-degree angle, i.e. complete overlap: Sweden equals Sweden, while Norway has a cosine similarity of 0.760124 with Sweden, the highest of any country.

Here's a list of words associated with "Sweden" using Word2vec, in order of proximity:

The nations of Scandinavia and several wealthy, northern European, Germanic countries are among the top nine.

Neural Word Embeddings

The vectors we use to represent words are called neural word embeddings, and representations are strange. One thing describes another, even though those two things are radically different. As Elvis Costello said: "Writing about music is like dancing about architecture." Word2vec "vectorizes" about words, and by doing so it makes natural language computer-readable -- we can start to perform powerful mathematical operations on words to detect their similarities.

So a neural word embedding represents a word with numbers. It's a simple, yet unlikely, translation.

Word2vec is similar to an autoencoder, encoding each word in a vector, but rather than training against the input words through reconstruction, word2vec trains words against other words that neighbor them in the input corpus.

It does so in one of two ways, either using context to predict a target word (a method known as continuous bag of words, or CBOW), or using a word to predict a target context, which is called skip-gram. We use the latter method because it produces more accurate results on large datasets.

When the feature vector assigned to a word cannot be used to accurately predict that word's context, the components of the vector are adjusted. Each word's context in the corpus is the teacher sending error signals back to adjust the feature vector. The vectors of words judged similar by their context are nudged closer together by adjusting the numbers in the vector.

Just as Van Gogh's painting of sunflowers is a two-dimensional mixture of oil on canvas that represents vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.

Those numbers locate each word as a point in 500-dimensional vectorspace. Spaces of more than three dimensions are difficult to visualize. (Geoff Hinton, teaching people to imagine 13-dimensional space, suggests that students first picture 3-dimensional space and then say to themselves: "Thirteen, thirteen, thirteen." :)

A well trained set of word vectors will place similar words close to each other in that space. The words oak, elm and birch might cluster in one corner, while war, conflict and strife huddle together in another.

Similar things and ideas are shown to be "close". Their relative meanings have been translated to measurable distances. Qualities become quantities, and algorithms can do their work. But similarity is just the basis of many associations that Word2vec can learn. For example, it can gauge relations between words of one language, and map them to another.

These vectors are the basis of a more comprehensive geometry of words. As shown in the graph, capital cities such as Rome, Paris, Berlin and Beijing cluster near each other, and they will each have similar distances in vectorspace to their countries; i.e. Rome - Italy = Beijing - China. If you only knew that Rome was the capital of Italy, and were wondering about the capital of China, then the equation Rome - Italy + China would return Beijing. No kidding.

Amusing Word2Vec Results

Let's look at some other associations Word2vec can produce.

Instead of the plus, minus and equals signs, we'll give you the results in the notation of logical analogies, where : means "is to" and :: means "as"; e.g. "Rome is to Italy as Beijing is to China" = Rome:Italy::Beijing:China. In the last spot, rather than supplying the "answer", we'll give you the list of words that a Word2vec model proposes, when given the first three elements:

king:queen::man:[woman, Attempted abduction, teenager, girl] 
//Weird, but you can kind of see it

China:Taiwan::Russia:[Ukraine, Moscow, Moldova, Armenia]
//Two large countries and their small, estranged neighbors

house:roof::castle:[dome, bell_tower, spire, crenellations, turrets]

knee:leg::elbow:[forearm, arm, ulna_bone]

New York Times:Sulzberger::Fox:[Murdoch, Chernin, Bancroft, Ailes]
//The Sulzberger-Ochs family owns and runs the NYT.
//The Murdoch family owns News Corp., which owns Fox News. 
//Peter Chernin was News Corp.'s COO for 13 yrs.
//Roger Ailes is president of Fox News. 
//The Bancroft family sold the Wall St. Journal to News Corp.

love:indifference::fear:[apathy, callousness, timidity, helplessness, inaction]
//the poetry of this single array is simply amazing...

Donald Trump:Republican::Barack Obama:[Democratic, GOP, Democrats, McCain]
//It's interesting to note that, just as Obama and McCain were rivals,
//so too, Word2vec thinks Trump has a rivalry with the idea Republican.

monkey:human::dinosaur:[fossil, fossilized, Ice_Age_mammals, fossilization]
//Humans are fossilized monkeys? Humans are what's left 
//over from monkeys? Humans are the species that beat monkeys
//just as Ice Age mammals beat dinosaurs? Plausible.

building:architect::software:[programmer, SecurityCenter, WinPcap]

This model was trained on the Google News vocab, which you can import and play with. Contemplate, for a moment, that the Word2vec algorithm has never been taught a single rule of English syntax. It knows nothing about the world, and is unassociated with any rules-based symbolic logic or knowledge graph. And yet it learns more, in a flexible and automated fashion, than most knowledge graphs will learn after years of human labor. It comes to the Google News documents as a blank slate, and by the end of training, it can compute complex analogies that mean something to humans.

You can also query a Word2vec model for other associations. Not everything has to be two analogies that mirror each other. (We explain how below....)

  • Geopolitics: Iraq - Violence = Jordan

  • Distinction: Human - Animal = Ethics

  • President - Power = Prime Minister

  • Library - Books = Hall

  • Analogy: Stock Market ≈ Thermometer

By building a sense of one word's proximity to other similar words, which do not necessarily contain the same letters, we have moved beyond hard tokens to a smoother and more general sense of meaning.

Just Give Me the Code

Anatomy of Word2vec in DL4J

Here are Deeplearning4j's natural-language processing components:

  • SentenceIterator/DocumentIterator: Used to iterate over a dataset. A SentenceIterator returns strings and a DocumentIterator works with inputstreams.

  • Tokenizer/TokenizerFactory: Used in tokenizing the text. In NLP terms, a sentence is represented as a series of tokens. A TokenizerFactory creates an instance of a tokenizer for a "sentence."

  • VocabCache: Used for tracking metadata including word counts, document occurrences, the set of tokens (not vocab in this case, but rather tokens that have occurred), vocab (the features included in both bag of words as well as the word vector lookup table)

  • Inverted Index: Stores metadata about where words occurred. Can be used for understanding the dataset. A Lucene index is automatically created with the Lucene implementation.

While Word2vec refers to a family of related algorithms, this implementation uses Negative Sampling.

Word2Vec Setup

Create a new project in IntelliJ using Maven. If you don't know how to do that, see our Quickstart page. Then specify these properties and dependencies in the POM.xml file in your project's root directory (You can check Maven for the most recent versions -- please use those...).

Loading Data

Now create and name a new class in Java. After that, you'll take the raw sentences in your .txt file, traverse them with your iterator, and subject them to some sort of preprocessing, such as converting all words to lowercase.

String filePath = new ClassPathResource("raw_sentences.txt").getFile().getAbsolutePath();

log.info("Load & Vectorize Sentences....");
// Strip white space before and after for each line
SentenceIterator iter = new BasicLineIterator(filePath);

If you want to load a text file besides the sentences provided in our example, you'd do this:

log.info("Load data....");
SentenceIterator iter = new LineSentenceIterator(new File("/Users/cvn/Desktop/file.txt"));
iter.setPreProcessor(new SentencePreProcessor() {
    @Override
    public String preProcess(String sentence) {
        return sentence.toLowerCase();
    }
});

That is, get rid of the ClassPathResource and feed the absolute path of your .txt file into the LineSentenceIterator.

SentenceIterator iter = new LineSentenceIterator(new File("/your/absolute/file/path/here.txt"));

In bash, you can find the absolute file path of any directory by typing pwd in your command line from within that same directory. To that path, you'll add the file name and voila.

Tokenizing the Data

Word2vec needs to be fed words rather than whole sentences, so the next step is to tokenize the data. To tokenize a text is to break it up into its atomic units, creating a new token each time you hit a white space, for example.

// Split on white spaces in the line to get words
TokenizerFactory t = new DefaultTokenizerFactory();
t.setTokenPreProcessor(new CommonPreprocessor());

That should give you one word per line.

Training the Model

Now that the data is ready, you can configure the Word2vec neural net and feed in the tokens.

log.info("Building model....");
Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)
        .layerSize(100)
        .seed(42)
        .windowSize(5)
        .iterate(iter)
        .tokenizerFactory(t)
        .build();

log.info("Fitting Word2Vec model....");
vec.fit();

This configuration accepts a number of hyperparameters. A few require some explanation:

  • batchSize is the number of words you process at a time.

  • minWordFrequency is the minimum number of times a word must appear in the corpus. Here, if it appears fewer than 5 times, it is not learned. Words must appear in multiple contexts to learn useful features about them. In very large corpora, it's reasonable to raise the minimum.

  • useAdaGrad - Adagrad creates a different gradient for each feature. Here we are not concerned with that.

  • layerSize specifies the number of features in the word vector. This is equal to the number of dimensions in the featurespace. Words represented by 500 features become points in a 500-dimensional space.

  • learningRate is the step size for each update of the coefficients, as words are repositioned in the feature space.

  • minLearningRate is the floor on the learning rate. Learning rate decays as the number of words you train on decreases. If learning rate shrinks too much, the net's learning is no longer efficient. This keeps the coefficients moving.

  • iterate tells the net what batch of the dataset it's training on.

  • tokenizer feeds it the words from the current batch.

  • vec.fit() tells the configured net to begin training.

Evaluating the Model, Using Word2vec

The next step is to evaluate the quality of your feature vectors.

// Write word vectors
WordVectorSerializer.writeWordVectors(vec, "pathToWriteto.txt");

log.info("Closest Words:");
Collection<String> lst = vec.wordsNearest("day", 10);
System.out.println(lst);
//output: [night, week, year, game, season, during, office, until, -]

The line vec.similarity("word1","word2") will return the cosine similarity of the two words you enter. The closer it is to 1, the more similar the net perceives those words to be (see the Sweden-Norway example above). For example:

double cosSim = vec.similarity("day", "night");
System.out.println(cosSim);
//output: 0.7704452276229858

With vec.wordsNearest("word1", numWordsNearest), the words printed to the screen allow you to eyeball whether the net has clustered semantically similar words. You can set the number of nearest words you want with the second parameter of wordsNearest. For example:

Collection<String> lst3 = vec.wordsNearest("man", 10);
System.out.println(lst3);
//output: [director, company, program, former, university, family, group, such, general]

Saving, Reloading & Using the Model

You'll want to save the model. The normal way to save models in Deeplearning4j is via the serialization utils (Java serialization is akin to Python pickling, converting an object into a series of bytes).

log.info("Save vectors....");
WordVectorSerializer.writeWord2VecModel(vec, "pathToSaveModel.txt");

This will save the vectors to a file called pathToSaveModel.txt that will appear in the root of the directory where Word2vec is trained. The output in the file should have one word per line, followed by a series of numbers that together are its vector representation.

To keep working with the vectors, simply call methods on vec like this:

Collection<String> kingList = vec.wordsNearest(Arrays.asList("king", "woman"), Arrays.asList("queen"), 10);

The classic example of Word2vec's arithmetic of words is "king - queen = man - woman" and its logical extension "king - queen + woman = man".

The example above will output the 10 nearest words to the vector king - queen + woman, which should include man. The first parameter for wordsNearest has to include the "positive" words king and woman, which have a + sign associated with them; the second parameter includes the "negative" word queen, which is associated with the minus sign (positive and negative here have no emotional connotation); the third is the length of the list of nearest words you would like to see. Remember to add this to the top of the file: import java.util.Arrays;.

Any number of combinations is possible, but they will only return sensible results if the words you query occurred with enough frequency in the corpus. Obviously, the ability to return similar words (or documents) is at the foundation of both search and recommendation engines.

You can reload the vectors into memory like this:

Word2Vec word2Vec = WordVectorSerializer.readWord2VecModel("pathToSaveModel.txt");

You can then use Word2vec as a lookup table:

WeightLookupTable weightLookupTable = word2Vec.lookupTable();
Iterator<INDArray> vectors = weightLookupTable.vectors();
INDArray wordVectorMatrix = word2Vec.getWordVectorMatrix("myword");
double[] wordVector = word2Vec.getWordVector("myword");

If the word isn't in the vocabulary, Word2vec returns zeros.
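
If zeros for unknown words are a problem for your application, you can guard lookups with hasWord(), which the WordVectors classes expose:

if (word2Vec.hasWord("myword")) {
    double[] wordVector = word2Vec.getWordVector("myword");
}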

Importing Word2vec Models

The Google News Corpus model we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.

If you trained with the C vectors or Gensim, this line will import the model.

File gModel = new File("/Developer/Vector Models/GoogleNews-vectors-negative300.bin.gz");
Word2Vec vec = WordVectorSerializer.readWord2VecModel(gModel);

Remember to add import java.io.File; to your imported packages.

With large models, you may run into trouble with your heap space. The Google model may take as much as 10G of RAM, and the JVM only launches with 256 MB of RAM, so you have to adjust your heap space. You can do that either with a bash_profile file (see our Troubleshooting section), or through IntelliJ itself:

//Click:
IntelliJ Preferences > Compiler > Command Line Options 
//Then paste:
-Xms1024m
-Xmx10g
-XX:MaxPermSize=2g

N-grams & Skip-grams

Words are read into the vector one at a time, and scanned back and forth within a certain range. Those ranges are n-grams, and an n-gram is a contiguous sequence of n items from a given linguistic sequence; it is the nth version of unigram, bigram, trigram, four-gram or five-gram. A skip-gram simply drops items from the n-gram.

The skip-gram representation popularized by Mikolov and used in the DL4J implementation has proven to be more accurate than other models, such as continuous bag of words, due to the more generalizable contexts generated.

This n-gram is then fed into a neural network to learn the significance of a given word vector; i.e. significance is defined as its usefulness as an indicator of certain larger meanings, or labels.

A Working Example

Please note : The code below may be outdated. For updated examples, please see our dl4j-examples repository on Github.

Now that you have a basic idea of how to set up Word2Vec, here's one example of how it can be used with DL4J's API:

After following the instructions in the Quickstart, you can open this example in IntelliJ and hit run to see it work. If you query the Word2vec model with a word that isn't contained in the training corpus, it will return null.

Troubleshooting & Tuning Word2Vec

Q: I get a lot of stack traces like this

java.lang.StackOverflowError: null
at java.lang.ref.Reference.<init>(Reference.java:254) ~[na:1.8.0_11]
at java.lang.ref.WeakReference.<init>(WeakReference.java:69) ~[na:1.8.0_11]
at java.io.ObjectStreamClass$WeakClassKey.<init>(ObjectStreamClass.java:2306) [na:1.8.0_11]
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:322) ~[na:1.8.0_11]
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134) ~[na:1.8.0_11]
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) ~[na:1.8.0_11]

A: Look inside the directory where you started your Word2vec application. This can, for example, be an IntelliJ project home directory or the directory where you typed Java at the command line. It should have some directories that look like:

ehcache_auto_created2810726831714447871diskstore  
ehcache_auto_created4727787669919058795diskstore
ehcache_auto_created3883187579728988119diskstore  
ehcache_auto_created9101229611634051478diskstore

You can shut down your Word2vec application and try to delete them.

Q: Not all of the words from my raw text data are appearing in my Word2vec object…

A: Try to raise the layer size via .layerSize() on your Word2Vec object like so

Word2Vec vec = new Word2Vec.Builder().layerSize(300).windowSize(5)
        .iterate(iter).tokenizerFactory(t).build();

Q: How do I load my data? Why does training take forever?

A: If all of your sentences have been loaded as one sentence, Word2vec training could take a very long time. That's because Word2vec is a sentence-level algorithm: sentence boundaries are very important, because co-occurrence statistics are gathered sentence by sentence. (For GloVe, sentence boundaries don't matter, because it looks at corpus-wide co-occurrence.) For many corpora, average sentence length is six words. That means that with a window size of 5 you have, say, 30 (random number here) rounds of skip-gram calculations. If you forget to specify your sentence boundaries, you may load a "sentence" that's 10,000 words long. In that case, Word2vec would attempt a full skip-gram cycle for the whole 10,000-word "sentence". In DL4J's implementation, a line is assumed to be a sentence. You need to plug in your own SentenceIterator and Tokenizer. By asking you to specify how your sentences end, DL4J remains language-agnostic. UimaSentenceIterator is one way to do that. It uses OpenNLP for sentence boundary detection.

Q: Why is there such a difference in performance when feeding whole documents as one "sentence" vs splitting into Sentences?

A: If the average sentence contains 6 words and the window size is 5, the theoretical maximum of 10 skip-gram rounds will be achieved on 0 words - no sentence is long enough to fill a full window around any word. Roughly 5 skip-gram rounds per word is the most such a sentence can provide.

But if your "sentence" is 1000k words long, you'll have 10 skip-gram rounds for every word in the sentence, excluding the first five and the last five. So you'll have to spend way more time building the model, and the co-occurrence statistics will be shifted due to the absence of sentence boundaries.

Q: How does Word2Vec Use Memory?

A: The major memory consumer in Word2vec is the weights matrix. The math is simple: NumberOfWords x NumberOfDimensions x 2 x DataTypeSize is the memory footprint.

So, if you build a Word2vec model for 100k words using floats and 100 dimensions, your memory footprint will be 100k x 100 x 2 x 4 (float size) = 80MB of RAM just for the matrix, plus some space for strings, variables, threads, etc.

If you load a pre-built model, it uses roughly half the RAM needed at build time, so that's 40MB of RAM.

The most popular model used so far is the Google News model. There are 3M words, and the vector size is 300. That gives us 3.6GB just to load the model. And you have to add 3M strings, which do not have a constant size in Java. So, a loaded model usually takes around 4-6GB, depending on JVM version/supplier, GC state and the phase of the moon.
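
The same arithmetic as a short, self-contained sanity check (the figures match the worked example above):

// Word2vec build-time memory estimate: NumberOfWords x NumberOfDimensions x 2 x DataTypeSize
long numWords = 100_000;     // vocabulary size
long dimensions = 100;       // vector dimensionality
long dataTypeSize = 4;       // bytes per float
long buildFootprint = numWords * dimensions * 2 * dataTypeSize;   // 80,000,000 bytes, i.e. ~80MB
long loadedFootprint = buildFootprint / 2;                        // a pre-built model needs roughly half
System.out.println(buildFootprint + " bytes while building, " + loadedFootprint + " bytes when loaded");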

Q: I did everything you said and the results still don't look right.

A: Make sure you're not running into normalization issues. Some tasks, like wordsNearest(), use normalized weights by default, while others require non-normalized weights. Pay attention to this difference.

Use Cases

Google Scholar keeps a running tally of the papers citing Deeplearning4j's implementation of Word2vec here.

Kenny Helsens, a data scientist based in Belgium, applied Deeplearning4j's implementation of Word2vec to the NCBI's Online Mendelian Inheritance In Man (OMIM) database. He then looked for the words most similar to alk, a known oncogene of non-small cell lung carcinoma, and Word2vec returned: "nonsmall, carcinomas, carcinoma, mapdkd." From there, he established analogies between other cancer phenotypes and their genotypes. This is just one example of the associations Word2vec can learn on a large corpus. The potential for discovering new aspects of important diseases has only just begun, and outside of medicine, the opportunities are equally diverse.

Andreas Klintberg trained Deeplearning4j's implementation of Word2vec on Swedish, and wrote a thorough walkthrough on Medium.

Word2Vec is especially useful in preparing text-based data for information retrieval and QA systems, which DL4J implements with deep autoencoders.

Marketers might seek to establish relationships among products to build a recommendation engine. Investigators might analyze a social graph to surface members of a single group, or other relations they might have to location or financial sponsorship.

Google's Word2vec Patent

Word2vec is a method of computing vector representations of words introduced by a team of researchers at Google led by Tomas Mikolov. Google hosts an open-source version of Word2vec released under an Apache 2.0 license. In 2014, Mikolov left Google for Facebook, and in May 2015, Google was granted a patent for the method, which does not abrogate the Apache license under which it has been released.

Foreign Languages

While words in all languages may be converted into vectors with Word2vec, and those vectors learned with Deeplearning4j, NLP preprocessing can be very language specific, and requires tools beyond our libraries. The Stanford Natural Language Processing Group has a number of Java-based tools for tokenization, part-of-speech tagging and named-entity recognition for languages such as Mandarin Chinese, Arabic, French, German and Spanish. For Japanese, NLP tools like Kuromoji are useful. Other foreign-language resources, including text corpora, are available here.

GloVe: Global Vectors

Loading and saving GloVe models in word2vec format can be done like so:

WordVectors wordVectors = WordVectorSerializer.loadTxtVectors(new File("glove.6B.50d.txt"));

Sequence Vectors

Deeplearning4j has a class called SequenceVectors, which is one level of abstraction above word vectors, and which allows you to extract features from any sequence, including social media profiles, transactions, proteins, etc. If data can be described as a sequence, it can be learned via skip-gram and hierarchical softmax with the AbstractVectors class. This is compatible with the DeepWalk algorithm, also implemented in Deeplearning4j.

Word2Vec Features on Deeplearning4j

  • Weights update after model serialization/deserialization was added. That is, you can update model state with, say, 200GB of new text by calling loadFullModel, adding TokenizerFactory and SentenceIterator to it, and calling fit() on the restored model.

  • Option for multiple datasources for vocab construction was added.

  • Epochs and Iterations can be specified separately, although they are both typically "1".

  • Word2Vec.Builder has this option: hugeModelExpected. If set to true, the vocab will be periodically truncated during the build.

  • While minWordFrequency is useful for ignoring rare words in the corpus, any number of words can be excluded to customize.

  • Two new WordVectorSerializer methods have been introduced: writeFullModel and loadFullModel. These save and load a full model state.

  • A decent workstation should be able to handle a vocab with a few million words. Deeplearning4j's Word2vec implementation can model a few terabytes of data on a single machine. Roughly, the math is: vectorSize * 4 * 3 * vocab.size().

Doc2vec & Other NLP Resources

  • DL4J Example of Text Classification With Word2vec & RNNs

  • DL4J Example of Text Classification With Paragraph Vectors

  • Doc2vec, or Paragraph Vectors, With Deeplearning4j

  • Thought Vectors, Natural Language Processing & the Future of AI

  • Quora: How Does Word2vec Work?

  • Quora: What Are Some Interesting Word2Vec Results?

  • Mikolov's Original Word2vec Code @Google

  • word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method; Yoav Goldberg and Omer Levy

  • Advances in Pre-Training Distributed Word Representations - by Mikolov et al

Word2Vec in Literature

It's like numbers are language, like all the letters in the language are turned into numbers, and so it's something that everyone understands the same way. You lose the sounds of the letters and whether they click or pop or touch the palate, or go ooh or aah, and anything that can be misread or con you with its music or the pictures it puts in your mind, all of that is gone, along with the accent, and you have a new understanding entirely, a language of numbers, and everything becomes as clear to everyone as the writing on the wall. So as I say there comes a certain time for the reading of the numbers.
    -- E.L. Doctorow, Billy Bathgate

DataSet Iterators

Data iteration tools for loading into neural networks.

What is an iterator?

A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data or automatic loading of common benchmarking datasets such as MNIST and IRIS.

Usage

For most use cases, initializing an iterator and passing it to a MultiLayerNetwork or ComputationGraph fit() method is all you need to begin training:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

// pass an MNIST data iterator that automatically fetches data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
model.fit(mnistTrain);

Many other methods also accept iterators for tasks such as evaluation:

// passing directly to the neural network
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
model.evaluate(mnistTest);

// using an evaluation class
Evaluation eval = new Evaluation(10); //create an evaluation object with 10 possible classes
while(mnistTest.hasNext()){
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatureMatrix()); //get the networks prediction
    eval.eval(next.getLabels(), output); //check the prediction against the true class
}

Available iterators

MnistDataSetIterator

[source]

MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see http://yann.lecun.com/exdb/mnist/

UciSequenceDataSetIterator

[source]

UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift

Details: https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series Data: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data Image: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg

UciSequenceDataSetIterator

public UciSequenceDataSetIterator(int batchSize)

Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123

  • param batchSize Minibatch size

Cifar10DataSetIterator

[source]

Cifar10DataSetIterator is an iterator for the CIFAR-10 dataset: 10 classes, 32x32 images with 3 channels (RGB)

This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: https://pjreddie.com/projects/cifar-10-dataset-mirror/.

Cifar10DataSetIterator

public Cifar10DataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

  • param batchSize Minibatch size for the iterator

IrisDataSetIterator

[source]

IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes https://archive.ics.uci.edu/ml/datasets/Iris

IrisDataSetIterator

public IrisDataSetIterator()

next

public DataSet next()

IrisDataSetIterator handles traversing through the Iris Data Set.

  • see https://archive.ics.uci.edu/ml/datasets/Iris

  • param batch Batch size

  • param numExamples Total number of examples

LFWDataSetIterator

[source]

LFW iterator - Labeled Faces from the Wild dataset See http://vis-www.cs.umass.edu/lfw/ 13233 images total, with 5749 classes.

LFWDataSetIterator

public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
                    PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
                    ImageTransform imageTransform, Random rng)

Create LFW data specific iterator

  • param batchSize the batch size of the examples

  • param numExamples the overall number of examples

  • param imgDim an array of height, width and channels

  • param numLabels the overall number of labels

  • param useSubset use a subset of the LFWDataSet

  • param labelGenerator path label generator to use

  • param train true if use train value

  • param splitTrainTest the percentage to split data for train and remainder goes to test

  • param imageTransform how to transform the image

  • param rng random number to lock in batch shuffling

TinyImageNetDataSetIterator

[source]

Tiny ImageNet is a subset of the ImageNet database. TinyImageNet is the default course challenge for CS231n at Stanford University.

Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.

See: http://cs231n.stanford.edu/ and https://tiny-imagenet.herokuapp.com/

TinyImageNetDataSetIterator

public TinyImageNetDataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

  • param batchSize Minibatch size for the iterator

EmnistDataSetIterator

[source]

EMNIST DataSetIterator

  • COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes

  • MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z

  • BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)

  • LETTERS: 145,600 examples total. 26 balanced classes

  • DIGITS: 280,000 examples total. 10 balanced classes

See: https://www.nist.gov/itl/iad/image-group/emnist-dataset and https://arxiv.org/abs/1702.05373

EmnistDataSetIterator

public EmnistDataSetIterator(Set dataSet, int batch, boolean train) throws IOException

EMNIST dataset has multiple different subsets. See the EmnistDataSetIterator Javadoc for details.
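
For example, an iterator over the BALANCED training split with minibatches of 32 can be created like so:

// may throw IOException while downloading/loading the dataset
DataSetIterator emnistTrain = new EmnistDataSetIterator(EmnistDataSetIterator.Set.BALANCED, 32, true);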

numExamplesTrain

public static int numExamplesTrain(Set dataSet)

Get the number of training examples for the specified subset

  • param dataSet Dataset (subset) to check

  • return Number of training examples for the specified subset

numExamplesTest

public static int numExamplesTest(Set dataSet)

Get the number of test examples for the specified subset

  • param dataSet Subset to get

  • return Number of examples for the specified subset

numLabels

public static int numLabels(Set dataSet)

Get the number of labels for the specified subset

  • param dataSet Subset to get

  • return Number of labels for the specified subset

isBalanced

public static boolean isBalanced(Set dataSet)

Are the labels for the specified subset balanced (i.e., an equal number of examples per class)?

  • return true if the specified subset is balanced

RecordReaderDataSetIterator

[source]

Record reader dataset iterator. Takes a DataVec RecordReader as input and handles the conversion to DataSet objects as well as producing minibatches from individual records.

Example 1: Image classification, batch size 32, 10 classes

rr.initialize(new FileSplit(new File("/path/to/directory")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
//Label index (first arg): Always value 1 when using ImageRecordReader. For CSV etc: use index of the column
//  that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
// at 0. Number of classes (second arg): number of label classes (i.e., 10 for MNIST - 10 digits)
.classification(1, nClasses)
.preProcessor(new ImagePreProcessingScaler())      //For normalization of image values 0-255 to 0-1
.build()
}

Example 2: Multi-output regression from CSV, batch size 128

rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
//Specify the columns that the regression labels/targets appear in. Note that all other columns will be
// treated as features. Columns indexes start at 0
.regression(labelColFrom, labelColTo)
.build()
}

RecordReaderDataSetIterator

public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize)

Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels() Note that if RecordReader.getLabels() returns null, no output labels will be produced

  • param recordReader Record reader to use as the source of data

  • param batchSize Minibatch size, for each call of .next()

RecordReaderDataSetIterator

public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndex,
                    int numPossibleLabels)

Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.

  • param recordReader RecordReader: provides the source of the data

  • param batchSize Batch size (number of examples) for the output DataSet objects

  • param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()

  • param numPossibleLabels Number of classes (possible labels) for classification

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return DataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a DataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor

  • return DataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

writableConverter

public Builder writableConverter(WritableConverter converter)

Builder class for RecordReaderDataSetIterator

maxNumBatches

public Builder maxNumBatches(int maxNumBatches)

Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.

  • param maxNumBatches Maximum number of minibatches per epoch / reset

regression

public Builder regression(int labelIndex)

Use this for single output regression (i.e., 1 output/regression target)

  • param labelIndex Column index that contains the regression target (indexes start at 0)

regression

public Builder regression(int labelIndexFrom, int labelIndexTo)

Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)

  • param labelIndexFrom Column index of the first regression target (indexes start at 0)

  • param labelIndexTo Column index of the last regression target (inclusive)

classification

public Builder classification(int labelIndex, int numClasses)

Use this for classification

  • param labelIndex Index of the column that contains the label (column indexes start at 0). The column should contain integer values, 0 to numClasses-1 inclusive

  • param numClasses Number of label classes (i.e., number of categories/classes in the dataset)

preProcessor

public Builder preProcessor(DataSetPreProcessor preProcessor)

Optional arg. Allows the preprocessor to be set

  • param preProcessor Preprocessor to use

collectMetaData

public Builder collectMetaData(boolean collectMetaData)

When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

  • param collectMetaData Whether metadata should be collected or not

RecordReaderMultiDataSetIterator

[source]

The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator).

RecordReaderMultiDataSetIterator

public RecordReaderMultiDataSetIterator build()

When dealing with time series data of different lengths, how should the input/label time series be aligned? For equal-length series, use EQUAL_LENGTH; for sequence classification, use ALIGN_END.
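
A minimal builder sketch under assumed inputs (the reader name "csv", the column indexes, and numClasses are illustrative):

RecordReader rr = new CSVRecordReader();
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));

int numClasses = 5;   // illustrative
MultiDataSetIterator iter = new RecordReaderMultiDataSetIterator.Builder(32)
        .addReader("csv", rr)                     // register the reader under a name
        .addInput("csv", 0, 3)                    // columns 0 to 3 inclusive as features
        .addOutputOneHot("csv", 4, numClasses)    // column 4 as a one-hot class label
        .build();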

loadFromMetaData

public MultiDataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a MultiDataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return MultiDataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public MultiDataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a MultiDataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record readers provided to the RecordReaderMultiDataSetIterator

  • return MultiDataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

SequenceRecordReaderDataSetIterator

[source]

Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels), generates time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with a different number of inputs vs. labels).

SequenceRecordReaderDataSetIterator

public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader, SequenceRecordReader labels,
                    int miniBatchSize, int numPossibleLabels)

Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.

  • param featuresReader SequenceRecordReader for the features

  • param labels Labels: assume a single value per time step, where values are integers in the range 0 to numPossibleLabels-1

  • param miniBatchSize Minibatch size for each call of next()

  • param numPossibleLabels Number of classes for the labels
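
For example, with features and labels stored in parallel numbered CSV sequence files (the paths, file counts, and label count here are illustrative):

SequenceRecordReader featuresReader = new CSVSequenceRecordReader(0, ",");
featuresReader.initialize(new NumberedFileInputSplit("/path/features_%d.csv", 0, 99));
SequenceRecordReader labelsReader = new CSVSequenceRecordReader(0, ",");
labelsReader.initialize(new NumberedFileInputSplit("/path/labels_%d.csv", 0, 99));

int numPossibleLabels = 6;   // illustrative
DataSetIterator iter = new SequenceRecordReaderDataSetIterator(featuresReader, labelsReader, 32, numPossibleLabels);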

hasNext

public boolean hasNext()

Returns true if the iteration has more elements.

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return DataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor

  • return DataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

AsyncMultiDataSetIterator

[source]

Async prefetching iterator wrapper for MultiDataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

next

public MultiDataSet next(int num)

We want to ensure that the background thread has the same thread->device affinity as the master thread

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

We want to ensure that the background thread has the same thread->device affinity as the master thread

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

  • return true if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the remove operation is not supported by this iterator

  • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

  • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

IteratorDataSetIterator

[source]

Wraps an Iterator<DataSet>, combining and splitting the input DataSet objects as required to get the specified batch size.

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

AsyncDataSetIterator

[source]

Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

AsyncDataSetIterator

public AsyncDataSetIterator(DataSetIterator baseIterator)

Create an Async iterator with the default queue size of 8

  • param baseIterator Underlying iterator to wrap and fetch asynchronously from
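
A minimal sketch of manual use (assuming an existing DataSetIterator named baseIter; as noted above, DL4J's fit methods normally do this wrapping for you):

// Assume baseIter is an existing DataSetIterator
AsyncDataSetIterator async = new AsyncDataSetIterator(baseIter); // default queue size of 8
while (async.hasNext()) {
    DataSet ds = async.next();
    // ... consume the minibatch ...
}
async.shutdown(); // release the background prefetch thread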

next

public DataSet next(int num)

Like the standard next method, but returns the specified number of examples

  • param num Number of examples to fetch

inputColumns

public int inputColumns()

Input columns for the dataset

  • return Number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return Number of labels

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators:

(a) Iterators that store their full contents in memory already
(b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents)
(c) Iterators that already implement some level of asynchronous prefetching
(d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

Ensures that the background thread has the same thread->device affinity as the master thread.

batch

public int batch()

Batch size

  • return The configured batch size

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

  • param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

  • return The preprocessor, or null if none is set

hasNext

public boolean hasNext()

Returns true if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the remove operation is not supported by this iterator

  • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

  • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

DoublesDataSetIterator

[source]

The first value in each pair is the features vector, the second is the labels. Supports generating 2d features/labels only.

DoublesDataSetIterator

public DoublesDataSetIterator(@NonNull Iterable<Pair<double[], double[]>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects
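
A hedged example of building a small XOR-style dataset. The Pair class is assumed to be ND4J's org.nd4j.common.primitives.Pair (the package has moved between versions); List/ArrayList are java.util.

List<Pair<double[], double[]>> data = new ArrayList<>();
data.add(new Pair<>(new double[]{0, 0}, new double[]{0}));
data.add(new Pair<>(new double[]{0, 1}, new double[]{1}));
data.add(new Pair<>(new double[]{1, 0}, new double[]{1}));
data.add(new Pair<>(new double[]{1, 1}, new double[]{0}));

DoublesDataSetIterator iter = new DoublesDataSetIterator(data, 2); // minibatches of 2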

IteratorMultiDataSetIterator

[source]

Wraps an Iterator<MultiDataSet>, combining and splitting the input MultiDataSet objects as required to get a specified batch size.

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

SamplingDataSetIterator

[source]

A wrapper that randomly samples examples from a given DataSet.

SamplingDataSetIterator

public SamplingDataSetIterator(DataSet sampleFrom, int batchSize, int totalNumberSamples)

  • param sampleFrom The DataSet to sample from

  • param batchSize Batch size for the sampled DataSet objects

  • param totalNumberSamples Total number of examples to sample before the iterator is exhausted
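
A short sketch (the parameter meanings above are inferred from the names; treat them as assumptions):

// Assume fullDataSet is a DataSet already in memory
SamplingDataSetIterator iter = new SamplingDataSetIterator(fullDataSet, 32, 640);
// draws 640 random examples from fullDataSet, delivered in minibatches of 32 (20 batches)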

INDArrayDataSetIterator

[source]

The first value in each pair is the features array, the second is the labels.

INDArrayDataSetIterator

public INDArrayDataSetIterator(@NonNull Iterable<Pair<INDArray, INDArray>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects

WorkspacesShieldDataSetIterator

[source]

This iterator detaches/migrates DataSets coming out of the backing DataSetIterator, thus providing “safe” DataSets. This is typically used for debugging and testing purposes, and should not generally be used by users.

WorkspacesShieldDataSetIterator

public WorkspacesShieldDataSetIterator(@NonNull DataSetIterator iterator)
  • param iterator The underlying iterator to detach values from

MultiDataSetIteratorSplitter

[source]

This iterator virtually splits a given MultiDataSetIterator into Train and Test parts. For example: suppose you have 100000 examples and a batch size of 32; that gives 3125 total batches. With a split ratio of 0.7, you get 2187 training batches and 938 test batches.

PLEASE NOTE: The test iterator can’t be used twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: This iterator can’t be used if the underlying iterator applies randomization/shuffling between epochs.

MultiDataSetIteratorSplitter

public MultiDataSetIteratorSplitter(@NonNull MultiDataSetIterator baseIterator, long totalBatches, double ratio)
  • param baseIterator - the underlying iterator to wrap and split

  • param totalBatches - total number of batches in the underlying iterator; this value will be used to determine the number of test/train batches

  • param ratio - the split ratio; must be in the range 0.0 < X < 1.0. I.e. if the value 0.7 is provided, then 70% of the total examples will be used for training, and 30% for testing

getTrainIterator

public MultiDataSetIterator getTrainIterator()

This method returns the train iterator instance

  • return Train iterator

getTestIterator

public MultiDataSetIterator getTestIterator()

This method returns the test iterator instance

  • return Test iterator
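
Putting the numbers from the description together, a hedged usage sketch (baseIterator and net are assumed to exist here; net is a ComputationGraph):

// 100000 examples / batch size 32 = 3125 total batches; ratio 0.7 => 2187 train / 938 test
MultiDataSetIteratorSplitter splitter =
        new MultiDataSetIteratorSplitter(baseIterator, 3125, 0.7);

for (int epoch = 0; epoch < 10; epoch++) {
    net.fit(splitter.getTrainIterator());     // train part first...
    net.evaluate(splitter.getTestIterator()); // ...then the test part
}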

AsyncShieldDataSetIterator

[source]

This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetch. It is mainly used for debugging purposes, or for iterators that aren’t safe to asynchronously prefetch from.

AsyncShieldDataSetIterator

public AsyncShieldDataSetIterator(@NonNull DataSetIterator iterator)
  • param iterator Iterator to wrap, to disable asynchronous prefetching for

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return Number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return Number of labels

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return The configured batch size

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

  • param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

  • return The preprocessor, or null if none is set

hasNext

public boolean hasNext()

Returns true if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the remove operation is not supported by this iterator

  • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

  • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

DummyBlockDataSetIterator

[source]

This class provides a baseline implementation of the BlockDataSetIterator interface

BaseDatasetIterator

[source]

A baseline implementation that includes control over the data fetcher and some basic getters for metadata

AsyncShieldMultiDataSetIterator

[source]

This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch

next

public MultiDataSet next(int num)

Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples

  • param num Number of examples to fetch

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

  • return true if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the remove operation is not supported by this iterator

  • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

  • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

RandomMultiDataSetIterator

[source]

RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomMultiDataSetIterator

public RandomMultiDataSetIterator(int numMiniBatches, @NonNull List<Triple<long[], Character, Values>> features, @NonNull List<Triple<long[], Character, Values>> labels)
  • param numMiniBatches Number of minibatches per epoch

  • param features Each triple in the list specifies the shape, array order and type of values for the features arrays

  • param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays

addFeatures

public Builder addFeatures(long[] shape, Values values)
Add a new features array to the iterator

  • param shape Shape of the features

  • param values Values to fill the array with

addFeatures

public Builder addFeatures(long[] shape, char order, Values values)

Add a new features array to the iterator

  • param shape Shape of the features

  • param order Order (‘c’ or ‘f’) for the array

  • param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, Values values)

Add a new labels array to the iterator

  • param shape Shape of the labels

  • param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, char order, Values values)

Add a new labels array to the iterator

  • param shape Shape of the labels

  • param order Order (‘c’ or ‘f’) for the array

  • param values Values to fill the array with

generate

public static INDArray generate(long[] shape, Values values)

Generate a random array with the specified shape

  • param shape Shape of the array

  • param values Values to fill the array with

  • return Random array of specified shape + contents

generate

public static INDArray generate(long[] shape, char order, Values values)

Generate a random array with the specified shape and order

  • param shape Shape of the array

  • param order Order of array (‘c’ or ‘f’)

  • param values Values to fill the array with

  • return Random array of specified shape + contents
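
A hedged builder sketch: it assumes a Builder(numMiniBatches) constructor and that Values is the enum referenced by the signatures above, with members such as RANDOM_UNIFORM, RANDOM_NORMAL and ONE_HOT.

MultiDataSetIterator iter = new RandomMultiDataSetIterator.Builder(10) // 10 minibatches per epoch
        .addFeatures(new long[]{32, 100}, RandomMultiDataSetIterator.Values.RANDOM_UNIFORM)
        .addFeatures(new long[]{32, 20},  RandomMultiDataSetIterator.Values.RANDOM_NORMAL)
        .addLabels(new long[]{32, 10},    RandomMultiDataSetIterator.Values.ONE_HOT)
        .build();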

EarlyTerminationMultiDataSetIterator

[source]

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch, regardless of the value of num. This essentially restricts the data to the specified number of minibatches.

EarlyTerminationMultiDataSetIterator

public EarlyTerminationMultiDataSetIterator(MultiDataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

  • param underlyingIterator Iterator to wrap

  • param terminationPoint Number of minibatches after which hasNext() will return false

ExistingDataSetIterator

[source]

ExistingDataSetIterator

public ExistingDataSetIterator(@NonNull Iterator<DataSet> iterator)

Note that when using this constructor, resetting is not supported

  • param iterator Iterator to wrap

next

public DataSet next(int num)

Like the standard next method, but returns the specified number of examples

  • param num Number of examples to fetch
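
A minimal sketch (preloaded is assumed to be a List<DataSet> already in memory; note that the Iterator-based constructor disables reset):

// Assume preloaded is a List<DataSet>
ExistingDataSetIterator iter = new ExistingDataSetIterator(preloaded.iterator());
// iter.resetSupported() is false here; to allow resets between epochs, an
// Iterable-based constructor can be used instead (an assumption about the API):
// ExistingDataSetIterator resettable = new ExistingDataSetIterator(preloaded);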

DummyBlockMultiDataSetIterator

[source]

This class provides a baseline implementation of the BlockMultiDataSetIterator interface

EarlyTerminationDataSetIterator

[source]

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch, regardless of the value of num. This essentially restricts the data to the specified number of minibatches.

EarlyTerminationDataSetIterator

public EarlyTerminationDataSetIterator(DataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

  • param underlyingIterator Iterator to wrap

  • param terminationPoint Number of minibatches after which hasNext() will return false
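
A quick sketch of restricting a hypothetical iterator to its first 10 minibatches:

// Assume baseIter is an existing DataSetIterator
EarlyTerminationDataSetIterator first10 = new EarlyTerminationDataSetIterator(baseIter, 10);
int count = 0;
while (first10.hasNext()) {
    first10.next();
    count++;
}
// count is now 10 (or fewer, if baseIter ran out of data first)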

ReconstructionDataSetIterator

[source]

Wraps a DataSetIterator, setting the features (the feature matrix) as the labels as well.
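
This is handy for unsupervised reconstruction objectives such as autoencoders; a hedged sketch, assuming the single-argument wrapping constructor:

// Assume featureIter is a DataSetIterator whose original labels should be ignored
DataSetIterator reconIter = new ReconstructionDataSetIterator(featureIter);
// each DataSet from reconIter now has labels == features, e.g. for autoencoder training:
// autoencoderNet.fit(reconIter);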

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return Number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return Number of labels

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return The configured batch size

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

  • return true if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the remove operation is not supported by this iterator

  • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

DataSetIteratorSplitter

[source]

This iterator virtually splits a given DataSetIterator into Train and Test parts. For example: suppose you have 100000 examples and a batch size of 32; that gives 3125 total batches. With a split ratio of 0.7, you get 2187 training batches and 938 test batches.

PLEASE NOTE: The test iterator can’t be used twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: This iterator can’t be used if the underlying iterator applies randomization/shuffling between epochs.

DataSetIteratorSplitter

public DataSetIteratorSplitter(@NonNull DataSetIterator baseIterator, long totalBatches, double ratio)

The only constructor

  • param baseIterator - iterator to be wrapped and split

  • param totalBatches - total batches in baseIterator

  • param ratio - train/test split ratio

getTrainIterator

public DataSetIterator getTrainIterator()

This method returns the train iterator instance

  • return Train iterator

getTestIterator

public DataSetIterator getTestIterator()

This method returns the test iterator instance

  • return Test iterator
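
Usage mirrors MultiDataSetIteratorSplitter above; a brief sketch (baseIterator is assumed to exist):

DataSetIteratorSplitter splitter = new DataSetIteratorSplitter(baseIterator, 3125, 0.7);
DataSetIterator train = splitter.getTrainIterator(); // 2187 batches
DataSetIterator test  = splitter.getTestIterator();  //  938 batches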

JointMultiDataSetIterator

[source]

This dataset iterator combines multiple DataSetIterators into 1 MultiDataSetIterator. Values from each iterator are joined on a per-example basis, i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come from one of the underlying DataSetIterators only (if ‘outcome’ is >= 0) or from all iterators (if outcome is < 0).

JointMultiDataSetIterator

public JointMultiDataSetIterator(DataSetIterator... iterators)
  • param iterators Underlying iterators to wrap

JointMultiDataSetIterator

public JointMultiDataSetIterator(int outcome, DataSetIterator... iterators)
  • param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet

  • param iterators Underlying iterators to wrap
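
A hedged sketch combining two single-input iterators for a two-input ComputationGraph (imageIter and tabularIter are assumed to exist):

// labels taken from all underlying iterators:
MultiDataSetIterator joint = new JointMultiDataSetIterator(imageIter, tabularIter);

// labels taken only from the first iterator (outcome index 0):
MultiDataSetIterator jointOneLabel = new JointMultiDataSetIterator(0, imageIter, tabularIter);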

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

getPreProcessor

public MultiDataSetPreProcessor getPreProcessor()

Get the MultiDataSetPreProcessor, if one has previously been set. Returns null if no preprocessor has been set

  • return Preprocessor

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators:

(a) Iterators that store their full contents in memory already
(b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents)
(c) Iterators that already implement some level of asynchronous prefetching
(d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

  • return true if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

PLEASE NOTE: This method is NOT implemented

  • throws UnsupportedOperationException if the remove operation is not supported by this iterator

  • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

  • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

FloatsDataSetIterator

[source]

The first value in each pair is the features vector, the second is the labels. Supports generating 2d features/labels only.

FloatsDataSetIterator

public FloatsDataSetIterator(@NonNull Iterable<Pair<float[], float[]>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects

FileSplitDataSetIterator

[source]

A simple iterator that works with a list of files. File-to-DataSet conversion is handled via the provided FileCallback implementation

FileSplitDataSetIterator

public FileSplitDataSetIterator(@NonNull List<File> files, @NonNull FileCallback callback)
  • param files List of files to iterate over

  • param callback Callback for loading the files

MultipleEpochsIterator

[source]

A dataset iterator for doing multiple passes over a dataset

Deprecated: use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return Number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return Number of labels

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return The configured batch size

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

  • return true if the iteration has more elements

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the remove operation is not supported by this iterator

  • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

MultiDataSetWrapperIterator

[source]

This class is a simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly.

PLEASE NOTE: This only works if the number of features/labels/masks arrays is 1

MultiDataSetWrapperIterator

public MultiDataSetWrapperIterator(MultiDataSetIterator iterator)
  • param iterator Underlying iterator to wrap

RandomDataSetIterator

[source]

RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomDataSetIterator

public RandomDataSetIterator(int numMiniBatches, long[] featuresShape, long[] labelsShape, Values featureValues, Values labelValues)
  • param numMiniBatches Number of minibatches per epoch

  • param featuresShape Features shape

  • param labelsShape Labels shape

  • param featureValues Type of values for the features

  • param labelValues Type of values for the labels
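
A benchmarking sketch (it assumes the Values enum includes members such as RANDOM_UNIFORM and ONE_HOT, per the description above):

DataSetIterator rand = new RandomDataSetIterator(
        100,                  // 100 minibatches per epoch
        new long[]{32, 784},  // features shape: [batch, 784]
        new long[]{32, 10},   // labels shape:   [batch, 10]
        RandomDataSetIterator.Values.RANDOM_UNIFORM,
        RandomDataSetIterator.Values.ONE_HOT);
// net.fit(rand); // useful for measuring throughput without real data I/O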

MultiDataSetIteratorAdapter

[source]

Iterator that adapts a DataSetIterator to a MultiDataSetIterator