AlexNet
DL4J's AlexNet model is an interpretation of the original paper, ImageNet Classification with Deep Convolutional Neural Networks, and the referenced imagenetExample code. References: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt
The model is built in DL4J based on available functionality; notes indicate where gaps remain pending enhancements.
Bias initialization in the paper is 1 in certain layers, but 0.1 in the imagenetExample code. The weight distribution uses a standard deviation of 0.1 for all layers in the paper, but 0.005 for the dense layers in the imagenetExample code.
Darknet19
Reference: https://arxiv.org/pdf/1612.08242.pdf ImageNet weights for this model are available and have been converted from https://pjreddie.com/darknet/imagenet/ using https://github.com/allanzelener/YAD2K .
There are 2 pretrained models, one for 224x224 images and one fine-tuned for 448x448 images. Call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization. The channels of the input images need to be in RGB order (not BGR), with values normalized within [0, 1]. The output labels are as per https://github.com/pjreddie/darknet/blob/master/data/imagenet.shortnames.list .
A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832 Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf
LeNet was an early convolutional network that achieved promising results on handwritten digit recognition (MNIST).
MNIST weights for this model are available and have been converted from https://github.com/f00-/mnist-lenet-keras.
Implementation of NASNet-A in Deeplearning4j. NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest.
This implementation uses 1056 penultimate filters and an input shape of (3, 224, 224). You can change this.
Paper: https://arxiv.org/abs/1707.07012 ImageNet weights for this model are available and have been converted from https://keras.io/applications/.
Residual networks for deep learning.
Paper: https://arxiv.org/abs/1512.03385 ImageNet weights for this model are available and have been converted from https://keras.io/applications/.
A simple convolutional network for generic image classification. Reference: https://github.com/oarriaga/face_classification/
SqueezeNet
An implementation of SqueezeNet, which claims accuracy similar to AlexNet with a fraction of the parameters.
Paper: https://arxiv.org/abs/1602.07360 ImageNet weights for this model are available and have been converted from https://github.com/rcmalli/keras-squeezenet/.
LSTM designed for text generation. Can be trained on a corpus of text. For this model, numClasses is the number of distinct characters (the size of the character vocabulary).
Architecture follows this implementation: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py
Walt Whitman weights are available for generating text from his works, adapted from https://github.com/craigomac/InfiniteMonkeys.
Tiny YOLO
Reference: https://arxiv.org/pdf/1612.08242.pdf
ImageNet+VOC weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code.
```java
String filename = "tiny-yolo-voc.h5";
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
INDArray priors = Nd4j.create(priorBoxes);   // priorBoxes, seed, iterations and workspaceMode are defined earlier in the conversion script

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
        .seed(seed)
        .iterations(iterations)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
        .gradientNormalizationThreshold(1.0)
        .updater(new Adam.Builder().learningRate(1e-3).build())
        .l2(0.00001)
        .activation(Activation.IDENTITY)
        .trainingWorkspaceMode(workspaceMode)
        .inferenceWorkspaceMode(workspaceMode)
        .build();

ComputationGraph model = new TransferLearning.GraphBuilder(graph)
        .fineTuneConfiguration(fineTuneConf)
        .addLayer("outputs", new Yolo2OutputLayer.Builder()
                .boundingBoxPriors(priors)
                .build(), "conv2d_9")
        .setOutputs("outputs")
        .build();

System.out.println(model.summary(InputType.convolutional(416, 416, 3)));

ModelSerializer.writeModel(model, "tiny-yolo-voc_dl4j_inference.v1.zip", false);
```
The channels of the 416x416 input images need to be in RGB order (not BGR), with values normalized within [0, 1].
U-Net
An implementation of U-Net in Deeplearning4j, a deep learning network for image segmentation. U-Net is a convolutional network architecture for fast and precise segmentation of images. Up to now it has outperformed the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopy stacks.
Paper: https://arxiv.org/abs/1505.04597 Weights are available for image segmentation, trained on a synthetic dataset.
VGG-16, from Very Deep Convolutional Networks for Large-Scale Image Recognition https://arxiv.org/abs/1409.1556
Deep Face Recognition http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf
ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications. CIFAR-10 weights for this model are available and have been converted using “approach 2” from https://github.com/rajatvikramsingh/cifar10-vgg16. VGGFace weights for this model are available and have been converted from https://github.com/rcmalli/keras-vggface.
VGG-19, from Very Deep Convolutional Networks for Large-Scale Image Recognition https://arxiv.org/abs/1409.1556 ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications.
Xception
An implementation of Xception in Deeplearning4j. A novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions.
Paper: https://arxiv.org/abs/1610.02357 ImageNet weights for this model are available and have been converted from https://keras.io/applications/.
YOLOv2
Reference: https://arxiv.org/pdf/1612.08242.pdf
ImageNet+COCO weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code.
The channels of the 608x608 input images need to be in RGB order (not BGR), with values normalized within [0, 1].
pretrainedUrl
Default prior boxes for the model
How to build complex networks with DL4J computation graph.
This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.
DL4J has two types of networks comprised of multiple layers:
The MultiLayerNetwork, which is essentially a stack of neural network layers (with a single input layer and single output layer), and
The ComputationGraph, which allows for greater freedom in network architectures
Specifically, the ComputationGraph allows for networks to be built with the following features:
Multiple network input arrays
Multiple network outputs (including mixed classification/regression architectures)
Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)
As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure: MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well - though the configuration may be a little more complicated.
Examples of some architectures that can be built using ComputationGraph include:
Multi-task learning architectures
Recurrent neural networks with skip connections
GoogLeNet, a complex type of convolutional neural network for image classification
The basic idea is that in the ComputationGraph, the core building block is the GraphVertex, instead of layers. Layers (or, more accurately, the LayerVertex objects) are just one type of vertex in the graph. Other types of vertices include:
Input Vertices
Element-wise operation vertices
Merge vertices
Subset vertices
Preprocessor vertices
These types of graph vertices are described briefly below.
LayerVertex: Layer vertices (graph vertices with neural network layers) are added using the .addLayer(String,Layer,String...)
method. The first argument is the label for the layer, and the last arguments are the inputs to that layer. If you need to manually add an InputPreProcessor (usually this is unnecessary - see next section) you can use the .addLayer(String,Layer,InputPreProcessor,String...)
method.
InputVertex: Input vertices are specified by the addInputs(String...)
method in your configuration. The strings used as inputs can be arbitrary - they are user-defined labels, and can be referenced later in the configuration. The number of strings provided defines the number of inputs; the order of the inputs also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).
ElementWiseVertex: Element-wise operation vertices perform an element-wise operation, such as addition or subtraction, on the activations from one or more other vertices. Thus, the activations used as input for the ElementWiseVertex must all be the same size, and the output size of the element-wise vertex is the same as the input sizes.
MergeVertex: The MergeVertex concatenates/merges the input activations. For example, if a MergeVertex has 2 inputs of size 5 and 10 respectively, then the output size will be 5+10=15 activations. For convolutional network activations, inputs are merged along the channel (depth) dimension: so if the activations from one layer have 4 channels and the other has 5 channels (both with width x height activations), then the output will have (4+5) x width x height activations.
SubsetVertex: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use .addVertex("subset1", new SubsetVertex(0,4), "layer1")
: this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.
PreProcessorVertex: Occasionally, you might want to use the functionality of an InputPreProcessor without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.
Finally, it is also possible to define custom graph vertices by implementing both a configuration and implementation class for your custom GraphVertex.
Suppose we wish to build the following recurrent neural network architecture:
For the sake of this example, let's assume our input data is of size 5. Our configuration would be as follows:
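A minimal sketch of such a configuration; the updater, loss function and layer sizes are illustrative choices rather than prescribed values:

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")                        // any label can be used for the input
        .addLayer("L1", new LSTM.Builder().nIn(5).nOut(5).activation(Activation.TANH).build(), "input")
        .addLayer("L2", new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(5 + 5).nOut(5).build(), "input", "L1")   // skip connection: L2 sees both "input" and "L1"
        .setOutputs("L2")                          // the network outputs and their order must be specified
        .build();

ComputationGraph net = new ComputationGraph(conf);
net.init();
```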
Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.
Consider the following architecture:
Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)) then the output size of the merge vertex is 4+4=8 activations.
To build the above network, we use the following configuration:
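A sketch of such a configuration follows; the input size of 3 and the output layer settings are illustrative assumptions:

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addVertex("merge", new MergeVertex(), "L1", "L2")    // concatenates L1 and L2: 4 + 4 = 8 activations
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(4 + 4).nOut(3).build(), "merge")
        .setOutputs("out")
        .build();
```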
In multi-task learning, a neural network is used to make multiple independent predictions. Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.
In this case, the network configuration is:
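A sketch of such a configuration, assuming an input of size 3, a 3-class classification output and a 2-dimensional regression output:

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).activation(Activation.RELU).build(), "input")
        .addLayer("out1", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .activation(Activation.SOFTMAX)
                .nIn(4).nOut(3).build(), "L1")   // classification output
        .addLayer("out2", new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY)
                .nIn(4).nOut(2).build(), "L1")   // regression output
        .setOutputs("out1", "out2")
        .build();
```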
One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the .setInputTypes(InputType...)
method in the configuration.
The setInputType method has two effects:
It will automatically add any InputPreProcessors as required. InputPreProcessors are necessary to handle the interaction between, for example, fully connected (dense) and convolutional layers, or between recurrent and fully connected layers.
It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the setInputTypes(InputType...)
functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If the .nIn(x) is specified for a layer, the network will not override this when using the InputType functionality.
For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use .setInputTypes(InputType.convolutional(height, width, depth), InputType.feedForward(feedForwardInputSize)).
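As a sketch of the single-input case, the configuration below relies on setInputTypes to add the CNN-to-dense preprocessor and to fill in the .nIn(x) values automatically; the 28x28 grayscale input and layer sizes are illustrative:

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .setInputTypes(InputType.convolutional(28, 28, 1))   // height, width, channels
        .addLayer("cnn", new ConvolutionLayer.Builder(5, 5)
                .nOut(16).activation(Activation.RELU).build(), "input")   // no .nIn() needed
        .addLayer("dense", new DenseLayer.Builder().nOut(64).activation(Activation.RELU).build(), "cnn")
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nOut(10).build(), "dense")
        .setOutputs("out")
        .build();
```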
There are two types of data that can be used with the ComputationGraph.
The DataSet class was originally designed for use with the MultiLayerNetwork; however, it can also be used with ComputationGraph - but only if that computation graph has a single input and output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).
A DataSet object is basically a pair of INDArrays that hold your training data. In the case of RNNs, it may also include masking arrays (see this for more details). A DataSetIterator is essentially an iterator over DataSet objects.
MultiDataSet is the multiple-input and/or multiple-output version of DataSet. It may also include multiple mask arrays (for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator unless you are dealing with multiple inputs and/or multiple outputs.
There are currently two ways to use a MultiDataSetIterator:
By implementing the MultiDataSetIterator interface directly
By using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers
The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:
Multiple DataVec RecordReaders may be used simultaneously
The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader
It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output
It is possible to convert single columns from a class index to a one-hot representation
Some basic examples on how to use the RecordReaderMultiDataSetIterator follow. You might also find these unit tests to be useful.
Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:
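A sketch of that setup follows; the file path and minibatch size are illustrative:

```java
int numLinesToSkip = 0;
RecordReader recordReader = new CSVRecordReader(numLinesToSkip);   // default delimiter is a comma
recordReader.initialize(new FileSplit(new File("/path/to/myData.csv")));   // path is illustrative

int batchSize = 4;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("myReader", recordReader)
        .addInput("myReader", 0, 2)    // columns 0 to 2 (inclusive) as input
        .addOutput("myReader", 3, 4)   // columns 3 to 4 (inclusive) as output (regression targets)
        .build();
```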
Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture, whereby we have two outputs - one for classification and one for regression. For this example, let's assume the data is as follows:
Input file: myInput.csv, and we want to use all columns as input (without modification)
Output file: myOutput.csv.
Network output 1 - regression: columns 0 to 3
Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.
In this case, we can build our iterator as follows:
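A sketch along those lines, again with illustrative minibatch size (myInput.csv and myOutput.csv are the files described above):

```java
int numLinesToSkip = 0;
RecordReader featuresReader = new CSVRecordReader(numLinesToSkip);
featuresReader.initialize(new FileSplit(new File("myInput.csv")));
RecordReader labelsReader = new CSVRecordReader(numLinesToSkip);
labelsReader.initialize(new FileSplit(new File("myOutput.csv")));

int batchSize = 4;
int numClasses = 3;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("csvInput", featuresReader)
        .addReader("csvLabels", labelsReader)
        .addInput("csvInput")                          // all columns of myInput.csv as input
        .addOutput("csvLabels", 0, 3)                  // regression targets: columns 0 to 3
        .addOutputOneHot("csvLabels", 4, numClasses)   // column 4 converted to a one-hot representation
        .build();
```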
Non-linear activation functions for neural network layers.
At a simple level, activation functions help decide whether a neuron should be activated. This helps determine whether the information that the neuron is receiving is relevant for the input. The activation function is a non-linear transformation that happens over an input signal, and the transformed output is sent to the next neuron.
The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:
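A minimal sketch of that pattern (layer sizes and the chosen activation are illustrative): the dense layer is left with an identity activation, and a separate ActivationLayer applies the non-linearity.

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(0, new DenseLayer.Builder()
                .nIn(784).nOut(128)
                .activation(Activation.IDENTITY)    // non-linearity applied by the next layer
                .build())
        .layer(1, new ActivationLayer.Builder()
                .activation(Activation.LEAKYRELU)
                .build())
        .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(128).nOut(10)
                .build())
        .build();
```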
Rectified tanh
Essentially max(0, tanh(x))
Underlying implementation is in native code
f(x) = alpha * (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0
alpha defaults to 1, if not specified
f(x) = max(0, x)
Rational tanh approximation, from https://arxiv.org/pdf/1508.01292v3
f(x) = 1.7159 * tanh(2x/3), where tanh is approximated as tanh(y) ~ sgn(y) * (1 - 1/(1 + |y| + y^2 + 1.41645*y^4))
Underlying implementation is in native code
Thresholded RELU
f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0
f(x) = min(max(input, cutoff), 6)
f(x) = 1 / (1 + exp(-x))
GELU activation function - Gaussian Error Linear Units
Parametrized Rectified Linear Unit (PReLU)
f(x) = alpha x for x < 0, f(x) = x for x >= 0
alpha has the same shape as x and is a learned parameter.
f(x) = x
f_i(x) = x_i / (1 + |x_i|)
f(x) = min(1, max(0, 0.2x + 0.5))
f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)
f(x) = x^3
f(x) = max(0,x) + alpha min(0, x)
alpha is drawn from uniform(l, u) during training and is set to (l + u)/2 at test time; l and u default to 1/8 and 1/3 respectively
Empirical Evaluation of Rectified Activations in Convolutional Network
f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
https://arxiv.org/pdf/1706.02515.pdf
Leaky ReLU: f(x) = max(0, x) + alpha * min(0, x); alpha defaults to 0.01
f(x) = x * sigmoid(x)
f(x) = log(1+e^x)
Data iteration tools for loading into neural networks.
A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data, or automatic loading of common benchmarking datasets such as MNIST and IRIS.
For most use cases, initializing an iterator and passing it to the fit() method of a MultiLayerNetwork or ComputationGraph is all you need to begin a training task:
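For example, a minimal sketch using the built-in MNIST iterator (minibatch size, seed and epoch count are illustrative; model is assumed to be an initialized MultiLayerNetwork or ComputationGraph):

```java
DataSetIterator mnistTrain = new MnistDataSetIterator(128, true, 12345);   // batch size 128, training set
model.fit(mnistTrain, 5);   // train for 5 epochs
```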
Many other methods also accept iterators for tasks such as evaluation:
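For instance, evaluation over the MNIST test set could look like the following sketch:

```java
DataSetIterator mnistTest = new MnistDataSetIterator(128, false, 12345);   // test set
Evaluation eval = model.evaluate(mnistTest);
System.out.println(eval.stats());
```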
MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see http://yann.lecun.com/exdb/mnist/
UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift
Details: https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series Data: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data Image: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg
UciSequenceDataSetIterator
Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123
param batchSize Minibatch size
CifarDataSetIterator is an iterator for CIFAR-10 dataset - 10 classes, with 32x32 images with 3 channels (RGB)
This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: https://pjreddie.com/projects/cifar-10-dataset-mirror/.
Cifar10DataSetIterator
Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)
param batchSize Minibatch size for the iterator
IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes https://archive.ics.uci.edu/ml/datasets/Iris
IrisDataSetIterator
next
IrisDataSetIterator handles traversing through the Iris Data Set.
param batch Batch size
param numExamples Total number of examples
LFW iterator - Labeled Faces in the Wild dataset. See http://vis-www.cs.umass.edu/lfw/ 13233 images total, with 5749 classes.
LFWDataSetIterator
Create LFW data specific iterator
param batchSize the batch size of the examples
param numExamples the overall number of examples
param imgDim an array of height, width and channels
param numLabels the overall number of labels (classes)
param useSubset use a subset of the LFWDataSet
param labelGenerator path label generator to use
param train true if use train value
param splitTrainTest the percentage to split data for train and remainder goes to test
param imageTransform how to transform the image
param rng random number generator, to lock in batch shuffling
Tiny ImageNet is a subset of the ImageNet database. TinyImageNet is the default course challenge for CS231n at Stanford University.
Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.
See: http://cs231n.stanford.edu/ and https://tiny-imagenet.herokuapp.com/
TinyImageNetDataSetIterator
Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)
param batchSize Minibatch size for the iterator
EMNIST DataSetIterator
COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes
MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z
BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)
LETTERS: 145,600 examples total. 26 balanced classes
DIGITS: 280,000 examples total. 10 balanced classes
See: https://www.nist.gov/itl/iad/image-group/emnist-dataset and https://arxiv.org/abs/1702.05373
EmnistDataSetIterator
EMNIST dataset has multiple different subsets. See the EmnistDataSetIterator Javadoc for details.
numExamplesTrain
Create an EMNIST iterator with randomly shuffled data based on a specified RNG seed
param dataSet Dataset (subset) to return
param batchSize Batch size
param train If true: use training set. If false: use test set
param seed Random number generator seed
numExamplesTest
Get the number of test examples for the specified subset
param dataSet Subset to get
return Number of examples for the specified subset
numLabels
Get the number of labels for the specified subset
param dataSet Subset to get
return Number of labels for the specified subset
isBalanced
Get the labels as a character array
return Labels
Record reader dataset iterator: takes a DataVec RecordReader as its source and handles conversion to DataSet objects, as well as producing minibatches from individual records.
RecordReaderDataSetIterator
Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels() Note that if RecordReader.getLabels() returns null, no output labels will be produced
param recordReader Record reader to use as the source of data
param batchSize Minibatch size, for each call of .next()
setCollectMetaData
Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.
param recordReader RecordReader: provides the source of the data
param batchSize Batch size (number of examples) for the output DataSet objects
param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()
param numPossibleLabels Number of classes (possible labels) for classification
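As a sketch, a classification iterator over a CSV file might be constructed like this (the file path, label column and class count are illustrative):

```java
RecordReader recordReader = new CSVRecordReader();   // default settings: no lines skipped, comma delimiter
recordReader.initialize(new FileSplit(new File("iris.csv")));   // path is illustrative

int labelIndex = 4;          // class index stored in column 4
int numPossibleLabels = 3;   // values 0, 1, 2
int batchSize = 50;
DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelIndex, numPossibleLabels);
```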
loadFromMetaData
Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)
param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return DataSet with the specified example
throws IOException If an error occurs during loading of the data
loadFromMetaData
Load multiple examples to a DataSet, using the provided RecordMetaData instances.
param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor
return DataSet with the specified examples
throws IOException If an error occurs during loading of the data
writableConverter
Builder class for RecordReaderDataSetIterator
maxNumBatches
Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.
param maxNumBatches Maximum number of minibatches per epoch / reset
regression
Use this for single output regression (i.e., 1 output/regression target)
param labelIndex Column index that contains the regression target (indexes start at 0)
regression
Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)
param labelIndexFrom Column index of the first regression target (indexes start at 0)
param labelIndexTo Column index of the last regression target (inclusive)
classification
Use this for classification
param labelIndex Index of the column that contains the label. The column (indexes start from 0) should contain integer values in the range 0 to numClasses-1
param numClasses Number of label classes (i.e., number of categories/classes in the dataset)
preProcessor
Optional arg. Allows the preprocessor to be set
param preProcessor Preprocessor to use
collectMetaData
When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.
param collectMetaData Whether metadata should be collected or not
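As a sketch, the builder can combine these options; the regression column range and batch limit below are illustrative, and recordReader is assumed to be an initialized RecordReader:

```java
DataSetIterator regressionIter = new RecordReaderDataSetIterator.Builder(recordReader, 32)
        .regression(1, 3)          // regression targets in columns 1 to 3 (inclusive)
        .maxNumBatches(100)        // at most 100 minibatches per reset
        .collectMetaData(true)     // include RecordMetaData in the returned DataSets
        .build();
```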
The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator).
RecordReaderMultiDataSetIterator
When dealing with time series data of different lengths, how should we align the input/labels time series? For equal lengths: use EQUAL_LENGTH. For sequence classification: use ALIGN_END.
loadFromMetaData
Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)
param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return DataSet with the specified example
throws IOException If an error occurs during loading of the data
loadFromMetaData
Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.
param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor
return DataSet with the specified examples
throws IOException If an error occurs during loading of the data
Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels) generate time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with a different number of input vs. label time steps).
SequenceRecordReaderDataSetIterator
Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.
param featuresReader SequenceRecordReader for the features
param labels Labels: assume a single value per time step, where values are integers in the range 0 to numPossibleLabels-1
param miniBatchSize Minibatch size for each call of next()
param numPossibleLabels Number of classes for the labels
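A sketch of that constructor in use; the file-naming pattern, sequence count and class count are illustrative:

```java
SequenceRecordReader featureReader = new CSVSequenceRecordReader(0);   // no header lines, comma delimiter
featureReader.initialize(new NumberedFileInputSplit("features_%d.csv", 0, 99));   // features_0.csv ... features_99.csv
SequenceRecordReader labelReader = new CSVSequenceRecordReader(0);
labelReader.initialize(new NumberedFileInputSplit("labels_%d.csv", 0, 99));

int miniBatchSize = 32;
int numPossibleLabels = 6;
DataSetIterator iter = new SequenceRecordReaderDataSetIterator(
        featureReader, labelReader, miniBatchSize, numPossibleLabels);
```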
hasNext
Constructor where features and labels come from different RecordReaders (for example, different files)
loadFromMetaData
Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)
param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return DataSet with the specified example
throws IOException If an error occurs during loading of the data
loadFromMetaData
Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.
param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor
return DataSet with the specified examples
throws IOException If an error occurs during loading of the data
Async prefetching iterator wrapper for MultiDataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.
Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network
next
We want to ensure that the background thread has the same thread->device affinity as the master thread
setPreProcessor
Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.
param preProcessor MultiDataSetPreProcessor. May be null.
resetSupported
Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t
return true if reset method is supported; false otherwise
asyncSupported
Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called
return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator
reset
Resets the iterator back to the beginning
shutdown
We want to ensure that the background thread has the same thread->device affinity as the master thread
hasNext
Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)
return true if the iteration has more elements
next
Returns the next element in the iteration.
return the next element in the iteration
remove
Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.
Combines and splits DataSets from an underlying iterator as required to get the specified batch size.
Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.
Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.
Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network
AsyncDataSetIterator
Create an Async iterator with the default queue size of 8
param baseIterator Underlying iterator to wrap and fetch asynchronously from
next
Create an Async iterator with the default queue size of 8
param iterator Underlying iterator to wrap and fetch asynchronously from
param queue Queue size - the number of minibatches to prefetch and hold in the queue
inputColumns
Input columns for the dataset
return
totalOutcomes
The number of labels for the dataset
return
resetSupported
Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t
return true if reset method is supported; false otherwise
asyncSupported
Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called
return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator
reset
Resets the iterator back to the beginning
shutdown
We want to ensure that the background thread has the same thread->device affinity as the master thread
batch
Batch size
return
setPreProcessor
Set a pre processor
param preProcessor a pre processor to set
getPreProcessor
Returns preprocessors, if defined
return
hasNext
Get dataset iterator record reader labels
next
Returns the next element in the iteration.
return the next element in the iteration
remove
Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.
First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only
DoublesDataSetIterator
param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects
Combines and splits DataSets from an underlying iterator as required to get a specified batch size.
Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.
A wrapper for a dataset to sample from. This will randomly sample from the given dataset.
SamplingDataSetIterator
First value in pair is the features vector, second value in pair is the labels.
INDArrayDataSetIterator
param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects
This iterator detaches/migrates DataSets coming out of the backing DataSetIterator, thus providing "safe" DataSets. This is typically used for debugging and testing purposes, and should not generally be used by users
WorkspacesShieldDataSetIterator
param iterator The underlying iterator to detach values from
This iterator virtually splits the given MultiDataSetIterator into train and test parts. For example, if you have 100000 examples and your batch size is 32, you have 3125 total batches. With a split ratio of 0.7 that gives you 2187 training batches and 938 test batches.
PLEASE NOTE: You can't use the test iterator twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: You can't use this iterator if the underlying iterator uses randomization/shuffling between epochs.
param baseIterator
param totalBatches - total number of batches in underlying iterator. this value will be used to determine number of test/train batches
param ratio - this value will be used as the splitter; it should be in the range 0.0 < X < 1.0. I.e. if the value 0.7 is provided, then 70% of total examples will be used for training, and 30% of total examples will be used for testing
getTrainIterator
This method returns train iterator instance
return
next
This method returns test iterator instance
return
This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetching. This is mainly used for debugging purposes, or for an iterator that isn't safe to asynchronously prefetch from
AsyncShieldDataSetIterator
param iterator Iterator to wrap, to disable asynchronous prefetching for
next
Like the standard next method but allows a customizable number of examples returned
param num the number of examples
return the next data set
inputColumns
Input columns for the dataset
return
totalOutcomes
The number of labels for the dataset
return
resetSupported
Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t
return true if reset method is supported; false otherwise
asyncSupported
Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?
PLEASE NOTE: This iterator ALWAYS returns FALSE
return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator
reset
Resets the iterator back to the beginning
batch
Batch size
return
setPreProcessor
Set a pre processor
param preProcessor a pre processor to set
getPreProcessor
Returns preprocessors, if defined
return
hasNext
Get dataset iterator record reader labels
next
Returns the next element in the iteration.
return the next element in the iteration
remove
Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.
This class provides a baseline implementation of the BlockDataSetIterator interface.
The baseline implementation includes control over the data fetcher and some basic getters for metadata
This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch
next
Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples
param num Number of examples to fetch
setPreProcessor
Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.
param preProcessor MultiDataSetPreProcessor. May be null.
resetSupported
Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t
return true if reset method is supported; false otherwise
asyncSupported
Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?
PLEASE NOTE: This iterator ALWAYS returns FALSE
return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator
reset
Resets the iterator back to the beginning
hasNext
Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)
return true if the iteration has more elements
next
Returns the next element in the iteration.
return the next element in the iteration
remove
Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.
RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.
RandomMultiDataSetIterator
param numMiniBatches Number of minibatches per epoch
param features Each triple in the list specifies the shape, array order and type of values for the features arrays
param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays
addFeatures
param numMiniBatches Number of minibatches per epoch
addFeatures
Add a new features array to the iterator
param shape Shape of the features
param order Order (‘c’ or ‘f’) for the array
param values Values to fill the array with
addLabels
Add a new labels array to the iterator
param shape Shape of the labels
param values Values to fill the array with
addLabels
Add a new labels array to the iterator
param shape Shape of the labels
param order Order (‘c’ or ‘f’) for the array
param values Values to fill the array with
generate
Generate a random array with the specified shape
param shape Shape of the array
param values Values to fill the array with
return Random array of specified shape + contents
generate
Generate a random array with the specified shape and order
param shape Shape of the array
param order Order of array (‘c’ or ‘f’)
param values Values to fill the array with
return Random array of specified shape + contents
Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to this specified number of minibatches.
EarlyTerminationMultiDataSetIterator
Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false
param underlyingIterator, iterator to wrap
param terminationPoint, minibatches after which hasNext() will return false
ExistingDataSetIterator
Note that when using this constructor, resetting is not supported
param iterator Iterator to wrap
next
Note that when using this constructor, resetting is not supported
param iterator Iterator to wrap
param labels String labels. May be null.
This class provides a baseline implementation of the BlockMultiDataSetIterator interface
Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to this specified number of minibatches.
EarlyTerminationDataSetIterator
Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false
param underlyingIterator, iterator to wrap
param terminationPoint, minibatches after which hasNext() will return false
Wraps a data set iterator setting the first (feature matrix) as the labels.
next
Like the standard next method but allows a customizable number of examples returned
param num the number of examples
return the next data set
inputColumns
Input columns for the dataset
return
totalOutcomes
The number of labels for the dataset
return
reset
Resets the iterator back to the beginning
batch
Batch size
return
hasNext
Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)
return true if the iteration has more elements
next
Returns the next element in the iteration.
return the next element in the iteration
remove
Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
This iterator virtually splits the given DataSetIterator into train and test parts. For example, if you have 100000 examples and your batch size is 32, you have 3125 total batches. With a split ratio of 0.7 that gives you 2187 training batches and 938 test batches.
PLEASE NOTE: You can't use the test iterator twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: You can't use this iterator if the underlying iterator uses randomization/shuffling between epochs.
DataSetIteratorSplitter
The only constructor
param baseIterator - iterator to be wrapped and split
param totalBatches - total batches in baseIterator
param ratio - train/test split ratio
getTrainIterator
This method returns train iterator instance
return
next
This method returns test iterator instance
return
This dataset iterator combines multiple DataSetIterators into 1 MultiDataSetIterator. Values from each iterator are joined on a per-example basis - i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come from one of the underlying DataSetIterators only (if 'outcome' is >= 0) or from all iterators (if outcome is < 0)
JointMultiDataSetIterator
param iterators Underlying iterators to wrap
next
param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet
param iterators Underlying iterators to wrap
setPreProcessor
Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.
param preProcessor MultiDataSetPreProcessor. May be null.
getPreProcessor
Get the MultiDataSetPreProcessor, if one has previously been set. Returns null if no preprocessor has been set
return Preprocessor
resetSupported
Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t
return true if reset method is supported; false otherwise
asyncSupported
Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called
return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator
reset
Resets the iterator back to the beginning
hasNext
Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)
return true if the iteration has more elements
next
Returns the next element in the iteration.
return the next element in the iteration
remove
PLEASE NOTE: This method is NOT implemented
throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.
First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only
FloatsDataSetIterator
param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects
Simple iterator working with a list of files. File-to-DataSet conversion is handled via the provided FileCallback implementation
FileSplitDataSetIterator
param files List of files to iterate over
param callback Callback for loading the files
A dataset iterator for doing multiple passes over a dataset
Use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead
next
Like the standard next method but allows a customizable number of examples returned
param num the number of examples
return the next data set
inputColumns
Input columns for the dataset
return
totalOutcomes
The number of labels for the dataset
return
reset
Resets the iterator back to the beginning
batch
Batch size
return
hasNext
Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)
return true if the iteration has more elements
remove
Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
This class is simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly
PLEASE NOTE: This only works if number of features/labels/masks is 1
MultiDataSetWrapperIterator
param iterator Underlying iterator to wrap
RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.
RandomDataSetIterator
param numMiniBatches Number of minibatches per epoch
param featuresShape Features shape
param labelsShape Labels shape
param featureValues Type of values for the features
param labelValues Type of values for the labels
Iterator that adapts a DataSetIterator to a MultiDataSetIterator
Prebuilt model architectures and weights for out-of-the-box application.
Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.
If you want to use the model zoo, you will need to add it as a dependency (the deeplearning4j-zoo artifact) to your Maven POM or Gradle build.
Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel
abstract class and uses the InstantiableModel
interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.
You can instantly instantiate a model from the zoo using the .init()
method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:
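A sketch of that, assuming the builder-style construction used in recent DL4J versions; the class count and seed are illustrative values:

```java
ZooModel zooModel = AlexNet.builder()
        .numClasses(10)    // number of classes in your data (illustrative)
        .seed(123)
        .build();
Model alexNet = zooModel.init();   // fresh, untrained network
```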
If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:
Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType
is an enumerator that outlines different weight types, which includes IMAGENET
, MNIST
, CIFAR10
, and VGGFACE
.
For example, you can initialize a VGG-16 model with ImageNet weights like so:
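A sketch of this, assuming the builder-style construction; initPretrained returns a Model, which for VGG-16 is a ComputationGraph:

```java
ZooModel zooModel = VGG16.builder().build();
ComputationGraph vgg16 = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
```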
And initialize another VGG16 model with weights trained on VGGFace:
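And, under the same assumptions, for the VGGFace weights:

```java
ZooModel zooModel = VGG16.builder().build();
ComputationGraph vggFace = (ComputationGraph) zooModel.initPretrained(PretrainedType.VGGFACE);
```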
If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable()
method which returns a boolean. Simply pass a PretrainedType
enum to this method, which returns true if weights are available.
Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}
, this means the model has 3 channels and height/width of 224.
The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.
You can find a complete list of models using this deeplearning4j-zoo Github link.
This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.
The zoo comes with a couple additional features if you're looking to use the models for different use cases.
Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape(). NOTE: this applies to fresh configurations only, and will not affect pretrained models:
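A sketch, using Darknet19 (which accepts either 224x224 or 448x448 inputs, as noted above); the shape follows the {channels, height, width} convention:

```java
ZooModel zooModel = Darknet19.builder().build();
zooModel.setInputShape(new int[][]{{3, 448, 448}});   // channels, height, width
```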
Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J here.
Initialization methods often have an additional parameter named workspaceMode
. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE
for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see this section.
Also known as CNN.
1D convolution layer. Expects input activations of shape [minibatch,channels,sequenceLength]
2D convolution layer
3D convolution layer configuration
hasBias
An optional dataFormat: “NDHWC” or “NCDHW”. Defaults to “NCDHW”. The data format of the input and output data. For “NCDHW” (also known as ‘channels first’ format), the data storage order is: [batchSize, inputChannels, inputDepth, inputHeight, inputWidth]. For “NDHWC” (‘channels last’ format), the data is stored in the order of: [batchSize, inputDepth, inputHeight, inputWidth, inputChannels].
kernelSize
The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]
stride
Set stride size for 3D convolutions in (depth, height, width) order
param stride stride size
return 3D convolution layer builder
padding
Set padding size for 3D convolutions in (depth, height, width) order
param padding padding size
return 3D convolution layer builder
dilation
Set dilation size for 3D convolutions in (depth, height, width) order
param dilation dilation size
return 3D convolution layer builder
dataFormat
The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]
param dataFormat Data format to use for activations
setKernelSize
Set kernel size for 3D convolutions in (depth, height, width) order
param kernelSize kernel size
setStride
Set stride size for 3D convolutions in (depth, height, width) order
param stride stride size
setPadding
Set padding size for 3D convolutions in (depth, height, width) order
param padding padding size
setDilation
Set dilation size for 3D convolutions in (depth, height, width) order
param dilation dilation size
2D deconvolution layer configuration
Deconvolutions are also known as transpose convolutions or fractionally strided convolutions. In essence, deconvolutions swap forward and backward pass with regular 2D convolutions.
See the paper by Matt Zeiler for details: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf
For an intuitive guide to convolution arithmetic and shapes, see: https://arxiv.org/abs/1603.07285v1
hasBias
Deconvolution2D layer. nIn in the input layer is the number of channels; nOut is the number of filters to be used in the net (in other words, the output channels). The builder specifies the filter/kernel size, the stride and the padding. The pooling layer takes the kernel size.
convolutionMode
Set the convolution mode for the Convolution layer. See {- link ConvolutionMode} for more details
param convolutionMode Convolution mode for layer
kernelSize
Size of the convolution rows/columns
param kernelSize the height and width of the kernel
Cropping layer for convolutional (1d) neural networks. Allows cropping to be done separately for top/bottom
getOutputType
param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations
setCropping
Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.
build
param cropping Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.
Cropping layer for convolutional (2d) neural networks. Allows cropping to be done separately for top/bottom/left/right
getOutputType
param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations
param cropLeftRight Amount of cropping to apply to both the left and the right of the input activations
setCropping
Cropping amount for top/bottom/left/right (in that order). A length 4 array.
build
param cropping Cropping amount for top/bottom/left/right (in that order). Must be length 4 array.
Cropping layer for convolutional (3d) neural networks. Allows cropping to be done separately for upper and lower bounds of depth, height and width dimensions.
getOutputType
param cropDepth Amount of cropping to apply to both depth boundaries of the input activations
param cropHeight Amount of cropping to apply to both height boundaries of the input activations
param cropWidth Amount of cropping to apply to both width boundaries of the input activations
setCropping
Cropping amount, a length 6 array, i.e. crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width
build
param cropping Cropping amount, must be length 3 or 6 array, i.e. either crop depth, crop height, crop width or crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width
Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.
RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.
Autoencoder layer. Adds noise to the input and learns a reconstruction function.
corruptionLevel
Level of corruption - 0.0 (none) to 1.0 (all values corrupted)
sparsity
Autoencoder sparsity parameter
param sparsity Sparsity
Variational Autoencoder layer
See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes - https://arxiv.org/abs/1312.6114
This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.
A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.
encoderLayerSizes
Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a {- link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers (set via {- link #decoderLayerSizes(int…)} is similar to the encoder layers.
setEncoderLayerSizes
Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a {- link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers (set via {- link #decoderLayerSizes(int…)} is similar to the encoder layers.
param encoderLayerSizes Size of each encoder layer in the variational autoencoder
decoderLayerSizes
Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a {- link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers is similar to the encoder layers (set via {- link #encoderLayerSizes(int…)}.
param decoderLayerSizes Size of each decoder layer in the variational autoencoder
setDecoderLayerSizes
Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a {- link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers is similar to the encoder layers (set via {- link #encoderLayerSizes(int…)}.
param decoderLayerSizes Size of each decoder layer in the variational autoencoder
reconstructionDistribution
The reconstruction distribution for the data given the hidden state - i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example:
{- link GaussianReconstructionDistribution} + {identity or tanh} for real-valued (Gaussian) data
{- link BernoulliReconstructionDistribution} + sigmoid for binary-valued (0 or 1) data
param distribution Reconstruction distribution
lossFunction
Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution
param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use
lossFunction
Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution
param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use
lossFunction
Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution
param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use
pzxActivationFn
Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).
param activationFunction Activation function for p(z| x)
pzxActivationFunction
Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).
param activation Activation function for p(z | x)
nOut
Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.
param nOut Size of P(Z | data) and output size
numSamples
Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.
This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”
param numSamples Number of samples per data point for pretraining
Saving and loading of neural networks.
MultiLayerNetwork and ComputationGraph both have save and load methods.
You can save/load a MultiLayerNetwork using:
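A sketch, assuming a recent DL4J release where `MultiLayerNetwork` exposes `save`/`load` directly (the file path is hypothetical):

```java
import java.io.File;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

File locationToSave = new File("my-multilayer-network.zip");   // hypothetical path

// Save (true = also save the updater state, needed to resume training later)
net.save(locationToSave, true);

// Load
MultiLayerNetwork restored = MultiLayerNetwork.load(locationToSave, true);
```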
Similarly, you can save/load a ComputationGraph using:
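Under the same assumption, for a `ComputationGraph`:

```java
import java.io.File;
import org.deeplearning4j.nn.graph.ComputationGraph;

File locationToSave = new File("my-computation-graph.zip");    // hypothetical path

graph.save(locationToSave, true);                              // true = save the updater state
ComputationGraph restoredGraph = ComputationGraph.load(locationToSave, true);
```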
Internally, these methods use the ModelSerializer
class, which handles loading and saving models. There are two ways of saving models shown in the linked examples: the first saves a normal MultiLayerNetwork, the second saves a ComputationGraph.
Here is an example of saving a computation graph using the ModelSerializer
class, as well as an example of using ModelSerializer to save a neural net built using a MultiLayerConfiguration.
If your model uses probabilities (i.e. DropOut/DropConnect), it may make sense to save the RNG seed separately and apply it after the model is restored, i.e.:
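A sketch of that idea (the seed value and file path are arbitrary placeholders):

```java
import java.io.File;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.factory.Nd4j;

File modelFile = new File("my-model.zip");   // hypothetical path

// Fix the RNG seed before restoring, so stochastic layers (DropOut/DropConnect) behave identically
Nd4j.getRandom().setSeed(12345);
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(modelFile);
```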
This will guarantee equal results between sessions/JVMs.
Utility class suited to save/restore neural net models
writeModel
Write a model to a file
param model the model to write
param file the file to write to
param saveUpdater whether to save the updater or not
throws IOException
writeModel
Write a model to a file
param model the model to write
param file the file to write to
param saveUpdater whether to save the updater or not
param dataNormalization the normalizer to save (optional)
throws IOException
writeModel
Write a model to a file path
param model the model to write
param path the path to write to
param saveUpdater whether to save the updater or not
throws IOException
writeModel
Write a model to an output stream
param model the model to save
param stream the output stream to write to
param saveUpdater whether to save the updater for the model or not
throws IOException
writeModel
Write a model to an output stream
param model the model to save
param stream the output stream to write to
param saveUpdater whether to save the updater for the model or not
param dataNormalization the normalizer to save (may be null)
throws IOException
restoreMultiLayerNetwork
Load a multi layer network from a file
param file the file to load from
return the loaded multi layer network
throws IOException
restoreMultiLayerNetwork
Load a multi layer network from a file
param file the file to load from
return the loaded multi layer network
throws IOException
restoreMultiLayerNetwork
Load a MultiLayerNetwork from an input stream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is the inputstream to load from
return the loaded multi layer network
throws IOException
see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)
restoreMultiLayerNetwork
Restore a multi layer network from an input stream Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is the input stream to restore from
return the loaded multi layer network
throws IOException
see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)
restoreMultiLayerNetwork
Load a MultiLayerNetwork model from a file
param path path to the model file to load the network from
return the loaded multi layer network
throws IOException
restoreMultiLayerNetwork
Load a MultiLayerNetwork model from a file
param path path to the model file to load the network from
return the loaded multi layer network
throws IOException
restoreComputationGraph
Restore a MultiLayerNetwork and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is Input stream to read from
param loadUpdater Whether to load the updater from the model or not
return Model and normalizer, if present
throws IOException If an error occurs when reading from the stream
restoreComputationGraph
Load a computation graph from a file
param path path to the model file, to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraph
Load a computation graph from an InputStream
param is the inputstream to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraph
Load a computation graph from an InputStream
param is the inputstream to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraph
Load a computation graph from a file
param file the file to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraph
Restore a ComputationGraph and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is Input stream to read from
param loadUpdater Whether to load the updater from the model or not
return Model and normalizer, if present
throws IOException If an error occurs when reading from the stream
taskByModel
param model
return
addNormalizerToModel
This method appends normalizer to a given persisted model.
PLEASE NOTE: File should be model file saved earlier with ModelSerializer
param f
param normalizer
addObjectToFile
Add an object to the (already existing) model file using Java Object Serialization. Objects can be restored using {- link #getObjectFromFile(File, String)}
param f File to add the object to
param key Key to store the object under
param o Object to store using Java object serialization
Simple and sequential network configuration.
The MultiLayerNetwork
class is the simplest network configuration API available in Eclipse Deeplearning4j. This class is useful for beginners or users who do not need a complex and branched network graph.
You will not want to use MultiLayerNetwork
configuration if you are creating complex loss functions, using graph vertices, or doing advanced training such as a triplet network. This includes popular complex networks such as InceptionV4.
The example below shows how to build a simple linear classifier using DenseLayer
(a basic multilayer perceptron layer).
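A minimal sketch; the layer sizes, seed and learning rate are illustrative:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;

int numInputs = 4, numHidden = 10, numClasses = 3;   // example sizes

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .updater(new Sgd(0.1))
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(new DenseLayer.Builder().nIn(numInputs).nOut(numHidden)
                .activation(Activation.RELU).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(numHidden).nOut(numClasses)
                .activation(Activation.SOFTMAX).build())
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
```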
You can also create convolutional configurations:
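For example, a small LeNet-style sketch (reusing the imports from the previous snippet; kernel sizes and layer widths are illustrative):

```java
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;

MultiLayerConfiguration convConf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .updater(new Sgd(0.01))
        .list()
        .layer(new ConvolutionLayer.Builder(5, 5)
                .nIn(1).nOut(20).stride(1, 1)
                .activation(Activation.RELU).build())
        .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2, 2).stride(2, 2).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10).activation(Activation.SOFTMAX).build())
        .setInputType(InputType.convolutionalFlat(28, 28, 1))   // lets DL4J infer nIn for later layers
        .build();
```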
Special algorithms for gradient descent.
The main difference among the updaters is how they treat the learning rate. Stochastic Gradient Descent, the most common learning algorithm in deep learning, relies on Theta
(the weights in hidden layers) and alpha
(the learning rate). Different updaters help optimize the learning rate until the neural network converges on its most performant state.
To use the updaters, pass a new updater instance to the updater()
method when building the configuration for a ComputationGraph
or MultiLayerNetwork
.
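A sketch of how an updater is plugged into a configuration builder; the learning rate and momentum values are illustrative:

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.learning.config.Adam;

// Pass an updater instance while building the configuration, e.g. Adam with a 1e-3 learning rate.
// Swapping in new Nesterovs(0.01, 0.9) or new RmsProp(0.001) works the same way.
NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .updater(new Adam(1e-3));
```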
The Nadam updater.
applyUpdater
Calculate the update based on the given gradient
param gradient the gradient to get the update for
param iteration
return the gradient
Nesterov’s momentum. Keeps track of the gradient from the previous iteration and uses it when computing the current update.
applyUpdater
Get the nesterov update
param gradient the gradient to get the update for
param iteration
return
RMS Prop updates:
Vectorized Learning Rate used per Connection Weight
applyUpdater
Gets feature specific learning rates Adagrad keeps a history of gradients being passed in. Note that each gradient passed in becomes adapted over time, hence the opName adagrad
param gradient the gradient to get learning rates for
param iteration
applyUpdater
Calculate the update based on the given gradient
param gradient the gradient to get the update for
param iteration
return the gradient
NoOp updater: gradient updater that makes no changes to the gradient
applyUpdater
Calculate the update based on the given gradient
param gradient the gradient to get the update for
param iteration
return the gradient
Ada delta updater. A more robust AdaGrad that keeps track of a moving window average of the gradient rather than the ever-decaying learning rates of AdaGrad
applyUpdater
Get the updated gradient for the given gradient and also update the state of ada delta.
param gradient the gradient to get the updated gradient for
param iteration
return the updated gradient
SGD updater applies a learning rate only
Gradient modifications: Calculates an update and tracks related information for gradient changes over time for handling updates.
Neural word embeddings for NLP in DL4J.
Contents
Word2vec is a two-layer neural net that processes text. Its input is a text corpus and its output is a set of vectors: feature vectors for words in that corpus. While Word2vec is not a deep neural network, it turns text into a numerical form that deep nets can understand.
Word2vec's applications extend beyond parsing sentences in the wild. It can be applied just as well to other discrete, symbolic data (genes, code, playlists, social graphs) in which patterns may be discerned.
Why? Because words are simply discrete states like the other data mentioned above, and we are simply looking for the transitional probabilities between those states: the likelihood that they will co-occur. So gene2vec, like2vec and follower2vec are all possible. With that in mind, the tutorial below will help you understand how to create neural embeddings for any group of discrete and co-occurring states.
The purpose and usefulness of Word2vec is to group the vectors of similar words together in vectorspace. That is, it detects similarities mathematically. Word2vec creates vectors that are distributed numerical representations of word features, features such as the context of individual words. It does so without human intervention.
Given enough data, usage and contexts, Word2vec can make highly accurate guesses about a word’s meaning based on past appearances. Those guesses can be used to establish a word's association with other words (e.g. "man" is to "boy" what "woman" is to "girl"), or cluster documents and classify them by topic. Those clusters can form the basis of search, sentiment analysis and recommendations in such diverse fields as scientific research, legal discovery, e-commerce and customer relationship management.
The output of the Word2vec neural net is a vocabulary in which each item has a vector attached to it, which can be fed into a deep-learning net or simply queried to detect relationships between words.
Here's a list of words associated with "Sweden" using Word2vec, in order of proximity:
The nations of Scandinavia and several wealthy, northern European, Germanic countries are among the top nine.
The vectors we use to represent words are called neural word embeddings, and representations are strange. One thing describes another, even though those two things are radically different. As Elvis Costello said: "Writing about music is like dancing about architecture." Word2vec "vectorizes" about words, and by doing so it makes natural language computer-readable -- we can start to perform powerful mathematical operations on words to detect their similarities.
So a neural word embedding represents a word with numbers. It's a simple, yet unlikely, translation.
It does so in one of two ways, either using context to predict a target word (a method known as continuous bag of words, or CBOW), or using a word to predict a target context, which is called skip-gram. We use the latter method because it produces more accurate results on large datasets.
When the feature vector assigned to a word cannot be used to accurately predict that word's context, the components of the vector are adjusted. Each word's context in the corpus is the teacher sending error signals back to adjust the feature vector. The vectors of words judged similar by their context are nudged closer together by adjusting the numbers in the vector.
Just as Van Gogh's painting of sunflowers is a two-dimensional mixture of oil on canvas that represents vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.
Those numbers locate each word as a point in 500-dimensional vectorspace. Spaces of more than three dimensions are difficult to visualize. (Geoff Hinton, teaching people to imagine 13-dimensional space, suggests that students first picture 3-dimensional space and then say to themselves: "Thirteen, thirteen, thirteen." :)
A well trained set of word vectors will place similar words close to each other in that space. The words oak, elm and birch might cluster in one corner, while war, conflict and strife huddle together in another.
Similar things and ideas are shown to be "close". Their relative meanings have been translated to measurable distances. Qualities become quantities, and algorithms can do their work. But similarity is just the basis of many associations that Word2vec can learn. For example, it can gauge relations between words of one language, and map them to another.
These vectors are the basis of a more comprehensive geometry of words. As shown in the graph, capital cities such as Rome, Paris, Berlin and Beijing cluster near each other, and they will each have similar distances in vectorspace to their countries; i.e. Rome - Italy = Beijing - China. If you only knew that Rome was the capital of Italy, and were wondering about the capital of China, then the equation Rome - Italy + China would return Beijing. No kidding.
Let's look at some other associations Word2vec can produce.
Instead of the pluses, minus and equals signs, we'll give you the results in the notation of logical analogies, where :
means "is to" and ::
means "as"; e.g. "Rome is to Italy as Beijing is to China" = Rome:Italy::Beijing:China
. In the last spot, rather than supplying the "answer", we'll give you the list of words that a Word2vec model proposes, when given the first three elements:
Geopolitics: Iraq - Violence = Jordan
Distinction: Human - Animal = Ethics
President - Power = Prime Minister
Library - Books = Hall
Analogy: Stock Market ≈ Thermometer
By building a sense of one word's proximity to other similar words, which do not necessarily contain the same letters, we have moved beyond hard tokens to a smoother and more general sense of meaning.
Here are Deeplearning4j's natural-language processing components:
SentenceIterator/DocumentIterator: Used to iterate over a dataset. A SentenceIterator returns strings and a DocumentIterator works with inputstreams.
Tokenizer/TokenizerFactory: Used in tokenizing the text. In NLP terms, a sentence is represented as a series of tokens. A TokenizerFactory creates an instance of a tokenizer for a "sentence."
VocabCache: Used for tracking metadata including word counts, document occurrences, the set of tokens (not vocab in this case, but rather tokens that have occurred), vocab (the features included in both bag of words as well as the word vector lookup table)
Inverted Index: Stores metadata about where words occurred. Can be used for understanding the dataset. A Lucene index with the Lucene implementation[1] is automatically created.
Now create and name a new class in Java. After that, you'll take the raw sentences in your .txt file, traverse them with your iterator, and subject them to some sort of preprocessing, such as converting all words to lowercase.
If you want to load a text file besides the sentences provided in our example, you'd do this:
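A sketch; the file path below is a hypothetical placeholder for your own .txt file:

```java
import java.io.File;
import org.deeplearning4j.text.sentenceiterator.LineSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentencePreProcessor;

// Point the iterator at your own file and lowercase each sentence as it is read
SentenceIterator iter = new LineSentenceIterator(new File("/absolute/path/to/your/file.txt"));
iter.setPreProcessor(new SentencePreProcessor() {
    @Override
    public String preProcess(String sentence) {
        return sentence.toLowerCase();
    }
});
```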
That is, get rid of the ClassPathResource
and feed the absolute path of your .txt
file into the LineSentenceIterator
.
In bash, you can find the absolute file path of any directory by typing pwd
in your command line from within that same directory. To that path, you'll add the file name and voila.
Word2vec needs to be fed words rather than whole sentences, so the next step is to tokenize the data. To tokenize a text is to break it up into its atomic units, creating a new token each time you hit a white space, for example.
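For example, using DL4J's default whitespace tokenizer with a common token preprocessor:

```java
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

// Split sentences on whitespace; strip punctuation and normalize case per token
TokenizerFactory t = new DefaultTokenizerFactory();
t.setTokenPreProcessor(new CommonPreprocessor());
```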
That should give you one word per line.
Now that the data is ready, you can configure the Word2vec neural net and feed in the tokens.
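A sketch of the configuration, reusing the `iter` and `t` objects created above; the hyperparameter values shown are illustrative defaults:

```java
import org.deeplearning4j.models.word2vec.Word2Vec;

Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)
        .layerSize(100)
        .seed(42)
        .windowSize(5)
        .learningRate(0.025)
        .minLearningRate(0.001)
        .batchSize(1000)
        .useAdaGrad(false)
        .iterate(iter)            // the SentenceIterator created above
        .tokenizerFactory(t)      // the TokenizerFactory created above
        .build();

vec.fit();
```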
This configuration accepts a number of hyperparameters. A few require some explanation:
batchSize is the number of words you process at a time.
minWordFrequency is the minimum number of times a word must appear in the corpus. Here, if it appears less than 5 times, it is not learned. Words must appear in multiple contexts to learn useful features about them. In very large corpora, it's reasonable to raise the minimum.
useAdaGrad - Adagrad creates a different gradient for each feature. Here we are not concerned with that.
layerSize specifies the number of features in the word vector. This is equal to the number of dimensions in the featurespace. Words represented by 500 features become points in a 500-dimensional space.
learningRate is the step size for each update of the coefficients, as words are repositioned in the feature space.
minLearningRate is the floor on the learning rate. Learning rate decays as the number of words you train on decreases. If learning rate shrinks too much, the net's learning is no longer efficient. This keeps the coefficients moving.
iterate tells the net what batch of the dataset it's training on.
tokenizer feeds it the words from the current batch.
vec.fit() tells the configured net to begin training.
The next step is to evaluate the quality of your feature vectors.
The line vec.similarity("word1","word2")
will return the cosine similarity of the two words you enter. The closer it is to 1, the more similar the net perceives those words to be (see the Sweden-Norway example above). For example:
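```java
double cosSim = vec.similarity("day", "night");
System.out.println(cosSim);   // closer to 1.0 means the net considers the words more similar
```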
With vec.wordsNearest("word1", numWordsNearest)
, the words printed to the screen allow you to eyeball whether the net has clustered semantically similar words. You can set the number of nearest words you want with the second parameter of wordsNearest. For example:
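```java
import java.util.Collection;

Collection<String> lst = vec.wordsNearest("day", 10);   // the 10 nearest words to "day"
System.out.println(lst);
```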
You'll want to save the model. The normal way to save models in Deeplearning4j is via the serialization utils (Java serialization is akin to Python pickling, converting an object into a series of bytes).
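A sketch using the text-format writer (method names vary slightly between DL4J versions):

```java
import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;

// Write the learned vectors to a plain-text file, one word per line
WordVectorSerializer.writeWordVectors(vec, "pathToSaveModel.txt");
```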
This will save the vectors to a file called pathToSaveModel.txt
that will appear in the root of the directory where Word2vec is trained. The output in the file should have one word per line, followed by a series of numbers that together are its vector representation.
To keep working with the vectors, simply call methods on vec
like this:
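```java
import java.util.Arrays;
import java.util.Collection;

// positive words, negative words, number of nearest words to return
Collection<String> kingList = vec.wordsNearest(
        Arrays.asList("king", "woman"), Arrays.asList("queen"), 10);
```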
The classic example of Word2vec's arithmetic of words is "king - queen = man - woman" and its logical extension "king - queen + woman = man".
The example above will output the 10 nearest words to the vector king - queen + woman
, which should include man
. The first parameter for wordsNearest has to include the "positive" words king
and woman
, which have a + sign associated with them; the second parameter includes the "negative" word queen
, which is associated with the minus sign (positive and negative here have no emotional connotation); the third is the length of the list of nearest words you would like to see. Remember to add this to the top of the file: import java.util.Arrays;
.
Any number of combinations is possible, but they will only return sensible results if the words you query occurred with enough frequency in the corpus. Obviously, the ability to return similar words (or documents) is at the foundation of both search and recommendation engines.
You can reload the vectors into memory like this:
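A sketch; depending on your DL4J version, `loadTxtVectors(...)` may be the appropriate call instead:

```java
import java.io.File;
import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.word2vec.Word2Vec;

Word2Vec word2Vec = WordVectorSerializer.readWord2VecModel(new File("pathToSaveModel.txt"));
```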
You can then use Word2vec as a lookup table:
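```java
double[] wordVector = word2Vec.getWordVector("myword");   // all zeros if "myword" is out of vocabulary
```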
If the word isn't in the vocabulary, Word2vec returns zeros.
Remember to add import java.io.File;
to your imported packages.
Words are read into the vector one at a time, and scanned back and forth within a certain range. Those ranges are n-grams, and an n-gram is a contiguous sequence of n items from a given linguistic sequence; it is the nth version of unigram, bigram, trigram, four-gram or five-gram. A skip-gram simply drops items from the n-gram.
The skip-gram representation popularized by Mikolov and used in the DL4J implementation has proven to be more accurate than other models, such as continuous bag of words, due to the more generalizable contexts generated.
This n-gram is then fed into a neural network to learn the significance of a given word vector; i.e. significance is defined as its usefulness as an indicator of certain larger meanings, or labels.
Q: I get a lot of stack traces like this
A: Look inside the directory where you started your Word2vec application. This can, for example, be an IntelliJ project home directory or the directory where you typed Java at the command line. It should have some directories that look like:
You can shut down your Word2vec application and try to delete them.
Q: Not all of the words from my raw text data are appearing in my Word2vec object…
A: Try to raise the layer size via .layerSize() on your Word2Vec object, like so:
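A sketch only; the other builder options are the same as in the configuration example above:

```java
Word2Vec vec = new Word2Vec.Builder()
        .layerSize(300)   // raise the vector dimensionality (the example above used 100)
        // ... same iterator/tokenizer options as before ...
        .build();
```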
Q: How do I load my data? Why does training take forever?
A: If all of your sentences have been loaded as one sentence, Word2vec training could take a very long time. That's because Word2vec is a sentence-level algorithm, so sentence boundaries are very important: co-occurrence statistics are gathered sentence by sentence. (For GloVe, sentence boundaries don't matter, because it looks at corpus-wide co-occurrence.) For many corpora, average sentence length is six words. That means that with a window size of 5 you have, say, 30 (random number here) rounds of skip-gram calculations. If you forget to specify your sentence boundaries, you may load a "sentence" that's 10,000 words long. In that case, Word2vec would attempt a full skip-gram cycle for the whole 10,000-word "sentence". In DL4J's implementation, a line is assumed to be a sentence. You need to plug in your own SentenceIterator and Tokenizer. By asking you to specify how your sentences end, DL4J remains language-agnostic. UimaSentenceIterator is one way to do that; it uses OpenNLP for sentence boundary detection.
Q: Why is there such a difference in performance when feeding whole documents as one "sentence" vs splitting into Sentences?
A: If the average sentence contains 6 words and the window size is 5, no word ever reaches the theoretical maximum of 10 skip-gram rounds: the sentence simply isn't long enough to fill the full window around any word. Roughly 5 skip-gram rounds per word is the most you can get in such a sentence.
But if your "sentence" is 1,000,000 words long, you'll have 10 skip-gram rounds for every word in it, excluding the first five and the last five. So you'll have to spend far more time building the model, and the co-occurrence statistics will be skewed by the absence of sentence boundaries.
Q: How does Word2Vec Use Memory?
A: The major memory consumer in Word2vec is the weights matrix. The math is simple: NumberOfWords x NumberOfDimensions x 2 x DataTypeSize.
So, if you build a Word2vec model for 100k words using floats and 100 dimensions, your memory footprint will be 100k x 100 x 2 x 4 (float size) = 80MB of RAM just for the matrix, plus some space for strings, variables, threads etc.
If you load a pre-built model, it uses roughly half as much RAM as during build time, so about 40MB of RAM.
The most popular model used so far is the Google News model. It has 3M words and a vector size of 300, which gives 3.6GB just to load the model. On top of that you have to add 3M strings, which do not have a constant size in Java. So a loaded model usually takes around 4-6GB, depending on the JVM version/vendor, GC state and the phase of the moon.
Q: I did everything you said and the results still don't look right.
A: Make sure you're not running into normalization issues. Some tasks, like wordsNearest(), use normalized weights by default, and others require non-normalized weights. Pay attention to this difference.
Word2Vec is especially useful in preparing text-based data for information retrieval and QA systems, which DL4J implements with deep autoencoders.
Marketers might seek to establish relationships among products to build a recommendation engine. Investigators might analyze a social graph to surface members of a single group, or other relations they might have to location or financial sponsorship.
Loading and saving GloVe models to word2vec can be done like so:
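A sketch for loading GloVe vectors stored in the standard text format; the file name below is hypothetical:

```java
import java.io.File;
import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;

WordVectors gloveVectors = WordVectorSerializer.loadTxtVectors(new File("glove.6B.50d.txt"));
```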
Support for updating weights after model serialization/deserialization has been added. That is, you can update the model state with, say, 200GB of new text by calling loadFullModel
, adding TokenizerFactory
and SentenceIterator
to it, and calling fit()
on the restored model.
An option to use multiple data sources for vocab construction has been added.
Epochs and Iterations can be specified separately, although they are both typically "1".
Word2Vec.Builder has this option: hugeModelExpected
. If set to true
, the vocab will be periodically truncated during the build.
While minWordFrequency
is useful for ignoring rare words in the corpus, any number of specific words can also be excluded for customization.
Two new WordVectorSerializer methods have been introduced: writeFullModel
and loadFullModel
. These save and load a full model state.
A decent workstation should be able to handle a vocab with a few million words. Deeplearning4j's Word2vec implementation can model a few terabytes of data on a single machine. Roughly, the math is: vectorSize * 4 * 3 * vocab.size()
.
Adding hooks and listeners on DL4J models.
Listeners allow users to "hook" into certain events in Eclipse Deeplearning4j. This allows you to collect or print information useful for tasks like training. For example, a ScoreIterationListener
allows you to print training scores from the output layer of a neural network.
To add one or more listeners to a MultiLayerNetwork
or ComputationGraph
, use the addListeners
method:
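A sketch, assuming an existing configuration `conf` (older releases expose the equivalent setListeners call instead):

```java
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.addListeners(new ScoreIterationListener(10));   // print the score every 10 iterations
```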
This TrainingListener implementation provides a simple way to evaluate a model during training. It can be launched every Xth iteration/epoch, depending on the frequency and InvocationType constructor arguments
EvaluativeListener
This callback will be invoked after evaluation finished
iterationDone
param iterator Iterator to provide data for evaluation
param frequency Frequency (in number of iterations/epochs according to the invocation type) to perform evaluation
param type Type of value for ‘frequency’ - iteration end, epoch end, etc
Score iteration listener. Reports the score (value of the loss function) of the network during training every N iterations
ScoreIterationListener
param printIterations frequency with which to print scores (i.e., every printIterations parameter updates)
A group of listeners
CollectScoresIterationListener simply stores the model scores internally (along with the iteration) every 1 or N iterations (this is configurable). These scores can then be obtained or exported.
CollectScoresIterationListener
Constructor for collecting scores with default saving frequency of 1
iterationDone
Constructor for collecting scores with the specified frequency.
param frequency Frequency with which to collect/save scores
exportScores
Export the scores in tab-delimited (one per line) UTF-8 format.
exportScores
Export the scores in delimited (one per line) UTF-8 format with the specified delimiter
param outputStream Stream to write to
param delimiter Delimiter to use
exportScores
Export the scores to the specified file in delimited (one per line) UTF-8 format, tab delimited
param file File to write to
exportScores
Export the scores to the specified file in delimited (one per line) UTF-8 format, using the specified delimiter
param file File to write to
param delimiter Delimiter to use for writing scores
CheckpointListener: the goal of this listener is to periodically save a copy of the model during training. Model saving may be done:
Every N epochs
Every N iterations
Every T time units (every 15 minutes, for example)
Or some combination of the three.
Example 1: Saving a checkpoint every 2 epochs, keeping all model files
Example 2: Saving a checkpoint every 1000 iterations, but keeping only the last 3 models (all older model files will be automatically deleted)
Example 3: Saving a checkpoint every 15 minutes, keeping the most recent 3 and otherwise every 4th checkpoint file:
Note that you can mix these: for example, you can save every epoch and every 15 minutes (independent of the last save time), or save every epoch and every 15 minutes since the last model save. Note that in this last case, the sinceLast parameter is true, which means the 15-minute counter is reset any time a model is saved.
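A sketch roughly matching example 2 above; the builder method names and the save directory are assumptions that may vary by DL4J version, and `net` is an already-initialized network:

```java
import java.io.File;
import org.deeplearning4j.optimize.listeners.CheckpointListener;

// Save every 1000 iterations, keeping only the last 3 checkpoints on disk
CheckpointListener checkpoints = new CheckpointListener.Builder(new File("/save/dir"))
        .saveEveryNIterations(1000)
        .keepLast(3)
        .build();
net.addListeners(checkpoints);
```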
CheckpointListener
List all available checkpoints. A checkpoint is ‘available’ if the file can be loaded. Any checkpoint files that have been automatically deleted (given the configuration) will not be returned here.
return List of checkpoint files that can be loaded
This TrainingListener implementation provides a way to “sleep” during specific Neural Network training phases. Suitable for debugging/testing purposes only.
PLEASE NOTE: All timers treat time values as milliseconds. PLEASE NOTE: Do not use it in a production environment.
onEpochStart
In this mode, a parkNanos() call will be used to make the process truly idle
A simple listener that collects scores to a list every N iterations. Can also optionally log the score.
Simple IterationListener that tracks time spent on training per iteration.
PerformanceListener
This method defines, if iteration number should be reported together with other data
param reportIteration
return
An iteration listener that provides details on parameters and gradients at each iteration during training. It attempts to provide much of the same information as the UI histogram iteration listener, but in a text-based format (for example, when learning on a system accessed via SSH), and is intended to aid network tuning and debugging. This iteration listener calculates the mean, min, max, and mean absolute value of each type of parameter and gradient in the network at each iteration.
Time Iteration Listener. This listener logs (at INFO level) the remaining time in minutes and the estimated end date of the process. Remaining time is estimated from the training time so far and the total number of iterations specified by the user.
TimeIterationListener
Constructor
param iterationCount The global number of iteration for training (all epochs)
Computation graph nodes for advanced configuration.
In Eclipse Deeplearning4j a vertex is a type of layer that acts as a node in a ComputationGraph
. It can accept multiple inputs, provide multiple outputs, and can help construct popular networks such as InceptionV4.
L2NormalizeVertex performs L2 normalization on a single input.
L2Vertex calculates the L2 least squares error of two inputs.
For example, in Triplet Embedding you can input an anchor and a pos/neg class and use two parallel L2 vertices to calculate two real numbers which can be fed into a LossLayer to calculate TripletLoss.
A custom layer for removing the first column and row from an input. This is meant to allow importation of Caffe’s GoogLeNet.
Adds the ability to reshape and flatten the tensor in the computation graph. This is the equivalent to the next layer. ReshapeVertex also ensures the shape is valid for the backward pass.
A ScaleVertex is used to scale the size of activations of a single layer For example, ResNet activations can be scaled in repeating blocks to keep variance under control.
A ShiftVertex is used to shift the activations of a single layer. One could use it to add a bias or as part of some other calculation. For example, Highway Layers need them in two places. First, it is often useful to give the gate weights a large negative bias (of course, for this we could just initialize the biases that way). But the layer also needs to compute (1 - sigmoid(W·x + b)) ⊙ x + sigmoid(W·x + b) ⊙ activation(W2·x + b2), where ⊙ is the Hadamard (element-wise) product. So, here, we could have
a DenseLayer that does the sigmoid
a ScaleVertex(-1) and
a ShiftVertex(1) to accomplish that.
StackVertex allows for stacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where shared parameters are not supported by the network.
This vertex will automatically stack all available inputs.
UnstackVertex allows for unstacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where embeddings can be separated and run through subsequent layers.
Works similarly to SubsetVertex, except on dimension 0 of the input. stackSize is explicitly defined by the user to properly calculate each step.
ReverseTimeSeriesVertex is used in recurrent neural networks to reverse the order of a time series. As a result, the last time step is moved to the beginning of the time series and the first time step is moved to the end. This allows recurrent layers to process time series backwards.
Masks: The input might be masked (to allow for varying time series lengths in one minibatch). In this case the present input (mask array = 1) will be reversed in place and the padding (mask array = 0) will be left untouched at the same place. For a time series of length n, this would normally mean that the first n time steps are reversed and the following padding is left untouched, but more complex masks are supported (e.g. [1, 0, 1, 0, …]).
setBackpropGradientsViewArray
Gets the current mask array from the provided input
return The mask or null, if no input was provided
Recurrent Neural Network (RNN) implementations in DL4J.
This document outlines the specific training features for recurrent neural networks and the practicalities of how to use them in DeepLearning4J. It is not an introduction to recurrent neural networks; it assumes some familiarity with both their use and their terminology.
DL4J currently supports the following types of recurrent neural network:
RNN ("vanilla" RNN)
LSTM (Long Short-Term Memory)
Java documentation for each is available.
Consider for the moment a standard feed-forward network (a multi-layer perceptron or 'DenseLayer' in DL4J). These networks expect input and output data that is two-dimensional: that is, data with "shape" [numExamples,inputSize]. This means that the data into a feed-forward network has ‘numExamples’ rows/examples, where each row consists of ‘inputSize’ columns. A single example would have shape [1,inputSize], though in practice we generally use multiple examples for computational and optimization efficiency. Similarly, output data for a standard feed-forward network is also two dimensional, with shape [numExamples,outputSize].
Conversely, data for RNNs are time series. Thus, they have 3 dimensions: one additional dimension for time. Input data thus has shape [numExamples,inputSize,timeSeriesLength], and output data has shape [numExamples,outputSize,timeSeriesLength]. This means that the data in our INDArray is laid out such that the value at position (i,j,k) is the jth value at the kth time step of the ith example in the minibatch. This data layout is shown below.
When importing time series data using the class CSVSequenceRecordReader, each line in the data files represents one time step, with the earliest observation in the first row (or the first row after the header, if present) and the most recent observation in the last row of the csv. Each feature time series is a separate column of the csv file. For example, if you have five features in a time series, each with 120 observations, and a training & test set of size 53, then there will be 106 csv files (53 input, 53 labels). The 53 input csv files will each have five columns and 120 rows. The label csv files will have one column (the label) and one row.
RnnOutputLayer is a type of layer used as the final layer with many recurrent neural network systems (for both regression and classification tasks). RnnOutputLayer handles things like score calculation, and error calculation (of prediction vs. actual) given a loss function etc. Functionally, it is very similar to the 'standard' OutputLayer class (which is used with feed-forward networks); however it both outputs (and expects as labels/targets) 3d time series data sets.
Configuration for the RnnOutputLayer follows the same design as other layers: for example, to set the third layer in a MultiLayerNetwork to a RnnOutputLayer for classification:
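A sketch of such a configuration; the layer widths and the LSTM layers preceding the output layer are illustrative:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

int nIn = 5;     // number of input features per time step (illustrative)
int nOut = 3;    // number of output classes (illustrative)

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(0, new LSTM.Builder().nIn(nIn).nOut(100).activation(Activation.TANH).build())
        .layer(1, new LSTM.Builder().nIn(100).nOut(100).activation(Activation.TANH).build())
        .layer(2, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(100).nOut(nOut).build())
        .build();
```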
Use of RnnOutputLayer in practice can be seen in the examples, linked at the end of this document.
Training neural networks (including RNNs) can be quite computationally demanding. For recurrent neural networks, this is especially the case when we are dealing with long sequences - i.e., training data with many time steps.
Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. In summary, it allows us to train networks faster (by performing more frequent parameter updates), for a given amount of computational power. It is recommended to use truncated BPTT when your input sequences are long (typically, more than a few hundred time steps).
Consider what happens when training a recurrent neural network with a time series of length 12 time steps. Here, we need to do a forward pass of 12 steps, calculate the error (based on predicted vs. actual), and do a backward pass of 12 time steps:
For 12 time steps, in the image above, this is not a problem. Consider, however, that instead the input time series was 10,000 or more time steps. In this case, standard backpropagation through time would require 10,000 time steps for each of the forward and backward passes for each and every parameter update. This is of course very computationally demanding.
In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. The specific length of these forward/backward pass segments is a parameter set by the user. For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:
Note that the overall complexity for truncated BPTT and standard BPTT is approximately the same - both do the same number of time steps during the forward/backward pass. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort. However, the cost is not exactly the same: there is a small amount of overhead per parameter update.
The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don't flow back far enough to cause the required parameter updates that would store the required information. This tradeoff is usually worth it, and (as long as the truncated BPTT lengths are set appropriately), truncated BPTT works well in practice.
Using truncated BPTT in DL4J is quite simple: just add the following to your network configuration (at the end, before the final .build()):
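A sketch of the relevant builder fragment (not a complete configuration on its own):

```java
// At the end of the configuration, just before the final .build():
.backpropType(BackpropType.TruncatedBPTT)
.tBPTTLength(100)
```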
The above code snippet will cause any network training (i.e., calls to MultiLayerNetwork.fit() methods) to use truncated BPTT with segments of length 100 steps.
Some things of note:
By default (if a backprop type is not manually specified), DL4J will use BackpropType.Standard (i.e., full BPTT).
The tBPTTLength configuration parameter sets the length of the truncated BPTT passes. Typically, this is somewhere on the order of 50 to 200 time steps, though this depends on the application and data.
The truncated BPTT length is typically a fraction of the total time series length (e.g., 200 vs. sequence length 1000). Variable length time series in the same minibatch are OK when using TBPTT (for example, a minibatch with two sequences - one of length 100 and another of length 1000 - with a TBPTT length of 200 will work correctly)
DL4J supports a number of related training features for RNNs, based on the idea of padding and masking. Padding and masking allow us to support training situations such as one-to-many and many-to-one, as well as variable length time series (in the same mini-batch).
Suppose we want to train a recurrent neural network with inputs or outputs that don't occur at every time step. Examples of this (for a single example) are shown in the image below. DL4J supports training networks for all of these situations:
Without masking and padding, we are restricted to the many-to-many case (above, left): that is, (a) All examples are of the same length, and (b) Examples have both inputs and outputs at all time steps.
The idea behind padding is simple. Consider two time series of lengths 50 and 100 time steps, in the same mini-batch. The training data is a rectangular array; thus, we pad (i.e., add zeros to) the shorter time series (for both input and output), such that the input and output are both the same length (in this example: 100 time steps).
Of course, if this was all we did, it would cause problems during training. Thus, in addition to padding, we use a masking mechanism. The idea behind masking is simple: we have two additional arrays that record whether an input or output is actually present for a given time step and example, or whether the input/output is just padding.
Recall that with RNNs, our minibatch data has 3 dimensions, with shape [miniBatchSize,inputSize,timeSeriesLength] and [miniBatchSize,outputSize,timeSeriesLength] for the input and output respectively. The mask arrays are then 2 dimensional, with shape [miniBatchSize,timeSeriesLength] for both the input and output, with values of 0 ('absent') or 1 ('present') for each time step and example. The masking arrays for the input and output are stored in separate arrays.
For a single example, the input and output masking arrays are shown below:
For the “Masking not required” cases, we could equivalently use a masking array of all 1s, which will give the same result as not having a mask array at all. Also note that it is possible to use zero, one or two masking arrays when learning RNNs - for example, the many-to-one case could have a masking array for the output only.
In practice: these padding arrays are generally created during the data import stage (for example, by the SequenceRecordReaderDatasetIterator – discussed later), and are contained within the DataSet object. If a DataSet contains masking arrays, the MultiLayerNetwork fit will automatically use them during training. If they are absent, no masking functionality is used.
Mask arrays are also important when doing scoring and evaluation (i.e., when evaluating the accuracy of an RNN classifier). Consider for example the many-to-one case: there is only a single output for each example, and any evaluation should take this into account.
The (output) mask arrays can be used during evaluation by passing them to the following method:
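A sketch; here `labels`, `predicted` and `outputMask` are the arrays described just below:

```java
import org.deeplearning4j.eval.Evaluation;

Evaluation evaluation = new Evaluation();
evaluation.evalTimeSeries(labels, predicted, outputMask);
System.out.println(evaluation.stats());
```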
where labels are the actual output (3d time series), predicted is the network predictions (3d time series, same shape as labels), and outputMask is the 2d mask array for the output. Note that the input mask array is not required for evaluation.
Score calculation will also make use of the mask arrays, via the MultiLayerNetwork.score(DataSet) method. Again, if the DataSet contains an output masking array, it will automatically be used when calculating the score (loss function - mean squared error, negative log likelihood etc) for the network.
Sequence classification is one common use of masking. The idea is that although we have a sequence (time series) as input, we only want to provide a single label for the entire sequence (rather than one label at each time step in the sequence).
However, RNNs by design output sequences of the same length as the input sequence. For sequence classification, masking allows us to train the network with this single label at the final time step - we essentially tell the network that there isn't actually any label data anywhere except for the last time step.
Now, suppose we've trained our network, and want to get the last time step for predictions, from the time series output array. How do we do that?
To get the last time step, there are two cases to be aware of. First, when we have a single example, we don't actually need to use the mask arrays: we can just get the last time step in the output array:
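A sketch, assuming a trained network `net` and a features array `timeSeriesFeatures` for a single example:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.indexing.NDArrayIndex;

INDArray timeSeriesOutput = net.output(timeSeriesFeatures);     // shape [1, nOut, timeSeriesLength]
int timeSeriesLength = (int) timeSeriesOutput.size(2);          // size of the time dimension
INDArray lastTimeStepProbabilities = timeSeriesOutput.get(
        NDArrayIndex.point(0), NDArrayIndex.all(), NDArrayIndex.point(timeSeriesLength - 1));
```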
Assuming classification (same process for regression, however) the last line above gives us probabilities at the last time step - i.e., the class probabilities for our sequence classification.
The slightly more complex case is when we have multiple examples in the one minibatch (features array), where the lengths of each example differ. (If all are the same length: we can use the same process as above).
In this 'variable length' case, we need to get the last time step for each example separately. If we have the time series lengths for each example from our data pipeline, it becomes straightforward: we just iterate over examples, replacing the timeSeriesLength
in the above code with the length of that example.
If we don't have the lengths of the time series directly, we need to extract them from the mask array.
If we have a labels mask array (which is a one-hot vector, like [0,0,0,1,0] for each time series):
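```java
// Index of the single 1 in each row of the labels mask = last (and only) labelled time step
INDArray lastTimeStepIndices = Nd4j.argMax(labelsMaskArray, 1);
```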
Alternatively, if we have only the features mask, one quick and dirty approach is to use this:
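```java
int longestTimeSeries = (int) featuresMaskArray.size(1);
INDArray linspace = Nd4j.linspace(1, longestTimeSeries, longestTimeSeries);
INDArray temp = featuresMaskArray.mulRowVector(linspace);       // e.g. [1,1,1,1,0] -> [1,2,3,4,0]
INDArray lastTimeStepIndices = Nd4j.argMax(temp, 1);            // position of the last non-zero entry
```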
To understand what is happening here, note that originally we have a features mask like [1,1,1,1,0], from which we want to get the last non-zero element. So we map [1,1,1,1,0] -> [1,2,3,4,0], and then get the largest element (which is the last time step).
In either case, we can then do the following:
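A sketch, reusing `timeSeriesOutput` and `lastTimeStepIndices` from the snippets above:

```java
int numExamples = (int) featuresMaskArray.size(0);
for (int i = 0; i < numExamples; i++) {
    int lastIdx = lastTimeStepIndices.getInt(i);
    // Class probabilities for example i at its own final time step:
    INDArray thisExampleProbabilities = timeSeriesOutput.get(
            NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(lastIdx));
}
```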
RNN layers in DL4J can be combined with other layer types. For example, it is possible to combine DenseLayer and LSTM layers in the same network; or combine Convolutional (CNN) layers and LSTM layers for video.
For example, to manually add a preprocessor between layers 1 and 2, add the following to your network configuration: .inputPreProcessor(2, new RnnToFeedForwardPreProcessor())
.
As with other types of neural networks, predictions can be generated for RNNs using the MultiLayerNetwork.output()
and MultiLayerNetwork.feedForward()
methods. These methods can be useful in many circumstances; however, they have the limitation that we can only generate predictions for time series, starting from scratch each and every time.
Consider for example the case where we want to generate predictions in a real-time system, where these predictions are based on a very large amount of history. In this case, it is impractical to use the output/feedForward methods, as they conduct the full forward pass over the entire data history each time they are called. If we wish to make a prediction for a single time step, at every time step, these methods can be both (a) very costly, and (b) wasteful, as they do the same calculations over and over.
For these situations, MultiLayerNetwork provides four methods of note:
rnnTimeStep(INDArray)
rnnClearPreviousState()
rnnGetPreviousState(int layer)
rnnSetPreviousState(int layer, Map<String,INDArray> state)
The rnnTimeStep() method is designed to allow forward pass (predictions) to be conducted efficiently, one or more steps at a time. Unlike the output/feedForward methods, the rnnTimeStep method keeps track of the internal state of the RNN layers when it is called. It is important to note that output for the rnnTimeStep and the output/feedForward methods should be identical (for each time step), whether we make these predictions all at once (output/feedForward) or whether these predictions are generated one or more steps at a time (rnnTimeStep). Thus, the only difference should be the computational cost.
In summary, the MultiLayerNetwork.rnnTimeStep() method does two things:
Generate output/predictions (forward pass), using the previous stored state (if any)
Update the stored state, storing the activations for the last time step (ready to be used next time rnnTimeStep is called)
For example, suppose we want to use a RNN to predict the weather, one hour in advance (based on the weather at say the previous 100 hours as input). If we were to use the output method, at each hour we would need to feed in the full 100 hours of data to predict the weather for hour 101. Then to predict the weather for hour 102, we would need to feed in the full 100 (or 101) hours of data; and so on for hours 103+.
Alternatively, we could use the rnnTimeStep method. Of course, if we want to use the full 100 hours of history before we make our first prediction, we still need to do the full forward pass:
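A sketch of this pattern (variable names such as `hours1To100` and `hour101Data` are hypothetical; the arrays come from your data pipeline):

```java
net.rnnClearPreviousState();                      // start from a blank state
// hours1To100: shape [1, nIn, 100] - the full history
INDArray predictionHour101 = net.rnnTimeStep(hours1To100);

// Later: to predict hour 102, feed in only the single new time step.
// hour101Data: shape [1, nIn] (or [1, nIn, 1]) - the stored state is used automatically
INDArray predictionHour102 = net.rnnTimeStep(hour101Data);
```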
The first time we call rnnTimeStep, the only practical difference between the two approaches is that the activations/state of the last time step are stored - this is shown in orange. However, the next time we use the rnnTimeStep method, this stored state will be used to make the next predictions:
There are a number of important differences here:
In the second image (second call of rnnTimeStep) the input data consists of a single time step, instead of the full history of data
The forward pass is thus a single time step (as compared to the hundreds – or more)
After the rnnTimeStep method returns, the internal state will automatically be updated. Thus, predictions for time 103 could be made in the same way as for time 102. And so on.
However, if you want to start making predictions for a new (entirely separate) time series: it is necessary (and important) to manually clear the stored state, using the MultiLayerNetwork.rnnClearPreviousState()
method. This will reset the internal state of all recurrent layers in the network.
If you need to store or set the internal state of the RNN for use in predictions, you can use the rnnGetPreviousState and rnnSetPreviousState methods, for each layer individually. This can be useful for example during serialization (network saving/loading), as the internal network state from the rnnTimeStep method is not saved by default, and must be saved and loaded separately. Note that these get/set state methods return and accept a map, keyed by the type of activation. For example, in the LSTM model, it is necessary to store both the output activations, and the memory cell state.
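A minimal sketch of saving and restoring the state of a single layer (using layer index 0 here is an assumption):

```java
// The map is keyed by activation type; for LSTM it contains both the
// output activations and the memory cell state
Map<String, INDArray> layer0State = net.rnnGetPreviousState(0);
// ... serialize the map alongside the network, then after loading:
net.rnnSetPreviousState(0, layer0State);
```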
Some other points of note:
We can use the rnnTimeStep method for multiple independent examples/predictions simultaneously. In the weather example above, we might for example want to make predictions for multiple locations using the same neural network. This works in the same way as training and the forward pass / output methods: multiple rows (dimension 0 in the input data) are used for multiple examples.
If no history/stored state is set (i.e., initially, or after a call to rnnClearPreviousState), a default initialization (zeros) is used. This is the same approach as during training.
The rnnTimeStep can be used for an arbitrary number of time steps simultaneously – not just one time step. However, it is important to note:
For a single time step prediction: the data is 2 dimensional, with shape [numExamples,nIn]; in this case, the output is also 2 dimensional, with shape [numExamples,nOut]
For multiple time step predictions: the data is 3 dimensional, with shape [numExamples,nIn,numTimeSteps]; the output will have shape [numExamples,nOut,numTimeSteps]. Again, the final time step activations are stored as before.
It is not possible to change the number of examples between calls of rnnTimeStep (in other words, if the first use of rnnTimeStep is for say 3 examples, all subsequent calls must be with 3 examples). After resetting the internal state (using rnnClearPreviousState()), any number of examples can be used for the next call of rnnTimeStep.
The rnnTimeStep method makes no changes to the parameters; it is intended for use only after training of the network has been completed.
The rnnTimeStep method works with networks containing single and stacked/multiple RNN layers, as well as with networks that combine other layer types (such as Convolutional or Dense layers).
The RnnOutputLayer layer type does not have any internal state, as it does not have any recurrent connections.
Data import for RNNs is complicated by the fact that we have multiple different types of data we could want to use for RNNs: one-to-many, many-to-one, variable length time series, etc. This section will describe the currently implemented data import mechanisms for DL4J.
The methods described here utilize the SequenceRecordReaderDataSetIterator class, in conjunction with the CSVSequenceRecordReader class from DataVec. This approach currently allows you to load delimited (tab, comma, etc) data from files, where each time series is in a separate file. This method also supports:
Variable length time series input
One-to-many and many-to-one data loading (where input and labels are in different files)
Label conversion from an index to a one-hot representation for classification (i.e., '2' to [0,0,1,0])
Skipping a fixed/specified number of rows at the start of the data files (i.e., comment or header rows)
Note that in all cases, each line in the data files represents one time step.
Suppose we have 10 time series in our training data, represented by 20 files: 10 files for the input of each time series, and 10 files for the output/labels. For now, assume these 20 files all contain the same number of time steps (i.e., same number of rows).
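A sketch of the reader setup for this case (the file names and paths are assumptions, matching the naming used below):

```java
// Skip 1 header row; fields are comma delimited
SequenceRecordReader featureReader = new CSVSequenceRecordReader(1, ",");
SequenceRecordReader labelReader = new CSVSequenceRecordReader(1, ",");
featureReader.initialize(new NumberedFileInputSplit("/path/to/myInput_%d.csv", 0, 9));
labelReader.initialize(new NumberedFileInputSplit("/path/to/myLabels_%d.csv", 0, 9));
```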
This particular constructor takes the number of lines to skip (1 row skipped here), and the delimiter (comma character used here).
In this particular approach, the "%d" is replaced by the corresponding number, and the numbers 0 to 9 (both inclusive) are used.
Finally, we can create our SequenceRecordReaderDataSetIterator:
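A sketch, reusing the two readers created above (the class count is an arbitrary assumption):

```java
int miniBatchSize = 5;
int numPossibleLabels = 2;     // number of classes - assumption for this sketch
boolean regression = false;
DataSetIterator iter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader,
        miniBatchSize, numPossibleLabels, regression);
```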
This DataSetIterator can then be passed to MultiLayerNetwork.fit() to train the network.
The miniBatchSize argument specifies the number of examples (time series) in each minibatch. For example, with 10 time series in total, a miniBatchSize of 5 would give us two minibatches (DataSet objects), each containing 5 time series.
Note that:
For classification problems: numPossibleLabels is the number of classes in your data set. Use regression = false.
Labels data: one value per line, as a class index
Label data will be converted to a one-hot representation automatically
For regression problems: numPossibleLabels is not used (set it to anything) and use regression = true.
The number of values in the input and labels can be anything (unlike classification, regression can have an arbitrary number of outputs)
No processing of the labels is done when regression = true
Following on from the last example, suppose that instead of separate files for our input data and labels, we have both in the same file. However, each time series is still in a separate file.
As of DL4J 0.4-rc3.8, this approach has the restriction of a single column for the output (either a class index, or a single real-valued regression output)
In this case, we create and initialize a single reader. Again, we are skipping one header row, and specifying the format as comma delimited, and assuming our data files are named "myData_0.csv", ..., "myData_9.csv":
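A sketch for the classification case (the paths, class count and label column are assumptions):

```java
SequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
reader.initialize(new NumberedFileInputSplit("/path/to/myData_%d.csv", 0, 9));

int miniBatchSize = 5;
int numPossibleLabels = 2;     // number of classes - assumption
int labelIndex = 4;            // labels in the fifth column - assumption
DataSetIterator iterClassification = new SequenceRecordReaderDataSetIterator(reader,
        miniBatchSize, numPossibleLabels, labelIndex, false);
```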
The miniBatchSize and numPossibleLabels arguments are the same as in the previous example. Here, labelIndex specifies which column the labels are in. For example, if the labels are in the fifth column, use labelIndex = 4 (i.e., columns are indexed 0 to numColumns-1).
For regression on a single output value, we use:
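A sketch, reusing the reader from above:

```java
// numPossibleLabels is ignored when regression == true
DataSetIterator iterRegression = new SequenceRecordReaderDataSetIterator(reader,
        miniBatchSize, -1, labelIndex, true);
```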
Again, the numPossibleLabels argument is not used for regression.
Following on from the previous two examples, suppose that for each example individually, the input and labels are of the same length, but these lengths differ between time series.
We can use the same approach (CSVSequenceRecordReader and SequenceRecordReaderDataSetIterator), though with a different constructor:
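A sketch, reusing the feature and label readers from example 1:

```java
DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader,
        miniBatchSize, numPossibleLabels, false,
        SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
```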
The arguments here are the same as in the previous example, with the exception of the AlignmentMode.ALIGN_END addition. This alignment mode input tells the SequenceRecordReaderDataSetIterator to expect two things:
That the time series may be of different lengths
To align the input and labels - for each example individually - such that their last values occur at the same time step.
Note that if the features and labels are always of the same length (as is the assumption in example 3), then the two alignment modes (AlignmentMode.ALIGN_END and AlignmentMode.ALIGN_START) will give identical outputs. The alignment mode option is explained in the next section.
Also note that variable-length time series always start at time zero in the data arrays: padding, if required, will be added after the time series has ended.
Unlike examples 1 and 2 above, the DataSet objects produced by the above variableLengthIter instance will also include features and labels mask arrays, as described earlier in this document.
We can also use the AlignmentMode functionality in example 3 to implement a many-to-one RNN sequence classifier. Here, let us assume:
Input and labels are in separate delimited files
The labels files contain a single row (time step) (either a class index for classification, or one or more numbers for regression)
The input lengths may (optionally) differ between examples
In fact, the same approach as in example 3 can do this:
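A sketch (identical to the example 3 constructor; with ALIGN_END, the single label time step is aligned with the last input time step):

```java
DataSetIterator manyToOneIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader,
        miniBatchSize, numPossibleLabels, false,
        SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
```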
Alignment modes are relatively straightforward. They specify whether to pad the start or the end of the shorter time series. The diagram below shows how this works, along with the masking arrays (as discussed earlier in this document):
The one-to-many case (similar to the last case above, but with only one input) is done by using AlignmentMode.ALIGN_START.
Note that in the case of training data that contains time series of different lengths, the labels and inputs will be aligned for each example individually, and then the shorter time series will be padded as required:
Recurrent Neural Network Loss Layer. Handles calculation of gradients etc. for various objective (loss) functions. Unlike RnnOutputLayer, RnnLossLayer has no parameters - there is no time distributed dense component here. Consequently, the output activations size is equal to the input size. Input and output activations are the same as for other RNN layers: 3 dimensions with shape [miniBatchSize,nIn,timeSeriesLength] and [miniBatchSize,nOut,timeSeriesLength] respectively. Note that RnnLossLayer also has the option to configure an activation function.
setNIn
param lossFunction Loss function for the loss layer
Recurrent Neural Network Output Layer. Unlike RnnLossLayer, this layer has parameters: it contains a time distributed fully connected (dense) component, so nIn and nOut may differ. It expects 3d input of shape [minibatch,nIn,sequenceLength] and labels of shape [minibatch,nOut,sequenceLength]. It also supports mask arrays. Note that RnnOutputLayer can also be used for 1D CNN layers, which also have [minibatch,nOut,sequenceLength] activations/labels shape.
build
param lossFunction Loss function for the output layer
Bidirectional is a “wrapper” layer: it wraps any uni-directional RNN layer to make it bidirectional. Note that multiple modes are supported - these specify how the activations from the forward and backward copies of the wrapped RNN layer (each with its own parameters) should be combined.
getNOut
This Mode enumeration defines how the activations for the forward and backward networks should be combined. ADD: out = forward + backward (elementwise addition). MUL: out = forward * backward (elementwise multiplication). AVERAGE: out = 0.5 * (forward + backward). CONCAT: concatenate the activations. Here ‘forward’ is the activations of the forward RNN, and ‘backward’ is the activations of the backward RNN. In all cases except CONCAT, the output activations size is the same as that of the standard RNN being wrapped by this layer. In the CONCAT case, the output activations size (dimension 1) is 2x larger than the standard RNN’s activations array.
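For example, a minimal sketch of wrapping an LSTM (layer sizes are arbitrary):

```java
// With CONCAT, the effective output size is 2 * nOut (here 2 * 64 = 128)
Bidirectional biLstm = new Bidirectional(Bidirectional.Mode.CONCAT,
        new LSTM.Builder().nIn(100).nOut(64).build());
```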
getUpdaterByParam
Get the updater for the given parameter. Typically the same updater will be used for all parameters, but this is not necessarily the case
param paramName Parameter name
return IUpdater for the parameter
LastTimeStep is a “wrapper” layer: it wraps any RNN (or CNN1D) layer, and extracts out the last time step during forward pass, and returns it as a row vector (per example). That is, for 3d (time series) input (with shape [minibatch, layerSize, timeSeriesLength]), we take the last time step and return it as a 2d array with shape [minibatch, layerSize]. Note that the last time step operation takes into account any mask arrays, if present: thus, variable length time series (in the same minibatch) are handled as expected here.
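For example, a minimal sketch (the layer sizes are arbitrary):

```java
// Wraps an LSTM so that only the last (non-masked) time step is returned per example,
// as a 2d array with shape [minibatch, 100]
LastTimeStep lastStep = new LastTimeStep(new LSTM.Builder().nIn(100).nOut(100).build());
```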
SimpleRnn: a standard “vanilla” RNN layer, where the output activations are out_t = activationFn(in_t * inWeight + out_(t-1) * recurrentWeights + bias).
Note that other architectures (LSTM, etc.) are usually much more effective, especially for longer time series; however, SimpleRnn is very fast to compute, and hence may be considered where the temporal dependencies in the dataset are only a few time steps long.
Supported neural network layers.
Each layer in a neural network configuration represents a set of hidden units. When layers are stacked together, they represent a deep neural network.
All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork
or ComputationGraph
. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.
If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph
API and join different branches together using vertices. Check the vertices for more information.
Activation layer is a simple layer that applies the specified activation function to the input activations
clone
param activation Activation function for the layer
activation
Activation function for the layer
activation
param activationFunction Activation function for the layer
activation
param activation Activation function for the layer
Dense layer: a standard fully connected feed forward layer
hasBias
If true (default): include bias parameters in the model. False: no bias.
hasLayerNorm
If true (default = false): enable layer normalization on this layer
Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test time.
build
Create a dropout layer with standard Dropout, with the specified probability of retaining the input activation. See Dropout for the full details
param dropout Activation retain probability.
Embedding layer: feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses-1), instead of the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero). Note: can only be used as the first layer for a network. Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each example. Note also that the embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)
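A minimal configuration sketch (the sizes are arbitrary):

```java
EmbeddingLayer embedding = new EmbeddingLayer.Builder()
        .nIn(10000)                       // number of classes / vocabulary size
        .nOut(128)                        // embedding (vector) size
        .activation(Activation.IDENTITY)  // disable the activation function
        .build();
```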
hasBias
If true: include bias parameters in the layer. False (default): no bias.
weightInit
Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance
param embeddingInitializer Source of the embedding layer weights
weightInit
Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)
param vectors Vectors to initialize the embedding layer with
Embedding layer for sequences: feed-forward layer that expects fixed-length number (inputLength) of integers/indices per example as input, ranged from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Note: can only be used as the first layer for a network Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index. Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)
hasBias
If true: include bias parameters in the layer. False (default): no bias.
inputLength
Set input sequence length for this embedding layer.
param inputLength input sequence length
return Builder
inferInputLength
Set input sequence inference mode for embedding layer.
param inferInputLength whether to infer input length
return Builder
weightInit
Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance
param embeddingInitializer Source of the embedding layer weights
weightInit
Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)
param vectors Vectors to initialize the embedding layer with
Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: MAX, AVG, SUM, PNORM.
Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass:
Time series: mask arrays are shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only
CNNs: masks have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.
Behaviour with default settings:
3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]
Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s: this gives
[miniBatchSize, vectorSize, 1] for RNN output,
[miniBatchSize, channels, 1, 1] for CNN output, and
[miniBatchSize, channels, 1, 1, 1] for CNN3D output.
poolingDimensions
Pooling dimensions to pool over for the global pooling operation
poolingType
param poolingType Pooling type for global pooling
collapseDimensions
Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true. If true:
3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]
If false:
3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 4d output [miniBatchSize, channels, 1, 1]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 5d output [miniBatchSize, channels, 1, 1, 1]
param collapseDimensions Whether to collapse the dimensions or not
pnorm
P-norm constant. Only used if using PoolingType.PNORM for the pooling type
param pnorm P-norm constant
k
LRN scaling constant k. Default: 2
n
Number of adjacent kernel maps to use when doing LRN. default: 5
param n Number of adjacent kernel maps
alpha
LRN scaling constant alpha. Default: 1e-4
param alpha Scaling constant
beta
Scaling constant beta. Default: 0.75
param beta Scaling constant
cudnnAllowFallback
When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception from CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for BatchNormalization will be used instead
param allowFallback Whether fallback to non-CuDNN implementation should be used
SameDiff version of a 1D locally connected layer.
nIn
Number of inputs to the layer (input size)
nOut
param nOut Number of outputs (output size)
activation
param activation Activation function for the layer
kernelSize
param k Kernel size for the layer
stride
param s Stride for the layer
padding
param p Padding for the layer. Not used if ConvolutionMode.Same is set
convolutionMode
param cm Convolution mode for the layer. See ConvolutionMode for details
dilation
param d Dilation for the layer
hasBias
param hasBias If true (default is false) the layer will have a bias
setInputSize
Set input filter size for this locally connected 1D layer
param inputSize height of the input filters
return Builder
SameDiff version of a 2D locally connected layer.
setKernel
param kernel Kernel size for the layer. Must be 2 values (height/width)
setStride
param stride Stride for the layer. Must be 2 values (height/width)
setPadding
param padding Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)
setDilation
param dilation Dilation for the layer. Must be 2 values (height/width)
nIn
param nIn Number of inputs to the layer (input size)
nOut
param nOut Number of outputs (output size)
activation
param activation Activation function for the layer
kernelSize
param k Kernel size for the layer. Must be 2 values (height/width)
stride
param s Stride for the layer. Must be 2 values (height/width)
padding
param p Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)
convolutionMode
param cm Convolution mode for the layer. See ConvolutionMode for details
dilation
param d Dilation for the layer. Must be 2 values (height/width)
hasBias
param hasBias If true (default is false) the layer will have a bias
setInputSize
Set input filter size (h,w) for this locally connected 2D layer
param inputSize pair of height and width of the input filters to this layer
return Builder
LossLayer is a flexible output layer that applies a loss function to an input, without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn’t supported - the output size is the same as the input activations size.
nIn
param lossFunction Loss function for the loss layer
Output layer used for training via backpropagation based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it contains a fully-connected layer (effectively contains a DenseLayer) internally. This allows the output size to be different to the layer input size.
build
param lossFunction Loss function for the output layer
Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE
Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE
Expects input of shape [minibatch, nIn, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.
Supports the following pooling types: MAX, AVG, SUM, PNORM
setKernelSize
Kernel size
param kernelSize kernel size
setStride
Stride
param stride stride value
setPadding
Padding
param padding padding value
Upsampling 1D layer: repeats each time step of the input size times along the sequence dimension. For input of shape [minibatch, channels, sequenceLength], the output has shape [minibatch, channels, size * sequenceLength].
size
Upsampling size
param size upsampling size in single spatial dimension of this 1D layer
size
Upsampling size int array with a single element. Array must be length 1
param size upsampling size in single spatial dimension of this 1D layer
Upsampling 2D layer: repeats each value (or rather, set of depth values) in the height and width dimensions by size[0] and size[1] times respectively. For input of shape [minibatch, channels, height, width], the output has shape [minibatch, channels, size[0] * height, size[1] * width].
size
Upsampling size int, used for both height and width
param size upsampling size in height and width dimensions
size
Upsampling size array
param size upsampling size in height and width dimensions
Upsampling 3D layer: repeats each value (all channel values for each x/y/z location) by size[0], size[1] and size[2] times respectively. For input of shape [minibatch, channels, depth, height, width], the output has shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].
size
Upsampling size as int, so same upsampling size is used for depth, width and height
param size upsampling size in height, width and depth dimensions
size
Upsampling size as int, so same upsampling size is used for depth, width and height
param size upsampling size in height, width and depth dimensions
Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for left and right.
setPadding
Padding value for left and right. Must be length 2 array
build
param padding Padding for both the left and right
Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.
setPadding
[padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]
build
param padding Padding for both the left and right in all three spatial dimensions
Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right
setPadding
Padding value for top, bottom, left, and right. Must be length 4 array
build
param padHeight Padding for both the top and bottom
param padWidth Padding for both the left and right
Element-wise multiplication layer: out = activationFn(input .* w + b), where:
w is a learnable weight vector of length nOut
“.*” is element-wise multiplication
b is a bias vector
Note that the input and output sizes of the element-wise layer are the same for this layer
getMemoryReport
This is a report of the estimated memory consumption for the given layer
param inputType Input type to the layer. Memory consumption is often a function of the input type
return Memory report for the layer
RepeatVector layer configuration.
RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which the input vector is repeated n times.
getRepetitionFactor
Get the repetition factor for the RepeatVector layer
setRepetitionFactor
Set repetition factor for RepeatVector layer
param n repetition factor for the RepeatVector layer
repetitionFactor
Set repetition factor for RepeatVector layer
param n repetition factor for the RepeatVector layer
Note: Input activations to the Yolo2OutputLayer should have shape: [minibatch, b*(5+c), H, W], where: b = number of bounding boxes (determined by config - see papers for details) c = number of classes H = output/label height W = output/label width
Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have output depth of b*(5+c). Thus if you change the number of bounding boxes, or change the number of object classes, the number of channels (nOut of the last convolution layer) needs to also change. Label format: [minibatch, 4+C, H, W] Order for labels depth: [x1,y1,x2,y2,(class labels)] x1 = box top left position y1 = as above, y axis x2 = box bottom right position y2 = as above, y axis Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left, (13,13) is bottom right Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
lambdaCoord
Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5
lambdaNoObj
Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5
param lambdaNoObj Lambda value for no-object (confidence) component of the loss function
lossPositionScale
Loss function for position/scale component of the loss function
param lossPositionScale Loss function for position/scale
lossClassPredictions
Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper), however LossMCXENT could also be used (which is more common for classification).
param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function
boundingBoxPriors
Bounding box prior dimensions [width, height]. For N bounding boxes, the input has shape [rows, columns] = [N, 2]. Note that dimensions should be specified as a fraction of grid size. For example, for a network with a 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.
param boundingBoxes Bounding box prior dimensions (width, height)
MaskLayer applies the mask array to the forward pass activations, and to the backward pass gradients, passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.
Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].
The AdaMax updater, a variant of Adam.
The Adam updater.
The AMSGrad updater. Reference: On the Convergence of Adam and Beyond.
Measuring cosine similarity, no similarity is expressed as a 90 degree angle, while total similarity of 1 is a 0 degree angle, complete overlap; i.e. Sweden equals Sweden, while Norway has a cosine distance of 0.760124 from Sweden, the highest of any other country.
Word2vec is similar to an autoencoder, encoding each word in a vector, but rather than training against the input words through reconstruction, word2vec trains words against other words that neighbor them in the input corpus.
This model was trained on the Google News vocab, which you can import and play with. Contemplate, for a moment, that the Word2vec algorithm has never been taught a single rule of English syntax. It knows nothing about the world, and is unassociated with any rules-based symbolic logic or knowledge graph. And yet it learns more, in a flexible and automated fashion, than most knowledge graphs will learn after years of human labor. It comes to the Google News documents as a blank slate, and by the end of training, it can compute complex analogies that mean something to humans.
You can also query a Word2vec model for other associations. Not everything has to be two analogies that mirror each other.
While Word2vec refers to a family of related algorithms, this implementation uses .
Create a new project in IntelliJ using Maven. If you don't know how to do that, see our Quickstart guide. Then specify these properties and dependencies in the POM.xml file in your project's root directory (check for the most recent versions and use those).
The corpus we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.
If you trained a model with the original C implementation or with Gensim, the line below will import the model.
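A sketch (the file path is an assumption; WordVectorSerializer is part of the deeplearning4j-nlp module):

```java
Word2Vec vec = WordVectorSerializer.readWord2VecModel(
        new File("/path/to/GoogleNews-vectors-negative300.bin.gz"));
```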
With large models, you may run into trouble with your heap space. The Google model may take as much as 10G of RAM, and the JVM only launches with 256 MB of RAM by default, so you have to adjust your heap space. You can do that either with a bash_profile file (see our troubleshooting guide), or through IntelliJ itself:
Please note: the code below may be outdated. For updated examples, please see our examples repository.
Now that you have a basic idea of how to set up Word2vec, here's an example of how it can be used with DL4J's API:
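A condensed sketch of that example (the file path and hyperparameters are illustrative; the classes come from the deeplearning4j-nlp module):

```java
// Iterate over the corpus one line at a time, tokenizing and lower-casing each token
SentenceIterator sentences = new BasicLineIterator(new File("/path/to/raw_text.txt"));
TokenizerFactory tokenizer = new DefaultTokenizerFactory();
tokenizer.setTokenPreProcessor(new CommonPreprocessor());

Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)      // ignore rare words
        .layerSize(100)           // dimensionality of the word vectors
        .windowSize(5)
        .iterate(sentences)
        .tokenizerFactory(tokenizer)
        .build();
vec.fit();

// Query the trained model
Collection<String> nearest = vec.wordsNearest("day", 10);
```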
After following the setup instructions, you can open this example in IntelliJ and hit run to see it work. If you query the Word2vec model with a word that isn't contained in the training corpus, it will return null.
Google Scholar keeps a running tally of the papers citing Word2vec.
Kenny Helsens, a data scientist based in Belgium, applied Deeplearning4j's implementation of Word2vec to the NCBI's Online Mendelian Inheritance In Man (OMIM) database. He then looked for the words most similar to alk, a known oncogene of non-small cell lung carcinoma, and Word2vec returned: "nonsmall, carcinomas, carcinoma, mapdkd." From there, he established analogies between other cancer phenotypes and their genotypes. This is just one example of the associations Word2vec can learn on a large corpus. The potential for discovering new aspects of important diseases has only just begun, and outside of medicine, the opportunities are equally diverse.
Andreas Klintberg trained Deeplearning4j's implementation of Word2vec on Swedish, and wrote a walkthrough of the process.
Word2vec was introduced by a team of researchers at Google led by Tomas Mikolov. Google released the code under an Apache 2.0 license. In 2014, Mikolov left Google for Facebook, and in May 2015 Google was granted a patent for the method, which does not abrogate the Apache license under which the code has been released.
While words in all languages may be converted into vectors with Word2vec, and those vectors learned with Deeplearning4j, NLP preprocessing can be very language specific, and requires tools beyond our libraries. The Stanford Natural Language Processing Group has a number of Java-based tools for tokenization, part-of-speech tagging and named-entity recognition for languages such as Mandarin Chinese, Arabic, French, German and Spanish. For Japanese, NLP tools such as Kuromoji are useful. Other foreign-language resources are available as well.
Deeplearning4j has a class called SequenceVectors, which is one level of abstraction above word vectors, and which allows you to extract features from any sequence, including social media profiles, transactions, proteins, etc. If data can be described as a sequence, it can be learned via skip-gram and hierarchical softmax with the AbstractVectors class. This is compatible with the DeepWalk algorithm, also implemented in Deeplearning4j.
word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method; Yoav Goldberg and Omer Levy
Of course, the DenseLayer and Convolutional layers do not handle time series data - they expect a different type of input. To deal with this, we need to use the layer preprocessor functionality: for example, the CnnToRnnPreProcessor and FeedForwardToRnnPreProcessor classes. See the InputPreProcessor implementations for all preprocessors. Fortunately, in most situations, the DL4J configuration system will automatically add these preprocessors as required. However, the preprocessors can also be added manually (overriding the automatic addition of preprocessors, for each layer).
(In addition to the examples below, you might find the existing DL4J examples to be of some use.)
To use this approach, we first create two CSVSequenceRecordReader objects, one for the input and one for the labels:
Second, we need to initialize these two readers, by telling them where to get the data from. We do this with an InputSplit object. Suppose that our time series are numbered, with file names "myInput_0.csv", "myInput_1.csv", ..., "myLabels_0.csv", etc. One approach is to use the NumberedFileInputSplit:
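A sketch of that initialization (the base paths are assumptions):

```java
featureReader.initialize(new NumberedFileInputSplit("/path/to/myInput_%d.csv", 0, 9));
labelReader.initialize(new NumberedFileInputSplit("/path/to/myLabels_%d.csv", 0, 9));
```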
LSTM recurrent neural network layer without peephole connections. Supports CuDNN acceleration - see the CuDNN documentation for details.
Local response normalization layer. See section 3.3 of the AlexNet paper (ImageNet Classification with Deep Convolutional Neural Networks).
Output (loss) layer for the YOLOv2 object detection model, based on the papers YOLO9000: Better, Faster, Stronger (Redmon & Farhadi, 2016) and You Only Look Once: Unified, Real-Time Object Detection (Redmon et al., 2016). This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.