Model Zoo

Prebuilt model architectures and weights for out-of-the-box application.

Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.

If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:

Getting started

Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel abstract class and implements the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.

Initializing fresh configurations

You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:

If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:

Initializing pretrained weights

Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enum that lists the available weight types, including IMAGENET, MNIST, CIFAR10, and VGGFACE.

For example, you can initialize a VGG-16 model with ImageNet weights like so:

And initialize another VGG16 model with weights trained on VGGFace:

If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.

Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.
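To make the NCHW layout concrete, here is a minimal plain-Java sketch (it does not use DL4J). It assumes a contiguous, row-major buffer, and the class and method names are hypothetical. It computes where an element (n, c, h, w) would sit for a per-example shape of {3, 224, 224}.

```java
final class NchwShapeSketch {

    /**
     * Flat offset of element (n, c, h, w) in a contiguous row-major NCHW buffer.
     * chw is the per-example shape {channels, height, width}.
     */
    static long offset(int[] chw, int n, int c, int h, int w) {
        int channels = chw[0];
        int height = chw[1];
        int width = chw[2];
        return ((long) (n * channels + c) * height + h) * width + w;
    }

    public static void main(String[] args) {
        int[] inputShape = {3, 224, 224}; // channels, height, width
        // The second example in a minibatch starts after one full image.
        System.out.println("values per image: " + offset(inputShape, 1, 0, 0, 0));
        // The second channel starts after one full height x width plane.
        System.out.println("start of channel 1: " + offset(inputShape, 0, 1, 0, 0));
    }
}
```

The width index varies fastest and the minibatch index slowest, which is what "NCHW" encodes when read right to left.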

What's in the zoo?

The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.

You can find a complete list of models in the deeplearning4j-zoo GitHub repository.

This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.

Advanced usage

The zoo comes with a couple of additional features if you're looking to use the models for different use cases.

Changing Inputs

Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape().

NOTE: this applies to fresh configurations only, and will not affect pretrained models:

Transfer Learning

Pretrained models are perfect for transfer learning! You can read more about transfer learning with DL4J in the transfer learning documentation.

Workspaces

Initialization methods often have an additional parameter named workspaceMode. Most users will not need it; however, if you have a large machine with "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, see the workspaces section of the documentation.

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-zoo</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>
  • deeplearning4j-zoo GitHub link
    AlexNet
    Darknet19
    FaceNetNN4Small2
    import org.deeplearning4j.zoo.model.AlexNet;
    import org.deeplearning4j.zoo.*;
    
    ...
    
    int numberOfClassesInYourData = 1000;
    int randomSeed = 123;
    
    ZooModel zooModel = AlexNet.builder()
                    .numClasses(numberOfClassesInYourData)
                    .seed(randomSeed)
                    .build();
    Model net = zooModel.init();
    ZooModel zooModel = AlexNet.builder()
                    .numClasses(numberOfClassesInYourData)
                    .seed(randomSeed)
                    .build();
    MultiLayerConfiguration net = ((AlexNet) zooModel).conf();
    import org.deeplearning4j.zoo.model.VGG16;
    import org.deeplearning4j.zoo.*;
    
    ...
    
    ZooModel zooModel = VGG16.builder().build();
    Model net = zooModel.initPretrained(PretrainedType.IMAGENET);
    ZooModel zooModel = VGG16.builder().build();
    Model net = zooModel.initPretrained(PretrainedType.VGGFACE);
    int numberOfClassesInYourData = 10;
    int randomSeed = 123;
    
    ZooModel zooModel = ResNet50.builder()
            .numClasses(numberOfClassesInYourData)
            .seed(randomSeed)
            .build();
    zooModel.setInputShape(new int[][]{{3, 28, 28}});
    InceptionResNetV1
    LeNet
    ResNet50
    SimpleCNN
    TextGenerationLSTM
    TinyYOLO
    VGG16
    VGG19

    Multi Layer Network

    Simple and sequential network configuration.

    The MultiLayerNetwork class is the simplest network configuration API available in Eclipse Deeplearning4j. This class is useful for beginners or users who do not need a complex and branched network graph.

    You will not want to use MultiLayerNetwork configuration if you are creating complex loss functions, using graph vertices, or doing advanced training such as a triplet network. This includes popular complex networks such as InceptionV4.

    Usage

    The example below shows how to build a simple linear classifier using DenseLayer (a basic fully connected, multilayer perceptron layer).

    You can also create convolutional configurations:

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        // Learning rate and momentum are configured on the updater
        .updater(new Nesterovs(learningRate, 0.9))
        .list()
        .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.SOFTMAX)
                .nIn(numHiddenNodes).nOut(numOutputs).build())
        .build();
    MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .l2(0.0005)
        .weightInit(WeightInit.XAVIER)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        // Learning rate 0.01 and momentum 0.9 are configured on the updater
        .updater(new Nesterovs(0.01, 0.9))
        .list()
        .layer(0, new ConvolutionLayer.Builder(5, 5)
                //nIn and nOut specify depth. nIn here is the nChannels and nOut is the number of filters to be applied
                .nIn(nChannels)
                .stride(1, 1)
                .nOut(20)
                .activation(Activation.IDENTITY)
                .build())
        .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2,2)
                .stride(2,2)
                .build())
        .layer(2, new ConvolutionLayer.Builder(5, 5)
                //Note that nIn need not be specified in later layers
                .stride(1, 1)
                .nOut(50)
                .activation(Activation.IDENTITY)
                .build())
        .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2,2)
                .stride(2,2)
                .build())
        .layer(4, new DenseLayer.Builder().activation(Activation.RELU)
                .nOut(500).build())
        .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(outputNum)
                .activation(Activation.SOFTMAX)
                .build());

    Vertices

    Computation graph nodes for advanced configuration.

    What is a vertex?

    In Eclipse Deeplearning4j a vertex is a type of layer that acts as a node in a ComputationGraph. It can accept multiple inputs, provide multiple outputs, and can help construct popular networks such as InceptionV4.

    Available Vertices

    L2NormalizeVertex

    L2NormalizeVertex performs L2 normalization on a single input.

    L2Vertex

    L2Vertex calculates the L2 least squares error of two inputs.

    For example, in Triplet Embedding you can input an anchor and a pos/neg class and use two parallel L2 vertices to calculate two real numbers which can be fed into a LossLayer to calculate TripletLoss.
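The triplet setup described above can be sketched in plain Java, without DL4J. This is only an illustration under stated assumptions: it treats the L2 vertex output as the Euclidean distance between two vectors, and it uses a common margin-based form of the triplet loss. The class name, method names, and the margin parameter are hypothetical and are not taken from the DL4J API.

```java
final class TripletDistanceSketch {

    /** Euclidean (L2) distance between two equally sized vectors. */
    static double l2Distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Margin-based triplet loss (an assumed formulation): the two distances play
     * the role of the two real numbers produced by the parallel L2 vertices.
     */
    static double tripletLoss(double[] anchor, double[] positive, double[] negative, double margin) {
        double dPos = l2Distance(anchor, positive);
        double dNeg = l2Distance(anchor, negative);
        return Math.max(0.0, dPos - dNeg + margin);
    }

    public static void main(String[] args) {
        double[] anchor = {0.0, 0.0};
        double[] positive = {3.0, 4.0};
        double[] negative = {6.0, 8.0};
        System.out.println("d(anchor, positive) = " + l2Distance(anchor, positive));
        System.out.println("d(anchor, negative) = " + l2Distance(anchor, negative));
        System.out.println("loss with margin 6  = " + tripletLoss(anchor, positive, negative, 6.0));
    }
}
```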

    PoolHelperVertex

    A custom layer for removing the first column and row from an input. This is meant to allow importation of Caffe’s GoogLeNet from https://gist.github.com/joelouismarino/a2ede9ab3928f999575423b9887abd14.

    ReshapeVertex

    Adds the ability to reshape and flatten the tensor in the computation graph. This is the equivalent to the next layer. ReshapeVertex also ensures the shape is valid for the backward pass.

    ScaleVertex

    A ScaleVertex is used to scale the size of activations of a single layer. For example, ResNet activations can be scaled in repeating blocks to keep variance under control.

    ShiftVertex

    A ShiftVertex is used to shift the activations of a single layer. One could use it to add a bias or as part of some other calculation. For example, Highway Layers need them in two places. First, it is often useful for the gate weights to have a large negative bias (although for this we could simply initialize the biases that way). Second, the layer needs to compute:

    (1 - sigmoid(weight * input + bias)) (*) input + sigmoid(weight * input + bias) (*) activation(w2 * input + bias)

    where (*) is the Hadamard product. So, here, we could have

    1. a DenseLayer that does the sigmoid

    2. a ScaleVertex(-1) and

    3. a ShiftVertex(1) to accomplish that.
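The three steps above can be sketched in plain Java, without DL4J. The sketch assumes that a scale vertex multiplies every activation by a constant and a shift vertex adds a constant to every activation. The class and method names are hypothetical stand-ins, not the DL4J vertex API.

```java
final class HighwayGateSketch {

    /** Logistic sigmoid, the gate activation produced by the dense layer in step 1. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Element-wise scaling, standing in for a scale vertex. */
    static double[] scale(double[] activations, double factor) {
        double[] out = new double[activations.length];
        for (int i = 0; i < activations.length; i++) {
            out[i] = activations[i] * factor;
        }
        return out;
    }

    /** Element-wise shifting, standing in for a shift vertex. */
    static double[] shift(double[] activations, double offset) {
        double[] out = new double[activations.length];
        for (int i = 0; i < activations.length; i++) {
            out[i] = activations[i] + offset;
        }
        return out;
    }

    /** Computes 1 - sigmoid(z) as sigmoid, then scale by -1, then shift by +1. */
    static double[] carryGate(double[] preActivations) {
        double[] gate = new double[preActivations.length];
        for (int i = 0; i < preActivations.length; i++) {
            gate[i] = sigmoid(preActivations[i]);
        }
        return shift(scale(gate, -1.0), 1.0);
    }

    public static void main(String[] args) {
        double[] z = {-2.0, 0.0, 3.0};
        double[] carry = carryGate(z);
        for (int i = 0; i < z.length; i++) {
            System.out.printf("z=%.1f transform=%.4f carry=%.4f%n", z[i], sigmoid(z[i]), carry[i]);
        }
    }
}
```

For every input the transform and carry values sum to 1, which is the property the highway formula relies on.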

    StackVertex

    StackVertex allows for stacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where shared parameters are not supported by the network.

    This vertex will automatically stack all available inputs.

    UnstackVertex

    UnstackVertex allows for unstacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where embeddings can be separated and run through subsequent layers.

    Works similarly to SubsetVertex, except on dimension 0 of the input. stackSize is explicitly defined by the user to properly calculate a step.

    ReverseTimeSeriesVertex

    ReverseTimeSeriesVertex is used in recurrent neural networks to reverse the order of a time series. As a result, the last time step is moved to the beginning of the time series and the first time step is moved to the end. This allows recurrent layers to process time series backward.

    Masks: The input might be masked (to allow for varying time series lengths in one minibatch). In this case the present input (mask array = 1) will be reversed in place and the padding (mask array = 0) will be left untouched at the same place. For a time series of length n, this would normally mean that the first n time steps are reversed and the following padding is left untouched, but more complex masks are supported (e.g. [1, 0, 1, 0, …]).
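The masking behaviour can be illustrated with a small plain-Java sketch. It is a simplification that handles one example with a single feature per time step, whereas the real vertex works on full recurrent activations. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class MaskedReverseSketch {

    /**
     * Reverses the time steps whose mask value is 1 among themselves and leaves
     * the steps whose mask value is 0 (padding) where they are.
     */
    static double[] reverse(double[] series, int[] mask) {
        if (series.length != mask.length) {
            throw new IllegalArgumentException("series and mask must have the same length");
        }
        // Collect the positions of the present (unmasked) time steps.
        int present = 0;
        for (int m : mask) {
            if (m == 1) {
                present++;
            }
        }
        int[] positions = new int[present];
        int k = 0;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i] == 1) {
                positions[k++] = i;
            }
        }
        // Write the present values back in reverse order; padding is copied as-is.
        double[] out = series.clone();
        for (int j = 0; j < present; j++) {
            out[positions[j]] = series[positions[present - 1 - j]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Three real steps followed by two padding steps.
        System.out.println(Arrays.toString(
                reverse(new double[]{1, 2, 3, 0, 0}, new int[]{1, 1, 1, 0, 0})));
        // An interleaved mask: only positions 0 and 2 are swapped.
        System.out.println(Arrays.toString(
                reverse(new double[]{1, 9, 2, 8}, new int[]{1, 0, 1, 0})));
    }
}
```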

    setBackpropGradientsViewArray

    Gets the current mask array from the provided input

    • return The mask or null, if no input was provided

    Auto Encoders

    What are autoencoders?

    Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.

    https://gist.github.com/joelouismarino/a2ede9ab3928f999575423b9887abd14

    Where’s Restricted Boltzmann Machine?

    RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.

    Supported layers

    AutoEncoder

    Autoencoder layer. Adds noise to the input and learns a reconstruction function.

    corruptionLevel

    Level of corruption - 0.0 (none) to 1.0 (all values corrupted)

    sparsity

    Autoencoder sparsity parameter

    • param sparsity Sparsity

    VariationalAutoencoder

    Variational Autoencoder layer

    See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes - https://arxiv.org/abs/1312.6114

    This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.

    A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.

    encoderLayerSizes

    Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer). Typically the number and size of the decoder layers (set via decoderLayerSizes(int...)) is similar to the encoder layers.

    setEncoderLayerSizes

    Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer). Typically the number and size of the decoder layers (set via decoderLayerSizes(int...)) is similar to the encoder layers.

    • param encoderLayerSizes Size of each encoder layer in the variational autoencoder

    decoderLayerSizes

    Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer). Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int...)).

    • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

    setDecoderLayerSizes

    Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer). Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int...)).

    • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

    reconstructionDistribution

    The reconstruction distribution for the data given the hidden state - i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example:

    • GaussianReconstructionDistribution + identity or tanh activation for real-valued (Gaussian) data

    • BernoulliReconstructionDistribution + sigmoid activation for binary-valued (0 or 1) data

    • param distribution Reconstruction distribution

    lossFunction

    Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: setting the loss function here will override any previously set reconstruction distribution.

    • param outputActivationFn Activation function for the output/reconstruction

    • param lossFunction Loss function to use

    pzxActivationFn

    Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

    • param activationFunction Activation function for p(z| x)

    pzxActivationFunction

    Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

    • param activation Activation function for p(z | x)

    nOut

    Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.

    • param nOut Size of P(Z | data) and output size

    numSamples

    Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.

    This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”

    • param numSamples Number of samples per data point for pretraining

    public void setBackpropGradientsViewArray(INDArray backpropGradientsViewArray)
    public Builder corruptionLevel(double corruptionLevel)
    public Builder sparsity(double sparsity)
    public Builder encoderLayerSizes(int... encoderLayerSizes)
    public void setEncoderLayerSizes(int... encoderLayerSizes)
    public Builder decoderLayerSizes(int... decoderLayerSizes)
    public void setDecoderLayerSizes(int... decoderLayerSizes)
    public Builder reconstructionDistribution(ReconstructionDistribution distribution)
    public Builder lossFunction(IActivation outputActivationFn, LossFunctions.LossFunction lossFunction)
    public Builder lossFunction(Activation outputActivationFn, LossFunctions.LossFunction lossFunction)
    public Builder lossFunction(IActivation outputActivationFn, ILossFunction lossFunction)
    public Builder pzxActivationFn(IActivation activationFunction)
    public Builder pzxActivationFunction(Activation activation)
    public Builder nOut(int nOut)
    public Builder numSamples(int numSamples)

    Model Listeners

    Adding hooks and listeners on DL4J models.

    What are listeners?

    Listeners allow users to "hook" into certain events in Eclipse Deeplearning4j. This allows you to collect or print information useful for tasks like training. For example, a ScoreIterationListener allows you to print training scores from the output layer of a neural network.

    Usage

    To add one or more listeners to a MultiLayerNetwork or ComputationGraph, use the addListeners method (or setListeners to replace any listeners that are already attached):

    Available listeners

    EvaluativeListener

    This TrainingListener implementation provides a simple way to evaluate a model during training. It can be launched every Xth iteration/epoch, depending on the frequency and InvocationType constructor arguments.

    EvaluativeListener

    This callback will be invoked after evaluation finished

    iterationDone

    • param iterator Iterator to provide data for evaluation

    • param frequency Frequency (in number of iterations/epochs according to the invocation type) to perform evaluation

    • param type Type of value for ‘frequency’ - iteration end, epoch end, etc

    ScoreIterationListener

    Score iteration listener. Reports the score (value of the loss function) of the network during training every N iterations.

    ScoreIterationListener

    • param printIterations frequency with which to print scores (i.e., every printIterations parameter updates)

    ComposableIterationListener

    A group of listeners

    CollectScoresIterationListener

    CollectScoresIterationListener simply stores the model scores internally (along with the iteration) every 1 or N iterations (this is configurable). These scores can then be obtained or exported.

    CollectScoresIterationListener

    Constructor for collecting scores with default saving frequency of 1

    iterationDone

    Constructor for collecting scores with the specified frequency.

    • param frequency Frequency with which to collect/save scores

    exportScores

    Export the scores in tab-delimited (one per line) UTF-8 format.

    exportScores

    Export the scores in delimited (one per line) UTF-8 format with the specified delimiter

    • param outputStream Stream to write to

    • param delimiter Delimiter to use

    exportScores

    Export the scores to the specified file in delimited (one per line) UTF-8 format, tab delimited

    • param file File to write to

    exportScores

    Export the scores to the specified file in delimited (one per line) UTF-8 format, using the specified delimiter

    • param file File to write to

    • param delimiter Delimiter to use for writing scores

    CheckpointListener

    The goal of this listener is to periodically save a copy of the model during training. Model saving may be done:

    1. Every N epochs

    2. Every N iterations

    3. Every T time units (every 15 minutes, for example)

    Or some combination of the 3.

    Example 1: Saving a checkpoint every 2 epochs, keep all model files

    Example 2: Saving a checkpoint every 1000 iterations, but keeping only the last 3 models (all older model files will be automatically deleted)

    Example 3: Saving a checkpoint every 15 minutes, keeping the most recent 3 and otherwise every 4th checkpoint file:

    Note that you can mix these: for example, you can save every epoch and every 15 minutes, independent of the last save time. You can also save every epoch and every 15 minutes since the last model save. In that case the sinceLast parameter is true, which means the 15-minute counter is reset any time a model is saved.

    CheckpointListener

    List all available checkpoints. A checkpoint is ‘available’ if the file can be loaded. Any checkpoint files that have been automatically deleted (given the configuration) will not be returned here.

    • return List of checkpoint files that can be loaded

    SharedGradient

    SleepyTrainingListener

    This TrainingListener implementation provides a way to “sleep” during specific Neural Network training phases. Suitable for debugging/testing purposes only.

    PLEASE NOTE: All timers treat time values as milliseconds. PLEASE NOTE: Do not use it in production environment.

    onEpochStart

    In this mode parkNanos() call will be used, to make process really idle

    CollectScoresListener

    A simple listener that collects scores to a list every N iterations. Can also optionally log the score.

    PerformanceListener

    Simple IterationListener that tracks time spent on training per iteration.

    PerformanceListener

    This method defines, if iteration number should be reported together with other data

    • param reportIteration

    • return

    ParamAndGradientIterationListener

    An iteration listener that provides details on parameters and gradients at each iteration during training. It attempts to provide much of the same information as the UI histogram iteration listener, but in a text-based format (for example, when learning on a system accessed via SSH), and is intended to aid network tuning and debugging. This iteration listener is set up to calculate the mean, min, max, and mean absolute value of each type of parameter and gradient in the network at each iteration.

    TimeIterationListener

    Time Iteration Listener. This listener writes the remaining time in minutes and the expected end date of the process to the INFO logs. Remaining time is estimated from the amount of time spent training so far and the total number of iterations specified by the user.

    TimeIterationListener

    Constructor

    • param iterationCount The global number of iterations for training (all epochs)

    Activations

    Non-linear functions applied to layer outputs.

    What are activations?

    At a simple level, activation functions help decide whether a neuron should be activated. This helps determine whether the information that the neuron is receiving is relevant for the input. The activation function is a non-linear transformation that happens over an input signal, and the transformed output is sent to the next neuron.

    Saving and Loading Models

    Saving and loading of neural networks.

    MultiLayerNetwork and ComputationGraph both have save and load methods.

    You can save/load a MultiLayerNetwork using:

    Similarly, you can save/load a ComputationGraph using:

    Internally, these methods use the ModelSerializer class, which handles loading and saving models. There are two methods for saving models shown in the examples through the link. The first example saves a normal multi layer network, the second one saves a computation graph.

    Here is an example with code to save a computation graph using the ModelSerializer

    Convolutional Layers

    Also known as CNN.

    Available layers

    Convolution1D

    Usage

    The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:

    Available activations

    ActivationRectifiedTanh

    Rectified tanh

    Essentially max(0, tanh(x))

    Underlying implementation is in native code

    ActivationELU

    f(x) = alpha * (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0

    alpha defaults to 1, if not specified
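As a quick illustration of the piecewise definition, here is a plain-Java sketch that does not use DL4J. The class and method names are hypothetical.

```java
final class EluSketch {

    /** ELU: alpha * (exp(x) - 1) for negative inputs, identity otherwise. */
    static double elu(double x, double alpha) {
        return x >= 0.0 ? x : alpha * (Math.exp(x) - 1.0);
    }

    /** ELU with alpha = 1, matching the default stated above. */
    static double elu(double x) {
        return elu(x, 1.0);
    }

    public static void main(String[] args) {
        for (double x : new double[]{-3.0, -1.0, 0.0, 2.0}) {
            System.out.printf("elu(%.1f) = %.4f%n", x, elu(x));
        }
    }
}
```

For large negative inputs the output saturates towards -alpha instead of growing without bound.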

    ActivationReLU

    f(x) = max(0, x)

    ActivationRationalTanh

    Rational tanh approximation. From https://arxiv.org/pdf/1508.01292v3

    f(x) = 1.7159 * tanh(2x/3), where tanh is approximated as follows: tanh(y) ~ sgn(y) * (1 - 1/(1 + |y| + y^2 + 1.41645 * y^4))

    Underlying implementation is in native code
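The approximation above can be written out in plain Java as a sketch. This is not the native implementation. It assumes the multiplication signs as reconstructed in the formula, and the class and method names are hypothetical.

```java
final class RationalTanhSketch {

    /** tanh(y) ~ sgn(y) * (1 - 1 / (1 + |y| + y^2 + 1.41645 * y^4)). */
    static double tanhApprox(double y) {
        double abs = Math.abs(y);
        double y2 = y * y;
        return Math.signum(y) * (1.0 - 1.0 / (1.0 + abs + y2 + 1.41645 * y2 * y2));
    }

    /** f(x) = 1.7159 * tanh(2x / 3), using the rational approximation of tanh. */
    static double rationalTanh(double x) {
        return 1.7159 * tanhApprox(2.0 * x / 3.0);
    }

    public static void main(String[] args) {
        for (double x : new double[]{-3.0, -0.5, 0.0, 0.5, 3.0}) {
            System.out.printf("x=%.1f approx=%.4f exact=%.4f%n",
                    x, rationalTanh(x), 1.7159 * Math.tanh(2.0 * x / 3.0));
        }
    }
}
```

The main method prints the approximation next to the value computed with Math.tanh so the two can be compared directly.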

    ActivationThresholdedReLU

    Thresholded RELU

    f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0

    ActivationReLU6

    f(x) = min(max(input, cutoff), 6)

    ActivationHardTanH

    ActivationSigmoid

    f(x) = 1 / (1 + exp(-x))

    ActivationGELU

    GELU activation function - Gaussian Error Linear Units

    ActivationPReLU

    Parametrized Rectified Linear Unit (PReLU)

    f(x) = alpha * x for x < 0, f(x) = x for x >= 0

    alpha has the same shape as x and is a learned parameter.

    ActivationIdentity

    f(x) = x

    ActivationSoftSign

    f_i(x) = x_i / (1 + |x_i|)

    ActivationHardSigmoid

    f(x) = min(1, max(0, 0.2x + 0.5))

    ActivationSoftmax

    f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)
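Subtracting the maximum before exponentiating keeps the computation numerically stable without changing the result. The following plain-Java sketch (not the DL4J implementation; the class and method names are hypothetical) shows the formula on a single vector.

```java
final class StableSoftmaxSketch {

    /** Softmax over one vector, shifting by the maximum to avoid overflow in exp. */
    static double[] softmax(double[] x) {
        double shift = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            shift = Math.max(shift, v);
        }
        double[] out = new double[x.length];
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - shift); // the largest exponent is exactly 0
            sum += out[i];
        }
        for (int i = 0; i < x.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Without the shift, exp(1000) would overflow to infinity.
        double[] p = softmax(new double[]{1000.0, 1000.0});
        System.out.println(p[0] + " " + p[1]);
    }
}
```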

    ActivationCube

    f(x) = x^3

    ActivationRReLU

    f(x) = max(0, x) + alpha * min(0, x)

    alpha is drawn from uniform(l, u) during training and is set to (l + u) / 2 during testing. l and u default to 1/8 and 1/3 respectively.

    Empirical Evaluation of Rectified Activations in Convolutional Network
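The difference between training and test behaviour can be sketched in plain Java. This is not the DL4J implementation: it assumes one alpha is drawn per call, that test time uses the midpoint of l and u, and the class and method names are hypothetical.

```java
import java.util.Random;

final class RReluSketch {

    static final double L = 1.0 / 8.0; // lower bound of alpha
    static final double U = 1.0 / 3.0; // upper bound of alpha

    /** Training: the negative slope alpha is sampled uniformly from [L, U). */
    static double train(double x, Random rng) {
        double alpha = L + (U - L) * rng.nextDouble();
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    /** Test: alpha is fixed to the midpoint of the sampling range. */
    static double test(double x) {
        double alpha = (L + U) / 2.0;
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    public static void main(String[] args) {
        Random rng = new Random(123);
        System.out.println("train(-1.0) = " + train(-1.0, rng));
        System.out.println("test(-1.0)  = " + test(-1.0));
        System.out.println("test(2.0)   = " + test(2.0));
    }
}
```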

    ActivationTanH

    f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

    ActivationSELU

    https://arxiv.org/pdf/1706.02515.pdf

    ActivationLReLU

    Leaky RELU

    f(x) = max(0, x) + alpha * min(0, x)

    alpha defaults to 0.01

    ActivationSwish

    f(x) = x * sigmoid(x)

    ActivationSoftPlus

    f(x) = log(1+e^x)

    class, as well as an example of using ModelSerializer to save a neural net built using MultiLayer configuration.

    RNG Seed

    If your model uses randomness (e.g. DropOut/DropConnect), it may make sense to save the RNG seed separately and apply it after the model is restored, i.e.:

    This will guarantee equal results between sessions/JVMs.

    ModelSerializer

    Utility class suited to save/restore neural net models

    writeModel

    Write a model to a file

    • param model the model to write

    • param file the file to write to

    • param saveUpdater whether to save the updater or not

    • throws IOException

    writeModel

    Write a model to a file

    • param model the model to write

    • param file the file to write to

    • param saveUpdater whether to save the updater or not

    • param dataNormalization the normalizer to save (optional)

    • throws IOException

    writeModel

    Write a model to a file path

    • param model the model to write

    • param path the path to write to

    • param saveUpdater whether to save the updater or not

    • throws IOException

    writeModel

    Write a model to an output stream

    • param model the model to save

    • param stream the output stream to write to

    • param saveUpdater whether to save the updater for the model or not

    • throws IOException

    writeModel

    Write a model to an output stream

    • param model the model to save

    • param stream the output stream to write to

    • param saveUpdater whether to save the updater for the model or not

    • param dataNormalization the normalizer to save (may be null)

    • throws IOException

    restoreMultiLayerNetwork

    Load a multi layer network from a file

    • param file the file to load from

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Load a multi layer network from a file

    • param file the file to load from

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Load a MultiLayerNetwork from an input stream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is the inputstream to load from

    • return the loaded multi layer network

    • throws IOException

    • see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)

    restoreMultiLayerNetwork

    Restore a multi layer network from an input stream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is the input stream to restore from

    • return the loaded multi layer network

    • throws IOException

    • see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)

    restoreMultiLayerNetwork

    Load a MultiLayerNetwork model from a file

    • param path path to the model file, to get the multi layer network from

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Load a MultiLayerNetwork model from a file

    • param path path to the model file, to get the multi layer network from

    • return the loaded multi layer network

    • throws IOException

    restoreComputationGraph

    Restore a MultiLayerNetwork and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is Input stream to read from

    • param loadUpdater Whether to load the updater from the model or not

    • return Model and normalizer, if present

    • throws IOException If an error occurs when reading from the stream

    restoreComputationGraph

    Load a computation graph from a file

    • param path path to the model file, to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

    Load a computation graph from an InputStream

    • param is the inputstream to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

    Load a computation graph from an InputStream

    • param is the inputstream to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

    Load a computation graph from a file

    • param file the file to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

    Restore a ComputationGraph and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is Input stream to read from

    • param loadUpdater Whether to load the updater from the model or not

    • return Model and normalizer, if present

    • throws IOException If an error occurs when reading from the stream

    taskByModel

    • param model

    • return

    addNormalizerToModel

    This method appends a normalizer to a given persisted model.

    PLEASE NOTE: The file should be a model file saved earlier with ModelSerializer

    • param f

    • param normalizer

    addObjectToFile

    Add an object to the (already existing) model file using Java Object Serialization. Objects can be restored using getObjectFromFile(File, String)

    • param f File to add the object to

    • param key Key to store the object under

    • param o Object to store using Java object serialization
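The methods documented above could be combined along the following lines. This is a sketch rather than an official example: it assumes the DL4J and ND4J libraries are on the classpath, the class name and the "labels" key are hypothetical, and the return value of getObjectFromFile is held as a plain Object because its exact return type is not shown in this reference.

```java
import java.io.File;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;

final class ModelSerializerSketch {
    // Saves a network with its updater state and normalizer, attaches an extra
    // serializable object, then restores the network from the same file.
    static MultiLayerNetwork saveAndRestore(MultiLayerNetwork net,
                                            DataNormalization normalizer,
                                            File file) throws Exception {
        // saveUpdater = true keeps the updater state, which is needed to resume training
        ModelSerializer.writeModel(net, file, true, normalizer);

        // Store an arbitrary Java-serializable object under a key (hypothetical key name)
        ModelSerializer.addObjectToFile(file, "labels", new String[] {"cat", "dog"});
        Object restoredLabels = ModelSerializer.getObjectFromFile(file, "labels");
        System.out.println("Restored extra object: " + restoredLabels);

        // loadUpdater = true restores the updater state saved above
        return ModelSerializer.restoreMultiLayerNetwork(file, true);
    }
}
```

If the model is only used for inference, passing false for saveUpdater and loadUpdater produces a smaller file.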


    1D convolution layer. Expects input activations of shape [minibatch,channels,sequenceLength]

    Convolution2D

    2D convolution layer

    Convolution3D

    3D convolution layer configuration

    hasBias

    An optional dataFormat: “NDHWC” or “NCDHW”. Defaults to “NCDHW”. The data format of the input and output data. For “NCDHW” (also known as ‘channels first’ format), the data storage order is: [batchSize, inputChannels, inputDepth, inputHeight, inputWidth]. For “NDHWC” (‘channels last’ format), the data is stored in the order of: [batchSize, inputDepth, inputHeight, inputWidth, inputChannels].

    kernelSize

    The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]

    stride

    Set stride size for 3D convolutions in (depth, height, width) order

    • param stride stride size

    • return 3D convolution layer builder

    padding

    Set padding size for 3D convolutions in (depth, height, width) order

    • param padding padding size

    • return 3D convolution layer builder

    dilation

    Set dilation size for 3D convolutions in (depth, height, width) order

    • param dilation dilation size

    • return 3D convolution layer builder

    dataFormat

    The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]

    • param dataFormat Data format to use for activations

    setKernelSize

    Set kernel size for 3D convolutions in (depth, height, width) order

    • param kernelSize kernel size

    setStride

    Set stride size for 3D convolutions in (depth, height, width) order

    • param stride stride size

    setPadding

    Set padding size for 3D convolutions in (depth, height, width) order

    • param padding padding size

    setDilation

    Set dilation size for 3D convolutions in (depth, height, width) order

    • param dilation dilation size
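To make the roles of kernel size, stride, padding and dilation concrete, the sketch below computes the output size of one spatial dimension using the standard convolution arithmetic, out = floor((in + 2*pad - dilation*(kernel - 1) - 1) / stride) + 1, and applies it in (depth, height, width) order. It is plain Java with hypothetical names. It assumes explicit padding with truncating division, and it does not cover other convolution modes.

```java
final class Conv3dShapeSketch {
    // Output size along one dimension for a (possibly dilated) convolution.
    static int outputSize(int in, int kernel, int stride, int pad, int dilation) {
        int effectiveKernel = dilation * (kernel - 1) + 1;
        return (in + 2 * pad - effectiveKernel) / stride + 1;
    }

    // Applies the same formula to each of (depth, height, width).
    static int[] outputShape(int[] in, int[] kernel, int[] stride, int[] pad, int[] dilation) {
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = outputSize(in[i], kernel[i], stride[i], pad[i], dilation[i]);
        }
        return out;
    }
}
```

For example, an input of (16, 32, 32) with a 3x3x3 kernel, stride (1, 2, 2), padding (0, 1, 1) and no dilation gives an output of (14, 16, 16).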

    Deconvolution2D

    2D deconvolution layer configuration

    Deconvolutions are also known as transpose convolutions or fractionally strided convolutions. In essence, deconvolutions swap forward and backward pass with regular 2D convolutions.

    See the paper by Matt Zeiler for details: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf

    For an intuitive guide to convolution arithmetic and shapes, see: https://arxiv.org/abs/1603.07285v1
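The "swapped forward and backward pass" relationship can be seen in the shape arithmetic. The following plain-Java sketch uses the standard formulas without dilation, under the assumption of explicit padding; the names are hypothetical and this is not the DL4J implementation.

```java
final class Deconv2dShapeSketch {
    // Transposed convolution: out = stride * (in - 1) + kernel - 2 * pad
    static int deconvOutputSize(int in, int kernel, int stride, int pad) {
        return stride * (in - 1) + kernel - 2 * pad;
    }

    // Regular convolution: out = floor((in + 2 * pad - kernel) / stride) + 1
    static int convOutputSize(int in, int kernel, int stride, int pad) {
        return (in + 2 * pad - kernel) / stride + 1;
    }
}
```

With kernel 3, stride 2 and padding 1, a deconvolution maps size 4 to size 7, and a regular convolution with the same settings maps size 7 back to size 4.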

    hasBias

    Deconvolution2D layer. nIn in the input layer is the number of channels. nOut is the number of filters to be used in the net, or in other words the channels. The builder specifies the filter/kernel size, the stride and padding. The pooling layer takes the kernel size.

    convolutionMode

    Set the convolution mode for the Convolution layer. See ConvolutionMode for more details

    • param convolutionMode Convolution mode for layer

    kernelSize

    Size of the convolution rows/columns

    • param kernelSize the height and width of the kernel

    Cropping1D

    Cropping layer for convolutional (1d) neural networks. Allows cropping to be done separately for top/bottom

    getOutputType

    • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

    setCropping

    Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

    build

    • param cropping Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

    Cropping2D

    Cropping layer for convolutional (2d) neural networks. Allows cropping to be done separately for top/bottom/left/right

    getOutputType

    • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

    • param cropLeftRight Amount of cropping to apply to both the left and the right of the input activations

    setCropping

    Cropping amount for top/bottom/left/right (in that order). A length 4 array.

    build

    • param cropping Cropping amount for top/bottom/left/right (in that order). Must be length 4 array.

    Cropping3D

    Cropping layer for convolutional (3d) neural networks. Allows cropping to be done separately for upper and lower bounds of depth, height and width dimensions.

    getOutputType

    • param cropDepth Amount of cropping to apply to both depth boundaries of the input activations

    • param cropHeight Amount of cropping to apply to both height boundaries of the input activations

    • param cropWidth Amount of cropping to apply to both width boundaries of the input activations

    setCropping

    Cropping amount, a length 6 array, i.e. crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

    build

    • param cropping Cropping amount, must be length 3 or 6 array, i.e. either crop depth, crop height, crop width or crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width
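The length-3 and length-6 forms of the cropping array described above can be illustrated with a small plain-Java sketch. The helper names are hypothetical and the code only mirrors the documented semantics; it is not taken from DL4J.

```java
final class CroppingShapeSketch {
    // Expands {depth, height, width} to
    // {leftDepth, rightDepth, leftHeight, rightHeight, leftWidth, rightWidth}.
    static int[] expand(int[] cropping) {
        if (cropping.length == 6) {
            return cropping.clone();
        }
        if (cropping.length != 3) {
            throw new IllegalArgumentException("cropping must have length 3 or 6");
        }
        return new int[] {cropping[0], cropping[0], cropping[1], cropping[1], cropping[2], cropping[2]};
    }

    // Output (depth, height, width) after removing the cropped borders.
    static int[] croppedShape(int[] dhw, int[] cropping) {
        int[] c = expand(cropping);
        return new int[] {dhw[0] - c[0] - c[1], dhw[1] - c[2] - c[3], dhw[2] - c[4] - c[5]};
    }
}
```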

    MultiLayerNetwork model = new MultiLayerNetwork(conf);
    model.init();
    //print the score with every 1 iteration
    model.setListeners(new ScoreIterationListener(1));
    public EvaluativeListener(@NonNull DataSetIterator iterator, int frequency)
    public void iterationDone(Model model, int iteration, int epoch)
    public ScoreIterationListener(int printIterations)
    public CollectScoresIterationListener()
    public void iterationDone(Model model, int iteration, int epoch)
    public void exportScores(OutputStream outputStream) throws IOException
    public void exportScores(OutputStream outputStream, String delimiter) throws IOException
    public void exportScores(File file) throws IOException
    public void exportScores(File file, String delimiter) throws IOException
    .keepAll() //Don't delete any models
    .saveEveryNEpochs(2)
    .build()
    }
    .keepLast(3)
    .saveEveryNIterations(1000)
    .build();
    }
    .keepLastAndEvery(3, 4)
    .saveEvery(15, TimeUnit.MINUTES)
    .build();
    }
    public CheckpointListener build()
    public void onEpochStart(Model model)
    public PerformanceListener build()
    public TimeIterationListener(int iterationCount)
    GraphBuilder graphBuilder = new NeuralNetConfiguration.Builder()
        // add hyperparameters and other layers
        .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
        // add more layers and output
        .build();
              ⎧  1, if x >  1
     f(x) =   ⎨ -1, if x < -1
              ⎩  x, otherwise
    MultiLayerNetwork net = ...
    net.save(new File("..."));
    
    MultiLayerNetwork net2 = MultiLayerNetwork.load(new File("..."), true);
    ComputationGraph net = ...
    net.save(new File("..."));
    
    ComputationGraph net2 = ComputationGraph.load(new File("..."), true);
     Nd4j.getRandom().setSeed(12345);
     ModelSerializer.restoreMultiLayerNetwork(modelFile);
    public static void writeModel(@NonNull Model model, @NonNull File file, boolean saveUpdater) throws IOException
    public static void writeModel(@NonNull Model model, @NonNull File file, boolean saveUpdater,DataNormalization dataNormalization) throws IOException
    public static void writeModel(@NonNull Model model, @NonNull String path, boolean saveUpdater) throws IOException
    public static void writeModel(@NonNull Model model, @NonNull OutputStream stream, boolean saveUpdater)
                throws IOException
    public static void writeModel(@NonNull Model model, @NonNull OutputStream stream, boolean saveUpdater,DataNormalization dataNormalization)
                throws IOException
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull File file) throws IOException
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull File file, boolean loadUpdater)
                throws IOException
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull InputStream is, boolean loadUpdater)
                throws IOException
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull InputStream is) throws IOException
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull String path) throws IOException
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull String path, boolean loadUpdater)
                throws IOException
    public static ComputationGraph restoreComputationGraph(@NonNull String path) throws IOException
    public static ComputationGraph restoreComputationGraph(@NonNull String path, boolean loadUpdater)
                throws IOException
    public static ComputationGraph restoreComputationGraph(@NonNull InputStream is, boolean loadUpdater)
                throws IOException
    public static ComputationGraph restoreComputationGraph(@NonNull InputStream is) throws IOException
    public static ComputationGraph restoreComputationGraph(@NonNull File file) throws IOException
    public static ComputationGraph restoreComputationGraph(@NonNull File file, boolean loadUpdater) throws IOException
    public static Task taskByModel(Model model)
    public static void addNormalizerToModel(File f, Normalizer<?> normalizer)
    public static void addObjectToFile(@NonNull File f, @NonNull String key, @NonNull Object o)
    public boolean hasBias()
    public Builder kernelSize(int... kernelSize)
    public Builder stride(int... stride)
    public Builder padding(int... padding)
    public Builder dilation(int... dilation)
    public Builder dataFormat(DataFormat dataFormat)
    public void setKernelSize(int... kernelSize)
    public void setStride(int... stride)
    public void setPadding(int... padding)
    public void setDilation(int... dilation)
    public boolean hasBias()
    public Builder convolutionMode(ConvolutionMode convolutionMode)
    public Builder kernelSize(int... kernelSize)
    public InputType getOutputType(int layerIndex, InputType inputType)
    public void setCropping(int... cropping)
    public Cropping1D build()
    public InputType getOutputType(int layerIndex, InputType inputType)
    public void setCropping(int... cropping)
    public Cropping2D build()
    public InputType getOutputType(int layerIndex, InputType inputType)
    public void setCropping(int... cropping)
    public Cropping3D build()

    Zoo Models

    Available models

    AlexNet

    DL4J's AlexNet model interpretation, based on the original paper ImageNet Classification with Deep Convolutional Neural Networks and the imagenetExample code referenced. References:

    • http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

    • https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt

    The model is built in DL4J based on available functionality, and notes indicate where there are gaps waiting for enhancements.

    Bias initialization in the paper is 1 in certain layers but 0.1 in the imagenetExample code. Weight distribution uses 0.01 std for all layers in the paper but 0.005 in the dense layers in the imagenetExample code.

    Darknet19

    Darknet19. Reference: https://arxiv.org/pdf/1612.08242.pdf

    ImageNet weights for this model are available and have been converted from https://pjreddie.com/darknet/imagenet/ using https://github.com/allanzelener/YAD2K

    There are 2 pretrained models, one for 224x224 images and one fine-tuned for 448x448 images. Call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization. The channels of the input images need to be in RGB order (not BGR), with values normalized within [0, 1]. The output labels are as per: https://github.com/pjreddie/darknet/blob/master/data/imagenet.shortnames.list
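The input convention described above (RGB channel order, values in [0, 1], channels first) can be sketched in plain Java as follows. The sketch assumes pixels packed as 0xRRGGBB integers in row-major order, which is the layout returned by java.awt.image.BufferedImage.getRGB with the alpha byte ignored. The class name is hypothetical, and in a real pipeline DataVec image loaders and scalers would normally do this work.

```java
final class RgbNormalizeSketch {
    // Converts packed 0xRRGGBB pixels (row-major) into a [3][height][width]
    // array in R, G, B channel order with values scaled to [0, 1].
    static float[][][] toChannelsFirst(int[] pixels, int height, int width) {
        float[][][] out = new float[3][height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int p = pixels[y * width + x];
                out[0][y][x] = ((p >> 16) & 0xFF) / 255f; // red
                out[1][y][x] = ((p >> 8) & 0xFF) / 255f;  // green
                out[2][y][x] = (p & 0xFF) / 255f;         // blue
            }
        }
        return out;
    }
}
```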

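    FaceNetNN4Small2

    A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832

    Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf

    InceptionResNetV1

    A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832

    Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf

    LeNet

    LeNet is an early convolutional network, originally developed for handwritten digit recognition. References:

    • http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

    • https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet.prototxt

    MNIST weights for this model are available and have been converted from https://github.com/f00-/mnist-lenet-keras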

    NASNet

    Implementation of NASNet-A in Deeplearning4j. NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest.

    This implementation uses 1056 penultimate filters and an input shape of (3, 224, 224). You can change this.

    Paper: https://arxiv.org/abs/1707.07012

    ImageNet weights for this model are available and have been converted from https://keras.io/applications/

    ResNet50

    Residual networks for deep learning.

    Paper: https://arxiv.org/abs/1512.03385

    ImageNet weights for this model are available and have been converted from https://keras.io/applications/

    SimpleCNN

    A simple convolutional network for generic image classification. Reference: https://github.com/oarriaga/face_classification/

    SqueezeNet

    An implementation of SqueezeNet. Touts similar accuracy to AlexNet with a fraction of the parameters.

    Paper: https://arxiv.org/abs/1602.07360

    ImageNet weights for this model are available and have been converted from https://github.com/rcmalli/keras-squeezenet/

    TextGenerationLSTM

    LSTM designed for text generation. Can be trained on a corpus of text. For this model, numClasses is

    Architecture follows this implementation: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

    Walt Whitman weights are available for generating text from his works, adapted from https://github.com/craigomac/InfiniteMonkeys

    TinyYOLO

    Tiny YOLO. Reference: https://arxiv.org/pdf/1612.08242.pdf

    ImageNet+VOC weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code.

    String filename = "tiny-yolo-voc.h5";
    ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
    INDArray priors = Nd4j.create(priorBoxes);

    FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
     .seed(seed)
     .iterations(iterations)
     .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
     .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
     .gradientNormalizationThreshold(1.0)
     .updater(new Adam.Builder().learningRate(1e-3).build())
     .l2(0.00001)
     .activation(Activation.IDENTITY)
     .trainingWorkspaceMode(workspaceMode)
     .inferenceWorkspaceMode(workspaceMode)
     .build();

    ComputationGraph model = new TransferLearning.GraphBuilder(graph)
     .fineTuneConfiguration(fineTuneConf)
     .addLayer("outputs", new Yolo2OutputLayer.Builder()
                          .boundingBoxPriors(priors)
                          .build(), "conv2d_9")
     .setOutputs("outputs")
     .build();

    System.out.println(model.summary(InputType.convolutional(416, 416, 3)));

    ModelSerializer.writeModel(model, "tiny-yolo-voc_dl4j_inference.v1.zip", false);

    The channels of the 416x416 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

    UNet

    An implementation of U-Net, a deep learning network for image segmentation, in Deeplearning4j. U-Net is a convolutional network architecture for fast and precise segmentation of images. It has outperformed the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

    Paper: https://arxiv.org/abs/1505.04597

    Weights are available for image segmentation trained on a synthetic dataset.

    VGG16

    VGG-16, from Very Deep Convolutional Networks for Large-Scale Image Recognition: https://arxiv.org/abs/1409.1556

    Deep Face Recognition: http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf

    ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications

    CIFAR-10 weights for this model are available and have been converted using "approach 2" from https://github.com/rajatvikramsingh/cifar10-vgg16

    VGGFace weights for this model are available and have been converted from https://github.com/rcmalli/keras-vggface

    VGG19

    VGG-19, from Very Deep Convolutional Networks for Large-Scale Image Recognition: https://arxiv.org/abs/1409.1556

    ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications

    Xception

    An implementation of Xception in Deeplearning4j. A deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions.

    Paper: https://arxiv.org/abs/1610.02357

    ImageNet weights for this model are available and have been converted from https://keras.io/applications/

    YOLO2

    YOLOv2. Reference: https://arxiv.org/pdf/1612.08242.pdf

    ImageNet+COCO weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code.

    The channels of the 608x608 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

    pretrainedUrl

    Default prior boxes for the model

    Computation Graph

    How to build complex networks with DL4J computation graph.

    Building Complex Network Architectures with Computation Graph

    This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.

    AlexNet
    http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
    https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt

    Darknet19
    https://arxiv.org/pdf/1612.08242.pdf
    https://pjreddie.com/darknet/imagenet/
    https://github.com/allanzelener/YAD2K
    https://github.com/pjreddie/darknet/blob/master/data/imagenet.shortnames.list

    FaceNetNN4Small2
    https://arxiv.org/abs/1503.03832
    http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf

    InceptionResNetV1
    https://arxiv.org/abs/1503.03832
    http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf

    LeNet
    http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
    https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet.prototxt
    https://github.com/f00-/mnist-lenet-keras

    NASNet
    https://arxiv.org/abs/1707.07012
    https://keras.io/applications/

    ResNet50
    https://arxiv.org/abs/1512.03385
    https://keras.io/applications/

    SimpleCNN
    https://github.com/oarriaga/face_classification/

    SqueezeNet
    https://arxiv.org/abs/1602.07360
    https://github.com/rcmalli/keras-squeezenet/

    TextGenerationLSTM
    https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py
    https://github.com/craigomac/InfiniteMonkeys

    TinyYOLO
    https://arxiv.org/pdf/1612.08242.pdf
    https://pjreddie.com/darknet/yolo
    https://github.com/allanzelener/YAD2K

    UNet
    https://arxiv.org/abs/1505.04597

    VGG16
    https://arxiv.org/abs/1409.1556
    http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf
    https://github.com/fchollet/keras/tree/1.1.2/keras/applications
    https://github.com/rajatvikramsingh/cifar10-vgg16
    https://github.com/rcmalli/keras-vggface

    VGG19
    https://arxiv.org/abs/1409.1556
    https://github.com/fchollet/keras/tree/1.1.2/keras/applications

    Xception
    https://arxiv.org/abs/1610.02357
    https://keras.io/applications/

    YOLO2
    https://arxiv.org/pdf/1612.08242.pdf
    https://pjreddie.com/darknet/yolo
    https://github.com/allanzelener/YAD2K
    String filename = "yolo.h5";
    // Map the Keras Lambda layer onto DL4J's space-to-depth implementation
    KerasLayer.registerCustomLayer("Lambda", KerasSpaceToDepth.class);
    ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
    INDArray priors = Nd4j.create(priorBoxes);
    FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
     .seed(seed)
     .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
     .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
     .gradientNormalizationThreshold(1.0)
     .updater(new Adam.Builder().learningRate(1e-3).build())
     .l2(0.00001)
     .activation(Activation.IDENTITY)
     .trainingWorkspaceMode(workspaceMode)
     .inferenceWorkspaceMode(workspaceMode)
     .build();
    // Replace the imported output with a YOLOv2 output layer that uses the prior boxes
    ComputationGraph model = new TransferLearning.GraphBuilder(graph)
     .fineTuneConfiguration(fineTuneConf)
     .addLayer("outputs", new Yolo2OutputLayer.Builder()
                          .boundingBoxPriors(priors)
                          .build(), "conv2d_23")
     .setOutputs("outputs")
     .build();
    System.out.println(model.summary(InputType.convolutional(608, 608, 3)));
    ModelSerializer.writeModel(model, "yolo2_dl4j_inference.v1.zip", false);
    public String pretrainedUrl(PretrainedType pretrainedType)
    Overview of Computation Graph

    DL4J has two types of networks comprised of multiple layers:

    • The MultiLayerNetwork, which is essentially a stack of neural network layers (with a single input layer and single output layer), and

    • The ComputationGraph, which allows for greater freedom in network architectures

    Specifically, the ComputationGraph allows for networks to be built with the following features:

    • Multiple network input arrays

    • Multiple network outputs (including mixed classification/regression architectures)

    • Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)

    As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure, MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well, though the configuration may be a little more complicated.

    Computation Graph: Some Example Use Cases

    Examples of some architectures that can be built using ComputationGraph include:

    • Multi-task learning architectures

    • Recurrent neural networks with skip connections

    • GoogLeNet, a complex type of convolutional neural network for image classification

    Configuring a Computation Graph

    Types of Graph Vertices

    The basic idea is that in the ComputationGraph, the core building block is the GraphVertex, instead of layers. Layers (or, more accurately, the LayerVertex objects) are but one type of vertex in the graph. Other types of vertices include:

    • Input Vertices

    • Element-wise operation vertices

    • Merge vertices

    • Subset vertices

    • Preprocessor vertices

    These types of graph vertices are described briefly below.

    LayerVertex: Layer vertices (graph vertices with neural network layers) are added using the .addLayer(String,Layer,String...) method. The first argument is the label for the layer, and the last arguments are the inputs to that layer. If you need to manually add an InputPreProcessor (usually this is unnecessary - see next section) you can use the .addLayer(String,Layer,InputPreProcessor,String...) method.

    InputVertex: Input vertices are specified by the addInputs(String...) method in your configuration. The strings used as inputs can be arbitrary - they are user-defined labels, and can be referenced later in the configuration. The number of strings provided defines the number of inputs; the order of the inputs also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).

    ElementWiseVertex: Element-wise operation vertices perform, for example, an element-wise addition or subtraction of the activations out of one or more other vertices. Thus, the activations used as input for the ElementWiseVertex must all be the same size, and the output size of the element-wise vertex is the same as the inputs.

    MergeVertex: The MergeVertex concatenates/merges the input activations. For example, if a MergeVertex has 2 inputs of size 5 and 10 respectively, then output size will be 5+10=15 activations. For convolutional network activations, examples are merged along the depth: so suppose the activations from one layer have 4 features and the other has 5 features (both with (4 or 5) x width x height activations), then the output will have (4+5) x width x height activations.

    SubsetVertex: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use .addVertex("subset1", new SubsetVertex(0,4), "layer1"): this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.

    PreProcessorVertex: Occasionally, you might want the functionality of an InputPreProcessor without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.

    Finally, it is also possible to define custom graph vertices by implementing both a configuration and implementation class for your custom GraphVertex.
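The size rules for the element-wise, merge and subset vertices described above can be illustrated on plain arrays. The sketch below is plain Java with hypothetical names. It mimics what each vertex does to a single example's activation vector and is not the DL4J implementation.

```java
final class VertexOpsSketch {
    // ElementWiseVertex (addition): inputs must have the same size; output has that size.
    static double[] elementWiseAdd(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("inputs must be the same size");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // MergeVertex: concatenation, so the output size is the sum of the input sizes.
    static double[] merge(double[] a, double[] b) {
        double[] out = java.util.Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // SubsetVertex(from, to): both indices are inclusive.
    static double[] subset(double[] in, int from, int to) {
        return java.util.Arrays.copyOfRange(in, from, to + 1);
    }
}
```

Merging inputs of size 5 and 10 yields 15 activations, and subset(x, 0, 4) returns the first 5 activations, matching the examples in the text.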

    Example 1: Recurrent Network with Skip Connections

    Suppose we wish to build the following recurrent neural network architecture:

    For the sake of this example, let's assume our input data is of size 5. Our configuration would be as follows:

    Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.
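A configuration along these lines might look like the sketch below. It assumes the DL4J and ND4J libraries are on the classpath and uses the builder API of the 1.0.0-beta line. The layer types (LSTM, RnnOutputLayer), the loss function, the learning rate and the class name are illustrative assumptions; only the layer names, the input size of 5 and the input wiring come from the text.

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;

final class SkipConnectionRnnSketch {
    static ComputationGraph build() {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.01))      // assumed updater and learning rate
            .graphBuilder()
            .addInputs("input")          // any label can be used here
            .addLayer("L1", new LSTM.Builder().nIn(5).nOut(5).build(), "input")
            // L2 receives the raw input (size 5) and the output of L1 (size 5): nIn = 5 + 5
            .addLayer("L2", new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .activation(Activation.SOFTMAX)
                    .nIn(5 + 5).nOut(5).build(), "input", "L1")
            .setOutputs("L2")            // network outputs, in order
            .build();

        ComputationGraph net = new ComputationGraph(conf);
        net.init();
        return net;
    }
}
```

The skip connection is expressed purely through the input list of "L2": listing "input" alongside "L1" routes the raw input around the first layer.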

    Example 2: Multiple Inputs and Merge Vertex

    Consider the following architecture:

    Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)) then the output size of the merge vertex is 4+4=8 activations.

    To build the above network, we use the following configuration:
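One possible configuration is sketched here, assuming the DL4J and ND4J libraries are on the classpath (1.0.0-beta-style builder API). The input sizes, the output layer's loss and activation, the number of output classes, and all names other than "L1" and "L2" are assumptions for illustration.

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.graph.MergeVertex;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;

final class MergeVertexSketch {
    static ComputationGraphConfiguration configuration() {
        return new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.01))                  // assumed updater and learning rate
            .graphBuilder()
            .addInputs("input1", "input2")           // two separate input arrays
            .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
            .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
            .addVertex("merge", new MergeVertex(), "L1", "L2")
            // The merge vertex concatenates L1 and L2, so the next layer has nIn = 4 + 4
            .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .activation(Activation.SOFTMAX)
                    .nIn(4 + 4).nOut(3).build(), "merge")
            .setOutputs("out")
            .build();
    }
}
```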

    Example 3: Multi-Task Learning

    In multi-task learning, a neural network is used to make multiple independent predictions. Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.

    In this case, the network configuration is:
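A sketch of such a configuration follows, assuming the DL4J and ND4J libraries are on the classpath (1.0.0-beta-style builder API). The layer sizes, the shared dense layer "L1", the loss functions and the activations are illustrative assumptions; only the output names "out1" and "out2" come from the text.

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;

final class MultiTaskSketch {
    static ComputationGraphConfiguration configuration() {
        return new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.01))                  // assumed updater and learning rate
            .graphBuilder()
            .addInputs("input")
            .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
            // Classification head: softmax with negative log likelihood
            .addLayer("out1", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .activation(Activation.SOFTMAX)
                    .nIn(4).nOut(3).build(), "L1")
            // Regression head: identity activation with mean squared error
            .addLayer("out2", new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                    .activation(Activation.IDENTITY)
                    .nIn(4).nOut(2).build(), "L1")
            .setOutputs("out1", "out2")              // order determines the order of label arrays
            .build();
    }
}
```

Because the network has two output arrays, training data for it would be supplied as a MultiDataSet rather than a DataSet.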

    Automatically Adding PreProcessors and Calculating nIns

    One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the .setInputTypes(InputType...) method in the configuration.

    The setInputTypes method has two effects:

    1. It will automatically add any InputPreProcessors as required. InputPreProcessors are necessary to handle the interaction between, for example, fully connected (dense) and convolutional layers, or recurrent and fully connected layers.

    2. It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the setInputTypes(InputType...) functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If the .nIn(x) is specified for a layer, the network will not override this when using the InputType functionality.

    For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use .setInputTypes(InputType.convolutional(height,width,depth), InputType.feedForward(feedForwardInputSize))

    Training Data for ComputationGraph

    There are two types of data that can be used with the ComputationGraph.

    DataSet and the DataSetIterator

    The DataSet class was originally designed for use with the MultiLayerNetwork; however, it can also be used with ComputationGraph, but only if that computation graph has a single input and a single output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).

    A DataSet object is basically a pair of INDArrays that hold your training data. In the case of RNNs, it may also include masking arrays (see the masking discussion in the recurrent network documentation for more details). A DataSetIterator is essentially an iterator over DataSet objects.

    MultiDataSet and the MultiDataSetIterator

    MultiDataSet is the multiple-input and/or multiple-output version of DataSet. It may also include multiple mask arrays (for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator, unless you are dealing with multiple inputs and/or multiple outputs.

    There are currently two ways to use a MultiDataSetIterator:

    • By implementing the MultiDataSetIterator interface directly

    • By using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers

    The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:

    • Multiple DataVec RecordReaders may be used simultaneously

    • The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader

    • It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output

    • It is possible to convert single columns from a class index to a one-hot representation

    Some basic examples on how to use the RecordReaderMultiDataSetIterator follow. You might also find these unit tests to be useful.

    Example 1: Regression Data (RecordReaderMultiDataSetIterator)

    Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:

    Example 2: Classification and Multi-Task Learning (RecordReaderMultiDataSetIterator)

    Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture in which we have two outputs: one for regression and one for classification. For this example, let's assume the data is as follows:

    • Input file: myInput.csv, and we want to use all columns as input (without modification)

    • Output file: myOutput.csv.

      • Network output 1 - regression: columns 0 to 3

      • Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.
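    To make the intended per-row transformation concrete, here is a small plain-Java sketch of what happens to one row of myOutput.csv under the assumptions listed above. It is not the DataVec or DL4J implementation; the class LabelRowSketch, its methods, and the sample row values are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataVec/DL4J: shows the per-row transformation only.
final class LabelRowSketch {

    // Copies columns first..last (both inclusive) of a row.
    static double[] columns(double[] row, int first, int last) {
        return Arrays.copyOfRange(row, first, last + 1);
    }

    // Converts a class index into a one-hot vector of length numClasses.
    static double[] oneHot(int classIndex, int numClasses) {
        if (classIndex < 0 || classIndex >= numClasses) {
            throw new IllegalArgumentException("Class index out of range: " + classIndex);
        }
        double[] out = new double[numClasses];
        out[classIndex] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        double[] row = {0.5, 1.5, 2.5, 3.5, 2};             // made-up row of myOutput.csv
        double[] regression = columns(row, 0, 3);           // network output 1: columns 0 to 3
        double[] classification = oneHot((int) row[4], 3);  // network output 2: column 4, 3 classes
        System.out.println(Arrays.toString(regression));
        System.out.println(Arrays.toString(classification));
    }
}
```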

    In this case, we can build our iterator as follows:

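    The code listings for the examples above are collected below, in order. Network configuration for Example 1 (a recurrent network in which the output layer receives both the raw input and the L1 activations):

    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()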
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input") //can use any label for this
        .addLayer("L1", new GravesLSTM.Builder().nIn(5).nOut(5).build(), "input")
        .addLayer("L2",new RnnOutputLayer.Builder().nIn(5+5).nOut(5).build(), "input", "L1")
        .setOutputs("L2")    //We need to specify the network outputs and their order
        .build();
    
    ComputationGraph net = new ComputationGraph(conf);
    net.init();

    Network configuration for Example 2 (multiple inputs and merge vertex):

    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input1", "input2")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
        .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
        .addVertex("merge", new MergeVertex(), "L1", "L2")
        .addLayer("out", new OutputLayer.Builder().nIn(4+4).nOut(3).build(), "merge")
        .setOutputs("out")
        .build();

    Network configuration for Example 3 (multi-task learning):

    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.01))
            .graphBuilder()
            .addInputs("input")
            .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
            .addLayer("out1", new OutputLayer.Builder()
                    .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nIn(4).nOut(3).build(), "L1")
            .addLayer("out2", new OutputLayer.Builder()
                    .lossFunction(LossFunctions.LossFunction.MSE)
                    .nIn(4).nOut(2).build(), "L1")
            .setOutputs("out1","out2")
            .build();

    Iterator for Example 1: Regression Data (RecordReaderMultiDataSetIterator):

    int numLinesToSkip = 0;
    String fileDelimiter = ",";
    RecordReader rr = new CSVRecordReader(numLinesToSkip,fileDelimiter);
    String csvPath = "/path/to/my/file.csv";
    rr.initialize(new FileSplit(new File(csvPath)));
    
    int batchSize = 4;
    MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
            .addReader("myReader",rr)
            .addInput("myReader",0,2)  //Input: columns 0 to 2 inclusive
            .addOutput("myReader",3,4) //Output: columns 3 to 4 inclusive
            .build();

    Iterator for Example 2: Classification and Multi-Task Learning (RecordReaderMultiDataSetIterator):

    int numLinesToSkip = 0;
    String fileDelimiter = ",";
    
    RecordReader featuresReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
    String featuresCsvPath = "/path/to/my/myInput.csv";
    featuresReader.initialize(new FileSplit(new File(featuresCsvPath)));
    
    RecordReader labelsReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
    String labelsCsvPath = "/path/to/my/myOutput.csv";
    labelsReader.initialize(new FileSplit(new File(labelsCsvPath)));
    
    int batchSize = 4;
    int numClasses = 3;
    MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
            .addReader("csvInput", featuresReader)
            .addReader("csvLabels", labelsReader)
            .addInput("csvInput") //Input: all columns from input reader
            .addOutput("csvLabels", 0, 3) //Output 1: columns 0 to 3 inclusive
            .addOutputOneHot("csvLabels", 4, numClasses)   //Output 2: column 4 -> convert to one-hot for classification
            .build();

    Image caption generation
    Convolutional networks for sentence classification
    Residual learning convolutional neural networks

    Recurrent Layers

    Recurrent Neural Network (RNN) implementations in DL4J.

    This document outlines the specific training features for recurrent neural networks and the practicalities of how to use them in Deeplearning4j. It is not an introduction to recurrent neural networks, and assumes some familiarity with both their use and terminology.

    The Basics: Data and Network Configuration

    DL4J currently supports the following types of recurrent neural network:

    • RNN ("vanilla" RNN)

    • LSTM (Long Short-Term Memory)

    Java documentation is available for each of these layer types.

    Data for RNNs

    Consider for the moment a standard feed-forward network (a multi-layer perceptron or 'DenseLayer' in DL4J). These networks expect input and output data that is two-dimensional: that is, data with "shape" [numExamples,inputSize]. This means that the data into a feed-forward network has ‘numExamples’ rows/examples, where each row consists of ‘inputSize’ columns. A single example would have shape [1,inputSize], though in practice we generally use multiple examples for computational and optimization efficiency. Similarly, output data for a standard feed-forward network is also two dimensional, with shape [numExamples,outputSize].

    Conversely, data for RNNs are time series. Thus, they have 3 dimensions: one additional dimension for time. Input data thus has shape [numExamples,inputSize,timeSeriesLength], and output data has shape [numExamples,outputSize,timeSeriesLength]. This means that the data in our INDArray is laid out such that the value at position (i,j,k) is the jth value at the kth time step of the ith example in the minibatch. This data layout is shown below.
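    The indexing convention can be sketched with plain Java arrays standing in for an INDArray. This is only an illustration of the layout described above, not DL4J code; RnnLayoutSketch and toRnnLayout are hypothetical names, and the sketch assumes all examples have the same length.

```java
// Hypothetical helper, not part of DL4J: plain arrays stand in for INDArray.
final class RnnLayoutSketch {

    // Converts sequences stored as [example][timeStep][feature]
    // into the layout [example][feature][timeStep] described above.
    static double[][][] toRnnLayout(double[][][] byTimeStep) {
        int numExamples = byTimeStep.length;
        int timeSeriesLength = byTimeStep[0].length;
        int inputSize = byTimeStep[0][0].length;
        double[][][] data = new double[numExamples][inputSize][timeSeriesLength];
        for (int i = 0; i < numExamples; i++) {
            for (int k = 0; k < timeSeriesLength; k++) {
                for (int j = 0; j < inputSize; j++) {
                    data[i][j][k] = byTimeStep[i][k][j];
                }
            }
        }
        return data;
    }

    public static void main(String[] args) {
        // One example, 3 time steps, 2 features per time step.
        double[][][] rows = {{{1, 10}, {2, 20}, {3, 30}}};
        double[][][] data = toRnnLayout(rows);
        // data[i][j][k] is the jth value at the kth time step of the ith example.
        System.out.println(data[0][1][2]);
    }
}
```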

    When importing time series data using the CSVSequenceRecordReader class, each line in the data files represents one time step, with the earliest observation in the first row (or the first row after the header, if present) and the most recent observation in the last row of the csv. Each feature time series is a separate column of the csv file. For example, if you have five features in each time series, each with 120 observations, and a training and test set of size 53, then there will be 106 csv files (53 input, 53 labels). The 53 input csv files will each have five columns and 120 rows. The label csv files will have one column (the label) and one row.

    RnnOutputLayer

    RnnOutputLayer is a type of layer used as the final layer with many recurrent neural network systems (for both regression and classification tasks). RnnOutputLayer handles things like score calculation, and error calculation (of prediction vs. actual) given a loss function etc. Functionally, it is very similar to the 'standard' OutputLayer class (which is used with feed-forward networks); however it both outputs (and expects as labels/targets) 3d time series data sets.

    Configuration for the RnnOutputLayer follows the same design as other layers: for example, to set the third layer in a MultiLayerNetwork to an RnnOutputLayer for classification:

    Use of RnnOutputLayer in practice can be seen in the examples, linked at the end of this document.

    RNN Training Features

    Truncated Back Propagation Through Time

    Training neural networks (including RNNs) can be quite computationally demanding. For recurrent neural networks, this is especially the case when we are dealing with long sequences - i.e., training data with many time steps.

    Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. In summary, it allows us to train networks faster (by performing more frequent parameter updates), for a given amount of computational power. It is recommended to use truncated BPTT when your input sequences are long (typically, more than a few hundred time steps).

    Consider what happens when training a recurrent neural network with a time series of length 12 time steps. Here, we need to do a forward pass of 12 steps, calculate the error (based on predicted vs. actual), and do a backward pass of 12 time steps:

    For 12 time steps, in the image above, this is not a problem. Consider, however, that instead the input time series was 10,000 or more time steps. In this case, standard backpropagation through time would require 10,000 time steps for each of the forward and backward passes for each and every parameter update. This is of course very computationally demanding.

    In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. The specific length of these forward/backward pass segments is a parameter set by the user. For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:

    Note that the overall complexity of truncated BPTT and standard BPTT is approximately the same: both do the same number of time steps during the forward/backward pass. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort. The cost is not exactly the same, because there is a small amount of overhead per parameter update.
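    The segment arithmetic can be sketched in a few lines of plain Java. This is not how DL4J implements truncated BPTT internally; TbpttSketch and segments are hypothetical names, and the handling of a shorter final segment is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DL4J: only illustrates how a sequence is split into segments.
final class TbpttSketch {

    // Returns {start, endExclusive} pairs, one per forward/backward segment.
    static List<int[]> segments(int timeSeriesLength, int tbpttLength) {
        if (tbpttLength <= 0) {
            throw new IllegalArgumentException("tbpttLength must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < timeSeriesLength; start += tbpttLength) {
            int end = Math.min(start + tbpttLength, timeSeriesLength);
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        // 12 time steps with segments of length 4: one parameter update per segment.
        for (int[] s : segments(12, 4)) {
            System.out.println("forward/backward over time steps " + s[0] + " to " + (s[1] - 1));
        }
    }
}
```

    With 12 time steps and a length of 4, the sketch yields three segments, matching the three parameter updates mentioned above.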

    The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don't flow back far enough to cause the required parameter updates that would store the required information. This tradeoff is usually worth it, and (as long as the truncated BPTT lengths are set appropriately), truncated BPTT works well in practice.

    Using truncated BPTT in DL4J is quite simple: just add the following code to your network configuration (at the end, before the final .build() in your network configuration)

    The above code snippet will cause any network training (i.e., calls to MultiLayerNetwork.fit() methods) to use truncated BPTT with segments of length 100 steps.

    Some things of note:

    • By default (if a backprop type is not manually specified), DL4J will use BackpropType.Standard (i.e., full BPTT).

    • The tBPTTLength configuration parameter sets the length of the truncated BPTT passes. Typically, this is somewhere on the order of 50 to 200 time steps, though this depends on the application and data.

    • The truncated BPTT length is typically a fraction of the total time series length (e.g., 200 vs. a sequence length of 1000), but variable length time series in the same minibatch are OK when using TBPTT (for example, a minibatch with two sequences, one of length 100 and another of length 1000, with a TBPTT length of 200, will work correctly).

    Masking: One-to-Many, Many-to-One, and Sequence Classification

    DL4J supports a number of related training features for RNNs, based on the idea of padding and masking. Padding and masking allow us to support training situations including one-to-many and many-to-one, as well as variable length time series (in the same mini-batch).

    Suppose we want to train a recurrent neural network with inputs or outputs that don't occur at every time step. Examples of this (for a single example) are shown in the image below. DL4J supports training networks for all of these situations:

    Without masking and padding, we are restricted to the many-to-many case (above, left): that is, (a) All examples are of the same length, and (b) Examples have both inputs and outputs at all time steps.

    The idea behind padding is simple. Consider two time series of lengths 50 and 100 time steps, in the same mini-batch. The training data is a rectangular array; thus, we pad (i.e., add zeros to) the shorter time series (for both input and output), such that the input and output are both the same length (in this example: 100 time steps).

    Of course, if this was all we did, it would cause problems during training. Thus, in addition to padding, we use a masking mechanism. The idea behind masking is simple: we have two additional arrays that record whether an input or output is actually present for a given time step and example, or whether the input/output is just padding.

    Recall that with RNNs, our minibatch data has 3 dimensions, with shape [miniBatchSize,inputSize,timeSeriesLength] and [miniBatchSize,outputSize,timeSeriesLength] for the input and output respectively. The masking arrays are then 2 dimensional, with shape [miniBatchSize,timeSeriesLength] for both the input and output, with values of 0 ('absent') or 1 ('present') for each time step and example. The masking arrays for the input and output are stored in separate arrays.
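    The following plain-Java sketch shows padding and mask construction for a minibatch of variable-length sequences, using nested arrays in place of INDArrays. It is an illustration under the shapes stated above, not DL4J code; PaddingSketch and its methods are hypothetical, and each sequence is assumed to have at least one feature row.

```java
// Hypothetical helper, not part of DL4J: plain arrays stand in for INDArray.
final class PaddingSketch {

    // Length of the longest sequence; sequences are given as [example][feature][timeStep].
    static int maxLength(double[][][] sequences) {
        int max = 0;
        for (double[][] seq : sequences) {
            max = Math.max(max, seq[0].length);
        }
        return max;
    }

    // Pads every sequence with trailing zeros up to the longest length.
    static double[][][] pad(double[][][] sequences) {
        int maxLength = maxLength(sequences);
        double[][][] padded = new double[sequences.length][][];
        for (int i = 0; i < sequences.length; i++) {
            padded[i] = new double[sequences[i].length][maxLength];   // zero-initialized
            for (int j = 0; j < sequences[i].length; j++) {
                System.arraycopy(sequences[i][j], 0, padded[i][j], 0, sequences[i][j].length);
            }
        }
        return padded;
    }

    // Builds the mask [example][timeStep]: 1 where data is present, 0 where it is padding.
    static double[][] mask(double[][][] sequences) {
        double[][] mask = new double[sequences.length][maxLength(sequences)];
        for (int i = 0; i < sequences.length; i++) {
            for (int k = 0; k < sequences[i][0].length; k++) {
                mask[i][k] = 1.0;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        // Two examples with one feature each: lengths 3 and 2.
        double[][][] sequences = {{{1, 2, 3}}, {{4, 5}}};
        System.out.println(java.util.Arrays.toString(pad(sequences)[1][0]));
        System.out.println(java.util.Arrays.toString(mask(sequences)[1]));
    }
}
```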

    For a single example, the input and output masking arrays are shown below:

    For the “Masking not required” cases, we could equivalently use a masking array of all 1s, which will give the same result as not having a mask array at all. Also note that it is possible to use zero, one or two masking arrays when learning RNNs - for example, the many-to-one case could have a masking array for the output only.

    In practice, these masking arrays are generally created during the data import stage (for example, by the SequenceRecordReaderDataSetIterator, discussed later), and are contained within the DataSet object. If a DataSet contains masking arrays, MultiLayerNetwork.fit() will automatically use them during training. If they are absent, no masking functionality is used.

    Evaluation and Scoring with Masking

    Mask arrays are also important when doing scoring and evaluation (i.e., when evaluating the accuracy of a RNN classifier). Consider for example the many-to-one case: there is only a single output for each example, and any evaluation should take this into account.

    The output mask array can be used during evaluation by passing it to the following method: Evaluation.evalTimeSeries(INDArray labels, INDArray predicted, INDArray outputMask)

    where labels are the actual output (3d time series), predicted is the network predictions (3d time series, same shape as labels), and outputMask is the 2d mask array for the output. Note that the input mask array is not required for evaluation.

    Score calculation will also make use of the mask arrays, via the MultiLayerNetwork.score(DataSet) method. Again, if the DataSet contains an output masking array, it will automatically be used when calculating the score (loss function - mean squared error, negative log likelihood etc) for the network.
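    As a sketch of the idea only, the snippet below computes a mean squared error that ignores masked-out time steps. It uses plain arrays and is not the DL4J scoring code; the class MaskedScoreSketch is hypothetical, and the exact averaging DL4J applies may differ from the simple per-element mean assumed here.

```java
// Hypothetical helper, not part of DL4J: a masked mean squared error on plain arrays.
final class MaskedScoreSketch {

    // labels and predicted have shape [example][output][timeStep];
    // mask has shape [example][timeStep], with 1 = present and 0 = padding.
    static double maskedMse(double[][][] labels, double[][][] predicted, double[][] mask) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < labels.length; i++) {
            for (int k = 0; k < mask[i].length; k++) {
                if (mask[i][k] == 0.0) {
                    continue;   // padded time step: contributes nothing to the score
                }
                for (int j = 0; j < labels[i].length; j++) {
                    double diff = labels[i][j][k] - predicted[i][j][k];
                    sum += diff * diff;
                    count++;
                }
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        double[][][] labels = {{{1.0, 0.0}}};
        double[][][] predicted = {{{3.0, 100.0}}};   // the second time step is padding
        double[][] mask = {{1.0, 0.0}};
        System.out.println(maskedMse(labels, predicted, mask));
    }
}
```

    Without the mask, the large error at the padded time step would dominate the score even though no real label exists there.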

    Masking and Sequence Classification After Training

    Sequence classification is one common use of masking. The idea is that although we have a sequence (time series) as input, we only want to provide a single label for the entire sequence (rather than one label at each time step in the sequence).

    However, RNNs by design output sequences of the same length as the input sequence. For sequence classification, masking allows us to train the network with this single label at the final time step - we essentially tell the network that there isn't actually label data anywhere except for the last time step.

    Now, suppose we've trained our network, and want to get the last time step for predictions, from the time series output array. How do we do that?

    To get the last time step, there are two cases to be aware of. First, when we have a single example, we don't actually need to use the mask arrays: we can just get the last time step in the output array:

    Assuming classification (same process for regression, however) the last line above gives us probabilities at the last time step - i.e., the class probabilities for our sequence classification.

    The slightly more complex case is when we have multiple examples in the one minibatch (features array), where the lengths of each example differ. (If all are the same length: we can use the same process as above).

    In this 'variable length' case, we need to get the last time step for each example separately. If we have the time series lengths for each example from our data pipeline, it becomes straightforward: we just iterate over examples, replacing the timeSeriesLength in the above code with the length of that example.

    If we don't have the lengths of the time series directly, we need to extract them from the mask array.

    If we have a labels mask array (which is a one-hot vector, like [0,0,0,1,0] for each time series):

    Alternatively, if we have only the features mask: One quick and dirty approach is to use this:

    To understand what is happening here, note that originally we have a features mask like [1,1,1,1,0], from which we want to get the last non-zero element. So we map [1,1,1,1,0] -> [1,2,3,4,0], and then get the largest element (which is the last time step).
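    Both ways of finding the last time step can be sketched with plain Java arrays. This is an illustration of the index arithmetic described above rather than the ND4J calls; LastStepSketch and its method names are hypothetical.

```java
// Hypothetical helper, not part of DL4J/ND4J: finds the last time step from a mask row.
final class LastStepSketch {

    // Labels mask is one-hot per example, e.g. [0,0,0,1,0]:
    // the position of the non-zero entry is the last time step.
    static int fromLabelsMask(double[] labelsMask) {
        for (int k = 0; k < labelsMask.length; k++) {
            if (labelsMask[k] != 0.0) {
                return k;
            }
        }
        return -1;   // no label present
    }

    // Features mask such as [1,1,1,1,0]: weight each entry by its 1-based position,
    // giving [1,2,3,4,0], then take the index of the largest element.
    static int fromFeaturesMask(double[] featuresMask) {
        int best = -1;
        double bestValue = 0.0;
        for (int k = 0; k < featuresMask.length; k++) {
            double weighted = featuresMask[k] * (k + 1);
            if (weighted > bestValue) {
                bestValue = weighted;
                best = k;
            }
        }
        return best;   // -1 if the mask is all zeros
    }

    public static void main(String[] args) {
        System.out.println(fromLabelsMask(new double[] {0, 0, 0, 1, 0}));
        System.out.println(fromFeaturesMask(new double[] {1, 1, 1, 1, 0}));
    }
}
```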

    In either case, we can then do the following:

    Combining RNN Layers with Other Layer Types

    RNN layers in DL4J can be combined with other layer types. For example, it is possible to combine DenseLayer and LSTM layers in the same network; or combine Convolutional (CNN) layers and LSTM layers for video.

    Of course, the DenseLayer and Convolutional layers do not handle time series data - they expect a different type of input. To deal with this, we need to use the layer preprocessor functionality: for example, the CnnToRnnPreProcessor and FeedForwardToRnnPreProcessor classes. See the API documentation for all preprocessors. Fortunately, in most situations, the DL4J configuration system will automatically add these preprocessors as required. However, the preprocessors can be added manually (overriding the automatic addition of preprocessors, for each layer).

    For example, to manually add a preprocessor between layers 1 and 2, add the following to your network configuration: .inputPreProcessor(2, new RnnToFeedForwardPreProcessor()).

    Inference: Predictions One Step at a Time

    As with other types of neural networks, predictions can be generated for RNNs using the MultiLayerNetwork.output() and MultiLayerNetwork.feedForward() methods. These methods can be useful in many circumstances; however, they have the limitation that we can only generate predictions for time series, starting from scratch each and every time.

    Consider for example the case where we want to generate predictions in a real-time system, where these predictions are based on a very large amount of history. In this case, it is impractical to use the output/feedForward methods, as they conduct the full forward pass over the entire data history, each time they are called. If we wish to make a prediction for a single time step, at every time step, these methods can be both (a) very costly, and (b) wasteful, as they do the same calculations over and over.

    For these situations, MultiLayerNetwork provides four methods of note:

    • rnnTimeStep(INDArray)

    • rnnClearPreviousState()

    • rnnGetPreviousState(int layer)

    • rnnSetPreviousState(int layer, Map<String,INDArray> state)

    The rnnTimeStep() method is designed to allow forward pass (predictions) to be conducted efficiently, one or more steps at a time. Unlike the output/feedForward methods, the rnnTimeStep method keeps track of the internal state of the RNN layers when it is called. It is important to note that output for the rnnTimeStep and the output/feedForward methods should be identical (for each time step), whether we make these predictions all at once (output/feedForward) or whether these predictions are generated one or more steps at a time (rnnTimeStep). Thus, the only difference should be the computational cost.

    In summary, the MultiLayerNetwork.rnnTimeStep() method does two things:

    1. Generate output/predictions (forward pass), using the previous stored state (if any)

    2. Update the stored state, storing the activations for the last time step (ready to be used next time rnnTimeStep is called)

    For example, suppose we want to use a RNN to predict the weather, one hour in advance (based on the weather at say the previous 100 hours as input). If we were to use the output method, at each hour we would need to feed in the full 100 hours of data to predict the weather for hour 101. Then to predict the weather for hour 102, we would need to feed in the full 100 (or 101) hours of data; and so on for hours 103+.

    Alternatively, we could use the rnnTimeStep method. Of course, if we want to use the full 100 hours of history before we make our first prediction, we still need to do the full forward pass:

    For the first time we call rnnTimeStep, the only practical difference between the two approaches is that the activations/state of the last time step are stored - this is shown in orange. However, the next time we use the rnnTimeStep method, this stored state will be used to make the next predictions:

    There are a number of important differences here:

    1. In the second image (second call of rnnTimeStep) the input data consists of a single time step, instead of the full history of data

    2. The forward pass is thus a single time step (as compared to the hundreds – or more)

    3. After the rnnTimeStep method returns, the internal state will automatically be updated. Thus, predictions for time 103 could be made in the same way as for time 102. And so on.

    However, if you want to start making predictions for a new (entirely separate) time series: it is necessary (and important) to manually clear the stored state, using the MultiLayerNetwork.rnnClearPreviousState() method. This will reset the internal state of all recurrent layers in the network.
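    The role of the stored state can be illustrated with a toy single-unit recurrent model in plain Java. This is not the DL4J implementation: StatefulRnnSketch, its weights, and its update rule h_t = tanh(w * x_t + u * h_{t-1}) are assumptions chosen only to show that stepping with stored state reproduces the full forward pass.

```java
// Hypothetical toy model, not the DL4J implementation: a single recurrent unit
// h_t = tanh(w * x_t + u * h_{t-1}), used only to show the role of stored state.
final class StatefulRnnSketch {
    private final double w;
    private final double u;
    private double storedState = 0.0;   // zeros are used when no history is stored

    StatefulRnnSketch(double w, double u) {
        this.w = w;
        this.u = u;
    }

    // Full forward pass starting from a zero state; does not touch the stored state.
    double[] output(double[] inputs) {
        return forward(inputs, 0.0, false);
    }

    // Forward pass that starts from the stored state and updates it afterwards.
    double[] timeStep(double... inputs) {
        return forward(inputs, storedState, true);
    }

    // Resets the stored state before starting a new, unrelated time series.
    void clearPreviousState() {
        storedState = 0.0;
    }

    private double[] forward(double[] inputs, double initialState, boolean store) {
        double h = initialState;
        double[] out = new double[inputs.length];
        for (int t = 0; t < inputs.length; t++) {
            h = Math.tanh(w * inputs[t] + u * h);
            out[t] = h;
        }
        if (store) {
            storedState = h;
        }
        return out;
    }

    public static void main(String[] args) {
        StatefulRnnSketch rnn = new StatefulRnnSketch(0.5, 0.9);
        double[] full = rnn.output(new double[] {1.0, 2.0, 3.0});
        rnn.timeStep(1.0, 2.0);                 // process the history once
        double[] next = rnn.timeStep(3.0);      // then a single new time step
        System.out.println(full[2] == next[0]); // same prediction, less computation
    }
}
```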

    If you need to store or set the internal state of the RNN for use in predictions, you can use the rnnGetPreviousState and rnnSetPreviousState methods, for each layer individually. This can be useful for example during serialization (network saving/loading), as the internal network state from the rnnTimeStep method is not saved by default, and must be saved and loaded separately. Note that these get/set state methods return and accept a map, keyed by the type of activation. For example, in the LSTM model, it is necessary to store both the output activations, and the memory cell state.

    Some other points of note:

    • We can use the rnnTimeStep method for multiple independent examples/predictions simultaneously. In the weather example above, we might for example want to make predictions for multiple locations using the same neural network. This works in the same way as training and the forward pass / output methods: multiple rows (dimension 0 in the input data) are used for multiple examples.

    • If no history/stored state is set (i.e., initially, or after a call to rnnClearPreviousState), a default initialization (zeros) is used. This is the same approach as during training.

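    • The rnnTimeStep method can be used for an arbitrary number of time steps simultaneously, not just one time step. However, it is important to note:

      • For a single time step prediction, the data is 2 dimensional, with shape [numExamples,nIn]; in this case, the output is also 2 dimensional, with shape [numExamples,nOut]

      • For multiple time step predictions, the data is 3 dimensional, with shape [numExamples,nIn,numTimeSteps]; the output will have shape [numExamples,nOut,numTimeSteps]. Again, the final time step activations are stored as before.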

    Loading Time Series Data

    Data import for RNNs is complicated by the fact that we have multiple different types of data we could want to use for RNNs: one-to-many, many-to-one, variable length time series, etc. This section will describe the currently implemented data import mechanisms for DL4J.

    The methods described here utilize the SequenceRecordReaderDataSetIterator class, in conjunction with the CSVSequenceRecordReader class from DataVec. This approach currently allows you to load delimited (tab, comma, etc) data from files, where each time series is in a separate file. This method also supports:

    • Variable length time series input

    • One-to-many and many-to-one data loading (where input and labels are in different files)

    • Label conversion from an index to a one-hot representation for classification (i.e., '2' to [0,0,1,0])

    Note that in all cases, each line in the data files represents one time step.

    (In addition to the examples below, you might find the unit tests for these classes to be of some use.)

    Example 1: Time Series of Same Length, Input and Labels in Separate Files

    Suppose we have 10 time series in our training data, represented by 20 files: 10 files for the input of each time series, and 10 files for the output/labels. For now, assume these 20 files all contain the same number of time steps (i.e., same number of rows).

    To use the SequenceRecordReaderDataSetIterator and CSVSequenceRecordReader approaches, we first create two CSVSequenceRecordReader objects, one for input and one for labels:

    This particular constructor takes the number of lines to skip (1 row skipped here), and the delimiter (comma character used here).

    Second, we need to initialize these two readers, by telling them where to get the data from. We do this with an InputSplit object. Suppose that our time series are numbered, with file names "myInput_0.csv", "myInput_1.csv", ..., "myLabels_0.csv", etc. One approach is to use the NumberedFileInputSplit:

    In this particular approach, the "%d" is replaced by the corresponding number, and the numbers 0 to 9 (both inclusive) are used.
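    The "%d" substitution can be mimicked with String.format in plain Java. This sketch only shows which file names result from a pattern and an inclusive index range; it is not the DataVec input split itself, and NumberedPathsSketch and expand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DataVec: expands a "%d" pattern over an inclusive range.
final class NumberedPathsSketch {

    static List<String> expand(String pattern, int minInclusive, int maxInclusive) {
        List<String> paths = new ArrayList<>();
        for (int i = minInclusive; i <= maxInclusive; i++) {
            paths.add(String.format(pattern, i));   // "%d" is replaced by the index
        }
        return paths;
    }

    public static void main(String[] args) {
        // Indices 0 to 9, both inclusive, give 10 file names.
        for (String path : expand("myInput_%d.csv", 0, 9)) {
            System.out.println(path);
        }
    }
}
```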

    Finally, we can create our SequenceRecordReaderDataSetIterator:

    This DataSetIterator can then be passed to MultiLayerNetwork.fit() to train the network.

    The miniBatchSize argument specifies the number of examples (time series) in each minibatch. For example, with 10 files total, a miniBatchSize of 5 would give us 2 minibatches (DataSet objects) with 5 time series in each.

    Note that:

    • For classification problems: numPossibleLabels is the number of classes in your data set. Use regression = false.

      • Labels data: one value per line, as a class index

      • Label data will be converted to a one-hot representation automatically

    Example 2: Time Series of Same Length, Input and Labels in Same File

    Following on from the last example, suppose that instead of separate files for our input data and labels, we have both in the same file. However, each time series is still in a separate file.

    As of DL4J 0.4-rc3.8, this approach has the restriction of a single column for the output (either a class index, or a single real-valued regression output).

    In this case, we create and initialize a single reader. Again, we are skipping one header row, and specifying the format as comma delimited, and assuming our data files are named "myData_0.csv", ..., "myData_9.csv":

    miniBatchSize and numPossibleLabels are the same as the previous example. Here, labelIndex specifies which column the labels are in. For example, if the labels are in the fifth column, use labelIndex = 4 (i.e., columns are indexed 0 to numColumns-1).

    For regression on a single output value, we use:

    Again, the numPossibleLabels argument is not used for regression.

    Example 3: Time Series of Different Lengths (Many-to-Many)

    Following on from the previous two examples, suppose that for each example individually, the input and labels are of the same length, but these lengths differ between time series.

    We can use the same approach (CSVSequenceRecordReader and SequenceRecordReaderDataSetIterator), though with a different constructor:

    The arguments here are the same as in the previous example, with the exception of the AlignmentMode.ALIGN_END addition. This alignment mode tells the SequenceRecordReaderDataSetIterator to expect two things:

    1. That the time series may be of different lengths

    2. To align the input and labels - for each example individually - such that their last values occur at the same time step.

    Note that if the features and labels are always of the same length (as is the assumption in example 3), then the two alignment modes (AlignmentMode.ALIGN_END and AlignmentMode.ALIGN_START) will give identical outputs. The alignment mode option is explained in the next section.

    Also note that variable length time series always start at time zero in the data arrays: padding, if required, will be added after the time series has ended.

    Unlike examples 1 and 2 above, the DataSet objects produced by the above variableLengthIter instance will also include input and output masking arrays, as described earlier in this document.

    Example 4: Many-to-One and One-to-Many Data

    We can also use the AlignmentMode functionality in example 3 to implement a many-to-one RNN sequence classifier. Here, let us assume:

    • Input and labels are in separate delimited files

    • The labels files contain a single row (time step) (either a class index for classification, or one or more numbers for regression)

    • The input lengths may (optionally) differ between examples

    In fact, the same approach as in example 3 can do this:

    Alignment modes are relatively straightforward. They specify whether to pad the start or the end of the shorter time series. The diagram below shows how this works, along with the masking arrays (as discussed earlier in this document):

    The one-to-many case (similar to the last case above, but with only one input) is done by using AlignmentMode.ALIGN_START.
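
    The padding behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions, not DL4J code: AlignmentSketch, pad and mask are hypothetical helpers that mimic, for a single one-dimensional series, what ALIGN_START and ALIGN_END are described as doing.

```java
import java.util.Arrays;

// Plain-Java illustration only; none of these names are DL4J classes.
final class AlignmentSketch {
    enum Mode { ALIGN_START, ALIGN_END }

    // Index at which a series of the given length starts inside the padded array.
    private static int offset(int seriesLength, int paddedLength, Mode mode) {
        if (seriesLength > paddedLength) {
            throw new IllegalArgumentException("series longer than padded length");
        }
        return mode == Mode.ALIGN_START ? 0 : paddedLength - seriesLength;
    }

    // Zero-pads a series: ALIGN_START pads the end, ALIGN_END pads the start.
    static double[] pad(double[] series, int paddedLength, Mode mode) {
        double[] out = new double[paddedLength];
        System.arraycopy(series, 0, out, offset(series.length, paddedLength, mode), series.length);
        return out;
    }

    // Mask array: 1.0 where real data is present, 0.0 where padding was added.
    static double[] mask(int seriesLength, int paddedLength, Mode mode) {
        double[] m = new double[paddedLength];
        int start = offset(seriesLength, paddedLength, mode);
        Arrays.fill(m, start, start + seriesLength, 1.0);
        return m;
    }

    public static void main(String[] args) {
        double[] label = {7.0}; // many-to-one: a single label step for a 4-step input
        System.out.println(Arrays.toString(pad(label, 4, Mode.ALIGN_END)));   // [0.0, 0.0, 0.0, 7.0]
        System.out.println(Arrays.toString(mask(1, 4, Mode.ALIGN_END)));      // [0.0, 0.0, 0.0, 1.0]
        System.out.println(Arrays.toString(mask(1, 4, Mode.ALIGN_START)));    // [1.0, 0.0, 0.0, 0.0]
    }
}
```

    In the many-to-one case the single label lands on the last time step of the input, and the label mask marks every other step as padding.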

    Note that in the case of training data that contains time series of different lengths, the labels and inputs will be aligned for each example individually, and then the shorter time series will be padded as required:

    hashtag
    Available layers

    hashtag
    LSTM

    LSTM recurrent neural network layer without peephole connections. Supports CuDNN acceleration - see the cuDNN page for details.

    hashtag
    RnnLossLayer

    Recurrent Neural Network Loss Layer. Handles calculation of gradients etc. for various objective (loss) functions. Unlike RnnOutputLayer, RnnLossLayer does not have any parameters: there is no time distributed dense component here. Consequently, the output activations size is equal to the input size. Input and output activations are the same as other RNN layers: 3 dimensions with shape [miniBatchSize,nIn,timeSeriesLength] and [miniBatchSize,nOut,timeSeriesLength] respectively. Note that RnnLossLayer also has the option to configure an activation function.

    setNIn

    • param lossFunction Loss function for the loss layer

    hashtag
    RnnOutputLayer

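    RnnOutputLayer is the recurrent counterpart of OutputLayer. It expects inputs of shape [minibatch,nIn,sequenceLength] and labels of shape [minibatch,nOut,sequenceLength]. It also supports mask arrays. Note that RnnOutputLayer can also be used for 1D CNN layers, which also have [minibatch,nOut,sequenceLength] activations/labels shape.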

    build

    • param lossFunction Loss function for the output layer

    hashtag
    Bidirectional

    Bidirectional is a “wrapper” layer: it wraps any uni-directional RNN layer to make it bidirectional. Note that multiple different modes are supported - these specify how the activations should be combined from the forward and backward RNN networks. Parameters are not shared: there are 2 separate copies of the wrapped RNN layer, each with separate parameters.

    getNOut

    This Mode enumeration defines how the activations for the forward and backward networks should be combined:

    • ADD: out = forward + backward (elementwise addition)

    • MUL: out = forward * backward (elementwise multiplication)

    • AVERAGE: out = 0.5 * (forward + backward)

    • CONCAT: concatenate the activations

    Here ‘forward’ is the activations for the forward RNN, and ‘backward’ is the activations for the backward RNN. In all cases except CONCAT, the output activations size is the same size as the standard RNN that is being wrapped by this layer. In the CONCAT case, the output activations size (dimension 1) is 2x larger than the standard RNN’s activations array.
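
    The four combination modes can be illustrated for a single example at a single time step. The following is a plain-Java sketch, not the DL4J implementation: BidirectionalModeSketch and combine are hypothetical names, and placing the forward activations before the backward ones in CONCAT is an assumption made for the illustration.

```java
import java.util.Arrays;

// Plain-Java illustration only; not part of the DL4J API.
final class BidirectionalModeSketch {
    enum Mode { ADD, MUL, AVERAGE, CONCAT }

    // Combines forward and backward activations of equal size according to the mode.
    static double[] combine(double[] forward, double[] backward, Mode mode) {
        if (forward.length != backward.length) {
            throw new IllegalArgumentException("forward and backward sizes differ");
        }
        int n = forward.length;
        if (mode == Mode.CONCAT) {
            // Assumed ordering: forward activations first, then backward.
            double[] out = Arrays.copyOf(forward, 2 * n);
            System.arraycopy(backward, 0, out, n, n);
            return out;
        }
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            switch (mode) {
                case ADD:     out[i] = forward[i] + backward[i]; break;
                case MUL:     out[i] = forward[i] * backward[i]; break;
                case AVERAGE: out[i] = 0.5 * (forward[i] + backward[i]); break;
                default:      throw new IllegalStateException("unhandled mode " + mode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] f = {1.0, 2.0};
        double[] b = {3.0, 4.0};
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + Arrays.toString(combine(f, b, m)));
        }
    }
}
```

    Only CONCAT changes the output size, which is why a layer following a CONCAT-mode Bidirectional wrapper sees twice as many inputs.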

    getUpdaterByParam

    Get the updater for the given parameter. Typically the same updater will be used for all parameters, but this is not necessarily the case.

    • param paramName Parameter name

    • return IUpdater for the parameter

    hashtag
    LastTimeStep

    LastTimeStep is a “wrapper” layer: it wraps any RNN (or CNN1D) layer, and extracts out the last time step during forward pass, and returns it as a row vector (per example). That is, for 3d (time series) input (with shape [minibatch, layerSize, timeSeriesLength]), we take the last time step and return it as a 2d array with shape [minibatch, layerSize]. Note that the last time step operation takes into account any mask arrays, if present: thus, variable length time series (in the same minibatch) are handled as expected here.
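
    To make the mask handling concrete, here is a minimal plain-Java sketch of the operation described above. It is not the DL4J implementation: LastTimeStepSketch is a hypothetical helper that works on nested arrays laid out as [minibatch][layerSize][timeSeriesLength], with an optional mask of shape [minibatch][timeSeriesLength].

```java
// Plain-Java illustration only; not part of the DL4J API.
final class LastTimeStepSketch {
    // Returns a [minibatch][layerSize] array holding, per example, the activations
    // at the last time step whose mask value is non-zero (or the final step if mask is null).
    static double[][] lastTimeStep(double[][][] activations, double[][] mask) {
        int minibatch = activations.length;
        int layerSize = activations[0].length;
        int length = activations[0][0].length;
        double[][] out = new double[minibatch][layerSize];
        for (int ex = 0; ex < minibatch; ex++) {
            int last = length - 1;
            if (mask != null) {
                last = -1;
                for (int t = 0; t < length; t++) {
                    if (mask[ex][t] != 0.0) {
                        last = t;
                    }
                }
                if (last < 0) {
                    throw new IllegalArgumentException("example " + ex + " is fully masked");
                }
            }
            for (int j = 0; j < layerSize; j++) {
                out[ex][j] = activations[ex][j][last];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][] act = {{{1, 2, 3}}, {{4, 5, 0}}};   // 2 examples, layerSize 1, 3 steps
        double[][] mask = {{1, 1, 1}, {1, 1, 0}};        // second series has only 2 real steps
        double[][] last = lastTimeStep(act, mask);
        System.out.println(last[0][0] + " " + last[1][0]); // 3.0 5.0
    }
}
```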

    hashtag
    SimpleRnn

    SimpleRnn implements out_t = activationFn(in_t * inWeight + out_(t-1) * recurrentWeights + bias).

    Note that other architectures (LSTM, etc) are usually much more effective, especially for longer time series; however, SimpleRnn is very fast to compute, and hence may be considered where the temporal dependencies in the dataset are only a few steps long.
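
    The recurrence out_t = activationFn(in_t * inWeight + out_(t-1) * recurrentWeights + bias) can be sketched in plain Java for a single example. This is an illustration under stated assumptions rather than the DL4J implementation: SimpleRnnSketch is a hypothetical class, tanh is assumed as the activation function, the initial state is assumed to be all zeros, and the sequence is laid out as [timeStep][nIn] for readability.

```java
// Plain-Java illustration only; not part of the DL4J API.
final class SimpleRnnSketch {
    // One time step: out_t = tanh(in_t * inWeight + out_(t-1) * recurrentWeights + bias).
    // inWeight has shape [nIn][nOut], recurrentWeights has shape [nOut][nOut].
    static double[] step(double[] in, double[] prevOut, double[][] inWeight,
                         double[][] recurrentWeights, double[] bias) {
        int nOut = bias.length;
        double[] out = new double[nOut];
        for (int j = 0; j < nOut; j++) {
            double z = bias[j];
            for (int i = 0; i < in.length; i++) {
                z += in[i] * inWeight[i][j];
            }
            for (int k = 0; k < nOut; k++) {
                z += prevOut[k] * recurrentWeights[k][j];
            }
            out[j] = Math.tanh(z); // assumed activation function
        }
        return out;
    }

    // Runs the recurrence over a whole sequence, starting from a zero state.
    static double[][] forward(double[][] sequence, double[][] inWeight,
                              double[][] recurrentWeights, double[] bias) {
        double[][] outputs = new double[sequence.length][];
        double[] prev = new double[bias.length];
        for (int t = 0; t < sequence.length; t++) {
            prev = step(sequence[t], prev, inWeight, recurrentWeights, bias);
            outputs[t] = prev;
        }
        return outputs;
    }

    public static void main(String[] args) {
        double[][] seq = {{1.0}, {0.0}};
        double[][] out = forward(seq, new double[][]{{1.0}}, new double[][]{{0.5}}, new double[]{0.0});
        System.out.println(out[0][0] + " " + out[1][0]);
    }
}
```

    The second output is non-zero even though the second input is zero, because the previous output feeds back through recurrentWeights.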

    rnnSetPreviousState(int layer, Map<String,INDArray> state)

    • For a single time step prediction: the data is 2 dimensional, with shape [numExamples,nIn]; in this case, the output is also 2 dimensional, with shape [numExamples,nOut]

    • For multiple time step predictions: the data is 3 dimensional, with shape [numExamples,nIn,numTimeSteps]; the output will have shape [numExamples,nOut,numTimeSteps]. Again, the final time step activations are stored as before.

  • It is not possible to change the number of examples between calls of rnnTimeStep (in other words, if the first use of rnnTimeStep is for say 3 examples, all subsequent calls must be with 3 examples). After resetting the internal state (using rnnClearPreviousState()), any number of examples can be used for the next call of rnnTimeStep.

  • The rnnTimeStep method makes no changes to the parameters; it is used after training the network has been completed only.

  • The rnnTimeStep method works with networks containing single and stacked/multiple RNN layers, as well as with networks that combine other layer types (such as Convolutional or Dense layers).

  • The RnnOutputLayer layer type does not have any internal state, as it does not have any recurrent connections.

  • Skipping a fixed/specified number of rows at the start of the data files (i.e., comment or header rows)

    For regression problems: numPossibleLabels is not used (set it to anything) and use regression = true.

    • The number of values in the input and labels can be anything (unlike classification: can have an arbitrary number of outputs)

    • No processing of the labels is done when regression = true

    .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT).activation(Activation.SOFTMAX)
    .weightInit(WeightInit.XAVIER).nIn(prevLayerSize).nOut(nOut).build())
    .backpropType(BackpropType.TruncatedBPTT)
    .tBPTTLength(100)
    Evaluation.evalTimeSeries(INDArray labels, INDArray predicted, INDArray outputMask)
        INDArray timeSeriesFeatures = ...;
        INDArray timeSeriesOutput = myNetwork.output(timeSeriesFeatures);
        int timeSeriesLength = timeSeriesOutput.size(2);        //Size of time dimension
        INDArray lastTimeStepProbabilities = timeSeriesOutput.get(NDArrayIndex.point(0), NDArrayIndex.all(), NDArrayIndex.point(timeSeriesLength-1));
        INDArray labelsMaskArray = ...;
        INDArray lastTimeStepIndices = Nd4j.argMax(labelsMaskArray,1);
        INDArray featuresMaskArray = ...;
        int longestTimeSeries = featuresMaskArray.size(1);
        INDArray linspace = Nd4j.linspace(1,longestTimeSeries,longestTimeSeries);
        INDArray temp = featuresMaskArray.mulRowVector(linspace);   //Weight each mask entry by its 1-based time index
        INDArray lastTimeStepIndices = Nd4j.argMax(temp,1);
        int numExamples = timeSeriesFeatures.size(0);
        for( int i=0; i<numExamples; i++ ){
            int thisTimeSeriesLastIndex = lastTimeStepIndices.getInt(i);
            INDArray thisExampleProbabilities = timeSeriesOutput.get(NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(thisTimeSeriesLastIndex));
        }
    SequenceRecordReader featureReader = new CSVSequenceRecordReader(1, ",");
    SequenceRecordReader labelReader = new CSVSequenceRecordReader(1, ",");
    featureReader.initialize(new NumberedFileInputSplit("/path/to/data/myInput_%d.csv", 0, 9));
    labelReader.initialize(new NumberedFileInputSplit("/path/to/data/myLabels_%d.csv", 0, 9));
    DataSetIterator iter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression);
    SequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
    reader.initialize(new NumberedFileInputSplit("/path/to/data/myData_%d.csv", 0, 9));
    DataSetIterator iterClassification = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, numPossibleLabels, labelIndex, false);
    DataSetIterator iterRegression = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, -1, labelIndex, true);
    DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
    DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
    public void setNIn(int nIn)
    public RnnOutputLayer build()
    public long getNOut()
    public IUpdater getUpdaterByParam(String paramName)

    Layers

    Supported neural network layers.

    hashtag
    What are layers?

    Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they form a deep neural network.

    hashtag
    Using layers

    All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork or ComputationGraph. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.

    hashtag
    Layers vs. vertices

    If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph API and join different branches together using vertices. See the vertices documentation for more information.

    hashtag
    General layers

    hashtag
    ActivationLayer

    Activation layer is a simple layer that applies the specified activation function to the input activations

    clone

    • param activation Activation function for the layer

    activation

    Activation function for the layer

    activation

    • param activationFunction Activation function for the layer

    activation

    • param activation Activation function for the layer

    hashtag
    DenseLayer

    Dense layer: a standard fully connected feed forward layer

    hasBias

    If true (default): include bias parameters in the model. False: no bias.

    hasLayerNorm

    If true (default = false): enable layer normalization on this layer

    hashtag
    DropoutLayer

    Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test time.

    build

    Create a dropout layer with standard Dropout, with the specified probability of retaining the input activation. See Dropout for the full details.

    • param dropout Activation retain probability.

    hashtag
    EmbeddingLayer

    Embedding layer: feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses-1). This input has shape [numExamples,1] instead of the [numExamples,numClasses] of the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero). Note: can only be used as the first layer for a network. Note 2: For a given input index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each index. Note also that the embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default).
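
    The equivalence between an embedding lookup and a dense layer applied to a one-hot input can be checked with a few lines of plain Java. This is a sketch, not DL4J code: EmbeddingLookupSketch is a hypothetical class, and it assumes an identity activation and no bias.

```java
import java.util.Arrays;

// Plain-Java illustration only; not part of the DL4J API.
final class EmbeddingLookupSketch {
    // Embedding lookup: simply return row `index` of the weight matrix.
    static double[] embed(double[][] weights, int index) {
        return weights[index].clone();
    }

    // Equivalent dense computation: oneHot(index) multiplied by the weight matrix.
    static double[] oneHotTimesWeights(double[][] weights, int index) {
        int numClasses = weights.length;
        int vectorSize = weights[0].length;
        double[] oneHot = new double[numClasses];
        oneHot[index] = 1.0;
        double[] out = new double[vectorSize];
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < vectorSize; j++) {
                out[j] += oneHot[c] * weights[c][j]; // all but one term is zero
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] weights = {{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}}; // [numClasses, vectorSize]
        System.out.println(Arrays.toString(embed(weights, 2)));              // [0.5, 0.6]
        System.out.println(Arrays.toString(oneHotTimesWeights(weights, 2))); // [0.5, 0.6]
    }
}
```

    The lookup touches vectorSize values, while the one-hot product touches numClasses * vectorSize values, which is the efficiency difference mentioned above.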

    hasBias

    If true: include bias parameters in the layer. False (default): no bias.

    weightInit

    Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

    • param embeddingInitializer Source of the embedding layer weights

    weightInit

    Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

    • param vectors Vectors to initialize the embedding layer with

    hashtag
    EmbeddingSequenceLayer

    Embedding layer for sequences: feed-forward layer that expects a fixed-length sequence (inputLength) of integers/indices per example as input, ranging from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Note: can only be used as the first layer for a network. Note 2: For a given input index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index. Note also that the embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default).

    hasBias

    If true: include bias parameters in the layer. False (default): no bias.

    inputLength

    Set input sequence length for this embedding layer.

    • param inputLength input sequence length

    • return Builder

    inferInputLength

    Set input sequence inference mode for embedding layer.

    • param inferInputLength whether to infer input length

    • return Builder

    weightInit

    Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

    • param embeddingInitializer Source of the embedding layer weights

    weightInit

    Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

    • param vectors Vectors to initialize the embedding layer with

    hashtag
    GlobalPoolingLayer

    Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: SUM, AVG, MAX, PNORM.

    Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass:

    • Time series: mask arrays are shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only

    • CNNs: masks have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.

    Behaviour with default settings:

    • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

    • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

    • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

    Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s: this gives

    • [miniBatchSize, vectorSize, 1] for RNN output,

    • [miniBatchSize, channels, 1, 1] for CNN output, and

    • [miniBatchSize, channels, 1, 1, 1] for CNN3D output.
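
    Pooling over time with a mask array can be sketched for a single example in plain Java. This is not the DL4J implementation: GlobalPoolingSketch is a hypothetical helper, the input is laid out as [vectorSize][timeSeriesLength], and the sketch assumes that masked steps are simply excluded (so that AVG divides by the number of unmasked steps).

```java
// Plain-Java illustration only; not part of the DL4J API.
final class GlobalPoolingSketch {
    enum PoolingType { MAX, AVG, SUM }

    // Pools each feature over time, skipping steps whose mask value is 0.
    // A null mask means every time step is used.
    static double[] poolOverTime(double[][] input, double[] mask, PoolingType type) {
        int vectorSize = input.length;
        double[] out = new double[vectorSize];
        for (int j = 0; j < vectorSize; j++) {
            double sum = 0.0;
            double max = Double.NEGATIVE_INFINITY;
            int count = 0;
            for (int t = 0; t < input[j].length; t++) {
                if (mask != null && mask[t] == 0.0) {
                    continue; // padded step: ignore
                }
                sum += input[j][t];
                max = Math.max(max, input[j][t]);
                count++;
            }
            if (count == 0) {
                throw new IllegalArgumentException("all time steps are masked");
            }
            switch (type) {
                case MAX: out[j] = max; break;
                case AVG: out[j] = sum / count; break;
                case SUM: out[j] = sum; break;
                default:  throw new IllegalStateException("unhandled type " + type);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {{1, 5, 100}, {2, 4, 100}}; // the last step is padding
        double[] mask = {1, 1, 0};
        double[] max = poolOverTime(input, mask, PoolingType.MAX);
        System.out.println(max[0] + " " + max[1]); // 5.0 4.0
    }
}
```

    Without the mask, the padded value 100 would dominate both MAX and AVG, which is why the mask has to be fed forward with variable-length inputs.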

    poolingDimensions

    Pooling type for global pooling

    poolingType

    • param poolingType Pooling type for global pooling

    collapseDimensions

    Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true. If true:

    • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

    • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

    • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

    If false:

    • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1]

    • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 4d output [miniBatchSize, channels, 1, 1]

    • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 5d output [miniBatchSize, channels, 1, 1, 1]

    pnorm

    P-norm constant. Only used if using PoolingType.PNORM for the pooling type

    • param pnorm P-norm constant

    hashtag
    LocalResponseNormalization

    Local response normalization layer. See section 3.3 of http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

    k

    LRN scaling constant k. Default: 2

    n

    Number of adjacent kernel maps to use when doing LRN. default: 5

    • param n Number of adjacent kernel maps

    alpha

    LRN scaling constant alpha. Default: 1e-4

    • param alpha Scaling constant

    beta

    Scaling constant beta. Default: 0.75

    • param beta Scaling constant

    cudnnAllowFallback

    When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception in CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for LocalResponseNormalization will be used

    • param allowFallback Whether fallback to non-CuDNN implementation should be used
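
    For orientation, the roles of k, n, alpha and beta can be seen in the normalization formula from section 3.3 of the referenced paper: b_i = a_i / (k + alpha * sum_j a_j^2)^beta, where the sum runs over the n channels adjacent to channel i. The plain-Java sketch below applies that formula to the channel values at one spatial position. LrnSketch is a hypothetical helper, and the DL4J implementation may differ in details such as how n is rounded or how the window is clipped at the edges.

```java
// Plain-Java illustration of the formula in the referenced paper; not part of the DL4J API.
final class LrnSketch {
    // a: activations of all channels at a single (x, y) position.
    static double[] normalize(double[] a, double k, int n, double alpha, double beta) {
        int channels = a.length;
        double[] b = new double[channels];
        for (int i = 0; i < channels; i++) {
            // Window of adjacent channels, clipped to the valid channel range.
            int from = Math.max(0, i - n / 2);
            int to = Math.min(channels - 1, i + n / 2);
            double sumSq = 0.0;
            for (int j = from; j <= to; j++) {
                sumSq += a[j] * a[j];
            }
            b[i] = a[i] / Math.pow(k + alpha * sumSq, beta);
        }
        return b;
    }

    public static void main(String[] args) {
        // Defaults listed above: k = 2, n = 5, alpha = 1e-4, beta = 0.75
        double[] out = normalize(new double[]{1.0, 1.0, 1.0}, 2.0, 5, 1e-4, 0.75);
        System.out.println(out[0]);
    }
}
```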

    hashtag
    LocallyConnected1D

    SameDiff version of a 1D locally connected layer.

    nIn

    Number of inputs to the layer (input size)

    nOut

    • param nOut Number of outputs (output size)

    activation

    • param activation Activation function for the layer

    kernelSize

    • param k Kernel size for the layer

    stride

    • param s Stride for the layer

    padding

    • param p Padding for the layer. Not used if ConvolutionMode.Same is set

    convolutionMode

    • param cm Convolution mode for the layer. See ConvolutionMode for details

    dilation

    • param d Dilation for the layer

    hasBias

    • param hasBias If true (default is false) the layer will have a bias

    setInputSize

    Set input filter size for this locally connected 1D layer

    • param inputSize height of the input filters

    • return Builder

    hashtag
    LocallyConnected2D

    SameDiff version of a 2D locally connected layer.

    setKernel

    • param kernel Kernel size for the layer. Must be 2 values (height/width)

    setStride

    • param stride Stride for the layer. Must be 2 values (height/width)

    setPadding

    • param padding Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

    setDilation

    • param dilation Dilation for the layer. Must be 2 values (height/width)

    nIn

    • param nIn Number of inputs to the layer (input size)

    nOut

    • param nOut Number of outputs (output size)

    activation

    • param activation Activation function for the layer

    kernelSize

    • param k Kernel size for the layer. Must be 2 values (height/width)

    stride

    • param s Stride for the layer. Must be 2 values (height/width)

    padding

    • param p Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

    convolutionMode

    • param cm Convolution mode for the layer. See ConvolutionMode for details

    dilation

    • param d Dilation for the layer. Must be 2 values (height/width)

    hasBias

    • param hasBias If true (default is false) the layer will have a bias

    setInputSize

    Set input filter size (h,w) for this locally connected 2D layer

    • param inputSize pair of height and width of the input filters to this layer

    • return Builder

    hashtag
    LossLayer

    LossLayer is a flexible output layer that applies a loss function to an input without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn’t supported - the output size is the same size as the input activations.

    nIn

    • param lossFunction Loss function for the loss layer

    hashtag
    OutputLayer

    Output layer used for training via backpropagation based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it contains a fully-connected layer (effectively contains a DenseLayer) internally. This allows the output size to be different to the layer input size.

    build

    • param lossFunction Loss function for the output layer

    hashtag
    Pooling1D

    Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

    hashtag
    Pooling2D

    Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

    hashtag
    Subsampling1DLayer

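    1D (temporal) subsampling layer, also known as a pooling layer. Expects input of shape [minibatch, nIn, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.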

    Supports the following pooling types: MAX, AVG, SUM, PNORM

    setKernelSize

    Kernel size

    • param kernelSize kernel size

    setStride

    Stride

    • param stride stride value

    setPadding

    Padding

    • param padding padding value

    hashtag
    Upsampling1D

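    Upsampling 1D layer. Repeats each step size times along the temporal/sequence axis (dimension 2). For input of shape [minibatch, channels, sequenceLength], the output has shape [minibatch, channels, size * sequenceLength]. Example: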

    size

    Upsampling size

    • param size upsampling size in single spatial dimension of this 1D layer

    size

    Upsampling size int array with a single element. Array must be length 1

    • param size upsampling size in single spatial dimension of this 1D layer

    hashtag
    Upsampling2D

    Upsampling 2D layer. Repeats each value (or rather, set of depth values) in the height and width dimensions by size[0] and size[1] times respectively.

    size

    Upsampling size int, used for both height and width

    • param size upsampling size in height and width dimensions

    size

    Upsampling size array

    • param size upsampling size in height and width dimensions
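
    The repetition performed by 2D upsampling can be sketched in plain Java for one example and one channel. Upsampling2DSketch is a hypothetical helper, not DL4J code; it only reproduces the "repeat each value" behaviour described for this layer.

```java
import java.util.Arrays;

// Plain-Java illustration only; not part of the DL4J API.
final class Upsampling2DSketch {
    // Repeats every value sizeH times along height and sizeW times along width.
    static double[][] upsample(double[][] input, int sizeH, int sizeW) {
        int h = input.length;
        int w = input[0].length;
        double[][] out = new double[h * sizeH][w * sizeW];
        for (int r = 0; r < h * sizeH; r++) {
            for (int c = 0; c < w * sizeW; c++) {
                out[r][c] = input[r / sizeH][c / sizeW]; // integer division maps back to the source cell
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{1, 2}, {3, 4}};
        for (double[] row : upsample(in, 2, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

    With size = [2, 2], a 2x2 slice becomes a 4x4 slice in which each original value fills a 2x2 block.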

    hashtag
    Upsampling3D

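    Upsampling 3D layer. Repeats each value (all channel values for each x/y/z location) by size[0], size[1] and size[2]. If the input has shape [minibatch, channels, depth, height, width], then the output has shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].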

    size

    Upsampling size as int, so same upsampling size is used for depth, width and height

    • param size upsampling size in height, width and depth dimensions

    size

    Upsampling size as an int array, so that separate upsampling sizes can be given for depth, height and width

    • param size upsampling size in height, width and depth dimensions

    hashtag
    ZeroPadding1DLayer

    Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for left and right.

    setPadding

    Padding value for left and right. Must be length 2 array

    build

    • param padding Padding for both the left and right

    hashtag
    ZeroPadding3DLayer

    Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.

    setPadding

    [padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]

    build

    • param padding Padding for both the left and right in all three spatial dimensions

    hashtag
    ZeroPaddingLayer

    Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right

    setPadding

    Padding value for top, bottom, left, and right. Must be length 4 array

    build

    • param padHeight Padding for both the top and bottom

    • param padWidth Padding for both the left and right

    hashtag
    ElementWiseMultiplicationLayer

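    Elementwise multiplication layer with weights: implements out = activationFn(input . w + b), where:

    • w is a learnable weight vector of length nOut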

    • “.” is element-wise multiplication

    • b is a bias vector

    Note that the input and output sizes of the element-wise layer are the same for this layer
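
    As a concrete reading of the description above, the layer computes out = activationFn(input . w + b) element by element. The following plain-Java sketch shows that computation for one example. ElementWiseMultiplicationSketch is a hypothetical class rather than DL4J code, and the activation function is passed in as an ordinary Java function.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Plain-Java illustration only; not part of the DL4J API.
final class ElementWiseMultiplicationSketch {
    // out[i] = activationFn(input[i] * w[i] + b[i]); input, w and b all have the same length.
    static double[] forward(double[] input, double[] w, double[] b, DoubleUnaryOperator activationFn) {
        if (input.length != w.length || input.length != b.length) {
            throw new IllegalArgumentException("input, w and b must have the same length");
        }
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = activationFn.applyAsDouble(input[i] * w[i] + b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] input = {1.0, 2.0, 3.0};
        double[] w = {2.0, 0.5, -1.0};
        double[] b = {0.0, 1.0, 1.0};
        // ReLU is used here purely as an example activation.
        System.out.println(Arrays.toString(forward(input, w, b, x -> Math.max(0.0, x)))); // [2.0, 2.0, 0.0]
    }
}
```

    Unlike a dense layer, no value is mixed with any other: the weight vector has nOut entries rather than nIn * nOut.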

    created by jingshu

    getMemoryReport

    This is a report of the estimated memory consumption for the given layer

    • param inputType Input type to the layer. Memory consumption is often a function of the input type

    • return Memory report for the layer

    hashtag
    RepeatVector

    RepeatVector layer configuration.

    RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which x is repeated n times.

    getRepetitionFactor

    Get repetition factor for RepeatVector layer

    setRepetitionFactor

    Set repetition factor for RepeatVector layer

    • param n repetition factor

    repetitionFactor

    Set repetition factor for RepeatVector layer

    • param n repetition factor

    hashtag
    Yolo2OutputLayer

    Output (loss) layer for YOLOv2 object detection model, based on the papers: YOLO9000: Better, Faster, Stronger - Redmon & Farhadi (2016) - https://arxiv.org/abs/1612.08242 and You Only Look Once: Unified, Real-Time Object Detection - Redmon et al. (2016) - http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf. This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.

    Note: Input activations to the Yolo2OutputLayer should have shape [minibatch, b*(5+c), H, W], where:

    • b = number of bounding boxes (determined by config - see papers for details)

    • c = number of classes

    • H = output/label height

    • W = output/label width

    Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have an output depth of b*(5+c). Thus, if you change the number of bounding boxes or the number of object classes, the number of channels (nOut of the last convolution layer) also needs to change.

    Label format: [minibatch, 4+C, H, W]. Order for labels depth: [x1,y1,x2,y2,(class labels)], where:

    • x1 = box top left position

    • y1 = as above, y axis

    • x2 = box bottom right position

    • y2 = as above, y axis

    Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left, (13,13) is bottom right.

    Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
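
    The shape arithmetic above is easy to get wrong when changing the number of boxes or classes, so a small plain-Java sketch may help. Yolo2ShapeSketch and its methods are hypothetical helpers, not DL4J code, and the 416-pixel image size in the example is an assumption chosen only so that it maps onto a 13x13 grid.

```java
// Plain-Java illustration only; not part of the DL4J API.
final class Yolo2ShapeSketch {
    // Required nOut of the last convolution layer: b * (5 + c).
    static int requiredInputDepth(int numBoxes, int numClasses) {
        return numBoxes * (5 + numClasses);
    }

    // Depth of the label array: 4 box coordinates plus one entry per class.
    static int labelDepth(int numClasses) {
        return 4 + numClasses;
    }

    // Converts a pixel coordinate into grid units, e.g. 0..13 for a 13x13 grid.
    static double toGridUnits(double pixel, int imageSize, int gridSize) {
        return pixel / imageSize * gridSize;
    }

    public static void main(String[] args) {
        System.out.println(requiredInputDepth(5, 20));   // 125
        System.out.println(labelDepth(20));              // 24
        System.out.println(toGridUnits(208.0, 416, 13)); // 6.5
    }
}
```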

    lambdaCoord

    Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5

    lambbaNoObj

    Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5

    • param lambdaNoObj Lambda value for no-object (confidence) component of the loss function

    lossPositionScale

    Loss function for position/scale component of the loss function

    • param lossPositionScale Loss function for position/scale

    lossClassPredictions

    Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper), however Loss MCXENT could also be used (which is more common for classification).

    • param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function

    boundingBoxPriors

    Bounding box prior dimensions [width, height]. For N bounding boxes, the input has shape [rows, columns] = [N, 2]. Note that dimensions should be specified as a fraction of grid size. For example, for a network with 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.

    • param boundingBoxes Bounding box prior dimensions (width, height)

    hashtag
    MaskLayer

    MaskLayer applies the mask array to the forward pass activations, and backward pass gradients, passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.

    hashtag
    MaskZeroLayer

    Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].

    param collapseDimensions Whether to collapse the dimensions or not

    http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
    https://arxiv.org/abs/1612.08242
    http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
    public ActivationLayer clone()
    public Builder activation(String activationFunction)
    public Builder activation(IActivation activationFunction)
    public Builder activation(Activation activation)
    public Builder hasBias(boolean hasBias)
    public Builder hasLayerNorm(boolean hasLayerNorm)
    public DropoutLayer build()
    public Builder hasBias(boolean hasBias)
    public Builder weightInit(EmbeddingInitializer embeddingInitializer)
    public Builder weightInit(INDArray vectors)
    public Builder hasBias(boolean hasBias)
    public Builder inputLength(int inputLength)
    public Builder inferInputLength(boolean inferInputLength)
    public Builder weightInit(EmbeddingInitializer embeddingInitializer)
    public Builder weightInit(INDArray vectors)
    public Builder poolingDimensions(int... poolingDimensions)
    public Builder poolingType(PoolingType poolingType)
    public Builder collapseDimensions(boolean collapseDimensions)
    public Builder pnorm(int pnorm)
    public Builder k(double k)
    public Builder n(double n)
    public Builder alpha(double alpha)
    public Builder beta(double beta)
    public Builder cudnnAllowFallback(boolean allowFallback)
    public Builder nIn(int nIn)
    public Builder nOut(int nOut)
    public Builder activation(Activation activation)
    public Builder kernelSize(int k)
    public Builder stride(int s)
    public Builder padding(int p)
    public Builder convolutionMode(ConvolutionMode cm)
    public Builder dilation(int d)
    public Builder hasBias(boolean hasBias)
    public Builder setInputSize(int inputSize)
    public void setKernel(int... kernel)
    public void setStride(int... stride)
    public void setPadding(int... padding)
    public void setDilation(int... dilation)
    public Builder nIn(int nIn)
    public Builder nOut(int nOut)
    public Builder activation(Activation activation)
    public Builder kernelSize(int... k)
    public Builder stride(int... s)
    public Builder padding(int... p)
    public Builder convolutionMode(ConvolutionMode cm)
    public Builder dilation(int... d)
    public Builder hasBias(boolean hasBias)
    public Builder setInputSize(int... inputSize)
    public Builder nIn(int nIn)
    public OutputLayer build()
    public void setKernelSize(int... kernelSize)
    public void setStride(int... stride)
    public void setPadding(int... padding)
    If input (for a single example, with channels down page, and sequence from left to right) is:
    [ A1, A2, A3]
    [ B1, B2, B3]
    Then output with size = 2 is:
    [ A1, A1, A2, A2, A3, A3]
    [ B1, B1, B2, B2, B3, B3]
    public Builder size(int size)
    public Builder size(int[] size)
    Input (slice for one example and channel)
    [ A, B ]
    [ C, D ]
    Size = [2, 2]
    Output (slice for one example and channel)
    [ A, A, B, B ]
    [ A, A, B, B ]
    [ C, C, D, D ]
    [ C, C, D, D ]
    public Builder size(int size)
    public Builder size(int[] size)
    public Builder size(int size)
    public Builder size(int[] size)
    public void setPadding(int... padding)
    public ZeroPadding1DLayer build()
    public void setPadding(int... padding)
    public ZeroPadding3DLayer build()
    public void setPadding(int... padding)
    public ZeroPaddingLayer build()
    public LayerMemoryReport getMemoryReport(InputType inputType)
    public int getRepetitionFactor()
    public void setRepetitionFactor(int n)
    public Builder repetitionFactor(int n)
    public Builder lambdaCoord(double lambdaCoord)
    public Builder lambbaNoObj(double lambdaNoObj)
    public Builder lossPositionScale(ILossFunction lossPositionScale)
    public Builder lossClassPredictions(ILossFunction lossClassPredictions)
    public Builder boundingBoxPriors(INDArray boundingBoxes)

    Word2vec/Glove/Doc2Vec

    Neural word embeddings for NLP in DL4J.

    hashtag
    Word2Vec, Doc2vec & GloVe: Neural Word Embeddings for Natural Language Processing

    Contents

    • Introduction

    hashtag
    Introduction to Word2Vec

    Word2vec is a two-layer neural net that processes text. Its input is a text corpus and its output is a set of vectors: feature vectors for words in that corpus. While Word2vec is not a deep neural network, it turns text into a numerical form that deep nets can understand.

    Word2vec's applications extend beyond parsing sentences in the wild. It can be applied just as well to genes, code, likes, playlists, social media graphs and other verbal or symbolic series in which patterns may be discerned.

    Why? Because words are simply discrete states like the other data mentioned above, and we are simply looking for the transitional probabilities between those states: the likelihood that they will co-occur. So gene2vec, like2vec and follower2vec are all possible. With that in mind, the tutorial below will help you understand how to create neural embeddings for any group of discrete and co-occurring states.

    The purpose and usefulness of Word2vec is to group the vectors of similar words together in vectorspace. That is, it detects similarities mathematically. Word2vec creates vectors that are distributed numerical representations of word features, features such as the context of individual words. It does so without human intervention.

    Given enough data, usage and contexts, Word2vec can make highly accurate guesses about a word’s meaning based on past appearances. Those guesses can be used to establish a word's association with other words (e.g. "man" is to "boy" what "woman" is to "girl"), or cluster documents and classify them by topic. Those clusters can form the basis of search, sentiment analysis and recommendations in such diverse fields as scientific research, legal discovery, e-commerce and customer relationship management.

    The output of the Word2vec neural net is a vocabulary in which each item has a vector attached to it, which can be fed into a deep-learning net or simply queried to detect relationships between words.

    Measuring cosine similarity, no similarity is expressed as a 90 degree angle, while total similarity of 1 is a 0 degree angle, complete overlap; i.e. Sweden equals Sweden, while Norway has a cosine similarity of 0.760124 to Sweden, the highest of any other country.
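
    To make the measure concrete, here is a minimal plain-Java sketch of cosine similarity between two vectors. It does not use the DL4J API; the class name and the toy vectors are hypothetical and chosen only for illustration.

```java
final class CosineSimilarityDemo {

    /** Cosine of the angle between two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.0};
        double[] y = {0.0, 1.0}; // at a 90 degree angle to x
        System.out.println(cosine(x, x)); // same direction: similarity 1
        System.out.println(cosine(x, y)); // orthogonal: similarity 0
    }
}
```

    Vectors pointing the same way score 1 and orthogonal vectors score 0, which matches the 0 degree and 90 degree cases described above.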

    Here's a list of words associated with "Sweden" using Word2vec, in order of proximity:

    The nations of Scandinavia and several wealthy, northern European, Germanic countries are among the top nine.

    hashtag
    Neural Word Embeddings

    The vectors we use to represent words are called neural word embeddings, and representations are strange. One thing describes another, even though those two things are radically different. As Elvis Costello said: "Writing about music is like dancing about architecture." Word2vec "vectorizes" about words, and by doing so it makes natural language computer-readable -- we can start to perform powerful mathematical operations on words to detect their similarities.

    So a neural word embedding represents a word with numbers. It's a simple, yet unlikely, translation.

    Word2vec is similar to an autoencoder, encoding each word in a vector, but rather than training against the input words through reconstruction, Word2vec trains words against other words that neighbor them in the input corpus.

    It does so in one of two ways, either using context to predict a target word (a method known as continuous bag of words, or CBOW), or using a word to predict a target context, which is called skip-gram. We use the latter method because it produces more accurate results on large datasets.

    When the feature vector assigned to a word cannot be used to accurately predict that word's context, the components of the vector are adjusted. Each word's context in the corpus is the teacher sending error signals back to adjust the feature vector. The vectors of words judged similar by their context are nudged closer together by adjusting the numbers in the vector.

    Just as Van Gogh's painting of sunflowers is a two-dimensional mixture of oil on canvas that represents vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.

    Those numbers locate each word as a point in 500-dimensional vectorspace. Spaces of more than three dimensions are difficult to visualize. (Geoff Hinton, teaching people to imagine 13-dimensional space, suggests that students first picture 3-dimensional space and then say to themselves: "Thirteen, thirteen, thirteen." :)

    A well trained set of word vectors will place similar words close to each other in that space. The words oak, elm and birch might cluster in one corner, while war, conflict and strife huddle together in another.

    Similar things and ideas are shown to be "close". Their relative meanings have been translated to measurable distances. Qualities become quantities, and algorithms can do their work. But similarity is just the basis of many associations that Word2vec can learn. For example, it can gauge relations between words of one language, and map them to another.

    These vectors are the basis of a more comprehensive geometry of words. As shown in the graph, capital cities such as Rome, Paris, Berlin and Beijing cluster near each other, and they will each have similar distances in vectorspace to their countries; i.e. Rome - Italy = Beijing - China. If you only knew that Rome was the capital of Italy, and were wondering about the capital of China, then the equation Rome - Italy + China would return Beijing. No kidding.
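
    The arithmetic can be sketched in plain Java. The two-dimensional vectors below are invented for illustration (real embeddings have hundreds of dimensions and are learned from a corpus), and the class and method names are hypothetical rather than part of DL4J.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AnalogyDemo {

    /** Hand-made toy vectors: first axis separates countries, second marks "capital". */
    static Map<String, double[]> toyVectors() {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("Italy", new double[] {1, 0});
        vectors.put("Rome", new double[] {1, 1});
        vectors.put("France", new double[] {2, 0});
        vectors.put("Paris", new double[] {2, 1});
        vectors.put("China", new double[] {3, 0});
        vectors.put("Beijing", new double[] {3, 1});
        return vectors;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the word closest (by cosine) to a - b + c, skipping the three query words. */
    static String analogy(Map<String, double[]> vectors, String a, String b, String c) {
        double[] va = vectors.get(a);
        double[] vb = vectors.get(b);
        double[] vc = vectors.get(c);
        double[] query = new double[va.length];
        for (int i = 0; i < query.length; i++) {
            query[i] = va[i] - vb[i] + vc[i];
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : vectors.entrySet()) {
            String word = entry.getKey();
            if (word.equals(a) || word.equals(b) || word.equals(c)) {
                continue;
            }
            double score = cosine(query, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Rome - Italy + China
        System.out.println(analogy(toyVectors(), "Rome", "Italy", "China"));
    }
}
```

    With these toy vectors the query Rome - Italy + China lands exactly on the vector for Beijing. In a trained model the match is approximate, which is why a list of nearest words is returned rather than a single answer.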

    hashtag
    Amusing Word2Vec Results

    Let's look at some other associations Word2vec can produce.

    Instead of the plus, minus and equals signs, we'll give you the results in the notation of logical analogies, where : means "is to" and :: means "as"; e.g. "Rome is to Italy as Beijing is to China" = Rome:Italy::Beijing:China. In the last spot, rather than supplying the "answer", we'll give you the list of words that a Word2vec model proposes, when given the first three elements:

    This model was trained on the Google News vocab, which you can import and play with. Contemplate, for a moment, that the Word2vec algorithm has never been taught a single rule of English syntax. It knows nothing about the world, and is unassociated with any rules-based symbolic logic or knowledge graph. And yet it learns more, in a flexible and automated fashion, than most knowledge graphs will learn after years of human labor. It comes to the Google News documents as a blank slate, and by the end of training, it can compute complex analogies that mean something to humans.

    You can also query a Word2vec model for other associations. Not everything has to be two analogies that mirror each other. (We explain how below.)

    • Geopolitics: Iraq - Violence = Jordan

    • Distinction: Human - Animal = Ethics

    • President - Power = Prime Minister

    By building a sense of one word's proximity to other similar words, which do not necessarily contain the same letters, we have moved beyond hard tokens to a smoother and more general sense of meaning.

    hashtag
    Just Give Me the Code

    hashtag
    Anatomy of Word2vec in DL4J

    Here are Deeplearning4j's natural-language processing components:

    • SentenceIterator/DocumentIterator: Used to iterate over a dataset. A SentenceIterator returns strings and a DocumentIterator works with inputstreams.

    • Tokenizer/TokenizerFactory: Used in tokenizing the text. In NLP terms, a sentence is represented as a series of tokens. A TokenizerFactory creates an instance of a tokenizer for a "sentence."

    • VocabCache: Used for tracking metadata including word counts, document occurrences, the set of tokens (not vocab in this case, but rather tokens that have occurred), vocab (the features included in both bag of words as well as the word vector lookup table)

    While Word2vec refers to a family of related algorithms, this implementation uses Negative Sampling.

    hashtag
    Word2Vec Setup

    Create a new project in IntelliJ using Maven. If you don't know how to do that, see our Quickstart page. Then specify these properties and dependencies in the pom.xml file in your project's root directory (you can check Maven for the most recent versions; please use those).

    hashtag
    Loading Data

    Now create and name a new class in Java. After that, you'll take the raw sentences in your .txt file, traverse them with your iterator, and subject them to some sort of preprocessing, such as converting all words to lowercase.

    If you want to load a text file besides the sentences provided in our example, you'd do this:

    That is, get rid of the ClassPathResource and feed the absolute path of your .txt file into the LineSentenceIterator.

    In bash, you can find the absolute file path of any directory by typing pwd in your command line from within that same directory. To that path, you'll add the file name and voila.

    hashtag
    Tokenizing the Data

    Word2vec needs to be fed words rather than whole sentences, so the next step is to tokenize the data. To tokenize a text is to break it up into its atomic units, creating a new token each time you hit a white space, for example.

    That should give you one word per line.
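
    As a rough illustration of what tokenizing does, the following plain-Java sketch splits a sentence on white space, lowercases each piece and strips punctuation. It is only loosely analogous to a tokenizer factory with a token preprocessor and is not DL4J's implementation; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WhitespaceTokenizerDemo {

    /** Splits on runs of white space, lowercases, and drops non-alphanumeric characters. */
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String raw : sentence.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line
        for (String token : tokenize("The quick, brown Fox!")) {
            System.out.println(token);
        }
    }
}
```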

    hashtag
    Training the Model

    Now that the data is ready, you can configure the Word2vec neural net and feed in the tokens.

    This configuration accepts a number of hyperparameters. A few require some explanation:

    • batchSize is the number of words you process at a time.

    • minWordFrequency is the minimum number of times a word must appear in the corpus. Here, if it appears fewer than 5 times, it is not learned. Words must appear in multiple contexts to learn useful features about them. In very large corpora, it's reasonable to raise the minimum.

    • useAdaGrad - AdaGrad adapts a separate learning rate for each feature. Here we are not concerned with that.
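
    The effect of minWordFrequency can be sketched in plain Java: count every token and keep only those that occur often enough. This is an illustration of the idea under simple assumptions, not the vocabulary construction code used by DL4J; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MinFrequencyDemo {

    /** Returns the tokens that occur at least minWordFrequency times, in sorted order. */
    static Set<String> vocabulary(List<String> tokens, int minWordFrequency) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        Set<String> vocab = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minWordFrequency) {
                vocab.add(entry.getKey());
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("day", "night", "day", "week", "day", "night");
        // "week" occurs only once, so it is dropped with a minimum of 2
        System.out.println(vocabulary(tokens, 2));
    }
}
```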

    hashtag
    Evaluating the Model, Using Word2vec

    The next step is to evaluate the quality of your feature vectors.

    The line vec.similarity("word1","word2") will return the cosine similarity of the two words you enter. The closer it is to 1, the more similar the net perceives those words to be (see the Sweden-Norway example above). For example:

    With vec.wordsNearest("word1", numWordsNearest), the words printed to the screen allow you to eyeball whether the net has clustered semantically similar words. You can set the number of nearest words you want with the second parameter of wordsNearest. For example:

    hashtag
    Saving, Reloading & Using the Model

    You'll want to save the model. The normal way to save models in Deeplearning4j is via the serialization utils (Java serialization is akin to Python pickling, converting an object into a series of bytes).

    This will save the vectors to a file called pathToSaveModel.txt that will appear in the root of the directory where Word2vec is trained. The output in the file should have one word per line, followed by a series of numbers that together are its vector representation.

    To keep working with the vectors, simply call methods on vec like this:

    The classic example of Word2vec's arithmetic of words is "king - queen = man - woman" and its logical extension "king - queen + woman = man".

    The example above will output the 10 nearest words to the vector king - queen + woman, which should include man. The first parameter for wordsNearest has to include the "positive" words king and woman, which have a + sign associated with them; the second parameter includes the "negative" word queen, which is associated with the minus sign (positive and negative here have no emotional connotation); the third is the length of the list of nearest words you would like to see. Remember to add this to the top of the file: import java.util.Arrays;.

    Any number of combinations is possible, but they will only return sensible results if the words you query occurred with enough frequency in the corpus. Obviously, the ability to return similar words (or documents) is at the foundation of both search and recommendation engines.

    You can reload the vectors into memory like this:

    You can then use Word2vec as a lookup table:

    If the word isn't in the vocabulary, Word2vec returns zeros.

    hashtag
    Importing Word2vec Models

    The Google News Corpus model we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.

    If you trained with the C vectors or Gensim, this line will import the model.

    Remember to add import java.io.File; to your imported packages.

    With large models, you may run into trouble with your heap space. The Google model may take as much as 10G of RAM, and the JVM only launches with 256 MB of RAM, so you have to adjust your heap space. You can do that either with a bash_profile file (see our Troubleshooting section), or through IntelliJ itself:

    hashtag
    N-grams & Skip-grams

    Words are read into the vector one at a time, and scanned back and forth within a certain range. Those ranges are n-grams, and an n-gram is a contiguous sequence of n items from a given linguistic sequence; it is the nth version of unigram, bigram, trigram, four-gram or five-gram. A skip-gram simply drops items from the n-gram.

    The skip-gram representation popularized by Mikolov and used in the DL4J implementation has proven to be more accurate than other models, such as continuous bag of words, due to the more generalizable contexts generated.

    This n-gram is then fed into a neural network to learn the significance of a given word vector; i.e. significance is defined as its usefulness as an indicator of certain larger meanings, or labels.
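
    The scanning "back and forth within a certain range" can be sketched as generating (target, context) pairs from a sentence. The plain-Java sketch below assumes a fixed symmetric window; actual Word2vec implementations typically also vary the window size at random per target word, which is omitted here. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SkipGramPairsDemo {

    /** Pairs every word with each neighbor at most windowSize positions away. */
    static List<String[]> pairs(List<String> sentence, int windowSize) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < sentence.size(); i++) {
            int from = Math.max(0, i - windowSize);
            int to = Math.min(sentence.size() - 1, i + windowSize);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    out.add(new String[] {sentence.get(i), sentence.get(j)});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("the", "quick", "brown", "fox");
        for (String[] pair : pairs(sentence, 1)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

    Under this simplification, a six-word sentence with a window of 5 yields 30 pairs, since every word sees all five of its neighbors.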

    hashtag
    A Working Example

    Please note: The code below may be outdated. For updated examples, please see our dl4j-examples repository on Github.

    Now that you have a basic idea of how to set up Word2Vec, here's one example of how it can be used with DL4J's API:

    After following the instructions in the Quickstart, you can open this example in IntelliJ and hit run to see it work. If you query the Word2vec model with a word that isn't contained in the training corpus, it will return null.

    hashtag
    Troubleshooting & Tuning Word2Vec

    Q: I get a lot of stack traces like this

    A: Look inside the directory where you started your Word2vec application. This can, for example, be an IntelliJ project home directory or the directory where you typed Java at the command line. It should have some directories that look like:

    You can shut down your Word2vec application and try to delete them.

    Q: Not all of the words from my raw text data are appearing in my Word2vec object…

    A: Try to raise the layer size via .layerSize() on your Word2Vec object like so

    Q: How do I load my data? Why does training take forever?

    A: If all of your sentences have been loaded as one sentence, Word2vec training could take a very long time. That's because Word2vec is a sentence-level algorithm, so sentence boundaries are very important: co-occurrence statistics are gathered sentence by sentence. (For GloVe, sentence boundaries don't matter, because it's looking at corpus-wide co-occurrence.) For many corpora, average sentence length is six words. That means that with a window size of 5 you have, say, 30 (random number here) rounds of skip-gram calculations. If you forget to specify your sentence boundaries, you may load a "sentence" that's 10,000 words long. In that case, Word2vec would attempt a full skip-gram cycle for the whole 10,000-word "sentence". In DL4J's implementation, a line is assumed to be a sentence. You need to plug in your own SentenceIterator and Tokenizer. By asking you to specify how your sentences end, DL4J remains language-agnostic. UimaSentenceIterator is one way to do that. It uses OpenNLP for sentence boundary detection.

    Q: Why is there such a difference in performance when feeding whole documents as one "sentence" vs splitting into Sentences?

    A: If the average sentence contains 6 words and the window size is 5, the theoretical maximum of 10 skip-gram rounds is reached for 0 words: the sentence isn't long enough to fill the whole window. Roughly 5 skip-gram rounds at most are available for each word in such a sentence.

    But if your "sentence" is 1000k words long, you'll have 10 skip-gram rounds for every word in it, excluding the first five and last five. So you'll spend far more time building the model, and the co-occurrence statistics will be skewed by the absence of sentence boundaries.

    Q: How does Word2Vec Use Memory?

    A: The major memory consumer in w2v is the weights matrix. The math is simple: NumberOfWords x NumberOfDimensions x 2 x DataType memory footprint.

    So, if you build a w2v model for 100k words using floats and 100 dimensions, your memory footprint will be 100k x 100 x 2 x 4 (float size) = 80MB of RAM just for the matrix, plus some space for strings, variables, threads, etc.

    If you load a pre-built model, it uses roughly half as much RAM as during build time, so that's 40MB of RAM.

    The most popular model used so far is the Google News model. It has 3M words and a vector size of 300. That gives us 3.6GB just to load the model. You also have to add 3M strings, which do not have a constant size in Java. So a loaded model usually takes around 4-6GB, depending on JVM version/supplier, GC state and phase of the moon.
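
    The arithmetic above can be checked with a few lines of plain Java. This sketch only reproduces the formula from the answer (words x dimensions x 2 x bytes per value during training, and half of that once loaded); it does not measure a real model, and the class and method names are hypothetical.

```java
final class Word2VecMemoryDemo {

    /** Bytes for the weight matrices while training: words x dims x 2 x bytes per value. */
    static long trainingBytes(long vocabSize, long dimensions, int bytesPerValue) {
        return vocabSize * dimensions * 2L * bytesPerValue;
    }

    /** Bytes for a loaded, pre-built model: roughly half of the training footprint. */
    static long loadedBytes(long vocabSize, long dimensions, int bytesPerValue) {
        return vocabSize * dimensions * bytesPerValue;
    }

    public static void main(String[] args) {
        int floatSize = 4; // bytes per 32-bit float
        System.out.println(trainingBytes(100_000, 100, floatSize)); // 100k words, 100 dims
        System.out.println(loadedBytes(100_000, 100, floatSize));
        System.out.println(loadedBytes(3_000_000, 300, floatSize)); // Google News sized model
    }
}
```

    The three printed values correspond to the 80MB, 40MB and 3.6GB figures quoted above, before strings and other JVM overhead are added.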

    Q: I did everything you said and the results still don't look right.

    A: Make sure you're not running into normalization issues. Some tasks, like wordsNearest(), use normalized weights by default, and others require non-normalized weights. Pay attention to this difference.

    hashtag
    Use Cases

    Google Scholar keeps a running tally of the papers citing Deeplearning4j's implementation of Word2vec.

    Kenny Helsens, a data scientist based in Belgium, applied Deeplearning4j's implementation of Word2vec to the NCBI's Online Mendelian Inheritance In Man (OMIM) database. He then looked for the words most similar to alk, a known oncogene of non-small cell lung carcinoma, and Word2vec returned: "nonsmall, carcinomas, carcinoma, mapdkd." From there, he established analogies between other cancer phenotypes and their genotypes. This is just one example of the associations Word2vec can learn on a large corpus. The potential for discovering new aspects of important diseases has only begun to be explored, and outside of medicine, the opportunities are equally diverse.

    Andreas Klintberg trained Deeplearning4j's implementation of Word2vec on Swedish, and wrote a thorough walkthrough on Medium.

    Word2Vec is especially useful in preparing text-based data for information retrieval and QA systems, which DL4J implements with deep autoencoders.

    Marketers might seek to establish relationships among products to build a recommendation engine. Investigators might analyze a social graph to surface members of a single group, or other relations they might have to location or financial sponsorship.

    hashtag
    Google's Word2vec Patent

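    Word2vec is a method of computing vector representations of words introduced by a team of researchers at Google led by Tomas Mikolov. Google hosts an open-source version of Word2vec released under an Apache 2.0 license. In 2014, Mikolov left Google for Facebook, and in May 2015, Google was granted a patent for the method, which does not abrogate the Apache license under which it has been released.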

    hashtag
    Foreign Languages

    While words in all languages may be converted into vectors with Word2vec, and those vectors learned with Deeplearning4j, NLP preprocessing can be very language specific, and requires tools beyond our libraries. The Stanford Natural Language Processing Group has a number of Java-based tools for tokenization, part-of-speech tagging and named-entity recognition for languages such as Mandarin Chinese, Arabic, French, German and Spanish. For Japanese, NLP tools like Kuromoji are useful. Other foreign-language resources, including text corpora, are available here.

    hashtag
    GloVe: Global Vectors

    Loading and saving GloVe models to word2vec can be done like so:

    hashtag
    Sequence Vectors

    Deeplearning4j has a class called SequenceVectors, which is one level of abstraction above word vectors, and which allows you to extract features from any sequence, including social media profiles, transactions, proteins, etc. If data can be described as a sequence, it can be learned via skip-gram and hierarchic softmax with the AbstractVectors class. This is compatible with the DeepWalk algorithm, also implemented in Deeplearning4j.

    hashtag
    Word2Vec Features on Deeplearning4j

    • Weights update after model serialization/deserialization was added. That is, you can update model state with, say, 200GB of new text by calling loadFullModel, adding TokenizerFactory and SentenceIterator to it, and calling fit() on the restored model.

    • Option for multiple datasources for vocab construction was added.

    hashtag
    Doc2vec & Other NLP Resources

    hashtag

    Library - Books = Hall

  • Analogy: Stock Market ≈ Thermometer

  • Inverted Index: Stores metadata about where words occurred. Can be used for understanding the dataset. A Lucene index with the Lucene implementation[1] is automatically created.

  • layerSize specifies the number of features in the word vector. This is equal to the number of dimensions in the featurespace. Words represented by 500 features become points in a 500-dimensional space.

  • learningRate is the step size for each update of the coefficients, as words are repositioned in the feature space.

  • minLearningRate is the floor on the learning rate. Learning rate decays as the number of words you train on decreases. If learning rate shrinks too much, the net's learning is no longer efficient. This keeps the coefficients moving.

  • iterate tells the net what batch of the dataset it's training on.

  • tokenizer feeds it the words from the current batch.

  • vec.fit() tells the configured net to begin training.

  • Epochs and Iterations can be specified separately, although they are both typically "1".

  • Word2Vec.Builder has this option: hugeModelExpected. If set to true, the vocab will be periodically truncated during the build.

  • While minWordFrequency is useful for ignoring rare words in the corpus, any number of words can be excluded to customize.

  • Two new WordVectorSerializer methods have been introduced: writeFullModel and loadFullModel. These save and load a full model state.

  • A decent workstation should be able to handle a vocab with a few million words. Deeplearning4j's Word2vec implementation can model a few terabytes of data on a single machine. Roughly, the math is: vectorSize * 4 * 3 * vocab.size().

  • word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method; Yoav Goldberg and Omer Levy

  • Neural Word Embeddings
    Amusing Word2vec Results
    Just Give Me the Code
    Anatomy of Word2Vec
    Setup, Load and Train
    A Code Example
    Troubleshooting & Tuning Word2Vec
    Word2vec Use Cases
    Foreign Languages
    GloVe (Global Vectors) & Doc2Vec
    DL4J Example of Text Classification With Word2vec & RNNs
    DL4J Example of Text Classification With Paragraph Vectors
    Doc2vec, or Paragraph Vectors, With Deeplearning4j
    Word2Vec in Literature
    king:queen::man:[woman, Attempted abduction, teenager, girl] 
    //Weird, but you can kind of see it
    
    China:Taiwan::Russia:[Ukraine, Moscow, Moldova, Armenia]
    //Two large countries and their small, estranged neighbors
    
    house:roof::castle:[dome, bell_tower, spire, crenellations, turrets]
    
    knee:leg::elbow:[forearm, arm, ulna_bone]
    
    New York Times:Sulzberger::Fox:[Murdoch, Chernin, Bancroft, Ailes]
    //The Sulzberger-Ochs family owns and runs the NYT.
    //The Murdoch family owns News Corp., which owns Fox News. 
    //Peter Chernin was News Corp.'s COO for 13 yrs.
    //Roger Ailes is president of Fox News. 
    //The Bancroft family sold the Wall St. Journal to News Corp.
    
    love:indifference::fear:[apathy, callousness, timidity, helplessness, inaction]
    //the poetry of this single array is simply amazing...
    
    Donald Trump:Republican::Barack Obama:[Democratic, GOP, Democrats, McCain]
    //It's interesting to note that, just as Obama and McCain were rivals,
    //so too, Word2vec thinks Trump has a rivalry with the idea Republican.
    
    monkey:human::dinosaur:[fossil, fossilized, Ice_Age_mammals, fossilization]
    //Humans are fossilized monkeys? Humans are what's left 
    //over from monkeys? Humans are the species that beat monkeys
    //just as Ice Age mammals beat dinosaurs? Plausible.
    
    building:architect::software:[programmer, SecurityCenter, WinPcap]
    String filePath = new ClassPathResource("raw_sentences.txt").getFile().getAbsolutePath();
    
    log.info("Load & Vectorize Sentences....");
    // Strip white space before and after for each line
    SentenceIterator iter = new BasicLineIterator(filePath);
    log.info("Load data....");
    SentenceIterator iter = new LineSentenceIterator(new File("/Users/cvn/Desktop/file.txt"));
    iter.setPreProcessor(new SentencePreProcessor() {
        @Override
        public String preProcess(String sentence) {
            return sentence.toLowerCase();
        }
    });
    SentenceIterator iter = new LineSentenceIterator(new File("/your/absolute/file/path/here.txt"));
    // Split on white spaces in the line to get words
    TokenizerFactory t = new DefaultTokenizerFactory();
    t.setTokenPreProcessor(new CommonPreprocessor());
    log.info("Building model....");
    Word2Vec vec = new Word2Vec.Builder()
            .minWordFrequency(5)
            .layerSize(100)
            .seed(42)
            .windowSize(5)
            .iterate(iter)
            .tokenizerFactory(t)
            .build();
    
    log.info("Fitting Word2Vec model....");
    vec.fit();
    // Write word vectors
    WordVectorSerializer.writeWordVectors(vec, "pathToWriteto.txt");
    
    log.info("Closest Words:");
    Collection<String> lst = vec.wordsNearest("day", 10);
    System.out.println(lst);
    //output: [night, week, year, game, season, during, office, until, -]
    double cosSim = vec.similarity("day", "night");
    System.out.println(cosSim);
    //output: 0.7704452276229858
    Collection<String> lst3 = vec.wordsNearest("man", 10);
    System.out.println(lst3);
    //output: [director, company, program, former, university, family, group, such, general]
    log.info("Save vectors....");
    WordVectorSerializer.writeWord2VecModel(vec, "pathToSaveModel.txt");
    Collection<String> kingList = vec.wordsNearest(Arrays.asList("king", "woman"), Arrays.asList("queen"), 10);
    Word2Vec word2Vec = WordVectorSerializer.readWord2VecModel("pathToSaveModel.txt");
    WeightLookupTable weightLookupTable = word2Vec.lookupTable();
    Iterator<INDArray> vectors = weightLookupTable.vectors();
    INDArray wordVectorMatrix = word2Vec.getWordVectorMatrix("myword");
    double[] wordVector = word2Vec.getWordVector("myword");
    File gModel = new File("/Developer/Vector Models/GoogleNews-vectors-negative300.bin.gz");
    Word2Vec vec = WordVectorSerializer.readWord2VecModel(gModel);
    //Click:
    IntelliJ Preferences > Compiler > Command Line Options 
    //Then paste:
    -Xms1024m
    -Xmx10g
    -XX:MaxPermSize=2g
    java.lang.StackOverflowError: null
    at java.lang.ref.Reference.<init>(Reference.java:254) ~[na:1.8.0_11]
    at java.lang.ref.WeakReference.<init>(WeakReference.java:69) ~[na:1.8.0_11]
    at java.io.ObjectStreamClass$WeakClassKey.<init>(ObjectStreamClass.java:2306) [na:1.8.0_11]
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:322) ~[na:1.8.0_11]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134) ~[na:1.8.0_11]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) ~[na:1.8.0_11]
    ehcache_auto_created2810726831714447871diskstore  
    ehcache_auto_created4727787669919058795diskstore
    ehcache_auto_created3883187579728988119diskstore  
    ehcache_auto_created9101229611634051478diskstore
    // Raise the number of features per word vector via layerSize()
    Word2Vec vec = new Word2Vec.Builder().layerSize(300).windowSize(5)
            .iterate(iter).tokenizerFactory(t).build();
    WordVectors wordVectors = WordVectorSerializer.loadTxtVectors(new File("glove.6B.50d.txt"));
    It's like numbers are language, like all the letters in the language are turned into numbers, and so it's something that everyone understands the same way. You lose the sounds of the letters and whether they click or pop or touch the palate, or go ooh or aah, and anything that can be misread or con you with its music or the pictures it puts in your mind, all of that is gone, along with the accent, and you have a new understanding entirely, a language of numbers, and everything becomes as clear to everyone as the writing on the wall. So as I say there comes a certain time for the reading of the numbers.
        -- E.L. Doctorow, Billy Bathgate
    Thought Vectors, Natural Language Processing & the Future of AI
    Quora: How Does Word2vec Work?
    Quora: What Are Some Interesting Word2Vec Results?
    Mikolov's Original Word2vec Code @Google
    word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method
    Advances in Pre-Training Distributed Word Representations - by Mikolov et al

    DataSet Iterators

    Data iteration tools for loading into neural networks.

    What is an iterator?

    A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data or automatic loading of common benchmarking datasets such as MNIST and Iris.
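
    To make the batching role concrete, here is a minimal sketch in plain Java. `MiniBatcher` is a hypothetical stand-in and not part of DL4J; it only shows how an iterator can group individual examples into fixed-size minibatches, with a smaller final batch when the data does not divide evenly. Real DL4J iterators return DataSet objects rather than lists.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only (not a DL4J class): groups examples into minibatches.
class MiniBatcher<T> implements Iterator<List<T>> {
    private final List<T> examples;
    private final int batchSize;
    private int cursor = 0;

    MiniBatcher(List<T> examples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.examples = examples;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return cursor < examples.size();
    }

    @Override
    public List<T> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // The last batch may be smaller than batchSize.
        int end = Math.min(cursor + batchSize, examples.size());
        List<T> batch = new ArrayList<>(examples.subList(cursor, end));
        cursor = end;
        return batch;
    }

    // Rewind to the first example so the data can be traversed again (a new epoch).
    void reset() {
        cursor = 0;
    }
}
```

    Five examples with a batch size of 2 come out as batches of 2, 2, and 1; calling `reset()` starts the next pass over the same data.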

    Usage

    For most use cases, initializing an iterator and passing it to the fit() method of a MultiLayerNetwork or ComputationGraph is all you need to begin training.

    Many other methods also accept iterators for tasks such as evaluation.

    Available iterators

    MnistDataSetIterator

    MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see

    UciSequenceDataSetIterator

    UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, and Downward shift.

    Details: Data: Image:

    UciSequenceDataSetIterator

    Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123

    • param batchSize Minibatch size

    Cifar10DataSetIterator

    Cifar10DataSetIterator is an iterator for the CIFAR-10 dataset - 10 classes, with 32x32 images with 3 channels (RGB).

    This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: .

    Cifar10DataSetIterator

    Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

    • param batchSize Minibatch size for the iterator

    IrisDataSetIterator

    IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes

    IrisDataSetIterator

    next

    IrisDataSetIterator handles traversing through the Iris Data Set.

    • see

    • param batch Batch size

    • param numExamples Total number of examples

    LFWDataSetIterator

    LFW iterator - Labeled Faces in the Wild dataset. 13233 images total, with 5749 classes.

    LFWDataSetIterator

    Create LFW data specific iterator

    • param batchSize the batch size of the examples

    • param numExamples the overall number of examples

    • param imgDim an array of height, width and channels

    TinyImageNetDataSetIterator

    Tiny ImageNet is a subset of the ImageNet database. Tiny ImageNet is the default course challenge for CS231n at Stanford University.

    Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.

    See: and

    TinyImageNetDataSetIterator

    Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

    • param batchSize Minibatch size for the iterator

    EmnistDataSetIterator

    EMNIST DataSetIterator

    • COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes

    • MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z

    • BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)

    See: and

    EmnistDataSetIterator

    The EMNIST dataset has multiple different subsets. See the EmnistDataSetIterator Javadoc for details.

    numExamplesTrain

    Create an EMNIST iterator with randomly shuffled data based on a specified RNG seed

    • param dataSet Dataset (subset) to return

    • param batchSize Batch size

    • param train If true: use training set. If false: use test set

    numExamplesTest

    Get the number of test examples for the specified subset

    • param dataSet Subset to get

    • return Number of examples for the specified subset

    numLabels

    Get the number of labels for the specified subset

    • param dataSet Subset to get

    • return Number of labels for the specified subset

    isBalanced

    Get the labels as a character array

    • return Labels

    RecordReaderDataSetIterator

    Takes a RecordReader as input and handles the conversion to DataSet objects, as well as producing minibatches from individual records.

    Example 1: Image classification, batch size 32, 10 classes

    Example 2: Multi-output regression from CSV, batch size 128

    RecordReaderDataSetIterator

    Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels(). Note that if RecordReader.getLabels() returns null, no output labels will be produced.

    • param recordReader Record reader to use as the source of data

    • param batchSize Minibatch size, for each call of .next()

    setCollectMetaData

    Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.

    • param recordReader RecordReader: provides the source of the data

    • param batchSize Batch size (number of examples) for the output DataSet objects

    • param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()
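
    The one-hot conversion described above can be sketched in plain Java. `OneHotSketch` and both of its methods are hypothetical names used for illustration, not DL4J APIs, and the sketch assumes that every column other than labelIndex is a feature.

```java
// Hypothetical helper, not a DL4J API: mimics the label handling described above.
class OneHotSketch {
    // Convert a class index in [0, numPossibleLabels-1] to a one-hot row.
    static double[] oneHot(int classIndex, int numPossibleLabels) {
        if (classIndex < 0 || classIndex >= numPossibleLabels) {
            throw new IllegalArgumentException(
                "classIndex must be between 0 and numPossibleLabels-1, got " + classIndex);
        }
        double[] out = new double[numPossibleLabels];
        out[classIndex] = 1.0;
        return out;
    }

    // Assumption for this sketch: every column except labelIndex is a feature.
    static double[] features(double[] record, int labelIndex) {
        if (labelIndex < 0 || labelIndex >= record.length) {
            throw new IllegalArgumentException("labelIndex out of range: " + labelIndex);
        }
        double[] out = new double[record.length - 1];
        int k = 0;
        for (int i = 0; i < record.length; i++) {
            if (i != labelIndex) {
                out[k++] = record[i];
            }
        }
        return out;
    }
}
```

    For a record with the class index 2 in its last column and 4 possible labels, `oneHot(2, 4)` yields the row 0, 0, 1, 0, and `features` returns the remaining columns unchanged.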

    loadFromMetaData

    Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List).

    • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

    • return DataSet with the specified example

    • throws IOException If an error occurs during loading of the data

    loadFromMetaData

    Load multiple examples to a DataSet, using the provided RecordMetaData instances.

    • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor

    • return DataSet with the specified examples

    • throws IOException If an error occurs during loading of the data

    writableConverter

    Builder class for RecordReaderDataSetIterator

    maxNumBatches

    Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.

    • param maxNumBatches Maximum number of minibatches per epoch / reset

    regression

    Use this for single output regression (i.e., 1 output/regression target)

    • param labelIndex Column index that contains the regression target (indexes start at 0)

    regression

    Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)

    • param labelIndexFrom Column index of the first regression target (indexes start at 0)

    • param labelIndexTo Column index of the last regression target (inclusive)
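
    The contiguous, inclusive label range can be illustrated with a short plain-Java sketch. `RegressionSplitSketch` is a hypothetical helper, not part of DL4J, and it assumes that every column outside the range labelIndexFrom to labelIndexTo is a feature.

```java
import java.util.Arrays;

// Hypothetical helper, not a DL4J API: splits one numeric row into features and targets.
class RegressionSplitSketch {
    // Columns labelIndexFrom..labelIndexTo (both inclusive) are the regression targets.
    static double[] labels(double[] row, int labelIndexFrom, int labelIndexTo) {
        return Arrays.copyOfRange(row, labelIndexFrom, labelIndexTo + 1);
    }

    // Assumption for this sketch: everything outside the label range is a feature.
    static double[] features(double[] row, int labelIndexFrom, int labelIndexTo) {
        double[] out = new double[row.length - (labelIndexTo - labelIndexFrom + 1)];
        int k = 0;
        for (int i = 0; i < row.length; i++) {
            if (i < labelIndexFrom || i > labelIndexTo) {
                out[k++] = row[i];
            }
        }
        return out;
    }
}
```

    With a five-column row and targets in columns 3 and 4, the labels are the last two values and the features are the first three. Because the range is a single interval, targets separated by a feature column cannot be expressed this way.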

    classification

    Use this for classification

    • param labelIndex Index of the column that contains the label (indexes start from 0). The column must contain integer values from 0 to numClasses-1

    • param numClasses Number of label classes (i.e., number of categories/classes in the dataset)

    preProcessor

    Optional arg. Allows the preprocessor to be set

    • param preProcessor Preprocessor to use

    collectMetaData

    When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

    • param collectMetaData Whether metadata should be collected or not

    RecordReaderMultiDataSetIterator

    The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator). A builder is used to specify the various inputs and subsets.

    RecordReaderMultiDataSetIterator

    When dealing with time series data of different lengths, how should we align the input/labels time series? For equal length: use EQUAL_LENGTH. For sequence classification: use ALIGN_END.
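
    As a rough illustration of what alignment means, the plain-Java sketch below computes a label mask over the time steps of a padded sequence. `AlignmentSketch` is hypothetical and not DL4J code; its enum mirrors the mode names mentioned above, plus a start-aligned variant added here only for contrast.

```java
// Hypothetical sketch, not DL4J code: shows where labels sit when sequences are padded.
class AlignmentSketch {
    enum Mode { EQUAL_LENGTH, ALIGN_START, ALIGN_END }

    // Returns a mask over featureLength time steps: 1 where a label exists, 0 where padded.
    static int[] labelMask(int labelLength, int featureLength, Mode mode) {
        if (labelLength < 0 || labelLength > featureLength) {
            throw new IllegalArgumentException("labelLength must be in [0, featureLength]");
        }
        if (mode == Mode.EQUAL_LENGTH && labelLength != featureLength) {
            throw new IllegalArgumentException("EQUAL_LENGTH requires equal lengths");
        }
        int[] mask = new int[featureLength];
        // ALIGN_END pushes the labels to the final time steps; otherwise they start at step 0.
        int start = (mode == Mode.ALIGN_END) ? featureLength - labelLength : 0;
        for (int t = start; t < start + labelLength; t++) {
            mask[t] = 1;
        }
        return mask;
    }
}
```

    For sequence classification with one label per four-step sequence, end alignment places that label at the last time step, which is where a recurrent network has seen the whole input.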

    loadFromMetaData

    Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List).

    • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

    • return DataSet with the specified example

    • throws IOException If an error occurs during loading of the data

    loadFromMetaData

    Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

    • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor

    • return DataSet with the specified examples

    • throws IOException If an error occurs during loading of the data

    SequenceRecordReaderDataSetIterator

    Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels), generate time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with different numbers of inputs vs. labels).

    SequenceRecordReaderDataSetIterator

    Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.

    • param featuresReader SequenceRecordReader for the features

    • param labels Labels: assume single value per time step, where values are integers in the range 0 to numPossibleLabels-1

    • param miniBatchSize Minibatch size for each call of next()

    hasNext

    Constructor where features and labels come from different RecordReaders (for example, different files)

    loadFromMetaData

    Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List).

    • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

    • return DataSet with the specified example

    • throws IOException If an error occurs during loading of the data

    loadFromMetaData

    Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

    • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor

    • return DataSet with the specified examples

    • throws IOException If an error occurs during loading of the data

    AsyncMultiDataSetIterator

    Async prefetching iterator wrapper for MultiDataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

    Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network.

    next

    We want to ensure that the background thread has the same thread-to-device affinity as the master thread.

    setPreProcessor

    Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

    • param preProcessor MultiDataSetPreProcessor. May be null.

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    shutdown

    We want to ensure that the background thread has the same thread-to-device affinity as the master thread.

    hasNext

    Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

    • return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

    Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

    • throws UnsupportedOperationException if the remove operation is not supported by this iterator

    • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    IteratorDataSetIterator

    Works on an Iterator of DataSet objects, combining and splitting the input DataSet objects as required to get the specified batch size.

    Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.
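
    The idea of reaching a specified batch size from inputs of arbitrary size can be sketched in plain Java. `RebatchingIterator` is a hypothetical illustration that works on lists rather than DataSet objects; it is not the DL4J implementation. Like the iterator described above, it offers no reset.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration, not a DL4J class: combines and splits incoming chunks
// so that each returned batch has exactly batchSize elements (the last may be smaller).
// Elements must be non-null because ArrayDeque rejects nulls.
class RebatchingIterator<T> implements Iterator<List<T>> {
    private final Iterator<List<T>> source;
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();

    RebatchingIterator(Iterator<List<T>> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Skip over empty chunks until something is buffered or the source runs dry.
        while (buffer.isEmpty() && source.hasNext()) {
            buffer.addAll(source.next());
        }
        return !buffer.isEmpty();
    }

    @Override
    public List<T> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Combine: pull more chunks while the buffer is short of a full batch.
        while (buffer.size() < batchSize && source.hasNext()) {
            buffer.addAll(source.next());
        }
        // Split: hand out at most batchSize elements and keep the rest buffered.
        List<T> batch = new ArrayList<>();
        while (batch.size() < batchSize && !buffer.isEmpty()) {
            batch.add(buffer.removeFirst());
        }
        return batch;
    }
}
```

    Input chunks of sizes 3, 1, and 4 with a batch size of 3 come out as batches of sizes 3, 3, and 2.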

    AsyncDataSetIterator

    Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

    Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network
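
    The prefetching mechanism can be sketched with a bounded queue and a background thread. `PrefetchingIterator` below is a hypothetical plain-Java illustration of the general technique, not the DL4J implementation: it omits workspaces, device affinity, reset, and shutdown handling, and it assumes the source never yields null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of async prefetching; not the DL4J implementation.
class PrefetchingIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // marks exhaustion of the source
    private final BlockingQueue<Object> queue;
    private Object nextItem; // null means "nothing fetched from the queue yet"

    PrefetchingIterator(Iterator<T> source, int queueSize) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        Thread worker = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for prefetching
        worker.start();
    }

    @Override
    public boolean hasNext() {
        if (nextItem == null) {
            try {
                nextItem = queue.take(); // waits until the worker has produced something
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return nextItem != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = (T) nextItem;
        nextItem = null;
        return item;
    }
}
```

    The queue size bounds how many minibatches are held in memory ahead of the consumer; the elements are delivered in the same order as the underlying iterator produces them.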

    AsyncDataSetIterator

    Create an Async iterator with the default queue size of 8

    • param baseIterator Underlying iterator to wrap and fetch asynchronously from

    next

    Create an Async iterator with the default queue size of 8

    • param iterator Underlying iterator to wrap and fetch asynchronously from

    • param queue Queue size - number of iterators to

    inputColumns

    Input columns for the dataset

    • return

    totalOutcomes

    The number of labels for the dataset

    • return

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    shutdown

    We want to ensure that the background thread has the same thread-to-device affinity as the master thread.

    batch

    Batch size

    • return

    setPreProcessor

    Set a pre processor

    • param preProcessor a pre processor to set

    getPreProcessor

    Returns preprocessors, if defined

    • return

    hasNext

    Get dataset iterator record reader labels

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

    Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

    • throws UnsupportedOperationException if the remove operation is not supported by this iterator

    • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    DoublesDataSetIterator

    First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

    DoublesDataSetIterator

    • param iterable Iterable to source data from

    • param batchSize Batch size for generated DataSet objects

    IteratorMultiDataSetIterator

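    Works on an Iterator of MultiDataSet objects, combining and splitting the input MultiDataSet objects as required to get a specified batch size.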

    Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

    SamplingDataSetIterator

    A wrapper for a dataset to sample from. This will randomly sample from the given dataset.

    SamplingDataSetIterator

    INDArrayDataSetIterator

    First value in pair is the features vector, second value in pair is the labels.

    INDArrayDataSetIterator

    • param iterable Iterable to source data from

    • param batchSize Batch size for generated DataSet objects

    WorkspacesShieldDataSetIterator

    This iterator detaches/migrates DataSets coming out of the backing DataSetIterator, thus providing “safe” DataSets. This is typically used for debugging and testing purposes, and should not generally be used by users.

    WorkspacesShieldDataSetIterator

    • param iterator The underlying iterator to detach values from

    MultiDataSetIteratorSplitter

    This iterator virtually splits the given MultiDataSetIterator into Train and Test parts. For example, suppose you have 100000 examples and your batch size is 32. That means you have 3125 total batches. A split ratio of 0.7 will give you 2187 training batches and 938 test batches.

    PLEASE NOTE: You can’t use Test iterator twice in a row. Train iterator should be used before Test iterator use. PLEASE NOTE: You can’t use this iterator, if underlying iterator uses randomization/shuffle between epochs.
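
    The batch arithmetic quoted above can be checked with a few lines of plain Java. This is a sketch under the assumption that the training share is truncated toward zero and the test share is the remainder, which reproduces the numbers in the text; `SplitMathSketch` and its methods are hypothetical names, not DL4J APIs.

```java
// Hypothetical helper, not part of DL4J: reproduces the split arithmetic from the text.
class SplitMathSketch {
    // Number of minibatches needed to cover all examples (the last batch may be partial).
    static long totalBatches(long numExamples, int batchSize) {
        return (numExamples + batchSize - 1) / batchSize;
    }

    // Assumption: the training share is truncated toward zero.
    static long trainBatches(long totalBatches, double ratio) {
        if (ratio <= 0.0 || ratio >= 1.0) {
            throw new IllegalArgumentException("ratio must satisfy 0.0 < ratio < 1.0");
        }
        return (long) (totalBatches * ratio);
    }

    // Whatever is not used for training is left for testing.
    static long testBatches(long totalBatches, double ratio) {
        return totalBatches - trainBatches(totalBatches, ratio);
    }
}
```

    With 100000 examples and a batch size of 32, `totalBatches` gives 3125; a ratio of 0.7 then yields 2187 training batches and 938 test batches.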

    MultiDataSetIteratorSplitter

    • param baseIterator

    • param totalBatches - total number of batches in underlying iterator. this value will be used to determine number of test/train batches

    • param ratio - this value will be used as the split ratio. It must be in the range 0.0 < X < 1.0. For example, if 0.7 is provided, then 70% of the total examples will be used for training and 30% for testing

    getTrainIterator

    This method returns train iterator instance

    • return

    next

    This method returns test iterator instance

    • return

    AsyncShieldDataSetIterator

    This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetch. This is mainly used for debugging purposes; generally, an iterator that isn’t safe to asynchronously prefetch from should return false from asyncSupported().

    AsyncShieldDataSetIterator

    • param iterator Iterator to wrap, to disable asynchronous prefetching for

    next

    Like the standard next method but allows a customizable number of examples returned

    • param num the number of examples

    • return the next data set

    inputColumns

    Input columns for the dataset

    • return

    totalOutcomes

    The number of labels for the dataset

    • return

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

    PLEASE NOTE: This iterator ALWAYS returns FALSE

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    batch

    Batch size

    • return

    setPreProcessor

    Set a pre processor

    • param preProcessor a pre processor to set

    getPreProcessor

    Returns preprocessors, if defined

    • return

    hasNext

    Get dataset iterator record reader labels

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

    Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

    • throws UnsupportedOperationException if the remove operation is not supported by this iterator

    • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    DummyBlockDataSetIterator

    This class provides baseline implementation of BlockDataSetIterator interface

    BaseDatasetIterator

    Baseline implementation includes control over the data fetcher and some basic getters for metadata

    AsyncShieldMultiDataSetIterator

    This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch

    next

    Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples

    • param num Number of examples to fetch

    setPreProcessor

    Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

    • param preProcessor MultiDataSetPreProcessor. May be null.

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

    PLEASE NOTE: This iterator ALWAYS returns FALSE

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    hasNext

    Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

    • return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

    Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

    • throws UnsupportedOperationException if the remove operation is not supported by this iterator

    • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    RandomMultiDataSetIterator

    RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

    RandomMultiDataSetIterator

    • param numMiniBatches Number of minibatches per epoch

    • param features Each triple in the list specifies the shape, array order and type of values for the features arrays

    • param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays

    addFeatures

    • param numMiniBatches Number of minibatches per epoch

    addFeatures

    Add a new features array to the iterator

    • param shape Shape of the features

    • param order Order (‘c’ or ‘f’) for the array

    • param values Values to fill the array with

    addLabels

    Add a new labels array to the iterator

    • param shape Shape of the labels

    • param values Values to fill the array with

    addLabels

    Add a new labels array to the iterator

    • param shape Shape of the labels

    • param order Order (‘c’ or ‘f’) for the array

    • param values Values to fill the array with

    generate

    Generate a random array with the specified shape

    • param shape Shape of the array

    • param values Values to fill the array with

    • return Random array of specified shape + contents

    generate

    Generate a random array with the specified shape and order

    • param shape Shape of the array

    • param order Order of array (‘c’ or ‘f’)

    • param values Values to fill the array with

    EarlyTerminationMultiDataSetIterator

    Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to the specified number of minibatches.

    EarlyTerminationMultiDataSetIterator

    Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

    • param underlyingIterator Iterator to wrap

    • param terminationPoint Number of minibatches after which hasNext() will return false
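
    The early-termination behaviour can be sketched as a thin wrapper around a plain java.util.Iterator. `EarlyStopIterator` is a hypothetical illustration, not the DL4J class; it counts calls to next() and reports no further elements once the termination point is reached, even if the underlying iterator still has data.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration, not the DL4J class: stops after a fixed number of next() calls.
class EarlyStopIterator<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private final int terminationPoint;
    private int minibatchesReturned = 0;

    EarlyStopIterator(Iterator<T> underlying, int terminationPoint) {
        if (terminationPoint <= 0) {
            throw new IllegalArgumentException("terminationPoint must be positive");
        }
        this.underlying = underlying;
        this.terminationPoint = terminationPoint;
    }

    @Override
    public boolean hasNext() {
        // False once enough minibatches were returned, or when the data itself runs out.
        return minibatchesReturned < terminationPoint && underlying.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        minibatchesReturned++;
        return underlying.next();
    }
}
```

    Wrapping a four-element iterator with a termination point of 2 returns the first two elements and then reports that nothing is left, which is convenient for quick smoke tests on a small slice of a large dataset.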

    ExistingDataSetIterator

    ExistingDataSetIterator

    Note that when using this constructor, resetting is not supported

    • param iterator Iterator to wrap

    next

    Note that when using this constructor, resetting is not supported

    • param iterator Iterator to wrap

    • param labels String labels. May be null.

    DummyBlockMultiDataSetIterator

    This class provides baseline implementation of BlockMultiDataSetIterator interface

    EarlyTerminationDataSetIterator

    Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to the specified number of minibatches.

    EarlyTerminationDataSetIterator

    Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

    • param underlyingIterator Iterator to wrap

    • param terminationPoint Number of minibatches after which hasNext() will return false

    ReconstructionDataSetIterator

    Wraps a data set iterator setting the first (feature matrix) as the labels.

    next

    Like the standard next method but allows a customizable number of examples returned

    • param num the number of examples

    • return the next data set

    inputColumns

    Input columns for the dataset

    • return

    totalOutcomes

    The number of labels for the dataset

    • return

    reset

    Resets the iterator back to the beginning

    batch

    Batch size

    • return

    hasNext

    Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

    • return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

    Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

    • throws UnsupportedOperationException if the remove operation is not supported by this iterator

    • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    DataSetIteratorSplitter

    This iterator virtually splits the given DataSetIterator into Train and Test parts. For example, suppose you have 100000 examples and your batch size is 32. That means you have 3125 total batches. A split ratio of 0.7 will give you 2187 training batches and 938 test batches.

    PLEASE NOTE: You can’t use Test iterator twice in a row. Train iterator should be used before Test iterator use. PLEASE NOTE: You can’t use this iterator, if underlying iterator uses randomization/shuffle between epochs.

    DataSetIteratorSplitter

    The only constructor

    • param baseIterator - iterator to be wrapped and split

    • param totalBatches - total batches in baseIterator

    • param ratio - train/test split ratio

    getTrainIterator

    This method returns train iterator instance

    • return

    next

    This method returns test iterator instance

    • return

    JointMultiDataSetIterator

    This dataset iterator combines multiple DataSetIterators into 1 MultiDataSetIterator. Values from each iterator are joined on a per-example basis - i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come either from one of the underlying DataSetIterators only (if ‘outcome’ is >= 0) or from all iterators (if ‘outcome’ is < 0).
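
    The role of the ‘outcome’ index can be shown with a small plain-Java sketch. `JointLabelSketch` is a hypothetical helper that operates on raw arrays rather than DataSet objects and is not the DL4J implementation; it only mirrors the selection rule described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the DL4J implementation.
class JointLabelSketch {
    // labelsPerIterator.get(i) is the label array produced by the i-th underlying
    // iterator for the current example.
    static List<double[]> selectLabels(List<double[]> labelsPerIterator, int outcome) {
        List<double[]> selected = new ArrayList<>();
        if (outcome < 0) {
            selected.addAll(labelsPerIterator); // use labels from every iterator
        } else {
            selected.add(labelsPerIterator.get(outcome)); // use labels from one iterator only
        }
        return selected;
    }
}
```

    With two underlying iterators, an outcome of -1 keeps both label arrays, while an outcome of 1 keeps only the labels of the second iterator.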

    JointMultiDataSetIterator

    • param iterators Underlying iterators to wrap

    next

    • param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet

    • param iterators Underlying iterators to wrap

    setPreProcessor

    Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

    • param preProcessor MultiDataSetPreProcessor. May be null.

    getPreProcessor

    Get the MultiDataSetPreProcessor, if one has previously been set. Returns null if no preprocessor has been set.

    • return Preprocessor

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators:

    (a) Iterators that store their full contents in memory already

    (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents)

    (c) Iterators that already implement some level of asynchronous prefetching

    (d) Iterators that may return different data depending on when the next() method is called

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    hasNext

    Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

    • return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

    PLEASE NOTE: This method is NOT implemented

    • throws UnsupportedOperationException if the remove operation is not supported by this iterator

    • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    • implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    FloatsDataSetIterator

    The first value in each pair is the features vector; the second value is the labels. Only 2d features/labels are supported.

    FloatsDataSetIterator

    • param iterable Iterable to source data from

    • param batchSize Batch size for generated DataSet objects
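
    To make the pair-to-batch idea concrete, here is a plain-Java sketch that groups (features, labels) pairs into 2d arrays. It is not the DL4J implementation: the class name is hypothetical, each pair is modelled as a two-element float[][] instead of a Pair object, and it assumes the final batch may be smaller than batchSize.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of batching (features, labels) pairs (not DL4J code).
final class PairBatchingSketch {

    /**
     * Groups pairs into batches. Each input pair is {features, labels}.
     * Each output batch is {featureRows, labelRows}, i.e. two 2d arrays.
     */
    static List<float[][][]> batch(List<float[][]> pairs, int batchSize) {
        List<float[][][]> batches = new ArrayList<>();
        for (int start = 0; start < pairs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, pairs.size()); // assumption: last batch may be short
            float[][] features = new float[end - start][];
            float[][] labels = new float[end - start][];
            for (int i = start; i < end; i++) {
                features[i - start] = pairs.get(i)[0];           // first value: features vector
                labels[i - start] = pairs.get(i)[1];             // second value: labels vector
            }
            batches.add(new float[][][] {features, labels});
        }
        return batches;
    }
}
```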

    FileSplitDataSetIterator

    A simple iterator that works with a list of files. File-to-DataSet conversion is handled by the provided FileCallback implementation.

    FileSplitDataSetIterator

    • param files List of files to iterate over

    • param callback Callback for loading the files

    MultipleEpochsIterator

    A dataset iterator for doing multiple passes over a dataset

    Note: use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead of this iterator.

    next

    Like the standard next method but allows a customizable number of examples returned

    • param num the number of examples

    • return the next data set

    inputColumns

    Input columns for the dataset

    • return

    totalOutcomes

    The number of labels for the dataset

    • return

    reset

    Resets the iterator back to the beginning

    batch

    Batch size

    • return

    hasNext

    Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

    • return true if the iteration has more elements

    remove

    Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

    • throws UnsupportedOperationException if the remove operation is not supported by this iterator

    • throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    MultiDataSetWrapperIterator

    This class is a simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly.

    PLEASE NOTE: This only works if number of features/labels/masks is 1

    MultiDataSetWrapperIterator

    • param iterator Underlying iterator to wrap

    RandomDataSetIterator

    RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

    RandomDataSetIterator

    • param numMiniBatches Number of minibatches per epoch

    • param featuresShape Features shape

    • param labelsShape Labels shape

    MultiDataSetIteratorAdapter

    Iterator that adapts a DataSetIterator to a MultiDataSetIterator

    • param numLabels the overall number of examples

    • param useSubset use a subset of the LFWDataSet

    • param labelGenerator path label generator to use

    • param train true if use train value

    • param splitTrainTest the percentage to split data for train and remainder goes to test

    • param imageTransform how to transform the image

    • param rng random number to lock in batch shuffling

    • LETTERS: 145,600 examples total. 26 balanced classes

    • DIGITS: 280,000 examples total. 10 balanced classes

    • param seed Random number generator seed

    • param numPossibleLabels Number of classes (possible labels) for classification

    • param numPossibleLabels Number of classes for the labels

    • return Random array of specified shape + contents

    • param featureValues Type of values for the features

    • param labelValues Type of values for the labels

    http://yann.lecun.com/exdb/mnist/
    https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
    https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data
    https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg
    https://pjreddie.com/projects/cifar-10-dataset-mirror/
    https://archive.ics.uci.edu/ml/datasets/Iris
    http://vis-www.cs.umass.edu/lfw/
    http://cs231n.stanford.edu/
    https://tiny-imagenet.herokuapp.com/
    https://www.nist.gov/itl/iad/image-group/emnist-dataset
    https://arxiv.org/abs/1702.05373
    MultiLayerNetwork model = new MultiLayerNetwork(conf);
    model.init();
    
    // pass an MNIST data iterator that automatically fetches data
    DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
    model.fit(mnistTrain);
    // passing directly to the neural network
    DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
    model.evaluate(mnistTest);
    
    // using an evaluation class
    Evaluation eval = new Evaluation(10); // create an evaluation object with 10 possible classes
    while (mnistTest.hasNext()) {
        DataSet next = mnistTest.next();
        INDArray output = model.output(next.getFeatures()); // get the network's prediction
        eval.eval(next.getLabels(), output); // check the prediction against the true class
    }
    public UciSequenceDataSetIterator(int batchSize)
    public Cifar10DataSetIterator(int batchSize)
    public IrisDataSetIterator()
    public DataSet next()
    public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
                        PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
                        ImageTransform imageTransform, Random rng)
    public TinyImageNetDataSetIterator(int batchSize)
    public EmnistDataSetIterator(Set dataSet, int batch, boolean train) throws IOException
    public static int numExamplesTrain(Set dataSet)
    public static int numExamplesTest(Set dataSet)
    public static int numLabels(Set dataSet)
    public static boolean isBalanced(Set dataSet)
    // rr is assumed to be an ImageRecordReader created earlier
    rr.initialize(new FileSplit(new File("/path/to/directory")));
    
    DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
        // Label index (first arg): always 1 when using ImageRecordReader. For CSV etc., use the index of the column
        // that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
        // at 0. Number of classes (second arg): number of label classes (e.g., 10 for MNIST, one per digit)
        .classification(1, nClasses)
        .preProcessor(new ImagePreProcessingScaler())      // normalizes image values from 0-255 to 0-1
        .build();
    // rr is assumed to be a RecordReader for CSV data created earlier
    rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));
    
    DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
        // Specify the columns that the regression labels/targets appear in. Note that all other columns will be
        // treated as features. Column indexes start at 0
        .regression(labelColFrom, labelColTo)
        .build();
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize)
    public void setCollectMetaData(boolean collectMetaData)
    public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException
    public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException
    public Builder writableConverter(WritableConverter converter)
    public Builder maxNumBatches(int maxNumBatches)
    public Builder regression(int labelIndex)
    public Builder regression(int labelIndexFrom, int labelIndexTo)
    public Builder classification(int labelIndex, int numClasses)
    public Builder preProcessor(DataSetPreProcessor preProcessor)
    public Builder collectMetaData(boolean collectMetaData)
    public RecordReaderMultiDataSetIterator build()
    public MultiDataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException
    public MultiDataSet loadFromMetaData(List<RecordMetaData> list) throws IOException
    public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader, SequenceRecordReader labels,
                        int miniBatchSize, int numPossibleLabels)
    public boolean hasNext()
    public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException
    public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException
    public MultiDataSet next(int num)
    public void setPreProcessor(MultiDataSetPreProcessor preProcessor)
    public boolean resetSupported()
    public boolean asyncSupported()
    public void reset()
    public void shutdown()
    public boolean hasNext()
    public MultiDataSet next()
    public void remove()
    public AsyncDataSetIterator(DataSetIterator baseIterator)
    public DataSet next(int num)
    public int inputColumns()
    public int totalOutcomes()
    public boolean resetSupported()
    public boolean asyncSupported()
    public void reset()
    public void shutdown()
    public int batch()
    public void setPreProcessor(DataSetPreProcessor preProcessor)
    public DataSetPreProcessor getPreProcessor()
    public boolean hasNext()
    public DataSet next()
    public void remove()
    public DoublesDataSetIterator(@NonNull Iterable<Pair<double[], double[]>> iterable, int batchSize)
    public SamplingDataSetIterator(DataSet sampleFrom, int batchSize, int totalNumberSamples)
    public INDArrayDataSetIterator(@NonNull Iterable<Pair<INDArray, INDArray>> iterable, int batchSize)
    public WorkspacesShieldDataSetIterator(@NonNull DataSetIterator iterator)
    public MultiDataSetIteratorSplitter(@NonNull MultiDataSetIterator baseIterator, long totalBatches, double ratio)
    public MultiDataSetIterator getTrainIterator()
    public MultiDataSet next(int num)
    public AsyncShieldDataSetIterator(@NonNull DataSetIterator iterator)
    public DataSet next(int num)
    public int inputColumns()
    public int totalOutcomes()
    public boolean resetSupported()
    public boolean asyncSupported()
    public void reset()
    public int batch()
    public void setPreProcessor(DataSetPreProcessor preProcessor)
    public DataSetPreProcessor getPreProcessor()
    public boolean hasNext()
    public DataSet next()
    public void remove()
    public MultiDataSet next(int num)
    public void setPreProcessor(MultiDataSetPreProcessor preProcessor)
    public boolean resetSupported()
    public boolean asyncSupported()
    public void reset()
    public boolean hasNext()
    public MultiDataSet next()
    public void remove()
    public RandomMultiDataSetIterator(int numMiniBatches, @NonNull List<Triple<long[], Character, Values>> features, @NonNull List<Triple<long[], Character, Values>> labels)
    public Builder addFeatures(long[] shape, Values values)
    public Builder addFeatures(long[] shape, char order, Values values)
    public Builder addLabels(long[] shape, Values values)
    public Builder addLabels(long[] shape, char order, Values values)
    public static INDArray generate(long[] shape, Values values)
    public static INDArray generate(long[] shape, char order, Values values)
    public EarlyTerminationMultiDataSetIterator(MultiDataSetIterator underlyingIterator, int terminationPoint)
    public ExistingDataSetIterator(@NonNull Iterator<DataSet> iterator)
    public DataSet next(int num)
    public EarlyTerminationDataSetIterator(DataSetIterator underlyingIterator, int terminationPoint)
    public DataSet next(int num)
    public int inputColumns()
    public int totalOutcomes()
    public void reset()
    public int batch()
    public boolean hasNext()
    public DataSet next()
    public void remove()
    public DataSetIteratorSplitter(@NonNull DataSetIterator baseIterator, long totalBatches, double ratio)
    public DataSetIterator getTrainIterator()
    public DataSet next(int i)
    public JointMultiDataSetIterator(DataSetIterator... iterators)
    public MultiDataSet next(int num)
    public void setPreProcessor(MultiDataSetPreProcessor preProcessor)
    public MultiDataSetPreProcessor getPreProcessor()
    public boolean resetSupported()
    public boolean asyncSupported()
    public void reset()
    public boolean hasNext()
    public MultiDataSet next()
    public void remove()
    public FloatsDataSetIterator(@NonNull Iterable<Pair<float[], float[]>> iterable, int batchSize)
    public FileSplitDataSetIterator(@NonNull List<File> files, @NonNull FileCallback callback)
    public DataSet next(int num)
    public int inputColumns()
    public int totalOutcomes()
    public void reset()
    public int batch()
    public boolean hasNext()
    public void remove()
    public MultiDataSetWrapperIterator(MultiDataSetIterator iterator)
    public RandomDataSetIterator(int numMiniBatches, long[] featuresShape, long[] labelsShape, Values featureValues, Values labelValues)

    Updaters/Optimizers

    Special algorithms for gradient descent.

    What are updaters?

    The main difference among the updaters is how they treat the learning rate. Stochastic Gradient Descent, the most common learning algorithm in deep learning, relies on Theta (the weights in hidden layers) and alpha (the learning rate). Different updaters help optimize the learning rate until the neural network converges on its most performant state.
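
    As a rough illustration of the roles of Theta and alpha, the following plain-Java sketch applies one vanilla SGD step to an array of weights. The class and method names are hypothetical and the code does not use the DL4J API; it only shows the arithmetic theta = theta - alpha * gradient.

```java
// Hypothetical plain-Java sketch of a vanilla SGD step (not the DL4J implementation).
final class SgdStepSketch {

    /** Updates theta in place: theta[i] = theta[i] - alpha * gradient[i]. */
    static void step(double[] theta, double[] gradient, double alpha) {
        for (int i = 0; i < theta.length; i++) {
            theta[i] -= alpha * gradient[i];
        }
    }

    public static void main(String[] args) {
        double[] theta = {1.0, -2.0};
        double[] gradient = {0.5, 0.25};
        step(theta, gradient, 0.5);
        System.out.println(theta[0] + ", " + theta[1]); // 0.75, -2.125
    }
}
```

    The other updaters on this page can be read as variations on this step that rescale or smooth the gradient before it is applied.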

    Usage

    To use an updater, pass a new instance of its configuration class (for example, new Adam(0.01)) to the updater() method when configuring either a ComputationGraph or MultiLayerNetwork.

    Available updaters

    NadamUpdater

    The Nadam updater.

    applyUpdater

    Calculate the update based on the given gradient

    • param gradient the gradient to get the update for

    • param iteration

    • return the gradient

    NesterovsUpdater

    Nesterov’s momentum. Keeps track of the previous update (the velocity) and uses it when computing the current update.

    applyUpdater

    Get the Nesterov update

    • param gradient the gradient to get the update for

    • param iteration

    • return
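
    One common way to write Nesterov momentum keeps a velocity per parameter and combines the previous and new velocity in the parameter step. The sketch below uses that formulation in plain Java. The names are hypothetical, and the exact form used inside DL4J may differ in sign conventions or in how the learning rate is folded in.

```java
// Hypothetical plain-Java sketch of a Nesterov momentum step (one common formulation).
final class NesterovStepSketch {

    /** Updates theta and velocity in place, given the gradient, learning rate lr and momentum mu. */
    static void step(double[] theta, double[] velocity, double[] gradient, double lr, double mu) {
        for (int i = 0; i < theta.length; i++) {
            double vPrev = velocity[i];
            velocity[i] = mu * vPrev - lr * gradient[i];          // standard momentum accumulation
            theta[i] += -mu * vPrev + (1.0 + mu) * velocity[i];   // look-ahead correction
        }
    }

    public static void main(String[] args) {
        double[] theta = {1.0};
        double[] velocity = {0.0};
        step(theta, velocity, new double[] {1.0}, 0.5, 0.5);
        System.out.println(theta[0]); // 0.25
    }
}
```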

    RmsPropUpdater

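    RMSProp updates. See http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf and http://cs231n.github.io/neural-networks-3/#ada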
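
    A minimal plain-Java sketch of the RMSProp rule follows, assuming the usual formulation: a decaying average of squared gradients divides the step. The names are hypothetical, and placing epsilon outside the square root is an assumption rather than a statement about the DL4J code.

```java
// Hypothetical plain-Java sketch of an RMSProp step (not the DL4J implementation).
final class RmsPropStepSketch {

    /** Updates theta and the squared-gradient cache in place. */
    static void step(double[] theta, double[] cache, double[] gradient,
                     double lr, double decay, double epsilon) {
        for (int i = 0; i < theta.length; i++) {
            // decaying average of squared gradients
            cache[i] = decay * cache[i] + (1.0 - decay) * gradient[i] * gradient[i];
            // step scaled by the root of that average (assumption: epsilon outside the sqrt)
            theta[i] -= lr * gradient[i] / (Math.sqrt(cache[i]) + epsilon);
        }
    }
}
```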

    AdaGradUpdater

    Vectorized Learning Rate used per Connection Weight

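    Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent See also: http://cs231n.github.io/neural-networks-3/#ada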

    applyUpdater

    Gets feature-specific learning rates. AdaGrad keeps a history of the gradients being passed in. Note that each gradient passed in becomes adapted over time, hence the name AdaGrad.

    • param gradient the gradient to get learning rates for

    • param iteration
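
    The per-weight learning rate can be sketched in plain Java as follows, assuming the standard AdaGrad rule in which each weight's step is divided by the root of its accumulated squared gradients. The names are hypothetical and the epsilon placement is an assumption.

```java
// Hypothetical plain-Java sketch of an AdaGrad step (not the DL4J implementation).
final class AdaGradStepSketch {

    /** Updates theta and the per-weight gradient history in place. */
    static void step(double[] theta, double[] history, double[] gradient, double lr, double epsilon) {
        for (int i = 0; i < theta.length; i++) {
            history[i] += gradient[i] * gradient[i];                            // accumulate squared gradients
            theta[i] -= lr * gradient[i] / (Math.sqrt(history[i]) + epsilon);   // per-weight effective rate
        }
    }
}
```

    Because the history only grows, the effective learning rate of each weight shrinks over time, which is the behavior AdaDelta and RMSProp were designed to soften.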

    AdaMaxUpdater

    The AdaMax updater, a variant of Adam.

    applyUpdater

    Calculate the update based on the given gradient

    • param gradient the gradient to get the update for

    • param iteration

    • return the gradient
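
    For comparison with Adam, here is a plain-Java sketch of the AdaMax rule as it is usually stated: the second-moment estimate is replaced by an exponentially weighted infinity norm. The class name is hypothetical, and the small epsilon guard against division by zero is an assumption added for this sketch.

```java
// Hypothetical plain-Java sketch of an AdaMax step (not the DL4J implementation).
final class AdaMaxStepSketch {
    private final double[] m; // moving average of gradients
    private final double[] u; // exponentially weighted infinity norm of gradients
    private int t = 0;        // number of steps taken so far

    AdaMaxStepSketch(int size) {
        m = new double[size];
        u = new double[size];
    }

    /** Updates theta in place using the internal state. */
    void step(double[] theta, double[] gradient, double lr, double beta1, double beta2, double epsilon) {
        t++;
        double correction = 1.0 - Math.pow(beta1, t); // bias correction for the first moment
        for (int i = 0; i < theta.length; i++) {
            m[i] = beta1 * m[i] + (1.0 - beta1) * gradient[i];
            u[i] = Math.max(beta2 * u[i], Math.abs(gradient[i]));
            theta[i] -= (lr / correction) * m[i] / (u[i] + epsilon);
        }
    }
}
```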

    NoOpUpdater

    NoOp updater: gradient updater that makes no changes to the gradient

    AdamUpdater

    The Adam updater.

    applyUpdater

    Calculate the update based on the given gradient

    • param gradient the gradient to get the update for

    • param iteration

    • return the gradient
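
    The Adam rule combines a moving average of gradients with a moving average of squared gradients, both bias-corrected. The plain-Java sketch below follows the formulation in the Adam paper. The class name is hypothetical and the code is independent of the DL4J classes.

```java
// Hypothetical plain-Java sketch of an Adam step (not the DL4J implementation).
final class AdamStepSketch {
    private final double[] m; // first-moment estimate (mean of gradients)
    private final double[] v; // second-moment estimate (mean of squared gradients)
    private int t = 0;        // number of steps taken so far

    AdamStepSketch(int size) {
        m = new double[size];
        v = new double[size];
    }

    /** Updates theta in place using the internal state. */
    void step(double[] theta, double[] gradient, double lr, double beta1, double beta2, double epsilon) {
        t++;
        for (int i = 0; i < theta.length; i++) {
            m[i] = beta1 * m[i] + (1.0 - beta1) * gradient[i];
            v[i] = beta2 * v[i] + (1.0 - beta2) * gradient[i] * gradient[i];
            double mHat = m[i] / (1.0 - Math.pow(beta1, t)); // bias-corrected first moment
            double vHat = v[i] / (1.0 - Math.pow(beta2, t)); // bias-corrected second moment
            theta[i] -= lr * mHat / (Math.sqrt(vHat) + epsilon);
        }
    }
}
```

    On the first step the bias corrections cancel the averaging factors, so the step size is close to lr regardless of the gradient's magnitude.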

    AdaDeltaUpdater

    AdaDelta updater. A more robust variant of AdaGrad that keeps track of a moving-window average of the gradient rather than relying on AdaGrad's ever-decaying learning rates.

    applyUpdater

    Get the updated gradient for the given gradient and also update the state of AdaDelta.

    • param gradient the gradient to get the updated gradient for

    • param iteration

    • return the updated gradient
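
    The moving-window idea can be sketched in plain Java, following the rule from the AdaDelta paper: two decaying averages are kept, one of squared gradients and one of squared updates, and their ratio sets the step size. The names are hypothetical and the code does not use DL4J.

```java
// Hypothetical plain-Java sketch of an AdaDelta step (not the DL4J implementation).
final class AdaDeltaStepSketch {

    /** Updates theta and both running averages in place. Note that no learning rate is needed. */
    static void step(double[] theta, double[] avgSqGrad, double[] avgSqUpdate,
                     double[] gradient, double rho, double epsilon) {
        for (int i = 0; i < theta.length; i++) {
            double g = gradient[i];
            avgSqGrad[i] = rho * avgSqGrad[i] + (1.0 - rho) * g * g;
            // ratio of RMS(previous updates) to RMS(gradients) scales the raw gradient
            double update = Math.sqrt(avgSqUpdate[i] + epsilon) / Math.sqrt(avgSqGrad[i] + epsilon) * g;
            avgSqUpdate[i] = rho * avgSqUpdate[i] + (1.0 - rho) * update * update;
            theta[i] -= update;
        }
    }
}
```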

    SgdUpdater

    SGD updater applies a learning rate only

    GradientUpdater

    Gradient modifications: Calculates an update and tracks related information for gradient changes over time for handling updates.

    AMSGradUpdater

    The AMSGrad updater. Reference: On the Convergence of Adam and Beyond - https://openreview.net/forum?id=ryQu7f-RZ
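
    What distinguishes AMSGrad from Adam is that the second-moment estimate used in the denominator is never allowed to decrease. The plain-Java sketch below shows that one change. The names are hypothetical, and bias correction is omitted for brevity, so this is a simplification rather than a description of the DL4J code.

```java
// Hypothetical plain-Java sketch of an AMSGrad step without bias correction (not DL4J code).
final class AmsGradStepSketch {

    /** Updates theta and the state arrays m, v and vHat in place. */
    static void step(double[] theta, double[] m, double[] v, double[] vHat, double[] gradient,
                     double lr, double beta1, double beta2, double epsilon) {
        for (int i = 0; i < theta.length; i++) {
            double g = gradient[i];
            m[i] = beta1 * m[i] + (1.0 - beta1) * g;
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g;
            vHat[i] = Math.max(vHat[i], v[i]);                  // running maximum: never decreases
            theta[i] -= lr * m[i] / (Math.sqrt(vHat[i]) + epsilon);
        }
    }
}
```

    After a large gradient followed by a small one, v shrinks while vHat keeps its earlier value, so the effective step size does not grow back.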

    https://arxiv.org/pdf/1609.04747.pdf
    http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
    http://cs231n.github.io/neural-networks-3/#ada
    http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent
    http://arxiv.org/abs/1412.6980
    http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
    https://arxiv.org/pdf/1212.5701v1.pdf
    https://openreview.net/forum?id=ryQu7f-RZ
    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Adam(0.01))
        .graphBuilder()
        // add your inputs, layers, outputs, and hyperparameters here
        .build();
    public void applyUpdater(INDArray gradient, int iteration, int epoch)
    public void applyUpdater(INDArray gradient, int iteration, int epoch)
    public void applyUpdater(INDArray gradient, int iteration, int epoch)
    public void applyUpdater(INDArray gradient, int iteration, int epoch)
    public void applyUpdater(INDArray gradient, int iteration, int epoch)
    public void applyUpdater(INDArray gradient, int iteration, int epoch)