Snippets and links for common functionality in Eclipse Deeplearning4j.
Quick reference
Deeplearning4j (and related projects) have a lot of functionality. The goal of this page is to summarize this functionality so users know what exists, and where to find more information.
Contents
DenseLayer - () - A simple/standard fully-connected layer
EmbeddingLayer - () - Takes positive integer indexes as input, outputs vectors. Only usable as first layer in a model. Mathematically equivalent (when bias is enabled) to DenseLayer with one-hot input, but more efficient. See also: EmbeddingSequenceLayer.
Output layers: usable only as the last layer in a network. Loss functions are set here.
OutputLayer - () - Output layer for standard classification/regression in MLPs/CNNs. Has a fully connected DenseLayer built in. 2d input/output (i.e., row vector per example).
LossLayer - () - Output layer without parameters - only loss function and activation function. 2d input/output (i.e., row vector per example). Unlike OutputLayer, restricted to nIn = nOut.
RnnOutputLayer - () - Output layer for recurrent neural networks. 3d (time series) input and output. Has time distributed fully connected layer built in.
ConvolutionLayer / Convolution2D - () - Standard 2d convolutional neural network layer. Inputs and outputs have 4 dimensions with shape [minibatch,depthIn,heightIn,widthIn] and [minibatch,depthOut,heightOut,widthOut] respectively.
Convolution1DLayer / Convolution1D - () - Standard 1d convolution layer
Convolution3DLayer / Convolution3D - () - Standard 3D convolution layer. Supports both NDHWC ("channels last") and NCDHW ("channels first") activation formats.
LSTM - () - LSTM RNN without peephole connections. Supports CuDNN.
GravesLSTM - () - LSTM RNN with peephole connections. Does not support CuDNN (thus for GPUs, LSTM should be used in preference).
GravesBidirectionalLSTM - () - A bidirectional LSTM implementation with peephole connections. Equivalent to Bidirectional(ADD, GravesLSTM). Due to the addition of the Bidirectional wrapper (below), it has been deprecated on master.
VariationalAutoencoder - () - A variational autoencoder implementation with MLP/dense layers for the encoder and decoder. Supports multiple different types of reconstruction distributions.
AutoEncoder - () - Standard denoising autoencoder layer
GlobalPoolingLayer - () - Implements both pooling over time (for RNNs/time series - input size [minibatch, size, timeSeriesLength], out [minibatch, size]) and global spatial pooling (for CNNs - input size [minibatch, depth, h, w], out [minibatch, depth]). Available pooling modes: sum, average, max and p-norm.
ActivationLayer - () - Applies an activation function (only) to the input activations. Note that most DL4J layers have activation functions built in as a config option.
DropoutLayer - () - Implements dropout as a separate/single layer. Note that most DL4J layers have a "built-in" dropout configuration option.
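As a sketch, several of the layers above can be combined in a single configuration. This assumes a recent DL4J version; the layer sizes, seed and hyperparameters are placeholders:

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(12345)
    .activation(Activation.RELU)    // default activation for all layers
    .list()
    // Fully-connected layer with built-in dropout (no separate DropoutLayer needed)
    .layer(0, new DenseLayer.Builder().nIn(784).nOut(256).dropOut(0.5).build())
    // Output layer: loss function is set here
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .nIn(256).nOut(10).activation(Activation.SOFTMAX).build())
    .build();
```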
Graph vertex: use with ComputationGraph. Similar to layers, vertices usually don't have any parameters, and may support multiple inputs.
ElementWiseVertex - () - Performs an element-wise operation on the inputs - add, subtract, product, average, max
L2NormalizeVertex - () - normalizes the input activations by dividing by the L2 norm for each example. i.e., out <- out / l2Norm(out)
L2Vertex - () - calculates the L2 distance between the two input arrays, for each example separately. Output is a single value for each example.
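As a sketch, an ElementWiseVertex can merge two branches in a ComputationGraphConfiguration. The layer names and sizes here are arbitrary placeholders:

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .graphBuilder()
    .addInputs("in1", "in2")
    .addLayer("d1", new DenseLayer.Builder().nIn(10).nOut(16).build(), "in1")
    .addLayer("d2", new DenseLayer.Builder().nIn(10).nOut(16).build(), "in2")
    // Element-wise addition of the two branch activations
    .addVertex("merge", new ElementWiseVertex(ElementWiseVertex.Op.Add), "d1", "d2")
    .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
        .nIn(16).nOut(3).activation(Activation.IDENTITY).build(), "merge")
    .setOutputs("out")
    .build();
```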
An InputPreProcessor is a simple class/interface that operates on the input to a layer. That is, a preprocessor is attached to a layer, and performs some operation on the input before passing it to the layer. Preprocessors also handle backpropagation - i.e., the preprocessing operations are generally differentiable.
Note that in many cases (such as the XtoYPreProcessor classes), users won't need to (and shouldn't) add these manually, and can instead just use .setInputType(InputType.feedForward(10)) or similar, which will infer and add the preprocessors as required.
CnnToFeedForwardPreProcessor - () - handles the activation reshaping necessary to transition from a CNN layer (ConvolutionLayer, SubsamplingLayer, etc) to DenseLayer/OutputLayer etc.
CnnToRnnPreProcessor - () - handles reshaping necessary to transition from a (effectively, time distributed) CNN layer to a RNN layer.
ComposableInputPreProcessor - () - simple class that allows multiple preprocessors to be chained + used on a single layer
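For example, rather than adding a CnnToFeedForwardPreProcessor manually, setInputType lets DL4J infer and add it automatically. A sketch (sizes are placeholders):

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .list()
    .layer(0, new ConvolutionLayer.Builder(5, 5).nIn(1).nOut(20).build())
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .nOut(10).activation(Activation.SOFTMAX).build())
    // Adds the CNN-to-dense preprocessor and infers nIn for the output layer
    .setInputType(InputType.convolutional(28, 28, 1))
    .build();
```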
IterationListener: can be attached to a model, and is called during training, once after every iteration (i.e., after each parameter update). TrainingListener: extends IterationListener, with a number of additional methods that are called at different stages of training - i.e., after the forward pass, after gradient calculation, at the start/end of each epoch, etc.
Neither type (iteration/training) is called outside of training (i.e., during output or feed-forward methods).
ScoreIterationListener - (, Javadoc) - Logs the loss function score every N training iterations
PerformanceListener - (, Javadoc) - Logs performance (examples per sec, minibatches per sec, ETL time), and optionally score, every N training iterations.
EvaluativeListener - (, Javadoc) - Evaluates network performance on a test set every N iterations or epochs. Also has a system for callbacks, to (for example) save the evaluation results.
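Attaching listeners is a one-liner. A sketch, assuming conf is an existing configuration:

```java
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
// Log the score every 10 iterations, and performance stats every 100 iterations
net.setListeners(new ScoreIterationListener(10), new PerformanceListener(100));
```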
ND4J has a number of classes for evaluating the performance of a network against a test set. Deeplearning4j (and SameDiff) use these ND4J evaluation classes. Different evaluation classes are suitable for different types of networks. Note: in 1.0.0-beta3 (November 2018), all evaluation classes were moved from DL4J to ND4J.
Evaluation - () - Used for the evaluation of multi-class classifiers (assumes standard one-hot labels, and softmax probability distribution over N classes for predictions). Calculates a number of metrics - accuracy, precision, recall, F1, F-beta, Matthews correlation coefficient, confusion matrix. Optionally calculates top N accuracy, custom binary decision thresholds, and cost arrays (for non-binary case). Typically used for softmax + mcxent/negative-log-likelihood networks.
EvaluationBinary - () - A multi-label binary version of the Evaluation class. Each network output is assumed to be a separate/independent binary class, with probability 0 to 1 independent of all other outputs. Typically used for sigmoid + binary cross entropy networks.
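A minimal evaluation sketch, assuming net and testIterator already exist:

```java
// Evaluate a multi-class classifier on a test set
Evaluation eval = net.evaluate(testIterator);
System.out.println(eval.stats());   // accuracy, precision, recall, F1, confusion matrix
```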
MultiLayerNetwork.save(File) and MultiLayerNetwork.load(File) methods can be used to save and load models. These use ModelSerializer internally. Similar save/load methods are also available for ComputationGraph.
MultiLayerNetwork and ComputationGraph can be saved using the ModelSerializer class - specifically the writeModel, restoreMultiLayerNetwork and restoreComputationGraph methods.
Networks can be trained further after saving and loading: however, be sure to load the 'updater' (i.e., the historical state for updaters like momentum). If no further training is required, the updater state can be omitted to save disk space and memory.
Most Normalizers (implementing the ND4J Normalizer interface) can also be added to a model using the addNormalizerToModel method.
Note that the format used for models in DL4J is .zip: it's possible to open/extract these files using programs supporting the zip format.
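A save/load sketch, assuming a recent DL4J version (the file name is a placeholder):

```java
File file = new File("myModel.zip");
boolean saveUpdater = true;   // keep updater state if further training is planned

// Via ModelSerializer:
ModelSerializer.writeModel(net, file, saveUpdater);
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(file, saveUpdater);

// Or, equivalently, via the convenience methods:
net.save(file, saveUpdater);
MultiLayerNetwork restored2 = MultiLayerNetwork.load(file, saveUpdater);
```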
This section lists the various configuration options that Deeplearning4j supports.
Activation functions can be defined in one of two ways: (a) By passing an enumeration value to the configuration - for example, .activation(Activation.TANH) (b) By passing an instance - for example, .activation(new ActivationSigmoid())
Note that Deeplearning4j supports custom activation functions, which can be defined by extending BaseActivationFunction.
List of supported activation functions:
ELU - () - Exponential linear unit ()
HARDSIGMOID - () - a piecewise linear version of the standard sigmoid activation function. f(x) = min(1, max(0, 0.2*x + 0.5))
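The HARDSIGMOID formula above is easy to verify in plain Java. This is just the math, not DL4J's implementation:

```java
public class HardSigmoidDemo {
    // Piecewise-linear approximation of the sigmoid: f(x) = min(1, max(0, 0.2*x + 0.5))
    static double hardSigmoid(double x) {
        return Math.min(1.0, Math.max(0.0, 0.2 * x + 0.5));
    }

    public static void main(String[] args) {
        System.out.println(hardSigmoid(0.0));   // 0.5, same as the standard sigmoid at 0
        System.out.println(hardSigmoid(3.0));   // 1.0 (saturated)
        System.out.println(hardSigmoid(-3.0));  // 0.0 (saturated)
    }
}
```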
Weight initialization refers to the method by which the initial parameters for a new network should be set.
Weight initializations are usually defined using the WeightInit enumeration.
Custom weight initializations can be specified using .weightInit(WeightInit.DISTRIBUTION).dist(new NormalDistribution(0, 1)), for example. As of master (but not the 0.9.1 release), .weightInit(new NormalDistribution(0, 1)) is also possible, and is equivalent to the previous approach.
Available weight initializations (note again that not all are available in the 0.9.1 release):
DISTRIBUTION: Sample weights from a provided distribution (specified via the dist configuration method)
ZERO: Generate weights as zeros
ONES: All weights are set to 1
An 'updater' in DL4J is a class that takes raw gradients and modifies them to become updates. These updates are then applied to the network parameters. The CS231n course notes have a good explanation of some of these updaters.
Supported updaters in Deeplearning4j:
AdaMax - () - A variant of the Adam updater.
All updaters that support a learning rate also support learning rate schedules (the Nesterov momentum updater also supports a momentum schedule). Learning rate schedules can be specified either based on the number of iterations, or the number of epochs that have elapsed. Dropout (see below) can also make use of the schedules listed here.
Configure using, for example: .updater(new Adam(new ExponentialSchedule(ScheduleType.ITERATION, 0.1, 0.99))). You can plot/inspect the learning rate that will be used at any point by calling ISchedule.valueAt(int iteration, int epoch) on the schedule object you have created.
Available schedules:
ExponentialSchedule - () - Implements value(i) = initialValue * gamma^i
InverseSchedule - () - Implements value(i) = initialValue * (1 + gamma * i)^(-power)
MapSchedule - () - Learning rate schedule based on a user-provided map. Note that the provided map must have a value for iteration/epoch 0. Has a builder class to conveniently define a schedule.
Note that custom schedules can be created by implementing the ISchedule interface.
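The schedule formulas above can be checked in plain Java. This reproduces only the math, not ND4J's ISchedule implementations:

```java
public class ScheduleDemo {
    // ExponentialSchedule: value(i) = initialValue * gamma^i
    static double exponential(double initialValue, double gamma, int i) {
        return initialValue * Math.pow(gamma, i);
    }

    // InverseSchedule: value(i) = initialValue * (1 + gamma * i)^(-power)
    static double inverse(double initialValue, double gamma, double power, int i) {
        return initialValue * Math.pow(1.0 + gamma * i, -power);
    }

    public static void main(String[] args) {
        // Learning rate 0.1 decayed with gamma = 0.99, as in the .updater(...) example above
        for (int i = 0; i <= 2; i++) {
            System.out.println(exponential(0.1, 0.99, i));
        }
    }
}
```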
L1 and L2 regularization can easily be added to a network via the configuration: .l1(0.1).l2(0.2). Note that on 0.9.1, .regularization(true) must also be enabled (this option was removed after the 0.9.1 release).
L1 and L2 regularization is applied by default on the weight parameters only. That is, .l1 and .l2 will not impact bias parameters - these can be regularized using .l1Bias(0.1).l2Bias(0.2).
All dropout types are applied at training time only. They are not applied at test time.
Dropout - () - Each input activation x is independently set to (0, with probability 1-p) or (x/p with probability p)
GaussianDropout - () - This is a multiplicative Gaussian noise (mean 1) on the input activations. Each input activation x is independently set to: x * y, where y ~ N(1, stdev = sqrt((1-rate)/rate))
Note that (as of current master - but not 0.9.1) the dropout parameters can also be specified according to any of the schedule classes mentioned in the Learning Rate Schedules section.
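The standard (inverted) dropout rule above can be illustrated in plain Java. This is a sketch of the math only, not DL4J's implementation:

```java
import java.util.Random;

public class DropoutDemo {
    // Inverted dropout: keep each activation with probability p and scale it by 1/p,
    // so the expected value of the output matches the input.
    static double[] dropout(double[] x, double p, Random rng) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = rng.nextDouble() < p ? x[i] / p : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] activations = {1.0, 2.0, 3.0, 4.0};
        double[] dropped = dropout(activations, 0.5, new Random(42));
        // With p = 0.5, each output is either 0.0 or exactly 2x the input
        for (double d : dropped) System.out.println(d);
    }
}
```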
As with dropout, dropconnect / weight noise is applied only at training time.
DropConnect - () - DropConnect is similar to dropout, but applied to the parameters of a network (instead of the input activations).
WeightNoise - () - Apply noise of the specified distribution to the weights at training time. Both additive and multiplicative modes are supported - when additive, noise should be mean 0, when multiplicative, noise should be mean 1
Constraints are deterministic limitations that are placed on a model's parameters at the end of each iteration (after the parameter update has occurred). They can be thought of as a type of regularization.
MaxNormConstraint - () - Constrain the maximum L2 norm of the incoming weights for each unit to be less than or equal to the specified value. If the L2 norm exceeds the specified value, the weights will be scaled down to satisfy the constraint.
MinMaxNormConstraint - () - Constrain the minimum AND maximum L2 norm of the incoming weights for each unit to be between the specified values. Weights will be scaled up/down if required.
NonNegativeConstraint - () - Constrain all parameters to be non-negative. Negative parameters will be replaced with 0.
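The max-norm rule can be sketched in plain Java. This is the math only, not DL4J's MaxNormConstraint implementation:

```java
public class MaxNormDemo {
    // If the L2 norm of a unit's incoming weights exceeds maxNorm, scale them down
    // so the norm equals maxNorm; otherwise leave them unchanged.
    static double[] applyMaxNorm(double[] weights, double maxNorm) {
        double sumSq = 0.0;
        for (double w : weights) sumSq += w * w;
        double norm = Math.sqrt(sumSq);
        if (norm <= maxNorm) return weights.clone();
        double[] scaled = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            scaled[i] = weights[i] * maxNorm / norm;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] w = {3.0, 4.0};             // L2 norm = 5
        double[] c = applyMaxNorm(w, 1.0);   // scaled down to norm 1 -> {0.6, 0.8}
        System.out.println(c[0] + " " + c[1]);
    }
}
```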
DataSetIterator is an abstraction that DL4J uses to iterate over minibatches of data, used for training. DataSetIterator returns DataSet objects, which are minibatches, and support a maximum of 1 input and 1 output array (INDArray).
MultiDataSetIterator is similar to DataSetIterator, but returns MultiDataSet objects, which can have as many input and output arrays as required for the network.
These iterators download their data as required. The actual datasets they return are not customizable.
MnistDataSetIterator - () - DataSetIterator for the well-known MNIST digits dataset. By default, returns a row vector (1x784), with values normalized to the 0 to 1 range. Use .setInputType(InputType.convolutionalFlat(28,28,1)) to use with CNNs.
EmnistDataSetIterator - () - Similar to the MNIST digits dataset, but with more examples, and also letters. Includes multiple different splits (letters only, digits only, letters + digits, etc). Same 1x784 format as MNIST, hence (other than the different number of labels for some splits) it can be used as a drop-in replacement for MnistDataSetIterator.
The iterators in this subsection are used with user-provided data.
RecordReaderDataSetIterator - () - an iterator that takes a DataVec record reader (such as CsvRecordReader or ImageRecordReader) and handles conversion to DataSets, batching, masking, etc. One of the most commonly used iterators in DL4J. Handles non-sequence data only as input (i.e., RecordReader, not SequenceRecordReader).
RecordReaderMultiDataSetIterator - () - the MultiDataSet version of RecordReaderDataSetIterator, that supports multiple readers. Has a builder pattern for creating more complex data pipelines (such as different subsets of a reader's output to different input/output arrays, conversion to one-hot, etc). Handles both sequence and non-sequence data as input.
SequenceRecordReaderDataSetIterator - () - the sequence (SequenceRecordReader) version of RecordReaderDataSetIterator.
MultiDataSetIteratorAdapter - () - Wrap a DataSetIterator to convert it to a MultiDataSetIterator
SingletonMultiDataSetIterator - () - Wrap a MultiDataSet into a MultiDataSetIterator that returns one MultiDataSet (i.e., the wrapped MultiDataSet is not split up)
AsyncDataSetIterator - () - Used automatically by MultiLayerNetwork and ComputationGraph where appropriate. Implements asynchronous prefetching of datasets to improve performance.
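A typical CSV pipeline sketch using RecordReaderDataSetIterator. The file name, column indices and batch size are placeholders:

```java
// CSV with features in columns 0..3 and a class label (0..2) in column 4
RecordReader rr = new CSVRecordReader();
rr.initialize(new FileSplit(new File("data.csv")));

int labelIndex = 4, numClasses = 3, batchSize = 32;
DataSetIterator iter = new RecordReaderDataSetIterator(rr, batchSize, labelIndex, numClasses);
```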
ND4J provides a number of classes for performing data normalization. These are implemented as DataSetPreProcessors. The basic pattern for normalization:
Create your (unnormalized) DataSetIterator or MultiDataSetIterator: DataSetIterator myTrainData = ...
Create the normalizer you want to use: NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler();
Fit the normalizer: normalizer.fit(myTrainData)
In general, you should fit only on the training data, and do trainData.setPreProcessor(normalizer) and testData.setPreProcessor(normalizer) with the same/single normalizer that has been fit on the training data only.
Note that where appropriate (NormalizerStandardize, NormalizerMinMaxScaler) statistics such as mean/standard-deviation/min/max are shared across time (for time series) and across image x/y locations (but not depth/channels - for image data).
Data normalization example:
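A sketch following the pattern above, using MnistDataSetIterator as a stand-in for your own iterators:

```java
DataSetIterator trainData = new MnistDataSetIterator(32, true, 12345);
DataSetIterator testData = new MnistDataSetIterator(32, false, 12345);

NormalizerStandardize normalizer = new NormalizerStandardize();
normalizer.fit(trainData);              // collect statistics from the training data only
trainData.setPreProcessor(normalizer);
testData.setPreProcessor(normalizer);   // same normalizer, fit on training data only
```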
Available normalizers: DataSet / DataSetIterator
ImagePreProcessingScaler - () - Applies min-max scaling to image activations. The default settings map 0-255 input to 0-1 output (configurable). Note that unlike the other normalizers here, this one does not rely on statistics (mean/min/max etc) collected from the data, hence the normalizer.fit(trainData) step is unnecessary (it is a no-op).
NormalizerStandardize - () - normalizes each feature value independently (and optionally label values) to have 0 mean and a standard deviation of 1
Available normalizers: MultiDataSet / MultiDataSetIterator
ImageMultiPreProcessingScaler - () - A MultiDataSet/MultiDataSetIterator version of ImagePreProcessingScaler
MultiNormalizerStandardize - () - MultiDataSet/MultiDataSetIterator version of NormalizerStandardize
MultiNormalizerMinMaxScaler - () - MultiDataSet/MultiDataSetIterator version of NormalizerMinMaxScaler
Deeplearning4j has classes/utilities for performing transfer learning - i.e., taking an existing network, and modifying some of the layers (optionally freezing others so their parameters don't change). For example, an image classifier could be trained on ImageNet, then applied to a new/different dataset. Both MultiLayerNetwork and ComputationGraph can be used with transfer learning - frequently starting from a pre-trained model from the model zoo (see next section), though any MultiLayerNetwork/ComputationGraph can be used.
The main class for transfer learning is TransferLearning. This class has a builder pattern that can be used to add/remove layers, freeze layers, etc. FineTuneConfiguration can be used here to specify the learning rate and other settings for the non-frozen layers.
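A transfer-learning sketch. The pretrained model path, frozen-layer index and output sizes are placeholders:

```java
// Load an existing model (hypothetical file name)
MultiLayerNetwork pretrained = MultiLayerNetwork.load(new File("pretrained.zip"), false);

// Settings for the layers that remain trainable
FineTuneConfiguration fineTune = new FineTuneConfiguration.Builder()
    .updater(new Adam(1e-4))
    .seed(12345)
    .build();

MultiLayerNetwork modified = new TransferLearning.Builder(pretrained)
    .fineTuneConfiguration(fineTune)
    .setFeatureExtractor(1)    // freeze layers 0..1 so their parameters don't change
    .removeOutputLayer()
    .addLayer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .nIn(256).nOut(5)      // nIn must match the previous layer's nOut
        .activation(Activation.SOFTMAX).build())
    .build();
```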
Deeplearning4j provides a 'model zoo' - a set of pretrained models that can be downloaded and used either as-is (for image classification, for example) or often for transfer learning.
Models available in DL4J's model zoo:
Note: Trained Keras models (not provided by DL4J) may also be imported, using Deeplearning4j's Keras model import functionality.
Cheat sheet code snippets
The Eclipse Deeplearning4j libraries come with a lot of functionality, and we've put together this cheat sheet to help users assemble neural networks and use tensors faster.
Neural networks
Code for configuring common parameters and layers for both MultiLayerNetwork and ComputationGraph. See the MultiLayerNetwork and ComputationGraph Javadoc for the full API.
Sequential networks
Most network configurations can use the MultiLayerNetwork class if they are sequential and simple.
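A minimal sequential-network sketch. trainIter is assumed to be an existing DataSetIterator, and the sizes are placeholders:

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(1e-3))
    .list()
    .layer(0, new DenseLayer.Builder().nIn(784).nOut(128).activation(Activation.RELU).build())
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .nIn(128).nOut(10).activation(Activation.SOFTMAX).build())
    .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
for (int epoch = 0; epoch < 5; epoch++) {
    net.fit(trainIter);   // one pass over the training data per call
}
```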
Complex networks
Networks that have complex graphs and "branching" such as Inception need to use ComputationGraph.
The code snippet below creates a basic pipeline that loads images from disk, applies random transformations, and fits them to a neural network. It also sets up a UI instance so you can visualize progress, and uses early stopping to terminate training early. You can adapt this pipeline for many different use cases.
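A condensed sketch of such a pipeline, assuming the deeplearning4j-ui module is on the classpath; the directory paths, image dimensions and label count are placeholders:

```java
// Load images from a directory tree where subdirectory names are the labels
File parentDir = new File("/path/to/images");   // hypothetical path
FileSplit filesInDir = new FileSplit(parentDir, NativeImageLoader.ALLOWED_FORMATS, new Random(42));
ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
ImageRecordReader recordReader = new ImageRecordReader(64, 64, 3, labelMaker);
recordReader.initialize(filesInDir, new FlipImageTransform(new Random(42)));  // random transform

int numLabels = 2;   // placeholder
DataSetIterator trainIter = new RecordReaderDataSetIterator(recordReader, 32, 1, numLabels);

// UI instance for visualizing training progress
UIServer uiServer = UIServer.getInstance();
StatsStorage statsStorage = new InMemoryStatsStorage();
uiServer.attach(statsStorage);
net.setListeners(new StatsListener(statsStorage));

// Early stopping: terminate training based on the test-set loss
EarlyStoppingConfiguration<MultiLayerNetwork> esConf = new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
    .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
    .scoreCalculator(new DataSetLossCalculator(testIter, true))
    .modelSaver(new LocalFileModelSaver("/path/to/save"))
    .build();
new EarlyStoppingTrainer(esConf, conf, trainIter).fit();
```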
Complex Transformation
DataVec comes with a portable TransformProcess class that allows for more complex data wrangling and data conversion. It works well with both 2D and sequence datasets.
We recommend having a look at the DataVec examples before creating more complex transformations.
Both MultiLayerNetwork and ComputationGraph come with built-in .eval() methods that allow you to pass a dataset iterator and return evaluation results.
For advanced evaluation, the code snippet below can be adapted into training pipelines. This is useful when the built-in neuralNetwork.eval() method outputs confusing results, or when you need to examine the raw predictions.
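A manual evaluation loop sketch; net, testIter and numClasses are assumed to exist:

```java
Evaluation eval = new Evaluation(numClasses);
testIter.reset();
while (testIter.hasNext()) {
    DataSet ds = testIter.next();
    // train = false: inference mode (no dropout, etc.)
    INDArray predicted = net.output(ds.getFeatures(), false);
    eval.eval(ds.getLabels(), predicted);   // inspect the raw predictions here if needed
}
System.out.println(eval.stats());
```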