AlexNet
DL4J's AlexNet model interpretation, based on the original paper ImageNet Classification with Deep Convolutional Neural Networks and the referenced imagenetExample code. References: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt
The model is built in DL4J based on available functionality, and notes indicate where there are gaps awaiting enhancements.
Bias initialization in the paper is 1 in certain layers but 0.1 in the imagenetExample code. Weight distribution uses 0.1 std for all layers in the paper but 0.005 in the dense layers in the imagenetExample code.
Darknet19
Reference: https://arxiv.org/pdf/1612.08242.pdf
ImageNet weights for this model are available and have been converted from https://pjreddie.com/darknet/imagenet/ using https://github.com/allanzelener/YAD2K .
There are 2 pretrained models, one for 224x224 images and one fine-tuned for 448x448 images. Call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization. The channels of the input images need to be in RGB order (not BGR), with values normalized within [0, 1]. The output labels are as per https://github.com/pjreddie/darknet/blob/master/data/imagenet.shortnames.list .
A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832 Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf
A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832 Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf
LeNet was an early promising achiever on the MNIST dataset. References:
MNIST weights for this model are available and have been converted from https://github.com/f00-/mnist-lenet-keras.
Implementation of NASNet-A in Deeplearning4j. NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest.
This implementation uses 1056 penultimate filters and an input shape of (3, 224, 224). You can change this.
Paper: https://arxiv.org/abs/1707.07012 ImageNet weights for this model are available and have been converted from https://keras.io/applications/.
Residual networks for deep learning.
Paper: https://arxiv.org/abs/1512.03385 ImageNet weights for this model are available and have been converted from https://keras.io/applications/.
A simple convolutional network for generic image classification. Reference: https://github.com/oarriaga/face_classification/
SqueezeNet
An implementation of SqueezeNet. Touts similar accuracy to AlexNet with a fraction of the parameters.
Paper: https://arxiv.org/abs/1602.07360 ImageNet weights for this model are available and have been converted from https://github.com/rcmalli/keras-squeezenet/.
LSTM designed for text generation. Can be trained on a corpus of text. For this model, numClasses is the number of distinct characters in the corpus.
Architecture follows this implementation: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py
Walt Whitman weights are available for generating text from his works, adapted from https://github.com/craigomac/InfiniteMonkeys.
Tiny YOLO
Reference: https://arxiv.org/pdf/1612.08242.pdf
ImageNet+VOC weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code.
String filename = "tiny-yolo-voc.h5";
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
INDArray priors = Nd4j.create(priorBoxes);

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
        .seed(seed)
        .iterations(iterations)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
        .gradientNormalizationThreshold(1.0)
        .updater(new Adam.Builder().learningRate(1e-3).build())
        .l2(0.00001)
        .activation(Activation.IDENTITY)
        .trainingWorkspaceMode(workspaceMode)
        .inferenceWorkspaceMode(workspaceMode)
        .build();

ComputationGraph model = new TransferLearning.GraphBuilder(graph)
        .fineTuneConfiguration(fineTuneConf)
        .addLayer("outputs", new Yolo2OutputLayer.Builder()
                .boundingBoxPriors(priors)
                .build(), "conv2d_9")
        .setOutputs("outputs")
        .build();

System.out.println(model.summary(InputType.convolutional(416, 416, 3)));

ModelSerializer.writeModel(model, "tiny-yolo-voc_dl4j_inference.v1.zip", false);
The channels of the 416x416 input images need to be in RGB order (not BGR), with values normalized within [0, 1].
U-Net
An implementation of U-Net, a deep learning network for image segmentation in Deeplearning4j. U-Net is a convolutional network architecture for fast and precise segmentation of images. Up to now it has outperformed the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Paper: https://arxiv.org/abs/1505.04597 Weights are available for image segmentation trained on a synthetic dataset
VGG-16, from Very Deep Convolutional Networks for Large-Scale Image Recognition https://arxiv.org/abs/1409.1556
Deep Face Recognition http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf
ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications. CIFAR-10 weights for this model are available and have been converted using “approach 2” from https://github.com/rajatvikramsingh/cifar10-vgg16. VGGFace weights for this model are available and have been converted from https://github.com/rcmalli/keras-vggface.
VGG-19, from Very Deep Convolutional Networks for Large-Scale Image Recognition https://arxiv.org/abs/1409.1556 ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applications.
Xception
An implementation of Xception in Deeplearning4j. A novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions.
Paper: https://arxiv.org/abs/1610.02357 ImageNet weights for this model are available and have been converted from https://keras.io/applications/.
YOLOv2
Reference: https://arxiv.org/pdf/1612.08242.pdf
ImageNet+COCO weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and code analogous to the Tiny YOLO example above.
The channels of the 608x608 input images need to be in RGB order (not BGR), with values normalized within [0, 1].
pretrainedUrl
Default prior boxes for the model
Prebuilt model architectures and weights for out-of-the-box application.
Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.
If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:
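The exact block depends on your build setup; a minimal sketch, assuming the deeplearning4j-zoo artifact and a dl4j.version property defined elsewhere in your POM that matches your other DL4J dependencies:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-zoo</artifactId>
    <version>${dl4j.version}</version>
</dependency>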
Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel abstract class and uses the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.
You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:
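A minimal sketch; the builder-style API and the numLabels/seed values below are assumptions (older releases configure zoo models via constructor arguments instead):

int numLabels = 10;   // hypothetical number of output classes
long seed = 123;      // hypothetical RNG seed

ZooModel zooModel = AlexNet.builder()
        .numClasses(numLabels)
        .seed(seed)
        .build();

Model net = zooModel.init();   // fresh network with randomly initialized weights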
If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:
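If the model exposes its configuration (the conf() accessor below is an assumption, based on the MultiLayerNetwork-based zoo models), you can inspect or adjust it before building the network:

AlexNet zooModel = AlexNet.builder().numClasses(10).build();
MultiLayerConfiguration conf = zooModel.conf();   // assumed accessor for the underlying configuration
// ...inspect or tweak the configuration here before constructing and training the network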
Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enumerator that outlines different weight types, which includes IMAGENET, MNIST, CIFAR10, and VGGFACE.
For example, you can initialize a VGG-16 model with ImageNet weights like so:
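A minimal sketch; initPretrained() downloads and verifies the weights, and the cast assumes VGG16 is a ComputationGraph-based model:

ZooModel zooModel = VGG16.builder().build();
ComputationGraph vgg16 = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);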
And initialize another VGG16 model with weights trained on VGGFace:
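Same pattern, assuming the VGGFace weights stated above are available for VGG16:

ZooModel zooModel = VGG16.builder().build();
ComputationGraph vggFace = (ComputationGraph) zooModel.initPretrained(PretrainedType.VGGFACE);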
If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method, which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.
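For example (a sketch; the VGG16 builder call is an assumption):

ZooModel zooModel = VGG16.builder().build();
if (zooModel.pretrainedAvailable(PretrainedType.IMAGENET)) {
    ComputationGraph net = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
}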
Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.
The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.
You can find a complete list of models using this deeplearning4j-zoo Github link.
This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.
The zoo comes with a couple additional features if you're looking to use the models for different use cases.
Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape(). NOTE: this applies to fresh configurations only, and will not affect pretrained models:
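A sketch using Darknet19 (mentioned above); the int[][] signature - one shape array per network input - is an assumption:

ZooModel zooModel = Darknet19.builder().build();
zooModel.setInputShape(new int[][]{{3, 448, 448}});   // NCHW-style shape: channels, height, width
Model net = zooModel.init();                          // fresh network built with the new input shape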
Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J here.
Initialization methods often have an additional parameter named workspaceMode. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see this section.
Also known as CNN.
1D convolution layer. Expects input activations of shape [minibatch,channels,sequenceLength]
2D convolution layer
3D convolution layer configuration
hasBias
An optional dataFormat: “NDHWC” or “NCDHW”. Defaults to “NCDHW”. The data format of the input and output data. For “NCDHW” (also known as ‘channels first’ format), the data storage order is: [batchSize, inputChannels, inputDepth, inputHeight, inputWidth]. For “NDHWC” (‘channels last’ format), the data is stored in the order of: [batchSize, inputDepth, inputHeight, inputWidth, inputChannels].
kernelSize
Set kernel size for 3D convolutions in (depth, height, width) order
stride
Set stride size for 3D convolutions in (depth, height, width) order
param stride stride size
return 3D convolution layer builder
padding
Set padding size for 3D convolutions in (depth, height, width) order
param padding padding size
return 3D convolution layer builder
dilation
Set dilation size for 3D convolutions in (depth, height, width) order
param dilation dilation size
return 3D convolution layer builder
dataFormat
The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]
param dataFormat Data format to use for activations
setKernelSize
Set kernel size for 3D convolutions in (depth, height, width) order
param kernelSize kernel size
setStride
Set stride size for 3D convolutions in (depth, height, width) order
param stride stride size
setPadding
Set padding size for 3D convolutions in (depth, height, width) order
param padding padding size
setDilation
Set dilation size for 3D convolutions in (depth, height, width) order
param dilation dilation size
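A configuration sketch pulling the options above together (the channel counts and specific kernel/stride/padding values are illustrative assumptions):

Convolution3D conv3d = new Convolution3D.Builder()
        .nIn(1)                                       // input channels
        .nOut(16)                                     // output channels (number of filters)
        .kernelSize(3, 3, 3)                          // depth, height, width
        .stride(1, 1, 1)
        .padding(0, 0, 0)
        .dilation(1, 1, 1)
        .dataFormat(Convolution3D.DataFormat.NCDHW)   // channels-first 3D data
        .hasBias(true)
        .build();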
2D deconvolution layer configuration
Deconvolutions are also known as transpose convolutions or fractionally strided convolutions. In essence, deconvolutions swap the forward and backward passes of regular 2D convolutions.
See the paper by Matt Zeiler for details: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf
For an intuitive guide to convolution arithmetic and shapes, see: https://arxiv.org/abs/1603.07285v1
hasBias
Deconvolution2D layer. nIn in the input layer is the number of channels; nOut is the number of filters to be used in the net (in other words, the output channels). The builder specifies the filter/kernel size, the stride and the padding.
convolutionMode
Set the convolution mode for the Convolution layer. See ConvolutionMode for more details
param convolutionMode Convolution mode for layer
kernelSize
Size of the convolution rows/columns
param kernelSize the height and width of the kernel
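A configuration sketch (the channel counts, kernel/stride choices, and the use of ConvolutionMode.Truncate are illustrative assumptions):

Deconvolution2D deconv = new Deconvolution2D.Builder()
        .nIn(64)                               // input channels
        .nOut(32)                              // number of filters (output channels)
        .kernelSize(2, 2)
        .stride(2, 2)
        .padding(0, 0)
        .convolutionMode(ConvolutionMode.Truncate)
        .hasBias(true)
        .build();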
Cropping layer for convolutional (1d) neural networks. Allows cropping to be done separately for top/bottom
getOutputType
param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations
setCropping
Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.
build
param cropping Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.
Cropping layer for convolutional (2d) neural networks. Allows cropping to be done separately for top/bottom/left/right
getOutputType
param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations
param cropLeftRight Amount of cropping to apply to both the left and the right of the input activations
setCropping
Cropping amount for top/bottom/left/right (in that order). A length 4 array.
build
param cropping Cropping amount for top/bottom/left/right (in that order). Must be length 4 array.
Cropping layer for convolutional (3d) neural networks. Allows cropping to be done separately for upper and lower bounds of depth, height and width dimensions.
getOutputType
param cropDepth Amount of cropping to apply to both depth boundaries of the input activations
param cropHeight Amount of cropping to apply to both height boundaries of the input activations
param cropWidth Amount of cropping to apply to both width boundaries of the input activations
setCropping
Cropping amount, a length 6 array, i.e. crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width
build
param cropping Cropping amount, must be length 3 or 6 array, i.e. either crop depth, crop height, crop width or crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width
Saving and loading of neural networks.
MultiLayerNetwork and ComputationGraph both have save and load methods.
You can save/load a MultiLayerNetwork using:
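A minimal sketch, assuming a recent DL4J version where MultiLayerNetwork exposes save/load directly (older versions can use ModelSerializer instead); the tiny single-layer configuration is only for illustration:

MultiLayerNetwork net = new MultiLayerNetwork(new NeuralNetConfiguration.Builder()
        .list()
        .layer(0, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY).nIn(4).nOut(3).build())
        .build());
net.init();

File locationToSave = new File("MyMultiLayerNetwork.zip");
boolean saveUpdater = true;                   // also persist the updater state so training can be resumed
net.save(locationToSave, saveUpdater);

MultiLayerNetwork restored = MultiLayerNetwork.load(locationToSave, saveUpdater);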
Similarly, you can save/load a ComputationGraph using:
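And the ComputationGraph equivalent (a sketch with the same caveats; older versions can use ModelSerializer.writeModel and restoreComputationGraph):

ComputationGraph graph = new ComputationGraph(new NeuralNetConfiguration.Builder()
        .graphBuilder()
        .addInputs("in")
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY).nIn(4).nOut(3).build(), "in")
        .setOutputs("out")
        .build());
graph.init();

File locationToSave = new File("MyComputationGraph.zip");
graph.save(locationToSave, true);             // true: also save the updater state
ComputationGraph restored = ComputationGraph.load(locationToSave, true);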
If your model uses probabilities (i.e. DropOut/DropConnect), it may make sense to save it separately, and apply it after model is restored; i.e:
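One way to do this (an interpretation of the note above, not the only approach) is to fix the Nd4j random seed before saving and again after restoring, so that stochastic operations such as dropout sample identically:

Nd4j.getRandom().setSeed(12345);
ModelSerializer.writeModel(net, new File("net.zip"), true);

// ...later, possibly in a different JVM...
Nd4j.getRandom().setSeed(12345);
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(new File("net.zip"), true);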
This will guarantee equal results between sessions/JVMs.
Utility class suited to save/restore neural net models
writeModel
Write a model to a file
param model the model to write
param file the file to write to
param saveUpdater whether to save the updater or not
throws IOException
writeModel
Write a model to a file
param model the model to write
param file the file to write to
param saveUpdater whether to save the updater or not
param dataNormalization the normalizer to save (optional)
throws IOException
writeModel
Write a model to a file path
param model the model to write
param path the path to write to
param saveUpdater whether to save the updater or not
throws IOException
writeModel
Write a model to an output stream
param model the model to save
param stream the output stream to write to
param saveUpdater whether to save the updater for the model or not
throws IOException
writeModel
Write a model to an output stream
param model the model to save
param stream the output stream to write to
param saveUpdater whether to save the updater for the model or not
param dataNormalization the normalizer to save (may be null)
throws IOException
restoreMultiLayerNetwork
Load a multi layer network from a file
param file the file to load from
return the loaded multi layer network
throws IOException
restoreMultiLayerNetwork
Load a multi layer network from a file
param file the file to load from
return the loaded multi layer network
throws IOException
restoreMultiLayerNetwork
Load a MultiLayerNetwork from an input stream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is the inputstream to load from
return the loaded multi layer network
throws IOException
see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)
restoreMultiLayerNetwork
Restore a multi layer network from an input stream Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is the input stream to restore from
return the loaded multi layer network
throws IOException
see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)
restoreMultiLayerNetwork
Load a MultiLayerNetwork model from a file
param path path to the model file, to get the multi layer network from
return the loaded multi layer network
throws IOException
restoreMultiLayerNetwork
Load a MultiLayerNetwork model from a file
param path path to the model file, to get the multi layer network from
return the loaded multi layer network
throws IOException
restoreMultiLayerNetworkAndNormalizer
Restore a MultiLayerNetwork and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is Input stream to read from
param loadUpdater Whether to load the updater from the model or not
return Model and normalizer, if present
throws IOException If an error occurs when reading from the stream
restoreComputationGraph
Load a computation graph from a file
param path path to the model file, to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraph
Load a computation graph from an InputStream
param is the inputstream to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraph
Load a computation graph from an InputStream
param is the inputstream to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraph
Load a computation graph from a file
param file the file to get the computation graph from
return the loaded computation graph
throws IOException
restoreComputationGraphAndNormalizer
Restore a ComputationGraph and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.
param is Input stream to read from
param loadUpdater Whether to load the updater from the model or not
return Model and normalizer, if present
throws IOException If an error occurs when reading from the stream
taskByModel
param model
return
addNormalizerToModel
This method appends a normalizer to a given persisted model.
PLEASE NOTE: the file should be a model file saved earlier with ModelSerializer
param f
param normalizer
addObjectToFile
Add an object to the (already existing) model file using Java Object Serialization. Objects can be restored using getObjectFromFile(File, String)
param f File to add the object to
param key Key to store the object under
param o Object to store using Java object serialization
How to build complex networks with DL4J computation graph.
This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.
DL4J has two types of networks comprised of multiple layers: the MultiLayerNetwork and the ComputationGraph.
Specifically, the ComputationGraph allows for networks to be built with the following features:
Multiple network input arrays
Multiple network outputs (including mixed classification/regression architectures)
Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)
As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure: MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well - though the configuration may be a little more complicated.
Examples of some architectures that can be built using ComputationGraph include:
Multi-task learning architectures
Recurrent neural networks with skip connections
Input Vertices
Element-wise operation vertices
Merge vertices
Subset vertices
Preprocessor vertices
These types of graph vertices are described briefly below.
InputVertex: Input vertices are specified by the addInputs(String...) method in your configuration. The strings used as inputs can be arbitrary - they are user-defined labels, and can be referenced later in the configuration. The number of strings provided defines the number of inputs; the order of the input also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).
ElementWiseVertex: Element-wise operation vertices perform, for example, an element-wise addition or subtraction of the activations from one or more other vertices. Thus, the activations used as input for the ElementWiseVertex must all be the same size, and the output size of the element-wise vertex is the same as the inputs.
MergeVertex: The MergeVertex concatenates/merges the input activations. For example, if a MergeVertex has 2 inputs of size 5 and 10 respectively, then the output size will be 5+10=15 activations. For convolutional network activations, inputs are merged along the depth (channel) dimension: so suppose the activations from one layer have 4 features and the other has 5 features (both with (4 or 5) x width x height activations), then the output will have (4+5) x width x height activations.
SubsetVertex: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use .addVertex("subset1", new SubsetVertex(0,4), "layer1"): this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.
Suppose we wish to build the following recurrent neural network architecture:
For the sake of this example, let's assume our input data is of size 5. Our configuration would be as follows:
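A sketch of that configuration (the LSTM layer type, updater and loss function are illustrative assumptions; the layer names and inputs match the description below):

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")                                        // single network input, labelled "input"
        .addLayer("L1", new LSTM.Builder().nIn(5).nOut(5).build(), "input")
        .addLayer("L2", new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(5 + 5).nOut(5).build(), "input", "L1")        // skip connection: L2 sees both "input" and "L1"
        .setOutputs("L2")                                          // network output is taken from layer "L2"
        .build();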
Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.
Consider the following architecture:
Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)) then the output size of the merge vertex is 4+4=8 activations.
To build the above network, we use the following configuration:
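A sketch of such a configuration (input size 3, the updater and loss function are illustrative assumptions):

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addVertex("merge", new MergeVertex(), "L1", "L2")         // concatenates the 4+4 activations
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(4 + 4).nOut(3).build(), "merge")
        .setOutputs("out")
        .build();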
In multi-task learning, a neural network is used to make multiple independent predictions. Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.
In this case, the network configuration is:
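A sketch of such a configuration (hidden layer size, output sizes, updater and loss functions are illustrative assumptions):

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addLayer("out1", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .activation(Activation.SOFTMAX)
                .nIn(4).nOut(3).build(), "L1")                     // classification output
        .addLayer("out2", new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY)
                .nIn(4).nOut(2).build(), "L1")                     // regression output
        .setOutputs("out1", "out2")
        .build();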
One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the .setInputTypes(InputType...) method in the configuration.
The setInputType method has two effects:
It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the setInputTypes(InputType...) functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If the .nIn(x) is specified for a layer, the network will not override this when using the InputType functionality.
For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use .setInputTypes(InputType.convolutional(depth,width,height), InputType.feedForward(feedForwardInputSize))
There are two types of data that can be used with the ComputationGraph.
The DataSet class was originally designed for use with the MultiLayerNetwork; however, it can also be used with ComputationGraph - but only if that computation graph has a single input and output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).
MultiDataSet is the multiple input and/or multiple output version of DataSet. It may also include multiple mask arrays (for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator, unless you are dealing with multiple inputs and/or multiple outputs.
There are currently two ways to use a MultiDataSetIterator:
The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:
Multiple DataVec RecordReaders may be used simultaneously
The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader
It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output
It is possible to convert single columns from a class index to a one-hot representation
Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:
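A sketch of such an iterator (the file path, batch size, and CSVRecordReader constructor arguments are illustrative assumptions):

int numLinesToSkip = 0;
RecordReader rr = new CSVRecordReader(numLinesToSkip, ',');
rr.initialize(new FileSplit(new File("/path/to/myCsv.csv")));     // hypothetical path

int batchSize = 4;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("myReader", rr)
        .addInput("myReader", 0, 2)                               // input: columns 0 to 2, inclusive
        .addOutput("myReader", 3, 4)                              // output (regression): columns 3 to 4, inclusive
        .build();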
Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture, whereby we have two outputs - one for regression and one for classification. For this example, let's assume the data is as follows:
Input file: myInput.csv, and we want to use all columns as input (without modification)
Output file: myOutput.csv.
Network output 1 - regression: columns 0 to 3
Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.
In this case, we can build our iterator as follows:
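A sketch of the corresponding iterator (batch size and CSVRecordReader arguments are illustrative assumptions; the column indices follow the description above):

int batchSize = 4;
int numClasses = 3;

RecordReader inputReader = new CSVRecordReader(0, ',');
inputReader.initialize(new FileSplit(new File("myInput.csv")));

RecordReader outputReader = new CSVRecordReader(0, ',');
outputReader.initialize(new FileSplit(new File("myOutput.csv")));

MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("csvInput", inputReader)
        .addReader("csvOutput", outputReader)
        .addInput("csvInput")                                     // use all columns of myInput.csv as input
        .addOutput("csvOutput", 0, 3)                             // regression targets: columns 0 to 3 inclusive
        .addOutputOneHot("csvOutput", 4, numClasses)              // column 4: class index converted to one-hot (3 classes)
        .build();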
Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.
RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.
Autoencoder layer. Adds noise to the input and learns a reconstruction function.
corruptionLevel
Level of corruption - 0.0 (none) to 1.0 (all values corrupted)
sparsity
Autoencoder sparsity parameter
param sparsity Sparsity
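A configuration sketch for this layer (layer sizes, corruption level and sparsity values are illustrative assumptions):

AutoEncoder layer = new AutoEncoder.Builder()
        .nIn(784)
        .nOut(250)
        .corruptionLevel(0.3)            // fraction of the input to corrupt with noise
        .sparsity(0.5)                   // sparsity parameter
        .activation(Activation.RELU)
        .build();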
Variational Autoencoder layer
This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.
A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.
encoderLayerSizes
Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers (set via decoderLayerSizes(int...)) is similar to the encoder layers.
setEncoderLayerSizes
Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers (set via decoderLayerSizes(int...)) is similar to the encoder layers.
param encoderLayerSizes Size of each encoder layer in the variational autoencoder
decoderLayerSizes
Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int...)).
param decoderLayerSizes Size of each decoder layer in the variational autoencoder
setDecoderLayerSizes
Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int...)).
param decoderLayerSizes Size of each decoder layer in the variational autoencoder
reconstructionDistribution
The reconstruction distribution for the data given the hidden state - i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example:
GaussianReconstructionDistribution + {identity or tanh} for real-valued (Gaussian) data
BernoulliReconstructionDistribution + sigmoid for binary-valued (0 or 1) data
param distribution Reconstruction distribution
lossFunction
Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution
param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use
lossFunction
Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution
param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use
lossFunction
Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution
param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use
pzxActivationFn
Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).
param activationFunction Activation function for p(z| x)
pzxActivationFunction
Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).
param activation Activation function for p(z | x)
nOut
Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.
param nOut Size of P(Z | data) and output size
numSamples
Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.
This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”
param numSamples Number of samples per data point for pretraining
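A configuration sketch pulling the parameters above together (the sizes and the Bernoulli + sigmoid reconstruction choice are illustrative assumptions; see also the loss-function alternative described above):

VariationalAutoencoder vae = new VariationalAutoencoder.Builder()
        .nIn(784)                                                  // input size (e.g. flattened 28x28 images)
        .nOut(32)                                                  // size of the latent state Z
        .encoderLayerSizes(256, 256)
        .decoderLayerSizes(256, 256)
        .pzxActivationFunction(Activation.IDENTITY)                // activation for the input to p(z|data)
        .reconstructionDistribution(new BernoulliReconstructionDistribution(Activation.SIGMOID.getActivationFunction()))
        .numSamples(1)                                             // samples per data point during pretraining
        .build();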
Special algorithms for gradient descent.
At a simple level, activation functions help decide whether a neuron should be activated. This helps determine whether the information that the neuron is receiving is relevant for the input. The activation function is a non-linear transformation that happens over an input signal, and the transformed output is sent to the next neuron.
The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:
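A sketch of this pattern (the layer sizes and the choice of ReLU are illustrative assumptions):

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(128)
                .activation(Activation.IDENTITY).build())
        .layer(1, new ActivationLayer.Builder()
                .activation(Activation.RELU).build())              // applies ReLU to the previous layer's output
        .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nIn(128).nOut(10).build())
        .build();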
Rectified tanh
Essentially max(0, tanh(x))
Underlying implementation is in native code
f(x) = alpha * (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0
alpha defaults to 1, if not specified
f(x) = max(0, x)
f(x) = 1.7159 tanh(2x/3) where tanh is approximated as follows, tanh(y) ~ sgn(y) { 1 - 1/(1+|y|+y^2+1.41645y^4)}
Underlying implementation is in native code
Thresholded RELU
f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0
f(x) = min(max(input, cutoff), 6)
f(x) = 1 / (1 + exp(-x))
GELU activation function - Gaussian Error Linear Units
Parametrized Rectified Linear Unit (PReLU)
f(x) = alpha x for x < 0, f(x) = x for x >= 0
alpha has the same shape as x and is a learned parameter.
f(x) = x
f(x) = min(1, max(0, 0.2x + 0.5))
f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)
f(x) = x^3
f(x) = max(0,x) + alpha min(0, x)
alpha is drawn from uniform(l, u) during training and is set to (l + u)/2 at test time; l and u default to 1/8 and 1/3 respectively
f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Leaky RELU f(x) = max(0, x) + alpha min(0, x) alpha defaults to 0.01
f(x) = x sigmoid(x)
f(x) = log(1+e^x)
Internally, these methods use the ModelSerializer class, which handles loading and saving models. There are two methods for saving models shown in the examples through the link. The first example saves a normal multi layer network, the second one saves a computation graph.
There is also example code for saving a computation graph using the ModelSerializer class, as well as an example of using ModelSerializer to save a neural net built using a MultiLayer configuration.
The MultiLayerNetwork, which is essentially a stack of neural network layers (with a single input layer and single output layer), and
The ComputationGraph, which allows for greater freedom in network architectures
GoogLeNet, a complex type of convolutional neural network for image classification
The basic idea is that in the ComputationGraph, the core building block is the GraphVertex, instead of layers. Layers (or, more accurately, the LayerVertex objects) are but one type of vertex in the graph. Other types of vertices include:
LayerVertex: Layer vertices (graph vertices with neural network layers) are added using the .addLayer(String,Layer,String...) method. The first argument is the label for the layer, and the last arguments are the inputs to that layer. If you need to manually add an InputPreProcessor (usually this is unnecessary - see next section) you can use the .addLayer(String,Layer,InputPreProcessor,String...) method.
PreProcessorVertex: Occasionally, you might want to use the functionality of an InputPreProcessor without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.
Finally, it is also possible to define custom graph vertices by implementing both a configuration and an implementation class for your custom GraphVertex.
It will automatically add any InputPreProcessors as required. InputPreProcessors are necessary to handle the interaction between, for example, fully connected (dense) and convolutional layers, or recurrent and fully connected layers.
A DataSet object is basically a pair of INDArrays that hold your training data. In the case of RNNs, it may also include masking arrays (see the documentation on recurrent networks for more details). A DataSetIterator is essentially an iterator over DataSet objects.
By implementing the MultiDataSetIterator interface directly
By using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers
Some basic examples of how to use the RecordReaderMultiDataSetIterator follow.
See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes - https://arxiv.org/abs/1312.6114
Rational tanh approximation