EN 1.0.0-beta7

Eclipse DeepLearning4J

A Chinese version of this documentation is also available.

Eclipse Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Apache Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs.

Distributed

DL4J takes advantage of the latest distributed computing frameworks, including Apache Spark and Hadoop, to accelerate training. On multi-GPU systems, its performance is on par with Caffe.

Open Source

The libraries are completely open-source, Apache 2.0, and maintained by the developer community and Konduit team.

JVM/Python/C++

Deeplearning4j is written in Java and is compatible with any JVM language, such as Scala, Clojure or Kotlin. The underlying computations are written in C, C++ and CUDA. Keras will serve as the Python API.

What's included?

Deep neural nets are capable of record-breaking accuracy. For a quick neural net introduction, please visit our page. In a nutshell, Deeplearning4j lets you compose deep neural nets from various shallow nets, each of which forms a so-called `layer`. This flexibility lets you combine variational autoencoders, sequence-to-sequence autoencoders, convolutional nets or recurrent nets as needed in a distributed, production-grade framework that works with Spark and Hadoop on top of distributed CPUs or GPUs.

There are a lot of parameters to adjust when you're training a deep-learning network. We've done our best to explain them, so that Deeplearning4j can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers.

Getting Started

Tutorials

Deeplearning4j Tutorials

While Deeplearning4j is written in Java, the Java Virtual Machine (JVM) lets you import and share code in other JVM languages. These tutorials are written in Scala, the de facto standard for data science in the Java environment. There’s nothing stopping you from using any other JVM language such as Java, Kotlin, or Clojure.

If you’re coming from non-JVM languages like Python or R, you may want to read about how the JVM works before using these tutorials. Knowing the basic terms such as classpath, virtual machine, “strongly-typed” languages, and functional programming will help you debug, as well as expand on the knowledge you gain here. If you don’t know Scala and want to learn it, Coursera has a great course named Functional Programming Principles in Scala.

The tutorials are currently being reworked. You will likely find stumbling points. If you need any support while working through them, feel free to ask questions on https://community.konduit.ai/.

Tutorials covering basic DL4J features

End to End Tutorials showing specific solutions

Logistic Regression

With deep learning, we can compose a deep neural network to suit the input data and its features. The goal is to train the network on the data to make predictions, and those predictions are tied to the outcomes that you care about; i.e. is this transaction fraudulent or not, or which object is contained in the photo? There are different techniques to configure a neural network, and all of them build a relational hierarchy between the inputs and outputs.

In this tutorial, we are going to configure the simplest neural network: a logistic regression model.

Regression is a process that helps show the relations between the independent variables (inputs) and the dependent variables (outputs). Logistic regression is one in which the dependent variable is categorical rather than continuous - meaning that it can predict only a limited number of classes or categories, like a switch you flip on or off. For example, it can predict that an image contains a cat or a dog, or it can classify input in ten buckets with the integers 0 through 9.

A simple logistic regression calculates x*w + b = y, where x is an instance of input data, w is the weight or coefficient that transforms that input, b is the bias, and y is the output, or prediction about the data. The biological terms show how this artificial neuron loosely maps to a neuron in the human brain. The most important point is how data flows through and is transformed by this structure.
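The calculation above can be sketched in plain Scala without any DL4J classes. This is only an illustration: the object name, the example weights, and the choice of a softmax to turn the scores x*w + b into class probabilities are assumptions made for this sketch, not code from the tutorial.

```scala
object LogisticRegressionSketch {
  // Softmax turns raw scores into probabilities that sum to 1.
  def softmax(scores: Array[Double]): Array[Double] = {
    val max = scores.max // subtract the max for numerical stability
    val exps = scores.map(s => math.exp(s - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  // weights(k) holds the coefficients for class k and bias(k) is its bias.
  // For each class the score is x*w + b; softmax converts scores to probabilities.
  def predict(x: Array[Double],
              weights: Array[Array[Double]],
              bias: Array[Double]): Array[Double] = {
    val scores = weights.zip(bias).map { case (w, b) =>
      x.zip(w).map { case (xi, wi) => xi * wi }.sum + b
    }
    softmax(scores)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input with two features and two output classes.
    val x = Array(1.0, 2.0)
    val weights = Array(Array(1.0, 0.5), Array(-1.0, 0.0))
    val bias = Array(0.0, 0.5)
    println(predict(x, weights, bias).mkString(", "))
  }
}
```

The class with the largest probability is the prediction. Training consists of adjusting the weights and biases so that these probabilities match the labels.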

What will we learn in this tutorial?

We’re going to configure the simplest network, with just one input layer and one output layer, to show how logistic regression works.

Imports

Configuring logistic regression layers

We are going to first build the layers and then feed these layers into the network configuration.

Why we didn’t build an input layer

You may be wondering why we didn’t write any code for building our input layer. The input layer is only a set of input values fed into the network. It doesn’t perform a calculation. It’s just an input sequence (raw or pre-processed data) coming into the network, data to be trained on or to be evaluated. Later, we are going to work with data iterators, which feed input to a network in a specific pattern, and which can be thought of as an input layer of the network.

Feed Forward Networks

In our previous tutorial, we learned about a very simple neural network model: the logistic regression model. Although you can solve many tasks with a simple model like that, most problems require a much more complex network configuration. A typical deep learning model consists of many layers between the inputs and outputs. In this tutorial, we are going to learn about one of those configurations: feed-forward neural networks.

Feed-Forward Networks

Feed-forward networks are those in which there are no cyclic connections between the network layers. The input flows forward towards the output after going through several intermediate layers. A typical feed-forward network looks like this:

Here you can see a new kind of layer, called a hidden layer. The layers in between our input and output layers are called hidden layers. They are called hidden because we don’t deal with them directly, so they are not visible from outside the network. There can be more than one hidden layer in the network.

Just like the softmax activation on our output layer in the previous tutorial, there can be activation functions between each layer of the network. They determine whether, and how strongly, a node’s output is passed on (activated) to the nodes in the next layer. There are different activation functions, such as sigmoid and ReLU.
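To make the two activation functions named above concrete, here is a minimal sketch in plain Scala. The object and method names are illustrative only; DL4J ships its own implementations, selected through the Activation enum.

```scala
object ActivationSketch {
  // Sigmoid squashes any real number into the open interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // ReLU passes positive values through unchanged and maps negative values to 0.
  def relu(z: Double): Double = math.max(0.0, z)

  def main(args: Array[String]): Unit = {
    // Hypothetical pre-activation values of three nodes.
    Seq(-2.0, 0.0, 3.0).foreach { z =>
      println(s"z=$z sigmoid=${sigmoid(z)} relu=${relu(z)}")
    }
  }
}
```

With ReLU, a node whose weighted input is negative contributes nothing to the next layer, which is the "disallow" case described above.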

Imports

import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.graph.MergeVertex
import org.deeplearning4j.nn.conf.layers.{DenseLayer, GravesLSTM, OutputLayer, RnnOutputLayer}
import org.deeplearning4j.nn.conf.{ComputationGraphConfiguration, MultiLayerConfiguration, NeuralNetConfiguration, Updater}
import org.deeplearning4j.nn.graph.ComputationGraph
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.AdaGrad
import org.nd4j.linalg.lossfunctions.LossFunctions

Let’s create the feed-forward network configuration

val conf = new NeuralNetConfiguration.Builder()
    .seed(12345)
    .weightInit(WeightInit.XAVIER)
    .updater(new AdaGrad(0.5))
    .activation(Activation.RELU)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .l2(0.0001)
    .list()
    .layer(0, new DenseLayer.Builder().nIn(784).nOut(250).weightInit(WeightInit.XAVIER).activation(Activation.RELU) //First hidden layer
            .build())
    .layer(1, new OutputLayer.Builder().nIn(250).nOut(10).weightInit(WeightInit.XAVIER).activation(Activation.SOFTMAX) //Output layer
            .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .build())
    .build()

What did we do here?

As you can see above, we have made a feed-forward network configuration with one hidden layer. We have used a RELU activation between our hidden and output layers. RELUs are one of the most popular activation functions. Activation functions also introduce non-linearities into our network so that it can learn more complex features present in our data. Hidden layers learn features from the input layer and send those features to the output layer, which analyzes them to produce the corresponding outputs. You can similarly make network configurations with more hidden layers:

// Make sure the number of inputs of each layer equals the number of outputs of the previous layer.
val conf = new NeuralNetConfiguration.Builder()
    .seed(12345)
    .weightInit(WeightInit.XAVIER)
    .updater(new AdaGrad(0.5)) // the learning rate (0.5) is set on the updater
    .activation(Activation.RELU)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .l2(0.0001)
    .list()
    //First hidden layer
    .layer(0, new DenseLayer.Builder()
            .nIn(784).nOut(250)
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.RELU) 
            .build())
    //Second hidden layer
    .layer(1, new DenseLayer.Builder()
            .nIn(250).nOut(100)
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.RELU) 
            .build())
     //Third hidden layer
    .layer(2, new DenseLayer.Builder()
            .nIn(100).nOut(50)
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.RELU)
            .build())
    //Output layer
    .layer(3, new OutputLayer.Builder()
            .nIn(50).nOut(10)
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.SOFTMAX) 
            .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .build())
    .build()
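The rule stated in the comment at the top of the configuration above (each layer's nIn must equal the previous layer's nOut) can be checked with a few lines of plain Scala before building the network. The helper below is a hypothetical illustration, not part of DL4J.

```scala
object LayerSizeCheck {
  // Each pair is (nIn, nOut) for one layer, listed in network order.
  // The chain is valid when every layer's nIn equals the previous layer's nOut.
  def sizesChain(layers: Seq[(Int, Int)]): Boolean =
    layers.zip(layers.drop(1)).forall {
      case ((_, prevOut), (nextIn, _)) => prevOut == nextIn
    }

  def main(args: Array[String]): Unit = {
    // The four layers from the configuration above.
    val layers = Seq((784, 250), (250, 100), (100, 50), (50, 10))
    println(sizesChain(layers))
  }
}
```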

Early Stopping

When training neural networks, it is important to avoid overfitting the training data. Overfitting occurs when the neural network learns the noise in the training data and thus does not generalize well to data it has not been trained on. One hyperparameter that affects whether the neural network will overfit or not is the number of epochs or complete passes through the training split. If we use too many epochs, then the neural network is likely to overfit. On the other hand, if we use too few epochs, the neural network might not have the chance to learn fully from the training data.

Early stopping is one mechanism for choosing the number of epochs so as to prevent both underfitting and overfitting, instead of fixing it by hand. The idea behind early stopping is intuitive. First, the data is split into training and testing sets. At the end of each epoch, the neural network is evaluated on the test set. If the neural network outperforms the previous best model, then we save the neural network. The best overall model is then taken to be the final model.
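The selection logic described above can be sketched independently of DL4J. In this hedged example the sequence of per-epoch test losses is a hypothetical stand-in for "train one epoch, then evaluate on the test set", and a lower loss is assumed to mean a better model.

```scala
object EarlyStoppingSketch {
  // Returns (epochIndex, loss) of the best epoch, where lower loss is better.
  // In a real run, the model would be saved each time a new best is found.
  def selectBest(testLossPerEpoch: Seq[Double]): (Int, Double) =
    testLossPerEpoch.zipWithIndex.foldLeft((-1, Double.PositiveInfinity)) {
      case ((bestEpoch, bestLoss), (loss, epoch)) =>
        if (loss < bestLoss) (epoch, loss) else (bestEpoch, bestLoss)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical losses: the model improves, then starts to overfit.
    val losses = Seq(0.9, 0.5, 0.4, 0.45, 0.6)
    val (epoch, loss) = selectBest(losses)
    println(s"best epoch = $epoch, test loss = $loss")
  }
}
```

In this made-up run, the epochs after the best one only make the test loss worse, so the saved model from the best epoch is the one to keep.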

In this tutorial we will show how to use early stopping with deeplearning4j (DL4J). We will apply the method on a feed forward neural network using the MNIST dataset, which is a dataset consisting of handwritten digits.

Imports

Loading the data

Now that we have imported everything needed to run this tutorial, we can start by setting the parameters for the neural network and initializing the data. We will set the maximum number of epochs to run early stopping on to be 15.

Network Configuration

Next we will set the neural network configuration using the MultiLayerNetwork class of DL4J and initialize the MultiLayerNetwork.

Early Stopping

If we weren’t using early stopping, we would proceed by training the neural network using for loops and the fit method of the MultiLayerNetwork. But since we are using early stopping we need to configure how early stopping will be applied. Looking at the next cell, we will use a maximum epoch number of 10 and a maximum training time of 5 minutes. The evaluation will be done on mnistTest after each epoch. Each model will be saved in the DL4JEarlyStoppingExample directory that we specified.

Once the EarlyStoppingConfiguration is specified, we only need to initialize an EarlyStoppingTrainer using the training data and the two previous configurations. The results are obtained just by calling the fit method of EarlyStoppingTrainer.

We can then print out the details of the best model.

Layers and Preprocessors

In previous tutorials we learned how to configure different neural networks such as feed forward, convolutional, and recurrent networks. The type of neural network is determined by the type of hidden layers it contains. For example, feed forward neural networks are composed of dense layers, while recurrent neural networks can include Graves LSTM (long short-term memory) layers. In this tutorial we will learn how to use combinations of different layers in a single neural network using the MultiLayerNetwork class of deeplearning4j (DL4J). Additionally, we will learn how to preprocess our data to train the neural networks more efficiently. The MNIST dataset (images of handwritten digits) will be used as an example for a convolutional network.

Imports

Convolutional Neural Network Example

Now that everything needed is imported, we can start by configuring a convolutional neural network for a MultiLayerNetwork. This network will consist of two convolutional layers, two max pooling layers, one dense layer, and an output layer. This is easy to do using DL4J’s functionality; we simply add a dense layer after the max pooling layer to convert the output into vectorized form before passing it to the output layer. The neural network will then attempt to classify an observation using the vectorized data in the output layer.

The only tricky part is getting the dimensions of the input to the dense layer correct after the convolutional and max pooling layers. Note that we first start off with a 28 by 28 matrix and after applying the convolution layer with a 5 by 5 kernel we end up with twenty 24 by 24 matrices. Once the input is passed through the max pooling layer with a 2 by 2 kernel and a stride of 2 by 2, we end up with twenty 12 by 12 matrices. After the second convolutional layer with a 5 by 5 kernel, we end up with fifty 8 by 8 matrices. This output is reduced to fifty 4 by 4 matrices after the second max pooling layer, which has the same kernel size and stride as the first max pooling layer. To vectorize these final matrices, we require an input of dimension 50 * 4 * 4 = 800 in the dense layer.
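The dimension arithmetic above can be reproduced with a small helper. The sketch assumes no padding and a stride of 1 for the convolutional layers, which is what the sizes quoted in the text imply. The object and method names are illustrative.

```scala
object ConvShapeSketch {
  // Output size along one dimension, without padding:
  // (input - kernel) / stride + 1
  def outSize(input: Int, kernel: Int, stride: Int): Int =
    (input - kernel) / stride + 1

  // Number of inputs the dense layer needs for the network described above.
  def denseInputSize(imageSize: Int, channelsAfterConv2: Int): Int = {
    val conv1 = outSize(imageSize, 5, 1) // 28 -> 24
    val pool1 = outSize(conv1, 2, 2)     // 24 -> 12
    val conv2 = outSize(pool1, 5, 1)     // 12 -> 8
    val pool2 = outSize(conv2, 2, 2)     // 8 -> 4
    channelsAfterConv2 * pool2 * pool2   // 50 * 4 * 4
  }

  def main(args: Array[String]): Unit =
    println(denseInputSize(28, 50))
}
```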

Before training the neural network, we will instantiate built-in DataSetIterators for the MNIST data. One example of data preprocessing is scaling the data. The data we are using in raw form are greyscale images, which are represented by a single matrix filled with integer values from 0 to 255. A value of 0 indicates a black pixel, while a value of 255 indicates a white pixel. It is helpful to scale the image pixel values to the range 0 to 1 instead of 0 to 255. To do this, the ImagePreProcessingScaler class is used directly on the MnistDataSetIterators. Note that this process is typical for data preprocessing. Once this is done, we are ready to train the neural network.
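The scaling step amounts to a linear map of each raw pixel value, so that 0 becomes 0.0 and 255 becomes 1.0. The plain-Scala sketch below shows only that arithmetic. It is not the ImagePreProcessingScaler implementation, and the names are hypothetical.

```scala
object PixelScalingSketch {
  // Linearly map a raw pixel value in [0, 255] to the range [min, max].
  def scale(pixel: Int, min: Double = 0.0, max: Double = 1.0): Double =
    min + (pixel / 255.0) * (max - min)

  def main(args: Array[String]): Unit = {
    // Hypothetical row of raw greyscale pixel values.
    val rawRow = Array(0, 51, 128, 255)
    println(rawRow.map(p => scale(p)).mkString(", "))
  }
}
```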

To train the neural network, we use 5 epochs or complete passes through the training set by simply calling the fit method.

Lastly, we use the test split of the data to evaluate how well our final model performs on data it has never seen. We can see that the model performs pretty well using only 5 epochs!

Using Multiple GPUs

Training neural network models can be a computationally expensive task. In order to speed up the training process, you can choose to train your models in parallel with multiple GPUs if they are installed on your machine. With deeplearning4j (DL4J), this isn’t a difficult thing to do. In this tutorial we will use the MNIST dataset (a dataset of images of handwritten digits) to train a feed forward neural network in parallel with multiple GPUs.

Note: This also works if you can't fully load your CPU. In that case you just stay with the CPU specific backend.

Prerequisite

You must have multiple CUDA-compatible GPUs, ideally of the same speed.
You must set up your project to use the CUDA backend.

Imports

Data Set

To obtain the data, we use built-in DataSetIterators for the MNIST with a random seed of 12345. These DataSetIterators can be used to directly feed the data into a neural network.

Model Configuration

Next, we set up the neural network configuration using a convolutional configuration and initialize the model.

Parallel Wrapper

Next we need to configure the parallel training with the ParallelWrapper class using the MultiLayerNetwork as the input. The ParallelWrapper will take care of load balancing between different GPUs.

The notion is that the model will be duplicated within the ParallelWrapper. Each of the prespecified number of workers (in this case 2) will then train its own copy of the model on its own share of the data. After a specified number of iterations (in this case 3), all models will be averaged and the workers will receive copies of the averaged model. The training process will then continue in this way until the model is fully trained.

To train the model, the fit method of the ParallelWrapper is used directly on the DataSetIterator. Because the ParallelWrapper class handles all the training details behind the scenes, it is very simple to parallelize this process using DL4J.
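The averaging step in the scheme described above can be illustrated in plain Scala. This sketch only shows the arithmetic of averaging the workers' parameters element by element. It is not how ParallelWrapper is implemented, and the names and example values are assumptions.

```scala
object ParameterAveragingSketch {
  // Element-wise mean of the workers' parameter vectors.
  // All vectors are assumed to have the same length.
  def average(workerParams: Seq[Array[Double]]): Array[Double] = {
    require(workerParams.nonEmpty, "need at least one worker")
    val n = workerParams.size
    val dim = workerParams.head.length
    Array.tabulate(dim)(i => workerParams.map(_(i)).sum / n)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical parameters of two workers after a few iterations.
    val worker0 = Array(1.0, 2.0)
    val worker1 = Array(3.0, 6.0)
    val averaged = average(Seq(worker0, worker1))
    // Every worker would continue training from this averaged copy.
    println(averaged.mkString(", "))
  }
}
```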

Examples Tour

Brief tour of available examples in DL4J.

Deeplearning4J has a wealth of examples of how to use its many parts. You can find the examples in the Examples Repository.

Prerequisites

The example repository consists of several separate Maven Java projects, each with their own pom files. Maven is a popular build automation tool for Java Projects. The contents of a "pom.xml" file dictate the configurations. Read more about how to configure Maven here.

Users can also refer to the simple sample project provided to get started with a clean project from scratch.

Build tools are considered standard software engineering best practice. Beyond that, the complexity of the projects in the DL4J ecosystem makes dependencies too difficult to manage manually. All the projects in the DL4J ecosystem can be used with other build tools such as Gradle or SBT. More information on that can be found here.

Example Content

Projects are based on what functionality the included examples demonstrate to the user and not necessarily which library in the DL4J stack the functionality lives in.

Examples in a project are in general separated into "quickstart" and "advanced".

Each project README also lists all the examples it contains, with a recommended order to explore them in.

dl4j-examples: This project contains a set of examples that demonstrate use of the high level DL4J API to build a variety of neural networks. Some of these examples are end to end, in the sense they start with raw data, process it and then build and train neural networks on it.
tensorflow-keras-import-examples: This project contains a set of examples that demonstrate how to import Keras h5 models and TensorFlow frozen pb models into the DL4J ecosystem. Once imported into DL4J these models can be treated like any other DL4J model, meaning you can continue to run training on them, modify them with the transfer learning API, or simply run inference on them.
dl4j-distributed-training-examples: This project contains a set of examples that demonstrate how to do distributed training, inference and evaluation in DL4J on Apache Spark. DL4J distributed training employs a "hybrid" asynchronous SGD approach; further details can be found in the distributed deep learning documentation here.
cuda-specific-examples: This project contains a set of examples that demonstrate how to leverage multiple GPUs for data-parallel training of neural networks for increased performance.
samediff-examples: This project contains a set of examples that demonstrate the SameDiff API. SameDiff (which is part of the ND4J library) can be used to build lower level auto-differentiating computation graphs. An analogue to the SameDiff API vs the DL4J API is the low level TensorFlow API vs the higher level of abstraction Keras API.
data-pipeline-examples: This project contains a set of examples that demonstrate how raw data in various formats can be loaded, split and preprocessed to build serializable (and hence reproducible) ETL pipelines.
nd4j-ndarray-examples: This project contains a set of examples that demonstrate how to manipulate NDArrays. The functionality of ND4J demonstrated here can be likened to NumPy.
arbiter-examples: This project contains a set of examples that demonstrate usage of the Arbiter library for hyperparameter tuning of Deeplearning4J neural networks.
rl4j-examples: This project contains examples of using RL4J, the reinforcement learning library in DL4J.
android-examples: This project contains an Android example project that shows DL4J being used in an Android application.

Feedback & Contributions

While this set of examples doesn't cover all the features available in DL4J, the intent is to cover the functionality required by most users, beginners and advanced alike. File an issue here if you have feedback or feature requests that are not covered here. We are also available via our community forum for questions. We welcome contributions from the community. More information can be found here. We love hearing from you. Cheers!

Deep Learning Beginners

Road map for beginners new to deep learning.

How Do I Start Using Deep Learning?

Where you start depends on what you already know.

The prerequisites for really understanding deep learning are linear algebra, calculus and statistics, as well as programming and some machine learning. The prerequisites for applying it are just learning how to deploy a model.

In the case of Deeplearning4j, you should know Java well and be comfortable with tools like the IntelliJ IDE and the automated build tool Maven.

Below you'll find a list of resources. The sections are roughly organized in the order they will be useful.

Free Machine- and Deep-learning Courses Online

(For those interested in a survey of artificial intelligence.)
(For those interested in image recognition.)

Math

The math involved with deep learning is basically linear algebra, calculus and probability, and if you have studied those at the undergraduate level, you will be able to understand most of the ideas and notation in deep-learning papers. If you haven't studied those in college, never fear. There are many free resources available (and some on this website).

; Patrick van der Smagt

Programming

If you do not know how to program yet, you can start with Java, but you might find other languages easier. Python and Ruby resources can convey the basic ideas in a faster feedback loop. "Learn Python the Hard Way" and "Learn to Program (Ruby)" are two great places to start.

(Vim is an editor accessible from the command line.)

If you want to jump into deep-learning from here without Java, we recommend and the various Python frameworks built atop it, including and .

Python

Java

Once you have programming basics down, tackle Java, the world's most widely used programming language. Most large organizations in the world operate on huge Java code bases. (There will always be Java jobs.) The big data stack -- Hadoop, Spark, Kafka, Lucene, Solr, Cassandra, Flink -- has largely been written for Java's compute environment, the JVM.

Deeplearning4j

With that under your belt, we recommend you approach Deeplearning4j through its .

Other Resources

Most of what we know about deep learning is contained in academic papers. You can find some of the major research groups .

While individual courses have limits on what they can teach, the Internet does not. Most math and programming questions can be answered by Googling and searching sites like and .

Contribute

How to contribute to the Eclipse Deeplearning4j source code.

Prerequisites

Before contributing, make sure you know the structure of all of the Eclipse Deeplearning4j libraries. As of early 2018, all libraries now live in the Deeplearning4j monorepo. These include:

DeepLearning4J: Contains all of the code for learning neural networks, both on a single machine and distributed.
ND4J: “N-Dimensional Arrays for Java”. ND4J is the mathematical backend upon which DL4J is built. All of DL4J’s neural networks are built using the operations (matrix multiplications, vector operations, etc) in ND4J. ND4J is how DL4J supports both CPU and GPU training of networks, without any changes to the networks themselves. Without ND4J, there would be no DL4J.
DataVec: DataVec handles the data import and conversion side of the pipeline. If you want to import images, video, audio or simply CSV data into DL4J: you probably want to use DataVec to do this.
Arbiter: Arbiter is a package for (amongst other things) hyperparameter optimization of neural networks. Hyperparameter optimization refers to the process of automating the selection of network hyperparameters (learning rate, number of layers, etc) in order to obtain good performance.

We also have an extensive examples repository at dl4j-examples.

Ways to contribute

There are numerous ways to contribute to DeepLearning4J (and related projects), depending on your interests and experience. Here are some ideas:

Add new types of neural network layers (for example: different types of RNNs, locally connected networks, etc)
Add a new training feature
Bug fixes
DL4J examples: Is there an application or network architecture that we don’t have examples for?
Testing performance and identifying bottlenecks or areas to improve
Improve website documentation (or write tutorials, etc)
Improve the JavaDocs

There are a number of different ways to find things to work on. These include:

Looking at the issue trackers:
https://github.com/eclipse/deeplearning4j/issues
https://github.com/eclipse/deeplearning4j-examples/issues
Reviewing our Roadmap
Talking to the developers on the community forums
Reviewing recent papers and blog posts on training features, network architectures and applications
Reviewing the website and examples - what seems missing, incomplete, or would simply be useful (or cool) to have?

General guidelines

Before you dive in, there are a few things you need to know. In particular, the tools we use:

Maven: a dependency management and build tool, used for all of our projects. See this for details on Maven.
Git: the version control system we use
Project Lombok: Project Lombok is a code generation/annotation tool that is aimed to reduce the amount of ‘boilerplate’ code (i.e., standard repeated code) needed in Java. To work with source, you’ll need to install the Project Lombok plugin for your IDE
VisualVM: A profiling tool, most useful to identify performance issues and bottlenecks.
IntelliJ IDEA: This is our IDE of choice, though you may of course use alternatives such as Eclipse and NetBeans. You may find it easier to use the same IDE as the developers in case you run into any issues. But this is up to you.

Things to keep in mind:

Code should be Java 7 compliant
If you are adding a new method or class: add JavaDocs
You are welcome to add an author tag for significant additions of functionality. This can also help future contributors, in case they need to ask questions of the original author. If multiple authors are present for a class: provide details on who did what (“original implementation”, “added feature x” etc)
Provide informative comments throughout your code. This helps to keep all code maintainable.
Any new functionality should include unit tests (using JUnit) to test your code. This should include edge cases.
If you add a new layer type, you must include numerical gradient checks, as per these unit tests. These are necessary to confirm that the calculated gradients are correct
If you are adding significant new functionality, consider also updating the relevant section(s) of the website, and providing an example. After all, functionality that nobody knows about (or nobody knows how to use) isn’t that helpful. Adding documentation is definitely encouraged when appropriate, but not strictly required.
If you are unsure about something - ask us on the community forums!

Eclipse Contributors

IP/Copyright requirements for Eclipse Foundation Projects

This page explains steps required to contribute code to the projects in the eclipse/deeplearning4j GitHub repository: https://github.com/eclipse/deeplearning4j

Contributors (anyone who wants to commit code to the repository) need to do two things, before their code can be merged:

Sign the Eclipse Contributor Agreement (once)
Sign commits (each time)

Why Is This Required?

These two requirements must be satisfied for all Eclipse Foundation projects, not just DL4J and ND4J. A full list of Eclipse Foundation Projects can be found here: https://projects.eclipse.org/

By signing the ECA, you are essentially asserting that the code you are submitting is something that either you wrote, or that you have the right to contribute to the project. This is a necessary legal protection to avoid copyright issues.

By signing your commits, you are asserting that the code in that particular commit is your own.

Signing the Eclipse Contributor Agreement

You only need to sign the Eclipse Contributor Agreement (ECA) once. Here's the process:

Step 1: Sign up for an Eclipse account

This can be done at https://accounts.eclipse.org/user/register

Note: You must register using the same email as your GitHub account (the GitHub account you want to submit pull requests from).

Step 2: Sign the ECA

Go to https://accounts.eclipse.org/user/eca and follow the instructions.

Signing Your Commits

Signing a New Commit

There are a few ways to sign commits. Note that you can use any of these options.

Option 1: Use -s When Committing on Command Line

Signing commits here is simple:

git commit -s -m "My signed commit"

Note the use of -s (lower case s) - upper-case S (i.e., -S) is for GPG signing (see below).

Option 2: Set up Bash Alias (or Windows cmd Alias) for Automated Signing

For example, you could set up the following alias in Bash:

alias gcm='git commit -s -m'

Then committing would be done with the following:

gcm "My Commit"

For Windows command line, similar options are available through a few mechanisms (see here)

One simple way is to create a gcm.bat file with the following contents, and add it to your system path:

@echo off
echo.
git commit -s -m %*

You can then commit using the same process as above (i.e., gcm "My Commit")

Option 3: Use GPG Signing

For details on GPG signing, see this link

Note that this option can be combined with aliases (above), as in alias gcm='git commit -S -m' - note the upper case -S for GPG signing.

Option 4: Commit using IntelliJ with Auto Signing

IntelliJ can be used to perform git commits, including through signed commits. See this page for details.

Checking If A Commit Is Signed

After performing a commit, you can check in a few different ways. One way is to use git log --show-signature -1 to show the signature for the last commit (use -5 to show the last 5 commits, for example)

The output will look like:

$ git log --show-signature -2
commit 81681455918371e29da1490d3f0ca3deecaf0490 (HEAD -> commit_test_branch)
Author: YourName <[email protected]>
Date:   Fri Jun 21 22:27:50 2019 +1000

    This commit is unsigned

commit 2349c6aa3497bd65866d7d0a18fe82bb691bb868
Author: YourName <[email protected]>
Date:   Fri Jun 21 21:42:38 2019 +1000

    My signed commit

    Signed-off-by: YourName <[email protected]>

The top commit is unsigned, and the bottom commit is signed (note the presence of the Signed-off-by).

If You Forget to Sign a Commit - Amending the Last Commit

If you forgot to sign the last commit, you can use the following command:

git commit --amend --signoff

If You Forget to Sign Multiple Commits

Suppose your branch has 3 new commits, all of which are unsigned:

$ git log -4 --oneline
4b164026 (HEAD -> commit_test_branch) Your new commit 3
d7799615 Your new commit 2
6bb6113a Your new commit 1
ef09606c This commit already exists

One simple way is to squash and sign these commits. To do this for the last 3 commits, use the following: (note you might want to make a backup first)

git reset --soft HEAD~3
git commit -s -m "Squashed and signed"

The result:

$ git log -2 --oneline
31658e11 (HEAD -> commit_test_branch) Squashed and signed
ef09606c This commit already exists

You can confirm that the commit is signed using git log -1 --show-signature as shown earlier.

Note that your commits will be squashed once they are merged to master anyway, so the loss of the commit history does not matter.

If you are updating an existing PR, you may need to force push using -f (as in git push X -f).

About

Facts and introduction to Eclipse Deeplearning4j, the top JVM deep learning framework.

About Eclipse Deeplearning4j

Eclipse Deeplearning4j is an open-source, distributed deep-learning project in Java and Scala spearheaded by the people at Konduit, a business intelligence and enterprise software firm. We're a team of data scientists, deep-learning specialists, Java systems engineers and semi-sentient robots.

There are a lot of knobs to turn when you're training a distributed deep-learning network. We've done our best to explain them, so that Eclipse Deeplearning4j can serve as a DIY tool for Java, Scala and Clojure programmers working on Hadoop and other file systems.

Cite Eclipse Deeplearning4j

If you plan to publish an academic paper and wish to cite Deeplearning4j, please use this format:

Eclipse Deeplearning4j Development Team. Deeplearning4j: Open-source distributed deep learning for the JVM, Apache Software Foundation License 2.0.

Configuration

Backends

Hardware setup for Eclipse Deeplearning4j, including GPUs and CUDA.

ND4J works atop so-called backends, or linear-algebra libraries, such as nd4j-native (CPUs) and nd4j-cuda-10.2 (GPUs), which you can select by pasting the right dependency into your project’s pom.xml file.

ND4J backends for GPUs and CPUs

You can choose GPUs or native CPUs for your backend linear algebra operations by changing the ND4J dependencies in your project's pom.xml file. Your selection will affect both ND4J and DL4J being used in your application.

If you have CUDA v9.2+ installed and NVIDIA-compatible hardware, declare a dependency on the nd4j-cuda artifact that matches your installed CUDA version.

As of now, the artifactId for the CUDA versions can be one of nd4j-cuda-9.2, nd4j-cuda-10.0, nd4j-cuda-10.1 or nd4j-cuda-10.2.

You can also find the available CUDA versions via or in the .

Otherwise you will need to use the native implementation of ND4J, nd4j-native, as a CPU backend.

Building for Multiple Operating Systems

If you are developing your project on multiple operating systems/system architectures, you can add -platform to the end of your artifactId which will download binaries for most major systems.

Bundling multiple Backends

For enabling different backends at runtime, you set the priority with your environment via the environment variable

Relative to the priority, it will allow you to dynamically set the backend type.

CuDNN

See our page on cuDNN.

CUDA Installation

Check the NVIDIA guides for instructions on setting up CUDA.

Troubleshooting

Nd4jBackend$NoAvailableBackendException

There are multiple reasons why you might run into this error message.

You haven't configured an ND4J backend at all.
You have a jar file that doesn't contain a backend for your platform.
You have a jar file that doesn't contain service loader files.

You haven't configured any ND4J Backend

Read this page and add an ND4J backend to your dependencies.

You have a jar file that doesn't contain a backend for your platform.

This happens when you use a non -platform type backend dependency definition. In this case, only the Backend for the system that the jar file was built on will be included.

To solve this issue, use nd4j-native-platform instead of nd4j-native if you are running on CPU, and nd4j-cuda-10.2-platform instead of nd4j-cuda-10.2 when using the GPU backend.

If the jar file only contains the GPU backend, but your system has no CUDA capable (CC >= 3.5) GPU or CUDA isn't installed on the system, the CPU Backend should be used instead.

You have a jar file that doesn't contain service loader files.

ND4J uses the Java ServiceLoader in order to detect which backends are available on the class path. Depending on your uberjar packaging configuration, those files might be stripped away or broken.

To double check that the required files are included, open your uberjar and make sure it contains /META-INF/services/org.nd4j.linalg.factory.Nd4jBackend. Then open the file, and make sure there are entries for all of your configured backends.

If your uberjar does not contain that file, or if not all of the configured backends are listed there, you will have to reconfigure your shade plugin. See documentation for how to do that.
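The manual check above can also be approximated from inside the application. The sketch below uses only JDK calls to look up every copy of the service file on the class path and list its entries. The class name BackendServiceFileCheck and its methods are hypothetical helpers, not part of ND4J. Run it with the same class path (or from inside the same uberjar) as your application.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BackendServiceFileCheck {
    // Path of the service loader file named in the text above.
    static final String SERVICE_FILE =
            "META-INF/services/org.nd4j.linalg.factory.Nd4jBackend";

    /** Extracts provider class names from service-file text, skipping blanks and '#' comments. */
    static List<String> parseEntries(String text) {
        List<String> entries = new ArrayList<>();
        for (String line : text.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                entries.add(name);
            }
        }
        return entries;
    }

    /** Collects the entries from every copy of the service file visible to the class loader. */
    static List<String> findBackends(ClassLoader loader) throws IOException {
        List<String> found = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources(SERVICE_FILE))) {
            StringBuilder text = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    text.append(line).append('\n');
                }
            }
            found.addAll(parseEntries(text.toString()));
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        List<String> backends = findBackends(BackendServiceFileCheck.class.getClassLoader());
        if (backends.isEmpty()) {
            System.out.println("No entries found for " + SERVICE_FILE);
        } else {
            backends.forEach(b -> System.out.println("Backend entry: " + b));
        }
    }
}
```

An empty result suggests that the service file was stripped during packaging or that no backend dependency is present.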

CPU and AVX

CPU and AVX support in ND4J/Deeplearning4j

What is AVX, and why does it matter?

AVX (Advanced Vector Extensions) is a set of CPU instructions for accelerating numerical computations. See Wikipedia for more details.

Note that AVX only applies to nd4j-native (CPU) backend for x86 devices, not GPUs and not ARM/PPC devices.

Why AVX matters: performance. You want to use the version of ND4J compiled with the highest level of AVX supported by your system.

AVX support for different CPUs - summary:

Most modern x86 CPUs: AVX2 is supported
Some high-end server CPUs: AVX512 may be supported
Old CPUs (pre 2012) and low power x86 (Atom, Celeron): No AVX support (usually)

Note that CPUs supporting later versions of AVX include all earlier versions also. This means it's possible to run a generic x86 or AVX2 binary on a system supporting AVX512. However, it is not possible to run binaries built for later versions (such as avx512) on a CPU that doesn't have support for those instructions.

In version 1.0.0-beta6 and later you may get a warning as follows, if AVX is not configured optimally:

*********************************** CPU Feature Check Warning ***********************************
Warning: Initializing ND4J with Generic x86 binary on a CPU with AVX/AVX2 support
Using ND4J with AVX/AVX2 will improve performance. See deeplearning4j.org/cpu for more details
Or set environment variable ND4J_IGNORE_AVX=true to suppress this warning
************************************************************************************************

Configuring AVX in ND4J/DL4J

As noted earlier, for best performance you should use the version of ND4J that matches your CPU's supported AVX level.

ND4J's default configuration (when you include the nd4j-native or nd4j-native-platform dependencies without a Maven classifier) is "generic x86" (no AVX).

To configure AVX2 and AVX512, you need to specify a classifier for the appropriate architecture.

The following binaries (nd4j-native classifiers) are provided for x86 architectures:

Generic x86 (no AVX): linux-x86_64, windows-x86_64, macosx-x86_64
AVX2: linux-x86_64-avx2, windows-x86_64-avx2, macosx-x86_64-avx2
AVX512: linux-x86_64-avx512
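The choice of classifier from the list above can be sketched as a small lookup. This is an illustrative helper only: the class AvxClassifier and its enum are hypothetical, and the caller is assumed to already know the operating system name and the highest AVX level the CPU supports. Because only a Linux AVX512 binary is listed, other systems fall back to AVX2, which an AVX512-capable CPU can also run.

```java
final class AvxClassifier {
    /** Highest AVX level supported by the CPU. */
    enum Avx { NONE, AVX2, AVX512 }

    /**
     * Picks an nd4j-native classifier from the list above.
     * os must be "linux", "windows" or "macosx".
     */
    static String classifierFor(String os, Avx cpuSupport) {
        if (!os.equals("linux") && !os.equals("windows") && !os.equals("macosx")) {
            throw new IllegalArgumentException("Unsupported OS name: " + os);
        }
        String base = os + "-x86_64";
        switch (cpuSupport) {
            case AVX512:
                // Only linux-x86_64-avx512 is provided; elsewhere use the AVX2 binary.
                return os.equals("linux") ? base + "-avx512" : base + "-avx2";
            case AVX2:
                return base + "-avx2";
            default:
                return base; // generic x86, no AVX
        }
    }

    public static void main(String[] args) {
        System.out.println(classifierFor("windows", Avx.AVX2));  // windows-x86_64-avx2
        System.out.println(classifierFor("linux", Avx.AVX512));  // linux-x86_64-avx512
        System.out.println(classifierFor("macosx", Avx.AVX512)); // macosx-x86_64-avx2
        System.out.println(classifierFor("linux", Avx.NONE));    // linux-x86_64
    }
}
```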

Example: Configuring AVX2 on Windows (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>windows-x86_64-avx2</classifier>
</dependency>

Example: Configuring AVX512 on Linux (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>linux-x86_64-avx512</classifier>
</dependency>

Note that you need both nd4j-native dependencies - with and without the classifier.

In the examples above, it is assumed that a Maven property nd4j.version is set to an appropriate ND4J version such as 1.0.0-beta6.

cuDNN

Using the NVIDIA cuDNN library with DL4J.

Using Deeplearning4j with cuDNN

Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer, SubsamplingLayer, etc.), and also LSTM and BatchNormalization layers, support cuDNN.

The only thing we need to do to have DL4J load cuDNN is to add a dependency on deeplearning4j-cuda-10.0, deeplearning4j-cuda-10.1, or deeplearning4j-cuda-10.2.

The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA.

Note there are multiple combinations of cuDNN and CUDA supported. At this time the following combinations are supported by Deeplearning4j:

To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (/usr/local/cuda/lib64/ on Linux, /usr/local/cuda/lib/ on Mac OS X, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ on Windows).

Alternatively, in the case of CUDA 10.2, cuDNN comes bundled with the "redist" package of the . , we can add the following dependencies instead of installing CUDA and cuDNN:

Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try to reduce memory usage by using the , instead of the default of ConvolutionLayer.AlgoMode.PREFER_FASTEST, for example:

Memory Management

Setting available Memory/RAM for a DL4J application

Memory Management for ND4J/DL4J: How does it work?

ND4J uses off-heap memory to store NDArrays, to provide better performance while working with NDArrays from native code such as BLAS and CUDA libraries.

"Off-heap" means that the memory is allocated outside of the JVM (Java Virtual Machine) and hence isn't managed by the JVM's garbage collection (GC). On the Java/JVM side, we only hold pointers to the off-heap memory, which can be passed to the underlying C++ code via JNI for use in ND4J operations.

To manage memory allocations, we use two approaches:

JVM Garbage Collector (GC) and WeakReference tracking
MemoryWorkspaces - see Workspaces guide for details

Despite the differences between these two approaches, the idea is the same: once an NDArray is no longer required on the Java side, the off-heap memory associated with it should be released so that it can be reused later. The difference between the GC and MemoryWorkspaces approaches is in when and how the memory is released.

For JVM/GC memory: whenever an INDArray is collected by the garbage collector, its off-heap memory will be deallocated, assuming it is not used elsewhere.
For MemoryWorkspaces: whenever an INDArray leaves the workspace scope - for example, when a layer has finished its forward pass/predictions - its memory may be reused without deallocation and reallocation. This results in better performance for cyclical workloads like neural network training and inference.

Configuring Memory Limits

With DL4J/ND4J, there are two types of memory limits to be aware of and configure: The on-heap JVM memory limit, and the off-heap memory limit, where NDArrays live. Both limits are controlled via Java command-line arguments:

-Xms - this defines how much memory JVM heap will use at application start.
-Xmx - this sets the JVM heap memory limit (the maximum at any point). Memory is only allocated up to this amount, at the discretion of the JVM, if required.
-Dorg.bytedeco.javacpp.maxbytes - this allows you to specify the off-heap memory limit. This can also be a percentage, in which case it would apply to maxMemory.
-Dorg.bytedeco.javacpp.maxphysicalbytes - this specifies the maximum bytes for the entire process - usually set to maxbytes plus Xmx plus a bit extra, in case other libraries require some off-heap memory also. Unlike maxbytes, setting maxphysicalbytes is optional. This can also be a percentage (>100%), in which case it would apply to maxMemory.

Example: Configuring 1GB initial on-heap, 2GB max on-heap, 8GB off-heap, 10GB maximum for process:

-Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G

Gotchas: A few things to watch out for

With GPU systems, the maxbytes and maxphysicalbytes settings currently also effectively define the memory limit for the GPU, since the off-heap memory is mapped (via NDArrays) to the GPU - read more about this in the GPU section below.
For many applications, you want less RAM to be used in JVM heap, and more RAM to be used in off-heap, since all NDArrays are stored there. If you allocate too much to the JVM heap, there will not be enough memory left for the off-heap memory.
If you get a "RuntimeException: Can't allocate [HOST] memory: xxx; threadId: yyy", you have run out of off-heap memory. You should most often use a WorkspaceConfiguration to handle your NDArray allocations, in particular in training or evaluation/inference loops. If you do not, the NDArrays and their off-heap (and GPU) resources are reclaimed using the JVM GC, which might introduce severe latency and possible out-of-memory situations.
If you don't specify a JVM heap limit, the JVM will use 1/4 of your total system RAM as the limit by default.
If you don't specify an off-heap memory limit, the JVM heap limit (Xmx) will be used by default. For example, -Xmx8G means that 8GB can be used by the JVM heap, and an additional 8GB can be used by ND4J off-heap.
In limited-memory environments, it's usually a bad idea to use a high -Xmx value together with the -Xms option, because doing so won't leave enough off-heap memory. Consider a 16GB system in which you set -Xms14G: 14GB of 16GB would be allocated to the JVM, leaving only 2GB for the off-heap memory, the OS and all other programs.
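The arithmetic behind these gotchas can be sketched in a few lines of plain Java. The class MemoryBudget and its methods are hypothetical helpers written for this page, not part of ND4J or JavaCPP. The sketch assumes absolute sizes only (no percentages) and applies the default described above: when maxbytes is not given, the off-heap limit equals Xmx. It ignores the memory needed by the OS and other programs, so treat a passing check as a lower bound.

```java
final class MemoryBudget {
    /** Parses sizes such as "512M", "2G" or "1048576" (plain bytes) into a byte count. */
    static long parseBytes(String size) {
        String s = size.trim().toUpperCase();
        long unit = 1L;
        char last = s.charAt(s.length() - 1);
        if (last == 'K') {
            unit = 1L << 10;
        } else if (last == 'M') {
            unit = 1L << 20;
        } else if (last == 'G') {
            unit = 1L << 30;
        }
        String digits = unit == 1L ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * unit;
    }

    /** Off-heap limit in effect: maxbytes if given, otherwise the heap limit (Xmx). */
    static long effectiveOffHeap(String xmx, String maxBytesOrNull) {
        return maxBytesOrNull == null ? parseBytes(xmx) : parseBytes(maxBytesOrNull);
    }

    /** True if heap plus off-heap fits within the given amount of physical RAM. */
    static boolean fits(String xmx, String maxBytesOrNull, String physicalRam) {
        long total = parseBytes(xmx) + effectiveOffHeap(xmx, maxBytesOrNull);
        return total <= parseBytes(physicalRam);
    }

    public static void main(String[] args) {
        // 2GB heap + 8GB off-heap on a 16GB machine.
        System.out.println(fits("2G", "8G", "16G"));   // true
        // -Xmx14G without maxbytes implies another 14GB off-heap: too much for 16GB.
        System.out.println(fits("14G", null, "16G"));  // false
        // Limits that the running JVM actually sees.
        System.out.println("Heap limit (bytes): " + Runtime.getRuntime().maxMemory());
        System.out.println("maxbytes property: "
                + System.getProperty("org.bytedeco.javacpp.maxbytes"));
    }
}
```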

Memory-mapped files

ND4J supports the use of a memory-mapped file instead of RAM when using the nd4j-native backend. On one hand, it's slower than RAM, but on the other hand, it allows you to allocate memory chunks in a manner impossible otherwise.

Here's sample code:

WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
                .initialSize(1000000000)
                .policyLocation(LocationPolicy.MMAP)
                .build();

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
    INDArray x = Nd4j.create(10000);
}

In this case, a 1GB temporary file will be created and mmap'ed, and NDArray x will be created in that space. Obviously, this option is mostly viable for cases when you need NDArrays that can't fit into your RAM.
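For readers unfamiliar with memory mapping, the general mechanism can be shown with JDK classes alone. The sketch below is not ND4J's implementation: it only illustrates what it means for data to live in a mapped temporary file rather than in ordinary heap memory. The class name MmapSketch is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapSketch {
    /** Stores 0..n-1 as doubles in a memory-mapped temporary file and returns their sum. */
    static double sumViaMappedFile(int n) {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(tmp,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping past the current end of the file grows it to the mapped size.
                MappedByteBuffer buffer = channel.map(
                        FileChannel.MapMode.READ_WRITE, 0, (long) n * Double.BYTES);
                for (int i = 0; i < n; i++) {
                    buffer.putDouble(i * Double.BYTES, i); // backed by the file, not the heap
                }
                double sum = 0;
                for (int i = 0; i < n; i++) {
                    sum += buffer.getDouble(i * Double.BYTES);
                }
                return sum;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumViaMappedFile(1000)); // 499500.0
    }
}
```

The operating system pages the file contents in and out on demand, which is why this approach is slower than RAM but not bounded by it.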

GPUs

When using GPUs, oftentimes your CPU RAM will be greater than GPU RAM. When GPU RAM is less than CPU RAM, you need to monitor how much RAM is being used off-heap. You can check this based on the JavaCPP options specified above.

We allocate memory on the GPU equivalent to the amount of off-heap memory you specify. We don't use any more of your GPU than that. You are also allowed to specify heap space greater than your GPU (that's not encouraged, but it's possible). If you do so, your GPU will run out of RAM when trying to run jobs.

We also allocate off-heap memory in CPU RAM. This is for efficient communication between CPU and GPU, and it lets the CPU access data from an NDArray without having to fetch it from the GPU each time you call for it.

If JavaCPP or your GPU throw an out-of-memory error (OOM), or even if your compute slows down due to GPU memory being limited, then you may want to either decrease batch size or increase the amount of off-heap memory that JavaCPP is allowed to allocate, if that's possible.

Try to run with an off-heap memory equal to your GPU's RAM. Also, always remember to set up a small JVM heap space using the Xmx option.

Note that if your GPU has less than 2GB of RAM, it's probably not usable for deep learning. You should consider using your CPU if this is the case. Typical deep-learning workloads should have 4GB of RAM at minimum. Even that is small. 8GB of RAM on a GPU is recommended for deep learning workloads.

It is possible to use HOST-only memory with a CUDA backend. That can be done using workspaces.

Example:

WorkspaceConfiguration basicConfig = WorkspaceConfiguration.builder()
    .policyAllocation(AllocationPolicy.STRICT)
    .policyLearning(LearningPolicy.FIRST_LOOP)
    .policyMirroring(MirroringPolicy.HOST_ONLY) // <--- this option does this trick
    .policySpill(SpillPolicy.EXTERNAL)
    .build();

It's not recommended to use HOST-only arrays directly, since they will dramatically reduce performance. But they might be useful as an in-memory cache when paired with the INDArray.unsafeDuplication() method.

Memory Workspaces

Workspaces are an efficient model for memory paging in DL4J.

What are workspaces?

ND4J offers an additional memory-management model: workspaces. That allows you to reuse memory for cyclic workloads without the JVM Garbage Collector for off-heap memory tracking. In other words, at the end of the workspace loop, all INDArrays' memory content is invalidated. Workspaces are integrated into DL4J for training and inference.

The basic idea is simple: You can do what you need within a workspace (or spaces), and if you want to get an INDArray out of it (i.e. to move result out of the workspace), you just call INDArray.detach() and you'll get an independent INDArray copy.
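The invalidate-and-detach behaviour described above can be modelled with a toy arena in plain Java. This is a deliberately simplified analogy, not ND4J's workspace implementation: ToyWorkspace, Slice, allocate and reset are hypothetical names, and a double[] stands in for off-heap memory. It shows why an array that stays in the workspace is overwritten on the next loop, while a detached copy is not.

```java
import java.util.Arrays;

final class ToyWorkspace {
    private final double[] buffer; // allocated once, reused on every loop
    private int used = 0;

    ToyWorkspace(int capacity) {
        buffer = new double[capacity];
    }

    /** A view into the shared buffer. Its contents are only valid until the next reset(). */
    final class Slice {
        final int offset;
        final int length;

        Slice(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        double get(int i) { return buffer[offset + i]; }

        void set(int i, double value) { buffer[offset + i] = value; }

        /** Analogous to detach(): an independent copy that survives reset(). */
        double[] detach() {
            return Arrays.copyOfRange(buffer, offset, offset + length);
        }
    }

    /** Hands out the next free region of the buffer without any new allocation. */
    Slice allocate(int length) {
        if (used + length > buffer.length) {
            throw new IllegalStateException("Workspace is full");
        }
        Slice slice = new Slice(used, length);
        used += length;
        return slice;
    }

    /** End of one loop: the same memory is handed out again, so old slices become stale. */
    void reset() {
        used = 0;
    }

    public static void main(String[] args) {
        ToyWorkspace ws = new ToyWorkspace(4);
        Slice first = ws.allocate(2);
        first.set(0, 1.5);
        double[] kept = first.detach();   // copy the result out before the loop ends
        ws.reset();
        Slice second = ws.allocate(2);    // reuses the same memory as 'first'
        second.set(0, 9.0);
        System.out.println(first.get(0)); // 9.0: the stale slice was overwritten
        System.out.println(kept[0]);      // 1.5: the detached copy is unaffected
    }
}
```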

Neural Networks

For DL4J users, workspaces provide better performance out of the box, and are enabled by default from 1.0.0-alpha onwards. Thus for most users, no explicit workspaces configuration is required.

To benefit from workspaces, they need to be enabled. You can configure the workspace mode using:

.trainingWorkspaceMode(WorkspaceMode.SEPARATE) and/or .inferenceWorkspaceMode(WorkspaceMode.SINGLE) in your neural network configuration.

The difference between SEPARATE and SINGLE workspaces is a tradeoff between the performance & memory footprint:

SEPARATE is slightly slower, but uses less memory.
SINGLE is slightly faster, but uses more memory.

That said, it’s fine to use different modes for training & inference (i.e. use SEPARATE for training, and use SINGLE for inference, since inference only involves a feed-forward loop without backpropagation or updaters involved).

With workspaces enabled, all memory used during training will be reusable and tracked without the JVM GC interference. The only exclusion is the output() method that uses workspaces (if enabled) internally for the feed-forward loop. Subsequently, it detaches the resulting INDArray from the workspaces, thus providing you with independent INDArray which will be handled by the JVM GC.

Please note: After the 1.0.0-alpha release, workspaces in DL4J were refactored - SEPARATE/SINGLE modes have been deprecated, and users should use ENABLED instead.

Garbage Collector

If your training process uses workspaces, we recommend that you disable (or reduce the frequency of) periodic GC calls. That can be done like so:

// this will limit frequency of gc calls to 5000 milliseconds
Nd4j.getMemoryManager().setAutoGcWindow(5000);

// OR you could totally disable it
Nd4j.getMemoryManager().togglePeriodicGc(false);

Put that somewhere before your model.fit(...) call.

ParallelWrapper & ParallelInference

For ParallelWrapper, the workspace-mode configuration option was also added. As such, each of the trainer threads will use a separate workspace attached to the designated device.

ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
      // DataSets prefetching options. Buffer size per worker.
      .prefetchBuffer(8)

      // set number of workers equal to number of GPUs.
      .workers(2)

      // rare averaging improves performance but might reduce model accuracy
      .averagingFrequency(5)

      // if set to TRUE, on every averaging model score will be reported
      .reportScoreAfterAveraging(false)

      // 3 options here: NONE, SINGLE, SEPARATE
      .workspaceMode(WorkspaceMode.SINGLE)

      .build();

Iterators

We provide asynchronous prefetch iterators, AsyncDataSetIterator and AsyncMultiDataSetIterator, which are usually used internally.

These iterators optionally use a special, cyclic workspace mode to obtain a smaller memory footprint. The size of the workspace, in this case, will be determined by the memory requirements of the first DataSet coming out of the underlying iterator, whereas the buffer size is defined by the user. The workspace will be adjusted if memory requirements change over time (e.g. if you’re using variable-length time series).

Caution: If you’re using a custom iterator or the RecordReader, please make sure you’re not initializing something huge within the first next() call. Do that in your constructor to avoid undesired workspace growth.

Caution: When AsyncDataSetIterator is used, each DataSet is supposed to be consumed before next() is called again. You are not supposed to store DataSets in any way without calling detach(). Otherwise, the memory used for the INDArrays within a DataSet will eventually be overwritten by AsyncDataSetIterator.

If for some reason you don’t want your iterator to be wrapped into an asynchronous prefetch (e.g. for debugging purposes), special wrappers are provided: AsyncShieldDataSetIterator and AsyncShieldMultiDataSetIterator. Basically, those are just thin wrappers that prevent prefetch.

Evaluation

Usually, evaluation assumes use of the model.output() method, which essentially returns an INDArray detached from the workspace. In the case of regular evaluations during training, it might be better to use the built-in methods for evaluation. For example:

Evaluation eval = new Evaluation(outputNum);
ROC roceval = new ROC(outputNum);
model.doEvaluation(iteratorTest, eval, roceval);

This piece of code will run a single cycle over iteratorTest, and it will update both IEvaluation implementations (or fewer/more, depending on your needs) without any additional INDArray allocation.

Workspace Destruction

There are also some situations, say, where you're short on RAM, in which you might want to release all workspaces created outside of your control, e.g. during evaluation or training.

That could be done like so: Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();

This method will destroy all workspaces that were created within the calling thread. If you've created workspaces in some external threads on your own, you can use the same method in that thread, after the workspaces are no longer needed.

Workspace Exceptions

If workspaces are used incorrectly (such as a bug in a custom layer or data pipeline, for example), you may see an error message such as:

org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses leaked workspace pointer from workspace [LOOP_EXTERNAL]
For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic

DL4J's LayerWorkspaceMgr

DL4J's Layer API includes the concept of a "layer workspace manager".

The idea with this class is that it allows us to easily and precisely control the location of a given array, given different possible configurations for the workspaces. For example, the activations out of a layer may be placed in one workspace during inference, and another during training; this is for performance reasons. However, with the LayerWorkspaceMgr design, implementers of layers don't need to worry about this.

What does this mean in practice? Usually it's quite simple...

When returning activations (activate(boolean training, LayerWorkspaceMgr workspaceMgr) method), make sure the returned array is defined in ArrayType.ACTIVATIONS (i.e., use LayerWorkspaceMgr.create(ArrayType.ACTIVATIONS, ...) or similar)
When returning activation gradients (backpropGradient(INDArray epsilon, LayerWorkspaceMgr workspaceMgr)), similarly return an array defined in ArrayType.ACTIVATION_GRAD

You can also leverage an array defined in any workspace to the appropriate workspace using, for example, LayerWorkspaceMgr.leverageTo(ArrayType.ACTIVATIONS, myArray)

Note that if you are not implementing a custom layer (and instead just want to perform forward pass for a layer outside of a MultiLayerNetwork/ComputationGraph) you can use LayerWorkspaceMgr.noWorkspaces().

Snapshots

Using daily builds for access to latest Eclipse Deeplearning4j features.

Introduction to Snapshots
Setup Instructions
Limitations
Configuration of ND4J Backend
Note to Gradle Users

Overview/Introduction

We provide automated daily builds of repositories such as ND4J, DataVec, DeepLearning4j, RL4J etc. So all the newest functionality and most recent bug fixes are released daily.

Snapshots work like any other Maven dependency. The only difference is that they are served from a custom repository rather than from Maven Central.

Due to ongoing development, snapshots should be considered less stable than releases: breaking changes or bugs can in principle be introduced at any point during the course of normal development. Typically, releases (not snapshots) should be used when possible, unless a bug fix or new feature is required.

Setup Instructions

Step 1: To use snapshots in your project, you should add snapshot repository information like this to your pom.xml file:

<repositories>
    <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>  <!-- Optional, update daily -->
        </snapshots>
    </repository>
</repositories>

Step 2: Make sure to specify the snapshot version. We follow a simple rule: If the latest stable release version is A.B.C, the snapshot version will be A.B.(C+1)-SNAPSHOT. The current snapshot version is 1.0.0-SNAPSHOT. For more details on the repositories section of the pom.xml file, see the Maven documentation.
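The versioning rule can be written down as a small function. SnapshotVersion and nextSnapshot are hypothetical names used only for illustration. The handling of pre-release versions is an assumption inferred from this page, where the release 1.0.0-beta6 corresponds to the snapshot 1.0.0-SNAPSHOT: the qualifier is dropped and the base version is kept as-is.

```java
final class SnapshotVersion {
    /** Maps a release version to the snapshot version that follows it. */
    static String nextSnapshot(String release) {
        int dash = release.indexOf('-');
        if (dash >= 0) {
            // Assumption: a pre-release such as 1.0.0-beta6 keeps its base version.
            return release.substring(0, dash) + "-SNAPSHOT";
        }
        String[] parts = release.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected a version of the form A.B.C: " + release);
        }
        // Stable release A.B.C becomes A.B.(C+1)-SNAPSHOT.
        int patch = Integer.parseInt(parts[2]);
        return parts[0] + "." + parts[1] + "." + (patch + 1) + "-SNAPSHOT";
    }

    public static void main(String[] args) {
        System.out.println(nextSnapshot("0.9.1"));       // 0.9.2-SNAPSHOT
        System.out.println(nextSnapshot("1.0.0-beta6")); // 1.0.0-SNAPSHOT
    }
}
```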

If using properties like the DL4J examples, change from version:

<dl4j.version>1.0.0-beta6</dl4j.version>
<nd4j.version>1.0.0-beta6</nd4j.version>

To version:

<dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
<nd4j.version>1.0.0-SNAPSHOT</nd4j.version>

Sample pom.xml using Snapshots

A sample pom.xml using snapshots is provided here. This has been taken from the DL4J standalone sample project and modified using steps 1 and 2 above. The original (using the last release) can be found here.

Limitations

Both -platform (all operating systems) and single OS (non-platform) snapshot dependencies are released. Due to the multi-platform build nature of snapshots, it is possible (though rare) for the -platform artifacts to temporarily get out of sync, which can cause build issues.

If you are building and deploying on just one platform, it is safer to use the non-platform artifacts, such as:

        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-native</artifactId>
            <version>${nd4j.version}</version>
        </dependency>

Useful Maven Commands for Snapshots

Two commands that might be useful when using snapshot dependencies in Maven are as follows:

1. -U - for example, in mvn package -U. This option forces Maven to check for (and, if necessary, download) new snapshot releases. This can be useful if you need to be sure you have the absolute latest snapshot release.
2. -nsu - for example, in mvn package -nsu. This option stops Maven from checking for snapshot releases. Note, however, that your build will only succeed with this option if you already have the snapshot dependencies downloaded into your local Maven cache (.m2 directory).

An alternative approach to (1) is to set <updatePolicy>always</updatePolicy> in the <repositories> section found earlier in this page. An alternative approach to (2) is to set <updatePolicy>never</updatePolicy> in the <repositories> section found earlier in this page.

Note to Gradle users

Snapshots will not work with Gradle. You must use Maven to download the files. After that, you may try using your local Maven repository with mavenLocal().

In order to download specific snapshot artifacts into your local Maven repository, you can run the following Maven command.

mvn dependency:get -DremoteRepositories=snapshots::::https://oss.sonatype.org/content/repositories/snapshots -Dartifact=org.nd4j:nd4j-native:1.0.0-SNAPSHOT:jar:macosx-x86_64

In this example, it will download the nd4j-native (CPU backend) artifact for macOS. If you are on Windows or Linux, you'd use windows-x86_64 or linux-x86_64 respectively.

A bare minimum file like the following should work in theory, but it does not. This is due to a bug in Gradle. Gradle with snapshots and Maven classifiers appears to be a problem.

version '1.0-SNAPSHOT'

apply plugin: 'java'

sourceCompatibility = 1.8

repositories {
    maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
    mavenCentral()
}

dependencies {
    compile group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '1.0.0-SNAPSHOT'
    compile group: 'org.deeplearning4j', name: 'deeplearning4j-modelimport', version: '1.0.0-SNAPSHOT'
    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT"
    // Use windows-x86_64 or linux-x86_64 if you are not on macos
    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT:macosx-x86_64"
    testCompile group: 'junit', name: 'junit', version: '4.12'

}

Note that when using the nd4j-native backend (in contrast to nd4j-native-platform) on Gradle (and SBT, but not Maven), you need to add openblas as a dependency. We do this for you in the -platform pom. Reference the -platform pom here to double check your dependencies. Note that these are version properties. See the <properties> section of the pom for current versions of the openblas and javacpp presets required to run nd4j-native.

Maven

Configure the Maven build tool for Deeplearning4j.

Configuring the Maven build tool

You can use Deeplearning4j with Maven by adding a dependency on org.deeplearning4j:deeplearning4j-core to your pom.xml.

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Add a backend

DL4J relies on ND4J for hardware-specific implementations and tensor operations. Add a backend, such as org.nd4j:nd4j-native-platform, as a dependency in your pom.xml.

You can also swap the standard CPU implementation for GPUs.

SBT, Gradle, & Others

Configure the build tools for Deeplearning4j.

Configuring your build tool

While we encourage Deeplearning4j, ND4J and DataVec users to employ Maven, it's worthwhile documenting how to configure build files for other tools, like Ivy, Gradle and SBT -- particularly since Google prefers Gradle over Maven for Android projects.

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Gradle

You can use Deeplearning4j with Gradle by adding the following to your build.gradle in the dependencies block:

implementation "org.deeplearning4j:deeplearning4j-core:1.0.0-beta6"

Add a backend by adding the following:

implementation "org.nd4j:nd4j-native-platform:1.0.0-beta6"

You can also swap the standard CPU implementation for GPUs.

SBT

You can use Deeplearning4j with SBT by adding the following to your build.sbt:

libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-beta6"

Add a backend by adding the following:

libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "1.0.0-beta6"

You can also swap the standard CPU implementation for GPUs.

Ivy

You can use Deeplearning4j with Ivy by adding the following to your ivy.xml:

<dependency org="org.deeplearning4j" name="deeplearning4j-core" rev="1.0.0-beta6" conf="build" />

Add a backend by adding the following:

<dependency org="org.nd4j" name="nd4j-native-platform" rev="1.0.0-beta6" conf="build" />

You can also swap the standard CPU implementation for GPUs.

Leiningen

Clojure programmers may want to use Leiningen or Boot to work with Maven. A Leiningen tutorial is here.

NOTE: You'll still need to download ND4J, DataVec and Deeplearning4j, or double-click on their respective JAR files downloaded by Maven / Ivy / Gradle, to install them in your Eclipse installation.

Models

Autoencoders

What are autoencoders?

Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.

Where’s Restricted Boltzmann Machine?

RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.

Supported layers

AutoEncoder

Autoencoder layer. Adds noise to the input and learns a reconstruction function.

corruptionLevel

public Builder corruptionLevel(double corruptionLevel)

Level of corruption - 0.0 (none) to 1.0 (all values corrupted)

sparsity

public Builder sparsity(double sparsity)

Autoencoder sparsity parameter

param sparsity Sparsity

VariationalAutoencoder

Variational Autoencoder layer

See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes - https://arxiv.org/abs/1312.6114

This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.

A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.

encoderLayerSizes

public Builder encoderLayerSizes(int... encoderLayerSizes)

Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer). Typically the number and size of the decoder layers (set via decoderLayerSizes(int...)) is similar to the encoder layers.

setEncoderLayerSizes

public void setEncoderLayerSizes(int... encoderLayerSizes)

param encoderLayerSizes Size of each encoder layer in the variational autoencoder

decoderLayerSizes

public Builder decoderLayerSizes(int... decoderLayerSizes)

Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer). Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int...)).

param decoderLayerSizes Size of each decoder layer in the variational autoencoder

setDecoderLayerSizes

public void setDecoderLayerSizes(int... decoderLayerSizes)

param decoderLayerSizes Size of each decoder layer in the variational autoencoder

reconstructionDistribution

public Builder reconstructionDistribution(ReconstructionDistribution distribution)

The reconstruction distribution for the data given the hidden state, i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example:

GaussianReconstructionDistribution + {identity or tanh} for real-valued (Gaussian) data
BernoulliReconstructionDistribution + sigmoid for binary-valued (0 or 1) data

param distribution Reconstruction distribution

lossFunction

public Builder lossFunction(IActivation outputActivationFn, LossFunctions.LossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this does NOT follow the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output, i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: setting the loss function here will override any previously set reconstruction distribution.

param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use

lossFunction

public Builder lossFunction(Activation outputActivationFn, LossFunctions.LossFunction lossFunction)

param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use

lossFunction

public Builder lossFunction(IActivation outputActivationFn, ILossFunction lossFunction)

param outputActivationFn Activation function for the output/reconstruction
param lossFunction Loss function to use

pzxActivationFn

public Builder pzxActivationFn(IActivation activationFunction)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

param activationFunction Activation function for p(z| x)

pzxActivationFunction

public Builder pzxActivationFunction(Activation activation)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

param activation Activation function for p(z | x)

nOut

public Builder nOut(int nOut)

Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.

param nOut Size of P(Z | data) and output size

numSamples

public Builder numSamples(int numSamples)

Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.

This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”

param numSamples Number of samples per data point for pretraining

Multilayer Network

Simple and sequential network configuration.

The MultiLayerNetwork class is the simplest network configuration API available in Eclipse Deeplearning4j. This class is useful for beginners or users who do not need a complex and branched network graph.

You will not want to use MultiLayerNetwork configuration if you are creating complex loss functions, using graph vertices, or doing advanced training such as a triplet network. This includes popular complex networks such as InceptionV4.

Usage

The example below shows how to build a simple linear classifier using DenseLayer (a basic fully connected layer, as used in multilayer perceptrons).

You can also create convolutional configurations:

Vertices

Computation graph nodes for advanced configuration.

What is a vertex?

In Eclipse Deeplearning4j a vertex is a type of layer that acts as a node in a ComputationGraph. It can accept multiple inputs, provide multiple outputs, and can help construct popular networks such as InceptionV4.

Available Vertices

L2NormalizeVertex

L2NormalizeVertex performs L2 normalization on a single input.
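As a rough illustration of what L2 normalization does to a single input vector, the plain-Java sketch below divides each element by the vector's Euclidean norm. It does not use the DL4J API: the class name L2NormalizeSketch and the small eps guard against division by zero are assumptions made for this example, and the actual vertex may handle dimensions and epsilon differently.

```java
// Plain-Java sketch of L2 normalization (not the DL4J implementation).
final class L2NormalizeSketch {

    // Returns x / max(||x||_2, eps); eps is an assumed guard against division by zero.
    static double[] l2Normalize(double[] x, double eps) {
        double sumOfSquares = 0.0;
        for (double v : x) {
            sumOfSquares += v * v;
        }
        double norm = Math.max(Math.sqrt(sumOfSquares), eps);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] y = l2Normalize(new double[]{3.0, 4.0}, 1e-8);
        System.out.println(y[0] + ", " + y[1]); // roughly 0.6, 0.8
    }
}
```

After normalization the vector has unit length, which is why this step is often placed in front of distance-based losses.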

L2Vertex

L2Vertex calculates the L2 least squares error of two inputs.

For example, in Triplet Embedding you can input an anchor and a pos/neg class and use two parallel L2 vertices to calculate two real numbers which can be fed into a LossLayer to calculate TripletLoss.

PoolHelperVertex

A custom layer for removing the first column and row from an input. This is meant to allow importation of Caffe’s GoogLeNet from https://gist.github.com/joelouismarino/a2ede9ab3928f999575423b9887abd14.

ReshapeVertex

Adds the ability to reshape and flatten the tensor in the computation graph. This is the equivalent to the next layer. ReshapeVertex also ensures the shape is valid for the backward pass.

ScaleVertex

A ScaleVertex is used to scale the size of activations of a single layer. For example, ResNet activations can be scaled in repeating blocks to keep variance under control.

ShiftVertex

A ShiftVertex is used to shift the activations of a single layer. One could use it to add a bias or as part of some other calculation. For example, Highway Layers need them in two places. One, it's often useful to have the gate weights have a large negative bias. (Of course for this, we could just initialize the biases that way.) But, also it needs to do this:

(1 - sigmoid(weight * input + bias)) (*) input + sigmoid(weight * input + bias) (*) activation(w2 * input + bias)

where (*) is the Hadamard product. So, here, we could have

a DenseLayer that does the sigmoid
a ScaleVertex(-1) and
a ShiftVertex(1) to accomplish that.
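The scale-then-shift idea can be sketched in plain Java. This is not DL4J code: the class HighwayGateSketch and its helper methods are hypothetical and only mirror the roles of the dense sigmoid layer, ScaleVertex(-1) and ShiftVertex(1) to show that the combination yields 1 - sigmoid(z).

```java
// Plain-Java sketch: building (1 - sigmoid(z)) from a scale by -1 followed by a shift by 1.
final class HighwayGateSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Plays the role of a ScaleVertex: multiply every activation by a constant.
    static double[] scale(double[] activations, double factor) {
        double[] out = new double[activations.length];
        for (int i = 0; i < activations.length; i++) {
            out[i] = activations[i] * factor;
        }
        return out;
    }

    // Plays the role of a ShiftVertex: add a constant to every activation.
    static double[] shift(double[] activations, double amount) {
        double[] out = new double[activations.length];
        for (int i = 0; i < activations.length; i++) {
            out[i] = activations[i] + amount;
        }
        return out;
    }

    // Carry gate 1 - sigmoid(z), computed as shift(scale(sigmoid(z), -1), 1).
    static double[] carryGate(double[] preActivations) {
        double[] gate = new double[preActivations.length];
        for (int i = 0; i < preActivations.length; i++) {
            gate[i] = sigmoid(preActivations[i]);
        }
        return shift(scale(gate, -1.0), 1.0);
    }

    public static void main(String[] args) {
        System.out.println(carryGate(new double[]{0.0})[0]); // prints 0.5
    }
}
```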

StackVertex

StackVertex allows for stacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where shared parameters are not supported by the network.

This vertex will automatically stack all available inputs.

UnstackVertex

UnstackVertex allows for unstacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where embeddings can be separated and run through subsequent layers.

Works similarly to SubsetVertex, except on dimension 0 of the input. stackSize is explicitly defined by the user to properly calculate a step.

ReverseTimeSeriesVertex

ReverseTimeSeriesVertex is used in recurrent neural networks to reverse the order of a time series. As a result, the last time step is moved to the beginning of the time series and the first time step is moved to the end. This allows recurrent layers to process time series backward.

Masks: The input might be masked (to allow for varying time series lengths in one minibatch). In this case the present input (mask array = 1) will be reversed in place and the padding (mask array = 0) will be left untouched at the same place. For a time series of length n, this would normally mean that the first n time steps are reversed and the following padding is left untouched, but more complex masks are supported (e.g. [1, 0, 1, 0, …]).
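To make the masking behaviour concrete, here is a minimal plain-Java sketch for a single one-dimensional time series. It is an illustration of the rule described above, not the vertex's implementation (which works on minibatches of multi-feature series); the class MaskedReverseSketch and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of reversing a masked time series (single series, single feature).
final class MaskedReverseSketch {

    // Reverses the values at positions where mask == 1; positions where mask == 0 keep their value.
    static double[] reverseMasked(double[] series, int[] mask) {
        List<Integer> present = new ArrayList<>();
        for (int t = 0; t < series.length; t++) {
            if (mask[t] == 1) {
                present.add(t);
            }
        }
        double[] out = series.clone();
        int n = present.size();
        for (int k = 0; k < n; k++) {
            // The k-th present slot receives the value of the k-th present slot counted from the end.
            out[present.get(k)] = series[present.get(n - 1 - k)];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] reversed = reverseMasked(new double[]{1, 2, 3, 0, 0}, new int[]{1, 1, 1, 0, 0});
        System.out.println(java.util.Arrays.toString(reversed)); // [3.0, 2.0, 1.0, 0.0, 0.0]
    }
}
```

With the alternating mask [1, 0, 1, 0], only the first and third time steps swap places; the masked steps stay where they are.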

setBackpropGradientsViewArray

public void setBackpropGradientsViewArray(INDArray backpropGradientsViewArray)

Gets the current mask array from the provided input

return The mask or null, if no input was provided

Custom Layers

Extend DL4J functionality for custom layers.

There are two components to adding a custom layer:

Adding the layer configuration class: extends org.deeplearning4j.nn.conf.layers.Layer
Adding the layer implementation class: implements org.deeplearning4j.nn.api.Layer

The configuration layer ((1) above) class handles the settings. It's the one you would use when constructing a MultiLayerNetwork or ComputationGraph. You can add custom settings here, and use them in your layer.

The implementation layer ((2) above) class has parameters, and handles network forward pass, backpropagation, etc. It is created from the org.deeplearning4j.nn.conf.layers.Layer.instantiate(...) method. In other words: the instantiate method is how we go from the configuration to the implementation; MultiLayerNetwork or ComputationGraph will call this method when initializing the

Examples of these are CustomLayer (the configuration class) and CustomLayerImpl (the implementation class). Both of these classes have extensive comments regarding their methods.

You'll note that in Deeplearning4j there are two DenseLayer classes, two GravesLSTM classes, etc.: the reason is that one is for the configuration and one is for the implementation. We have not followed this "same name" pattern here, to hopefully avoid confusion.

Testing Your Custom Layer

Once you have added a custom layer, it is necessary to run some tests to ensure it is correct.

These tests should at a minimum include the following:

Tests to ensure that the JSON configuration (to/from JSON) works correctly. This is necessary for networks with your custom layer to function with both model serialization (saving) and Spark training.
Gradient checks to ensure that the implementation is correct.
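The idea behind a gradient check can be sketched without any DL4J classes: compare the gradient your code computes analytically against a numerical estimate obtained by central differences. The class GradientCheckSketch, the step size and the tolerance below are assumptions for illustration only; a check on a real layer would perturb each network parameter in turn.

```java
import java.util.function.DoubleUnaryOperator;

// Plain-Java sketch of a gradient check on a scalar function.
final class GradientCheckSketch {

    // Central-difference estimate of df/dx at x.
    static double numericalGradient(DoubleUnaryOperator f, double x, double eps) {
        return (f.applyAsDouble(x + eps) - f.applyAsDouble(x - eps)) / (2.0 * eps);
    }

    // Relative error between the analytic and numerical gradients.
    static double relativeError(double analytic, double numeric) {
        double denominator = Math.abs(analytic) + Math.abs(numeric);
        return denominator == 0.0 ? 0.0 : Math.abs(analytic - numeric) / denominator;
    }

    public static void main(String[] args) {
        double x = 0.3;
        double analytic = 1.0 - Math.tanh(x) * Math.tanh(x); // derivative of tanh
        double numeric = numericalGradient(Math::tanh, x, 1e-6);
        // The tolerance is an assumed value; a small relative error suggests the analytic gradient is right.
        System.out.println(relativeError(analytic, numeric) < 1e-6 ? "gradient OK" : "gradient MISMATCH");
    }
}
```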

Example

A full custom layer example is available in our .

Updaters

Special algorithms for gradient descent.

What are updaters?

The main difference among the updaters is how they treat the learning rate. Stochastic Gradient Descent, the most common learning algorithm in deep learning, relies on Theta (the weights in hidden layers) and alpha (the learning rate). Different updaters help optimize the learning rate until the neural network converges on its most performant state.
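As a point of reference for the updaters listed below, a plain stochastic gradient descent step simply moves each weight against its gradient, scaled by the learning rate. The following is a minimal plain-Java sketch under that textbook definition; the class SgdStepSketch is a hypothetical name and not part of DL4J.

```java
// Plain-Java sketch of one SGD step: theta <- theta - alpha * gradient.
final class SgdStepSketch {

    // Updates theta in place using learning rate alpha.
    static void sgdStep(double[] theta, double[] gradient, double alpha) {
        for (int i = 0; i < theta.length; i++) {
            theta[i] -= alpha * gradient[i];
        }
    }

    public static void main(String[] args) {
        double[] theta = {1.0, -2.0};
        sgdStep(theta, new double[]{0.5, -1.0}, 0.1);
        System.out.println(theta[0] + ", " + theta[1]); // roughly 0.95, -1.9
    }
}
```

The other updaters keep this basic shape but replace the fixed alpha with a per-parameter step that depends on the history of gradients.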

Usage

To use an updater, pass a new instance of its class to the updater() method when configuring either a ComputationGraph or a MultiLayerNetwork.

Available updaters

NadamUpdater

The Nadam updater. https://arxiv.org/pdf/1609.04747.pdf

applyUpdater

Calculate the update based on the given gradient

param gradient the gradient to get the update for
param iteration
return the gradient

NesterovsUpdater

Nesterov’s momentum. Keep track of the previous layer’s gradient and use it as a way of updating the gradient.

applyUpdater

Get the nesterov update

param gradient the gradient to get the update for
param iteration
return

RmsPropUpdater

RMS Prop updates:

http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf http://cs231n.github.io/neural-networks-3/#ada

AdaGradUpdater

Vectorized Learning Rate used per Connection Weight

Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent See also http://cs231n.github.io/neural-networks-3/#ada

applyUpdater

Gets feature-specific learning rates. Adagrad keeps a history of gradients being passed in. Note that each gradient passed in becomes adapted over time, hence the name adagrad.

param gradient the gradient to get learning rates for
param iteration

AdaMaxUpdater

The AdaMax updater, a variant of Adam. http://arxiv.org/abs/1412.6980

applyUpdater

Calculate the update based on the given gradient

param gradient the gradient to get the update for
param iteration
return the gradient

NoOpUpdater

NoOp updater: gradient updater that makes no changes to the gradient

AdamUpdater

The Adam updater. http://arxiv.org/abs/1412.6980

applyUpdater

Calculate the update based on the given gradient

param gradient the gradient to get the update for
param iteration
return the gradient
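For readers who want to see what "calculate the update based on the given gradient" involves, here is a plain-Java sketch that follows the update rule in the Adam paper linked above. It is not ND4J's AdamUpdater: the class AdamSketch, its constructor parameters and the choice to return a new array are assumptions, and the library's implementation may differ in details such as in-place updates or where epsilon is applied.

```java
// Plain-Java sketch of the Adam update rule (Kingma & Ba), not ND4J's AdamUpdater.
final class AdamSketch {
    private final double learningRate;
    private final double beta1;
    private final double beta2;
    private final double epsilon;
    private final double[] m; // running mean of gradients
    private final double[] v; // running mean of squared gradients

    AdamSketch(int size, double learningRate, double beta1, double beta2, double epsilon) {
        this.learningRate = learningRate;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.epsilon = epsilon;
        this.m = new double[size];
        this.v = new double[size];
    }

    // Returns the amount to subtract from each parameter; iteration counts from 0.
    double[] applyUpdater(double[] gradient, int iteration) {
        int t = iteration + 1;
        double[] update = new double[gradient.length];
        for (int i = 0; i < gradient.length; i++) {
            m[i] = beta1 * m[i] + (1.0 - beta1) * gradient[i];
            v[i] = beta2 * v[i] + (1.0 - beta2) * gradient[i] * gradient[i];
            double mHat = m[i] / (1.0 - Math.pow(beta1, t)); // bias-corrected first moment
            double vHat = v[i] / (1.0 - Math.pow(beta2, t)); // bias-corrected second moment
            update[i] = learningRate * mHat / (Math.sqrt(vHat) + epsilon);
        }
        return update;
    }

    public static void main(String[] args) {
        AdamSketch adam = new AdamSketch(2, 0.001, 0.9, 0.999, 1e-8);
        double[] update = adam.applyUpdater(new double[]{2.0, -3.0}, 0);
        System.out.println(update[0] + ", " + update[1]); // close to 0.001 and -0.001
    }
}
```

On the first step the bias correction makes the update roughly learningRate times the sign of the gradient, regardless of the gradient's magnitude.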

AdaDeltaUpdater

http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf https://arxiv.org/pdf/1212.5701v1.pdf

Ada delta updater. A more robust adagrad that keeps track of a moving window average of the gradient rather than the ever-decaying learning rates of adagrad.

applyUpdater

Get the updated gradient for the given gradient and also update the state of ada delta.

param gradient the gradient to get the updated gradient for
param iteration
return the update gradient

SgdUpdater

SGD updater applies a learning rate only

GradientUpdater

Gradient modifications: Calculates an update and tracks related information for gradient changes over time for handling updates.

AMSGradUpdater

The AMSGrad updater. Reference: On the Convergence of Adam and Beyond - https://openreview.net/forum?id=ryQu7f-RZ

Model Zoo

Overview

Prebuilt model architectures and weights for out-of-the-box application.

Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.

If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-zoo</artifactId>
    <version>1.0.0-beta6</version>
</dependency>

Getting started

Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel abstract class and uses the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.

Initializing fresh configurations

You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:

import org.deeplearning4j.zoo.model.AlexNet;
import org.deeplearning4j.zoo.*;

...

int numberOfClassesInYourData = 1000;
int randomSeed = 123;

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
Model net = zooModel.init();

If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
MultiLayerConfiguration net = ((AlexNet) zooModel).conf();

Initializing pretrained weights

Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enum that outlines the different weight types, which include IMAGENET, MNIST, CIFAR10, and VGGFACE.

For example, you can initialize a VGG-16 model with ImageNet weights like so:

import org.deeplearning4j.zoo.model.VGG16;
import org.deeplearning4j.zoo.*;

...

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.IMAGENET);

And initialize another VGG16 model with weights trained on VGGFace:

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.VGGFACE);

If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.

Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.

What's in the zoo?

The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.

You can find a complete list of models using this deeplearning4j-zoo GitHub link.

This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.

Advanced usage

The zoo comes with a couple of additional features if you're looking to use the models for different use cases.

Changing Inputs

Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape().

NOTE: this applies to fresh configurations only, and will not affect pretrained models:

int numberOfClassesInYourData = 10;
int randomSeed = 123;

ZooModel zooModel = ResNet50.builder()
        .numClasses(numberOfClassesInYourData)
        .seed(randomSeed)
        .build();
zooModel.setInputShape(new int[][]{{3, 28, 28}});

Transfer Learning

Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J here.

Workspaces

Initialization methods often have an additional parameter named workspaceMode. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see this section.

ND4J

Basics

Elementwise Operations And Basic Usage

The basic operations of linear algebra are matrix creation, addition and multiplication. This guide will show you how to perform those operations with ND4J, as well as various advanced transforms.

The Java code below will create a simple 2 x 2 matrix, populate it with integers, and place it in the nd-array variable nd:

INDArray nd = Nd4j.create(new float[]{1,2,3,4},new int[]{2,2});

If you print out this array

System.out.println(nd);

you’ll see this

[[1.0 ,3.0]
[2.0 ,4.0]
]

A matrix with two rows and two columns, which orders its elements by column and which we’ll call matrix nd.

A matrix that ordered its elements by row would look like this:

[[1.0 ,2.0]
[3.0 ,4.0]
]

Elementwise scalar operations

The simplest operations you can perform on a matrix are elementwise scalar operations; for example, adding the scalar 1 to each element of the matrix, or multiplying each element by the scalar 5. Let’s try it.

nd.add(1);

This line of code represents this operation:

[[1.0 + 1 ,3.0 + 1]
[2.0 + 1,4.0 + 1]
]

and here is the result

[[2.0 ,4.0]
[3.0 ,5.0]
]

There are two ways to perform any operation in ND4J, destructive and nondestructive; i.e. operations that change the underlying data, or operations that simply work with a copy of the data. Destructive operations will have an “i” at the end – addi, subi, muli, divi. The “i” means the operation is performed “in place,” directly on the data rather than a copy, while nd.add() leaves the original untouched.

Elementwise scalar multiplication looks like this:

nd.mul(5);

Applied to the result of the addition above, it produces this:

[[10.0 ,20.0]
[15.0 ,25.0]
]

Subtraction and division follow a similar pattern:

nd.subi(3);
nd.divi(2);

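If you perform all these operations in sequence on your initial 2 x 2 matrix, carrying each result forward to the next step (for example with the in-place variants addi, muli, subi and divi), you should end up with this matrix: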

[[3.5 ,8.5]
[6.0 ,11.0]
]
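The arithmetic above can be checked with a few lines of plain Java. The sketch below does not use ND4J; the class ScalarOpsSketch and its helper applyInPlace are hypothetical stand-ins for the in-place operations addi, muli, subi and divi, applied to the column-ordered matrix from the start of this section.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Plain-Java sketch of chained in-place scalar operations on a 2 x 2 matrix.
final class ScalarOpsSketch {

    // Applies op to every element of a, overwriting a (the "i" in addi, muli, ...).
    static void applyInPlace(double[][] a, DoubleUnaryOperator op) {
        for (double[] row : a) {
            for (int j = 0; j < row.length; j++) {
                row[j] = op.applyAsDouble(row[j]);
            }
        }
    }

    public static void main(String[] args) {
        double[][] nd = {{1.0, 3.0}, {2.0, 4.0}};
        applyInPlace(nd, x -> x + 1); // like addi(1)
        applyInPlace(nd, x -> x * 5); // like muli(5)
        applyInPlace(nd, x -> x - 3); // like subi(3)
        applyInPlace(nd, x -> x / 2); // like divi(2)
        System.out.println(Arrays.deepToString(nd)); // [[3.5, 8.5], [6.0, 11.0]]
    }
}
```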

Elementwise vector operations

When performed with simple units like scalars, the operations of arithmetic are unambiguous. But when working with matrices, addition and multiplication can mean several things. With vector-on-matrix operations, you have to know what kind of addition or multiplication you’re performing in each case.

First, we’ll create a 2 x 2 matrix, a column vector and a row vector.

INDArray nd = Nd4j.create(new float[]{1,2,3,4},new int[]{2,2});
INDArray nd2 = Nd4j.create(new float[]{5,6},new int[]{2,1}); //vector as column
INDArray nd3 = Nd4j.create(new float[]{5,6},new int[]{2}); //vector as row

Notice that the shape of the two vectors is specified with their final parameters. {2,1} means the vector is vertical, with elements populating two rows and one column. A simple {2} means the vector populates along a single row that spans two columns – horizontal. Your first matrix will look like this

[[1.00, 2.00],
 [3.00, 4.00]]

Here’s how you add a column vector to a matrix:

    nd.addColumnVector(nd2);

And here’s the best way to visualize what’s happening. The top element of the column vector combines with the top elements of each column in the matrix, and so forth. The sum matrix represents the march of that column vector across the matrix from left to right, adding itself along the way.

[1.0 ,2.0]     [5.0]    [6.0 ,7.0]
[3.0 ,4.0]  +  [6.0] =  [9.0 ,10.0]

But let’s say you preserved the initial matrix and instead added a row vector.

nd.addRowVector(nd3);

Then your equation is best visualized like this:

[1.0 ,2.0]                   [6.0 ,8.0]
[3.0 ,4.0]  +  [5.0 ,6.0] =  [8.0 ,10.0]

In this case, the leftmost element of the row vector combines with the leftmost elements of each row in the matrix, and so forth. The sum matrix represents that row vector falling down the matrix from top to bottom, adding itself at each level.

So vector addition can lead to different results depending on the orientation of your vector. The same is true for multiplication, subtraction and division and every other vector operation.

In ND4J, row vectors and column vectors look the same when you print them out with

System.out.println(nd);

They will appear like this.

[5.0 ,6.0]

Don’t be fooled. Getting the parameters right at the beginning is crucial. addRowVector and addColumnVector will not produce different results when using the same initial vector, because they do not change a vector’s orientation as row or column.

Elementwise matrix operations

To carry out scalar and vector elementwise operations, we basically pretend we have two matrices of equal shape. Elementwise scalar multiplication can be represented several ways.

    [1.0 ,3.0]   [c , c]   [1.0 ,3.0]   [1c ,3c]
c * [2.0 ,4.0] = [c , c] * [2.0 ,4.0] = [2c ,4c]

So you see, elementwise operations match the elements of one matrix with their precise counterparts in another matrix. The element in row 1, column 1 of matrix nd will only be combined with the element in row 1, column 1 of matrix c.

This is clearer when we start elementwise vector operations. We imagine the vector, like the scalar, as populating a matrix of equal dimensions to matrix nd. Below, you can see why row and column vectors lead to different sums.

Column vector:

[1.0 ,3.0]     [5.0]   [1.0 ,3.0]   [5.0 ,5.0]   [6.0 ,8.0]
[2.0 ,4.0]  +  [6.0] = [2.0 ,4.0] + [6.0 ,6.0] = [8.0 ,10.0]

Row vector:

[1.0 ,3.0]                   [1.0 ,3.0]    [5.0 ,6.0]   [6.0 ,9.0]    
[2.0 ,4.0]  +  [5.0 ,6.0] =  [2.0 ,4.0] +  [5.0 ,6.0] = [7.0 ,10.0]

Now you can see why row vectors and column vectors produce different results. They are simply shorthand for different matrices.
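The two expansions above can be reproduced in plain Java. This sketch is independent of ND4J; the class VectorBroadcastSketch and its two methods are hypothetical names that only imitate what addColumnVector and addRowVector do conceptually.

```java
import java.util.Arrays;

// Plain-Java sketch of adding a column vector or a row vector to every column or row of a matrix.
final class VectorBroadcastSketch {

    // Adds column[i] to every element in row i (the column vector marches across the matrix).
    static double[][] addColumnVector(double[][] m, double[] column) {
        double[][] out = new double[m.length][m[0].length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                out[i][j] = m[i][j] + column[i];
            }
        }
        return out;
    }

    // Adds row[j] to every element in column j (the row vector falls down the matrix).
    static double[][] addRowVector(double[][] m, double[] row) {
        double[][] out = new double[m.length][m[0].length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                out[i][j] = m[i][j] + row[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] nd = {{1.0, 3.0}, {2.0, 4.0}};
        double[] vector = {5.0, 6.0};
        System.out.println(Arrays.deepToString(addColumnVector(nd, vector))); // [[6.0, 8.0], [8.0, 10.0]]
        System.out.println(Arrays.deepToString(addRowVector(nd, vector)));    // [[6.0, 9.0], [7.0, 10.0]]
    }
}
```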

Given that we’ve already been doing elementwise matrix operations implicitly with scalars and vectors, it’s a short hop to do them with more varied matrices:

INDArray nd4 = Nd4j.create(new float[]{5,6,7,8},new int[]{2,2});

nd.add(nd4);

Here’s how you can visualize that command:

[1.0 ,3.0]   [5.0 ,7.0]   [6.0 ,10.0]
[2.0 ,4.0] + [6.0 ,8.0] = [8.0 ,12.0]

Multiplying the initial matrix nd with matrix nd4 works the same way:

nd.muli(nd4);

[1.0 ,3.0]   [5.0 ,7.0]   [5.0 ,21.0]
[2.0 ,4.0] * [6.0 ,8.0] = [12.0 ,32.0]

The term of art for this particular matrix manipulation is a Hadamard product.

These toy matrices are a useful heuristic to introduce the ND4J interface as well as basic ideas in linear algebra. This framework, however, is built to handle billions of parameters in n dimensions (and beyond…).

Elementwise Operations

Elementwise operations are more intuitive than vectorwise operations, because the elements of one matrix map clearly onto the other, and to obtain the result, you have to perform just one arithmetical operation.

With vectorwise matrix operations, you will have to first build intuition and also perform multiple steps. There are two basic types of matrix multiplication: inner (dot) product and outer product. The inner product results in a matrix of reduced dimensions, the outer product results in one of expanded dimensions. A helpful mnemonic: Expand outward, contract inward.

Inner product

Unlike Hadamard products, which require that both matrices have equal rows and columns, inner products simply require that the number of columns of the first matrix equal the number of rows of the second. For example, this works

Notice a 1 x 2 row times a 2 x 1 column produces a scalar. This operation reduces the dimensions to 1,1. You can imagine rotating the row vector [1.0 ,2.0] clockwise to stand on its end, placed against the column vector. The two top elements are then multiplied by each other, as are the bottom two, and the two products are added to consolidate in a single scalar.

In ND4J, you would create the two vectors like this:

And multiply them like this

Notice ND4J code mirrors the equation in that nd * nd2 is row vector times column vector. The method is mmul, rather than the mul we used for elementwise operations, and the extra “m” stands for “matrix.”

Now let’s take the same operation, while adding an additional column to a new array we’ll call nd4.

Now let’s add an extra row to the first matrix, call it nd3, and multiply it by nd4

The equation will look like this

Outer product

Taking the outer product of the two vectors we first worked with is as simple as reversing their order.

It turns out that multiplying nd2 by nd is the same as multiplying it by two nd’s stacked on top of each other. That’s an outer product. As you can see, outer products also require fewer operations, since they don’t combine two products into one element in the final matrix.
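The difference between the inner and the outer product can be sketched with an ordinary matrix multiplication in plain Java. Nothing here is ND4J code: the class MatMulSketch is hypothetical, and the column vector values 3.0 and 4.0 are assumed purely for illustration (only the row vector [1.0 ,2.0] is taken from the text).

```java
import java.util.Arrays;

// Plain-Java sketch of matrix multiplication, used for both inner and outer products.
final class MatMulSketch {

    // Multiplies an (n x k) matrix by a (k x p) matrix, giving an (n x p) matrix.
    static double[][] mmul(double[][] a, double[][] b) {
        int n = a.length, k = b.length, p = b[0].length;
        if (a[0].length != k) {
            throw new IllegalArgumentException("columns of a must equal rows of b");
        }
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                for (int x = 0; x < k; x++) {
                    c[i][j] += a[i][x] * b[x][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] row = {{1.0, 2.0}};        // 1 x 2 row vector
        double[][] column = {{3.0}, {4.0}};   // 2 x 1 column vector (assumed values)
        System.out.println(Arrays.deepToString(mmul(row, column))); // inner product: [[11.0]]
        System.out.println(Arrays.deepToString(mmul(column, row))); // outer product: [[3.0, 6.0], [4.0, 8.0]]
    }
}
```

Reversing the order of the operands is all it takes to go from a 1 x 1 result to a 2 x 2 result, which is the "contract inward, expand outward" mnemonic in action.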

A few aspects of ND4J code should be noted here. Firstly, the method mmul takes two parameters.

which could be expressed like this

which is the same as this line

Using the second parameter to specify the nd-array to which the product should be assigned is a convention common in ND4J.

Matrix Manipulation

There are several other basic matrix manipulations to highlight as you learn ND4J’s workings.

Transpose

The transpose of a matrix is its mirror image. An element located in row 1, column 2, in matrix A will be located in row 2, column 1, in the transpose of matrix A, whose mathematical notation is A to the T, or A^T. Notice that the elements along the diagonal of a square matrix do not move – they are at the hinge of the reflection. In ND4J, transpose matrices like this:

INDArray nd = Nd4j.create(new float[]{1, 2, 3, 4}, new int[]{2, 2});

[1.0 ,3.0]
[2.0 ,4.0]

nd.transpose();

[1.0 ,2.0]
[3.0 ,4.0]

And a long matrix like this

[1.0 ,3.0 ,5.0 ,7.0 ,9.0 ,11.0]
[2.0 ,4.0 ,6.0 ,8.0 ,10.0 ,12.0]

Looks like this when it is transposed

[1.0 ,2.0]
[3.0 ,4.0]
[5.0 ,6.0]
[7.0 ,8.0]
[9.0 ,10.0]
[11.0 ,12.0]

In fact, transpose is just an important subset of a more general operation: reshape.

Reshape

Yes, matrices can be reshaped. You can change the number of rows and columns they have. The reshaped matrix has to fulfill one condition: the product of its rows and columns must equal the product of the rows and columns of the original matrix. For example, proceeding columnwise, you can reshape a 2 by 6 matrix into a 3 by 4 matrix:

    INDArray nd2 = Nd4j.create(new float[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, new int[]{2, 6});

The array nd2 looks like this

[1.0 ,3.0 ,5.0 ,7.0 ,9.0 ,11.0]
[2.0 ,4.0 ,6.0 ,8.0 ,10.0 ,12.0]

Reshaping it is easy, and follows the same convention by which we gave it shape to begin with

nd2.reshape(3,4);

[1.0 ,4.0 ,7.0 ,10.0]
[2.0 ,5.0 ,8.0 ,11.0]
[3.0 ,6.0 ,9.0 ,12.0]
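The columnwise reshape shown above can be reproduced in plain Java to see where each element ends up. This is only a sketch of the idea, not ND4J's reshape; the class ColumnMajorReshapeSketch is a hypothetical name, and it assumes the columns-first ordering used in this example.

```java
import java.util.Arrays;

// Plain-Java sketch of a columnwise (column-major) reshape.
final class ColumnMajorReshapeSketch {

    // Reads src column by column and writes the same sequence column by column into the new shape.
    static double[][] reshape(double[][] src, int rows, int cols) {
        int srcRows = src.length, srcCols = src[0].length;
        if (srcRows * srcCols != rows * cols) {
            throw new IllegalArgumentException("element count must stay the same");
        }
        double[][] out = new double[rows][cols];
        for (int k = 0; k < rows * cols; k++) {
            // k is the position of the element in columns-first order
            out[k % rows][k / rows] = src[k % srcRows][k / srcRows];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] nd2 = {
            {1, 3, 5, 7, 9, 11},
            {2, 4, 6, 8, 10, 12}
        };
        // [[1.0, 4.0, 7.0, 10.0], [2.0, 5.0, 8.0, 11.0], [3.0, 6.0, 9.0, 12.0]]
        System.out.println(Arrays.deepToString(reshape(nd2, 3, 4)));
    }
}
```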

Broadcast

Broadcast is advanced. It usually happens in the background without having to be called. The simplest way to understand it is by working with one long row vector, like the one above.

nd2 = Nd4j.create(new float[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12});

Broadcasting will actually take multiple copies of that row vector and put them together into a larger matrix. The first parameter is the number of copies you want “broadcast,” as well as the number of rows involved. To avoid an error at runtime, make the second parameter of broadcast equal to the number of elements in your row vector.

nd2.broadcast(new int[]{3,12});

[1.0 ,4.0 ,7.0 ,10.0 ,1.0 ,4.0 ,7.0 ,10.0 ,1.0 ,4.0 ,7.0 ,10.0]
[2.0 ,5.0 ,8.0 ,11.0 ,2.0 ,5.0 ,8.0 ,11.0 ,2.0 ,5.0 ,8.0 ,11.0]
[3.0 ,6.0 ,9.0 ,12.0 ,3.0 ,6.0 ,9.0 ,12.0 ,3.0 ,6.0 ,9.0 ,12.0]

nd2.broadcast(new int[]{6,12});

[1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0]
[2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0]
[3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0]
[4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0]
[5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0]
[6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0]

Tensors

A vector, that column of numbers we feed into neural nets, is simply a subclass of a more general mathematical structure called a tensor. A tensor is a multidimensional array.

You are already familiar with a matrix composed of rows and columns: the rows extend along the y axis and the columns along the x axis. Each axis is a dimension. Tensors have additional dimensions.

Tensors also have a so-called rank: a scalar, or single number, is of rank 0; a vector is rank 1; a matrix is rank 2; and entities of rank 3 and above are all simply called tensors.

It may be helpful to think of a scalar as a point, a vector as a line, a matrix as a plane, and tensors as objects of three dimensions or more. A matrix has rows and columns, two dimensions, and therefore is of rank 2. A three-dimensional tensor, such as those we use to represent color images, has channels, rows and columns, and therefore counts as rank 3.

As mathematical objects with multiple dimensions, tensors have a shape, and we specify that shape by treating tensors as n-dimensional arrays.

With ND4J, we do that by creating a new nd array and feeding it data, shape and order as its parameters. In pseudo code, this would be

nd4j.createArray(data, shape, order)

In real code, this line

INDArray arr = Nd4j.create(new float[]{1,2,3,4},new int[]{2,2},'c');

creates an array with four elements, whose shape is 2 by 2, and whose order is “row major”, or rows first, which is the default in C. (In contrast, Fortran uses “column major” ordering, and could be specified with an ‘f’ as the third parameter.) The distinction between the two orderings, for the array created above, is best illustrated with a table:

Row-major (C)    Column-major (Fortran)
[1,2]            [1,3]
[3,4]            [2,4]
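The table can be reproduced with a few lines of plain Java. This is only an illustrative sketch, not ND4J's implementation: the class name `OrderingSketch` and its `fill` method are hypothetical, and the only thing they show is how the same flat data `{1,2,3,4}` lands in different cells depending on the order character.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ND4J: shows how one flat buffer is
// interpreted under row-major ('c') and column-major ('f') ordering.
final class OrderingSketch {

    // Builds a rows x cols matrix from flat data using the given order.
    static float[][] fill(float[] data, int rows, int cols, char order) {
        if (data.length != rows * cols) {
            throw new IllegalArgumentException("data length must equal rows * cols");
        }
        float[][] m = new float[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Row-major walks along a row first; column-major walks down a column first.
                int idx = (order == 'c') ? r * cols + c : c * rows + r;
                m[r][c] = data[idx];
            }
        }
        return m;
    }

    public static void main(String[] args) {
        float[] data = {1, 2, 3, 4};
        System.out.println(Arrays.deepToString(fill(data, 2, 2, 'c'))); // [[1.0, 2.0], [3.0, 4.0]]
        System.out.println(Arrays.deepToString(fill(data, 2, 2, 'f'))); // [[1.0, 3.0], [2.0, 4.0]]
    }
}
```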

Once we create an n-dimensional array, we may want to work with slices of it. Rather than copying the data, which is expensive, we can simply “view” multi-dimensional slices. A slice of array “a” could be defined like this:

a[0:5,3:4,6:7]

which would give you the first 5 channels, rows 3 to 4 and columns 6 to 7, and so forth for n dimensions, with each individual dimension’s slice starting before the colon and ending after it.

Linear Buffer

Now, while it is useful to imagine matrices as two-dimensional planes, and 3-D tensors as cubic volumes, we store all tensors as a linear buffer. That is, they are all flattened to a row of numbers.

For that linear buffer, we specify something called stride. Stride tells the computation layer how to interpret the flattened representation. It is the number of elements you skip in the buffer to get to the next channel or row or column. There’s a stride for each dimension.
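As a rough illustration of what stride means, the sketch below computes per-dimension strides for a row-major array and uses them to find where an element sits in the flat buffer. The class `StrideSketch` and its methods are hypothetical names for this example, and strides are counted in elements rather than bytes; ND4J's internal representation may differ in detail.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ND4J: strides for a row-major layout.
final class StrideSketch {

    // Number of buffer elements to skip to advance by one along each dimension.
    static int[] rowMajorStrides(int[] shape) {
        int[] strides = new int[shape.length];
        int step = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = step;   // last dimension is contiguous
            step *= shape[d];    // earlier dimensions skip whole sub-blocks
        }
        return strides;
    }

    // Position of the element at the given index in the flat buffer.
    static int offset(int[] index, int[] strides) {
        int off = 0;
        for (int d = 0; d < index.length; d++) {
            off += index[d] * strides[d];
        }
        return off;
    }

    public static void main(String[] args) {
        int[] shape = {3, 4, 5}; // channels, rows, columns
        int[] strides = rowMajorStrides(shape);
        System.out.println(Arrays.toString(strides));               // [20, 5, 1]
        System.out.println(offset(new int[]{1, 2, 3}, strides));    // 33
    }
}
```

With shape {3, 4, 5}, moving to the next channel skips 20 elements, the next row skips 5, and the next column skips 1.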

Here’s a brief video summarizing how tensors are converted into linear byte buffers for ND4J.

Additional Resources and Definitions

The word tensor derives from the Latin tendere, or “to stretch”; therefore, tensor relates to that which stretches, the stretcher. Tensor was introduced to English from the German in 1915, after being coined by Woldemar Voigt in 1898. The mathematical object is called a tensor because an early application of the idea was the study of materials stretching under tension.

Tensors are generalizations of scalars (that have no indices), vectors (that have exactly one index), and matrices (that have exactly two indices) to an arbitrary number of indices. - Mathworld

tensor, n. a mathematical object analogous to but more general than a vector, represented by an array of components that are functions of the coordinates of a space.

SAMEDIFF

Importing TensorFlow models

What models can be imported into SameDiff

Currently SameDiff supports the import of TensorFlow frozen graphs through the various SameDiff.importFrozenTF methods. TensorFlow documentation on frozen models can be found here.

import org.nd4j.autodiff.samediff.SameDiff;
SameDiff sd = SameDiff.importFrozenTF(modelFile);

Finding the model input/outputs and running inference

After you import the TensorFlow model there are 2 ways to find the inputs and outputs. The first method is to look at the output of

 sd.summary();

In that summary, the input variables are those that are not the output of any op, and the output variables are those that are not the input of any op. Another way to find the inputs is

List<String> inputs = sd.inputs();

To run inference use:

INDArray out = sd.batchOutput()
    .input(inputName, inputArray)
    .output(outputs)
    .execSingle();

For multiple outputs, use exec() instead of execSingle(), to return a Map<String,INDArray> of outputs instead. Alternatively, you can use methods such as SameDiff.output(Map<String, INDArray> placeholders, String... outputs) to get the same output.

Import Validation

We have a TensorFlow graph analyzing utility which will report any missing operations (operations that still need to be implemented) here

Advanced: Node Skipping and Import Overrides

It is possible to remove nodes from the network. For example TensorFlow 1.x models can have hard coded dropout layers. See the BERT Graph test for an example.

List of models known to work with SameDiff.

Operations Coverage

SameDiff’s TensorFlow import is still being developed, and does not yet have support for every single operation and datatype in TensorFlow. Almost all of the common/standard operations are importable and tested, however - including almost everything in the tf, tf.math, tf.layers, tf.losses, tf.bitwise and tf.nn namespaces. The majority of existing pretrained models out there should be importable into SameDiff.

If you run into an operation that can’t be imported, feel free to open an issue.

ND4J & SameDiff Ops

Overview

All operations in ND4J and SameDiff are available in "Operation Namespaces". Each namespace is available on the Nd4j and SameDiff classes with its lowercase name.

For example, if you want to use the operation it would look like this

Namespaces

Random

bernoulli

INDArray bernoulli(double p, DataType datatype, long[] shape)

SDVariable bernoulli(double p, DataType datatype, long[] shape)
SDVariable bernoulli(String name, double p, DataType datatype, long[] shape)

Generate a new random INDArray, where values are randomly sampled according to a Bernoulli distribution, with the specified probability. Array values will have value 1 with probability P and value 0 with probability 1-P.

p - Probability of value 1
datatype - Data type of the output variable
shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))

binomial

INDArray binomial(int nTrials, double p, DataType datatype, long[] shape)

SDVariable binomial(int nTrials, double p, DataType datatype, long[] shape)
SDVariable binomial(String name, int nTrials, double p, DataType datatype, long[] shape)

Generate a new random INDArray, where values are randomly sampled according to a Binomial distribution, with the specified number of trials and probability.

nTrials - Number of trials parameter for the binomial distribution
p - Probability of success for each trial
datatype - Data type of the output variable
shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))

exponential

INDArray exponential(double lambda, DataType datatype, long[] shape)

SDVariable exponential(double lambda, DataType datatype, long[] shape)
SDVariable exponential(String name, double lambda, DataType datatype, long[] shape)

Generate a new random INDArray, where values are randomly sampled according to an exponential distribution:

P(x) = lambda exp(-lambda x)

lambda - lambda parameter
datatype - Data type of the output variable
shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))
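To make the density P(x) = lambda exp(-lambda x) concrete, here is a plain-Java sketch that draws such samples by inverse-transform sampling. It does not use ND4J and is not how the library generates its values; `ExponentialSketch`, `sample` and `mean` are hypothetical names for this illustration.

```java
import java.util.Random;

// Hypothetical helper, not part of ND4J: exponential samples via inverse transform.
final class ExponentialSketch {

    // Draws n samples from an exponential distribution with rate lambda.
    static double[] sample(double lambda, int n, long seed) {
        if (lambda <= 0) {
            throw new IllegalArgumentException("lambda must be positive");
        }
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // nextDouble() is in [0, 1), so 1 - u is in (0, 1] and the log is finite.
            out[i] = -Math.log(1.0 - rng.nextDouble()) / lambda;
        }
        return out;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        double[] xs = sample(2.0, 100_000, 7L);
        // The theoretical mean is 1 / lambda, so this should print a value close to 0.5.
        System.out.println(mean(xs));
    }
}
```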

logNormal

INDArray logNormal(double mean, double stddev, DataType datatype, long[] shape)

SDVariable logNormal(double mean, double stddev, DataType datatype, long[] shape)
SDVariable logNormal(String name, double mean, double stddev, DataType datatype, long[] shape)

Generate a new random INDArray, where values are randomly sampled according to a Log Normal distribution, i.e., log(x) ~ N(mean, stddev)

mean - Mean value for the random array
stddev - Standard deviation for the random array
datatype - Data type of the output variable
shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))

normal

INDArray normal(double mean, double stddev, DataType datatype, long[] shape)

SDVariable normal(double mean, double stddev, DataType datatype, long[] shape)
SDVariable normal(String name, double mean, double stddev, DataType datatype, long[] shape)

Generate a new random INDArray, where values are randomly sampled according to a Gaussian (normal) distribution, N(mean, stddev)

mean - Mean value for the random array
stddev - Standard deviation for the random array
datatype - Data type of the output variable
shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))

normalTruncated

INDArray normalTruncated(double mean, double stddev, DataType datatype, long[] shape)

SDVariable normalTruncated(double mean, double stddev, DataType datatype, long[] shape)
SDVariable normalTruncated(String name, double mean, double stddev, DataType datatype, long[] shape)

Generate a new random INDArray, where values are randomly sampled according to a Gaussian (normal) distribution, N(mean, stddev). However, any values more than 1 standard deviation from the mean are dropped and re-sampled.

mean - Mean value for the random array
stddev - Standard deviation for the random array
datatype - Data type of the output variable
shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))
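The "dropped and re-sampled" behaviour can be pictured as a simple rejection loop. The following plain-Java sketch is an assumption about the general technique, not ND4J's actual code; `TruncatedNormalSketch` is a hypothetical class, and the cut-off is passed as a parameter (`maxDeviations`) so that the 1-standard-deviation rule described above is just one possible setting.

```java
import java.util.Random;

// Hypothetical helper, not part of ND4J: truncated normal by rejection.
final class TruncatedNormalSketch {

    // Draws n values from N(mean, stddev), re-drawing any value that falls
    // more than maxDeviations standard deviations away from the mean.
    static double[] sample(double mean, double stddev, double maxDeviations, int n, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double z;
            do {
                z = rng.nextGaussian();            // standard normal draw
            } while (Math.abs(z) > maxDeviations); // reject and re-sample
            out[i] = mean + stddev * z;
        }
        return out;
    }

    public static void main(String[] args) {
        // With mean 5, stddev 2 and a cut-off of 1, every value lies in [3, 7].
        double[] xs = sample(5.0, 2.0, 1.0, 5, 42L);
        for (double x : xs) {
            System.out.println(x);
        }
    }
}
```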

uniform

INDArray uniform(double min, double max, DataType datatype, long[] shape)

SDVariable uniform(double min, double max, DataType datatype, long[] shape)
SDVariable uniform(String name, double min, double max, DataType datatype, long[] shape)

Generate a new random INDArray, where values are randomly sampled according to a uniform distribution, U(min,max)

min - Minimum value
max - Maximum value.
datatype - Data type of the output variable
shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))

Tuning & Training

Evaluation

Tools and classes for evaluating neural network performance

Why evaluate?

When training or deploying a Neural Network it is useful to know the accuracy of your model. In DL4J the Evaluation Class and variants of the Evaluation Class are available to evaluate your model's performance.

The Evaluation class is used to evaluate the performance for binary and multi-class classifiers (including time series classifiers). This section covers basic usage of the Evaluation Class.

Given a dataset in the form of a DataSetIterator, the easiest way to perform evaluation is to use the built-in evaluate methods on MultiLayerNetwork and ComputationGraph:

However, evaluation can be performed on individual minibatches also. Here is an example taken from our dataexamples/CSVExample in the project.

The CSV example has CSV data for 3 classes of flowers and builds a simple feed forward neural network to classify the flowers based on 4 measurements.

The first line creates an Evaluation object with 3 classes. The second line gets the labels from the model for our test dataset. The third line uses the eval method to compare the labels array from the testdata with the labels generated from the model. The fourth line logs the evaluation data to the console.

The output.

By default the .stats() method displays the confusion matrix entries (one per line), Accuracy, Precision, Recall and F1 Score. Additionally the Evaluation Class can also calculate and return the following values:

Confusion Matrix
False Positive/Negative Rate
True Positive/Negative
Class Counts
F-beta, G-measure, Matthews Correlation Coefficient and more, see
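For readers who want to see how the headline numbers relate to a confusion matrix, here is a small plain-Java sketch. It is not the Evaluation class and makes no claim about how DL4J averages per-class values; `ConfusionMetricsSketch` is a hypothetical name, and the matrix layout (rows are actual classes, columns are predicted classes) is an assumption of this example.

```java
// Hypothetical helper, not part of DL4J: metrics from a confusion matrix
// laid out as cm[actualClass][predictedClass].
final class ConfusionMetricsSketch {

    // Fraction of all examples that sit on the diagonal.
    static double accuracy(int[][] cm) {
        long correct = 0, total = 0;
        for (int a = 0; a < cm.length; a++) {
            for (int p = 0; p < cm[a].length; p++) {
                total += cm[a][p];
                if (a == p) {
                    correct += cm[a][p];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Of everything predicted as cls, how much really was cls.
    static double precision(int[][] cm, int cls) {
        long predicted = 0;
        for (int[] row : cm) {
            predicted += row[cls];
        }
        return predicted == 0 ? 0.0 : (double) cm[cls][cls] / predicted;
    }

    // Of everything that really was cls, how much was predicted as cls.
    static double recall(int[][] cm, int cls) {
        long actual = 0;
        for (int count : cm[cls]) {
            actual += count;
        }
        return actual == 0 ? 0.0 : (double) cm[cls][cls] / actual;
    }

    // Harmonic mean of precision and recall for one class.
    static double f1(int[][] cm, int cls) {
        double p = precision(cm, cls);
        double r = recall(cm, cls);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int[][] cm = {
            {5, 1, 0},
            {1, 4, 1},
            {0, 0, 6}
        };
        System.out.println("accuracy  = " + accuracy(cm));
        System.out.println("precision = " + precision(cm, 1));
        System.out.println("recall    = " + recall(cm, 1));
        System.out.println("f1        = " + f1(cm, 1));
    }
}
```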

Display the Confusion Matrix.

Displays

Additionally, the confusion matrix can be accessed directly, or converted to CSV or HTML, using:

To Evaluate a network performing regression use the RegressionEvaluation Class.

As with the Evaluation class, RegressionEvaluation on a DataSetIterator can be performed as follows:

Here is a code snippet with a single column; in this case the neural network was predicting the age of shellfish based on measurements.

Print the statistics for the Evaluation.

Returns

Columns are Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, Relative Squared Error, and R^2 Coefficient of Determination
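The column names correspond to standard formulas, which the plain-Java sketch below spells out for a single output column. It is not the RegressionEvaluation class: `RegressionMetricsSketch` is a hypothetical name, and R^2 is computed here with the common definition 1 - RSE, which is an assumption about the definition rather than a statement about DL4J's internals.

```java
// Hypothetical helper, not part of DL4J: textbook regression metrics
// for one column of labels and predictions.
final class RegressionMetricsSketch {

    static double mse(double[] labels, double[] preds) {
        double sum = 0;
        for (int i = 0; i < labels.length; i++) {
            double e = labels[i] - preds[i];
            sum += e * e;
        }
        return sum / labels.length;
    }

    static double mae(double[] labels, double[] preds) {
        double sum = 0;
        for (int i = 0; i < labels.length; i++) {
            sum += Math.abs(labels[i] - preds[i]);
        }
        return sum / labels.length;
    }

    static double rmse(double[] labels, double[] preds) {
        return Math.sqrt(mse(labels, preds));
    }

    // Squared error relative to always predicting the label mean.
    // Undefined (division by zero) when all labels are identical.
    static double rse(double[] labels, double[] preds) {
        double mean = 0;
        for (double y : labels) {
            mean += y;
        }
        mean /= labels.length;
        double residual = 0, total = 0;
        for (int i = 0; i < labels.length; i++) {
            residual += (labels[i] - preds[i]) * (labels[i] - preds[i]);
            total += (labels[i] - mean) * (labels[i] - mean);
        }
        return residual / total;
    }

    static double r2(double[] labels, double[] preds) {
        return 1.0 - rse(labels, preds);
    }

    public static void main(String[] args) {
        double[] labels = {1, 2, 3, 4};
        double[] preds = {1.5, 2, 2.5, 4};
        System.out.println("MSE  = " + mse(labels, preds));
        System.out.println("MAE  = " + mae(labels, preds));
        System.out.println("RMSE = " + rmse(labels, preds));
        System.out.println("RSE  = " + rse(labels, preds));
        System.out.println("R^2  = " + r2(labels, preds));
    }
}
```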

See

When performing multiple types of evaluations (for example, Evaluation and ROC on the same network and dataset) it is more efficient to do this in one pass of the dataset, as follows:

Time series evaluation is very similar to the above evaluation approaches. Evaluation in DL4J is performed on all (non-masked) time steps separately - for example, a time series of length 10 will contribute 10 predictions/labels to an Evaluation object. One difference with time series is the (optional) presence of mask arrays, which are used to mark some time steps as missing or not present. See for more details on masking.

For most users, it is simply sufficient to use the MultiLayerNetwork.evaluate(DataSetIterator) or MultiLayerNetwork.evaluateRegression(DataSetIterator) and similar methods. These methods will properly handle masking, if mask arrays are present.

The EvaluationBinary is used for evaluating networks with binary classification outputs - these networks usually have Sigmoid activation functions and XENT loss functions. The typical classification metrics, such as accuracy, precision, recall, F1 score, etc. are calculated for each output.

See

ROC (Receiver Operating Characteristic) is another commonly used evaluation metric for the evaluation of classifiers. Three ROC variants exist in DL4J:

ROC - for single binary label (as a single column probability, or 2 column 'softmax' probability distribution).
ROCBinary - for multiple binary labels
ROCMultiClass - for evaluation of non-binary classifiers, using a "one vs. all" approach

These classes have the ability to calculate the area under ROC curve (AUROC) and area under Precision-Recall curve (AUPRC), via the calculateAUC() and calculateAUPRC() methods. Furthermore, the ROC and Precision-Recall curves can be obtained using getRocCurve() and getPrecisionRecallCurve().

The ROC and Precision-Recall curves can be exported to HTML for viewing using: EvaluationTools.exportRocChartsToHtmlFile(ROC, File), which will export an HTML file with both ROC and P-R curves, that can be viewed in a browser.

Note that all three support two modes of operation/calculation

Thresholded (approximate AUROC/AUPRC calculation, no memory issues)
Exact (exact AUROC/AUPRC calculation, but can require large amount of memory with very large datasets - i.e., datasets with many millions of examples)

The number of bins can be set using the constructors. Exact can be set using the default constructor new ROC() or explicitly using new ROC(0)
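To show why an exact calculation needs every prediction kept in memory, here is a plain-Java sketch of one standard way to compute AUROC exactly: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This is an illustration of the quantity, not the algorithm DL4J's ROC class uses; `ExactAucSketch` is a hypothetical name.

```java
// Hypothetical helper, not part of DL4J: exact AUROC by pairwise ranking.
final class ExactAucSketch {

    // labels[i] is 1 for positive and 0 for negative; scores[i] is the
    // predicted probability (or any score) of the positive class.
    static double auroc(int[] labels, double[] scores) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] != 1) {
                continue;
            }
            for (int j = 0; j < labels.length; j++) {
                if (labels[j] != 0) {
                    continue;
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;   // positive ranked above negative
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;   // tie counts as half
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 0, 1, 0};
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.2};
        // 5 of the 6 positive/negative pairs are ranked correctly.
        System.out.println(auroc(labels, scores));
    }
}
```

The pairwise loop is quadratic, which is fine for a sketch; a thresholded approach trades this exactness for a fixed number of bins.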

See is used to evaluate Binary Classifiers.

Deeplearning4j also has the EvaluationCalibration class, which is designed to analyze the calibration of a classifier. It provides a number of tools for this purpose:

Counts of the number of labels and predictions for each class
Reliability diagram (or reliability curve)
Residual plot (histogram)
Histograms of probabilities, including probabilities for each class separately

Evaluation of a classifier using EvaluationCalibration is performed in a similar manner to the other evaluation classes. The various plots/histograms can be exported to HTML for viewing using EvaluationTools.exportevaluationCalibrationToHtmlFile(EvaluationCalibration, File).

SparkDl4jMultiLayer and SparkComputationGraph both have similar methods for evaluation:

A multi-task network is a network that is trained to produce multiple outputs. For example a network given audio samples can be trained to both predict the language spoken and the gender of the speaker. Multi-task configuration is briefly described .

Evaluation Classes useful for Multi-Task Network

See

Available evaluations

Early Stopping

Terminate a training session given certain conditions.

What is early stopping?

When training neural networks, numerous decisions need to be made regarding the settings (hyperparameters) used, in order to obtain good performance. One such hyperparameter is the number of training epochs: that is, how many full passes of the data set (epochs) should be used? If we use too few epochs, we might underfit (i.e., not learn everything we can from the training data); if we use too many epochs, we might overfit (i.e., fit the 'noise' in the training data, and not the signal).

Early stopping attempts to remove the need to manually set this value. It can also be considered a type of regularization method (like L1/L2 weight decay and dropout) in that it can stop the network from overfitting.

The idea behind early stopping is relatively simple:

Split data into training and test sets
At the end of each epoch (or, every N epochs):
- evaluate the network performance on the test set
- if the network outperforms the previous best model: save a copy of the network at the current epoch
Take as our final model the model that has the best test set performance

This is shown graphically below:

The best model is the one saved at the time of the vertical dotted line - i.e., the model with the best accuracy on the test set.
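The procedure above can be sketched in plain Java without any DL4J classes. In this illustration the "network" is replaced by a function that returns the test-set loss after a given epoch, lower being better; `EarlyStoppingSketch`, `Result` and `run` are hypothetical names, and the comments mark where real training and model saving would happen.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical helper, not part of DL4J: the bare early-stopping loop.
final class EarlyStoppingSketch {

    static final class Result {
        final int bestEpoch;
        final double bestScore;

        Result(int bestEpoch, double bestScore) {
            this.bestEpoch = bestEpoch;
            this.bestScore = bestScore;
        }
    }

    // testLossAtEpoch stands in for "evaluate the network on the test set".
    static Result run(IntToDoubleFunction testLossAtEpoch, int maxEpochs, int evaluateEveryN) {
        int bestEpoch = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            // (one epoch of training would run here)
            if ((epoch + 1) % evaluateEveryN != 0) {
                continue; // only score every N epochs
            }
            double score = testLossAtEpoch.applyAsDouble(epoch);
            if (score < bestScore) {
                bestScore = score;
                bestEpoch = epoch; // (a copy of the model would be saved here)
            }
        }
        return new Result(bestEpoch, bestScore);
    }

    public static void main(String[] args) {
        // Test loss falls, then rises again as the model starts to overfit.
        double[] losses = {0.9, 0.6, 0.45, 0.5, 0.7};
        Result r = run(e -> losses[e], losses.length, 1);
        System.out.println("best epoch = " + r.bestEpoch + ", score = " + r.bestScore);
        // prints: best epoch = 2, score = 0.45
    }
}
```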

Using DL4J's early stopping functionality requires you to provide a number of configuration options:

A score calculator, such as the DataSetLossCalculator (JavaDoc, Source Code) for a Multi Layer Network, or DataSetLossCalculatorCG (JavaDoc, Source Code) for a Computation Graph. It is used to calculate a score at every epoch (for example, the loss function value on a test set, or the accuracy on the test set)
How frequently we want to calculate the score function (default: every epoch)
One or more termination conditions, which tell the training process when to stop. There are two classes of termination conditions:
- Epoch termination conditions: evaluated every N epochs
- Iteration termination conditions: evaluated once per minibatch
A model saver, that defines how models are saved

An example, with an epoch termination condition of maximum of 30 epochs, a maximum of 20 minutes training time, calculating the score every epoch, and saving the intermediate results to disk:

MultiLayerConfiguration myNetworkConfiguration = ...;
DataSetIterator myTrainData = ...;
DataSetIterator myTestData = ...;

EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
        .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
        .scoreCalculator(new DataSetLossCalculator(myTestData, true))
        .evaluateEveryNEpochs(1)
        .modelSaver(new LocalFileModelSaver(directory))
        .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf,myNetworkConfiguration,myTrainData);

//Conduct early stopping training:
EarlyStoppingResult result = trainer.fit();

//Print out the results:
System.out.println("Termination reason: " + result.getTerminationReason());
System.out.println("Termination details: " + result.getTerminationDetails());
System.out.println("Total epochs: " + result.getTotalEpochs());
System.out.println("Best epoch number: " + result.getBestModelEpoch());
System.out.println("Score at best epoch: " + result.getBestModelScore());

//Get the best model:
MultiLayerNetwork bestModel = result.getBestModel();

You can also implement your own iteration and epoch termination conditions.

Early Stopping w/ Parallel Wrapper

The early stopping implementation described above will only work with a single device. However, EarlyStoppingParallelTrainer provides similar functionality as early stopping and allows you to optimize for either multiple CPUs or GPUs. EarlyStoppingParallelTrainer wraps your model in a ParallelWrapper class and performs localized distributed training.

Note that EarlyStoppingParallelTrainer doesn't support all of the functionality of its single-device counterpart. It is not UI-compatible and may not work with complex iteration listeners. This is due to how the model is distributed and copied in the background.

t-SNE Visualization

Data visualization of higher-dimensional data with t-SNE.

t-distributed stochastic neighbor embedding (t-SNE) is a data-visualization tool created by Laurens van der Maaten at Delft University of Technology.

While it can be used for any data, t-SNE (pronounced Tee-Snee) is only really meaningful with labeled data, which clarify how the input is clustering. Below, you can see the kind of graphic you can generate in DL4J with t-SNE working on MNIST data.

Look closely and you can see the numerals clustered near their likes, alongside the dots.

Here's how t-SNE appears in Deeplearning4j code.

Here is an image of the tsne-standard-coords.csv file plotted using gnuplot.

Transfer Learning

DL4J’s Transfer Learning API

The DL4J transfer learning API enables users to:

Modify the architecture of an existing model
Fine tune learning configurations of an existing model.
Hold parameters of a specified layer constant during training, also referred to as “frozen”

Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, the transformed version being the intermediate outputs at the boundary of the frozen layers. This is the process of “feature extraction” from the input data and will be referred to as “featurizing” in this document.

The transfer learning helper

The forward pass to “featurize” the input data on large, pretrained networks can be time consuming. DL4J also provides a TransferLearningHelper class with the following capabilities.

Featurize an input dataset to save for future use
Fit the model with frozen layers with a featurized dataset
Output from the model with frozen layers given a featurized input.

When running multiple epochs users will save on computation time since the expensive forward pass on the frozen layers/vertices will only have to be conducted once.
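The saving can be seen in a toy example. The plain-Java sketch below does not use TransferLearningHelper; it replaces the frozen layers with a stand-in function, counts how often that function is called, and trains a one-weight "head" on the cached features. All names (`FeaturizeSketch`, `frozenForward`, `trainHeadOnFeaturized`) are hypothetical.

```java
// Hypothetical helper, not part of DL4J: featurize once, train many epochs.
final class FeaturizeSketch {

    // Counts invocations of the (pretend) expensive frozen forward pass.
    static int frozenCalls = 0;

    // Stand-in for the forward pass through the frozen layers.
    static double frozenForward(double x) {
        frozenCalls++;
        return Math.tanh(2.0 * x);
    }

    // Runs the frozen pass once per example, caches the result, then fits a
    // single trainable weight w on the cached features with plain SGD.
    static double trainHeadOnFeaturized(double[] inputs, double[] labels, int epochs, double learningRate) {
        double[] features = new double[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            features[i] = frozenForward(inputs[i]);
        }
        double w = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < features.length; i++) {
                double error = w * features[i] - labels[i];
                w -= learningRate * error * features[i]; // gradient of squared error
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] inputs = {-1.0, -0.5, 0.5, 1.0};
        double[] labels = {-1.0, -0.5, 0.5, 1.0};
        double w = trainHeadOnFeaturized(inputs, labels, 10, 0.5);
        // 4 examples and 10 epochs, but only 4 frozen forward passes.
        System.out.println("frozen forward passes: " + frozenCalls + ", learned w = " + w);
    }
}
```

Without caching, the same run would call the frozen pass once per example per epoch, that is 40 times instead of 4.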

Show me the code

This example will use VGG16 to classify images belonging to five categories of flowers. The dataset will automatically download from http://download.tensorflow.org/example_images/flower_photos.tgz

I. Import a zoo model

Deeplearning4j has a new native model zoo. Read about the deeplearning4j-zoo module for more information on using pretrained models. Here, we load a pretrained VGG-16 model initialized with weights trained on ImageNet:

ZooModel zooModel = VGG16.builder().build();
ComputationGraph pretrainedNet = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);

II. Set up a fine-tune configuration

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Nesterovs(5e-5))
            .seed(seed)
            .build();

III. Build new models based on VGG16

A. Modifying only the last layer, keeping the others frozen

The final layer of VGG16 does a softmax regression on the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes keeping the other layers frozen.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
        .fineTuneConfiguration(fineTuneConf)
        .setFeatureExtractor("fc2")
        .removeVertexKeepConnections("predictions")
        .addLayer("predictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(numClasses)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX).build(),
                "fc2")
        .build();

After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy > 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.

B. Attach new layers to the bottleneck (block5_pool)

Here we hold all but the last three dense layers frozen and attach new dense layers onto it. Note that the primary intent here is to demonstrate the use of the API, secondary to what might give better results.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
        .fineTuneConfiguration(fineTuneConf)
        .setFeatureExtractor("block5_pool")
        .nOutReplace("fc2", 1024, WeightInit.XAVIER)
        .removeVertexAndConnections("predictions")
        .addLayer("fc3",
                new DenseLayer.Builder()
                        .activation(Activation.RELU)
                        .nIn(1024).nOut(256).build(),
                "fc2")
        .addLayer("newpredictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .activation(Activation.SOFTMAX)
                        .nIn(256).nOut(numClasses).build(),
                "fc3")
        .setOutputs("newpredictions")
        .build();

C. Fine tune layers from a previously saved model

Say we have saved off our model from (B) and now want to allow “block_5” layers to train.

ComputationGraph vgg16FineTune = new TransferLearning.GraphBuilder(vgg16Transfer)
              .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("block4_pool")
              .build();

IV. Saving “featurized” datasets and training with them.

We use the transfer learning helper API. Note this freezes the layers of the model passed in.

Here is how you obtain the featurized version of the dataset at the specified layer “fc2”.

TransferLearningHelper transferLearningHelper =
        new TransferLearningHelper(pretrainedNet, "fc2");
while (trainIter.hasNext()) {
    DataSet currentFeaturized = transferLearningHelper.featurize(trainIter.next());
    saveToDisk(currentFeaturized, trainDataSaved, true);
    trainDataSaved++;
}

Here is how you can fit with a featurized dataset. vgg16Transfer is the model set up in (A) of section III.

TransferLearningHelper transferLearningHelper = 
    new TransferLearningHelper(vgg16Transfer);
while (trainIter.hasNext()) {
       transferLearningHelper.fitFeaturized(trainIter.next());
}

Notes

The TransferLearning builder returns a new instance of a dl4j model.

Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take memory requirements into consideration and adjust your JVM heap space accordingly.

The trained model helper imports models from Keras without enforcing a training configuration.

Therefore the last layer (as seen when printing the summary) is a dense layer and not an output layer with a loss function. To modify nOut of an output layer, we delete the layer vertex, keeping its connections, and add back in a new output layer with the same name, a different nOut, the suitable loss function, and so on.

Changing nOuts at a layer/vertex will modify nIn of the layers/vertices it fans into.

When changing nOut users can specify a weight initialization scheme or a distribution for the layer as well as a separate weight initialization scheme or distribution for the layers it fans out to.

Frozen layer configurations are not saved when writing the model to disk.

In other words, a model with frozen layers when serialized and read back in will not have any frozen layers. To continue training holding specific layers constant the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to “freeze” layers in a dl4j model.

On a copy: With the transfer learning API which will return a new model with the relevant frozen layers
In place: With the transfer learning helper API which will apply the frozen layers to the given model.

FineTune configurations will selectively update learning parameters.

For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.

Utilities

Keras Import

Overview

Overview of model import.

Deeplearning4j: Keras model import

Keras model import provides routines for importing neural network models originally configured and trained using Keras, a popular Python deep learning library.

Once you have imported your model into DL4J, our full production stack is at your disposal. We support import of all Keras model types, most layers and practically all utility functionality. Please check here for a complete list of supported Keras features.

Getting started: Import a Keras model in 60 seconds

To import a Keras model, you need to create and serialize such a model first. Here's a simple example that you can use. The model is a simple MLP that takes mini-batches of vectors of length 100, has two Dense layers and predicts a total of 10 categories. After defining the model, we serialize it in HDF5 format.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])

model.save('simple_mlp.h5')

If you put this model file (simple_mlp.h5) into the base of your resource folder of your project, you can load the Keras model as DL4J MultiLayerNetwork as follows

This shows only how to import a Keras Sequential model. For more details take a look at both Functional Model import and Sequential Model import.

String simpleMlp = new ClassPathResource("simple_mlp.h5").getFile().getPath();
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(simpleMlp);

That's it! KerasModelImport is your main entry point to model import; the class takes care of mapping Keras to DL4J concepts internally. As a user, you just have to provide your model file. See our Getting started guide for more details and options to load Keras models into DL4J.

You can now use your imported model for inference (here with dummy data for simplicity)

INDArray input = Nd4j.create(DataType.FLOAT, 256, 100);
INDArray output = model.output(input);

Here's how you do training in DL4J for your imported model:

model.fit(input, output);

The full example just shown can be found in our DL4J examples.

Project setup

To use Keras model import in your existing project, all you need to do is add the following dependency to your pom.xml.

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-modelimport</artifactId>
    <version>1.0.0-beta6</version> <!-- This version should match that of your other DL4J project dependencies. -->
</dependency>

If you need a project to get started in the first place, consider cloning DL4J examples and follow the instructions in the repository to build the project.

Backend

DL4J Keras model import is backend agnostic. No matter which backend you choose (TensorFlow, Theano, CNTK), your models can be imported into DL4J.

Popular models and applications

We support import for a growing number of applications, check here for a full list of currently covered models. These applications include

Deep convolutional and Wasserstein GANs
UNET
ResNet50
SqueezeNet
MobileNet
Inception
Xception

Troubleshooting and support

An IncompatibleKerasConfigurationException message indicates that you are attempting to import a Keras model configuration that is not currently supported in Deeplearning4j (either because model import does not cover it, or DL4J does not implement the layer, or feature).

Once you have imported your model, we recommend our own ModelSerializer class for further saving and reloading of your model.

You can inquire further by visiting the community forums. You might consider filing a feature request via GitHub so that this missing functionality can be placed on the DL4J development roadmap, or even sending us a pull request with the necessary changes!

Why Keras model import?

Keras is a popular and user-friendly deep learning library written in Python. The intuitive API of Keras makes defining and running your deep learning models in Python easy. Keras allows you to choose which lower-level library it runs on, but provides a unified API for each such backend. Currently, Keras supports TensorFlow, CNTK and Theano backends.

There is often a gap between the production system of a company and the experimental setup of its data scientists. Keras model import allows data scientists to write their models in Python while still integrating seamlessly with the production stack.

Keras model import is targeted at users mainly familiar with writing their models in Python with Keras. With model import you can bring your Python models to production by importing them into the DL4J ecosystem for either further training or evaluation purposes.

You should use this module when the experimentation phase of your project is completed and you need to ship your models to production. Konduit offers commercial support for Keras implementations in enterprise.

Get Started

Getting started with model import.

The video accompanying this section demonstrates working code to load a Keras model into Deeplearning4j and validate the working network. Instructor Tom Hanlon provides an overview of a simple classifier over Iris data built in Keras with a Theano backend, and exported and loaded into Deeplearning4j.

Activations

Supported Keras activations.

We support all Keras activation functions, namely:

softmax
elu
selu
softplus
softsign
relu
tanh
sigmoid
hard_sigmoid
linear

The mapping of Keras to DL4J activation functions is defined in KerasActivationUtils.

Losses

Supported Keras loss functions.

DL4J supports all available Keras loss functions (except for logcosh), namely:

mean_squared_error
mean_absolute_error
mean_absolute_percentage_error
mean_squared_logarithmic_error
squared_hinge
hinge
categorical_hinge
logcosh
categorical_crossentropy
sparse_categorical_crossentropy
binary_crossentropy
kullback_leibler_divergence
poisson
cosine_proximity

The mapping of Keras loss functions can be found in .

Regularizers

Supported Keras regularizers.

All Keras regularizers are supported by DL4J model import:

l1
l2
l1_l2

Mapping of regularizers can be found in .

Recurrent Neural Network

Recurrent Neural Network (RNN) implementations in DL4J.

This document outlines the specific training features of recurrent neural networks and the practicalities of how to use them in Deeplearning4j. It is not an introduction to recurrent neural networks, and assumes some familiarity with both their use and terminology.

The Basics: Data and Network Configuration

DL4J currently supports the following types of recurrent neural network:

RNN ("vanilla" RNN)
LSTM (Long Short-Term Memory)

Java documentation for each is available: SimpleRnn, LSTM.

Data for RNNs

Consider for the moment a standard feed-forward network (a multi-layer perceptron or 'DenseLayer' in DL4J). These networks expect input and output data that is two-dimensional: that is, data with "shape" [numExamples,inputSize]. This means that the data into a feed-forward network has ‘numExamples’ rows/examples, where each row consists of ‘inputSize’ columns. A single example would have shape [1,inputSize], though in practice we generally use multiple examples for computational and optimization efficiency. Similarly, output data for a standard feed-forward network is also two dimensional, with shape [numExamples,outputSize].

Conversely, data for RNNs are time series. Thus, they have 3 dimensions: one additional dimension for time. Input data thus has shape [numExamples,inputSize,timeSeriesLength], and output data has shape [numExamples,outputSize,timeSeriesLength]. This means that the data in our INDArray is laid out such that the value at position (i,j,k) is the jth value at the kth time step of the ith example in the minibatch. This data layout is shown below.

When importing time series data using the class CSVSequenceRecordReader, each line in the data files represents one time step, with the earliest time series observation in the first row (or first row after the header, if present) and the most recent observation in the last row of the CSV. Each feature time series is a separate column of the CSV file. For example, if you have five features in a time series, each with 120 observations, and a training & test set of size 53, then there will be 106 input CSV files (53 input, 53 labels). The 53 input CSV files will each have five columns and 120 rows. The label CSV files will have one column (the label) and one row.
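The relationship between the CSV layout (one row per time step, one column per feature) and the [inputSize,timeSeriesLength] layout of a single example can be sketched in plain Java. This is only an illustration with ordinary arrays: the class and method names are hypothetical and it does not use DataVec or INDArray.

```java
import java.util.Arrays;

// Illustrative only: plain Java arrays, not DL4J or DataVec classes.
class SequenceLayoutSketch {

    // Converts one time series from "CSV order" (rows = time steps, columns = features)
    // to the [inputSize][timeSeriesLength] layout used for a single RNN example.
    static double[][] toFeatureMajor(double[][] rowsByTimeStep) {
        int timeSeriesLength = rowsByTimeStep.length;
        int inputSize = rowsByTimeStep[0].length;
        double[][] example = new double[inputSize][timeSeriesLength];
        for (int k = 0; k < timeSeriesLength; k++) {
            for (int j = 0; j < inputSize; j++) {
                example[j][k] = rowsByTimeStep[k][j]; // jth value at the kth time step
            }
        }
        return example;
    }

    public static void main(String[] args) {
        // Three time steps (earliest first), two features per time step.
        double[][] csvRows = {
            {1.0, 10.0},
            {2.0, 20.0},
            {3.0, 30.0}
        };
        // Prints [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
        System.out.println(Arrays.deepToString(toFeatureMajor(csvRows)));
    }
}
```

Stacking several such examples along a leading dimension gives the [numExamples,inputSize,timeSeriesLength] shape described above.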

RnnOutputLayer

RnnOutputLayer is a type of layer used as the final layer with many recurrent neural network systems (for both regression and classification tasks). RnnOutputLayer handles things like score calculation, and error calculation (of prediction vs. actual) given a loss function etc. Functionally, it is very similar to the 'standard' OutputLayer class (which is used with feed-forward networks); however it both outputs (and expects as labels/targets) 3d time series data sets.

Configuration for the RnnOutputLayer follows the same design as other layers: for example, to set the third layer in a MultiLayerNetwork to a RnnOutputLayer for classification:

.layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT).activation(Activation.SOFTMAX)
.weightInit(WeightInit.XAVIER).nIn(prevLayerSize).nOut(nOut).build())

Use of RnnOutputLayer in practice can be seen in the examples, linked at the end of this document.

RNN Training Features

Truncated Back Propagation Through Time

Training neural networks (including RNNs) can be quite computationally demanding. For recurrent neural networks, this is especially the case when we are dealing with long sequences - i.e., training data with many time steps.

Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. In summary, it allows us to train networks faster (by performing more frequent parameter updates), for a given amount of computational power. It is recommended to use truncated BPTT when your input sequences are long (typically, more than a few hundred time steps).

Consider what happens when training a recurrent neural network with a time series of length 12 time steps. Here, we need to do a forward pass of 12 steps, calculate the error (based on predicted vs. actual), and do a backward pass of 12 time steps:

For 12 time steps, in the image above, this is not a problem. Consider, however, that instead the input time series was 10,000 or more time steps. In this case, standard backpropagation through time would require 10,000 time steps for each of the forward and backward passes for each and every parameter update. This is of course very computationally demanding.

In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. The specific length of these forward/backward pass segments is a parameter set by the user. For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:

Note that the overall complexity of truncated BPTT and standard BPTT is approximately the same: both do the same number of time steps during the forward/backward pass. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort. The cost is not exactly the same, though, as there is a small amount of overhead per parameter update.
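To make the segment arithmetic concrete, here is a small plain-Java sketch of how a sequence could be cut into truncated BPTT segments. The class and method names are hypothetical, and it only computes segment boundaries: it does not train anything and is not DL4J's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows how a sequence is divided into truncated BPTT segments.
class TbpttSegmentsSketch {

    // Returns {startInclusive, endExclusive} pairs covering time steps 0..timeSeriesLength-1.
    static List<int[]> segments(int timeSeriesLength, int tbpttLength) {
        if (tbpttLength <= 0) {
            throw new IllegalArgumentException("tbpttLength must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < timeSeriesLength; start += tbpttLength) {
            int end = Math.min(start + tbpttLength, timeSeriesLength);
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        // 12 time steps with segments of length 4, as in the example in the text.
        for (int[] segment : segments(12, 4)) {
            System.out.println("forward/backward over steps " + segment[0] + ".." + (segment[1] - 1)
                    + ", then one parameter update");
        }
    }
}
```

With 12 time steps and a segment length of 4 this yields three segments, and therefore three parameter updates per sequence instead of one.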

The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don't flow back far enough to cause the required parameter updates that would store the required information. This tradeoff is usually worth it, and (as long as the truncated BPTT lengths are set appropriately), truncated BPTT works well in practice.

Using truncated BPTT in DL4J is quite simple: just add the following code to your network configuration (at the end, before the final .build() in your network configuration)

.backpropType(BackpropType.TruncatedBPTT)
.tBPTTLength(100)

The above code snippet will cause any network training (i.e., calls to MultiLayerNetwork.fit() methods) to use truncated BPTT with segments of length 100 steps.

Some things of note:

By default (if a backprop type is not manually specified), DL4J will use BackpropType.Standard (i.e., full BPTT).
The tBPTTLength configuration parameter sets the length of the truncated BPTT passes. Typically, this is somewhere on the order of 50 to 200 time steps, though it depends on the application and data.
The truncated BPTT length is typically a fraction of the total time series length (e.g., 200 vs. sequence length 1000), but variable length time series in the same minibatch are OK when using TBPTT (for example, a minibatch with two sequences, one of length 100 and another of length 1000, with a TBPTT length of 200 will work correctly).

Masking: One-to-Many, Many-to-One, and Sequence Classification

DL4J supports a number of related training features for RNNs, based on the idea of padding and masking. Padding and masking allow us to support training situations including one-to-many and many-to-one, and also to support variable length time series (in the same mini-batch).

Suppose we want to train a recurrent neural network with inputs or outputs that don't occur at every time step. Examples of this (for a single example) are shown in the image below. DL4J supports training networks for all of these situations:

Without masking and padding, we are restricted to the many-to-many case (above, left): that is, (a) All examples are of the same length, and (b) Examples have both inputs and outputs at all time steps.

The idea behind padding is simple. Consider two time series of lengths 50 and 100 time steps, in the same mini-batch. The training data is a rectangular array; thus, we pad (i.e., add zeros to) the shorter time series (for both input and output), such that the input and output are both the same length (in this example: 100 time steps).

Of course, if this was all we did, it would cause problems during training. Thus, in addition to padding, we use a masking mechanism. The idea behind masking is simple: we have two additional arrays that record whether an input or output is actually present for a given time step and example, or whether the input/output is just padding.

Recall that with RNNs, our minibatch data has 3 dimensions, with shape [miniBatchSize,inputSize,timeSeriesLength] and [miniBatchSize,outputSize,timeSeriesLength] for the input and output respectively. The masking arrays are then 2 dimensional, with shape [miniBatchSize,timeSeriesLength] for both the input and output, with values of 0 ('absent') or 1 ('present') for each time step and example. The masking arrays for the input and output are stored in separate arrays.
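The padding and masking idea can be sketched with plain Java arrays standing in for INDArrays. The class and method names here are hypothetical, and the sketch pads at the end of each shorter series; it is meant to show the shapes and the 0/1 convention, not how DL4J builds these arrays internally.

```java
import java.util.Arrays;

// Illustrative only: plain Java arrays standing in for INDArrays.
class PaddingMaskSketch {

    // sequences[i] has shape [inputSize][length_i];
    // the result has shape [miniBatchSize][inputSize][maxLength], zero-padded at the end.
    static double[][][] pad(double[][][] sequences, int maxLength) {
        int inputSize = sequences[0].length;
        double[][][] padded = new double[sequences.length][inputSize][maxLength]; // zero-filled
        for (int i = 0; i < sequences.length; i++) {
            for (int j = 0; j < inputSize; j++) {
                System.arraycopy(sequences[i][j], 0, padded[i][j], 0, sequences[i][j].length);
            }
        }
        return padded;
    }

    // Mask of shape [miniBatchSize][maxLength]: 1 = data present, 0 = padding.
    static double[][] mask(double[][][] sequences, int maxLength) {
        double[][] mask = new double[sequences.length][maxLength];
        for (int i = 0; i < sequences.length; i++) {
            int length = sequences[i][0].length;
            for (int k = 0; k < length; k++) {
                mask[i][k] = 1.0;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        double[][][] sequences = {
            {{1, 2}},        // one feature, 2 time steps
            {{3, 4, 5, 6}}   // one feature, 4 time steps
        };
        System.out.println(Arrays.deepToString(pad(sequences, 4)));
        System.out.println(Arrays.deepToString(mask(sequences, 4)));
    }
}
```

Here the shorter series is padded with two zeros, and its mask row is [1, 1, 0, 0] so that the padded steps can be ignored.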

For a single example, the input and output masking arrays are shown below:

For the “Masking not required” cases, we could equivalently use a masking array of all 1s, which will give the same result as not having a mask array at all. Also note that it is possible to use zero, one or two masking arrays when learning RNNs - for example, the many-to-one case could have a masking array for the output only.

In practice, these masking arrays are generally created during the data import stage (for example, by the SequenceRecordReaderDataSetIterator, discussed later), and are contained within the DataSet object. If a DataSet contains masking arrays, MultiLayerNetwork.fit() will automatically use them during training. If they are absent, no masking functionality is used.

Evaluation and Scoring with Masking

Mask arrays are also important when doing scoring and evaluation (i.e., when evaluating the accuracy of a RNN classifier). Consider for example the many-to-one case: there is only a single output for each example, and any evaluation should take this into account.

The (output) mask array can be used during evaluation by passing it to the following method:

Evaluation.evalTimeSeries(INDArray labels, INDArray predicted, INDArray outputMask)

where labels are the actual output (3d time series), predicted is the network predictions (3d time series, same shape as labels), and outputMask is the 2d mask array for the output. Note that the input mask array is not required for evaluation.

Score calculation will also make use of the mask arrays, via the MultiLayerNetwork.score(DataSet) method. Again, if the DataSet contains an output masking array, it will automatically be used when calculating the score (loss function - mean squared error, negative log likelihood etc) for the network.
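As a rough illustration of why the output mask matters for evaluation, the following plain-Java sketch computes classification accuracy while skipping masked time steps. The class and method names are hypothetical, and this is a simplified stand-in for the idea, not the code behind Evaluation.evalTimeSeries.

```java
// Illustrative only: masked accuracy over plain Java arrays.
class MaskedAccuracySketch {

    // labels and predicted: [numExamples][numClasses][timeSeriesLength]
    // outputMask: [numExamples][timeSeriesLength], 1 = label present, 0 = ignore
    static double accuracy(double[][][] labels, double[][][] predicted, double[][] outputMask) {
        int correct = 0;
        int counted = 0;
        for (int i = 0; i < labels.length; i++) {
            int timeSeriesLength = labels[i][0].length;
            for (int k = 0; k < timeSeriesLength; k++) {
                if (outputMask[i][k] == 0.0) {
                    continue; // no label at this time step, so it must not affect the result
                }
                if (argMaxOverClasses(labels[i], k) == argMaxOverClasses(predicted[i], k)) {
                    correct++;
                }
                counted++;
            }
        }
        return counted == 0 ? 0.0 : (double) correct / counted;
    }

    // Index of the largest class value at time step k.
    static int argMaxOverClasses(double[][] classByTime, int k) {
        int best = 0;
        for (int c = 1; c < classByTime.length; c++) {
            if (classByTime[c][k] > classByTime[best][k]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][][] labels = {{{1, 1, 0}, {0, 0, 1}}};                  // class 0, class 0, class 1
        double[][][] predicted = {{{0.2, 0.4, 0.1}, {0.8, 0.6, 0.9}}};   // class 1 at every step
        System.out.println(accuracy(labels, predicted, new double[][] {{0, 0, 1}})); // many-to-one mask
        System.out.println(accuracy(labels, predicted, new double[][] {{1, 1, 1}})); // no masking
    }
}
```

In the many-to-one case only the final time step is counted, so predictions at the unlabeled steps neither help nor hurt the reported accuracy.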

Masking and Sequence Classification After Training

Sequence classification is one common use of masking. The idea is that although we have a sequence (time series) as input, we only want to provide a single label for the entire sequence (rather than one label at each time step in the sequence).

However, RNNs by design output sequences of the same length as the input sequence. For sequence classification, masking allows us to train the network with this single label at the final time step: we essentially tell the network that there isn't actually label data anywhere except for the last time step.

Now, suppose we've trained our network, and want to get the last time step for predictions, from the time series output array. How do we do that?

To get the last time step, there are two cases to be aware of. First, when we have a single example, we don't actually need to use the mask arrays: we can just get the last time step in the output array:

    INDArray timeSeriesFeatures = ...;
    INDArray timeSeriesOutput = myNetwork.output(timeSeriesFeatures);
    long timeSeriesLength = timeSeriesOutput.size(2);        //Size of time dimension
    INDArray lastTimeStepProbabilities = timeSeriesOutput.get(NDArrayIndex.point(0), NDArrayIndex.all(), NDArrayIndex.point(timeSeriesLength-1));

Assuming classification (same process for regression, however) the last line above gives us probabilities at the last time step - i.e., the class probabilities for our sequence classification.

The slightly more complex case is when we have multiple examples in the one minibatch (features array), where the lengths of each example differ. (If all are the same length: we can use the same process as above).

In this 'variable length' case, we need to get the last time step for each example separately. If we have the time series lengths for each example from our data pipeline, it becomes straightforward: we just iterate over examples, replacing the timeSeriesLength in the above code with the length of that example.

If we don't have the lengths of the time series directly, we need to extract them from the mask array.

If we have a labels mask array (which is a one-hot vector, like [0,0,0,1,0] for each time series):

    INDArray labelsMaskArray = ...;
    INDArray lastTimeStepIndices = Nd4j.argMax(labelsMaskArray,1);

Alternatively, if we have only the features mask: One quick and dirty approach is to use this:

    INDArray featuresMaskArray = ...;
    long longestTimeSeries = featuresMaskArray.size(1);
    INDArray linspace = Nd4j.linspace(1,longestTimeSeries,longestTimeSeries);
    //Multiply each mask row by [1,2,...,T]: the mask has shape [miniBatchSize,T], so a row vector is required
    INDArray temp = featuresMaskArray.mulRowVector(linspace);
    INDArray lastTimeStepIndices = Nd4j.argMax(temp,1);

To understand what is happening here, note that originally we have a features mask like [1,1,1,1,0], from which we want to get the last non-zero element. So we map [1,1,1,1,0] -> [1,2,3,4,0], and then get the index of the largest element (which is the last time step).
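The same mapping can be traced by hand for a single mask row using plain Java. This is a minimal sketch with a hypothetical class and method name; it mirrors the multiply-then-argmax trick on an ordinary array rather than calling ND4J.

```java
// Illustrative only: the "multiply by 1..T, then argmax" trick on a single mask row.
class LastTimeStepIndexSketch {

    // Returns the index of the last time step whose mask value is 1.
    static int lastTimeStepIndex(double[] maskRow) {
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < maskRow.length; k++) {
            double weighted = maskRow[k] * (k + 1); // [1,1,1,1,0] -> [1,2,3,4,0]
            if (weighted > bestValue) {
                bestValue = weighted;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Prints 3: the last present time step of [1,1,1,1,0] has index 3.
        System.out.println(lastTimeStepIndex(new double[] {1, 1, 1, 1, 0}));
    }
}
```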

In either case, we can then do the following:

    long numExamples = timeSeriesFeatures.size(0);
    for( int i=0; i<numExamples; i++ ){
        int thisTimeSeriesLastIndex = lastTimeStepIndices.getInt(i);
        INDArray thisExampleProbabilities = timeSeriesOutput.get(NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(thisTimeSeriesLastIndex));
    }

Combining RNN Layers with Other Layer Types

RNN layers in DL4J can be combined with other layer types. For example, it is possible to combine DenseLayer and LSTM layers in the same network; or combine Convolutional (CNN) layers and LSTM layers for video.

Of course, the DenseLayer and Convolutional layers do not handle time series data - they expect a different type of input. To deal with this, we need to use the layer preprocessor functionality: for example, the CnnToRnnPreProcessor and FeedForwardToRnnPreProcessor classes. See here for all preprocessors. Fortunately, in most situations, the DL4J configuration system will automatically add these preprocessors as required. However, the preprocessors can also be added manually (overriding the automatic addition of preprocessors, for each layer).

For example, to manually add a preprocessor between layers 1 and 2, add the following to your network configuration: .inputPreProcessor(2, new RnnToFeedForwardPreProcessor()).

Inference: Predictions One Step at a Time

As with other types of neural networks, predictions can be generated for RNNs using the MultiLayerNetwork.output() and MultiLayerNetwork.feedForward() methods. These methods can be useful in many circumstances; however, they have the limitation that we can only generate predictions for time series, starting from scratch each and every time.

Consider for example the case where we want to generate predictions in a real-time system, where these predictions are based on a very large amount of history. In this case, it is impractical to use the output/feedForward methods, as they conduct the full forward pass over the entire data history, each time they are called. If we wish to make a prediction for a single time step, at every time step, these methods can be both (a) very costly, and (b) wasteful, as they do the same calculations over and over.

For these situations, MultiLayerNetwork provides four methods of note:

rnnTimeStep(INDArray)
rnnClearPreviousState()
rnnGetPreviousState(int layer)
rnnSetPreviousState(int layer, Map<String,INDArray> state)

The rnnTimeStep() method is designed to allow forward pass (predictions) to be conducted efficiently, one or more steps at a time. Unlike the output/feedForward methods, the rnnTimeStep method keeps track of the internal state of the RNN layers when it is called. It is important to note that output for the rnnTimeStep and the output/feedForward methods should be identical (for each time step), whether we make these predictions all at once (output/feedForward) or whether these predictions are generated one or more steps at a time (rnnTimeStep). Thus, the only difference should be the computational cost.

In summary, the MultiLayerNetwork.rnnTimeStep() method does two things:

Generate output/predictions (forward pass), using the previous stored state (if any)
Update the stored state, storing the activations for the last time step (ready to be used next time rnnTimeStep is called)

For example, suppose we want to use a RNN to predict the weather, one hour in advance (based on the weather at say the previous 100 hours as input). If we were to use the output method, at each hour we would need to feed in the full 100 hours of data to predict the weather for hour 101. Then to predict the weather for hour 102, we would need to feed in the full 100 (or 101) hours of data; and so on for hours 103+.

Alternatively, we could use the rnnTimeStep method. Of course, if we want to use the full 100 hours of history before we make our first prediction, we still need to do the full forward pass:

For the first time we call rnnTimeStep, the only practical difference between the two approaches is that the activations/state of the last time step are stored - this is shown in orange. However, the next time we use the rnnTimeStep method, this stored state will be used to make the next predictions:

There are a number of important differences here:

In the second image (second call of rnnTimeStep) the input data consists of a single time step, instead of the full history of data
The forward pass is thus a single time step (as compared to the hundreds – or more)
After the rnnTimeStep method returns, the internal state will automatically be updated. Thus, predictions for time 103 could be made in the same way as for time 102. And so on.

However, if you want to start making predictions for a new (entirely separate) time series: it is necessary (and important) to manually clear the stored state, using the MultiLayerNetwork.rnnClearPreviousState() method. This will reset the internal state of all recurrent layers in the network.
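The difference between a stateless full forward pass and stateful stepping can be sketched with a toy single-unit RNN in plain Java. Everything here is an assumption made for illustration: the class, its fixed weights, and the tanh update are hypothetical, and the method names only echo the MultiLayerNetwork methods discussed above.

```java
// Illustrative only: a toy single-unit RNN, not a DL4J network.
class StatefulRnnSketch {
    private final double inputWeight = 0.5;      // arbitrary fixed weights for the sketch
    private final double recurrentWeight = 0.8;
    private double storedState = 0.0;            // state kept between rnnTimeStep calls

    // Full forward pass, always starting from a zero state (like output()/feedForward()).
    double[] output(double[] inputs) {
        double state = 0.0;
        double[] out = new double[inputs.length];
        for (int t = 0; t < inputs.length; t++) {
            state = Math.tanh(inputWeight * inputs[t] + recurrentWeight * state);
            out[t] = state;
        }
        return out;
    }

    // Forward pass for one or more steps, continuing from (and then updating) the stored state.
    double[] rnnTimeStep(double... inputs) {
        double[] out = new double[inputs.length];
        for (int t = 0; t < inputs.length; t++) {
            storedState = Math.tanh(inputWeight * inputs[t] + recurrentWeight * storedState);
            out[t] = storedState;
        }
        return out;
    }

    // Resets the stored state, as needed before starting a new, unrelated time series.
    void rnnClearPreviousState() {
        storedState = 0.0;
    }

    public static void main(String[] args) {
        StatefulRnnSketch net = new StatefulRnnSketch();
        double[] full = net.output(new double[] {0.1, 0.4, -0.2, 0.7, 0.3});
        net.rnnTimeStep(0.1, 0.4, -0.2, 0.7);        // feed the history once
        double[] next = net.rnnTimeStep(0.3);        // then a single new time step
        System.out.println(full[4] + " vs " + next[0]);
    }
}
```

The two printed values agree, because the single-step call reuses the state left by the history instead of recomputing it.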

If you need to store or set the internal state of the RNN for use in predictions, you can use the rnnGetPreviousState and rnnSetPreviousState methods, for each layer individually. This can be useful for example during serialization (network saving/loading), as the internal network state from the rnnTimeStep method is not saved by default, and must be saved and loaded separately. Note that these get/set state methods return and accept a map, keyed by the type of activation. For example, in the LSTM model, it is necessary to store both the output activations, and the memory cell state.

Some other points of note:

We can use the rnnTimeStep method for multiple independent examples/predictions simultaneously. In the weather example above, we might for example want to make predictions for multiple locations using the same neural network. This works in the same way as training and the forward pass / output methods: multiple rows (dimension 0 in the input data) are used for multiple examples.
If no history/stored state is set (i.e., initially, or after a call to rnnClearPreviousState), a default initialization (zeros) is used. This is the same approach as during training.
The rnnTimeStep can be used for an arbitrary number of time steps simultaneously – not just one time step. However, it is important to note:
- For a single time step prediction: the data is 2 dimensional, with shape [numExamples,nIn]; in this case, the output is also 2 dimensional, with shape [numExamples,nOut]
- For multiple time step predictions: the data is 3 dimensional, with shape [numExamples,nIn,numTimeSteps]; the output will have shape [numExamples,nOut,numTimeSteps]. Again, the final time step activations are stored as before.
It is not possible to change the number of examples between calls of rnnTimeStep (in other words, if the first use of rnnTimeStep is for say 3 examples, all subsequent calls must be with 3 examples). After resetting the internal state (using rnnClearPreviousState()), any number of examples can be used for the next call of rnnTimeStep.
The rnnTimeStep method makes no changes to the parameters; it is only used after training of the network has been completed.
The rnnTimeStep method works with networks containing single and stacked/multiple RNN layers, as well as with networks that combine other layer types (such as Convolutional or Dense layers).
The RnnOutputLayer layer type does not have any internal state, as it does not have any recurrent connections.

Loading Time Series Data

Data import for RNNs is complicated by the fact that we have multiple different types of data we could want to use for RNNs: one-to-many, many-to-one, variable length time series, etc. This section will describe the currently implemented data import mechanisms for DL4J.

The methods described here utilize the SequenceRecordReaderDataSetIterator class, in conjunction with the CSVSequenceRecordReader class from DataVec. This approach currently allows you to load delimited (tab, comma, etc) data from files, where each time series is in a separate file. This method also supports:

Variable length time series input
One-to-many and many-to-one data loading (where input and labels are in different files)
Label conversion from an index to a one-hot representation for classification (i.e., '2' to [0,0,1,0])
Skipping a fixed/specified number of rows at the start of the data files (i.e., comment or header rows)

Note that in all cases, each line in the data files represents one time step.

(In addition to the examples below, you might find these unit tests to be of some use.)

Example 1: Time Series of Same Length, Input and Labels in Separate Files

Suppose we have 10 time series in our training data, represented by 20 files: 10 files for the input of each time series, and 10 files for the output/labels. For now, assume these 20 files all contain the same number of time steps (i.e., same number of rows).

To use the SequenceRecordReaderDataSetIterator and CSVSequenceRecordReader approaches, we first create two CSVSequenceRecordReader objects, one for input and one for labels:

SequenceRecordReader featureReader = new CSVSequenceRecordReader(1, ",");
SequenceRecordReader labelReader = new CSVSequenceRecordReader(1, ",");

This particular constructor takes the number of lines to skip (1 row skipped here), and the delimiter (comma character used here).

Second, we need to initialize these two readers, by telling them where to get the data from. We do this with an InputSplit object. Suppose that our time series are numbered, with file names "myInput_0.csv", "myInput_1.csv", ..., "myLabels_0.csv", etc. One approach is to use the NumberedFileInputSplit:

featureReader.initialize(new NumberedFileInputSplit("/path/to/data/myInput_%d.csv", 0, 9));
labelReader.initialize(new NumberedFileInputSplit("/path/to/data/myLabels_%d.csv", 0, 9));

In this particular approach, the "%d" is replaced by the corresponding number, and the numbers 0 to 9 (both inclusive) are used.
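The substitution described above can be previewed with plain Java string formatting. This sketch uses a hypothetical helper class and only mimics the "%d" replacement with inclusive bounds; it is not the DataVec implementation of NumberedFileInputSplit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: expands a "%d" path pattern the way the text describes.
class NumberedPathsSketch {

    // Both index bounds are inclusive.
    static List<String> expand(String pattern, int minIdxInclusive, int maxIdxInclusive) {
        List<String> paths = new ArrayList<>();
        for (int i = minIdxInclusive; i <= maxIdxInclusive; i++) {
            paths.add(String.format(pattern, i));
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String path : expand("/path/to/data/myInput_%d.csv", 0, 9)) {
            System.out.println(path);
        }
    }
}
```

Listing the expanded paths like this is a quick way to check that the pattern and index range match the files on disk before initializing the readers.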

Finally, we can create our SequenceRecordReaderDataSetIterator:

DataSetIterator iter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression);

This DataSetIterator can then be passed to MultiLayerNetwork.fit() to train the network.

The miniBatchSize argument specifies the number of examples (time series) in each minibatch. For example, with 10 files total, a miniBatchSize of 5 would give us 2 minibatches (DataSet objects) with 5 time series in each.

Note that:

For classification problems: numPossibleLabels is the number of classes in your data set. Use regression = false.
- Labels data: one value per line, as a class index
- Label data will be converted to a one-hot representation automatically
For regression problems: numPossibleLabels is not used (set it to anything) and use regression = true.
- The number of values in the input and labels can be anything (unlike classification: can have an arbitrary number of outputs)
- No processing of the labels is done when regression = true
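For the classification case, the conversion from a class index to a one-hot vector can be sketched as follows. The helper class is hypothetical and uses a plain array; it shows the transformation the iterator is described as applying, not its actual code.

```java
import java.util.Arrays;

// Illustrative only: class index -> one-hot vector.
class OneHotSketch {

    static double[] oneHot(int classIndex, int numPossibleLabels) {
        if (classIndex < 0 || classIndex >= numPossibleLabels) {
            throw new IllegalArgumentException(
                    "class index " + classIndex + " is outside 0.." + (numPossibleLabels - 1));
        }
        double[] vector = new double[numPossibleLabels]; // zero-filled
        vector[classIndex] = 1.0;
        return vector;
    }

    public static void main(String[] args) {
        // Prints [0.0, 0.0, 1.0, 0.0]: class index 2 with 4 possible labels.
        System.out.println(Arrays.toString(oneHot(2, 4)));
    }
}
```

Note that class indices start at 0, so numPossibleLabels = 4 allows the indices 0 to 3.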

Example 2: Time Series of Same Length, Input and Labels in Same File

Following on from the last example, suppose that instead of separate files for our input data and labels, we have both in the same file. However, each time series is still in a separate file.

As of DL4J 0.4-rc3.8, this approach has the restriction of a single column for the output (either a class index, or a single real-valued regression output)

In this case, we create and initialize a single reader. Again, we are skipping one header row, and specifying the format as comma delimited, and assuming our data files are named "myData_0.csv", ..., "myData_9.csv":

SequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
reader.initialize(new NumberedFileInputSplit("/path/to/data/myData_%d.csv", 0, 9));
DataSetIterator iterClassification = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, numPossibleLabels, labelIndex, false);

miniBatchSize and numPossibleLabels are the same as the previous example. Here, labelIndex specifies which column the labels are in. For example, if the labels are in the fifth column, use labelIndex = 4 (i.e., columns are indexed 0 to numColumns-1).

For regression on a single output value, we use:

DataSetIterator iterRegression = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, -1, labelIndex, true);

Again, the numPossibleLabels argument is not used for regression.

Example 3: Time Series of Different Lengths (Many-to-Many)

Following on from the previous two examples, suppose that for each example individually, the input and labels are of the same length, but these lengths differ between time series.

We can use the same approach (CSVSequenceRecordReader and SequenceRecordReaderDataSetIterator), though with a different constructor:

DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

The arguments here are the same as in the previous example, with the exception of the AlignmentMode.ALIGN_END addition. This alignment mode tells the SequenceRecordReaderDataSetIterator to expect two things:

That the time series may be of different lengths
To align the input and labels - for each example individually - such that their last values occur at the same time step.

Note that if the features and labels are always of the same length (as is the assumption in example 3), then the two alignment modes (AlignmentMode.ALIGN_END and AlignmentMode.ALIGN_START) will give identical outputs. The alignment mode option is explained in the next section.

Also note that variable length time series always start at time zero in the data arrays: padding, if required, will be added after the time series has ended.

Unlike examples 1 and 2 above, the DataSet objects produced by the above variableLengthIter instance will also include input and output masking arrays, as described earlier in this document.

Example 4: Many-to-One and One-to-Many Data

We can also use the AlignmentMode functionality in example 3 to implement a many-to-one RNN sequence classifier. Here, let us assume:

Input and labels are in separate delimited files
The labels files contain a single row (time step) (either a class index for classification, or one or more numbers for regression)
The input lengths may (optionally) differ between examples

In fact, the same approach as in example 3 can do this:

DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

Alignment modes are relatively straightforward. They specify whether to pad the start or the end of the shorter time series. The diagram below shows how this works, along with the masking arrays (as discussed earlier in this document):

The one-to-many case (similar to the last case above, but with only one input) is done by using AlignmentMode.ALIGN_START.
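The effect of the two alignment modes on the mask of the shorter series can be sketched in plain Java. The class, enum, and method below are hypothetical stand-ins that borrow the ALIGN_START and ALIGN_END names for readability; they illustrate the placement rule only.

```java
import java.util.Arrays;

// Illustrative only: where the shorter series sits relative to the longer one.
class AlignmentModeSketch {

    enum Alignment { ALIGN_START, ALIGN_END }

    // Mask over the longer series' time steps: 1 where the shorter series has data, 0 where it is padded.
    static int[] maskForShorterSeries(int longerLength, int shorterLength, Alignment mode) {
        if (shorterLength > longerLength) {
            throw new IllegalArgumentException("shorterLength must not exceed longerLength");
        }
        int[] mask = new int[longerLength];
        int offset = (mode == Alignment.ALIGN_END) ? longerLength - shorterLength : 0;
        for (int k = 0; k < shorterLength; k++) {
            mask[offset + k] = 1;
        }
        return mask;
    }

    public static void main(String[] args) {
        // Many-to-one: 5 input steps, 1 label, aligned at the end. Prints [0, 0, 0, 0, 1]
        System.out.println(Arrays.toString(maskForShorterSeries(5, 1, Alignment.ALIGN_END)));
        // One-to-many: 1 input step, 5 labels, aligned at the start. Prints [1, 0, 0, 0, 0]
        System.out.println(Arrays.toString(maskForShorterSeries(5, 1, Alignment.ALIGN_START)));
    }
}
```

In the many-to-one case the returned array plays the role of the labels mask, and in the one-to-many case it plays the role of the features mask.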

Note that in the case of training data that contains time series of different lengths, the labels and inputs will be aligned for each example individually, and then the shorter time series will be padded as required:

Available layers

LSTM

[source]

LSTM recurrent neural network layer without peephole connections. Supports CuDNN acceleration - see cuDNN for details

RnnLossLayer

[source]

Recurrent Neural Network Loss Layer. Handles calculation of gradients etc. for various objective (loss) functions. Note that RnnLossLayer has no time distributed dense component. Consequently, the output activations size is equal to the input size. Input and output activations are the same as other RNN layers: 3 dimensions with shape [miniBatchSize,nIn,timeSeriesLength] and [miniBatchSize,nOut,timeSeriesLength] respectively. Note that RnnLossLayer also has the option to configure an activation function.

setNIn

public void setNIn(int nIn)

param lossFunction Loss function for the loss layer

RnnOutputLayer

Recurrent neural network output layer. It takes labels of shape [minibatch,nOut,sequenceLength] and also supports mask arrays. Note that RnnOutputLayer can also be used for 1D CNN layers, which also have [minibatch,nOut,sequenceLength] activations/labels shape.

build

public RnnOutputLayer build()

param lossFunction Loss function for the output layer

Bidirectional

Bidirectional is a “wrapper” layer: it wraps any uni-directional RNN layer to make it bidirectional. Note that multiple different modes are supported - these specify how the activations should be combined from the forward and backward RNN networks. Note also that Bidirectional uses two separate copies of the wrapped RNN layer, each with separate parameters.

getNOut

public long getNOut()

This Mode enumeration defines how the activations for the forward and backward networks should be combined:

ADD: out = forward + backward (elementwise addition)
MUL: out = forward * backward (elementwise multiplication)
AVERAGE: out = 0.5 * (forward + backward)
CONCAT: Concatenate the activations

Here, ‘forward’ is the activations for the forward RNN, and ‘backward’ is the activations for the backward RNN. In all cases except CONCAT, the output activations size is the same size as the standard RNN that is being wrapped by this layer. In the CONCAT case, the output activations size (dimension 1) is 2x larger than the standard RNN’s activations array.

getUpdaterByParam

public IUpdater getUpdaterByParam(String paramName)

Get the updater for the given parameter. Typically the same updater will be used for all parameters, but this is not necessarily the case.

param paramName Parameter name
return IUpdater for the parameter

LastTimeStep

LastTimeStep is a “wrapper” layer: it wraps any RNN (or CNN1D) layer, and extracts out the last time step during forward pass, and returns it as a row vector (per example). That is, for 3d (time series) input (with shape [minibatch, layerSize, timeSeriesLength]), we take the last time step and return it as a 2d array with shape [minibatch, layerSize]. Note that the last time step operation takes into account any mask arrays, if present: thus, variable length time series (in the same minibatch) are handled as expected here.

SimpleRnn

SimpleRnn, also known as a “vanilla” RNN, is the simplest type of recurrent neural network layer. It implements out_t = activationFn(in_t * inWeight + out_(t-1) * recurrentWeights + bias).

Note that other architectures (LSTM, etc) are usually much more effective, especially for longer time series; however SimpleRnn is very fast to compute, and hence may be considered where the temporal dependencies in the dataset are only a few steps long.

Overview

Comprehensive programming guide for ND4J. This user guide is designed to explain (and provide examples for) the main functionality in ND4J.

Introduction

An NDArray is in essence an n-dimensional array: i.e., a rectangular array of numbers, with some number of dimensions.

Some concepts you should be familiar with:

The rank of an NDArray is the number of dimensions. 2d NDArrays have a rank of 2, 3d arrays have a rank of 3, and so on. You can create NDArrays with any arbitrary rank.
The shape of an NDArray defines the size of each of the dimensions. Suppose we have a 2d array with 3 rows and 5 columns. This NDArray would have shape [3,5].
The length of an NDArray defines the total number of elements in the array. The length is always equal to the product of the values that make up the shape.
The stride of an NDArray is defined as the separation (in the underlying data buffer) of contiguous elements in each dimension. Stride is defined per dimension, so a rank N NDArray has N stride values, one for each dimension. Note that most of the time, you don't need to know (or concern yourself with) the stride - just be aware that this is how ND4J operates internally. The next section has an example of strides.
The data type of an NDArray refers to the type of data of an NDArray (for example, float or double precision). Note that this is set globally in ND4J, so all NDArrays should have the same data type. Setting the data type is discussed later in this document.

In terms of indexing there are a few things to know. First, rows are dimension 0, and columns are dimension 1: thus INDArray.size(0) is the number of rows, and INDArray.size(1) is the number of columns. Like normal arrays in most programming languages, indexing is zero-based: thus rows have indexes 0 to INDArray.size(0)-1, and so on for the other dimensions.

Throughout this document, we'll use the term NDArray to refer to the general concept of an n-dimensional array; the term INDArray refers specifically to the Java interface that ND4J defines. In practice, these two terms can be used interchangeably.

NDArrays: How Are They Stored in Memory?

The next few paragraphs describe some of the architecture behind ND4J. Understanding this is not strictly necessary in order to use ND4J, but it may help you to understand what is going on behind the scenes. NDArrays are stored in memory as a single flat array of numbers (or more generally, as a single contiguous block of memory), and hence differ a lot from typical Java multidimensional arrays such as a float[][] or double[][][].

Physically, the data that backs an INDArray is stored off-heap: that is, it is stored outside of the Java Virtual Machine (JVM) heap. This has numerous benefits, including performance, interoperability with high-performance BLAS libraries, and the ability to avoid some shortcomings of the JVM in high-performance computing (such as Java arrays being limited to 2^31 - 1 (about 2.147 billion) elements due to integer indexing).

In terms of encoding, an NDArray can be encoded in either C (row-major) or Fortran (column-major) order. For more details on row vs. column major order, see Wikipedia. ND4J may use a combination of C and F order arrays together, at the same time. Most users can just use the default array ordering, but note that it is possible to use a specific ordering for a given array, should the need arise.

The following image shows how a simple 3x3 (2d) NDArray is stored in memory:

In the above array, we have:

Shape = [3,3] (3 rows, 3 columns)
Rank = 2 (2 dimensions)
Length = 9 (3x3=9)
Stride
- C order stride: [3,1]: the values in consecutive rows are separated in the buffer by 3, and the values in consecutive columns are separated in the buffer by 1
- F order stride: [1,3]: the values in consecutive rows are separated in the buffer by 1, and the values in consecutive columns are separated in the buffer by 3
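
The stride values listed above can be reproduced with a few lines of plain Java. The following is only an illustrative sketch of the arithmetic, not ND4J code: the class name StrideSketch and its methods are hypothetical, and it assumes a dense array with no padding between elements.

```java
import java.util.Arrays;

/** Plain-Java sketch of stride arithmetic for dense arrays; it does not call ND4J. */
class StrideSketch {

    /** Strides (in elements) for the given shape; order must be 'c' or 'f'. */
    static int[] strides(int[] shape, char order) {
        int[] stride = new int[shape.length];
        int step = 1;
        if (order == 'c') {
            // Row-major: the last dimension varies fastest
            for (int d = shape.length - 1; d >= 0; d--) {
                stride[d] = step;
                step *= shape[d];
            }
        } else if (order == 'f') {
            // Column-major: the first dimension varies fastest
            for (int d = 0; d < shape.length; d++) {
                stride[d] = step;
                step *= shape[d];
            }
        } else {
            throw new IllegalArgumentException("order must be 'c' or 'f'");
        }
        return stride;
    }

    /** Position in the flat buffer of the element at the given indexes. */
    static int offset(int[] index, int[] stride) {
        int off = 0;
        for (int d = 0; d < index.length; d++) {
            off += index[d] * stride[d];
        }
        return off;
    }

    public static void main(String[] args) {
        int[] shape = {3, 3};
        int[] c = strides(shape, 'c');
        int[] f = strides(shape, 'f');
        System.out.println(Arrays.toString(c));         // [3, 1]
        System.out.println(Arrays.toString(f));         // [1, 3]
        System.out.println(offset(new int[]{1, 2}, c)); // 5
        System.out.println(offset(new int[]{1, 2}, f)); // 7
    }
}
```

The element at row 1, column 2 therefore sits at buffer position 5 in C order and at position 7 in F order, even though the shape is the same in both cases.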

Views: When Two or More NDArrays Refer to the Same Data

A key concept in ND4J is the fact that two NDArrays can actually point to the same underlying data in memory. Usually, we have one NDArray referring to some subset of another array, and this only occurs for certain operations (such as INDArray.get(), INDArray.transpose(), INDArray.getRow() etc.). This is a powerful concept, and one that is worth understanding.

There are two primary motivations for this:

There are considerable performance benefits, most notably in avoiding copying arrays
We gain a lot of power in terms of how we can perform operations on our NDArrays

Consider a simple operation like a matrix transpose on a large (10,000 x 10,000) matrix. Using views, we can perform this matrix transpose in constant time without performing any copies (i.e., O(1) in big O notation), avoiding the considerable cost of copying all of the array elements. Of course, sometimes we do want to make a copy - at which point we can use INDArray.dup() to get a copy. For example, to get a copy of a transposed matrix, use INDArray out = myMatrix.transpose().dup(). After this dup() call, there will be no link between the original array myMatrix and the array out (thus, changes to one will not impact the other).

To see how views can be powerful, consider a simple task: adding 1.0 to the first row of a larger array, myArray. We can do this easily, in one line:

myArray.getRow(0).addi(1.0)

Let's break down what is happening here. First, the getRow(0) operation returns an INDArray that is a view of the original. Note that both myArray and myArray.getRow(0) point to the same area in memory:

Then, after the addi(1.0) is performed, we have the following situation:

As we can see, changes to the NDArray returned by myArray.getRow(0) will be reflected in the original array myArray; similarly, changes to myArray will be reflected in the row vector.
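
To make the shared-buffer idea concrete, here is a toy model in plain Java. It is a sketch only and makes no claim about how ND4J implements views: the classes ViewSketch, Matrix and RowView are hypothetical, and the model assumes a 2d, C-order array on the Java heap.

```java
import java.util.Arrays;

/** Plain-Java sketch of a row view sharing its parent's buffer; it does not call ND4J. */
class ViewSketch {

    /** A 2d, C-order matrix stored in one flat buffer. */
    static final class Matrix {
        final double[] data;
        final int rows;
        final int cols;

        Matrix(int rows, int cols) {
            this.data = new double[rows * cols];
            this.rows = rows;
            this.cols = cols;
        }

        double get(int r, int c) {
            return data[r * cols + c];
        }

        /** Returns a view of row r: no elements are copied. */
        RowView getRow(int r) {
            return new RowView(data, r * cols, cols);
        }
    }

    /** A window onto part of another array's buffer. */
    static final class RowView {
        private final double[] data; // the same object as the parent's buffer
        private final int offset;
        private final int length;

        RowView(double[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        /** In-place scalar add: writes go straight to the shared buffer. */
        RowView addi(double value) {
            for (int i = 0; i < length; i++) {
                data[offset + i] += value;
            }
            return this;
        }

        /** Detached copy of the row: later changes to it do not affect the parent. */
        double[] dup() {
            return Arrays.copyOfRange(data, offset, offset + length);
        }
    }

    public static void main(String[] args) {
        Matrix m = new Matrix(3, 3);
        m.getRow(0).addi(1.0);
        System.out.println(m.get(0, 0)); // 1.0 (the parent sees the change)
        System.out.println(m.get(1, 0)); // 0.0 (other rows are untouched)
    }
}
```

Because the view only stores an offset and a length, creating it costs the same regardless of the size of the parent array, which is the source of the performance benefit described above.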

Creating NDArrays

Zero, One and Scalar-Value Initialized Arrays

Two of the most commonly used methods of creating arrays are:

Nd4j.zeros(int...)
Nd4j.ones(int...)

The shape of the array is specified as integers. For example, to create a zero-filled array with 3 rows and 5 columns, use Nd4j.zeros(3,5).

These can often be combined with other operations to create arrays with other values. For example, to create an array filled with 10s:

INDArray tens = Nd4j.zeros(3,5).addi(10)

The above initialization works in two steps: first by allocating a 3x5 array filled with zeros, and then by adding 10 to each value.

Random Arrays

Nd4j provides a few methods to generate INDArrays, where the contents are pseudo-random numbers.

To generate uniform random numbers in the range 0 to 1, use Nd4j.rand(int nRows, int nCols) (for 2d arrays), or Nd4j.rand(int[]) (for 3 or more dimensions).

Similarly, to generate Gaussian random numbers with mean zero and standard deviation 1, use Nd4j.randn(int nRows, int nCols) or Nd4j.randn(int[]).

For repeatability (i.e., to set Nd4j's random number generator seed) you can use Nd4j.getRandom().setSeed(long)

Creating NDArrays from Java arrays

Nd4j provides convenience methods for the creation of arrays from Java float and double arrays.

To create a 1d NDArray from a 1d Java array, use:

Row vector: Nd4j.create(float[]) or Nd4j.create(double[])
Column vector: Nd4j.create(float[],new int[]{length,1}) or Nd4j.create(double[],new int[]{length,1})

For 2d arrays, use Nd4j.create(float[][]) or Nd4j.create(double[][]).

For creating NDArrays from Java primitive arrays with 3 or more dimensions (double[][][] etc), one approach is to use the following:

double[] flat = ArrayUtil.flattenDoubleArray(myDoubleArray);
int[] shape = ...;    //Array shape here
INDArray myArr = Nd4j.create(flat,shape,'c');

Creating NDArrays from Other NDArrays

There are three primary ways of creating arrays from other arrays:

Creating an exact copy of an existing NDArray using INDArray.dup()
Create the array as a subset of an existing NDArray
Combine a number of existing NDArrays to create a new NDArray

For the second case, you can use getRow(), get(), etc. See Getting and Setting Parts of NDArrays for details on this.

Two methods for combining NDArrays are Nd4j.hstack(INDArray...) and Nd4j.vstack(INDArray...).

hstack (horizontal stack) takes as argument a number of matrices that have the same number of rows, and stacks them horizontally to produce a new array. The input NDArrays can have a different number of columns, however.

Example:

int nRows = 2;
int nColumns = 2;
// Create INDArray of zeros
INDArray zeros = Nd4j.zeros(nRows, nColumns);
// Create one of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
//hstack
INDArray hstack = Nd4j.hstack(ones,zeros);
System.out.println("### HSTACK ####");
System.out.println(hstack);

Output:

### HSTACK ####
[[1.00, 1.00, 0.00, 0.00],
 [1.00, 1.00, 0.00, 0.00]]

vstack (vertical stack) is the vertical equivalent of hstack. The input arrays must have the same number of columns.

Example:

int nRows = 2;
int nColumns = 2;
// Create INDArray of zeros
INDArray zeros = Nd4j.zeros(nRows, nColumns);
// Create one of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
//vstack
INDArray vstack = Nd4j.vstack(ones,zeros);
System.out.println("### VSTACK ####");
System.out.println(vstack);

Output:

### VSTACK ####
[[1.00, 1.00],
 [1.00, 1.00],
 [0.00, 0.00],
 [0.00, 0.00]]

Nd4j.concat combines arrays along a dimension.

Example:

int nRows = 2;
int nColumns = 2;
//INDArray of zeros
INDArray zeros = Nd4j.zeros(nRows, nColumns);
// Create one of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
// Concat on dimension 0
INDArray combined = Nd4j.concat(0,zeros,ones);
System.out.println("### COMBINED dimension 0####");
System.out.println(combined);
//Concat on dimension 1
INDArray combined2 = Nd4j.concat(1,zeros,ones);
System.out.println("### COMBINED dimension 1 ####");
System.out.println(combined2);

Output:

### COMBINED dimension 0####
[[0.00, 0.00],
 [0.00, 0.00],
 [1.00, 1.00],
 [1.00, 1.00]]
### COMBINED dimension 1 ####
[[0.00, 0.00, 1.00, 1.00],
 [0.00, 0.00, 1.00, 1.00]]

Nd4j.pad is used to pad an array.

Example:

int nRows = 2;
int nColumns = 2;
// Create INDArray of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
// pad the INDArray
INDArray padded = Nd4j.pad(ones, new int[]{1,1}, Nd4j.PadMode.CONSTANT );
System.out.println("### Padded ####");
System.out.println(padded);

Output:

### Padded ####
[[0.00, 0.00, 0.00, 0.00],
 [0.00, 1.00, 1.00, 0.00],
 [0.00, 1.00, 1.00, 0.00],
 [0.00, 0.00, 0.00, 0.00]]

One other method that can occasionally be useful is Nd4j.diag(INDArray in). This method has two uses, depending on the argument in:

If in is a vector, diag outputs an NxN matrix with the diagonal equal to the array in (where N is the length of in)
If in is a NxN matrix, diag outputs a vector taken from the diagonal of in

Miscellaneous NDArray Creation Methods

To create an identity matrix of size N, you can use Nd4j.eye(N).

To create a row vector with elements [a, a+1, a+2, ..., b] you can use the linspace command:

Nd4j.linspace(a, b, b-a+1)

Linspace can be combined with a reshape operation to get other shapes. For example, if you want a 2d NDArray with 5 rows and 5 columns, with values 1 to 25 inclusive, you can use the following:

Nd4j.linspace(1,25,25).reshape(5,5)

Getting and Setting Individual Values

For an INDArray, you can get or set values using the indexes of the element you want to get or set. For a rank N array (i.e., an array with N dimensions) you need N indices.

Note: getting or setting values individually (for example, one at a time in a for loop) is generally a bad idea in terms of performance. When possible, try to use other INDArray methods that operate on a large number of elements at a time.

To get values from a 2d array, you can use: INDArray.getDouble(int row, int column)

For arrays of any dimensionality, you can use INDArray.getDouble(int...). For example, to get the value at index i,j,k use INDArray.getDouble(i,j,k)

To set values, use one of the putScalar methods:

INDArray.putScalar(int[],double)
INDArray.putScalar(int[],float)
INDArray.putScalar(int[],int)

Here, the int[] is the index, and the double/float/int is the value to be placed at that index.

Some additional functionality that might be useful in certain circumstances is the NdIndexIterator class. The NdIndexIterator allows you to get the indexes in a defined order (specifically, the C-order traversal order: [0,0,0], [0,0,1], [0,0,2], ..., [0,1,0], ... etc for a rank 3 array).

To iterate over the values in a 2d array, you can use:

NdIndexIterator iter = new NdIndexIterator(nRows, nCols);
while (iter.hasNext()) {
    int[] nextIndex = iter.next();
    double nextVal = myArray.getDouble(nextIndex);
    //do something with the value
}

Getting and Setting Parts of NDArrays

getRow() and putRow()

In order to get a single row from an INDArray, you can use INDArray.getRow(int). This will obviously return a row vector. Of note here is that this row is a view: changes to the returned row will impact the original array. This can be quite useful at times (for example: myArr.getRow(3).addi(1.0) to add 1.0 to the row at index 3, i.e., the fourth row, of a larger array); if you want a copy of a row, use getRow(int).dup().

Similarly, to get multiple rows, use INDArray.getRows(int...). This returns an array with the rows stacked. Note, however, that this will be a copy (not a view) of the original rows: a view is not possible here due to the way NDArrays are stored in memory.

For setting a single row, you can use myArray.putRow(int rowIdx,INDArray row). This will set the rowIdxth row of myArray to the values contained in the INDArray row.

Sub-Arrays: get(), put() and NDArrayIndex

Get:

A more powerful and general method is to use INDArray.get(NDArrayIndex...). This functionality allows you to get arbitrary sub-arrays based on certain indexes. This is perhaps best explained by some examples:

To get a single row (and all columns), you can use:

myArray.get(NDArrayIndex.point(rowIdx), NDArrayIndex.all())

To get a range of rows (row a (inclusive) to row b (exclusive)) and all columns, you can use:

myArray.get(NDArrayIndex.interval(a,b), NDArrayIndex.all())

To get all rows and every second column, you can use:

myArray.get(NDArrayIndex.all(),NDArrayIndex.interval(0,2,nCols))

Though the above examples are for 2d arrays only, the NDArrayIndex approach extends to 3 or more dimensions. For 3 dimensions, you would provide 3 INDArrayIndex objects instead of just two, as above.

Note that the NDArrayIndex.interval(...), .all() and .point(int) methods always return views of the underlying arrays. Thus, changes to the arrays returned by .get() will be reflected in the original array.

Put:

The same NDArrayIndex approach is also used to put elements to another array: in this case you use the INDArray.put(INDArrayIndex[], INDArray toPut) method. Clearly, the size of the NDArray toPut must match the size implied by the provided indexes.

Also note that myArray.put(INDArrayIndex[],INDArray other) is functionally equivalent to doing myArray.get(INDArrayIndex...).assign(INDArray other). Again, this is because .get(INDArrayIndex...) returns a view of the underlying array, not a copy.

Tensor Along Dimension

(Note: ND4J versions 0.4-rc3.8 and earlier returned slightly different results for tensor along dimension, as compared to current versions).

Tensor along dimension is a powerful technique, but can be a little hard to understand at first. The idea behind tensor along dimension (hereafter referred to as TAD) is to get a lower rank sub-array that is a view of the original array.

The tensor along dimension method takes two arguments:

The index of the tensor to return (in the range of 0 to numTensors-1)
The dimensions (1 or more values) along which to execute the TAD operation

The simplest case is a tensor along a single row or column of a 2d array. Consider the following diagram (where dimension 0 (rows) are indexed going down the page, and dimension 1 (columns) are indexed going across the page):

Note that the output of the tensorAlongDimension call with one dimension is a row vector in all cases.

To understand why we get this output: consider the first case in the above diagram. There, we are taking the 0th (first) tensor along dimension 0 (dimension 0 being rows); the values (1,5,2) are in a line as we move along dimension 0, hence the output. Similarly, the tensorAlongDimension(1,1) is the second (index=1) tensor along dimension 1; values (5,3,5) are in a line as we move along dimension 1.

The TAD operation can also be executed along multiple dimensions. For example, by specifying two dimensions to execute the TAD operation along, we can use it to get a 2d sub-array from a 3d (or 4d, or 5d...) array. Similarly, by specifying 3 dimensions, we can use it to get a 3d from 4d or higher.

There are two things we need to know about the output, for the TAD operation to be useful.

First, we need to know the number of tensors that we can get, for a given set of dimensions. To determine this, we can use the "number of tensors along dimensions" method, INDArray.tensorssAlongDimension(int... dimensions). This method simply returns the number of tensors along the specified dimensions. In the examples above, we have:

myArray.tensorssAlongDimension(0) = 3
myArray.tensorssAlongDimension(1) = 3
myArray.tensorssAlongDimension(0,1) = 1
myArray.tensorssAlongDimension(1,0) = 1

(In the latter 2 cases, note that tensor along dimension would give us the same array out as the original array in - i.e., we get a 2d output from a 2d array).

More generally, the number of tensors is given by the product of the remaining dimensions, and the shape of the tensors is given by the size of the specified dimensions in the original shape.

Here are some examples:

For input shape [a,b,c], tensorssAlongDimension(0) gives b*c tensors, and tensorAlongDimension(i,0) returns tensors of shape [1,a].
For input shape [a,b,c], tensorssAlongDimension(1) gives a*c tensors, and tensorAlongDimension(i,1) returns tensors of shape [1,b].
For input shape [a,b,c], tensorssAlongDimension(0,1) gives c tensors, and tensorAlongDimension(i,0,1) returns tensors of shape [a,b].
For input shape [a,b,c], tensorssAlongDimension(1,2) gives a tensors, and tensorAlongDimension(i,1,2) returns tensors of shape [b,c].
For input shape [a,b,c,d], tensorssAlongDimension(1,2) gives a*d tensors, and tensorAlongDimension(i,1,2) returns tensors of shape [b,c].
For input shape [a,b,c,d], tensorssAlongDimension(0,2,3) gives b tensors, and tensorAlongDimension(i,0,2,3) returns tensors of shape [a,c,d].
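
The counting rule behind the examples above can be checked with a short plain-Java sketch. It does not call ND4J: the class TadSketch and its methods are hypothetical, and the sketch assumes that the sizes of the specified dimensions are reported in ascending dimension order. For a single dimension it returns [n], whereas the examples above report the corresponding row vector shape [1,n].

```java
import java.util.Arrays;

/** Plain-Java sketch of the TAD counting rule; it does not call ND4J. */
class TadSketch {

    /** Number of tensors: the product of the sizes of the dimensions that are NOT listed. */
    static long numTensors(long[] shape, int... dimensions) {
        long count = 1;
        for (int d = 0; d < shape.length; d++) {
            if (!contains(dimensions, d)) {
                count *= shape[d];
            }
        }
        return count;
    }

    /** Shape of each tensor: the sizes of the listed dimensions, in ascending dimension order. */
    static long[] tensorShape(long[] shape, int... dimensions) {
        int[] sorted = dimensions.clone();
        Arrays.sort(sorted);
        long[] out = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = shape[sorted[i]];
        }
        return out;
    }

    private static boolean contains(int[] values, int target) {
        for (int v : values) {
            if (v == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] shape = {2, 3, 4, 5}; // [a,b,c,d] with a=2, b=3, c=4, d=5
        System.out.println(numTensors(shape, 1, 2));                      // 10 (a*d)
        System.out.println(Arrays.toString(tensorShape(shape, 1, 2)));    // [3, 4]
        System.out.println(numTensors(shape, 0, 2, 3));                   // 3 (b)
        System.out.println(Arrays.toString(tensorShape(shape, 0, 2, 3))); // [2, 4, 5]
    }
}
```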

Slice

[This section: Forthcoming.]

Performing Operations on NDArrays

ND4J has the concept of ops (operations) for many things you might want to do with (or to) an INDArray. For example, ops are used to apply things like tanh operations, or add a scalar, or do element-wise operations.

ND4J defines five types of operations:

Scalar
Transform
Accumulation
Index Accumulation
Broadcast

And two methods of executing each:

Directly on the entire INDArray, or
Along a dimension

Before getting into the specifics of these operations, let's take a moment to consider the difference between in-place and copy operations.

Many ops have both in-place and copy operations. Suppose we want to add two arrays. Nd4j defines two methods for this: INDArray.add(INDArray) and INDArray.addi(INDArray). The former (add) is a copy operation; the latter is an in-place operation - the i in addi means in-place. This convention (...i means in-place, no i means copy) holds for other ops that are accessible via the INDArray interface.

Suppose we have two INDArrays x and y and we do INDArray z = x.add(y) or INDArray z = x.addi(y). The results of these operations are shown below.

Note that with the x.add(y) operation, the original array x is not modified. Comparatively, with the in-place version x.addi(y), the array x is modified. In both versions of the add operation, an INDArray is returned that contains the result. Note however that in the case of the addi operation, the result array is actually just the original array x.
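
The difference between the two conventions can be modelled on ordinary Java arrays. This is a sketch only, using a hypothetical class InPlaceSketch with static add and addi helpers; it mirrors the naming convention described above rather than any ND4J internals.

```java
import java.util.Arrays;

/** Plain-Java sketch of copy vs. in-place addition; it does not call ND4J. */
class InPlaceSketch {

    /** Copy version: allocates a new array and leaves x untouched. */
    static double[] add(double[] x, double[] y) {
        checkSameLength(x, y);
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = x[i] + y[i];
        }
        return z;
    }

    /** In-place version: overwrites x and returns that same array. */
    static double[] addi(double[] x, double[] y) {
        checkSameLength(x, y);
        for (int i = 0; i < x.length; i++) {
            x[i] += y[i];
        }
        return x;
    }

    private static void checkSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] y = {10.0, 20.0};

        double[] z = add(x, y);
        System.out.println(Arrays.toString(z)); // [11.0, 22.0]
        System.out.println(Arrays.toString(x)); // [1.0, 2.0] (unchanged)

        double[] w = addi(x, y);
        System.out.println(Arrays.toString(x)); // [11.0, 22.0] (modified)
        System.out.println(w == x);             // true (same object)
    }
}
```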

Scalar Ops

Scalar ops are element-wise operations that also take a scalar (i.e., a number). Examples of scalar ops are add, max, multiply, set and divide operations (see the previous link for a full list).

A number of the methods such as INDArray.addi(Number) and INDArray.divi(Number) actually execute scalar ops behind the scenes, so when available, it is more convenient to use these methods.

To execute a scalar op more directly, you can use for example:

Nd4j.getExecutioner().execAndReturn(new ScalarAdd(myArray,1.0))

Note that myArray is modified by this operation. If this is not what you want, use myArray.dup().

Unlike the remaining ops, scalar ops don't have a sensible interpretation of executing them along a dimension.

Transform Ops

Transform ops are operations such as element-wise logarithm, cosine, tanh, rectified linear, etc. Other examples include add, subtract and copy operations. Transform ops are commonly used in an element-wise manner (such as tanh on each element), but this is not always the case - for example, softmax is typically executed along a dimension.

To execute an element-wise tanh operation directly (on the full NDArray) you can use:

INDArray tanh = Nd4j.getExecutioner().execAndReturn(new Tanh(myArr))

As with scalar ops mentioned above, transform operations using the above method are in-place operations: that is, the NDArray myArr is modified, and the returned array tanh is actually the same object as the input myArr. Again, you can use myArr.dup() if you want a copy.

The Transforms class also defines some convenience methods, such as:

INDArray tanh = Transforms.tanh(INDArray in,boolean copy);

This is equivalent to the method using Nd4j.getExecutioner() above.

Accumulation (Reduction) Ops

When it comes to executing accumulations, there is a key difference between executing the accumulation on the entire NDArray, versus executing along a particular dimension (or dimensions). In the first case (executing on the entire array), only a single value is returned. In the second case (accumulating along a dimension) a new INDArray is returned.

To get the sum of all values in the array:

double sum = Nd4j.getExecutioner().execAndReturn(new Sum(myArray)).getFinalResult().doubleValue();

or equivalently (and more conveniently)

double sum = myArray.sumNumber().doubleValue();

Accumulation ops can also be executed along a dimension. For example, to get the sum of all values in each column (in each column = along dimension 0, or "for values in each row"), you can use:

INDArray sumOfColumns = Nd4j.getExecutioner().exec(new Sum(myArray),0);

or equivalently,

INDArray sumOfColumns = myArray.sum(0)

Suppose this was executed on a 3x3 input array. Visually, this sum operation along dimension 0 looks like:

Note that here, the input has shape [3,3] (3 rows, 3 columns) and the output has shape [1,3] (i.e., our output is a row vector). Had we instead done the operation along dimension 1, we would get a column vector with shape [3,1], with values (12,13,11).

Accumulations along dimensions also generalize to NDArrays with 3 or more dimensions.
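
As a rough illustration of what "along a dimension" means for a 2d array, the following plain-Java sketch sums a matrix along dimension 0 or dimension 1. It does not call ND4J: the class SumAlongDimensionSketch is hypothetical, the input values are made up for the example, and the results are returned as flat Java arrays rather than as row or column vectors.

```java
import java.util.Arrays;

/** Plain-Java sketch of a sum along one dimension of a 2d array; it does not call ND4J. */
class SumAlongDimensionSketch {

    /**
     * dimension 0: collapse the rows, giving one sum per column.
     * dimension 1: collapse the columns, giving one sum per row.
     */
    static double[] sum(double[][] m, int dimension) {
        if (dimension != 0 && dimension != 1) {
            throw new IllegalArgumentException("dimension must be 0 or 1");
        }
        int rows = m.length;
        int cols = m[0].length;
        double[] out = new double[dimension == 0 ? cols : rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[dimension == 0 ? c : r] += m[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {
            {1, 2, 3},
            {4, 5, 6},
            {7, 8, 9}
        };
        System.out.println(Arrays.toString(sum(m, 0))); // [12.0, 15.0, 18.0]
        System.out.println(Arrays.toString(sum(m, 1))); // [6.0, 15.0, 24.0]
    }
}
```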

Index Accumulation Ops

Index accumulation ops are very similar to accumulation ops. The difference is that they return an integer index, instead of a double value.

Examples of index accumulation ops are IMax (argmax), IMin (argmin) and IAMax (argmax of absolute values).

To get the index of the maximum value in the array:

int idx = Nd4j.getExecutioner().execAndReturn(new IAMax(myArray)).getFinalResult();

Index accumulation ops are often most useful when executed along a dimension. For example, to get the index of the maximum value in each column (in each column = along dimension 0), you can use:

INDArray idxOfMaxInEachColumn = Nd4j.getExecutioner().exec(new IAMax(myArray),0);

Suppose this was executed on a 3x3 input array. Visually, this argmax/IAMax operation along dimension 0 looks like:

As with the accumulation op described above, the output has shape [1,3]. Again, had we instead done the operation along dimension 1, we would get a column vector with shape [3,1], with values (1,0,2).
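
The same idea can be sketched in plain Java for an argmax along a dimension. This is an illustration under stated assumptions, not ND4J code: the class ArgMaxSketch is hypothetical, the input values are made up, ties are resolved in favour of the first maximum, and plain values (not absolute values, as IAMax uses) are compared.

```java
import java.util.Arrays;

/** Plain-Java sketch of an argmax along one dimension of a 2d array; it does not call ND4J. */
class ArgMaxSketch {

    /**
     * dimension 0: index of the largest value in each column.
     * dimension 1: index of the largest value in each row.
     */
    static int[] argMax(double[][] m, int dimension) {
        if (dimension != 0 && dimension != 1) {
            throw new IllegalArgumentException("dimension must be 0 or 1");
        }
        int rows = m.length;
        int cols = m[0].length;
        int outLen = dimension == 0 ? cols : rows;    // one result per column or per row
        int reduceLen = dimension == 0 ? rows : cols; // number of values compared per result
        int[] out = new int[outLen];
        for (int i = 0; i < outLen; i++) {
            int best = 0;
            for (int j = 1; j < reduceLen; j++) {
                double candidate = dimension == 0 ? m[j][i] : m[i][j];
                double current = dimension == 0 ? m[best][i] : m[i][best];
                if (candidate > current) {
                    best = j;
                }
            }
            out[i] = best;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {
            {1, 9, 2},
            {7, 3, 8},
            {4, 5, 6}
        };
        System.out.println(Arrays.toString(argMax(m, 0))); // [1, 0, 1]
        System.out.println(Arrays.toString(argMax(m, 1))); // [1, 2, 2]
    }
}
```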

Broadcast and Vector Ops

ND4J also defines broadcast and vector operations.

Some of the more useful operations are vector operations, such as addRowVector and muliColumnVector.

Consider for example the operation x.addRowVector(y) where x is a matrix and y is a row vector. In this case, the addRowVector operation adds the row vector y to each row of the matrix x, as shown below.

As with other ops, there are in-place and copy versions. There are also column versions of these operations, such as addColumnVector, which adds a column vector to each column of the original INDArray.
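
A row-vector operation of this kind can be pictured with a plain-Java sketch of the copy version. The class RowVectorSketch and its method are hypothetical and do not call ND4J; the sketch assumes a rectangular matrix whose column count equals the length of the vector.

```java
import java.util.Arrays;

/** Plain-Java sketch of adding a row vector to every row of a matrix; it does not call ND4J. */
class RowVectorSketch {

    /** Copy version: returns a new matrix and leaves x untouched. */
    static double[][] addRowVector(double[][] x, double[] y) {
        double[][] out = new double[x.length][];
        for (int r = 0; r < x.length; r++) {
            if (x[r].length != y.length) {
                throw new IllegalArgumentException("Row length must match the vector length");
            }
            out[r] = new double[y.length];
            for (int c = 0; c < y.length; c++) {
                out[r][c] = x[r][c] + y[c]; // the same y[c] is reused for every row
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {
            {1, 2, 3},
            {4, 5, 6}
        };
        double[] y = {10, 20, 30};
        System.out.println(Arrays.deepToString(addRowVector(x, y)));
        // [[11.0, 22.0, 33.0], [14.0, 25.0, 36.0]]
    }
}
```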

Boolean Indexing: Selectively Apply Operations Based on a Condition

[This section: Forthcoming.]

Link: Boolean Indexing Unit Tests

Workspaces

Workspaces are a feature of ND4J used to improve performance, by means of more efficient memory allocation and management. Specifically, workspaces are designed for cyclical workloads - such as training neural networks - as they allow for off-heap memory reuse (instead of continually allocating and deallocating memory on each iteration of the loop). The net effect is improved performance and reduced memory use.

For more details on workspaces, see the following links:

Deeplearning4j Guide to Workspaces
Workspaces Examples

Workspaces: Scope Panic

Sometimes with workspaces, you may encounter an exception such as:

org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses leaked workspace pointer from workspace [LOOP_EXTERNAL]
For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic

org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses outdated workspace pointer from workspace [LOOP_EXTERNAL]
For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic

Understanding Scope Panic Exceptions

In short: these exceptions mean that an INDArray that has been allocated in a workspace is being used incorrectly (for example, a bug or incorrect implementation of some method). This can occur for two reasons:

The INDArray has 'leaked out' of the workspace in which it was defined
The INDArray is used within the correct workspace, but from a previous iteration

In both cases, the underlying off-heap memory that the INDArray points to has been invalidated, and can no longer be used.

An example sequence of events leading to a workspace leak:

1. Workspace W is opened
2. INDArray X is allocated in workspace W
3. Workspace W is closed, and hence the memory for X is no longer valid
4. INDArray X is used in some operation, resulting in an exception

An example sequence of events leading to an outdated workspace pointer:

1. Workspace W is opened (iteration 1)
2. INDArray X is allocated in workspace W (iteration 1)
3. Workspace W is closed (iteration 1)
4. Workspace W is opened (iteration 2)
5. INDArray X (from iteration 1) is used in some operation, resulting in an exception

Workarounds and Fixes for Scope Panic Exceptions

There are two basic solutions, depending on the cause.

First, if you have implemented some custom code (or are using workspaces manually), this usually indicates a bug in your code. Generally, you have three options:

1. Detach the INDArray from all workspaces, using the INDArray.detach() method. The consequence is that the returned array is no longer associated with a workspace, and can be used freely within or outside of any workspace.
2. Don't allocate the array in the workspace in the first place. You can temporarily 'turn off' a workspace using: try(MemoryWorkspace scopedOut = Nd4j.getWorkspaceManager().scopeOutOfWorkspaces()){ <your code here> }. The consequence is that any new arrays (created via Nd4j.create, for example) within the try block will not be associated with a workspace, and can be used outside of a workspace.
3. Move/copy the array to a parent workspace, using one of the INDArray.leverage() or leverageTo(String) or migrate() methods. See the Javadoc of these methods for more details.

Second, if you are using workspaces as part of Deeplearning4j and have not implemented any custom functionality (i.e., you have not written your own layer, data pipeline, etc), then (on the off-chance you run into this) it most likely indicates a bug in the underlying library, which usually should be reported via a GitHub issue. One possible workaround in the meantime is to disable workspaces using the following code:

.trainingWorkspaceMode(WorkspaceMode.NONE)
.inferenceWorkspaceMode(WorkspaceMode.NONE)

If the exception is due to an issue in the data pipeline, you can try wrapping your DataSetIterator or MultiDataSetIterator in an AsyncShieldDataSetIterator or AsyncShieldMultiDataSetIterator.

For either cause, a final solution - if you are sure your code is correct - is to try disabling scope panic. Note that this is NOT recommended and can crash the JVM if a legitimate issue is present. To do this, use Nd4j.getExecutioner().setProfilingMode(OpExecutioner.ProfilingMode.DISABLED); before executing your code.

Advanced and Miscellaneous Topics

Setting the data type

ND4J currently allows INDArrays to be backed by either float or double-precision values. The default is single-precision (float). To set the data type that ND4J uses for arrays globally to double precision, you can use:

Nd4j.setDataType(DataBuffer.Type.DOUBLE);

Note that this should be done before using ND4J operations or creating arrays.

Alternatively, you can set the property when launching the JVM:

-Ddtype=double

Reshaping

[This section: Forthcoming.]

Flattening

Flattening is the process of taking one or more INDArrays and converting them into a single flat array (a row vector), given some traversal order of the arrays.

Nd4j provides the following methods for this:

Nd4j.toFlattened(char order, INDArray... arrays)
Nd4j.toFlattened(char order, Collection<INDArray>)

Nd4j also provides overloaded toFlattened methods with the default ordering. The order argument must be 'c' or 'f', and defines the order in which values are taken from the arrays: c order results in the arrays being flattened using array indexes in an order like [0,0,0], [0,0,1], etc (for 3d arrays) whereas f order results in values being taken in order [0,0,0], [1,0,0], etc.
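To make the two traversal orders concrete, here is a minimal plain-Java sketch (it does not use ND4J; the class and method names are hypothetical) that flattens a single non-empty 2d array in either 'c' or 'f' order:

```java
import java.util.Arrays;

// Illustrative sketch only: mimics the 'c' and 'f' traversal orders on a
// plain double[][]; it is not the Nd4j.toFlattened implementation.
class FlattenOrderSketch {

    // 'c' order: the last index changes fastest ([0,0], [0,1], ...).
    // 'f' order: the first index changes fastest ([0,0], [1,0], ...).
    // Assumes a non-empty, rectangular matrix.
    static double[] flatten(char order, double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        double[] out = new double[rows * cols];
        int k = 0;
        if (order == 'c') {
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    out[k++] = m[i][j];
        } else if (order == 'f') {
            for (int j = 0; j < cols; j++)
                for (int i = 0; i < rows; i++)
                    out[k++] = m[i][j];
        } else {
            throw new IllegalArgumentException("order must be 'c' or 'f'");
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.toString(flatten('c', m))); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
        System.out.println(Arrays.toString(flatten('f', m))); // [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```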

Permute

[This section: Forthcoming.]

sortRows/sortColumns

[This section: Forthcoming.]

Directly accessing BLAS operations

[This section: Forthcoming.]

Serialization

Nd4j provides serialization of INDArrays in many formats. Here are some examples for binary and text serialization:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.serde.binary.BinarySerde;

import java.io.*;
import java.nio.ByteBuffer;

INDArray arrWrite = Nd4j.linspace(1,10,10);
INDArray arrRead;

//1. Binary format
//   Close the streams manually or use try with resources.
try (DataOutputStream sWrite = new DataOutputStream(new FileOutputStream(new File("tmp.bin")))) {
    Nd4j.write(arrWrite, sWrite);
}

try (DataInputStream sRead = new DataInputStream(new FileInputStream(new File("tmp.bin")))) {
    arrRead = Nd4j.read(sRead);
}

//2. Binary format using java.nio.ByteBuffer;
ByteBuffer buffer = BinarySerde.toByteBuffer(arrWrite);
arrRead = BinarySerde.toArray(buffer);

//3. Text format
Nd4j.writeTxt(arrWrite, "tmp.txt");
arrRead = Nd4j.readTxt("tmp.txt");

// To read csv format:
// The writeNumpy method has been deprecated.
arrRead = Nd4j.readNumpy("tmp.csv", ", ");

The nd4j-serde directory provides packages for Aeron, base64, camel-routes, gson, jackson and kryo.

Quick Reference: A Summary Overview of ND4J Methods

This section lists the most commonly used operations in ND4J, in a summary form. More details on most of these can be found later in this page.

In this section, assume that arr, arr1 etc are INDArrays.

Creating NDArrays:

Create a zero-initialized array: Nd4j.zeros(nRows, nCols) or Nd4j.zeros(int...)
Create a one-initialized array: Nd4j.ones(nRows, nCols)
Create a copy (duplicate) of an NDArray: arr.dup()
Create a row/column vector from a double[]: myRow = Nd4j.create(myDoubleArr), myCol = Nd4j.create(myDoubleArr,new int[]{10,1})
Create a 2d NDArray from a double[][]: Nd4j.create(double[][])
Stacking a set of arrays to make a larger array: Nd4j.hstack(INDArray...), Nd4j.vstack(INDArray...) for horizontal and vertical respectively
Uniform random NDArrays: Nd4j.rand(int,int), Nd4j.rand(int[]) etc
Normal(0,1) random NDArrays: Nd4j.randn(int,int), Nd4j.randn(int[])

Determining the Size/Dimensions of an INDArray:

The following methods are defined by the INDArray interface:

Get the number of dimensions: rank()
For 2d NDArrays only: rows(), columns()
Size of the ith dimension: size(i)
Get the size of all dimensions, as an int[]: shape()
Determine the total number of elements in array: arr.length()
See also: isMatrix(), isVector(), isRowVector(), isColumnVector()

Getting and Setting Single Values:

Get the value at row i, column j: arr.getDouble(i,j)
Get a value from a 3+ dimensional array: arr.getDouble(int[])
Set a single value in an array: arr.putScalar(int[],double)

Scalar operations: Scalar operations take a double/float/int value and apply an operation to each element. As with element-wise operations, there are in-place and copy operations.

Add a scalar: arr1.add(myDouble)
Subtract a scalar: arr1.sub(myDouble)
Multiply by a scalar: arr.mul(myDouble)
Divide by a scalar: arr.div(myDouble)
Reverse subtract (scalar - arr1): arr1.rsub(myDouble)
Reverse divide (scalar / arr1): arr1.rdiv(myDouble)

Element-Wise Operations: Note: there are copy (add, mul, etc.) and in-place (addi, muli) operations. The copy operations return a new array and leave arr1 unmodified; the in-place operations modify arr1.

Add: arr1.add(arr2)
Subtract: arr1.sub(arr2)
Multiply: arr1.mul(arr2)
Divide: arr1.div(arr2)
Assignment (set each value in arr1 to those in arr2): arr1.assign(arr2)

Reduction Operations (sum, etc.): Note that these operations operate on the entire array. Call .doubleValue() to get a double out of the returned Number.

Sum of all elements: arr.sumNumber()
Product of all elements: arr.prodNumber()
L1 and L2 norms: arr.norm1Number() and arr.norm2Number()
Standard deviation of all elements: arr.stdNumber()

Linear Algebra Operations:

Matrix multiplication: arr1.mmul(arr2)
Transpose a matrix: transpose()
Get the diagonal of a matrix: Nd4j.diag(INDArray)
Matrix inverse: InvertMatrix.invert(INDArray,boolean)

Getting Parts of a Larger NDArray: Note: all of these methods return

Getting a row (2d NDArrays only): getRow(int)
Getting multiple rows as a matrix (2d only): getRows(int...)
Setting a row (2d NDArrays only): putRow(int,INDArray)
Getting the first 3 rows, all columns: arr.get(NDArrayIndex.interval(0,3),NDArrayIndex.all())

Element-Wise Transforms (Tanh, Sigmoid, Sin, Log etc):

Using Transforms: Transforms.sin(INDArray), Transforms.log(INDArray), Transforms.sigmoid(INDArray) etc
Directly (method 1): Nd4j.getExecutioner().execAndReturn(new Tanh(INDArray))
Directly (method 2): Nd4j.getExecutioner().execAndReturn(Nd4j.getOpFactory().createTransform("tanh",INDArray))

FAQ: Frequently Asked Questions

Q: Does ND4J support sparse arrays?

At present: no. Support for sparse arrays is planned for the future.

Q: Is it possible to dynamically grow or shrink the size of an INDArray?

In the current version of ND4J, this is not possible. We may add this functionality in the future, however.

There are two possible work-arounds:

Allocate a new array and do a copy (for example, a .put() operation)
Initially, pre-allocate a larger than required NDArray, and then operate on a view of that array. Then, as you need a larger array, get a larger view on the original pre-allocated array.
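The two work-arounds can be combined, as in the following plain-Java sketch. It uses a double[] in place of an INDArray, and the class and method names are hypothetical: values are written into pre-allocated storage and only a prefix (the "view") is exposed, with a copy into a larger allocation happening only once the capacity runs out.

```java
import java.util.Arrays;

// Illustrative sketch only: a double[] stands in for a pre-allocated NDArray.
class PreallocatedBufferSketch {
    private double[] storage; // pre-allocated backing storage (larger than needed)
    private int length;       // number of elements currently exposed as the "view"

    PreallocatedBufferSketch(int capacity) {
        storage = new double[capacity];
    }

    void append(double value) {
        if (length == storage.length) {
            // Work-around 1: allocate a new (larger) array and copy into it.
            storage = Arrays.copyOf(storage, Math.max(1, storage.length * 2));
        }
        // Work-around 2: simply widen the view over the existing storage.
        storage[length++] = value;
    }

    int length() {
        return length;
    }

    int capacity() {
        return storage.length;
    }

    // Returns a copy of the used prefix; an INDArray view would avoid this copy.
    double[] toArray() {
        return Arrays.copyOf(storage, length);
    }

    public static void main(String[] args) {
        PreallocatedBufferSketch buffer = new PreallocatedBufferSketch(2);
        buffer.append(1.0);
        buffer.append(2.0);
        buffer.append(3.0); // exceeds the initial capacity, triggers one copy
        System.out.println(Arrays.toString(buffer.toArray()) + " capacity=" + buffer.capacity());
    }
}
```

Doubling the capacity keeps the number of copies logarithmic in the final size, at the cost of some unused memory.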

Layers

Supported neural network layers.

What are layers?

Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they represent a deep neural network.

Using layers

All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork or ComputationGraph. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.

Layers vs. vertices

If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph API and join different branches together using vertices. See the vertices documentation for more information.

General layers

ActivationLayer

[source]

Activation layer is a simple layer that applies the specified activation function to the input activations

clone

public ActivationLayer clone()

param activation Activation function for the layer

activation

public Builder activation(String activationFunction)

Activation function for the layer

activation

public Builder activation(IActivation activationFunction)

param activationFunction Activation function for the layer

activation

public Builder activation(Activation activation)

param activation Activation function for the layer

DenseLayer

[source]

Dense layer: a standard fully connected feed forward layer

hasBias

public Builder hasBias(boolean hasBias)

If true (default): include bias parameters in the model. False: no bias.

hasLayerNorm

public Builder hasLayerNorm(boolean hasLayerNorm)

If true (default = false): enable layer normalization on this layer

DropoutLayer

[source]

Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test time.

build

public DropoutLayer build()

Create a dropout layer with standard Dropout, with the specified probability of retaining the input activation. See Dropout for the full details.

param dropout Activation retain probability.

EmbeddingLayer

[source]

Embedding layer: feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses - 1) rather than the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero).

Note: can only be used as the first layer for a network.
Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each example.
Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default).
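The equivalence between a row lookup and a one-hot matrix multiply can be checked with a small plain-Java sketch. It uses double[][] weights instead of INDArrays, omits bias and activation (i.e. IDENTITY activation, no bias), and all names are hypothetical:

```java
import java.util.Arrays;

// Illustrative sketch only: compares an embedding-style row lookup with the
// equivalent dense "one-hot vector times weight matrix" computation.
class EmbeddingLookupSketch {

    // Embedding view: pick row `index` of the weight matrix directly.
    static double[] lookup(double[][] weights, int index) {
        return weights[index].clone();
    }

    // Dense view: build a one-hot vector and multiply it by the weight matrix.
    static double[] oneHotTimesWeights(double[][] weights, int index) {
        int numClasses = weights.length;
        int vectorSize = weights[0].length;
        double[] oneHot = new double[numClasses];
        oneHot[index] = 1.0;
        double[] out = new double[vectorSize];
        for (int i = 0; i < numClasses; i++)
            for (int j = 0; j < vectorSize; j++)
                out[j] += oneHot[i] * weights[i][j]; // all but one term are zero
        return out;
    }

    public static void main(String[] args) {
        double[][] weights = {{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}}; // [numClasses, vectorSize]
        System.out.println(Arrays.toString(lookup(weights, 1)));
        System.out.println(Arrays.toString(oneHotTimesWeights(weights, 1)));
    }
}
```

The lookup touches vectorSize values, while the dense version performs numClasses * vectorSize multiplications, which is where the efficiency difference comes from.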

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

param vectors Vectors to initialize the embedding layer with

EmbeddingSequenceLayer

[source]

Embedding layer for sequences: feed-forward layer that expects fixed-length number (inputLength) of integers/indices per example as input, ranged from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength].

Note: can only be used as the first layer for a network.
Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index.
Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default).

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

inputLength

public Builder inputLength(int inputLength)

Set input sequence length for this embedding layer.

param inputLength input sequence length
return Builder

inferInputLength

public Builder inferInputLength(boolean inferInputLength)

Set input sequence inference mode for embedding layer.

param inferInputLength whether to infer input length
return Builder

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

param vectors Vectors to initialize the embedding layer with

GlobalPoolingLayer

[source]

Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: SUM, AVG, MAX, PNORM.

Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass:

Time series: mask arrays have shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only
CNNs: mask arrays have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.

Behaviour with default settings:

3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s. This gives:

[miniBatchSize, vectorSize, 1] for RNN output
[miniBatchSize, channels, 1, 1] for CNN output
[miniBatchSize, channels, 1, 1, 1] for CNN3D output
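As a rough illustration of pooling over time with a mask, the plain-Java sketch below applies max pooling to a 3d time series input of shape [miniBatchSize, vectorSize, timeSeriesLength] and returns a 2d result of shape [miniBatchSize, vectorSize]. It is not the DL4J implementation; the names are hypothetical, and it assumes every example has at least one unmasked time step.

```java
import java.util.Arrays;

// Illustrative sketch only: masked global max pooling over the time dimension.
class GlobalPoolingSketch {

    // input: [miniBatchSize][vectorSize][timeSeriesLength]
    // mask:  [miniBatchSize][timeSeriesLength], values 0 (padding) or 1 (real data)
    // returns: [miniBatchSize][vectorSize]
    static double[][] maxOverTime(double[][][] input, int[][] mask) {
        int miniBatchSize = input.length;
        int vectorSize = input[0].length;
        int timeSeriesLength = input[0][0].length;
        double[][] out = new double[miniBatchSize][vectorSize];
        for (int b = 0; b < miniBatchSize; b++) {
            for (int v = 0; v < vectorSize; v++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int t = 0; t < timeSeriesLength; t++) {
                    if (mask[b][t] == 1) { // skip masked (padded) time steps
                        best = Math.max(best, input[b][v][t]);
                    }
                }
                out[b][v] = best;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][] input = {{{1, 5, 9}, {4, 2, 8}}}; // one example, two features, three steps
        int[][] mask = {{1, 1, 0}};                    // the last step is padding
        System.out.println(Arrays.deepToString(maxOverTime(input, mask))); // [[5.0, 4.0]]
    }
}
```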

poolingDimensions

public Builder poolingDimensions(int... poolingDimensions)

Pooling type for global pooling

poolingType

public Builder poolingType(PoolingType poolingType)

param poolingType Pooling type for global pooling

collapseDimensions

public Builder collapseDimensions(boolean collapseDimensions)

Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true.

If true:

3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

If false:

3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 4d output [miniBatchSize, channels, 1, 1]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 5d output [miniBatchSize, channels, 1, 1, 1]

param collapseDimensions Whether to collapse the dimensions or not

pnorm

public Builder pnorm(int pnorm)

P-norm constant. Only used if using PoolingType.PNORM for the pooling type

param pnorm P-norm constant

LocalResponseNormalization

[source]

Local response normalization layer. See section 3.3 of http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

k

public Builder k(double k)

LRN scaling constant k. Default: 2

n

public Builder n(double n)

Number of adjacent kernel maps to use when doing LRN. default: 5

param n Number of adjacent kernel maps

alpha

public Builder alpha(double alpha)

LRN scaling constant alpha. Default: 1e-4

param alpha Scaling constant

beta

public Builder beta(double beta)

Scaling constant beta. Default: 0.75

param beta Scaling constant

cudnnAllowFallback

public Builder cudnnAllowFallback(boolean allowFallback)

When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception in CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for this layer will be used instead.

param allowFallback Whether fallback to non-CuDNN implementation should be used
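To show how the constants k, n, alpha and beta interact, the following plain-Java sketch applies the normalization formula from section 3.3 of the paper cited above to the channel values at one spatial location: b[i] = a[i] / (k + alpha * sum of a[j]^2 over the n adjacent channels)^beta. This follows the paper's formula and may differ in detail from the DL4J implementation; the names are hypothetical, and n is taken as an int here for simplicity.

```java
import java.util.Arrays;

// Illustrative sketch only: local response normalization across channels
// at a single spatial location, following the formula in the cited paper.
class LrnSketch {

    static double[] normalize(double[] a, double k, int n, double alpha, double beta) {
        int channels = a.length;
        int half = n / 2;
        double[] b = new double[channels];
        for (int i = 0; i < channels; i++) {
            // Sum of squares over the window of adjacent channels, clipped at the edges.
            double sumSq = 0.0;
            int from = Math.max(0, i - half);
            int to = Math.min(channels - 1, i + half);
            for (int j = from; j <= to; j++) {
                sumSq += a[j] * a[j];
            }
            b[i] = a[i] / Math.pow(k + alpha * sumSq, beta);
        }
        return b;
    }

    public static void main(String[] args) {
        // Defaults listed in this section: k = 2, n = 5, alpha = 1e-4, beta = 0.75
        double[] activations = {1.0, 2.0, 3.0};
        System.out.println(Arrays.toString(normalize(activations, 2.0, 5, 1e-4, 0.75)));
    }
}
```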

LocallyConnected1D

[source]

SameDiff version of a 1D locally connected layer.

nIn

public Builder nIn(int nIn)

Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)

param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)

param activation Activation function for the layer

kernelSize

public Builder kernelSize(int k)

param k Kernel size for the layer

stride

public Builder stride(int s)

param s Stride for the layer

padding

public Builder padding(int p)

param p Padding for the layer. Not used if ConvolutionMode.Same is set

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)

param cm Convolution mode for the layer. See ConvolutionMode for details

dilation

public Builder dilation(int d)

param d Dilation for the layer

hasBias

public Builder hasBias(boolean hasBias)

param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int inputSize)

Set input filter size for this locally connected 1D layer

param inputSize height of the input filters
return Builder

LocallyConnected2D

[source]

SameDiff version of a 2D locally connected layer.

setKernel

public void setKernel(int... kernel)

Number of inputs to the layer (input size)

setStride

public void setStride(int... stride)

param stride Stride for the layer. Must be 2 values (height/width)

setPadding

public void setPadding(int... padding)

param padding Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

setDilation

public void setDilation(int... dilation)

param dilation Dilation for the layer. Must be 2 values (height/width)

nIn

public Builder nIn(int nIn)

param nIn Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)

param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)

param activation Activation function for the layer

kernelSize

public Builder kernelSize(int... k)

param k Kernel size for the layer. Must be 2 values (height/width)

stride

public Builder stride(int... s)

param s Stride for the layer. Must be 2 values (height/width)

padding

public Builder padding(int... p)

param p Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)

param cm Convolution mode for the layer. See ConvolutionMode for details

dilation

public Builder dilation(int... d)

param d Dilation for the layer. Must be 2 values (height/width)

hasBias

public Builder hasBias(boolean hasBias)

param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int... inputSize)

Set input filter size (h,w) for this locally connected 2D layer

param inputSize pair of height and width of the input filters to this layer
return Builder

LossLayer

[source]

LossLayer is a flexible output layer that performs a loss function on an input without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn’t supported - the output size is the same size as the input activations.

nIn

public Builder nIn(int nIn)

param lossFunction Loss function for the loss layer

OutputLayer

[source]

Output layer used for training via backpropagation based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it contains a fully-connected layer (effectively contains a DenseLayer) internally. This allows the output size to be different to the layer input size.

build

public OutputLayer build()

param lossFunction Loss function for the output layer

Pooling1D

[source]

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Pooling2D

[source]

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Subsampling1DLayer

[source]

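1D (temporal) subsampling layer, also known as a pooling layer. Expects input of shape [minibatch, nIn, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.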

Supports the following pooling types: MAX, AVG, SUM, PNORM

setKernelSize

public void setKernelSize(int... kernelSize)

Kernel size

param kernelSize kernel size

setStride

public void setStride(int... stride)

Stride

param stride stride value

setPadding

public void setPadding(int... padding)

Padding

param padding padding value

Upsampling1D

[source]

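Upsampling 1D layer. Repeats each step size times along the sequence axis, so that input of shape [minibatch, channels, sequenceLength] gives output of shape [minibatch, channels, size * sequenceLength]. Example: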

If input (for a single example, with channels down page, and sequence from left to right) is:
[ A1, A2, A3]
[ B1, B2, B3]
Then output with size = 2 is:
[ A1, A1, A2, A2, A3, A3]
[ B1, B1, B2, B2, B3, B3]
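The same repetition can be written as a short plain-Java sketch for a single example (channels down the page, sequence from left to right). The names are hypothetical and Strings are used only so that the output can be compared with the example above:

```java
import java.util.Arrays;

// Illustrative sketch only: 1D upsampling by repeating each sequence step.
class Upsampling1DSketch {

    // input: [channels][sequenceLength]; returns [channels][size * sequenceLength]
    static String[][] upsample(String[][] input, int size) {
        int channels = input.length;
        int sequenceLength = input[0].length;
        String[][] out = new String[channels][sequenceLength * size];
        for (int c = 0; c < channels; c++)
            for (int t = 0; t < sequenceLength; t++)
                for (int r = 0; r < size; r++)
                    out[c][t * size + r] = input[c][t]; // each step is repeated `size` times
        return out;
    }

    public static void main(String[] args) {
        String[][] input = {{"A1", "A2", "A3"}, {"B1", "B2", "B3"}};
        for (String[] row : upsample(input, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```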

size

public Builder size(int size)

Upsampling size

param size upsampling size in single spatial dimension of this 1D layer

size

public Builder size(int[] size)

Upsampling size int array with a single element. Array must be length 1

param size upsampling size in single spatial dimension of this 1D layer

Upsampling2D

[source]

Upsampling 2D layer. Repeats each value (or rather, set of depth values) in the height and width dimensions by size[0] and size[1] times respectively. Example:

Input (slice for one example and channel)
[ A, B ]
[ C, D ]
Size = [2, 2]
Output (slice for one example and channel)
[ A, A, B, B ]
[ A, A, B, B ]
[ C, C, D, D ]
[ C, C, D, D ]
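A minimal plain-Java sketch of this behaviour for a single example and channel follows. The names are hypothetical; sizeH and sizeW correspond to the two entries of the size array.

```java
import java.util.Arrays;

// Illustrative sketch only: 2D upsampling of one [height][width] slice.
class Upsampling2DSketch {

    // Each input value is repeated sizeH times vertically and sizeW times horizontally.
    static String[][] upsample(String[][] input, int sizeH, int sizeW) {
        int height = input.length;
        int width = input[0].length;
        String[][] out = new String[height * sizeH][width * sizeW];
        for (int i = 0; i < height * sizeH; i++)
            for (int j = 0; j < width * sizeW; j++)
                out[i][j] = input[i / sizeH][j / sizeW]; // map output cell back to its source
        return out;
    }

    public static void main(String[] args) {
        String[][] input = {{"A", "B"}, {"C", "D"}};
        for (String[] row : upsample(input, 2, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```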

size

public Builder size(int size)

Upsampling size int, used for both height and width

param size upsampling size in height and width dimensions

size

public Builder size(int[] size)

Upsampling size array

param size upsampling size in height and width dimensions

Upsampling3D

[source]

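Upsampling 3D layer. Repeats each value (all channel values for each x/y/z location) by size[0], size[1] and size[2]. For input of shape [minibatch, channels, depth, height, width], the output has shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].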

size

public Builder size(int size)

Upsampling size as int, so same upsampling size is used for depth, width and height

param size upsampling size in height, width and depth dimensions

size

public Builder size(int[] size)

Upsampling size as an int array, so that a different upsampling size can be used for each of depth, height and width

param size upsampling size in height, width and depth dimensions

ZeroPadding1DLayer

[source]

Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for top and bottom.

setPadding

public void setPadding(int... padding)

Padding value for left and right. Must be length 2 array

build

public ZeroPadding1DLayer build()

param padding Padding for both the left and right

ZeroPadding3DLayer

[source]

Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.

setPadding

public void setPadding(int... padding)

[padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]

build

public ZeroPadding3DLayer build()

param padding Padding for both the left and right in all three spatial dimensions

ZeroPaddingLayer

[source]

Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right

setPadding

public void setPadding(int... padding)

Padding value for top, bottom, left, and right. Must be length 4 array

build

public ZeroPaddingLayer build()

param padHeight Padding for both the top and bottom
param padWidth Padding for both the left and right

ElementWiseMultiplicationLayer

[source]

Element-wise multiplication layer with weights: implements out = activationFn(input .* w + b), where: - w is a learnable weight vector of length nOut - ".*" is element-wise multiplication - b is a bias vector. Note that the input and output sizes are the same for this layer.

created by jingshu

getMemoryReport

public LayerMemoryReport getMemoryReport(InputType inputType)

This is a report of the estimated memory consumption for the given layer

param inputType Input type to the layer. Memory consumption is often a function of the input type
return Memory report for the layer

RepeatVector

[source]

RepeatVector layer configuration.

RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which x is repeated n times.
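The shape change can be sketched in plain Java as follows, using nested double arrays in place of INDArrays and the output layout (mb, n, length) described above. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only: repeats each vector of a mini-batch n times.
class RepeatVectorSketch {

    // x: [mb][length]; returns [mb][n][length]
    static double[][][] repeat(double[][] x, int n) {
        double[][][] out = new double[x.length][n][];
        for (int b = 0; b < x.length; b++)
            for (int r = 0; r < n; r++)
                out[b][r] = x[b].clone(); // independent copy of the same vector
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}}; // mb = 2, length = 2
        System.out.println(Arrays.deepToString(repeat(x, 3)));
    }
}
```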

getRepetitionFactor

public int getRepetitionFactor()

Get repetition factor for RepeatVector layer

setRepetitionFactor

public void setRepetitionFactor(int n)

Set repetition factor for RepeatVector layer

param n repetition factor

repetitionFactor

public Builder repetitionFactor(int n)

Set repetition factor for RepeatVector layer

param n repetition factor

Yolo2OutputLayer

[source]

Output (loss) layer for YOLOv2 object detection model, based on the papers: YOLO9000: Better, Faster, Stronger - Redmon & Farhadi (2016) - https://arxiv.org/abs/1612.08242 and You Only Look Once: Unified, Real-Time Object Detection - Redmon et al. (2016) - http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf. This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.

Note: Input activations to the Yolo2OutputLayer should have shape [minibatch, b(5+c), H, W], where:

b = number of bounding boxes (determined by config - see papers for details)
c = number of classes
H = output/label height
W = output/label width

Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have output depth of b(5+c). Thus if you change the number of bounding boxes, or change the number of object classes, the number of channels (nOut of the last convolution layer) needs to also change.

Label format: [minibatch, 4+C, H, W]
Order for labels depth: [x1,y1,x2,y2,(class labels)]
x1 = box top left position
y1 = as above, y axis
x2 = box bottom right position
y2 = as above, y axis

Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left, (13,13) is bottom right.
Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
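The depth arithmetic above is easy to get wrong when changing the number of boxes or classes, so here is a small plain-Java sketch of it. The helper names are hypothetical, and the 5 boxes / 20 classes used in main are just example values.

```java
// Illustrative sketch only: the channel counts implied by the shapes above.
class Yolo2ShapeSketch {

    // Required nOut of the last convolutional layer: b(5+c)
    static int inputDepth(int numBoxes, int numClasses) {
        return numBoxes * (5 + numClasses);
    }

    // Depth of the label array: 4 box coordinates plus one entry per class
    static int labelDepth(int numClasses) {
        return 4 + numClasses;
    }

    public static void main(String[] args) {
        int numBoxes = 5;    // example value
        int numClasses = 20; // example value
        System.out.println("nOut of last conv layer: " + inputDepth(numBoxes, numClasses)); // 125
        System.out.println("label depth: " + labelDepth(numClasses));                       // 24
    }
}
```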

lambdaCoord

public Builder lambdaCoord(double lambdaCoord)

Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5

lambbaNoObj

public Builder lambbaNoObj(double lambdaNoObj)

Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5

param lambdaNoObj Lambda value for no-object (confidence) component of the loss function

lossPositionScale

public Builder lossPositionScale(ILossFunction lossPositionScale)

Loss function for position/scale component of the loss function

param lossPositionScale Loss function for position/scale

lossClassPredictions

public Builder lossClassPredictions(ILossFunction lossClassPredictions)

Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper), however Loss MCXENT could also be used (which is more common for classification).

param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function

boundingBoxPriors

public Builder boundingBoxPriors(INDArray boundingBoxes)

Bounding box priors dimensions [width, height]. For N bounding boxes, input has shape [rows, columns] = [N, 2]. Note that dimensions should be specified as a fraction of grid size. For example, for a network with 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.

param boundingBoxes Bounding box prior dimensions (width, height)

MaskLayer

[source]

MaskLayer applies the mask array to the forward pass activations, and backward pass gradients, passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.

MaskZeroLayer

[source]

Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].

Cheat Sheet

Snippets and links for common functionality in Eclipse Deeplearning4j.

Quick reference

Deeplearning4j (and related projects) have a lot of functionality. The goal of this page is to summarize this functionality so users know what exists, and where to find more information.

Contents

Layers

Feed-Forward Layers

DenseLayer - (Source) - A simple/standard fully-connected layer
EmbeddingLayer - (Source) - Takes positive integer indexes as input, outputs vectors. Only usable as first layer in a model. Mathematically equivalent (when bias is enabled) to DenseLayer with one-hot input, but more efficient. See also: EmbeddingSequenceLayer.

Output Layers

Output layers: usable only as the last layer in a network. Loss functions are set here.

OutputLayer - (Source) - Output layer for standard classification/regression in MLPs/CNNs. Has a fully connected DenseLayer built in. 2d input/output (i.e., row vector per example).
LossLayer - (Source) - Output layer without parameters - only loss function and activation function. 2d input/output (i.e., row vector per example). Unlike OutputLayer, restricted to nIn = nOut.
RnnOutputLayer - (Source) - Output layer for recurrent neural networks. 3d (time series) input and output. Has time distributed fully connected layer built in.
RnnLossLayer - (Source) - The 'no parameter' version of RnnOutputLayer. 3d (time series) input and output.
CnnLossLayer - (Source) - Used with CNNs, where a prediction must be made at each spatial location of the output (for example: segmentation or denoising). No parameters, 4d input/output with shape [minibatch, depth, height, width]. When using softmax, this is applied depthwise at each spatial location.
Cnn3DLossLayer - (Source) - Used with 3D CNNs, where a prediction must be made at each spatial location (x/y/z) of the output. Layer has no parameters, 5d data in either NCDHW or NDHWC ("channels first" or "channels last") format (configurable). Supports masking. When using softmax, this is applied along channels at each spatial location.
Yolo2OutputLayer - (Source) - Implementation of the YOLO 2 model for object detection in images
CenterLossOutputLayer - (Source) - A version of OutputLayer that also attempts to minimize the intra-class distance of examples' activations - i.e., "If example x is in class Y, ensure that embedding(x) is close to average(embedding(y)) for all examples y in Y"

Convolutional Layers

ConvolutionLayer / Convolution2D - (Source) - Standard 2d convolutional neural network layer. Inputs and outputs have 4 dimensions with shape [minibatch,depthIn,heightIn,widthIn] and [minibatch,depthOut,heightOut,widthOut] respectively.
Convolution1DLayer / Convolution1D - (Source) - Standard 1d convolution layer
Convolution3DLayer / Convolution3D - (Source) - Standard 3D convolution layer. Supports both NDHWC ("channels last") and NCDHW ("channels first") activations format.
Deconvolution2DLayer - (Source) - also known as transpose or fractionally strided convolutions. Can be considered a "reversed" ConvolutionLayer; output size is generally larger than the input, whilst maintaining the spatial connection structure.
SeparableConvolution2DLayer - (Source) - depthwise separable convolution layer
SubsamplingLayer - (Source) - Implements standard 2d spatial pooling for CNNs - with max, average and p-norm pooling available.
Subsampling1DLayer - (Source) - 1D version of the subsampling layer.
Upsampling2D - (Source) - Upscale CNN activations by repeating the row/column values
Upsampling1D - (Source) - 1D version of the upsampling layer
Cropping2D - (Source) - Cropping layer for 2D convolutional neural networks
DepthwiseConvolution2D - (Source) - 2d depthwise convolution layer
ZeroPaddingLayer - (Source) - Very simple layer that adds the specified amount of zero padding to edges of the 4d input activations.
ZeroPadding1DLayer - (Source) - 1D version of ZeroPaddingLayer
SpaceToDepth - (Source) - This operation takes 4D array in, and moves data from spatial dimensions (HW) to channels (C) for given blockSize
SpaceToBatch - (Source) - Transforms data from a tensor from 2 spatial dimensions into batch dimension according to the "blocks" specified

Recurrent Layers

LSTM - (Source) - LSTM RNN without peephole connections. Supports CuDNN.
GravesLSTM - (Source) - LSTM RNN with peephole connections. Does not support CuDNN (thus for GPUs, LSTM should be used in preference).
GravesBidirectionalLSTM - (Source) - A bidirectional LSTM implementation with peephole connections. Equivalent to Bidirectional(ADD, GravesLSTM). Due to the addition of the Bidirectional wrapper (below), this layer has been deprecated on master.
Bidirectional - (Source) - A 'wrapper' layer - converts any standard uni-directional RNN into a bidirectional RNN (doubles number of params - forward/backward nets have independent parameters). Activations from forward/backward nets may be either added, multiplied, averaged or concatenated.
SimpleRnn - (Source) - A standard/'vanilla' RNN layer. Usually not effective in practice with long time series dependencies - LSTM is generally preferred.
LastTimeStep - (Source) - A 'wrapper' layer - extracts out the last time step of the (non-bidirectional) RNN layer it wraps. 3d input with shape [minibatch, size, timeSeriesLength], 2d output with shape [minibatch, size].
EmbeddingSequenceLayer - (Source) - A version of EmbeddingLayer that expects a fixed number (inputLength) of integers/indices per example as input, each in the range 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Can only be used as the first layer of a network.

Unsupervised Layers

VariationalAutoencoder - (Source) - A variational autoencoder implementation with MLP/dense layers for the encoder and decoder. Supports multiple different types of reconstruction distributions
AutoEncoder - (Source) - Standard denoising autoencoder layer

Other Layers

GlobalPoolingLayer - (Source) - Implements both pooling over time (for RNNs/time series - input size [minibatch, size, timeSeriesLength], out [minibatch, size]) and global spatial pooling (for CNNs - input size [minibatch, depth, h, w], out [minibatch, depth]). Available pooling modes: sum, average, max and p-norm.
ActivationLayer - (Source) - Applies an activation function (only) to the input activations. Note that most DL4J layers have activation functions built in as a config option.
DropoutLayer - (Source) - Implements dropout as a separate/single layer. Note that most DL4J layers have a "built-in" dropout configuration option.
BatchNormalization - (Source) - Batch normalization for 2d (feedforward), 3d (time series) or 4d (CNN) activations. For time series, parameter sharing across time; for CNNs, parameter sharing across spatial locations (but not depth).
LocalResponseNormalization - (Source) - Local response normalization layer for CNNs. Not frequently used in modern CNN architectures.
FrozenLayer - (Source) - Usually not used directly by users - added as part of transfer learning, to freeze a layer's parameters such that they don't change during further training.
LocallyConnected2D - (Source) - a 2d locally connected layer, assumes input is 4d data in NCHW ("channels first") format.
LocallyConnected1D - (Source) - a 1d locally connected layer, assumes input is 3d data in NCW ([minibatch, size, sequenceLength]) format

Graph Vertices

Graph vertices are used with ComputationGraph. They are similar to layers, but usually don't have any parameters, and may support multiple inputs.

ElementWiseVertex - (Source) - Performs an element-wise operation on the inputs - add, subtract, product, average, max
L2NormalizeVertex - (Source) - normalizes the input activations by dividing by the L2 norm for each example. i.e., out <- out / l2Norm(out)
L2Vertex - (Source) - calculates the L2 distance between the two input arrays, for each example separately. Output is a single value for each example.
MergeVertex - (Source) - merge the input activations along dimension 1, to make a larger output array. For CNNs, this implements merging along the depth/channels dimension
PreprocessorVertex - (Source) - a simple GraphVertex that contains an InputPreProcessor only
ReshapeVertex - (Source) - Performs arbitrary activation array reshaping. The preprocessors in the next section should usually be preferred.
ScaleVertex - (Source) - implements simple multiplicative scaling of the inputs - i.e., out = scalar * input
ShiftVertex - (Source) - implements simple scalar element-wise addition on the inputs - i.e., out = input + scalar
StackVertex - (Source) - used to stack all inputs along the minibatch dimension. Analogous to MergeVertex, but along dimension 0 (minibatch) instead of dimension 1 (nOut/channels)
SubsetVertex - (Source) - used to get a contiguous subset of the input activations along dimension 1. For example, two SubsetVertex instances could be used to split the activations from an input array into two separate activations. Essentially the opposite of MergeVertex.
UnstackVertex - (Source) - similar to SubsetVertex, but along dimension 0 (minibatch) instead of dimension 1 (nOut/channels). Opposite of StackVertex
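
The arithmetic behind a few of these vertices can be sketched in plain Java. This is only an illustration of the formulas quoted above for a single example, not DL4J's implementation: the class name VertexSketch and its methods are made up for this sketch, and the handling of an all-zero vector in l2Normalize is an assumption.

```java
import java.util.Arrays;

/** Plain-Java illustration of three vertex operations on one example; not DL4J code. */
class VertexSketch {

    /** ElementWiseVertex-style "add": out[i] = a[i] + b[i]. */
    static double[] elementWiseAdd(double[] a, double[] b) {
        requireSameLength(a, b);
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** L2Vertex-style distance: a single value for one pair of activation vectors. */
    static double l2Distance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** L2NormalizeVertex-style scaling: out = x / l2Norm(x). */
    static double[] l2Normalize(double[] x) {
        double norm = l2Distance(x, new double[x.length]);
        double[] out = x.clone();
        if (norm == 0.0) {
            return out; // assumption: an all-zero vector is returned unchanged
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= norm;
        }
        return out;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(l2Normalize(new double[] {3.0, 4.0})));
        System.out.println(l2Distance(new double[] {0.0, 0.0}, new double[] {3.0, 4.0}));
    }
}
```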

Input Pre Processors

An InputPreProcessor is a simple class/interface that operates on the input to a layer. That is, a preprocessor is attached to a layer, and performs some operation on the input before passing the result to the layer. Preprocessors also handle backpropagation - i.e., the preprocessing operations are generally differentiable.

Note that in many cases (such as the XtoYPreProcessor classes), users won't need to (and shouldn't) add these manually, and can instead just use .setInputType(InputType.feedForward(10)) or similar, which will infer and add the preprocessors as required.

CnnToFeedForwardPreProcessor - (Source) - handles the activation reshaping necessary to transition from a CNN layer (ConvolutionLayer, SubsamplingLayer, etc) to DenseLayer/OutputLayer etc.
CnnToRnnPreProcessor - (Source) - handles reshaping necessary to transition from a (effectively, time distributed) CNN layer to a RNN layer.
ComposableInputPreProcessor - (Source) - simple class that allows multiple preprocessors to be chained + used on a single layer
FeedForwardToCnnPreProcessor - (Source) - handles activation reshaping to transition from a row vector (per example) to a CNN layer. Note that this transition/preprocessor only makes sense if the activations are actually CNN activations, but have been 'flattened' to a row vector.
FeedForwardToRnnPreProcessor - (Source) - handles transition from a (time distributed) feed-forward layer to a RNN layer
RnnToCnnPreProcessor - (Source) - handles transition from a sequence of CNN activations with shape [minibatch, depth*height*width, timeSeriesLength] to time-distributed [numExamples*timeSeriesLength, numChannels, inputWidth, inputHeight] format
RnnToFeedForwardPreProcessor - (Source) - handles transition from time series activations (shape [minibatch,size,timeSeriesLength]) to time-distributed feed-forward (shape [minibatch*tsLength,size]) activations.
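
As an illustration of the kind of reshaping these preprocessors perform, the plain-Java sketch below flattens CNN-style activations of shape [minibatch, depth, height, width] into one row vector per example, and reverses the operation. It is not DL4J code: the class FlattenSketch is hypothetical, and the row-major element ordering used here is an assumption rather than a statement about DL4J's internal layout.

```java
/** Illustrative reshape between 4d CNN activations and 2d feed-forward activations. */
class FlattenSketch {

    /** [minibatch][depth][height][width] -> [minibatch][depth*height*width], row-major. */
    static double[][] cnnToFeedForward(double[][][][] in) {
        int mb = in.length;
        int d = in[0].length;
        int h = in[0][0].length;
        int w = in[0][0][0].length;
        double[][] out = new double[mb][d * h * w];
        for (int m = 0; m < mb; m++) {
            for (int c = 0; c < d; c++) {
                for (int y = 0; y < h; y++) {
                    for (int x = 0; x < w; x++) {
                        out[m][(c * h + y) * w + x] = in[m][c][y][x];
                    }
                }
            }
        }
        return out;
    }

    /** Inverse of cnnToFeedForward: only meaningful if the rows really are flattened CNN activations. */
    static double[][][][] feedForwardToCnn(double[][] in, int d, int h, int w) {
        double[][][][] out = new double[in.length][d][h][w];
        for (int m = 0; m < in.length; m++) {
            if (in[m].length != d * h * w) {
                throw new IllegalArgumentException("row length must equal depth*height*width");
            }
            for (int i = 0; i < in[m].length; i++) {
                int c = i / (h * w);      // channel index
                int y = (i / w) % h;      // row within the channel
                int x = i % w;            // column within the row
                out[m][c][y][x] = in[m][i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][][] activations = new double[2][3][4][5];
        System.out.println(cnnToFeedForward(activations)[0].length); // 3 * 4 * 5 values per example
    }
}
```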

Iteration/Training Listeners

IterationListener: can be attached to a model, and is called during training, once after every iteration (i.e., after each parameter update). TrainingListener: extends IterationListener. Has a number of additional methods that are called at different stages of training - i.e., after forward pass, after gradient calculation, at the start/end of each epoch, etc.

Neither type (iteration/training) is called outside of training (i.e., during output or feed-forward methods).

ScoreIterationListener - (Source, Javadoc) - Logs the loss function score every N training iterations
PerformanceListener - (Source, Javadoc) - Logs performance (examples per sec, minibatches per sec, ETL time), and optionally score, every N training iterations.
EvaluativeListener - (Source, Javadoc) - Evaluates network performance on a test set every N iterations or epochs. Also has a system for callbacks, to (for example) save the evaluation results.
CheckpointListener - (Source, Javadoc) - Save network checkpoints periodically - based on epochs, iterations or time (or some combination of all three).
StatsListener - (Source) - Main listener for DL4J's web-based network training user interface. See visualization page for more details.
CollectScoresIterationListener - (Source, Javadoc) - Similar to ScoreIterationListener, but stores scores internally in a list (for later retrieval) instead of logging scores
TimeIterationListener - (Source, Javadoc) - Attempts to estimate time until training completion, based on current speed and specified total number of iterations

Evaluation

Link: Main evaluation page

ND4J has a number of classes for evaluating the performance of a network, against a test set. Deeplearning4j (and SameDiff) use these ND4J evaluation classes. Different evaluation classes are suitable for different types of networks. Note: in 1.0.0-beta3 (November 2018), all evaluation classes were moved from DL4J to ND4J; previously they were in DL4J.

Evaluation - (Source) - Used for the evaluation of multi-class classifiers (assumes standard one-hot labels, and softmax probability distribution over N classes for predictions). Calculates a number of metrics - accuracy, precision, recall, F1, F-beta, Matthews correlation coefficient, confusion matrix. Optionally calculates top N accuracy, custom binary decision thresholds, and cost arrays (for non-binary case). Typically used for softmax + mcxent/negative-log-likelihood networks.
EvaluationBinary - (Source) - A multi-label binary version of the Evaluation class. Each network output is assumed to be a separate/independent binary class, with probability 0 to 1 independent of all other outputs. Typically used for sigmoid + binary cross entropy networks.
EvaluationCalibration - (Source) - Used to evaluate the calibration of a binary or multi-class classifier. Produces reliability diagrams, residual plots, and histograms of probabilities. Export plots to HTML using the EvaluationTools.exportevaluationCalibrationToHtmlFile method
ROC - (Source) - Used for single output binary classifiers only - i.e., networks with nOut(1) + sigmoid, or nOut(2) + softmax. Supports 2 modes: thresholded (approximate) or exact (the default). Calculates area under ROC curve, area under precision-recall curve. Plot ROC and P-R curves to HTML using EvaluationTools
ROCBinary - (Source) - a version of ROC that is used for multi-label binary networks (i.e., sigmoid + binary cross entropy), where each network output is assumed to be an independent binary variable.
ROCMultiClass - (Source) - a version of ROC that is used for multi-class (non-binary) networks (i.e., softmax + mcxent/negative-log-likelihood networks). As ROC metrics are only defined for binary classification, this treats the multi-class output as a set of 'one-vs-all' binary classification problems.
RegressionEvaluation - (Source) - An evaluation class used for regression models (including multi-output regression models). Reports metrics such as mean-squared error (MSE), mean-absolute error, etc for each output/column.
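
For readers who want to see how the basic classification metrics relate to a confusion matrix, here is a small plain-Java sketch. It uses the textbook definitions of accuracy, precision, recall and F1 and is not the ND4J Evaluation class. The class name, the counts[actual][predicted] layout, and returning 0 when a denominator is zero are all assumptions made for this example.

```java
/** Textbook metrics computed from a confusion matrix laid out as counts[actual][predicted]. */
class ConfusionMatrixSketch {

    /** Fraction of all examples on the diagonal (predicted class equals actual class). */
    static double accuracy(int[][] counts) {
        long correct = 0;
        long total = 0;
        for (int a = 0; a < counts.length; a++) {
            for (int p = 0; p < counts[a].length; p++) {
                total += counts[a][p];
                if (a == p) {
                    correct += counts[a][p];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** True positives divided by everything predicted as cls (column sum). */
    static double precision(int[][] counts, int cls) {
        long predicted = 0;
        for (int a = 0; a < counts.length; a++) {
            predicted += counts[a][cls];
        }
        return predicted == 0 ? 0.0 : (double) counts[cls][cls] / predicted;
    }

    /** True positives divided by everything that actually is cls (row sum). */
    static double recall(int[][] counts, int cls) {
        long actual = 0;
        for (int p = 0; p < counts[cls].length; p++) {
            actual += counts[cls][p];
        }
        return actual == 0 ? 0.0 : (double) counts[cls][cls] / actual;
    }

    /** Harmonic mean of precision and recall for one class. */
    static double f1(int[][] counts, int cls) {
        double p = precision(counts, cls);
        double r = recall(counts, cls);
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int[][] counts = {{5, 1}, {2, 4}};
        System.out.println("accuracy = " + accuracy(counts));
        System.out.println("f1(class 0) = " + f1(counts, 0));
    }
}
```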

Network Saving and Loading

MultiLayerNetwork.save(File) and MultiLayerNetwork.load(File) methods can be used to save and load models. These use ModelSerializer internally. Similar save/load methods are also available for ComputationGraph.

MultiLayerNetwork and ComputationGraph can be saved using the ModelSerializer class - and specifically the writeModel, restoreMultiLayerNetwork and restoreComputationGraph methods.

Examples: Saving and loading network

Networks can be trained further after saving and loading: however, be sure to load the 'updater' (i.e., the historical state for updaters like momentum). If no further training is required, the updater state can be omitted to save disk space and memory.

Most Normalizers (implementing the ND4J Normalizer interface) can also be added to a model using the addNormalizerToModel method.

Note that the format used for models in DL4J is .zip: it's possible to open/extract these files using programs supporting the zip format.
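
Because a saved model is an ordinary zip archive, its contents can be listed with the standard java.util.zip classes. The sketch below does not use any DL4J API. ZipListingSketch, listEntries and writeDemoZip are hypothetical names, and the demo archive with made-up entry names only stands in for a real model file so that the example runs on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Lists the entries of a zip archive using only the Java standard library. */
class ZipListingSketch {

    /** Returns the names of all entries stored in the given zip file. */
    static List<String> listEntries(Path zipPath) {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            zip.stream().forEach(entry -> names.add(entry.getName()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Writes a small stand-in archive so the sketch runs without a real model file. */
    static Path writeDemoZip(String... entryNames) {
        try {
            Path zipPath = Files.createTempFile("demo-model", ".zip");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
                for (String name : entryNames) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(name.getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return zipPath;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path demo = writeDemoZip("first.json", "second.bin"); // made-up entry names
        System.out.println(listEntries(demo));
    }
}
```

To inspect an actual saved network, pass the path of the file written by the save method to listEntries instead of the demo archive.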

Network Configurations

This section lists the various configuration options that Deeplearning4j supports.

Activation Functions

Activation functions can be defined in one of two ways:
(a) By passing an Activation enumeration value to the configuration - for example, .activation(Activation.TANH)
(b) By passing an IActivation instance - for example, .activation(new ActivationSigmoid())

Note that Deeplearning4j supports custom activation functions, which can be defined by extending BaseActivationFunction

List of supported activation functions:

CUBE - (Source) - f(x) = x^3
ELU - (Source) - Exponential linear unit (Reference)
HARDSIGMOID - (Source) - a piecewise linear version of the standard sigmoid activation function. f(x) = min(1, max(0, 0.2*x + 0.5))
HARDTANH - (Source) - a piecewise linear version of the standard tanh activation function.
IDENTITY - (Source) - a 'no op' activation function: f(x) = x
LEAKYRELU - (Source) - leaky rectified linear unit. f(x) = max(0, x) + alpha * min(0, x) with alpha=0.01 by default.
RATIONALTANH - (Source) - tanh(y) ~ sgn(y) * { 1 - 1/(1+|y|+y^2+1.41645*y^4)} which approximates f(x) = 1.7159 * tanh(2x/3), but should be faster to execute. (Reference)
RELU - (Source) - standard rectified linear unit: f(x) = x if x>0 or f(x) = 0 otherwise
RRELU - (Source) - randomized rectified linear unit. Deterministic during test time. (Reference)
SIGMOID - (Source) - standard sigmoid activation function, f(x) = 1 / (1 + exp(-x))
SOFTMAX - (Source) - standard softmax activation function
SOFTPLUS - (Source) - f(x) = log(1+e^x) - shape is similar to a smooth version of the RELU activation function
SOFTSIGN - (Source) - f(x) = x / (1+|x|) - somewhat similar in shape to the standard tanh activation function (faster to calculate).
TANH - (Source) - standard tanh (hyperbolic tangent) activation function
RECTIFIEDTANH - (Source) - f(x) = max(0, tanh(x))
SELU - (Source) - scaled exponential linear unit - used with self normalizing neural networks
SWISH - (Source) - Swish activation function, f(x) = x * sigmoid(x) (Reference)
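
Several of the formulas listed above are short enough to write out directly. The following plain-Java sketch evaluates them on scalar inputs (and softmax on a vector). It mirrors the formulas as stated in this list, is not DL4J's implementation, and the class and method names are made up for illustration.

```java
/** Scalar versions of some of the activation formulas listed above; not DL4J code. */
class ActivationSketch {

    static double cube(double x) {
        return x * x * x;
    }

    /** f(x) = min(1, max(0, 0.2*x + 0.5)) */
    static double hardSigmoid(double x) {
        return Math.min(1.0, Math.max(0.0, 0.2 * x + 0.5));
    }

    /** f(x) = max(0, x) + alpha * min(0, x) */
    static double leakyRelu(double x, double alpha) {
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** f(x) = log(1 + e^x) */
    static double softplus(double x) {
        return Math.log1p(Math.exp(x));
    }

    /** f(x) = x / (1 + |x|) */
    static double softsign(double x) {
        return x / (1.0 + Math.abs(x));
    }

    /** f(x) = max(0, tanh(x)) */
    static double rectifiedTanh(double x) {
        return Math.max(0.0, Math.tanh(x));
    }

    /** f(x) = x * sigmoid(x) */
    static double swish(double x) {
        return x * sigmoid(x);
    }

    /** Standard softmax; the maximum is subtracted first for numerical stability. */
    static double[] softmax(double[] x) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("hardSigmoid(1.0) = " + hardSigmoid(1.0));
        System.out.println("leakyRelu(-2.0, 0.01) = " + leakyRelu(-2.0, 0.01));
        System.out.println("swish(1.0) = " + swish(1.0));
    }
}
```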

Weight Initialization

Weight initialization refers to the method by which the initial parameters for a new network should be set.

Weight initializations are usually defined using the WeightInit enumeration.

Custom weight initializations can be specified using .weightInit(WeightInit.DISTRIBUTION).dist(new NormalDistribution(0, 1)) for example. As of master (but not the 0.9.1 release) .weightInit(new NormalDistribution(0, 1)) is also possible, which is equivalent to the previous approach.

Available weight initializations. Note again that not all are available in the 0.9.1 release:

DISTRIBUTION: Sample weights from a provided distribution (specified via the dist configuration method)
ZERO: Generate weights as zeros
ONES: All weights are set to 1
SIGMOID_UNIFORM: A version of XAVIER_UNIFORM for sigmoid activation functions. U(-r,r) with r=4*sqrt(6/(fanIn + fanOut))
NORMAL: Normal/Gaussian distribution, with mean 0 and standard deviation 1/sqrt(fanIn). This is the initialization recommended in the Klambauer et al. 2017 "Self-Normalizing Neural Networks" paper. Equivalent to DL4J's XAVIER_FAN_IN and LECUN_NORMAL (i.e. Keras' "lecun_normal")
LECUN_UNIFORM: Uniform U[-a,a] with a=3/sqrt(fanIn).
UNIFORM: Uniform U[-a,a] with a=1/sqrt(fanIn). "Commonly used heuristic" as per Glorot and Bengio 2010
XAVIER: As per Glorot and Bengio 2010: Gaussian distribution with mean 0, variance 2.0/(fanIn + fanOut)
XAVIER_UNIFORM: As per Glorot and Bengio 2010: Uniform distribution U(-s,s) with s = sqrt(6/(fanIn + fanOut))
XAVIER_FAN_IN: Similar to Xavier, but 1/fanIn -> Caffe originally used this.
RELU: He et al. (2015), "Delving Deep into Rectifiers". Normal distribution with variance 2.0/nIn
RELU_UNIFORM: He et al. (2015), "Delving Deep into Rectifiers". Uniform distribution U(-s,s) with s = sqrt(6/fanIn)
IDENTITY: Weights are set to an identity matrix. Note: can only be used with square weight matrices
VAR_SCALING_NORMAL_FAN_IN: Gaussian distribution with mean 0, variance 1.0/(fanIn)
VAR_SCALING_NORMAL_FAN_OUT: Gaussian distribution with mean 0, variance 1.0/(fanOut)
VAR_SCALING_NORMAL_FAN_AVG: Gaussian distribution with mean 0, variance 1.0/((fanIn + fanOut)/2)
VAR_SCALING_UNIFORM_FAN_IN: Uniform U[-a,a] with a=3.0/(fanIn)
VAR_SCALING_UNIFORM_FAN_OUT: Uniform U[-a,a] with a=3.0/(fanOut)
VAR_SCALING_UNIFORM_FAN_AVG: Uniform U[-a,a] with a=3.0/((fanIn + fanOut)/2)
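
To make the fanIn/fanOut formulas concrete, the plain-Java sketch below computes the standard deviation or uniform bound for a few of the schemes above and draws sample weights from them. It follows the formulas exactly as written in this list and is not the WeightInit implementation. The class name, method names and the use of java.util.Random are assumptions for illustration.

```java
import java.util.Random;

/** Computes the scale of a few weight initialization schemes and samples from them. */
class WeightInitSketch {

    /** XAVIER: Gaussian with variance 2/(fanIn + fanOut), returned as a standard deviation. */
    static double xavierStdDev(int fanIn, int fanOut) {
        return Math.sqrt(2.0 / (fanIn + fanOut));
    }

    /** XAVIER_UNIFORM: U(-s, s) with s = sqrt(6/(fanIn + fanOut)). */
    static double xavierUniformBound(int fanIn, int fanOut) {
        return Math.sqrt(6.0 / (fanIn + fanOut));
    }

    /** SIGMOID_UNIFORM: U(-r, r) with r = 4*sqrt(6/(fanIn + fanOut)). */
    static double sigmoidUniformBound(int fanIn, int fanOut) {
        return 4.0 * Math.sqrt(6.0 / (fanIn + fanOut));
    }

    /** RELU: normal distribution with variance 2/fanIn, returned as a standard deviation. */
    static double reluStdDev(int fanIn) {
        return Math.sqrt(2.0 / fanIn);
    }

    /** RELU_UNIFORM: U(-s, s) with s = sqrt(6/fanIn). */
    static double reluUniformBound(int fanIn) {
        return Math.sqrt(6.0 / fanIn);
    }

    /** Draws n weights uniformly from [-bound, bound). */
    static double[] sampleUniform(int n, double bound, Random rng) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = bound * (2.0 * rng.nextDouble() - 1.0);
        }
        return w;
    }

    /** Draws n weights from a zero-mean Gaussian with the given standard deviation. */
    static double[] sampleNormal(int n, double stdDev, Random rng) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = stdDev * rng.nextGaussian();
        }
        return w;
    }

    public static void main(String[] args) {
        int fanIn = 784;
        int fanOut = 100;
        System.out.println("XAVIER stddev: " + xavierStdDev(fanIn, fanOut));
        System.out.println("RELU_UNIFORM bound: " + reluUniformBound(fanIn));
    }
}
```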

Updaters (Optimizers)

An 'updater' in DL4J is a class that takes raw gradients and modifies them to become updates. These updates will then be applied to the network parameters. The CS231n course notes have a good explanation of some of these updaters.

Supported updaters in Deeplearning4j:

AdaDelta - (Source) - Reference
AdaGrad - (Source) - Reference
AdaMax - (Source) - A variant of the Adam updater - Reference
Adam - (Source)
Nadam - (Source) - A variant of the Adam updater, using the Nesterov momentum update rule - Reference
Nesterovs - (Source) - Nesterov momentum updater
NoOp - (Source) - A 'no operation' updater. That is, gradients are not modified at all by this updater. Mathematically equivalent to the SGD updater with a learning rate of 1.0
RmsProp - (Source) - Reference - slide 29
Sgd - (Source) - Standard stochastic gradient descent updater. This updater applies a learning rate only.
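
The idea that an updater turns raw gradients into updates can be illustrated with a few lines of plain Java. The sketch shows Sgd and NoOp as described above, plus the textbook AdaGrad rule. It is not DL4J's updater code: the class and method names are hypothetical, and the position of the epsilon term in adaGrad is an assumption.

```java
import java.util.Arrays;

/** Turns raw gradients into updates for three simple rules; not DL4J code. */
class UpdaterSketch {

    /** Sgd: the update is the gradient scaled by the learning rate. */
    static double[] sgd(double[] grad, double learningRate) {
        double[] update = new double[grad.length];
        for (int i = 0; i < grad.length; i++) {
            update[i] = learningRate * grad[i];
        }
        return update;
    }

    /** NoOp: the gradient is passed through unchanged (same as sgd with learning rate 1.0). */
    static double[] noOp(double[] grad) {
        return grad.clone();
    }

    /**
     * Textbook AdaGrad: accumulates squared gradients in state (modified in place)
     * and divides each gradient by the square root of its accumulated sum.
     */
    static double[] adaGrad(double[] grad, double[] state, double learningRate, double epsilon) {
        double[] update = new double[grad.length];
        for (int i = 0; i < grad.length; i++) {
            state[i] += grad[i] * grad[i];
            update[i] = learningRate * grad[i] / (Math.sqrt(state[i]) + epsilon);
        }
        return update;
    }

    /** Applies an update to the parameters in place: params = params - update. */
    static void applyUpdate(double[] params, double[] update) {
        for (int i = 0; i < params.length; i++) {
            params[i] -= update[i];
        }
    }

    public static void main(String[] args) {
        double[] params = {1.0, 1.0};
        double[] grad = {2.0, -4.0};
        applyUpdate(params, sgd(grad, 0.1));
        System.out.println(Arrays.toString(params));
    }
}
```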

Learning Rate Schedules

All updaters that support a learning rate also support learning rate schedules (the Nesterov momentum updater also supports a momentum schedule). Learning rate schedules can be specified either based on the number of iterations, or the number of epochs that have elapsed. Dropout (see below) can also make use of the schedules listed here.

Configure using, for example: .updater(new Adam(new ExponentialSchedule(ScheduleType.ITERATION, 0.1, 0.99)))

You can plot/inspect the learning rate that will be used at any point by calling ISchedule.valueAt(int iteration, int epoch) on the schedule object you have created.

Available schedules:

ExponentialSchedule - (Source) - Implements value(i) = initialValue * gamma^i
InverseSchedule - (Source) - Implements value(i) = initialValue * (1 + gamma * i)^(-power)
MapSchedule - (Source) - Learning rate schedule based on a user-provided map. Note that the provided map must have a value for iteration/epoch 0. Has a builder class to conveniently define a schedule.
PolySchedule - (Source) - Implements value(i) = initialValue * (1 + i/maxIter)^(-power)
SigmoidSchedule - (Source) - Implements value(i) = initialValue * 1.0 / (1 + exp(-gamma * (iter - stepSize)))
StepSchedule - (Source) - Implements value(i) = initialValue * gamma^( floor(iter/step) )

Note that custom schedules can be created by implementing the ISchedule interface.
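
The schedule formulas listed above can be evaluated directly to see how a value decays over time. The plain-Java sketch below implements them as written in this section. It is not the ISchedule API, and the class name, method names and parameter names are made up for illustration.

```java
/** Evaluates the schedule formulas listed above for a given iteration (or epoch) index i. */
class ScheduleSketch {

    /** value(i) = initialValue * gamma^i */
    static double exponential(double initialValue, double gamma, int i) {
        return initialValue * Math.pow(gamma, i);
    }

    /** value(i) = initialValue * (1 + gamma * i)^(-power) */
    static double inverse(double initialValue, double gamma, double power, int i) {
        return initialValue * Math.pow(1.0 + gamma * i, -power);
    }

    /** value(i) = initialValue * (1 + i/maxIter)^(-power) */
    static double poly(double initialValue, double power, int maxIter, int i) {
        return initialValue * Math.pow(1.0 + (double) i / maxIter, -power);
    }

    /** value(i) = initialValue * 1.0 / (1 + exp(-gamma * (i - stepSize))) */
    static double sigmoid(double initialValue, double gamma, int stepSize, int i) {
        return initialValue / (1.0 + Math.exp(-gamma * (i - stepSize)));
    }

    /** value(i) = initialValue * gamma^(floor(i/step)) */
    static double step(double initialValue, double gamma, double step, int i) {
        return initialValue * Math.pow(gamma, Math.floor(i / step));
    }

    public static void main(String[] args) {
        // Print the exponential schedule from the configuration example at a few iterations
        for (int i : new int[] {0, 10, 100, 1000}) {
            System.out.println("iteration " + i + ": " + exponential(0.1, 0.99, i));
        }
    }
}
```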

Regularization

L1/L2 Regularization

L1 and L2 regularization can easily be added to a network via the configuration: .l1(0.1).l2(0.2). Note that .regularization(true) must be enabled on 0.9.1 also (this option has been removed after 0.9.1 was released).

L1 and L2 regularization is applied by default on the weight parameters only. That is, .l1 and .l2 will not impact bias parameters - these can be regularized using .l1Bias(0.1).l2Bias(0.2).

Dropout

All dropout types are applied at training time only. They are not applied at test time.

Dropout - (Source) - Each input activation x is independently set to (0, with probability 1-p) or (x/p with probability p)
GaussianDropout - (Source) - This is a multiplicative Gaussian noise (mean 1) on the input activations. Each input activation x is independently set to: x * y, where y ~ N(1, stdev = sqrt((1-rate)/rate))
GaussianNoise - (Source) - Applies additive, mean-zero Gaussian noise to the input - i.e., x = x + N(0,stddev)
AlphaDropout - (Source) - AlphaDropout is a dropout technique proposed by Klambauer et al. 2017 - Self-Normalizing Neural Networks. Designed for self-normalizing neural networks (SELU activation, NORMAL weight init). Attempts to keep both the mean and variance of the post-dropout activations the same (in expectation) as before alpha dropout was applied

Note that (as of current master - but not 0.9.1) the dropout parameters can also be specified according to any of the schedule classes mentioned in the Learning Rate Schedules section.
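
The standard Dropout and GaussianNoise rules described above are simple enough to sketch in plain Java. This is an illustration of the stated formulas only, not DL4J's implementation. The class DropoutSketch and its methods are hypothetical, and p is the probability of retaining an activation, as in the Dropout entry above.

```java
import java.util.Arrays;
import java.util.Random;

/** Training-time dropout and additive Gaussian noise on a vector of activations. */
class DropoutSketch {

    /** Each x is kept and scaled to x/p with probability p, or set to 0 with probability 1-p. */
    static double[] dropout(double[] x, double p, Random rng) {
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = rng.nextDouble() < p ? x[i] / p : 0.0;
        }
        return out;
    }

    /** Additive, mean-zero Gaussian noise: x = x + N(0, stdDev). */
    static double[] gaussianNoise(double[] x, double stdDev, Random rng) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] + stdDev * rng.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] activations = {1.0, 2.0, 3.0, 4.0};
        System.out.println(Arrays.toString(dropout(activations, 0.5, new Random(12345))));
    }
}
```

Dividing the retained activations by p keeps their expected value unchanged, which is why no rescaling is needed when dropout is switched off at test time.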

Weight Noise

As per dropout, dropconnect / weight noise is applied only at training time

DropConnect - (Source) - DropConnect is similar to dropout, but applied to the parameters of a network (instead of the input activations). Reference
WeightNoise - (Source) - Apply noise of the specified distribution to the weights at training time. Both additive and multiplicative modes are supported - when additive, noise should be mean 0, when multiplicative, noise should be mean 1

Constraints

Constraints are deterministic limitations that are placed on a model's parameters at the end of each iteration (after the parameter update has occurred). They can be thought of as a type of regularization.

MaxNormConstraint - (Source) - Constrain the maximum L2 norm of the incoming weights for each unit to be less than or equal to the specified value. If the L2 norm exceeds the specified value, the weights will be scaled down to satisfy the constraint.
MinMaxNormConstraint - (Source) - Constrain the minimum AND maximum L2 norm of the incoming weights for each unit to be between the specified values. Weights will be scaled up/down if required.
NonNegativeConstraint - (Source) - Constrain all parameters to be non-negative. Negative parameters will be replaced with 0.
UnitNormConstraint - (Source) - Constrain the L2 norm of the incoming weights for each unit to be 1.0.
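
As a rough illustration of what these constraints do to the incoming weights of a single unit, consider the plain-Java sketch below. It follows the descriptions above but is not DL4J's constraint code. The class and method names are hypothetical, and leaving an all-zero weight vector unchanged in unitNorm is an assumption.

```java
import java.util.Arrays;

/** Applies simple constraints to the incoming weight vector of one unit. */
class ConstraintSketch {

    static double l2Norm(double[] w) {
        double sum = 0.0;
        for (double v : w) {
            sum += v * v;
        }
        return Math.sqrt(sum);
    }

    /** Scales w down so that its L2 norm is at most maxNorm; smaller vectors are unchanged. */
    static double[] maxNorm(double[] w, double maxNorm) {
        double norm = l2Norm(w);
        double[] out = w.clone();
        if (norm > maxNorm) {
            for (int i = 0; i < out.length; i++) {
                out[i] *= maxNorm / norm;
            }
        }
        return out;
    }

    /** Rescales w to have an L2 norm of 1.0. */
    static double[] unitNorm(double[] w) {
        double norm = l2Norm(w);
        double[] out = w.clone();
        if (norm == 0.0) {
            return out; // assumption: an all-zero vector cannot be rescaled, so it is left as-is
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= norm;
        }
        return out;
    }

    /** Replaces negative values with 0. */
    static double[] nonNegative(double[] w) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            out[i] = Math.max(0.0, w[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] w = {3.0, -4.0};
        System.out.println(Arrays.toString(maxNorm(w, 1.0)));
        System.out.println(Arrays.toString(nonNegative(w)));
    }
}
```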

Data Classes

Iterators

DataSetIterator is an abstraction that DL4J uses to iterate over minibatches of data, used for training. DataSetIterator returns DataSet objects, which are minibatches, and support a maximum of 1 input and 1 output array (INDArray).

MultiDataSetIterator is similar to DataSetIterator, but returns MultiDataSet objects, which can have as many input and output arrays as required for the network.

Iterators - Built-In (DL4J-Provided Data)

These iterators download their data as required. The actual datasets they return are not customizable.

MnistDataSetIterator - (Source) - DataSetIterator for the well-known MNIST digits dataset. By default, returns a row vector (1x784), with values normalized to 0 to 1 range. Use .setInputType(InputType.convolutionalFlat(28, 28, 1)) to use with CNNs.
EmnistDataSetIterator - (Source) - Similar to the MNIST digits dataset, but with more examples, and also letters. Includes multiple different splits (letters only, digits only, letters + digits, etc). Same 1x784 format as MNIST, hence (other than different number of labels for some splits) can be used as a drop-in replacement for MnistDataSetIterator. Reference 1, Reference 2
IrisDataSetIterator - (Source) - An iterator for the well known Iris dataset. 4 features, 3 output classes.
Cifar10DataSetIterator - (Source) - An iterator for the CIFAR-10 images dataset. 10 classes, 4d features/activations format for CNNs in DL4J: [minibatch,channels,height,width] = [minibatch,3,32,32]. Features are not normalized - instead, are in the range 0 to 255.
LFWDataSetIterator - (Source) - Labeled Faces in the Wild dataset.
TinyImageNetDataSetIterator - (Source) - A subset of the standard ImageNet dataset; 200 classes, 500 images per class
UciSequenceDataSetIterator - (Source) - UCI synthetic control time series dataset

Iterators - User Provided Data

The iterators in this subsection are used with user-provided data.

RecordReaderDataSetIterator - (Source) - an iterator that takes a DataVec record reader (such as CsvRecordReader or ImageRecordReader) and handles conversion to DataSets, batching, masking, etc. One of the most commonly used iterators in DL4J. Handles non-sequence data only, as input (i.e., RecordReader, not SequenceRecordReader).
RecordReaderMultiDataSetIterator - (Source) - the MultiDataSet version of RecordReaderDataSetIterator, that supports multiple readers. Has a builder pattern for creating more complex data pipelines (such as different subsets of a reader's output to different input/output arrays, conversion to one-hot, etc). Handles both sequence and non-sequence data as input.
SequenceRecordReaderDataSetIterator - (Source) - The sequence (SequenceRecordReader) version of RecordReaderDataSetIterator. Users may be better off using RecordReaderMultiDataSetIterator, in conjunction with
DoublesDataSetIterator - (Source)
FloatsDataSetIterator - (Source)
INDArrayDataSetIterator - (Source)

Iterators - Adapter and Utility Iterators

MultiDataSetIteratorAdapter - (Source) - Wrap a DataSetIterator to convert it to a MultiDataSetIterator
SingletonMultiDataSetIterator - (Source) - Wrap a MultiDataSet into a MultiDataSetIterator that returns one MultiDataSet (i.e., the wrapped MultiDataSet is not split up)
AsyncDataSetIterator - (Source) - Used automatically by MultiLayerNetwork and ComputationGraph where appropriate. Implements asynchronous prefetching of datasets to improve performance.
AsyncMultiDataSetIterator - (Source) - Used automatically by ComputationGraph where appropriate. Implements asynchronous prefetching of MultiDataSets to improve performance.
AsyncShieldDataSetIterator - (Source) - Generally used only for debugging. Stops MultiLayerNetwork and ComputationGraph from using an AsyncDataSetIterator.
AsyncShieldMultiDataSetIterator - (Source) - The MultiDataSetIterator version of AsyncShieldDataSetIterator
EarlyTerminationDataSetIterator - (Source) - Wraps another DataSetIterator, ensuring that only a specified (maximum) number of minibatches (DataSet) objects are returned between resets. Can be used to 'cut short' an iterator, returning only the first N DataSets.
EarlyTerminationMultiDataSetIterator - (Source) - The MultiDataSetIterator version of EarlyTerminationDataSetIterator
ExistingDataSetIterator - (Source) - Convert an Iterator<DataSet> or Iterable<DataSet> to a DataSetIterator. Does not split the underlying DataSet objects
FileDataSetIterator - (Source) - An iterator that iterates over DataSet files that have been previously saved with DataSet.save(File). Supports randomization, filtering, different output batch size vs. saved DataSet batch size, etc.
FileMultiDataSetIterator - (Source) - A MultiDataSet version of FileDataSetIterator
IteratorDataSetIterator - (Source) - Convert an Iterator<DataSet> to a DataSetIterator. Unlike ExistingDataSetIterator, the underlying DataSet objects may be split/combined - i.e., the minibatch size may differ for the output, vs. the input iterator.
IteratorMultiDataSetIterator - (Source) - The Iterator<MultiDataSet> version of IteratorDataSetIterator
MultiDataSetWrapperIterator - (Source) - Convert a MultiDataSetIterator to a DataSetIterator. Note that this is only possible if the number of features and labels arrays is equal to 1.
MultipleEpochsIterator - (Source) - Treat multiple passes (epochs) of the underlying iterator as a single epoch, when training.
WorkspaceShieldDataSetIterator - (Source) - Generally used only for debugging, and not usually by users. Detaches/migrates DataSets coming out of the underlying DataSetIterator.
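
Most of the adapters above follow the same wrapping pattern: they hold another iterator and change what it returns. The plain-Java sketch below shows that pattern for the 'early termination' case using java.util.Iterator. It is not the DL4J class, EarlyTerminationSketch is a hypothetical name, and reset handling (which DataSetIterator supports) is deliberately left out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Wraps an iterator and returns at most maxItems elements from it. */
class EarlyTerminationSketch<T> implements Iterator<T> {

    private final Iterator<T> underlying;
    private final int maxItems;
    private int returned = 0;

    EarlyTerminationSketch(Iterator<T> underlying, int maxItems) {
        if (maxItems < 0) {
            throw new IllegalArgumentException("maxItems must be non-negative");
        }
        this.underlying = underlying;
        this.maxItems = maxItems;
    }

    @Override
    public boolean hasNext() {
        // Stop either when the limit is reached or when the wrapped iterator is exhausted
        return returned < maxItems && underlying.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        returned++;
        return underlying.next();
    }

    public static void main(String[] args) {
        Iterator<String> batches =
                new EarlyTerminationSketch<>(Arrays.asList("batch0", "batch1", "batch2").iterator(), 2);
        while (batches.hasNext()) {
            System.out.println(batches.next());
        }
    }
}
```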

Data Normalization

ND4J provides a number of classes for performing data normalization. These are implemented as DataSetPreProcessors. The basic pattern for normalization:

Create your (unnormalized) DataSetIterator or MultiDataSetIterator: DataSetIterator myTrainData = ...
Create the normalizer you want to use: NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler();
Fit the normalizer: normalizer.fit(myTrainData)
Set the normalizer/preprocessor on the iterator: myTrainData.setPreProcessor(normalizer);
End result: the data that comes from your DataSetIterator will now be normalized.

In general, you should fit only on the training data, and do trainData.setPreProcessor(normalizer) and testData.setPreProcessor(normalizer) with the same/single normalizer that has been fit on the training data only.
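
The 'fit on training data only, then apply to both' pattern can be illustrated without ND4J. The plain-Java sketch below implements a per-column min-max scaler over double[][] data. It is not NormalizerMinMaxScaler, the class MinMaxSketch is hypothetical, and mapping a constant column to 0 is an assumption.

```java
import java.util.Arrays;

/** Per-column min-max scaling to [0, 1], with statistics collected by fit(). */
class MinMaxSketch {

    private double[] min;
    private double[] max;

    /** Collects the minimum and maximum of every column (rows = examples). */
    void fit(double[][] data) {
        int cols = data[0].length;
        min = new double[cols];
        max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : data) {
            for (int c = 0; c < cols; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }
    }

    /** Applies (x - min) / (max - min) using the statistics from fit(). */
    double[][] transform(double[][] data) {
        if (min == null) {
            throw new IllegalStateException("fit must be called before transform");
        }
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            out[r] = new double[data[r].length];
            for (int c = 0; c < data[r].length; c++) {
                double range = max[c] - min[c];
                // assumption: a constant column (range 0) is mapped to 0
                out[r][c] = range == 0.0 ? 0.0 : (data[r][c] - min[c]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] train = {{0.0, 10.0}, {10.0, 30.0}};
        double[][] test = {{5.0, 20.0}, {20.0, 10.0}};
        MinMaxSketch normalizer = new MinMaxSketch();
        normalizer.fit(train); // statistics come from the training data only
        System.out.println(Arrays.deepToString(normalizer.transform(train)));
        System.out.println(Arrays.deepToString(normalizer.transform(test)));
    }
}
```

Because the statistics come from the training data only, normalized test values can fall outside the 0 to 1 range, as the second test row does here.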

Note that where appropriate (NormalizerStandardize, NormalizerMinMaxScaler) statistics such as mean/standard-deviation/min/max are shared across time (for time series) and across image x/y locations (but not depth/channels - for image data).

Data normalization example: link

Available normalizers: DataSet / DataSetIterator

ImagePreProcessingScaler - (Source) - Applies min-max scaling to image activations. Default settings do 0-255 input to 0-1 output (but is configurable). Note that unlike the other normalizers here, this one does not rely on statistics (mean/min/max etc) collected from the data, hence the normalizer.fit(trainData) step is unnecessary (is a no-op).
NormalizerStandardize - (Source) - normalizes each feature value independently (and optionally label values) to have 0 mean and a standard deviation of 1
NormalizerMinMaxScaler - (Source) - normalizes each feature value independently (and optionally label values) to lie between a minimum and maximum value (by default between 0 and 1)
VGG16ImagePreProcessor - (Source) - This is a preprocessor specifically for VGG16. It subtracts the mean RGB value, computed on the training set, from each pixel as reported in Link

Available normalizers: MultiDataSet / MultiDataSetIterator

ImageMultiPreProcessingScaler - (Source) - A MultiDataSet/MultiDataSetIterator version of ImagePreProcessingScaler
MultiNormalizerStandardize - (Source) - MultiDataSet/MultiDataSetIterator version of NormalizerStandardize
MultiNormalizerMinMaxScaler - (Source) - MultiDataSet/MultiDataSetIterator version of NormalizerMinMaxScaler
MultiNormalizerHybrid - (Source) - A MultiDataSet normalizer that can combine different normalization types (standardize, min/max etc) for different input/feature and output/label arrays.

Transfer Learning

Deeplearning4j has classes/utilities for performing transfer learning - i.e., taking an existing network, and modifying some of the layers (optionally freezing others so their parameters don't change). For example, an image classifier could be trained on ImageNet, then applied to a new/different dataset. Both MultiLayerNetwork and ComputationGraph can be used with transfer learning - frequently starting from a pre-trained model from the model zoo (see next section), though any MultiLayerNetwork/ComputationGraph can be used.

Link: Transfer learning examples

The main class for transfer learning is TransferLearning. This class has a builder pattern that can be used to add/remove layers, freeze layers, etc. FineTuneConfiguration can be used here to specify the learning rate and other settings for the non-frozen layers.

Trained Model Library - Model Zoo

Deeplearning4j provides a 'model zoo' - a set of pretrained models that can be downloaded and used either as-is (for image classification, for example) or often for transfer learning.

Link: Deeplearning4j Model Zoo

Models available in DL4J's model zoo:

AlexNet - (Source)
Darknet19 - (Source)
FaceNetNN4Small2 - (Source)
InceptionResNetV1 - (Source)
LeNet - (Source)
ResNet50 - (Source)
SimpleCNN - (Source)
TextGenerationLSTM - (Source)
TinyYOLO - (Source)
VGG16 - (Source)
VGG19 - (Source)

Note: Trained Keras models (not provided by DL4J) may also be imported, using Deeplearning4j's Keras model import functionality.

Cheat sheet code snippets

The Eclipse Deeplearning4j libraries come with a lot of functionality, and we've put together this cheat sheet to help users assemble neural networks and use tensors faster.

Neural networks

Code for configuring common parameters and layers for both MultiLayerNetwork and ComputationGraph. See MultiLayerNetwork and ComputationGraph for full API.

Sequential networks

Most network configurations can use MultiLayerNetwork class if they are sequential and simple.

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(1234)
    // parameters below are copied to every layer in the network
    // for inputs like dropOut() or activation() you should do this per layer
    // only specify the parameters you need
    .updater(new AdaGrad())
    .activation(Activation.RELU)
    .dropOut(0.8)
    .l1(0.001)
    .l2(1e-4)
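    .weightInit(WeightInit.XAVIER)
    // alternatively, sample initial weights from a distribution (this would replace the line above;
    // the mean and standard deviation shown are illustrative values only):
    // .weightInit(new TruncatedNormalDistribution(0.0, 0.5))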
    .cudnnAlgoMode(ConvolutionLayer.AlgoMode.PREFER_FASTEST)
    .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
    .gradientNormalizationThreshold(1e-3)
    .list()
    // layers in the network, added sequentially
    // parameters set per-layer override the parameters above
    .layer(new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
            .weightInit(WeightInit.XAVIER)
            .build())
    .layer(new ActivationLayer(Activation.RELU))
    .layer(new ConvolutionLayer.Builder(1,1)
            .nIn(1024)
            .nOut(2048)
            .stride(1,1)
            .convolutionMode(ConvolutionMode.Same)
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.IDENTITY)
            .build())
    .layer(new GravesLSTM.Builder()
            .activation(Activation.TANH)
            .nIn(inputNum)
            .nOut(100)
            .build())
    .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.SOFTMAX)
            .nIn(numHiddenNodes).nOut(numOutputs).build())
    .build();

MultiLayerNetwork neuralNetwork = new MultiLayerNetwork(conf);
neuralNetwork.init(); // initialize parameters before training or inference

Complex networks

Networks that have complex graphs and "branching" such as Inception need to use ComputationGraph.

ComputationGraphConfiguration graph = new NeuralNetConfiguration.Builder()
    .seed(seed)
    // parameters below are copied to every layer in the network
    // for inputs like dropOut() or activation() you should do this per layer
    // only specify the parameters you need
    .activation(Activation.IDENTITY)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .updater(updater)
    .weightInit(WeightInit.RELU)
    .l2(5e-5)
    .miniBatch(true)
    .cacheMode(cacheMode)
    .trainingWorkspaceMode(workspaceMode)
    .inferenceWorkspaceMode(workspaceMode)
    .cudnnAlgoMode(cudnnAlgoMode)
    .convolutionMode(ConvolutionMode.Same)
    .graphBuilder()
    // layers in the network, added sequentially
    // parameters set per-layer override the parameters above
    // note that you must name each layer and manually specify its input
    .addInputs("input1")
    .addLayer("stem-cnn1", new ConvolutionLayer.Builder(new int[] {7, 7}, new int[] {2, 2}, new int[] {3, 3})
        .nIn(inputShape[0])
        .nOut(64)
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
        .build(),"input1")
    .addLayer("stem-batch1", new BatchNormalization.Builder(false)
        .nIn(64)
        .nOut(64)
        .build(), "stem-cnn1")
    .addLayer("stem-activation1", new ActivationLayer.Builder()
        .activation(Activation.RELU)
        .build(), "stem-batch1")
    .addLayer("lossLayer", new CenterLossOutputLayer.Builder()
        .lossFunction(LossFunctions.LossFunction.SQUARED_LOSS)
        .activation(Activation.SOFTMAX).nOut(numClasses).lambda(1e-4).alpha(0.9)
        .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer).build(),
        "stem-activation1")
    .setOutputs("lossLayer")
    .setInputTypes(InputType.convolutional(224, 224, 3))
    .build();

ComputationGraph neuralNetwork = new ComputationGraph(graph);
neuralNetwork.init(); // initialize parameters before training or inference

Training

The code snippet below creates a basic pipeline that loads images from disk, applies random transformations, and trains a neural network on them. It uses early stopping to end training once a termination condition is met. You can adapt this pipeline for many different use cases.

ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
File mainPath = new File(System.getProperty("user.dir"), "dl4j-examples/src/main/resources/animals/");
FileSplit fileSplit = new FileSplit(mainPath, NativeImageLoader.ALLOWED_FORMATS, rng);
int numExamples = Math.toIntExact(fileSplit.length());
int numLabels = fileSplit.getRootDir().listFiles(File::isDirectory).length; //This only works if your root is clean: only label subdirs.
BalancedPathFilter pathFilter = new BalancedPathFilter(rng, labelMaker, numExamples, numLabels, maxPathsPerLabel);

InputSplit[] inputSplit = fileSplit.sample(pathFilter, splitTrainTest, 1 - splitTrainTest);
InputSplit trainData = inputSplit[0];
InputSplit testData = inputSplit[1];

boolean shuffle = false;
ImageTransform flipTransform1 = new FlipImageTransform(rng);
ImageTransform flipTransform2 = new FlipImageTransform(new Random(123));
ImageTransform warpTransform = new WarpImageTransform(rng, 42);
List<Pair<ImageTransform,Double>> pipeline = Arrays.asList(
    new Pair<>(flipTransform1,0.9),
    new Pair<>(flipTransform2,0.8),
    new Pair<>(warpTransform,0.5));

ImageTransform transform = new PipelineImageTransform(pipeline,shuffle);
DataNormalization scaler = new ImagePreProcessingScaler(0, 1);

// training dataset (the random transformations are applied to training images only)
ImageRecordReader recordReaderTrain = new ImageRecordReader(height, width, channels, labelMaker);
recordReaderTrain.initialize(trainData, transform);
DataSetIterator trainingIterator = new RecordReaderDataSetIterator(recordReaderTrain, batchSize, 1, numLabels);
trainingIterator.setPreProcessor(scaler); // scale pixel values to the 0-1 range

// testing dataset
ImageRecordReader recordReaderTest = new ImageRecordReader(height, width, channels, labelMaker);
recordReaderTest.initialize(testData);
DataSetIterator testingIterator = new RecordReaderDataSetIterator(recordReaderTest, batchSize, 1, numLabels);
testingIterator.setPreProcessor(scaler);

// early stopping configuration, model saver, and trainer
EarlyStoppingModelSaver saver = new LocalFileModelSaver(System.getProperty("user.dir"));
EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
    .epochTerminationConditions(new MaxEpochsTerminationCondition(50)) //Max of 50 epochs
    .evaluateEveryNEpochs(1)
    .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES)) //Max of 20 minutes
    .scoreCalculator(new DataSetLossCalculator(testingIterator, true))     //Calculate test set score
    .modelSaver(saver)
    .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, neuralNetwork, trainingIterator);

// begin training
trainer.fit();

Complex Transformation

DataVec comes with a portable TransformProcess class that allows for more complex data wrangling and data conversion. It works well with both 2D and sequence datasets.

Schema schema = new Schema.Builder()
    .addColumnsDouble("Sepal length", "Sepal width", "Petal length", "Petal width")
    .addColumnCategorical("Species", "Iris-setosa", "Iris-versicolor", "Iris-virginica")
    .build();

TransformProcess tp = new TransformProcess.Builder(schema)
    .categoricalToInteger("Species")
    .build();

// do the transformation on spark
JavaRDD<List<Writable>> processedData = SparkTransformExecutor.execute(parsedInputData, tp);

We recommend having a look at the DataVec examples before creating more complex transformations.
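To make the effect of categoricalToInteger more concrete, here is a minimal plain-Java sketch with no DataVec dependency. It assumes that each category is mapped to its position in the list of categories declared in the schema; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of DataVec.
class CategoricalToIntegerSketch {
    // Category name -> integer index, in declaration order (an assumption of this sketch)
    private final Map<String, Integer> indexByCategory = new LinkedHashMap<>();

    CategoricalToIntegerSketch(List<String> categories) {
        for (String category : categories) {
            indexByCategory.put(category, indexByCategory.size());
        }
    }

    // Returns the integer index of the given category, or fails for unknown values
    int toInteger(String value) {
        Integer index = indexByCategory.get(value);
        if (index == null) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        return index;
    }

    public static void main(String[] args) {
        CategoricalToIntegerSketch species = new CategoricalToIntegerSketch(
                Arrays.asList("Iris-setosa", "Iris-versicolor", "Iris-virginica"));
        System.out.println(species.toInteger("Iris-versicolor")); // prints 1
    }
}
```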

Evaluation

Both MultiLayerNetwork and ComputationGraph come with built-in .evaluate() methods that accept a dataset iterator and return evaluation results.

// returns evaluation class with accuracy, precision, recall, and other class statistics
Evaluation eval = neuralNetwork.evaluate(testIterator);
System.out.println(eval.accuracy());
System.out.println(eval.precision());
System.out.println(eval.recall());

// ROC for Area Under Curve on multi-class datasets (not binary classes)
// doEvaluation returns an array with one entry per evaluation instance passed in
ROCMultiClass roc = neuralNetwork.doEvaluation(testIterator, new ROCMultiClass())[0];
System.out.println(roc.calculateAverageAUC());
System.out.println(roc.calculateAverageAUCPR());

For advanced evaluation, the code snippet below can be adapted into training pipelines. This is useful when the built-in neuralNetwork.evaluate() method outputs confusing results or when you need to examine raw data.

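//Evaluate the model on the test set
//testData is assumed to be a DataSet produced by an iterator with setCollectMetaData(true)
List<RecordMetaData> testMetaData = testData.getExampleMetaData(RecordMetaData.class);
Evaluation eval = new Evaluation(numClasses);
INDArray output = neuralNetwork.output(testData.getFeatures());
eval.eval(testData.getLabels(), output, testMetaData); //Note we are passing in the test set metadata here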

//Get a list of prediction errors, from the Evaluation object
//Prediction errors like this are only available after calling iterator.setCollectMetaData(true)
List<Prediction> predictionErrors = eval.getPredictionErrors();
System.out.println("\n\n+++++ Prediction Errors +++++");
for(Prediction p : predictionErrors){
    System.out.println("Predicted class: " + p.getPredictedClass() + ", Actual class: " + p.getActualClass()
        + "\t" + p.getRecordMetaData(RecordMetaData.class).getLocation());
}

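//We can also load the raw data for the misclassified examples.
//Here `iterator` is assumed to be the RecordReaderDataSetIterator that produced testData, and `recordReader` its underlying reader
List<RecordMetaData> predictionErrorMetaData = new ArrayList<>();
for(Prediction p : predictionErrors) predictionErrorMetaData.add(p.getRecordMetaData(RecordMetaData.class));
DataSet predictionErrorExamples = iterator.loadFromMetaData(predictionErrorMetaData);
List<Record> predictionErrorRawData = recordReader.loadFromMetaData(predictionErrorMetaData);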
for(int i=0; i<predictionErrors.size(); i++ ){
    Prediction p = predictionErrors.get(i);
    RecordMetaData meta = p.getRecordMetaData(RecordMetaData.class);
    INDArray features = predictionErrorExamples.getFeatures().getRow(i);
    INDArray labels = predictionErrorExamples.getLabels().getRow(i);
    List<Writable> rawData = predictionErrorRawData.get(i).getRecord();

    INDArray networkPrediction = neuralNetwork.output(features);

    System.out.println(meta.getLocation() + ": "
        + "\tRaw Data: " + rawData
        + "\tNormalized: " + features
        + "\tLabels: " + labels
        + "\tPredictions: " + networkPrediction);
}

//Some other useful evaluation methods:
List<Prediction> list1 = eval.getPredictions(1,2);                  //Predictions: actual class 1, predicted class 2
List<Prediction> list2 = eval.getPredictionByPredictedClass(2);     //All predictions for predicted class 2
List<Prediction> list3 = eval.getPredictionsByActualClass(2);       //All predictions for actual class 2

Iterators

Data iteration tools for loading into neural networks.

What is an iterator?

A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data or automatic loading of common benchmarking datasets such as MNIST and IRIS.

Usage

For most use cases, initializing an iterator and passing it to the fit() method of a MultiLayerNetwork or ComputationGraph is all you need to begin training:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

// pass an MNIST data iterator that automatically fetches data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
model.fit(mnistTrain);

Many other methods also accept iterators for tasks such as evaluation:

// passing directly to the neural network
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
model.evaluate(mnistTest);

// alternatively, using an evaluation class
mnistTest.reset(); //start again from the first minibatch
Evaluation eval = new Evaluation(10); //create an evaluation object with 10 possible classes
while(mnistTest.hasNext()){
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatures()); //get the network's prediction
    eval.eval(next.getLabels(), output); //check the prediction against the true class
}

Available iterators

MnistDataSetIterator

[source]

MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see http://yann.lecun.com/exdb/mnist/

UciSequenceDataSetIterator

[source]

UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift.

Details: https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
Data: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data
Image: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg

UciSequenceDataSetIterator

public UciSequenceDataSetIterator(int batchSize)

Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123

param batchSize Minibatch size

Cifar10DataSetIterator

[source]

Cifar10DataSetIterator is an iterator for the CIFAR-10 dataset: 10 classes of 32x32 images with 3 channels (RGB).

This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: https://pjreddie.com/projects/cifar-10-dataset-mirror/.

Cifar10DataSetIterator

public Cifar10DataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

param batchSize Minibatch size for the iterator

IrisDataSetIterator

[source]

IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes https://archive.ics.uci.edu/ml/datasets/Iris

IrisDataSetIterator

public IrisDataSetIterator()

next

public DataSet next()

IrisDataSetIterator handles traversing through the Iris Data Set.

see https://archive.ics.uci.edu/ml/datasets/Iris
param batch Batch size
param numExamples Total number of examples

LFWDataSetIterator

[source]

LFW iterator for the Labeled Faces in the Wild dataset: 13233 images total, with 5749 classes. See http://vis-www.cs.umass.edu/lfw/

LFWDataSetIterator

public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
                    PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
                    ImageTransform imageTransform, Random rng)

Create LFW data specific iterator

param batchSize the batch size of the examples
param numExamples the overall number of examples
param imgDim an array of height, width and channels
param numLabels the overall number of labels (classes)
param useSubset use a subset of the LFWDataSet
param labelGenerator path label generator to use
param train true to use the training portion of the split, false to use the test portion
param splitTrainTest the percentage to split data for train and remainder goes to test
param imageTransform how to transform the image
param rng random number generator used to lock in batch shuffling

TinyImageNetDataSetIterator

[source]

Tiny ImageNet is a subset of the ImageNet database. TinyImageNet is the default course challenge for CS231n at Stanford University.

Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.

See: http://cs231n.stanford.edu/ and https://tiny-imagenet.herokuapp.com/

TinyImageNetDataSetIterator

public TinyImageNetDataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

param batchSize Minibatch size for the iterator

EmnistDataSetIterator

[source]

EMNIST DataSetIterator

COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes
MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z
BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)
LETTERS: 145,600 examples total. 26 balanced classes
DIGITS: 280,000 examples total. 10 balanced classes

See: https://www.nist.gov/itl/iad/image-group/emnist-dataset and https://arxiv.org/abs/1702.05373

EmnistDataSetIterator

public EmnistDataSetIterator(Set dataSet, int batch, boolean train) throws IOException

Create an EMNIST iterator with randomly shuffled data. The EMNIST dataset has multiple different subsets. See the EmnistDataSetIterator Javadoc for details.

param dataSet Dataset (subset) to return
param batch Batch size
param train If true: use training set. If false: use test set
param seed Random number generator seed (only for the overload that accepts a seed)

numExamplesTrain

public static int numExamplesTrain(Set dataSet)

Get the number of training examples for the specified subset

param dataSet Subset to get
return Number of examples for the specified subset

numExamplesTest

public static int numExamplesTest(Set dataSet)

Get the number of test examples for the specified subset

param dataSet Subset to get
return Number of examples for the specified subset

numLabels

public static int numLabels(Set dataSet)

Get the number of labels for the specified subset

param dataSet Subset to get
return Number of labels for the specified subset

isBalanced

public static boolean isBalanced(Set dataSet)

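Are the labels balanced in the training set (that is: is the number of examples the same for each label)?

param dataSet Subset to check
return true if the subset is balanced; false otherwise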

RecordReaderDataSetIterator

[source]

Record reader dataset iterator. Takes a RecordReader as input and handles the conversion to DataSet objects, as well as producing minibatches from individual records.

Example 1: Image classification, batch size 32, 10 classes

RecordReader rr = new ImageRecordReader(28, 28, 3); //28x28 RGB images (sizes are illustrative)
rr.initialize(new FileSplit(new File("/path/to/directory")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
    //Label index (first arg): Always value 1 when using ImageRecordReader. For CSV etc: use index of the column
    // that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
    // at 0. Number of classes (second arg): number of label classes (i.e., 10 for MNIST - 10 digits)
    .classification(1, nClasses)
    .preProcessor(new ImagePreProcessingScaler())      //For normalization of image values 0-255 to 0-1
    .build();

Example 2: Multi-output regression from CSV, batch size 128

RecordReader rr = new CSVRecordReader(0, ','); //Skip 0 header lines, comma separated
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
    //Specify the columns that the regression labels/targets appear in. Note that all other columns will be
    // treated as features. Column indexes start at 0
    .regression(labelColFrom, labelColTo)
    .build();

RecordReaderDataSetIterator

public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize)

Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels(). Note that if RecordReader.getLabels() returns null, no output labels will be produced.

param recordReader Record reader to use as the source of data
param batchSize Minibatch size, for each call of .next()

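RecordReaderDataSetIterator

public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndex, int numPossibleLabels)

Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.

param recordReader RecordReader: provides the source of the data
param batchSize Batch size (number of examples) for the output DataSet objects
param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()
param numPossibleLabels Number of classes (possible labels) for classification

setCollectMetaData

public void setCollectMetaData(boolean collectMetaData)

When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

param collectMetaData Whether metadata should be collected or not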

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return DataSet with the specified example
throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a DataSet, using the provided RecordMetaData instances.

param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor
return DataSet with the specified examples
throws IOException If an error occurs during loading of the data

writableConverter

public Builder writableConverter(WritableConverter converter)

Builder class for RecordReaderDataSetIterator

maxNumBatches

public Builder maxNumBatches(int maxNumBatches)

Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.

param maxNumBatches Maximum number of minibatches per epoch / reset

regression

public Builder regression(int labelIndex)

Use this for single output regression (i.e., 1 output/regression target)

param labelIndex Column index that contains the regression target (indexes start at 0)

regression

public Builder regression(int labelIndexFrom, int labelIndexTo)

Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)

param labelIndexFrom Column index of the first regression target (indexes start at 0)
param labelIndexTo Column index of the last regression target (inclusive)
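As a rough illustration of the contiguous, inclusive label range described above, the following plain-Java sketch divides one numeric record into a features array and a labels array. It does not use DL4J; the class and method names are hypothetical, and the real iterator works on Writable records and INDArrays rather than double arrays.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DL4J: shows how an inclusive label range
// [labelIndexFrom, labelIndexTo] divides one record into features and labels.
class RegressionSplitSketch {
    // Returns {features, labels}; all columns outside the label range are features
    static double[][] split(double[] row, int labelIndexFrom, int labelIndexTo) {
        if (labelIndexFrom < 0 || labelIndexTo >= row.length || labelIndexFrom > labelIndexTo) {
            throw new IllegalArgumentException("Invalid label range");
        }
        int numLabels = labelIndexTo - labelIndexFrom + 1;
        double[] labels = Arrays.copyOfRange(row, labelIndexFrom, labelIndexTo + 1);
        double[] features = new double[row.length - numLabels];
        int f = 0;
        for (int i = 0; i < row.length; i++) {
            if (i < labelIndexFrom || i > labelIndexTo) {
                features[f++] = row[i];
            }
        }
        return new double[][] {features, labels};
    }

    public static void main(String[] args) {
        double[] row = {0.1, 0.2, 10.0, 20.0, 0.3};
        double[][] parts = split(row, 2, 3); // columns 2 and 3 are the regression targets
        System.out.println("features = " + Arrays.toString(parts[0])); // [0.1, 0.2, 0.3]
        System.out.println("labels   = " + Arrays.toString(parts[1])); // [10.0, 20.0]
    }
}
```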

classification

public Builder classification(int labelIndex, int numClasses)

Use this for classification

param labelIndex Index of the column that contains the label (column indexes start from 0). The column must hold integer values from 0 to numClasses-1
param numClasses Number of label classes (i.e., number of categories/classes in the dataset)
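For readers unfamiliar with the label representation used in classification, the sketch below shows a one-hot conversion of an integer class index in plain Java. It is an illustration only: the class name is hypothetical, and the iterator itself produces INDArrays rather than float arrays.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of DL4J: converts an integer class
// index (0 to numClasses-1) into a one-hot label vector.
class OneHotSketch {
    static float[] oneHot(int label, int numClasses) {
        if (label < 0 || label >= numClasses) {
            throw new IllegalArgumentException(
                    "Label " + label + " is outside 0.." + (numClasses - 1));
        }
        float[] encoded = new float[numClasses]; // all zeros by default
        encoded[label] = 1.0f;
        return encoded;
    }

    public static void main(String[] args) {
        // Class 2 out of 4 classes
        System.out.println(Arrays.toString(oneHot(2, 4))); // [0.0, 0.0, 1.0, 0.0]
    }
}
```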

preProcessor

public Builder preProcessor(DataSetPreProcessor preProcessor)

Optional arg. Allows the preprocessor to be set

param preProcessor Preprocessor to use

collectMetaData

public Builder collectMetaData(boolean collectMetaData)

When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

param collectMetaData Whether metadata should be collected or not

RecordReaderMultiDataSetIterator

[source]

The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator).

RecordReaderMultiDataSetIterator

public RecordReaderMultiDataSetIterator build()

When dealing with time series data of different lengths, how should we align the input/labels time series? For equal length: use EQUAL_LENGTH. For sequence classification: use ALIGN_END.
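To give an intuition for end alignment in sequence classification, the plain-Java sketch below builds a label mask for a single example whose label sequence is shorter than its feature sequence. It assumes that padded time steps are marked 0 and real label time steps are marked 1, with the labels placed at the end of the sequence; the class and method names are hypothetical and not part of DL4J.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of DL4J: builds a per-time-step label
// mask for one example when labels are aligned with the END of the features.
class AlignEndSketch {
    static int[] labelMask(int featureLength, int labelLength) {
        if (labelLength < 0 || labelLength > featureLength) {
            throw new IllegalArgumentException("Label sequence must fit inside the feature sequence");
        }
        int[] mask = new int[featureLength]; // 0 = padding (no label at this step)
        for (int t = featureLength - labelLength; t < featureLength; t++) {
            mask[t] = 1; // 1 = a real label is present at this step
        }
        return mask;
    }

    public static void main(String[] args) {
        // Sequence classification: 5 input steps, a single label at the last step
        System.out.println(Arrays.toString(labelMask(5, 1))); // [0, 0, 0, 0, 1]
    }
}
```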

loadFromMetaData

public MultiDataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a MultiDataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return MultiDataSet with the specified example
throws IOException If an error occurs during loading of the data

loadFromMetaData

public MultiDataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a MultiDataSet, using the provided RecordMetaData instances.

param list List of RecordMetaData instances to load from. Should have been produced by the record reader(s) provided to this iterator
return MultiDataSet with the specified examples
throws IOException If an error occurs during loading of the data

SequenceRecordReaderDataSetIterator

[source]

Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels) generate time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with different number of inputs vs. labels).

SequenceRecordReaderDataSetIterator

public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader, SequenceRecordReader labels,
                    int miniBatchSize, int numPossibleLabels)

Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.

param featuresReader SequenceRecordReader for the features
param labels Labels: assume single value per time step, where values are integers in the range 0 to numPossibleLabels-1
param miniBatchSize Minibatch size for each call of next()
param numPossibleLabels Number of classes for the labels

hasNext

public boolean hasNext()

Returns true if the iteration has more elements.

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return DataSet with the specified example
throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor
return DataSet with the specified examples
throws IOException If an error occurs during loading of the data

AsyncMultiDataSetIterator

[source]

Async prefetching iterator wrapper for MultiDataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network.

next

public MultiDataSet next(int num)

We want to ensure that the background thread has the same thread-to-device affinity as the master thread.

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

We want to ensure that the background thread has the same thread-to-device affinity as the master thread.

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

return true if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

IteratorDataSetIterator

[source]

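A DataSetIterator that works on an Iterator<DataSet>, combining and splitting the input DataSet objects as required to get the specified batch size.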

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

AsyncDataSetIterator

[source]

Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network.
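The general idea of asynchronous prefetching can be sketched in plain Java with a bounded queue and a background thread. This is a simplified illustration, not the DL4J implementation: the class name is hypothetical, workspaces and device affinity are ignored, null elements are not supported, and error handling in the producer thread is omitted.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration, not the DL4J class: a background thread fills a
// bounded queue while the consumer works on previously fetched elements.
class PrefetchSketch<T> implements Iterator<T> {
    private static final Object END = new Object(); // marks the end of the data
    private final BlockingQueue<Object> queue;
    private Object next; // element fetched by hasNext() but not yet returned

    PrefetchSketch(Iterator<T> base, int queueSize) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        Thread producer = new Thread(() -> {
            try {
                while (base.hasNext()) {
                    queue.put(base.next()); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take(); // blocks until the producer supplies something
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                next = END;
            }
        }
        return next != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = (T) next;
        next = null;
        return result;
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new PrefetchSketch<>(Arrays.asList(1, 2, 3).iterator(), 2);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The queue size bounds how far the producer can run ahead of the consumer, which is the same role the queue size plays for the prefetching iterators described here.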

AsyncDataSetIterator

public AsyncDataSetIterator(DataSetIterator baseIterator)

Create an Async iterator with the default queue size of 8

param baseIterator Underlying iterator to wrap and fetch asynchronously from

next

public DataSet next(int num)

Create an Async iterator with the default queue size of 8

param iterator Underlying iterator to wrap and fetch asynchronously from
param queue Queue size - number of iterators to

inputColumns

public int inputColumns()

Input columns for the dataset

return Number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return Number of labels

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

We want to ensure that the background thread has the same thread-to-device affinity as the master thread.

batch

public int batch()

Batch size

return Batch size

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns the preprocessor, if defined

return The preprocessor, if one has been set

hasNext

public boolean hasNext()

Returns true if the iteration has more elements.

next

public DataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

DoublesDataSetIterator

[source]

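A simple utility iterator for creating a DataSetIterator from an Iterable<Pair<double[], double[]>>. The first value in each pair is the features vector; the second value is the labels. Supports generating 2D features/labels only.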

DoublesDataSetIterator

public DoublesDataSetIterator(@NonNull Iterable<Pair<double[], double[]>> iterable, int batchSize)

param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects
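
The batching idea can be sketched in plain Java without any DL4J types. This is only an illustration, not the library's implementation: the class `PairBatchingSketch`, its `Batch` holder, and the use of a two-element `double[][]` in place of `Pair<double[], double[]>` are all hypothetical, and the sketch assumes the final batch may be smaller than `batchSize`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (not DL4J code): group (features, labels) pairs into batches.
final class PairBatchingSketch {

    /** One batch: a 2d features matrix and a 2d labels matrix with matching row counts. */
    static final class Batch {
        final double[][] features;
        final double[][] labels;

        Batch(double[][] features, double[][] labels) {
            this.features = features;
            this.labels = labels;
        }
    }

    /** Each element of pairs is {features, labels}, a hypothetical stand-in for a Pair. */
    static List<Batch> toBatches(List<double[][]> pairs, int batchSize) {
        List<Batch> batches = new ArrayList<>();
        for (int start = 0; start < pairs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, pairs.size());
            double[][] f = new double[end - start][];
            double[][] l = new double[end - start][];
            for (int i = start; i < end; i++) {
                f[i - start] = pairs.get(i)[0]; // features vector becomes one row
                l[i - start] = pairs.get(i)[1]; // labels vector becomes one row
            }
            batches.add(new Batch(f, l));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<double[][]> pairs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            pairs.add(new double[][] {{i, i + 0.5}, {i % 2}});
        }
        for (Batch b : toBatches(pairs, 2)) {
            System.out.println(Arrays.deepToString(b.features) + " -> " + Arrays.deepToString(b.labels));
        }
    }
}
```

With five pairs and a batch size of 2, the sketch produces two full batches and one batch holding the single remaining example.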

IteratorMultiDataSetIterator

[source]

Works on an Iterator<MultiDataSet>, combining and splitting the input MultiDataSet objects as required to get a specified batch size.

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

SamplingDataSetIterator

[source]

A wrapper for a dataset to sample from. This will randomly sample from the given dataset.

SamplingDataSetIterator

public SamplingDataSetIterator(DataSet sampleFrom, int batchSize, int totalNumberSamples)

INDArrayDataSetIterator

[source]

First value in pair is the features vector, second value in pair is the labels.

INDArrayDataSetIterator

public INDArrayDataSetIterator(@NonNull Iterable<Pair<INDArray, INDArray>> iterable, int batchSize)

param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects

WorkspacesShieldDataSetIterator

[source]

This iterator detaches/migrates DataSets coming out of the backing DataSetIterator, thus providing “safe” DataSets. It is typically used for debugging and testing purposes, and should not generally be used by users.

WorkspacesShieldDataSetIterator

public WorkspacesShieldDataSetIterator(@NonNull DataSetIterator iterator)

param iterator The underlying iterator to detach values from

MultiDataSetIteratorSplitter

[source]

This iterator virtually splits the given MultiDataSetIterator into Train and Test parts. For example, suppose you have 100000 examples and a batch size of 32, which gives 3125 total batches. A split ratio of 0.7 then yields 2187 training batches and 938 test batches.

PLEASE NOTE: You can’t use the Test iterator twice in a row. The Train iterator must be used before each use of the Test iterator.

PLEASE NOTE: You can’t use this iterator if the underlying iterator uses randomization/shuffle between epochs.
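
The batch counts in the example above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the library's code: the class and method names are hypothetical, and it assumes the number of training batches is obtained by truncating `totalBatches * ratio`, which matches the 2187/938 figures quoted above.

```java
// Plain-Java sketch of the batch-count arithmetic; not the DL4J implementation.
final class SplitArithmeticSketch {

    /** Number of training batches, assuming the fractional part is truncated. */
    static long trainBatches(long totalBatches, double ratio) {
        if (ratio <= 0.0 || ratio >= 1.0) {
            throw new IllegalArgumentException("ratio must be in the open range (0.0, 1.0)");
        }
        return (long) (totalBatches * ratio);
    }

    /** The remaining batches make up the test part. */
    static long testBatches(long totalBatches, double ratio) {
        return totalBatches - trainBatches(totalBatches, ratio);
    }

    public static void main(String[] args) {
        long totalBatches = 100_000 / 32;                   // 3125
        System.out.println(trainBatches(totalBatches, 0.7)); // 2187
        System.out.println(testBatches(totalBatches, 0.7));  // 938
    }
}
```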

MultiDataSetIteratorSplitter

public MultiDataSetIteratorSplitter(@NonNull MultiDataSetIterator baseIterator, long totalBatches, double ratio)

param baseIterator - underlying iterator to be wrapped and split
param totalBatches - total number of batches in the underlying iterator. This value is used to determine the number of train/test batches
param ratio - split ratio. Must be in the range 0.0 < X < 1.0. For example, if 0.7 is provided, then 70% of the total examples are used for training and 30% for testing

getTrainIterator

public MultiDataSetIterator getTrainIterator()

This method returns train iterator instance

return the train iterator

next

public MultiDataSet next(int num)

This method returns test iterator instance

return

AsyncShieldDataSetIterator

[source]

This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetch. It is mainly used for debugging purposes, typically with an iterator that isn’t safe to asynchronously prefetch from.

AsyncShieldDataSetIterator

public AsyncShieldDataSetIterator(@NonNull DataSetIterator iterator)

param iterator Iterator to wrap, to disable asynchronous prefetching for

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

param num the number of examples
return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

return the number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return the number of labels

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

return the batch size

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns the preprocessor, if defined

return the preprocessor, if defined

hasNext

public boolean hasNext()

Returns true if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the `remove` operation is not supported by this iterator
throws IllegalStateException if the `next` method has not yet been called, or the `remove` method has already been called after the last call to the `next` method
implSpec The default implementation throws an instance of `UnsupportedOperationException` and performs no other action.

DummyBlockDataSetIterator

[source]

This class provides baseline implementation of BlockDataSetIterator interface

BaseDatasetIterator

[source]

Baseline implementation includes control over the data fetcher and some basic getters for metadata

AsyncShieldMultiDataSetIterator

[source]

This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch

next

public MultiDataSet next(int num)

Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples

param num Number of examples to fetch

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns `true` if the iteration has more elements. (In other words, returns `true` if `next` would return an element rather than throwing an exception.)

return `true` if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the `remove` operation is not supported by this iterator
throws IllegalStateException if the `next` method has not yet been called, or the `remove` method has already been called after the last call to the `next` method
implSpec The default implementation throws an instance of `UnsupportedOperationException` and performs no other action.

RandomMultiDataSetIterator

[source]

RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomMultiDataSetIterator

public RandomMultiDataSetIterator(int numMiniBatches, @NonNull List<Triple<long[], Character, Values>> features, @NonNull List<Triple<long[], Character, Values>> labels)

param numMiniBatches Number of minibatches per epoch
param features Each triple in the list specifies the shape, array order and type of values for the features arrays
param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays

addFeatures

public Builder addFeatures(long[] shape, Values values)

Add a new features array to the iterator

param shape Shape of the features
param values Values to fill the array with

addFeatures

public Builder addFeatures(long[] shape, char order, Values values)

Add a new features array to the iterator

param shape Shape of the features
param order Order (‘c’ or ‘f’) for the array
param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, Values values)

Add a new labels array to the iterator

param shape Shape of the labels
param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, char order, Values values)

Add a new labels array to the iterator

param shape Shape of the labels
param order Order (‘c’ or ‘f’) for the array
param values Values to fill the array with

generate

public static INDArray generate(long[] shape, Values values)

Generate a random array with the specified shape

param shape Shape of the array
param values Values to fill the array with
return Random array of specified shape + contents

generate

public static INDArray generate(long[] shape, char order, Values values)

Generate a random array with the specified shape and order

param shape Shape of the array
param order Order of array (‘c’ or ‘f’)
param values Values to fill the array with
return Random array of specified shape + contents

EarlyTerminationMultiDataSetIterator

[source]

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to the specified number of minibatches.

EarlyTerminationMultiDataSetIterator

public EarlyTerminationMultiDataSetIterator(MultiDataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

param underlyingIterator Iterator to wrap
param terminationPoint Number of minibatches after which hasNext() will return false
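
The early-termination behaviour can be illustrated with a generic wrapper over `java.util.Iterator`. This is a sketch of the idea only: `EarlyTerminationSketch` and `countRemaining` are hypothetical names, and the real class wraps a MultiDataSetIterator rather than a plain `Iterator`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Plain-Java sketch: stop iterating after a fixed number of returned elements.
final class EarlyTerminationSketch<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private final int terminationPoint;
    private int returned = 0; // number of elements handed out so far

    EarlyTerminationSketch(Iterator<T> underlying, int terminationPoint) {
        if (terminationPoint <= 0) {
            throw new IllegalArgumentException("terminationPoint must be > 0");
        }
        this.underlying = underlying;
        this.terminationPoint = terminationPoint;
    }

    @Override
    public boolean hasNext() {
        // False once the limit is reached, even if the wrapped iterator has more data
        return returned < terminationPoint && underlying.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        returned++;
        return underlying.next();
    }

    /** Helper for the demo: drains an iterator and counts its elements. */
    static int countRemaining(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterator<Integer> limited =
                new EarlyTerminationSketch<>(Arrays.asList(1, 2, 3, 4, 5).iterator(), 3);
        System.out.println(countRemaining(limited)); // 3
    }
}
```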

ExistingDataSetIterator

[source]

ExistingDataSetIterator

public ExistingDataSetIterator(@NonNull Iterator<DataSet> iterator)

Note that when using this constructor, resetting is not supported

param iterator Iterator to wrap

next

public DataSet next(int num)

Note that when using this constructor, resetting is not supported

param iterator Iterator to wrap
param labels String labels. May be null.

DummyBlockMultiDataSetIterator

[source]

This class provides baseline implementation of BlockMultiDataSetIterator interface

EarlyTerminationDataSetIterator

[source]

EarlyTerminationDataSetIterator

public EarlyTerminationDataSetIterator(DataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

param underlyingIterator Iterator to wrap
param terminationPoint Number of minibatches after which hasNext() will return false

ReconstructionDataSetIterator

[source]

Wraps a data set iterator setting the first (feature matrix) as the labels.

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

param num the number of examples
return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

return the number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return the number of labels

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

return the batch size

hasNext

public boolean hasNext()

Returns `true` if the iteration has more elements. (In other words, returns `true` if `next` would return an element rather than throwing an exception.)

return `true` if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the `remove` operation is not supported by this iterator
throws IllegalStateException if the `next` method has not yet been called, or the `remove` method has already been called after the last call to the `next` method

DataSetIteratorSplitter

[source]

DataSetIteratorSplitter

public DataSetIteratorSplitter(@NonNull DataSetIterator baseIterator, long totalBatches, double ratio)

The only constructor

param baseIterator - iterator to be wrapped and split
param totalBatches - total batches in baseIterator
param ratio - train/test split ratio

getTrainIterator

public DataSetIterator getTrainIterator()

This method returns train iterator instance

return the train iterator

next

public DataSet next(int i)

This method returns test iterator instance

return

JointMultiDataSetIterator

[source]

This dataset iterator combines multiple DataSetIterators into 1 MultiDataSetIterator. Values from each iterator are joined on a per-example basis, i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come either from one of the underlying DataSetIterators only (if ‘outcome’ is >= 0) or from all iterators (if outcome is < 0).

JointMultiDataSetIterator

public JointMultiDataSetIterator(DataSetIterator... iterators)

param iterators Underlying iterators to wrap

next

public MultiDataSet next(int num)

param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet
param iterators Underlying iterators to wrap

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

param preProcessor MultiDataSetPreProcessor. May be null.

getPreProcessor

public MultiDataSetPreProcessor getPreProcessor()

Get the `MultiDataSetPreProcessor`, if one has previously been set. Returns null if no preprocessor has been set

return Preprocessor

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators:

(a) Iterators that store their full contents in memory already
(b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents)
(c) Iterators that already implement some level of asynchronous prefetching
(d) Iterators that may return different data depending on when the next() method is called

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns `true` if the iteration has more elements. (In other words, returns `true` if `next` would return an element rather than throwing an exception.)

return `true` if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

PLEASE NOTE: This method is NOT implemented

throws UnsupportedOperationException if the `remove` operation is not supported by this iterator
throws IllegalStateException if the `next` method has not yet been called, or the `remove` method has already been called after the last call to the `next` method
implSpec The default implementation throws an instance of `UnsupportedOperationException` and performs no other action.

FloatsDataSetIterator

[source]

First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

FloatsDataSetIterator

public FloatsDataSetIterator(@NonNull Iterable<Pair<float[], float[]>> iterable, int batchSize)

param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects

FileSplitDataSetIterator

[source]

Simple iterator working with list of files. File to DataSet conversion will be handled via provided FileCallback implementation

FileSplitDataSetIterator

public FileSplitDataSetIterator(@NonNull List<File> files, @NonNull FileCallback callback)

param files List of files to iterate over
param callback Callback for loading the files

MultipleEpochsIterator

[source]

A dataset iterator for doing multiple passes over a dataset

Use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

param num the number of examples
return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

return the number of input columns

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return the number of labels

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

return the batch size

hasNext

public boolean hasNext()

Returns `true` if the iteration has more elements. (In other words, returns `true` if `next` would return an element rather than throwing an exception.)

return `true` if the iteration has more elements

remove

public void remove()

throws UnsupportedOperationException if the `remove` operation is not supported by this iterator
throws IllegalStateException if the `next` method has not yet been called, or the `remove` method has already been called after the last call to the `next` method

MultiDataSetWrapperIterator

[source]

This class is a simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly

PLEASE NOTE: This only works if number of features/labels/masks is 1

MultiDataSetWrapperIterator

public MultiDataSetWrapperIterator(MultiDataSetIterator iterator)

param iterator Underlying iterator to wrap

RandomDataSetIterator

[source]

RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomDataSetIterator

public RandomDataSetIterator(int numMiniBatches, long[] featuresShape, long[] labelsShape, Values featureValues, Values labelValues)

param numMiniBatches Number of minibatches per epoch
param featuresShape Features shape
param labelsShape Labels shape
param featureValues Type of values for the features
param labelValues Type of values for the labels

MultiDataSetIteratorAdapter

[source]

Iterator that adapts a DataSetIterator to a MultiDataSetIterator

Math

ClipByAvgNorm

INDArray ClipByAvgNorm(INDArray x, double clipValue, int[] dimensions)

SDVariable ClipByAvgNorm(SDVariable x, double clipValue, int[] dimensions)
SDVariable ClipByAvgNorm(String name, SDVariable x, double clipValue, int[] dimensions)

Clips tensor values to a maximum average L2-norm.

x (NUMERIC) - Input variable
clipValue - Value for clipping
dimensions - Dimensions to reduce over (Size: AtLeast(min=0))

EmbeddingLookup

INDArray EmbeddingLookup(INDArray x, INDArray indices, PartitionMode PartitionMode)

SDVariable EmbeddingLookup(SDVariable x, SDVariable indices, PartitionMode PartitionMode)
SDVariable EmbeddingLookup(String name, SDVariable x, SDVariable indices, PartitionMode PartitionMode)

Looks up ids in a list of embedding tensors.

x (NUMERIC) - Input tensor
indices (INT) - A Tensor containing the ids to be looked up.
PartitionMode - Partition mode: 0 for 'mod', 1 for 'div'

MergeMaxIndex

INDArray MergeMaxIndex(INDArray x, DataType dataType)
INDArray MergeMaxIndex(INDArray x)

SDVariable MergeMaxIndex(SDVariable x, DataType dataType)
SDVariable MergeMaxIndex(SDVariable x)
SDVariable MergeMaxIndex(String name, SDVariable x, DataType dataType)
SDVariable MergeMaxIndex(String name, SDVariable x)

Returns an array of the indices of the max elements along tensor dimensions

x (NUMERIC) - Input tensor
dataType - Data type - default = DataType.INT

abs

INDArray abs(INDArray x)

SDVariable abs(SDVariable x)
SDVariable abs(String name, SDVariable x)

Elementwise absolute value operation: out = abs(x)

x (NUMERIC) - Input variable

acos

INDArray acos(INDArray x)

SDVariable acos(SDVariable x)
SDVariable acos(String name, SDVariable x)

Elementwise acos (arccosine, inverse cosine) operation: out = arccos(x)

x (NUMERIC) - Input variable

acosh

INDArray acosh(INDArray x)

SDVariable acosh(SDVariable x)
SDVariable acosh(String name, SDVariable x)

Elementwise acosh (inverse hyperbolic cosine) function: out = acosh(x)

x (NUMERIC) - Input variable

add

INDArray add(INDArray x, INDArray y)

SDVariable add(SDVariable x, SDVariable y)
SDVariable add(String name, SDVariable x, SDVariable y)

Pairwise addition operation, out = x + y

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable
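
The broadcasting example above ([1,10] plus [5,10] giving [5,10]) can be sketched with plain Java arrays. This illustration covers only that one case, a single row added to every row of a matrix. The class and method names are hypothetical and nothing here is ND4J code.

```java
import java.util.Arrays;

// Plain-Java sketch of one broadcasting case: shape [1, n] + shape [m, n] -> shape [m, n].
final class BroadcastAddSketch {

    static double[][] add(double[][] x, double[][] y) {
        if (x.length != 1 || y.length == 0 || x[0].length != y[0].length) {
            throw new IllegalArgumentException("expected x of shape [1, n] and y of shape [m, n]");
        }
        int rows = y.length;
        int cols = y[0].length;
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = x[0][c] + y[r][c]; // the single row of x is reused for every row of y
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2, 3}};
        double[][] y = {{10, 20, 30}, {40, 50, 60}};
        System.out.println(Arrays.deepToString(add(x, y)));
    }
}
```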

add

INDArray add(INDArray x, double value)

SDVariable add(SDVariable x, double value)
SDVariable add(String name, SDVariable x, double value)

Scalar add operation, out = in + scalar

x (NUMERIC) - Input variable
value - Scalar value for op

amax

INDArray amax(INDArray in, int[] dimensions)

SDVariable amax(SDVariable in, int[] dimensions)
SDVariable amax(String name, SDVariable in, int[] dimensions)

Absolute max array reduction operation, optionally along specified dimensions: out = max(abs(x))

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

amean

INDArray amean(INDArray in, int[] dimensions)

SDVariable amean(SDVariable in, int[] dimensions)
SDVariable amean(String name, SDVariable in, int[] dimensions)

Absolute mean array reduction operation, optionally along specified dimensions: out = mean(abs(x))

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

amin

INDArray amin(INDArray in, int[] dimensions)

SDVariable amin(SDVariable in, int[] dimensions)
SDVariable amin(String name, SDVariable in, int[] dimensions)

Absolute min array reduction operation, optionally along specified dimensions: out = min(abs(x))

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

and

INDArray and(INDArray x, INDArray y)

SDVariable and(SDVariable x, SDVariable y)
SDVariable and(String name, SDVariable x, SDVariable y)

Boolean AND operation: elementwise (x != 0) && (y != 0)

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

Returns an array with values 1 where condition is satisfied, or value 0 otherwise.

x (BOOL) - Input 1
y (BOOL) - Input 2

asin

INDArray asin(INDArray x)

SDVariable asin(SDVariable x)
SDVariable asin(String name, SDVariable x)

Elementwise asin (arcsin, inverse sine) operation: out = arcsin(x)

x (NUMERIC) - Input variable

asinh

INDArray asinh(INDArray x)

SDVariable asinh(SDVariable x)
SDVariable asinh(String name, SDVariable x)

Elementwise asinh (inverse hyperbolic sine) function: out = asinh(x)

x (NUMERIC) - Input variable

asum

INDArray asum(INDArray in, int[] dimensions)

SDVariable asum(SDVariable in, int[] dimensions)
SDVariable asum(String name, SDVariable in, int[] dimensions)

Absolute sum array reduction operation, optionally along specified dimensions: out = sum(abs(x))

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
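
The absolute-value reductions in this section (amax, amean, amin, asum) differ only in how the absolute values are combined. The sketch below shows the full-array case on a 1D `double[]`. It is plain Java with hypothetical names, assumes a non-empty input, and does not cover reduction along selected dimensions.

```java
// Plain-Java sketch of full-array absolute reductions; not ND4J code.
final class AbsReductionSketch {

    /** out = sum(abs(x)) */
    static double asum(double[] x) {
        double s = 0.0;
        for (double v : x) {
            s += Math.abs(v);
        }
        return s;
    }

    /** out = mean(abs(x)); assumes x is non-empty */
    static double amean(double[] x) {
        return asum(x) / x.length;
    }

    /** out = max(abs(x)) */
    static double amax(double[] x) {
        double m = 0.0;
        for (double v : x) {
            m = Math.max(m, Math.abs(v));
        }
        return m;
    }

    /** out = min(abs(x)); assumes x is non-empty */
    static double amin(double[] x) {
        double m = Double.POSITIVE_INFINITY;
        for (double v : x) {
            m = Math.min(m, Math.abs(v));
        }
        return m;
    }

    public static void main(String[] args) {
        double[] x = {-3, 1, -2, 4};
        System.out.println(amax(x));  // 4.0
        System.out.println(amean(x)); // 2.5
        System.out.println(amin(x));  // 1.0
        System.out.println(asum(x));  // 10.0
    }
}
```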

atan

INDArray atan(INDArray x)

SDVariable atan(SDVariable x)
SDVariable atan(String name, SDVariable x)

Elementwise atan (arctangent, inverse tangent) operation: out = arctangent(x)

x (NUMERIC) - Input variable

atan2

INDArray atan2(INDArray y, INDArray x)

SDVariable atan2(SDVariable y, SDVariable x)
SDVariable atan2(String name, SDVariable y, SDVariable x)

Elementwise atan2 (two-argument arctangent) operation: out = atan2(y, x).

Similar to atan(y/x), but the signs of x and y are used to determine the quadrant of the result

y (NUMERIC) - Input Y variable
x (NUMERIC) - Input X variable
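
The difference between atan2 and atan(y/x) shows up when x is negative. The sketch below uses the standard `java.lang.Math` functions on scalars to illustrate it. The wrapper class and its method names are hypothetical, and the elementwise array version is not shown.

```java
// Scalar illustration of atan2 versus atan(y/x) using java.lang.Math.
final class Atan2Sketch {

    /** Angle of the point (x, y), taking the signs of both arguments into account. */
    static double angle(double y, double x) {
        return Math.atan2(y, x);
    }

    /** Naive version: the sign information of x and y is lost in the division. */
    static double naiveAngle(double y, double x) {
        return Math.atan(y / x);
    }

    public static void main(String[] args) {
        // The point (-1, 1) lies in the second quadrant
        System.out.println(angle(1.0, -1.0));      // approximately  3*pi/4
        System.out.println(naiveAngle(1.0, -1.0)); // approximately -pi/4
    }
}
```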

atanh

INDArray atanh(INDArray x)

SDVariable atanh(SDVariable x)
SDVariable atanh(String name, SDVariable x)

Elementwise atanh (inverse hyperbolic tangent) function: out = atanh(x)

x (NUMERIC) - Input variable

bitShift

INDArray bitShift(INDArray x, INDArray shift)

SDVariable bitShift(SDVariable x, SDVariable shift)
SDVariable bitShift(String name, SDVariable x, SDVariable shift)

Bit shift operation

x (NUMERIC) - input
shift (NUMERIC) - shift value

bitShiftRight

INDArray bitShiftRight(INDArray x, INDArray shift)

SDVariable bitShiftRight(SDVariable x, SDVariable shift)
SDVariable bitShiftRight(String name, SDVariable x, SDVariable shift)

Right bit shift operation

x (NUMERIC) - Input tensor
shift (NUMERIC) - shift argument

bitShiftRotl

INDArray bitShiftRotl(INDArray x, INDArray shift)

SDVariable bitShiftRotl(SDVariable x, SDVariable shift)
SDVariable bitShiftRotl(String name, SDVariable x, SDVariable shift)

Cyclic bit shift operation

x (NUMERIC) - Input tensor
shift (NUMERIC) - Shift argument

bitShiftRotr

INDArray bitShiftRotr(INDArray x, INDArray shift)

SDVariable bitShiftRotr(SDVariable x, SDVariable shift)
SDVariable bitShiftRotr(String name, SDVariable x, SDVariable shift)

Cyclic right shift operation

x (NUMERIC) - Input tensor
shift (NUMERIC) - Shift argument
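
A cyclic shift differs from a plain shift in that bits pushed out at one end re-enter at the other. The sketch below illustrates this elementwise on `int[]` arrays with the standard `Integer.rotateLeft` and `Integer.rotateRight` methods. It assumes 32-bit integers and equal-length inputs, the class and method names are hypothetical, and it is not the ND4J implementation.

```java
// Plain-Java sketch of elementwise cyclic bit shifts on 32-bit integers.
final class CyclicShiftSketch {

    /** Cyclic left shift of each element by the matching shift amount. */
    static int[] rotl(int[] x, int[] shift) {
        int[] out = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Integer.rotateLeft(x[i], shift[i]);
        }
        return out;
    }

    /** Cyclic right shift of each element by the matching shift amount. */
    static int[] rotr(int[] x, int[] shift) {
        int[] out = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Integer.rotateRight(x[i], shift[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] x = {0x80000001};
        int[] one = {1};
        // The top bit wraps around to the bottom instead of being dropped
        System.out.println(Integer.toHexString(rotl(x, one)[0]));             // 3
        System.out.println(Integer.toHexString(x[0] << 1));                   // 2
        System.out.println(Integer.toHexString(rotr(new int[] {3}, one)[0])); // 80000001
    }
}
```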

ceil

INDArray ceil(INDArray x)

SDVariable ceil(SDVariable x)
SDVariable ceil(String name, SDVariable x)

Element-wise ceiling function: out = ceil(x).

Rounds each value up to the nearest integer value (if not already an integer)

x (NUMERIC) - Input variable

clipByNorm

INDArray clipByNorm(INDArray x, double clipValue, int[] dimensions)

SDVariable clipByNorm(SDVariable x, double clipValue, int[] dimensions)
SDVariable clipByNorm(String name, SDVariable x, double clipValue, int[] dimensions)

Clipping by L2 norm, optionally along dimension(s)

if l2Norm(x, dimensions) < clipValue, then the input is returned unmodified

Otherwise, out[i] = in[i] * clipValue / l2Norm(in, dimensions), where each value is clipped according to the corresponding l2Norm along the specified dimensions

x (NUMERIC) - Input variable
clipValue - Clipping value (maximum l2 norm)
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
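
The clipping rule above can be written out for the full-array case (no dimensions specified) on a 1D `double[]`. This is a plain-Java sketch with hypothetical names rather than the ND4J op. It additionally returns an all-zero input unchanged to avoid dividing by zero, which is an assumption of the sketch.

```java
import java.util.Arrays;

// Plain-Java sketch of clipping by L2 norm over the whole array.
final class ClipByNormSketch {

    static double l2Norm(double[] x) {
        double sumSquares = 0.0;
        for (double v : x) {
            sumSquares += v * v;
        }
        return Math.sqrt(sumSquares);
    }

    static double[] clipByNorm(double[] x, double clipValue) {
        double norm = l2Norm(x);
        if (norm < clipValue || norm == 0.0) {
            return x.clone(); // already within the limit: returned unmodified
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] * clipValue / norm; // rescale so the L2 norm equals clipValue
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {3.0, 4.0}; // L2 norm is 5
        System.out.println(Arrays.toString(clipByNorm(x, 2.5)));
        System.out.println(Arrays.toString(clipByNorm(x, 10.0)));
    }
}
```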

clipByValue

INDArray clipByValue(INDArray x, double clipValueMin, double clipValueMax)

SDVariable clipByValue(SDVariable x, double clipValueMin, double clipValueMax)
SDVariable clipByValue(String name, SDVariable x, double clipValueMin, double clipValueMax)

Element-wise clipping function:

out[i] = in[i] if in[i] >= clipValueMin and in[i] <= clipValueMax

out[i] = clipValueMin if in[i] < clipValueMin

out[i] = clipValueMax if in[i] > clipValueMax

x (NUMERIC) - Input variable
clipValueMin - Minimum value for clipping
clipValueMax - Maximum value for clipping
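
The three cases above collapse into a single min/max expression per element. A minimal plain-Java sketch on a 1D `double[]` follows. The class and method names are hypothetical and this is not the ND4J implementation.

```java
import java.util.Arrays;

// Plain-Java sketch of element-wise clipping to [clipValueMin, clipValueMax].
final class ClipByValueSketch {

    static double[] clipByValue(double[] x, double clipValueMin, double clipValueMax) {
        if (clipValueMin > clipValueMax) {
            throw new IllegalArgumentException("clipValueMin must not exceed clipValueMax");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            // Values below the minimum become the minimum, values above the maximum become the maximum
            out[i] = Math.max(clipValueMin, Math.min(clipValueMax, x[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(clipByValue(new double[] {-2.0, 0.5, 3.0}, 0.0, 1.0)));
    }
}
```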

confusionMatrix

INDArray confusionMatrix(INDArray labels, INDArray pred, DataType dataType)

SDVariable confusionMatrix(SDVariable labels, SDVariable pred, DataType dataType)
SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, DataType dataType)

Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of which are represented as integer values. This version assumes the number of classes is 1 + max(max(labels), max(pred))

For example, if labels = [0, 1, 1] and predicted = [0, 2, 1] then output is:

[1, 0, 0]

[0, 1, 1]

[0, 0, 0]

labels (NUMERIC) - Labels - 1D array of integer values representing label values
pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels
dataType - Data type
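
The counting rule behind the example above is that row index is the label and column index is the prediction. The plain-Java sketch below infers the number of classes as 1 + max(max(labels), max(pred)), as described. The class name is hypothetical, the `dataType` argument is not modelled, and the code is not the ND4J op.

```java
import java.util.Arrays;

// Plain-Java sketch of an unweighted confusion matrix: rows are labels, columns are predictions.
final class ConfusionMatrixSketch {

    static int[][] confusionMatrix(int[] labels, int[] pred) {
        if (labels.length != pred.length) {
            throw new IllegalArgumentException("labels and pred must have the same length");
        }
        int numClasses = 0;
        for (int i = 0; i < labels.length; i++) {
            numClasses = Math.max(numClasses, 1 + Math.max(labels[i], pred[i]));
        }
        int[][] out = new int[numClasses][numClasses];
        for (int i = 0; i < labels.length; i++) {
            out[labels[i]][pred[i]]++; // one count per (label, prediction) pair
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] m = confusionMatrix(new int[] {0, 1, 1}, new int[] {0, 2, 1});
        System.out.println(Arrays.deepToString(m)); // [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
    }
}
```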

confusionMatrix

INDArray confusionMatrix(INDArray labels, INDArray pred, int numClasses)

SDVariable confusionMatrix(SDVariable labels, SDVariable pred, int numClasses)
SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, int numClasses)

Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of which are represented as integer values.

For example, if labels = [0, 1, 1], predicted = [0, 2, 1], and numClasses=4 then output is:

[1, 0, 0, 0]

[0, 1, 1, 0]

[0, 0, 0, 0]

[0, 0, 0, 0]

labels (NUMERIC) - Labels - 1D array of integer values representing label values
pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels
numClasses - Number of classes

confusionMatrix

INDArray confusionMatrix(INDArray labels, INDArray pred, INDArray weights)

SDVariable confusionMatrix(SDVariable labels, SDVariable pred, SDVariable weights)
SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, SDVariable weights)

Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of which are represented as integer values. This version assumes the number of classes is 1 + max(max(labels), max(pred))

For example, if labels = [0, 1, 1], predicted = [0, 2, 1] and weights = [1, 2, 3] then output is:

[1, 0, 0]

[0, 3, 2]

[0, 0, 0]

labels (NUMERIC) - Labels - 1D array of integer values representing label values
pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels
weights (NUMERIC) - Weights - 1D array of values (may be real/decimal) representing the weight/contribution of each prediction. Must be same length as both labels and predictions arrays

confusionMatrix

INDArray confusionMatrix(INDArray labels, INDArray pred, INDArray weights, int numClasses)

SDVariable confusionMatrix(SDVariable labels, SDVariable pred, SDVariable weights, int numClasses)
SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, SDVariable weights, int numClasses)

Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of which are represented as integer values.

For example, if labels = [0, 1, 1], predicted = [0, 2, 1], numClasses = 4, and weights = [1, 2, 3] then output is:

[1, 0, 0, 0]

[0, 3, 2, 0]

[0, 0, 0, 0]

[0, 0, 0, 0]

labels (NUMERIC) - Labels - 1D array of integer values representing label values
pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels
weights (NUMERIC) - Weights - 1D array of values (may be real/decimal) representing the weight/contribution of each prediction. Must be same length as both labels and predictions arrays
numClasses - Number of classes
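
In the weighted variant each (label, prediction) pair contributes its weight instead of a count of 1. The plain-Java sketch below takes an explicit `numClasses`. The class name is hypothetical, weights are modelled as `double`, and the code is an illustration rather than the ND4J op.

```java
import java.util.Arrays;

// Plain-Java sketch of a weighted confusion matrix with an explicit number of classes.
final class WeightedConfusionMatrixSketch {

    static double[][] confusionMatrix(int[] labels, int[] pred, double[] weights, int numClasses) {
        if (labels.length != pred.length || labels.length != weights.length) {
            throw new IllegalArgumentException("labels, pred and weights must have the same length");
        }
        double[][] out = new double[numClasses][numClasses];
        for (int i = 0; i < labels.length; i++) {
            out[labels[i]][pred[i]] += weights[i]; // row = label, column = prediction
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = confusionMatrix(
                new int[] {0, 1, 1}, new int[] {0, 2, 1}, new double[] {1, 2, 3}, 4);
        for (double[] row : m) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With all weights equal to 1 the result matches the unweighted confusion matrix.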

cos

INDArray cos(INDArray x)

SDVariable cos(SDVariable x)
SDVariable cos(String name, SDVariable x)

Elementwise cosine operation: out = cos(x)

x (NUMERIC) - Input variable

cosh

INDArray cosh(INDArray x)

SDVariable cosh(SDVariable x)
SDVariable cosh(String name, SDVariable x)

Elementwise cosh (hyperbolic cosine) operation: out = cosh(x)

x (NUMERIC) - Input variable

cosineDistance

INDArray cosineDistance(INDArray x, INDArray y, int[] dimensions)

SDVariable cosineDistance(SDVariable x, SDVariable y, int[] dimensions)
SDVariable cosineDistance(String name, SDVariable x, SDVariable y, int[] dimensions)

Cosine distance reduction operation. The output contains the cosine distance for each tensor/subset along the specified dimensions:

out = 1.0 - cosineSimilarity(x,y)

x (NUMERIC) - Input variable x
y (NUMERIC) - Input variable y
dimensions - Dimensions to calculate cosineDistance over (Size: AtLeast(min=0))

cosineSimilarity

INDArray cosineSimilarity(INDArray x, INDArray y, int[] dimensions)

SDVariable cosineSimilarity(SDVariable x, SDVariable y, int[] dimensions)
SDVariable cosineSimilarity(String name, SDVariable x, SDVariable y, int[] dimensions)

Cosine similarity pairwise reduction operation. The output contains the cosine similarity for each tensor/subset along the specified dimensions:

out = (sum_i x[i] * y[i]) / ( sqrt(sum_i x[i]^2) * sqrt(sum_i y[i]^2) )

x (NUMERIC) - Input variable x
y (NUMERIC) - Input variable y
dimensions - Dimensions to calculate cosineSimilarity over (Size: AtLeast(min=0))
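
As a minimal sketch of the two formulas (cosine similarity and cosine distance) for a single pair of vectors, the following plain-Java code may help. It does not use ND4J; the class and method names are hypothetical, and it reduces over the whole vector rather than over selected dimensions.

```java
final class CosineSketch {
    // Dot product divided by the product of the two L2 norms.
    static double cosineSimilarity(double[] x, double[] y) {
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }

    // Cosine distance is defined as 1.0 - cosineSimilarity(x, y).
    static double cosineDistance(double[] x, double[] y) {
        return 1.0 - cosineSimilarity(x, y);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        System.out.println(cosineSimilarity(x, y));
        System.out.println(cosineDistance(x, y));
    }
}
```

Parallel vectors give a similarity close to 1 and orthogonal vectors give 0; a zero vector would produce NaN in this sketch because of the division by a zero norm.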

countNonZero

INDArray countNonZero(INDArray in, int[] dimensions)

SDVariable countNonZero(SDVariable in, int[] dimensions)
SDVariable countNonZero(String name, SDVariable in, int[] dimensions)

Count non zero array reduction operation, optionally along specified dimensions: out = count(x != 0)

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

countZero

INDArray countZero(INDArray in, int[] dimensions)

SDVariable countZero(SDVariable in, int[] dimensions)
SDVariable countZero(String name, SDVariable in, int[] dimensions)

Count zero array reduction operation, optionally along specified dimensions: out = count(x == 0)

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

cross

INDArray cross(INDArray a, INDArray b)

SDVariable cross(SDVariable a, SDVariable b)
SDVariable cross(String name, SDVariable a, SDVariable b)

Returns the pair-wise cross product of equal size arrays a and b. The magnitude of the result satisfies ||a x b|| = ||a|| ||b|| sin(theta), where theta is the angle between a and b.

Can take rank 1 or above inputs (of equal shapes), but note that the last dimension must have size 3

a (NUMERIC) - First input
b (NUMERIC) - Second input
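
For a single pair of 3-element vectors, the cross product can be sketched as follows. This is plain Java with hypothetical names, shown only to make the component formula concrete; for higher-rank inputs the same computation would apply to each length-3 slice along the last dimension.

```java
import java.util.Arrays;

final class CrossProductSketch {
    // Standard component formula for the cross product of two 3-vectors.
    static double[] cross(double[] a, double[] b) {
        return new double[]{
                a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]
        };
    }

    public static void main(String[] args) {
        // The unit x and y vectors give the unit z vector.
        System.out.println(Arrays.toString(cross(new double[]{1, 0, 0}, new double[]{0, 1, 0})));
    }
}
```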

cube

INDArray cube(INDArray x)

SDVariable cube(SDVariable x)
SDVariable cube(String name, SDVariable x)

Element-wise cube function: out = x^3

x (NUMERIC) - Input variable

diag

INDArray diag(INDArray x)

SDVariable diag(SDVariable x)
SDVariable diag(String name, SDVariable x)

Returns an output variable with diagonal values equal to the specified values; off-diagonal values will be set to 0

For example, if input = [1,2,3], then output is given by:

[ 1, 0, 0]

[ 0, 2, 0]

[ 0, 0, 3]

Higher input ranks are also supported: if input has shape [a,...,R-1] then output[i,...,k,i,...,k] = input[i,...,k].

i.e., for input rank R, output has rank 2R

x (NUMERIC) - Input variable

diagPart

INDArray diagPart(INDArray x)

SDVariable diagPart(SDVariable x)
SDVariable diagPart(String name, SDVariable x)

Extract the diagonal part from the input array.

If input is

[ 1, 0, 0]

[ 0, 2, 0]

[ 0, 0, 3]

then output is [1, 2, 3].

Supports higher dimensions: in general, out[i,...,k] = in[i,...,k,i,...,k]

x (NUMERIC) - Input variable
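
The relationship between diag and diagPart for the simplest case (rank 1 input, rank 2 output) can be sketched in plain Java. The names are hypothetical and the sketch does not cover the higher-rank behaviour described above.

```java
import java.util.Arrays;

final class DiagSketch {
    // Builds a square matrix with the given values on the main diagonal.
    static double[][] diag(double[] values) {
        double[][] out = new double[values.length][values.length];
        for (int i = 0; i < values.length; i++) {
            out[i][i] = values[i];
        }
        return out;
    }

    // Extracts the main diagonal of a square matrix.
    static double[] diagPart(double[][] m) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            out[i] = m[i][i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = diag(new double[]{1, 2, 3});
        System.out.println(Arrays.deepToString(m));
        System.out.println(Arrays.toString(diagPart(m)));
    }
}
```

In this sketch diagPart(diag(v)) returns v unchanged, which mirrors the two examples in the text.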

div

INDArray div(INDArray x, INDArray y)

SDVariable div(SDVariable x, SDVariable y)
SDVariable div(String name, SDVariable x, SDVariable y)

Pairwise division operation, out = x / y

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable
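
The broadcasting rule for two rank 2 inputs can be illustrated with a small plain-Java sketch. It is not the ND4J implementation: the names are hypothetical, only rank 2 arrays are handled, and the shapes are assumed to be compatible (each dimension either matches or has size 1).

```java
import java.util.Arrays;

final class BroadcastDivSketch {
    // A dimension of size 1 is stretched to match the other input.
    static double[][] div(double[][] x, double[][] y) {
        int rows = Math.max(x.length, y.length);
        int cols = Math.max(x[0].length, y[0].length);
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double xv = x[x.length == 1 ? 0 : r][x[0].length == 1 ? 0 : c];
                double yv = y[y.length == 1 ? 0 : r][y[0].length == 1 ? 0 : c];
                out[r][c] = xv / yv;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{10, 20}};          // shape [1,2]
        double[][] y = {{1, 2}, {5, 10}};   // shape [2,2]
        System.out.println(Arrays.deepToString(div(x, y))); // shape [2,2]
    }
}
```

The same index-mapping idea applies to the other pairwise ops that support broadcasting, with the division replaced by the relevant operation.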

div

INDArray div(INDArray x, double value)

SDVariable div(SDVariable x, double value)
SDVariable div(String name, SDVariable x, double value)

Scalar division operation, out = in / scalar

x (NUMERIC) - Input variable
value - Scalar value for op

entropy

INDArray entropy(INDArray in, int[] dimensions)

SDVariable entropy(SDVariable in, int[] dimensions)
SDVariable entropy(String name, SDVariable in, int[] dimensions)

Entropy reduction: -sum(x * log(x))

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

erf

INDArray erf(INDArray x)

SDVariable erf(SDVariable x)
SDVariable erf(String name, SDVariable x)

Element-wise Gaussian error function - out = erf(in)

x (NUMERIC) - Input variable

erfc

INDArray erfc(INDArray x)

SDVariable erfc(SDVariable x)
SDVariable erfc(String name, SDVariable x)

Element-wise complementary Gaussian error function - out = erfc(in) = 1 - erf(in)

x (NUMERIC) - Input variable

euclideanDistance

INDArray euclideanDistance(INDArray x, INDArray y, int[] dimensions)

SDVariable euclideanDistance(SDVariable x, SDVariable y, int[] dimensions)
SDVariable euclideanDistance(String name, SDVariable x, SDVariable y, int[] dimensions)

Euclidean distance (l2 norm, l2 distance) reduction operation. The output contains the Euclidean distance for each tensor/subset along the specified dimensions:

out = sqrt( sum_i (x[i] - y[i])^2 )

x (NUMERIC) - Input variable x
y (NUMERIC) - Input variable y
dimensions - Dimensions to calculate euclideanDistance over (Size: AtLeast(min=0))

exp

INDArray exp(INDArray x)

SDVariable exp(SDVariable x)
SDVariable exp(String name, SDVariable x)

Elementwise exponent function: out = exp(x) = 2.71828...^x

x (NUMERIC) - Input variable

expm1

INDArray expm1(INDArray x)

SDVariable expm1(SDVariable x)
SDVariable expm1(String name, SDVariable x)

Elementwise exponent minus one function: out = exp(x) - 1.0 = 2.71828...^x - 1.0

x (NUMERIC) - Input variable
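
Assuming expm1 follows the usual convention out = exp(x) - 1 (which is what java.lang.Math.expm1 computes), the reason for having a dedicated function is accuracy near zero. The plain-Java sketch below compares a naive computation with the JDK method; the class and method names are hypothetical.

```java
final class Expm1Sketch {
    // Naive version: loses precision for small x because exp(x) is close to 1.
    static double naiveExpm1(double x) {
        return Math.exp(x) - 1.0;
    }

    public static void main(String[] args) {
        double x = 1e-10;
        System.out.println("naive:      " + naiveExpm1(x));
        System.out.println("Math.expm1: " + Math.expm1(x));
    }
}
```

For very small inputs the naive result carries a visible rounding error, while Math.expm1 stays close to x itself.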

eye

INDArray eye(int rows)

SDVariable eye(int rows)
SDVariable eye(String name, int rows)

Generate an identity matrix with the specified number of rows and columns.

rows - Number of rows

eye

INDArray eye(int rows, int cols)

SDVariable eye(int rows, int cols)
SDVariable eye(String name, int rows, int cols)

As per eye(String, int, int, DataType) but with the default datatype, Eye.DEFAULT_DTYPE

rows - Number of rows
cols - Number of columns

eye

INDArray eye(int rows, int cols, DataType dataType, int[] dimensions)

SDVariable eye(int rows, int cols, DataType dataType, int[] dimensions)
SDVariable eye(String name, int rows, int cols, DataType dataType, int[] dimensions)

Generate an identity matrix with the specified number of rows and columns

Example:

`INDArray eye = eye(3,2)`

eye:

[ 1, 0]

[ 0, 1]

[ 0, 0]

rows - Number of rows
cols - Number of columns
dataType - Data type
dimensions - (Size: AtLeast(min=0))
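
The rectangular case from the example can be sketched in plain Java as follows. The names are hypothetical, and the sketch ignores the dataType and dimensions arguments of the real op.

```java
import java.util.Arrays;

final class EyeSketch {
    // Ones on the main diagonal, zeros elsewhere; rows and cols may differ.
    static double[][] eye(int rows, int cols) {
        double[][] out = new double[rows][cols];
        for (int i = 0; i < Math.min(rows, cols); i++) {
            out[i][i] = 1.0;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(eye(3, 2)));
    }
}
```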

eye

INDArray eye(INDArray rows, INDArray cols)

SDVariable eye(SDVariable rows, SDVariable cols)
SDVariable eye(String name, SDVariable rows, SDVariable cols)

As per eye(int, int) but with the number of rows/columns specified as scalar INDArrays

rows (INT) - Number of rows
cols (INT) - Number of columns

eye

INDArray eye(INDArray rows)

SDVariable eye(SDVariable rows)
SDVariable eye(String name, SDVariable rows)

As per eye(String, int) but with the number of rows specified as a scalar INDArray

rows (INT) - Number of rows

firstIndex

INDArray firstIndex(INDArray in, Condition condition, int[] dimensions)
INDArray firstIndex(INDArray in, Condition condition, boolean keepDims, int[] dimensions)

SDVariable firstIndex(SDVariable in, Condition condition, int[] dimensions)
SDVariable firstIndex(SDVariable in, Condition condition, boolean keepDims, int[] dimensions)
SDVariable firstIndex(String name, SDVariable in, Condition condition, int[] dimensions)
SDVariable firstIndex(String name, SDVariable in, Condition condition, boolean keepDims, int[] dimensions)

First index reduction operation.

Returns a variable that contains the index of the first element that matches the specified condition (for each slice along the specified dimensions).

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

in (NUMERIC) - Input variable
condition - Condition to check on input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))
keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false
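
A rough plain-Java sketch of a first-index reduction along dimension 1 of a [rows, cols] array is shown below, including the effect of keepDims on the output shape. The names are hypothetical, the condition is modelled with a DoublePredicate rather than the ND4J Condition type, and returning -1 when nothing matches is an assumption of this sketch, not documented behaviour.

```java
import java.util.Arrays;
import java.util.function.DoublePredicate;

final class FirstIndexSketch {
    // keepDims = false: input shape [rows, cols] gives output shape [rows].
    static int[] firstIndexAlongDim1(double[][] in, DoublePredicate condition) {
        int[] out = new int[in.length];
        for (int r = 0; r < in.length; r++) {
            out[r] = -1; // hypothetical "no match" marker
            for (int c = 0; c < in[r].length; c++) {
                if (condition.test(in[r][c])) {
                    out[r] = c;
                    break;
                }
            }
        }
        return out;
    }

    // keepDims = true: the reduced dimension is kept with size 1, shape [rows, 1].
    static int[][] firstIndexAlongDim1KeepDims(double[][] in, DoublePredicate condition) {
        int[] flat = firstIndexAlongDim1(in, condition);
        int[][] out = new int[flat.length][1];
        for (int r = 0; r < flat.length; r++) {
            out[r][0] = flat[r];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{1, 5, 7}, {9, 2, 8}};
        System.out.println(Arrays.toString(firstIndexAlongDim1(in, v -> v > 4.0)));
        System.out.println(Arrays.deepToString(firstIndexAlongDim1KeepDims(in, v -> v > 4.0)));
    }
}
```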

floor

INDArray floor(INDArray x)

SDVariable floor(SDVariable x)
SDVariable floor(String name, SDVariable x)

Element-wise floor function: out = floor(x).

Rounds each value down to the nearest integer value (if not already an integer)

x (NUMERIC) - Input variable

floorDiv

INDArray floorDiv(INDArray x, INDArray y)

SDVariable floorDiv(SDVariable x, SDVariable y)
SDVariable floorDiv(String name, SDVariable x, SDVariable y)

Pairwise floor division operation, out = floor(x / y)

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

floorMod

INDArray floorMod(INDArray x, INDArray y)

SDVariable floorMod(SDVariable x, SDVariable y)
SDVariable floorMod(String name, SDVariable x, SDVariable y)

Pairwise Modulus division operation

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

floorMod

INDArray floorMod(INDArray x, double value)

SDVariable floorMod(SDVariable x, double value)
SDVariable floorMod(String name, SDVariable x, double value)

Scalar floor modulus operation

x (NUMERIC) - Input variable
value - Scalar value for op
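
The difference between floor-based and remainder-based modulus only shows up for negative operands. The plain-Java sketch below assumes that floorDiv and floorMod follow the usual floored-division definitions (result of floorMod takes the sign of the divisor); that assumption and the names are not taken from the ND4J source.

```java
final class FloorModSketch {
    // Floored division: round the quotient towards negative infinity.
    static double floorDiv(double x, double y) {
        return Math.floor(x / y);
    }

    // Floored modulus: x - floor(x / y) * y.
    static double floorMod(double x, double y) {
        return x - Math.floor(x / y) * y;
    }

    public static void main(String[] args) {
        System.out.println(floorDiv(-7, 2));  // floored quotient
        System.out.println(floorMod(-7, 2));  // floored modulus
        System.out.println(-7.0 % 2.0);       // Java remainder, sign of the dividend
    }
}
```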

hammingDistance

INDArray hammingDistance(INDArray x, INDArray y, int[] dimensions)

SDVariable hammingDistance(SDVariable x, SDVariable y, int[] dimensions)
SDVariable hammingDistance(String name, SDVariable x, SDVariable y, int[] dimensions)

Hamming distance reduction operation. The output contains the Hamming distance for each tensor/subset along the specified dimensions:

out = count( x[i] != y[i] )

x (NUMERIC) - Input variable x
y (NUMERIC) - Input variable y
dimensions - Dimensions to calculate hammingDistance over (Size: AtLeast(min=0))

iamax

INDArray iamax(INDArray in, int[] dimensions)
INDArray iamax(INDArray in, boolean keepDims, int[] dimensions)

SDVariable iamax(SDVariable in, int[] dimensions)
SDVariable iamax(SDVariable in, boolean keepDims, int[] dimensions)
SDVariable iamax(String name, SDVariable in, int[] dimensions)
SDVariable iamax(String name, SDVariable in, boolean keepDims, int[] dimensions)

Index of the max absolute value: argmax(abs(in))

see argmax(String, INDArray, boolean, int...)

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))
keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false

iamin

INDArray iamin(INDArray in, int[] dimensions)
INDArray iamin(INDArray in, boolean keepDims, int[] dimensions)

SDVariable iamin(SDVariable in, int[] dimensions)
SDVariable iamin(SDVariable in, boolean keepDims, int[] dimensions)
SDVariable iamin(String name, SDVariable in, int[] dimensions)
SDVariable iamin(String name, SDVariable in, boolean keepDims, int[] dimensions)

Index of the min absolute value: argmin(abs(in))

see argmin(String, INDArray, boolean, int...)

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))
keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false
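
For a 1D input, iamax and iamin reduce to an argmax or argmin over absolute values. A minimal plain-Java sketch with hypothetical names follows; returning the first index when several elements tie is an assumption of this sketch.

```java
final class IamaxSketch {
    // Index of the element with the largest absolute value.
    static int iamax(double[] in) {
        int best = 0;
        for (int i = 1; i < in.length; i++) {
            if (Math.abs(in[i]) > Math.abs(in[best])) {
                best = i;
            }
        }
        return best;
    }

    // Index of the element with the smallest absolute value.
    static int iamin(double[] in) {
        int best = 0;
        for (int i = 1; i < in.length; i++) {
            if (Math.abs(in[i]) < Math.abs(in[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] in = {1, -5, 3};
        System.out.println(iamax(in));
        System.out.println(iamin(in));
    }
}
```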

isFinite

INDArray isFinite(INDArray x)

SDVariable isFinite(SDVariable x)
SDVariable isFinite(String name, SDVariable x)

Is finite operation: elementwise isFinite(x)

Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or value 0 otherwise

x (NUMERIC) - Input variable

isInfinite

INDArray isInfinite(INDArray x)

SDVariable isInfinite(SDVariable x)
SDVariable isInfinite(String name, SDVariable x)

Is infinite operation: elementwise isInfinite(x)

Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or value 0 otherwise

x (NUMERIC) - Input variable

isMax

INDArray isMax(INDArray x)

SDVariable isMax(SDVariable x)
SDVariable isMax(String name, SDVariable x)

Is maximum operation: elementwise x == max(x)

Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or value 0 otherwise

x (NUMERIC) - Input variable

isNaN

INDArray isNaN(INDArray x)

SDVariable isNaN(SDVariable x)
SDVariable isNaN(String name, SDVariable x)

Is Not a Number operation: elementwise isNaN(x)

Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or value 0 otherwise

x (NUMERIC) - Input variable

isNonDecreasing

INDArray isNonDecreasing(INDArray x)

SDVariable isNonDecreasing(SDVariable x)
SDVariable isNonDecreasing(String name, SDVariable x)

Is the array non decreasing?

An array is non-decreasing if for every valid i, x[i] <= x[i+1]. For Rank 2+ arrays, values are compared in 'c' (row major) order

x (NUMERIC) - Input variable

isStrictlyIncreasing

INDArray isStrictlyIncreasing(INDArray x)

SDVariable isStrictlyIncreasing(SDVariable x)
SDVariable isStrictlyIncreasing(String name, SDVariable x)

Is the array strictly increasing?

An array is strictly increasing if for every valid i, x[i] < x[i+1]. For Rank 2+ arrays, values are compared in 'c' (row major) order

x (NUMERIC) - Input variable
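
The two monotonicity checks, including the row-major flattening used for rank 2 inputs, can be sketched in plain Java. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class MonotonicSketch {
    // Flatten a rank 2 array in 'c' (row major) order.
    static double[] flattenRowMajor(double[][] m) {
        return Arrays.stream(m).flatMapToDouble(row -> Arrays.stream(row)).toArray();
    }

    // True if x[i] <= x[i+1] for every valid i.
    static boolean isNonDecreasing(double[] x) {
        for (int i = 0; i + 1 < x.length; i++) {
            if (x[i] > x[i + 1]) return false;
        }
        return true;
    }

    // True if x[i] < x[i+1] for every valid i.
    static boolean isStrictlyIncreasing(double[] x) {
        for (int i = 0; i + 1 < x.length; i++) {
            if (x[i] >= x[i + 1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 2, 3};
        System.out.println(isNonDecreasing(x));
        System.out.println(isStrictlyIncreasing(x));
        System.out.println(isStrictlyIncreasing(flattenRowMajor(new double[][]{{1, 2}, {3, 4}})));
    }
}
```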

jaccardDistance

INDArray jaccardDistance(INDArray x, INDArray y, int[] dimensions)

SDVariable jaccardDistance(SDVariable x, SDVariable y, int[] dimensions)
SDVariable jaccardDistance(String name, SDVariable x, SDVariable y, int[] dimensions)

Jaccard distance reduction operation. The output contains the Jaccard distance for each tensor along the specified dimensions.

x (NUMERIC) - Input variable x
y (NUMERIC) - Input variable y
dimensions - Dimensions to calculate jaccardDistance over (Size: AtLeast(min=0))

lastIndex

INDArray lastIndex(INDArray in, Condition condition, int[] dimensions)
INDArray lastIndex(INDArray in, Condition condition, boolean keepDims, int[] dimensions)

SDVariable lastIndex(SDVariable in, Condition condition, int[] dimensions)
SDVariable lastIndex(SDVariable in, Condition condition, boolean keepDims, int[] dimensions)
SDVariable lastIndex(String name, SDVariable in, Condition condition, int[] dimensions)
SDVariable lastIndex(String name, SDVariable in, Condition condition, boolean keepDims, int[] dimensions)

Last index reduction operation.

Returns a variable that contains the index of the last element that matches the specified condition (for each slice along the specified dimensions).

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

in (NUMERIC) - Input variable
condition - Condition to check on input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))
keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false

listDiff

INDArray[] listDiff(INDArray x, INDArray y)

SDVariable[] listDiff(SDVariable x, SDVariable y)
SDVariable[] listDiff(String name, SDVariable x, SDVariable y)

Calculates difference between inputs X and Y.

x (NUMERIC) - Input variable X
y (NUMERIC) - Input variable Y

log

INDArray log(INDArray x)

SDVariable log(SDVariable x)
SDVariable log(String name, SDVariable x)

Element-wise logarithm function (base e - natural logarithm): out = log(x)

x (NUMERIC) - Input variable

log

INDArray log(INDArray x, double base)

SDVariable log(SDVariable x, double base)
SDVariable log(String name, SDVariable x, double base)

Element-wise logarithm function (with specified base): out = log_{base}(x)

x (NUMERIC) - Input variable
base - Logarithm base

log1p

INDArray log1p(INDArray x)

SDVariable log1p(SDVariable x)
SDVariable log1p(String name, SDVariable x)

Elementwise natural logarithm function: out = log_e (1 + x)

x (NUMERIC) - Input variable

logEntropy

INDArray logEntropy(INDArray in, int[] dimensions)

SDVariable logEntropy(SDVariable in, int[] dimensions)
SDVariable logEntropy(String name, SDVariable in, int[] dimensions)

Log entropy reduction: log(-sum(x * log(x)))

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

logSumExp

INDArray logSumExp(INDArray input, int[] dimensions)

SDVariable logSumExp(SDVariable input, int[] dimensions)
SDVariable logSumExp(String name, SDVariable input, int[] dimensions)

Log-sum-exp reduction (optionally along dimension).

Computes log(sum(exp(x)))

input (NUMERIC) - Input variable
dimensions - Optional dimensions to reduce along (Size: AtLeast(min=0))
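
Evaluating log(sum(exp(x))) literally overflows for large inputs, which is why log-sum-exp is usually computed with a max shift. The plain-Java sketch below shows that common technique for a 1D input; whether the ND4J op uses the same trick internally is not stated in this reference, and the names are hypothetical.

```java
final class LogSumExpSketch {
    // Stable form: max + log(sum(exp(x - max))).
    static double logSumExp(double[] x) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (double v : x) {
            sum += Math.exp(v - max);
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        double[] big = {1000, 1000};
        System.out.println(logSumExp(big));                                 // finite
        System.out.println(Math.log(Math.exp(1000) + Math.exp(1000)));      // overflows
    }
}
```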

manhattanDistance

INDArray manhattanDistance(INDArray x, INDArray y, int[] dimensions)

SDVariable manhattanDistance(SDVariable x, SDVariable y, int[] dimensions)
SDVariable manhattanDistance(String name, SDVariable x, SDVariable y, int[] dimensions)

Manhattan distance (l1 norm, l1 distance) reduction operation. The output contains the Manhattan distance for each tensor/subset along the specified dimensions:

out = sum_i abs(x[i]-y[i])

x (NUMERIC) - Input variable x
y (NUMERIC) - Input variable y
dimensions - Dimensions to calculate manhattanDistance over (Size: AtLeast(min=0))
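
The Euclidean, Hamming and Manhattan distance formulas from this reference can be compared side by side for a single pair of vectors. The following is a plain-Java sketch with hypothetical names, reducing over the whole vector rather than over selected dimensions.

```java
final class DistanceSketch {
    // sqrt( sum_i (x[i] - y[i])^2 )
    static double euclidean(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // sum_i abs(x[i] - y[i])
    static double manhattan(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += Math.abs(x[i] - y[i]);
        }
        return sum;
    }

    // count( x[i] != y[i] )
    static int hamming(double[] x, double[] y) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {1, 4, 7};
        System.out.println(euclidean(x, y));
        System.out.println(manhattan(x, y));
        System.out.println(hamming(x, y));
    }
}
```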

matrixDeterminant

INDArray matrixDeterminant(INDArray in)

SDVariable matrixDeterminant(SDVariable in)
SDVariable matrixDeterminant(String name, SDVariable in)

Matrix determinant op. For 2D input, this returns the standard matrix determinant.

For higher dimensional input with shape [..., m, m] the matrix determinant is returned for each shape [m,m] sub-matrix.

in (NUMERIC) - Input

matrixInverse

INDArray matrixInverse(INDArray in)

SDVariable matrixInverse(SDVariable in)
SDVariable matrixInverse(String name, SDVariable in)

Matrix inverse op. For 2D input, this returns the standard matrix inverse.

For higher dimensional input with shape [..., m, m] the matrix inverse is returned for each shape [m,m] sub-matrix.

in (NUMERIC) - Input

max

INDArray max(INDArray x, INDArray y)

SDVariable max(SDVariable x, SDVariable y)
SDVariable max(String name, SDVariable x, SDVariable y)

Pairwise max operation, out = max(x, y)

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - First input variable, x
y (NUMERIC) - Second input variable, y

mergeAdd

INDArray mergeAdd(INDArray inputs)

SDVariable mergeAdd(SDVariable inputs)
SDVariable mergeAdd(String name, SDVariable inputs)

Merge add function: merges an arbitrary number of equal shaped arrays using element-wise addition:

out = sum_i in[i]

inputs (NUMERIC) - Input variables

mergeAvg

INDArray mergeAvg(INDArray inputs)

SDVariable mergeAvg(SDVariable inputs)
SDVariable mergeAvg(String name, SDVariable inputs)

Merge average function: merges an arbitrary number of equal shaped arrays using element-wise mean operation:

out = mean_i in[i]

inputs (NUMERIC) - Input variables

mergeMax

INDArray mergeMax(INDArray inputs)

SDVariable mergeMax(SDVariable inputs)
SDVariable mergeMax(String name, SDVariable inputs)

Merge max function: merges an arbitrary number of equal shaped arrays using element-wise maximum operation:

out = max_i in[i]

inputs (NUMERIC) - Input variables

meshgrid

INDArray[] meshgrid(INDArray inputs, boolean cartesian)

SDVariable[] meshgrid(SDVariable inputs, boolean cartesian)
SDVariable[] meshgrid(String name, SDVariable inputs, boolean cartesian)

Broadcasts parameters for evaluation on an N-D grid.

inputs (NUMERIC) -
cartesian -

min

INDArray min(INDArray x, INDArray y)

SDVariable min(SDVariable x, SDVariable y)
SDVariable min(String name, SDVariable x, SDVariable y)

Pairwise min operation, out = min(x, y)

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - First input variable, x
y (NUMERIC) - Second input variable, y

mod

INDArray mod(INDArray x, INDArray y)

SDVariable mod(SDVariable x, SDVariable y)
SDVariable mod(String name, SDVariable x, SDVariable y)

Pairwise modulus (remainder) operation, out = x % y

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

moments

INDArray[] moments(INDArray input, int[] axes)

SDVariable[] moments(SDVariable input, int[] axes)
SDVariable[] moments(String name, SDVariable input, int[] axes)

Calculate the mean and (population) variance for the input variable, for the specified axis

input (NUMERIC) - Input to calculate moments for
axes - Dimensions to perform calculation over (Size: AtLeast(min=0))
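
For a 1D input, the two values returned by moments are the mean and the population variance (dividing by N rather than N - 1). A plain-Java sketch with hypothetical names:

```java
import java.util.Arrays;

final class MomentsSketch {
    // Returns {mean, populationVariance} for a 1D input.
    static double[] moments(double[] input) {
        double mean = 0.0;
        for (double v : input) {
            mean += v;
        }
        mean /= input.length;

        double variance = 0.0;
        for (double v : input) {
            variance += (v - mean) * (v - mean);
        }
        variance /= input.length; // population variance: divide by N

        return new double[]{mean, variance};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(moments(new double[]{1, 2, 3, 4})));
    }
}
```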

mul

INDArray mul(INDArray x, INDArray y)

SDVariable mul(SDVariable x, SDVariable y)
SDVariable mul(String name, SDVariable x, SDVariable y)

Pairwise multiplication operation, out = x * y

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

mul

INDArray mul(INDArray x, double value)

SDVariable mul(SDVariable x, double value)
SDVariable mul(String name, SDVariable x, double value)

Scalar multiplication operation, out = in * scalar

x (NUMERIC) - Input variable
value - Scalar value for op

neg

INDArray neg(INDArray x)

SDVariable neg(SDVariable x)
SDVariable neg(String name, SDVariable x)

Elementwise negative operation: out = -x

x (NUMERIC) - Input variable

normalizeMoments

INDArray[] normalizeMoments(INDArray counts, INDArray means, INDArray variances, double shift)

SDVariable[] normalizeMoments(SDVariable counts, SDVariable means, SDVariable variances, double shift)
SDVariable[] normalizeMoments(String name, SDVariable counts, SDVariable means, SDVariable variances, double shift)

Calculate the mean and variance from the sufficient statistics

counts (NUMERIC) - Rank 0 (scalar) value with the total number of values used to calculate the sufficient statistics
means (NUMERIC) - Mean-value sufficient statistics: this is the SUM of all data values
variances (NUMERIC) - Variance sufficient statistics: this is the squared sum of all data values
shift - Shift value, possibly 0, used when calculating the sufficient statistics (for numerical stability)

or

INDArray or(INDArray x, INDArray y)

SDVariable or(SDVariable x, SDVariable y)
SDVariable or(String name, SDVariable x, SDVariable y)

Boolean OR operation: elementwise (x != 0) || (y != 0)

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

Returns an array with values 1 where condition is satisfied, or value 0 otherwise.

x (BOOL) - Input 1
y (BOOL) - Input 2

pow

INDArray pow(INDArray x, double value)

SDVariable pow(SDVariable x, double value)
SDVariable pow(String name, SDVariable x, double value)

Element-wise power function: out = x^value

x (NUMERIC) - Input variable
value - Scalar value for op

pow

INDArray pow(INDArray x, INDArray y)

SDVariable pow(SDVariable x, SDVariable y)
SDVariable pow(String name, SDVariable x, SDVariable y)

Element-wise (broadcastable) power function: out = x[i]^y[i]

x (NUMERIC) - Input variable
y (NUMERIC) - Power

rationalTanh

INDArray rationalTanh(INDArray x)

SDVariable rationalTanh(SDVariable x)
SDVariable rationalTanh(String name, SDVariable x)

Rational Tanh Approximation elementwise function, as described in the paper:

Compact Convolutional Neural Network Cascade for Face Detection

This is a faster Tanh approximation

x (NUMERIC) - Input variable

rdiv

INDArray rdiv(INDArray x, INDArray y)

SDVariable rdiv(SDVariable x, SDVariable y)
SDVariable rdiv(String name, SDVariable x, SDVariable y)

Pairwise reverse division operation, out = y / x

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

rdiv

INDArray rdiv(INDArray x, double value)

SDVariable rdiv(SDVariable x, double value)
SDVariable rdiv(String name, SDVariable x, double value)

Scalar reverse division operation, out = scalar / in

x (NUMERIC) - Input variable
value - Scalar value for op

reciprocal

INDArray reciprocal(INDArray x)

SDVariable reciprocal(SDVariable x)
SDVariable reciprocal(String name, SDVariable x)

Element-wise reciprocal (inverse) function: out[i] = 1 / in[i]

x (NUMERIC) - Input variable

rectifiedTanh

INDArray rectifiedTanh(INDArray x)

SDVariable rectifiedTanh(SDVariable x)
SDVariable rectifiedTanh(String name, SDVariable x)

Rectified tanh operation: max(0, tanh(in))

x (NUMERIC) - Input variable

round

INDArray round(INDArray x)

SDVariable round(SDVariable x)
SDVariable round(String name, SDVariable x)

Element-wise round function: out = round(x).

Rounds (up or down depending on value) to the nearest integer value.

x (NUMERIC) - Input variable

rsqrt

INDArray rsqrt(INDArray x)

SDVariable rsqrt(SDVariable x)
SDVariable rsqrt(String name, SDVariable x)

Element-wise reciprocal (inverse) of square root: out = 1.0 / sqrt(x)

x (NUMERIC) - Input variable

rsub

INDArray rsub(INDArray x, INDArray y)

SDVariable rsub(SDVariable x, SDVariable y)
SDVariable rsub(String name, SDVariable x, SDVariable y)

Pairwise reverse subtraction operation, out = y - x

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

rsub

INDArray rsub(INDArray x, double value)

SDVariable rsub(SDVariable x, double value)
SDVariable rsub(String name, SDVariable x, double value)

Scalar reverse subtraction operation, out = scalar - in

x (NUMERIC) - Input variable
value - Scalar value for op
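
The "reverse" ops simply swap the operand order of the corresponding forward op. A plain-Java sketch of the scalar variants, using hypothetical names:

```java
import java.util.Arrays;

final class ReverseOpsSketch {
    // Scalar reverse division: out[i] = scalar / in[i]
    static double[] rdiv(double[] in, double scalar) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = scalar / in[i];
        }
        return out;
    }

    // Scalar reverse subtraction: out[i] = scalar - in[i]
    static double[] rsub(double[] in, double scalar) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = scalar - in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rdiv(new double[]{2, 4}, 8)));
        System.out.println(Arrays.toString(rsub(new double[]{1, 2}, 10)));
    }
}
```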

setDiag

INDArray setDiag(INDArray in, INDArray diag)

SDVariable setDiag(SDVariable in, SDVariable diag)
SDVariable setDiag(String name, SDVariable in, SDVariable diag)

Set the diagonal value to the specified values

If input is

[ a, b, c]

[ d, e, f]

[ g, h, i]

and diag = [ 1, 2, 3] then output is

[ 1, b, c]

[ d, 2, f]

[ g, h, 3]

in (NUMERIC) - Input variable
diag (NUMERIC) - Diagonal
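
A plain-Java sketch of the setDiag behaviour for a square rank 2 input is shown below. The names are hypothetical; the sketch copies the input so that the original array is left unchanged, which is a design choice of the sketch rather than a documented property of the op.

```java
import java.util.Arrays;

final class SetDiagSketch {
    // Returns a copy of 'in' whose main diagonal is replaced by 'diag'.
    static double[][] setDiag(double[][] in, double[] diag) {
        double[][] out = new double[in.length][];
        for (int r = 0; r < in.length; r++) {
            out[r] = in[r].clone();
            out[r][r] = diag[r];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{9, 9, 9}, {9, 9, 9}, {9, 9, 9}};
        System.out.println(Arrays.deepToString(setDiag(in, new double[]{1, 2, 3})));
    }
}
```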

shannonEntropy

INDArray shannonEntropy(INDArray in, int[] dimensions)

SDVariable shannonEntropy(SDVariable in, int[] dimensions)
SDVariable shannonEntropy(String name, SDVariable in, int[] dimensions)

Shannon Entropy reduction: -sum(x * log2(x))

in (NUMERIC) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
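
The three entropy reductions in this reference (entropy, logEntropy and shannonEntropy) differ only in the logarithm base and in a final log. The plain-Java sketch below applies the three formulas to a 1D input; the names are hypothetical, and all inputs are assumed to be strictly positive, since a zero entry would yield NaN in this literal form.

```java
final class EntropySketch {
    // -sum(x * log(x)), natural logarithm
    static double entropy(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v * Math.log(v);
        }
        return -sum;
    }

    // log(-sum(x * log(x)))
    static double logEntropy(double[] x) {
        return Math.log(entropy(x));
    }

    // -sum(x * log2(x))
    static double shannonEntropy(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v * (Math.log(v) / Math.log(2.0));
        }
        return -sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        System.out.println(entropy(p));
        System.out.println(logEntropy(p));
        System.out.println(shannonEntropy(p));
    }
}
```

For a fair coin, p = [0.5, 0.5], the Shannon entropy is one bit and the natural-log entropy is ln(2).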

sign

INDArray sign(INDArray x)

SDVariable sign(SDVariable x)
SDVariable sign(String name, SDVariable x)

Element-wise sign (signum) function:

out = -1 if in < 0

out = 0 if in = 0

out = 1 if in > 0

x (NUMERIC) - Input variable

sin

INDArray sin(INDArray x)

SDVariable sin(SDVariable x)
SDVariable sin(String name, SDVariable x)

Elementwise sine operation: out = sin(x)

x (NUMERIC) - Input variable

sinh

INDArray sinh(INDArray x)

SDVariable sinh(SDVariable x)
SDVariable sinh(String name, SDVariable x)

Elementwise sinh (hyperbolic sine) operation: out = sinh(x)

x (NUMERIC) - Input variable

sqrt

INDArray sqrt(INDArray x)

SDVariable sqrt(SDVariable x)
SDVariable sqrt(String name, SDVariable x)

Element-wise square root function: out = sqrt(x)

x (NUMERIC) - Input variable

square

INDArray square(INDArray x)

SDVariable square(SDVariable x)
SDVariable square(String name, SDVariable x)

Element-wise square function: out = x^2

x (NUMERIC) - Input variable

squaredDifference

INDArray squaredDifference(INDArray x, INDArray y)

SDVariable squaredDifference(SDVariable x, SDVariable y)
SDVariable squaredDifference(String name, SDVariable x, SDVariable y)

Pairwise squared difference operation, out = (x - y)^2

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

standardize

INDArray standardize(INDArray x, int[] dimensions)

SDVariable standardize(SDVariable x, int[] dimensions)
SDVariable standardize(String name, SDVariable x, int[] dimensions)

Standardize input variable along given axis

out = (x - mean) / stdev

with mean and stdev being calculated along the given dimension.

For example: given x as a mini batch of the shape [numExamples, exampleLength]:

use dimension 1 to use the statistics (mean, stdev) for each example
use dimension 0 if you want to use the statistics for each column across all examples
use dimensions 0,1 if you want to use the statistics across all columns and examples

x (NUMERIC) - Input variable
dimensions - (Size: AtLeast(min=1))
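
The per-example case (dimension 1 of a [numExamples, exampleLength] mini batch) can be sketched in plain Java as follows. The names are hypothetical, and the use of the population standard deviation (dividing by N) is an assumption of this sketch; the reference does not say which estimator the op uses.

```java
import java.util.Arrays;

final class StandardizeSketch {
    // Standardize each row independently: out = (x - mean) / stdev.
    static double[][] standardizeRows(double[][] x) {
        double[][] out = new double[x.length][];
        for (int r = 0; r < x.length; r++) {
            int n = x[r].length;
            double mean = 0.0;
            for (double v : x[r]) {
                mean += v;
            }
            mean /= n;
            double variance = 0.0;
            for (double v : x[r]) {
                variance += (v - mean) * (v - mean);
            }
            variance /= n; // population variance (assumption)
            double stdev = Math.sqrt(variance);
            out[r] = new double[n];
            for (int c = 0; c < n; c++) {
                out[r][c] = (x[r][c] - mean) / stdev;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] batch = {{1, 2, 3}, {10, 20, 30}};
        System.out.println(Arrays.deepToString(standardizeRows(batch)));
    }
}
```

Both rows in the usage example produce the same standardized values, because standardization removes each row's own offset and scale.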

step

INDArray step(INDArray x, double value)

SDVariable step(SDVariable x, double value)
SDVariable step(String name, SDVariable x, double value)

Elementwise step function:

out(x) = 1 if x >= cutoff (the value argument)

out(x) = 0 otherwise

x (NUMERIC) - Input variable
value - Scalar value for op

sub

INDArray sub(INDArray x, INDArray y)

SDVariable sub(SDVariable x, SDVariable y)
SDVariable sub(String name, SDVariable x, SDVariable y)

Pairwise subtraction operation, out = x - y

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

sub

INDArray sub(INDArray x, double value)

SDVariable sub(SDVariable x, double value)
SDVariable sub(String name, SDVariable x, double value)

Scalar subtraction operation, out = in - scalar

x (NUMERIC) - Input variable
value - Scalar value for op

tan

INDArray tan(INDArray x)

SDVariable tan(SDVariable x)
SDVariable tan(String name, SDVariable x)

Elementwise tangent operation: out = tan(x)

x (NUMERIC) - Input variable

tanh

INDArray tanh(INDArray x)

SDVariable tanh(SDVariable x)
SDVariable tanh(String name, SDVariable x)

Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)

x (NUMERIC) - Input variable

trace

INDArray trace(INDArray in)

SDVariable trace(SDVariable in)
SDVariable trace(String name, SDVariable in)

Matrix trace operation

For rank 2 matrices, the output is a scalar with the trace - i.e., sum of the main diagonal.

For higher rank inputs, output[a,b,c] = trace(in[a,b,c,:,:])

in (NUMERIC) - Input variable
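
The rank 2 and higher-rank cases can be sketched in plain Java as follows. This is only an illustration with a hypothetical class name; it handles rank 2 and rank 3 inputs and does not call ND4J.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class TraceSketch {
    // Rank 2: sum of the main diagonal.
    static double trace(double[][] m) {
        double sum = 0;
        int n = Math.min(m.length, m[0].length);
        for (int i = 0; i < n; i++) sum += m[i][i];
        return sum;
    }

    // Rank 3: output[a] = trace(in[a, :, :]).
    static double[] trace(double[][][] batch) {
        double[] out = new double[batch.length];
        for (int a = 0; a < batch.length; a++) out[a] = trace(batch[a]);
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        System.out.println(trace(m)); // prints 5.0
        System.out.println(Arrays.toString(trace(new double[][][]{m, {{2, 0}, {0, 2}}})));
    }
}
```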

xor

INDArray xor(INDArray x, INDArray y)

SDVariable xor(SDVariable x, SDVariable y)
SDVariable xor(String name, SDVariable x, SDVariable y)

Boolean XOR (exclusive OR) operation: elementwise (x != 0) XOR (y != 0)

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

Returns an array with values 1 where condition is satisfied, or value 0 otherwise.

x (BOOL) - Input 1
y (BOOL) - Input 2

zeroFraction

INDArray zeroFraction(INDArray input)

SDVariable zeroFraction(SDVariable input)
SDVariable zeroFraction(String name, SDVariable input)

Full array zero fraction reduction operation: out = (count(x == 0) / length(x))

input (NUMERIC) - Input variable

BaseOps

These ops are generally available directly on SameDiff instances. Due to an oversight before the release, these ops aren't also available on Nd4j. To use the INDArray variants of these operations, you will have to instantiate an NDBase instance.

all

INDArray all(INDArray x, int[] dimensions)

SDVariable all(SDVariable x, int[] dimensions)
SDVariable all(String name, SDVariable x, int[] dimensions)

Boolean and array reduction operation, optionally along specified dimensions

x (NDARRAY) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

any

INDArray any(INDArray x, int[] dimensions)

SDVariable any(SDVariable x, int[] dimensions)
SDVariable any(String name, SDVariable x, int[] dimensions)

Boolean or array reduction operation, optionally along specified dimensions

x (NDARRAY) - Input variable
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

argmax

INDArray argmax(INDArray in, boolean keepDims, int[] dimensions)
INDArray argmax(INDArray in, int[] dimensions)

SDVariable argmax(SDVariable in, boolean keepDims, int[] dimensions)
SDVariable argmax(SDVariable in, int[] dimensions)
SDVariable argmax(String name, SDVariable in, boolean keepDims, int[] dimensions)
SDVariable argmax(String name, SDVariable in, int[] dimensions)

Argmax array reduction operation, optionally along specified dimensions.

Output values are the index of the maximum value of each slice along the specified dimension.

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

in (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
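
The effect of keepDims on the output shape can be seen in a small plain-Java sketch for a rank 2 input reduced along dimension 1. The class and method names are hypothetical, ties are resolved in favour of the first maximum (an assumption of this sketch), and ND4J is not used.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class ArgmaxSketch {
    // Input shape [a, b], dimensions = [1], keepDims = false: output shape [a].
    static int[] argmaxDim1(double[][] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            int best = 0;
            for (int j = 1; j < in[i].length; j++) {
                if (in[i][j] > in[i][best]) best = j; // first maximum wins on ties
            }
            out[i] = best;
        }
        return out;
    }

    // Same reduction with keepDims = true: output shape [a, 1].
    static int[][] argmaxDim1KeepDims(double[][] in) {
        int[] flat = argmaxDim1(in);
        int[][] out = new int[flat.length][1];
        for (int i = 0; i < flat.length; i++) out[i][0] = flat[i];
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{1, 5, 2}, {9, 0, 3}};
        System.out.println(Arrays.toString(argmaxDim1(in)));             // prints [1, 0]
        System.out.println(Arrays.deepToString(argmaxDim1KeepDims(in))); // prints [[1], [0]]
    }
}
```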

argmin

INDArray argmin(INDArray in, boolean keepDims, int[] dimensions)
INDArray argmin(INDArray in, int[] dimensions)

SDVariable argmin(SDVariable in, boolean keepDims, int[] dimensions)
SDVariable argmin(SDVariable in, int[] dimensions)
SDVariable argmin(String name, SDVariable in, boolean keepDims, int[] dimensions)
SDVariable argmin(String name, SDVariable in, int[] dimensions)

Argmin array reduction operation, optionally along specified dimensions.

Output values are the index of the minimum value of each slice along the specified dimension.

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

in (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

batchMmul

INDArray batchMmul(INDArray inputsA, INDArray inputsB, boolean transposeA, boolean transposeB)
INDArray batchMmul(INDArray inputsA, INDArray inputsB)

SDVariable batchMmul(SDVariable inputsA, SDVariable inputsB, boolean transposeA, boolean transposeB)
SDVariable batchMmul(SDVariable inputsA, SDVariable inputsB)
SDVariable batchMmul(String name, SDVariable inputsA, SDVariable inputsB, boolean transposeA, boolean transposeB)
SDVariable batchMmul(String name, SDVariable inputsA, SDVariable inputsB)

Matrix multiply a batch of matrices. inputsA and inputsB have to be arrays of the same length, and each pair taken from these sets has to have dimensions (M, N) and (N, K), respectively. If transposeA is true, matrices from inputsA will have shape (N, M) instead. Likewise, if transposeB is true, matrices from inputsB will have shape (K, N).

The result of this operation will be a batch of multiplied matrices. The result has the same length as both input batches, and each output matrix is of shape (M, K).

inputsA (NUMERIC) - First array of input matrices, all of shape (M, N) or (N, M)
inputsB (NUMERIC) - Second array of input matrices, all of shape (N, K) or (K, N)
transposeA - Whether to transpose A arrays or not - default = false
transposeB - Whether to transpose B arrays or not - default = false
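
The shape rules above can be checked with a plain-Java sketch of a batched matrix multiply. It is an illustration under simple assumptions (dense double arrays, no shape validation beyond the batch length) and uses a hypothetical class name rather than the ND4J API.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class BatchMmulSketch {
    // Multiplies one pair; with transposeA the stored shape of a is (N, M).
    static double[][] mmul(double[][] a, double[][] b, boolean transposeA, boolean transposeB) {
        int m = transposeA ? a[0].length : a.length;
        int n = transposeA ? a.length : a[0].length;
        int k = transposeB ? b.length : b[0].length;
        double[][] out = new double[m][k];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < k; j++) {
                for (int p = 0; p < n; p++) {
                    double av = transposeA ? a[p][i] : a[i][p];
                    double bv = transposeB ? b[j][p] : b[p][j];
                    out[i][j] += av * bv;
                }
            }
        }
        return out;
    }

    // Applies mmul pair by pair; both batches must have the same length.
    static double[][][] batchMmul(double[][][] as, double[][][] bs, boolean transposeA, boolean transposeB) {
        if (as.length != bs.length) {
            throw new IllegalArgumentException("Both batches must have the same length");
        }
        double[][][] out = new double[as.length][][];
        for (int i = 0; i < as.length; i++) out[i] = mmul(as[i], bs[i], transposeA, transposeB);
        return out;
    }

    public static void main(String[] args) {
        double[][][] as = {{{1, 2}, {3, 4}}};
        double[][][] bs = {{{5, 6}, {7, 8}}};
        System.out.println(Arrays.deepToString(batchMmul(as, bs, false, false)));
    }
}
```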

castTo

INDArray castTo(INDArray arg, DataType datatype)

SDVariable castTo(SDVariable arg, DataType datatype)
SDVariable castTo(String name, SDVariable arg, DataType datatype)

Cast the array to a new datatype - for example, Integer -> Float

arg (NDARRAY) - Input variable to cast
datatype - Datatype to cast to

concat

INDArray concat(INDArray inputs, int dimension)

SDVariable concat(SDVariable inputs, int dimension)
SDVariable concat(String name, SDVariable inputs, int dimension)

Concatenate a set of inputs along the specified dimension.

Note that inputs must have identical rank and identical dimensions, other than the dimension to stack on.

For example, if 2 inputs have shape [a, x, c] and [a, y, c] and dimension = 1, then the output has shape [a, x+y, c]

inputs (NUMERIC) - Input variables
dimension - Dimension to concatenate on
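
As a sketch of the shape rule above, the following plain-Java code concatenates two rank 2 arrays of shape [a, x] and [a, y] along dimension 1. The class and method names are hypothetical and ND4J is not involved.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class ConcatSketch {
    // [a, x] and [a, y] concatenated along dimension 1 give [a, x + y].
    static double[][] concatDim1(double[][] first, double[][] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("Inputs must match on every dimension except the concat dimension");
        }
        double[][] out = new double[first.length][];
        for (int i = 0; i < first.length; i++) {
            out[i] = new double[first[i].length + second[i].length];
            System.arraycopy(first[i], 0, out[i], 0, first[i].length);
            System.arraycopy(second[i], 0, out[i], first[i].length, second[i].length);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}}; // shape [2, 2]
        double[][] b = {{5}, {6}};       // shape [2, 1]
        // Result has shape [2, 3]
        System.out.println(Arrays.deepToString(concatDim1(a, b)));
    }
}
```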

cumprod

INDArray cumprod(INDArray in, boolean exclusive, boolean reverse, int[] axis)
INDArray cumprod(INDArray in, int[] axis)

SDVariable cumprod(SDVariable in, boolean exclusive, boolean reverse, int[] axis)
SDVariable cumprod(SDVariable in, int[] axis)
SDVariable cumprod(String name, SDVariable in, boolean exclusive, boolean reverse, int[] axis)
SDVariable cumprod(String name, SDVariable in, int[] axis)

Cumulative product operation.

For input: [ a, b, c], output is:

exclusive=false, reverse=false: [a, a*b, a*b*c]

exclusive=true, reverse=false: [1, a, a*b]

exclusive=false, reverse=true: [a*b*c, b*c, c]

exclusive=true, reverse=true: [b*c, c, 1]

in (NUMERIC) - Input variable
exclusive - If true: exclude the first value - default = false
reverse - If true: reverse the direction of the accumulation - default = false
axis - Scalar axis argument for dimension to perform cumulative product operations along (Size: AtLeast(min=1))

cumsum

INDArray cumsum(INDArray in, boolean exclusive, boolean reverse, int[] axis)
INDArray cumsum(INDArray in, int[] axis)

SDVariable cumsum(SDVariable in, boolean exclusive, boolean reverse, int[] axis)
SDVariable cumsum(SDVariable in, int[] axis)
SDVariable cumsum(String name, SDVariable in, boolean exclusive, boolean reverse, int[] axis)
SDVariable cumsum(String name, SDVariable in, int[] axis)

Cumulative sum operation.

For input: [ a, b, c], output is:

exclusive=false, reverse=false: [a, a+b, a+b+c]

exclusive=true, reverse=false: [0, a, a+b]

exclusive=false, reverse=true: [a+b+c, b+c, c]

exclusive=true, reverse=true: [b+c, c, 0]

in (NUMERIC) - Input variable
exclusive - If true: exclude the first value - default = false
reverse - If true: reverse the direction of the accumulation - default = false
axis - Scalar axis argument for dimension to perform cumulative sum operations along (Size: AtLeast(min=1))
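
The four combinations of exclusive and reverse listed above can be reproduced with a short plain-Java sketch over a 1D array. This is an illustration with a hypothetical class name, not the ND4J implementation.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class CumsumSketch {
    static double[] cumsum(double[] in, boolean exclusive, boolean reverse) {
        int n = in.length;
        double[] out = new double[n];
        double running = 0;
        for (int k = 0; k < n; k++) {
            // Walk from the end when reverse is set.
            int i = reverse ? n - 1 - k : k;
            if (exclusive) {
                out[i] = running; // sum of the elements visited before this one
                running += in[i];
            } else {
                running += in[i];
                out[i] = running; // sum including this element
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] in = {1, 2, 3};
        System.out.println(Arrays.toString(cumsum(in, false, false))); // prints [1.0, 3.0, 6.0]
        System.out.println(Arrays.toString(cumsum(in, true, false)));  // prints [0.0, 1.0, 3.0]
        System.out.println(Arrays.toString(cumsum(in, false, true)));  // prints [6.0, 5.0, 3.0]
        System.out.println(Arrays.toString(cumsum(in, true, true)));   // prints [5.0, 3.0, 0.0]
    }
}
```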

dot

INDArray dot(INDArray x, INDArray y, int[] dimensions)

SDVariable dot(SDVariable x, SDVariable y, int[] dimensions)
SDVariable dot(String name, SDVariable x, SDVariable y, int[] dimensions)

Pairwise dot product reduction along dimension

output = sum(i=0 ... size(dim)-1) x[i] * y[i]

x (NUMERIC) - first input
y (NUMERIC) - second input
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
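
For a rank 2 input reduced along dimension 1, the formula above amounts to one dot product per row. The following plain-Java sketch shows this; the names are hypothetical, both inputs are assumed to have the same shape, and ND4J is not used.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class DotSketch {
    // out[r] = sum over i of x[r][i] * y[r][i]
    static double[] dotDim1(double[][] x, double[][] y) {
        double[] out = new double[x.length];
        for (int r = 0; r < x.length; r++) {
            for (int i = 0; i < x[r].length; i++) {
                out[r] += x[r][i] * y[r][i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}};
        double[][] y = {{5, 6}, {7, 8}};
        System.out.println(Arrays.toString(dotDim1(x, y))); // prints [17.0, 53.0]
    }
}
```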

dynamicPartition

INDArray dynamicPartition(INDArray x, INDArray partitions, int numPartitions)

SDVariable dynamicPartition(SDVariable x, SDVariable partitions, int numPartitions)
SDVariable dynamicPartition(String name, SDVariable x, SDVariable partitions, int numPartitions)

Dynamically partition the input variable values into the specified number of partitions, using the indices.

Example:

input = [1,2,3,4,5]

numPartitions = 2

partitions = [1,0,0,1,0]

out[0] = [2,3,5]

out[1] = [1,4]

x (NUMERIC) - Input variable
partitions (INT) - 1D input with values 0 to numPartitions-1
numPartitions - Number of partitions, >= 1
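
The example above can be reproduced with a plain-Java sketch for 1D integer input. The class name is hypothetical and the code illustrates the partitioning rule only; it does not call ND4J.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only (not the ND4J implementation).
final class DynamicPartitionSketch {
    // Element x[i] is appended to output list number partitions[i].
    static List<List<Integer>> dynamicPartition(int[] x, int[] partitions, int numPartitions) {
        List<List<Integer>> out = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) out.add(new ArrayList<>());
        for (int i = 0; i < x.length; i++) {
            out.get(partitions[i]).add(x[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4, 5};
        int[] partitions = {1, 0, 0, 1, 0};
        // prints [[2, 3, 5], [1, 4]]
        System.out.println(dynamicPartition(input, partitions, 2));
    }
}
```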

dynamicStitch

INDArray dynamicStitch(INDArray indices, INDArray x)

SDVariable dynamicStitch(SDVariable indices, SDVariable x)
SDVariable dynamicStitch(String name, SDVariable indices, SDVariable x)

Dynamically merge the specified input arrays into a single array, using the specified indices

indices (INT) - Indices to use when merging. Must be >= 1, same length as input variables
x (NUMERIC) - Input variables.

eq

INDArray eq(INDArray x, double y)

SDVariable eq(SDVariable x, double y)
SDVariable eq(String name, SDVariable x, double y)

Equals operation: elementwise x == y

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input array
y - Double value argument to use in operation

eq

INDArray eq(INDArray x, INDArray y)

SDVariable eq(SDVariable x, SDVariable y)
SDVariable eq(String name, SDVariable x, SDVariable y)

Equal to operation: elementwise x == y

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input 1
y (NUMERIC) - Input 2

expandDims

INDArray expandDims(INDArray x, int axis)

SDVariable expandDims(SDVariable x, int axis)
SDVariable expandDims(String name, SDVariable x, int axis)

Reshape the input by adding a 1 at the specified location.

For example, if input has shape [a, b], then output shape is:

axis = 0: [1, a, b]

axis = 1: [a, 1, b]

axis = 2: [a, b, 1]

x (NDARRAY) - Input variable
axis - Axis to expand

fill

INDArray fill(INDArray shape, DataType dataType, double value)

SDVariable fill(SDVariable shape, DataType dataType, double value)
SDVariable fill(String name, SDVariable shape, DataType dataType, double value)

Generate an output variable with the specified (dynamic) shape with all elements set to the specified value

shape (INT) - Shape: must be a 1D array/variable
dataType - Datatype of the output array
value - Value to set all elements to

gather

INDArray gather(INDArray df, int[] indices, int axis)

SDVariable gather(SDVariable df, int[] indices, int axis)
SDVariable gather(String name, SDVariable df, int[] indices, int axis)

Gather slices from the input variable where the indices are specified as fixed int[] values.

Output shape is same as input shape, except for axis dimension, which has size equal to indices.length.

df (NUMERIC) - Input variable
indices - Indices to get (Size: AtLeast(min=1))
axis - Axis that the indices refer to
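
The output shape rule can be illustrated with a plain-Java sketch for a rank 2 input. The class name is hypothetical, only axis 0 and axis 1 are supported, and this is not the ND4J implementation.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class GatherSketch {
    static double[][] gather(double[][] df, int[] indices, int axis) {
        if (axis == 0) {
            // Select whole rows: output shape [indices.length, cols]
            double[][] out = new double[indices.length][];
            for (int i = 0; i < indices.length; i++) out[i] = df[indices[i]].clone();
            return out;
        }
        if (axis == 1) {
            // Select columns: output shape [rows, indices.length]
            double[][] out = new double[df.length][indices.length];
            for (int r = 0; r < df.length; r++) {
                for (int i = 0; i < indices.length; i++) out[r][i] = df[r][indices[i]];
            }
            return out;
        }
        throw new IllegalArgumentException("axis must be 0 or 1 for a rank 2 input");
    }

    public static void main(String[] args) {
        double[][] df = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.deepToString(gather(df, new int[]{2, 0}, 1)));
        System.out.println(Arrays.deepToString(gather(df, new int[]{1}, 0)));
    }
}
```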

gather

INDArray gather(INDArray df, INDArray indices, int axis)

SDVariable gather(SDVariable df, SDVariable indices, int axis)
SDVariable gather(String name, SDVariable df, SDVariable indices, int axis)

Gather slices from the input variable where the indices are specified as dynamic array values.

Output shape is same as input shape, except for axis dimension, which has size equal to indices.length.

df (NUMERIC) - Input variable
indices (INT) - Indices to get slices for. Rank 0 or 1 input
axis - Axis that the indices refer to

gatherNd

INDArray gatherNd(INDArray df, INDArray indices)

SDVariable gatherNd(SDVariable df, SDVariable indices)
SDVariable gatherNd(String name, SDVariable df, SDVariable indices)

Gather slices from df with shape specified by indices.

df (NUMERIC) -
indices (NUMERIC) -

gt

INDArray gt(INDArray x, double y)

SDVariable gt(SDVariable x, double y)
SDVariable gt(String name, SDVariable x, double y)

Greater than operation: elementwise x > y

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input array
y - Double value argument to use in operation

gt

INDArray gt(INDArray x, INDArray y)

SDVariable gt(SDVariable x, SDVariable y)
SDVariable gt(String name, SDVariable x, SDVariable y)

Greater than operation: elementwise x > y

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input 1
y (NUMERIC) - Input 2

gte

INDArray gte(INDArray x, double y)

SDVariable gte(SDVariable x, double y)
SDVariable gte(String name, SDVariable x, double y)

Greater than or equals operation: elementwise x >= y

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input array
y - Double value argument to use in operation

gte

INDArray gte(INDArray x, INDArray y)

SDVariable gte(SDVariable x, SDVariable y)
SDVariable gte(String name, SDVariable x, SDVariable y)

Greater than or equal to operation: elementwise x >= y

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input 1
y (NUMERIC) - Input 2

identity

INDArray identity(INDArray input)

SDVariable identity(SDVariable input)
SDVariable identity(String name, SDVariable input)

Elementwise identity operation: out = x

input (NUMERIC) - Input variable

invertPermutation

INDArray invertPermutation(INDArray input)

SDVariable invertPermutation(SDVariable input)
SDVariable invertPermutation(String name, SDVariable input)

Compute the inverse permutation indices for a permutation operation

Example: if input is [2, 0, 1] then output is [1, 2, 0]

The idea is that x.permute(input).permute(invertPermutation(input)) == x

input (INT) - 1D indices for permutation
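
The inverse permutation in the example above follows from setting out[input[i]] = i, as this plain-Java sketch shows. The class name is hypothetical and ND4J is not used.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class InvertPermutationSketch {
    static int[] invertPermutation(int[] input) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[input[i]] = i; // position i of the permutation maps back to i
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [1, 2, 0]
        System.out.println(Arrays.toString(invertPermutation(new int[]{2, 0, 1})));
    }
}
```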

isNumericTensor

INDArray isNumericTensor(INDArray x)

SDVariable isNumericTensor(SDVariable x)
SDVariable isNumericTensor(String name, SDVariable x)

Is the input a numeric tensor? In the current version of ND4J/SameDiff, this always returns true/1

x (NUMERIC) - Input variable

linspace

INDArray linspace(DataType dataType, double start, double stop, long number)

SDVariable linspace(DataType dataType, double start, double stop, long number)
SDVariable linspace(String name, DataType dataType, double start, double stop, long number)

Create a new 1d array with values evenly spaced between values 'start' and 'stop'

For example, linspace(start=3.0, stop=4.0, number=3) will generate [3.0, 3.5, 4.0]

dataType - Data type of the output array
start - Start value
stop - Stop value
number - Number of values to generate
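
The spacing used in the example above can be sketched in plain Java as a constant step of (stop - start) / (number - 1). The class name is hypothetical, the handling of number = 1 (returning only start) is an assumption of this sketch, and the code does not call ND4J.

```java
import java.util.Arrays;

// Plain-Java illustration only (not the ND4J implementation).
final class LinspaceSketch {
    static double[] linspace(double start, double stop, int number) {
        double[] out = new double[number];
        if (number == 1) {
            out[0] = start; // assumption: a single value is the start value
            return out;
        }
        double step = (stop - start) / (number - 1);
        for (int i = 0; i < number; i++) {
            out[i] = start + i * step;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [3.0, 3.5, 4.0]
        System.out.println(Arrays.toString(linspace(3.0, 4.0, 3)));
    }
}
```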

linspace

INDArray linspace(INDArray start, INDArray stop, INDArray number, DataType dataType)

SDVariable linspace(SDVariable start, SDVariable stop, SDVariable number, DataType dataType)
SDVariable linspace(String name, SDVariable start, SDVariable stop, SDVariable number, DataType dataType)

Create a new 1d array with values evenly spaced between values 'start' and 'stop'

For example, linspace(start=3.0, stop=4.0, number=3) will generate [3.0, 3.5, 4.0]

start (NUMERIC) - Start value
stop (NUMERIC) - Stop value
number (LONG) - Number of values to generate
dataType - Data type of the output array

lt

INDArray lt(INDArray x, double y)

SDVariable lt(SDVariable x, double y)
SDVariable lt(String name, SDVariable x, double y)

Less than operation: elementwise x < y

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input array
y - Double value argument to use in operation

lt

INDArray lt(INDArray x, INDArray y)

SDVariable lt(SDVariable x, SDVariable y)
SDVariable lt(String name, SDVariable x, SDVariable y)

Less than operation: elementwise x < y

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input 1
y (NUMERIC) - Input 2

lte

INDArray lte(INDArray x, double y)

SDVariable lte(SDVariable x, double y)
SDVariable lte(String name, SDVariable x, double y)

Less than or equals operation: elementwise x <= y

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input array
y - Double value argument to use in operation

lte

INDArray lte(INDArray x, INDArray y)

SDVariable lte(SDVariable x, SDVariable y)
SDVariable lte(String name, SDVariable x, SDVariable y)

Less than or equal to operation: elementwise x <= y

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input 1
y (NUMERIC) - Input 2

matchCondition

INDArray matchCondition(INDArray in, Condition condition)

SDVariable matchCondition(SDVariable in, Condition condition)
SDVariable matchCondition(String name, SDVariable in, Condition condition)

Returns a boolean mask of equal shape to the input, where the condition is satisfied - value 1 where satisfied, 0 otherwise

in (NUMERIC) - Input
condition - Condition

matchConditionCount

INDArray matchConditionCount(INDArray in, Condition condition)

SDVariable matchConditionCount(SDVariable in, Condition condition)
SDVariable matchConditionCount(String name, SDVariable in, Condition condition)

Returns a count of the number of elements that satisfy the condition

in (NUMERIC) - Input
condition - Condition
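
The relationship between matchCondition (a mask) and matchConditionCount (the number of matches) can be sketched in plain Java. Here `java.util.function.DoublePredicate` stands in for the ND4J Condition type, which is an assumption made to keep the sketch self-contained; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.function.DoublePredicate;

// Plain-Java illustration only (not the ND4J implementation).
final class MatchConditionSketch {
    // Mask of the same shape as the input: 1.0 where the condition holds, else 0.0.
    static double[] matchCondition(double[] in, DoublePredicate condition) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) out[i] = condition.test(in[i]) ? 1.0 : 0.0;
        return out;
    }

    // Number of elements that satisfy the condition.
    static long matchConditionCount(double[] in, DoublePredicate condition) {
        long count = 0;
        for (double v : in) {
            if (condition.test(v)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double[] in = {-1, 0, 2, 5};
        System.out.println(Arrays.toString(matchCondition(in, v -> v > 0))); // prints [0.0, 0.0, 1.0, 1.0]
        System.out.println(matchConditionCount(in, v -> v > 0));             // prints 2
    }
}
```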

matchConditionCount

INDArray matchConditionCount(INDArray in, Condition condition, boolean keepDim, int[] dimensions)
INDArray matchConditionCount(INDArray in, Condition condition, int[] dimensions)

SDVariable matchConditionCount(SDVariable in, Condition condition, boolean keepDim, int[] dimensions)
SDVariable matchConditionCount(SDVariable in, Condition condition, int[] dimensions)
SDVariable matchConditionCount(String name, SDVariable in, Condition condition, boolean keepDim, int[] dimensions)
SDVariable matchConditionCount(String name, SDVariable in, Condition condition, int[] dimensions)

Returns a count of the number of elements that satisfy the condition (for each slice along the specified dimensions)

Note that if keepDim = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDim = true: [a,1,c]

keepDim = false: [a,c]

in (NUMERIC) - Input variable
condition - Condition
keepDim - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

max

INDArray max(INDArray x, boolean keepDims, int[] dimensions)
INDArray max(INDArray x, int[] dimensions)

SDVariable max(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable max(SDVariable x, int[] dimensions)
SDVariable max(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable max(String name, SDVariable x, int[] dimensions)

Max array reduction operation, optionally along specified dimensions

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

max

INDArray max(INDArray first, INDArray second)

SDVariable max(SDVariable first, SDVariable second)
SDVariable max(String name, SDVariable first, SDVariable second)

Element-wise maximum operation: out[i] = max(first[i], second[i])

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

first (NUMERIC) - First input array
second (NUMERIC) - Second input array

mean

INDArray mean(INDArray x, boolean keepDims, int[] dimensions)
INDArray mean(INDArray x, int[] dimensions)

SDVariable mean(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable mean(SDVariable x, int[] dimensions)
SDVariable mean(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable mean(String name, SDVariable x, int[] dimensions)

Mean (average) array reduction operation, optionally along specified dimensions

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

merge

INDArray merge(INDArray x, INDArray y)

SDVariable merge(SDVariable x, SDVariable y)
SDVariable merge(String name, SDVariable x, SDVariable y)

The merge operation is a control operation that forwards either of the inputs to the output, when the first of them becomes available. If both are available, the output is undefined (either input could be forwarded to the output).

x (NUMERIC) - Input variable
y (NUMERIC) - Input variable

min

INDArray min(INDArray x, boolean keepDims, int[] dimensions)
INDArray min(INDArray x, int[] dimensions)

SDVariable min(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable min(SDVariable x, int[] dimensions)
SDVariable min(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable min(String name, SDVariable x, int[] dimensions)

Minimum array reduction operation, optionally along specified dimensions. out = min(in)

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

min

INDArray min(INDArray first, INDArray second)

SDVariable min(SDVariable first, SDVariable second)
SDVariable min(String name, SDVariable first, SDVariable second)

Element-wise minimum operation: out[i] = min(first[i], second[i])

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

first (NUMERIC) - First input array
second (NUMERIC) - Second input array

mmul

INDArray mmul(INDArray x, INDArray y, boolean transposeX, boolean transposeY, boolean transposeZ)
INDArray mmul(INDArray x, INDArray y)

SDVariable mmul(SDVariable x, SDVariable y, boolean transposeX, boolean transposeY, boolean transposeZ)
SDVariable mmul(SDVariable x, SDVariable y)
SDVariable mmul(String name, SDVariable x, SDVariable y, boolean transposeX, boolean transposeY, boolean transposeZ)
SDVariable mmul(String name, SDVariable x, SDVariable y)

Matrix multiplication: out = mmul(x,y)

Supports specifying transpose arguments to perform operations such as mmul(a^T, b), etc.

x (NUMERIC) - First input variable
y (NUMERIC) - Second input variable
transposeX - Transpose x (first argument) - default = false
transposeY - Transpose y (second argument) - default = false
transposeZ - Transpose result array - default = false

neq

INDArray neq(INDArray x, double y)

SDVariable neq(SDVariable x, double y)
SDVariable neq(String name, SDVariable x, double y)

Not equals operation: elementwise x != y

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input array
y - Double value argument to use in operation

neq

INDArray neq(INDArray x, INDArray y)

SDVariable neq(SDVariable x, SDVariable y)
SDVariable neq(String name, SDVariable x, SDVariable y)

Not equal to operation: elementwise x != y

If x and y arrays have equal shape, the output shape is the same as these inputs.

Note: supports broadcasting if x and y have different shapes and are broadcastable.

For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Return boolean array with values true where satisfied, or false otherwise.

x (NUMERIC) - Input 1
y (NUMERIC) - Input 2

norm1

INDArray norm1(INDArray x, boolean keepDims, int[] dimensions)
INDArray norm1(INDArray x, int[] dimensions)

SDVariable norm1(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable norm1(SDVariable x, int[] dimensions)
SDVariable norm1(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable norm1(String name, SDVariable x, int[] dimensions)

Norm1 (L1 norm) reduction operation: The output contains the L1 norm for each tensor/subset along the specified dimensions:

out = sum_i abs(x[i])

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - dimensions to reduce over (Size: AtLeast(min=0))

norm2

INDArray norm2(INDArray x, boolean keepDims, int[] dimensions)
INDArray norm2(INDArray x, int[] dimensions)

SDVariable norm2(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable norm2(SDVariable x, int[] dimensions)
SDVariable norm2(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable norm2(String name, SDVariable x, int[] dimensions)

Norm2 (L2 norm) reduction operation: The output contains the L2 norm for each tensor/subset along the specified dimensions:

out = sqrt(sum_i x[i]^2)

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - dimensions to reduce over (Size: AtLeast(min=0))

normmax

INDArray normmax(INDArray x, boolean keepDims, int[] dimensions)
INDArray normmax(INDArray x, int[] dimensions)

SDVariable normmax(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable normmax(SDVariable x, int[] dimensions)
SDVariable normmax(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable normmax(String name, SDVariable x, int[] dimensions)

Max norm (infinity norm) reduction operation: The output contains the max norm for each tensor/subset along the specified dimensions:

out = max(abs(x[i]))

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - dimensions to reduce over (Size: AtLeast(min=0))
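
The three norm reductions (norm1, norm2, normmax) differ only in how the elements of each subset are combined. The following is a minimal plain-Java sketch of that difference for a rank-2 input reduced along dimension 1 with keepDims = false. It does not call ND4J; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Plain-Java illustration only: NormSketch is a hypothetical name, not an ND4J class.
final class NormSketch {

    // out[i] = sum_j abs(x[i][j])
    static double[] norm1(double[][] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            for (double v : x[i]) out[i] += Math.abs(v);
        }
        return out;
    }

    // out[i] = sqrt(sum_j x[i][j]^2)
    static double[] norm2(double[][] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double sumSq = 0.0;
            for (double v : x[i]) sumSq += v * v;
            out[i] = Math.sqrt(sumSq);
        }
        return out;
    }

    // out[i] = max_j abs(x[i][j])
    static double[] normMax(double[][] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            for (double v : x[i]) out[i] = Math.max(out[i], Math.abs(v));
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{3, -4}, {0, 5}};                 // shape [2,2]
        System.out.println(Arrays.toString(norm1(x)));    // [7.0, 5.0]
        System.out.println(Arrays.toString(norm2(x)));    // [5.0, 5.0]
        System.out.println(Arrays.toString(normMax(x)));  // [4.0, 5.0]
    }
}
```

With keepDims = true the same values would be returned with shape [2,1] instead of [2].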

oneHot

INDArray oneHot(INDArray indices, int depth, int axis, double on, double off, DataType dataType)
INDArray oneHot(INDArray indices, int depth, int axis, double on, double off)

SDVariable oneHot(SDVariable indices, int depth, int axis, double on, double off, DataType dataType)
SDVariable oneHot(SDVariable indices, int depth, int axis, double on, double off)
SDVariable oneHot(String name, SDVariable indices, int depth, int axis, double on, double off, DataType dataType)
SDVariable oneHot(String name, SDVariable indices, int depth, int axis, double on, double off)

Convert the array to a one-hot array with values on and off for each entry.

If input has shape [ a, ..., n] then output has shape [ a, ..., n, depth], with out[i, ..., j, in[i,...,j]] = on and all other values set to off.

indices (NUMERIC) - Indices - value 0 to depth-1
depth - Number of classes
axis -
on - Value to set at the indexed position
off - Value to set at all other positions
dataType - Output data type - default = DataType.FLOAT

oneHot

INDArray oneHot(INDArray indices, int depth)

SDVariable oneHot(SDVariable indices, int depth)
SDVariable oneHot(String name, SDVariable indices, int depth)

Convert the array to a one-hot array with values 0 and 1 for each entry.

If input has shape [ a, ..., n] then output has shape [ a, ..., n, depth], with out[i, ..., j, in[i,...,j]] = 1 and all other values set to 0.

see oneHot(SDVariable, int, int, double, double)

indices (NUMERIC) - Indices - value 0 to depth-1
depth - Number of classes
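
As a rough illustration of the one-hot rule, here is a plain-Java sketch for a rank-1 indices array. It does not use ND4J; OneHotSketch and its method are hypothetical names, and it assumes every index lies in 0 to depth-1.

```java
import java.util.Arrays;

// Plain-Java illustration only: OneHotSketch is a hypothetical name, not an ND4J class.
final class OneHotSketch {

    // indices has shape [n]; the result has shape [n, depth].
    // Assumes every index is in the range 0..depth-1.
    static double[][] oneHot(int[] indices, int depth, double on, double off) {
        double[][] out = new double[indices.length][depth];
        for (int i = 0; i < indices.length; i++) {
            Arrays.fill(out[i], off);   // every position starts as "off"
            out[i][indices[i]] = on;    // the indexed position becomes "on"
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] out = oneHot(new int[]{2, 0, 1}, 3, 1.0, 0.0);
        System.out.println(Arrays.deepToString(out));
        // [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    }
}
```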

onesLike

INDArray onesLike(INDArray input)

SDVariable onesLike(SDVariable input)
SDVariable onesLike(String name, SDVariable input)

Return a variable of all 1s, with the same shape as the input variable. Note that this is dynamic: if the input shape changes in later execution, the returned variable's shape will also be updated

input (NUMERIC) - Input INDArray

onesLike

INDArray onesLike(INDArray input, DataType dataType)

SDVariable onesLike(SDVariable input, DataType dataType)
SDVariable onesLike(String name, SDVariable input, DataType dataType)

As per onesLike(String, SDVariable) but the output datatype may be specified

input (NUMERIC) -
dataType -

permute

INDArray permute(INDArray x, INDArray dimensions)

SDVariable permute(SDVariable x, SDVariable dimensions)
SDVariable permute(String name, SDVariable x, SDVariable dimensions)

Array permutation operation: permute the dimensions according to the specified permutation indices.

Example: if input has shape [a,b,c] and dimensions = [2,0,1] the output has shape [c,a,b]

x (NUMERIC) - Input variable
dimensions (INT) - Permute dimensions

permute

INDArray permute(INDArray x, int[] dimensions)

SDVariable permute(SDVariable x, int[] dimensions)
SDVariable permute(String name, SDVariable x, int[] dimensions)

Array permutation operation: permute the dimensions according to the specified permutation indices.

Example: if input has shape [a,b,c] and dimensions = [2,0,1] the output has shape [c,a,b]

x (NUMERIC) - Input variable
dimensions - (Size: AtLeast(min=0))
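
The shape rule in the permute example can be sketched in plain Java as follows. This only computes the output shape, not the permuted data; PermuteShapeSketch is a hypothetical helper, not part of ND4J.

```java
import java.util.Arrays;

// Plain-Java illustration only: PermuteShapeSketch is a hypothetical name.
final class PermuteShapeSketch {

    // Output dimension i takes its size from input dimension dimensions[i].
    static long[] permutedShape(long[] shape, int[] dimensions) {
        long[] out = new long[shape.length];
        for (int i = 0; i < dimensions.length; i++) {
            out[i] = shape[dimensions[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Input shape [a,b,c] = [2,3,4], dimensions = [2,0,1] gives [c,a,b]
        long[] out = permutedShape(new long[]{2, 3, 4}, new int[]{2, 0, 1});
        System.out.println(Arrays.toString(out)); // [4, 2, 3]
    }
}
```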

prod

INDArray prod(INDArray x, boolean keepDims, int[] dimensions)
INDArray prod(INDArray x, int[] dimensions)

SDVariable prod(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable prod(SDVariable x, int[] dimensions)
SDVariable prod(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable prod(String name, SDVariable x, int[] dimensions)

Product array reduction operation, optionally along specified dimensions

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

range

INDArray range(double from, double to, double step, DataType dataType)

SDVariable range(double from, double to, double step, DataType dataType)
SDVariable range(String name, double from, double to, double step, DataType dataType)

Create a new variable with a 1d array, where the values start at from and increment by step up to (but not including) to.

For example, range(1.0, 3.0, 0.5) will return [1.0, 1.5, 2.0, 2.5]

from - Initial/smallest value
to - Largest value (exclusive)
step - Step size
dataType -

range

INDArray range(INDArray from, INDArray to, INDArray step, DataType dataType)

SDVariable range(SDVariable from, SDVariable to, SDVariable step, DataType dataType)
SDVariable range(String name, SDVariable from, SDVariable to, SDVariable step, DataType dataType)

Create a new variable with a 1d array, where the values start at from and increment by step up to (but not including) to.

For example, range(1.0, 3.0, 0.5) will return [1.0, 1.5, 2.0, 2.5]

from (NUMERIC) - Initial/smallest value
to (NUMERIC) - Largest value (exclusive)
step (NUMERIC) - Step size
dataType -
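
The range semantics (start inclusive, end exclusive) can be sketched in plain Java as below. This is not the ND4J implementation; RangeSketch is a hypothetical name, and the sketch assumes step is non-zero and points from the start value towards the end value.

```java
import java.util.Arrays;

// Plain-Java illustration only: RangeSketch is a hypothetical name, not an ND4J class.
final class RangeSketch {

    // Assumes step is non-zero and has the same sign as (to - from).
    static double[] range(double from, double to, double step) {
        // Number of values in [from, to) when stepping by 'step'
        int n = (int) Math.max(0.0, Math.ceil((to - from) / step));
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = from + i * step;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(range(1.0, 3.0, 0.5)));
        // [1.0, 1.5, 2.0, 2.5]
    }
}
```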

rank

INDArray rank(INDArray in)

SDVariable rank(SDVariable in)
SDVariable rank(String name, SDVariable in)

Returns the rank (number of dimensions, i.e., length(shape)) of the specified INDArray as a 0D scalar variable

in (NUMERIC) - Input variable

replaceWhere

INDArray replaceWhere(INDArray update, INDArray from, Condition condition)

SDVariable replaceWhere(SDVariable update, SDVariable from, Condition condition)
SDVariable replaceWhere(String name, SDVariable update, SDVariable from, Condition condition)

Element-wise replace where condition:

out[i] = from[i] if condition(update[i]) is satisfied, or

out[i] = update[i] if condition(update[i]) is NOT satisfied

update (NUMERIC) - Source array
from (NUMERIC) - Replacement values array (used conditionally). Must be same shape as 'update' array
condition - Condition to check on update array elements

replaceWhere

INDArray replaceWhere(INDArray update, double value, Condition condition)

SDVariable replaceWhere(SDVariable update, double value, Condition condition)
SDVariable replaceWhere(String name, SDVariable update, double value, Condition condition)

Element-wise replace where condition:

out[i] = value if condition(update[i]) is satisfied, or

out[i] = update[i] if condition(update[i]) is NOT satisfied

update (NUMERIC) - Source array
value - Value to set at the output, if the condition is satisfied
condition - Condition to check on update array elements
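
Both replaceWhere variants follow the same element-wise rule. A minimal plain-Java sketch is shown below; it uses java.util.function.DoublePredicate as a stand-in for ND4J's Condition type, and ReplaceWhereSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.function.DoublePredicate;

// Plain-Java illustration only. DoublePredicate stands in for ND4J's Condition.
final class ReplaceWhereSketch {

    // out[i] = from[i] if condition(update[i]) holds, otherwise update[i]
    static double[] replaceWhere(double[] update, double[] from, DoublePredicate condition) {
        double[] out = new double[update.length];
        for (int i = 0; i < update.length; i++) {
            out[i] = condition.test(update[i]) ? from[i] : update[i];
        }
        return out;
    }

    // out[i] = value if condition(update[i]) holds, otherwise update[i]
    static double[] replaceWhere(double[] update, double value, DoublePredicate condition) {
        double[] out = new double[update.length];
        for (int i = 0; i < update.length; i++) {
            out[i] = condition.test(update[i]) ? value : update[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] update = {-1, 2, -3, 4};
        double[] from = {10, 20, 30, 40};
        System.out.println(Arrays.toString(replaceWhere(update, from, v -> v < 0)));
        // [10.0, 2.0, 30.0, 4.0]
        System.out.println(Arrays.toString(replaceWhere(update, 0.0, v -> v < 0)));
        // [0.0, 2.0, 0.0, 4.0]
    }
}
```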

reshape

INDArray reshape(INDArray x, INDArray shape)

SDVariable reshape(SDVariable x, SDVariable shape)
SDVariable reshape(String name, SDVariable x, SDVariable shape)

Reshape the input variable to the specified (fixed) shape. The output variable will have the same values as the input, but with the specified shape.

Note that prod(shape) must match length(input) == prod(input.shape)

x (NUMERIC) - Input variable
shape (NUMERIC) - New shape for variable

reshape

INDArray reshape(INDArray x, long[] shape)

SDVariable reshape(SDVariable x, long[] shape)
SDVariable reshape(String name, SDVariable x, long[] shape)

Reshape the input variable to the specified (fixed) shape. The output variable will have the same values as the input, but with the specified shape.

Note that prod(shape) must match length(input) == prod(input.shape)

x (NUMERIC) - Input variable
shape - New shape for variable (Size: AtLeast(min=0))
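
The constraint prod(shape) == prod(input.shape) can be checked before reshaping. The following plain-Java sketch shows the check only; ReshapeCheckSketch is a hypothetical helper and not part of ND4J.

```java
// Plain-Java illustration only: ReshapeCheckSketch is a hypothetical name.
final class ReshapeCheckSketch {

    // Product of all dimension sizes, i.e. the number of elements
    static long prod(long[] shape) {
        long p = 1;
        for (long d : shape) p *= d;
        return p;
    }

    // A reshape is only valid if the element count is unchanged
    static boolean canReshape(long[] inputShape, long[] newShape) {
        return prod(inputShape) == prod(newShape);
    }

    public static void main(String[] args) {
        System.out.println(canReshape(new long[]{2, 3, 4}, new long[]{6, 4})); // true  (24 == 24)
        System.out.println(canReshape(new long[]{2, 3, 4}, new long[]{5, 5})); // false (24 != 25)
    }
}
```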

reverse

INDArray reverse(INDArray x, int[] dimensions)

SDVariable reverse(SDVariable x, int[] dimensions)
SDVariable reverse(String name, SDVariable x, int[] dimensions)

Reverse the values of an array for the specified dimensions

If input is:

[ 1, 2, 3]

[ 4, 5, 6]

then

reverse(in, 0):

[3, 2, 1]

[6, 5, 4]

reverse(in, 1):

[4, 5, 6]

[1, 2, 3]

x (NUMERIC) - Input variable
dimensions - Dimensions to reverse along (Size: AtLeast(min=0))

reverseSequence

INDArray reverseSequence(INDArray x, INDArray seq_lengths, int seqDim, int batchDim)
INDArray reverseSequence(INDArray x, INDArray seq_lengths)

SDVariable reverseSequence(SDVariable x, SDVariable seq_lengths, int seqDim, int batchDim)
SDVariable reverseSequence(SDVariable x, SDVariable seq_lengths)
SDVariable reverseSequence(String name, SDVariable x, SDVariable seq_lengths, int seqDim, int batchDim)
SDVariable reverseSequence(String name, SDVariable x, SDVariable seq_lengths)

Reverse sequence op: for each entry i along batchDim, the first seq_lengths[i] values along seqDim are reversed

x (NUMERIC) - Input variable
seq_lengths (INT) - Length of the sequences
seqDim - Sequence dimension - default = -1
batchDim - Batch dimension - default = 0
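
A plain-Java sketch of this behaviour for a rank-2 input with batchDim = 0 and seqDim = 1 is given below. It is an illustration under those assumptions rather than the ND4J implementation; ReverseSequenceSketch is a hypothetical name.

```java
import java.util.Arrays;

// Plain-Java illustration only: ReverseSequenceSketch is a hypothetical name.
final class ReverseSequenceSketch {

    // x has shape [batch, time]; seqLengths has shape [batch].
    // For each row b, the first seqLengths[b] values are reversed; the rest are unchanged.
    static int[][] reverseSequence(int[][] x, int[] seqLengths) {
        int[][] out = new int[x.length][];
        for (int b = 0; b < x.length; b++) {
            out[b] = x[b].clone();
            int len = seqLengths[b];
            for (int t = 0; t < len; t++) {
                out[b][t] = x[b][len - 1 - t];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] x = {{1, 2, 3, 4}, {5, 6, 7, 8}};
        int[][] out = reverseSequence(x, new int[]{3, 2});
        System.out.println(Arrays.deepToString(out));
        // [[3, 2, 1, 4], [6, 5, 7, 8]]
    }
}
```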

scalarFloorMod

INDArray scalarFloorMod(INDArray in, double value)

SDVariable scalarFloorMod(SDVariable in, double value)
SDVariable scalarFloorMod(String name, SDVariable in, double value)

Element-wise scalar floor modulus operation: out = floorMod(in, value).

i.e., returns the remainder after division by 'value'

in (NUMERIC) - Input variable
value - Scalar value to divide by
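
Floor modulus differs from Java's % operator for negative inputs: the result takes the sign of the divisor. The plain-Java sketch below assumes the usual definition floorMod(x, y) = x - floor(x / y) * y; FloorModSketch is a hypothetical name and the code does not call ND4J.

```java
import java.util.Arrays;

// Plain-Java illustration only: FloorModSketch is a hypothetical name.
final class FloorModSketch {

    // Assumed definition: x - floor(x / y) * y
    static double floorMod(double x, double y) {
        return x - Math.floor(x / y) * y;
    }

    // Element-wise application with a scalar divisor
    static double[] scalarFloorMod(double[] in, double value) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = floorMod(in[i], value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(scalarFloorMod(new double[]{7, -7, 5.5}, 3)));
        // [1.0, 2.0, 2.5]
        System.out.println(-7.0 % 3.0); // -1.0 (Java's % keeps the sign of the dividend)
    }
}
```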

scalarMax

INDArray scalarMax(INDArray in, double value)

SDVariable scalarMax(SDVariable in, double value)
SDVariable scalarMax(String name, SDVariable in, double value)

Element-wise scalar maximum operation: out = max(in, value)

in (NUMERIC) - Input variable
value - Scalar value to compare

scalarMin

INDArray scalarMin(INDArray in, double value)

SDVariable scalarMin(SDVariable in, double value)
SDVariable scalarMin(String name, SDVariable in, double value)

Element-wise scalar minimum operation: out = min(in, value)

in (NUMERIC) - Input variable
value - Scalar value to compare

scalarSet

INDArray scalarSet(INDArray in, double set)

SDVariable scalarSet(SDVariable in, double set)
SDVariable scalarSet(String name, SDVariable in, double set)

Return a variable with equal shape to the input, but all elements set to value 'set'

in (NUMERIC) - Input variable
set - Value to set

scatterAdd

INDArray scatterAdd(INDArray ref, INDArray indices, INDArray updates)

SDVariable scatterAdd(SDVariable ref, SDVariable indices, SDVariable updates)
SDVariable scatterAdd(String name, SDVariable ref, SDVariable indices, SDVariable updates)

Scatter addition operation.

If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] + updates[...]

If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] + updates[i, ...]

If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] + updates[i, ..., k, ...]

Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

ref (NUMERIC) - Initial/source variable
indices (NUMERIC) - Indices array
updates (NUMERIC) - Updates to add to the initial/source array
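
The rank-1 case of scatter addition, including the accumulation of repeated indices, can be sketched in plain Java as follows. This is an illustration only and does not call ND4J; ScatterAddSketch is a hypothetical name.

```java
import java.util.Arrays;

// Plain-Java illustration only: ScatterAddSketch is a hypothetical name.
final class ScatterAddSketch {

    // ref has shape [rows, cols]; indices has shape [n]; updates has shape [n, cols].
    // Row i of updates is added to row indices[i] of a copy of ref.
    static double[][] scatterAdd(double[][] ref, int[] indices, double[][] updates) {
        double[][] out = new double[ref.length][];
        for (int r = 0; r < ref.length; r++) {
            out[r] = ref[r].clone();
        }
        for (int i = 0; i < indices.length; i++) {
            for (int j = 0; j < updates[i].length; j++) {
                out[indices[i]][j] += updates[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] ref = {{1, 1}, {2, 2}, {3, 3}};
        int[] indices = {0, 2, 0};                       // row 0 is updated twice
        double[][] updates = {{10, 10}, {20, 20}, {30, 30}};
        System.out.println(Arrays.deepToString(scatterAdd(ref, indices, updates)));
        // [[41.0, 41.0], [2.0, 2.0], [23.0, 23.0]]
    }
}
```

The other scatter ops follow the same indexing pattern with a different combining operation in place of the addition.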

scatterDiv

INDArray scatterDiv(INDArray ref, INDArray indices, INDArray updates)

SDVariable scatterDiv(SDVariable ref, SDVariable indices, SDVariable updates)
SDVariable scatterDiv(String name, SDVariable ref, SDVariable indices, SDVariable updates)

Scatter division operation.

If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] / updates[...]

If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] / updates[i, ...]

If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] / updates[i, ..., k, ...]

Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

ref (NUMERIC) - Initial/source variable
indices (NUMERIC) - Indices array
updates (NUMERIC) - Updates to divide the initial/source array by

scatterMax

INDArray scatterMax(INDArray ref, INDArray indices, INDArray updates)

SDVariable scatterMax(SDVariable ref, SDVariable indices, SDVariable updates)
SDVariable scatterMax(String name, SDVariable ref, SDVariable indices, SDVariable updates)

Scatter max operation.

If indices is rank 0 (a scalar), then out[index, ...] = max(out[index, ...], updates[...])

If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = max(out[indices[i], ...], updates[i, ...])

If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = max(out[indices[i], ..., indices[k], ...], updates[i, ..., k, ...])

Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

ref (NUMERIC) - Initial/source variable
indices (NUMERIC) - Indices array
updates (NUMERIC) - Updates to take the element-wise maximum with the initial/source array

scatterMin

INDArray scatterMin(INDArray ref, INDArray indices, INDArray updates)

SDVariable scatterMin(SDVariable ref, SDVariable indices, SDVariable updates)
SDVariable scatterMin(String name, SDVariable ref, SDVariable indices, SDVariable updates)

Scatter min operation.

If indices is rank 0 (a scalar), then out[index, ...] = min(out[index, ...], updates[...])

If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = min(out[indices[i], ...], updates[i, ...])

If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = min(out[indices[i], ..., indices[k], ...], updates[i, ..., k, ...])

Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

ref (NUMERIC) - Initial/source variable
indices (NUMERIC) - Indices array
updates (NUMERIC) - Updates to take the element-wise minimum with the initial/source array

scatterMul

INDArray scatterMul(INDArray ref, INDArray indices, INDArray updates)

SDVariable scatterMul(SDVariable ref, SDVariable indices, SDVariable updates)
SDVariable scatterMul(String name, SDVariable ref, SDVariable indices, SDVariable updates)

Scatter multiplication operation.

If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] * updates[...]

If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] * updates[i, ...]

If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] * updates[i, ..., k, ...]

Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

ref (NUMERIC) - Initial/source variable
indices (NUMERIC) - Indices array
updates (NUMERIC) - Updates to multiply the initial/source array by

scatterSub

INDArray scatterSub(INDArray ref, INDArray indices, INDArray updates)

SDVariable scatterSub(SDVariable ref, SDVariable indices, SDVariable updates)
SDVariable scatterSub(String name, SDVariable ref, SDVariable indices, SDVariable updates)

Scatter subtraction operation.

If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] - updates[...]

If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] - updates[i, ...]

If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] - updates[i, ..., k, ...]

Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

ref (NUMERIC) - Initial/source variable
indices (NUMERIC) - Indices array
updates (NUMERIC) - Updates to subtract from the initial/source array

scatterUpdate

INDArray scatterUpdate(INDArray ref, INDArray indices, INDArray updates)

SDVariable scatterUpdate(SDVariable ref, SDVariable indices, SDVariable updates)
SDVariable scatterUpdate(String name, SDVariable ref, SDVariable indices, SDVariable updates)

Scatter update operation.

If indices is rank 0 (a scalar), then out[index, ...] = updates[...]

If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = updates[i, ...]

If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = updates[i, ..., k, ...]

Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

ref (NUMERIC) - Initial/source variable
indices (NUMERIC) - Indices array
updates (NUMERIC) - Updates to write into the initial/source array

segmentMax

INDArray segmentMax(INDArray data, INDArray segmentIds)

SDVariable segmentMax(SDVariable data, SDVariable segmentIds)
SDVariable segmentMax(String name, SDVariable data, SDVariable segmentIds)

Segment max operation.

If data = [3, 6, 1, 4, 9, 2, 8]

segmentIds = [0, 0, 1, 1, 1, 2, 2]

then output = [6, 9, 8] = [max(3,6), max(1,4,9), max(2,8)]

Note that the segment IDs must be sorted from smallest to largest segment.

See unsortedSegmentMax(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

data (NDARRAY) - Data to perform segment max on
segmentIds (NUMERIC) - Variable for the segment IDs

segmentMean

INDArray segmentMean(INDArray data, INDArray segmentIds)

SDVariable segmentMean(SDVariable data, SDVariable segmentIds)
SDVariable segmentMean(String name, SDVariable data, SDVariable segmentIds)

Segment mean operation.

If data = [3, 6, 1, 4, 9, 2, 8]

segmentIds = [0, 0, 1, 1, 1, 2, 2]

then output = [4.5, 4.67, 5] = [mean(3,6), mean(1,4,9), mean(2,8)] (4.67 is rounded)

Note that the segment IDs must be sorted from smallest to largest segment.

See unsortedSegmentMean(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

data (NDARRAY) - Data to perform segment mean on
segmentIds (NUMERIC) - Variable for the segment IDs

segmentMin

INDArray segmentMin(INDArray data, INDArray segmentIds)

SDVariable segmentMin(SDVariable data, SDVariable segmentIds)
SDVariable segmentMin(String name, SDVariable data, SDVariable segmentIds)

Segment min operation.

If data = [3, 6, 1, 4, 9, 2, 8]

segmentIds = [0, 0, 1, 1, 1, 2, 2]

then output = [3, 1, 2] = [min(3,6), min(1,4,9), min(2,8)]

Note that the segment IDs must be sorted from smallest to largest segment.

See unsortedSegmentMin(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

data (NDARRAY) - Data to perform segment min on
segmentIds (NUMERIC) - Variable for the segment IDs

segmentProd

INDArray segmentProd(INDArray data, INDArray segmentIds)

SDVariable segmentProd(SDVariable data, SDVariable segmentIds)
SDVariable segmentProd(String name, SDVariable data, SDVariable segmentIds)

Segment product operation.

If data = [3, 6, 1, 4, 9, 2, 8]

segmentIds = [0, 0, 1, 1, 1, 2, 2]

then output = [18, 36, 16] = [prod(3,6), prod(1,4,9), prod(2,8)]

Note that the segment IDs must be sorted from smallest to largest segment.

See unsortedSegmentProd(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

data (NDARRAY) - Data to perform segment product on
segmentIds (NUMERIC) - Variable for the segment IDs

segmentSum

INDArray segmentSum(INDArray data, INDArray segmentIds)

SDVariable segmentSum(SDVariable data, SDVariable segmentIds)
SDVariable segmentSum(String name, SDVariable data, SDVariable segmentIds)

Segment sum operation.

If data = [3, 6, 1, 4, 9, 2, 8]

segmentIds = [0, 0, 1, 1, 1, 2, 2]

then output = [9, 14, 10] = [sum(3,6), sum(1,4,9), sum(2,8)]

Note that the segment IDs must be sorted from smallest to largest segment.

See unsortedSegmentSum(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

data (NDARRAY) - Data to perform segment sum on
segmentIds (NUMERIC) - Variable for the segment IDs
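
The segment ops share one pattern: values with the same segment ID are combined with a single reduction. The plain-Java sketch below applies that pattern to the example data for max, min, sum and product. It does not call ND4J; SegmentSketch is a hypothetical name, and it assumes sorted segment IDs that start at 0.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

// Plain-Java illustration only: SegmentSketch is a hypothetical name.
final class SegmentSketch {

    // Assumes segmentIds is sorted and starts at 0, so (last id + 1) is the segment count.
    static double[] segmentReduce(double[] data, int[] segmentIds, DoubleBinaryOperator op) {
        int numSegments = segmentIds[segmentIds.length - 1] + 1;
        double[] out = new double[numSegments];
        boolean[] seen = new boolean[numSegments];
        for (int i = 0; i < data.length; i++) {
            int s = segmentIds[i];
            // First value of a segment initialises it; later values are combined with op
            out[s] = seen[s] ? op.applyAsDouble(out[s], data[i]) : data[i];
            seen[s] = true;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {3, 6, 1, 4, 9, 2, 8};
        int[] ids = {0, 0, 1, 1, 1, 2, 2};
        System.out.println(Arrays.toString(segmentReduce(data, ids, Math::max)));        // [6.0, 9.0, 8.0]
        System.out.println(Arrays.toString(segmentReduce(data, ids, Math::min)));        // [3.0, 1.0, 2.0]
        System.out.println(Arrays.toString(segmentReduce(data, ids, Double::sum)));      // [9.0, 14.0, 10.0]
        System.out.println(Arrays.toString(segmentReduce(data, ids, (a, b) -> a * b)));  // [18.0, 36.0, 16.0]
    }
}
```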

sequenceMask

INDArray sequenceMask(INDArray lengths, int maxLen, DataType dataType)

SDVariable sequenceMask(SDVariable lengths, int maxLen, DataType dataType)
SDVariable sequenceMask(String name, SDVariable lengths, int maxLen, DataType dataType)

Generate a sequence mask (with values 0 or 1) based on the specified lengths

Specifically, out[i, ..., k, j] = (j < lengths[i, ..., k] ? 1.0 : 0.0)

lengths (NUMERIC) - Lengths of the sequences
maxLen - Maximum sequence length
dataType -

sequenceMask

INDArray sequenceMask(INDArray lengths, INDArray maxLen, DataType dataType)

SDVariable sequenceMask(SDVariable lengths, SDVariable maxLen, DataType dataType)
SDVariable sequenceMask(String name, SDVariable lengths, SDVariable maxLen, DataType dataType)

Generate a sequence mask (with values 0 or 1) based on the specified lengths

Specifically, out[i, ..., k, j] = (j < lengths[i, ..., k] ? 1.0 : 0.0)

lengths (NUMERIC) - Lengths of the sequences
maxLen (INT) - Maximum sequence length
dataType -

sequenceMask

INDArray sequenceMask(INDArray lengths, DataType dataType)

SDVariable sequenceMask(SDVariable lengths, DataType dataType)
SDVariable sequenceMask(String name, SDVariable lengths, DataType dataType)

see sequenceMask(String, SDVariable, SDVariable, DataType)

lengths (NUMERIC) -
dataType -
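
The mask rule out[i, j] = (j < lengths[i] ? 1.0 : 0.0) can be sketched in plain Java for a rank-1 lengths array with an explicit maxLen. This is an illustration only; SequenceMaskSketch is a hypothetical name and the code does not call ND4J.

```java
import java.util.Arrays;

// Plain-Java illustration only: SequenceMaskSketch is a hypothetical name.
final class SequenceMaskSketch {

    // lengths has shape [n]; the result has shape [n, maxLen].
    static double[][] sequenceMask(int[] lengths, int maxLen) {
        double[][] out = new double[lengths.length][maxLen];
        for (int i = 0; i < lengths.length; i++) {
            for (int j = 0; j < maxLen; j++) {
                out[i][j] = j < lengths[i] ? 1.0 : 0.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(sequenceMask(new int[]{1, 3, 2}, 4)));
        // [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    }
}
```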

shape

INDArray shape(INDArray input)

SDVariable shape(SDVariable input)
SDVariable shape(String name, SDVariable input)

Returns the shape of the specified INDArray as a 1D INDArray

input (NUMERIC) - Input variable

size

INDArray size(INDArray in)

SDVariable size(SDVariable in)
SDVariable size(String name, SDVariable in)

Returns the size (number of elements, i.e., prod(shape)) of the specified INDArray as a 0D scalar variable

in (NUMERIC) - Input variable

sizeAt

INDArray sizeAt(INDArray in, int dimension)

SDVariable sizeAt(SDVariable in, int dimension)
SDVariable sizeAt(String name, SDVariable in, int dimension)

Returns a rank 0 (scalar) variable for the size of the specified dimension.

For example, if X has shape [10,20,30] then sizeAt(X,1)=20. Similarly, sizeAt(X,-1)=30

in (NUMERIC) - Input variable
dimension - Dimension to get size of
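
The relationship between rank, size and sizeAt (including negative dimension indices) can be illustrated with a plain-Java sketch that works on a shape array. ShapeQuerySketch is a hypothetical helper, not part of ND4J.

```java
// Plain-Java illustration only: ShapeQuerySketch is a hypothetical name.
final class ShapeQuerySketch {

    // rank = number of dimensions = length(shape)
    static int rank(long[] shape) {
        return shape.length;
    }

    // size = number of elements = prod(shape)
    static long size(long[] shape) {
        long p = 1;
        for (long d : shape) p *= d;
        return p;
    }

    // Negative dimensions count from the end: -1 is the last dimension
    static long sizeAt(long[] shape, int dimension) {
        int d = dimension < 0 ? dimension + shape.length : dimension;
        return shape[d];
    }

    public static void main(String[] args) {
        long[] shape = {10, 20, 30};
        System.out.println(rank(shape));        // 3
        System.out.println(size(shape));        // 6000
        System.out.println(sizeAt(shape, 1));   // 20
        System.out.println(sizeAt(shape, -1));  // 30
    }
}
```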

slice

INDArray slice(INDArray input, int[] begin, int[] size)

SDVariable slice(SDVariable input, int[] begin, int[] size)
SDVariable slice(String name, SDVariable input, int[] begin, int[] size)

Get a subset of the specified input, by specifying the first element and the size of the array.

For example, if input is:

[a, b, c]

[d, e, f]

then slice(input, begin=[0,1], size=[2,1]) will return:

[b]

[e]

Note that for each dimension i, begin[i] + size[i] <= input.size(i)

input (NUMERIC) - input Variable to get subset of
begin - Beginning index. Must be same length as rank of input array (Size: AtLeast(min=1))
size - Size of the output array. Must be same length as rank of input array (Size: AtLeast(min=1))

slice

INDArray slice(INDArray input, INDArray begin, INDArray size)

SDVariable slice(SDVariable input, SDVariable begin, SDVariable size)
SDVariable slice(String name, SDVariable input, SDVariable begin, SDVariable size)

Get a subset of the specified input, by specifying the first element and the size of the array.

For example, if input is:

[a, b, c]

[d, e, f]

then slice(input, begin=[0,1], size=[2,1]) will return:

[b]

[e]

Note that for each dimension i, begin[i] + size[i] <= input.size(i)

input (NUMERIC) - input Variable to get subset of
begin (INT) - Beginning index. Must be same length as rank of input array
size (INT) - Size of the output array. Must be same length as rank of input array
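
The slice example can be reproduced with a small plain-Java sketch for rank-2 inputs. It is illustrative only and does not call ND4J; SliceSketch is a hypothetical name, and the sketch assumes begin[i] + size[i] <= input.size(i) for both dimensions.

```java
import java.util.Arrays;

// Plain-Java illustration only: SliceSketch is a hypothetical name.
final class SliceSketch {

    // Copies a size[0] x size[1] block starting at (begin[0], begin[1]).
    // Assumes begin[i] + size[i] does not exceed the input size in dimension i.
    static String[][] slice(String[][] input, int[] begin, int[] size) {
        String[][] out = new String[size[0]][size[1]];
        for (int i = 0; i < size[0]; i++) {
            for (int j = 0; j < size[1]; j++) {
                out[i][j] = input[begin[0] + i][begin[1] + j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] input = {{"a", "b", "c"}, {"d", "e", "f"}};
        String[][] out = slice(input, new int[]{0, 1}, new int[]{2, 1});
        System.out.println(Arrays.deepToString(out)); // [[b], [e]]
    }
}
```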

squaredNorm

INDArray squaredNorm(INDArray x, boolean keepDims, int[] dimensions)
INDArray squaredNorm(INDArray x, int[] dimensions)

SDVariable squaredNorm(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable squaredNorm(SDVariable x, int[] dimensions)
SDVariable squaredNorm(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable squaredNorm(String name, SDVariable x, int[] dimensions)

Squared L2 norm: see norm2(String, SDVariable, boolean, int...)

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) -
keepDims - - default = false
dimensions - (Size: AtLeast(min=0))

squeeze

INDArray squeeze(INDArray x, int axis)

SDVariable squeeze(SDVariable x, int axis)
SDVariable squeeze(String name, SDVariable x, int axis)

Remove a single dimension of size 1.

For example, if input has shape [a,b,1,c] then squeeze(input, 2) returns an array of shape [a,b,c]

x (NUMERIC) - Input variable
axis - Size 1 dimension to remove

stack

INDArray stack(INDArray values, int axis)

SDVariable stack(SDVariable values, int axis)
SDVariable stack(String name, SDVariable values, int axis)

Stack a set of N INDArray of rank X into one rank X+1 variable.

If inputs have shape [a,b,c] then output has shape:

axis = 0: [N,a,b,c]

axis = 1: [a,N,b,c]

axis = 2: [a,b,N,c]

axis = 3: [a,b,c,N]

see unstack(String[], SDVariable, int, int)

values (NDARRAY) - Input variables to stack. Must have the same shape for all inputs
axis - Axis to stack on
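
The output-shape table for stack amounts to inserting N at position axis. A plain-Java sketch of that shape rule is shown below; it computes shapes only, and StackShapeSketch is a hypothetical helper rather than an ND4J class.

```java
import java.util.Arrays;

// Plain-Java illustration only: StackShapeSketch is a hypothetical name.
final class StackShapeSketch {

    // Stacking n arrays of shape inputShape along 'axis' inserts n at that position.
    static long[] stackedShape(long[] inputShape, int n, int axis) {
        long[] out = new long[inputShape.length + 1];
        for (int i = 0, j = 0; i < out.length; i++) {
            out[i] = (i == axis) ? n : inputShape[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] in = {2, 3, 4};  // [a,b,c]
        System.out.println(Arrays.toString(stackedShape(in, 5, 0))); // [5, 2, 3, 4]
        System.out.println(Arrays.toString(stackedShape(in, 5, 1))); // [2, 5, 3, 4]
        System.out.println(Arrays.toString(stackedShape(in, 5, 3))); // [2, 3, 4, 5]
    }
}
```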

standardDeviation

INDArray standardDeviation(INDArray x, boolean biasCorrected, boolean keepDims, int[] dimensions)
INDArray standardDeviation(INDArray x, boolean biasCorrected, int[] dimensions)

SDVariable standardDeviation(SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
SDVariable standardDeviation(SDVariable x, boolean biasCorrected, int[] dimensions)
SDVariable standardDeviation(String name, SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
SDVariable standardDeviation(String name, SDVariable x, boolean biasCorrected, int[] dimensions)

Standard deviation array reduction operation, optionally along specified dimensions

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
biasCorrected - If true: divide by (N-1) (i.e., sample stdev). If false: divide by N (population stdev)
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
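
The effect of biasCorrected can be seen in a plain-Java sketch of a full-array standard deviation. This is not the ND4J implementation; StdDevSketch is a hypothetical name, and the sketch assumes at least two elements when biasCorrected is true.

```java
// Plain-Java illustration only: StdDevSketch is a hypothetical name.
final class StdDevSketch {

    // biasCorrected = true divides by (N-1) (sample); false divides by N (population).
    static double standardDeviation(double[] x, boolean biasCorrected) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;

        double sumSq = 0.0;
        for (double v : x) sumSq += (v - mean) * (v - mean);

        double divisor = biasCorrected ? x.length - 1 : x.length;
        return Math.sqrt(sumSq / divisor);
    }

    public static void main(String[] args) {
        double[] x = {2, 4, 4, 4, 5, 5, 7, 9};  // mean 5, sum of squared deviations 32
        System.out.println(standardDeviation(x, false)); // 2.0 (sqrt(32/8))
        System.out.println(standardDeviation(x, true));  // approximately 2.138 (sqrt(32/7))
    }
}
```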

stridedSlice

INDArray stridedSlice(INDArray in, long[] begin, long[] end, long[] strides, int beginMask, int endMask, int ellipsisMask, int newAxisMask, int shrinkAxisMask)
INDArray stridedSlice(INDArray in, long[] begin, long[] end, long[] strides)

SDVariable stridedSlice(SDVariable in, long[] begin, long[] end, long[] strides, int beginMask, int endMask, int ellipsisMask, int newAxisMask, int shrinkAxisMask)
SDVariable stridedSlice(SDVariable in, long[] begin, long[] end, long[] strides)
SDVariable stridedSlice(String name, SDVariable in, long[] begin, long[] end, long[] strides, int beginMask, int endMask, int ellipsisMask, int newAxisMask, int shrinkAxisMask)
SDVariable stridedSlice(String name, SDVariable in, long[] begin, long[] end, long[] strides)

Get a subset of the specified input, by specifying the first element, last element, and the strides.

For example, if input is:

[a, b, c]

[d, e, f]

[g, h, i]

then stridedSlice(input, begin=[0,1], end=[2,2], strides=[2,1], all masks = 0) will return:

[b, c]

[h, i]

in (NUMERIC) - Variable to get subset of
begin - Beginning index (Size: AtLeast(min=1))
end - End index (Size: AtLeast(min=1))
strides - Stride ("step size") for each dimension. For example, stride of 2 means take every second element. (Size: AtLeast(min=1))
beginMask - Bit mask: If the ith bit is set to 1, then the value in the begin long[] is ignored, and a value of 0 is used instead for the beginning index for that dimension - default = 0
endMask - Bit mask: If the ith bit is set to 1, then the value in the end long[] is ignored, and a value of size(i)-1 is used instead for the end index for that dimension - default = 0
ellipsisMask - Bit mask: only one non-zero value is allowed here. If a non-zero value is set, then other dimensions are inserted as required at the specified position - default = 0
newAxisMask - Bit mask: if the ith bit is set to 1, then the begin/end/stride values are ignored, and a size 1 dimension is inserted at this point - default = 0
shrinkAxisMask - Bit mask: if the ith bit is set to 1, then the begin/end/stride values are ignored, and a size 1 dimension is removed at this point. Note that begin/end/stride values must result in a size 1 output for these dimensions - default = 0

sum

INDArray sum(INDArray x, boolean keepDims, int[] dimensions)
INDArray sum(INDArray x, int[] dimensions)

SDVariable sum(SDVariable x, boolean keepDims, int[] dimensions)
SDVariable sum(SDVariable x, int[] dimensions)
SDVariable sum(String name, SDVariable x, boolean keepDims, int[] dimensions)
SDVariable sum(String name, SDVariable x, int[] dimensions)

Sum array reduction operation, optionally along specified dimensions.

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

switchOp

INDArray[] switchOp(INDArray x, INDArray predicate)

SDVariable[] switchOp(SDVariable x, SDVariable predicate)
SDVariable[] switchOp(String name, SDVariable x, SDVariable predicate)

Switch operation

Predicate - if false, values are output to left (first) branch/output; if true, to right (second) branch/output

x (NDARRAY) - Input variable
predicate (BOOL) - Predicate - if false, values are output to left (first) branch/output; if true, to right (second) branch/output

tensorMmul

INDArray tensorMmul(INDArray x, INDArray y, int[] dimensionsX, int[] dimensionsY, boolean transposeX, boolean transposeY, boolean transposeZ)
INDArray tensorMmul(INDArray x, INDArray y, int[] dimensionsX, int[] dimensionsY)

SDVariable tensorMmul(SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY, boolean transposeX, boolean transposeY, boolean transposeZ)
SDVariable tensorMmul(SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY)
SDVariable tensorMmul(String name, SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY, boolean transposeX, boolean transposeY, boolean transposeZ)
SDVariable tensorMmul(String name, SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY)

//TODO: Ops must be documented.

x (NUMERIC) - Input variable x
y (NUMERIC) - Input variable y
dimensionsX - dimensions for first input array (x) (Size: AtLeast(min=1))
dimensionsY - dimensions for second input array (y) (Size: AtLeast(min=1))
transposeX - Transpose x (first argument) - default = false
transposeY - Transpose y (second argument) - default = false
transposeZ - Transpose result array - default = false

tile

INDArray tile(INDArray x, INDArray repeat)

SDVariable tile(SDVariable x, SDVariable repeat)
SDVariable tile(String name, SDVariable x, SDVariable repeat)

Repeat (tile) the input tensor the specified number of times.

For example, if input is

[1, 2]

[3, 4]

and repeat is [2, 3]

then output is

[1, 2, 1, 2, 1, 2]

[3, 4, 3, 4, 3, 4]

[1, 2, 1, 2, 1, 2]

[3, 4, 3, 4, 3, 4]

x (NDARRAY) - Input variable
repeat (INT) - Number of times to repeat in each axis. Must have length equal to the rank of the input array
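The tiling rule in the example above can be reproduced with a short plain-Java sketch for the rank-2 case. This is an illustration of the semantics only, not the ND4J implementation; TileSketch and its tile method are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain Java arrays, not the ND4J/SameDiff API.
final class TileSketch {

    // Tiles a matrix repeat[0] times along the rows and repeat[1] times along the columns.
    static int[][] tile(int[][] x, int[] repeat) {
        if (repeat.length != 2) {
            throw new IllegalArgumentException("repeat must have one entry per axis of the input");
        }
        int rows = x.length, cols = x[0].length;
        int[][] out = new int[rows * repeat[0]][cols * repeat[1]];
        for (int i = 0; i < out.length; i++) {
            for (int j = 0; j < out[0].length; j++) {
                // Wrap around the input so that it repeats along both axes
                out[i][j] = x[i % rows][j % cols];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] x = { {1, 2}, {3, 4} };
        for (int[] row : tile(x, new int[] {2, 3})) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With the input and repeat values from the example, the sketch produces the same 4 x 6 result shown above.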

tile

INDArray tile(INDArray x, int[] repeat)

SDVariable tile(SDVariable x, int[] repeat)
SDVariable tile(String name, SDVariable x, int[] repeat)

see tile(String, SDVariable, int...)

x (NDARRAY) - Input variable
repeat - Number of times to repeat in each axis (Size: AtLeast(min=1))

transpose

INDArray transpose(INDArray x)

SDVariable transpose(SDVariable x)
SDVariable transpose(String name, SDVariable x)

Matrix transpose operation: If input has shape [a,b] output has shape [b,a]

x (NDARRAY) - Input variable

unsortedSegmentMax

INDArray unsortedSegmentMax(INDArray data, INDArray segmentIds, int numSegments)

SDVariable unsortedSegmentMax(SDVariable data, SDVariable segmentIds, int numSegments)
SDVariable unsortedSegmentMax(String name, SDVariable data, SDVariable segmentIds, int numSegments)

Unsorted segment max operation. As per segmentMax(String, SDVariable, SDVariable) but without the requirement for the indices to be sorted.

If data = [1, 3, 2, 6, 4, 9, 8]

segmentIds = [1, 0, 2, 0, 1, 1, 2]

then output = [6, 9, 8] = [max(3,6), max(1,4,9), max(2,8)]

data (NUMERIC) - Data (variable) to perform unsorted segment max on
segmentIds (NUMERIC) - Variable for the segment IDs
numSegments - Number of segments
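The worked example above can be checked with a plain-Java sketch of the reduction. It is independent of ND4J; the class name is hypothetical, and leaving empty segments at negative infinity is an assumption of this sketch, not documented library behaviour.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain Java arrays, not the ND4J/SameDiff API.
final class UnsortedSegmentMaxSketch {

    static double[] unsortedSegmentMax(double[] data, int[] segmentIds, int numSegments) {
        if (data.length != segmentIds.length) {
            throw new IllegalArgumentException("data and segmentIds must have the same length");
        }
        double[] out = new double[numSegments];
        // Assumption of this sketch: a segment with no values stays at -infinity
        Arrays.fill(out, Double.NEGATIVE_INFINITY);
        for (int i = 0; i < data.length; i++) {
            int segment = segmentIds[i];
            // The ids may appear in any order; each value updates its own segment
            out[segment] = Math.max(out[segment], data[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {1, 3, 2, 6, 4, 9, 8};
        int[] segmentIds = {1, 0, 2, 0, 1, 1, 2};
        System.out.println(Arrays.toString(unsortedSegmentMax(data, segmentIds, 3)));
        // prints [6.0, 9.0, 8.0]
    }
}
```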

unsortedSegmentMean

INDArray unsortedSegmentMean(INDArray data, INDArray segmentIds, int numSegments)

SDVariable unsortedSegmentMean(SDVariable data, SDVariable segmentIds, int numSegments)
SDVariable unsortedSegmentMean(String name, SDVariable data, SDVariable segmentIds, int numSegments)

Unsorted segment mean operation. As per segmentMean(String, SDVariable, SDVariable) but without the requirement for the indices to be sorted.

If data = [1, 3, 2, 6, 4, 9, 8]

segmentIds = [1, 0, 2, 0, 1, 1, 2]

then output = [4.5, 4.667, 5] = [mean(3,6), mean(1,4,9), mean(2,8)]

data (NUMERIC) - Data (variable) to perform unsorted segment mean on
segmentIds (NUMERIC) - Variable for the segment IDs
numSegments - Number of segments
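Unlike max or min, the mean needs both a running sum and a count per segment. A minimal plain-Java sketch of that bookkeeping follows; the names are hypothetical, and leaving empty segments at 0 is an assumption of this sketch rather than documented library behaviour.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain Java arrays, not the ND4J/SameDiff API.
final class UnsortedSegmentMeanSketch {

    static double[] unsortedSegmentMean(double[] data, int[] segmentIds, int numSegments) {
        if (data.length != segmentIds.length) {
            throw new IllegalArgumentException("data and segmentIds must have the same length");
        }
        double[] sums = new double[numSegments];
        int[] counts = new int[numSegments];
        for (int i = 0; i < data.length; i++) {
            sums[segmentIds[i]] += data[i];
            counts[segmentIds[i]]++;
        }
        double[] out = new double[numSegments];
        for (int s = 0; s < numSegments; s++) {
            // Assumption of this sketch: a segment with no values is left at 0
            out[s] = counts[s] == 0 ? 0.0 : sums[s] / counts[s];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {1, 3, 2, 6, 4, 9, 8};
        int[] segmentIds = {1, 0, 2, 0, 1, 1, 2};
        // Segment means are 9/2, 14/3 and 10/2
        System.out.println(Arrays.toString(unsortedSegmentMean(data, segmentIds, 3)));
    }
}
```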

unsortedSegmentMin

INDArray unsortedSegmentMin(INDArray data, INDArray segmentIds, int numSegments)

SDVariable unsortedSegmentMin(SDVariable data, SDVariable segmentIds, int numSegments)
SDVariable unsortedSegmentMin(String name, SDVariable data, SDVariable segmentIds, int numSegments)

Unsorted segment min operation. As per segmentMin(String, SDVariable, SDVariable) but without the requirement for the indices to be sorted.

If data = [1, 3, 2, 6, 4, 9, 8]

segmentIds = [1, 0, 2, 0, 1, 1, 2]

then output = [3, 1, 2] = [min(3,6), min(1,4,9), min(2,8)]

data (NUMERIC) - Data (variable) to perform unsorted segment min on
segmentIds (NUMERIC) - Variable for the segment IDs
numSegments - Number of segments

unsortedSegmentProd

INDArray unsortedSegmentProd(INDArray data, INDArray segmentIds, int numSegments)

SDVariable unsortedSegmentProd(SDVariable data, SDVariable segmentIds, int numSegments)
SDVariable unsortedSegmentProd(String name, SDVariable data, SDVariable segmentIds, int numSegments)

Unsorted segment product operation. As per segmentProd(String, SDVariable, SDVariable) but without the requirement for the indices to be sorted.

If data = [1, 3, 2, 6, 4, 9, 8]

segmentIds = [1, 0, 2, 0, 1, 1, 2]

then output = [18, 36, 16] = [prod(3,6), prod(1,4,9), prod(2,8)]

data (NUMERIC) - Data (variable) to perform unsorted segment product on
segmentIds (NUMERIC) - Variable for the segment IDs
numSegments - Number of segments

unsortedSegmentSqrtN

INDArray unsortedSegmentSqrtN(INDArray data, INDArray segmentIds, int numSegments)

SDVariable unsortedSegmentSqrtN(SDVariable data, SDVariable segmentIds, int numSegments)
SDVariable unsortedSegmentSqrtN(String name, SDVariable data, SDVariable segmentIds, int numSegments)

Unsorted segment sqrtN operation. Simply returns the sqrt of the count of the number of values in each segment

If data = [1, 3, 2, 6, 4, 9, 8]

segmentIds = [1, 0, 2, 0, 1, 1, 2]

then output = [1.414, 1.732, 1.414] = [sqrt(2), sqrt(3), sqrt(2)]

data (NUMERIC) - Data (variable) to perform unsorted segment sqrtN on
segmentIds (NUMERIC) - Variable for the segment IDs
numSegments - Number of segments

unsortedSegmentSum

INDArray unsortedSegmentSum(INDArray data, INDArray segmentIds, int numSegments)

SDVariable unsortedSegmentSum(SDVariable data, SDVariable segmentIds, int numSegments)
SDVariable unsortedSegmentSum(String name, SDVariable data, SDVariable segmentIds, int numSegments)

Unsorted segment sum operation. As per segmentSum(String, SDVariable, SDVariable) but without the requirement for the indices to be sorted.

If data = [1, 3, 2, 6, 4, 9, 8]

segmentIds = [1, 0, 2, 0, 1, 1, 2]

then output = [9, 14, 10] = [sum(3,6), sum(1,4,9), sum(2,8)]

data (NUMERIC) - Data (variable) to perform unsorted segment sum on
segmentIds (NUMERIC) - Variable for the segment IDs
numSegments - Number of segments

unstack

void unstack(INDArray value, int axis, int num)

void unstack(SDVariable value, int axis, int num)
void unstack(String name, SDVariable value, int axis, int num)

Unstack a variable of rank X into N rank X-1 variables by taking slices along the specified axis.

If input has shape [a,b,c] then output has shape:

axis = 0: [b,c]

axis = 1: [a,c]

axis = 2: [a,b]

value (NDARRAY) - Input variable to unstack
axis - Axis to unstack on
num - Number of output variables
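The slicing rule can be sketched in plain Java for the rank-2 case: an input of shape [a,b] yields a slices of shape [b] for axis 0, or b slices of shape [a] for axis 1. This illustrates the semantics only, not the ND4J implementation; UnstackSketch is a hypothetical name, and the number of outputs is derived from the input rather than passed as num.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain Java arrays, not the ND4J/SameDiff API.
final class UnstackSketch {

    // Splits a matrix of shape [a][b] into rank-1 slices along the given axis.
    static double[][] unstack(double[][] x, int axis) {
        int a = x.length, b = x[0].length;
        if (axis == 0) {
            // a slices, each of shape [b]: the rows
            double[][] out = new double[a][];
            for (int i = 0; i < a; i++) {
                out[i] = x[i].clone();
            }
            return out;
        }
        if (axis == 1) {
            // b slices, each of shape [a]: the columns
            double[][] out = new double[b][a];
            for (int i = 0; i < a; i++) {
                for (int j = 0; j < b; j++) {
                    out[j][i] = x[i][j];
                }
            }
            return out;
        }
        throw new IllegalArgumentException("axis must be 0 or 1 for a rank-2 input");
    }

    public static void main(String[] args) {
        double[][] x = { {1, 2, 3}, {4, 5, 6} };                 // shape [2,3]
        System.out.println(Arrays.deepToString(unstack(x, 0)));  // 2 slices of shape [3]
        System.out.println(Arrays.deepToString(unstack(x, 1)));  // 3 slices of shape [2]
    }
}
```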

variance

INDArray variance(INDArray x, boolean biasCorrected, boolean keepDims, int[] dimensions)
INDArray variance(INDArray x, boolean biasCorrected, int[] dimensions)

SDVariable variance(SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
SDVariable variance(SDVariable x, boolean biasCorrected, int[] dimensions)
SDVariable variance(String name, SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
SDVariable variance(String name, SDVariable x, boolean biasCorrected, int[] dimensions)

Variance array reduction operation, optionally along specified dimensions

Note that if keepDims = true, the output variable has the same rank as the input variable, with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting the mean along a dimension).

Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

keepDims = true: [a,1,c]

keepDims = false: [a,c]

x (NUMERIC) - Input variable
biasCorrected - If true: divide by (N-1) (i.e., sample variance). If false: divide by N (population variance)
keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false
dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
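The difference between the two biasCorrected settings is only the divisor. A plain-Java sketch of a full-array variance makes this concrete; it does not call ND4J, and VarianceSketch is a hypothetical name.

```java
// Hypothetical illustration only: plain Java arrays, not the ND4J/SameDiff API.
final class VarianceSketch {

    // Full-array variance: divides by N-1 when biasCorrected is true, by N otherwise.
    static double variance(double[] x, boolean biasCorrected) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;

        double sumSquaredDeviations = 0.0;
        for (double v : x) {
            sumSquaredDeviations += (v - mean) * (v - mean);
        }
        return sumSquaredDeviations / (biasCorrected ? n - 1 : n);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};                  // mean 2.5, squared deviations sum to 5
        System.out.println(variance(x, false));     // population variance: 5 / 4
        System.out.println(variance(x, true));      // sample variance: 5 / 3
    }
}
```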

zerosLike

INDArray zerosLike(INDArray input)

SDVariable zerosLike(SDVariable input)
SDVariable zerosLike(String name, SDVariable input)

Return a variable of all 0s, with the same shape as the input variable. Note that this is dynamic: if the input shape changes in later execution, the returned variable's shape will also be updated.

input (NUMERIC) - Input

Release Notes

New changes in each release of Eclipse Deeplearning4j.

Version 1.0.0-beta7

Read the announcement at https://blog.konduit.ai/2020/05/14/deeplearning4j-1-0-0-beta7-released/ for the highlights of this release.

Deeplearning4j

Features and Enhancements

Added Keras model import support for tf.keras models Link, Link
- Full inference and training support is available for ops/layers in the tf.keras namespace; inference only for general Tensorflow operations outside of the tf.keras namespace
- Note also improvements to Keras import for reshape, permute, etc operations due to NHWC and NWC support in DL4J
DL4J now supports NHWC (channels last) data format for all CNN 2D layers, in addition to NCHW Link
DL4J now supports NWC (channels last - [minibatch, sequence_length, size]) for all RNN and CNN 1D layers, in addition to NCW Link
Added Deconvolution3D layer Link
Keras import: added ReLU, ELU and Softmax advanced activation layers Link and Swish activation function Link
Added DL4J SameDiffLoss class (for easily defining DL4J ILossFunctions via SameDiff) Link
Useful exceptions are now thrown when attempting to perform unsupported operations on FastText Link
Added MultiLayerNetwork.evaluate(MultiDataSetIterator) and .evaluateRegression(MultiDataSetIterator) methods Link, Link

Bug Fixes and Optimizations

Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff Link, Link
Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules) Link
- Note: this is a breaking change for some class packages/imports. See this link for details on exact package changes
Deeplearning4j UI: Webjars versions locked down using dependency management to avoid check on each build Link
Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff Link
Refactored/merged modules dl4j-perf and dl4j-util into deeplearning4j-core Link
Fixed an issue with BertWordPieceTokenizer - potential StackOverflowError with certain inputs Link
Fixed an issue with GlobalPooling layer with masks of different datatype to the activations datatype Link
Fixed an issue with DL4JModelValidator for ComputationGraph Link
Fixed an issue where SameDiff layers in DL4J could throw an exception when used with transfer learning Link
Weight initialization for EmbeddingLayer and EmbeddingSequenceLayer no longer depends on the vocabulary size (only the vector size) Link
Fixed an issue with Keras import with bidirectional layers + preprocessors Link
DL4J UI: added redirect from /train to /train/overview Link
Fixed an issue where RecordReaderDataSetIterator builder collectMetaData configuration was not being applied Link
Fixed an issue where MultiLayerNetwork evaluation was not passing metadata to the IEvaluation instances during evaluation Link, Link
Fixed an issue with Spark training SharedTrainingMaster when training with a ComputationGraph and MultiDataSets Link
Assorted fixes for edge cases for DL4J Keras import Link
deeplearning4j-nlp-korean will no longer be released for Scala 2.12 because a required dependency is only available for Scala 2.11 Link
Fix for ConvolutionalIterationListener for ComputationGraph Link
Fixed an issue where dataset and model zoo downloads could get stuck if the server fails to send any data (now: timeout + retry) Link
DL4J ModelSerializer no longer writes temporary files when restoring models from InputStream Link
Fixed issues with UIServer multi session mode, and a potential shutdown race condition Link
Fixed an issue where TfidfVectorizer.vectorize() could throw a NPE when fit from LabelAwareIterator Link

ND4J/SameDiff:

Features and Enhancements

SameDiff multi-threaded inference enhanced (and fixed) - a single SameDiff instance can now be used for inference safely and efficiently from multiple threads Link Link
cuDNN support added to SameDiff (automatically enabled for nd4j-cuda-10.x backend) Link
Added ND4J namespaces: Nd4j.cnn, Nd4j.rnn, Nd4j.image Link
Added new Image operations namespace operations:
- rgbToHsv, hsvToRgb Link
- rgbToYiq, yiqToRgb, rgbToYuv, yuvToRgb Link
- imageResize Link
Added new Random operations namespace operations:
- gamma, poisson, shuffle Link
Added new Math namespace operations:
- clipByAvgNorm, embeddingLookup Link
- mergeMaxIndex Link
Added new NN namespace operations:
- cReLU Link
Added new CNN namespace operations:
- upsampling3d Link
Added new linalg operations namespace
- triangular_solve Link
- tri operation Link
- triu operation Link
Added new RNN operation namespace operations:
- lstmLayer (note old lstmLayer method renamed to lstmBlock) Link
- gru Link
Added new Loss operations namespace - Nd4j.loss Link
Mapped operations for Tensorflow import:
- HSVToRGB, RGBToHSV, Igamma, Igammac, RandomGamma, RandomPoisson, RandomPoissonV2, RandomShuffle Link
Added SameDiff ProfilingListener - writes op performance profiles in Chrome profiler format (load in chrome://tracing/) Link Link
Added SameDiff ProfileAnalyzer tool to compare profiles output from ProfilingListener (or Tensorflow) Link Link
SameDiff listener API: added frame and iteration information for listener methods Link Link
Added (non-backend-specific) method of accessing Nd4j environment: Nd4j.getEnvironment() method (environment info and low-level configuration options) Link Link
Improved memory limits/configuration support for libnd4j (c++) Link
Added pairwise (broadcastable) power backprop operation Link
Updated JavaCPP presets MKL version to 2020.0 from 2019.5 Link
Added DynamicCustomOp dargs - datatype arguments Link Link
- Output datatype configuration for Range op Link, SequenceOp Link, ConfusionMatrix Link
Added tensormmul_bp op Link
OpenBLAS version upgraded to 0.3.8 Link
libnd4j (c++ codebase underlying DL4J, ND4J and SameDiff) refactored to be more easily embeddable in other C++ projects Link
ImagePreProcessingScaler now supports preprocessing of labels (for segmentation) Link
Additional datatypes now supported for nd4j-tensorflow TensorflowConversion Link
SameDiff operation namespaces (sd.math, sd.image, etc) are now code generated to ensure SameDiff and ND4J namespaces are identical (all operations included, same API) Link
Added ND4J ArchiveUtils.unzipFileTo(String, String, boolean logFiles) overload to enable/disable extracted file path logging Link
Added weight format configuration for following operations: conv1D, conv2D, conv3D, deconv2d, deconv3d, depthwiseConv2d, pointwiseConv2d, sconv2d Link
Added backprop operation implementations for mergemax, mergeadd, mergeavg operations Link
MKL version upgraded from 2020.0 to 2020.1; OpenCV upgraded from 4.2.0 to 4.3.0 Link
SameDiff: DifferentialFunctionFactory class removed in favor of namespace methods (sd.math, sd.linalg, etc) Link
Added lstmLayer_bp operation Link
Added gru_bp operation Link
linspace operation can now use both targs and arrays for start/end/size arguments Link
Assorted dependency updates - OpenBLAS (0.3.9), OpenCV (4.3.0), Leptonica (1.79.0) Link
Upgraded assorted dependency versions: javax.activation:activation (1.1 -> 1.1.1), stream analytics (2.7.0->2.9.8), Apache Spark (2.4.3->2.4.5), Jackson databind (2.10.1 -> 2.10.3), Vertx (3.8.3 -> 3.9.0) Link
Added nd4j-common-tests ResourceUtils.listClassPathfiles method Link

Bug Fixes and Optimizations

Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff Link, Link
SameDiff - added CuDNN support Link
Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules) Link
- Note: this is a breaking change for some class packages/imports. See this link for details on exact package changes
Fixed some issues with Tensorflow import of FusedBatchNorm operation Link
Fixed an issue where the Roll operation did not match Tensorflow operation Link Link
Fixed an issue where ArchiveUtils could fail to create the top level destination directory when it does not exist Link
Fixed an issue where resize_bicubic operation did not match Tensorflow for some configuration values Link Link
Pad operation now supports long/int64 values for padding array Link Link
Fixed an issue where hashcode operation shape function wasn't always returning int64/long dtype Link
Fixed an issue with reshape operation on empty arrays with -1s Link Link
Improved performance on CUDA for concat operation Link and CPU/GPU Link
Improved performance for bias_add operation
- On CPU for NHWC case Link
- Generally Link
- On CUDA for 2D case Link
Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff Link
Fixed a small SameDiff execution issue for switch operation where the predicate is a constant Link
Fixed an issue with batchnorm operation when input arrays have unusual strides Link
Merged nd4j-buffer, nd4j-content modules into nd4j-api Link
Deleted deprecated nd4j-jackson module (remaining functionality available in nd4j-api) Link
Deleted unused/unmaintained nd4j-camel and nd4j-gson modules Link
Optimization for legacy random ops Link
Optimization for broadcast operations Link, Link, Link, Link, Link
Performance optimization for multiple operations: softmax, squeeze, expand_dims, tanh Link
Optimization for transpose/permute operations Link
Performance enhancement: MKLDNN matmul used for some mmul operation cases Link
Optimization for gather operation on CPU Link
Optimization for stack/unstack operations on CPU Link
Optimization for split operation (CPU and CUDA) Link Link
ND4J initialization no longer logs number of OpenMP BLAS threads for CUDA Link
Optimization: Fixed issues with auto-vectorization on multiple CPU operations Link
Optimization for reshape operation Link, Link
Fixed an issue where INDArray.hashCode() could cause an exception on some datatypes Link
Optimization for CPU: MKLDNN is now used for softmax, tanh, softmax_bp and tanh_bp operations Link, Link, Link, Link
Fixed random_exponential operation Link
Improved performance on C++ SameDiff graph execution via reduced array zeroing where safe to do so Link
Improved C++ indexing implementation impacting CPU performance on some operations Link
Fixed an issue where Split operation could have incorrect output shapes for empty arrays Link
Fixed some issues with SameDiff.equals method Link
Fixed an issue with reshape operation output shape on empty arrays Link, Link
Nd4j.gemm now uses Mmul operation internally to avoid potential threading issues with direct BLAS calls on CUDA Link
Fixed an edge case issue with percentile operation Link
Fixed an edge case issue for cuSOLVER (CUDA) in libnd4j Link
Fixed an issue with error formatting for segment operations for incorrect lengths Link
Fixed an issue where ND4J workspaces were not guaranteed to be unique Link
Fixed some operation implementations when operating on views (Batch/Space to Space/Batch/Depth; batchnorm_bp) Link
Fixed an issue where exponential distribution random number generation operation could produce infinities extremely rarely (~1 in 10^9 values) Link
Fixed an issue with long file paths for memory mapped workspaces on Windows Link
Memory for memory mapped workspaces is now deallocated immediately when the workspace is destroyed, instead of waiting for GC to free memory Link
Fall-back to other BLAS implementation for cases where MKLDNN GEMM implementation is slow Link
Set nd4j-native source/target to Java 7 Link, Link

DataVec

Features and Enhancements

datavec-python: added zero-copy support for bytes/byte buffers Link
datavec-python: Python exceptions are now thrown as Java exceptions Link
datavec-python: Added support for additional NumPy datatypes Link
datavec-python: Python version upgraded from 3.7.6 to 3.7.7 Link

Bug Fixes and Optimizations

Deleted not properly maintained modules: datavec-camel, datavec-perf Link
Fixed missing BOOL datatype support for arrow conversion functionality Link
Assorted fixes for datavec-python Link Link, Link
Fixed an issue with LineRecordReader where initialization was performed unnecessarily (adding performance overhead) Link

RL4J

Features and Enhancements

Refactoring to decouple configuration and learning methods from their implementations Link
Added builder patterns for all configuration classes Link

Arbiter

Bug Fixes and Optimizations

Fixed an issue with GridSearchCandidateGenerator not working correctly for some cases Link, Link

Version 1.0.0-beta6

Highlights - 1.0.0-beta6 Release

Added support for CUDA 10.2. 1.0.0-beta6 released with CUDA 9.2, 10.0, 10.1 and 10.2 support
SameDiff optimizations - memory use for inference and training significantly reduced, with some performance improvements also
Deeplearning4j UI - Play framework replaced with Vertx; deeplearning4j-ui dependency now no longer has Scala dependency or Scala version suffix Link
- Note: No API changes, only artifact ID change: replace deeplearning4j-ui_2.1x with deeplearning4j-ui
ND4J namespace operation methods: operations are now available through the Nd4j.math, Nd4j.random, Nd4j.bitwise and Nd4j.nn (neural network) namespaces, for example Nd4j.math.abs(INDArray), Nd4j.random.logNormal, etc. Link
- Note that the ND4J namespaces API will have additions (new namespaces and methods), and may have some API changes, in the next release
OpenMP replaced with thread pool c++ parallelism framework; enabled c++ parallelism for platforms without C++-level threading for operations

Deeplearning4J

Deeplearning4J: Features and Enhancements

DNNL (MKL-DNN) upgraded to version 1.1
Added causal convolution mode for Convolution1D layer (ConvolutionMode.Causal) and added causal conv1d support for Keras import Link
Keras import now supports scaled identity weight initialization Link
Added Mish activation function Link, Link
BertIterator now has a BertIterator.featurizeSentences(List<String>) method for inference Link, Link
BertIterator now supports sentence pairs for supervised training Link
Added Sparse multi-class cross entropy for both Deeplearning4j and Keras import Link, Link
Deeplearning4j UI: migrated from Play to Vertx for web serving backend, also removing dependency on Scala libraries; no API changes, only artifact ID change - replace deeplearning4j-ui_2.1x with deeplearning4j-ui Link, Link
Added TimeDistributed wrapper layer Link

Deeplearning4J: Bug Fixes and Optimizations

KDTree implementation optimized Link
Deeplearning4j zoo models and datasets hosting location updated Link
Fixed nIn validation for Deconv2D layer Link
Fixed an issue with incorrect Deconvolution2d results for Keras import models Link
Added DNNL/MKLDNN support for batch normalization layer Link, Link
Fixed various integer casts to avoid overflows for very large arrays (with dimensions or length > Integer.MAX_VALUE) Link
Fixed an issue with UNet non-pretrained model architecture (last layer kernel size) Link
Deeplearning4j SameDiff layers now use DL4J workspaces for better performance and reduced memory consumption Link
Updated broken links in a few error messages Link
Cleaned up a few unused dependencies in various modules Link
Cleaned up duplicate SamplingDataSetIterator class Link
Fixed an issue where ComputationGraph instances with a single input going into multiple embedding layers could throw a NPE Link
Fixed an issue where loss function weights were not automatically cast to network datatype, resulting in an exception if not already correct type Link
Shaded Jackson version upgraded from 2.9.9/2.9.9.3 to 2.10.1 Link
Fixed an issue with KNN where getMostPopulatedClusters actually returned the least populated clusters Link

Deeplearning4j: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

Deeplearning4j UI artifact ID has changed: replace deeplearning4j-ui_2.1x (beta5 and earlier) with deeplearning4j-ui

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

Added support for CUDA 10.2 Link
DNNL (MKL-DNN) upgraded to version 1.1 Link
Added ND4J namespaces to match SameDiff: Nd4j.math, Nd4j.random, Nd4j.bitwise, Nd4j.nn (neural network) Link
Added SameDiff.calculateGradientsAndOutputs method Link Link
Additional SameDiff single batch .output method overloads for DataSet/MultiDataSet added Link
TensorFlow import ops coverage enhanced (significant number of additional ops supported) Link, Link, Link, Link, Link
PRelu op added Link
adjust_contrast, igamma and igammac ops added Link
ND4J/SameDiff: BitCast, CompareAndBitpack, DivideNoNan, DrawBoundingBoxes, FakeQuantWithMinMaxVarsPerChannel ops added Link
non_max_suppression_overlaps op added Link
ImagePreProcessingScaler now supports segmentation use cases Link
concat operation now supports the concatenation axis being specified via the last input array Link
Added Gamma and Poisson RNG distributions Link
SameDiff’s use of DeviceLocal for variables/constants etc is now configurable Link
Uniform distribution op now supports random integer generation, not just random floating point generation Link
SameDiff: Added simple OpBenchmarkListener for benchmarking purposes Link
Added the ability to disable platform helpers (DNNL/MKLDNN etc) via Nd4jCPU.Environment.getInstance().allowHelpers(false); and Nd4jCuda.Environment.getInstance().allowHelpers(false); Link
Added draw_bounding_boxes operation Link
Added resize_bicubic operation Link
Added causal padding mode to conv1d operation Link
DNNL (MKLDNN) is included and enabled by default for non-AVX builds Link
Added SameDiff ArraySavingListener for debugging purposes Link

ND4J/SameDiff: Bug Fixes and Optimizations

OpenMP replaced with ThreadPool abstraction, enables parallelism for platforms without OpenMP support Link
SameDiff memory management overhauled for (in some cases significantly) reduced memory consumption and improved performance Link, Link
Switched to Clang instead of gcc for OSX compilation to avoid compiler-related issues Link
Removed SameDiff.outputs() “best guess” output inference due to being unreliable, in favor of explicit SameDiff.setOutputs(String...) call Link
Fixed an issue with Nd4j.hstack on 1D arrays Link
SameDiff no longer allows empty arrays for variables Link
Fixed an issue with Nadam updater LR schedules not being cloned Link
Cleaned up IActivation interface Link
Added new LSTM op implementation with DNNL/MKLDNN support (forward pass only so far) Link
SameDiff API cleaned up; deprecated methods removed Link
Switched SameDiff variable initialization to non-lazy, to avoid unexpected behaviour when mixing execution and ND4J RNG seed setting Link
SameDiff.zero and .one methods now create constants, not variables Link
Moved CUDA build version and device logging to Java logging, from c++ stdout to enable disabling logging (via ND4J config or slf4j config) Link
Added DNNL/MKLDNN support for batch normalization Link
SameDiff: Fixed an issue where listeners weren’t being called for gradient calculation Link
Added DNNL/MKLDNN support for deconv2d/3d operations Link
Fixed an issue with biasadd_bp operation and NHWC data format Link
Fixed an issue with certain strided slice backprop configurations Link, Link
Fixed an issue with LogSumExp reduction operation backprop for along dimension case Link, Link
INDArray.toString() now has correct brackets for rank 1+ scalars to avoid ambiguity Link
Fixed an issue where some ND4J methods could fail when the library is compiled on Java 9+ but run on Java 8 Link
Fixed empty array input case for is_strictly_increasing, non_decreasing and non_max_suppression ops Link, Link
Fixed empty input arrays for legacy ops (transform, scalar, pairwise, broadcast) Link
CUDA compute capability 3.0 is supported again Link
Improved performance for Scatter operations (1D case) + index validation Link
Fixed an issue where SameDiff TrainingConfig serialization would fail if evaluation instances are set Link, Link
SameDiff execution will now throw an exception when assertion operations in the graph fail Link
PolyGamma function now returns NaNs when passed double for args requiring integer values Link
Fixed some issues for pad and mirror_pad ops to ensure they conform with Tensorflow for imported networks Link
Updated and fixed some issues for TensorFlow graph runner Link
Improved performance for Reverse operation Link
Removed/cleaned up unused ND4J list functionality Link
Fixed reduce bool operation results (such as any, all, IsInf, etc) for empty array inputs Link

ND4J: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

SameDiff.outputs() now requires user to call SameDiff.setOutputs(String...) first; previous “best guess” output inference was unreliable Link
SameDiff.zero and .one methods now create constants, not variables Link

DataVec

DataVec: Bug Fixes and Optimizations

NativeImageLoader now checks for empty input streams and throws an exception instead of crashing Link
NDArrayScalarOpTransform now supports modulus operator Link

RL4J

RL4J: Features and Enhancements

Added AsyncTrainingListener Link
Replaced multiple uses of java.util.Random with ND4J Random Link
Added Observable and LegacyMDPWrapper Link

RL4J: Bug Fixes and Optimizations

Refactored RL4J video recording to separate VideoRecorder class Link
Fixed an issue with target for DQN Link, Link
Refactoring for DQN and double DQN for improved maintainability Link
Internal refactoring and various bug fixes Link

PyDataVec

PyDataVec Features and Enhancements

PyDataVec TransformProcess now supports non-inplace operations Link

PyDataVec Bug Fixes and Optimizations

Fixed various issues with PyDataVec Link
Fixed an issue with data locality that could cause incorrect results under some circumstances when running on CUDA Link

Version 1.0.0-beta5

Highlights - 1.0.0-beta5 Release

Added model server - remote inference of SameDiff and DL4J models using JSON or (optionally) binary serialization
- Server: See JsonModelServer
- Client: See JsonRemoteInference
- Tests/examples: See Link and Link
Added Scala 2.12 support, dropped Scala 2.10 support. Modules with Scala dependencies are now released with Scala 2.11 and 2.12 versions
Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: Spark version suffix dropped: For upgrading: 1.0.0-beta4_spark2 -> 1.0.0-beta5
Added FastText support to deeplearning4j-nlp
CUDA support for all ND4J/SameDiff Operations
- In 1.0.0-beta4, some operations were CPU only. Now, all operations have full CUDA support
Added support for new data types in ND4J (and DL4J/SameDiff): BFLOAT16, UINT16, UINT32, UINT64
ND4J: Implicit broadcasting support added to INDArray (already present in SameDiff - for example shape [3,1]+[3,2]=[3,2])
CUDA 9.2, 10.0 and 10.1-Update2 still supported
- NOTE: For CUDA 10.1, CUDA 10.1 update 2 is recommended. CUDA 10.1 and 10.1 Update 1 will still run, but rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems
Dependency upgrades: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava (20.0 to 28.0-jre, and shaded to avoid dependency clashes)
CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer
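The implicit broadcasting highlight above (shape [3,1] + [3,2] = [3,2]) can be illustrated with a plain-Java sketch of the shape rule for rank-2 arrays: along each axis the sizes must match, or one of them must be 1. This is not the INDArray implementation; BroadcastAddSketch and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain Java arrays, not the ND4J INDArray API.
final class BroadcastAddSketch {

    // Element-wise addition of two matrices with broadcasting along size-1 axes.
    static double[][] add(double[][] a, double[][] b) {
        int rows = broadcastSize(a.length, b.length);
        int cols = broadcastSize(a[0].length, b[0].length);
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // A size-1 axis always reads index 0, which repeats its single value
                double left = a[a.length == 1 ? 0 : i][a[0].length == 1 ? 0 : j];
                double right = b[b.length == 1 ? 0 : i][b[0].length == 1 ? 0 : j];
                out[i][j] = left + right;
            }
        }
        return out;
    }

    // Sizes are compatible if they are equal or if one of them is 1.
    static int broadcastSize(int m, int n) {
        if (m == n || n == 1) {
            return m;
        }
        if (m == 1) {
            return n;
        }
        throw new IllegalArgumentException("Shapes cannot be broadcast: " + m + " vs " + n);
    }

    public static void main(String[] args) {
        double[][] column = { {1}, {2}, {3} };                    // shape [3,1]
        double[][] matrix = { {10, 20}, {30, 40}, {50, 60} };     // shape [3,2]
        System.out.println(Arrays.deepToString(add(column, matrix)));
        // prints [[11.0, 21.0], [32.0, 42.0], [53.0, 63.0]]
    }
}
```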

Deeplearning4J

Deeplearning4J: Features and Enhancements

Added FastText - inference and training, including OOV (out of vocabulary) support (Link)
Scala 2.12 support added, Scala 2.10 support dropped (Link)
Added model server (DL4J and SameDiff models, JSON and binary communication) - JsonModelServer, JsonRemoteInference, Link, Link
Added saved model format validation utilities - DL4JModelValidator, DL4JKerasModelValidator (Link)
Added LabelLastTimeStepPreProcessor (Link)
BertIterator: added option to prepend token to the output (such as [cls] expected by some models) (Link)
Added trace level logging to MultiLayerNetwork and ComputationGraph to assist with debugging certain issues (Link)
Upsampling3D: Added NDHWC support (Link)
MergeVertex now supports broadcasting (Link)
LSTM and Dropout will now fall back on built-in implementations if an exception is encountered from cuDNN (same as Subsampling/ConvolutionLayer) (Link)
Improved JavaDoc and cleaned up API for WordVectorSerializer (Link, Link)

Deeplearning4J: Bug Fixes and Optimizations

Updated deeplearning4j-ui theme (Link)
Fixed an issue with MergeVertex and CNN3D activations (Link)
Fixed typo in Yolo2OutputLayer builder/configuration method name (Link)
Improved ComputationGraph builder InputType validation (Link)
Removed dl4j-spark-ml module until it can be properly maintained (Link)
Fixed an issue with BertWordPieceTokenizerFactory and bad character encoding (Link)
Fixed an issue with LearnedSelfAttentionLayer and variable minibatch size (Link, Link)
Fixed issue with SharedTrainingMaster controller address when set from environment variable (Link)
Fixed issue with SameDiffOutputLayer initialization under some circumstances (Link)
https is now used by default for data and zoo model downloads (Link, Link)
Fixed an issue where UI WebJars dependencies would check for updates on every single build (Link, Link)
Fixed issue where Upsampling layer memory report could produce an OOM exception (Link)
Improved UX/validation for RecordReaderDataSetIterator (Link)
Fixed an issue where EmbeddingSequenceLayer would not check mask array datatype (Link)
Improved validation when initializing networks with a non rank-2 (shape [1, numParams]) array (Link)
Fixed a DataType issue for BertIterator (Link)
Fixed Word2Vec model backward compatibility (beta3 and earlier models now loadable again) (Link)
Fixed issue where some Keras import models could fail with "Could not read abnormally long HDF5 attribute" (Link)
Added validation for RnnOutputLayer - feature/label array lengths (Link)
Fixed an issue where SameDiffOutputLayer would not support variable minibatch size (Link)
Fixed DL4J SameDiff layer mask support (Link)
DL4J UI: Fixed an issue where tab switching did not work when visualizing saved/stored data (Link, Link)
DL4J UI: Fixed a rare UI threading issue (Link)
Fixed a Keras import issue with JSON format change (Link)
Fixed a Keras import issue where updater learning rate schedule could be imported incorrectly (Link)
Fixed an issue with CnnSentenceDataSetIterator when using UnknownWordHandling.UseUnknownVector (Link, Link)
Fixes and optimizations to DL4J SameDiff layers (Link)
MultiLayerNetwork/ComputationGraph will now log the original exception if a second exception occurs during workspace closing, instead of swallowing it (inference/fit operation try/finally blocks) (Link)
Upgraded dependencies: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava (20.0 to 28.0-jre, shaded to avoid dependency clashes) (Link)
Logging framework can now be configured for DL4J UI (due to Play framework dependency upgrade) (Link)
Reduced amount of garbage produced by MnistDataFetcher (impacts MNIST and EMNIST DataSetIterators) (Link)
Activation function backpropagation has been optimized for many activation functions (Link, Link)

Deeplearning4j: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J, use org.nd4j.linalg.dataset.Async(Multi)DataSetIterator instead
Saved models with custom layers from 1.0.0-alpha and before can no longer be loaded. Workaround: load in 1.0.0-beta4, and re-save the model (Link). Models without custom layers can still be loaded back to 0.5.0
Apache Spark 1.x support dropped (only Spark 2.x is now supported). Note: the Spark version suffix has been dropped. When upgrading, change versions as follows: 1.0.0-beta4_spark2 -> 1.0.0-beta5
Scala 2.10 dropped, Scala 2.12 added (for modules with Scala dependencies)

Deeplearning4j: 1.0.0-beta5 Known Issues

dl4j-spark_2.11 and _2.12 dependencies incorrectly pull in datavec-spark_2.11/2.12 version 1.0.0-SNAPSHOT. Workaround: control version using dependency management as per here or here
Some layers (such as LSTM) may run slower on 1.0.0-beta5 than 1.0.0-beta4 on CUDA when not using cuDNN, due to added synchronization. This synchronization will be removed in the next release after 1.0.0-beta5
CUDA 10.1: Rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems, when running CUDA 10.1 Update 1 (and maybe 10.1). CUDA 10.1 update 2 is recommended.

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

Added new data types: BFLOAT16, UINT16, UINT32, UINT64 (Link)
CUDA support for all operations without CUDA implementations (Link, Link, Link, Link, Link)
Added model server (DL4J and SameDiff models, JSON and binary communication) - JsonModelServer, JsonRemoteInference (Link, Link)
Added support for empty arrays with zeros in shape, for compatibility with TensorFlow import (Link)
CUDA: host (RAM) buffers are now allocated only when required. Previously, host buffers were always allocated in addition to the device (GPU) buffers
Improved SameDiff training API - added "in line" test set evaluation, returning History object with loss curve, etc (Link)
Added saved model format validation utilities - Nd4jValidator, Nd4jCommonValidator (Link)
Added SameDiff ScoreListener (equivalent to DL4J ScoreIterationListener/PerformanceListener) (Link, Link)
Added SameDiff.convertDataTypes method, for variable dtype conversion (Link)
Added crop and resize op (Link)
DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J (Link)
Added basic/MVP SameDiff UI listener (Link)
Added SameDiff CheckpointListener (Link, Link)
Added SameDiff name scopes (Link)
SameDiff: Updater state and training configuration is now written to FlatBuffers format (Link)
Added C++ benchmark suite callable from Java - call using Nd4j.getExecutioner().runLightBenchmarkSuit() and Nd4j.getExecutioner().runFullBenchmarkSuit() (Link)
Added SameDiff.save/load methods with InputStream/OutputStream arguments (Link, Link)
Added axis configuration for evaluation instances (Evaluation, RegressionEvaluation, ROC, etc - getAxis and setAxis methods) to allow different data formats (NCHW vs. NHWC for CNNs, for example) (Link)
SameDiff: Added support to convert constants to placeholders, via SDVariable.convertToConstant() method (Link)
SameDiff: Added GradCheckUtil.checkActivationGradients method to check activation gradients for SameDiff instance (not just parameter gradients as in existing gradient check methods) (Link)
Added CheckNumerics op (Link)
Added FakeQuantWithMinMaxArgs and FakeQuantWithMinMaxVars ops (Link)
Added INDArray reduction methods with "keep dimensions" option - for example, INDArray.mean(boolean, int... dimension) (Link)
Added Nd4j SystemInfo class - SystemInfo.getSystemInfo, .writeSystemInfo(File) to aid with debugging issues (Link, Link)
Added INDArray.toString(NDArrayStrings options), toStringFull() and toString overloads for easier control of array printing (Link)
Added HashCode op, INDArray.hashCode() (Link)
SameDiff: added whileLoop, ifCond methods for loops/conditional ops (Link)
Cleaned up some infrequently used Nd4j methods (Link, Link, Link, Link)
Added bitwise integer operations: left/right bit shift, left/right cyclical bit shift, bitwise Hamming distance (Link, Link, Link, Link, Link)
deeplearning4j-nlp: renamed AggregatingSentencePreProcessor to sentencePreProcessor method (Link)
Upgraded (and shaded) Protobuf version - 3.5.1 to 3.8.0 (Link)
Switched to C-style error handling for libnd4j native operations (Link)
Renamed FlatBuffers enum org.nd4j.graph.DataType to org.nd4j.graph.DType to avoid users importing incorrect type when using Nd4j methods (Link, Link)
Added SameDiff.bitwise namespace for bitwise ops (Link, Link)
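
For readers unfamiliar with the bitwise integer operations listed above (cyclical bit shifts and bitwise Hamming distance), the following plain-Java sketch shows what they compute on 32-bit integers. It uses only JDK methods (Integer.rotateLeft, Integer.rotateRight, Integer.bitCount); the class and method names are hypothetical, and the ND4J op names and signatures are deliberately not shown here.

```java
final class BitwiseSketch {
    /** Cyclical left shift: bits shifted out on the left re-enter on the right. */
    static int cyclicShiftLeft(int x, int n) {
        return Integer.rotateLeft(x, n);
    }

    /** Cyclical right shift: bits shifted out on the right re-enter on the left. */
    static int cyclicShiftRight(int x, int n) {
        return Integer.rotateRight(x, n);
    }

    /** Bitwise Hamming distance: total number of differing bits between two int arrays. */
    static int hammingDistance(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Arrays must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // XOR leaves a 1 in every position where the two values differ
            distance += Integer.bitCount(a[i] ^ b[i]);
        }
        return distance;
    }

    public static void main(String[] args) {
        int x = 0x80000001;                          // highest and lowest bits set
        System.out.println(x << 1);                  // plain shift drops the top bit: prints 2
        System.out.println(cyclicShiftLeft(x, 1));   // top bit wraps around: prints 3
        System.out.println(hammingDistance(new int[]{0b1010, 0b1111}, new int[]{0b0110, 0b0000})); // prints 6
    }
}
```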

ND4J/SameDiff: Bug Fixes and Optimizations

Updated to JavaCPP/JavaCV 1.5.1-1 (Link)
SameDiff: Placeholders now need to be provided only if they are required to calculate the requested variables (Link)
SameDiff: Fixed an issue with duplicate variable name validation (Link)
SameDiff: Fixed an issue with SDVariable.getArr for scalars (Link)
Added delayed mode to DeviceLocalNDArray (don't replicate to device until needed) (Link)
ND4J: Fixed an issue with writing 0d (scalar) NDArrays in numpy .npy format (Link)
Fixed an issue with Pad operation for some constant cases (Link)
Fixed some issues with strided_slice operation (Link, Link, Link)
SameDiff: Fixed issue with DataType inference for some ops using ND4J default datatype (Link)
INDArray.castTo(DataType) is now a no-op when array is already the correct type (Link)
SameDiff: Fixed an issue with training mixed precision networks (Link)
Fixed an issue where Evaluation class was incorrectly reporting macro-averaged precision for binary case (Link)
Removed trainableParams config/field from SameDiff TrainingConfig (no longer required) (Link)
Improvements and cleanup to ND4J Javadoc (Link, Link, Link, Link)
Fixed an issue with Cholesky Lapack op on CUDA (Link, Link)
Fixed an issue where [1,N] and [N,1] arrays were not considered a matrix (rank 2 array) according to INDArray.isMatrix() (Link)
Fixed RegressionEvaluation for 4D arrays (CNNs / segmentation) (Link, Link)
Fixed issue with INDArray.median(int... dimension) (Link)
Fixed NPE that could occur when executing gather operation backprop (Link)
Fixed issue with LogSumExp operation Java/C++ mapping (Link)
Added header validation when reading Numpy .npy files, to ensure file is valid (Link)
Fixed a possible issue with reading Numpy .npy files on CUDA (Link)
Fixed an issue when reading Numpy .npy boolean files (Link)
Various fixes for TensorFlow import (Link)
Fixed an issue with a small number of Nd4j.create methods not creating arrays with the datatype corresponding to the Java primitive (Link)
Improved shape validation for some Nd4j.create methods (Link)
Cleaned up unmaintained Nd4j.createSparse methods (Link)
Fixed a CUDA issue for CUDA GPUs with CC 3.0 (Link)
Fixed some possible integer overflows in C++ code (Link)
Removed deprecated methods: Nd4j.trueScalar and Nd4j.trueVector (Link, Link)
Fixed an issue where some JVMs could warn about "Illegal reflective access" due to a (now removed) SameDiff dependency (Link)
SDVariable now no longer extends DifferentialFunction (Link)
Moved numerous operation calculateOutputShape instances from Java to C++ (Link)
Fixed an issue where maxpool2d_bp could throw an exception when NaN values are present (Link)
Fixed an issue with concatenation of empty shapes (with zeros) (Link)
Removed INDArray.javaTensorAlongDimension (Link)
LayerNorm operation now properly supports axis arg, NCHW format data (Link)
libnd4j: cuBLAS hgemm (FP16 gemm) will only be called for devices with compute capability >= 5.3 due to cuBLAS limitations (Link)
Nd4j.readNumpy optimized (Link)
Added configurable alpha parameter to ELU and lrelu_bp operations in C++ (Link)
Cleaned up SameDiff SDCNN/SDRNN (SameDiff.cnn, .rnn) API/methods (Link, Link)

ND4J: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

OldAddOp, OldSubOp, etc removed: Replace with AddOp, SubOp, etc
Nd4j.trueScalar and trueVector removed; use Nd4j.scalar and Nd4j.createFromArray methods
INDArray.javaTensorAlongDimension removed; use INDArray.tensorAlongDimension instead
INDArray.lengthLong() removed; use INDArray.length() instead

ND4J: 1.0.0-beta5 Known Issues

nd4j-native on some OSX systems can fail with Symbol not found: ___emutls_get_address - See this link
SBT 1.3.0 can fail with an Illegal character in path error; SBT 1.2.8 is OK. This is an SBT issue, not an ND4J issue. See this link for details

DataVec

DataVec: Features and Enhancements

ImageRecordReader: Support for 16-bit TIFF added (Link)
Added SequenceTrimToLengthTransform (Link)

DataVec: Bug Fixes and Optimizations

Fixed an issue with AnalyzeSpark and String columns (Link)
Fixed an issue with URL scheme detection in NumberedFileInputScheme (Link)
Fixed an issue with RandomPathFilter sampling being biased (Link, Link)

RL4J

RL4J: Features and Enhancements

API cleanup and refactoring (Link, Link, Link, Link)

RL4J: Bug Fixes and Optimizations

Fixed issue with compression for HistoryProcessor (Link)

Arbiter

Arbiter: Bug Fixes and Optimizations

Updated EvaluationScoreFunction to use ND4J Evaluation class metrics (Link)
Fixed incorrect search size in GridSearchCandidateGenerator (Link)

Arbiter: Known Issues

The Jackson version upgrade necessitated a change to how generic object serialization was performed; Arbiter JSON data stored in 1.0.0-beta4 or earlier format may not be readable in 1.0.0-beta5 (Link)

ND4S

ND4S: Features and Enhancements

Added full data type support to ND4S as per ND4J (Link)
Added syntactic sugar for SameDiff (implicits, operator overloads) (Link)

Version 1.0.0-beta4

Highlights - 1.0.0-beta4 Release

Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-Dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following datatypes are supported:

DOUBLE: double precision floating point, 64-bit (8 byte)
FLOAT: single precision floating point, 32-bit (4 byte)
HALF: half precision floating point, 16-bit (2 byte), "FP16"
LONG: long signed integer, 64 bit (8 byte)
INT: signed integer, 32 bit (4 byte)
SHORT: signed short integer, 16 bit (2 byte)
UBYTE: unsigned byte, 8 bit (1 byte), 0 to 255
BYTE: signed byte, 8 bit (1 byte), -128 to 127
BOOL: boolean type, (0/1, true/false). Uses ubyte storage for easier op parallelization
UTF8: String array type, UTF8 format

ND4J Behaviour changes of note:

When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)
- For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray
- Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively
Some operations require matched datatypes for operands
- For example, if x and y are different datatypes, a cast may be required: x.add(y.castTo(x.dataType()))
Some operations have datatype restrictions: for example, sum on a UTF8 array is not supported, nor is variance on a BOOL array. For some operations on boolean arrays (such as sum), casting to an integer or floating point type first may make sense.
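
The rule that Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) yield different datatypes is easiest to understand as ordinary Java overload resolution on the literal's type. The sketch below illustrates only that mechanism in plain Java: typeOf is a hypothetical stand-in, not an ND4J method, and it assumes the datatype is chosen per primitive overload as the notes above describe.

```java
final class ScalarTypeSketch {
    // One overload per Java primitive type; the compiler picks the overload from the
    // static type of the argument, which for a literal is fixed by its suffix and form.
    static String typeOf(int value)    { return "INT"; }
    static String typeOf(long value)   { return "LONG"; }
    static String typeOf(float value)  { return "FLOAT"; }
    static String typeOf(double value) { return "DOUBLE"; }

    public static void main(String[] args) {
        System.out.println(typeOf(1));          // int literal
        System.out.println(typeOf(1L));         // long literal
        System.out.println(typeOf(1.0));        // double literal
        System.out.println(typeOf(1.0f));       // float literal
        System.out.println(typeOf((double) 1)); // an explicit cast changes the selected overload
    }
}
```

The practical consequence is that writing 1 instead of 1.0 is enough to produce an integer array, so an explicit datatype argument or a castTo call is the safer choice when a floating point type is required.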

DL4J Behaviour changes of note:

MultiLayerNetwork/ComputationGraph no longer depend in any way on ND4J global datatype.
- The datatype of a network (DataType for its parameters and activations) can be set during construction using NeuralNetConfiguration.Builder().dataType(DataType)
- Networks can be converted from one type to another (double to float, float to half etc) using MultiLayerNetwork/ComputationGraph.convertDataType(DataType) method

Main new methods:

Nd4j.create(), zeros(), ones(), linspace(), etc methods with DataType argument
INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another
New Nd4j.createFromArray(...) methods for

ND4J/DL4J: CUDA - 10.1 support added, CUDA 9.0 support dropped

CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.

ND4J: Mac/OSX CUDA support dropped

Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.

DL4J/ND4J: MKL-DNN Support Added

DL4J (and ND4J conv2d etc ops) now support MKL-DNN by default when running on CPU/native backend. MKL-DNN support is implemented for the following layer types:

ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)
SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)
BatchNormalization layer (and BatchNorm ND4J op)
LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)
Convolution3D layer (and Conv3D/Conv3DDerivative ND4J ops)

MKL-DNN support for other layer types (such as LSTM) will be added in a future release.

MKL-DNN can be disabled globally (ND4J and DL4J) using Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);

MKL-DNN can be disabled for specific operations by setting the ND4J_MKL_FALLBACK environment variable to a comma-separated list of the operation names for which MKL-DNN should be disabled. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp

ND4J: Improved Performance due to Memory Management Changes

Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default, hence periodic GC could frequently be disabled when training DL4J networks.) However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.

In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000ms) you can use:

Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(5000);

ND4J: Improved Rank 0/1 Array Support

In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, getting sub-arrays using INDArray.get(NDArrayIndex...) or when creating arrays from Java arrays/scalars. Now, behaviour should be more consistent for these rank 0/1 cases. Note that to maintain the old behaviour for getRow and getColumn (i.e., return a rank 2 array with shape [1,x] and [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.

DL4J: Attention layers added

Deeplearning4J

Deeplearning4J: Features and Enhancements

Added MKL-DNN support for Conv/Pool/BatchNorm/LRN layers. MKL-DNN will be used automatically when using nd4j-native backend. (Link, Link)
L1/L2 regularization now made into a class; weight decay added, with better control as to when/how it is applied. See this page for more details on the difference between L2 and weight decay. In general, weight decay should be preferred to L2 regularization. (Link, Link)
Added dot product attention layers: AttentionVertex, LearnedSelfAttentionLayer, RecurrentAttentionLayer and SelfAttentionLayer
The parameter/activation datatype for new networks can be set using the dataType(DataType) method on NeuralNetConfiguration.Builder (Link)
MultiLayerNetwork/ComputationGraph can be converted between (floating point) datatypes FP16/32/64 for the parameters and activations using the MultiLayerNetwork/ComputationGraph.convertDataType(DataType) methods (Link, Link)
EmbeddingLayer and EmbeddingSequenceLayer builders now have .weightInit(INDArray) and .weightInit(Word2Vec) methods for initializing parameters from pretrained word vectors (Link)
PerformanceListener can now be configured to report garbage collection information (number/duration) (Link)
Evaluation class will now check for NaNs in the predicted output and throw an exception instead of treating argMax(NaNs) as having value 0 (Link)
Added ModelAdapter for ParallelInference for convenience and for use cases such as YOLO (allows improved performance by avoiding detached (out-of-workspace) arrays) (Link)
Added GELU Activation function (Link)
Added BertIterator (a MultiDataSetIterator for BERT training - supervised and unsupervised) (Link)
Added validation to MultiLayerNetwork/ComputationGraph that throws an exception when attempting to perform Regression evaluation on a classifier, or vice-versa (Link, Link)
Added ComputationGraph.output(List<String> layers, boolean train, INDArray[] features, INDArray[] featureMasks) method to get the activations for a specific set of layers/vertices only (without redundant calculations) (Link)
Weight initialization for networks is now implemented as classes (not just enumerations) and hence is now extensible via IWeightInit interface (Link); i.e., custom weight initializations are now supported (Link, Link)
Added Capsule Network layers (no GPU acceleration until next release) - CapsuleLayer, CapsuleStrengthLayer and PrimaryCapsules (Link)
Added Cifar10DataSetIterator to replace CifarDataSetIterator (Link, Link)
Keras import: Importing models from InputStream is now supported (Link, Link)
Layer/NeuralNetConfiguration builders now have getter/setter methods also, for better Kotlin support (Link)
Most JavaScript dependencies and fonts for UI have been migrated to WebJars (Link)
CheckpointListener now has static availableCheckpoints(File), loadCheckpointMLN(File, int) and loadLastCheckpointMLN(File) etc methods (Link)
MultiLayerNetwork/ComputationGraph now validate and throw an exception in certain incompatible RNN configurations, like truncated backpropagation through time combined with LastTimeStepLayer/Vertex (Link)
Added BERT WordPiece tokenizers (Link)
Deeplearning4j UI now has multi-user/multi-session support - use UIServer.getInstance(boolean multiSession, Function<String,StatsStorage>) to start UI in multi-session mode (Link)
Layer/NeuralNetworkConfiguration builder method validation standardized and improved (Link)
WordVectorSerializer now supports reading and exporting text format vectors via WordVectorSerializer.writeLookupTable and readLookupTable (Link)
Updated to JavaCPP, JavaCPP presets, and JavaCV version 1.5 (Link)
Added EvaluationBinary false alarm rate calculation (Link)
ComputationGraph GraphBuilder now has an appendLayer method that can be used to add layers connected to the last added layer/vertex (Link)
Added Wasserstein loss function (Link)
Keras import: Improved errors/exceptions for lambda layer import (Link)
Apache Lucene/Solr upgraded from 7.5.0 to 7.7.1 (Link)
KMeans clustering strategy is now configurable (Link)
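
The difference between L2 regularization and weight decay mentioned in the feature list above can be sketched in plain Java. This is a conceptual illustration under stated assumptions, not DL4J's implementation: the class, the method names and the toy magnitude-normalizing updater are all hypothetical. It assumes the usual formulation, in which L2 adds coeff * w to the gradient before the updater runs, whereas weight decay shrinks the weight directly after the updater has run.

```java
final class RegularizationSketch {
    /** Toy adaptive updater (hypothetical): normalizes the gradient so each step has size of about lr. */
    static double adaptiveStep(double grad, double lr) {
        return lr * grad / (Math.abs(grad) + 1e-8);
    }

    /** L2: the penalty gradient (coeff * w) is added BEFORE the updater, so the updater rescales it too. */
    static double l2Update(double w, double grad, double lr, double coeff) {
        return w - adaptiveStep(grad + coeff * w, lr);
    }

    /** Weight decay: the updater sees only the raw gradient; the decay term is applied to the weight directly. */
    static double weightDecayUpdate(double w, double grad, double lr, double coeff) {
        return w - adaptiveStep(grad, lr) - lr * coeff * w;
    }

    public static void main(String[] args) {
        double w = 2.0, grad = 0.5, lr = 0.1, coeff = 0.01;
        // With the normalizing updater the L2 penalty is almost entirely rescaled away,
        // while weight decay still shrinks the weight by lr * coeff * w.
        System.out.println("L2 update:           " + l2Update(w, grad, lr, coeff));
        System.out.println("Weight decay update: " + weightDecayUpdate(w, grad, lr, coeff));
    }
}
```

With plain SGD (a step of lr * grad) the two updates coincide; they diverge only when the updater rescales the gradient, which is why the choice matters for adaptive updaters.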

Deeplearning4J: Bug Fixes and Optimizations

DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly (Link)
cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown (Link)
Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training (Link)
Fixed a bug where dropout instances were incorrectly shared between layers when using transfer learning with dropout (Link, Link)
Fixed issue where tensorAlongDimension could result in an incorrect array order for edge cases and hence exceptions in LSTMs (Link)
Fixed an edge case issue with ComputationGraph.getParam(String) where the layer name contains underscores (Link)
Fixed an edge case with ParallelInference on CUDA where (very rarely) input array operations (such as normalization) may not be fully completed before transferring an array between threads (Link, Link)
Fixed an edge case with KFoldIterator when the total number of examples is not a multiple of the batch size (Link, Link)
Fixed an issue where DL4J UI could throw a NoClassDefFoundError on Java 9/10/11 (Link, Link)
Keras import: added aliases for weight initialization (Link)
Fixed issue where dropout instances would not be correctly cloned when network configuration was cloned (Link)
Fixed workspace issue with ElementwiseVertex with single input (Link)
Fixed issue with UI where detaching StatsStorage could attempt to remove storage twice, resulting in an exception (Link)
Fixed issue where LossMultiLabel would generate NaNs when all labels in minibatch are the same class. Now 0 gradient is returned instead. (Link, Link)
Fixed an issue where DepthwiseConv2D weight could be wrong shape on restoring network from saved format (Link)
Fixed issue where BaseDatasetIterator.next() would not apply preprocessors, if one was set (Link)
Improved default configuration for CenterLossOutputLayer (Link)
Fixed an issue for UNet non-pretrained configuration (Link)
Fixed an issue where Word2Vec VocabConstructor could deadlock under some circumstances (Link)
SkipGram and CBOW (used in Word2Vec) were made native operations for better performance (Link)
Fixed an issue where references to detached StatsListener instances would be maintained, potentially leading to memory issues when using InMemoryStatsListener (Link)
Optimization: Workspaces were added to SequenceVectors and Word2Vec (Link)
Improved validation for RecordReaderDataSetIterator (Link)
Improved handling of unknown words in WordVectors implementation (Link)
Yolo2OutputLayer: Added validation for incorrect labels shape. (Link)
LastTimeStepLayer will now throw an exception when the input mask is all 0s (no data - no last time step) (Link)
Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate method could lead to invalid updater state in some rare cases (Link)
Fixed an issue where Conv1D layer would calculate output length incorrectly in MultiLayerNetwork.summary() (Link)
Async iterators are now used in EarlyStoppingTrainer to improve data loading performance (Link)
EmbeddingLayer and EmbeddingSequenceLayer performance has been improved on CUDA (Link)
Removed outdated/legacy scala tools repository (Link, Link)
Fixed issues in L2NormalizeVertex equals/hashcode methods (Link)
Fixed Workspace issue in ConvolutionalListener (Link)
Fixed EvaluationBinary falsePositiveRate calculation (Link)
Added validation and useful exception for MultiLayerNetwork.output(DataSetIterator) methods (Link)
Fixed minor issue where ComputationGraph.summary() would throw a NullPointerException if init() had not already been called (Link)
Fixed a ComputationGraph issue where an input into a single layer/vertex repeated multiple times could fail during training (Link)
Improved performance for KMeans implementation (Link)
Fixed an issue with rnnGetPreviousState for RNNs in 'wrapper' layers such as FrozenLayer (Link)
Keras import: Fixed an issue with order of words when importing some Keras tokenizers (Link)
Keras import: fixed issue with possible UnsupportedOperationException in KerasTokenizer class (Link)
Keras import: fixed an import issue with models combining embeddings, reshape and convolution layers (Link)
Keras import: fixed an import issue with input type inference for some RNN models (Link)
Fixed some padding issues in LocallyConnected1D/2D layers (Link)

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

Removed reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays (Link)
Added INDArray.close() method to allow users to manually release off-heap memory immediately (Link)
SameDiff: Added TensorFlowImportValidator tool to determine if a TensorFlow graph can likely be imported into SameDiff. Reports the operations used and whether they are supported in SameDiff (Link)
Added Nd4j.createFromNpzFile method to load Numpy npz files (Link)
Added support for importing BERT models into SameDiff (Link, Link)
Added SameDiff GraphTransformUtil for performing transfer learning and other graph modifications (Link, Link, Link)
Evaluation, RegressionEvaluation etc now support 4d (CNN segmentation) data formats; also added Evaluation.setAxis(int) method to support other data formats such as channels-last/NHWC for CNNs and NWC for CNN1D/RNNs. Defaults to axis 1 (which matches DL4J CNN and RNN data formats) (Link, Link)
Added a basic ("technology preview") version of the SameDiff UI. Should be considered early WIP with breaking API changes expected in future releases. Supports plotting of SameDiff graphs as well as various metrics (line charts, histograms, etc)
- Currently embedded in the DL4J UI - call UIServer.getInstance() then go to localhost:9000/samediff to access.
- For more details, see 1, 2, 3
Added DotProductAttention and MultiHeadDotProductAttention operations (Link)
Added Nd4j.exec(Op) and Nd4j.exec(CustomOp) convenience methods (Link)
ND4J/SameDiff - new operations added:
SameDiff TensorFlow Import
- Import of TF Assertions added (Link)
- Support/fixes for control dependencies (Link)
- Support/fixes for TensorArray and related ops (Link, Link, Link)
nd4j-common - tar/tar.gz support added; Zip file listing and single file extraction added (Link, Link)
SameDiff: reduction operations now support "dynamic" (non-constant) inputs for axis argument (Link)
ROCBinary now has .getROC(int outputNum) method (Link)
SameDiff: L1/L2 regularization added (Link, Link)
SameDiff: Added SDVariable.convertToVariable() and convertToConstant() - to change SDVariable type (Link)
Added checks and useful exceptions for reductions on empty arrays (Link)
SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Link)
SameDiff TensorFlow import: import can now be overridden for cases such as user-defined functions (Link, Link)
Libnd4j (c++) benchmarking framework added (Link)
Added OpExecutioner.inspectArray(INDArray) method to get summary statistics for analysis/debugging purposes (Link)
Added INDArray.reshape(char order, boolean enforceView, long... newShape) to reshape array whilst throwing an exception (instead of returning a copy) if the reshape cannot be performed (Link, Link)
Added SDVariable method overloads (plus, minus, times, etc) for Kotlin (Link)
Added SDVariable convenience methods for dot, reshape, permute (Link)
Added SameDiff SDIndex.point(long, boolean keepDim) method (to keep point indices in output array as size 1 axis) (Link)
Added SameDiff ProtoBufToFlatBufConversion command line tool for doing TensorFlow frozen model (protobuf) to SameDiff FlatBuffers conversion (Link)
Improved DataType validation for SameDiff operations (Link)

ND4J/SameDiff: API Changes (Transition Guide): 1.0.0-beta3 to 1.0.0-beta4

ND4J datatypes - significant changes, see highlights at top of this section
nd4j-base64 module (deprecated in beta3) has been removed. Nd4jBase64 class has been moved to nd4j-api (Link)
When specifying arguments for op execution along dimension (for example, reductions) the reduction axes are now specified in the operation constructor - not separately in the OpExecutioner call. (Link)
Removed old Java loop-based BooleanIndexing methods. Equivalent native ops should be used instead. (Link)
Removed Nd4j.ENFORCE_NUMERICAL_STABILITY, Nd4j.copyOnOps, etc (Link)
SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Link)
Nd4j.emptyLike(INDArray) has been removed. Use Nd4j.like(INDArray) instead (Link)
org.nd4jutil.StringUtils removed; suggest using Apache commons lang3 StringUtils instead (Link)
ND4J Jackson RowVector(De)Serializer has been deprecated due to datatype changes; NDArrayText(De)Serializer should be used instead (Link, Link)
nd4j-instrumentation module has been removed due to lack of use/maintenance (Link)

ND4J/SameDiff: Bug Fixes and Optimizations

Fixed bug with InvertMatrix.invert() with [1,1] shape matrices (Link)
Fixed edge case bug for Updater instances with length 1 state arrays (Link)
Fixed edge case with FileDocumentIterator with empty documents (Link)
SameDiff: Numerous fixes and enhancements
- 1, 2, 3, 4
- Improved functionality for losses (Link, Link, Link, Link)
- Improved errors for missing/misspelled placeholders (Link)
- Fixed edge cases in loops (Link, Link)
Fixed issue with Nd4j.vstack on 1d arrays returning 1d output, not 2d stacked output (Link)
Conv2D op can infer kernel size from input arrays directly when required (Link, Link)
Fixed an issue with Numpy format export - Nd4j.toNpyByteArray(INDArray) (Link)
Fixes for SameDiff when it is used within an external workspace (Link)
Fixed an issue where empty NDArrays would be reported as having scalar shape information, length 1 (Link)
Optimization: libnd4j (c++) indexing for ops will use uint for faster offset calculations when required and possible (Link)
Optimization: libnd4j loops performance improved for faster execution of some operations (Link, Link, Link)
Local response normalization op optimized (Link, Link)
Fixed an issue with INDArray.repeat on some view arrays (Link)
Improved performance for execution of some operations on view arrays (Link)
Improved performance on broadcast operations (Link, Link, Link)
Improved performance for non-EWS reduction along dimension operations (Link)
Improved performance for IndexReduce operations (Link) and small reductions (Link)
Improved performance of one_hot operation (Link), tanh operation (Link)
Improved performance for transform operations (Link)
Optimization: empty arrays are created only once and cached (as they are immutable) (Link)
Improved performance on operations using tensor along dimension for parallelization (Link, Link)
Improved performance on "reduce 3" reduction operations (Link)
Improved handling of CUDA contexts in heavily multi-threaded environments (Link)
Fixed an issue where Evaluation.reset() would incorrectly clear the String class labels (Link)
SameDiff: Improved gradient calculation performance/efficiency; "gradients" are now no longer defined for non-floating-point variables, and variables that aren't required to calculate loss or parameter gradients (Link)
Behaviour of IEvaluation instances now no longer depends on the global (default) datatype setting (Link)
INDArray.get(point(x), y) or .get(y, point(x)) now returns rank 1 arrays when performed on rank 2 arrays (Link)
Removed reliance on Guava for SameDiff, fixing potential issue for Java 11/12 and when earlier versions of Guava are on the classpath (Link, Link)
ND4J indexing (INDArray.get) implementation rewritten for better performance and reliability (Link)
Fixes for local response normalization backprop op (Link)

ND4J: Known Issues

Most CustomOperation operations (such as those used in SameDiff) are CPU only until next release. GPU support was not completed in time for 1.0.0-beta4 release.
Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See Issue 7637. Workaround: Disable MKL-DNN for conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier) or disable MKL-DNN globally, for Skylake CPUs.

DataVec

DataVec: Features and Enhancements

Added PythonTransform (arbitrary python code execution for pre processing) (Link, Link)
Added FirstDigit (Benford's law) transform (Link, Link)
StringToTimeTransform now supports setting Locale (Link, Link)
Added StreamInputSplit for creating local data pipelines where data is stored remotely on storage such as HDFS or S3 (Link, Link)
LineRecordReader (and subtypes) now have the option to define the character set (Link)
Added TokenizerBagOfWordsTermSequenceIndexTransform (TFIDF transform), GazeteerTransform (binary vector for word present) and MultiNlpTransform transforms; added BagOfWordsTransform interface (Link)
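
To illustrate the idea behind the FirstDigit (Benford's law) transform listed above, the following is a minimal plain-Java sketch of extracting the leading digit of a number. It does not use the DataVec API; the class name FirstDigitSketch, the method firstDigit, and the choice to return 0 for a zero input are assumptions made for illustration only.

```java
import java.math.BigDecimal;

final class FirstDigitSketch {
    /**
     * Returns the leading non-zero decimal digit of |value|, or 0 when value is zero.
     * NaN and infinite inputs are not handled here (BigDecimal.valueOf rejects them).
     */
    static int firstDigit(double value) {
        // BigDecimal.valueOf goes through Double.toString, so 0.00345 becomes
        // unscaled value 345 with scale 5; the leading zeros disappear.
        String digits = BigDecimal.valueOf(Math.abs(value)).unscaledValue().toString();
        return digits.charAt(0) - '0';
    }

    public static void main(String[] args) {
        double[] amounts = {0.00345, -742.1, 19.99, 0.0};
        for (double a : amounts) {
            System.out.println(a + " -> " + firstDigit(a));
        }
    }
}
```

Under Benford's law, leading digits of many naturally occurring quantities are not uniformly distributed, which is why a first-digit column can be a useful feature for anomaly or fraud detection.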

DataVec: Optimizations and Bug Fixes

Fixed issue with ImageLoader.scalingIfNeeded (Link)

Arbiter

Arbiter: Enhancements

Arbiter now supports genetic algorithm search (Link)

Arbiter: Fixes

Fixed an issue where early stopping used in Arbiter would result in a serialization exception (Link)

Version 1.0.0-beta3

Highlights - 1.0.0-beta3 Release

ND4J/Deeplearning4j: Added support for CUDA 10.0. Dropped support for CUDA 8.0. (1.0.0-beta3 release has CUDA 9.0, 9.2 and 10.0 support)
SameDiff now supports training and evaluation from DataSetIterator and MultiDataSetIterator. Evaluation classes have been moved to ND4J.
DL4J Spark training (gradient sharing) is now fully fault tolerant, and has improvements for threshold adaption (potentially more robust convergence). Ports can now be easily configured independently on master/workers.

Deeplearning4J

Deeplearning4J: New Features

Added OutputAdapter interface and MultiLayerNetwork/ComputationGraph.output method overloads using OutputAdapter (avoids allocating off-heap memory that needs to be cleaned up by GC) Link, Link, Link
Added ComputationGraph/MultiLayerNetwork rnnTimeStep overload with user-specified workspace. Link
Added Cnn3DLossLayer Link
ParallelInference: Instances can now update the model in real-time (without re-init) Link
ParallelInference: Added ParallelInference INPLACE mode Link
Added validation for incompatible loss/activation function combinations (such as softmax+nOut=1, or sigmoid+mcxent). New validation can be disabled using outputValidation(false) Link
Spark training: Added full fault tolerance (robust failure recovery) for gradient sharing implementation Link Link
Spark training now supports configuring ports more flexibly (and differently for different workers) using PortSupplier Link Link
Spark training: overhauled gradient sharing threshold adaption algorithms; made it possible to customize threshold settings, plus made defaults more robust to initial threshold configuration improving convergence speed in some cases. Link
Spark training: implemented chunked messaging to reduce memory requirements (and insufficient buffer length issues) for large messages Link
Spark training: Added MeshBuildMode configuration for improved scalability for large clusters Link
Spark network data pipelines: added FileBatch, FileBatchRecordReader etc for "small files" (images etc) distributed training use cases Link
Added FailureTestingListener for fault tolerance/debugging purposes Link
Upgraded Apache Lucene/Solr to version 7.5.0 (from 7.4.0) Link
Added system properties (org.deeplearning4j.tempdir and org.nd4j.tempdir) to allow overriding of the temporary directories ND4J and DL4J use Link Link
Made MultiLayerNetwork/ComputationGraph.clearLayerStates methods public (was protected) Link
AbstractLayer.layerConf() method is now public Link
ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper Link
Improved validation and error messages for invalid inputs/labels in Yolo2OutputLayer Link
Spark training: added SharedTrainingMaster.Builder.workerTogglePeriodicGC and .workerPeriodicGCFrequency to easily configure the ND4J garbage collection configuration on workers. Set default GC to 5 seconds on workers Link
Spark training: added threshold encoding debug mode (logs current threshold and encoding statistics on each worker during training). Enable using SharedTrainingConfiguration.builder.encodingDebugMode(true). Note this operation has computational overhead. Link
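
The org.deeplearning4j.tempdir and org.nd4j.tempdir system properties mentioned above can be set on the command line with -D flags or programmatically. The sketch below shows the programmatic route using only the JDK. The directory names are hypothetical, and it is an assumption (not stated in these notes) that the properties need to be set before any DL4J or ND4J class is first used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempDirConfigSketch {
    /** Points both libraries at user-chosen temporary directories via system properties. */
    static void configureTempDirs(Path nd4jDir, Path dl4jDir) {
        System.setProperty("org.nd4j.tempdir", nd4jDir.toString());
        System.setProperty("org.deeplearning4j.tempdir", dl4jDir.toString());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch location; any writable directory would do.
        Path base = Files.createTempDirectory("dl4j-scratch");
        Path nd4jDir = Files.createDirectories(base.resolve("nd4j"));
        Path dl4jDir = Files.createDirectories(base.resolve("dl4j"));

        // Assumed to be required before the libraries initialise.
        configureTempDirs(nd4jDir, dl4jDir);

        System.out.println(System.getProperty("org.nd4j.tempdir"));
        System.out.println(System.getProperty("org.deeplearning4j.tempdir"));
    }
}
```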

Deeplearning4J: Bug Fixes and Optimizations

Fixed an issue where L1/L2 and updaters (Adam, Nesterov, etc) were applied before dividing gradients by minibatch to obtain average gradient. To maintain old behaviour, use NeuralNetConfiguration.Builder.legacyBatchScaledL2(true) Link.
- Note that learning rates may need to be decreased for some updaters (such as Adam) to account for this change vs. earlier versions. Some other updaters (such as SGD, NoOp, etc) should be unaffected.
- Note that deserialized (loaded) configurations/networks saved in 1.0.0-beta2 or earlier will default to old behaviour for backward compatibility. All new networks (created in 1.0.0-beta3) will default to the new behaviour.
Fixed an issue where EarlyStoppingScoreCalculator would not correctly handle "maximize score" cases instead of minimizing Link
Fixed order (BGR vs. RGB) for VGG16ImagePreProcessor channel offset values Link
Fixed bug with variational autoencoders using weight noise Link
Fixed issue with BaseDataSetIterator not respecting the 'maximum examples' configuration Link
Optimization: A workspace is now used for ComputationGraph/MultiLayerNetwork evaluation methods (avoids allocating off-heap memory during evaluation that must be cleaned up by garbage collector) Link
Fixed an issue where shuffling combined with a subset for MnistDataSetIterator would not maintain the same subset between resets Link
Fixed issue with StackVertex.getOutputType Link
Fixed issue with CNN to/from RNN preprocessors handling of mask arrays Link
Fixed issue with VGG16 non-pretrained configuration in model zoo Link
Fixed issue with TransferLearning nOutReplace where multiple layers in a row are modified Link
Fixed issue with CuDNN workspaces where backpropagation is performed outside of a standard fit call Link
Fixed an issue with dropout masks being cleared prematurely on output layers in ComputationGraph Link
RecordReaderMultiDataSetIterator now supports 5D arrays (for 3D CNNs) Link
Fixed bug in multi input/output ComputationGraphs with TBPTT combined with both masking and different number of input/output arrays Link
Improved input validation/exceptions for batch normalization layer Link
Fixed bug with TransferLearning GraphBuilder nOutReplace when combined with subsampling layers Link
SimpleRnnParamInitializer now properly respects bias initialization configuration Link
Fixed SqueezeNet zoo model non-pretrained configuration Link
Fixed Xception zoo model non-pretrained configuration Link
Fixed an issue with some evaluation signatures for multi-output ComputationGraphs Link
Improved MultiLayerNetwork/ComputationGraph summary method formatting for large nets Link
Fixed an issue where gradient normalization could result in NaNs if gradient is exactly 0.0 for all parameters in a layer Link
Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate could throw an exception for SGD and NoOp updaters Link
Fixed an issue with StackVertex plus masking in some rare cases Link
Fixed an issue with JSON deserialization of frozen layers in pre-1.0.0-alpha format Link
Fixed an issue where GraphBuilder.removeVertex can fail under some limited circumstances Link
Fixed a bug in CacheableExtractableDataSetFetcher Link
DL4J Spark training: Fixed issues with thread/device affinity for multi-GPU training + evaluation Link
DL4J Spark training: Made all Aeron threads daemon threads to prevent Aeron from stopping JVM shutdown when all other threads have completed Link
Added cudnnAllowFallback configuration for BatchNormalization layer (fallback to built-in implementation if CuDNN fails unexpectedly) Link
Fixed some rare concurrency issues with multi-worker (multi-GPU) nodes for Spark training Link Link
Fixed an issue with BatchNormalization layers that prevented the mean/variance estimates from being synced properly on each worker for GradientSharing training, causing convergence issues Link
Added a check to detect ZipSlip CVE attempts in ArchiveUtils Link
DL4J Spark training and evaluation: methods now use Hadoop Configuration from Spark context to ensure runtime-set configuration is available in Spark functions reading directly from remote storage (HDFS etc) Link
MultiLayerNetwork and ComputationGraph now properly support more than Integer.MAX_VALUE parameters Link Link
Added data validation for Nd4j.readTxt - now throws exception on invalid input instead of returning incorrect values Link
Fixed an issue with KNN implementation where a deadlock could occur if an invalid distance function (one returning "distances" less than 0) was utilized Link
Added synchronization to loading of Keras import models to avoid thread safety issues in the underlying HDFS library used for loading Link
Fixed rare issue for Async(Multi)DataSetIterator with large prefetch values Link
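
For readers unfamiliar with the ZipSlip check mentioned above: the vulnerability arises when an archive entry name such as ../../etc/passwd is resolved against the extraction directory without validation. The following plain-Java sketch shows one common form of the check. It is not the ArchiveUtils implementation; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ZipSlipCheckSketch {
    /** Returns true when the archive entry, once resolved, still lies inside targetDir. */
    static boolean staysInside(Path targetDir, String entryName) {
        Path base = targetDir.toAbsolutePath().normalize();
        // normalize() collapses "../" segments so traversal attempts become visible.
        Path resolved = base.resolve(entryName).normalize();
        // Path.startsWith compares whole path elements, not raw string prefixes.
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        Path target = Paths.get("extracted"); // hypothetical output directory
        System.out.println(staysInside(target, "model/weights.bin"));
        System.out.println(staysInside(target, "../../etc/passwd"));
    }
}
```

An extraction routine would typically refuse (for example, by throwing an exception) any entry for which such a check returns false.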

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

IEvaluation classes in DL4J have been deprecated and moved to ND4J so they are available for SameDiff training. Functionality and APIs are unchanged
MultiLayerConfiguration/ComputationGraphConfiguration pretrain(boolean) and backprop(boolean) have been deprecated and are no longer used. Use fit and pretrain/pretrainLayer methods instead. Link
ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper which should be used instead Link
deeplearning4j-nlp-korean module now has Scala version suffix due to scala dependencies; new artifact ID is deeplearning4j-nlp-korean_2.10 and deeplearning4j-nlp-korean_2.11 Link

Deeplearning4J: Known issues: 1.0.0-beta3

Running multiple Spark training jobs simultaneously on one physical node (i.e., multiple JVMs from one or more Spark jobs) may cause problems with network communication. A workaround is to manually set a unique stream ID in the VoidConfiguration. Use a unique (or random) integer value for each job Link

Deeplearning4J: Keras Import

Fixed import issue due to Keras JSON format changes for Keras 2.2.3+ Link
Added Keras import for timeseries preprocessing Link
Elephas Link
Fixed issue with importing models with reshaping after an embedding layer Link
Added support for Keras masking layers Link
Fixed JSON deserialization issue with some layers/preprocessors, such as Permute Link
Fixed issue with Keras import of Nadam configuration Link

ND4J

ND4J: New Features

Added SameDiff training and evaluation: SameDiff instances can now be trained directly using DataSetIterator and MultiDataSetIterator, and evaluated using IEvaluation instances (that have been moved from DL4J to ND4J) Link
Added GraphServer implementation: c++ inference server for SameDiff (and Tensorflow, via TF import) with Java API Link
SameDiff instances can now be loaded from serialized FlatBuffers format (SameDiff.asFlatFile plus fromFlatFile) Link Link
Added MKL-DNN support for some operations (Conv2d, etc) Link
Upgraded ND4J (and DataVec) to Arrow 0.11.0 Link, which also fixes Link
Added Nd4j.where op method (same semantics as numpy.where) Link
Added Nd4j.stack op method (combine arrays + increase array rank by 1) Link
Libnd4j new ops:
- Matrix band part Link
- Scatter ND, ND-add, ND-sub and ND-update ops Link
- Sparse softmax cross entropy loss with logits Link
- Histogram fixed width op Link
- broadcast_to op Link
- deconv3d op added Link
- Unsorted segment ops added Link
- Segment_X backprop ops added Link
- batchnorm_new op added that supports multiple axes for mean/variance Link
- GRU cell backprop added Link
Nd4j Preconditions class now has methods for formatting INDArray arguments Link, Link
SameDiff loss functions: cleanup plus forward pass implementation Link
CudaGridExecutioner now warns that exception stack traces may be delayed to avoid confusion in debugging exceptions occurring during asynchronous execution of ops Link
JavaCPP and JavaCPP-presets have been upgraded to version 1.4.3 Link
Improved Javadoc on SDVariable class Link
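
The Nd4j.where entry above refers to numpy.where semantics. As a reminder of what the three-argument form means, here is a plain-Java sketch over primitive arrays: each output element is taken from x where the condition holds and from y otherwise. This is not the ND4J API or its implementation; the names are hypothetical, and broadcasting and the single-argument form are left out.

```java
import java.util.Arrays;

final class WhereSemanticsSketch {
    /** Element-wise selection: out[i] = condition[i] ? x[i] : y[i]. */
    static double[] where(boolean[] condition, double[] x, double[] y) {
        if (condition.length != x.length || x.length != y.length) {
            throw new IllegalArgumentException("All inputs must have the same length");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = condition[i] ? x[i] : y[i];
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] condition = {true, false, true};
        double[] x = {1, 2, 3};
        double[] y = {10, 20, 30};
        System.out.println(Arrays.toString(where(condition, x, y))); // [1.0, 20.0, 3.0]
    }
}
```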

ND4J: Bug Fixes and Optimizations

Fixes for android: Remove use of RawIndexer Link
Libnd4j custom ops: conv op weight layouts are now not dependent on the input format (NCHW/NHWC) - now always [kH, kW, inChannels, outChannels] for 2d CNNs, [kH, kW, kD, inChannels, outChannels] for 3d CNNs. Link, Link
Libnd4j native op fixes:
- Dot operation backprop Link, determinant Link
- Backprop op fix for the broadcast case for some pairwise transform custom op implementations Link
- Fix for reverse custom op with rank 1 inputs Link
- ATan2 op is now broadcastable Link
- Boolean custom op broadcast fixes/additions Link
- Scatter op edge case fixes Link
- ArgMin shape function fix Link, negative axis fix Link
- Unique op fix Link
- Pad op fix Link
- Fixed where op shape function Link
- SVD rank 1 edge case fix Link
- Range op Link
- Split and space_to_batch fixes Link
- Broadcast dynamic shape Link
- embedding_lookup op now supports multiple input arrays Link
- Matrix determinant op edge case (rank 0 result) shape fix Link
SameDiff TensorFlow import: fixes for multiple operations Link, Link, Link, Link
SameDiff: Improved error handling for multiple outputs case Link
Fixed issue where INDArray.permute would not correctly throw an exception for invalid length case Link
Fixed issues with INDArray.get/put with SpecifiedIndex Link, Link
Minor change to DataSet.merge - signature now accepts any DataSet subtypes Link
INDArray.transposei operation was not in-place Link
Fixed issues with INDArray.mmul with MMulTranspose Link
Added additional order validation for ND4J creation methods (create, rand, etc) Link
Fix for ND4J binary deserialization (BinarySerde) when deserializing from heap byte buffers Link
Fixed issue with Nd4j-common ClassPathResource path resolution in some IDEs Link
Fixed issue where INDArray.get(interval) on rank 1 array would return rank 2 array Link
Fixed a validation issue with Nd4j.gemm/mmuli on views Link Link
INDArray.assign(INDArray) no longer allows assigning different shape arrays (other than scalar/vector cases) Link
NDarrayStrings (and INDArray.toString()) now always uses US locale when formatting numbers Link
Fixed an issue with GaussianDistribution specific to V100 GPUs Link
Fixed an issue with bitmap compression/encoding specific to V100 GPUs Link
Transforms.softmax now throws an error on unsupported shapes instead of simply not applying operation Link
VersionCheck functionality: handle case where SimpleFileVisitor is not available on earlier versions of Android Link
SameDiff convolution layer configuration (Conv2dConfig/Conv3dConfig/Pooling3dConfig etc) have had parameter names aligned Link
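
The change to always use the US locale when formatting numbers (NDArrayStrings and INDArray.toString() above) matters because the JVM's default locale affects the decimal separator. The following JDK-only sketch shows the difference; it illustrates the underlying behaviour and is not taken from the ND4J source.

```java
import java.util.Locale;

final class LocaleFormattingSketch {
    /** Formats a value with two decimals using the given locale's decimal separator. */
    static String format(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.GERMANY, 1234.5)); // 1234,50
        System.out.println(format(Locale.US, 1234.5));      // 1234.50
    }
}
```

Output that uses a comma as the decimal separator is ambiguous inside comma-delimited array text, which is one reason to fix the locale explicitly.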

ND4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

CUDA 8.0 support has been removed. CUDA 9.0, 9.2 and 10.0 support is available in 1.0.0-beta3
nd4j-base64 module contents have been deprecated; use the equivalent classes in nd4j-api from now on Link
Some classes in nd4j-jackson module have been deprecated; use the equivalent classes in nd4j-api from now on Link

ND4J: Known issues: 1.0.0-beta3

Android users may need to manually exclude the (now deprecated) module nd4j-base64. This is due to org.nd4j.serde.base64.Nd4jBase64 class being present in both nd4j-api and nd4j-base64 modules. Both versions have identical content. Use exclude group: 'org.nd4j', module: 'nd4j-base64' to exclude.

DataVec

DataVec: New Features

Added NativeImageLoader method overloads for org.opencv.core.Mat and String as filename Link

DataVec: Optimizations and Bug Fixes

Fix for JDBCRecordReader handling of null values Link
Improved errors/validation for ObjectDetectionRecordReader for invalid input (where image object centers are outside of image bounds) Link
Fixed issue where FileSplit used methods that are unavailable on earlier versions of Android Link
Added SerializableHadoopConfiguration and BroadcastHadoopConfigHolder for cases where a Hadoop configuration is required in Spark functions Link Link
Fixed issue with JDBCRecordReader's handling of real-valued column result types Link
Added validation and useful exception for CSVRecordReader/LineRecordReader being used without initialization Link

Arbiter

Arbiter: Fixes

Fixed some issues with dropout layers Link

ND4S

Added conversion between org.nd4j.linalg.primitives.Pair/Triple and Scala Tuple Link

Version 1.0.0-beta2

Highlights - 1.0.0-beta2 Release

ND4J/Deeplearning4j: Added support for CUDA 9.2. Dropped support for CUDA 9.1. (1.0.0-beta2 release has CUDA 8.0, 9.0 and 9.2 support)
Deeplearning4j: New SameDiff layers with training support - Link Link
Deeplearning4j resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property
ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)
ND4J: nd4j-native-platform will now use Intel MKL-DNN as the default/bundled BLAS implementation (replacing OpenBLAS as the previous default)
Deeplearning4j: Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs (to assist with debugging and tuning memory configuration).
Deeplearning4j - new layers: Locally connected 1d Link, Locally connected 2d Link

Deeplearning4J

Deeplearning4J: New Features

Added new SameDiff layers (automatic differentiation - only single class, forward pass definition required) to DL4J with full training support - SameDiffLayer, SameDiffVertex, SameDiffOutputLayer, SameDiffLambdaLayer, SameDiffLambdaVertex - note that these are CPU-only execution for now Link Link Link
Resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property. Note that it is also possible to set a different base location for downloads (for local mirrors of DL4J resources) Link
Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs. Same information is available (without a crash) for MultiLayerNetwork/ComputationGraph.memoryInfo methods. Can be disabled (or output directory set) using system properties - Link
Added Composite[Multi]DataSetPreProcessor to enable multiple [Multi]DataSetPreProcessors to be applied in a single iterator Link
Added ComputationGraph evaluate methods for multi-output networks: evaluate(DataSetIterator, Map<Integer,IEvaluation[]>) and evaluate(MultiDataSetIterator, Map<Integer,IEvaluation[]>) Link
Added JointMultiDataSetIterator - utility iterator used to create MultiDataSetIterator from multiple DataSetIterators Link
GraphVertices may now have trainable parameters directly (not just enclose layers with trainable parameters) Link
Added MultiLayerNetwork/ComputationGraph getLearningRate methods Link
Added RandomDataSetIterator and RandomMultiDataSetIterator (mainly for testing/debugging) Link Link
Added cyclical "1cycle" schedule for learning rate schedules etc - Link
RDD repartitioning for Spark training is more configurable (adds Repartitioner interface) Link
Added ComputationGraph.getIterationCount() and .getEpochCount() for consistency with MultiLayerNetwork Link
Added locally connected 1d layer Link Link
Spark "data loader" API (mainly for Spark) Link Link Link
Spark evaluation: added evaluation method overloads that allow specifying the number of evaluation workers (less than number of Spark threads) Link
CnnSentenceDataSetIterator now has a Format argument, and supports outputting data for RNNs and 1D CNNs Link
Added ComputationGraph/MultiLayerNetwork.pretrain((Multi)DataSetIterator, int epochs) method overloads Link
MultiLayerNetwork and ComputationGraph now have output method overloads where the network output can be placed in the user-specified workspace, instead of being detached Link Link. This can be used to avoid creating INDArrays that need to be garbage collected before native memory can be freed.
EmbeddingSequenceLayer now supports [minibatch,1,seqLength] format sequence data in addition to [minibatch,seqLength] format data Link
CuDNN batch norm implementation will now be used for rank 2 input, not just rank 4 input Link
Environment variables and system properties for DL4J have been centralized into DL4JResources and DL4JEnvironmentVars classes, with proper descriptions Link Link
MultiLayerNetwork and ComputationGraph output/feedForward/fit methods are now thread-safe via synchronization. Note that concurrent use is not recommended due to performance (instead: use ParallelInference); however the now-synchronized methods should avoid obscure errors due to concurrent modifications Link
BarnesHutTSNE now throws a useful exception in the case where the distance metric is undefined (for example, all zeros plus cosine similarity) Link
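
The BarnesHutTSNE entry above mentions an undefined distance metric for all-zero input with cosine similarity. The plain-Java sketch below shows why: the cosine similarity divides by the product of the vector norms, which is zero for an all-zero vector. The class and method names are hypothetical and this is not the DL4J implementation.

```java
final class CosineSimilaritySketch {
    /** Cosine similarity: dot(a, b) / (||a|| * ||b||). Yields NaN if either vector is all zeros. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // For an all-zero vector this is 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(new double[]{1, 0}, new double[]{1, 0})); // 1.0
        System.out.println(cosineSimilarity(new double[]{0, 0}, new double[]{1, 2})); // NaN
    }
}
```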

Deeplearning4J: Bug Fixes and Optimizations

ComputationGraph.addListeners was not working correctly if listeners were already present Link, Link
TinyImageNetDataSetIterator did not validate/correctly use input shape configuration Link, Link
BatchNormalization layer now correctly asserts that nOut is set if required (instead of unfriendly shape errors later) Link
Fixed issue where OutputLayer may not initialize parameter constraints correctly Link
Fixed performance issue with Nesterov updater using CPU-only op for CUDA execution Link
Removed TerminationCondition for DL4J optimizers - was not used in practice, and had minor overhead Link
Fixed issue where EvaluativeListener could hit a workspace validation exception when workspaces are enabled Link
Fixed issue where TrainingListener.onEpochStart/onEpochEnd were not being called correctly for ComputationGraph Link
Fixed workspace issue with TensorFlowCnnToFeedForwardPreProcessor Link
Performance optimization for BatchNormalization when using CuDNN Link
Performance optimization: Dropout will be applied in-place when safe to do so, avoiding a copy Link
Added CuDNN implementation of Dropout Link
Reduced memory use for CuDNN: CuDNN working memory is now shared and reused between layers within a network Link
CuDNN batch normalization implementation would fail with FP16 datatype Link
Fixed issue where Bidirectional LSTM may incorrectly use workspaces causing an exception Link
Fixed issue with early stopping where scores to be maximized (accuracy, f1, etc) were not properly triggering termination conditions Link
Fixed issue where label mask counter could be incorrectly incremented in ComputationGraph.computeGradientAndScore() Link
ComputationGraph was not setting lastEtlTime field during training Link
Fixed issue with AutoEncoder layer when workspaces are enabled Link
Fixed issue with EmbeddingSequenceLayer use of mask arrays Link
Lombok now has provided scope everywhere, so it is no longer on the user classpath when using DL4J Link
Fixed issue with WordVectorSerializer.readParagraphVectors(File) initialization of label source Link
Spark training (gradient sharing) now properly handles empty partition edge case when encountered during training Link
Errors are propagated better/more consistently for Spark gradient sharing training Link
Fixed issue with 1D CNN layers with mask arrays and stride > 1 (masks not being correctly downsized) Link
DL4J Batch norm implementation was not correctly adding epsilon value during inference, only during training (CuDNN unaffected) Link
CuDNN subsampling layers with max pooling and ConvolutionMode.SAME may have taken padding value (0) as the maximum for border values when all non-padding values are less than 0 Link
Spark training with gradient sharing now passes listeners to workers correctly Link
Fixed rare (and non-terminal) concurrent modification issue with UI and FileStatsStorage Link
CuDNN convolution layer now supports dilation > 2 (previously: used DL4J conv layer implementation as a fallback) Link
Yolo2OutputLayer now implements computeScoreForExamples() Link
SequenceRecordReaderDataSetIterator now handles the "no labels" case correctly Link
Fixed issue where BarnesHutTSNE could hit a workspace validation exception Link
EMNIST iterator could produce incorrect data in some cases after a reset Link
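
The max pooling issue with ConvolutionMode.SAME described above can be reproduced in miniature. The sketch below is a one-dimensional, plain-Java illustration: when padded positions contribute the value 0 and every real value in the window is negative, the padding wins the maximum. Treating padding as negative infinity is shown as one way to avoid this; how the CuDNN helper was actually fixed is not stated in these notes, and all names here are hypothetical.

```java
final class PaddedMaxPoolSketch {
    /** Max over the window [start, start + size), treating out-of-range positions as padValue. */
    static double windowMax(double[] input, int start, int size, double padValue) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = start; i < start + size; i++) {
            double v = (i < 0 || i >= input.length) ? padValue : input[i];
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] activations = {-3.0, -1.0, -2.0};
        // Border window covering one padded position plus the first two real values.
        System.out.println(windowMax(activations, -1, 3, 0.0));                      // 0.0 (padding wins)
        System.out.println(windowMax(activations, -1, 3, Double.NEGATIVE_INFINITY)); // -1.0
    }
}
```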

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

GravesLSTM has been deprecated in favor of LSTM due to lack of CuDNN support but otherwise similar accuracy in practice. Use LSTM class instead.
deeplearning4j-modelexport-solr: now uses Lucene/Solr version 7.4.0 (was 7.3.0) Link
Mask arrays for CNN2d layers must be in broadcastable 4d format: [minibatch,depth or 1, height or 1, width or 1] - previously they were 2d with shape [minibatch,height] or [minibatch,width]. This prevents ambiguity in later cases (pooling layers), and allows for more complex masking scenarios (such as masking for different image sizes in same minibatch). Link
Some older/deprecated Model and Layer methods have been removed. (validateInput(), initParams()). Some custom layers may need to be updated as a result Link

Deeplearning4J: 1.0.0-beta2 Known Issues

Windows users are unable to load the HDF5 files used in SvhnLabelProvider (used in HouseNumberDetection example). Linux/Mac users are unaffected. A workaround for windows users is to add the sonatype snapshot dependency org.bytedeco.javacpp-presets:hdf5-platform:jar:1.10.2-1.4.3-SNAPSHOT Link

Deeplearning4J: Keras Import

Keras model import now imports every Keras application
Supports GlobalPooling3D layer import
Supports RepeatVector layer import
Supports LocallyConnected1D and LocallyConnected2D layers
Keras Lambda layers can now be imported by registering custom SameDiff layers
All Keras optimizers are now supported
All advanced activation functions can now be imported.
Many minor bugs have been fixed, including proper weight setting for all configurations of BatchNormalization, improvements to Reshape and SeparableConvolution2D, and full support of Bidirectional layers.

ND4J

ND4J: New Features

ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)
Added the ability to write Numpy .npy format using Nd4j.writeAsNumpy(INDArray,File) and convert an INDArray to a numpy struct in memory using Nd4j.convertToNumpy(INDArray) Link
ND4j-common ClassPathResource: added ClassPathResource.copyDirectory(File) Link
SameDiff: A significant number of new ops, and backprop implementations for existing ops
Added Nd4j.randomBernoulli/Binomial/Exponential convenience methods Link
Added way to disable/suppress ND4J initialization logging via org.nd4j.log.initialization system property Link
SameDiff class - most op/constructor methods now have complete/useful javadoc Link
Workspaces can now be disabled globally, ignoring workspace configuration. This is mainly used for debugging; use Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED) or Nd4j.getWorkspaceManager().setDebugMode(DebugMode.SPILL_EVERYTHING); to enable this. Link [Link]
Added EnvironmentalAction API for environment variable processing Link
ND4J environment variables and system properties have been centralized in ND4jEnvironmentVars and ND4jSystemProperties classes Link and Link

ND4J: Bug Fixes and Optimizations

SameDiff: a significant number of bug fixes for execution and individual ops
Fixed issue with INDArray.toDoubleArray() with true scalars (rank 0 arrays) Link
Fixed issue with DataSet.sample() not working for rank 3+ features Link
IActivation implementations now validate/enforce same shape for activations and gradients Link
Fixed issue with muliColumnVector where vector is 1d Link
ImagePreProcessingScaler now supports serialization via NormalizerSerializerStrategy and ModelSerializer Link
Performance optimization for threshold encoding used in DL4J's Spark gradient sharing distributed training implementation Link
SameDiff: Fixed issue where memory wasn't always released after execution Link
DataSet.save() and MultiDataSet.save() methods now save example metadata when present Link
Fixed issue with KFoldIterator when dataset does not divide equally into folds with no remainder Link
Fixed issue where version check functionality could fail to load resources if resources are on a path with spaces Link

ND4J: Known Issues

ND4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

CUDA 9.1 support has been removed. CUDA 8.0, 9.0 and 9.2 support is available
Due to long indexing changes, long/long[] should be used in place of int/int[] in some places (such as INDArray.size(int), INDArray.shape())
Simplified DataSetIterator API: totalExamples(), cursor() and numExamples() - these were unsupported on most DataSetIterator implementations, and not used in practice for training. Custom iterators should remove these methods also Link
Long-deprecated DataSet.getFeatureMatrix() has been removed. Use DataSet.getFeatures() instead. Link
Unused and not properly tested/maintained utility class BigDecimalMath has been removed. Users should find an alternative library for this functionality, if required.
Not properly maintained complex number support classes (IComplexNumber, IComplexNDArray) have been removed entirely Link
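
The switch from int/int[] to long/long[] for sizes and shapes noted above follows from simple arithmetic: the element count of a large array can exceed Integer.MAX_VALUE even when each dimension fits in an int. The JDK-only sketch below shows the overflow; the class and method names are hypothetical and it does not use ND4J.

```java
final class LongIndexingSketch {
    /** Element count computed with long arithmetic; throws ArithmeticException on long overflow. */
    static long length(long[] shape) {
        long n = 1;
        for (long d : shape) {
            n = Math.multiplyExact(n, d);
        }
        return n;
    }

    /** The same computation with int arithmetic, which wraps around silently. */
    static int lengthAsInt(int[] shape) {
        int n = 1;
        for (int d : shape) {
            n *= d;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(length(new long[]{50_000, 50_000}));     // 2500000000
        System.out.println(lengthAsInt(new int[]{50_000, 50_000})); // -1794967296
    }
}
```

Code that stores the result of shape or size calls in int variables therefore needs to be updated to long when migrating.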

DataVec

DataVec: New Features

Added AnalyzeLocal class to mirror functionality of AnalyzeSpark (but without Spark dependency) Link
Added JacksonLineSequenceRecordReader: RecordReader used for multi-example JSON/XML where each line in a file is an independent example Link
Added RecordConvert.toRecord(Schema, List<Object>) Link
Added missing FloatColumnCondition Link
Added CSVLineSequenceRecordReader for "each line in CSV is a sequence, and sequence is single-valued/univariate" Link
Added CSVMultiSequenceRecordReader for "multiple multi-valued sequences in a single CSV" data Link

DataVec: Optimizations and Bug Fixes

Fixed issue with NativeImageLoader on Android Link
Fixed issue with ExcelRecordReader Link
Fixed issue where bad args for CSVRecordReader.next(int) could cause an unnecessarily large list to be generated Link

DataVec: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

Arbiter

Arbiter: New Features

Added DataSource interface. Unlike old DataProvider, this does not require JSON serializability (only a no-arg constructor) Link
Added numerous enhancements and missing configuration options (constraints, dilation, etc) Link Link

Arbiter: Fixes

DataProvider has been deprecated. Use DataSource instead.

RL4J

stepCounter, epochCounter and historyProcessor can now be set Link
Random seed is now loaded when ACPolicy is loaded Link

Version 1.0.0-beta

Highlights - 1.0.0-beta Release

Performance and memory optimizations for DL4J

Deeplearning4J

Deeplearning4J: New Features

New or enhanced layers:
- Added Cropping1D layer Link
- Added Convolution3D, Cropping3D, UpSampling3D, ZeroPadding3D, Subsampling3D layers (all with Keras import support): Link Link
- Added EmbeddingSequenceLayer (EmbeddingLayer for time series) Link
- Added OCNNOutputLayer (one-class neural network) - implementation of this paper - Link
- Added FrozenLayerWithBackprop layer Link
- Added DepthwiseConvolution2D layer Link
Added ComputationGraph.output(DataSetIterator) method Link
Added MultiLayerNetwork/ComputationGraph.layerInputSize methods Link Link
Added SparkComputationGraph.feedForwardWithKey overload with feature mask support Link
Added MultiLayerNetwork.calculateGradients method (for easily getting parameter and input gradients, for example for some model interpretability approaches) Link Link
Added support to get input/activation types for each layer from configuration: ComputationGraphConfiguration.getLayerActivationTypes(InputType...), ComputationGraphConfiguration.GraphBuilder.getLayerActivationTypes(), NeuralNetConfiguration.ListBuilder.getLayerActivationTypes(), MultiLayerConfiguration.getLayerActivationTypes(InputType) methods Link
Evaluation.stats() now prints confusion matrix in easier to read matrix format, rather than list format Link
Added ModelSerializer.addObjectToFile, .getObjectFromFile and .listObjectsInFile for storing arbitrary Java objects in same file as saved network Link
Added SpatialDropout support (with Keras import support) Link
Added MultiLayerNetwork/ComputationGraph.fit((Multi)DataSetIterator, int numEpochs) overloads Link
Added performance (hardware) listeners: SystemInfoPrintListener and SystemInfoFilePrintListener Link

Deeplearning4J: Bug Fixes and Optimizations

Performance and memory optimizations via optimizations of internal use of workspaces Link
Reflections library has entirely been removed from DL4J and is no longer required for custom layer serialization/deserialization Link, Link
- Fixes issues with custom and some Keras import layers on Android
RecordReaderMultiDataSetIterator will no longer try to convert unused columns to numerical values Link
Fixes for Android compilation (removed duplicate classes, aligned versions, removed some dependencies) Link Link Link
Fix for RecordReaderMultiDataSetIterator where output could be incorrect for some constructors Link
Non-frozen layers before a frozen layer will no longer be skipped during backprop (useful for GANs and similar architectures) Link Link
Fixed issue where ComputationGraph topological sort may not be consistent on all platforms; could sometimes break ComputationGraphs (with multiple valid topological orderings) trained on PC and deployed on Android Link
Fixed issue with CuDNN batch norm using 1-decay instead of decay Link
deeplearning4j-cuda no longer throws exceptions if present on classpath with nd4j-native backend set to higher priority Link
Added RNG control for CifarDataSetIterator Link
WordVectorSerializer now deletes temp files immediately once done Link
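
The CuDNN batch norm fix above concerns the moving-average update of the global mean and variance. The following is a minimal sketch, assuming the usual convention in which decay weights the previous running value and (1 - decay) weights the current minibatch statistic. The names are hypothetical and this is not DL4J's code.

```java
// Hypothetical helper illustrating a decayed running average.
class RunningAverage {
    // Intended update: keep 'decay' of the old estimate, blend in the rest from the batch
    static double update(double running, double batchValue, double decay) {
        return decay * running + (1.0 - decay) * batchValue;
    }

    // The swapped form (1 - decay used in place of decay), shown for comparison only
    static double updateSwapped(double running, double batchValue, double decay) {
        return (1.0 - decay) * running + decay * batchValue;
    }

    public static void main(String[] args) {
        double running = 0.0, batch = 1.0, decay = 0.9;
        System.out.println(update(running, batch, decay));        // moves slowly toward the batch value
        System.out.println(updateSwapped(running, batch, decay)); // jumps most of the way to it
    }
}
```

With a decay close to 1, the swapped form makes the running statistics track each minibatch almost exactly, which defeats the purpose of averaging.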

Deeplearning4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

WorkspaceMode.SINGLE and SEPARATE have been deprecated; use WorkspaceMode.ENABLED instead
Internal layer API changes: custom layers will need to be updated to the new Layer API - see built-in layers or custom layer example
Custom layers etc in pre-1.0.0-beta JSON (ModelSerializer) format need to be registered before they can be deserialized due to JSON format change. Built-in layers and models saved in 1.0.0-beta or later do not require this. Use NeuralNetConfiguration.registerLegacyCustomClassesForJSON(Class) for this purpose
IterationListener has been deprecated in favor of TrainingListener. For existing custom listeners, switch from implements TrainingListener to extends BaseTrainingListener Link
ExistingDataSetIterator has been deprecated; use fit(DataSetIterator, int numEpochs) method instead

Deeplearning4J: 1.0.0-beta Known Issues

ComputationGraph TrainingListener onEpochStart and onEpochEnd methods are not being called correctly
DL4J Zoo Model FaceNetNN4Small2 model configuration is incorrect, causing issues during forward pass
Early stopping score calculators with values that should be maximized (accuracy, f1 etc) are not working properly (values are minimized not maximized). Workaround: override ScoreCalculator.calculateScore(...) and return 1.0 - super.calculateScore(...).
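
The shape of that workaround can be sketched in plain Java. The two classes below are hypothetical stand-ins, not DL4J's ScoreCalculator types; they only show how returning 1.0 minus the metric lets a trainer that minimizes the score end up selecting the most accurate model.

```java
// Hypothetical stand-in for a score calculator that reports accuracy.
class AccuracyScore {
    // Fraction of predictions that match the labels (higher is better)
    double calculateScore(int[] predicted, int[] labels) {
        int correct = 0;
        for (int i = 0; i < labels.length; i++) {
            if (predicted[i] == labels[i]) correct++;
        }
        return (double) correct / labels.length;
    }
}

// Workaround pattern: invert the metric so that "smaller is better" holds.
class InvertedAccuracyScore extends AccuracyScore {
    @Override
    double calculateScore(int[] predicted, int[] labels) {
        return 1.0 - super.calculateScore(predicted, labels);
    }
}

class InvertedScoreDemo {
    public static void main(String[] args) {
        int[] labels = {1, 0, 0, 1};
        int[] predicted = {1, 0, 1, 1};
        // Accuracy is 0.75, so the inverted score is 0.25
        System.out.println(new InvertedAccuracyScore().calculateScore(predicted, labels));
    }
}
```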

Deeplearning4J: Keras Import

Deeplearning4J: Keras Import - API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

ND4J

ND4J: New Features

ND4J: Known Issues

Not all op gradients implemented for automatic differentiation
Vast majority of new operations added in 1.0.0-beta do NOT use GPU yet.

ND4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

DataVec

DataVec: New Features

ImageRecordReader now logs number of inferred label classes (to reduce risk of users missing a problem if something is misconfigured) Link
Added AnalyzeSpark.getUnique overload for multiple columns Link
Added performance/timing module Link

DataVec: Optimizations and Bug Fixes

Reduced ImageRecordReader garbage generation via buffer reuse Link
Fixes for Android compilation (aligned versions, removed some dependencies) Link Link
Removed Reflections library use in DataVec Link
Fix for TransformProcessRecordReader batch support Link
Fix for TransformProcessRecordReader with filter operations Link
Fixed issue with ImageRecordReader/ParentPathLabelGenerator incorrectly filtering directories containing . character(s) Link
ShowImageTransform now initializes frame lazily to avoid blank windows Link

DataVec: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

DataVec ClassPathResource has been deprecated; use nd4j-common version instead Link

Arbiter

Arbiter: New Features

Added LayerSpace for OCNN (one-class neural network)

Arbiter: Fixes

Fixed timestamp issue that could cause incorrect rendering of first model's results in UI Link
Execution now waits for last model(s) to complete before returning when a termination condition is hit Link
As per DL4J etc: use of Reflections library has been removed entirely from Arbiter Link
Remove use of Eclipse Collections library due to issues with Android compilation Link
Improved cleanup of completed models to reduce maximum memory requirements for training Link

Version 1.0.0-alpha

Highlights - 1.0.0-alpha Release

ND4J: Added SameDiff - Java automatic differentiation library (alpha release) with Tensorflow import (technology preview) and hundreds of new operations
ND4J: Added CUDA 9.0 and 9.1 support (with cuDNN), dropped support for CUDA 7.5, continued support for CUDA 8.0
ND4J: Native binaries (nd4j-native on Maven Central) now ship with AVX/AVX2/AVX-512 support (Windows/Linux)
DL4J: Large number of new layers and API improvements
DL4J: Keras 2.0 import support

Deeplearning4J

Deeplearning4J: New Features

Layers (new and enhanced)
- Added Yolo2OutputLayer CNN layer for object detection (Link). See also DataVec's ObjectDetectionRecordReader
- Adds support for 'no bias' layers via hasBias(boolean) config (DenseLayer, EmbeddingLayer, OutputLayer, RnnOutputLayer, CenterLossOutputLayer, ConvolutionLayer, Convolution1DLayer). EmbeddingLayer now defaults to no bias (Link)
- Adds support for dilated convolutions (aka 'atrous' convolutions) - ConvolutionLayer, SubsamplingLayer, and 1D versions thereof. (Link)
- Added Upsampling2D layer, Upsampling1D layer (Link, Link)
- ElementWiseVertex now supports Average and Max modes in addition to Add/Subtract/Product (Link)
- Added SeparableConvolution2D layer (Link)
- Added Deconvolution2D layer (aka transpose convolution, fractionally strided convolution layer) (Link)
- Added ReverseTimeSeriesVertex (Link)
- Added RnnLossLayer - no-parameter version of RnnOutputLayer, or RNN equivalent of LossLayer (Link)
- Added CnnLossLayer - no-parameter CNN output layer for use cases such as segmentation, denoising, etc. (Link)
- Added Bidirectional layer wrapper (converts any uni-directional RNN to a bidirectional RNN) (Link)
- Added SimpleRnn layer (aka "vanilla" RNN layer) (Link)
- Added LastTimeStep wrapper layer (wraps a RNN layer to get last time step, accounting for masking if present) (Link)
- Added MaskLayer utility layer that simply zeros out activations on forward pass when a mask array is present (Link)
- Added alpha-version (not yet stable) SameDiff layer support to DL4J (Note: forward pass, CPU only for now) (Link)
- Added SpaceToDepth and SpaceToBatch layers (Link, Link)
- Added Cropping2D layer (Link)
Added parameter constraints API (LayerConstraint interface), and MaxNormConstraint, MinMaxNormConstraint, NonNegativeConstraint, UnitNormConstraint implementations (Link)
Significant refactoring of learning rate schedules (Link)
- Added ISchedule interface; added Exponential, Inverse, Map, Poly, Sigmoid and Step schedule implementations (Link)
- Added support for both iteration-based and epoch-based schedules via ISchedule. Also added support for custom (user defined) schedules
- Learning rate schedules are configured on the updaters, via the .updater(IUpdater) method
Added dropout API (IDropout - previously dropout was available but not a class); added Dropout, AlphaDropout (for use with self-normalizing NNs), GaussianDropout (multiplicative), GaussianNoise (additive). Added support for custom dropout types (Link)
Added support for dropout schedules via ISchedule interface (Link)
Added weight/parameter noise API (IWeightNoise interface); added DropConnect and WeightNoise (additive/multiplicative Gaussian noise) implementations (Link); dropconnect and dropout can now be used simultaneously
Adds layer configuration alias .units(int) equivalent to .nOut(int) (Link)
Adds ComputationGraphConfiguration GraphBuilder .layer(String, Layer, String...) alias for .addLayer(String, Layer, String...)
Layer index no longer required for MultiLayerConfiguration ListBuilder (i.e., .list().layer(<layer>) can now be used for configs) (Link)
Added MultiLayerNetwork.summary(InputType) and ComputationGraph.summary(InputType...) methods (shows layer and activation size information) (Link)
MultiLayerNetwork, ComputationGraph and layerwise trainable layers now track the number of epochs (Link)
Added deeplearning4j-ui-standalone module: uber-jar for easy launching of UI server (usage: java -jar deeplearning4j-ui-standalone-1.0.0-alpha.jar -p 9124 -r true -f c:/UIStorage.bin)
Weight initializations:
- Added .weightInit(Distribution) convenience/overload (previously: required .weightInit(WeightInit.DISTRIBUTION).dist(Distribution)) (Link)
- WeightInit.NORMAL (for self-normalizing neural networks) (Link)
- Ones, Identity weight initialization (Link)
- Added new distributions (LogNormalDistribution, TruncatedNormalDistribution, OrthogonalDistribution, ConstantDistribution) which can be used for weight initialization (Link)
- RNNs: Added ability to specify weight initialization for recurrent weights separately to "input" weights (Link)
Added layer alias: Convolution2D (ConvolutionLayer), Pooling1D (Subsampling1DLayer), Pooling2D (SubsamplingLayer) (Link)
Added Spark IteratorUtils - wraps a RecordReaderMultiDataSetIterator for use in Spark network training (Link)
CuDNN-supporting layers (ConvolutionLayer, etc) now warn the user if using CUDA without CuDNN (Link)
Binary cross entropy (LossBinaryXENT) now implements clipping (1e-5 to (1 - 1e-5) by default) to avoid numerical underflow/NaNs (Link)
SequenceRecordReaderDataSetIterator now supports multi-label regression (Link)
TransferLearning FineTuneConfiguration now has methods for setting training/inference workspace modes (Link)
IterationListener iterationDone method now reports both current iteration and epoch count; removed unnecessary invoke/invoked methods (Link)
Added MultiLayerNetwork.layerSize(int), ComputationGraph.layerSize(int)/layerSize(String) to easily determine size of layers (Link)
Added MultiLayerNetwork.toComputationGraph() method (Link)
Added NetworkUtils convenience methods to easily change the learning rate of an already initialized network (Link)
Added MultiLayerNetwork.save(File)/.load(File) and ComputationGraph.save(File)/.load(File) convenience methods (Link)
Added CheckpointListener to periodically save a copy of the model during training (every N iter/epochs, every T time units) (Link)
Added ComputationGraph output method overloads with mask arrays (Link)
New LossMultiLabel loss function for multi-label classification (Link)
Added new model zoo models:
- Darknet19 (Link)
- TinyYOLO (Link)
New iterators, and iterator improvements:
- Added FileDataSetIterator, FileMultiDataSetIterator for flexibly iterating over directories of saved (Multi)DataSet objects (Link)
- UCISequenceDataSetIterator (Link)
- RecordReaderDataSetIterator now has builder pattern for convenience, improved javadoc (Link)
- Added DataSetIteratorSplitter, MultiDataSetIteratorSplitter (Link, Link)
Added additional score functions for early stopping (ROC metrics, full set of Evaluation/Regression metrics, etc) (Link)
Added additional ROC and ROCMultiClass evaluation overloads for MultiLayerNetwork and ComputationGraph (Link)
Clarified Evaluation.stats() output to refer to "Predictions" instead of "Examples" (former is more correct for RNNs) (Link)
EarlyStoppingConfiguration now supports Supplier<ScoreCalculator> for use with non-serializable score calculators (Link)
Improved ModelSerializer exceptions when trying to load a model via wrong method (i.e., try to load ComputationGraph via restoreMultiLayerNetwork) (Link)
Added SparkDataValidation utility methods to validate saved DataSet and MultiDataSet on HDFS or local (Link)
ModelSerializer: added restoreMultiLayerNetworkAndNormalizer and restoreComputationGraphAndNormalizer methods (Link)
ParallelInference now has output overloads with support for input mask arrays (Link)
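
The LossBinaryXENT clipping mentioned in the list above can be illustrated with a small plain-Java sketch. The class name is hypothetical and the code is not DL4J's implementation; it only applies the quoted bounds (1e-5 and 1 - 1e-5) to a single label/probability pair.

```java
// Hypothetical illustration of clipped binary cross entropy for one example.
class ClippedBinaryXent {
    static final double EPS = 1e-5; // clipping bound quoted in the release note

    // The predicted probability is clipped to [EPS, 1 - EPS] before taking logs
    static double loss(double label, double p) {
        double clipped = Math.min(Math.max(p, EPS), 1.0 - EPS);
        return -(label * Math.log(clipped) + (1.0 - label) * Math.log(1.0 - clipped));
    }

    public static void main(String[] args) {
        // Without clipping, label = 1 and p = 0 would give -log(0) = Infinity
        System.out.println(loss(1.0, 0.0)); // finite: equals -log(1e-5)
    }
}
```

Clipping bounds the per-example loss, so a single saturated output can no longer turn the whole minibatch score into Infinity or NaN.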

Deeplearning4J: Bug Fixes and Optimizations

Lombok is no longer included as a transitive dependency (Link)
ComputationGraph can now have a vertex as the output (not just layers) (Link, Link)
Performance improvement for J7FileStatsStorage with large amount of history (Link)
Fixed UI layer sizes for variational autoencoder layers (Link)
Fixes to avoid HDF5 library crashes (Link, Link)
UI Play servers switch to production (PROD) mode (Link)
Related to the above: users can now set play.crypto.secret system property to manually set the Play application secret; is randomly generated by default (Link).
SequenceRecordReaderDataSetIterator would apply preprocessor twice (Link)
Evaluation no-arg constructor could cause NaN evaluation metrics when used on Spark
CollectScoresIterationListener could recurse endlessly (Link)
Async(Multi)DataSetIterator calling reset() on underlying iterator could cause issues in some situations (Link)
In some cases, L2 regularization could be (incorrectly) applied to frozen layers (Link)
Logging fixes for NearestNeighboursServer (Link)
Memory optimization for BaseStatsListener (Link)
ModelGuesser fix for loading Keras models from streams (previously would fail) (Link)
Various fixes for workspaces in MultiLayerNetwork and ComputationGraph (Link, Link, Link, Link, Link, Link)
Fix for incorrect condition in DuplicateToTimeSeriesVertex (Link)
Fix for getMemoryReport exception on some valid ComputationGraph networks (Link)
RecordReaderDataSetIterator when used with preprocessors could cause an exception under some circumstances (Link)
CnnToFeedForwardPreProcessor could silently reshape invalid input, as long as the input array length matches the expected length (Link)
ModelSerializer temporary files would not be deleted if JVM crashes; now are deleted immediately when no longer required (Link)
RecordReaderMultiDataSetIterator may not add mask arrays under some circumstances, when set to ALIGN_END mode (Link)
ConvolutionIterationListener previously produced an IndexOutOfBoundsException when all convolution layers are frozen (Link)
PrecisionRecallCurve.getPointAtRecall could return a point with a correct but sub-optimal precision when multiple points had identical recall (Link)
Setting dropout(0) on transfer learning FineTuneConfiguration did not remove dropout if present on existing layer (Link)
Under some rare circumstances, Spark evaluation could lead to a NullPointerException (Link)
ComputationGraph: disconnected vertices were not always detected in configuration validation (Link)
Activation layers would not always inherit the global activation function configuration (Link)
RNN evaluation memory optimization: when TBPTT is configured for training, also use TBPTT-style splitting for evaluation (identical result, less memory) (Link, Link)
PerformanceListener is now serializable (Link)
ScoreIterationListener and PerformanceListener now report model iteration, not "iterations since listener creation" (Link)
Precision/recall curves cached values in ROC class may not be updated after merging ROC instances (Link)
ROC merging after evaluating a large number of examples may produce IllegalStateException (Link)
Added checks for invalid input indices to EmbeddingLayer (Link)
Fixed possible NPE when loading legacy (pre-0.9.0) model configurations from JSON (Link)
Fixed issues with EvaluationCalibration HTML export chart rendering (Link)
Fixed possible incorrect rendering of UI/StatsStorage charts with J7FileStatsStorage when used with Spark training (Link)
MnistDataSetIterator would not always reliably detect and automatically fix/redownload on corrupted download data (Link)
MnistDataSetIterator / EmnistDataSetIterator: updated download location after hosting URL change (Link, Link)
Fixes to propagation of thread interruptions (Link)
MultiLayerNetwork/ComputationGraph will no longer throw an ND4JIllegalStateException during initialization if a network contains no parameters (Link, Link)
Fixes for TSNE posting of data to UI for visualization (Link)
PerformanceListener now throws a useful exception (in constructor) on invalid frequency argument, instead of runtime ArithmeticException (Link)
RecordReader(Multi)DataSetIterator now throws more useful exceptions when Writable values are non-numerical (Link)
UI: Fixed possible character encoding issues for non-English languages when internationalization data .txt files are read from uber JARs (Link)
UI: Fixed UI incorrectly trying to parse non-DL4J UI resources when loading I18N data (Link)
Various threading fixes (Link)
Evaluation: no-arg methods (f1(), precision(), etc) now return single class value for binary case instead of macro-averaged value; clarify values in stats() method and javadoc (Link)
Early stopping training: TrainingListener onEpochStart/End (etc) methods were not being called correctly (Link)
Fixes issue where dropout was not always applied to input of RNN layers (Link)
ModelSerializer: improved validation/exceptions when reading from invalid/empty/closed streams (Link)
ParallelInference fixes:
- fixes for variable size inputs (variable length time series, variable size CNN inputs) when using batch mode (Link)
- underlying model exceptions during the output method are now properly propagated back to the user (Link)
- fixes support for 'pre-batched' inputs (i.e., inputs where minibatch size is > 1) (Link)
Memory optimization for network weight initialization via in-place random ops (Link)
Fixes for CuDNN with SAME mode padding (Link, Link)
Fix for VariationalAutoencoder builder decoder layer size validation (Link)
Improved Kmeans throughput (Link)
Added RPForest to nearest neighbors (Link)
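
The Evaluation change listed above (single-class value instead of macro average for the binary case) is easier to see with numbers. The sketch below is plain Java with hypothetical names, and it assumes class 1 is treated as the positive class; it is not the Evaluation class itself.

```java
// Hypothetical illustration: single-class precision vs. macro-averaged precision.
class PrecisionModes {
    // Precision for one class: TP / (TP + FP); 0 if the class was never predicted
    static double precision(int[] predicted, int[] actual, int cls) {
        int tp = 0, fp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == cls) {
                if (actual[i] == cls) tp++; else fp++;
            }
        }
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Macro average: unweighted mean of the per-class precisions
    static double macroPrecision(int[] predicted, int[] actual, int numClasses) {
        double sum = 0.0;
        for (int c = 0; c < numClasses; c++) sum += precision(predicted, actual, c);
        return sum / numClasses;
    }

    public static void main(String[] args) {
        int[] actual    = {1, 1, 0, 0, 0, 0};
        int[] predicted = {1, 0, 1, 0, 0, 0};
        System.out.println(precision(predicted, actual, 1));      // positive class only: 0.5
        System.out.println(macroPrecision(predicted, actual, 2)); // mean over both classes: 0.625
    }
}
```

On imbalanced binary data the two numbers can differ considerably, which is why results reported by the no-arg methods may change after upgrading.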

Deeplearning4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

Default training workspace mode has been switched to SEPARATE from NONE for MultiLayerNetwork and ComputationGraph (Link)
Behaviour change: fit(DataSetIterator) and similar methods no longer perform layerwise pretraining followed by backprop - only backprop is performed in these methods. For pretraining, use pretrain(DataSetIterator) and pretrain(MultiDataSetIterator) methods (Link)
Previously deprecated updater configuration methods (.learningRate(double), .momentum(double) etc) all removed
- To configure learning rate: use .updater(new Adam(lr)) instead of .updater(Updater.ADAM).learningRate(lr)
- To configure bias learning rate: use .biasUpdater(IUpdater) method
- To configure learning rate schedules: use .updater(new Adam(ISchedule)) and similar
Updater configuration via enumeration (i.e., .updater(Updater)) has been deprecated; use .updater(IUpdater)
.regularization(boolean) config removed; functionality is now always equivalent to .regularization(true)
.useDropConnect(boolean) removed; use .weightNoise(new DropConnect(double)) instead
.iterations(int) method has been removed (was rarely used and confusing to users)
Multiple utility classes (in org.deeplearning4j.util) have been deprecated and/or moved to nd4j-common. Use same class names in nd4j-common org.nd4j.util instead.
DataSetIterators in DL4J have been moved from deeplearning4j-nn module to new deeplearning4j-datasets, deeplearning4j-datavec-iterators and deeplearning4j-utility-iterators modules. Packages/imports are unchanged; deeplearning4j-core pulls these in as transitive dependencies hence no user changes should be required in most cases (Link)
Previously deprecated .activation(String) has been removed; use .activation(Activation) or .activation(IActivation) instead
Layer API change: Custom layers may need to implement applyConstraints(int iteration, int epoch) method
Parameter initializer API change: Custom parameter initializers may need to implement isWeightParam(String) and isBiasParam(String) methods
RBM (Restricted Boltzmann Machine) layers have been removed entirely. Consider using VariationalAutoencoder layers as a replacement (Link)
GravesBidirectionalLSTM has been deprecated; use new Bidirectional(Bidirectional.Mode.ADD, new GravesLSTM.Builder()....build()) instead
Previously deprecated WordVectorSerializer methods have now been removed (Link)
Removed deeplearning4j-ui-remote-iterationlisteners module and obsolete RemoteConvolutionalIterationListener (Link)
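
Since learning rates and learning rate schedules are now passed to the updater (see the updater items above), it may help to recall what a schedule computes. The sketch below is plain Java with hypothetical names and assumes the conventional definitions of step decay and exponential decay; DL4J's own ISchedule implementations may differ in detail.

```java
// Hypothetical illustration of two iteration-based learning rate schedules.
class ScheduleSketch {
    // Step decay: multiply by decayRate once every 'step' iterations
    static double step(double initial, double decayRate, int step, int iteration) {
        return initial * Math.pow(decayRate, Math.floor((double) iteration / step));
    }

    // Exponential decay: multiply by gamma at every iteration
    static double exponential(double initial, double gamma, int iteration) {
        return initial * Math.pow(gamma, iteration);
    }

    public static void main(String[] args) {
        // After 250 iterations with a step of 100, the rate has been halved twice
        System.out.println(step(0.1, 0.5, 100, 250));
        System.out.println(exponential(0.1, 0.99, 250));
    }
}
```

An epoch-based schedule would use the same formulas with the epoch count in place of the iteration count.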

Deeplearning4J: 1.0.0-alpha Known Issues

Performance on some network types may be reduced on CUDA compared to 0.9.1 (with workspaces configured). This will be addressed in the next release
Some issues have been noted with FP16 support on CUDA (Link)

Deeplearning4J: Keras Import

Keras 2 support, keeping backward compatibility for Keras 1
Keras 2 and 1 import use exact same API and are inferred by DL4J
Keras unit test coverage increased by 10x, many more real-world integration tests
Unit tests for importing and checking layer weights
Leaky ReLU, ELU, SELU support for model import
All Keras layers can be imported with optional bias terms
Old deeplearning4j-keras module removed, old "Model" API removed
All Keras initializations (Lecun normal, Lecun uniform, ones, zeros, Orthogonal, VarianceScaling, Constant) supported
1D convolution and pooling supported in DL4J and Keras model import
Atrous Convolution 1D and 2D layers supported in Keras model import
1D Zero padding layers supported
Keras constraints module fully supported in DL4J and model import
Upsampling 1D and 2D layers in DL4J and Keras model import (including GAN examples in tests)
Most merge modes supported in Keras model import, Keras 2 Merge layer API supported
Separable Convolution 2D layer supported in DL4J and Keras model import
Deconvolution 2D layer supported in DL4J and Keras model import
Full support of Keras noise layers on import (Alpha dropout, Gaussian dropout and noise)
Support for SimpleRNN layer in Keras model import
Support for Bidirectional layer wrapper Keras model import
Addition of LastTimestepVertex in DL4J to support return_sequences=False for Keras RNN layers.
DL4J support for recurrent weight initializations and Keras import integration.
SpaceToBatch and BatchToSpace layers in DL4J for better YOLO support, plus end-to-end YOLO Keras import test.
Cropping2D support in DL4J and Keras model import
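
The return_sequences=False behaviour mentioned above amounts to keeping only the final time step of an RNN's output, taking padding into account when a mask is present. A minimal plain-Java sketch of that selection follows; the class is hypothetical and works on a single example, unlike DL4J's LastTimeStep wrapper or LastTimestepVertex.

```java
// Hypothetical illustration of "last time step" selection for one sequence.
class LastTimeStepSketch {
    // activations: [timeSteps][features]; mask: 1 = real step, 0 = padding.
    // Returns a copy of the activations at the last unmasked step.
    static double[] lastStep(double[][] activations, int[] mask) {
        for (int t = mask.length - 1; t >= 0; t--) {
            if (mask[t] != 0) return activations[t].clone();
        }
        throw new IllegalArgumentException("All time steps are masked");
    }

    // No mask: simply the final step
    static double[] lastStep(double[][] activations) {
        return activations[activations.length - 1].clone();
    }

    public static void main(String[] args) {
        double[][] act = {{1, 2}, {3, 4}, {0, 0}}; // third step is padding
        int[] mask = {1, 1, 0};
        System.out.println(java.util.Arrays.toString(lastStep(act, mask))); // [3.0, 4.0]
    }
}
```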

Deeplearning4J: Keras Import - API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

In 0.9.1 deprecated Model and ModelConfiguration have been permanently removed. Use KerasModelImport instead, which is now the only entry point for Keras model import.

Deeplearning4J: Keras Import - Known Issues

Embedding layer: In DL4J the output of an embedding layer is 2D by default, unless preprocessors are specified. In Keras the output is always 3D, but depending on specified parameters can be interpreted as 2D. This often leads to difficulties when importing Embedding layers. Many cases have been covered and issues fixed, but inconsistencies remain.
Batch normalization layer: DL4J's batch normalization layer is much more restrictive (in a good way) than Keras' version of it. For instance, DL4J only allows normalizing spatial dimensions for 4D convolutional inputs, while in Keras any axis can be used for normalization. Depending on the dimension ordering (NCHW vs. NHWC) and the specific configuration used by a Keras user, this can lead to expected (!) and unexpected import errors.
Support for importing a Keras model for training purposes in DL4J (enforceTrainingConfig == true) is still very limited and will be tackled properly for the next release.
Keras Merge layers: seem to work fine with the Keras functional API, but have issues when used in a Sequential model.
Reshape layers: can be somewhat unreliable on import. DL4J rarely has a need to explicitly reshape input beyond (inferred) standard input preprocessors. In Keras, Reshape layers are used quite often. Mapping the two paradigms can be difficult in edge cases.

ND4J

ND4J: New Features

Hundreds of new operations added
New DifferentialFunction api with automatic differentiation (see samediff section) Link
Technology preview of tensorflow import added (supports 1.4.0 and up)
Apache Arrow serialization added supporting new tensor API Link
Add support for AVX/AVX2 and AVX-512 instruction sets for Windows/Linux for nd4j-native backend Link
nVidia CUDA 8/9.0/9.1 now supported
Workspace improvements were introduced to ensure safety: SCOPE_PANIC profiling mode is enabled by default
FlatBuffers support for INDArray serde
Support for auto-broadcastable operations was added
libnd4j, the underlying C++ library, gained new functionality: it now offers an NDArray class and a Graph class, and can be used as a standalone library or executable.
Convolution-related ops now support NHWC in addition to NCHW data format.
Accumulation ops now have option to keep reduced dimensions.
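
For readers unfamiliar with the two data formats named above, the following plain-Java sketch shows where the same logical element (n, c, h, w) lives in a contiguous NCHW buffer versus a contiguous NHWC buffer. The class is hypothetical and independent of ND4J.

```java
// Hypothetical illustration of NCHW vs. NHWC memory layout.
class LayoutOffsets {
    // Offset of element (n, c, h, w) in a contiguous NCHW buffer
    static long nchw(long n, long c, long h, long w, long C, long H, long W) {
        return ((n * C + c) * H + h) * W + w;
    }

    // Offset of the same logical element in a contiguous NHWC buffer
    static long nhwc(long n, long c, long h, long w, long C, long H, long W) {
        return ((n * H + h) * W + w) * C + c;
    }

    public static void main(String[] args) {
        // One image with 3 channels, height 2, width 2; element c=1, h=0, w=1
        System.out.println(nchw(0, 1, 0, 1, 3, 2, 2)); // 5
        System.out.println(nhwc(0, 1, 0, 1, 3, 2, 2)); // 4
    }
}
```

In NCHW all values of one channel are adjacent, while in NHWC all channels of one pixel are adjacent, so data imported from a framework using the other format has to be permuted or the op told which format to expect.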

ND4J: Known Issues

Not all op gradients implemented for automatic differentiation
Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

ND4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

ND4J - SameDiff

Initial tech preview Link
Control flow is supported with IF and WHILE primitives.

Alpha release of SameDiff auto-differentiation engine for ND4J.

Features

Two execution modes available: Java-driven execution, and Native execution for serialized graphs.
SameDiff graphs can be serialized using FlatBuffers
Building and running computation graphs built from SameDiff operations.
Graphs can run forward pass on input data and compute gradients for the backward pass.
Already supports many high-level layers, like dense layers, convolutions (1D-3D), deconvolutions, separable convolutions, pooling and upsampling, batch normalization, local response normalization, LSTMs and GRUs.
In total there are about 350 SameDiff operations available, including many basic operations used in building complex graphs.
Supports rudimentary import of TensorFlow and ONNX graphs for inference.
TFOpTests is a dedicated project for creating test resources for TensorFlow import.
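
To make "run a forward pass and compute gradients for the backward pass" concrete, here is a tiny scalar reverse-mode automatic differentiation sketch in plain Java. Everything in it (TinyGraph, Var, the method names) is hypothetical and unrelated to SameDiff's real API; it only demonstrates the general technique on z = x*y + x.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scalar autodiff sketch; not SameDiff.
class TinyGraph {
    static class Var {
        final double value;          // result of the forward pass
        double grad;                 // accumulated d(output)/d(this)
        final Var[] parents;
        final double[] localGrads;   // d(this)/d(parents[i])

        Var(double value, Var[] parents, double[] localGrads) {
            this.value = value;
            this.parents = parents;
            this.localGrads = localGrads;
        }
    }

    // Nodes in creation order, which is also a valid topological order
    private final List<Var> tape = new ArrayList<>();

    Var input(double v) {
        return record(new Var(v, new Var[0], new double[0]));
    }

    Var add(Var a, Var b) {
        return record(new Var(a.value + b.value, new Var[]{a, b}, new double[]{1.0, 1.0}));
    }

    Var mul(Var a, Var b) {
        return record(new Var(a.value * b.value, new Var[]{a, b}, new double[]{b.value, a.value}));
    }

    private Var record(Var v) {
        tape.add(v);
        return v;
    }

    // Backward pass: walk the tape in reverse, applying the chain rule
    void backward(Var output) {
        for (Var v : tape) v.grad = 0.0;
        output.grad = 1.0;
        for (int i = tape.size() - 1; i >= 0; i--) {
            Var v = tape.get(i);
            for (int p = 0; p < v.parents.length; p++) {
                v.parents[p].grad += v.grad * v.localGrads[p];
            }
        }
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        Var x = g.input(3.0), y = g.input(4.0);
        Var z = g.add(g.mul(x, y), x);   // z = x*y + x
        g.backward(z);
        System.out.println(z.value + " " + x.grad + " " + y.grad); // 15.0 5.0 3.0
    }
}
```

Each operation records its local derivatives during the forward pass; an op without such a rule cannot take part in the backward pass, which is the situation the doDiff limitation in the known issues describes.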

Known Issues and Limitations

Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.
While many of the widely used base operations and high-level layers used in practice are supported, op coverage is still limited. Goal is to achieve feature parity with TensorFlow and fully support import for TF graphs.
Some of the existing ops do not have a backward pass implemented (called doDiff in SameDiff).

DataVec

DataVec: New Features

Added ObjectDetectionRecordReader - for use with DL4J's Yolo2OutputLayer (Link) (also supports image transforms: Link)
Added ImageObjectLabelProvider, VocLabelProvider and SvhnLabelProvider (Streetview house numbers) for use with ObjectDetectionRecordReader (Link, Link)
Added LocalTransformExecutor for single machine execution (without Spark dependency) (Link)
Added ArrowRecordReader (for reading Apache Arrow format data) (Link)
Added RecordMapper class for conversion between RecordReader and RecordWriter (Link)
RecordWriter and InputSplit APIs have been improved; more flexible and support for partitioning across all writers (Link, Link, Link)
Added ArrowWritableRecordBatch and NDArrayRecordBatch for efficient batch storage (List<List<Writable>>) (Link, Link)
Added BoxImageTransform - an ImageTransform that either crops or pads without changing aspect ratio (Link)
TransformProcess now has executeToSequence(List<Writable>), executeSequenceToSingle(List<List<Writable>>) and executeToSequenceBatch(List<List<Writable>>) methods (Link, Link)
Added CSVVariableSlidingWindowRecordReader (Link)
ImageRecordReader: supports regression use cases for labels (previously: only classification) (Link)
ImageRecordReader: supports multi-class and multi-label image classification (via PathMultiLabelGenerator interface) (Link, Link)
DataAnalysis/AnalyzeSpark now includes quantiles (via t-digest) (Link)
Added AndroidNativeImageLoader.asBitmap(), Java2DNativeImageLoader.asBufferedImage() (Link)
Add new RecordReader / SequenceRecordReader implementations:
- datavec-excel module and ExcelRecordReader (Link)
- JacksonLineRecordReader (Link)
- ConcatenatingRecordReader (Link)
Add new transforms:
- TextToTermIndexSequenceTransform (Link)
- ConditionalReplaceValueTransformWithDefault (Link)
- GeographicMidpointReduction (Link)
StringToTimeTransform will now try to guess the time format if a format isn't provided (Link)
Improved performance for NativeImageLoader on Android (Link)
Added BytesWritable (Writable for byte[] data) (Link)
Added TransformProcess.inferCategories methods to auto-infer categories from a RecordReader (Link)

DataVec: Fixes

Lombok is no longer included as a transitive dependency (Link)
MapFileRecordReader and MapFileSequenceRecordReader can handle empty partitions/splits for multi-part map files (Link)
CSVRecordReader is now properly serializable using Java serialization (Link) and Kryo serialization (Link)
Writables: equality semantics have been changed: for example, now DoubleWritable(1.0) is equal to IntWritable(1) (Link)
NumberedFileInputSplit now supports leading zeros (Link)
CSVSparkTransformServer and ImageSparkTransformServer Play servers changed to production mode (Link)
Fix for JSON subtype info for FloatMetaData (Link)
Serialization fixes for JacksonRecordReader, RegexSequenceRecordReader (Link)
Added RecordReader.resetSupported() method (Link)
SVMLightRecordReader now implements nextRecord() method (Link)
Fix for custom reductions when using conditions (Link)
SequenceLengthAnalysis is now serializable (Link) and supports to/from JSON (Link)
Fixes for FFT functionality (Link, Link)
Remove use of backported java.util.functions; use ND4J functions API instead (Link)
Fix for transforms data quality analysis for time columns (Link)
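The leading-zeros entry for NumberedFileInputSplit above concerns zero-padded file indices such as file_008.csv. The following is a minimal sketch of the underlying idea using only java.lang.String.format; the class and method names are hypothetical, and it is an assumption that DataVec expands patterns in exactly this way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DataVec: expands a printf-style pattern
// over an inclusive index range.
final class NumberedPathSketch {

    // "%03d" pads the index with zeros to a width of 3 digits.
    static List<String> expand(String pattern, int minIdx, int maxIdx) {
        List<String> paths = new ArrayList<>();
        for (int i = minIdx; i <= maxIdx; i++) {
            paths.add(String.format(pattern, i));
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String path : expand("data/file_%03d.csv", 8, 11)) {
            System.out.println(path);
        }
    }
}
```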

DataVec: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

Many of the util classes (in org.datavec.api.util mainly) have been deprecated or removed; use equivalently named util classes in nd4j-common module (Link)
RecordReader.next(int) method now returns List<List<Writable>> for batches, not List<Writable>. See also NDArrayRecordBatch
RecordWriter and SequenceRecordWriter APIs have been updated with multiple new methods

Arbiter

Arbiter: New Features

Workspace support added (Link, Link)
Added new layer spaces: LSTM, CenterLoss, Deconvolution2D, LossLayer, Bidirectional layer wrapper (Link, Link)
As per DL4J API changes: Updater configuration options (learning rate, momentum, epsilon, rho etc) have been moved to ParameterSpace instead. Updater spaces (AdamSpace, AdaGradSpace etc) introduced (Link)
As per DL4J API changes: Dropout configuration is now via ParameterSpace<IDropout>, DropoutSpace introduced (Link)
RBM layer spaces removed (Link)
ComputationGraphSpace: added layer/vertex methods with overloads for preprocessors (Link)
Added support to specify 'fixed' layers using DL4J layers directly (instead of using LayerSpaces, even for layers without hyperparameters) (Link)
Added LogUniformDistribution (Link)
Improvements to score functions; added ROC score function (Link)
Learning rate schedule support added (Link)
Add math ops for ParameterSpace<Double> and ParameterSpace<Integer> (Link)
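A log-uniform distribution, as named in the LogUniformDistribution entry above, is commonly used for hyperparameters such as learning rates that span several orders of magnitude. The sketch below shows the usual construction (sample uniformly in log space, then exponentiate) in plain Java. The class and method names are hypothetical and this is not Arbiter's implementation.

```java
import java.util.Random;

// Hypothetical illustration of log-uniform sampling, independent of Arbiter.
final class LogUniformSketch {

    // Maps u in [0, 1) to a log-uniform value in [min, max).
    static double fromUniform(double u, double min, double max) {
        if (!(min > 0.0 && max > min)) {
            throw new IllegalArgumentException("Require 0 < min < max");
        }
        double logMin = Math.log(min);
        double logMax = Math.log(max);
        return Math.exp(logMin + u * (logMax - logMin));
    }

    static double sample(Random rng, double min, double max) {
        return fromUniform(rng.nextDouble(), min, max);
    }

    public static void main(String[] args) {
        Random rng = new Random(12345);
        // Candidate learning rates between 1e-4 and 1e-2.
        for (int i = 0; i < 5; i++) {
            System.out.println(sample(rng, 1e-4, 1e-2));
        }
    }
}
```

With this construction the midpoint u = 0.5 maps to the geometric mean of the bounds (1e-3 for the range above), so each decade is sampled equally often.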

Arbiter: Fixes

Fix parallel job execution (when using multiple execution threads) (Link, Link)
Improved logging for failed task execution (Link)
Fix for UI JSON serialization (Link)
Fix threading issues when running on CUDA and multiple execution threads (Link, Link, Link)
Rename saved model file to model.bin (Link)
Fix threading issues with non thread-safe candidates / parameter spaces (Link)
Lombok is no longer included as a transitive dependency (Link)

Arbiter: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

As per DL4J updater API changes: old updater configuration (learningRate, momentum, etc) methods have been removed. Use .updater(IUpdater) or .updater(ParameterSpace<IUpdater>) methods instead

RL4J

Add support for LSTM layer to A3C
Fix A3C to make it actually work using new ActorCriticLoss and correct use of randomness
Fix cases when QLearning would fail (non-flat input, incomplete serialization, incorrect normalization)
Fix logic of HistoryProcessor with async algorithms and failures when preprocessing images
Tidy up and correct the output of statistics, also allowing the use of IterationListener
Fix issues preventing efficient execution with CUDA
Provide access to more of the internal structures with NeuralNet.getNeuralNetworks(), Policy.getNeuralNet(), and convenience constructors for Policy
Add MDPs for ALE (Arcade Learning Environment) and MALMO to support Atari games and Minecraft
Update MDP for Doom to allow using the latest version of VizDoom

ScalNet

First release of ScalNet Scala API, which closely resembles Keras' API.
Can be built with sbt and maven.
Supports both Keras inspired Sequential models, corresponding to DL4J's MultiLayerNetwork, and Model, corresponding to ComputationGraph.
Project structure is closely aligned to both DL4J model-import module and Keras.
Supports the following layers: Convolution2D, Dense, EmbeddingLayer, AvgPooling2D, MaxPooling2D, GravesLSTM, LSTM, Bidirectional layer wrapper, Flatten, Reshape. Additionally, DL4J OutputLayers are supported.

ND4S

Scala 2.12 support

Version 0.9.1

Deeplearning4J

Fixed issue with incorrect version dependencies in 0.9.0
Added EmnistDataSetIterator Link
Numerical stability improvements to LossMCXENT / LossNegativeLogLikelihood with softmax (should reduce NaNs with very large activations)
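To illustrate why softmax combined with cross-entropy can produce NaNs for very large activations, here is a minimal plain-Java sketch comparing a naive computation with the standard log-sum-exp formulation. This shows one common stabilisation technique; it is an assumption, not a statement of how LossMCXENT is actually implemented, and the class name is hypothetical.

```java
// Hypothetical illustration; not DL4J code.
final class StableSoftmaxXentSketch {

    // Naive: exp() overflows to Infinity for large logits, giving NaN.
    static double naive(double[] logits, int label) {
        double sum = 0.0;
        for (double z : logits) {
            sum += Math.exp(z);
        }
        return -Math.log(Math.exp(logits[label]) / sum);
    }

    // Stable: subtract the maximum logit before exponentiating, so the
    // largest exponent is exp(0) = 1 and the sum cannot overflow.
    static double stable(double[] logits, int label) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double sumExp = 0.0;
        for (double z : logits) {
            sumExp += Math.exp(z - max);
        }
        double logSumExp = max + Math.log(sumExp);
        return logSumExp - logits[label];
    }

    public static void main(String[] args) {
        double[] logits = {1000.0, 0.0};
        System.out.println("naive:  " + naive(logits, 0));
        System.out.println("stable: " + stable(logits, 0));
    }
}
```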

ND4J

Added runtime version checking for ND4J, DL4J, RL4J, Arbiter, DataVec Link

Known Issues

Deeplearning4j: Use of Evaluation class no-arg constructor (i.e., new Evaluation()) can result in accuracy/stats being reported as 0.0. Other Evaluation class constructors, and ComputationGraph/MultiLayerNetwork.evaluate(DataSetIterator) methods work as expected.
- This also impacts Spark (distributed) evaluation: workaround is to replace sparkNet.evaluate(testData); with sparkNet.doEvaluation(testData, 64, new Evaluation(10))[0];, where 10 is the number of classes and 64 is the evaluation minibatch size to use.
SequenceRecordReaderDataSetIterator applies preprocessors (such as normalization) twice to each DataSet (possible workaround: use RecordReaderMultiDataSetIterator + MultiDataSetWrapperIterator)
TransferLearning: ComputationGraph may incorrectly apply l1/l2 regularization (defined in FineTuneConfiguration) to frozen layers. Workaround: set 0.0 l1/l2 on FineTuneConfiguration, and required l1/l2 on new/non-frozen layers directly. Note that MultiLayerNetwork with TransferLearning appears to be unaffected.

Version 0.9.0

Deeplearning4J

Workspaces feature added (faster training performance + less memory) Link
SharedTrainingMaster added for Spark network training (improved performance) Link 1, Link 2
ParallelInference added - wrapper that serves inference requests using internal batching and queues Link
ParallelWrapper now able to work with gradients sharing, in addition to existing parameters averaging mode Link
VPTree performance significantly improved
CacheMode network configuration option added - improved CNN and LSTM performance at the expense of additional memory use Link
LSTM layer added, with CuDNN support Link (Note that the existing GravesLSTM implementation does not support CuDNN)
New native model zoo with pretrained ImageNet, MNIST, and VGG-Face weights Link
Convolution performance improvements, including activation caching
Custom/user defined updaters are now supported Link
Evaluation improvements
- EvaluationBinary, ROCBinary classes added: for evaluation of binary multi-class networks (sigmoid + xent output layers) Link
- Evaluation and others now have G-Measure and Matthews Correlation Coefficient support; also macro + micro-averaging support for Evaluation class metrics Link
- ComputationGraph and SparkComputationGraph evaluation convenience methods added (evaluateROC, etc)
- ROC and ROCMultiClass support exact calculation (previous: thresholded calculation was used) Link
- ROC classes now support area under precision-recall curve calculation; getting precision/recall/confusion matrix at specified thresholds (via PrecisionRecallCurve class) Link
- RegressionEvaluation, ROCBinary etc now support per-output masking (in addition to per-example/per-time-step masking)
- EvaluationCalibration added (residual plots, reliability diagrams, histogram of probabilities) Link 1 Link 2
- Evaluation and EvaluationBinary: now supports custom classification threshold or cost array Link
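For readers unfamiliar with the G-Measure and Matthews Correlation Coefficient mentioned in the list above, the sketch below computes both from binary confusion-matrix counts in plain Java. It uses the textbook definitions (G-Measure as the geometric mean of precision and recall); the class is hypothetical, and edge-case conventions in the DL4J Evaluation class may differ.

```java
// Hypothetical illustration of the metrics; not the DL4J Evaluation API.
final class BinaryMetricsSketch {

    // Returns NaN when there are no predicted positives (tp + fp == 0).
    static double precision(long tp, long fp) {
        return tp / (double) (tp + fp);
    }

    // Returns NaN when there are no actual positives (tp + fn == 0).
    static double recall(long tp, long fn) {
        return tp / (double) (tp + fn);
    }

    // Geometric mean of precision and recall.
    static double gMeasure(long tp, long fp, long fn) {
        return Math.sqrt(precision(tp, fp) * recall(tp, fn));
    }

    // MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    // Returns 0 when the denominator is 0 (a common convention, assumed here).
    static double matthewsCorrelation(long tp, long fp, long fn, long tn) {
        double numerator = (double) tp * tn - (double) fp * fn;
        double denominator = Math.sqrt(
                (double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println("G-Measure: " + gMeasure(90, 10, 10));
        System.out.println("MCC:       " + matthewsCorrelation(90, 10, 10, 90));
    }
}
```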
Optimizations: updaters, bias calculation
Network memory estimation functionality added. Memory requirements can be estimated from configuration without instantiating networks Link 1 Link 2
New loss functions:
- Mixture density loss function Link
- F-Measure loss function Link

ND4J

Workspaces feature added Link
Native parallel sort was added
New ops added: SELU/SELUDerivative, TAD-based comparisons, percentile/median, Reverse, Tan/TanDerivative, SinH, CosH, Entropy, ShannonEntropy, LogEntropy, AbsoluteMin/AbsoluteMax/AbsoluteSum, Atan2
New distance functions added: CosineDistance, HammingDistance, JaccardDistance
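The three distance functions named above can be sketched in plain Java as follows. These are common textbook definitions, written for illustration only; the class is hypothetical, and it is an assumption that the ND4J ops follow the same formulas (in particular the min/max form of Jaccard distance for non-negative vectors).

```java
// Hypothetical illustration; not the ND4J op implementations.
final class DistanceSketch {

    // 1 - cosine similarity. Returns NaN if either vector has zero norm.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Number of positions at which the two vectors differ.
    static int hammingDistance(double[] a, double[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                count++;
            }
        }
        return count;
    }

    // 1 - sum(min(a_i, b_i)) / sum(max(a_i, b_i)), for non-negative inputs.
    static double jaccardDistance(double[] a, double[] b) {
        double sumMin = 0.0, sumMax = 0.0;
        for (int i = 0; i < a.length; i++) {
            sumMin += Math.min(a[i], b[i]);
            sumMax += Math.max(a[i], b[i]);
        }
        return 1.0 - sumMin / sumMax;
    }

    public static void main(String[] args) {
        double[] x = {1, 0, 1};
        double[] y = {1, 1, 0};
        System.out.println("cosine:  " + cosineDistance(x, y));
        System.out.println("hamming: " + hammingDistance(x, y));
        System.out.println("jaccard: " + jaccardDistance(x, y));
    }
}
```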

DataVec

MapFileRecordReader and MapFileSequenceRecordReader added Link 1 Link 2
Spark: Utilities to save and load JavaRDD<List<Writable>> and JavaRDD<List<List<Writable>>> data to Hadoop MapFile and SequenceFile formats Link
TransformProcess and Transforms now support NDArrayWritables and NDArrayWritable columns
Multiple new Transform classes

Arbiter

Arbiter UI: Link
- UI now uses Play framework, integrates with DL4J UI (replaces Dropwizard backend). Dependency issues/clashing versions fixed.
- Supports DL4J StatsStorage and StatsStorageRouter mechanisms (FileStatsStorage, Remote UI via RemoteUIStatsStorageRouter)
- General UI improvements (additional information, formatting fixes)

0.8.0 -> 0.9.0 Transition Notes

Deeplearning4j

Updater configuration methods such as .momentum(double) and .epsilon(double) have been deprecated. Instead: use .updater(new Nesterovs(0.9)) and .updater(Adam.builder().beta1(0.9).beta2(0.999).build()) etc to configure

DataVec

CSVRecordReader constructors: now use characters for delimiters, instead of Strings (i.e., ',' instead of ",")

Arbiter

Arbiter UI is now a separate module, with Scala version suffixes: arbiter-ui_2.10 and arbiter-ui_2.11

Version 0.8.0

Added transfer learning API Link
Spark 2.0 support (DL4J and DataVec; see transition notes below)
New layers
- Global pooling (aka "pooling over time"; usable with both RNNs and CNNs) Link
- Center loss output layer Link
- 1D Convolution and subsampling layers Link Link2
- ZeroPaddingLayer Link
New ComputationGraph vertices
- L2 distance vertex
- L2 normalization vertex
Per-output masking is now supported for most loss functions (for per output masking, use a mask array equal in size/shape to the labels array; previous masking functionality was per-example for RNNs)
L1 and L2 regularization can now be configured for biases (via l1Bias and l2Bias configuration options)
Evaluation improvements:
- DL4J now has an IEvaluation class (that Evaluation, RegressionEvaluation, etc all implement. Also allows custom evaluation on Spark) Link
- Added multi-class (one vs. all) ROC: ROCMultiClass Link
- For both MultiLayerNetwork and SparkDl4jMultiLayer: added evaluateRegression, evaluateROC, evaluateROCMultiClass convenience methods
- HTML export functionality added for ROC charts Link
- TSNE re-added to new UI
- Training UI: now usable without an internet connection (no longer relies on externally hosted fonts)
- UI: improvements to error handling for ‘no data’ condition
Epsilon configuration now used for Adam and RMSProp updaters
Fix for bidirectional LSTMs + variable-length time series (using masking)
Added CnnSentenceDataSetIterator (for use with ‘CNN for Sentence Classification’ architecture) Link Link2
Spark + Kryo: now test serialization + throw exception if misconfigured (instead of logging an error that can be missed)
MultiLayerNetwork now adds default layer names if no name is specified
DataVec:
- JSON/YAML support for DataAnalysis, custom Transforms etc
- ImageRecordReader refactored to reduce garbage collection load (hence improve performance with large training sets)
- Faster quality analysis.
Arbiter: added new layer types to match DL4J
- Performance improvement for Word2Vec/ParagraphVectors tokenization & training.
Batched inference introduced for ParagraphVectors
Nd4j improvements
- New native operations available for ND4j: firstIndex, lastIndex, remainder, fmod, or, and, xor.
- OpProfiler NAN_PANIC & INF_PANIC now also checks result of BLAS calls.
- Nd4j.getMemoryManager() now provides methods to tweak GC behavior.
Alpha version of parameter server for Word2Vec/ParagraphVectors was introduced for Spark. Please note: It’s not recommended for production use yet.
Performance improvements for CNN inference
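The global pooling ("pooling over time") layer and the masking support listed in this release interact for variable-length time series: padded time steps should not contribute to the pooled value. The plain-Java sketch below illustrates that idea for average and max pooling over a single example. The class, array layout and mask convention (1.0 for a real step, 0.0 for padding) are assumptions made for illustration, not the DL4J layer implementation.

```java
// Hypothetical illustration; not the DL4J GlobalPoolingLayer.
final class GlobalPoolingSketch {

    // activations[t][f]: value of feature f at time step t.
    // mask[t]: 1.0 for a real time step, 0.0 for padding.
    static double[] averageOverTime(double[][] activations, double[] mask) {
        int nFeatures = activations[0].length;
        double[] out = new double[nFeatures];
        int count = 0;
        for (int t = 0; t < activations.length; t++) {
            if (mask[t] == 0.0) {
                continue; // skip padded steps
            }
            count++;
            for (int f = 0; f < nFeatures; f++) {
                out[f] += activations[t][f];
            }
        }
        for (int f = 0; f < nFeatures && count > 0; f++) {
            out[f] /= count;
        }
        return out;
    }

    // Returns -Infinity for every feature if all steps are masked.
    static double[] maxOverTime(double[][] activations, double[] mask) {
        int nFeatures = activations[0].length;
        double[] out = new double[nFeatures];
        java.util.Arrays.fill(out, Double.NEGATIVE_INFINITY);
        for (int t = 0; t < activations.length; t++) {
            if (mask[t] == 0.0) {
                continue;
            }
            for (int f = 0; f < nFeatures; f++) {
                out[f] = Math.max(out[f], activations[t][f]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] series = {{1, 2}, {3, 4}, {100, 100}}; // last step is padding
        double[] mask = {1, 1, 0};
        System.out.println(java.util.Arrays.toString(averageOverTime(series, mask)));
        System.out.println(java.util.Arrays.toString(maxOverTime(series, mask)));
    }
}
```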

0.7.2 -> 0.8.0 Transition Notes

Spark versioning schemes: with the addition of Spark 2 support, the versions for Deeplearning4j and DataVec Spark modules have changed
- For Spark 1: use <version>0.8.0_spark_1</version>
- For Spark 2: use <version>0.8.0_spark_2</version>
- Also note: Modules with Spark 2 support are released with Scala 2.11 support only. Spark 1 modules are released with both Scala 2.10 and 2.11 support

0.8.0 Known Issues (At Launch)

UI/CUDA/Linux issue: Link
Dirty shutdown on JVM exit is possible for CUDA backend sometimes: Link
Issues with RBM implementation Link
Keras 1D convolutional and pooling layers cannot be imported yet. Will be supported in forthcoming release.
Keras v2 model configurations cannot be imported yet. Will be supported in forthcoming release.

Version 0.7.2

Added variational autoencoder Link
Activation function refactor
- Activation functions are now an interface Link
- Configuration now via enumeration, not via String (see examples - Link)
- Custom activation functions now supported Link
- New activation functions added: hard sigmoid, randomized leaky rectified linear units (RReLU)
Multiple fixes/improvements for Keras model import
Added P-norm pooling for CNNs (option as part of SubsamplingLayer configuration)
Iteration count persistence: stored/persisted properly in model configuration + fixes to learning rate schedules for Spark network training
LSTM: gate activation function can now be configured (previously: hard-coded to sigmoid)
UI:
- Added Chinese translation
- Fixes for UI + pretrain layers
- Added Java 7 compatible stats collection Link
- Improvements in front-end for handling NaNs
- Added UIServer.stop() method
- Fixed score vs. iteration moving average line (with subsampling)
Solved Jaxb/Jackson issue with Spring Boot based applications
RecordReaderDataSetIterator now supports NDArrayWritable for the labels (set regression == true; used for multi-label classification + images, etc)
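P-norm pooling, listed above as a SubsamplingLayer option, replaces the max or average over a pooling window with the p-norm of the window's values. A minimal plain-Java sketch of the formula, (sum of |x_i|^p)^(1/p), follows; the class is hypothetical and works on a single flattened window rather than a full CNN activation array.

```java
// Hypothetical illustration of the p-norm pooling formula; not DL4J code.
final class PNormPoolingSketch {

    // Computes (sum_i |x_i|^p)^(1/p) over one pooling window.
    static double pNorm(double[] window, int p) {
        if (p < 1) {
            throw new IllegalArgumentException("p must be >= 1");
        }
        double sum = 0.0;
        for (double x : window) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    public static void main(String[] args) {
        double[] window = {3.0, -4.0, 1.0, 0.5};
        // As p grows, the result approaches the largest absolute value.
        for (int p : new int[]{1, 2, 4, 16}) {
            System.out.println("p=" + p + ": " + pNorm(window, p));
        }
    }
}
```

With p = 1 the result is the sum of absolute values, and as p grows it approaches max pooling over absolute values, so p interpolates between the two behaviours.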

0.7.1 -> 0.7.2 Transition Notes

Activation functions (built-in): now specified using Activation enumeration, not String (String-based configuration has been deprecated)

Version 0.7.1

RBM and AutoEncoder key fixes:
- Ensured visual bias updated and applied during pretraining.
- RBM HiddenUnit is the activation function for this layer; thus, established derivative calculations for backprop according to respective HiddenUnit.
RNG performance issues fixed for CUDA backend
OpenBLAS issues fixed for macOS, powerpc, linux.
DataVec is back to Java 7 now.
Multiple minor bugs fixed for ND4J/DL4J

Version 0.7.0

UI overhaul: new training UI has considerably more information, supports persistence (saving info and loading later), Japanese/Korean/Russian support. Replaced Dropwizard with Play framework. Link
Import of models configured and trained using Keras
- Imports both Keras model configurations and stored weights
- Supported models: Sequential models
- Supported layers: Dense, Dropout, Activation, Convolution2D, MaxPooling2D, LSTM
Added ‘Same’ padding mode for CNNs (ConvolutionMode network configuration option) Link
Weighted loss functions: Loss functions now support a per-output weight array (row vector)
ROC and AUC added for binary classifiers Link
Improved error messages on invalid configuration or data; improved validation on both
Added metadata functionality: track source of data (file, line number, etc) from data import to evaluation. Loading a subset of examples/data from this metadata is now supported. Link
Removed Jackson as core dependency (shaded); users can now use any version of Jackson without issue
Added LossLayer: version of OutputLayer that only applies loss function (unlike OutputLayer: it has no weights/biases)
Functionality required to build triplet embedding model (L2 vertex, LossLayer, Stack/Unstack vertices etc)
Reduced DL4J and ND4J ‘cold start’ initialization/start-up time
Pretrain default changed to false and backprop default changed to true. It is no longer necessary to set these when setting up a network configuration unless the defaults need to be changed.
Added TrainingListener interface (extends IterationListener). Provides access to more information/state as network training occurs Link
Numerous bug fixes across DL4J and ND4J
Performance improvements for nd4j-native & nd4j-cuda backends
Standalone Word2Vec/ParagraphVectors overhaul:
- Performance improvements
- ParaVec inference available for both PV-DM & PV-DBOW
- Parallel tokenization support was added, to address computation-heavy tokenizers.
Native RNG introduced for better reproducibility within multi-threaded execution environment.
Additional RNG calls added: Nd4j.choice(), and BernoulliDistribution op.
Off-gpu storage introduced, to keep large objects, such as a Word2Vec model, in host memory. Available via WordVectorSerializer.loadStaticModel()
Two new options for performance tuning on nd4j-native backend: setTADThreshold(int) & setElementThreshold(int)

0.6.0 -> 0.7.0 Transition Notes

Notable changes for upgrading codebases based on 0.6.0 to 0.7.0:

UI: new UI package name is deeplearning4j-ui_2.10 or deeplearning4j-ui_2.11 (previously: deeplearning4j-ui). Scala version suffix is necessary due to Play framework (written in Scala) being used now.
Histogram and Flow iteration listeners deprecated. They are still functional, but using new UI is recommended Link
DataVec ImageRecordReader: labels are now sorted alphabetically by default before assigning an integer class index to each - previously (0.6.0 and earlier) they were according to file iteration order. Use .setLabels(List) to manually specify the order if required.
CNNs: configuration validation is now less strict. With new ConvolutionMode option, 0.6.0 was equivalent to ‘Strict’ mode, but new default is ‘Truncate’
- See ConvolutionMode javadoc for more details: Link
Xavier weight initialization change for CNNs and LSTMs: Xavier now aligns better with original Glorot paper and other libraries. Xavier weight init. equivalent to 0.6.0 is available as XAVIER_LEGACY
DataVec: Custom RecordReader and SequenceRecordReader classes require additional methods, for the new metadata functionality. Refer to existing record reader implementations for how to implement these methods.
Word2Vec/ParagraphVectors:
- Few new builder methods:
  - allowParallelTokenization(boolean)
  - useHierarchicSoftmax(boolean)
- Behaviour change: batchSize is now ALSO used as the threshold for executing computational batches for sg/cbow
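The CNN validation change noted above (0.6.0 behaving like ‘Strict’, with ‘Truncate’ as the new default) comes down to how the output size is computed when the stride does not divide the input evenly. The plain-Java sketch below follows the usual formulas: (in - kernel + 2*padding)/stride + 1 for Strict and Truncate, and ceil(in/stride) for Same. The Mode enum and method are hypothetical stand-ins for illustration; consult the ConvolutionMode javadoc for the authoritative rules.

```java
// Hypothetical illustration of output-size rules; not DL4J code.
final class ConvOutputSizeSketch {

    enum Mode { STRICT, TRUNCATE, SAME }

    // Output size along one spatial dimension.
    static int outputSize(int in, int kernel, int stride, int padding, Mode mode) {
        if (mode == Mode.SAME) {
            // Padding is chosen automatically; output is ceil(in / stride).
            return (in + stride - 1) / stride;
        }
        int span = in - kernel + 2 * padding;
        if (span < 0) {
            throw new IllegalArgumentException("Kernel larger than padded input");
        }
        if (mode == Mode.STRICT && span % stride != 0) {
            // Strict rejects configurations that would drop input values.
            throw new IllegalArgumentException(
                    "(in - kernel + 2*padding) is not divisible by stride");
        }
        // Truncate rounds down, ignoring trailing input values that do not fit.
        return span / stride + 1;
    }

    public static void main(String[] args) {
        System.out.println(outputSize(28, 5, 1, 0, Mode.STRICT));
        System.out.println(outputSize(10, 3, 2, 0, Mode.TRUNCATE));
        System.out.println(outputSize(10, 3, 2, 0, Mode.SAME));
    }
}
```

A configuration such as input 10, kernel 3, stride 2 is therefore rejected under the Strict rule but accepted (with the last input column unused) under Truncate.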

Version 0.6.0

Custom layer support
Support for custom loss functions
Support for compressed INDArrays, for memory saving on huge data
Native support for BooleanIndexing where applicable
Initial support for combined operations on CUDA
Significant performance improvements on CPU & CUDA backends
Better support for Spark environments using CUDA & cuDNN with multi-gpu clusters
New UI tools: FlowIterationListener and ConvolutionIterationListener, for better insights of processes within NN.
Special IterationListener implementation for performance tracking: PerformanceListener
Inference implementation added for ParagraphVectors, together with option to use existing Word2Vec model
Significantly decreased file size of the deeplearning4j API
nd4j-cuda-8.0 backend is available now for cuda 8 RC
Added multiple new built-in loss functions
Custom preprocessor support
Performance improvements to Spark training implementation
Improved network configuration validation using InputType functionality

Version 0.5.0

FP16 support for CUDA
Better performance for multi-gpu
Including optional P2P memory access support
Normalization support for time series and images
Normalization support for labels
Removal of Canova and shift to DataVec: Javadoc, Github Repo
Numerous bug fixes
Spark improvements

Version 0.4.0

Initial multi-GPU support viable for standalone and Spark.
Refactored the Spark API significantly
Added CuDNN wrapper
Performance improvements for ND4J
Introducing DataVec: Lots of new functionality for transforming, preprocessing, cleaning data. (This replaces Canova)
New DataSetIterators for feeding neural nets with existing data: ExistingDataSetIterator, Floats(Double)DataSetIterator, IteratorDataSetIterator
New learning algorithms for word2vec and paravec: CBOW and PV-DM respectively
New native ops for better performance: DropOut, DropOutInverted, CompareAndSet, ReplaceNaNs
Shadow asynchronous datasets prefetch enabled by default for both MultiLayerNetwork and ComputationGraph
Better memory handling with JVM GC and CUDA backend, resulting in significantly lower memory footprint