
Contents

  • Getting Started

  • Configuration

  • Models

  • Model Zoo

  • ND4J

  • SameDiff

  • ND4J & SameDiff Ops

  • Tuning & Training

  • Keras Import

Eclipse Deeplearning4j

If you prefer to read the documentation in Chinese, please see the Chinese documentation.

Eclipse Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Apache Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs.

Distributed

DL4J takes advantage of the latest distributed computing frameworks including Apache Spark and Hadoop to accelerate training. On multi-GPUs, it is equal to Caffe in performance.

Open Source

The libraries are completely open-source, Apache 2.0, and maintained by the developer community and Konduit team.

JVM/Python/C++

Deeplearning4j is written in Java and is compatible with any JVM language, such as Scala, Clojure or Kotlin. The underlying computations are written in C, C++ and Cuda. Keras will serve as the Python API.

What's included?

Deep neural nets are capable of record-breaking accuracy. For a quick neural net introduction, please visit our page. In a nutshell, Deeplearning4j lets you compose deep neural nets from various shallow nets, each of which forms a so-called `layer`. This flexibility lets you combine variational autoencoders, sequence-to-sequence autoencoders, convolutional nets or recurrent nets as needed in a distributed, production-grade framework that works with Spark and Hadoop on top of distributed CPUs or GPUs.

There are a lot of parameters to adjust when you're training a deep-learning network. We've done our best to explain them, so that Deeplearning4j can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers.


Tutorials

Deeplearning4j Tutorials

While Deeplearning4j is written in Java, the Java Virtual Machine (JVM) lets you import and share code in other JVM languages. These tutorials are written in Scala, the de facto standard for data science in the Java environment. There’s nothing stopping you from using any other interpreter such as Java, Kotlin, or Clojure.

If you’re coming from non-JVM languages like Python or R, you may want to read about how the JVM works before using these tutorials. Knowing the basic terms such as classpath, virtual machine, “strongly-typed” languages, and functional programming will help you debug, as well as expand on the knowledge you gain here. If you don’t know Scala and want to learn it, Coursera has a great course named Functional Programming Principles in Scala.

The tutorials are currently being reworked. You will likely find stumbling points. If you need any support while working through them, feel free to ask questions on the community forums at https://community.konduit.ai/.

Tutorials covering basic DL4J features

  • Quickstart with MNIST

  • MultiLayerNetwork And ComputationGraph

  • Logistic Regression

  • Built-in Data Iterators

  • Feed Forward Networks

  • Basic Autoencoder

  • Advanced Autoencoder

  • Convolutional Networks

  • Recurrent Networks

  • Early Stopping

  • Layers and Preprocessors

  • Hyperparameter Optimization

  • Using Multiple GPUs

End to End Tutorials showing specific solutions

  • Clinical Time Series LSTM

  • Sea Temperature Convolutional LSTM

  • Sea Temperature Convolutional LSTM 2

  • Instacart Multitask Example

  • Instacart Single Task Example

  • Cloud Detection Example

Quickstart with MNIST

Deeplearning4j - also known as “DL4J” - is a high performance domain-specific language to configure deep neural networks, which are made of multiple layers. Deeplearning4j is open source, written in C++, Java, Scala, and Python, and maintained by the Eclipse Foundation & community contributors.

Before you get started

If you are having difficulty, we recommend you join our community forums. There you can request help and give feedback, but please do use this guide before asking questions we’ve answered below. If you are new to deep learning, we’ve included a road map for beginners with links to courses, readings and other resources. For a longer and more detailed version of this guide, please visit the Deeplearning4j Getting Started guide.

Handwriting classification

In this quickstart, you will create a deep neural network using Deeplearning4j and train a model capable of classifying handwritten digits. While handwriting recognition has been attempted by different machine learning algorithms over the years, deep learning performs remarkably well, achieving an accuracy of over 99.7% on the dataset. For this tutorial, we will classify digits in EMNIST, the “next generation” of MNIST and a larger dataset.

What you will learn

  1. Load a dataset for a neural network.

  2. Format EMNIST for image recognition.

  3. Create a deep neural network.

  4. Train a model.

  5. Evaluate the performance of your model.

Prepare your workspace

As in most programming languages, you need to explicitly import the classes you want to use into scope. Below, we will import common Deeplearning4j classes that will help us configure and train a neural network. The code below is written in Scala.

Note we import methods from Scala’s JavaConversions class because this allows us to use native Scala collections while maintaining compatibility with Deeplearning4j’s Java collections.
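The original import listing is elided in this export. As a hedged sketch, a Java equivalent of the imports used throughout this tutorial (the Scala version additionally imports JavaConversions) might look like:

```java
import org.deeplearning4j.datasets.iterator.impl.EmnistDataSetIterator;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;
```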

Prepare data for loading

Dataset iterators are important pieces of code that help batch and iterate across your dataset for training and inferring with neural networks. Deeplearning4j comes with a built-in implementation of a BaseDatasetIterator for EMNIST known as EmnistDataSetIterator. This particular iterator is a convenience utility that handles downloading and preparation of data.

Note that we create two different iterators below: one for training data, and the other for evaluating the accuracy of our model after training. The last boolean parameter in the constructor indicates whether we are instantiating the train (true) or test (false) set.
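The tutorial's code block is elided in this export; a minimal Java sketch of the two iterators (the batch size is an illustrative choice, and the constructor downloads EMNIST on first use, so it can throw IOException):

```java
int batchSize = 128;  // illustrative batch size

// The BALANCED split of EMNIST; the last argument selects train (true) vs test (false)
EmnistDataSetIterator.Set emnistSet = EmnistDataSetIterator.Set.BALANCED;
EmnistDataSetIterator emnistTrain = new EmnistDataSetIterator(emnistSet, batchSize, true);
EmnistDataSetIterator emnistTest  = new EmnistDataSetIterator(emnistSet, batchSize, false);
```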

Though you won’t need it for this tutorial, you can learn more about loading data for neural networks in the ETL user guide. DL4J comes with many record readers that can load and convert data into ND-Arrays from CSVs, images, videos, audio, and sequences.

Build the neural network

For any neural network you build in Deeplearning4j, the foundation is the NeuralNetConfiguration class. This is where you configure hyperparameters, the quantities that define the architecture and how the algorithm learns. Intuitively, each hyperparameter is like one ingredient in a meal: a meal that can go very right, or very wrong... Luckily, you can adjust hyperparameters if they don’t produce the right results.

The list() method specifies the number of layers in the net; this function replicates your configuration n times and builds a layerwise configuration.
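The configuration listing itself is elided in this export. The following is a hedged Java sketch of such a configuration; the hidden-layer size, updater, and regularization values are illustrative choices, not necessarily the tutorial's exact values:

```java
int outputNum = 47;   // EMNIST BALANCED has 47 classes
int rngSeed   = 123;  // fixed seed for reproducibility

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(rngSeed)
        .updater(new Adam())
        .l2(1e-4)
        .list()                                   // begin the layerwise configuration
        .layer(new DenseLayer.Builder()
                .nIn(28 * 28)                     // flattened 28x28 input image
                .nOut(1000)                       // hidden layer size (illustrative)
                .activation(Activation.RELU)
                .build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(1000)
                .nOut(outputNum)
                .activation(Activation.SOFTMAX)
                .build())
        .build();
```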

So what exactly is the hidden layer?

Each node (the circles) in the hidden layer represents a feature of a handwritten digit in the MNIST dataset. For example, imagine you are looking at the number 6. One node may represent rounded edges, another node may represent the intersection of curly lines, and so on and so forth. Such features are weighted by importance by the model’s coefficients, and recombined in each hidden layer to help predict whether the handwritten number is indeed 6. The more layers of nodes you have, the more complexity and nuance they can capture to make better predictions.

You could think of a layer as “hidden” because, while you can see the input entering a net, and the decisions coming out, it’s difficult for humans to decipher how and why a neural net processes data on the inside. The parameters of a neural net model are simply long vectors of numbers, readable by machines.

Train the model

Now that we’ve built a NeuralNetConfiguration, we can use the configuration to instantiate a MultiLayerNetwork. When we call the init() method on the network, it applies the chosen weight initialization across the network and allows us to pass data to train. If we want to see the loss score during training, we can also pass a listener to the network.

An instantiated model has a fit() method that accepts a dataset iterator (an iterator that extends BaseDatasetIterator), a single DataSet, or an ND-Array (an implementation of INDArray). Since our EMNIST iterator already extends the iterator base class, we can pass it directly to fit(). If we want to train for multiple epochs, put the total number of epochs in the second argument of the fit() method.
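A sketch of those training steps in Java, assuming the `conf` configuration and `emnistTrain` iterator from the previous steps; the listener frequency and epoch count are illustrative:

```java
MultiLayerNetwork network = new MultiLayerNetwork(conf);
network.init();                                        // applies the chosen weight initialization
network.setListeners(new ScoreIterationListener(10));  // print the loss score every 10 iterations

int numEpochs = 2;
network.fit(emnistTrain, numEpochs);                   // fit(iterator, totalEpochs)
```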

Evaluate the model

Deeplearning4j exposes several tools to evaluate the performance of a model. You can perform basic evaluation and get metrics such as precision and accuracy, or use a Receiver Operating Characteristic (ROC). Note that the general ROC class works for binary classifiers, whereas ROCMultiClass is meant for multi-class classifiers such as the model we are building here.

A MultiLayerNetwork conveniently has a few methods built in to help us perform evaluation. You can pass a dataset iterator with your testing/validation data to an evaluate() method.
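A sketch of that evaluation step, assuming the `network` and `emnistTest` names from the previous steps:

```java
Evaluation eval = network.evaluate(emnistTest);
System.out.println(eval.accuracy());
System.out.println(eval.precision());
System.out.println(eval.stats());   // full summary, including the confusion matrix
```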

What’s next

Now that you’ve learned how to get started and train your first model, head to the Deeplearning4j website to see all the other tutorials that await you. Learn how to build dataset iterators, train a facial recognition network like FaceNet, and more.

Quickstart

Quickstart for Java using Maven

Get started

This is everything you need to run DL4J examples and begin your own projects.

We recommend that you join our community forum. There you can request help and give feedback, but please do use this guide before asking questions we've answered below. If you are new to deep learning, we've included a road map for beginners with links to courses, readings and other resources.

We are currently reworking the Getting Started Guide.

If you find that you have trouble following along here, take a look at the Konduit blog, as it features some getting started guides from the community.

A Taste of Code

Deeplearning4j is a domain-specific language to configure deep neural networks, which are made of multiple layers. Everything starts with a MultiLayerConfiguration, which organizes those layers and their hyperparameters.

Hyperparameters are variables that determine how a neural network learns. They include how many times to update the weights of the model, how to initialize those weights, which activation function to attach to the nodes, which optimization algorithm to use, and how fast the model should learn. This is what one configuration would look like:

With Deeplearning4j, you add a layer by calling layer on the NeuralNetConfiguration.Builder(), specifying its place in the order of layers (the zero-indexed layer below is the input layer), the number of input and output nodes, nIn and nOut, as well as the type: DenseLayer.

Once you've configured your net, you train the model with model.fit.
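The configuration listing is elided in this export. A representative sketch follows; numInputs, numHiddenNodes, numOutputs, and trainingData are placeholders, and the updater and weight initialization are illustrative choices:

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .weightInit(WeightInit.XAVIER)        // how to initialize the weights
        .updater(new Sgd(0.1))                // optimization algorithm and learning rate
        .list()
        .layer(0, new DenseLayer.Builder()    // layer 0 (zero-indexed) takes the input
                .nIn(numInputs).nOut(numHiddenNodes)
                .activation(Activation.RELU)
                .build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(numHiddenNodes).nOut(numOutputs)
                .activation(Activation.SOFTMAX)
                .build())
        .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.fit(trainingData);   // trainingData: a DataSet or DataSetIterator
```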

Prerequisites

  • Java 1.7 or later (only 64-bit versions supported)

  • Apache Maven (automated build and dependency manager)

  • IntelliJ IDEA or Eclipse

You should have these installed to use this QuickStart guide. DL4J targets professional Java developers who are familiar with production deployments, IDEs and automated build tools. Working with DL4J will be easiest if you already have experience with these.

If you are new to Java or unfamiliar with these tools, read the details below for help with installation and setup. Otherwise, skip to DL4J Examples in a Few Easy Steps.

Java

If you don't have Java 1.7 or later, download the current Java Development Kit (JDK). To check if you have a compatible version of Java installed, use the following command:
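The command referenced above is elided in this export; the standard version check is:

```shell
java -version
```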

Please make sure you have a 64-Bit version of java installed, as you will see an error telling you no jnind4j in java.library.path if you decide to try to use a 32-Bit version instead. Make sure the JAVA_HOME environment variable is set.

Apache Maven

Maven is a dependency management and automated build tool for Java projects. It works well with IDEs such as IntelliJ and lets you install DL4J project libraries easily. Install or update Maven to the latest release following their instructions for your system. To check if you have the most recent version of Maven installed, enter the following:
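The command referenced above is elided in this export; the standard Maven version check is:

```shell
mvn --version
```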

If you are working on a Mac, you can simply enter the following into the command line:
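The command referenced above is elided in this export; assuming you use the Homebrew package manager, it would be:

```shell
brew install maven
```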

Maven is widely used among Java developers and it's pretty much mandatory for working with DL4J. If you come from a different background and Maven is new to you, check out Apache's Maven overview and our introduction to Maven for non-Java programmers, which includes some additional troubleshooting tips. Other build tools such as Ivy and Gradle can also work, but we support Maven best.

IntelliJ IDEA

An Integrated Development Environment (IDE) allows you to work with our API and configure neural networks in a few steps. We strongly recommend using IntelliJ IDEA, which communicates with Maven to handle dependencies. The community edition of IntelliJ is free.

There are other popular IDEs such as Eclipse and Netbeans. However, IntelliJ is preferred, and using it will make finding help on the community forums easier if you need it.

Git

Install the latest version of Git. If you already have Git, you can update to the latest version using Git itself:
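The command is elided in this export; one way to do this (building Git from its own source, which is what the original guide suggested) is:

```shell
git clone git://git.kernel.org/pub/scm/git/git.git
```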

The latest version of Mac's Mojave OS breaks git, producing the following error message:

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

This can be fixed by running:
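The command is elided in this export; the usual fix for this macOS error is to reinstall the command-line tools:

```shell
xcode-select --install
```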

DL4J Examples in a Few Easy Steps
  1. Use the command line to enter the following:

  2. Open IntelliJ and choose Import Project. Then select the main 'dl4j-examples' directory. (Note: the example in the illustration below refers to an outdated repository named dl4j-0.4-examples. However, the repository that you will download and install will be called dl4j-examples.)

  3. Choose 'Import project from external model' and ensure that Maven is selected.

  4. Continue through the wizard's options. Select the SDK that begins with jdk. (You may need to click on a plus sign to see your options...) Then click Finish. Wait a moment for IntelliJ to download all the dependencies. You'll see the horizontal bar working on the lower right.

  5. Pick an example from the file tree on the left. Right-click the file to run.
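The commands for step 1 are elided in this export; based on the examples repository referenced elsewhere in this guide, they would be:

```shell
git clone https://github.com/eclipse/deeplearning4j-examples.git
cd dl4j-examples/
mvn clean install
```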

Using DL4J In Your Own Projects: Configuring the POM.xml File

To run DL4J in your own projects, we highly recommend using Maven for Java users, or a tool such as SBT for Scala. The basic set of dependencies and their versions are shown below. This includes:

  • deeplearning4j-core, which contains the neural network implementations

  • nd4j-native-platform, the CPU version of the ND4J library that powers DL4J

  • datavec-api - Datavec is our library for vectorizing and loading data

Every Maven project has a POM file. Here is how the POM file should appear when you run your examples.
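The POM listing is elided in this export; a minimal sketch consistent with the dependency list above, using the 1.0.0-beta7 version shown elsewhere in these docs:

```xml
<dependencies>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>1.0.0-beta7</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>1.0.0-beta7</version>
    </dependency>
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-api</artifactId>
        <version>1.0.0-beta7</version>
    </dependency>
</dependencies>
```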

Within IntelliJ, you will need to choose the first Deeplearning4j example you're going to run. We suggest MLPClassifierLinear, as you will almost immediately see the network classify two groups of data in our UI. The file can be found on Github.

To run the example, right click on it and select the green button in the drop-down menu. You will see, in IntelliJ's bottom window, a series of scores. The rightmost number is the error score for the network's classifications. If your network is learning, then that number will decrease over time with each batch it processes. At the end, this window will tell you how accurate your neural-network model has become:

In another window, a graph will appear, showing you how the multilayer perceptron (MLP) has classified the data in the example. It will look like this:

Congratulations! You just trained your first neural network with Deeplearning4j.

Next Steps

  1. Join our community forums on community.konduit.ai.

  2. Read the introduction to deep neural networks.

  3. Check out the more detailed Comprehensive Setup Guide.

Python folks: If you plan to run benchmarks on Deeplearning4j comparing it to well-known Python framework [x], please read these instructions on how to optimize heap space, garbage collection and ETL on the JVM. By following them, you will see at least a 10x speedup in training time.

Additional links

  • Deeplearning4j artifacts on Maven Central

  • ND4J artifacts on Maven Central

  • Datavec artifacts on Maven Central

Troubleshooting

Q: I'm using a 64-Bit Java on Windows and still get the no jnind4j in java.library.path error

A: You may have incompatible DLLs on your PATH. To tell DL4J to ignore those, you have to add the following as a VM parameter (Run -> Edit Configurations -> VM Options in IntelliJ):
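The VM parameter referenced above is elided in this export; per the original guide, it is an empty java.library.path override so DL4J does not pick up incompatible DLLs:

```
-Djava.library.path=""
```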

Q: SPARK ISSUES: I am running the examples and having issues with the Spark-based examples, such as distributed training or datavec transform options.

A: You may be missing some dependencies that Spark requires. See this Stack Overflow discussion for potential dependency issues. Windows users may need the winutils.exe from Hadoop.

Download winutils.exe from https://github.com/steveloughran/winutils and put it into the null/bin/winutils.exe (or create a hadoop folder and add that to HADOOP_HOME)

Troubleshooting: Debugging UnsatisfiedLinkError on Windows

Windows users might be seeing something like:

If that is the issue, see this page. In this case replace with "Nd4jCpu".

Quickstart template

Now that you've learned how to run the different examples, we've made a template available for you that has a basic MNIST trainer with simple evaluation code.

The Quickstart template is available at https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template.

To use the template:

  1. Copy the standalone-sample-project from the examples and give it the name of your project.

  2. Import the folder into IntelliJ.

  3. Start coding!

More about Eclipse Deeplearning4j

Deeplearning4j is a framework that lets you pick and choose from everything available from the beginning. We're not Tensorflow (a low-level numerical computing library with automatic differentiation) or Pytorch. Deeplearning4j has several subprojects that make it easy-ish to build end-to-end applications.

If you'd like to deploy models to production, you might like our model import from Keras.

Deeplearning4j has several submodules. These range from a visualization UI to distributed training on Spark. For an overview of these modules, please look at the Deeplearning4j examples on Github.

To get started with a simple desktop app, you need two things: an nd4j backend and deeplearning4j-core. For more code, see the simpler examples submodule.

If you want a flexible deep-learning API, there are two ways to go. You can use nd4j standalone (see our nd4j examples) or the computation graph API.

If you want distributed training on Spark, you can see our Spark page. Keep in mind that we cannot set up Spark for you. If you want to set up distributed Spark and GPUs, that is largely up to you. Deeplearning4j simply deploys as a JAR file on an existing Spark cluster.

If you want Spark with GPUs, we recommend Spark with Mesos.

If you want to deploy on mobile, you can see our Android page.

We deploy optimized code for various hardware architectures natively. We use C++ based for loops just like everybody else. For that, please see our C++ framework libnd4j.

Deeplearning4j has two other notable components:

  • Arbiter: hyperparameter optimization and model evaluation

  • DataVec: built-in ETL for machine-learning data pipelines

Deeplearning4j is meant to be an end-to-end platform for building real applications, not just a tensor library with automatic differentiation. If you want a tensor library with autodiff, please see ND4J and SameDiff. SameDiff is still in beta, but if you want to contribute, please join our community forum.

Lastly, if you are benchmarking Deeplearning4j, please consider coming to our community forum and asking for tips. Deeplearning4j has all the knobs, but some may not work exactly like the Python frameworks do.

Custom Layers

Extend DL4J functionality for custom layers.

There are two components to adding a custom layer:

  1. Adding the layer configuration class: extends org.deeplearning4j.nn.conf.layers.Layer

  2. Adding the layer implementation class: implements org.deeplearning4j.nn.api.Layer

The configuration layer ((1) above) class handles the settings. It's the one you would use when constructing a MultiLayerNetwork or ComputationGraph. You can add custom settings here, and use them in your layer.

The implementation layer ((2) above) class has parameters and handles the network forward pass, backpropagation, etc. It is created from the org.deeplearning4j.nn.conf.layers.Layer.instantiate(...) method. In other words: the instantiate method is how we go from the configuration to the implementation; MultiLayerNetwork or ComputationGraph will call this method when initializing the network.

An example of these classes are CustomLayer (the configuration class) and CustomLayerImpl (the implementation class). Both of these classes have extensive comments regarding their methods.

You'll note that in Deeplearning4j there are two DenseLayer classes, two GravesLSTM classes, etc.: one is for the configuration, the other for the implementation. We have not followed this "same name" pattern here, to hopefully avoid confusion.

Testing Your Custom Layer

Once you have added a custom layer, it is necessary to run some tests to ensure it is correct. These tests should at a minimum include the following:

  1. Tests to ensure that the JSON configuration (to/from JSON) works correctly. This is necessary for networks with your custom layer to function with both model serialization (saving) and Spark training.

  2. Gradient checks to ensure that the implementation is correct.

Example

A full custom layer example is available in our examples repository.

Get Started

Getting started with model import.

Below is a video tutorial demonstrating working code to load a Keras model into Deeplearning4j and validate the working network. Instructor Tom Hanlon provides an overview of a simple classifier over Iris data built in Keras with a Theano backend, and exported and loaded into Deeplearning4j.

If you have trouble viewing the video, please click here to view it on YouTube.

Overview

All operations in ND4J and SameDiff are available in "Operation Namespaces". Each namespace is available on the Nd4j and SameDiff classes with its lowercase name.

For example, if you want to use the absoluteDifference operation, it would look like this:

// ND4J mode
INDArray output = Nd4j.loss.absoluteDifference(labels, predictions, null);

// SameDiff mode: 'sd' is a SameDiff instance, e.g. from SameDiff.create()
SDVariable output = sd.loss.absoluteDifference(labels, predictions, null);

Namespaces

  • Bitwise

  • Linalg

  • Math

  • Random

  • BaseOps

  • CNN

  • Image

  • Loss

  • NN

  • RNN

    About

    Facts and introduction to Eclipse Deeplearning4j, the top JVM deep learning framework.

About Eclipse Deeplearning4j

Eclipse Deeplearning4j is an open-source, distributed deep-learning project in Java and Scala spearheaded by the people at Konduit, a business intelligence and enterprise software firm. We're a team of data scientists, deep-learning specialists, Java systems engineers and semi-sentient robots.

There are a lot of knobs to turn when you're training a distributed deep-learning network. We've done our best to explain them, so that Eclipse Deeplearning4j can serve as a DIY tool for Java, Scala and Clojure programmers working on Hadoop and other file systems.

Media

Deeplearning4j has been featured in a number of media outlets.

Cite Eclipse Deeplearning4j

    If you plan to publish an academic paper and wish to cite Deeplearning4j, please use this format:

    Eclipse Deeplearning4j Development Team. Deeplearning4j: Open-source distributed deep learning for the JVM, Apache Software Foundation License 2.0.

Supporters

    Profiling supported by .

    Multilayer Network

    Simple and sequential network configuration.

    The MultiLayerNetwork class is the simplest network configuration API available in Eclipse Deeplearning4j. This class is useful for beginners or users who do not need a complex and branched network graph.

    You will not want to use MultiLayerNetwork configuration if you are creating complex loss functions, using graph vertices, or doing advanced training such as a triplet network. This includes popular complex networks such as InceptionV4.

Usage

The example below shows how to build a simple linear classifier using DenseLayer (a basic multilayer perceptron layer).
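The code listing is elided in this export. A hedged sketch of such a network follows; numInputs, numHiddenNodes, and numOutputs are placeholders, and the updater is an illustrative choice:

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .list()
        .layer(new DenseLayer.Builder()
                .nIn(numInputs).nOut(numHiddenNodes)
                .activation(Activation.RELU)
                .build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(numHiddenNodes).nOut(numOutputs)
                .activation(Activation.SOFTMAX)
                .build())
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
```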

    You can also create convolutional configurations:
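The convolutional listing is elided in this export; a hedged sketch of such a configuration (kernel sizes, channel counts, and the 28x28 grayscale input are illustrative assumptions):

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(new ConvolutionLayer.Builder(5, 5)   // 5x5 kernel
                .nIn(1)                             // one input channel (grayscale)
                .nOut(20)
                .stride(1, 1)
                .activation(Activation.RELU)
                .build())
        .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2, 2).stride(2, 2)
                .build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10)
                .activation(Activation.SOFTMAX)
                .build())
        .setInputType(InputType.convolutionalFlat(28, 28, 1)) // lets DL4J infer nIn between layers
        .build();
```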

    Maven

    Configure the Maven build tool for Deeplearning4j.

Configuring the Maven build tool

    You can use Deeplearning4j with Maven by adding the following to your pom.xml:

    <dependencies>
      <dependency>
          <groupId>org.deeplearning4j</groupId>
          <artifactId>deeplearning4j-core</artifactId>
          <version>1.0.0-beta7</version>
      </dependency>
    </dependencies>

    The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Add a backend

    DL4J relies on ND4J for hardware-specific implementations and tensor operations. Add a backend by pasting the following snippet into your pom.xml:
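The snippet is elided in this export; for the CPU backend named earlier in these docs (nd4j-native-platform, version 1.0.0-beta7), it would be:

```xml
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
```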

    You can also swap the standard CPU implementation for .

    SBT, Gradle, & Others

    Configure the build tools for Deeplearning4j.

Configuring your build tool

    While we encourage Deeplearning4j, ND4J and DataVec users to employ Maven, it's worthwhile documenting how to configure build files for other tools, like Ivy, Gradle and SBT -- particularly since Google prefers Gradle over Maven for Android projects.

    The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Gradle

    You can use Deeplearning4j with Gradle by adding the following to your build.gradle in the dependencies block:
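The snippet is elided in this export; based on the Maven coordinates shown elsewhere in these docs (version 1.0.0-beta7), a Gradle equivalent would be:

```groovy
dependencies {
    implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-beta7'
}
```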

    Add a backend by adding the following:
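The backend snippet is elided; assuming the CPU backend (nd4j-native-platform) named elsewhere in these docs:

```groovy
dependencies {
    implementation 'org.nd4j:nd4j-native-platform:1.0.0-beta7'
}
```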

    You can also swap the standard CPU implementation for .

SBT

    You can use Deeplearning4j with SBT by adding the following to your build.sbt:
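The snippet is elided in this export; an SBT equivalent of the Maven coordinates shown elsewhere in these docs would be:

```scala
libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-beta7"
```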

    Add a backend by adding the following:
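The backend snippet is elided; assuming the CPU backend (nd4j-native-platform):

```scala
libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "1.0.0-beta7"
```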

    You can also swap the standard CPU implementation for .

Ivy

    You can use Deeplearning4j with ivy by adding the following to your ivy.xml:
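The snippet is elided in this export; an Ivy equivalent of the Maven coordinates shown elsewhere in these docs would be:

```xml
<dependency org="org.deeplearning4j" name="deeplearning4j-core" rev="1.0.0-beta7"/>
```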

    Add a backend by adding the following:
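The backend snippet is elided; assuming the CPU backend (nd4j-native-platform):

```xml
<dependency org="org.nd4j" name="nd4j-native-platform" rev="1.0.0-beta7"/>
```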

    You can also swap the standard CPU implementation for .

Leiningen

Clojure programmers may want to use Leiningen to work with Maven.

NOTE: You'll still need to download ND4J, DataVec and Deeplearning4j, or double-click on their respective JAR files downloaded by Maven / Ivy / Gradle, to install them in your Eclipse installation.

    Activations

    Supported Keras activations.

We support all Keras activation functions, namely:

    • softmax

    • elu

    • selu

    • softplus

    • softsign

    • relu

    • tanh

    • sigmoid

    • hard_sigmoid

    • linear

    The mapping of Keras to DL4J activation functions is defined in

    Losses

    Supported Keras loss functions.

DL4J supports all available Keras losses (except for logcosh), namely:

    • mean_squared_error

    • mean_absolute_error

    • mean_absolute_percentage_error

    • mean_squared_logarithmic_error

    • squared_hinge

    • hinge

    • categorical_hinge

    • categorical_crossentropy

    • sparse_categorical_crossentropy

    • binary_crossentropy

    • kullback_leibler_divergence

    • poisson

    • cosine_proximity

    The mapping of Keras loss functions can be found in .

    Contribute

    How to contribute to the Eclipse Deeplearning4j source code.

Prerequisites

Before contributing, make sure you know the structure of all of the Eclipse Deeplearning4j libraries. As of early 2018, all libraries now live in the Deeplearning4j monorepo. These include:

    • DeepLearning4J: Contains all of the code for learning neural networks, both on a single machine and distributed.

    Matrix Manipulation

    There are several other basic matrix manipulations to highlight as you learn ND4J’s workings.

Transpose

    The transpose of a matrix is its mirror image. An element located in row 1, column 2, in matrix A will be located in row 2, column 1, in the transpose of matrix A, whose mathematical notation is A to the T, or A^T. Notice that the elements along the diagonal of a square matrix do not move – they are at the hinge of the reflection. In ND4J, transpose matrices like this:

    And a long matrix like this

    Elementwise Operations

    Elementwise operations are more intuitive than vectorwise operations, because the elements of one matrix map clearly onto the other, and to obtain the result, you have to perform just one arithmetical operation.

    With vectorwise matrix operations, you will have to first build intuition and also perform multiple steps. There are two basic types of matrix multiplication: inner (dot) product and outer product. The inner product results in a matrix of reduced dimensions, the outer product results in one of expanded dimensions. A helpful mnemonic: Expand outward, contract inward.

    Inner product

    CPU and AVX

    CPU and AVX support in ND4J/Deeplearning4j

    What is AVX, and why does it matter?

    AVX (Advanced Vector Extensions) is a set of CPU instructions for accelerating numerical computations. See Wikipedia for more details.

    Note that AVX only applies to nd4j-native (CPU) backend for x86 devices, not GPUs and not ARM/PPC devices.

    Why AVX matters: performance. You want to use the version of ND4J compiled with the highest level of AVX supported by your system.

    t-SNE Visualization

    Data visualization with t-SNE for high-dimensional data.

    t-Distributed Stochastic Neighbor Embedding (t-SNE) is a data-visualization tool created by Laurens van der Maaten at Delft University of Technology.

    While it can be used for any data, t-SNE (pronounced Tee-Snee) is only really meaningful with labeled data, since the labels clarify how the input clusters. Below, you can see the kind of graphic you can generate in DL4J with t-SNE working on MNIST data.

    Look closely and you can see the numerals clustered near their likes, alongside the dots.

    Here's how t-SNE appears in Deeplearning4j code.

    Here is an image of the tsne-standard-coords.csv file plotted using gnuplot.

    Regularizers

    Supported Keras regularizers.

    All Keras regularizers are supported by DL4J model import:

    • l1

    • l2

    • l1_l2

    import scala.collection.JavaConversions._
    
    import org.deeplearning4j.datasets.iterator._
    import org.deeplearning4j.datasets.iterator.impl._
    import org.deeplearning4j.nn.api._
    import org.deeplearning4j.nn.multilayer._
    import org.deeplearning4j.nn.graph._
    import org.deeplearning4j.nn.conf._
    import org.deeplearning4j.nn.conf.inputs._
    import org.deeplearning4j.nn.conf.layers._
    import org.deeplearning4j.nn.weights._
    import org.deeplearning4j.optimize.listeners._
    import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator
    import org.nd4j.evaluation.classification._
    
    import org.nd4j.linalg.learning.config._ // for different updaters like Adam, Nesterovs, etc.
    import org.nd4j.linalg.activations.Activation // defines different activation functions like RELU, SOFTMAX, etc.
    import org.nd4j.linalg.lossfunctions.LossFunctions // mean squared error, multiclass cross entropy, etc.
    import org.deeplearning4j.datasets.iterator.impl.EmnistDataSetIterator
    
    val batchSize = 128 // how many examples to simultaneously train in the network
    val emnistSet = EmnistDataSetIterator.Set.BALANCED
    val emnistTrain = new EmnistDataSetIterator(emnistSet, batchSize, true)
    val emnistTest = new EmnistDataSetIterator(emnistSet, batchSize, false)
    val outputNum = EmnistDataSetIterator.numLabels(emnistSet) // total output classes
    val rngSeed = 123 // integer for reproducibility of a random number generator
    val numRows = 28 // number of "pixel rows" in an mnist digit
    val numColumns = 28
    
    val conf = new NeuralNetConfiguration.Builder()
                .seed(rngSeed)
                .updater(new Adam())
                .l2(1e-4)
                .list()
                .layer(new DenseLayer.Builder()
                    .nIn(numRows * numColumns) // Number of input datapoints.
                    .nOut(1000) // Number of output datapoints.
                    .activation(Activation.RELU) // Activation function.
                    .weightInit(WeightInit.XAVIER) // Weight initialization.
                    .build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nIn(1000)
                    .nOut(outputNum)
                    .activation(Activation.SOFTMAX)
                    .weightInit(WeightInit.XAVIER)
                    .build())
                .build()
    // create the MLN
    val network = new MultiLayerNetwork(conf)
    network.init()
    
    // pass a training listener that reports score every 10 iterations
    val eachIterations = 10
    network.addListeners(new ScoreIterationListener(eachIterations))
    
    // fit a dataset for a single epoch
    // network.fit(emnistTrain)
    
    // fit for multiple epochs
    // val numEpochs = 2
    // network.fit(emnistTrain, numEpochs)
    
    // or simply use for loop
    // for(i <- 1 to numEpochs) {
    //   println("Epoch " + i + " / " + numEpochs)
    //   network.fit(emnistTrain)
    // }
    // evaluate basic performance
    val eval = network.evaluate[Evaluation](emnistTest)
    println(eval.accuracy())
    println(eval.precision())
    println(eval.recall())
    
    // evaluate ROC and calculate the Area Under Curve
    val roc = network.evaluateROCMultiClass[ROCMultiClass](emnistTest, 0)
    roc.calculateAUC(0) // AUC for class index 0
    
    // optionally, you can print all stats from the evaluations
    print(eval.stats())
    print(roc.stats())
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Sgd(0.05))
        // ... other hyperparameters
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
                .build())
        .backprop(true)
        .build();
    java -version
    mvn --version
    brew install maven
    $ git clone git://git.kernel.org/pub/scm/git/git.git
    xcode-select --install
    git clone https://github.com/eclipse/deeplearning4j-examples.git
    cd deeplearning4j-examples/dl4j-examples/
    mvn clean install
    -Djava.library.path=""
    Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:624)
    at org.deeplearning4j.examples.feedforward.anomalydetection.MNISTAnomalyExample.main(MNISTAnomalyExample.java:46)
    Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
    at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5556)
    at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:189)
    ... 2 more
    Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
    at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:259)
    at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5553)
    ... 3 more
    Mapping of regularizers can be found in KerasRegularizerUtils.
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .learningRate(learningRate)
        .updater(Updater.NESTEROVS).momentum(0.9)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
                .weightInit(WeightInit.XAVIER)
                .activation("relu")
                .build())
        .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
                .weightInit(WeightInit.XAVIER)
                .activation("softmax")
                .nIn(numHiddenNodes).nOut(numOutputs).build())
        .pretrain(false).backprop(true).build();
    <dependencies>
      <dependency>
          <groupId>org.nd4j</groupId>
          <artifactId>nd4j-native-platform</artifactId>
          <version>1.0.0-beta7</version>
      </dependency>
    </dependencies>
    Looks like this when it is transposed

    In fact, transpose is just an important subset of a more general operation: reshape.

    Reshape

    Yes, matrices can be reshaped. You can change the number of rows and columns they have. The reshaped matrix has to fulfill one condition: the product of its rows and columns must equal the product of the rows and columns of the original matrix. For example, proceeding columnwise, you can reshape a 3 by 4 matrix into a 2 by 6 matrix:

    The array nd2 looks like this

    Reshaping it is easy, and follows the same convention by which we gave it shape to begin with

    Broadcast

    Broadcast is advanced. It usually happens in the background without having to be called. The simplest way to understand it is by working with one long row vector, like the one above.

    Broadcasting will actually take multiple copies of that row vector and put them together into a larger matrix. The first parameter is the number of copies you want “broadcast,” as well as the number of rows involved. In order not to throw a compiler error, make the second parameter of broadcast equal to the number of elements in your row vector.

    INDArray nd = Nd4j.create(new float[]{1, 2, 3, 4}, new int[]{2, 2});
    
    [1.0 ,3.0]
    [2.0 ,4.0]                                                                                                                      
    nd.transpose();
    
    [1.0 ,2.0]
    [3.0 ,4.0]
    Unlike Hadamard products, which require that both matrices have equal rows and columns, inner products simply require that the number of columns of the first matrix equal the number of rows of the second. For example, this works

    Notice a 1 x 2 row times a 2 x 1 column produces a scalar. This operation reduces the dimensions to 1,1. You can imagine rotating the row vector [1.0 ,2.0] clockwise to stand on its end, placed against the column vector. The two top elements are then multiplied by each other, as are the bottom two, and the two products are added to consolidate in a single scalar.

    In ND4J, you would create the two vectors like this:

    And multiply them like this

    Notice ND4J code mirrors the equation in that nd * nd2 is row vector times column vector. The method is mmul, rather than the mul we used for elementwise operations, and the extra “m” stands for “matrix.”

    Now let’s take the same operation, while adding an additional column to a new array we’ll call nd4.

    Now let’s add an extra row to the first matrix, call it nd3, and multiply it by nd4

    The equation will look like this

    Outer product

    Taking the outer product of the two vectors we first worked with is as simple as reversing their order.

    It turns out that multiplying nd2 by nd is the same as multiplying it by two nd’s stacked on top of each other. That’s an outer product. As you can see, outer products also require fewer operations, since they don’t combine two products into one element in the final matrix.

    A few aspects of ND4J code should be noted here. Firstly, the method mmul takes two parameters.

    which could be expressed like this

    which is the same as this line

    Using the second parameter to specify the nd-array to which the product should be assigned is a convention common in ND4J.

                 [3.0]
    [1.0 ,2.0] * [4.0] = (1.0 * 3.0) + (2.0 * 4.0) = 11
    AVX support for different CPUs - summary:
    • Most modern x86 CPUs: AVX2 is supported

    • Some high-end server CPUs: AVX512 may be supported

    • Old CPUs (pre 2012) and low power x86 (Atom, Celeron): No AVX support (usually)

    Note that CPUs supporting later versions of AVX also support all earlier versions. This means it's possible to run a generic x86 or AVX2 binary on a system supporting AVX512. However, it is not possible to run binaries built for later versions (such as AVX512) on a CPU that doesn't support those instructions.
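    To find out which AVX variants your CPU supports on Linux, you can inspect /proc/cpuinfo (a quick sketch; on macOS you would use sysctl instead):

    ```shell
    # Print AVX-related flags for the first CPU core (empty list means no AVX)
    grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx' || echo "no AVX detected"
    ```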

    In version 1.0.0-beta6 and later you may get a warning as follows, if AVX is not configured optimally:

    Configuring AVX in ND4J/DL4J

    As noted earlier, for best performance you should use the version of ND4J that matches your CPU's supported AVX level.

    ND4J's default configuration (when just including the nd4j-native or nd4j-native-platform dependencies, without a Maven classifier) is "generic x86", i.e. no AVX.

    To configure AVX2 and AVX512, you need to specify a classifier for the appropriate architecture.

    The following binaries (nd4j-native classifiers) are provided for x86 architectures:

    • Generic x86 (no AVX): linux-x86_64, windows-x86_64, macosx-x86_64

    • AVX2: linux-x86_64-avx2, windows-x86_64-avx2, macosx-x86_64-avx2

    • AVX512: linux-x86_64-avx512

    Example: Configuring AVX2 on Windows (Maven pom.xml)

    Example: Configuring AVX512 on Linux (Maven pom.xml)

    Note that you need both nd4j-native dependencies - with and without the classifier.

    In the examples above, it is assumed that a Maven property nd4j.version is set to an appropriate ND4J version such as 1.0.0-beta6
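    For Gradle users, the same classifier approach applies. As a sketch (assuming a Gradle property nd4jVersion is set, analogous to the Maven property above), an AVX2 configuration on Linux might look like:

    ```groovy
    // generic binary plus the AVX2-specific binary for this platform
    implementation "org.nd4j:nd4j-native:${nd4jVersion}"
    implementation "org.nd4j:nd4j-native:${nd4jVersion}:linux-x86_64-avx2"
    ```

    As with Maven, both the classified and unclassified nd4j-native dependencies are needed.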

    MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .regularization(true).l2(0.0005)
        .learningRate(0.01)
        .weightInit(WeightInit.XAVIER)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(Updater.NESTEROVS).momentum(0.9)
        .list()
        .layer(0, new ConvolutionLayer.Builder(5, 5)
                //nIn and nOut specify depth. nIn here is the nChannels and nOut is the number of filters to be applied
                .nIn(nChannels)
                .stride(1, 1)
                .nOut(20)
                .activation("identity")
                .build())
        .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2,2)
                .stride(2,2)
                .build())
        .layer(2, new ConvolutionLayer.Builder(5, 5)
                //Note that nIn need not be specified in later layers
                .stride(1, 1)
                .nOut(50)
                .activation("identity")
                .build())
        .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2,2)
                .stride(2,2)
                .build())
        .layer(4, new DenseLayer.Builder().activation("relu")
                .nOut(500).build())
        .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(outputNum)
                .activation("softmax")
                .build());
    implementation "org.deeplearning4j:deeplearning4j-core:1.0.0-beta6"
    implementation "org.nd4j:nd4j-native-platform:1.0.0-beta6"
    libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-beta6"
    libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "1.0.0-beta6"
    <dependency org="org.deeplearning4j" name="deeplearning4j-core" rev="1.0.0-beta6" conf="build" />
    <dependency org="org.nd4j" name="nd4j-native-platform" rev="1.0.0-beta6" conf="build" />
    [1.0 ,3.0 ,5.0 ,7.0 ,9.0 ,11.0]
    [2.0 ,4.0 ,6.0 ,8.0 ,10.0 ,12.0]
    [1.0 ,2.0]
    [3.0 ,4.0]
    [5.0 ,6.0]
    [7.0 ,8.0]
    [9.0 ,10.0]
    [11.0 ,12.0]
        INDArray nd2 = Nd4j.create(new float[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, new int[]{2, 6});
    [1.0 ,3.0 ,5.0 ,7.0 ,9.0 ,11.0]
    [2.0 ,4.0 ,6.0 ,8.0 ,10.0 ,12.0]
    nd2.reshape(3,4);
    
    [1.0 ,4.0 ,7.0 ,10.0]
    [2.0 ,5.0 ,8.0 ,11.0]
    [3.0 ,6.0 ,9.0 ,12.0]
    nd2 = Nd4j.create(new float[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12});
    nd2.broadcast(new int[]{3,12});
    
    [1.0 ,4.0 ,7.0 ,10.0 ,1.0 ,4.0 ,7.0 ,10.0 ,1.0 ,4.0 ,7.0 ,10.0]
    [2.0 ,5.0 ,8.0 ,11.0 ,2.0 ,5.0 ,8.0 ,11.0 ,2.0 ,5.0 ,8.0 ,11.0]
    [3.0 ,6.0 ,9.0 ,12.0 ,3.0 ,6.0 ,9.0 ,12.0 ,3.0 ,6.0 ,9.0 ,12.0]
    
    nd2.broadcast(new int[]{6,12});
    
    [1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0 ,1.0 ,7.0]
    [2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0 ,2.0 ,8.0]
    [3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0 ,3.0 ,9.0]
    [4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0 ,4.0 ,10.0]
    [5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0 ,5.0 ,11.0]
    [6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0 ,6.0 ,12.0]
    INDArray nd = Nd4j.create(new float[]{1,2},new int[]{2}); //row vector
    INDArray nd2 = Nd4j.create(new float[]{3,4},new int[]{2, 1}); //column vector
    nd.mmul(nd2);
    INDArray nd4 = Nd4j.create(new float[]{3,4,5,6},new int[]{2, 2});
    nd.mmul(nd4);                                                                                                                                                                                                                                                   
                 [3.0 ,5.0]
    [1.0 ,2.0] * [4.0 ,6.0] = [(1.0 * 3.0) + (2.0 * 4.0), (1.0 * 5.0) + (2.0 * 6.0)] = [11, 17]
    INDArray nd3 = Nd4j.create(new float[]{1,3,2,4},new int[]{2,2});
    nd3.mmul(nd4);
    [1.0 ,2.0]   [3.0 ,5.0]   [(1.0 * 3.0) + (2.0 * 4.0), (1.0 * 5.0) + (2.0 * 6.0),    [11, 17]
    [3.0 ,4.0] * [4.0 ,6.0] = (3.0 * 3.0) + (4.0 * 4.0), (3.0 * 5.0) + (4.0 * 6.0),] =  [25, 39]
    nd2.mmul(nd);
    
     [3.0]                [(3.0 * 1.0), (3.0 * 2.0)   [3.0 ,6.0]   [3.0]   [1.0 ,2.0]
     [4.0] * [1.0 ,2.0] =  (4.0 * 1.0), (4.0 * 2.0) = [4.0 ,8.0] = [4.0] * [1.0 ,2.0]
    nd.mmul(MATRIX TO MULTIPLY WITH, MATRIX TO WHICH THE PRODUCT SHOULD BE ASSIGNED);
    nd.mmul(nd2, ndv);
    ndv = nd.mmul(nd2);
    *********************************** CPU Feature Check Warning ***********************************
    Warning: Initializing ND4J with Generic x86 binary on a CPU with AVX/AVX2 support
    Using ND4J with AVX/AVX2 will improve performance. See deeplearning4j.org/cpu for more details
    Or set environment variable ND4J_IGNORE_AVX=true to suppress this warning
    ************************************************************************************************
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${nd4j.version}</version>
    </dependency>
    
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${nd4j.version}</version>
        <classifier>windows-x86_64-avx2</classifier>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${nd4j.version}</version>
    </dependency>
    
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${nd4j.version}</version>
        <classifier>linux-x86_64-avx512</classifier>
    </dependency>

  • ND4J: “N-Dimensional Arrays for Java”. ND4J is the mathematical backend upon which DL4J is built. All of DL4J’s neural networks are built using the operations (matrix multiplications, vector operations, etc) in ND4J. ND4J is how DL4J supports both CPU and GPU training of networks, without any changes to the networks themselves. Without ND4J, there would be no DL4J.

  • DataVec: DataVec handles the data import and conversion side of the pipeline. If you want to import images, video, audio or simply CSV data into DL4J: you probably want to use DataVec to do this.

  • Arbiter: Arbiter is a package for (amongst other things) hyperparameter optimization of neural networks. Hyperparameter optimization refers to the process of automating the selection of network hyperparameters (learning rate, number of layers, etc) in order to obtain good performance.

  • We also have an extensive examples repository at dl4j-examplesarrow-up-right.

    Ways to contribute

    There are numerous ways to contribute to DeepLearning4J (and related projects), depending on your interests and experience. Here are some ideas:

    • Add new types of neural network layers (for example: different types of RNNs, locally connected networks, etc)

    • Add a new training feature

    • Bug fixes

    • DL4J examples: Is there an application or network architecture that we don’t have examples for?

    • Testing performance and identifying bottlenecks or areas to improve

    • Improve website documentation (or write tutorials, etc)

    • Improve the JavaDocs

    There are a number of different ways to find things to work on. These include:

    • Looking at the issue trackers:

      https://github.com/eclipse/deeplearning4j/issuesarrow-up-right

      https://github.com/eclipse/deeplearning4j-examples/issuesarrow-up-right

    • Reviewing our Roadmap

    • Talking to the developers on the community forumsarrow-up-right

    • Reviewing recent papers and blog posts on training features, network architectures and applications

    • Reviewing the website and examples - what seems missing, incomplete, or would simply be useful (or cool) to have?

    General guidelines

    Before you dive in, there’s a few things you need to know. In particular, the tools we use:

    • Maven: a dependency management and build tool, used for all of our projects. See this for details on Maven.

    • Git: the version control system we use

    • Project Lombok: Project Lombok is a code generation/annotation tool aimed at reducing the amount of ‘boilerplate’ code (i.e., standard repeated code) needed in Java. To work with the source, you’ll need to install the Project Lombok plugin for your IDE

    • VisualVM: A profiling tool, most useful to identify performance issues and bottlenecks.

    • IntelliJ IDEA: This is our IDE of choice, though you may of course use alternatives such as Eclipse and NetBeans. You may find it easier to use the same IDE as the developers in case you run into any issues. But this is up to you.

    Things to keep in mind:

    • Code should be Java 7 compliant

    • If you are adding a new method or class: add JavaDocs

    • You are welcome to add an author tag for significant additions of functionality. This can also help future contributors, in case they need to ask questions of the original author. If multiple authors are present for a class: provide details on who did what (“original implementation”, “added feature x” etc)

    • Provide informative comments throughout your code. This helps to keep all code maintainable.

    • Any new functionality should include unit tests (using JUnit) to test your code. This should include edge cases.

    • If you add a new layer type, you must include numerical gradient checks, as per these unit tests. These are necessary to confirm that the calculated gradients are correct

    • If you are adding significant new functionality, consider also updating the relevant section(s) of the website, and providing an example. After all, functionality that nobody knows about (or nobody knows how to use) isn’t that helpful. Adding documentation is definitely encouraged when appropriate, but strictly not required.

    • If you are unsure about something - ask us on the community forums!

    public class TSNEStandardExample {
    
        private static Logger log = LoggerFactory.getLogger(TSNEStandardExample.class);
    
        public static void main(String[] args) throws Exception  {
            //STEP 1: Initialization
            int iterations = 100;
            //create an n-dimensional array of doubles
            DataTypeUtil.setDTypeForContext(DataBuffer.Type.DOUBLE);
            List<String> cacheList = new ArrayList<>(); //cacheList is a dynamic array of strings used to hold all words
    
            //STEP 2: Turn text input into a list of words
            log.info("Load & Vectorize data....");
            File wordFile = new ClassPathResource("words.txt").getFile();   //Open the file
            //Get the data of all unique word vectors
            Pair<InMemoryLookupTable,VocabCache> vectors = WordVectorSerializer.loadTxt(wordFile);
            VocabCache cache = vectors.getSecond();
            INDArray weights = vectors.getFirst().getSyn0();    //separate weights of unique words into their own list
    
            for(int i = 0; i < cache.numWords(); i++)   //separate strings of words into their own list
                cacheList.add(cache.wordAtIndex(i));
    
            //STEP 3: build a dual-tree tsne to use later
            log.info("Build model....");
            BarnesHutTsne tsne = new BarnesHutTsne.Builder()
                    .setMaxIter(iterations).theta(0.5)
                    .normalize(false)
                    .learningRate(500)
                    .useAdaGrad(false)
    //                .usePca(false)
                    .build();
    
            //STEP 4: establish the tsne values and save them to a file
            log.info("Store TSNE Coordinates for Plotting....");
            String outputFile = "target/archive-tmp/tsne-standard-coords.csv";
            (new File(outputFile)).getParentFile().mkdirs();
            tsne.plot(weights,2,cacheList,outputFile);
            //This tsne will use the weights of the vectors as its matrix, have two dimensions, use the words strings as
            //labels, and be written to the outputFile created on the previous line
    
        }
    }
    Tsne data plot

    Feed Forward Networks

    In our previous tutorial, we learned about a very simple neural network model: the logistic regression model. Although you can solve many tasks with a simple model like that, most problems require a much more complex network configuration. A typical deep learning model consists of many layers between the inputs and outputs. In this tutorial, we are going to learn about one of those configurations: feed-forward neural networks.

    Feed-Forward Networks

    Feed-forward networks are those in which there are no cyclic connections between the network layers. The input flows forward towards the output after going through several intermediate layers. A typical feed-forward network looks like this:

    Here you can see a new kind of layer: the hidden layer. The layers between our input and output layers are called hidden layers. They are called hidden because we don’t directly interact with them, and hence they are not visible. There can be more than one hidden layer in a network.

    Just like the softmax activation after the output layer in the previous tutorial, there can be activation functions between each layer of the network. They are responsible for allowing (activating) or suppressing the output passed on to the next layer's nodes. There are various activation functions, such as sigmoid and ReLU.

    Imports

    Let’s create the feed-forward network configuration

    What we did here?

    As you can see above, we have made a feed-forward network configuration with one hidden layer. We have used a ReLU activation between our hidden and output layers; ReLU is one of the most popular activation functions. Activation functions also introduce non-linearities into the network so that it can learn more complex features in the data. Hidden layers learn features from the input layer and pass them on to the output layer, which produces the corresponding outputs. You can similarly make network configurations with more hidden layers:
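    For instance, a sketch of a configuration with two hidden layers, following the same builder pattern as above (the layer sizes here are illustrative, not prescriptive):

    ```java
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.learning.config.Adam;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .updater(new Adam())
        .list()
        // first hidden layer
        .layer(new DenseLayer.Builder()
            .nIn(784).nOut(500)
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .build())
        // second hidden layer
        .layer(new DenseLayer.Builder()
            .nIn(500).nOut(100)
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .build())
        // output layer
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(100).nOut(10)
            .activation(Activation.SOFTMAX)
            .weightInit(WeightInit.XAVIER)
            .build())
        .build();
    ```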

    Logistic Regression

    With deep learning, we can compose a deep neural network to suit the input data and its features. The goal is to train the network on the data to make predictions, and those predictions are tied to the outcomes that you care about; i.e. is this transaction fraudulent or not, or which object is contained in the photo? There are different techniques to configure a neural network, and all of them build a relational hierarchy between the inputs and outputs.

    In this tutorial, we are going to configure the simplest neural network: a logistic regression model.

    Regression is a process that helps show the relations between the independent variables (inputs) and the dependent variables (outputs). Logistic regression is one in which the dependent variable is categorical rather than continuous - meaning that it can predict only a limited number of classes or categories, like a switch you flip on or off. For example, it can predict that an image contains a cat or a dog, or it can classify input in ten buckets with the integers 0 through 9.

    A simple logistic regression calculates y = x*w + b, where x is an instance of input data, w is the weight or coefficient that transforms that input, b is the bias, and y is the output, or prediction about the data. The biological terms show how this artificial neuron loosely maps to a neuron in the human brain. The most important point is how data flows through and is transformed by this structure.
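    As a minimal numeric sketch in plain Java (the values are chosen arbitrarily for illustration), the computation reduces to a dot product plus a bias, typically squashed by a sigmoid to give a probability:

    ```java
    public class LogisticStep {
        public static void main(String[] args) {
            double[] x = {2.0, 1.0};   // one input example
            double[] w = {0.5, -1.0};  // learned weights
            double b = 1.0;            // bias

            // y = x * w + b (dot product plus bias)
            double y = b;
            for (int i = 0; i < x.length; i++) {
                y += x[i] * w[i];
            }

            // sigmoid squashes y into (0, 1), interpretable as a class probability
            double p = 1.0 / (1.0 + Math.exp(-y));
            System.out.println(y); // 1.0
            System.out.println(p); // ~0.731
        }
    }
    ```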

    What will we learn in this tutorial?

    We’re going to configure the simplest network, with just one input layer and one output layer, to show how logistic regression works.

    Imports

    Configuring logistic regression layers

    We are going to first build the layers and then feed these layers into the network configuration.
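    A sketch of such a configuration, with a single output layer doing all the work (the input size of 784 and the 10 output classes are illustrative, matching MNIST-style data):

    ```java
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.learning.config.Nesterovs;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .updater(new Nesterovs(0.1, 0.9)) // learning rate, momentum
        .list()
        // a single output layer: the inputs connect directly to the outputs
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(784)  // number of input values per example
            .nOut(10)  // number of classes to predict
            .activation(Activation.SOFTMAX)
            .weightInit(WeightInit.XAVIER)
            .build())
        .build();
    ```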

    Why we didn’t build an input layer

    You may be wondering why we didn’t write any code for building our input layer. The input layer is only a set of input values fed into the network. It doesn’t perform a calculation. It’s just an input sequence (raw or pre-processed data) coming into the network, data to be trained on or to be evaluated. Later, we are going to work with data iterators, which feed input to a network in a specific pattern, and which can be thought of as an input layer of the network.

    Eclipse Contributors

    IP/Copyright requirements for Eclipse Foundation Projects

    This page explains steps required to contribute code to the projects in the eclipse/deeplearning4j GitHub repository: https://github.com/eclipse/deeplearning4jarrow-up-right

    Contributors (anyone who wants to commit code to the repository) need to do two things, before their code can be merged:

    1. Sign the Eclipse Contributor Agreement (once)

    2. Sign commits (each time)

    Why Is This Required?

    These two requirements must be satisfied for all Eclipse Foundation projects, not just DL4J and ND4J. A full list of Eclipse Foundation Projects can be found here:

    By signing the ECA, you are essentially asserting that the code you are submitting is something that either you wrote, or that you have the right to contribute to the project. This is a necessary legal protection to avoid copyright issues.

    By signing your commits, you are asserting that the code in that particular commit is your own.

    Signing the Eclipse Contributor Agreement

    You only need to sign the Eclipse Contributor Agreement (ECA) once. Here's the process:

    Step 1: Sign up for an Eclipse account

This can be done at https://accounts.eclipse.org/user/register

    Note: You must register using the same email as your GitHub account (the GitHub account you want to submit pull requests from).

    Step 2: Sign the ECA

Go to https://accounts.eclipse.org/user/eca and follow the instructions.

    hashtag
    Signing Your Commits

    hashtag
    Signing a New Commit

There are a few ways to sign commits. Note that you can use any of these options.

    Option 1: Use -s When Committing on Command Line

    Signing commits here is simple:

    Note the use of -s (lower case s) - upper-case S (i.e., -S) is for GPG signing (see below).

    Option 2: Set up Bash Alias (or Windows cmd Alias) for Automated Signing

    For example, you could set up the following alias in Bash:

    Then committing would be done with the following:

For Windows command line, similar options are available through a few mechanisms.

    One simple way is to create a gcm.bat file with the following contents, and add it to your system path:

    You can then commit using the same process as above (i.e., gcm "My Commit")

    Option 3: Use GPG Signing

For details on GPG signing, see the GitHub documentation on signing commits.

    Note that this option can be combined with aliases (above), as in alias gcm='git commit -S -m' - note the upper case -S for GPG signing.

    Option 4: Commit using IntelliJ with Auto Signing

IntelliJ can be used to perform git commits, including signed commits. See the IntelliJ documentation for details.

    hashtag
    Checking If A Commit Is Signed

    After performing a commit, you can check in a few different ways. One way is to use git log --show-signature -1 to show the signature for the last commit (use -5 to show the last 5 commits, for example)

    The output will look like:

    The top commit is unsigned, and the bottom commit is signed (note the presence of the Signed-off-by).

    hashtag
    If You Forget to Sign a Commit - Amending the Last Commit

    If you forgot to sign the last commit, you can use the following command:

    hashtag
    If You Forget to Sign Multiple Commits

    Suppose your branch has 3 new commits, all of which are unsigned:

    One simple way is to squash and sign these commits. To do this for the last 3 commits, use the following: (note you might want to make a backup first)

    The result:

    You can confirm that the commit is signed using git log -1 --show-signature as shown earlier.

    Note that your commits will be squashed once they are merged to master anyway, so the loss of the commit history does not matter.

    If you are updating an existing PR, you may need to force push using -f (as in git push X -f).

    cuDNN

    Using the NVIDIA cuDNN library with DL4J.

    hashtag
    Using Deeplearning4j with cuDNN

Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer and SubsamplingLayer), as well as LSTM and BatchNormalization layers, support cuDNN.

    The only thing we need to do to have DL4J load cuDNN is to add a dependency on deeplearning4j-cuda-10.0, deeplearning4j-cuda-10.1, or deeplearning4j-cuda-10.2, for example:

    or

    or

The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA: https://developer.nvidia.com/cudnn

    Note there are multiple combinations of cuDNN and CUDA supported. At this time the following combinations are supported by Deeplearning4j:

    To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (/usr/local/cuda/lib64/ on Linux, /usr/local/cuda/lib/ on Mac OS X, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ on Windows).

Alternatively, in the case of CUDA 10.2, cuDNN comes bundled with the "redist" package of the JavaCPP Presets for CUDA. After agreeing to the license, we can add the following dependencies instead of installing CUDA and cuDNN:

Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try to reduce memory usage by using the NO_WORKSPACE mode settable via the network configuration, instead of the default of ConvolutionLayer.AlgoMode.PREFER_FASTEST, for example:

    Overview

    Overview of model import.

    hashtag
    Deeplearning4j: Keras model import

    Keras model importarrow-up-right provides routines for importing neural network models originally configured and trained using Kerasarrow-up-right, a popular Python deep learning library.

    Once you have imported your model into DL4J, our full production stack is at your disposal. We support import of all Keras model types, most layers and practically all utility functionality. Please check here for a complete list of supported Keras features.

    hashtag
    Getting started: Import a Keras model in 60 seconds

To import a Keras model, you need to create and serialize such a model first. Here's a simple example that you can use. The model is a simple MLP that takes mini-batches of vectors of length 100, has two Dense layers and predicts a total of 10 categories. After defining the model, we serialize it in HDF5 format.

If you put this model file (simple_mlp.h5) into the base of your project's resource folder, you can load the Keras model as a DL4J MultiLayerNetwork as follows:

    circle-info

This shows only how to import a Keras Sequential model. For more details take a look at both Sequential Model import and Functional Model import.

That's it! KerasModelImport is your main entry point to model import; this class takes care of mapping Keras to DL4J concepts internally. As a user you just have to provide your model file; see our getting started guide for more details and options for loading Keras models into DL4J.

    You can now use your imported model for inference (here with dummy data for simplicity)

    Here's how you do training in DL4J for your imported model:

The full example just shown can be found in our DL4J examples.

    hashtag
    Project setup

    To use Keras model import in your existing project, all you need to do is add the following dependency to your pom.xml.

If you need a project to get started in the first place, consider cloning the DL4J examples and following the instructions in the repository to build the project.

    hashtag
    Backend

    DL4J Keras model import is backend agnostic. No matter which backend you choose (TensorFlow, Theano, CNTK), your models can be imported into DL4J.

    hashtag
    Popular models and applications

We support import for a growing number of applications. These applications include

    • Deep convolutional and Wasserstein GANs

    • UNET

    • ResNet50

    • SqueezeNet

    hashtag
    Troubleshooting and support

    An IncompatibleKerasConfigurationException message indicates that you are attempting to import a Keras model configuration that is not currently supported in Deeplearning4j (either because model import does not cover it, or DL4J does not implement the layer, or feature).

    Once you have imported your model, we recommend our own ModelSerializer class for further saving and reloading of your model.

You can inquire further by visiting the community forums. You might consider filing a feature request via GitHub so that this missing functionality can be placed on the DL4J development roadmap, or even sending us a pull request with the necessary changes!

    hashtag
    Why Keras model import?

    Keras is a popular and user-friendly deep learning library written in Python. The intuitive API of Keras makes defining and running your deep learning models in Python easy. Keras allows you to choose which lower-level library it runs on, but provides a unified API for each such backend. Currently, Keras supports Tensorflow, CNTK and Theano backends.

    There is often a gap between the production system of a company and the experimental setup of its data scientists. Keras model import allows data scientists to write their models in Python, but still seamlessly integrates with the production stack.

    Keras model import is targeted at users mainly familiar with writing their models in Python with Keras. With model import you can bring your Python models to production by allowing users to import their models into the DL4J ecosphere for either further training or evaluation purposes.

You should use this module when the experimentation phase of your project is completed and you need to ship your models to production. Konduit provides commercial support for Keras implementations in enterprise.

    Snapshots

    Using daily builds for access to latest Eclipse Deeplearning4j features.

    hashtag
    Contents

    Tensors

    A vector, that column of numbers we feed into neural nets, is simply a subclass of a more general mathematical structure called a tensor. A tensor is a multidimensional array.

    You are already familiar with a matrix composed of rows and columns: the rows extend along the y axis and the columns along the x axis. Each axis is a dimension. Tensors have additional dimensions.

Tensors also have a so-called rank: a scalar, or single number, is of rank 0; a vector is rank 1; a matrix is rank 2; and entities of rank 3 and above are all simply called tensors.

    It may be helpful to think of a scalar as a point, a vector as a line, a matrix as a plane, and tensors as objects of three dimensions or more. A matrix has rows and columns, two dimensions, and therefore is of rank 2. A three-dimensional tensor, such as those we use to represent color images, has channels, rows and columns, and therefore counts as rank 3.

    As mathematical objects with multiple dimensions, tensors have a shape, and we specify that shape by treating tensors as n-dimensional arrays.

    Vertices

    Computation graph nodes for advanced configuration.

    hashtag
    What is a vertex?

    In Eclipse Deeplearning4j a vertex is a type of layer that acts as a node in a ComputationGraph. It can accept multiple inputs, provide multiple outputs, and can help construct popular networks such as InceptionV4.

    Memory Workspaces

    Workspaces are an efficient model for memory paging in DL4J.

    hashtag
    What are workspaces?

ND4J offers an additional memory-management model: workspaces. Workspaces allow you to reuse memory for cyclic workloads without relying on the JVM Garbage Collector for off-heap memory tracking. In other words, at the end of the workspace loop, the memory content of all INDArrays created within it is invalidated. Workspaces are integrated into DL4J for training and inference.

The basic idea is simple: you can do what you need within a workspace (or spaces), and if you want to get an INDArray out of it (i.e. to move the result out of the workspace), you just call INDArray.detach()
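A minimal sketch of that pattern, assuming a workspace id of "MY_WS" (MemoryWorkspace is AutoCloseable, so try-with-resources closes the scope):

```java
// All INDArrays created inside the try block live in the workspace;
// detach() copies the result out so it survives the workspace cycle.
INDArray result;
try (MemoryWorkspace ws =
        Nd4j.getWorkspaceManager().getAndActivateWorkspace("MY_WS")) {
    INDArray tmp = Nd4j.rand(10, 10); // workspace-attached
    result = tmp.mul(2.0).detach();   // independent copy, safe outside
}
```

This is a sketch rather than a complete program; it assumes the ND4J classes are on the classpath.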

    Overview

    Prebuilt model architectures and weights for out-of-the-box application.

Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.

    If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:
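For example, assuming a Maven project, the dependency might look like the following (the version shown is illustrative and should match your other DL4J dependencies):

```xml
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-zoo</artifactId>
    <!-- should match the version of your other DL4J dependencies -->
    <version>1.0.0-beta7</version>
</dependency>
```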

    hashtag
    Getting started

    Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel
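As a sketch of instantiating one such model (VGG16 is used here as an example; the builder and initPretrained names follow the ZooModel API):

```java
// Instantiate a zoo model and download pretrained ImageNet weights.
ZooModel zooModel = VGG16.builder().build();
ComputationGraph pretrained =
        (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
```

The first call to initPretrained downloads the weights and verifies them via checksum; subsequent calls reuse the local copy.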


CUDA Version | cuDNN Version
10.0 | 7.4
10.1 | 7.6
10.2 | 7.6


  • MobileNet

  • Inception

  • Xception

  • Limitations

  • Configuration of ND4J Backend

  • Note to Gradle Users

• hashtag
    Overview/Introduction

    We provide automated daily builds of repositories such as ND4J, DataVec, DeepLearning4j, RL4J etc. So all the newest functionality and most recent bug fixes are released daily.

    Snapshots work like any other Maven dependency. The only difference is that they are served from a custom repository rather than from Maven Central.

    Due to ongoing development, snapshots should be considered less stable than releases: breaking changes or bugs can in principle be introduced at any point during the course of normal development. Typically, releases (not snapshots) should be used when possible, unless a bug fix or new feature is required.

    hashtag
    Setup Instructions

    Step 1: To use snapshots in your project, you should add snapshot repository information like this to your pom.xml file:

    Step 2: Make sure to specify the snapshot version. We follow a simple rule: If the latest stable release version is A.B.C, the snapshot version will be A.B.(C+1)-SNAPSHOT. The current snapshot version is 1.0.0-SNAPSHOT. For more details on the repositories section of the pom.xml file, see Maven documentationarrow-up-right

    If using properties like the DL4J examples, change: From version:

    To version:

    Sample pom.xml using Snapshots

    A sample pom.xml is provided here: sample pom.xml using snapshotsarrow-up-right This has been taken from the DL4J standalone sample project and modified using step 1 and 2 above. The original (using the last release) can be found herearrow-up-right

    hashtag
    Limitations

    Both -platform (all operating systems) and single OS (non-platform) snapshot dependencies are released. Due to the multi-platform build nature of snapshots, it is possible (though rare) for the -platform artifacts to temporarily get out of sync, which can cause build issues.

If you are building and deploying on just one platform, it is safer to use the non-platform artifacts, such as:

    hashtag
    Useful Maven Commands for Snapshots

Two commands that might be useful when using snapshot dependencies in Maven are as follows:

1. -U - for example, in mvn package -U. This -U option forces Maven to check for (and if necessary, download) new snapshot releases. This can be useful if you need to be sure you have the absolute latest snapshot release.

2. -nsu - for example, in mvn package -nsu. This -nsu option stops Maven from checking for snapshot releases. Note however that your build will only succeed with this option if you have some snapshot dependencies already downloaded into your local Maven cache (.m2 directory)

    An alternative approach to (1) is to set <updatePolicy>always</updatePolicy> in the <repositories> section found earlier in this page. An alternative approach to (2) is to set <updatePolicy>never</updatePolicy> in the <repositories> section found earlier in this page.

    hashtag
    Note to Gradle users

    Snapshots will not work with Gradle. You must use Maven to download the files. After that, you may try using your local Maven repository with mavenLocal().

    In order to download specific snapshot artifacts into your local Maven repository, you can run the following Maven command.

    In this example, it will download the nd4j-native (CPU backend) artifact for macOS. If you are on Windows or Linux, you'd use windows-x86_64 or linux-x86_64 respectively.

    triangle-exclamation

    A bare minimum file like the following should work in theory, but it does not. This is due to a bug in Gradlearrow-up-right. Gradle with snapshots and Maven classifiers appears to be a problem.

    Of note when using the nd4j-native backend (in contrast to nd4j-native-platform) on Gradle (and SBT - but not Maven), you need to add openblas as a dependency. We do this for you in the -platform pom. Reference the -platform pom herearrow-up-right to double check your dependencies. Note that these are version properties. See the <properties> section of the pom for current versions of the openblas and javacpp presets required to run nd4j-native.


    With ND4J, we do that by creating a new nd array and feeding it data, shape and order as its parameters. In pseudo code, this would be

    In real code, this line

creates an array with four elements, whose shape is 2 by 2, and whose order is “row major”, or rows first, which is the default in C. (In contrast, Fortran uses “column major” ordering, which could be specified with an ‘f’ as the third parameter.) The distinction between the two orderings, for the array created above, is best illustrated with a table:

Row-major (C) | Column-major (Fortran)
[1,2] | [1,3]
[3,4] | [2,4]

Once we create an n-dimensional array, we may want to work with slices of it. Rather than copying the data, which is expensive, we can simply “view” multi-dimensional slices. A slice of array “a” could be defined like this:

which would give you the first 5 channels, rows 3 to 4 and columns 6 to 7, and so forth for n dimensions, with each individual dimension’s slice starting before the colon and ending after it.

    hashtag
    Linear Buffer

Now, while it is useful to imagine matrices as two-dimensional planes and 3-D tensors as cubic volumes, we store all tensors as a linear buffer. That is, they are all flattened to a row of numbers.

    For that linear buffer, we specify something called stride. Stride tells the computation layer how to interpret the flattened representation. It is the number of elements you skip in the buffer to get to the next channel or row or column. There’s a stride for each dimension.
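The stride arithmetic can be illustrated without ND4J at all. Here is a small, self-contained sketch for the 2 by 2 array [[1,2],[3,4]] stored in both orders (the class and variable names are ours):

```java
public class StrideDemo {
    // Linear-buffer offset of element (row, col) given per-dimension strides.
    public static int offset(int row, int col, int[] stride) {
        return row * stride[0] + col * stride[1];
    }

    public static void main(String[] args) {
        int[] cBuf = {1, 2, 3, 4};  // row-major buffer, strides {2, 1}
        int[] fBuf = {1, 3, 2, 4};  // column-major buffer, strides {1, 2}
        int[] cStride = {2, 1};
        int[] fStride = {1, 2};

        // Element (row=1, col=0) of [[1,2],[3,4]] is 3 under either layout:
        System.out.println(cBuf[offset(1, 0, cStride)]); // 3, at buffer offset 2
        System.out.println(fBuf[offset(1, 0, fStride)]); // 3, at buffer offset 1
    }
}
```

The same logical element lives at different buffer offsets; the strides are what let the computation layer find it.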

    Here’s a brief video summarizing how tensors are converted into linear byte buffers for ND4J.

    hashtag
    Additional Resources and Definitions

    The word tensor derives from the Latin tendere, or “to stretch”; therefore, tensor relates to that which stretches, the stretcher. Tensor was introduced to English from the German in 1915, after being coined by Woldemar Voigt in 1898. The mathematical object is called a tensor because an early application of the idea was the study of materials stretching under tension.

    • Multidimensional Arraysarrow-up-right

    • Tensor on Wikipediaarrow-up-right

    hashtag
    Available Vertices

    hashtag
    L2NormalizeVertex

    [source]arrow-up-right

    L2NormalizeVertex performs L2 normalization on a single input.

    hashtag
    L2Vertex

    [source]arrow-up-right

    L2Vertex calculates the L2 least squares error of two inputs.

    For example, in Triplet Embedding you can input an anchor and a pos/neg class and use two parallel L2 vertices to calculate two real numbers which can be fed into a LossLayer to calculate TripletLoss.

    hashtag
    PoolHelperVertex

    [source]arrow-up-right

    A custom layer for removing the first column and row from an input. This is meant to allow importation of Caffe’s GoogLeNet from https://gist.github.com/joelouismarino/a2ede9ab3928f999575423b9887abd14arrow-up-right.

    hashtag
    ReshapeVertex

    [source]arrow-up-right

Adds the ability to reshape and flatten the tensor in the computation graph before passing it on to the next layer. ReshapeVertex also ensures the shape is valid for the backward pass.

    hashtag
    ScaleVertex

    [source]arrow-up-right

A ScaleVertex is used to scale the size of activations of a single layer. For example, ResNet activations can be scaled in repeating blocks to keep variance under control.

    hashtag
    ShiftVertex

    [source]arrow-up-right

A ShiftVertex is used to shift the activations of a single layer. One could use it to add a bias or as part of some other calculation. For example, Highway Layers need them in two places. First, it’s often useful to have the gate weights have a large negative bias. (Of course for this, we could just initialize the biases that way.) Second, the layer needs to compute (1 - sigmoid(W1 * input + b1)) ⊙ input + sigmoid(W1 * input + b1) ⊙ activation(W2 * input + b2), where ⊙ is the Hadamard (element-wise) product. So, here, we could have

    1. a DenseLayer that does the sigmoid

    2. a ScaleVertex(-1) and

    3. a ShiftVertex(1) to accomplish that.
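A hedged sketch of how those three pieces might be wired together (the layer names and sizes are our own; only the vertex wiring is the point):

```java
// Partial ComputationGraph configuration showing ScaleVertex/ShiftVertex:
// gate = sigmoid(W * input + b); carry = 1 - gate via scale(-1) then shift(+1).
ComputationGraphConfiguration.GraphBuilder builder = new NeuralNetConfiguration.Builder()
        .graphBuilder()
        .addInputs("input")
        .addLayer("gate", new DenseLayer.Builder()
                .nIn(10).nOut(10)
                .activation(Activation.SIGMOID).build(), "input")
        .addVertex("negGate", new ScaleVertex(-1), "gate")   // -sigmoid(...)
        .addVertex("carry", new ShiftVertex(1), "negGate");  // 1 - sigmoid(...)
// ... remaining highway wiring (element-wise products and sum) omitted
```

This is a configuration fragment, not a complete network; it assumes the usual DL4J imports.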

    hashtag
    StackVertex

    [source]arrow-up-right

    StackVertex allows for stacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where shared parameters are not supported by the network.

    This vertex will automatically stack all available inputs.

    hashtag
    UnstackVertex

    [source]arrow-up-right

    UnstackVertex allows for unstacking of inputs so that they may be forwarded through a network. This is useful for cases such as Triplet Embedding, where embeddings can be separated and run through subsequent layers.

Works similarly to SubsetVertex, except on dimension 0 of the input. stackSize is explicitly defined by the user to properly calculate the step.

    hashtag
    ReverseTimeSeriesVertex

    [source]arrow-up-right

ReverseTimeSeriesVertex is used in recurrent neural networks to revert the order of a time series. As a result, the last time step is moved to the beginning of the time series and the first time step is moved to the end. This allows recurrent layers to process time series backwards.

Masks: The input might be masked (to allow for varying time series lengths in one minibatch). In this case the present input (mask array = 1) will be reverted in place and the padding (mask array = 0) will be left untouched at the same place. For a time series of length n, this would normally mean that the first n time steps are reverted and the following padding is left untouched, but more complex masks are supported (e.g. [1, 0, 1, 0, …]).

    setBackpropGradientsViewArray

    Gets the current mask array from the provided input

    • return The mask or null, if no input was provided

and you'll get an independent INDArray copy.

    hashtag
    Neural Networks

For DL4J users, workspaces provide better performance out of the box, and are enabled by default from 1.0.0-alpha onwards. Thus for most users, no explicit workspace configuration is required.

To benefit from workspaces, they need to be enabled. You can configure the workspace mode using:

    .trainingWorkspaceMode(WorkspaceMode.SEPARATE) and/or .inferenceWorkspaceMode(WorkspaceMode.SINGLE) in your neural network configuration.

    The difference between SEPARATE and SINGLE workspaces is a tradeoff between the performance & memory footprint:

    • SEPARATE is slightly slower, but uses less memory.

    • SINGLE is slightly faster, but uses more memory.

    That said, it’s fine to use different modes for training & inference (i.e. use SEPARATE for training, and use SINGLE for inference, since inference only involves a feed-forward loop without backpropagation or updaters involved).
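In builder form, that configuration might look like the following sketch (using the pre-1.0.0-alpha SEPARATE/SINGLE modes described above):

```java
// SEPARATE for training (leaner memory), SINGLE for inference (faster).
NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .trainingWorkspaceMode(WorkspaceMode.SEPARATE)
        .inferenceWorkspaceMode(WorkspaceMode.SINGLE);
// ... layer configuration as usual ...
```

This is a configuration fragment; on current releases, WorkspaceMode.ENABLED replaces both of these modes.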

With workspaces enabled, all memory used during training will be reusable and tracked without JVM GC interference. The only exclusion is the output() method, which uses workspaces (if enabled) internally for the feed-forward loop. Subsequently, it detaches the resulting INDArray from the workspaces, thus providing you with an independent INDArray that will be handled by the JVM GC.

    Please note: After the 1.0.0-alpha release, workspaces in DL4J were refactored - SEPARATE/SINGLE modes have been deprecated, and users should use ENABLED instead.

    hashtag
    Garbage Collector

    If your training process uses workspaces, we recommend that you disable (or reduce the frequency of) periodic GC calls. That can be done like so:

    Put that somewhere before your model.fit(...) call.

    hashtag
    ParallelWrapper & ParallelInference

    For ParallelWrapper, the workspace-mode configuration option was also added. As such, each of the trainer threads will use a separate workspace attached to the designated device.

    hashtag
    Iterators

    We provide asynchronous prefetch iterators, AsyncDataSetIterator and AsyncMultiDataSetIterator, which are usually used internally.

    These iterators optionally use a special, cyclic workspace mode to obtain a smaller memory footprint. The size of the workspace, in this case, will be determined by the memory requirements of the first DataSet coming out of the underlying iterator, whereas the buffer size is defined by the user. The workspace will be adjusted if memory requirements change over time (e.g. if you’re using variable-length time series).

    Caution: If you’re using a custom iterator or the RecordReader, please make sure you’re not initializing something huge within the first next() call. Do that in your constructor to avoid undesired workspace growth.

    Caution: With AsyncDataSetIterator being used, DataSets are supposed to be used before calling the next() DataSet. You are not supposed to store them, in any way, without the detach() call. Otherwise, the memory used for INDArrays within DataSet will be overwritten within AsyncDataSetIterator eventually.

    If for some reason you don’t want your iterator to be wrapped into an asynchronous prefetch (e.g. for debugging purposes), special wrappers are provided: AsyncShieldDataSetIterator and AsyncShieldMultiDataSetIterator. Basically, those are just thin wrappers that prevent prefetch.

    hashtag
    Evaluation

    Usually, evaluation assumes use of the model.output() method, which essentially returns an INDArray detached from the workspace. In the case of regular evaluations during training, it might be better to use the built-in methods for evaluation. For example:
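A minimal sketch of such a built-in evaluation call (the Evaluation/ROC instances and the iteratorTest name are illustrative):

```java
// Run one pass over the test iterator, updating both evaluations in place.
Evaluation eval = new Evaluation();
ROC roc = new ROC();
model.doEvaluation(iteratorTest, eval, roc);
```

doEvaluation accepts any number of IEvaluation implementations, so a single pass over the data can feed several metrics at once.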

This piece of code will run a single cycle over iteratorTest, and it will update both (or fewer/more, as required by your needs) IEvaluation implementations without any additional INDArray allocation.

    hashtag
    Workspace Destruction

There are also some situations, say, when you're short on RAM, where you might want to release all workspaces created outside of your control; e.g. during evaluation or training.

    That could be done like so: Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();

    This method will destroy all workspaces that were created within the calling thread. If you've created workspaces in some external threads on your own, you can use the same method in that thread, after the workspaces are no longer needed.

    hashtag
    Workspace Exceptions

    If workspaces are used incorrectly (such as a bug in a custom layer or data pipeline, for example), you may see an error message such as:

    hashtag
    DL4J's LayerWorkspaceMgr

    DL4J's Layer API includes the concept of a "layer workspace manager".

    The idea with this class is that it allows us to easily and precisely control the location of a given array, given different possible configurations for the workspaces. For example, the activations out of a layer may be placed in one workspace during inference, and another during training; this is for performance reasons. However, with the LayerWorkspaceMgr design, implementers of layers don't need to worry about this.

    What does this mean in practice? Usually it's quite simple...

    • When returning activations (activate(boolean training, LayerWorkspaceMgr workspaceMgr) method), make sure the returned array is defined in ArrayType.ACTIVATIONS (i.e., use LayerWorkspaceMgr.create(ArrayType.ACTIVATIONS, ...) or similar)

    • When returning activation gradients (backpropGradient(INDArray epsilon, LayerWorkspaceMgr workspaceMgr)), similarly return an array defined in ArrayType.ACTIVATION_GRAD

    You can also leverage an array defined in any workspace to the appropriate workspace using, for example, LayerWorkspaceMgr.leverageTo(ArrayType.ACTIVATIONS, myArray)

    Note that if you are not implementing a custom layer (and instead just want to perform forward pass for a layer outside of a MultiLayerNetwork/ComputationGraph) you can use LayerWorkspaceMgr.noWorkspaces().

    git commit -s -m "My signed commit"
    alias gcm='git commit -s -m'
    gcm "My Commit"
    @echo off
    echo.
    git commit -s -m %*
    $ git log --show-signature -2
    commit 81681455918371e29da1490d3f0ca3deecaf0490 (HEAD -> commit_test_branch)
    Author: YourName <[email protected]>
    Date:   Fri Jun 21 22:27:50 2019 +1000
    
        This commit is unsigned
    
    commit 2349c6aa3497bd65866d7d0a18fe82bb691bb868
    Author: YourName <[email protected]>
    Date:   Fri Jun 21 21:42:38 2019 +1000
    
        My signed commit
    
        Signed-off-by: YourName <[email protected]>
    git commit --amend --signoff
    $ git log -4 --oneline
    4b164026 (HEAD -> commit_test_branch) Your new commit 3
    d7799615 Your new commit 2
    6bb6113a Your new commit 1
    ef09606c This commit already exists
    git reset --soft HEAD~3
    git commit -s -m "Squashed and signed"
    $ git log -2 --oneline
    31658e11 (HEAD -> commit_test_branch) Squashed and signed
    ef09606c This commit already exists
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-10.0</artifactId>
        <version>1.0.0-beta7</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-10.1</artifactId>
        <version>1.0.0-beta7</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-10.2</artifactId>
        <version>1.0.0-beta7</version>
    </dependency>
     <dependency>
         <groupId>org.bytedeco</groupId>
         <artifactId>cuda-platform-redist</artifactId>
         <version>10.2-7.6-1.5.3</version>
     </dependency>
        // for the whole network
        new NeuralNetConfiguration.Builder()
                .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
                // ...
        // or separately for each layer
        new ConvolutionLayer.Builder(h, w)
                .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
                // ...
    from keras.models import Sequential
    from keras.layers import Dense
    
    model = Sequential()
    model.add(Dense(units=64, activation='relu', input_dim=100))
    model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
    
    model.save('simple_mlp.h5')
    String simpleMlp = new ClassPathResource("simple_mlp.h5").getFile().getPath();
    MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(simpleMlp);
    INDArray input = Nd4j.create(DataType.FLOAT, 256, 100);
    INDArray output = model.output(input);
    model.fit(input, output);
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-modelimport</artifactId>
<version>1.0.0-beta7</version> <!-- This version should match that of your other DL4J project dependencies. -->
    </dependency>
    <repositories>
        <repository>
            <id>snapshots-repo</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>daily</updatePolicy>  <!-- Optional, update daily -->
            </snapshots>
        </repository>
    </repositories>
    <dl4j.version>1.0.0-beta6</dl4j.version>
    <nd4j.version>1.0.0-beta6</nd4j.version>
    <dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
    <nd4j.version>1.0.0-SNAPSHOT</nd4j.version>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-native</artifactId>
                <version>${nd4j.version}</version>
            </dependency>
    mvn dependency:get -DremoteRepositories=snapshots::::https://oss.sonatype.org/content/repositories/snapshots -Dartifact=org.nd4j:nd4j-native:1.0.0-SNAPSHOT:jar:macosx-x86_64
    version '1.0-SNAPSHOT'
    
    apply plugin: 'java'
    
    sourceCompatibility = 1.8
    
    repositories {
        maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
        mavenCentral()
    }
    
    dependencies {
        compile group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '1.0.0-SNAPSHOT'
        compile group: 'org.deeplearning4j', name: 'deeplearning4j-modelimport', version: '1.0.0-SNAPSHOT'
        compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT"
        // Use windows-x86_64 or linux-x86_64 if you are not on macos
        compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT:macosx-x86_64"
        testCompile group: 'junit', name: 'junit', version: '4.12'
    
    }
    nd4j.createArray(data, shape, order)
    INDArray arr = Nd4j.create(new float[]{1,2,3,4},new int[]{2,2},'c');
    a[0:5,3:4,6:7]
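The NumPy-style slice above maps onto ND4J's NDArrayIndex API. A minimal sketch (the 8x8x8 shape is chosen arbitrarily for illustration):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.NDArrayIndex;

import java.util.Arrays;

public class SliceExample {
    public static void main(String[] args) {
        // Build an arbitrary 8x8x8 array to slice
        INDArray a = Nd4j.arange(0, 8 * 8 * 8).reshape(8, 8, 8);

        // Equivalent of a[0:5, 3:4, 6:7] - intervals are end-exclusive
        INDArray view = a.get(NDArrayIndex.interval(0, 5),
                              NDArrayIndex.interval(3, 4),
                              NDArrayIndex.interval(6, 7));

        System.out.println(Arrays.toString(view.shape())); // [5, 1, 1]
    }
}
```

Note that `get` with index objects returns a view of the original array, not a copy.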
    Tensors are generalizations of scalars (that have no indices), vectors (that have exactly one index), and matrices (that have exactly two indices) to an arbitrary number of indices. - Mathworld
    
    tensor, n. a mathematical object analogous to but more general than a vector, represented by an array of components that are functions of the coordinates of a space.
    public void setBackpropGradientsViewArray(INDArray backpropGradientsViewArray) 
    // this will limit the frequency of GC calls to once per 5000 milliseconds
    Nd4j.getMemoryManager().setAutoGcWindow(5000);
    
    // OR you could totally disable it
    Nd4j.getMemoryManager().togglePeriodicGc(false);
    ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
          // DataSets prefetching options. Buffer size per worker.
          .prefetchBuffer(8)
    
          // set number of workers equal to number of GPUs.
          .workers(2)
    
          // rare averaging improves performance but might reduce model accuracy
          .averagingFrequency(5)
    
          // if set to TRUE, on every averaging model score will be reported
          .reportScoreAfterAveraging(false)
    
          // 3 options here: NONE, SINGLE, SEPARATE
          .workspaceMode(WorkspaceMode.SINGLE)
    
          .build();
    Evaluation eval = new Evaluation(outputNum);
    ROC roceval = new ROC(outputNum);
    model.doEvaluation(iteratorTest, eval, roceval);
    org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses leaked workspace pointer from workspace [LOOP_EXTERNAL]
    For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic
    abstract class and uses the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.

    Initializing fresh configurations

    You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:

    If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:
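The code blocks for these two steps are not reproduced in this extract; a sketch of what they look like, assuming the deeplearning4j-zoo module is on the classpath (the class count and seed are illustrative values):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.zoo.model.AlexNet;

// Build a fresh, untrained AlexNet for a hypothetical 10-class problem
AlexNet zooModel = AlexNet.builder()
        .numClasses(10)
        .seed(123)
        .build();
MultiLayerNetwork net = zooModel.init();

// Obtain the underlying configuration to tune parameters or
// change the optimization algorithm before initializing
MultiLayerConfiguration conf = zooModel.conf();
```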

    Initializing pretrained weights

    Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enumerator that outlines the different weight types, which include IMAGENET, MNIST, CIFAR10, and VGGFACE.

    For example, you can initialize a VGG-16 model with ImageNet weights like so:

    And initialize another VGG16 model with weights trained on VGGFace:

    If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.
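The code blocks for these examples are not reproduced in this extract; a sketch covering the ImageNet and VGGFace initializations together with the availability check:

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.zoo.PretrainedType;
import org.deeplearning4j.zoo.ZooModel;
import org.deeplearning4j.zoo.model.VGG16;

ZooModel zooModel = VGG16.builder().build();

// Check first - returns true only if weights exist for this type
if (zooModel.pretrainedAvailable(PretrainedType.IMAGENET)) {
    // initPretrained downloads the weights on first use (throws IOException)
    ComputationGraph vgg16 = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
}

// The same model type, initialized with VGGFace weights instead
ComputationGraph vggFace = (ComputationGraph) zooModel.initPretrained(PretrainedType.VGGFACE);
```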

    Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.

    What's in the zoo?

    The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.

    You can find a complete list of models using this deeplearning4j-zoo Github link.

    This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.

    • AlexNet

    • Darknet19

    • FaceNetNN4Small2

    • InceptionResNetV1

    Advanced usage

    The zoo comes with a couple additional features if you're looking to use the models for different use cases.

    Changing Inputs

    Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape().

    NOTE: this applies to fresh configurations only, and will not affect pretrained models:
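The code block for this example is not reproduced in this extract; a sketch (the model choice and resolution are illustrative):

```java
import org.deeplearning4j.zoo.model.Darknet19;

Darknet19 model = Darknet19.builder().build();
// One int[] per network input: channels, height, width (NCHW)
model.setInputShape(new int[][]{{3, 448, 448}});
```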

    Transfer Learning

    Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J here.

    Workspaces

    Initialization methods often have an additional parameter named workspaceMode. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see this section.

    Source: https://upload.wikimedia.org/wikipedia/en/5/54/Feed_forward_neural_net.gif
    Source: http://cs231n.github.io/neural-networks-1/

    Autoencoders

    What are autoencoders?

    Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.

    Where’s Restricted Boltzmann Machine?

    RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.

    Supported layers

    AutoEncoder

    Autoencoder layer. Adds noise to the input and learns a reconstruction function.

    corruptionLevel

    Level of corruption - 0.0 (none) to 1.0 (all values corrupted)

    sparsity

    Autoencoder sparsity parameter

    • param sparsity Sparsity

    VariationalAutoencoder

    Variational Autoencoder layer

    See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes

    This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.

    A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.

    encoderLayerSizes

    Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers (set via decoderLayerSizes(int…)) is similar to the encoder layers.

    setEncoderLayerSizes

    Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers (set via decoderLayerSizes(int…)) is similar to the encoder layers.

    • param encoderLayerSizes Size of each encoder layer in the variational autoencoder

    decoderLayerSizes

    Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int…)).

    • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

    setDecoderLayerSizes

    Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a DenseLayer. Typically the number and size of the decoder layers is similar to the encoder layers (set via encoderLayerSizes(int…)).

    • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

    reconstructionDistribution

    The reconstruction distribution for the data given the hidden state - i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example: GaussianReconstructionDistribution + {identity or tanh} for real-valued (Gaussian) data; BernoulliReconstructionDistribution + sigmoid for binary-valued (0 or 1) data.

    • param distribution Reconstruction distribution

    lossFunction

    Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution.

    • param outputActivationFn Activation function for the output/reconstruction

    • param lossFunction Loss function to use

    pzxActivationFn

    Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

    • param activationFunction Activation function for p(z| x)

    pzxActivationFunction

    Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

    • param activation Activation function for p(z | x)

    nOut

    Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.

    • param nOut Size of P(Z | data) and output size

    numSamples

    Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.

    This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”

    • param numSamples Number of samples per data point for pretraining

    Examples Tour

    Brief tour of available examples in DL4J.

    Deeplearning4J has a wealth of examples of how to use its many parts. You can find the examples in the Examples Repository.

    Prerequisites

    The example repository consists of several separate Maven Java projects, each with their own pom files. Maven is a popular build automation tool for Java projects. The contents of a "pom.xml" file dictate the configurations. Read more about how to configure Maven here.

    Users can also refer to the to get started with a clean project from scratch.

    Build tools are considered standard software engineering best practice. Besides this, the complexities posed by the projects in the DL4J ecosystem make dependencies too difficult to manage manually. All the projects in the DL4J ecosystem can be used with other build tools like Gradle, SBT etc.

    Example Content

    Projects are based on what functionality the included examples demonstrate to the user and not necessarily which library in the DL4J stack the functionality lives in.

    Examples in a project are in general separated into "quickstart" and "advanced".

    Each project README also lists all the examples it contains, with a recommended order to explore them in.

    • This project contains a set of examples that demonstrate use of the high level DL4J API to build a variety of neural networks. Some of these examples are end to end, in the sense they start with raw data, process it and then build and train neural networks on it.

    • This project contains a set of examples that demonstrate how to import Keras h5 models and TensorFlow frozen pb models into the DL4J ecosystem. Once imported into DL4J these models can be treated like any other DL4J model - meaning you can continue to run training on them or modify them with the transfer learning API or simply run inference on them.

    Feedback & Contributions

    While this set of examples doesn't cover all the features available in DL4J, the intent is to cover the functionality required for most users, beginners and advanced. File an issue if you have feedback or feature requests that are not covered here. We are also available via our community forums for questions. We welcome contributions from the community. We love hearing from you. Cheers!

    Transfer Learning

    DL4J’s Transfer Learning API

    The DL4J transfer learning API enables users to:

    • Modify the architecture of an existing model

    • Fine tune learning configurations of an existing model.

    • Hold parameters of a specified layer constant during training, also referred to as "frozen"

    Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, the transformed version being the intermediate outputs at the boundary of the frozen layers. This is the process of “feature extraction” from the input data and will be referred to as “featurizing” in this document.

    The transfer learning helper

    The forward pass to "featurize" the input data on large, pretrained networks can be time consuming. DL4J also provides a TransferLearningHelper class with the following capabilities.

    • Featurize an input dataset to save for future use

    • Fit the model with frozen layers with a featurized dataset

    • Output from the model with frozen layers given a featurized input.

    When running multiple epochs users will save on computation time since the expensive forward pass on the frozen layers/vertices will only have to be conducted once.

    Show me the code

    This example will use VGG16 to classify images belonging to five categories of flowers. The dataset will automatically download from

    I. Import a zoo model

    Deeplearning4j has a new native model zoo. Read about the module for more information on using pretrained models. Here, we load a pretrained VGG-16 model initialized with weights trained on ImageNet:

    II. Set up a fine-tune configuration
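The code block for this step is not reproduced in this extract; a minimal FineTuneConfiguration sketch along the lines of the flowers example (the learning rate and seed are illustrative values):

```java
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.nd4j.linalg.learning.config.Nesterovs;

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
        .updater(new Nesterovs(5e-5))  // small learning rate for fine tuning
        .seed(123)
        .build();
```

This configuration is applied to the unfrozen layers of the model built in the next step.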

    III. Build new models based on VGG16

    A. Modifying only the last layer, keeping the other layers frozen

    The final layer of VGG16 does a softmax regression on the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes keeping the other layers frozen.
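The code block for this step is not reproduced in this extract; a sketch of the modification, assuming `vgg16` (the imported ComputationGraph) and a `fineTuneConf` FineTuneConfiguration are in scope:

```java
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
        .fineTuneConfiguration(fineTuneConf)
        .setFeatureExtractor("fc2")                 // freeze everything up to and including fc2
        .removeVertexKeepConnections("predictions") // drop the 1000-way ImageNet softmax
        .addLayer("predictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(5)          // five flower classes
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX)
                        .build(),
                "fc2")
        .build();
```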

    After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy > 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.

    B. Attach new layers to the bottleneck (block5_pool)

    Here we hold all but the last three dense layers frozen and attach new dense layers onto it. Note that the primary intent here is to demonstrate the use of the API, secondary to what might give better results.

    C. Fine tune layers from a previously saved model

    Say we have saved off our model from (B) and now want to allow “block_5” layers to train.

    IV. Saving “featurized” datasets and training with them.

    We use the transfer learning helper API. Note this freezes the layers of the model passed in.

    Here is how you obtain the featurized version of the dataset at the specified layer "fc2".

    Here is how you can fit with a featurized dataset. vgg16Transfer is a model set up in (A) of section III.
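The code blocks for these steps are not reproduced in this extract; a sketch of the helper workflow, assuming `vgg16Transfer` and a DataSet `ds` are in scope (the file name is an illustrative placeholder):

```java
import org.deeplearning4j.nn.transferlearning.TransferLearningHelper;
import org.nd4j.linalg.dataset.DataSet;

import java.io.File;

// Freezes the layers of vgg16Transfer up to (and including) the vertex "fc2"
TransferLearningHelper helper = new TransferLearningHelper(vgg16Transfer, "fc2");

// Featurize once, save for reuse across epochs
DataSet featurized = helper.featurize(ds);
featurized.save(new File("featurized-train-0.bin"));

// Later: fit the unfrozen part of the model directly on the featurized data
helper.fitFeaturized(featurized);
```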

    Notes

    • The TransferLearning builder returns a new instance of a dl4j model.

    Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take memory requirements into consideration and adjust your JVM heap space accordingly.

    • The trained model helper imports models from Keras without enforcing a training configuration.

    Therefore, the last layer (as seen when printing the summary) is a dense layer and not an output layer with a loss function. To modify nOut of an output layer we delete the layer vertex, keeping its connections, and add back in a new output layer with the same name, a different nOut, and the suitable loss function etc.

    • Changing nOuts at a layer/vertex will modify nIn of the layers/vertices it fans into.

    When changing nOut users can specify a weight initialization scheme or a distribution for the layer as well as a separate weight initialization scheme or distribution for the layers it fans out to.

    • Frozen layer configurations are not saved when writing the model to disk.

    In other words, a model with frozen layers when serialized and read back in will not have any frozen layers. To continue training holding specific layers constant the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to “freeze” layers in a dl4j model.

    • On a copy: With the transfer learning API which will return a new model with the relevant frozen layers

    • In place: With the transfer learning helper API which will apply the frozen layers to the given model.

    • FineTune configurations will selectively update learning parameters.

    For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.

    Utilities

    Evaluation

    Tools and classes for evaluating neural network performance

    Why evaluate?

    When training or deploying a Neural Network it is useful to know the accuracy of your model. In DL4J the Evaluation Class and variants of the Evaluation Class are available to evaluate your model's performance.

    Evaluation for Classification

    The Evaluation class is used to evaluate the performance for binary and multi-class classifiers (including time series classifiers). This section covers basic usage of the Evaluation Class.

    Given a dataset in the form of a DataSetIterator, the easiest way to perform evaluation is to use the built-in evaluate methods on MultiLayerNetwork and ComputationGraph:

    However, evaluation can be performed on individual minibatches also. Here is an example taken from our dataexamples/CSVExample in the project.

    The CSV example has CSV data for 3 classes of flowers and builds a simple feed forward neural network to classify the flowers based on 4 measurements.
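The minibatch snippet being discussed is not reproduced in this extract; reconstructed from the CSV example it looks roughly like the following (`model`, `testData` and `log` are assumed to be in scope):

```java
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.api.ndarray.INDArray;

// Evaluation object for the three flower classes
Evaluation eval = new Evaluation(3);
// Get the model's predictions for the test features
INDArray output = model.output(testData.getFeatures());
// Compare the true labels with the predicted labels
eval.eval(testData.getLabels(), output);
// Log the evaluation statistics
log.info(eval.stats());
```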

    The first line creates an Evaluation object with 3 classes. The second line gets the labels from the model for our test dataset. The third line uses the eval method to compare the labels array from the testdata with the labels generated from the model. The fourth line logs the evaluation data to the console.

    The output.

    By default the .stats() method displays the confusion matrix entries (one per line), Accuracy, Precision, Recall and F1 Score. Additionally the Evaluation Class can also calculate and return the following values:

    • Confusion Matrix

    • False Positive/Negative Rate

    • True Positive/Negative

    • Class Counts

    Display the Confusion Matrix.

    Displays

    Additionally, the confusion matrix can be accessed directly, or converted to CSV or HTML.

    Evaluation for Regression

    To evaluate a network performing regression, use the RegressionEvaluation class.

    As with the Evaluation class, RegressionEvaluation on a DataSetIterator can be performed as follows:

    Here is a code snippet with a single column; in this case the neural network was predicting the age of shellfish based on measurements.
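The snippet itself is not reproduced in this extract; a sketch, assuming `model` and a DataSetIterator `testIter` are in scope:

```java
import org.nd4j.evaluation.regression.RegressionEvaluation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;

// One regression target column: the predicted age
RegressionEvaluation eval = new RegressionEvaluation(1);
while (testIter.hasNext()) {
    DataSet ds = testIter.next();
    INDArray predicted = model.output(ds.getFeatures(), false);
    eval.eval(ds.getLabels(), predicted);
}
System.out.println(eval.stats());
```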

    Print the statistics for the Evaluation.

    Returns

    Columns are Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, Relative Squared Error, and R^2 Coefficient of Determination

    See

    Performing Multiple Evaluations Simultaneously

    When performing multiple types of evaluations (for example, Evaluation and ROC on the same network and dataset) it is more efficient to do this in one pass of the dataset, as follows:

    Evaluation of Time Series

    Time series evaluation is very similar to the above evaluation approaches. Evaluation in DL4J is performed on all (non-masked) time steps separately - for example, a time series of length 10 will contribute 10 predictions/labels to an Evaluation object. One difference with time series is the (optional) presence of mask arrays, which are used to mark some time steps as missing or not present. See for more details on masking.

    For most users, it is simply sufficient to use the MultiLayerNetwork.evaluate(DataSetIterator) or MultiLayerNetwork.evaluateRegression(DataSetIterator) and similar methods. These methods will properly handle masking, if mask arrays are present.

    Evaluation for Binary Classifiers

    The EvaluationBinary is used for evaluating networks with binary classification outputs - these networks usually have Sigmoid activation functions and XENT loss functions. The typical classification metrics, such as accuracy, precision, recall, F1 score, etc. are calculated for each output.

    See

    ROC

    ROC (Receiver Operating Characteristic) is another commonly used evaluation metric for the evaluation of classifiers. Three ROC variants exist in DL4J:

    • ROC - for single binary label (as a single column probability, or 2 column 'softmax' probability distribution).

    • ROCBinary - for multiple binary labels

    • ROCMultiClass - for evaluation of non-binary classifiers, using a "one vs. all" approach

    These classes have the ability to calculate the area under ROC curve (AUROC) and area under Precision-Recall curve (AUPRC), via the calculateAUC() and calculateAUPRC() methods. Furthermore, the ROC and Precision-Recall curves can be obtained using getRocCurve() and getPrecisionRecallCurve().

    The ROC and Precision-Recall curves can be exported to HTML for viewing using: EvaluationTools.exportRocChartsToHtmlFile(ROC, File), which will export a HTML file with both ROC and P-R curves, that can be viewed in a browser.

    Note that all three classes support two modes of operation/calculation:

    • Thresholded (approximate AUROC/AUPRC calculation, no memory issues)

    • Exact (exact AUROC/AUPRC calculation, but can require large amount of memory with very large datasets - i.e., datasets with many millions of examples)

    The number of bins for thresholded mode can be set using the constructors. Exact mode can be selected using the default constructor new ROC(), or explicitly using new ROC(0).
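A minimal sketch of ROC usage, assuming `model` and a DataSetIterator `testIter` are in scope:

```java
import org.nd4j.evaluation.classification.ROC;

// new ROC(0) (or the default constructor) selects exact AUROC/AUPRC calculation;
// a positive value, e.g. new ROC(100), uses thresholded mode with 100 bins instead
ROC roc = new ROC(0);
model.doEvaluation(testIter, roc);

double auroc = roc.calculateAUC();
double auprc = roc.calculateAUPRC();
```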

    See is used to evaluate Binary Classifiers.

    Evaluating Classifier Calibration

    Deeplearning4j also has the EvaluationCalibration class, which is designed to analyze the calibration of a classifier. It provides a number of tools for this purpose:

    • Counts of the number of labels and predictions for each class

    • Reliability diagram (or reliability curve)

    • Residual plot (histogram)

    • Histograms of probabilities, including probabilities for each class separately

    Distributed Evaluation for Spark Networks

    SparkDl4jMultiLayer and SparkComputationGraph both have similar methods for evaluation:

    Evaluation for Multi-task Networks

    A multi-task network is a network that is trained to produce multiple outputs. For example, a network given audio samples can be trained to predict both the language spoken and the gender of the speaker. Multi-task configuration is briefly described.

    Evaluation Classes useful for Multi-Task Network

    See

    See

    Available evaluations

    Importing TensorFlow models

    What models can be imported into SameDiff

    Currently SameDiff supports the import of TensorFlow frozen graphs through the various SameDiff.importFrozenTF methods. TensorFlow documentation on frozen models can be found here.

    Finding the model input/outputs and running inference

    After you import the TensorFlow model there are two ways to find the inputs and outputs. The first method is to look at the output of

    Here, the input variables are those that are the outputs of no ops, and the output variables are those that are the inputs of no ops. Another way to find the inputs is

    To run inference use:

    For multiple outputs, use exec() instead of execSingle(), to return a Map<String,INDArray> of outputs instead. Alternatively, you can use methods such as SameDiff.output(Map<String, INDArray> placeholders, String... outputs) to get the same output.
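A sketch of the import-and-run workflow. The file path and the variable names "input"/"output" are placeholders for your graph's actual names, and `inputArr` is an assumed INDArray:

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;

import java.io.File;
import java.util.Collections;
import java.util.Map;

// Import the frozen TensorFlow graph
SameDiff sd = SameDiff.importFrozenTF(new File("frozen_model.pb"));

// Bind the placeholder(s) and request the desired output variable(s)
Map<String, INDArray> placeholders = Collections.singletonMap("input", inputArr);
Map<String, INDArray> outputs = sd.output(placeholders, "output");
INDArray result = outputs.get("output");
```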

    Import Validation

    We have a TensorFlow graph analyzing utility which will report any missing operations (operations that still need to be implemented)

    Advanced: Node Skipping and Import Overrides

    It is possible to remove nodes from the network. For example TensorFlow 1.x models can have hard coded dropout layers. See the for an example.

    List of models known to work with SameDiff.

    Operations Coverage

    SameDiff’s TensorFlow import is still being developed, and does not yet have support for every single operation and datatype in TensorFlow. Almost all of the common/standard operations are importable and tested, however - including almost everything in the tf, tf.math, tf.layers, tf.losses, tf.bitwise and tf.nn namespaces. The majority of existing pretrained models out there should be importable into SameDiff.

    If you run into an operation that can’t be imported, feel free to file an issue.

    Early Stopping

    Terminate a training session given certain conditions.

    What is early stopping?

    When training neural networks, numerous decisions need to be made regarding the settings (hyperparameters) used, in order to obtain good performance. One such hyperparameter is the number of training epochs: that is, how many full passes of the data set (epochs) should be used? If we use too few epochs, we might underfit (i.e., not learn everything we can from the training data); if we use too many epochs, we might overfit (i.e., fit the 'noise' in the training data, and not the signal).

    Early stopping attempts to remove the need to manually set this value. It can also be considered a type of regularization method (like L1/L2 weight decay and dropout) in that it can stop the network from overfitting.

    The idea behind early stopping is relatively simple:

    • Split data into training and test sets

    • At the end of each epoch (or, every N epochs):

      • evaluate the network performance on the test set

      • if the network outperforms the previous best model, save a copy of the network at the current epoch

    This is shown graphically below:

    The best model is the one saved at the time of the vertical dotted line - i.e., the model with the best accuracy on the test set.

    Using DL4J's early stopping functionality requires you to provide a number of configuration options:

    • A score calculator, such as the DataSetLossCalculator for a MultiLayerNetwork, or DataSetLossCalculatorCG for a ComputationGraph. This is used to calculate the score at every epoch (for example: the loss function value on a test set, or the accuracy on the test set)

    • How frequently we want to calculate the score function (default: every epoch)

    • One or more termination conditions, which tell the training process when to stop. There are two classes of termination conditions:

      • Epoch termination conditions: evaluated every N epochs

      • Iteration termination conditions: evaluated once per minibatch

    An example, with an epoch termination condition of maximum of 30 epochs, a maximum of 20 minutes training time, calculating the score every epoch, and saving the intermediate results to disk:
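The code block for this example is not included in this extract; it looks roughly like the following (the directory, iterator and configuration names are assumptions):

```java
import org.deeplearning4j.earlystopping.EarlyStoppingConfiguration;
import org.deeplearning4j.earlystopping.EarlyStoppingResult;
import org.deeplearning4j.earlystopping.saver.LocalFileModelSaver;
import org.deeplearning4j.earlystopping.scorecalc.DataSetLossCalculator;
import org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition;
import org.deeplearning4j.earlystopping.termination.MaxTimeIterationTerminationCondition;
import org.deeplearning4j.earlystopping.trainer.EarlyStoppingTrainer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

import java.util.concurrent.TimeUnit;

EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
        new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
                .iterationTerminationConditions(
                        new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
                .scoreCalculator(new DataSetLossCalculator(testIterator, true))
                .evaluateEveryNEpochs(1)
                .modelSaver(new LocalFileModelSaver(directory))  // save intermediate results to disk
                .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, myNetworkConfiguration, trainIterator);
EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
```

The returned EarlyStoppingResult holds the best model found and the reason training terminated.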

    You can also implement your own iteration and epoch termination conditions.

    Early Stopping w/ Parallel Wrapper

    The early stopping implementation described above will only work with a single device. However, EarlyStoppingParallelTrainer provides similar functionality as early stopping and allows you to optimize for either multiple CPUs or GPUs. EarlyStoppingParallelTrainer wraps your model in a ParallelWrapper class and performs localized distributed training.

    Note that EarlyStoppingParallelTrainer doesn't support all of the functionality of its single-device counterpart. It is not UI-compatible and may not work with complex iteration listeners. This is due to how the model is distributed and copied in the background.

    Layers and Preprocessors

    In previous tutorials we learned how to configure different neural networks such as feed forward, convolutional, and recurrent networks. The type of neural network is determined by the type of hidden layers it contains. For example, feed forward neural networks are comprised of dense layers, while recurrent neural networks can include Graves LSTM (long short-term memory) layers. In this tutorial we will learn how to use combinations of different layers in a single neural network using the MultiLayerNetwork class of Deeplearning4j (DL4J). Additionally, we will learn how to preprocess our data to more efficiently train the neural networks. The MNIST dataset (images of handwritten digits) will be used as an example for a convolutional network.

    Imports

    Convolutional Neural Network Example

    Now that everything needed is imported, we can start by configuring a convolutional neural network for a MultiLayerNetwork. This network will consist of two convolutional layers, two max pooling layers, one dense layer, and an output layer. This is easy to do using DL4J’s functionality; we simply add a dense layer after the max pooling layer to convert the output into vectorized form before passing it to the output layer. The neural network will then attempt to classify an observation using the vectorized data in the output layer.

    The only tricky part is getting the dimensions of the input to the dense layer correct after the convolutional and max pooling layers. Note that we first start off with a 28 by 28 matrix, and after applying the convolution layer with a 5 by 5 kernel we end up with twenty 24 by 24 matrices. Once the input is passed through the max pooling layer with a 2 by 2 kernel and a stride of 2 by 2, we end up with twenty 12 by 12 matrices. After the second convolutional layer with a 5 by 5 kernel, we end up with fifty 8 by 8 matrices. This output is reduced to fifty 4 by 4 matrices after the second max pooling layer, which has the same kernel size and stride as the first max pooling layer. To vectorize these final matrices, we require an input of dimension 50 x 4 x 4 = 800 in the dense layer.
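The arithmetic above can be checked with the standard size formula for 'valid' (unpadded) convolutions and pooling; a small stand-alone sketch in plain Java (not DL4J):

```java
// Plain-Java check of the layer-size arithmetic above, using the standard
// formula for 'valid' convolutions and pooling: out = (in - kernel) / stride + 1.
public class ConvMath {
    static int outSize(int in, int kernel, int stride) {
        return (in - kernel) / stride + 1;
    }

    public static void main(String[] args) {
        int s = outSize(28, 5, 1);  // first 5x5 convolution: 28 -> 24
        s = outSize(s, 2, 2);       // first 2x2 max pool, stride 2: 24 -> 12
        s = outSize(s, 5, 1);       // second 5x5 convolution: 12 -> 8
        s = outSize(s, 2, 2);       // second 2x2 max pool, stride 2: 8 -> 4
        System.out.println(50 * s * s); // prints 800: the dense layer input size
    }
}
```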

    Before training the neural network, we will instantiate built-in DataSetIterators for the MNIST data. One example of data preprocessing is scaling the data. The data we are using in raw form are greyscale images, which are represented by a single matrix filled with integer values from 0 to 255. A value of 0 indicates a black pixel, while a value of 255 indicates a white pixel. It is helpful to scale the image pixel values to the range 0 to 1 instead of 0 to 255. To do this, the ImagePreProcessingScaler class is used directly on the MnistDataSetIterators. Note that this kind of scaling is a typical data preprocessing step. Once this is done, we are ready to train the neural network.

    To train the neural network, we use 5 epochs (complete passes through the training set) by simply calling the fit method.

    Lastly, we use the test split of the data to evaluate how well our final model performs on data it has never seen. We can see that the model performs pretty well using only 5 epochs!

    Random

    bernoulli

    Generate a new random INDArray, where values are randomly sampled according to a Bernoulli distribution,

    with the specified probability. Array values will have value 1 with probability P and value 0 with probability

    1-P.

    • p - Probability of value 1

    • datatype - Data type of the output variable

    • shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))
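The sampling rule can be sketched in plain Java; this illustrates the semantics only and is not ND4J's implementation:

```java
import java.util.Random;

// Plain-Java illustration of Bernoulli sampling semantics (not ND4J's
// implementation): each element is 1 with probability p and 0 with probability 1 - p.
public class BernoulliSketch {
    static int[] bernoulli(double p, int n, long seed) {
        Random rng = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextDouble() < p ? 1 : 0;
        }
        return out;
    }
}
```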

    binomial

    Generate a new random INDArray, where values are randomly sampled according to a Binomial distribution,

    with the specified number of trials and probability.

    • nTrials - Number of trials parameter for the binomial distribution

    • p - Probability of success for each trial

    • datatype - Data type of the output variable

    exponential

    Generate a new random INDArray, where values are randomly sampled according to an exponential distribution:

    P(x) = lambda * exp(-lambda * x)

    • lambda - lambda parameter

    • datatype - Data type of the output variable

    • shape - Shape of the new random INDArray, as a 1D array (Size: AtLeast(min=0))
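Such samples can be drawn with the inverse-transform method; a plain-Java illustration (not ND4J's implementation):

```java
import java.util.Random;

// Plain-Java illustration of exponential sampling via inverse transform
// (not ND4J's implementation): for u ~ U(0,1), x = -ln(1 - u) / lambda
// follows the density P(x) = lambda * exp(-lambda * x).
public class ExponentialSketch {
    static double sample(double lambda, Random rng) {
        return -Math.log(1.0 - rng.nextDouble()) / lambda;
    }
}
```

The mean of the resulting samples converges to 1/lambda, which is a quick sanity check for the transform.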

    logNormal

    Generate a new random INDArray, where values are randomly sampled according to a Log Normal distribution,

    i.e., log(x) ~ N(mean, stdev)

    • mean - Mean value for the random array

    • stddev - Standard deviation for the random array

    • datatype - Data type of the output variable

    normal

    Generate a new random INDArray, where values are randomly sampled according to a Gaussian (normal) distribution,

    N(mean, stdev)

    • mean - Mean value for the random array

    • stddev - Standard deviation for the random array

    • datatype - Data type of the output variable

    normalTruncated

    Generate a new random INDArray, where values are randomly sampled according to a Gaussian (normal) distribution,

    N(mean, stdev). However, any values more than 1 standard deviation from the mean are dropped and re-sampled

    • mean - Mean value for the random array

    • stddev - Standard deviation for the random array

    • datatype - Data type of the output variable
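The drop-and-resample rule can be sketched in plain Java, following the cutoff stated above (this illustrates the semantics only, not ND4J's implementation):

```java
import java.util.Random;

// Plain-Java illustration of the resampling rule described above (not ND4J's
// implementation): draws outside one standard deviation of the mean are redrawn.
public class TruncatedNormalSketch {
    static double sample(double mean, double stddev, Random rng) {
        double x;
        do {
            x = mean + stddev * rng.nextGaussian();
        } while (Math.abs(x - mean) > stddev);
        return x;
    }
}
```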

    uniform

    Generate a new random INDArray, where values are randomly sampled according to a uniform distribution,

    U(min,max)

    • min - Minimum value

    • max - Maximum value.

    • datatype - Data type of the output variable


    Memory Management

    Setting available Memory/RAM for a DL4J application

    Memory Management for ND4J/DL4J: How does it work?

    ND4J uses off-heap memory to store NDArrays, to provide better performance while working with NDArrays from native code such as BLAS and CUDA libraries.

    "Off-heap" means that the memory is allocated outside of the JVM (Java Virtual Machine) and hence isn't managed by the JVM's garbage collection (GC). On the Java/JVM side, we only hold pointers to the off-heap memory, which can be passed to the underlying C++ code via JNI for use in ND4J operations.

    Backends

    Hardware setup for Eclipse Deeplearning4j, including GPUs and CUDA.

    ND4J works atop so-called backends, or linear-algebra libraries, such as nd4j-native (CPUs) and nd4j-cuda-10.2 (GPUs), which you can select by pasting the right dependency into your project’s POM.xml file.

    ND4J backends for GPUs and CPUs

    You can choose GPUs or native CPUs for your backend linear algebra operations by changing the dependencies in your project's POM.xml file. Your selection will affect both ND4J and DL4J in your application.

    If you have CUDA v9.2+ installed and NVIDIA-compatible hardware, then your dependency declaration will look like:
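For example, for CUDA 10.2 the declaration has the following shape (the version shown assumes the 1.0.0-beta7 release this guide documents; match the CUDA version in the artifactId to your installed toolkit):

```xml
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-10.2</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
```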

    Using Multiple GPUs

    Training neural network models can be a computationally expensive task. In order to speed up the training process, you can choose to train your models in parallel with multiple GPU’s if they are installed on your machine. With deeplearning4j (DL4J), this isn’t a difficult thing to do. In this tutorial we will use the MNIST dataset (dataset of handwritten images) to train a feed forward neural network in parallel with multiple GPUs.

    Note: this approach also works if you can't fully load your CPU during training. In that case, you simply stay with the CPU-specific backend.

    Prerequisite

    Basics

    Elementwise Operations And Basic Usage

    The basic operations of linear algebra are matrix creation, addition and multiplication. This guide will show you how to perform those operations with ND4J, as well as various advanced transforms.

    The Java code below will create a simple 2 x 2 matrix, populate it with integers, and place it in the nd-array variable nd:

    If you print out this array

    you’ll see this

    A matrix with two rows and two columns, which orders its elements by column and which we’ll call matrix nd.

    A matrix that ordered its elements by row would look like this:

    Updaters

    Special algorithms for gradient descent.

    What are updaters?

    The main difference among the updaters is how they treat the learning rate. Stochastic Gradient Descent, the most common learning algorithm in deep learning, relies on Theta (the weights in hidden layers) and alpha (the learning rate). Different updaters help optimize the learning rate until the neural network converges on its most performant state.
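The vanilla SGD rule that the updaters build on can be written as theta = theta - alpha * gradient; a minimal plain-Java illustration (not the DL4J updater code):

```java
// Minimal illustration of one stochastic gradient descent step: each weight in
// theta moves against its gradient, scaled by the learning rate alpha.
// Updaters such as Adam or AdaGrad adapt alpha per parameter instead of keeping it fixed.
public class SgdStep {
    static double[] step(double[] theta, double[] grad, double alpha) {
        double[] out = new double[theta.length];
        for (int i = 0; i < theta.length; i++) {
            out[i] = theta[i] - alpha * grad[i];
        }
        return out;
    }
}
```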

    Troubleshooting

    Understanding common errors like NaNs and tuning hyperparameters.

    Troubleshooting Neural Net Training

    Neural networks can be difficult to tune. If the network hyperparameters are poorly chosen, the network may learn slowly, or perhaps not at all. This page aims to provide some baseline steps you should take when tuning your network.

    Many of these tips have already been discussed in the academic literature. Our purpose is to consolidate them in one site and express them as clearly as possible.

    Bitwise

    and

    Bitwise AND operation. Supports broadcasting.

    • x (INT) - First input array

    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-zoo</artifactId>
        <version>1.0.0-beta6</version>
    </dependency>
    import org.deeplearning4j.zoo.model.AlexNet;
    import org.deeplearning4j.zoo.*;
    
    ...
    
    int numberOfClassesInYourData = 1000;
    int randomSeed = 123;
    
    ZooModel zooModel = AlexNet.builder()
                    .numClasses(numberOfClassesInYourData)
                    .seed(randomSeed)
                    .build();
    Model net = zooModel.init();
    ZooModel zooModel = AlexNet.builder()
                    .numClasses(numberOfClassesInYourData)
                    .seed(randomSeed)
                    .build();
    MultiLayerConfiguration net = ((AlexNet) zooModel).conf();
    import org.deeplearning4j.zoo.model.VGG16;
    import org.deeplearning4j.zoo.*;
    
    ...
    
    ZooModel zooModel = VGG16.builder().build();
    Model net = zooModel.initPretrained(PretrainedType.IMAGENET);
    ZooModel zooModel = VGG16.builder().build();
    Model net = zooModel.initPretrained(PretrainedType.VGGFACE);
    int numberOfClassesInYourData = 10;
    int randomSeed = 123;
    
    ZooModel zooModel = ResNet50.builder()
            .numClasses(numberOfClassesInYourData)
            .seed(randomSeed)
            .build();
    zooModel.setInputShape(new int[][]{{3, 28, 28}});
    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.conf.graph.MergeVertex
    import org.deeplearning4j.nn.conf.layers.{DenseLayer, GravesLSTM, OutputLayer, RnnOutputLayer}
    import org.deeplearning4j.nn.conf.{ComputationGraphConfiguration, MultiLayerConfiguration, NeuralNetConfiguration, Updater}
    import org.deeplearning4j.nn.graph.ComputationGraph
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.learning.config.AdaGrad
    import org.nd4j.linalg.lossfunctions.LossFunctions
    val conf = new NeuralNetConfiguration.Builder()
        .seed(12345)
        .weightInit(WeightInit.XAVIER)
        .updater(new AdaGrad(0.5))
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .l2(0.0001)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250).weightInit(WeightInit.XAVIER).activation(Activation.RELU) //First hidden layer
                .build())
        .layer(1, new OutputLayer.Builder().nIn(250).nOut(10).weightInit(WeightInit.XAVIER).activation(Activation.SOFTMAX) //Output layer
                .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .build())
        .build()
    //Just make sure the number of inputs of the next layer equals to the number of outputs in the previous layer.
    val conf = new NeuralNetConfiguration.Builder()
        .seed(12345)
        .weightInit(WeightInit.XAVIER)
        .updater(new AdaGrad(0.5))
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .learningRate(0.05)
        .l2(0.0001)
        .list()
        //First hidden layer
        .layer(0, new DenseLayer.Builder()
                .nIn(784).nOut(250)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU) 
                .build())
        //Second hidden layer
        .layer(1, new DenseLayer.Builder()
                .nIn(250).nOut(100)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU) 
                .build())
         //Third hidden layer
        .layer(2, new DenseLayer.Builder()
                .nIn(100).nOut(50)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .build())
        //Output layer
        .layer(3, new OutputLayer.Builder()
                .nIn(50).nOut(10)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.SOFTMAX) 
                .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .build())
        .build()
    import org.deeplearning4j.nn.conf.graph.MergeVertex
    import org.deeplearning4j.nn.conf.layers.{DenseLayer, GravesLSTM, OutputLayer, RnnOutputLayer}
    import org.deeplearning4j.nn.conf.{ComputationGraphConfiguration, MultiLayerConfiguration, NeuralNetConfiguration}
    import org.deeplearning4j.nn.graph.ComputationGraph
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.learning.config.Nesterovs
    import org.nd4j.linalg.lossfunctions.LossFunctions
    //Building the output layer
    val outputLayer : OutputLayer = new OutputLayer.Builder()
        .nIn(784) //The number of inputs fed from the input layer
        .nOut(10) //The number of output values the output layer is supposed to take
        .weightInit(WeightInit.XAVIER) //The algorithm to use for weights initialization
        .activation(Activation.SOFTMAX) //Softmax activation converts the output layer into a probability distribution
        .build() //Building our output layer
    //Since this is a simple network with a stack of layers we're going to configure a MultiLayerNetwork
    val logisticRegressionConf : MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
        //High Level Configuration
        .seed(123)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Nesterovs(0.1, 0.9)) 
        //For configuring MultiLayerNetwork we call the list method
        .list() 
        .layer(0, outputLayer) //    <----- output layer fed here
        .build() //Building Configuration
    import org.nd4j.autodiff.samediff.SameDiff;
    SameDiff sd = SameDiff.importFrozenTF(modelFile);
    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
    import org.nd4j.evaluation.classification.Evaluation
    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration
    import org.deeplearning4j.nn.conf.Updater
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.deeplearning4j.nn.conf.layers.SubsamplingLayer
    import org.deeplearning4j.nn.conf.layers.ConvolutionLayer
    import org.deeplearning4j.nn.conf.inputs.InputType
    import org.deeplearning4j.nn.conf.distribution.UniformDistribution
    import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
    import org.deeplearning4j.nn.conf.{ComputationGraphConfiguration, MultiLayerConfiguration, NeuralNetConfiguration, Updater}
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.learning.config.Nesterovs
    import org.nd4j.linalg.lossfunctions.LossFunctions
    import org.nd4j.linalg.api.ndarray.INDArray
    import org.nd4j.linalg.dataset.DataSet
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
    import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization
    import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler
    import org.slf4j.Logger
    import org.slf4j.LoggerFactory
    INDArray bernoulli(double p, DataType datatype, long[] shape)
    
    SDVariable bernoulli(double p, DataType datatype, long[] shape)
    SDVariable bernoulli(String name, double p, DataType datatype, long[] shape)
    LeNet
    ResNet50
    SimpleCNN
    TextGenerationLSTM
    TinyYOLO
    VGG16
    VGG19
    dl4j-distributed-training-examples This project contains a set of examples that demonstrate how to do distributed training, inference and evaluation in DL4J on Apache Spark. DL4J distributed training employs a "hybrid" asynchronous SGD approach - further details can be found in the distributed deep learning documentation.
  • cuda-specific-examples This project contains a set of examples that demonstrate how to leverage multiple GPUs for data-parallel training of neural networks for increased performance.

  • samediff-examples This project contains a set of examples that demonstrate the SameDiff API. SameDiff (which is part of the ND4J library) can be used to build lower level auto-differentiating computation graphs. An analogue to the SameDiff API vs the DL4J API is the low level TensorFlow API vs the higher level of abstraction Keras API.

  • data-pipeline-examples This project contains a set of examples that demonstrate how raw data in various formats can be loaded, split and preprocessed to build serializable (and hence reproducible) ETL pipelines.

  • nd4j-ndarray-examples This project contains a set of examples that demonstrate how to manipulate NDArrays. The functionality of ND4J demonstrated here can be likened to NumPy.

  • arbiter-examples This project contains a set of examples that demonstrate usage of the Arbiter library for hyperparameter tuning of Deeplearning4J neural networks.

  • rl4j-examples This project contains examples of using RL4J, the reinforcement learning library in DL4J.

  • android-examples This project contains an Android example project, that shows DL4J being used in an Android application.

    https://arxiv.org/abs/1312.6114
    http://download.tensorflow.org/example_images/flower_photos.tgz
    deeplearning4j-zoo

    For F-beta, G-measure, Matthews Correlation Coefficient and more, see the Evaluation JavaDoc

    Evaluation of a classifier using EvaluationCalibration is performed in a similar manner to the other evaluation classes. The various plots/histograms can be exported to HTML for viewing using EvaluationTools.exportEvaluationCalibrationToHtmlFile(EvaluationCalibration, File).

    Evaluation for Classification
    Examples
    Evaluation for Regression
    RegressionEvaluation JavaDoc
    Performing Multiple Evaluations Simultaneously
    Evaluation of Time Series
    Using RNNs - Masking
    Evaluation for Binary Classifiers
    EvaluationBinary JavaDoc
    ROC
    ROCBinary JavaDoc
    Evaluating Classifier Calibration
    Distributed Evaluation for Spark Networks
    Evaluation for Multi-task Networks
    ROCMultiClass JavaDoc
    ROCBinary JavaDoc
    deeplab_mobilenetv2_coco_voc_trainval
  • densenet_2018_04_27

  • inception_resnet_v2_2018_04_27

  • inception_v4_2018_04_27

  • labels

  • mobilenet_v1_0.5_128

  • mobilenet_v2_1.0_224

  • nasnet_mobile_2018_04_27

  • resnetv2_imagenet_frozen_graph

  • squeezenet_2018_04_27

  • temperature_bidirectional_63

  • temperature_stacked_63

  • text_gen_81

  • BERT Graph test
    PorV-RNN
    alexnet
    cifar10_gan_85
    val rngSeed = 12345
    val mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed)
    val mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed)
    
    val scaler : DataNormalization = new ImagePreProcessingScaler(0,1);
    scaler.fit(mnistTrain);
    mnistTrain.setPreProcessor(scaler);
    mnistTest.setPreProcessor(scaler);
    
    


    To manage memory allocations, we use two approaches:
    • JVM Garbage Collector (GC) and WeakReference tracking

    • MemoryWorkspaces - see Workspaces guide for details

    Despite the differences between these two approaches, the idea is the same: once an NDArray is no longer required on the Java side, the off-heap memory associated with it should be released so that it can be reused later. The difference between the GC and MemoryWorkspaces approaches is in when and how the memory is released.

    • For JVM/GC memory: whenever an INDArray is collected by the garbage collector, its off-heap memory will be deallocated, assuming it is not used elsewhere.

    • For MemoryWorkspaces: whenever an INDArray leaves the workspace scope - for example, when a layer finishes its forward pass/predictions - its memory may be reused without deallocation and reallocation. This results in better performance for cyclical workloads like neural network training and inference.

    Configuring Memory Limits

    With DL4J/ND4J, there are two types of memory limits to be aware of and configure: The on-heap JVM memory limit, and the off-heap memory limit, where NDArrays live. Both limits are controlled via Java command-line arguments:

    • -Xms - this defines how much memory JVM heap will use at application start.

    • -Xmx - this allows you to specify the JVM heap memory limit (the maximum, at any point). Memory is only allocated up to this amount, at the discretion of the JVM, if required.

    • -Dorg.bytedeco.javacpp.maxbytes - this allows you to specify the off-heap memory limit. This can also be a percentage, in which case it would apply to maxMemory.

    • -Dorg.bytedeco.javacpp.maxphysicalbytes - this specifies the maximum bytes for the entire process - usually set to maxbytes plus Xmx plus a bit extra, in case other libraries require some off-heap memory as well. Unlike maxbytes, setting maxphysicalbytes is optional. This can also be a percentage (>100%), in which case it would apply to maxMemory.

    Example: Configuring 1GB initial on-heap, 2GB max on-heap, 8GB off-heap, 10GB maximum for process:
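Assuming an application packaged as myapp.jar (a placeholder name), those limits translate into the following command line:

```shell
java -Xms1G -Xmx2G \
     -Dorg.bytedeco.javacpp.maxbytes=8G \
     -Dorg.bytedeco.javacpp.maxphysicalbytes=10G \
     -jar myapp.jar
```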

    Gotchas: A few things to watch out for

    • With GPU systems, the maxbytes and maxphysicalbytes settings currently also effectively defines the memory limit for the GPU, since the off-heap memory is mapped (via NDArrays) to the GPU - read more about this in the GPU-section below.

    • For many applications, you want less RAM to be used in JVM heap, and more RAM to be used in off-heap, since all NDArrays are stored there. If you allocate too much to the JVM heap, there will not be enough memory left for the off-heap memory.

    • If you get a "RuntimeException: Can't allocate [HOST] memory: xxx; threadId: yyy", you have run out of off-heap memory. You should most often use a WorkspaceConfiguration to handle your NDArrays allocation, in particular in e.g. training or evaluation/inference loops - if you do not, the NDArrays and their off-heap (and GPU) resources are reclaimed using the JVM GC, which might introduce severe latency and possible out of memory situations.

    • If you don't specify JVM heap limit, it will use 1/4 of your total system RAM as the limit, by default.

    • If you don't specify off-heap memory limit, the JVM heap limit (Xmx) will be used by default. i.e. -Xmx8G will mean that 8GB can be used by JVM heap, and an additional 8GB can be used by ND4j in off-heap.

    • In limited memory environments, it's usually a bad idea to use high -Xmx value together with -Xms option. That is because doing so won't leave enough off-heap memory. Consider a 16GB system in which you set -Xms14G: 14GB of 16GB would be allocated to the JVM, leaving only 2GB for the off-heap memory, the OS and all other programs.

    Memory-mapped files

    ND4J supports the use of a memory-mapped file instead of RAM when using the nd4j-native backend. On one hand, it's slower than RAM, but on the other hand, it allows you to allocate memory chunks in a manner impossible otherwise.

    Here's sample code:
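A sketch based on the ND4J workspace API (class and builder names as used by the nd4j-native backend; adjust the size to your needs):

```java
import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.LocationPolicy;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// A memory-mapped workspace: arrays created inside the scope live in the
// mmap'ed temporary file rather than in RAM.
WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
        .initialSize(1000000000)               // ~1GB backing file
        .policyLocation(LocationPolicy.MMAP)
        .build();

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
    INDArray x = Nd4j.create(10000);           // allocated inside the memory-mapped file
}
```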

    In this case, a 1GB temporary file will be created and mmap'ed, and NDArray x will be created in that space. Obviously, this option is mostly viable for cases when you need NDArrays that can't fit into your RAM.

    GPUs

    When using GPUs, oftentimes your CPU RAM will be greater than GPU RAM. When GPU RAM is less than CPU RAM, you need to monitor how much RAM is being used off-heap. You can check this based on the JavaCPP options specified above.

    We allocate memory on the GPU equivalent to the amount of off-heap memory you specify. We don't use any more of your GPU than that. You are also allowed to specify heap space greater than your GPU (that's not encouraged, but it's possible). If you do so, your GPU will run out of RAM when trying to run jobs.

    We also allocate off-heap memory on the CPU RAM as well. This is for efficient communication between CPU and GPU, and for the CPU accessing data from an NDArray without having to fetch data from the GPU each time you call for it.

    If JavaCPP or your GPU throw an out-of-memory error (OOM), or even if your compute slows down due to GPU memory being limited, then you may want to either decrease batch size or increase the amount of off-heap memory that JavaCPP is allowed to allocate, if that's possible.

    Try to run with an off-heap memory equal to your GPU's RAM. Also, always remember to set up a small JVM heap space using the Xmx option.

    Note that if your GPU has less than 2GB of RAM, it's probably not usable for deep learning. You should consider using your CPU if this is the case. Typical deep learning workloads should have 4GB of RAM at minimum. Even that is small. 8GB of RAM on a GPU is recommended for deep learning workloads.

    It is possible to use HOST-only memory with a CUDA backend. That can be done using workspaces.

    Example:

    It's not recommended to use HOST-only arrays directly, since they will dramatically reduce performance. But they might be useful as an in-memory cache, paired with the INDArray.unsafeDuplication() method.

    As of now, the artifactId for the CUDA versions can be one of nd4j-cuda-9.2, nd4j-cuda-10.0, nd4j-cuda-10.1 or nd4j-cuda-10.2.

    You can also find the available CUDA versions via Maven Central search or in the Release Notes.

    Otherwise you will need to use the native implementation of ND4J as a CPU backend:
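For example (the version shown assumes the 1.0.0-beta7 release this guide documents):

```xml
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
```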

    Building for Multiple Operating Systems

    If you are developing your project on multiple operating systems/system architectures, you can add -platform to the end of your artifactId which will download binaries for most major systems.
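For example, nd4j-native-platform bundles binaries for all major platforms (the version shown assumes the 1.0.0-beta7 release):

```xml
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
```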

    Bundling multiple Backends

    For enabling different backends at runtime, you set the backend priority via environment variables.

    Relative to the priority, ND4J will dynamically select the backend type.
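As a sketch - the variable names below (BACKEND_PRIORITY_CPU and BACKEND_PRIORITY_GPU) are the ones consulted by ND4J's backend loader, and the backend with the higher value is preferred:

```shell
# Prefer the CPU backend when both CPU and GPU backends are bundled
export BACKEND_PRIORITY_CPU=2
export BACKEND_PRIORITY_GPU=1
```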

    CuDNN

    See our page on CuDNN.

    CUDA Installation

    Check the NVIDIA guides for instructions on setting up CUDA on the NVIDIA website.

    Troubleshooting

    Nd4jBackend$NoAvailableBackendException

    There are multiple reasons why you might run into this error message.

    1. You haven't configured an ND4J backend at all.

    2. You have a jar file that doesn't contain a backend for your platform.

    3. You have a jar file that doesn't contain service loader files.

    You haven't configured any ND4J Backend

    Read this page and add a ND4J Backend to your dependencies:

    You have a jar file that doesn't contain a backend for your platform.

    This happens when you use a backend dependency without the -platform suffix. In this case, only the backend for the system that the jar file was built on will be included.

    To solve this issue, use nd4j-native-platform instead of nd4j-native, if you are running on CPU and nd4j-cuda-10.2-platform instead of nd4j-cuda-10.2 when using the GPU backend.

    If the jar file only contains the GPU backend, but your system has no CUDA capable (CC >= 3.5) GPU or CUDA isn't installed on the system, the CPU Backend should be used instead.

    You have a jar file that doesn't contain service loader files.

    ND4J uses the Java ServiceLoader in order to detect which backends are available on the class path. Depending on your uberjar packaging configuration, those files might be stripped away or broken.

    To double check that the required files are included, open your uberjar and make sure it contains /META-INF/services/org.nd4j.linalg.factory.Nd4jBackend. Then open the file, and make sure there are entries for all of your configured backends.

    If your uberjar does not contain that file, or if not all of the configured backends are listed there, you will have to reconfigure your shade plugin. See the ServicesResourceTransformer documentation for how to do that.


    You must have multiple CUDA compatible GPUs, ideally of the same speed

  • You must set up your project to use the CUDA backend, for help see Backends

    Imports

    Data Set

    To obtain the data, we use built-in DataSetIterators for the MNIST dataset with a random seed of 12345. These DataSetIterators can be used to directly feed the data into a neural network.

    Model Configuration

    Next, we set up the neural network configuration using a convolutional configuration and initialize the model.

    Parallel Wrapper

    Next we need to configure the parallel training with the ParallelWrapper class using the MultiLayerNetwork as the input. The ParallelWrapper will take care of load balancing between different GPUs.

    The idea is that the model will be duplicated within the ParallelWrapper. A prespecified number of workers (in this case 2) will then each train its own copy of the model on its share of the data. After a specified number of iterations (in this case 3), all models will be averaged and the workers will receive updated copies of the averaged model. The training process then continues this way until the model is fully trained.

    To train the model, the fit method of the ParallelWrapper is used directly on the DataSetIterator. Because the ParallelWrapper class handles all the training details behind the scenes, it is very simple to parallelize this process using DL4J.
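A minimal sketch of such a configuration (builder option names follow the ParallelWrapper API; model and mnistTrain are assumed to be the network and iterator set up earlier):

```java
import org.deeplearning4j.parallelism.ParallelWrapper;

// Wrap the configured network for data-parallel training across devices.
ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
        .prefetchBuffer(24)         // async data prefetch per worker
        .workers(2)                 // one model copy per device
        .averagingFrequency(3)      // average parameters every 3 iterations
        .build();

wrapper.fit(mnistTrain);            // ParallelWrapper handles the multi-GPU details
```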

    Elementwise scalar operations

    The simplest operations you can perform on a matrix are elementwise scalar operations; for example, adding the scalar 1 to each element of the matrix, or multiplying each element by the scalar 5. Let’s try it.

    This line of code represents this operation:

    and here is the result

    There are two ways to perform any operation in ND4J: destructive and nondestructive; i.e. operations that change the underlying data, or operations that simply work with a copy of the data. Destructive operations have an “i” at the end - addi, subi, muli, divi. The “i” means the operation is performed “in place,” directly on the data rather than a copy, while nd.add() leaves the original untouched.
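The two variants look like this in ND4J (a sketch; nd is assumed to be the 2 x 2 array created earlier):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Copy vs in-place variants of elementwise scalar operations.
INDArray nd = Nd4j.create(new float[]{1, 2, 3, 4}, new int[]{2, 2});

INDArray plusOne = nd.add(1);   // nondestructive: returns a new array, nd unchanged
nd.addi(1);                     // destructive: adds 1 to nd in place
nd.muli(5);                     // destructive: multiplies every element by 5 in place
```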

    Elementwise scalar multiplication looks like this:

    And produces this:

    Subtraction and division follow a similar pattern:

    If you perform all these operations on your initial 2 x 2 matrix, you should end up with this matrix:

    hashtag
    Elementwise vector operations

When performed with simple units like scalars, the operations of arithmetic are unambiguous. But when working with matrices, addition and multiplication can mean several things. With vector-on-matrix operations, you have to know what kind of addition or multiplication you’re performing in each case.

    First, we’ll create a 2 x 2 matrix, a column vector and a row vector.

Notice that the shape of the two vectors is specified with their final parameters. {2,1} means the vector is vertical, with elements populating two rows and one column. A simple {2} means the vector populates a single row spanning two columns – horizontal. Your first matrix will look like this

    Here’s how you add a column vector to a matrix:

    And here’s the best way to visualize what’s happening. The top element of the column vector combines with the top elements of each column in the matrix, and so forth. The sum matrix represents the march of that column vector across the matrix from left to right, adding itself along the way.

    But let’s say you preserved the initial matrix and instead added a row vector.

    Then your equation is best visualized like this:

    In this case, the leftmost element of the row vector combines with the leftmost elements of each row in the matrix, and so forth. The sum matrix represents that row vector falling down the matrix from top to bottom, adding itself at each level.

    So vector addition can lead to different results depending on the orientation of your vector. The same is true for multiplication, subtraction and division and every other vector operation.

In ND4J, row vectors and column vectors look the same when you print them out with System.out.println().

    They will appear like this.

    Don’t be fooled. Getting the parameters right at the beginning is crucial. addRowVector and addColumnVector will not produce different results when using the same initial vector, because they do not change a vector’s orientation as row or column.

    hashtag
    Elementwise matrix operations

    To carry out scalar and vector elementwise operations, we basically pretend we have two matrices of equal shape. Elementwise scalar multiplication can be represented several ways.

    So you see, elementwise operations match the elements of one matrix with their precise counterparts in another matrix. The element in row 1, column 1 of matrix nd will only be added to the element in row one column one of matrix c.

This becomes clearer when we move to elementwise vector operations. We imagine the vector, like the scalar, as populating a matrix of equal dimensions to matrix nd. Below, you can see why row and column vectors lead to different sums.

    Column vector:

    Row vector:

    Now you can see why row vectors and column vectors produce different results. They are simply shorthand for different matrices.

    Given that we’ve already been doing elementwise matrix operations implicitly with scalars and vectors, it’s a short hop to do them with more varied matrices:

    Here’s how you can visualize that command:

Multiplying the initial matrix nd with matrix nd4 works the same way:

    The term of art for this particular matrix manipulation is a Hadamard productarrow-up-right.

    These toy matrices are a useful heuristic to introduce the ND4J interface as well as basic ideas in linear algebra. This framework, however, is built to handle billions of parameters in n dimensions (and beyond…).

    hashtag
    Usage

    To use the updaters, pass a new class to the updater() method in either a ComputationGraph or MultiLayerNetwork.
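For a MultiLayerNetwork, the configuration can be sketched like this (the Adam learning rate of 0.01 and the single output layer are illustrative, not prescriptive):

```java
// Pass a new updater instance to .updater() in the configuration builder
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(0.01))
    .list()
    .layer(0, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
        .activation(Activation.IDENTITY)
        .nIn(10).nOut(1)
        .build())
    .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
```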

    hashtag
    Available updaters

    hashtag
    NadamUpdater

    [source]arrow-up-right

    The Nadam updater. https://arxiv.org/pdf/1609.04747.pdf

    applyUpdater

    Calculate the update based on the given gradient

    • param gradient the gradient to get the update for

    • param iteration

    • return the gradient

    hashtag
    NesterovsUpdater

    [source]arrow-up-right

    Nesterov’s momentum. Keep track of the previous layer’s gradient and use it as a way of updating the gradient.

    applyUpdater

    Get the nesterov update

    • param gradient the gradient to get the update for

    • param iteration

    • return

    hashtag
    RmsPropUpdater

    [source]arrow-up-right

    RMS Prop updates:

    http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf http://cs231n.github.io/neural-networks-3/#ada

    hashtag
    AdaGradUpdater

    [source]arrow-up-right

    Vectorized Learning Rate used per Connection Weight

    Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent See also http://cs231n.github.io/neural-networks-3/#ada

    applyUpdater

Gets feature-specific learning rates. AdaGrad keeps a history of the gradients being passed in; each gradient passed in becomes adapted over time, hence the opName adagrad.

    • param gradient the gradient to get learning rates for

    • param iteration

    hashtag
    AdaMaxUpdater

    [source]arrow-up-right

    The AdaMax updater, a variant of Adam. http://arxiv.org/abs/1412.6980

    applyUpdater

    Calculate the update based on the given gradient

    • param gradient the gradient to get the update for

    • param iteration

    • return the gradient

    hashtag
    NoOpUpdater

    [source]arrow-up-right

    NoOp updater: gradient updater that makes no changes to the gradient

    hashtag
    AdamUpdater

    [source]arrow-up-right

    The Adam updater. http://arxiv.org/abs/1412.6980

    applyUpdater

    Calculate the update based on the given gradient

    • param gradient the gradient to get the update for

    • param iteration

    • return the gradient

    hashtag
    AdaDeltaUpdater

    [source]arrow-up-right

    http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf https://arxiv.org/pdf/1212.5701v1.pdf

AdaDelta updater. A more robust AdaGrad that keeps track of a moving window average of the gradient rather than the ever-decaying learning rates of AdaGrad.

    applyUpdater

    Get the updated gradient for the given gradient and also update the state of ada delta.

    • param gradient the gradient to get the updated gradient for

    • param iteration

• return the updated gradient

    hashtag
    SgdUpdater

    [source]arrow-up-right

    SGD updater applies a learning rate only

    hashtag
    GradientUpdater

    [source]arrow-up-right

    Gradient modifications: Calculates an update and tracks related information for gradient changes over time for handling updates.

    hashtag
    AMSGradUpdater

    [source]arrow-up-right

    The AMSGrad updater Reference: On the Convergence of Adam and Beyond - https://openreview.net/forum?id=ryQu7f-RZ

hashtag
and

Bitwise AND operation. Supports broadcasting.

• x (INT) - First input array

• y (INT) - Second input array

    hashtag
    bitRotl

    Roll integer bits to the left, i.e. var << 4 | var >> (32 - 4)

    • x (INT) - Input 1

    • shift (INT) - Number of bits to shift.

    hashtag
    bitRotr

    Roll integer bits to the right, i.e. var >> 4 | var << (32 - 4)

    • x (INT) - Input 1

    • shift (INT) - Number of bits to shift.
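The two rotation ops correspond to the plain-Java bit tricks shown in the formulas above. A self-contained sketch (using `>>>`, Java's logical right shift, where the formulas write `>>`):

```java
public class BitRotDemo {
    // Rotate left: var << shift | var >>> (32 - shift)
    static int rotl(int var, int shift) {
        return (var << shift) | (var >>> (32 - shift));
    }

    // Rotate right: var >>> shift | var << (32 - shift)
    static int rotr(int var, int shift) {
        return (var >>> shift) | (var << (32 - shift));
    }

    public static void main(String[] args) {
        // Both agree with the JDK's built-in rotations
        System.out.println(rotl(0x70, 2) == Integer.rotateLeft(0x70, 2));
        System.out.println(rotr(0x70, 2) == Integer.rotateRight(0x70, 2));
    }
}
```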

    hashtag
    bitShift

    Shift integer bits to the left, i.e. var << 4

    • x (INT) - Input 1

    • shift (INT) - Number of bits to shift.

    hashtag
    bitShiftRight

    Shift integer bits to the right, i.e. var >> 4

    • x (INT) - Input 1

    • shift (INT) - Number of bits to shift.

    hashtag
    bitsHammingDistance

Bitwise Hamming distance reduction over all elements of both input arrays. For example, if x=01100000 and y=10100000 then the bitwise Hamming distance is 2 (due to differences at positions 0 and 1)

    • x (INT) - First input array.

    • y (INT) - Second input array.
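For a single pair of values, the reduction is equivalent to counting the set bits in the XOR of the inputs, as in this plain-Java sketch:

```java
public class HammingDemo {
    // Bitwise Hamming distance between two ints:
    // the number of bit positions at which x and y differ
    static int bitsHamming(int x, int y) {
        return Integer.bitCount(x ^ y);
    }

    public static void main(String[] args) {
        // x = 01100000 and y = 10100000 differ at the two leftmost positions
        System.out.println(bitsHamming(0b01100000, 0b10100000)); // prints 2
    }
}
```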

    hashtag
    leftShift

    Bitwise left shift operation. Supports broadcasting.

    • x (INT) - Input to be bit shifted

    • y (INT) - Amount to shift elements of x array

    hashtag
    leftShiftCyclic

    Bitwise left cyclical shift operation. Supports broadcasting.

Unlike leftShift(INDArray, INDArray), the bits will "wrap around":

    leftShiftCyclic(01110000, 2) -> 11000001

    • x (INT) - Input to be bit shifted

    • y (INT) - Amount to shift elements of x array

    hashtag
    or

    Bitwise OR operation. Supports broadcasting.

    • x (INT) - First input array

• y (INT) - Second input array

    hashtag
    rightShift

    Bitwise right shift operation. Supports broadcasting.

    • x (INT) - Input to be bit shifted

    • y (INT) - Amount to shift elements of x array

    hashtag
    rightShiftCyclic

    Bitwise right cyclical shift operation. Supports broadcasting.

Unlike rightShift(INDArray, INDArray), the bits will "wrap around":

    rightShiftCyclic(00001110, 2) -> 10000011

    • x (INT) - Input to be bit shifted

    • y (INT) - Amount to shift elements of x array

    hashtag
    xor

    Bitwise XOR operation (exclusive OR). Supports broadcasting.

    • x (INT) - First input array

• y (INT) - Second input array

    public Builder corruptionLevel(double corruptionLevel) 
    public Builder sparsity(double sparsity) 
    public Builder encoderLayerSizes(int... encoderLayerSizes) 
    public void setEncoderLayerSizes(int... encoderLayerSizes) 
    public Builder decoderLayerSizes(int... decoderLayerSizes) 
    public void setDecoderLayerSizes(int... decoderLayerSizes) 
    public Builder reconstructionDistribution(ReconstructionDistribution distribution) 
    public Builder lossFunction(IActivation outputActivationFn, LossFunctions.LossFunction lossFunction) 
    public Builder lossFunction(Activation outputActivationFn, LossFunctions.LossFunction lossFunction) 
    public Builder lossFunction(IActivation outputActivationFn, ILossFunction lossFunction) 
    public Builder pzxActivationFn(IActivation activationFunction) 
    public Builder pzxActivationFunction(Activation activation) 
    public Builder nOut(int nOut) 
    public Builder numSamples(int numSamples) 
    ZooModel zooModel = VGG16.builder().build();
    ComputationGraph pretrainedNet = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
    FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Nesterovs(5e-5))
                .seed(seed)
                .build();
    ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
        .fineTuneConfiguration(fineTuneConf)
                  .setFeatureExtractor("fc2")
                  .removeVertexKeepConnections("predictions") 
                  .addLayer("predictions", 
            new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                            .nIn(4096).nOut(numClasses)
                            .weightInit(WeightInit.XAVIER)
                            .activation(Activation.SOFTMAX).build(), "fc2")
                  .build();
    ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
                  .fineTuneConfiguration(fineTuneConf)
                  .setFeatureExtractor("block5_pool")
                  .nOutReplace("fc2",1024, WeightInit.XAVIER)
                  .removeVertexAndConnections("predictions") 
                  .addLayer("fc3",new DenseLayer.Builder()
                  .activation(Activation.RELU)
                  .nIn(1024).nOut(256).build(),"fc2") 
                  .addLayer("newpredictions",new OutputLayer
                  .Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                                    .activation(Activation.SOFTMAX)
                                    .nIn(256).nOut(numClasses).build(),"fc3") 
                  .setOutputs("newpredictions") 
                  .build();
    ComputationGraph vgg16FineTune = new TransferLearning.GraphBuilder(vgg16Transfer)
                  .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("block4_pool")
                  .build();
    TransferLearningHelper transferLearningHelper = 
        new TransferLearningHelper(pretrainedNet, "fc2");
    while(trainIter.hasNext()) {
            DataSet currentFeaturized = transferLearningHelper.featurize(trainIter.next());
            saveToDisk(currentFeaturized,trainDataSaved,true);
      trainDataSaved++;
    }
    TransferLearningHelper transferLearningHelper = 
        new TransferLearningHelper(vgg16Transfer);
    while (trainIter.hasNext()) {
           transferLearningHelper.fitFeaturized(trainIter.next());
    }
    DataSetIterator myTestData = ...
    Evaluation eval = model.evaluate(myTestData);
    Evaluation eval = new Evaluation(3);
    INDArray output = model.output(testData.getFeatures());
    eval.eval(testData.getLabels(), output);
    log.info(eval.stats());
    Examples labeled as 0 classified by model as 0: 24 times
    Examples labeled as 1 classified by model as 1: 11 times
    Examples labeled as 1 classified by model as 2: 1 times
    Examples labeled as 2 classified by model as 2: 17 times
    
    
    ==========================Scores========================================
     # of classes:    3
     Accuracy:        0.9811
     Precision:       0.9815
     Recall:          0.9722
     F1 Score:        0.9760
    Precision, recall & F1: macro-averaged (equally weighted avg. of 3 classes)
    ========================================================================
    System.out.println(eval.confusionToString());
    Predicted:         0      1      2
    Actual:
    0  0          |      16      0      0
    1  1          |       0     19      0
    2  2          |       0      0     18
    eval.getConfusionMatrix() ;
    eval.getConfusionMatrix().toHTML();
    eval.getConfusionMatrix().toCSV();
    DataSetIterator myTestData = ...
    RegressionEvaluation eval = model.evaluateRegression(myTestData);
    RegressionEvaluation eval =  new RegressionEvaluation(1);
    System.out.println(eval.stats());
    Column    MSE            MAE            RMSE           RSE            R^2            
    col_0     7.98925e+00    2.00648e+00    2.82653e+00    5.01481e-01    7.25783e-01
    DataSetIterator testData = ...
    Evaluation eval = new Evaluation();
    ROC roc = new ROC();
model.doEvaluation(testData, eval, roc);
EvaluationBinary eval = new EvaluationBinary(size);
    Evaluation eval = SparkDl4jMultiLayer.evaluate(JavaRDD<DataSet>);
    
    //Multiple evaluations in one pass:
    SparkDl4jMultiLayer.doEvaluation(JavaRDD<DataSet>, IEvaluation...);
     sd.summary();
    List<String> inputs = sd.inputs();
    INDArray out = sd.batchOutput()
        .input(inputName, inputArray)
        .output(outputs)
        .execSingle();
    val nChannels = 1; // Number of input channels
    val outputNum = 10; // The number of possible outcomes
    val batchSize = 64; // Test batch size
    val iterations = 1; // Number of training iterations
    val seed = 123; // Random seed
    
    val conf : MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
        .seed(12345)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .list()
        .layer(0, new ConvolutionLayer.Builder(5, 5)
            .nIn(1)
            .stride(1, 1)
            .nOut(20)
            .activation(Activation.IDENTITY)
            .build())
        .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
            .kernelSize(2,2)
            .stride(2,2)
            .build())
         .layer(2, new ConvolutionLayer.Builder(5, 5)
            .stride(1, 1)
            .nOut(50)
            .activation(Activation.IDENTITY)
            .build())
        .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
            .kernelSize(2,2)
            .stride(2,2)
            .build())
        .layer(4, new DenseLayer.Builder().activation(Activation.RELU)
            .nIn(800)
            .nOut(500).build())
        .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(500)
            .nOut(outputNum)
            .activation(Activation.SOFTMAX)
            .build())
        .setInputType(InputType.convolutionalFlat(28,28,1)) 
        .build()
    	
    val model = new MultiLayerNetwork(conf)
    val nEpochs = 5
    model.fit(mnistTrain, nEpochs)
    val eval = model.evaluate[Evaluation](mnistTest)
    println(eval.stats())
    INDArray binomial(int nTrials, double p, DataType datatype, long[] shape)
    
    SDVariable binomial(int nTrials, double p, DataType datatype, long[] shape)
    SDVariable binomial(String name, int nTrials, double p, DataType datatype, long[] shape)
    INDArray exponential(double lambda, DataType datatype, long[] shape)
    
    SDVariable exponential(double lambda, DataType datatype, long[] shape)
    SDVariable exponential(String name, double lambda, DataType datatype, long[] shape)
    INDArray logNormal(double mean, double stddev, DataType datatype, long[] shape)
    
    SDVariable logNormal(double mean, double stddev, DataType datatype, long[] shape)
    SDVariable logNormal(String name, double mean, double stddev, DataType datatype, long[] shape)
    INDArray normal(double mean, double stddev, DataType datatype, long[] shape)
    
    SDVariable normal(double mean, double stddev, DataType datatype, long[] shape)
    SDVariable normal(String name, double mean, double stddev, DataType datatype, long[] shape)
    INDArray normalTruncated(double mean, double stddev, DataType datatype, long[] shape)
    
    SDVariable normalTruncated(double mean, double stddev, DataType datatype, long[] shape)
    SDVariable normalTruncated(String name, double mean, double stddev, DataType datatype, long[] shape)
    INDArray uniform(double min, double max, DataType datatype, long[] shape)
    
    SDVariable uniform(double min, double max, DataType datatype, long[] shape)
    SDVariable uniform(String name, double min, double max, DataType datatype, long[] shape)
    -Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G
    WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
                    .initialSize(1000000000)
                    .policyLocation(LocationPolicy.MMAP)
                    .build();
    
    try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
        INDArray x = Nd4j.create(10000);
    }
    WorkspaceConfiguration basicConfig = WorkspaceConfiguration.builder()
        .policyAllocation(AllocationPolicy.STRICT)
        .policyLearning(LearningPolicy.FIRST_LOOP)
        .policyMirroring(MirroringPolicy.HOST_ONLY) // <--- this option does this trick
        .policySpill(SpillPolicy.EXTERNAL)
        .build();
    <dependency>
     <groupId>org.nd4j</groupId>
     <artifactId>nd4j-cuda-10.2</artifactId>
     <version>1.0.0-beta7</version>
    </dependency>
    <dependency>
     <groupId>org.nd4j</groupId>
     <artifactId>nd4j-native</artifactId>
     <version>1.0.0-beta7</version>
    </dependency>
    <dependency>
     ...
     <artifactId>nd4j-native-platform</artifactId>
     ...
    </dependency>
    BACKEND_PRIORITY_CPU=SOME_NUM
    BACKEND_PRIORITY_GPU=SOME_NUM
     org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
    	at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:221)
    	at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5091)
    	... 2 more
    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
    import org.deeplearning4j.eval.Evaluation;
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.Updater;
    import org.deeplearning4j.nn.conf.inputs.InputType;
    import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.api.buffer.DataBuffer;
    import org.nd4j.linalg.api.buffer.util.DataTypeUtil;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
    import org.nd4j.linalg.lossfunctions.LossFunctions;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.deeplearning4j.parallelism.ParallelWrapper;
    val batchSize = 128
    val mnistTrain = new MnistDataSetIterator(batchSize,true,12345)
    val mnistTest = new MnistDataSetIterator(batchSize,false,12345)
    val nChannels = 1
    val outputNum = 10
    val seed = 123
    
    val conf = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .weightInit(WeightInit.XAVIER)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(Updater.NESTEROVS)
                .list()
                .layer(0, new ConvolutionLayer.Builder(5, 5)
                    //nIn and nOut specify depth. nIn here is the nChannels and nOut is the number of filters to be applied
                    .nIn(nChannels)
                    .stride(1, 1)
                    .nOut(20)
                    .activation(Activation.IDENTITY)
                    .build())
                .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                    .kernelSize(2,2)
                    .stride(2,2)
                    .build())
                .layer(2, new ConvolutionLayer.Builder(5, 5)
                    //Note that nIn need not be specified in later layers
                    .stride(1, 1)
                    .nOut(50)
                    .activation(Activation.IDENTITY)
                    .build())
                .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                    .kernelSize(2,2)
                    .stride(2,2)
                    .build())
                .layer(4, new DenseLayer.Builder().activation(Activation.RELU)
                    .nOut(500).build())
                .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nOut(outputNum)
                    .activation(Activation.SOFTMAX)
                    .build())
                .setInputType(InputType.convolutionalFlat(28,28,1)) //See note below
                .build()
    
    val model = new MultiLayerNetwork(conf)
    model.init()
    val wrapper = new ParallelWrapper.Builder(model)
                .prefetchBuffer(24)
                .workers(2)
                .averagingFrequency(3)
                .reportScoreAfterAveraging(true)
                .build()
    wrapper.fit(mnistTrain)
    INDArray nd = Nd4j.create(new float[]{1,2,3,4},new int[]{2,2});
    System.out.println(nd);
    [[1.0 ,3.0]
    [2.0 ,4.0]
    ]
    [[1.0 ,2.0]
    [3.0 ,4.0]
    ]
    nd.add(1);
    [[1.0 + 1 ,3.0 + 1]
    [2.0 + 1,4.0 + 1]
    ]
    [[2.0 ,4.0]
    [3.0 ,5.0]
    ]
    nd.mul(5);
    [[10.0 ,20.0]
    [15.0 ,25.0]
    ]
    nd.subi(3);
    nd.divi(2);
    [[3.5 ,8.5]
    [6.0 ,11.0]
    ]
    INDArray nd = Nd4j.create(new float[]{1,2,3,4},new int[]{2,2});
    INDArray nd2 = Nd4j.create(new float[]{5,6},new int[]{2,1}); //vector as column
    INDArray nd3 = Nd4j.create(new float[]{5,6},new int[]{2}); //vector as row
    [[1.00, 2.00],
     [3.00, 4.00]]
        nd.addColumnVector(nd2);
    [1.0 ,2.0]     [5.0]    [6.0 ,7.0]
    [3.0 ,4.0]  +  [6.0] =  [9.0 ,10.0]
    nd.addRowVector(nd3);
    [1.0 ,2.0]                   [6.0 ,8.0]
    [3.0 ,4.0]  +  [5.0 ,6.0] =  [8.0 ,10.0]
    System.out.println(nd);
    [5.0 ,6.0]
        [1.0 ,3.0]   [c , c]   [1.0 ,3.0]   [1c ,3c]
    c * [2.0 ,4.0] = [c , c] * [2.0 ,4.0] = [2c ,4c]
    [1.0 ,3.0]     [5.0]   [1.0 ,3.0]   [5.0 ,5.0]   [6.0 ,8.0]
    [2.0 ,4.0]  +  [6.0] = [2.0 ,4.0] + [6.0 ,6.0] = [8.0 ,10.0]
    [1.0 ,3.0]                   [1.0 ,3.0]    [5.0 ,6.0]   [6.0 ,9.0]    
    [2.0 ,4.0]  +  [5.0 ,6.0] =  [2.0 ,4.0] +  [5.0 ,6.0] = [7.0 ,10.0]
    INDArray nd4 = Nd4j.create(new float[]{5,6,7,8},new int[]{2,2});
    
    nd.add(nd4);
    [1.0 ,3.0]   [5.0 ,7.0]   [6.0 ,10.0]
    [2.0 ,4.0] + [6.0 ,8.0] = [8.0 ,12.0]
    nd.muli(nd4);
    
    [1.0 ,3.0]   [5.0 ,7.0]   [5.0 ,21.0]
    [2.0 ,4.0] * [6.0 ,8.0] = [12.0 ,32.0]
    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Adam(0.01))
        // add your layers and hyperparameters below
        .build();
    public void applyUpdater(INDArray gradient, int iteration, int epoch) 
    public void applyUpdater(INDArray gradient, int iteration, int epoch) 
    public void applyUpdater(INDArray gradient, int iteration, int epoch) 
    public void applyUpdater(INDArray gradient, int iteration, int epoch) 
    public void applyUpdater(INDArray gradient, int iteration, int epoch) 
    public void applyUpdater(INDArray gradient, int iteration, int epoch) 
    INDArray and(INDArray x, INDArray y)
    
    SDVariable and(SDVariable x, SDVariable y)
    SDVariable and(String name, SDVariable x, SDVariable y)
    INDArray bitRotl(INDArray x, INDArray shift)
    
    SDVariable bitRotl(SDVariable x, SDVariable shift)
    SDVariable bitRotl(String name, SDVariable x, SDVariable shift)
    INDArray bitRotr(INDArray x, INDArray shift)
    
    SDVariable bitRotr(SDVariable x, SDVariable shift)
    SDVariable bitRotr(String name, SDVariable x, SDVariable shift)
    INDArray bitShift(INDArray x, INDArray shift)
    
    SDVariable bitShift(SDVariable x, SDVariable shift)
    SDVariable bitShift(String name, SDVariable x, SDVariable shift)
    INDArray bitShiftRight(INDArray x, INDArray shift)
    
    SDVariable bitShiftRight(SDVariable x, SDVariable shift)
    SDVariable bitShiftRight(String name, SDVariable x, SDVariable shift)
    INDArray bitsHammingDistance(INDArray x, INDArray y)
    
    SDVariable bitsHammingDistance(SDVariable x, SDVariable y)
    SDVariable bitsHammingDistance(String name, SDVariable x, SDVariable y)
    INDArray leftShift(INDArray x, INDArray y)
    
    SDVariable leftShift(SDVariable x, SDVariable y)
    SDVariable leftShift(String name, SDVariable x, SDVariable y)
    INDArray leftShiftCyclic(INDArray x, INDArray y)
    
    SDVariable leftShiftCyclic(SDVariable x, SDVariable y)
    SDVariable leftShiftCyclic(String name, SDVariable x, SDVariable y)
    INDArray or(INDArray x, INDArray y)
    
    SDVariable or(SDVariable x, SDVariable y)
    SDVariable or(String name, SDVariable x, SDVariable y)
    INDArray rightShift(INDArray x, INDArray y)
    
    SDVariable rightShift(SDVariable x, SDVariable y)
    SDVariable rightShift(String name, SDVariable x, SDVariable y)
    INDArray rightShiftCyclic(INDArray x, INDArray y)
    
    SDVariable rightShiftCyclic(SDVariable x, SDVariable y)
    SDVariable rightShiftCyclic(String name, SDVariable x, SDVariable y)
    INDArray xor(INDArray x, INDArray y)
    
    SDVariable xor(SDVariable x, SDVariable y)
    SDVariable xor(String name, SDVariable x, SDVariable y)
    if the network outperforms the previous best model: save a copy of the network at the current epoch
  • Take as our final model the model that has the best test set performance

  • Epoch termination conditions: evaluated every N epochs

  • Iteration termination conditions: evaluated once per minibatch

  • A model saver, that defines how models are saved
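Putting those pieces together, an early stopping setup might look like the following sketch (class names are from DL4J's earlystopping package; `testIter`, `trainIter`, `networkConf`, and `directory` are assumed to already exist):

```java
EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
    new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
        // Epoch termination condition: stop after at most 30 epochs
        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
        // Iteration termination condition: stop after at most 20 minutes
        .iterationTerminationConditions(
            new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
        // Score to track: loss on the test set, evaluated every epoch
        .scoreCalculator(new DataSetLossCalculator(testIter, true))
        .evaluateEveryNEpochs(1)
        // Model saver: keep the best model so far on disk
        .modelSaver(new LocalFileModelSaver(directory))
        .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, networkConf, trainIter);
EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
```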

  • JavaDocarrow-up-right
    Source Codearrow-up-right
    JavaDocarrow-up-right
    Source Codearrow-up-right
    hashtag
    Contents
    • Data Normalization

    • Weight Initialization

    • Epochs and Iterations

    • Learning Rate

    hashtag
    Data Normalization

What's the distribution of your data? Are you scaling it properly? As a general rule:

• For continuous values: you want these to be in the range of -1 to 1, 0 to 1, or distributed normally with mean 0 and standard deviation 1. This does not have to be exact, but ensuring your inputs are approximately in this range can help during training. Scale down large inputs, and scale up small inputs.

• For discrete classes (and, for classification problems, for the output): generally use a one-hot representation. That is, if you have 3 classes, then your data will be represented as [1,0,0], [0,1,0] or [0,0,1] for each of the 3 classes respectively.

    Note that it's very important to use the exact same normalization method for both the training data and testing data.
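With DL4J's built-in normalizers this looks roughly like the sketch below: fit the normalizer on the training data only, then attach the same instance to both iterators (`trainIter` and `testIter` are assumed to exist):

```java
// Standardize inputs to mean 0, standard deviation 1
NormalizerStandardize normalizer = new NormalizerStandardize();
normalizer.fit(trainIter);   // collect statistics from the training set only
trainIter.reset();

// Apply the SAME normalizer to both training and test data
trainIter.setPreProcessor(normalizer);
testIter.setPreProcessor(normalizer);
```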

    hashtag
    Weight Initialization

    Deeplearning4j supports several different kinds of weight initializations with the weightInit parameter. These are set using the .weightInit(WeightInit) method in your configuration.

    You need to make sure your weights are neither too big nor too small. Xavier weight initialization is usually a good choice for this. For networks with rectified linear (relu) or leaky relu activations, RELU weight initialization is a sensible choice.
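Setting this in a configuration is a one-liner in the builder; a minimal sketch (the layer sizes are illustrative):

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.XAVIER)   // use WeightInit.RELU for relu/leaky relu nets
    .list()
    .layer(0, new DenseLayer.Builder()
        .nIn(784).nOut(100)
        .activation(Activation.RELU)
        .build())
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(100).nOut(10)
        .activation(Activation.SOFTMAX)
        .build())
    .build();
```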

    hashtag
    Number of Epochs and Number of Iterations

    An epoch is defined as a full pass of the data set.

    Too few epochs don't give your network enough time to learn good parameters; too many and you might overfit the training data. One way to choose the number of epochs is to use early stopping. Early stopping can also help to prevent the neural network from overfitting (i.e., can help the net generalize better to unseen data).

    hashtag
    Learning Rate

The learning rate is one of, if not the, most important hyperparameter. If it is too large or too small, your network may learn very poorly, very slowly, or not at all. Typical values for the learning rate are in the range of 0.1 to 1e-6, though the optimal learning rate is usually data (and network architecture) specific. Some simple advice is to start by trying three different learning rates – 1e-1, 1e-3, and 1e-6 – to get a rough idea of what it should be, before further tuning. Ideally, run models with the different learning rates simultaneously to save time.

    The usual approach to selecting an appropriate learning rate is to use DL4J's visualization interface to visualize the progress of training. You want to pay attention to both the loss over time, and the ratio of update magnitudes to parameter magnitudes (a ratio of approximately 1:1000 is a good place to start). For more information on tuning the learning rate, see this linkarrow-up-right.

    For training neural networks in a distributed manner, you may need a different (frequently higher) learning rate compared to training the same network on a single machine.

    hashtag
    Policies and Scheduling

    You can optionally define a learning rate policy for your neural network. A policy will change the learning rate over time, achieving better results since the learning rate can "slow down" to find closer local minima for convergence. A common policy used is scheduling. See the LeNet examplearrow-up-right for a learning rate schedule used in practice.
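One way to express such a schedule in DL4J is with a MapSchedule attached to the updater; a sketch (the iteration cut-points and rates are illustrative):

```java
// Learning rate by iteration: 0.01 initially, decayed twice
Map<Integer, Double> lrSchedule = new HashMap<>();
lrSchedule.put(0, 0.01);      // iterations 0..999
lrSchedule.put(1000, 0.005);  // iterations 1000..2999
lrSchedule.put(3000, 0.001);  // iteration 3000 onward

// Pass this via .updater(updater) in the NeuralNetConfiguration.Builder
IUpdater updater = new Nesterovs(new MapSchedule(ScheduleType.ITERATION, lrSchedule), 0.9);
```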

    Note that if you're using multiple GPUs, this will affect your scheduling. For example, if you have 2x GPUs, then you will need to divide the iterations in your schedule by 2, since the throughput of your training process will be double, and the learning rate schedule is only applicable to the local GPU.

    hashtag
    Activation Function

    There are two aspects to be aware of, with regard to the choice of activation function.

    First, the activation function of the hidden (non-output) layers. As a general rule, 'relu' or 'leakyrelu' activations are good choices for this. Some other activation functions (tanh, sigmoid, etc) are more prone to vanishing gradient problems, which can make learning much harder in deep neural networks. However, for LSTM layers, the tanh activation function is still commonly used.

    Second, regarding the activation function for the output layer: this is usually application specific. For classification problems, you generally want to use the softmax activation function, combined with the negative log likelihood / MCXENT (multi-class cross entropy). The softmax activation function gives you a probability distribution over classes (i.e., outputs sum to 1.0). For regression problems, the "identity" activation function is frequently a good choice, in conjunction with the MSE (mean squared error) loss function.
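    For instance, a classification output layer pairing softmax with MCXENT might be configured as follows (the layer sizes are hypothetical):

    ```java
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;

    // Softmax + multi-class cross entropy for classification.
    // For regression, you would instead use Activation.IDENTITY with LossFunction.MSE.
    OutputLayer out = new OutputLayer.Builder(LossFunction.MCXENT)
            .activation(Activation.SOFTMAX)
            .nIn(100).nOut(10)   // hypothetical sizes
            .build();
    ```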

    hashtag
    Loss Function

    Loss functions for each neural network layer can either be used in pretraining, to learn better weights, or in classification (on the output layer), to achieve some result.

    Your net's purpose will determine the loss function you use. For pretraining, choose reconstruction entropy. For classification, use multiclass cross entropy.

    hashtag
    Regularization

    Regularization methods can help to avoid overfitting during training. Overfitting occurs when the network predicts the training set very well, but makes poor predictions on data the network has never seen. One way to think about overfitting is that the network memorizes the training data (instead of learning the general relationships in it).

    Common types of regularization include:

    • l1 and l2 regularization penalizes large network weights, and avoids weights becoming too large. Some level of l2 regularization is commonly used in practice. However, note that if the l1 or l2 regularization coefficients are too high, they may over-penalize the network, and stop it from learning. Common values for l2 regularization are 1e-3 to 1e-6.

    • Dropoutarrow-up-right is a frequently used regularization method that can be very effective. Dropout is most commonly used with a dropout rate of 0.5.

    • Dropconnect (conceptually similar to dropout, but used much less frequently)

    • Restricting the total network size (i.e., limiting the number of layers and the size of each layer)

    To use l1/l2/dropout regularization, use .regularization(true) followed by .l1(x), .l2(y), .dropout(z) respectively. Note that z in dropout(z) is the probability of retaining an activation.
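    As a sketch of those options on a configuration builder (note: in recent DL4J versions the .regularization(true) call is no longer required, and the builder method is spelled .dropOut; the coefficients here are illustrative):

    ```java
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

    // Illustrative regularization settings.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .l2(1e-4)        // l2 weight penalty coefficient
            .dropOut(0.5);   // probability of retaining an activation
    ```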

    hashtag
    Minibatch Size

    A minibatch refers to the number of examples used at a time, when computing gradients and parameter updates. In practice (for all but the smallest data sets), it is standard to break your data set up into a number of minibatches.

    The ideal minibatch size will vary. For example, a minibatch size of 10 is frequently too small for GPUs, but can work on CPUs. A minibatch size of 1 will allow a network to train, but will not reap the benefits of parallelism. 32 may be a sensible starting point to try, with minibatches in the range of 16-128 (sometimes smaller or larger, depending on the application and type of network) being common.

    hashtag
    Updater and Optimization Algorithm

    In DL4J, the term 'updater' refers to training mechanisms such as momentum, RMSProp, adagrad, and others. Using one of these methods can result in much faster network training compared to 'vanilla' stochastic gradient descent. You can set the updater using the .updater(Updater) configuration option.

    The optimization algorithm is how updates are made, given the gradient. The simplest (and most commonly used) method is stochastic gradient descent (SGD), however DL4J also provides SGD with line search, conjugate gradient and LBFGS optimization algorithms. These latter algorithms are more powerful compared to SGD, but considerably more costly per parameter update due to a line search component, and aren't used as much in practice. Note that you can in principle combine any updater with any optimization algorithm.

    A good default choice in most cases is to use the stochastic gradient descent optimization algorithm combined with one of the momentum/rmsprop/adagrad updaters, with momentum frequently being used in practice. Note that for momentum, the updater is called NESTEROVS (a reference to the Nesterovs variant of momentum), and the momentum rate can be set by the .momentum(double) option.
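    A sketch of that default choice, SGD with Nesterov momentum (in recent DL4J versions the updater is configured as an object rather than via the Updater enum; learning rate and momentum values are illustrative):

    ```java
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.nd4j.linalg.learning.config.Nesterovs;

    // SGD optimization algorithm combined with the Nesterov momentum updater.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Nesterovs(0.01, 0.9));  // learning rate, momentum (illustrative)
    ```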

    hashtag
    Gradient Normalization

    When training a neural network, it can sometimes be helpful to apply gradient normalization, to avoid the gradients being too large (the so-called exploding gradient problem, common in recurrent neural networks) or too small. This can be applied using the .gradientNormalization(GradientNormalization) and .gradientNormalizationThreshold(double) methods. For an example of gradient normalization, see GradientNormalization.javaarrow-up-right. The test code for that example is herearrow-up-right.
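    For example, element-wise gradient clipping might be configured as follows (the threshold of 1.0 is illustrative):

    ```java
    import org.deeplearning4j.nn.conf.GradientNormalization;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

    // Clip each gradient element to an absolute value of at most 1.0.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(1.0);
    ```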

    hashtag
    Recurrent Neural Networks: Truncated Backpropagation through Time

    When training recurrent networks with long time series, it is generally advisable to use truncated backpropagation through time. With 'standard' backpropagation through time (the default in DL4J) the cost per parameter update can become prohibitive. For more details, see this page.
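    A configuration sketch enabling truncated BPTT (the segment length of 100 time steps is illustrative, and the layer definitions are elided):

    ```java
    import org.deeplearning4j.nn.conf.BackpropType;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

    // Truncated BPTT: backpropagate through segments of 100 time steps at a time.
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .list()
            // ... recurrent and output layers here ...
            .backpropType(BackpropType.TruncatedBPTT)
            .tBPTTForwardLength(100)
            .tBPTTBackwardLength(100)
            .build();
    ```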

    hashtag
    Visible/Hidden Unit

    When using a deep-belief network, pay close attention here. An RBM (the component of the DBN used for feature extraction) is stochastic and will sample from different probability distributions relative to the visible or hidden units specified.

    See Geoff Hinton's definitive work, A Practical Guide to Training Restricted Boltzmann Machinesarrow-up-right, for a list of all of the different probability distributions.

    hashtag
    NaN, Not a Number Errors

    Q. Why is my Neural Network throwing nan values?

    A. Backpropagation involves the multiplication of very small gradients. Due to limited precision when representing real numbers, values very close to zero cannot be represented; the term for this issue is arithmetic underflow. If your neural network is throwing NaNs, the solution is to retune your network to avoid the very small gradients. This is more likely to be an issue with deeper neural networks.

    You can try using the double data type, but it's usually recommended to retune the net first.

    Following the basic tuning tips and monitoring the results is the way to ensure NaNs don't show up anymore.

    Early Stopping

    When training neural networks, it is important to avoid overfitting the training data. Overfitting occurs when the neural network learns the noise in the training data and thus does not generalize well to data it has not been trained on. One hyperparameter that affects whether the neural network will overfit or not is the number of epochs or complete passes through the training split. If we use too many epochs, then the neural network is likely to overfit. On the other hand, if we use too few epochs, the neural network might not have the chance to learn fully from the training data.

    Early stopping is one mechanism used to set the number of epochs automatically, preventing both underfitting and overfitting. The idea behind early stopping is intuitive. First the data is split into training and testing sets. At the end of each epoch, the neural network is evaluated on the test set. If the neural network outperforms the previous best model, we save the current model. The best overall model is then taken as the final model.

    In this tutorial we will show how to use early stopping with deeplearning4j (DL4J). We will apply the method on a feed forward neural network using the MNIST dataset, which is a dataset consisting of handwritten digits.

    hashtag
    Imports

    hashtag
    Loading the data

    Now that we have imported everything needed to run this tutorial, we can start by setting the parameters for the neural network and initializing the data. We will set the maximum number of epochs to run early stopping on to be 15.

    hashtag
    Network Configuration

    Next we will set the neural network configuration using the MultiLayerNetwork class of DL4J and initialize the MultiLayerNetwork.

    hashtag
    Early Stopping

    If we weren’t using early stopping, we would proceed by training the neural network using for loops and the fit method of the MultiLayerNetwork. But since we are using early stopping we need to configure how early stopping will be applied. Looking at the code below, we will use a maximum epoch number of 10 and a maximum training time of 5 minutes. The evaluation will be done on mnistTest after each epoch. Each model will be saved in the DL4JEarlyStoppingExample directory that we specified.

    Once the EarlyStoppingConfiguration is specified, we only need to initialize an EarlyStoppingTrainer using the training data and the two previous configurations. The results are obtained just by calling the fit method of EarlyStoppingTrainer.

    We can then print out the details of the best model.

    Built-in Data Iterators

    Toy datasets are essential for testing hypotheses and getting started with any neural network training process. Deeplearning4j comes with built-in dataset iterators for common datasets, including but not limited to:

    • MNIST

    • Iris

    • TinyImageNet (subset of ImageNet)

    • CIFAR-10

    • Labelled Faces in the Wild

    • Curve Fragment Ground-Truth Dataset

    These datasets are also used as a baseline for testing other machine learning algorithms. Please remember to use these datasets correctly within the terms of their license (for example, you must obtain special permission to use ImageNet in a commercial project).

    hashtag
    What are we going to learn in this tutorial?

    Building on what we know about MultiLayerNetwork and ComputationGraph, we will instantiate a couple data iterators to feed a toy dataset into a neural network for training. This tutorial is focused on training a classifier (you can also train networks for regression, or use them for unsupervised training via an autoencoder), and you will also learn how to interpret the output in the console.

    hashtag
    Imports

    hashtag
    The MNIST classifier network

    A MultiLayerNetwork can classify MNIST digits. If you are not familiar with MNIST, it is a dataset originally assembled for recognizing hand-written numerals. You can read more about MNIST herearrow-up-right.

    Once you have imported what you need, set up a basic MultiLayerNetwork like below.

    hashtag
    Using the MNIST iterator

    The MNIST iterator, like most of Deeplearning4j’s built-in iterators, extends the DataSetIterator class. This API allows for simple instantiation of datasets and automatic downloading of data in the background. The MNIST data iterator API specifically allows you to specify whether you are using the training or testing dataset, so instantiate two different iterators to evaluate your network.
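    A sketch of instantiating the two iterators (batch size and seed values are illustrative):

    ```java
    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;

    int batchSize = 64;  // illustrative
    int seed = 123;

    // One iterator over the training split, one over the test split;
    // the boolean flag selects which split is used.
    MnistDataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, seed);
    MnistDataSetIterator mnistTest  = new MnistDataSetIterator(batchSize, false, seed);
    ```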

    hashtag
    Performing basic training

    Now that the network configuration is set up and instantiated along with our MNIST test/train iterators, training takes just a few lines of code. The fun begins.

    Earlier we attached a ScoreIterationListener to the model by using the setListeners() method. Depending on the browser you are using to run this notebook, you can open the debugger/inspector to view listener output. This output is redirected to the console since the internals of Deeplearning4j use SLF4J for logging, and the output is being redirected by Zeppelin. This is a good thing since it can reduce clutter in notebooks.

    As a well-tuned model continues to train, its error score will decrease with each iteration. This error or loss score will eventually converge to a value close to zero. Note that more complex networks and problems may never yield an optimal score. This is where you need to become the expert and continue to tune and change your model’s configuration.

    hashtag
    Evaluating the model

    “Overfitting” is a common problem in deep learning where your model doesn’t generalize well to the problem you are trying to solve. This can happen when you have run the algorithm for too many epochs over a training dataset, when you haven’t used a regularization technique like dropout, or the training dataset isn’t big enough and doesn’t encapsulate all of the features that are descriptive of your classes in the real world.

    Deeplearning4j comes with built-in tools for model evaluation. The simplest is to pass a testing iterator to eval() and retrieve an Evaluation object. Many more, including ROC plotting and regression evaluation, are available in the org.nd4j.evaluation.classification package.
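    One way this looks in practice, assuming model is a trained MultiLayerNetwork and mnistTest is a test-set DataSetIterator:

    ```java
    import org.nd4j.evaluation.classification.Evaluation;

    // Evaluate the trained model against the held-out test iterator.
    Evaluation eval = model.evaluate(mnistTest);
    System.out.println(eval.stats());  // accuracy, precision, recall, F1, confusion matrix
    ```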

    Deep Learning Beginners

    Road map for beginners new to deep learning.

    hashtag
    How Do I Start Using Deep Learning?

    Where you start depends on what you already know.

    The prerequisites for really understanding deep learning are linear algebra, calculus and statistics, as well as programming and some machine learning. The prerequisites for applying it are just learning how to deploy a model.

    In the case of Deeplearning4j, you should know Java well and be comfortable with tools like the IntelliJ IDE and the automated build tool Maven.

    Below you'll find a list of resources. The sections are roughly organized in the order they will be useful.

    hashtag
    Free Machine- and Deep-learning Courses Online

    • (For those interested in a survey of artificial intelligence.)

    • (For those interested in image recognition.)

    hashtag
    Math

    The math involved with deep learning is basically linear algebra, calculus and probability, and if you have studied those at the undergraduate level, you will be able to understand most of the ideas and notation in deep-learning papers. If you haven't studied those in college, never fear. There are many free resources available (and some on this website).

    hashtag
    Programming

    If you do not know how to program yet, you can start with Java, but you might find other languages easier. Python and Ruby resources can convey the basic ideas in a faster feedback loop. "Learn Python the Hard Way" and "Learn to Program (Ruby)" are two great places to start.

    If you want to jump into deep-learning from here without Java, we recommend Theanoarrow-up-right and the various Python frameworks built atop it, including Kerasarrow-up-right and Lasagnearrow-up-right.

    hashtag
    Python

    hashtag
    Java

    Once you have programming basics down, tackle Java, the world's most widely used programming language. Most large organizations in the world operate on huge Java code bases. (There will always be Java jobs.) The big data stack -- Hadoop, Spark, Kafka, Lucene, Solr, Cassandra, Flink -- have largely been written for Java's compute environment, the JVM.

    hashtag
    Deeplearning4j

    With that under your belt, we recommend you approach Deeplearning4j through its examplesarrow-up-right.

    hashtag
    Other Resources

    Most of what we know about deep learning is contained in academic papers. You can find links to some of the major research groups herearrow-up-right.

    While individual courses have limits on what they can teach, the Internet does not. Most math and programming questions can be answered by Googling and searching sites like Stackoverflowarrow-up-right and Math Stackexchangearrow-up-right.

    Activations

    Non-linear functions that determine the output of a neuron.

    hashtag
    What are activations?

    At a simple level, activation functions help decide whether a neuron should be activated. This helps determine whether the information that the neuron is receiving is relevant for the input. The activation function is a non-linear transformation that happens over an input signal, and the transformed output is sent to the next neuron.

    hashtag
    Usage

    The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:
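    For example, a standalone activation layer can be defined like this (the choice of leaky ReLU is illustrative):

    ```java
    import org.deeplearning4j.nn.conf.layers.ActivationLayer;
    import org.nd4j.linalg.activations.Activation;

    // Activation layer applying leaky ReLU to the previous layer's output.
    ActivationLayer layer = new ActivationLayer.Builder()
            .activation(Activation.LEAKYRELU)
            .build();
    ```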

    hashtag
    Available activations

    hashtag
    ActivationRectifiedTanh

    Rectified tanh

    Essentially max(0, tanh(x))

    Underlying implementation is in native code

    hashtag
    ActivationELU

    f(x) = alpha * (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0

    alpha defaults to 1, if not specified

    hashtag
    ActivationReLU

    f(x) = max(0, x)

    hashtag
    ActivationRationalTanh

    Rational tanh approximation From https://arxiv.org/pdf/1508.01292v3

    f(x) = 1.7159 * tanh(2x/3), where tanh is approximated as tanh(y) ≈ sgn(y) * (1 - 1/(1 + |y| + y^2 + 1.41645*y^4))

    Underlying implementation is in native code

    hashtag
    ActivationThresholdedReLU

    Thresholded RELU

    f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0

    hashtag
    ActivationReLU6

    f(x) = min(max(input, cutoff), 6)

    hashtag
    ActivationHardTanH

    hashtag
    ActivationSigmoid

    f(x) = 1 / (1 + exp(-x))

    hashtag
    ActivationGELU

    GELU activation function - Gaussian Error Linear Units

    hashtag
    ActivationPReLU

    Parametrized Rectified Linear Unit (PReLU)

    f(x) = alpha x for x < 0, f(x) = x for x >= 0

    alpha has the same shape as x and is a learned parameter.

    hashtag
    ActivationIdentity

    f(x) = x

    hashtag
    ActivationSoftSign

    hashtag
    ActivationHardSigmoid

    f(x) = min(1, max(0, 0.2x + 0.5))

    hashtag
    ActivationSoftmax

    f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)
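    The shift by max_i(x_i) in that formula is for numerical stability: it leaves the result unchanged but keeps exp() from overflowing for large inputs. A plain-Java sketch of the computation (the class and method names here are illustrative, not part of any DL4J API):

    ```java
    public class SoftmaxDemo {
        // Numerically stable softmax: subtract the maximum before exponentiating,
        // then normalize so the outputs form a probability distribution.
        public static double[] softmax(double[] x) {
            double shift = Double.NEGATIVE_INFINITY;
            for (double v : x) shift = Math.max(shift, v);
            double sum = 0.0;
            double[] out = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                out[i] = Math.exp(x[i] - shift);
                sum += out[i];
            }
            for (int i = 0; i < x.length; i++) out[i] /= sum;
            return out;  // entries are positive and sum to 1.0
        }
    }
    ```
    
    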

    hashtag
    ActivationCube

    f(x) = x^3

    hashtag
    ActivationRReLU

    f(x) = max(0,x) + alpha min(0, x)

    alpha is drawn from uniform(l, u) during training and is set to (l + u)/2 at test time. l and u default to 1/8 and 1/3 respectively.

    hashtag
    ActivationTanH

    f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

    hashtag
    ActivationSELU

    https://arxiv.org/pdf/1706.02515.pdf

    hashtag
    ActivationLReLU

    Leaky RELU: f(x) = max(0, x) + alpha * min(0, x). alpha defaults to 0.01.

    hashtag
    ActivationSwish

    f(x) = x sigmoid(x)

    hashtag
    ActivationSoftPlus

    f(x) = log(1+e^x)

    Core Concepts

    Introduction to Deeplearning4J concepts.

    hashtag
    Overview

    Every machine-learning workflow consists of at least two parts. The first is loading your data and preparing it to be used for learning. We refer to this part as the ETL (extract, transform, load) process. DataVec is the library we built to make building data pipelines easier. The second part is the actual learning system itself. That is the algorithmic core of DL4J.

    All deep learning is based on vectors and tensors, and DL4J relies on a tensor library called ND4J. It provides us with the ability to work with n-dimensional arrays (also called tensors). Thanks to its different backends, it even enables us to use both CPUs and GPUs.

    Listeners

    Adding hooks and listeners on DL4J models.

    hashtag
    What are listeners?

    Listeners allow users to "hook" into certain events in Eclipse Deeplearning4j. This allows you to collect or print information useful for tasks like training. For example, a ScoreIterationListener allows you to print training scores from the output layer of a neural network.

    Convolutional Neural Network

    Also known as CNN.

    hashtag
    Available layers

    hashtag
    Convolution1D

    Recurrent Networks

    Sequence Classification Of Synthetic Control Data

    Recurrent neural networks (RNN’s) are used when the input is sequential in nature. Typically RNN’s are much more effective than regular feed forward neural networks for sequential data because they can keep track of dependencies in the data over multiple time steps. This is possible because the output of a RNN at a time step depends on the current input and the output of the previous time step.

    RNN’s can also be applied to situations where the input is sequential but the output isn’t. In these cases the output of the last time step of the RNN is typically taken as the output for the overall observation. For classification, the output of the last time step will be the predicted class label for the observation.

    In this tutorial we will show how to build a RNN using the MultiLayerNetwork class of deeplearning4j (DL4J). This tutorial will focus on applying a RNN for a classification task. We will be using the MNIST data, which is a dataset that consists of images of handwritten digits, as the input for the RNN. Although the MNIST data isn’t time series in nature, we can interpret it as such since there are 784 inputs. Thus, each observation or image will be interpreted to have 784 time steps consisting of one scalar value for a pixel. Note that we use a RNN for this task for purely pedagogical reasons. In practice, convolutional neural networks (CNN’s) are better suited for image classification tasks.

    Hyperparameter Optimization

    Neural network hyperparameters are parameters set prior to training. They include the learning rate, batch size, number of epochs, regularization, weight initialization, number of hidden layers, number of nodes, etc. Unlike the weights and biases of the nodes of the neural network, they cannot be estimated directly using the data. Setting an optimal or near-optimal configuration of the hyperparameters can significantly affect neural network performance. Thus, time should be set aside to tune these hyperparameters. Deeplearning4j (DL4J) provides functionality to do exactly this task. Arbiter was created explicitly for tuning neural network models and is part of the DL4J suite of deep learning tools. In this tutorial, we will show an example of using Arbiter to tune the learning rate and the number of hidden nodes or layer size of a neural network model. We will use the MNIST dataset (images of handwritten digits) to train the neural network.

    hashtag
    Imports

    Computation Graph

    How to build complex networks with DL4J computation graph.

    hashtag
    Building Complex Network Architectures with Computation Graph

    This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.

    hashtag

    MultiLayerConfiguration myNetworkConfiguration = ...;
    DataSetIterator myTrainData = ...;
    DataSetIterator myTestData = ...;
    
    EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
            .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
            .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
            .scoreCalculator(new DataSetLossCalculator(myTestData, true))
            .evaluateEveryNEpochs(1)
            .modelSaver(new LocalFileModelSaver(directory))
            .build();
    
    EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf,myNetworkConfiguration,myTrainData);
    
    //Conduct early stopping training:
    EarlyStoppingResult result = trainer.fit();
    
    //Print out the results:
    System.out.println("Termination reason: " + result.getTerminationReason());
    System.out.println("Termination details: " + result.getTerminationDetails());
    System.out.println("Total epochs: " + result.getTotalEpochs());
    System.out.println("Best epoch number: " + result.getBestModelEpoch());
    System.out.println("Score at best epoch: " + result.getBestModelScore());
    
    //Get the best model:
    MultiLayerNetwork bestModel = result.getBestModel();
  • ML@B: Machine Learning Crash Course: Part 1arrow-up-right

  • ML@B: Machine Learning Crash Course: Part 2arrow-up-right

  • Gradient descent, how neural networks learn, Deep learning, part 2arrow-up-right

  • Linear Algebra for Machine Learningarrow-up-right; Patrick van der Smagt

  • CMU's Linear Algebra Reviewarrow-up-right

  • Math for Machine Learningarrow-up-right

  • Immersive Linear Algebraarrow-up-right

  • Probability Cheatsheetarrow-up-right

  • The best linear algebra booksarrow-up-right

  • Markov Chains, Visually Explainedarrow-up-right

  • An Introduction to MCMC for Machine Learningarrow-up-right

  • Eigenvectors, Eigenvalues, PCA, Covariance and Entropyarrow-up-right

  • Markov Chain Monte Carlo (MCMC) & Machine Learningarrow-up-right

  • Relearning Matrices as Linear Functionsarrow-up-right

  • Additional command-line tutorialarrow-up-right

  • A Vim Tutorial and Primerarrow-up-right (Vim is an editor accessible from the command line.)

  • Intro to Computer Science (CS50 @Harvard edX)arrow-up-right

  • A Gentle Introduction to Machine Fundamentalsarrow-up-right

  • Teaching Carrow-up-right

  • David Beazley: Python Tutorialsarrow-up-right

  • CS231n: Python Numpy Tutorialarrow-up-right

  • Pyret: A Python Learning Environmentarrow-up-right

  • Java Resourcesarrow-up-right

  • Java Ranch: A Community for Java Beginnersarrow-up-right

  • Intro to Programming in Java @Princetonarrow-up-right

  • Head First Javaarrow-up-right

  • Java in a Nutshellarrow-up-right

  • Java Programming for Complete Beginners in 250 Stepsarrow-up-right

  • Andrew Ng's Machine-Learning Class on YouTubearrow-up-right
    Geoff Hinton's Neural Networks Class on YouTubearrow-up-right
    Patrick Winston's Introduction to Artificial Intelligence @MITarrow-up-right
    Andrej Karpathy's Convolutional Neural Networks Class at Stanfordarrow-up-right
    Calculus Made Easy, by Silvanus P. Thompsonarrow-up-right
    Seeing Theory: A Visual Introduction to Probability and Statisticsarrow-up-right
    Andrew Ng's 6-Part Review of Linear Algebraarrow-up-right
    Khan Academy's Linear Algebra Coursearrow-up-right
    Scratch: A Visual Programming Environment From MITarrow-up-right
    Learn to Program (Ruby)arrow-up-right
    Grasshopper: A Mobile App to Learn Basic Coding (Javascript)arrow-up-right
    Intro to the Command Linearrow-up-right
    Theanoarrow-up-right
    Kerasarrow-up-right
    Lasagnearrow-up-right
    Learn Python the Hard Wayarrow-up-right
    Google's Python Classarrow-up-right
    Udemy: Complete Python 3 Masterclass Journeyarrow-up-right
    MIT: Introduction to Computer Science and Python Programmingarrow-up-right
    Think Java: Interactive Web-based Dev Environmentarrow-up-right
    Learn Java The Hard Wayarrow-up-right
    Introduction to JShellarrow-up-right
    JShell in 5 Minutesarrow-up-right

    f(x) = x / (1 + |x|)

    Empirical Evaluation of Rectified Activations in Convolutional Networkarrow-up-right
    hashtag
    Preparing Data for Learning and Prediction

    Unlike other machine learning or deep learning frameworks, DL4J treats the tasks of loading data and training algorithms as separate processes. You don't just point the model at data saved somewhere on disk, you load the data using DataVec. This gives you a lot more flexibility, and retains the convenience of simple data loading.

    Before the algorithm can start learning, you have to prepare the data, even if you already have a trained model. Preparing data means loading it and putting it in the right shape and value range (e.g. normalization, zero-mean and unit variance). Building these processes from scratch is error prone, so use DataVec wherever possible.

    Deeplearning4j works with a lot of different data types, such as images, CSV files, plain text, audio, video, and pretty much any other data type you can think of.

    To use DataVec, you will need one of the implementations of the RecordReaderarrow-up-right interface along with the RecordReaderDataSetIteratorarrow-up-right.

    Once you have a DataSetIteratorarrow-up-right, which is just a pattern that describes sequential access to data, you can use it to retrieve the data in a format suited for training a neural net model.

    hashtag
    Normalizing Data

    Neural networks work best when the data they're fed is normalized, constrained to a range between -1 and 1. There are several reasons for that. One is that nets are trained using gradient descentarrow-up-right, and their activation functions usually have an active range somewhere between -1 and 1. Even when using an activation function that doesn't saturate quickly, it is still good practice to constrain your values to this range to improve performance.

    Normalizing data is pretty easy in DL4J. Decide how you want to normalize your data, and set the corresponding DataNormalizationarrow-up-right up as a preprocessor for your DataSetIteratorarrow-up-right.

    The ImagePreProcessingScaler is obviously a good choice for image data. The NormalizerMinMaxScaler is a good choice if you have a uniform range along all dimensions of your input data, and NormalizerStandardize is what you would usually use in other cases.
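    A sketch of attaching a normalizer, assuming trainIter and testIter are DataSetIterators:

    ```java
    import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
    import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

    // Fit the normalizer's statistics on the training data, then attach it
    // so every DataSet the iterator returns is normalized on the fly.
    DataNormalization normalizer = new NormalizerStandardize();
    normalizer.fit(trainIter);              // collects mean/stddev statistics
    trainIter.setPreProcessor(normalizer);
    testIter.setPreProcessor(normalizer);   // reuse the SAME statistics for test data
    ```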

    If you need other types of normalization, you are also free to implement the DataNormalization interface.

    If you use NormalizerStandardize, note that this is a normalizer that depends on statistics that it extracts from the data. So you will have to save those statistics along with the model to restore them when you restore your model.

    hashtag
    DataSets, INDArrays and Mini-Batches

    As the name suggests, a DataSetIterator returns DataSetarrow-up-right objects. DataSet objects are containers for the features and labels of your data. But they aren't constrained to holding just a single example at once. A DataSet can contain as many examples as needed.

    It does that by keeping the values in several instances of INDArrayarrow-up-right: one for the features of your examples, one for the labels and two additional ones for masking, if you are using timeseries data (see Using RNNs / Masking for more information).

    An INDArray is one of the n-dimensional arrays, or tensors, used in ND4J. In the case of the features, it is a matrix of the size Number of Examples x Number of Features. Even with only a single example, it will have this shape.

    Why doesn't it contain all of the data examples at once?

    This is another important concept for deep learning: mini-batching. In order to produce accurate results, a lot of real-world training data is often needed. Often that is more data than can fit in available memory, so storing it in a single DataSet sometimes isn't possible. But even if there is enough data storage, there is another important reason not to use all of your data at once. With mini-batches you can get more updates to your model in a single epoch.

    So why bother having more than one example in a DataSet? Since the model is trained using gradient descentarrow-up-right, it requires a good gradient to learn how to minimize error. Using only one example at a time will create a gradient that only takes errors produced with the current example into consideration. This would make the learning behavior erratic, slow down the learning, and may not even lead to a usable result.

    A mini-batch should be large enough to provide a representative sample of the real world (or at least your data). That means that it should always contain all of the classes that you want to predict and that the count of those classes should be distributed in approximately the same way as they are in your overall data.

    hashtag
    Building a Neural Net Model

    DL4J gives data scientists and developers tools to build deep neural networks at a high level, using concepts like layers. It employs a builder pattern to build the neural net declaratively, as you can see in this (simplified) example:

    If you are familiar with other deep learning frameworks, you will notice that this looks a bit like Keras.

    Unlike other frameworks, DL4J splits the optimization algorithm from the updater algorithm. This allows for flexibility as you seek a combination of optimizer and updater that works best for your data and problem.

    Besides the DenseLayer and OutputLayer that you have seen in the example above, there are several other layer types, like GravesLSTM, ConvolutionLayer, RBM, EmbeddingLayer, etc. Using those layers you can define not only simple neural networks, but also recurrent and convolutional networks.

    hashtag
    Training a Model

    After configuring your neural network, you will have to train the model. The simplest case is to call the .fit() method on the model with your DataSetIterator as an argument. This will train the model on all of your data once. A single pass over the entire dataset is called an epoch. DL4J has several different methods for passing through the data more than once.

    The simplest way is to reset your DataSetIterator and loop over the fit call as many times as you want. This way you can train your model for as many epochs as you think is a good fit.

    Yet another way is to use an EarlyStoppingTrainer. You can configure this trainer to run for as many epochs as you like, and additionally for as long as you like. It will evaluate the performance of your network after each epoch (or whatever interval you have configured) and save the best-performing version for later use.

    Also note that DL4J supports not only training MultiLayerNetworks, but also the more flexible ComputationGraph.

    hashtag
    Evaluating Model Performance

    As you train your model, you will want to test how well it performs. For that test, you will need a dedicated data set that will not be used for training but instead will only be used for evaluating your model. This data should have the same distribution as the real-world data you want to make predictions about with your model. The reason you can't simply use your training data for evaluation is that machine learning methods are prone to overfitting (getting good at making predictions on the training set, but failing to generalize to unseen data).

    The Evaluation class is used for evaluation. Slightly different methods apply to evaluating a normal feed-forward network versus a recurrent network. For more details on using it, take a look at the corresponding examples.

    hashtag
    Troubleshooting a Neural Net Model

    Building neural networks to solve problems is an empirical process. That is, it requires trial and error. So you will have to try different settings and architectures in order to find a neural net configuration that performs well.

    DL4J provides a listener facility to help you monitor your network's performance visually. You can set up listeners for your model that will be called after each mini-batch is processed. One of the most often used listeners that DL4J ships out of the box is ScoreIterationListener. Check out all Listeners for more.

    While ScoreIterationListener simply prints the current error score for your network, HistogramIterationListener starts up a web UI that provides a host of different information that you can use to fine-tune your network configuration. See Visualize, Monitor and Debug Network Learning on how to interpret that data.

    See Troubleshooting neural nets for more information on how to improve results.

    hashtag
    Usage

    To add one or more listeners to a MultiLayerNetwork or ComputationGraph, use the addListeners method:
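A minimal sketch of attaching a listener; it assumes a MultiLayerNetwork named `model` has already been built and initialized as in the examples elsewhere in this guide:

```java
// Attach a listener to an already-initialized network. ScoreIterationListener
// logs the loss value every 10 iterations here.
model.addListeners(new ScoreIterationListener(10));

// ComputationGraph exposes the same method. Note that setListeners(...)
// replaces any previously registered listeners instead of appending.
```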

    hashtag
    Available listeners

    hashtag
    EvaluativeListener

    [source]

    This TrainingListener implementation provides a simple way to evaluate a model during training. It can be launched every Nth iteration/epoch, depending on the frequency and InvocationType constructor arguments.

    EvaluativeListener

    This callback will be invoked after evaluation has finished

    iterationDone

    • param iterator Iterator to provide data for evaluation

    • param frequency Frequency (in number of iterations/epochs according to the invocation type) to perform evaluation

    • param type Type of value for ‘frequency’ - iteration end, epoch end, etc

    hashtag
    ScoreIterationListener

    [source]

    Score iteration listener. Reports the score (value of the loss function) of the network during training, every N iterations.

    ScoreIterationListener

    • param printIterations frequency with which to print scores (i.e., every printIterations parameter updates)

    hashtag
    ComposableIterationListener

    [source]

    A group of listeners

    hashtag
    CollectScoresIterationListener

    [source]

    CollectScoresIterationListener simply stores the model scores internally (along with the iteration) every 1 or N iterations (this is configurable). These scores can then be obtained or exported.
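A brief usage sketch (the file name is hypothetical; `model` and `trainIter` are assumed to exist as in the examples elsewhere in this guide):

```java
// Collect the score every 10 iterations, then export the collected
// (iteration, score) pairs after training.
CollectScoresIterationListener collector = new CollectScoresIterationListener(10);
model.setListeners(collector);
model.fit(trainIter);
collector.exportScores(new File("scores.tsv")); // tab-delimited: iteration <TAB> score
```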

    CollectScoresIterationListener

    Constructor for collecting scores with default saving frequency of 1

    iterationDone

    Constructor for collecting scores with the specified frequency.

    • param frequency Frequency with which to collect/save scores

    exportScores

    Export the scores in tab-delimited (one per line) UTF-8 format.

    exportScores

    Export the scores in delimited (one per line) UTF-8 format with the specified delimiter

    • param outputStream Stream to write to

    • param delimiter Delimiter to use

    exportScores

    Export the scores to the specified file in tab-delimited (one per line) UTF-8 format

    • param file File to write to

    exportScores

    Export the scores to the specified file in delimited (one per line) UTF-8 format, using the specified delimiter

    • param file File to write to

    • param delimiter Delimiter to use for writing scores

    hashtag
    CheckpointListener

    [source]

    CheckpointListener: the goal of this listener is to periodically save a copy of the model during training. Model saving may be done:

    1. Every N epochs

    2. Every N iterations

    3. Every T time units (every 15 minutes, for example)

    Or some combination of the three.

    Example 1: Saving a checkpoint every 2 epochs, keeping all model files:

    Example 2: Saving a checkpoint every 1000 iterations, but keeping only the last 3 models (all older model files will be automatically deleted)

    Example 3: Saving a checkpoint every 15 minutes, keeping the most recent 3 and otherwise every 4th checkpoint file:

    Note that you can mix these: for example, you can save every epoch and also every 15 minutes (independent of the last save time), or save every epoch and every 15 minutes since the last model save. Note that in this last example, the sinceLast parameter is true: the 15-minute counter is reset any time a model is saved.
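A sketch of the mixed configuration described above (the checkpoint directory is hypothetical; the builder methods follow those listed in this section):

```java
// Save at the end of every epoch AND every 15 minutes since the last save,
// keeping only the 3 most recent checkpoint files.
CheckpointListener checkpoints = new CheckpointListener.Builder(new File("checkpoints/"))
        .keepLast(3)                           // retain only the 3 most recent files
        .saveEveryEpoch()                      // save at the end of each epoch...
        .saveEvery(15, TimeUnit.MINUTES, true) // ...and every 15 min since the last save
        .build();
model.setListeners(checkpoints);
```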

    CheckpointListener

    List all available checkpoints. A checkpoint is ‘available’ if the file can be loaded. Any checkpoint files that have been automatically deleted (given the configuration) will not be returned here.

    • return List of checkpoint files that can be loaded

    hashtag
    SharedGradient

    [source]

    hashtag
    SleepyTrainingListener

    [source]

    This TrainingListener implementation provides a way to “sleep” during specific Neural Network training phases. Suitable for debugging/testing purposes only.

    PLEASE NOTE: All timers treat time values as milliseconds. PLEASE NOTE: Do not use it in production environment.

    onEpochStart

    In this mode a parkNanos() call is used to make the process truly idle

    hashtag
    CollectScoresListener

    [source]

    A simple listener that collects scores to a list every N iterations. Can also optionally log the score.

    hashtag
    PerformanceListener

    [source]

    Simple IterationListener that tracks the time spent on training per iteration.

    PerformanceListener

    This method defines whether the iteration number should be reported together with other data

    • param reportIteration

    • return

    hashtag
    ParamAndGradientIterationListener

    [source]

    An iteration listener that provides details on parameters and gradients at each iteration during training. It attempts to provide much of the same information as the UI histogram iteration listener, but in a text-based format (for example, when learning on a system accessed via SSH), and is intended to aid network tuning and debugging. This listener calculates the mean, min, max, and mean absolute value of each type of parameter and gradient in the network at each iteration.

    hashtag
    TimeIterationListener

    [source]

    Time Iteration Listener. This listener writes to the INFO logs the remaining time in minutes and the estimated end date of the process. The remaining time is estimated from the training time so far and the total number of iterations specified by the user.

    TimeIterationListener

    Constructor

    • param iterationCount The global number of iterations for training (all epochs)

    hashtag
    Convolution1D

    [source]

    1D convolution layer. Expects input activations of shape [minibatch,channels,sequenceLength]

    hashtag
    Convolution2D

    [source]

    2D convolution layer

    hashtag
    Convolution3D

    [source]

    3D convolution layer configuration

    hasBias

    An optional dataFormat: “NDHWC” or “NCDHW”. Defaults to “NCDHW”. The data format of the input and output data. For “NCDHW” (also known as ‘channels first’ format), the data storage order is: [batchSize, inputChannels, inputDepth, inputHeight, inputWidth]. For “NDHWC” (‘channels last’ format), the data is stored in the order of: [batchSize, inputDepth, inputHeight, inputWidth, inputChannels].

    kernelSize

    Set kernel size for 3D convolutions in (depth, height, width) order

    stride

    Set stride size for 3D convolutions in (depth, height, width) order

    • param stride stride size

    • return 3D convolution layer builder

    padding

    Set padding size for 3D convolutions in (depth, height, width) order

    • param padding padding size

    • return 3D convolution layer builder

    dilation

    Set dilation size for 3D convolutions in (depth, height, width) order

    • param dilation dilation size

    • return 3D convolution layer builder

    dataFormat

    The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]

    • param dataFormat Data format to use for activations

    setKernelSize

    Set kernel size for 3D convolutions in (depth, height, width) order

    • param kernelSize kernel size

    setStride

    Set stride size for 3D convolutions in (depth, height, width) order

    • param stride stride size

    setPadding

    Set padding size for 3D convolutions in (depth, height, width) order

    • param padding padding size

    setDilation

    Set dilation size for 3D convolutions in (depth, height, width) order

    • param dilation dilation size

    hashtag
    Deconvolution2D

    [source]

    2D deconvolution layer configuration

    Deconvolutions are also known as transpose convolutions or fractionally strided convolutions. In essence, deconvolutions swap the forward and backward passes of regular 2D convolutions.

    See the paper by Matt Zeiler for details: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf

    For an intuitive guide to convolution arithmetic and shapes, see: https://arxiv.org/abs/1603.07285v1

    hasBias

    Deconvolution2D layer. nIn in the input layer is the number of channels; nOut is the number of filters to be used in the net (in other words, the output channels). The builder specifies the filter/kernel size, the stride and the padding. The pooling layer takes the kernel size.

    convolutionMode

    Set the convolution mode for the Convolution layer. See ConvolutionMode for more details

    • param convolutionMode Convolution mode for layer

    kernelSize

    Size of the convolution rows/columns

    • param kernelSize the height and width of the kernel

    hashtag
    Cropping1D

    [source]

    Cropping layer for convolutional (1d) neural networks. Allows cropping to be done separately for top/bottom

    getOutputType

    • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

    setCropping

    Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

    build

    • param cropping Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

    hashtag
    Cropping2D

    [source]

    Cropping layer for convolutional (2d) neural networks. Allows cropping to be done separately for top/bottom/left/right

    getOutputType

    • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

    • param cropLeftRight Amount of cropping to apply to both the left and the right of the input activations

    setCropping

    Cropping amount for top/bottom/left/right (in that order). A length 4 array.

    build

    • param cropping Cropping amount for top/bottom/left/right (in that order). Must be length 4 array.

    hashtag
    Cropping3D

    [source]

    Cropping layer for convolutional (3d) neural networks. Allows cropping to be done separately for upper and lower bounds of depth, height and width dimensions.

    getOutputType

    • param cropDepth Amount of cropping to apply to both depth boundaries of the input activations

    • param cropHeight Amount of cropping to apply to both height boundaries of the input activations

    • param cropWidth Amount of cropping to apply to both width boundaries of the input activations

    setCropping

    Cropping amount, a length 6 array, i.e. crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

    build

    • param cropping Cropping amount, must be length 3 or 6 array, i.e. either crop depth, crop height, crop width or crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

    hashtag
    Imports

    hashtag
    Download the dataset

    UCI provides a number of datasets for machine learning; make sure you have enough space on your local disk. The UCI synthetic control dataset can be found at http://archive.ics.uci.edu/ml/datasets/synthetic+control+chart+time+series. The code below will check whether the data already exists and download the file if needed.

    hashtag
    Iterating from disk

    Now that we’ve saved our dataset to a CSV sequence format, we need to set up a CSVSequenceRecordReader and iterator that will read our saved sequences and feed them to our network. If you have already saved your data to disk, you can run this code block (and the remaining code blocks) as many times as you want without preprocessing the dataset again. Convenient!

    hashtag
    Configuring a RNN for Classification

    Once everything needed is imported, we can jump into the code. To build the neural network, we can use a setup like the one shown below. Because there are 784 timesteps and 10 class labels, nIn is set to 784 and nOut is set to 10 in the MultiLayerNetwork configuration.

    hashtag
    Training the classifier

    To train the model, pass the training iterator to the model’s fit() method. We can pass the number of epochs or passes through the training data directly to the fit() method.

    hashtag
    Model Evaluation

    Once training is complete, we need only a couple of lines of code to evaluate the model on a test set. Evaluating on a held-out test set lets us detect overfitting on the training data; if we overfit on the training data, we have essentially fit to the noise in the data.

    The Evaluation class has more built-in methods if you need to extract a confusion matrix, and other tools are also available for calculating the area under the curve (AUC).

    hashtag
    Configuration Space

    The goal of this tutorial is to tune the learning rate and the layer size. We start by setting up the parameter spaces for these two hyperparameters: values between 0.0001 and 0.1 for the learning rate, and integer values between 16 and 256 for the layer size.
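Sketched with Arbiter's parameter space classes (the package names are assumed from the Arbiter API):

```java
import org.deeplearning4j.arbiter.optimize.api.ParameterSpace;
import org.deeplearning4j.arbiter.optimize.parameter.continuous.ContinuousParameterSpace;
import org.deeplearning4j.arbiter.optimize.parameter.integer.IntegerParameterSpace;

// Learning rate: continuous values in [0.0001, 0.1]
ParameterSpace<Double> learningRateSpace = new ContinuousParameterSpace(0.0001, 0.1);
// Layer size: integer values in [16, 256]
ParameterSpace<Integer> layerSizeSpace = new IntegerParameterSpace(16, 256);
```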

    Next, we set up a MultiLayerSpace, which is similar in structure to the MultiLayerNetwork configuration we've seen before. Here we can set the hyperparameters of the neural network model; however, for the learning rate and the number of hidden nodes we use the ParameterSpaces we initialized above instead of fixed values like the other hyperparameters.

    Lastly, we use the CandidateGenerator class to configure how candidate values of the learning rate and the layer size will be generated. In this tutorial, we will use random search; thus, values for the learning rate and the layer size will be generated uniformly within their ranges.

    hashtag
    Loading Data

    To obtain the data, we will use the built-in MnistDataProvider class, with two training epochs (complete passes through the data) and a batch size of 64 for training.

    hashtag
    Optimization

    We’ve set how new values of the two hyperparameters will be generated, but there still remains the question of how to evaluate them. We will use the accuracy metric to evaluate different hyperparameter configurations, so we initialize an EvaluationScoreFunction.

    We also want to set how long the hyperparameter search will last. There are infinitely many configurations of the learning rate and hidden layer size, since the learning rate space is continuous. Thus, we set a termination condition of 15 minutes.
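As a sketch (class names from the Arbiter API; the exact metric enum is an assumption):

```java
// Score candidates by classification accuracy
ScoreFunction scoreFunction = new EvaluationScoreFunction(Evaluation.Metric.ACCURACY);

// Stop the search after 15 minutes of wall-clock time
TerminationCondition[] terminationConditions = {
        new MaxTimeCondition(15, TimeUnit.MINUTES)
};
```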

    To save the best model, we can set the directory to save it in.

    Given all the configurations we have already set, we put them together using an OptimizationConfiguration. To execute the hyperparameter search, we initialize an IOptimizationRunner using the OptimizationConfiguration.

    Lastly, we can print out the details of the best model and the results.

    import org.apache.commons.io.FilenameUtils;
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
    import org.nd4j.linalg.learning.config.Nesterovs
    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
    import org.deeplearning4j.earlystopping.EarlyStoppingConfiguration;
    import org.deeplearning4j.earlystopping.EarlyStoppingModelSaver;
    import org.deeplearning4j.earlystopping.EarlyStoppingResult;
    import org.deeplearning4j.earlystopping.saver.LocalFileModelSaver;
    import org.deeplearning4j.earlystopping.scorecalc.DataSetLossCalculator;
    import org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition;
    import org.deeplearning4j.earlystopping.termination.MaxTimeIterationTerminationCondition;
    import org.deeplearning4j.earlystopping.trainer.EarlyStoppingTrainer;
    import org.deeplearning4j.eval.Evaluation
    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration
    import org.deeplearning4j.nn.conf.Updater
    import org.deeplearning4j.nn.conf.layers.DenseLayer
    import org.deeplearning4j.nn.conf.layers.OutputLayer
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener
    import org.nd4j.linalg.api.ndarray.INDArray
    import org.nd4j.linalg.dataset.DataSet
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction
    import org.slf4j.Logger
    import org.slf4j.LoggerFactory
    
    import java.io.File;
    import java.util.concurrent.TimeUnit;
    
    val numRows = 28
    val numColumns = 28
    val outputNum = 10 
    val batchSize = 128
    val rngSeed = 123
    
    val mnistTrain: DataSetIterator = new MnistDataSetIterator(batchSize, true, rngSeed)
    val mnistTest: DataSetIterator = new MnistDataSetIterator(batchSize, false, rngSeed)
    val conf : MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
            .seed(rngSeed) //include a random seed for reproducibility
            // use stochastic gradient descent as an optimization algorithm
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Nesterovs(0.006)) // Nesterovs update scheme with 0.006 learning rate
            .l2(1e-4)
            .list()
            .layer(0, new DenseLayer.Builder() //create the first, input layer with xavier initialization
                    .nIn(numRows * numColumns)
                    .nOut(1000)
                    .activation(Activation.RELU)
                    .weightInit(WeightInit.XAVIER)
                    .build())
            .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD) //create the output layer
                    .nIn(1000)
                    .nOut(outputNum)
                    .activation(Activation.SOFTMAX)
                    .weightInit(WeightInit.XAVIER)
                    .build())
            .build()
                    
    val model : MultiLayerNetwork = new MultiLayerNetwork(conf)
    val tempDir : String = System.getProperty("java.io.tmpdir")
    val exampleDirectory : String = FilenameUtils.concat(tempDir, "DL4JEarlyStoppingExample/")
    val dirFile : File = new File(exampleDirectory)
    dirFile.mkdir()
    
    val saver  = new LocalFileModelSaver(exampleDirectory)
    
    val esConf  = new EarlyStoppingConfiguration.Builder()
    		.epochTerminationConditions(new MaxEpochsTerminationCondition(10))
    		.iterationTerminationConditions(new MaxTimeIterationTerminationCondition(5, TimeUnit.MINUTES))
    		.scoreCalculator(new DataSetLossCalculator(mnistTest, true))
            .evaluateEveryNEpochs(1)
    		.modelSaver(saver)
    		.build()
    
    val trainer  = new EarlyStoppingTrainer(esConf,conf,mnistTrain)
    val result = trainer.fit()
    println("Termination reason: " + result.getTerminationReason())
    println("Termination details: " + result.getTerminationDetails())
    println("Total epochs: " + result.getTotalEpochs())
    println("Best epoch number: " + result.getBestModelEpoch())
    println("Score at best epoch: " + result.getBestModelScore())
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
    import org.nd4j.evaluation.classification.Evaluation
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration
    import org.nd4j.linalg.learning.config.Nesterovs
    import org.deeplearning4j.nn.conf.layers.DenseLayer
    import org.deeplearning4j.nn.conf.layers.OutputLayer
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener
    import org.nd4j.linalg.api.ndarray.INDArray
    import org.nd4j.linalg.dataset.DataSet
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction
    import org.slf4j.Logger
    import org.slf4j.LoggerFactory
    //number of rows and columns in the input pictures
    val numRows = 28
    val numColumns = 28
    val outputNum = 10 // number of output classes
    val batchSize = 128 // batch size for each epoch
    val rngSeed = 123 // random number seed for reproducibility
    val numEpochs = 15 // number of epochs to perform
    
    val conf: MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
        //include a random seed for reproducibility
        .seed(rngSeed) 
        //specify the learning rate and the rate of change of the learning rate.
        .updater(new Nesterovs(0.006, 0.9))
        .l2(1e-4)
        .list()
        //create the first, input layer with xavier initialization
        .layer(0, new DenseLayer.Builder() 
                .nIn(numRows * numColumns)
                .nOut(1000)
                .activation(Activation.RELU)
                .weightInit(WeightInit.XAVIER)
                .build())
        //create the output layer
        .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD) 
                .nIn(1000)
                .nOut(outputNum)
                .activation(Activation.SOFTMAX)
                .weightInit(WeightInit.XAVIER)
                .build())
        .build()
    
    val model = new MultiLayerNetwork(conf)
    model.init()
    //print the score every 10 iterations
    model.setListeners(new ScoreIterationListener(10))
    //Get the DataSetIterators:
    val mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed)
    val mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed)
    // the simplest way to do multiple epochs is to pass them to `fit`
    model.fit(mnistTrain, numEpochs)
    
    /* try below if you want to check the current number of epoch
    for (i <- 1 to numEpochs) {
        println("Epoch " + i + " / " + numEpochs)
        model.fit(mnistTrain)
    }
    */
    val evaluation = model.evaluate[Evaluation](mnistTest)
    
    // print the basic statistics about the trained classifier
    println("Accuracy: "+evaluation.accuracy())
    println("Precision: "+evaluation.precision())
    println("Recall: "+evaluation.recall())
    
    // in more complex scenarios, a confusion matrix is quite helpful
    println(evaluation.confusionToString())
    GraphBuilder graphBuilder = new NeuralNetConfiguration.Builder()
        // add hyperparameters and other layers
        .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
        // add more layers and output
        .build();
              ⎧  1, if x >  1
     f(x) =   ⎨ -1, if x < -1
              ⎩  x, otherwise
    MultiLayerConfiguration conf = 
        new NeuralNetConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Nesterovs(learningRate, 0.9))
            .list(
                new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes).activation("relu").build(),
                new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD).activation("softmax").nIn(numHiddenNodes).nOut(numOutputs).build()
            ).backprop(true).build();
    MultiLayerNetwork model = new MultiLayerNetwork(conf);
    model.init();
    //print the score every iteration
    model.setListeners(new ScoreIterationListener(1));
    public EvaluativeListener(@NonNull DataSetIterator iterator, int frequency) 
    public void iterationDone(Model model, int iteration, int epoch) 
    public ScoreIterationListener(int printIterations) 
    public CollectScoresIterationListener() 
    public void iterationDone(Model model, int iteration, int epoch) 
    public void exportScores(OutputStream outputStream) throws IOException 
    public void exportScores(OutputStream outputStream, String delimiter) throws IOException 
    public void exportScores(File file) throws IOException 
    public void exportScores(File file, String delimiter) throws IOException 
    .keepAll() //Don't delete any models
    .saveEveryNEpochs(2)
    .build()
    .keepLast(3)
    .saveEveryNIterations(1000)
    .build();
    .keepLastAndEvery(3, 4)
    .saveEvery(15, TimeUnit.MINUTES)
    .build();
    public CheckpointListener build()
    public void onEpochStart(Model model) 
    public PerformanceListener build() 
    public TimeIterationListener(int iterationCount) 
    public boolean hasBias() 
    public Builder kernelSize(int... kernelSize) 
    public Builder stride(int... stride) 
    public Builder padding(int... padding) 
    public Builder dilation(int... dilation) 
    public Builder dataFormat(DataFormat dataFormat) 
    public void setKernelSize(int... kernelSize) 
    public void setStride(int... stride) 
    public void setPadding(int... padding) 
    public void setDilation(int... dilation) 
    public boolean hasBias() 
    public Builder convolutionMode(ConvolutionMode convolutionMode) 
    public Builder kernelSize(int... kernelSize) 
    public InputType getOutputType(int layerIndex, InputType inputType) 
    public void setCropping(int... cropping) 
    public Cropping1D build() 
    public InputType getOutputType(int layerIndex, InputType inputType) 
    public void setCropping(int... cropping) 
    public Cropping2D build() 
    public InputType getOutputType(int layerIndex, InputType inputType) 
    public void setCropping(int... cropping) 
    public Cropping3D build() 
    import org.deeplearning4j.eval.Evaluation
    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration
    import org.deeplearning4j.nn.conf.GradientNormalization
    import org.deeplearning4j.nn.conf.Updater
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.deeplearning4j.nn.conf.layers.{DenseLayer, LSTM, OutputLayer, RnnOutputLayer}
    import org.deeplearning4j.nn.conf.distribution.UniformDistribution
    import org.deeplearning4j.nn.conf.layers.GravesLSTM
    import org.deeplearning4j.nn.conf.layers.RnnOutputLayer
    import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener
    
    import org.datavec.api.split.NumberedFileInputSplit
    import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader
    
    import org.nd4j.linalg.dataset.DataSet
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction
    import org.nd4j.linalg.api.ndarray.INDArray
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
    
    import org.slf4j.Logger
    import org.slf4j.LoggerFactory
    import org.apache.commons.io.{FileUtils, IOUtils}
    
    import scala.collection.mutable
    import java.nio.charset.Charset
    import java.util.Random
    import java.net.URL
    import java.io.File
    val cache = new File(System.getProperty("user.home"), ".deeplearning4j") // your cache directory
    val dataPath = new File(cache, "uci_synthetic_control")
    
    if(!dataPath.exists()) {
        val url = "https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data"
        println("Downloading file...")
        val data = IOUtils.toString(new URL(url), Charset.defaultCharset())
        val lines = data.split("\n")
    
        var lineCount = 0
        var index = 0
    
        val linesList = new scala.collection.mutable.ListBuffer[String]
        println("Extracting file...")
    
        for (line <- lines) {
            // Each class of time series spans 100 consecutive lines, so the label is lineCount / 100
            val count = lineCount / 100
            // Turn the whitespace-separated series into one "value, label" row per time step
            var newLine = line.replaceAll("\\s+", ", " + count + "\n")
            newLine = newLine + ", " + count
            linesList += newLine
            lineCount += 1
        }
    
        for (line <- linesList) {
            val outPath = new File(dataPath, index + ".csv")
            FileUtils.writeStringToFile(outPath, line, Charset.defaultCharset())
            index += 1
        }
        println("Done.")
    } else {
        println("File already exists.")
    }
    val batchSize = 128
    val numLabelClasses = 6
    
    // training data
    val trainRR = new CSVSequenceRecordReader(0, ", ")
    trainRR.initialize(new NumberedFileInputSplit(dataPath.getAbsolutePath() + "/%d.csv", 0, 449))
    val trainIter = new SequenceRecordReaderDataSetIterator(trainRR, batchSize, numLabelClasses, 1)
    
    // testing data
    val testRR = new CSVSequenceRecordReader(0, ", ")
    testRR.initialize(new NumberedFileInputSplit(dataPath.getAbsolutePath() + "/%d.csv", 450, 599))
    val testIter = new SequenceRecordReaderDataSetIterator(testRR, batchSize, numLabelClasses, 1)
    val conf = new NeuralNetConfiguration.Builder()
        .seed(123)    //Random number generator seed for improved repeatability. Optional.
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .weightInit(WeightInit.XAVIER)
        .updater(new Nesterovs(0.005))
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)  //Not always required, but helps with this data set
        .gradientNormalizationThreshold(0.5)
        .list()
        .layer(0, new LSTM.Builder().activation(Activation.TANH).nIn(1).nOut(10).build())
        .layer(1, new RnnOutputLayer.Builder(LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nIn(10).nOut(numLabelClasses).build())
        .build();
    
    val model: MultiLayerNetwork = new MultiLayerNetwork(conf)
    model.setListeners(new ScoreIterationListener(20))
    val numEpochs = 1
    model.fit(trainIter, numEpochs)
    val evaluation = model.evaluate[Evaluation](testIter)
    
    // print the basic statistics about the trained classifier
    println("Accuracy: "+evaluation.accuracy())
    println("Precision: "+evaluation.precision())
    println("Recall: "+evaluation.recall())
    import org.deeplearning4j.api.storage.StatsStorage
    import org.deeplearning4j.arbiter.conf.updater.SgdSpace
    import org.deeplearning4j.arbiter.MultiLayerSpace
    import org.deeplearning4j.arbiter.layers.DenseLayerSpace
    import org.deeplearning4j.arbiter.layers.OutputLayerSpace
    import org.deeplearning4j.arbiter.optimize.api.CandidateGenerator
    import org.deeplearning4j.arbiter.optimize.api.OptimizationResult
    import org.deeplearning4j.arbiter.optimize.api.ParameterSpace
    import org.deeplearning4j.arbiter.optimize.api.data.DataProvider
    import org.deeplearning4j.arbiter.data.MnistDataProvider
    import org.deeplearning4j.arbiter.scoring.impl.EvaluationScoreFunction
    import org.deeplearning4j.arbiter.optimize.api.saving.ResultReference
    import org.deeplearning4j.arbiter.optimize.api.saving.ResultSaver
    import org.deeplearning4j.arbiter.optimize.api.score.ScoreFunction
    import org.deeplearning4j.arbiter.optimize.api.termination.MaxCandidatesCondition
    import org.deeplearning4j.arbiter.optimize.api.termination.MaxTimeCondition
    import org.deeplearning4j.arbiter.optimize.api.termination.TerminationCondition
    import org.deeplearning4j.arbiter.optimize.config.OptimizationConfiguration
    import org.deeplearning4j.arbiter.optimize.generator.RandomSearchGenerator
    import org.deeplearning4j.arbiter.optimize.parameter.continuous.ContinuousParameterSpace
    import org.deeplearning4j.arbiter.optimize.parameter.integer.IntegerParameterSpace
    import org.deeplearning4j.arbiter.optimize.runner.IOptimizationRunner
    import org.deeplearning4j.arbiter.optimize.runner.LocalOptimizationRunner
    import org.deeplearning4j.arbiter.saver.local.FileModelSaver
    import org.deeplearning4j.arbiter.scoring.impl.TestSetAccuracyScoreFunction
    import org.deeplearning4j.arbiter.task.MultiLayerNetworkTaskCreator
    import org.nd4j.evaluation.classification.Evaluation.Metric
    import org.deeplearning4j.datasets.iterator.MultipleEpochsIterator
    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
    import org.nd4j.linalg.lossfunctions.LossFunctions
    import org.nd4j.shade.jackson.annotation.JsonProperty
    import org.nd4j.linalg.factory.Nd4j
    import org.nd4j.linalg.cpu.nativecpu.CpuAffinityManager
    
    
    import java.io.File
    import java.io.IOException
    import java.util.List
    import java.util.Map
    import java.util.concurrent.TimeUnit
    val learningRateHyperparam  = new ContinuousParameterSpace(0.0001, 0.1)
    val layerSizeHyperparam  = new IntegerParameterSpace(16,256)            
    
    
    val hyperparameterSpace  = new MultiLayerSpace.Builder()
        //These next few options: fixed values for all models
        .weightInit(WeightInit.XAVIER)
        .l2(0.0001)
        //Learning rate hyperparameter: search over different values, applied to all models
        .updater(new SgdSpace(learningRateHyperparam))
        .addLayer( new DenseLayerSpace.Builder()
                //Fixed values for this layer:
                .nIn(784)  //Fixed input: 28x28=784 pixels for MNIST
                .activation(Activation.LEAKYRELU)
                //One hyperparameter to infer: layer size
                .nOut(layerSizeHyperparam)
                .build())
        .addLayer( new OutputLayerSpace.Builder()
                .nOut(10)
                .activation(Activation.SOFTMAX)
                .lossFunction(LossFunctions.LossFunction.MCXENT)
                .build())
        .build()
        
    val candidateGenerator = new RandomSearchGenerator(hyperparameterSpace, null)   
    val nTrainEpochs = 2
    val batchSize = 64
    
    val dataProvider = new MnistDataProvider(nTrainEpochs, batchSize)
    val scoreFunction = new EvaluationScoreFunction(Metric.ACCURACY)
    val terminationConditions = new MaxTimeCondition(15, TimeUnit.MINUTES)
    val baseSaveDirectory = "arbiterExample/"
    val f = new File(baseSaveDirectory)
    if(f.exists()) f.delete()
    f.mkdir()
    val modelSaver = new FileModelSaver(baseSaveDirectory)
    val configuration = new OptimizationConfiguration.Builder()
                    .candidateGenerator(candidateGenerator)
                    .dataProvider(dataProvider)
                    .modelSaver(modelSaver)
                    .scoreFunction(scoreFunction)
                    .terminationConditions(terminationConditions)
                    .build()
    
    val runner = new LocalOptimizationRunner(configuration, new MultiLayerNetworkTaskCreator())
    
    //Start the hyperparameter optimization
    
    runner.execute()
    val s = "Best score: " + runner.bestScore() + "\n" + "Index of model with best score: " + runner.bestScoreCandidateIndex() + "\n" + "Number of configurations evaluated: " + runner.numCandidatesCompleted() + "\n"
    println(s)
    
    
    //Get all results, and print out details of the best result:
    val indexOfBestResult = runner.bestScoreCandidateIndex()
    val allResults = runner.getResults()
    
    val bestResult = allResults.get(indexOfBestResult).getResult()
    val bestModel = bestResult.asInstanceOf[MultiLayerNetwork]
    
    
    println("\n\nConfiguration of best model:\n")
    println(bestModel.getLayerWiseConfigurations().toJson())
    Overview of Computation Graph

    DL4J has two types of networks comprised of multiple layers:

    • The MultiLayerNetwork, which is essentially a stack of neural network layers (with a single input layer and single output layer), and

    • The ComputationGraph, which allows for greater freedom in network architectures

    Specifically, the ComputationGraph allows for networks to be built with the following features:

    • Multiple network input arrays

    • Multiple network outputs (including mixed classification/regression architectures)

    • Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)

    As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure: MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well - though the configuration may be a little more complicated.

    Computation Graph: Some Example Use Cases

    Examples of some architectures that can be built using ComputationGraph include:

    • Multi-task learning architectures

    • Recurrent neural networks with skip connections

    • GoogLeNet, a complex type of convolutional neural network for image classification

    Configuring a Computation Graph

    Types of Graph Vertices

    The basic idea is that in the ComputationGraph, the core building block is the GraphVertex, instead of layers. Layers (or, more accurately, the LayerVertex objects) are but one type of vertex in the graph. Other types of vertices include:

    • Input Vertices

    • Element-wise operation vertices

    • Merge vertices

    • Subset vertices

    • Preprocessor vertices

    These types of graph vertices are described briefly below.

    LayerVertex: Layer vertices (graph vertices with neural network layers) are added using the .addLayer(String,Layer,String...) method. The first argument is the label for the layer, and the last arguments are the inputs to that layer. If you need to manually add an InputPreProcessor (usually this is unnecessary - see next section) you can use the .addLayer(String,Layer,InputPreProcessor,String...) method.

    InputVertex: Input vertices are specified by the addInputs(String...) method in your configuration. The strings used as inputs can be arbitrary - they are user-defined labels, and can be referenced later in the configuration. The number of strings provided defines the number of inputs; the order of the inputs also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).

    ElementWiseVertex: Element-wise operation vertices perform, for example, an element-wise addition or subtraction of the activations out of one or more other vertices. Thus, the activations used as input for the ElementWiseVertex must all be the same size, and the output size of the element-wise vertex is the same as the size of its inputs.

    MergeVertex: The MergeVertex concatenates/merges the input activations. For example, if a MergeVertex has 2 inputs of size 5 and 10 respectively, then output size will be 5+10=15 activations. For convolutional network activations, examples are merged along the depth: so suppose the activations from one layer have 4 features and the other has 5 features (both with (4 or 5) x width x height activations), then the output will have (4+5) x width x height activations.

    SubsetVertex: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use .addVertex("subset1", new SubsetVertex(0,4), "layer1"): this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.

    PreProcessorVertex: Occasionally, you might want to use the functionality of an InputPreProcessor without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.

    Finally, it is also possible to define custom graph vertices by implementing both a configuration and implementation class for your custom GraphVertex.

    Example 1: Recurrent Network with Skip Connections

    Suppose we wish to build the following recurrent neural network architecture:

    For the sake of this example, lets assume our input data is of size 5. Our configuration would be as follows:
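    A sketch of such a configuration, with input size 5 as assumed above (the updater choice and layer sizes here are illustrative, not fixed by the architecture):

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{LSTM, RnnOutputLayer}
import org.deeplearning4j.nn.graph.ComputationGraph
import org.nd4j.linalg.learning.config.Sgd
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

val conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input") // can use any label for this
    .addLayer("L1", new LSTM.Builder().nIn(5).nOut(5).build(), "input")
    // L2 receives both the original input (5) and L1's activations (5): nIn = 5 + 5
    .addLayer("L2", new RnnOutputLayer.Builder(LossFunction.MCXENT)
        .nIn(5 + 5).nOut(5).build(), "input", "L1")
    .setOutputs("L2") // we need to specify the network outputs and their order
    .build()

val net = new ComputationGraph(conf)
net.init()
```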

    Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.

    Example 2: Multiple Inputs and Merge Vertex

    Consider the following architecture:

    Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)) then the output size of the merge vertex is 4+4=8 activations.

    To build the above network, we use the following configuration:
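    One way this might look in code (the input size of 3 and hidden sizes of 4 are assumptions for illustration):

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.graph.MergeVertex
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.nd4j.linalg.learning.config.Sgd

val conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input1", "input2")
    .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
    .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
    // concatenate the activations of L1 and L2: 4 + 4 = 8 values per example
    .addVertex("merge", new MergeVertex(), "L1", "L2")
    .addLayer("out", new OutputLayer.Builder().nIn(4 + 4).nOut(3).build(), "merge")
    .setOutputs("out")
    .build()
```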

    Example 3: Multi-Task Learning

    In multi-task learning, a neural network is used to make multiple independent predictions. Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.

    In this case, the network configuration is:
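    A sketch of this multi-task configuration (the layer sizes, loss functions and updater are illustrative; out1 performs 3-class classification and out2 predicts 2 regression targets):

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.Sgd
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

val conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input")
    .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
    // classification output
    .addLayer("out1", new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(4).nOut(3).build(), "L1")
    // regression output
    .addLayer("out2", new OutputLayer.Builder(LossFunction.MSE)
        .activation(Activation.IDENTITY).nIn(4).nOut(2).build(), "L1")
    .setOutputs("out1", "out2")
    .build()
```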

    Automatically Adding PreProcessors and Calculating nIns

    One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the .setInputTypes(InputType...) method in the configuration.

    The setInputType method has two effects:

    1. It will automatically add any InputPreProcessors as required. InputPreProcessors are necessary to handle the interaction between, for example, fully connected (dense) and convolutional layers, or recurrent and fully connected layers.

    2. It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the setInputTypes(InputType...) functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If the .nIn(x) is specified for a layer, the network will not override this when using the InputType functionality.

    For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use .setInputTypes(InputType.convolutional(height,width,depth), InputType.feedForward(feedForwardInputSize)).

    Training Data for ComputationGraph

    There are two types of data that can be used with the ComputationGraph.

    DataSet and the DataSetIterator

    The DataSet class was originally designed for use with the MultiLayerNetwork; however, it can also be used with ComputationGraph - but only if that computation graph has a single input and output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).

    A DataSet object is basically a pair of INDArrays that hold your training data. In the case of RNNs, it may also include masking arrays (see this for more details). A DataSetIterator is essentially an iterator over DataSet objects.

    MultiDataSet and the MultiDataSetIterator

    MultiDataSet is the multiple input and/or multiple output version of DataSet. It may also include multiple mask arrays (for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator, unless you are dealing with multiple inputs and/or multiple outputs.

    There are currently two ways to use a MultiDataSetIterator:

    • By implementing the MultiDataSetIterator interface directly

    • By using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers

    The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:

    • Multiple DataVec RecordReaders may be used simultaneously

    • The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader

    • It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output

    • It is possible to convert single columns from a class index to a one-hot representation

    Some basic examples on how to use the RecordReaderMultiDataSetIterator follow. You might also find these unit tests to be useful.

    Example 1: Regression Data (RecordReaderMultiDataSetIterator)

    Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:
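    A sketch of that iterator; the file path and batch size are placeholders:

```scala
import java.io.File
import org.datavec.api.records.reader.impl.csv.CSVRecordReader
import org.datavec.api.split.FileSplit
import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator

val numLinesToSkip = 0
val fileDelimiter = ","
val rr = new CSVRecordReader(numLinesToSkip, fileDelimiter)
rr.initialize(new FileSplit(new File("/path/to/my/file.csv"))) // placeholder path

val batchSize = 4
val iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
    .addReader("myReader", rr)
    .addInput("myReader", 0, 2)  // input: columns 0 to 2 inclusive
    .addOutput("myReader", 3, 4) // output: columns 3 to 4 inclusive
    .build()
```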

    Example 2: Classification and Multi-Task Learning (RecordReaderMultiDataSetIterator)

    Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture, whereby we have two outputs - one for regression, and one for classification. For this example, let's assume the data is as follows:

    • Input file: myInput.csv, and we want to use all columns as input (without modification)

    • Output file: myOutput.csv.

      • Network output 1 - regression: columns 0 to 3

      • Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.

    In this case, we can build our iterator as follows:
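    A sketch of the iterator for this case (the batch size is arbitrary, and the files are assumed to sit in the working directory):

```scala
import java.io.File
import org.datavec.api.records.reader.impl.csv.CSVRecordReader
import org.datavec.api.split.FileSplit
import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator

val numLinesToSkip = 0
val fileDelimiter = ","

val featuresReader = new CSVRecordReader(numLinesToSkip, fileDelimiter)
featuresReader.initialize(new FileSplit(new File("myInput.csv")))
val labelsReader = new CSVRecordReader(numLinesToSkip, fileDelimiter)
labelsReader.initialize(new FileSplit(new File("myOutput.csv")))

val batchSize = 4
val numClasses = 3
val iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
    .addReader("csvInput", featuresReader)
    .addReader("csvLabels", labelsReader)
    .addInput("csvInput")                        // input: all columns from the input reader
    .addOutput("csvLabels", 0, 3)                // output 1 (regression): columns 0 to 3 inclusive
    .addOutputOneHot("csvLabels", 4, numClasses) // output 2 (classification): column 4 -> one-hot
    .build()
```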

    Basic Autoencoder

    Anomaly Detection Using Reconstruction Error

    Why use an autoencoder? In practice, autoencoders are often applied to data denoising and dimensionality reduction. This works great for representation learning and a little less great for data compression.

    In deep learning, an autoencoder is a neural network that “attempts” to reconstruct its input. It can serve as a form of feature extraction, and autoencoders can be stacked to create “deep” networks. Features generated by an autoencoder can be fed into other algorithms for classification, clustering, and anomaly detection.

    Autoencoders are also useful for data visualization when the raw input data has high dimensionality and cannot easily be plotted. By lowering the dimensionality, the output can sometimes be compressed into a 2D or 3D space for better data exploration.

    How do autoencoders work?

    Autoencoders are comprised of:

    1. Encoding function (the “encoder”)

    2. Decoding function (the “decoder”)

    3. Distance function (a “loss function”)

    An input is fed into the autoencoder and turned into a compressed representation. The decoder then learns how to reconstruct the original input from the compressed representation, where during an unsupervised training process, the loss function helps to correct the error produced by the decoder. This process is automatic (hence “auto”-encoder); i.e. it does not require human intervention.

    What does this tutorial teach?

    Now that you know how to create different network configurations with MultiLayerNetwork and ComputationGraph, we will construct a “stacked” autoencoder that performs anomaly detection on MNIST digits without pretraining. The goal is to identify outlier digits; i.e. digits that are unusual and atypical. Identification of items, events or observations that “stand out” from the norm of a given dataset is broadly known as anomaly detection. Anomaly detection does not require a labeled dataset, and can be undertaken with unsupervised learning, which is helpful because most of the world’s data is not labeled.

    This type of anomaly detection uses reconstruction error to measure how well the decoder is performing. Stereotypical examples should have low reconstruction error, whereas outliers should have high reconstruction error.

    What is anomaly detection good for?

    Network intrusion, fraud detection, systems monitoring, sensor network event detection (IoT), and unusual trajectory sensing are examples of anomaly detection applications.

    Imports

    The stacked autoencoder

    The following autoencoder uses two stacked dense layers for encoding. The MNIST digits are transformed into a flat 1D array of length 784 (MNIST images are 28x28 pixels, which equals 784 when you lay them end to end).

    784 → 250 → 10 → 250 → 784
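    A minimal sketch of such a 784 → 250 → 10 → 250 → 784 configuration (the seed, updater and activation choices here are illustrative):

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.AdaGrad
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

val conf = new NeuralNetConfiguration.Builder()
    .seed(12345)
    .weightInit(WeightInit.XAVIER)
    .updater(new AdaGrad(0.05))
    .activation(Activation.RELU)
    .list()
    .layer(0, new DenseLayer.Builder().nIn(784).nOut(250).build()) // encoder
    .layer(1, new DenseLayer.Builder().nIn(250).nOut(10).build())  // bottleneck
    .layer(2, new DenseLayer.Builder().nIn(10).nOut(250).build())  // decoder
    .layer(3, new OutputLayer.Builder(LossFunction.MSE)            // reconstruction error
        .nIn(250).nOut(784).build())
    .build()

val net = new MultiLayerNetwork(conf)
net.init()
```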

    Using the MNIST iterator

    The MNIST iterator, like most of Deeplearning4j’s built-in iterators, implements the DataSetIterator interface. This API allows for simple instantiation of datasets and the automatic downloading of data in the background.
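    For instance, train and test iterators can be created in one line each (the batch size and seed here are arbitrary):

```scala
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator

val batchSize = 100
val trainIter = new MnistDataSetIterator(batchSize, true, 12345)  // training set
val testIter = new MnistDataSetIterator(batchSize, false, 12345)  // test set
```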

    Unsupervised training

    Now that the network configuration is set up and instantiated along with our MNIST test/train iterators, training takes just a few lines of code. The fun begins.

    Earlier, we attached a ScoreIterationListener to the model by using the setListeners() method. Depending on the browser used to run this notebook, you can open the debugger/inspector to view listener output. This output is redirected to the console since the internals of Deeplearning4j use SLF4J for logging, and the output is being redirected by Zeppelin. This helps reduce clutter in the notebooks.

    Evaluating the model

    Now that the autoencoder has been trained, we’ll evaluate the model on the test data. Each example will be scored individually, and a map will be composed that relates each digit to a list of (score, example) pairs.

    Finally, we will calculate the N best and N worst scores per digit.

    Instacart Single Task Example

    This tutorial will be similar to the Instacart Multitask tutorial. The only difference is that we will not use multitasking to train our neural network. Recall the data originally comes from a Kaggle challenge (kaggle.com/c/instacart-market-basket-analysis). We removed users that only made 1 order using the instacart app and then took 5000 users out of the remaining to be part of the data for this tutorial.

    For each order, we have information on the product the user purchased. For example, there is information on the product name, what aisle it is found in, and the department it falls under. To construct features, we extracted indicators representing whether or not a user purchased a product in the given aisles for each order. In total there are 134 aisles. The targets were whether or not a user will buy a product in the breakfast department in the next order. As mentioned, we will not use any auxiliary targets.

    Because of temporal dependencies within the data, we used an LSTM network for our model.

    Imports

    Download Data

    To download the data, we will create a temporary directory that will store the data files, extract the tar.gz file from the url, and place it in the specified directory.

    We will then extract the data from the tar.gz file, recreate directories within the tar.gz file into our temporary directories, and copy the files from the tar.gz file.

    DataSetIterators

    Next we will convert the raw data (csv files) into DataSetIterators, which will be fed into a neural network. Our training data will have 4000 examples which will be represented by a single DataSetIterator, and the testing data will have 1000 examples which will be represented by a separate DataSetIterator.

    We first initialize CSVSequenceRecordReaders, which will parse the raw data into record-like format. Then the SequenceRecordReaderDataSetIterators can be created using the RecordReaders. Since each example has sequences of different lengths, an alignment mode of align end is needed.
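    A sketch of those steps; the directory layout under the temporary data path, the variable names, and the batch size are assumptions based on the description above:

```scala
import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader
import org.datavec.api.split.NumberedFileInputSplit
import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator

val path = dataPath.getAbsolutePath // temporary directory from the download step (hypothetical layout)
val batchSize = 32
val numClasses = 2 // will the user buy a breakfast product in the next order, or not

val trainFeatures = new CSVSequenceRecordReader(1, ",")
trainFeatures.initialize(new NumberedFileInputSplit(path + "/features/%d.csv", 1, 4000))
val trainLabels = new CSVSequenceRecordReader(1, ",")
trainLabels.initialize(new NumberedFileInputSplit(path + "/labels/%d.csv", 1, 4000))

// Sequences have different lengths, so align everything to the end of each sequence
val train = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels,
    batchSize, numClasses, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END)
```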

    Neural Network

    The next task is to set up the neural network configuration. We will use a MultiLayerNetwork and the configuration will be similar to the multitask model from before. Again we use one GravesLSTM layer but this time only one RnnOutputLayer.

    We must then initialize the neural network.

    Model Training

    To train the model, we use 5 epochs with a simple call to the fit method of the MultiLayerNetwork.

    Model Evaluation

    We will now evaluate our trained model. Note that we will use the area under the curve (AUC) metric of the ROC curve.
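    One way to compute the AUC is with the ROC class, iterating over the test set (a sketch; model and test stand for the network and test iterator created earlier):

```scala
import org.deeplearning4j.eval.ROC

val roc = new ROC(100) // 100 threshold steps for the ROC curve
while (test.hasNext) {
  val batch = test.next()
  val predicted = model.output(batch.getFeatures)
  roc.evalTimeSeries(batch.getLabels, predicted)
}
println("AUC: " + roc.calculateAUC())
```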

    We achieve an AUC of 0.64!

    Build from Source

    Instructions to build all DL4J libraries from source.

    Note: This guide is outdated. We are working on updating it as soon as possible.

    Build Locally from Master

    NOTE: MOST USERS SHOULD USE THE RELEASES ON MAVEN CENTRAL AS PER THE QUICK START GUIDE, AND NOT BUILD FROM SOURCE

    Unless you have a very good reason to build from source (such as developing new features - excluding custom layers, custom activation functions, custom loss functions, etc - all of which can be added without modifying DL4J directly) then you shouldn't build from source. Building from source can be quite complex, with no benefit in a lot of cases.

    For those developers and engineers who prefer to use the most up-to-date version of Deeplearning4j or fork and build their own version, these instructions will walk you through building and installing Deeplearning4j. The preferred installation destination is to your machine's local maven repository. If you are not using the master branch, you can modify these steps as needed (i.e.: switching GIT branches and modifying the build-dl4j-stack.sh script).

    Building locally requires that you build the entire Deeplearning4j stack which includes:

    Note that Deeplearning4j is designed to work on most platforms (Windows, OS X, and Linux) and also includes multiple "flavors" depending on the computing architecture you choose to utilize. This includes CPU (OpenBLAS, MKL, ATLAS) and GPU (CUDA). The DL4J stack also supports x86 and PowerPC architectures.

    Prerequisites

    Your local machine will require some essential software and environment variables set before you try to build and install the DL4J stack. Depending on your platform and the version of your operating system, the instructions may vary in getting them to work. This software includes:

    • git

    • cmake (3.2 or higher)

    • OpenMP

    • gcc (4.9 or higher)

    Architecture-specific software includes:

    CPU options:

    • Intel MKL

    • OpenBLAS

    • ATLAS

    GPU options:

    • CUDA

    IDE-specific requirements:

    • IntelliJ Lombok plugin

    DL4J testing dependencies:

    • dl4j-test-resources

    Installing Prerequisite Tools

    Linux

    Ubuntu Assuming you are using Ubuntu as your flavor of Linux and you are running as a non-root user, follow these steps to install prerequisite software:
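    For example, on a recent Ubuntu release the tools above can usually be installed with apt (package names may differ between releases; OpenMP support ships with gcc as libgomp):

```shell
sudo apt-get update
sudo apt-get install -y git cmake gcc g++ build-essential
```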

    OS X

    Homebrew is the accepted method of installing prerequisite software. Assuming you have Homebrew installed locally, follow these steps to install your necessary tools.

    First, before using Homebrew we need to ensure an up-to-date version of Xcode is installed (it is used as a primary compiler):

    Finally, install prerequisite tools:

    Note: You cannot use clang, nor can you use a newer version of gcc. If you have a newer version of gcc, please switch versions with

    Windows

    libnd4j depends on some Unix utilities for compilation, so in order to compile it you will need to install Msys2.

    After you have set up Msys2 by following its installation instructions, you will have to install some additional development packages. Start the msys2 shell and set up the dev environment with:

    This will install the needed dependencies for use in the msys2 shell.

    You will also need to set up your PATH environment variable to include C:\msys64\mingw64\bin (or wherever you have decided to install msys2). If you have IntelliJ (or another IDE) open, you will have to restart it before this change takes effect for applications started through it. If you don't, you will probably see a "Can't find dependent libraries" error.

    Installing Prerequisite Architectures

    Once you have installed the prerequisite tools, you can now install the required architectures for your platform.

    Intel MKL

    Of all the existing architectures available for CPU, Intel MKL is currently the fastest. However, it requires some "overhead" before you actually install it.

    1. Apply for a license at

    2. After a few steps through Intel, you will receive a download link

    3. Download and install Intel MKL using

    OpenBLAS

    Linux

    Ubuntu Assuming you are using Ubuntu, you can install OpenBLAS via:
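    For example, with apt (the package name may vary between Ubuntu releases):

```shell
sudo apt-get install -y libopenblas-dev
```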

    You will also need to ensure that /opt/OpenBLAS/lib (or any other home directory for OpenBLAS) is on your PATH. In order to get OpenBLAS to work with Apache Spark, you will also need to do the following:

    CentOS Enter the following in your terminal (or ssh session) as a root user:

    After that, you should see a lot of activity and installs on the terminal. To verify that you have, for example, gcc, enter this line:

    For more complete instructions, .

    OS X

    You can install OpenBLAS on OS X with Homebrew:
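    For example:

```shell
brew install openblas
```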

    Windows

    An OpenBLAS package is available for msys2. You can install it using the pacman command.
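    For example, from the msys2 shell (this package name assumes the 64-bit mingw toolchain):

```shell
pacman -S mingw-w64-x86_64-openblas
```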

    ATLAS

    Linux

    Ubuntu An apt package is available for ATLAS on Ubuntu:

    CentOS You can install ATLAS on CentOS using:

    OS X

    Installing ATLAS on OS X is a somewhat complicated and lengthy process. However, the following commands will work on most machines:

    CUDA

    Linux & OS X

    Detailed instructions for installing GPU architectures such as CUDA can be found .

    Windows

    The CUDA Backend has some additional requirements before it can be built:

    • (Please note: Visual Studio 2015 is NOT SUPPORTED by CUDA 7.5 and below)

    In order to build the CUDA backend you will have to set up some more environment variables first, by calling vcvars64.bat. But first, set the system environment variable SET_FULL_PATH to true, so that all of the variables that vcvars64.bat sets up are passed to the mingw shell.

    1. Inside a normal cmd.exe command prompt, run C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat

    2. Run c:\msys64\mingw64_shell.bat inside that

    3. Change to your libnd4j folder

    This builds the CUDA nd4j.dll.

    IDE Requirements

    If you are building Deeplearning4j through an IDE such as IntelliJ, you will need to install certain plugins to ensure your IDE renders code highlighting appropriately. You will need to install a plugin for Lombok:

    • IntelliJ Lombok Plugin:

    • Eclipse Lombok Plugin: Follow instructions at

    If you want to work on ScalNet, the Scala API, or on certain modules such as the DL4J UI, you will need to ensure your IDE has Scala support installed and available to you.

    Testing

Deeplearning4j uses a separate repository that contains all resources necessary for testing. This keeps the central DL4J repository lightweight and avoids large blobs in the Git history. To run the tests you need to install the test resources from https://github.com/KonduitAI/dl4j-test-resources (~10 GB). If you don't care about history, do a shallow clone only with:
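The clone command was lost in this export; the repository URL appears elsewhere in this guide, and the mvn install step is an assumption about how the resources are installed locally:

```shell
git clone --depth 1 https://github.com/KonduitAI/dl4j-test-resources.git
cd dl4j-test-resources
mvn install
```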

Tests will run only when the testresources and a backend profile (such as test-nd4j-native) are selected.

Running the tests will take a while. To run the tests of just a single Maven module you can add a module constraint with -pl deeplearning4j-core (for details see here).

    Installing the DL4J Stack

    OS X & Linux

    Checking ENV

Before running the DL4J stack build script, you must ensure certain environment variables are defined. These are outlined below depending on your architecture.

    LIBND4J_HOME

    You will need to know the exact path of the directory where you are running the DL4J build script (you are encouraged to use a clean empty directory). Otherwise, your build will fail. Once you determine this path, add /libnd4j to the end of that path and export it to your local environment. This will look like:
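The example block was lost in this export; a sketch, assuming a hypothetical build directory under $HOME (substitute the directory you actually build in):

```shell
# Hypothetical build directory; substitute your own path
export DL4J_BUILD_DIR=$HOME/dl4j-build
export LIBND4J_HOME=$DL4J_BUILD_DIR/libnd4j
echo "$LIBND4J_HOME"
```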

    CPU architecture w/ MKL

    You can link with MKL either at build time, or at runtime with binaries initially linked with another BLAS implementation such as OpenBLAS. To build against MKL, simply add the path containing libmkl_rt.so (or mkl_rt.dll on Windows), say /path/to/intel64/lib/, to the LD_LIBRARY_PATH environment variable on Linux (or PATH on Windows) and build like before. On Linux though, to make sure it uses the correct version of OpenMP, we also might need to set these environment variables:
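The variable block was lost in this export. A sketch of the typical settings (the exact values, and the libgomp path in particular, are assumptions that vary by distribution):

```shell
# Use the GNU OpenMP runtime with MKL
export MKL_THREADING_LAYER=GNU
# Preload GNU OpenMP; the path varies by distribution
export LD_PRELOAD=/usr/lib64/libgomp.so.1
```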

When libnd4j cannot be rebuilt, we can use the MKL libraries after the fact and get them loaded instead of OpenBLAS at runtime, but things are a bit trickier. Please additionally follow the instructions below.

1. Make sure that files such as /lib64/libopenblas.so.0 and /lib64/libblas.so.3 are not available (or appear later in the PATH on Windows), or they will get loaded by libnd4j by their absolute paths, before anything else.

2. Inside /path/to/intel64/lib/, create a symbolic link or copy of libmkl_rt.so (or mkl_rt.dll on Windows) to the name that libnd4j expects to load, for example:
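The example block was lost in this export; a sketch, where /path/to/intel64/lib is the hypothetical MKL directory from the text:

```shell
cd /path/to/intel64/lib
ln -s libmkl_rt.so libopenblas.so.0
ln -s libmkl_rt.so libblas.so.3
```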

3. Finally, add /path/to/intel64/lib/ to the LD_LIBRARY_PATH environment variable (or early in the PATH on Windows) and run your Java application as usual.

    Building Manually

    If you prefer, you can build each piece in the DL4J stack by hand. The procedure for each piece of software is essentially:

    1. Git clone

    2. Build

    3. Install

    The overall procedure looks like the following commands below, with the exception that libnd4j's ./buildnativeoperations.sh accepts parameters based on the backend you are building for. You need to follow these instructions in the order they're given. If you don't, you'll run into errors. The GPU-specific instructions below have been commented out, but should be substituted for the CPU-specific commands when building for a GPU backend.

    Using Local Dependencies

Once you've installed the DL4J stack to your local Maven repository, you can include it in your build tool's dependencies. Follow the typical instructions for Deeplearning4j, and appropriately replace versions with the SNAPSHOT version currently on the master POM.

    Note that some build tools such as Gradle and SBT don't properly pull in platform-specific binaries. You can follow instructions for setting up your favorite build tool.

    Support and Assistance

If you encounter issues while building locally, please reach out on our community forums for help.

    Syntax

    For the complete nd4j-api index, please consult the Javadocarrow-up-right.

    There are three types of operations used in ND4J: scalars, transforms and accumulations. We’ll use the word op synonymously with operation.

    Most of the ops just take enumsarrow-up-right, or a list of discrete values that you can autocomplete. Activation functions are the exception, because they take strings such as "relu" or "tanh".

Scalars, transforms and accumulations each have their own patterns. Transforms are the simplest, since they take a single argument and perform an operation on it. Absolute value is a transform that takes the argument x, like abs(IComplexNDArray ndarray), and produces the absolute value of x. Similarly, you would apply the sigmoid transform sigmoid() to x to produce the "sigmoid of x".

    Scalars just take two arguments: the input and the scalar to be applied to that input. For example, ScalarAdd() takes two arguments: the input INDArray x and the scalar Number num; i.e. ScalarAdd(INDArray x, Number num). The same format applies to every Scalar op.

Finally, we have accumulations, which are also known as reductions in GPU-land. Accumulations add arrays and vectors to one another and can reduce the dimensions of those arrays in the result by adding their elements in a rowwise op. For example, summing over the rows of the array

[[1, 2],
 [3, 4]]

gives us the vector

[3, 7]

reducing the columns (i.e. dimensions) from two to one.

    Accumulations can be either pairwise or scalar. In a pairwise reduction, we might be dealing with two arrays, x and y, which have the same shape. In that case, we could calculate the cosine similarity of x and y by taking their elements two by two.

    Or take EuclideanDistance(arr, arr2), a reduction between one array arr and another arr2.

    Many ND4J ops are overloaded, meaning methods sharing a common name have different argument lists. Below we will explain only the simplest configurations.

As you can see, there are three possible argument types with ND4J ops: inputs, optional arguments and outputs. The outputs are specified in the op's constructor. The inputs are specified in the parentheses following the method name, always in the first position; the optional arguments, such as the scalar to add or the coefficient to multiply by, follow in the second position and are used to transform the inputs.

For other transforms, please see this page.

    Here are two examples of performing z = tanh(x), in which the original array x is unmodified.
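Both example blocks were lost in this export. A sketch of the two styles, using Transforms.tanh with an explicit copy flag and the executioner form quoted later on this page (signatures per the ND4J release this guide documents; check your version):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

INDArray x = Nd4j.create(new double[]{0.0, 0.5, 1.0});

// Style 1: helper method; the 'true' flag requests a copy, so x is unmodified
INDArray z = Transforms.tanh(x, true);

// Style 2: executioner form; it operates on the array passed in,
// so hand it a duplicate to leave x untouched
INDArray z2 = x.dup();
Nd4j.getExecutioner().exec(Nd4j.getOpFactory().createTransform("tanh", z2));
```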

    The latter two examples above use ND4J’s basic convention for all ops, in which we have 3 NDArrays, x, y and z.

    Frequently, z = x (this is the default if you use a constructor with only one argument). But there are exceptions for situations like x = x + y. Another possibility is z = x + y, etc.

    Accumulations

Most accumulations are accessible directly via the INDArray interface.

    For example, to add up all elements of an NDArray:

    Accum along dimension example - i.e., sum values in each row:
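The example blocks were lost in this export; a sketch of both accumulations:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray arr = Nd4j.create(new double[][]{{1, 2}, {3, 4}});

// Sum of all elements
double total = arr.sumNumber().doubleValue();   // 10.0

// Sum along dimension 1, i.e. the values in each row
INDArray rowSums = arr.sum(1);                  // [3.0, 7.0]
```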

    Accumulations along dimensions generalize, so you can sum along two dimensions of any array with two or more dimensions.

    Subset Operations on Arrays

    A simple example:
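The example block was lost in this export; a sketch using NDArrayIndex, matching the interval described just below:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.NDArrayIndex;

INDArray nd = Nd4j.create(new double[]{1, 2, 3, 4, 5, 6}, new int[]{2, 3});

// View of row 0, columns 1 (inclusive) to 2 (exclusive)
INDArray sub = nd.get(NDArrayIndex.point(0), NDArrayIndex.interval(1, 2));

// dup() turns the view into an independent copy
INDArray copy = sub.dup();
```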

Interval is fromInclusive, toExclusive; note that you can equivalently use the inclusive version: NDArrayIndex.interval(1,2,true);

These are views of the underlying array, not copy operations (which provides greater flexibility and avoids the cost of copying).

To avoid in-place behaviour, use random.get(…).dup() to make a copy.

If you do not understand the explanation of ND4J's syntax, cannot find a definition for a method, or would like to request that a function be added, please let us know on the community forums.

    Variables

    What types of variables are used in SameDiff, their properties and how to switch these types.

    What are variables

All values defining or passing through each SameDiff instance - be it weights, biases, inputs, activations or general parameters - are handled by objects of the class SDVariable.

Note that by variables we normally mean not just single values - as in many online examples describing autodifferentiation - but rather whole multidimensional arrays of them.

    Variable types

    All variables in SameDiff belong to one of four variable types, constituting an enumeration VariableType. Here they are:

    • VARIABLE: are trainable parameters of your network, e.g. weights and bias of a layer. Naturally, we want them

      to be both stored for further usage - we say, that they are persistent - as well as being updated during training.

    • CONSTANT: are those parameters which, like variables, are persistent for the network, but are not being

      trained; they, however, may be changed externally by the user.

    To infer the type of a particular variable, you may use the method getVariableType, like so:
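The example block was lost in this export; a minimal sketch:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.VariableType;
import org.nd4j.linalg.api.buffer.DataType;

SameDiff sd = SameDiff.create();
SDVariable w = sd.var("w", DataType.FLOAT, 2, 2);

VariableType wType = w.getVariableType();   // VARIABLE
```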

The current value of a variable in the form of an INDArray may be obtained using getArr or getArr(true) - use the latter if you wish the program to throw an exception when the variable's value is not initialized.

    Data types

The data within each variable also has its data type, contained in the DataType enum. Currently DataType contains three floating point types: FLOAT, DOUBLE and HALF; four integer types: LONG, INT, SHORT and UBYTE; and one boolean type, BOOL - all of these will be referred to as numeric types. In addition, there is a string type dubbed UTF8.

To infer the data type of your variable, use its dataType() method.

    You may need to trace your variable's data type since at times it does matter, which types you use in an operation. For example, a convolution product, like this one

    will require its SDVariable arguments input and weights to be of one of the floating point data types, and will throw an exception otherwise. Also, as we shall discuss just below, all the SDVariables of type VARIABLE are supposed to be of floating point type.

    Common features of variables

Before we go to the differences between variables, let us first look at the properties they all share:

• All variables are ultimately derived from an instance of SameDiff, serving as parts of its graph. In fact, each variable has a SameDiff as one of its fields.

    • Results (outputs) of all operations are of ARRAY type.

    Differences between variable types

Let us now have a closer look at each type of variable, and what distinguishes them from each other.

    Variables

Variables are the trainable parameters of your network. This predetermines their nature in SameDiff. As we briefly mentioned above, variables' values need to be both preserved for application and updated during training. Training means that we iteratively update the values by small fractions of their gradients, which only makes sense if variables are of a floating point type (see data types above).

    Variables may be added to your SameDiff using different versions of var function from your SameDiff instance. For example, the code
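The code block was lost in this export; a sketch matching the description below:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

SameDiff samediff = SameDiff.create();
SDVariable weights = samediff.var("weights", DataType.FLOAT, 784, 10);
```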

adds a variable consisting of a 784x10 array of float numbers - the weights for a single-layer MNIST perceptron in this case - to a pre-existing SameDiff instance samediff.

However, this way the values within the variable will be set to zeros. You may also create a variable with values from a preset INDArray - for example, one filled with normally distributed randomly generated numbers with variance 1/28. You may use any other array creation method instead of nrand, or any preset array, of course. Also, you may use a popular initialization scheme such as Xavier, in which case the weights will be randomly initialized following that scheme. There are other ways to create and fill variables: you may look them up in the 'known subclasses' section of our javadoc.
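Both code blocks were lost in this export. A sketch of the two styles, using Nd4j.randn in place of the nrand helper referenced above, and the XavierInitScheme class from the javadoc:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.weightinit.impl.XavierInitScheme;

SameDiff samediff = SameDiff.create();

// From a preset INDArray: normally distributed values, scaled down by 28
SDVariable weights = samediff.var("weights", Nd4j.randn(784, 10).div(28));

// Or with an initialization scheme such as Xavier
SDVariable weights2 =
        samediff.var("weights2", new XavierInitScheme('c', 784, 10), DataType.FLOAT, 784, 10);
```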

    Constants

Constants hold values that are stored, but - unlike variables - remain unchanged during training. These, for instance, may be some hyperparameters you wish to have in your network and be able to access from the outside. Or they may be pretrained weights of a neural network that you wish to keep unchanged (see more on that in 'Changing variable types' below). Constants may be of any data type - so e.g. int and boolean are allowed alongside float and double.

In general, constants are added to SameDiff by means of constant methods. A constant may be created from an INDArray, like this:

    A constant consisting of a single scalar value may be created using one of the scalar methods:
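Both code blocks were lost in this export; a sketch of the two creation styles (the names "ones" and "half" are illustrative):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.factory.Nd4j;

SameDiff samediff = SameDiff.create();

// A constant created from an INDArray
SDVariable ones = samediff.constant("ones", Nd4j.ones(4));

// A constant consisting of a single scalar value
SDVariable half = samediff.scalar("half", 0.5f);
```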

    Placeholders

    The most common placeholders you'll normally have in a SameDiff are inputs and, when applicable, labels. You may create placeholders of any data type, depending on the operations you use them in. To add a placeholder to a SameDiff, you may call one of placeHolder methods, e.g. like that:
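The code block was lost in this export; a sketch matching the MNIST shape described below:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.VariableType;
import org.nd4j.linalg.api.buffer.DataType;

SameDiff samediff = SameDiff.create();
SDVariable input = samediff.placeHolder("input", DataType.FLOAT, -1, 784);
```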

as in the MNIST example. Here we specify the name, data type and shape of the placeholder - in this case, 28x28 grayscale pictures rendered as 1d vectors (therefore 784) coming in batches of a length we don't know beforehand (therefore -1).

    Arrays

Variables of ARRAY type appear as outputs of operations within SameDiff. Accordingly, the data type of an array-type variable depends on the kind of operation it is produced by and the variable type(s) of its argument(s). Arrays are not persistent - they are one-time values that will be recalculated from scratch at the next step. However, unlike placeholders, gradients are computed for them, as those are needed to update the values of VARIABLEs.

There are as many ways array-type variables are created as there are operations, so you're better off focusing on our operations section, our javadoc and our examples.

    Recap table

    Let us summarize the main properties of variable types in one table:

We haven't discussed what 'Workspaces' means - if you do not know, do not worry: it is an internal technical term that basically describes how memory is managed internally.

    Changing variable types

You may change variable types as well. For now, there are three such options:

    Variable to constant

    At times - for instance if you perform transfer learning - you may wish to turn a variable into a constant. This is done like so:
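The code block was lost in this export; a minimal sketch:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.VariableType;
import org.nd4j.linalg.api.buffer.DataType;

SameDiff sd = SameDiff.create();
SDVariable someVariable = sd.var("w", DataType.FLOAT, 2, 2);

someVariable.convertToConstant();   // w will no longer be trained
```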

where someVariable is an instance of SDVariable of VARIABLE type. The variable someVariable will no longer be trained.

    Constant to variable

Conversely, constants - if they are of a floating point data type - may be converted to variables. So, for instance, if you wish your frozen weights to become trainable again:
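The code block was lost in this export; a minimal sketch:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.VariableType;
import org.nd4j.linalg.factory.Nd4j;

SameDiff sd = SameDiff.create();
SDVariable frozenWeights = sd.constant("w", Nd4j.rand(2, 2));   // float constant

frozenWeights.convertToVariable();   // trainable again
```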

    Placeholder to constant

Placeholders may be converted to constants as well - for instance, if you need to freeze one of the inputs. There are no restrictions on the data type; yet, since placeholder values are not persistent, their value should be set before you turn them into constants. This can be done as follows:
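The code block was lost in this export; a minimal sketch:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.VariableType;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

SameDiff sd = SameDiff.create();
SDVariable input = sd.placeHolder("input", DataType.FLOAT, 1, 4);

input.setArray(Nd4j.ones(DataType.FLOAT, 1, 4));   // value must be set first
input.convertToConstant();
```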

It is currently not possible to turn a constant back into a placeholder; we may consider adding this functionality if there is a need for it. In the meantime, if you wish to effectively freeze your placeholder but be able to use it again, consider supplying it with constant values rather than turning it into a constant.

    Variables' names and values

    Getting variables from SameDiff

Recall that every variable in an instance of SameDiff has its unique String name. Your SameDiff actually tracks your variables by their names, and allows you to retrieve them using the getVariable(String name) method.

    Consider the following line:

Here, in the function sub we have implicitly introduced a variable (of type ARRAY) that holds the result of the subtraction. By adding a name as one of the operation's arguments, we've secured the possibility to retrieve the variable from elsewhere: say, if you later need to infer the difference between the labels and the prediction as a vector, you may just write:
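Both code snippets were lost in this export; a self-contained sketch (the placeholder shapes and names are illustrative):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

SameDiff samediff = SameDiff.create();
SDVariable labels = samediff.placeHolder("labels", DataType.FLOAT, -1, 10);
SDVariable predicted = samediff.placeHolder("predicted", DataType.FLOAT, -1, 10);

// Naming the result of the operation...
SDVariable difference = labels.sub("difference", predicted);

// ...lets us retrieve it later by that name
SDVariable retrieved = samediff.getVariable("difference");
```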

    This becomes especially handy if your whole SameDiff instance is initialized elsewhere, and you still need to get hold of some of its variables - say, multiple outputs.

You can get and set the name of an SDVariable using the methods getVarName and setVarName respectively. When renaming, note that the variable's name must remain unique within its SameDiff.

    Getting variable's value

    You may retrieve any variable's current value as an INDArray using the method eval(). Note that for non-persistent variables, the value should first be set. For variables with gradients, the gradient's value may also be inferred using the method getGradient.

    Supported Features

    Supported Keras features.

    Keras Model Import: Supported Features

    While not every concept in DL4J has an equivalent in Keras and vice versa, many of the key concepts can be matched. Importing keras models into DL4J is done in our deeplearning4j-modelimportarrow-up-right module. Below is a comprehensive list of currently supported features.


Layers

Mapping Keras to DL4J layers is done in the layers sub-module of model import. The structure of this project loosely reflects the structure of Keras.

Core Layers

• ✅ Dense

• ✅ Activation

• ✅ Dropout

• ✅ Flatten

Convolutional Layers

• ✅ Conv1D

• ✅ Conv2D

• ✅ Conv3D

Pooling Layers

• ✅ MaxPooling1D

• ✅ MaxPooling2D

• ✅ MaxPooling3D

• ✅ AveragePooling1D

Locally-connected Layers

• ✅ LocallyConnected1D

• ✅ LocallyConnected2D

Recurrent Layers

• ✅ SimpleRNN

• ❌ GRU

• ✅ LSTM

• ❌ ConvLSTM2D

Embedding Layers

• ✅ Embedding

Merge Layers

• ✅ Add / add

• ✅ Multiply / multiply

• ✅ Subtract / subtract

• ✅ Average / average

Advanced Activation Layers

• ✅ LeakyReLU

• ✅ PReLU

• ✅ ELU

• ✅ ThresholdedReLU

Normalization Layers

• ✅ BatchNormalization

Noise Layers

• ✅ GaussianNoise

• ✅ GaussianDropout

• ✅ AlphaDropout

Layer Wrappers

• ❌ TimeDistributed

• ✅ Bidirectional

Losses

• ✅ mean_squared_error

• ✅ mean_absolute_error

• ✅ mean_absolute_percentage_error

• ✅ mean_squared_logarithmic_error

Activations

• ✅ softmax

• ✅ elu

• ✅ selu

• ✅ softplus

Initializers

• ✅ Zeros

• ✅ Ones

• ✅ Constant

• ✅ RandomNormal

Regularizers

• ✅ l1

• ✅ l2

• ✅ l1_l2

Constraints

• ✅ max_norm

• ✅ non_neg

• ✅ unit_norm

• ✅ min_max_norm

Optimizers

• ✅ SGD

• ✅ RMSprop

• ✅ Adagrad

• ✅ Adadelta

    Linalg

    Cholesky

    Computes the Cholesky decomposition of one or more square matrices.

    • input (NUMERIC) - Input tensor with inner-most 2 dimensions forming square matrices

    Image

    CropAndResize

    Given an input image and some crop boxes, extract out the image subsets and resize them to the specified size.

    • image (NUMERIC) - Input image, with shape [batch, height, width, channels]

    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input") //can use any label for this
        .addLayer("L1", new GravesLSTM.Builder().nIn(5).nOut(5).build(), "input")
        .addLayer("L2",new RnnOutputLayer.Builder().nIn(5+5).nOut(5).build(), "input", "L1")
        .setOutputs("L2")    //We need to specify the network outputs and their order
        .build();
    
    ComputationGraph net = new ComputationGraph(conf);
    net.init();
    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input1", "input2")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
        .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
        .addVertex("merge", new MergeVertex(), "L1", "L2")
        .addLayer("out", new OutputLayer.Builder().nIn(4+4).nOut(3).build(), "merge")
        .setOutputs("out")
        .build();
    ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.01))
            .graphBuilder()
            .addInputs("input")
            .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
            .addLayer("out1", new OutputLayer.Builder()
                    .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nIn(4).nOut(3).build(), "L1")
            .addLayer("out2", new OutputLayer.Builder()
                    .lossFunction(LossFunctions.LossFunction.MSE)
                    .nIn(4).nOut(2).build(), "L1")
            .setOutputs("out1","out2")
            .build();
    int numLinesToSkip = 0;
    String fileDelimiter = ",";
    RecordReader rr = new CSVRecordReader(numLinesToSkip,fileDelimiter);
    String csvPath = "/path/to/my/file.csv";
    rr.initialize(new FileSplit(new File(csvPath)));
    
    int batchSize = 4;
    MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
            .addReader("myReader",rr)
            .addInput("myReader",0,2)  //Input: columns 0 to 2 inclusive
            .addOutput("myReader",3,4) //Output: columns 3 to 4 inclusive
            .build();
    int numLinesToSkip = 0;
    String fileDelimiter = ",";
    
    RecordReader featuresReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
    String featuresCsvPath = "/path/to/my/myInput.csv";
    featuresReader.initialize(new FileSplit(new File(featuresCsvPath)));
    
    RecordReader labelsReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
    String labelsCsvPath = "/path/to/my/myOutput.csv";
    labelsReader.initialize(new FileSplit(new File(labelsCsvPath)));
    
    int batchSize = 4;
    int numClasses = 3;
    MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
            .addReader("csvInput", featuresReader)
            .addReader("csvLabels", labelsReader)
            .addInput("csvInput") //Input: all columns from input reader
            .addOutput("csvLabels", 0, 3) //Output 1: columns 0 to 3 inclusive
            .addOutputOneHot("csvLabels", 4, numClasses)   //Output 2: column 4 -> convert to one-hot for classification
            .build();

• ✅ Reshape

• ✅ Merge

• ✅ Permute

• ✅ RepeatVector

• ✅ Lambda

• ❌ ActivityRegularization

• ✅ Masking

• ✅ SpatialDropout1D

• ✅ SpatialDropout2D

• ✅ SpatialDropout3D

• ✅ AtrousConvolution1D

• ✅ AtrousConvolution2D

• ❌ SeparableConv1D

• ✅ SeparableConv2D

• ✅ Conv2DTranspose

• ❌ Conv3DTranspose

• ✅ Cropping1D

• ✅ Cropping2D

• ✅ Cropping3D

• ✅ UpSampling1D

• ✅ UpSampling2D

• ✅ UpSampling3D

• ✅ ZeroPadding1D

• ✅ ZeroPadding2D

• ✅ ZeroPadding3D

• ✅ AveragePooling2D

• ✅ AveragePooling3D

• ✅ GlobalMaxPooling1D

• ✅ GlobalMaxPooling2D

• ✅ GlobalMaxPooling3D

• ✅ GlobalAveragePooling1D

• ✅ GlobalAveragePooling2D

• ✅ GlobalAveragePooling3D

• ✅ Maximum / maximum

• ✅ Concatenate / concatenate

• ❌ Dot / dot

• ✅ squared_hinge

• ✅ hinge

• ✅ categorical_hinge

• ❌ logcosh

• ✅ categorical_crossentropy

• ✅ sparse_categorical_crossentropy

• ✅ binary_crossentropy

• ✅ kullback_leibler_divergence

• ✅ poisson

• ✅ cosine_proximity

• ✅ softsign

• ✅ relu

• ✅ tanh

• ✅ sigmoid

• ✅ hard_sigmoid

• ✅ linear

• ✅ RandomUniform

• ✅ TruncatedNormal

• ✅ VarianceScaling

• ✅ Orthogonal

• ✅ Identity

• ✅ lecun_uniform

• ✅ lecun_normal

• ✅ glorot_normal

• ✅ glorot_uniform

• ✅ he_normal

• ✅ he_uniform

• ✅ Adam

• ✅ Adamax

• ✅ Nadam

• ❌ TFOptimizer


Method - What it does

Transforms

ACos(INDArray x) - Trigonometric inverse cosine, elementwise. The inverse of cos such that, if y = cos(x), then x = ACos(y).

ASin(INDArray x) - Also known as arcsin. Inverse sine, elementwise.

ATan(INDArray x) - Trigonometric inverse tangent, elementwise. The inverse of tan, such that, if y = tan(x) then x = ATan(y).

Transforms.tanh(myArray) - Hyperbolic tangent: a sigmoidal function. This applies elementwise tanh in place.

Nd4j.getExecutioner().exec(Nd4j.getOpFactory().createTransform("tanh", myArray)) - Equivalent to the above.

Scalar

INDArray.add(number) - Returns the result of adding number to each entry of INDArray x, leaving x unchanged; e.g. myArray.add(2.0)

INDArray.addi(number) - Adds number to each entry of INDArray x in place.

ScalarAdd(INDArray x, Number num) - Returns the result of adding num to each entry of INDArray x.

ScalarDivision(INDArray x, Number num) - Returns the result of dividing each entry of INDArray x by num.

ScalarMax(INDArray x, Number num) - Compares each entry of INDArray x to num and returns the higher quantity.

ScalarMultiplication(INDArray x, Number num) - Returns the result of multiplying each entry of INDArray x by num.

ScalarReverseDivision(INDArray x, Number num) - Returns the result of dividing num by each element of INDArray x.

ScalarReverseSubtraction(INDArray x, Number num) - Returns the result of subtracting each entry of INDArray x from num.

ScalarSet(INDArray x, Number num) - Sets the value of each entry of INDArray x to num.

ScalarSubtraction(INDArray x, Number num) - Returns the result of subtracting num from each entry of INDArray x.

• PLACEHOLDER: stores temporary values that are to be supplied from the outside, like inputs and labels. Since new placeholder values are provided at each iteration, they are not stored: in other words, unlike VARIABLE and CONSTANT, PLACEHOLDER is not persistent.

• ARRAY: temporary values as well, representing outputs of operations within a SameDiff - for instance sums of vectors, activations of a layer, and many more. They are recalculated at each iteration and therefore, like PLACEHOLDER, are not persistent.

In addition to the numeric types and UTF8, there are two helper data types, COMPRESSED and UNKNOWN. The 16-bit floating point format BFLOAT16 and the unsigned integer types (UINT16, UINT32 and UINT64) will be available in 1.0.0-beta5.

• All SDVariables involved in an operation must belong to the same SameDiff.

• All variables may or may not be given names - in the latter case, a name is created automatically. Either way, the names must be unique. We shall come back to naming below.

Variable type   Trainable   Gradients   Persistent   Workspaces   Datatypes    Instantiated from

VARIABLE        Yes         Yes         Yes          Yes          Float only   Instance

CONSTANT        No          No          Yes          No           Any          Instance

PLACEHOLDER     No          No          No           No           Any          Instance

ARRAY           No          Yes         No           Yes          Any          Operations

    Lstsq

Solver for linear least squares problems.

    • matrix (NUMERIC) - input tensor

    • rhs (NUMERIC) - input tensor

    • l2_reguralizer - regularizer

    • fast - fast mode, defaults to True - default = true

    Lu

    Computes LU decomposition.

    • input (NUMERIC) - input tensor

    Matmul

Performs matrix multiplication on input tensors.

    • a (NUMERIC) - input tensor

    • b (NUMERIC) - input tensor

    MatrixBandPart

Copies a tensor, setting everything outside a central band in each innermost matrix to zero.

    • input (NUMERIC) - input tensor

    • minLower - lower diagonal count

    • maxUpper - upper diagonal count

    Qr

Computes the QR decomposition of the input matrix.

    • input (NUMERIC) - input tensor

    • full - full matrices mode - default = false

    Solve

    Solver for systems of linear equations.

    • matrix (NUMERIC) - input tensor

    • rhs (NUMERIC) - input tensor

    • adjoint - adjoint mode, defaults to False - default = false

    TriangularSolve

Solver for systems of linear equations with triangular coefficient matrices.

    • matrix (NUMERIC) - input tensor

    • rhs (NUMERIC) - input tensor

    • lower - defines whether innermost matrices in matrix are lower or upper triangular

    • adjoint - adjoint mode

    cross

    Computes pairwise cross product.

    • a (NUMERIC) -

    • b (NUMERIC) -

    diag

    Calculates diagonal tensor.

    • input (NUMERIC) -

    diag_part

    Calculates diagonal tensor.

    • input (NUMERIC) -

    logdet

    Calculates log of determinant.

    • input (NUMERIC) -

    mmul

    Matrix multiplication: out = mmul(x,y)

    Supports specifying transpose argument to perform operation such as mmul(a^T, b), etc.

    • x (NUMERIC) - First input variable

    • y (NUMERIC) - Second input variable

    • transposeX - Transpose x (first argument) - default = false

    • transposeY - Transpose y (second argument) - default = false

    • transposeZ - Transpose result array - default = false
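For instance, using the INDArray mmul convenience method:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray a = Nd4j.create(new double[][]{{1, 2}, {3, 4}});
INDArray b = Nd4j.create(new double[][]{{5, 6}, {7, 8}});

INDArray c = a.mmul(b);   // [[19, 22], [43, 50]]
```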

    svd

    Calculates singular value decomposition.

    • input (NUMERIC) -

    • fullUV -

    • computeUV -

    • switchNum - - default = 16

    tri

    An array with ones at and below the given diagonal and zeros elsewhere.

    • dataType - Data type - default = DataType.FLOAT

    • row -

    • column -

    • diagonal - - default = 0

    hashtag
    triu

    Upper triangle of an array. Returns a copy of the input tensor with the elements below the k-th diagonal zeroed.

    • input (NUMERIC) -

    • diag - - default = 0
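triu keeps element (i, j) exactly when j - i >= diag. A plain-Java sketch of that rule (the class name is illustrative, not the ND4J implementation):

```java
public class TriuDemo {
    // Zero every element below the diag-th diagonal (keep where j - i >= diag).
    public static double[][] triu(double[][] in, int diag) {
        double[][] out = new double[in.length][in[0].length];
        for (int i = 0; i < in.length; i++)
            for (int j = 0; j < in[0].length; j++)
                out[i][j] = (j - i >= diag) ? in[i][j] : 0.0;
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        System.out.println(java.util.Arrays.deepToString(triu(m, 0)));
        // [[1.0, 2.0, 3.0], [0.0, 5.0, 6.0], [0.0, 0.0, 9.0]]
    }
}
```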

    import org.apache.commons.lang3.tuple.ImmutablePair
    import org.apache.commons.lang3.tuple.Pair
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration
    import org.nd4j.linalg.learning.config.AdaGrad
    import org.deeplearning4j.nn.conf.layers.DenseLayer
    import org.deeplearning4j.nn.conf.layers.OutputLayer
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.deeplearning4j.optimize.api.IterationListener
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener
    import org.nd4j.linalg.api.ndarray.INDArray
    import org.nd4j.linalg.dataset.DataSet
    import org.nd4j.linalg.factory.Nd4j
    import org.nd4j.linalg.lossfunctions.LossFunctions
    import javax.swing._
    import java.awt._
    import java.awt.image.BufferedImage
    import java.util._
    import java.util
    
    import scala.collection.JavaConversions._
    val conf = new NeuralNetConfiguration.Builder()
        .seed(12345)
        .weightInit(WeightInit.XAVIER)
        .updater(new AdaGrad(0.05))
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .l2(0.0001)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
                .build())
        .layer(1, new DenseLayer.Builder().nIn(250).nOut(10)
                .build())
        .layer(2, new DenseLayer.Builder().nIn(10).nOut(250)
                .build())
        .layer(3, new OutputLayer.Builder().nIn(250).nOut(784)
                .lossFunction(LossFunctions.LossFunction.MSE)
                .build())
        .build()
    
    val net = new MultiLayerNetwork(conf)
    net.setListeners(new ScoreIterationListener(1))
    //Load data and split into training and testing sets. 40000 train, 10000 test
    val iter = new MnistDataSetIterator(100,50000,false)
    
    val featuresTrain = new util.ArrayList[INDArray]
    val featuresTest = new util.ArrayList[INDArray]
    val labelsTest = new util.ArrayList[INDArray]
    
    val rand = new util.Random(12345)
    
    while(iter.hasNext()){
        val next = iter.next()
        val split = next.splitTestAndTrain(80, rand)  //80/20 split (from miniBatch = 100)
        featuresTrain.add(split.getTrain().getFeatures())
        val dsTest = split.getTest()
        featuresTest.add(dsTest.getFeatures())
        val indexes = Nd4j.argMax(dsTest.getLabels(),1) //Convert from one-hot representation -> index
        labelsTest.add(indexes)
    }
    // the "simple" way to do multiple epochs is to wrap fit() in a loop
    val nEpochs = 30
    (1 to nEpochs).foreach{ epoch =>  
        featuresTrain.forEach( data => net.fit(data, data))
        println("Epoch " + epoch + " complete");
    }
    //Evaluate the model on the test data
    //Score each example in the test set separately
    //Compose a map that relates each digit to a list of (score, example) pairs
    //Then find N best and N worst scores per digit
    val listsByDigit = new util.HashMap[Integer, ArrayList[Pair[Double, INDArray]]]
    
    (0 to 9).foreach{ i => listsByDigit.put(i, new util.ArrayList[Pair[Double, INDArray]]) }
    
    (0 to featuresTest.size-1).foreach{ i =>
        val testData = featuresTest.get(i)
        val labels = labelsTest.get(i)
        
        (0 to testData.rows-1).foreach{ j =>
            val example = testData.getRow(j, true)
            val digit = labels.getDouble(j).toInt
            val score = net.score(new DataSet(example, example))
            // Add (score, example) pair to the appropriate list
            val digitAllPairs = listsByDigit.get(digit)
            digitAllPairs.add(new ImmutablePair[Double, INDArray](score, example))
        }
    }
    
    //Sort each list in the map by score
    val c = new Comparator[Pair[Double, INDArray]]() {
      override def compare(o1: Pair[Double, INDArray],
                           o2: Pair[Double, INDArray]): Int =
        java.lang.Double.compare(o1.getLeft, o2.getLeft)
    }
    
    listsByDigit.values().forEach(digitAllPairs => Collections.sort(digitAllPairs, c))
    
    //After sorting, select N best and N worst scores (by reconstruction error) for each digit, where N=5
    val best = new util.ArrayList[INDArray](50)
    val worst = new util.ArrayList[INDArray](50)
    
    (0 to 9).foreach{ i => 
        val list = listsByDigit.get(i)
        
        (0 to 4).foreach{ j=>
            best.add(list.get(j).getRight)
            worst.add(list.get(list.size - j - 1).getRight)
        }
    }
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.Updater;
    import org.deeplearning4j.nn.conf.layers.LSTM;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
    import org.deeplearning4j.nn.conf.GradientNormalization;
    import org.deeplearning4j.eval.ROC;
    import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
    import org.datavec.api.records.reader.SequenceRecordReader;
    import org.datavec.api.split.NumberedFileInputSplit;
    import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.learning.config.Adam;
    
    import java.io.File;
    import java.net.URL;
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import org.apache.commons.io.FilenameUtils;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    val DATA_URL = "https://dl4jdata.blob.core.windows.net/training/tutorials/instacart.tar.gz"
    val DATA_PATH = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), "dl4j_instacart/")
    val directory = new File(DATA_PATH)
    directory.mkdir() 
    
    val archivePath = DATA_PATH + "instacart.tar.gz"
    val archiveFile = new File(archivePath)
    val extractedPath = DATA_PATH + "instacart"
    val extractedFile = new File(extractedPath)

    FileUtils.copyURLToFile(new URL(DATA_URL), archiveFile)
    var fileCount = 0
    var dirCount = 0
    val BUFFER_SIZE = 4096
    val tais = new TarArchiveInputStream(new GzipCompressorInputStream( new BufferedInputStream( new FileInputStream(archivePath))))
    
    var entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    
    while(entry != null){
        if (entry.isDirectory()) {
            new File(DATA_PATH + entry.getName()).mkdirs()
            dirCount = dirCount + 1
            fileCount = 0
        }
        else {
            
            val data = new Array[scala.Byte](4 * BUFFER_SIZE)
    
            val fos = new FileOutputStream(DATA_PATH + entry.getName());
            val dest = new BufferedOutputStream(fos, BUFFER_SIZE);
            var count = tais.read(data, 0, BUFFER_SIZE)
            
            while (count != -1) {
                dest.write(data, 0, count)
                count = tais.read(data, 0, BUFFER_SIZE)
            }
            
            dest.close()
            fileCount = fileCount + 1
        }
        if(fileCount % 1000 == 0){
            print(".")
        }
        
        entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    }
    val path = FilenameUtils.concat(DATA_PATH, "instacart/") // set parent directory
    
    val featureBaseDir = FilenameUtils.concat(path, "features") // set feature directory
    val targetsBaseDir = FilenameUtils.concat(path, "breakfast") // set label directory
    val trainFeatures = new CSVSequenceRecordReader(1, ",");
    trainFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 1, 4000));
    val trainLabels = new CSVSequenceRecordReader(1, " ");
    trainLabels.initialize(new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 1, 4000));
    
    val train = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels, 32,
        2, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
    
    val testFeatures = new CSVSequenceRecordReader(1, ",");
    testFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 4001, 5000));
    val testLabels = new CSVSequenceRecordReader(1, " ");
    testLabels.initialize(new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 4001, 5000));
    
    val test = new SequenceRecordReaderDataSetIterator(testFeatures, testLabels, 32,
        2, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
    val conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .seed(12345)
        .dropOut(0.25)
        .weightInit(WeightInit.XAVIER)
        .updater(new Adam())
        .list()
        .layer(0, new LSTM.Builder()
            .activation(Activation.TANH)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(10)
            .nIn(134)
            .nOut(150)
            .build())
        .layer(1, new RnnOutputLayer.Builder(LossFunction.XENT)
            .activation(Activation.SOFTMAX)
            .nIn(150)
            .nOut(2)
            .build())
    .build();
    val net = new MultiLayerNetwork(conf);
    net.init();
    net.fit( train , 5);
    // Evaluate the model
    
    val roc = new ROC(100);
    
    while(test.hasNext()){
        val next = test.next();
        val features = next.getFeatures();
        val output = net.output(features);
        roc.evalTimeSeries(next.getLabels(), output);
    }
    println(roc.calculateAUC());
    sudo apt-get purge maven maven2 maven3
    sudo add-apt-repository ppa:natecarlson/maven3
    sudo apt-get update
    sudo apt-get install maven build-essential cmake libgomp1
    xcode-select --install
    brew update
    brew install maven gcc5
    pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-cmake mingw-w64-x86_64-extra-cmake-modules make pkg-config grep sed gzip tar mingw64/mingw-w64-x86_64-openblas
    sudo apt-get install libopenblas-dev
    sudo cp libopenblas.so liblapack.so.3
    sudo cp libopenblas.so libblas.so.3
    yum groupinstall 'Development Tools'
    gcc --version
    brew install openblas
    sudo apt-get install libatlas-base-dev libatlas-dev
    sudo yum install atlas-devel
    wget --content-disposition https://sourceforge.net/projects/math-atlas/files/latest/download?source=files
    tar jxf atlas*.tar.bz2
    mkdir atlas   # create a directory for ATLAS
    mv ATLAS atlas/src-3.10.1
    cd atlas/src-3.10.1
    wget http://www.netlib.org/lapack/lapack-3.5.0.tgz   # the ATLAS download may already contain this file, in which case this command is not needed
    mkdir intel   # create a build directory
    cd intel
    cpufreq-selector -g performance   # requires root access; recommended but not essential
    ../configure --prefix=<path where you want ATLAS installed> --shared --with-netlib-lapack-tarfile=../lapack-3.5.0.tgz
    make
    make check
    make ptcheck
    make time
    make install
    git clone --depth 1 --branch master https://github.com/KonduitAI/dl4j-test-resources
    cd dl4j-test-resources
    mvn install
    mvn clean test -P  testresources,test-nd4j-native
    export LIBND4J_HOME=/home/user/directory/libnd4j
    export MKL_THREADING_LAYER=GNU
    export LD_PRELOAD=/lib64/libgomp.so.1
    ln -s libmkl_rt.so libopenblas.so.0
    ln -s libmkl_rt.so libblas.so.3
    copy mkl_rt.dll libopenblas.dll
    copy mkl_rt.dll libblas3.dll
    # remove any existing checkout to ensure a clean build
    rm -rf deeplearning4j

    # clone the monorepo (libnd4j, nd4j, datavec and deeplearning4j all live in it)
    git clone https://github.com/eclipse/deeplearning4j.git
    cd deeplearning4j

    # compile libnd4j
    cd libnd4j
    ./buildnativeoperations.sh
    # and/or when using GPU
    # ./buildnativeoperations.sh -c cuda -cc INSERT_YOUR_DEVICE_ARCH_HERE 
    # i.e. if you have a GTX 1070 device, use -cc 61
    export LIBND4J_HOME=`pwd`
    cd ..

    # build and install nd4j to maven locally
    cd nd4j
    # cross-build across Scala versions (recommended)
    bash buildmultiplescalaversions.sh clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'
    # or build for a single scala version
    # mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'
    # or when using GPU
    # mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-tests'
    cd ..

    # build and install datavec
    cd datavec
    if [ "$SCALAV" == "" ]; then
      bash buildmultiplescalaversions.sh clean install -DskipTests -Dmaven.javadoc.skip=true
    else
      mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Dscala.binary.version=$SCALAV -Dscala.version=$SCALA
    fi
    cd ..

    # build and install deeplearning4j
    cd deeplearning4j
    # cross-build across Scala versions (recommended)
    ./buildmultiplescalaversions.sh clean install -DskipTests -Dmaven.javadoc.skip=true
    # or build for a single scala version
    # mvn clean install -DskipTests -Dmaven.javadoc.skip=true
    # If you skipped CUDA you may need to add
    # -pl '!./deeplearning4j-cuda/'
    # to the mvn clean install command to prevent the build from looking for cuda libs
    cd ..
    [1 2
    3 4]
    [3
    7]
        cosineSim(x[i], y[i])
    INDArray x = Nd4j.rand(3,2);	//input
    INDArray z = Nd4j.create(3,2); //output
    Nd4j.getExecutioner().exec(new Tanh(x,z));
    Nd4j.getExecutioner().exec(Nd4j.getOpFactory().createTransform("tanh",x,z));
    x is input, always required
    y is (optional) input, only used in some ops (like CosineSimilarity, AddOp etc)
    z is output
    double sum = myArray.sumNumber().doubleValue();
    INDArray tenBy3 = Nd4j.ones(10,3);	//10 rows, 3 columns
    INDArray sumRows = tenBy3.sum(0);
    System.out.println(sumRows);	//Output: [ 10.00, 10.00, 10.00]
    INDArray random = Nd4j.rand(3, 3);
    System.out.println(random);
    [[0.93,0.32,0.18]
    [0.20,0.57,0.60]
    [0.96,0.65,0.75]]
    
    INDArray lastTwoRows = random.get(NDArrayIndex.interval(1,3),NDArrayIndex.all());
    System.out.println(lastTwoRows);
    [[0.20,0.57,0.60]
    [0.96,0.65,0.75]]
    
    INDArray twoValues = random.get(NDArrayIndex.point(1),NDArrayIndex.interval(0, 2));
    System.out.println(twoValues);
    [ 0.20, 0.57]
    twoValues.addi(5.0);
    System.out.println(twoValues);
    [ 5.20, 5.57]
    
    System.out.println(random);
    [[0.93,0.32,0.18]
    [5.20,5.57,0.60]
    [0.96,0.65,0.75]]
    VariableType varType = yourVariable.getVariableType();
    DataType dataType = yourVariable.dataType();
    SDVariable prod = samediff.cnn.conv1d(input, weights, config);
    SDVariable weights = samediff.var("weights", DataType.FLOAT, 784, 10);
    SDVariable weights = samediff.var("weights", Nd4j.randn(784, 10).div(28));
    SDVariable weights = samediff.var("weights", new XavierInitScheme('c', 784, 10), DataType.FLOAT, 784, 10);
    SDVariable constant = samediff.constant("constants", Nd4j.create(new float[] {3.1415f, 42f}));
    SDVariable someScalar = samediff.scalar("scalar", 42);
    SDVariable in = samediff.placeHolder("input", DataType.FLOAT, -1, 784);
    samediff.convertToConstant(someVariable);
    samediff.convertToVariable(frozenWeights); //not frozen any more
    placeHolder.setArray(someArray);
    samediff.convertToConstant(placeHolder);
    SDVariable regressionCost = weights.mmul(input).sub("regression_prediction", bias).squaredDifference(labels);
    SDVariable errorVector = samediff.getVariable("regression_prediction").sub(labels);
    INDArray Cholesky(INDArray input)
    
    SDVariable Cholesky(SDVariable input)
    SDVariable Cholesky(String name, SDVariable input)
    INDArray Lstsq(INDArray matrix, INDArray rhs, double l2_reguralizer, boolean fast)
    INDArray Lstsq(INDArray matrix, INDArray rhs, double l2_reguralizer)
    
    SDVariable Lstsq(SDVariable matrix, SDVariable rhs, double l2_reguralizer, boolean fast)
    SDVariable Lstsq(SDVariable matrix, SDVariable rhs, double l2_reguralizer)
    SDVariable Lstsq(String name, SDVariable matrix, SDVariable rhs, double l2_reguralizer, boolean fast)
    SDVariable Lstsq(String name, SDVariable matrix, SDVariable rhs, double l2_reguralizer)
    INDArray Lu(INDArray input)
    
    SDVariable Lu(SDVariable input)
    SDVariable Lu(String name, SDVariable input)
    INDArray Matmul(INDArray a, INDArray b)
    
    SDVariable Matmul(SDVariable a, SDVariable b)
    SDVariable Matmul(String name, SDVariable a, SDVariable b)
    INDArray[] MatrixBandPart(INDArray input, int minLower, int maxUpper)
    
    SDVariable[] MatrixBandPart(SDVariable input, int minLower, int maxUpper)
    SDVariable[] MatrixBandPart(String name, SDVariable input, int minLower, int maxUpper)
    INDArray[] Qr(INDArray input, boolean full)
    INDArray[] Qr(INDArray input)
    
    SDVariable[] Qr(SDVariable input, boolean full)
    SDVariable[] Qr(SDVariable input)
    SDVariable[] Qr(String name, SDVariable input, boolean full)
    SDVariable[] Qr(String name, SDVariable input)
    INDArray Solve(INDArray matrix, INDArray rhs, boolean adjoint)
    INDArray Solve(INDArray matrix, INDArray rhs)
    
    SDVariable Solve(SDVariable matrix, SDVariable rhs, boolean adjoint)
    SDVariable Solve(SDVariable matrix, SDVariable rhs)
    SDVariable Solve(String name, SDVariable matrix, SDVariable rhs, boolean adjoint)
    SDVariable Solve(String name, SDVariable matrix, SDVariable rhs)
    INDArray TriangularSolve(INDArray matrix, INDArray rhs, boolean lower, boolean adjoint)
    
    SDVariable TriangularSolve(SDVariable matrix, SDVariable rhs, boolean lower, boolean adjoint)
    SDVariable TriangularSolve(String name, SDVariable matrix, SDVariable rhs, boolean lower, boolean adjoint)
    INDArray cross(INDArray a, INDArray b)
    
    SDVariable cross(SDVariable a, SDVariable b)
    SDVariable cross(String name, SDVariable a, SDVariable b)
    INDArray diag(INDArray input)
    
    SDVariable diag(SDVariable input)
    SDVariable diag(String name, SDVariable input)
    INDArray diag_part(INDArray input)
    
    SDVariable diag_part(SDVariable input)
    SDVariable diag_part(String name, SDVariable input)
    INDArray logdet(INDArray input)
    
    SDVariable logdet(SDVariable input)
    SDVariable logdet(String name, SDVariable input)
    INDArray mmul(INDArray x, INDArray y, boolean transposeX, boolean transposeY, boolean transposeZ)
    INDArray mmul(INDArray x, INDArray y)
    
    SDVariable mmul(SDVariable x, SDVariable y, boolean transposeX, boolean transposeY, boolean transposeZ)
    SDVariable mmul(SDVariable x, SDVariable y)
    SDVariable mmul(String name, SDVariable x, SDVariable y, boolean transposeX, boolean transposeY, boolean transposeZ)
    SDVariable mmul(String name, SDVariable x, SDVariable y)
    INDArray svd(INDArray input, boolean fullUV, boolean computeUV, int switchNum)
    INDArray svd(INDArray input, boolean fullUV, boolean computeUV)
    
    SDVariable svd(SDVariable input, boolean fullUV, boolean computeUV, int switchNum)
    SDVariable svd(SDVariable input, boolean fullUV, boolean computeUV)
    SDVariable svd(String name, SDVariable input, boolean fullUV, boolean computeUV, int switchNum)
    SDVariable svd(String name, SDVariable input, boolean fullUV, boolean computeUV)
    INDArray tri(DataType dataType, int row, int column, int diagonal)
    INDArray tri(int row, int column)
    
    SDVariable tri(DataType dataType, int row, int column, int diagonal)
    SDVariable tri(int row, int column)
    SDVariable tri(String name, DataType dataType, int row, int column, int diagonal)
    SDVariable tri(String name, int row, int column)
    INDArray triu(INDArray input, int diag)
    INDArray triu(INDArray input)
    
    SDVariable triu(SDVariable input, int diag)
    SDVariable triu(SDVariable input)
    SDVariable triu(String name, SDVariable input, int diag)
    SDVariable triu(String name, SDVariable input)

  • cropBoxes (NUMERIC) - Float32 crop, shape [numBoxes, 4] with values in range 0 to 1

  • boxIndices (NUMERIC) - Indices: which image (index to dimension 0) the cropBoxes belong to. Rank 1, shape [numBoxes]

  • cropOutSize (INT) - Output size for the images - int32, rank 1 with values [outHeight, outWidth]

  • extrapolationValue - Used for extrapolation, when applicable. 0.0 should be used for the default - default = 0.0

    hashtag
    adjustContrast

    Adjusts contrast of RGB or grayscale images.

    • in (NUMERIC) - images to adjust. 3D shape or higher

    • factor - multiplier for adjusting contrast

    hashtag
    adjustHue

    Adjust hue of RGB image

    • in (NUMERIC) - image as 3D array

    • delta - value to add to hue channel

    hashtag
    adjustSaturation

    Adjust saturation of RGB images

    • in (NUMERIC) - RGB image as 3D array

    • factor - factor for saturation

    hashtag
    extractImagePatches

    Given an input image, extract out image patches (of size kSizes - h x w) and place them in the depth dimension.

    • image (NUMERIC) - Input image to extract image patches from - shape [batch, height, width, channels]

    • kSizes - Kernel size - size of the image patches, [height, width] (Size: Exactly(count=2))

    • strides - Stride in the input dimension for extracting image patches, [stride_height, stride_width] (Size: Exactly(count=2))

    • rates - Usually [1,1]. Equivalent to dilation rate in dilated convolutions - how far apart the output pixels

    • sameMode - Padding algorithm. If true: use Same padding
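To make the patch layout concrete, here is a plain-Java sketch that extracts every kH x kW patch of a single-channel image with the given strides (valid padding, rates of [1,1]) and flattens each patch, mirroring how the op stacks patch pixels in the depth dimension. Names are illustrative, not the ND4J implementation:

```java
public class PatchDemo {
    // Extract all kH x kW patches (valid padding, given strides) from a 2D image;
    // each patch is flattened into one row, one row per output position.
    public static double[][] patches(double[][] img, int kH, int kW, int sH, int sW) {
        int outH = (img.length - kH) / sH + 1;
        int outW = (img[0].length - kW) / sW + 1;
        double[][] out = new double[outH * outW][kH * kW];
        for (int i = 0; i < outH; i++)
            for (int j = 0; j < outW; j++)
                for (int a = 0; a < kH; a++)
                    for (int b = 0; b < kW; b++)
                        out[i * outW + j][a * kW + b] = img[i * sH + a][j * sW + b];
        return out;
    }

    public static void main(String[] args) {
        double[][] img = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // four 2x2 patches from a 3x3 image with stride 1
        System.out.println(java.util.Arrays.deepToString(patches(img, 2, 2, 1, 1)));
        // [[1.0, 2.0, 4.0, 5.0], [2.0, 3.0, 5.0, 6.0], [4.0, 5.0, 7.0, 8.0], [5.0, 6.0, 8.0, 9.0]]
    }
}
```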

    hashtag
    hsvToRgb

    Converting image from HSV to RGB format

    • input (NUMERIC) - 3D image

    hashtag
    imageResize

    Resize images to size using the specified method.

    • input (NUMERIC) - 4D image [NHWC]

    • size (INT) - new height and width

    • preserveAspectRatio - Whether to preserve the aspect ratio. If this is set, then images will be resized to a size that fits in size while preserving the aspect ratio of the original image. Scales up the image if size is bigger than the current size of the image. Defaults to False. - default = false

    • antialias - Whether to use an anti-aliasing filter when downsampling an image - default = false

    • ImageResizeMethod - ResizeBilinear: Bilinear interpolation. If 'antialias' is true, becomes a hat/tent filter function with radius 1 when downsampling.

      ResizeLanczos5: Lanczos kernel with radius 5. Very-high-quality filter but may have stronger ringing.

      ResizeBicubic: Cubic interpolant of Keys. Equivalent to Catmull-Rom kernel. Reasonably good quality and faster than Lanczos3Kernel, particularly when upsampling.

      ResizeGaussian: Gaussian kernel with radius 3, sigma = 1.5 / 3.0.

      ResizeNearest: Nearest neighbor interpolation. 'antialias' has no effect when used with nearest neighbor interpolation.

    hashtag
    nonMaxSuppression

    Greedily selects a subset of bounding boxes in descending order of score

    • boxes (NUMERIC) - bounding boxes, 2D array of shape [num_boxes, 4]

    • scores (NUMERIC) - vector of shape [num_boxes]

    • maxOutSize - scalar representing the maximum number of boxes to be selected

    • iouThreshold - threshold for deciding whether boxes overlap too much with respect to IOU

    • scoreThreshold - threshold for deciding when to remove boxes based on score
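The greedy loop behind the op: repeatedly take the highest-scoring remaining box and discard any box whose IOU with an already-kept box exceeds iouThreshold, stopping at maxOutSize boxes. A compact plain-Java sketch with boxes as [y1, x1, y2, x2] (illustrative, not the ND4J op):

```java
import java.util.ArrayList;
import java.util.List;

public class NmsDemo {
    // Intersection-over-union of two boxes given as [y1, x1, y2, x2].
    static double iou(double[] a, double[] b) {
        double yy1 = Math.max(a[0], b[0]), xx1 = Math.max(a[1], b[1]);
        double yy2 = Math.min(a[2], b[2]), xx2 = Math.min(a[3], b[3]);
        double inter = Math.max(0, yy2 - yy1) * Math.max(0, xx2 - xx1);
        double areaA = (a[2] - a[0]) * (a[3] - a[1]);
        double areaB = (b[2] - b[0]) * (b[3] - b[1]);
        return inter / (areaA + areaB - inter);
    }

    // Returns indices of kept boxes, highest score first.
    public static List<Integer> nms(double[][] boxes, double[] scores,
                                    int maxOutSize, double iouThreshold) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) order.add(i);
        order.sort((i, j) -> Double.compare(scores[j], scores[i]));
        List<Integer> keep = new ArrayList<>();
        for (int idx : order) {
            if (keep.size() >= maxOutSize) break;
            boolean suppressed = false;
            for (int k : keep)
                if (iou(boxes[idx], boxes[k]) > iouThreshold) { suppressed = true; break; }
            if (!suppressed) keep.add(idx);
        }
        return keep;
    }

    public static void main(String[] args) {
        double[][] boxes = {{0, 0, 1, 1}, {0, 0.05, 1, 1.05}, {2, 2, 3, 3}};
        double[] scores = {0.9, 0.8, 0.7};
        // box 1 heavily overlaps box 0 and is suppressed
        System.out.println(nms(boxes, scores, 10, 0.5)); // [0, 2]
    }
}
```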

    hashtag
    randomCrop

    Randomly crops image

    • input (NUMERIC) - input array

    • shape (INT) - shape for crop

    hashtag
    rgbToHsv

    Converting array from RGB to HSV format

    • input (NUMERIC) - 3D image

    hashtag
    rgbToYiq

    Converting array from RGB to YIQ format

    • input (NUMERIC) - 3D image

    hashtag
    rgbToYuv

    Converting array from RGB to YUV format

    • input (NUMERIC) - 3D image

    hashtag
    yiqToRgb

    Converting image from YIQ to RGB format

    • input (NUMERIC) - 3D image

    hashtag
    yuvToRgb

    Converting image from YUV to RGB format

    • input (NUMERIC) - 3D image

    Image caption generationarrow-up-right
    Convolutional networks for sentence classificationarrow-up-right
    Residual learning convolutional neural networksarrow-up-right

    Sea Temperature Convolutional LSTM

    In this tutorial we will use a neural network to forecast daily sea temperatures. The data consists of 2-dimensional temperature grids of 8 seas: the Bengal, Korean, Black, Mediterranean, Arabian, Japan, Bohai, and Okhotsk Seas, from 1981 to 2017. The raw data was taken from the Earth System Research Laboratory (https://www.esrl.noaa.gov/psd/) and preprocessed into CSV files. Each example consists of fifty 2-dimensional temperature grids, and every grid is represented by a single row in a CSV file. Thus, each sequence is represented by a CSV file with 50 rows.

    For this task, we will use a convolutional LSTM neural network to forecast next-day sea temperatures for a given sequence of temperature grids. Recall that a convolutional network is most often used for image data like the MNIST dataset (a dataset of handwritten digit images). A convolutional network is appropriate for this type of gridded data, since each point in the 2-dimensional grid is related to its neighboring points. Furthermore, the data is sequential, and each temperature grid is related to the previous grids. Because of these long- and short-term dependencies, an LSTM is a good fit for this task as well. For these two reasons, we will combine aspects of these two different neural network architectures into a single convolutional LSTM network.

    For more information on the convolutional LSTM network structure, see https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Ng_Beyond_Short_Snippets_2015_CVPR_paper.pdfarrow-up-right

    hashtag
    Imports

    hashtag
    Download Data

    To download the data, we will create a temporary directory that will store the data files, extract the tar.gz file from the url, and place it in the specified directory.

    We will then extract the data from the tar.gz file, recreate directories within the tar.gz file into our temporary directories, and copy the files from the tar.gz file.

    hashtag
    DataSetIterators

    Next we will convert the raw data (CSV files) into DataSetIterators, which will be fed into a neural network. Our training data will have 1700 examples represented by a single DataSetIterator, and the testing data will have 404 examples represented by a separate DataSetIterator.

    We first initialize CSVSequenceRecordReaders, which will parse the raw data into record-like format. Then the SequenceRecordReaderDataSetIterators can be created using the RecordReaders. Since each example has exactly 50 timesteps, an alignment mode of equal length is needed. Note also that this is a regression-based task and not a classification one.

    hashtag
    Neural Network

    The next task is to initialize the parameters for the convolutional LSTM neural network and then set up the neural network configuration.

    In the neural network configuration we will use a convolutional layer, an LSTM layer, and an output layer in succession. In order to do this, we need to use the RnnToCnnPreProcessor and CnnToRnnPreProcessor. The RnnToCnnPreProcessor is used to reshape the 3-dimensional input from [batch size, height x width of grid, time series length] into a 4-dimensional shape [number of examples x time series length, channels, width, height] which is suitable as input to a convolutional layer. The CnnToRnnPreProcessor is then used in a later layer to convert this convolutional shape back to the original 3-dimensional shape.

    hashtag
    Model Training

    To train the model, we use 25 epochs and simply call the fit method of the MultiLayerNetwork.

    hashtag
    Model Evaluation

    We will now evaluate our trained model. Note that we will use RegressionEvaluation, since our task is regression rather than classification.

    Model Persistence

    Saving and loading of neural networks.

    MultiLayerNetwork and ComputationGraph both have save and load methods.

    You can save/load a MultiLayerNetwork using:

    Similarly, you can save/load a ComputationGraph using:

    Internally, these methods use the ModelSerializer class, which handles loading and saving models. The linked examples show two ways of saving a model: the first saves a normal multi-layer network, the second saves a computation graph.

    Here is a basic examplearrow-up-right with code to save a computation graph using the ModelSerializer class, as well as an example of using ModelSerializer to save a neural net built using MultiLayer configuration.

    hashtag
    RNG Seed

    If your model uses randomness (i.e. DropOut/DropConnect), it may make sense to save the RNG seed separately and re-apply it after the model is restored; i.e:

    This will guarantee equal results between sessions/JVMs.

    hashtag
    ModelSerializer

    Utility class suited to save/restore neural net models

    writeModel

    Write a model to a file

    • param model the model to write

    • param file the file to write to

    • param saveUpdater whether to save the updater or not

    writeModel

    Write a model to a file

    • param model the model to write

    • param file the file to write to

    • param saveUpdater whether to save the updater or not

    writeModel

    Write a model to a file path

    • param model the model to write

    • param path the path to write to

    • param saveUpdater whether to save the updater or not

    writeModel

    Write a model to an output stream

    • param model the model to save

    • param stream the output stream to write to

    • param saveUpdater whether to save the updater for the model or not

    writeModel

    Write a model to an output stream

    • param model the model to save

    • param stream the output stream to write to

    • param saveUpdater whether to save the updater for the model or not

    restoreMultiLayerNetwork

    Load a multi layer network from a file

    • param file the file to load from

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Load a multi layer network from a file

    • param file the file to load from

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Load a MultiLayerNetwork from an input stream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is the inputstream to load from

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Restore a multi layer network from an input stream Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is the input stream to restore from

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Load a MultilayerNetwork model from a file

    • param path path to the model file

    • return the loaded multi layer network

    • throws IOException

    restoreMultiLayerNetwork

    Load a MultilayerNetwork model from a file

    • param path path to the model file

    • return the loaded multi layer network

    • throws IOException

    restoreComputationGraph

Restore a ComputationGraph and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is Input stream to read from

    • param loadUpdater Whether to load the updater from the model or not

    • return Model and normalizer, if present

    restoreComputationGraph

    Load a computation graph from a file

    • param path path to the model file, to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

Load a computation graph from an InputStream

    • param is the inputstream to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

Load a computation graph from an InputStream

    • param is the inputstream to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

    Load a computation graph from a file

    • param file the file to get the computation graph from

    • return the loaded computation graph

    • throws IOException

    restoreComputationGraph

    Restore a ComputationGraph and Normalizer (if present - null if not) from the InputStream. Note: the input stream is read fully and closed by this method. Consequently, the input stream cannot be re-used.

    • param is Input stream to read from

    • param loadUpdater Whether to load the updater from the model or not

    • return Model and normalizer, if present

    taskByModel

    • param model

    • return

    addNormalizerToModel

    This method appends a normalizer to a given persisted model.

    PLEASE NOTE: the file should be a model file saved earlier with ModelSerializer

    • param f

    • param normalizer

    addObjectToFile

    Add an object to the (already existing) model file using Java Object Serialization. Objects can be restored using {- link #getObjectFromFile(File, String)}

    • param f File to add the object to

    • param key Key to store the object under

    • param o Object to store using Java object serialization

    Ops

What kinds of operations there are in `SameDiff` and how to use them

    Operations in SameDiff work mostly the way you'd expect them to. You take variables - in our framework, those are objects of type SDVariable - apply operations to them, and thus produce new variables. Before we proceed to the overview of the available operations, let us list some of their common properties.

    hashtag
    Common properties of operations

    • Variables of any variable type may be used in any operation, as long as their data types match those that are

      required by the operation (again, see our section on variable types for details). Most

      often an operation will require its SDVariable to have a floating point data type.

    • Variables created by operations have ARRAY variable type.

    • For all operations, you may define a String name of your resulting variable, although for most operations this

      is not obligatory. The name goes as the first argument in each operation, like so:

      Named variables may be accessed from outside using the SameDiff method getVariable(String name). For the code above, this method will allow you to infer the value of both output and the result of the mmul operation.
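A sketch of naming and retrieving variables (the softmax/mmul combination and the shapes are illustrative):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class NamedOpsSketch {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable input = sd.var("input", DataType.FLOAT, 2, 3);
        SDVariable weights = sd.var("weights", DataType.FLOAT, 3, 2);

        // The String name goes as the first argument of each operation
        SDVariable output = sd.nn.softmax("output", input.mmul("mmul_result", weights));

        // Named variables may later be accessed from outside the graph
        SDVariable retrieved = sd.getVariable("output");
    }
}
```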

    hashtag
    Overview of operations

The number of currently available operations, including overloads, totals several hundred. They range in complexity from simple additions and multiplications, through producing outputs of convolutional layers, to the creation of dedicated recurrent neural network modules, and much more. The sheer number of operations would make it cumbersome to list them all on a single page. So, if you are already looking for something specific, you'll be better off checking our API documentation, which already contains detailed information on each operation, or simply browsing through autocompletion suggestions (if your IDE supports that). Here we rather try to give you an idea of what operations you may expect to find and where to look for them.

    All operations may be split into two major branches: those which are methods of SDVariable and those of SameDiff classes. Let us have a closer look at each:

    hashtag
    SDVariable operations

    We have already seen SDVariable operations in previous examples, in expressions like

    where x and y are SDVariable's.
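For instance (assuming x and y are SDVariables belonging to the same SameDiff instance, with compatible shapes):

```java
import org.nd4j.autodiff.samediff.SDVariable;

public class VariableOpsSketch {
    static void ops(SDVariable x, SDVariable y) {
        SDVariable z = x.add(y);    // elementwise addition
        SDVariable s = x.mul(2.0);  // scaling by a constant
        SDVariable m = x.mmul(y);   // matrix multiplication
    }
}
```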

    Among SDVariable methods, you will find:

    • BLAS-type operations to perform linear algebra: things like add, neg, mul (used for both scaling and elementwise

      multiplication) and mmul (matrix multiplication), dot, rdiv, etc.;

    SDVariable operations may be easily chained, producing lines like:
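A sketch of such a chain (the variable names are illustrative - a whole affine layer in one line):

```java
import org.nd4j.autodiff.samediff.SDVariable;

public class ChainingSketch {
    static SDVariable affine(SDVariable input, SDVariable weights, SDVariable bias) {
        // input * weights + bias, written as a single chained expression
        return input.mmul(weights).add(bias);
    }
}
```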

    hashtag
    SameDiff operations

The operations that are methods of SameDiff are called via one of 6 auxiliary objects present in each SameDiff instance, which split all operations into 6 uneven branches:

    • math - for general mathematical operations;

    • random - creating different random number generators;

    • nn - general neural network tools;

    Let us briefly describe what kinds of operations you may expect to find in each of the branches:

    hashtag
    math - basic mathematical operations

The math module mostly consists of general mathematical functions and statistical methods. These include:

    • power functions, e.g. square, cube, sqrt, pow, reciprocal etc.;

    • trigonometric functions, e.g. sin, atan etc.;

Most operations in math have a very simple structure and are invoked like this:

Operations may be chained, although in a more cumbersome way than the SDVariable operations, e.g.:

Observe that the (integer) argument 1 tells us that the reduction (here, the maximum absolute value) is to be taken along dimension 1, i.e. along the columns of the matrix.
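A sketch of both patterns (assuming sd is a SameDiff instance and x a floating point matrix variable in it):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;

public class MathOpsSketch {
    static void example(SameDiff sd, SDVariable x) {
        // Simple structure: one input variable, one result
        SDVariable root = sd.math.sqrt(x);

        // Chained: maximum absolute value, reduced along dimension 1 (the columns)
        SDVariable maxAbs = sd.max(sd.math.abs(x), 1);
    }
}
```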

    hashtag
random - creating random values

    These operations create variables whose underlying arrays will be filled with random numbers following some distribution - say, Bernoulli, normal, binomial etc. These values will be reset at each iteration. If you wish, for instance, to create a variable that will add Gaussian noise to entries of the MNIST database, you may do something like:

    The shape of your random variable may vary. Suppose, for instance, that you have audio signals of varying length, and you want to add noise to them. Then you need to specify an SDVariable, say, windowShape, with an integer data type, and proceed like this:
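For the MNIST noise case above, a sketch might look like this (the mean/stddev values and the flattened 28x28 shape are illustrative assumptions):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;

public class RandomSketch {
    static SDVariable addNoise(SameDiff sd, SDVariable input) {
        // Gaussian noise with mean 0.0 and stddev 0.1, shaped like a
        // flattened 28x28 image; the values are re-drawn at each iteration
        SDVariable noise = sd.random.normal("noise", 0.0, 0.1, 784);
        return input.add(noise);
    }
}
```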

    hashtag
    nn - general neural network tools

    Here we store methods for neural networks that are not necessarily associated with convolutional ones. Among them are

    • creation of dense linear and ReLU layers (with or without bias), and separate bias addition: linear, reluLayer,

      biasAdd;

    • popular activation functions, e.g. relu, sigmoid, tanh

    Some methods were created for internal use, but are openly available. Those include:

    • derivatives for several popular activation functions - these are mostly designed for speeding up

      backpropagation;

    • attention modules - basically, building blocks for recurrent neural networks we shall discuss below.

    While activations in nn are fairly simple, other operations become more involved. Say, to create a linear or a ReLU layer, up to three predefined SDVariable objects may be required, as in the following code:

    where input, weights and bias need to have dimensions suiting each other.

    To create, say, a dense layer with softmax activation, you may proceed as follows:
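A sketch of such a layer (assuming input, weights and bias are predefined SDVariables with mutually compatible shapes):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;

public class DenseLayerSketch {
    static SDVariable denseSoftmax(SameDiff sd, SDVariable input,
                                   SDVariable weights, SDVariable bias) {
        // Linear (affine) transformation: input * weights + bias
        SDVariable linear = sd.nn.linear("linear", input, weights, bias);
        // Softmax activation on top of the linear layer
        return sd.nn.softmax("output", linear);
    }
}
```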

    hashtag
    cnn - convolutional neural networks tools

    The cnn module contains layers and operations typically used in convolutional neural networks - different activations may be picked up from the nn module. Among cnn operations we currently have creation of:

    • linear convolution layers, currently for tensors of dimension up to 3 (minibatch not included): conv1d, conv2d,

      conv3d, depthWiseConv2d, separableConv2D/sconv2d;

Convolution and deconvolution operations are specified by a number of static parameters like kernel size, dilation, having or not having bias, etc. To facilitate the creation process, we pack the required parameters into easily constructable and alterable configuration objects. Desired activations may be borrowed from the nn module. So, for example, if we want to create a 3x3 convolutional layer with relu activation, we may proceed as follows:

In the first line, we construct a convolution configuration using its default constructor. Then we specify the kernel size (this is mandatory) and the optional padding size, keeping other settings at their defaults (unit stride, no dilation, no bias, NCHW data format). We then employ this configuration to create a linear convolution with predefined SDVariables for input and weights; the shape of weights is to be tuned to that of input and to config beforehand. Thus, if in the above example input has shape, say, [-1, nIn, height, width], then weights are to have the form [nIn, nOut, 3, 3] (because we have a 3x3 convolution kernel). The shape of the resulting variable convolution2d will be predetermined by these parameters.
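The steps above can be sketched as follows (input and weights are assumed to be predefined SDVariables with shapes matching the description above):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ops.impl.layers.convolution.config.Conv2DConfig;

public class ConvSketch {
    static SDVariable conv3x3Relu(SameDiff sd, SDVariable input, SDVariable weights) {
        Conv2DConfig config = Conv2DConfig.builder()
                .kH(3).kW(3)   // mandatory: 3x3 kernel
                .pH(1).pW(1)   // optional padding; stride, dilation, bias, NCHW stay default
                .build();
        // Linear convolution using the configuration object
        SDVariable convolution2d = sd.cnn.conv2d(input, weights, config);
        // The relu activation is borrowed from the nn module
        return sd.nn.relu("output", convolution2d, 0);
    }
}
```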

    hashtag
    rnn - Recurrent neural networks

    This module contains arguably the most sophisticated methods in the framework. Currently it allows you to create

    • simple recurrent units, using sru and sruCell methods;

    • LSTM units, using lstmCell, lstmBlockCell and lstmLayer;

As of now, recurrent operations require special configuration objects as input, in which you need to pack all the variables that will be used in a unit. This is subject to change in later versions. For instance, to create a simple recurrent unit, you proceed like this:

    Here, the arguments in the SRUConfiguration constructor are variables that are to be defined beforehand. Obviously their shapes should be matching, and these shapes predetermine the shape of output.

    hashtag
    loss - Loss functions

In this branch we keep common loss functions. Most loss functions may be created quite simply, like this:

where labels and predictions are SDVariable's. A String name is a mandatory parameter in most loss methods, yet it may be set to null - in this case, the name will be generated automatically. You may also create weighted loss functions by adding another SDVariable parameter containing weights, as well as specify a reduction method (see below) for the loss over the minibatch. Thus, a full-fledged logLoss operation may look like:
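A sketch of both forms (the reduction mode, the epsilon value and the variable names are illustrative assumptions):

```java
import org.nd4j.autodiff.loss.LossReduce;
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;

public class LossSketch {
    static SDVariable losses(SameDiff sd, SDVariable labels, SDVariable predictions,
                             SDVariable weights) {
        // Simple form; the name may be null to have it generated automatically
        SDVariable simple = sd.loss.logLoss("simpleLoss", labels, predictions);

        // Full-fledged form: per-sample weights plus an explicit reduction mode
        return sd.loss.logLoss("weightedLoss", labels, predictions,
                weights, LossReduce.SUM, 1e-7);
    }
}
```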

    Some loss operations may allow/require further arguments, depending on their type: e.g. a dimension along which the loss is to be computed (as in cosineLoss), or some real-valued parameters.

As for reduction methods over the minibatch, there are currently 4 available. Initially, loss values for each sample of the minibatch are computed; then they are multiplied by weights (if specified), and finally one of the following routines takes place:

    • NONE - leaving the resulting (weighted) loss values as-is; the result is an INDArray with the length of the minibatch.

    • SUM - summing the values, producing a scalar result: sum_loss = sum(weights * loss_per_sample).

    hashtag
    The don'ts of operations

In order for SameDiff operations to work properly, several main rules are to be upheld. Failing to do so may result in an exception or, even worse, in working code that produces undesired results. Everything we mention in the current section describes what you had better not do.

    • All variables in an operation have to belong to the same instance of SameDiff (see the section on variables for how variables are added to a SameDiff instance). In other words, you had better not mix variables from different SameDiff instances within one operation.

    • A new variable should be created for the result of an operation or a chain of operations. In other words, you had better not redefine existing variables, and had better not leave operations returning no result.

    Clinical Time Series LSTM

In this tutorial, we will learn how to apply a long short-term memory (LSTM) neural network to a medical time series problem. The data used comes from 4000 intensive care unit (ICU) patients, and the goal is to predict the mortality of patients using 6 general descriptor features, such as age, gender, and weight, along with 37 sequential features, such as cholesterol level, temperature, pH, and glucose level. Each patient has multiple measurements of the sequential features, with different patients having a different number of measurements. Furthermore, the time between measurements also differs among patients.

An LSTM is well suited for this type of problem due to the sequential nature of the data. In addition, LSTM networks avoid vanishing and exploding gradients and are able to effectively capture long-term dependencies due to their cell state, a feature not present in typical recurrent networks.

    hashtag
    Imports

    Now that we have imported everything needed to run this tutorial, we will start with obtaining the data and then converting the data into a format a neural network can understand.

    hashtag
    Data Source

    The data is contained in a compressed tar.gz file. We will have to download the data from the url below and then extract csv files containing the ICU data. Each patient will have a separate csv file for the features and labels. The features will be contained in a directory called sequence and the labels will be contained in a directory called mortality. The features are contained in a single csv file with the columns representing the features and the rows representing different time steps. The labels are contained in a single csv file which contains a value of 0 indicating death and a value of 1 indicating survival.

    hashtag
    Download Data

    To download the data, we will create a temporary directory that will store the data files, extract the tar.gz file from the url, and place it in the specified directory.

    Next, we must extract the data from the tar.gz file, recreate directories within the tar.gz file into our temporary directory, and copy the files into our temporary directory.

    hashtag
    DataSetIterators

Our next goal is to convert the raw data (csv files) into a DataSetIterator, which can then be fed into a neural network for training. Our training data will have 3200 examples, represented by a single DataSetIterator, and the testing data will have 800 examples, represented by a separate DataSetIterator.

    In order to obtain DataSetIterators, we must first initialize CSVSequenceRecordReaders, which will parse the raw data into record-like format. We will first set the directories for the features and labels and initialize the CSVSequenceRecordReaders.

    Next, we can initialize the SequenceRecordReaderDataSetIterator using the previously created CSVSequenceRecordReaders. We will use an alignment mode of ALIGN_END. This alignment mode is needed due to the fact that the number of time steps differs between different patients. Because the mortality label is always at the end of the sequence, we need all the sequences aligned so that the time step with the mortality label is the last time step for all patients. For a more in depth explanation of alignment modes, see .

    hashtag
    Neural Network Configuration

    Now we can finally configure and then initialize the neural network for this problem. We will be using the ComputationGraph class of DL4J.

    hashtag
    Training

    To train the neural network, we simply call the fit method of the ComputationGraph on the trainData DataSetIterator and also pass how many epochs it should train for.
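For instance (the epoch count is illustrative; model and trainData are assumed to be the ComputationGraph and DataSetIterator from the previous steps):

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class TrainSketch {
    static void train(ComputationGraph model, DataSetIterator trainData) {
        final int nEpochs = 2; // illustrative
        model.fit(trainData, nEpochs);
    }
}
```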

    hashtag
    Model Evaluation

Finally, we can evaluate the model on the testing split using the AUC (area under the curve) metric of a ROC curve. A randomly guessing model will have an AUC close to 0.50, while a perfect model will achieve an AUC of 1.00.
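A sketch of the evaluation loop (the threshold-step count is illustrative; model and testData are assumed from the previous steps):

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.evaluation.classification.ROC;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class EvalSketch {
    static double auc(ComputationGraph model, DataSetIterator testData) {
        ROC roc = new ROC(100); // 100 threshold steps
        while (testData.hasNext()) {
            DataSet batch = testData.next();
            INDArray[] output = model.output(batch.getFeatures());
            // Evaluate the time series output against the labels
            roc.evalTimeSeries(batch.getLabels(), output[0]);
        }
        return roc.calculateAUC();
    }
}
```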

    We see that this model achieves an AUC on the test set of 0.69!

    Instacart Multitask Example

In this tutorial we will use an LSTM neural network to predict instacart users’ purchasing behavior given a history of their past orders. The data originally comes from a Kaggle challenge (kaggle.com/c/instacart-market-basket-analysisarrow-up-right). We first removed users that only made 1 order using the instacart app and then took 5000 users out of the remaining to be part of the data for this tutorial.

    For each order, we have information on the product the user purchased. For example, there is information on the product name, what aisle it is found in, and the department it falls under. To construct features, we extracted indicators representing whether or not a user purchased a product in the given aisles for each order. In total there are 134 aisles. The targets were whether or not a user will buy a product in the breakfast department in the next order. We also used auxiliary targets to train this LSTM. The auxiliary targets were whether or not a user will buy a product in the dairy department in the next order.

We suspect that an LSTM will be effective for this task because of the temporal dependencies in the data.

    hashtag
    Imports

    hashtag
    Download Data

    To download the data, we will create a temporary directory that will store the data files, extract the tar.gz file from the url, and place it in the specified directory.

    We will then extract the data from the tar.gz file, recreate directories within the tar.gz file into our temporary directories, and copy the files from the tar.gz file.

    hashtag
    DataSetIterators

    Next we will convert the raw data (csv files) into DataSetIterators, which will be fed into a neural network. Our training data will have 4000 examples which will be represented by a single DataSetIterator, and the testing data will have 1000 examples which will be represented by a separate DataSetIterator.

We first initialize CSVSequenceRecordReaders, which will parse the raw data into record-like format. Because we will be using multitask learning, we will use two outputs. Thus we need three RecordReaders in total: one for the input, another for the first target, and the last for the second target. Next, we will need the RecordReaderMultiDataSetIterator, since we now have two outputs. We can add our SequenceRecordReaders using the addSequenceReader methods and specify the input and both outputs. The ALIGN_END alignment mode is used, since the sequences for each example vary in length.

    We will create DataSetIterators for both the training data and the test data.

    hashtag
    Neural Network

The next task is to set up the neural network configuration. We see below that the ComputationGraph class is used to create an LSTM with two outputs. We can set the outputs using the setOutputs method of the graph builder. One GravesLSTM layer and two RnnOutputLayers will be used. We will also set other hyperparameters of the model, such as dropout, weight initialization, updaters, and activation functions.

    We must then initialize the neural network.

    hashtag
    Model Training

    To train the model, we use 5 epochs with a simple call to the fit method of the ComputationGraph.

    hashtag
    Model Evaluation

    We will now evaluate our trained model on the original task, which was predicting whether or not a user will purchase a product in the breakfast department. Note that we will use the area under the curve (AUC) metric of the ROC curve.

We achieve an AUC of 0.75!

    MultiLayerNetwork And ComputationGraph

    DL4J provides the following classes to configure networks:

    1. MultiLayerNetwork

    2. ComputationGraph

    MultiLayerNetwork consists of a single input layer and a single output layer with a stack of layers in between them.

ComputationGraph is used for constructing networks with a more complex architecture than MultiLayerNetwork. It can have multiple input layers, multiple output layers, and the layers in between can be connected through a directed acyclic graph.

    hashtag
    Network Configurations

Whether you create a MultiLayerNetwork or a ComputationGraph, you have to provide a network configuration to it through NeuralNetConfiguration.Builder. As the name implies, it provides a Builder pattern to configure a network. To create a MultiLayerNetwork, we build a MultiLayerConfiguration, and for ComputationGraph, it’s ComputationGraphConfiguration.

    The pattern goes like this: [High Level Configuration] -> [Configure Layers] -> [Build Configuration]
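A sketch of this pattern (layer sizes and hyperparameters are illustrative):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class ConfigSketch {
    static MultiLayerConfiguration build() {
        return new NeuralNetConfiguration.Builder()
                .seed(123)                      // [High Level Configuration]
                .updater(new Adam(1e-3))
                .list()                         // [Configure Layers]
                .layer(0, new DenseLayer.Builder()
                        .nIn(784).nOut(100)
                        .activation(Activation.RELU).build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(100).nOut(10)
                        .activation(Activation.SOFTMAX).build())
                .build();                       // [Build Configuration]
    }
}
```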

    hashtag
    Required imports

    hashtag
    Building a MultiLayerConfiguration

    hashtag
What did we do here?

    hashtag
    High Level Configuration

    hashtag
    Configuration of Layers

Here we are calling list() to get the ListBuilder. It provides us the necessary API to add layers to the network through the layer(arg1, arg2) function.

    • The first parameter is the index of the position where the layer needs to be added.

    • The second parameter is the type of layer we need to add to the network.

    To build and add a layer we use a similar builder pattern as:

    hashtag
    Building a Graph

    Finally, the last build() call builds the configuration for us.

    hashtag
    Sanity checking for our MultiLayerConfiguration

    You can get your network configuration as String, JSON or YAML for sanity checking. For JSON we can use the toJson() function.
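For example (assuming conf is the MultiLayerConfiguration built earlier):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;

public class SanityCheckSketch {
    static void print(MultiLayerConfiguration conf) {
        System.out.println(conf.toJson()); // JSON form
        System.out.println(conf.toYaml()); // YAML form
    }
}
```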

    hashtag
    Creating a MultiLayerNetwork

    Finally, to create a MultiLayerNetwork, we pass the configuration to it as shown below
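A minimal sketch (assuming conf is the configuration built earlier):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

public class NetworkSketch {
    static MultiLayerNetwork create(MultiLayerConfiguration conf) {
        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init(); // initialize parameters before use
        return model;
    }
}
```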

    hashtag
    Building a ComputationGraphConfiguration

    hashtag
What did we do here?

The only difference here is the way we are building layers. Instead of calling the list() function, we call graphBuilder() to get a GraphBuilder for building our ComputationGraphConfiguration. The following table explains what each function of a GraphBuilder does.

    The output layers defined here use another function lossFunction to define what loss function to use.

    hashtag
    Sanity checking for our ComputationGraphConfiguration

    You can get your network configuration as String, JSON or YAML for sanity checking. For JSON we can use the toJson() function

    hashtag
    Creating a ComputationGraph

    Finally, to create a ComputationGraph, we pass the configuration to it as shown below
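A minimal sketch (assuming conf is the ComputationGraphConfiguration built earlier):

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.graph.ComputationGraph;

public class GraphSketch {
    static ComputationGraph create(ComputationGraphConfiguration conf) {
        ComputationGraph model = new ComputationGraph(conf);
        model.init(); // initialize parameters before use
        return model;
    }
}
```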

    hashtag
    More MultiLayerConfiguration Examples

    hashtag
    Regularization

    hashtag
    Dropout connects

    hashtag
    Bias initialization

    hashtag
    More ComputationGraphConfiguration Examples

    hashtag
    Recurrent Network

    with Skip Connections

    hashtag
    Multiple Inputs and Merge Vertex

    hashtag
    Multi-Task Learning

    Visualization

    How to visualize, monitor and debug neural network learning.

    hashtag
    Contents

    • Visualizing Network Training with the Deeplearning4j Training UI

    hashtag

Note: the information here pertains to DL4J versions 1.0.0-beta6 and later.

DL4J provides a user interface to visualize the current network status and training progress in your browser, in real time. The UI is typically used to help with tuning neural networks - i.e., the selection of hyperparameters (such as the learning rate) to obtain good performance for a network.

    Step 1: Add the Deeplearning4j UI dependency to your project.

    Step 2: Enable the UI in your project

    This is relatively straightforward:
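A sketch of the typical setup (net is assumed to be your MultiLayerNetwork or ComputationGraph; exact import locations may vary slightly between versions):

```java
import org.deeplearning4j.core.storage.StatsStorage;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.ui.api.UIServer;
import org.deeplearning4j.ui.model.stats.StatsListener;
import org.deeplearning4j.ui.model.storage.InMemoryStatsStorage;

public class UISketch {
    static void enableUI(MultiLayerNetwork net) {
        // Initialize the user interface backend
        UIServer uiServer = UIServer.getInstance();

        // Configure where the network information (scores, gradients, etc.)
        // is to be stored; here: in memory
        StatsStorage statsStorage = new InMemoryStatsStorage();

        // Attach the StatsStorage instance to the UI so its contents are visualized
        uiServer.attach(statsStorage);

        // Add a StatsListener to collect information from the network as it trains
        net.setListeners(new StatsListener(statsStorage));
    }
}
```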

    To access the UI, open your browser and go to http://localhost:9000/train/overview. You can set the port by using the org.deeplearning4j.ui.port system property: i.e., to use port 9001, pass the following to the JVM on launch: -Dorg.deeplearning4j.ui.port=9001

    Information will then be collected and routed to the UI when you call the fit method on your network.

    Example:

    The full set of UI examples are available .

    hashtag

    The overview page (one of 3 available pages) contains the following information:

    • Top left: score vs iteration chart - this is the value of the loss function on the current minibatch

    • Top right: model and training information

    • Bottom left: Ratio of parameters to updates (by layer) for all network weights vs. iteration

Note that for the bottom two charts, these are displayed as the logarithm (base 10) of the values. Thus a value of -3 on the update:parameter ratio chart corresponds to a ratio of 10^-3 = 0.001.

The ratio of updates to parameters is specifically the ratio of mean magnitudes of these values (i.e., log10(mean(abs(updates)) / mean(abs(parameters)))).

    See the later section of this page on how to use these values in practice.

    hashtag

    The model page contains a graph of the neural network layers, which operates as a selection mechanism. Click on a layer to display information for it.

    On the right, the following charts are available, after selecting a layer:

    • Table of layer information

    • Update to parameter ratio for this layer, as per the overview page. The components of this ratio (the parameter and update mean magnitudes) are also available via tabs.

    • Layer activations (mean and mean +/- 2 standard deviations) over time

    Note: parameters are labeled as follows: weights (W) and biases (b). For recurrent neural networks, W refers to the weights connecting the layer to the layer below, and RW refers to the recurrent weights (i.e., those between time steps).

    hashtag

The DL4J UI can be used with Spark. However, as of 0.7.0, conflicting dependencies mean that running the UI and Spark in the same JVM can be difficult.

    Two alternatives are available:

    1. Collect and save the relevant stats, to be visualized (offline) at a later point

    2. Run the UI in a separate server, and use the remote UI functionality to upload the data from the Spark master to your UI instance

    Collecting Stats for Later Offline Use

    Then, later you can load and display the saved information using:

    Using the Remote UI Functionality

    First, in the JVM running the UI (note this is the server):

This will require the deeplearning4j-ui dependency. (Note: this is the server, not the client - see below for the client, which uses deeplearning4j-ui-model.)

Second, configure your neural net as the client - this applies to both Spark and standalone networks using plain deeplearning4j-nn. Note this example is for Spark, but ComputationGraph and MultiLayerNetwork both have an equivalent setListeners method with the same usage:

    To avoid dependency conflicts with Spark, you should use the deeplearning4j-ui-model dependency to get the StatsListener, not the full deeplearning4j-ui UI dependency.

    Note: you should replace UI_MACHINE_IP with the IP address of the machine running the user interface instance.

    hashtag

Here's an excellent page about visualizing neural net training. It is worth reading and understanding that page first.

Tuning neural networks is often more an art than a science. However, here are some ideas that may be useful:

    Overview Page - Model Score vs. Iteration Chart

    The score vs. iteration should (overall) go down over time.

    • If the score increases consistently, your learning rate is likely set too high. Try reducing it until scores become more stable.

    • Increasing scores can also be indicative of other network issues, such as incorrect data normalization

    • If the score is flat or decreases very slowly (over a few hundred iterations) (a) your learning rate may be too low, or (b) you might be having difficulties with optimization. In the latter case, if you are using the SGD updater, try a different updater such as Nesterovs (momentum), RMSProp or Adagrad.

    Overview Page and Model Page - Using the Update: Parameter Ratio Chart

    • The ratio of mean magnitude of updates to parameters is provided on both the overview and model pages

      • "Mean magnitude" = the average of the absolute value of the parameters or updates at the current time step

    • The most important use of this ratio is in selecting a learning rate. As a rule of thumb: this ratio should be around 1:1000 = 0.001. On the (log10) chart, this corresponds to a value of -3 (i.e., 10^-3 = 0.001)

    Model Page: Layer Activations (vs. Time) Chart

    This chart can be used to detect vanishing or exploding activations (due to poor weight initialization, too much regularization, lack of data normalization, or too high a learning rate).

    • This chart should ideally stabilize over time (usually a few hundred iterations)

    • A good standard deviation for the activations is on the order of 0.5 to 2.0. Significantly outside of this range may indicate one of the problems mentioned above.

    Model Page: Layer Parameters Histogram

    The layer parameters histogram is displayed for the most recent iteration only.

    • For weights, these histograms should have an approximately Gaussian (normal) distribution, after some time

    • For biases, these histograms will generally start at 0, and will usually end up being approximately Gaussian

      • One exception to this is for LSTM recurrent neural network layers: by default, the biases for one gate (the forget gate) are set to 1.0 (though this is configurable), to help in learning dependencies across long time periods. This results in the bias graphs initially having many biases around 0.0, with another set of biases around 1.0

    Model Page: Layer Updates Histogram

    The layer update histogram is displayed for the most recent iteration only.

    • Note that these are the updates - i.e., the gradients after applying learning rate, momentum, regularization etc

    • As with the parameter graphs, these should have an approximately Gaussian (normal) distribution

    • Keep an eye out for very large values: this can indicate exploding gradients in your network

    Model Page: Parameter Learning Rates Chart

    This chart simply shows the learning rates of the parameters of selected layer, over time.

    If you are not using learning rate schedules, the chart will be flat. If you are using learning rate schedules, you can use this chart to track the current value of the learning rate (for each parameter), over time.

    hashtag

We rely on TSNE to reduce the dimensionality of word feature vectors and project words into a two or three-dimensional space. Here's some code for using TSNE with Word2Vec:

    hashtag

    A possible exception that can occur with the DL4J UI is the following:

    This exception is not due to DL4J directly, but is due to a missing application.conf file, required by the Play framework (the library that DL4J's UI is based on). This is originally present in the deeplearning4j-play dependency: however, if an uber-jar (i.e., a JAR file with dependencies) is built (say, via mvn package), it may not be copied over correctly. For example, using the maven-assembly-plugin has caused this exception for some users.

    The recommended solution (for Maven) is to use the Maven Shade plugin to produce an uber-jar, configured as follows:

    Then, create your uber-jar with mvn package and run via cd target && java -cp dl4j-examples-0.9.1-bin.jar org.deeplearning4j.examples.userInterface.UIExample. Note the "-bin" suffix for the generated JAR file: this includes all dependencies.

    Note also that this Maven Shade approach is configured for DL4J's examples repository.

    Cloud Detection Example

    In this tutorial, we will apply a neural network model to a cloud detection application using satellite imaging data. The data is from NASA’s Multi-angle Imaging SpectroRadiometer (MISR) which was launched in 1999. The MISR has nine cameras that view the Earth from nine different directions which allows the MISR to measure elevations and angular radiance signatures of objects. We will use the radiances measured from the MISR and features developed using domain expertise to learn to detect whether clouds are present in polar regions. This is a particularly challenging task due to the snow and ice covering the ground surfaces.

    Imports

    Quickstart

    ND4J Key features and brief samples.

    Introduction

    ND4J is a scientific computing library for the JVM. It is meant to be used in production environments rather than as a research tool, which means routines are designed to run fast with minimum RAM requirements. The main features are:

    • A versatile n-dimensional array object.

    Sea Temperature Convolutional LSTM 2

    In this tutorial we will use a neural network to forecast daily sea temperatures. This tutorial will be similar to tutorial 15. Recall that the data consists of 2-dimensional temperature grids of 8 seas: Bengal, Korean, Black, Mediterranean, Arabian, Japan, Bohai, and Okhotsk Seas from 1981 to 2017. The raw data was taken from the Earth System Research Laboratory (https://www.esrl.noaa.gov/psd/) and preprocessed into CSV files. Each example consists of fifty 2-dimensional temperature grids, and every grid is represented by a single row in a CSV file. Thus, each sequence is represented by a CSV file with 50 rows.

    For this task, we will use a convolutional LSTM neural network to forecast 10 days' worth of sea temperatures following a given sequence of temperature grids. The network will be trained similarly to the network in tutorial 15, but the evaluation will be handled differently (applied only to the 10 days following the sequences).


    Adding Ops

    How to add differential functions and other ops to SameDiff graph.

    A quick SameDiff overview

    To get started with SameDiff, familiarize yourself with the autodiff module of the ND4J API.

    For better or worse, SameDiff code is organized in just a few key places. For basic usage and testing of SameDiff the following modules are key. We'll discuss some of them in more detail in just a bit.

    Zoo Models

    Available models

    AlexNet

    INDArray CropAndResize(INDArray image, INDArray cropBoxes, INDArray boxIndices, INDArray cropOutSize, double extrapolationValue)
    INDArray CropAndResize(INDArray image, INDArray cropBoxes, INDArray boxIndices, INDArray cropOutSize)
    
    SDVariable CropAndResize(SDVariable image, SDVariable cropBoxes, SDVariable boxIndices, SDVariable cropOutSize, double extrapolationValue)
    SDVariable CropAndResize(SDVariable image, SDVariable cropBoxes, SDVariable boxIndices, SDVariable cropOutSize)
    SDVariable CropAndResize(String name, SDVariable image, SDVariable cropBoxes, SDVariable boxIndices, SDVariable cropOutSize, double extrapolationValue)
    SDVariable CropAndResize(String name, SDVariable image, SDVariable cropBoxes, SDVariable boxIndices, SDVariable cropOutSize)
    INDArray adjustContrast(INDArray in, double factor)
    
    SDVariable adjustContrast(SDVariable in, double factor)
    SDVariable adjustContrast(String name, SDVariable in, double factor)
    INDArray adjustHue(INDArray in, double delta)
    
    SDVariable adjustHue(SDVariable in, double delta)
    SDVariable adjustHue(String name, SDVariable in, double delta)
    INDArray adjustSaturation(INDArray in, double factor)
    
    SDVariable adjustSaturation(SDVariable in, double factor)
    SDVariable adjustSaturation(String name, SDVariable in, double factor)
    INDArray extractImagePatches(INDArray image, int[] kSizes, int[] strides, int[] rates, boolean sameMode)
    
    SDVariable extractImagePatches(SDVariable image, int[] kSizes, int[] strides, int[] rates, boolean sameMode)
    SDVariable extractImagePatches(String name, SDVariable image, int[] kSizes, int[] strides, int[] rates, boolean sameMode)
    INDArray hsvToRgb(INDArray input)
    
    SDVariable hsvToRgb(SDVariable input)
    SDVariable hsvToRgb(String name, SDVariable input)
    INDArray imageResize(INDArray input, INDArray size, boolean preserveAspectRatio, boolean antialis, ImageResizeMethod ImageResizeMethod)
    INDArray imageResize(INDArray input, INDArray size, ImageResizeMethod ImageResizeMethod)
    
    SDVariable imageResize(SDVariable input, SDVariable size, boolean preserveAspectRatio, boolean antialis, ImageResizeMethod ImageResizeMethod)
    SDVariable imageResize(SDVariable input, SDVariable size, ImageResizeMethod ImageResizeMethod)
    SDVariable imageResize(String name, SDVariable input, SDVariable size, boolean preserveAspectRatio, boolean antialis, ImageResizeMethod ImageResizeMethod)
    SDVariable imageResize(String name, SDVariable input, SDVariable size, ImageResizeMethod ImageResizeMethod)
    INDArray nonMaxSuppression(INDArray boxes, INDArray scores, int maxOutSize, double iouThreshold, double scoreThreshold)
    
    SDVariable nonMaxSuppression(SDVariable boxes, SDVariable scores, int maxOutSize, double iouThreshold, double scoreThreshold)
    SDVariable nonMaxSuppression(String name, SDVariable boxes, SDVariable scores, int maxOutSize, double iouThreshold, double scoreThreshold)
    INDArray randomCrop(INDArray input, INDArray shape)
    
    SDVariable randomCrop(SDVariable input, SDVariable shape)
    SDVariable randomCrop(String name, SDVariable input, SDVariable shape)
    INDArray rgbToHsv(INDArray input)
    
    SDVariable rgbToHsv(SDVariable input)
    SDVariable rgbToHsv(String name, SDVariable input)
    INDArray rgbToYiq(INDArray input)
    
    SDVariable rgbToYiq(SDVariable input)
    SDVariable rgbToYiq(String name, SDVariable input)
    INDArray rgbToYuv(INDArray input)
    
    SDVariable rgbToYuv(SDVariable input)
    SDVariable rgbToYuv(String name, SDVariable input)
    INDArray yiqToRgb(INDArray input)
    
    SDVariable yiqToRgb(SDVariable input)
    SDVariable yiqToRgb(String name, SDVariable input)
    INDArray yuvToRgb(INDArray input)
    
    SDVariable yuvToRgb(SDVariable input)
    SDVariable yuvToRgb(String name, SDVariable input)
    MultiLayerNetwork net = ...
    net.save(new File("..."));
    
    MultiLayerNetwork net2 = MultiLayerNetwork.load(new File("..."), true);
    ComputationGraph net = ...
    net.save(new File("..."));
    
    ComputationGraph net2 = ComputationGraph.load(new File("..."), true);
    import org.datavec.api.records.reader.SequenceRecordReader;
    import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
    import org.datavec.api.split.NumberedFileInputSplit;
    import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
    import org.deeplearning4j.eval.ROC;
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.Updater;
    import org.deeplearning4j.nn.conf.layers.GravesLSTM;
    import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.dataset.api.DataSet;
    import org.nd4j.linalg.lossfunctions.LossFunctions;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
    import org.nd4j.linalg.learning.config.Adam;
    
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.io.File;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.io.FilenameUtils;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Arrays;
    import java.net.URL;
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.lang.Byte;
    
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    ResizeArea: Anti-aliased resampling with area interpolation. 'antialias' has no effect when used with area interpolation; it always anti-aliases.

    ResizeMitchelcubic: Mitchell-Netravali Cubic non-interpolating filter. For synthetic images (especially those lacking proper prefiltering), less ringing than Keys cubic kernel but less sharp.

    throws IOException
    param dataNormalization the normalizer to save (optional)
  • throws IOException

  • throws IOException
    throws IOException
    param dataNormalization the normalizer to save (may be null)
  • throws IOException

  • see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)
    see #restoreMultiLayerNetworkAndNormalizer(InputStream, boolean)
    throws IOException If an error occurs when reading from the stream
    throws IOException If an error occurs when reading from the stream
    [source]arrow-up-right
    Recurrent Networks in DL4J

    Function

    Details

    seed

    Keeps the network outputs reproducible across runs by initializing weights and other network randomizations from a fixed seed

    updater

    Algorithm to be used for updating the parameters. The first value is the learning rate, the second is the Nesterov momentum.

    Function

    Details

    nIn

    The number of inputs coming from the previous layer. (In the first layer, it represents the input it is going to take from the input layer)

    nOut

    The number of outputs it’s going to send to the next layer. (For the output layer it represents the labels here)

    weightInit

    The type of weights initialization to use for the layer parameters.

    activation

    The activation function between layers

    Function

    Details

    addInputs

    A list of strings telling the network what layers to use as input layers

    addLayer

    First parameter is the layer name, then the layer object and finally a list of strings defined previously to feed this layer as inputs

    setOutputs

    A list of strings telling the network what layers to use as output layers

    haven't even explicitly defined this result as a separate SDVariable, and yet a corresponding SDVariable will be created internally and added to our instance of SameDiff under the String name "matrix_product". In fact, a unique String name is given to every SDVariable you produce by operations: if you don't give a name explicitly, it is assigned to the resulting SDVariable automatically based on the operation's name.

    comparison operations like gt or lte, used both to compare each element to a fixed double value and for elementwise comparison with another SDVariable of the same shape, and the like;

  • basic reduction operations: things like min, sum, prod (product of elements in array), mean, norm2, argmax (index of the maximal element), squaredDifference and so on, which may be taken along specified dimensions;

  • basic statistics operations for computing mean and standard deviation along given dimensions: mean and std.

  • operations for restructuring of the underlying array: reshape and permute, along with shape - an operation that delivers the shape of a variable as an array of integers (the dimension sizes);

  • cnn - convolutional neural network tools;

  • rnn - recurrent neural network tools;

  • loss - loss functions;

    In order to use a particular operation, you need to call one of these 6 objects from your SameDiff instance, and then an operation itself, like that:

    or

    The distribution of operations among the auxiliary objects has no structural bearing beyond organizing things in a more intuitive way. So, for instance, if you're not sure whether to look for, say, the tanh operation in math or in nn, don't worry: we have it in both.

  • exponential/hyperbolic functions, like exp, sinh, log, atanh etc.;

  • miscellaneous elementwise operations, like taking absolute value, rounding and clipping, such as abs, sign, ceil, round, clipByValue, clipByNorm etc.;

  • reductions along specified dimensions: min, amax, mean, asum, logEntropy, and similar;

  • distance (reduction) operations, such as euclideanDistance, manhattanDistance, jaccardDistance, cosineDistance, hammingDistance, cosineSimilarity, along specified dimensions, for two identically shaped SDVariables;

  • specific matrix operations: matrixInverse, matrixDeterminant, diag (creating a diagonal matrix), trace, eye (creating an identity matrix with variable dimensions), and several others;

  • more statistics operations: standardize, moment, normalizeMoments, erf and erfc (the Gaussian error function and its complement);

  • counting and indexing reductions: methods like countZero (number of zero elements), iamin (index of the element with the smallest absolute value), firstIndex (the index of the first element satisfying a specified Condition function);

  • reductions indicating properties of the underlying arrays. These include e.g. isNaN (elementwise checking), isMax (shape-preserving along specified dimensions), isNonDecreasing (reduction along specified dimensions);

  • elementwise logical operations: and, or, xor, not.
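To make the counting and indexing reductions concrete, here is a plain-Java sketch of two of them. This illustrates the behavior only, not the SameDiff API; the method names below are hypothetical, and the condition in firstIndex is fixed to "greater than t" for simplicity.

```java
public class IndexingReductions {
    // countZero: how many elements are exactly zero
    public static int countZero(double[] a) {
        int c = 0;
        for (double v : a) if (v == 0.0) c++;
        return c;
    }

    // firstIndex with a "greater than t" condition: index of the first
    // element satisfying the condition, or -1 if no element does
    public static int firstIndexGreaterThan(double[] a, double t) {
        for (int i = 0; i < a.length; i++) if (a[i] > t) return i;
        return -1;
    }

    public static void main(String[] args) {
        double[] x = {0.0, -3.0, 0.0, 5.0};
        System.out.println(countZero(x));                  // 2
        System.out.println(firstIndexGreaterThan(x, 1.0)); // 3
    }
}
```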

  • activation functions, such as softmax, as well as their less used versions like leakyRelu, elu, hardTanh, and many more;

  • padding for 2d arrays with method pad, supporting several padding types, with both constant and variable padding width;

  • explosion/overfitting prevention, such as dropout, layerNorm and batchNorm for layer resp. batch normalization;

  • linear deconvolution layers, currently deconv1d, deconv2d, deconv3d;
  • pooling, e.g. maxPooling2D, avgPooling1D;

  • specialized reshaping methods: batchToSpace, spaceToDepth, col2Im and alike;

  • upsampling, currently presented by upsampling2d operation;

  • local response normalization: localResponseNormalization, currently for 2d convolutional layers only;

  • [-1, nOut, height, width]). Finally, in the last line we apply a relu activation.

    Graves LSTM units, using gru methods.

    MEAN_BY_WEIGHT - first computes the sum as above, and then divides it by the sum of all weights, producing a scalar value: mean_loss = sum(weights * loss_per_sample) / sum(weights). If weights are not specified, they are all set to 1.0 and this reduction is equivalent to taking the mean loss over the minibatch.

  • MEAN_BY_NONZERO_WEIGHT_COUNT - divides the weighted sum by the number of nonzero weights, producing a scalar: mean_count_loss = sum(weights * loss_per_sample) / count(weights != 0). Useful e.g. when you want to compute the mean only over a subset of valid samples, by setting weights to either 0. or 1.. When weights are not given, it simply produces the mean, and is thus equivalent to MEAN_BY_WEIGHT.
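The two reductions can be spelled out in plain Java. This is an illustration of the formulas above, not the SameDiff API; note how a non-binary weight vector makes the two results differ.

```java
public class LossReductions {
    // MEAN_BY_WEIGHT: sum(weights * loss) / sum(weights)
    public static double meanByWeight(double[] loss, double[] w) {
        double num = 0, den = 0;
        for (int i = 0; i < loss.length; i++) {
            num += w[i] * loss[i];
            den += w[i];
        }
        return num / den;
    }

    // MEAN_BY_NONZERO_WEIGHT_COUNT: sum(weights * loss) / count(weights != 0)
    public static double meanByNonzeroWeightCount(double[] loss, double[] w) {
        double num = 0;
        int count = 0;
        for (int i = 0; i < loss.length; i++) {
            num += w[i] * loss[i];
            if (w[i] != 0.0) count++;
        }
        return num / count;
    }

    public static void main(String[] args) {
        double[] loss = {2.0, 4.0, 6.0};
        double[] w = {0.5, 0.0, 0.5};   // the second sample is masked out
        System.out.println(meanByWeight(loss, w));             // 4.0
        System.out.println(meanByNonzeroWeightCount(loss, w)); // 2.0
    }
}
```

With binary 0/1 weights the two reductions coincide; they diverge only when weights other than 0 and 1 are used, as in the example.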

  • avoid code like this:

    A properly working version of the above code (if we desired to obtain 2xy+2y^2 in an unusual way) would be

    To learn more about why it works this way, see our graph sectionarrow-up-right.

    Data

    The data is taken from MISR measurements and expert features of 3 images of polar regions. For each location in the grid, there is an expert label indicating whether or not clouds are present, and 8 features (radiances and expert features). Data from two images will comprise the training set, and the left-out image is in the test set.

    The data can be found in a tar.gz file located at the url provided below in the next cell. It is organized into two directories (train and test). In each directory there are five subdirectories: n1, n2, n3, n4, and n5. The data in n1 contains expert features and the label pertaining to a particular location in an image. n2, n3, n4, and n5 contain the expert features corresponding to the nearest locations to the original location.

    We will additionally use features from a location’s nearest neighbors as inputs to our model, because there are dependencies across neighboring locations. In other words, if a location’s neighbors have a positive cloud label, it is more likely for the original location to have a positive cloud label as well. The reverse applies too.

    Download Data

    To download the data, we will create a temporary directory that will store the data files, extract the tar.gz file from the url, and place it in the specified directory.

    Next, we must extract the data from the tar.gz file, recreate directories within the tar.gz file into our temporary directory, and copy the files into our temporary directory.

    DataSetIterators

    Our next goal is to convert the raw data (csv files) into a DataSetIterator, which can then be fed into a neural network for training. We will first obtain the paths containing the raw data, which is in csv file format.

    We will then create two DataSetIterators to feed the data into a neural network. But first, we will initialize CSVRecordReaders to parse the raw data and convert it to record-like format. We create separate CSVRecordReaders for the original location and each nearest neighbor. Since the data is contained in separate RecordReaders, we will use a RecordReaderMultiDataSetIterator, which allows for multiple inputs or outputs. We then add the RecordReaders to the iterator using the addReader method of the RecordReaderMultiDataSetIterator.Builder class. We specify the inputs using the addInput method and the label using the addOutputOneHot method.

    The same process is applied to the testing data.

    Neural Net Configuration

    Now that the DataSetIterators are initialized, we can now specify the configuration of the neural network. We will ultimately use a ComputationGraph since we will have multiple inputs to the network. MultiLayerNetworks cannot be used when there are multiple inputs and/or outputs.

    To specify the network architecture and the hyperparameters, we use the NeuralNetConfiguration.Builder class. We can add each input using the addLayer method of the class. Because the inputs are separate, the addVertex method is used to add a MergeVertex to the network. This vertex will merge the outputs from the previous input layers into a combined representation. Finally, a fully connected layer is applied to the merged output, which passes the activations to the final output layer.

    The other hyperparameters, such as the optimization algorithm, updater, and number of hidden nodes, are also specified in this block of code.

    Model Training

    We are now ready to train our model. We initialize our ComputationGraph and train over the specified number of epochs with a call to the fit method of the ComputationGraph.

    To evaluate our model, we simply use the evaluateROC method of the ComputationGraph class.

    Finally we can print out the area under the curve (AUC) metric!

    Linear algebra and signal processing functions.

  • Multiplatform functionality including GPUs.

    • all major operating systems: win/linux/osx/android.

    • architectures: x86, arm, ppc.

  • This quickstart follows the same layout and approach of the Numpy quickstartarrow-up-right. This should help people familiar with Python and Numpy get started quickly with Nd4J.

    Prerequisites

    You can use Nd4J from any JVM Languagearrow-up-right. (For example: Scala, Kotlin). You can use Nd4J with any build tool. The sample code in this quick start uses the following:

    • Java (developer version)arrow-up-right 1.7 or later (Only 64-Bit versions supported)

    • Apache Mavenarrow-up-right (automated build and dependency manager)

    • Gitarrow-up-right (distributed version control system)

    To improve readability we show you the output of System.out.println(...), but we have not shown the print statements in the sample code. If you are confident you know how to use Maven and Git, please feel free to skip to the Basics. In the remainder of this section we will build a small 'hello ND4J' application to verify that the prerequisites are set up correctly.

    Execute the following commands to get the project from github.

    When everything is set up correctly you should see the following output:

    Basics

    The main feature of Nd4j is the versatile n-dimensional array interface called INDArray. To improve performance Nd4j uses off-heap memory to store data. The INDArray is different from standard Java arrays.

    Some of the key properties and methods for an INDArray x are as follows:

    Array Creation

    To create INDArrays you use the static factory methods of the Nd4jarrow-up-right class.

    The Nd4j.createFromArray function is overloaded to make it easy to create INDArrays from regular Java arrays. The example below uses Java double arrays. Similar create methods are overloaded for float, int and long. The Nd4j.createFromArray function has overloads up to 4d for all types.

    Nd4j can create arrays initialized with zeros and ones using the functions zeros and ones. The rand function allows you to create an array initialized with random values. The default datatype of the INDArray created is float. Some overloads allow you to set the datatype.

    Use the arange functions to create an array of evenly spaced values:

    The linspace function allows you to specify the number of points generated:
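The original snippets for these creation functions are not shown here. As a plain-Java sketch of the spacing semantics (assuming arange excludes the stop value with unit step, and linspace includes both endpoints — the class and method names below are illustrative, not the Nd4j API):

```java
public class Spacing {
    // arange(start, stop): values start, start+1, ... strictly below stop
    public static double[] arange(double start, double stop) {
        int n = (int) Math.ceil(stop - start);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = start + i;
        return out;
    }

    // linspace(start, stop, num): num evenly spaced values, endpoints included
    public static double[] linspace(double start, double stop, int num) {
        double[] out = new double[num];
        double step = (stop - start) / (num - 1);
        for (int i = 0; i < num; i++) out[i] = start + i * step;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(arange(0, 5)));
        // [0.0, 1.0, 2.0, 3.0, 4.0]
        System.out.println(java.util.Arrays.toString(linspace(1, 10, 4)));
        // [1.0, 4.0, 7.0, 10.0]
    }
}
```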

    Printing Arrays

    The INDArray supports Java's toString() method. The current implementation has limited precision and a limited number of elements. The output is similar to printing NumPy arrays:

    Basic Operations

    You will have to use INDArray methods to perform operations on your arrays. There are in-place and copy variants, each with scalar and element-wise overloads. The in-place operators return a reference to the array so you can conveniently chain operations together. Use in-place operators where possible to improve performance; copy operators have the overhead of creating a new array.

    addition: arr.add(...), arr.addi(...)
    subtraction: arr.sub(...), arr.subi(...)
    multiplication: arr.mul(...), arr.muli(...)
    division: arr.div(...), arr.divi(...)
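The copy vs. in-place distinction can be sketched in plain Java. This is an analogy, not the INDArray implementation: the copy variant allocates a fresh array and leaves its input untouched, while the in-place variant mutates its receiver and returns it, which is what makes chaining cheap.

```java
public class InPlaceVsCopy {
    // copy version, like arr.add(x): result goes in a new array
    public static double[] add(double[] a, double s) {
        double[] out = a.clone();
        for (int i = 0; i < out.length; i++) out[i] += s;
        return out;
    }

    // in-place version, like arr.addi(x): mutates and returns the same array
    public static double[] addi(double[] a, double s) {
        for (int i = 0; i < a.length; i++) a[i] += s;
        return a;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = add(a, 1.0);   // a stays {1.0, 2.0}, b is {2.0, 3.0}
        double[] c = addi(a, 1.0);  // a becomes {2.0, 3.0}; c is the same object as a
        System.out.println(b[0] + " " + a[0] + " " + (c == a)); // 2.0 2.0 true
    }
}
```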

    When you perform the basic operations you must make sure the underlying data types are the same.

    The INDArray has methods implementing reduction/accumulation operations such as sum, min, max.

    Provide a dimension argument to apply the operation across the specified dimension:

    Transform operation

    Nd4j provides familiar mathematical functions such as sin, cos, and exp. These are called transform operations. The result is returned as an INDArray.

    You can check out a complete list of transform operations in the Javadocarrow-up-right

    Matrix multiplication

    We have already seen element-wise multiplication in the basic operations. The other matrix operations have their own methods:

    Indexing, Slicing and Iterating

    Indexing, Slicing and Iterating is harder in Java than in Python. To retrieve individual values from an INDArray you can use the getDouble, getFloat or getInt methods. INDArrays cannot be indexed like Java arrays. You can get a Java array from an INDArray using toDoubleVector(), toDoubleMatrix(), toFloatVector() and toFloatMatrix().

    For multidimensional arrays you should use INDArray.get(NDArrayIndex...). The example below shows how to iterate over the rows and columns of a 2D array. Note that for 2D arrays we could have used the getColumn and getRow convenience methods.

    Shape Manipulation

    Changing the shape of an array

    The number of elements along each axis is given by the shape. The shape can be changed with various methods.

    Stacking together different arrays

    Arrays can be stacked together using the vstack and hstack methods.

    Copies and Views

    When working with INDArrays the data is not always copied. Here are three cases you should be aware of.

    No Copy at All

    Simple assignments make no copy of the data. Java passes object references, so no copy of the data is made on a method call either.

    View or Shallow Copy

    Some functions will return a view of an array.

    Deep Copy

    To make a copy of the array use the dup method. This will give you a new array with new data.
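The no-copy and deep-copy cases can be mimicked with plain Java arrays, where clone() plays the role that dup() plays for INDArrays. This is an analogy, not DL4J code:

```java
public class CopySemantics {
    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};

        // No copy at all: plain assignment aliases the same data
        double[] alias = a;
        alias[0] = 99.0;
        System.out.println(a[0]);   // 99.0 - the change is visible through a

        // Deep copy: clone() (like INDArray.dup()) gives independent data
        double[] copy = a.clone();
        copy[1] = -1.0;
        System.out.println(a[1]);   // 2.0 - the original is unaffected
    }
}
```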

    Imports

    Download Data

    To download the data, we will create a temporary directory that will store the data files, extract the tar.gz file from the url, and place it in the specified directory.

    We will then extract the data from the tar.gz file, recreate its directory structure within our temporary directory, and copy the files over.

    DataSetIterators

    Next we will convert the raw data (csv files) into DataSetIterators, which will be fed into a neural network. Our training data will have 1600 examples which will be represented by a single DataSetIterator, and the testing data will have 136 examples which will be represented by a separate DataSetIterator. The temperatures of the 10 days following the sequences in the training and testing data will be contained in a separate DataSetIterator as well.

    We first initialize CSVSequenceRecordReaders, which will parse the raw data into record-like format. Then the SequenceRecordReaderDataSetIterators can be created using the RecordReaders. Since each example has exactly 50 timesteps, an alignment mode of equal length is needed. Note also that this is a regression-based task and not a classification one.

    Neural Network

    The next task is to initialize the parameters for the convolutional LSTM neural network and then set up the neural network configuration.

    In the neural network configuration we will use the convolutional layer, subsampling layer, LSTM layer, and output layer in succession. In order to do this, we need to use the RnnToCnnPreProcessor and CnnToRnnPreProcessor. The RnnToCnnPreProcessor is used to reshape the 3-dimensional input from [batch size, height x width of grid, time series length] into a 4-dimensional shape [number of examples x time series length, channels, width, height] which is suitable as input to a convolutional layer. The CnnToRnnPreProcessor is then used in a later layer to convert this convolutional shape back to the original 3-dimensional shape.

    Model Training

    To train the model, we use 15 epochs and call the fit method of the MultiLayerNetwork.

    Model Evaluation

    We will now evaluate our trained model. Note that we will use RegressionEvaluation, since our task is a regression and not a classification task. We will only evaluate the model using the temperature of the 10 days following the given sequence of daily temperatures and not on the temperatures of the days in the sequence. This will be done using the rnnTimeStep() method of the MultiLayerNetwork.

    Here we print out the evaluation statistics.

    Sea Temperature Convolutional LSTM Example

    AlexNet

    Dl4j’s AlexNet model interpretation based on the original paper ImageNet Classification with Deep Convolutional Neural Networks and the imagenetExample code referenced. References: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdfarrow-up-right https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxtarrow-up-right

    Model is built in dl4j based on available functionality and notes indicate where there are gaps waiting for enhancements.

    Bias initialization in the paper is 1 in certain layers but 0.1 in the imagenetExample code. Weight distribution uses 0.1 std for all layers in the paper but 0.005 in the dense layers in the imagenetExample code.

    Darknet19

    [source]arrow-up-right

    Darknet19 Reference: https://arxiv.org/pdf/1612.08242.pdfarrow-up-right ImageNet weights for this model are available and have been converted from https://pjreddie.com/darknet/imagenet/arrow-up-right using https://github.com/allanzelener/YAD2K.

    There are 2 pretrained models, one for 224x224 images and one fine-tuned for 448x448 images. Call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization. The channels of the input images need to be in RGB order (not BGR), with values normalized within [0, 1]. The output labels are as per https://github.com/pjreddie/darknet/blob/master/data/imagenet.shortnames.listarrow-up-right .

    FaceNetNN4Small2

    [source]arrow-up-right

    A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832arrow-up-right Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdfarrow-up-right

    InceptionResNetV1

    [source]arrow-up-right

    A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: https://arxiv.org/abs/1503.03832arrow-up-right Also based on the OpenFace implementation: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdfarrow-up-right

    LeNet

    [source]arrow-up-right

    LeNet was an early promising achiever on the MNIST dataset. References:

    • http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdfarrow-up-right

    • https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet.prototxtarrow-up-right

    MNIST weights for this model are available and have been converted from https://github.com/f00-/mnist-lenet-kerasarrow-up-right.

    NASNet

    [source]arrow-up-right

    Implementation of NASNet-A in Deeplearning4j. NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest.

    This implementation uses 1056 penultimate filters and an input shape of (3, 224, 224). You can change this.

    Paper: https://arxiv.org/abs/1707.07012arrow-up-right ImageNet weights for this model are available and have been converted from https://keras.io/applications/arrow-up-right.

    ResNet50

    [source]arrow-up-right

    Residual networks for deep learning.

    Paper: https://arxiv.org/abs/1512.03385arrow-up-right ImageNet weights for this model are available and have been converted from https://keras.io/applications/arrow-up-right.

    SimpleCNN

    [source]arrow-up-right

    A simple convolutional network for generic image classification. Reference: https://github.com/oarriaga/face_classification/arrow-up-right

    hashtag
    SqueezeNet

    [source]arrow-up-right

    An implementation of SqueezeNet. It achieves accuracy similar to AlexNet's with a fraction of the parameters.

    Paper: https://arxiv.org/abs/1602.07360arrow-up-right ImageNet weights for this model are available and have been converted from https://github.com/rcmalli/keras-squeezenet/arrow-up-right.

    hashtag
    TextGenerationLSTM

    [source]arrow-up-right

    LSTM designed for text generation. Can be trained on a corpus of text. For this model, numClasses is

    Architecture follows this implementation: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.pyarrow-up-right

    Walt Whitman weights are available for generating text from his works, adapted from https://github.com/craigomac/InfiniteMonkeysarrow-up-right.

    hashtag
    TinyYOLO

    [source]arrow-up-right

    Tiny YOLO Reference: https://arxiv.org/pdf/1612.08242.pdfarrow-up-right

    ImageNet+VOC weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2Karrow-up-right and the following code.

    String filename = "tiny-yolo-voc.h5";
    ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
    INDArray priors = Nd4j.create(priorBoxes);

    FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
            .seed(seed)
            .iterations(iterations)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
            .gradientNormalizationThreshold(1.0)
            .updater(new Adam.Builder().learningRate(1e-3).build())
            .l2(0.00001)
            .activation(Activation.IDENTITY)
            .trainingWorkspaceMode(workspaceMode)
            .inferenceWorkspaceMode(workspaceMode)
            .build();

    ComputationGraph model = new TransferLearning.GraphBuilder(graph)
            .fineTuneConfiguration(fineTuneConf)
            .addLayer("outputs", new Yolo2OutputLayer.Builder()
                    .boundingBoxPriors(priors)
                    .build(), "conv2d_9")
            .setOutputs("outputs")
            .build();

    System.out.println(model.summary(InputType.convolutional(416, 416, 3)));

    ModelSerializer.writeModel(model, "tiny-yolo-voc_dl4j_inference.v1.zip", false);

    The channels of the 416x416 input images need to be in RGB order (not BGR), with values normalized within [0, 1].
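    The RGB-vs-BGR note matters because Java's `BufferedImage.getRGB` returns ARGB-packed ints. A minimal stdlib sketch (independent of DL4J and of any image-loading helper) of extracting channels in R, G, B order and scaling them to [0, 1]:

```java
// Extract R, G, B from a packed ARGB int (as returned by BufferedImage.getRGB)
// and scale each channel to [0, 1], in RGB order (not BGR).
public class PixelNormalize {
    static float[] toNormalizedRgb(int argb) {
        float r = ((argb >> 16) & 0xFF) / 255.0f;
        float g = ((argb >> 8) & 0xFF) / 255.0f;
        float b = (argb & 0xFF) / 255.0f;
        return new float[]{r, g, b};
    }

    public static void main(String[] args) {
        // Opaque white: all three channels at full intensity.
        float[] rgb = toNormalizedRgb(0xFFFFFFFF);
        System.out.println(rgb[0] + " " + rgb[1] + " " + rgb[2]);
    }
}
```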

    hashtag
    UNet

    [source]arrow-up-right

    U-Net

    An implementation of U-Net in Deeplearning4j, a deep learning network for image segmentation. U-Net is a convolutional network architecture for fast and precise segmentation of images; it has outperformed the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopy stacks.

    Paper: https://arxiv.org/abs/1505.04597arrow-up-right Weights trained on a synthetic image-segmentation dataset are available.

    hashtag
    VGG16

    [source]arrow-up-right

    VGG-16, from Very Deep Convolutional Networks for Large-Scale Image Recognition https://arxiv.org/abs/1409.1556arrow-up-right Deep Face Recognition http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdfarrow-up-right

    ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applicationsarrow-up-right. CIFAR-10 weights for this model are available and have been converted using “approach 2” from https://github.com/rajatvikramsingh/cifar10-vgg16arrow-up-right. VGGFace weights for this model are available and have been converted from https://github.com/rcmalli/keras-vggfacearrow-up-right.

    hashtag
    VGG19

    [source]arrow-up-right

    VGG-19, from Very Deep Convolutional Networks for Large-Scale Image Recognition https://arxiv.org/abs/1409.1556arrow-up-right ImageNet weights for this model are available and have been converted from https://github.com/fchollet/keras/tree/1.1.2/keras/applicationsarrow-up-right.

    hashtag
    Xception

    [source]arrow-up-right


    An implementation of Xception in Deeplearning4j. A novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions.

    Paper: https://arxiv.org/abs/1610.02357arrow-up-right ImageNet weights for this model are available and have been converted from https://keras.io/applications/arrow-up-right.

    hashtag
    YOLO2

    [source]arrow-up-right

    YOLOv2 Reference: https://arxiv.org/pdf/1612.08242.pdf

    ImageNet+COCO weights for this model are available and have been converted from https://pjreddie.com/darknet/yolo using https://github.com/allanzelener/YAD2K and the following code.

    The channels of the 608x608 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

    pretrainedUrl

    Default prior boxes for the model

    [source]arrow-up-right
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.LSTM;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
    import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
    import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.nd4j.linalg.dataset.DataSet;
    import org.deeplearning4j.nn.conf.preprocessor.RnnToCnnPreProcessor;
    import org.deeplearning4j.nn.conf.preprocessor.CnnToRnnPreProcessor;
    import org.deeplearning4j.nn.conf.GradientNormalization;
    import org.deeplearning4j.nn.conf.layers;
    import org.deeplearning4j.eval.RegressionEvaluation;
    import org.deeplearning4j.nn.conf.layers.ConvolutionLayer.Builder;
    import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
    import org.deeplearning4j.nn.conf.Updater;
    import org.nd4j.linalg.learning.config.AdaGrad;
    import org.nd4j.linalg.factory.Nd4j;
    import org.deeplearning4j.util.ModelSerializer;
    
    import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
    import org.datavec.api.records.reader.SequenceRecordReader;
    import org.datavec.api.split.NumberedFileInputSplit;
    
    import java.io.File;
    import java.net.URL;
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    
    import org.apache.commons.io.FilenameUtils;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    
    
    val DATA_URL = "https://dl4jdata.blob.core.windows.net/training/seatemp/sea_temp.tar.gz"
    val DATA_PATH = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), "dl4j_seas/")
    val directory = new File(DATA_PATH)
    directory.mkdir() 
    
    val archivePath = DATA_PATH + "sea_temp.tar.gz"
    val archiveFile = new File(archivePath)
    val extractedPath = DATA_PATH + "sea_temp" 
    val extractedFile = new File(extractedPath)
    
    FileUtils.copyURLToFile(new URL(DATA_URL), archiveFile) 
    var fileCount = 0
    var dirCount = 0
    val BUFFER_SIZE = 4096
    val tais = new TarArchiveInputStream(new GzipCompressorInputStream( new BufferedInputStream( new FileInputStream(archivePath))))
    
    var entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    
    while(entry != null){
        if (entry.isDirectory()) {
            new File(DATA_PATH + entry.getName()).mkdirs()
            dirCount = dirCount + 1
            fileCount = 0
        }
        else {
            
            val data = new Array[scala.Byte](4 * BUFFER_SIZE)
    
            val fos = new FileOutputStream(DATA_PATH + entry.getName());
            val dest = new BufferedOutputStream(fos, BUFFER_SIZE);
            var count = tais.read(data, 0, BUFFER_SIZE)
            
            while (count != -1) {
                dest.write(data, 0, count)
                count = tais.read(data, 0, BUFFER_SIZE)
            }
            
            dest.close()
            fileCount = fileCount + 1
        }
        if(fileCount % 1000 == 0){
            print(".")
        }
        
        entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    }
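    The inner read/write loop above is the standard fixed-size-buffer stream copy. A self-contained sketch of the same idiom, exercised here with in-memory streams rather than a tar archive:

```java
import java.io.*;

public class StreamCopy {
    // Fixed-size-buffer copy: the same read/write idiom as the
    // tar-extraction loop above, usable for any InputStream/OutputStream.
    static void copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int count;
        while ((count = in.read(buffer, 0, bufferSize)) != -1) {
            out.write(buffer, 0, count);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream dst = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream("sea surface temperature".getBytes("UTF-8")), dst, 4);
        System.out.println(dst.toString("UTF-8"));
    }
}
```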
    val path = FilenameUtils.concat(DATA_PATH, "sea_temp/") // set parent directory
    
    val featureBaseDir = FilenameUtils.concat(path, "features") // set feature directory
    val targetsBaseDir = FilenameUtils.concat(path, "targets") // set label directory
    val numSkipLines = 1;
    val regression = true;
    val batchSize = 32;
    
    val trainFeatures = new CSVSequenceRecordReader(numSkipLines, ",");
    trainFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 1, 1936));
    val trainTargets = new CSVSequenceRecordReader(numSkipLines, ",");
    trainTargets.initialize(new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 1, 1936));
    
    val train = new SequenceRecordReaderDataSetIterator(trainFeatures, trainTargets, batchSize,
                    10, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.EQUAL_LENGTH);
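    NumberedFileInputSplit expands a "%d" pattern over an inclusive integer range (1 to 1936 above, one CSV per sequence). The expansion can be sketched with String.format; the file names below are illustrative:

```java
public class NumberedSplit {
    // Expand a "%d" pattern over an inclusive index range, mirroring how
    // NumberedFileInputSplit enumerates numbered CSV files.
    static String[] expand(String pattern, int minIdx, int maxIdx) {
        String[] paths = new String[maxIdx - minIdx + 1];
        for (int i = minIdx; i <= maxIdx; i++) {
            paths[i - minIdx] = String.format(pattern, i);
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String p : expand("features/%d.csv", 1, 3)) {
            System.out.println(p);
        }
    }
}
```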
                    
                    
    val testFeatures = new CSVSequenceRecordReader(numSkipLines, ",");
    testFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 1937, 2089));
    val testTargets = new CSVSequenceRecordReader(numSkipLines, ",");
    testTargets.initialize(new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 1937, 2089));
    
    val test = new SequenceRecordReaderDataSetIterator(testFeatures, testTargets, batchSize,
                    10, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.EQUAL_LENGTH);
    val V_HEIGHT = 13;
    val V_WIDTH = 4;
    val kernelSize = 2;
    val numChannels = 1;
    val conf = new NeuralNetConfiguration.Builder()
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                    .seed(12345)
                    .weightInit(WeightInit.XAVIER)
                    .updater(new AdaGrad(0.005))
                    .list()
                    .layer(0, new ConvolutionLayer.Builder(kernelSize, kernelSize)
                            .nIn(1) //1 channel
                            .nOut(7)
                            .stride(2, 2)
                            .activation(Activation.RELU)
                            .build())
                    .layer(1, new LSTM.Builder()
                            .activation(Activation.SOFTSIGN)
                            .nIn(84)
                            .nOut(200)
                            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                            .gradientNormalizationThreshold(10)
                            .build())
                    .layer(2, new RnnOutputLayer.Builder(LossFunction.MSE)
                            .activation(Activation.IDENTITY)
                            .nIn(200)
                            .nOut(52)
                            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                            .gradientNormalizationThreshold(10)
                            .build())
                    .inputPreProcessor(0, new RnnToCnnPreProcessor(V_HEIGHT, V_WIDTH, numChannels)) // reshape each time step to a 13x4 single-channel image
                    .inputPreProcessor(1, new CnnToRnnPreProcessor(6, 2, 7)) // CNN output is 6x2 with 7 channels; flattened, 6*2*7 = 84 inputs to the LSTM
                    .build();
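    The preprocessor shapes are not arbitrary: with a 13x4 single-channel input, a 2x2 kernel, and stride 2, the unpadded convolution output is 6x2 with 7 channels, which flattens to the LSTM's nIn of 84. A quick check of that arithmetic:

```java
public class ConvOutputSize {
    // Spatial output size of a convolution without padding:
    // out = floor((in - kernel) / stride) + 1
    static int convOut(int in, int kernel, int stride) {
        return (in - kernel) / stride + 1;
    }

    public static void main(String[] args) {
        int h = convOut(13, 2, 2); // height 13 -> 6
        int w = convOut(4, 2, 2);  // width 4  -> 2
        int channels = 7;          // nOut of the convolution layer
        System.out.println(h + "x" + w + "x" + channels + " = " + (h * w * channels));
    }
}
```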
                    
    val net = new MultiLayerNetwork(conf);
    net.init();
    // Train model on training set
    net.fit(train , 25);
    val eval = net.evaluateRegression[RegressionEvaluation](test);
    
    test.reset();
    println()
    
    println( eval.stats() );
     Nd4j.getRandom().setSeed(12345);
     MultiLayerNetwork network = ModelSerializer.restoreMultiLayerNetwork(modelFile);
    public static void writeModel(@NonNull Model model, @NonNull File file, boolean saveUpdater) throws IOException 
    public static void writeModel(@NonNull Model model, @NonNull File file, boolean saveUpdater,DataNormalization dataNormalization) throws IOException 
    public static void writeModel(@NonNull Model model, @NonNull String path, boolean saveUpdater) throws IOException 
    public static void writeModel(@NonNull Model model, @NonNull OutputStream stream, boolean saveUpdater)
                throws IOException 
    public static void writeModel(@NonNull Model model, @NonNull OutputStream stream, boolean saveUpdater,DataNormalization dataNormalization)
                throws IOException 
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull File file) throws IOException 
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull File file, boolean loadUpdater)
                throws IOException 
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull InputStream is, boolean loadUpdater)
                throws IOException 
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull InputStream is) throws IOException 
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull String path) throws IOException 
    public static MultiLayerNetwork restoreMultiLayerNetwork(@NonNull String path, boolean loadUpdater)
                throws IOException 
    public static ComputationGraph restoreComputationGraph(@NonNull String path) throws IOException 
    public static ComputationGraph restoreComputationGraph(@NonNull String path, boolean loadUpdater)
                throws IOException 
    public static ComputationGraph restoreComputationGraph(@NonNull InputStream is, boolean loadUpdater)
                throws IOException 
    public static ComputationGraph restoreComputationGraph(@NonNull InputStream is) throws IOException 
    public static ComputationGraph restoreComputationGraph(@NonNull File file) throws IOException 
    public static ComputationGraph restoreComputationGraph(@NonNull File file, boolean loadUpdater) throws IOException 
    public static Task taskByModel(Model model) 
    public static void addNormalizerToModel(File f, Normalizer<?> normalizer) 
    public static void addObjectToFile(@NonNull File f, @NonNull String key, @NonNull Object o)
    val DATA_URL = "https://dl4jdata.blob.core.windows.net/training/physionet2012/physionet2012.tar.gz"
    val DATA_PATH = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), "dl4j_physionet/")
    val directory = new File(DATA_PATH)
    directory.mkdir() // create new directory at specified path
    
    val archivePath = DATA_PATH + "physionet2012.tar.gz" // set path for tar.gz file
    val archiveFile = new File(archivePath) // create tar.gz file
    val extractedPath = DATA_PATH + "physionet2012" 
    val extractedFile = new File(extractedPath)
    
    FileUtils.copyURLToFile(new URL(DATA_URL), archiveFile) // copy data from URL to file
    var fileCount = 0
    var dirCount = 0
    val BUFFER_SIZE = 4096
    
    val tais = new TarArchiveInputStream(new GzipCompressorInputStream( new BufferedInputStream( new FileInputStream(archivePath))))
    
    var entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    
    while(entry != null){
        if (entry.isDirectory()) {
            new File(DATA_PATH + entry.getName()).mkdirs()
            dirCount = dirCount + 1
            fileCount = 0
        }
        else {
            
            val data = new Array[scala.Byte](4 * BUFFER_SIZE)
    
            val fos = new FileOutputStream(DATA_PATH + entry.getName());
            val dest = new BufferedOutputStream(fos, BUFFER_SIZE);
            var count = tais.read(data, 0, BUFFER_SIZE)
            
            while (count != -1) {
                dest.write(data, 0, count)
                count = tais.read(data, 0, BUFFER_SIZE)
            }
            
            dest.close()
            fileCount = fileCount + 1
        }
        if(fileCount % 1000 == 0){
            print(".")
        }
        
        entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    }
    val NB_TRAIN_EXAMPLES = 3200 // number of training examples
    val NB_TEST_EXAMPLES = 800 // number of testing examples
    val path = FilenameUtils.concat(DATA_PATH, "physionet2012/") // set parent directory
    
    val featureBaseDir = FilenameUtils.concat(path, "sequence") // set feature directory
    val mortalityBaseDir = FilenameUtils.concat(path, "mortality") // set label directory
    
    // Load training data
    
    val trainFeatures = new CSVSequenceRecordReader(1, ",")
    trainFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 0, NB_TRAIN_EXAMPLES - 1))
    
    val trainLabels = new CSVSequenceRecordReader()
    trainLabels.initialize(new NumberedFileInputSplit(mortalityBaseDir + "/%d.csv", 0, NB_TRAIN_EXAMPLES - 1))
    
    val trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels,
                  32, 2, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END)
                  
                  
    // Load testing data
    val testFeatures = new CSVSequenceRecordReader(1, ",");
    testFeatures.initialize(new NumberedFileInputSplit(featureBaseDir + "/%d.csv", NB_TRAIN_EXAMPLES, NB_TRAIN_EXAMPLES + NB_TEST_EXAMPLES - 1));
           
    val testLabels = new CSVSequenceRecordReader();
    testLabels.initialize(new NumberedFileInputSplit(mortalityBaseDir + "/%d.csv", NB_TRAIN_EXAMPLES, NB_TRAIN_EXAMPLES  + NB_TEST_EXAMPLES - 1));
    
    val testData = new SequenceRecordReaderDataSetIterator(testFeatures, testLabels,
                    32, 2, false,SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
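    ALIGN_END matters because each patient has a single mortality label but a variable-length time series: the label is aligned to the final time step and earlier steps are masked out. A conceptual sketch of that alignment (the -1 marker is illustrative only; DL4J itself tracks masking in separate 0/1 mask arrays):

```java
import java.util.Arrays;

public class AlignEnd {
    // Place a single per-sequence label at the last time step; -1 stands in
    // for "masked" in this sketch.
    static int[] alignEnd(int label, int timeSteps) {
        int[] aligned = new int[timeSteps];
        Arrays.fill(aligned, -1);
        aligned[timeSteps - 1] = label;
        return aligned;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(alignEnd(1, 4)));
    }
}
```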
    // Set neural network parameters
    val NB_INPUTS = 86
    val NB_EPOCHS = 10
    val RANDOM_SEED = 1234
    val LEARNING_RATE = 0.005
    val BATCH_SIZE = 32
    val LSTM_LAYER_SIZE = 200
    val NUM_LABEL_CLASSES = 2
    val conf = new NeuralNetConfiguration.Builder()
            .seed(RANDOM_SEED)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Adam(LEARNING_RATE))
            .weightInit(WeightInit.XAVIER)
            .dropOut(0.25)
            .graphBuilder()
            .addInputs("trainFeatures")
            .setOutputs("predictMortality")
            .addLayer("L1", new GravesLSTM.Builder()
                    .nIn(NB_INPUTS)
                    .nOut(LSTM_LAYER_SIZE)
                    .forgetGateBiasInit(1)
                    .activation(Activation.TANH)
                    .build(),
                    "trainFeatures")
            .addLayer("predictMortality", new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT) // multi-class cross-entropy to pair with softmax
                    .activation(Activation.SOFTMAX)
                    .nIn(LSTM_LAYER_SIZE).nOut(NUM_LABEL_CLASSES).build(),"L1")
            .build()
            
    val model = new ComputationGraph(conf)
    model.init()
    model.fit(trainData, 2)
    val roc = new ROC(100);
    
    while (testData.hasNext()) {
        val batch = testData.next();
        val output = model.output(batch.getFeatures());
        roc.evalTimeSeries(batch.getLabels(), output(0));
    }
    
    println("FINAL TEST AUC: " + roc.calculateAUC());
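    ROC(100) accumulates its statistics at 100 candidate thresholds. The quantity it approximates is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which can be computed exactly for small arrays:

```java
public class RocAuc {
    // Exact ROC AUC via its rank definition: the probability that a randomly
    // chosen positive is scored above a randomly chosen negative (ties count 0.5).
    static double auc(double[] scores, int[] labels) {
        int positives = 0, negatives = 0;
        double concordant = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) continue;
            positives++;
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] == 1) continue;
                if (scores[i] > scores[j]) concordant += 1.0;
                else if (scores[i] == scores[j]) concordant += 0.5;
            }
        }
        for (int l : labels) if (l != 1) negatives++;
        return concordant / (positives * (double) negatives);
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.3, 0.1};
        int[] labels = {1, 1, 0, 0};
        System.out.println(auc(scores, labels)); // perfectly ranked
    }
}
```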
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.Updater;
    import org.deeplearning4j.nn.conf.layers.LSTM;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
    import org.deeplearning4j.nn.conf.GradientNormalization;
    import org.deeplearning4j.eval.ROC;
    import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
    import org.datavec.api.records.reader.SequenceRecordReader;
    import org.datavec.api.split.NumberedFileInputSplit;
    import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator;
    import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;
    import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.nd4j.linalg.dataset.api.MultiDataSet;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.learning.config.Adam;
    
    import java.io.File;
    import java.net.URL;
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import org.apache.commons.io.FilenameUtils;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    val DATA_URL = "https://dl4jdata.blob.core.windows.net/training/tutorials/instacart.tar.gz"
    val DATA_PATH = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), "dl4j_instacart/")
    val directory = new File(DATA_PATH)
    directory.mkdir() 
    
    val archivePath = DATA_PATH + "instacart.tar.gz"
    val archiveFile = new File(archivePath)
    val extractedPath = DATA_PATH + "instacart" 
    val extractedFile = new File(extractedPath)
    
    FileUtils.copyURLToFile(new URL(DATA_URL), archiveFile) 
    var fileCount = 0
    var dirCount = 0
    val BUFFER_SIZE = 4096
    val tais = new TarArchiveInputStream(new GzipCompressorInputStream( new BufferedInputStream( new FileInputStream(archivePath))))
    
    var entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    
    while(entry != null){
        if (entry.isDirectory()) {
            new File(DATA_PATH + entry.getName()).mkdirs()
            dirCount = dirCount + 1
            fileCount = 0
        }
        else {
            
            val data = new Array[scala.Byte](4 * BUFFER_SIZE)
    
            val fos = new FileOutputStream(DATA_PATH + entry.getName());
            val dest = new BufferedOutputStream(fos, BUFFER_SIZE);
            var count = tais.read(data, 0, BUFFER_SIZE)
            
            while (count != -1) {
                dest.write(data, 0, count)
                count = tais.read(data, 0, BUFFER_SIZE)
            }
            
            dest.close()
            fileCount = fileCount + 1
        }
        if(fileCount % 1000 == 0){
            print(".")
        }
        
        entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    }
    val path = FilenameUtils.concat(DATA_PATH, "instacart/") // set parent directory
    
    val featureBaseDir = FilenameUtils.concat(path, "features") // set feature directory
    val targetsBaseDir = FilenameUtils.concat(path, "breakfast") // set label directory
    val auxilBaseDir = FilenameUtils.concat(path, "dairy") // set futures directory
    val trainFeatures = new CSVSequenceRecordReader(1, ",");
    trainFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 1, 4000));
    
    val trainBreakfast = new CSVSequenceRecordReader(1, ",");
    trainBreakfast.initialize( new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 1, 4000));
    
    val trainDairy = new CSVSequenceRecordReader(1, ",");
    trainDairy.initialize(new NumberedFileInputSplit(auxilBaseDir + "/%d.csv", 1, 4000));
    
    val train =  new RecordReaderMultiDataSetIterator.Builder(20)
        .addSequenceReader("rr1", trainFeatures).addInput("rr1")
        .addSequenceReader("rr2",trainBreakfast).addOutput("rr2")
        .addSequenceReader("rr3",trainDairy).addOutput("rr3")
        .sequenceAlignmentMode(RecordReaderMultiDataSetIterator.AlignmentMode.ALIGN_END)
        .build();
    val testFeatures = new CSVSequenceRecordReader(1, ",");
    testFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 4001, 5000));
    
    val testBreakfast = new CSVSequenceRecordReader(1, ",");
    testBreakfast.initialize( new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 4001, 5000));
    
    val testDairy = new CSVSequenceRecordReader(1, ",");
    testDairy.initialize(new NumberedFileInputSplit(auxilBaseDir + "/%d.csv", 4001, 5000));
    
    val test =  new RecordReaderMultiDataSetIterator.Builder(20)
        .addSequenceReader("rr1", testFeatures).addInput("rr1")
        .addSequenceReader("rr2",testBreakfast).addOutput("rr2")
        .addSequenceReader("rr3",testDairy).addOutput("rr3")
        .sequenceAlignmentMode(RecordReaderMultiDataSetIterator.AlignmentMode.ALIGN_END)
        .build();
    val conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .seed(12345)
        .weightInit(WeightInit.XAVIER)
        .dropOut(0.25)
        .updater(new Adam())
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new LSTM.Builder()
            .nIn(134).nOut(150)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(10)
            .activation(Activation.TANH)
            .build(), "input")
        .addLayer("out1", new RnnOutputLayer.Builder(LossFunction.XENT)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(10)
            .activation(Activation.SIGMOID)
            .nIn(150).nOut(1).build(), "L1")
        .addLayer("out2", new RnnOutputLayer.Builder(LossFunction.XENT)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(10)
            .activation(Activation.SIGMOID)
            .nIn(150).nOut(1).build(), "L1")
        .setOutputs("out1","out2")
        .build();
    val net = new ComputationGraph(conf);
    net.init();
    net.fit( train , 5);
    // Evaluate model
    
    val roc = new ROC();
    
    test.reset();
    
    while(test.hasNext()){
        val next = test.next();
        val features =  next.getFeatures();
        val output = net.output(features(0));
        roc.evalTimeSeries(next.getLabels()(0), output(0));
    }
    
    println(roc.calculateAUC());
    import org.deeplearning4j.nn.conf.graph.MergeVertex
    import org.deeplearning4j.nn.conf.layers.{DenseLayer, LSTM, OutputLayer, RnnOutputLayer}
    import org.deeplearning4j.nn.conf.{ComputationGraphConfiguration, MultiLayerConfiguration, NeuralNetConfiguration}
    import org.deeplearning4j.nn.graph.ComputationGraph
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
    import org.deeplearning4j.nn.weights.WeightInit
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.learning.config.Nesterovs
    import org.nd4j.linalg.lossfunctions.LossFunctions
    val multiLayerConf: MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
      .seed(123)
      .updater(new Nesterovs(0.1, 0.9)) //High Level Configuration
      .list() //For configuring MultiLayerNetwork we call the list method
      .layer(0, new DenseLayer.Builder().nIn(784).nOut(100).weightInit(WeightInit.XAVIER).activation(Activation.RELU).build()) //Configuring Layers
      .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.XENT).nIn(100).nOut(10).weightInit(WeightInit.XAVIER).activation(Activation.SIGMOID).build())
      .build() //Building Configuration
    println(multiLayerConf.toJson)
    val multiLayerNetwork : MultiLayerNetwork = new MultiLayerNetwork(multiLayerConf)
    val computationGraphConf : ComputationGraphConfiguration = new NeuralNetConfiguration.Builder()
          .seed(123)
          .updater(new Nesterovs(0.1, 0.9)) //High Level Configuration
          .graphBuilder()  //For configuring ComputationGraph we call the graphBuilder method
          .addInputs("input") //Configuring Layers
          .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
          .addLayer("out1", new OutputLayer.Builder().lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD).nIn(4).nOut(3).build(), "L1")
          .addLayer("out2", new OutputLayer.Builder().lossFunction(LossFunctions.LossFunction.MSE).nIn(4).nOut(2).build(), "L1")
          .setOutputs("out1","out2")
          .build() //Building configuration
    println(computationGraphConf.toJson)
    val computationGraph : ComputationGraph = new ComputationGraph(computationGraphConf)
    //You can add regularization in the higher level configuration in the network 
    // configuring a regularization algorithm -> 'l1()', l2()' etc as shown 
    // below:
    new NeuralNetConfiguration.Builder()
        .l2(1e-4)
    //When creating layers, you can add a dropout connection by using
    // 'dropOut(<retain_probability>)'; in DL4J the value is the probability
    // of retaining an activation (0.8 keeps 80% of activations)
    new NeuralNetConfiguration.Builder()
        .list() 
        .layer(0, new DenseLayer.Builder().dropOut(0.8).build())
    //You can initialize the bias of a particular layer by using 
    // 'biasInit(<init_value>)'
    new NeuralNetConfiguration.Builder()
        .list() 
        .layer(0, new DenseLayer.Builder().biasInit(0).build())
    val cgConf1 : ComputationGraphConfiguration = new NeuralNetConfiguration.Builder()
            .graphBuilder()
            .addInputs("input") //can use any label for this
            .addLayer("L1", new LSTM.Builder().nIn(5).nOut(5).build(), "input")
            .addLayer("L2",new RnnOutputLayer.Builder().nIn(5+5).nOut(5).build(), "input", "L1")
            .setOutputs("L2")
            .build();
    //Here MergeVertex concatenates the layer outputs
    val cgConf2 : ComputationGraphConfiguration = new NeuralNetConfiguration.Builder()
            .graphBuilder()
            .addInputs("input1", "input2")
            .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
            .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
            .addVertex("merge", new MergeVertex(), "L1", "L2")
            .addLayer("out", new OutputLayer.Builder().nIn(4+4).nOut(3).build(), "merge")
            .setOutputs("out")
            .build();
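    The MergeVertex above concatenates the activations of L1 and L2 along the feature dimension, which is why the output layer declares nIn(4+4). Per example, the operation is plain vector concatenation:

```java
import java.util.Arrays;

public class Merge {
    // Concatenate two activation vectors along the feature dimension,
    // as MergeVertex does for each example in the minibatch.
    static double[] merge(double[] a, double[] b) {
        double[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        double[] l1 = {1, 2, 3, 4}; // 4 activations from "L1"
        double[] l2 = {5, 6, 7, 8}; // 4 activations from "L2"
        System.out.println(merge(l1, l2).length); // 4 + 4 = 8 -> nIn(4+4)
    }
}
```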
    val cgConf3 : ComputationGraphConfiguration = new NeuralNetConfiguration.Builder()
            .graphBuilder()
            .addInputs("input")
            .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
            .addLayer("out1", new OutputLayer.Builder()
                    .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nIn(4).nOut(3).build(), "L1")
            .addLayer("out2", new OutputLayer.Builder()
                    .lossFunction(LossFunctions.LossFunction.MSE)
                    .nIn(4).nOut(2).build(), "L1")
            .setOutputs("out1","out2")
            .build();
    SDVariable y = sameDiff.math.sin(x);
    SDVariable y = samediff.math().sin(x);
    SDVariable linear = weights.mmul("matrix_product", input).add(bias); 
    SDVariable output = sameDiff.nn.sigmoid("output", linear);
    SDVariable z = x.add(y);
    SDVariable regressionCost = weights.mmul(input).add("regression_prediction", bias).squaredDifference(labels);
    SDVariable activation = sameDiff.math.cube(input);
    SDVariable matrixNorm1 = sameDiff.math.max(sameDiff.math.sum(sameDiff.math.abs(matrix), 1));
    double mean = 0.;
    double deviation = 0.05;
    long[] shape = new long[28, 28];
    SDVariable noise_mnist = sameDiff.random.normal("noise_mnist", mean, deviation, shape);
    SDVariable noise_audio = sameDiff.random.normal("noise_audio", mean, deviation, windowShape);
    SDVariable denseReluLayer = sameDiff.nn.reluLayer(input, weights, bias);
    SDVariable linear = sameDiff.nn.linear(input, weight, bias);
    SDVariable output = sameDiff.nn.softmax(linear);
    Conv2DConfig config2d = Conv2DConfig.builder().kW(3).kH(3).pW(2).pH(2).build();
    SDVariable convolution2dLinear = sameDiff.cnn.conv2d(input, weights, config2d);
    SDVariable convolution2dOutput = sameDiff.nn.relu(convolution2dLinear);
    SRUConfiguration sruConfig = new SRUConfiguration(input, weights, bias, init);
    SDVariable sruOutput = samediff.rnn().sru(sruConfig);
    SDVariable logLoss = sameDiff.loss.logLoss("logLoss", label, predictions);
    SDVariable wLogLossMean = sameDiff.loss.logLoss("wLogLossMean", label, predictions, weights, LossReduce.MEAN_BY_WEIGHT);
    SDVariable x = sameDiff0.var(DataType.FLOAT, 1);
    SDVariable y = sameDiff1.placeHolder(DataType.FLOAT, 1);
    SDVariable z = x.add(y);
    SDVariable z = x.add(y);
    //DON'T!!!
    z.mul(2);
    x = z.mul(y);
    SDVariable z = x.add(y);
    SDVariable _2z = z.mul(2);
    w = _2z.mul(y);
    import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
    import org.deeplearning4j.eval.ROC;
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.Updater;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.datavec.api.records.reader.RecordReader;
    import org.datavec.api.split.FileSplit;
    import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.eval.Evaluation;
    import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;
    import org.nd4j.linalg.dataset.api.MultiDataSet;
    import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
    import org.nd4j.linalg.lossfunctions.LossFunctions;
    import org.deeplearning4j.nn.conf.graph.MergeVertex;
    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.nd4j.linalg.learning.config.Adam;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import java.io.File;
    import java.net.URL;
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import org.apache.commons.io.FilenameUtils;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    val DATA_URL = "https://dl4jdata.blob.core.windows.net/training/tutorials/Cloud.tar.gz"
    val DATA_PATH = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), "dl4j_cloud/")
    val directory = new File(DATA_PATH)
    directory.mkdir() 
    
    val archivePath = DATA_PATH + "Cloud.tar.gz"
    val archiveFile = new File(archivePath)
    val extractedPath = DATA_PATH + "Cloud" 
    val extractedFile = new File(extractedPath)
    
    FileUtils.copyURLToFile(new URL(DATA_URL), archiveFile) 
    
    var fileCount = 0
    var dirCount = 0
    val BUFFER_SIZE = 4096
    
    val tais = new TarArchiveInputStream(new GzipCompressorInputStream( new BufferedInputStream( new FileInputStream(archivePath))))
    
    var entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    
    while(entry != null){
        if (entry.isDirectory()) {
            new File(DATA_PATH + entry.getName()).mkdirs()
            dirCount = dirCount + 1
            fileCount = 0
        }
        else {
            
            val data = new Array[scala.Byte](4 * BUFFER_SIZE)
    
            val fos = new FileOutputStream(DATA_PATH + entry.getName());
            val dest = new BufferedOutputStream(fos, BUFFER_SIZE);
            var count = tais.read(data, 0, BUFFER_SIZE)
            
            while (count != -1) {
                dest.write(data, 0, count)
                count = tais.read(data, 0, BUFFER_SIZE)
            }
            
            dest.close()
            fileCount = fileCount + 1
        }
        if(fileCount % 1000 == 0){
            print(".")
        }
        
        entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    }
    val path = FilenameUtils.concat(DATA_PATH, "Cloud/") // set parent directory
    
    val trainBaseDir1 = FilenameUtils.concat(path, "train/n1/train.csv") 
    val trainBaseDir2 = FilenameUtils.concat(path, "train/n2/train.csv")
    val trainBaseDir3 = FilenameUtils.concat(path, "train/n3/train.csv")
    val trainBaseDir4 = FilenameUtils.concat(path, "train/n4/train.csv")
    val trainBaseDir5 = FilenameUtils.concat(path, "train/n5/train.csv") 
    
    val testBaseDir1 = FilenameUtils.concat(path, "test/n1/test.csv")
    val testBaseDir2 = FilenameUtils.concat(path, "test/n2/test.csv")
    val testBaseDir3 = FilenameUtils.concat(path, "test/n3/test.csv")
    val testBaseDir4 = FilenameUtils.concat(path, "test/n4/test.csv") 
    val testBaseDir5 = FilenameUtils.concat(path, "test/n5/test.csv")
    val rrTrain1 = new CSVRecordReader(1);
    rrTrain1.initialize(new FileSplit(new File(trainBaseDir1)));
    val rrTrain2 = new CSVRecordReader(1);
    rrTrain2.initialize(new FileSplit(new File(trainBaseDir2)))
    
    val rrTrain3 = new CSVRecordReader(1);
    rrTrain3.initialize(new FileSplit(new File(trainBaseDir3)))
    
    val rrTrain4 = new CSVRecordReader(1);
    rrTrain4.initialize(new FileSplit(new File(trainBaseDir4)))
    
    val rrTrain5 = new CSVRecordReader(1);
    rrTrain5.initialize(new FileSplit(new File(trainBaseDir5)))
    
    
    val trainIter = new RecordReaderMultiDataSetIterator.Builder(20)
            .addReader("rr1",rrTrain1)
            .addReader("rr2",rrTrain2)
            .addReader("rr3",rrTrain3)
            .addReader("rr4",rrTrain4)
            .addReader("rr5",rrTrain5)
            .addInput("rr1", 1, 3)
            .addInput("rr2", 0, 2)
            .addInput("rr3", 0, 2)
            .addInput("rr4", 0, 2)
            .addInput("rr5", 0, 2)
            .addOutputOneHot("rr1", 0, 2)
            .build();
    val rrTest1 = new CSVRecordReader(1);
    rrTest1.initialize(new FileSplit(new File(testBaseDir1)));
    
    val rrTest2 = new CSVRecordReader(1);
    rrTest2.initialize(new FileSplit(new File(testBaseDir2)));
    
    val rrTest3 = new CSVRecordReader(1);
    rrTest3.initialize(new FileSplit(new File(testBaseDir3)));
    
    val rrTest4 = new CSVRecordReader(1);
    rrTest4.initialize(new FileSplit(new File(testBaseDir4)));
    
    val rrTest5 = new CSVRecordReader(1);
    rrTest5.initialize(new FileSplit(new File(testBaseDir5)));
    
    val testIter = new RecordReaderMultiDataSetIterator.Builder(20)
            .addReader("rr1",rrTest1)
            .addReader("rr2",rrTest2)
            .addReader("rr3",rrTest3)
            .addReader("rr4",rrTest4)
            .addReader("rr5",rrTest5)
            .addInput("rr1", 1, 3)
            .addInput("rr2", 0, 2)
            .addInput("rr3", 0, 2)
            .addInput("rr4", 0, 2)
            .addInput("rr5", 0, 2)
            .addOutputOneHot("rr1", 0, 2)
            .build();
    val conf = new NeuralNetConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Adam())
            .graphBuilder()
            .addInputs("input1", "input2", "input3", "input4", "input5")
            .addLayer("L1", new DenseLayer.Builder()
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .nIn(3).nOut(50)
                .build(), "input1")
            .addLayer("L2", new DenseLayer.Builder()
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .nIn(3).nOut(50)
                .build(), "input2")
            .addLayer("L3", new DenseLayer.Builder()
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .nIn(3).nOut(50)
                .build(), "input3")
            .addLayer("L4", new DenseLayer.Builder()
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .nIn(3).nOut(50)
                .build(), "input4")
            .addLayer("L5", new DenseLayer.Builder()
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .nIn(3).nOut(50)
                .build(), "input5")
            .addVertex("merge", new MergeVertex(), "L1", "L2", "L3", "L4", "L5")
            .addLayer("L6", new DenseLayer.Builder()
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)
                .nIn(250).nOut(125).build(), "merge")
            .addLayer("out", new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.MCXENT)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.SOFTMAX)
                .nIn(125)
                .nOut(2).build(), "L6")
            .setOutputs("out")
            .build();
    val model = new ComputationGraph(conf);
    model.init()
    model.fit( trainIter, 5 );
    val roc = model.evaluateROC[ROC](testIter, 100)
    println("FINAL TEST AUC: " + roc.calculateAUC());
    git clone https://github.com/RobAltena/HelloNd4J.git
    
    cd HelloNd4J
    
    mvn install
    
    mvn exec:java -Dexec.mainClass="HelloNd4j"
    [         0,         0]
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.api.buffer.DataType;
    
    INDArray x = Nd4j.zeros(3,4);
    
    // The number of axes (dimensions) of the array.
    int dimensions = x.rank();
    
    // The dimensions of the array. The size in each dimension.
    long[] shape = x.shape();
    
    // The total number of elements.
    long length = x.length();
    
    // The type of the array elements. 
    DataType dt = x.dataType();
    double arr_2d[][]={{1.0,2.0,3.0},{4.0,5.0,6.0},{7.0,8.0,9.0}};
    INDArray x_2d = Nd4j.createFromArray(arr_2d);
    
    double arr_1d[]={1.0,2.0,3.0};
    INDArray  x_1d = Nd4j.createFromArray(arr_1d);
    INDArray  x = Nd4j.zeros(5);
    //[         0,         0,         0,         0,         0], FLOAT
    
    int [] shape = {5};
    x = Nd4j.zeros(DataType.DOUBLE, 5);
    //[         0,         0,         0,         0,         0], DOUBLE
    
    // For higher dimensions you can provide a shape array. 2D random matrix example:
    int rows = 4;
    int cols = 5;
    int[] shape = {rows, cols};
    INDArray x = Nd4j.rand(shape);
    INDArray  x = Nd4j.arange(5);
    // [         0,    1.0000,    2.0000,    3.0000,    4.0000]
    
    INDArray  x = Nd4j.arange(2, 7);
    // [    2.0000,    3.0000,    4.0000,    5.0000,    6.0000]
    INDArray  x = Nd4j.linspace(1, 10, 5); //start, stop, count.
    // [    1.0000,    3.2500,    5.5000,    7.7500,   10.0000]
    
    // Evaluate a function over many points.
    import static org.nd4j.linalg.ops.transforms.Transforms.sin;
    INDArray  x = Nd4j.linspace(0.0, Math.PI, 100, DataType.DOUBLE);
    INDArray  y = sin(x);
    INDArray  x = Nd4j.arange(6);  //1d array
    System.out.println(x);  //We just give the output of the print command from here on.
    // [         0,    1.0000,    2.0000,    3.0000,    4.0000,    5.0000]
    
    int [] shape = {4,3};
    x = Nd4j.arange(12).reshape(shape);   //2d array
    /*
    [[         0,    1.0000,    2.0000], 
     [    3.0000,    4.0000,    5.0000], 
     [    6.0000,    7.0000,    8.0000], 
     [    9.0000,   10.0000,   11.0000]]
    */
    
    int [] shape2 = {2,3,4};
    x = Nd4j.arange(24).reshape(shape2);  //3d array
    /*
    [[[         0,    1.0000,    2.0000,    3.0000], 
      [    4.0000,    5.0000,    6.0000,    7.0000], 
      [    8.0000,    9.0000,   10.0000,   11.0000]], 
    
     [[   12.0000,   13.0000,   14.0000,   15.0000], 
      [   16.0000,   17.0000,   18.0000,   19.0000], 
      [   20.0000,   21.0000,   22.0000,   23.0000]]]
    */
    //Copy
    arr_new = arr.add(scalar);    // return a new array with scalar added to each element of arr.
    arr_new = arr.add(other_arr); // return a new array with element wise addition of arr and other_arr.
    
    //in place.
    arr_new = arr.addi(scalar); //Heads up: arr_new points to the same array as arr.
    arr_new = arr.addi(other_arr);
    int [] shape = {5};
    INDArray  x = Nd4j.zeros(shape, DataType.DOUBLE);
    INDArray  x2 = Nd4j.zeros(shape, DataType.INT);
    INDArray  x3 = x.add(x2);
    // java.lang.IllegalArgumentException: Op.X and Op.Y must have the same data type, but got INT vs DOUBLE
    
    // casting x2 to DOUBLE solves the problem:
    INDArray x3 = x.add(x2.castTo(DataType.DOUBLE));
    int [] shape = {2,3};
    INDArray  x = Nd4j.rand(shape);
    x;
    x.sum();
    x.min();
    x.max();
    /*
    [[    0.8621,    0.9224,    0.8407], 
     [    0.1504,    0.5489,    0.9584]]
    4.2830
    0.1504
    0.9584
    */
    INDArray x = Nd4j.arange(12).reshape(3, 4);
    /*
    [[         0,    1.0000,    2.0000,    3.0000], 
     [    4.0000,    5.0000,    6.0000,    7.0000], 
     [    8.0000,    9.0000,   10.0000,   11.0000]]
    */        
    
    x.sum(0); // Sum of each column.
    //[   12.0000,   15.0000,   18.0000,   21.0000]
    
    x.min(1); // Min of each row
    //[         0,    4.0000,    8.0000]
    
    x.cumsum(1); // cumulative sum across each row,
    /*
    [[         0,    1.0000,    3.0000,    6.0000], 
     [    4.0000,    9.0000,   15.0000,   22.0000], 
     [    8.0000,   17.0000,   27.0000,   38.0000]]
    */
    import static org.nd4j.linalg.ops.transforms.Transforms.exp;
    import static org.nd4j.linalg.ops.transforms.Transforms.sqrt;
    
    INDArray x = Nd4j.arange(3);
    // [         0,    1.0000,    2.0000]
    exp(x);
    // [    1.0000,    2.7183,    7.3891]
    sqrt(x);
    // [         0,    1.0000,    1.4142]
    INDArray x = Nd4j.arange(12).reshape(3, 4);
    /*
    [[         0,    1.0000,    2.0000,    3.0000], 
     [    4.0000,    5.0000,    6.0000,    7.0000], 
     [    8.0000,    9.0000,   10.0000,   11.0000]]
    */
    
    INDArray y = Nd4j.arange(12).reshape(4, 3);
    /*
    [[         0,    1.0000,    2.0000], 
     [    3.0000,    4.0000,    5.0000], 
     [    6.0000,    7.0000,    8.0000], 
     [    9.0000,   10.0000,   11.0000]]
    */
    
    x.mmul(y);  // matrix product.
    /*
    [[   42.0000,   48.0000,   54.0000], 
     [  114.0000,  136.0000,  158.0000], 
     [  186.0000,  224.0000,  262.0000]]
    */
    
    // dot product (requires the static import below).
    import static org.nd4j.linalg.ops.transforms.Transforms.dot;
    INDArray x = Nd4j.arange(12);
    INDArray y = Nd4j.arange(12);
    dot(x, y);
    //506.0000
    INDArray x = Nd4j.arange(12);
    // [         0,    1.0000,    2.0000,    3.0000,    4.0000,    5.0000,    6.0000,    7.0000,    8.0000,    9.0000,   10.0000,   11.0000]
    
    float f = x.getFloat(3);  // Single element access. Other methods: getDouble, getInt, ...
    // 3.0
    
    float []  fArr = x.toFloatVector(); //Convert to Java array.
    // [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
    
    INDArray x2 = x.get(NDArrayIndex.interval(2, 6));
    // [    2.0000,    3.0000,    4.0000,    5.0000]
    
    // On a copy of x: From start to position 6, exclusive, set every 2nd element to -1.0
    INDArray y = x.dup();
    y.get(NDArrayIndex.interval(0, 2, 6)).assign(-1.0);
    //[   -1.0000,    1.0000,   -1.0000,    3.0000,   -1.0000,    5.0000,    6.0000,    7.0000,    8.0000,    9.0000,   10.0000,   11.0000]
    
    // reversed copy of y.
    INDArray y2 = Nd4j.reverse(y.dup());
    //[   11.0000,   10.0000,    9.0000,    8.0000,    7.0000,    6.0000,    5.0000,   -1.0000,    3.0000,   -1.0000,    1.0000,   -1.0000]
    // Iterate over the rows and columns of a 2d arrray.
    int rows = 4;
    int cols = 5;
    int[] shape = {rows, cols};
    
    INDArray x = Nd4j.rand(shape);
    /*
    [[    0.2228,    0.2871,    0.3880,    0.7167,    0.9951], 
     [    0.7181,    0.8106,    0.9062,    0.9291,    0.5115], 
     [    0.5483,    0.7515,    0.3623,    0.7797,    0.5887], 
     [    0.6822,    0.7785,    0.4456,    0.4231,    0.9157]]
    */
    
    for (int row=0; row<rows; row++) {
        INDArray y = x.get(NDArrayIndex.point(row), NDArrayIndex.all());
        }
    /*
    [    0.2228,    0.2871,    0.3880,    0.7167,    0.9951]
    [    0.7181,    0.8106,    0.9062,    0.9291,    0.5115]
    [    0.5483,    0.7515,    0.3623,    0.7797,    0.5887]
    [    0.6822,    0.7785,    0.4456,    0.4231,    0.9157]
    */
    
    for (int col=0; col<cols; col++) {
        INDArray y = x.get(NDArrayIndex.all(), NDArrayIndex.point(col));
        }
    /*
    [    0.2228,    0.7181,    0.5483,    0.6822]
    [    0.2871,    0.8106,    0.7515,    0.7785]
    [    0.3880,    0.9062,    0.3623,    0.4456]
    [    0.7167,    0.9291,    0.7797,    0.4231]
    [    0.9951,    0.5115,    0.5887,    0.9157]
    */
    INDArray x = Nd4j.rand(3,4);
    x.shape();
    // [3, 4]
    
    INDArray x2 = x.ravel();
    x2.shape();
    // [12]
    
    INDArray x3 = x.reshape(6,2);
    x3.shape();
    //[6, 2]
    
    // Be aware that x, x2, and x3 share the same data. 
    x2.putScalar(5, -1.0);
    
    System.out.println( x);
    /*
    [[    0.0270,    0.3799,    0.5576,    0.3086], 
     [    0.2266,   -1.0000,    0.1107,    0.4895], 
     [    0.8431,    0.6011,    0.2996,    0.7500]]
    */
    
    System.out.println( x2);
    // [    0.0270,    0.3799,    0.5576,    0.3086,    0.2266,   -1.0000,    0.1107,    0.4895,    0.8431,    0.6011,    0.2996,    0.7500]
    
    System.out.println( x3);
    /*        
    [[    0.0270,    0.3799], 
     [    0.5576,    0.3086], 
     [    0.2266,   -1.0000], 
     [    0.1107,    0.4895], 
     [    0.8431,    0.6011], 
     [    0.2996,    0.7500]]
    */
    INDArray x = Nd4j.rand(2,2);
    INDArray y = Nd4j.rand(2,2);
    
    x
    /*
    [[    0.1462,    0.5037], 
     [    0.1418,    0.8645]]
    */
    
    y;
    /*
    [[    0.2305,    0.4798], 
     [    0.9407,    0.9735]]
    */
    
    Nd4j.vstack(x, y);
    /*
    [[    0.1462,    0.5037], 
     [    0.1418,    0.8645], 
     [    0.2305,    0.4798], 
     [    0.9407,    0.9735]]
    */
    
    Nd4j.hstack(x, y);
    /*
    [[    0.1462,    0.5037,    0.2305,    0.4798], 
     [    0.1418,    0.8645,    0.9407,    0.9735]]
    */
    INDArray x = Nd4j.rand(2,2);
    INDArray y = x; // y and x point to the same INDArray object.
    
    public static void f(INDArray x){
        // No copy is made. Any changes to x are visible after the function call.
        }
    INDArray x = Nd4j.rand(3,4);
    INDArray  x2 = x.ravel();
    INDArray  x3 = x.reshape(6,2);
    
    x2.putScalar(5, -1.0); // Changes x, x2 and x3
    
    x
    /*
    [[    0.8546,    0.1509,    0.0331,    0.1308], 
     [    0.1753,   -1.0000,    0.2277,    0.1998], 
     [    0.2741,    0.8257,    0.6946,    0.6851]]
    */
    
    x2
    // [    0.8546,    0.1509,    0.0331,    0.1308,    0.1753,   -1.0000,    0.2277,    0.1998,    0.2741,    0.8257,    0.6946,    0.6851]
    
    x3
    /*
    [[    0.8546,    0.1509], 
     [    0.0331,    0.1308], 
     [    0.1753,   -1.0000], 
     [    0.2277,    0.1998], 
     [    0.2741,    0.8257], 
     [    0.6946,    0.6851]]
    */
    INDArray x = Nd4j.rand(3,4);
    INDArray  x2 = x.ravel().dup();
    
    x2.putScalar(5, -1.0); // Now only changes x2.
    
    x
    /*
    [[    0.1604,    0.0322,    0.8910,    0.4604], 
     [    0.7724,    0.1267,    0.1617,    0.7586], 
     [    0.6117,    0.5385,    0.1251,    0.6886]]
    */
    
    x2
    // [    0.1604,    0.0322,    0.8910,    0.4604,    0.7724,   -1.0000,    0.1617,    0.7586,    0.6117,    0.5385,    0.1251,    0.6886]
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.LSTM;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
    import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
    import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.nd4j.linalg.dataset.DataSet;
    import org.deeplearning4j.nn.conf.preprocessor.RnnToCnnPreProcessor;
    import org.deeplearning4j.nn.conf.preprocessor.CnnToRnnPreProcessor;
    import org.deeplearning4j.nn.conf.GradientNormalization;
    import org.deeplearning4j.nn.conf.layers.*;
    import org.deeplearning4j.eval.RegressionEvaluation;
    import org.deeplearning4j.nn.conf.layers.ConvolutionLayer.Builder;
    import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
    import org.nd4j.linalg.learning.config.Adam;
    import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
    
    import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
    import org.datavec.api.records.reader.SequenceRecordReader;
    import org.datavec.api.split.NumberedFileInputSplit;
    
    import org.nd4j.linalg.indexing.NDArrayIndex;
    import org.nd4j.linalg.factory.Nd4j;
    
    import java.io.File;
    import java.net.URL;
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    
    import org.apache.commons.io.FilenameUtils;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    val DATA_URL = "https://dl4jdata.blob.core.windows.net/training/seatemp/sea_temp2.tar.gz"
    val DATA_PATH = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), "dl4j_seas/")
    val directory = new File(DATA_PATH)
    directory.mkdir() 
    
    val archivePath = DATA_PATH + "sea_temp2.tar.gz"
    val archiveFile = new File(archivePath)
    val extractedPath = DATA_PATH + "sea_temp" 
    val extractedFile = new File(extractedPath)
    
    FileUtils.copyURLToFile(new URL(DATA_URL), archiveFile) 
    var fileCount = 0
    var dirCount = 0
    val BUFFER_SIZE = 4096
    val tais = new TarArchiveInputStream(new GzipCompressorInputStream( new BufferedInputStream( new FileInputStream(archivePath))))
    
    var entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    
    while(entry != null){
        if (entry.isDirectory()) {
            new File(DATA_PATH + entry.getName()).mkdirs()
            dirCount = dirCount + 1
            fileCount = 0
        }
        else {
            
            val data = new Array[scala.Byte](4 * BUFFER_SIZE)
    
            val fos = new FileOutputStream(DATA_PATH + entry.getName());
            val dest = new BufferedOutputStream(fos, BUFFER_SIZE);
            var count = tais.read(data, 0, BUFFER_SIZE)
            
            while (count != -1) {
                dest.write(data, 0, count)
                count = tais.read(data, 0, BUFFER_SIZE)
            }
            
            dest.close()
            fileCount = fileCount + 1
        }
        if(fileCount % 1000 == 0){
            print(".")
        }
        
        entry = tais.getNextEntry().asInstanceOf[TarArchiveEntry]
    }
    val path = FilenameUtils.concat(DATA_PATH, "sea_temp/") // set parent directory
    
    val featureBaseDir = FilenameUtils.concat(path, "features") // set feature directory
    val targetsBaseDir = FilenameUtils.concat(path, "targets") // set label directory
    val futureBaseDir = FilenameUtils.concat(path, "futures") // set futures directory
    val numSkipLines = 1;
    val regression = true;
    val batchSize = 32;
    
    val trainFeatures = new CSVSequenceRecordReader(numSkipLines, ",");
    trainFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 1, 1600));
    val trainTargets = new CSVSequenceRecordReader(numSkipLines, ",");
    trainTargets.initialize(new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 1, 1600));
    
    val train = new SequenceRecordReaderDataSetIterator(trainFeatures, trainTargets, batchSize,
                    10, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.EQUAL_LENGTH);
                    
                    
    val testFeatures = new CSVSequenceRecordReader(numSkipLines, ",");
    testFeatures.initialize( new NumberedFileInputSplit(featureBaseDir + "/%d.csv", 1601, 1736));
    val testTargets = new CSVSequenceRecordReader(numSkipLines, ",");
    testTargets.initialize(new NumberedFileInputSplit(targetsBaseDir + "/%d.csv", 1601, 1736));
    
    val test = new SequenceRecordReaderDataSetIterator(testFeatures, testTargets, batchSize,
                    10, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.EQUAL_LENGTH);
                    
    val futureFeatures = new CSVSequenceRecordReader(numSkipLines, ",");
    futureFeatures.initialize( new NumberedFileInputSplit(futureBaseDir + "/%d.csv", 1601, 1736));
    val futureLabels = new CSVSequenceRecordReader(numSkipLines, ",");
    futureLabels.initialize(new NumberedFileInputSplit(futureBaseDir + "/%d.csv", 1601, 1736));
    
    val future = new SequenceRecordReaderDataSetIterator(futureFeatures, futureLabels, batchSize,
                    10, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.EQUAL_LENGTH);
    val V_HEIGHT = 13;
    val V_WIDTH = 4;
    val kernelSize = 2;
    val numChannels = 1;
    val conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .seed(12345)
        .updater(new Adam())
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(0, new ConvolutionLayer.Builder(kernelSize, kernelSize)
            .nIn(numChannels) //1 channel
            .nOut(7)
            .stride(2, 2)
            .activation(Activation.RELU)
            .build())
        .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
            .kernelSize(kernelSize, kernelSize)
            .stride(2, 2).build())
        .layer(2, new LSTM.Builder()
            .activation(Activation.SOFTSIGN)
            .nIn(21)
            .nOut(100)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(10)
            .build())
        .layer(3, new RnnOutputLayer.Builder(LossFunction.MSE)
            .activation(Activation.IDENTITY)
            .nIn(100)
            .nOut(52)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(10)
            .build())
        .inputPreProcessor(0, new RnnToCnnPreProcessor(V_HEIGHT, V_WIDTH, numChannels))
        .inputPreProcessor(2, new CnnToRnnPreProcessor(3, 1, 7 ))
        .build();
        
    val net = new MultiLayerNetwork(conf);
    net.init();
    // Train model on training set
    net.fit( train, 15 );
    val eval = new RegressionEvaluation();
    
    test.reset();
    future.reset();
    
    while(test.hasNext()) {
        val next = test.next();
        val features = next.getFeatures();
    
        var pred =  Nd4j.zeros(1, 2);
    
        for(i <- 0 to 49){
            pred = net.rnnTimeStep(features.get(NDArrayIndex.all(), NDArrayIndex.all(), NDArrayIndex.interval(i,i+1)));
        }
    
        val correct = future.next();
        val cFeatures = correct.getFeatures();
    
        for(i <- 0 to 9){
            eval.evalTimeSeries(pred, cFeatures.get(NDArrayIndex.all(), NDArrayIndex.all(), NDArrayIndex.interval(i,i+1)));
            pred = net.rnnTimeStep(pred);
        }
        net.rnnClearPreviousState();
    }
    println(eval.stats())
    String filename = "yolo.h5";
    KerasLayer.registerCustomLayer("Lambda", KerasSpaceToDepth.class);
    ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
    INDArray priors = Nd4j.create(priorBoxes);
    FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
     .seed(seed)
     .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
     .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
     .gradientNormalizationThreshold(1.0)
     .updater(new Adam.Builder().learningRate(1e-3).build())
     .l2(0.00001)
     .activation(Activation.IDENTITY)
     .trainingWorkspaceMode(workspaceMode)
     .inferenceWorkspaceMode(workspaceMode)
     .build();
    ComputationGraph model = new TransferLearning.GraphBuilder(graph)
     .fineTuneConfiguration(fineTuneConf) 
     .addLayer("outputs", new Yolo2OutputLayer.Builder()
                          .boundingBoxPriors(priors)
                          .build(), "conv2d_23")
     .setOutputs("outputs")
     .build();
    System.out.println(model.summary(InputType.convolutional(608, 608, 3)));
    ModelSerializer.writeModel(model, "yolo2_dl4j_inference.v1.zip", false);
    public String pretrainedUrl(PretrainedType pretrainedType) 
    Bottom right: Standard deviations (vs. time) of: activations, gradients and updates
    Histograms of parameters and updates, for each parameter type
  • Learning rate vs. time (note this will be flat, unless learning rate schedules are used)

  • Note that data that isn't shuffled (i.e., each minibatch contains only one class, for classification) can result in very rough or abnormal-looking score vs. iteration graphs

  • Some noise in this line chart is expected (i.e., the line will go up and down within a small range). However, if the variation in scores between runs is very large, this can be a problem

    • The issues mentioned above (learning rate, normalization, data shuffling) may contribute to this.

    • Setting the minibatch size to a very small number of examples can also contribute to noisy score vs. iteration graphs, and might lead to optimization difficulties

  • Note that this is a rough guide only, and may not be appropriate for all networks. It's often a good starting point, however.

  • If the ratio diverges significantly from this (for example, > -2 (i.e., 10^-2 = 0.01) or < -4 (i.e., 10^-4 = 0.0001)), your parameters may be too unstable to learn useful features, or may change too slowly to learn useful features

  • To change this ratio, adjust your learning rate (or sometimes, parameter initialization). In some networks, you may need to set the learning rate differently for different layers.

  • Keep an eye out for unusually large spikes in the ratio: this may indicate exploding gradients

  • Keep an eye out for parameters that are diverging to +/- infinity: this may be due to too high a learning rate, or insufficient regularization (try adding some L2 regularization to your network).

  • Keep an eye out for biases that become very large. This can sometimes occur in the output layer for classification, if the distribution of classes is very imbalanced

  • Exploding gradients are problematic as they can 'mess up' the parameters of your network
  • In this case, it may indicate a weight initialization, learning rate or input/labels data normalization issue

  • In the case of recurrent neural networks, adding some gradient normalization or gradient clipping may help
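The update:parameter ratio guideline above can be sketched in plain Java (a stand-in for what the DL4J UI computes per parameter type; the parameter and update arrays here are hypothetical values):

```java
// Plain-Java sketch of the log10 update:parameter magnitude ratio.
// The arrays are hypothetical; in practice these come from the network.
public class UpdateRatio {
    static double meanAbs(double[] v) {
        double s = 0;
        for (double x : v) s += Math.abs(x);
        return s / v.length;
    }

    public static void main(String[] args) {
        double[] params  = {0.5, -0.3, 0.8, -0.1};              // hypothetical weights
        double[] updates = {0.0005, -0.0002, 0.0009, -0.0001};  // hypothetical updates
        double ratio = meanAbs(updates) / meanAbs(params);
        double log10Ratio = Math.log10(ratio);                  // chart target: about -3
        System.out.printf("log10(update/param) = %.2f%n", log10Ratio);
    }
}
```

A value far above -2 or far below -4 on this scale is the warning sign described above.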

  • Deeplearning4j UI: The Overview Page
    Deeplearning4j UI: The Model Page
    Deeplearning4J UI and Spark Training
    Using the UI to Tune Your Network
    TSNE and Word2Vec
    Fixing UI Issue: "No configuration setting" exception
    Visualizing Network Training with the Deeplearning4j Training UI

  • functions: This module has the basic building blocks to build SameDiff variables and graphs.

  • execution: has everything related to SameDiff graph execution.

  • gradcheck: Utility functionality for checking SameDiff gradients, similar in structure to the respective tool in DL4J.

  • loss: Loss functions for SameDiff

  • samediff: Main SameDiff module to define, set up and run SameDiff operations and graphs.

    Differential functions in the functions module

    See the functions module on GitHub.

    The central abstraction of the functions module is DifferentialFunction, which underlies pretty much everything in SameDiff. Mathematically, what we're doing in SameDiff is build a directed acyclic graph whose nodes are differential functions, for which we can compute gradients. In that regard, DifferentialFunction makes up a SameDiff graph on a fundamental level.

    Note that each DifferentialFunction comes with a SameDiff instance. We'll discuss SameDiff and this relationship later on. Also, while there are only a few key abstractions, they're essentially used everywhere, so it's almost impossible to discuss SameDiff concepts separately. Eventually we'll get around to each part.

    Properties and mappings

    Each differential function comes with properties. In the simplest case, a differential function just has a name. Depending on the operation in question, you'll usually have many more properties (think strides or kernel sizes in convolutions). When we import computation graphs from other projects (TensorFlow, ONNX, etc.) these properties need to be mapped to the conventions we're using internally. The methods attributeAdaptersForFunction, mappingsForFunction, propertiesForFunction and resolvePropertiesFromSameDiffBeforeExecution are what you want to look at to get started.

    Once properties are defined and properly mapped, you call initFromTensorFlow and initFromOnnx for TensorFlow and ONNX import, respectively. More on this later, when we discuss building SameDiff operations.

    Inputs and outputs

    A differential function is executed on a list of inputs, using function properties, and produces one or more output variables. You have access to many helper functions to set or access these variables:

    • args(): returns all input variables.

    • arg(): returns the first input variable (the only one for unary operations).

    • larg() and rarg(): return the first and second (read "left" and "right") argument for binary operations

    • outputVariables(): returns a list of all output variables. Depending on the operation, this may be computed dynamically. As we'll see later on, to get the result for ops with a single output, we'll call .outputVariables()[0].

    Handling output variables is tricky and one of the pitfalls in using and extending SameDiff. For instance, implementing calculateOutputShape for a differential function might be necessary, but if implemented incorrectly can lead to hard-to-debug failures. (Note that SameDiff will eventually call op execution in libnd4j and dynamic custom ops either infer output shapes or need to be provided with the correct output shape.)

    Automatic differentiation

    Automatic differentiation for a differential function is implemented in a single method: doDiff. Each operation has to provide an implementation of doDiff. If you're implementing a SameDiff operation for a libnd4j op x and you're lucky enough to find x_bp (as in "back-propagation"), you can use that and your doDiff implementation comes essentially for free.

    You'll also see a diff implementation that's used internally and calls doDiff.
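As a toy illustration of the doDiff idea (plain Java, not the actual SameDiff/DifferentialFunction API): an op computes its forward value and, in its doDiff analogue, returns gradients with respect to its inputs - here via the product rule for multiplication.

```java
// Toy sketch of the doDiff concept, NOT the SameDiff API: a multiply "op"
// that can compute both its forward value and the gradients w.r.t. its inputs.
public class ToyMulOp {
    final double a, b;                    // the op's two inputs

    ToyMulOp(double a, double b) { this.a = a; this.b = b; }

    double forward() { return a * b; }    // op execution

    // doDiff analogue: given the gradient flowing in from the output,
    // return [dL/da, dL/db] via the product rule.
    double[] doDiff(double outputGrad) {
        return new double[] { outputGrad * b, outputGrad * a };
    }

    public static void main(String[] args) {
        ToyMulOp op = new ToyMulOp(3.0, 4.0);
        System.out.println(op.forward());               // 12.0
        double[] grads = op.doDiff(1.0);
        System.out.println(grads[0] + ", " + grads[1]); // 4.0, 3.0
    }
}
```

In SameDiff, nodes like this are wired into a directed acyclic graph, and the gradients returned by each doDiff are chained backwards through it.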

    Differential function factory

    Importantly, each differential function has access to a factory, an instance of DifferentialFunctionFactory, by calling f(). More precisely, this will return the factory of the SameDiff instance the differential function has:

    This is used in many places and gives you access to all differential functions currently registered in SameDiff. Think of this factory as a provider of operations. Here's an example of exposing sum to the DifferentialFunctionFactory:

    We leave out the function arguments on purpose here. Note that all we do is redirect to the Sum operation defined elsewhere in ND4J and then return the first output variable (of type SDVariable, discussed in a second). Disregarding the implementation details for now, what this allows you to do is call f().sum(...) from anywhere you have access to a differential function factory. For instance, when implementing a SameDiff op x and you already have x_bp in your function factory, you can override doDiff for x.

    Building and executing graphs in samediff

    See the samediff module on GitHub.

    Not surprisingly, this is where the magic happens. This module has the core structures that SameDiff operates with. First, let's have a look at the variables that make up SameDiff operations.

    SameDiff variables

    SDVariable (read SameDiff variable) extends DifferentialFunction and is to SameDiff what INDArray is to good old ND4J. In particular, SameDiff graphs operate on these variables and each individual operation takes in and spits out a list of SDVariable. An SDVariable comes with a name, is equipped with a SameDiff instance, has shape information and knows how to initialize itself with an ND4J WeightInitScheme. You'll also find a few helpers to set and get these properties.

    One of the few things an SDVariable can do that a DifferentialFunction can't is evaluate its result and return an underlying INDArray by calling eval(). This will run SameDiff internally and retrieve the result. A similar getter is getArr() which you can call at any point to get the current value of this variable. This functionality is used extensively in testing, to assert proper results. An SDVariable also has access to its current gradient through gradient(). Upon initialization there won't be any gradient, it will usually be computed at a later point.

    Apart from these methods, SDVariable also carries methods for concrete ops (and is in that regard a little similar to DifferentialFunctionFactory). For instance, defining add as follows:

    allows you to call c = a.add(b) on two SameDiff variables, the result of which can be accessed by c.eval().

    SameDiff

    The SameDiff class is the main workhorse of the module and brings together most of the concepts discussed so far. A little unfortunately, the inverse is also true and SameDiff instances are part of all other SameDiff module abstractions in some way or the other (which is why you've seen it many times already). Generally speaking, SameDiff is the main entry point for automatic differentiation and you use it to define a symbolic graph that carries operations on SDVariables. Once built, a SameDiff graph can be run in a few ways, for instance exec() and execAndEndResult().

    Convince yourself that invoking SameDiff() sets up a million things! Essentially, SameDiff will collect and give you access (in terms of both getters and setters) to

    • All differential functions for the graph, with all their properties, which can be accessed in various ways (e.g. name or id).

    • All inputs and output information for said functions.

    • All function properties and how to map them, propertiesToResolve and propertiesForFunction are of particular note.

    SameDiff is also the place where you expose new operations to the SameDiff module. Essentially, you write a little wrapper for the respective operation in the DifferentialFunctionFactory instance f(). Here's an example for cross products:

    SameDiff execution examples and tests

    At this point it might be a good idea to check out and run a few examples. SameDiff tests are a good source for that. Here's an example of how to multiply two SameDiff variables

    This example is taken from SameDiffTests, one of the main test sources, in which you also find a few complete end-to-end examples.

    The second place you find tests is the samediff repo directory. Whenever you add a new operation to SameDiff, add tests for the forward pass and gradient checks as well.

    The third set of relevant tests is stored in imports and contains tests for importing TensorFlow and ONNX graphs. On a side note, the resources for these import tests are generated in our TFOpsTests project.

    Creating and exposing new SameDiff ops

    We've seen how ND4J operations get picked up by DifferentialFunctionFactory and SameDiff to expose them to SameDiff at various levels. As for actually implementing these ops, you need to know a few things. In libnd4j you find two classes of operations, which are described here in detail. We'll show how to implement both op types.

    All operations go here, and most of the time it's obvious where exactly to put the ops. Special attention goes to layers, which is reserved for deep learning layer implementations (like Conv2D). These higher-level ops are based on the concept of Modules, similar to modules in pytorch or layers in TensorFlow. These layer op implementations also provide a source of more involved op implementations.

    Implementing legacy operations

    Legacy (or XYZ) operations are the old breed of ND4J operations with a characteristic "xyz" signature. Here's how to implement cosine in ND4J by wrapping the cos legacy op from libnd4j: Cosine implementation. When it comes to SameDiff, the good thing about legacy ops is that they're already available in ND4J, but need to be augmented by SameDiff-specific functionality to pass muster. Since the cosine function does not have any properties, this implementation is straightforward. The parts that make this op SameDiff compliant are:

    • You specify SameDiff constructors here

    • You implement doDiff here.

    • You specify a SameDiff opName, a TensorFlow tensorflowName and an ONNX onnxName.

    If you look closely, this is only part of the truth, since Cos extends BaseTransformOp, which implements other SameDiff functionality. (Note that BaseTransformOp is a BaseOp, which extends DifferentialFunction from earlier.) For instance, calculateOutputShape is implemented there. If you want to implement a new transform, you can simply inherit from BaseTransformOp, too. For other op types like reductions etc. there are op base classes available as well, meaning you only need to address the three bullet points above.

    In the rare case you need to write a legacy op from scratch, you'll have to find the respective op number from libnd4j, which can be found in legacy_ops.h.

    Implementing Dynamic Custom Operations

    DynamicCustomOp is the new kind of operation from libnd4j and all recent additions are implemented as such. This operation type in ND4J directly extends DifferentialFunction.

    Here's an example of the BatchToSpace operation, which inherits from DynamicCustomOp:

    • BatchToSpace is initialized with two properties, blocks and crops. Note how blocks and crops, which are both of integer type, get added to integer arguments for the operation by calling addIArgument. For float arguments and other types, use addTArgument instead.

    • The operation gets its own name and import names,

    • and doDiff is implemented.

    The BatchToSpace operation is then integrated into DifferentialFunctionFactory here, exposed to SameDiff here and tested here.

    The only thing BatchToSpace is currently missing is property mapping. We call the properties for this operation blocks and crops, but in ONNX or TensorFlow they might be called and stored quite differently. To look up the differences and map them correctly, see ops.proto for TensorFlow and onnxops.json for ONNX.

    Let's look at another operation that does property mapping right, namely DynamicPartition. This op has precisely one property, called numPartitions in SameDiff. To map and use this property, you do the following:

    • Implement a little helper method called addArgs that is used in the constructor of the op and in an import helper one-liner that we're discussing next. It's not necessary, but encouraged to do this and call it addArgs consistently, for clarity.

    • Override the initFromTensorFlow method that maps properties for us using a TFGraphMapper instance and adds arguments with addArgs. Note that since ONNX does not support dynamic partitioning at the time of this writing (hence no onnxName), there's also no initFromOnnx method (which would work pretty much the same way as initFromTensorFlow).

    • For the TensorFlow import to work, we also need to provide a property mapping (see mappingsForFunction). This example of a mapping is very simple: all it does is map TensorFlow's property name num_partitions to our name numPartitions.

    Note that while DynamicPartition has proper property mapping, it currently does not have a working doDiff implementation.

    As a last example, we show one that has a little more interesting property mapping setup, namely Dilation2D. Not only does this op have far more properties to map, as you can see in mappingsForFunction, the properties also come with property values, as defined in attributeAdaptersForFunction. We've chosen to show this op because it is one that has property mapping, but is neither exposed to DifferentialFunctionFactory nor SameDiff.

    Hence, the three DynamicCustomOp examples shown each come with their own defects and represent examples of the work that has to be done for SameDiff. To summarize, to add a new SameDiff op you need to:

    • Create a new operation in ND4J that extends DifferentialFunction. How exactly this implementation is set up depends on the

      • op generation (legacy vs. dynamic custom)

      • op type (transform, reduction, etc.)

    • Define its own op name, as well as TensorFlow and ONNX names.

    • Define necessary SameDiff constructors

    • Use addArgs to add op arguments in a reusable way.

    • Expose the operation in DifferentialFunctionFactory first, then wrap it in SameDiff (or SDVariable for variable methods).

    • Implement doDiff for automatic differentiation.

    • Override mappingsForFunction to map properties for TensorFlow and ONNX

    • If necessary, also provide an attribute adapter by overriding attributeAdaptersForFunction.

    • Add import one-liners for TensorFlow and ONNX by adding initFromTensorFlow and initFromOnnx (using addArgs).

    • Test, test, test.


    Convolutional Networks

    Train FaceNet Using Center Loss

    Deep learning is the de facto standard for face recognition. In 2015, Google researchers published FaceNet: A Unified Embedding for Face Recognition and Clustering, which set a new record for accuracy of 99.63% on the LFW dataset. An important aspect of FaceNet is that it made face recognition more practical by using embeddings to learn a mapping of face features to a compact Euclidean space (basically, you input an image and get a small 1D array from the network). FaceNet was an adapted version of an Inception-style network.

    Around the same time FaceNet was being developed, other research groups were making significant advances in facial recognition. DeepID3, for example, achieved impressive results. Oxford's Visual Geometry Group published Deep Face Recognition. Note that the Deep Face Recognition paper has a comparison of previous papers, and one key factor in FaceNet is the number of images used to train the network: 200 million.

    Introducing center loss

    FaceNet is difficult to train, partially because of how it uses triplet loss. This required exotic architectures that either set up three models in tandem, or required stacking of examples and unstacking with additional nodes to calculate loss based on Euclidean similarity. Center loss was later introduced as a promising technique that adds an intraclass component to the training loss function.

    The advantage of training embeddings with center loss is that an exotic architecture is no longer required. In addition, because hardware is better utilized, the amount of time it takes to train embeddings is much shorter. One important distinction when using center loss vs. a triplet loss architecture is that a center loss layer stores its own parameters. These parameters calculate the intraclass “center” of all examples for each label.

    What are we going to learn in this tutorial?

    Using Deeplearning4j, you will learn how to train embeddings for facial recognition and transfer parameters to a new network that uses the embeddings for feed forward. The network will be built using ComputationGraph (Inception-type networks require multiple nodes) via a hand-tuned, parameter-minimized variant of FaceNet.

    Because Inception networks are large, we will use the Deeplearning4j model zoo to help build our network.

    Imports

    Instantiate the model

    We are using a minified version of the full FaceNet network to reduce the hardware requirements. Below, we use the FaceNetHelper class for some of the Inception blocks, where the parameters are unchanged from the larger version.

    Print the configuration

    To see that center loss is already in the model configuration, you can print a string table of all layers in the network. Use the summary() method to get a complete summary of all layers and parameters. You'll see that our network here has over 5 million parameters; this is still quite low compared to advanced ImageNet configurations, but will still be taxing on your hardware.

    Using the LFW iterator

    The LFWDataSetIterator, like most of the Deeplearning4j built-in iterators, extends the DataSetIterator class. This API allows for the simple instantiation of datasets and automatic downloading of data in the background. If you are unfamiliar with using DL4J’s built-in iterators, there’s a tutorial describing their usage.

    Classifier training

    With the network configuration set up and instantiated along with the LFW test/train iterators, training takes just a few lines of code. Since we have a labelled dataset and are using center loss, this is considered "classifier training" and is a supervised learning process. Earlier we attached a ScoreIterationListener to the model by using the setListeners() method. Its output is printed to the console since the internals of Deeplearning4j use SLF4J for logging.

    After each epoch, we will evaluate how well the network is learning by using the evaluate() method. Although in this example we only use accuracy() and precision(), it is strongly recommended you perform advanced evaluation with ROC curves and understand the output of a confusion matrix.

    Transferring the parameters

    Now that the network has been trained, using the embeddings requires removing the center loss output layer. Deeplearning4j has a native transfer learning API to assist.

    Benchmark Guide

    General guidelines for benchmarking in DL4J and ND4J.

    General Benchmarking Guidelines

    Guideline 1: Run Warm-Up Iterations Before Benchmarking

    A warm-up period is where you run a number of iterations (for example, a few hundred) of your benchmark without timing, before commencing timing for further iterations.

    Why is a warm-up required? The first few iterations of any ND4J/DL4J execution may be slower than those that come later, for a number of reasons:

    1. In the initial benchmark iterations, the JVM has not yet had time to perform just-in-time compilation of code. Once JIT has completed, code is likely to execute faster for all subsequent operations

    2. ND4J and DL4J (and some other libraries) have some degree of lazy initialization: the first operation may trigger some one-off execution code.

    3. DL4J or ND4J (when using workspaces) can take some iterations to learn memory requirements for execution. During this learning phase, performance will be lower than after its completion.

    Guideline 2: Run Multiple Iterations of All Benchmarks

    Your benchmark isn't the only thing running on your computer (not to mention if you are using cloud hardware, that might have shared resources). And operation runtime is not perfectly deterministic.

    For benchmark results to be reliable, it is important to run multiple iterations - and ideally report both mean and standard deviation for the runtime. Without this, it's impossible to compare the performance of operations, as performance differences may simply be due to random variation.
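Guidelines 1 and 2 can be combined into a minimal harness sketch (plain Java; work() is a hypothetical stand-in for the operation you actually want to measure):

```java
import java.util.Arrays;

// Minimal benchmark harness sketch: warm-up iterations first (JIT, lazy
// init), then timed iterations reported as mean and standard deviation.
public class BenchHarness {
    static double work() {                       // hypothetical workload
        double s = 0;
        for (int i = 1; i <= 100_000; i++) s += Math.sqrt(i);
        return s;
    }

    public static void main(String[] args) {
        int warmup = 200, runs = 30;
        for (int i = 0; i < warmup; i++) work();  // untimed warm-up

        double[] times = new double[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            work();
            times[i] = (System.nanoTime() - t0) / 1e6;  // milliseconds
        }
        double mean = Arrays.stream(times).average().orElse(0);
        double var = Arrays.stream(times)
                           .map(t -> (t - mean) * (t - mean))
                           .average().orElse(0);
        System.out.printf("mean = %.3f ms, stddev = %.3f ms%n", mean, Math.sqrt(var));
    }
}
```

Reporting both mean and standard deviation makes it possible to tell a real performance difference from run-to-run noise.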

    Guideline 3: Pay Careful Attention to What You Are Benchmarking

    This is especially important when comparing frameworks. Before you declare that "performance on operation X is Y" or "A is faster than B", make sure that:

    You are bench-marking only the operations of interest.

    If your goal is to check the performance of an operation, make sure that only this operation is being timed.

    You should carefully check whether you are unintentionally including other things - for example, does it include: JVM initialization time? Library initialization time? Result array allocation time? Garbage collection time? Data loading time?

    Ideally, these should be excluded from any timing/performance results you report. If they cannot be excluded, make sure you note this whenever making performance claims.

    1. What native libraries are you using? For example: what BLAS implementation (MKL, OpenBLAS, etc)? If you are using CUDA, are you using CuDNN? ND4J and DL4J can use these libraries (MKL, CuDNN) when they are available - but are not always available by default. If they are not made available, performance can be lower - sometimes considerably.

      This is especially important when comparing results between libraries: for example, if you compared two libraries (one using OpenBLAS, another using MKL) your results may simply reflect the performance differences in the BLAS library being used - and not the performance of the libraries being tested. Similarly, one library with CuDNN and another without CuDNN may simply reflect the performance benefit of using CuDNN.

    2. How are things configured? For better or worse, DL4J and ND4J allow a lot of configuration. The default values for a lot of this configuration are adequate for most users - but sometimes manual configuration is required for optimal performance. This can be especially true in some benchmarks! Some of these configuration options allow users to trade off higher memory use for better performance, for example. Some configuration options of note: (a) memory configuration (b) workspaces and garbage collection (c) CuDNN (d) DL4J cache mode.

    Guideline 4: Focus on Real-World Use Cases - And Run a Range of Sizes

    Consider, for example, a benchmark that adds two numbers:

    And something equivalent in ND4J:
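A sketch of the comparison (the ND4J analogue is shown only as a comment; Nd4j.scalar and add are real ND4J calls, but the point is the overhead around such calls, not these exact lines):

```java
public class AddCompare {
    public static void main(String[] args) {
        // Plain Java: a single add, fused to one instruction once JIT-compiled.
        double x = 1.0, y = 2.0;
        double z = x + y;
        System.out.println(z);  // 3.0

        // ND4J analogue (illustrative, not executed here):
        //   INDArray a = Nd4j.scalar(1.0);
        //   INDArray b = Nd4j.scalar(2.0);
        //   INDArray c = a.add(b);  // method call + validation + native call
    }
}
```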

    Of course, the ND4J benchmark above is going to be much slower - method calls are required, input validation is performed, native code has to be called (with context switching overhead), and so on. One must ask the question, however: is this what users will actually be doing with ND4J or an equivalent linear algebra library? It's an extreme example - but the general point is a valid one.

    Note also that performance on mathematical operations can be size - and shape - specific. For example, if you are benchmarking the performance on matrix multiplication - the matrix dimensions can matter a lot. In some internal benchmarks, we found that different BLAS implementations (MKL vs OpenBLAS) - and different backends (CPU vs GPU) - can perform very differently with different matrix dimensions. None of the BLAS implementations (OpenBLAS, MKL, CUDA) we have tested internally were uniformly faster than others for all input shapes and sizes.

    Therefore - whenever you are running benchmarks, it's important to run those benchmarks with multiple different input shapes/sizes, to get the full performance picture.

    Guideline 5: Understand Your Hardware

    When comparing different hardware, it's important to be aware of what it excels at. For example, you might find that neural network training performs faster on a CPU with minibatch size 1 than on a GPU - yet larger minibatch sizes show exactly the opposite. Similarly, small layer sizes may not be able to adequately utilize the power of a GPU.

    Furthermore, some deep learning distributions may need to be specifically compiled to provide support for hardware features such as AVX2 (note that recent versions of ND4J are packaged with binaries for CPUs that support these features). When running benchmarks, the utilization (or lack thereof) of these features can make a considerable difference to performance.

    Guideline 6: Make It Reproducible

    When running benchmarks, it's important to make your benchmarks reproducible. Why? Good or bad performance may only occur under certain limited circumstances.

    And finally - remember that (a) ND4J and DL4J are in constant development, and (b) benchmarks do sometimes identify performance bottlenecks (after all, ND4J includes literally hundreds of distinct operations). If you identify a performance bottleneck, great - we want to know about it, so we can fix it. Any time a potential bottleneck is identified, we first need to reproduce it - so that we can study it, understand it and ultimately fix it.

    Guideline 7: Understand the Limitations of Your Benchmarks

    Linear algebra libraries contain hundreds of distinct operations. Neural network libraries contain dozens of layer types. When benchmarking, it's important to understand the limitations of those benchmarks. Benchmarking one type of operation or layer cannot tell you anything about the performance on other types of layers or operations - unless they share code that has been identified to be a performance bottleneck.

    Guideline 8: If You Aren't Sure - Ask

    The DL4J/ND4J developers are available on Discourse, where you can ask questions about benchmarking and performance.

    And if you do happen to find a performance issue - let us know!

    ND4J Specific Benchmarking

    A Note on BLAS and Array Orders

    BLAS - or Basic Linear Algebra Subprograms - refers to an interface and set of methods used for linear algebra operations. Some examples include 'gemm' - General Matrix Multiplication - and 'axpy', which implements Y = a*X + Y.

    ND4J can use multiple BLAS implementations - versions up to and including 1.0.0-beta6 have defaulted to OpenBLAS. However, if Intel MKL (free versions are available) is installed and available, ND4J will link with it for improved performance in many BLAS operations.

    Note that ND4J will log the BLAS backend used when it initializes. For example:

    Performance can depend on the available BLAS library - in internal tests, we have found that OpenBLAS has been between 30% faster and 8x slower than MKL - depending on the array sizes and array orders.

    Regarding array orders: this also matters for performance. ND4J can represent arrays in either row major ('c') or column major ('f') order. Performance in operations such as matrix multiplication - but also more general ND4J operations - depends on the input and result array orders.

    For matrix multiplication, this means there are 8 possible combinations of array orders (c/f for each of input 1, input 2 and result arrays). Performance won't be the same for all cases.

    Similarly, an operation such as element-wise addition (i.e., z=x+y) will be much faster for some combinations of input orders than others - notably, when x, y and z are all the same order. In short, this is due to memory striding: it's cheaper to read a sequence of memory addresses when those memory addresses are adjacent to each other in memory, as compared to being spread far apart.
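The striding point can be illustrated with a rough plain-Java analogue. (Java 2D arrays are arrays of row objects, so the column-order loop jumps between row arrays rather than striding through one flat buffer, but the access-locality idea is the same.)

```java
// Summing a 2D array row-by-row (adjacent addresses) vs column-by-column
// (scattered addresses). Both produce the same sum; on large arrays the
// row-major loop is typically faster because of cache-friendly access.
public class StrideDemo {
    static double sumRowMajor(double[][] m) {
        double s = 0;
        for (int i = 0; i < m.length; i++)        // contiguous reads
            for (int j = 0; j < m[i].length; j++)
                s += m[i][j];
        return s;
    }

    static double sumColMajor(double[][] m) {
        double s = 0;
        for (int j = 0; j < m[0].length; j++)     // scattered reads
            for (int i = 0; i < m.length; i++)
                s += m[i][j];
        return s;
    }

    public static void main(String[] args) {
        int n = 512;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = i + j;
        System.out.println(sumRowMajor(m) == sumColMajor(m)); // true
    }
}
```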

    Note that, by default, ND4J expects result arrays (for matrix multiplication) to be defined in column major ('f') order, to be consistent across backends, given that CuBLAS (i.e., NVIDIA's BLAS library for CUDA) requires results to be in f order. As a consequence, some ways of performing matrix multiplication with the result array being in c order will have lower performance than if the same operation was executed with an 'f' order array.

    Finally, when it comes to CUDA: array orders/striding can matter even more than when running on CPU. For example, certain combinations of orders can be much faster than others - and input/output dimensions that are even multiples of 32 or 64 typically perform faster (sometimes considerably) than when input/output dimensions are not multiples of 32.

    DL4J Specific Benchmarking

    Most of what has been said for ND4J also applies to DL4J.

    In addition:

    1. If you are using the nd4j-native (CPU) backend, ensure you are using Intel MKL. This is faster than the default of OpenBLAS in most cases.

    2. If you are using CUDA, ensure you are using CuDNN.

    3. Check the memory and workspace configuration guides. The defaults are usually good - but sometimes better performance can be obtained with some tweaking. This is especially important if you have a lot of Java objects (such as Word2Vec vectors) in memory while training.

    Common Benchmark Mistakes

    Finally, here's a summary list of common benchmark mistakes:

    1. Not using the latest version of ND4J/DL4J (there's no point identifying a bottleneck that was fixed many releases back). Consider trying snapshots to get the latest performance improvements.

    2. Not paying attention to what native libraries (MKL, OpenBLAS, CuDNN etc) are being used

    3. Providing no warm-up period before benchmarking begins

    How to Run Deeplearning4j Benchmarks - A Guide

    Total training time is always ETL plus computation. That is, both the data pipeline and the matrix manipulations determine how long a neural network takes to train on a dataset.

    When programmers familiar with Python try to run benchmarks comparing Deeplearning4j to well-known Python frameworks, they usually end up comparing ETL + computation on DL4J to just computation on the Python framework. That is, they're comparing apples to oranges. We'll explain how to optimize several parameters below.

    The JVM has knobs to tune, and if you know how to tune them, you can make it a very fast environment for deep learning. There are several things to keep in mind on the JVM. You need to:

• Increase the heap space

    • Get garbage collection right

    • Make ETL asynchronous

    • Presave datasets (aka pickling)

    Setting Heap Space

    Users have to reconfigure their JVMs themselves, including setting the heap space. We can't give it to you preconfigured, but we can show you how to do it. Here are the two most important knobs for heap space.

    • Xms sets the minimum heap space

    • Xmx sets the maximum heap space

You can set these in IDEs like IntelliJ and Eclipse, as well as on the command line, like so:

    java -Xms256m -Xmx1024m YourClassNameHere

In IntelliJ, this is a VM parameter, not a program argument. When you hit run in IntelliJ (the green button), that sets up a run-time configuration. IJ starts a Java VM for you with the configurations you specify.

What’s the ideal amount to set Xmx to? That depends on how much RAM is on your computer. In general, allocate as much heap space as you think the JVM will need to get work done. Let’s say you’re on a 16G RAM laptop: allocate 8G of RAM to the JVM. A sound minimum on laptops with less RAM would be 3g, so:

    java -Xmx3g

It may seem counterintuitive, but you want the min and max to be the same; i.e. Xms should equal Xmx. If they are unequal, the JVM will progressively allocate more memory as needed until it reaches the max, and that process of gradual allocation slows things down. You want to pre-allocate it at the beginning. So:

    java -Xms3g -Xmx3g YourClassNameHere

IntelliJ will automatically specify the Java main class in question.

Another way to do this is by setting your environment variables. Here, you would alter your hidden .bash_profile file, which adds environment variables to bash. To see those variables, enter env in the command line. To add more heap space, enter this command in your console:

    echo 'export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=512m"' >> ~/.bash_profile

    We need to increase heap space because Deeplearning4j loads data in the background, which means we're taking more RAM in memory. By allowing more heap space for the JVM, we can cache more data in memory.

    Garbage Collection

A garbage collector is a program which runs on the JVM and gets rid of objects no longer used by a Java application. It is automatic memory management. Creating a new object in Java takes on-heap memory: even an empty Java object takes a minimum of 16 bytes on a typical 64-bit JVM (object header plus alignment padding). So every new DataSetIterator you create consumes additional heap memory of its own.

You may need to alter the garbage collection algorithm that Java is using. This can be done via the command line like so:

    java -XX:+UseG1GC

Better garbage collection increases throughput. For a more detailed exploration of the issue, please read this InfoQ article.

DL4J is tightly linked to the garbage collector. JavaCPP, the bridge between the JVM and C++, adheres to the heap space you set with Xmx and works extensively with off-heap memory. The off-heap memory will not surpass the amount of heap space you specify.

    JavaCPP, created by a Skymind engineer, relies on the garbage collector to tell it what has been done. We rely on the Java GC to tell us what to collect; the Java GC points at things, and we know how to de-allocate them with JavaCPP. This applies equally to how we work with GPUs.

The larger the batch size you use, the more RAM you consume.

    ETL & Asynchronous ETL

    In our dl4j-examples repo, we don't make the ETL asynchronous, because the point of examples is to keep them simple. But for real-world problems, you need asynchronous ETL, and we'll show you how to do it with examples.

    Data is stored on disk and disk is slow. That’s the default. So you run into bottlenecks when loading data onto your hard drive. When optimizing throughput, the slowest component is always the bottleneck. For example, a distributed Spark job using three GPU workers and one CPU worker will have a bottleneck with the CPU. The GPUs have to wait for that CPU to finish.

The Deeplearning4j class DataSetIterator hides the complexity of loading data from disk. The code for invoking any DataSetIterator looks the same, but different implementations work differently:

    • one loads from disk

    • one loads asynchronously

    • one loads pre-saved from RAM

Here's how the DataSetIterator is uniformly invoked for MNIST:

    while(mnistTest.hasNext()){
            DataSet ds = mnistTest.next();
            INDArray output = model.output(ds.getFeatures(), false);
            eval.eval(ds.getLabels(), output);
    }

    You can optimize by using an asynchronous loader in the background. Java can do real multi-threading. It can load data in the background while other threads take care of compute. So you load data into the GPU at the same time that compute is being run. The neural net trains even as you grab new data from memory.

This is the relevant code, in particular the third line:

    MultiDataSetIterator iterator;
    if (prefetchSize > 0 && source.asyncSupported()) {
        iterator = new AsyncMultiDataSetIterator(source, prefetchSize);
    } else iterator = source;

There are actually two types of asynchronous dataset iterators. The AsyncDataSetIterator is what you would use most of the time. It's described in the Javadoc.

For special cases such as recurrent neural nets applied to time series, or for computation graphs, you would use an AsyncMultiDataSetIterator, described in the Javadoc.

Notice in the code above that prefetchSize is another parameter to set. A normal batch size might be 1,000 examples; with prefetchSize set to 3, the iterator pre-fetches 3 minibatches (3,000 examples).
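To make the pattern concrete, here is a minimal, self-contained sketch of asynchronous prefetching using only the JDK. It is not DL4J's AsyncDataSetIterator, but it illustrates the same producer/consumer idea: a background thread fills a bounded queue (the prefetch buffer) while the "training" thread consumes from it. The class and method names here are hypothetical, for illustration only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncPrefetchSketch {

    // Stand-in for "loading minibatch i from disk".
    static int loadBatch(int i) {
        return i * i;
    }

    // A background producer fills a bounded queue (the prefetch buffer) while
    // the caller - playing the role of the training thread - drains it.
    static int consumeAll(int numBatches, int prefetchSize) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(prefetchSize);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < numBatches; i++) {
                    queue.put(loadBatch(i));   // blocks when the buffer is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < numBatches; i++) {
                sum += queue.take();           // "training" consumes each batch
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(10, 3));
    }
}
```

DL4J's AsyncDataSetIterator wraps an existing iterator in this style, with prefetchSize controlling the buffer capacity.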

    ETL: Comparing Python frameworks With Deeplearning4j

In Python, programmers are converting their data into pickles, or binary data objects. And if they're working with a smallish toy dataset, they're loading all those pickles into RAM. So they're effectively sidestepping a major task in dealing with larger datasets. At the same time, when benchmarking against DL4J, they typically don't pre-load the data into RAM on the DL4J side. So they're effectively comparing DL4J's training computation + ETL time against only the training computation time of the Python frameworks.

    But Java has robust tools for moving big data, and if compared correctly, is much faster than Python. The Deeplearning4j community has reported up to 3700% increases in speed over Python frameworks, when ETL and computation are optimized.

Deeplearning4j uses DataVec as its ETL and vectorization library. Unlike other deep-learning tools, DataVec does not force a particular format on your dataset. (Caffe forces you to use hdf5, for example.)

    We try to be more flexible. That means you can point DL4J at raw photos, and it will load the image, run the transforms and put it into an NDArray to generate a dataset on the fly.

    But if your training pipeline is doing that every time, Deeplearning4j will seem about 10x slower than other frameworks, because you’re spending your time creating datasets. Every time you call fit, you're recreating a dataset, over and over again. We allow it to happen for ease of use, but we can show you how to speed things up. There are ways to make it just as fast.

    One way is to pre-save the datasets, in a manner similar to the Python frameworks. (Pickles are pre-formatted data.) When you pre-save the dataset, you create a separate class.

Here’s how you pre-save datasets.

A RecordReaderDataSetIterator talks to DataVec and outputs datasets for DL4J.

Here’s how you load a pre-saved dataset.

Line 90 is where you see the asynchronous ETL. In this case, it's wrapping the pre-saved iterator, so you're taking advantage of both methods, with the async loader reading the pre-saved data in the background as the net trains.

    MKL and Inference on CPUs

If you are running inference benchmarks on CPUs, make sure you are using Deeplearning4j with Intel's MKL library. MKL is available via a clickwrap license; unlike Anaconda (which libraries such as PyTorch build on), Deeplearning4j does not bundle MKL.

    Loss

    absoluteDifference

    Absolute difference loss: sum_i abs( label[i] - predictions[i] )

    • label (NUMERIC) - Label array

    • predictions (NUMERIC) - Predictions array

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used

• lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT
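As a quick illustration of the formula and the default reduction, here is a plain-Java sketch (not the ND4J implementation) of the weighted absolute difference loss with MEAN_BY_NONZERO_WEIGHT_COUNT reduction:

```java
public class AbsDiffLossDemo {

    // sum_i weights[i] * abs(label[i] - predictions[i]),
    // divided by the number of non-zero weights (MEAN_BY_NONZERO_WEIGHT_COUNT).
    static double absoluteDifference(double[] label, double[] predictions, double[] weights) {
        double sum = 0.0;
        int nonZero = 0;
        for (int i = 0; i < label.length; i++) {
            sum += weights[i] * Math.abs(label[i] - predictions[i]);
            if (weights[i] != 0.0) nonZero++;
        }
        return nonZero == 0 ? 0.0 : sum / nonZero;
    }

    public static void main(String[] args) {
        double[] label = {1.0, 2.0, 3.0};
        double[] pred  = {1.5, 2.0, 5.0};
        double[] w     = {1.0, 1.0, 1.0};   // a null weights array in the API means all 1.0
        System.out.println(absoluteDifference(label, pred, w));
    }
}
```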

    cosineDistance

Cosine distance loss: 1 - cosineSimilarity(x,y), or equivalently 1 - sum_i label[i] * prediction[i] when both the predictions and labels are normalized.

Note: This loss function assumes that both the predictions and labels are normalized to have unit l2 norm. If this is not the case, you should normalize them first by dividing by norm2(String, SDVariable, boolean, int...) along the cosine distance dimension (with keepDims=true).

    • label (NUMERIC) - Label array

    • predictions (NUMERIC) - Predictions array

• weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used

    hingeLoss

    Hinge loss: a loss function used for training classifiers.

Implements L = max(0, 1 - t * predictions), where t is the label value after internal conversion to {-1, 1} from the user-specified {0, 1}. Note that labels should be provided with values in {0, 1}.

    • label (NUMERIC) - Label array. Each value should be 0.0 or 1.0 (internally -1 to 1 is used)

    • predictions (NUMERIC) - Predictions array

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used

    huberLoss

Huber loss function, used for robust regression. It is similar to both squared error loss and absolute difference loss, though it is less sensitive to outliers than squared error. Huber loss implements:

    L = 0.5 * (label[i] - predictions[i])^2 if abs(label[i] - predictions[i]) < delta
    L = delta * abs(label[i] - predictions[i]) - 0.5 * delta^2 otherwise

    • label (NUMERIC) - Label array

    • predictions (NUMERIC) - Predictions array

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used
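The piecewise definition can be sketched in plain Java (illustration only, not the ND4J implementation):

```java
public class HuberLossDemo {

    // Per-element Huber loss: quadratic near zero, linear beyond delta.
    static double huber(double label, double prediction, double delta) {
        double d = Math.abs(label - prediction);
        return d < delta
                ? 0.5 * d * d                       // squared-error regime
                : delta * d - 0.5 * delta * delta;  // absolute-error regime
    }

    public static void main(String[] args) {
        System.out.println(huber(1.0, 1.2, 1.0)); // small residual: quadratic
        System.out.println(huber(0.0, 3.0, 1.0)); // large residual: linear
    }
}
```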

    l2Loss

    L2 loss: 1/2 * sum(x^2)

    • var (NUMERIC) - Variable to calculate L2 loss of

    logLoss

    Log loss, i.e., binary cross entropy loss, usually used for binary multi-label classification. Implements:

    -1/numExamples * sum_i (labels[i] * log(predictions[i] + epsilon) + (1-labels[i]) * log(1-predictions[i] + epsilon))

    • label (NUMERIC) - Label array

    • predictions (NUMERIC) - Predictions array

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used
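The formula above can be checked with a small plain-Java sketch (not the ND4J implementation):

```java
public class LogLossDemo {

    // -1/n * sum_i ( label[i]*log(pred[i]+eps) + (1-label[i])*log(1-pred[i]+eps) )
    static double logLoss(double[] label, double[] predictions, double epsilon) {
        double sum = 0.0;
        for (int i = 0; i < label.length; i++) {
            sum += label[i] * Math.log(predictions[i] + epsilon)
                 + (1.0 - label[i]) * Math.log(1.0 - predictions[i] + epsilon);
        }
        return -sum / label.length;
    }

    public static void main(String[] args) {
        // A maximally uncertain prediction of 0.5 gives log(2) per example.
        System.out.println(logLoss(new double[]{1, 0}, new double[]{0.5, 0.5}, 0.0));
    }
}
```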

    logPoisson

Log Poisson loss, typically used for regression on count data.

    Implements L = exp(c) - z * c where c is log(predictions) and z is labels.

    • label (NUMERIC) - Label array. Each value should be 0.0 or 1.0

    • predictions (NUMERIC) - Predictions array (has to be log(x) of actual predictions)

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used

    meanPairwiseSquaredError

    Mean pairwise squared error. MPWSE loss calculates the difference between pairs of consecutive elements in the predictions and labels arrays.

    For example, if predictions = [p0, p1, p2] and labels are [l0, l1, l2] then MPWSE is:

    {@code [((p0-p1) - (l0-l1))^2 + ((p0-p2) - (l0-l2))^2 + ((p1-p2) - (l1-l2))^2] / 3}

    • label (NUMERIC) - Label array

    • predictions (NUMERIC) - Predictions array

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used. Must be either null, scalar, or have shape [batchSize]
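The worked example above can be verified with a small plain-Java sketch (illustration only, not the ND4J implementation):

```java
public class MpwseDemo {

    // Mean of squared differences between all pairwise deltas of predictions and labels.
    static double mpwse(double[] predictions, double[] labels) {
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < predictions.length; i++) {
            for (int j = i + 1; j < predictions.length; j++) {
                double d = (predictions[i] - predictions[j]) - (labels[i] - labels[j]);
                sum += d * d;
                pairs++;
            }
        }
        return sum / pairs;
    }

    public static void main(String[] args) {
        // Identical pairwise structure gives zero loss, regardless of constant offset.
        System.out.println(mpwse(new double[]{10, 11, 12}, new double[]{0, 1, 2}));
    }
}
```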

    meanSquaredError

    Mean squared error loss function. Implements (label[i] - prediction[i])^2 - i.e., squared error on a per-element basis.

    When averaged (using LossReduce#MEAN_BY_WEIGHT or LossReduce#MEAN_BY_NONZERO_WEIGHT_COUNT (the default))

    this is the mean squared error loss function.

    • label (NUMERIC) - Label array

    • predictions (NUMERIC) - Predictions array

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used

    sigmoidCrossEntropy

Sigmoid cross entropy: applies the sigmoid activation function on the input logits (input "pre-sigmoid predictions")

    and implements the binary cross entropy loss function. This implementation is numerically more stable than using

    standard (but separate) sigmoid activation function and log loss (binary cross entropy) loss function. Implements:

    -1/numExamples * sum_i (labels[i] * log(sigmoid(logits[i])) + (1-labels[i]) * log(1-sigmoid(logits[i])))

though this is done in a mathematically equivalent but more numerically stable form. When label smoothing is > 0, the following label smoothing is used:

    label = (1.0 - labelSmoothing) * label + 0.5 * labelSmoothing

    • label (NUMERIC) - Label array

    • predictionLogits (NUMERIC) - Predictions array

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used
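The "numerically more stable" claim refers to the standard reformulation of sigmoid + binary cross entropy directly in terms of the raw logits (the same trick used by e.g. TensorFlow's sigmoid_cross_entropy_with_logits). A plain-Java sketch of the idea, not DL4J's actual implementation:

```java
public class SigmoidXentDemo {

    // Naive form: apply sigmoid, then binary cross entropy. Breaks down for
    // logits of large magnitude (log(0) -> -Infinity, 0 * -Infinity -> NaN).
    static double naive(double logit, double label) {
        double p = 1.0 / (1.0 + Math.exp(-logit));
        return -(label * Math.log(p) + (1.0 - label) * Math.log(1.0 - p));
    }

    // Stable form: max(x,0) - x*z + log(1 + exp(-|x|)), algebraically identical
    // but safe for any logit magnitude.
    static double stable(double logit, double label) {
        return Math.max(logit, 0.0) - logit * label + Math.log1p(Math.exp(-Math.abs(logit)));
    }

    public static void main(String[] args) {
        System.out.println(naive(2.0, 1.0));      // both agree for moderate logits
        System.out.println(stable(2.0, 1.0));
        System.out.println(stable(-1000.0, 0.0)); // naive would return NaN here
    }
}
```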

    softmaxCrossEntropy

Applies the softmax activation function to the input, then implements multi-class cross entropy: -sum_c label[c] * log(p[c]) where p = softmax(logits). If LossReduce#NONE is used, the returned shape is [numExamples] for [numExamples, numClasses] predictions/labels;

    otherwise, the output is a scalar.

When label smoothing is > 0, the following label smoothing is used:

    numClasses = labels.size(1);
    label = (1.0 - labelSmoothing) * label + labelSmoothing / numClasses

    • oneHotLabels (NUMERIC) - Label array. Should be one-hot per example and same shape as predictions (for example, [mb, nOut])

    • logitPredictions (NUMERIC) - Predictions array (pre-softmax)

    • weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used

    sparseSoftmaxCrossEntropy

    As per softmaxCrossEntropy(String, SDVariable, SDVariable, LossReduce) but the labels variable

    is represented as an integer array instead of the equivalent one-hot array. i.e., if logits are rank N, then labels have rank N-1

    • logits (NUMERIC) - Logits array ("pre-softmax activations")

    • labels (INT) - Labels array. Must be an integer type.

    weightedCrossEntropyWithLogits

    Weighted cross entropy loss with logits

    • targets (NUMERIC) - targets array

    • inputs (NUMERIC) - input array

• weights (NUMERIC) - Weights array. May be null. If null, a weight of 1.0 is used

    RNN

    Operation classes

    gru

    The GRU operation. Gated Recurrent Unit - Cho et al. 2014.

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-ui</artifactId>
            <version>{{ page.version }}</version>
        </dependency>
        //Initialize the user interface backend
        UIServer uiServer = UIServer.getInstance();
    
        //Configure where the network information (gradients, score vs. time etc) is to be stored. Here: store in memory.
        StatsStorage statsStorage = new InMemoryStatsStorage();         //Alternative: new FileStatsStorage(File), for saving and loading later
    
        //Attach the StatsStorage instance to the UI: this allows the contents of the StatsStorage to be visualized
        uiServer.attach(statsStorage);
    
        //Then add the StatsListener to collect this information from the network, as it trains
        net.setListeners(new StatsListener(statsStorage));
        SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
    
        StatsStorage ss = new FileStatsStorage(new File("myNetworkTrainingStats.dl4j"));
        sparkNet.setListeners(ss, Collections.singletonList(new StatsListener(null)));
        StatsStorage statsStorage = new FileStatsStorage(statsFile);    //If file already exists: load the data from it
        UIServer uiServer = UIServer.getInstance();
        uiServer.attach(statsStorage);
        UIServer uiServer = UIServer.getInstance();
        uiServer.enableRemoteListener();        //Necessary: remote support is not enabled by default
        SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
    
        StatsStorageRouter remoteUIRouter = new RemoteUIStatsStorageRouter("http://UI_MACHINE_IP:9000");
        sparkNet.setListeners(remoteUIRouter, Collections.singletonList(new StatsListener(null)));
    log.info("Plot TSNE....");
    BarnesHutTsne tsne = new BarnesHutTsne.Builder()
            .setMaxIter(1000)
            .stopLyingIteration(250)
            .learningRate(500)
            .useAdaGrad(false)
            .theta(0.5)
            .setMomentum(0.5)
            .normalize(true)
            .usePca(false)
            .build();
    vec.lookupTable().plotVocab(tsne);
    com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'play.crypto.provider'
            at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
            at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:170)
            ...
            at play.server.Server.forRouter(Server.java:96)
            at org.deeplearning4j.ui.play.PlayUIServer.runMain(PlayUIServer.java:206)
            at org.deeplearning4j.ui.api.UIServer.getInstance(UIServer.java:27)
        <build>
            <plugins>
                <plugin>
                    <groupId>org.codehaus.mojo</groupId>
                    <artifactId>exec-maven-plugin</artifactId>
                    <version>${exec-maven-plugin.version}</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>exec</goal>
                            </goals>
                        </execution>
                    </executions>
                    <configuration>
                        <executable>java</executable>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>${maven-shade-plugin.version}</version>
                    <configuration>
                        <shadedArtifactAttached>true</shadedArtifactAttached>
                        <shadedClassifierName>${shadedClassifier}</shadedClassifierName>
                        <createDependencyReducedPom>true</createDependencyReducedPom>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <!--<exclude>org/datanucleus/**</exclude>-->
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
    
                    </configuration>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                        <resource>reference.conf</resource>
                                    </transformer>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer" />
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    public DifferentialFunctionFactory f() {
        return sameDiff.f();
    }
    public SDVariable sum(...) {
        return new Sum(...).outputVariables()[0];
    }
    @Override
    public List<SDVariable> doDiff(List<SDVariable> grad) {
        ...
        return Arrays.asList(f().x_bp(...));
    }
    public SDVariable add(double sameDiffVariable) {
        return add(sameDiff.generateNewVarName(new AddOp().opName(),0),sameDiffVariable);
    }
    public SDVariable cross(SDVariable a, SDVariable b) {
        return cross(null, a, b);
    }
    
    public SDVariable cross(String name, SDVariable a, SDVariable b) {
        SDVariable ret = f().cross(a, b);
        return updateVariableNameAndReference(ret, name);
    }
    SameDiff sd = SameDiff.create();
    
    INDArray inArr = Nd4j.linspace(1, n, n).reshape(inOrder, d0, d1, d2);
    INDArray inMul2Exp = inArr.mul(2);
    
    SDVariable in = sd.var("in", inArr);
    SDVariable inMul2 = in.mul(2.0);
    
    sd.exec();
    INDArray absoluteDifference(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce)
    INDArray absoluteDifference(INDArray label, INDArray predictions, INDArray weights)
    
    SDVariable absoluteDifference(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable absoluteDifference(SDVariable label, SDVariable predictions, SDVariable weights)
    SDVariable absoluteDifference(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable absoluteDifference(String name, SDVariable label, SDVariable predictions, SDVariable weights)
    .cacheMode(CacheMode.DEVICE)
    )

    If you aren't sure if you are only measuring what you intend to measure when running DL4J or ND4J code, you can use a profiler such as VisualVM or YourKit Profilers.

  • What versions are you using? When benchmarking, you should use the latest version of whatever libraries you are benchmarking. There's no point identifying and reporting a bottleneck that was fixed 6 months ago. An exception to this would be when you are comparing performance over time between versions. Note also that snapshot versions of DL4J and ND4J are also available - these may contain performance improvements (feel free to ask)

  • Watch out for ETL bottlenecks. You can add PerformanceListener to your network training to see if ETL is a bottleneck.

  • Don't forget that performance is dependent on minibatch sizes. Don't benchmark with minibatch size 1 - use something more realistic.

  • If you need multi-GPU training or inference support, use ParallelWrapper or ParallelInference.

  • Don't forget that CuDNN is configurable: you can specify DL4J/CuDNN to prefer performance - at the expense of memory - using .cudnnAlgoMode(ConvolutionLayer.AlgoMode.PREFER_FASTEST) configuration on convolution layers

  • When using GPUs, multiples of 8 (or 32) for input sizes and layer sizes may perform better.

  • When using RNNs (and manually creating INDArrays), use 'f' ordered arrays for both features and (RnnOutputLayer) labels. Otherwise, use 'c' ordered arrays. This is for faster memory access.

  • Running only a single (or too few) iterations, or not reporting mean, standard deviation and number of iterations
  • Not configuring workspaces, garbage collection, etc

  • Running only one possible case - for example, benchmarking a single set of array dimensions/orders when benchmarking BLAS operations

  • Running unusually small inputs - for example, minibatch size 1 on a GPU (which might be slower - but isn't realistic!)

  • Not measuring exactly - and only - what you claim to be measuring (for example, not accounting for array allocation, initialization or garbage collection time)

  • Not making your benchmarks reproducible (does the benchmark conclusion generalize? are there problems with the benchmark? what can we do to fix it?)

  • Comparing results across different hardware, not accounting for differences (for example, testing on one machine with AVX2 support, and on another without)

  • Not asking the devs (via the community forum at https://community.konduit.ai/c/dl4j) - we are happy to provide suggestions and investigate if performance isn't where it should be!

    lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

  • dimension - Dimension to perform the cosine distance over

  • lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

    lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

  • delta - Loss function delta value

  • lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

  • epsilon - Epsilon value. Default: 0.0

  • lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

  • full - Boolean flag. true for logPoissonFull, false for logPoisson

  • lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

    lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

    lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

  • labelSmoothing - Label smoothing value. Default: 0.0

  • lossReduce - Reduction type for the loss. See LossReduce for more details. Default: LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT

  • labelSmoothing - Label smoothing value. Default: 0.0

  • double x = 0;
    //<start timing>
    x += 1.0;
    //<end timing>
    INDArray x = Nd4j.create(1);
    //<start timing>
    x.addi(1.0);
    //<end timing>
    14:17:34,169 INFO  ~ Loaded [CpuBackend] backend
    14:17:34,672 INFO  ~ Number of threads used for NativeOps: 8
    14:17:34,823 INFO  ~ Number of threads used for BLAS: 8
    14:17:34,831 INFO  ~ Backend used: [CPU]; OS: [Windows 10]
    14:17:34,831 INFO  ~ Cores: [16]; Memory: [7.1GB];
    14:17:34,831 INFO  ~ Blas vendor: [OPENBLAS]
        java -Xms256m -Xmx1024m YourClassNameHere
        java -Xmx3g
        java -Xms3g -Xmx3g YourClassNameHere
        echo "export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=512m"" > ~/.bash_profile
        java -XX:+UseG1GC
            while(mnistTest.hasNext()){
                    DataSet ds = mnistTest.next();
                    INDArray output = model.output(ds.getFeatures(), false);
                    eval.eval(ds.getLabels(), output);
            }
        MultiDataSetIterator iterator;
        if (prefetchSize > 0 && source.asyncSupported()) {
            iterator = new AsyncMultiDataSetIterator(source, prefetchSize);
        } else iterator = source;
    INDArray cosineDistance(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce, int dimension)
    INDArray cosineDistance(INDArray label, INDArray predictions, INDArray weights, int dimension)
    
    SDVariable cosineDistance(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, int dimension)
    SDVariable cosineDistance(SDVariable label, SDVariable predictions, SDVariable weights, int dimension)
    SDVariable cosineDistance(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, int dimension)
    SDVariable cosineDistance(String name, SDVariable label, SDVariable predictions, SDVariable weights, int dimension)
    INDArray hingeLoss(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce)
    INDArray hingeLoss(INDArray label, INDArray predictions, INDArray weights)
    
    SDVariable hingeLoss(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable hingeLoss(SDVariable label, SDVariable predictions, SDVariable weights)
    SDVariable hingeLoss(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable hingeLoss(String name, SDVariable label, SDVariable predictions, SDVariable weights)
    INDArray huberLoss(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce, double delta)
    INDArray huberLoss(INDArray label, INDArray predictions, INDArray weights, double delta)
    
    SDVariable huberLoss(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, double delta)
    SDVariable huberLoss(SDVariable label, SDVariable predictions, SDVariable weights, double delta)
    SDVariable huberLoss(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, double delta)
    SDVariable huberLoss(String name, SDVariable label, SDVariable predictions, SDVariable weights, double delta)
    
    
    `L = 0.5 * (label[i] - predictions[i])^2 if abs(label[i] - predictions[i]) < delta`
    
    `L = delta * abs(label[i] - predictions[i]) - 0.5 * delta^2 otherwise`
    
    INDArray l2Loss(INDArray var)
    
    SDVariable l2Loss(SDVariable var)
    SDVariable l2Loss(String name, SDVariable var)
    INDArray logLoss(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce, double epsilon)
    INDArray logLoss(INDArray label, INDArray predictions)
    
    SDVariable logLoss(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, double epsilon)
    SDVariable logLoss(SDVariable label, SDVariable predictions)
    SDVariable logLoss(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, double epsilon)
    SDVariable logLoss(String name, SDVariable label, SDVariable predictions)
    INDArray logPoisson(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce, boolean full)
    INDArray logPoisson(INDArray label, INDArray predictions, INDArray weights, boolean full)
    
    SDVariable logPoisson(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, boolean full)
    SDVariable logPoisson(SDVariable label, SDVariable predictions, SDVariable weights, boolean full)
    SDVariable logPoisson(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce, boolean full)
    SDVariable logPoisson(String name, SDVariable label, SDVariable predictions, SDVariable weights, boolean full)
    INDArray meanPairwiseSquaredError(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce)
    INDArray meanPairwiseSquaredError(INDArray label, INDArray predictions, INDArray weights)
    
    SDVariable meanPairwiseSquaredError(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable meanPairwiseSquaredError(SDVariable label, SDVariable predictions, SDVariable weights)
    SDVariable meanPairwiseSquaredError(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable meanPairwiseSquaredError(String name, SDVariable label, SDVariable predictions, SDVariable weights)
    INDArray meanSquaredError(INDArray label, INDArray predictions, INDArray weights, LossReduce lossReduce)
    INDArray meanSquaredError(INDArray label, INDArray predictions, INDArray weights)
    
    SDVariable meanSquaredError(SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable meanSquaredError(SDVariable label, SDVariable predictions, SDVariable weights)
    SDVariable meanSquaredError(String name, SDVariable label, SDVariable predictions, SDVariable weights, LossReduce lossReduce)
    SDVariable meanSquaredError(String name, SDVariable label, SDVariable predictions, SDVariable weights)
    INDArray sigmoidCrossEntropy(INDArray label, INDArray predictionLogits, INDArray weights, LossReduce lossReduce, double labelSmoothing)
    INDArray sigmoidCrossEntropy(INDArray label, INDArray predictionLogits, INDArray weights)
    
    SDVariable sigmoidCrossEntropy(SDVariable label, SDVariable predictionLogits, SDVariable weights, LossReduce lossReduce, double labelSmoothing)
    SDVariable sigmoidCrossEntropy(SDVariable label, SDVariable predictionLogits, SDVariable weights)
    SDVariable sigmoidCrossEntropy(String name, SDVariable label, SDVariable predictionLogits, SDVariable weights, LossReduce lossReduce, double labelSmoothing)
    SDVariable sigmoidCrossEntropy(String name, SDVariable label, SDVariable predictionLogits, SDVariable weights)
    
    
When labelSmoothing > 0, the labels are adjusted as follows:

`numClasses = labels.size(1);
label = (1.0 - labelSmoothing) * label + 0.5 * labelSmoothing`
    
    INDArray softmaxCrossEntropy(INDArray oneHotLabels, INDArray logitPredictions, INDArray weights, LossReduce lossReduce, double labelSmoothing)
    INDArray softmaxCrossEntropy(INDArray oneHotLabels, INDArray logitPredictions, INDArray weights)
    
    SDVariable softmaxCrossEntropy(SDVariable oneHotLabels, SDVariable logitPredictions, SDVariable weights, LossReduce lossReduce, double labelSmoothing)
    SDVariable softmaxCrossEntropy(SDVariable oneHotLabels, SDVariable logitPredictions, SDVariable weights)
    SDVariable softmaxCrossEntropy(String name, SDVariable oneHotLabels, SDVariable logitPredictions, SDVariable weights, LossReduce lossReduce, double labelSmoothing)
    SDVariable softmaxCrossEntropy(String name, SDVariable oneHotLabels, SDVariable logitPredictions, SDVariable weights)
    
    
When labelSmoothing > 0, the one-hot labels are adjusted as follows:

`numClasses = labels.size(1);
oneHotLabel = (1.0 - labelSmoothing) * oneHotLabels + labelSmoothing/numClasses`
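The effect of labelSmoothing can be sketched in plain Java (an illustration of the formulas above only; the real ops apply them element-wise on INDArrays):

```java
// Plain-Java sketch of the two label smoothing transforms described above.
public class LabelSmoothingDemo {

    // sigmoidCrossEntropy: label = (1 - labelSmoothing) * label + 0.5 * labelSmoothing
    static double smoothSigmoidLabel(double label, double labelSmoothing) {
        return (1.0 - labelSmoothing) * label + 0.5 * labelSmoothing;
    }

    // softmaxCrossEntropy: oneHotLabel = (1 - labelSmoothing) * oneHotLabel + labelSmoothing / numClasses
    static double[] smoothOneHot(double[] oneHot, double labelSmoothing) {
        int numClasses = oneHot.length;
        double[] out = new double[numClasses];
        for (int i = 0; i < numClasses; i++) {
            out[i] = (1.0 - labelSmoothing) * oneHot[i] + labelSmoothing / numClasses;
        }
        return out;
    }

    public static void main(String[] args) {
        // a hard 0/1 label is pulled towards 0.5 by half the smoothing amount
        System.out.println(smoothSigmoidLabel(1.0, 0.2)); // 0.9
        // a one-hot vector gets labelSmoothing/numClasses spread over all classes
        System.out.println(java.util.Arrays.toString(smoothOneHot(new double[]{0, 0, 1, 0}, 0.2)));
    }
}
```

Note that with labelSmoothing = 0 both transforms leave the labels unchanged.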
    
    INDArray sparseSoftmaxCrossEntropy(INDArray logits, INDArray labels)
    
    SDVariable sparseSoftmaxCrossEntropy(SDVariable logits, SDVariable labels)
    SDVariable sparseSoftmaxCrossEntropy(String name, SDVariable logits, SDVariable labels)
    INDArray weightedCrossEntropyWithLogits(INDArray targets, INDArray inputs, INDArray weights)
    
    SDVariable weightedCrossEntropyWithLogits(SDVariable targets, SDVariable inputs, SDVariable weights)
    SDVariable weightedCrossEntropyWithLogits(String name, SDVariable targets, SDVariable inputs, SDVariable weights)
    • x (NUMERIC) - input [time, bS, nIn]

    • hLast (NUMERIC) - initial cell output (at time step = 0) [bS, nOut]

    • Wx (NUMERIC) - input-to-hidden weights, [nIn, 3*nOut]

    • Wh (NUMERIC) - hidden-to-hidden weights, [nOut, 3*nOut]

    • biases (NUMERIC) - biases, [3*nOut]

    gruCell

The GRU cell. Does a single time step operation.

    • x (NUMERIC) - Input, with shape [batchSize, inSize]

    • hLast (NUMERIC) - Output of the previous cell/time step, with shape [batchSize, numUnits]

    • GRUWeights - see GRUWeights
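As a sketch of what a single GRU time step computes, here is the textbook cell update in plain Java (scalar, single-unit form for clarity; the actual op works on [batchSize, inSize] arrays with the weights packed into GRUWeights, and conventions may differ slightly):

```java
// One GRU time step using the textbook equations (reset gate r, update
// gate u, candidate state c). Illustration only, not the DL4J internals.
public class GruCellSketch {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    // single unit, single input feature, for clarity
    static double step(double x, double hLast,
                       double wr, double ur, double br,   // reset gate parameters
                       double wu, double uu, double bu,   // update gate parameters
                       double wc, double uc, double bc) { // candidate parameters
        double r = sigmoid(wr * x + ur * hLast + br);         // reset gate
        double u = sigmoid(wu * x + uu * hLast + bu);         // update gate
        double c = Math.tanh(wc * x + uc * (r * hLast) + bc); // candidate state
        return u * hLast + (1.0 - u) * c;                     // new hidden state
    }

    public static void main(String[] args) {
        // with zero weights/biases the gates are 0.5 and the candidate is 0,
        // so the new state is exactly half the previous state
        System.out.println(step(1.0, 0.8, 0, 0, 0, 0, 0, 0, 0, 0, 0)); // prints 0.4
    }
}
```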

    lstmCell

    The LSTM cell. Does a single time step operation.

    • x (NUMERIC) - Input, with shape [batchSize, inSize]

    • cLast (NUMERIC) - Previous cell state, with shape [batchSize, numUnits]

    • yLast (NUMERIC) - Previous cell output, with shape [batchSize, numUnits]

    • LSTMWeights - see LSTMWeights

    • LSTMConfiguration - see LSTMConfiguration

    lstmLayer

    Long Short-Term Memory layer - Hochreiter 1997.

    Supports the following data formats:

    for unidirectional:

    TNS: shapes [timeLength, numExamples, inOutSize]

    NST: shapes [numExamples, inOutSize, timeLength]

    NTS: shapes [numExamples, timeLength, inOutSize]

    for bidirectional:

    T2NS: shapes [timeLength, 2, numExamples, inOutSize] (for ONNX)

    Supports the following direction modes:

    FWD: forward

    BWD: backward

    BIDIR_SUM: bidirectional sum

    BIDIR_CONCAT: bidirectional concat

    BIDIR_EXTRA_DIM: bidirectional extra output dim (in conjunction with format dataFormat - T2NS)

    You may use different gate configurations:

    specify gate/cell/out alpha/beta values and the gate/cell/out activation functions from the activations enum

    ("RELU","SIGMOID","AFFINE","LEAKY_RELU","THRESHHOLD_RELU","SCALED_TAHN","HARD_SIGMOID","ELU","SOFTSIGN","SOFTPLUS")

    This layer also supports MKLDNN (DNNL) and cuDNN acceleration.

    • x (NUMERIC) - Input, with shape dependent on the data format (in config).

    • cLast (NUMERIC) - Previous/initial cell state, with shape [batchSize, numUnits]

    • yLast (NUMERIC) - Previous/initial cell output, with shape [batchSize, numUnits]

    • maxTSLength (NUMERIC) - maxTSLength with shape [batchSize]

    • LSTMLayerWeights - see LSTMLayerWeights

    • LSTMLayerConfig - see LSTMLayerConfig

    lstmblock

    The LSTM block

    • maxTSLength (NUMERIC) -

    • x (NUMERIC) - Input, with shape dependent on the data format (in config).

    • cLast (NUMERIC) - Previous/initial cell state, with shape [batchSize, numUnits]

    • yLast (NUMERIC) - Previous/initial cell output, with shape [batchSize, numUnits]

    • LSTMWeights - see LSTMWeights

    • LSTMConfiguration - see LSTMConfiguration

    sru

    The SRU layer.

    • x (NUMERIC) - Input, with shape [batchSize, inSize]

    • initialC (NUMERIC) - Initial cell state, with shape [batchSize, inSize]

    • mask (NUMERIC) - An optional dropout mask, with shape [batchSize, inSize]

    • SRUWeights - see SRUWeights

    sruCell

    The SRU cell. Does a single time step operation.

    • x (NUMERIC) - Input, with shape [batchSize, inSize]

    • cLast (NUMERIC) - Previous cell state, with shape [batchSize, inSize]

    • SRUWeights - see SRUWeights

    Configuration Classes

    LSTMConfiguration

    • RnnDataFormat (ENUM) - The data format of the input. Input shape depends on data format (in config):

      TNS -> [timeSteps, batchSize, inSize]

      NST -> [batchSize, inSize, timeSteps]

      NTS -> [batchSize, timeSteps, inSize]

    • peepHole (BOOL) - Whether to provide peephole connections

    • forgetBias (NUMERIC) - The bias added to forget gates in order to reduce the scale of forgetting in the beginning of the training.

    • clippingCellValue (NUMERIC) - The value used to clip the cell state activations; if 0, no clipping is applied.

    Used in these ops: lstmCell lstmblock

    LSTMLayerConfig

    • LSTMDataFormat (ENUM) - The data format of the input.

      for unidirectional:

      TNS: shape [timeLength, numExamples, inOutSize] - sometimes referred to as "time major"

      NST: shape [numExamples, inOutSize, timeLength]

      NTS: shape [numExamples, timeLength, inOutSize] - TF "time_major=false" layout

      for bidirectional:

      T2NS: shape [timeLength, 2, numExamples, inOutSize] (for ONNX)

    • LSTMDirectionMode (ENUM) - direction

      FWD: 0 = fwd

      BWD: 1 = bwd

      BIDIR_SUM: 2 = bidirectional sum

      BIDIR_CONCAT: 3 = bidirectional concat

      BIDIR_EXTRA_DIM: 4 = bidirectional extra output dim (in conjunction with format dataFormat = 3)

    • gateAct (ENUM) - Activations

    • cellAct (ENUM) - Activations

    • outAct (ENUM) - Activations

    • retFullSequence (BOOL) - indicates whether to return whole time sequence h {h_0, h_1, ... , h_sL-1} - default = true

    • retLastH (BOOL) - indicates whether to return output at last time step only,

      in this case shape would be [bS, nOut] (exact shape depends on dataFormat argument) - default = false

    • retLastC (BOOL) - indicates whether to return cells state at last time step only,

      in this case shape would be [bS, nOut] (exact shape depends on dataFormat argument) - default = false

    • cellClip (NUMERIC) - Cell clipping value, if it = 0 then do not apply clipping - default = 0.0

    • gateAlpha (NUMERIC) - null - default = 0.0

    • gateBeta (NUMERIC) - null - default = 0.0

    • cellAlpha (NUMERIC) - null - default = 0.0

    • cellBeta (NUMERIC) - null - default = 0.0

    • outAlpha (NUMERIC) - null - default = 0.0

    • outBeta (NUMERIC) - null - default = 0.0

    Used in these ops: lstmLayer

    GRUWeights

    • ruWeight- null (NUMERIC type)

    • cWeight- null (NUMERIC type)

    • ruBias- null (NUMERIC type)

    • cBias- null (NUMERIC type)

    Used in these ops: gruCell

    SRUWeights

    • weights- null (NUMERIC type)

    • bias- null (NUMERIC type)

    Used in these ops: sru sruCell

    LSTMWeights

    • ruWeight- null (NUMERIC type)

    • inputPeepholeWeights- null (NUMERIC type)

    • forgetPeepholeWeights- null (NUMERIC type)

    • outputPeepholeWeights- null (NUMERIC type)

    • bias- null (NUMERIC type)

    Used in these ops: lstmCell lstmblock

    LSTMLayerWeights

    • inputWeights- input weights Wx:

      1) shapes [nIn, 4*nOut] for FWD, BWD 2) shapes [2, nIn, 4*nOut] for BIDIR_SUM, BIDIR_CONCAT and BIDIR_EXTRA_DIM (NUMERIC type)

    • recurrentWeights- recurrent weights Wr:

      1) shapes [nOut, 4*nOut] for FWD, BWD 2) shapes [2, nOut, 4*nOut] for BIDIR_SUM, BIDIR_CONCAT and BIDIR_EXTRA_DIM (NUMERIC type)

    • biases- biases

      1) shapes [4*nOut] for FWD, BWD 2) shapes [2, 4*nOut] for BIDIR_SUM, BIDIR_CONCAT and BIDIR_EXTRA_DIM (NUMERIC type)

    • peepholeWeights- peephole weights Wp:

      1) [3*nOut] when directionMode < 2

      2) [2, 3*nOut] when directionMode >= 2 (NUMERIC type)

    Used in these ops: lstmLayer
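    A hypothetical helper, sketching the weight array shapes listed above (assuming the usual LSTM convention that the recurrent weights map nOut to 4*nOut; the method names here are illustrative, not part of the API):

```java
// Computes the expected LSTMLayerWeights array shapes for a given layer size
// and direction mode, per the shape rules documented above.
public class LstmLayerShapes {
    static long[] inputWeightsShape(long nIn, long nOut, boolean bidirectional) {
        return bidirectional ? new long[]{2, nIn, 4 * nOut} : new long[]{nIn, 4 * nOut};
    }
    static long[] recurrentWeightsShape(long nOut, boolean bidirectional) {
        return bidirectional ? new long[]{2, nOut, 4 * nOut} : new long[]{nOut, 4 * nOut};
    }
    static long[] biasesShape(long nOut, boolean bidirectional) {
        return bidirectional ? new long[]{2, 4 * nOut} : new long[]{4 * nOut};
    }
    static long[] peepholeWeightsShape(long nOut, boolean bidirectional) {
        // directionMode < 2 means FWD/BWD; >= 2 means one of the BIDIR modes
        return bidirectional ? new long[]{2, 3 * nOut} : new long[]{3 * nOut};
    }
    public static void main(String[] args) {
        // e.g. nIn = 10, nOut = 20, unidirectional
        System.out.println(java.util.Arrays.toString(inputWeightsShape(10, 20, false))); // [10, 80]
        System.out.println(java.util.Arrays.toString(biasesShape(20, true)));            // [2, 80]
    }
}
```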

    A Discriminative Feature Learning Approach for Deep Face Recognition
    OpenFace NN4.Small2
    INDArray gru(INDArray x, INDArray hLast, INDArray Wx, INDArray Wh, INDArray biases)
    
    SDVariable gru(SDVariable x, SDVariable hLast, SDVariable Wx, SDVariable Wh, SDVariable biases)
    SDVariable gru(String name, SDVariable x, SDVariable hLast, SDVariable Wx, SDVariable Wh, SDVariable biases)
    INDArray[] gruCell(INDArray x, INDArray hLast, GRUWeights gRUWeights)
    
    SDVariable[] gruCell(SDVariable x, SDVariable hLast, GRUWeights gRUWeights)
    SDVariable[] gruCell(String name, SDVariable x, SDVariable hLast, GRUWeights gRUWeights)
    INDArray[] lstmCell(INDArray x, INDArray cLast, INDArray yLast, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    
    SDVariable[] lstmCell(SDVariable x, SDVariable cLast, SDVariable yLast, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    SDVariable[] lstmCell(String name, SDVariable x, SDVariable cLast, SDVariable yLast, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    INDArray[] lstmLayer(INDArray x, INDArray cLast, INDArray yLast, INDArray maxTSLength, LSTMLayerWeights lSTMLayerWeights, LSTMLayerConfig lSTMLayerConfig)
    INDArray[] lstmLayer(INDArray x, LSTMLayerWeights lSTMLayerWeights, LSTMLayerConfig lSTMLayerConfig)
    
    SDVariable[] lstmLayer(SDVariable x, SDVariable cLast, SDVariable yLast, SDVariable maxTSLength, LSTMLayerWeights lSTMLayerWeights, LSTMLayerConfig lSTMLayerConfig)
    SDVariable[] lstmLayer(SDVariable x, LSTMLayerWeights lSTMLayerWeights, LSTMLayerConfig lSTMLayerConfig)
    SDVariable[] lstmLayer(String name, SDVariable x, SDVariable cLast, SDVariable yLast, SDVariable maxTSLength, LSTMLayerWeights lSTMLayerWeights, LSTMLayerConfig lSTMLayerConfig)
    SDVariable[] lstmLayer(String name, SDVariable x, LSTMLayerWeights lSTMLayerWeights, LSTMLayerConfig lSTMLayerConfig)
    INDArray lstmblock(INDArray maxTSLength, INDArray x, INDArray cLast, INDArray yLast, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    INDArray lstmblock(INDArray x, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    
    SDVariable lstmblock(SDVariable maxTSLength, SDVariable x, SDVariable cLast, SDVariable yLast, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    SDVariable lstmblock(SDVariable x, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    SDVariable lstmblock(String name, SDVariable maxTSLength, SDVariable x, SDVariable cLast, SDVariable yLast, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    SDVariable lstmblock(String name, SDVariable x, LSTMWeights lSTMWeights, LSTMConfiguration lSTMConfiguration)
    INDArray sru(INDArray x, INDArray initialC, INDArray mask, SRUWeights sRUWeights)
    INDArray sru(INDArray x, INDArray initialC, SRUWeights sRUWeights)
    
    SDVariable sru(SDVariable x, SDVariable initialC, SDVariable mask, SRUWeights sRUWeights)
    SDVariable sru(SDVariable x, SDVariable initialC, SRUWeights sRUWeights)
    SDVariable sru(String name, SDVariable x, SDVariable initialC, SDVariable mask, SRUWeights sRUWeights)
    SDVariable sru(String name, SDVariable x, SDVariable initialC, SRUWeights sRUWeights)
    INDArray sruCell(INDArray x, INDArray cLast, SRUWeights sRUWeights)
    
    SDVariable sruCell(SDVariable x, SDVariable cLast, SRUWeights sRUWeights)
    SDVariable sruCell(String name, SDVariable x, SDVariable cLast, SRUWeights sRUWeights)
    import org.datavec.image.loader.LFWLoader
    // import org.deeplearning4j.zoo.model.FaceNetNN4Small2
    import org.deeplearning4j.zoo.model.helper.FaceNetHelper;
    import org.deeplearning4j.zoo._
    import org.deeplearning4j.nn.graph.ComputationGraph
    import org.deeplearning4j.nn.conf._
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener
    import org.deeplearning4j.datasets.iterator.impl.LFWDataSetIterator
    import org.deeplearning4j.nn.transferlearning.TransferLearning
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.learning.config.Adam
    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.weights.WeightInit
    import org.deeplearning4j.nn.conf.layers._
    import org.deeplearning4j.nn.conf.graph.L2NormalizeVertex
    import org.deeplearning4j.nn.conf.graph.MergeVertex
    import org.nd4j.linalg.lossfunctions.LossFunctions
    import org.deeplearning4j.nn.conf.inputs.InputType
    import org.deeplearning4j.nn.conf.WorkspaceMode
    import org.nd4j.evaluation.classification.Evaluation
    
    
    import scala.collection.JavaConversions._
    import scala.collection.JavaConverters._
    import java.util.Random
    val batchSize = 48 // depending on your hardware, you will want to increase or decrease
    val numExamples = LFWLoader.NUM_IMAGES
    val outputNum = LFWLoader.NUM_LABELS // number of "identities" in the dataset
    val splitTrainTest = 1.0
    val randomSeed = 123;
    val iterations = 1; // this is almost always 1
    val transferFunction = Activation.RELU
    val inputShape = Array[Int](3,96,96)
    
    // val zooModel = new FaceNetNN4Small2(outputNum, randomSeed, iterations)
    // val net = zooModel.init().asInstanceOf[ComputationGraph]
    
    def graphConf(): ComputationGraphConfiguration = {
        val embeddingSize = 128
        
        val graph = new NeuralNetConfiguration.Builder()
        .seed(randomSeed)
        .activation(Activation.IDENTITY)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Adam(0.1, 0.9, 0.999, 0.01))
        .weightInit(WeightInit.RELU)
        .l2(5e-5)
        .convolutionMode(ConvolutionMode.Same)
        .inferenceWorkspaceMode(WorkspaceMode.SEPARATE)
        .trainingWorkspaceMode(WorkspaceMode.SEPARATE)
        .graphBuilder
        
        graph
        .addInputs("input1")
        .addLayer("stem-cnn1", new ConvolutionLayer.Builder(Array[Int](7,7), Array[Int](2,2), Array[Int](3,3)).nIn(inputShape(0)).nOut(64).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "input1")
        .addLayer("stem-batch1", new BatchNormalization.Builder(false).nIn(64).nOut(64).build, "stem-cnn1").addLayer("stem-activation1", new ActivationLayer.Builder().activation(Activation.RELU).build, "stem-batch1")
        .addLayer("stem-pool1", new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX, Array[Int](3, 3), Array[Int](2, 2), Array[Int](1, 1)).build, "stem-activation1")
        .addLayer("stem-lrn1", new LocalResponseNormalization.Builder(1, 5, 1e-4, 0.75).build, "stem-pool1")
        .addLayer("inception-2-cnn1", new ConvolutionLayer.Builder(1,1).nIn(64).nOut(64).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "stem-lrn1")
        .addLayer("inception-2-batch1", new BatchNormalization.Builder(false).nIn(64).nOut(64).build, "inception-2-cnn1")
        .addLayer("inception-2-activation1", new ActivationLayer.Builder().activation(Activation.RELU).build, "inception-2-batch1")
        .addLayer("inception-2-cnn2", new ConvolutionLayer.Builder(Array[Int](3, 3), Array[Int](1, 1), Array[Int](1, 1)).nIn(64).nOut(192).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-2-activation1").addLayer("inception-2-batch2", new BatchNormalization.Builder(false).nIn(192).nOut(192).build, "inception-2-cnn2")
        .addLayer("inception-2-activation2", new ActivationLayer.Builder().activation(Activation.RELU).build, "inception-2-batch2")
        .addLayer("inception-2-lrn1", new LocalResponseNormalization.Builder(1, 5, 1e-4, 0.75).build, "inception-2-activation2")
        .addLayer("inception-2-pool1", new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX, Array[Int](3, 3), Array[Int](2, 2), Array[Int](1, 1)).build, "inception-2-lrn1")
    
        // Inception 3a
        FaceNetHelper.appendGraph(graph, "3a", 192, Array[Int](3, 5), Array[Int](1, 1), Array[Int](128, 32), Array[Int](96, 16, 32, 64), SubsamplingLayer.PoolingType.MAX, transferFunction, "inception-2-pool1")
    
        // Inception 3b
        FaceNetHelper.appendGraph(graph, "3b", 256, Array[Int](3, 5), Array[Int](1, 1), Array[Int](128, 64), Array[Int](96, 32, 64, 64), SubsamplingLayer.PoolingType.PNORM, 2, transferFunction, "inception-3a")
    
        // Inception 3c
    graph
    .addLayer("3c-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(320).nOut(128).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-3b")
    .addLayer("3c-1x1-norm", FaceNetHelper.batchNorm(128, 128), "3c-1x1")
    .addLayer("3c-transfer1", new ActivationLayer.Builder().activation(transferFunction).build, "3c-1x1-norm")
    .addLayer("3c-3x3", new ConvolutionLayer.Builder(Array[Int](3, 3), Array[Int](2, 2)).nIn(128).nOut(256).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "3c-transfer1")
    .addLayer("3c-3x3-norm", FaceNetHelper.batchNorm(256, 256), "3c-3x3")
    .addLayer("3c-transfer2", new ActivationLayer.Builder().activation(transferFunction).build, "3c-3x3-norm")
    .addLayer("3c-2-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(320).nOut(32).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-3b")
    .addLayer("3c-2-1x1-norm", FaceNetHelper.batchNorm(32, 32), "3c-2-1x1")
    .addLayer("3c-2-transfer3", new ActivationLayer.Builder().activation(transferFunction).build, "3c-2-1x1-norm")
    .addLayer("3c-2-5x5", new ConvolutionLayer.Builder(Array[Int](3, 3), Array[Int](2, 2)).nIn(32).nOut(64).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "3c-2-transfer3")
    .addLayer("3c-2-5x5-norm", FaceNetHelper.batchNorm(64, 64), "3c-2-5x5")
    .addLayer("3c-2-transfer4", new ActivationLayer.Builder().activation(transferFunction).build, "3c-2-5x5-norm")
    .addLayer("3c-pool", new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX, Array[Int](3, 3), Array[Int](2, 2), Array[Int](1, 1)).build, "inception-3b")
    .addVertex("inception-3c", new MergeVertex, "3c-transfer2", "3c-2-transfer4", "3c-pool")
        
        // Inception 4a
        FaceNetHelper.appendGraph(graph, "4a", 640, Array[Int](3, 5), Array[Int](1, 1), Array[Int](192, 64), Array[Int](96, 32, 128, 256), SubsamplingLayer.PoolingType.PNORM, 2, transferFunction, "inception-3c")
    
        // Inception 4e
        graph
        .addLayer("4e-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(640).nOut(160).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-4a")
        .addLayer("4e-1x1-norm", FaceNetHelper.batchNorm(160, 160), "4e-1x1").addLayer("4e-transfer1", new ActivationLayer.Builder().activation(transferFunction).build, "4e-1x1-norm")
        .addLayer("4e-3x3", new ConvolutionLayer.Builder(Array[Int](3, 3), Array[Int](2, 2)).nIn(160).nOut(256).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "4e-transfer1")
        .addLayer("4e-3x3-norm", FaceNetHelper.batchNorm(256, 256), "4e-3x3").addLayer("4e-transfer2", new ActivationLayer.Builder().activation(transferFunction).build, "4e-3x3-norm")
        .addLayer("4e-2-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(640).nOut(64).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-4a")
        .addLayer("4e-2-1x1-norm", FaceNetHelper.batchNorm(64, 64), "4e-2-1x1").addLayer("4e-2-transfer3", new ActivationLayer.Builder().activation(transferFunction).build, "4e-2-1x1-norm")
        .addLayer("4e-2-5x5", new ConvolutionLayer.Builder(Array[Int](3, 3), Array[Int](2, 2)).nIn(64).nOut(128).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "4e-2-transfer3")
        .addLayer("4e-2-5x5-norm", FaceNetHelper.batchNorm(128, 128), "4e-2-5x5").addLayer("4e-2-transfer4", new ActivationLayer.Builder().activation(transferFunction).build, "4e-2-5x5-norm")
        .addLayer("4e-pool", new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX, Array[Int](3, 3), Array[Int](2, 2), Array[Int](1, 1)).build, "inception-4a")
        .addVertex("inception-4e", new MergeVertex, "4e-transfer2", "4e-2-transfer4", "4e-pool")
    
        // Inception 5a
        graph
        .addLayer("5a-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(1024).nOut(256).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-4e")
        .addLayer("5a-1x1-norm", FaceNetHelper.batchNorm(256, 256), "5a-1x1")
        .addLayer("5a-transfer1", new ActivationLayer.Builder().activation(transferFunction).build, "5a-1x1-norm")
        .addLayer("5a-2-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(1024).nOut(96).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-4e")
        .addLayer("5a-2-1x1-norm", FaceNetHelper.batchNorm(96, 96), "5a-2-1x1").addLayer("5a-2-transfer2", new ActivationLayer.Builder().activation(transferFunction).build, "5a-2-1x1-norm")
        .addLayer("5a-2-3x3", new ConvolutionLayer.Builder(Array[Int](3, 3), Array[Int](1, 1)).nIn(96).nOut(384).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "5a-2-transfer2")
        .addLayer("5a-2-3x3-norm", FaceNetHelper.batchNorm(384, 384), "5a-2-3x3").addLayer("5a-transfer3", new ActivationLayer.Builder().activation(transferFunction).build, "5a-2-3x3-norm").addLayer("5a-3-pool", new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.PNORM, Array[Int](3, 3), Array[Int](1, 1)).pnorm(2).build, "inception-4e")
        .addLayer("5a-3-1x1reduce", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(1024).nOut(96).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "5a-3-pool")
        .addLayer("5a-3-1x1reduce-norm", FaceNetHelper.batchNorm(96, 96), "5a-3-1x1reduce").addLayer("5a-3-transfer4", new ActivationLayer.Builder().activation(Activation.RELU).build, "5a-3-1x1reduce-norm")
        .addVertex("inception-5a", new MergeVertex, "5a-transfer1", "5a-transfer3", "5a-3-transfer4")
        
        // Inception 5b
        graph
        .addLayer("5b-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(736).nOut(256).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-5a")
        .addLayer("5b-1x1-norm", FaceNetHelper.batchNorm(256, 256), "5b-1x1").addLayer("5b-transfer1", new ActivationLayer.Builder().activation(transferFunction).build, "5b-1x1-norm")
        .addLayer("5b-2-1x1", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(736).nOut(96).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "inception-5a")
        .addLayer("5b-2-1x1-norm", FaceNetHelper.batchNorm(96, 96), "5b-2-1x1").addLayer("5b-2-transfer2", new ActivationLayer.Builder().activation(transferFunction).build, "5b-2-1x1-norm")
        .addLayer("5b-2-3x3", new ConvolutionLayer.Builder(Array[Int](3, 3), Array[Int](1, 1)).nIn(96).nOut(384).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "5b-2-transfer2")
        .addLayer("5b-2-3x3-norm", FaceNetHelper.batchNorm(384, 384), "5b-2-3x3").addLayer("5b-2-transfer3", new ActivationLayer.Builder().activation(transferFunction).build, "5b-2-3x3-norm")
        .addLayer("5b-3-pool", new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX, Array[Int](3, 3), Array[Int](1, 1), Array[Int](1, 1)).build, "inception-5a")
        .addLayer("5b-3-1x1reduce", new ConvolutionLayer.Builder(Array[Int](1, 1), Array[Int](1, 1)).nIn(736).nOut(96).cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE).build, "5b-3-pool")
        .addLayer("5b-3-1x1reduce-norm", FaceNetHelper.batchNorm(96, 96), "5b-3-1x1reduce").addLayer("5b-3-transfer4", new ActivationLayer.Builder().activation(transferFunction).build, "5b-3-1x1reduce-norm").addVertex("inception-5b", new MergeVertex, "5b-transfer1", "5b-2-transfer3", "5b-3-transfer4")
        
        // output
        graph
        .addLayer("avgpool", new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.AVG, Array[Int](3, 3), Array[Int](3, 3)).build, "inception-5b")
        .addLayer("bottleneck", new DenseLayer.Builder().nIn(736).nOut(embeddingSize).activation(Activation.IDENTITY).build, "avgpool")
        .addVertex("embeddings", new L2NormalizeVertex(Array[Int](), 1e-6), "bottleneck")
        .addLayer("lossLayer", new CenterLossOutputLayer.Builder().lossFunction(LossFunctions.LossFunction.SQUARED_LOSS).activation(Activation.SOFTMAX).nIn(128).nOut(outputNum).lambda(1e-4).alpha(0.9).gradientNormalization(GradientNormalization.RenormalizeL2PerLayer).build, "embeddings")
        .setOutputs("lossLayer")
        .setInputTypes(InputType.convolutional(inputShape(2), inputShape(1), inputShape(0)))
        
        graph.build
    }
    
    val net = new ComputationGraph(graphConf())
    
    net.setListeners(new ScoreIterationListener(1))
    println(net.summary())
    val inputWHC = Array[Int](inputShape(2), inputShape(1), inputShape(0))
    
    val iter = new LFWDataSetIterator(batchSize, numExamples, inputWHC, outputNum, false, true, splitTrainTest, new Random(randomSeed))
    val nEpochs = 30
    (1 to nEpochs).foreach{ epoch =>
        // training
        net.fit(iter)
        println("Epoch " + epoch + " complete");
        
        // here you will want to pass an iterator that contains your test set
        // val eval = net.evaluate[Evaluation](testIter)
        // println(s"""Accuracy: ${eval.accuracy()} | Precision: ${eval.precision()} | Recall: ${eval.recall()}""")
    }
    // use the GraphBuilder when your network is a ComputationGraph
    val snipped = new TransferLearning.GraphBuilder(net)
        .setFeatureExtractor("embeddings") // the L2Normalize vertex and layers below are frozen
        .removeVertexAndConnections("lossLayer")
        .setOutputs("embeddings")
        .build()
        
    // grab a single example to test feed forward
    val ds = iter.next()
    
    // when you forward a batch of examples ("faces") through the graph, you'll get a compressed representation as a result
    val embedding = snipped.feedForward(ds.getFeatures(), false)

    Recurrent Neural Network

    Recurrent Neural Network (RNN) implementations in DL4J.

    This document outlines the RNN-specific training features and the practicalities of how to use them in Deeplearning4j. It is not an introduction to recurrent neural networks; it assumes some familiarity with both their use and terminology.

    The Basics: Data and Network Configuration

    DL4J currently supports the following types of recurrent neural network:

    • RNN ("vanilla" RNN)

    • LSTM (Long Short-Term Memory)

    Java documentation is available for each.

    Data for RNNs

    Consider for the moment a standard feed-forward network (a multi-layer perceptron or 'DenseLayer' in DL4J). These networks expect input and output data that is two-dimensional: that is, data with "shape" [numExamples,inputSize]. This means that the data fed into a feed-forward network has ‘numExamples’ rows/examples, where each row consists of ‘inputSize’ columns. A single example would have shape [1,inputSize], though in practice we generally use multiple examples for computational and optimization efficiency. Similarly, output data for a standard feed-forward network is also two dimensional, with shape [numExamples,outputSize].

    Conversely, data for RNNs are time series. Thus, they have 3 dimensions: one additional dimension for time. Input data thus has shape [numExamples,inputSize,timeSeriesLength], and output data has shape [numExamples,outputSize,timeSeriesLength]. This means that the data in our INDArray is laid out such that the value at position (i,j,k) is the jth value at the kth time step of the ith example in the minibatch. This data layout is shown below.
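    The (i,j,k) layout described above can be sketched with a plain 3d array (illustration only; in DL4J the data lives in an INDArray with this shape):

```java
// Demonstrates the [numExamples, inputSize, timeSeriesLength] layout:
// the value at (i, j, k) is the j-th feature at the k-th time step
// of the i-th example in the minibatch.
public class RnnDataLayout {
    static double[][][] makeFeatures(int numExamples, int inputSize, int timeSeriesLength) {
        double[][][] features = new double[numExamples][inputSize][timeSeriesLength];
        // fill with a value that encodes its own position: i*100 + j*10 + k
        for (int i = 0; i < numExamples; i++)
            for (int j = 0; j < inputSize; j++)
                for (int k = 0; k < timeSeriesLength; k++)
                    features[i][j][k] = i * 100 + j * 10 + k;
        return features;
    }

    public static void main(String[] args) {
        double[][][] features = makeFeatures(2, 3, 4);
        // feature 1 of example 0 at time step 3
        System.out.println(features[0][1][3]); // prints 13.0
    }
}
```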

    When importing time series data using the CSVSequenceRecordReader class, each line in the data files represents one time step, with the earliest observation in the first row (or the first row after the header, if present) and the most recent observation in the last row of the csv. Each feature time series is a separate column of the csv file. For example, if you have five features in your time series, each with 120 observations, and a training & test set of size 53, then there will be 106 csv files (53 input, 53 labels). The 53 input csv files will each have five columns and 120 rows. The label csv files will have one column (the label) and one row.
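    A minimal sketch of writing one input sequence in this layout (rows = time steps, columns = features; the formatting details of the real record readers may differ):

```java
// Builds the csv text for a single sequence file: one row per time step
// (earliest first), one column per feature.
public class SequenceCsvDemo {
    static String toSequenceCsv(double[][] timeStepsByFeatures) {
        StringBuilder sb = new StringBuilder();
        for (double[] step : timeStepsByFeatures) {
            for (int j = 0; j < step.length; j++) {
                if (j > 0) sb.append(',');
                sb.append(step[j]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 3 time steps, 2 features -> 3 rows of 2 comma-separated values
        double[][] seq = {{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}};
        System.out.print(toSequenceCsv(seq));
    }
}
```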

    RnnOutputLayer

    RnnOutputLayer is a type of layer used as the final layer with many recurrent neural network systems (for both regression and classification tasks). RnnOutputLayer handles things like score calculation and error calculation (of prediction vs. actual) given a loss function. Functionally, it is very similar to the 'standard' OutputLayer class (which is used with feed-forward networks); however, it both outputs, and expects as labels/targets, 3d time series data sets.

    Configuration for the RnnOutputLayer follows the same design as other layers: for example, to set the third layer in a MultiLayerNetwork to a RnnOutputLayer for classification:
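    For instance, a sketch of such a configuration (builder and class names per the DL4J API; the loss function, activation, and layer sizes here are illustrative):

```java
// third layer (index 2) of a MultiLayerNetwork configuration, set up as an
// RnnOutputLayer for classification; lstmLayerSize and nOut are placeholders
.layer(2, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation(Activation.SOFTMAX)
        .nIn(lstmLayerSize).nOut(nOut)
        .build())
```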

    Use of RnnOutputLayer in practice can be seen in the examples, linked at the end of this document.

    RNN Training Features

    Truncated Back Propagation Through Time

    Training neural networks (including RNNs) can be quite computationally demanding. For recurrent neural networks, this is especially the case when we are dealing with long sequences - i.e., training data with many time steps.

    Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. In summary, it allows us to train networks faster (by performing more frequent parameter updates), for a given amount of computational power. It is recommended to use truncated BPTT when your input sequences are long (typically, more than a few hundred time steps).

    Consider what happens when training a recurrent neural network with a time series of length 12 time steps. Here, we need to do a forward pass of 12 steps, calculate the error (based on predicted vs. actual), and do a backward pass of 12 time steps:

    For the 12 time steps in the image above, this is not a problem. Consider, however, an input time series of 10,000 or more time steps. In this case, standard backpropagation through time would require 10,000 time steps for each of the forward and backward passes, for each and every parameter update. This is of course very computationally demanding.

    In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. The specific length of these forward/backward pass segments is a parameter set by the user. For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:

    Note that the overall complexity for truncated BPTT and standard BPTT is approximately the same - both do the same number of time steps during the forward/backward pass. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort. However, the cost is not exactly the same: there is a small amount of overhead per parameter update.
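    As a back-of-envelope check of the update counts mentioned above: the number of parameter updates per sequence under truncated BPTT is simply ceil(seqLength / tbpttLength). A plain-Java sketch (illustrative only, not a DL4J API):

```java
// Toy illustration (not DL4J API): parameter updates per sequence under
// truncated BPTT vs. standard BPTT.
public class TbpttUpdates {

    // Standard BPTT: one update per full forward/backward pass over the sequence.
    public static int standardBpttUpdates(int seqLength) {
        return 1;
    }

    // Truncated BPTT: one update per segment of length tbpttLength
    // (the last segment may be shorter).
    public static int truncatedBpttUpdates(int seqLength, int tbpttLength) {
        return (seqLength + tbpttLength - 1) / tbpttLength;  // ceil(seqLength / tbpttLength)
    }
}
```

    For the example above (sequence length 12, TBPTT length 4) this gives 3 updates per sequence instead of 1.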

    The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don't flow back far enough to cause the required parameter updates that would store the required information. This tradeoff is usually worth it, and (as long as the truncated BPTT lengths are set appropriately), truncated BPTT works well in practice.

    Using truncated BPTT in DL4J is quite simple: just add the following code to your network configuration (at the end, before the final .build() in your network configuration)

    The above code snippet will cause any network training (i.e., calls to MultiLayerNetwork.fit() methods) to use truncated BPTT with segments of length 100 steps.

    Some things of note:

    • By default (if a backprop type is not manually specified), DL4J will use BackpropType.Standard (i.e., full BPTT).

    • The tBPTTLength configuration parameter sets the length of the truncated BPTT passes. Typically, this is somewhere on the order of 50 to 200 time steps, though it depends on the application and data.

    • The truncated BPTT length is typically a fraction of the total time series length (e.g., 200 vs. sequence length 1000), but variable length time series in the same minibatch are OK when using TBPTT (for example, a minibatch with two sequences - one of length 100 and another of length 1000 - with a TBPTT length of 200 - will work correctly)

    Masking: One-to-Many, Many-to-One, and Sequence Classification

    DL4J supports a number of related training features for RNNs, based on the idea of padding and masking. Padding and masking allow us to support training situations including one-to-many and many-to-one mappings, as well as variable length time series (in the same mini-batch).

    Suppose we want to train a recurrent neural network with inputs or outputs that don't occur at every time step. Examples of this (for a single example) are shown in the image below. DL4J supports training networks for all of these situations:

    Without masking and padding, we are restricted to the many-to-many case (above, left): that is, (a) All examples are of the same length, and (b) Examples have both inputs and outputs at all time steps.

    The idea behind padding is simple. Consider two time series of lengths 50 and 100 time steps, in the same mini-batch. The training data is a rectangular array; thus, we pad (i.e., add zeros to) the shorter time series (for both input and output), such that the input and output are both the same length (in this example: 100 time steps).

    Of course, if this was all we did, it would cause problems during training. Thus, in addition to padding, we use a masking mechanism. The idea behind masking is simple: we have two additional arrays that record whether an input or output is actually present for a given time step and example, or whether the input/output is just padding.

    Recall that with RNNs, our minibatch data has 3 dimensions, with shape [miniBatchSize,inputSize,timeSeriesLength] and [miniBatchSize,outputSize,timeSeriesLength] for the input and output respectively. The mask arrays are then 2 dimensional, with shape [miniBatchSize,timeSeriesLength] for both the input and output, with values of 0 ('absent') or 1 ('present') for each time step and example. The masking arrays for the input and output are stored in separate arrays.
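    The mask shape described above can be illustrated with a small plain-Java sketch (not the DL4J API) that builds a [miniBatchSize, timeSeriesLength] mask from per-example sequence lengths:

```java
// Toy illustration (not DL4J API): build a 2d mask array for a minibatch of
// padded sequences, 1.0 where a time step is present and 0.0 where padded.
public class MaskDemo {

    // lengths[i] is the real (unpadded) length of example i.
    // Returns a [numExamples, maxLength] mask matching the padded data arrays.
    public static double[][] buildMask(int[] lengths) {
        int maxLength = 0;
        for (int l : lengths) maxLength = Math.max(maxLength, l);
        double[][] mask = new double[lengths.length][maxLength];
        for (int i = 0; i < lengths.length; i++) {
            for (int t = 0; t < lengths[i]; t++) {
                mask[i][t] = 1.0;   // data present at this step
            }
        }
        return mask;                // remaining entries stay 0.0 (padding)
    }
}
```

    For two sequences of lengths 50 and 100, this produces a [2, 100] mask whose first row is 1.0 for the first 50 steps and 0.0 afterwards.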

    For a single example, the input and output masking arrays are shown below:

    For the “Masking not required” cases, we could equivalently use a masking array of all 1s, which will give the same result as not having a mask array at all. Also note that it is possible to use zero, one or two masking arrays when learning RNNs - for example, the many-to-one case could have a masking array for the output only.

    In practice: these mask arrays are generally created during the data import stage (for example, by the SequenceRecordReaderDataSetIterator – discussed later), and are contained within the DataSet object. If a DataSet contains masking arrays, MultiLayerNetwork will automatically use them during fitting/training. If they are absent, no masking functionality is used.

    Evaluation and Scoring with Masking

    Mask arrays are also important when doing scoring and evaluation (i.e., when evaluating the accuracy of an RNN classifier). Consider for example the many-to-one case: there is only a single output for each example, and any evaluation should take this into account.

    The (output) mask array can be used during evaluation by passing it to the following method:

    where labels are the actual output (3d time series), predicted is the network predictions (3d time series, same shape as labels), and outputMask is the 2d mask array for the output. Note that the input mask array is not required for evaluation.

    Score calculation will also make use of the mask arrays, via the MultiLayerNetwork.score(DataSet) method. Again, if the DataSet contains an output masking array, it will automatically be used when calculating the score (loss function - mean squared error, negative log likelihood etc) for the network.

    Masking and Sequence Classification After Training

    Sequence classification is one common use of masking. The idea is that although we have a sequence (time series) as input, we only want to provide a single label for the entire sequence (rather than one label at each time step in the sequence).

    However, RNNs by design output sequences of the same length as the input sequence. For sequence classification, masking allows us to train the network with this single label at the final time step - we essentially tell the network that there isn't actually label data anywhere except for the last time step.

    Now, suppose we've trained our network, and want to get the last time step for predictions, from the time series output array. How do we do that?

    To get the last time step, there are two cases to be aware of. First, when we have a single example, we don't actually need to use the mask arrays: we can just get the last time step in the output array:

    Assuming classification (the process is the same for regression), the last line above gives us probabilities at the last time step - i.e., the class probabilities for our sequence classification.

    The slightly more complex case is when we have multiple examples in the one minibatch (features array), where the lengths of each example differ. (If all are the same length: we can use the same process as above).

    In this 'variable length' case, we need to get the last time step for each example separately. If we have the time series lengths for each example from our data pipeline, it becomes straightforward: we just iterate over examples, replacing the timeSeriesLength in the above code with the length of that example.

    If we don't have the lengths of the time series directly, we need to extract them from the mask array.

    If we have a labels mask array (which is a one-hot vector, like [0,0,0,1,0] for each time series):

    Alternatively, if we have only the features mask, one quick and dirty approach is the following:

    To understand what is happening here, note that originally we have a features mask like [1,1,1,1,0], from which we want to get the last non-zero element. So we map [1,1,1,1,0] -> [1,2,3,4,0], and then get the largest element (which is the last time step).
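    The map-then-argmax trick just described can be sketched in plain Java for a single mask row (illustrative only; in practice this is done on INDArrays):

```java
// Toy illustration (not DL4J API): find the index of the last non-zero entry
// of a features mask row, e.g. [1,1,1,1,0] -> last present step is index 3.
public class LastStepFromMask {

    public static int lastTimeStepIndex(double[] maskRow) {
        int best = 0;
        double bestVal = Double.NEGATIVE_INFINITY;
        for (int t = 0; t < maskRow.length; t++) {
            double v = maskRow[t] * (t + 1);   // [1,1,1,1,0] -> [1,2,3,4,0]
            if (v > bestVal) {                 // argmax of the mapped values
                bestVal = v;
                best = t;
            }
        }
        return best;
    }
}
```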

    In either case, we can then do the following:

    Combining RNN Layers with Other Layer Types

    RNN layers in DL4J can be combined with other layer types. For example, it is possible to combine DenseLayer and LSTM layers in the same network; or combine Convolutional (CNN) layers and LSTM layers for video.

    Of course, the DenseLayer and Convolutional layers do not handle time series data - they expect a different type of input. To deal with this, we need to use the layer preprocessor functionality: for example, the CnnToRnnPreProcessor and FeedForwardToRnnPreprocessor classes; see the InputPreProcessor implementations for the full list. Fortunately, in most situations, the DL4J configuration system will automatically add these preprocessors as required. However, the preprocessors can be added manually (overriding the automatic addition of preprocessors, for each layer).

    For example, to manually add a preprocessor between layers 1 and 2, add the following to your network configuration: .inputPreProcessor(2, new RnnToFeedForwardPreProcessor()).

    Inference: Predictions One Step at a Time

    As with other types of neural networks, predictions can be generated for RNNs using the MultiLayerNetwork.output() and MultiLayerNetwork.feedForward() methods. These methods can be useful in many circumstances; however, they have the limitation that we can only generate predictions for time series, starting from scratch each and every time.

    Consider for example the case where we want to generate predictions in a real-time system, where these predictions are based on a very large amount of history. In this case, it is impractical to use the output/feedForward methods, as they conduct the full forward pass over the entire data history each time they are called. If we wish to make a prediction for a single time step, at every time step, these methods can be both (a) very costly, and (b) wasteful, as they do the same calculations over and over.

    For these situations, MultiLayerNetwork provides four methods of note:

    • rnnTimeStep(INDArray)

    • rnnClearPreviousState()

    • rnnGetPreviousState(int layer)

    • rnnSetPreviousState(int layer, Map<String,INDArray> state)

    The rnnTimeStep() method is designed to allow forward pass (predictions) to be conducted efficiently, one or more steps at a time. Unlike the output/feedForward methods, the rnnTimeStep method keeps track of the internal state of the RNN layers when it is called. It is important to note that output for the rnnTimeStep and the output/feedForward methods should be identical (for each time step), whether we make these predictions all at once (output/feedForward) or whether these predictions are generated one or more steps at a time (rnnTimeStep). Thus, the only difference should be the computational cost.

    In summary, the MultiLayerNetwork.rnnTimeStep() method does two things:

    1. Generate output/predictions (forward pass), using the previous stored state (if any)

    2. Update the stored state, storing the activations for the last time step (ready to be used next time rnnTimeStep is called)

    For example, suppose we want to use a RNN to predict the weather, one hour in advance (based on the weather at say the previous 100 hours as input). If we were to use the output method, at each hour we would need to feed in the full 100 hours of data to predict the weather for hour 101. Then to predict the weather for hour 102, we would need to feed in the full 100 (or 101) hours of data; and so on for hours 103+.

    Alternatively, we could use the rnnTimeStep method. Of course, if we want to use the full 100 hours of history before we make our first prediction, we still need to do the full forward pass:

    The first time we call rnnTimeStep, the only practical difference between the two approaches is that the activations/state of the last time step are stored - this is shown in orange. However, the next time we use the rnnTimeStep method, this stored state will be used to make the next predictions:

    There are a number of important differences here:

    1. In the second image (second call of rnnTimeStep) the input data consists of a single time step, instead of the full history of data

    2. The forward pass is thus a single time step (as compared to the hundreds – or more)

    3. After the rnnTimeStep method returns, the internal state will automatically be updated. Thus, predictions for time 103 could be made in the same way as for time 102. And so on.
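    The stored-state behaviour described above can be illustrated with a toy single-unit recurrent cell in plain Java (an illustration of the idea, not the DL4J API): stepping one input at a time with stored state produces the same final output as a full forward pass from scratch.

```java
// Toy stateful recurrent cell (not DL4J API): step() plays the role of
// rnnTimeStep, clearState() plays the role of rnnClearPreviousState.
public class StatefulStep {

    private double state = 0.0;   // stored activation from the previous step
    private final double inWeight, recWeight;

    public StatefulStep(double inWeight, double recWeight) {
        this.inWeight = inWeight;
        this.recWeight = recWeight;
    }

    // One step: uses, then updates, the stored state.
    public double step(double input) {
        state = Math.tanh(input * inWeight + state * recWeight);
        return state;
    }

    // Reset stored state, e.g. before an entirely new time series.
    public void clearState() {
        state = 0.0;
    }

    // Full forward pass from scratch over the whole sequence; returns the last output.
    public static double fullForwardLast(double[] inputs, double inWeight, double recWeight) {
        double s = 0.0;
        for (double in : inputs) {
            s = Math.tanh(in * inWeight + s * recWeight);
        }
        return s;
    }
}
```

    Stepping through a sequence one value at a time yields exactly the same final output as the full forward pass, which is the equivalence the rnnTimeStep method guarantees; only the computational cost differs.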

    However, if you want to start making predictions for a new (entirely separate) time series: it is necessary (and important) to manually clear the stored state, using the MultiLayerNetwork.rnnClearPreviousState() method. This will reset the internal state of all recurrent layers in the network.

    If you need to store or set the internal state of the RNN for use in predictions, you can use the rnnGetPreviousState and rnnSetPreviousState methods, for each layer individually. This can be useful for example during serialization (network saving/loading), as the internal network state from the rnnTimeStep method is not saved by default, and must be saved and loaded separately. Note that these get/set state methods return and accept a map, keyed by the type of activation. For example, in the LSTM model, it is necessary to store both the output activations, and the memory cell state.

    Some other points of note:

    • We can use the rnnTimeStep method for multiple independent examples/predictions simultaneously. In the weather example above, we might for example want to make predictions for multiple locations using the same neural network. This works in the same way as training and the forward pass / output methods: multiple rows (dimension 0 in the input data) are used for multiple examples.

    • If no history/stored state is set (i.e., initially, or after a call to rnnClearPreviousState), a default initialization (zeros) is used. This is the same approach as during training.

    • The rnnTimeStep method can be used for an arbitrary number of time steps simultaneously – not just one time step. However, it is important to note:

      • For a single time step prediction: the data is 2 dimensional, with shape [numExamples,nIn]; in this case, the output is also 2 dimensional, with shape [numExamples,nOut]

      • For multiple time step predictions: the data is 3 dimensional, with shape [numExamples,nIn,numTimeSteps]; the output will have shape [numExamples,nOut,numTimeSteps]. Again, the final time step activations are stored as before.

    • It is not possible to change the number of examples between calls of rnnTimeStep (in other words, if the first use of rnnTimeStep is for say 3 examples, all subsequent calls must be with 3 examples). After resetting the internal state (using rnnClearPreviousState()), any number of examples can be used for the next call of rnnTimeStep.

    • The rnnTimeStep method makes no changes to the parameters; it is used only after training of the network has been completed.

    • The rnnTimeStep method works with networks containing single and stacked/multiple RNN layers, as well as with networks that combine other layer types (such as Convolutional or Dense layers).

    • The RnnOutputLayer layer type does not have any internal state, as it does not have any recurrent connections.

    Loading Time Series Data

    Data import for RNNs is complicated by the fact that we have multiple different types of data we could want to use for RNNs: one-to-many, many-to-one, variable length time series, etc. This section will describe the currently implemented data import mechanisms for DL4J.

    The methods described here utilize the SequenceRecordReaderDataSetIterator class, in conjunction with the CSVSequenceRecordReader class from DataVec. This approach currently allows you to load delimited (tab, comma, etc) data from files, where each time series is in a separate file. This method also supports:

    • Variable length time series input

    • One-to-many and many-to-one data loading (where input and labels are in different files)

    • Label conversion from an index to a one-hot representation for classification (i.e., '2' to [0,0,1,0])

    • Skipping a fixed/specified number of rows at the start of the data files (i.e., comment or header rows)

    Note that in all cases, each line in the data files represents one time step.

    (In addition to the examples below, you might find these unit tests to be of some use.)

    Example 1: Time Series of Same Length, Input and Labels in Separate Files

    Suppose we have 10 time series in our training data, represented by 20 files: 10 files for the input of each time series, and 10 files for the output/labels. For now, assume these 20 files all contain the same number of time steps (i.e., same number of rows).

    To use the CSVSequenceRecordReader and SequenceRecordReaderDataSetIterator approaches, we first create two CSVSequenceRecordReader objects, one for input and one for labels:

    This particular constructor takes the number of lines to skip (1 row skipped here), and the delimiter (comma character used here).

    Second, we need to initialize these two readers, by telling them where to get the data from. We do this with an InputSplit object. Suppose that our time series are numbered, with file names "myInput_0.csv", "myInput_1.csv", ..., "myLabels_0.csv", etc. One approach is to use the NumberedFileInputSplit:

    In this particular approach, the "%d" is replaced by the corresponding number, and the numbers 0 to 9 (both inclusive) are used.

    Finally, we can create our SequenceRecordReaderDataSetIterator:

    This DataSetIterator can then be passed to MultiLayerNetwork.fit() to train the network.

    The miniBatchSize argument specifies the number of examples (time series) in each minibatch. For example, with 10 files total, a miniBatchSize of 5 would give us two minibatches (DataSet objects) with 5 time series in each.

    Note that:

    • For classification problems: numPossibleLabels is the number of classes in your data set. Use regression = false.

      • Labels data: one value per line, as a class index

      • Label data will be converted to a one-hot representation automatically

    • For regression problems: numPossibleLabels is not used (set it to anything) and use regression = true.

      • The number of values in the input and labels can be anything (unlike classification: regression can have an arbitrary number of outputs)

      • No processing of the labels is done when regression = true

    Example 2: Time Series of Same Length, Input and Labels in Same File

    Following on from the last example, suppose that instead of separate files for our input data and labels, we have both in the same file. However, each time series is still in a separate file.

    As of DL4J 0.4-rc3.8, this approach has the restriction of a single column for the output (either a class index, or a single real-valued regression output)

    In this case, we create and initialize a single reader. Again, we are skipping one header row, and specifying the format as comma delimited, and assuming our data files are named "myData_0.csv", ..., "myData_9.csv":

    miniBatchSize and numPossibleLabels are the same as the previous example. Here, labelIndex specifies which column the labels are in. For example, if the labels are in the fifth column, use labelIndex = 4 (i.e., columns are indexed 0 to numColumns-1).

    For regression on a single output value, we use:

    Again, the numPossibleLabels argument is not used for regression.

    Example 3: Time Series of Different Lengths (Many-to-Many)

    Following on from the previous two examples, suppose that for each example individually, the input and labels are of the same length, but these lengths differ between time series.

    We can use the same approach (CSVSequenceRecordReader and SequenceRecordReaderDataSetIterator), though with a different constructor:

    The arguments here are the same as in the previous example, with the exception of the AlignmentMode.ALIGN_END addition. This alignment mode tells the SequenceRecordReaderDataSetIterator to expect two things:

    1. That the time series may be of different lengths

    2. To align the input and labels - for each example individually - such that their last values occur at the same time step.

    Note that if the features and labels are always of the same length (as is the assumption in example 3), then the two alignment modes (AlignmentMode.ALIGN_END and AlignmentMode.ALIGN_START) will give identical outputs. The alignment mode option is explained in the next section.

    Also note that variable length time series always start at time zero in the data arrays: padding, if required, will be added after the time series has ended.

    Unlike examples 1 and 2 above, the DataSet objects produced by the above variableLengthIter instance will also include input and label mask arrays, as described earlier in this document.

    Example 4: Many-to-One and One-to-Many Data

    We can also use the AlignmentMode functionality in example 3 to implement a many-to-one RNN sequence classifier. Here, let us assume:

    • Input and labels are in separate delimited files

    • The labels files contain a single row (time step) (either a class index for classification, or one or more numbers for regression)

    • The input lengths may (optionally) differ between examples

    In fact, the same approach as in example 3 can do this:

    Alignment modes are relatively straightforward. They specify whether to pad the start or the end of the shorter time series. The diagram below shows how this works, along with the masking arrays (as discussed earlier in this document):

    The one-to-many case (similar to the last case above, but with only one input) is done by using AlignmentMode.ALIGN_START.

    Note that in the case of training data that contains time series of different lengths, the labels and inputs will be aligned for each example individually, and then the shorter time series will be padded as required:
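    The padding placement for the two alignment modes can be sketched in plain Java (an illustration of the behaviour described above, not DataVec's implementation): with ALIGN_START-style alignment the padding goes after the sequence ends, while with ALIGN_END-style alignment the sequence is shifted so its last value lands at the last time step.

```java
// Toy illustration (not the DataVec implementation): pad a shorter sequence
// to a target length, placing the zeros per the alignment mode.
public class AlignmentDemo {

    // alignEnd=false: data at the start, padding at the end (ALIGN_START style).
    // alignEnd=true:  padding at the start, last value at the last step (ALIGN_END style).
    public static double[] pad(double[] seq, int targetLength, boolean alignEnd) {
        double[] out = new double[targetLength];            // zeros = padding
        int offset = alignEnd ? targetLength - seq.length : 0;
        System.arraycopy(seq, 0, out, offset, seq.length);
        return out;
    }
}
```

    For a many-to-one label file with a single time step, ALIGN_END places that single label at the final time step of the padded label array, matching the input's last step.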

    Available layers

    LSTM

    LSTM recurrent neural network layer without peephole connections. Supports CuDNN acceleration - see the cuDNN page for details.

    RnnLossLayer

    Recurrent Neural Network Loss Layer. Handles calculation of gradients etc. for various objective (loss) functions. Note: unlike RnnOutputLayer, RnnLossLayer does not have any parameters - i.e., there is no time distributed dense component here. Consequently, the output activations size is equal to the input size. Input and output activations are the same as for other RNN layers: 3 dimensions with shape [miniBatchSize,nIn,timeSeriesLength] and [miniBatchSize,nOut,timeSeriesLength] respectively. Note that RnnLossLayer also has the option to configure an activation function.

    setNIn

    • param lossFunction Loss function for the loss layer

    RnnOutputLayer

    RnnOutputLayer is the output layer variant used with recurrent neural networks: it produces output activations of shape [minibatch,nOut,sequenceLength] and expects labels of shape [minibatch,nOut,sequenceLength]. It also supports mask arrays. Note that RnnOutputLayer can also be used for 1D CNN layers, which also have [minibatch,nOut,sequenceLength] activations/labels shape.

    build

    • param lossFunction Loss function for the output layer

    Bidirectional

    Bidirectional is a “wrapper” layer: it wraps any uni-directional RNN layer to make it bidirectional. Note that multiple different modes are supported - these specify how the activations from the forward and backward passes should be combined. Internally, Bidirectional creates two separate copies of the wrapped RNN layer, each with separate parameters.

    getNOut

    This Mode enumeration defines how the activations for the forward and backward networks should be combined. ADD: out = forward + backward (elementwise addition). MUL: out = forward * backward (elementwise multiplication). AVERAGE: out = 0.5 * (forward + backward). CONCAT: concatenate the activations. Here, ‘forward’ is the activations for the forward RNN, and ‘backward’ is the activations for the backward RNN. In all cases except CONCAT, the output activations size is the same as that of the standard RNN being wrapped by this layer. In the CONCAT case, the output activations size (dimension 1) is 2x larger than the standard RNN’s activations array.
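    The four combine modes can be sketched in plain Java (a toy illustration on 1d activation vectors, not the DL4J implementation):

```java
// Toy illustration (not DL4J API): combine forward and backward activations
// per the Bidirectional.Mode described above.
public class BidirectionalCombine {

    // mode: "ADD", "MUL", "AVERAGE" or "CONCAT"; fwd and bwd have equal length.
    public static double[] combine(double[] fwd, double[] bwd, String mode) {
        int n = fwd.length;
        switch (mode) {
            case "ADD": {
                double[] out = new double[n];
                for (int i = 0; i < n; i++) out[i] = fwd[i] + bwd[i];
                return out;
            }
            case "MUL": {
                double[] out = new double[n];
                for (int i = 0; i < n; i++) out[i] = fwd[i] * bwd[i];
                return out;
            }
            case "AVERAGE": {
                double[] out = new double[n];
                for (int i = 0; i < n; i++) out[i] = 0.5 * (fwd[i] + bwd[i]);
                return out;
            }
            case "CONCAT": {
                double[] out = new double[2 * n];   // output is 2x larger
                System.arraycopy(fwd, 0, out, 0, n);
                System.arraycopy(bwd, 0, out, n, n);
                return out;
            }
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```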

    getUpdaterByParam

    Get the updater for the given parameter. Typically the same updater will be used for all parameters, but this is not necessarily the case

    • param paramName Parameter name

    • return IUpdater for the parameter

    LastTimeStep

    LastTimeStep is a “wrapper” layer: it wraps any RNN (or CNN1D) layer, and extracts out the last time step during forward pass, and returns it as a row vector (per example). That is, for 3d (time series) input (with shape [minibatch, layerSize, timeSeriesLength]), we take the last time step and return it as a 2d array with shape [minibatch, layerSize]. Note that the last time step operation takes into account any mask arrays, if present: thus, variable length time series (in the same minibatch) are handled as expected here.

    SimpleRnn

    A standard 'vanilla' recurrent layer, with output activations: out_t = activationFn(in_t * inWeight + out_(t-1) * recurrentWeights + bias).

    Note that other architectures (LSTM, etc) are usually much more effective, especially for longer time series; however, SimpleRnn is very fast to compute, and hence may be considered where the temporal dependencies in the dataset are only a few steps long.
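    A toy plain-Java unrolling of the SimpleRnn forward pass above, using tanh as the activation function (illustrative only, not the DL4J implementation):

```java
// Toy illustration (not DL4J API): unroll a SimpleRnn over a sequence.
// out_t = tanh(W * in_t + R * out_{t-1} + b)
// W is [nOut][nIn], R is [nOut][nOut], b is [nOut]; inputs is [T][nIn].
public class SimpleRnnForward {

    public static double[][] forward(double[][] inputs, double[][] W, double[][] R, double[] b) {
        int nOut = b.length;
        double[][] outputs = new double[inputs.length][nOut];
        double[] prev = new double[nOut];                    // zero initial state
        for (int t = 0; t < inputs.length; t++) {
            for (int o = 0; o < nOut; o++) {
                double z = b[o];
                for (int i = 0; i < inputs[t].length; i++) z += W[o][i] * inputs[t][i];  // input term
                for (int j = 0; j < nOut; j++) z += R[o][j] * prev[j];                   // recurrent term
                outputs[t][o] = Math.tanh(z);
            }
            prev = outputs[t];                               // carry state to next step
        }
        return outputs;
    }
}
```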


    .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT).activation(Activation.SOFTMAX)
            .weightInit(WeightInit.XAVIER).nIn(prevLayerSize).nOut(nOut).build())

    .backpropType(BackpropType.TruncatedBPTT)
    .tBPTTLength(100)

    Evaluation.evalTimeSeries(INDArray labels, INDArray predicted, INDArray outputMask)

        INDArray timeSeriesFeatures = ...;
        INDArray timeSeriesOutput = myNetwork.output(timeSeriesFeatures);
        int timeSeriesLength = timeSeriesOutput.size(2);        //Size of time dimension
        INDArray lastTimeStepProbabilities = timeSeriesOutput.get(NDArrayIndex.point(0), NDArrayIndex.all(), NDArrayIndex.point(timeSeriesLength-1));

        INDArray labelsMaskArray = ...;
        INDArray lastTimeStepIndices = Nd4j.argMax(labelsMaskArray,1);

        INDArray featuresMaskArray = ...;
        int longestTimeSeries = featuresMaskArray.size(1);
        INDArray linspace = Nd4j.linspace(1,longestTimeSeries,longestTimeSeries);
        INDArray temp = featuresMaskArray.mulColumnVector(linspace);
        INDArray lastTimeStepIndices = Nd4j.argMax(temp,1);

        int numExamples = timeSeriesFeatures.size(0);
        for( int i=0; i<numExamples; i++ ){
            int thisTimeSeriesLastIndex = lastTimeStepIndices.getInt(i);
            INDArray thisExampleProbabilities = timeSeriesOutput.get(NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(thisTimeSeriesLastIndex));
        }

    SequenceRecordReader featureReader = new CSVSequenceRecordReader(1, ",");
    SequenceRecordReader labelReader = new CSVSequenceRecordReader(1, ",");
    featureReader.initialize(new NumberedFileInputSplit("/path/to/data/myInput_%d.csv", 0, 9));
    labelReader.initialize(new NumberedFileInputSplit("/path/to/data/myLabels_%d.csv", 0, 9));
    DataSetIterator iter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression);

    SequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
    reader.initialize(new NumberedFileInputSplit("/path/to/data/myData_%d.csv", 0, 9));
    DataSetIterator iterClassification = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, numPossibleLabels, labelIndex, false);

    DataSetIterator iterRegression = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, -1, labelIndex, true);

    DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

    DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

    public void setNIn(int nIn)

    public RnnOutputLayer build()

    public long getNOut()

    public IUpdater getUpdaterByParam(String paramName)

    NN

    CReLU

    Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.

    • x (NUMERIC) - Input variable
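    A plain-Java sketch of the operation (illustrative, not the SameDiff implementation):

```java
// Toy illustration (not the SameDiff implementation): CReLU concatenates
// relu(x) with relu(-x), doubling the length of the activation vector.
public class CReluDemo {

    public static double[] crelu(double[] x) {
        double[] out = new double[2 * x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.max(0.0, x[i]);              // positive part
            out[x.length + i] = Math.max(0.0, -x[i]);  // negative part, sign-flipped
        }
        return out;
    }
}
```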

    batchNorm

    Neural network batch normalization operation.

    For details, see

    • input (NUMERIC) - Input variable.

    • mean (NUMERIC) - Mean value. For 1d axis, this should match input.size(axis)

    • variance (NUMERIC) - Variance value. For 1d axis, this should match input.size(axis)
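    A plain-Java sketch of the normalize-scale-shift computation, for the simple case of scalar mean/variance and scalar scale/shift (the names gamma, beta and the eps parameter are conventional batch-norm terminology, not taken from the parameter list above):

```java
// Toy illustration (not the SameDiff implementation):
// out = gamma * (x - mean) / sqrt(variance + eps) + beta
public class BatchNormDemo {

    public static double[] batchNorm(double[] x, double mean, double variance,
                                     double gamma, double beta, double eps) {
        double denom = Math.sqrt(variance + eps);   // eps guards against divide-by-zero
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = gamma * (x[i] - mean) / denom + beta;
        }
        return out;
    }
}
```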

    biasAdd

    Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector

    • input (NUMERIC) - 4d input variable

    • bias (NUMERIC) - 1d bias

    • nchw - The format - nchw=true means [minibatch, channels, height, width] format; nchw=false - [minibatch, height, width, channels].

    dotProductAttention

    This operation performs dot product attention on the given timeseries input with the given queries

    out = sum(similarity(k_i, q) * v_i)

    similarity(k, q) = softmax(k . q), where k . q is the dot product of k and q

    Optionally with normalization step:

    similarity(k, q) = softmax(k . q / sqrt(size(q)))

    See also "Attention is all you need" (p. 4, eq. 1)

    Note: This supports multiple queries at once. If only one query is available, the queries vector still has to be 3D, but can have queryCount = 1.

    Note: Keys and values are usually the same array. If you want to use the same array for both, simply pass it in for both arguments.

    Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays. Mixing them doesn't work. The output rank will depend on the input rank.

    • queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]

      or 4D array of shape [batchSize, numHeads, featureKeys, queryCount]

    • keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]

      or 4D array of shape [batchSize, numHeads, featureKeys, timesteps]

    dropout

    Dropout operation

    • input (NUMERIC) - Input array

    • inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)

    elu

    Element-wise exponential linear unit (ELU) function:

    out = x if x > 0

    out = a * (exp(x) - 1) if x <= 0

    with constant a = 1.0

See: https://arxiv.org/abs/1511.07289

    • x (NUMERIC) - Input variable

    gelu

    GELU activation function - Gaussian Error Linear Units

For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415

    This method uses the sigmoid approximation

    • x (NUMERIC) - Input variable

    hardSigmoid

    Element-wise hard sigmoid function:

    out[i] = 0 if in[i] <= -2.5

out[i] = 0.2*in[i]+0.5 if -2.5 < in[i] < 2.5

    out[i] = 1 if in[i] >= 2.5

    • x (NUMERIC) - Input variable

    hardTanh

    Element-wise hard tanh function:

    out[i] = -1 if in[i] <= -1

out[i] = in[i] if -1 < in[i] < 1

    out[i] = 1 if in[i] >= 1

    • x (NUMERIC) - Input variable

    hardTanhDerivative

    Derivative (dOut/dIn) of the element-wise hard Tanh function - hardTanh(INDArray)

    • x (NUMERIC) - Input variable

    layerNorm

    Apply Layer Normalization

    y = gain * standardize(x) + bias

    • input (NUMERIC) - Input variable

    • gain (NUMERIC) - Gain

    • bias (NUMERIC) - Bias

    leakyRelu

    Element-wise leaky ReLU function:

    out = x if x >= 0.0

out = alpha * x if x < 0.0

    Alpha value is most commonly set to 0.01

    • x (NUMERIC) - Input variable

    • alpha - Cutoff - commonly 0.01

    leakyReluDerivative

    Leaky ReLU derivative: dOut/dIn given input.

    • x (NUMERIC) - Input variable

    • alpha - Cutoff - commonly 0.01

    linear

    Linear layer operation: out = mmul(in,w) + bias

    Note that bias array is optional

    • input (NUMERIC) - Input data

    • weights (NUMERIC) - Weights variable, shape [nIn, nOut]

    • bias (NUMERIC) - Optional bias variable (may be null)

    logSigmoid

Element-wise log-sigmoid function: out[i] = log(sigmoid(in[i]))

    • x (NUMERIC) - Input variable

    logSoftmax

    Log softmax activation

    • x (NUMERIC) -

    logSoftmax

    Log softmax activation

    • x (NUMERIC) - Input

    • dimension - Dimension along which to apply log softmax

    multiHeadDotProductAttention

    This performs multi-headed dot product attention on the given timeseries input

    out = concat(head_1, head_2, ..., head_n) * Wo

head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)

    Optionally with normalization when calculating the attention for each head.

See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")

    This makes use of dot_product_attention OP support for rank 4 inputs.

    see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean, boolean)

    • queries (NUMERIC) - input 3D array "queries" of shape [batchSize, featureKeys, queryCount]

    • keys (NUMERIC) - input 3D array "keys" of shape [batchSize, featureKeys, timesteps]

    • values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]

    pad

    Padding operation

    • input (NUMERIC) - Input tensor

    • padding (NUMERIC) - Padding value

    • PadMode - Padding format - default = CONSTANT

    preciseGelu

    GELU activation function - Gaussian Error Linear Units

For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415

    This method uses the precise method

    • x (NUMERIC) - Input variable

    prelu

    PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:

    out[i] = in[i] if in[i] >= 0

    out[i] = in[i] * alpha[i] otherwise

    sharedAxes allows you to share learnable parameters along axes.

    For example, if the input has shape [batchSize, channels, height, width]

    and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an

    alpha with shape [channels].

    • input (NUMERIC) - Input data

    • alpha (NUMERIC) - The cutoff variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha.

    • sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))

    relu

    Element-wise rectified linear function with specified cutoff:

    out[i] = in[i] if in[i] >= cutoff

    out[i] = 0 otherwise

    • x (NUMERIC) - Input

    • cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0

    relu6

    Element-wise "rectified linear 6" function with specified cutoff:

out[i] = min(max(in[i], cutoff), 6)

    • x (NUMERIC) - Input

    • cutoff - Cutoff value for ReLU operation. Usually 0

    reluLayer

    ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)

    Note that bias array is optional

    • input (NUMERIC) - Input data

    • weights (NUMERIC) - Weights variable

    • bias (NUMERIC) - Optional bias variable (may be null)

    selu

Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks

    out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0

    Uses default scale and alpha values.

    • x (NUMERIC) - Input variable

    sigmoid

    Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))

    • x (NUMERIC) - Input variable

    sigmoidDerivative

    Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut

    • x (NUMERIC) - Input Variable

    • wrt (NUMERIC) - Gradient at the output - dL/dOut. Must have same shape as the input

    softmax

    Softmax activation, along the specified dimension

    • x (NUMERIC) - Input

    • dimension - Dimension along which to apply softmax - default = -1

    softmaxDerivative

    Softmax derivative function

    • x (NUMERIC) - Softmax input

    • wrt (NUMERIC) - Gradient at output, dL/dx

    • dimension - Softmax dimension

    softplus

    Element-wise softplus function: out = log(exp(x) + 1)

    • x (NUMERIC) - Input variable

    softsign

    Element-wise softsign function: out = x / (abs(x) + 1)

    • x (NUMERIC) - Input variable

    softsignDerivative

    Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)

    • x (NUMERIC) - Input variable

    swish

Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0

See: https://arxiv.org/abs/1710.05941

    • x (NUMERIC) - Input variable

    tanh

    Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)

    • x (NUMERIC) - Input variable

    INDArray CReLU(INDArray x)
    
    SDVariable CReLU(SDVariable x)
    SDVariable CReLU(String name, SDVariable x)

  • gamma (NUMERIC) - Gamma value. For 1d axis, this should match input.size(axis)

  • beta (NUMERIC) - Beta value. For 1d axis, this should match input.size(axis)

  • epsilon - Epsilon constant for numerical stability (to avoid division by 0)

  • axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations.

    For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC

    For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))

  • Unused for 2d inputs

  • values (NUMERIC) - input 3D array "values" of shape [batchSize, featureValues, timesteps]

    or 4D array of shape [batchSize, numHeads, featureValues, timesteps]

  • mask (NUMERIC) - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps]

  • scaled - normalization, false -> do not apply normalization, true -> apply normalization

  • channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data

  • dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))

  • Wq (NUMERIC) - input query projection weights of shape [numHeads, projectedKeys, featureKeys]

  • Wk (NUMERIC) - input key projection weights of shape [numHeads, projectedKeys, featureKeys]

  • Wv (NUMERIC) - input value projection weights of shape [numHeads, projectedValues, featureValues]

  • Wo (NUMERIC) - output projection weights of shape [numHeads * projectedValues, outSize]

  • mask (NUMERIC) - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps]

  • scaled - normalization, false -> do not apply normalization, true -> apply normalization

  • constant - Padding constant

    https://arxiv.org/abs/1502.03167
    https://arxiv.org/abs/1706.03762
    https://arxiv.org/abs/1511.07289
    https://arxiv.org/abs/1606.08415
    https://arxiv.org/abs/1706.03762
    https://arxiv.org/abs/1606.08415
    Self-Normalizing Neural Networks
    https://arxiv.org/abs/1710.05941
    INDArray batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int[] axis)
    
    SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)
    SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int[] axis)
    INDArray biasAdd(INDArray input, INDArray bias, boolean nchw)
    
    SDVariable biasAdd(SDVariable input, SDVariable bias, boolean nchw)
    SDVariable biasAdd(String name, SDVariable input, SDVariable bias, boolean nchw)
    INDArray dotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray mask, boolean scaled)
    
    SDVariable dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
    SDVariable dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
    INDArray dropout(INDArray input, double inputRetainProbability)
    
    SDVariable dropout(SDVariable input, double inputRetainProbability)
    SDVariable dropout(String name, SDVariable input, double inputRetainProbability)
    INDArray elu(INDArray x)
    
    SDVariable elu(SDVariable x)
    SDVariable elu(String name, SDVariable x)
    INDArray gelu(INDArray x)
    
    SDVariable gelu(SDVariable x)
    SDVariable gelu(String name, SDVariable x)
    INDArray hardSigmoid(INDArray x)
    
    SDVariable hardSigmoid(SDVariable x)
    SDVariable hardSigmoid(String name, SDVariable x)
    INDArray hardTanh(INDArray x)
    
    SDVariable hardTanh(SDVariable x)
    SDVariable hardTanh(String name, SDVariable x)
    INDArray hardTanhDerivative(INDArray x)
    
    SDVariable hardTanhDerivative(SDVariable x)
    SDVariable hardTanhDerivative(String name, SDVariable x)
    INDArray layerNorm(INDArray input, INDArray gain, INDArray bias, boolean channelsFirst, int[] dimensions)
    INDArray layerNorm(INDArray input, INDArray gain, boolean channelsFirst, int[] dimensions)
    
    SDVariable layerNorm(SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int[] dimensions)
    SDVariable layerNorm(SDVariable input, SDVariable gain, boolean channelsFirst, int[] dimensions)
    SDVariable layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int[] dimensions)
    SDVariable layerNorm(String name, SDVariable input, SDVariable gain, boolean channelsFirst, int[] dimensions)
    INDArray leakyRelu(INDArray x, double alpha)
    
    SDVariable leakyRelu(SDVariable x, double alpha)
    SDVariable leakyRelu(String name, SDVariable x, double alpha)
    INDArray leakyReluDerivative(INDArray x, double alpha)
    
    SDVariable leakyReluDerivative(SDVariable x, double alpha)
    SDVariable leakyReluDerivative(String name, SDVariable x, double alpha)
    INDArray linear(INDArray input, INDArray weights, INDArray bias)
    
    SDVariable linear(SDVariable input, SDVariable weights, SDVariable bias)
    SDVariable linear(String name, SDVariable input, SDVariable weights, SDVariable bias)
    INDArray logSigmoid(INDArray x)
    
    SDVariable logSigmoid(SDVariable x)
    SDVariable logSigmoid(String name, SDVariable x)
    INDArray logSoftmax(INDArray x)
    
    SDVariable logSoftmax(SDVariable x)
    SDVariable logSoftmax(String name, SDVariable x)
    INDArray logSoftmax(INDArray x, int dimension)
    
    SDVariable logSoftmax(SDVariable x, int dimension)
    SDVariable logSoftmax(String name, SDVariable x, int dimension)
    INDArray multiHeadDotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray Wq, INDArray Wk, INDArray Wv, INDArray Wo, INDArray mask, boolean scaled)
    
    SDVariable multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
    SDVariable multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
    INDArray pad(INDArray input, INDArray padding, PadMode PadMode, double constant)
    INDArray pad(INDArray input, INDArray padding, double constant)
    
    SDVariable pad(SDVariable input, SDVariable padding, PadMode PadMode, double constant)
    SDVariable pad(SDVariable input, SDVariable padding, double constant)
    SDVariable pad(String name, SDVariable input, SDVariable padding, PadMode PadMode, double constant)
    SDVariable pad(String name, SDVariable input, SDVariable padding, double constant)
    INDArray preciseGelu(INDArray x)
    
    SDVariable preciseGelu(SDVariable x)
    SDVariable preciseGelu(String name, SDVariable x)
    INDArray prelu(INDArray input, INDArray alpha, int[] sharedAxes)
    
    SDVariable prelu(SDVariable input, SDVariable alpha, int[] sharedAxes)
    SDVariable prelu(String name, SDVariable input, SDVariable alpha, int[] sharedAxes)
    INDArray relu(INDArray x, double cutoff)
    
    SDVariable relu(SDVariable x, double cutoff)
    SDVariable relu(String name, SDVariable x, double cutoff)
    INDArray relu6(INDArray x, double cutoff)
    
    SDVariable relu6(SDVariable x, double cutoff)
    SDVariable relu6(String name, SDVariable x, double cutoff)
    INDArray reluLayer(INDArray input, INDArray weights, INDArray bias)
    
    SDVariable reluLayer(SDVariable input, SDVariable weights, SDVariable bias)
    SDVariable reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias)
    INDArray selu(INDArray x)
    
    SDVariable selu(SDVariable x)
    SDVariable selu(String name, SDVariable x)
    INDArray sigmoid(INDArray x)
    
    SDVariable sigmoid(SDVariable x)
    SDVariable sigmoid(String name, SDVariable x)
    INDArray sigmoidDerivative(INDArray x, INDArray wrt)
    
    SDVariable sigmoidDerivative(SDVariable x, SDVariable wrt)
    SDVariable sigmoidDerivative(String name, SDVariable x, SDVariable wrt)
    INDArray softmax(INDArray x, int dimension)
    INDArray softmax(INDArray x)
    
    SDVariable softmax(SDVariable x, int dimension)
    SDVariable softmax(SDVariable x)
    SDVariable softmax(String name, SDVariable x, int dimension)
    SDVariable softmax(String name, SDVariable x)
    INDArray softmaxDerivative(INDArray x, INDArray wrt, int dimension)
    
    SDVariable softmaxDerivative(SDVariable x, SDVariable wrt, int dimension)
    SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt, int dimension)
    INDArray softplus(INDArray x)
    
    SDVariable softplus(SDVariable x)
    SDVariable softplus(String name, SDVariable x)
    INDArray softsign(INDArray x)
    
    SDVariable softsign(SDVariable x)
    SDVariable softsign(String name, SDVariable x)
    INDArray softsignDerivative(INDArray x)
    
    SDVariable softsignDerivative(SDVariable x)
    SDVariable softsignDerivative(String name, SDVariable x)
    INDArray swish(INDArray x)
    
    SDVariable swish(SDVariable x)
    SDVariable swish(String name, SDVariable x)
    INDArray tanh(INDArray x)
    
    SDVariable tanh(SDVariable x)
    SDVariable tanh(String name, SDVariable x)
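As a quick illustration, a few of these activations can also be applied directly in ND4J via the Transforms helper class - a well-known alternative entry point to the same underlying ops, not the namespace methods listed above (a minimal sketch; the input values are chosen just for illustration):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

public class ActivationSketch {
    public static void main(String[] args) {
        // Input covering negative, zero and positive values
        INDArray x = Nd4j.create(new double[]{-1.0, 0.0, 2.0});

        System.out.println(Transforms.relu(x));    // negative values clipped to 0
        System.out.println(Transforms.sigmoid(x)); // sigmoid(0) = 0.5
        System.out.println(Transforms.tanh(x));    // tanh(0) = 0
    }
}
```
Each Transforms call here returns a new array, leaving x unmodified.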

    Overview

    Comprehensive programming guide for ND4J. This user guide is designed to explain (and provide examples for) the main functionality in ND4J.

    Introduction

An NDArray is, in essence, an n-dimensional array: i.e., a rectangular array of numbers, with some number of dimensions.

    Some concepts you should be familiar with:

    • The rank of an NDArray is the number of dimensions. 2d NDArrays have a rank of 2, 3d arrays have a rank of 3, and so on. You can create NDArrays with any arbitrary rank.

    • The shape of an NDArray defines the size of each of the dimensions. Suppose we have a 2d array with 3 rows and 5 columns. This NDArray would have shape [3,5]

    • The length of an NDArray defines the total number of elements in the array. The length is always equal to the product of the values that make up the shape.

    • The stride of an NDArray is defined as the separation (in the underlying data buffer) of contiguous elements in each dimension. Stride is defined per dimension, so a rank N NDArray has N stride values, one for each dimension. Note that most of the time, you don't need to know (or concern yourself with) the stride - just be aware that this is how ND4J operates internally. The next section has an example of strides.

    • The data type of an NDArray refers to the type of data of an NDArray (for example, float or double precision). Note that this is set globally in ND4J, so all NDArrays should have the same data type. Setting the data type is discussed later in this document.

    In terms of indexing there are a few things to know. First, rows are dimension 0, and columns are dimension 1: thus INDArray.size(0) is the number of rows, and INDArray.size(1) is the number of columns. Like normal arrays in most programming languages, indexing is zero-based: thus rows have indexes 0 to INDArray.size(0)-1, and so on for the other dimensions.

Throughout this document, we'll use the term NDArray to refer to the general concept of an n-dimensional array; the term INDArray refers specifically to the Java interface that ND4J defines. In practice, these two terms can be used interchangeably.

    NDArrays: How Are They Stored in Memory?

The next few paragraphs describe some of the architecture behind ND4J. Understanding this is not strictly necessary in order to use ND4J, but it may help you to understand what is going on behind the scenes. NDArrays are stored in memory as a single flat array of numbers (or more generally, as a single contiguous block of memory), and hence differ a lot from typical Java multidimensional arrays such as a float[][] or double[][][].

    Physically, the data that backs an INDArray is stored off-heap: that is, it is stored outside of the Java Virtual Machine (JVM). This has numerous benefits, including performance, interoperability with high-performance BLAS libraries, and the ability to avoid some shortcomings of the JVM in high-performance computing (such as issues with Java arrays being limited to 2^31 -1 (2.14 billion) elements due to integer indexing).

In terms of encoding, an NDArray can be encoded in either C (row-major) or Fortran (column-major) order. For more details on row vs. column major order, see Wikipedia's article on row- and column-major order. ND4J may use a combination of C and F order arrays together, at the same time. Most users can just use the default array ordering, but note that it is possible to use a specific ordering for a given array, should the need arise.

The following image shows how a simple 3x3 (2d) NDArray is stored in memory:

    In the above array, we have:

    • Shape = [3,3] (3 rows, 3 columns)

    • Rank = 2 (2 dimensions)

    • Length = 9 (3x3=9)

Views: Multiple NDArrays Referring to the Same Data

A key concept in ND4J is the fact that two NDArrays can actually point to the same underlying data in memory. Usually, we have one NDArray referring to some subset of another array, and this occurs only for certain operations (such as INDArray.get(), INDArray.transpose(), INDArray.getRow(), etc.). This is a powerful concept, and one that is worth understanding.

    There are two primary motivations for this:

    1. There are considerable performance benefits, most notably in avoiding copying arrays

    2. We gain a lot of power in terms of how we can perform operations on our NDArrays

Consider a simple operation like a matrix transpose on a large (10,000 x 10,000) matrix. Using views, we can perform this matrix transpose in constant time without performing any copies (i.e., O(1) in big O notation), avoiding the considerable cost of copying all of the array elements. Of course, sometimes we do want to make a copy - at which point we can use the INDArray.dup() method to get a copy. For example, to get a copy of a transposed matrix, use INDArray out = myMatrix.transpose().dup(). After this dup() call, there will be no link between the original array myMatrix and the array out (thus, changes to one will not impact the other).

To see how views can be powerful, consider a simple task: adding 1.0 to the first row of a larger array, myArray. We can do this easily, in one line:

    myArray.getRow(0).addi(1.0)

Let's break down what is happening here. First, the getRow(0) operation returns an INDArray that is a view of the original. Note that both myArray and myArray.getRow(0) point to the same area in memory:

Then, after the addi(1.0) is performed, we have the following situation:

    As we can see, changes to the NDArray returned by myArray.getRow(0) will be reflected in the original array myArray; similarly, changes to myArray will be reflected in the row vector.
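The row-view behaviour described above can be sketched end-to-end (a minimal example; the array size and values are chosen just for illustration):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class ViewSketch {
    public static void main(String[] args) {
        INDArray myArray = Nd4j.zeros(3, 3);

        // getRow(0) returns a view: addi modifies the original array
        myArray.getRow(0).addi(1.0);
        System.out.println(myArray.getDouble(0, 0)); // 1.0 - first row changed
        System.out.println(myArray.getDouble(1, 0)); // 0.0 - other rows untouched

        // dup() breaks the link: changes to the copy don't affect myArray
        INDArray copy = myArray.getRow(0).dup();
        copy.addi(5.0);
        System.out.println(myArray.getDouble(0, 0)); // still 1.0
    }
}
```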

Creating NDArrays

    Zero, One and Scalar-initialized Arrays

    Two of the most commonly used methods of creating arrays are:

    • Nd4j.zeros(int...)

    • Nd4j.ones(int...)

The shape of the arrays is specified using integers. For example, to create a zero-filled array with 3 rows and 5 columns, use Nd4j.zeros(3,5).

    These can often be combined with other operations to create arrays with other values. For example, to create an array filled with 10s:

    INDArray tens = Nd4j.zeros(3,5).addi(10)

    The above initialization works in two steps: first by allocating a 3x5 array filled with zeros, and then by adding 10 to each value.

Random Arrays

    Nd4j provides a few methods to generate INDArrays, where the contents are pseudo-random numbers.

    To generate uniform random numbers in the range 0 to 1, use Nd4j.rand(int nRows, int nCols) (for 2d arrays), or Nd4j.rand(int[]) (for 3 or more dimensions).

    Similarly, to generate Gaussian random numbers with mean zero and standard deviation 1, use Nd4j.randn(int nRows, int nCols) or Nd4j.randn(int[]).

    For repeatability (i.e., to set Nd4j's random number generator seed) you can use Nd4j.getRandom().setSeed(long)
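For example, resetting the seed makes two successive rand calls produce identical arrays (a minimal sketch; the seed and shape are arbitrary):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class RandomSketch {
    public static void main(String[] args) {
        Nd4j.getRandom().setSeed(12345);
        INDArray a = Nd4j.rand(3, 4);       // uniform values in [0, 1]

        Nd4j.getRandom().setSeed(12345);    // reset to the same seed
        INDArray b = Nd4j.rand(3, 4);

        System.out.println(a.equals(b));    // true: identical contents
    }
}
```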

Creating NDArrays from Java Arrays

    Nd4j provides convenience methods for the creation of arrays from Java float and double arrays.

    To create a 1d NDArray from a 1d Java array, use:

    • Row vector: Nd4j.create(float[]) or Nd4j.create(double[])

    • Column vector: Nd4j.create(float[],new int[]{length,1}) or Nd4j.create(double[],new int[]{length,1})

    For 2d arrays, use Nd4j.create(float[][]) or Nd4j.create(double[][]).

    For creating NDArrays from Java primitive arrays with 3 or more dimensions (double[][][] etc), one approach is to use the following:
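The snippet that belongs here appears to have been lost in extraction; a minimal sketch of the flatten-and-reshape approach (the data and shape below are purely illustrative):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class Create3dSketch {
    public static void main(String[] args) {
        // Flatten the nested Java array manually into C (row-major) order,
        // then create the NDArray from the flat data plus a shape.
        double[] flat = new double[]{1, 2, 3, 4, 5, 6, 7, 8};
        INDArray arr3d = Nd4j.create(flat, new int[]{2, 2, 2}, 'c');

        // In C order, index (1,0,1) maps to flat offset 1*4 + 0*2 + 1 = 5
        System.out.println(arr3d.getDouble(1, 0, 1)); // 6.0
    }
}
```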

Creating NDArrays from Other NDArrays

    There are three primary ways of creating arrays from other arrays:

    • Creating an exact copy of an existing NDArray using INDArray.dup()

    • Create the array as a subset of an existing NDArray

    • Combine a number of existing NDArrays to create a new NDArray

For the second case, you can use getRow(), get(), etc.; see the section on getting and setting parts of NDArrays, below, for details.

    Two methods for combining NDArrays are Nd4j.hstack(INDArray...) and Nd4j.vstack(INDArray...).

    hstack (horizontal stack) takes as argument a number of matrices that have the same number of rows, and stacks them horizontally to produce a new array. The input NDArrays can have a different number of columns, however.
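A minimal hstack sketch (the shapes are chosen just for illustration):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class HstackSketch {
    public static void main(String[] args) {
        INDArray ones = Nd4j.ones(2, 2);
        INDArray zeros = Nd4j.zeros(2, 3);

        // Same number of rows; columns may differ. Result shape: [2, 5]
        INDArray h = Nd4j.hstack(ones, zeros);
        System.out.println(h);
    }
}
```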


    vstack (vertical stack) is the vertical equivalent of hstack. The input arrays must have the same number of columns.
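A minimal vstack sketch (again with illustrative shapes):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class VstackSketch {
    public static void main(String[] args) {
        INDArray ones = Nd4j.ones(2, 3);
        INDArray zeros = Nd4j.zeros(1, 3);

        // Same number of columns; rows may differ. Result shape: [3, 3]
        INDArray v = Nd4j.vstack(ones, zeros);
        System.out.println(v);
    }
}
```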


Nd4j.concat combines arrays along a dimension.
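A small concat sketch, showing both dimensions for 2d inputs (shapes illustrative):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class ConcatSketch {
    public static void main(String[] args) {
        INDArray a = Nd4j.ones(2, 3);
        INDArray b = Nd4j.zeros(2, 3);

        INDArray alongRows = Nd4j.concat(0, a, b); // shape [4, 3]
        INDArray alongCols = Nd4j.concat(1, a, b); // shape [2, 6]

        System.out.println(alongRows);
        System.out.println(alongCols);
    }
}
```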


Nd4j.pad is used to pad an array.


    One other method that can occasionally be useful is Nd4j.diag(INDArray in). This method has two uses, depending on the argument in:

    • If in is a vector, diag outputs an NxN matrix with the diagonal equal to the array in (where N is the length of in)

    • If in is a NxN matrix, diag outputs a vector taken from the diagonal of in

Miscellaneous NDArray Creation Methods

To create an identity matrix of size N, you can use Nd4j.eye(N).

    To create a row vector with elements [a, a+1, a+2, ..., b] you can use the linspace command:

    Nd4j.linspace(a, b, b-a+1)

    Linspace can be combined with a reshape operation to get other shapes. For example, if you want a 2d NDArray with 5 rows and 5 columns, with values 1 to 25 inclusive, you can use the following:

    Nd4j.linspace(1,25,25).reshape(5,5)

Getting and Setting Individual Values

    For an INDArray, you can get or set values using the indexes of the element you want to get or set. For a rank N array (i.e., an array with N dimensions) you need N indices.

    Note: getting or setting values individually (for example, one at a time in a for loop) is generally a bad idea in terms of performance. When possible, try to use other INDArray methods that operate on a large number of elements at a time.

    To get values from a 2d array, you can use: INDArray.getDouble(int row, int column)

    For arrays of any dimensionality, you can use INDArray.getDouble(int...). For example, to get the value at index i,j,k use INDArray.getDouble(i,j,k)

    To set values, use one of the putScalar methods:

    • INDArray.putScalar(int[],double)

    • INDArray.putScalar(int[],float)

    • INDArray.putScalar(int[],int)

    Here, the int[] is the index, and the double/float/int is the value to be placed at that index.
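Putting getDouble and putScalar together (a small sketch; the indices and value are arbitrary):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class GetSetSketch {
    public static void main(String[] args) {
        INDArray arr = Nd4j.zeros(3, 4);

        arr.putScalar(new int[]{1, 2}, 5.0);     // set the element at row 1, column 2
        System.out.println(arr.getDouble(1, 2)); // 5.0
        System.out.println(arr.getDouble(0, 0)); // 0.0 - other elements unchanged
    }
}
```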

Some additional functionality that might be useful in certain circumstances is the NdIndexIterator class. The NdIndexIterator allows you to get the indexes in a defined order (specifically, the C-order traversal order: [0,0,0], [0,0,1], [0,0,2], ..., [0,1,0], ... etc for a rank 3 array).

    To iterate over the values in a 2d array, you can use:
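The iteration snippet appears to have been lost in extraction; a sketch using NdIndexIterator (summing all values of a 2x2 array; assumes next() returns an index array usable with getDouble):

```java
import org.nd4j.linalg.api.iter.NdIndexIterator;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class IterSketch {
    public static void main(String[] args) {
        INDArray arr = Nd4j.ones(2, 2);
        NdIndexIterator iter = new NdIndexIterator(2, 2);

        double sum = 0.0;
        while (iter.hasNext()) {
            long[] nextIndex = iter.next();  // [0,0], [0,1], [1,0], [1,1] in C order
            sum += arr.getDouble(nextIndex);
        }
        System.out.println(sum); // 4.0
    }
}
```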

Getting and Setting Parts of NDArrays

    getRow and putRow

    In order to get a single row from an INDArray, you can use INDArray.getRow(int). This will obviously return a row vector. Of note here is that this row is a view: changes to the returned row will impact the original array. This can be quite useful at times (for example: myArr.getRow(3).addi(1.0) to add 1.0 to the third row of a larger array); if you want a copy of a row, use getRow(int).dup().

Similarly, to get multiple rows, use INDArray.getRows(int...). This returns an array with the rows stacked; note however that this will be a copy (not a view) of the original rows, as a view is not possible here due to the way NDArrays are stored in memory.

    For setting a single row, you can use myArray.putRow(int rowIdx,INDArray row). This will set the rowIdxth row of myArray to the values contained in the INDArray row.

Sub-arrays: get and put

    Get:

A more powerful and general method is to use INDArray.get(NDArrayIndex...). This functionality allows you to get arbitrary sub-arrays based on certain indexes. This is perhaps best explained by some examples:

    To get a single row (and all columns), you can use:

    myArray.get(NDArrayIndex.point(rowIdx), NDArrayIndex.all())

    To get a range of rows (row a (inclusive) to row b (exclusive)) and all columns, you can use:

    myArray.get(NDArrayIndex.interval(a,b), NDArrayIndex.all())

    To get all rows and every second column, you can use:

    myArray.get(NDArrayIndex.all(),NDArrayIndex.interval(0,2,nCols))

Though the above examples are for 2d arrays only, the NDArrayIndex approach extends to 3 or more dimensions. For 3 dimensions, you would provide 3 INDArrayIndex objects instead of just two, as above.

    Note that the NDArrayIndex.interval(...), .all() and .point(int) methods always return views of the underlying arrays. Thus, changes to the arrays returned by .get() will be reflected in the original array.
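The three get examples above can be sketched against a concrete array (values chosen just for illustration):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.NDArrayIndex;

public class GetSketch {
    public static void main(String[] args) {
        // 3x4 array with values 1..12, laid out row by row
        INDArray arr = Nd4j.linspace(1, 12, 12).reshape(3, 4);

        INDArray row1 = arr.get(NDArrayIndex.point(1), NDArrayIndex.all());
        INDArray rows01 = arr.get(NDArrayIndex.interval(0, 2), NDArrayIndex.all());
        INDArray everySecondCol = arr.get(NDArrayIndex.all(), NDArrayIndex.interval(0, 2, 4));

        System.out.println(row1);           // values 5, 6, 7, 8
        System.out.println(rows01);         // rows 0 and 1
        System.out.println(everySecondCol); // columns 0 and 2
    }
}
```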

    Put:

    The same NDArrayIndex approach is also used to put elements to another array: in this case you use the INDArray.put(INDArrayIndex[], INDArray toPut) method. Clearly, the size of the NDArray toPut must match the size implied by the provided indexes.

    Also note that myArray.put(NDArrayIndex[],INDArray other) is functionally equivalent to doing myArray.get(INDArrayIndex...).assign(INDArray other). Again, this is because .get(INDArrayIndex...) returns a view of the underlying array, not a copy.

Tensor Along Dimension

    (Note: ND4J versions 0.4-rc3.8 and earlier returned slightly different results for tensor along dimension, as compared to current versions).

Tensor along dimension is a powerful technique, but can be a little hard to understand at first. The idea behind tensor along dimension (hereafter referred to as TAD) is to get a lower rank sub-array that is a view of the original array.

    The tensor along dimension method takes two arguments:

    • The index of the tensor to return (in the range of 0 to numTensors-1)

    • The dimensions (1 or more values) along which to execute the TAD operation

    The simplest case is a tensor along a single row or column of a 2d array. Consider the following diagram (where dimension 0 (rows) are indexed going down the page, and dimension 1 (columns) are indexed going across the page):

    Note that the output of the tensorAlongDimension call with one dimension is a row vector in all cases.

    To understand why we get this output: consider the first case in the above diagram. There, we are taking the 0th (first) tensor along dimension 0 (dimension 0 being rows); the values (1,5,2) are in a line as we move along dimension 0, hence the output. Similarly, the tensorAlongDimension(1,1) is the second (index=1) tensor along dimension 1; values (5,3,5) are in a line as we move along dimension 1.

The TAD operation can also be executed along multiple dimensions. For example, by specifying two dimensions to execute the TAD operation along, we can use it to get a 2d sub-array from a 3d (or 4d, or 5d...) array. Similarly, by specifying 3 dimensions, we can use it to get a 3d sub-array from a 4d (or higher) array.

    There are two things we need to know about the output, for the TAD operation to be useful.

First, we need to know the number of tensors that we can get, for a given set of dimensions. To determine this, we can use the "number of tensors along dimensions" method, INDArray.tensorssAlongDimension(int... dimensions). This method simply returns the number of tensors along the specified dimensions. In the examples above, we have:

    • myArray.tensorssAlongDimension(0) = 3

    • myArray.tensorssAlongDimension(1) = 3

    • myArray.tensorssAlongDimension(0,1) = 1

(In the last case, note that tensor along dimensions 0 and 1 would give us the same array out as the original array in - i.e., we get a 2d output from a 2d array).

    More generally, the number of tensors is given by the product of the remaining dimensions, and the shape of the tensors is given by the size of the specified dimensions in the original shape.

    Here's some examples:

    • For input shape [a,b,c], tensorssAlongDimension(0) gives b*c tensors, and tensorAlongDimension(i,0) returns tensors of shape [1,a].

    • For input shape [a,b,c], tensorssAlongDimension(1) gives a*c tensors, and tensorAlongDimension(i,1) returns tensors of shape [1,b].

• For input shape [a,b,c], tensorssAlongDimension(0,1) gives c tensors, and tensorAlongDimension(i,0,1) returns tensors of shape [a,b].

    • For input shape [a,b,c], tensorssAlongDimension(1,2) gives a tensors, and tensorAlongDimension(i,1,2) returns tensors of shape [b,c].

    • For input shape [a,b,c,d], tensorssAlongDimension(1,2) gives a*d tensors, and tensorAlongDimension(i,1,2) returns tensors of shape [b,c].

    • For input shape [a,b,c,d], tensorssAlongDimension(0,2,3) gives b tensors, and tensorAlongDimension(i,0,2,3) returns tensors of shape [a,c,d].
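To make the counting rule concrete, here is a minimal plain-Java sketch (deliberately not using the ND4J API itself; the class and method names are illustrative only) of how the number of tensors and the tensor shape follow from an input shape and a set of dimensions:

```java
import java.util.Arrays;

class TadCount {
    // Number of tensors along the given dimensions = product of the remaining dimensions
    static long numTensors(int[] shape, int... dims) {
        long n = 1;
        for (int d = 0; d < shape.length; d++) {
            boolean specified = false;
            for (int dim : dims) if (dim == d) specified = true;
            if (!specified) n *= shape[d];
        }
        return n;
    }

    // Shape of each tensor = sizes of the specified dimensions, in order
    static int[] tadShape(int[] shape, int... dims) {
        int[] out = new int[dims.length];
        for (int i = 0; i < dims.length; i++) out[i] = shape[dims[i]];
        return out;
    }

    public static void main(String[] args) {
        int[] shape = {2, 3, 4};                                    // [a,b,c] with a=2, b=3, c=4
        System.out.println(numTensors(shape, 0));                   // b*c = 12
        System.out.println(numTensors(shape, 1, 2));                // a = 2
        System.out.println(Arrays.toString(tadShape(shape, 1, 2))); // [3, 4]
    }
}
```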

Slice

    [This section: Forthcoming.]

Performing Operations on NDArrays

    Nd4J has the concept of ops (operations) for many things you might want to do with (or to) an INDArray. For example, ops are used to apply things like tanh operations, or add a scalar, or do element-wise operations.

    ND4J defines five types of operations:

    • Scalar

    • Transform

    • Accumulation

• Index Accumulation

    • Broadcast

    And two methods of executing each:

    • Directly on the entire INDArray, or

    • Along a dimension

    Before getting into the specifics of these operations, let's take a moment to consider the difference between in-place and copy operations.

    Many ops have both in-place and copy operations. Suppose we want to add two arrays. Nd4j defines two methods for this: INDArray.add(INDArray) and INDArray.addi(INDArray). The former (add) is a copy operation; the latter is an in-place operation - the i in addi means in-place. This convention (...i means in-place, no i means copy) holds for other ops that are accessible via the INDArray interface.

    Suppose we have two INDArrays x and y and we do INDArray z = x.add(y) or INDArray z = x.addi(y). The results of these operations are shown below.

Note that with the x.add(y) operation, the original array x is not modified. Comparatively, with the in-place version x.addi(y), the array x is modified. In both versions of the add operation, an INDArray is returned that contains the result. Note however that in the case of the addi operation, the result array is actually just the original array x.
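The copy-vs-in-place distinction can be illustrated with a small plain-Java analogue (using double[] in place of INDArray; the names here are illustrative, not the ND4J implementation):

```java
class AddVsAddi {
    // Copy op ("add"): returns a new array, leaving x untouched
    static double[] add(double[] x, double[] y) {
        double[] z = x.clone();
        for (int i = 0; i < z.length; i++) z[i] += y[i];
        return z;
    }

    // In-place op ("addi"): modifies x and returns the same array object
    static double[] addi(double[] x, double[] y) {
        for (int i = 0; i < x.length; i++) x[i] += y[i];
        return x;
    }

    public static void main(String[] args) {
        double[] x = {1, 2}, y = {10, 20};
        double[] z = add(x, y);
        System.out.println(x[0]);    // 1.0 - x unchanged by the copy op
        double[] z2 = addi(x, y);
        System.out.println(x[0]);    // 11.0 - x modified in place
        System.out.println(z2 == x); // true - addi returns the original array
    }
}
```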

Scalar Ops

Scalar ops are element-wise operations that also take a scalar (i.e., a number). Examples of scalar ops are add, max, multiply, set and divide operations.

    A number of the methods such as INDArray.addi(Number) and INDArray.divi(Number) actually execute scalar ops behind the scenes, so when available, it is more convenient to use these methods.

    To execute a scalar op more directly, you can use for example:

    Nd4j.getExecutioner().execAndReturn(new ScalarAdd(myArray,1.0))

    Note that myArray is modified by this operation. If this is not what you want, use myArray.dup().

    Unlike the remaining ops, scalar ops don't have a sensible interpretation of executing them along a dimension.

Transform Ops

    Transform ops are operations such as element-wise logarithm, cosine, tanh, rectified linear, etc. Other examples include add, subtract and copy operations. Transform ops are commonly used in an element-wise manner (such as tanh on each element), but this is not always the case - for example, softmax is typically executed along a dimension.

    To execute an element-wise tanh operation directly (on the full NDArray) you can use:

INDArray tanh = Nd4j.getExecutioner().execAndReturn(new Tanh(myArr))

    As with scalar ops mentioned above, transform operations using the above method are in-place operations: that is, the NDArray myArr is modified, and the returned array tanh is actually the same object as the input myArr. Again, you can use myArr.dup() if you want a copy.

The Transforms class also defines some convenience methods, such as: INDArray tanh = Transforms.tanh(INDArray in, boolean copy); This is equivalent to the method using Nd4j.getExecutioner() above.

Accumulation (Reduction) Ops

    When it comes to executing accumulations, there is a key difference between executing the accumulation on the entire NDArray, versus executing along a particular dimension (or dimensions). In the first case (executing on the entire array), only a single value is returned. In the second case (accumulating along a dimension) a new INDArray is returned.

    To get the sum of all values in the array:

    double sum = Nd4j.getExecutioner().execAndReturn(new Sum(myArray)).getFinalResult().doubleValue();

    or equivalently (and more conveniently)

    double sum = myArray.sumNumber().doubleValue();

    Accumulation ops can also be executed along a dimension. For example, to get the sum of all values in each column (in each column = along dimension 0, or "for values in each row"), you can use:

    INDArray sumOfColumns = Nd4j.getExecutioner().exec(new Sum(myArray),0);

    or equivalently,

    INDArray sumOfColumns = myArray.sum(0)

Suppose this was executed on a 3x3 input array. Visually, this sum operation along dimension 0 looks like:

    Note that here, the input has shape [3,3] (3 rows, 3 columns) and the output has shape [1,3] (i.e., our output is a row vector). Had we instead done the operation along dimension 1, we would get a column vector with shape [3,1], with values (12,13,11).

    Accumulations along dimensions also generalize to NDArrays with 3 or more dimensions.
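As a plain-Java illustration of what a sum accumulation along dimension 0 computes (one value per column, producing a row vector), without using the ND4J API:

```java
import java.util.Arrays;

class SumAlongDim {
    // Sum along dimension 0: accumulate down the rows, one result per column
    static double[] sumDim0(double[][] m) {
        double[] out = new double[m[0].length];
        for (double[] row : m)
            for (int j = 0; j < row.length; j++) out[j] += row[j];
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // Shape [3,3] in, shape [1,3] out - one sum per column
        System.out.println(Arrays.toString(sumDim0(m))); // [12.0, 15.0, 18.0]
    }
}
```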

Index Accumulation Ops

Index accumulation ops are very similar to accumulation ops. The difference is that they return an integer index, instead of a double value.

    Examples of index accumulation ops are IMax (argmax), IMin (argmin) and IAMax (argmax of absolute values).

    To get the index of the maximum value in the array:

    int idx = Nd4j.getExecutioner().execAndReturn(new IAMax(myArray)).getFinalResult();

    Index accumulation ops are often most useful when executed along a dimension. For example, to get the index of the maximum value in each column (in each column = along dimension 0), you can use:

    INDArray idxOfMaxInEachColumn = Nd4j.getExecutioner().exec(new IAMax(myArray),0);

Suppose this was executed on a 3x3 input array. Visually, this argmax/IAMax operation along dimension 0 looks like:

    As with the accumulation op described above, the output has shape [1,3]. Again, had we instead done the operation along dimension 1, we would get a column vector with shape [3,1], with values (1,0,2).
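A plain-Java sketch of argmax along dimension 0 (the matrix values here are made up for illustration, chosen to produce the (1,0,2) result described above; this is not the ND4J implementation):

```java
import java.util.Arrays;

class ArgMaxAlongDim {
    // Index of the maximum value in each column (argmax along dimension 0)
    static int[] argMaxDim0(double[][] m) {
        int[] out = new int[m[0].length];
        for (int j = 0; j < m[0].length; j++)
            for (int i = 1; i < m.length; i++)
                if (m[i][j] > m[out[j]][j]) out[j] = i;
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 9, 2},
                        {5, 3, 4},
                        {0, 6, 8}};
        // Column maxima are at rows 1, 0 and 2 respectively
        System.out.println(Arrays.toString(argMaxDim0(m))); // [1, 0, 2]
    }
}
```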

Broadcast and Vector Ops

    ND4J also defines broadcast and vector operations.

    Some of the more useful operations are vector operations, such as addRowVector and muliColumnVector.

    Consider for example the operation x.addRowVector(y) where x is a matrix and y is a row vector. In this case, the addRowVector operation adds the row vector y to each row of the matrix x, as shown below.

As with other ops, there are in-place and copy versions. There are also column versions of these operations, such as addColumnVector, which adds a column vector to each column of the original INDArray.
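The addRowVector broadcast can be sketched in plain Java as follows (copy semantics; the names are illustrative, not the ND4J implementation):

```java
import java.util.Arrays;

class AddRowVector {
    // Adds row vector y to each row of matrix x, returning a new matrix (copy version)
    static double[][] addRowVector(double[][] x, double[] y) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i].clone();
            for (int j = 0; j < y.length; j++) out[i][j] += y[j];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}};
        double[] y = {10, 20};
        // y is added to every row of x
        System.out.println(Arrays.deepToString(addRowVector(x, y))); // [[11.0, 22.0], [13.0, 24.0]]
    }
}
```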

Boolean Indexing: Selectively Apply Operations Based on a Condition

    [This section: Forthcoming.]

Workspaces

    Workspaces are a feature of ND4J used to improve performance, by means of more efficient memory allocation and management. Specifically, workspaces are designed for cyclical workloads - such as training neural networks - as they allow for off-heap memory reuse (instead of continually allocating and deallocating memory on each iteration of the loop). The net effect is improved performance and reduced memory use.

For more details on workspaces, see the following links:

    • Deeplearning4j Guide to Workspaces

    • Workspaces Examples

Workspaces: Scope Panic

Sometimes with workspaces, you may encounter a 'scope panic' exception.

    Understanding Scope Panic Exceptions

    In short: these exceptions mean that an INDArray that has been allocated in a workspace is being used incorrectly (for example, a bug or incorrect implementation of some method). This can occur for two reasons:

1. The INDArray has 'leaked out' of the workspace in which it was defined

    2. The INDArray is used within the correct workspace, but from a previous iteration

    In both cases, the underlying off-heap memory that the INDArray points to has been invalidated, and can no longer be used.

An example sequence of events leading to a workspace leak:

    1. Workspace W is opened

    2. INDArray X is allocated in workspace W

    3. Workspace W is closed, and hence the memory for X is no longer valid

    4. INDArray X is used in some operation, resulting in an exception

An example sequence of events leading to an outdated workspace pointer:

    1. Workspace W is opened (iteration 1)

    2. INDArray X is allocated in workspace W (iteration 1)

    3. Workspace W is closed (iteration 1)

    4. Workspace W is opened (iteration 2)

    5. INDArray X (from iteration 1) is used in some operation, resulting in an exception

    Workarounds and Fixes for Scope Panic Exceptions

    There are two basic solutions, depending on the cause.

First, if you have implemented some custom code (or are using workspaces manually), this usually indicates a bug in your code. Generally, you have three options:

    1. Detach the INDArray from all workspaces, using the INDArray.detach() method. The consequence is that the returned array is no longer associated with a workspace, and can be used freely within or outside of any workspace.

    2. Don't allocate the array in the workspace in the first place. You can temporarily 'turn off' a workspace using: try(MemoryWorkspace scopedOut = Nd4j.getWorkspaceManager().scopeOutOfWorkspaces()){ <your code here> }. The consequence is that any new arrays (created via Nd4j.create, for example) within the try block will not be associated with a workspace, and can be used outside of a workspace.

    3. Move/copy the array to a parent workspace, using one of the INDArray.leverage(), leverageTo(String) or migrate() methods. See the Javadoc of these methods for more details.

    Second, if you are using workspaces as part of Deeplearning4j and have not implemented any custom functionality (i.e., you have not written your own layer, data pipeline, etc), then (on the off-chance you run into this), this most likely indicates a bug in the underlying library, which usually should be reported via a Github issue. One possible workaround in the mean time is to disable workspaces using the following code:

    If the exception is due to an issue in the data pipeline, you can try wrapping your DataSetIterator or MultiDataSetIterator in an AsyncShieldDataSetIterator or AsyncShieldMultiDataSetIterator.

    For either cause, a final solution - if you are sure your code is correct - is to try disabling scope panic. Note that this is NOT recommended and can crash the JVM if a legitimate issue is present. To do this, use Nd4j.getExecutioner().setProfilingMode(OpExecutioner.ProfilingMode.DISABLED); before executing your code.

Advanced and Miscellaneous Topics

Setting the data type

ND4J currently allows INDArrays to be backed by either float or double-precision values. The default is single-precision (float). To globally set ND4J to use double precision for all arrays, you can use:

    Nd4j.setDataType(DataBuffer.Type.DOUBLE);

    Note that this should be done before using ND4J operations or creating arrays.

    Alternatively, you can set the property when launching the JVM:

    Reshaping

    [This section: Forthcoming.]

Flattening

Flattening is the process of taking one or more INDArrays and converting them into a single flat array (a row vector), given some traversal order of the arrays.

Nd4j provides the following methods for this: Nd4j.toFlattened(char order, INDArray... arrays) and Nd4j.toFlattened(char order, Collection<INDArray> arrays).

    Nd4j also provides overloaded toFlattened methods with the default ordering. The order argument must be 'c' or 'f', and defines the order in which values are taken from the arrays: c order results in the arrays being flattened using array indexes in an order like [0,0,0], [0,0,1], etc (for 3d arrays) whereas f order results in values being taken in order [0,0,0], [1,0,0], etc.
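The difference between 'c' and 'f' traversal order can be shown with a plain-Java sketch of flattening a 2d array (this illustrates the ordering only, not the Nd4j.toFlattened implementation):

```java
import java.util.Arrays;

class FlattenOrder {
    // 'c' order: last index varies fastest - [0,0],[0,1],... i.e. row by row
    static double[] flattenC(double[][] m) {
        int r = m.length, c = m[0].length;
        double[] out = new double[r * c];
        int k = 0;
        for (int i = 0; i < r; i++)
            for (int j = 0; j < c; j++) out[k++] = m[i][j];
        return out;
    }

    // 'f' order: first index varies fastest - [0,0],[1,0],... i.e. column by column
    static double[] flattenF(double[][] m) {
        int r = m.length, c = m[0].length;
        double[] out = new double[r * c];
        int k = 0;
        for (int j = 0; j < c; j++)
            for (int i = 0; i < r; i++) out[k++] = m[i][j];
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        System.out.println(Arrays.toString(flattenC(m))); // [1.0, 2.0, 3.0, 4.0]
        System.out.println(Arrays.toString(flattenF(m))); // [1.0, 3.0, 2.0, 4.0]
    }
}
```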

    Permute

    [This section: Forthcoming.]

    sortRows/sortColumns

    [This section: Forthcoming.]

    Directly accessing BLAS operations

    [This section: Forthcoming.]

    Serialization

Nd4j provides serialization of INDArrays in many formats. Here are some examples for binary and text serialization:

The nd4j-serde directory provides packages for Aeron, base64, camel-routes, gson, jackson and kryo.

Quick Reference: A Summary Overview of ND4J Methods

    This section lists the most commonly used operations in ND4J, in a summary form. More details on most of these can be found later in this page.

    In this section, assume that arr, arr1 etc are INDArrays.

    Creating NDArrays:

• Create a zero-initialized array: Nd4j.zeros(nRows, nCols) or Nd4j.zeros(int...)

    • Create a one-initialized array: Nd4j.ones(nRows, nCols)

    • Create a copy (duplicate) of an NDArray: arr.dup()

    • Create a row/column vector from a double[]: myRow = Nd4j.create(myDoubleArr), myCol = Nd4j.create(myDoubleArr, new int[]{10,1})

    • Create a 2d NDArray from a double[][]: Nd4j.create(double[][])

    • Stacking a set of arrays to make a larger array: Nd4j.hstack(INDArray...), Nd4j.vstack(INDArray...) for horizontal and vertical respectively

    • Uniform random NDArrays: Nd4j.rand(int,int), Nd4j.rand(int[]) etc

    • Normal(0,1) random NDArrays: Nd4j.randn(int,int), Nd4j.randn(int[])

    Determining the Size/Dimensions of an INDArray:

    The following methods are defined by the INDArray interface:

    • Get the number of dimensions: rank()

    • For 2d NDArrays only: rows(), columns()

• Size of the ith dimension: size(i)

    • Get the size of all dimensions, as an int[]: shape()

    • Determine the total number of elements in array: arr.length()

    • See also: isMatrix(), isVector(), isRowVector(), isColumnVector()

    Getting and Setting Single Values:

    • Get the value at row i, column j: arr.getDouble(i,j)

• Getting values from a 3+ dimensional array: arr.getDouble(int[])

    • Set a single value in an array: arr.putScalar(int[],double)

Scalar operations: Scalar operations take a double/float/int value and do an operation for each value in the array. As with element-wise operations, there are in-place and copy operations.

• Add a scalar: arr1.add(myDouble)

    • Subtract a scalar: arr1.sub(myDouble)

    • Multiply by a scalar: arr.mul(myDouble)

    • Divide by a scalar: arr.div(myDouble)

    • Reverse subtract (scalar - arr1): arr1.rsub(myDouble)

    • Reverse divide (scalar / arr1): arr1.rdiv(myDouble)

Element-Wise Operations: Note: there are copy (add, mul, etc) and in-place (addi, muli) operations. In the former case, arr1 is not modified; in the latter, arr1 is modified.

    • Adding: arr1.add(arr2)

    • Subtract: arr1.sub(arr2)

    • Multiply: arr1.mul(arr2)

    • Divide: arr1.div(arr2)

    • Assignment (set each value in arr1 to those in arr2): arr1.assign(arr2)

Reduction Operations (sum, etc): Note that these operations operate on the entire array. Call .doubleValue() to get a double out of the returned Number.

    • Sum of all elements: arr.sumNumber()

    • Product of all elements: arr.prodNumber()

    • L1 and L2 norms: arr.norm1() and arr.norm2()

    • Standard deviation of all elements: arr.stdNumber()

    Linear Algebra Operations:

    • Matrix multiplication: arr1.mmul(arr2)

    • Transpose a matrix: transpose()

• Get the diagonal of a matrix: Nd4j.diag(INDArray)

    • Matrix inverse: InvertMatrix.invert(INDArray, boolean)

Getting Parts of a Larger NDArray: Note: all of these methods return views of the underlying array.

    • Getting a row (2d NDArrays only): getRow(int)

    • Getting multiple rows as a matrix (2d only): getRows(int...)

• Setting a row (2d NDArrays only): putRow(int,INDArray)

    • Getting the first 3 rows, all columns: arr.get(NDArrayIndex.interval(0,3), NDArrayIndex.all())

    Element-Wise Transforms (Tanh, Sigmoid, Sin, Log etc):

• Using the Transforms class: Transforms.sin(INDArray), Transforms.log(INDArray), Transforms.sigmoid(INDArray) etc

• Directly (method 1): Nd4j.getExecutioner().execAndReturn(new Tanh(INDArray))

    • Directly (method 2): Nd4j.getExecutioner().execAndReturn(Nd4j.getOpFactory().createTransform("tanh", INDArray))

FAQ: Frequently Asked Questions

    Q: Does ND4J support sparse arrays?

At present: no. Support for sparse arrays is planned for the future.

Q: Is it possible to dynamically grow or shrink the size of an INDArray?

    In the current version of ND4J, this is not possible. We may add this functionality in the future, however.

    There are two possible work-arounds:

    1. Allocate a new array and do a copy (for example, a .put() operation)

    2. Initially, pre-allocate a larger than required NDArray, and then operate on a view of that array. Then, as you need a larger array, get a larger view on the original pre-allocated array.

    Performance Issues

    How to Debug Performance Issues

    This page is a how-to guide for debugging performance issues encountered when training neural networks with Deeplearning4j. Much of the information also applies to debugging performance issues encountered when using ND4J.

Deeplearning4j and ND4J provide excellent performance in most cases (utilizing optimized C++ code for all numerical operations as well as high-performance libraries such as NVIDIA cuDNN and Intel MKL). However, sometimes bottlenecks or misconfiguration issues may limit performance to well below the maximum. This page is intended to be a guide to help users identify the cause of poor performance, and provide steps to fix these issues.

    Performance issues may include:

    1. Poor CPU/GPU utilization



2. Slower than expected training or operation execution

    To start, here’s a summary of some possible causes of performance issues:

    1. Wrong ND4J backend is used (for example, CPU backend when GPU backend is expected)

    2. Not using cuDNN when using CUDA GPUs

    3. ETL (data loading) bottlenecks

    4. Garbage collection overheads

    5. Small batch sizes

    6. Multi-threaded use of MultiLayerNetwork/ComputationGraph for inference (not thread safe)

    7. Double precision floating point data type used when single precision should be used

    8. Not using workspaces for memory management (enabled by default)

    9. Poorly configured network

    10. Layer or operation is CPU-only

    11. CPU: Lack of hardware support for modern AVX etc extensions

    12. Other processes using CPU or GPU resources

    13. CPU: Lack of configuration of OMP_NUM_THREADS when using many models/threads simultaneously

    Finally, this page has a short section on Debugging Performance Issues with JVM Profiling

    Step 1: Check if correct backend is used

    ND4J (and by extension, Deeplearning4j) can perform computation on either the CPU or GPU. The device used for computation is determined by your project dependencies - you include nd4j-native-platform to use CPUs for computation or nd4j-cuda-x.x-platform to use GPUs for computation (where x.x is your CUDA version - such as 9.2, 10.0 etc).

    It is straightforward to check which backend is used. ND4J will log the backend upon initialization.

For CPU execution, you can expect output that looks something like:

    For CUDA execution, you would expect the output to look something like:

Pay attention to the Loaded [X] backend and Backend used: [X] messages to confirm that the correct backend is used. If the incorrect backend is being used, check your program dependencies to ensure the correct backend has been included.

    Step 2: Check for cuDNN

    If you are using CPUs only (nd4j-native backend) then you can skip to step 3 as cuDNN only applies when using NVIDIA GPUs (nd4j-cuda-x.x-platform dependency).

    cuDNN is NVIDIA’s library for accelerating neural network training on NVIDIA GPUs. Deeplearning4j can make use of cuDNN to accelerate a number of layers - including ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM. When training on GPUs, cuDNN should always be used if possible as it is usually much faster than the built-in layer implementations.

Instructions for configuring cuDNN can be found here. In summary, include the deeplearning4j-cuda-x.x dependency (where x.x is your CUDA version - such as 9.2 or 10.0). The network configuration does not need to change to utilize cuDNN - cuDNN simply needs to be available along with the deeplearning4j-cuda module.

How to determine if cuDNN is used or not

    Not all DL4J layer types are supported in cuDNN. DL4J layers with cuDNN support include ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM.

    To check if cuDNN is being used, the simplest approach is to look at the log output when running inference or training: If cuDNN is NOT available when you are using a layer that supports it, you will see a message such as:

    If cuDNN is available and was loaded successfully, no message will be logged.

    Alternatively, you can confirm that cuDNN is used by using the following code:

    Note that you will need to do at least one forward pass or fit call to initialize the cuDNN layer helper.

    If cuDNN is available and was loaded successfully, you will see the following printed:

    whereas if cuDNN is not available or could not be loaded successfully (you will get a warning or error logged also):

    Step 3: Check for ETL (Data Loading) Bottlenecks

    Neural network training requires data to be in memory before training can proceed. If the data is not loaded fast enough, the network will have to wait until data is available. DL4J uses asynchronous prefetch of data to improve performance by default. Under normal circumstances, this asynchronous prefetching means the network should never be waiting around for data (except on the very first iteration) - the next minibatch is loaded in another thread while training is proceeding in the main thread.

However, when data loading takes longer than the iteration time, data can be a bottleneck. For example, if a network takes 100ms to perform fitting on a single minibatch, but data loading takes 200ms, then we have a bottleneck: the network will have to wait 100ms per iteration (200ms loading - 100ms loading in parallel with training) before continuing to the next iteration. Conversely, if the network fit operation takes 100ms and data loading takes 50ms, then no data loading bottleneck will occur, as the 50ms loading time can be completed asynchronously within one iteration.
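The overlap arithmetic above can be captured in a tiny helper (a sketch, assuming simple single-buffer asynchronous prefetch; the names are illustrative):

```java
class EtlBottleneck {
    // With single-buffer async prefetch, the next minibatch loads while the current
    // one trains, so the per-iteration wait is only the loading time not hidden
    // behind training.
    static long waitPerIteration(long fitMs, long loadMs) {
        return Math.max(0, loadMs - fitMs);
    }

    public static void main(String[] args) {
        System.out.println(waitPerIteration(100, 200)); // 100 - data loading is a bottleneck
        System.out.println(waitPerIteration(100, 50));  // 0 - loading fully hidden by training
    }
}
```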

    How to check for ETL / data loading bottlenecks

    The way to identify ETL bottlenecks is simple: add PerformanceListener to your network, and train as normal. For example:

    When training, you will see output such as:

    The above output shows that there is no ETL bottleneck (i.e., ETL: 0 ms). However, if ETL time is greater than 0 consistently (after the first iteration), an ETL bottleneck is present.

    How to identify the cause of an ETL bottleneck

    There are a number of possible causes of ETL bottlenecks. These include (but are not limited to):

    • Slow hard drives

    • Network latency or throughput issues (when reading from remote or network storage)

    • Computationally intensive or inefficient ETL (especially for custom ETL pipelines)

One useful way to get more information is to perform profiling, as described in the profiling section later in this page. For custom ETL pipelines, adding logging for the various stages can help. Finally, another approach is to use a process of elimination - for example, measuring the latency and throughput of reading raw files from disk or from remote storage vs. measuring the time to actually process the data from its raw format.

    Step 4: Check for Garbage Collection Overhead

    Java uses garbage collection for management of on-heap memory (see this linkarrow-up-right for example for an explanation). Note that DL4J and ND4J use off-heap memory for storage of all INDArrays (see the memory page for details).

    Even though DL4J/ND4J array memory is off-heap, garbage collection can still cause performance issues.

    In summary:

    • Garbage collection will sometimes (temporarily and briefly) pause/stop application execution (“stop the world”)

    • These GC pauses slow down program execution

    • The overall performance impact of GC pauses depends on both the frequency of GC pauses, and the duration of GC pauses

    • The frequency is controllable (in part) by ND4J, using Nd4j.getMemoryManager().setAutoGcWindow(10000); and Nd4j.getMemoryManager().togglePeriodicGc(false);

    • Not every GC event is caused by or controlled by the above ND4J configuration.

    In our experience, garbage collection time depends strongly on the number of objects in the JVM heap memory. As a rough guide:

    • Less than 100,000 objects in heap memory: short GC events (usually not a performance problem)

    • 100,000-500,000 objects: GC overhead becomes noticeable, often in the 50-250ms range per full GC event

    • 500,000 or more objects: GC can be a bottleneck if performed frequently. Performance may still be good if GC events are infrequent (for example, every 10 seconds or less).

• 10 million or more objects: GC is a major bottleneck even if infrequently called, with each full GC taking multiple seconds

    How to configure ND4J garbage collection settings

In simple terms, there are two settings of note: the frequency of periodic garbage collection, set via Nd4j.getMemoryManager().setAutoGcWindow(10000) (here, at most once every 10 seconds), and whether periodic garbage collection is enabled at all, toggled via Nd4j.getMemoryManager().togglePeriodicGc(false).

If you suspect garbage collection overhead is having an impact on performance, try changing these settings. The main downside to reducing the frequency of (or entirely disabling) periodic GC arises when you are not using workspaces, though workspaces are enabled by default for all neural networks in Deeplearning4j.

    Side note: if you are using DL4J for training on Spark, setting these values on the master/driver will not impact the settings on the worker. Instead, see this guide.

    How to determine GC impact using PerformanceListener

NOTE: this feature was added after 1.0.0-beta3 and will be available in future releases.

    To determine the impact of garbage collection using PerformanceListener, you can use the following:

    This will report GC activity:

The garbage collection activity is reported for all available garbage collectors - the GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)] means that garbage collection was performed 2 times since the last PerformanceListener reporting, taking 1ms and 24ms total for the two GC algorithms, respectively.

    Keep in mind: PerformanceListener reports GC events every N iterations (as configured by the user). Thus, if PerformanceListener is configured to report statistics every 10 iterations, the garbage collection stats would be for the period of time corresponding to the last 10 iterations.

    How to determine GC impact using -verbose:gc

Other useful tools are the -verbose:gc, -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps command line options. For more details, see Oracle Command Line Optionsarrow-up-right and Oracle GC Portal Documentationarrow-up-right

    These options can be passed to the JVM on launch (when using java -jar or java -cp) or can be added to IDE launch options (for example, in IntelliJ: these should be placed in the “VM Options” field in Run/Debug Configurations - see Setting Configuration Optionsarrow-up-right)

    When these options are enabled, you will have information reported on each GC event, such as:

    This information can be used to determine the frequency, cause (System.gc() calls, allocation failure, etc) and duration of GC events.

    How to determine GC impact using a profiler

    An alternative approach is to use a profiler to collect garbage collection information.

    For example, YourKit Java Profilerarrow-up-right can be used to determine both the frequency and duration of garbage collection - see Garbage collection telemetryarrow-up-right for more details.

    Other toolsarrow-up-right, such as VisualVM can also be used to monitor GC activity.

    How to determine number (and type) of JVM heap objects using memory dumps

    If you determine that garbage collection is a problem, and suspect that this is due to the number of objects in memory, you can perform a heap dump.

    To perform a heap dump:

    • Step 1: Run your program

    • Step 2: While running, determine the process ID

      • One approach is to use jps:

        • For basic details, run jps on the command line. If jps is not on the system PATH, it can be found (on Windows) at C:\Program Files\Java\jdk<VERSION>\bin\jps.exe

        • For more details on each process, run jps -lv instead

      • Alternatively, you can use the top command on Linux or Task Manager (Windows) to find the PID (on Windows, the PID column may not be enabled by default)

    • Step 3: Create a heap dump using jmap -dump:format=b,file=file_name.hprof 123 where 123 is the process id (PID) to create the heap dump for

    A number of alternatives for generating heap dumps can be found herearrow-up-right.

    After a memory dump has been collected, it can be opened in tools such as YourKit profiler and VisualVM to determine the number, type and size of objects. With this information, you should be able to pinpoint the cause of the large number of objects and make changes to your code to reduce or eliminate the objects that are causing the garbage collection overhead.

    Step 5: Check Minibatch Size

    Another common cause of performance issues is a poorly chosen minibatch size. A minibatch is a number of examples used together for one step of inference and training. Minibatch sizes of 32 to 128 are commonly used, though smaller or larger are sometimes used.

    In summary:

    • If minibatch size is too small (for example, training or inference with 1 example at a time), poor hardware utilization and lower overall throughput is expected

    • If minibatch size is too large

      • Hardware utilization will usually be good

      • Iteration times will slow down

      • Memory utilization may be too high (leading to out-of-memory errors)

    For inference, avoid using a minibatch size of 1, as throughput will suffer. Unless there are strict latency requirements, use larger minibatch sizes, as this gives the best hardware utilization and hence throughput; this is especially important for GPUs.

    For training, you should never use a minibatch size of 1, as overall performance and hardware utilization will be reduced, and network convergence may also suffer. Start with a minibatch size of 32-128, if memory allows.
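    To connect minibatch size to the throughput numbers reported by tools such as PerformanceListener (samples/sec and batches/sec), the arithmetic is simply samples/sec = batches/sec x minibatch size. A small illustrative sketch (the per-iteration time here is hypothetical, not a measured value):

```java
// Illustrative arithmetic only: how minibatch size and iteration time
// combine into throughput (samples/sec).
public class MinibatchThroughput {
    public static void main(String[] args) {
        int batchSize = 64;
        double iterationTimeMs = 80.0;            // hypothetical time per fit step
        double batchesPerSec = 1000.0 / iterationTimeMs;
        double samplesPerSec = batchesPerSec * batchSize;
        System.out.printf("batches/sec=%.1f samples/sec=%.1f%n",
                batchesPerSec, samplesPerSec);    // 12.5 batches/sec, 800.0 samples/sec
    }
}
```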

    For serving predictions in multi-threaded applications (such as a web server), ParallelInferencearrow-up-right should be used.

    hashtag
    Step 6: Ensure you are not using a single MultiLayerNetwork/ComputationGraph for inference from multiple threads

    MultiLayerNetwork and ComputationGraph are not thread-safe, and should not be used from multiple threads. That said, most operations such as fit, output, etc. use synchronized blocks. While these synchronized methods avoid hard-to-understand exceptions (race conditions due to concurrent use), they limit throughput to a single thread (note, however, that native operation parallelism will still work as normal). In summary, using one network from multiple threads should be avoided, as it is not thread safe and can be a performance bottleneck.

    For inference from multiple threads, you should use one model per thread (as this avoids locks) or for serving predictions in multi-threaded applications (such as a web server), use ParallelInferencearrow-up-right.
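    If you do need one trained network per thread without ParallelInference, the usual pattern is a per-thread copy (with DL4J, the supplier would clone a trained MultiLayerNetwork). Since the pattern itself is plain Java, here is a minimal sketch using a stand-in model class:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the one-model-per-thread pattern. "Model" is a stand-in class;
// with DL4J the ThreadLocal supplier would return a clone of a trained network.
public class PerThreadModel {
    static class Model {
        static final AtomicInteger COPIES = new AtomicInteger();
        Model() { COPIES.incrementAndGet(); }
        double predict(double x) { return 2 * x; }   // placeholder "inference"
    }

    static final ThreadLocal<Model> LOCAL = ThreadLocal.withInitial(Model::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> LOCAL.get().predict(1.0); // each thread gets its own copy
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("Model copies created: " + Model.COPIES.get()); // 2
    }
}
```

Each thread pays the cost of one model copy, but no locking is needed at inference time.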

    hashtag
    Step 7: Check Data Types

    In 1.0.0-beta3 and earlier, ND4J has a global datatype setting that determines the datatype of all arrays. The default value is 32-bit floating point. The data type can be set using, for example, Nd4j.setDataType(DataBuffer.Type.FLOAT).

    For best performance, this value should be left at its default. If 64-bit floating point (double precision) is used instead, performance can be significantly reduced, especially on GPUs - most consumer NVIDIA GPUs have very poor double precision (and half precision/FP16) performance. On Tesla series cards, double precision performance is usually much better than on consumer (GeForce) cards, though it is still usually half or less of the single precision performance. Wikipedia has a summary of the single and double precision performance of NVIDIA GPUs herearrow-up-right.

    Performance on CPUs can also be reduced for double precision due to the additional memory bandwidth requirements compared to single precision.
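    The memory side of the datatype choice is easy to estimate: FP32 uses 4 bytes per element and FP64 uses 8, so double precision doubles array memory. For example, a 100-million-element array:

```java
// Illustrative arithmetic: memory footprint of one array at FP32 vs FP64.
public class PrecisionMemory {
    public static void main(String[] args) {
        long elements = 100_000_000L;
        long floatBytes  = elements * 4;  // FP32: 4 bytes per element
        long doubleBytes = elements * 8;  // FP64: 8 bytes per element
        System.out.println("float : " + floatBytes  / 1_000_000 + " MB"); // 400 MB
        System.out.println("double: " + doubleBytes / 1_000_000 + " MB"); // 800 MB
    }
}
```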

    You can check the data type setting using:

    hashtag
    Step 8: Check workspace configuration for memory management (enabled by default)

    For details on workspaces, see the workspaces page.

    In summary, workspaces are enabled by default for all Deeplearning4j networks, and enabling them improves performance and reduces memory requirements. There are very few reasons to disable workspaces.

    You can check that workspaces are enabled for your MultiLayerNetwork using:

    or for a ComputationGraph using:

    You should see ENABLED output for both training and inference. To change the workspace configuration, use the setter methods, for example: net.getLayerWiseConfigurations().setTrainingWorkspaceMode(WorkspaceMode.ENABLED);

    hashtag
    Step 9: Check for a badly configured network or network with layer bottlenecks

    Another possible cause (especially for newer users) is a poorly designed network. A network may be poorly designed if:

    • It has too many layers. A rough guideline:

      • More than about 100 layers for a CNN may be too many

      • More than about 10 layers for a RNN/LSTM network may be too many

      • More than about 20 feed-forward layers may be too many for a MLP

    • The input/activations are too large

      • For CNNs, inputs in the range of 224x224 (for image classification) to 600x600 (for object detection and segmentation) are typical. Image sizes toward the upper end of this range are already computationally demanding, and much larger than this should be considered too large in most cases.

      • For RNNs, the sequence length matters. If you are using sequences longer than a few hundred steps, you should use truncated backpropagation through time if possible.

    • The output number of classes is too large

      • Classification with more than about 10,000 classes can become a performance bottleneck with standard softmax output layers

    • The layers are too large

      • For CNNs, most layers have kernel sizes in the range 2x2 to 7x7, with channels equal to 32 to 1024 (with larger number of channels appearing later in the network). Much larger than this may cause a performance bottleneck.

      • For MLPs, most layers have at most 2048 units/neurons (often much smaller). Much larger than this may be too large.

    • The network has too many parameters

      • This is usually a consequence of the other issues already mentioned - too many layers, too large input, too many output classes

      • For comparison, less than 1 million parameters would be considered small, and more than about 100 million parameters would be considered very large.

    Note that these are guidelines only, and some reasonable networks may exceed the numbers specified here. Some networks can become very large, such as those commonly used for ImageNet classification or object detection. However, in these cases, the network is usually carefully designed to provide a good tradeoff between accuracy and computation time.

    If your network architecture is significantly outside of the guidelines specified here, you may want to reconsider the design to improve performance.
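    As a quick sanity check on parameter counts (the exact count for a network is available via MultiLayerNetwork/ComputationGraph.numParams()), recall that a single fully connected layer alone contributes nIn*nOut weights plus nOut biases:

```java
// Illustrative arithmetic: parameter count of one dense (fully connected) layer.
public class DenseParamCount {
    public static void main(String[] args) {
        long nIn = 2048, nOut = 2048;
        long params = nIn * nOut + nOut;  // weights + biases
        System.out.println(params);       // 4196352 - over 4M params for one layer
    }
}
```

This is why a handful of very wide layers can push a network past the "large" threshold on their own.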

    hashtag
    Step 10: Check for CPU-only ops (when using GPUs)

    If you are using CPUs only (nd4j-native backend), you can skip this step, as it only applies when using the GPU (nd4j-cuda) backend.

    As of 1.0.0-beta3, a handful of recently added operations do not yet have GPU implementations. Thus, when these layers are used in a network, they will execute on CPU only, irrespective of the nd4j backend used. GPU support for these layers will be added in an upcoming release.

    The layers without GPU support as of 1.0.0-beta3 include:

    • Convolution3D

    • Upsampling1D/2D/3D

    • Deconvolution2D

    • LocallyConnected1D/2D

    • SpaceToBatch

    • SpaceToDepth

    Unfortunately, there is no workaround for now; these layers will execute on CPU until their GPU implementations are completed.

    hashtag
    Step 11: Check CPU support for hardware extensions (AVX etc)

    If you are running on a GPU, this section does not apply.

    When running on older CPUs or those that lack modern AVX extensions such as AVX2 and AVX512, performance will be reduced compared to running on CPUs with these features. Though there is not much you can do about the lack of such features, it is worth knowing about if you are comparing performance between different CPU models.

    In summary, CPU models with AVX2 support will perform better than those without it; similarly, AVX512 is an improvement over AVX2.

    For more details on AVX, see the Wikipedia AVX articlearrow-up-right

    hashtag
    Step 12: Check other processes using CPU or GPU resources

    Another obvious cause of performance issues is other processes using CPU or GPU resources.

    For CPUs, it is straightforward to see whether other processes are using resources, using tools such as top (Linux) or Task Manager (Windows).

    For NVIDIA CUDA GPUs, nvidia-smi can be used. nvidia-smi is usually installed with the NVIDIA display drivers, and (when run) shows the overall GPU and memory utilization, as well as the GPU utilization of programs running on the system.

    On Linux, this is usually on the system path by default. On Windows, it may be found at C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi

    hashtag
    Step 13: Check OMP_NUM_THREADS performing concurrent inference using CPU in multiple threads simultaneously

    If you are using GPUs (nd4j-cuda backend), you can skip this section.

    One issue to be aware of when running multiple DL4J networks (or ND4J operations generally) concurrently in multiple threads is the OpenMP thread count setting. In summary, ND4J uses OpenMP parallelism at the C++ level to increase operation performance. By default, ND4J will use a number of threads equal to the number of physical CPU cores (not logical cores), as this gives optimal performance for a single network.

    However, when multiple networks (or ND4J operations) run concurrently in separate threads, each will try to use all physical cores, and the threads will contend for the same CPU resources. This also applies if the CPU resources are shared with other computationally demanding processes.

    In either case, you may see better overall throughput by reducing the number of OpenMP threads by setting the OMP_NUM_THREADS environment variable - see ND4JEnvironmentVarsarrow-up-right for details.

    One reason for reducing OMP_NUM_THREADS improving overall performance is due to reduced cache thrashingarrow-up-right.
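    Note that the JVM reports logical processors, which (with hyper-threading enabled) can be double the physical core count that ND4J defaults to. A quick way to see the JVM's view of the machine:

```java
// Logical processors visible to the JVM. With hyper-threading this may be
// 2x the physical core count that ND4J uses for its OpenMP default.
public class CoreCount {
    public static void main(String[] args) {
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logical);
    }
}
```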

    hashtag
    Debugging Performance Issues with JVM Profiling

    Profiling is a process whereby you can trace how long each method in your code takes to execute, to identify and debug performance bottlenecks.

    A full guide to profiling is beyond the scope of this page, but the summary is that you can trace how long each method takes to execute (and where it is being called from) using a profiling tool. This information can then be used to identify bottlenecks (and their causes) in your program.

    hashtag
    How to Perform Profiling

    Multiple options are available for performing profiling locally. We suggest using either YourKit Java Profilerarrow-up-right or VisualVMarrow-up-right for profiling.

    The YourKit profiling documentation is quite good. To perform profiling with YourKit:

    • Install and start YourKit Profiler

    • Start your application with the profiler enabled. For details, see Running applications with the profilerarrow-up-right and Local profilingarrow-up-right

      • Note that IDE integrations are available - see IDE integrationarrow-up-right

    • Collect a snapshot and analyze

    Note that YourKit provides multiple different types of profiling: Sampling, tracing, and call counting. Each type of profiling has different pros and cons, such as accuracy vs. overhead. For more details, see Sampling, tracing, call countingarrow-up-right

    VisualVM also supports profiling - see the Profiling Applications section of the VisualVM documentationarrow-up-right for more details.

    hashtag
    Profiling on Spark

    When debugging performance issues for Spark training or inference jobs, it can often be useful to perform profiling here also.

    One approach that we have used internally is to combine manual profiling settings (-agentpath JVM argument) with spark-submit arguments for YourKit profiler.

    To perform profiling in this manner, 5 steps are required:

    1. Download YourKit profiler to a location on each worker (must be the same location on each worker) and (optionally) the driver

    2. [Optional] Copy the profiling configuration onto each worker (must be the same location on each worker)

    3. Create a local output directory for storing the profiling result files on each worker

    4. Launch the Spark job with the appropriate configuration (see example below)

    5. The snapshots will be saved when the Spark job completes (or is cancelled) to the specified directories.

    For example, to perform tracing on both the driver and the workers,

    The configuration (tracing_settings_path) is optional. A sample tracing settings file is provided below:

    CNN

    hashtag
    Operation classes

    hashtag
    avgPooling2d

    2D Convolution layer operation - average pooling 2d

    • input (NUMERIC) - the input to average pooling 2d operation - 4d CNN (image) activations in NCHW format (shape [minibatch, channels, height, width]) or NHWC format (shape [minibatch, height, width, channels])

    • Pooling2DConfig - see
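    A minimal usage sketch, assuming a recent release where SameDiff exposes the CNN ops via the sd.cnn() namespace; the Pooling2DConfig builder fields shown mirror the configuration class documented below, but treat the exact builder method names as approximate for your version:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ops.impl.layers.convolution.config.Pooling2DConfig;

// Sketch only - requires an ND4J backend (e.g. nd4j-native) on the classpath.
public class AvgPoolExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // NCHW placeholder: [minibatch, channels, height, width]
        SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 3, 32, 32);
        Pooling2DConfig cfg = Pooling2DConfig.builder()
                .kH(2).kW(2)   // 2x2 kernel
                .sH(2).sW(2)   // stride 2
                .build();
        SDVariable pooled = sd.cnn().avgPooling2d(in, cfg);
        System.out.println(pooled.name());
    }
}
```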

    hashtag
    avgPooling3d

    3D convolution layer operation - average pooling 3d

    • input (NUMERIC) - the input to average pooling 3d operation - 5d activations in NCDHW format (shape [minibatch, channels, depth, height, width]) or NDHWC format (shape [minibatch, depth, height, width, channels])

    • Pooling3DConfig - see

    hashtag
    batchToSpace

    Convolution 2d layer batch to space operation on 4d input.

    Reduces the input batch dimension by rearranging data into larger spatial dimensions

    • x (NUMERIC) - Input variable. 4d input

    • blocks - Block size, in the height/width dimension (Size: Exactly(count=2))

    • croppingTop - (Size: Exactly(count=2))

    hashtag
    col2Im

    col2im operation for use in 2D convolution operations. Outputs a 4d array with shape

    [minibatch, inputChannels, height, width]

    • in (NUMERIC) - Input - rank 6 input with shape [minibatch, inputChannels, kernelHeight, kernelWidth, outputHeight, outputWidth]

    • Conv2DConfig - see

    hashtag
    conv1d

    Conv1d operation.

    • input (NUMERIC) - the inputs to conv1d

    • weights (NUMERIC) - weights for conv1d op - rank 3 array with shape [kernelSize, inputChannels, outputChannels]

    • bias (NUMERIC) - bias for conv1d op - rank 1 array with shape [outputChannels]. May be null.

    hashtag
    conv2d

    2D Convolution operation with optional bias

    • layerInput (NUMERIC) - the input to the conv2d operation - 4d CNN (image) activations in NCHW format

    • weights (NUMERIC) - Weights for the convolution operation. 4 dimensions with format [kernelHeight, kernelWidth, inputChannels, outputChannels]

    • bias (NUMERIC) - Optional 1D bias array with shape [outputChannels]. May be null.

    hashtag
    conv3d

    Convolution 3D operation with optional bias

    • input (NUMERIC) - the input to the conv3d operation - 5d activations in NCDHW format (shape [minibatch, channels, depth, height, width]) or NDHWC format (shape [minibatch, depth, height, width, channels])

    • weights (NUMERIC) - Weights for conv3d. Rank 5 with shape [kernelDepth, kernelHeight, kernelWidth, inputChannels, outputChannels].

    • bias (NUMERIC) - Optional 1D bias array with shape [outputChannels]. May be null.

    hashtag
    deconv2d

    2D deconvolution operation with optional bias

    • layerInput (NUMERIC) - the input to deconvolution 2d operation - 4d CNN (image) activations in NCHW format (shape [minibatch, channels, height, width]) or NHWC format (shape [minibatch, height, width, channels])

    • weights (NUMERIC) - Weights for the 2d deconvolution operation. 4 dimensions with format [inputChannels, outputChannels, kernelHeight, kernelWidth]

    • bias (NUMERIC) - Optional 1D bias array with shape [outputChannels]. May be null.

    hashtag
    deconv3d

    3D CNN deconvolution operation with or without optional bias

    • input (NUMERIC) - Input array - shape [bS, iD, iH, iW, iC] (NDHWC) or [bS, iC, iD, iH, iW] (NCDHW)

    • weights (NUMERIC) - Weights array - shape [kD, kH, kW, oC, iC]

    • bias (NUMERIC) - Bias array - optional, may be null. If non-null, must have shape [outputChannels]

    hashtag
    depthToSpace

    Convolution 2d layer depth to space operation on 4d input. Reduces the input channels dimension by rearranging data into larger spatial dimensions. Example: if input has shape [mb, 8, 2, 2] and block size is 2, then output shape is [mb, 8/(2*2), 2*2, 2*2] = [mb, 2, 4, 4]

    • x (NUMERIC) - the input to the depth to space operation - 4d activations in NCHW format (shape [minibatch, channels, height, width]) or NHWC format (shape [minibatch, height, width, channels])

    • blockSize - Block size, in the height/width dimension

    • dataFormat - Data format: "NCHW" or "NHWC"
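    The shape arithmetic above generalizes as [mb, C, H, W] -> [mb, C/(b*b), H*b, W*b] for block size b; a plain-Java sketch of it (no ND4J required):

```java
import java.util.Arrays;

// Shape arithmetic for depthToSpace with block size b on NCHW input:
// [mb, C, H, W] -> [mb, C/(b*b), H*b, W*b]
public class DepthToSpaceShape {
    static long[] depthToSpace(long[] nchw, long b) {
        return new long[]{nchw[0], nchw[1] / (b * b), nchw[2] * b, nchw[3] * b};
    }

    public static void main(String[] args) {
        long[] out = depthToSpace(new long[]{1, 8, 2, 2}, 2);
        System.out.println(Arrays.toString(out)); // [1, 2, 4, 4]
    }
}
```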

    hashtag
    depthWiseConv2d

    Depth-wise 2D convolution operation with optional bias

    • layerInput (NUMERIC) - the input to the depth-wise conv2d operation - 4d CNN (image) activations in NCHW format

    • depthWeights (NUMERIC) - Depth-wise conv2d weights. 4 dimensions with format [kernelHeight, kernelWidth, inputChannels, depthMultiplier]

    • bias (NUMERIC) - Optional 1D bias array with shape [outputChannels]. May be null.

    hashtag
    dilation2D

    TODO doc string

    • df (NUMERIC) -

    • weights (NUMERIC) - df

    • strides - weights (Size: Exactly(count=2))

    hashtag
    extractImagePatches

    Extract image patches

    • input (NUMERIC) - Input array. Must be rank 4, with shape [minibatch, height, width, channels]

    • kH - Kernel height

    • kW - Kernel width

    hashtag
    im2Col

    im2col operation for use in 2D convolution operations. Outputs a 6d array with shape

    [minibatch, inputChannels, kernelHeight, kernelWidth, outputHeight, outputWidth]

    • in (NUMERIC) - Input - rank 4 input with shape [minibatch, inputChannels, height, width]

    • Conv2DConfig - see

    hashtag
    localResponseNormalization

    2D convolution layer operation - local response normalization

    • input (NUMERIC) - the inputs to lrn

    • LocalResponseNormalizationConfig - see

    hashtag
    maxPoolWithArgmax

    2D Convolution layer operation - Max pooling on the input and outputs both max values and indices

    • input (NUMERIC) - the input to max pooling 2d operation - 4d CNN (image) activations in NCHW format (shape [minibatch, channels, height, width]) or NHWC format (shape [minibatch, height, width, channels])

    • Pooling2DConfig - see

    hashtag
    maxPooling2d

    2D Convolution layer operation - max pooling 2d

    • input (NUMERIC) - the input to max pooling 2d operation - 4d CNN (image) activations in NCHW format (shape [minibatch, channels, height, width]) or NHWC format (shape [minibatch, height, width, channels])

    • Pooling2DConfig - see

    hashtag
    maxPooling3d

    3D convolution layer operation - max pooling 3d operation.

    • input (NUMERIC) - the input to the max pooling 3d operation - 5d activations in NCDHW format (shape [minibatch, channels, depth, height, width]) or NDHWC format (shape [minibatch, depth, height, width, channels])

    • Pooling3DConfig - see

    hashtag
    separableConv2d

    Separable 2D convolution operation with optional bias

    • layerInput (NUMERIC) - the input to the separable conv2d operation - 4d CNN (image) activations in NCHW format (shape [minibatch, channels, height, width]) or NHWC format (shape [minibatch, height, width, channels])

    • depthWeights (NUMERIC) - Separable conv2d depth weights. 4 dimensions with format [kernelHeight, kernelWidth, inputChannels, depthMultiplier]

    • pointWeights (NUMERIC) - Point weights, rank 4 with format [1, 1, inputChannels*depthMultiplier, outputChannels]. May be null

    hashtag
    spaceToBatch

    Convolution 2d layer space to batch operation on 4d input.

    Increases input batch dimension by rearranging data from spatial dimensions into batch dimension

    • x (NUMERIC) - Input variable. 4d input

    • blocks - Block size, in the height/width dimension (Size: Exactly(count=2))

    • paddingTop - Optional 2d int[] array for padding the result: values [[pad top, pad bottom], [pad left, pad right]] (Size: Exactly(count=2))

    hashtag
    spaceToDepth

    Convolution 2d layer space to depth operation on 4d input. Increases the input channels dimension (and reduces the spatial dimensions) by rearranging data into a larger channels dimension. Example: if input has shape [mb, 2, 4, 4] and block size is 2, then output shape is [mb, 2*2*2, 4/2, 4/2] = [mb, 8, 2, 2]

    • x (NUMERIC) - the input to the space to depth operation - 4d activations in NCHW format (shape [minibatch, channels, height, width]) or NHWC format (shape [minibatch, height, width, channels])

    • blockSize - Block size, in the height/width dimension

    • dataFormat - Data format: "NCHW" or "NHWC"

    hashtag
    upsampling2d

    Upsampling layer for 2D inputs.

    scale is used for both height and width dimensions.

    • input (NUMERIC) - Input in NCHW format

    • scale - The scale for both height and width dimensions.

    hashtag
    upsampling2d

    2D Convolution layer operation - Upsampling 2d

    • input (NUMERIC) - Input in NCHW format

    • scaleH - Scale to upsample in height dimension

    • scaleW - Scale to upsample in width dimension

    hashtag
    upsampling3d

    3D Convolution layer operation - Upsampling 3d

    • input (NUMERIC) - 5d input, in NCDHW or NDHWC format (see ncdhw)

    • ncdhw - If true: input is in NCDHW (minibatch, channels, depth, height, width) format. False: NDHWC format

    • scaleD - Scale to upsample in depth dimension

    hashtag
    Configuration Classes

    hashtag
    Conv1DConfig

    • k (LONG) - Kernel - default = -1

    • s (LONG) - stride - default = 1

    • p (LONG) - padding - default = 0

    Used in these ops:

    hashtag
    Conv2DConfig

    • kH (LONG) - Kernel height - default = -1

    • kW (LONG) - Kernel width - default = -1

    • sH (LONG) - Stride along height dimension - default = 1

    Used in these ops:

    hashtag
    Conv3DConfig

    • kD (LONG) - Kernel depth - default = -1

    • kW (LONG) - Kernel width - default = -1

    • kH (LONG) - Kernel height - default = -1

    Used in these ops:

    hashtag
    DeConv2DConfig

    • kH (LONG) - Kernel height - default = -1

    • kW (LONG) - Kernel width - default = -1

    • sH (LONG) - Stride along height dimension - default = 1

    Used in these ops:

    hashtag
    DeConv3DConfig

    • kD (LONG) - Kernel depth - default = -1

    • kW (LONG) - Kernel width - default = -1

    • kH (LONG) - Kernel height - default = -1

    Used in these ops:

    hashtag
    Pooling2DConfig

    • kH (LONG) - Kernel height - default = -1

    • kW (LONG) - Kernel width - default = -1

    • sH (LONG) - Stride along height dimension - default = 1

    Used in these ops:

    hashtag
    Pooling3DConfig

    • kD (LONG) - Kernel depth - default = -1

    • kW (LONG) - Kernel width - default = -1

    • kH (LONG) - Kernel height - default = -1

    Used in these ops:

    hashtag
    LocalResponseNormalizationConfig

    • alpha (NUMERIC) - alpha - default = 1

    • beta (NUMERIC) - beta - default = 0.5

    • bias (NUMERIC) - bias - default = 1

    Used in these ops:

    double[] flat = ArrayUtil.flattenDoubleArray(myDoubleArray);
    int[] shape = ...;    //Array shape here
    INDArray myArr = Nd4j.create(flat,shape,'c');
    int nRows = 2;
    int nColumns = 2;
    // Create INDArray of zeros
    INDArray zeros = Nd4j.zeros(nRows, nColumns);
    // Create one of all ones
    INDArray ones = Nd4j.ones(nRows, nColumns);
    //hstack
    INDArray hstack = Nd4j.hstack(ones,zeros);
    System.out.println("### HSTACK ####");
    System.out.println(hstack);
    ### HSTACK ####
    [[1.00, 1.00, 0.00, 0.00],
    [1.00, 1.00, 0.00, 0.00]]
    int nRows = 2;
    int nColumns = 2;
    // Create INDArray of zeros
    INDArray zeros = Nd4j.zeros(nRows, nColumns);
    // Create one of all ones
    INDArray ones = Nd4j.ones(nRows, nColumns);
    //vstack
    INDArray vstack = Nd4j.vstack(ones,zeros);
    System.out.println("### VSTACK ####");
    System.out.println(vstack);
    ### VSTACK ####
    [[1.00, 1.00],
     [1.00, 1.00],
     [0.00, 0.00],
     [0.00, 0.00]]
    int nRows = 2;
    int nColumns = 2;
    //INDArray of zeros
    INDArray zeros = Nd4j.zeros(nRows, nColumns);
    // Create one of all ones
    INDArray ones = Nd4j.ones(nRows, nColumns);
    // Concat on dimension 0
    INDArray combined = Nd4j.concat(0,zeros,ones);
    System.out.println("### COMBINED dimension 0####");
    System.out.println(combined);
    //Concat on dimension 1
    INDArray combined2 = Nd4j.concat(1,zeros,ones);
    System.out.println("### COMBINED dimension 1 ####");
    System.out.println(combined2);
    ### COMBINED dimension 0####
    [[0.00, 0.00],
     [0.00, 0.00],
     [1.00, 1.00],
     [1.00, 1.00]]
    ### COMBINED dimension 1 ####
    [[0.00, 0.00, 1.00, 1.00],
     [0.00, 0.00, 1.00, 1.00]]
    int nRows = 2;
    int nColumns = 2;
    // Create INDArray of all ones
    INDArray ones = Nd4j.ones(nRows, nColumns);
    // pad the INDArray
    INDArray padded = Nd4j.pad(ones, new int[]{1,1}, Nd4j.PadMode.CONSTANT );
    System.out.println("### Padded ####");
    System.out.println(padded);
    ### Padded ####
    [[0.00, 0.00, 0.00, 0.00],
     [0.00, 1.00, 1.00, 0.00],
     [0.00, 1.00, 1.00, 0.00],
     [0.00, 0.00, 0.00, 0.00]]
    NdIndexIterator iter = new NdIndexIterator(nRows, nCols);
    while (iter.hasNext()) {
        int[] nextIndex = iter.next();
        double nextVal = myArray.getDouble(nextIndex);
        //do something with the value
    }
    org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses leaked workspace pointer from workspace [LOOP_EXTERNAL]
    For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic
    org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses outdated workspace pointer from workspace [LOOP_EXTERNAL]
    For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic
    .trainingWorkspaceMode(WorkspaceMode.NONE)
    .inferenceWorkspaceMode(WorkspaceMode.NONE)
    Nd4j.setDataType(DataBuffer.Type.DOUBLE);
    -Ddtype=double
    Nd4j.toFlattened(char order, INDArray... arrays)
    Nd4j.toFlattened(char order, Collection<INDArray>)
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.serde.binary.BinarySerde;
    
    import java.io.*;
    import java.nio.ByteBuffer;
    
    INDArray arrWrite = Nd4j.linspace(1,10,10);
    INDArray arrRead;
    
    //1. Binary format
    //   Close the streams manually or use try with resources.
    try (DataOutputStream sWrite = new DataOutputStream(new FileOutputStream(new File("tmp.bin")))) {
        Nd4j.write(arrWrite, sWrite);
        }
    
    try (DataInputStream sRead = new DataInputStream(new FileInputStream(new File("tmp.bin")))) {
        arrRead = Nd4j.read(sRead);
        }
    
    //2. Binary format using java.nio.ByteBuffer;
    ByteBuffer buffer = BinarySerde.toByteBuffer(arrWrite);
    arrRead = BinarySerde.toArray(buffer);
    
    //3. Text format
    Nd4j.writeTxt(arrWrite, "tmp.txt");
    arrRead = Nd4j.readTxt("tmp.txt");
    
    // To read csv format:
    // The writeNumpy method has been deprecated.
    arrRead = Nd4j.readNumpy("tmp.csv", ", ");
    o.n.l.f.Nd4jBackend - Loaded [CpuBackend] backend
    o.n.n.NativeOpsHolder - Number of threads used for NativeOps: 8
    o.n.n.Nd4jBlas - Number of threads used for BLAS: 8
    o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Windows 10]
    o.n.l.a.o.e.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
    o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [MKL]
    13:08:09,042 INFO  ~ Loaded [JCublasBackend] backend
    13:08:13,061 INFO  ~ Number of threads used for NativeOps: 32
    13:08:14,265 INFO  ~ Number of threads used for BLAS: 0
    13:08:14,274 INFO  ~ Backend used: [CUDA]; OS: [Windows 10]
    13:08:14,274 INFO  ~ Cores: [16]; Memory: [7.1GB];
    13:08:14,274 INFO  ~ Blas vendor: [CUBLAS]
    13:08:14,274 INFO  ~ Device Name: [TITAN X (Pascal)]; CC: [6.1]; Total/free memory: [12884901888]
    o.d.n.l.c.ConvolutionLayer - cuDNN not found: use cuDNN for better GPU performance by including the deeplearning4j-cuda module. For more information, please refer to: https://deeplearning4j.org/cudnn
    java.lang.ClassNotFoundException: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at java.lang.Class.forName0(Native Method)
    MultiLayerNetwork net = ...
    LayerHelper h = net.getLayer(0).getHelper();    //Index 0: assume layer 0 is a ConvolutionLayer in this example
    System.out.println("Layer helper: " + (h == null ? null : h.getClass().getName()));
    Layer helper: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
    Layer helper: null
    MultiLayerNetwork net = ...
    net.setListeners(new PerformanceListener(1));       //Logs ETL and iteration speed on each iteration
    .d.o.l.PerformanceListener - ETL: 0 ms; iteration 16; iteration time: 65 ms; samples/sec: 492.308; batches/sec: 15.384; 
    Nd4j.getMemoryManager().setAutoGcWindow(10000);             //Set to 10 seconds (10000ms) between System.gc() calls
    Nd4j.getMemoryManager().togglePeriodicGc(false);            //Disable periodic GC calls
    int listenerFrequency = 1;
    boolean reportScore = true;
    boolean reportGC = true;
    net.setListeners(new PerformanceListener(listenerFrequency, reportScore, reportGC));
    o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 30; iteration time: 17 ms; samples/sec: 588.235; batches/sec: 58.824; score: 0.7229335801186025; GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)];
    5.938: [GC (System.gc()) [PSYoungGen: 5578K->96K(153088K)] 9499K->4016K(502784K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    5.939: [Full GC (System.gc()) [PSYoungGen: 96K->0K(153088K)] [ParOldGen: 3920K->3911K(349696K)] 4016K->3911K(502784K), [Metaspace: 22598K->22598K(1069056K)], 0.0117132 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
    System.out.println("ND4J Data Type Setting: " + Nd4j.dataType());
    System.out.println("Training workspace config: " + net.getLayerWiseConfigurations().getTrainingWorkspaceMode());
    System.out.println("Inference workspace config: " + net.getLayerWiseConfigurations().getInferenceWorkspaceMode());
    System.out.println("Training workspace config: " + cg.getConfiguration().getTrainingWorkspaceMode());
    System.out.println("Inference workspace config: " + cg.getConfiguration().getInferenceWorkspaceMode());
    spark-submit
        --conf 'spark.executor.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/executor/,tracing_settings_path=/home/user/yourkitconf.txt'
        --conf 'spark.driver.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/driver/,tracing_settings_path=/home/user/yourkitconf.txt'
        <other spark submit arguments>
    walltime=*
    adaptive=true
    adaptive_min_method_invocation_count=1000
    adaptive_max_average_method_time_ns=100000
    INDArray avgPooling2d(INDArray input, Pooling2DConfig pooling2DConfig)
    
    SDVariable avgPooling2d(SDVariable input, Pooling2DConfig pooling2DConfig)
    SDVariable avgPooling2d(String name, SDVariable input, Pooling2DConfig pooling2DConfig)

    For RNNs such as LSTMs, layers are typically in the range of 128 to 512, though the largest RNNs may use around 1024 units per layer.

    You can check the number of parameters using MultiLayerNetwork/ComputationGraph.numParams() or MultiLayerNetwork/ComputationGraph.summary()

    truncated backpropgation through time

    croppingBottom - (Size: Exactly(count=2))

    Conv1DConfig - see Conv1DConfig

    Conv2DConfig - see Conv2DConfig

    Conv3DConfig - see Conv3DConfig

    DeConv2DConfig - see DeConv2DConfig

    DeConv3DConfig - see DeConv3DConfig

    Conv2DConfig - see Conv2DConfig

    rates - strides (Size: Exactly(count=2))

  • isSameMode - isSameMode

  • sH - Stride height

  • sW - Stride width

  • rH - Rate height

  • rW - Rate width

  • sameMode - If true: use same mode padding. If false

  • bias (NUMERIC) - Optional bias, rank 1 with shape [outputChannels]. May be null.

  • Conv2DConfig - see Conv2DConfig

  • paddingBottom - Optional 2d int[] array for padding the result: values [[pad top, pad bottom], [pad left, pad right]] (Size: Exactly(count=2))

  • nchw - If true: input is in NCHW (minibatch, channels, height, width) format. False: NHWC format

  • scaleH - Scale to upsample in height dimension

  • scaleW - Scale to upsample in width dimension

  • d (LONG) - dilation - default = 1

  • isSameMode (BOOL) - Same mode - default = true

  • dataFormat (STRING) - Data format - default = NCW

  • sW (LONG) - Stride along width dimension - default = 1

  • pH (LONG) - Padding along height dimension - default = 0

  • pW (LONG) - Padding along width dimension - default = 0

  • dH (LONG) - Dilation along height dimension - default = 1

  • dW (LONG) - Dilation along width dimension - default = 1

  • isSameMode (BOOL) - Same mode - default = true

  • dataFormat (STRING) - Data format - default = NCHW

  • sD (LONG) - Stride depth - default = 1

  • sW (LONG) - Stride width - default = 1

  • sH (LONG) - Stride height - default = 1

  • pD (LONG) - Padding depth - default = 0

  • pW (LONG) - Padding width - default = 0

  • pH (LONG) - Padding height - default = 0

  • dD (LONG) - Dilation depth - default = 1

  • dW (LONG) - Dilation width - default = 1

  • dH (LONG) - Dilation height - default = 1

  • biasUsed (BOOL) - biasUsed - default = false

  • isSameMode (BOOL) - Same mode - default = true

  • dataFormat (STRING) - Data format - default = NDHWC

  • sW (LONG) - Stride along width dimension - default = 1

  • pH (LONG) - Padding along height dimension - default = 0

  • pW (LONG) - Padding along width dimension - default = 0

  • dH (LONG) - Dilation along height dimension - default = 1

  • dW (LONG) - Dilation along width dimension - default = 1

  • isSameMode (BOOL) - Same mode - default = false

  • dataFormat (STRING) - Data format - default = NCHW

  • sD (LONG) - Stride depth - default = 1

  • sW (LONG) - Stride width - default = 1

  • sH (LONG) - Stride height - default = 1

  • pD (LONG) - Padding depth - default = 0

  • pW (LONG) - Padding width - default = 0

  • pH (LONG) - Padding height - default = 0

  • dD (LONG) - Dilation depth - default = 1

  • dW (LONG) - Dilation width - default = 1

  • dH (LONG) - Dilation height - default = 1

  • isSameMode (BOOL) - Same mode - default = false

  • dataFormat (STRING) - Data format - default = NCDHW

  • sW (LONG) - Stride along width dimension - default = 1

  • pH (LONG) - Padding along height dimension - default = 0

  • pW (LONG) - Padding along width dimension - default = 0

  • dH (LONG) - Dilation along height dimension - default = 1

  • dW (LONG) - Dilation along width dimension - default = 1

  • isSameMode (BOOL) - Same mode - default = true

  • dataFormat (STRING) - Data format - default = nchw

  • sD (LONG) - Stride depth - default = 1

  • sW (LONG) - Stride width - default = 1

  • sH (LONG) - Stride height - default = 1

  • pD (LONG) - Padding depth - default = 0

  • pW (LONG) - Padding width - default = 0

  • pH (LONG) - Padding height - default = 0

  • dD (LONG) - Dilation depth - default = 1

  • dW (LONG) - Dilation width - default = 1

  • dH (LONG) - Dilation height - default = 1

  • isSameMode (BOOL) - Same mode - default = true

  • dataFormat (STRING) - Data format - default = NCDHW

  • depth (INT) - depth - default = 5

    Pooling2DConfig
    Pooling3DConfig
    Conv2DConfig
    Conv2DConfig
    LocalResponseNormalizationConfig
    Pooling2DConfig
    Pooling2DConfig
    Pooling3DConfig
    conv1d
    col2Im
    conv2d
    depthWiseConv2d
    im2Col
    separableConv2d
    conv3d
    deconv2d
    deconv3d
    avgPooling2d
    maxPoolWithArgmax
    maxPooling2d
    avgPooling3d
    maxPooling3d
    localResponseNormalization
    INDArray avgPooling3d(INDArray input, Pooling3DConfig pooling3DConfig)
    
    SDVariable avgPooling3d(SDVariable input, Pooling3DConfig pooling3DConfig)
    SDVariable avgPooling3d(String name, SDVariable input, Pooling3DConfig pooling3DConfig)
    INDArray batchToSpace(INDArray x, int[] blocks, int[] croppingTop, int[] croppingBottom)
    
    SDVariable batchToSpace(SDVariable x, int[] blocks, int[] croppingTop, int[] croppingBottom)
    SDVariable batchToSpace(String name, SDVariable x, int[] blocks, int[] croppingTop, int[] croppingBottom)
    INDArray col2Im(INDArray in, Conv2DConfig conv2DConfig)
    
    SDVariable col2Im(SDVariable in, Conv2DConfig conv2DConfig)
    SDVariable col2Im(String name, SDVariable in, Conv2DConfig conv2DConfig)
    INDArray conv1d(INDArray input, INDArray weights, INDArray bias, Conv1DConfig conv1DConfig)
    INDArray conv1d(INDArray input, INDArray weights, Conv1DConfig conv1DConfig)
    
    SDVariable conv1d(SDVariable input, SDVariable weights, SDVariable bias, Conv1DConfig conv1DConfig)
    SDVariable conv1d(SDVariable input, SDVariable weights, Conv1DConfig conv1DConfig)
    SDVariable conv1d(String name, SDVariable input, SDVariable weights, SDVariable bias, Conv1DConfig conv1DConfig)
    SDVariable conv1d(String name, SDVariable input, SDVariable weights, Conv1DConfig conv1DConfig)
    INDArray conv2d(INDArray layerInput, INDArray weights, INDArray bias, Conv2DConfig conv2DConfig)
    INDArray conv2d(INDArray layerInput, INDArray weights, Conv2DConfig conv2DConfig)
    
    SDVariable conv2d(SDVariable layerInput, SDVariable weights, SDVariable bias, Conv2DConfig conv2DConfig)
    SDVariable conv2d(SDVariable layerInput, SDVariable weights, Conv2DConfig conv2DConfig)
    SDVariable conv2d(String name, SDVariable layerInput, SDVariable weights, SDVariable bias, Conv2DConfig conv2DConfig)
    SDVariable conv2d(String name, SDVariable layerInput, SDVariable weights, Conv2DConfig conv2DConfig)
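A minimal sketch of building a conv2d op in a SameDiff graph, assuming the weight layout [kernelHeight, kernelWidth, inChannels, outChannels] used by this op (the names and sizes are arbitrary):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ops.impl.layers.convolution.config.Conv2DConfig;
import org.nd4j.linalg.factory.Nd4j;

public class Conv2dExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        // NCHW input: [minibatch, channels, height, width]
        SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 3, 32, 32);
        // Weights: [kH, kW, inChannels, outChannels]
        SDVariable w = sd.var("w", Nd4j.rand(DataType.FLOAT, 3, 3, 3, 16));
        Conv2DConfig cfg = Conv2DConfig.builder()
                .kH(3).kW(3)
                .sH(1).sW(1)
                .isSameMode(true)   // "same" padding keeps height/width unchanged
                .build();
        SDVariable out = sd.cnn().conv2d("conv", in, w, cfg);
    }
}
```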
    INDArray conv3d(INDArray input, INDArray weights, INDArray bias, Conv3DConfig conv3DConfig)
    INDArray conv3d(INDArray input, INDArray weights, Conv3DConfig conv3DConfig)
    
    SDVariable conv3d(SDVariable input, SDVariable weights, SDVariable bias, Conv3DConfig conv3DConfig)
    SDVariable conv3d(SDVariable input, SDVariable weights, Conv3DConfig conv3DConfig)
    SDVariable conv3d(String name, SDVariable input, SDVariable weights, SDVariable bias, Conv3DConfig conv3DConfig)
    SDVariable conv3d(String name, SDVariable input, SDVariable weights, Conv3DConfig conv3DConfig)
    INDArray deconv2d(INDArray layerInput, INDArray weights, INDArray bias, DeConv2DConfig deConv2DConfig)
    INDArray deconv2d(INDArray layerInput, INDArray weights, DeConv2DConfig deConv2DConfig)
    
    SDVariable deconv2d(SDVariable layerInput, SDVariable weights, SDVariable bias, DeConv2DConfig deConv2DConfig)
    SDVariable deconv2d(SDVariable layerInput, SDVariable weights, DeConv2DConfig deConv2DConfig)
    SDVariable deconv2d(String name, SDVariable layerInput, SDVariable weights, SDVariable bias, DeConv2DConfig deConv2DConfig)
    SDVariable deconv2d(String name, SDVariable layerInput, SDVariable weights, DeConv2DConfig deConv2DConfig)
    INDArray deconv3d(INDArray input, INDArray weights, INDArray bias, DeConv3DConfig deConv3DConfig)
    INDArray deconv3d(INDArray input, INDArray weights, DeConv3DConfig deConv3DConfig)
    
    SDVariable deconv3d(SDVariable input, SDVariable weights, SDVariable bias, DeConv3DConfig deConv3DConfig)
    SDVariable deconv3d(SDVariable input, SDVariable weights, DeConv3DConfig deConv3DConfig)
    SDVariable deconv3d(String name, SDVariable input, SDVariable weights, SDVariable bias, DeConv3DConfig deConv3DConfig)
    SDVariable deconv3d(String name, SDVariable input, SDVariable weights, DeConv3DConfig deConv3DConfig)
    INDArray depthToSpace(INDArray x, int blockSize, DataFormat dataFormat)
    
    SDVariable depthToSpace(SDVariable x, int blockSize, DataFormat dataFormat)
    SDVariable depthToSpace(String name, SDVariable x, int blockSize, DataFormat dataFormat)
    INDArray depthWiseConv2d(INDArray layerInput, INDArray depthWeights, INDArray bias, Conv2DConfig conv2DConfig)
    INDArray depthWiseConv2d(INDArray layerInput, INDArray depthWeights, Conv2DConfig conv2DConfig)
    
    SDVariable depthWiseConv2d(SDVariable layerInput, SDVariable depthWeights, SDVariable bias, Conv2DConfig conv2DConfig)
    SDVariable depthWiseConv2d(SDVariable layerInput, SDVariable depthWeights, Conv2DConfig conv2DConfig)
    SDVariable depthWiseConv2d(String name, SDVariable layerInput, SDVariable depthWeights, SDVariable bias, Conv2DConfig conv2DConfig)
    SDVariable depthWiseConv2d(String name, SDVariable layerInput, SDVariable depthWeights, Conv2DConfig conv2DConfig)
    INDArray dilation2D(INDArray df, INDArray weights, int[] strides, int[] rates, boolean isSameMode)
    
    SDVariable dilation2D(SDVariable df, SDVariable weights, int[] strides, int[] rates, boolean isSameMode)
    SDVariable dilation2D(String name, SDVariable df, SDVariable weights, int[] strides, int[] rates, boolean isSameMode)
    INDArray extractImagePatches(INDArray input, int kH, int kW, int sH, int sW, int rH, int rW, boolean sameMode)
    
    SDVariable extractImagePatches(SDVariable input, int kH, int kW, int sH, int sW, int rH, int rW, boolean sameMode)
    SDVariable extractImagePatches(String name, SDVariable input, int kH, int kW, int sH, int sW, int rH, int rW, boolean sameMode)
    INDArray im2Col(INDArray in, Conv2DConfig conv2DConfig)
    
    SDVariable im2Col(SDVariable in, Conv2DConfig conv2DConfig)
    SDVariable im2Col(String name, SDVariable in, Conv2DConfig conv2DConfig)
    INDArray localResponseNormalization(INDArray input, LocalResponseNormalizationConfig localResponseNormalizationConfig)
    
    SDVariable localResponseNormalization(SDVariable input, LocalResponseNormalizationConfig localResponseNormalizationConfig)
    SDVariable localResponseNormalization(String name, SDVariable input, LocalResponseNormalizationConfig localResponseNormalizationConfig)
    INDArray[] maxPoolWithArgmax(INDArray input, Pooling2DConfig pooling2DConfig)
    
    SDVariable[] maxPoolWithArgmax(SDVariable input, Pooling2DConfig pooling2DConfig)
    SDVariable[] maxPoolWithArgmax(String name, SDVariable input, Pooling2DConfig pooling2DConfig)
    INDArray maxPooling2d(INDArray input, Pooling2DConfig pooling2DConfig)
    
    SDVariable maxPooling2d(SDVariable input, Pooling2DConfig pooling2DConfig)
    SDVariable maxPooling2d(String name, SDVariable input, Pooling2DConfig pooling2DConfig)
    INDArray maxPooling3d(INDArray input, Pooling3DConfig pooling3DConfig)
    
    SDVariable maxPooling3d(SDVariable input, Pooling3DConfig pooling3DConfig)
    SDVariable maxPooling3d(String name, SDVariable input, Pooling3DConfig pooling3DConfig)
    INDArray separableConv2d(INDArray layerInput, INDArray depthWeights, INDArray pointWeights, INDArray bias, Conv2DConfig conv2DConfig)
    INDArray separableConv2d(INDArray layerInput, INDArray depthWeights, INDArray pointWeights, Conv2DConfig conv2DConfig)
    
    SDVariable separableConv2d(SDVariable layerInput, SDVariable depthWeights, SDVariable pointWeights, SDVariable bias, Conv2DConfig conv2DConfig)
    SDVariable separableConv2d(SDVariable layerInput, SDVariable depthWeights, SDVariable pointWeights, Conv2DConfig conv2DConfig)
    SDVariable separableConv2d(String name, SDVariable layerInput, SDVariable depthWeights, SDVariable pointWeights, SDVariable bias, Conv2DConfig conv2DConfig)
    SDVariable separableConv2d(String name, SDVariable layerInput, SDVariable depthWeights, SDVariable pointWeights, Conv2DConfig conv2DConfig)
    INDArray spaceToBatch(INDArray x, int[] blocks, int[] paddingTop, int[] paddingBottom)
    
    SDVariable spaceToBatch(SDVariable x, int[] blocks, int[] paddingTop, int[] paddingBottom)
    SDVariable spaceToBatch(String name, SDVariable x, int[] blocks, int[] paddingTop, int[] paddingBottom)
    INDArray spaceToDepth(INDArray x, int blockSize, DataFormat dataFormat)
    
    SDVariable spaceToDepth(SDVariable x, int blockSize, DataFormat dataFormat)
    SDVariable spaceToDepth(String name, SDVariable x, int blockSize, DataFormat dataFormat)
    INDArray upsampling2d(INDArray input, int scale)
    
    SDVariable upsampling2d(SDVariable input, int scale)
    SDVariable upsampling2d(String name, SDVariable input, int scale)
    INDArray upsampling2d(INDArray input, int scaleH, int scaleW, boolean nchw)
    
    SDVariable upsampling2d(SDVariable input, int scaleH, int scaleW, boolean nchw)
    SDVariable upsampling2d(String name, SDVariable input, int scaleH, int scaleW, boolean nchw)
    INDArray upsampling3d(INDArray input, boolean ncdhw, int scaleD, int scaleH, int scaleW)
    
    SDVariable upsampling3d(SDVariable input, boolean ncdhw, int scaleD, int scaleH, int scaleW)
    SDVariable upsampling3d(String name, SDVariable input, boolean ncdhw, int scaleD, int scaleH, int scaleW)

    Layers

    Supported neural network layers.

    What are layers?

    Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they form a deep neural network.

    Using layers

    All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork or ComputationGraph. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.
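For instance, layer configurations are passed to the network builder, and the network instantiates the layers on `init()`. A minimal sketch using the ComputationGraph API (the layer names and sizes are arbitrary):

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class GraphExample {
    public static void main(String[] args) {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
                .graphBuilder()
                .addInputs("input")
                .addLayer("dense", new DenseLayer.Builder()
                        .nIn(784).nOut(100).activation(Activation.RELU).build(), "input")
                .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(100).nOut(10).activation(Activation.SOFTMAX).build(), "dense")
                .setOutputs("out")
                .build();
        ComputationGraph graph = new ComputationGraph(conf);
        graph.init();   // instantiates the configured layers
    }
}
```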

    Layers vs. vertices

    If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph API and join different branches together using vertices. See the vertices documentation for more information.

    General layers

    ActivationLayer

    Activation layer is a simple layer that applies the specified activation function to the input activations

    clone

    • param activation Activation function for the layer

    activation

    Activation function for the layer

    activation

    • param activationFunction Activation function for the layer

    activation

    • param activation Activation function for the layer

    DenseLayer

    Dense layer: a standard fully connected feed forward layer

    hasBias

    If true (default): include bias parameters in the model. False: no bias.

    hasLayerNorm

    If true: enable layer normalization on this layer. Default: false.

    DropoutLayer

    Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test time.

    build

    Create a dropout layer with standard Dropout, with the specified probability of retaining the input activation. See Dropout for the full details

    • param dropout Activation retain probability.

    EmbeddingLayer

    Embedding layer: feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses - 1). This input has shape [numExamples, 1] instead of [numExamples, numClasses] for the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero). Note: can only be used as the first layer for a network Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each example. Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)

    hasBias

    If true: include bias parameters in the layer. False (default): no bias.

    weightInit

    Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

    • param embeddingInitializer Source of the embedding layer weights

    weightInit

    Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

    • param vectors Vectors to initialize the embedding layer with
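A minimal configuration sketch for this layer (the vocabulary and embedding sizes are arbitrary):

```java
import org.deeplearning4j.nn.conf.layers.EmbeddingLayer;
import org.nd4j.linalg.activations.Activation;

public class EmbeddingExample {
    public static void main(String[] args) {
        EmbeddingLayer embedding = new EmbeddingLayer.Builder()
                .nIn(10000)                      // number of classes (vocabulary size)
                .nOut(128)                       // embedding dimension
                .activation(Activation.IDENTITY) // IDENTITY disables the activation
                .build();
    }
}
```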

    EmbeddingSequenceLayer

    Embedding layer for sequences: feed-forward layer that expects fixed-length number (inputLength) of integers/indices per example as input, ranged from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Note: can only be used as the first layer for a network Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index. Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)

    hasBias

    If true: include bias parameters in the layer. False (default): no bias.

    inputLength

    Set input sequence length for this embedding layer.

    • param inputLength input sequence length

    • return Builder

    inferInputLength

    Set input sequence inference mode for embedding layer.

    • param inferInputLength whether to infer input length

    • return Builder

    weightInit

    Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

    • param embeddingInitializer Source of the embedding layer weights

    weightInit

    Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

    • param vectors Vectors to initialize the embedding layer with

    GlobalPoolingLayer

    Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: SUM, AVG, MAX, PNORM

    Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass: - Time series: mask arrays are shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only - CNNs: masks have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.

    Behaviour with default settings: - 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize] - 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels] - 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

    Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s: this gives - [miniBatchSize, vectorSize, 1] for RNN output, - [miniBatchSize, channels, 1, 1] for CNN output, and - [miniBatchSize, channels, 1, 1, 1] for CNN3D output.
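A minimal configuration sketch of the options described above:

```java
import org.deeplearning4j.nn.conf.layers.GlobalPoolingLayer;
import org.deeplearning4j.nn.conf.layers.PoolingType;

public class GlobalPoolingExample {
    public static void main(String[] args) {
        GlobalPoolingLayer pooling = new GlobalPoolingLayer.Builder()
                .poolingType(PoolingType.AVG)
                // true (the default): [mb, vectorSize, tsLength] -> [mb, vectorSize]
                .collapseDimensions(true)
                .build();
    }
}
```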

    poolingDimensions

    Pooling dimensions to use for the global pooling operation

    poolingType

    • param poolingType Pooling type for global pooling

    collapseDimensions

    Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true. If true: - 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize] - 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels] - 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

    If false: - 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1] - 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels, 1, 1] - 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels, 1, 1, 1]

    • param collapseDimensions Whether to collapse the dimensions or not

    pnorm

    P-norm constant. Only used if using PoolingType.PNORM for the pooling type

    • param pnorm P-norm constant

    LocalResponseNormalization

    Local response normalization layer. See section 3.3 of the ImageNet classification (AlexNet) paper by Krizhevsky et al. (2012).

    k

    LRN scaling constant k. Default: 2

    n

    Number of adjacent kernel maps to use when doing LRN. default: 5

    • param n Number of adjacent kernel maps

    alpha

    LRN scaling constant alpha. Default: 1e-4

    • param alpha Scaling constant

    beta

    Scaling constant beta. Default: 0.75

    • param beta Scaling constant

    cudnnAllowFallback

    When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception from CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for LocalResponseNormalization will be used instead

    • param allowFallback Whether fallback to non-CuDNN implementation should be used

    LocallyConnected1D

    SameDiff version of a 1D locally connected layer.

    nIn

    Number of inputs to the layer (input size)

    nOut

    • param nOut Number of outputs (output size)

    activation

    • param activation Activation function for the layer

    kernelSize

    • param k Kernel size for the layer

    stride

    • param s Stride for the layer

    padding

    • param p Padding for the layer. Not used if ConvolutionMode.Same is set

    convolutionMode

    • param cm Convolution mode for the layer. See ConvolutionMode for details

    dilation

    • param d Dilation for the layer

    hasBias

    • param hasBias If true (default is false) the layer will have a bias

    setInputSize

    Set input filter size for this locally connected 1D layer

    • param inputSize height of the input filters

    • return Builder

    LocallyConnected2D

    SameDiff version of a 2D locally connected layer.

    setKernel

    Kernel size for the layer. Must be 2 values (height/width)

    setStride

    • param stride Stride for the layer. Must be 2 values (height/width)

    setPadding

    • param padding Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

    setDilation

    • param dilation Dilation for the layer. Must be 2 values (height/width)

    nIn

    • param nIn Number of inputs to the layer (input size)

    nOut

    • param nOut Number of outputs (output size)

    activation

    • param activation Activation function for the layer

    kernelSize

    • param k Kernel size for the layer. Must be 2 values (height/width)

    stride

    • param s Stride for the layer. Must be 2 values (height/width)

    padding

    • param p Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

    convolutionMode

    • param cm Convolution mode for the layer. See ConvolutionMode for details

    dilation

    • param d Dilation for the layer. Must be 2 values (height/width)

    hasBias

    • param hasBias If true (default is false) the layer will have a bias

    setInputSize

    Set input filter size (h,w) for this locally connected 2D layer

    • param inputSize pair of height and width of the input filters to this layer

    • return Builder

    LossLayer

    LossLayer is a flexible output layer that performs a loss function on an input without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn’t supported - the output size is the same size as the input activations.

    build

    • param lossFunction Loss function for the loss layer

    OutputLayer

    Output layer used for training via backpropagation based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it contains a fully-connected layer (effectively contains a DenseLayer) internally. This allows the output size to be different to the layer input size.

    build

    • param lossFunction Loss function for the output layer

    Pooling1D

    Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

    Pooling2D

    Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

    Subsampling1DLayer

    1D (temporal) subsampling layer - also known as a pooling layer. Expects input of shape [minibatch, nIn, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.

    Supports the following pooling types: MAX, AVG, SUM, PNORM

    setKernelSize

    Kernel size

    • param kernelSize kernel size

    setStride

    Stride

    • param stride stride value

    setPadding

    Padding

    • param padding padding value

    Upsampling1D

    Upsampling 1D layer. Repeats each step size times along the temporal axis: input of shape [minibatch, channels, sequenceLength] produces output of shape [minibatch, channels, size * sequenceLength].

    size

    Upsampling size

    • param size upsampling size in single spatial dimension of this 1D layer

    size

    Upsampling size int array with a single element. Array must be length 1

    • param size upsampling size in single spatial dimension of this 1D layer

    Upsampling2D

    Upsampling 2D layer. Repeats each value (or rather, each set of depth values) in the height and width dimensions by the upsampling size.

    size

    Upsampling size int, used for both height and width

    • param size upsampling size in height and width dimensions

    size

    Upsampling size array

    • param size upsampling size in height and width dimensions

    Upsampling3D

    Upsampling 3D layer. Repeats each value (all channel values for each x/y/z location) size[0], size[1] and size[2] times along the depth, height and width dimensions respectively: input of shape [minibatch, channels, depth, height, width] produces output of shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].

    size

    Upsampling size as int, so same upsampling size is used for depth, width and height

    • param size upsampling size in height, width and depth dimensions

    size

    Upsampling size as int, so same upsampling size is used for depth, width and height

    • param size upsampling size in height, width and depth dimensions

    ZeroPadding1DLayer

    Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for top and bottom.

    setPadding

    Padding value for left and right. Must be length 2 array

    build

    • param padding Padding for both the left and right

    ZeroPadding3DLayer

    Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.

    setPadding

    [padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]

    build

    • param padding Padding for both the left and right in all three spatial dimensions

    ZeroPaddingLayer

    Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right

    setPadding

    Padding value for top, bottom, left, and right. Must be length 4 array

    build

    • param padHeight Padding for both the top and bottom

    • param padWidth Padding for both the left and right

    ElementWiseMultiplicationLayer

    Element-wise multiplication layer with weights: implements out = activationFn(input .* w + b), where w is a learnable weight vector of length nOut, “.*” is element-wise multiplication, and b is a bias vector. Note that the input and output sizes of the element-wise layer are the same for this layer

    created by jingshu

    getMemoryReport

    This is a report of the estimated memory consumption for the given layer

    • param inputType Input type to the layer. Memory consumption is often a function of the input type

    • return Memory report for the layer

    RepeatVector

    RepeatVector layer configuration.

    RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which each vector is repeated n times.

    getRepetitionFactor

    Get the repetition factor for the RepeatVector layer

    setRepetitionFactor

    Set repetition factor for RepeatVector layer

    • param n the repetition factor

    repetitionFactor

    Set repetition factor for RepeatVector layer

    • param n the repetition factor

    Yolo2OutputLayer

    Output (loss) layer for the YOLOv2 object detection model, based on the papers YOLO9000: Better, Faster, Stronger (Redmon & Farhadi, 2016) and You Only Look Once: Unified, Real-Time Object Detection (Redmon et al., 2016). This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.

    Note: Input activations to the Yolo2OutputLayer should have shape [minibatch, b*(5+c), H, W], where: b = number of bounding boxes (determined by config - see papers for details), c = number of classes, H = output/label height, W = output/label width.

    Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have output depth of b*(5+c). Thus if you change the number of bounding boxes, or change the number of object classes, the number of channels (nOut of the last convolution layer) needs to also change.

    Label format: [minibatch, 4+C, H, W]. Order for labels depth: [x1, y1, x2, y2, (class labels)], where (x1, y1) is the box top-left position and (x2, y2) the box bottom-right position. Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left, (13,13) is bottom right. Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
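To make the channel bookkeeping concrete: the required depth of the final convolution layer is simply b(5+c), so 5 box priors with 20 classes require 125 output channels. The helper below is hypothetical, for illustration only:

```java
public class YoloChannels {
    // b bounding box priors, each predicting 5 values (x, y, w, h, confidence)
    // plus c per-class scores -> b * (5 + c) channels in total.
    static int requiredChannels(int b, int c) {
        return b * (5 + c);
    }

    public static void main(String[] args) {
        System.out.println(requiredChannels(5, 20)); // prints 125
    }
}
```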

    lambdaCoord

    Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5

    lambdaNoObj

    Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5

    • param lambdaNoObj Lambda value for no-object (confidence) component of the loss function

    lossPositionScale

    Loss function for position/scale component of the loss function

    • param lossPositionScale Loss function for position/scale

    lossClassPredictions

    Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper), however LossMCXENT could also be used (which is more common for classification).

    • param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function

    boundingBoxPriors

    Bounding box priors dimensions [width, height]. For N bounding boxes, input has shape [rows, columns] = [N, 2] Note that dimensions should be specified as fraction of grid size. For example, a network with 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.

    • param boundingBoxes Bounding box prior dimensions (width, height)

    MaskLayer

    MaskLayer applies the mask array to the forward pass activations, and backward pass gradients, passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.

    MaskZeroLayer

    Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].

    http://www.cs.toronto.edu/~fritz/absps/imagenet.pdfarrow-up-right
    https://arxiv.org/abs/1612.08242arrow-up-right
    http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdfarrow-up-right
    public ActivationLayer clone() 
    public Builder activation(String activationFunction) 
    public Builder activation(IActivation activationFunction) 
    public Builder activation(Activation activation) 
    public Builder hasBias(boolean hasBias) 
    public Builder hasLayerNorm(boolean hasLayerNorm)
    public DropoutLayer build() 
    public Builder hasBias(boolean hasBias) 
    public Builder weightInit(EmbeddingInitializer embeddingInitializer)
    public Builder weightInit(INDArray vectors)
    public Builder hasBias(boolean hasBias) 
    public Builder inputLength(int inputLength) 
    public Builder inferInputLength(boolean inferInputLength) 
    public Builder weightInit(EmbeddingInitializer embeddingInitializer)
    public Builder weightInit(INDArray vectors)
    public Builder poolingDimensions(int... poolingDimensions) 
    public Builder poolingType(PoolingType poolingType) 
    public Builder collapseDimensions(boolean collapseDimensions) 
    public Builder pnorm(int pnorm) 
    public Builder k(double k) 
    public Builder n(double n) 
    public Builder alpha(double alpha) 
    public Builder beta(double beta) 
    public Builder cudnnAllowFallback(boolean allowFallback) 
    public Builder nIn(int nIn) 
    public Builder nOut(int nOut) 
    public Builder activation(Activation activation) 
    public Builder kernelSize(int k) 
    public Builder stride(int s) 
    public Builder padding(int p) 
    public Builder convolutionMode(ConvolutionMode cm) 
    public Builder dilation(int d) 
    public Builder hasBias(boolean hasBias) 
    public Builder setInputSize(int inputSize) 
    public void setKernel(int... kernel) 
    public void setStride(int... stride) 
    public void setPadding(int... padding) 
    public void setDilation(int... dilation) 
    public Builder nIn(int nIn) 
    public Builder nOut(int nOut) 
    public Builder activation(Activation activation) 
    public Builder kernelSize(int... k) 
    public Builder stride(int... s) 
    public Builder padding(int... p) 
    public Builder convolutionMode(ConvolutionMode cm) 
    public Builder dilation(int... d) 
    public Builder hasBias(boolean hasBias) 
    public Builder setInputSize(int... inputSize) 
    public Builder nIn(int nIn) 
    public OutputLayer build() 
    public void setKernelSize(int... kernelSize) 
    public void setStride(int... stride) 
    public void setPadding(int... padding) 
    If input (for a single example, with channels down page, and sequence from left to right) is:
    [ A1, A2, A3]
    [ B1, B2, B3]
    Then output with size = 2 is:
    [ A1, A1, A2, A2, A3, A3]
    [ B1, B1, B2, B2, B3, B3]
    public Builder size(int size) 
    public Builder size(int[] size) 
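    The repetition semantics of the 1D example above can be sketched in plain Java, independent of DL4J:

```java
import java.util.Arrays;

public class Upsampling1DSketch {
    // Repeat each time step `size` times along the sequence axis,
    // mirroring Upsampling1D with the given size
    public static double[] upsample(double[] channel, int size) {
        double[] out = new double[channel.length * size];
        for (int t = 0; t < channel.length; t++) {
            Arrays.fill(out, t * size, (t + 1) * size, channel[t]);
        }
        return out;
    }

    public static void main(String[] args) {
        // [A1, A2, A3] with size = 2 -> [A1, A1, A2, A2, A3, A3]
        System.out.println(Arrays.toString(upsample(new double[]{1, 2, 3}, 2)));
        // prints [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
    }
}
```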
    Input (slice for one example and channel)
    [ A, B ]
    [ C, D ]
    Size = [2, 2]
    Output (slice for one example and channel)
    [ A, A, B, B ]
    [ A, A, B, B ]
    [ C, C, D, D ]
    [ C, C, D, D ]
    public Builder size(int size) 
    public Builder size(int[] size) 
    public Builder size(int size) 
    public Builder size(int[] size) 
    public void setPadding(int... padding) 
    public ZeroPadding1DLayer build() 
    public void setPadding(int... padding) 
    public ZeroPadding3DLayer build() 
    public void setPadding(int... padding) 
    public ZeroPaddingLayer build() 
    public LayerMemoryReport getMemoryReport(InputType inputType) 
    public int getRepetitionFactor() 
    public void setRepetitionFactor(int n) 
    public Builder repetitionFactor(int n) 
    public Builder lambdaCoord(double lambdaCoord) 
    public Builder lambbaNoObj(double lambdaNoObj) 
    public Builder lossPositionScale(ILossFunction lossPositionScale) 
    public Builder lossClassPredictions(ILossFunction lossClassPredictions) 
    public Builder boundingBoxPriors(INDArray boundingBoxes) 

    Advanced Autoencoder

    Trajectory Clustering Using AIS

    circle-exclamation

    This tutorial still uses an outdated API version. You can still get an idea of how things work, but you will not be able to copy & paste code from it without modifications.

    Sometimes, deep learning is just one piece of the whole project. You may have a time series problem requiring advanced analysis, and you may need more than just a neural network. Trajectory clustering can be a difficult problem to solve when your data isn’t quite “even”. The Marine Automatic Identification System (AIS)arrow-up-right is an open system for broadcasting marine vessel positions. It primarily supports collision avoidance and allows marine authorities to monitor marine traffic.

    What if you wanted to determine the most popular routes? Or take it one step further and identify anomalous traffic? Not everything can be done with a single neural network. Furthermore, AIS data for 1 year is over 100GB compressed. You’ll need more than just a desktop computer to analyze it seriously.

    hashtag
    Sequence-to-sequence Autoencoders

    As you learned in the Basic Autoencoder tutorial, applications of autoencoders in data science include dimensionality reduction and data denoising. Instead of using dense layers in an autoencoder, you can swap the simple MLP layers out for LSTMs. The same network built with LSTMs is a sequence-to-sequence autoencoder, and such networks are effective at capturing temporal structure.

    In the case of AIS data, coordinates can be reported at irregular intervals over time. Not all time series for a single ship have an equal length - there’s high dimensionality in the data. Before deep learning, other techniques such as dynamic time warpingarrow-up-right were used for measuring similarity between sequences. However, now that we can train a network to compress a trajectory of a ship using a seq2seq autoencoder, we can use the output for various things.

    hashtag
    G-means clustering

    So let’s say we want to group similar trajectories of ships together using all available AIS data. It’s hard to guess how many unique groups of routes exist for marine traffic, so a clustering algorithm like k-means, which requires the number of clusters up front, is not useful here. This is where the G-means algorithm has some utility.

    G-means will repeatedly test a group for Gaussian patterns. If the group tests positive, then it will split the group. This will continue to happen until the groups no longer appear Gaussian. There are also other methods for non-K-means analysis, but G-means is quite useful for our needs.

    hashtag
    Apache Spark

    Sometimes a single computer doesn’t cut it for munging your data. Hadooparrow-up-right was originally developed for storing and processing large amounts of data; however, with time comes innovation, and Apache Sparkarrow-up-right was eventually developed for faster large-scale data processing, touting up to a 100x improvement over Hadoop. The two frameworks aren’t entirely identical - Spark doesn’t have its own filesystem and often uses Hadoop’s HDFS.

    Spark is also capable of SQL-like exploration of data with its spark-sql module. However, it is not unique in the ecosystem, and other frameworks such as Hivearrow-up-right and Pigarrow-up-right have similar functionality. At the conceptual level, Hive and Pig make it easy to write map-reduce programs. However, Spark has largely become the de facto standard for data analysis, and Pig has recently introduced a Spark integration.

    hashtag
    What are we going to learn in this tutorial?

    Using Deeplearning4j, DataVec, and some custom code, you will learn how to cluster large amounts of AIS data. We will use a local Spark cluster built into Zeppelin to execute DataVec preprocessing, train an autoencoder on the converted sequences, and finally run G-means on the compressed output and visualize the groups.

    hashtag
    Imports

    hashtag
    Download the dataset

    The file we will be downloading is nearly 2GB uncompressed, so make sure you have enough space on your local disk. If you want to check out the file yourself, you can download a copy from https://dl4jdata.blob.core.windows.net/datasets/aisdk_20171001.csv.ziparrow-up-right. The code below will check if the data already exists and download the file.

    hashtag
    Examine sequence lengths

    The trouble with raw data is that it usually doesn’t have the clean structure that you would expect for an example. It’s useful to investigate the structure of the data, calculate some basic statistics on average sequence length, and figure out the complexity of the raw data. Below we count the length of each sequence and plot the distribution. You will see that the spread of sequence lengths is very problematic: the longest sequence in the data is 36,958 time steps!

    hashtag
    Extract and transform

    Now that we’ve examined our data, we need to extract it from the CSV and transform it into sequence data. DataVec and Spark make this easy for us.

    Using DataVec’s Schema class we define the schema of the data and their columns. Alternatively, if you have a sample file of the data you can also use the InferredSchema class. Afterwards, we can build a TransformProcess that removes any unwanted fields and uses a comparison of timestamps to create sequences for each unique ship in the AIS data.

    Once we’re certain that the schema and transformations are what we want, we can read the CSV into a Spark RDDarrow-up-right and execute our transformation with DataVec. First, we convert the data to a sequence with convertToSequence() and a numerical comparator to sort by timestamp. Then we apply a window function to each sequence, reducing each window to a single value. This helps reduce the variability in sequence lengths, which would be problematic when we go to train our autoencoder.

    If you want to use the Scala-style method of programming, you can switch back and forth between the Scala and Java APIs for the Spark RDD. Calling .rdd on a JavaRDD will return a regular Scala RDD class. If you prefer the Java API, call toJavaRDD() on an RDD.

    hashtag
    Filtering of trajectories

    To reduce the complexity of this tutorial, we will be omitting anomalous trajectories. In the analysis above, you’ll see that there is a significant number of trajectories with invalid positions. Valid latitude and longitude coordinates do not exceed the [-90, 90] and [-180, 180] ranges respectively; therefore, we filter out positions beyond those ranges. Additionally, many of the trajectories include only a handful of positions - we will eliminate sequences that are too short for a meaningful representation.
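    As a plain-Java sketch of the validity rule used in the filter (a hypothetical helper for illustration, not part of DataVec):

```java
public class PositionFilter {
    // A reported position is plausible only if latitude lies in [-90, 90]
    // and longitude lies in [-180, 180]
    public static boolean isValid(double lat, double lon) {
        return lat >= -90.0 && lat <= 90.0
            && lon >= -180.0 && lon <= 180.0;
    }
}
```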

    hashtag
    Iteration and storage options

    Once you have finished preprocessing your dataset, you have a couple options to serialize your dataset before feeding it to your autoencoder network via an iterator. This applies to both unsupervised and supervised learning.

    1. Save to a Hadoop MapFile. This serializes the dataset to the Hadoop MapFile format and writes it to disk. You can do this whether your training will happen on a local, single node or be distributed across multiple nodes via Spark. The advantage here is that you can preprocess your dataset once and read from the MapFile as often as necessary.

    2. Pass the RDD to a RecordReaderMultiDataSetIterator. If you prefer to read your dataset directly from Spark, you can pass your RDD to a RecordReaderMultiDataSetIterator. The SparkSourceDummyReader class acts as a placeholder for each source of records. This process will convert the records to a MultiDataSet which can then be passed to a distributed neural network such as SparkComputationGraph.

    3. Serialize to another format. There are other options for serializing a dataset which will not be discussed here. They include saving the INDArray data in a compressed format on disk or using a proprietary method you create yourself.

    This example uses method 1 above. We’ll assume you have a single machine instance for training your network. Note: you can always mix architectures for preprocessing and training (Spark vs. GPU cluster). It really depends on what hardware you have available.

    hashtag
    Iterating from disk

    Now that we’ve saved our dataset to a Hadoop Map File, we need to set up a RecordReader and iterator that will read our saved sequences and feed them to our autoencoder. Conveniently, if you have already saved your data to disk, you can run this code block (and remaining code blocks) as much as you want without preprocessing the dataset again.

    hashtag
    Build the autoencoder

    Now that we’ve prepared our data, we must construct the sequence-to-sequence autoencoder. The configuration is quite similar to the autoencoders in other tutorials, except that the layers primarily use LSTMs. Note that in this architecture we use a DuplicateToTimeSeriesVertex between our encoder and decoder. This allows the decoder to generate output iteratively, with each time step receiving the same input but a different hidden state.

    hashtag
    Unsupervised training

    Now that the network configuration is set up and instantiated along with our iterators, training takes just a few lines of code. Earlier we attached a ScoreIterationListener to the model by using the setListeners() method. Depending on the browser you are using to run this notebook, you can open the debugger/inspector to view listener output. This output is redirected to the console since the internals of Deeplearning4j use SLF4J for logging, and the output is being redirected by Zeppelin. This is a good thing since it can reduce clutter in notebooks.

    After each epoch, we will evaluate how well the network is learning by using the evaluate() method. Although here we only use accuracy() and precision(), it is strongly recommended you learn how to do advanced evaluation with ROC curves and understand the output from a confusion matrix.

    hashtag
    Train for multiple epochs

    Deeplearning4j has a built-in MultipleEpochsIterator that automatically handles multiple epochs of training. Alternatively, if you instead want to handle per-epoch events you can either use an EarlyStoppingGraphTrainer which listens for scoring events, or wrap net.fit() in a for-loop yourself.

    Below, we manually create a for-loop since our iterator requires a more complex MultiDataSet. This is because our seq2seq autoencoder uses multiple inputs/outputs.

    The autoencoder here has been tuned to converge with an average reconstruction error of approximately 2% when trained for 35+ epochs.

    hashtag
    Save your model

    At this point, you’ve invested a lot of time and computation building your autoencoder. Saving it to disk and restoring it is quite simple.

    hashtag
    Compare reconstructed outputs

    Below we build a loop to visualize just how well our autoencoder is able to reconstruct the original sequences. After forwarding a single example, we score the reconstruction and then compare the original array to the reconstructed array. Note that we need to do some string formatting; otherwise, when we try to print the array we will get garbled output - what is printed is actually a reference to the array object in memory.

    hashtag
    Transferring the parameters

    Now that the network has been trained, we will extract the encoder from it. This lets us construct a new network used exclusively for encoding representations.

    hashtag
    Clustering the output

    Homestretch! We’re now able to take the compressed representations of our trajectories and start to cluster them together. As mentioned earlier, a clustering algorithm that does not require a fixed k is preferable.

    The Smile Scala libraryarrow-up-right has a number of clustering methods already available, and we’ll be using it for grouping our trajectories.

    hashtag
    Visualizing the output

    Visualizing the clusters requires one extra step. Our seq2seq autoencoder produces representations with more than 2 or 3 dimensions, meaning you will need an algorithm such as t-SNE to further reduce the dimensionality and generate “coordinates” that can be used for plotting. The pipeline first clusters the encoded representations with G-means, then feeds the output to t-SNE to reduce the dimensionality of each representation so it can be plotted.

    hashtag
    Interpreting the result

    You may be thinking, “do these clusters make sense?” This is where further exploration is required. You’ll need to go back to your clusters, identify the ships belonging to each one, and compare the ships within each cluster. If your encoder and clustering pipeline worked, you’ll notice patterns such as:

    • ships crossing the English channel are grouped together

    • boats parked in marinas also cluster together

    • trans-Atlantic ships also tend to cluster together


    import org.deeplearning4j.nn.graph.ComputationGraph
    import org.deeplearning4j.nn.transferlearning.TransferLearning
    import org.deeplearning4j.nn.api.OptimizationAlgorithm
    import org.deeplearning4j.nn.weights.WeightInit
    import org.deeplearning4j.nn.conf._
    import org.deeplearning4j.nn.conf.layers._
    import org.deeplearning4j.nn.conf.graph.rnn._
    import org.deeplearning4j.nn.conf.inputs.InputType
    import org.deeplearning4j.nn.conf.WorkspaceMode
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener
    import org.deeplearning4j.datasets.iterator.MultipleEpochsIterator
    import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator
    import org.deeplearning4j.util.ModelSerializer
    
    import org.datavec.api.transform._
    import org.datavec.api.transform.transform.time.StringToTimeTransform
    import org.datavec.api.transform.sequence.comparator.NumericalColumnComparator
    import org.datavec.api.transform.transform.string.ConcatenateStringColumns
    import org.datavec.api.transform.transform.doubletransform.MinMaxNormalizer
    import org.datavec.api.transform.schema.Schema
    import org.datavec.api.transform.metadata.StringMetaData
    import org.datavec.api.records.reader.impl.csv.CSVRecordReader
    import org.datavec.api.split.FileSplit
    import org.datavec.spark.storage.SparkStorageUtils
    import org.datavec.spark.transform.misc.StringToWritablesFunction
    import org.datavec.spark.transform.SparkTransformExecutor
    import org.datavec.api.transform.condition._
    import org.datavec.api.transform.condition.column._
    import org.datavec.api.transform.sequence.window.ReduceSequenceByWindowTransform
    import org.datavec.api.transform.reduce.Reducer
    import org.datavec.api.transform.reduce.AggregableColumnReduction
    import org.datavec.api.transform.sequence.window.TimeWindowFunction
    import org.datavec.api.transform.ops.IAggregableReduceOp
    import org.datavec.api.transform.metadata.ColumnMetaData
    import org.datavec.api.writable._
    import org.datavec.hadoop.records.reader.mapfile.MapFileSequenceRecordReader
    import org.datavec.api.util.ArchiveUtils
    
    import org.nd4j.linalg.api.ndarray.INDArray
    import org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor
    import org.nd4j.linalg.dataset.api.MultiDataSet
    import org.nd4j.linalg.lossfunctions.LossFunctions
    import org.nd4j.linalg.activations.Activation
    import org.nd4j.linalg.learning.config._
    import org.nd4j.linalg.factory.Nd4j
    import org.nd4j.linalg.indexing.BooleanIndexing
    import org.nd4j.linalg.indexing.INDArrayIndex
    import org.nd4j.linalg.indexing.NDArrayIndex._
    import org.nd4j.linalg.indexing.conditions.Conditions
    
    import org.apache.spark.api.java.function.Function
    import org.apache.commons.io.FileUtils
    import org.joda.time.DateTimeZone
    import org.joda.time.format.DateTimeFormat
    
    import scala.collection.JavaConversions._
    import scala.collection.JavaConverters._
    import scala.io.Source
    import java.util.Random
    import java.util.concurrent.TimeUnit
    import java.io._
    import java.net.URL
    val cache = new File(System.getProperty("user.home"), "/.deeplearning4j")
    val dataFile = new File(cache, "/aisdk_20171001.csv")
    
    if(!dataFile.exists()) {
        val remote = "https://dl4jdata.blob.core.windows.net/datasets/aisdk_20171001.csv.zip"
        val tmpZip = new File(cache, "aisdk_20171001.csv.zip")
        tmpZip.delete() // prevents errors
        println("Downloading file...")
        FileUtils.copyURLToFile(new URL(remote), tmpZip)
        println("Decompressing file...")
        ArchiveUtils.unzipFileTo(tmpZip.getAbsolutePath(), cache.getAbsolutePath())
        tmpZip.delete()
        println("Done.")
    } else {
        println("File already exists.")
    }
    val raw = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true") // Use first line of all files as header
        .option("inferSchema", "true") // Automatically infer data types
        .load(dataFile.getAbsolutePath)
    
    import org.apache.spark.sql.functions._
    
    val positions = raw
        .withColumn("Timestamp", unix_timestamp(raw("# Timestamp"), "dd/MM/yyyy HH:mm:ss")) // use yyyy (calendar year); YYYY is week year and can parse incorrectly
        .select("Timestamp","MMSI","Longitude","Latitude")
        
    positions.printSchema    
    positions.registerTempTable("positions")
    val sequences = positions
        .rdd
        .map( row => (row.getInt(1), (row.getLong(0), (row.getDouble(3), row.getDouble(2)))) ) // a tuple of ship ID and timed coordinates
        .groupBy(_._1)
        .map( group => (group._1, group._2.map(pos => pos._2).toSeq.sortBy(_._1)))
        
    case class Stats(numPositions: Int, minTime: Long, maxTime: Long, totalTime: Long)
    
    val stats = sequences
        .map { seq => 
            val timestamps = seq._2.map(_._1).toArray
            Stats(seq._2.size, timestamps.min, timestamps.max, (timestamps.max-timestamps.min))
        }
        .toDF()
    stats.registerTempTable("stats")
    // our reduction op class that we will need shortly
    // due to interpreter restrictions, we put this inside an object
    object Reductions extends Serializable {
        class GeoAveragingReduction(val columnOutputName: String="AveragedLatLon", val delim: String=",") extends AggregableColumnReduction {
            
            override def reduceOp(): IAggregableReduceOp[Writable, java.util.List[Writable]] = {
                new AverageCoordinateReduceOp(delim)
            }
            
            override def getColumnsOutputName(inputName: String): java.util.List[String] = List(columnOutputName)
            
            override def getColumnOutputMetaData(newColumnName: java.util.List[String], columnInputMeta: ColumnMetaData): java.util.List[ColumnMetaData] = 
                List(new StringMetaData(columnOutputName))
            
            override def transform(inputSchema: Schema) = inputSchema
            
            override def outputColumnName: String = null
            
            override def outputColumnNames: Array[String] = new Array[String](0)
            
            override def columnNames: Array[String] = new Array[String](0)
            
            override def columnName: String = null
              
            def getInputSchema(): org.datavec.api.transform.schema.Schema = ???
            
            def setInputSchema(x$1: org.datavec.api.transform.schema.Schema): Unit = ???
        }
        
        class AverageCoordinateReduceOp(val delim: String) extends IAggregableReduceOp[Writable, java.util.List[Writable]] {
            final val PI_180 = Math.PI / 180.0
    
            var sumx = 0.0
            var sumy = 0.0
            var sumz = 0.0
            var count = 0
            
            override def combine[W <: IAggregableReduceOp[Writable, java.util.List[Writable]]](accu: W): Unit = {
              if (accu.isInstanceOf[AverageCoordinateReduceOp]) {
                val r: AverageCoordinateReduceOp =
                  accu.asInstanceOf[AverageCoordinateReduceOp]
                sumx += r.sumx
                sumy += r.sumy
                sumz += r.sumz
                count += r.count
              } else {
                throw new IllegalStateException(
                  "Cannot combine type of class: " + accu.getClass)
              }
            }
    
            override def accept(writable: Writable): Unit = {
              val str: String = writable.toString
              val split: Array[String] = str.split(delim)
              if (split.length != 2) {
                throw new IllegalStateException(
                  "Could not parse lat/long string: \"" + str + "\"")
              }
              val latDeg: Double = java.lang.Double.parseDouble(split(0))
              val longDeg: Double = java.lang.Double.parseDouble(split(1))
              val lat: Double = latDeg * PI_180
              val lng: Double = longDeg * PI_180
              val x: Double = Math.cos(lat) * Math.cos(lng)
              val y: Double = Math.cos(lat) * Math.sin(lng)
              val z: Double = Math.sin(lat)
              sumx += x
              sumy += y
              sumz += z
              count += 1
            }
    
            override def get(): java.util.List[Writable] = {
              val x: Double = sumx / count
              val y: Double = sumy / count
              val z: Double = sumz / count
              val longRad: Double = Math.atan2(y, x)
              val hyp: Double = Math.sqrt(x * x + y * y)
              val latRad: Double = Math.atan2(z, hyp)
              val latDeg: Double = latRad / PI_180
              val longDeg: Double = longRad / PI_180
              val str: String = latDeg + delim + longDeg
              List(new Text(str))
            }
        }
    }
    // note the column names don't exactly match, we are arbitrarily assigning them
    val schema = new Schema.Builder()
        .addColumnsString("Timestamp")
        .addColumnCategorical("VesselType")
        .addColumnsString("MMSI")
        .addColumnsDouble("Lat","Lon") // will convert to Double later
        .addColumnCategorical("Status")
        .addColumnsDouble("ROT","SOG","COG")
        .addColumnInteger("Heading")
        .addColumnsString("IMO","Callsign","Name")
        .addColumnCategorical("ShipType","CargoType")
        .addColumnsInteger("Width","Length")
        .addColumnCategorical("FixingDevice")
        .addColumnDouble("Draught")
        .addColumnsString("Destination","ETA")
        .addColumnCategorical("SourceType")
        .addColumnsString("end")
        .build()
        
    val transform = new TransformProcess.Builder(schema)
        .removeAllColumnsExceptFor("Timestamp","MMSI","Lat","Lon")
        .filter(BooleanCondition.OR(new DoubleColumnCondition("Lat",ConditionOp.GreaterThan,90.0), new DoubleColumnCondition("Lat",ConditionOp.LessThan,-90.0))) // remove erroneous lat
        .filter(BooleanCondition.OR(new DoubleColumnCondition("Lon",ConditionOp.GreaterThan,180.0), new DoubleColumnCondition("Lon",ConditionOp.LessThan,-180.0))) // remove erroneous lon
        .transform(new MinMaxNormalizer("Lat", -90.0, 90.0, 0.0, 1.0))
        .transform(new MinMaxNormalizer("Lon", -180.0, 180.0, 0.0, 1.0))
        .convertToString("Lat")
        .convertToString("Lon")
        .transform(new StringToTimeTransform("Timestamp","dd/MM/YYYY HH:mm:ss",DateTimeZone.UTC))
        .transform(new ConcatenateStringColumns("LatLon", ",", List("Lat","Lon")))
        .convertToSequence("MMSI", new NumericalColumnComparator("Timestamp", true))
        .transform(
            new ReduceSequenceByWindowTransform(
                new Reducer.Builder(ReduceOp.Count).keyColumns("MMSI")
                    .countColumns("Timestamp")
                    .customReduction("LatLon", new Reductions.GeoAveragingReduction("LatLon"))
                    .takeFirstColumns("Timestamp")
                    .build(),
                new TimeWindowFunction.Builder()
                    .timeColumn("Timestamp")
                    .windowSize(1L ,TimeUnit.HOURS)
                    .excludeEmptyWindows(true)
                    .build()
            )
        )
        .removeAllColumnsExceptFor("LatLon")
        .build
        
    // note we temporarily switch between java/scala APIs for convenience
    val rawData = sc
        .textFile(dataFile.getAbsolutePath)
        .filter(row => !row.startsWith("# Timestamp")) // filter out the header  
        .toJavaRDD // datavec API uses Spark's Java API
        .map(new StringToWritablesFunction(new CSVRecordReader()))
        
    // once transform is applied, filter sequences we consider "too short"
    // decombine lat/lon then convert to arrays and split, then convert back to java APIs
    val records = SparkTransformExecutor
        .executeToSequence(rawData,transform)
        .rdd
        .filter( seq => seq.size() > 7)
        .map{ row: java.util.List[java.util.List[Writable]] =>
            row.map{ seq => seq.map(_.toString).map(_.split(",").toList.map(coord => new DoubleWritable(coord.toDouble).asInstanceOf[Writable])).flatten }
        }
        .map(_.toList.map(_.asJava).asJava)
        .toJavaRDD
    
    val split = records.randomSplit(Array[Double](0.8,0.2))
        
    val trainSequences = split(0)
    val testSequences = split(1)
    // for purposes of this notebook, you only need to run this block once
    val trainFiles = new File(cache, "/ais_trajectories_train/")
    val testFiles = new File(cache, "/ais_trajectories_test/")
    
    // if you want to delete previously saved data
    // FileUtils.deleteDirectory(trainFiles)
    // FileUtils.deleteDirectory(testFiles)
    
    if(!trainFiles.exists()) SparkStorageUtils.saveMapFileSequences( trainFiles.getAbsolutePath(), trainSequences )
    if(!testFiles.exists()) SparkStorageUtils.saveMapFileSequences( testFiles.getAbsolutePath(), testSequences )
    // set up record readers that will read the features from disk
    val batchSize = 48
    
    // this preprocessor allows for insertion of GO/STOP tokens for the RNN
    object Preprocessor extends Serializable {
        class Seq2SeqAutoencoderPreProcessor extends MultiDataSetPreProcessor {
    
            override def preProcess(mds: MultiDataSet): Unit = {
                val input: INDArray = mds.getFeatures(0)
                val features: Array[INDArray] = Array.ofDim[INDArray](2)
                val labels: Array[INDArray] = Array.ofDim[INDArray](1)
                
                features(0) = input
                
                val mb: Int = input.size(0)
                val nClasses: Int = input.size(1)
                val origMaxTsLength: Int = input.size(2)
                val goStopTokenPos: Int = nClasses
                
                //1 new class, for GO/STOP. And one new time step for it also
                val newShape: Array[Int] = Array(mb, nClasses + 1, origMaxTsLength + 1)
                features(1) = Nd4j.create(newShape:_*)
                labels(0) = Nd4j.create(newShape:_*)
                //Create features. Append existing at time 1 to end. Put GO token at time 0
                features(1).put(Array[INDArrayIndex](all(), interval(0, input.size(1)), interval(1, newShape(2))), input)
                //Set GO token at the first time step only
                features(1).get(all(), point(goStopTokenPos), point(0)).assign(1)
                //Create labels. Append existing at time 0 to end-1. Put STOP token at last time step - **Accounting for variable length / masks**
                labels(0).put(Array[INDArrayIndex](all(), interval(0, input.size(1)), interval(0, newShape(2) - 1)), input)
                
                var lastTimeStepPos: Array[Int] = null
                
                if (mds.getFeaturesMaskArray(0) == null) {//No masks
                    lastTimeStepPos = Array.ofDim[Int](input.size(0))
                    for (i <- 0 until lastTimeStepPos.length) {
                      lastTimeStepPos(i) = input.size(2) - 1
                    }
                } else {
                    val fm: INDArray = mds.getFeaturesMaskArray(0)
                    val lastIdx: INDArray = BooleanIndexing.lastIndex(fm, Conditions.notEquals(0), 1)
                    lastTimeStepPos = lastIdx.data().asInt()
                }
                for (i <- 0 until lastTimeStepPos.length) {
                    labels(0).putScalar(i, goStopTokenPos, lastTimeStepPos(i), 1.0)
                }
                //In practice: Just need to append an extra 1 at the start (as all existing time series are now 1 step longer)
                var featureMasks: Array[INDArray] = null
                var labelsMasks: Array[INDArray] = null
                
                if (mds.getFeaturesMaskArray(0) != null) {//Masks are present - variable length
                    featureMasks = Array.ofDim[INDArray](2)
                    featureMasks(0) = mds.getFeaturesMaskArray(0)
                    labelsMasks = Array.ofDim[INDArray](1)
                    val newMask: INDArray = Nd4j.hstack(Nd4j.ones(mb, 1), mds.getFeaturesMaskArray(0))
                    // println(mds.getFeaturesMaskArray(0).shape())
                    // println(newMask.shape())
                    featureMasks(1) = newMask
                    labelsMasks(0) = newMask
                } else {
                    //All same length
                    featureMasks = null
                    labelsMasks = null
                }
                //Same for labels
                mds.setFeatures(features)
                mds.setLabels(labels)
                mds.setFeaturesMaskArrays(featureMasks)
                mds.setLabelsMaskArray(labelsMasks)
            }
            
        }
    }
    
    // because this is an autoencoder, features = labels
    val trainRR = new MapFileSequenceRecordReader()
    trainRR.initialize(new FileSplit(trainFiles))
    val trainIter = new RecordReaderMultiDataSetIterator.Builder(batchSize)
                .addSequenceReader("records", trainRR)
                .addInput("records")
                .build()
    trainIter.setPreProcessor(new Preprocessor.Seq2SeqAutoencoderPreProcessor)
                
    val testRR = new MapFileSequenceRecordReader()
    testRR.initialize(new FileSplit(testFiles))
    val testIter = new RecordReaderMultiDataSetIterator.Builder(batchSize)
                .addSequenceReader("records", testRR)
                .addInput("records")
                .build()
    testIter.setPreProcessor(new Preprocessor.Seq2SeqAutoencoderPreProcessor)
    val conf = new NeuralNetConfiguration.Builder()
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                    .iterations(1)
                    .seed(123)
                    .regularization(true)
                    .l2(0.001)
                    .weightInit(WeightInit.XAVIER)
                    .updater(new AdaDelta())
                    .inferenceWorkspaceMode(WorkspaceMode.SINGLE)
                    .trainingWorkspaceMode(WorkspaceMode.SINGLE)
                    .graphBuilder()
                    .addInputs("encoderInput","decoderInput")
                    .setInputTypes(InputType.recurrent(2), InputType.recurrent(3))
                    .addLayer("encoder", new GravesLSTM.Builder().nOut(96).activation(Activation.TANH).build(), "encoderInput")
                    .addLayer("encoder2", new GravesLSTM.Builder().nOut(48).activation(Activation.TANH).build(), "encoder")
                    .addVertex("laststep", new LastTimeStepVertex("encoderInput"), "encoder2")
                    .addVertex("dup", new DuplicateToTimeSeriesVertex("decoderInput"), "laststep")
                    .addLayer("decoder", new GravesLSTM.Builder().nOut(48).activation(Activation.TANH).build(), "decoderInput", "dup")
                    .addLayer("decoder2", new GravesLSTM.Builder().nOut(96).activation(Activation.TANH).build(), "decoder")
                    .addLayer("output", new RnnOutputLayer.Builder().lossFunction(LossFunctions.LossFunction.MSE).activation(Activation.SIGMOID).nOut(3).build(), "decoder2")
                    .setOutputs("output")
                    .build()
        
    val net = new ComputationGraph(conf)
    net.setListeners(new ScoreIterationListener(1))
    // pass the training iterator to fit() to watch the network learn one epoch of training
    net.fit(trainIter)
    // we will pass our training data to an iterator that can handle multiple epochs of training
    val numEpochs = 150
    
    (1 to numEpochs).foreach { i =>
        net.fit(trainIter)
        println(s"Finished epoch $i")
    }
    val modelFile = new File(cache, "/seq2seqautoencoder.zip")
    // write to disk
    ModelSerializer.writeModel(net, modelFile, false)
    // restore from disk
    val net = ModelSerializer.restoreComputationGraph(modelFile)
    def arr2Dub(arr: INDArray): Array[Double] = arr.dup().data().asDouble()
    val format = new java.text.DecimalFormat("#.##")
    
    testIter.reset()
    (0 to 10).foreach{ i =>
        val mds = testIter.next(1)
        val reconstructionError = net.score(mds)
        val output = net.feedForward(mds.getFeatures(), false)
        val feat = arr2Dub(mds.getFeatures(0))
        val orig = feat.map(format.format(_)).mkString(",")
        val recon = arr2Dub(output.get("output")).map(format.format(_)).take(feat.size).mkString(",")
        
        println(s"Reconstruction error for example $i is $reconstructionError")
        println(s"Original array:        $orig")
        println(s"Reconstructed array:   $recon")
    }
    // use the GraphBuilder when your network is a ComputationGraph
    val encoder = new TransferLearning.GraphBuilder(net)
        .setFeatureExtractor("laststep")
        .removeVertexAndConnections("decoder-merge")
        .removeVertexAndConnections("decoder")
        .removeVertexAndConnections("decoder2")
        .removeVertexAndConnections("output")
        .removeVertexAndConnections("dup")
        .addLayer("output", new ActivationLayer.Builder().activation(Activation.IDENTITY).build(), "laststep")
        .setOutputs("output")
        .setInputs("encoderInput")
        .setInputTypes(InputType.recurrent(2))
        .build()
        
    // grab a single batch to test feed forward
    val ds = testIter.next(1)
    val embedding = encoder.feedForward(ds.getFeatures(0), false)
    val shape = embedding.get("output").shape().mkString(",")
    val dsFeat = arr2Dub(ds.getFeatures(0))
    val dsOrig = dsFeat.map(format.format(_)).mkString(",")
    val rep = arr2Dub(embedding.get("output")).map(format.format(_)).take(dsFeat.size).mkString(",")
    
    println(s"Compressed shape:       $shape")
    println(s"Original array:        $dsOrig")
    println(s"Compressed array:      $rep")
    // first we need to grab our representations
    // in a "real world" scenario we'd want something more elegant that preserves our MMSIs
    val dataset = scala.collection.mutable.ListBuffer.empty[Array[Double]]
    testIter.reset()
    
    while(testIter.hasNext()) {
        val ds = testIter.next(1)
        val rep = encoder.feedForward(ds.getFeatures(0), false)
        dataset += rep.get("output").dup.data.asDouble
    }
    import smile.clustering.GMeans
    
    val maxClusterNumber = 1000
    val gmeans = new GMeans(dataset.toArray, maxClusterNumber)
    import smile.manifold.TSNE
    import smile.clustering.GMeans
    
    print("%table x\ty\tgroup\tsize") // this must come before any output
    
    val tsne = new TSNE(dataset.toArray, 2); // 2D plot
    val coordinates = tsne.getCoordinates();
    val gmeans = new GMeans(coordinates, 1000)
    
    (0 to coordinates.length-1).foreach{ i =>
        val x = coordinates(i)(0)
        val y = coordinates(i)(1)
        val label = gmeans.getClusterLabel()(i)
        print(s"$x\t$y\t$label\t1\n")
    }

    Cheat Sheet

    Snippets and links for common functionality in Eclipse Deeplearning4j.

    Quick reference

    Deeplearning4j (and related projects) have a lot of functionality. The goal of this page is to summarize this functionality so users know what exists, and where to find more information.

    Layers

    Feed-Forward Layers

    • DenseLayer - A simple/standard fully-connected layer

    • EmbeddingLayer - Takes positive integer indexes as input, outputs vectors. Only usable as the first layer in a model. Mathematically equivalent (when bias is enabled) to DenseLayer with one-hot input, but more efficient. See also: EmbeddingSequenceLayer.

    Output Layers

    Output layers: usable only as the last layer in a network. Loss functions are set here.

    • OutputLayer - Output layer for standard classification/regression in MLPs/CNNs. Has a fully connected DenseLayer built in. 2d input/output (i.e., row vector per example).

    • LossLayer - Output layer without parameters - only loss function and activation function. 2d input/output (i.e., row vector per example). Unlike OutputLayer, restricted to nIn = nOut.

    • RnnOutputLayer - Output layer for recurrent neural networks. 3d (time series) input and output. Has a time distributed fully connected layer built in.

    Convolutional Layers

    • ConvolutionLayer / Convolution2D - Standard 2d convolutional neural network layer. Inputs and outputs have 4 dimensions with shape [minibatch,depthIn,heightIn,widthIn] and [minibatch,depthOut,heightOut,widthOut] respectively.

    • Convolution1DLayer / Convolution1D - Standard 1d convolution layer

    • Convolution3DLayer / Convolution3D - Standard 3D convolution layer. Supports both NDHWC ("channels last") and NCDHW ("channels first") activations format.

    Recurrent Layers

    • LSTM - LSTM RNN without peephole connections. Supports CuDNN.

    • GravesLSTM - LSTM RNN with peephole connections. Does not support CuDNN (thus for GPUs, LSTM should be used in preference).

    • GravesBidirectionalLSTM - A bidirectional LSTM implementation with peephole connections. Equivalent to Bidirectional(ADD, GravesLSTM). Due to the addition of the Bidirectional wrapper (below), it has been deprecated on master.

    Unsupervised Layers

    • VariationalAutoencoder - A variational autoencoder implementation with MLP/dense layers for the encoder and decoder. Supports multiple different types of reconstruction distributions.

    • AutoEncoder - Standard denoising autoencoder layer

    Other Layers

    • GlobalPoolingLayer - Implements both pooling over time (for RNNs/time series - input size [minibatch, size, timeSeriesLength], out [minibatch, size]) and global spatial pooling (for CNNs - input size [minibatch, depth, h, w], out [minibatch, depth]). Available pooling modes: sum, average, max and p-norm.

    • ActivationLayer - Applies an activation function (only) to the input activations. Note that most DL4J layers have activation functions built in as a config option.

    • DropoutLayer - Implements dropout as a separate/single layer. Note that most DL4J layers have a "built-in" dropout configuration option.
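    As a plain-Java sketch (independent of DL4J and its INDArrays), the "pooling over time" reduction performed by GlobalPoolingLayer for a single example can be written as follows - each feature's values are reduced over the time dimension:

```java
public class GlobalPoolingSketch {
    // Pooling over time for one example: input [size][timeSteps] -> output [size].
    static double[] maxOverTime(double[][] x) {
        double[] out = new double[x.length];
        for (int f = 0; f < x.length; f++) {
            double m = Double.NEGATIVE_INFINITY;
            for (double v : x[f]) m = Math.max(m, v); // reduce over time with max
            out[f] = m;
        }
        return out;
    }

    static double[] avgOverTime(double[][] x) {
        double[] out = new double[x.length];
        for (int f = 0; f < x.length; f++) {
            double s = 0.0;
            for (double v : x[f]) s += v; // reduce over time with average
            out[f] = s / x[f].length;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 5.0, 3.0}, {2.0, 2.0, 8.0}}; // 2 features, 3 time steps
        System.out.println(java.util.Arrays.toString(maxOverTime(x))); // [5.0, 8.0]
        System.out.println(java.util.Arrays.toString(avgOverTime(x))); // [3.0, 4.0]
    }
}
```

    The actual layer applies the same reduction across a whole minibatch (and supports masking); sum and p-norm modes follow the same pattern with a different reduction.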

    Graph Vertices

    Graph vertex: use with ComputationGraph. Similar to layers, vertices usually don't have any parameters, and may support multiple inputs.

    • ElementWiseVertex - Performs an element-wise operation on the inputs - add, subtract, product, average, max

    • L2NormalizeVertex - Normalizes the input activations by dividing by the L2 norm for each example, i.e., out <- out / l2Norm(out)

    • L2Vertex - Calculates the L2 distance between the two input arrays, for each example separately. Output is a single value per example.
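    The per-example normalization performed by L2NormalizeVertex can be sketched in plain Java (illustrative only; the vertex operates on INDArrays and also handles the backward pass):

```java
public class L2NormalizeSketch {
    // Normalizes one example (row) by its L2 norm: out = x / l2Norm(x)
    static double[] l2Normalize(double[] row) {
        double sumSq = 0.0;
        for (double v : row) sumSq += v * v;
        double norm = Math.sqrt(sumSq);
        double[] out = new double[row.length];
        for (int i = 0; i < row.length; i++) out[i] = row[i] / norm;
        return out;
    }

    public static void main(String[] args) {
        double[] out = l2Normalize(new double[]{3.0, 4.0}); // norm is 5
        System.out.println(out[0] + ", " + out[1]);         // 0.6, 0.8
    }
}
```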

    InputPreProcessors

    An InputPreProcessor is a simple class/interface that operates on the input to a layer. That is, a preprocessor is attached to a layer, and performs some operation on the input before passing it to the layer. Preprocessors also handle backpropagation - i.e., the preprocessing operations are generally differentiable.

    Note that in many cases (such as the XtoYPreProcessor classes), users won't need to (and shouldn't) add these manually, and can instead just use .setInputType(InputType.feedForward(10)) or similar, which will infer and add the preprocessors as required.

    • CnnToFeedForwardPreProcessor - Handles the activation reshaping necessary to transition from a CNN layer (ConvolutionLayer, SubsamplingLayer, etc.) to DenseLayer/OutputLayer etc.

    • CnnToRnnPreProcessor - Handles the reshaping necessary to transition from a (effectively, time distributed) CNN layer to a RNN layer.

    • ComposableInputPreProcessor - Simple class that allows multiple preprocessors to be chained and used on a single layer

    Iteration/Training Listeners

    IterationListener: can be attached to a model, and is called during training, once after every iteration (i.e., after each parameter update). TrainingListener: extends IterationListener, adding a number of additional methods that are called at different stages of training - i.e., after the forward pass, after gradient calculation, at the start/end of each epoch, etc.

    Neither type (iteration/training) is called outside of training (i.e., during output or feed-forward methods).

    • ScoreIterationListener - (Javadoc) - Logs the loss function score every N training iterations

    • PerformanceListener - (Javadoc) - Logs performance (examples per sec, minibatches per sec, ETL time), and optionally score, every N training iterations.

    • EvaluativeListener - (Javadoc) - Evaluates network performance on a test set every N iterations or epochs. Also has a system for callbacks, to (for example) save the evaluation results.

    Evaluation


    ND4J has a number of classes for evaluating the performance of a network, against a test set. Deeplearning4j (and SameDiff) use these ND4J evaluation classes. Different evaluation classes are suitable for different types of networks. Note: in 1.0.0-beta3 (November 2018), all evaluation classes were moved from DL4J to ND4J; previously they were in DL4J.

    • Evaluation - Used for the evaluation of multi-class classifiers (assumes standard one-hot labels, and softmax probability distribution over N classes for predictions). Calculates a number of metrics - accuracy, precision, recall, F1, F-beta, Matthews correlation coefficient, confusion matrix. Optionally calculates top N accuracy, custom binary decision thresholds, and cost arrays (for the non-binary case). Typically used for softmax + mcxent/negative-log-likelihood networks.

    • EvaluationBinary - A multi-label binary version of the Evaluation class. Each network output is assumed to be a separate/independent binary class, with probability 0 to 1 independent of all other outputs. Typically used for sigmoid + binary cross entropy networks.

    • EvaluationCalibration
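    As a plain-Java illustration of the core metrics these evaluation classes report (binary case shown for simplicity; not DL4J's actual implementation), precision, recall and F1 are simple functions of the confusion-matrix counts:

```java
public class EvalMetricsSketch {
    // precision = TP / (TP + FP): of everything predicted positive, how much was correct
    static double precision(int tp, int fp) { return tp / (double) (tp + fp); }

    // recall = TP / (TP + FN): of everything actually positive, how much was found
    static double recall(int tp, int fn) { return tp / (double) (tp + fn); }

    // F1 = harmonic mean of precision and recall
    static double f1(int tp, int fp, int fn) {
        double p = precision(tp, fp), r = recall(tp, fn);
        return 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // e.g. 8 true positives, 2 false positives, 2 false negatives
        System.out.println(precision(8, 2)); // 0.8
        System.out.println(recall(8, 2));    // 0.8
        System.out.println(f1(8, 2, 2));     // 0.8
    }
}
```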

    Saving and Loading Models

    MultiLayerNetwork.save(File) and MultiLayerNetwork.load(File) methods can be used to save and load models. These use ModelSerializer internally. Similar save/load methods are also available for ComputationGraph.

    MultiLayerNetwork and ComputationGraph can be saved using the ModelSerializer class - specifically, the writeModel, restoreMultiLayerNetwork and restoreComputationGraph methods.

    Networks can be trained further after saving and loading: however, be sure to load the 'updater' (i.e., the historical state for updaters like momentum, etc.). If no further training is required, the updater state can be omitted to save disk space and memory.

    Most Normalizers (implementing the ND4J Normalizer interface) can also be added to a model using the addNormalizerToModel method.

    Note that the format used for models in DL4J is .zip: it's possible to open/extract these files using programs supporting the zip format.

    Network Configurations

    This section lists the various configuration options that Deeplearning4j supports.

    Activation Functions

    Activation functions can be defined in one of two ways: (a) By passing an enumeration value to the configuration - for example, .activation(Activation.TANH) (b) By passing an instance - for example, .activation(new ActivationSigmoid())

    Note that Deeplearning4j supports custom activation functions, which can be defined by extending BaseActivationFunction.

    List of supported activation functions:

    • CUBE - f(x) = x^3

    • ELU - Exponential linear unit

    • HARDSIGMOID - A piecewise linear version of the standard sigmoid activation function. f(x) = min(1, max(0, 0.2*x + 0.5))
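    The formulas listed above can be sketched directly in plain Java (illustrative only; in DL4J these are provided via the Activation enumeration and applied element-wise to INDArrays):

```java
public class ActivationSketch {
    // CUBE: f(x) = x^3
    static double cube(double x) { return x * x * x; }

    // ELU: f(x) = x for x >= 0, alpha * (exp(x) - 1) otherwise
    static double elu(double x, double alpha) {
        return x >= 0 ? x : alpha * (Math.exp(x) - 1.0);
    }

    // HARDSIGMOID: f(x) = min(1, max(0, 0.2*x + 0.5))
    static double hardSigmoid(double x) {
        return Math.min(1.0, Math.max(0.0, 0.2 * x + 0.5));
    }

    public static void main(String[] args) {
        System.out.println(cube(2.0));        // 8.0
        System.out.println(hardSigmoid(0.0)); // 0.5
        System.out.println(elu(-1.0, 1.0));   // exp(-1) - 1
    }
}
```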

    Weight Initialization

    Weight initialization refers to the method by which the initial parameters for a new network should be set.

    Weight initializations are usually defined using the WeightInit enumeration.

    Custom weight initializations can be specified using .weightInit(WeightInit.DISTRIBUTION).dist(new NormalDistribution(0, 1)) for example. As of master (but not the 0.9.1 release) .weightInit(new NormalDistribution(0, 1)) is also possible, which is equivalent to the previous approach.

    Available weight initializations. Note again that not all are available in the 0.9.1 release:

    • DISTRIBUTION: Sample weights from a provided distribution (specified via the dist configuration method)

    • ZERO: Generate weights as zeros

    • ONES: All weights are set to 1

    Updaters (Optimizers)

    An 'updater' in DL4J is a class that takes raw gradients and modifies them to become updates. These updates will then be applied to the network parameters. The CS231n course notes have a good explanation of some of these updaters.

    Supported updaters in Deeplearning4j:

    • AdaDelta

    • AdaGrad

    • AdaMax - A variant of the Adam updater

    Learning Rate Schedules

    All updaters that support a learning rate also support learning rate schedules (the Nesterov momentum updater also supports a momentum schedule). Learning rate schedules can be specified either based on the number of iterations, or the number of epochs that have elapsed. Dropout (see below) can also make use of the schedules listed here.

    Configure using, for example: .updater(new Adam(new ExponentialSchedule(ScheduleType.ITERATION, 0.1, 0.99 ))) You can plot/inspect the learning rate that will be used at any point by calling ISchedule.valueAt(int iteration, int epoch) on the schedule object you have created.

    Available schedules:

    • ExponentialSchedule - Implements value(i) = initialValue * gamma^i

    • InverseSchedule - Implements value(i) = initialValue * (1 + gamma * i)^(-power)

    • MapSchedule - Learning rate schedule based on a user-provided map. Note that the provided map must have a value for iteration/epoch 0. Has a builder class to conveniently define a schedule.

    Note that custom schedules can be created by implementing the ISchedule interface.
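    As a plain-Java sketch of the two closed-form schedules above (illustrative only; not the actual ISchedule implementations), the learning rate at iteration/epoch i is:

```java
public class ScheduleSketch {
    // ExponentialSchedule: value(i) = initialValue * gamma^i
    static double exponential(double initialValue, double gamma, int i) {
        return initialValue * Math.pow(gamma, i);
    }

    // InverseSchedule: value(i) = initialValue * (1 + gamma * i)^(-power)
    static double inverse(double initialValue, double gamma, double power, int i) {
        return initialValue * Math.pow(1.0 + gamma * i, -power);
    }

    public static void main(String[] args) {
        // inspect the decay over the first few iterations, as ISchedule.valueAt would
        for (int i = 0; i < 5; i++) {
            System.out.printf("iter %d: exp=%.6f inv=%.6f%n", i,
                    exponential(0.1, 0.99, i), inverse(0.1, 0.5, 1.0, i));
        }
    }
}
```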

    Regularization

    L1/L2 Regularization

    L1 and L2 regularization can easily be added to a network via the configuration: .l1(0.1).l2(0.2). Note that .regularization(true) must also be enabled on 0.9.1 (this option was removed after the 0.9.1 release).

    L1 and L2 regularization is applied by default on the weight parameters only. That is, .l1 and .l2 will not impact bias parameters - these can be regularized using .l1Bias(0.1).l2Bias(0.2).

    Dropout

    All dropout types are applied at training time only. They are not applied at test time.

    • Dropout - Each input activation x is independently set to (0, with probability 1-p) or (x/p, with probability p)

    • GaussianDropout - A multiplicative Gaussian noise (mean 1) on the input activations. Each input activation x is independently set to: x * y, where y ~ N(1, stdev = sqrt((1-rate)/rate))

    • GaussianNoise

    Note that (as of current master - but not 0.9.1) the dropout parameters can also be specified according to any of the schedule classes mentioned in the Learning Rate Schedules section.
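    The "inverted" dropout scheme described above (keep with probability p, scale to x/p so the expected activation is unchanged) can be sketched in plain Java - illustrative only, since DL4J's Dropout class handles this internally on INDArrays:

```java
public class DropoutSketch {
    // Each activation is independently kept (and scaled to x/p) with probability p,
    // or zeroed with probability 1-p. Applied at training time only.
    static double[] dropout(double[] x, double p, java.util.Random rng) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = rng.nextDouble() < p ? x[i] / p : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = dropout(x, 0.5, new java.util.Random(42));
        // each output is either 0 or twice the input (x / 0.5)
        System.out.println(java.util.Arrays.toString(y));
    }
}
```

    With p = 1.0 no activations are dropped, which matches "no dropout" at test time.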

    Weight Noise (DropConnect and WeightNoise)

    As with dropout, dropconnect / weight noise is applied only at training time.

    • DropConnect - DropConnect is similar to dropout, but applied to the parameters of a network (instead of the input activations).

    • WeightNoise - Applies noise of the specified distribution to the weights at training time. Both additive and multiplicative modes are supported - when additive, noise should be mean 0; when multiplicative, noise should be mean 1.

    Constraints

    Constraints are deterministic limitations that are placed on a model's parameters at the end of each iteration (after the parameter update has occurred). They can be thought of as a type of regularization.

    • MaxNormConstraint - Constrain the maximum L2 norm of the incoming weights for each unit to be less than or equal to the specified value. If the L2 norm exceeds the specified value, the weights will be scaled down to satisfy the constraint.

    • MinMaxNormConstraint - Constrain the minimum AND maximum L2 norm of the incoming weights for each unit to be between the specified values. Weights will be scaled up/down if required.

    • NonNegativeConstraint - Constrain all parameters to be non-negative. Negative parameters will be replaced with 0.
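    The max-norm rule described above can be sketched for a single unit's weight vector in plain Java (illustrative only; DL4J's constraint classes apply this per unit, on INDArrays, after each parameter update):

```java
public class MaxNormSketch {
    static double norm(double[] w) {
        double s = 0.0;
        for (double v : w) s += v * v;
        return Math.sqrt(s);
    }

    // If the L2 norm exceeds maxNorm, rescale the vector so its norm equals maxNorm;
    // otherwise leave it unchanged.
    static double[] applyMaxNorm(double[] w, double maxNorm) {
        double n = norm(w);
        if (n <= maxNorm) return w.clone();
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) out[i] = w[i] * maxNorm / n;
        return out;
    }

    public static void main(String[] args) {
        double[] w = {3.0, 4.0}; // norm 5
        System.out.println(norm(applyMaxNorm(w, 2.0))); // scaled down to norm 2
    }
}
```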

    Data Classes

    Iterators

    DataSetIterator is an abstraction that DL4J uses to iterate over minibatches of data, used for training. DataSetIterator returns DataSet objects, which are minibatches, and support a maximum of 1 input and 1 output array (INDArray).

    MultiDataSetIterator is similar to DataSetIterator, but returns MultiDataSet objects, which can have as many input and output arrays as required for the network.

    Iterators - Built-In (DL4J-Provided Data)

    These iterators download their data as required. The actual datasets they return are not customizable.

    • MnistDataSetIterator - DataSetIterator for the well-known MNIST digits dataset. By default, returns a row vector (1x784), with values normalized to the 0 to 1 range. Use .setInputType(InputType.convolutionalFlat()) to use with CNNs.

    • EmnistDataSetIterator - Similar to the MNIST digits dataset, but with more examples, and also letters. Includes multiple different splits (letters only, digits only, letters + digits, etc.). Same 1x784 format as MNIST, hence (other than a different number of labels for some splits) it can be used as a drop-in replacement for MnistDataSetIterator.

    Iterators - User Provided Data

    The iterators in this subsection are used with user-provided data.

    • RecordReaderDataSetIterator - An iterator that takes a DataVec record reader (such as CsvRecordReader or ImageRecordReader) and handles conversion to DataSets, batching, masking, etc. One of the most commonly used iterators in DL4J. Handles non-sequence data only, as input (i.e., RecordReader, not SequenceRecordReader).

    • RecordReaderMultiDataSetIterator - The MultiDataSet version of RecordReaderDataSetIterator, that supports multiple readers. Has a builder pattern for creating more complex data pipelines (such as different subsets of a reader's output to different input/output arrays, conversion to one-hot, etc.). Handles both sequence and non-sequence data as input.

    • SequenceRecordReaderDataSetIterator

    Iterators - Adapter and Utility Iterators

    • MultiDataSetIteratorAdapter - Wrap a DataSetIterator to convert it to a MultiDataSetIterator

    • SingletonMultiDataSetIterator - Wrap a MultiDataSet into a MultiDataSetIterator that returns one MultiDataSet (i.e., the wrapped MultiDataSet is not split up)

    • AsyncDataSetIterator - Used automatically by MultiLayerNetwork and ComputationGraph where appropriate. Implements asynchronous prefetching of datasets to improve performance.

    Data Normalization

    ND4J provides a number of classes for performing data normalization. These are implemented as DataSetPreProcessors. The basic pattern for normalization:

    1. Create your (unnormalized) DataSetIterator or MultiDataSetIterator: DataSetIterator myTrainData = ...

    2. Create the normalizer you want to use: NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler();

    3. Fit the normalizer: normalizer.fit(myTrainData)

    In general, you should fit only on the training data, and do trainData.setPreProcessor(normalizer) and testData.setPreProcessor(normalizer) with the same/single normalizer that has been fit on the training data only.

    Note that where appropriate (NormalizerStandardize, NormalizerMinMaxScaler) statistics such as mean/standard-deviation/min/max are shared across time (for time series) and across image x/y locations (but not depth/channels - for image data).

    Data normalization example:
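    As a plain-Java sketch of the fit-then-transform pattern described above (illustrative only; NormalizerMinMaxScaler does this on DataSets/iterators, feature-wise), a min-max scaler collects min/max statistics from the training data only, then applies the same statistics to both train and test data:

```java
public class MinMaxSketch {
    final double min, max;

    // "fit": collect min/max statistics from the training data only
    MinMaxSketch(double[] trainData) {
        double mn = Double.POSITIVE_INFINITY, mx = Double.NEGATIVE_INFINITY;
        for (double v : trainData) {
            mn = Math.min(mn, v);
            mx = Math.max(mx, v);
        }
        this.min = mn;
        this.max = mx;
    }

    // "transform": map a value into [0, 1] using the fitted statistics;
    // the same fitted normalizer is applied to train AND test data
    double transform(double v) {
        return (v - min) / (max - min);
    }

    public static void main(String[] args) {
        double[] train = {10.0, 20.0, 30.0, 40.0};
        MinMaxSketch normalizer = new MinMaxSketch(train);
        System.out.println(normalizer.transform(10.0)); // 0.0
        System.out.println(normalizer.transform(40.0)); // 1.0
        System.out.println(normalizer.transform(25.0)); // 0.5
    }
}
```

    Note that a test value outside the training range would map outside [0, 1] - which is exactly why the normalizer is fit on training data only and then reused, rather than re-fit on the test set.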

    Available normalizers: DataSet / DataSetIterator

    • ImagePreProcessingScaler - Applies min-max scaling to image activations. Default settings map 0-255 input to 0-1 output (but this is configurable). Note that unlike the other normalizers here, this one does not rely on statistics (mean/min/max etc.) collected from the data, hence the normalizer.fit(trainData) step is unnecessary (it is a no-op).

    • NormalizerStandardize - Normalizes each feature value independently (and optionally label values) to have 0 mean and a standard deviation of 1

    • NormalizerMinMaxScaler

    Available normalizers: MultiDataSet / MultiDataSetIterator

    • ImageMultiPreProcessingScaler - A MultiDataSet/MultiDataSetIterator version of ImagePreProcessingScaler

    • MultiNormalizerStandardize - A MultiDataSet/MultiDataSetIterator version of NormalizerStandardize

    • MultiNormalizerMinMaxScaler - A MultiDataSet/MultiDataSetIterator version of NormalizerMinMaxScaler

    Transfer Learning

    Deeplearning4j has classes/utilities for performing transfer learning - i.e., taking an existing network, and modifying some of the layers (optionally freezing others so their parameters don't change). For example, an image classifier could be trained on ImageNet, then applied to a new/different dataset. Both MultiLayerNetwork and ComputationGraph can be used with transfer learning - frequently starting from a pre-trained model from the model zoo (see next section), though any MultiLayerNetwork/ComputationGraph can be used.

    The main class for transfer learning is TransferLearning. This class has a builder pattern that can be used to add/remove layers, freeze layers, etc. FineTuneConfiguration can be used here to specify the learning rate and other settings for the non-frozen layers.

    Trained Model Library - Model Zoo

    Deeplearning4j provides a 'model zoo' - a set of pretrained models that can be downloaded and used either as-is (for image classification, for example) or often for transfer learning.


    Models available in DL4J's model zoo:

    • AlexNet

    • Darknet19

    • FaceNetNN4Small2

    Note: Trained Keras models (not provided by DL4J) may also be imported, using Deeplearning4j's Keras model import functionality.

    Cheat sheet code snippets

    The Eclipse Deeplearning4j libraries come with a lot of functionality, and we've put together this cheat sheet to help users assemble neural networks and use tensors faster.

    Neural networks

    Code for configuring common parameters and layers for both MultiLayerNetwork and ComputationGraph. See the MultiLayerNetwork and ComputationGraph Javadoc for the full API.

    Sequential networks

    Most network configurations can use MultiLayerNetwork class if they are sequential and simple.

    Complex networks

    Networks that have complex graphs and "branching" such as Inception need to use ComputationGraph.

    Training

    The code snippet below creates a basic pipeline that loads images from disk, applies random transformations, and fits them to a neural network. It also sets up a UI instance so you can visualize progress, and uses early stopping to terminate training early. You can adapt this pipeline for many different use cases.

    Complex Transformation

    DataVec comes with a portable TransformProcess class that allows for more complex data wrangling and data conversion. It works well with both 2D and sequence datasets.

    We recommend having a look at the DataVec examples before creating more complex transformations.

    Evaluation

    Both MultiLayerNetwork and ComputationGraph come with built-in .eval() methods that allow you to pass a dataset iterator and return evaluation results.

    For advanced evaluation, the code snippet below can be adapted into training pipelines. This is useful when the built-in neuralNetwork.eval() method outputs confusing results, or when you need to examine raw data.


  • RnnLossLayer - (Sourcearrow-up-right) - The 'no parameter' version of RnnOutputLayer. 3d (time series) input and output.

  • CnnLossLayer - (Sourcearrow-up-right) - Used with CNNs, where a prediction must be made at each spatial location of the output (for example: segmentation or denoising). No parameters, 4d input/output with shape [minibatch, depth, height, width]. When using softmax, this is applied depthwise at each spatial location.

  • Cnn3DLossLayer - (Sourcearrow-up-right) - used with 3D CNNs, where a preduction must be made at each spatial location (x/y/z) of the output. Layer has no parameters, 5d data in either NCDHW or NDHWC ("channels first" or "channels last") format (configurable). Supports masking. When using Softmax, this is applied along channels at each spatial location.

  • Yolo2OutputLayer - (Sourcearrow-up-right) - Implentation of the YOLO 2 model for object detection in images

  • CenterLossOutputLayer - (Sourcearrow-up-right) - A version of OutputLayer that also attempts to minimize the intra-class distance of examples' activations - i.e., "If example x is in class Y, ensure that embedding(x) is close to average(embedding(y)) for all examples y in Y"

  • Deconvolution2DLayer - (Sourcearrow-up-right) - also known as transpose or fractionally strided convolutions. Can be considered a "reversed" ConvolutionLayer; output size is generally larger than the input, whilst maintaining the spatial connection structure.

  • SeparableConvolution2DLayer - (Sourcearrow-up-right) - depthwise separable convolution layer

  • SubsamplingLayer - (Sourcearrow-up-right) - Implements standard 2d spatial pooling for CNNs - with max, average and p-norm pooling available.

  • Subsampling1DLayer - (Sourcearrow-up-right) - 1D version of the subsampling layer.

  • Upsampling2D - (Sourcearrow-up-right) - Upscale CNN activations by repeating the row/column values

  • Upsampling1D - (Sourcearrow-up-right) - 1D version of the upsampling layer

  • Cropping2D - (Sourcearrow-up-right) - Cropping layer for 2D convolutional neural networks

  • DepthwiseConvolution2D (Sourcearrow-up-right)- 2d depthwise convolution layer

  • ZeroPaddingLayer - (Sourcearrow-up-right) - Very simple layer that adds the specified amount of zero padding to edges of the 4d input activations.

  • ZeroPadding1DLayer - (Sourcearrow-up-right) - 1D version of ZeroPaddingLayer

  • SpaceToDepth - (Sourcearrow-up-right) - This operation takes 4D array in, and moves data from spatial dimensions (HW) to channels (C) for given blockSize

  • SpaceToBatch - (Sourcearrow-up-right) - Transforms data from a tensor from 2 spatial dimensions into batch dimension according to the "blocks" specified

  • Bidirectional - (Sourcearrow-up-right) - A 'wrapper' layer - converts any standard uni-directional RNN into a bidirectional RNN (doubles number of params - forward/backward nets have independent parameters). Activations from forward/backward nets may be either added, multiplied, averaged or concatenated.

  • SimpleRnn - (Sourcearrow-up-right) - A standard/'vanilla' RNN layer. Usually not effective in practice with long time series dependencies - LSTM is generally preferred.

  • LastTimeStep - (Sourcearrow-up-right) - A 'wrapper' layer - extracts out the last time step of the (non-bidirectional) RNN layer it wraps. 3d input with shape [minibatch, size, timeSeriesLength], 2d output with shape [minibatch, size].

  • EmbeddingSequenceLayer: (Sourcearrow-up-right) - A version of EmbeddingLayer that expects a fixed-length sequence (inputLength) of integer indices per example as input, each in the range 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Can only be used as the first layer for a network.

  • BatchNormalization - (Sourcearrow-up-right) - Batch normalization for 2d (feedforward), 3d (time series) or 4d (CNN) activations. For time series, parameter sharing across time; for CNNs, parameter sharing across spatial locations (but not depth).

  • LocalResponseNormalization - (Sourcearrow-up-right) - Local response normalization layer for CNNs. Not frequently used in modern CNN architectures.

  • FrozenLayer - (Sourcearrow-up-right) - Usually not used directly by users - added as part of transfer learning, to freeze a layer's parameters such that they don't change during further training.

  • LocallyConnected2D - (Sourcearrow-up-right) - a 2d locally connected layer, assumes input is 4d data in NCHW ("channels first") format.

  • LocallyConnected1D - (Sourcearrow-up-right) - a 1d locally connected layer, assumes input is 3d data in NCW ([minibatch, size, sequenceLength]) format

  • MergeVertex - (Sourcearrow-up-right) - merge the input activations along dimension 1, to make a larger output array. For CNNs, this implements merging along the depth/channels dimension

  • PreprocessorVertex - (Sourcearrow-up-right) - a simple GraphVertex that contains an InputPreProcessor only

  • ReshapeVertex - (Sourcearrow-up-right) - Performs arbitrary activation array reshaping. The preprocessors in the next section should usually be preferred.

  • ScaleVertex - (Sourcearrow-up-right) - implements simple multiplicative scaling of the inputs - i.e., out = scalar * input

  • ShiftVertex - (Sourcearrow-up-right) - implements simple scalar element-wise addition on the inputs - i.e., out = input + scalar

  • StackVertex - (Sourcearrow-up-right) - used to stack all inputs along the minibatch dimension. Analogous to MergeVertex, but along dimension 0 (minibatch) instead of dimension 1 (nOut/channels)

  • SubsetVertex - (Sourcearrow-up-right) - used to get a contiguous subset of the input activations along dimension 1. For example, two SubsetVertex instances could be used to split the activations from an input array into two separate activations. Essentially the opposite of MergeVertex.

  • UnstackVertex - (Sourcearrow-up-right) - similar to SubsetVertex, but along dimension 0 (minibatch) instead of dimension 1 (nOut/channels). Opposite of StackVertex
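
The dimension distinction drawn above (MergeVertex along dimension 1, StackVertex along dimension 0) can be illustrated with plain 2d arrays of shape [minibatch][features]. This is an illustrative plain-Java sketch of the shape semantics only, not DL4J code:

```java
// Illustrative sketch: merging concatenates along dimension 1 (features/channels),
// stacking concatenates along dimension 0 (minibatch).
public class VertexShapeDemo {
    // MergeVertex-style: same minibatch size, features are concatenated per example
    static double[][] mergeDim1(double[][] a, double[][] b) {
        double[][] out = new double[a.length][a[0].length + b[0].length];
        for (int i = 0; i < a.length; i++) {
            System.arraycopy(a[i], 0, out[i], 0, a[0].length);
            System.arraycopy(b[i], 0, out[i], a[0].length, b[0].length);
        }
        return out;
    }
    // StackVertex-style: same feature size, examples are concatenated
    static double[][] stackDim0(double[][] a, double[][] b) {
        double[][] out = new double[a.length + b.length][];
        for (int i = 0; i < a.length; i++) out[i] = a[i].clone();
        for (int i = 0; i < b.length; i++) out[a.length + i] = b[i].clone();
        return out;
    }
    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}}; // shape [2, 2]
        double[][] y = {{5, 6}, {7, 8}}; // shape [2, 2]
        System.out.println(mergeDim1(x, y)[0].length); // 4 features per example
        System.out.println(stackDim0(x, y).length);    // 4 examples
    }
}
```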

  • FeedForwardToCnnPreProcessor - (Sourcearrow-up-right) - handles activation reshaping to transition from a row vector (per example) to a CNN layer. Note that this transition/preprocessor only makes sense if the activations are actually CNN activations, but have been 'flattened' to a row vector.

  • FeedForwardToRnnPreProcessor - (Sourcearrow-up-right) - handles transition from a (time distributed) feed-forward layer to a RNN layer

  • RnnToCnnPreProcessor - (Sourcearrow-up-right) - handles transition from a sequence of CNN activations with shape [minibatch, depth*height*width, timeSeriesLength] to time-distributed [numExamples*timeSeriesLength, numChannels, inputWidth, inputHeight] format

  • RnnToFeedForwardPreProcessor - (Sourcearrow-up-right) - handles transition from time series activations (shape [minibatch,size,timeSeriesLength]) to time-distributed feed-forward (shape [minibatch*tsLength,size]) activations.
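
The RnnToFeedForwardPreProcessor reshape above reduces to flattening [minibatch, size, timeSeriesLength] into [minibatch*timeSeriesLength, size], so a dense layer is applied independently at every time step. A plain-Java sketch of that index mapping (the exact row ordering DL4J uses internally may differ; this illustrates the shape transformation only):

```java
// Illustrative sketch of the RNN -> feed-forward reshape:
// [minibatch, size, T] -> [minibatch * T, size]
public class RnnReshapeDemo {
    static double[][] rnnToFeedForward(double[][][] in) {
        int mb = in.length, size = in[0].length, ts = in[0][0].length;
        double[][] out = new double[mb * ts][size];
        for (int m = 0; m < mb; m++)
            for (int t = 0; t < ts; t++)
                for (int s = 0; s < size; s++)
                    out[m * ts + t][s] = in[m][s][t]; // one row per (example, time step)
        return out;
    }
    public static void main(String[] args) {
        double[][][] act = new double[4][8][5]; // [minibatch=4, size=8, T=5]
        System.out.println(rnnToFeedForward(act).length);    // 20 rows
        System.out.println(rnnToFeedForward(act)[0].length); // 8 columns
    }
}
```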

  • CheckpointListener - (Sourcearrow-up-right, Javadoc) - Save network checkpoints periodically - based on epochs, iterations or time (or some combination of all three).

  • StatsListener - (Sourcearrow-up-right) - Main listener for DL4J's web-based network training user interface. See visualization page for more details.

  • CollectScoresIterationListener - (Sourcearrow-up-right, Javadoc) - Similar to ScoreIterationListener, but stores scores internally in a list (for later retrieval) instead of logging scores

  • TimeIterationListener - (Sourcearrow-up-right, Javadoc) - Attempts to estimate time until training completion, based on current speed and specified total number of iterations

  • EvaluationCalibration - (Sourcearrow-up-right) - Used to evaluate the calibration of a binary or multi-class classifier. Produces reliability diagrams, residual plots, and histograms of probabilities. Export plots to HTML using the EvaluationTools.exportevaluationCalibrationToHtmlFile method
  • ROC - (Sourcearrow-up-right) - Used for single output binary classifiers only - i.e., networks with nOut(1) + sigmoid, or nOut(2) + softmax. Supports 2 modes: thresholded (approximate) or exact (the default). Calculates area under ROC curve, area under precision-recall curve. Plot ROC and P-R curves to HTML using EvaluationToolsarrow-up-right

  • ROCBinary - (Sourcearrow-up-right) - a version of ROC that is used for multi-label binary networks (i.e., sigmoid + binary cross entropy), where each network output is assumed to be an independent binary variable.

  • ROCMultiClass - (Sourcearrow-up-right) - a version of ROC that is used for multi-class (non-binary) networks (i.e., softmax + mcxent/negative-log-likelihood networks). As ROC metrics are only defined for binary classification, this treats the multi-class output as a set of 'one-vs-all' binary classification problems.

  • RegressionEvaluation - (Sourcearrow-up-right) - An evaluation class used for regression models (including multi-output regression models). Reports metrics such as mean-squared error (MSE), mean-absolute error, etc for each output/column.

  • HARDTANH - (Sourcearrow-up-right) - a piecewise linear version of the standard tanh activation function.

  • IDENTITY - (Sourcearrow-up-right) - a 'no op' activation function: f(x) = x

  • LEAKYRELU - (Sourcearrow-up-right) - leaky rectified linear unit. f(x) = max(0, x) + alpha * min(0, x) with alpha=0.01 by default.

  • RATIONALTANH - (Sourcearrow-up-right) - tanh(y) ~ sgn(y) * { 1 - 1/(1+|y|+y^2+1.41645*y^4)} which approximates f(x) = 1.7159 * tanh(2x/3), but should be faster to execute. (Referencearrow-up-right)

  • RELU - (Sourcearrow-up-right) - standard rectified linear unit: f(x) = x if x>0 or f(x) = 0 otherwise

  • RRELU - (Sourcearrow-up-right) - randomized rectified linear unit. Deterministic during test time. (Referencearrow-up-right)

  • SIGMOID - (Sourcearrow-up-right) - standard sigmoid activation function, f(x) = 1 / (1 + exp(-x))

  • SOFTMAX - (Sourcearrow-up-right) - standard softmax activation function

  • SOFTPLUS - (Sourcearrow-up-right) - f(x) = log(1+e^x) - shape is similar to a smooth version of the RELU activation function

  • SOFTSIGN - (Sourcearrow-up-right) - f(x) = x / (1+|x|) - somewhat similar in shape to the standard tanh activation function (faster to calculate).

  • TANH - (Sourcearrow-up-right) - standard tanh (hyperbolic tangent) activation function

  • RECTIFIEDTANH - (Sourcearrow-up-right) - f(x) = max(0, tanh(x))

  • SELU - (Sourcearrow-up-right) - scaled exponential linear unit - used with self normalizing neural networksarrow-up-right

  • SWISH - (Sourcearrow-up-right) - Swish activation function, f(x) = x * sigmoid(x) (Referencearrow-up-right)
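
Several of the closed-form definitions above can be checked directly. The following is a plain-Java sketch of the math only, not DL4J's implementation:

```java
// Illustrative implementations of a few activation formulas listed above.
public class ActivationDemo {
    static double sigmoid(double x)  { return 1.0 / (1.0 + Math.exp(-x)); }
    static double softplus(double x) { return Math.log(1.0 + Math.exp(x)); }  // smooth RELU-like shape
    static double softsign(double x) { return x / (1.0 + Math.abs(x)); }      // tanh-like shape
    static double leakyRelu(double x, double alpha) {                          // alpha = 0.01 by default
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }
    static double swish(double x)    { return x * sigmoid(x); }                // f(x) = x * sigmoid(x)

    public static void main(String[] args) {
        System.out.println(leakyRelu(-5.0, 0.01)); // -0.05: negative inputs scaled by alpha
        System.out.println(softsign(1.0));         // 0.5
        System.out.println(swish(0.0));            // 0.0
    }
}
```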

  • SIGMOID_UNIFORM: A version of XAVIER_UNIFORM for sigmoid activation functions. U(-r,r) with r=4*sqrt(6/(fanIn + fanOut))

  • NORMAL: Normal/Gaussian distribution, with mean 0 and standard deviation 1/sqrt(fanIn). This is the initialization recommended in the Klambauer et al. 2017 paper, "Self-Normalizing Neural Networks"arrow-up-right. Equivalent to DL4J's XAVIER_FAN_IN and LECUN_NORMAL (i.e. Keras' "lecun_normal")

  • LECUN_UNIFORM: Uniform U[-a,a] with a=3/sqrt(fanIn).

  • UNIFORM: Uniform U[-a,a] with a=1/sqrt(fanIn). "Commonly used heuristic" as per Glorot and Bengio 2010

  • XAVIER: As per Glorot and Bengio 2010arrow-up-right: Gaussian distribution with mean 0, variance 2.0/(fanIn + fanOut)

  • XAVIER_UNIFORM: As per Glorot and Bengio 2010arrow-up-right: Uniform distribution U(-s,s) with s = sqrt(6/(fanIn + fanOut))

  • XAVIER_FAN_IN: Similar to XAVIER, but with variance 1/fanIn. Caffe originally used this.

  • RELU: He et al. (2015), "Delving Deep into Rectifiers"arrow-up-right. Normal distribution with variance 2.0/nIn

  • RELU_UNIFORM: He et al. (2015), "Delving Deep into Rectifiers"arrow-up-right. Uniform distribution U(-s,s) with s = sqrt(6/fanIn)

  • IDENTITY: Weights are set to an identity matrix. Note: can only be used with square weight matrices

  • VAR_SCALING_NORMAL_FAN_IN: Gaussian distribution with mean 0, variance 1.0/(fanIn)

  • VAR_SCALING_NORMAL_FAN_OUT: Gaussian distribution with mean 0, variance 1.0/(fanOut)

  • VAR_SCALING_NORMAL_FAN_AVG: Gaussian distribution with mean 0, variance 1.0/((fanIn + fanOut)/2)

  • VAR_SCALING_UNIFORM_FAN_IN: Uniform U[-a,a] with a=3.0/(fanIn)

  • VAR_SCALING_UNIFORM_FAN_OUT: Uniform U[-a,a] with a=3.0/(fanOut)

  • VAR_SCALING_UNIFORM_FAN_AVG: Uniform U[-a,a] with a=3.0/((fanIn + fanOut)/2)
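
The fan-based schemes above differ only in which fan term sets the scale. A plain-Java sketch of a few of the variance/bound formulas (illustrative only, not DL4J's WeightInit code):

```java
// Illustrative scale computations for the weight initialization schemes above.
public class WeightInitDemo {
    static double xavierVariance(int fanIn, int fanOut)     { return 2.0 / (fanIn + fanOut); }       // XAVIER
    static double xavierUniformBound(int fanIn, int fanOut) { return Math.sqrt(6.0 / (fanIn + fanOut)); } // XAVIER_UNIFORM
    static double reluVariance(int fanIn)                   { return 2.0 / fanIn; }                  // RELU (He et al.)
    static double lecunUniformBound(int fanIn)              { return 3.0 / Math.sqrt(fanIn); }       // LECUN_UNIFORM

    public static void main(String[] args) {
        // For a layer with fanIn = 100, fanOut = 100:
        System.out.println(xavierVariance(100, 100));     // 0.01
        System.out.println(xavierUniformBound(100, 100)); // sqrt(6/200) ~ 0.1732
    }
}
```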

  • Adam - (Sourcearrow-up-right)
  • Nadam - (Sourcearrow-up-right) - A variant of the Adam updater, using the Nesterov momentum update rule - Referencearrow-up-right

  • Nesterovs - (Sourcearrow-up-right) - Nesterov momentum updater

  • NoOp - (Sourcearrow-up-right) - A 'no operation' updater. That is, gradients are not modified at all by this updater. Mathematically equivalent to the SGD updater with a learning rate of 1.0

  • RmsProp - (Sourcearrow-up-right) - Reference - slide 29arrow-up-right

  • Sgd - (Sourcearrow-up-right) - Standard stochastic gradient descent updater. This updater applies a learning rate only.
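
A minimal sketch of the two simplest rules above (Sgd, and momentum as used by Nesterovs), applied to a single scalar parameter. The momentum form shown here is the classical formulation for illustration; DL4J's exact Nesterov update differs in bookkeeping details:

```java
// Illustrative scalar update steps (not DL4J's updater implementations).
public class UpdaterDemo {
    // Sgd: theta <- theta - lr * gradient. (NoOp is equivalent to lr = 1.0.)
    static double sgdStep(double theta, double grad, double lr) {
        return theta - lr * grad;
    }
    // Classical momentum: v <- mu * v - lr * gradient; theta <- theta + v.
    // Returns {newTheta, newVelocity}.
    static double[] momentumStep(double theta, double v, double grad, double lr, double mu) {
        double vNew = mu * v - lr * grad;
        return new double[]{theta + vNew, vNew};
    }
    public static void main(String[] args) {
        System.out.println(sgdStep(1.0, 0.5, 0.1)); // 0.95
        double[] s = momentumStep(1.0, 0.0, 0.5, 0.1, 0.9);
        System.out.println(s[0]); // 0.95 on the first step (velocity starts at 0)
    }
}
```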

  • PolySchedule - (Sourcearrow-up-right) - Implements value(i) = initialValue * (1 + i/maxIter)^(-power)

  • SigmoidSchedule - (Sourcearrow-up-right) - Implements value(i) = initialValue * 1.0 / (1 + exp(-gamma * (iter - stepSize)))

  • StepSchedule - (Sourcearrow-up-right) - Implements value(i) = initialValue * gamma^( floor(iter/step) )
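
The three schedule formulas above, written out verbatim in plain Java (an illustrative sketch, not the ND4J schedule classes themselves):

```java
// Illustrative implementations of the learning rate schedule formulas above.
public class ScheduleDemo {
    // PolySchedule: value(i) = initialValue * (1 + i/maxIter)^(-power)
    static double poly(double init, int iter, int maxIter, double power) {
        return init * Math.pow(1.0 + (double) iter / maxIter, -power);
    }
    // SigmoidSchedule: value(i) = initialValue / (1 + exp(-gamma * (iter - stepSize)))
    static double sigmoid(double init, double gamma, int iter, int stepSize) {
        return init / (1.0 + Math.exp(-gamma * (iter - stepSize)));
    }
    // StepSchedule: value(i) = initialValue * gamma^floor(iter/step)
    static double step(double init, double gamma, int iter, int stepSize) {
        return init * Math.pow(gamma, Math.floor((double) iter / stepSize));
    }
    public static void main(String[] args) {
        System.out.println(poly(0.1, 0, 1000, 2.0));  // 0.1 at iteration 0
        System.out.println(step(0.1, 0.5, 250, 100)); // 0.1 * 0.5^2 = 0.025
    }
}
```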

  • GaussianNoise - (Sourcearrow-up-right) - Applies additive, mean-zero Gaussian noise to the input - i.e., x = x + N(0,stddev)
  • AlphaDropout - (Sourcearrow-up-right) - AlphaDropout is a dropout technique proposed by Klambauer et al. 2017 - Self-Normalizing Neural Networksarrow-up-right. Designed for self-normalizing neural networks (SELU activation, NORMAL weight init). Attempts to keep both the mean and variance of the activations the same (in expectation) as they were before alpha dropout was applied

  • UnitNormConstraint - (Sourcearrow-up-right) - Constrain the L2 norm of the incoming weights for each unit to be 1.0.

  • IrisDataSetIterator - (Sourcearrow-up-right) - An iterator for the well-known Iris dataset. 4 features, 3 output classes.
  • Cifar10DataSetIterator - (Sourcearrow-up-right) - An iterator for the CIFAR-10 images dataset. 10 classes, 4d features/activations format for CNNs in DL4J: [minibatch,channels,height,width] = [minibatch,3,32,32]. Features are not normalized - instead, they are in the range 0 to 255.

  • LFWDataSetIterator - (Sourcearrow-up-right) - Labeled Faces in the Wild datasetarrow-up-right.

  • TinyImageNetDataSetIterator (Sourcearrow-up-right) - A subset of the standard imagenet dataset; 200 classes, 500 images per class

  • UciSequenceDataSetIterator (Sourcearrow-up-right) - UCI synthetic control time series dataset

  • SequenceRecordReaderDataSetIterator - (Sourcearrow-up-right) - The sequence (SequenceRecordReader) version of RecordReaderDataSetIterator. Users may be better off using RecordReaderMultiDataSetIterator, in conjunction with
  • DoublesDataSetIterator - (Sourcearrow-up-right)

  • FloatsDataSetIterator - (Sourcearrow-up-right)

  • INDArrayDataSetIterator - (Sourcearrow-up-right)

  • AsyncMultiDataSetIterator - (Sourcearrow-up-right) - Used automatically by ComputationGraph where appropriate. Implements asynchronous prefetching of MultiDataSets to improve performance.

  • AsyncShieldDataSetIterator - (Sourcearrow-up-right) - Generally used only for debugging. Stops MultiLayerNetwork and ComputationGraph from using an AsyncDataSetIterator.

  • AsyncShieldMultiDataSetIterator - (Sourcearrow-up-right) - The MultiDataSetIterator version of AsyncShieldDataSetIterator

  • EarlyTerminationDataSetIterator - (Sourcearrow-up-right) - Wraps another DataSetIterator, ensuring that only a specified (maximum) number of minibatches (DataSet) objects are returned between resets. Can be used to 'cut short' an iterator, returning only the first N DataSets.

  • EarlyTerminationMultiDataSetIterator - (Sourcearrow-up-right) - The MultiDataSetIterator version of EarlyTerminationDataSetIterator

  • ExistingDataSetIterator - (Sourcearrow-up-right) - Convert an Iterator<DataSet> or Iterable<DataSet> to a DataSetIterator. Does not split the underlying DataSet objects

  • FileDataSetIterator - (Sourcearrow-up-right) - An iterator that iterates over DataSet files that have been previously saved with DataSet.save(File). Supports randomization, filtering, different output batch size vs. saved DataSet batch size, etc.

  • FileMultiDataSetIterator - (Sourcearrow-up-right) - A MultiDataSet version of FileDataSetIterator

  • IteratorDataSetIterator - (Sourcearrow-up-right) - Convert an Iterator<DataSet> to a DataSetIterator. Unlike ExistingDataSetIterator, the underlying DataSet objects may be split/combined - i.e., the minibatch size may differ for the output, vs. the input iterator.

  • IteratorMultiDataSetIterator - (Sourcearrow-up-right) - The Iterator<MultiDataSet> version of IteratorDataSetIterator

  • MultiDataSetWrapperIterator - (Sourcearrow-up-right) - Convert a MultiDataSetIterator to a DataSetIterator. Note that this is only possible if the number of features and labels arrays is equal to 1.

  • MultipleEpochsIterator - (Sourcearrow-up-right) - Treat multiple passes (epochs) of the underlying iterator as a single epoch, when training.

  • WorkspaceShieldDataSetIterator - (Sourcearrow-up-right) - Generally used only for debugging, and not usually by users. Detaches/migrates DataSets coming out of the underlying DataSetIterator.

  • Set the normalizer/preprocessor on the iterator: myTrainData.setPreProcessor(normalizer);

    End result: the data that comes from your DataSetIterator will now be normalized.

  • NormalizerMinMaxScaler - (Sourcearrow-up-right) - normalizes each feature value independently (and optionally label values) to lie between a minimum and maximum value (by default between 0 and 1)
  • VGG16ImagePreProcessor - (Sourcearrow-up-right) - This is a preprocessor specifically for VGG16. It subtracts the mean RGB value, computed on the training set, from each pixel as reported in Linkarrow-up-right

  • MultiNormalizerHybrid - (Sourcearrow-up-right) - A MultiDataSet normalizer that can combine different normalization types (standardize, min/max etc) for different input/feature and output/label arrays.
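
The min/max normalization described above reduces to a per-feature linear rescale. A plain-Java sketch of the math (illustrative only, not the DL4J normalizer class):

```java
// Illustrative min/max rescaling: map x from [min, max] (per-feature statistics,
// fit on the training data) into the target range [lo, hi].
public class MinMaxDemo {
    static double minMaxScale(double x, double min, double max, double lo, double hi) {
        return lo + (x - min) / (max - min) * (hi - lo);
    }
    public static void main(String[] args) {
        System.out.println(minMaxScale(5.0, 0.0, 10.0, 0.0, 1.0));  // 0.5
        System.out.println(minMaxScale(10.0, 0.0, 10.0, 0.0, 1.0)); // 1.0
    }
}
```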

  • InceptionResNetV1 - (Sourcearrow-up-right)
  • LeNet - (Sourcearrow-up-right)

  • ResNet50 - (Sourcearrow-up-right)

  • SimpleCNN - (Sourcearrow-up-right)

  • TextGenerationLSTM - (Sourcearrow-up-right)

  • TinyYOLO - (Sourcearrow-up-right)

  • VGG16 - (Sourcearrow-up-right)

  • VGG19 - (Sourcearrow-up-right)

  • Layers
    Feed-Forward Layers
    Output Layers
    Convolutional Layers
    Iteration/Training Listeners
    Evaluation
    Network Saving and Loading
    Network Configurations
    Activation Functions
    Weight Initialization
    Data Classes
    Iterators
    Iterators - Built-In (DL4J-Provided Data)
    Transfer Learning
    Trained Model Library - Model Zoo
    Keras Import
    Distributed Training (Spark)
    Hyperparameter Optimization (Arbiter)
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(1234)
        // parameters below are copied to every layer in the network
        // for inputs like dropOut() or activation() you should do this per layer
        // only specify the parameters you need
        .updater(new AdaGrad())
        .activation(Activation.RELU)
        .dropOut(0.8)
        .l1(0.001)
        .l2(1e-4)
        .weightInit(WeightInit.XAVIER)
        // alternatively, initialize weights from a distribution, e.g.:
        // .weightInit(new TruncatedNormalDistribution(0, 1))
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.PREFER_FASTEST)
        .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
        .gradientNormalizationThreshold(1e-3)
        .list()
        // layers in the network, added sequentially
        // parameters set per-layer override the parameters above
        .layer(new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
                .weightInit(WeightInit.XAVIER)
                .build())
        .layer(new ActivationLayer(Activation.RELU))
        .layer(new ConvolutionLayer.Builder(1,1)
                .nIn(1024)
                .nOut(2048)
                .stride(1,1)
                .convolutionMode(ConvolutionMode.Same)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.IDENTITY)
                .build())
        .layer(new GravesLSTM.Builder()
                .activation(Activation.TANH)
                .nIn(inputNum)
                .nOut(100)
                .build())
        .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.SOFTMAX)
                .nIn(numHiddenNodes).nOut(numOutputs).build())
        .pretrain(false).backprop(true)
        .build();
    
    MultiLayerNetwork neuralNetwork = new MultiLayerNetwork(conf);
    ComputationGraphConfiguration.GraphBuilder graph = new NeuralNetConfiguration.Builder()
        .seed(seed)
        // parameters below are copied to every layer in the network
        // for inputs like dropOut() or activation() you should do this per layer
        // only specify the parameters you need
        .activation(Activation.IDENTITY)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(updater)
        .weightInit(WeightInit.RELU)
        .l2(5e-5)
        .miniBatch(true)
        .cacheMode(cacheMode)
        .trainingWorkspaceMode(workspaceMode)
        .inferenceWorkspaceMode(workspaceMode)
        .cudnnAlgoMode(cudnnAlgoMode)
        .convolutionMode(ConvolutionMode.Same)
        .graphBuilder()
        // layers in the network, added sequentially
        // parameters set per-layer override the parameters above
        // note that you must name each layer and manually specify its input
        .addInputs("input1")
        .addLayer("stem-cnn1", new ConvolutionLayer.Builder(new int[] {7, 7}, new int[] {2, 2}, new int[] {3, 3})
            .nIn(inputShape[0])
            .nOut(64)
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            .build(),"input1")
        .addLayer("stem-batch1", new BatchNormalization.Builder(false)
            .nIn(64)
            .nOut(64)
            .build(), "stem-cnn1")
        .addLayer("stem-activation1", new ActivationLayer.Builder()
            .activation(Activation.RELU)
            .build(), "stem-batch1")
        .addLayer("lossLayer", new CenterLossOutputLayer.Builder()
            .lossFunction(LossFunctions.LossFunction.SQUARED_LOSS)
            .activation(Activation.SOFTMAX).nOut(numClasses).lambda(1e-4).alpha(0.9)
            .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer).build(),
            "stem-activation1")
        .setOutputs("lossLayer")
        .setInputTypes(InputType.convolutional(224, 224, 3))
        .backprop(true).pretrain(false).build();
    
    ComputationGraph neuralNetwork = new ComputationGraph(graph);
    ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
    File mainPath = new File(System.getProperty("user.dir"), "dl4j-examples/src/main/resources/animals/");
    FileSplit fileSplit = new FileSplit(mainPath, NativeImageLoader.ALLOWED_FORMATS, rng);
    int numExamples = Math.toIntExact(fileSplit.length());
    int numLabels = fileSplit.getRootDir().listFiles(File::isDirectory).length; //This only works if your root is clean: only label subdirs.
    BalancedPathFilter pathFilter = new BalancedPathFilter(rng, labelMaker, numExamples, numLabels, maxPathsPerLabel);
    
    InputSplit[] inputSplit = fileSplit.sample(pathFilter, splitTrainTest, 1 - splitTrainTest);
    InputSplit trainData = inputSplit[0];
    InputSplit testData = inputSplit[1];
    
    boolean shuffle = false;
    ImageTransform flipTransform1 = new FlipImageTransform(rng);
    ImageTransform flipTransform2 = new FlipImageTransform(new Random(123));
    ImageTransform warpTransform = new WarpImageTransform(rng, 42);
    List<Pair<ImageTransform,Double>> pipeline = Arrays.asList(
        new Pair<>(flipTransform1,0.9),
        new Pair<>(flipTransform2,0.8),
        new Pair<>(warpTransform,0.5));
    
    ImageTransform transform = new PipelineImageTransform(pipeline,shuffle);
    DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
    
    // training dataset
    ImageRecordReader recordReaderTrain = new ImageRecordReader(height, width, channels, labelMaker);
    recordReaderTrain.initialize(trainData, null);
    DataSetIterator trainingIterator = new RecordReaderDataSetIterator(recordReaderTrain, batchSize, 1, numLabels);
    
    // testing dataset
    ImageRecordReader recordReaderTest = new ImageRecordReader(height, width, channels, labelMaker);
    recordReaderTest.initialize(testData, null);
    DataSetIterator testingIterator = new RecordReaderDataSetIterator(recordReaderTest, batchSize, 1, numLabels);
    
    // early stopping configuration, model saver, and trainer
    EarlyStoppingModelSaver saver = new LocalFileModelSaver(System.getProperty("user.dir"));
    EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(50)) //Max of 50 epochs
        .evaluateEveryNEpochs(1)
        .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES)) //Max of 20 minutes
        .scoreCalculator(new DataSetLossCalculator(testingIterator, true))     //Calculate test set score
        .modelSaver(saver)
        .build();
    
    EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, neuralNetwork, trainingIterator);
    
    // begin training
    trainer.fit();
    Schema schema = new Schema.Builder()
        .addColumnsDouble("Sepal length", "Sepal width", "Petal length", "Petal width")
        .addColumnCategorical("Species", "Iris-setosa", "Iris-versicolor", "Iris-virginica")
        .build();
    
    TransformProcess tp = new TransformProcess.Builder(schema)
        .categoricalToInteger("Species")
        .build();
    
    // do the transformation on spark
    JavaRDD<List<Writable>> processedData = SparkTransformExecutor.execute(parsedInputData, tp);
    // returns evaluation class with accuracy, precision, recall, and other class statistics
    Evaluation eval = neuralNetwork.evaluate(testIterator);
    System.out.println(eval.accuracy());
    System.out.println(eval.precision());
    System.out.println(eval.recall());
    
    // ROC for Area Under Curve on multi-class datasets (not binary classes)
    ROCMultiClass roc = neuralNetwork.doEvaluation(testIterator, new ROCMultiClass())[0];
    System.out.println(roc.calculateAverageAuc());
    System.out.println(roc.calculateAverageAucPR());
    //Evaluate the model on the test set
    Evaluation eval = new Evaluation(numClasses);
    INDArray output = neuralNetwork.output(testData.getFeatures());
    eval.eval(testData.getLabels(), output, testMetaData); //Note we are passing in the test set metadata here
    
    //Get a list of prediction errors, from the Evaluation object
    //Prediction errors like this are only available after calling iterator.setCollectMetaData(true)
    List<Prediction> predictionErrors = eval.getPredictionErrors();
    System.out.println("\n\n+++++ Prediction Errors +++++");
    for(Prediction p : predictionErrors){
        System.out.println("Predicted class: " + p.getPredictedClass() + ", Actual class: " + p.getActualClass()
            + "\t" + p.getRecordMetaData(RecordMetaData.class).getLocation());
    }
    
    //We can also load the raw data:
    List<Record> predictionErrorRawData = recordReader.loadFromMetaData(predictionErrorMetaData);
    for(int i=0; i<predictionErrors.size(); i++ ){
        Prediction p = predictionErrors.get(i);
        RecordMetaData meta = p.getRecordMetaData(RecordMetaData.class);
        INDArray features = predictionErrorExamples.getFeatures().getRow(i);
        INDArray labels = predictionErrorExamples.getLabels().getRow(i);
        List<Writable> rawData = predictionErrorRawData.get(i).getRecord();
    
        INDArray networkPrediction = model.output(features);
    
        System.out.println(meta.getLocation() + ": "
            + "\tRaw Data: " + rawData
            + "\tNormalized: " + features
            + "\tLabels: " + labels
            + "\tPredictions: " + networkPrediction);
    }
    
    //Some other useful evaluation methods:
    List<Prediction> list1 = eval.getPredictions(1,2);                  //Predictions: actual class 1, predicted class 2
    List<Prediction> list2 = eval.getPredictionByPredictedClass(2);     //All predictions for predicted class 2
    List<Prediction> list3 = eval.getPredictionsByActualClass(2);       //All predictions for actual class 2

    Iterators

    Data iteration tools for loading into neural networks.

    What is an iterator?

    A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data, or automatic loading of common benchmarking datasets such as MNIST and IRIS.

    Usage

    For most use cases, initializing an iterator and passing it to a MultiLayerNetwork or ComputationGraph fit() method is all you need to begin training:
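
A minimal usage sketch (assuming DL4J is on the classpath, and that `conf` is a MultiLayerConfiguration built as in the configuration examples on this page):

```java
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// batch size 64, training split, RNG seed 123
DataSetIterator mnistTrain = new MnistDataSetIterator(64, true, 123);
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.fit(mnistTrain); // one pass (epoch) over the iterator
```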

    Many other methods also accept iterators for tasks such as evaluation:
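
For example, evaluation via an iterator - a sketch assuming `net` is a trained MultiLayerNetwork and DL4J is on the classpath:

```java
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

DataSetIterator mnistTest = new MnistDataSetIterator(64, false, 123); // test split
Evaluation eval = net.evaluate(mnistTest);
System.out.println(eval.stats()); // accuracy, precision, recall, F1, confusion matrix
```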

    Available iterators

    MnistDataSetIterator

    MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see

    UciSequenceDataSetIterator

    UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift


    UciSequenceDataSetIterator

    Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123

    • param batchSize Minibatch size

    Cifar10DataSetIterator

    Cifar10DataSetIterator is an iterator for the CIFAR-10 dataset - 10 classes, 32x32 images with 3 channels (RGB)

    This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images.

    Cifar10DataSetIterator

    Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

    • param batchSize Minibatch size for the iterator

    IrisDataSetIterator

    IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes

    IrisDataSetIterator

    next

    IrisDataSetIterator handles traversing through the Iris Data Set.


    • param batch Batch size

    • param numExamples Total number of examples

    LFWDataSetIterator

    LFW iterator - Labeled Faces in the Wild dataset. 13233 images total, with 5749 classes.

    LFWDataSetIterator

    Create LFW data specific iterator

    • param batchSize the batch size of the examples

    • param numExamples the overall number of examples

    • param imgDim an array of height, width and channels

    TinyImageNetDataSetIterator

Tiny ImageNet is a subset of the ImageNet database. TinyImageNet is the default course challenge for CS231n at Stanford University.

    Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.


    TinyImageNetDataSetIterator

    Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

    • param batchSize Minibatch size for the iterator

    EmnistDataSetIterator

    EMNIST DataSetIterator

    • COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes

    • MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z

    • BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)


    EmnistDataSetIterator

EMNIST dataset has multiple different subsets. See the EmnistDataSetIterator Javadoc for details.

    numExamplesTrain

    Create an EMNIST iterator with randomly shuffled data based on a specified RNG seed

    • param dataSet Dataset (subset) to return

    • param batchSize Batch size

    • param train If true: use training set. If false: use test set
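Putting the parameters above together, a training iterator for the BALANCED subset might be created as follows (a sketch using the constructor described above; it throws IOException if the dataset cannot be downloaded):

```java
import org.deeplearning4j.datasets.iterator.impl.EmnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// BALANCED subset, batch size 64, training split
DataSetIterator emnistTrain = new EmnistDataSetIterator(
        EmnistDataSetIterator.Set.BALANCED, 64, true);

// Number of classes for the subset - 47 for BALANCED
int numClasses = EmnistDataSetIterator.numLabels(EmnistDataSetIterator.Set.BALANCED);
```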

    numExamplesTest

    Get the number of test examples for the specified subset

    • param dataSet Subset to get

    • return Number of examples for the specified subset

    numLabels

    Get the number of labels for the specified subset

    • param dataSet Subset to get

    • return Number of labels for the specified subset

    isBalanced

    Get the labels as a character array

    • return Labels

    RecordReaderDataSetIterator

RecordReaderDataSetIterator: an iterator that handles conversion of Records from a RecordReader into DataSet objects, as well as producing minibatches from individual records.

    Example 1: Image classification, batch size 32, 10 classes
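A sketch of this example, assuming 28x28 grayscale images stored in one subdirectory per class under a hypothetical trainDir:

```java
import org.datavec.api.io.labels.ParentPathLabelGenerator;
import org.datavec.api.split.FileSplit;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import java.io.File;

File trainDir = new File("path/to/train");                            // hypothetical path
ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator(); // class = parent dir name

ImageRecordReader rr = new ImageRecordReader(28, 28, 1, labelMaker);  // height, width, channels
rr.initialize(new FileSplit(trainDir));

int labelIndex = 1; // index of the label Writable produced by ImageRecordReader
DataSetIterator iter = new RecordReaderDataSetIterator(rr, 32, labelIndex, 10);
```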

    Example 2: Multi-output regression from CSV, batch size 128
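A sketch of this example, assuming the regression targets occupy contiguous columns 3 to 5 of a hypothetical CSV file:

```java
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import java.io.File;

RecordReader rr = new CSVRecordReader(0, ',');              // skip 0 header lines
rr.initialize(new FileSplit(new File("path/to/data.csv"))); // hypothetical path

// Multi-output regression: targets in (contiguous) columns 3 to 5 inclusive
DataSetIterator iter = new RecordReaderDataSetIterator(rr, 128, 3, 5, true);
```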

    RecordReaderDataSetIterator

    Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels() Note that if RecordReader.getLabels() returns null, no output labels will be produced

    • param recordReader Record reader to use as the source of data

    • param batchSize Minibatch size, for each call of .next()

    setCollectMetaData

    Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.

    • param recordReader RecordReader: provides the source of the data

    • param batchSize Batch size (number of examples) for the output DataSet objects

    • param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()

    loadFromMetaData

Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

    • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

    • return DataSet with the specified example

    • throws IOException If an error occurs during loading of the data

    loadFromMetaData

Load multiple examples to a DataSet, using the provided RecordMetaData instances.

    • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor

    • return DataSet with the specified examples

    • throws IOException If an error occurs during loading of the data

    writableConverter

    Builder class for RecordReaderDataSetIterator

    maxNumBatches

    Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.

    • param maxNumBatches Maximum number of minibatches per epoch / reset

    regression

    Use this for single output regression (i.e., 1 output/regression target)

    • param labelIndex Column index that contains the regression target (indexes start at 0)

    regression

    Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)

    • param labelIndexFrom Column index of the first regression target (indexes start at 0)

    • param labelIndexTo Column index of the last regression target (inclusive)

    classification

    Use this for classification

• param labelIndex Index of the column that contains the label (indexes start from 0). The column must be an integer value, containing values 0 to numClasses-1

    • param numClasses Number of label classes (i.e., number of categories/classes in the dataset)

    preProcessor

    Optional arg. Allows the preprocessor to be set

    • param preProcessor Preprocessor to use

    collectMetaData

    When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

    • param collectMetaData Whether metadata should be collected or not
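The builder options above can be combined; a sketch, assuming an initialized RecordReader 'rr' and a classification label in column 4 with 3 classes:

```java
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// 'rr' is assumed to be an initialized RecordReader (e.g., a CSVRecordReader)
DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
        .classification(4, 3)   // label in column 4, 3 classes
        .maxNumBatches(100)     // optional: cap the number of minibatches per reset
        .collectMetaData(true)  // optional: include RecordMetaData in each DataSet
        .build();
```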

    RecordReaderMultiDataSetIterator

The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator).

    RecordReaderMultiDataSetIterator

    When dealing with time series data of different lengths, how should we align the input/labels time series? For equal length: use EQUAL_LENGTH For sequence classification: use ALIGN_END

    loadFromMetaData

Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

    • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

    • return DataSet with the specified example

    • throws IOException If an error occurs during loading of the data

    loadFromMetaData

Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

    • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor

    • return DataSet with the specified examples

    • throws IOException If an error occurs during loading of the data

    SequenceRecordReaderDataSetIterator

Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels) generate time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with a different number of input vs. label time steps).

    SequenceRecordReaderDataSetIterator

    Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.

    • param featuresReader SequenceRecordReader for the features

• param labels Labels: assume single value per time step, where values are integers in the range 0 to numPossibleLabels-1

    • param miniBatchSize Minibatch size for each call of next()

SequenceRecordReaderDataSetIterator

    Constructor where features and labels come from different RecordReaders (for example, different files)

    loadFromMetaData

Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

    • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

    • return DataSet with the specified example

    • throws IOException If an error occurs during loading of the data

    loadFromMetaData

Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

    • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor

    • return DataSet with the specified examples

    • throws IOException If an error occurs during loading of the data

    AsyncMultiDataSetIterator

    Async prefetching iterator wrapper for MultiDataSetIterator implementations This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

    Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

    next

We want to ensure that the background thread has the same thread->device affinity as the master thread

    setPreProcessor

    Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

    • param preProcessor MultiDataSetPreProcessor. May be null.

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    shutdown

We want to ensure that the background thread has the same thread->device affinity as the master thread

    hasNext

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

• return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

• throws UnsupportedOperationException if the remove operation is not supported by this iterator

• throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

• implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    IteratorDataSetIterator

This iterator wraps an Iterator&lt;DataSet&gt;, combining and splitting the underlying DataSet objects as required to get the specified batch size.

    Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

    AsyncDataSetIterator

    Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

    Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

    AsyncDataSetIterator

    Create an Async iterator with the default queue size of 8

    • param baseIterator Underlying iterator to wrap and fetch asynchronously from
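Wrapping an existing iterator is a one-liner (a sketch; 'baseIterator' is assumed to be any existing DataSetIterator):

```java
import org.deeplearning4j.datasets.iterator.AsyncDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// Prefetches minibatches on a background thread, default queue size of 8
DataSetIterator async = new AsyncDataSetIterator(baseIterator);
```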

AsyncDataSetIterator

Create an Async iterator with the specified queue size

• param iterator Underlying iterator to wrap and fetch asynchronously from

• param queue Queue size - number of minibatches to prefetch

    inputColumns

    Input columns for the dataset

    • return

    totalOutcomes

    The number of labels for the dataset

    • return

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    shutdown

We want to ensure that the background thread has the same thread->device affinity as the master thread

    batch

    Batch size

    • return

    setPreProcessor

    Set a pre processor

    • param preProcessor a pre processor to set

    getPreProcessor

    Returns preprocessors, if defined

    • return

    hasNext

    Get dataset iterator record reader labels

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

• throws UnsupportedOperationException if the remove operation is not supported by this iterator

• throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

• implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    DoublesDataSetIterator

    First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

    DoublesDataSetIterator

    • param iterable Iterable to source data from

    • param batchSize Batch size for generated DataSet objects

    IteratorMultiDataSetIterator

This iterator wraps an Iterator&lt;MultiDataSet&gt;, combining and splitting the underlying MultiDataSet objects as required to get a specified batch size.

    Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

    SamplingDataSetIterator

    A wrapper for a dataset to sample from. This will randomly sample from the given dataset.

    SamplingDataSetIterator

    INDArrayDataSetIterator

    First value in pair is the features vector, second value in pair is the labels.

    INDArrayDataSetIterator

    • param iterable Iterable to source data from

    • param batchSize Batch size for generated DataSet objects

    WorkspacesShieldDataSetIterator

    This iterator detaches/migrates DataSets coming out from backed DataSetIterator, thus providing “safe” DataSets. This is typically used for debugging and testing purposes, and should not be used in general by users

    WorkspacesShieldDataSetIterator

    • param iterator The underlying iterator to detach values from

    MultiDataSetIteratorSplitter

This iterator virtually splits the given MultiDataSetIterator into Train and Test parts. For example: you have 100000 examples and a batch size of 32, so you have 3125 total batches. With a split ratio of 0.7, that gives you 2187 training batches and 938 test batches.

PLEASE NOTE: You can’t use the Test iterator twice in a row; the Train iterator should be used before each use of the Test iterator. PLEASE NOTE: You can’t use this iterator if the underlying iterator uses randomization/shuffling between epochs.

    MultiDataSetIteratorSplitter

    • param baseIterator

    • param totalBatches - total number of batches in underlying iterator. this value will be used to determine number of test/train batches

• param ratio - this value will be used as the split ratio; should be in the range 0.0 &lt; X &lt; 1.0. I.e. if the value 0.7 is provided, then 70% of the total examples will be used for training, and 30% of the total examples will be used for testing

    getTrainIterator

    This method returns train iterator instance

    • return

getTestIterator

    This method returns test iterator instance

    • return

    AsyncShieldDataSetIterator

This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetching. This is mainly used for debugging purposes, or for iterators that aren’t safe to asynchronously prefetch from.

    AsyncShieldDataSetIterator

• param iterator Iterator to wrap, to disable asynchronous prefetching for

    next

    Like the standard next method but allows a customizable number of examples returned

    • param num the number of examples

• return the next data set

    inputColumns

    Input columns for the dataset

    • return

    totalOutcomes

    The number of labels for the dataset

    • return

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

    PLEASE NOTE: This iterator ALWAYS returns FALSE

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    batch

    Batch size

    • return

    setPreProcessor

    Set a pre processor

    • param preProcessor a pre processor to set

    getPreProcessor

    Returns preprocessors, if defined

    • return

    hasNext

    Get dataset iterator record reader labels

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

• throws UnsupportedOperationException if the remove operation is not supported by this iterator

• throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

• implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    DummyBlockDataSetIterator

    This class provides baseline implementation of BlockDataSetIterator interface

    BaseDatasetIterator

    Baseline implementation includes control over the data fetcher and some basic getters for metadata

    AsyncShieldMultiDataSetIterator

    This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch

    next

    Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples

    • param num Number of examples to fetch

    setPreProcessor

    Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

    • param preProcessor MultiDataSetPreProcessor. May be null.

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects?

    PLEASE NOTE: This iterator ALWAYS returns FALSE

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    hasNext

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

• return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

• throws UnsupportedOperationException if the remove operation is not supported by this iterator

• throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

• implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    RandomMultiDataSetIterator

    RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

    RandomMultiDataSetIterator

    • param numMiniBatches Number of minibatches per epoch

    • param features Each triple in the list specifies the shape, array order and type of values for the features arrays

    • param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays

    addFeatures

    • param numMiniBatches Number of minibatches per epoch

    addFeatures

    Add a new features array to the iterator

    • param shape Shape of the features

    • param order Order (‘c’ or ‘f’) for the array

    • param values Values to fill the array with

    addLabels

    Add a new labels array to the iterator

• param shape Shape of the labels

    • param values Values to fill the array with

    addLabels

    Add a new labels array to the iterator

• param shape Shape of the labels

    • param order Order (‘c’ or ‘f’) for the array

    • param values Values to fill the array with

    generate

    Generate a random array with the specified shape

    • param shape Shape of the array

    • param values Values to fill the array with

    • return Random array of specified shape + contents

    generate

    Generate a random array with the specified shape and order

    • param shape Shape of the array

    • param order Order of array (‘c’ or ‘f’)

    • param values Values to fill the array with

    EarlyTerminationMultiDataSetIterator

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to this specified number of minibatches.

    EarlyTerminationMultiDataSetIterator

    Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

• param underlyingIterator Iterator to wrap

• param terminationPoint Number of minibatches after which hasNext() will return false

    ExistingDataSetIterator

    ExistingDataSetIterator

    Note that when using this constructor, resetting is not supported

    • param iterator Iterator to wrap

ExistingDataSetIterator

    Note that when using this constructor, resetting is not supported

    • param iterator Iterator to wrap

    • param labels String labels. May be null.

    DummyBlockMultiDataSetIterator

    This class provides baseline implementation of BlockMultiDataSetIterator interface

    EarlyTerminationDataSetIterator

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to this specified number of minibatches.

    EarlyTerminationDataSetIterator

    Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

• param underlyingIterator Iterator to wrap

• param terminationPoint Number of minibatches after which hasNext() will return false
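A sketch, assuming an existing DataSetIterator 'baseIterator' that should be limited to 100 minibatches per epoch:

```java
import org.deeplearning4j.datasets.iterator.EarlyTerminationDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// hasNext() returns false after 100 minibatches have been returned
DataSetIterator limited = new EarlyTerminationDataSetIterator(baseIterator, 100);
```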

    ReconstructionDataSetIterator

Wraps a data set iterator, setting the features (feature matrix) as the labels.

    next

    Like the standard next method but allows a customizable number of examples returned

    • param num the number of examples

• return the next data set

    inputColumns

    Input columns for the dataset

    • return

    totalOutcomes

    The number of labels for the dataset

    • return

    reset

    Resets the iterator back to the beginning

    batch

    Batch size

    • return

    hasNext

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

• return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

• throws UnsupportedOperationException if the remove operation is not supported by this iterator

• throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    DataSetIteratorSplitter

This iterator virtually splits the given DataSetIterator into Train and Test parts. For example: you have 100000 examples and a batch size of 32, so you have 3125 total batches. With a split ratio of 0.7, that gives you 2187 training batches and 938 test batches.

PLEASE NOTE: You can’t use the Test iterator twice in a row; the Train iterator should be used before each use of the Test iterator. PLEASE NOTE: You can’t use this iterator if the underlying iterator uses randomization/shuffling between epochs.

    DataSetIteratorSplitter

    The only constructor

    • param baseIterator - iterator to be wrapped and split

    • param totalBatches - total batches in baseIterator

    • param ratio - train/test split ratio
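Using the worked numbers above (100000 examples, batch size 32, ratio 0.7), a sketch with an assumed underlying iterator 'baseIterator':

```java
import org.deeplearning4j.datasets.iterator.DataSetIteratorSplitter;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// 100000 examples / batch size 32 = 3125 total batches
DataSetIteratorSplitter splitter = new DataSetIteratorSplitter(baseIterator, 3125, 0.7);

DataSetIterator train = splitter.getTrainIterator(); // 2187 batches
DataSetIterator test  = splitter.getTestIterator();  //  938 batches
```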

    getTrainIterator

    This method returns train iterator instance

    • return

getTestIterator

    This method returns test iterator instance

    • return

    JointMultiDataSetIterator

This dataset iterator combines multiple DataSetIterators into 1 MultiDataSetIterator. Values from each iterator are joined on a per-example basis - i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come from either one of the underlying DataSetIterators only (if ‘outcome’ is >= 0) or from all iterators (if outcome is &lt; 0)

    JointMultiDataSetIterator

    • param iterators Underlying iterators to wrap

JointMultiDataSetIterator

    • param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet

    • param iterators Underlying iterators to wrap

    setPreProcessor

    Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

    • param preProcessor MultiDataSetPreProcessor. May be null.

    getPreProcessor

Get the MultiDataSetPreProcessor, if one has previously been set. Returns null if no preprocessor has been set

    • return Preprocessor

    resetSupported

    Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

    • return true if reset method is supported; false otherwise

    asyncSupported

    Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

    • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

    reset

    Resets the iterator back to the beginning

    hasNext

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

• return true if the iteration has more elements

    next

    Returns the next element in the iteration.

    • return the next element in the iteration

    remove

    PLEASE NOTE: This method is NOT implemented

• throws UnsupportedOperationException if the remove operation is not supported by this iterator

• throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

• implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

    FloatsDataSetIterator

The first value in each pair is the features vector; the second value is the labels. Supports generating 2d features/labels only

    FloatsDataSetIterator

    • param iterable Iterable to source data from

    • param batchSize Batch size for generated DataSet objects
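The batching behaviour can be pictured in plain Java. The sketch below is a hypothetical stand-in (no DL4J types involved): it partitions (features, labels) pairs into minibatches of at most batchSize elements, which is the essential contract of this iterator family.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group {features, labels} pairs from a source list
// into minibatches of a fixed maximum size.
public class BatchingSketch {
    public static List<List<float[][]>> toBatches(List<float[][]> pairs, int batchSize) {
        List<List<float[][]>> batches = new ArrayList<>();
        for (int i = 0; i < pairs.size(); i += batchSize) {
            batches.add(new ArrayList<>(pairs.subList(i, Math.min(i + batchSize, pairs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<float[][]> pairs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            pairs.add(new float[][]{{i}, {i * 2f}}); // {features, labels}
        }
        // 5 pairs with batchSize 2 yield batches of sizes 2, 2, 1
        System.out.println(toBatches(pairs, 2).size()); // 3
    }
}
```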

    FileSplitDataSetIterator

Simple iterator working with a list of files. File-to-DataSet conversion is handled via the provided FileCallback implementation

    FileSplitDataSetIterator

    • param files List of files to iterate over

    • param callback Callback for loading the files

    MultipleEpochsIterator

    A dataset iterator for doing multiple passes over a dataset

Deprecated: use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead

    next

    Like the standard next method but allows a customizable number of examples returned

    • param num the number of examples

• return the next DataSet

    inputColumns

    Input columns for the dataset

• return the number of input columns

    totalOutcomes

    The number of labels for the dataset

• return the number of labels

    reset

    Resets the iterator back to the beginning

    batch

    Batch size

• return the batch size

    hasNext

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

• return true if the iteration has more elements

    remove

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

• throws UnsupportedOperationException if the remove operation is not supported by this iterator

• throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

    MultiDataSetWrapperIterator

This class is a simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly

    PLEASE NOTE: This only works if number of features/labels/masks is 1

    MultiDataSetWrapperIterator

• param iterator Underlying iterator to wrap

    RandomDataSetIterator

    RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

    RandomDataSetIterator

    • param numMiniBatches Number of minibatches per epoch

    • param featuresShape Features shape

    • param labelsShape Labels shape

    MultiDataSetIteratorAdapter

    Iterator that adapts a DataSetIterator to a MultiDataSetIterator

• param numLabels the overall number of examples

• param useSubset use a subset of the LFWDataSet

• param labelGenerator path label generator to use

• param train true to use the train split

• param splitTrainTest the percentage of data to use for train; the remainder goes to test

• param imageTransform how to transform the image

• param rng random number generator to lock in batch shuffling

• LETTERS: 145,600 examples total. 26 balanced classes

• DIGITS: 280,000 examples total. 10 balanced classes

• param seed Random number generator seed

• param numPossibleLabels Number of classes (possible labels) for classification

• return Random array of specified shape + contents

• param featureValues Type of values for the features

• param labelValues Type of values for the labels

Dataset sources:

http://yann.lecun.com/exdb/mnist/
https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data
https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg
https://pjreddie.com/projects/cifar-10-dataset-mirror/
https://archive.ics.uci.edu/ml/datasets/Iris
http://vis-www.cs.umass.edu/lfw/
http://cs231n.stanford.edu/
https://tiny-imagenet.herokuapp.com/
https://www.nist.gov/itl/iad/image-group/emnist-dataset
https://arxiv.org/abs/1702.05373
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

// pass an MNIST data iterator that automatically fetches data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
model.fit(mnistTrain);
// passing the iterator directly to the neural network for evaluation
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
model.evaluate(mnistTest);

// or using an evaluation class explicitly
mnistTest.reset(); //rewind the iterator before the second pass
Evaluation eval = new Evaluation(10); //create an evaluation object with 10 possible classes
while(mnistTest.hasNext()){
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatures()); //get the network's prediction
    eval.eval(next.getLabels(), output); //check the prediction against the true class
}
    public UciSequenceDataSetIterator(int batchSize) 
    public Cifar10DataSetIterator(int batchSize) 
    public IrisDataSetIterator()
    public DataSet next() 
    public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
                        PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
                        ImageTransform imageTransform, Random rng) 
    public TinyImageNetDataSetIterator(int batchSize) 
    public EmnistDataSetIterator(Set dataSet, int batch, boolean train) throws IOException 
    public static int numExamplesTrain(Set dataSet) 
    public static int numExamplesTest(Set dataSet) 
    public static int numLabels(Set dataSet) 
    public static boolean isBalanced(Set dataSet) 
rr.initialize(new FileSplit(new File("/path/to/directory")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
    //Label index (first arg): Always value 1 when using ImageRecordReader. For CSV etc: use index of the column
    //  that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
    //  at 0. Number of classes (second arg): number of label classes (i.e., 10 for MNIST - 10 digits)
    .classification(1, nClasses)
    .preProcessor(new ImagePreProcessingScaler())      //For normalization of image values 0-255 to 0-1
    .build();
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
    //Specify the columns that the regression labels/targets appear in. Note that all other columns will be
    //  treated as features. Column indexes start at 0
    .regression(labelColFrom, labelColTo)
    .build();
    public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize) 
    public void setCollectMetaData(boolean collectMetaData) 
    public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException 
    public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException 
    public Builder writableConverter(WritableConverter converter)
    public Builder maxNumBatches(int maxNumBatches)
    public Builder regression(int labelIndex)
    public Builder regression(int labelIndexFrom, int labelIndexTo)
    public Builder classification(int labelIndex, int numClasses)
    public Builder preProcessor(DataSetPreProcessor preProcessor)
    public Builder collectMetaData(boolean collectMetaData)
    public RecordReaderMultiDataSetIterator build() 
    public MultiDataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException 
    public MultiDataSet loadFromMetaData(List<RecordMetaData> list) throws IOException 
    public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader, SequenceRecordReader labels,
                        int miniBatchSize, int numPossibleLabels) 
    public boolean hasNext() 
    public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException 
    public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException 
    public MultiDataSet next(int num) 
    public void setPreProcessor(MultiDataSetPreProcessor preProcessor) 
    public boolean resetSupported() 
    public boolean asyncSupported() 
    public void reset() 
    public void shutdown() 
    public boolean hasNext() 
    public MultiDataSet next() 
    public void remove() 
    public AsyncDataSetIterator(DataSetIterator baseIterator) 
    public DataSet next(int num) 
    public int inputColumns() 
    public int totalOutcomes() 
    public boolean resetSupported() 
    public boolean asyncSupported() 
    public void reset() 
    public void shutdown() 
    public int batch() 
    public void setPreProcessor(DataSetPreProcessor preProcessor) 
    public DataSetPreProcessor getPreProcessor() 
    public boolean hasNext() 
    public DataSet next() 
    public void remove() 
    public DoublesDataSetIterator(@NonNull Iterable<Pair<double[], double[]>> iterable, int batchSize) 
    public SamplingDataSetIterator(DataSet sampleFrom, int batchSize, int totalNumberSamples) 
    public INDArrayDataSetIterator(@NonNull Iterable<Pair<INDArray, INDArray>> iterable, int batchSize) 
    public WorkspacesShieldDataSetIterator(@NonNull DataSetIterator iterator) 
    public MultiDataSetIteratorSplitter(@NonNull MultiDataSetIterator baseIterator, long totalBatches, double ratio) 
    public MultiDataSetIterator getTrainIterator() 
    public MultiDataSet next(int num) 
    public AsyncShieldDataSetIterator(@NonNull DataSetIterator iterator) 
    public DataSet next(int num) 
    public int inputColumns() 
    public int totalOutcomes() 
    public boolean resetSupported() 
    public boolean asyncSupported() 
    public void reset() 
    public int batch() 
    public void setPreProcessor(DataSetPreProcessor preProcessor) 
    public DataSetPreProcessor getPreProcessor() 
    public boolean hasNext() 
    public DataSet next() 
    public void remove() 
    public MultiDataSet next(int num) 
    public void setPreProcessor(MultiDataSetPreProcessor preProcessor) 
    public boolean resetSupported() 
    public boolean asyncSupported() 
    public void reset() 
    public boolean hasNext() 
    public MultiDataSet next() 
    public void remove() 
    public RandomMultiDataSetIterator(int numMiniBatches, @NonNull List<Triple<long[], Character, Values>> features, @NonNull List<Triple<long[], Character, Values>> labels)
    public Builder addFeatures(long[] shape, Values values) 
    public Builder addFeatures(long[] shape, char order, Values values)
    public Builder addLabels(long[] shape, Values values) 
    public Builder addLabels(long[] shape, char order, Values values)
    public static INDArray generate(long[] shape, Values values) 
    public static INDArray generate(long[] shape, char order, Values values)
    public EarlyTerminationMultiDataSetIterator(MultiDataSetIterator underlyingIterator, int terminationPoint) 
    public ExistingDataSetIterator(@NonNull Iterator<DataSet> iterator) 
    public DataSet next(int num) 
    public EarlyTerminationDataSetIterator(DataSetIterator underlyingIterator, int terminationPoint) 
    public DataSet next(int num) 
    public int inputColumns() 
    public int totalOutcomes() 
    public void reset() 
    public int batch() 
    public boolean hasNext() 
    public DataSet next() 
    public void remove() 
    public DataSetIteratorSplitter(@NonNull DataSetIterator baseIterator, long totalBatches, double ratio) 
    public DataSetIterator getTrainIterator() 
    public DataSet next(int i) 
    public JointMultiDataSetIterator(DataSetIterator... iterators) 
    public MultiDataSet next(int num) 
    public void setPreProcessor(MultiDataSetPreProcessor preProcessor) 
    public MultiDataSetPreProcessor getPreProcessor() 
    public boolean resetSupported() 
    public boolean asyncSupported() 
    public void reset() 
    public boolean hasNext() 
    public MultiDataSet next() 
    public void remove() 
    public FloatsDataSetIterator(@NonNull Iterable<Pair<float[], float[]>> iterable, int batchSize) 
    public FileSplitDataSetIterator(@NonNull List<File> files, @NonNull FileCallback callback) 
    public DataSet next(int num) 
    public int inputColumns() 
    public int totalOutcomes() 
    public void reset() 
    public int batch() 
    public boolean hasNext() 
    public void remove() 
    public MultiDataSetWrapperIterator(MultiDataSetIterator iterator) 
    public RandomDataSetIterator(int numMiniBatches, long[] featuresShape, long[] labelsShape, Values featureValues, Values labelValues)

    Math

    ClipByAvgNorm

    Clips tensor values to a maximum average L2-norm.

    • x (NUMERIC) - Input variable

    • clipValue - Value for clipping

    • dimensions - Dimensions to reduce over (Size: AtLeast(min=0))

    EmbeddingLookup

    Looks up ids in a list of embedding tensors.

    • x (NUMERIC) - Input tensor

    • indices (INT) - A Tensor containing the ids to be looked up.

• PartitionMode - partition mode: 0 = 'mod', 1 = 'div'

    MergeMaxIndex

Returns an array of the indices of the max elements along the tensor dimensions

    • x (NUMERIC) - Input tensor

    • dataType - Data type - default = DataType.INT

    abs

    Elementwise absolute value operation: out = abs(x)

    • x (NUMERIC) - Input variable

    acos

    Elementwise acos (arccosine, inverse cosine) operation: out = arccos(x)

    • x (NUMERIC) - Input variable

    acosh

    Elementwise acosh (inverse hyperbolic cosine) function: out = acosh(x)

    • x (NUMERIC) - Input variable

    add

    Pairwise addition operation, out = x + y

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy's.

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    add

    Scalar add operation, out = in + scalar

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    amax

    Absolute max array reduction operation, optionally along specified dimensions: out = max(abs(x))

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    amean

    Absolute mean array reduction operation, optionally along specified dimensions: out = mean(abs(x))

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    amin

    Absolute min array reduction operation, optionally along specified dimensions: out = min(abs(x))

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    and

    Boolean AND operation: elementwise (x != 0) && (y != 0)

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    Returns an array with values 1 where condition is satisfied, or value 0 otherwise.

    • x (BOOL) - Input 1

    • y (BOOL) - Input 2

    asin

    Elementwise asin (arcsin, inverse sine) operation: out = arcsin(x)

    • x (NUMERIC) - Input variable

    asinh

    Elementwise asinh (inverse hyperbolic sine) function: out = asinh(x)

    • x (NUMERIC) - Input variable

    asum

    Absolute sum array reduction operation, optionally along specified dimensions: out = sum(abs(x))

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    atan

    Elementwise atan (arctangent, inverse tangent) operation: out = arctangent(x)

    • x (NUMERIC) - Input variable

    atan2

Elementwise atan2 (two-argument arctangent) operation: out = atan2(x,y).

Similar to atan(y/x), but the signs of x and y are used to determine the quadrant of the result

    • y (NUMERIC) - Input Y variable

    • x (NUMERIC) - Input X variable
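The quadrant behaviour can be seen with plain `java.lang.Math` (illustrative only, not the ND4J op itself):

```java
// atan2 uses the signs of both arguments to pick the correct quadrant,
// unlike atan(y/x), which collapses opposite quadrants together.
public class Atan2Demo {
    public static void main(String[] args) {
        double y = 1.0, x = -1.0;            // point in the second quadrant
        double naive = Math.atan(y / x);     // -pi/4: wrong quadrant
        double full  = Math.atan2(y, x);     //  3*pi/4: correct quadrant
        System.out.println(naive + " vs " + full);
    }
}
```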

    atanh

    Elementwise atanh (inverse hyperbolic tangent) function: out = atanh(x)

    • x (NUMERIC) - Input variable

    bitShift

    Bit shift operation

    • x (NUMERIC) - input

    • shift (NUMERIC) - shift value

    bitShiftRight

    Right bit shift operation

    • x (NUMERIC) - Input tensor

    • shift (NUMERIC) - shift argument

    bitShiftRotl

    Cyclic bit shift operation

    • x (NUMERIC) - Input tensor

• shift (NUMERIC) - shift argument

    bitShiftRotr

    Cyclic right shift operation

    • x (NUMERIC) - Input tensor

    • shift (NUMERIC) - Shift argument

    ceil

    Element-wise ceiling function: out = ceil(x).

    Rounds each value up to the nearest integer value (if not already an integer)

    • x (NUMERIC) - Input variable

    clipByNorm

    Clipping by L2 norm, optionally along dimension(s)

if l2Norm(x,dimension) < clipValue, then the input is returned unmodified

    Otherwise, out[i] = in[i] * clipValue / l2Norm(in, dimensions) where each value is clipped according

    to the corresponding l2Norm along the specified dimensions

    • x (NUMERIC) - Input variable

    • clipValue - Clipping value (maximum l2 norm)

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    clipByValue

    Element-wise clipping function:

    out[i] = in[i] if in[i] >= clipValueMin and in[i] <= clipValueMax

    out[i] = clipValueMin if in[i] < clipValueMin

    out[i] = clipValueMax if in[i] > clipValueMax

    • x (NUMERIC) - Input variable

    • clipValueMin - Minimum value for clipping

    • clipValueMax - Maximum value for clipping
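The three cases above can be sketched in plain Java (illustrative only, not the ND4J op itself):

```java
// Element-wise clipping to [min, max]: values below min become min,
// values above max become max, everything else passes through unchanged.
public class ClipByValueDemo {
    public static double[] clip(double[] in, double min, double max) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.min(max, Math.max(min, in[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] clipped = clip(new double[]{-2.0, 0.5, 3.0}, -1.0, 1.0);
        System.out.println(java.util.Arrays.toString(clipped)); // [-1.0, 0.5, 1.0]
    }
}
```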

    confusionMatrix

    Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of

    which are represented as integer values. This version assumes the number of classes is 1 + max(max(labels), max(pred))

    For example, if labels = [0, 1, 1] and predicted = [0, 2, 1] then output is:

    [1, 0, 0]

    [0, 1, 1]

    [0, 0, 0]

    • labels (NUMERIC) - Labels - 1D array of integer values representing label values

    • pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels

    • dataType - Data type
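The inferred-class-count variant can be sketched in plain Java (illustrative only, not the ND4J implementation); it reproduces the example above:

```java
public class ConfusionMatrixDemo {
    // numClasses inferred as 1 + max(max(labels), max(pred)), as described above.
    public static int[][] confusionMatrix(int[] labels, int[] pred) {
        int numClasses = 0;
        for (int v : labels) numClasses = Math.max(numClasses, v + 1);
        for (int v : pred)   numClasses = Math.max(numClasses, v + 1);
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < labels.length; i++) {
            m[labels[i]][pred[i]]++;   // rows = true labels, columns = predictions
        }
        return m;
    }

    public static void main(String[] args) {
        int[][] m = confusionMatrix(new int[]{0, 1, 1}, new int[]{0, 2, 1});
        System.out.println(java.util.Arrays.deepToString(m));
        // [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
    }
}
```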

    confusionMatrix

    Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of

    which are represented as integer values.

    For example, if labels = [0, 1, 1], predicted = [0, 2, 1], and numClasses=4 then output is:

    [1, 0, 0, 0]

    [0, 1, 1, 0]

    [0, 0, 0, 0]

    [0, 0, 0, 0]

    • labels (NUMERIC) - Labels - 1D array of integer values representing label values

    • pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels

    • numClasses - Number of classes

    confusionMatrix

    Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of

    which are represented as integer values. This version assumes the number of classes is 1 + max(max(labels), max(pred))

    For example, if labels = [0, 1, 1], predicted = [0, 2, 1] and weights = [1, 2, 3]

    [1, 0, 0]

    [0, 3, 2]

    [0, 0, 0]

    • labels (NUMERIC) - Labels - 1D array of integer values representing label values

    • pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels

    • weights (NUMERIC) - Weights - 1D array of values (may be real/decimal) representing the weight/contribution of each prediction. Must be same length as both labels and predictions arrays

    confusionMatrix

    Compute the 2d confusion matrix of size [numClasses, numClasses] from a pair of labels and predictions, both of

    which are represented as integer values.

    For example, if labels = [0, 1, 1], predicted = [0, 2, 1], numClasses = 4, and weights = [1, 2, 3]

    [1, 0, 0, 0]

    [0, 3, 2, 0]

    [0, 0, 0, 0]

    [0, 0, 0, 0]

    • labels (NUMERIC) - Labels - 1D array of integer values representing label values

    • pred (NUMERIC) - Predictions - 1D array of integer values representing predictions. Same length as labels

    • weights (NUMERIC) - Weights - 1D array of values (may be real/decimal) representing the weight/contribution of each prediction. Must be same length as both labels and predictions arrays

    cos

    Elementwise cosine operation: out = cos(x)

    • x (NUMERIC) - Input variable

    cosh

    Elementwise cosh (hyperbolic cosine) operation: out = cosh(x)

    • x (NUMERIC) - Input variable

    cosineDistance

    Cosine distance reduction operation. The output contains the cosine distance for each

    tensor/subset along the specified dimensions:

    out = 1.0 - cosineSimilarity(x,y)

    • x (NUMERIC) - Input variable x

    • y (NUMERIC) - Input variable y

    • dimensions - Dimensions to calculate cosineDistance over (Size: AtLeast(min=0))

    cosineSimilarity

    Cosine similarity pairwise reduction operation. The output contains the cosine similarity for each tensor/subset

    along the specified dimensions:

out = (sum_i x[i] y[i]) / ( sqrt(sum_i x[i]^2) sqrt(sum_i y[i]^2) )

    • x (NUMERIC) - Input variable x

    • y (NUMERIC) - Input variable y

    • dimensions - Dimensions to calculate cosineSimilarity over (Size: AtLeast(min=0))
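The formula above, for the full-array case, in plain Java (illustrative only, not the ND4J op):

```java
public class CosineSimilarityDemo {
    // cosineSimilarity = dot(x, y) / (||x|| * ||y||)
    public static double cosineSimilarity(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx  += x[i] * x[i];
            ny  += y[i] * y[i];
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(new double[]{1, 0}, new double[]{0, 1})); // orthogonal: ~0
        System.out.println(cosineSimilarity(new double[]{2, 2}, new double[]{1, 1})); // parallel: ~1
    }
}
```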

    countNonZero

    Count non zero array reduction operation, optionally along specified dimensions: out = count(x != 0)

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    countZero

    Count zero array reduction operation, optionally along specified dimensions: out = count(x == 0)

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    cross

Returns the pair-wise cross product of equal size arrays a and b: |a x b| = ||a|| ||b|| sin(theta).

    Can take rank 1 or above inputs (of equal shapes), but note that the last dimension must have dimension 3

    • a (NUMERIC) - First input

    • b (NUMERIC) - Second input
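For a single pair of 3-vectors, the component formula is the familiar one; a plain-Java sketch (illustrative only, not the ND4J op):

```java
public class CrossProductDemo {
    // The last dimension must have size 3; shown here for one pair of 3-vectors.
    public static double[] cross(double[] a, double[] b) {
        return new double[]{
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    public static void main(String[] args) {
        // x-axis cross y-axis = z-axis
        double[] z = cross(new double[]{1, 0, 0}, new double[]{0, 1, 0});
        System.out.println(java.util.Arrays.toString(z)); // [0.0, 0.0, 1.0]
    }
}
```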

    cube

    Element-wise cube function: out = x^3

    • x (NUMERIC) - Input variable

    diag

    Returns an output variable with diagonal values equal to the specified values; off-diagonal values will be set to 0

    For example, if input = [1,2,3], then output is given by:

    [ 1, 0, 0]

    [ 0, 2, 0]

    [ 0, 0, 3]

    Higher input ranks are also supported: if input has shape [a,...,R-1] then output[i,...,k,i,...,k] = input[i,...,k].

    i.e., for input rank R, output has rank 2R

    • x (NUMERIC) - Input variable
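The rank-1 case in plain Java (illustrative only, not the ND4J op):

```java
public class DiagDemo {
    // Rank-1 diag: place the input on the main diagonal, zeros elsewhere.
    public static double[][] diag(double[] x) {
        double[][] out = new double[x.length][x.length];
        for (int i = 0; i < x.length; i++) out[i][i] = x[i];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.deepToString(diag(new double[]{1, 2, 3})));
        // [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    }
}
```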

    diagPart

    Extract the diagonal part from the input array.

    If input is

    [ 1, 0, 0]

    [ 0, 2, 0]

    [ 0, 0, 3]

    then output is [1, 2, 3].

    Supports higher dimensions: in general, out[i,...,k] = in[i,...,k,i,...,k]

    • x (NUMERIC) - Input variable

    div

    Pairwise division operation, out = x / y

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy's.

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    div

    Scalar division operation, out = in / scalar

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    entropy

    Entropy reduction: -sum(x * log(x))

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
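The full-array reduction case in plain Java (illustrative only; assumes strictly positive inputs, since log(0) is undefined):

```java
public class EntropyDemo {
    // Full-array entropy reduction: -sum(x * log(x))
    public static double entropy(double[] x) {
        double sum = 0;
        for (double v : x) sum += v * Math.log(v);
        return -sum;
    }

    public static void main(String[] args) {
        // Uniform distribution over 4 outcomes: entropy = log(4)
        System.out.println(entropy(new double[]{0.25, 0.25, 0.25, 0.25}));
    }
}
```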

    erf

    Element-wise Gaussian error function - out = erf(in)

    • x (NUMERIC) - Input variable

    erfc

    Element-wise complementary Gaussian error function - out = erfc(in) = 1 - erf(in)

    • x (NUMERIC) - Input variable

    euclideanDistance

    Euclidean distance (l2 norm, l2 distance) reduction operation. The output contains the Euclidean distance for each

    tensor/subset along the specified dimensions:

    out = sqrt( sum_i (x[i] - y[i])^2 )

    • x (NUMERIC) - Input variable x

    • y (NUMERIC) - Input variable y

    • dimensions - Dimensions to calculate euclideanDistance over (Size: AtLeast(min=0))

    exp

    Elementwise exponent function: out = exp(x) = 2.71828...^x

    • x (NUMERIC) - Input variable

    expm1

Elementwise exponent-minus-one function: out = exp(x) - 1 = 2.71828...^x - 1

    • x (NUMERIC) - Input variable
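Note that expm1 conventionally computes exp(x) - 1 (as java.lang.Math.expm1 does), and its point is numerical accuracy for small x:

```java
public class Expm1Demo {
    public static void main(String[] args) {
        double x = 1e-12;
        // Naive exp(x) - 1 loses most of its precision for tiny x,
        // because exp(x) rounds to a value extremely close to 1.0;
        // expm1 avoids the cancellation (result is ~x for small x).
        System.out.println(Math.exp(x) - 1);  // dominated by rounding error
        System.out.println(Math.expm1(x));    // accurate
    }
}
```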

    eye

    Generate an identity matrix with the specified number of rows and columns.

    • rows - Number of rows

    eye

    As per eye(String, int, int, DataType) but with the default datatype, Eye.DEFAULT_DTYPE

    • rows - Number of rows

    • cols - Number of columns

    eye

    Generate an identity matrix with the specified number of rows and columns

    Example:

    • rows - Number of rows

    • cols - Number of columns

    • dataType - Data type

    eye

As per eye(int, int) but with the number of rows/columns specified as scalar INDArrays

    • rows (INT) - Number of rows

    • cols (INT) - Number of columns

    eye

    As per eye(String, int) but with the number of rows specified as a scalar INDArray

    • rows (INT) - Number of rows

    hashtag
    firstIndex

    First index reduction operation.

    Returns a variable that contains the index of the first element that matches the specified condition (for each

    slice along the specified dimensions)

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • in (NUMERIC) - Input variable

    • condition - Condition to check on input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))

    floor

    Element-wise floor function: out = floor(x).

    Rounds each value down to the nearest integer value (if not already an integer)

    • x (NUMERIC) - Input variable

    floorDiv

    Pairwise floor division operation, out = floor(x / y)

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy's.

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    floorMod

    Pairwise Modulus division operation

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

Broadcast rules are the same as NumPy's.

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    floorMod

    Scalar floor modulus operation

    • x (NUMERIC) - Input variable

    • value - Scalar value for op
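Floor modulus differs from Java's % operator for negative operands; the result takes the sign of the divisor (shown with java.lang.Math.floorMod, illustrative only):

```java
public class FloorModDemo {
    public static void main(String[] args) {
        // Java's % takes the sign of the dividend;
        // floor modulus takes the sign of the divisor.
        System.out.println(-7 % 3);               // -1
        System.out.println(Math.floorMod(-7, 3)); //  2
    }
}
```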

    hammingDistance

Hamming distance reduction operation. The output contains the Hamming distance for each

    tensor/subset along the specified dimensions:

    out = count( x[i] != y[i] )

    • x (NUMERIC) - Input variable x

    • y (NUMERIC) - Input variable y

    • dimensions - Dimensions to calculate hammingDistance over (Size: AtLeast(min=0))

    iamax

    Index of the max absolute value: argmax(abs(in))

    see argmax(String, INDArray, boolean, int...)

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))

    • keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false

    iamin

    Index of the min absolute value: argmin(abs(in))

    see argmin(String, INDArray, boolean, int...)

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))

    • keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false

    isFinite

    Is finite operation: elementwise isFinite(x)

    Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or

    value 0 otherwise

    • x (NUMERIC) - Input variable

    isInfinite

    Is infinite operation: elementwise isInfinite(x)

    Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or

    value 0 otherwise

    • x (NUMERIC) - Input variable

    isMax

    Is maximum operation: elementwise x == max(x)

    Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or

    value 0 otherwise

    • x (NUMERIC) - Input variable

    isNaN

    Is Not a Number operation: elementwise isNaN(x)

    Returns an array with the same shape/size as the input, with values 1 where condition is satisfied, or

    value 0 otherwise

    • x (NUMERIC) - Input variable

    isNonDecreasing

    Is the array non decreasing?

    An array is non-decreasing if for every valid i, x[i] <= x[i+1]. For Rank 2+ arrays, values are compared

    in 'c' (row major) order

    • x (NUMERIC) - Input variable

    isStrictlyIncreasing

    Is the array strictly increasing?

    An array is strictly increasing if for every valid i, x[i] < x[i+1]. For Rank 2+ arrays, values are compared

    in 'c' (row major) order

    • x (NUMERIC) - Input variable

    jaccardDistance

    Jaccard similarity reduction operation. The output contains the Jaccard distance for each tensor along the specified dimensions.

    • x (NUMERIC) - Input variable x

    • y (NUMERIC) - Input variable y

    • dimensions - Dimensions to calculate jaccardDistance over (Size: AtLeast(min=0))

    lastIndex

    Last index reduction operation.

    Returns a variable that contains the index of the last element that matches the specified condition (for each

    slice along the specified dimensions)

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • in (NUMERIC) - Input variable

    • condition - Condition to check on input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=1))
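    A plain-Python sketch of the lastIndex semantics over a single 1-D slice (illustrative only, not the ND4J/SameDiff call): scan the slice and keep the index of the last element satisfying the condition, returning -1 if none match.

    ```python
    # lastIndex over one slice: index of the last element matching the condition.
    def last_index(values, condition):
        idx = -1
        for i, v in enumerate(values):
            if condition(v):
                idx = i
        return idx

    print(last_index([0, 3, 0, 5, 0], lambda v: v > 0))  # -> 3
    ```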

    listDiff

    Calculates difference between inputs X and Y.

    • x (NUMERIC) - Input variable X

    • y (NUMERIC) - Input variable Y

    log

    Element-wise logarithm function (base e - natural logarithm): out = log(x)

    • x (NUMERIC) - Input variable

    log

    Element-wise logarithm function (with specified base): out = log_{base}(x)

    • x (NUMERIC) - Input variable

    • base - Logarithm base

    log1p

    Elementwise natural logarithm function: out = log_e (1 + x)

    • x (NUMERIC) - Input variable

    logEntropy

    Log entropy reduction: log(-sum(x * log(x)))

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    logSumExp

    Log-sum-exp reduction (optionally along dimension).

    Computes log(sum(exp(x)))

    • input (NUMERIC) - Input variable

    • dimensions - Optional dimensions to reduce along (Size: AtLeast(min=0))
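    A plain-Python sketch of the log-sum-exp math (illustrative only, not the ND4J/SameDiff call), using the standard max-shift trick so the exponentials cannot overflow where the naive form would:

    ```python
    import math

    # Numerically stable log(sum(exp(x))): shift by max(x) before exponentiating.
    def log_sum_exp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    print(log_sum_exp([1000.0, 1000.0]))  # -> 1000.6931..., where math.exp(1000.0) alone overflows
    ```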

    manhattanDistance

    Manhattan distance (l1 norm, l1 distance) reduction operation. The output contains the Manhattan distance for each

    tensor/subset along the specified dimensions:

    out = sum_i abs(x[i]-y[i])

    • x (NUMERIC) - Input variable x

    • y (NUMERIC) - Input variable y

    • dimensions - Dimensions to calculate manhattanDistance over (Size: AtLeast(min=0))

    matrixDeterminant

    Matrix determinant op. For 2D input, this returns the standard matrix determinant.

    For higher dimensional input with shape [..., m, m] the matrix determinant is returned for each

    shape [m,m] sub-matrix.

    • in (NUMERIC) - Input

    matrixInverse

    Matrix inverse op. For 2D input, this returns the standard matrix inverse.

    For higher dimensional input with shape [..., m, m] the matrix inverse is returned for each

    shape [m,m] sub-matrix.

    • in (NUMERIC) - Input

    max

    Pairwise max operation, out = max(x, y)

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - First input variable, x

    • y (NUMERIC) - Second input variable, y

    mergeAdd

    Merge add function: merges an arbitrary number of equal shaped arrays using element-wise addition:

    out = sum_i in[i]

    • inputs (NUMERIC) - Input variables

    mergeAvg

    Merge average function: merges an arbitrary number of equal shaped arrays using element-wise mean operation:

    out = mean_i in[i]

    • inputs (NUMERIC) - Input variables

    mergeMax

    Merge max function: merges an arbitrary number of equal shaped arrays using element-wise maximum operation:

    out = max_i in[i]

    • inputs (NUMERIC) - Input variables

    meshgrid

    Broadcasts parameters for evaluation on an N-D grid.

    • inputs (NUMERIC) -

    • cartesian -

    min

    Pairwise min operation, out = min(x, y)

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - First input variable, x

    • y (NUMERIC) - Second input variable, y

    mod

    Pairwise modulus (remainder) operation, out = x % y

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    moments

    Calculate the mean and (population) variance for the input variable, for the specified axis

    • input (NUMERIC) - Input to calculate moments for

    • axes - Dimensions to perform calculation over (Size: AtLeast(min=0))
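    A plain-Python sketch of the two moments for a single 1-D slice (illustrative only, not the ND4J/SameDiff call). Note the variance is the population variance, dividing by N rather than N - 1.

    ```python
    # moments over one slice: mean and population variance (divide by N).
    def moments(xs):
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / n
        return mean, var

    print(moments([1.0, 2.0, 3.0, 4.0]))  # -> (2.5, 1.25)
    ```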

    mul

    Pairwise multiplication operation, out = x * y

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    mul

    Scalar multiplication operation, out = in * scalar

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    neg

    Elementwise negative operation: out = -x

    • x (NUMERIC) - Input variable

    normalizeMoments

    Calculate the mean and variance from the sufficient statistics

    • counts (NUMERIC) - Rank 0 (scalar) value with the total number of values used to calculate the sufficient statistics

    • means (NUMERIC) - Mean-value sufficient statistics: this is the SUM of all data values

    • variances (NUMERIC) - Variance sufficient statistics: this is the squared sum of all data values
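    A plain-Python sketch of recovering the moments from the sufficient statistics for the simple case with no shift (illustrative only, not the ND4J/SameDiff call): the mean is the sum divided by the count, and the variance is E[x^2] - mean^2.

    ```python
    # normalizeMoments with shift = 0:
    # counts = N, sum_x = sum of values, sum_sq = sum of squared values.
    def normalize_moments(counts, sum_x, sum_sq):
        mean = sum_x / counts
        variance = sum_sq / counts - mean * mean
        return mean, variance

    # Data [1, 2, 3]: counts = 3, sum = 6, sum of squares = 14
    print(normalize_moments(3, 6.0, 14.0))  # -> (2.0, 0.666...)
    ```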

    or

    Boolean OR operation: elementwise (x != 0) || (y != 0)

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    Returns an array with values 1 where condition is satisfied, or value 0 otherwise.

    • x (BOOL) - Input 1

    • y (BOOL) - Input 2

    pow

    Element-wise power function: out = x^value

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    pow

    Element-wise (broadcastable) power function: out = x[i]^y[i]

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Power

    rationalTanh

    Rational Tanh Approximation elementwise function, as described in the paper:

    Compact Convolutional Neural Network Cascade for Face Detection

    This is a faster Tanh approximation

    • x (NUMERIC) - Input variable

    rdiv

    Pairwise reverse division operation, out = y / x

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    rdiv

    Scalar reverse division operation, out = scalar / in

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    reciprocal

    Element-wise reciprocal (inverse) function: out[i] = 1 / in[i]

    • x (NUMERIC) - Input variable

    rectifiedTanh

    Rectified tanh operation: max(0, tanh(in))

    • x (NUMERIC) - Input variable

    round

    Element-wise round function: out = round(x).

    Rounds (up or down depending on value) to the nearest integer value.

    • x (NUMERIC) - Input variable

    rsqrt

    Element-wise reciprocal (inverse) of square root: out = 1.0 / sqrt(x)

    • x (NUMERIC) - Input variable

    rsub

    Pairwise reverse subtraction operation, out = y - x

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    rsub

    Scalar reverse subtraction operation, out = scalar - in

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    setDiag

    Set the diagonal value to the specified values

    If input is

    [ a, b, c]

    [ d, e, f]

    [ g, h, i]

    and diag = [ 1, 2, 3] then output is

    [ 1, b, c]

    [ d, 2, f]

    [ g, h, 3]

    • in (NUMERIC) - Input variable

    • diag (NUMERIC) - Diagonal
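    The worked matrix above can be sketched in plain Python (illustrative only, not the ND4J/SameDiff call): write diag[i] into position [i][i] of the input.

    ```python
    # setDiag: replace the main diagonal of a matrix with the given values.
    def set_diag(matrix, diag):
        for i, d in enumerate(diag):
            matrix[i][i] = d
        return matrix

    m = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
    print(set_diag(m, [1, 2, 3]))  # -> [[1, 'b', 'c'], ['d', 2, 'f'], ['g', 'h', 3]]
    ```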

    shannonEntropy

    Shannon Entropy reduction: -sum(x * log2(x))

    • in (NUMERIC) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
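    A plain-Python sketch of the Shannon entropy formula for one slice (illustrative only, not the ND4J/SameDiff call), using the usual convention that 0 * log2(0) contributes 0:

    ```python
    import math

    # Shannon entropy: -sum(x * log2(x)), skipping zero entries by convention.
    def shannon_entropy(xs):
        return -sum(x * math.log2(x) for x in xs if x > 0)

    print(shannon_entropy([0.5, 0.5]))  # -> 1.0 (one bit for a fair coin)
    ```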

    sign

    Element-wise sign (signum) function:

    out = -1 if in < 0

    out = 0 if in = 0

    out = 1 if in > 0

    • x (NUMERIC) - Input variable

    sin

    Elementwise sine operation: out = sin(x)

    • x (NUMERIC) - Input variable

    sinh

    Elementwise sinh (hyperbolic sine) operation: out = sinh(x)

    • x (NUMERIC) - Input variable

    sqrt

    Element-wise square root function: out = sqrt(x)

    • x (NUMERIC) - Input variable

    square

    Element-wise square function: out = x^2

    • x (NUMERIC) - Input variable

    squaredDifference

    Pairwise squared difference operation.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    standardize

    Standardize input variable along given axis

    out = (x - mean) / stdev

    with mean and stdev being calculated along the given dimension.

    For example: given x as a mini batch of the shape [numExamples, exampleLength]:

    • use dimension 1 to use the statistics (mean, stdev) for each example

    • use dimension 0 if you want to use the statistics for each column across all examples

    • use dimensions 0,1 if you want to use the statistics across all columns and examples
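    The formula above can be sketched for a single slice in plain Python (illustrative only, not the ND4J/SameDiff call), using the population stdev as in the moments op:

    ```python
    import math

    # standardize one slice: out = (x - mean) / stdev
    def standardize(xs):
        n = len(xs)
        mean = sum(xs) / n
        stdev = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
        return [(x - mean) / stdev for x in xs]

    print(standardize([1.0, 3.0]))  # -> [-1.0, 1.0]
    ```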

    step

    Elementwise step function:

    out(x) = 1 if x >= cutoff

    out(x) = 0 otherwise

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    sub

    Pairwise subtraction operation, out = x - y

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    sub

    Scalar subtraction operation, out = in - scalar

    • x (NUMERIC) - Input variable

    • value - Scalar value for op

    tan

    Elementwise tangent operation: out = tan(x)

    • x (NUMERIC) - Input variable

    tanh

    Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)

    • x (NUMERIC) - Input variable

    trace

    Matrix trace operation

    For rank 2 matrices, the output is a scalar with the trace - i.e., sum of the main diagonal.

    For higher rank inputs, output[a,b,c] = trace(in[a,b,c,:,:])

    • in (NUMERIC) - Input variable

    xor

    Boolean XOR (exclusive OR) operation: elementwise (x != 0) XOR (y != 0)

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    Returns an array with values 1 where condition is satisfied, or value 0 otherwise.

    • x (BOOL) - Input 1

    • y (BOOL) - Input 2

    zeroFraction

    Full array zero fraction array reduction operation, optionally along specified dimensions: out = (count(x == 0) / length(x))

    • input (NUMERIC) - Input variable
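    The zero-fraction formula can be sketched in plain Python (illustrative only, not the ND4J/SameDiff call):

    ```python
    # zeroFraction: count(x == 0) / length(x)
    def zero_fraction(xs):
        return sum(1 for x in xs if x == 0) / len(xs)

    print(zero_fraction([0, 1, 0, 2]))  # -> 0.5
    ```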

    INDArray ClipByAvgNorm(INDArray x, double clipValue, int[] dimensions)
    
    SDVariable ClipByAvgNorm(SDVariable x, double clipValue, int[] dimensions)
    SDVariable ClipByAvgNorm(String name, SDVariable x, double clipValue, int[] dimensions)

    numClasses -

    dimensions - (Size: AtLeast(min=0))

    keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false


    shift - Shift value, possibly 0, used when calculating the sufficient statistics (for numerical stability)

    x (NUMERIC) - Input variable

  • dimensions - (Size: AtLeast(min=1))

  • https://docs.scipy.org/doc/numpy/user/basics.broadcasting.htmlarrow-up-right
    INDArray EmbeddingLookup(INDArray x, INDArray indices, PartitionMode PartitionMode)
    
    SDVariable EmbeddingLookup(SDVariable x, SDVariable indices, PartitionMode PartitionMode)
    SDVariable EmbeddingLookup(String name, SDVariable x, SDVariable indices, PartitionMode PartitionMode)
    INDArray MergeMaxIndex(INDArray x, DataType dataType)
    INDArray MergeMaxIndex(INDArray x)
    
    SDVariable MergeMaxIndex(SDVariable x, DataType dataType)
    SDVariable MergeMaxIndex(SDVariable x)
    SDVariable MergeMaxIndex(String name, SDVariable x, DataType dataType)
    SDVariable MergeMaxIndex(String name, SDVariable x)
    INDArray abs(INDArray x)
    
    SDVariable abs(SDVariable x)
    SDVariable abs(String name, SDVariable x)
    INDArray acos(INDArray x)
    
    SDVariable acos(SDVariable x)
    SDVariable acos(String name, SDVariable x)
    INDArray acosh(INDArray x)
    
    SDVariable acosh(SDVariable x)
    SDVariable acosh(String name, SDVariable x)
    INDArray add(INDArray x, INDArray y)
    
    SDVariable add(SDVariable x, SDVariable y)
    SDVariable add(String name, SDVariable x, SDVariable y)
    INDArray add(INDArray x, double value)
    
    SDVariable add(SDVariable x, double value)
    SDVariable add(String name, SDVariable x, double value)
    INDArray amax(INDArray in, int[] dimensions)
    
    SDVariable amax(SDVariable in, int[] dimensions)
    SDVariable amax(String name, SDVariable in, int[] dimensions)
    INDArray amean(INDArray in, int[] dimensions)
    
    SDVariable amean(SDVariable in, int[] dimensions)
    SDVariable amean(String name, SDVariable in, int[] dimensions)
    INDArray amin(INDArray in, int[] dimensions)
    
    SDVariable amin(SDVariable in, int[] dimensions)
    SDVariable amin(String name, SDVariable in, int[] dimensions)
    INDArray and(INDArray x, INDArray y)
    
    SDVariable and(SDVariable x, SDVariable y)
    SDVariable and(String name, SDVariable x, SDVariable y)
    INDArray asin(INDArray x)
    
    SDVariable asin(SDVariable x)
    SDVariable asin(String name, SDVariable x)
    INDArray asinh(INDArray x)
    
    SDVariable asinh(SDVariable x)
    SDVariable asinh(String name, SDVariable x)
    INDArray asum(INDArray in, int[] dimensions)
    
    SDVariable asum(SDVariable in, int[] dimensions)
    SDVariable asum(String name, SDVariable in, int[] dimensions)
    INDArray atan(INDArray x)
    
    SDVariable atan(SDVariable x)
    SDVariable atan(String name, SDVariable x)
    INDArray atan2(INDArray y, INDArray x)
    
    SDVariable atan2(SDVariable y, SDVariable x)
    SDVariable atan2(String name, SDVariable y, SDVariable x)
    INDArray atanh(INDArray x)
    
    SDVariable atanh(SDVariable x)
    SDVariable atanh(String name, SDVariable x)
    INDArray bitShift(INDArray x, INDArray shift)
    
    SDVariable bitShift(SDVariable x, SDVariable shift)
    SDVariable bitShift(String name, SDVariable x, SDVariable shift)
    INDArray bitShiftRight(INDArray x, INDArray shift)
    
    SDVariable bitShiftRight(SDVariable x, SDVariable shift)
    SDVariable bitShiftRight(String name, SDVariable x, SDVariable shift)
    INDArray bitShiftRotl(INDArray x, INDArray shift)
    
    SDVariable bitShiftRotl(SDVariable x, SDVariable shift)
    SDVariable bitShiftRotl(String name, SDVariable x, SDVariable shift)
    INDArray bitShiftRotr(INDArray x, INDArray shift)
    
    SDVariable bitShiftRotr(SDVariable x, SDVariable shift)
    SDVariable bitShiftRotr(String name, SDVariable x, SDVariable shift)
    INDArray ceil(INDArray x)
    
    SDVariable ceil(SDVariable x)
    SDVariable ceil(String name, SDVariable x)
    INDArray clipByNorm(INDArray x, double clipValue, int[] dimensions)
    
    SDVariable clipByNorm(SDVariable x, double clipValue, int[] dimensions)
    SDVariable clipByNorm(String name, SDVariable x, double clipValue, int[] dimensions)
    INDArray clipByValue(INDArray x, double clipValueMin, double clipValueMax)
    
    SDVariable clipByValue(SDVariable x, double clipValueMin, double clipValueMax)
    SDVariable clipByValue(String name, SDVariable x, double clipValueMin, double clipValueMax)
    INDArray confusionMatrix(INDArray labels, INDArray pred, DataType dataType)
    
    SDVariable confusionMatrix(SDVariable labels, SDVariable pred, DataType dataType)
    SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, DataType dataType)
    INDArray confusionMatrix(INDArray labels, INDArray pred, int numClasses)
    
    SDVariable confusionMatrix(SDVariable labels, SDVariable pred, int numClasses)
    SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, int numClasses)
    INDArray confusionMatrix(INDArray labels, INDArray pred, INDArray weights)
    
    SDVariable confusionMatrix(SDVariable labels, SDVariable pred, SDVariable weights)
    SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, SDVariable weights)
    INDArray confusionMatrix(INDArray labels, INDArray pred, INDArray weights, int numClasses)
    
    SDVariable confusionMatrix(SDVariable labels, SDVariable pred, SDVariable weights, int numClasses)
    SDVariable confusionMatrix(String name, SDVariable labels, SDVariable pred, SDVariable weights, int numClasses)
    INDArray cos(INDArray x)
    
    SDVariable cos(SDVariable x)
    SDVariable cos(String name, SDVariable x)
    INDArray cosh(INDArray x)
    
    SDVariable cosh(SDVariable x)
    SDVariable cosh(String name, SDVariable x)
    INDArray cosineDistance(INDArray x, INDArray y, int[] dimensions)
    
    SDVariable cosineDistance(SDVariable x, SDVariable y, int[] dimensions)
    SDVariable cosineDistance(String name, SDVariable x, SDVariable y, int[] dimensions)
    INDArray cosineSimilarity(INDArray x, INDArray y, int[] dimensions)
    
    SDVariable cosineSimilarity(SDVariable x, SDVariable y, int[] dimensions)
    SDVariable cosineSimilarity(String name, SDVariable x, SDVariable y, int[] dimensions)
    INDArray countNonZero(INDArray in, int[] dimensions)
    
    SDVariable countNonZero(SDVariable in, int[] dimensions)
    SDVariable countNonZero(String name, SDVariable in, int[] dimensions)
    INDArray countZero(INDArray in, int[] dimensions)
    
    SDVariable countZero(SDVariable in, int[] dimensions)
    SDVariable countZero(String name, SDVariable in, int[] dimensions)
    INDArray cross(INDArray a, INDArray b)
    
    SDVariable cross(SDVariable a, SDVariable b)
    SDVariable cross(String name, SDVariable a, SDVariable b)
    INDArray cube(INDArray x)
    
    SDVariable cube(SDVariable x)
    SDVariable cube(String name, SDVariable x)
    INDArray diag(INDArray x)
    
    SDVariable diag(SDVariable x)
    SDVariable diag(String name, SDVariable x)
    INDArray diagPart(INDArray x)
    
    SDVariable diagPart(SDVariable x)
    SDVariable diagPart(String name, SDVariable x)
    INDArray div(INDArray x, INDArray y)
    
    SDVariable div(SDVariable x, SDVariable y)
    SDVariable div(String name, SDVariable x, SDVariable y)
    INDArray div(INDArray x, double value)
    
    SDVariable div(SDVariable x, double value)
    SDVariable div(String name, SDVariable x, double value)
    INDArray entropy(INDArray in, int[] dimensions)
    
    SDVariable entropy(SDVariable in, int[] dimensions)
    SDVariable entropy(String name, SDVariable in, int[] dimensions)
    INDArray erf(INDArray x)
    
    SDVariable erf(SDVariable x)
    SDVariable erf(String name, SDVariable x)
    INDArray erfc(INDArray x)
    
    SDVariable erfc(SDVariable x)
    SDVariable erfc(String name, SDVariable x)
    INDArray euclideanDistance(INDArray x, INDArray y, int[] dimensions)
    
    SDVariable euclideanDistance(SDVariable x, SDVariable y, int[] dimensions)
    SDVariable euclideanDistance(String name, SDVariable x, SDVariable y, int[] dimensions)
    INDArray exp(INDArray x)
    
    SDVariable exp(SDVariable x)
    SDVariable exp(String name, SDVariable x)
    INDArray expm1(INDArray x)
    
    SDVariable expm1(SDVariable x)
    SDVariable expm1(String name, SDVariable x)
    INDArray eye(int rows)
    
    SDVariable eye(int rows)
    SDVariable eye(String name, int rows)
    INDArray eye(int rows, int cols)
    
    SDVariable eye(int rows, int cols)
    SDVariable eye(String name, int rows, int cols)
    INDArray eye(int rows, int cols, DataType dataType, int[] dimensions)
    
    SDVariable eye(int rows, int cols, DataType dataType, int[] dimensions)
    SDVariable eye(String name, int rows, int cols, DataType dataType, int[] dimensions)
    
    
    INDArray eye = eye(3,2)
    
    eye:
    
    [ 1, 0]
    
    [ 0, 1]
    
    [ 0, 0]
    
    INDArray eye(INDArray rows, INDArray cols)
    
    SDVariable eye(SDVariable rows, SDVariable cols)
    SDVariable eye(String name, SDVariable rows, SDVariable cols)
    INDArray eye(INDArray rows)
    
    SDVariable eye(SDVariable rows)
    SDVariable eye(String name, SDVariable rows)
    INDArray firstIndex(INDArray in, Condition condition, int[] dimensions)
    INDArray firstIndex(INDArray in, Condition condition, boolean keepDims, int[] dimensions)
    
    SDVariable firstIndex(SDVariable in, Condition condition, int[] dimensions)
    SDVariable firstIndex(SDVariable in, Condition condition, boolean keepDims, int[] dimensions)
    SDVariable firstIndex(String name, SDVariable in, Condition condition, int[] dimensions)
    SDVariable firstIndex(String name, SDVariable in, Condition condition, boolean keepDims, int[] dimensions)
    INDArray floor(INDArray x)
    
    SDVariable floor(SDVariable x)
    SDVariable floor(String name, SDVariable x)
    INDArray floorDiv(INDArray x, INDArray y)
    
    SDVariable floorDiv(SDVariable x, SDVariable y)
    SDVariable floorDiv(String name, SDVariable x, SDVariable y)
    INDArray floorMod(INDArray x, INDArray y)
    
    SDVariable floorMod(SDVariable x, SDVariable y)
    SDVariable floorMod(String name, SDVariable x, SDVariable y)
    INDArray floorMod(INDArray x, double value)
    
    SDVariable floorMod(SDVariable x, double value)
    SDVariable floorMod(String name, SDVariable x, double value)
    INDArray hammingDistance(INDArray x, INDArray y, int[] dimensions)
    
    SDVariable hammingDistance(SDVariable x, SDVariable y, int[] dimensions)
    SDVariable hammingDistance(String name, SDVariable x, SDVariable y, int[] dimensions)
    INDArray iamax(INDArray in, int[] dimensions)
    INDArray iamax(INDArray in, boolean keepDims, int[] dimensions)
    
    SDVariable iamax(SDVariable in, int[] dimensions)
    SDVariable iamax(SDVariable in, boolean keepDims, int[] dimensions)
    SDVariable iamax(String name, SDVariable in, int[] dimensions)
    SDVariable iamax(String name, SDVariable in, boolean keepDims, int[] dimensions)
    INDArray iamin(INDArray in, int[] dimensions)
    INDArray iamin(INDArray in, boolean keepDims, int[] dimensions)
    
    SDVariable iamin(SDVariable in, int[] dimensions)
    SDVariable iamin(SDVariable in, boolean keepDims, int[] dimensions)
    SDVariable iamin(String name, SDVariable in, int[] dimensions)
    SDVariable iamin(String name, SDVariable in, boolean keepDims, int[] dimensions)
    INDArray isFinite(INDArray x)
    
    SDVariable isFinite(SDVariable x)
    SDVariable isFinite(String name, SDVariable x)
    INDArray isInfinite(INDArray x)
    
    SDVariable isInfinite(SDVariable x)
    SDVariable isInfinite(String name, SDVariable x)
    INDArray isMax(INDArray x)
    
    SDVariable isMax(SDVariable x)
    SDVariable isMax(String name, SDVariable x)
    INDArray isNaN(INDArray x)
    
    SDVariable isNaN(SDVariable x)
    SDVariable isNaN(String name, SDVariable x)
    INDArray isNonDecreasing(INDArray x)
    
    SDVariable isNonDecreasing(SDVariable x)
    SDVariable isNonDecreasing(String name, SDVariable x)
    INDArray isStrictlyIncreasing(INDArray x)
    
    SDVariable isStrictlyIncreasing(SDVariable x)
    SDVariable isStrictlyIncreasing(String name, SDVariable x)
    INDArray jaccardDistance(INDArray x, INDArray y, int[] dimensions)
    
    SDVariable jaccardDistance(SDVariable x, SDVariable y, int[] dimensions)
    SDVariable jaccardDistance(String name, SDVariable x, SDVariable y, int[] dimensions)
    INDArray lastIndex(INDArray in, Condition condition, int[] dimensions)
    INDArray lastIndex(INDArray in, Condition condition, boolean keepDims, int[] dimensions)
    
    SDVariable lastIndex(SDVariable in, Condition condition, int[] dimensions)
    SDVariable lastIndex(SDVariable in, Condition condition, boolean keepDims, int[] dimensions)
    SDVariable lastIndex(String name, SDVariable in, Condition condition, int[] dimensions)
    SDVariable lastIndex(String name, SDVariable in, Condition condition, boolean keepDims, int[] dimensions)
    INDArray[] listDiff(INDArray x, INDArray y)
    
    SDVariable[] listDiff(SDVariable x, SDVariable y)
    SDVariable[] listDiff(String name, SDVariable x, SDVariable y)
    INDArray log(INDArray x)
    
    SDVariable log(SDVariable x)
    SDVariable log(String name, SDVariable x)
    INDArray log(INDArray x, double base)
    
    SDVariable log(SDVariable x, double base)
    SDVariable log(String name, SDVariable x, double base)
    INDArray log1p(INDArray x)
    
    SDVariable log1p(SDVariable x)
    SDVariable log1p(String name, SDVariable x)
    INDArray logEntropy(INDArray in, int[] dimensions)
    
    SDVariable logEntropy(SDVariable in, int[] dimensions)
    SDVariable logEntropy(String name, SDVariable in, int[] dimensions)
    INDArray logSumExp(INDArray input, int[] dimensions)
    
    SDVariable logSumExp(SDVariable input, int[] dimensions)
    SDVariable logSumExp(String name, SDVariable input, int[] dimensions)
    INDArray manhattanDistance(INDArray x, INDArray y, int[] dimensions)
    
    SDVariable manhattanDistance(SDVariable x, SDVariable y, int[] dimensions)
    SDVariable manhattanDistance(String name, SDVariable x, SDVariable y, int[] dimensions)
    INDArray matrixDeterminant(INDArray in)
    
    SDVariable matrixDeterminant(SDVariable in)
    SDVariable matrixDeterminant(String name, SDVariable in)
    INDArray matrixInverse(INDArray in)
    
    SDVariable matrixInverse(SDVariable in)
    SDVariable matrixInverse(String name, SDVariable in)
    INDArray max(INDArray x, INDArray y)
    
    SDVariable max(SDVariable x, SDVariable y)
    SDVariable max(String name, SDVariable x, SDVariable y)
    INDArray mergeAdd(INDArray inputs)
    
    SDVariable mergeAdd(SDVariable inputs)
    SDVariable mergeAdd(String name, SDVariable inputs)
    INDArray mergeAvg(INDArray inputs)
    
    SDVariable mergeAvg(SDVariable inputs)
    SDVariable mergeAvg(String name, SDVariable inputs)
    INDArray mergeMax(INDArray inputs)
    
    SDVariable mergeMax(SDVariable inputs)
    SDVariable mergeMax(String name, SDVariable inputs)
    INDArray[] meshgrid(INDArray inputs, boolean cartesian)
    
    SDVariable[] meshgrid(SDVariable inputs, boolean cartesian)
    SDVariable[] meshgrid(String name, SDVariable inputs, boolean cartesian)
    INDArray min(INDArray x, INDArray y)
    
    SDVariable min(SDVariable x, SDVariable y)
    SDVariable min(String name, SDVariable x, SDVariable y)
    INDArray mod(INDArray x, INDArray y)
    
    SDVariable mod(SDVariable x, SDVariable y)
    SDVariable mod(String name, SDVariable x, SDVariable y)
    INDArray[] moments(INDArray input, int[] axes)
    
    SDVariable[] moments(SDVariable input, int[] axes)
    SDVariable[] moments(String name, SDVariable input, int[] axes)
    INDArray mul(INDArray x, INDArray y)
    
    SDVariable mul(SDVariable x, SDVariable y)
    SDVariable mul(String name, SDVariable x, SDVariable y)
    INDArray mul(INDArray x, double value)
    
    SDVariable mul(SDVariable x, double value)
    SDVariable mul(String name, SDVariable x, double value)
    INDArray neg(INDArray x)
    
    SDVariable neg(SDVariable x)
    SDVariable neg(String name, SDVariable x)
    INDArray[] normalizeMoments(INDArray counts, INDArray means, INDArray variances, double shift)
    
    SDVariable[] normalizeMoments(SDVariable counts, SDVariable means, SDVariable variances, double shift)
    SDVariable[] normalizeMoments(String name, SDVariable counts, SDVariable means, SDVariable variances, double shift)
    INDArray or(INDArray x, INDArray y)
    
    SDVariable or(SDVariable x, SDVariable y)
    SDVariable or(String name, SDVariable x, SDVariable y)
    INDArray pow(INDArray x, double value)
    
    SDVariable pow(SDVariable x, double value)
    SDVariable pow(String name, SDVariable x, double value)
    INDArray pow(INDArray x, INDArray y)
    
    SDVariable pow(SDVariable x, SDVariable y)
    SDVariable pow(String name, SDVariable x, SDVariable y)
    INDArray rationalTanh(INDArray x)
    
    SDVariable rationalTanh(SDVariable x)
    SDVariable rationalTanh(String name, SDVariable x)
    INDArray rdiv(INDArray x, INDArray y)
    
    SDVariable rdiv(SDVariable x, SDVariable y)
    SDVariable rdiv(String name, SDVariable x, SDVariable y)
    INDArray rdiv(INDArray x, double value)
    
    SDVariable rdiv(SDVariable x, double value)
    SDVariable rdiv(String name, SDVariable x, double value)
    INDArray reciprocal(INDArray x)
    
    SDVariable reciprocal(SDVariable x)
    SDVariable reciprocal(String name, SDVariable x)
    INDArray rectifiedTanh(INDArray x)
    
    SDVariable rectifiedTanh(SDVariable x)
    SDVariable rectifiedTanh(String name, SDVariable x)
    INDArray round(INDArray x)
    
    SDVariable round(SDVariable x)
    SDVariable round(String name, SDVariable x)
    INDArray rsqrt(INDArray x)
    
    SDVariable rsqrt(SDVariable x)
    SDVariable rsqrt(String name, SDVariable x)
    INDArray rsub(INDArray x, INDArray y)
    
    SDVariable rsub(SDVariable x, SDVariable y)
    SDVariable rsub(String name, SDVariable x, SDVariable y)
    INDArray rsub(INDArray x, double value)
    
    SDVariable rsub(SDVariable x, double value)
    SDVariable rsub(String name, SDVariable x, double value)
    INDArray setDiag(INDArray in, INDArray diag)
    
    SDVariable setDiag(SDVariable in, SDVariable diag)
    SDVariable setDiag(String name, SDVariable in, SDVariable diag)
    INDArray shannonEntropy(INDArray in, int[] dimensions)
    
    SDVariable shannonEntropy(SDVariable in, int[] dimensions)
    SDVariable shannonEntropy(String name, SDVariable in, int[] dimensions)
    INDArray sign(INDArray x)
    
    SDVariable sign(SDVariable x)
    SDVariable sign(String name, SDVariable x)
    INDArray sin(INDArray x)
    
    SDVariable sin(SDVariable x)
    SDVariable sin(String name, SDVariable x)
    INDArray sinh(INDArray x)
    
    SDVariable sinh(SDVariable x)
    SDVariable sinh(String name, SDVariable x)
    INDArray sqrt(INDArray x)
    
    SDVariable sqrt(SDVariable x)
    SDVariable sqrt(String name, SDVariable x)
    INDArray square(INDArray x)
    
    SDVariable square(SDVariable x)
    SDVariable square(String name, SDVariable x)
    INDArray squaredDifference(INDArray x, INDArray y)
    
    SDVariable squaredDifference(SDVariable x, SDVariable y)
    SDVariable squaredDifference(String name, SDVariable x, SDVariable y)
    INDArray standardize(INDArray x, int[] dimensions)
    
    SDVariable standardize(SDVariable x, int[] dimensions)
    SDVariable standardize(String name, SDVariable x, int[] dimensions)
    INDArray step(INDArray x, double value)
    
    SDVariable step(SDVariable x, double value)
    SDVariable step(String name, SDVariable x, double value)
    INDArray sub(INDArray x, INDArray y)
    
    SDVariable sub(SDVariable x, SDVariable y)
    SDVariable sub(String name, SDVariable x, SDVariable y)
    INDArray sub(INDArray x, double value)
    
    SDVariable sub(SDVariable x, double value)
    SDVariable sub(String name, SDVariable x, double value)
    INDArray tan(INDArray x)
    
    SDVariable tan(SDVariable x)
    SDVariable tan(String name, SDVariable x)
    INDArray tanh(INDArray x)
    
    SDVariable tanh(SDVariable x)
    SDVariable tanh(String name, SDVariable x)
    INDArray trace(INDArray in)
    
    SDVariable trace(SDVariable in)
    SDVariable trace(String name, SDVariable in)
    INDArray xor(INDArray x, INDArray y)
    
    SDVariable xor(SDVariable x, SDVariable y)
    SDVariable xor(String name, SDVariable x, SDVariable y)
    INDArray zeroFraction(INDArray input)
    
    SDVariable zeroFraction(SDVariable input)
    SDVariable zeroFraction(String name, SDVariable input)

    BaseOps


    These ops are generally available directly on SameDiff instances. Due to an oversight before the release, these ops are not also available on Nd4j. To use the INDArray variants of these operations, you will have to instantiate an NDBase instance.

    all

    Boolean and array reduction operation, optionally along specified dimensions

    • x (NDARRAY) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    any

    Boolean or array reduction operation, optionally along specified dimensions

    • x (NDARRAY) - Input variable

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    argmax

    Argmax array reduction operation, optionally along specified dimensions.

    Output values are the index of the maximum value of each slice along the specified dimension.

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • in (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))
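
The shape and index semantics above can be sketched in plain Java. This is a toy 2D illustration of argmax over dimension 1, not the ND4J API; the class and method names are invented for this example:

```java
import java.util.Arrays;

public class ArgMaxDemo {
    // Argmax over dimension 1 of a 2D array: one index per row,
    // pointing at that row's maximum value.
    public static int[] argmaxDim1(double[][] in) {
        int[] out = new int[in.length];
        for (int r = 0; r < in.length; r++) {
            int best = 0;
            for (int c = 1; c < in[r].length; c++) {
                if (in[r][c] > in[r][best]) best = c;
            }
            out[r] = best;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] in = {{1, 9, 3}, {7, 2, 5}};
        // Shape [2,3] reduced over dimension 1 -> shape [2] (keepDims = false)
        System.out.println(Arrays.toString(argmaxDim1(in))); // [1, 0]
    }
}
```

With keepDims = true the result would instead keep shape [2,1], with the same index values.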

    argmin

    Argmin array reduction operation, optionally along specified dimensions.

    Output values are the index of the minimum value of each slice along the specified dimension.

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • in (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    batchMmul

    Matrix multiply a batch of matrices. matricesA and matricesB have to be arrays of same

    length and each pair taken from these sets has to have dimensions (M, N) and (N, K),

    respectively. If transposeA is true, matrices from matricesA will have shape (N, M) instead.

    Likewise, if transposeB is true, matrices from matricesB will have shape (K, N).

    The result of this operation will be a batch of multiplied matrices. The

    result has the same length as both input batches and each output matrix is of shape (M, K).

    • inputsA (NUMERIC) - First array of input matrices, all of shape (M, N) or (N, M)

    • inputsB (NUMERIC) - Second array of input matrices, all of shape (N, K) or (K, N)

    • transposeA - Whether to transpose A arrays or not - default = false

    • transposeB - Whether to transpose B arrays or not - default = false

    castTo

    Cast the array to a new datatype - for example, Integer -> Float

    • arg (NDARRAY) - Input variable to cast

    • datatype - Datatype to cast to

    concat

    Concatenate a set of inputs along the specified dimension.

    Note that inputs must have identical rank and identical dimensions, other than the dimension to stack on.

    For example, if 2 inputs have shape [a, x, c] and [a, y, c] and dimension = 1, then the output has shape [a, x+y, c]

    • inputs (NUMERIC) - Input variables

    • dimension - Dimension to concatenate on

    cumprod

    Cumulative product operation.

    For input: [ a, b, c], output is:

    exclusive=false, reverse=false: [a, a*b, a*b*c]

    exclusive=true, reverse=false: [1, a, a*b]

    exclusive=false, reverse=true: [a*b*c, b*c, c]

    exclusive=true, reverse=true: [b*c, c, 1]

    • in (NUMERIC) - Input variable

    • exclusive - If true: exclude the first value - default = false

    • reverse - If true: reverse the direction of the accumulation - default = false

    cumsum

    Cumulative sum operation.

    For input: [ a, b, c], output is:

    exclusive=false, reverse=false: [a, a+b, a+b+c]

    exclusive=true, reverse=false: [0, a, a+b]

    exclusive=false, reverse=true: [a+b+c, b+c, c]

    exclusive=true, reverse=true: [b+c, c, 0]

    • in (NUMERIC) - Input variable

    • exclusive - If true: exclude the first value - default = false

    • reverse - If true: reverse the direction of the accumulation - default = false
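
The four flag combinations above can be sketched in plain Java. This is a toy 1D implementation of the cumsum semantics, not the ND4J API; the class and method names are invented for this example:

```java
import java.util.Arrays;

public class CumSumDemo {
    // Cumulative sum with the 'exclusive' and 'reverse' flags described above.
    public static double[] cumsum(double[] in, boolean exclusive, boolean reverse) {
        int n = in.length;
        double[] out = new double[n];
        double running = 0.0;
        for (int step = 0; step < n; step++) {
            // Walk forward or backward depending on 'reverse'
            int i = reverse ? n - 1 - step : step;
            if (exclusive) {
                out[i] = running;  // exclude the current element
                running += in[i];
            } else {
                running += in[i];
                out[i] = running;  // include the current element
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] in = {1, 2, 3};
        System.out.println(Arrays.toString(cumsum(in, false, false))); // [1.0, 3.0, 6.0]
        System.out.println(Arrays.toString(cumsum(in, true,  false))); // [0.0, 1.0, 3.0]
        System.out.println(Arrays.toString(cumsum(in, false, true)));  // [6.0, 5.0, 3.0]
        System.out.println(Arrays.toString(cumsum(in, true,  true)));  // [5.0, 3.0, 0.0]
    }
}
```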

    dot

    Pairwise dot product reduction along dimension

    output = sum(i=0 ... size(dim)-1) x[i] * y[i]

    • x (NUMERIC) - first input

    • y (NUMERIC) - second input

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    dynamicPartition

    Dynamically partition the input variable values into the specified number of partitions, using the indices.

    Example:

    • x (NUMERIC) - Input variable

    • partitions (INT) - 1D input with values 0 to numPartitions-1

    • numPartitions - Number of partitions, >= 1
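
A minimal plain-Java sketch of the partitioning semantics for a 1D input (not the ND4J API; the class and method names are invented for this example): element x[i] is routed to the partition given by partitions[i].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DynamicPartitionDemo {
    // Route x[i] into bucket partitions[i]; order within each bucket is preserved.
    public static double[][] partition(double[] x, int[] partitions, int numPartitions) {
        List<List<Double>> buckets = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) buckets.add(new ArrayList<>());
        for (int i = 0; i < x.length; i++) buckets.get(partitions[i]).add(x[i]);
        double[][] out = new double[numPartitions][];
        for (int p = 0; p < numPartitions; p++) {
            out[p] = buckets.get(p).stream().mapToDouble(Double::doubleValue).toArray();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {10, 20, 30, 40};
        int[] parts = {0, 1, 0, 1};
        double[][] out = partition(x, parts, 2);
        System.out.println(Arrays.toString(out[0])); // [10.0, 30.0]
        System.out.println(Arrays.toString(out[1])); // [20.0, 40.0]
    }
}
```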

    dynamicStitch

    Dynamically merge the specified input arrays into a single array, using the specified indices

    • indices (INT) - Indices to use when merging. Must be >= 1, same length as input variables

    • x (NUMERIC) - Input variables.

    eq

    Equals operation: elementwise x == y

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input array

    • y - Double value argument to use in operation

    eq

    Equal to operation: elementwise x == y

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input 1

    • y (NUMERIC) - Input 2

    expandDims

    Reshape the input by adding a 1 at the specified location.

    For example, if input has shape [a, b], then output shape is:

    axis = 0: [1, a, b]

    axis = 1: [a, 1, b]

    axis = 2: [a, b, 1]

    • x (NDARRAY) - Input variable

    • axis - Axis to expand

    fill

    Generate an output variable with the specified (dynamic) shape with all elements set to the specified value

    • shape (INT) - Shape: must be a 1D array/variable

    • dataType - Datatype of the output array

    • value - Value to set all elements to

    gather

    Gather slices from the input variable where the indices are specified as fixed int[] values.

    Output shape is same as input shape, except for axis dimension, which has size equal to indices.length.

    • df (NUMERIC) - Input variable

    • indices - Indices to get (Size: AtLeast(min=1))

    • axis - Axis that the indices refer to

    gather

    Gather slices from the input variable where the indices are specified as dynamic array values.

    Output shape is same as input shape, except for axis dimension, which has size equal to indices.length.

    • df (NUMERIC) - Input variable

    • indices (INT) - Indices to get slices for. Rank 0 or 1 input

    • axis - Axis that the indices refer to

    gatherNd

    Gather slices from df with shape specified by indices.

    • df (NUMERIC) -

    • indices (NUMERIC) -

    gt

    Greater than operation: elementwise x > y

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input array

    • y - Double value argument to use in operation

    gt

    Greater than operation: elementwise x > y

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input 1

    • y (NUMERIC) - Input 2

    gte

    Greater than or equals operation: elementwise x >= y

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input array

    • y - Double value argument to use in operation

    gte

    Greater than or equal to operation: elementwise x >= y

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input 1

    • y (NUMERIC) - Input 2

    identity

    Elementwise identity operation: out = x

    • input (NUMERIC) - Input variable

    invertPermutation

    Compute the inverse permutation indices for a permutation operation

    Example: if input is [2, 0, 1] then output is [1, 2, 0]

    The idea is that x.permute(input).permute(invertPermutation(input)) == x

    • input (INT) - 1D indices for permutation
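
The identity x.permute(input).permute(invertPermutation(input)) == x holds because the inverse satisfies out[in[i]] = i. A minimal plain-Java sketch (not the ND4J API; names invented for this example):

```java
import java.util.Arrays;

public class InvertPermutationDemo {
    // Inverse permutation: out[perm[i]] = i for every position i.
    public static int[] invert(int[] perm) {
        int[] out = new int[perm.length];
        for (int i = 0; i < perm.length; i++) {
            out[perm[i]] = i;
        }
        return out;
    }

    public static void main(String[] args) {
        // Matches the example above: [2, 0, 1] -> [1, 2, 0]
        System.out.println(Arrays.toString(invert(new int[]{2, 0, 1})));
    }
}
```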

    isNumericTensor

    Is the given tensor a numeric tensor? In the current version of ND4J/SameDiff, this always returns true/1

    • x (NUMERIC) - Input variable

    linspace

    Create a new 1d array with values evenly spaced between values 'start' and 'stop'

    For example, linspace(start=3.0, stop=4.0, number=3) will generate [3.0, 3.5, 4.0]

    • dataType - Data type of the output array

    • start - Start value

    • stop - Stop value

    • number - Number of values to generate

    linspace

    Create a new 1d array with values evenly spaced between values 'start' and 'stop'

    For example, linspace(start=3.0, stop=4.0, number=3) will generate [3.0, 3.5, 4.0]

    • start (NUMERIC) - Start value

    • stop (NUMERIC) - Stop value

    • number (LONG) - Number of values to generate

    lt

    Less than operation: elementwise x < y

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input array

    • y - Double value argument to use in operation

    lt

    Less than operation: elementwise x < y

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input 1

    • y (NUMERIC) - Input 2

    lte

    Less than or equals operation: elementwise x <= y

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input array

    • y - Double value argument to use in operation

    lte

    Less than or equal to operation: elementwise x <= y

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input 1

    • y (NUMERIC) - Input 2

    matchCondition

    Returns a boolean mask of equal shape to the input, where the condition is satisfied - value 1 where satisfied, 0 otherwise

    • in (NUMERIC) - Input

    • condition - Condition

    matchConditionCount

    Returns a count of the number of elements that satisfy the condition

    • in (NUMERIC) - Input

    • condition - Condition

    matchConditionCount

    Returns a count of the number of elements that satisfy the condition (for each slice along the specified dimensions)

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • in (NUMERIC) - Input variable

    • condition - Condition

    • keepDim - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    max

    Max array reduction operation, optionally along specified dimensions

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    max

    Element-wise maximum operation: out[i] = max(first[i], second[i])

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • first (NUMERIC) - First input array

    • second (NUMERIC) - Second input array

    mean

    Mean (average) array reduction operation, optionally along specified dimensions

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    merge

    The merge operation is a control operation that forwards either of its inputs to the output, as soon as

    the first of them becomes available. If both are available, the output is undefined (either input could

    be forwarded to the output)

    • x (NUMERIC) - Input variable

    • y (NUMERIC) - Input variable

    min

    Minimum array reduction operation, optionally along specified dimensions. out = min(in)

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    min

    Element-wise minimum operation: out[i] = min(first[i], second[i])

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    • first (NUMERIC) - First input array

    • second (NUMERIC) - Second input array

    mmul

    Matrix multiplication: out = mmul(x,y)

    Supports specifying transpose arguments to perform operations such as mmul(a^T, b), etc.

    • x (NUMERIC) - First input variable

    • y (NUMERIC) - Second input variable

    • transposeX - Transpose x (first argument) - default = false

    • transposeY - Transpose y (second argument) - default = false

    • transposeZ - Transpose the result array - default = false

    neq

    Not equals operation: elementwise x != y

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input array

    • y - Double value argument to use in operation

    neq

    Not equal to operation: elementwise x != y

    If x and y arrays have equal shape, the output shape is the same as these inputs.

    Note: supports broadcasting if x and y have different shapes and are broadcastable.

    For example, if X has shape [1,10] and Y has shape [5,10] then op(X,Y) has output shape [5,10]

    Broadcast rules are the same as NumPy:

    Return boolean array with values true where satisfied, or false otherwise.

    • x (NUMERIC) - Input 1

    • y (NUMERIC) - Input 2

    norm1

    Norm1 (L1 norm) reduction operation: The output contains the L1 norm for each tensor/subset along the specified dimensions:

    out = sum_i abs(x[i])

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - dimensions to reduce over (Size: AtLeast(min=0))

    norm2

    Norm2 (L2 norm) reduction operation: The output contains the L2 norm for each tensor/subset along the specified dimensions:

    out = sqrt(sum_i x[i]^2)

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - dimensions to reduce over (Size: AtLeast(min=0))

    normmax

    Max norm (infinity norm) reduction operation: The output contains the max norm for each tensor/subset along the

    specified dimensions:

    out = max(abs(x[i]))

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - dimensions to reduce over (Size: AtLeast(min=0))
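
The three norm reductions (norm1, norm2, normmax) can be sketched in plain Java for a full 1D reduction. This is a toy illustration of the formulas above, not the ND4J API; names are invented for this example:

```java
public class NormDemo {
    // norm1: sum of absolute values, out = sum_i abs(x[i])
    public static double norm1(double[] x) {
        double s = 0; for (double v : x) s += Math.abs(v); return s;
    }
    // norm2: sqrt of sum of squares, out = sqrt(sum_i x[i]^2)
    public static double norm2(double[] x) {
        double s = 0; for (double v : x) s += v * v; return Math.sqrt(s);
    }
    // normmax (infinity norm): out = max(abs(x[i]))
    public static double normmax(double[] x) {
        double m = 0; for (double v : x) m = Math.max(m, Math.abs(v)); return m;
    }

    public static void main(String[] args) {
        double[] x = {3, -4};
        System.out.println(norm1(x));   // 7.0
        System.out.println(norm2(x));   // 5.0
        System.out.println(normmax(x)); // 4.0
    }
}
```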

    oneHot

    Convert the array to a one-hot array with values on and off for each entry

    If input has shape [ a, ..., n] then output has shape [ a, ..., n, depth],

    with out[i, ..., j, in[i,...,j]] = on, with other values being set to off

    • indices (NUMERIC) - Indices - value 0 to depth-1

    • depth - Number of classes

    • axis -

    oneHot

    Convert the array to a one-hot array with values 0 and 1 for each entry

    If input has shape [ a, ..., n] then output has shape [ a, ..., n, depth],

    with out[i, ..., j, in[i,...,j]] = 1 with other values being set to 0

    see oneHot(SDVariable, int, int, double, double)

    • indices (NUMERIC) - Indices - value 0 to depth-1

    • depth - Number of classes
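
A minimal plain-Java sketch of the one-hot semantics for a 1D indices input (not the ND4J API; names invented for this example): output shape is [n, depth], with out[i][indices[i]] = 1 and everything else 0.

```java
import java.util.Arrays;

public class OneHotDemo {
    // One-hot encode a vector of class indices into an [n, depth] matrix.
    public static double[][] oneHot(int[] indices, int depth) {
        double[][] out = new double[indices.length][depth];
        for (int i = 0; i < indices.length; i++) {
            out[i][indices[i]] = 1.0;
        }
        return out;
    }

    public static void main(String[] args) {
        for (double[] row : oneHot(new int[]{0, 2, 1}, 3)) {
            System.out.println(Arrays.toString(row));
        }
        // [1.0, 0.0, 0.0]
        // [0.0, 0.0, 1.0]
        // [0.0, 1.0, 0.0]
    }
}
```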

    onesLike

    Return a variable of all 1s, with the same shape as the input variable. Note that this is dynamic:

    if the input shape changes in later execution, the returned variable's shape will also be updated

    • input (NUMERIC) - Input INDArray

    onesLike

    As per onesLike(String, SDVariable) but the output datatype may be specified

    • input (NUMERIC) -

    • dataType -

    permute

    Array permutation operation: permute the dimensions according to the specified permutation indices.

    Example: if input has shape [a,b,c] and dimensions = [2,0,1] the output has shape [c,a,b]

    • x (NUMERIC) - Input variable

    • dimensions (INT) - Permute dimensions

    permute

    Array permutation operation: permute the dimensions according to the specified permutation indices.

    Example: if input has shape [a,b,c] and dimensions = [2,0,1] the output has shape [c,a,b]

    • x (NUMERIC) - Input variable

    • dimensions - Permute dimensions (Size: AtLeast(min=0))

    prod

    Product array reduction operation, optionally along specified dimensions

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    range

    Create a new variable with a 1d array, where the values start at from and increment by step

    up to (but not including) limit.

    For example, range(1.0, 3.0, 0.5) will return [1.0, 1.5, 2.0, 2.5]

    • from - Initial/smallest value

    • to - Largest value (exclusive)

    • step - Step size

    range

    Create a new variable with a 1d array, where the values start at from and increment by step

    up to (but not including) limit.

    For example, range(1.0, 3.0, 0.5) will return [1.0, 1.5, 2.0, 2.5]

    • from (NUMERIC) - Initial/smallest value

    • to (NUMERIC) - Largest value (exclusive)

    • step (NUMERIC) - Step size
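
The exclusive upper bound described above can be sketched in plain Java (a toy illustration, not the ND4J API; names invented for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeDemo {
    // range(from, to, step): values start at 'from', increment by 'step',
    // and stop before reaching 'to' (exclusive upper bound).
    public static double[] range(double from, double to, double step) {
        List<Double> vals = new ArrayList<>();
        for (double v = from; v < to; v += step) vals.add(v);
        return vals.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        // Matches the example above: range(1.0, 3.0, 0.5) -> [1.0, 1.5, 2.0, 2.5]
        System.out.println(Arrays.toString(range(1.0, 3.0, 0.5)));
    }
}
```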

    rank

    Returns the rank (number of dimensions, i.e., length(shape)) of the specified INDArray as a 0D scalar variable

    • in (NUMERIC) - Input variable

    replaceWhere

    Element-wise replace where condition:

    out[i] = from[i] if condition(update[i]) is satisfied, or

    out[i] = update[i] if condition(update[i]) is NOT satisfied

    • update (NUMERIC) - Source array

    • from (NUMERIC) - Replacement values array (used conditionally). Must be same shape as 'update' array

    • condition - Condition to check on update array elements

    replaceWhere

    Element-wise replace where condition:

    out[i] = value if condition(update[i]) is satisfied, or

    out[i] = update[i] if condition(update[i]) is NOT satisfied

    • update (NUMERIC) - Source array

    • value - Value to set at the output, if the condition is satisfied

    • condition - Condition to check on update array elements

    reshape

    Reshape the input variable to the specified (fixed) shape. The output variable will have the same values as the

    input, but with the specified shape.

    Note that prod(shape) must equal the number of elements in the input: prod(shape) == prod(input.shape)

    • x (NUMERIC) - Input variable

    • shape (NUMERIC) - New shape for variable

    reshape

    Reshape the input variable to the specified (fixed) shape. The output variable will have the same values as the

    input, but with the specified shape.

    Note that prod(shape) must equal the number of elements in the input: prod(shape) == prod(input.shape)

    • x (NUMERIC) - Input variable

    • shape - New shape for variable (Size: AtLeast(min=0))

    reverse

    Reverse the values of an array for the specified dimensions

    If input is:

    [ 1, 2, 3]

    [ 4, 5, 6]

    then

    reverse(in, 0):

    [3, 2, 1]

    [6, 5, 4]

    reverse(in, 1):

    [4, 5, 6]

    [1, 2, 3]

    • x (NUMERIC) - Input variable

    • dimensions - Dimensions to reverse along (Size: AtLeast(min=0))

    reverseSequence

    Reverse sequence op: for each slice along dimension seqDimension, the first seqLength values are reversed

    • x (NUMERIC) - Input variable

    • seq_lengths (INT) - Length of the sequences

    • seqDim - Sequence dimension - default = -1

    scalarFloorMod

    Element-wise scalar floor modulus operation: out = floorMod(in, value).

    i.e., returns the remainder after division by 'value'

    • in (NUMERIC) - Input variable

    • value - Scalar value to compare

    scalarMax

    Element-wise scalar maximum operation: out = max(in, value)

    • in (NUMERIC) - Input variable

    • value - Scalar value to compare

    scalarMin

    Element-wise scalar minimum operation: out = min(in, value)

    • in (NUMERIC) - Input variable

    • value - Scalar value to compare

    scalarSet

    Return a variable with equal shape to the input, but all elements set to value 'set'

    • in (NUMERIC) - Input variable

    • set - Value to set

    scatterAdd

    Scatter addition operation.

    If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] + op(updates[...])

    If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] + op(updates[i, ...])

    If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] + op(updates[i, ..., k, ...])

    Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

    • ref (NUMERIC) - Initial/source variable

    • indices (NUMERIC) - Indices array

    • updates (NUMERIC) - Updates to add to the initial/source array
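
A minimal plain-Java sketch of scatterAdd with rank-1 indices on a 1D ref array (not the ND4J API; names invented for this example): out[indices[i]] += updates[i], and repeated indices accumulate.

```java
import java.util.Arrays;

public class ScatterAddDemo {
    // Scatter-add updates into a copy of ref at the given indices.
    public static double[] scatterAdd(double[] ref, int[] indices, double[] updates) {
        double[] out = Arrays.copyOf(ref, ref.length);
        for (int i = 0; i < indices.length; i++) {
            out[indices[i]] += updates[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ref = {0, 0, 0};
        // Index 1 appears twice; both contributions are accumulated.
        System.out.println(Arrays.toString(
                scatterAdd(ref, new int[]{1, 1, 2}, new double[]{5, 7, 3})));
        // [0.0, 12.0, 3.0]
    }
}
```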

    scatterDiv

    Scatter division operation.

    If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] / updates[...]

    If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] / updates[i, ...]

    If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] / updates[i, ..., k, ...]

    Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

    • ref (NUMERIC) - Initial/source variable

    • indices (NUMERIC) - Indices array

    • updates (NUMERIC) - Updates to divide the initial/source array by

    hashtag
    scatterMax

    Scatter max operation.

    If indices is rank 0 (a scalar), then out[index, ...] = max(out[index, ...], updates[...])

    If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = max(out[indices[i], ...], updates[i, ...])

    If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = max(out[indices[i], ..., indices[k], ...], updates[i, ..., k, ...])

    Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

    • ref (NUMERIC) - Initial/source variable

    • indices (NUMERIC) - Indices array

    • updates (NUMERIC) - Updates to take the element-wise maximum with

    hashtag
    scatterMin

    Scatter min operation.

    If indices is rank 0 (a scalar), then out[index, ...] = min(out[index, ...], updates[...])

    If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = min(out[indices[i], ...], updates[i, ...])

    If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = min(out[indices[i], ..., indices[k], ...], updates[i, ..., k, ...])

    Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

    • ref (NUMERIC) - Initial/source variable

    • indices (NUMERIC) - Indices array

    • updates (NUMERIC) - Updates to take the element-wise minimum with

    hashtag
    scatterMul

    Scatter multiplication operation.

    If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] * updates[...]

    If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] * updates[i, ...]

    If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] * updates[i, ..., k, ...]

    Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

    • ref (NUMERIC) - Initial/source variable

    • indices (NUMERIC) - Indices array

    • updates (NUMERIC) - Updates to multiply the initial/source array by

    hashtag
    scatterSub

    Scatter subtraction operation.

    If indices is rank 0 (a scalar), then out[index, ...] = out[index, ...] - updates[...]

    If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = out[indices[i], ...] - updates[i, ...]

    If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = out[indices[i], ..., indices[k], ...] - updates[i, ..., k, ...]

    Note that if multiple indices refer to the same location, the contributions from each are handled correctly.

    • ref (NUMERIC) - Initial/source variable

    • indices (NUMERIC) - Indices array

    • updates (NUMERIC) - Updates to subtract from the initial/source array

    hashtag
    scatterUpdate

    Scatter update operation.

    If indices is rank 0 (a scalar), then out[index, ...] = updates[...]

    If indices is rank 1 (a vector), then for each position i, out[indices[i], ...] = updates[i, ...]

    If indices is rank 2+, then for each position (i,...,k), out[indices[i], ..., indices[k], ...] = updates[i, ..., k, ...]

    Note that if multiple indices refer to the same location, the result is undefined: only one of the conflicting updates is applied, and which one is not specified.

    • ref (NUMERIC) - Initial/source variable

    • indices (NUMERIC) - Indices array

    • updates (NUMERIC) - Updates to write into the initial/source array

    hashtag
    segmentMax

    Segment max operation.

    If data = [3, 6, 1, 4, 9, 2, 8]

    segmentIds = [0, 0, 1, 1, 1, 2, 2]

    then output = [6, 9, 8] = [op(3,6), op(1,4,9), op(2,8)]

    Note that the segment IDs must be sorted from smallest to largest segment.

    See unsortedSegmentMax(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

    • data (NDARRAY) - Data to perform segment max on

    • segmentIds (NUMERIC) - Variable for the segment IDs
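The 1D semantics above can be sketched in plain Java (an illustrative reference implementation only, not the ND4J API; the class name is hypothetical):

```java
// Illustrative plain-Java sketch of 1D segment max with sorted segment IDs
// (not the ND4J API itself).
public class SegmentMaxDemo {
    public static double[] segmentMax(double[] data, int[] segmentIds) {
        int numSegments = segmentIds[segmentIds.length - 1] + 1; // IDs are sorted ascending
        double[] out = new double[numSegments];
        java.util.Arrays.fill(out, Double.NEGATIVE_INFINITY);
        for (int i = 0; i < data.length; i++) {
            out[segmentIds[i]] = Math.max(out[segmentIds[i]], data[i]);
        }
        return out;
    }
}
```

Replacing Math.max with the appropriate accumulator (sum, product, min, running mean) gives the other segment reductions in this family.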

    hashtag
    segmentMean

    Segment mean operation.

    If data = [3, 6, 1, 4, 9, 2, 8]

    segmentIds = [0, 0, 1, 1, 1, 2, 2]

    then output = [4.5, 4.6667, 5] = [mean(3,6), mean(1,4,9), mean(2,8)]

    Note that the segment IDs must be sorted from smallest to largest segment.

    See unsortedSegmentMean(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

    • data (NDARRAY) - Data to perform segment mean on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    hashtag
    segmentMin

    Segment min operation.

    If data = [3, 6, 1, 4, 9, 2, 8]

    segmentIds = [0, 0, 1, 1, 1, 2, 2]

    then output = [3, 1, 2] = [min(3,6), min(1,4,9), min(2,8)]

    Note that the segment IDs must be sorted from smallest to largest segment.

    See unsortedSegmentMin(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

    • data (NDARRAY) - Data to perform segment min on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    hashtag
    segmentProd

    Segment product operation.

    If data = [3, 6, 1, 4, 9, 2, 8]

    segmentIds = [0, 0, 1, 1, 1, 2, 2]

    then output = [18, 36, 16] = [prod(3,6), prod(1,4,9), prod(2,8)]

    Note that the segment IDs must be sorted from smallest to largest segment.

    See unsortedSegmentProd(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

    • data (NDARRAY) - Data to perform segment product on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    hashtag
    segmentSum

    Segment sum operation.

    If data = [3, 6, 1, 4, 9, 2, 8]

    segmentIds = [0, 0, 1, 1, 1, 2, 2]

    then output = [9, 14, 10] = [sum(3,6), sum(1,4,9), sum(2,8)]

    Note that the segment IDs must be sorted from smallest to largest segment.

    See unsortedSegmentSum(String, SDVariable, SDVariable, int) for the same op without this sorted requirement

    • data (NDARRAY) - Data to perform segment sum on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    hashtag
    sequenceMask

    Generate a sequence mask (with values 0 or 1) based on the specified lengths

    Specifically, out[i, ..., k, j] = (j < lengths[i, ..., k] ? 1.0 : 0.0)

    • lengths (NUMERIC) - Lengths of the sequences

    • maxLen - Maximum sequence length

    • dataType - Data type of the output mask array

    hashtag
    sequenceMask

    Generate a sequence mask (with values 0 or 1) based on the specified lengths

    Specifically, out[i, ..., k, j] = (j < lengths[i, ..., k] ? 1.0 : 0.0)

    • lengths (NUMERIC) - Lengths of the sequences

    • maxLen (INT) - Maximum sequence length

    • dataType - Data type of the output mask array

    hashtag
    sequenceMask

    see sequenceMask(String, SDVariable, SDVariable, DataType)

    • lengths (NUMERIC) - Lengths of the sequences

    • dataType - Data type of the output mask array
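The rank-1 case of the mask formula above can be sketched in plain Java (an illustrative reference implementation only, not the ND4J API; the class name is hypothetical):

```java
// Illustrative plain-Java sketch of sequenceMask for rank-1 lengths
// (not the ND4J API): out[i][j] = (j < lengths[i]) ? 1 : 0.
public class SequenceMaskDemo {
    public static int[][] sequenceMask(int[] lengths, int maxLen) {
        int[][] out = new int[lengths.length][maxLen];
        for (int i = 0; i < lengths.length; i++) {
            for (int j = 0; j < maxLen; j++) {
                out[i][j] = (j < lengths[i]) ? 1 : 0;
            }
        }
        return out;
    }
}
```

For lengths [1, 3, 2] and maxLen 4, each row i has its first lengths[i] entries set to 1 and the rest to 0.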

    hashtag
    shape

    Returns the shape of the specified INDArray as a 1D INDArray

    • input (NUMERIC) - Input variable

    hashtag
    size

    Returns the size (number of elements, i.e., prod(shape)) of the specified INDArray as a 0D scalar variable

    • in (NUMERIC) - Input variable

    hashtag
    sizeAt

    Returns a rank 0 (scalar) variable for the size of the specified dimension.

    For example, if X has shape [10,20,30] then sizeAt(X,1)=20. Similarly, sizeAt(X,-1)=30

    • in (NUMERIC) - Input variable

    • dimension - Dimension to get size of

    hashtag
    slice

    Get a subset of the specified input, by specifying the first element and the size of the array.

    For example, if input is:

    [a, b, c]

    [d, e, f]

    then slice(input, begin=[0,1], size=[2,1]) will return:

    [b]

    [e]

    Note that for each dimension i, begin[i] + size[i] <= input.size(i)

    • input (NUMERIC) - input Variable to get subset of

    • begin - Beginning index. Must be same length as rank of input array (Size: AtLeast(min=1))

    • size - Size of the output array. Must be same length as rank of input array (Size: AtLeast(min=1))
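The 2D case of the example above can be sketched in plain Java (an illustrative reference implementation only, not the ND4J API; the class name is hypothetical):

```java
// Illustrative plain-Java sketch of slice on a 2D array (not the ND4J API):
// copies a size[0] x size[1] block starting at (begin[0], begin[1]).
public class SliceDemo {
    public static double[][] slice(double[][] in, int[] begin, int[] size) {
        double[][] out = new double[size[0]][size[1]];
        for (int r = 0; r < size[0]; r++) {
            for (int c = 0; c < size[1]; c++) {
                // requires begin[i] + size[i] <= in.size(i) for each dimension i
                out[r][c] = in[begin[0] + r][begin[1] + c];
            }
        }
        return out;
    }
}
```

With the 2x3 input from the example, begin=[0,1] and size=[2,1] selects the column containing b and e.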

    hashtag
    slice

    Get a subset of the specified input, by specifying the first element and the size of the array.

    For example, if input is:

    [a, b, c]

    [d, e, f]

    then slice(input, begin=[0,1], size=[2,1]) will return:

    [b]

    [e]

    Note that for each dimension i, begin[i] + size[i] <= input.size(i)

    • input (NUMERIC) - input Variable to get subset of

    • begin (INT) - Beginning index. Must be same length as rank of input array

    • size (INT) - Size of the output array. Must be same length as rank of input array

    hashtag
    squaredNorm

    Squared L2 norm: see norm2(String, SDVariable, boolean, int...)

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    hashtag
    squeeze

    Remove a single dimension of size 1.

    For example, if input has shape [a,b,1,c] then squeeze(input, 2) returns an array of shape [a,b,c]

    • x (NUMERIC) - Input variable

    • axis - Size 1 dimension to remove

    hashtag
    stack

    Stack a set of N INDArrays of rank X into one rank X+1 variable.

    If inputs have shape [a,b,c] then output has shape:

    axis = 0: [N,a,b,c]

    axis = 1: [a,N,b,c]

    axis = 2: [a,b,N,c]

    axis = 3: [a,b,c,N]

    see unstack(String[], SDVariable, int, int)

    • values (NDARRAY) - Input variables to stack. Must have the same shape for all inputs

    • axis - Axis to stack on

    hashtag
    standardDeviation

    Standard deviation array reduction operation, optionally along specified dimensions

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • biasCorrected - If true: divide by (N-1) (i.e., sample stdev). If false: divide by N (population stdev)

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    hashtag
    stridedSlice

    Get a subset of the specified input, by specifying the first element, last element, and the strides.

    For example, if input is:

    [a, b, c]

    [d, e, f]

    [g, h, i]

    then stridedSlice(input, begin=[0,1], end=[3,3], strides=[2,1], all masks = 0) will return:

    [b, c]

    [h, i]

    • in (NUMERIC) - Variable to get subset of

    • begin - Beginning index (Size: AtLeast(min=1))

    • end - End index (Size: AtLeast(min=1))
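The 2D case with all masks set to 0 can be sketched in plain Java (an illustrative reference implementation only, not the ND4J API; the class name is hypothetical). Begin is inclusive, end is exclusive, and strides give the step size per dimension:

```java
// Illustrative plain-Java sketch of 2D stridedSlice with all masks = 0
// (not the ND4J API): take indices begin[d], begin[d]+strides[d], ...
// while strictly less than end[d].
public class StridedSliceDemo {
    public static double[][] stridedSlice(double[][] in, int[] begin, int[] end, int[] strides) {
        java.util.List<double[]> rows = new java.util.ArrayList<>();
        for (int r = begin[0]; r < end[0]; r += strides[0]) {
            java.util.List<Double> vals = new java.util.ArrayList<>();
            for (int c = begin[1]; c < end[1]; c += strides[1]) {
                vals.add(in[r][c]);
            }
            double[] row = new double[vals.size()];
            for (int i = 0; i < row.length; i++) row[i] = vals.get(i);
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }
}
```

On a 3x3 input, begin=[0,1], end=[3,3], strides=[2,1] keeps rows 0 and 2 and columns 1 and 2, matching the example above.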

    hashtag
    sum

    Sum array reduction operation, optionally along specified dimensions.

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • keepDims - If true: keep the dimensions that are reduced on (as length 1). False: remove the reduction dimensions - default = false

    • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    hashtag
    switchOp

    Switch operation

    Predicate - if false, values are output to the left (first) branch/output; if true, to the right (second) branch/output

    • x (NDARRAY) - Input variable

    • predicate (BOOL) - Predicate - if false, values are output to the left (first) branch/output; if true, to the right (second) branch/output

    hashtag
    tensorMmul

    Tensor contraction operation (a generalization of matrix multiplication): multiplies x and y, summing over the specified dimensions of each input, analogous to numpy's tensordot.

    • x (NUMERIC) - Input variable x

    • y (NUMERIC) - Input variable y

    • dimensionsX - dimensions for first input array (x) (Size: AtLeast(min=1))

    hashtag
    tile

    Repeat (tile) the input tensor the specified number of times.

    For example, if input is

    [1, 2]

    [3, 4]

    and repeat is [2, 3]

    then output is

    [1, 2, 1, 2, 1, 2]

    [3, 4, 3, 4, 3, 4]

    [1, 2, 1, 2, 1, 2]

    [3, 4, 3, 4, 3, 4]

    • x (NDARRAY) - Input variable

    • repeat (INT) - Number of times to repeat in each axis. Must have length equal to the rank of the input array
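The 2D case of the example above can be sketched in plain Java (an illustrative reference implementation only, not the ND4J API; the class name is hypothetical):

```java
// Illustrative plain-Java sketch of tile on a 2D array (not the ND4J API):
// output position (r, c) reads from input position (r % rows, c % cols).
public class TileDemo {
    public static int[][] tile(int[][] in, int[] repeat) {
        int rows = in.length, cols = in[0].length;
        int[][] out = new int[rows * repeat[0]][cols * repeat[1]];
        for (int r = 0; r < out.length; r++) {
            for (int c = 0; c < out[0].length; c++) {
                out[r][c] = in[r % rows][c % cols];
            }
        }
        return out;
    }
}
```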

    hashtag
    tile

    see tile(String, SDVariable, int...)

    • x (NDARRAY) -

    • repeat - (Size: AtLeast(min=1))

    hashtag
    transpose

    Matrix transpose operation: If input has shape [a,b] output has shape [b,a]

    • x (NDARRAY) - Input variable

    hashtag
    unsortedSegmentMax

    Unsorted segment max operation. As per segmentMax(String, SDVariable, SDVariable) but without

    the requirement for the indices to be sorted.

    If data = [1, 3, 2, 6, 4, 9, 8]

    segmentIds = [1, 0, 2, 0, 1, 1, 2]

    then output = [6, 9, 8] = [max(3,6), max(1,4,9), max(2,8)]

    • data (NUMERIC) - Data (variable) to perform unsorted segment max on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    • numSegments - Number of segments
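The 1D semantics above can be sketched in plain Java (an illustrative reference implementation only, not the ND4J API; the class name is hypothetical):

```java
// Illustrative plain-Java sketch of 1D unsorted segment max (not the ND4J
// API): segment IDs may appear in any order; numSegments is given explicitly.
public class UnsortedSegmentMaxDemo {
    public static double[] unsortedSegmentMax(double[] data, int[] segmentIds, int numSegments) {
        double[] out = new double[numSegments];
        java.util.Arrays.fill(out, Double.NEGATIVE_INFINITY);
        for (int i = 0; i < data.length; i++) {
            out[segmentIds[i]] = Math.max(out[segmentIds[i]], data[i]);
        }
        return out;
    }
}
```

The only difference from the sorted variant is that the number of segments cannot be inferred from the last ID, so it is passed in explicitly.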

    hashtag
    unsortedSegmentMean

    Unsorted segment mean operation. As per segmentMean(String, SDVariable, SDVariable) but without

    the requirement for the indices to be sorted.

    If data = [1, 3, 2, 6, 4, 9, 8]

    segmentIds = [1, 0, 2, 0, 1, 1, 2]

    then output = [4.5, 4.666, 5] = [mean(3,6), mean(1,4,9), mean(2,8)]

    • data (NUMERIC) - Data (variable) to perform unsorted segment mean on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    • numSegments - Number of segments

    hashtag
    unsortedSegmentMin

    Unsorted segment min operation. As per segmentMin(String, SDVariable, SDVariable) but without

    the requirement for the indices to be sorted.

    If data = [1, 3, 2, 6, 4, 9, 8]

    segmentIds = [1, 0, 2, 0, 1, 1, 2]

    then output = [3, 1, 2] = [min(3,6), min(1,4,9), min(2,8)]

    • data (NUMERIC) - Data (variable) to perform unsorted segment min on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    • numSegments - Number of segments

    hashtag
    unsortedSegmentProd

    Unsorted segment product operation. As per segmentProd(String, SDVariable, SDVariable) but without

    the requirement for the indices to be sorted.

    If data = [1, 3, 2, 6, 4, 9, 8]

    segmentIds = [1, 0, 2, 0, 1, 1, 2]

    then output = [18, 36, 16] = [prod(3,6), prod(1,4,9), prod(2,8)]

    • data (NUMERIC) - Data (variable) to perform unsorted segment product on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    • numSegments - Number of segments

    hashtag
    unsortedSegmentSqrtN

    Unsorted segment sqrtN operation. Returns the square root of the count of values in each segment

    If data = [1, 3, 2, 6, 4, 9, 8]

    segmentIds = [1, 0, 2, 0, 1, 1, 2]

    then output = [1.414, 1.732, 1.414] = [sqrt(2), sqrt(3), sqrt(2)]

    • data (NUMERIC) - Data (variable) to perform unsorted segment sqrtN on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    • numSegments - Number of segments

    hashtag
    unsortedSegmentSum

    Unsorted segment sum operation. As per segmentSum(String, SDVariable, SDVariable) but without

    the requirement for the indices to be sorted.

    If data = [1, 3, 2, 6, 4, 9, 8]

    segmentIds = [1, 0, 2, 0, 1, 1, 2]

    then output = [9, 14, 10] = [sum(3,6), sum(1,4,9), sum(2,8)]

    • data (NUMERIC) - Data (variable) to perform unsorted segment sum on

    • segmentIds (NUMERIC) - Variable for the segment IDs

    • numSegments - Number of segments

    hashtag
    unstack

    Unstack a variable of rank X into N rank X-1 variables by taking slices along the specified axis.

    If input has shape [a,b,c] then output has shape:

    axis = 0: [b,c]

    axis = 1: [a,c]

    axis = 2: [a,b]

    • value (NDARRAY) - Input variable to unstack

    • axis - Axis to unstack on

    • num - Number of output variables

    hashtag
    variance

    Variance array reduction operation, optionally along specified dimensions

    Note that if keepDims = true, the output variable has the same rank as the input variable,

    with the reduced dimensions having size 1. This can be useful for later broadcast operations (such as subtracting

    the mean along a dimension).

    Example: if input has shape [a,b,c] and dimensions=[1] then output has shape:

    keepDims = true: [a,1,c]

    keepDims = false: [a,c]

    • x (NUMERIC) - Input variable

    • biasCorrected - If true: divide by (N-1) (i.e., sample variance). If false: divide by N (population variance)

    • keepDims - If true: keep the dimensions that are reduced on (as size 1). False: remove the reduction dimensions - default = false

    hashtag
    zerosLike

    Return a variable of all 0s, with the same shape as the input variable. Note that this is dynamic:

    if the input shape changes in later execution, the returned variable's shape will also be updated

    • input (NUMERIC) - Input

    transposeB - Whether to transpose B arrays or not - default = false

    axis - Scalar axis argument for dimension to perform cumulative sum operations along (Size: AtLeast(min=1))

    axis - Scalar axis argument for dimension to perform cumulative sum operations along (Size: AtLeast(min=1))

    number - Number of values to generate

    dataType - Data type of the output array

    dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    transposeY - Transpose y (second argument) - default = false

  • transposeZ - Transpose result array - default = false

  • on - Value to set at the indexed ("hot") positions

  • off - Value to set at all other positions

  • dataType - Output data type - default = DataType.FLOAT

  • dataType -

    dataType -

    batchDim - Batch dimension - default = 0

    dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    strides - Stride ("step size") for each dimension. For example, stride of 2 means take every second element. (Size: AtLeast(min=1))

  • beginMask - Bit mask: If the ith bit is set to 1, then the value in the begin long[] is ignored, and a value of 0 is used instead for the beginning index for that dimension - default = 0

  • endMask - Bit mask: If the ith bit is set to 1, then the value in the end long[] is ignored, and a value of size(i)-1 is used instead for the end index for that dimension - default = 0

  • ellipsisMask - Bit mask: only one non-zero value is allowed here. If a non-zero value is set, then other dimensions are inserted as required at the specified position - default = 0

  • newAxisMask - Bit mask: if the ith bit is set to 1, then the begin/end/stride values are ignored, and a size 1 dimension is inserted at this point - default = 0

  • shrinkAxisMask - Bit mask: if the ith bit is set to 1, then the begin/end/stride values are ignored, and a size 1 dimension is removed at this point. Note that begin/end/stride values must result in a size 1 output for these dimensions - default = 0

  • dimensionsY - dimensions for second input array (y) (Size: AtLeast(min=1))

  • transposeX - Transpose x (first argument) - default = false

  • transposeY - Transpose y (second argument) - default = false

  • transposeZ - Transpose result array - default = false

  • dimensions - Dimensions to reduce over. If dimensions are not specified, full array reduction is performed (Size: AtLeast(min=0))

    https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
    INDArray all(INDArray x, int[] dimensions)
    
    SDVariable all(SDVariable x, int[] dimensions)
    SDVariable all(String name, SDVariable x, int[] dimensions)
    INDArray any(INDArray x, int[] dimensions)
    
    SDVariable any(SDVariable x, int[] dimensions)
    SDVariable any(String name, SDVariable x, int[] dimensions)
    INDArray argmax(INDArray in, boolean keepDims, int[] dimensions)
    INDArray argmax(INDArray in, int[] dimensions)
    
    SDVariable argmax(SDVariable in, boolean keepDims, int[] dimensions)
    SDVariable argmax(SDVariable in, int[] dimensions)
    SDVariable argmax(String name, SDVariable in, boolean keepDims, int[] dimensions)
    SDVariable argmax(String name, SDVariable in, int[] dimensions)
    INDArray argmin(INDArray in, boolean keepDims, int[] dimensions)
    INDArray argmin(INDArray in, int[] dimensions)
    
    SDVariable argmin(SDVariable in, boolean keepDims, int[] dimensions)
    SDVariable argmin(SDVariable in, int[] dimensions)
    SDVariable argmin(String name, SDVariable in, boolean keepDims, int[] dimensions)
    SDVariable argmin(String name, SDVariable in, int[] dimensions)
    INDArray batchMmul(INDArray inputsA, INDArray inputsB, boolean transposeA, boolean transposeB)
    INDArray batchMmul(INDArray inputsA, INDArray inputsB)
    
    SDVariable batchMmul(SDVariable inputsA, SDVariable inputsB, boolean transposeA, boolean transposeB)
    SDVariable batchMmul(SDVariable inputsA, SDVariable inputsB)
    SDVariable batchMmul(String name, SDVariable inputsA, SDVariable inputsB, boolean transposeA, boolean transposeB)
    SDVariable batchMmul(String name, SDVariable inputsA, SDVariable inputsB)
    INDArray castTo(INDArray arg, DataType datatype)
    
    SDVariable castTo(SDVariable arg, DataType datatype)
    SDVariable castTo(String name, SDVariable arg, DataType datatype)
    INDArray concat(INDArray inputs, int dimension)
    
    SDVariable concat(SDVariable inputs, int dimension)
    SDVariable concat(String name, SDVariable inputs, int dimension)
    INDArray cumprod(INDArray in, boolean exclusive, boolean reverse, int[] axis)
    INDArray cumprod(INDArray in, int[] axis)
    
    SDVariable cumprod(SDVariable in, boolean exclusive, boolean reverse, int[] axis)
    SDVariable cumprod(SDVariable in, int[] axis)
    SDVariable cumprod(String name, SDVariable in, boolean exclusive, boolean reverse, int[] axis)
    SDVariable cumprod(String name, SDVariable in, int[] axis)
    INDArray cumsum(INDArray in, boolean exclusive, boolean reverse, int[] axis)
    INDArray cumsum(INDArray in, int[] axis)
    
    SDVariable cumsum(SDVariable in, boolean exclusive, boolean reverse, int[] axis)
    SDVariable cumsum(SDVariable in, int[] axis)
    SDVariable cumsum(String name, SDVariable in, boolean exclusive, boolean reverse, int[] axis)
    SDVariable cumsum(String name, SDVariable in, int[] axis)
    INDArray dot(INDArray x, INDArray y, int[] dimensions)
    
    SDVariable dot(SDVariable x, SDVariable y, int[] dimensions)
    SDVariable dot(String name, SDVariable x, SDVariable y, int[] dimensions)
    INDArray dynamicPartition(INDArray x, INDArray partitions, int numPartitions)
    
    SDVariable dynamicPartition(SDVariable x, SDVariable partitions, int numPartitions)
    SDVariable dynamicPartition(String name, SDVariable x, SDVariable partitions, int numPartitions)
    
    
    input = [1,2,3,4,5]
    
    numPartitions = 2
    
    partitions = [1,0,0,1,0]
    
    out[0] = [2,3,5]
    
    out[1] = [1,4]
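The partitioning example above can be sketched in plain Java (an illustrative reference implementation only, not the ND4J API; the class name is hypothetical):

```java
// Illustrative plain-Java sketch of dynamicPartition on a 1D array (not the
// ND4J API): element i is routed to output partition partitions[i], in order.
public class DynamicPartitionDemo {
    public static int[][] dynamicPartition(int[] x, int[] partitions, int numPartitions) {
        java.util.List<java.util.List<Integer>> parts = new java.util.ArrayList<>();
        for (int p = 0; p < numPartitions; p++) parts.add(new java.util.ArrayList<>());
        for (int i = 0; i < x.length; i++) parts.get(partitions[i]).add(x[i]);
        int[][] out = new int[numPartitions][];
        for (int p = 0; p < numPartitions; p++) {
            out[p] = parts.get(p).stream().mapToInt(Integer::intValue).toArray();
        }
        return out;
    }
}
```

With input [1,2,3,4,5] and partitions [1,0,0,1,0], partition 0 receives [2,3,5] and partition 1 receives [1,4], matching the example.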
    
    INDArray dynamicStitch(INDArray indices, INDArray x)
    
    SDVariable dynamicStitch(SDVariable indices, SDVariable x)
    SDVariable dynamicStitch(String name, SDVariable indices, SDVariable x)
    INDArray eq(INDArray x, double y)
    
    SDVariable eq(SDVariable x, double y)
    SDVariable eq(String name, SDVariable x, double y)
    INDArray eq(INDArray x, INDArray y)
    
    SDVariable eq(SDVariable x, SDVariable y)
    SDVariable eq(String name, SDVariable x, SDVariable y)
    INDArray expandDims(INDArray x, int axis)
    
    SDVariable expandDims(SDVariable x, int axis)
    SDVariable expandDims(String name, SDVariable x, int axis)
    INDArray fill(INDArray shape, DataType dataType, double value)
    
    SDVariable fill(SDVariable shape, DataType dataType, double value)
    SDVariable fill(String name, SDVariable shape, DataType dataType, double value)
    INDArray gather(INDArray df, int[] indices, int axis)
    
    SDVariable gather(SDVariable df, int[] indices, int axis)
    SDVariable gather(String name, SDVariable df, int[] indices, int axis)
    INDArray gather(INDArray df, INDArray indices, int axis)
    
    SDVariable gather(SDVariable df, SDVariable indices, int axis)
    SDVariable gather(String name, SDVariable df, SDVariable indices, int axis)
    INDArray gatherNd(INDArray df, INDArray indices)
    
    SDVariable gatherNd(SDVariable df, SDVariable indices)
    SDVariable gatherNd(String name, SDVariable df, SDVariable indices)
    INDArray gt(INDArray x, double y)
    
    SDVariable gt(SDVariable x, double y)
    SDVariable gt(String name, SDVariable x, double y)
    INDArray gt(INDArray x, INDArray y)
    
    SDVariable gt(SDVariable x, SDVariable y)
    SDVariable gt(String name, SDVariable x, SDVariable y)
    INDArray gte(INDArray x, double y)
    
    SDVariable gte(SDVariable x, double y)
    SDVariable gte(String name, SDVariable x, double y)
    INDArray gte(INDArray x, INDArray y)
    
    SDVariable gte(SDVariable x, SDVariable y)
    SDVariable gte(String name, SDVariable x, SDVariable y)
    INDArray identity(INDArray input)
    
    SDVariable identity(SDVariable input)
    SDVariable identity(String name, SDVariable input)
    INDArray invertPermutation(INDArray input)
    
    SDVariable invertPermutation(SDVariable input)
    SDVariable invertPermutation(String name, SDVariable input)
    INDArray isNumericTensor(INDArray x)
    
    SDVariable isNumericTensor(SDVariable x)
    SDVariable isNumericTensor(String name, SDVariable x)
    INDArray linspace(DataType dataType, double start, double stop, long number)
    
    SDVariable linspace(DataType dataType, double start, double stop, long number)
    SDVariable linspace(String name, DataType dataType, double start, double stop, long number)
    INDArray linspace(INDArray start, INDArray stop, INDArray number, DataType dataType)
    
    SDVariable linspace(SDVariable start, SDVariable stop, SDVariable number, DataType dataType)
    SDVariable linspace(String name, SDVariable start, SDVariable stop, SDVariable number, DataType dataType)
    INDArray lt(INDArray x, double y)
    
    SDVariable lt(SDVariable x, double y)
    SDVariable lt(String name, SDVariable x, double y)
    INDArray lt(INDArray x, INDArray y)
    
    SDVariable lt(SDVariable x, SDVariable y)
    SDVariable lt(String name, SDVariable x, SDVariable y)
    INDArray lte(INDArray x, double y)
    
    SDVariable lte(SDVariable x, double y)
    SDVariable lte(String name, SDVariable x, double y)
    INDArray lte(INDArray x, INDArray y)
    
    SDVariable lte(SDVariable x, SDVariable y)
    SDVariable lte(String name, SDVariable x, SDVariable y)
    INDArray matchCondition(INDArray in, Condition condition)
    
    SDVariable matchCondition(SDVariable in, Condition condition)
    SDVariable matchCondition(String name, SDVariable in, Condition condition)
    INDArray matchConditionCount(INDArray in, Condition condition)
    
    SDVariable matchConditionCount(SDVariable in, Condition condition)
    SDVariable matchConditionCount(String name, SDVariable in, Condition condition)
    INDArray matchConditionCount(INDArray in, Condition condition, boolean keepDim, int[] dimensions)
    INDArray matchConditionCount(INDArray in, Condition condition, int[] dimensions)
    
    SDVariable matchConditionCount(SDVariable in, Condition condition, boolean keepDim, int[] dimensions)
    SDVariable matchConditionCount(SDVariable in, Condition condition, int[] dimensions)
    SDVariable matchConditionCount(String name, SDVariable in, Condition condition, boolean keepDim, int[] dimensions)
    SDVariable matchConditionCount(String name, SDVariable in, Condition condition, int[] dimensions)
    INDArray max(INDArray x, boolean keepDims, int[] dimensions)
    INDArray max(INDArray x, int[] dimensions)
    
    SDVariable max(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable max(SDVariable x, int[] dimensions)
    SDVariable max(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable max(String name, SDVariable x, int[] dimensions)
    INDArray max(INDArray first, INDArray second)
    
    SDVariable max(SDVariable first, SDVariable second)
    SDVariable max(String name, SDVariable first, SDVariable second)
    INDArray mean(INDArray x, boolean keepDims, int[] dimensions)
    INDArray mean(INDArray x, int[] dimensions)
    
    SDVariable mean(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable mean(SDVariable x, int[] dimensions)
    SDVariable mean(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable mean(String name, SDVariable x, int[] dimensions)
    INDArray merge(INDArray x, INDArray y)
    
    SDVariable merge(SDVariable x, SDVariable y)
    SDVariable merge(String name, SDVariable x, SDVariable y)
    INDArray min(INDArray x, boolean keepDims, int[] dimensions)
    INDArray min(INDArray x, int[] dimensions)
    
    SDVariable min(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable min(SDVariable x, int[] dimensions)
    SDVariable min(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable min(String name, SDVariable x, int[] dimensions)
    INDArray min(INDArray first, INDArray second)
    
    SDVariable min(SDVariable first, SDVariable second)
    SDVariable min(String name, SDVariable first, SDVariable second)
    INDArray mmul(INDArray x, INDArray y, boolean transposeX, boolean transposeY, boolean transposeZ)
    INDArray mmul(INDArray x, INDArray y)
    
    SDVariable mmul(SDVariable x, SDVariable y, boolean transposeX, boolean transposeY, boolean transposeZ)
    SDVariable mmul(SDVariable x, SDVariable y)
    SDVariable mmul(String name, SDVariable x, SDVariable y, boolean transposeX, boolean transposeY, boolean transposeZ)
    SDVariable mmul(String name, SDVariable x, SDVariable y)
    INDArray neq(INDArray x, double y)
    
    SDVariable neq(SDVariable x, double y)
    SDVariable neq(String name, SDVariable x, double y)
    INDArray neq(INDArray x, INDArray y)
    
    SDVariable neq(SDVariable x, SDVariable y)
    SDVariable neq(String name, SDVariable x, SDVariable y)
    INDArray norm1(INDArray x, boolean keepDims, int[] dimensions)
    INDArray norm1(INDArray x, int[] dimensions)
    
    SDVariable norm1(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable norm1(SDVariable x, int[] dimensions)
    SDVariable norm1(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable norm1(String name, SDVariable x, int[] dimensions)
    INDArray norm2(INDArray x, boolean keepDims, int[] dimensions)
    INDArray norm2(INDArray x, int[] dimensions)
    
    SDVariable norm2(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable norm2(SDVariable x, int[] dimensions)
    SDVariable norm2(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable norm2(String name, SDVariable x, int[] dimensions)
    INDArray normmax(INDArray x, boolean keepDims, int[] dimensions)
    INDArray normmax(INDArray x, int[] dimensions)
    
    SDVariable normmax(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable normmax(SDVariable x, int[] dimensions)
    SDVariable normmax(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable normmax(String name, SDVariable x, int[] dimensions)
    INDArray oneHot(INDArray indices, int depth, int axis, double on, double off, DataType dataType)
    INDArray oneHot(INDArray indices, int depth, int axis, double on, double off)
    
    SDVariable oneHot(SDVariable indices, int depth, int axis, double on, double off, DataType dataType)
    SDVariable oneHot(SDVariable indices, int depth, int axis, double on, double off)
    SDVariable oneHot(String name, SDVariable indices, int depth, int axis, double on, double off, DataType dataType)
    SDVariable oneHot(String name, SDVariable indices, int depth, int axis, double on, double off)
    INDArray oneHot(INDArray indices, int depth)
    
    SDVariable oneHot(SDVariable indices, int depth)
    SDVariable oneHot(String name, SDVariable indices, int depth)
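The oneHot overloads above all reduce to a simple expansion: each index becomes a length-depth vector filled with the off value, except at the index position, which receives the on value (the short oneHot(indices, depth) form uses on = 1.0, off = 0.0). A plain-Java sketch of those semantics, not the ND4J implementation (class and method names here are illustrative):

```java
public class OneHotSketch {
    // Plain-Java sketch of oneHot(indices, depth, on, off) for a 1D index
    // vector: row r is `off` everywhere except column indices[r], which is `on`.
    public static double[][] oneHot(int[] indices, int depth, double on, double off) {
        double[][] out = new double[indices.length][depth];
        for (int r = 0; r < indices.length; r++) {
            java.util.Arrays.fill(out[r], off);
            if (indices[r] >= 0 && indices[r] < depth)
                out[r][indices[r]] = on;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] out = oneHot(new int[]{0, 2, 1}, 3, 1.0, 0.0);
        // rows: [1,0,0], [0,0,1], [0,1,0]
        assert out[1][2] == 1.0 && out[1][0] == 0.0;
    }
}
```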
    INDArray onesLike(INDArray input)
    
    SDVariable onesLike(SDVariable input)
    SDVariable onesLike(String name, SDVariable input)
    INDArray onesLike(INDArray input, DataType dataType)
    
    SDVariable onesLike(SDVariable input, DataType dataType)
    SDVariable onesLike(String name, SDVariable input, DataType dataType)
    INDArray permute(INDArray x, INDArray dimensions)
    
    SDVariable permute(SDVariable x, SDVariable dimensions)
    SDVariable permute(String name, SDVariable x, SDVariable dimensions)
    INDArray permute(INDArray x, int[] dimensions)
    
    SDVariable permute(SDVariable x, int[] dimensions)
    SDVariable permute(String name, SDVariable x, int[] dimensions)
    INDArray prod(INDArray x, boolean keepDims, int[] dimensions)
    INDArray prod(INDArray x, int[] dimensions)
    
    SDVariable prod(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable prod(SDVariable x, int[] dimensions)
    SDVariable prod(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable prod(String name, SDVariable x, int[] dimensions)
    INDArray range(double from, double to, double step, DataType dataType)
    
    SDVariable range(double from, double to, double step, DataType dataType)
    SDVariable range(String name, double from, double to, double step, DataType dataType)
    INDArray range(INDArray from, INDArray to, INDArray step, DataType dataType)
    
    SDVariable range(SDVariable from, SDVariable to, SDVariable step, DataType dataType)
    SDVariable range(String name, SDVariable from, SDVariable to, SDVariable step, DataType dataType)
    INDArray rank(INDArray in)
    
    SDVariable rank(SDVariable in)
    SDVariable rank(String name, SDVariable in)
    INDArray replaceWhere(INDArray update, INDArray from, Condition condition)
    
    SDVariable replaceWhere(SDVariable update, SDVariable from, Condition condition)
    SDVariable replaceWhere(String name, SDVariable update, SDVariable from, Condition condition)
    INDArray replaceWhere(INDArray update, double value, Condition condition)
    
    SDVariable replaceWhere(SDVariable update, double value, Condition condition)
    SDVariable replaceWhere(String name, SDVariable update, double value, Condition condition)
    INDArray reshape(INDArray x, INDArray shape)
    
    SDVariable reshape(SDVariable x, SDVariable shape)
    SDVariable reshape(String name, SDVariable x, SDVariable shape)
    INDArray reshape(INDArray x, long[] shape)
    
    SDVariable reshape(SDVariable x, long[] shape)
    SDVariable reshape(String name, SDVariable x, long[] shape)
    INDArray reverse(INDArray x, int[] dimensions)
    
    SDVariable reverse(SDVariable x, int[] dimensions)
    SDVariable reverse(String name, SDVariable x, int[] dimensions)
    INDArray reverseSequence(INDArray x, INDArray seq_lengths, int seqDim, int batchDim)
    INDArray reverseSequence(INDArray x, INDArray seq_lengths)
    
    SDVariable reverseSequence(SDVariable x, SDVariable seq_lengths, int seqDim, int batchDim)
    SDVariable reverseSequence(SDVariable x, SDVariable seq_lengths)
    SDVariable reverseSequence(String name, SDVariable x, SDVariable seq_lengths, int seqDim, int batchDim)
    SDVariable reverseSequence(String name, SDVariable x, SDVariable seq_lengths)
    INDArray scalarFloorMod(INDArray in, double value)
    
    SDVariable scalarFloorMod(SDVariable in, double value)
    SDVariable scalarFloorMod(String name, SDVariable in, double value)
    INDArray scalarMax(INDArray in, double value)
    
    SDVariable scalarMax(SDVariable in, double value)
    SDVariable scalarMax(String name, SDVariable in, double value)
    INDArray scalarMin(INDArray in, double value)
    
    SDVariable scalarMin(SDVariable in, double value)
    SDVariable scalarMin(String name, SDVariable in, double value)
    INDArray scalarSet(INDArray in, double set)
    
    SDVariable scalarSet(SDVariable in, double set)
    SDVariable scalarSet(String name, SDVariable in, double set)
    INDArray scatterAdd(INDArray ref, INDArray indices, INDArray updates)
    
    SDVariable scatterAdd(SDVariable ref, SDVariable indices, SDVariable updates)
    SDVariable scatterAdd(String name, SDVariable ref, SDVariable indices, SDVariable updates)
    INDArray scatterDiv(INDArray ref, INDArray indices, INDArray updates)
    
    SDVariable scatterDiv(SDVariable ref, SDVariable indices, SDVariable updates)
    SDVariable scatterDiv(String name, SDVariable ref, SDVariable indices, SDVariable updates)
    INDArray scatterMax(INDArray ref, INDArray indices, INDArray updates)
    
    SDVariable scatterMax(SDVariable ref, SDVariable indices, SDVariable updates)
    SDVariable scatterMax(String name, SDVariable ref, SDVariable indices, SDVariable updates)
    INDArray scatterMin(INDArray ref, INDArray indices, INDArray updates)
    
    SDVariable scatterMin(SDVariable ref, SDVariable indices, SDVariable updates)
    SDVariable scatterMin(String name, SDVariable ref, SDVariable indices, SDVariable updates)
    INDArray scatterMul(INDArray ref, INDArray indices, INDArray updates)
    
    SDVariable scatterMul(SDVariable ref, SDVariable indices, SDVariable updates)
    SDVariable scatterMul(String name, SDVariable ref, SDVariable indices, SDVariable updates)
    INDArray scatterSub(INDArray ref, INDArray indices, INDArray updates)
    
    SDVariable scatterSub(SDVariable ref, SDVariable indices, SDVariable updates)
    SDVariable scatterSub(String name, SDVariable ref, SDVariable indices, SDVariable updates)
    INDArray scatterUpdate(INDArray ref, INDArray indices, INDArray updates)
    
    SDVariable scatterUpdate(SDVariable ref, SDVariable indices, SDVariable updates)
    SDVariable scatterUpdate(String name, SDVariable ref, SDVariable indices, SDVariable updates)
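The scatter ops above all share one pattern: for each i, combine updates[i] into ref[indices[i]] using the op's combining function (add, div, max, min, mul, sub, or plain overwrite for scatterUpdate). A plain-Java sketch of scatterAdd for the 1D case (illustrative only, not the ND4J implementation):

```java
public class ScatterSketch {
    // Plain-Java sketch of scatterAdd(ref, indices, updates) for 1D arrays:
    // for each i, out[indices[i]] += updates[i]. The other scatter ops differ
    // only in the combining function applied at each target index.
    public static double[] scatterAdd(double[] ref, int[] indices, double[] updates) {
        double[] out = ref.clone();
        for (int i = 0; i < indices.length; i++)
            out[indices[i]] += updates[i];
        return out;
    }

    public static void main(String[] args) {
        double[] out = scatterAdd(new double[]{1, 1, 1, 1},
                                  new int[]{0, 2, 2},
                                  new double[]{5, 10, 100});
        // index 2 receives two updates: 1 + 10 + 100 = 111
        assert out[0] == 6 && out[2] == 111 && out[3] == 1;
    }
}
```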
    INDArray segmentMax(INDArray data, INDArray segmentIds)
    
    SDVariable segmentMax(SDVariable data, SDVariable segmentIds)
    SDVariable segmentMax(String name, SDVariable data, SDVariable segmentIds)
    INDArray segmentMean(INDArray data, INDArray segmentIds)
    
    SDVariable segmentMean(SDVariable data, SDVariable segmentIds)
    SDVariable segmentMean(String name, SDVariable data, SDVariable segmentIds)
    INDArray segmentMin(INDArray data, INDArray segmentIds)
    
    SDVariable segmentMin(SDVariable data, SDVariable segmentIds)
    SDVariable segmentMin(String name, SDVariable data, SDVariable segmentIds)
    INDArray segmentProd(INDArray data, INDArray segmentIds)
    
    SDVariable segmentProd(SDVariable data, SDVariable segmentIds)
    SDVariable segmentProd(String name, SDVariable data, SDVariable segmentIds)
    INDArray segmentSum(INDArray data, INDArray segmentIds)
    
    SDVariable segmentSum(SDVariable data, SDVariable segmentIds)
    SDVariable segmentSum(String name, SDVariable data, SDVariable segmentIds)
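The segment ops above group data values by a non-decreasing segmentIds vector and reduce each group; segmentMax/Min/Mean/Prod replace the sum with the corresponding reduction. A plain-Java sketch of segmentSum for 1D data (illustrative only, not the ND4J implementation):

```java
public class SegmentSketch {
    // Plain-Java sketch of segmentSum(data, segmentIds) for 1D data:
    // segmentIds must be non-decreasing; output[s] is the sum of all data
    // values whose segment id equals s.
    public static double[] segmentSum(double[] data, int[] segmentIds) {
        int numSegments = segmentIds[segmentIds.length - 1] + 1;
        double[] out = new double[numSegments];
        for (int i = 0; i < data.length; i++)
            out[segmentIds[i]] += data[i];
        return out;
    }

    public static void main(String[] args) {
        double[] out = segmentSum(new double[]{3, 1, 4, 1, 5}, new int[]{0, 0, 1, 1, 1});
        // segment 0: 3+1=4, segment 1: 4+1+5=10
        assert out[0] == 4 && out[1] == 10;
    }
}
```

The unsortedSegment* variants further down relax the non-decreasing requirement by taking numSegments explicitly.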
    INDArray sequenceMask(INDArray lengths, int maxLen, DataType dataType)
    
    SDVariable sequenceMask(SDVariable lengths, int maxLen, DataType dataType)
    SDVariable sequenceMask(String name, SDVariable lengths, int maxLen, DataType dataType)
    INDArray sequenceMask(INDArray lengths, INDArray maxLen, DataType dataType)
    
    SDVariable sequenceMask(SDVariable lengths, SDVariable maxLen, DataType dataType)
    SDVariable sequenceMask(String name, SDVariable lengths, SDVariable maxLen, DataType dataType)
    INDArray sequenceMask(INDArray lengths, DataType dataType)
    
    SDVariable sequenceMask(SDVariable lengths, DataType dataType)
    SDVariable sequenceMask(String name, SDVariable lengths, DataType dataType)
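sequenceMask builds a mask matrix from a vector of sequence lengths: row r is 1 for the first lengths[r] positions and 0 afterwards (cast to the requested DataType). A plain-Java sketch of those semantics (illustrative only, not the ND4J implementation):

```java
public class SequenceMaskSketch {
    // Plain-Java sketch of sequenceMask(lengths, maxLen): row r of the output
    // is 1 for the first lengths[r] columns and 0 for the rest.
    public static int[][] sequenceMask(int[] lengths, int maxLen) {
        int[][] out = new int[lengths.length][maxLen];
        for (int r = 0; r < lengths.length; r++)
            for (int c = 0; c < maxLen; c++)
                out[r][c] = (c < lengths[r]) ? 1 : 0;
        return out;
    }

    public static void main(String[] args) {
        int[][] mask = sequenceMask(new int[]{1, 3}, 4);
        // rows: [1,0,0,0] and [1,1,1,0]
        assert mask[0][0] == 1 && mask[0][1] == 0 && mask[1][2] == 1 && mask[1][3] == 0;
    }
}
```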
    INDArray shape(INDArray input)
    
    SDVariable shape(SDVariable input)
    SDVariable shape(String name, SDVariable input)
    INDArray size(INDArray in)
    
    SDVariable size(SDVariable in)
    SDVariable size(String name, SDVariable in)
    INDArray sizeAt(INDArray in, int dimension)
    
    SDVariable sizeAt(SDVariable in, int dimension)
    SDVariable sizeAt(String name, SDVariable in, int dimension)
    INDArray slice(INDArray input, int[] begin, int[] size)
    
    SDVariable slice(SDVariable input, int[] begin, int[] size)
    SDVariable slice(String name, SDVariable input, int[] begin, int[] size)
    INDArray slice(INDArray input, INDArray begin, INDArray size)
    
    SDVariable slice(SDVariable input, SDVariable begin, SDVariable size)
    SDVariable slice(String name, SDVariable input, SDVariable begin, SDVariable size)
    INDArray squaredNorm(INDArray x, boolean keepDims, int[] dimensions)
    INDArray squaredNorm(INDArray x, int[] dimensions)
    
    SDVariable squaredNorm(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable squaredNorm(SDVariable x, int[] dimensions)
    SDVariable squaredNorm(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable squaredNorm(String name, SDVariable x, int[] dimensions)
    INDArray squeeze(INDArray x, int axis)
    
    SDVariable squeeze(SDVariable x, int axis)
    SDVariable squeeze(String name, SDVariable x, int axis)
    INDArray stack(INDArray values, int axis)
    
    SDVariable stack(SDVariable values, int axis)
    SDVariable stack(String name, SDVariable values, int axis)
    INDArray standardDeviation(INDArray x, boolean biasCorrected, boolean keepDims, int[] dimensions)
    INDArray standardDeviation(INDArray x, boolean biasCorrected, int[] dimensions)
    
    SDVariable standardDeviation(SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
    SDVariable standardDeviation(SDVariable x, boolean biasCorrected, int[] dimensions)
    SDVariable standardDeviation(String name, SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
    SDVariable standardDeviation(String name, SDVariable x, boolean biasCorrected, int[] dimensions)
    INDArray stridedSlice(INDArray in, long[] begin, long[] end, long[] strides, int beginMask, int endMask, int ellipsisMask, int newAxisMask, int shrinkAxisMask)
    INDArray stridedSlice(INDArray in, long[] begin, long[] end, long[] strides)
    
    SDVariable stridedSlice(SDVariable in, long[] begin, long[] end, long[] strides, int beginMask, int endMask, int ellipsisMask, int newAxisMask, int shrinkAxisMask)
    SDVariable stridedSlice(SDVariable in, long[] begin, long[] end, long[] strides)
    SDVariable stridedSlice(String name, SDVariable in, long[] begin, long[] end, long[] strides, int beginMask, int endMask, int ellipsisMask, int newAxisMask, int shrinkAxisMask)
    SDVariable stridedSlice(String name, SDVariable in, long[] begin, long[] end, long[] strides)
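In its basic form (the overload without masks), stridedSlice takes, per dimension, the elements from begin (inclusive) to end (exclusive), stepping by the stride. A plain-Java sketch of that behaviour for a single dimension; the mask arguments (beginMask, endMask, ellipsisMask, newAxisMask, shrinkAxisMask) modify this per-dimension behaviour and are not modelled here (illustrative only, not the ND4J implementation):

```java
public class StridedSliceSketch {
    // Plain-Java sketch of stridedSlice(in, begin, end, stride) for a 1D
    // array with a positive stride: elements at begin, begin+stride, ...
    // up to (but excluding) end.
    public static double[] stridedSlice(double[] in, int begin, int end, int stride) {
        int n = Math.max(0, (end - begin + stride - 1) / stride);
        double[] out = new double[n];
        for (int i = 0, src = begin; src < end; src += stride, i++)
            out[i] = in[src];
        return out;
    }

    public static void main(String[] args) {
        double[] out = stridedSlice(new double[]{0, 1, 2, 3, 4, 5, 6}, 1, 6, 2);
        // elements at indices 1, 3, 5 -> [1, 3, 5]
        assert out.length == 3 && out[0] == 1 && out[2] == 5;
    }
}
```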
    INDArray sum(INDArray x, boolean keepDims, int[] dimensions)
    INDArray sum(INDArray x, int[] dimensions)
    
    SDVariable sum(SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable sum(SDVariable x, int[] dimensions)
    SDVariable sum(String name, SDVariable x, boolean keepDims, int[] dimensions)
    SDVariable sum(String name, SDVariable x, int[] dimensions)
    INDArray[] switchOp(INDArray x, INDArray predicate)
    
    SDVariable[] switchOp(SDVariable x, SDVariable predicate)
    SDVariable[] switchOp(String name, SDVariable x, SDVariable predicate)
    INDArray tensorMmul(INDArray x, INDArray y, int[] dimensionsX, int[] dimensionsY, boolean transposeX, boolean transposeY, boolean transposeZ)
    INDArray tensorMmul(INDArray x, INDArray y, int[] dimensionsX, int[] dimensionsY)
    
    SDVariable tensorMmul(SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY, boolean transposeX, boolean transposeY, boolean transposeZ)
    SDVariable tensorMmul(SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY)
    SDVariable tensorMmul(String name, SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY, boolean transposeX, boolean transposeY, boolean transposeZ)
    SDVariable tensorMmul(String name, SDVariable x, SDVariable y, int[] dimensionsX, int[] dimensionsY)
    INDArray tile(INDArray x, INDArray repeat)
    
    SDVariable tile(SDVariable x, SDVariable repeat)
    SDVariable tile(String name, SDVariable x, SDVariable repeat)
    INDArray tile(INDArray x, int[] repeat)
    
    SDVariable tile(SDVariable x, int[] repeat)
    SDVariable tile(String name, SDVariable x, int[] repeat)
    INDArray transpose(INDArray x)
    
    SDVariable transpose(SDVariable x)
    SDVariable transpose(String name, SDVariable x)
    INDArray unsortedSegmentMax(INDArray data, INDArray segmentIds, int numSegments)
    
    SDVariable unsortedSegmentMax(SDVariable data, SDVariable segmentIds, int numSegments)
    SDVariable unsortedSegmentMax(String name, SDVariable data, SDVariable segmentIds, int numSegments)
    INDArray unsortedSegmentMean(INDArray data, INDArray segmentIds, int numSegments)
    
    SDVariable unsortedSegmentMean(SDVariable data, SDVariable segmentIds, int numSegments)
    SDVariable unsortedSegmentMean(String name, SDVariable data, SDVariable segmentIds, int numSegments)
    INDArray unsortedSegmentMin(INDArray data, INDArray segmentIds, int numSegments)
    
    SDVariable unsortedSegmentMin(SDVariable data, SDVariable segmentIds, int numSegments)
    SDVariable unsortedSegmentMin(String name, SDVariable data, SDVariable segmentIds, int numSegments)
    INDArray unsortedSegmentProd(INDArray data, INDArray segmentIds, int numSegments)
    
    SDVariable unsortedSegmentProd(SDVariable data, SDVariable segmentIds, int numSegments)
    SDVariable unsortedSegmentProd(String name, SDVariable data, SDVariable segmentIds, int numSegments)
    INDArray unsortedSegmentSqrtN(INDArray data, INDArray segmentIds, int numSegments)
    
    SDVariable unsortedSegmentSqrtN(SDVariable data, SDVariable segmentIds, int numSegments)
    SDVariable unsortedSegmentSqrtN(String name, SDVariable data, SDVariable segmentIds, int numSegments)
    INDArray unsortedSegmentSum(INDArray data, INDArray segmentIds, int numSegments)
    
    SDVariable unsortedSegmentSum(SDVariable data, SDVariable segmentIds, int numSegments)
    SDVariable unsortedSegmentSum(String name, SDVariable data, SDVariable segmentIds, int numSegments)
    void unstack(INDArray value, int axis, int num)
    
    void unstack(SDVariable value, int axis, int num)
    void unstack(String name, SDVariable value, int axis, int num)
    INDArray variance(INDArray x, boolean biasCorrected, boolean keepDims, int[] dimensions)
    INDArray variance(INDArray x, boolean biasCorrected, int[] dimensions)
    
    SDVariable variance(SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
    SDVariable variance(SDVariable x, boolean biasCorrected, int[] dimensions)
    SDVariable variance(String name, SDVariable x, boolean biasCorrected, boolean keepDims, int[] dimensions)
    SDVariable variance(String name, SDVariable x, boolean biasCorrected, int[] dimensions)
    INDArray zerosLike(INDArray input)
    
    SDVariable zerosLike(SDVariable input)
    SDVariable zerosLike(String name, SDVariable input)

    Release Notes

    New changes in each release of Eclipse Deeplearning4j.

    Version 1.0.0-beta7

    Read the announcement at https://blog.konduit.ai/2020/05/14/deeplearning4j-1-0-0-beta7-released/ for the highlights of this release.

    Deeplearning4j

    Features and Enhancements

    • Added Keras model import support for tf.keras models

      • Full inference and training support is available for ops/layers in the tf.keras namespace; inference only for general TensorFlow operations outside of the tf.keras namespace

      • Note also improvements to Keras import for reshape, permute, etc. operations due to NHWC and NWC support in DL4J

    Bug Fixes and Optimizations

    • Updaters (Adam, AdaGrad, etc.) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff

    • Some packages relocated to avoid split packages (which can be a problem for OSGi and Java 9 modules)

      • Note: this is a breaking change for some class packages/imports

    ND4J/SameDiff

    Features and Enhancements

    • SameDiff multi-threaded inference enhanced (and fixed) - a single SameDiff instance can now be used for inference safely and efficiently from multiple threads

    • cuDNN support added to SameDiff (automatically enabled for the nd4j-cuda-10.x backend)

    • Added ND4J namespaces: Nd4j.cnn, Nd4j.rnn, Nd4j.image

    Bug Fixes and Optimizations

    • Updaters (Adam, AdaGrad, etc.) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff

    • SameDiff - added cuDNN support

    • Some packages relocated to avoid split packages (which can be a problem for OSGi and Java 9 modules)

    DataVec

    Features and Enhancements

    • datavec-python: added zero-copy support for bytes/byte buffers

    • datavec-python: Python exceptions are now thrown as Java exceptions

    • datavec-python: added support for additional NumPy datatypes

    Bug Fixes and Optimizations

    • Deleted modules that were not being properly maintained: datavec-camel, datavec-perf

    • Fixed missing BOOL datatype support for Arrow conversion functionality

    • Assorted fixes for datavec-python

    RL4J

    Features and Enhancements

    • Refactoring to decouple configuration and learning methods from their implementations

    • Added builder patterns for all configuration classes

    Arbiter

    Bug Fixes and Optimizations

    • Fixed an issue with GridSearchCandidateGenerator not working correctly in some cases

    Version 1.0.0-beta6

    Highlights - 1.0.0-beta6 Release

    • Added support for CUDA 10.2. 1.0.0-beta6 is released with CUDA 9.2, 10.0, 10.1 and 10.2 support

    • SameDiff optimizations - memory use for inference and training significantly reduced, with some performance improvements as well

    • Deeplearning4j UI - Play framework replaced with Vertx; the deeplearning4j-ui dependency no longer has a Scala dependency or Scala version suffix

    Deeplearning4J

    Deeplearning4J: Features and Enhancements

    • DNNL (MKL-DNN) upgraded to version 1.1

    • Added causal convolution mode for the Convolution1D layer (ConvolutionMode.Causal) and added causal conv1d support for Keras import

    • Keras import now supports scaled identity weight initialization

    Deeplearning4J: Bug Fixes and Optimizations

    • KDTree implementation optimized

    • Deeplearning4j zoo models and datasets hosting location updated

    • Fixed nIn validation for the Deconv2D layer

    Deeplearning4j: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

    • The Deeplearning4j UI artifact ID has changed: deeplearning4j-ui_2.1x (beta5 and earlier) is replaced with deeplearning4j-ui

    ND4J and SameDiff

    ND4J/SameDiff: Features and Enhancements

    • Added support for CUDA 10.2

    • DNNL (MKL-DNN) upgraded to version 1.1

    • Added ND4J namespaces to match SameDiff: Nd4j.math, Nd4j.random, Nd4j.bitwise, Nd4j.nn (neural network)

    ND4J/SameDiff: Bug Fixes and Optimizations

    • OpenMP replaced with a ThreadPool abstraction, which enables parallelism on platforms without OpenMP support

    • SameDiff memory management overhauled for (in some cases significantly) reduced memory consumption and improved performance

    • Switched to Clang instead of gcc for OSX compilation to avoid compiler-related issues

    ND4J: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

    • SameDiff.outputs() now requires the user to call SameDiff.setOutputs(String...) first; the previous "best guess" output inference was unreliable

    • SameDiff.zero and .one methods now create constants, not variables

    DataVec

    DataVec: Bug Fixes and Optimizations

    • NativeImageLoader now checks for empty input streams and throws an exception instead of crashing

    • NDArrayScalarOpTransform now supports the modulus operator

    RL4J

    RL4J: Features and Enhancements

    • Added AsyncTrainingListener

    • Replaced multiple uses of java.util.Random with ND4J Random

    • Added Observable and LegacyMDPWrapper

    RL4J: Bug Fixes and Optimizations

    • Refactored RL4J video recording into a separate VideoRecorder class

    • Fixed an issue with the target for DQN

    • Refactored DQN and double DQN for improved maintainability

    PyDataVec

    PyDataVec: Features and Enhancements

    • PyDataVec TransformProcess now supports non-inplace operations

    PyDataVec: Bug Fixes and Optimizations

    • Fixed various issues with PyDataVec

    • Fixed an issue with data locality that could cause incorrect results under some circumstances when running on CUDA

    Version 1.0.0-beta5

    Highlights - 1.0.0-beta5 Release

    • Added model server - remote inference of SameDiff and DL4J models using JSON or (optionally) binary serialization

    Deeplearning4J

    Deeplearning4J: Features and Enhancements

    • Added FastText - inference and training, including OOV (out of vocabulary) support

    • Scala 2.12 support added, Scala 2.10 support dropped

    • Added model server (DL4J and SameDiff models, JSON and binary communication)

    Deeplearning4J: Bug Fixes and Optimizations

    • Updated deeplearning4j-ui theme

    • Fixed an issue with MergeVertex and CNN3D activations

    • Fixed a typo in the Yolo2OutputLayer builder/configuration method name

    Deeplearning4j: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

    • DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J; use org.nd4j.linalg.dataset.Async(Multi)DataSetIterator instead

    • Saved models with custom layers from 1.0.0-alpha and before can no longer be loaded. Workaround: load in 1.0.0-beta4 and re-save the model. Models without custom layers can still be loaded back to 0.5.0

    • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: the Spark version suffix has been dropped; for upgrading, change versions as follows: 1.0.0-beta4_spark2 -> 1.0.0-beta5

    Deeplearning4j: 1.0.0-beta5 Known Issues

    • dl4j-spark_2.11 and _2.12 dependencies incorrectly pull in datavec-spark_2.11/2.12 version 1.0.0-SNAPSHOT. Workaround: control the version using dependency management

    • Some layers (such as LSTM) may run slower on 1.0.0-beta5 than on 1.0.0-beta4 on CUDA when not using cuDNN, due to added synchronization. This synchronization will be removed in the next release after 1.0.0-beta5

    • CUDA 10.1: rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems when running CUDA 10.1 Update 1 (and possibly 10.1). CUDA 10.1 Update 2 is recommended

    ND4J and SameDiff

    ND4J/SameDiff: Features and Enhancements

    • Added new data types: BFLOAT16, UINT16, UINT32, UINT64

    • Added CUDA support for all operations that previously lacked CUDA implementations

    • Added model server (DL4J and SameDiff models, JSON and binary communication)

    ND4J/SameDiff: Bug Fixes and Optimizations

    • Updated to JavaCPP/JavaCV 1.5.1-1

    • SameDiff: placeholders must now be provided only if they are required to calculate the requested variables

    • SameDiff: fixed an issue with duplicate variable name validation

    ND4J: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

    • OldAddOp, OldSubOp, etc. removed: replace with AddOp, SubOp, etc.

    • Nd4j.trueScalar and trueVector removed; use the Nd4j.scalar and Nd4j.createFromArray methods instead

    • INDArray.javaTensorAlongDimension removed; use INDArray.tensorAlongDimension instead

    ND4J: 1.0.0-beta5 Known Issues

    • nd4j-native on some OSX systems can fail with Symbol not found: ___emutls_get_address

    • SBT 1.3.0 can fail with an Illegal character in path error; SBT 1.2.8 is OK. This is an SBT issue, not an ND4J issue

    DataVec

    DataVec: Features and Enhancements

    • ImageRecordReader: support for 16-bit TIFF added

    • Added SequenceTrimToLengthTransform

    DataVec: Bug Fixes and Optimizations

    • Fixed an issue with AnalyzeSpark and String columns

    • Fixed an issue with URL scheme detection in NumberedFileInputScheme

    • Fixed an issue with RandomPathFilter sampling being biased

    RL4J

    RL4J: Features and Enhancements

    • API cleanup and refactoring

    RL4J: Bug Fixes and Optimizations

    • Fixed an issue with compression for HistoryProcessor

    Arbiter

    Arbiter: Bug Fixes and Optimizations

    • Updated EvaluationScoreFunction to use ND4J Evaluation class metrics

    • Fixed an incorrect search size in GridSearchCandidateGenerator

    Arbiter: Known Issues

    • The Jackson version upgrade necessitated a change to how generic object serialization is performed; Arbiter JSON data stored in 1.0.0-beta4 or earlier format may not be readable in 1.0.0-beta5

    ND4S

    ND4S: Features and Enhancements

    • Added full data type support to ND4S, as per ND4J

    • Added syntactic sugar for SameDiff (implicits, operator overloads)

    Version 1.0.0-beta4

    Highlights - 1.0.0-beta4 Release

    Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following are supported:

    • DOUBLE: double precision floating point, 64-bit (8 byte)

    • FLOAT: single precision floating point, 32-bit (4 byte)

    • HALF: half precision floating point, 16-bit (2 byte), "FP16"

    • LONG: long signed integer, 64-bit (8 byte)

    ND4J Behaviour changes of note:

    • When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)

      • For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray

      • Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively

    DL4J Behaviour changes of note:

    • MultiLayerNetwork/ComputationGraph no longer depend in any way on the ND4J global datatype.

      • The datatype of a network (the DataType of its parameters and activations) can be set during construction using NeuralNetConfiguration.Builder().dataType(DataType)

      • Networks can be converted from one type to another (double to float, float to half etc) using MultiLayerNetwork/ComputationGraph.convertDataType(DataType)

    Main new methods:

    • Nd4j.create(), zeros(), ones(), linspace(), etc methods with DataType argument

    • INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another

    • New Nd4j.createFromArray(...) methods for creating INDArrays from Java arrays

    ND4J/DL4J: CUDA 10.1 support added, CUDA 9.0 support dropped

    CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.

    ND4J: Mac/OSX CUDA support dropped

    Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.

    DL4J/ND4J: MKL-DNN Support Added

    DL4J (and ND4J conv2d etc. ops) now support MKL-DNN by default when running on the CPU/native backend. MKL-DNN support is implemented for the following layer types:

    • ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)

    • SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)

    • BatchNormalization layer (and BatchNorm ND4J op)

    MKL-DNN support for other layer types (such as LSTM) will be added in a future release.

    MKL-DNN can be disabled globally (ND4J and DL4J) using Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);

    MKL-DNN can be disabled for specific ops by setting the ND4J_MKL_FALLBACK environment variable to the names of the operations for which MKL-DNN support should be disabled. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp

    ND4J: Improved Performance due to Memory Management Changes

    Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default, hence periodic GC could frequently be disabled when training DL4J networks.) However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.

    In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

    To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000ms) you can use:
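Using ND4J's MemoryManager API, that configuration looks approximately as follows (a sketch; verify the calls against your ND4J version):

```java
import org.nd4j.linalg.factory.Nd4j;

public class PeriodicGcConfig {
    public static void main(String[] args) {
        // Re-enable periodic GC (disabled by default in 1.0.0-beta4)
        Nd4j.getMemoryManager().togglePeriodicGc(true);
        // Invoke GC at most once every 5000 ms
        Nd4j.getMemoryManager().setAutoGcWindow(5000);
    }
}
```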

    ND4J: Improved Rank 0/1 Array Support

    In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, when getting sub-arrays using INDArray.get(NDArrayIndex...), or when creating arrays from Java arrays/scalars. Now, behaviour should be more consistent for these rank 0/1 cases. Note that to maintain the old behaviour for getRow and getColumn (i.e., return a rank 2 array with shape [1,x] and [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.

    DL4J: Attention layers added

    Deeplearning4J

    Deeplearning4J: Features and Enhancements

    • Added MKL-DNN support for Conv/Pool/BatchNorm/LRN layers. MKL-DNN will be used automatically when using nd4j-native backend. (, )

    • L1/L2 regularization now made into a class; weight decay added, with better control as to when/how it is applied. See for more details on the difference between L2 and weight decay. In general, weight decay should be preferred to L2 regularization. (, )

    • Added dot product attention layers: , , and

    Deeplearning4J: Bug Fixes and Optimizations

    • DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly ()

    • cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown ()

    • Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training ()

    ND4J and SameDiff

    ND4J/SameDiff: Features and Enhancements

    • Removed reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays ()

    • Added INDArray.close() method to allow users to manually release off-heap memory immediately ()

    • SameDiff: Added TensorFlowImportValidator tool to determine if a TensorFlow graph can likely be imported into SameDiff. Reports the operations used and whether they are supported in SameDiff ()
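The INDArray.close() method listed above can be used as follows (a minimal sketch; an array must not be accessed after it is closed):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class CloseExample {
    public static void main(String[] args) {
        INDArray arr = Nd4j.ones(1000, 1000);
        double sum = arr.sumNumber().doubleValue();
        // Release the off-heap buffer immediately instead of waiting for GC.
        // The array must not be used after close().
        arr.close();
        System.out.println(sum);
    }
}
```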

    ND4J/SameDiff: API Changes (Transition Guide): 1.0.0-beta3 to 1.0.0-beta4

    • ND4J datatypes - significant changes, see highlights at top of this section

    • nd4j-base64 module (deprecated in beta3) has been removed. Nd4jBase64 class has been moved to nd4j-api ()

    • When specifying arguments for op execution along dimension (for example, reductions), the reduction axes are now specified in the operation constructor - not separately in the OpExecutioner call. ()

    ND4J/SameDiff: Bug Fixes and Optimizations

    • Fixed bug with InvertMatrix.invert() with [1,1] shape matrices ()

    • Fixed edge case bug for Updater instances with length 1 state arrays ()

    • Fixed edge case with FileDocumentIterator with empty documents ()

    ND4J: Known Issues

    • Most CustomOperation operations (such as those used in SameDiff) are CPU only until next release. GPU support was not completed in time for 1.0.0-beta4 release.

    • Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See . Workaround: Disable MKL-DNN for conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier) or disable MKL-DNN globally, for Skylake CPUs.

    DataVec

    DataVec: Features and Enhancements

    • Added PythonTransform (arbitrary Python code execution for preprocessing) (, )

    • Added FirstDigit (Benford's law) transform (, )

    • StringToTimeTransform now supports setting Locale (, )

    DataVec: Optimizations and Bug Fixes

    • Fixed issue with ImageLoader.scalingIfNeeded ()

    Arbiter

    Arbiter: Enhancements

    • Arbiter now supports genetic algorithm search ()

    Arbiter: Fixes

    • Fixed an issue where early stopping used in Arbiter would result in a serialization exception ()

    Version 1.0.0-beta3

    Highlights - 1.0.0-beta3 Release

    • ND4J/Deeplearning4j: Added support for CUDA 10.0. Dropped support for CUDA 8.0. (1.0.0-beta3 release has CUDA 9.0, 9.2 and 10.0 support)

    • SameDiff now supports training and evaluation from DataSetIterator and MultiDataSetIterator. Evaluation classes have been moved to ND4J.

    • DL4J Spark training (gradient sharing) is now fully fault tolerant, and has improvements for threshold adaption (potentially more robust convergence). Ports can now be easily configured independently on master/workers.

    Deeplearning4J

    Deeplearning4J: New Features

    • Added OutputAdapter interface and MultiLayerNetwork/ComputationGraph.output method overloads using OutputAdapter (avoids allocating off-heap memory that needs to be cleaned up by GC) , ,

    • Added ComputationGraph/MultiLayerNetwork rnnTimeStep overload with user-specified workspace.

    • Added Cnn3DLossLayer

    Deeplearning4J: Bug Fixes and Optimizations

    • Fixed an issue where L1/L2 and updaters (Adam, Nesterov, etc) were applied before dividing gradients by minibatch to obtain average gradient. To maintain old behaviour, use NeuralNetConfiguration.Builder.legacyBatchScaledL2(true) .

      • Note that learning rates may need to be decreased for some updaters (such as Adam) to account for this change vs. earlier versions. Some other updaters (such as SGD, NoOp, etc) should be unaffected.

      • Note that deserialized (loaded) configurations/networks saved in 1.0.0-beta2 or earlier will default to old behaviour for backward compatibility. All new networks (created in 1.0.0-beta3) will default to the new behaviour.
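A configuration sketch showing where this flag is set (the layer and updater choices here are illustrative, not from the release notes):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class LegacyL2Config {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                // Restore the pre-1.0.0-beta3 behaviour: L1/L2 and updaters applied
                // before dividing gradients by the minibatch size
                .legacyBatchScaledL2(true)
                .updater(new Adam(1e-3))
                .list()
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX)
                        .nIn(4).nOut(3).build())
                .build();
        System.out.println(conf.toJson() != null);
    }
}
```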

    Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

    • IEvaluation classes in DL4J have been deprecated and moved to ND4J so they are available for SameDiff training. Functionality and APIs are unchanged

    • MultiLayerConfiguration/ComputationGraphConfiguration pretrain(boolean) and backprop(boolean) have been deprecated and are no longer used. Use fit and pretrain/pretrainLayer methods instead.

    • ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is

    Deeplearning4J: Known issues: 1.0.0-beta3

    • Running multiple Spark training jobs simultaneously on one physical node (i.e., multiple JVMs from one or more Spark jobs) may cause problems with network communication. A workaround is to manually set a unique stream ID in the VoidConfiguration, using a unique (or random) integer value for different jobs

    Deeplearning4J: Keras Import

    • Fixed import issue due to Keras JSON format changes for Keras 2.2.3+

    • Added Keras import for timeseries preprocessing

    • Elephas

    ND4J

    ND4J: New Features

    • Added SameDiff training and evaluation: SameDiff instances can now be trained directly using DataSetIterator and MultiDataSetIterator, and evaluated using IEvaluation instances (that have been moved from ND4J to DL4J)

    • Added GraphServer implementation: a C++ inference server for SameDiff (and TensorFlow, via TF import) with a Java API

    • SameDiff instances can now be loaded from serialized FlatBuffers format (SameDiff.asFlatFile plus fromFlatFile)

    ND4J: Bug Fixes and Optimizations

    • Fixes for android: Remove use of RawIndexer

    • Libnd4j custom ops: conv op weight layouts are now not dependent on the input format (NCHW/NHWC) - now always [kH, kW, inChannels, outChannels] for 2d CNNs, [kH, kW, kD, inChannels, outChannels] for 3d CNNs. ,

    • Libnd4j native op fixes:

    ND4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

    • CUDA 8.0 support has been removed. CUDA 9.0, 9.2 and 10.0 support is available in 1.0.0-beta3

    • nd4j-base64 module contents have been deprecated; use the equivalent classes in nd4j-api from now on

    • Some classes in the nd4j-jackson module have been deprecated; use the equivalent classes in nd4j-api from now on

    ND4J: Known issues: 1.0.0-beta3

    • Android users may need to manually exclude the (now deprecated) module nd4j-base64. This is due to org.nd4j.serde.base64.Nd4jBase64 class being present in both nd4j-api and nd4j-base64 modules. Both versions have identical content. Use exclude group: 'org.nd4j', module: 'nd4j-base64' to exclude.
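In Gradle form, the exclusion described above can be applied build-wide, for example (a sketch; adapt to your own build script):

```groovy
configurations.all {
    // Nd4jBase64 exists in both nd4j-api and the deprecated nd4j-base64 module;
    // exclude the deprecated module to avoid duplicate classes on Android
    exclude group: 'org.nd4j', module: 'nd4j-base64'
}
```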

    DataVec

    DataVec: New Features

    • Added NativeImageLoader method overloads for org.opencv.core.Mat and String as filename

    DataVec: Optimizations and Bug Fixes

    • Fix for JDBCRecordReader handling of null values

    • Improved errors/validation for ObjectDetectionRecordReader for invalid input (where image object centers are outside of image bounds)

    • Fixed issue where FileSplit used methods that are unavailable on earlier versions of Android

    Arbiter

    Arbiter: Fixes

    • Fixed some issues with dropout layers

    ND4S

    • Added conversion between org.nd4j.linalg.primitives.Pair/Triple and Scala Tuple

    Version 1.0.0-beta2

    Highlights - 1.0.0-beta2 Release

    • ND4J/Deeplearning4j: Added support for CUDA 9.2. Dropped support for CUDA 9.1. (1.0.0-beta2 release has CUDA 8.0, 9.0 and 9.2 support)

    • Deeplearning4j: New SameDiff layers with training support -

    • Deeplearning4j resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property

    Deeplearning4J

    Deeplearning4J: New Features

    • Added new SameDiff layers (automatic differentiation - only single class, forward pass definition required) to DL4J with full training support - SameDiffLayer, SameDiffVertex, SameDiffOutputLayer, SameDiffLambdaLayer, SameDiffLambdaVertex - note that these are CPU-only execution for now

    • Resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property. Note that it is also possible to set a different base location for downloads (for local mirrors of DL4J resources)

    Deeplearning4J: Bug Fixes and Optimizations

    • ComputationGraph.addListeners was not working correctly if listeners were already present ,

    • TinyImageNetDataSetIterator did not validate/correctly use input shape configuration ,

    • BatchNormalization layer now correctly asserts that nOut is set if required (instead of unfriendly shape errors later)

    Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

    • GravesLSTM has been deprecated in favor of LSTM, which has CuDNN support and otherwise similar accuracy in practice. Use the LSTM class instead.

    • deeplearning4j-modelexport-solr: now uses Lucene/Solr version 7.4.0 (was 7.3.0)

    • Mask arrays for CNN2d layers must be in broadcastable 4d format: [minibatch,depth or 1, height or 1, width or 1] - previously they were 2d with shape [minibatch,height] or [minibatch,width]

    Deeplearning4J: 1.0.0-beta2 Known Issues

    • Windows users are unable to load the HDF5 files used in SvhnLabelProvider (used in the HouseNumberDetection example). Linux/Mac users are unaffected. A workaround for Windows users is to add the Sonatype snapshot dependency org.bytedeco.javacpp-presets:hdf5-platform:jar:1.10.2-1.4.3-SNAPSHOT

    Deeplearning4J: Keras Import

    • Keras model import now imports every Keras application

    • Supports GlobalPooling3D layer import

    • Supports RepeatVector layer import

    • Supports LocallyConnected1D and LocallyConnected2D layers

    ND4J

    ND4J: New Features

    • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

    • Added the ability to write the Numpy .npy format using Nd4j.writeAsNumpy(INDArray,File) and to convert an INDArray to a Numpy array in-memory using Nd4j.convertToNumpy(INDArray)

    • ND4j-common ClassPathResource: added ClassPathResource.copyDirectory(File)
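A round-trip sketch of the Numpy interop described above (Nd4j.createFromNpyFile is assumed for reading the file back):

```java
import java.io.File;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class NumpyExport {
    public static void main(String[] args) throws Exception {
        INDArray arr = Nd4j.linspace(1, 6, 6).reshape(2, 3);
        // Write to .npy on disk (loadable via numpy.load in Python)
        File f = File.createTempFile("example", ".npy");
        Nd4j.writeAsNumpy(arr, f);
        // Round-trip: read the .npy file back into an INDArray
        INDArray restored = Nd4j.createFromNpyFile(f);
        System.out.println(arr.equals(restored));
    }
}
```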

    ND4J: Bug Fixes and Optimizations

    • SameDiff: a significant number of bug fixes for execution and individual ops

    • Fixed issue with INDArray.toDoubleArray() for true scalars (rank 0 arrays)

    • Fixed issue with DataSet.sample() not working for rank 3+ features

    ND4J: Known Issues

    ND4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

    • CUDA 9.1 support has been removed. CUDA 8.0, 9.0 and 9.2 support is available

    • Due to long indexing changes, long/long[] should be used in place of int/int[] in some places (such as INDArray.size(int), INDArray.shape())

    • Simplified DataSetIterator API: the totalExamples(), cursor() and numExamples() methods have been removed - these were unsupported on most DataSetIterator implementations, and not used in practice for training. Custom iterators should remove these methods also

    DataVec

    DataVec: New Features

    • Added AnalyzeLocal class to mirror functionality of AnalyzeSpark (but without Spark dependency)

    • Added JacksonLineSequenceRecordReader: RecordReader used for multi-example JSON/XML where each line in a file is an independent example

    • Added RecordConverter.toRecord(Schema, List<Object>)

    DataVec: Optimizations and Bug Fixes

    • Fixed issue with NativeImageLoader on Android

    • Fixed issue with ExcelRecordReader

    • Fixed issue where bad args for CSVRecordReader.next(int) could cause an unnecessarily large list to be generated

    DataVec: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

    Arbiter

    Arbiter: New Features

    • Added DataSource interface. Unlike old DataProvider, this does not require JSON serializability (only a no-arg constructor)

    • Added numerous enhancements and missing configuration options (constraints, dilation, etc)

    Arbiter: Fixes

    • DataProvider has been deprecated. Use DataSource instead.

    RL4J

    • stepCounter, epochCounter and historyProcessor can now be set

    • Random seed is now loaded for ACPolicy

    Version 1.0.0-beta

    Highlights - 1.0.0-beta Release

    • Performance and memory optimizations for DL4J

    Deeplearning4J

    Deeplearning4J: New Features

    • New or enhanced layers:

      • Added Cropping1D layer

      • Added Convolution3D, Cropping3D, UpSampling3D, ZeroPadding3D, Subsampling3D layers (all with Keras import support):

    Deeplearning4J: Bug Fixes and Optimizations

    • Performance and memory optimizations via optimizations of internal use of workspaces

    • The Reflections library has been entirely removed from DL4J and is no longer required for custom layer serialization/deserialization ,

      • Fixes issues with custom and some Keras import layers on Android

    Deeplearning4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

    • WorkspaceMode.SINGLE and SEPARATE have been deprecated; use WorkspaceMode.ENABLED instead

    • Internal layer API changes: custom layers will need to be updated to the new Layer API - see built-in layers or custom layer example

    • Custom layers etc in pre-1.0.0-beta JSON (ModelSerializer) format need to be registered before they can be deserialized due to JSON format change. Built-in layers and models saved in 1.0.0-beta or later do not require this. Use NeuralNetConfiguration.registerLegacyCustomClassesForJSON(Class) for this purpose

    Deeplearning4J: 1.0.0-beta Known Issues

    • ComputationGraph TrainingListener onEpochStart and onEpochEnd methods are not being called correctly

    • DL4J Zoo Model FaceNetNN4Small2 model configuration is incorrect, causing issues during forward pass

    • Early stopping score calculators with values that should be maximized (accuracy, f1, etc) are not working properly (values are minimized, not maximized). Workaround: override ScoreCalculator.calculateScore(...) and return 1.0 - super.calculateScore(...).

    Deeplearning4J: Keras Import

    Deeplearning4J: Keras Import - API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

    ND4J

    ND4J: New Features

    ND4J: Known Issues

    • Not all op gradients implemented for automatic differentiation

    • Vast majority of new operations added in 1.0.0-beta do NOT use GPU yet.

    ND4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

    DataVec

    DataVec: New Features

    • ImageRecordReader now logs number of inferred label classes (to reduce risk of users missing a problem if something is misconfigured)

    • Added AnalyzeSpark.getUnique overload for multiple columns

    • Added performance/timing module

    DataVec: Optimizations and Bug Fixes

    • Reduced ImageRecordReader garbage generation via buffer reuse

    • Fixes for Android compilation (aligned versions, removed some dependencies)

    • Removed Reflections library use in DataVec

    DataVec: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

    • DataVec ClassPathResource has been deprecated; use nd4j-common version instead

    Arbiter

    Arbiter: New Features

    • Added LayerSpace for OCNN (one-class neural network)

    Arbiter: Fixes

    • Fixed timestamp issue that could cause incorrect rendering of first model's results in UI

    • Execution now waits for last model(s) to complete before returning when a termination condition is hit

    • As per DL4J etc: use of Reflections library has been removed entirely from Arbiter

    Version 1.0.0-alpha

    Highlights - 1.0.0-alpha Release

    • ND4J: Added SameDiff - Java automatic differentiation library (alpha release) with Tensorflow import (technology preview) and hundreds of new operations

    • ND4J: Added CUDA 9.0 and 9.1 support (with cuDNN), dropped support for CUDA 7.5, continued support for CUDA 8.0

    • ND4J: Native binaries (nd4j-native on Maven Central) now ship with AVX/AVX2/AVX-512 support (Windows/Linux)

    Deeplearning4J

    Deeplearning4J: New Features

    • Layers (new and enhanced)

      • Added Yolo2OutputLayer CNN layer for object detection (). See also DataVec's

      • Added support for 'no bias' layers via hasBias(boolean) config (DenseLayer, EmbeddingLayer, OutputLayer, RnnOutputLayer, CenterLossOutputLayer, ConvolutionLayer, Convolution1DLayer). EmbeddingLayer now defaults to no bias ()

    Deeplearning4J: Bug Fixes and Optimizations

    • Lombok is no longer included as a transitive dependency ()

    • ComputationGraph can now have a vertex as the output (not just layers) (, )

    • Performance improvement for J7FileStatsStorage with large amount of history ()

    Deeplearning4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

    • Default training workspace mode has been switched to SEPARATE from NONE for MultiLayerNetwork and ComputationGraph ()

    • Behaviour change: fit(DataSetIterator) and similar methods no longer perform layerwise pretraining followed by backprop - only backprop is performed in these methods. For pretraining, use pretrain(DataSetIterator) and pretrain(MultiDataSetIterator) methods ()

    Deeplearning4J: 1.0.0-alpha Known Issues

    • Performance on some network types may be reduced on CUDA compared to 0.9.1 (with workspaces configured). This will be addressed in the next release

    • Some issues have been noted with FP16 support on CUDA ()

    Deeplearning4J: Keras Import

    • Keras 2 support, keeping backward compatibility for Keras 1

    • Keras 2 and 1 imports use the exact same API; the Keras version is inferred by DL4J

    • Keras unit test coverage increased by 10x, with many more real-world integration tests

    Deeplearning4J: Keras Import - API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

    • Model and ModelConfiguration, deprecated in 0.9.1, have been permanently removed. Use instead, which is now the only entry point for Keras model import.

    Deeplearning4J: Keras Import - Known Issues

    • Embedding layer: In DL4J the output of an embedding layer is 2D by default, unless preprocessors are specified. In Keras the output is always 3D, but depending on specified parameters can be interpreted as 2D. This often leads to difficulties when importing Embedding layers. Many cases have been covered and issues fixed, but inconsistencies remain.

    • Batch normalization layer: DL4J's batch normalization layer is much more restrictive (in a good way) than Keras' version. For instance, DL4J only allows normalization of spatial dimensions for 4D convolutional inputs, while in Keras any axis can be used for normalization. Depending on the dimension ordering (NCHW vs. NHWC) and the specific configuration used by a Keras user, this can lead to expected (!) and unexpected import errors.

    • Support for importing a Keras model for training purposes in DL4J (enforceTrainingConfig == true) is still very limited and will be tackled properly for the next release.

    ND4J

    ND4J: New Features

    • Hundreds of new operations added

    • New DifferentialFunction API with automatic differentiation (see the SameDiff section)

    • Technology preview of TensorFlow import added (supports 1.4.0 and up)

    ND4J: Known Issues

    • Not all op gradients implemented for automatic differentiation

    • Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

    ND4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

    ND4J - SameDiff

    • Initial tech preview

    • Control flow is supported with IF and WHILE primitives.

    Alpha release of auto-differentiation engine for ND4J.

    Features

    • Two execution modes available: Java-driven execution, and Native execution for serialized graphs.

    • SameDiff graphs can be serialized using FlatBuffers

    • Building and running computation graphs built from SameDiff operations.

    Known Issues and Limitations

    • Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

    • While many of the widely used base operations and high-level layers used in practice are supported, op coverage is still limited. Goal is to achieve feature parity with TensorFlow and fully support import for TF graphs.

    • Some of the existing ops do not have a backward pass implemented (called doDiff in SameDiff).

    DataVec

    DataVec: New Features

    • Added ObjectDetectionRecordReader - for use with DL4J's Yolo2OutputLayer () (also supports image transforms: )

    • Added ImageObjectLabelProvider, VocLabelProvider and SvhnLabelProvider (Streetview house numbers) for use with ObjectDetectionRecordReader (, )

    • Added LocalTransformExecutor for single machine execution (without Spark dependency) ()

    DataVec: Fixes

    • Lombok is no longer included as a transitive dependency ()

    • MapFileRecordReader and MapFileSequenceRecordReader can handle empty partitions/splits for multi-part map files ()

    • CSVRecordReader is now properly serializable using Java serialization () and Kryo serialization ()

    DataVec: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

    • Many of the util classes (in org.datavec.api.util mainly) have been deprecated or removed; use the equivalently named util classes in the nd4j-common module ()

    • RecordReader.next(int) method now returns List<List<Writable>> for batches, not List<Writable>. See also

    Arbiter

    Arbiter: New Features

    • Workspace support added (, )

    • Added new layer spaces: LSTM, CenterLoss, Deconvolution2D, LossLayer, Bidirectional layer wrapper (, )

    • As per DL4J API changes: Updater configuration options (learning rate, momentum, epsilon, rho etc) have been moved to ParameterSpace instead. Updater spaces (AdamSpace, AdaGradSpace etc) introduced ()

    Arbiter: Fixes

    • Fix parallel job execution (when using multiple execution threads) (, )

    • Improved logging for failed task execution ()

    • Fix for UI JSON serialization ()

    Arbiter: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

    • As per DL4J updater API changes: old updater configuration (learningRate, momentum, etc) methods have been removed. Use .updater(IUpdater) or .updater(ParameterSpace<IUpdater>) methods instead

    RL4J

    • Add support for LSTM layer to A3C

    • Fix A3C to make it actually work using new ActorCriticLoss and correct use of randomness

    • Fix cases when QLearning would fail (non-flat input, incomplete serialization, incorrect normalization)

    ScalNet

    • First release of , which closely resembles Keras' API.

    • Can be built with sbt and maven.

    • Supports both Keras inspired models, corresponding to DL4J's MultiLayerNetwork, and , corresponding to ComputationGraph.

    ND4S

    • Scala 2.12 support

    Version 0.9.1

    Deeplearning4J

    • Fixed issue with incorrect version dependencies in 0.9.0

    • Added EmnistDataSetIterator

    • Numerical stability improvements to LossMCXENT / LossNegativeLogLikelihood with softmax (should reduce NaNs with very large activations)

    ND4J

    • Added runtime version checking for ND4J, DL4J, RL4J, Arbiter, DataVec

    Known Issues

    • Deeplearning4j: Use of Evaluation class no-arg constructor (i.e., new Evaluation()) can result in accuracy/stats being reported as 0.0. Other Evaluation class constructors, and ComputationGraph/MultiLayerNetwork.evaluate(DataSetIterator) methods work as expected.

      • This also impacts Spark (distributed) evaluation: the workaround is to replace sparkNet.evaluate(testData); with sparkNet.doEvaluation(testData, 64, new Evaluation(10))[0];, where 10 is the number of classes and 64 is the evaluation minibatch size to use.

    Version 0.9.0

    Deeplearning4J

    • Workspaces feature added (faster training performance + less memory)

    • SharedTrainingMaster added for Spark network training (improved performance) ,

    • ParallelInference added - a wrapper that serves inference requests using internal batching and queues

    ND4J

    • Workspaces feature added

    • Native parallel sort was added

    • New ops added: SELU/SELUDerivative, TAD-based comparisons, percentile/median, Reverse, Tan/TanDerivative, SinH, CosH, Entropy, ShannonEntropy, LogEntropy, AbsoluteMin/AbsoluteMax/AbsoluteSum, Atan2

    DataVec

    • MapFileRecordReader and MapFileSequenceRecordReader added

    • Spark: Utilities to save and load JavaRDD<List<Writable>> and JavaRDD<List<List<Writable>>> data to Hadoop MapFile and SequenceFile formats

    • TransformProcess and Transforms now support NDArrayWritables and NDArrayWritable columns

    Arbiter

    • Arbiter UI:

      • UI now uses Play framework, integrates with DL4J UI (replaces Dropwizard backend). Dependency issues/clashing versions fixed.

      • Supports DL4J StatsStorage and StatsStorageRouter mechanisms (FileStatsStorage, remote UI via RemoteUIStatsStorageRouter)

    0.8.0 -> 0.9.0 Transition Notes

    Deeplearning4j

    • Updater configuration methods such as .momentum(double) and .epsilon(double) have been deprecated. Instead: use .updater(new Nesterovs(0.9)) and .updater(Adam.builder().beta1(0.9).beta2(0.999).build()) etc to configure

    DataVec

    • CsvRecordReader constructors: now uses characters for delimiters, instead of Strings (i.e., ',' instead of ",")

    Arbiter

    • Arbiter UI is now a separate module, with Scala version suffixes: arbiter-ui_2.10 and arbiter-ui_2.11

    Version 0.8.0

    • Added transfer learning API

    • Spark 2.0 support (DL4J and DataVec; see transition notes below)

    • New layers

    0.7.2 -> 0.8.0 Transition Notes

    • Spark versioning schemes: with the addition of Spark 2 support, the versions for the Deeplearning4j and DataVec Spark modules have changed

      • For Spark 1: use <version>0.8.0_spark_1</version>

      • For Spark 2: use <version>0.8.0_spark_2</version>

    0.8.0 Known Issues (At Launch)

    • UI/CUDA/Linux issue:

    • Dirty shutdown on JVM exit is possible for CUDA backend sometimes:

    • Issues with RBM implementation

    Version 0.7.2

    • Added variational autoencoder

    • Activation function refactor

      • Activation functions are now an interface

    0.7.1 -> 0.7.2 Transition Notes

    • Activation functions (built-in): now specified using Activation enumeration, not String (String-based configuration has been deprecated)

    Version 0.7.1

    • RBM and AutoEncoder key fixes:

      • Ensured visible bias is updated and applied during pretraining.

      • The RBM HiddenUnit is the activation function for the layer; derivative calculations for backprop are now established according to the respective HiddenUnit.

    Version 0.7.0

    • UI overhaul: new training UI has considerably more information, supports persistence (saving info and loading later), Japanese/Korean/Russian support. Replaced Dropwizard with Play framework.

    • Import of models configured and trained using

      • Imports both Keras model and

    0.6.0 -> 0.7.0 Transition Notes

    Notable changes for upgrading codebases based on 0.6.0 to 0.7.0:

    • UI: new UI package name is deeplearning4j-ui_2.10 or deeplearning4j-ui_2.11 (previously: deeplearning4j-ui). Scala version suffix is necessary due to Play framework (written in Scala) being used now.

    • Histogram and Flow iteration listeners deprecated. They are still functional, but using new UI is recommended

    • DataVec ImageRecordReader: labels are now sorted alphabetically by default before assigning an integer class index to each - previously (0.6.0 and earlier) they were according to file iteration order. Use .setLabels(List) to manually specify the order if required.

    Version 0.6.0

    • Custom layer support

    • Support for custom loss functions

    • Support for compressed INDArrays, for memory saving on huge data

    • Native support for BooleanIndexing where applicable

    Version 0.5.0

    • FP16 support for CUDA

    • Better performance for multi-gpu

    • Including optional P2P memory access support

    • Normalization support for time series and images

    Version 0.4.0

    • Initial multi-GPU support viable for standalone and Spark.

    • Refactored the Spark API significantly

    • Added CuDNN wrapper

    • Performance improvements for ND4J

  • DL4J now supports NHWC (channels last) data format for all CNN 2D layers, in addition to NCHW Linkarrow-up-right

  • DL4J now supports NWC (channels last - [minibatch, sequence_length, size]) for all RNN and CNN 1D layers, in addition to NCW Linkarrow-up-right

  • Added Deconvolution3D layer Linkarrow-up-right

  • Keras import: added ReLU, ELU and Softmax advanced activation layers Linkarrow-up-right and Swish activation function Linkarrow-up-right

  • Added DL4J SameDiffLoss class (for easily-defined DL4J ILossFunction's via SameDiff) Linkarrow-up-right

  • Useful exceptions are now thrown when attempting to perform unsupported operations on FastText Linkarrow-up-right

  • Added MultiLayerNetwork.evaluate(MultiDataSetIterator) and .evaluateRegression(MultiDataSetIterator) methods Linkarrow-up-right, Linkarrow-up-right

  • for details on exact package changes
  • Deeplearning4j UI: Webjars versions locked down using dependency management to avoid check on each build Linkarrow-up-right

  • Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff Linkarrow-up-right

  • Refactored/merged modules dl4j-perf and dl4j-util into deeplearning4j-core Linkarrow-up-right

  • Fixed an issue with BertWordPieceTokenizer - potential StackOverflowError with certain inputs Linkarrow-up-right

  • Fixed an issue with GlobalPooling layer with masks of different datatype to the activations datatype Linkarrow-up-right

  • Fixed an issue with DL4JModelValidator for ComputationGraph Linkarrow-up-right

  • Fixed an issue where SameDiff layers in DL4J could throw an exception when used with transfer learning Linkarrow-up-right

  • Weight initialization for EmbeddingLayer and EmbeddingSequenceLayer now no longer depend on the vocabulary size (only the vector size) Linkarrow-up-right

  • Fixed an issue with Keras import with bidirectional layers + preprocessors Linkarrow-up-right

  • DL4J UI: added redirect from /train to /train/overview Linkarrow-up-right

  • Fixed an issue where RecordReaderDataSetIterator builder collectMetaData configuration was not being applied Linkarrow-up-right

  • Fixed an issue where MultiLayerNetwork evaluation was not passing metadata to the IEvaluation instances during evaluation Linkarrow-up-right, Linkarrow-up-right

  • Fixed an issue with Spark training SharedTrainingMaster when training with a ComputationGraph and MultiDataSets Linkarrow-up-right

  • Assorted fixes for edge cases for DL4J Keras import Linkarrow-up-right

  • deeplearning4j-nlp-korean will no longer be released for Scala 2.12 due to the required dependency only having a Scala 2.11 version available Linkarrow-up-right

  • Fix for ConvolutionalIterationListener for ComputationGraph Linkarrow-up-right

  • Fixed an issue where dataset and model zoo downloads could get stuck if the server fails to send any data (now: timeout + retry) Linkarrow-up-right

  • DL4J ModelSerializer no longer writes temporary files when restoring models from InputStream Linkarrow-up-right

  • Fixes issues with UIServer multi session mode, and potential shutdown race condition Linkarrow-up-right

  • Fixed an issue where TfidfVectorizer.vectorize() could throw a NPE when fit from LabelAwareIterator Linkarrow-up-right

  • Added new Image operations namespace operations:

    • rgbToHsv, hsvToRgb Linkarrow-up-right

    • rgbToYiq, yiqToRgb, rgbToYuv, yuvToRgb Linkarrow-up-right

    • imageResize

  • Added new Random operations namespace operations:

    • gamma, poisson, shuffle Linkarrow-up-right

  • Added new Math namespace operations:

    • clipByAvgNorm, embeddingLookup Linkarrow-up-right

    • mergeMaxIndex Linkarrow-up-right

  • Added new NN namespace operations:

    • cReLU Linkarrow-up-right

  • Added new CNN namespace operations:

    • upsampling3d Linkarrow-up-right

  • Added new linalg operations namespace

    • triangular_solve Linkarrow-up-right

    • tri operation Linkarrow-up-right

    • triu operation

  • Added new RNN operation namespace operations:

    • lstmLayer (note old lstmLayer method renamed to lstmBlock) Linkarrow-up-right

    • gru Linkarrow-up-right

  • Added new Loss operations namespace - Nd4j.loss Linkarrow-up-right

  • Mapped operations for Tensorflow import:

    • HSVToRGB, RGBToHSV, Igamma, Igammac, RandomGamma, RandomPoisson, RandomPoissonV2, RandomShuffle Linkarrow-up-right

  • Added SameDiff ProfilingListener - writes op performance profiles in Chrome profiler format (load in chrome://tracing/) Linkarrow-up-right Linkarrow-up-right

  • Added SameDiff ProfileAnalyzer tool to compare profiles output from ProfilingListener (or Tensorflow) Linkarrow-up-right Linkarrow-up-right

  • SameDiff listener API: added frame and iteration information for listener methods Linkarrow-up-right Linkarrow-up-right

  • Added (non-backend-specific) method of accessing Nd4j environment: Nd4j.getEnvironment() method (environment info and low-level configuration options) Linkarrow-up-right Linkarrow-up-right

  • Improved memory limits/configuration support for libnd4j (c++) Linkarrow-up-right

  • Added pairwise (broadcastable) power backprop operation Linkarrow-up-right

  • Updated JavaCPP presets MKL version to 2020.0 from 2019.5 Linkarrow-up-right

  • Added DynamicCustomOp dargs - datatype arguments Linkarrow-up-right Linkarrow-up-right

    • Output datatype configuration for Range op Linkarrow-up-right, SequenceOp Linkarrow-up-right, ConfusionMatrix Linkarrow-up-right

  • Added tensormmul_bp op Linkarrow-up-right

  • OpenBLAS version upgraded to 0.3.8 Linkarrow-up-right

  • libnd4j (c++ codebase underlying DL4J, ND4J and SameDiff) refactored to be more easily embeddable in other C++ projects Linkarrow-up-right

  • ImagePreProcessingScaler now supports preprocessing of labels (for segmentation) Linkarrow-up-right

  • Additional datatypes now supported for nd4j-tensorflow TensorflowConversion Linkarrow-up-right

  • SameDiff operation namespaces (sd.math, sd.image, etc) are now code generated to ensure SameDiff and ND4J namespaces are identical (all operations included, same API) Linkarrow-up-right

  • Added ND4J ArchiveUtils.unzipFileTo(String, String, boolean logFiles) overload to enable/disable extracted file path logging Linkarrow-up-right

  • Added weight format configuration for following operations: conv1D, conv2D, conv3D, deconv2d, deconv3d, depthwiseConv2d, pointwiseConv2d, sconv2d Linkarrow-up-right

  • Added backprop operation implementations for mergemax, mergeadd, mergeavg operations Linkarrow-up-right

  • MKL version upgraded from 2020.0 to 2020.1; OpenCV upgraded from 4.2.0 to 4.3.0 Linkarrow-up-right

  • SameDiff: DifferentialFunctionFactory class removed in favor of namespace methods (sd.math, sd.linalg, etc) Linkarrow-up-right

  • Added lstmLayer_bp operation Linkarrow-up-right

  • Added gru_bp operation Linkarrow-up-right

  • linspace operation can now use both targs and arrays for start/end/size arguments Linkarrow-up-right

  • Assorted dependency updates - OpenBLAS (0.3.9), OpenCV (4.3.0), Leptonica (1.79.0) Linkarrow-up-right

  • Upgraded assorted dependency versions: javax.activation:activation (1.1 -> 1.1.1), stream analytics (2.7.0->2.9.8), Apache Spark (2.4.3->2.4.5), Jackson databind (2.10.1 -> 2.10.3), Vertx (3.8.3 -> 3.9.0) Linkarrow-up-right

  • Added nd4j-common-tests ResourceUtils.listClassPathfiles method Linkarrow-up-right

  • Note: this is a breaking change for some class packages/imports. See this linkarrow-up-right for details on exact package changes

  • Fixed some issues with Tensorflow import of FusedBatchNorm operation Linkarrow-up-right

  • Fixed an issue where the Roll operation did not match Tensorflow operation Linkarrow-up-right Linkarrow-up-right

  • Fixed an issue where ArchiveUtils could fail to create the top level destination directory when it does not exist Linkarrow-up-right

  • Fixed an issue where resize_bicubic operation did not match Tensorflow for some configuration values Linkarrow-up-right Linkarrow-up-right

  • Pad operation now supports long/int64 values for padding array Linkarrow-up-right Linkarrow-up-right

  • Fixed an issue where hashcode operation shape function wasn't always returning int64/long dtype Linkarrow-up-right

  • Fixed an issue with reshape operation on empty arrays with -1s Linkarrow-up-right Linkarrow-up-right

  • Improved performance on CUDA for concat operation Linkarrow-up-right and CPU/GPU Linkarrow-up-right

  • Improved performance for bias_add operation

    • On CPU for NHWC case Linkarrow-up-right

    • Generally Linkarrow-up-right

    • On CUDA for 2D case


  • Fixed a small SameDiff execution issue for switch operation where the predicate is a constant Linkarrow-up-right

  • Fixed an issue with batchnorm operation when input arrays have unusual strides Linkarrow-up-right

  • Merged nd4j-buffer, nd4j-content modules into nd4j-api Linkarrow-up-right

  • Deleted deprecated nd4j-jackson module (remaining functionality available in nd4j-api) Linkarrow-up-right

  • Deleted unused/unmaintained nd4j-camel and nd4j-gson modules Linkarrow-up-right

  • Optimization for legacy random ops Linkarrow-up-right

  • Optimization for broadcast operations Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right

  • Performance optimization for multiple operations: softmax, squeeze, expand_dims, tanh Linkarrow-up-right

  • Optimization for transpose/permute operations Linkarrow-up-right

  • Performance enhancement: MKLDNN matmul used for some mmul operation cases Linkarrow-up-right

  • Optimization for gather operation on CPU Linkarrow-up-right

  • Optimization for stack/unstack operations on CPU Linkarrow-up-right

  • Optimization for split operation (CPU and CUDA) Linkarrow-up-right Linkarrow-up-right

  • ND4J initialization no longer logs number of OpenMP BLAS threads for CUDA Linkarrow-up-right

  • Optimization: Fixed issues with auto-vectorization on multiple CPU operations Linkarrow-up-right

  • Optimization for reshape operation Linkarrow-up-right, Linkarrow-up-right

  • Fixed an issue where INDArray.hashCode() could cause an exception on some datatypes Linkarrow-up-right

  • Optimization for CPU: MKLDNN is now used for softmax, tanh, softmax_bp and tanh_bp operations Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right

  • Fixed random_exponential operation Linkarrow-up-right

  • Improved performance on C++ SameDiff graph execution via reduced array zeroing where safe to do so Linkarrow-up-right

  • Improved C++ indexing implementation impacting CPU performance on some operations Linkarrow-up-right

  • Fixed an issue where Split operation could have incorrect output shapes for empty arrays Linkarrow-up-right

  • Fixed some issues with SameDiff.equals method Linkarrow-up-right

  • Fixed an issue with reshape operation output shape on empty arrays Linkarrow-up-right, Linkarrow-up-right

  • Nd4j.gemm now uses Mmul operation internally to avoid potential threading issues with direct BLAS calls on CUDA Linkarrow-up-right

  • Fixed an edge case issue with percentile operation linkarrow-up-right

  • Fixed an edge case issue for cuSolver (CUDA) in libnd4j Linkarrow-up-right

  • Fixed an issue with error formatting for segment operations for incorrect lengths Linkarrow-up-right

  • Fixed an issue where ND4J workspaces were not guaranteed to be unique Linkarrow-up-right

  • Fixed some operation implementations when operating on views (Batch/Space to Space/Batch/Depth; batchnorm_bp) Linkarrow-up-right

  • Fixed an issue where exponential distribution random number generation operation could produce infinities extremely rarely (~1 in 10^9 values) Linkarrow-up-right

  • Fixed an issue with long file paths for memory mapped workspaces on Windows Linkarrow-up-right

  • Memory for memory-mapped workspaces is now deallocated immediately when the workspace is destroyed, instead of waiting for GC to free it Linkarrow-up-right

  • Fall-back to other BLAS implementation for cases where MKLDNN GEMM implementation is slow Linkarrow-up-right

  • Set nd4j-native source/target to Java 7 Linkarrow-up-right, Linkarrow-up-right

  • datavec-python: Python version upgraded from 3.7.6 to 3.7.7 Linkarrow-up-right

  • Fixed an issue with LineRecordReader where initialization was performed unnecessarily (adding performance overhead) Linkarrow-up-right


  • ND4j namespace operation methods: operations are available through the Nd4j.math, Nd4j.random, Nd4j.bitwise, Nd4j.nn (neural network), for example Nd4j.math.abs(INDArray), Nd4j.random.logNormal etc Linkarrow-up-right.

    • Note that additional ND4J namespaces API will have additions (new namespaces and methods), and may have some API changes, in the next release
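
  An illustrative sketch of the namespace-style API (this assumes an ND4J backend on the classpath; only abs is shown, but the other namespace methods listed above follow the same pattern):

  ```java
  import org.nd4j.linalg.api.ndarray.INDArray;
  import org.nd4j.linalg.factory.Nd4j;

  public class NamespaceSketch {
      public static void main(String[] args) {
          INDArray x = Nd4j.createFromArray(-1.0, 2.0, -3.0);

          // Elementwise absolute value via the math namespace
          INDArray abs = Nd4j.math.abs(x);   // values: 1.0, 2.0, 3.0
          System.out.println(abs);
      }
  }
  ```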

  • OpenMP replaced with a thread pool C++ parallelism framework; C++-level parallelism for operations is now enabled on platforms without OpenMP

  • Added Mish activation function Linkarrow-up-right, Linkarrow-up-right

  • BertIterator now has a BertIterator.featurizeSentences(List<String>) method for inference Linkarrow-up-right, Linkarrow-up-right

  • BertIterator now supports sentence pairs for supervised training Linkarrow-up-right

  • Added Sparse multi-class cross entropy for both Deeplearning4j and Keras import Linkarrow-up-right, Linkarrow-up-right

  • Deeplearning4j UI: migrated from Play to Vertx for web serving backend, also removing dependency on Scala libraries; no API changes, only artifact ID change - replace deeplearning4j-ui_2.1x with deeplearning4j-ui Linkarrow-up-right, Linkarrow-up-right
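    For Maven users, the migration is a one-line artifact change (the versions shown here are illustrative):

    ```xml
    <!-- Before: Play-based UI, Scala-suffixed artifact -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-ui_2.11</artifactId>
        <version>1.0.0-beta6</version>
    </dependency>

    <!-- After: Vertx-based UI, no Scala suffix -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-ui</artifactId>
        <version>1.0.0-beta7</version>
    </dependency>
    ```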

  • Added TimeDistributed wrapper layer Linkarrow-up-right

  • Fixed an issue with incorrect Deconvolution2d results for Keras import models Linkarrow-up-right

  • Added DNNL/MKLDNN support for batch normalization layer Linkarrow-up-right, Linkarrow-up-right

  • Fixed various integer casts to avoid overflows for very large arrays (with dimensions or length > Integer.MAX_VALUE) Linkarrow-up-right

  • Fixed an issue with UNet non-pretrained model architecture (last layer kernel size) Linkarrow-up-right

  • Deeplearning4j SameDiff layers now use DL4J workspaces for better performance and reduced memory consumption Linkarrow-up-right

  • Updated broken links in a few error messages Linkarrow-up-right

  • Cleaned up a few unused dependencies in various modules Linkarrow-up-right

  • Cleaned up duplicate SamplingDataSetIterator class Linkarrow-up-right

  • Fixed an issue where ComputationGraph instances with a single input going into multiple embedding layers could throw a NPE Linkarrow-up-right

  • Fixed an issue where loss function weights were not automatically cast to network datatype, resulting in an exception if not already correct type Linkarrow-up-right

  • Shaded Jackson version upgraded from 2.9.9/2.9.9.3 to 2.10.1 Linkarrow-up-right

  • Fixed an issue with KNN where getMostPopulatedClusters actually returned the least populated clusters Linkarrow-up-right

  • Added SameDiff.calculateGradientsAndOutputs method Linkarrow-up-right Linkarrow-up-right

  • Additional SameDiff single batch .output method overloads for DataSet/MultiDataSet added Linkarrow-up-right

  • TensorFlow import ops coverage enhanced (significant number of additional ops supported) Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right

  • PRelu op added Linkarrow-up-right

  • adjust_contrast, igamma and igammac ops added Linkarrow-up-right

  • ND4J/SameDiff: BitCast, CompareAndBitpack, DivideNoNan, DrawBoundingBoxes, FakeQuantWithMinMaxVarsPerChannel ops added Linkarrow-up-right

  • non_max_suppression_overlaps op added Linkarrow-up-right

  • ImagePreProcessingScaler now supports segmentation use cases Linkarrow-up-right

  • concat operation now supports the concatenation axis being specified via the last input array Linkarrow-up-right

  • Added Gamma and Poisson RNG distributions Linkarrow-up-right

  • SameDiff’s use of DeviceLocal for variables/constants etc is now configurable Linkarrow-up-right

  • Uniform distribution op now supports random integer generation, not just random floating point generation Linkarrow-up-right

  • SameDiff: Added simple OpBenchmarkListener for benchmarking purposes Linkarrow-up-right

  • Added the ability to disable platform helpers (DNNL/MKLDNN etc) via Nd4jCPU.Environment.getInstance().allowHelpers(false); and Nd4jCuda.Environment.getInstance().allowHelpers(false); Linkarrow-up-right

  • Added draw_bounding_boxes operation Linkarrow-up-right

  • Added resize_bicubic operation Linkarrow-up-right

  • Added causal padding mode to conv1d operation Linkarrow-up-right

  • DNNL (MKLDNN) is included and enabled by default for non-AVX builds Linkarrow-up-right

  • Added SameDiff ArraySavingListener for debugging purposes Linkarrow-up-right

  • Removed SameDiff.outputs() “best guess” output inference due to being unreliable, in favor of explicit SameDiff.setOutputs(String...) call Linkarrow-up-right

  • Fixed an issue with Nd4j.hstack on 1D arrays Linkarrow-up-right

  • SameDiff no longer allows empty arrays for variables Linkarrow-up-right

  • Fixed an issue with Nadam updater LR schedules not being cloned Linkarrow-up-right

  • Cleaned up IActivation interface Linkarrow-up-right

  • Added new LSTM op implementation with DNNL/MKLDNN support (forward pass only so far) Linkarrow-up-right

  • SameDiff API cleaned up; deprecated methods removed Linkarrow-up-right

  • Switched SameDiff variable initialization to non-lazy, to avoid unexpected behaviour when mixing execution and ND4J RNG seed setting Linkarrow-up-right

  • SameDiff.zero and .one methods now create constants, not variables Linkarrow-up-right

  • Moved CUDA build version and device logging to Java logging, from c++ stdout to enable disabling logging (via ND4J config or slf4j config) Linkarrow-up-right

  • Added DNNL/MKLDNN support for batch normalization Linkarrow-up-right

  • SameDiff: Fixed an issue where listeners weren’t being called for gradient calculation Linkarrow-up-right

  • Added DNNL/MKLDNN support for deconv2d/3d operations Linkarrow-up-right

  • Fixed an issue with biasadd_bp operation and NHWC data format Linkarrow-up-right

  • Fixed an issue with certain strided slice backprop configurations Linkarrow-up-right, Linkarrow-up-right

  • Fixed an issue with LogSumExp reduction operation backprop for along dimension case Linkarrow-up-right, Linkarrow-up-right

  • INDArray.toString() now has correct brackets for rank 1+ scalars to avoid ambiguity Linkarrow-up-right

  • Fixed an issue where some ND4J methods could fail when the library is compiled on Java 9+ but run on Java 8 Linkarrow-up-right

  • Fixed empty array input case for is_strictly_increasing, non_decreasing and non_max_suppression ops Linkarrow-up-right, Linkarrow-up-right

  • Fixed empty input arrays for legacy ops (transform, scalar, pairwise, broadcast) Linkarrow-up-right

  • CUDA compute capability 3.0 is supported again Linkarrow-up-right

  • Improved performance for Scatter operations (1D case) + index validation Linkarrow-up-right

  • Fixed an issue where SameDiff TrainingConfig serialization would fail if evaluation instances are set Linkarrow-up-right, Linkarrow-up-right

  • SameDiff execution will now throw an exception when assertion operations in the graph fail Linkarrow-up-right

  • PolyGamma function now returns NaNs when passed double for args requiring integer values Linkarrow-up-right

  • Fixed some issues for pad and mirror_pad ops to ensure they conform with Tensorflow for imported networks Linkarrow-up-right

  • Updated and fixed some issues for TensorFlow graph runner Linkarrow-up-right

  • Improved performance for Reverse operation Linkarrow-up-right

  • Removed/cleanup up unused ND4J list functionality Linkarrow-up-right

  • Fixed reduce bool operation results (such as any, all, IsInf, etc) for empty array inputs Linkarrow-up-right

  • Internal refactoring and various bug fixes Linkarrow-up-right

  • Tests/examples: See Linkarrow-up-right and Linkarrow-up-right

  • Added Scala 2.12 support, dropped Scala 2.10 support. Modules with Scala dependencies are now released with Scala 2.11 and 2.12 versions

  • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: Spark version suffix dropped: For upgrading: 1.0.0-beta4_spark2 -> 1.0.0-beta5

  • Added FastText support to deeplearning4j-nlp

  • CUDA support for all ND4J/SameDiff Operations

    • In 1.0.0-beta4, some operations were CPU only. Now, all operations have full CUDA support

  • Added support for new data types in ND4J (and DL4J/SameDiff): BFLOAT16, UINT16, UINT32, UINT64

  • ND4J: Implicit broadcasting support added to INDArray (already present in SameDiff - for example shape [3,1]+[3,2]=[3,2])
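
  A minimal sketch of the implicit broadcasting behaviour described above (assumes an ND4J backend on the classpath):

  ```java
  import org.nd4j.linalg.api.ndarray.INDArray;
  import org.nd4j.linalg.factory.Nd4j;

  public class BroadcastSketch {
      public static void main(String[] args) {
          INDArray a = Nd4j.ones(3, 1);  // shape [3,1]
          INDArray b = Nd4j.ones(3, 2);  // shape [3,2]

          // The [3,1] operand is broadcast along the second dimension,
          // so the result has shape [3,2] without an explicit broadcast call
          INDArray sum = a.add(b);
          System.out.println(java.util.Arrays.toString(sum.shape()));
      }
  }
  ```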

  • CUDA 9.2, 10.0 and 10.1-Update2 still supported

    • NOTE: For CUDA 10.1, CUDA 10.1 update 2 is recommended. CUDA 10.1 and 10.1 Update 1 will still run, but rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems

  • Dependency upgrades: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, and shaded to avoid dependency clashes)

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer

  • Added saved model format validation utilities - DL4JModelValidator, DL4JKerasModelValidator (Linkarrow-up-right)
  • Added LabelLastTimeStepPreProcessor (Linkarrow-up-right)

  • BertIterator: added option to prepend token to the output (such as [cls] expected by some models) (Linkarrow-up-right)

  • Added trace level logging to MultiLayerNetwork and ComputationGraph to assist with debugging certain issues (Linkarrow-up-right)

  • Upsampling3D: Added NDHWC support (Linkarrow-up-right)

  • MergeVertex now supports broadcasting (Linkarrow-up-right)

  • LSTM and Dropout will now fall back on built-in implementations if an exception is encountered from cuDNN (same as Subsampling/ConvolutionLayer) (Linkarrow-up-right)

  • Improved JavaDoc and cleanup up API for WordVectorSerializer (Linkarrow-up-right, Linkarrow-up-right)

  • Improved ComputationGraph builder InputType validation (Linkarrow-up-right)
  • Removed dl4j-spark-ml module until it can be properly maintained (Linkarrow-up-right)

  • Fixed an issue with BertWordPieceTokenizerFactory and bad character encoding (Linkarrow-up-right)

  • Fixed an issue with LearnedSelfAttentionLayer and variable minibatch size (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed issue with SharedTrainingMaster controller address when set from environment variable (Linkarrow-up-right)

  • Fixed issue with SameDiffOutputLayer initialization under some circumstances (Linkarrow-up-right)

  • https is now used by default for data and zoo model downloads (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue where UI WebJars dependencies would check for updates on every single build (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed issue where Upsampling layer memory report could produce an OOM exception (Linkarrow-up-right)

  • Improved UX/validation for RecordReaderDataSetIterator (Linkarrow-up-right)

  • Fixed an issue where EmbeddingSequenceLayer would not check mask array datatype (Linkarrow-up-right)

  • Improved validation when initializing networks with a non rank-2 (shape [1, numParams]) array (Linkarrow-up-right)

  • Fixed a DataType issue for BertIterator (Linkarrow-up-right)

  • Fixed Word2Vec model backward compatibility (beta3 and earlier models now loadable again) Linkarrow-up-right

  • Fixed issue where some Keras import models could fail with Could not read abnormally long HDF5 attribute (Linkarrow-up-right)

  • Added validation for RnnOutputLayer - feature/label array lengths (Linkarrow-up-right)

  • Fixed an issue where SameDiffOutputLayer would not support variable minibatch size (Linkarrow-up-right)

  • Fixed DL4J SameDiff layer mask support (Linkarrow-up-right)

  • DL4J UI: Fixed an issue where tab switching did not work when visualizing saved/stored data (Linkarrow-up-right, Linkarrow-up-right)

  • DL4J UI: Fixed a rare UI threading issue (Linkarrow-up-right)

  • Fixed a Keras import issue with JSON format change (Linkarrow-up-right)

  • Fixed a Keras import issue where updater learning rate schedule could be imported incorrectly (Linkarrow-up-right)

  • Fixed an issue with CnnSentenceDataSetIterator when using UnknownWordHandling.UseUnknownVector (Linkarrow-up-right, Linkarrow-up-right)

  • Fixes and optimizations to DL4J SameDiff layers (Linkarrow-up-right)

  • MultiLayerNetwork/ComputationGraph will now log the original exception if a second exception occurs during workspace closing, instead of swallowing it (inference/fit operation try/finally blocks) (Linkarrow-up-right)

  • Upgraded dependencies: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, shaded to avoid dependency clashes) (Linkarrow-up-right)

  • Logging framework can now be configured for DL4J UI (due to Play framework dependency upgrade) (Linkarrow-up-right)

  • Reduced amount of garbage produced by MnistDataFetcher (impacts MNIST and EMNIST DataSetIterators) (Linkarrow-up-right)

  • Activation function backpropagation has been optimized for many activation functions (Linkarrow-up-right, Linkarrow-up-right)

  • Scala 2.10 dropped, Scala 2.12 added (for modules with Scala dependencies)

  • Added support for empty arrays with zeros in shape, for compatibility with TensorFlow import (Linkarrow-up-right)

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer

  • Improved SameDiff training API - added "in line" test set evaluation, returning History object with loss curve, etc (Linkarrow-up-right)

  • Added saved model format validation utilities - Nd4jValidator, Nd4jCommonValidator (Linkarrow-up-right)

  • Added SameDiff ScoreListener (equivalent to DL4J ScoreIterationListener/PerformanceListener) (Linkarrow-up-right, Linkarrow-up-right)

  • Added SameDiff.convertDataTypes method, for variable dtype conversion (Linkarrow-up-right)

  • Added crop and resize op (Linkarrow-up-right)

  • DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J Linkarrow-up-right

  • Added basic/MVP SameDiff UI listener (Linkarrow-up-right)

  • Added SameDiff CheckpointListener (Linkarrow-up-right, Linkarrow-up-right)

  • Added SameDiff name scopes (Linkarrow-up-right)
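
  A brief sketch of name scopes (this assumes the scope object is auto-closeable, as in the SameDiff API of this era; the variable names are illustrative):

  ```java
  import org.nd4j.autodiff.samediff.SDVariable;
  import org.nd4j.autodiff.samediff.SameDiff;

  public class NameScopeSketch {
      public static void main(String[] args) throws Exception {
          SameDiff sd = SameDiff.create();

          // Variables created inside the scope get a "layer1/" name prefix
          try (AutoCloseable scope = sd.withNameScope("layer1")) {
              SDVariable w = sd.var("w", 3, 3);  // full name: "layer1/w"
          }
      }
  }
  ```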

  • SameDiff: Updater state and training configuration are now written to FlatBuffers format (Linkarrow-up-right)

  • Added c++ benchmark suite callable from Java - call using Nd4j.getExecutioner().runLightBenchmarkSuit() and Nd4j.getExecutioner().runFullBenchmarkSuit() (Linkarrow-up-right)

  • Added SameDiff.save/load methods with InputStream/OutputStream arguments (Linkarrow-up-right, Linkarrow-up-right)

  • Added axis configuration for evaluation instances (Evaluation, RegressionEvaluation, ROC, etc - getAxis and setAxis methods) to allow different data formats (NCHW vs. NHWC for CNNs, for example) (Linkarrow-up-right)

  • SameDiff: Added support to convert variables to constants, via the SDVariable.convertToConstant() method (Linkarrow-up-right)

  • SameDiff: Added GradCheckUtil.checkActivationGradients method to check activation gradients for SameDiff instance (not just parameter gradients as in existing gradient check methods) (Linkarrow-up-right)

  • Added CheckNumerics op (Linkarrow-up-right)

  • Added FakeQuantWithMinMaxArgs and FakeQuantWithMinMaxVars ops (Linkarrow-up-right)

  • Added INDArray reduction methods with "keep dimensions" option - for example, INDArray.mean(boolean, int... dimension) (Linkarrow-up-right)
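
  The keep-dimensions flag controls whether the reduced dimension is retained as size 1 - a sketch, assuming an ND4J backend on the classpath:

  ```java
  import org.nd4j.linalg.api.ndarray.INDArray;
  import org.nd4j.linalg.factory.Nd4j;

  public class KeepDimsSketch {
      public static void main(String[] args) {
          INDArray x = Nd4j.rand(3, 4);

          INDArray dropped = x.mean(false, 1); // shape [3]: dimension 1 removed
          INDArray kept    = x.mean(true, 1);  // shape [3,1]: dimension 1 kept as size 1

          // The kept-dimensions form broadcasts directly against the original array
          INDArray centered = x.sub(kept);
      }
  }
  ```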

  • Added Nd4j SystemInfo class - SystemInfo.getSystemInfo, .writeSystemInfo(File) to aid with debugging issues (Linkarrow-up-right, Linkarrow-up-right)

  • Added INDArray.toString(NDArrayStrings options), toStringFull() and toString overloads for easier control of array printing (Linkarrow-up-right)

  • Added HashCode op, INDArray.hashCode() (Linkarrow-up-right)

  • SameDiff: added whileLoop, ifCond methods for loops/conditional ops (Linkarrow-up-right)

  • Cleaned up some infrequently used Nd4j methods (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • Added bitwise integer operations: left/right bit shift, left/right cyclical bit shift, bitwise Hamming distance (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • deeplearning4j-nlp: renamed AggregatingSentencePreProcessor to sentencePreProcessor method (Linkarrow-up-right)

  • Upgraded (and shaded) Protobuf version - 3.5.1 to 3.8.0 (Linkarrow-up-right)

  • Switched to C-style error handling for libnd4j native operations (Linkarrow-up-right)

  • Renamed FlatBuffers enum org.nd4j.graph.DataType to org.nd4j.graph.DType to avoid users importing incorrect type when using Nd4j methods (Linkarrow-up-right, Linkarrow-up-right)

  • Added SameDiff.bitwise namespace for bitwise ops (Linkarrow-up-right, Linkarrow-up-right)

  • SameDiff: Fixed an issue with SDVariable.getArr for scalars (Linkarrow-up-right)
  • Added delayed mode to DeviceLocalNDArray (don't replicate to device until needed) (Linkarrow-up-right)

  • ND4J: Fixed an issue with writing 0d (scalar) NDArrays in numpy .npy format (Linkarrow-up-right)

  • Fixed an issue with Pad operation for some constant cases (Linkarrow-up-right)

  • Fixed some issues with strided_slice operation (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • SameDiff: Fixed issue with DataType inference for some ops using ND4J default datatype (Linkarrow-up-right)

  • INDArray.castTo(DataType) is now a no-op when array is already the correct type (Linkarrow-up-right)

  • SameDiff: Fixed an issue with training mixed precision networks (Linkarrow-up-right)

  • Fixed an issue where Evaluation class was incorrectly reporting macro-averaged precision for binary case (Linkarrow-up-right)

  • Removed trainableParams config/field from SameDiff TrainingConfig (no longer required) (Linkarrow-up-right)

  • Improvements and cleanup to ND4J Javadoc (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue with Cholesky Lapack op on CUDA (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue where [1,N] and [N,1] arrays were not considered a matrix (rank 2 array) according to INDArray.isMatrix() (Linkarrow-up-right)

  • Fixed RegressionEvaluation for 4D arrays (CNNs / segmentation) (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed issue with INDArray.median(int... dimension) (Linkarrow-up-right)

  • Fixed NPE that could occur when executing gather operation backprop (Linkarrow-up-right)

  • Fixed issue with LogSumExp operation Java/C++ mapping (Linkarrow-up-right)

  • Added header validation when reading Numpy .npy files, to ensure file is valid (Linkarrow-up-right)

  • Fixed a possible issue with reading Numpy .npy files on CUDA (Linkarrow-up-right)

  • Fixed an issue when reading Numpy .npy boolean files (Linkarrow-up-right)

  • Various fixes for TensorFlow import (Linkarrow-up-right)

  • Fixed an issue with a small number of Nd4j.create methods not creating arrays corresponding to the java primitive (Linkarrow-up-right)

  • Improved shape validation for some Nd4j.create methods (Linkarrow-up-right)

  • Cleaned up unmaintained Nd4j.createSparse methods (Linkarrow-up-right)

  • Fixed a CUDA issue for CUDA GPUs with CC 3.0 (Linkarrow-up-right)

  • Fixed some possible integer overflows in c++ code (Linkarrow-up-right)

  • Removed deprecated methods: Nd4j.trueScalar and Nd4j.trueVector (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue where some JVMs could warn about "Illegal reflective access" due to a (now removed) SameDiff dependency (Linkarrow-up-right)

  • SDVariable now no longer extends DifferentialFunction (Linkarrow-up-right)

  • Moved numerous operation calculateOutputShape instances from Java to C++ (Linkarrow-up-right)

  • Fixed an issue where maxpool2d_bp could throw an exception when NaN values are present (Linkarrow-up-right)

  • Fixed an issue with concatenation of empty shapes (with zeros) (Linkarrow-up-right)

  • Removed INDArray.javaTensorAlongDimension (Linkarrow-up-right)

  • LayerNorm operation now properly supports axis arg, NCHW format data (Linkarrow-up-right)

  • libnd4j: cuBLAS hgemm (FP16 gemm) will only be called for devices with compute capability >= 5.3 due to cuBLAS limitations (Linkarrow-up-right)

  • Nd4j.readNumpy optimized (Linkarrow-up-right)

  • Added configurable alpha parameter to ELU and lrelu_bp operations in c++ (Linkarrow-up-right)

  • Cleaned up SameDiff SDCNN/SDRNN (SameDiff.cnn, .rnn) API/methods (Linkarrow-up-right, Linkarrow-up-right)

  • INDArray.lengthLong() removed; use INDArray.length() instead
  • INT: signed integer, 32 bit (4 byte)

  • SHORT: signed short integer, 16 bit (2 byte)

  • UBYTE: unsigned byte, 8 bit (1 byte), 0 to 255

  • BYTE: signed byte, 8 bit (1 byte), -128 to 127

  • BOOL: boolean type, (0/1, true/false). Uses ubyte storage for easier op parallelization

  • UTF8: String array type, UTF8 format

  • Some operations require matched datatypes for operands

    • For example, if x and y are different datatypes, a cast may be required: x.add(y.castTo(x.dataType()))

  • Some operations have datatype restrictions: for example, sum on a UTF8 array is not supported, nor is variance on a BOOL array. For some operations on boolean arrays (such as sum), casting to an integer or floating point type first may make sense.
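The cast described above can be sketched as follows. This is a minimal illustration assuming the beta4-era ND4J API (`Nd4j.ones(DataType, long...)` and `INDArray.castTo` as documented in these notes); the variable names are illustrative only.

```java
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class CastExample {
    public static void main(String[] args) {
        // x is FLOAT, y is INT: operands with mismatched datatypes may require a cast
        INDArray x = Nd4j.ones(DataType.FLOAT, 2, 2);
        INDArray y = Nd4j.ones(DataType.INT, 2, 2);

        // Cast y to x's datatype before combining them
        INDArray sum = x.add(y.castTo(x.dataType()));
        System.out.println(sum.dataType());
    }
}
```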

  • LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)
  • Convolution3D layer (and Conv3D/Conv3DDerivative ND4J ops)

  • The parameter/activation datatypes for new models can be set for new networks using the dataType(DataType) method on NeuralNetConfiguration.Builder (Linkarrow-up-right)

  • MultiLayerNetwork/ComputationGraph can be converted between (floating point) datatypes FP16/32/64 for the parameters and activations using the MultiLayerNetwork/ComputationGraph.convertDataType(DataType) methods (Linkarrow-up-right, Linkarrow-up-right)
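The two datatype mechanisms above can be sketched together: setting the datatype at configuration time, and converting an already-built network. This is a minimal sketch assuming the builder and layer classes shown (a single trivial output layer chosen only for brevity); it is not a complete training setup.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class NetworkDataTypeExample {
    public static void main(String[] args) {
        // Set parameter/activation datatype for a new network at configuration time
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .dataType(DataType.DOUBLE)
                .list()
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .activation(Activation.IDENTITY)
                        .nIn(4).nOut(2).build())
                .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();

        // Convert an existing network's parameters/activations to FP16
        MultiLayerNetwork fp16Net = net.convertDataType(DataType.HALF);
    }
}
```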

  • EmbeddingLayer and EmbeddingSequenceLayer builders now have .weightInit(INDArray) and .weightInit(Word2Vec) methods for initializing parameters from pretrained word vectors (Linkarrow-up-right)

  • PerformanceListener can now be configured to report garbage collection information (number/duration) Linkarrow-up-right

  • Evaluation class will now check for NaNs in the predicted output and throw an exception instead of treating argMax(NaNs) as having value 0 (Linkarrow-up-right)

  • Added ModelAdapter for ParallelInference for convenience and for use cases such as YOLO (allows improved performance by avoiding detached (out-of-workspace) arrays) (Link)

  • Added GELU Activation function (Linkarrow-up-right)

  • Added BertIterator (a MultiDataSetIterator for BERT training - supervised and unsupervised) Linkarrow-up-right

  • Added validation to MultiLayerNetwork/ComputationGraph that throws an exception when attempting to perform Regression evaluation on a classifier, or vice-versa (Linkarrow-up-right, Linkarrow-up-right)

  • Added ComputationGraph.output(List<String> layers, boolean train, INDArray[] features, INDArray[] featureMasks) method to get the activations for a specific set of layers/vertices only (without redundant calculations) (Linkarrow-up-right)

  • Weight initialization for networks is now implemented as classes (not just enumerations) and hence is now extensible via IWeightInit interface (Linkarrow-up-right); i.e., custom weight initializations are now supported (Linkarrow-up-right, Linkarrow-up-right)

  • Added Capsule Network layers (no GPU acceleration until next release) - CapsuleLayerarrow-up-right, CapsuleStrengthLayerarrow-up-right and PrimaryCapsulesarrow-up-right (Linkarrow-up-right)

  • Added Cifar10DataSetIterator to replace CifarDataSetIterator (Linkarrow-up-right, Linkarrow-up-right)

  • Keras import: Importing models from InputStream is now supported (Linkarrow-up-right, Linkarrow-up-right)

  • Layer/NeuralNetConfiguration builders now have getter/setter methods also, for better Kotlin support (Linkarrow-up-right)

  • Most JavaScript dependencies and fonts for UI have been migrated to WebJars (Linkarrow-up-right)

  • CheckpointListener now has static availableCheckpoints(File), loadCheckpointMLN(File, int) and loadLastCheckpointMLN(File) etc methods (Linkarrow-up-right)

  • MultiLayerNetwork/ComputationGraph now validate and throw an exception in certain incompatible RNN configurations, like truncated backpropagation through time combined with LastTimeStepLayer/Vertex (Linkarrow-up-right)

  • Added BERT WordPiece tokenizers (Linkarrow-up-right)

  • Deeplearning4j UI now has multi-user/multi-session support - use UIServer.getInstance(boolean multiSession, Function<String,StatsStorage>) to start UI in multi-session mode (Linkarrow-up-right)

  • Layer/NeuralNetworkConfiguration builder method validation standardized and improved (Linkarrow-up-right)

  • WordVectorSerializer now supports reading and exporting text format vectors via WordVectorSerializer.writeLookupTable and readLookupTable (Linkarrow-up-right)

  • Updated to JavaCPP, JavaCPP presets, and JavaCV version 1.5 (Linkarrow-up-right)

  • Added EvaluationBinary false alarm rate calculation (Linkarrow-up-right)

  • ComputationGraph GraphBuilder now has an appendLayer method that can be used to add layers connected to the last added layer/vertex (Linkarrow-up-right)

  • Added Wasserstein loss function (Linkarrow-up-right)

  • Keras import: Improved errors/exceptions for lambda layer import (Linkarrow-up-right)

  • Apache Lucene/Solr upgraded from 7.5.0 to 7.7.1 (Linkarrow-up-right)

  • KMeans clustering strategy is now configurable (Linkarrow-up-right)

  • Fixed a bug where dropout instances were incorrectly shared between layers when using transfer learning with dropout (Linkarrow-up-right, Linkarrow-up-right)
  • Fixed issue where tensorAlongDimension could result in an incorrect array order for edge cases and hence exceptions in LSTMs (Linkarrow-up-right)

  • Fixed an edge case issue with ComputationGraph.getParam(String) where the layer name contains underscores (Linkarrow-up-right)

  • Fixed an edge case with ParallelInference on CUDA where (very rarely) input array operations (such as normalization) may not be fully completed before transferring an array between threads (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an edge case with KFoldIterator when the total number of examples is not a multiple of the batch size (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue where DL4J UI could throw a NoClassDefFoundError on Java 9/10/11 (Linkarrow-up-right, Linkarrow-up-right)

  • Keras import: added aliases for weight initialization (Linkarrow-up-right)

  • Fixed issue where dropout instances would not be correctly cloned when network configuration was cloned (Linkarrow-up-right)

  • Fixed workspace issue with ElementwiseVertex with single input (Linkarrow-up-right)

  • Fixed issue with UI where detaching StatsStorage could attempt to remove storage twice, resulting in an exception (Linkarrow-up-right)

  • Fixed issue where LossMultiLabel would generate NaNs when all labels in minibatch are the same class. Now 0 gradient is returned instead. (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue where DepthwiseConv2D weight could be wrong shape on restoring network from saved format (Linkarrow-up-right)

  • Fixed issue where BaseDatasetIterator.next() would not apply preprocessors, if one was set (Linkarrow-up-right)

  • Improved default configuration for CenterLossOutputLayer (Linkarrow-up-right)

  • Fixed an issue for UNet non-pretrained configuration (Linkarrow-up-right)

  • Fixed an issue where Word2Vec VocabConstructor could deadlock under some circumstances (Linkarrow-up-right)

  • SkipGram and CBOW (used in Word2Vec) were made native operations for better performance (Linkarrow-up-right)

  • Fixed an issue where references to detached StatsListener instances would be maintained, potentially leading to memory issues when using InMemoryStatsListener (Linkarrow-up-right)

  • Optimization: Workspaces were added to SequenceVectors and Word2Vec (Linkarrow-up-right)

  • Improved validation for RecordReaderDataSetIterator (Linkarrow-up-right)

  • Improved handling of unknown words in WordVectors implementation (Linkarrow-up-right)

  • Yolo2OutputLayer: Added validation for incorrect labels shape. (Linkarrow-up-right)

  • LastTimeStepLayer will now throw an exception when the input mask is all 0s (no data - no last time step) (Linkarrow-up-right)

  • Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate method could lead to invalid updater state in some rare cases (Linkarrow-up-right)

  • Fixed an issue where Conv1D layer would calculate output length incorrectly in MultiLayerNetwork.summary() (Linkarrow-up-right)

  • Async iterators are now used in EarlyStoppingTrainer to improve data loading performance (Linkarrow-up-right)

  • EmbeddingLayer and EmbeddingSequenceLayer performance has been improved on CUDA (Linkarrow-up-right)

  • Removed outdated/legacy scala tools repository (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed issues in L2NormalizeVertex equals/hashcode methods (Linkarrow-up-right)

  • Fixed Workspace issue in ConvolutionalListener (Linkarrow-up-right)

  • Fixed EvaluationBinary falsePositiveRate calculation (Linkarrow-up-right)

  • Added validation and useful exception for MultiLayerNetwork.output(DataSetIterator) methods (Linkarrow-up-right)

  • Fixed minor issue where ComputationGraph.summary() would throw a NullPointerException if init() had not already been called (Linkarrow-up-right)

  • Fixed a ComputationGraph issue where an input into a single layer/vertex repeated multiple times could fail during training (Linkarrow-up-right)

  • Improved performance for KMeans implementation (Linkarrow-up-right)

  • Fixed an issue with rnnGetPreviousState for RNNs in 'wrapper' layers such as FrozenLayer (Linkarrow-up-right)

  • Keras import: Fixed an issue with order of words when importing some Keras tokenizers (Linkarrow-up-right)

  • Keras import: fixed issue with possible UnsupportedOperationException in KerasTokenizer class (Linkarrow-up-right)

  • Keras import: fixed an import issue with models combining embeddings, reshape and convolution layers (Linkarrow-up-right)

  • Keras import: fixed an import issue with input type inference for some RNN models (Linkarrow-up-right)

  • Fixed some padding issues in LocallyConnected1D/2D layers (Linkarrow-up-right)

  • Added Nd4j.createFromNpzFile method to load Numpy npz files (Linkarrow-up-right)

  • Added support for importing BERT models into SameDiff (Linkarrow-up-right, Linkarrow-up-right)

  • Added SameDiff GraphTransformUtil for performing transfer learning and other graph modifications (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • Evaluation, RegressionEvaluation etc now support 4d (CNN segmentation) data formats; also added Evaluation.setAxis(int) method to support other data formats such as channels-last/NHWC for CNNs and NWC for CNN1D/RNNs. Defaults to axis 1 (which matches DL4J CNN and RNN data formats) (Linkarrow-up-right, Linkarrow-up-right)

  • Added a basic ("technology preview") SameDiff UI. Should be considered early WIP with breaking API changes expected in future releases. Supports plotting of SameDiff graphs as well as various metrics (line charts, histograms, etc)

    • Currently embedded in the DL4J UI - call UIServer.getInstance() then go to localhost:9000/samediff to access it.

    • For more details, see 1arrow-up-right, 2arrow-up-right, 3arrow-up-right

  • Added DotProductAttention and MultiHeadDotProductAttention operations (Linkarrow-up-right)

  • Added Nd4j.exec(Op) and Nd4j.exec(CustomOp) convenience methods (Linkarrow-up-right)

  • ND4J/SameDiff - new operations added:

    • NonMaxSuppressionarrow-up-right, LogMatrixDeterminantarrow-up-right, NthElementarrow-up-right, TruncateModarrow-up-right

    • Cholesky Decompositionarrow-up-right, Image resize nearest neighborarrow-up-right, crop_and_resizearrow-up-right

  • SameDiff TensorFlow Import

    • Import of TF Assertions added (Linkarrow-up-right)

    • Support/fixes for control dependencies (Linkarrow-up-right)

    • Support/fixes for TensorArray and related ops

  • nd4j-common - tar/tar.gz support added; Zip file listing and single file extraction added (Linkarrow-up-right, Linkarrow-up-right)

  • SameDiff: reductions operations now support "dynamic" (non-constant) inputs for axis argument (Linkarrow-up-right)

  • ROCBinary now has .getROC(int outputNum) method (Linkarrow-up-right)

  • SameDiff: L1/L2 regularization added (Linkarrow-up-right, Linkarrow-up-right)

  • SameDiff: Added SDVariable.convertToVariable() and convertToConstant() - to change SDVariable type (Linkarrow-up-right)

  • Added checks and useful exceptions for reductions on empty arrays (Linkarrow-up-right)

  • SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Linkarrow-up-right)
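The namespace-based op creators described above can be sketched as follows. A minimal example assuming the beta4-era SameDiff API (`SameDiff.create`, `placeHolder`, and the `math()`/`nn()` namespaces named in this entry); the graph itself is illustrative.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class NamespaceExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 4);

        // Op creators are accessed via namespaces instead of directly on SameDiff:
        SDVariable t = sd.math().tanh(in);       // was sd.tanh(in)
        SDVariable out = sd.nn().softmax(t);     // was sd.softmax(t)
    }
}
```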

  • SameDiff TensorFlow import: import can now be overridden for cases such as user-defined functions (Linkarrow-up-right, Linkarrow-up-right)

  • Libnd4j (c++) benchmarking framework added (Linkarrow-up-right)

  • Added OpExecutioner.inspectArray(INDArray) method to get summary statistics for analysis/debugging purposes (Linkarrow-up-right)

  • Added INDArray.reshape(char order, boolean enforceView, long... newShape) to reshape array whilst throwing an exception (instead of returning a copy) if the reshape cannot be performed (Linkarrow-up-right, Linkarrow-up-right)
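The `enforceView` flag above can be sketched briefly. A minimal illustration assuming the signature given in this entry; `Nd4j.linspace` is used only to build a small example array.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class ReshapeViewExample {
    public static void main(String[] args) {
        INDArray arr = Nd4j.linspace(1, 12, 12).reshape(3, 4);

        // enforceView = true: if the reshape cannot be performed as a view,
        // an exception is thrown instead of silently returning a copy
        INDArray view = arr.reshape('c', true, 4, 3);
        System.out.println(java.util.Arrays.toString(view.shape()));
    }
}
```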

  • Added SDVariable method overloads (plus, minus, times, etc) for Kotlin (Linkarrow-up-right)

  • Added SDVariable convenience methods for dot, reshape, permute (Linkarrow-up-right)

  • Added SameDiff SDIndex.point(long, boolean keepDim) method (to keep point indices in output array as size 1 axis) (Linkarrow-up-right)

  • Added SameDiff ProtoBufToFlatBufConversion command line tool for doing TensorFlow frozen model (protobuf) to SameDiff FlatBuffers conversion (Linkarrow-up-right)

  • Improved DataType validation for SameDiff operations (Linkarrow-up-right)

  • Removed old Java loop-based BooleanIndexing methods. Equivalent native ops should be used instead. (Linkarrow-up-right)
  • Removed Nd4j.ENFORCE_NUMERICAL_STABILITY, Nd4j.copyOnOps, etc (Linkarrow-up-right)

  • SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Linkarrow-up-right)

  • Nd4j.emptyLike(INDArray) has been removed. Use Nd4j.like(INDArray) instead (Linkarrow-up-right)

  • org.nd4j.util.StringUtils removed; suggest using Apache commons lang3 StringUtils instead (Linkarrow-up-right)

  • ND4J Jackson RowVector(De)Serializer has been deprecated due to datatype changes; NDArrayText(De)Serializer should be used instead (Linkarrow-up-right, Linkarrow-up-right)

  • nd4j-instrumentation module has been removed due to lack of use/maintenance (Linkarrow-up-right)

  • SameDiff: Numerous fixes and enhancements
    • 1arrow-up-right, 2arrow-up-right, 3arrow-up-right, 4arrow-up-right

    • Improved functionality for losses (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

    • Improved errors for missing/misspelled placeholders (Linkarrow-up-right)

    • Fixed edge cases in loops

  • Fixed issue with Nd4j.vstack on 1d arrays returning 1d output, not 2d stacked output (Linkarrow-up-right)

  • Conv2D op can infer kernel size from input arrays directly when required (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue with Numpy format export - Nd4j.toNpyByteArray(INDArray) (Linkarrow-up-right)

  • Fixes for SameDiff when it is used within an external workspace (Linkarrow-up-right)

  • Fixed an issue where empty NDArrays would be reported as having scalar shape information, length 1 (Linkarrow-up-right)

  • Optimization: libnd4j (c++) indexing for ops will use uint for faster offset calculations when required and possible (Linkarrow-up-right)

  • Optimization: libnd4j loops performance improved for faster execution of some operations (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • Local response normalization op optimized (Linkarrow-up-right, Linkarrow-up-right)

  • Fixed an issue with INDArray.repeat on some view arrays (Linkarrow-up-right)

  • Improved performance for execution of some operations on view arrays (Linkarrow-up-right)

  • Improved performance on broadcast operations (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • Improved performance for non-EWS reduction along dimension operations (Linkarrow-up-right)

  • Improved performance for IndexReduce operations (Linkarrow-up-right) and small reductions (Linkarrow-up-right)

  • Improved performance of one_hot operation (Linkarrow-up-right), tanh operation (Linkarrow-up-right)

  • Improved performance for transform operations (Linkarrow-up-right)

  • Optimization: empty arrays are created only once and cached (as they are immutable) (Linkarrow-up-right)

  • Improved performance on operations using tensor along dimension for parallelization (Linkarrow-up-right, Linkarrow-up-right)

  • Improved performance on "reduce 3" reduction operations (Linkarrow-up-right)

  • Improved handling of CUDA contexts in heavily multi-threaded environments (Linkarrow-up-right)

  • Fixed an issue where Evaluation.reset() would incorrectly clear the String class labels (Linkarrow-up-right)

  • SameDiff: Improved gradient calculation performance/efficiency; "gradients" are now no longer defined for non-floating-point variables, and variables that aren't required to calculate loss or parameter gradients (Linkarrow-up-right)

  • Behaviour of IEvaluation instances now no longer depends on the global (default) datatype setting (Linkarrow-up-right)

  • INDArray.get(point(x), y) or .get(y, point(x)) now returns rank 1 arrays when performed on rank 2 arrays (Linkarrow-up-right)
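The rank change above can be sketched as follows. A minimal illustration assuming the indexing API named in this entry (`NDArrayIndex.point`/`all`); `Nd4j.rand` is used only to build a sample matrix.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import static org.nd4j.linalg.indexing.NDArrayIndex.all;
import static org.nd4j.linalg.indexing.NDArrayIndex.point;

public class PointIndexExample {
    public static void main(String[] args) {
        INDArray m = Nd4j.rand(3, 4);            // rank 2 matrix

        // Selecting a single row with a point index now yields a rank 1 array
        INDArray row = m.get(point(1), all());
        System.out.println(row.rank());
    }
}
```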

  • Removed reliance on Guava for SameDiff, fixing potential issue for Java 11/12 and when earlier versions of Guava are on the classpath (Linkarrow-up-right, Linkarrow-up-right)

  • ND4J indexing (INDArray.get) implementation rewritten for better performance and reliability (Linkarrow-up-right)

  • Fixes for local response normalization backprop op (Linkarrow-up-right)

  • Added StreamInputSplit for creating local data pipelines where data is stored remotely on storage such as HDFS or S3 (Linkarrow-up-right, Linkarrow-up-right)
  • LineRecordReader (and subtypes) now have the option to define the character set (Linkarrow-up-right)

  • Added TokenizerBagOfWordsTermSequenceIndexTransform (TFIDF transform), GazeteerTransform (binary vector for word present) and MultiNlpTransform transforms; added BagOfWordsTransform interface (Linkarrow-up-right)

  • ParallelInference: Instances can now update the model in real-time (without re-init) Linkarrow-up-right

  • ParallelInference: Added ParallelInference INPLACE mode Linkarrow-up-right

  • Added validation for incompatible loss/activation function combinations (such as softmax+nOut=1, or sigmoid+mcxent). New validation can be disabled using outputValidation(false) Linkarrow-up-right

  • Spark training: Added full fault tolerance (robust failure recovery) for gradient sharing implementation Linkarrow-up-right Linkarrow-up-right

  • Spark training now supports configuring ports more flexibly (and differently for different workers) using PortSupplier Linkarrow-up-right Linkarrow-up-right

  • Spark training: overhauled gradient sharing threshold adaption algorithms; made it possible to customize threshold settings, plus made defaults more robust to initial threshold configuration improving convergence speed in some cases. Linkarrow-up-right

  • Spark training: implemented chunked messaging to reduce memory requirements (and insufficient buffer length issues) for large messages Linkarrow-up-right

  • Spark training: Added MeshBuildMode configuration for improved scalability for large clusters Linkarrow-up-right

  • Spark network data pipelines: added FileBatch, FileBatchRecordReader etc for "small files" (images etc) distributed training use cases Linkarrow-up-right

  • Added FailureTestingListener for fault tolerance/debugging purposes Linkarrow-up-right

  • Upgraded Apache Lucene/Solr to version 7.5.0 (from 7.4.0) Linkarrow-up-right

  • Added system properties (org.deeplearning4j.tempdir and org.nd4j.tempdir) to allow overriding of the temporary directories ND4J and DL4J use Linkarrow-up-right Linkarrow-up-right

  • Made MultiLayerNetwork/ComputationGraph.clearLayerStates methods public (was protected) Linkarrow-up-right

  • AbstractLayer.layerConf() method is now public Linkarrow-up-right

  • ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper Linkarrow-up-right

  • Improved validation and error messages for invalid inputs/labels in Yolo2OutputLayer Linkarrow-up-right

  • Spark training: added SharedTrainingMaster.Builder.workerTogglePeriodicGC and .workerPeriodicGCFrequency to easily configure the ND4J garbage collection configuration on workers. Set default GC to 5 seconds on workers Linkarrow-up-right

  • Spark training: added threshold encoding debug mode (logs current threshold and encoding statistics on each worker during training). Enable using SharedTrainingConfiguration.builder.encodingDebugMode(true). Note this operation has computational overhead. Linkarrow-up-right

  • Fixed an issue where EarlyStoppingScoreCalculator would not correctly handle "maximize score" cases instead of minimizing Linkarrow-up-right

  • Fixed order (BGR vs. RGB) for VGG16ImagePreProcessor channel offset values Linkarrow-up-right

  • Fixed bug with variational autoencoders using weight noise Linkarrow-up-right

  • Fixed issue with BaseDataSetIterator not respecting the 'maximum examples' configuration Linkarrow-up-right

  • Optimization: A workspace is now used for ComputationGraph/MultiLayerNetwork evaluation methods (avoids allocating off-heap memory during evaluation that must be cleaned up by garbage collector) Linkarrow-up-right

  • Fixed an issue where shuffling combined with a subset for MnistDataSetIterator would not maintain the same subset between resets Linkarrow-up-right

  • Fixed issue with StackVertex.getOutputType Linkarrow-up-right

  • Fix issue with CNN to/from RNN preprocessors handling of mask arrays Linkarrow-up-right

  • Fixed issue with VGG16 non-pretrained configuration in model zoo Linkarrow-up-right

  • Fixed issue with TransferLearning nOutReplace where multiple layers in a row are modified Linkarrow-up-right

  • Fixed issue with CuDNN workspaces where backpropagation is performed outside of a standard fit call Linkarrow-up-right

  • Fixed an issue with dropout masks being cleared prematurely on output layers in ComputationGraph Linkarrow-up-right

  • RecordReaderMultiDataSetIterator now supports 5D arrays (for 3D CNNs) Linkarrow-up-right

  • Fixed bug in multi input/output ComputationGraphs with TBPTT combined with both masking and different number of input/output arrays Linkarrow-up-right

  • Improved input validation/exceptions for batch normalization layer Linkarrow-up-right

  • Fixed bug with TransferLearning GraphBuilder nOutReplace when combined with subsampling layers Linkarrow-up-right

  • SimpleRnnParamInitializer now properly respects bias initialization configuration Linkarrow-up-right

  • Fixed SqueezeNet zoo model non-pretrained configuration Linkarrow-up-right

  • Fixed Xception zoo model non-pretrained configuration Linkarrow-up-right

  • Fixed an issue with some evaluation signatures for multi-output ComputationGraphs Linkarrow-up-right

  • Improved MultiLayerNetwork/ComputationGraph summary method formatting for large nets Linkarrow-up-right

  • Fixed an issue where gradient normalization could result in NaNs if gradient is exactly 0.0 for all parameters in a layer Linkarrow-up-right

  • Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate could throw an exception for SGD and NoOp updaters Linkarrow-up-right

  • Fixed an issue with StackVertex plus masking in some rare cases Linkarrow-up-right

  • Fixed an issue with JSON deserialization of frozen layers in pre-1.0.0-alpha format Linkarrow-up-right

  • Fixed an issue where GraphBuilder.removeVertex can fail under some limited circumstances Linkarrow-up-right

  • Fixed a bug in CacheableExtractableDataSetFetcher Linkarrow-up-right

  • DL4J Spark training: Fixed issues with thread/device affinity for multi-GPU training + evaluation Linkarrow-up-right

  • DL4J Spark training: Made all Aeron threads daemon threads to prevent Aeron from stopping JVM shutdown when all other threads have completed Linkarrow-up-right

  • Added cudnnAllowFallback configuration for BatchNormalization layer (fallback to built-in implementation if CuDNN fails unexpectedly) Linkarrow-up-right

  • Fixed some rare concurrency issues with multi-worker (multi-GPU) nodes for Spark training Linkarrow-up-right Linkarrow-up-right

  • Fixed an issue with BatchNormalization layers that prevented the mean/variance estimates from being synced properly on each worker for GradientSharing training, causing convergence issues Linkarrow-up-right

  • Added a check to detect ZipSlip CVE attempts in ArchiveUtils Linkarrow-up-right

  • DL4J Spark training and evaluation: methods now use Hadoop Configuration from Spark context to ensure runtime-set configuration is available in Spark functions reading directly from remote storage (HDFS etc) Linkarrow-up-right

  • MultiLayerNetwork and ComputationGraph now properly support more than Integer.MAX_VALUE parameters Linkarrow-up-right Linkarrow-up-right

  • Added data validation for Nd4j.readTxt - now throws exception on invalid input instead of returning incorrect values Linkarrow-up-right

  • Fixed an issue with KNN implementation where a deadlock could occur if an invalid distance function (one returning "distances" less than 0) was utilized Linkarrow-up-right

  • Added synchronization to loading of Keras import models to avoid thread safety issues in the underlying HDFS library used for loading Linkarrow-up-right

  • Fixed rare issue for Async(Multi)DataSetIterator with large prefetch values Linkarrow-up-right

  • ParallelWrapper artifact id is now deeplearning4j-parallel-wrapper, which should be used instead
  • deeplearning4j-nlp-korean module now has Scala version suffix due to scala dependencies; new artifact ID is deeplearning4j-nlp-korean_2.10 and deeplearning4j-nlp-korean_2.11 Linkarrow-up-right

  • Fixed issue with importing models with reshaping after an embedding layer Linkarrow-up-right
  • Added support for Keras masking layers Linkarrow-up-right

  • Fixed JSON deserialization issue with some layers/preprocessors, such as Permute Linkarrow-up-right

  • Fixed issue with Keras import of Nadam configuration Linkarrow-up-right

  • Added MKL-DNN support for some operations (Conv2d, etc) Linkarrow-up-right

  • Upgraded ND4J (and DataVec) to Arrow 0.11.0 Linkarrow-up-right, which also fixes Linkarrow-up-right

  • Added Nd4j.where op method (same semantics as numpy.where) Linkarrow-up-right

  • Added Nd4j.stack op method (combine arrays + increase array rank by 1) Linkarrow-up-right

  • Libnd4j new ops:

    • Matrix band part Linkarrow-up-right

    • Scatter ND, ND-add, ND-sub and ND-update ops Linkarrow-up-right

    • Sparse softmax cross entropy loss with logits

    • Histogram fixed width op

    • broadcast_to op

    • deconv3d op added

    • Unsorted segment ops added

    • Segment_X backprop ops added

    • batchnorm_new op added that supports multiple axes for mean/variance

    • GRU cell backprop added

  • Nd4j Preconditions class now has methods for formatting INDArray arguments Linkarrow-up-right, Linkarrow-up-right

  • SameDiff loss functions: cleanup plus forward pass implementation Linkarrow-up-right

  • CudaGridExecutioner now warns that exception stack traces may be delayed to avoid confusion in debugging exceptions occurring during asynchronous execution of ops Linkarrow-up-right

  • JavaCPP and JavaCPP-presets have been upgraded to version 1.4.3 Linkarrow-up-right

  • Improved Javadoc on SDVariable class Linkarrow-up-right

  • Dot operation backprop Linkarrow-up-right, determinant Linkarrow-up-right

  • Backprop op fix for the broadcast case for some pairwise transform custom op implementations Linkarrow-up-right

  • Fix for reverse custom op with rank 1 inputs Linkarrow-up-right

  • ATan2 op is now broadcastable

  • Boolean custom op broadcast fixes/additions

  • Scatter op edge case fixes

  • ArgMin shape function fix, negative axis fix

  • Unique op fix

  • Pad op fix

  • Fixed where op shape function

  • SVD rank 1 edge case fix

  • Range op

  • Split and space_to_batch fixes

  • Broadcast dynamic shape

  • embedding_lookup op now supports multiple input arrays

  • Matrix determinant op edge case (rank 0 result) shape fix

  • SameDiff TensorFlow import: fixes for multiple operations Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right

  • SameDiff: Improved error handling for multiple outputs case Linkarrow-up-right

  • Fixed issue where INDArray.permute would not correctly throw an exception for invalid length case Linkarrow-up-right

  • Fixed issues with INDArray.get/put with SpecifiedIndex Linkarrow-up-right, Linkarrow-up-right

  • Minor change to DataSet.merge - signature now accepts any DataSet subtypes Linkarrow-up-right

  • Fixed an issue where the INDArray.transposei operation was not in-place Linkarrow-up-right

  • Fixed issues with INDArray.mmul with MMulTranspose Linkarrow-up-right

  • Added additional order validation for ND4J creation methods (create, rand, etc) Linkarrow-up-right

  • Fix for ND4J binary deserialization (BinarySerde) when deserializing from heap byte buffers Linkarrow-up-right

  • Fixed issue with Nd4j-common ClassPathResource path resolution in some IDEs Linkarrow-up-right

  • Fixed issue where INDArray.get(interval) on rank 1 array would return rank 2 array Linkarrow-up-right

  • Fixed a validation issue with Nd4j.gemm/mmuli on views Linkarrow-up-right Linkarrow-up-right

  • INDArray.assign(INDArray) no longer allows assigning different shape arrays (other than scalar/vector cases) Linkarrow-up-right

  • NDArrayStrings (and INDArray.toString()) now always uses US locale when formatting numbers Linkarrow-up-right

  • Fixed an issue with GaussianDistribution specific to V100 GPUs Linkarrow-up-right

  • Fixed an issue with bitmap compression/encoding specific to V100 GPUs Linkarrow-up-right

  • Transforms.softmax now throws an error on unsupported shapes instead of simply not applying operation Linkarrow-up-right

  • VersionCheck functionality: handle case where SimpleFileVisitor is not available on earlier versions of Android Linkarrow-up-right

  • SameDiff convolution layer configuration (Conv2dConfig/Conv3dConfig/Pooling3dConfig etc) have had parameter names aligned Linkarrow-up-right

  • Added SerializableHadoopConfiguration and BroadcastHadoopConfigHolder for cases where a Hadoop configuration is required in Spark functions Linkarrow-up-right Linkarrow-up-right
  • Fixed issue with JDBCRecordReader's handling of real-valued column result types Linkarrow-up-right

  • Added validation and useful exception for CSVRecordReader/LineRecordReader being used without initialization Linkarrow-up-right

  • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

  • ND4J: nd4j-native-platform will now use Intel MKL-DNN as the default/bundled BLAS implementation (replacing OpenBLAS as the previous default)

  • Deeplearning4j: Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs (to assist with debugging and tuning memory configuration).

  • Deeplearning4j - new layers: Locally connected 1d Linkarrow-up-right, Locally connected 2d Linkarrow-up-right

  • Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs. Same information is available (without a crash) for MultiLayerNetwork/ComputationGraph.memoryInfo methods. Can be disabled (or output directory set) using system propertiesarrow-up-right - Linkarrow-up-right
  • Added Composite[Multi]DataSetPreProcessor to enable multiple [Multi]DataSetPreProcessors to be applied in a single iterator Linkarrow-up-right

  • Added ComputationGraph evaluate methods for multi-output networks: evaluate(DataSetIterator, Map<Integer,IEvaluation[]>) and evaluate(MultiDataSetIterator, Map<Integer,IEvaluation[]>) Linkarrow-up-right

  • Added JointMultiDataSetIterator - utility iterator used to create MultiDataSetIterator from multiple DataSetIterators Linkarrow-up-right

  • GraphVertices may now have trainable parameters directly (not just enclose layers with trainable parameters) Linkarrow-up-right

  • Added MultiLayerNetwork/ComputationGraph getLearningRate methods Linkarrow-up-right

  • Added RandomDataSetIterator and RandomMultiDataSetIterator (mainly for testing/debugging) Linkarrow-up-right Linkarrow-up-right

  • Added cyclical "1cycle" schedule for learning rate schedules etc - Linkarrow-up-right

  • RDD repartitioning for Spark training is more configurable (adds Repartitioner interface) Linkarrow-up-right

  • Added ComputationGraph.getIterationCount() and .getEpochCount() for consistency with MultiLayerNetwork Linkarrow-up-right

  • Added locally connected 1d layer Linkarrow-up-right Linkarrow-up-right

  • "Data loader" API (mainly for Spark) Linkarrow-up-right Linkarrow-up-right Linkarrow-up-right


  • Spark evaluation: added evaluation method overloads that allow specifying the number of evaluation workers (less than number of Spark threads) Linkarrow-up-right

  • CnnSentenceDataSetIterator now has a Format argument, and supports outputting data for RNNs and 1D CNNs Linkarrow-up-right

  • Added ComputationGraph/MultiLayerNetwork.pretrain((Multi)DataSetIterator, int epochs) method overloads Linkarrow-up-right

  • MultiLayerNetwork and ComputationGraph now have output method overloads where the network output can be placed in the user-specified workspace, instead of being detached Linkarrow-up-right Linkarrow-up-right. This can be used to avoid creating INDArrays that need to be garbage collected before native memory can be freed.
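
A hedged sketch of using the workspace-scoped output overload described above (the exact overload signature and the `wsConfig`/`net`/`features` names are assumptions for illustration):

```java
// Activate a user-managed workspace and ask the network to place its
// output INDArray in that workspace instead of detaching it
try (MemoryWorkspace ws = Nd4j.getWorkspaceManager()
        .getAndActivateWorkspace(wsConfig, "OUTPUT_WS")) {
    INDArray out = net.output(features, false, null, null, ws);
    // 'out' is only valid while the workspace is open; its memory is
    // reused on the next workspace cycle rather than garbage collected
}
```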

  • EmbeddingSequenceLayer now supports [minibatch,1,seqLength] format sequence data in addition to [minibatch,seqLength] format data Linkarrow-up-right

  • CuDNN batch norm implementation will now be used for rank 2 input, not just rank 4 input Linkarrow-up-right

  • Environment variables and system properties for DL4J have been centralized into DL4JResources and DL4JEnvironmentVars classes, with proper descriptions Linkarrow-up-right Linkarrow-up-right

  • MultiLayerNetwork and ComputationGraph output/feedForward/fit methods are now thread-safe via synchronization. Note that concurrent use is not recommended due to performance (instead: use ParallelInference); however the now-synchronized methods should avoid obscure errors due to concurrent modifications Linkarrow-up-right

  • BarnesHutTSNE now throws a useful exception in the case where the distance metric is undefined (for example, all zeros plus cosine similarity) Linkarrow-up-right

  • Fixed issue where OutputLayer may not initialize parameter constraints correctly Linkarrow-up-right
  • Fixed performance issue with Nesterov updater using CPU-only op for CUDA execution Linkarrow-up-right

  • Removed TerminationCondition for DL4J optimizers - was not used in practice, and had minor overhead Linkarrow-up-right

  • Fixed issue where EvaluativeListener could hit a workspace validation exception when workspaces are enabled Linkarrow-up-right

  • Fixed issue where TrainingListener.onEpochStart/onEpochEnd were not being called correctly for ComputationGraph Linkarrow-up-right

  • Fixed workspace issue with TensorFlowCnnToFeedForwardPreProcessor Linkarrow-up-right

  • Performance optimization for BatchNormalization when using CuDNN Linkarrow-up-right

  • Performance optimization: Dropout will be applied in-place when safe to do so, avoiding a copy Linkarrow-up-right

  • Added CuDNN implementation of Dropout Linkarrow-up-right

  • Reduced memory use for CuDNN: CuDNN working memory is now shared and reused between layers within a network Linkarrow-up-right

  • Fixed issue where the CuDNN batch normalization implementation would fail with the FP16 datatype Linkarrow-up-right

  • Fixed issue where Bidirectional LSTM could incorrectly use workspaces, causing an exception Linkarrow-up-right

  • Fixed issue with early stopping where scores to be maximized (accuracy, f1, etc) were not properly triggering termination conditions Linkarrow-up-right

  • Fixed issue where label mask counter could be incorrectly incremented in ComputationGraph.computeGradientAndScore() Linkarrow-up-right

  • ComputationGraph was not setting lastEtlTime field during training Linkarrow-up-right

  • Fixed issue with AutoEncoder layer when workspaces are enabled Linkarrow-up-right

  • Fixed issue with EmbeddingSequenceLayer use of mask arrays Linkarrow-up-right

  • Lombok now has provided scope everywhere, so it isn't on the user classpath when using DL4J Linkarrow-up-right

  • Fixed issue with WordVectorSerializer.readParagraphVectors(File) initialization of the label source Linkarrow-up-right

  • Spark training (gradient sharing) now properly handles empty partition edge case when encountered during training Linkarrow-up-right

  • Errors are propagated better/more consistently for Spark gradient sharing training Linkarrow-up-right

  • Fixed issue with 1D CNN layers with mask arrays and stride > 1 (masks not being correctly downsized) Linkarrow-up-right

  • DL4J Batch norm implementation was not correctly adding epsilon value during inference, only during training (CuDNN unaffected) Linkarrow-up-right

  • CuDNN subsampling layers with max pooling and ConvolutionMode.SAME may have taken padding value (0) as the maximum for border values when all non-padding values are less than 0 Linkarrow-up-right

  • Spark training with gradient sharing now passes listeners to workers correctly Linkarrow-up-right

  • Fixed rare (and non-terminal) concurrent modification issue with UI and FileStatsStorage Linkarrow-up-right

  • CuDNN convolution layer now supports dilation > 2 (previously: used DL4J conv layer implementation as a fallback) Linkarrow-up-right

  • Yolo2OutputLayer now implements computeScoreForExamples() Linkarrow-up-right

  • SequenceRecordReaderDataSetIterator now handles the "no labels" case correctly Linkarrow-up-right

  • Fixed issue where BarnesHutTSNE could hit a workspace validation exception Linkarrow-up-right

  • EMNIST iterator could produce incorrect data in some cases after a reset Linkarrow-up-right

  • This prevents ambiguity in later cases (pooling layers), and allows for more complex masking scenarios (such as masking for different image sizes in the same minibatch).
  • Some older/deprecated Model and Layer methods have been removed. (validateInput(), initParams()). Some custom layers may need to be updated as a result Linkarrow-up-right

  • Keras Lambda layers can now be imported by registering custom SameDiff layers

  • All Keras optimizers are now supported

  • All advanced activation functions can now be imported.

  • Many minor bugs have been fixed, including proper weight setting for all configurations of BatchNormalization, improvements to Reshape and SeparableConvolution2D, and full support of Bidirectional layers.

  • SameDiff: A significant number of new ops, and backprop implementations for existing ops

  • Added Nd4j.randomBernoulli/Binomial/Exponential convenience methods Linkarrow-up-right
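
As an illustration of the convenience methods listed above (a sketch, assuming the current factory-method signatures; the shape arguments here are arbitrary):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Bernoulli: each element is 0 or 1 with success probability p = 0.5
INDArray bernoulli = Nd4j.randomBernoulli(0.5, 3, 4);
// Binomial: 10 trials per element, success probability 0.3
INDArray binomial = Nd4j.randomBinomial(10, 0.3, 3, 4);
// Exponential: rate parameter lambda = 2.0
INDArray exponential = Nd4j.randomExponential(2.0, 3, 4);
```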

  • Added way to disable/suppress ND4J initialization logging via org.nd4j.log.initialization system property Linkarrow-up-right

  • SameDiff class - most op/constructor methods now have complete/useful javadoc Linkarrow-up-right

  • Workspaces can now be disabled globally, ignoring workspace configuration. This is mainly used for debugging; use Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED) or Nd4j.getWorkspaceManager().setDebugMode(DebugMode.SPILL_EVERYTHING); to enable this. Linkarrow-up-right

  • Added EnvironmentalAction API for environment variable processing Linkarrow-up-right

  • ND4J environment variables and system properties have been centralized in ND4jEnvironmentVars and ND4jSystemProperties classes Linkarrow-up-right and Linkarrow-up-right

  • IActivation implementations now validate/enforce same shape for activations and gradients Linkarrow-up-right
  • Fixed issue with muliColumnVector where vector is 1d Linkarrow-up-right

  • ImagePreProcessingScaler now supports serialization via NormalizerSerializerStrategy and ModelSerializer Linkarrow-up-right

  • Performance optimization for threshold encoding used in DL4J's Spark gradient sharing distributed training implementation Linkarrow-up-right

  • SameDiff: Fixed issue where memory wasn't always released after execution Linkarrow-up-right

  • DataSet.save() and MultiDataSet.save() methods now save example metadata when present Linkarrow-up-right

  • Fixed issue with KFoldIterator when dataset does not divide equally into folds with no remainder Linkarrow-up-right

  • Fixed issue where version check functionality could fail to load resources if resources are on a path with spaces Linkarrow-up-right

  • Long-deprecated DataSet.getFeatureMatrix() has been removed. Use DataSet.getFeatures() instead. Linkarrow-up-right
  • The unused and not properly tested/maintained utility class BigDecimalMath has been removed. Users should find an alternative library for this functionality, if required.

  • Not properly maintained complex number support classes (IComplexNumber, IComplexNDArray) have been removed entirely Linkarrow-up-right

  • Added missing FloatColumnCondition Linkarrow-up-right

  • Added CSVLineSequenceRecordReader for "each line in CSV is a sequence, and sequence is single-valued/univariate" Linkarrow-up-right

  • Added CSVMultiSequenceRecordReader for "multiple multi-valued sequences in a single CSV" data Linkarrow-up-right

  • Added EmbeddingSequenceLayer (EmbeddingLayer for time series) Linkarrow-up-right
  • Added OCNNOutputLayer (one-class neural network) - implementation of this paperarrow-up-right - Linkarrow-up-right

  • Added FrozenLayerWithBackprop layer Linkarrow-up-right

  • Added DepthwiseConvolution2D layer Linkarrow-up-right

  • Added ComputationGraph.output(DataSetIterator) method Linkarrow-up-right

  • Added MultiLayerNetwork/ComputationGraph.layerInputSize methods Linkarrow-up-right Linkarrow-up-right

  • Added SparkComputationGraph.feedForwardWithKey overload with feature mask support Linkarrow-up-right

  • Added MultiLayerNetwork.calculateGradients method (for easily getting parameter and input gradients, for example for some model interpretability approaches) Linkarrow-up-right Linkarrow-up-right

  • Added support to get input/activation types for each layer from configuration: ComputationGraphConfiguration.getLayerActivationTypes(InputType...), ComputationGraphConfiguration.GraphBuilder.getLayerActivationTypes(), NeuralNetConfiguration.ListBuilder.getLayerActivationTypes(), MultiLayerConfiguration.getLayerActivationTypes(InputType) methods Linkarrow-up-right

  • Evaluation.stats() now prints confusion matrix in easier to read matrix format, rather than list format Linkarrow-up-right

  • Added ModelSerializer.addObjectToFile, .getObjectFromFile and .listObjectsInFile for storing arbitrary Java objects in same file as saved network Linkarrow-up-right
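
A sketch of storing an arbitrary object alongside a saved network (assuming the method names listed above; `net` and `someJavaObject` are placeholders, and the stored object would need to be serializable):

```java
import java.io.File;
import org.deeplearning4j.util.ModelSerializer;

File f = new File("net.zip");
ModelSerializer.writeModel(net, f, true);                    // save network (+ updater state)
ModelSerializer.addObjectToFile(f, "myKey", someJavaObject); // attach extra object under a key
java.util.List<String> keys = ModelSerializer.listObjectsInFile(f);
Object restored = ModelSerializer.getObjectFromFile(f, "myKey");
```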

  • Added SpatialDropout support (with Keras import support) Linkarrow-up-right

  • Added MultiLayerNetwork/ComputationGraph.fit((Multi)DataSetIterator, int numEpochs) overloads Linkarrow-up-right

  • Added performance (hardware) listeners: SystemInfoPrintListener and SystemInfoFilePrintListener Linkarrow-up-right

  • RecordReaderMultiDataSetIterator will no longer try to convert unused columns to numerical values Linkarrow-up-right

  • Added new model zoo models:

    • (to do)

  • Fixes for Android compilation (removed duplicate classes, aligned versions, removed some dependencies) Linkarrow-up-right Linkarrow-up-right Linkarrow-up-right

  • Fix for RecordReaderMultiDataSetIterator where output could be incorrect for some constructors Linkarrow-up-right

  • Non-frozen layers before a frozen layer will no longer be skipped during backprop (useful for GANs and similar architectures) Linkarrow-up-right Linkarrow-up-right

  • Fixed issue where ComputationGraph topological sort may not be consistent on all platforms; could sometimes break ComputationGraphs (with multiple valid topological orderings) trained on PC and deployed on Android Linkarrow-up-right

  • Fixed issue with CuDNN batch norm using 1-decay instead of decay Linkarrow-up-right

  • deeplearning4j-cuda no longer throws exceptions if present on classpath with nd4j-native backend set to higher priority Linkarrow-up-right

  • Added RNG control for CifarDataSetIterator Linkarrow-up-right

  • WordVectorSerializer now deletes temp files immediately once done Linkarrow-up-right

  • IterationListener has been deprecated in favor of TrainingListener. For existing custom listeners, switch from implements IterationListener to extends BaseTrainingListener Linkarrow-up-right

  • ExistingDataSetIterator has been deprecated; use fit(DataSetIterator, int numEpochs) method instead

  • Fix for TransformProcessRecordReader batch support Linkarrow-up-right
  • Fix for TransformProcessRecordReader with filter operations Linkarrow-up-right

  • Fixed issue with ImageRecordReader/ParentPathLabelGenerator incorrectly filtering directories containing . character(s) Linkarrow-up-right

  • ShowImageTransform now initializes frame lazily to avoid blank windows Linkarrow-up-right

  • Remove use of Eclipse Collections library due to issues with Android compilation Linkarrow-up-right
  • Improved cleanup of completed models to reduce maximum memory requirements for training Linkarrow-up-right

  • DL4J: Large number of new layers and API improvements
  • DL4J: Keras 2.0 import support

  • Adds support for dilated convolutions (aka 'atrous' convolutions) - ConvolutionLayer, SubsamplingLayer, and 1D versions thereof. (Linkarrow-up-right)

  • Added Upsampling2D layer, Upsampling1D layer (Linkarrow-up-right, Linkarrow-up-right)

  • ElementWiseVertex now (additionally) supports Average and Max modes in addition to Add/Subtract/Product (Linkarrow-up-right)

  • Added SeparableConvolution2D layer (Linkarrow-up-right)

  • Added Deconvolution2D layer (aka transpose convolution, fractionally strided convolution layer) (Linkarrow-up-right)

  • Added ReverseTimeSeriesVertex (Linkarrow-up-right)

  • Added RnnLossLayer - no-parameter version of RnnOutputLayer, or RNN equivalent of LossLayer (Linkarrow-up-right)

  • Added CnnLossLayer - no-parameter CNN output layer for use cases such as segmentation, denoising, etc. (Linkarrow-up-right)

  • Added Bidirectional layer wrapper (converts any uni-directional RNN to a bidirectional RNN) (Linkarrow-up-right)

  • Added SimpleRnn layer (aka "vanilla" RNN layer) (Linkarrow-up-right)

  • Added LastTimeStep wrapper layer (wraps a RNN layer to get last time step, accounting for masking if present) (Linkarrow-up-right)

  • Added MaskLayer utility layer that simply zeros out activations on forward pass when a mask array is present (Linkarrow-up-right)

  • Added alpha-version (not yet stable) SameDiff layer support to DL4J (Note: forward pass, CPU only for now)(Linkarrow-up-right)

  • Added SpaceToDepth and SpaceToBatch layers (Linkarrow-up-right, Linkarrow-up-right)

  • Added Cropping2D layer (Linkarrow-up-right)

  • Added parameter constraints API (LayerConstraint interface), and MaxNormConstraint, MinMaxNormConstraint, NonNegativeConstraint, UnitNormConstraint implementations (Linkarrow-up-right)

  • Significant refactoring of learning rate schedules (Linkarrow-up-right)

    • Added ISchedule interface; added Exponential, Inverse, Map, Poly, Sigmoid and Step schedule implementations (Linkarrow-up-right)

    • Added support for both iteration-based and epoch-based schedules via ISchedule. Also added support for custom (user defined) schedules

    • Learning rate schedules are configured on the updaters, via the .updater(IUpdater) method
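
The updater-based schedule configuration above can be sketched as follows (a minimal configuration fragment, assuming the ISchedule implementations listed above; layer definitions are elided):

```java
import org.nd4j.linalg.schedule.ISchedule;
import org.nd4j.linalg.schedule.MapSchedule;
import org.nd4j.linalg.schedule.ScheduleType;
import org.nd4j.linalg.learning.config.Adam;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

// Epoch-based step schedule: 0.01 for epochs 0-9, then 0.001
ISchedule lrSchedule = new MapSchedule.Builder(ScheduleType.EPOCH)
        .add(0, 1e-2)
        .add(10, 1e-3)
        .build();

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Adam(lrSchedule))   // schedule is attached to the updater
        .list()
        // ... layer definitions ...
        .build();
```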

  • Added dropout API (IDropout - previously dropout was available but not a class); added Dropout, AlphaDropout (for use with self-normalizing NNs), GaussianDropout (multiplicative), GaussianNoise (additive). Added support for custom dropout types (Linkarrow-up-right)

  • Added support for dropout schedules via ISchedule interface (Linkarrow-up-right)

  • Added weight/parameter noise API (IWeightNoise interface); added DropConnect and WeightNoise (additive/multiplicative Gaussian noise) implementations (Linkarrow-up-right); dropconnect and dropout can now be used simultaneously

  • Adds layer configuration alias .units(int) equivalent to .nOut(int) (Linkarrow-up-right)

  • Adds ComputationGraphConfiguration GraphBuilder .layer(String, Layer, String...) alias for .addLayer(String, Layer, String...)

  • Layer index no longer required for MultiLayerConfiguration ListBuilder (i.e., .list().layer(<layer>) can now be used for configs) (Linkarrow-up-right)

  • Added MultiLayerNetwork.summary(InputType) and ComputationGraph.summary(InputType...) methods (shows layer and activation size information) (Linkarrow-up-right)

  • MultiLayerNetwork, ComputationGraph and layerwise trainable layers now track the number of epochs (Linkarrow-up-right)

  • Added deeplearning4j-ui-standalone module: uber-jar for easy launching of UI server (usage: java -jar deeplearning4j-ui-standalone-1.0.0-alpha.jar -p 9124 -r true -f c:/UIStorage.bin)

  • Weight initializations:

    • Added .weightInit(Distribution) convenience/overload (previously: required .weightInit(WeightInit.DISTRIBUTION).dist(Distribution)) (Linkarrow-up-right)

    • WeightInit.NORMAL (for self-normalizing neural networks) (Linkarrow-up-right)

    • Ones, Identity weight initialization

    • Added new distributions (LogNormalDistribution, TruncatedNormalDistribution, OrthogonalDistribution, ConstantDistribution) which can be used for weight initialization

    • RNNs: Added ability to specify weight initialization for recurrent weights separately to "input" weights

  • Added layer alias: Convolution2D (ConvolutionLayer), Pooling1D (Subsampling1DLayer), Pooling2D (SubsamplingLayer) (Linkarrow-up-right)

  • Added Spark IteratorUtils - wraps a RecordReaderMultiDataSetIterator for use in Spark network training (Linkarrow-up-right)

  • CuDNN-supporting layers (ConvolutionLayer, etc) now warn the user if using CUDA without CuDNN (Linkarrow-up-right)

  • Binary cross entropy (LossBinaryXENT) now implements clipping (1e-5 to (1 - 1e-5) by default) to avoid numerical underflow/NaNs (Linkarrow-up-right)

  • SequenceRecordReaderDataSetIterator now supports multi-label regression (Linkarrow-up-right)

  • TransferLearning FineTuneConfiguration now has methods for setting training/inference workspace modes (Linkarrow-up-right)

  • IterationListener iterationDone method now reports both current iteration and epoch count; removed unnecessary invoke/invoked methods (Linkarrow-up-right)

  • Added MultiLayerNetwork.layerSize(int), ComputationGraph.layerSize(int)/layerSize(String) to easily determine size of layers (Linkarrow-up-right)

  • Added MultiLayerNetwork.toComputationGraph() method (Linkarrow-up-right)

  • Added NetworkUtils convenience methods to easily change the learning rate of an already initialized network (Linkarrow-up-right)

  • Added MultiLayerNetwork.save(File)/.load(File) and ComputationGraph.save(File)/.load(File) convenience methods (Linkarrow-up-right)
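
Usage of the save/load convenience methods can be sketched as (assuming the method names above; the file name is a placeholder):

```java
import java.io.File;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

// Save a trained network to a single zip file
net.save(new File("my-network.zip"));

// Restore it later; the boolean indicates whether updater state
// (e.g. Adam moments) should also be loaded, for continued training
MultiLayerNetwork restored =
        MultiLayerNetwork.load(new File("my-network.zip"), true);
```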

  • Added CheckpointListener to periodically save a copy of the model during training (every N iter/epochs, every T time units) (Linkarrow-up-right)

  • Added ComputationGraph output method overloads with mask arrays (Linkarrow-up-right)

  • New LossMultiLabel loss function for multi-label classification (Linkarrow-up-right)

  • Added new model zoo models:

    • Darknet19 (Linkarrow-up-right)

    • TinyYOLO (Linkarrow-up-right)

  • New iterators, and iterator improvements:

    • Added FileDataSetIterator, FileMultiDataSetIterator for flexibly iterating over directories of saved (Multi)DataSet objects (Linkarrow-up-right)

    • UCISequenceDataSetIterator (Linkarrow-up-right)

    • RecordReaderDataSetIterator now has builder pattern for convenience, improved javadoc

    • Added DataSetIteratorSplitter, MultiDataSetIteratorSplitter

  • Added additional score functions for early stopping (ROC metrics, full set of Evaluation/Regression metrics, etc) (Linkarrow-up-right)

  • Added additional ROC and ROCMultiClass evaluation overloads for MultiLayerNetwork and ComputationGraph (Linkarrow-up-right)

  • Clarified Evaluation.stats() output to refer to "Predictions" instead of "Examples" (former is more correct for RNNs) (Linkarrow-up-right)

  • EarlyStoppingConfiguration now supports Supplier<ScoreCalculator> for use with non-serializable score calculators (Linkarrow-up-right)

  • Improved ModelSerializer exceptions when trying to load a model via wrong method (i.e., try to load ComputationGraph via restoreMultiLayerNetwork) (Linkarrow-up-right)

  • Added SparkDataValidation utility methods to validate saved DataSet and MultiDataSet on HDFS or local (Linkarrow-up-right)

  • ModelSerializer: added restoreMultiLayerNetworkAndNormalizer and restoreComputationGraphAndNormalizer methods (Linkarrow-up-right)

  • ParallelInference now has output overloads with support for input mask arrays (Linkarrow-up-right)

  • Fixed UI layer sizes for variational autoencoder layers (Linkarrow-up-right)
  • Fixes to avoid HDF5 library crashes (Linkarrow-up-right, Linkarrow-up-right)

  • UI Play servers switch to production (PROD) mode (Linkarrow-up-right)

  • Related to the above: users can now set play.crypto.secret system property to manually set the Play application secret; is randomly generated by default (Linkarrow-up-right).

  • SequenceRecordReaderDataSetIterator would apply preprocessor twice (Linkarrow-up-right)

  • Evaluation no-arg constructor could cause NaN evaluation metrics when used on Spark

  • CollectScoresIterationListener could recurse endlessly (Linkarrow-up-right)

  • Async(Multi)DataSetIterator calling reset() on underlying iterator could cause issues in some situations (Linkarrow-up-right)

  • In some cases, L2 regularization could be (incorrectly) applied to frozen layers (Linkarrow-up-right)

  • Logging fixes for NearestNeighboursServer (Linkarrow-up-right)

  • Memory optimization for BaseStatsListener (Linkarrow-up-right)

  • ModelGuesser fix for loading Keras models from streams (previously would fail) (Linkarrow-up-right)

  • Various fixes for workspaces in MultiLayerNetwork and ComputationGraph (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • Fix for incorrect condition in DuplicateToTimeSeriesVertex (Linkarrow-up-right)

  • Fix for getMemoryReport exception on some valid ComputationGraph networks (Linkarrow-up-right)

  • RecordReaderDataSetIterator when used with preprocessors could cause an exception under some circumstances (Linkarrow-up-right)

  • CnnToFeedForwardPreProcessor could silently reshape invalid input, as long as the input array length matches the expected length (Linkarrow-up-right)

  • ModelSerializer temporary files would not be deleted if JVM crashes; now are deleted immediately when no longer required (Linkarrow-up-right)

  • RecordReaderMultiDataSetIterator may not add mask arrays under some circumstances, when set to ALIGN_END mode (Linkarrow-up-right)

  • ConvolutionIterationListener previously produced an IndexOutOfBoundsException when all convolution layers are frozen (Linkarrow-up-right)

  • PrecisionRecallCurve.getPointAtRecall could return a point with a correct but sub-optimal precision when multiple points had identical recall (Linkarrow-up-right)

  • Setting dropout(0) on transfer learning FineTuneConfiguration did not remove dropout if present on existing layer (Linkarrow-up-right)

  • Under some rare circumstances, Spark evaluation could lead to a NullPointerException (Linkarrow-up-right)

  • ComputationGraph: disconnected vertices were not always detected in configuration validation (Linkarrow-up-right)

  • Activation layers would not always inherit the global activation function configuration (Linkarrow-up-right)

  • RNN evaluation memory optimization: when TBPTT is configured for training, also use TBPTT-style splitting for evaluation (identical result, less memory) (Linkarrow-up-right, Linkarrow-up-right)

  • PerformanceListener is now serializable (Linkarrow-up-right)

  • ScoreIterationListener and PerformanceListener now report model iteration, not "iterations since listener creation" (Linkarrow-up-right)

  • Precision/recall curves cached values in ROC class may not be updated after merging ROC instances (Linkarrow-up-right)

  • ROC merging after evaluating a large number of examples may produce IllegalStateException (Linkarrow-up-right)

  • Added checks for invalid input indices to EmbeddingLayer (Linkarrow-up-right)

  • Fixed possible NPE when loading legacy (pre-0.9.0) model configurations from JSON (Linkarrow-up-right)

  • Fixed issues with EvaluationCalibration HTML export chart rendering (Linkarrow-up-right)

  • Fixed possible incorrect rendering of UI/StatsStorage charts with J7FileStatsStorage when used with Spark training (Linkarrow-up-right)

  • MnistDataSetIterator would not always reliably detect and automatically fix/redownload on corrupted download data (Linkarrow-up-right)

  • MnistDataSetIterator / EmnistDataSetIterator: updated download location after hosting URL change (Linkarrow-up-right, Linkarrow-up-right)

  • Fixes to propagation of thread interruptions (Linkarrow-up-right)

  • MultiLayerNetwork/ComputationGraph will no longer throw an ND4JIllegalStateException during initialization if a network contains no parameters (Linkarrow-up-right, Linkarrow-up-right)

  • Fixes for TSNE posting of data to UI for visualization (Linkarrow-up-right)

  • PerformanceListener now throws a useful exception (in constructor) on invalid frequency argument, instead of runtime ArithmeticException (Linkarrow-up-right)

  • RecordReader(Multi)DataSetIterator now throws more useful exceptions when Writable values are non-numerical (Linkarrow-up-right)

  • UI: Fixed possible character encoding issues for non-English languages when internationalization data .txt files are read from uber JARs (Linkarrow-up-right)

  • UI: Fixed UI incorrectly trying to parse non-DL4J UI resources when loading I18N data (Linkarrow-up-right)

  • Various threading fixes (Linkarrow-up-right)

  • Evaluation: no-arg methods (f1(), precision(), etc) now return single class value for binary case instead of macro-averaged value; values are clarified in stats() method and javadoc (Linkarrow-up-right)

  • Early stopping training: TrainingListener opEpochStart/End (etc) methods were not being called correctly (Linkarrow-up-right)

  • Fixes issue where dropout was not always applied to input of RNN layers (Linkarrow-up-right)

  • ModelSerializer: improved validation/exceptions when reading from invalid/empty/closed streams (Linkarrow-up-right)

  • ParallelInference fixes:

    • Fixes for variable size inputs (variable length time series, variable size CNN inputs) when using batch mode (Linkarrow-up-right)

    • Underlying model exceptions during the output method are now properly propagated back to the user (Linkarrow-up-right)

    • Fixed support for 'pre-batched' inputs (i.e., inputs where minibatch size is > 1)

  • Memory optimization for network weight initialization via in-place random ops (Linkarrow-up-right)

  • Fixes for CuDNN with SAME mode padding (Linkarrow-up-right, Linkarrow-up-right)

  • Fix for VariationalAutoencoder builder decoder layer size validation (Linkarrow-up-right)

  • Improved K-Means throughput Linkarrow-up-right

  • Added RPForest to nearest neighbors Linkarrow-up-right

  • Previously deprecated updater configuration methods (.learningRate(double), .momentum(double) etc) all removed
    • To configure learning rate: use .updater(new Adam(lr)) instead of .updater(Updater.ADAM).learningRate(lr)

    • To configure bias learning rate: use .biasUpdater(IUpdater) method

    • To configure learning rate schedules: use .updater(new Adam(ISchedule)) and similar
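
The migration above can be sketched side by side (a configuration fragment for illustration; the learning-rate values are arbitrary):

```java
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.learning.config.Sgd;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

// Removed style:
//   .updater(Updater.ADAM).learningRate(1e-3)
// Current style - the learning rate (or an ISchedule) is a property of the updater:
NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .updater(new Adam(1e-3))
        .biasUpdater(new Sgd(1e-2));  // separate updater/learning rate for bias parameters
```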

  • Updater configuration via enumeration (i.e., .updater(Updater)) has been deprecated; use .updater(IUpdater)

  • .regularization(boolean) config removed; functionality is now always equivalent to .regularization(true)

  • .useDropConnect(boolean) removed; use .weightNoise(new DropConnect(double)) instead

  • .iterations(int) method has been removed (was rarely used and confusing to users)

  • Multiple utility classes (in org.deeplearning4j.util) have been deprecated and/or moved to nd4j-common. Use the same class names in the nd4j-common org.nd4j.util package instead.

  • DataSetIterators in DL4J have been moved from deeplearning4j-nn module to new deeplearning4j-datasets, deeplearning4j-datavec-iterators and deeplearning4j-utility-iterators modules. Packages/imports are unchanged; deeplearning4j-core pulls these in as transitive dependencies hence no user changes should be required in most cases (Linkarrow-up-right)

  • Previously deprecated .activation(String) has been removed; use .activation(Activation) or .activation(IActivation) instead

  • Layer API change: Custom layers may need to implement applyConstraints(int iteration, int epoch) method

  • Parameter initializer API change: Custom parameter initializers may need to implement isWeightParam(String) and isBiasParam(String) methods

  • RBM (Restricted Boltzmann Machine) layers have been removed entirely. Consider using VariationalAutoencoder layers as a replacement (Linkarrow-up-right)

  • GravesBidirectionalLSTM has been deprecated; use new Bidirectional(Bidirectional.Mode.ADD, new GravesLSTM.Builder()....build())) instead

  • Previously deprecated WordVectorSerializer methods have now been removed (Linkarrow-up-right)

  • Removed deeplearning4j-ui-remote-iterationlisteners module and obsolete RemoteConvolutionalIterationListener (Linkarrow-up-right)

  • Unit tests for importing and checking layer weights
  • Leaky ReLU, ELU, SELU support for model import

  • All Keras layers can be imported with optional bias terms

  • Old deeplearning4j-keras module removed, old "Model" API removed

  • All Keras initializations (Lecun normal, Lecun uniform, ones, zeros, Orthogonal, VarianceScaling, Constant) supported

  • 1D convolution and pooling supported in DL4J and Keras model import

  • Atrous Convolution 1D and 2D layers supported in Keras model import

  • 1D Zero padding layers supported

  • Keras constraints module fully supported in DL4J and model import

  • Upsampling 1D and 2D layers in DL4J and Keras model import (including GAN examples in tests)

  • Most merge modes supported in Keras model import, Keras 2 Merge layer API supported

  • Separable Convolution 2D layer supported in DL4J and Keras model import

  • Deconvolution 2D layer supported in DL4J and Keras model import

  • Full support of Keras noise layers on import (Alpha dropout, Gaussian dropout and noise)

  • Support for SimpleRNN layer in Keras model import

  • Support for Bidirectional layer wrapper Keras model import

  • Addition of LastTimestepVertex in DL4J to support return_sequences=False for Keras RNN layers.

  • DL4J support for recurrent weight initializations and Keras import integration.

  • SpaceToBatch and BatchToSpace layers in DL4J for better YOLO support, plus end-to-end YOLO Keras import test.

  • Cropping2D support in DL4J and Keras model import

  • Keras Merge layers: these appear to work correctly with the Keras functional API, but have issues when used in a Sequential model.

  • Reshape layers: can be somewhat unreliable on import. DL4J rarely has a need to explicitly reshape input beyond (inferred) standard input preprocessors. In Keras, Reshape layers are used quite often. Mapping the two paradigms can be difficult in edge cases.

  • Apache Arrow serialization added supporting new tensor API Linkarrow-up-right
  • Add support for AVX/AVX2 and AVX-512 instruction sets for Windows/Linux for nd4j-native backend Linkarrow-up-right

  • nVidia CUDA 8/9.0/9.1 now supported

  • Workspaces improvements were introduced to ensure safety: SCOPE_PANIC profiling mode is enabled by default

  • FlatBuffers support for INDArray serde

  • Support for auto-broadcastable operations was added

  • libnd4j, the underlying C++ library, gained significant new functionality: it now offers an NDArray class and a Graph class, and can be used as a standalone library or executable.

  • Convolution-related ops now support NHWC in addition to NCHW data format.

  • Accumulation ops now have an option to keep reduced dimensions.

  • Graphs can run forward pass on input data and compute gradients for the backward pass.
  • Already supports many high-level layers, like dense layers, convolutions (1D-3D), deconvolutions, separable convolutions, pooling and upsampling, batch normalization, local response normalization, LSTMs and GRUs.

  • In total there are about 350 SameDiff operations available, including many basic operations used in building complex graphs.

  • Supports rudimentary import of TensorFlowarrow-up-right and ONNX graphs for inference.

  • TFOpTestsarrow-up-right is a dedicated project for creating test resources for TensorFlow import.

  • Added ArrowRecordReader (for reading Apache Arrow format data) (Linkarrow-up-right)
  • Added RecordMapper class for conversion between RecordReader and RecordWriter (Linkarrow-up-right)

  • RecordWriter and InputSplit APIs have been improved; more flexible and support for partitioning across all writers (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)

  • Added ArrowWritableRecordBatch and NDArrayRecordBatch for efficient batch storage (List<List<Writable>>) (Linkarrow-up-right, Linkarrow-up-right)

  • Added BoxImageTransform - an ImageTransform that either crops or pads without changing aspect ratio (Linkarrow-up-right)

  • TransformProcess now has executeToSequence(List<Writable>), executeSequenceToSingle(List<List<Writable>>) and executeToSequenceBatch(List<List<Writable>>) methods (Linkarrow-up-right, Linkarrow-up-right)

  • Added CSVVariableSlidingWindowRecordReader (Linkarrow-up-right)

  • ImageRecordReader: supports regression use cases for labels (previously: only classification) (Linkarrow-up-right)

  • ImageRecordReader: supports multi-class and multi-label image classification (via PathMultiLabelGenerator interface) (Linkarrow-up-right, Linkarrow-up-right)

  • DataAnalysis/AnalyzeSpark now includes quantiles (via t-digest) (Linkarrow-up-right)

  • Added AndroidNativeImageLoader.asBitmap(), Java2DNativeImageLoader.asBufferedImage() (Linkarrow-up-right)

  • Add new RecordReader / SequenceRecordReader implementations:

    • datavec-excel module and ExcelRecordReader (Linkarrow-up-right)

    • JacksonLineRecordReader (Linkarrow-up-right)

    • ConcatenatingRecordReader

  • Add new transforms:

    • TextToTermIndexSequenceTransform (Linkarrow-up-right)

    • ConditionalReplaceValueTransformWithDefault (Linkarrow-up-right)

    • GeographicMidpointReduction

  • StringToTimeTransform will try to guess the time format if a format isn't provided (Linkarrow-up-right)

  • Improved performance for NativeImageLoader on Android (Linkarrow-up-right)

  • Added BytesWritable (Writable for byte[] data) (Linkarrow-up-right)

  • Added TransformProcess.inferCategories methods to auto-infer categories from a RecordReader (Linkarrow-up-right)

  • Writables: equality semantics have been changed: for example, now DoubleWritable(1.0) is equal to IntWritable(1) (Linkarrow-up-right)
  • NumberedFileInputSplit now supports leading zeros (Linkarrow-up-right)

  • CSVSparkTransformServer and ImageSparkTransformServer Play servers changed to production mode (Linkarrow-up-right)

  • Fix for JSON subtype info for FloatMetaData (Linkarrow-up-right)

  • Serialization fixes for JacksonRecordReader, RegexSequenceRecordReader (Linkarrow-up-right)

  • Added RecordReader.resetSupported() method (Linkarrow-up-right)

  • SVMLightRecordReader now implements nextRecord() method (Linkarrow-up-right)

  • Fix for custom reductions when using conditions (Linkarrow-up-right)

  • SequenceLengthAnalysis is now serializable (Linkarrow-up-right) and supports to/from JSON (Linkarrow-up-right)

  • Fixes for FFT functionality (Linkarrow-up-right, Linkarrow-up-right)

  • Remove use of backported java.util.functions; use ND4J functions API instead (Linkarrow-up-right)

  • Fix for transforms data quality analysis for time columns (Linkarrow-up-right)

  • RecordWriter and SequenceRecordWriter APIs have been updated with multiple new methods

  • As per DL4J API changes: Dropout configuration is now via ParameterSpace<IDropout>, DropoutSpace introduced (Linkarrow-up-right)
  • RBM layer spaces removed (Linkarrow-up-right)

  • ComputationGraphSpace: added layer/vertex methods with overloads for preprocessors (Linkarrow-up-right)

  • Added support to specify 'fixed' layers using DL4J layers directly (instead of using LayerSpaces, even for layers without hyperparameters) (Linkarrow-up-right)

  • Added LogUniformDistribution (Linkarrow-up-right)

  • Improvements to score functions; added ROC score function (Linkarrow-up-right)

  • Learning rate schedule support added (Linkarrow-up-right)

  • Add math ops for ParameterSpace<Double> and ParameterSpace<Integer> (Linkarrow-up-right)

  • Fix threading issues when running on CUDA and multiple execution threads (Linkarrow-up-right, Linkarrow-up-right, Linkarrow-up-right)
  • Rename saved model file to model.bin (Linkarrow-up-right)

  • Fix threading issues with non thread-safe candidates / parameter spaces (Linkarrow-up-right)

  • Lombok is no longer included as a transitive dependency (Linkarrow-up-right)

  • Fix logic of HistoryProcessor with async algorithms and failures when preprocessing images

  • Tidy up and correct the output of statistics, also allowing the use of IterationListener

  • Fix issues preventing efficient execution with CUDA

  • Provide access to more of the internal structures with NeuralNet.getNeuralNetworks(), Policy.getNeuralNet(), and convenience constructors for Policy

  • Add MDPs for ALE (Arcade Learning Environment) and MALMO to support Atari games and Minecraft

  • Update MDP for Doom to allow using the latest version of VizDoom

  • Project structure is closely aligned to both DL4J model-import module and Keras.

  • Supports the following layers: Convolution2D, Dense, EmbeddingLayer, AvgPooling2D, MaxPooling2D, GravesLSTM, LSTM, Bidirectional layer wrapper, Flatten, Reshape. Additionally, DL4J OutputLayers are supported.

  • SequenceRecordReaderDataSetIterator applies preprocessors (such as normalization) twice to each DataSet (possible workaround: use RecordReaderMultiDataSetIterator + MultiDataSetWrapperIterator)
  • TransferLearning: ComputationGraph may incorrectly apply l1/l2 regularization (defined in FineTuneConfiguration) to frozen layers. Workaround: set 0.0 l1/l2 on FineTuneConfiguration, and the required l1/l2 on new/non-frozen layers directly. Note that MultiLayerNetwork with TransferLearning appears to be unaffected.

  • ParallelWrapper now able to work with gradients sharing, in addition to existing parameters averaging mode Linkarrow-up-right
  • VPTree performance significantly improved

  • CacheMode network configuration option added - improved CNN and LSTM performance at the expense of additional memory use Linkarrow-up-right

  • LSTM layer added, with CuDNN support Linkarrow-up-right (Note that the existing GravesLSTM implementation does not support CuDNN)
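A configuration sketch with illustrative layer sizes, showing the new LSTM layer (only LSTM, not GravesLSTM, can be CuDNN-accelerated):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// When running on the CUDA backend with the deeplearning4j-cuda module on the
// classpath, the LSTM layer can use CuDNN automatically.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(0, new LSTM.Builder().nIn(100).nOut(200)
                .activation(Activation.TANH).build())
        .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(200).nOut(10).activation(Activation.SOFTMAX).build())
        .build();
```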

  • New native model zoo with pretrained ImageNet, MNIST, and VGG-Face weights Linkarrow-up-right

  • Convolution performance improvements, including activation caching

  • Custom/user defined updaters are now supported Linkarrow-up-right

  • Evaluation improvements

    • EvaluationBinary, ROCBinary classes added: for evaluation of binary multi-class networks (sigmoid + xent output layers) Linkarrow-up-right

    • Evaluation and others now have G-Measure and Matthews Correlation Coefficient support; also macro + micro-averaging support for Evaluation class metrics Linkarrow-up-right

    • ComputationGraph and SparkComputationGraph evaluation convenience methods added (evaluateROC, etc)

    • ROC and ROCMultiClass support exact calculation (previous: thresholded calculation was used)

    • ROC classes now support area under precision-recall curve calculation; getting precision/recall/confusion matrix at specified thresholds (via PrecisionRecallCurve class)

    • RegressionEvaluation, ROCBinary etc now support per-output masking (in addition to per-example/per-time-step masking)

    • EvaluationCalibration added (residual plots, reliability diagrams, histogram of probabilities)

    • Evaluation and EvaluationBinary: now supports custom classification threshold or cost array

  • Optimizations: updaters, bias calculation

  • Network memory estimation functionality added. Memory requirements can be estimated from configuration without instantiating networks Link 1arrow-up-right Link 2arrow-up-right

  • New loss functions:

    • Mixture density loss function Linkarrow-up-right

    • F-Measure loss function Linkarrow-up-right

  • New distance functions added: CosineDistance, HammingDistance, JaccardDistance

  • Multiple new Transform classes

  • General UI improvements (additional information, formatting fixes)

  • Global pooling (aka "pooling over time"; usable with both RNNs and CNNs) Linkarrow-up-right
  • Center loss output layer Linkarrow-up-right

  • 1D Convolution and subsampling layers Linkarrow-up-right Link2arrow-up-right

  • ZeroPaddingLayer Linkarrow-up-right

  • New ComputationGraph vertices

    • L2 distance vertex

    • L2 normalization vertex

  • Per-output masking is now supported for most loss functions (for per output masking, use a mask array equal in size/shape to the labels array; previous masking functionality was per-example for RNNs)

  • L1 and L2 regularization can now be configured for biases (via l1Bias and l2Bias configuration options)
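A minimal sketch of the new bias regularization options (coefficient values are illustrative):

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

// l1/l2 apply to weights as before; l1Bias/l2Bias regularize the bias parameters
NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .l2(1e-4)       // weight regularization (illustrative value)
        .l2Bias(1e-5);  // bias regularization, new in this release
```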

  • Evaluation improvements:

    • DL4J now has an IEvaluation interface (which Evaluation, RegressionEvaluation, etc. all implement; this also allows custom evaluation on Spark) Linkarrow-up-right

    • Added multi-class (one vs. all) ROC: ROCMultiClass Linkarrow-up-right

    • For both MultiLayerNetwork and SparkDl4jMultiLayer: added evaluateRegression, evaluateROC, evaluateROCMultiClass convenience methods

    • HTML export functionality added for ROC charts

    • TSNE re-added to new UI

    • Training UI: now usable without an internet connection (no longer relies on externally hosted fonts)

    • UI: improvements to error handling for ‘no data’ condition

  • Epsilon configuration now used for Adam and RMSProp updaters

  • Fix for bidirectional LSTMs + variable-length time series (using masking)

  • Added CnnSentenceDataSetIterator (for use with ‘CNN for Sentence Classification’ architecture) Linkarrow-up-right Link2arrow-up-right

  • Spark + Kryo: now test serialization + throw exception if misconfigured (instead of logging an error that can be missed)

  • MultiLayerNetwork now adds default layer names if no name is specified

  • DataVec:

    • JSON/YAML support for DataAnalysis, custom Transforms etc

    • ImageRecordReader refactored to reduce garbage collection load (hence improve performance with large training sets)

    • Faster quality analysis.

  • Arbiter: added new layer types to match DL4J

    • Performance improvement for Word2Vec/ParagraphVectors tokenization & training.

  • Batched inference introduced for ParagraphVectors

  • Nd4j improvements

    • New native operations available for ND4j: firstIndex, lastIndex, remainder, fmod, or, and, xor.

    • OpProfiler NAN_PANIC & INF_PANIC now also checks result of BLAS calls.

    • Nd4j.getMemoryManager() now provides methods to tweak GC behavior.
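For example, periodic garbage collection can be toggled and throttled via the memory manager:

```java
import org.nd4j.linalg.factory.Nd4j;

// Enable periodic System.gc() calls, but at most one every 5000 ms
Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(5000);
```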

  • An alpha version of the parameter server for Word2Vec/ParagraphVectors was introduced for Spark. Please note: it is not recommended for production use yet.

  • Performance improvements for CNN inference

  • Also note: Modules with Spark 2 support are released with Scala 2.11 support only. Spark 1 modules are released with both Scala 2.10 and 2.11 support

  • Keras 1D convolutional and pooling layers cannot be imported yet. Will be supported in a forthcoming release.
  • Keras v2 model configurations cannot be imported yet. Will be supported in forthcoming release.

  • Configuration now via enumeration, not via String (see examples - Linkarrow-up-right)
  • Custom activation functions now supported Linkarrow-up-right

  • New activation functions added: hard sigmoid, randomized leaky rectified linear units (RReLU)

  • Multiple fixes/improvements for Keras model import

  • Added P-norm pooling for CNNs (option as part of SubsamplingLayer configuration)

  • Iteration count persistence: stored/persisted properly in model configuration + fixes to learning rate schedules for Spark network training

  • LSTM: gate activation function can now be configured (previously: hard-coded to sigmoid)

  • UI:

    • Added Chinese translation

    • Fixes for UI + pretrain layers

    • Added Java 7 compatible stats collection Linkarrow-up-right

    • Improvements in front-end for handling NaNs

    • Added UIServer.stop() method

    • Fixed score vs. iteration moving average line (with subsampling)

  • Solved Jaxb/Jackson issue with Spring Boot based applications

  • RecordReaderDataSetIterator now supports NDArrayWritable for the labels (set regression == true; used for multi-label classification + images, etc)

  • RNG performance issues fixed for CUDA backend
  • OpenBLAS issues fixed for macOS, PowerPC and Linux.

  • DataVec is back to Java 7 now.

  • Multiple minor bugs fixed for ND4J/DL4J

  • Supported models: Sequentialarrow-up-right models

  • Supported layersarrow-up-right: Dense, Dropout, Activation, Convolution2D, MaxPooling2D, LSTM

  • Added ‘Same’ padding mode for CNNs (ConvolutionMode network configuration option) Linkarrow-up-right

  • Weighted loss functions: Loss functions now support a per-output weight array (row vector)
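A sketch assuming a hypothetical three-output classifier; the weight values and layer sizes are illustrative:

```java
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.impl.LossMCXENT;

// Row vector of per-output weights: up-weight the second output
INDArray weights = Nd4j.create(new double[]{1.0, 2.0, 0.5});
OutputLayer out = new OutputLayer.Builder()
        .nIn(128).nOut(3)
        .activation(Activation.SOFTMAX)
        .lossFunction(new LossMCXENT(weights))
        .build();
```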

  • ROC and AUC added for binary classifiers Linkarrow-up-right

  • Improved error messages on invalid configuration or data; improved validation on both

  • Added metadata functionality: track source of data (file, line number, etc) from data import to evaluation. Loading a subset of examples/data from this metadata is now supported. Linkarrow-up-right

  • Removed Jackson as core dependency (shaded); users can now use any version of Jackson without issue

  • Added LossLayer: version of OutputLayer that only applies loss function (unlike OutputLayer: it has no weights/biases)

  • Functionality required to build triplet embedding model (L2 vertex, LossLayer, Stack/Unstack vertices etc)

  • Reduced DL4J and ND4J ‘cold start’ initialization/start-up time

  • Pretrain default changed to false and backprop default changed to true. No longer needed to set these when setting up a network configuration unless defaults need to be changed.

  • Added TrainingListener interface (extends IterationListener). Provides access to more information/state as network training occurs Linkarrow-up-right

  • Numerous bug fixes across DL4J and ND4J

  • Performance improvements for nd4j-native & nd4j-cuda backends

  • Standalone Word2Vec/ParagraphVectors overhaul:

    • Performance improvements

    • ParaVec inference available for both PV-DM & PV-DBOW

    • Parallel tokenization support was added, to address computation-heavy tokenizers.

  • Native RNG introduced for better reproducibility within multi-threaded execution environment.

  • Additional RNG calls added: Nd4j.choice(), and BernoulliDistribution op.

  • Off-GPU storage introduced, to keep large objects such as Word2Vec models in host memory. Available via WordVectorSerializer.loadStaticModel()

  • Two new options for performance tuning on nd4j-native backend: setTADThreshold(int) & setElementThreshold(int)

  • CNNs: configuration validation is now less strict. With the new ConvolutionMode option, 0.6.0 was equivalent to ‘Strict’ mode, but the new default is ‘Truncate’

    • See ConvolutionMode javadoc for more details: Linkarrow-up-right
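The mode can be set network-wide (or per layer); a minimal sketch:

```java
import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

// Strict reproduces the old 0.6.0 validation behaviour; Truncate is the new default
NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .convolutionMode(ConvolutionMode.Strict);
```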

  • Xavier weight initialization change for CNNs and LSTMs: Xavier now aligns better with the original Glorot paper and other libraries. The Xavier weight initialization equivalent to 0.6.0 is available as XAVIER_LEGACY

  • DataVec: Custom RecordReader and SequenceRecordReader classes require additional methods, for the new metadata functionality. Refer to existing record reader implementations for how to implement these methods.

  • Word2Vec/ParagraphVectors:

    • Few new builder methods:

      • allowParallelTokenization(boolean)

      • useHierarchicSoftmax(boolean)

    • Behaviour change: batchSize is now also used as the threshold for the number of computational batches executed for skip-gram/CBOW

  • Initial support for combined operations on CUDA

  • Significant performance improvements on CPU & CUDA backends

  • Better support for Spark environments using CUDA & cuDNN with multi-gpu clusters

  • New UI tools: FlowIterationListener and ConvolutionIterationListener, for better insight into processes within the network.

  • Special IterationListener implementation for performance tracking: PerformanceListener

  • Inference implementation added for ParagraphVectors, together with option to use existing Word2Vec model

  • Significantly decreased file size of the deeplearning4j API

  • nd4j-cuda-8.0 backend is available now for cuda 8 RC

  • Added multiple new built-in loss functions

  • Custom preprocessor support

  • Performance improvements to Spark training implementation

  • Improved network configuration validation using InputType functionality

  • Normalization support for labels

  • Removal of Canova and shift to DataVec: Javadoc, Github Repoarrow-up-right

  • Numerous bug fixes

  • Spark improvements

  • Introducing DataVecarrow-up-right: Lots of new functionality for transforming, preprocessing, cleaning data. (This replaces Canova)

  • New DataSetIterators for feeding neural nets with existing data: ExistingDataSetIterator, Floats(Double)DataSetIterator, IteratorDataSetIterator

  • New learning algorithms for word2vec and paravec: CBOW and PV-DM respectively

  • New native ops for better performance: DropOut, DropOutInverted, CompareAndSet, ReplaceNaNs

  • Shadow (asynchronous) dataset prefetch enabled by default for both MultiLayerNetwork and ComputationGraph

  • Better memory handling with JVM GC and CUDA backend, resulting in significantly lower memory footprint
