EN 1.0.0-M1.1

Deeplearning4j Suite Overview

Introduction to core Deeplearning4j concepts.

Eclipse DeepLearning4J

Eclipse Deeplearning4j is a suite of tools for running deep learning on the JVM. It's the only framework that allows you to train models from Java while interoperating with the Python ecosystem through a mix of Python execution via our CPython bindings, model import support, and interop with other runtimes such as tensorflow-java and onnxruntime.

Use cases include importing and retraining models (PyTorch, TensorFlow, Keras) and deploying them in JVM microservice environments, on mobile devices, on IoT devices, and on Apache Spark. It is a great complement to your Python environment for running models built in Python and deployed to or packaged for other environments.

Deeplearning4j has several submodules including:

Samediff: a TensorFlow/PyTorch-like framework for executing complex graphs. This framework is lower level, but very flexible. It's also the base API for running ONNX and TensorFlow graphs.
Nd4j: numpy++ for Java. Contains a mix of numpy operations and TensorFlow/PyTorch operations.
Libnd4j: a lightweight, standalone C++ library enabling math code to run on different devices. Optimizable for running on a wide variety of devices.
Python4j: a Python script execution framework easing deployment of Python scripts into production.
Apache Spark Integration: an integration with the Apache Spark framework enabling execution of deep learning pipelines on Spark.
Datavec: a data transformation library converting raw input data to tensors suitable for running neural networks on.

How to use this website

This website follows the divio framework layout and is split into several sections of documentation. Below is an overview of the sections of the site:

Multi project contains all cross project documentation such as end to end training and other whole project related documentation. This should be the default entry point for those getting started.
Deeplearning4j contains all of the documentation related to the core deeplearning4j apis such as the multi layer network and the computation graph. Consider this the high level framework for building neural networks. If you would like something lower level like tensorflow or pytorch, consider using samediff.
Samediff contains all the documentation related to the samediff submodule of ND4j. Samediff is a lower level api for building neural networks similar to pytorch or tensorflow with built in automatic differentiation.
Datavec contains all the documentation related to our data transformation library datavec.
Python4j contains all the documentation related to our cpython execution framework python4j.
Libnd4j contains all the documentation related to our underlying C++ framework libnd4j.
Apache Spark contains all of the documentation related to our Apache Spark integration.
Concepts/Theory contains all of the documentation related to general mathematical or computer science theory needed to understand various aspects of the framework.

Open Source

The libraries are completely open source, licensed under Apache 2.0, and developed under open governance at the Eclipse Foundation. The Eclipse Deeplearning4j project welcomes all contributions. See our community and our Contribution guide to get involved.

JVM/Python/C++

Deeplearning4j can either be a complement to your existing workflows in Python and C++ or a standalone library for you to build and deploy models. Use whichever components you find useful.

Release Notes

1.0.0-M1.1

Highlights

This is a minor bug fix release addressing shortcomings found in M1. Thanks to feedback from the community, we were able to quickly sort out a number of issues. Most fixes were related to keras import, the cnn/rnn helpers, and python4j.

Snapshots are now also published automatically every 2 days (https://github.com/eclipse/deeplearning4j/pull/9355) to get around sonatype ossrh deleting snapshots every 3 days. This should increase the robustness of the snapshots.

Worked around an issue with github actions preemptively upgrading visual studio, which broke the cuda builds: https://github.com/eclipse/deeplearning4j/pull/9364

Added backwards compatibility for centos 6 via a new linux-x86_64-compat classifier enabling use of older glibcs on centos 7:

https://github.com/eclipse/deeplearning4j/pull/9368 https://github.com/eclipse/deeplearning4j/pull/9373

A number of bugs were fixed with LSTM and CUDNN: https://github.com/eclipse/deeplearning4j/pull/9372

Known issues

https://github.com/eclipse/deeplearning4j/issues/9142 - avoid shuffle operations on gpu. Instead, pre-save data on cpu in mini batches. For more help, please post on the forums at https://community.konduit.ai/

Deeplearning4j

Features and Enhancements

Add batch normalization support for RNNs: https://github.com/eclipse/deeplearning4j/pull/9338
Disable old helpers by default: https://github.com/eclipse/deeplearning4j/pull/9343
Minor unit test fixes: https://github.com/eclipse/deeplearning4j/pull/9346
Add keras support for cnn 1d NWHC: https://github.com/eclipse/deeplearning4j/pull/9353
Move the version check warning to trace-level logging so it no longer appears during normal usage and confuses users: https://github.com/eclipse/deeplearning4j/pull/9356
Allow 1d convolutions to accept feed forward as input type: https://github.com/eclipse/deeplearning4j/pull/9365
Remove the old benchmark suite and migrate it to contrib: https://github.com/eclipse/deeplearning4j/pull/9374
Remove old MKLDNNLSTM helper (it never fully functioned anyway): https://github.com/eclipse/deeplearning4j/pull/9381

Bug fixes

Fixed an issue with helper reflection, ensuring the classes are loaded properly: https://github.com/eclipse/deeplearning4j/pull/9333 https://github.com/eclipse/deeplearning4j/pull/9350
Fix minor workspace activation bug: https://github.com/eclipse/deeplearning4j/pull/9341
Fixed compilation error with NIO buffers when running on anything newer than jdk 8: https://github.com/eclipse/deeplearning4j/pull/9351
Move logback to be a test dependency for some modules: https://github.com/eclipse/deeplearning4j/pull/9362
Keras model import fixes for GlobalPooling: https://github.com/eclipse/deeplearning4j/pull/9378 https://github.com/eclipse/deeplearning4j/pull/9384

Nd4j

Features and Enhancements

Make the Eigen op public for easier use when running eigenvalue decomposition: https://github.com/eclipse/deeplearning4j/pull/9328

Bug fixes

Fixes minor issue with choice(..) op https://github.com/eclipse/deeplearning4j/pull/9360 thanks to https://github.com/Romira915
Minor applyScalar typo fix: https://github.com/eclipse/deeplearning4j/pull/9385

Datavec

Features and Enhancements

Bug fixes

Fixed serialization bug with StringToTimeTransform: https://github.com/eclipse/deeplearning4j/pull/9377 thanks to community member https://github.com/yumg

Python4j

Features and Enhancements

Made python4j's python path setting more robust by migrating from set path calls to add path calls: https://github.com/eclipse/deeplearning4j/pull/9386

Bug fixes

Fixes bug with numpy import array jvm crashes: https://github.com/eclipse/deeplearning4j/pull/9348

Samediff

Features and Enhancements

Bug fixes

Fixed inconsistent conventions between SameDiffVariable getArr and getArrForName(): https://github.com/eclipse/deeplearning4j/pull/9357

1.0.0-M1

Highlights

In light of the coming 1.0, the project has decided to cut a number of modules before the final release. These modules have not had many users in the past and have created confusion for many users just trying to use a few simple apis. Many of these modules have not been maintained.

There will likely be 1 or 2 more milestone releases before the final 1.0. These should be considered checkpoints.

These modules include:

Arbiter
Jumpy
Datavec modules for video, audio, and sound. The computer vision datavec module will continue to be available.
Tokenizers: The tokenizers for chinese, japanese, and korean were imported from other frameworks and not really updated.
Scalnet, Nd4s: We removed the scala modules due to the small user base. We welcome 3rd party enhancements to the framework for syntactic sugar such as kotlin and scala. The framework's focus will be on providing the underlying technology rather than the de facto interfaces. If there is interest in something higher level, please discuss it on the community forums.

ARM support: We have included armcompute modules for core convolution routines. These routines can be found here

TVM: We now support running TVM modules. Docs coming soon.

We've updated our shaded modules to newer versions to mitigate security risks. These modules include jackson and guava.

Cuda 11: We've upgraded dl4j and associated modules to support cuda 11 and 11.2.

A more modular model import framework supporting tensorflow and onnx:

1. Model mapping procedures loadable as protobuf
2. Defining custom rules for import to work around unsupported or custom layers/operations
3. Op descriptor for all operations in nd4j

This will enable users to override model import behavior to run their own custom models. This means, in most circumstances, there will be no need to modify model import core code anymore. Instead, users will be able to provide definitions and custom rules for their graphs.

Users will be expected to convert their models in an external process. This means running standalone conversions for their models. This extends to keras import as well. Sometimes users convert their models in production directly from keras.

The workflow going forward is to ensure that your model is converted ahead of time to avoid performance issues with converting large models.

Removed ppc from nd4j-native-platform and nd4j-cuda-platform. If you need this architecture, please contact us or build from source.

Added more support for avx/mkldnn/cudnn linked acceleration in our c++ library. We can now distribute more combinations of precompiled math kernels via different combinations of classifiers. See the ADR here for more details.

The class loader is now overridable. This is useful for OSGI and application server environments.

We've upgraded arrow to 4.0.0 enabling the associated nd4j-arrow and datavec-arrow modules to be used without netty clashes.

Deeplearning4j

Bug fixes

Improved keras model import support for NWHC as well as NCHW input formats for both rnn and cnn
Added Adabelief updater
Added maximum merge for Keras import
Keras cropping 2d validation fixes
Lenet input shape fix
Fix for obtaining the UI port from a property

Nd4j

Features and Enhancements

CTC Loss: We now have basic support for CTC loss in nd4j. This will enable the import of CTC loss based models for speech recognition as well as OCR.
tensormmul_bp now runs from c++
Arm compute added for conv2d and pooling operations
Add IndexUtils containing ravelMultiIndex and unravelIndex methods
Updates sortCooIndicesGeneric to take any datatype
Add TVM runner
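
For readers unfamiliar with the ravel/unravel terminology used above, the idea can be sketched in plain Java. This is only an illustration of the general technique under the assumption of row-major (C order) layout; the class and method signatures below are hypothetical and are not the signatures of ND4J's IndexUtils.

```java
final class IndexSketch {
    // Flatten a multi-dimensional index into a single offset, row-major (C order).
    static long ravel(int[] index, int[] shape) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("index and shape must have the same rank");
        }
        long flat = 0;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + d);
            }
            flat = flat * shape[d] + index[d];
        }
        return flat;
    }

    // Inverse of ravel: recover the per-dimension index from a flat offset.
    static int[] unravel(long flat, int[] shape) {
        int[] index = new int[shape.length];
        for (int d = shape.length - 1; d >= 0; d--) {
            index[d] = (int) (flat % shape[d]);
            flat /= shape[d];
        }
        return index;
    }

    public static void main(String[] args) {
        int[] shape = {2, 3, 4};
        long flat = ravel(new int[] {1, 2, 3}, shape);
        System.out.println(flat + " -> " + java.util.Arrays.toString(unravel(flat, shape)));
    }
}
```

The two methods are inverses of each other for any index that is valid for the given shape.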

Bug fixes

Python4j

Features and Enhancements

Rewritten and more stable python execution. This allows better support for multi threaded environments.

Bug fixes

Contributors: https://github.com/eclipse/deeplearning4j/issues?q=is%3Apr+author%3Amjlorenzo305

0.9.1

Deeplearning4J

Fixed issue with incorrect version dependencies in 0.9.0
Added EmnistDataSetIterator
Numerical stability improvements to LossMCXENT / LossNegativeLogLikelihood with softmax (should reduce NaNs with very large activations)
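
To illustrate why very large activations can produce NaNs when softmax is combined with a cross-entropy loss, here is a minimal plain Java sketch of the widely used log-sum-exp shift. It is not the LossMCXENT implementation; the class and method names are hypothetical and the example assumes a single one-hot label.

```java
final class StableSoftmaxXent {
    // Cross-entropy of softmax(logits) against a one-hot label at index `label`,
    // computed as logSumExp(logits) - logits[label].
    static double loss(double[] logits, int label) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double sum = 0.0;
        for (double z : logits) {
            // After subtracting the maximum, every exponent is <= 0, so exp cannot overflow.
            sum += Math.exp(z - max);
        }
        double logSumExp = max + Math.log(sum);
        return logSumExp - logits[label];
    }

    public static void main(String[] args) {
        // A naive Math.exp(1000.0) overflows to Infinity; the shifted form stays finite.
        System.out.println(loss(new double[] {1000.0, 999.0}, 0));
    }
}
```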

ND4J

Added runtime version checking for ND4J, DL4J, RL4J, Arbiter, DataVec

Known Issues

Deeplearning4j: Use of the Evaluation class no-arg constructor (i.e., new Evaluation()) can result in accuracy/stats being reported as 0.0. Other Evaluation class constructors, and the ComputationGraph/MultiLayerNetwork.evaluate(DataSetIterator) methods, work as expected.
- This also impacts Spark (distributed) evaluation: the workaround is to replace sparkNet.evaluate(testData); with sparkNet.doEvaluation(testData, 64, new Evaluation(10))[0];, where 10 is the number of classes and 64 is the evaluation minibatch size to use.
SequenceRecordReaderDataSetIterator applies preprocessors (such as normalization) twice to each DataSet (possible workaround: use RecordReaderMultiDataSetIterator + MultiDataSetWrapperIterator)
TransferLearning: ComputationGraph may incorrectly apply l1/l2 regularization (defined in FineTuneConfiguration) to frozen layers. Workaround: set 0.0 l1/l2 on FineTuneConfiguration, and the required l1/l2 on new/non-frozen layers directly. Note that MultiLayerNetwork with TransferLearning appears to be unaffected.

0.9.0

Deeplearning4J

Workspaces feature added (faster training performance + less memory)
SharedTrainingMaster added for Spark network training (improved performance)
ParallelInference added: a wrapper that serves inference requests using internal batching and queues
ParallelWrapper is now able to work with gradient sharing, in addition to the existing parameter averaging mode
VPTree performance significantly improved
CacheMode network configuration option added: improved CNN and LSTM performance at the expense of additional memory use
LSTM layer added, with CuDNN support (note that the existing GravesLSTM implementation does not support CuDNN)
New native model zoo with pretrained ImageNet, MNIST, and VGG-Face weights
Convolution performance improvements, including activation caching
Custom/user defined updaters are now supported
Evaluation improvements
- EvaluationBinary, ROCBinary classes added: for evaluation of binary multi-class networks (sigmoid + xent output layers)
- Evaluation and others now have G-Measure and Matthews Correlation Coefficient support; also macro + micro-averaging support for Evaluation class metrics
- ComputationGraph and SparkComputationGraph evaluation convenience methods added (evaluateROC, etc)
- ROC and ROCMultiClass support exact calculation (previously, a thresholded calculation was used)
- ROC classes now support area under precision-recall curve calculation; getting precision/recall/confusion matrix at specified thresholds (via PrecisionRecallCurve class)
- RegressionEvaluation, ROCBinary etc now support per-output masking (in addition to per-example/per-time-step masking)
- EvaluationCalibration added (residual plots, reliability diagrams, histogram of probabilities)
- Evaluation and EvaluationBinary: now support a custom classification threshold or cost array
Optimizations: updaters, bias calculation
Network memory estimation functionality added. Memory requirements can be estimated from configuration without instantiating networks
New loss functions:
- Mixture density loss function
- F-Measure loss function
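
As background for the G-Measure and Matthews Correlation Coefficient entries above, the following plain Java sketch computes both from binary confusion-matrix counts using their standard textbook definitions. It is an illustration only and does not use or reproduce the Evaluation class; the class name, method names, and the choice to return 0.0 when a denominator is zero are assumptions made for this example.

```java
final class BinaryMetrics {
    // MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    static double matthewsCorrelation(long tp, long tn, long fp, long fn) {
        double denom = Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        // Convention assumed here: report 0.0 when the denominator vanishes.
        return denom == 0.0 ? 0.0 : ((double) tp * tn - (double) fp * fn) / denom;
    }

    // G-measure = geometric mean of precision and recall.
    static double gMeasure(long tp, long fp, long fn) {
        double precision = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
        return Math.sqrt(precision * recall);
    }

    public static void main(String[] args) {
        System.out.println(matthewsCorrelation(6, 3, 1, 2));
        System.out.println(gMeasure(6, 1, 2));
    }
}
```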

ND4J

Workspaces feature added
Native parallel sort was added
New ops added: SELU/SELUDerivative, TAD-based comparisons, percentile/median, Reverse, Tan/TanDerivative, SinH, CosH, Entropy, ShannonEntropy, LogEntropy, AbsoluteMin/AbsoluteMax/AbsoluteSum, Atan2
New distance functions added: CosineDistance, HammingDistance, JaccardDistance
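
For orientation, the three distance measures named above can be sketched in plain Java as follows. These are the common mathematical definitions, not the ND4J ops themselves; the class is hypothetical, and the Jaccard variant shown (one minus the ratio of summed element-wise minima to summed element-wise maxima, for non-negative vectors) is an assumption about which generalization is meant.

```java
final class DistanceSketch {
    // 1 - cosine similarity. Returns NaN if either vector is all zeros.
    static double cosineDistance(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Number of positions at which the two vectors differ.
    static int hammingDistance(double[] a, double[] b) {
        checkSameLength(a, b);
        int differing = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                differing++;
            }
        }
        return differing;
    }

    // Generalized Jaccard distance for non-negative vectors. Returns NaN if both are all zeros.
    static double jaccardDistance(double[] a, double[] b) {
        checkSameLength(a, b);
        double minSum = 0.0, maxSum = 0.0;
        for (int i = 0; i < a.length; i++) {
            minSum += Math.min(a[i], b[i]);
            maxSum += Math.max(a[i], b[i]);
        }
        return 1.0 - minSum / maxSum;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }
}
```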

DataVec

MapFileRecordReader and MapFileSequenceRecordReader added
Spark: Utilities to save and load JavaRDD<List<Writable>> and JavaRDD<List<List<Writable>>> data to Hadoop MapFile and SequenceFile formats
TransformProcess and Transforms now support NDArrayWritables and NDArrayWritable columns
Multiple new Transform classes

Arbiter

Arbiter UI:
- UI now uses Play framework, integrates with DL4J UI (replaces Dropwizard backend). Dependency issues/clashing versions fixed.
- Supports DL4J StatsStorage and StatsStorageRouter mechanisms (FileStatsStorage, Remote UI via RemoteUIStatsStorageRouter)
- General UI improvements (additional information, formatting fixes)

0.7.2

Added variational autoencoder
Activation function refactor
- Activation functions are now an interface
- Configuration now via enumeration, not via String (see examples)
- Custom activation functions now supported
- New activation functions added: hard sigmoid, randomized leaky rectified linear units (RReLU)
Multiple fixes/improvements for Keras model import
Added P-norm pooling for CNNs (option as part of SubsamplingLayer configuration)
Iteration count persistence: stored/persisted properly in model configuration + fixes to learning rate schedules for Spark network training
LSTM: gate activation function can now be configured (previously: hard-coded to sigmoid)
UI:
- Added Chinese translation
- Fixes for UI + pretrain layers
- Added Java 7 compatible stats collection
- Improvements in front-end for handling NaNs
- Added UIServer.stop() method
- Fixed score vs. iteration moving average line (with subsampling)
Solved Jaxb/Jackson issue with Spring Boot based applications
RecordReaderDataSetIterator now supports NDArrayWritable for the labels (set regression == true; used for multi-label classification + images, etc)
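
As a rough illustration of the P-norm pooling mentioned above, the plain Java sketch below reduces a single pooling window to one value using the standard p-norm formula. It is not the SubsamplingLayer implementation; the class and method names are hypothetical and the window is passed as a flat array for simplicity.

```java
final class PNormPooling {
    // p-norm of one pooling window: (sum_i |x_i|^p)^(1/p).
    // p = 1 sums absolute values; as p grows, the result approaches the largest absolute value.
    static double poolWindow(double[] window, double p) {
        if (p <= 0.0) {
            throw new IllegalArgumentException("p must be positive");
        }
        double sum = 0.0;
        for (double x : window) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    public static void main(String[] args) {
        double[] window = {3.0, -4.0};
        System.out.println(poolWindow(window, 1.0));
        System.out.println(poolWindow(window, 2.0));
    }
}
```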

0.7.1 -> 0.7.2 Transition Notes

Activation functions (built-in): now specified using Activation enumeration, not String (String-based configuration has been deprecated)

0.7.1

RBM and AutoEncoder key fixes:
- Ensured visual bias updated and applied during pretraining.
- RBM HiddenUnit is the activation function for this layer; thus, established derivative calculations for backprop according to respective HiddenUnit.
RNG performance issues fixed for CUDA backend
OpenBLAS issues fixed for macOS, powerpc, linux.
DataVec is back to Java 7 now.
Multiple minor bugs fixed for ND4J/DL4J

0.6.0

Custom layer support
Support for custom loss functions
Support for compressed INDArrays, for memory saving on huge data
Native support for BooleanIndexing where applicable
Initial support for combined operations on CUDA
Significant performance improvements on CPU & CUDA backends
Better support for Spark environments using CUDA & cuDNN with multi-gpu clusters
New UI tools: FlowIterationListener and ConvolutionIterationListener, for better insights of processes within NN.
Special IterationListener implementation for performance tracking: PerformanceListener
Inference implementation added for ParagraphVectors, together with option to use existing Word2Vec model
Significantly decreased file size of the deeplearning4j api
nd4j-cuda-8.0 backend is available now for cuda 8 RC
Added multiple new built-in loss functions
Custom preprocessor support
Performance improvements to Spark training implementation
Improved network configuration validation using InputType functionality

0.5.0

FP16 support for CUDA
Better performance for multi-gpu
Including optional P2P memory access support
Normalization support for time series and images
Normalization support for labels
Removal of Canova and shift to DataVec: Javadoc, Github Repo
Numerous bug fixes
Spark improvements

0.4.0

Initial multi-GPU support viable for standalone and Spark.
Refactored the Spark API significantly
Added CuDNN wrapper
Performance improvements for ND4J
Introducing DataVec: Lots of new functionality for transforming, preprocessing, cleaning data. (This replaces Canova)
New DataSetIterators for feeding neural nets with existing data: ExistingDataSetIterator, Floats(Double)DataSetIterator, IteratorDataSetIterator
New learning algorithms for word2vec and paravec: CBOW and PV-DM respectively
New native ops for better performance: DropOut, DropOutInverted, CompareAndSet, ReplaceNaNs
Shadow asynchronous datasets prefetch enabled by default for both MultiLayerNetwork and ComputationGraph
Better memory handling with JVM GC and CUDA backend, resulting in significantly lower memory footprint

Multi-Project

Tutorials

How To Guides

Import into your favorite IDE

Prerequisites

Ensure that you clone the deeplearning4j project locally.

Before importing the project, a few things of note no matter what IDE you use:

One submodule (libnd4j) is a c++ project that uses maven to invoke a cmake build. You may wish to edit libnd4j separately in a cmake oriented IDE like VS Code, CLion, or Eclipse C/C++. In order to build a particular nd4j backend, libnd4j should already be compiled. By default, the relevant nd4j backends all look for a precompiled libnd4j in the libnd4j directory included within the same project.
Maven profiles for deeplearning4j matter a lot, especially if you want to run tests. Read more on the test profiles. For most code, nd4j-tests-cpu should probably be the main profile you use.
Deeplearning4j uses lombok to generate boilerplate code. Ensure you install the lombok plugin for your favorite IDE in order to use the project. Please follow the lombok setup instructions for your IDE.

Intellij

Once cloned locally, open intellij. Please follow the guide to import from external maven sources.

Once imported, please give the project time to download associated dependencies. You can verify the status of the project in the bottom right corner.

In order to enable the project to work, the following modifications need to be made.

Shaded modules

Eclipse Deeplearning4j has a set of shaded modules. Shaded modules are artifacts that re namespace a dependency to a different location in order to use it as a set of private dependencies that do not clash with other libraries that may also share the dependency.

Intellij does not handle this very well. In order to work around this, you need to exclude all projects under the nd4j/nd4j-shade folder individually. Right click on each folder. Go to Maven -> Ignore Projects.

Assuming you follow the other steps above (lombok, libnd4j, ...), you should be able to run any module you want.

Eclipse

Note: for now, the latest version of eclipse appears to fail upon first import. Suggestions for fixing this are welcome.

Once cloned locally, open eclipse. Please follow the guide to import from external maven sources. Importing your project into eclipse may take a while. Note that, due to the profile sensitive nature of the deeplearning4j suite, there may be issues when opening and building the project.

When the import of the project first finishes, a number of maven connector errors will be highlighted. Just click resolve all later and finish. Then let eclipse finish downloading sources and javadoc.

As of the latest version of eclipse, build errors may occur.

Contribute

How to contribute to the Eclipse Deeplearning4j source code.

Prerequisites

Before contributing, make sure you know the structure of all of the Eclipse Deeplearning4j libraries. As of early 2018, all libraries live in the Deeplearning4j repository. These include:

DeepLearning4J: Contains all of the code for learning neural networks, both on a single machine and distributed.
ND4J: “N-Dimensional Arrays for Java”. ND4J is the mathematical backend upon which DL4J is built. All of DL4J’s neural networks are built using the operations (matrix multiplications, vector operations, etc) in ND4J. ND4J is how DL4J supports both CPU and GPU training of networks, without any changes to the networks themselves. Without ND4J, there would be no DL4J.
DataVec: DataVec handles the data import and conversion side of the pipeline. If you want to import images, video, audio or simply CSV data into DL4J: you probably want to use DataVec to do this.
RL4J: Reinforcement Learning for Java. This set of libraries contains the ability to do reinforcement learning built on the deeplearning4j library.
Samediff: Built within the nd4j library, this library contains a tensorflow/pytorch like library for building data flow graphs.

We also have an extensive examples repository.

Ways to contribute

There are numerous ways to contribute to DeepLearning4J (and related projects), depending on your interests and experience. Here are some ideas:

Add new types of neural network layers (for example: different types of RNNs, locally connected networks, etc)
Add a new training feature
Bug fixes
DL4J examples: Is there an application or network architecture that we don’t have examples for?
Testing performance and identifying bottlenecks or areas to improve
Improve website documentation (or write tutorials, etc)
Improve the JavaDocs

There are a number of different ways to find things to work on. These include:

Looking at the issue trackers
Reviewing our Roadmap
Talking to the developers
Reviewing recent papers and blog posts on training features, network architectures and applications
Reviewing the website and examples - what seems missing, incomplete, or would simply be useful (or cool) to have?

General guidelines

Before you dive in, there are a few things you need to know. In particular, the tools we use:

Maven: a dependency management and build tool, used for all of our projects.
Git: the version control system we use
Project Lombok: Project Lombok is a code generation/annotation tool that is aimed to reduce the amount of ‘boilerplate’ code (i.e., standard repeated code) needed in Java. To work with source, you’ll need to install the Project Lombok plugin for your IDE
VisualVM: A profiling tool, most useful to identify performance issues and bottlenecks.
IntelliJ IDEA: This is our IDE of choice, though you may of course use alternatives such as Eclipse and NetBeans. You may find it easier to use the same IDE as the developers in case you run into any issues. But this is up to you.

Things to keep in mind:

Code should be Java 7 compliant
If you are adding a new method or class: add JavaDocs
You are welcome to add an author tag for significant additions of functionality. This can also help future contributors, in case they need to ask questions of the original author. If multiple authors are present for a class: provide details on who did what (“original implementation”, “added feature x” etc)
Provide informative comments throughout your code. This helps to keep all code maintainable.
Any new functionality should include unit tests (using JUnit) to test your code. This should include edge cases.
If you add a new layer type, you must include numerical gradient checks, as per these unit tests. These are necessary to confirm that the calculated gradients are correct
If you are adding significant new functionality, consider also updating the relevant section(s) of the website, and providing an example. After all, functionality that nobody knows about (or nobody knows how to use) isn’t that helpful. Adding documentation is definitely encouraged when appropriate, but strictly not required.
If you are unsure about something, ask us!

Eclipse Contributors

IP/Copyright requirements for Eclipse Foundation Projects

This page explains the steps required to contribute code to the projects in the eclipse/deeplearning4j GitHub repository.

Contributors (anyone who wants to commit code to the repository) need to do two things, before their code can be merged:

Sign the Eclipse Contributor Agreement (once)
Sign commits (each time)

Why Is This Required?

These two requirements must be satisfied for all Eclipse Foundation projects, not just DL4J and ND4J. A full list of Eclipse Foundation Projects can be found here:

By signing the ECA, you are essentially asserting that the code you are submitting is something that either you wrote, or that you have the right to contribute to the project. This is a necessary legal protection to avoid copyright issues.

By signing your commits, you are asserting that the code in that particular commit is your own.

Signing the Eclipse Contributor Agreement

You only need to sign the Eclipse Contributor Agreement (ECA) once. Here's the process:

Step 1: Sign up for an Eclipse account

This can be done at

Note: You must register using the same email as your GitHub account (the GitHub account you want to submit pull requests from).

Step 2: Sign the ECA

Go to and follow the instructions.

Signing Your Commits

Signing a New Commit

There are a few ways to sign commits. Note that you can use any of these options.

Option 1: Use -s When Committing on Command Line

Signing commits here is simple:

git commit -s -m "My signed commit"

Note the use of -s (lower case s) - upper-case S (i.e., -S) is for GPG signing (see below).

Option 2: Set up Bash Alias (or Windows cmd Alias) for Automated Signing

For example, you could set up the following alias in Bash:

alias gcm='git commit -s -m'

Then committing would be done with the following:

gcm "My Commit"

For Windows command line, similar options are available through a few mechanisms.

One simple way is to create a gcm.bat file with the following contents, and add it to your system path:

You can then commit using the same process as above (i.e., gcm "My Commit")

Option 3: Use GPG Signing

For details on GPG signing, see

Note that this option can be combined with aliases (above), as in alias gcm='git commit -S -m' - note the upper case -S for GPG signing.

Option 4: Commit using IntelliJ with Auto Signing

IntelliJ can be used to perform git commits, including signed commits.

Checking If A Commit Is Signed

After performing a commit, you can check whether it is signed in a few different ways. One way is to use git log --show-signature -1 to show the signature for the last commit (use -5 to show the last 5 commits, for example).

The output will look like:

The top commit is unsigned, and the bottom commit is signed (note the presence of the Signed-off-by).
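
If you want to automate this check, for example in a local helper script, the presence of the sign-off line can be detected with a few lines of plain Java. This is only a sketch: the class and method names are hypothetical, it is not part of any Deeplearning4j or Eclipse tooling, and it assumes the trailer has the form "Signed-off-by: Name <email>" that git commit -s appends to the commit message.

```java
final class SignOffCheck {
    // Returns true if any line of the commit message is a Signed-off-by trailer.
    static boolean isSignedOff(String commitMessage) {
        for (String line : commitMessage.split("\\r?\\n")) {
            if (line.trim().startsWith("Signed-off-by:")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String signed = "Fix workspace bug\n\nSigned-off-by: Jane Doe <jane@example.com>\n";
        String unsigned = "Fix workspace bug\n";
        System.out.println(isSignedOff(signed));
        System.out.println(isSignedOff(unsigned));
    }
}
```

The commit message itself would have to be obtained separately, for instance from the output of git log.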

If You Forget to Sign a Commit - Amending the Last Commit

If you forgot to sign the last commit, you can use the following command:

git commit --amend --signoff

If You Forget to Sign Multiple Commits

Suppose your branch has 3 new commits, all of which are unsigned:

One simple way is to squash and sign these commits. To do this for the last 3 commits, use the following: (note you might want to make a backup first)

The result:

You can confirm that the commit is signed using git log -1 --show-signature as shown earlier.

Note that your commits will be squashed once they are merged to master anyway, so the loss of the commit history does not matter.

If you are updating an existing PR, you may need to force push using -f (as in git push X -f).

Developer Docs

Github Actions/Build Infra

Github actions Configuration Overview

Overview of a Github Actions Configuration

Each workflow has 10 parameters for manually invoking builds. Builds are invoked manually because of the different ways a release can break. Being manual also allows us to re-invoke only the parts of a build we need, rather than the whole release pipeline.

Most workflows implement a matrix structure for handling different combinations of builds related to the following:

1. Platform specific optimizations: On windows/linux/mac we allow cpu + optional linking against mkldnn. Each combination is enumerated and run as part of a matrix build on github actions.
2. Cuda, optional cudnn: We also allow optional linking against cudnn for gpu routines.

Input parameters:

buildThreads: This is the number of build threads used for compilation in libnd4j. This is the equivalent of make -j. For specific platforms that use more memory, 1 is the recommended value. On self hosted setups, you may use more threads to make builds run faster.
deployToReleaseStaging: 0 or 1. If 1, this will create a staging repository on oss sonatype. Otherwise, it will deploy to ossrh snapshots. Snapshots is the default.
releaseVersion: This is the intended release version to be converted to from snapshots. The script is run converting the versions of every module to that specific version intended for release. This is what will get uploaded to a staging repository for release. Otherwise, all intended versions should be SNAPSHOT.
snapshotVersion: The current in development snapshot version
releaseRepoId: If blank, then a new staging repository for a version is created. Otherwise, a staging repository id should be obtained from the ossrh nexus sonatype. This releaseRepoId should be passed to subsequent builds so all of the artifacts associated with a version get propagated to 1 place.
serverId: This should be ossrh 90% of the time. A github profile is also available for use with github actions.
modules: The maven modules to build. This is fairly raw and error prone. Typical usage is to skip a libnd4j compile, which can speed builds up significantly.
libnd4jDownload/libnd4jUrl: In tandem with modules, you can specify a previously compiled libnd4j zip file distribution for download. The builds will download that libnd4j distribution and use it for linking. This can be handy when recompiling the nd4j-native/nd4j-cuda backends for a specific platform without needing to recompile the whole c++ codebase. In a matrix build, the url is sourced from a hard coded file name, and each file is updated to point to a zip file distribution appropriate for that individual matrix build. This was done because 1 url is not going to be suitable for every matrix build.
runsOn: This is the operating system upon which to run the build. For linux, this defaults to ubuntu-16.04. For windows, windows-2019. self-hosted can also be specified for faster builds.

Matrix builds

Many configurations on cpu and cuda require a matrix based build structure to capture the various combinations of optimization and software versions people may want to use. In order to accommodate these workflows, we need to attach variables proxying the values of the manual inputs to the individual matrix workers themselves. These parameters are analogous to the parameters described above. We will not repeat the descriptions here, but each one appears in the workflow as a $ expression referencing SOME_VALUE, where SOME_VALUE is one of the values above.

The configuration to look for is as follows:

Expected timings

CUDA: Most cuda builds take 4-5 hours. Both windows and linux on GH actions just download the cuda distribution and compile things on their respective platforms.
CPU builds: From scratch libnd4j + cpu builds typically take 1-2 hours max. If a build takes longer than that, something may be wrong with it.

Build error causes

Out of disk: It is very common for a github actions VM to run out of disk. If a build fails with no logs afterwards and all steps terminated, this may be one of the reasons.
Out of memory: Sometimes builds run out of memory. A few common causes include:
- Clang out of memory on android: depending on the number of build threads assigned, it is easy for clang to run out of memory
- Maven javadoc: The maven javadoc plugin for bigger projects can use a ton of ram and crash a job
Network failures: Maven can sometimes (rarely) fail to download certain dependencies in the middle of a job

Environment variables:

MAVEN_GPG_KEY: The maven gpg key secret for a release
CROSS_COMPILER_DIR: For the pi_build.sh script in libnd4j. This contains the root directory for cross compiler invocation. We need this because all cross compilation for the various libnd4j builds happens on x86. We cross compile for speed reasons, which also makes it easy to run on github actions.
DEBIAN_FRONTEND: This ensures that debian commands don't prompt for yes/no by default
GITHUB_TOKEN: This is for authentication with github actions
BUILD_USING_MAVEN: This is for pi_build.sh. This toggles (0 or 1) whether to use maven or to invoke buildnativeoperations.sh in the libnd4j root directory directly.
NDK_VERSION: Default is r21d. Libnd4j's android build is currently compiled with the android NDK r21.
CURRENT_TARGET: This variable is for pi_build.sh. It tells pi_build.sh which architecture to build for.
PUBLISH_TO: The repo to publish to for releases or snapshots. Valid values are github or ossrh. These are repositories defined in the deeplearning4j root pom.
OPENBLAS_PATH: We compile libnd4j against openblas for several different cpus. Openblas is manually downloaded and linked against. This specifies the path to the download for the libnd4j cmake invocation.
MAVEN_USERNAME: The user name used to log in to the ossrh maven repository
MAVEN_PASSWORD: The password used to log in to the ossrh maven repository
MAVEN_GPG_PASSPHRASE: The gpg password for signing artifacts for uploading to maven central
DEPLOY_TO: Valid values are either ossrh or github.
LIBND4J_BUILD_THREADS: This is the equivalent of make -j. It specifies the number of threads that should be used to compile libnd4j
PERFORM_RELEASE: Whether to perform a release or not (0 or 1)
RELEASE_VERSION: The version to be released to maven central. change-versions.sh will be run to change versions throughout the code base from the snapshot version to the intended release version.
SNAPSHOT_VERSION: The current snapshot version to be changed when performing a release. After a release is conducted, this should generally be the next development version.
RELEASE_REPO_ID: Leave this empty when first creating a release repository in combination with DEPLOY set to 1. Afterwards, note which staging repository id gets created in the ossrh interface when publishing to maven central. Use that id for further builds to ensure that all uploads for 1 version are synchronized to 1 staging repository.
MODULES: Extra maven flags for pi_build.sh if more flags are needed (such as for debugging or only building specific modules)
LIBND4J_URL: Used when building nd4j-native. If you do not want to recompile libnd4j for your particular build, you can instead skip this step and specify a libnd4j zip file download (generally built with the maven assembly plugin)

Javacpp

DL4J and Javacpp

DL4J and Javacpp overview

DL4J heavily depends on JavaCPP for its interop between java and platform optimized c++ libraries. However, due to our usage of JNI, this comes with certain complexities in the build that anyone should be aware of.

The following modules rely on javacpp as part of their build process:

1. nd4j-native
2. nd4j-native-presets
3. nd4j-cuda
4. nd4j-cuda-presets

Together, these libraries comprise our nd4j backends. Leveraging libnd4j, javacpp handles linking each nd4j backend against the libnd4j c++ codebase. This linking is done using a libnd4j home, which contains all of the include files and necessary binary files for specific platforms. By default, nd4j backends and the libnd4j code base are compiled within the same build step. This is the recommended default. For specific circumstances, however, a libnd4j release is also uploaded to maven central as a zip file and can be used in place of libnd4j compilation. See our documentation for more information on this.

Each backend consists of 2 modules

The codebase: This represents the actual nd4j backend logic for specific platforms. Conceptually, this logic will be anything that a developer should need to control such as memory management, environment variables, or other execution logic.
The presets: This is a similar concept in spirit to the JavaCPP presets. In order to avoid a race condition between the backend and the presets compilation, this is a separate dependency that exists only to handle interop between the libnd4j code base and the java frontend. The backend described above then contains the rest of the logic needed for execution of the math operations on specific platforms.

Compilation flow

After a libnd4j build is executed for a specific platform, we need to leverage javacpp to actually link against libnd4j to create a complete libnd4j backend. When invoking a maven build, the javacpp maven plugin is used to actually invoke a build. The presets will be compiled first. Generally the presets are just 1 or 2 classes containing a description of how to map the actual nd4j code base to the libnd4j codebase.

Next, the actual backend is compiled with a dependency on the above presets code base. The javacpp plugin will leverage the description from the presets we specify as a dependency and facilitate linking against a LIBND4J_HOME (a folder which contains the platform specific libnd4j binaries and include sources) specified by the user. In the actual plugin declaration on the backend pom.xml we include the target presets class to use for our particular backend.

Note: This still requires the native platform specific tools to be installed since binaries are generated for each platform. Please see our github actions for instructions on specific platforms.

-platform dependencies

Nd4j reuses javacpp's notion of a -platform library. This is a curated set of dependencies most users will use as part of a build. Each backend will have an associated -platform artifact so users don't have to deal with maven classifiers. See for how to leverage this artifact.

Caution to users: By default, this means that a large number of dependencies for all platforms will be included. If you do not need dependencies for all platforms, then please read the above documentation to figure out how to build a jar for your specific platform.

Generally, the main thing to know is when you build your application, use:

A comprehensive list of classifiers can be found Note that each library we link against such as will also have a similar set of classifiers.

Javacpp platform specific profiles

Throughout the dl4j pom.xml files there are platform specific profiles that set up dependencies. An example can be found here. These profiles help us dynamically figure out which platform someone is building for.

Running javacpp on termux + android/lineageos

A testing setup the team uses for testing android involves lineageos, termux, and some arm32 based open jdk debian files that can be found

In order to bootstrap this environment, a from scratch install of the latest lineageos flashed on an sd card using the raspberry pi is suggested.

Afterwards, install

In order to properly set up the test environment, you need to execute your test from the command line as follows:

A proper execution environment after the above jdk is installed involves manually setting the environment as follows:

This will set up the jdk + maven to ignore ssl errors due to issues with cacerts + termux. This is largely irrelevant for our small testing use case, but is not recommended for production environments.

Redist artifacts

Redist artifacts are easy ways of distributing dependencies without installation.

Note that for the presets that are part of nd4j (nd4j-cuda-presets and nd4j-native-presets) only the latest versions support redist artifacts. The presets preload versions only support pre loading (eg: linking against libraries from the javacpp cache) against the latest version. This is because during pre loading, certain version numbers are checked for.

Release

How to conduct a release to Maven Central

Deeplearning4j has several steps to a release. Below is a brief outline with follow on descriptions.

Compile libnd4j for different cpu architectures
Ensure the current javacpp dependencies such as python, mkldnn, cuda, .. are up to date
Run all integration tests on core platforms (windows, mac, linux) with both cpu and gpu
Create a staging repository for testing using github actions running manually on each platform
Update the examples to be compatible with the latest release
Run the deeplearning4j-examples as a litmus test on all platforms (including embedded) to sanity check platform specific numerical bugs using the staging repository
Double check any user related bugs to see if they should block a release
Hit release button
Perform follow up release of -platform projects under same version
Tag release

Compile libnd4j on different cpu architectures

Compiling libnd4j on different cpu architectures ensures there is platform optimized math in c++ for each platform. The libnd4j code base is a self contained cmake project that can be run on different platforms. In each workflow there are steps for deploying for each platform.

At the core of compiling libnd4j from source is a maven pom.xml that is run as part of the overall build process. It invokes our buildnativeoperations.sh script with various parameters that then get passed to our overall cmake structure for compilation. This script exists to formalize some of the required parameters for invoking cmake. Any developer is welcome to invoke cmake directly.

Platform compatibility

We currently compile libnd4j on ubuntu 16.04. This means glibc 2.23.
For our cuda builds, we use gcc7.
Users of older glibc versions may need to compile from source. For our standard release, we try to keep the glibc requirement reasonably old, but we do not support end of life linux distributions for public builds.

Platform specific helpers

Each build of libnd4j links against an accelerated backend for and convolution operations such as , , or The implementations for each platform can be found

Ensure the current javacpp dependencies such as python, mkldnn, cuda, .. are up to date

This step just ensures that the dl4j release matches the current state of the dependencies provided by javacpp on maven central. This affects every module including python4j, nd4j-native/cuda, and datavec-image, among others. The versions of everything can be found in the top level pom.xml. The general convention is the library version followed by a - and the version of javacpp that the library version uses.

Of note here is that certain older versions of libraries can use older javacpp versions. It is recommended that the desired version be up to date if possible. Otherwise, if an older version of javacpp is the only version available, this is generally ok.

Run all integration tests on core platforms (windows, mac, linux) with both cpu and gpu

We run all of the major integration tests on the core major platforms where higher end compute is accessible. This is generally a bigger machine. It is expected that some builds can take up to 2 hours depending on the specs of the desired machine.

This step may also involve invoking tests with specific tags if only running a subset of tests is desired. This can be achieved using the -Dgroups flag.

Update the examples to be compatible with the latest release

To ensure the examples stay compatible with the current release, we also tag the release version to be the latest version found on maven central. This step may also involve adding or removing examples for new or deprecated features respectively.

Ensure different classifiers work

Different supported cuda versions with and without cudnn
Onednn and associated classifiers per platform

Android

Ensure testing happens on the android emulator.

Run the deeplearning4j-examples as a litmus test on all platforms (including embedded)

The examples contain a set of tests which just allow us to run maven clean test on a small number of examples. Instead of us picking examples manually, we can just run mvn clean test on any platform we need by just specifying a version of dl4j to depend on and usually a

Sometimes users will raise issues right before a release that can be critical. It is at the sole discretion of the maintainers whether to ask the user to use snapshots or to wait for a follow on version. For certain fixes, we will publish quick bugfix releases. If your team has specific requirements on a release, please contact us on the community forums.

Hit release button

This means hitting the release button, which initiates a sync of the staging repository with the desired version to maven central. The sync usually takes 2 hours or less.

Ensure a tag exists

After a release happens, a version update to the stable version and a github tag need to happen. This is achieved in the desktop app as follows:

1. Go to History
2. Right click on the target commit you want to tag
3. Click tag
4. Push the revision
5. Update the version back to snapshot after the tag

Testing


Parameters for testing

test.heap.size: The heap size used for maven surefire plugin sub processes
test.offheap.size: The off heap size used for maven surefire sub processes. This is very important for configuration (especially on gpu systems)

Test resources

In order to run the deeplearning4j tests, many pretrained models and other resources are required. Ensure the test resources are available as a dependency on your classpath. They live in a big repository that needs to be installed with mvn clean install in order to run the tests properly. You can do this by adding -Ptestresources to your test execution when running the tests from maven.

Test profiles for enabling nd4j backends

When running deeplearning4j's tests, there are 2 main profiles to be aware of: nd4j-tests-cpu and nd4j-tests-cuda. These each enable running cpu or gpu tests respectively across the whole code base. Please ensure one of these is selected when running tests.

testresources: Used to add the test resources used for nd4j.

Test categories

Deeplearning4j uses junit 5 tags to categorize tests into different types. All of the tag names used throughout the code base can be found in nd4j-common-tests. Nd4j-common-tests is included as a dependency for all tests and has a few reusable utilities used throughout the code base for tests. This makes it a great location to put common utilities we want to use throughout the code base. The tag names are mainly there to categorize tests that can take longer or use more resources, so we can avoid running those dynamically depending on the size of the machine we are running tests on.

GPUs and multi threaded boxes

Note that when running gpu tests on a box with more than 1 gpu, the tests can run out of memory if test.heap.size is not at least 4g. Also of note, is when running tests

Build From Source

Instructions to build all DL4J libraries from source.

A reference for building dl4j from source can be found for every platform in our workflows. For maintenance reasons, we would prefer to have a canonical source of up to date build information for users rather than out of date install instructions in this guide. This guide will contain specific long lived tips for how to interpret the workflows and what to consider when building.

For an overview of the GitHub actions workflows see the overview doc

This document will cover the specific components of the build by platform rather than step through what's already in the workflows. If you have suggestions for improving this document, please comment over at the community forums

Core steps:

Building libnd4j for your specific platform
Linking the nd4j backend you want to compile for against libnd4j via JavaCPP
Compiling the rest of the code in to jar files

Key concepts

Libnd4j is a CMake based c++ project that supports running optimized math code on different architectures. Its sole focus is being a tiny self contained library for running math kernels. It can link against optimized BLAS routines, platform specific CNN libraries such as OneDNN and CuDNN, and contains hundreds of math kernels for implementing neural networks and other math routines.
Maven: Maven is the core build tool for deeplearning4j. Understanding maven is key to building deeplearning4j from source
Maven and CMake: For compiling libnd4j, we invoke a buildnativeoperations.sh wrapper script via maven. buildnativeoperations.sh in turn automatically sets up CMake to then build the c++ project
pi_build.sh: This is our build script for embedded and ARM based platforms. It focuses on cross compilation running on a Linux x86 based platform.
buildnativeoperations.sh: The main build script for libnd4j. It initializes CMake and invokes CMake compilation for the user on whatever platform the user is currently on unless the user specifies an alternative platform. Specifying a different platform is possible for android for example.

Building for x86_64

The main considerations for building on x86_64 are:

Whether to compile for avx2 or avx512
Whether to use OpenBLAS or MKL
Whether to link against OneDNN

From there, the normal platform specific libraries should be installed beforehand. Up to date install instructions can be found in our CPU builds for Windows, Mac and Linux.

Building for ARM

ARM based builds all link against the armcompute library by default and, as mentioned above, use the pi_build.sh script for building libnd4j on specific platforms. Note that pi_build.sh can also be used to compile all of dl4j for a specific project.

pi_build.sh mainly focuses on cross compilation.

In order to properly use the pi_build.sh script, a number of environment variables should be set. Per platform, you can find these environment variables in the final build step under the environment section.

If you would like to compile deeplearning4j on an actual ARM device, please use the normal buildnativeoperations.sh workflow.

Building for CUDA

In order to compile deeplearning4j for a particular CUDA version, you must first invoke change-cuda-versions.sh in the root directory:

./change-cuda-versions.sh $YOUR_CUDA_VERSION

This will ensure that all library versions are set to the appropriate version. Ensure that the CUDA toolkit you need is installed. If you intend on using CuDNN, ensure that is also installed correctly. For installing CUDA, consider using our install scripts as a reference if you intend on doing automated installs.

Jetson nano users: please see this thread for successfully compiling deeplearning4j on Jetson nano.

In short: It relies on CUDA 10.0. The JavaCPP presets for CUDA are also only compiled for arm64 for CUDA 10.0. You can find the supported CUDA versions for CUDA 10.0 here. If you would like something more up to date, please feel free to contact us over at our forums. As of 1.0.0-M1.1 you can also use updated dependencies:

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
</dependency>

Note for windows users

We use msys2 for compiling libnd4j. CUDA requires MSVC to be installed in order to properly compile CUDA kernels. If you want to compile libnd4j for CUDA from source, please ensure you first invoke the vcvars.bat script in a cmd terminal, then launch msys2 manually. For more specifics, please see our Windows CUDA 11 and 11.2 build files.

Reference

Examples Tour

Brief tour of available examples in DL4J.

Deeplearning4J has a wealth of examples of how to use its many parts. You can find the examples in the deeplearning4j-examples repository.

Prerequisites

The examples repository consists of several separate Maven Java projects, each with its own pom file. Maven is a popular build automation tool for Java projects. The contents of a "pom.xml" file dictate the configuration. Read more about how to configure Maven in the Maven documentation.

Users can also refer to the to get started with a clean project from scratch.

Build tools are considered standard software engineering best practice. Besides this the complexities posed by the projects in the DL4J ecosystem make dependencies too difficult to manage manually. All the projects in the DL4J ecosystem can be used with other build tools like Gradle, SBT etc. More information on that can be found .

Example Content

Projects are based on what functionality the included examples demonstrate to the user and not necessarily which library in the DL4J stack the functionality lives in.

Examples in a project are in general separated into "quickstart" and "advanced".

Each project README also lists all the examples it contains, with a recommended order to explore them in.

This project contains a set of examples that demonstrate use of the high level DL4J API to build a variety of neural networks. Some of these examples are end to end, in the sense they start with raw data, process it and then build and train neural networks on it.
This project contains a set of examples that demonstrate how to import Keras h5 models and TensorFlow frozen pb models into the DL4J ecosystem. Once imported into DL4J these models can be treated like any other DL4J model - meaning you can continue to run training on them or modify them with the transfer learning API or simply run inference on them.
This project contains a set of examples that demonstrate how to do distributed training, inference and evaluation in DL4J on Apache Spark. DL4J distributed training employs a "hybrid" asynchronous SGD approach - further details can be found in the distributed deep learning documentation
This project contains a set of examples that demonstrate how to leverage multiple GPUs for data-parallel training of neural networks for increased performance.
This project contains a set of examples that demonstrate the SameDiff API. SameDiff (which is part of the ND4J library) can be used to build lower level auto-differentiating computation graphs. An analogue to the SameDiff API vs the DL4J API is the low level TensorFlow API vs the higher level of abstraction Keras API.
This project contains a set of examples that demonstrate how raw data in various formats can be loaded, split and preprocessed to build serializable (and hence reproducible) ETL pipelines.
This project contains a set of examples that demonstrate how to manipulate NDArrays. The functionality of ND4J demonstrated here can be likened to NumPy.
This project contains a set of examples that demonstrate usage of the Arbiter library for hyperparameter tuning of Deeplearning4J neural networks.
This project contains examples of using RL4J, the reinforcement learning library in DL4J.
This project contains an Android example project, that shows DL4J being used in an Android application.

Feedback & Contributions

While this set of examples doesn't cover all the features available in DL4J, the intent is to cover the functionality required by most users, both beginners and advanced. File an issue if you have feedback or feature requests that are not covered here. We are also available via our community forum for questions. We welcome contributions from the community. We love hearing from you. Cheers!

Explanation

Configuration

Backends

Hardware setup for Eclipse Deeplearning4j, including GPUs and CUDA.

ND4J works atop so-called backends, or linear-algebra libraries, such as nd4j-native (CPUs) and nd4j-cuda-10.2 (GPUs), which you can select by pasting the right dependency into your project’s pom.xml file.

ND4J backends for GPUs and CPUs

You can choose GPUs or native CPUs for your backend linear algebra operations by changing the ND4J dependencies in your project's pom.xml file. Your selection will affect both ND4J and DL4J as used in your application.

If you have CUDA v9.2+ installed and NVIDIA-compatible hardware, then your dependency declaration will look like:

<dependency>
 <groupId>org.nd4j</groupId>
 <artifactId>nd4j-cuda-11.2</artifactId>
 <version>1.0.0-M1.1</version>
</dependency>

As of now, the artifactId for the CUDA versions can be one of nd4j-cuda-11.0 or nd4j-cuda-11.2. Generally, the last 2 CUDA versions are supported for a given release.

You can also find the available CUDA versions via Maven Central search or in the Release Notes.

Otherwise you will need to use the native implementation of ND4J as a CPU backend:

<dependency>
 <groupId>org.nd4j</groupId>
 <artifactId>nd4j-native</artifactId>
 <version>1.0.0-M1.1</version>
</dependency>

Building for Multiple Operating Systems

If you are developing your project on multiple operating systems/system architectures, you can add -platform to the end of your artifactId which will download binaries for most major systems.

<dependency>
 ...
 <artifactId>nd4j-native-platform</artifactId>
 ...
</dependency>

Bundling multiple Backends

To enable different backends at runtime, set their priorities via the following environment variables:

BACKEND_PRIORITY_CPU=SOME_NUM
BACKEND_PRIORITY_GPU=SOME_NUM

The relative priorities then determine dynamically which backend type is used.
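
As a rough illustration of how two such priorities could be compared, here is a minimal sketch in plain Java. It is not ND4J's actual selection logic: the class name BackendPriorityDemo, the default priority of 0, and the rule that the higher number wins (with ties going to the CPU) are all assumptions made for this example. Only the two environment variable names come from the text above.

```java
import java.util.Map;

// Hypothetical helper, not part of ND4J: compares the two priority variables.
final class BackendPriorityDemo {

    // Parses a priority value, falling back to 0 (an assumed default)
    // when the variable is unset or not a valid integer.
    static int priority(Map<String, String> env, String name) {
        String raw = env.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return 0;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Returns "GPU" only when its priority is strictly higher.
    // "Higher wins" and "ties go to CPU" are assumptions of this sketch.
    static String preferred(Map<String, String> env) {
        int cpu = priority(env, "BACKEND_PRIORITY_CPU");
        int gpu = priority(env, "BACKEND_PRIORITY_GPU");
        return gpu > cpu ? "GPU" : "CPU";
    }

    public static void main(String[] args) {
        System.out.println("Preferred backend: " + preferred(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the comparison easy to try with different values without changing the real process environment.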

CuDNN

See our page on CuDNN.

CUDA Installation

Check the NVIDIA guides for instructions on setting up CUDA on the NVIDIA website.

Troubleshooting

Nd4jBackend$NoAvailableBackendException

 org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
    at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:221)
    at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5091)
    ... 2 more

There are multiple reasons why you might run into this error message.

You haven't configured an ND4J backend at all.
You have a jar file that doesn't contain a backend for your platform.
You have a jar file that doesn't contain service loader files.

You haven't configured any ND4J Backend

Read this page and add an ND4J backend to your dependencies.

You have a jar file that doesn't contain a backend for your platform.

This happens when you use a non -platform type backend dependency definition. In this case, only the Backend for the system that the jar file was built on will be included.

To solve this issue, use nd4j-native-platform instead of nd4j-native if you are running on CPU, and nd4j-cuda-11.2-platform instead of nd4j-cuda-11.2 when using the GPU backend.

If the jar file only contains the GPU backend, but your system has no CUDA capable (CC >= 3.5) GPU or CUDA isn't installed on the system, the CPU Backend should be used instead.

You have a jar file that doesn't contain service loader files.

ND4J uses the Java ServiceLoader in order to detect which backends are available on the class path. Depending on your uberjar packaging configuration, those files might be stripped away or broken.

To double check that the required files are included, open your uberjar and make sure it contains /META-INF/services/org.nd4j.linalg.factory.Nd4jBackend. Then open the file, and make sure there are entries for all of your configured backends.

If your uberjar does not contain that file, or if not all of the configured backends are listed there, you will have to reconfigure your shade plugin. See ServicesResourceTransformer documentation for how to do that.
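
The manual check described above can also be scripted. The following is a minimal sketch using only the Java standard library. The class name BackendServiceFileCheck is hypothetical and not part of ND4J. It opens a jar, looks for the service loader file named above, and lists the backend class names it contains.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical diagnostic helper, not part of ND4J.
final class BackendServiceFileCheck {

    // The service loader file named in the text (no leading slash inside a jar).
    static final String SERVICE_FILE =
            "META-INF/services/org.nd4j.linalg.factory.Nd4jBackend";

    // Returns the class names listed in the service file,
    // or an empty list if the file is missing from the jar.
    static List<String> listedBackends(String jarPath) throws IOException {
        List<String> backends = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(SERVICE_FILE);
            if (entry == null) {
                return backends;
            }
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Service files allow '#' comments and blank lines; skip both.
                    int hash = line.indexOf('#');
                    String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                    if (!name.isEmpty()) {
                        backends.add(name);
                    }
                }
            }
        }
        return backends;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: BackendServiceFileCheck <path-to-uberjar>");
            return;
        }
        List<String> backends = listedBackends(args[0]);
        if (backends.isEmpty()) {
            System.out.println("No backend entries found in " + SERVICE_FILE);
        } else {
            for (String backend : backends) {
                System.out.println("Listed backend: " + backend);
            }
        }
    }
}
```

An empty result means the file is missing or lists no backends, which matches the two failure cases described above.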

CPU

CPU and AVX support in ND4J/Deeplearning4j

What is AVX, and why does it matter?

AVX (Advanced Vector Extensions) is a set of CPU instructions for accelerating numerical computations. See for more details.

Note that AVX only applies to nd4j-native (CPU) backend for x86 devices, not GPUs and not ARM/PPC devices.

Why AVX matters: performance. You want to use the version of ND4J compiled with the highest level of AVX supported by your system.

AVX support for different CPUs - summary:

Most modern x86 CPUs: AVX2 is supported
Some high-end server CPUs: AVX512 may be supported
Old CPUs (pre 2012) and low power x86 (Atom, Celeron): No AVX support (usually)

Note that CPUs supporting later versions of AVX also support all earlier versions. This means it's possible to run a generic x86 or AVX2 binary on a system supporting AVX512. However, it is not possible to run binaries built for later versions (such as avx512) on a CPU that doesn't have support for those instructions.
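
The compatibility rule can be expressed as a simple ordering. The sketch below is plain Java for illustration only. The class AvxCompatibility and its Level enum are hypothetical names, not an ND4J API.

```java
// Hypothetical illustration, not an ND4J API.
final class AvxCompatibility {

    // Levels are declared in increasing order, so ordinal() encodes
    // "a CPU supporting a later level also supports all earlier ones".
    enum Level { GENERIC_X86, AVX2, AVX512 }

    // A binary can run when the CPU supports at least the level
    // the binary was built for.
    static boolean canRun(Level binaryLevel, Level cpuLevel) {
        return cpuLevel.ordinal() >= binaryLevel.ordinal();
    }

    public static void main(String[] args) {
        System.out.println(canRun(Level.AVX2, Level.AVX512));   // true
        System.out.println(canRun(Level.AVX512, Level.AVX2));   // false
    }
}
```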

In version 1.0.0-beta6 and later you may get a warning as follows, if AVX is not configured optimally:

This warning has been removed in more recent versions as it's more confusing to users and out of date.

Configure mkl usage

When using the nd4j-native backend on intel platforms, our openblas bindings also give the ability to use mkl instead. In order to use mkl, set the system property as follows, either on launch or before Nd4j is initialized with Nd4j.create():

Configuring AVX in ND4J/DL4J

As noted earlier, for best performance you should use the version of ND4J that matches your CPU's supported AVX level.

ND4J's default configuration (when just including the nd4j-native or nd4j-native-platform dependencies without maven classifier configuration) is "generic x86" (no AVX).

To configure AVX2 and AVX512, you need to specify a classifier for the appropriate architecture.

The following binaries (nd4j-native classifiers) are provided for x86 architectures:

Generic x86 (no AVX): linux-x86_64, windows-x86_64, macosx-x86_64
AVX2: linux-x86_64-avx2, windows-x86_64-avx2, macosx-x86_64-avx2
AVX512: linux-x86_64-avx512

As of 1.0.0-M1, the following combinations are also possible with onednn:

Generic x86 (no AVX): linux-x86_64-onednn, windows-x86_64-onednn, macosx-x86_64-onednn
AVX2: linux-x86_64-onednn-avx2, windows-x86_64-onednn-avx2, macosx-x86_64-onednn-avx2
AVX512: linux-x86_64-onednn-avx512
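
The naming pattern in the two classifier lists above can be expressed as a small helper. This is a sketch in plain Java, assuming the pattern os-x86_64[-onednn][-avxlevel] holds for exactly the combinations listed; Nd4jNativeClassifier is a hypothetical name and not part of ND4J.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper only (not an ND4J class): builds an nd4j-native
// classifier string following the lists above.
final class Nd4jNativeClassifier {
    private static final List<String> OSES = Arrays.asList("linux", "windows", "macosx");
    private static final List<String> LEVELS = Arrays.asList("generic", "avx2", "avx512");

    /**
     * @param os       "linux", "windows" or "macosx"
     * @param avxLevel "generic", "avx2" or "avx512"
     * @param onednn   whether to request the onednn variant
     */
    static String of(String os, String avxLevel, boolean onednn) {
        if (!OSES.contains(os) || !LEVELS.contains(avxLevel)) {
            throw new IllegalArgumentException("Unknown os or AVX level: " + os + ", " + avxLevel);
        }
        // The lists above only include AVX512 binaries for Linux.
        if (avxLevel.equals("avx512") && !os.equals("linux")) {
            throw new IllegalArgumentException("AVX512 classifiers are only listed for linux");
        }
        StringBuilder sb = new StringBuilder(os).append("-x86_64");
        if (onednn) {
            sb.append("-onednn");
        }
        if (!avxLevel.equals("generic")) {
            sb.append('-').append(avxLevel);
        }
        return sb.toString();
    }
}
```

The returned string is what would go into the classifier element of the second nd4j-native dependency.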

Example: Configuring AVX2 on Windows (Maven pom.xml)

Example: Configuring AVX512 on Linux (Maven pom.xml)

Example: Configuring AVX512 on Linux with onednn (Maven pom.xml)

Note that you need both nd4j-native dependencies - with and without the classifier.

In the examples above, it is assumed that a Maven property nd4j.version is set to an appropriate ND4J version such as 1.0.0-M1.1.

Cudnn

Using the NVIDIA cuDNN library with DL4J.

Using Deeplearning4j with cuDNN

There are two ways of using cuDNN with Deeplearning4j. One is an older way that is built into the various Deeplearning4j layers at the Java level.

The other is to use the new ND4J CUDA bindings that link to cuDNN at the C++ level. Both are described below: the newer way first, followed by the old way.

Cudnn setup

The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA:

NVIDIA cuDNN

Note there are multiple combinations of cuDNN and CUDA supported. Deeplearning4j's cuda support is based on javacpp's cuda bindings. The way to read the versioning is: cuda version - cudnn version - javacpp version. For example, if the cuda version is set to 11.2, you can expect us to support cudnn 8.1.
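
To make the versioning scheme concrete, here is a minimal sketch that splits such a version string into its three parts. CudaBindingVersion is a hypothetical helper, and the string "11.2-8.1-1.5.5" used below is only an assumed example value following the cuda - cudnn - javacpp pattern.

```java
// Illustrative sketch only: a hypothetical helper, not part of JavaCPP or ND4J.
final class CudaBindingVersion {
    final String cuda;
    final String cudnn;
    final String javacpp;

    private CudaBindingVersion(String cuda, String cudnn, String javacpp) {
        this.cuda = cuda;
        this.cudnn = cudnn;
        this.javacpp = javacpp;
    }

    /** Parses a version of the form "cudaVersion-cudnnVersion-javacppVersion". */
    static CudaBindingVersion parse(String version) {
        // Limit of 3 keeps any further dashes (e.g. a qualifier) in the last part.
        String[] parts = version.split("-", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected cuda-cudnn-javacpp, got: " + version);
        }
        return new CudaBindingVersion(parts[0], parts[1], parts[2]);
    }
}
```

With the assumed example value, CudaBindingVersion.parse("11.2-8.1-1.5.5") yields cuda 11.2, cudnn 8.1 and javacpp 1.5.5.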

To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (/usr/local/cuda/lib64/ on Linux, /usr/local/cuda/lib/ on Mac OS X, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ on Windows).

Alternatively, in the case of the most recent supported cuda version, cuDNN comes bundled with the "redist" package of the JavaCPP Presets for CUDA. After agreeing to the license, we can add the following dependencies instead of installing CUDA and cuDNN:

 <dependency>
     <groupId>org.bytedeco</groupId>
     <artifactId>cuda-platform-redist</artifactId>
     <version>$CUDA_VERSION-$CUDNN_VERSION-$JAVACPP_VERSION</version>
 </dependency>

The same versioning scheme for redist applies to the cuda bindings that leverage an installed cuda.

Using cuDNN via nd4j

Similar to our avx bindings, nd4j leverages our c++ library libnd4j for running mathematical operations. In order to use cudnn, all you need to do is change the cuda backend dependency from:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>

to:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

For jetson nano cuda 10.2:

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
</dependency>

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
  <classifier>linux-arm64</classifier>
</dependency>

Note that we are only adding an additional dependency. The additional classifier pulls in an optional dependency on cuDNN-based routines. The default does not use cuDNN; instead, it uses built-in standalone routines for the operations that cuDNN implements, such as conv2d and lstm.

For users of the -platform dependencies such as nd4j-cuda-11.2-platform, this classifier is still required. The -platform dependencies try to set sane defaults for each platform, but give users the option to include whatever they want. If you need optimizations, please become familiar with this.

Using cudnn via deeplearning4j

Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer, SubsamplingLayer, etc.), as well as LSTM and BatchNormalization layers, support cuDNN.

The only thing we need to do to have DL4J load cuDNN is to add a dependency on deeplearning4j-cuda-11.0, or deeplearning4j-cuda-11.2, for example:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.0</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.2</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try to reduce memory usage by using the NO_WORKSPACE mode settable via the network configuration, instead of the default of ConvolutionLayer.AlgoMode.PREFER_FASTEST, for example:

    // for the whole network
    new NeuralNetConfiguration.Builder()
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...
    // or separately for each layer
    new ConvolutionLayer.Builder(h, w)
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...

Memory

Setting available Memory/RAM for a DL4J application

Memory Management for ND4J/DL4J: How does it work?

ND4J uses off-heap memory to store NDArrays, to provide better performance while working with NDArrays from native code such as BLAS and CUDA libraries.

"Off-heap" means that the memory is allocated outside of the JVM (Java Virtual Machine) and hence isn't managed by the JVM's garbage collection (GC). On the Java/JVM side, we only hold pointers to the off-heap memory, which can be passed to the underlying C++ code via JNI for use in ND4J operations.

To manage memory allocations, we use two approaches:

JVM Garbage Collector (GC) and WeakReference tracking
MemoryWorkspaces - see the Workspaces section for details

Despite the differences between these two approaches, the idea is the same: once an NDArray is no longer required on the Java side, the off-heap memory associated with it should be released so that it can be reused later. The difference between the GC and MemoryWorkspaces approaches is in when and how the memory is released.

For JVM/GC memory: whenever an INDArray is collected by the garbage collector, its off-heap memory will be deallocated, assuming it is not used elsewhere.
For MemoryWorkspaces: whenever an INDArray leaves the workspace scope - for example, when a layer has finished its forward pass/predictions - its memory may be reused without deallocation and reallocation. This results in better performance for cyclical workloads like neural network training and inference.

Configuring Memory Limits

With DL4J/ND4J, there are two types of memory limits to be aware of and configure: The on-heap JVM memory limit, and the off-heap memory limit, where NDArrays live. Both limits are controlled via Java command-line arguments:

-Xms - this defines how much memory the JVM heap will use at application start.
-Xmx - this allows you to specify the JVM heap memory limit (the maximum, at any point). Memory is only allocated up to this amount (at the discretion of the JVM) if required.
-Dorg.bytedeco.javacpp.maxbytes - this allows you to specify the off-heap memory limit. This can also be a percentage, in which case it would apply to maxMemory.
-Dorg.bytedeco.javacpp.maxphysicalbytes - this specifies the maximum bytes for the entire process - usually set to maxbytes plus Xmx plus a bit extra, in case other libraries also require some off-heap memory. Unlike setting maxbytes, setting maxphysicalbytes is optional. This can also be a percentage (>100%), in which case it would apply to maxMemory.

Example: Configuring 1GB initial on-heap, 2GB max on-heap, 8GB off-heap, 10GB maximum for process: -Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G
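
When a DL4J process is launched from another Java program, the example configuration above can be assembled programmatically. The following is a minimal sketch in plain Java; Dl4jMemoryArgs is a hypothetical helper (not part of DL4J), and the sanity checks it performs are assumptions based on the descriptions of the four flags.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper only: builds the four memory-related JVM arguments.
final class Dl4jMemoryArgs {
    static List<String> jvmArgs(int xmsGb, int xmxGb, int offHeapGb, int physicalGb) {
        if (xmsGb > xmxGb) {
            throw new IllegalArgumentException("-Xms must not exceed -Xmx");
        }
        // maxphysicalbytes is usually maxbytes plus Xmx plus a bit extra.
        if (physicalGb < xmxGb + offHeapGb) {
            throw new IllegalArgumentException("maxphysicalbytes should cover heap plus off-heap");
        }
        List<String> args = new ArrayList<>();
        args.add("-Xms" + xmsGb + "G");
        args.add("-Xmx" + xmxGb + "G");
        args.add("-Dorg.bytedeco.javacpp.maxbytes=" + offHeapGb + "G");
        args.add("-Dorg.bytedeco.javacpp.maxphysicalbytes=" + physicalGb + "G");
        return args;
    }
}
```

Calling Dl4jMemoryArgs.jvmArgs(1, 2, 8, 10) produces the arguments for 1GB initial on-heap, 2GB max on-heap, 8GB off-heap and a 10GB process maximum; the list can be passed to a ProcessBuilder together with the java command and main class.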

Gotchas: A few things to watch out for

With GPU systems, the maxbytes and maxphysicalbytes settings currently also effectively define the memory limit for the GPU, since the off-heap memory is mapped (via NDArrays) to the GPU - read more about this in the GPUs section below.
For many applications, you want less RAM to be used in JVM heap, and more RAM to be used in off-heap, since all NDArrays are stored there. If you allocate too much to the JVM heap, there will not be enough memory left for the off-heap memory.
If you get a "RuntimeException: Can't allocate [HOST] memory: xxx; threadId: yyy", you have run out of off-heap memory. You should most often use a WorkspaceConfiguration to handle your NDArrays allocation, in particular in e.g. training or evaluation/inference loops - if you do not, the NDArrays and their off-heap (and GPU) resources are reclaimed using the JVM GC, which might introduce severe latency and possible out of memory situations.
If you don't specify JVM heap limit, it will use 1/4 of your total system RAM as the limit, by default.
If you don't specify an off-heap memory limit, the JVM heap limit (Xmx) will be used by default. For example, -Xmx8G means that 8GB can be used by the JVM heap, and an additional 8GB can be used by ND4J off-heap.
In limited memory environments, it's usually a bad idea to use high -Xmx value together with -Xms option. That is because doing so won't leave enough off-heap memory. Consider a 16GB system in which you set -Xms14G: 14GB of 16GB would be allocated to the JVM, leaving only 2GB for the off-heap memory, the OS and all other programs.
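
The default rules and the 16GB example from the list above can be written down as simple arithmetic. This is a sketch only; MemoryDefaults is a hypothetical class that restates the rules described here and does not query the JVM or JavaCPP.

```java
// Illustrative arithmetic only: restates the defaults described above.
final class MemoryDefaults {
    static final long GB = 1L << 30;

    /** Heap limit: the explicit -Xmx value, or 1/4 of system RAM if none is given (null). */
    static long effectiveHeapLimit(long systemRamBytes, Long xmxBytes) {
        return xmxBytes != null ? xmxBytes : systemRamBytes / 4;
    }

    /** Off-heap limit: the explicit maxbytes value, or the heap limit if none is given (null). */
    static long effectiveOffHeapLimit(long heapLimitBytes, Long maxBytes) {
        return maxBytes != null ? maxBytes : heapLimitBytes;
    }

    /** RAM left for off-heap memory, the OS and other programs once -Xms is reserved. */
    static long leftAfterInitialHeap(long systemRamBytes, long xmsBytes) {
        return systemRamBytes - xmsBytes;
    }
}
```

For the 16GB system with -Xms14G, leftAfterInitialHeap(16 * GB, 14 * GB) gives 2GB, which is all that remains for off-heap memory, the OS and everything else.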

Memory-mapped files

ND4J supports the use of a memory-mapped file instead of RAM when using the nd4j-native backend. On the one hand, it's slower than RAM, but on the other hand, it allows you to allocate memory chunks in a manner impossible otherwise.

Here's sample code:

In this case, a 1GB temporary file will be created and mmap'ed, and NDArray x will be created in that space. Obviously, this option is mostly viable for cases when you need NDArrays that can't fit into your RAM.
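
For readers unfamiliar with memory mapping itself, the underlying mechanism can be illustrated with the JDK alone. The sketch below uses java.nio, not the ND4J workspace API; MmapSketch is a hypothetical name, and the buffer is kept small on purpose.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: plain java.nio memory mapping, not ND4J code.
final class MmapSketch {
    /** Writes numFloats values into a memory-mapped temporary file and returns the last one. */
    static float roundTrip(int numFloats) {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(tmp,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping in READ_WRITE mode grows the file to the requested size.
                MappedByteBuffer mapped = channel.map(
                        FileChannel.MapMode.READ_WRITE, 0, (long) numFloats * Float.BYTES);
                FloatBuffer floats = mapped.asFloatBuffer();
                for (int i = 0; i < numFloats; i++) {
                    floats.put(i, i * 0.5f); // backed by the file, not by the Java heap
                }
                return floats.get(numFloats - 1);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The data lives in the mapped file and is paged in and out by the operating system, which is why this approach is slower than RAM but can hold buffers larger than the available memory.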

GPUs

When using GPUs, oftentimes your CPU RAM will be greater than GPU RAM. When GPU RAM is less than CPU RAM, you need to monitor how much RAM is being used off-heap. You can check this based on the JavaCPP options specified above.

We allocate memory on the GPU equivalent to the amount of off-heap memory you specify. We don't use any more of your GPU than that. You are also allowed to specify heap space greater than your GPU (that's not encouraged, but it's possible). If you do so, your GPU will run out of RAM when trying to run jobs.

We also allocate off-heap memory in CPU RAM. This allows efficient communication between the CPU and the GPU, and lets the CPU access data from an NDArray without having to fetch it from the GPU each time you call for it.

If JavaCPP or your GPU throw an out-of-memory error (OOM), or even if your compute slows down due to GPU memory being limited, then you may want to either decrease batch size or increase the amount of off-heap memory that JavaCPP is allowed to allocate, if that's possible.

Try to run with an off-heap memory equal to your GPU's RAM. Also, always remember to set up a small JVM heap space using the Xmx option.

Note that if your GPU has less than 2GB of RAM, it's probably not usable for deep learning. You should consider using your CPU if this is the case. Typical deep-learning workloads should have 4GB of RAM at minimum. Even that is small. 8GB of RAM on a GPU is recommended for deep learning workloads.

It is possible to use HOST-only memory with a CUDA backend. That can be done using workspaces.

Example:

It's not recommended to use HOST-only arrays directly, since they will dramatically reduce performance. But they might be useful as an in-memory cache paired with the INDArray.unsafeDuplication() method.

Workspaces

Workspaces are an efficient model for memory paging in DL4J.

What are workspaces?

ND4J offers an additional memory-management model: workspaces. That allows you to reuse memory for cyclic workloads without the JVM Garbage Collector for off-heap memory tracking. In other words, at the end of the workspace loop, all INDArrays' memory content is invalidated. Workspaces are integrated into DL4J for training and inference.

The basic idea is simple: You can do what you need within a workspace (or spaces), and if you want to get an INDArray out of it (i.e. to move result out of the workspace), you just call INDArray.detach() and you'll get an independent INDArray copy.
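
The idea of reusing memory across loop iterations, and copying out only what must survive, can be illustrated with a toy model in plain Java. ScratchWorkspace and View are hypothetical names, not ND4J classes; the detach() method here only mimics the role that INDArray.detach() plays.

```java
import java.util.Arrays;

// Toy model of a workspace: one preallocated buffer reused on every iteration.
final class ScratchWorkspace {
    private final float[] buffer;
    private int used;

    ScratchWorkspace(int capacity) {
        this.buffer = new float[capacity];
    }

    /** A view into the shared buffer; its content is only valid until reset(). */
    final class View {
        final int offset;
        final int length;

        View(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        float get(int i) { return buffer[offset + i]; }

        void set(int i, float value) { buffer[offset + i] = value; }

        /** Copies the data out of the workspace into an independent array. */
        float[] detach() {
            return Arrays.copyOfRange(buffer, offset, offset + length);
        }
    }

    /** Hands out the next free region without allocating new memory. */
    View allocate(int length) {
        if (used + length > buffer.length) {
            throw new IllegalStateException("Workspace exhausted");
        }
        View view = new View(used, length);
        used += length;
        return view;
    }

    /** End of a loop iteration: the same memory is handed out again afterwards. */
    void reset() {
        used = 0;
    }
}
```

A view obtained before reset() sees whatever the next iteration writes into the same region, while the array returned by detach() keeps its values. This is the reason results that must outlive the workspace loop need to be detached.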

Neural Networks

For DL4J users, workspaces provide better performance out of the box, and are enabled by default from 1.0.0-alpha onwards. Thus for most users, no explicit workspaces configuration is required.

To benefit from workspaces, they need to be enabled. You can configure the workspace mode using:

.trainingWorkspaceMode(WorkspaceMode.SEPARATE) and/or .inferenceWorkspaceMode(WorkspaceMode.SINGLE) in your neural network configuration.

The difference between SEPARATE and SINGLE workspaces is a tradeoff between the performance & memory footprint:

SEPARATE is slightly slower, but uses less memory.
SINGLE is slightly faster, but uses more memory.

That said, it’s fine to use different modes for training & inference (i.e. use SEPARATE for training, and use SINGLE for inference, since inference only involves a feed-forward loop without backpropagation or updaters involved).

With workspaces enabled, all memory used during training will be reusable and tracked without interference from the JVM GC. The only exception is the output() method, which uses workspaces (if enabled) internally for the feed-forward loop. Subsequently, it detaches the resulting INDArray from the workspaces, thus providing you with an independent INDArray which will be handled by the JVM GC.

Please note: After the 1.0.0-alpha release, workspaces in DL4J were refactored - SEPARATE/SINGLE modes have been deprecated, and users should use ENABLED instead.

Garbage Collector

If your training process uses workspaces, we recommend that you disable (or reduce the frequency of) periodic GC calls. That can be done like so:

Put that somewhere before your model.fit(...) call.

ParallelWrapper & ParallelInference

For ParallelWrapper, the workspace-mode configuration option was also added. As such, each of the trainer threads will use a separate workspace attached to the designated device.

Iterators

We provide asynchronous prefetch iterators, AsyncDataSetIterator and AsyncMultiDataSetIterator, which are usually used internally.

These iterators optionally use a special, cyclic workspace mode to obtain a smaller memory footprint. The size of the workspace, in this case, will be determined by the memory requirements of the first DataSet coming out of the underlying iterator, whereas the buffer size is defined by the user. The workspace will be adjusted if memory requirements change over time (e.g. if you’re using variable-length time series).

Caution: If you’re using a custom iterator or the RecordReader, please make sure you’re not initializing something huge within the first next() call. Do that in your constructor to avoid undesired workspace growth.

Caution: With AsyncDataSetIterator being used, each DataSet is supposed to be consumed before next() is called to fetch the following DataSet. You are not supposed to store them, in any way, without the detach() call. Otherwise, the memory used for INDArrays within the DataSet will eventually be overwritten within AsyncDataSetIterator.

If for some reason you don’t want your iterator to be wrapped into an asynchronous prefetch (e.g. for debugging purposes), special wrappers are provided: AsyncShieldDataSetIterator and AsyncShieldMultiDataSetIterator. Basically, those are just thin wrappers that prevent prefetch.

Evaluation

Usually, evaluation assumes use of the model.output() method, which essentially returns an INDArray detached from the workspace. In the case of regular evaluations during training, it might be better to use the built-in methods for evaluation. For example:

This piece of code will run a single cycle over iteratorTest, and it will update both IEvaluation implementations (or fewer/more, depending on your needs) without any additional INDArray allocation.

Workspace Destruction

There are also some situations, say, when you're short on RAM, where you might want to release all workspaces created outside of your control, e.g. during evaluation or training.

That could be done like so: Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();

This method will destroy all workspaces that were created within the calling thread. If you've created workspaces in some external threads on your own, you can use the same method in that thread, after the workspaces are no longer needed.

Workspace Exceptions

If workspaces are used incorrectly (such as a bug in a custom layer or data pipeline, for example), you may see an error message such as:

DL4J's LayerWorkspaceMgr

DL4J's Layer API includes the concept of a "layer workspace manager".

The idea with this class is that it allows us to easily and precisely control the location of a given array, given different possible configurations for the workspaces. For example, the activations out of a layer may be placed in one workspace during inference, and another during training; this is for performance reasons. However, with the LayerWorkspaceMgr design, implementers of layers don't need to worry about this.

What does this mean in practice? Usually it's quite simple...

When returning activations (activate(boolean training, LayerWorkspaceMgr workspaceMgr) method), make sure the returned array is defined in ArrayType.ACTIVATIONS (i.e., use LayerWorkspaceMgr.create(ArrayType.ACTIVATIONS, ...) or similar)
When returning activation gradients (backpropGradient(INDArray epsilon, LayerWorkspaceMgr workspaceMgr)), similarly return an array defined in ArrayType.ACTIVATION_GRAD

You can also leverage an array defined in any workspace to the appropriate workspace using, for example, LayerWorkspaceMgr.leverageTo(ArrayType.ACTIVATIONS, myArray)

Note that if you are not implementing a custom layer (and instead just want to perform forward pass for a layer outside of a MultiLayerNetwork/ComputationGraph) you can use LayerWorkspaceMgr.noWorkspaces().

Build Tools

Configure the build tools for Deeplearning4j.

Configuring your build tool

While we encourage Deeplearning4j, ND4J and DataVec users to employ Maven, it's worthwhile documenting how to configure build files for other tools, like Ivy, Gradle and SBT -- particularly since Google prefers Gradle over Maven for Android projects.

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Gradle

You can use Deeplearning4j with Gradle by adding the following to your build.gradle in the dependencies block:

Add a backend by adding the following:

You can also swap the standard CPU implementation for GPUs.

SBT

You can use Deeplearning4j with SBT by adding the following to your build.sbt:

Add a backend by adding the following:

You can also swap the standard CPU implementation for GPUs.

Ivy

You can use Deeplearning4j with ivy by adding the following to your ivy.xml:

Add a backend by adding the following:

You can also swap the standard CPU implementation for GPUs.

Leiningen

Clojure programmers may want to use Leiningen or a similar tool to work with Maven.

NOTE: You'll still need to download ND4J, DataVec and Deeplearning4j, or double-click on their respective JAR files downloaded by Maven / Ivy / Gradle, to install them in your Eclipse installation.

Maven

Configure the Maven build tool for Deeplearning4j.

Configuring the Maven build tool

You can use Deeplearning4j with Maven by adding the following to your pom.xml:

<dependencies>
  <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Add a backend

DL4J relies on ND4J for hardware-specific implementations and tensor operations. Add a backend by pasting the following snippet into your pom.xml:

<dependencies>
  <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>

You can also swap the standard CPU implementation for GPUs.

Deeplearning4j

Tutorials

Language Processing

Overview of language processing in DL4J

Although not designed to be comparable to tools such as Stanford CoreNLP or NLTK, Deeplearning4j does include some core text processing tools that are described here.

Deeplearning4j's NLP support contains interfaces for different NLP libraries. A user wraps third-party libraries via our interfaces. As of M1, Deeplearning4j does not support any third-party libraries directly. This is due to the lack of maintenance and the custom work needed to make this work well for users. Instead, we expose interfaces to allow users to implement their own tokenizers.

SentenceIterator

There are several steps involved in processing natural language. The first is to iterate over your corpus to create a list of documents, which can be as short as a tweet, or as long as a newspaper article. This is performed by a SentenceIterator, which will appear like this:

The SentenceIterator encapsulates a corpus or text, organizing it, say, as one Tweet per line. It is responsible for feeding text piece by piece into your natural language processor. The SentenceIterator is not analogous to a similarly named class, the DataSetIterator, which creates a dataset for training a neural net. Instead, it creates a collection of strings by segmenting a corpus.

Tokenizer

A Tokenizer further segments the text at the level of single words, or alternatively as n-grams. ClearTK contains the underlying tokenizers, such as parts of speech (PoS) and parse trees, which allow for both dependency and constituency parsing, like that employed by a recursive neural tensor network (RNTN).

A Tokenizer is created and wrapped by a TokenizerFactory. The default tokens are words separated by spaces. The tokenization process also involves some machine learning to differentiate between ambiguous symbols like . which end sentences and also abbreviate words such as Mr. and vs.

Both Tokenizers and SentenceIterators work with Preprocessors to deal with anomalies in messy text like Unicode, and to render such text, say, as lowercase characters uniformly.
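
As a rough illustration of how a tokenizer and a preprocessor cooperate, the following plain-Java sketch splits a sentence on whitespace and normalizes each token. SimpleTokenizer and LOWER_ALNUM are hypothetical names; this is not the DL4J Tokenizer or TokenPreProcess API, only a model of the same idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Illustrative sketch only: whitespace tokenization plus a token preprocessor.
final class SimpleTokenizer {
    /** Lowercases a token and strips everything except letters and digits. */
    static final UnaryOperator<String> LOWER_ALNUM =
            token -> token.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");

    /** Splits on whitespace and applies the preprocessor to every token. */
    static List<String> tokenize(String sentence, UnaryOperator<String> preProcessor) {
        List<String> tokens = new ArrayList<>();
        for (String raw : sentence.trim().split("\\s+")) {
            String token = preProcessor.apply(raw);
            if (!token.isEmpty()) { // drop tokens that were pure punctuation
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this sketch, "Hello,  World! DL4J" is rendered uniformly as the tokens hello, world and dl4j.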

Vocab

Each document has to be tokenized to create a vocab, the set of words that matter for that document or corpus. Those words are stored in the vocab cache, which contains statistics about a subset of words counted in the document, the words that "matter". The line separating significant and insignificant words is mobile, but the basic idea of distinguishing between the two groups is that words occurring only once (or fewer than, say, five times) are hard to learn and their presence represents unhelpful noise.
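
The frequency cut-off described above can be sketched in plain Java as follows. VocabFilter and the minWordFrequency parameter name are assumptions made for this illustration; the sketch does not use the DL4J vocab cache classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: keeps the words that occur often enough to "matter".
final class VocabFilter {
    /** Counts how often each token occurs. */
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the tokens whose count is at least minWordFrequency. */
    static Set<String> vocab(Map<String, Integer> counts, int minWordFrequency) {
        Set<String> vocab = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minWordFrequency) {
                vocab.add(entry.getKey());
            }
        }
        return vocab;
    }
}
```

For the tokens the, cat, the, dog, the, cat and a minimum frequency of 2, the vocab contains cat and the, while dog stays a plain token.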

The vocab cache stores metadata for methods such as Word2vec and Bag of Words, which treat words in radically different ways. Word2vec creates representations of words, or neural word embeddings, in the form of vectors that are hundreds of coefficients long. Those coefficients help neural nets predict the likelihood of a word appearing in any given context; for example, after another word. Here's Word2vec, configured:

Once you obtain word vectors, you can feed them into a deep net for classification, prediction, sentiment analysis and the like.

Doc2Vec

Doc2Vec and arbitrary documents for language processing in DL4J.

The main purpose of Doc2Vec is associating arbitrary documents with labels, so labels are required. Doc2vec is an extension of word2vec that learns to correlate labels and words, rather than words with other words. Deeplearning4j's implementation is intended to serve the Java, Scala and Clojure communities.

The first step is coming up with a vector that represents the "meaning" of a document, which can then be used as input to a supervised machine learning algorithm to associate documents with labels.

In the ParagraphVectors builder pattern, the labels() method points to the labels to train on. In the example below, you can see labels related to sentiment analysis:

Here's a full working example of :

Sentence Iterator

Iteration of words, documents, and sentences for language processing in DL4J.

A sentence iterator is used in both Word2vec and Bag of Words.

It feeds bits of text into a neural network in the form of vectors, and also covers the concept of documents in text processing.

In natural-language processing, a document or sentence is typically used to encapsulate a context which an algorithm should learn.

A few examples include analyzing Tweets and full-blown news articles. The purpose of the sentence iterator is to divide text into processable bits. Note the sentence iterator is input agnostic. So bits of text (a document) can come from a file system, the Twitter API or Hadoop.

Depending on how input is processed, the output of a sentence iterator will then be passed to a tokenizer for the processing of individual tokens, which are usually words, but could also be ngrams, skipgrams or other units. The tokenizer is created on a per-sentence basis by a tokenizer factory. The tokenizer factory is what is passed into a text-processing vectorizer.

Some typical examples are below:

SentenceIterator iter = new LineSentenceIterator(new File("your file"));

This assumes that each line in a file is a sentence.

You can also treat a list of strings as sentences, as follows:

Collection<String> sentences = ...;
SentenceIterator iter = new CollectionSentenceIterator(sentences);

This will assume that each string is a sentence (document). Remember this could be a list of Tweets or articles -- both are applicable.

You can iterate over files as follows:

SentenceIterator iter = new FileSentenceIterator(new File("your dir or file"));

This will parse the files line by line and return individual sentences on each one.

For anything more complex, we recommend a pipeline that offers more in-depth support than space-separated tokens.

Tokenization

Breaking text into individual words for language processing in DL4J.

Tokenization

What is Tokenization?

Tokenization is the process of breaking text down into individual words. Word windows are also composed of tokens. can output text windows that comprise training examples for input into neural nets, as seen here.

Example

Here's an example of tokenization done with DL4J tools:

The above snippet creates a tokenizer capable of stemming.

In Word2Vec, that's a recommended way of creating a vocabulary, because it averts various vocabulary quirks, such as the singular and plural of the same noun being counted as two different words.

Vocabulary Cache

Mechanism for handling general NLP tasks in DL4J.

The vocabulary cache, or vocab cache, is a mechanism for handling general-purpose natural-language tasks in Deeplearning4j, including normal TF-IDF, word vectors and certain information-retrieval techniques. The goal of the vocab cache is to be a one-stop shop for text vectorization, encapsulating techniques common to bag of words and word vectors, among others.

Vocab cache handles storage of tokens, word-count frequencies, inverse-document frequencies and document occurrences via an inverted index. The InMemoryLookupCache is the reference implementation.

In order to use a vocab cache as you iterate over text and index tokens, you need to figure out if the tokens should be included in the vocab. The criterion is usually if tokens occur with more than a certain pre-configured frequency in the corpus. Below that frequency, an individual token isn't a vocab word, and it remains just a token.

We track tokens as well. In order to track tokens, do the following:

When you want to add a vocab word, do the following:

Adding the word to the index sets the index. Then you declare it as a vocab word. (Declaring it as a vocab word will pull the word from the index.)

How To Guides

Custom Layers

Extend DL4J functionality for custom layers.

There are two components to adding a custom layer:

Adding the layer configuration class: extends org.deeplearning4j.nn.conf.layers.Layer
Adding the layer implementation class: implements org.deeplearning4j.nn.api.Layer

The layer configuration class ((1) above) handles the settings. It's the one you would use when constructing a MultiLayerNetwork or ComputationGraph. You can add custom settings here, and use them in your layer.

The implementation layer ((2) above) class has parameters, and handles network forward pass, backpropagation, etc. It is created from the org.deeplearning4j.nn.conf.layers.Layer.instantiate(...) method. In other words: the instantiate method is how we go from the configuration to the implementation; MultiLayerNetwork or ComputationGraph will call this method when initializing the

An example of these are CustomLayer (the configuration class) and CustomLayerImpl (the implementation class). Both of these classes have extensive comments regarding their methods.

You'll note that in Deeplearning4j there are two DenseLayer classes, two GravesLSTM classes, etc.: one is for the configuration, and one is for the implementation. We have not followed this "same name" pattern here to hopefully avoid confusion.
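
The split between a configuration class and an implementation class can be sketched without any DL4J types. All names below (SketchLayerConf, SketchLayerImpl, ScaleLayerConf, ScaleLayerImpl) are hypothetical; the real classes extend org.deeplearning4j.nn.conf.layers.Layer and implement org.deeplearning4j.nn.api.Layer and have many more methods.

```java
// Illustrative sketch only: a toy version of the configuration/implementation split.

/** Plays the role of the configuration class: holds settings, creates the implementation. */
abstract class SketchLayerConf {
    abstract SketchLayerImpl instantiate();
}

/** Plays the role of the implementation class: performs the actual computation. */
interface SketchLayerImpl {
    double[] activate(double[] input);
}

/** A custom configuration with one custom setting. */
final class ScaleLayerConf extends SketchLayerConf {
    final double factor;

    ScaleLayerConf(double factor) {
        this.factor = factor;
    }

    @Override
    SketchLayerImpl instantiate() {
        // This is the step that goes from configuration to implementation.
        return new ScaleLayerImpl(this);
    }
}

/** The matching implementation, which reads its settings from the configuration. */
final class ScaleLayerImpl implements SketchLayerImpl {
    private final double factor;

    ScaleLayerImpl(ScaleLayerConf conf) {
        this.factor = conf.factor;
    }

    @Override
    public double[] activate(double[] input) {
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * factor; // forward pass of this toy layer
        }
        return out;
    }
}
```

User code only ever constructs the configuration, for example new ScaleLayerConf(2.0); the surrounding network would call instantiate() when it is initialized.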

Testing Your Custom Layer

Once you have added a custom layer, it is necessary to run some tests to ensure it is correct.

These tests should at a minimum include the following:

Tests to ensure that the JSON configuration (to/from JSON) works correctly. This is necessary for networks with your custom layer to function with both model serialization (saving) and Spark training.
Gradient checks to ensure that the implementation is correct.
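
The principle behind a gradient check is to compare the analytic gradient produced by your backpropagation code with a numerical estimate. The sketch below shows that principle for a scalar function in plain Java, assuming a central-difference estimate and a relative-error threshold; GradientCheckSketch is a hypothetical name and not the DL4J gradient check utility.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch only: scalar gradient check via central differences.
final class GradientCheckSketch {
    /** Central-difference estimate of f'(x). */
    static double numericalGradient(DoubleUnaryOperator f, double x, double eps) {
        return (f.applyAsDouble(x + eps) - f.applyAsDouble(x - eps)) / (2 * eps);
    }

    /** Relative error between the analytic and the numerical gradient. */
    static double relativeError(double analytic, double numeric) {
        double denominator = Math.abs(analytic) + Math.abs(numeric);
        return denominator == 0.0 ? 0.0 : Math.abs(analytic - numeric) / denominator;
    }

    /** True if the analytic gradient agrees with the numerical one at x. */
    static boolean check(DoubleUnaryOperator f, DoubleUnaryOperator analyticGradient,
                         double x, double eps, double maxRelativeError) {
        double numeric = numericalGradient(f, x, eps);
        double analytic = analyticGradient.applyAsDouble(x);
        return relativeError(analytic, numeric) <= maxRelativeError;
    }
}
```

For f(x) = x^3, the correct gradient 3x^2 passes the check at x = 2, while an incorrect gradient such as 2x^2 fails it. A check for a real layer applies the same comparison to every parameter and input.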

Example

A full custom layer example is available in our examples repository.

Keras Import

Overview of model import.

Deeplearning4j: Keras model import

Keras model import provides routines for importing neural network models originally configured and trained using Keras, a popular Python deep learning library.

Once you have imported your model into DL4J, our full production stack is at your disposal. We support import of all Keras model types, most layers and practically all utility functionality. Please check here for a complete list of supported Keras features.

Note to users: tf.keras models are also supported. Please check here for an overview of what to expect for tf.keras as well as other features. Our documentation needs to be updated to reflect the changes between keras and tf.keras. For now, users should be aware of this when reading the docs below. Migrating from keras to tf.keras mainly involves changing the imports in your Python script. The equivalent changes were needed for model import in Deeplearning4j, and they were made in beta7.

Getting started: Import a Keras model in 60 seconds

To import a Keras model, you need to create and serialize such a model first. Here's a simple example that you can use. The model is a simple MLP that takes mini-batches of vectors of length 100, has two Dense layers and predicts a total of 10 categories. After defining the model, we serialize it in HDF5 format.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])

model.save('simple_mlp.h5')

If you put this model file (simple_mlp.h5) into the base of your project's resource folder, you can load the Keras model as a DL4J MultiLayerNetwork. The snippet below shows only how to import a Keras Sequential model; for more details, take a look at both Functional Model import and Sequential Model import.

String simpleMlp = new ClassPathResource("simple_mlp.h5").getFile().getPath();
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(simpleMlp);

That's it! KerasModelImport is your main entry point to model import; the class takes care of mapping Keras concepts to DL4J concepts internally. As a user, you just have to provide your model file. See our Getting started guide for more details and options for loading Keras models into DL4J.

You can now use your imported model for inference (here with dummy data for simplicity):

INDArray input = Nd4j.create(DataType.FLOAT, 256, 100);
INDArray output = model.output(input);

Here's how you do training in DL4J for your imported model:

model.fit(input, output);

The full example just shown can be found in our DL4J examples.
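
As a quick sanity check after import, it can help to know how many parameters the MLP above (100 inputs, a 64-unit Dense layer, a 10-unit Dense output) should contain. The sketch below is plain arithmetic with hypothetical names and uses no DL4J classes. The total should agree with what Keras reports in model.summary() and, presumably, with what the imported network reports, for example via numParams().

```java
/** Sketch: expected parameter count for the 100 -> 64 -> 10 MLP (hypothetical helper). */
final class MlpParamCount {

    /** A Dense layer holds an nIn x nOut weight matrix plus one bias per output unit. */
    static long denseParams(int nIn, int nOut) {
        return (long) nIn * nOut + nOut;
    }

    public static void main(String[] args) {
        long hidden = denseParams(100, 64); // 100 * 64 weights + 64 biases
        long output = denseParams(64, 10);  // 64 * 10 weights + 10 biases
        System.out.println("hidden=" + hidden + " output=" + output + " total=" + (hidden + output));
    }
}
```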

Project setup

To use Keras model import in your existing project, all you need to do is add the following dependency to your pom.xml.

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-modelimport</artifactId>
    <!-- This version should match that of your other DL4J project dependencies. -->
    <version>1.0.0-beta6</version>
</dependency>

If you need a project to get started in the first place, consider cloning DL4J examples and following the instructions in the repository to build the project.

Backend

DL4J Keras model import is backend agnostic. No matter which backend you choose (TensorFlow, Theano, CNTK), your models can be imported into DL4J.

Popular models and applications

We support import for a growing number of applications; check here for a full list of currently covered models. These applications include:

Deep convolutional and Wasserstein GANs
UNET
ResNet50
SqueezeNet
MobileNet
Inception
Xception

Troubleshooting and support

An IncompatibleKerasConfigurationException message indicates that you are attempting to import a Keras model configuration that is not currently supported in Deeplearning4j, either because model import does not cover it or because DL4J does not implement the layer or feature.

Once you have imported your model, we recommend our own ModelSerializer class for further saving and reloading of your model.

You can inquire further by visiting the community forums. You might also consider filing a feature request via GitHub so that the missing functionality can be placed on the DL4J development roadmap, or even sending us a pull request with the necessary changes!

Why Keras model import?

Keras is a popular and user-friendly deep learning library written in Python. The intuitive API of Keras makes defining and running your deep learning models in Python easy. Keras allows you to choose which lower-level library it runs on, but provides a unified API for each such backend. Currently, Keras supports TensorFlow, CNTK and Theano backends.

There is often a gap between the production system of a company and the experimental setup of its data scientists. Keras model import lets data scientists write their models in Python while still integrating seamlessly with the production stack.

Keras model import is targeted at users who are mainly familiar with writing their models in Python with Keras. With model import, you can bring your Python models to production by importing them into the DL4J ecosystem for either further training or evaluation.

You should use this module when the experimentation phase of your project is completed and you need to ship your models to production. Konduit offers commercial support for Keras implementations in enterprise.

Functional Models

Importing the functional model.

Getting started with importing Keras functional Models

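Let's say you start with defining a simple MLP using Keras' functional API, that is, a Model built by connecting Input and Dense layers rather than a Sequential stack.

In Keras there are several ways to save a model. You can store the whole model (model definition, weights and training configuration) as an HDF5 file, just the model configuration (as a JSON or YAML file) or just the weights (as an HDF5 file). As with Sequential models, model.save(...) stores the whole model, model.to_json() (or to_yaml() for YAML) returns just the configuration, and model.save_weights(...) stores just the weights.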

If you decide to save the full model, you will have access to the training configuration of the model, otherwise you don't. So if you want to further train your model in DL4J after import, keep that in mind and use model.save(...) to persist your model.

Loading your Keras model

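Let's start with the recommended way, loading the full model back into DL4J (we assume it's on your class path). Pass the path of the HDF5 file to KerasModelImport.importKerasModelAndWeights, which returns a DL4J ComputationGraph for a functional model.

In case you didn't compile your Keras model, it will not come with a training configuration. In that case you need to explicitly tell model import to ignore the training configuration by passing false for the enforceTrainingConfig flag as the second argument of importKerasModelAndWeights.

To load just the model configuration from JSON, pass the path of the JSON file to KerasModelImport.importKerasModelConfiguration, which returns a ComputationGraphConfiguration.

If you also want to load the model weights with the configuration, pass both the JSON configuration path and the HDF5 weights path to KerasModelImport.importKerasModelAndWeights.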

In the latter two cases no training configuration will be read.

Sequential Models

Importing the sequential model.

Getting started with importing Keras Sequential models

Let's say you start with defining a simple MLP using Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])

model.save('full_model.h5')  # save everything in HDF5 format

model_json = model.to_json()  # save just the config. replace with "to_yaml" for YAML serialization
with open("model_config.json", "w") as f:
    f.write(model_json)

model.save_weights('model_weights.h5') # save just the weights.

Loading your Keras model

Let's start with the recommended way, loading the full model back into DL4J (we assume it's on your class path):

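String fullModel = new ClassPathResource("full_model.h5").getFile().getPath();
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(fullModel);

In case you didn't compile your Keras model, it will not come with a training configuration. In that case you need to explicitly tell model import to ignore training configuration by setting the enforceTrainingConfig flag to false like this:

MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(fullModel, false);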

To load just the model configuration from JSON, you use KerasModelImport as follows:

String modelJson = new ClassPathResource("model_config.json").getFile().getPath();
MultiLayerConfiguration modelConfig = KerasModelImport.importKerasSequentialConfiguration(modelJson);

If additionally you also want to load the model weights with the configuration, here's what you do:

String modelWeights = new ClassPathResource("model_weights.h5").getFile().getPath();
MultiLayerNetwork network = KerasModelImport.importKerasSequentialModelAndWeights(modelJson, modelWeights);

In the latter two cases no training configuration will be read.

Custom Layers

How to implement custom Keras layers for import in Deeplearning4J.

Many more advanced models will contain custom layers, i.e. layers that aren't included in Keras.

You can import those models too, but you will have to provide an implementation of that layer yourself, as the exported model file only provides us with a name for it.

Usually, you will have found out that you need to implement a custom layer when you saw an exception like one of the following:

org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException:
No SameDiff Lambda layer found for Lambda layer lambda_123. You can register a SameDiff Lambda layer using 
KerasLayer.registerLambdaLayer(lambdaLayerName, sameDiffLambdaLayer);

org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: 
Unsupported keras layer type LayerName.

Implementing a custom layer for Keras import

There are two ways of implementing a custom layer for Keras import. Which one is the right approach for you, depends on the type of layer you need to implement.

SameDiffLambdaLayer: Use this approach if your layer doesn't have any weights and defines just a computation. It is most useful when you have to define a custom layer because you are using a lambda in your model definition. This is the approach you should be using when you've gotten the exception about no lambda layer being found.
KerasLayer: Use this approach if your layer needs its own weights. It is most useful when you have to define some complex layer that is more than just a simple computation. This is the approach you should be using when you've gotten the exception about an unsupported layer type.

SameDiffLambdaLayer

Using a SameDiffLambdaLayer is pretty easy. You create a new class that extends it, and override the defineLayer and getOutputType methods.

public class TimesThreeLambda extends SameDiffLambdaLayer {
    @Override
    public SDVariable defineLayer(SameDiff sd, SDVariable x) { 
        return x.mul(3); 
    }

    @Override
    public InputType getOutputType(int layerIndex, InputType inputType) {
        return inputType; 
    }
}

This simple lambda layer just multiplies its input by 3.

defineLayer will only be called once to create the SameDiff graph that is used as the definition of this layer. Do not use information about the size of the inputs or other non-static sizes, like batch size, when defining the layer, or it may fail later on.

After defining your layer, you have to register it to make it available on import.

KerasLayer.registerLambdaLayer("lambda_2", new TimesThreeLambda());

The correct name for your lambda layer will depend on the model you are importing. As you were most likely made aware of the need to implement the lambda layer by an exception, that exception should already have given you the proper name.

KerasLayer

Implementing a full layer with weights is more complex than defining a lambda layer. You will have to create a new class that extends KerasLayer and that reads the configuration of that layer and defines it appropriately.

For examples on how this was done, take a look at KerasLRN and KerasPoolHelper which are custom layers that were needed to be able to import GoogLeNet.

After you've defined your layer, you will have to register it to make it available on import:

KerasLayer.registerCustomLayer("PoolHelper", KerasPoolHelper.class);

Again, the appropriate name will be apparent from the exception that notified you about needing to implement the custom layer in the first place.

Advanced Activations

KerasPReLU

[source]

Imports PReLU layer from Keras

KerasPReLU

public KerasPReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getPReLULayer

public PReLULayer getPReLULayer()

Get DL4J PReLULayer.

return PReLULayer

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

param weights Dense layer weights

KerasThresholdedReLU

[source]

Imports ThresholdedReLU layer from Keras

KerasThresholdedReLU

public KerasThresholdedReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getActivationLayer

public ActivationLayer getActivationLayer()

Get DL4J ActivationLayer.

return ActivationLayer

KerasLeakyReLU

[source]

Imports LeakyReLU layer from Keras

KerasLeakyReLU

public KerasLeakyReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getActivationLayer

public ActivationLayer getActivationLayer()

Get DL4J ActivationLayer.

return ActivationLayer

Embedding Layers

KerasEmbedding

Imports an Embedding layer from Keras.

KerasEmbedding

Pass through constructor for unit tests

throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getEmbeddingLayer

Get the DL4J layer built from this Keras Embedding layer.

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

return number of trainable parameters (1)

setWeights

Set weights for layer.

param weights Embedding layer weights

Local Layers

KerasLocallyConnected1D

[source]

Imports a 1D locally connected layer from Keras.

KerasLocallyConnected1D

public KerasLocallyConnected1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getLocallyConnected1DLayer

public LocallyConnected1D getLocallyConnected1DLayer()

Get DL4J LocallyConnected1D layer.

return LocallyConnected1D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for 1D locally connected layer.

param weights Map from parameter name to INDArray.

KerasLocallyConnected2D

[source]

Imports a 2D locally connected layer from Keras.

KerasLocallyConnected2D

public KerasLocallyConnected2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getLocallyConnected2DLayer

public LocallyConnected2D getLocallyConnected2DLayer()

Get DL4J LocallyConnected2D layer.

return LocallyConnected2D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for 2D locally connected layer.

param weights Map from parameter name to INDArray.

Noise Layers

KerasGaussianNoise

Keras wrapper for DL4J dropout layer with GaussianNoise.

KerasGaussianNoise

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

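Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config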

getGaussianNoiseLayer

Get DL4J DropoutLayer with Gaussian noise.

return DropoutLayer

KerasAlphaDropout

Keras wrapper for DL4J dropout layer with AlphaDropout.

KerasAlphaDropout

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

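Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config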

getAlphaDropoutLayer

Get DL4J DropoutLayer with Alpha dropout.

return DropoutLayer

KerasGaussianDropout

Keras wrapper for DL4J dropout layer with GaussianDropout.

KerasGaussianDropout

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

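Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config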

getGaussianDropoutLayer

Get DL4J DropoutLayer with Gaussian dropout.

return DropoutLayer

Normalization Layers

KerasBatchNormalization

[source]

Imports a BatchNormalization layer from Keras.

KerasBatchNormalization

public KerasBatchNormalization(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getBatchNormalizationLayer

public BatchNormalization getBatchNormalizationLayer()

Get DL4J BatchNormalization layer.

return BatchNormalization layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

return number of trainable parameters (4)

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

param weights Map from parameter name to INDArray.

Pooling Layers

KerasPooling1D

Imports a Keras 1D Pooling layer as a DL4J Subsampling layer.

KerasPooling1D

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling1DLayer

Get the DL4J layer built from this Keras 1D pooling layer.

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasPoolingUtils

Utility functionality for Keras pooling layers.

mapPoolingType

Map Keras pooling layers to DL4J pooling types.

param className name of the Keras pooling class
return DL4J pooling type
throws UnsupportedKerasConfigurationException Unsupported Keras config

KerasPooling3D

Imports a Keras 3D Pooling layer as a DL4J Subsampling3D layer.

KerasPooling3D

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling3DLayer

Get the DL4J layer built from this Keras 3D pooling layer.

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasGlobalPooling

Imports a Keras Pooling layer as a DL4J Subsampling layer.

KerasGlobalPooling

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getGlobalPoolingLayer

Get the DL4J layer built from this Keras global pooling layer.

getInputPreprocessor

Gets appropriate DL4J InputPreProcessor for given InputTypes.

param inputType Array of InputTypes
return DL4J InputPreProcessor
throws InvalidKerasConfigurationException Invalid Keras config
see org.deeplearning4j.nn.conf.InputPreProcessor

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasPooling2D

Imports a Keras 2D Pooling layer as a DL4J Subsampling layer.

KerasPooling2D

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling2DLayer

Get the DL4J layer built from this Keras 2D pooling layer.

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

Recurrent Layers

KerasSimpleRnn

Imports a Keras SimpleRNN layer as a DL4J SimpleRnn layer.

KerasSimpleRnn

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getSimpleRnnLayer

Get the DL4J layer built from this Keras SimpleRNN layer.

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

return number of trainable parameters (12)

getInputPreprocessor

Gets appropriate DL4J InputPreProcessor for given InputTypes.

param inputType Array of InputTypes
return DL4J InputPreProcessor
throws InvalidKerasConfigurationException Invalid Keras configuration exception
see org.deeplearning4j.nn.conf.InputPreProcessor

getUnroll

Get whether SimpleRnn layer should be unrolled (for truncated BPTT).

return whether RNN should be unrolled (boolean)

setWeights

Set weights for layer.

param weights Simple RNN weights
throws InvalidKerasConfigurationException Invalid Keras configuration exception

KerasRnnUtils

Utility functions for Keras RNN layers

getUnrollRecurrentLayer

Get unroll parameter to decide whether to unroll RNN with BPTT or not.

param conf KerasLayerConfiguration
param layerConfig dictionary containing Keras layer properties
return boolean unroll parameter
throws InvalidKerasConfigurationException Invalid Keras configuration

getRecurrentDropout

Get recurrent weight dropout from Keras layer configuration. Non-zero dropout rates are currently not supported.

param conf KerasLayerConfiguration
param layerConfig dictionary containing Keras layer properties
return recurrent dropout rate
throws InvalidKerasConfigurationException Invalid Keras configuration

KerasLSTM

Imports a Keras LSTM layer as a DL4J LSTM layer.

KerasLSTM

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getLSTMLayer

Get the DL4J layer built from this Keras LSTM layer.

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

return number of trainable parameters (12)

getInputPreprocessor

Gets appropriate DL4J InputPreProcessor for given InputTypes.

param inputType Array of InputTypes
return DL4J InputPreProcessor
throws InvalidKerasConfigurationException Invalid Keras configuration exception
see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

Set weights for layer.

param weights LSTM layer weights

getUnroll

Get whether LSTM layer should be unrolled (for truncated BPTT).

return whether to unroll the LSTM

getGateActivationFromConfig

Get LSTM gate activation function from Keras layer configuration.

param layerConfig dictionary containing Keras layer configuration
return LSTM inner activation function
throws InvalidKerasConfigurationException Invalid Keras config

getForgetBiasInitFromConfig

Get LSTM forget gate bias initialization from Keras layer configuration.

param layerConfig dictionary containing Keras layer configuration
return LSTM forget gate bias init
throws InvalidKerasConfigurationException Unsupported Keras config

Wrapper Layers

KerasBidirectional

Builds a DL4J Bidirectional layer from a Keras Bidirectional layer wrapper

KerasBidirectional

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getUnderlyingRecurrentLayer

Get the recurrent layer wrapped by this Bidirectional layer.

getBidirectionalLayer

Get DL4J Bidirectional layer.

return Bidirectional Layer

getOutputType

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

return number of trainable parameters

getInputPreprocessor

Gets appropriate DL4J InputPreProcessor for given InputTypes.

param inputType Array of InputTypes
return DL4J InputPreProcessor
throws InvalidKerasConfigurationException Invalid Keras configuration exception
see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

Set weights for Bidirectional layer.

param weights Map of weights

Activations

Supported Keras activations.

We support all Keras activation functions, namely:

softmax
elu
selu
softplus
softsign
relu
tanh
sigmoid
hard_sigmoid
linear

The mapping of Keras to DL4J activation functions is defined in KerasActivationUtils.

Constraints

Supported Keras constraints.

All Keras constraints are supported:

max_norm
non_neg
unit_norm
min_max_norm

Mapping Keras to DL4J constraints happens in KerasConstraintUtils.

Initializers

Supported Keras weight initializers.

DL4J supports all available Keras initializers, namely:

Zeros
Ones
Constant
RandomNormal
RandomUniform
TruncatedNormal
VarianceScaling
Orthogonal
Identity
lecun_uniform
lecun_normal
glorot_normal
glorot_uniform
he_normal
he_uniform

The mapping of Keras to DL4J initializers can be found in KerasInitilizationUtils.

Losses

Supported Keras loss functions.

DL4J supports all of the Keras loss functions listed below except logcosh:

mean_squared_error
mean_absolute_error
mean_absolute_percentage_error
mean_squared_logarithmic_error
squared_hinge
hinge
categorical_hinge
logcosh
categorical_crossentropy
sparse_categorical_crossentropy
binary_crossentropy
kullback_leibler_divergence
poisson
cosine_proximity

The mapping of Keras loss functions can be found in .

Optimizers

Supported Keras optimizers.

All standard Keras optimizers are supported, but importing custom TensorFlow optimizers (the TFOptimizer entry below) won't work:

SGD
RMSprop
Adagrad
Adadelta
Adam
Adamax
Nadam
TFOptimizer

Regularizers

Supported Keras regularizers.

All Keras regularizers are supported by DL4J model import:

l1
l2
l1_l2

Mapping of regularizers can be found in .

Tuning and Training

Early Stopping

Terminate a training session given certain conditions.

What is early stopping?

When training neural networks, numerous decisions need to be made regarding the settings (hyperparameters) used, in order to obtain good performance. One such hyperparameter is the number of training epochs: that is, how many full passes of the data set (epochs) should be used? If we use too few epochs, we might underfit (i.e., not learn everything we can from the training data); if we use too many epochs, we might overfit (i.e., fit the 'noise' in the training data, and not the signal).

Early stopping attempts to remove the need to manually set this value. It can also be considered a type of regularization method (like L1/L2 weight decay and dropout) in that it can stop the network from overfitting.

The idea behind early stopping is relatively simple:

Split data into training and test sets
At the end of each epoch (or, every N epochs):
- evaluate the network performance on the test set
- if the network outperforms the previous best model: save a copy of the network at the current epoch
Take as our final model the model that has the best test set performance

This is shown graphically below:

The best model is the one saved at the time of the vertical dotted line - i.e., the model with the best accuracy on the test set.
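
The procedure above can be sketched in plain Java, independent of DL4J. This is only a minimal illustration under stated assumptions: the per-epoch test losses are hard-coded stand-ins for a real evaluation, a lower loss is treated as better, and the names EarlyStoppingSketch and bestEpoch are hypothetical rather than part of any library.

```java
// Plain-Java sketch of the early stopping idea; not DL4J code.
// The loss values are made-up stand-ins for "evaluate on the test set".
final class EarlyStoppingSketch {

    /**
     * Returns the 0-based index of the evaluated epoch with the lowest test loss,
     * or -1 if no epoch was evaluated. evaluateEveryNEpochs must be at least 1.
     */
    static int bestEpoch(double[] testLossPerEpoch, int evaluateEveryNEpochs) {
        int best = -1;
        double bestLoss = Double.POSITIVE_INFINITY;
        for (int epoch = 0; epoch < testLossPerEpoch.length; epoch++) {
            // Only evaluate at the end of every N-th epoch
            if ((epoch + 1) % evaluateEveryNEpochs != 0) {
                continue;
            }
            double loss = testLossPerEpoch[epoch];
            if (loss < bestLoss) {
                // A real implementation would save a copy of the network here
                bestLoss = loss;
                best = epoch;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Test loss falls, then rises again as the network starts to overfit
        double[] losses = {0.90, 0.60, 0.45, 0.40, 0.43, 0.50};
        System.out.println("Best epoch (0-based): " + bestEpoch(losses, 1));
    }
}
```

The final model is whichever copy was saved last, which is why training can safely continue past the best epoch.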

Using DL4J's early stopping functionality requires you to provide a number of configuration options:

A score calculator, such as DataSetLossCalculator for a MultiLayerNetwork or DataSetLossCalculatorCG for a ComputationGraph. It is used to calculate a score at every evaluation epoch (for example, the loss function value on a test set, or the accuracy on the test set)
How frequently we want to calculate the score function (default: every epoch)
One or more termination conditions, which tell the training process when to stop. There are two classes of termination conditions:
- Epoch termination conditions: evaluated every N epochs
- Iteration termination conditions: evaluated once per minibatch
A model saver, that defines how models are saved

The following example uses an epoch termination condition of at most 30 epochs and an iteration termination condition of at most 20 minutes of training time, calculates the score every epoch, and saves the intermediate results to disk:

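import java.util.concurrent.TimeUnit;
import org.deeplearning4j.earlystopping.EarlyStoppingConfiguration;
import org.deeplearning4j.earlystopping.EarlyStoppingResult;
import org.deeplearning4j.earlystopping.saver.LocalFileModelSaver;
import org.deeplearning4j.earlystopping.scorecalc.DataSetLossCalculator;
import org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition;
import org.deeplearning4j.earlystopping.termination.MaxTimeIterationTerminationCondition;
import org.deeplearning4j.earlystopping.trainer.EarlyStoppingTrainer;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

MultiLayerConfiguration myNetworkConfiguration = ...;
DataSetIterator myTrainData = ...;
DataSetIterator myTestData = ...;
String directory = ...;   //Directory in which intermediate models are saved

EarlyStoppingConfiguration<MultiLayerNetwork> esConf = new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
        .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
        .scoreCalculator(new DataSetLossCalculator(myTestData, true))   //Average loss on the test set
        .evaluateEveryNEpochs(1)
        .modelSaver(new LocalFileModelSaver(directory))
        .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, myNetworkConfiguration, myTrainData);

//Conduct early stopping training:
EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();

//Print out the results:
System.out.println("Termination reason: " + result.getTerminationReason());
System.out.println("Termination details: " + result.getTerminationDetails());
System.out.println("Total epochs: " + result.getTotalEpochs());
System.out.println("Best epoch number: " + result.getBestModelEpoch());
System.out.println("Score at best epoch: " + result.getBestModelScore());

//Get the best model:
MultiLayerNetwork bestModel = result.getBestModel();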

You can also implement your own iteration and epoch termination conditions.
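
As a rough illustration of what a custom epoch termination condition might do, the sketch below stops training once the score has failed to improve for a given number of consecutive epochs. It is plain Java under stated assumptions: EpochCondition is a local stand-in interface defined only for this example, not the DL4J termination-condition interface, and lower scores are assumed to be better. Check the DL4J Javadoc for the actual interface before adapting it.

```java
// Sketch only: the types below are hypothetical stand-ins, not DL4J classes.
final class PatienceSketch {

    /** Local stand-in for an epoch-level termination condition. */
    interface EpochCondition {
        /** Returns true if training should stop after this epoch. Lower scores are better. */
        boolean shouldStop(int epoch, double score);
    }

    /** Stops once the score has failed to improve for `patience` consecutive epochs. */
    static final class PatienceCondition implements EpochCondition {
        private final int patience;
        private double bestScore = Double.POSITIVE_INFINITY;
        private int epochsWithoutImprovement = 0;

        PatienceCondition(int patience) {
            this.patience = patience;
        }

        @Override
        public boolean shouldStop(int epoch, double score) {
            if (score < bestScore) {
                // New best score: reset the counter
                bestScore = score;
                epochsWithoutImprovement = 0;
            } else {
                epochsWithoutImprovement++;
            }
            return epochsWithoutImprovement >= patience;
        }
    }

    public static void main(String[] args) {
        EpochCondition condition = new PatienceCondition(2);
        double[] scores = {0.9, 0.7, 0.8, 0.75, 0.6};
        for (int epoch = 0; epoch < scores.length; epoch++) {
            if (condition.shouldStop(epoch, scores[epoch])) {
                System.out.println("Stopping after epoch " + epoch);
                break;
            }
        }
    }
}
```

Keeping the state (best score, counter) inside the condition object means a fresh instance is needed for each training run.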

Early Stopping w/ Parallel Wrapper

The early stopping implementation described above only works with a single device. However, EarlyStoppingParallelTrainer provides similar functionality and allows you to train on multiple CPUs or GPUs. EarlyStoppingParallelTrainer wraps your model in a ParallelWrapper class and performs localized distributed training.

Note that EarlyStoppingParallelTrainer does not support all of the functionality of its single-device counterpart. It is not UI-compatible and may not work with complex iteration listeners. This is due to how the model is distributed and copied in the background.

Transfer Learning

DL4J’s Transfer Learning API

The DL4J transfer learning API enables users to:

Modify the architecture of an existing model
Fine-tune the learning configuration of an existing model
Hold the parameters of a specified layer constant during training, also referred to as “frozen”

Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, the transformed version being the intermediate outputs at the boundary of the frozen layers. This is the process of “feature extraction” from the input data and will be referred to as “featurizing” in this document.

The transfer learning helper

The forward pass to “featurize” the input data on large, pretrained networks can be time consuming. DL4J therefore also provides a TransferLearningHelper class with the following capabilities:

Featurize an input dataset to save for future use
Fit the model with frozen layers with a featurized dataset
Output from the model with frozen layers given a featurized input.

When running multiple epochs users will save on computation time since the expensive forward pass on the frozen layers/vertices will only have to be conducted once.
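
The saving can be illustrated with a toy plain-Java sketch. It is not the TransferLearningHelper API: the "frozen layers" are a fixed function, the trainable part is a single weight, and all names (FeaturizeOnceSketch, frozenForward, featurize, fitOnFeatures) are hypothetical. The point is only that the frozen forward pass runs once per example, no matter how many epochs follow.

```java
// Toy illustration of "featurize once, train for many epochs"; not DL4J code.
final class FeaturizeOnceSketch {

    /** Counts how often the (pretend) expensive frozen forward pass runs. */
    static int frozenForwardCalls = 0;

    /** Stand-in for the frozen layers: a fixed, non-trainable transformation. */
    static double frozenForward(double input) {
        frozenForwardCalls++;
        return Math.tanh(2.0 * input);
    }

    /** Runs the frozen part once per example and returns the cached features. */
    static double[] featurize(double[] inputs) {
        double[] features = new double[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            features[i] = frozenForward(inputs[i]);
        }
        return features;
    }

    /** Fits one trainable weight on the cached features with gradient descent on squared error. */
    static double fitOnFeatures(double[] features, double[] labels, int epochs, double learningRate) {
        double w = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < features.length; i++) {
                double error = w * features[i] - labels[i];
                w -= learningRate * error * features[i];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] inputs = {-1.0, -0.5, 0.5, 1.0};
        double[] features = featurize(inputs);      // frozen pass happens here, once
        double[] labels = new double[features.length];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = 3.0 * features[i];          // synthetic targets for the demo
        }
        double w = fitOnFeatures(features, labels, 50, 0.5);
        System.out.println("learned w = " + w + ", frozen forward passes = " + frozenForwardCalls);
    }
}
```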

Show me the code

This example will use VGG16 to classify images belonging to five categories of flowers. The dataset will automatically download from

I. Import a zoo model

Deeplearning4j has a new native model zoo. Read about the module for more information on using pretrained models. Here, we load a pretrained VGG-16 model initialized with weights trained on ImageNet:

II. Set up a fine-tune configuration

III. Build new models based on VGG16

A. Modifying only the last layer, keeping the others frozen

The final layer of VGG16 does a softmax regression on the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes, keeping the other layers frozen.

After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy > 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.

B. Attach new layers to the bottleneck (block5_pool)

Here we hold all but the last three dense layers frozen and attach new dense layers onto it. Note that the primary intent here is to demonstrate the use of the API, secondary to what might give better results.

C. Fine tune layers from a previously saved model

Say we have saved off our model from (B) and now want to allow “block_5” layers to train.

IV. Saving “featurized” datasets and training with them.

We use the transfer learning helper API. Note this freezes the layers of the model passed in.

Here is how you obtain the featurized version of the dataset at the specified layer “fc2”.

Here is how you can fit with a featurized dataset. vgg16Transfer is the model set up in (A) of section III.

Notes

The TransferLearning builder returns a new instance of a dl4j model.

Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take memory requirements into consideration and adjust your JVM heap space accordingly.

The trained model helper imports models from Keras without enforcing a training configuration.

Therefore the last layer (as seen when printing the summary) is a dense layer and not an output layer with a loss function. To modify nOut of an output layer, we therefore delete the layer vertex, keeping its connections, and add back a new output layer with the same name, a different nOut, a suitable loss function, and so on.

Changing nOuts at a layer/vertex will modify nIn of the layers/vertices it fans into.

When changing nOut users can specify a weight initialization scheme or a distribution for the layer as well as a separate weight initialization scheme or distribution for the layers it fans out to.

Frozen layer configurations are not saved when writing the model to disk.

In other words, a model with frozen layers when serialized and read back in will not have any frozen layers. To continue training holding specific layers constant the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to “freeze” layers in a dl4j model.

On a copy: With the transfer learning API which will return a new model with the relevant frozen layers
In place: With the transfer learning helper API which will apply the frozen layers to the given model.

FineTune configurations will selectively update learning parameters.

For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.
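
The override rule can be pictured with a small plain-Java sketch. It is a simplification under stated assumptions, not DL4J code: a frozen layer is modelled as having a learning rate of 0.0, a layer without its own setting is represented by null, and the class and method names are hypothetical.

```java
// Sketch of how a fine-tune learning rate could be resolved per layer; not DL4J code.
final class FineTuneLearningRateSketch {

    /**
     * @param frozen        whether the layer's parameters are held constant
     * @param layerOverride learning rate set on a newly added layer, or null if none was set
     * @param fineTuneRate  learning rate from the fine-tune configuration
     * @return the rate used for updates; 0.0 models "no updates" for frozen layers
     */
    static double effectiveLearningRate(boolean frozen, Double layerOverride, double fineTuneRate) {
        if (frozen) {
            return 0.0;
        }
        return layerOverride != null ? layerOverride.doubleValue() : fineTuneRate;
    }

    public static void main(String[] args) {
        double fineTuneRate = 1e-4;
        System.out.println("frozen layer:      " + effectiveLearningRate(true, null, fineTuneRate));
        System.out.println("unfrozen existing: " + effectiveLearningRate(false, null, fineTuneRate));
        System.out.println("new layer (0.01):  " + effectiveLearningRate(false, 0.01, fineTuneRate));
    }
}
```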

Utilities

Reference

Model Zoo

Prebuilt model architectures and weights for out-of-the-box application.

Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.
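
To give a feel for what a checksum-based integrity check involves, here is a minimal plain-Java sketch. The choice of algorithm is an assumption: Adler-32 from java.util.zip is used purely as an example, and the zoo's actual mechanism may differ. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Sketch of a checksum-based integrity check; the algorithm choice is an assumption.
final class ChecksumSketch {

    /** Computes the Adler-32 checksum of the given bytes. */
    static long adler32(byte[] data) {
        Adler32 checksum = new Adler32();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }

    /** Returns true if the data matches the checksum published alongside it. */
    static boolean isIntact(byte[] data, long expectedChecksum) {
        return adler32(data) == expectedChecksum;
    }

    public static void main(String[] args) {
        byte[] downloaded = "pretend these are weights".getBytes(StandardCharsets.UTF_8);
        long expected = adler32(downloaded);   // in practice this value ships with the model
        downloaded[0] ^= 1;                    // simulate a corrupted download
        System.out.println("intact after corruption? " + isIntact(downloaded, expected));
    }
}
```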

If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:

Getting started

Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel abstract class and implements the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.

Initializing fresh configurations

You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:

If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:

Initializing pretrained weights

Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enum that lists the available weight types, including IMAGENET, MNIST, CIFAR10, and VGGFACE.

For example, you can initialize a VGG-16 model with ImageNet weights like so:

And initialize another VGG16 model with weights trained on VGGFace:

If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.

Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.
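
The convention can be made concrete with a short plain-Java sketch. It assumes a row-major (C-order) flat buffer and uses hypothetical names (NchwSketch, elementsPerExample, offset); it does not use the ND4J API.

```java
// Sketch of NCHW indexing for a row-major (C-order) flat buffer; not DL4J code.
final class NchwSketch {

    /** Number of values in one example given its [channels, height, width] shape. */
    static int elementsPerExample(int[] chw) {
        return chw[0] * chw[1] * chw[2];
    }

    /** Flat offset of element (n, c, h, w) in a batch with shape [N, C, H, W]. */
    static int offset(int[] nchw, int n, int c, int h, int w) {
        int channels = nchw[1];
        int height = nchw[2];
        int width = nchw[3];
        return ((n * channels + c) * height + h) * width + w;
    }

    public static void main(String[] args) {
        int[] inputShape = {3, 224, 224};       // channels, height, width
        int[] batchShape = {8, 3, 224, 224};    // a batch of 8 such images
        System.out.println("values per image: " + elementsPerExample(inputShape));
        System.out.println("offset of image 1, channel 0, pixel (0, 0): "
                + offset(batchShape, 1, 0, 0, 0));
    }
}
```

Because the width index varies fastest, all pixels of one channel are stored contiguously before the next channel begins.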

What's in the zoo?

The model zoo comes with image recognition configurations that are well known in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.

You can find a complete list of models using this .

This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.

Advanced usage

The zoo comes with a couple of additional features if you're looking to use the models for different use cases.

Changing Inputs

Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape().

NOTE: this applies to fresh configurations only, and will not affect pretrained models:

Transfer Learning

Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J .

Workspaces

Initialization methods often have an additional parameter named workspaceMode. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see .

Convolutional Layers

KerasConvolution2D

Imports a 2D Convolution layer from Keras.

KerasConvolution2D

public KerasConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

getConvolution2DLayer

public ConvolutionLayer getConvolution2DLayer()

Get DL4J ConvolutionLayer.

return ConvolutionLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping2D

Imports a Keras Cropping 2D layer.

KerasCropping2D

public KerasCropping2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping2DLayer

public Cropping2D getCropping2DLayer()

Get DL4J Cropping2D layer.

return Cropping2D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasUpsampling3D

Keras Upsampling3D layer support

KerasUpsampling3D

public KerasUpsampling3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras configuration exception
throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling3DLayer

public Upsampling3D getUpsampling3DLayer()

Get DL4J Upsampling3D layer.

return Upsampling3D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolution1D

Imports a 1D Convolution layer from Keras.

KerasConvolution1D

public KerasConvolution1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

getConvolution1DLayer

public Convolution1DLayer getConvolution1DLayer()

Get DL4J Convolution1DLayer.

return Convolution1DLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

param inputType Array of InputTypes
return DL4J InputPreProcessor
throws InvalidKerasConfigurationException Invalid Keras configuration exception
see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

param weights Map from parameter name to INDArray.

KerasUpsampling1D

Keras Upsampling1D layer support

KerasUpsampling1D

public KerasUpsampling1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras configuration exception
throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling1DLayer

public Upsampling1D getUpsampling1DLayer()

Get DL4J Upsampling1D layer.

return Upsampling1D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasAtrousConvolution2D

Keras 2D atrous / dilated convolution layer. Note that in Keras 2 this layer has been removed and dilations are now available through the “dilation_rate” argument in regular Conv2D layers.

author: Max Pumperla

KerasAtrousConvolution2D

public KerasAtrousConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

getAtrousConvolution2D

public ConvolutionLayer getAtrousConvolution2D()

Get DL4J ConvolutionLayer.

return ConvolutionLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasAtrousConvolution1D

Keras 1D atrous / dilated convolution layer. Note that in Keras 2 this layer has been removed and dilations are now available through the “dilation_rate” argument in regular Conv1D layers.

author: Max Pumperla

KerasAtrousConvolution1D

public KerasAtrousConvolution1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

getAtrousConvolution1D

public Convolution1DLayer getAtrousConvolution1D()

Get DL4J Convolution1DLayer.

return Convolution1DLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping3D

Imports a Keras Cropping 3D layer.

KerasCropping3D

public KerasCropping3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping3DLayer

public Cropping3D getCropping3DLayer()

Get DL4J Cropping3D layer.

return Cropping3D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding2D

Imports a Keras ZeroPadding 2D layer.

KerasZeroPadding2D

public KerasZeroPadding2D(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding2DLayer

public ZeroPaddingLayer getZeroPadding2DLayer()

Get DL4J ZeroPaddingLayer.

return ZeroPaddingLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolution3D

Imports a 3D Convolution layer from Keras.

KerasConvolution3D

public KerasConvolution3D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

getConvolution3DLayer

public ConvolutionLayer getConvolution3DLayer()

Get DL4J ConvolutionLayer.

return ConvolutionLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasDeconvolution2D

Imports a 2D Deconvolution layer from Keras.

KerasDeconvolution2D

public KerasDeconvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras config

getDeconvolution2DLayer

public Deconvolution2D getDeconvolution2DLayer()

Get DL4J Deconvolution2D layer.

return Deconvolution2D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding3D

Imports a Keras ZeroPadding 3D layer.

KerasZeroPadding3D

public KerasZeroPadding3D(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding3DLayer

public ZeroPadding3DLayer getZeroPadding3DLayer()

Get DL4J ZeroPadding3DLayer.

return ZeroPadding3DLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolutionUtils

Utility functionality for Keras convolution layers.

getConvolutionModeFromConfig

public static ConvolutionMode getConvolutionModeFromConfig(Map<String, Object> layerConfig,
                                                               KerasLayerConfiguration conf)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

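Get convolution mode from Keras layer configuration.

param layerConfig dictionary containing Keras layer configuration
param conf Keras layer configuration (KerasLayerConfiguration)
return ConvolutionMode from Keras configuration
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config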

KerasZeroPadding1D

Imports a Keras ZeroPadding 1D layer.

KerasZeroPadding1D

public KerasZeroPadding1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding1DLayer

public ZeroPadding1DLayer getZeroPadding1DLayer()

Get DL4J ZeroPadding1DLayer.

return ZeroPadding1DLayer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping1D

Imports a Keras Cropping 1D layer.

KerasCropping1D

public KerasCropping1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras config
throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping1DLayer

public Cropping1D getCropping1DLayer()

Get DL4J Cropping1D layer.

return Cropping1D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasSpaceToDepth

Keras SpaceToDepth layer support

KerasSpaceToDepth

public KerasSpaceToDepth(Map<String, Object> layerConfig, boolean enforceTrainingConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration
param enforceTrainingConfig whether to enforce training-related configuration options
throws InvalidKerasConfigurationException Invalid Keras configuration exception
throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getSpaceToDepthLayer

public SpaceToDepthLayer getSpaceToDepthLayer()

Get DL4J SpaceToDepth layer.

return SpaceToDepth layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasUpsampling2D

Keras Upsampling2D layer support

KerasUpsampling2D

public KerasUpsampling2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

param layerConfig dictionary containing Keras layer configuration.
throws InvalidKerasConfigurationException Invalid Keras configuration exception
throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling2DLayer

public Upsampling2D getUpsampling2DLayer()

Get DL4J Upsampling2D layer.

return Upsampling2D layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasSeparableConvolution2D

Keras separable convolution 2D layer support

KerasSeparableConvolution2D

public KerasSeparableConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras configuration

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

param weights Map from parameter name to INDArray.
throws InvalidKerasConfigurationException Invalid Keras configuration

getSeparableConvolution2DLayer

public SeparableConvolution2D getSeparableConvolution2DLayer()

Get DL4J SeparableConvolution2D.

return SeparableConvolution2D

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

KerasDepthwiseConvolution2D

[source]

Keras depth-wise convolution 2D layer support

KerasDepthwiseConvolution2D

public KerasDepthwiseConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

param kerasVersion major keras version
throws UnsupportedKerasConfigurationException Unsupported Keras configuration

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

param weights Map of weights
throws InvalidKerasConfigurationException Invalid Keras configuration

getDepthwiseConvolution2DLayer

public DepthwiseConvolution2D getDepthwiseConvolution2DLayer()

Get DL4J DepthwiseConvolution2D.

return DepthwiseConvolution2D

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

param inputType Array of InputTypes
return output type as InputType
throws InvalidKerasConfigurationException Invalid Keras config

Layers

Supported neural network layers.

What are layers?

Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they form a deep neural network.

Using layers

All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork or ComputationGraph. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.

Layers vs. vertices

If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph API and join different branches together using vertices. See the vertices documentation for more information.

General layers

ActivationLayer

[source]

Activation layer is a simple layer that applies the specified activation function to the input activations

clone

public ActivationLayer clone()

param activation Activation function for the layer

activation

public Builder activation(String activationFunction)

Activation function for the layer

activation

public Builder activation(IActivation activationFunction)

param activationFunction Activation function for the layer

activation

public Builder activation(Activation activation)

param activation Activation function for the layer

DenseLayer

[source]

Dense layer: a standard fully connected feed forward layer

hasBias

public Builder hasBias(boolean hasBias)

If true (default): include bias parameters in the model. False: no bias.

hasLayerNorm

public Builder hasLayerNorm(boolean hasLayerNorm)

If true (default = false): enable layer normalization on this layer

DropoutLayer

[source]

Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test time.

build

public DropoutLayer build()

Create a dropout layer with standard Dropout, with the specified probability of retaining the input activation. See Dropout for the full details

param dropout Activation retain probability.

EmbeddingLayer

[source]

Embedding layer: feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses - 1). This input has shape [numExamples, 1] instead of [numExamples, numClasses] for the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero).

Note: can only be used as the first layer for a network.
Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each example.

Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default).
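The equivalence between an embedding lookup and a dense layer with one-hot input can be checked with plain arrays. The following is a minimal sketch in standard Java only, not DL4J code: the class and method names are hypothetical, and the bias and activation function are left out so that only the weight lookup is compared.

```java
import java.util.Arrays;

final class EmbeddingLookupSketch {

    // Embedding view: the output for class index i is simply row i of the weight matrix.
    static double[] lookup(double[][] weights, int index) {
        return weights[index].clone();
    }

    // Dense view: build the one-hot row vector and multiply it by the full weight matrix.
    static double[] oneHotTimesWeights(double[][] weights, int index) {
        int numClasses = weights.length;
        int vectorSize = weights[0].length;
        double[] oneHot = new double[numClasses];
        oneHot[index] = 1.0;
        double[] out = new double[vectorSize];
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < vectorSize; j++) {
                out[j] += oneHot[c] * weights[c][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical weights with shape [numClasses, vectorSize] = [3, 2]
        double[][] weights = { {0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6} };
        System.out.println(Arrays.toString(lookup(weights, 2)));
        System.out.println(Arrays.toString(oneHotTimesWeights(weights, 2)));
    }
}
```

Both methods return the same vector, but the lookup touches one row while the one-hot product visits every entry of the weight matrix. That difference is the efficiency argument made above for a large number of classes.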

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

param vectors Vectors to initialize the embedding layer with

EmbeddingSequenceLayer

[source]

Embedding layer for sequences: feed-forward layer that expects fixed-length number (inputLength) of integers/indices per example as input, ranged from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength].

Note: can only be used as the first layer for a network.
Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index.

Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default).

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

inputLength

public Builder inputLength(int inputLength)

Set input sequence length for this embedding layer.

param inputLength input sequence length
return Builder

inferInputLength

public Builder inferInputLength(boolean inferInputLength)

Set input sequence inference mode for embedding layer.

param inferInputLength whether to infer input length
return Builder

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

param vectors Vectors to initialize the embedding layer with

GlobalPoolingLayer

[source]

Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: SUM, AVG, MAX, PNORM.

Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass:

Time series: mask arrays are shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only
CNNs: masks have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.

Behaviour with default settings:

3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s: this gives

[miniBatchSize, vectorSize, 1] for RNN output,
[miniBatchSize, channels, 1, 1] for CNN output, and
[miniBatchSize, channels, 1, 1, 1] for CNN3D output.
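The shape rules above can be summarised as a small function. This is a sketch in standard Java only, with hypothetical class and method names. It assumes the default behaviour described above, where every dimension after the first two is pooled, and does not model custom pooling dimensions or masking.

```java
import java.util.Arrays;

final class GlobalPoolingShapeSketch {

    // inputShape is [miniBatchSize, size, d1, ..., dk] with rank 3 (RNN), 4 (CNN) or 5 (CNN3D).
    static long[] outputShape(long[] inputShape, boolean collapseDimensions) {
        if (inputShape.length < 3 || inputShape.length > 5) {
            throw new IllegalArgumentException("Expected rank 3, 4 or 5 input, got rank " + inputShape.length);
        }
        if (collapseDimensions) {
            // Pooled dimensions are dropped: the result is always 2d
            return new long[] { inputShape[0], inputShape[1] };
        }
        // Pooled dimensions are kept with size 1: the rank is unchanged
        long[] out = new long[inputShape.length];
        Arrays.fill(out, 1L);
        out[0] = inputShape[0];
        out[1] = inputShape[1];
        return out;
    }

    public static void main(String[] args) {
        long[] cnnInput = { 32, 3, 28, 28 };
        System.out.println(Arrays.toString(outputShape(cnnInput, true)));
        System.out.println(Arrays.toString(outputShape(cnnInput, false)));
    }
}
```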

poolingDimensions

public Builder poolingDimensions(int... poolingDimensions)

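Pooling dimensions. Most of the time this does not need to be set, and the defaults can be used. Default for RNN data: pooling dimension 2 (time). Default for CNN data: pooling dimensions 2,3 (height and width). Default for CNN3D data: pooling dimensions 2,3,4 (depth, height and width).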

poolingType

public Builder poolingType(PoolingType poolingType)

param poolingType Pooling type for global pooling

collapseDimensions

public Builder collapseDimensions(boolean collapseDimensions)

Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true. If true:

3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

If false:

3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1]
4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 4d output [miniBatchSize, channels, 1, 1]
5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 5d output [miniBatchSize, channels, 1, 1, 1]

param collapseDimensions Whether to collapse the dimensions or not

pnorm

public Builder pnorm(int pnorm)

P-norm constant. Only used if using PoolingType.PNORM for the pooling type

param pnorm P-norm constant

LocalResponseNormalization

[source]

Local response normalization layer. See section 3.3 of http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
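To make the roles of the constants k, n, alpha and beta concrete, here is a minimal sketch of the normalization formula from section 3.3 of the paper cited above, written in standard Java. It illustrates the published formula and is not DL4J's implementation, which may differ in details. The class and method names are hypothetical, and n is taken as an int here for simplicity.

```java
final class LrnSketch {

    // activations[i] is the activation of channel (kernel map) i at one fixed spatial position.
    // Each value is divided by (k + alpha * sum of squares over up to n adjacent channels)^beta.
    static double[] normalize(double[] activations, double k, int n, double alpha, double beta) {
        int channels = activations.length;
        double[] out = new double[channels];
        for (int i = 0; i < channels; i++) {
            // Window of adjacent channels, clipped at the first and last channel
            int from = Math.max(0, i - n / 2);
            int to = Math.min(channels - 1, i + n / 2);
            double sumSquares = 0.0;
            for (int j = from; j <= to; j++) {
                sumSquares += activations[j] * activations[j];
            }
            out[i] = activations[i] / Math.pow(k + alpha * sumSquares, beta);
        }
        return out;
    }

    public static void main(String[] args) {
        // Constants set to the defaults listed in this section: k = 2, n = 5, alpha = 1e-4, beta = 0.75
        double[] normalized = normalize(new double[] { 1.0, 2.0, 3.0, 4.0 }, 2.0, 5, 1e-4, 0.75);
        for (double v : normalized) {
            System.out.println(v);
        }
    }
}
```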

k

public Builder k(double k)

LRN scaling constant k. Default: 2

n

public Builder n(double n)

Number of adjacent kernel maps to use when doing LRN. default: 5

param n Number of adjacent kernel maps

alpha

public Builder alpha(double alpha)

LRN scaling constant alpha. Default: 1e-4

param alpha Scaling constant

beta

public Builder beta(double beta)

Scaling constant beta. Default: 0.75

param beta Scaling constant

cudnnAllowFallback

public Builder cudnnAllowFallback(boolean allowFallback)

When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception in CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for LocalResponseNormalization will be used

param allowFallback Whether fallback to non-CuDNN implementation should be used

LocallyConnected1D

[source]

SameDiff version of a 1D locally connected layer.

nIn

public Builder nIn(int nIn)

Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)

param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)

param activation Activation function for the layer

kernelSize

public Builder kernelSize(int k)

param k Kernel size for the layer

stride

public Builder stride(int s)

param s Stride for the layer

padding

public Builder padding(int p)

param p Padding for the layer. Not used if ConvolutionMode.Same is set

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)

param cm Convolution mode for the layer. See ConvolutionMode for details

dilation

public Builder dilation(int d)

param d Dilation for the layer

hasBias

public Builder hasBias(boolean hasBias)

param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int inputSize)

Set input filter size for this locally connected 1D layer

param inputSize height of the input filters
return Builder

LocallyConnected2D

[source]

SameDiff version of a 2D locally connected layer.

setKernel

public void setKernel(int... kernel)

param kernel Kernel size for the layer. Must be 2 values (height/width)

setStride

public void setStride(int... stride)

param stride Stride for the layer. Must be 2 values (height/width)

setPadding

public void setPadding(int... padding)

param padding Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

setDilation

public void setDilation(int... dilation)

param dilation Dilation for the layer. Must be 2 values (height/width)

nIn

public Builder nIn(int nIn)

param nIn Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)

param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)

param activation Activation function for the layer

kernelSize

public Builder kernelSize(int... k)

param k Kernel size for the layer. Must be 2 values (height/width)

stride

public Builder stride(int... s)

param s Stride for the layer. Must be 2 values (height/width)

padding

public Builder padding(int... p)

param p Padding for the layer. Not used if ConvolutionMode.Same is set. Must be 2 values (height/width)

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)

param cm Convolution mode for the layer. See ConvolutionMode for details

dilation

public Builder dilation(int... d)

param d Dilation for the layer. Must be 2 values (height/width)

hasBias

public Builder hasBias(boolean hasBias)

param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int... inputSize)

Set input filter size (h,w) for this locally connected 2D layer

param inputSize pair of height and width of the input filters to this layer
return Builder

LossLayer

[source]

LossLayer is a flexible output layer that performs a loss function on an input without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn’t supported - the output size is the same size as the input activations.

nIn

public Builder nIn(int nIn)

param lossFunction Loss function for the loss layer

OutputLayer

[source]

Output layer used for training via backpropagation based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it contains a fully-connected layer (effectively contains a DenseLayer) internally. This allows the output size to be different to the layer input size.

build

public OutputLayer build()

param lossFunction Loss function for the output layer

Pooling1D

[source]

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Pooling2D

[source]

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Subsampling1DLayer

[source]

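1D (temporal) subsampling layer, also known as a pooling layer. Expects input of shape [minibatch, nIn, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.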

Supports the following pooling types: MAX, AVG, SUM, PNORM

setKernelSize

public void setKernelSize(int... kernelSize)

Kernel size

param kernelSize kernel size

setStride

public void setStride(int... stride)

Stride

param stride stride value

setPadding

public void setPadding(int... padding)

Padding

param padding padding value

Upsampling1D

[source]

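Upsampling 1D layer. Repeats each step size times along the sequence axis: for input of shape [minibatch, channels, sequenceLength], the output has shape [minibatch, channels, size * sequenceLength]. Example: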

If input (for a single example, with channels down page, and sequence from left to right) is:
[ A1, A2, A3]
[ B1, B2, B3]
Then output with size = 2 is:
[ A1, A1, A2, A2, A3, A3]
[ B1, B1, B2, B2, B3, B3]
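The repetition pattern in this example can be reproduced with a few lines of standard Java. This is only an illustrative sketch with hypothetical class and method names: it works on labels for a single example laid out as [channels][sequenceLength], whereas the real layer operates on numeric activations.

```java
final class Upsampling1DSketch {

    // input is [channels][sequenceLength] for one example; each step is repeated `size` times.
    static String[][] upsample(String[][] input, int size) {
        String[][] out = new String[input.length][];
        for (int c = 0; c < input.length; c++) {
            out[c] = new String[input[c].length * size];
            for (int t = 0; t < out[c].length; t++) {
                // Output step t is copied from input step t / size (integer division)
                out[c][t] = input[c][t / size];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] input = { {"A1", "A2", "A3"}, {"B1", "B2", "B3"} };
        for (String[] row : upsample(input, 2)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```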

size

public Builder size(int size)

Upsampling size

param size upsampling size in single spatial dimension of this 1D layer

size

public Builder size(int[] size)

Upsampling size int array with a single element. Array must be length 1

param size upsampling size in single spatial dimension of this 1D layer

Upsampling2D

[source]

Upsampling 2D layer. Repeats each value (or rather, set of depth values) in the height and width dimensions by size[0] and size[1] times respectively. Example:

Input (slice for one example and channel)
[ A, B ]
[ C, D ]
Size = [2, 2]
Output (slice for one example and channel)
[ A, A, B, B ]
[ A, A, B, B ]
[ C, C, D, D ]
[ C, C, D, D ]
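The 2D case follows the same idea in both spatial dimensions. The sketch below, in standard Java with hypothetical names, reproduces the slice shown above for one example and one channel. It is an illustration of the index mapping, not the layer implementation.

```java
final class Upsampling2DSketch {

    // input is a [height][width] slice; rows are repeated sizeH times and columns sizeW times.
    static String[][] upsample(String[][] input, int sizeH, int sizeW) {
        int height = input.length;
        int width = input[0].length;
        String[][] out = new String[height * sizeH][width * sizeW];
        for (int y = 0; y < out.length; y++) {
            for (int x = 0; x < out[y].length; x++) {
                // Each output cell maps back to the input cell it was repeated from
                out[y][x] = input[y / sizeH][x / sizeW];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] input = { {"A", "B"}, {"C", "D"} };
        for (String[] row : upsample(input, 2, 2)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```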

size

public Builder size(int size)

Upsampling size int, used for both height and width

param size upsampling size in height and width dimensions

size

public Builder size(int[] size)

Upsampling size array

param size upsampling size in height and width dimensions

Upsampling3D

[source]

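Upsampling 3D layer. Repeats each value (all channel values for each x/y/z location) by size[0], size[1] and size[2]. If the input has shape [minibatch, channels, depth, height, width], then the output has shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].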

size

public Builder size(int size)

Upsampling size as int, so same upsampling size is used for depth, width and height

param size upsampling size in height, width and depth dimensions

size

public Builder size(int[] size)

Upsampling size as an int array, allowing a different upsampling size for each of the depth, height and width dimensions

param size upsampling size in height, width and depth dimensions

ZeroPadding1DLayer

[source]

Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for left and right.

setPadding

public void setPadding(int... padding)

Padding value for left and right. Must be length 2 array

build

public ZeroPadding1DLayer build()

param padding Padding for both the left and right

ZeroPadding3DLayer

[source]

Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.

setPadding

public void setPadding(int... padding)

Padding values, in the order [padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]

build

public ZeroPadding3DLayer build()

param padding Padding for both the left and right in all three spatial dimensions

ZeroPaddingLayer

[source]

Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right

setPadding

public void setPadding(int... padding)

Padding value for top, bottom, left, and right. Must be length 4 array

build

public ZeroPaddingLayer build()

param padHeight Padding for both the top and bottom
param padWidth Padding for both the left and right

ElementWiseMultiplicationLayer

[source]

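Elementwise multiplication layer with weights: implements out = activationFn(input .* w + b), where:

w is a learnable weight vector of length nOut
“.*” is element-wise multiplication
b is a bias vector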

Note that the input and output sizes of the element-wise layer are the same for this layer

created by jingshu

getMemoryReport

public LayerMemoryReport getMemoryReport(InputType inputType)

This is a report of the estimated memory consumption for the given layer

param inputType Input type to the layer. Memory consumption is often a function of the input type
return Memory report for the layer

RepeatVector

[source]

RepeatVector layer configuration.

RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which x is repeated n times.

getRepetitionFactor

public int getRepetitionFactor()

Get repetition factor for RepeatVector layer

setRepetitionFactor

public void setRepetitionFactor(int n)

Set repetition factor for RepeatVector layer

param n number of times to repeat the input vector

repetitionFactor

public Builder repetitionFactor(int n)

Set repetition factor for RepeatVector layer

param n number of times to repeat the input vector

Yolo2OutputLayer

[source]

Output (loss) layer for YOLOv2 object detection model, based on the papers:

YOLO9000: Better, Faster, Stronger - Redmon & Farhadi (2016) - https://arxiv.org/abs/1612.08242
You Only Look Once: Unified, Real-Time Object Detection - Redmon et al. (2016) - http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf

This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.

Note: Input activations to the Yolo2OutputLayer should have shape [minibatch, b(5+c), H, W], where:

b = number of bounding boxes (determined by config - see papers for details)
c = number of classes
H = output/label height
W = output/label width

Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have output depth of b(5+c). Thus if you change the number of bounding boxes, or change the number of object classes, the number of channels (nOut of the last convolution layer) needs to also change.

Label format: [minibatch, 4+C, H, W]
Order for labels depth: [x1,y1,x2,y2,(class labels)]

x1 = box top left position
y1 = as above, y axis
x2 = box bottom right position
y2 = as above, y axis

Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left, (13,13) is bottom right.

Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
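Because the channel count of the last convolutional layer has to track both the number of bounding boxes and the number of classes, it can help to compute it rather than hard-code it. The following is a small sketch of that arithmetic in standard Java. The class and method names are hypothetical, and the values 5 and 20 in main are example inputs only.

```java
final class Yolo2DepthSketch {

    // Depth (nOut) the last convolutional layer needs: b(5+c)
    static int requiredDepth(int numBoundingBoxes, int numClasses) {
        if (numBoundingBoxes <= 0 || numClasses <= 0) {
            throw new IllegalArgumentException("Number of bounding boxes and classes must be positive");
        }
        return numBoundingBoxes * (5 + numClasses);
    }

    // Depth of the label array: 4 box coordinates plus one entry per class
    static int labelDepth(int numClasses) {
        return 4 + numClasses;
    }

    public static void main(String[] args) {
        int boxes = 5;      // example value
        int classes = 20;   // example value
        System.out.println("nOut of last convolution layer: " + requiredDepth(boxes, classes));
        System.out.println("label depth: " + labelDepth(classes));
    }
}
```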

lambdaCoord

public Builder lambdaCoord(double lambdaCoord)

Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5

lambbaNoObj

public Builder lambbaNoObj(double lambdaNoObj)

Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5

param lambdaNoObj Lambda value for no-object (confidence) component of the loss function

lossPositionScale

public Builder lossPositionScale(ILossFunction lossPositionScale)

Loss function for position/scale component of the loss function

param lossPositionScale Loss function for position/scale

lossClassPredictions

public Builder lossClassPredictions(ILossFunction lossClassPredictions)

Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper), however Loss MCXENT could also be used (which is more common for classification).

param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function

boundingBoxPriors

public Builder boundingBoxPriors(INDArray boundingBoxes)

Bounding box priors dimensions [width, height]. For N bounding boxes, input has shape [rows, columns] = [N, 2]. Note that dimensions should be specified as fraction of grid size. For example, for a network with 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.

param boundingBoxes Bounding box prior dimensions (width, height)

MaskLayer

[source]

MaskLayer applies the mask array to the forward pass activations, and backward pass gradients, passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.

MaskZeroLayer

[source]

Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].

Performance Issues

How to Debug Performance Issues

This page is a how-to guide for debugging performance issues encountered when training neural networks with Deeplearning4j. Much of the information also applies to debugging performance issues encountered when using ND4J.

Deeplearning4j and ND4J provide excellent performance in most cases (utilizing optimized C++ code for all numerical operations as well as high performance libraries such as NVIDIA cuDNN and Intel MKL). However, sometimes bottlenecks or misconfiguration issues may limit performance to well below the maximum. This page is intended to be a guide to help users identify the cause of poor performance, and provide steps to fix these issues.

Performance issues may include:

Poor CPU/GPU utilization
Slower than expected training or operation execution

To start, here’s a summary of some possible causes of performance issues:

Wrong ND4J backend is used (for example, CPU backend when GPU backend is expected)
Not using cuDNN when using CUDA GPUs
ETL (data loading) bottlenecks
Garbage collection overheads
Small batch sizes
Multi-threaded use of MultiLayerNetwork/ComputationGraph for inference (not thread safe)
Double precision floating point data type used when single precision should be used
Not using workspaces for memory management (enabled by default)
Poorly configured network
Layer or operation is CPU-only
CPU: Lack of hardware support for modern extensions such as AVX
Other processes using CPU or GPU resources
CPU: Lack of configuration of OMP_NUM_THREADS when using many models/threads simultaneously

Finally, this page has a short section on Debugging Performance Issues with JVM Profiling.

Step 1: Check if correct backend is used

ND4J (and by extension, Deeplearning4j) can perform computation on either the CPU or GPU. The device used for computation is determined by your project dependencies - you include nd4j-native-platform to use CPUs for computation or nd4j-cuda-x.x-platform to use GPUs for computation (where x.x is your CUDA version - such as 9.2, 10.0 etc).

It is straightforward to check which backend is used. ND4J will log the backend upon initialization.

For CPU execution, you will expect output that looks something like:

o.n.l.f.Nd4jBackend - Loaded [CpuBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for NativeOps: 8
o.n.n.Nd4jBlas - Number of threads used for BLAS: 8
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Windows 10]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [MKL]

For CUDA execution, you would expect the output to look something like:

13:08:09,042 INFO  ~ Loaded [JCublasBackend] backend
13:08:13,061 INFO  ~ Number of threads used for NativeOps: 32
13:08:14,265 INFO  ~ Number of threads used for BLAS: 0
13:08:14,274 INFO  ~ Backend used: [CUDA]; OS: [Windows 10]
13:08:14,274 INFO  ~ Cores: [16]; Memory: [7.1GB];
13:08:14,274 INFO  ~ Blas vendor: [CUBLAS]
13:08:14,274 INFO  ~ Device Name: [TITAN X (Pascal)]; CC: [6.1]; Total/free memory: [12884901888]

Pay attention to the Loaded [X] backend and Backend used: [X] messages to confirm that the correct backend is used. If the incorrect backend is being used, check your program dependencies to ensure the correct backend has been included.

Step 2: Check for cuDNN

If you are using CPUs only (nd4j-native backend) then you can skip to step 3 as cuDNN only applies when using NVIDIA GPUs (nd4j-cuda-x.x-platform dependency).

cuDNN is NVIDIA’s library for accelerating neural network training on NVIDIA GPUs. Deeplearning4j can make use of cuDNN to accelerate a number of layers - including ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM. When training on GPUs, cuDNN should always be used if possible as it is usually much faster than the built-in layer implementations.

Instructions for configuring CuDNN can be found here. In summary, include the deeplearning4j-cuda-x.x dependency (where x.x is your CUDA version - such as 9.2 or 10.0). The network configuration does not need to change to utilize cuDNN - cuDNN simply needs to be available along with the deeplearning4j-cuda module.

How to determine if CuDNN is used or not

Not all DL4J layer types are supported in cuDNN. DL4J layers with cuDNN support include ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM.

To check if cuDNN is being used, the simplest approach is to look at the log output when running inference or training: If cuDNN is NOT available when you are using a layer that supports it, you will see a message such as:

o.d.n.l.c.ConvolutionLayer - cuDNN not found: use cuDNN for better GPU performance by including the deeplearning4j-cuda module. For more information, please refer to: https://deeplearning4j.org/cudnn
java.lang.ClassNotFoundException: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)

If cuDNN is available and was loaded successfully, no message will be logged.

Alternatively, you can confirm that cuDNN is used by using the following code:

MultiLayerNetwork net = ...
LayerHelper h = net.getLayer(0).getHelper();    //Index 0: assume layer 0 is a ConvolutionLayer in this example
System.out.println("Layer helper: " + (h == null ? null : h.getClass().getName()));

Note that you will need to do at least one forward pass or fit call to initialize the cuDNN layer helper.

If cuDNN is available and was loaded successfully, you will see the following printed:

Layer helper: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper

If cuDNN is not available or could not be loaded successfully, you will instead see the following (a warning or error will also be logged):

Layer helper: null

Step 3: Check for ETL (Data Loading) Bottlenecks

Neural network training requires data to be in memory before training can proceed. If the data is not loaded fast enough, the network will have to wait until data is available. DL4J uses asynchronous prefetch of data to improve performance by default. Under normal circumstances, this asynchronous prefetching means the network should never be waiting around for data (except on the very first iteration) - the next minibatch is loaded in another thread while training is proceeding in the main thread.

However, when data loading takes longer than the iteration time, data can be a bottleneck. For example, if a network takes 100ms to perform fitting on a single minibatch, but data loading takes 200ms, then we have a bottleneck: the network will have to wait 100ms per iteration (200ms of loading, minus the 100ms of loading that overlaps with training) before continuing with the next iteration. Conversely, if the network fit operation took 100ms and data loading took 50ms, then no data loading bottleneck will occur, as the 50ms loading time can be completed asynchronously within one iteration.
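The arithmetic in this example generalises to a simple rule: with asynchronous prefetch, the network waits only for the part of the loading time that does not overlap with fitting. Here is a minimal sketch in standard Java, with hypothetical class and method names, assuming a single prefetch thread and constant per-batch times.

```java
final class EtlBottleneckSketch {

    // Time the network spends waiting for data on each iteration, in milliseconds
    static long waitPerIterationMs(long fitTimeMs, long loadTimeMs) {
        return Math.max(0L, loadTimeMs - fitTimeMs);
    }

    // Effective time per iteration: whichever of fitting and loading is slower
    static long effectiveIterationMs(long fitTimeMs, long loadTimeMs) {
        return Math.max(fitTimeMs, loadTimeMs);
    }

    public static void main(String[] args) {
        // The two cases from the text: 100ms fit with 200ms load, and 100ms fit with 50ms load
        System.out.println("wait: " + waitPerIterationMs(100, 200) + " ms");
        System.out.println("wait: " + waitPerIterationMs(100, 50) + " ms");
    }
}
```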

How to check for ETL / data loading bottlenecks

The way to identify ETL bottlenecks is simple: add PerformanceListener to your network, and train as normal. For example:

MultiLayerNetwork net = ...
net.setListeners(new PerformanceListener(1));       //Logs ETL and iteration speed on each iteration

When training, you will see output such as:

o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 16; iteration time: 65 ms; samples/sec: 492.308; batches/sec: 15.384;

The above output shows that there is no ETL bottleneck (i.e., ETL: 0 ms). However, if ETL time is greater than 0 consistently (after the first iteration), an ETL bottleneck is present.

How to identify the cause of an ETL bottleneck

There are a number of possible causes of ETL bottlenecks. These include (but are not limited to):

Slow hard drives
Network latency or throughput issues (when reading from remote or network storage)
Computationally intensive or inefficient ETL (especially for custom ETL pipelines)

One useful way to get more information is to perform profiling, as described in the profiling section later in this page. For custom ETL pipelines, adding logging for the various stages can help. Finally, another approach is to use a process of elimination - for example, measuring the latency and throughput of reading raw files from disk or from remote storage vs. measuring the time to actually process the data from its raw format.

Step 4: Check for Garbage Collection Overhead

Java uses garbage collection for management of on-heap memory (see this link for example for an explanation). Note that DL4J and ND4J use off-heap memory for storage of all INDArrays (see the memory page for details).

Even though DL4J/ND4J array memory is off-heap, garbage collection can still cause performance issues.

In summary:

Garbage collection will sometimes (temporarily and briefly) pause/stop application execution (“stop the world”)
These GC pauses slow down program execution
The overall performance impact of GC pauses depends on both the frequency of GC pauses, and the duration of GC pauses
The frequency is controllable (in part) by ND4J, using Nd4j.getMemoryManager().setAutoGcWindow(10000); and Nd4j.getMemoryManager().togglePeriodicGc(false);
Not every GC event is caused by or controlled by the above ND4J configuration.

In our experience, garbage collection time depends strongly on the number of objects in the JVM heap memory. As a rough guide:

Less than 100,000 objects in heap memory: short GC events (usually not a performance problem)
100,000-500,000 objects: GC overhead becomes noticeable, often in the 50-250ms range per full GC event
500,000 or more objects: GC can be a bottleneck if performed frequently. Performance may still be good if GC events are infrequent (for example, no more often than once every 10 seconds).
10 million or more objects: GC is a major bottleneck even if infrequently called, with each full GC taking multiple seconds

How to configure ND4J garbage collection settings

In simple terms, there are two settings of note:

Nd4j.getMemoryManager().setAutoGcWindow(10000);             //Set to 10 seconds (10000ms) between System.gc() calls
Nd4j.getMemoryManager().togglePeriodicGc(false);            //Disable periodic GC calls

If you suspect garbage collection overhead is having an impact on performance, try changing these settings. The main downside of reducing the frequency of periodic GC, or disabling it entirely, arises when you are not using workspaces. Note, however, that workspaces are enabled by default for all neural networks in Deeplearning4j.

Side note: if you are using DL4J for training on Spark, setting these values on the master/driver will not impact the settings on the worker. Instead, see this guide.

How to determine GC impact using PerformanceListener

NOTE: this feature was added after 1.0.0-beta3 and will be available in future releases.

To determine the impact of garbage collection using PerformanceListener, you can use the following:

int listenerFrequency = 1;
boolean reportScore = true;
boolean reportGC = true;
net.setListeners(new PerformanceListener(listenerFrequency, reportScore, reportGC));

This will report GC activity:

o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 30; iteration time: 17 ms; samples/sec: 588.235; batches/sec: 58.824; score: 0.7229335801186025; GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)];

The garbage collection activity is reported for all available garbage collectors. Here, GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)] means that, since the last PerformanceListener report, each of the two collectors ran 2 times, taking a total of 1ms (PS Scavenge) and 24ms (PS MarkSweep).

Keep in mind: PerformanceListener reports GC events every N iterations (as configured by the user). Thus, if PerformanceListener is configured to report statistics every 10 iterations, the garbage collection stats would be for the period of time corresponding to the last 10 iterations.
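Similar per-period numbers can be collected without a listener by reading the JVM's standard GarbageCollectorMXBean counters and subtracting the previous reading. The sketch below uses only the JDK; the class name is hypothetical, and it is not a copy of how PerformanceListener is implemented.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports GC count and time since the previous call
class GcDeltaReporter {
    // Collector name -> {collection count, collection time in ms} at the last report
    private final Map<String, long[]> last = new LinkedHashMap<>();

    /** Returns e.g. "GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)]". */
    String report() {
        StringBuilder sb = new StringBuilder("GC: ");
        boolean first = true;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector
            long count = Math.max(0, gc.getCollectionCount());
            long timeMs = Math.max(0, gc.getCollectionTime());
            long[] prev = last.getOrDefault(gc.getName(), new long[]{0, 0});
            if (!first) {
                sb.append(", ");
            }
            sb.append('[').append(gc.getName()).append(": ")
              .append(count - prev[0]).append(" (").append(timeMs - prev[1]).append("ms)]");
            last.put(gc.getName(), new long[]{count, timeMs});
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        GcDeltaReporter reporter = new GcDeltaReporter();
        System.out.println(reporter.report()); // first call covers the time since JVM start
        System.gc();
        System.out.println(reporter.report()); // later calls cover the time since the previous call
    }
}
```

The collector names depend on the JVM and the garbage collector in use, so they will not always be PS Scavenge and PS MarkSweep.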

How to determine GC impact using -verbose:gc

Another useful tool is the set of -verbose:gc, -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps command line options. For more details, see Oracle Command Line Options and Oracle GC Portal Documentation.

These options can be passed to the JVM on launch (when using java -jar or java -cp) or can be added to IDE launch options (for example, in IntelliJ, these should be placed in the “VM Options” field in Run/Debug Configurations - see Setting Configuration Options).

When these options are enabled, you will have information reported on each GC event, such as:

5.938: [GC (System.gc()) [PSYoungGen: 5578K->96K(153088K)] 9499K->4016K(502784K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
5.939: [Full GC (System.gc()) [PSYoungGen: 96K->0K(153088K)] [ParOldGen: 3920K->3911K(349696K)] 4016K->3911K(502784K), [Metaspace: 22598K->22598K(1069056K)], 0.0117132 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]

This information can be used to determine the frequency, cause (System.gc() calls, allocation failure, etc) and duration of GC events.
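For long logs, extracting these fields by hand is tedious. The following sketch parses lines in the layout shown above and returns the timestamp, event type, cause and duration. The class name and regular expression are assumptions written against those two sample lines only; other collectors and JVM versions log in different formats.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the GC log layout shown in the sample lines
class GcLogLine {
    // Timestamp, event type, cause in parentheses, then the first "<seconds> secs]" after a comma
    private static final Pattern EVENT = Pattern.compile(
            "^(\\d+\\.\\d+): \\[(Full GC|GC) \\((.*?)\\) .*?, (\\d+\\.\\d+) secs\\]");

    final double uptimeSeconds;  // seconds since JVM start
    final boolean fullGc;        // true for "Full GC" events
    final String cause;          // e.g. "System.gc()" or "Allocation Failure"
    final double pauseSeconds;   // duration of the event

    private GcLogLine(double uptimeSeconds, boolean fullGc, String cause, double pauseSeconds) {
        this.uptimeSeconds = uptimeSeconds;
        this.fullGc = fullGc;
        this.cause = cause;
        this.pauseSeconds = pauseSeconds;
    }

    /** Parses one log line; returns null if the line does not match the expected layout. */
    static GcLogLine parse(String line) {
        Matcher m = EVENT.matcher(line.trim());
        if (!m.find()) {
            return null;
        }
        return new GcLogLine(
                Double.parseDouble(m.group(1)),
                "Full GC".equals(m.group(2)),
                m.group(3),
                Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        GcLogLine e = parse("5.938: [GC (System.gc()) [PSYoungGen: 5578K->96K(153088K)] "
                + "9499K->4016K(502784K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]");
        System.out.println(e.cause + " took " + e.pauseSeconds + " s at " + e.uptimeSeconds + " s");
    }
}
```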

How to determine GC impact using a profiler

An alternative approach is to use a profiler to collect garbage collection information.

For example, YourKit Java Profiler can be used to determine both the frequency and duration of garbage collection - see Garbage collection telemetry for more details.

Other tools, such as VisualVM can also be used to monitor GC activity.

How to determine number (and type) of JVM heap objects using memory dumps

If you determine that garbage collection is a problem, and suspect that this is due to the number of objects in memory, you can perform a heap dump.

To perform a heap dump:

Step 1: Run your program
Step 2: While running, determine the process ID
- One approach is to use jps:
  - For basic details, run jps on the command line. If jps is not on the system PATH, it can be found (on Windows) at C:\Program Files\Java\jdk<VERSION>\bin\jps.exe
  - For more details on each process, run jps -lv instead
- Alternatively, you can use the top command on Linux or Task Manager (Windows) to find the PID (on Windows, the PID column may not be enabled by default)
Step 3: Create a heap dump using jmap -dump:format=b,file=file_name.hprof 123 where 123 is the process id (PID) to create the heap dump for

A number of alternatives for generating heap dumps can be found here.
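If looking up the PID by hand is inconvenient, the application can log its own PID at startup together with the matching jmap command. This is a minimal sketch with hypothetical names; it assumes Java 9 or later for ProcessHandle.

```java
// Hypothetical helper: prints the jmap command for the current JVM
class HeapDumpCommand {
    /** PID of the running JVM (requires Java 9+). */
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    /** Builds the jmap command from Step 3 for the given PID and output file. */
    static String jmapCommand(long pid, String fileName) {
        return "jmap -dump:format=b,file=" + fileName + " " + pid;
    }

    public static void main(String[] args) {
        // Log this once at startup so the command is ready when a dump is needed
        System.out.println(jmapCommand(currentPid(), "file_name.hprof"));
    }
}
```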

After a memory dump has been collected, it can be opened in tools such as YourKit profiler and VisualVM to determine the number, type and size of objects. With this information, you should be able to pinpoint the cause of the large number of objects and make changes to your code to reduce or eliminate the objects that are causing the garbage collection overhead.

Step 5: Check Minibatch Size

Another common cause of performance issues is a poorly chosen minibatch size. A minibatch is a number of examples used together for one step of inference or training. Minibatch sizes of 32 to 128 are commonly used, though smaller or larger sizes are sometimes used.

In summary:

If minibatch size is too small (for example, training or inference with 1 example at a time), poor hardware utilization and lower overall throughput are expected
If minibatch size is too large:
- Hardware utilization will usually be good
- Iteration times will slow down
- Memory utilization may be too high (leading to out-of-memory errors)

For inference, avoid using minibatch size of 1, as throughput will suffer. Unless there are strict latency requirements, you should use larger minibatch sizes as this will give you the best hardware utilization and hence throughput, and is especially important for GPUs.

For training, you should never use a minibatch size of 1 as overall performance and hardware utilization will be reduced. Network convergence may also suffer. Start with a minibatch size of 32-128, if memory will allow this to be used.
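When comparing minibatch sizes, look at throughput in samples per second and not at iteration time alone, since larger minibatches make each iteration slower while usually processing more examples per second. The arithmetic is sketched below with a hypothetical helper. The PerformanceListener output shown earlier (65 ms per iteration, 492.308 samples/sec) is consistent with a minibatch size of 32, which is an inference from those numbers and not something the log states.

```java
// Hypothetical helper: throughput arithmetic for comparing minibatch sizes
class MinibatchThroughput {
    /** Examples processed per second for a given minibatch size and iteration time. */
    static double samplesPerSec(int minibatchSize, double iterationMillis) {
        return minibatchSize * 1000.0 / iterationMillis;
    }

    public static void main(String[] args) {
        // 65 ms per iteration is taken from the sample log line; 32 is the inferred minibatch size
        System.out.println(samplesPerSec(32, 65.0) + " samples/sec");
        // Same iteration time with one example per step (a hypothetical, for comparison only)
        System.out.println(samplesPerSec(1, 65.0) + " samples/sec");
    }
}
```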

For serving predictions in multi-threaded applications (such as a web server), ParallelInference should be used.

Step 6: Ensure you are not using a single MultiLayerNetwork/ComputationGraph for inference from multiple threads

MultiLayerNetwork and ComputationGraph are not considered thread-safe, and should not be used from multiple threads. That said, most operations such as fit, output, etc. use synchronized blocks. While these synchronized methods should prevent hard-to-understand exceptions (race conditions due to concurrent use), they limit throughput to a single thread (though note that native operations will still be parallelized as normal). In summary, using a single network from multiple threads should be avoided, as it is not thread-safe and can be a performance bottleneck.

For inference from multiple threads, use one model per thread (as this avoids locks). For serving predictions in multi-threaded applications (such as a web server), use ParallelInference.
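One way to arrange "one model per thread" is a ThreadLocal that loads a separate copy of the model the first time each thread asks for it. The wrapper below is a generic sketch with a hypothetical name; the model type and the loader are placeholders for however you restore your network.

```java
import java.util.function.Supplier;

// Hypothetical wrapper: each thread lazily gets its own model instance
class PerThreadModel<M> {
    private final ThreadLocal<M> local;

    /** The loader is called once per thread, on that thread's first call to get(). */
    PerThreadModel(Supplier<M> loader) {
        this.local = ThreadLocal.withInitial(loader);
    }

    /** Returns the calling thread's own model instance. */
    M get() {
        return local.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // StringBuilder stands in for a network; a real loader would restore the model from disk
        PerThreadModel<StringBuilder> models = new PerThreadModel<>(StringBuilder::new);
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " uses instance "
                        + System.identityHashCode(models.get()));
        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

The trade-off is memory: every thread holds a full copy of the model's parameters, so this suits a small, fixed number of threads.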

Step 7: Check Data Types

As of 1.0.0-beta3 and earlier, ND4J has a global datatype setting that determines the datatype of all arrays. The default value is 32-bit floating point. The data type can be set using Nd4j.setDataType(DataBuffer.Type.FLOAT); for example.

For best performance, this value should be left as its default. If 64-bit floating point precision (double precision) is used instead, performance can be significantly reduced, especially on GPUs - most consumer NVIDIA GPUs have very poor double precision (and half precision/FP16) performance. On Tesla series cards, double precision performance is usually much better than for consumer (GeForce) cards, though it is still usually half or less of the single precision performance. Wikipedia has a summary of the single and double precision performance of NVIDIA GPUs here.

Performance on CPUs can also be reduced for double precision due to the additional memory bandwidth requirements vs. float precision.
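The bandwidth difference follows directly from element size: a double takes 8 bytes and a float 4, so the same array needs twice as much memory traffic in double precision. A small sketch of the arithmetic (hypothetical helper, illustrative element count):

```java
// Hypothetical helper: memory needed for an array of a given element size
class DataTypeFootprint {
    /** Bytes needed to store numElements values of the given element size. */
    static long bytesFor(long numElements, int bytesPerElement) {
        return numElements * bytesPerElement;
    }

    public static void main(String[] args) {
        long numElements = 1_000_000L; // illustrative array length
        System.out.println("float:  " + bytesFor(numElements, Float.BYTES) + " bytes");
        System.out.println("double: " + bytesFor(numElements, Double.BYTES) + " bytes");
    }
}
```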

You can check the data type setting using:

System.out.println("ND4J Data Type Setting: " + Nd4j.dataType());

Step 8: Check workspace configuration for memory management (enabled by default)

For details on workspaces, see the workspaces page.

In summary, workspaces are enabled by default for all Deeplearning4j networks, and enabling them improves performance and reduces memory requirements. There are very few reasons to disable workspaces.

You can check that workspaces are enabled for your MultiLayerNetwork using:

System.out.println("Training workspace config: " + net.getLayerWiseConfigurations().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + net.getLayerWiseConfigurations().getInferenceWorkspaceMode());

or for a ComputationGraph using:

System.out.println("Training workspace config: " + cg.getConfiguration().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + cg.getConfiguration().getInferenceWorkspaceMode());

You want to see ENABLED as the output for both training and inference. To change the workspace configuration, use the setter methods, for example: net.getLayerWiseConfigurations().setTrainingWorkspaceMode(WorkspaceMode.ENABLED);

Step 9: Check for a badly configured network or network with layer bottlenecks

Another possible cause (especially for newer users) is a poorly designed network. A network may be poorly designed if:

It has too many layers. A rough guideline:
- More than about 100 layers for a CNN may be too many
- More than about 10 layers for a RNN/LSTM network may be too many
- More than about 20 feed-forward layers may be too many for a MLP
The input/activations are too large
- For CNNs, inputs in the range of 224x224 (for image classification) to 600x600 (for object detection and segmentation) are used. Large image sizes (such as 500x500) are computationally demanding, and much larger than this should be considered too large in most cases.
- For RNNs, the sequence length matters. If you are using sequences longer than a few hundred steps, you should use truncated backpropagation through time if possible.
The output number of classes is too large
- Classification with more than about 10,000 classes can become a performance bottleneck with standard softmax output layers
The layers are too large
- For CNNs, most layers have kernel sizes in the range 2x2 to 7x7, with channels equal to 32 to 1024 (with larger number of channels appearing later in the network). Much larger than this may cause a performance bottleneck.
- For MLPs, most layers have at most 2048 units/neurons (often much smaller). Much larger than this may be too large.
- For RNNs such as LSTMs, layers are typically in the range of 128 to 512, though the largest RNNs may use around 1024 units per layer.
The network has too many parameters
- This is usually a consequence of the other issues already mentioned - too many layers, too large input, too many output classes
- For comparison, less than 1 million parameters would be considered small, and more than about 100 million parameters would be considered very large.
- You can check the number of parameters using MultiLayerNetwork/ComputationGraph.numParams() or MultiLayerNetwork/ComputationGraph.summary()

Note that these are guidelines only, and some reasonable networks may exceed the numbers specified here. Some networks can become very large, such as those commonly used for ImageNet classification or object detection. However, in these cases, the network is usually carefully designed to provide a good tradeoff between accuracy and computation time.

If your network architecture is significantly outside of the guidelines specified here, you may want to reconsider the design to improve performance.
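To see how layer sizes translate into parameter counts before building a network, the textbook formulas for fully connected and 2D convolution layers (with bias) can be applied by hand. The helper below is a hypothetical sketch of that arithmetic; numParams() on the actual network remains the authoritative figure, as other layer types count parameters differently.

```java
// Hypothetical helper: textbook parameter counts for two common layer types
class ParamCount {
    /** Fully connected layer: one weight per input/output pair, plus one bias per output. */
    static long dense(long nIn, long nOut) {
        return nIn * nOut + nOut;
    }

    /** 2D convolution: one kernel per input/output channel pair, plus one bias per output channel. */
    static long conv2d(long kernelH, long kernelW, long channelsIn, long channelsOut) {
        return kernelH * kernelW * channelsIn * channelsOut + channelsOut;
    }

    public static void main(String[] args) {
        System.out.println("Dense 2048 -> 2048:            " + dense(2048, 2048));
        System.out.println("Softmax output 2048 -> 10000:  " + dense(2048, 10_000));
        System.out.println("Conv 3x3, 512 -> 512 channels: " + conv2d(3, 3, 512, 512));
    }
}
```

Note how a single output layer over 10,000 classes already accounts for about 20 million parameters, which is why a large number of classes appears in the list above.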

Step 10: Check for CPU-only ops (when using GPUs)

If you are using CPUs only (nd4j-native backend), you can skip this step, as it only applies when using the GPU (nd4j-cuda) backend.

As of 1.0.0-beta3, a handful of recently added operations do not yet have GPU implementations. Thus, when these layers are used in a network, they will execute on CPU only, irrespective of the nd4j-backend used. GPU support for these layers will be added in an upcoming release.

The layers without GPU support as of 1.0.0-beta3 include:

Convolution3D
Upsampling1D/2D/3D
Deconvolution2D
LocallyConnected1D/2D
SpaceToBatch
SpaceToDepth

Unfortunately, there is no workaround or fix for now, until these operations have GPU implementations completed.

Step 11: Check CPU support for hardware extensions (AVX etc)

If you are running on a GPU, this section does not apply.

When running on older CPUs or those that lack modern AVX extensions such as AVX2 and AVX512, performance will be reduced compared to running on CPUs with these features. Though there is not much you can do about the lack of such features, it is worth knowing about if you are comparing performance between different CPU models.

In summary, CPU models with AVX2 support will perform better than those without it; similarly, AVX512 is an improvement over AVX2.

For more details on AVX, see the Wikipedia AVX article

Step 12: Check other processes using CPU or GPU resources

Another obvious cause of performance issues is other processes using CPU or GPU resources.

For CPU, it is straightforward to see if other processes are using resources using tools such as top (for Linux) or Task Manager (for Windows).

For NVIDIA CUDA GPUs, nvidia-smi can be used. nvidia-smi is usually installed with the NVIDIA display drivers, and (when run) shows the overall GPU and memory utilization, as well as the GPU utilization of programs running on the system.

On Linux, this is usually on the system path by default. On Windows, it may be found at C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi

Step 13: Check OMP_NUM_THREADS when performing concurrent inference using CPU in multiple threads simultaneously

If you are using GPUs (nd4j-cuda backend), you can skip this section.

One issue to be aware of when running multiple DL4J networks (or ND4J operations generally) concurrently in multiple threads is the OpenMP number of threads setting. In summary, ND4J uses OpenMP parallelism at the C++ level to increase operation performance. By default, ND4J will use a value equal to the number of physical CPU cores (not logical cores), as this gives optimal performance when operations are executed from a single thread. When several threads execute operations at the same time, however, the combined number of OpenMP threads can exceed the number of available cores.

This also applies if the CPU resources are shared with other computationally demanding processes.

In either case, you may see better overall throughput by reducing the number of OpenMP threads by setting the OMP_NUM_THREADS environment variable - see ND4JEnvironmentVars for details.

One reason that reducing OMP_NUM_THREADS can improve overall performance is reduced cache thrashing.
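As a starting point for choosing a value, one simple heuristic is to divide the physical cores evenly between the threads that run operations concurrently. This is an assumption made for illustration, not a rule documented by ND4J, and the best value should be confirmed by measurement. The helper name is hypothetical.

```java
// Hypothetical helper: a starting value for OMP_NUM_THREADS, to be tuned by benchmarking
class OmpThreadsHint {
    /** Splits the physical cores evenly between concurrent threads, never going below 1. */
    static int suggestedOmpThreads(int physicalCores, int concurrentThreads) {
        if (physicalCores < 1 || concurrentThreads < 1) {
            throw new IllegalArgumentException("both arguments must be at least 1");
        }
        return Math.max(1, physicalCores / concurrentThreads);
    }

    public static void main(String[] args) {
        // The value currently set in the environment, or null if unset
        System.out.println("OMP_NUM_THREADS = " + System.getenv("OMP_NUM_THREADS"));
        // Example: 8 physical cores shared by 4 inference threads
        System.out.println("Suggested starting value: " + suggestedOmpThreads(8, 4));
    }
}
```

Note that Runtime.getRuntime().availableProcessors() reports logical processors, so on CPUs with hyper-threading it is typically higher than the physical core count and should not be passed in unchanged.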

Debugging Performance Issues with JVM Profiling

Profiling is a process whereby you can trace how long each method in your code takes to execute, to identify and debug performance bottlenecks.

A full guide to profiling is beyond the scope of this page, but the summary is that you can trace how long each method takes to execute (and where it is being called from) using a profiling tool. This information can then be used to identify bottlenecks (and their causes) in your program.

How to Perform Profiling

Multiple options are available for performing profiling locally. We suggest using either YourKit Java Profiler or VisualVM for profiling.

The YourKit profiling documentation is quite good. To perform profiling with YourKit:

Install and start YourKit Profiler
Start your application with the profiler enabled. For details, see Running applications with the profiler and Local profiling
- Note that IDE integrations are available - see IDE integration
Collect a snapshot and analyze

Note that YourKit provides multiple different types of profiling: Sampling, tracing, and call counting. Each type of profiling has different pros and cons, such as accuracy vs. overhead. For more details, see Sampling, tracing, call counting

VisualVM also supports profiling - see the Profiling Applications section of the VisualVM documentation for more details.

Profiling on Spark

When debugging performance issues for Spark training or inference jobs, it can often be useful to perform profiling here also.

One approach that we have used internally is to combine manual profiling settings (-agentpath JVM argument) with spark-submit arguments for YourKit profiler.

To perform profiling in this manner, 5 steps are required:

Download YourKit profiler to a location on each worker (must be the same location on each worker) and (optionally) the driver
[Optional] Copy the profiling configuration onto each worker (must be the same location on each worker)
Create a local output directory for storing the profiling result files on each worker
Launch the Spark job with the appropriate configuration (see example below)
The snapshots will be saved when the Spark job completes (or is cancelled) to the specified directories.

For example, to perform tracing on both the driver and the workers:

spark-submit \
    --conf 'spark.executor.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/executor/,tracing_settings_path=/home/user/yourkitconf.txt' \
    --conf 'spark.driver.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/driver/,tracing_settings_path=/home/user/yourkitconf.txt' \
    <other spark submit arguments>

The configuration (tracing_settings_path) is optional. A sample tracing settings file is provided below:

walltime=*
adaptive=true
adaptive_min_method_invocation_count=1000
adaptive_max_average_method_time_ns=100000

1.0.0-alpha

Highlights - 1.0.0-alpha Release

ND4J: Added SameDiff - Java automatic differentiation library (alpha release) with Tensorflow import (technology preview) and hundreds of new operations
ND4J: Added CUDA 9.0 and 9.1 support (with cuDNN), dropped support for CUDA 7.5, continued support for CUDA 8.0
ND4J: Native binaries (nd4j-native on Maven Central) now ship with AVX/AVX2/AVX-512 support (Windows/Linux)
DL4J: Large number of new layers and API improvements
DL4J: Keras 2.0 import support

Deeplearning4J

Deeplearning4J: New Features

Layers (new and enhanced)
- Added Yolo2OutputLayer CNN layer for object detection (Link). See also DataVec's ObjectDetectionRecordReader
- Adds support for 'no bias' layers via hasBias(boolean) config (DenseLayer, EmbeddingLayer, OutputLayer, RnnOutputLayer, CenterLossOutputLayer, ConvolutionLayer, Convolution1DLayer). EmbeddingLayer now defaults to no bias (Link)
- Adds support for dilated convolutions (aka 'atrous' convolutions) - ConvolutionLayer, SubsamplingLayer, and 1D versions thereof. (Link)
- Added Upsampling2D layer, Upsampling1D layer (Link, Link)
- ElementWiseVertex now supports Average and Max modes in addition to Add/Subtract/Product (Link)
- Added SeparableConvolution2D layer (Link)
- Added Deconvolution2D layer (aka transpose convolution, fractionally strided convolution layer) (Link)
- Added ReverseTimeSeriesVertex (Link)
- Added RnnLossLayer - no-parameter version of RnnOutputLayer, or RNN equivalent of LossLayer (Link)
- Added CnnLossLayer - no-parameter CNN output layer for use cases such as segmentation, denoising, etc. (Link)
- Added Bidirectional layer wrapper (converts any uni-directional RNN to a bidirectional RNN) (Link)
- Added SimpleRnn layer (aka "vanilla" RNN layer) (Link)
- Added LastTimeStep wrapper layer (wraps a RNN layer to get last time step, accounting for masking if present) (Link)
- Added MaskLayer utility layer that simply zeros out activations on forward pass when a mask array is present (Link)
- Added alpha-version (not yet stable) SameDiff layer support to DL4J (Note: forward pass, CPU only for now) (Link)
- Added SpaceToDepth and SpaceToBatch layers (Link, Link)
- Added Cropping2D layer (Link)
Added parameter constraints API (LayerConstraint interface), and MaxNormConstraint, MinMaxNormConstraint, NonNegativeConstraint, UnitNormConstraint implementations (Link)
Significant refactoring of learning rate schedules (Link)
- Added ISchedule interface; added Exponential, Inverse, Map, Poly, Sigmoid and Step schedule implementations (Link)
- Added support for both iteration-based and epoch-based schedules via ISchedule. Also added support for custom (user defined) schedules
- Learning rate schedules are configured on the updaters, via the .updater(IUpdater) method
Added dropout API (IDropout - previously dropout was available but not a class); added Dropout, AlphaDropout (for use with self-normalizing NNs), GaussianDropout (multiplicative), GaussianNoise (additive). Added support for custom dropout types (Link)
Added support for dropout schedules via ISchedule interface (Link)
Added weight/parameter noise API (IWeightNoise interface); added DropConnect and WeightNoise (additive/multiplicative Gaussian noise) implementations (Link); dropconnect and dropout can now be used simultaneously
Adds layer configuration alias .units(int) equivalent to .nOut(int) (Link)
Adds ComputationGraphConfiguration GraphBuilder .layer(String, Layer, String...) alias for .addLayer(String, Layer, String...)
Layer index no longer required for MultiLayerConfiguration ListBuilder (i.e., .list().layer(<layer>) can now be used for configs) (Link)
Added MultiLayerNetwork.summary(InputType) and ComputationGraph.summary(InputType...) methods (shows layer and activation size information) (Link)
MultiLayerNetwork, ComputationGraph and layerwise trainable layers now track the number of epochs (Link)
Added deeplearning4j-ui-standalone module: uber-jar for easy launching of UI server (usage: java -jar deeplearning4j-ui-standalone-1.0.0-alpha.jar -p 9124 -r true -f c:/UIStorage.bin)
Weight initializations:
- Added .weightInit(Distribution) convenience/overload (previously: required .weightInit(WeightInit.DISTRIBUTION).dist(Distribution)) (Link)
- WeightInit.NORMAL (for self-normalizing neural networks) (Link)
- Ones, Identity weight initialization (Link)
- Added new distributions (LogNormalDistribution, TruncatedNormalDistribution, OrthogonalDistribution, ConstantDistribution) which can be used for weight initialization (Link)
- RNNs: Added ability to specify weight initialization for recurrent weights separately to "input" weights (Link)
Added layer alias: Convolution2D (ConvolutionLayer), Pooling1D (Subsampling1DLayer), Pooling2D (SubsamplingLayer) (Link)
Added Spark IteratorUtils - wraps a RecordReaderMultiDataSetIterator for use in Spark network training (Link)
CuDNN-supporting layers (ConvolutionLayer, etc) now warn the user if using CUDA without CuDNN (Link)
Binary cross entropy (LossBinaryXENT) now implements clipping (1e-5 to (1 - 1e-5) by default) to avoid numerical underflow/NaNs (Link)
SequenceRecordReaderDataSetIterator now supports multi-label regression (Link)
TransferLearning FineTuneConfiguration now has methods for setting training/inference workspace modes (Link)
IterationListener iterationDone method now reports both current iteration and epoch count; removed unnecessary invoke/invoked methods (Link)
Added MultiLayerNetwork.layerSize(int), ComputationGraph.layerSize(int)/layerSize(String) to easily determine size of layers (Link)
Added MultiLayerNetwork.toComputationGraph() method (Link)
Added NetworkUtils convenience methods to easily change the learning rate of an already initialized network (Link)
Added MultiLayerNetwork.save(File)/.load(File) and ComputationGraph.save(File)/.load(File) convenience methods (Link)
Added CheckpointListener to periodically save a copy of the model during training (every N iter/epochs, every T time units) (Link)
Added ComputationGraph output method overloads with mask arrays (Link)
New LossMultiLabel loss function for multi-label classification (Link)
Added new model zoo models:
- Darknet19 (Link)
- TinyYOLO (Link)
New iterators, and iterator improvements:
- Added FileDataSetIterator, FileMultiDataSetIterator for flexibly iterating over directories of saved (Multi)DataSet objects (Link)
- UCISequenceDataSetIterator (Link)
- RecordReaderDataSetIterator now has builder pattern for convenience, improved javadoc (Link)
- Added DataSetIteratorSplitter, MultiDataSetIteratorSplitter (Link, Link)
Added additional score functions for early stopping (ROC metrics, full set of Evaluation/Regression metrics, etc) (Link)
Added additional ROC and ROCMultiClass evaluation overloads for MultiLayerNetwork and ComputationGraph (Link)
Clarified Evaluation.stats() output to refer to "Predictions" instead of "Examples" (former is more correct for RNNs) (Link)
EarlyStoppingConfiguration now supports Supplier<ScoreCalculator> for use with non-serializable score calculators (Link)
Improved ModelSerializer exceptions when trying to load a model via the wrong method (e.g., trying to load a ComputationGraph via restoreMultiLayerNetwork) (Link)
Added SparkDataValidation utility methods to validate saved DataSet and MultiDataSet on HDFS or local (Link)
ModelSerializer: added restoreMultiLayerNetworkAndNormalizer and restoreComputationGraphAndNormalizer methods (Link)
ParallelInference now has output overloads with support for input mask arrays (Link)
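The clipping added to binary cross entropy (LossBinaryXENT) in the list above can be illustrated with a scalar sketch. This is not the library's code: the class name is hypothetical, and only the default bounds (1e-5 and 1 - 1e-5) are taken from the release note. Without clipping, a predicted probability of exactly 0 or 1 leads to log(0), which produces infinite or NaN values.

```java
// Hypothetical scalar illustration of clipped binary cross entropy
class ClippedBinaryXent {
    /** Binary cross entropy for one label/prediction pair, with the prediction clipped to [eps, 1 - eps]. */
    static double loss(double label, double predicted, double eps) {
        double p = Math.min(1.0 - eps, Math.max(eps, predicted)); // keep p away from 0 and 1
        return -(label * Math.log(p) + (1.0 - label) * Math.log(1.0 - p));
    }

    public static void main(String[] args) {
        double eps = 1e-5; // default clipping value stated in the release note
        // A confident but wrong prediction: finite because of clipping
        System.out.println(loss(1.0, 0.0, eps));
        // An ordinary prediction is unaffected by clipping
        System.out.println(loss(1.0, 0.9, eps));
    }
}
```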

Deeplearning4J: Bug Fixes and Optimizations

Lombok is no longer included as a transitive dependency (Link)
ComputationGraph can now have a vertex as the output (not just layers) (Link, Link)
Performance improvement for J7FileStatsStorage with large amount of history (Link)
Fixed UI layer sizes for variational autoencoder layers (Link)
Fixes to avoid HDF5 library crashes (Link, Link)
UI Play servers switch to production (PROD) mode (Link)
Related to the above: users can now set play.crypto.secret system property to manually set the Play application secret; is randomly generated by default (Link).
SequenceRecordReaderDataSetIterator would apply preprocessor twice (Link)
Evaluation no-arg constructor could cause NaN evaluation metrics when used on Spark
CollectScoresIterationListener could recurse endlessly (Link)
Async(Multi)DataSetIterator calling reset() on underlying iterator could cause issues in some situations (Link)
In some cases, L2 regularization could be (incorrectly) applied to frozen layers (Link)
Logging fixes for NearestNeighboursServer (Link)
Memory optimization for BaseStatsListener (Link)
ModelGuesser fix for loading Keras models from streams (previously would fail) (Link)
Various fixes for workspaces in MultiLayerNetwork and ComputationGraph (Link, Link, Link, Link, Link, Link)
Fix for incorrect condition in DuplicateToTimeSeriesVertex (Link)
Fix for getMemoryReport exception on some valid ComputationGraph networks (Link)
RecordReaderDataSetIterator when used with preprocessors could cause an exception under some circumstances (Link)
CnnToFeedForwardPreProcessor could silently reshape invalid input, as long as the input array length matches the expected length (Link)
ModelSerializer temporary files would not be deleted if JVM crashes; now are deleted immediately when no longer required (Link)
RecordReaderMultiDataSetIterator may not add mask arrays under some circumstances, when set to ALIGN_END mode (Link)
ConvolutionIterationListener previously produced an IndexOutOfBoundsException when all convolution layers are frozen (Link)
PrecisionRecallCurve.getPointAtRecall could return a point with a correct but sub-optimal precision when multiple points had identical recall (Link)
Setting dropout(0) on transfer learning FineTuneConfiguration did not remove dropout if present on existing layer (Link)
Under some rare circumstances, Spark evaluation could lead to a NullPointerException (Link)
ComputationGraph: disconnected vertices were not always detected in configuration validation (Link)
Activation layers would not always inherit the global activation function configuration (Link)
RNN evaluation memory optimization: when TBPTT is configured for training, also use TBPTT-style splitting for evaluation (identical result, less memory) (Link, Link)
PerformanceListener is now serializable (Link)
ScoreIterationListener and PerformanceListener now report model iteration, not "iterations since listener creation" (Link)
Precision/recall curves cached values in ROC class may not be updated after merging ROC instances (Link)
ROC merging after evaluating a large number of examples may produce IllegalStateException (Link)
Added checks for invalid input indices to EmbeddingLayer (Link)
Fixed possible NPE when loading legacy (pre-0.9.0) model configurations from JSON (Link)
Fixed issues with EvaluationCalibration HTML export chart rendering (Link)
Fixed possible incorrect rendering of UI/StatsStorage charts with J7FileStatsStorage when used with Spark training (Link)
MnistDataSetIterator would not always reliably detect and automatically fix/redownload on corrupted download data (Link)
MnistDataSetIterator / EmnistDataSetIterator: updated download location after hosting URL change (Link, Link)
Fixes to propagation of thread interruptions (Link)
MultiLayerNetwork/ComputationGraph will no longer throw an ND4JIllegalStateException during initialization if a network contains no parameters (Link, Link)
Fixes for TSNE posting of data to UI for visualization (Link)
PerformanceListener now throws a useful exception (in constructor) on invalid frequency argument, instead of runtime ArithmeticException (Link)
RecordReader(Multi)DataSetIterator now throws more useful exceptions when Writable values are non-numerical (Link)
UI: Fixed possible character encoding issues for non-English languages when internationalization data .txt files are read from uber JARs (Link)
UI: Fixed UI incorrectly trying to parse non-DL4J UI resources when loading I18N data (Link)
Various threading fixes (Link)
Evaluation: no-arg methods (f1(), precision(), etc) now return single class value for binary case instead of macro-averaged value; clarified values in stats() method and javadoc (Link)
Early stopping training: TrainingListener onEpochStart/End (etc) methods were not being called correctly (Link)
Fixes issue where dropout was not always applied to input of RNN layers (Link)
ModelSerializer: improved validation/exceptions when reading from invalid/empty/closed streams (Link)
ParallelInference fixes:
- fixes for variable size inputs (variable length time series, variable size CNN inputs) when using batch mode (Link)
- fixes so that underlying model exceptions during the output method are now properly propagated back to the user (Link)
- fixes support for 'pre-batched' inputs (i.e., inputs where minibatch size is > 1) (Link)
Memory optimization for network weight initialization via in-place random ops (Link)
Fixes for CuDNN with SAME mode padding (Link, Link)
Fix for VariationalAutoencoder builder decoder layer size validation (Link)
Improved Kmeans throughput (Link)
Added RPForest to nearest neighbors (Link)

Deeplearning4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

Default training workspace mode has been switched to SEPARATE from NONE for MultiLayerNetwork and ComputationGraph (Link)
Behaviour change: fit(DataSetIterator) and similar methods no longer perform layerwise pretraining followed by backprop - only backprop is performed in these methods. For pretraining, use pretrain(DataSetIterator) and pretrain(MultiDataSetIterator) methods (Link)
Previously deprecated updater configuration methods (.learningRate(double), .momentum(double) etc) all removed
- To configure learning rate: use .updater(new Adam(lr)) instead of .updater(Updater.ADAM).learningRate(lr)
- To configure bias learning rate: use .biasUpdater(IUpdater) method
- To configure learning rate schedules: use .updater(new Adam(ISchedule)) and similar
Updater configuration via enumeration (i.e., .updater(Updater)) has been deprecated; use .updater(IUpdater)
.regularization(boolean) config removed; functionality is now always equivalent to .regularization(true)
.useDropConnect(boolean) removed; use .weightNoise(new DropConnect(double)) instead
.iterations(int) method has been removed (was rarely used and confusing to users)
Multiple utility classes (in org.deeplearning4j.util) have been deprecated and/or moved to nd4j-common. Use same class names in nd4j-common org.nd4j.util instead.
DataSetIterators in DL4J have been moved from deeplearning4j-nn module to new deeplearning4j-datasets, deeplearning4j-datavec-iterators and deeplearning4j-utility-iterators modules. Packages/imports are unchanged; deeplearning4j-core pulls these in as transitive dependencies hence no user changes should be required in most cases (Link)
Previously deprecated .activation(String) has been removed; use .activation(Activation) or .activation(IActivation) instead
Layer API change: Custom layers may need to implement applyConstraints(int iteration, int epoch) method
Parameter initializer API change: Custom parameter initializers may need to implement isWeightParam(String) and isBiasParam(String) methods
RBM (Restricted Boltzmann Machine) layers have been removed entirely. Consider using VariationalAutoencoder layers as a replacement (Link)
GravesBidirectionalLSTM has been deprecated; use new Bidirectional(Bidirectional.Mode.ADD, new GravesLSTM.Builder()....build()) instead
Previously deprecated WordVectorSerializer methods have now been removed (Link)
Removed deeplearning4j-ui-remote-iterationlisteners module and obsolete RemoteConvolutionalIterationListener (Link)

Deeplearning4J: 1.0.0-alpha Known Issues

Performance on some network types may be reduced on CUDA compared to 0.9.1 (with workspaces configured). This will be addressed in the next release
Some issues have been noted with FP16 support on CUDA (Link)

Deeplearning4J: Keras Import

Keras 2 support, keeping backward compatibility for keras 1
Keras 2 and 1 import use exact same API and are inferred by DL4J
Keras unit test coverage increased by 10x, many more real-world integration tests
Unit tests for importing and checking layer weights
Leaky ReLU, ELU, SELU support for model import
All Keras layers can be imported with optional bias terms
Old deeplearning4j-keras module removed, old "Model" API removed
All Keras initializations (Lecun normal, Lecun uniform, ones, zeros, Orthogonal, VarianceScaling, Constant) supported
1D convolution and pooling supported in DL4J and Keras model import
Atrous Convolution 1D and 2D layers supported in Keras model import
1D Zero padding layers supported
Keras constraints module fully supported in DL4J and model import
Upsampling 1D and 2D layers in DL4J and Keras model import (including GAN examples in tests)
Most merge modes supported in Keras model import, Keras 2 Merge layer API supported
Separable Convolution 2D layer supported in DL4J and Keras model import
Deconvolution 2D layer supported in DL4J and Keras model import
Full support of Keras noise layers on import (Alpha dropout, Gaussian dropout and noise)
Support for SimpleRNN layer in Keras model import
Support for Bidirectional layer wrapper Keras model import
Addition of LastTimestepVertex in DL4J to support return_sequences=False for Keras RNN layers.
DL4J support for recurrent weight initializations and Keras import integration.
SpaceToBatch and BatchToSpace layers in DL4J for better YOLO support, plus end-to-end YOLO Keras import test.
Cropping2D support in DL4J and Keras model import

Deeplearning4J: Keras Import - API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

The Model and ModelConfiguration classes, deprecated in 0.9.1, have been permanently removed. Use KerasModelImport instead, which is now the only entry point for Keras model import.

Deeplearning4J: Keras Import - Known Issues

Embedding layer: In DL4J the output of an embedding layer is 2D by default, unless preprocessors are specified. In Keras the output is always 3D, but depending on specified parameters can be interpreted as 2D. This often leads to difficulties when importing Embedding layers. Many cases have been covered and issues fixed, but inconsistencies remain.
Batchnormalization layer: DL4J's batch normalization layer is much more restrictive (in a good way) than Keras' version of it. For instance, DL4J only allows normalizing spatial dimensions for 4D convolutional inputs, while in Keras any axis can be used for normalization. Depending on the dimension ordering (NCHW vs. NHWC) and the specific configuration used by a Keras user, this can lead to expected (!) and unexpected import errors.
Support for importing a Keras model for training purposes in DL4J (enforceTrainingConfig == true) is still very limited and will be tackled properly for the next release.
Keras Merge layers: seem to work fine with the Keras functional API, but have issues when used in a Sequential model.
Reshape layers: can be somewhat unreliable on import. DL4J rarely has a need to explicitly reshape input beyond (inferred) standard input preprocessors. In Keras, Reshape layers are used quite often. Mapping the two paradigms can be difficult in edge cases.

ND4J

ND4J: New Features

Hundreds of new operations added
New DifferentialFunction api with automatic differentiation (see samediff section) Link
Technology preview of tensorflow import added (supports 1.4.0 and up)
Apache Arrow serialization added supporting new tensor API Link
Add support for AVX/AVX2 and AVX-512 instruction sets for Windows/Linux for nd4j-native backend Link
nVidia CUDA 8/9.0/9.1 now supported
Workspace improvements were introduced to ensure safety: SCOPE_PANIC profiling mode is enabled by default
FlatBuffers support for INDArray serde
Support for auto-broadcastable operations was added
libnd4j, the underlying C++ library, received a functionality boost and now offers NDArray and Graph classes; it can be used as a standalone library or executable.
Convolution-related ops now support NHWC in addition to NCHW data format.
Accumulation ops now have option to keep reduced dimensions.

ND4J: Known Issues

Not all op gradients implemented for automatic differentiation
Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

ND4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

ND4J - SameDiff

Initial tech preview Link
Control flow is supported with IF and WHILE primitives.

Alpha release of SameDiff auto-differentiation engine for ND4J.

Features

Two execution modes available: Java-driven execution, and Native execution for serialized graphs.
SameDiff graphs can be serialized using FlatBuffers
Building and running computation graphs built from SameDiff operations.
Graphs can run forward pass on input data and compute gradients for the backward pass.
Already supports many high-level layers, like dense layers, convolutions (1D-3D), deconvolutions, separable convolutions, pooling and upsampling, batch normalization, local response normalization, LSTMs and GRUs.
In total there are about 350 SameDiff operations available, including many basic operations used in building complex graphs.
Supports rudimentary import of TensorFlow and ONNX graphs for inference.
TFOpTests is a dedicated project for creating test resources for TensorFlow import.

Known Issues and Limitations

Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.
While many of the widely used base operations and high-level layers used in practice are supported, op coverage is still limited. Goal is to achieve feature parity with TensorFlow and fully support import for TF graphs.
Some of the existing ops do not have a backward pass implemented (called doDiff in SameDiff).

DataVec

DataVec: New Features

Added ObjectDetectionRecordReader - for use with DL4J's Yolo2OutputLayer (Link) (also supports image transforms: Link)
Added ImageObjectLabelProvider, VocLabelProvider and SvhnLabelProvider (Streetview house numbers) for use with ObjectDetectionRecordReader (Link, Link)
Added LocalTransformExecutor for single machine execution (without Spark dependency) (Link)
Added ArrowRecordReader (for reading Apache Arrow format data) (Link)
Added RecordMapper class for conversion between RecordReader and RecordWriter (Link)
RecordWriter and InputSplit APIs have been improved; more flexible and support for partitioning across all writers (Link, Link, Link)
Added ArrowWritableRecordBatch and NDArrayRecordBatch for efficient batch storage (List<List<Writable>>) (Link, Link)
Added BoxImageTransform - an ImageTransform that either crops or pads without changing aspect ratio (Link)
TransformProcess now has executeToSequence(List<Writable>), executeSequenceToSingle(List<List<Writable>>) and executeToSequenceBatch(List<List<Writable>>) methods (Link, Link)
Added CSVVariableSlidingWindowRecordReader (Link)
ImageRecordReader: supports regression use cases for labels (previously: only classification) (Link)
ImageRecordReader: supports multi-class and multi-label image classification (via PathMultiLabelGenerator interface) (Link, Link)
DataAnalysis/AnalyzeSpark now includes quantiles (via t-digest) (Link)
Added AndroidNativeImageLoader.asBitmap(), Java2DNativeImageLoader.asBufferedImage() (Link)
Add new RecordReader / SequenceRecordReader implementations:
- datavec-excel module and ExcelRecordReader (Link)
- JacksonLineRecordReader (Link)
- ConcatenatingRecordReader (Link)
Add new transforms:
- TextToTermIndexSequenceTransform (Link)
- ConditionalReplaceValueTransformWithDefault (Link)
- GeographicMidpointReduction (Link)
StringToTimeTransform will now try to guess the time format if a format isn't provided (Link)
Improved performance for NativeImageLoader on Android (Link)
Added BytesWritable (Writable for byte[] data) (Link)
Added TransformProcess.inferCategories methods to auto-infer categories from a RecordReader (Link)

DataVec: Fixes

Lombok is no longer included as a transitive dependency (Link)
MapFileRecordReader and MapFileSequenceRecordReader can handle empty partitions/splits for multi-part map files (Link)
CSVRecordReader is now properly serializable using Java serialization (Link) and Kryo serialization (Link)
Writables: equality semantics have been changed: for example, now DoubleWritable(1.0) is equal to IntWritable(1) (Link)
NumberedFileInputSplit now supports leading zeros (Link)
CSVSparkTransformServer and ImageSparkTransformServer Play servers changed to production mode (Link)
Fix for JSON subtype info for FloatMetaData (Link)
Serialization fixes for JacksonRecordReader, RegexSequenceRecordReader (Link)
Added RecordReader.resetSupported() method (Link)
SVMLightRecordReader now implements nextRecord() method (Link)
Fix for custom reductions when using conditions (Link)
SequenceLengthAnalysis is now serializable (Link) and supports to/from JSON (Link)
Fixes for FFT functionality (Link, Link)
Remove use of backported java.util.functions; use ND4J functions API instead (Link)
Fix for transforms data quality analysis for time columns (Link)
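
The Writable equality change listed above (DoubleWritable(1.0) is now equal to IntWritable(1)) can be pictured with a small plain-Java sketch. IntBox and DoubleBox are hypothetical stand-ins, not DataVec classes. The sketch assumes equality is defined on the numeric value, and it shows why hashCode then has to be derived from that same value.

```java
// Hypothetical stand-ins for numeric Writable types; these are NOT DataVec classes.
interface NumericBox {
    double toDouble();
}

final class NumericBoxes {
    private NumericBoxes() {}

    // Two boxes are "the same" when their numeric values compare equal as doubles.
    static boolean sameValue(NumericBox a, Object o) {
        return o instanceof NumericBox
                && Double.compare(a.toDouble(), ((NumericBox) o).toDouble()) == 0;
    }
}

final class IntBox implements NumericBox {
    private final int value;
    IntBox(int value) { this.value = value; }
    @Override public double toDouble() { return value; }
    @Override public boolean equals(Object o) { return NumericBoxes.sameValue(this, o); }
    // The hash is taken from the double value so that equal boxes share a hash code.
    @Override public int hashCode() { return Double.hashCode(toDouble()); }
}

final class DoubleBox implements NumericBox {
    private final double value;
    DoubleBox(double value) { this.value = value; }
    @Override public double toDouble() { return value; }
    @Override public boolean equals(Object o) { return NumericBoxes.sameValue(this, o); }
    @Override public int hashCode() { return Double.hashCode(toDouble()); }
}
```

With this design, new DoubleBox(1.0).equals(new IntBox(1)) holds in both directions, which matters when such values are used as keys in hash-based collections.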

DataVec: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

Many of the util classes (in org.datavec.api.util mainly) have been deprecated or removed; use equivalently named util classes in nd4j-common module (Link)
RecordReader.next(int) method now returns List<List<Writable>> for batches, not List<Writable>. See also NDArrayRecordBatch
RecordWriter and SequenceRecordWriter APIs have been updated with multiple new methods

Arbiter

Arbiter: New Features

Workspace support added (Link, Link)
Added new layer spaces: LSTM, CenterLoss, Deconvolution2D, LossLayer, Bidirectional layer wrapper (Link, Link)
As per DL4J API changes: Updater configuration options (learning rate, momentum, epsilon, rho etc) have been moved to ParameterSpace instead. Updater spaces (AdamSpace, AdaGradSpace etc) introduced (Link)
As per DL4J API changes: Dropout configuration is now via ParameterSpace<IDropout>, DropoutSpace introduced (Link)
RBM layer spaces removed (Link)
ComputationGraphSpace: added layer/vertex methods with overloads for preprocessors (Link)
Added support to specify 'fixed' layers using DL4J layers directly (instead of using LayerSpaces, even for layers without hyperparameters) (Link)
Added LogUniformDistribution (Link)
Improvements to score functions; added ROC score function (Link)
Learning rate schedule support added (Link)
Add math ops for ParameterSpace<Double> and ParameterSpace<Integer> (Link)

Arbiter: Fixes

Fix parallel job execution (when using multiple execution threads) (Link, Link)
Improved logging for failed task execution (Link)
Fix for UI JSON serialization (Link)
Fix threading issues when running on CUDA and multiple execution threads (Link, Link, Link)
Rename saved model file to model.bin (Link)
Fix threading issues with non thread-safe candidates / parameter spaces (Link)
Lombok is no longer included as a transitive dependency (Link)

Arbiter: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

As per DL4J updater API changes: old updater configuration (learningRate, momentum, etc) methods have been removed. Use .updater(IUpdater) or .updater(ParameterSpace<IUpdater>) methods instead

RL4J

Add support for LSTM layer to A3C
Fix A3C to make it actually work using new ActorCriticLoss and correct use of randomness
Fix cases when QLearning would fail (non-flat input, incomplete serialization, incorrect normalization)
Fix logic of HistoryProcessor with async algorithms and failures when preprocessing images
Tidy up and correct the output of statistics, also allowing the use of IterationListener
Fix issues preventing efficient execution with CUDA
Provide access to more of the internal structures with NeuralNet.getNeuralNetworks(), Policy.getNeuralNet(), and convenience constructors for Policy
Add MDPs for ALE (Arcade Learning Environment) and MALMO to support Atari games and Minecraft
Update MDP for Doom to allow using the latest version of VizDoom

ScalNet

First release of ScalNet Scala API, which closely resembles Keras' API.
Can be built with sbt and maven.
Supports both Keras inspired Sequential models, corresponding to DL4J's MultiLayerNetwork, and Model, corresponding to ComputationGraph.
Project structure is closely aligned to both DL4J model-import module and Keras.
Supports the following layers: Convolution2D, Dense, EmbeddingLayer, AvgPooling2D, MaxPooling2D, GravesLSTM, LSTM, Bidirectional layer wrapper, Flatten, Reshape. Additionally, DL4J OutputLayers are supported.

ND4S

Scala 2.12 support

DataSet Iterators

Data iteration tools for loading into neural networks.

What is an iterator?

A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data, or automatic loading of common benchmarking datasets such as MNIST and IRIS.

Usage

For most use cases, initializing an iterator and passing a reference to a MultiLayerNetwork or ComputationGraph fit() method is all you need to begin a task for training:

// conf, batchSize and rngSeed are assumed to be defined elsewhere
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

// pass an MNIST data iterator that automatically fetches data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
model.fit(mnistTrain);

Many other methods also accept iterators for tasks such as evaluation:

// passing directly to the neural network
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
model.evaluate(mnistTest);

// using an evaluation class
mnistTest.reset(); //rewind, since the evaluation above consumed the iterator
Evaluation eval = new Evaluation(10); //create an evaluation object with 10 possible classes
while(mnistTest.hasNext()){
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatures()); //get the network's prediction
    eval.eval(next.getLabels(), output); //check the prediction against the true class
}
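
What the loop above checks can be sketched without any DL4J types: for each example, the predicted class is the index of the largest network output, and it is compared with the index of the 1 in the one-hot label. The class and method names below are illustrative only, and accuracy is just one of the statistics an evaluation would normally report.

```java
// Plain-Java illustration of "check the prediction against the true class".
final class AccuracySketch {
    private AccuracySketch() {}

    /** Index of the largest value in a row (the first one wins on ties). */
    static int argMax(double[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Fraction of rows where the predicted class matches the labelled class. */
    static double accuracy(double[][] oneHotLabels, double[][] predictions) {
        if (oneHotLabels.length == 0 || oneHotLabels.length != predictions.length) {
            throw new IllegalArgumentException("Need the same, non-zero number of rows");
        }
        int correct = 0;
        for (int i = 0; i < oneHotLabels.length; i++) {
            if (argMax(oneHotLabels[i]) == argMax(predictions[i])) {
                correct++;
            }
        }
        return (double) correct / oneHotLabels.length;
    }
}
```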

Available iterators

MnistDataSetIterator

[source]

MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see http://yann.lecun.com/exdb/mnist/

UciSequenceDataSetIterator

[source]

UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift

Details: https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
Data: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data
Image: https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg

UciSequenceDataSetIterator

public UciSequenceDataSetIterator(int batchSize)

Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123

param batchSize Minibatch size

Cifar10DataSetIterator

[source]

Cifar10DataSetIterator is an iterator for the CIFAR-10 dataset - 10 classes, with 32x32 images with 3 channels (RGB)

This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: https://pjreddie.com/projects/cifar-10-dataset-mirror/.

Cifar10DataSetIterator

public Cifar10DataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

param batchSize Minibatch size for the iterator

IrisDataSetIterator

[source]

IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes https://archive.ics.uci.edu/ml/datasets/Iris

IrisDataSetIterator

public IrisDataSetIterator()

next

public DataSet next()

IrisDataSetIterator handles traversing through the Iris Data Set.

see https://archive.ics.uci.edu/ml/datasets/Iris
param batch Batch size
param numExamples Total number of examples

LFWDataSetIterator

[source]

LFW iterator for the Labeled Faces in the Wild dataset: 13233 images total, with 5749 classes. See http://vis-www.cs.umass.edu/lfw/

LFWDataSetIterator

public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
                    PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
                    ImageTransform imageTransform, Random rng)

Create LFW data specific iterator

param batchSize the batch size of the examples
param numExamples the overall number of examples
param imgDim an array of height, width and channels
param numLabels the overall number of labels
param useSubset use a subset of the LFWDataSet
param labelGenerator path label generator to use
param train true to use the training split, false to use the test split
param splitTrainTest the fraction of the data used for training; the remainder goes to test
param imageTransform how to transform the image
param rng random number generator used to make batch shuffling reproducible

TinyImageNetDataSetIterator

[source]

Tiny ImageNet is a subset of the ImageNet database. TinyImageNet is the default course challenge for CS231n at Stanford University.

Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.

See: http://cs231n.stanford.edu/ and https://tiny-imagenet.herokuapp.com/

TinyImageNetDataSetIterator

public TinyImageNetDataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

param batchSize Minibatch size for the iterator

EmnistDataSetIterator

[source]

EMNIST DataSetIterator

COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes
MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z
BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)
LETTERS: 145,600 examples total. 26 balanced classes
DIGITS: 280,000 examples total. 10 balanced classes

See: https://www.nist.gov/itl/iad/image-group/emnist-dataset and https://arxiv.org/abs/1702.05373

EmnistDataSetIterator

public EmnistDataSetIterator(Set dataSet, int batch, boolean train) throws IOException

The EMNIST dataset has multiple different subsets. See the EmnistDataSetIterator Javadoc for details.

numExamplesTrain

public static int numExamplesTrain(Set dataSet)

Create an EMNIST iterator with randomly shuffled data based on a specified RNG seed

param dataSet Dataset (subset) to return
param batchSize Batch size
param train If true: use training set. If false: use test set
param seed Random number generator seed

numExamplesTest

public static int numExamplesTest(Set dataSet)

Get the number of test examples for the specified subset

param dataSet Subset to get
return Number of examples for the specified subset

numLabels

public static int numLabels(Set dataSet)

Get the number of labels for the specified subset

param dataSet Subset to get
return Number of labels for the specified subset

isBalanced

public static boolean isBalanced(Set dataSet)

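Whether the classes in the specified subset are balanced (that is, whether each class has an equal number of examples)

param dataSet Subset to check
return true if the subset is balanced; false otherwise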

RecordReaderDataSetIterator

[source]

Converts records from a RecordReader into DataSet objects, as well as producing minibatches from individual records.

Example 1: Image classification, batch size 32, 10 classes

int nClasses = 10;
//Example values: 28x28 images, 1 channel; labels are taken from the parent directory names
RecordReader rr = new ImageRecordReader(28, 28, 1, new ParentPathLabelGenerator());
rr.initialize(new FileSplit(new File("/path/to/directory")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
    //Label index (first arg): Always value 1 when using ImageRecordReader. For CSV etc: use index of the column
    // that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
    // at 0. Number of classes (second arg): number of label classes (i.e., 10 for MNIST - 10 digits)
    .classification(1, nClasses)
    .preProcessor(new ImagePreProcessingScaler())      //For normalization of image values 0-255 to 0-1
    .build();

Example 2: Multi-output regression from CSV, batch size 128

RecordReader rr = new CSVRecordReader();
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
    //Specify the columns that the regression labels/targets appear in. Note that all other columns will be
    // treated as features. Column indexes start at 0
    // (labelColFrom and labelColTo are int column indexes defined by the caller)
    .regression(labelColFrom, labelColTo)
    .build();

RecordReaderDataSetIterator

public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize)

Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels(). Note that if RecordReader.getLabels() returns null, no output labels will be produced.

param recordReader Record reader to use as the source of data
param batchSize Minibatch size, for each call of .next()

setCollectMetaData

public void setCollectMetaData(boolean collectMetaData)

Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.

param recordReader RecordReader: provides the source of the data
param batchSize Batch size (number of examples) for the output DataSet objects
param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()
param numPossibleLabels Number of classes (possible labels) for classification

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return DataSet with the specified example
throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a DataSet, using the provided RecordMetaData instances.

param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor
return DataSet with the specified examples
throws IOException If an error occurs during loading of the data

writableConverter

public Builder writableConverter(WritableConverter converter)

Builder class for RecordReaderDataSetIterator

maxNumBatches

public Builder maxNumBatches(int maxNumBatches)

Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.

param maxNumBatches Maximum number of minibatches per epoch / reset

regression

public Builder regression(int labelIndex)

Use this for single output regression (i.e., 1 output/regression target)

param labelIndex Column index that contains the regression target (indexes start at 0)

regression

public Builder regression(int labelIndexFrom, int labelIndexTo)

Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)

param labelIndexFrom Column index of the first regression target (indexes start at 0)
param labelIndexTo Column index of the last regression target (inclusive)
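
The effect of a contiguous, inclusive label range can be shown on a single row of numbers. This is a plain-Java sketch of the column split only, under the assumption that every column outside labelIndexFrom..labelIndexTo becomes a feature; RegressionSplitSketch and split are hypothetical names, not part of the iterator.

```java
import java.util.Arrays;

// Splits one numeric record into features and regression targets.
final class RegressionSplitSketch {
    private RegressionSplitSketch() {}

    /** Returns {features, labels}; label columns are labelFrom..labelTo, both inclusive. */
    static double[][] split(double[] row, int labelFrom, int labelTo) {
        if (labelFrom < 0 || labelTo < labelFrom || labelTo >= row.length) {
            throw new IllegalArgumentException("Invalid label range: " + labelFrom + " to " + labelTo);
        }
        int numLabels = labelTo - labelFrom + 1;
        double[] labels = Arrays.copyOfRange(row, labelFrom, labelTo + 1);
        double[] features = new double[row.length - numLabels];
        int f = 0;
        for (int i = 0; i < row.length; i++) {
            if (i < labelFrom || i > labelTo) {
                features[f++] = row[i]; // everything outside the label range is a feature
            }
        }
        return new double[][] {features, labels};
    }
}
```

For a five-column row with labelFrom = 3 and labelTo = 4, the first three values are features and the last two are targets.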

classification

public Builder classification(int labelIndex, int numClasses)

Use this for classification

param labelIndex Index of the column that contains the label (column indexes start from 0). The column must contain integer values, 0 to numClasses-1 inclusive
param numClasses Number of label classes (i.e., number of categories/classes in the dataset)
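
For classification, an integer class index in the range 0 to numClasses-1 is turned into a one-hot label vector. A minimal plain-Java sketch of that conversion follows; OneHotSketch is a hypothetical helper written for illustration, not the iterator's own code.

```java
// Converts an integer class index into a one-hot vector of length numClasses.
final class OneHotSketch {
    private OneHotSketch() {}

    static double[] oneHot(int labelIndex, int numClasses) {
        if (labelIndex < 0 || labelIndex >= numClasses) {
            throw new IllegalArgumentException(
                    "Label " + labelIndex + " is outside 0 to " + (numClasses - 1));
        }
        double[] out = new double[numClasses]; // all zeros initially
        out[labelIndex] = 1.0;                 // mark the labelled class
        return out;
    }
}
```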

preProcessor

public Builder preProcessor(DataSetPreProcessor preProcessor)

Optional arg. Allows the preprocessor to be set

param preProcessor Preprocessor to use

collectMetaData

public Builder collectMetaData(boolean collectMetaData)

When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

param collectMetaData Whether metadata should be collected or not

RecordReaderMultiDataSetIterator

[source]

The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator).

RecordReaderMultiDataSetIterator

public RecordReaderMultiDataSetIterator build()

When dealing with time series data of different lengths, how should we align the input/labels time series? For equal length: use EQUAL_LENGTH. For sequence classification: use ALIGN_END.
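
To make the alignment choice concrete, here is a plain-Java sketch of end alignment as it applies to sequence classification. A label sequence shorter than the feature sequence is shifted so that both end at the same time step, and a mask marks which steps carry a real label. This illustrates the idea only; the method name and array layout are assumptions, not the iterator's internals.

```java
// Aligns a short label sequence to the END of a longer feature sequence.
final class AlignEndSketch {
    private AlignEndSketch() {}

    /**
     * Pads the labels at the start so that they end at the same time step as the features.
     * Returns {paddedLabels, mask}, each of length featureLength; mask is 1.0 where a label is present.
     */
    static double[][] alignEnd(double[] labels, int featureLength) {
        if (labels.length > featureLength) {
            throw new IllegalArgumentException("This sketch only handles labels no longer than the features");
        }
        double[] padded = new double[featureLength];
        double[] mask = new double[featureLength];
        int offset = featureLength - labels.length; // number of padded steps at the start
        for (int t = 0; t < labels.length; t++) {
            padded[offset + t] = labels[t];
            mask[offset + t] = 1.0;
        }
        return new double[][] {padded, mask};
    }
}
```

With a single label and five feature steps, the label lands on the final step and the mask is zero everywhere else, so only the last time step contributes to the loss.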

loadFromMetaData

public MultiDataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a MultiDataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return MultiDataSet with the specified example
throws IOException If an error occurs during loading of the data

loadFromMetaData

public MultiDataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a MultiDataSet, using the provided RecordMetaData instances.

param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderMultiDataSetIterator constructor
return MultiDataSet with the specified examples
throws IOException If an error occurs during loading of the data

SequenceRecordReaderDataSetIterator

[source]

Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels) generate time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with different number of inputs vs. labels).

SequenceRecordReaderDataSetIterator

public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader, SequenceRecordReader labels,
                    int miniBatchSize, int numPossibleLabels)

Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.

param featuresReader SequenceRecordReader for the features
param labels Labels: assume single value per time step, where values are integers in the range 0 to numPossibleLabels-1
param miniBatchSize Minibatch size for each call of next()
param numPossibleLabels Number of classes for the labels

hasNext

public boolean hasNext()

Constructor where features and labels come from different RecordReaders (for example, different files)

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using loadFromMetaData(List)

param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader
return DataSet with the specified example
throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor
return DataSet with the specified examples
throws IOException If an error occurs during loading of the data

AsyncMultiDataSetIterator

[source]

Async prefetching iterator wrapper for MultiDataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network
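
The prefetching idea can be sketched with the JDK alone: a background thread pulls items from the wrapped iterator into a bounded queue while the consumer reads from the other end. PrefetchIterator below is a hypothetical illustration, not the DL4J implementation. It leaves out workspaces, reset, device affinity and error propagation, and it assumes the source never returns null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Wraps an iterator and prefetches up to queueSize items on a background thread.
final class PrefetchIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // marks the end of the source
    private final BlockingQueue<Object> queue;
    private Object nextItem; // item already taken from the queue, or null if none

    PrefetchIterator(Iterator<T> source, int queueSize) {
        final BlockingQueue<Object> q = new ArrayBlockingQueue<>(queueSize);
        this.queue = q;
        Thread worker = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    q.put(source.next()); // blocks while the queue is full
                }
                q.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public boolean hasNext() {
        if (nextItem == null) {
            try {
                nextItem = queue.take(); // blocks until the worker supplies something
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                nextItem = END;
            }
        }
        return nextItem != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T out = (T) nextItem;
        nextItem = null;
        return out;
    }
}
```

The bounded queue is the design point: it lets loading overlap with consumption while capping how many prefetched items are held in memory at once.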

next

public MultiDataSet next(int num)

We want to ensure that the background thread has the same thread->device affinity as the master thread

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator
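
Case (b) above is the least obvious one, so here is a minimal JDK-only sketch of what goes wrong. It is not DL4J code: ReusingIterator and its prefetch helper are hypothetical names, and a plain double[] stands in for the features/labels arrays. Anything that collects several batches before the consumer reads them, as a prefetcher does, ends up holding the same array several times.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration: an iterator that re-uses one array for every batch.
final class ReusingIterator implements Iterator<double[]> {
    private final double[] buffer = new double[2]; // shared by all next() calls
    private final int numBatches;
    private int batch = 0;

    ReusingIterator(int numBatches) {
        this.numBatches = numBatches;
    }

    @Override
    public boolean hasNext() {
        return batch < numBatches;
    }

    @Override
    public double[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Arrays.fill(buffer, batch); // overwrites whatever the previous batch held
        batch++;
        return buffer;
    }

    // Mimics a prefetcher: queues up every batch before the consumer reads any.
    static List<double[]> prefetch(Iterator<double[]> it) {
        List<double[]> queued = new ArrayList<>();
        while (it.hasNext()) {
            queued.add(it.next());
        }
        return queued;
    }

    public static void main(String[] args) {
        List<double[]> queued = prefetch(new ReusingIterator(3));
        // All queued entries are the same array object, holding only the last batch.
        System.out.println(queued.get(0) == queued.get(2));
        System.out.println(Arrays.toString(queued.get(0)));
    }
}
```

An iterator like this one is a natural candidate for returning false from asyncSupported().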

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

Ensures that the background thread has the same thread-to-device affinity as the master thread.

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

return true if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

IteratorDataSetIterator

[source]

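A DataSetIterator that works on an Iterator<DataSet>, combining and splitting the input DataSet objects as required to get the specified batch size.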

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.
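
Getting "the specified batch size" from inputs of arbitrary size amounts to re-batching. The JDK-only sketch below shows the idea on plain lists. It is not the DL4J implementation: Rebatcher is a hypothetical name, a List<T> stands in for a DataSet, and emitting a smaller final batch is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: regroup variable-sized input batches into fixed-size ones.
final class Rebatcher {
    static <T> List<List<T>> rebatch(Iterator<List<T>> input, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        List<T> current = new ArrayList<>();
        while (input.hasNext()) {
            for (T example : input.next()) {
                current.add(example);
                if (current.size() == batchSize) {
                    // A full batch: emit it and start a new one.
                    out.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            out.add(current); // assumption: leftover examples form a smaller last batch
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> in = Arrays.asList(
                Arrays.asList(1, 2, 3), Arrays.asList(4), Arrays.asList(5, 6, 7, 8, 9));
        System.out.println(rebatch(in.iterator(), 4));
    }
}
```

Because the input is consumed in a single pass, a wrapper of this kind cannot rewind, which fits the note that reset is not supported.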

AsyncDataSetIterator

[source]

Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network
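
As a rough picture of what asynchronous prefetching means here, the following JDK-only sketch fills a bounded queue from a background thread while the consumer takes from it. It is not the DL4J implementation: PrefetchingIterator, the END sentinel and the drain helper are hypothetical names, workspaces and device affinity are left out, and elements are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of an asynchronous prefetching wrapper.
final class PrefetchingIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // marks the end of the data

    private final BlockingQueue<Object> queue;
    private Object pending; // taken from the queue but not yet returned

    PrefetchingIterator(Iterator<T> base, int queueSize) {
        BlockingQueue<Object> q = new ArrayBlockingQueue<>(queueSize);
        this.queue = q;
        Thread worker = new Thread(() -> {
            try {
                while (base.hasNext()) {
                    q.put(base.next()); // blocks while the queue is full
                }
                q.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            try {
                pending = queue.take(); // blocks until the worker supplies an item
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return pending != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = (T) pending;
        pending = null;
        return result;
    }

    // Helper: pull everything through the prefetcher into a list.
    static <E> List<E> drain(Iterator<E> base, int queueSize) {
        List<E> out = new ArrayList<>();
        Iterator<E> it = new PrefetchingIterator<>(base, queueSize);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Queue size 8 mirrors the default mentioned for the single-argument constructor.
        System.out.println(drain(Arrays.asList(1, 2, 3, 4, 5).iterator(), 8));
    }
}
```

The bounded queue is what limits how far ahead the background thread can run: once it holds queueSize items, the worker blocks until the consumer takes one.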

AsyncDataSetIterator

public AsyncDataSetIterator(DataSetIterator baseIterator)

Create an Async iterator with the default queue size of 8

param baseIterator Underlying iterator to wrap and fetch asynchronously from

next

public DataSet next(int num)

Create an Async iterator with the default queue size of 8

param iterator Underlying iterator to wrap and fetch asynchronously from
param queue Queue size - number of iterators to

inputColumns

public int inputColumns()

Input columns for the dataset

return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

Ensures that the background thread has the same thread-to-device affinity as the master thread.

batch

public int batch()

Batch size

return

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

return

hasNext

public boolean hasNext()

Returns true if the iteration has more elements.

next

public DataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

DoublesDataSetIterator

[source]

First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

DoublesDataSetIterator

public DoublesDataSetIterator(@NonNull Iterable<Pair<double[], double[]>> iterable, int batchSize)

param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects
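
To make the "pairs in, 2D batches out" idea concrete, here is a minimal JDK-only sketch. It is not the DL4J class: PairBatcher is a hypothetical name, Map.Entry stands in for Pair, and each batch is returned as a two-element array {features, labels} of 2D arrays. Emitting a smaller final batch is an assumption of this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group (features, labels) pairs into 2D minibatches.
final class PairBatcher {
    static List<double[][][]> batches(Iterable<Map.Entry<double[], double[]>> pairs, int batchSize) {
        List<double[][][]> out = new ArrayList<>();
        List<double[]> features = new ArrayList<>();
        List<double[]> labels = new ArrayList<>();
        for (Map.Entry<double[], double[]> pair : pairs) {
            features.add(pair.getKey());   // first value: features vector
            labels.add(pair.getValue());   // second value: labels vector
            if (features.size() == batchSize) {
                out.add(toBatch(features, labels));
                features.clear();
                labels.clear();
            }
        }
        if (!features.isEmpty()) {
            out.add(toBatch(features, labels)); // assumption: smaller last batch
        }
        return out;
    }

    // Rows are examples, so each result is [numExamples][vectorLength].
    private static double[][][] toBatch(List<double[]> features, List<double[]> labels) {
        return new double[][][] {features.toArray(new double[0][]), labels.toArray(new double[0][])};
    }

    public static void main(String[] args) {
        List<Map.Entry<double[], double[]>> data = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            data.add(new SimpleEntry<>(new double[] {i, i}, new double[] {i}));
        }
        for (double[][][] batch : batches(data, 2)) {
            System.out.println(batch[0].length + " examples in batch");
        }
    }
}
```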

IteratorMultiDataSetIterator

[source]

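A MultiDataSetIterator that works on an Iterator<MultiDataSet>, combining and splitting the input MultiDataSet objects as required to get a specified batch size.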

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

SamplingDataSetIterator

[source]

A wrapper for a dataset to sample from. This will randomly sample from the given dataset.

SamplingDataSetIterator

public SamplingDataSetIterator(DataSet sampleFrom, int batchSize, int totalNumberSamples)

INDArrayDataSetIterator

[source]

First value in pair is the features vector, second value in pair is the labels.

INDArrayDataSetIterator

public INDArrayDataSetIterator(@NonNull Iterable<Pair<INDArray, INDArray>> iterable, int batchSize)

param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects

WorkspacesShieldDataSetIterator

[source]

This iterator detaches/migrates DataSets coming out of the backing DataSetIterator, thus providing “safe” DataSets. This is typically used for debugging and testing purposes, and should not generally be used by users.

WorkspacesShieldDataSetIterator

public WorkspacesShieldDataSetIterator(@NonNull DataSetIterator iterator)

param iterator The underlying iterator to detach values from

MultiDataSetIteratorSplitter

[source]

This iterator virtually splits the given MultiDataSetIterator into Train and Test parts. For example, suppose you have 100000 examples and a batch size of 32. That gives 3125 total batches. With a split ratio of 0.7, you get 2187 training batches and 938 test batches.

PLEASE NOTE: You can’t use the Test iterator twice in a row. The Train iterator should be used before each use of the Test iterator.

PLEASE NOTE: You can’t use this iterator if the underlying iterator uses randomization/shuffle between epochs.
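
The batch counts quoted in the example can be reproduced with a few lines of arithmetic. This is only a sketch: SplitMath and its method names are hypothetical, and rounding the train count down is an assumption chosen because it matches the 2187/938 figures in the text.

```java
// Hypothetical helper reproducing the train/test batch arithmetic.
final class SplitMath {
    static long totalBatches(long examples, int batchSize) {
        return examples / batchSize;
    }

    static long trainBatches(long totalBatches, double ratio) {
        if (!(ratio > 0.0 && ratio < 1.0)) {
            throw new IllegalArgumentException("ratio must be in the range 0.0 < ratio < 1.0");
        }
        return (long) (totalBatches * ratio); // assumption: fractional batches are dropped
    }

    static long testBatches(long totalBatches, double ratio) {
        return totalBatches - trainBatches(totalBatches, ratio);
    }

    public static void main(String[] args) {
        long total = totalBatches(100000, 32);
        System.out.println(total + " total, " + trainBatches(total, 0.7) + " train, "
                + testBatches(total, 0.7) + " test");
    }
}
```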

MultiDataSetIteratorSplitter

public MultiDataSetIteratorSplitter(@NonNull MultiDataSetIterator baseIterator, long totalBatches, double ratio)

param baseIterator - iterator to be wrapped and split
param totalBatches - total number of batches in the underlying iterator. This value will be used to determine the number of test/train batches
param ratio - this value will be used as the split ratio. It should be in the range 0.0 < X < 1.0. E.g. if the value 0.7 is provided, then 70% of total examples will be used for training, and 30% of total examples will be used for testing

getTrainIterator

public MultiDataSetIterator getTrainIterator()

This method returns train iterator instance

return

next

public MultiDataSet next(int num)

This method returns test iterator instance

return

AsyncShieldDataSetIterator

[source]

This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetch. This is mainly used for debugging purposes; generally, an iterator that isn’t safe to asynchronously prefetch from should itself return false from asyncSupported().

AsyncShieldDataSetIterator

public AsyncShieldDataSetIterator(@NonNull DataSetIterator iterator)

param iterator Iterator to wrap, to disable asynchronous prefetching for

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

param num the number of examples
return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

return

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

return

hasNext

public boolean hasNext()

Returns true if the iteration has more elements.

next

public DataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

DummyBlockDataSetIterator

[source]

This class provides a baseline implementation of the BlockDataSetIterator interface.

BaseDatasetIterator

[source]

Baseline implementation includes control over the data fetcher and some basic getters for metadata

AsyncShieldMultiDataSetIterator

[source]

This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch.

next

public MultiDataSet next(int num)

Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples

param num Number of examples to fetch

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

return true if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

RandomMultiDataSetIterator

[source]

RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomMultiDataSetIterator

public RandomMultiDataSetIterator(int numMiniBatches, @NonNull List<Triple<long[], Character, Values>> features, @NonNull List<Triple<long[], Character, Values>> labels)

param numMiniBatches Number of minibatches per epoch
param features Each triple in the list specifies the shape, array order and type of values for the features arrays
param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays

addFeatures

public Builder addFeatures(long[] shape, Values values)

Add a new features array to the iterator

param shape Shape of the features
param values Values to fill the array with

addFeatures

public Builder addFeatures(long[] shape, char order, Values values)

Add a new features array to the iterator

param shape Shape of the features
param order Order (‘c’ or ‘f’) for the array
param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, Values values)

Add a new labels array to the iterator

param shape Shape of the labels
param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, char order, Values values)

Add a new labels array to the iterator

param shape Shape of the labels
param order Order (‘c’ or ‘f’) for the array
param values Values to fill the array with

generate

public static INDArray generate(long[] shape, Values values)

Generate a random array with the specified shape

param shape Shape of the array
param values Values to fill the array with
return Random array of specified shape + contents

generate

public static INDArray generate(long[] shape, char order, Values values)

Generate a random array with the specified shape and order

param shape Shape of the array
param order Order of array (‘c’ or ‘f’)
param values Values to fill the array with
return Random array of specified shape + contents
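
The order argument (‘c’ or ‘f’) decides how a multi-dimensional index maps to a position in the flat buffer: ‘c’ is row-major (last dimension varies fastest) and ‘f’ is column-major (first dimension varies fastest). The sketch below is plain Java, not an ND4J call, and ArrayOrder is a hypothetical helper name.

```java
// Hypothetical helper: linear offset of an element under 'c' or 'f' ordering.
final class ArrayOrder {
    static long offset(long[] shape, long[] index, char order) {
        if (shape.length != index.length) {
            throw new IllegalArgumentException("shape and index must have the same rank");
        }
        long offset = 0;
        long stride = 1;
        if (order == 'c') {
            // Row-major: walk dimensions from last to first.
            for (int d = shape.length - 1; d >= 0; d--) {
                offset += index[d] * stride;
                stride *= shape[d];
            }
        } else if (order == 'f') {
            // Column-major: walk dimensions from first to last.
            for (int d = 0; d < shape.length; d++) {
                offset += index[d] * stride;
                stride *= shape[d];
            }
        } else {
            throw new IllegalArgumentException("order must be 'c' or 'f'");
        }
        return offset;
    }

    public static void main(String[] args) {
        long[] shape = {2, 3};
        long[] index = {1, 0}; // second row, first column
        System.out.println("c order offset: " + offset(shape, index, 'c'));
        System.out.println("f order offset: " + offset(shape, index, 'f'));
    }
}
```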

EarlyTerminationMultiDataSetIterator

[source]

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch regardless of the value of num. This essentially restricts the data to the specified number of minibatches.

EarlyTerminationMultiDataSetIterator

public EarlyTerminationMultiDataSetIterator(MultiDataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

param underlyingIterator iterator to wrap
param terminationPoint number of minibatches after which hasNext() will return false
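
The early-termination behaviour is easy to picture as a counting wrapper around any iterator. The following JDK-only sketch is not the DL4J class: EarlyTerminationIterator and its collect helper are hypothetical names, and rejecting a non-positive termination point is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: stop after a fixed number of next() calls.
final class EarlyTerminationIterator<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private final int terminationPoint;
    private int returned = 0; // minibatches handed out so far

    EarlyTerminationIterator(Iterator<T> underlying, int terminationPoint) {
        if (terminationPoint <= 0) {
            throw new IllegalArgumentException("terminationPoint must be > 0");
        }
        this.underlying = underlying;
        this.terminationPoint = terminationPoint;
    }

    @Override
    public boolean hasNext() {
        // False once the limit is reached, even if more data is available.
        return returned < terminationPoint && underlying.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        returned++;
        return underlying.next();
    }

    // Helper: collect everything the wrapper yields.
    static <E> List<E> collect(Iterator<E> base, int terminationPoint) {
        List<E> out = new ArrayList<>();
        Iterator<E> it = new EarlyTerminationIterator<>(base, terminationPoint);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(Arrays.asList(10, 20, 30, 40, 50).iterator(), 3));
    }
}
```

Capping the number of minibatches this way is handy for quick smoke tests on a large dataset.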

ExistingDataSetIterator

[source]

ExistingDataSetIterator

public ExistingDataSetIterator(@NonNull Iterator<DataSet> iterator)

Note that when using this constructor, resetting is not supported

param iterator Iterator to wrap

next

public DataSet next(int num)

Note that when using this constructor, resetting is not supported

param iterator Iterator to wrap
param labels String labels. May be null.

DummyBlockMultiDataSetIterator

[source]

This class provides baseline implementation of BlockMultiDataSetIterator interface

EarlyTerminationDataSetIterator

[source]

EarlyTerminationDataSetIterator

public EarlyTerminationDataSetIterator(DataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

param underlyingIterator iterator to wrap
param terminationPoint number of minibatches after which hasNext() will return false

ReconstructionDataSetIterator

[source]

Wraps a data set iterator, setting the first array (the feature matrix) as the labels.
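
In other words, each batch that comes out has its labels replaced by its own features, which is the shape of data a reconstruction-style model (such as an autoencoder) trains on. A minimal JDK-only sketch follows. FeatureLabelBatch and ReconstructionSketch are hypothetical stand-ins for DataSet and the wrapper, and sharing the same array for both fields (rather than copying it) is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical stand-in for a DataSet: one features matrix, one labels matrix.
final class FeatureLabelBatch {
    final double[][] features;
    final double[][] labels;

    FeatureLabelBatch(double[][] features, double[][] labels) {
        this.features = features;
        this.labels = labels;
    }
}

// Hypothetical sketch: wrap an iterator and use the features as the labels.
final class ReconstructionSketch implements Iterator<FeatureLabelBatch> {
    private final Iterator<FeatureLabelBatch> base;

    ReconstructionSketch(Iterator<FeatureLabelBatch> base) {
        this.base = base;
    }

    @Override
    public boolean hasNext() {
        return base.hasNext();
    }

    @Override
    public FeatureLabelBatch next() {
        FeatureLabelBatch original = base.next();
        // The original labels are discarded; the features become the target.
        return new FeatureLabelBatch(original.features, original.features);
    }

    public static void main(String[] args) {
        FeatureLabelBatch in = new FeatureLabelBatch(new double[][] {{1.0, 2.0}}, new double[][] {{0.0}});
        FeatureLabelBatch out = new ReconstructionSketch(Collections.singletonList(in).iterator()).next();
        System.out.println(out.labels == out.features);
    }
}
```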

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

param num the number of examples
return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

return

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

return true if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

DataSetIteratorSplitter

[source]

DataSetIteratorSplitter

public DataSetIteratorSplitter(@NonNull DataSetIterator baseIterator, long totalBatches, double ratio)

The only constructor

param baseIterator - iterator to be wrapped and split
param totalBatches - total batches in baseIterator
param ratio - train/test split ratio

getTrainIterator

public DataSetIterator getTrainIterator()

This method returns train iterator instance

return

next

public DataSet next(int i)

This method returns test iterator instance

return

JointMultiDataSetIterator

[source]

This dataset iterator combines multiple DataSetIterators into one MultiDataSetIterator. Values from each iterator are joined on a per-example basis, i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come either from one of the underlying DataSetIterators only (if ‘outcome’ is >= 0) or from all iterators (if ‘outcome’ is < 0).

JointMultiDataSetIterator

public JointMultiDataSetIterator(DataSetIterator... iterators)

param iterators Underlying iterators to wrap
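
The joining rule for features and labels can be sketched without any DL4J types. In the JDK-only example below, JointSketch, Single and Multi are hypothetical stand-ins for the iterator, DataSet and MultiDataSet. The join method takes one element from each underlying source and applies the ‘outcome’ rule described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of joining several single-input batches into one multi-input batch.
final class JointSketch {
    // Stand-in for a DataSet.
    static final class Single {
        final double[] features;
        final double[] labels;

        Single(double[] features, double[] labels) {
            this.features = features;
            this.labels = labels;
        }
    }

    // Stand-in for a MultiDataSet: several feature arrays, one or more label arrays.
    static final class Multi {
        final List<double[]> features;
        final List<double[]> labels;

        Multi(List<double[]> features, List<double[]> labels) {
            this.features = features;
            this.labels = labels;
        }
    }

    // parts holds the current element of each underlying iterator, in order.
    static Multi join(List<Single> parts, int outcome) {
        List<double[]> features = new ArrayList<>();
        List<double[]> labels = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            features.add(parts.get(i).features); // every source contributes a feature array
            if (outcome < 0 || i == outcome) {
                labels.add(parts.get(i).labels); // all labels, or only those at index 'outcome'
            }
        }
        return new Multi(features, labels);
    }

    public static void main(String[] args) {
        List<Single> parts = Arrays.asList(
                new Single(new double[] {1.0}, new double[] {10.0}),
                new Single(new double[] {2.0}, new double[] {20.0}));
        System.out.println(join(parts, -1).labels.size() + " label arrays when outcome < 0");
        System.out.println(join(parts, 1).labels.size() + " label array when outcome = 1");
    }
}
```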

next

public MultiDataSet next(int num)

param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet
param iterators Underlying iterators to wrap

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

param preProcessor MultiDataSetPreProcessor. May be null.

getPreProcessor

public MultiDataSetPreProcessor getPreProcessor()

Get the MultiDataSetPreProcessor, if one has previously been set. Returns null if no preprocessor has been set.

return Preprocessor

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators:
(a) Iterators that store their full contents in memory already
(b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents)
(c) Iterators that already implement some level of asynchronous prefetching
(d) Iterators that may return different data depending on when the next() method is called

return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

return true if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

return the next element in the iteration

remove

public void remove()

PLEASE NOTE: This method is NOT implemented

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method
implSpec The default implementation throws an instance of UnsupportedOperationException and performs no other action.

FloatsDataSetIterator

[source]

First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

FloatsDataSetIterator

public FloatsDataSetIterator(@NonNull Iterable<Pair<float[], float[]>> iterable, int batchSize)

param iterable Iterable to source data from
param batchSize Batch size for generated DataSet objects

FileSplitDataSetIterator

[source]

Simple iterator that works with a list of files. File-to-DataSet conversion is handled by the provided FileCallback implementation.

FileSplitDataSetIterator

public FileSplitDataSetIterator(@NonNull List<File> files, @NonNull FileCallback callback)

param files List of files to iterate over
param callback Callback for loading the files

MultipleEpochsIterator

[source]

A dataset iterator for doing multiple passes over a dataset

Deprecated: use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead.

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

param num the number of examples
return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

return

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

return

hasNext

public boolean hasNext()

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

return true if the iteration has more elements

remove

public void remove()

throws UnsupportedOperationException if the remove operation is not supported by this iterator
throws IllegalStateException if the next method has not yet been called, or the remove method has already been called after the last call to the next method

MultiDataSetWrapperIterator

[source]

This class is a simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly.

PLEASE NOTE: This only works if number of features/labels/masks is 1

MultiDataSetWrapperIterator

public MultiDataSetWrapperIterator(MultiDataSetIterator iterator)

param iterator Underlying iterator to wrap

RandomDataSetIterator

[source]

RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomDataSetIterator

public RandomDataSetIterator(int numMiniBatches, long[] featuresShape, long[] labelsShape, Values featureValues, Values labelValues)

param numMiniBatches Number of minibatches per epoch
param featuresShape Features shape
param labelsShape Labels shape
param featureValues Type of values for the features
param labelValues Type of values for the labels

MultiDataSetIteratorAdapter

[source]

Iterator that adapts a DataSetIterator to a MultiDataSetIterator