Release Notes

1.0.0-beta6

Highlights - 1.0.0-beta6 Release

  • Added support for CUDA 10.2. 1.0.0-beta6 released with CUDA 9.2, 10.0, 10.1 and 10.2 support

  • SameDiff optimizations - memory use for inference and training significantly reduced, with some performance improvements also

    • Note: No API changes, only artifact ID change: replace deeplearning4j-ui_2.1x with deeplearning4j-ui

    • Note that additional ND4J namespaces API will have additions (new namespaces and methods), and may have some API changes, in the next release

  • OpenMP replaced with a thread pool C++ parallelism framework, enabling C++-level parallelism for operations on platforms without OpenMP support

Deeplearning4J

Deeplearning4J: Features and Enhancements

  • DNNL (MKL-DNN) upgraded to version 1.1

Deeplearning4J: Bug Fixes and Optimizations

Deeplearning4j: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

  • Deeplearning4j UI artifact ID has changed: replace deeplearning4j-ui_2.1x (beta5 and earlier) with deeplearning4j-ui

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

ND4J/SameDiff: Bug Fixes and Optimizations

ND4J: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

DataVec

DataVec: Bug Fixes and Optimizations

RL4J

RL4J: Features and Enhancements

RL4J: Bug Fixes and Optimizations

PyDataVec

PyDataVec Features and Enhancements

PyDataVec Bug Fixes and Optimizations

Deeplearning4j UI - Play framework replaced with Vertx; deeplearning4j-ui dependency now no longer has Scala dependency or Scala version suffix

ND4j namespace operation methods: operations are available through the Nd4j.math, Nd4j.random, Nd4j.bitwise and Nd4j.nn (neural network) namespaces, for example Nd4j.math.abs(INDArray), Nd4j.random.logNormal, etc.

Added causal convolution mode for Convolution1D layer (ConvolutionMode.Causal) and added causal conv1d support for Keras import
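
As a minimal sketch of the new causal mode, the snippet below builds a Convolution1D layer with ConvolutionMode.Causal; the kernel and layer sizes are hypothetical, and the builder method names assume the standard DL4J layer-builder API.

```java
import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.layers.Convolution1DLayer;

public class CausalConv1DSketch {
    public static void main(String[] args) {
        // Hypothetical sizes; the point is the ConvolutionMode.Causal setting.
        Convolution1DLayer causalConv = new Convolution1DLayer.Builder()
                .kernelSize(3)
                .nIn(8)
                .nOut(16)
                .convolutionMode(ConvolutionMode.Causal) // causal padding mode added in 1.0.0-beta6
                .build();
        System.out.println(causalConv);
    }
}
```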

Keras import now supports scaled identity weight initialization

Added Mish activation function

BertIterator now has a BertIterator.featurizeSentences(List<String>) method for inference

BertIterator now supports sentence pairs for supervised training

Added Sparse multi-class cross entropy for both Deeplearning4j and Keras import

Deeplearning4j UI: migrated from Play to Vertx for web serving backend, also removing dependency on Scala libraries; no API changes, only artifact ID change - replace deeplearning4j-ui_2.1x with deeplearning4j-ui

Added TimeDistributed wrapper layer

KDTree implementation optimized

Deeplearning4j zoo models and datasets hosting location updated

Fixed nIn validation for Deconv2D layer

Fixed an issue with incorrect Deconvolution2d results for Keras import models

Added DNNL/MKLDNN support for batch normalization layer

Fixed various integer casts to avoid overflows for very large arrays (with dimensions or length > Integer.MAX_VALUE)

Fixed an issue with UNet non-pretrained model architecture (last layer kernel size)

Deeplearning4j SameDiff layers now use DL4J workspaces for better performance and reduced memory consumption

Updated broken links in a few error messages

Cleaned up a few unused dependencies in various modules

Cleaned up duplicate SamplingDataSetIterator class

Fixed an issue where ComputationGraph instances with a single input going into multiple embedding layers could throw a NPE

Fixed an issue where loss function weights were not automatically cast to network datatype, resulting in an exception if not already correct type

Shaded Jackson version upgraded from 2.9.9/2.9.9.3 to 2.10.1

Fixed an issue with KNN where getMostPopulatedClusters actually returned the least populated clusters

Added support for CUDA 10.2

DNNL (MKL-DNN) upgraded to version 1.1

Added ND4j namespaces to match SameDiff: Nd4j.math, Nd4j.random, Nd4j.bitwise, Nd4j.nn (neural network)

Added SameDiff.calculateGradientsAndOutputs method

Additional SameDiff single batch .output method overloads for DataSet/MultiDataSet added

TensorFlow import ops coverage enhanced (significant number of additional ops supported)

PRelu op added

adjust_contrast, igamma and igammac ops added

ND4J/SameDiff: BitCast, CompareAndBitpack, DivideNoNan, DrawBoundingBoxes, FakeQuantWithMinMaxVarsPerChannel ops added

non_max_suppression_overlaps op added

ImagePreProcessingScaler now supports segmentation use cases

concat operation now supports the concatenation axis being specified via the last input array

Added Gamma and Poisson RNG distributions

SameDiff’s use of DeviceLocal for variables/constants etc is now configurable

Uniform distribution op now supports random integer generation, not just random floating point generation

SameDiff: Added simple OpBenchmarkListener for benchmarking purposes

Added the ability to disable platform helpers (DNNL/MKLDNN etc) via Nd4jCPU.Environment.getInstance().allowHelpers(false); and Nd4jCuda.Environment.getInstance().allowHelpers(false);
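
A minimal sketch of disabling platform helpers on the CPU backend, following the calls quoted above; the exact class/package name is assumed from the JavaCPP-generated CPU backend bindings, and the CUDA backend exposes an analogous Nd4jCuda.Environment.

```java
import org.nd4j.nativeblas.Nd4jCpu;

public class DisableHelpersSketch {
    public static void main(String[] args) {
        // Disable platform helpers (DNNL/MKLDNN etc) on the CPU backend, per the note above.
        Nd4jCpu.Environment.getInstance().allowHelpers(false);
    }
}
```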

Added draw_bounding_boxes operation

Added resize_bicubic operation

Added causal padding mode to conv1d operation

DNNL (MKLDNN) is included and enabled by default for non-AVX builds

Added SameDiff ArraySavingListener for debugging purposes

OpenMP replaced with ThreadPool abstraction, enables parallelism for platforms without OpenMP support

SameDiff memory management overhauled for (in some cases significantly) reduced memory consumption and improved performance

Switched to Clang instead of gcc for OSX compilation to avoid compiler-related issues

Removed SameDiff.outputs() “best guess” output inference due to being unreliable, in favor of explicit SameDiff.setOutputs(String...) call

Fixed an issue with Nd4j.hstack on 1D arrays

SameDiff no longer allows empty arrays for variables

Fixed an issue with Nadam updater LR schedules not being cloned

Cleaned up IActivation interface

Added new LSTM op implementation with DNNL/MKLDNN support (forward pass only so far)

SameDiff API cleaned up; deprecated methods removed

Switched SameDiff variable initialization to non-lazy, to avoid unexpected behaviour when mixing execution and ND4J RNG seed setting

SameDiff.zero and .one methods now create constants, not variables

Moved CUDA build version and device logging to Java logging, from c++ stdout to enable disabling logging (via ND4J config or slf4j config)

Added DNNL/MKLDNN support for batch normalization

SameDiff: Fixed an issue where listeners weren’t being called for gradient calculation

Added DNNL/MKLDNN support for deconv2d/3d operations

Fixed an issue with biasadd_bp operation and NHWC data format

Fixed an issue with certain strided slice backprop configurations

Fixed an issue with LogSumExp reduction operation backprop for the along-dimension case

INDArray.toString() now has correct brackets for rank 1+ scalars to avoid ambiguity

Fixed an issue where some ND4J methods could fail when the library is compiled on Java 9+ but run on Java 8

Fixed empty array input case for is_strictly_increasing, non_decreasing and non_max_suppression ops

Fixed empty input arrays for legacy ops (transform, scalar, pairwise, broadcast)

CUDA compute capability 3.0 is supported again

Improved performance for Scatter operations (1D case) + index validation

Fixed an issue where SameDiff TrainingConfig serialization would fail if evaluation instances are set

SameDiff execution will now throw an exception when assertion operations in the graph fail

PolyGamma function now returns NaNs when passed double for args requiring integer values

Fixed some issues for pad and mirror_pad ops to ensure they conform with Tensorflow for imported networks

Updated and fixed some issues for TensorFlow graph runner

Improved performance for Reverse operation

Removed/cleaned up unused ND4J list functionality

Fixed reduce bool operation results (such as any, all, IsInf, etc) for empty array inputs

SameDiff.outputs() now requires user to call SameDiff.setOutputs(String...) first; previous “best guess” output inference was unreliable
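
A small sketch of the new explicit-outputs workflow; the tiny graph (a placeholder plus a named elementwise add) is hypothetical and only serves to show setOutputs being called before outputs().

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class SetOutputsSketch {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 3);
        SDVariable out = in.add("out", 1.0);  // tiny hypothetical graph: just an elementwise add
        sd.setOutputs("out");                 // explicit output declaration, required before outputs()
        System.out.println(sd.outputs());     // -> [out]
    }
}
```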

SameDiff.zero and .one methods now create constants, not variables

NativeImageLoader now checks for empty input streams and throws an exception instead of crashing

NDArrayScalarOpTransform now supports modulus operator

Added AsyncTrainingListener

Replaced multiple uses of java.util.Random with ND4J Random

Added Observable and LegacyMDPWrapper

Refactored RL4J video recording to separate VideoRecorder class

Fixed an issue with target for DQN

Refactoring for DQN and double DQN for improved maintainability

Internal refactoring and various bug fixes

PyDataVec TransformProcess now supports non-inplace operations

Fixed various issues with PyDataVec

Fixed an issue with data locality that could cause incorrect results under some circumstances when running on CUDA

1.0.0-M2

Highlights

As part of the same work, flatbuffers has been upgraded to 1.12.1. This affects the samediff file format and the user interface. Flatbuffers as a file format is forwards and backwards compatible, but if you have any issues please do let us know. The relevant files have been updated using the flatc compiler.

Removed rl4j: continuing the effort to cut unmaintained modules, the 1.0 release focuses the framework on a few key use cases. This invites other folks to build external modules around a tightly maintained core that focuses on deployment, framework interop and training models in java.

Consolidated tests to platform-tests to allow for easy testing of behavior against different backends.

Adds proper support for Jetson Nano with curated binaries and an updated CUDA 10.2 build

Nd4j/Samediff/Libnd4j

Features and Enhancements

Bug Fixes

  1. Update samediff api to allow dimensions as variables

Deeplearning4j

Features and Enhancements

Bug Fixes

Datavec

Features and Enhancements

Bug Fixes

Omnihub

Launches new Omnihub module. Allows access to models from: https://github.com/KonduitAI/omnihub-zoo

A pretrained omnihub module will provide access to pretrained samediff and dl4j models. This will also supplant the old dl4j zoo.

Python4j

Clean up tests/consolidate tests to platform-tests

Deeplearning4j Suite Overview

Introduction to core Deeplearning4j concepts.

Eclipse DeepLearning4J

Eclipse Deeplearning4j is a suite of tools for running deep learning on the JVM. It's the only framework that allows you to train models from java while interoperating with the python ecosystem through a mix of python execution via our cpython bindings, model import support, and interop with other runtimes such as tensorflow-java and onnxruntime.

The use cases include importing and retraining models (Pytorch, Tensorflow, Keras) and deploying them in JVM microservice environments, mobile devices, IoT, and Apache Spark. It is a great complement to your python environment for running models built in python, deployed to or packaged for other environments.

Deeplearning4j has several submodules including:

  1. Samediff: a tensorflow/pytorch like framework for execution of complex graphs. This framework is lower level, but very flexible. It's also the base api for running onnx and tensorflow graphs.

  2. Nd4j: numpy ++ for java. Contains a mix of numpy operations and tensorflow/pytorch operations.

  3. Libnd4j: A lightweight, standalone c++ library enabling math code to run on different devices, optimizable for running on a wide variety of hardware.

  4. Python4j: A python script execution framework easing deployment of python scripts into production.

  5. Apache Spark Integration: An integration with the Apache Spark framework enabling execution of deep learning pipelines on spark

  6. Datavec: A data transformation library converting raw input data to tensors suitable for running neural networks on.

How to use this website

  1. Multi project contains all cross project documentation such as end to end training and other whole project related documentation. This should be the default entry point for those getting started.

  2. Deeplearning4j contains all of the documentation related to the core deeplearning4j apis such as the multi layer network and the computation graph. Consider this the high level framework for building neural networks. If you would like something lower level like tensorflow or pytorch, consider using samediff

  3. Samediff contains all the documentation related to the samediff submodule of ND4j. Samediff is a lower level api for building neural networks similar to pytorch or tensorflow with built in automatic differentiation.

  4. Datavec contains all the documentation related to our data transformation library datavec.

  5. Python4j contains all the documentation related to our cpython execution framework python4j.

  6. Libnd4j contains all the documentation related to our underlying C++ framework libnd4j.

  7. Apache Spark contains all of the documentation related to our Apache Spark integration.

  8. Concepts/Theory contains all of the documentation related to general mathematical or computer science theory needed to understand various aspects of the framework.

Open Source

JVM/Python/C++

Deeplearning4j can either be a complement to your existing workflows in python and c++ or a standalone library for you to build and deploy models. Use what components you find useful.

1.0.0-M1.1

Highlights

Thanks to feedback from the community following the M1 release, a number of bug fixes allowed us to quickly sort out a few issues. This is a minor bug fix release to address shortcomings found with M1. Most fixes were related to keras import, the cnn/rnn helpers, and python4j.

Added backwards compatibility for centos 6 via a new linux-x86_64-compat classifier enabling use of older glibcs on centos 7.

Known issues

Deeplearning4j

Features and Enhancements

Bug fixes

Nd4j

Features and Enhancements

Bug fixes

Datavec

Features and Enhancements

Bug fixes

Python4j

Features and Enhancements

Bug fixes

Samediff

Features and Enhancements

Bug fixes

1.0.0-beta7

Version 1.0.0-beta7

Deeplearning4j

Features and Enhancements

    • Full inference and training support is available for ops/layers in the tf.keras namespace; inference only for general Tensorflow operations outside of the tf.keras namespace

    • Note also improvements to Keras import for reshape, permute, etc operations due to NHWC and NWC support in DL4J

Bug Fixes and Optimizations

ND4J/SameDiff:

Features and Enhancements

  • Added new Image operations namespace operations:

  • Added new Random operations namespace operations:

  • Added new Math namespace operations:

  • Added new NN namespace operations:

  • Added new CNN namespace operations:

  • Added new linalg operations namespace

  • Added new RNN operation namespace operations:

  • Mapped operations for Tensorflow import:

Bug Fixes and Optimizations

  • Improved performance for bias_add operation

DataVec

Features and Enhancements

Bug Fixes and Optimizations

RL4J

Features and Enhancements

Arbiter

Bug Fixes and Optimizations

1.0.0-M1

Highlights

In light of the coming 1.0, the project has decided to cut a number of modules before the final release. These modules have not had many users in the past and have created confusion for many users just trying to use a few simple apis. Many of these modules have not been maintained.

There will likely be 1 or 2 more milestone releases before the final 1.0. These should be considered checkpoints.

These modules include:

  1. Arbiter

  2. Jumpy

  3. Datavec modules for video, audio, and sound. The computer vision datavec module will continue to be available.

  4. Tokenizers: The tokenizers for chinese, japanese and korean were imported from other frameworks and not really updated.

  5. Scalnet, Nd4s: We removed the scala modules due to the small user base. We welcome 3rd party enhancements to the framework for syntactic sugar such as kotlin and scala. The framework's focus will be on providing

TVM: We now support running TVM modules. Docs coming soon.

We've updated our shaded modules to newer versions to mitigate security risks. These modules include jackson and guava.

Cuda 11: We've upgraded dl4j and associated modules to support cuda 11 and 11.2.

A more modular model import framework supporting tensorflow and onnx:

  1. Model mapping procedures loadable as protobuf

  2. Defining custom rules for import to work around unsupported or custom layers/operations

  3. Op descriptor for all operations in nd4j

This will enable users to override model import behavior to run their own custom models. This means, in most circumstances, there will be no need to modify model import core code anymore. Instead, users will be able to provide definitions and custom rules for their graphs.

Users will be expected to convert their models in an external process. This means running standalone conversions for their models. This extends to keras import as well. Sometimes users convert their models in production directly from keras.

The workflow going forward is to ensure that your model is converted ahead of time to avoid performance issues with converting large models.

Removed ppc from nd4j-native-platform and nd4j-cuda-platform. If you need this architecture, please contact us or build from source.

We've upgraded arrow to 4.0.0 enabling the associated nd4j-arrow and datavec-arrow modules to be used without netty clashes.

Deeplearning4j

Bug fixes

  • Improved keras model import support for NHWC as well as NCHW input formats for both rnn and cnn

Nd4j

Features and Enhancements

Bug fixes

Python4j

Features and Enhancements

Rewritten and more stable python execution. This allows better support for multi-threaded environments.

Bug fixes

1.0.0-beta2

Highlights - 1.0.0-beta2 Release

  • ND4J/Deeplearning4j: Added support for CUDA 9.2. Dropped support for CUDA 9.1. (1.0.0-beta2 release has CUDA 8.0, 9.0 and 9.2 support)

  • Deeplearning4j resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property

  • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

  • ND4J: nd4j-native-platform will now use Intel MKL-DNN as the default/bundled BLAS implementation (replacing OpenBLAS as the previous default)

  • Deeplearning4j: Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs (to assist with debugging and tuning memory configuration).

Deeplearning4J

Deeplearning4J: New Features

Deeplearning4J: Bug Fixes and Optimizations

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

  • GravesLSTM has been deprecated in favor of LSTM due to lack of CuDNN support but otherwise similar accuracy in practice. Use the LSTM class instead.

Deeplearning4J: 1.0.0-beta2 Known Issues

Deeplearning4J: Keras Import

  • Keras model import now imports every Keras application

  • Supports GlobalPooling3D layer import

  • Supports RepeatVector layer import

  • Supports LocallyConnected1D and LocallyConnected2D layers

  • Keras Lambda layers can now be imported by registering custom SameDiff layers

  • All Keras optimizers are now supported

  • All advanced activation functions can now be imported.

  • Many minor bugs have been fixed, including proper weight setting for all configurations of BatchNormalization, improvements to Reshape and SeparableConvolution2D, and full support of Bidirectional layers.

ND4J

ND4J: New Features

  • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

  • SameDiff: A significant number of new ops, and backprop implementations for existing ops

ND4J: Bug Fixes and Optimizations

  • SameDiff: a significant number of bug fixes for execution and individual ops

ND4J: Known Issues

ND4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

  • CUDA 9.1 support has been removed. CUDA 8.0, 9.0 and 9.2 support is available

  • Due to long indexing changes, long/long[] should be used in place of int/int[] in some places (such as INDArray.size(int), INDArray.shape()); see the sketch after this list

  • Unused and not properly tested/maintained utility class BigDecimalMath has been removed. Users should find an alternative library for this functionality, if required.
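
Following the transition note above, a minimal sketch of the long-based accessors: shape() returns long[] and size(int) returns long.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Arrays;

public class LongIndexingSketch {
    public static void main(String[] args) {
        INDArray arr = Nd4j.rand(2, 3);
        long[] shape = arr.shape(); // long[] rather than int[]
        long rows = arr.size(0);    // long rather than int
        System.out.println(Arrays.toString(shape) + ", rows = " + rows);
    }
}
```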

DataVec

DataVec: New Features

DataVec: Optimizations and Bug Fixes

DataVec: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

Arbiter

Arbiter: New Features

Arbiter: Fixes

  • DataProvider has been deprecated. Use DataSource instead.

RL4J

1.0.0-beta3

Highlights - 1.0.0-beta3 Release

  • ND4J/Deeplearning4j: Added support for CUDA 10.0. Dropped support for CUDA 8.0. (1.0.0-beta3 release has CUDA 9.0, 9.2 and 10.0 support)

  • SameDiff now supports training and evaluation from DataSetIterator and MultiDataSetIterator. Evaluation classes have been moved to ND4J.

  • DL4J Spark training (gradient sharing) is now fully fault tolerant, and has improvements for threshold adaption (potentially more robust convergence). Ports can now be easily configured independently on master/workers.

Deeplearning4J

Deeplearning4J: New Features

Deeplearning4J: Bug Fixes and Optimizations

    • Note that learning rates may need to be decreased for some updaters (such as Adam) to account for this change vs. earlier versions. Some other updaters (such as SGD, NoOp, etc) should be unaffected.

    • Note that deserialized (loaded) configurations/networks saved in 1.0.0-beta2 or earlier will default to old behaviour for backward compatibility. All new networks (created in 1.0.0-beta3) will default to the new behaviour.

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

  • IEvaluation classes in DL4J have been deprecated and moved to ND4J so they are available for SameDiff training. Functionality and APIs are unchanged

Deeplearning4J: Known issues: 1.0.0-beta3

Deeplearning4J: Keras Import

ND4J

ND4J: New Features

  • Libnd4j new ops:

ND4J: Bug Fixes and Optimizations

  • Libnd4j native op fixes:

ND4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

  • CUDA 8.0 support has been removed. CUDA 9.0, 9.2 and 10.0 support is available in 1.0.0-beta3

ND4J: Known issues: 1.0.0-beta3

  • Android users may need to manually exclude the (now deprecated) module nd4j-base64. This is due to org.nd4j.serde.base64.Nd4jBase64 class being present in both nd4j-api and nd4j-base64 modules. Both versions have identical content. Use exclude group: 'org.nd4j', module: 'nd4j-base64' to exclude.

DataVec

DataVec: New Features

DataVec: Optimizations and Bug Fixes

Arbiter

Arbiter: Fixes

ND4S

Adds proper support for java 9 modules.

Added new model zoo module called omnihub for dl4j and new samediff models. See more in the new omnihub section.

Migrated the snapshots to sonatype's new repository https://s01.oss.sonatype.org.

Adds Spark 3 support.

Reduce binary size using selective compilation.

Remove scala 11 support. Only supporting scala 2.12.

Extensive enhancements for samediff model training.

Add beginnings of graph optimization framework.

Many onnx model import improvements (add new ops).

Add new op subset frameworks: allows selective inclusion of operations to enable users to reduce binary size.

Update onednn to 2.2.

Add updated jetson nano support.

Enhance codegen exposing more functions for samediff.

Add new samediff eager mode (mainly used for model import use cases).

Add dimensions as input variables.

Fix cuda shuffle.

Fix up conditions/matching.

ImageResize updates to improve compatibility with onnx.

Rewrite compat sparse to dense op.

Fix creation of string scalar ndarrays.

Fix serialization with conv/pooling3d.

Add Spark 3 support.

Added Deconvolution3D for keras import

Add full channels last support for 3d convolutions.

Fix confusion matrix count increments.

Fix Conv3D data format serialization.

Add LabelsSource to BagOfWordsVectorizer (thanks to XAI!).

Performance enhancement for mnist related datasetiterators.

Fix memory leak in datavec-arrow.

Modules will be made available from a Pretrained class.

This website has several sections of documentation following a common layout. Below is an overview of the sections of the site:

The libraries are completely open-source, Apache 2.0, under open governance at the Eclipse Foundation. The Eclipse Deeplearning4j project welcomes all contributions.

Snapshots will also be published every 2 days automatically now to get around sonatype ossrh deletion of snapshots every 3 days. This should increase robustness of the snapshots.

Worked around an issue with github actions preemptively upgrading visual studio and breaking the cuda builds.

A number of bugs were fixed with LSTM and CUDNN.

Avoid shuffle operations on gpu; pre-save data on cpu in mini batches. For more help, please post on the forums at https://community.konduit.ai/

Add batch normalization support for RNNs.

Disable old helpers by default

Minor unit test fixes.

Add keras support for cnn 1d NWC.

Moved the version check warning to trace-level logging so it stops confusing users during normal usage.

Allow 1d convolutions to accept feed forward as an input type.

Remove the old benchmark suite and migrate it to contrib.

Remove old MKLDNNLSTM helper (it never fully functioned anyway).

Fixed an issue with helper reflection, ensuring the classes would be loaded properly

Fix minor workspace activation bug.

Fixed compilation error when running on anything newer than jdk 8 with NIO buffers.

Move logback to be a test dependency for some modules.

Keras model import fixes for GlobalPooling.

Added the Eigen op as public, ensuring easier use when running eigenvalue decomposition

Fixes minor issue with choice(..) op thanks to

Minor applyScalar typo fix.

Fixed serialization bug with StringToTimeTransform, thanks to community member

Made python4j's python path setting more robust by migrating from set path calls to add path calls.

Fixes a bug where numpy import array could crash the jvm.

Fixed inconsistent conventions between SameDiffVariable getArr and getArrForName().

Read the announcement at for the highlights of this release.

Added Keras model import support for tf.keras models
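
A minimal sketch of importing a saved Keras/tf.keras model, assuming the KerasModelImport entry point from the Keras model import module; the file path is a placeholder, and importKerasSequentialModelAndWeights would be used instead for Sequential models.

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

public class KerasImportSketch {
    public static void main(String[] args) throws Exception {
        // "model.h5" is a placeholder path to a saved tf.keras (functional API) model.
        ComputationGraph model = KerasModelImport.importKerasModelAndWeights("model.h5");
        System.out.println(model.summary());
    }
}
```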

DL4J now supports NHWC (channels last) data format for all CNN 2D layers, in addition to NCHW

DL4J now supports NWC (channels last - [minibatch, sequence_length, size]) for all RNN and CNN 1D layers, in addition to NCW

Added Deconvolution3D layer

Keras import: added ReLU, ELU and Softmax advanced activation layers and Swish activation function

Added DL4J SameDiffLoss class (for easily-defined DL4J ILossFunction's via SameDiff)

Useful exceptions are now thrown when attempting to perform unsupported operations on FastText

Added MultiLayerNetwork.evaluate(MultiDataSetIterator) and .evaluateRegression(MultiDataSetIterator) methods

Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff

Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules)

Note: this is a breaking change for some class packages/imports. See for details on exact package changes

Deeplearning4j UI: Webjars versions locked down using dependency management to avoid check on each build

Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff

Refactored/merged modules dl4j-perf and dl4j-util into deeplearning4j-core

Fixed an issue with BertWordPieceTokenizer - potential StackOverflowError with certain inputs

Fixed an issue with GlobalPooling layer with masks of different datatype to the activations datatype

Fixed an issue with DL4JModelValidator for ComputationGraph

Fixed an issue where SameDiff layers in DL4J could throw an exception when used with transfer learning

Weight initialization for EmbeddingLayer and EmbeddingSequenceLayer now no longer depend on the vocabulary size (only the vector size)

Fixed an issue with Keras import with bidirectional layers + preprocessors

DL4J UI: added redirect from /train to /train/overview

Fixed an issue where RecordReaderDataSetIterator builder collectMetaData configuration was not being applied

Fixed an issue where MultiLayerNetwork evaluation was not passing metadata to the IEvaluation instances during evaluation

Fixed an issue with Spark training SharedTrainingMaster when training with a ComputationGraph and MultiDataSets

Assorted fixes for edge cases for DL4J Keras import

deeplearning4j-nlp-korean will no longer be released for Scala 2.12 due to the required dependency only having a Scala 2.11 version available

Fix for ConvolutionalIterationListener for ComputationGraph

Fixed an issue where dataset and model zoo downloads could get stuck if the server fails to send any data (now: timeout + retry)

DL4J ModelSerializer no longer writes temporary files when restoring models from InputStream

Fixes issues with UIServer multi session mode, and potential shutdown race condition

Fixed an issue where TfidfVectorizer.vectorize() could throw a NPE when fit from LabelAwareIterator

SameDiff multi-threaded inference enhanced (and fixed) - a single SameDiff instance can now be used for inference safely and efficiently from multiple threads

cuDNN support added to SameDiff (automatically enabled for nd4j-cuda-10.x backend)

Added ND4J namespaces: Nd4j.cnn, Nd4j.rnn, Nd4j.image

rgbToHsv, hsvToRgb

rgbToYiq, yiqToRgb, rgbToYuv, yuvToRgb

imageResize

gamma, poisson, shuffle

clipByAvgNorm, embeddingLookup

mergeMaxIndex

cReLU

upsampling3d

triangular_solve

tri operation

triu operation

lstmLayer (note old lstmLayer method renamed to lstmBlock)

gru

Added new Loss operations namespace - Nd4j.loss

HSVToRGB, RGBToHSV, Igamma, Igammac, RandomGamma, RandomPoisson, RandomPoissonV2, RandomShuffle

Added SameDiff ProfilingListener - writes op performance profiles in Chrome profiler format (load in chrome://tracing/)

Added SameDiff ProfileAnalyzer tool to compare profiles output from ProfilingListener (or Tensorflow)

SameDiff listener API: added frame and iteration information for listener methods

Added (non-backend-specific) method of accessing Nd4j environment: Nd4j.getEnvironment() method (environment info and low-level configuration options)

Improved memory limits/configuration support for libnd4j (c++)

Added pairwise (broadcastable) power backprop operation

Updated JavaCPP presets MKL version to 2020.0 from 2019.5

Added DynamicCustomOp dargs - datatype arguments

Output datatype configuration for Range op, SequenceOp, ConfusionMatrix

Added tensormmul_bp op

OpenBLAS version upgraded to 0.3.8

libnd4j (c++ codebase underlying DL4J, ND4J and SameDiff) refactored to be more easily embeddable in other C++ projects

ImagePreProcessingScaler now supports preprocessing of labels (for segmentation)

Additional datatypes now supported for nd4j-tensorflow TensorflowConversion

SameDiff operation namespaces (sd.math, sd.image, etc) are now code generated to ensure SameDiff and ND4J namespaces are identical (all operations included, same API)

Added ND4J ArchiveUtils.unzipFileTo(String, String, boolean logFiles) overload to enable/disable extracted file path logging

Added weight format configuration for following operations: conv1D, conv2D, conv3D, deconv2d, deconv3d, depthwiseConv2d, pointwiseConv2d, sconv2d

Added backprop operation implementations for mergemax, mergeadd, mergeavg operations

MKL version upgraded from 2020.0 to 2020.1; OpenCV upgraded from 4.2.0 to 4.3.0

SameDiff: DifferentialFunctionFactory class removed in favor of namespace methods (sd.math, sd.linalg, etc)

Added lstmLayer_bp operation

Added gru_bp operation

linspace operation can now use both targs and arrays for start/end/size arguments

Assorted dependency updates - OpenBLAS (0.3.9), OpenCV (4.3.0), Leptonica (1.79.0)

Upgraded assorted dependency versions: javax.activation:activation (1.1 -> 1.1.1), stream analytics (2.7.0->2.9.8), Apache Spark (2.4.3->2.4.5), Jackson databind (2.10.1 -> 2.10.3), Vertx (3.8.3 -> 3.9.0)

Added nd4j-common-tests ResourceUtils.listClassPathFiles method

Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff

SameDiff - added CuDNN support

Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules)

Note: this is a breaking change for some class packages/imports. See for details on exact package changes

Fixed some issues with Tensorflow import of FusedBatchNorm operation

Fixed an issue where the Roll operation did not match Tensorflow operation

Fixed an issue where ArchiveUtils could fail to create the top level destination directory when it does not exist

Fixed an issue where resize_bicubic operation did not match Tensorflow for some configuration values

Pad operation now supports long/int64 values for padding array

Fixed an issue where hashcode operation shape function wasn't always returning int64/long dtype

Fixed an issue with reshape operation on empty arrays with -1s

Improved performance for the concat operation on CUDA and CPU/GPU:

  • On CPU for the NHWC case

  • Generally

  • On CUDA for the 2D case

Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff

Fixed a small SameDiff execution issue for switch operation where the predicate is a constant

Fixed an issue with batchnorm operation when input arrays have unusual strides

Merged nd4j-buffer, nd4j-content modules into nd4j-api

Deleted deprecated nd4j-jackson module (remaining functionality available in nd4j-api)

Deleted unused/unmaintained nd4j-camel and nd4j-gson modules

Optimization for legacy random ops

Optimization for broadcast operations

Performance optimization for multiple operations: softmax, squeeze, expand_dims, tanh

Optimization for transpose/permute operations

Performance enhancement: MKLDNN matmul used for some mmul operation cases

Optimization for gather operation on CPU

Optimization for stack/unstack operations on CPU

Optimization for split operation (CPU and CUDA)

ND4J initialization no longer logs number of OpenMP BLAS threads for CUDA

Optimization: Fixed issues with auto-vectorization on multiple CPU operations

Optimization for reshape operation

Fixed an issue where INDArray.hashCode() could cause an exception on some datatypes

Optimization for CPU: MKLDNN is now used for softmax, tanh, softmax_bp and tanh_bp operations

Fixed random_exponential operation

Improved performance on C++ SameDiff graph execution via reduced array zeroing where safe to do so

Improved C++ indexing implementation impacting CPU performance on some operations

Fixed an issue where Split operation could have incorrect output shapes for empty arrays

Fixed some issues with SameDiff.equals method

Fixed an issue with reshape operation output shape on empty arrays

Nd4j.gemm now uses Mmul operation internally to avoid potential threading issues with direct BLAS calls on CUDA

Fixed an edge case issue with percentile operation

Fixed an edge case issue for cusolver (CUDA) in libnd4j

Fixed an issue with error formatting for segment operations for incorrect lengths

Fixed an issue where ND4J workspaces were not guaranteed to be unique

Fixed some operation implementations when operating on views (Batch/Space to Space/Batch/Depth; batchnorm_bp)

Fixed an issue where exponential distribution random number generation operation could produce infinities extremely rarely (~1 in 10^9 values)

Fixed an issue with long file paths for memory mapped workspaces on Windows

Memory for memory mapped workspaces are now deallocated immediately when workspace is destroyed, instead of waiting for GC to free memory

Fall-back to other BLAS implementation for cases where MKLDNN GEMM implementation is slow

Set nd4j-native source/target to Java 7

datavec-python: added zero-copy support for bytes/byte buffers

datavec-python: Python exceptions are now thrown as Java exceptions

datavec-python: Added support for additional NumPy datatypes

datavec-python: Python version upgraded from 3.7.6 to 3.7.7

Deleted not properly maintained modules: datavec-camel, datavec-perf

Fixed missing BOOL datatype support for arrow conversion functionality

Assorted fixes for datavec-python

Fixed an issue with LineRecordReader where initialization was performed unnecessarily (adding performance overhead)

Refactoring to decouple configuration and learning methods from their implementations

Added builder patterns for all configuration classes

Fixes an issue with GridSearchCandidateGenerator not working correctly for some cases

the underlying technology rather than the de facto interfaces. If there is interest in something higher level, please discuss it on

ARM support: We have included armcompute modules for core convolution routines. These routines can be found

Added more support for avx/mkldnn/cudnn linked acceleration in our c++ library. We now have the ability to distribute more combinations of precompiled math kernels via different combinations of classifiers. See the for more details.

This is useful for OSGI and application server environments.

We now have basic support for CTC loss in nd4j. This will enable the import of CTC loss based models for speech recognition as well as OCR.

Contributors:

Deeplearning4j: New SameDiff layers with training support

Deeplearning4j - new layers: Locally connected 1d, Locally connected 2d

Added new SameDiff layers (automatic differentiation - only single class, forward pass definition required) to DL4J with full training support - SameDiffLayer, SameDiffVertex, SameDiffOutputLayer, SameDiffLambdaLayer, SameDiffLambdaVertex - note that these are CPU-only execution for now

Resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property. Note that it is also possible to set a different base location for downloads (for local mirrors of DL4J resources)
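
A minimal sketch of the two configuration options mentioned above; the directory path is illustrative, and the DL4JResources import path is assumed from the deeplearning4j common module.

```java
import org.deeplearning4j.common.resources.DL4JResources;

import java.io.File;

public class ResourceDirectorySketch {
    public static void main(String[] args) {
        // Option 1: system property, set before any DL4J resource download (path is illustrative).
        System.setProperty("org.deeplearning4j.resources.directory", "/data/dl4j-resources");
        // Option 2: programmatic configuration.
        DL4JResources.setBaseDirectory(new File("/data/dl4j-resources"));
    }
}
```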

Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs. Same information is available (without a crash) for MultiLayerNetwork/ComputationGraph.memoryInfo methods. Can be disabled, or the output directory set.

Added Composite[Multi]DataSetPreProcessor to enable multiple [Multi]DataSetPreProcessors to be applied in a single iterator

Added ComputationGraph evaluate methods for multi-output networks: evaluate(DataSetIterator, Map<Integer,IEvaluation[]>) and evaluate(MultiDataSetIterator, Map<Integer,IEvaluation[]>)

Added JointMultiDataSetIterator - utility iterator used to create MultiDataSetIterator from multiple DataSetIterators

GraphVertices may now have trainable parameters directly (not just enclose layers with trainable parameters)

Added MultiLayerNetwork/ComputationGraph getLearningRate methods

Added RandomDataSetIterator and RandomMultiDataSetIterator (mainly for testing/debugging)

Added cyclical "1cycle" schedule for learning rate schedules etc

RDD repartitioning for Spark training is more configurable (adds Repartitioner interface)

Added ComputationGraph.getIterationCount() and .getEpochCount() for consistency with MultiLayerNetwork

Added locally connected 1d layer

Spark "data loader" API (mainly for Spark)

Spark evaluation: added evaluation method overloads that allow specifying the number of evaluation workers (less than number of Spark threads)

CnnSentenceDataSetIterator now has a Format argument, and supports outputting data for RNNs and 1D CNNs

Added ComputationGraph/MultiLayerNetwork.pretrain((Multi)DataSetIterator, int epochs) method overloads

MultiLayerNetwork and ComputationGraph now have output method overloads where the network output can be placed in the user-specified workspace, instead of being detached. This can be used to avoid creating INDArrays that need to be garbage collected before native memory can be freed.

EmbeddingSequenceLayer now supports [minibatch,1,seqLength] format sequence data in addition to [minibatch,seqLength] format data

CuDNN batch norm implementation will now be used for rank 2 input, not just rank 4 input

Environment variables and system properties for DL4J have been centralized into DL4JResources and DL4JEnvironmentVars classes, with proper descriptions

MultiLayerNetwork and ComputationGraph output/feedForward/fit methods are now thread-safe via synchronization. Note that concurrent use is not recommended due to performance (instead: use ParallelInference); however the now-synchronized methods should avoid obscure errors due to concurrent modifications

BarnesHutTSNE now throws a useful exception in the case where the distance metric is undefined (for example, all zeros plus cosine similarity)

ComputationGraph.addListeners was not working correctly if listeners were already present

TinyImageNetDataSetIterator did not validate/correctly use input shape configuration

BatchNormalization layer now correctly asserts that nOut is set if required (instead of unfriendly shape errors later)

Fixed issue where OutputLayer may not initialize parameter constraints correctly

Fixed performance issue with Nesterov updater using CPU-only op for CUDA execution

Removed TerminationCondition for DL4J optimizers - was not used in practice, and had minor overhead

Fixed issue where EvaluativeListener could hit a workspace validation exception when workspaces are enabled

Fixed issue where TrainingListener.onEpochStart/onEpochEnd were not being called correctly for ComputationGraph

Fixed workspace issue with TensorFlowCnnToFeedForwardPreProcessor

Performance optimization for BatchNormalization when using CuDNN

Performance optimization: Dropout will be applied in-place when safe to do so, avoiding a copy

Added CuDNN implementation of Dropout

Reduced memory use for CuDNN: CuDNN working memory is now shared and reused between layers within a network

CuDNN batch normalization implementation would fail with FP16 datatype

Fixed issue Bidirectional LSTM may incorrectly use workspaces causing an exception

Fixed issue with early stopping where scores to be maximized (accuracy, f1, etc) were not properly triggering termination conditions

Fixed issue where label mask counter could be incorrectly incremented in ComputationGraph.computeGradientAndScore()

ComputationGraph was not setting lastEtlTime field during training

Fixed issue with AutoEncoder layer when workspaces are enabled

Fixed issue with EmbeddingSequenceLayer use of mask arrays

Lombok is now provided scope everywhere, so it isn't on the user classpath when using DL4J

Fixed issue with WordVectorSerializer.readParagraphVectors(File) initialization of the label source

Spark training (gradient sharing) now properly handles empty partition edge case when encountered during training

Errors are propagated better/more consistently for Spark gradient sharing training

Fixed issue with 1D CNN layers with mask arrays and stride > 1 (masks not being correctly downsized)

DL4J Batch norm implementation was not correctly adding epsilon value during inference, only during training (CuDNN unaffected)

CuDNN subsampling layers with max pooling and ConvolutionMode.SAME may have taken padding value (0) as the maximum for border values when all non-padding values are less than 0

Spark training with gradient sharing now passes listeners to workers correctly

Fixed rare (and non-terminal) concurrent modification issue with UI and FileStatsStorage

CuDNN convolution layer now supports dilation > 2 (previously: used DL4J conv layer implementation as a fallback)

Yolo2OutputLayer now implements computeScoreForExamples()

SequenceRecordReaderDataSetIterator now handles the "no labels" case correctly

Fixed issue where BarnesHutTSNE could hit a workspace validation exception

EMNIST iterator could produce incorrect data in some cases after a reset

deeplearning4j-modelexport-solr: now uses Lucene/Solr version 7.4.0 (was 7.3.0)

Mask arrays for CNN2d layers must be in broadcastable 4d format: [minibatch,depth or 1, height or 1, width or 1] - previously they were 2d with shape [minibatch,height] or [minibatch,width]. This prevents ambiguity in later cases (pooling layers), and allows for more complex masking scenarios (such as masking for different image sizes in same minibatch).

Some older/deprecated Model and Layer methods have been removed. (validateInput(), initParams()). Some custom layers may need to be updated as a result

Windows users are unable to load the HDF5 files used in SvhnLabelProvider (used in HouseNumberDetection example). Linux/Mac users are unaffected. A workaround for windows users is to add the sonatype snapshot dependency org.bytedeco.javacpp-presets:hdf5-platform:jar:1.10.2-1.4.3-SNAPSHOT

Added the ability to write Numpy .npy format using Nd4j.writeAsNumpy(INDArray,File) and convert an INDArray to a numpy array in-memory using Nd4j.convertToNumpy(INDArray)
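
A minimal sketch of writing an array in .npy format using the method named above; the file name is illustrative.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;

public class NumpyExportSketch {
    public static void main(String[] args) throws Exception {
        INDArray arr = Nd4j.rand(3, 4);
        // Writes a .npy file that numpy.load can read; the file name is illustrative.
        Nd4j.writeAsNumpy(arr, new File("arr.npy"));
    }
}
```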

ND4j-common ClassPathResource: added ClassPathResource.copyDirectory(File)

Added Nd4j.randomBernoulli/Binomial/Exponential convenience methods

Added way to disable/suppress ND4J initialization logging via org.nd4j.log.initialization system property

SameDiff class - most op/constructor methods now have complete/useful javadoc

Workspaces can now be disabled globally, ignoring workspace configuration. This is mainly used for debugging; use Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED) or Nd4j.getWorkspaceManager().setDebugMode(DebugMode.SPILL_EVERYTHING) to enable this.
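
A minimal sketch of the global workspace debug switch quoted above; the DebugMode import path is assumed.

```java
import org.nd4j.linalg.api.memory.enums.DebugMode;
import org.nd4j.linalg.factory.Nd4j;

public class WorkspaceDebugSketch {
    public static void main(String[] args) {
        // Globally disable workspaces for debugging, as described above.
        Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED);
        // Alternatively: Nd4j.getWorkspaceManager().setDebugMode(DebugMode.SPILL_EVERYTHING);
    }
}
```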

Added EnvironmentalAction API for environment variable processing

ND4J environment variables and system properties have been centralized in ND4jEnvironmentVars and ND4jSystemProperties classes and

Fixed issue with INDArray.toDoubleArray() for true scalars (rank 0 arrays)

Fixed issue with DataSet.sample() not working for rank 3+ features

IActivation implementations now validate/enforce same shape for activations and gradients

Fixed issue with muliColumnVector where vector is 1d

ImagePreProcessingScaler now supports serialization via NormalizerSerializerStrategy and ModelSerializer

Performance optimization for threshold encoding used in DL4J's Spark gradient sharing distributed training implementation

SameDiff: Fixed issue where memory wasn't always released after execution

DataSet.save() and MultiDataSet.save() methods now save example metadata when present

Fixed issue with KFoldIterator when dataset does not divide equally into folds with no remainder

Fixed issue where version check functionality could fail to load resources if resources are on a path with spaces

Simplified DataSetIterator API: totalExamples(), cursor() and numExamples() - these were unsupported on most DataSetIterator implementations, and not used in practice for training. Custom iterators should remove these methods also

Long-deprecated DataSet.getFeatureMatrix() has been removed. Use DataSet.getFeatures() instead.
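
A minimal sketch of the replacement call; the random features and labels are purely illustrative.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

public class GetFeaturesSketch {
    public static void main(String[] args) {
        // Random 4x3 features and 4x2 labels, purely illustrative.
        DataSet ds = new DataSet(Nd4j.rand(4, 3), Nd4j.rand(4, 2));
        INDArray features = ds.getFeatures(); // replaces the removed getFeatureMatrix()
        System.out.println(java.util.Arrays.toString(features.shape()));
    }
}
```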

Not properly maintained complex number support classes (IComplexNumber, IComplexNDArray) have been removed entirely

Added AnalyzeLocal class to mirror functionality of AnalyzeSpark (but without Spark dependency)

Added JacksonLineSequenceRecordReader: RecordReader used for multi-example JSON/XML where each line in a file is an independent example

Added RecordConvert.toRecord(Schema, List<Object>)

Added missing FloatColumnCondition

Added CSVLineSequenceRecordReader for "each line in CSV is a sequence, and sequence is single-valued/univariate"

Added CSVMultiSequenceRecordReader for "multiple multi-valued sequences in a single CSV" data

Fixed issue with NativeImageLoader on Android

Fixed issue with ExcelRecordReader

Fixed issue where bad args for CSVRecordReader.next(int) could cause an unnecessarily large list to be generated

Added DataSource interface. Unlike old DataProvider, this does not require JSON serializability (only a no-arg constructor)

Added numerous enhancements and missing configuration options (constraints, dilation, etc)

stepCounter, epochCounter and historyProcessor can now be set

Random seed is now loaded when ACPolicy is loaded

Added OutputAdapter interface and MultiLayerNetwork/ComputationGraph.output method overloads using OutputAdapter (avoids allocating off-heap memory that needs to be cleaned up by GC)

Added ComputationGraph/MultiLayerNetwork rnnTimeStep overload with user-specified workspace.

Added Cnn3DLossLayer

ParallelInference: Instances can now update the model in real-time (without re-init)

ParallelInference: Added ParallelInference INPLACE mode

Added validation for incompatible loss/activation function combinations (such as softmax+nOut=1, or sigmoid+mcxent). New validation can be disabled using outputValidation(false)

Spark training: Added full fault tolerance (robust failure recovery) for gradient sharing implementation

Spark training now supports configuring ports more flexibly (and differently for different workers) using PortSupplier

Spark training: overhauled gradient sharing threshold adaption algorithms; made it possible to customize threshold settings, plus made defaults more robust to initial threshold configuration improving convergence speed in some cases.

Spark training: implemented chunked messaging to reduce memory requirements (and insufficient buffer length issues) for large messages

Spark training: Added MeshBuildMode configuration for improved scalability for large clusters

Spark network data pipelines: added FileBatch, FileBatchRecordReader etc for "small files" (images etc) distributed training use cases

Added FailureTestingListener for fault tolerance/debugging purposes

Upgraded Apache Lucene/Solr to version 7.5.0 (from 7.4.0)

Added system properties (org.deeplearning4j.tempdir and org.nd4j.tempdir) to allow overriding of the temporary directories ND4J and DL4J use
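
A minimal sketch of overriding the temporary directories via the system properties named above; the paths are illustrative and should be set before ND4J/DL4J classes are first used.

```java
public class TempDirSketch {
    public static void main(String[] args) {
        // Illustrative paths; set these before ND4J/DL4J classes are first used.
        System.setProperty("org.nd4j.tempdir", "/mnt/scratch/nd4j-tmp");
        System.setProperty("org.deeplearning4j.tempdir", "/mnt/scratch/dl4j-tmp");
    }
}
```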

Made MultiLayerNetwork/ComputationGraph.clearLayerStates methods public (were protected)

AbstractLayer.layerConf() method is now public

ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper

Improved validation and error messages for invalid inputs/labels in Yolo2OutputLayer

Spark training: added SharedTrainingMaster.Builder.workerTogglePeriodicGC and .workerPeriodicGCFrequency to easily configure the ND4J garbage collection configuration on workers. Set default GC to 5 seconds on workers

Spark training: added threshold encoding debug mode (logs current threshold and encoding statistics on each worker during training). Enable using SharedTrainingConfiguration.builder.encodingDebugMode(true). Note this operation has computational overhead.

Fixed an issue where L1/L2 and updaters (Adam, Nesterov, etc) were applied before dividing gradients by minibatch to obtain average gradient. To maintain old behaviour, use NeuralNetConfiguration.Builder.legacyBatchScaledL2(true).
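
A minimal sketch of opting back into the old behaviour via the builder flag named above; the rest of the network configuration is omitted.

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

public class LegacyBatchScaledL2Sketch {
    public static void main(String[] args) {
        // Opt back in to the pre-1.0.0-beta3 gradient scaling behaviour described above;
        // the rest of the network configuration is omitted.
        NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
                .legacyBatchScaledL2(true);
        System.out.println(builder.getClass().getSimpleName() + " configured");
    }
}
```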

Fixed an issue where EarlyStoppingScoreCalculator would not correctly handle "maximize score" cases instead of minimizing

Fixed order (BGR vs. RGB) for VGG16ImagePreProcessor channel offset values

Fixed bug with variational autoencoders using weight noise

Fixed issue with BaseDataSetIterator not respecting the 'maximum examples' configuration

Optimization: A workspace is now used for ComputationGraph/MultiLayerNetwork evaluation methods (avoids allocating off-heap memory during evaluation that must be cleaned up by garbage collector)

Fixed an issue where shuffling combined with a subset for MnistDataSetIterator would not maintain the same subset between resets

Fixed issue with StackVertex.getOutputType

Fix issue with CNN to/from RNN preprocessors handling of mask arrays

Fixed issue with VGG16 non-pretrained configuration in model zoo

Fixed issue with TransferLearning nOutReplace where multiple layers in a row are modified

Fixed issue with CuDNN workspaces where backpropagation is performed outside of a standard fit call

Fixed an issue with dropout masks being cleared prematurely on output layers in ComputationGraph

RecordReaderMultiDataSetIterator now supports 5D arrays (for 3D CNNs)

Fixed bug in multi input/output ComputationGraphs with TBPTT combined with both masking and different number of input/output arrays

Improved input validation/exceptions for batch normalization layer

Fixed bug with TransferLearning GraphBuilder nOutReplace when combined with subsampling layers

SimpleRnnParamInitializer now properly respects bias initialization configuration

Fixed SqueezeNet zoo model non-pretrained configuration

Fixed Xception zoo model non-pretrained configuration

Fixed an issue with some evaluation signatures for multi-output ComputationGraphs

Improved MultiLayerNetwork/ComputationGraph summary method formatting for large nets

Fixed an issue where gradient normalization could result in NaNs if gradient is exactly 0.0 for all parameters in a layer

Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate could throw an exception for SGD and NoOp updaters

Fixed an issue with StackVertex plus masking in some rare cases

Fixed an issue with JSON deserialization of frozen layers in pre-1.0.0-alpha format

Fixed an issue where GraphBuilder.removeVertex can fail under some limited circumstances

Fixed a bug in CacheableExtractableDataSetFetcher

DL4J Spark training: Fixed issues with thread/device affinity for multi-GPU training + evaluation

DL4J Spark training: Made all Aeron threads daemon threads to prevent Aeron from stopping JVM shutdown when all other threads have completed

Added cudnnAllowFallback configuration for BatchNormalization layer (fallback to built-in implementation if CuDNN fails unexpectedly)

Fixed some rare concurrency issues with multi-worker (multi-GPU) nodes for Spark training

Fixed an issue with BatchNormalization layers that prevented the mean/variance estimates from being synced properly on each worker for GradientSharing training, causing convergence issues

Added a check to detect ZipSlip CVE attempts in ArchiveUtils

DL4J Spark training and evaluation: methods now use Hadoop Configuration from Spark context to ensure runtime-set configuration is available in Spark functions reading directly from remote storage (HDFS etc)

MultiLayerNetwork and ComputationGraph now properly support more than Integer.MAX_VALUE parameters

Added data validation for Nd4j.readTxt - now throws exception on invalid input instead of returning incorrect values

Fixed an issue with KNN implementation where a deadlock could occur if an invalid distance function (one returning "distances" less than 0) was utilized

Added synchronization to loading of Keras import models to avoid thread safety issues in the underlying HDF5 library used for loading

Fixed rare issue for Async(Multi)DataSetIterator with large prefetch values

MultiLayerConfiguration/ComputationGraphConfiguration pretrain(boolean) and backprop(boolean) have been deprecated and are no longer used. Use fit and pretrain/pretrainLayer methods instead.
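
As a rough guide, the replacement workflow looks like the following sketch, where conf and trainIter are placeholders for an existing MultiLayerConfiguration and DataSetIterator:

MultiLayerNetwork net = new MultiLayerNetwork(conf);   // conf no longer needs the .pretrain(...)/.backprop(...) flags
net.init();
net.pretrainLayer(0, trainIter);   // optional unsupervised layer-wise pretraining, where the layer supports it
net.fit(trainIter, 3);             // supervised (backprop) training for 3 epochs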

ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper which should be used instead

deeplearning4j-nlp-korean module now has Scala version suffix due to scala dependencies; new artifact ID is deeplearning4j-nlp-korean_2.10 and deeplearning4j-nlp-korean_2.11

Running multiple Spark training jobs simultaneously on one physical node (i.e., multiple JVMs from one or more Spark jobs) may cause problems with network communication. A workaround is to manually set a unique stream ID in the VoidConfiguration, using a unique (or random) integer value for different jobs

Fixed import issue due to Keras JSON format changes for Keras 2.2.3+

Added Keras import for timeseries preprocessing

Elephas

Fixed issue with importing models with reshaping after an embedding layer

Added support for Keras masking layers

Fixed JSON deserialization issue with some layers/preprocessors, such as Permute

Fixed issue with Keras import of Nadam configuration

Added SameDiff training and evaluation: SameDiff instances can now be trained directly using DataSetIterator and MultiDataSetIterator, and evaluated using IEvaluation instances (that have been moved from ND4J to DL4J)
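
A minimal sketch of the new training and evaluation flow is shown below. Here sd, trainIter and testIter are placeholders, the placeholder names "input"/"label" and the output variable name "out" are assumptions, and exact builder/method signatures may differ slightly between releases:

sd.setTrainingConfig(new TrainingConfig.Builder()
        .updater(new Adam(1e-3))                 // any IUpdater
        .dataSetFeatureMapping("input")          // DataSet features feed the "input" placeholder
        .dataSetLabelMapping("label")            // DataSet labels feed the "label" placeholder
        .build());
sd.fit(trainIter, 2);                            // train directly from a DataSetIterator for 2 epochs
Evaluation eval = new Evaluation();
sd.evaluate(testIter, "out", eval);              // IEvaluation-based evaluation of the "out" variable
System.out.println(eval.stats());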

Added GraphServer implementation: c++ inference server for SameDiff (and Tensorflow, via TF import) with Java API

SameDiff instances can now be loaded from serialized FlatBuffers format (SameDiff.asFlatFile plus fromFlatFile)

Added MKL-DNN support for some operations (Conv2d, etc)

Upgraded ND4J (and DataVec) to Arrow 0.11.0, which also fixes a related issue

Added Nd4j.where op method (same semantics as numpy.where)

Added Nd4j.stack op method (combine arrays + increase array rank by 1)
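
For example (a small sketch; values are arbitrary and the axis-first argument order is assumed):

INDArray a = Nd4j.create(new double[]{1, 2, 3});
INDArray b = Nd4j.create(new double[]{4, 5, 6});
INDArray stacked = Nd4j.stack(0, a, b);   // stacks a and b along a new leading dimension, so rank increases by 1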

Matrix band part

Scatter ND, ND-add, ND-sub and ND-update ops

Sparse softmax cross entropy loss with logits

Histogram fixed width op

broadcast_to op

deconv3d op added

Unsorted segment ops added

Segment_X backprop ops added

batchnorm_new op added that supports multiple axes for mean/variance

GRU cell backprop added

Nd4j Preconditions class now has methods for formatting INDArray arguments

SameDiff loss functions: cleanup plus forward pass implementation

CudaGridExecutioner now warns that exception stack traces may be delayed, to avoid confusion when debugging exceptions occurring during asynchronous execution of ops

JavaCPP and JavaCPP-presets have been upgraded to version 1.4.3

Improved Javadoc on SDVariable class

Fixes for android: Remove use of RawIndexer

Libnd4j custom ops: conv op weight layouts are now not dependent on the input format (NCHW/NHWC) - now always [kH, kW, inChannels, outChannels] for 2d CNNs, [kH, kW, kD, inChannels, outChannels] for 3d CNNs.

Dot operation backprop, determinant

Backprop op fix for the broadcast case for some pairwise transform custom op implementations

Fix for reverse custom op with rank 1 inputs

ATan2 op is now broadcastable

Boolean custom op broadcast fixes/additions

Scatter op edge case fixes

ArgMin shape function fix, negative axis fix

Unique op fix

Pad op fix

Fixed where op shape function

SVD rank 1 edge case fix

Range op

Split and space_to_batch fixes

Broadcast dynamic shape

embedding_lookup op now supports multiple input arrays

Matrix determinant op edge case (rank 0 result) shape fix

SameDiff TensorFlow import: fixes for multiple operations

SameDiff: Improved error handling for multiple outputs case

Fixed issue where INDArray.permute would not correctly throw an exception for invalid length case

Fixed issues with INDArray.get/put with SpecifiedIndex

Minor change to DataSet.merge - signature now accepts any DataSet subtypes

Fixed issue where the INDArray.transposei operation was not applied in-place

Fixed issues with INDArray.mmul with MMulTranspose

Added additional order validation for ND4J creation methods (create, rand, etc)

Fix for ND4J binary deserialization (BinarySerde) when deserializing from heap byte buffers

Fixed issue with Nd4j-common ClassPathResource path resolution in some IDEs

Fixed issue where INDArray.get(interval) on rank 1 array would return rank 2 array

Fixed a validation issue with Nd4j.gemm/mmuli on views

INDArray.assign(INDArray) no longer allows assigning different shape arrays (other than scalar/vector cases)

NDarrayStrings (and INDArray.toString()) now always uses US locale when formatting numbers

Fixed an issue with GaussianDistribution specific to V100 GPUs

Fixed an issue with bitmap compression/encoding specific to V100 GPUs

Transforms.softmax now throws an error on unsupported shapes instead of simply not applying operation

VersionCheck functionality: handle case where SimpleFileVisitor is not available on earlier versions of Android

SameDiff convolution layer configuration (Conv2dConfig/Conv3dConfig/Pooling3dConfig etc) have had parameter names aligned

nd4j-base64 module contents have been deprecated; use the equivalent classes in nd4j-api from now on

Some classes in the nd4j-jackson module have been deprecated; use the equivalent classes in nd4j-api from now on

Added NativeImageLoader method overloads for org.opencv.core.Mat and String as filename

Fix for JDBCRecordReader handling of null values

Improved errors/validation for ObjectDetectionRecordReader for invalid input (where image object centers are outside of image bounds)

Fixed issue where FileSplit used methods that are unavailable on earlier versions of Android

Added SerializableHadoopConfiguration and BroadcastHadoopConfigHolder for cases where a Hadoop configuration is required in Spark functions

Fixed issue with JDBCRecordReader's handling of real-valued column result types

Added validation and useful exception for CSVRecordReader/LineRecordReader being used without initialization

Fixed some issues with dropout layers

Added conversion between org.nd4j.linalg.primitives.Pair/Triple and Scala Tuple

The class loader is now overridable
Added Adabelief updater
Added maximum merge for Keras import
Keras cropping 2d validation fixes
Lenet input shape fix
Fix for obtaining the UI port from a property
CTC Loss
tensormmul_bp is now run from C++
ARM Compute support added for conv2d and pooling operations
Add IndexUtils containing ravelMultiIndex and unravelIndex methods
Updated sortCooIndicesGeneric to take any data type
Add TVM runner
compare_and_bitpack now functions properly
Fix null pointer in cuda op executioner
Fix for samediff array cache removal during training
Fix for SD_FORBID_HELPERS environment variable
Fixed CUDA bug in summary stats (mean, variance)

0.6.0

  • Custom layer support

  • Support for custom loss functions

  • Support for compressed INDArrays, for memory saving on huge data

  • Native support for BooleanIndexing where applicable

  • Initial support for combined operations on CUDA

  • Significant performance improvements on CPU & CUDA backends

  • Better support for Spark environments using CUDA & cuDNN with multi-gpu clusters

  • New UI tools: FlowIterationListener and ConvolutionIterationListener, for better insight into processes within the NN.

  • Special IterationListener implementation for performance tracking: PerformanceListener

  • Inference implementation added for ParagraphVectors, together with option to use existing Word2Vec model

  • Significantly reduced file size of the deeplearning4j API

  • nd4j-cuda-8.0 backend is available now for cuda 8 RC

  • Added multiple new built-in loss functions

  • Custom preprocessor support

  • Performance improvements to Spark training implementation

  • Improved network configuration validation using InputType functionality

0.7.1

  • RBM and AutoEncoder key fixes:

    • Ensured visible bias is updated and applied during pretraining.

    • RBM HiddenUnit is the activation function for this layer; derivative calculations for backprop are now established according to the respective HiddenUnit.

  • RNG performance issues fixed for CUDA backend

  • OpenBLAS issues fixed for macOS, powerpc, linux.

  • DataVec is back to Java 7 now.

  • Multiple minor bugs fixed for ND4J/DL4J

0.9.1

Deeplearning4J

  • Fixed issue with incorrect version dependencies in 0.9.0

  • Numerical stability improvements to LossMCXENT / LossNegativeLogLikelihood with softmax (should reduce NaNs with very large activations)

ND4J

Known Issues

  • Deeplearning4j: Use of Evaluation class no-arg constructor (i.e., new Evaluation()) can result in accuracy/stats being reported as 0.0. Other Evaluation class constructors, and ComputationGraph/MultiLayerNetwork.evaluate(DataSetIterator) methods work as expected.

    • This also impacts Spark (distributed) evaluation: workaround is to replace sparkNet.evaluate(testData); with sparkNet.doEvaluation(testData, 64, new Evaluation(10))[0];, where 10 is the number of classes and 64 is the evaluation minibatch size to use (see the sketch after this list).

  • SequenceRecordReaderDataSetIterator applies preprocessors (such as normalization) twice to each DataSet (possible workaround: use RecordReaderMultiDataSetIterator + MultiDataSetWrapperIterator)

  • TransferLearning: ComputationGraph may incorrectly apply l1/l2 regularization (defined in FinetuneConfiguration) to frozen layers. Workaround: set 0.0 l1/l2 on FineTuneConfiguration, and required l1/l2 on new/non-frozen layers directly. Note that MultiLayerNetwork with TransferLearning appears to be unaffected.
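
A minimal sketch of the evaluation workaround above, assuming a SparkDl4jMultiLayer named sparkNet, a JavaRDD<DataSet> named testData, 10 classes and an evaluation minibatch size of 64:

Evaluation eval = sparkNet.doEvaluation(testData, 64, new Evaluation(10))[0];   // explicit class count avoids the no-arg constructor issue
System.out.println(eval.stats());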

1.0.0-beta

Highlights - 1.0.0-beta Release

  • Performance and memory optimizations for DL4J

Deeplearning4J

Deeplearning4J: New Features

  • New or enhanced layers:

Deeplearning4J: Bug Fixes and Optimizations

    • Fixes issues with custom and some Keras import layers on Android

  • Added new model zoo models:

    • (to do)

Deeplearning4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

  • WorkspaceMode.SINGLE and SEPARATE have been deprecated; use WorkspaceMode.ENABLED instead

  • Internal layer API changes: custom layers will need to be updated to the new Layer API - see built-in layers or custom layer example

  • Custom layers etc in pre-1.0.0-beta JSON (ModelSerializer) format need to be registered before they can be deserialized due to JSON format change. Built-in layers and models saved in 1.0.0-beta or later do not require this. Use NeuralNetConfiguration.registerLegacyCustomClassesForJSON(Class) for this purpose

  • ExistingDataSetIterator has been deprecated; use fit(DataSetIterator, int numEpochs) method instead
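
The items above can be combined roughly as in the following sketch; MyCustomLayer, trainIter and the layer sizes are placeholders, and the registration call is only needed when loading pre-1.0.0-beta JSON that contains custom layers:

NeuralNetConfiguration.registerLegacyCustomClassesForJSON(MyCustomLayer.class);   // placeholder custom layer class
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .trainingWorkspaceMode(WorkspaceMode.ENABLED)     // replaces the deprecated SINGLE/SEPARATE modes
        .inferenceWorkspaceMode(WorkspaceMode.ENABLED)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(128).build())
        .layer(1, new OutputLayer.Builder().nIn(128).nOut(10).build())
        .build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.fit(trainIter, 5);   // multi-epoch fit overload, in place of the deprecated iterator wrappers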

Deeplearning4J: 1.0.0-beta Known Issues

  • ComputationGraph TrainingListener onEpochStart and onEpochEnd methods are not being called correctly

  • DL4J Zoo Model FaceNetNN4Small2 model configuration is incorrect, causing issues during forward pass

  • Early stopping score calculators with values that should be maximized (accuracy, f1 etc) are not working properly (values are minimized not maximized). Workaround: override ScoreCalculator.calculateScore(...) and return 1.0 - super.calculateScore(...).

Deeplearning4J: Keras Import

Deeplearning4J: Keras Import - API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

ND4J

ND4J: New Features

ND4J: Known Issues

  • Not all op gradients implemented for automatic differentiation

  • Vast majority of new operations added in 1.0.0-beta do NOT use GPU yet.

ND4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

DataVec

DataVec: New Features

DataVec: Optimizations and Bug Fixes

DataVec: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

Arbiter

Arbiter: New Features

  • Added LayerSpace for OCNN (one-class neural network)

Arbiter: Fixes

0.8.0

0.8.0 -> 0.9.0 Transition Notes

Deeplearning4j

  • Updater configuration methods such as .momentum(double) and .epsilon(double) have been deprecated. Instead: use .updater(new Nesterovs(0.9)) and .updater(Adam.builder().beta1(0.9).beta2(0.999).build()) etc to configure

DataVec

  • CsvRecordReader constructors: now uses characters for delimiters, instead of Strings (i.e., ',' instead of ",")
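
A small sketch of the change (note the class name is CSVRecordReader; the skip-lines value is arbitrary):

CSVRecordReader rr = new CSVRecordReader(0, ',');   // delimiter is now the char ',' rather than the String ","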

Arbiter

  • Arbiter UI is now a separate module, with Scala version suffixes: arbiter-ui_2.10 and arbiter-ui_2.11

Version 0.8.0

  • Spark 2.0 support (DL4J and DataVec; see transition notes below)

  • New layers

  • New ComputationGraph vertices

    • L2 distance vertex

    • L2 normalization vertex

  • Per-output masking is now supported for most loss functions (for per output masking, use a mask array equal in size/shape to the labels array; previous masking functionality was per-example for RNNs)

  • L1 and L2 regularization can now be configured for biases (via l1Bias and l2Bias configuration options)

  • Evaluation improvements:

    • For both MultiLayerNetwork and SparkDl4jMultiLayer: added evaluateRegression, evaluateROC, evaluateROCMultiClass convenience methods

    • TSNE re-added to new UI

    • Training UI: now usable without an internet connection (no longer relies on externally hosted fonts)

    • UI: improvements to error handling for ‘no data’ condition

  • Epsilon configuration now used for Adam and RMSProp updaters

  • Fix for bidirectional LSTMs + variable-length time series (using masking)

  • Spark + Kryo: now test serialization + throw exception if misconfigured (instead of logging an error that can be missed)

  • MultiLayerNetwork now adds default layer names if no name is specified

  • DataVec:

    • JSON/YAML support for DataAnalysis, custom Transforms etc

    • ImageRecordReader refactored to reduce garbage collection load (hence improve performance with large training sets)

    • Faster quality analysis.

  • Arbiter: added new layer types to match DL4J

    • Performance improvement for Word2Vec/ParagraphVectors tokenization & training.

  • Batched inference introduced for ParagraphVectors

  • Nd4j improvements

    • New native operations available for ND4j: firstIndex, lastIndex, remainder, fmod, or, and, xor.

    • OpProfiler NAN_PANIC & INF_PANIC now also checks result of BLAS calls.

    • Nd4j.getMemoryManager() now provides methods to tweak GC behavior.

  • Alpha version of parameter server for Word2Vec/ParagraphVectors was introduced for Spark. Please note: it’s not recommended for production use yet.

  • Performance improvements for CNN inference

0.7.2 -> 0.8.0 Transition Notes

  • Spark versioning schemes: with the addition of Spark 2 support, the versions for the Deeplearning4j and DataVec Spark modules have changed

    • For Spark 1: use <version>0.8.0_spark_1</version>

    • For Spark 2: use <version>0.8.0_spark_2</version>

    • Also note: Modules with Spark 2 support are released with Scala 2.11 support only. Spark 1 modules are released with both Scala 2.10 and 2.11 support

0.8.0 Known Issues (At Launch)

  • Keras 1D convolutional and pooling layers cannot be imported yet. Will be supported in forthcoming release.

  • Keras v2 model configurations cannot be imported yet. Will be supported in forthcoming release.

1.0.0-beta5

Highlights - 1.0.0-beta5 Release

  • Added model server - remote inference of SameDiff and DL4J models using JSON or (optionally) binary serialization

  • Added Scala 2.12 support, dropped Scala 2.10 support. Modules with Scala dependencies are now released with Scala 2.11 and 2.12 versions

  • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: Spark version suffix dropped: For upgrading: 1.0.0-beta4_spark2 -> 1.0.0-beta5

  • Added FastText support to deeplearning4j-nlp

  • CUDA support for all ND4J/SameDiff Operations

    • In 1.0.0-beta4, some operations were CPU only. Now, all operations have full CUDA support

  • Added support for new data types in ND4J (and DL4J/SameDiff): BFLOAT16, UINT16, UINT32, UINT64

  • ND4J: Implicit broadcasting support added to INDArray (already present in SameDiff - for example shape [3,1]+[3,2]=[3,2])

  • CUDA 9.2, 10.0 and 10.1-Update2 still supported

    • NOTE: For CUDA 10.1, CUDA 10.1 update 2 is recommended. CUDA 10.1 and 10.1 Update 1 will still run, but rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems

  • Dependency upgrades: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, and shaded to avoid dependency clashes)

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer

Deeplearning4J

Deeplearning4J: Features and Enhancements

Deeplearning4J: Bug Fixes and Optimizations

Deeplearning4j: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

  • DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J, use org.nd4j.linalg.dataset.Async(Multi)DataSetIterator instead

  • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: Spark version suffix dropped: For upgrading, change versions as follows: 1.0.0-beta4_spark2 -> 1.0.0-beta5

  • Scala 2.10 dropped, Scala 2.12 added (for modules with Scala dependencies)

Deeplearning4j: 1.0.0-beta5 Known Issues

  • Some layers (such as LSTM) may run slower on 1.0.0-beta5 than 1.0.0-beta4 on CUDA when not using cuDNN, due to added synchronization. This synchronization will be removed in the next release after 1.0.0-beta5

  • CUDA 10.1: Rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems, when running CUDA 10.1 Update 1 (and maybe 10.1). CUDA 10.1 update 2 is recommended.

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer

ND4J/SameDiff: Bug Fixes and Optimizations

ND4J: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

  • OldAddOp, OldSubOp, etc removed: Replace with AddOp, SubOp, etc

  • Nd4j.trueScalar and trueVector removed; use Nd4j.scalar and Nd4j.createFromArray methods

  • INDArray.javaTensorAlongDimension removed; use INDArray.tensorAlongDimension instead

  • INDArray.lengthLong() removed; use INDArray.length() instead
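
A rough migration sketch for the removed methods listed above:

INDArray s = Nd4j.scalar(3.0f);                        // replaces Nd4j.trueScalar(3.0f)
INDArray v = Nd4j.createFromArray(1.0f, 2.0f, 3.0f);   // replaces Nd4j.trueVector(...)
long length = v.length();                              // replaces the removed lengthLong()
INDArray tad = v.tensorAlongDimension(0, 0);           // replaces javaTensorAlongDimension(...)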

ND4J: 1.0.0-beta5 Known Issues

DataVec

DataVec: Features and Enhancements

DataVec: Bug Fixes and Optimizations

RL4J

RL4J: Features and Enhancements

RL4J: Bug Fixes and Optimizations

Arbiter

Bug Fixes and Optimizations

Arbiter: Known Issues

ND4S

ND4S Features and Enhancements

0.7.2

  • Activation function refactor

    • New activation functions added: hard sigmoid, randomized leaky rectified linear units (RReLU)

  • Multiple fixes/improvements for Keras model import

  • Added P-norm pooling for CNNs (option as part of SubsamplingLayer configuration)

  • Iteration count persistence: stored/persisted properly in model configuration + fixes to learning rate schedules for Spark network training

  • LSTM: gate activation function can now be configured (previously: hard-coded to sigmoid)

  • UI:

    • Added Chinese translation

    • Fixes for UI + pretrain layers

    • Improvements in front-end for handling NaNs

    • Added UIServer.stop() method

    • Fixed score vs. iteration moving average line (with subsampling)

  • Solved Jaxb/Jackson issue with Spring Boot based applications

  • RecordReaderDataSetIterator now supports NDArrayWritable for the labels (set regression == true; used for multi-label classification + images, etc)

0.7.1 -> 0.7.2 Transition Notes

  • Activation functions (built-in): now specified using Activation enumeration, not String (String-based configuration has been deprecated)

0.4.0

  • Initial multi-GPU support viable for standalone and Spark.

  • Refactored the Spark API significantly

  • Added CuDNN wrapper

  • Performance improvements for ND4J

  • New DataSetIterators for feeding neural nets with existing data: ExistingDataSetIterator, Floats(Double)DataSetIterator, IteratorDataSetIterator

  • New learning algorithms for word2vec and paravec: CBOW and PV-DM respectively

  • New native ops for better performance: DropOut, DropOutInverted, CompareAndSet, ReplaceNaNs

  • Shadow asynchronous datasets prefetch enabled by default for both MultiLayerNetwork and ComputationGraph

  • Better memory handling with JVM GC and CUDA backend, resulting in significantly lower memory footprint

Beginners

Road map for beginners new to deep learning.

How Do I Start Using Deep Learning?

Where you start depends on what you already know.

The prerequisites for really understanding deep learning are linear algebra, calculus and statistics, as well as programming and some machine learning. The prerequisites for applying it are just learning how to deploy a model.

In the case of Deeplearning4j, you should know Java well and be comfortable with tools like the IntelliJ IDE and the automated build tool Maven.

Below you'll find a list of resources. The sections are roughly organized in the order they will be useful.

Free Machine- and Deep-learning Courses Online

Math

The math involved with deep learning is basically linear algebra, calculus and probability, and if you have studied those at the undergraduate level, you will be able to understand most of the ideas and notation in deep-learning papers. If you haven't studied those in college, never fear. There are many free resources available (and some on this website).

Programming

If you do not know how to program yet, you can start with Java, but you might find other languages easier. Python and Ruby resources can convey the basic ideas in a faster feedback loop. "Learn Python the Hard Way" and "Learn to Program (Ruby)" are two great places to start.

Python

Java

Once you have programming basics down, tackle Java, the world's most widely used programming language. Most large organizations in the world operate on huge Java code bases. (There will always be Java jobs.) The big data stack -- Hadoop, Spark, Kafka, Lucene, Solr, Cassandra, Flink -- has largely been written for Java's compute environment, the JVM.

Deeplearning4j

Other Resources

0.7.0

  • Weighted loss functions: Loss functions now support a per-output weight array (row vector)

  • Improved error messages on invalid configuration or data; improved validation on both

  • Removed Jackson as core dependency (shaded); users can now use any version of Jackson without issue

  • Added LossLayer: version of OutputLayer that only applies loss function (unlike OutputLayer: it has no weights/biases)

  • Functionality required to build triplet embedding model (L2 vertex, LossLayer, Stack/Unstack vertices etc)

  • Reduced DL4J and ND4J ‘cold start’ initialization/start-up time

  • Pretrain default changed to false and backprop default changed to true. These no longer need to be set when configuring a network unless non-default values are required.

  • Numerous bug fixes across DL4J and ND4J

  • Performance improvements for nd4j-native & nd4j-cuda backends

  • Standalone Word2Vec/ParagraphVectors overhaul:

    • Performance improvements

    • ParaVec inference available for both PV-DM & PV-DBOW

    • Parallel tokenization support was added, to address computation-heavy tokenizers.

  • Native RNG introduced for better reproducibility within multi-threaded execution environments.

  • Additional RNG calls added: Nd4j.choice(), and BernoulliDistribution op.

  • Off-GPU storage introduced to keep large objects, such as Word2Vec models, in host memory. Available via WordVectorSerializer.loadStaticModel() (see the sketch after this list)

  • Two new options for performance tuning on nd4j-native backend: setTADThreshold(int) & setElementThreshold(int)
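
A rough sketch of loading a Word2Vec model into host (off-GPU) memory; the file path is a placeholder:

WordVectors vectors = WordVectorSerializer.loadStaticModel(new File("/path/to/word2vec-model.bin"));
double[] day = vectors.getWordVector("day");   // vectors are looked up from host memory rather than keeping the full model on-device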

0.6.0 -> 0.7.0 Transition Notes

Notable changes for upgrading codebases based on 0.6.0 to 0.7.0:

  • UI: new UI package name is deeplearning4j-ui_2.10 or deeplearning4j-ui_2.11 (previously: deeplearning4j-ui). Scala version suffix is necessary due to Play framework (written in Scala) being used now.

  • DataVec ImageRecordReader: labels are now sorted alphabetically by default before assigning an integer class index to each - previously (0.6.0 and earlier) they were according to file iteration order. Use .setLabels(List) to manually specify the order if required.

  • CNNs: configuration validation is now less strict. With new ConvolutionMode option, 0.6.0 was equivalent to ‘Strict’ mode, but new default is ‘Truncate’

  • Xavier weight initialization change for CNNs and LSTMs: Xavier now aligns better with original Glorot paper and other libraries. Xavier weight init. equivalent to 0.6.0 is available as XAVIER_LEGACY

  • DataVec: Custom RecordReader and SequenceRecordReader classes require additional methods, for the new metadata functionality. Refer to existing record reader implementations for how to implement these methods.

  • Word2Vec/ParagraphVectors:

    • Few new builder methods:

      • allowParallelTokenization(boolean)

      • useHierarchicSoftmax(boolean)

    • Behaviour change: batchSize: now batch size is ALSO used as threshold to execute number of computational batches for sg/cbow

Added EmnistDataSetIterator

Added runtime version checking for ND4J, DL4J, RL4J, Arbiter, DataVec

Added Cropping1D layer

Added Convolution3D, Cropping3D, UpSampling3D, ZeroPadding3D, Subsampling3D layers (all with Keras import support):

Added EmbeddingSequenceLayer (EmbeddingLayer for time series)

Added OCNNOutputLayer (one-class neural network)

Added FrozenLayerWithBackprop layer

Added DepthwiseConvolution2D layer

Added ComputationGraph.output(DataSetIterator) method

Added MultiLayerNetwork/ComputationGraph.layerInputSize methods

Added SparkComputationGraph.feedForwardWithKey overload with feature mask support

Added MultiLayerNetwork.calculateGradients method (for easily getting parameter and input gradients, for example for some model interpretability approaches)

Added support to get input/activation types for each layer from configuration: ComputationGraphConfiguration.getLayerActivationTypes(InputType...), ComputationGraphConfiguration.GraphBuilder.getLayerActivationTypes(), NeuralNetConfiguration.ListBuilder.getLayerActivationTypes(), MultiLayerConfiguration.getLayerActivationTypes(InputType) methods

Evaluation.stats() now prints confusion matrix in easier to read matrix format, rather than list format

Added ModelSerializer.addObjectToFile, .getObjectFromFile and .listObjectsInFile for storing arbitrary Java objects in same file as saved network
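
A rough sketch of these methods; the file name, key and the net/normalizer objects are placeholders, and the exact parameter order is an assumption:

File modelFile = new File("myNetwork.zip");
ModelSerializer.writeModel(net, modelFile, true);                             // save the network (with updater) first
ModelSerializer.addObjectToFile(modelFile, "dataNormalizer", normalizer);     // attach an arbitrary (serializable) object under a key
Object restored = ModelSerializer.getObjectFromFile(modelFile, "dataNormalizer");
List<String> keys = ModelSerializer.listObjectsInFile(modelFile);             // list all attached object keys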

Added SpatialDropout support (with Keras import support)

Added MultiLayerNetwork/ComputationGraph.fit((Multi)DataSetIterator, int numEpochs) overloads

Added performance (hardware) listeners: SystemInfoPrintListener and SystemInfoFilePrintListener

Performance and memory optimizations via optimizations of internal use of workspaces

Reflections library has been entirely removed from DL4J and is no longer required for custom layer serialization/deserialization

RecordReaderMultiDataSetIterator will no longer try to convert unused columns to numerical values

Fixes for Android compilation (removed duplicate classes, aligned versions, removed some dependencies)

Fix for RecordReaderMultiDataSetIterator where output could be incorrect for some constructors

Non-frozen layers before a frozen layer will no longer be skipped during backprop (useful for GANs and similar architectures)

Fixed issue where ComputationGraph topological sort may not be consistent on all platforms; could sometimes break ComputationGraphs (with multiple valid topological orderings) trained on PC and deployed on Android

Fixed issue with CuDNN batch norm using 1-decay instead of decay

deeplearning4j-cuda no longer throws exceptions if present on classpath with nd4j-native backend set to higher priority

Added RNG control for CifarDataSetIterator

WordVectorSerializer now deletes temp files immediately once done

IterationListener has been deprecated in favor of TrainingListener. For existing custom listeners, switch from implements TrainingListener to extends BaseTrainingListener
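
A rough sketch of a custom listener written this way; the three-argument iterationDone signature is taken from the TrainingListener interface and may differ slightly between versions:

public class ScoreLoggingListener extends BaseTrainingListener {
    @Override
    public void iterationDone(Model model, int iteration, int epoch) {
        System.out.println("Epoch " + epoch + ", iteration " + iteration + ", score: " + model.score());
    }
}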

ImageRecordReader now logs number of inferred label classes (to reduce risk of users missing a problem if something is misconfigured)

Added AnalyzeSpark.getUnique overload for multiple columns

Added performance/timing module

Reduced ImageRecordReader garbage generation via buffer reuse

Fixes for Android compilation (aligned versions, removed some dependencies)

Removed Reflections library use in DataVec

Fix for TransformProcessRecordReader batch support

Fix for TransformProcessRecordReader with filter operations

Fixed issue with ImageRecordReader/ParentPathLabelGenerator incorrectly filtering directories containing . character(s)

ShowImageTransform now initializes frame lazily to avoid blank windows

DataVec ClassPathResource has been deprecated; use nd4j-common version instead

Fixed timestamp issue that could cause incorrect rendering of first model's results in UI

Execution now waits for last model(s) to complete before returning when a termination condition is hit

As per DL4J etc: use of Reflections library has been removed entirely from Arbiter

Remove use of Eclipse Collections library due to issues with Android compilation

Improved cleanup of completed models to reduce maximum memory requirements for training

Added transfer learning API

Global pooling (aka "pooling over time"; usable with both RNNs and CNNs)

Center loss output layer

1D Convolution and subsampling layers

ZeroPaddingLayer

DL4J now has an IEvaluation class (which Evaluation, RegressionEvaluation, etc. all implement; this also allows custom evaluation on Spark)

Added multi-class (one vs. all) ROC: ROCMultiClass

HTML export functionality added for ROC charts

Added CnnSentenceDataSetIterator (for use with ‘CNN for Sentence Classification’ architecture)

UI/CUDA/Linux issue:

Dirty shutdown on JVM exit is possible for CUDA backend sometimes:

Issues with RBM implementation

Server: See

Client: See

Tests/examples: See and

Added FastText - inference and training, including OOV (out of vocabulary) support ()

Scala 2.12 support added, Scala 2.10 support dropped ()

Added model server (DL4J and SameDiff models, JSON and binary communication) - , , ,

Added saved model format validation utilities - DL4JModelValidator, DL4JKerasModelValidator ()

Added LabelLastTimeStepPreProcessor ()

BertIterator: added option to prepend token to the output (such as [cls] expected by some models) ()

Added trace level logging to MultiLayerNetwork and ComputationGraph to assist with debugging certain issues ()

Upsampling3D: Added NDHWC support ()

MergeVertex now supports broadcasting ()

LSTM and Dropout will now fall back on built-in implementations if an exception is encountered from cuDNN (same as Subsampling/ConvolutionLayer) ()

Improved JavaDoc and cleaned up API for WordVectorSerializer (, )

Updated deeplearning4j-ui theme ()

Fixed an issue with MergeVertex and CNN3D activations ()

Fixed typo in Yolo2OutputLayer builder/configuration method name ()

Improved ComputationGraph builder InputType validation ()

Removed dl4j-spark-ml module until it can be properly maintained ()

Fixed an issue with BertWordPieceTokenizerFactory and bad character encoding ()

Fixed an issue with LearnedSelfAttentionLayer and variable minibatch size (, )

Fixed issue with SharedTrainingMaster controller address when set from environment variable ()

Fixed issue with SameDiffOutputLayer initialization under some circumstances ()

https is now used by default for data and zoo model downloads (, )

Fixed an issue where UI WebJars dependencies would check for updates on every single build (, )

Fixed issue where Upsampling layer memory report could produce an OOM exception ()

Improved UX/validation for RecordReaderDataSetIterator ()

Fixed an issue where EmbeddingSequenceLayer would not check mask array datatype ()

Improved validation when initializing networks with a non rank-2 (shape [1, numParams]) array ()

Fixed a DataType issue for BertIterator ()

Fixed Word2Vec model backward compatibility (beta3 and earlier models now loadable again)

Fixed issue where some Keras import models could fail with Could not read abnormally long HDF5 attribute ()

Added validation for RnnOutputLayer - feature/label array lengths ()

Fixed an issue where SameDiffOutputLayer would not support variable minibatch size ()

Fixed DL4J SameDiff layer mask support ()

DL4J UI: Fixed an issue where tab switching did not work when visualizing saved/stored data (, )

DL4J UI: Fixed a rare UI threading issue ()

Fixed a Keras import issue with JSON format change ()

Fixed a Keras import issue where updater learning rate schedule could be imported incorrectly ()

Fixed an issue with CnnSentenceDataSetIterator when using UnknownWordHandling.UseUnknownVector (, )

Fixes and optimizations to DL4J SameDiff layers ()

MultiLayerNetwork/ComputationGraph will now log the original exception if a second exception occurs during workspace closing, instead of swallowing it (inference/fit operation try/finally blocks) ()

Upgraded dependencies: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, shaded to avoid dependency clashes) ()

Logging framework can now be configured for DL4J UI (due to Play framework dependency upgrade) ()

Reduced amount of garbage produced by MnistDataFetcher (impacts MNIST and EMNIST DataSetIterators) ()

Activation function backpropagation has been optimized for many activation functions (, )

Saved models with custom layers from 1.0.0-alpha and before can no longer be loaded. Workaround: load in 1.0.0-beta4, and re-save the model (). Models without custom layers can still be loaded back to 0.5.0

dl4j-spark_2.11 and _2.12 dependencies incorrectly pull in datavec-spark_2.11/2.12 version 1.0.0-SNAPSHOT. Workaround: control version using dependency management as per or

Added new data types: BFLOAT16, UINT16, UINT32, UINT64 ()

CUDA support for all operations without CUDA implementations (, , , , )

Added model server (DL4J and SameDiff models, JSON and binary communication) - , , ,

Added support for empty arrays with zeros in shape, for compatibility with TensorFlow import ()

Improved SameDiff training API - added "in line" test set evaluation, returning History object with loss curve, etc ()

Added saved model format validation utilities - Nd4jValidator, Nd4jCommonValidator ()

Added SameDiff ScoreListener (equivalent to DL4J ScoreIterationListener/PerformanceListener) (, )

Added SameDiff.convertDataTypes method, for variable dtype conversion ()

Added crop and resize op ()

DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J

Added basic/MVP SameDiff UI listener ()

Added SameDiff CheckpointListener (, )

Added SameDiff name scopes ()

SameDiff: Updater state and training configuration is now written to FlatBuffers format ()

Added c++ benchmark suite callable from Java - call using Nd4j.getExecutioner().runLightBenchmarkSuit() and Nd4j.getExecutioner().runFullBenchmarkSuit() ()

Added SameDiff.save/load methods with InputStream/OutputStream arguments (, )

Added axis configuration for evaluation instances (Evaluation, RegressionEvaluation, ROC, etc - getAxis and setAxis methods) to allow different data formats (NCHW vs. NHWC for CNNs, for example) ()

SameDiff: Added support to convert constants to placeholders, via SDVariable.convertToConstant() method ()

SameDiff: Added GradCheckUtil.checkActivationGradients method to check activation gradients for SameDiff instance (not just parameter gradients as in existing gradient check methods) ()

Added CheckNumerics op ()

Added FakeQuantWithMinMaxArgs and FakeQuantWithMinMaxVars ops ()

Added INDArray reduction methods with "keep dimensions" option - for example, INDArray.mean(boolean, int... dimension) ()

Added Nd4j SystemInfo class - SystemInfo.getSystemInfo, .writeSystemInfo(File) to aid with debugging issues (, )

Added INDArray.toString(NDArrayStrings options), toStringFull() and toString overloads for easier control of array printing ()

Added HashCode op, INDArray.hashCode() ()

SameDiff: added whileLoop, ifCond methods for loops/conditional ops ()

Cleaned up some infrequently used Nd4j methods (, , , )

Added bitwise integer operations: left/right bit shift, left/right cyclical bit shift, bitwise Hamming distance (, , , , )

deeplearning4j-nlp: renamed AggregatingSentencePreProcessor to sentencePreProcessor method ()

Upgraded (and shaded) Protobuf version - 3.5.1 to 3.8.0 ()

Switched to C-style error handling for libnd4j native operations ()

Renamed FlatBuffers enum org.nd4j.graph.DataType to org.nd4j.graph.DType to avoid users importing incorrect type when using Nd4j methods (, )

Added SameDiff.bitwise namespace for bitwise ops (, )

Updated to JavaCPP/JavaCV 1.5.1-1 ()

SameDiff: Placeholders must now only be provided if required to calculate the requested variables ()

SameDiff: Fixed an issue with duplicate variable name validation ()

SameDiff: Fixed an issue with SDVariable.getArr for scalars ()

Added delayed mode to DeviceLocalNDArray (don't replicate to device until needed) ()

ND4J: Fixed an issue with writing 0d (scalar) NDArrays in numpy .npy format ()

Fixed an issue with Pad operation for some constant cases ()

Fixed some issues with strided_slice operation (, , )

SameDiff: Fixed issue with DataType inference for some ops using ND4J default datatype ()

INDArray.castTo(DataType) is now a no-op when array is already the correct type ()

SameDiff: Fixed an issue with training mixed precision networks ()

Fixed an issue where Evaluation class was incorrectly reporting macro-averaged precision for binary case ()

Removed trainableParams config/field from SameDiff TrainingConfig (no longer required) ()

Improvements and cleanup to ND4J Javadoc (, , , )

Fixed an issue with Cholesky Lapack op on CUDA (, )

Fixed an issue where [1,N] and [N,1] arrays were not considered a matrix (rank 2 array) according to INDArray.isMatrix() ()

Fixed RegressionEvaluation for 4D arrays (CNNs / segmentation) (, )

Fixed issue with INDArray.median(int... dimension) ()

Fixed NPE that could occur when executing gather operation backprop ()

Fixed issue with LogSumExp operation Java/C++ mapping ()

Added header validation when reading Numpy .npy files, to ensure file is valid ()

Fixed a possible issue with reading Numpy .npy files on CUDA ()

Fixed an issue when reading Numpy .npy boolean files ()

Various fixes for TensorFlow import ()

Fixed an issue with a small number of Nd4j.create methods not creating arrays corresponding to the java primitive ()

Improved shape validation for some Nd4j.create methods ()

Cleaned up unmaintained Nd4j.createSparse methods ()

Fixed a CUDA issue for CUDA GPUs with CC 3.0 ()

Fixed some possible integer overflows in c++ code ()

Removed deprecated methods: Nd4j.trueScalar and Nd4j.trueVector (, )

Fixed an issue where some JVMs could warn about "Illegal reflective access" due to a (now removed) SameDiff dependency ()

SDVariable now no longer extends DifferentialFunction ()

Moved numerous operation calculateOutputShape instances from Java to C++ ()

Fixed an issue where maxpool2d_bp could throw an exception when NaN values are present ()

Fixed an issue with concatenation of empty shapes (with zeros) ()

Removed INDArray.javaTensorAlongDimension ()

LayerNorm operation now properly supports axis arg, NCHW format data ()

libnd4j: cuBLAS hgemm (FP16 gemm) will only be called for devices with compute capability >= 5.3 due to cuBLAS limitations ()

Nd4j.readNumpy optimized ()

Added configurable alpha parameter to ELU and lrelu_bp operations in c++ ()

Cleaned up SameDiff SDCNN/SDRNN (SameDiff.cnn, .rnn) API/methods (, )

nd4j-native on some OSX systems can fail with Symbol not found: ___emutls_get_address - See

SBT 1.3.0 can fail with an Illegal character in path error; SBT 1.2.8 is OK. This is an SBT issue, not an ND4J issue. See for details

ImageRecordReader: Support for 16-bit TIFF added ()

Added SequenceTrimToLengthTransform ()

Fixed an issue with AnalyzeSpark and String columns ()

Fixed an issue with URL scheme detection in NumberedFileInputScheme ()

Fixed an issue with RandomPathFilter sampling being biased (, )

API cleanup and refactoring (, , , )

Fixed issue with compression for HistoryProcessor ()

Updated EvaluationScoreFunction to use ND4J Evaluation class metrics ()

Fixed incorrect search size in GridSearchCandidateGenerator ()

The Jackson version upgrade necessitated a change to how generic object serialization was performed; Arbiter JSON data stored in 1.0.0-beta4 or earlier format may not be readable in 1.0.0-beta5 ()

Added full data type support to ND4S as per ND4J ()

Added syntactic sugar for SameDiff (implicits, operator overloads) ()

Added variational autoencoder

Activation functions are now an interface

Configuration now via enumeration, not via String (see examples)

Custom activation functions now supported

Added Java 7 compatibility for stats collection

Introducing DataVec: lots of new functionality for transforming, preprocessing and cleaning data. (This replaces Canova.)


If you want to jump into deep learning from here without Java, we recommend Theano and the various Python frameworks built atop it, including Keras and Lasagne.

With that under your belt, we recommend you approach Deeplearning4j through its examples.

Most of what we know about deep learning is contained in academic papers; you can find some of the major research groups online.

While individual courses have limits on what they can teach, the Internet does not. Most math and programming questions can be answered by Googling and searching sites like Stackoverflow and Math Stackexchange.

UI overhaul: new training UI has considerably more information, supports persistence (saving info and loading later), Japanese/Korean/Russian support. Replaced Dropwizard with Play framework.

Import of models configured and trained using Keras

Imports both Keras model configurations and stored weights

Supported models: Sequential models

Supported layers: Dense, Dropout, Activation, Convolution2D, MaxPooling2D, LSTM

Added ‘Same’ padding mode for CNNs (ConvolutionMode network configuration option)

ROC and AUC added for binary classifiers

Added metadata functionality: track source of data (file, line number, etc) from data import to evaluation. Loading a subset of examples/data from this metadata is now supported.

Added TrainingListener interface (extends IterationListener). Provides access to more information/state as network training occurs

Histogram and Flow iteration listeners deprecated. They are still functional, but using new UI is recommended

See ConvolutionMode javadoc for more details:


Javacpp

DL4J and Javacpp

DL4J and Javacpp overview

The following modules rely on javacpp as part of their build process:

  1. nd4j-native

  2. nd4j-native-presets

  3. nd4j-cuda

  4. nd4j-cuda-presets

Each backend consists of 2 modules

  1. The codebase: This represents the actual nd4j backend logic for specific platforms. Conceptually, this logic will be anything that a developer should need to control such as memory management, environment variables, or other execution logic.

Compilation flow

Next, the actual backend is compiled with a dependency on the above presets code base. The javacpp plugin will leverage the description from the presets we specify as a dependency and facilitate linking against a LIBND4J_HOME (a folder which contains the platform specific libnd4j binaries and include sources) specified by the user. In the actual plugin declaration on the backend pom.xml we include the target presets class to use for our particular backend.

Note: This still requires the native platform specific tools to be installed since binaries are generated for each platform. Please see our github actions for instructions on specific platforms.

-platform dependencies

Caution to users: By default, this means that a large number of dependencies for all platforms will be included. If you do not need dependencies for all platforms, then please read the above documentation to figure out how to build a jar for your specific platform.

Generally, the main thing to know is when you build your application, use:

mvn -Djavacpp.platform=your-target-platform
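
For example, to build only for 64-bit Linux (linux-x86_64 is one of the standard javacpp platform identifiers; substitute your own target platform):

mvn -Djavacpp.platform=linux-x86_64 clean package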

Javacpp platform specific profiles

Running javacpp on termux + android/lineagos

In order to bootstrap this environment, a from-scratch install of the latest LineageOS, flashed to an SD card for the Raspberry Pi, is suggested.

Afterwards, install

In order to properly set up the test environment, you need to execute your test from the command line as follows:

mvn -DargLine="-Dorg.bytedeco.javacpp.pathsfirst=true -Djavacpp.platform=android-arm" -Dorg.bytedeco.javacpp.pathsfirst=true -Djavacpp.platform=android-arm clean test

A proper execution environment after the above jdk is installed involves manually setting the environment as follows:

export JAVA_HOME=/data/data/com.termux/files/usr/lib/jvm/openjdk-9
export PATH=$PATH:$HOME/apache-maven-3.8.1/bin
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$JAVA_HOME/lib/:$JAVA_HOME/lib/jli"
export MAVEN_OPTS="-Dmaven.wagon.http.ssl.insecure=true -Dmaven.wagon.http.ssl.allowall=true -Dmaven.wagon.http.ssl.ignore.validity.dates=true"

This will set up the JDK + Maven to ignore SSL errors due to issues with cacerts + Termux. This is largely irrelevant for our small testing use case, but not recommended for production environments.

Redist artifacts

Redist artifacts are easy ways of distributing dependencies without installation.

Note that for the presets that are part of nd4j (nd4j-cuda-presets and nd4j-native-presets), only the latest versions support redist artifacts. The presets only support preloading (e.g., linking against libraries from the javacpp cache) against the latest version, because certain version numbers are checked for during preloading.

Quickstart

Quickstart for Java using Maven

Get started

This is everything you need to run DL4J examples and begin your own projects.

We are currently reworking the Getting Started Guide.

A quick overview

Deeplearning4j started as a domain-specific language to configure deep neural networks, and evolved into a suite of tools developers use to do everything from train models in Java to deploy models to production.

Prerequisites

You should have these installed to use this QuickStart guide. DL4J targets professional Java developers who are familiar with production deployments, IDEs and automated build tools. Working with DL4J will be easiest if you already have experience with these.

java -version

Please make sure you have a 64-bit version of Java installed; if you try to use a 32-bit version instead, you will see an error telling you no jnind4j in java.library.path. Make sure the JAVA_HOME environment variable is set.

mvn --version

If you are working on a Mac, you can simply enter the following into the command line:

brew install maven

If you want to install or update Git itself, you can clone the Git source repository:

$ git clone git://git.kernel.org/pub/scm/git/git.git

The latest version of macOS Mojave breaks git, producing the following error message:

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

This can be fixed by running:

xcode-select --install
  1. Use the command line to enter the following:

    git clone https://github.com/eclipse/deeplearning4j-examples.git
  2. Open IntelliJ and choose Import Project. Then select the dl4j-examples directory.

  3. Choose 'Import project from external model' and ensure that Maven is selected.

  4. Continue through the wizard's options. Select the SDK that begins with jdk. (You may need to click on a plus sign to see your options...) Then click Finish. Wait a moment for IntelliJ to download all the dependencies. You'll see the horizontal bar working on the lower right.

  5. Pick an example from the file tree on the left. Right-click the file to run.

The example repository contains multiple example projects that are grouped by different levels of functionality. The dl4j-examples project you just opened has the simplest examples, but feel free to explore the other projects too!

Using DL4J In Your Own Projects: Configuring the POM.xml File

To run DL4J in your own projects, we highly recommend using Maven for Java users, or a tool such as SBT for Scala. The basic set of dependencies and their versions are shown below. This includes:

  • deeplearning4j-core, which contains the neural network implementations

  • nd4j-native-platform, the CPU version of the ND4J library that powers DL4J

  • datavec-api - DataVec is our library for vectorizing and loading data

To run the example, right click on it and select the green button in the drop-down menu. You will see, in IntelliJ's bottom window, a series of scores. The rightmost number is the error score for the network's classifications. If your network is learning, then that number will decrease over time with each batch it processes. At the end, this window will tell you how accurate your neural-network model has become:

In another window, a graph will appear, showing you how the multilayer perceptron (MLP) has classified the data in the example. It will look like this:

Congratulations! You just trained your first neural network with Deeplearning4j.

Next Steps

Additional links

Troubleshooting

Q: I'm using a 64-Bit Java on Windows and still get the no jnind4j in java.library.path error

A: You may have incompatible DLLs on your PATH. To tell DL4J to ignore those, you have to add the following as a VM parameter (Run -> Edit Configurations -> VM Options in IntelliJ):

-Djava.library.path=""

Q: SPARK ISSUES I am running the examples and having issues with the Spark based examples such as distributed training or datavec transform options.

Troubleshooting: Debugging UnsatisfiedLinkError on Windows

Windows users might be seeing something like:

Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:624)
at org.deeplearning4j.examples.feedforward.anomalydetection.MNISTAnomalyExample.main(MNISTAnomalyExample.java:46)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5556)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:189)
... 2 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:259)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5553)
... 3 more

Quickstart template

Now that you've learned how to run the different examples, we've made a template available for you that has a basic MNIST trainer with simple evaluation code.

To use the template:

  1. Copy the standalone-sample-project from the examples and give it the name of your project.

  2. Import the folder into IntelliJ.

  3. Start coding!

More about Eclipse Deeplearning4j

Deeplearning4j is a framework that lets you pick and choose with everything available from the beginning. We're not Tensorflow (a low-level numerical computing library with automatic differentiation) or Pytorch. Deeplearning4j has several subprojects that make it easy-ish to build end-to-end applications.

Deeplearning4j has two other notable components:

Contribute

How to contribute to the Eclipse Deeplearning4j source code.

Prerequisites

  • DeepLearning4J: Contains all of the code for learning neural networks, both on a single machine and distributed.

  • ND4J: “N-Dimensional Arrays for Java”. ND4J is the mathematical backend upon which DL4J is built. All of DL4J’s neural networks are built using the operations (matrix multiplications, vector operations, etc) in ND4J. ND4J is how DL4J supports both CPU and GPU training of networks, without any changes to the networks themselves. Without ND4J, there would be no DL4J.

  • DataVec: DataVec handles the data import and conversion side of the pipeline. If you want to import images, video, audio or simply CSV data into DL4J: you probably want to use DataVec to do this.

  • RL4J: Reinforcement Learning for Java. This set of libraries contains the ability to do reinforcement learning built on the deeplearning4j library.

  • Samediff: Built within the nd4j library, this library contains a tensorflow/pytorch like library for building data flow graphs.

Ways to contribute

There are numerous ways to contribute to DeepLearning4J (and related projects), depending on your interests and experience. Here’s some ideas:

  • Add new types of neural network layers (for example: different types of RNNs, locally connected networks, etc)

  • Add a new training feature

  • Bug fixes

  • DL4J examples: Is there an application or network architecture that we don’t have examples for?

  • Testing performance and identifying bottlenecks or areas to improve

  • Improve website documentation (or write tutorials, etc)

  • Improve the JavaDocs

There are a number of different ways to find things to work on. These include:

  • Looking at the issue trackers:

  • Reviewing our Roadmap

  • Reviewing recent papers and blog posts on training features, network architectures and applications

  • Reviewing the website and examples - what seems missing, incomplete, or would simply be useful (or cool) to have?

General guidelines

Before you dive in, there’s a few things you need to know. In particular, the tools we use:

  • Maven: a dependency management and build tool, used for all of our projects. See this for details on Maven.

  • Git: the version control system we use

  • Project Lombok: Project Lombok is a code generation/annotation tool that is aimed to reduce the amount of ‘boilerplate’ code (i.e., standard repeated code) needed in Java. To work with source, you’ll need to install the Project Lombok plugin for your IDE

  • VisualVM: A profiling tool, most useful to identify performance issues and bottlenecks.

  • IntelliJ IDEA: This is our IDE of choice, though you may of course use alternatives such as Eclipse and NetBeans. You may find it easier to use the same IDE as the developers in case you run into any issues. But this is up to you.

Things to keep in mind:

  • Code should be Java 7 compliant

  • If you are adding a new method or class: add JavaDocs

  • You are welcome to add an author tag for significant additions of functionality. This can also help future contributors, in case they need to ask questions of the original author. If multiple authors are present for a class: provide details on who did what (“original implementation”, “added feature x” etc)

  • Provide informative comments throughout your code. This helps to keep all code maintainable.

  • Any new functionality should include unit tests (using JUnit) to test your code. This should include edge cases.

  • If you add a new layer type, you must include numerical gradient checks, as per these unit tests. These are necessary to confirm that the calculated gradients are correct

  • If you are adding significant new functionality, consider also updating the relevant section(s) of the website, and providing an example. After all, functionality that nobody knows about (or nobody knows how to use) isn’t that helpful. Adding documentation is definitely encouraged when appropriate, but strictly not required.

Eclipse Contributors

IP/Copyright requirements for Eclipse Foundation Projects

Contributors (anyone who wants to commit code to the repository) need to do two things, before their code can be merged:

  1. Sign the Eclipse Contributor Agreement (once)

  2. Sign commits (each time)

Why Is This Required?

By signing the ECA, you are essentially asserting that the code you are submitting is something that either you wrote, or that you have the right to contribute to the project. This is a necessary legal protection to avoid copyright issues.

By signing your commits, you are asserting that the code in that particular commit is your own.

Signing the Eclipse Contributor Agreement

You only need to sign the Eclipse Contributor Agreement (ECA) once. Here's the process:

Step 1: Sign up for an Eclipse account

Note: You must register using the same email as your GitHub account (the GitHub account you want to submit pull requests from).

Step 2: Sign the ECA

Signing Your Commits

Signing a New Commit

There are a few ways to sign commits. Note that you can use any of these options.

Option 1: Use -s When Committing on Command Line

Signing commits here is simple:

git commit -s -m "My signed commit"

Note the use of -s (lower case s) - upper-case S (i.e., -S) is for GPG signing (see below).

Option 2: Set up Bash Alias (or Windows cmd Alias) for Automated Signing

For example, you could set up the following alias in Bash:

alias gcm='git commit -s -m'

Then committing would be done with the following:

gcm "My Commit"

One simple way is to create a gcm.bat file with the following contents, and add it to your system path:

@echo off
echo.
git commit -s -m %*

You can then commit using the same process as above (i.e., gcm "My Commit")

Option 3: Use GPG Signing

Note that this option can be combined with aliases (above), as in alias gcm='git commit -S -m' - note the upper case -S for GPG signing.

Option 4: Commit using IntelliJ with Auto Signing

Checking If A Commit Is Signed

After performing a commit, you can check in a few different ways. One way is to use git log --show-signature -1 to show the signature for the last commit (use -5 to show the last 5 commits, for example)

The output will look like:

$ git log --show-signature -2
commit 81681455918371e29da1490d3f0ca3deecaf0490 (HEAD -> commit_test_branch)
Author: YourName <you@email.com>
Date:   Fri Jun 21 22:27:50 2019 +1000

    This commit is unsigned

commit 2349c6aa3497bd65866d7d0a18fe82bb691bb868
Author: YourName <you@email.com>
Date:   Fri Jun 21 21:42:38 2019 +1000

    My signed commit

    Signed-off-by: YourName <you@email.com>

The top commit is unsigned, and the bottom commit is signed (note the presence of the Signed-off-by).

If You Forget to Sign a Commit - Amending the Last Commit

If you forgot to sign the last commit, you can use the following command:

git commit --amend --signoff

If You Forget to Sign Multiple Commits

Suppose your branch has 3 new commits, all of which are unsigned:

$ git log -4 --oneline
4b164026 (HEAD -> commit_test_branch) Your new commit 3
d7799615 Your new commit 2
6bb6113a Your new commit 1
ef09606c This commit already exists

One simple way is to squash and sign these commits. To do this for the last 3 commits, use the following: (note you might want to make a backup first)

git reset --soft HEAD~3
git commit -s -m "Squashed and signed"

The result:

$ git log -2 --oneline
31658e11 (HEAD -> commit_test_branch) Squashed and signed
ef09606c This commit already exists

You can confirm that the commit is signed using git log -1 --show-signature as shown earlier.

Note that your commits will be squashed once they are merged to master anyway, so the loss of the commit history does not matter.

If you are updating an existing PR, you may need to force push using -f (as in git push X -f).

Github Actions/Build Infra

Github actions Configuration Overview

Overview of a Github Actions Configuration

Most workflows implement a matrix structure for handling different combinations of builds related to the following:

  1. Platform specific optimizations: On Windows/Linux/Mac we allow CPU builds with optional linking against mkldnn. Each combination is enumerated and run as part of a matrix build on github actions.

  2. CUDA, optional cuDNN: We also allow optional linking against cuDNN for GPU routines.

Input parameters:

  1. buildThreads: This is the number of build threads used for compilation in libnd4j. This is the equivalent of make -j. For specific platforms that use more memory, 1 is the recommended value. On self hosted setups, you may use more threads to make builds run faster.

  2. deployToReleaseStaging: 0 or 1. If 1, this will create a staging repository on oss sonatype. Otherwise, it will deploy to ossrh snapshots. Deploying to snapshots is the default.

  3. snapshotVersion: The current in development snapshot version

  4. releaseRepoId: If blank, then a new staging repository for a version is created. Otherwise, a staging repository id should be obtained from the ossrh sonatype nexus interface. This releaseRepoId should be passed to subsequent builds so all of the artifacts associated with a version get propagated to one place.

  5. serverId: This should be ossrh 90% of the time. A github profile is also available for use with github actions.

  6. mvnFlags: Extra maven flags to pass to the build, for example --pl !libnd4j to skip a libnd4j compile. This can speed builds up significantly.

  7. runsOn: This is the operating system upon which to run the build. For linux, this defaults to ubuntu-16.04. For windows, windows-2019. self-hosted can also be specified for faster builds.

Matrix builds

Many configurations on cpu and cuda require a matrix based build structure to capture the various combinations of optimization and software versions people may want to use. In order to accommodate these workflows, we need to attach variables proxying the values of the manual inputs to the individual matrix workers themselves. These parameters are analogous to the input parameters described above, so we will not repeat the descriptions here; the values are passed through in the form ${{ github.event.inputs.SOME_VALUE }}, where SOME_VALUE is one of the inputs above.

The configuration to look for is as follows:

          - mvn_ext: ${{ github.event.inputs.mvnFlags }}
            experimental: true
            name: Extra maven flags

          - debug_enabled: ${{ github.event.inputs.debug_enabled }}
            experimental: true
            name: Debug enabled

          - runs_on: ${{ github.event.inputs.runsOn }}
            experimental: true
            name: OS to run on

          - libnd4j_file_download: ${{ github.event.inputs.libnd4jDownload }}
            experimental: true
            name: OS to run on

          - deploy_to_release_staging: ${{ github.event.inputs.deployToReleaseStaging }}
            experimental: true
            name: Whether to deploy to release staging or not

          - release_version: ${{ github.event.inputs.releaseVersion }}
            experimental: true
            name: Release version

          - snapshot_version: ${{ github.event.inputs.snapshotVersion }}
            experimental: true
            name: Snapshot version

          - server_id: ${{ github.event.inputs.serverId }}
            experimental: true
            name: Server id

          - release_repo_id: ${{ github.event.inputs.releaseRepoId }}
            experimental: true
            name: The release repository to run on

          - mvn_flags: ${{ github.event.inputs.mvnFlags }}
            experimental: true
            name: Extra maven flags to use as part of the build

          - build_threads: ${{ github.event.inputs.buildThreads }}
            experimental: true
            name: The number of threads to build libnd4j with

Expected timings

  1. CUDA: Most cuda builds take 4-5 hours. Both windows and linux on GH actions just download the cuda distribution and compile things on their respective platforms.

  2. CPU builds: From scratch libnd4j + cpu builds typically take 1-2 hours max. If a build takes much longer than that, something may be wrong with it.

Build error causes

  1. Out of disk: It is very common for a github actions VM to run out of disk. If a build fails with no logs and all steps terminated, this may be one of the reasons.

  2. Out of memory: Sometimes builds run out of memory. A few common causes include:

    • Clang out of memory on Android: depending on the number of build threads assigned, it is easy for clang to run out of memory

    • Maven javadoc: The maven javadoc plugin for bigger projects can use a ton of ram and crash a job

  3. Network failures: Maven can sometimes (rarely) fail to download certain dependencies in the middle of a job

Environment variables:

  1. MAVEN_GPG_KEY: The maven gpg key secret for a release

  2. CROSS_COMPILER_DIR: For the pi_build.sh script in libnd4j. This contains the root directory for cross compiler invocation. We need this because all cross compilation for the various libnd4j builds happens on x86. We cross compile for speed, which also allows us to easily run on github actions.

  3. DEBIAN_FRONTEND: This ensures that all debian/apt commands run non-interactively and don't prompt for yes/no confirmation by default

  4. GITHUB_TOKEN: This is for authentication with github actions

  5. BUILD_USING_MAVEN: This is for pi_build.sh. This toggles (0 or 1) whether to use maven or to invoke buildnativeoperations.sh in the libnd4j root directory directly.

  6. NDK_VERSION: Default is r21d. Libnd4j's Android builds are currently compiled with NDK r21.

  7. CURRENT_TARGET: This variable is for pi_build.sh. It tells pi_build.sh which architecture to build for.

  8. PUBLISH_TO: The repo to publish to for releases or snapshots. Valid values are github or ossrh.

    These are repositories defined in the deeplearning4j root pom.

  9. OPENBLAS_PATH: We compile libnd4j against openblas for several different cpus. Openblas is manually downloaded and linked against.

    This specifies the path to the download for the libnd4j cmake invocation.

  10. MAVEN_USERNAME: The user name to login to for the ossrh maven repository

  11. MAVEN_PASSWORD: The password to login to for the ossrh maven repository

  12. MAVEN_GPG_PASSPHRASE: The gpg passphrase for signing artifacts for uploading to maven central

  13. DEPLOY_TO: Valid values are either ossrh or github.

  14. LIBND4J_BUILD_THREADS: This is the equivalent of make -j. It specifies the number of threads

    that should be used to compile libnd4j

  15. PERFORM_RELEASE: Whether to perform a release or not (0 or 1)

  16. RELEASE_VERSION: The version to be released to maven central. change-versions.sh will be run

    to change versions throughout the code base from the snapshot version to the intended release version.

  17. SNAPSHOT_VERSION: The current snapshot version to be changed when performing a release.

    After a release is conducted, this should generally be the next development version.

  18. RELEASE_REPO_ID: Leave this empty when first creating a release repository in combination with

    DEPLOY set to 1. Afterwards, note which staging repository id gets created in the ossrh interface when publishing

    to maven central. Use that id for further builds to ensure that all uploads for one version are synchronized to one staging repository.

  19. MODULES: Extra maven flags for pi_build.sh if more flags are needed (such as for debugging or only building specific modules)

  20. LIBND4J_URL: Used when building nd4j-native. If a user does not want to recompile libnd4j for their particular build, you can instead

    skip this step and specify a libnd4j zip file download (generally built with the maven assembly plugin)

Import in to your favorite IDE

Pre requisites

Ensure that you clone the deeplearning4j project locally.

git clone https://github.com/eclipse/deeplearning4j

Before importing the project, a few things of note no matter what IDE you use:

  1. One submodule (libnd4j) is a c++ project that uses maven to invoke a cmake build. You may wish to edit libnd4j separately in a cmake oriented IDE like VS Code, Clion, or Eclipse c/c++. In order to build a particular nd4j backend, libnd4j should already be compiled. By default, relevant nd4j backends all look for a pre compiled libnd4j in the libnd4j directory included within the same project.

Intellij

Once imported, please give the project time to download associated dependencies. You can verify the status of the project in the bottom right corner.

In order to enable the project to work, the following modifications need to be made.

Shaded modules

Eclipse Deeplearning4j has a set of shaded modules. Shaded modules are artifacts that re-namespace a dependency to a different package so it can be used as a set of private dependencies that do not clash with other libraries that may also depend on it.

Intellij does not handle this very well. In order to work around this, you need to exclude all projects under the nd4j/nd4j-shade folder individually. Right click on each folder. Go to Maven -> Ignore Projects.

Assuming you follow the other steps above (Lombok, libnd4j, ...) then you should be able to run any module you want.

Eclipse

When first finishing the import of the project, a number of maven connector errors may be highlighted. Just click "Resolve All Later" and finish. Let Eclipse finish downloading sources and javadoc.

As of the latest version of eclipse, build errors may occur.

Testing

How to conduct a release to Maven Central

Parameters for testing

  1. test.heap.size: The heap size used for maven surefire plugin sub processes

  2. test.offheap.size: The off heap size used for maven surefire sub processes. This is very important for

    configuration (especially on gpu systems)

Test resources

Test profiles for enabling nd4j backends

When running deeplearning4j's tests, there are 2 main profiles to be aware of: nd4j-tests-cpu and nd4j-tests-cuda. These each enable running cpu or gpu tests respectively across the whole code base. Please ensure one of these is selected when running tests.

testresources: Used to add the test resources used for nd4j.

Test categories

GPUs and multi threaded boxes

Note that when running gpu tests on a box with more than one GPU, tests can/will run out of memory if test.heap.size is not at least 4g.

Beginners

Road map for beginners new to deep learning.

How Do I Start Using Deep Learning?

Where you start depends on what you already know.

The prerequisites for really understanding deep learning are linear algebra, calculus and statistics, as well as programming and some machine learning. The prerequisites for applying it are just learning how to deploy a model.

In the case of Deeplearning4j, you should know Java well and be comfortable with tools like the IntelliJ IDE and the automated build tool Maven.

Below you'll find a list of resources. The sections are roughly organized in the order they will be useful.

Free Machine- and Deep-learning Courses Online

Math

The math involved with deep learning is basically linear algebra, calculus and probability, and if you have studied those at the undergraduate level, you will be able to understand most of the ideas and notation in deep-learning papers. If you haven't studied those in college, never fear. There are many free resources available (and some on this website).

Programming

If you do not know how to program yet, you can start with Java, but you might find other languages easier. Python and Ruby resources can convey the basic ideas in a faster feedback loop. "Learn Python the Hard Way" and "Learn to Program (Ruby)" are two great places to start.

Python

Java

Once you have programming basics down, tackle Java, the world's most widely used programming language. Most large organizations in the world operate on huge Java code bases. (There will always be Java jobs.) The big data stack -- Hadoop, Spark, Kafka, Lucene, Solr, Cassandra, Flink -- has largely been written for Java's compute environment, the JVM.

Deeplearning4j

Other Resources

Build From Source

Instructions to build all DL4J libraries from source.

Core steps:

  1. Building libnd4j for your specific platform

  2. Linking the nd4j backend you want to compile for against libnd4j via JavaCPP

  3. Compiling the rest of the code in to jar files

Key concepts

  1. Libnd4j is a CMake based c++ project that supports running optimized math code on different architectures. Its sole focus is being a tiny self contained library for running math kernels. It can link against optimized BLAS routines, platform specific CNN libraries such as OneDNN and CuDNN, and contains hundreds of math kernels for implementing neural networks and other math routines.

  2. Maven: Maven is the core build tool for deeplearning4j. Understanding maven is key to building deeplearning4j from source

  3. Maven and CMake: For compiling libnd4j, we invoke a buildnativeoperations.sh wrapper script via maven. buildnativeoperations.sh in turn automatically sets up CMake to then build the c++ project

  4. pi_build.sh: This is our build script for embedded and ARM based platforms. It focuses on cross compilation running on a Linux x86 based platform.

  5. buildnativeoperations.sh: The main build script for libnd4j. It initializes CMake and invokes CMake compilation for the user on whatever platform the user is currently on unless the user specifies an alternative platform. Specifying a different platform is possible for android for example.

Building for x86_64

The main considerations for building on x86_64 are:

  1. Whether to compile for avx2 or avx512

  2. Whether to use OpenBLAS or MKL

  3. Whether to link against OneDNN

Building for ARM

pi_build.sh mainly focuses on cross compilation.

In order to properly use the pi_build.sh script, a number of environment variables should be set. Per platform, you can find these environment variables in the final build step under the environment section.

If you would like to compile deeplearning4j on an actual ARM device, please use the normal buildnativeoperations.sh workflow.

Building for CUDA

In order to compile deeplearning4j for a particular CUDA version, you must first invoke change-cuda-versions.sh in the root directory:

./change-cuda-versions.sh $YOUR_CUDA_VERSION

Afterwards, add a dependency on the matching nd4j CUDA backend for that version, for example:
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
</dependency>

Note for windows users

Benchmark

General guidelines for benchmarking in DL4J and ND4J.

General Benchmarking Guidelines

Guideline 1: Run Warm-Up Iterations Before Benchmarking

A warm-up period is where you run a number of iterations (for example, a few hundred) of your benchmark without timing, before commencing timing for further iterations.

Why is a warm-up required? The first few iterations of any ND4J/DL4J execution may be slower than those that come later, for a number of reasons:

  1. In the initial benchmark iterations, the JVM has not yet had time to perform just-in-time compilation of code. Once JIT has completed, code is likely to execute faster for all subsequent operations

  2. ND4J and DL4J (and some other libraries) have some degree of lazy initialization: the first operation may trigger some one-off execution code.

  3. DL4J or ND4J (when using workspaces) can take some iterations to learn memory requirements for execution. During this learning phase, performance will be lower than after its completion.

Guideline 2: Run Multiple Iterations of All Benchmarks

Your benchmark isn't the only thing running on your computer (not to mention if you are using cloud hardware, that might have shared resources). And operation runtime is not perfectly deterministic.

For benchmark results to be reliable, it is important to run multiple iterations - and ideally report both mean and standard deviation for the runtime. Without this, it's impossible to compare the performance of operations, as performance differences may simply be due to random variation.
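
To illustrate Guidelines 1 and 2 together, here is a minimal, hypothetical micro-benchmark sketch (the array sizes, warm-up count and iteration count are arbitrary illustrative choices, not recommendations):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class MatmulBenchmark {
    public static void main(String[] args) {
        INDArray a = Nd4j.rand(1024, 1024);
        INDArray b = Nd4j.rand(1024, 1024);

        // Guideline 1: warm-up iterations (not timed) so that JIT compilation
        // and lazy initialization complete before measurement starts
        for (int i = 0; i < 200; i++) {
            a.mmul(b);
        }

        // Guideline 2: multiple timed iterations, reporting mean and standard deviation
        int n = 50;
        double[] timesMs = new double[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            a.mmul(b);
            timesMs[i] = (System.nanoTime() - start) / 1e6;
        }

        double mean = 0;
        for (double t : timesMs) mean += t;
        mean /= n;
        double var = 0;
        for (double t : timesMs) var += (t - mean) * (t - mean);
        double std = Math.sqrt(var / n);

        System.out.println("1024x1024 mmul: mean=" + mean + " ms, std=" + std + " ms over " + n + " runs");
    }
}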

Guideline 3: Pay Careful Attention to What You Are Benchmarking

This is especially important when comparing frameworks. Before you declare that "performance on operation X is Y" or "A is faster than B", make sure that:

You are bench-marking only the operations of interest.

If your goal is to check the performance of an operation, make sure that only this operation is being timed.

You should carefully check whether you are unintentionally including other things - for example, does it include: JVM initialization time? Library initialization time? Result array allocation time? Garbage collection time? Data loading time?

Ideally, these should be excluded from any timing/performance results you report. If they cannot be excluded, make sure you note this whenever making performance claims.

  1. What native libraries are you using?

    For example: what BLAS implementation (MKL, OpenBLAS, etc)? If you are using CUDA, are you using CuDNN? ND4J and DL4J can use these libraries (MKL, CuDNN) when they are available - but they are not always available by default. If they are not made available, performance can be lower - sometimes considerably.

    This is especially important when comparing results between libraries: for example, if you compared two libraries (one using OpenBLAS, another using MKL) your results may simply reflect the performance differences in the BLAS libraries being used - and not the performance of the libraries being tested. Similarly, one library with CuDNN and another without CuDNN may simply reflect the performance benefit of using CuDNN.

  2. How are things configured?

    For better or worse, DL4J and ND4J allow a lot of configuration. The default values for a lot of this configuration is adequate for most users - but sometimes manual configuration is required for optimal performance. This can be especially true in some benchmarks! Some of these configuration options allow users to trade off higher memory use for better performance, for example. Some configuration options of note: (a) Memory configuration (b) Workspaces and garbage collection (c) CuDNN (d) DL4J Cache Mode (enable using .cacheMode(CacheMode.DEVICE))

If you aren't sure if you are only measuring what you intend to measure when running DL4J or ND4J code, you can use a profiler such as VisualVM or YourKit Profilers.

  1. What versions are you using? When benchmarking, you should use the latest version of whatever libraries you are benchmarking. There's no point identifying and reporting a bottleneck that was fixed 6 months ago. An exception to this would be when you are comparing performance over time between versions. Note also that snapshot versions of DL4J and ND4J are also available - these may contain performance improvements (feel free to ask)

Guideline 4: Focus on Real-World Use Cases - And Run a Range of Sizes

Consider, for example, a benchmark that adds two numbers:

double x = 0;
//<start timing>
x += 1.0;
//<end timing>

And something equivalent in ND4J:

INDArray x = Nd4j.create(1);
//<start timing>
x.addi(1.0);
//<end timing>

Of course, the ND4J benchmark above is going to be much slower - method calls are required, input validation is performed, native code has to be called (with context switching overhead), and so on. One must ask the question, however: is this what users will actually be doing with ND4J or an equivalent linear algebra library? It's an extreme example - but the general point is a valid one.

Note also that performance on mathematical operations can be size - and shape - specific. For example, if you are benchmarking the performance on matrix multiplication - the matrix dimensions can matter a lot. In some internal benchmarks, we found that different BLAS implementations (MKL vs OpenBLAS) - and different backends (CPU vs GPU) - can perform very differently with different matrix dimensions. None of the BLAS implementations (OpenBLAS, MKL, CUDA) we have tested internally were uniformly faster than others for all input shapes and sizes.

Therefore - whenever you are running benchmarks, it's important to run those benchmarks with multiple different input shapes/sizes, to get the full performance picture.

Guideline 5: Understand Your Hardware

When comparing different hardware, it's important to be aware of what it excels at. For example, you might find that neural network training performs faster on a CPU with minibatch size 1 than on a GPU - yet larger minibatch sizes show exactly the opposite. Similarly, small layer sizes may not be able to adequately utilize the power of a GPU.

Furthermore, some deep learning distributions may need to be specifically compiled to provide support for hardware features such as AVX2 (note that recent versions of ND4J are packaged with binaries for CPUs that support these features). When running benchmarks, the utilization (or lack thereof) of these features can make a considerable difference to performance.

Guideline 6: Make It Reproducible

When running benchmarks, it's important to make your benchmarks reproducible. Why? Good or bad performance may only occur under certain limited circumstances.

And finally - remember that (a) ND4J and DL4J are in constant development, and (b) benchmarks do sometimes identify performance bottlenecks (after all, ND4J includes literally hundreds of distinct operations). If you identify a performance bottleneck, great - we want to know about it - so we can fix it. Any time a potential bottleneck is identified, we first need to reproduce it - so that we can study it, understand it and ultimately fix it.

Guideline 7: Understand the Limitations of Your Benchmarks

Linear algebra libraries contain hundreds of distinct operations. Neural network libraries contain dozens of layer types. When benchmarking, it's important to understand the limitations of those benchmarks. Benchmarking one type of operation or layer cannot tell you anything about the performance on other types of layers or operations - unless they share code that has been identified to be a performance bottleneck.

Guideline 8: If You Aren't Sure - Ask

And if you do happen to find a performance issue - let us know!

ND4J Specific Benchmarking

A Note on BLAS and Array Orders

BLAS - or Basic Linear Algebra Subprograms - refers to an interface and set of methods used for linear algebra operations. Some examples include 'gemm' - General Matrix Multiplication - and 'axpy', which implements Y = a*X + Y.

Note that ND4J will log the BLAS backend used when it initializes. For example:

14:17:34,169 INFO  ~ Loaded [CpuBackend] backend
14:17:34,672 INFO  ~ Number of threads used for NativeOps: 8
14:17:34,823 INFO  ~ Number of threads used for BLAS: 8
14:17:34,831 INFO  ~ Backend used: [CPU]; OS: [Windows 10]
14:17:34,831 INFO  ~ Cores: [16]; Memory: [7.1GB];
14:17:34,831 INFO  ~ Blas vendor: [OPENBLAS]

Performance can depend on the available BLAS library - in internal tests, we have found that OpenBLAS has been between 30% faster and 8x slower than MKL - depending on the array sizes and array orders.

Arrays in ND4J can be stored in either row-major ('c') or column-major ('f') order. For matrix multiplication, this means there are 8 possible combinations of array orders (c/f for each of input 1, input 2 and result arrays). Performance won't be the same for all cases.

Similarly, an operation such as element-wise addition (i.e., z=x+y) will be much faster for some combinations of input orders than others - notably, when x, y and z are all the same order. In short, this is due to memory striding: it's cheaper to read a sequence of memory addresses when those memory addresses are adjacent to each other in memory, as compared to being spread far apart.

Note that, by default, ND4J expects result arrays (for matrix multiplication) to be defined in column major ('f') order, to be consistent across backends, given that CuBLAS (i.e., NVIDIA's BLAS library for CUDA) requires results to be in f order. As a consequence, some ways of performing matrix multiplication with the result array being in c order will have lower performance than if the same operation was executed with an 'f' order array.

Finally, when it comes to CUDA: array orders/striding can matter even more than when running on CPU. For example, certain combinations of orders can be much faster than others - and input/output dimensions that are even multiples of 32 or 64 typically perform faster (sometimes considerably) than when input/output dimensions are not multiples of 32.
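
To make the order discussion concrete, here is a small hypothetical sketch showing 'c' and 'f' ordered arrays and an explicitly 'f' ordered result array for matrix multiplication (the shapes and variable names are illustrative only):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray x = Nd4j.rand(512, 512);                   // created in the default 'c' (row-major) order
INDArray y = x.dup('f');                            // same values, column-major ('f') layout

// Matrix multiplication writing into an explicitly 'f' ordered result array,
// matching ND4J's default expectation for mmul results
INDArray result = Nd4j.create(new int[]{512, 512}, 'f');
x.mmul(y, result);

// Element-wise addition is typically fastest when x, y and z all share the same order
INDArray z = x.add(x);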

DL4J Specific Benchmarking

Most of what has been said for ND4J also applies to DL4J.

In addition:

  1. If you are using the nd4j-native (CPU) backend, ensure you are using Intel MKL. This is faster than the default of OpenBLAS in most cases.

  2. Watch out for ETL bottlenecks. You can add PerformanceListener to your network training to see if ETL is a bottleneck (see the sketch after this list).

  3. Don't forget that performance is dependent on minibatch sizes. Don't benchmark with minibatch size 1 - use something more realistic.

  4. If you need multi-GPU training or inference support, use ParallelWrapper or ParallelInference.

  5. Don't forget that CuDNN is configurable: you can specify DL4J/CuDNN to prefer performance - at the expense of memory - using .cudnnAlgoMode(ConvolutionLayer.AlgoMode.PREFER_FASTEST) configuration on convolution layers

  6. When using GPUs, multiples of 8 (or 32) for input sizes and layer sizes may perform better.

  7. When using RNNs (and manually creating INDArrays), use 'f' ordered arrays for both features and (RnnOutputLayer) labels. Otherwise, use 'c' ordered arrays. This is for faster memory access.
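
As a brief sketch of point 2 above, assuming an already-built and initialized network called net (the reporting frequency of 10 iterations is an arbitrary choice):

import org.deeplearning4j.optimize.listeners.PerformanceListener;

// Reports ETL time, iteration time and samples/sec during fit();
// consistently high ETL times suggest the data pipeline is the bottleneck.
net.setListeners(new PerformanceListener(10, true));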

Common Benchmark Mistakes

Finally, here's a summary list of common benchmark mistakes:

  1. Not using the latest version of ND4J/DL4J (there's no point identifying a bottleneck that was fixed many releases back). Consider trying snapshots to get the latest performance improvements.

  2. Not paying attention to what native libraries (MKL, OpenBLAS, CuDNN etc) are being used

  3. Providing no warm-up period before benchmarking begins

  4. Running only a single (or too few) iterations, or not reporting mean, standard deviation and number of iterations

  5. Not configuring workspaces, garbage collection, etc

  6. Running only one possible case - for example, benchmarking a single set of array dimensions/orders when benchmarking BLAS operations

  7. Running unusually small inputs - for example, minibatch size 1 on a GPU (which might be slower - but isn't realistic!)

  8. Not measuring exactly - and only - what you claim to be measuring (for example, not accounting for array allocation, initialization or garbage collection time)

  9. Not making your benchmarks reproducible (does the benchmark conclusion generalize? are there problems with the benchmark? what can we do to fix it?)

  10. Comparing results across different hardware, not accounting for differences (for example, testing on one machine with AVX2 support, and on another without)

How to Run Deeplearning4j Benchmarks - A Guide

Total training time is always ETL plus computation. That is, both the data pipeline and the matrix manipulations determine how long a neural network takes to train on a dataset.

When programmers familiar with Python try to run benchmarks comparing Deeplearning4j to well-known Python frameworks, they usually end up comparing ETL + computation on DL4J to just computation on the Python framework. That is, they're comparing apples to oranges. We'll explain how to optimize several parameters below.

The JVM has knobs to tune, and if you know how to tune them, you can make it a very fast environment for deep learning. There are several things to keep in mind on the JVM. You need to:

  • Get garbage collection right

  • Make ETL asynchronous

  • Presave datasets (aka pickling)

Setting Heap Space

Users have to reconfigure their JVMs themselves, including setting the heap space. We can't give it to you preconfigured, but we can show you how to do it. Here are the two most important knobs for heap space.

  • Xms sets the minimum heap space

  • Xmx sets the maximum heap space

You can set these in IDEs like IntelliJ and Eclipse, as well as via the CLI like so:

    java -Xms256m -Xmx1024m YourClassNameHere

What’s the ideal amount to set Xmx to? That depends on how much RAM is on your computer. In general, allocate as much heap space as you think the JVM will need to get work done. Let’s say you’re on a 16G RAM laptop — allocate 8G of RAM to the JVM. A sound minimum on laptops with less RAM would be 3g, so

    java -Xmx3g

It may seem counterintuitive, but you want the min and max to be the same; i.e. Xms should equal Xmx. If they are unequal, the JVM will progressively allocate more memory as needed until it reaches the max, and that process of gradual allocation slows things down. You want to pre-allocate it at the beginning. So

    java -Xms3g -Xmx3g YourClassNameHere

Another way to do this is by setting your environmental variables. Here, you would alter your hidden .bash_profile file, which adds environmental variables to bash. To see those variables, enter env in the command line. To add more heap space, enter this command in your console:

    echo 'export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=512m"' >> ~/.bash_profile

We need to increase heap space because Deeplearning4j loads data in the background, which means we're taking more RAM in memory. By allowing more heap space for the JVM, we can cache more data in memory.

Garbage Collection

A garbage collector is a program which runs on the JVM and gets rid of objects no longer used by a Java application. It is automatic memory management. Creating a new object in Java takes on-heap memory: even an empty Java object takes at least 8 bytes (typically 16 on a 64-bit JVM, once headers and alignment are counted). So every new DataSetIterator you create adds its own on-heap footprint.

You may need to alter the garbage collection algorithm that Java is using. This can be done via the command line like so:

    java -XX:+UseG1GC

JavaCPP, created by a Skymind engineer, relies on the garbage collector to tell it what has been done. We rely on the Java GC to tell us what to collect; the Java GC points at things, and we know how to de-allocate them with JavaCPP. This applies equally to how we work with GPUs.

The larger the batch size you use, the more RAM you’re taking in memory.

ETL & Asynchronous ETL

In our dl4j-examples repo, we don't make the ETL asynchronous, because the point of examples is to keep them simple. But for real-world problems, you need asynchronous ETL, and we'll show you how to do it with examples.

Data is stored on disk and disk is slow. That’s the default. So you run into bottlenecks when loading data from your hard drive. When optimizing throughput, the slowest component is always the bottleneck. For example, a distributed Spark job using three GPU workers and one CPU worker will have a bottleneck with the CPU. The GPUs have to wait for that CPU to finish.

The Deeplearning4j class DataSetIterator hides the complexity of loading data from disk. The code for using any DataSetIterator looks the same regardless of implementation, but the implementations work differently:

  • one loads from disk

  • one loads asynchronously

  • one loads pre-saved from RAM

Here's how the DatasetIterator is uniformly invoked for MNIST:

        while(mnistTest.hasNext()){
                DataSet ds = mnistTest.next();
                INDArray output = model.output(ds.getFeatures(), false);
                eval.eval(ds.getLabels(), output);
        }

You can optimize by using an asynchronous loader in the background. Java can do real multi-threading. It can load data in the background while other threads take care of compute. So you load data into the GPU at the same time that compute is being run. The neural net trains even as you grab new data from memory.

    MultiDataSetIterator iterator;
    if (prefetchSize > 0 && source.asyncSupported()) {
        iterator = new AsyncMultiDataSetIterator(source, prefetchSize);
    } else iterator = source;

Notice in the code above that prefetchSize is another parameter to set. Normal batch size might be 1000 examples, but if you set prefetchSize to 3, it would pre-fetch 3,000 instances.

ETL: Comparing Python frameworks With Deeplearning4j

Java has robust tools for moving big data and, when compared correctly, is much faster than Python. The Deeplearning4j community has reported up to 3700% increases in speed over Python frameworks, when ETL and computation are optimized.

We try to be more flexible. That means you can point DL4J at raw photos, and it will load the image, run the transforms and put it into an NDArray to generate a dataset on the fly.

But if your training pipeline is doing that every time, Deeplearning4j will seem about 10x slower than other frameworks, because you’re spending your time creating datasets. Every time you call fit, you're recreating a dataset, over and over again. We allow it to happen for ease of use, but we can show you how to speed things up. There are ways to make it just as fast.

One way is to pre-save the datasets, in a manner similar to the Python frameworks. (Pickles are pre-formatted data.) When you pre-save the dataset, you create a separate class.

A RecordReaderDataSetIterator talks to DataVec and outputs DataSets for DL4J.

Line 90 is where you see the asynchronous ETL. In this case, it's wrapping the pre-saved iterator, so you're taking advantage of both methods, with the async iterator loading the pre-saved data in the background as the net trains.

MKL and Inference on CPUs

If you are running inference benchmarks on CPUs, make sure you are using Deeplearning4j with Intel's MKL library, which is available via a clickwrap license; unlike Anaconda, which is used by libraries like PyTorch, Deeplearning4j does not bundle MKL.

Memory

Setting available Memory/RAM for a DL4J application

Memory Management for ND4J/DL4J: How does it work?

ND4J uses off-heap memory to store NDArrays, to provide better performance while working with NDArrays from native code such as BLAS and CUDA libraries.

"Off-heap" means that the memory is allocated outside of the JVM (Java Virtual Machine) and hence isn't managed by the JVM's garbage collection (GC). On the Java/JVM side, we only hold pointers to the off-heap memory, which can be passed to the underlying C++ code via JNI for use in ND4J operations.

To manage memory allocations, we use two approaches:

  • JVM Garbage Collector (GC) and WeakReference tracking

  • MemoryWorkspaces

Despite the differences between these two approaches, the idea is the same: once an NDArray is no longer required on the Java side, the off-heap associated with it should be released so that it can be reused later. The difference between the GC and MemoryWorkspaces approaches is in when and how the memory is released.

  • For JVM/GC memory: whenever an INDArray is collected by the garbage collector, its off-heap memory will be deallocated, assuming it is not used elsewhere.

  • For MemoryWorkspaces: whenever an INDArray leaves the workspace scope - for example, when a layer has finished its forward pass/predictions - its memory may be reused without deallocation and reallocation. This results in better performance for cyclical workloads like neural network training and inference (see the sketch below).
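
A minimal sketch of the workspace scoping idea (the workspace id "MY_WORKSPACE" is arbitrary; when training or running inference with DL4J networks, workspaces are managed for you):

import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace("MY_WORKSPACE")) {
    // Arrays created here live in the workspace; when the scope closes,
    // their memory can be reused on the next pass rather than deallocated.
    INDArray activations = Nd4j.rand(128, 128);
}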

Configuring Memory Limits

With DL4J/ND4J, there are two types of memory limits to be aware of and configure: The on-heap JVM memory limit, and the off-heap memory limit, where NDArrays live. Both limits are controlled via Java command-line arguments:

  • -Xms - this defines how much memory JVM heap will use at application start.

  • -Xmx - this allows you to specify JVM heap memory limit (maximum, at any point). Only allocated up to this amount (at the discretion of the JVM) if required.

  • -Dorg.bytedeco.javacpp.maxbytes - this allows you to specify the off-heap memory limit. This can also be a percentage, in which case it would apply to maxMemory.

  • -Dorg.bytedeco.javacpp.maxphysicalbytes - this specifies the maximum bytes for the entire process - usually set to maxbytes plus Xmx plus a bit extra, in case other libraries require some off-heap memory also. Unlike maxbytes, setting maxphysicalbytes is optional. This can also be a percentage (>100%), in which case it would apply to maxMemory.

Example: Configuring 1GB initial on-heap, 2GB max on-heap, 8GB off-heap, 10GB maximum for process (the class name is a placeholder):

java -Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G YourClassNameHere

Gotchas: A few things to watch out for

  • With GPU systems, the maxbytes and maxphysicalbytes settings currently also effectively define the memory limit for the GPU, since the off-heap memory is mapped (via NDArrays) to the GPU - read more about this in the GPU-section below.

  • For many applications, you want less RAM to be used in JVM heap, and more RAM to be used in off-heap, since all NDArrays are stored there. If you allocate too much to the JVM heap, there will not be enough memory left for the off-heap memory.

  • If you get a "RuntimeException: Can't allocate [HOST] memory: xxx; threadId: yyy", you have run out of off-heap memory. You should most often use a WorkspaceConfiguration to handle your NDArrays allocation, in particular in e.g. training or evaluation/inference loops - if you do not, the NDArrays and their off-heap (and GPU) resources are reclaimed using the JVM GC, which might introduce severe latency and possible out of memory situations.

  • If you don't specify JVM heap limit, it will use 1/4 of your total system RAM as the limit, by default.

  • If you don't specify off-heap memory limit, the JVM heap limit (Xmx) will be used by default. i.e. -Xmx8G will mean that 8GB can be used by JVM heap, and an additional 8GB can be used by ND4j in off-heap.

  • In limited memory environments, it's usually a bad idea to use high -Xmx value together with -Xms option. That is because doing so won't leave enough off-heap memory. Consider a 16GB system in which you set -Xms14G: 14GB of 16GB would be allocated to the JVM, leaving only 2GB for the off-heap memory, the OS and all other programs.

Memory-mapped files

ND4J supports the use of a memory-mapped file instead of RAM when using the nd4j-native backend. On one hand, it's slower than RAM, but on the other hand, it allows you to allocate memory chunks in a manner impossible otherwise.

Here's sample code:
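
A minimal sketch, assuming the WorkspaceConfiguration and LocationPolicy APIs from nd4j (exact builder options may differ between versions):

import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.LocationPolicy;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
        .initialSize(1000000000)               // ~1GB backing file
        .policyLocation(LocationPolicy.MMAP)   // back this workspace with a memory-mapped file instead of RAM
        .build();

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
    INDArray x = Nd4j.create(10000);           // allocated inside the memory-mapped workspace
}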

In this case, a 1GB temporary file will be created and mmap'ed, and NDArray x will be created in that space. Obviously, this option is mostly viable for cases when you need NDArrays that can't fit into your RAM.

GPUs

When using GPUs, oftentimes your CPU RAM will be greater than GPU RAM. When GPU RAM is less than CPU RAM, you need to monitor how much RAM is being used off-heap. You can check this based on the JavaCPP options specified above.

We allocate memory on the GPU equivalent to the amount of off-heap memory you specify. We don't use any more of your GPU than that. You can also specify an off-heap limit that is larger than your GPU's memory (that's not encouraged, but it's possible). If you do so, your GPU will run out of RAM when trying to run jobs.

We also allocate off-heap memory on the CPU RAM as well. This is for efficient communication between CPU and GPU, and so the CPU can access data from an NDArray without having to fetch it from the GPU each time you call for it.

If JavaCPP or your GPU throw an out-of-memory error (OOM), or even if your compute slows down due to GPU memory being limited, then you may want to either decrease batch size or increase the amount of off-heap memory that JavaCPP is allowed to allocate, if that's possible.

Try to run with an off-heap memory equal to your GPU's RAM. Also, always remember to set up a small JVM heap space using the Xmx option.

Note that if your GPU has < 2g of RAM, it's probably not usable for deep learning. You should consider using your CPU if this is the case. Typical deep-learning workloads should have 4GB of RAM at minimum. Even that is small. 8GB of RAM on a GPU is recommended for deep learning workloads.

It is possible to use HOST-only memory with a CUDA backend. That can be done using workspaces.

Example:
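
A minimal sketch, assuming the workspace configuration enums from nd4j (option names may differ slightly between versions):

import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
import org.nd4j.linalg.api.memory.enums.LearningPolicy;
import org.nd4j.linalg.api.memory.enums.MirroringPolicy;
import org.nd4j.linalg.api.memory.enums.SpillPolicy;

WorkspaceConfiguration hostOnlyConfig = WorkspaceConfiguration.builder()
        .policyAllocation(AllocationPolicy.STRICT)
        .policyLearning(LearningPolicy.FIRST_LOOP)
        .policyMirroring(MirroringPolicy.HOST_ONLY)   // keep arrays in this workspace in HOST (CPU) memory only
        .policySpill(SpillPolicy.EXTERNAL)
        .build();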

It's not recommended to use HOST-only arrays directly, since they will dramatically reduce performance. But they might be useful as an in-memory cache, in combination with the INDArray.unsafeDuplication() method.

The core workflow

An overview of the core deeplearning4j workflow

Introduction

An end to end workflow involves the following:

  1. Preparing your data

  2. Normalization

  3. Building a model

  4. Tuning a model

  5. Preparing for deployment

This page will try to cover considerations for each step of the workflow and link to additional resources for handling steps that may be specific to particular use cases.

Preparing your data

Data always needs to be preprocessed. This means converting data from a raw source of different data types to ndarrays to be processed by a neural network. In the deeplearning4j suite there can be a few ways to do this:

  1. The datavec module: Using a record reader abstraction, data can be read in batches via a data set iterator to train models

  2. Pre-process using embedded Python code in Python4J: using the Python ecosystem (such as pandas and OpenCV), you can embed Python scripts and output numpy arrays for training

We recommend the following for the various data types:

Once you have figured out how you will convert your data, you will need to figure out how to split it up into training and validation sets. DL4J allows you to do this in a few ways.

If all of your data is in memory, you can use the DataSet API's splitTestAndTrain method, as shown in the sketch below.
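
A minimal sketch, assuming a DataSet named allData is already loaded in memory (the 0.8 split fraction is just an example):

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;

allData.shuffle();                                          // shuffle examples before splitting
SplitTestAndTrain split = allData.splitTestAndTrain(0.8);   // 80% training, 20% held out
DataSet trainingData = split.getTrain();
DataSet testData = split.getTest();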

Normalization

Once your input data has been created and converted to ndarrays, you still need to decide how to normalize your data. DL4J has a set of normalizers that cover standard preprocessing, such as NormalizerStandardize (zero mean, unit variance) and NormalizerMinMaxScaler (rescaling to a fixed range); a minimal usage sketch follows.
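
A minimal sketch, assuming the trainingData and testData DataSets from the split above:

import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainingData);         // collect mean/std statistics from the training data only
normalizer.transform(trainingData);   // normalize in place
normalizer.transform(testData);       // apply the SAME training statistics to the test data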

Building a model

Once you have figured out how you will serialize your data as ndarrays you need to figure out how you will want to build your model.

When building a model, you can choose one of the following:

  1. Import a model from another framework such as TensorFlow, Keras or PyTorch.

If you are going to import a model, there are a few things to be aware of.

  1. TensorFlow import: This uses SameDiff. SameDiff has two forms of TensorFlow import; the newer one, which uses a more extensible model import framework, is the recommended path forward.

  2. PyTorch: Right now, PyTorch models must be imported via ONNX. Please use PyTorch's ONNX model export to import a PyTorch model into deeplearning4j.

For more advanced models, it is suggested that the user pick the samediff framework. Going forward, that will be the preferred way to train and run models.

When saving a model, note that the higher level DL4J interface and SameDiff use different file formats. Also note that the normalizers described above are not stored with the model automatically - it is advised to save both, as in the sketch below.
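
A minimal sketch for the DL4J (MultiLayerNetwork/ComputationGraph) case, assuming a trained network net and the normalizer from above (the file name is a placeholder):

import java.io.File;
import org.deeplearning4j.util.ModelSerializer;

File modelFile = new File("my-model.zip");
ModelSerializer.writeModel(net, modelFile, true);             // true = also save the updater state, needed to resume training
ModelSerializer.addNormalizerToModel(modelFile, normalizer);  // stores the normalizer alongside the model in the same zip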

Tuning a model

Deploying a model

When deploying a machine learning model, the first consideration is to figure out what you are deploying. Generally a model deployment contains:

  1. A normalizer file which is loaded and used during inference

  2. A model file (either a dl4j zip file or a samediff flatbuffers file)

  3. Data pipeline code that converts raw data from production to an appropriate format (usually ndarrays) for consumption by the neural network.

These 3 aspects of a deployment should all be treated as software assets just like code and be versioned. Optionally, a user may want to consider how to implement versioned deployments. There are a number of tools that can handle this.

Another consideration is performance. Depending on the nd4j backend you pick and the cpus you are deploying on, you may be able to add specialized performance increases such as:

  1. Compatibility: if you need to run on a very old Linux distribution, we also provide a CentOS 6 compatible compat classifier.

If you are going to just be deploying a model embedded in your application, then please remember the above artifacts for a model deployment when including resources for your micro service.

Performance Issues

How to Debug Performance Issues

This page is a how-to guide for debugging performance issues encountered when training neural networks with Deeplearning4j. Much of the information also applies to debugging performance issues encountered when using ND4J.

Deeplearning4j and ND4J provide excellent performance in most cases (utilizing optimized c++ code for all numerical operations as well as high performance libraries such as NVIDIA cuDNN and Intel MKL). However, sometimes bottlenecks or misconfiguration issues may limit performance to well below the maximum. This page is intended to be a guide to help users identify the cause of poor performance, and provide steps to fix these issues.

Performance issues may include:

  1. Poor CPU/GPU utilization

  2. Slower than expected training or operation execution

To start, here’s a summary of some possible causes of performance issues:

  1. Wrong ND4J backend is used (for example, CPU backend when GPU backend is expected)

  2. Not using cuDNN when using CUDA GPUs

  3. ETL (data loading) bottlenecks

  4. Garbage collection overheads

  5. Small batch sizes

  6. Multi-threaded use of MultiLayerNetwork/ComputationGraph for inference (not thread safe)

  7. Double precision floating point data type used when single precision should be used

  8. Not using workspaces for memory management (enabled by default)

  9. Poorly configured network

  10. Layer or operation is CPU-only

  11. CPU: Lack of hardware support for modern AVX etc extensions

  12. Other processes using CPU or GPU resources

  13. CPU: Lack of configuration of OMP_NUM_THREADS when using many models/threads simultaneously

Step 1: Check if correct backend is used

ND4J (and by extension, Deeplearning4j) can perform computation on either the CPU or GPU. The device used for computation is determined by your project dependencies - you include nd4j-native-platform to use CPUs for computation or nd4j-cuda-x.x-platform to use GPUs for computation (where x.x is your CUDA version - such as 9.2, 10.0 etc).

It is straightforward to check which backend is used. ND4J will log the backend upon initialization.

For CPU execution, you will expect output that looks something like:

For CUDA execution, you would expect the output to look something like:

Pay attention to the Loaded [X] backend and Backend used: [X] messages to confirm that the correct backend is used. If the incorrect backend is being used, check your program dependencies to ensure the correct backend has been included.

Step 2: Check for cuDNN

If you are using CPUs only (nd4j-native backend) then you can skip to step 3 as cuDNN only applies when using NVIDIA GPUs (nd4j-cuda-x.x-platform dependency).

cuDNN is NVIDIA’s library for accelerating neural network training on NVIDIA GPUs. Deeplearning4j can make use of cuDNN to accelerate a number of layers - including ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM. When training on GPUs, cuDNN should always be used if possible as it is usually much faster than the built-in layer implementations.

How to determine if cuDNN is used or not

Not all DL4J layer types are supported in cuDNN. DL4J layers with cuDNN support include ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM.

To check if cuDNN is being used, the simplest approach is to look at the log output when running inference or training: If cuDNN is NOT available when you are using a layer that supports it, you will see a message such as:

If cuDNN is available and was loaded successfully, no message will be logged.

Alternatively, you can confirm that cuDNN is used by using the following code:
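A minimal sketch, assuming net is your MultiLayerNetwork and layer 0 is a layer type with cuDNN support:

```java
import org.deeplearning4j.nn.api.layers.LayerHelper;

// Run at least one forward pass or fit call first, so the layer helper is initialized
LayerHelper h = net.getLayer(0).getHelper();   // index 0: assumes layer 0 supports cuDNN (e.g. a ConvolutionLayer)
System.out.println("Layer helper: " + (h == null ? null : h.getClass().getName()));
```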

Note that you will need to do at least one forward pass or fit call to initialize the cuDNN layer helper.

If cuDNN is available and was loaded successfully, you will see the following printed:

whereas if cuDNN is not available or could not be loaded successfully (you will get a warning or error logged also):

Step 3: Check for ETL (Data Loading) Bottlenecks

Neural network training requires data to be in memory before training can proceed. If the data is not loaded fast enough, the network will have to wait until data is available. DL4J uses asynchronous prefetch of data to improve performance by default. Under normal circumstances, this asynchronous prefetching means the network should never be waiting around for data (except on the very first iteration) - the next minibatch is loaded in another thread while training is proceeding in the main thread.

However, when data loading takes longer than the iteration time, data can be a bottleneck. For example, if a network takes 100ms to perform fitting on a single minibatch, but data loading takes 200ms, then we have a bottleneck: the network will have to wait 100ms per iteration (200ms loading - 100ms loading in parallel with training) before continuing the next iteration. Conversely, if the network fit operation takes 100ms and data loading takes 50ms, then no data loading bottleneck will occur, as the 50ms loading time can be completed asynchronously within one iteration.

How to check for ETL / data loading bottlenecks

The way to identify ETL bottlenecks is simple: add PerformanceListener to your network, and train as normal. For example:
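A minimal sketch, assuming net is your MultiLayerNetwork or ComputationGraph (a frequency of 1 means statistics are reported every iteration):

```java
import org.deeplearning4j.optimize.listeners.PerformanceListener;

// Report iteration time, ETL time, samples/sec and batches/sec every iteration
net.setListeners(new PerformanceListener(1, true));
```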

When training, you will see output such as:

The above output shows that there is no ETL bottleneck (i.e., ETL: 0 ms). However, if ETL time is greater than 0 consistently (after the first iteration), an ETL bottleneck is present.

How to identify the cause of an ETL bottleneck

There are a number of possible causes of ETL bottlenecks. These include (but are not limited to):

  • Slow hard drives

  • Network latency or throughput issues (when reading from remote or network storage)

  • Computationally intensive or inefficient ETL (especially for custom ETL pipelines)

Step 4: Check for Garbage Collection Overhead

Even though DL4J/ND4J array memory is off-heap, garbage collection can still cause performance issues.

In summary:

  • Garbage collection will sometimes (temporarily and briefly) pause/stop application execution (“stop the world”)

  • These GC pauses slow down program execution

  • The overall performance impact of GC pauses depends on both the frequency of GC pauses, and the duration of GC pauses

  • The frequency is controllable (in part) by ND4J, using Nd4j.getMemoryManager().setAutoGcWindow(10000); and Nd4j.getMemoryManager().togglePeriodicGc(false);

  • Not every GC event is caused by or controlled by the above ND4J configuration.

In our experience, garbage collection time depends strongly on the number of objects in the JVM heap memory. As a rough guide:

  • Less than 100,000 objects in heap memory: short GC events (usually not a performance problem)

  • 100,000-500,000 objects: GC overhead becomes noticeable, often in the 50-250ms range per full GC event

  • 500,000 or more objects: GC can be a bottleneck if performed frequently. Performance may still be good if GC events are infrequent (for example, every 10 seconds or less).

  • 10 million or more objects: GC is a major bottleneck even if called infrequently, with each full GC taking multiple seconds

How to configure ND4J garbage collection settings

In simple terms, there are two settings of note: whether ND4J's periodic garbage collection is enabled (togglePeriodicGc), and how frequently it may run (setAutoGcWindow).
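A sketch of both settings, using the calls mentioned above:

```java
import org.nd4j.linalg.factory.Nd4j;

// 1. Enable or disable ND4J's periodic System.gc() calls entirely
Nd4j.getMemoryManager().togglePeriodicGc(false);

// 2. If periodic GC is enabled, control how often it may be triggered (here: at most every 10 seconds)
Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(10000);    // milliseconds
```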

How to determine GC impact using PerformanceListener

NOTE: this feature was added after 1.0.0-beta3 and will be available in future releases. To determine the impact of garbage collection using PerformanceListener, you can use the following:
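A sketch, assuming net is your network and that your version includes the GC-reporting constructor:

```java
import org.deeplearning4j.optimize.listeners.PerformanceListener;

int listenerFrequency = 1;      // report every iteration
boolean reportScore = true;
boolean reportGC = true;        // include garbage collection statistics in the output
net.setListeners(new PerformanceListener(listenerFrequency, reportScore, reportGC));
```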

This will report GC activity:

The garbage collection activity is reported for all available garbage collectors - the GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)] output means that garbage collection was performed 2 times since the last PerformanceListener reporting, and took 1ms and 24ms total for the two GC algorithms, respectively.

Keep in mind: PerformanceListener reports GC events every N iterations (as configured by the user). Thus, if PerformanceListener is configured to report statistics every 10 iterations, the garbage collection stats would be for the period of time corresponding to the last 10 iterations.

How to determine GC impact using -verbose:gc

To enable GC logging, add the -verbose:gc, -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps command-line options when launching the JVM. When these options are enabled, you will have information reported on each GC event, such as:

This information can be used to determine the frequency, cause (System.gc() calls, allocation failure, etc) and duration of GC events.

How to determine GC impact using a profiler

An alternative approach is to use a profiler to collect garbage collection information.

How to determine number (and type) of JVM heap objects using memory dumps

If you determine that garbage collection is a problem, and suspect that this is due to the number of objects in memory, you can perform a heap dump.

To perform a heap dump:

  • Step 1: Run your program

  • Step 2: While running, determine the process ID

    • One approach is to use jps:

      • For basic details, run jps on the command line. If jps is not on the system PATH, it can be found (on Windows) at C:\Program Files\Java\jdk<VERSION>\bin\jps.exe

      • For more details on each process, run jps -lv instead

    • Alternatively, you can use the top command on Linux or Task Manager (Windows) to find the PID (on Windows, the PID column may not be enabled by default)

  • Step 3: Create a heap dump using jmap -dump:format=b,file=file_name.hprof 123 where 123 is the process id (PID) to create the heap dump for

After a memory dump has been collected, it can be opened in tools such as YourKit profiler and VisualVM to determine the number, type and size of objects. With this information, you should be able to pinpoint the cause of the large number of objects and make changes to your code to reduce or eliminate the objects that are causing the garbage collection overhead.

Step 5: Check Minibatch Size

Another common cause of performance issues is a poorly chosen minibatch size. A minibatch is a number of examples used together for one step of inference and training. Minibatch sizes of 32 to 128 are commonly used, though smaller or larger are sometimes used.

In summary:

  • If minibatch size is too small (for example, training or inference with 1 example at a time), poor hardware utilization and lower overall throughput is expected

  • If minibatch size is too large

    • Hardware utilization will usually be good

    • Iteration times will slow down

    • Memory utilization may be too high (leading to out-of-memory errors)

For inference, avoid using minibatch size of 1, as throughput will suffer. Unless there are strict latency requirements, you should use larger minibatch sizes as this will give you the best hardware utilization and hence throughput, and is especially important for GPUs.

For training, you should never use a minibatch size of 1 as overall performance and hardware utilization will be reduced. Network convergence may also suffer. Start with a minibatch size of 32-128, if memory will allow this to be used.

Step 6: Ensure you are not using a single MultiLayerNetwork/ComputationGraph for inference from multiple threads

MultiLayerNetwork and ComputationGraph are not considered thread-safe, and should not be used from multiple threads. That said, most operations such as fit, output, etc. use synchronized blocks. While these synchronized methods avoid hard-to-understand exceptions (race conditions due to concurrent use), they will limit throughput to a single thread (though note that native operation parallelism will still apply as normal). In summary, using a single network from multiple threads should be avoided, as it is not thread safe and can be a performance bottleneck.

Step 7: Check Data Types

In 1.0.0-beta3 and earlier, ND4J has a global datatype setting that determines the datatype of all arrays. The default value is 32-bit floating point. The data type can be set using Nd4j.setDataType(DataBuffer.Type.FLOAT); for example.

Performance on CPUs can also be reduced for double precision due to the additional memory bandwidth requirements compared to float precision.

You can check the data type setting using:
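For example:

```java
import org.nd4j.linalg.factory.Nd4j;

System.out.println("ND4J data type: " + Nd4j.dataType());   // FLOAT is expected in most cases
```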

Step 8: Check workspace configuration for memory management (enabled by default)

In summary, workspaces are enabled by default for all Deeplearning4j networks, and enabling them improves performance and reduces memory requirements. There are very few reasons to disable workspaces.

You can check that workspaces are enabled for your MultiLayerNetwork using:
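For example (assuming net is your MultiLayerNetwork):

```java
System.out.println("Training workspace config:  " + net.getLayerWiseConfigurations().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + net.getLayerWiseConfigurations().getInferenceWorkspaceMode());
```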

or for a ComputationGraph using:
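For example (assuming cg is your ComputationGraph):

```java
System.out.println("Training workspace config:  " + cg.getConfiguration().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + cg.getConfiguration().getInferenceWorkspaceMode());
```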

You want to see ENABLED for both training and inference. To change the workspace configuration, use the setter methods, for example: net.getLayerWiseConfigurations().setTrainingWorkspaceMode(WorkspaceMode.ENABLED);

Step 9: Check for a badly configured network or network with layer bottlenecks

Another possible cause (especially for newer users) is a poorly designed network. A network may be poorly designed if:

  • It has too many layers. A rough guideline:

    • More than about 100 layers for a CNN may be too many

    • More than about 10 layers for a RNN/LSTM network may be too many

    • More than about 20 feed-forward layers may be too many for a MLP

  • The input/activations are too large

    • For CNNs, inputs in the range of 224x224 (for image classification) to 600x600 (for object detection and segmentation) are used. Large image sizes (such as 500x500) are computationally demanding, and much larger than this should be considered too large in most cases.

  • The output number of classes is too large

    • Classification with more than about 10,000 classes can become a performance bottleneck with standard softmax output layers

  • The layers are too large

    • For CNNs, most layers have kernel sizes in the range 2x2 to 7x7, with channels equal to 32 to 1024 (with larger number of channels appearing later in the network). Much larger than this may cause a performance bottleneck.

    • For MLPs, most layers have at most 2048 units/neurons (often much smaller). Much larger than this may be too large.

    • For RNNs such as LSTMs, layers are typically in the range of 128 to 512, though the largest RNNs may use around 1024 units per layer.

  • The network has too many parameters

    • This is usually a consequence of the other issues already mentioned - too many layers, too large input, too many output classes

    • For comparison, less than 1 million parameters would be considered small, and more than about 100 million parameters would be considered very large.

    • You can check the number of parameters using MultiLayerNetwork/ComputationGraph.numParams() or MultiLayerNetwork/ComputationGraph.summary()
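For example:

```java
System.out.println("Number of parameters: " + net.numParams());
System.out.println(net.summary());
```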

Note that these are guidelines only, and some reasonable networks may exceed the numbers specified here. Some networks can become very large, such as those commonly used for ImageNet classification or object detection. However, in these cases, the network is usually carefully designed to provide a good tradeoff between accuracy and computation time.

If your network architecture is significantly outside of the guidelines specified here, you may want to reconsider the design to improve performance.

Step 10: Check for CPU-only ops (when using GPUs)

If you are using CPUs only (nd4j-native backend), you can skip this step, as it only applies when using the GPU (nd4j-cuda) backend.

As of 1.0.0-beta3, a handful of recently added operations do not yet have GPU implementations. Thus, when these layers are used in a network, they will execute on CPU only, irrespective of the nd4j-backend used. GPU support for these layers will be added in an upcoming release.

The layers without GPU support as of 1.0.0-beta3 include:

  • Convolution3D

  • Upsampling1D/2D/3D

  • Deconvolution2D

  • LocallyConnected1D/2D

  • SpaceToBatch

  • SpaceToDepth

Unfortunately, there is no workaround or fix for now, until these operations have GPU implementations completed.

Step 11: Check CPU support for hardware extensions (AVX etc)

If you are running on a GPU, this section does not apply.

When running on older CPUs or those that lack modern AVX extensions such as AVX2 and AVX512, performance will be reduced compared to running on CPUs with these features. Though there is not much you can do about the lack of such features, it is worth knowing about if you are comparing performance between different CPU models.

In summary, CPU models with AVX2 support will perform better than those without it; similarly, AVX512 is an improvement over AVX2.

Step 12: Check other processes using CPU or GPU resources

Another obvious cause of performance issues is other processes using CPU or GPU resources.

For CPU, it is straightforward to see if other processes are using resources using tools such as top (for Linux) or Task Manager (for Windows).

For NVIDIA CUDA GPUs, nvidia-smi can be used. nvidia-smi is usually installed with the NVIDIA display drivers, and (when run) shows the overall GPU and memory utilization, as well as the GPU utilization of programs running on the system.

On Linux, this is usually on the system path by default. On Windows, it may be found at C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi

Step 13: Check OMP_NUM_THREADS performing concurrent inference using CPU in multiple threads simultaneously

If you are using GPUs (nd4j-cuda backend), you can skip this section.

One issue to be aware of when running multiple DL4J networks (or ND4J operations generally) concurrently in multiple threads is the OpenMP number of threads setting. In summary, ND4J uses OpenMP parallelism at the C++ level to increase operation performance. By default, ND4J will use a value equal to the number of physical CPU cores (not logical cores), as this gives optimal performance for a single network running in a single thread. When running many models or threads concurrently, consider setting the OMP_NUM_THREADS environment variable to a smaller value (for example, the number of physical cores divided by the number of concurrent threads), so that the threads do not oversubscribe the CPU.

This also applies if the CPU resources are shared with other computationally demanding processes.

Debugging Performance Issues with JVM Profiling

Profiling is a process whereby you can trace how long each method in your code takes to execute, to identify and debug performance bottlenecks.

A full guide to profiling is beyond the scope of this page, but the summary is that you can trace how long each method takes to execute (and where it is being called from) using a profiling tool. This information can then be used to identify bottlenecks (and their causes) in your program.

How to Perform Profiling

The YourKit profiling documentation is quite good. To perform profiling with YourKit:

  • Install and start YourKit Profiler

  • Collect a snapshot and analyze

Profiling on Spark

When debugging performance issues for Spark training or inference jobs, it can often be useful to perform profiling here also.

One approach that we have used internally is to combine manual profiling settings (-agentpath JVM argument) with spark-submit arguments for YourKit profiler.

To perform profiling in this manner, 5 steps are required:

  1. Download YourKit profiler to a location on each worker (must be the same location on each worker) and (optionally) the driver

  2. [Optional] Copy the profiling configuration onto each worker (must be the same location on each worker)

  3. Create a local output directory for storing the profiling result files on each worker

  4. Launch the Spark job with the appropriate configuration (see example below)

  5. The snapshots will be saved when the Spark job completes (or is cancelled) to the specified directories.

For example, to perform tracing on both the driver and the workers,

The configuration (tracing_settings_path) is optional. A sample tracing settings file is provided below:

Backends

Hardware setup for Eclipse Deeplearning4j, including GPUs and CUDA.

ND4J works atop so-called backends, or linear-algebra libraries, such as nd4j-native (CPUs) and nd4j-cuda-10.2 (GPUs), which you can select by pasting the right dependency into your project’s POM.xml file.

ND4J backends for GPUs and CPUs

You can choose GPUs or native CPUs for your backend linear algebra operations by changing the dependencies in ND4J's POM.xml file. Your selection will affect both ND4J and DL4J being used in your application.

If you have CUDA v9.2+ installed and NVIDIA-compatible hardware, then your dependency declaration will look like:

As of now, the artifactId for the CUDA versions can be one of nd4j-cuda-11.0 or nd4j-cuda-11.2. Generally, the last two CUDA versions are supported for a given release.

Otherwise you will need to use the native implementation of ND4J as a CPU backend:

Building for Multiple Operating Systems

If you are developing your project on multiple operating systems/system architectures, you can add -platform to the end of your artifactId which will download binaries for most major systems.

Bundling multiple Backends

For enabling different backends at runtime, you set the backend priority via the environment variables BACKEND_PRIORITY_CPU and BACKEND_PRIORITY_GPU.

The backend with the higher priority will be selected dynamically at runtime.

CuDNN

CUDA Installation

Troubleshooting

Nd4jBackend$NoAvailableBackendException

There are multiple reasons why you might run into this error message.

  1. You haven't configured an ND4J backend at all.

  2. You have a jar file that doesn't contain a backend for your platform.

  3. You have a jar file that doesn't contain service loader files.

You haven't configured any ND4J Backend

Read this page and add a ND4J Backend to your dependencies:

You have a jar file that doesn't contain a backend for your platform.

This happens when you use a backend dependency without the -platform suffix. In this case, only the backend for the system that the jar file was built on will be included.

To solve this issue, use nd4j-native-platform instead of nd4j-native, if you are running on CPU and nd4j-cuda-11.2-platform instead of nd4j-cuda-11.2 when using the GPU backend.

If the jar file only contains the GPU backend, but your system has no CUDA capable (CC >= 3.5) GPU or CUDA isn't installed on the system, the CPU Backend should be used instead.

You have a jar file that doesn't contain service loader files.

To double check that the required files are included, open your uberjar and make sure it contains /META-INF/services/org.nd4j.linalg.factory.Nd4jBackend. Then open the file, and make sure there are entries for all of your configured backends.

Language Processing

Overview of language processing in DL4J

Although not designed to be comparable to tools such as Stanford CoreNLP or NLTK, Deeplearning4j does include some core text processing tools, which are described here.

Deeplearning4j's NLP support contains interfaces for different NLP libraries. A user wraps third-party libraries via our interfaces. Deeplearning4j, as of M1, does not support any third-party libraries directly, due to the lack of maintenance and the custom work needed to make them work well for users. Instead, we expose interfaces that allow users to implement their own tokenizers.

SentenceIterator

There are several steps involved in processing natural language. The first is to iterate over your corpus to create a list of documents, which can be as short as a tweet, or as long as a newspaper article. This is performed by a SentenceIterator, which will appear like this:
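A minimal sketch, assuming a plain text corpus with one document per line (the file path is illustrative):

```java
import org.deeplearning4j.text.sentenceiterator.LineSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import java.io.File;

// One sentence (or tweet, or document) per line in the file
SentenceIterator iter = new LineSentenceIterator(new File("/path/to/your/corpus.txt"));
while (iter.hasNext()) {
    String sentence = iter.nextSentence();
    // feed 'sentence' into your tokenizer / vectorization pipeline
}
```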

The SentenceIterator encapsulates a corpus or text, organizing it, say, as one Tweet per line. It is responsible for feeding text piece by piece into your natural language processor. The SentenceIterator is not analogous to a similarly named class, the DatasetIterator, which creates a dataset for training a neural net. Instead it creates a collection of strings by segmenting a corpus.

Tokenizer

A Tokenizer further segments the text at the level of single words, or alternatively as n-grams. ClearTK contains the underlying tokenizers, such as parts of speech (PoS) and parse trees, which allow for both dependency and constituency parsing, like that employed by a recursive neural tensor network (RNTN).

Both Tokenizers and SentenceIterators work with Preprocessors to deal with anomalies in messy text like Unicode, and to render such text, say, as lowercase characters uniformly.

Vocab

Each document has to be tokenized to create a vocab, the set of words that matter for that document or corpus. Those words are stored in the vocab cache, which contains statistics about a subset of words counted in the document, the words that "matter". The line separating significant and insignificant words is mobile, but the basic idea of distinguishing between the two groups is that words occurring only once (or less than, say, five times) are hard to learn and their presence represents unhelpful noise.

The vocab cache stores metadata for methods such as Word2vec and Bag of Words, which treat words in radically different ways. Word2vec creates representations of words, or neural word embeddings, in the form of vectors that are hundreds of coefficients long. Those coefficients help neural nets predict the likelihood of a word appearing in any given context; for example, after another word. Here's Word2vec, configured:
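A minimal sketch of such a configuration (hyperparameter values and the corpus path are illustrative):

```java
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.LineSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;
import java.io.File;

SentenceIterator iter = new LineSentenceIterator(new File("/path/to/your/corpus.txt"));
TokenizerFactory tokenizer = new DefaultTokenizerFactory();

Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)       // ignore words seen fewer than 5 times
        .layerSize(100)            // dimensionality of the word vectors
        .windowSize(5)             // context window size
        .seed(42)
        .iterate(iter)
        .tokenizerFactory(tokenizer)
        .build();
vec.fit();
```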

Once you obtain word vectors, you can feed them into a deep net for classification, prediction, sentiment analysis and the like.

Build Tools

Configure the build tools for Deeplearning4j.

Configuring your build tool

While we encourage Deeplearning4j, ND4J and DataVec users to employ Maven, it's worthwhile documenting how to configure build files for other tools, like Ivy, Gradle and SBT -- particularly since Google prefers Gradle over Maven for Android projects.

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Gradle

You can use Deeplearning4j with Gradle by adding the following to your build.gradle in the dependencies block:

Add a backend by adding the following:

SBT

You can use Deeplearning4j with SBT by adding the following to your build.sbt:

Add a backend by adding the following:

Ivy

You can use Deeplearning4j with ivy by adding the following to your ivy.xml:

Add a backend by adding the following:

Leiningen

NOTE: You'll still need to download ND4J, DataVec and Deeplearning4j, or double-click their respective JAR files downloaded by Maven / Ivy / Gradle, to install them in your Eclipse installation.

Doc2Vec

Doc2Vec and arbitrary documents for language processing in DL4J.

The main purpose of Doc2Vec is associating arbitrary documents with labels, so labels are required. Doc2Vec is an extension of word2vec that learns to correlate labels and words, rather than words with other words. Deeplearning4j's implementation is intended to serve the Java, Scala and Clojure communities.

The first step is coming up with a vector that represents the "meaning" of a document, which can then be used as input to a supervised machine learning algorithm to associate documents with labels.

In the ParagraphVectors builder pattern, the labels() method points to the labels to train on. In the example below, you can see labels related to sentiment analysis:

Further Reading

Sentence Iterator

Iteration of words, documents, and sentences for language processing in DL4J.

A sentence iterator feeds bits of text into a neural network in the form of vectors, and also covers the concept of documents in text processing.

In natural-language processing, a document or sentence is typically used to encapsulate a context which an algorithm should learn.

Some typical examples are below:
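A sketch using a line-based iterator (the file path is illustrative):

```java
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;

SentenceIterator iter = new BasicLineIterator("/path/to/your/corpus.txt");
```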

This assumes that each line in a file is a sentence.

You can also do list of strings as sentence as follows:
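For example:

```java
import org.deeplearning4j.text.sentenceiterator.CollectionSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import java.util.Arrays;

SentenceIterator iter = new CollectionSentenceIterator(
        Arrays.asList("First document.", "Second document.", "Third document."));
```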

This will assume that each string is a sentence (document). Remember this could be a list of Tweets or articles -- both are applicable.

You can iterate over files as follows:
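For example (the directory path is illustrative):

```java
import org.deeplearning4j.text.sentenceiterator.FileSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import java.io.File;

// Iterates over every file in the directory, returning sentences line by line
SentenceIterator iter = new FileSentenceIterator(new File("/path/to/your/dir"));
```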

This will parse the files line by line and return individual sentences on each one.

For anything more complex, we recommend a pipeline that provides more in-depth support than space-separated tokens.

Tokenization

Breaking text into individual words for language processing in DL4J.

Notes to write on: 1. Tokenizer factory interface 2. Tokenizer interface 3. How to write your own factory and tokenizer

Tokenization

What is Tokenization?

Example

Here's an example of tokenization done with DL4J tools:
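A minimal sketch of the pattern, using the default tokenizer factory with a token preprocessor attached (a stemming TokenPreProcess implementation can be plugged in the same way):

```java
import org.deeplearning4j.text.tokenization.tokenizer.Tokenizer;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

TokenizerFactory factory = new DefaultTokenizerFactory();
// CommonPreprocessor lower-cases tokens and strips punctuation; a custom TokenPreProcess
// implementation (for example, one wrapping a stemmer) can be attached in the same way.
factory.setTokenPreProcessor(new CommonPreprocessor());

Tokenizer tokenizer = factory.create("The quick brown foxes jumped over the lazy dogs");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
```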

With a stemming token preprocessor attached in this way, the tokenizer is capable of stemming.

In Word2Vec, that's the recommended way of creating a vocabulary, because it averts various vocabulary quirks, such as the singular and plural of the same noun being counted as two different words.

Vocabulary Cache

Mechanism for handling general NLP tasks in DL4J.

The vocabulary cache, or vocab cache, is a mechanism for handling general-purpose natural-language tasks in Deeplearning4j, including normal TF-IDF, word vectors and certain information-retrieval techniques. The goal of the vocab cache is to be a one-stop shop for text vectorization, encapsulating techniques common to bag of words and word vectors, among others.

Vocab cache handles storage of tokens, word-count frequencies, inverse-document frequencies and document occurrences via an inverted index. The InMemoryLookupCache is the reference implementation.

In order to use a vocab cache as you iterate over text and index tokens, you need to figure out if the tokens should be included in the vocab. The criterion is usually if tokens occur with more than a certain pre-configured frequency in the corpus. Below that frequency, an individual token isn't a vocab word, and it remains just a token.

We track tokens as well. In order to track tokens, do the following:
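A sketch, assuming vocabCache is a VocabCache&lt;VocabWord&gt; implementation such as the InMemoryLookupCache mentioned above:

```java
import org.deeplearning4j.models.word2vec.VocabWord;

// Record a token (with an initial frequency of 1.0) in the cache
vocabCache.addToken(new VocabWord(1.0, "myword"));
```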

When you want to add a vocab word, do the following:
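For example:

```java
vocabCache.addWordToIndex(0, "myword");   // assign the word an index
vocabCache.putVocabWord("myword");        // declare it a vocab word (pulls the word from the index)
```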

Adding the word to the index sets the index. Then you declare it as a vocab word. (Declaring it as a vocab word will pull the word from the index.)

Sequential Models

Importing the functional model.

Getting started with importing Keras Sequential models

Let's say you start with defining a simple MLP using Keras:

In Keras there are several ways to save a model. You can store the whole model (model definition, weights and training configuration) as HDF5 file, just the model configuration (as JSON or YAML file) or just the weights (as HDF5 file). Here's how you do each:

If you decide to save the full model, you will have access to the training configuration of the model, otherwise you don't. So if you want to further train your model in DL4J after import, keep that in mind and use model.save(...) to persist your model.

Loading your Keras model

Let's start with the recommended way, loading the full model back into DL4J (we assume it's on your class path):
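A minimal sketch (the file name matches the earlier Keras example; substitute your own path or classpath resource):

```java
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

// "simple_mlp.h5" is the full Keras model (architecture + weights + training config) saved with model.save(...)
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights("simple_mlp.h5");
```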

In case you didn't compile your Keras model, it will not come with a training configuration. In that case you need to explicitly tell model import to ignore training configuration by setting the enforceTrainingConfig flag to false like this:
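For example:

```java
MultiLayerNetwork model =
        KerasModelImport.importKerasSequentialModelAndWeights("simple_mlp.h5", false);  // false = ignore training config
```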

To load just the model configuration from JSON, you use KerasModelImport as follows:
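For example (the JSON file name is illustrative):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;

MultiLayerConfiguration config =
        KerasModelImport.importKerasSequentialConfiguration("simple_mlp_config.json");
```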

If additionally you also want to load the model weights with the configuration, here's what you do:
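For example (file names are illustrative):

```java
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(
        "simple_mlp_config.json", "simple_mlp_weights.h5");
```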

In the latter two cases no training configuration will be read.

Maven

Configure the Maven build tool for Deeplearning4j.

Configuring the Maven build tool

You can use Deeplearning4j with Maven by adding the following to your pom.xml:

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Add a backend

DL4J relies on ND4J for hardware-specific implementations and tensor operations. Add a backend by pasting the following snippet into your pom.xml:

Keras Import

Overview of model import.

Deeplearning4j: Keras model import

Getting started: Import a Keras model in 60 seconds

If you put this model file (simple_mlp.h5) into the base of your resource folder of your project, you can load the Keras model as DL4J MultiLayerNetwork as follows
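A minimal sketch:

```java
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights("simple_mlp.h5");
```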

You can now use your imported model for inference (here with dummy data for simplicity)
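For example (the input shape here is illustrative; use your model's actual input shape):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Dummy input: batch of 10 examples with 100 features each
INDArray input = Nd4j.rand(10, 100);
INDArray output = model.output(input);
```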

Here's how you do training in DL4J for your imported model:
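A minimal sketch (labels here are dummy placeholders):

```java
// Dummy labels: 10 examples, 10 output classes (adjust to your model's output shape)
INDArray labels = Nd4j.zeros(10, 10);
model.fit(input, labels);
```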

Project setup

To use Keras model import in your existing project, all you need to do is add the following dependency to your pom.xml.

Backend

DL4J Keras model import is backend agnostic. No matter which backend you choose (TensorFlow, Theano, CNTK), your models can be imported into DL4J.

Popular models and applications

  • Deep convolutional and Wasserstein GANs

  • UNET

  • ResNet50

  • SqueezeNet

  • MobileNet

  • Inception

  • Xception

Troubleshooting and support

An IncompatibleKerasConfigurationException message indicates that you are attempting to import a Keras model configuration that is not currently supported in Deeplearning4j (either because model import does not cover it, or DL4J does not implement the layer, or feature).

Once you have imported your model, we recommend our own ModelSerializer class for further saving and reloading of your model.

Why Keras model import?

Keras is a popular and user-friendly deep learning library written in Python. The intuitive API of Keras makes defining and running your deep learning models in Python easy. Keras allows you to choose which lower-level library it runs on, but provides a unified API for each such backend. Currently, Keras supports Tensorflow, CNTK and Theano backends.

There is often a gap between the production system of a company and the experimental setup of its data scientists. Keras model import allows data scientists to write their models in Python, but still seamlessly integrates with the production stack.

Keras model import is targeted at users mainly familiar with writing their models in Python with Keras. With model import you can bring your Python models to production by allowing users to import their models into the DL4J ecosystem for either further training or evaluation purposes.

Keras Import API Overview

Keras model import API

KerasModelImport

Reads stored Keras configurations and weights from one of two archives: either as

  • a single HDF5 file storing model and training JSON configurations and weights

  • a separate text file storing the model JSON configuration plus an HDF5 file storing the weights.

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Model

  • return ComputationGraph

  • see ComputationGraph

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Sequential model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Sequential model

  • return MultiLayerNetwork

  • see MultiLayerNetwork

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Model

  • param inputShape optional input shape for models that come without such (e.g. notop = false models)

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Model

  • return ComputationGraph

  • throws IOException IO exception

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

  • see ComputationGraph

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Sequential model

  • param inputShape optional input shape for models that come without such (e.g. notop = false models)

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Sequential model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Sequential model

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasModelAndWeights

Load Keras (Functional API) Model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasSequentialModelAndWeights

Load Keras Sequential model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasModelConfiguration

Load Keras (Functional API) Model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasModelConfiguration

Load Keras (Functional API) Model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasSequentialConfiguration

Load Keras Sequential model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialConfiguration

Load Keras Sequential model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

Quick Start

Quickstart for Java using Maven

Get started

This is everything you need to run DL4J examples and begin your own projects.

We are currently reworking the Getting Started Guide.

A Taste of Code

Deeplearning4j is a domain-specific language to configure deep neural networks, which are made of multiple layers. Everything starts with a MultiLayerConfiguration, which organizes those layers and their hyperparameters.

Hyperparameters are variables that determine how a neural network learns. They include how many times to update the weights of the model, how to initialize those weights, which activation function to attach to the nodes, which optimization algorithm to use, and how fast the model should learn. This is what one configuration would look like:
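A minimal sketch of such a configuration (layer sizes, updater and seed are illustrative):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)                                   // reproducible weight initialization
        .updater(new Adam(1e-3))                     // optimization algorithm and learning rate
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(100)
                .activation(Activation.RELU).build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(100).nOut(10)
                .activation(Activation.SOFTMAX).build())
        .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
```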

With Deeplearning4j, you add a layer by calling layer on the NeuralNetConfiguration.Builder(), specifying its place in the order of layers (the zero-indexed layer below is the input layer), the number of input and output nodes, nIn and nOut, as well as the type: DenseLayer.

Once you've configured your net, you train the model with model.fit.

Prerequisites

You should have these installed to use this QuickStart guide. DL4J targets professional Java developers who are familiar with production deployments, IDEs and automated build tools. Working with DL4J will be easiest if you already have experience with these.

Please make sure you have a 64-Bit version of java installed, as you will see an error telling you no jnind4j in java.library.path if you decide to try to use a 32-Bit version instead. Make sure the JAVA_HOME environment variable is set.

If you are working on a Mac, you can simply enter the following into the command line:

The latest version of Mac's Mojave OS breaks git, producing the following error message:

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

This can be fixed by running xcode-select --install from the command line.

  1. Use the command line to enter the following:

  2. Open IntelliJ and choose Import Project. Then select the main 'dl4j-examples' directory. (Note: the example in the illustration below refers to an outdated repository named dl4j-0.4-examples. However, the repository that you will download and install will be called dl4j-examples).![select directory](../../.gitbook/assets/install_intj_1%20(2).png)

  3. Choose 'Import project from external model' and ensure that Maven is selected.

    ![select directory](../../.gitbook/assets/install_intj_2%20(2).png)

  4. Continue through the wizard's options. Select the SDK that begins with jdk. (You may need to click on a plus sign to see your options...) Then click Finish. Wait a moment for IntelliJ to download all the dependencies. You'll see the horizontal bar working on the lower right.

  5. Pick an example from the file tree on the left. Right-click the file to run.

    ![run IntelliJ example](../../.gitbook/assets/install_intj_3%20(3).png)

Using DL4J In Your Own Projects: Configuring the POM.xml File

To run DL4J in your own projects, we highly recommend using Maven for Java users, or a tool such as SBT for Scala. The basic set of dependencies and their versions are shown below. This includes:

  • deeplearning4j-core, which contains the neural network implementations

  • nd4j-native-platform, the CPU version of the ND4J library that powers DL4J

  • datavec-api - DataVec is our library for vectorizing and loading data

To run the example, right click on it and select the green button in the drop-down menu. You will see, in IntelliJ's bottom window, a series of scores. The rightmost number is the error score for the network's classifications. If your network is learning, then that number will decrease over time with each batch it processes. At the end, this window will tell you how accurate your neural-network model has become:

![](../../.gitbook/assets/mlp_classifier_results%20(4).png)

In another window, a graph will appear, showing you how the multilayer perceptron (MLP) has classified the data in the example. It will look like this:

Congratulations! You just trained your first neural network with Deeplearning4j.

Next Steps

Additional links

Troubleshooting

Q: I'm using a 64-Bit Java on Windows and still get the no jnind4j in java.library.path error

A: You may have incompatible DLLs on your PATH. To tell DL4J to ignore those, you have to add the following as a VM parameter (Run -> Edit Configurations -> VM Options in IntelliJ):

Q: SPARK ISSUES I am running the examples and having issues with the Spark based examples such as distributed training or datavec transform options.

Troubleshooting: Debugging UnsatisfiedLinkError on Windows

Windows users might be seeing something like:

Quickstart template

Now that you've learned how to run the different examples, we've made a template available for you that has a basic MNIST trainer with simple evaluation code.

To use the template:

  1. Copy the standalone-sample-project from the examples and give it the name of your project.

  2. Import the folder into IntelliJ.

  3. Start coding!

Custom Layers

How to implement custom Keras layers for import in Deeplearning4J.

Many more advanced models will contain custom layers, i.e. layers that aren't included in Keras.

You can import those models too, but you will have to provide an implementation of that layer yourself, as the exported model file only provides us with a name for it.

Usually, you will have found out about needing to implement a custom layer, when you saw an exception like the following:

or

Implementing a custom layer for Keras import

There are two ways of implementing a custom layer for Keras import. Which one is the right approach for you, depends on the type of layer you need to implement.

  1. SameDiffLambdaLayer Use this approach if your layer doesn't have any weights and defines just a computation. It is most useful when you have to define a custom layer because you are using a lambda in your model definition. This is the approach you should be using when you've gotten the exception about no lambda layer being found.

  2. KerasLayer Use this approach if your layer needs its own weights. It is most useful when you have to define some complex layer that is more than just a simple computation. This is the approach you should be using when you've gotten the exception about an unsupported layer type.

SameDiffLambdaLayer

Using a SameDiffLambdaLayer is pretty easy. You create a new class that extends it, and override the defineLayer and getOutputType methods.
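A minimal sketch (the class name is illustrative):

```java
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.samediff.SameDiffLambdaLayer;
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;

public class TimesThreeLambdaLayer extends SameDiffLambdaLayer {

    @Override
    public SDVariable defineLayer(SameDiff sd, SDVariable input) {
        // The layer's computation: multiply every input value by 3
        return input.mul(3.0);
    }

    @Override
    public InputType getOutputType(int layerIndex, InputType inputType) {
        // Element-wise operation: the output type/shape is the same as the input
        return inputType;
    }
}
```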

This simple lambda layer just multiplies its input by 3.

defineLayer will only be called once to create the SameDiff graph that is used as the definition of this layer. Do not use information about the size of the inputs or other non-static sizes, like batch size, when defining the layer, or it may fail later on.

After defining your layer, you have to register it to make it available on import.
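A sketch of the registration call (the lambda layer name here is a placeholder; use the name from your model or the exception message):

```java
import org.deeplearning4j.nn.modelimport.keras.KerasLayer;

// "lambda_1" must match the lambda layer's name in the imported Keras model
KerasLayer.registerLambdaLayer("lambda_1", new TimesThreeLambdaLayer());
```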

The correct name for your lambda layer will depend on the model you are importing. As you, most likely, were made aware of needing to implement the lambda layer by an exception, this exception should have given you the proper name already.

KerasLayer

Implementing a full layer with weights is more complex than defining a lambda layer. You will have to create a new class that extends KerasLayer and that reads the configuration of that layer and defines it appropriately.

After you've defined your layer, you will have to register it to make it available on import:
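A sketch (both names here are placeholders for your own Keras layer name and KerasLayer subclass):

```java
import org.deeplearning4j.nn.modelimport.keras.KerasLayer;

// "MyCustomLayer" must match the layer's class name in the Keras model;
// MyCustomKerasLayer is your own class extending KerasLayer (hypothetical name)
KerasLayer.registerCustomLayer("MyCustomLayer", MyCustomKerasLayer.class);
```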

Again, the appropriate name will be apparent from the exception that notified you about needing to implement the custom layer in the first place.

Custom Layers

Extend DL4J functionality for custom layers.

There are two components to adding a custom layer:

  1. Adding the layer configuration class: extends org.deeplearning4j.nn.conf.layers.Layer

  2. Adding the layer implementation class: implements org.deeplearning4j.nn.api.Layer

The configuration layer ((1) above) class handles the settings. It's the one you would use when constructing a MultiLayerNetwork or ComputationGraph. You can add custom settings here, and use them in your layer.

The implementation layer ((2) above) class has parameters, and handles network forward pass, backpropagation, etc. It is created from the org.deeplearning4j.nn.conf.layers.Layer.instantiate(...) method. In other words: the instantiate method is how we go from the configuration to the implementation; MultiLayerNetwork or ComputationGraph will call this method when initializing the network.

An example of these are CustomLayer (the configuration class) and CustomLayerImpl (the implementation class). Both of these classes have extensive comments regarding their methods.

You'll note that in Deeplearning4j there are two DenseLayer classes, two GravesLSTM classes, etc.: the reason is that one is for the configuration and one is for the implementation. We have not followed this "same name" pattern here, to hopefully avoid confusion.

Testing Your Custom Layer

Once you have added a custom layer, it is necessary to run some tests to ensure it is correct.

These tests should at a minimum include the following:

  1. Tests to ensure that the JSON configuration (to/from JSON) works correctly

    This is necessary for networks with your custom layer to function with both

    model serialization (saving) and Spark training.

  2. Gradient checks to ensure that the implementation is correct.

Example

DL4J heavily depends on JavaCPP for its interop between Java and platform-optimized C++ libraries. However, due to our usage of JNI, this comes with certain build complexities that anyone should be aware of.

These libraries are what comprise our nd4j backends. Leveraging libnd4j, javacpp handles linking each nd4j backend against the libnd4j C++ codebase. This linking is done using a libnd4j home, which contains all of the include files and necessary binary files for specific platforms. By default, nd4j backends and the libnd4j code base are compiled within the same build step. This is the recommended default, but for specific circumstances a libnd4j release (also uploaded to Maven Central as a zip file) can be used in place of libnd4j compilation. See our documentation for more information on this.

The presets: this is similar in spirit to the javacpp presets. In order to avoid a race condition between the backend and the presets compilation, this is a separate dependency that exists just to handle interop between the libnd4j code base and the Java frontend. The backend above then contains the rest of the logic needed for execution of the math operations on specific platforms.

After a libnd4j build is executed for a specific platform, we need to leverage javacpp to actually link against libnd4j to create a complete libnd4j backend. When invoking a Maven build, the JavaCPP Maven plugin is used to actually perform the build. The presets will be compiled first. Generally the presets are just 1 or 2 classes containing a description of how to map the actual nd4j code base to the libnd4j codebase.

Nd4j reuses javacpp's notion of a -platform library. This is a curated set of dependencies most users will use as part of a build. Each backend will have an associated -platform artifact so users don't have to deal with maven classifiers. See for how to leverage this artifact.

A comprehensive list of classifiers is published with each artifact. Note that each library we link against will also have a similar set of classifiers.

Throughout the dl4j pom.xml files, platform-specific profiles that set up dependencies exist; an example of such a profile can be found in those files. This helps us dynamically figure out which platform someone is building for.

A testing setup the team uses for testing android involves lineageos, termux, and some arm32 based open jdk debian files that can be found

We recommend that you join our community forums. There you can request help and give feedback, but please do use this guide before asking questions we've answered below. If you are new to deep learning, we've included links to courses, readings and other resources.

If you just want to get started, please consider reading our .

If you find that you have trouble following along here, take a look at the Konduit blog, as it features .

Use cases include: 1. Numerical computation. See:

2. Define and train models using a tensorflow/pytorch like interface. See:

3. Model import and deployment. See:

4. Running models on spark. See:

5. A small self contained library for running math code. See:

Other use cases are available as well, please feel free to check more of our

Java 1.8 or later (only 64-bit versions supported)

Apache Maven (automated build and dependency manager)

IntelliJ IDEA or Eclipse

If you are new to Java or unfamiliar with these tools, read the details below for help with installation and setup. Otherwise, skip to .

If you don't have Java 1.8 or later, download the current Java Development Kit (JDK). To check if you have a compatible version of Java installed, run java -version from the command line.

Maven is a dependency management and automated build tool for Java projects. It works well with IDEs such as IntelliJ and lets you install DL4J project libraries easily. Install or update Maven to the latest release following the instructions for your system. To check if you have the most recent version of Maven installed, run mvn --version.

Maven is widely used among Java developers and it's pretty much mandatory for working with DL4J. If you come from a different background and Maven is new to you, check out the Apache Maven overview and our introduction to Maven for non-Java programmers, which includes some additional troubleshooting tips. Other build tools such as Ivy and Gradle can also work, but we support Maven best.

An Integrated Development Environment (IDE) allows you to work with our API and configure neural networks in a few steps. We strongly recommend using IntelliJ IDEA, which communicates with Maven to handle dependencies. The IntelliJ Community Edition is free.

There are other popular IDEs such as Eclipse and NetBeans. However, IntelliJ is preferred, and using it will make finding help on the community forums easier if you need it.

Install the latest version of Git. If you already have Git, you can update to the latest version using Git itself:

Every Maven project has a POM file. Here is when you run your examples.

Within IntelliJ, you will need to choose the first Deeplearning4j example you're going to run. We suggest MLPClassifierLinear, as you will almost immediately see the network classify two groups of data in our UI. The file on .

Join our community forums on .

Read the .

Check out the more detailed .

Python folks: If you plan to run benchmarks on Deeplearning4j comparing it to well-known Python framework [x], please read our notes on how to optimize heap space, garbage collection and ETL on the JVM. By following them, you will see at least a 10x speedup in training time.

A: You may be missing some dependencies that Spark requires. See this for a discussion of potential dependency issues. Windows users may need the winutils.exe from Hadoop.

Download winutils.exe from and put it into the null/bin/winutils.exe (or create a hadoop folder and add that to HADOOP_HOME)

If that is the issue, see . In this case replace with "Nd4jCpu".

The Quickstart template is available at .

If you'd like to deploy models to production, you might like our .

Deeplearning4j has several submodules. These range from a visualization UI to distributed training on Spark. For an overview of these modules, please look at the .

To get started with a simple desktop app, you need two things: An and deeplearning4j-core. For more code, see the .

If you want a flexible deep-learning API, there are two ways to go. You can use nd4j standalone See our or the .

If you want distributed training on Spark, you can see our . Keep in mind that we cannot setup Spark for you. If you want to set up distributed Spark and GPUs, that is largely up to you. Deeplearning4j simply deploys as a JAR file on an existing Spark cluster.

If you want Spark with GPUs, we recommend .

If you want to deploy on mobile, you can see our .

We deploy optimized code for various hardware architectures natively. We use C++ based for loops just like everybody else. For that, please see our .

Deeplearning4j is meant to be an end-to-end platform for building real applications, not just a tensor library with automatic differentiation. If you want a tensor library with autodiff, please see ND4J and . Samediff is still in beta, but if you want to contribute, please join our .

Lastly, if you are benchmarking Deeplearning4j, please consider coming to our community forum and asking for tips. Deeplearning4j has all the knobs, but some may not work exactly like the Python frameworks do.

Before contributing, make sure you know the structure of all of the Eclipse Deeplearning4j libraries. As of early 2018, all libraries now live in the Deeplearning4j monorepo. These include:

We also have an extensive examples repository at .

Talking to the developers on the

If you are unsure about something - ask us on the !

This page explains steps required to contribute code to the projects in the eclipse/deeplearning4j GitHub repository:

These two requirements must be satisfied for all Eclipse Foundation projects, not just DL4J and ND4J. A full list of Eclipse Foundation Projects can be found here:

This can be done at

Go to and follow the instructions.

For Windows command line, similar options are available through a few mechanisms (see )

For details on GPG signing, see

IntelliJ can be used to perform git commits, including through signed commits. See for details.

Each github actions workflow has 10 parameters for manually invoking builds. The reason this is manual is due to the different ways a release can break. Being manual also allows us to re-invoke only the parts of a build we need, rather than the whole release pipeline.

releaseVersion: This is the intended release version to be converted to from snapshots. The update-versions.sh script is run to convert the version of every module to the specific version intended for release. This is what will get uploaded to a staging repository for release. Otherwise, all intended versions should be SNAPSHOT.

modules: The maven modules to build. This is fairly raw and error prone. The intended usage is with Maven's -pl/--projects flag. Typical usage is to skip libnd4j builds by passing a module list that excludes libnd4j.

libnd4jDownload/libnd4jUrl: In tandem with modules, you can specify a libnd4j zip file distribution that was compiled beforehand for download. The builds will download that libnd4j distribution and use it for linking. This can be handy when recompiling the nd4j-native/nd4j-cuda backends for a specific platform without needing to recompile the whole C++ codebase. A URL in a matrix build will be sourced from a hard-coded file name from this repo - each file name will be updated to point to a zip file distribution appropriate for an individual matrix build. This was done because one URL is not going to be suitable for all of the individual matrix builds.

Maven profiles for deeplearning4j matter a lot, especially if you want to run tests. Read more on the test profiles here. For most code, nd4j-tests-cpu should probably be the main profile you use.

Deeplearning4j uses Lombok throughout its codebase. Ensure you install the Lombok plugin for your favorite IDE in order to use the project. Please follow the baeldung guide for setting this up in your IDE.

Once cloned locally, open IntelliJ. Please follow the guide to import from external Maven sources.

Note: for now, the latest version of Eclipse appears to fail upon first import. Any suggestions may be reported on the community forums.

Once cloned locally, open Eclipse. Please follow the guide to import from external Maven sources. Importing your project into Eclipse may take a while. Note that, due to the profile-sensitive nature of the deeplearning4j suite, there may be issues when opening and building the project.

In order to run the deeplearning4j tests, many pretrained models and other resources are required. Ensure the dl4j test resources are on your classpath as a dependency. It is a big repository that needs to be mvn clean installed in order to run the tests properly. You can do this by adding -Ptestresources to your test execution when running the tests from Maven.

Deeplearning4j uses JUnit 5's tags to categorize tests into different types. All of the tag names used throughout the code base can be found here. Nd4j-common-tests is included as a dependency for all tests and has a few reusable utilities used throughout the code base for tests. This makes it a great location to put common utilities we want to use throughout the code base. The tag names are mainly there to categorize tests that can take longer or use more resources, so we can avoid running those dynamically depending on the size of the machine we are running tests on.
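
As an illustration of how such a tag looks in practice, here is a minimal sketch of a JUnit 5 test carrying a tag; the tag name used here is purely illustrative, so substitute one of the tag constants actually defined in the code base:

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class TaggedTestSketch {

    @Test
    @Tag("long-running")   // hypothetical tag name - use the project's real tag constants
    void resourceHeavyTest() {
        // a test that takes a long time or needs a large machine would go here
        assertEquals(4, 2 + 2);
    }
}

With the Maven Surefire plugin, tagged tests can then be included or excluded via its groups/excludedGroups configuration, which is how such tests are typically filtered by machine size.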

(For those interested in a survey of artificial intelligence.)

(For those interested in image recognition.)

; Patrick van der Smagt

(Vim is an editor accessible from the command line.)

If you want to jump into deep-learning from here without Java, we recommend and the various Python frameworks built atop it, including and .

With that under your belt, we recommend you approach Deeplearning4j through its .

Most of what we know about deep learning is contained in academic papers. You can find some of the major research groups .

While individual courses have limits on what they can teach, the Internet does not. Most math and programming questions can be answered by Googling and searching sites like and .

A reference for building dl4j from source can be found for every platform in our workflows. For maintenance reasons, we would prefer to have a canonical source of up-to-date build information for users rather than out-of-date install instructions in this guide. This guide will contain specific long-lived tips for how to interpret the workflows and what to consider when building.

For an overview of the GitHub actions workflows, see the overview doc.

This document will cover the specific components of the build by platform rather than step through what's already in the workflows. If you have suggestions for improving this document, please comment over at the community forums.

From there, the normal platform-specific libraries should be installed beforehand. Up-to-date install instructions can be found in our CPU builds for Windows, Mac and Linux.

ARM based builds all link against the armcompute library by default and, as mentioned above, use the pi_build.sh script for building libnd4j on specific platforms. Note that pi_build.sh can also be used to compile all of dl4j for a specific project.

This will ensure that all library versions are set to the appropriate version. Ensure that the CUDA toolkit you need is installed. If you intend on using CuDNN, ensure that it is also installed correctly. For installing CUDA, consider using our install scripts as a reference if you intend on doing automated installs.

Jetson nano users: please see this thread for successfully compiling deeplearning4j on Jetson nano.

In short: it relies on CUDA 10.0. The JavaCPP presets for CUDA are also only compiled for arm64 for CUDA 10.0. You can find the supported CUDA versions here. If you would like something more up to date, please feel free to contact us over at our forums. As of 1.0.0-M1.1 you can also use updated dependencies:

We use msys2 for compiling libnd4j. CUDA requires MSVC to be installed in order to properly compile CUDA kernels. If you want to compile libnd4j for CUDA from source, please ensure you first invoke the vcvars.bat script in a cmd terminal, then launch msys2 manually. For more specifics, please see our Windows and CUDA build files.

The DL4J/ND4J developers are available on discourse. You can ask questions about benchmarking and performance there:

ND4J can use multiple BLAS implementations - versions up to and including 1.0.0-beta6 have defaulted to OpenBLAS. However, if Intel MKL (free versions are available here) is installed and available, ND4J will link with it for improved performance in many BLAS operations.

Regarding array orders: this also matters for performance. ND4J can represent arrays in either row major ('c') or column major ('f') order. See this Wikipedia page for more details. Performance in operations such as matrix multiplication - but also more general ND4J operations - depends on the input and result array orders.

If you are using CUDA, ensure you are using CuDNN.

Check the Workspaces and Memory guides. The defaults are usually good - but sometimes better performance can be obtained with some tweaking. This is especially important if you have a lot of Java objects (such as Word2Vec vectors) in memory while training.

Not asking the devs (via Discourse) - we are happy to provide suggestions and investigate if performance isn't where it should be!

Increase the heap space.

In IntelliJ, this is a VM parameter, not a program argument. When you hit run in IntelliJ (the green button), that sets up a run-time configuration. IntelliJ starts a Java VM for you with the configurations you specify.

IntelliJ will automatically specify the Java main class in question.

Better garbage collection increases throughput. For a more detailed exploration of the issue, please read this InfoQ article.

DL4J is tightly linked to the garbage collector. JavaCPP, the bridge between the JVM and C++, adheres to the heap space you set with Xmx and works extensively with off-heap memory. The off-heap memory will not surpass the amount of heap space you specify.

This is the relevant code, in particular the third line:

There are actually two types of asynchronous dataset iterators. The AsyncDataSetIterator is what you would use most of the time. It's described in the Javadoc here.

For special cases such as recurrent neural nets applied to time series, or for computation graphs, you would use an AsyncMultiDataSetIterator, described in the Javadoc here.

In Python, programmers are converting their data into pickles, or binary data objects. And if they're working with a smallish toy dataset, they're loading all those pickles into RAM. So they're effectively sidestepping a major task in dealing with larger datasets. At the same time, when benchmarking against DL4J, they're not loading all the data into RAM. So they're effectively comparing DL4J speed for training computations + ETL against only training computation time for Python frameworks.

Deeplearning4j uses DataVec as its ETL and vectorization library. Unlike other deep-learning tools, DataVec does not force a particular format on your dataset. (Caffe forces you to use hdf5, for example.)

Here's how you pre-save datasets.

Here's how you load a pre-saved dataset.
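
For illustration, here is a minimal sketch of saving a DataSet to disk once and loading it back later; the toy shapes and the file name are assumptions made for this example:

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;

public class PreSaveSketch {
    public static void main(String[] args) {
        // Build a toy DataSet (features + labels) and save it to disk once...
        DataSet ds = new DataSet(Nd4j.rand(10, 4), Nd4j.rand(10, 3));
        File f = new File("presaved-batch-0.bin");   // hypothetical file name
        ds.save(f);

        // ...then load it back cheaply during later training runs.
        DataSet loaded = new DataSet();
        loaded.load(f);
        System.out.println("Loaded features shape: " + loaded.getFeatures().shapeInfoToString());
    }
}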

MemoryWorkspaces - see the Workspaces guide for details

Custom java code: using 3rd party libraries such as tablesaw and javacv

CSV: The CSV record reader in DataVec is fairly good for this if you have a lot of data. The reason is that the record readers assume the data you are using is too large to fit in memory. If you have a smaller dataset that can fit in memory, you can look at our tablesaw example. If you have a large amount of CSV data, then our CSV example should work well.
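
As a rough sketch of such a CSV pipeline (the file name, label index and class count below are assumptions for a small iris-style file; adjust them for your own data):

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

import java.io.File;

public class CsvPipelineSketch {
    public static void main(String[] args) throws Exception {
        // Assumes a CSV file where the last column (index 4) is an integer class label 0..2
        RecordReader rr = new CSVRecordReader(0, ',');          // skip 0 header lines, comma delimited
        rr.initialize(new FileSplit(new File("iris.csv")));     // hypothetical path

        int batchSize = 32;
        int labelIndex = 4;
        int numClasses = 3;
        DataSetIterator iter = new RecordReaderDataSetIterator(rr, batchSize, labelIndex, numClasses);

        while (iter.hasNext()) {
            System.out.println(iter.next().getFeatures().shapeInfoToString());
        }
    }
}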

Images: The native image loader and image record reader based on javacv handle loading images of any format, and these are easily converted to labeled image datasets. We have a comprehensive image example here.
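
A minimal sketch of that image pipeline is shown below, assuming an images/<label>/<file> directory layout; the path, image dimensions and class count are illustrative:

import org.datavec.api.io.labels.ParentPathLabelGenerator;
import org.datavec.api.split.FileSplit;
import org.datavec.image.loader.NativeImageLoader;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

import java.io.File;
import java.util.Random;

public class ImagePipelineSketch {
    public static void main(String[] args) throws Exception {
        // Assumes images/<labelName>/<file>.jpg, so the parent directory name becomes the label
        ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
        ImageRecordReader rr = new ImageRecordReader(64, 64, 3, labelMaker);   // height, width, channels
        rr.initialize(new FileSplit(new File("images/"), NativeImageLoader.ALLOWED_FORMATS, new Random(42)));

        int batchSize = 16;
        int labelIndex = 1;      // the image record reader writes the label as the second writable
        int numClasses = 5;      // set to the number of label directories you actually have
        DataSetIterator iter = new RecordReaderDataSetIterator(rr, batchSize, labelIndex, numClasses);
        System.out.println(iter.next().getFeatures().shapeInfoToString());
    }
}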

NLP: The DL4J suite has a core tokenizer API where a user can supply a tokenizer and build an iterator from that. A combination of that interface and something like our BERT iterator allows usage of the latest transformer models. If you are looking for word2vec, we also have examples for that here.

Audio: We do have a midi example here. Audio should be treated as time series. For your workflow, javacpp (which our ndarray library nd4j supports internally) has ffmpeg bindings. Due to licensing restrictions for the project (basically no GPL code) we cannot directly include ffmpeg in the project, but you are welcome to ask questions on the community forums.

Video: DL4J does not directly support video, but it does have 3D convolutional layers for processing video frames. It is suggested to use javacv or the ffmpeg bindings mentioned above to process videos and convert them into frames. Please use our forums for additional support.

An example of that workflow may be found here. If your data may not fit in memory, it may be worth looking into our minibatch pipelines and ways of creating your test/train splits over minibatches. Our image examples cover this. For larger input data like images, it is highly suggested to do minibatch partitioning of your data.
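
For the in-memory case, a minimal sketch of creating a test/train split looks roughly like this (the toy shapes and the 80/20 split fraction are assumptions):

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;
import org.nd4j.linalg.factory.Nd4j;

public class SplitSketch {
    public static void main(String[] args) {
        // Toy in-memory dataset: 150 examples, 4 features, 3 classes (shapes are illustrative)
        DataSet allData = new DataSet(Nd4j.rand(150, 4), Nd4j.rand(150, 3));
        allData.shuffle();                                        // shuffle before splitting
        SplitTestAndTrain split = allData.splitTestAndTrain(0.8); // 80% train, 20% test
        DataSet train = split.getTrain();
        DataSet test  = split.getTest();
        System.out.println("Train examples: " + train.numExamples() + ", test examples: " + test.numExamples());
    }
}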

- note that this can also be used to scale to a min and max, for example between 1 and 255 for images.

Normalizers, like the models discussed below, can be saved and loaded as part of your pipeline. Models must have their accompanying normalizers even during deployment. An example of serializing normalizers can be found here.
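
A minimal sketch of fitting, applying and serializing a normalizer, under the assumption of toy in-memory data and an arbitrary file name:

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;
import org.nd4j.linalg.dataset.api.preprocessor.serializer.NormalizerSerializer;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;

public class NormalizerSketch {
    public static void main(String[] args) throws Exception {
        DataSet train = new DataSet(Nd4j.rand(100, 4), Nd4j.rand(100, 3));   // toy data

        // Zero mean, unit variance normalization fitted on the training data only
        NormalizerStandardize normalizer = new NormalizerStandardize();
        normalizer.fit(train);
        normalizer.transform(train);

        // Persist the fitted statistics so the exact same transform can be applied at deployment time
        File normFile = new File("normalizer.bin");                 // hypothetical file name
        NormalizerSerializer.getDefault().write(normalizer, normFile);

        NormalizerStandardize restored = NormalizerSerializer.getDefault().restore(normFile);
        System.out.println("Restored normalizer mean: " + restored.getMean());
    }
}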

Train a model using the higher-level DL4J interface. One quick example can be found here.
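
To make the shape of that workflow concrete, here is a small, self-contained sketch of configuring and fitting a tiny MLP with the higher-level interface; the layer sizes, updater and toy data are illustrative choices, not a recommended architecture:

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TrainingSketch {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(42)
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(1e-3))
                .list()
                .layer(0, new DenseLayer.Builder().nIn(4).nOut(16).activation(Activation.RELU).build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(16).nOut(3).activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();

        // Toy in-memory batch; in practice you would call net.fit(dataSetIterator, numEpochs)
        DataSet toy = new DataSet(Nd4j.rand(32, 4), Nd4j.rand(32, 3));
        net.fit(toy);
        System.out.println("Score after one fit call: " + net.score());
    }
}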

Train a model using samediff: lower level but more flexible. An example can be found .

Keras: The Keras h5 format integration is a bit older and uses the higher-level DL4J interface. Keras model import for non-sequential models uses the computation graph. An example can be found here. Sequential models can be found here.

Tuning a model can be difficult. Our tuning guide can help navigate this. It uses the Deeplearning4j UI to monitor the gradients and ensure that they converge quickly. It is recommended to run the DL4J UI in a separate process to avoid dependency clashes. An example of how to run the UI server in a separate process can be found here.

When evaluating models, it is suggested to pair the workflow here with the data set splitting considerations above. Our evaluation API takes in ndarrays and tracks evaluations in bits. An example of the higher-level DL4J interface's evaluate call can be seen here.
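
In snippet form (following the placeholder style used elsewhere on this page, and assuming a recent version where Evaluation lives in org.nd4j.evaluation.classification), the evaluate call looks roughly like this:

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

MultiLayerNetwork net = ...;        // a trained network
DataSetIterator testIter = ...;     // iterator over the held-out test set

Evaluation eval = net.evaluate(testIter);   // runs inference and accumulates classification stats
System.out.println(eval.stats());           // accuracy, precision, recall, F1 and the confusion matrix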

A samediff model also has a similar evaluate call. In samediff, you pass an evaluation object into a training configuration. Results for the validation set will be streamed into this object. An example can be found here.

After a model has been built and deployed, usually the next thing users will want to do is set up the environment in which the model will run. One immediate suggestion is to optimize your dependencies. Since the whole deeplearning4j suite heavily relies on javacpp for its underlying dependencies, the docs from javacpp are recommended reading as a next step for optimizing your binaries.

Helpers: accelerated libraries for faster platform-specific math routines, including onednn, armcompute, and cudnn.

AVX: We pre-compile our binaries for specific Intel CPUs, including AVX2 and AVX512. Various classifiers are available for developers, which can be found here.

For building deployment pipelines, it is recommended to use konduit-serving, which is built on the same technology and is usually co-released alongside deeplearning4j.

Finally, this page has a short section on Debugging Performance Issues with JVM Profiling.

Instructions for configuring CuDNN can be found here. In summary, include the deeplearning4j-cuda-x.x dependency (where x.x is your CUDA version - such as 9.2 or 10.0). The network configuration does not need to change to utilize cuDNN - cuDNN simply needs to be available along with the deeplearning4j-cuda module.

One useful way to get more information is to perform profiling, as described in the profiling section later in this page. For custom ETL pipelines, adding logging for the various stages can help. Finally, another approach is to use a process of elimination - for example, measuring the latency and throughput of reading raw files from disk or from remote storage vs. measuring the time to actually process the data from its raw format.

Java uses garbage collection for management of on-heap memory (see this link for an explanation, for example). Note that DL4J and ND4J use off-heap memory for storage of all INDArrays (see the memory page for details).

If you suspect garbage collection overhead is having an impact on performance, try changing these settings. The main downside to reducing the frequency or disabling periodic GC entirely is when you are not using workspaces, though workspaces are enabled by default for all neural networks in Deeplearning4j.

Side note: if you are using DL4J for training on Spark, setting these values on the master/driver will not impact the settings on the workers. Instead, see this guide.

Other useful tools are the -verbose:gc, -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps command-line options. For more details, see Oracle Command Line Options and the Oracle GC Portal Documentation.

These options can be passed to the JVM on launch (when using java -jar or java -cp) or can be added to IDE launch options (for example, in IntelliJ: these should be placed in the “VM Options” field in Run/Debug Configurations - see )

For example, the YourKit Java Profiler can be used to determine both the frequency and duration of garbage collection - see Garbage collection telemetry for more details.

Other tools, such as VisualVM, can also be used to monitor GC activity.

A number of alternatives for generating heap dumps can be found .

For serving predictions in multi-threaded applications (such as a web server), ParallelInference should be used.

For inference from multiple threads, you should use one model per thread (as this avoids locks) or, for serving predictions in multi-threaded applications (such as a web server), use ParallelInference.
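
A minimal sketch of wrapping a loaded model in ParallelInference might look as follows; the inference mode, batch limit and worker count are illustrative values to tune for your hardware:

import org.deeplearning4j.nn.api.Model;
import org.deeplearning4j.parallelism.ParallelInference;
import org.deeplearning4j.parallelism.inference.InferenceMode;
import org.nd4j.linalg.api.ndarray.INDArray;

Model model = ...;                  // a loaded MultiLayerNetwork or ComputationGraph
INDArray features = ...;            // a single request's input

ParallelInference pi = new ParallelInference.Builder(model)
        .inferenceMode(InferenceMode.BATCHED)   // transparently batch concurrent requests
        .batchLimit(32)                         // maximum examples per internal batch
        .workers(2)                             // number of model copies (e.g. one per device)
        .build();

// output(...) is safe to call from many request threads
INDArray prediction = pi.output(features);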

For best performance, this value should be left at its default. If 64-bit floating point precision (double precision) is used instead, performance can be significantly reduced, especially on GPUs - most consumer NVIDIA GPUs have very poor double precision (and half precision/FP16) performance. On Tesla series cards, double precision performance is usually much better than for consumer (GeForce) cards, though it is still usually half or less of the single precision performance. Wikipedia has a summary of the single and double precision performance of NVIDIA GPUs here.
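
For reference, a short sketch of checking and, if needed, setting the default data type; FLOAT is already the default, so the explicit call is only required if something else has changed it:

import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

System.out.println("Default data type: " + Nd4j.dataType());
// First argument: default array data type; second: default floating point math type
Nd4j.setDefaultDataTypes(DataType.FLOAT, DataType.FLOAT);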

For details on workspaces, see the workspaces page.

For RNNs, the sequence length matters. If you are using sequences longer than a few hundred steps, you should use truncated backpropagation through time if possible.
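
A rough sketch of enabling truncated BPTT in a network configuration is shown below; the layer sizes and the 100-step segment length are illustrative:

import org.deeplearning4j.nn.conf.BackpropType;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .list()
    .layer(0, new LSTM.Builder().nIn(10).nOut(32).activation(Activation.TANH).build())
    .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
            .nIn(32).nOut(5).activation(Activation.SOFTMAX).build())
    .backpropType(BackpropType.TruncatedBPTT)   // split long sequences into shorter segments
    .tBPTTForwardLength(100)                    // forward-pass segment length, in time steps
    .tBPTTBackwardLength(100)                   // backward-pass segment length
    .build();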

For more details on AVX, see the Wikipedia AVX article.

In either case, you may see better overall throughput by reducing the number of OpenMP threads by setting the OMP_NUM_THREADS environment variable - see ND4JEnvironmentVars for details.

One reason reducing OMP_NUM_THREADS can improve overall performance is reduced cache thrashing.

Multiple options are available for performing profiling locally. We suggest using either YourKit Java Profiler or VisualVM for profiling.

Start your application with the profiler enabled. For details, see Running applications with the profiler and Local profiling.

Note that IDE integrations are available - see IDE integration.

Note that YourKit provides multiple different types of profiling: sampling, tracing, and call counting. Each type of profiling has different pros and cons, such as accuracy vs. overhead. For more details, see Sampling, tracing, call counting.

VisualVM also supports profiling - see the Profiling Applications section of the VisualVM documentation for more details.

You can also find the available CUDA versions via Maven Central search or in the Release Notes.

See our page on CuDNN.

Check the NVIDIA guides for instructions on setting up CUDA on the NVIDIA website.

ND4J uses the Java ServiceLoader in order to detect which backends are available on the classpath. Depending on your uberjar packaging configuration, those service files might be stripped away or broken.

If your uberjar does not contain that file, or if not all of the configured backends are listed there, you will have to reconfigure your shade plugin. See the ServicesResourceTransformer documentation for how to do that.

A Tokenizer is created and wrapped by a TokenizerFactory. The default tokens are words separated by spaces. The tokenization process also involves some machine learning to differentiate between ambiguous symbols like ".", which can end a sentence but also appears in abbreviations such as Mr. and vs.

You can also swap the standard CPU implementation for GPUs.

You can also swap the standard CPU implementation for GPUs.

You can also swap the standard CPU implementation for GPUs.

Clojure programmers may want to use Leiningen or Boot to work with Maven. A Leiningen tutorial is here.

Here's a full working example of classification with paragraph vectors:

A sentence iterator is used in both Word2vec and Bag of Words.

A few examples include analyzing Tweets and full-blown news articles. The purpose of the sentence iterator is to divide text into processable bits. Note the sentence iterator is input agnostic. So bits of text (a document) can come from a file system, the Twitter API or Hadoop.

Depending on how input is processed, the output of a sentence iterator will then be passed to a tokenizer for the processing of individual tokens, which are usually words, but could also be ngrams, skipgrams or other units. The tokenizer is created on a per-sentence basis by a tokenizer factory. The tokenizer factory is what is passed into a text-processing vectorizer.

Tokenization is the process of breaking text down into individual words. Word windows are also composed of tokens. Word2Vec can output text windows that comprise training examples for input into neural nets, as seen here.

You can also swap the standard CPU implementation for GPUs.

Keras model import provides routines for importing neural network models originally configured and trained using Keras, a popular Python deep learning library.

Once you have imported your model into DL4J, our full production stack is at your disposal. We support import of all Keras model types, most layers and practically all utility functionality. Please check for a complete list of supported Keras features.

Note to users: tf.keras models are also supported. Please check for an overview of what to expect for tf.keras as well as other features. Our documentation needs to be updated to reflect the changes between keras and tf.keras; for now, users should be aware of this as they read the docs below. Migrating from keras to tf.keras mainly involves changing the imports in your Python script. Equivalent changes needed to happen for model import in Deeplearning4j; those changes happened in beta7.

To import a Keras model, you need to create and serialize such a model first. Here's a simple example that you can use. The model is a simple MLP that takes mini-batches of vectors of length 100, has two Dense layers and predicts a total of 10 categories. After defining the model, we serialize it in HDF5 format.

This shows only how to import a Keras Sequential model. For more details take a look at both Functional Model import and Sequential Model import.

That's it! KerasModelImport is your main entry point to model import; the class takes care of mapping Keras concepts to DL4J concepts internally. As a user you just have to provide your model file; see our DL4J examples for more details and options to load Keras models into DL4J.

The full example just shown can be found in our DL4J examples.

If you need a project to get started in the first place, consider cloning and follow the instructions in the repository to build the project.

We support import for a growing number of applications; check here for a full list of currently covered models. These applications include:

You can inquire further by visiting the community forums. You might consider filing a feature request via Github so that this missing functionality can be placed on the DL4J development roadmap, or even sending us a pull request with the necessary changes!

You should use this module when the experimentation phase of your project is completed and you need to ship your models to production. Konduit provides commercial support for Keras implementations in enterprise.

We recommend that you join our . There you can request help and give feedback, but please do use this guide before asking questions we've answered below. If you are new to deep learning, we've included with links to courses, readings and other resources.

If you find that you have trouble following along here, take a look at the Konduit blog, as it features .

1.7 or later (Only 64-Bit versions supported)

(automated build and dependency manager)

or Eclipse

If you are new to Java or unfamiliar with these tools, read the details below for help with installation and setup. Otherwise, .

If you don't have Java 1.7 or later, download the current . To check if you have a compatible version of Java installed, use the following command:

Maven is a dependency management and automated build tool for Java projects. It works well with IDEs such as IntelliJ and lets you install DL4J project libraries easily. to the latest release following for your system. To check if you have the most recent version of Maven installed, enter the following:

Maven is widely used among Java developers and it's pretty much mandatory for working with DL4J. If you come from a different background, and Maven is new to you, check out and our , which includes some additional troubleshooting tips. such as Ivy and Gradle can also work, but we support Maven best.

An Integrated Development Environment () allows you to work with our API and configure neural networks in a few steps. We strongly recommend using , which communicates with Maven to handle dependencies. The is free.

There are other popular IDEs such as and . However, IntelliJ is preferred, and using it will make finding help on the easier if you need it.

Install the . If you already have Git, you can update to the latest version using Git itself:

Every Maven project has a POM file. Here is when you run your examples.

Within IntelliJ, you will need to choose the first Deeplearning4j example you're going to run. We suggest MLPClassifierLinear, as you will almost immediately see the network classify two groups of data in our UI. The file on .

Join our community forums on .

Read the .

Check out the more detailed .

Python folks: If you plan to run benchmarks on Deeplearning4j comparing it to well-known Python framework [x], please read on how to optimize heap space, garbage collection and ETL on the JVM. By following them, you will see at least a 10x speedup in training time.

A: You may be missing some dependencies that Spark requires. See this for a discussion of potential dependency issues. Windows users may need the winutils.exe from Hadoop.

Download winutils.exe from and put it into the null/bin/winutils.exe (or create a hadoop folder and add that to HADOOP_HOME)

If that is the issue, see . In this case replace with "Nd4jCpu".

The Quickstart template is available at .

For examples on how this was done, take a look at KerasLRN and KerasPoolHelper, which are custom layers that were needed to be able to import GoogLeNet.

A full custom layer example is available in our examples repository.

javacpp
Github actions overview libnd4jUrl parameter
official javacpp presets
javacpp maven plugin
docs from javacpp
here
openblas
example
here
community forum
a road map for beginners
core workflow guide
some getting started guides from the community
https://github.com/eclipse/deeplearning4j-examples/tree/master/nd4j-ndarray-examples
https://github.com/eclipse/deeplearning4j-examples/tree/master/samediff-examples
https://github.com/eclipse/deeplearning4j-examples/tree/master/tensorflow-keras-import-examples
https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-distributed-training-examples
https://github.com/eclipse/deeplearning4j/tree/master/libnd4j
examples
Java (developer version)
Apache Maven
IntelliJ IDEA
Git
Java
Java Development Kit (JDK) here
Apache Maven
Install or update Maven
their instructions
Apache's Maven overview
introduction to Maven for non-Java programmers
Other build tools
Paul Dubs' guide to maven
Maven In Five Minutes
IntelliJ IDEA
IDE
IntelliJ
community edition of IntelliJ
Eclipse
Netbeans
community forums
Git
latest version of Git
DL4J Examples in a Few Easy Steps
how the POM file should appear
Github can be found here
community.konduit.ai
introduction to deep neural networks
Comprehensive Setup Guide
these instructions
Deeplearning4j artifacts on Maven Central
ND4J artifacts on Maven Central
Datavec artifacts on Maven Central
Scala code for UCI notebook
Stack Overflow discussion
https://github.com/steveloughran/winutils
this page
https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template
model import from Keras
Deeplearning4j examples on Github
nd4j backend
simpler examples submodule
nd4j examples
computation graph API
Spark page
Spark with Mesos
Android page
C++ framework libnd4j
Arbiter: hyperparameter optimization and model evaluation
DataVec: built-in ETL for machine-learning data pipelines
Samediff
community forum
community forum
all the knobs
monorepo
dl4j-examples
https://github.com/eclipse/deeplearning4j/issues
https://github.com/eclipse/deeplearning4j-examples/issues
community forums
community forums
https://github.com/eclipse/deeplearning4j
https://projects.eclipse.org/
https://accounts.eclipse.org/user/register
https://accounts.eclipse.org/user/eca
here
this link
this page
github actions workflow
update-versions.sh
-pl/--projects flag
this repo
baeldung guide
here
community forums
here
dl4j test resources
here
Andrew Ng's Machine-Learning Class on YouTube
Geoff Hinton's Neural Networks Class on YouTube
Patrick Winston's Introduction to Artificial Intelligence @MIT
Andrej Karpathy's Convolutional Neural Networks Class at Stanford
ML@B: Machine Learning Crash Course: Part 1
ML@B: Machine Learning Crash Course: Part 2
Gradient descent, how neural networks learn, Deep learning, part 2
Calculus Made Easy, by Silvanus P. Thompson
Seeing Theory: A Visual Introduction to Probability and Statistics
Andrew Ng's 6-Part Review of Linear Algebra
Khan Academy's Linear Algebra Course
Linear Algebra for Machine Learning
CMU's Linear Algebra Review
Math for Machine Learning
Immersive Linear Algebra
Probability Cheatsheet
The best linear algebra books
Markov Chains, Visually Explained
An Introduction to MCMC for Machine Learning
Eigenvectors, Eigenvalues, PCA, Covariance and Entropy
Markov Chain Monte Carlo (MCMC) & Machine Learning
Relearning Matrices as Linear Functions
Scratch: A Visual Programming Environment From MIT
Learn to Program (Ruby)
Grasshopper: A Mobile App to Learn Basic Coding (Javascript)
Intro to the Command Line
Additional command-line tutorial
A Vim Tutorial and Primer
Intro to Computer Science (CS50 @Harvard edX)
A Gentle Introduction to Machine Fundamentals
Teaching C
Theano
Keras
Lasagne
Learn Python the Hard Way
Google's Python Class
Udemy: Complete Python 3 Masterclass Journey
MIT: Introduction to Computer Science and Python Programming
David Beazley: Python Tutorials
CS231n: Python Numpy Tutorial
Pyret: A Python Learning Environment
Think Java: Interactive Web-based Dev Environment
Learn Java The Hard Way
Introduction to JShell
JShell in 5 Minutes
Java Resources
Java Ranch: A Community for Java Beginners
Intro to Programming in Java @Princeton
Head First Java
Java in a Nutshell
Java Programming for Complete Beginners in 250 Steps
examples
Quickstart
here
Stackoverflow
Math Stackexchange
workflows
overview doc
the community forums
Windows
Mac
Linux
armcompute library
our install scripts
this thread
JavaCPP presets
here
our forums
CUDA 11
11.2
https://community.konduit.ai/c/dl4j
here
this Wikipedia page
link
Workspaces
Memory
Discourse
heap space
IntelliJ, this is a VM parameter
Java main class
InfoQ article
JavaCPP
relevant code
Javadoc here
Javadoc here
pickles
hdf5
pre-save datasets
load a pre-saved dataset
DL4J Examples
here
-Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G
WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
                .initialSize(1000000000)
                .policyLocation(LocationPolicy.MMAP)
                .build();

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
    INDArray x = Nd4j.create(10000);
}
WorkspaceConfiguration basicConfig = WorkspaceConfiguration.builder()
    .policyAllocation(AllocationPolicy.STRICT)
    .policyLearning(LearningPolicy.FIRST_LOOP)
    .policyMirroring(MirroringPolicy.HOST_ONLY) // <--- this option does this trick
    .policySpill(SpillPolicy.EXTERNAL)
    .build();
o.n.l.f.Nd4jBackend - Loaded [CpuBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for NativeOps: 8
o.n.n.Nd4jBlas - Number of threads used for BLAS: 8
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Windows 10]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [MKL]
13:08:09,042 INFO  ~ Loaded [JCublasBackend] backend
13:08:13,061 INFO  ~ Number of threads used for NativeOps: 32
13:08:14,265 INFO  ~ Number of threads used for BLAS: 0
13:08:14,274 INFO  ~ Backend used: [CUDA]; OS: [Windows 10]
13:08:14,274 INFO  ~ Cores: [16]; Memory: [7.1GB];
13:08:14,274 INFO  ~ Blas vendor: [CUBLAS]
13:08:14,274 INFO  ~ Device Name: [TITAN X (Pascal)]; CC: [6.1]; Total/free memory: [12884901888]
o.d.n.l.c.ConvolutionLayer - cuDNN not found: use cuDNN for better GPU performance by including the deeplearning4j-cuda module. For more information, please refer to: https://deeplearning4j.org/cudnn
java.lang.ClassNotFoundException: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
MultiLayerNetwork net = ...
LayerHelper h = net.getLayer(0).getHelper();    //Index 0: assume layer 0 is a ConvolutionLayer in this example
System.out.println("Layer helper: " + (h == null ? null : h.getClass().getName()));
Layer helper: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
Layer helper: null
MultiLayerNetwork net = ...
net.setListeners(new PerformanceListener(1));       //Logs ETL and iteration speed on each iteration
.d.o.l.PerformanceListener - ETL: 0 ms; iteration 16; iteration time: 65 ms; samples/sec: 492.308; batches/sec: 15.384;
Nd4j.getMemoryManager().setAutoGcWindow(10000);             //Set to 10 seconds (10000ms) between System.gc() calls
Nd4j.getMemoryManager().togglePeriodicGc(false);            //Disable periodic GC calls
int listenerFrequency = 1;
boolean reportScore = true;
boolean reportGC = true;
net.setListeners(new PerformanceListener(listenerFrequency, reportScore, reportGC));
o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 30; iteration time: 17 ms; samples/sec: 588.235; batches/sec: 58.824; score: 0.7229335801186025; GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)];
5.938: [GC (System.gc()) [PSYoungGen: 5578K->96K(153088K)] 9499K->4016K(502784K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
5.939: [Full GC (System.gc()) [PSYoungGen: 96K->0K(153088K)] [ParOldGen: 3920K->3911K(349696K)] 4016K->3911K(502784K), [Metaspace: 22598K->22598K(1069056K)], 0.0117132 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
System.out.println("ND4J Data Type Setting: " + Nd4j.dataType());
System.out.println("Training workspace config: " + net.getLayerWiseConfigurations().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + net.getLayerWiseConfigurations().getInferenceWorkspaceMode());
System.out.println("Training workspace config: " + cg.getConfiguration().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + cg.getConfiguration().getInferenceWorkspaceMode());
spark-submit
    --conf 'spark.executor.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/executor/,tracing_settings_path=/home/user/yourkitconf.txt'
    --conf 'spark.driver.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/driver/,tracing_settings_path=/home/user/yourkitconf.txt'
    <other spark submit arguments>
walltime=*
adaptive=true
adaptive_min_method_invocation_count=1000
adaptive_max_average_method_time_ns=100000
<dependency>
 <groupId>org.nd4j</groupId>
 <artifactId>nd4j-cuda-11.2</artifactId>
 <version>1.0.0-M1.1</version>
</dependency>
<dependency>
 <groupId>org.nd4j</groupId>
 <artifactId>nd4j-native</artifactId>
 <version>1.0.0-M1.1</version>
</dependency>
<dependency>
 ...
 <artifactId>nd4j-native-platform</artifactId>
 ...
</dependency>
BACKEND_PRIORITY_CPU=SOME_NUM
BACKEND_PRIORITY_GPU=SOME_NUM
 org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
    at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:221)
    at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5091)
    ... 2 more
// Gets Path to Text file
String filePath = new File(dataLocalPath,"raw_sentences.txt").getAbsolutePath();
// Strip white space before and after for each line
SentenceIterator iter = new BasicLineIterator(filePath);
 public static void main(String[] args) throws Exception {

        dataLocalPath = DownloaderUtility.NLPDATA.Download();
        // Gets Path to Text file
        String filePath = new File(dataLocalPath,"raw_sentences.txt").getAbsolutePath();

        log.info("Load & Vectorize Sentences....");
        // Strip white space before and after for each line
        SentenceIterator iter = new BasicLineIterator(filePath);
        // Split on white spaces in the line to get words
        TokenizerFactory t = new DefaultTokenizerFactory();

        /*
            CommonPreprocessor will apply the following regex to each token: [\d\.:,"'\(\)\[\]|/?!;]+
            So, effectively all numbers, punctuation symbols and some special symbols are stripped off.
            Additionally it forces lower case for all tokens.
         */
        t.setTokenPreProcessor(new CommonPreprocessor());
package org.deeplearning4j.examples.nlp.word2vec;

import org.deeplearning4j.examples.download.DownloaderUtility;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.util.Collection;

/**
 * Created by agibsonccc on 10/9/14.
 *
 * Neural net that processes text into wordvectors. See below url for an in-depth explanation.
 * https://deeplearning4j.org/word2vec.html
 */
public class Word2VecRawTextExample {

    private static Logger log = LoggerFactory.getLogger(Word2VecRawTextExample.class);

    public static String dataLocalPath;


    public static void main(String[] args) throws Exception {

        dataLocalPath = DownloaderUtility.NLPDATA.Download();
        // Gets Path to Text file
        String filePath = new File(dataLocalPath,"raw_sentences.txt").getAbsolutePath();

        log.info("Load & Vectorize Sentences....");
        // Strip white space before and after for each line
        SentenceIterator iter = new BasicLineIterator(filePath);
        // Split on white spaces in the line to get words
        TokenizerFactory t = new DefaultTokenizerFactory();

        /*
            CommonPreprocessor will apply the following regex to each token: [\d\.:,"'\(\)\[\]|/?!;]+
            So, effectively all numbers, punctuation symbols and some special symbols are stripped off.
            Additionally it forces lower case for all tokens.
         */
        t.setTokenPreProcessor(new CommonPreprocessor());

        log.info("Building model....");
        Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)
                .iterations(1)
                .layerSize(100)
                .seed(42)
                .windowSize(5)
                .iterate(iter)
                .tokenizerFactory(t)
                .build();

        log.info("Fitting Word2Vec model....");
        vec.fit();

        log.info("Writing word vectors to text file....");

        // Prints out the closest 10 words to "day". An example on what to do with these Word Vectors.
        log.info("Closest Words:");
        Collection<String> lst = vec.wordsNearestSum("day", 10);
        log.info("10 Words closest to 'day': {}", lst);
    }
}
implementation "org.deeplearning4j:deeplearning4j-core:1.0.0-M1"
implementation "org.nd4j:nd4j-native-platform:1.0.0-M1"
libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-M1"
libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "1.0.0-M1"
<dependency org="org.deeplearning4j" name="deeplearning4j-core" rev="1.0.0-M1" conf="build" />
<dependency org="org.nd4j" name="nd4j-native-platform" rev="1.0.0-M1" conf="build" />
    .labels(Arrays.asList("negative", "neutral","positive"))
public void testDifferentLabels() throws Exception {
    ClassPathResource resource = new ClassPathResource("/labeled");
    File file = resource.getFile();
    LabelAwareSentenceIterator iter = LabelAwareUimaSentenceIterator.createWithPath(file.getAbsolutePath());

    TokenizerFactory t = new UimaTokenizerFactory();

    ParagraphVectors vec = new ParagraphVectors.Builder()
            .minWordFrequency(1).labels(Arrays.asList("negative", "neutral","positive"))
            .layerSize(100)
            .stopWords(new ArrayList<String>())
            .windowSize(5).iterate(iter).tokenizerFactory(t).build();

    vec.fit();

    assertNotEquals(vec.lookupTable().vector("UNK"), vec.lookupTable().vector("negative"));
    assertNotEquals(vec.lookupTable().vector("UNK"),vec.lookupTable().vector("positive"));
    assertNotEquals(vec.lookupTable().vector("UNK"),vec.lookupTable().vector("neutral"));}
SentenceIterator iter = new LineSentenceIterator(new File("your file"));
Collection<String> sentences = ...;
SentenceIterator iter = new CollectionSentenceIterator(sentences);
SentenceIterator iter = new FileSentenceIterator(new File("your dir or file"));
TokenizerFactory tokenizerFactory = new DefaultTokenizerFactory();
Tokenizer tokenizer = tokenizerFactory.tokenize("mystring");

//iterate over the tokens
while(tokenizer.hasMoreTokens()) {
      String token = tokenizer.nextToken();
}

//get the whole list of tokens
List<String> tokens = tokenizer.getTokens();
addToken(new VocabWord(1.0,"myword"));
addWordToIndex(0, Word2Vec.UNK);
putVocabWord(Word2Vec.UNK);
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])
model.save('full_model.h5')  # save everything in HDF5 format

model_json = model.to_json()  # save just the config. replace with "to_yaml" for YAML serialization
with open("model_config.json", "w") as f:
    f.write(model_json)

model.save_weights('model_weights.h5') # save just the weights.
String fullModel = new ClassPathResource("full_model.h5").getFile().getPath();
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(fullModel);
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(fullModel, false);
String modelJson = new ClassPathResource("model_config.json").getFile().getPath();
MultiLayerConfiguration modelConfig = KerasModelImport.importKerasSequentialConfiguration(modelJson);
String modelWeights = new ClassPathResource("model_weights.h5").getFile().getPath();
MultiLayerNetwork network = KerasModelImport.importKerasSequentialModelAndWeights(modelJson, modelWeights);
<dependencies>
  <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>
<dependencies>
  <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])

model.save('simple_mlp.h5')
String simpleMlp = new ClassPathResource("simple_mlp.h5").getFile().getPath();
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(simpleMlp);
INDArray input = Nd4j.create(DataType.FLOAT, 256, 100);
INDArray output = model.output(input);
model.fit(input, output);
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-modelimport</artifactId>
    <version>1.0.0-beta6</version> <!-- This version should match that of your other DL4J project dependencies. -->
</dependency>
public static ComputationGraph importKerasModelAndWeights( InputStream modelHdf5Stream, boolean enforceTrainingConfig)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(InputStream modelHdf5Stream) throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(InputStream modelHdf5Stream,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(InputStream modelHdf5Stream)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelHdf5Filename, int[] inputShape,
                                                              boolean enforceTrainingConfig)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelHdf5Filename, boolean enforceTrainingConfig)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelHdf5Filename)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelHdf5Filename,
                                                                         int[] inputShape,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelHdf5Filename,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelHdf5Filename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelJsonFilename, String weightsHdf5Filename,
                                                              boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelJsonFilename, String weightsHdf5Filename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelJsonFilename,
                                                                         String weightsHdf5Filename,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelJsonFilename,
                                                                         String weightsHdf5Filename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraphConfiguration importKerasModelConfiguration(String modelJsonFilename,
                                                                              boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraphConfiguration importKerasModelConfiguration(String modelJsonFilename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerConfiguration importKerasSequentialConfiguration(String modelJsonFilename,
                                                                             boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerConfiguration importKerasSequentialConfiguration(String modelJsonFilename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Sgd(0.05))
        // ... other hyperparameters
        .list()
        .backprop(true)
        .build();
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
                .build())
java -version
mvn --version
brew install maven
$ git clone git://git.kernel.org/pub/scm/git/git.git
xcode-select --install
git clone https://github.com/eclipse/deeplearning4j-examples.git
cd dl4j-examples/
mvn clean install
-Djava.library.path=""
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:624)
at org.deeplearning4j.examples.feedforward.anomalydetection.MNISTAnomalyExample.main(MNISTAnomalyExample.java:46)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5556)
at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:189)
... 2 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:259)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5553)
... 3 more
org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException:
No SameDiff Lambda layer found for Lambda layer lambda_123. You can register a SameDiff Lambda layer using 
KerasLayer.registerLambdaLayer(lambdaLayerName, sameDiffLambdaLayer);
org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: 
Unsupported keras layer type LayerName.
public class TimesThreeLambda extends SameDiffLambdaLayer {
    @Override
    public SDVariable defineLayer(SameDiff sd, SDVariable x) { 
        return x.mul(3); 
    }

    @Override
    public InputType getOutputType(int layerIndex, InputType inputType) {
        return inputType; 
    }
}
KerasLayer.registerLambdaLayer("lambda_2", new TimesThreeLambda());
KerasLayer.registerCustomLayer("PoolHelper", KerasPoolHelper.class);

Functional Models

Importing the functional model.

Getting started with importing Keras functional Models

Let's say you start with defining a simple MLP using Keras' functional API:

from keras.models import Model
from keras.layers import Dense, Input

inputs = Input(shape=(100,))
x = Dense(64, activation='relu')(inputs)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])

In Keras there are several ways to save a model. You can store the whole model (model definition, weights and training configuration) as HDF5 file, just the model configuration (as JSON or YAML file) or just the weights (as HDF5 file). Here's how you do each:

model.save('full_model.h5')  # save everything in HDF5 format

model_json = model.to_json()  # save just the config. replace with "to_yaml" for YAML serialization
with open("model_config.json", "w") as f:
    f.write(model_json)

model.save_weights('model_weights.h5') # save just the weights.

If you decide to save the full model, you will have access to the training configuration of the model, otherwise you don't. So if you want to further train your model in DL4J after import, keep that in mind and use model.save(...) to persist your model.

Loading your Keras model

Let's start with the recommended way, loading the full model back into DL4J (we assume it's on your class path):

String fullModel = new ClassPathResource("full_model.h5").getFile().getPath();
ComputationGraph model = KerasModelImport.importKerasModelAndWeights(fullModel);

In case you didn't compile your Keras model, it will not come with a training configuration. In that case you need to explicitly tell model import to ignore training configuration by setting the enforceTrainingConfig flag to false like this:

ComputationGraph model = KerasModelImport.importKerasModelAndWeights(fullModel, false);

To load just the model configuration from JSON, you use KerasModelImport as follows:

String modelJson = new ClassPathResource("model_config.json").getFile().getPath();
ComputationGraphConfiguration modelConfig = KerasModelImport.importKerasModelConfiguration(modelJson);

If additionally you also want to load the model weights with the configuration, here's what you do:

String modelWeights = new ClassPathResource("model_weights.h5").getFile().getPath();
ComputationGraph network = KerasModelImport.importKerasModelAndWeights(modelJson, modelWeights);

In the latter two cases no training configuration will be read.


Optimizers

Supported Keras optimizers

All standard Keras optimizers are supported, but importing custom TensorFlow optimizers won't work:

  • SGD

  • RMSprop

  • Adagrad

  • Adadelta

  • Adam

  • Adamax

  • Nadam

  • TFOptimizer (custom TensorFlow optimizers; not supported)

Snapshots

Using daily builds for access to the latest Eclipse Deeplearning4j features.


We provide automated daily builds of repositories such as ND4J, DataVec, DeepLearning4j and RL4J, so the newest functionality and the most recent bug fixes are available daily.

Snapshots work like any other Maven dependency. The only difference is that they are served from a custom repository rather than from Maven Central.

Due to ongoing development, snapshots should be considered less stable than releases: breaking changes or bugs can in principle be introduced at any point during the course of normal development. Typically, releases (not snapshots) should be used when possible, unless a bug fix or new feature is required.

Step 1: To use snapshots in your project, you should add snapshot repository information like this to your pom.xml file:

<repositories>
    <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>  <!-- Optional, update daily -->
        </snapshots>
    </repository>
</repositories>

If you are using version properties, as the DL4J examples do, change them as follows. From version:

<dl4j.version>1.0.0-beta6</dl4j.version>
<nd4j.version>1.0.0-beta6</nd4j.version>

To version:

<dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
<nd4j.version>1.0.0-SNAPSHOT</nd4j.version>

Sample pom.xml using Snapshots

Both -platform (all operating systems) and single OS (non-platform) snapshot dependencies are released. Due to the multi-platform build nature of snapshots, it is possible (though rare) for the -platform artifacts to temporarily get out of sync, which can cause build issues.

If you are building and deploying on just one platform, it is safer to use the non-platform artifacts, such as:

        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-native</artifactId>
            <version>${nd4j.version}</version>
        </dependency>

Two Maven command-line options can be useful when working with snapshot dependencies:

1. -U - for example, mvn package -U. This option forces Maven to check for (and if necessary, download) new snapshot releases. It can be useful if you need to be sure you have the absolute latest snapshot release.

2. -nsu - for example, mvn package -nsu. This option stops Maven from checking for new snapshot releases. Note, however, that your build will only succeed with this option if you already have the required snapshot dependencies downloaded into your local Maven cache (the .m2 directory).

An alternative to (1) is to set <updatePolicy>always</updatePolicy> in the <repositories> section shown earlier on this page; an alternative to (2) is to set <updatePolicy>never</updatePolicy> there instead.

Snapshots will not work with Gradle. You must use Maven to download the files. After that, you may try using your local Maven repository with mavenLocal().

To download specific snapshot artifacts into your local Maven repository, you can run the following Maven command:

mvn dependency:get -DremoteRepositories=snapshots::::https://oss.sonatype.org/content/repositories/snapshots -Dartifact=org.nd4j:nd4j-native:1.0.0-SNAPSHOT:jar:macosx-x86_64

In this example, the command downloads the nd4j-native (CPU backend) artifact for macOS, using the same macosx-x86_64 classifier as the Gradle example below. If you are on Windows or Linux, you'd use windows-x86_64 or linux-x86_64 respectively.

version '1.0-SNAPSHOT'

apply plugin: 'java'

sourceCompatibility = 1.8

repositories {
    maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
    mavenCentral()
}

dependencies {
    compile group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '1.0.0-SNAPSHOT'
    compile group: 'org.deeplearning4j', name: 'deeplearning4j-modelimport', version: '1.0.0-SNAPSHOT'
    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT"
    // Use windows-x86_64 or linux-x86_64 if you are not on macos
    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT:macosx-x86_64"
    testCompile group: 'junit', name: 'junit', version: '4.12'

}

Convolutional Layers

KerasConvolution2D

Imports a 2D Convolution layer from Keras.

KerasConvolution2D

public KerasConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getConvolution2DLayer

public ConvolutionLayer getConvolution2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping2D

Imports a Keras Cropping 2D layer.

KerasCropping2D

public KerasCropping2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping2DLayer

public Cropping2D getCropping2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasUpsampling3D

Keras Upsampling3D layer support

KerasUpsampling3D

public KerasUpsampling3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling3DLayer

public Upsampling3D getUpsampling3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Invalid Keras configuration exception

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolution1D

Imports a 1D Convolution layer from Keras.

KerasConvolution1D

public KerasConvolution1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException

getConvolution1DLayer

public Convolution1DLayer getConvolution1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException

  • throws UnsupportedKerasConfigurationException

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights Map from parameter name to INDArray.

KerasUpsampling1D

Keras Upsampling1D layer support

KerasUpsampling1D

public KerasUpsampling1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling1DLayer

public Upsampling1D getUpsampling1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Invalid Keras configuration exception

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasAtrousConvolution2D

Keras 2D atrous / dilated convolution layer. Note that in keras 2 this layer has been removed and dilations are now available through the “dilated” argument in regular Conv2D layers

author: Max Pumperla

KerasAtrousConvolution2D

public KerasAtrousConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getAtrousConvolution2D

public ConvolutionLayer getAtrousConvolution2D()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasAtrousConvolution1D

Keras 1D atrous / dilated convolution layer. Note that in keras 2 this layer has been removed and dilations are now available through the “dilated” argument in regular Conv1D layers

author: Max Pumperla

KerasAtrousConvolution1D

public KerasAtrousConvolution1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getAtrousConvolution1D

public Convolution1DLayer getAtrousConvolution1D()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping3D

Imports a Keras Cropping 3D layer.

KerasCropping3D

public KerasCropping3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping3DLayer

public Cropping3D getCropping3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding2D

Imports a Keras ZeroPadding 2D layer.

KerasZeroPadding2D

public KerasZeroPadding2D(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding2DLayer

public ZeroPaddingLayer getZeroPadding2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolution3D

Imports a 3D Convolution layer from Keras.

KerasConvolution3D

public KerasConvolution3D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getConvolution3DLayer

public ConvolutionLayer getConvolution3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasDeconvolution2D

Imports a 2D Deconvolution layer from Keras.

KerasDeconvolution2D

public KerasDeconvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getDeconvolution2DLayer

public Deconvolution2D getDeconvolution2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding3D

Imports a Keras ZeroPadding 3D layer.

KerasZeroPadding3D

public KerasZeroPadding3D(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding3DLayer

public ZeroPadding3DLayer getZeroPadding3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolutionUtils

Utility functionality for Keras convolution layers.

getConvolutionModeFromConfig

public static ConvolutionMode getConvolutionModeFromConfig(Map<String, Object> layerConfig,
                                                               KerasLayerConfiguration conf)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Get convolution border mode from Keras layer configuration.

  • param layerConfig dictionary containing Keras layer configuration

  • return DL4J ConvolutionMode

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding1D

Imports a Keras ZeroPadding 1D layer.

KerasZeroPadding1D

public KerasZeroPadding1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding1DLayer

public ZeroPadding1DLayer getZeroPadding1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping1D

Imports a Keras Cropping 1D layer.

KerasCropping1D

public KerasCropping1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping1DLayer

public Cropping1D getCropping1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasSpaceToDepth

Constructor from parsed Keras layer configuration dictionary.

KerasSpaceToDepth

public KerasSpaceToDepth(Map<String, Object> layerConfig, boolean enforceTrainingConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getSpaceToDepthLayer

public SpaceToDepthLayer getSpaceToDepthLayer()

Get DL4J SpaceToDepth layer.

  • return SpaceToDepth layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasUpsampling2D

Keras Upsampling2D layer support

KerasUpsampling2D

public KerasUpsampling2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling2DLayer

public Upsampling2D getUpsampling2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Invalid Keras configuration exception

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasSeparableConvolution2D

Keras separable convolution 2D layer support

KerasSeparableConvolution2D

public KerasSeparableConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras configuration

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

getSeparableConvolution2DLayer

public SeparableConvolution2D getSeparableConvolution2DLayer()

Get DL4J SeparableConvolution2D.

  • return SeparableConvolution2D

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasDepthwiseConvolution2D

Keras depth-wise convolution 2D layer support

KerasDepthwiseConvolution2D

public KerasDepthwiseConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras configuration

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

getDepthwiseConvolution2DLayer

public DepthwiseConvolution2D getDepthwiseConvolution2DLayer()

Get DL4J DepthwiseConvolution2D.

  • return DepthwiseConvolution2D

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

Local Layers

KerasLocallyConnected1D

Imports a 1D locally connected layer from Keras.

KerasLocallyConnected1D

public KerasLocallyConnected1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getLocallyConnected1DLayer

public LocallyConnected1D getLocallyConnected1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for 1D locally connected layer.

  • param weights Map from parameter name to INDArray.

KerasLocallyConnected2D

Imports a 2D locally connected layer from Keras.

KerasLocallyConnected2D

public KerasLocallyConnected2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getLocallyConnected2DLayer

public LocallyConnected2D getLocallyConnected2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for 2D locally connected layer.

  • param weights Map from parameter name to INDArray.

Core Layers

KerasPermute

Imports Permute layer from Keras

KerasPermute

public KerasPermute(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

isInputPreProcessor

public boolean isInputPreProcessor()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws
            InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasFlatten

Imports a Keras Flatten layer as a DL4J {Cnn,Rnn}ToFeedForwardInputPreProcessor.

KerasFlatten

public KerasFlatten(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

isInputPreProcessor

public boolean isInputPreProcessor()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasReshape

Imports Reshape layer from Keras

KerasReshape

public KerasReshape(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

isInputPreProcessor

public boolean isInputPreProcessor()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasMerge

Imports a Keras Merge layer as a DL4J Merge (graph) vertex.

TODO: handle axes arguments that alter merge behavior (requires changes to DL4J?)

KerasMerge

public KerasMerge(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType)

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

KerasDropout

Imports a Dropout layer from Keras.

KerasDropout

public KerasDropout(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getDropoutLayer

public DropoutLayer getDropoutLayer()

Get DL4J DropoutLayer.

  • return DropoutLayer

KerasMasking

Imports Keras masking layers.

KerasMasking

public KerasMasking(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getMaskingLayer

public MaskZeroLayer getMaskingLayer()

Get DL4J MaskZeroLayer.

  • return MaskZeroLayer

KerasSpatialDropout

Keras wrapper for DL4J dropout layer with SpatialDropout, works 1D-3D.

KerasSpatialDropout

public KerasSpatialDropout(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSpatialDropoutLayer

public DropoutLayer getSpatialDropoutLayer()

Get DL4J DropoutLayer with spatial dropout.

  • return DropoutLayer

KerasLambda

Wraps a DL4J SameDiffLambda into a KerasLayer

KerasLambda

public KerasLambda(Map<String, Object> layerConfig, SameDiffLayer sameDiffLayer)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSameDiffLayer

public SameDiffLayer getSameDiffLayer()

Get DL4J SameDiffLayer.

  • return SameDiffLayer

KerasActivation

Imports an Activation layer from Keras.

KerasActivation

public KerasActivation(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getActivationLayer

public ActivationLayer getActivationLayer()

Get DL4J ActivationLayer.

  • return ActivationLayer

KerasDense

Imports a Dense layer from Keras.

KerasDense

public KerasDense(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getDenseLayer

public DenseLayer getDenseLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters (2)

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights Dense layer weights

KerasRepeatVector

Imports a Keras RepeatVector layer

KerasRepeatVector

public KerasRepeatVector(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getRepeatVectorLayer

public RepeatVector getRepeatVectorLayer()

Get DL4J RepeatVector.

  • return RepeatVector

Activations

Supported Keras activations.

  • softmax

  • elu

  • selu

  • softplus

  • softsign

  • relu

  • tanh

  • sigmoid

  • hard_sigmoid

  • linear

Pooling Layers

KerasPooling1D

Imports a Keras 1D Pooling layer as a DL4J Subsampling layer.

KerasPooling1D

public KerasPooling1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling1DLayer

public Subsampling1DLayer getSubsampling1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasPoolingUtils

Utility functionality for Keras pooling layers.

mapPoolingType

public static PoolingType mapPoolingType(String className, KerasLayerConfiguration conf)
            throws UnsupportedKerasConfigurationException

Map Keras pooling layers to DL4J pooling types.

  • param className name of the Keras pooling class

  • return DL4J pooling type

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

KerasPooling3D

Imports a Keras 3D Pooling layer as a DL4J Subsampling3D layer.

KerasPooling3D

public KerasPooling3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling3DLayer

public Subsampling3DLayer getSubsampling3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasGlobalPooling

Imports a Keras Pooling layer as a DL4J Subsampling layer.

KerasGlobalPooling

public KerasGlobalPooling(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getGlobalPoolingLayer

public GlobalPoolingLayer getGlobalPoolingLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasPooling2D

Imports a Keras 2D Pooling layer as a DL4J Subsampling layer.

KerasPooling2D

public KerasPooling2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling2DLayer

public SubsamplingLayer getSubsampling2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

Supported Features Overview

Supported Keras features.

Keras Model Import: Supported Features

Note that importing tf.keras models is also supported. The format changed only slightly from keras to tf.keras; this transition is handled from beta7 and above.
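
As a minimal sketch (the file name is a placeholder), a model saved from tf.keras with model.save(...) to a single HDF5 file goes through the same KerasModelImport entry point shown earlier:

// Import a tf.keras model saved as an HDF5 file; 'tf_keras_model.h5' is a placeholder name
ComputationGraph model = KerasModelImport.importKerasModelAndWeights("tf_keras_model.h5");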

  • ❌ ActivityRegularization

  • ❌ SeparableConv1D

  • ❌ Conv3DTranspose

  • ❌ GRU

  • ❌ ConvLSTM2D

Merge Layers

  • ✅ Add / add

  • ✅ Multiply / multiply

  • ✅ Subtract / subtract

  • ✅ Average / average

  • ✅ Maximum / maximum

  • ✅ Concatenate / concatenate

  • ❌ Dot / dot

Advanced Activation Layers

  • ✅ ELU

Noise Layers

Layer Wrappers

  • ❌ TimeDistributed

Losses

  • ✅ mean_squared_error

  • ✅ mean_absolute_error

  • ✅ mean_absolute_percentage_error

  • ✅ mean_squared_logarithmic_error

  • ✅ squared_hinge

  • ✅ hinge

  • ✅ categorical_hinge

  • ❌ logcosh

  • ✅ categorical_crossentropy

  • ✅ sparse_categorical_crossentropy

  • ✅ binary_crossentropy

  • ✅ kullback_leibler_divergence

  • ✅ poisson

  • ✅ cosine_proximity

Activations

  • ✅ softmax

  • ✅ elu

  • ✅ selu

  • ✅ softplus

  • ✅ softsign

  • ✅ relu

  • ✅ tanh

  • ✅ sigmoid

  • ✅ hard_sigmoid

  • ✅ linear

Initializers

  • ✅ Zeros

  • ✅ Ones

  • ✅ Constant

  • ✅ RandomNormal

  • ✅ RandomUniform

  • ✅ TruncatedNormal

  • ✅ VarianceScaling

  • ✅ Orthogonal

  • ✅ Identity

  • ✅ lecun_uniform

  • ✅ lecun_normal

  • ✅ glorot_normal

  • ✅ glorot_uniform

  • ✅ he_normal

  • ✅ he_uniform

Regularizers

  • ✅ l1

  • ✅ l2

  • ✅ l1_l2

Constraints

  • ✅ max_norm

  • ✅ non_neg

  • ✅ unit_norm

  • ✅ min_max_norm

Optimizers

  • ✅ SGD

  • ✅ RMSprop

  • ✅ Adagrad

  • ✅ Adadelta

  • ✅ Adam

  • ✅ Adamax

  • ✅ Nadam

  • ❌ TFOptimizer

Recurrent Layers

KerasSimpleRnn

Imports a Keras SimpleRNN layer as a DL4J SimpleRnn layer.

KerasSimpleRnn

public KerasSimpleRnn(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSimpleRnnLayer

public Layer getSimpleRnnLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters (12)

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getUnroll

public boolean getUnroll()

Get whether SimpleRnn layer should be unrolled (for truncated BPTT).

  • return whether RNN should be unrolled (boolean)

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights Simple RNN weights

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

KerasRnnUtils

Utility functions for Keras RNN layers

getUnrollRecurrentLayer

public static boolean getUnrollRecurrentLayer(KerasLayerConfiguration conf, Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException

Get unroll parameter to decide whether to unroll RNN with BPTT or not.

  • param conf KerasLayerConfiguration

  • param layerConfig dictionary containing Keras layer properties

  • return boolean unroll parameter

  • throws InvalidKerasConfigurationException Invalid Keras configuration

getRecurrentDropout

public static double getRecurrentDropout(KerasLayerConfiguration conf, Map<String, Object> layerConfig)
            throws UnsupportedKerasConfigurationException, InvalidKerasConfigurationException

Get recurrent weight dropout from Keras layer configuration. Non-zero dropout rates are currently not supported.

  • param conf KerasLayerConfiguration

  • param layerConfig dictionary containing Keras layer properties

  • return recurrent dropout rate

  • throws InvalidKerasConfigurationException Invalid Keras configuration

KerasLSTM

Imports a Keras LSTM layer as a DL4J LSTM layer.

KerasLSTM

public KerasLSTM(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getLSTMLayer

public Layer getLSTMLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters (12)

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights LSTM layer weights

getUnroll

public boolean getUnroll()

Get whether LSTM layer should be unrolled (for truncated BPTT).

  • return whether to unroll the LSTM

getGateActivationFromConfig

public IActivation getGateActivationFromConfig(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Get LSTM gate activation function from Keras layer configuration.

  • param layerConfig dictionary containing Keras layer configuration

  • return LSTM inner activation function

  • throws InvalidKerasConfigurationException Invalid Keras config

getForgetBiasInitFromConfig

public double getForgetBiasInitFromConfig(Map<String, Object> layerConfig, boolean train)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Get LSTM forget gate bias initialization from Keras layer configuration.

  • param layerConfig dictionary containing Keras layer configuration

  • return LSTM forget gate bias init

  • throws InvalidKerasConfigurationException Unsupported Keras config

Wrapper Layers

KerasBidirectional

Builds a DL4J Bidirectional layer from a Keras Bidirectional layer wrapper

KerasBidirectional

public KerasBidirectional(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getUnderlyingRecurrentLayer

public Layer getUnderlyingRecurrentLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getBidirectionalLayer

public Bidirectional getBidirectionalLayer()

Get DL4J Bidirectional layer.

  • return Bidirectional Layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for Bidirectional layer.

  • param weights Map of weights

Losses

Supported Keras loss functions.

  • mean_squared_error

  • mean_absolute_error

  • mean_absolute_percentage_error

  • mean_squared_logarithmic_error

  • squared_hinge

  • hinge

  • categorical_hinge

  • logcosh

  • categorical_crossentropy

  • sparse_categorical_crossentropy

  • binary_crossentropy

  • kullback_leibler_divergence

  • poisson

  • cosine_proximity

Visualization

How to visualize, monitor and debug neural network learning.

Contents

Note: The information on this page pertains to DL4J versions 1.0.0-beta6 and later.

DL4J provides a user interface to visualize in your browser (in real time) the current network status and progress of training. The UI is typically used to help with tuning neural networks - i.e., the selection of hyperparameters (such as the learning rate) to obtain good performance for a network.

Step 1: Add the Deeplearning4j UI dependency to your project.

Step 2: Enable the UI in your project

This is relatively straightforward:

To access the UI, open your browser and go to http://localhost:9000/train/overview. You can set the port by using the org.deeplearning4j.ui.port system property: i.e., to use port 9001, pass the following to the JVM on launch: -Dorg.deeplearning4j.ui.port=9001

Information will then be collected and routed to the UI when you call the fit method on your network.

The overview page (one of 3 available pages) contains the following information:

  • Top left: score vs iteration chart - this is the value of the loss function on the current minibatch

  • Top right: model and training information

  • Bottom left: Ratio of parameters to updates (by layer) for all network weights vs. iteration

  • Bottom right: Standard deviations (vs. time) of: activations, gradients and updates

Note that for the bottom two charts, these are displayed as the logarithm (base 10) of the values. Thus a value of -3 on the update:parameter ratio chart corresponds to a ratio of 10^-3 = 0.001.

The ratio of updates to parameters is specifically the ratio of mean magnitudes of these values (i.e., log10(mean(abs(updates))/mean(abs(parameters)))).
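As a rough illustration of the quantity being plotted, this ratio can be computed by hand with ND4J. This is a minimal sketch, not part of the UI code; the two arrays below are stand-ins for a layer's parameters and the updates applied to them.

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

// Stand-in arrays: in practice these would be a layer's parameters and its updates
INDArray parameters = Nd4j.rand(1, 1000).subi(0.5);           // roughly in [-0.5, 0.5]
INDArray updates = Nd4j.rand(1, 1000).subi(0.5).muli(1e-3);

double meanMagUpdates = Transforms.abs(updates).meanNumber().doubleValue();
double meanMagParams = Transforms.abs(parameters).meanNumber().doubleValue();

// The UI plots the base-10 logarithm of this ratio; around -3 (i.e., 0.001) is a common target
double logRatio = Math.log10(meanMagUpdates / meanMagParams);
System.out.println("log10(update:parameter ratio) = " + logRatio);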

See the later section of this page on how to use these values in practice.

The model page contains a graph of the neural network layers, which operates as a selection mechanism. Click on a layer to display information for it.

On the right, the following charts are available, after selecting a layer:

  • Table of layer information

  • Update to parameter ratio for this layer, as per the overview page. The components of this ratio (the parameter and update mean magnitudes) are also available via tabs.

  • Layer activations (mean and mean +/- 2 standard deviations) over time

  • Histograms of parameters and updates, for each parameter type

  • Learning rate vs. time (note this will be flat, unless learning rate schedules are used)

Note: parameters are labeled as follows: weights (W) and biases (b). For recurrent neural networks, W refers to the weights connecting the layer to the layer below, and RW refers to the recurrent weights (i.e., those between time steps).

The DL4J UI can be used with Spark. However, as of 0.7.0, conflicting dependencies mean that running the UI and Spark in the same JVM can be difficult.

Two alternatives are available:

  1. Collect and save the relevant stats, to be visualized (offline) at a later point

  2. Run the UI in a separate server, and use the remote UI functionality to upload the data from the Spark master to your UI instance

Collecting Stats for Later Offline Use

Then, later you can load and display the saved information using:

Using the Remote UI Functionality

First, in the JVM running the UI (note this is the server):

This will require the deeplearning4j-ui dependency. (Note: this is the server, not the client - see below for the client, which uses deeplearning4j-ui-model.)

To avoid dependency conflicts with Spark, you should use the deeplearning4j-ui-model dependency to get the StatsListener, not the full deeplearning4j-ui UI dependency.

Note: you should replace UI_MACHINE_IP with the IP address of the machine running the user interface instance.

Tuning neural networks is often more an art than a science. However, here are some ideas that may be useful:

Overview Page - Model Score vs. Iteration Chart

The score vs. iteration should (overall) go down over time.

  • If the score increases consistently, your learning rate is likely set too high. Try reducing it until scores become more stable.

  • Increasing scores can also be indicative of other network issues, such as incorrect data normalization

  • If the score is flat or decreases very slowly (over a few hundred iterations) (a) your learning rate may be too low, or (b) you might be having difficulties with optimization. In the latter case, if you are using the SGD updater, try a different updater such as Nesterovs (momentum), RMSProp or Adagrad.

  • Note that data that isn't shuffled (i.e., each minibatch contains only one class, for classification) can result in very rough or abnormal-looking score vs. iteration graphs

  • Some noise in this line chart is expected (i.e., the line will go up and down within a small range). However, if the scores vary quite significantly between runs, or the variation is very large, this can be a problem

    • The issues mentioned above (learning rate, normalization, data shuffling) may contribute to this.

    • Setting the minibatch size to a very small number of examples can also contribute to noisy score vs. iteration graphs, and might lead to optimization difficulties

Overview Page and Model Page - Using the Update: Parameter Ratio Chart

  • The ratio of mean magnitude of updates to parameters is provided on both the overview and model pages

    • "Mean magnitude" = the average of the absolute value of the parameters or updates at the current time step

  • The most important use of this ratio is in selecting a learning rate. As a rule of thumb: this ratio should be around 1:1000 = 0.001. On the (log10) chart, this corresponds to a value of -3 (i.e., 10^-3 = 0.001)

    • Note that this is a rough guide only, and may not be appropriate for all networks. It's often a good starting point, however.

    • If the ratio diverges significantly from this (for example, > -2 (i.e., 10^-2 = 0.01) or < -4 (i.e., 10^-4 = 0.0001)), your parameters may be too unstable to learn useful features, or may change too slowly to learn useful features

    • To change this ratio, adjust your learning rate (or sometimes, parameter initialization). In some networks, you may need to set the learning rate differently for different layers.

  • Keep an eye out for unusually large spikes in the ratio: this may indicate exploding gradients

Model Page: Layer Activations (vs. Time) Chart

This chart can be used to detect vanishing or exploding activations (due to poor weight initialization, too much regularization, lack of data normalization, or too high a learning rate).

  • This chart should ideally stabilize over time (usually a few hundred iterations)

  • A good standard deviation for the activations is on the order of 0.5 to 2.0. Significantly outside of this range may indicate one of the problems mentioned above.

Model Page: Layer Parameters Histogram

The layer parameters histogram is displayed for the most recent iteration only.

  • For weights, these histograms should have an approximately Gaussian (normal) distribution, after some time

  • For biases, these histograms will generally start at 0, and will usually end up being approximately Gaussian

    • One exception to this is for LSTM recurrent neural network layers: by default, the biases for one gate (the forget gate) are set to 1.0 (though this is configurable), to help in learning dependencies across long time periods. This results in the bias graphs initially having many biases around 0.0, with another set of biases around 1.0

  • Keep an eye out for parameters that are diverging to +/- infinity: this may be due to too high a learning rate, or insufficient regularization (try adding some L2 regularization to your network).

  • Keep an eye out for biases that become very large. This can sometimes occur in the output layer for classification, if the distribution of classes is very imbalanced

Model Page: Layer Updates Histogram

The layer update histogram is displayed for the most recent iteration only.

  • Note that these are the updates - i.e., the gradients after applying learning rate, momentum, regularization etc

  • As with the parameter graphs, these should have an approximately Gaussian (normal) distribution

  • Keep an eye out for very large values: this can indicate exploding gradients in your network

    • Exploding gradients are problematic as they can 'mess up' the parameters of your network

    • In this case, it may indicate a weight initialization, learning rate or input/labels data normalization issue

Model Page: Parameter Learning Rates Chart

This chart simply shows the learning rates of the parameters of the selected layer, over time.

If you are not using learning rate schedules, the chart will be flat. If you are using learning rate schedules, you can use this chart to track the current value of the learning rate (for each parameter), over time.

The recommended solution (for Maven) is to use the Maven Shade plugin to produce an uber-jar, configured as follows:

Then, create your uber-jar with mvn package and run via cd target && java -cp dl4j-examples-0.9.1-bin.jar org.deeplearning4j.examples.userInterface.UIExample. Note the "-bin" suffix for the generated JAR file: this includes all dependencies.

Note also that this Maven Shade approach is configured for DL4J's examples repository.

Troubleshooting Training

Understanding common errors like NaNs and tuning hyperparameters.

Troubleshooting Neural Net Training

Neural networks can be difficult to tune. If the network hyperparameters are poorly chosen, the network may learn slowly, or perhaps not at all. This page aims to provide some baseline steps you should take when tuning your network.

Many of these tips have already been discussed in the academic literature. Our purpose is to consolidate them in one site and express them as clearly as possible.

Contents

What's the distribution of your data? Are you scaling it properly? As a general rule:

  • For continuous values: you want these to be in the range of -1 to 1, 0 to 1, or distributed normally with mean 0 and standard deviation 1. This does not have to be exact, but ensuring your inputs are approximately in this range can help during training. Scale down large inputs, and scale up small inputs.

  • For discrete classes (and, for classification problems, for the output), generally use a one-hot representation. That is, if you have 3 classes, then your data will be represented as [1,0,0], [0,1,0] or [0,0,1] for each of the 3 classes respectively.

Note that it's very important to use the exact same normalization method for both the training data and testing data.

Deeplearning4j supports several different kinds of weight initializations with the weightInit parameter. These are set using the .weightInit(WeightInit) method in your configuration.

You need to make sure your weights are neither too big nor too small. Xavier weight initialization is usually a good choice for this. For networks with rectified linear (relu) or leaky relu activations, RELU weight initialization is a sensible choice.
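For example, a minimal sketch of setting weight initialization globally and overriding it per layer (layer sizes here are illustrative only):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)                   // default for all layers
        .list()
        .layer(new DenseLayer.Builder().nIn(784).nOut(256)
                .activation(Activation.RELU)
                .weightInit(WeightInit.RELU)             // per-layer override, sensible for relu activations
                .build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(256).nOut(10).build())
        .build();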

An epoch is defined as a full pass of the data set.

The learning rate is one of, if not the most important hyperparameter. If this is too large or too small, your network may learn very poorly, very slowly, or not at all. Typical values for the learning rate are in the range of 0.1 to 1e-6, though the optimal learning rate is usually data (and network architecture) specific. Some simple advice is to start by trying three different learning rates – 1e-1, 1e-3, and 1e-6 – to get a rough idea of what it should be, before further tuning this. Ideally, you can run models with different learning rates simultaneously to save time.

For training neural networks in a distributed manner, you may need a different (frequently higher) learning rate compared to training the same network on a single machine.

Policies and Scheduling

Note that if you're using multiple GPUs, this will affect your scheduling. For example, if you have 2x GPUs, then you will need to divide the iterations in your schedule by 2, since the throughput of your training process will be double, and the learning rate schedule is only applicable to the local GPU.
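As a minimal sketch (assuming the MapSchedule and ScheduleType classes from ND4J; iteration numbers and rates are illustrative only), a schedule can be passed to the updater in place of a fixed learning rate:

import java.util.HashMap;
import java.util.Map;

import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.schedule.MapSchedule;
import org.nd4j.linalg.schedule.ScheduleType;

Map<Integer, Double> lrSchedule = new HashMap<>();
lrSchedule.put(0, 0.01);      // from iteration 0: learning rate 0.01
lrSchedule.put(1000, 0.005);  // from iteration 1000: learning rate 0.005
lrSchedule.put(3000, 0.001);  // from iteration 3000: learning rate 0.001

// Pass the schedule to the updater (e.g. via .updater(updater) in the configuration)
Nesterovs updater = new Nesterovs(new MapSchedule(ScheduleType.ITERATION, lrSchedule), 0.9);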

There are two aspects to be aware of, with regard to the choice of activation function.

First, the activation function of the hidden (non-output) layers. As a general rule, 'relu' or 'leakyrelu' activations are good choices for this. Some other activation functions (tanh, sigmoid, etc) are more prone to vanishing gradient problems, which can make learning much harder in deep neural networks. However, for LSTM layers, the tanh activation function is still commonly used.

Second, regarding the activation function for the output layer: this is usually application specific. For classification problems, you generally want to use the softmax activation function, combined with the negative log likelihood / MCXENT (multi-class cross entropy). The softmax activation function gives you a probability distribution over classes (i.e., outputs sum to 1.0). For regression problems, the "identity" activation function is frequently a good choice, in conjunction with the MSE (mean squared error) loss function.
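For example, a minimal sketch of the two common output-layer setups described above (layer sizes are illustrative only):

import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Classification: softmax activation with multi-class cross entropy (MCXENT)
OutputLayer classificationOutput = new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation(Activation.SOFTMAX)
        .nIn(128).nOut(10)
        .build();

// Regression: identity activation with mean squared error (MSE)
OutputLayer regressionOutput = new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
        .activation(Activation.IDENTITY)
        .nIn(128).nOut(1)
        .build();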

Loss functions for each neural network layer can either be used in pretraining, to learn better weights, or in classification (on the output layer) for achieving some result. (In the example above, classification happens in the override section.)

Your net's purpose will determine the loss function you use. For pretraining, choose reconstruction entropy. For classification, use multiclass cross entropy.

Regularization methods can help to avoid overfitting during training. Overfitting occurs when the network predicts the training set very well, but makes poor predictions on data the network has never seen. One way to think about overfitting is that the network memorizes the training data (instead of learning the general relationships in it).

Common types of regularization include:

  • l1 and l2 regularization penalizes large network weights, and avoids weights becoming too large. Some level of l2 regularization is commonly used in practice. However, note that if the l1 or l2 regularization coefficients are too high, they may over-penalize the network, and stop it from learning. Common values for l2 regularization are 1e-3 to 1e-6.

  • Dropconnect (conceptually similar to dropout, but used much less frequently)

  • Restricting the total number of network size (i.e., limit the number of layers and size of each layer)

To use l1/l2/dropout regularization, use .l1(x), .l2(y), and .dropOut(z) respectively in your network configuration. Note that z in dropOut(z) is the probability of retaining an activation.
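A minimal sketch of the above (the coefficients are illustrative; 0.8 here means 80% of activations are retained during training):

import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;

NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
        .l2(1e-4)       // l2 weight penalty, applied to all layers unless overridden
        .dropOut(0.8)   // probability of retaining an activation during training
        .list()
        .layer(new DenseLayer.Builder().nIn(784).nOut(256).build());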

A minibatch refers to the number of examples used at a time, when computing gradients and parameter updates. In practice (for all but the smallest data sets), it is standard to break your data set up into a number of minibatches.

The ideal minibatch size will vary. For example, a minibatch size of 10 is frequently too small for GPUs, but can work on CPUs. A minibatch size of 1 will allow a network to train, but will not reap the benefits of parallelism. 32 may be a sensible starting point to try, with minibatches in the range of 16-128 (sometimes smaller or larger, depending on the application and type of network) being common.
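The minibatch size is usually set when constructing the DataSetIterator. A minimal sketch, assuming the built-in MNIST iterator from deeplearning4j-datasets:

import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

int batchSize = 32;   // a common starting point; values in roughly the 16-128 range are typical
// Note: this constructor may throw IOException while downloading the data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, 12345);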

In DL4J, the term 'updater' refers to training mechanisms such as momentum, RMSProp, adagrad, and others. Using one of these methods can result in much faster network training compared to 'vanilla' stochastic gradient descent. You can set the updater using the .updater(...) configuration method, passing an updater instance such as new Adam(1e-3) or new Nesterovs(0.01, 0.9).

The optimization algorithm is how updates are made, given the gradient. The simplest (and most commonly used) method is stochastic gradient descent (SGD), however DL4J also provides SGD with line search, conjugate gradient and LBFGS optimization algorithms. These latter algorithms are more powerful compared to SGD, but considerably more costly per parameter update due to a line search component, and aren't used as much in practice. Note that you can in principle combine any updater with any optimization algorithm.

A good default choice in most cases is to use the stochastic gradient descent optimization algorithm combined with one of the momentum/rmsprop/adagrad updaters, with momentum frequently being used in practice. Note that for momentum, the updater is called Nesterovs (a reference to the Nesterovs variant of momentum), and the momentum rate can be set when constructing the Nesterovs updater.
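Putting the two together, a minimal sketch (values are illustrative only):

import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.learning.config.Nesterovs;

NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Nesterovs(0.01, 0.9));   // learning rate 0.01, momentum 0.9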

Q. Why is my neural network throwing NaN values?

A. Backpropagation involves the multiplication of very small gradients; due to limited precision when representing real numbers, values very close to zero cannot be represented. The term for this issue is arithmetic underflow. If your neural network is throwing NaNs, the solution is to retune your network to avoid the very small gradients. This is more likely to be an issue with deeper neural networks.

You can try using the double data type, but it's usually recommended to retune the net first.
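If you do want to experiment with double precision, ND4J's default data types can be switched globally. A minimal sketch, assuming a recent ND4J version where Nd4j.setDefaultDataTypes is available:

import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

// Set both the default array data type and the default floating-point math type to double.
// Do this before any networks or arrays are created.
Nd4j.setDefaultDataTypes(DataType.DOUBLE, DataType.DOUBLE);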

Following the basic tuning tips and monitoring the results is the way to ensure NaNs don't show up anymore.

Regularizers

Supported Keras regularizers.

All Keras regularizers are supported by DL4J model import:

  • l1

  • l2

  • l1_l2

Initializers

Supported Keras weight initializers.

  • Zeros

  • Ones

  • Constant

  • RandomNormal

  • RandomUniform

  • TruncatedNormal

  • VarianceScaling

  • Orthogonal

  • Identity

  • lecun_uniform

  • lecun_normal

  • glorot_normal

  • glorot_uniform

  • he_normal

  • he_uniform

Step 2: Make sure to specify the snapshot version. We follow a simple rule: If the latest stable release version is A.B.C, the snapshot version will be A.B.(C+1)-SNAPSHOT. The current snapshot version is 1.0.0-SNAPSHOT. For more details on the repositories section of the pom.xml file, see the Maven documentation.

A sample pom.xml is provided here: it has been taken from the DL4J standalone sample project and modified using steps 1 and 2 above. The original (using the last release) can be found in that project.

A bare minimum file like the following should work in theory, but it does not, due to a bug in Gradle: Gradle with snapshots and Maven classifiers appears to be a problem.

Of note when using the nd4j-native backend (in contrast to nd4j-native-platform) on Gradle (and SBT - but not Maven), you need to add openblas as a dependency. We do this for you in the -platform pom. Reference the -platform pom to double check your dependencies. Note that these are version properties. See the <properties> section of the pom for current versions of the openblas and javacpp presets required to run nd4j-native.

We support all Keras activation functions.

The mapping of Keras to DL4J activation functions is defined in KerasActivationUtils.

While not every concept in DL4J has an equivalent in Keras and vice versa, many of the key concepts can be matched. Importing Keras models into DL4J is done in our deeplearning4j-modelimport module. Below is a comprehensive list of currently supported features.

Mapping Keras to DL4J layers is done in the layers sub-module of model import. The structure of this project loosely reflects the structure of Keras.


DL4J supports all Keras loss functions (except for logcosh).

The mapping of Keras loss functions can be found in KerasLossUtils.

Example:

The full set of UI examples is available in the DL4J examples repository.

Client (both Spark and standalone neural networks using plain deeplearning4j-nn): second, for your neural net (note this example is for Spark, but ComputationGraph and MultiLayerNetwork both have the equivalent setListeners method with the same usage):

Here's an excellent web page by Andrej Karpathy about visualizing neural net training. It is worth reading and understanding that page first.

In the case of recurrent neural networks, adding some gradient normalization or gradient clipping may help.

Too few epochs don't give your network enough time to learn good parameters; too many and you might overfit the training data. One way to choose the number of epochs is to use early stopping. Early stopping can also help to prevent the neural network from overfitting (i.e., can help the net generalize better to unseen data).
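A minimal sketch of early stopping (assuming trainIter and validationIter are existing DataSetIterators and conf is a MultiLayerConfiguration; the termination settings are illustrative only):

import org.deeplearning4j.earlystopping.EarlyStoppingConfiguration;
import org.deeplearning4j.earlystopping.EarlyStoppingResult;
import org.deeplearning4j.earlystopping.saver.InMemoryModelSaver;
import org.deeplearning4j.earlystopping.scorecalc.DataSetLossCalculator;
import org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition;
import org.deeplearning4j.earlystopping.trainer.EarlyStoppingTrainer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(50))   // never train more than 50 epochs
        .scoreCalculator(new DataSetLossCalculator(validationIter, true))    // score on the validation set
        .evaluateEveryNEpochs(1)
        .modelSaver(new InMemoryModelSaver())                                // keep the best model in memory
        .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, conf, trainIter);
EarlyStoppingResult result = trainer.fit();
MultiLayerNetwork bestModel = (MultiLayerNetwork) result.getBestModel();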

The usual approach to selecting an appropriate learning rate is to use DL4J's visualization interface to visualize the progress of training. You want to pay attention to both the loss over time, and the ratio of update magnitudes to parameter magnitudes (a ratio of approximately 1:1000 is a good place to start). For more information on tuning the learning rate, see the visualization page earlier in this document.

You can optionally define a learning rate policy for your neural network. A policy will change the learning rate over time, achieving better results since the learning rate can "slow down" to find closer local minima for convergence. A common policy used is scheduling. See the LeNet example for a learning rate schedule used in practice.

Dropout is a frequently used regularization method and can be very effective. Dropout is most commonly used with a dropout rate of 0.5.

When training a neural network, it can sometimes be helpful to apply gradient normalization, to avoid the gradients being too large (the so-called exploding gradient problem, common in recurrent neural networks) or too small. This can be applied using the .gradientNormalization(GradientNormalization) and .gradientNormalizationThreshold(double) methods. For an example of gradient normalization, see GradientNormalization.java and its associated test code.
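A minimal sketch of gradient clipping in a configuration (the strategy and threshold here are illustrative only):

import org.deeplearning4j.nn.conf.GradientNormalization;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(1.0);   // clip individual gradient elements to [-1.0, 1.0]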

When training recurrent networks with long time series, it is generally advisable to use truncated backpropagation through time. With 'standard' backpropagation through time (the default in DL4J) the cost per parameter update can become prohibitive. For more details, see Recurrent Neural Networks: Truncated Backpropagation through Time.
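A minimal sketch of enabling truncated BPTT on a recurrent network configuration (layer sizes and the truncation length of 100 time steps are illustrative only):

import org.deeplearning4j.nn.conf.BackpropType;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(new LSTM.Builder().nIn(10).nOut(20).activation(Activation.TANH).build())
        .layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nIn(20).nOut(5).build())
        .backpropType(BackpropType.TruncatedBPTT)   // use truncated BPTT instead of full BPTT
        .tBPTTForwardLength(100)
        .tBPTTBackwardLength(100)
        .build();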

The mapping of regularizers can be found in KerasRegularizerUtils.

DL4J supports all available Keras initializers.

The mapping of Keras to DL4J initializers can be found in KerasInitilizationUtils.

Introduction to Snapshots
Setup Instructions
Limitations
Note to Gradle Users
Overview/Introduction
Setup Instructions
Maven documentation
sample pom.xml using snapshots
here
Limitations
Useful Maven Commands for Snapshots
Note to Gradle users
a bug in Gradle
here
Keras activation functions
KerasActivationUtils
deeplearning4j-modelimport
Layers
Losses
Activations
Initializers
Regularizers
Constraints
Metrics
Optimizers
Layers
layers
Core Layers
Dense
Activation
Dropout
Flatten
Reshape
Merge
Permute
RepeatVector
Lambda
Masking
SpatialDropout1D
SpatialDropout2D
SpatialDropout3D
Convolutional Layers
Conv1D
Conv2D
Conv3D
AtrousConvolution1D
AtrousConvolution2D
SeparableConv2D
Conv2DTranspose
Cropping1D
Cropping2D
Cropping3D
UpSampling1D
UpSampling2D
UpSampling3D
ZeroPadding1D
ZeroPadding2D
ZeroPadding3D
Pooling Layers
MaxPooling1D
MaxPooling2D
MaxPooling3D
AveragePooling1D
AveragePooling2D
AveragePooling3D
GlobalMaxPooling1D
GlobalMaxPooling2D
GlobalMaxPooling3D
GlobalAveragePooling1D
GlobalAveragePooling2D
GlobalAveragePooling3D
Locally-connected Layers
LocallyConnected1D
LocallyConnected2D
Recurrent Layers
SimpleRNN
LSTM
Embedding Layers
Embedding
Merge Layers
Advanced Activation Layers
LeakyReLU
PReLU
ThresholdedReLU
Normalization Layers
BatchNormalization
GaussianNoise
GaussianDropout
AlphaDropout
Bidirectional
Losses
Activations
Initializers
Regularizers
Constraints
Optimizers
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-ui</artifactId>
        <version>{{ page.version }}</version>
    </dependency>
    MultiLayerNetwork net = ...;
    //Also CompuptationGraph
    //ComputationGraph net = ...;
    //Initialize the user interface backend
    UIServer uiServer = UIServer.getInstance();

    //Configure where the network information (gradients, score vs. time etc) is to be stored. Here: store in memory.
    StatsStorage statsStorage = new InMemoryStatsStorage();         //Alternative: new FileStatsStorage(File), for saving and loading later

    //Attach the StatsStorage instance to the UI: this allows the contents of the StatsStorage to be visualized
    uiServer.attach(statsStorage);

    //Then add the StatsListener to collect this information from the network, as it trains
    net.setListeners(new StatsListener(statsStorage));
    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);

    StatsStorage ss = new FileStatsStorage(new File("myNetworkTrainingStats.dl4j"));
    sparkNet.setListeners(ss, Collections.singletonList(new StatsListener(null)));
    StatsStorage statsStorage = new FileStatsStorage(statsFile);    //If file already exists: load the data from it
    UIServer uiServer = UIServer.getInstance();
    uiServer.attach(statsStorage);
    UIServer uiServer = UIServer.getInstance();
    uiServer.enableRemoteListener();        //Necessary: remote support is not enabled by default
    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);

    StatsStorageRouter remoteUIRouter = new RemoteUIStatsStorageRouter("http://UI_MACHINE_IP:9000");
    sparkNet.setListeners(remoteUIRouter, Collections.singletonList(new StatsListener(null)));
    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>${exec-maven-plugin.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade-plugin.version}</version>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>${shadedClassifier}</shadedClassifierName>
                    <createDependencyReducedPom>true</createDependencyReducedPom>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <!--<exclude>org/datanucleus/**</exclude>-->
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>

                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer" />
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        <plugins>
    <build>
Keras losses
KerasLossUtils
Visualizing Network Training with the Deeplearning4j Training UI
Deeplearning4j UI: The Overview Page
Deeplearning4j UI: The Model Page
Deeplearning4J UI and Spark Training
Using the UI to Tune Your Network
TSNE and Word2Vec
Fixing UI Issue: "No configuration setting" exception
Visualizing Network Training with the Deeplearning4j Training UI
See a UI example here
here
Deeplearning4j UI: The Overview Page
Deeplearning4j UI: The Model Page
Deeplearning4J UI and Spark Training
example found here
Using the UI to Tune Your Network
web page by Andrej Karpathy
gradient normalization or gradient clipping
Data Normalization
Weight Initialization
Epochs and Iterations
Learning Rate
Activation Function
Loss Function
Regularization
Minibatch Size
Updater and Optimization Algorithm
Gradient Normalization
Recurrent Neural Networks
Deep Belief Network
NaN, Not a Number issues
Data Normalization
Weight Initialization
Number of Epochs and Number of Iterations
Early stopping
Learning Rate
DL4J's visualization interface
this link
LeNet example
Activation Function
Loss Function
Regularization
Dropout
Early stopping
Minibatch Size
Updater and Optimization Algorithm
Gradient Normalization
GradientNormalization.java
here
Recurrent Neural Networks: Truncated Backpropagation through Time
this page
NaN, Not a Number Errors
KerasRegularizerUtils
Keras initializers
KerasInitilizationUtils

Transfer Learning

DL4J’s Transfer Learning API

The DL4J transfer learning API enables users to:

  • Modify the architecture of an existing model

  • Fine tune learning configurations of an existing model.

  • Hold parameters of a specified layer constant during training, also referred to as "frozen"

Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, the transformed version being the intermediate outputs at the boundary of the frozen layers. This is the process of “feature extraction” from the input data and will be referred to as “featurizing” in this document.

The transfer learning helper

The forward pass to “featurize” the input data on large, pretrained networks can be time consuming. DL4J also provides a TransferLearningHelper class with the following capabilities:

  • Featurize an input dataset to save for future use

  • Fit the model with frozen layers with a featurized dataset

  • Output from the model with frozen layers given a featurized input.

When running multiple epochs users will save on computation time since the expensive forward pass on the frozen layers/vertices will only have to be conducted once.

Show me the code

I. Import a zoo model

ZooModel zooModel = VGG16.builder().build();
ComputationGraph pretrainedNet = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);

II. Set up a fine-tune configuration

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Nesterovs(5e-5))
            .seed(seed)
            .build();

III. Build new models based on VGG16

A. Modifying only the last layer, keeping the others frozen

The final layer of VGG16 does a softmax regression over the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes, keeping the other layers frozen.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
    .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("fc2")
              .removeVertexKeepConnections("predictions") 
              .addLayer("predictions", 
        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(numClasses)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX).build(), "fc2")
              .build();

After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy > 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.

B. Attach new layers to the bottleneck (block5_pool)

Here we hold all but the last three dense layers frozen and attach new dense layers onto it. Note that the primary intent here is to demonstrate the use of the API; what might give better results is secondary.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
              .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("block5_pool")
              .nOutReplace("fc2",1024, WeightInit.XAVIER)
              .removeVertexAndConnections("predictions") 
              .addLayer("fc3",new DenseLayer.Builder()
              .activation(Activation.RELU)
              .nIn(1024).nOut(256).build(),"fc2") 
              .addLayer("newpredictions",new OutputLayer
              .Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                                .activation(Activation.SOFTMAX)
                                .nIn(256).nOut(numClasses).build(),"fc3") 
              .setOutputs("newpredictions") 
              .build();

C. Fine tune layers from a previously saved model

Say we have saved off our model from (B) and now want to allow “block_5” layers to train.

ComputationGraph vgg16FineTune = new TransferLearning.GraphBuilder(vgg16Transfer)
              .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("block4_pool")
              .build();

IV. Saving “featurized” datasets and training with them.

We use the transfer learning helper API. Note this freezes the layers of the model passed in.

Here is how you obtain the featurized version of the dataset at the specified layer “fc2”.

TransferLearningHelper transferLearningHelper = 
    new TransferLearningHelper(pretrainedNet, "fc2");
while(trainIter.hasNext()) {
        DataSet currentFeaturized = transferLearningHelper.featurize(trainIter.next());
        saveToDisk(currentFeaturized,trainDataSaved,true);
        trainDataSaved++;
}

Here is how you can fit with a featurized dataset. vgg16Transfer is the model set up in (A) of section III.

TransferLearningHelper transferLearningHelper = 
    new TransferLearningHelper(vgg16Transfer);
while (trainIter.hasNext()) {
       transferLearningHelper.fitFeaturized(trainIter.next());
}

Notes

  • The TransferLearning builder returns a new instance of a dl4j model.

Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take into consideration memory requirements and adjust your JVM heap space accordingly.

  • The trained model helper imports models from Keras without enforcing a training configuration.

Therefore the last layer (as seen when printing the summary) is a dense layer and not an output layer with a loss function. Therefore, to modify nOut of an output layer, we delete the layer vertex, keeping its connections, and add back in a new output layer with the same name, a different nOut, the suitable loss function, etc.

  • Changing nOuts at a layer/vertex will modify nIn of the layers/vertices it fans into.

When changing nOut users can specify a weight initialization scheme or a distribution for the layer as well as a separate weight initialization scheme or distribution for the layers it fans out to.

  • Frozen layer configurations are not saved when writing the model to disk.

In other words, a model with frozen layers when serialized and read back in will not have any frozen layers. To continue training holding specific layers constant the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to “freeze” layers in a dl4j model.

  • On a copy: With the transfer learning API which will return a new model with the relevant frozen layers

  • In place: With the transfer learning helper API which will apply the frozen layers to the given model.

  • FineTune configurations will selectively update learning parameters.

For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.

Utilities

Activations

Supported activation functions.

What are activations?

At a simple level, activation functions help decide whether a neuron should be activated. This helps determine whether the information that the neuron is receiving is relevant for the input. The activation function is a non-linear transformation that happens over an input signal, and the transformed output is sent to the next neuron.

Usage

The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:

GraphBuilder graphBuilder = new NeuralNetConfiguration.Builder()
    // add hyperparameters and other layers
    .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
    // add more layers and output
    .build();

Available activations

ActivationRectifiedTanh

Rectified tanh

Essentially max(0, tanh(x))

Underlying implementation is in native code

ActivationELU

f(x) = alpha * (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0

alpha defaults to 1, if not specified

ActivationReLU

f(x) = max(0, x)

ActivationRationalTanh

f(x) = 1.7159 * tanh(2x/3), where tanh is approximated as: tanh(y) ~ sgn(y) * (1 - 1/(1 + |y| + y^2 + 1.41645*y^4))

Underlying implementation is in native code

ActivationThresholdedReLU

Thresholded RELU

f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0

ActivationReLU6

f(x) = min(max(input, cutoff), 6)

ActivationHardTanH

          ⎧  1, if x >  1
 f(x) =   ⎨ -1, if x < -1
          ⎩  x, otherwise

ActivationSigmoid

f(x) = 1 / (1 + exp(-x))

ActivationGELU

GELU activation function - Gaussian Error Linear Units

ActivationPReLU

Parametrized Rectified Linear Unit (PReLU)

f(x) = alpha x for x < 0, f(x) = x for x >= 0

alpha has the same shape as x and is a learned parameter.

ActivationIdentity

f(x) = x

ActivationSoftSign

f_i(x) = x_i / (1 + |x_i|)

ActivationHardSigmoid

f(x) = min(1, max(0, 0.2x + 0.5))

ActivationSoftmax

f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)

ActivationCube

f(x) = x^3

ActivationRReLU

f(x) = max(0,x) + alpha min(0, x)

alpha is drawn from uniform(l, u) during training and is set to (l+u)/2 during test; l and u default to 1/8 and 1/3 respectively

ActivationTanH

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

ActivationSELU

ActivationLReLU

Leaky ReLU: f(x) = max(0, x) + alpha * min(0, x); alpha defaults to 0.01

ActivationSwish

f(x) = x sigmoid(x)

ActivationSoftPlus

f(x) = log(1+e^x)

Zoo Models

Available models

AlexNet

AlexNet

The model is built in DL4J based on available functionality, and notes indicate where there are gaps waiting for enhancements.

Bias initialization in the paper is 1 in certain layers but 0.1 in the imagenetExample code. The weight distribution uses 0.1 std for all layers in the paper but 0.005 in the dense layers in the imagenetExample code.

Darknet19

FaceNetNN4Small2

InceptionResNetV1

LeNet

LeNet was an early, promising convolutional network, originally applied to handwritten digit recognition (MNIST). References:

NASNet

Implementation of NASNet-A in Deeplearning4j. NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest.

This implementation uses 1056 penultimate filters and an input shape of (3, 224, 224). You can change this.

ResNet50

Residual networks for deep learning.

SimpleCNN

SqueezeNet

An implementation of SqueezeNet. Touts similar accuracy to AlexNet with a fraction of the parameters.

TextGenerationLSTM

LSTM designed for text generation. Can be trained on a corpus of text. For this model, numClasses is

TinyYOLO

String filename = "tiny-yolo-voc.h5"; ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false); INDArray priors = Nd4j.create(priorBoxes);

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder() .seed(seed) .iterations(iterations) .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer) .gradientNormalizationThreshold(1.0) .updater(new Adam.Builder().learningRate(1e-3).build()) .l2(0.00001) .activation(Activation.IDENTITY) .trainingWorkspaceMode(workspaceMode) .inferenceWorkspaceMode(workspaceMode) .build();

ComputationGraph model = new TransferLearning.GraphBuilder(graph) .fineTuneConfiguration(fineTuneConf) .addLayer("outputs", new Yolo2OutputLayer.Builder() .boundingBoxPriors(priors) .build(), "conv2d_9") .setOutputs("outputs") .build();

System.out.println(model.summary(InputType.convolutional(416, 416, 3)));

ModelSerializer.writeModel(model, "tiny-yolo-voc_dl4j_inference.v1.zip", false);

The channels of the 416x416 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

UNet

U-Net

An implementation of U-Net, a deep learning network for image segmentation in Deeplearning4j. The u-net is convolutional network architecture for fast and precise segmentation of images. Up to now it has outperformed the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

VGG16

VGG19

Xception

An implementation of Xception in Deeplearning4j. A novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions.

YOLO2

String filename = "yolo.h5";
KerasLayer.registerCustomLayer("Lambda", KerasSpaceToDepth.class);
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
INDArray priors = Nd4j.create(priorBoxes);
FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
 .seed(seed)
 .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
 .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
 .gradientNormalizationThreshold(1.0)
 .updater(new Adam.Builder().learningRate(1e-3).build())
 .l2(0.00001)
 .activation(Activation.IDENTITY)
 .trainingWorkspaceMode(workspaceMode)
 .inferenceWorkspaceMode(workspaceMode)
 .build();
ComputationGraph model = new TransferLearning.GraphBuilder(graph)
 .fineTuneConfiguration(fineTuneConf)
 .addLayer("outputs", new Yolo2OutputLayer.Builder()
                      .boundingBoxPriors(priors)
                      .build(), "conv2d_23")
 .setOutputs("outputs")
 .build();
System.out.println(model.summary(InputType.convolutional(608, 608, 3)));
ModelSerializer.writeModel(model, "yolo2_dl4j_inference.v1.zip", false);

The channels of the 608x608 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

pretrainedUrl

public String pretrainedUrl(PretrainedType pretrainedType)

Default prior boxes for the model

Model Zoo

Prebuilt model architectures and weights for out-of-the-box application.

Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.

If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-zoo</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

Getting started

Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel abstract class and uses the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.

Initializing fresh configurations

You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:

import org.deeplearning4j.zoo.model.AlexNet
import org.deeplearning4j.zoo.*;

...

int numberOfClassesInYourData = 1000;
int randomSeed = 123;

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
Model net = zooModel.init();

If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
MultiLayerConfiguration net = ((AlexNet) zooModel).conf();

Initializing pretrained weights

Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enumerator that outlines different weight types, which includes IMAGENET, MNIST, CIFAR10, and VGGFACE.

For example, you can initialize a VGG-16 model with ImageNet weights like so:

import org.deeplearning4j.zoo.model.VGG16;
import org.deeplearning4j.zoo.*;

...

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.IMAGENET);

And initialize another VGG16 model with weights trained on VGGFace:

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.VGGFACE);

If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.
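For example, a minimal sketch:

ZooModel zooModel = VGG16.builder().build();
if (zooModel.pretrainedAvailable(PretrainedType.IMAGENET)) {
    // Only attempt the download if ImageNet weights exist for this model
    Model net = zooModel.initPretrained(PretrainedType.IMAGENET);
}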

Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.

What's in the zoo?

The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.

This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.

Advanced usage

The zoo comes with a couple additional features if you're looking to use the models for different use cases.

Changing Inputs

Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape().

NOTE: this applies to fresh configurations only, and will not affect pretrained models:

int numberOfClassesInYourData = 10;
int randomSeed = 123;

ZooModel zooModel = ResNet50.builder()
        .numClasses(numberOfClassesInYourData)
        .seed(randomSeed)
        .build();
zooModel.setInputShape(new int[][]{{3, 28, 28}});

Transfer Learning

Workspaces

Computation Graph

How to build complex networks with DL4J computation graph.

Building Complex Network Architectures with Computation Graph

This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.

Overview of Computation Graph

DL4J has two types of networks comprised of multiple layers: the MultiLayerNetwork and the ComputationGraph.

Specifically, the ComputationGraph allows for networks to be built with the following features:

  • Multiple network input arrays

  • Multiple network outputs (including mixed classification/regression architectures)

  • Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)

As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure: MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well - though the configuration may be a little more complicated.

Computation Graph: Some Example Use Cases

Examples of some architectures that can be built using ComputationGraph include:

  • Multi-task learning architectures

  • Recurrent neural networks with skip connections

Configuring a Computation Graph

Types of Graph Vertices

  • Input Vertices

  • Element-wise operation vertices

  • Merge vertices

  • Subset vertices

  • Preprocessor vertices

These types of graph vertices are described briefly below.

InputVertex: Input vertices are specified by the addInputs(String...) method in your configuration. The strings used as inputs can be arbitrary - they are user-defined labels, and can be referenced later in the configuration. The number of strings provided define the number of inputs; the order of the input also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).

ElementWiseVertex: Element-wise operation vertices do for example an element-wise addition or subtraction of the activations out of one or more other vertices. Thus, the activations used as input for the ElementWiseVertex must all be the same size, and the output size of the elementwise vertex is the same as the inputs.

MergeVertex: The MergeVertex concatenates/merges the input activations. For example, if a MergeVertex has 2 inputs of size 5 and 10 respectively, then output size will be 5+10=15 activations. For convolutional network activations, examples are merged along the depth: so suppose the activations from one layer have 4 features and the other has 5 features (both with (4 or 5) x width x height activations), then the output will have (4+5) x width x height activations.

SubsetVertex: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use .addVertex("subset1", new SubsetVertex(0,4), "layer1"): this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.

Example 1: Recurrent Network with Skip Connections

Suppose we wish to build the following recurrent neural network architecture:

For the sake of this example, lets assume our input data is of size 5. Our configuration would be as follows:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input") //can use any label for this
    .addLayer("L1", new GravesLSTM.Builder().nIn(5).nOut(5).build(), "input")
    .addLayer("L2",new RnnOutputLayer.Builder().nIn(5+5).nOut(5).build(), "input", "L1")
    .setOutputs("L2")    //We need to specify the network outputs and their order
    .build();

ComputationGraph net = new ComputationGraph(conf);
net.init();

Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.

Example 2: Multiple Inputs and Merge Vertex

Consider the following architecture:

Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)) then the output size of the merge vertex is 4+4=8 activations.

To build the above network, we use the following configuration:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input1", "input2")
    .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
    .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
    .addVertex("merge", new MergeVertex(), "L1", "L2")
    .addLayer("out", new OutputLayer.Builder().nIn(4+4).nOut(3).build(), "merge")
    .setOutputs("out")
    .build();

Example 3: Multi-Task Learning

In multi-task learning, a neural network is used to make multiple independent predictions. Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.

In this case, the network configuration is:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addLayer("out1", new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(4).nOut(3).build(), "L1")
        .addLayer("out2", new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.MSE)
                .nIn(4).nOut(2).build(), "L1")
        .setOutputs("out1","out2")
        .build();

Automatically Adding PreProcessors and Calculating nIns

One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the .setInputTypes(InputType...) method in the configuration.

The setInputType method has two effects:

  1. It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the setInputTypes(InputType...) functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If the .nIn(x) is specified for a layer, the network will not override this when using the InputType functionality.

  2. It will automatically add any InputPreProcessors that are required (for example, a CNN-to-feed-forward preprocessor between a convolutional layer and a dense layer).

For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use .setInputTypes(InputType.convolutional(height,width,depth), InputType.feedForward(feedForwardInputSize))
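A minimal sketch for the single-input case (sizes are illustrative only): the .nIn(x) values are omitted below and are inferred from the declared input type, and the necessary preprocessor is added between the convolutional and dense layers automatically.

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("cnn", new ConvolutionLayer.Builder(5, 5).nOut(16).build(), "input")
        .addLayer("dense", new DenseLayer.Builder().nOut(64).build(), "cnn")
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nOut(10).build(), "dense")
        .setOutputs("out")
        .setInputTypes(InputType.convolutional(28, 28, 1))   // height, width, depth (channels)
        .build();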

Training Data for ComputationGraph

There are two types of data that can be used with the ComputationGraph.

DataSet and the DataSetIterator

The DataSet class was originally designed for use with the MultiLayerNetwork; however, it can also be used with ComputationGraph - but only if that computation graph has a single input and output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).

MultiDataSet and the MultiDataSetIterator

MultiDataSet is the multiple input and/or multiple output version of DataSet. It may also include multiple mask arrays (for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator, unless you are dealing with multiple inputs and/or multiple outputs.

There are currently two ways to use a MultiDataSetIterator: by implementing one manually, or by using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers.

The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:

  • Multiple DataVec RecordReaders may be used simultaneously

  • The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader

  • It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output

  • It is possible to convert single columns from a class index to a one-hot representation

Example 1: Regression Data (RecordReaderMultiDataSetIterator)

Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:

int numLinesToSkip = 0;
String fileDelimiter = ",";
RecordReader rr = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String csvPath = "/path/to/my/file.csv";
rr.initialize(new FileSplit(new File(csvPath)));

int batchSize = 4;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("myReader",rr)
        .addInput("myReader",0,2)  //Input: columns 0 to 2 inclusive
        .addOutput("myReader",3,4) //Output: columns 3 to 4 inclusive
        .build();

Example 2: Classification and Multi-Task Learning (RecordReaderMultiDataSetIterator)

Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture, whereby we have two outputs - one for regression and one for classification. For this example, let's assume the data is as follows:

  • Input file: myInput.csv, and we want to use all columns as input (without modification)

  • Output file: myOutput.csv.

    • Network output 1 - regression: columns 0 to 3

    • Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.

In this case, we can build our iterator as follows:

int numLinesToSkip = 0;
String fileDelimiter = ",";

RecordReader featuresReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String featuresCsvPath = "/path/to/my/myInput.csv";
featuresReader.initialize(new FileSplit(new File(featuresCsvPath)));

RecordReader labelsReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String labelsCsvPath = "/path/to/my/myOutput.csv";
labelsReader.initialize(new FileSplit(new File(labelsCsvPath)));

int batchSize = 4;
int numClasses = 3;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("csvInput", featuresReader)
        .addReader("csvLabels", labelsReader)
        .addInput("csvInput") //Input: all columns from input reader
        .addOutput("csvLabels", 0, 3) //Output 1: columns 0 to 3 inclusive
        .addOutputOneHot("csvLabels", 4, numClasses)   //Output 2: column 4 -> convert to one-hot for classification
        .build();
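Once built, a MultiDataSetIterator can be passed directly to the ComputationGraph fit methods. A minimal sketch (assuming conf is a ComputationGraphConfiguration whose inputs/outputs match the iterator, and nEpochs is the number of training epochs):

ComputationGraph net = new ComputationGraph(conf);
net.init();

for (int i = 0; i < nEpochs; i++) {
    net.fit(iterator);   //fit() accepts a MultiDataSetIterator directly
}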

Layers

Supported neural network layers.

What are layers?

Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they form a deep neural network.

Using layers

All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork or ComputationGraph. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.
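For example (a minimal sketch with arbitrary layer sizes), a MultiLayerNetwork is configured by passing layer configurations to the list builder; the layers themselves are instantiated when init() is called:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .list()
        .layer(new DenseLayer.Builder().nIn(784).nOut(100).activation(Activation.RELU).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(100).nOut(10).activation(Activation.SOFTMAX).build())
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();   //layer configurations are instantiated here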

Layers vs. vertices

If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph API and join different branches together using vertices. See the vertices documentation for more information.

General layers

ActivationLayer

Activation layer is a simple layer that applies the specified activation function to the input activations

clone

public ActivationLayer clone()
  • param activation Activation function for the layer

activation

public Builder activation(String activationFunction)

Activation function for the layer

activation

public Builder activation(IActivation activationFunction)
  • param activationFunction Activation function for the layer

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

DenseLayer

Dense layer: a standard fully connected feed forward layer

hasBias

public Builder hasBias(boolean hasBias)

If true (default): include bias parameters in the model. False: no bias.

hasLayerNorm

public Builder hasLayerNorm(boolean hasLayerNorm)

If true (default = false): enable layer normalization on this layer
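A short sketch of a DenseLayer configuration using these builder options (the sizes and activation function are arbitrary):

DenseLayer dense = new DenseLayer.Builder()
        .nIn(128)
        .nOut(64)
        .activation(Activation.RELU)
        .hasBias(true)        //default: bias enabled
        .hasLayerNorm(false)  //default: layer normalization disabled
        .build();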

DropoutLayer

Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test

build

public DropoutLayer build()

Create a dropout layer with standard {@link Dropout}, with the specified probability of retaining the input activation. See {@link Dropout} for the full details

  • param dropout Activation retain probability.

EmbeddingLayer

Embedding layer: feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses-1). This input has shape [numExamples, 1] rather than [numExamples, numClasses] for the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero). Note: can only be used as the first layer for a network Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each example. Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)
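A minimal sketch of an EmbeddingLayer configuration (the vocabulary and embedding sizes are arbitrary placeholders):

EmbeddingLayer embedding = new EmbeddingLayer.Builder()
        .nIn(10000)                      //number of classes / vocabulary size
        .nOut(300)                       //embedding vector size
        .activation(Activation.IDENTITY) //identity: output the embedding vectors unmodified
        .hasBias(false)                  //default: no bias
        .build();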

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

  • param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

  • param vectors Vectors to initialize the embedding layer with

EmbeddingSequenceLayer

Embedding layer for sequences: feed-forward layer that expects fixed-length number (inputLength) of integers/indices per example as input, ranged from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Note: can only be used as the first layer for a network Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index. Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

inputLength

public Builder inputLength(int inputLength)

Set input sequence length for this embedding layer.

  • param inputLength input sequence length

  • return Builder

inferInputLength

public Builder inferInputLength(boolean inferInputLength)

Set input sequence inference mode for embedding layer.

  • param inferInputLength whether to infer input length

  • return Builder

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

  • param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

  • param vectors Vectors to initialize the embedding layer with

GlobalPoolingLayer

Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: MAX, AVG, SUM, PNORM

Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass:

  • Time series: mask arrays are shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only

  • CNNs: mask arrays have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.

Behaviour with default settings:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s: this gives

  • [miniBatchSize, vectorSize, 1] for RNN output,

  • [miniBatchSize, channels, 1, 1] for CNN output, and

  • [miniBatchSize, channels, 1, 1, 1] for CNN3D output.
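A short sketch of a GlobalPoolingLayer configuration using these options:

GlobalPoolingLayer pooling = new GlobalPoolingLayer.Builder()
        .poolingType(PoolingType.MAX)   //MAX, AVG, SUM or PNORM
        .collapseDimensions(true)       //default: collapse the pooled dimensions
        .build();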

poolingDimensions

public Builder poolingDimensions(int... poolingDimensions)

Pooling dimensions for global pooling

poolingType

public Builder poolingType(PoolingType poolingType)
  • param poolingType Pooling type for global pooling

collapseDimensions

public Builder collapseDimensions(boolean collapseDimensions)

Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true. If true:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

If false:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 4d output [miniBatchSize, channels, 1, 1]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 5d output [miniBatchSize, channels, 1, 1, 1]

  • param collapseDimensions Whether to collapse the dimensions or not

pnorm

public Builder pnorm(int pnorm)

P-norm constant. Only used if using {@link PoolingType#PNORM} for the pooling type

  • param pnorm P-norm constant

LocalResponseNormalization

k

public Builder k(double k)

LRN scaling constant k. Default: 2

n

public Builder n(double n)

Number of adjacent kernel maps to use when doing LRN. default: 5

  • param n Number of adjacent kernel maps

alpha

public Builder alpha(double alpha)

LRN scaling constant alpha. Default: 1e-4

  • param alpha Scaling constant

beta

public Builder beta(double beta)

Scaling constant beta. Default: 0.75

  • param beta Scaling constant

cudnnAllowFallback

public Builder cudnnAllowFallback(boolean allowFallback)

When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception in CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for LocalResponseNormalization will be used

  • param allowFallback Whether fallback to non-CuDNN implementation should be used

LocallyConnected1D

SameDiff version of a 1D locally connected layer.

nIn

public Builder nIn(int nIn)

Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)
  • param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

kernelSize

public Builder kernelSize(int k)
  • param k Kernel size for the layer

stride

public Builder stride(int s)
  • param s Stride for the layer

padding

public Builder padding(int p)
  • param p Padding for the layer. Not used if {@link ConvolutionMode#Same} is set

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)
  • param cm Convolution mode for the layer. See {@link ConvolutionMode} for details

dilation

public Builder dilation(int d)
  • param d Dilation for the layer

hasBias

public Builder hasBias(boolean hasBias)
  • param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int inputSize)

Set input filter size for this locally connected 1D layer

  • param inputSize height of the input filters

  • return Builder

LocallyConnected2D

SameDiff version of a 2D locally connected layer.

setKernel

public void setKernel(int... kernel)

Kernel size for the layer. Must be 2 values (height/width)

setStride

public void setStride(int... stride)
  • param stride Stride for the layer. Must be 2 values (height/width)

setPadding

public void setPadding(int... padding)
  • param padding Padding for the layer. Not used if {@link ConvolutionMode#Same} is set. Must be 2 values (height/width)

setDilation

public void setDilation(int... dilation)
  • param dilation Dilation for the layer. Must be 2 values (height/width)

nIn

public Builder nIn(int nIn)
  • param nIn Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)
  • param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

kernelSize

public Builder kernelSize(int... k)
  • param k Kernel size for the layer. Must be 2 values (height/width)

stride

public Builder stride(int... s)
  • param s Stride for the layer. Must be 2 values (height/width)

padding

public Builder padding(int... p)
  • param p Padding for the layer. Not used if {@link ConvolutionMode#Same} is set. Must be 2 values (height/width)

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)
  • param cm Convolution mode for the layer. See {@link ConvolutionMode} for details

dilation

public Builder dilation(int... d)
  • param d Dilation for the layer. Must be 2 values (height/width)

hasBias

public Builder hasBias(boolean hasBias)
  • param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int... inputSize)

Set input filter size (h,w) for this locally connected 2D layer

  • param inputSize pair of height and width of the input filters to this layer

  • return Builder

LossLayer

LossLayer is a flexible output layer that performs a loss function on an input without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn’t supported - the output size is the same size as the input activations.

nIn

public Builder nIn(int nIn)
  • param lossFunction Loss function for the loss layer

OutputLayer

Output layer used for training via backpropagation based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it contains a fully-connected layer (effectively contains a DenseLayer) internally. This allows the output size to be different to the layer input size.
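For example, a softmax classification output layer might be configured as follows (a sketch; the sizes are placeholders):

OutputLayer out = new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(128)                       //size of the previous layer's activations
        .nOut(10)                       //number of classes
        .activation(Activation.SOFTMAX)
        .build();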

build

public OutputLayer build()
  • param lossFunction Loss function for the output layer

Pooling1D

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Pooling2D

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Subsampling1DLayer

1D (temporal) subsampling (pooling) layer. Expects RNN-style input activations of shape [minibatch, channels, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.

Supports the following pooling types: MAX, AVG, SUM, PNORM

setKernelSize

public void setKernelSize(int... kernelSize)

Kernel size

  • param kernelSize kernel size

setStride

public void setStride(int... stride)

Stride

  • param stride stride value

setPadding

public void setPadding(int... padding)

Padding

  • param padding padding value

Upsampling1D

Upsampling 1D layer. Repeats each step of the input size times along the sequence dimension: input of shape [minibatch, channels, sequenceLength] produces output of shape [minibatch, channels, size * sequenceLength]. Example:

If input (for a single example, with channels down page, and sequence from left to right) is:
[ A1, A2, A3]
[ B1, B2, B3]
Then output with size = 2 is:
[ A1, A1, A2, A2, A3, A3]
[ B1, B1, B2, B2, B3, B3]

size

public Builder size(int size)

Upsampling size

  • param size upsampling size in single spatial dimension of this 1D layer

size

public Builder size(int[] size)

Upsampling size int array with a single element. Array must be length 1

  • param size upsampling size in single spatial dimension of this 1D layer

Upsampling2D

Upsampling 2D layer. Repeats each value (or rather, set of depth values) in the height and width dimensions by size[0] and size[1] times respectively. Example:

Input (slice for one example and channel)
[ A, B ]
[ C, D ]
Size = [2, 2]
Output (slice for one example and channel)
[ A, A, B, B ]
[ A, A, B, B ]
[ C, C, D, D ]
[ C, C, D, D ]

size

public Builder size(int size)

Upsampling size int, used for both height and width

  • param size upsampling size in height and width dimensions

size

public Builder size(int[] size)

Upsampling size array

  • param size upsampling size in height and width dimensions

Upsampling3D

Upsampling 3D layer. Repeats each value (all channel values for each x/y/z location) by size[0], size[1] and size[2] times respectively in the depth, height and width dimensions: input of shape [minibatch, channels, depth, height, width] produces output of shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].

size

public Builder size(int size)

Upsampling size as int, so same upsampling size is used for depth, width and height

  • param size upsampling size in height, width and depth dimensions

size

public Builder size(int[] size)

Upsampling size as an int array of length 3, so that different upsampling sizes can be used for the depth, height and width dimensions

  • param size upsampling size in height, width and depth dimensions

ZeroPadding1DLayer

Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for top and bottom.

setPadding

public void setPadding(int... padding)

Padding value for left and right. Must be length 2 array

build

public ZeroPadding1DLayer build()
  • param padding Padding for both the left and right

ZeroPadding3DLayer

Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.

setPadding

public void setPadding(int... padding)

[padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]

build

public ZeroPadding3DLayer build()
  • param padding Padding for both the left and right in all three spatial dimensions

ZeroPaddingLayer

Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right

setPadding

public void setPadding(int... padding)

Padding value for top, bottom, left, and right. Must be length 4 array

build

public ZeroPaddingLayer build()
  • param padHeight Padding for both the top and bottom

  • param padWidth Padding for both the left and right

ElementWiseMultiplicationLayer

Elementwise multiplication layer with weights: implements out = activationFn(input .* w + b), where:

  • w is a learnable weight vector of length nOut

  • “.” is element-wise multiplication

  • b is a bias vector

Note that the input and output sizes of the element-wise layer are the same for this layer

getMemoryReport

public LayerMemoryReport getMemoryReport(InputType inputType)

This is a report of the estimated memory consumption for the given layer

  • param inputType Input type to the layer. Memory consumption is often a function of the input type

  • return Memory report for the layer

RepeatVector

RepeatVector layer configuration.

RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which each input vector is repeated n times.

getRepetitionFactor

public int getRepetitionFactor()

Get the repetition factor for the RepeatVector layer

setRepetitionFactor

public void setRepetitionFactor(int n)

Set the repetition factor for the RepeatVector layer

  • param n Repetition factor (number of times each input vector is repeated)

repetitionFactor

public Builder repetitionFactor(int n)

Set the repetition factor for the RepeatVector layer

  • param n Repetition factor (number of times each input vector is repeated)

Yolo2OutputLayer

Note: Input activations to the Yolo2OutputLayer should have shape: [minibatch, b(5+c), H, W], where: b = number of bounding boxes (determined by config - see papers for details) c = number of classes H = output/label height W = output/label width

Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have output depth of b(5+c). Thus if you change the number of bounding boxes, or change the number of object classes, the number of channels (nOut of the last convolution layer) needs to also change. Label format: [minibatch, 4+C, H, W] Order for labels depth: [x1,y1,x2,y2,(class labels)] x1 = box top left position y1 = as above, y axis x2 = box bottom right position y2 = as above y axis Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left, (13,13) is bottom right Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
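A sketch of configuring the output layer with bounding box priors (the prior values below are arbitrary placeholders; in practice priors are often estimated from the training data, for example via k-means over the label boxes):

//Bounding box priors: shape [numBoxes, 2] = [width, height], in grid-cell units (1.0 = one grid cell)
INDArray priors = Nd4j.create(new double[][]{
        {1.0, 1.0},
        {2.5, 2.5},
        {5.0, 5.0}});

Yolo2OutputLayer yoloOutput = new Yolo2OutputLayer.Builder()
        .boundingBoxPriors(priors)
        .build();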

lambdaCoord

public Builder lambdaCoord(double lambdaCoord)

Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5

lambbaNoObj

public Builder lambbaNoObj(double lambdaNoObj)

Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5

  • param lambdaNoObj Lambda value for no-object (confidence) component of the loss function

lossPositionScale

public Builder lossPositionScale(ILossFunction lossPositionScale)

Loss function for position/scale component of the loss function

  • param lossPositionScale Loss function for position/scale

lossClassPredictions

public Builder lossClassPredictions(ILossFunction lossClassPredictions)

Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper), however LossMCXENT could also be used (which is more common for classification).

  • param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function

boundingBoxPriors

public Builder boundingBoxPriors(INDArray boundingBoxes)

Bounding box priors dimensions [width, height]. For N bounding boxes, input has shape [rows, columns] = [N, 2] Note that dimensions should be specified as fraction of grid size. For example, a network with 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.

  • param boundingBoxes Bounding box prior dimensions (width, height)

MaskLayer

MaskLayer applies the mask array to the forward pass activations, and backward pass gradients, passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.

MaskZeroLayer

Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].

Auto Encoders

What are autoencoders?

Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.

Where’s Restricted Boltzmann Machine?

RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.

Supported layers

AutoEncoder

Autoencoder layer. Adds noise to the input and learns a reconstruction function.

corruptionLevel

public Builder corruptionLevel(double corruptionLevel)

Level of corruption - 0.0 (none) to 1.0 (all values corrupted)

sparsity

public Builder sparsity(double sparsity)

Autoencoder sparsity parameter

  • param sparsity Sparsity

VariationalAutoencoder

Variational Autoencoder layer

This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.

A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.
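A minimal sketch of a VariationalAutoencoder configuration for binary-valued data (the sizes are placeholders):

VariationalAutoencoder vae = new VariationalAutoencoder.Builder()
        .nIn(784)                          //input size
        .nOut(32)                          //size of the latent state Z
        .encoderLayerSizes(256, 128)
        .decoderLayerSizes(128, 256)
        .pzxActivationFunction(Activation.IDENTITY)
        .reconstructionDistribution(new BernoulliReconstructionDistribution(Activation.SIGMOID.getActivationFunction()))
        .build();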

encoderLayerSizes

public Builder encoderLayerSizes(int... encoderLayerSizes)

Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers (set via {@link #decoderLayerSizes(int...)}) is similar to the encoder layers.

setEncoderLayerSizes

public void setEncoderLayerSizes(int... encoderLayerSizes)

Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers (set via {@link #decoderLayerSizes(int...)}) is similar to the encoder layers.

  • param encoderLayerSizes Size of each encoder layer in the variational autoencoder

decoderLayerSizes

public Builder decoderLayerSizes(int... decoderLayerSizes)

Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers is similar to the encoder layers (set via {@link #encoderLayerSizes(int...)}).

  • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

setDecoderLayerSizes

public void setDecoderLayerSizes(int... decoderLayerSizes)

Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers is similar to the encoder layers (set via {@link #encoderLayerSizes(int...)}).

  • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

reconstructionDistribution

public Builder reconstructionDistribution(ReconstructionDistribution distribution)

The reconstruction distribution for the data given the hidden state - i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example:

  • {@link GaussianReconstructionDistribution} + {identity or tanh} for real-valued (Gaussian) data

  • {@link BernoulliReconstructionDistribution} + sigmoid for binary-valued (0 or 1) data

  • param distribution Reconstruction distribution

lossFunction

public Builder lossFunction(IActivation outputActivationFn, LossFunctions.LossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

lossFunction

public Builder lossFunction(Activation outputActivationFn, LossFunctions.LossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

lossFunction

public Builder lossFunction(IActivation outputActivationFn, ILossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

pzxActivationFn

public Builder pzxActivationFn(IActivation activationFunction)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

  • param activationFunction Activation function for p(z| x)

pzxActivationFunction

public Builder pzxActivationFunction(Activation activation)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

  • param activation Activation function for p(z | x)

nOut

public Builder nOut(int nOut)

Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.

  • param nOut Size of P(Z | data) and output size

numSamples

public Builder numSamples(int numSamples)

Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.

This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”

  • param numSamples Number of samples per data point for pretraining

Convolutional Layers

Also known as CNN.

Available layers

Convolution1D

1D convolution layer. Expects input activations of shape [minibatch,channels,sequenceLength]

Convolution2D

2D convolution layer

Convolution3D

3D convolution layer configuration

hasBias

public boolean hasBias()

An optional dataFormat: “NDHWC” or “NCDHW”. Defaults to “NCDHW”. The data format of the input and output data. For “NCDHW” (also known as ‘channels first’ format), the data storage order is: [batchSize, inputChannels, inputDepth, inputHeight, inputWidth]. For “NDHWC” (‘channels last’ format), the data is stored in the order of: [batchSize, inputDepth, inputHeight, inputWidth, inputChannels].

kernelSize

public Builder kernelSize(int... kernelSize)

Set kernel size for 3D convolutions in (depth, height, width) order

  • param kernelSize kernel size

  • return 3D convolution layer builder

stride

public Builder stride(int... stride)

Set stride size for 3D convolutions in (depth, height, width) order

  • param stride stride size

  • return 3D convolution layer builder

padding

public Builder padding(int... padding)

Set padding size for 3D convolutions in (depth, height, width) order

  • param padding padding size

  • return 3D convolution layer builder

dilation

public Builder dilation(int... dilation)

Set dilation size for 3D convolutions in (depth, height, width) order

  • param dilation dilation size

  • return 3D convolution layer builder

dataFormat

public Builder dataFormat(DataFormat dataFormat)

The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]

  • param dataFormat Data format to use for activations

setKernelSize

public void setKernelSize(int... kernelSize)

Set kernel size for 3D convolutions in (depth, height, width) order

  • param kernelSize kernel size

setStride

public void setStride(int... stride)

Set stride size for 3D convolutions in (depth, height, width) order

  • param stride stride size

setPadding

public void setPadding(int... padding)

Set padding size for 3D convolutions in (depth, height, width) order

  • param padding padding size

setDilation

public void setDilation(int... dilation)

Set dilation size for 3D convolutions in (depth, height, width) order

  • param dilation dilation size

Deconvolution2D

2D deconvolution layer configuration

Deconvolutions are also known as transpose convolutions or fractionally strided convolutions. In essence, deconvolutions swap forward and backward pass with regular 2D convolutions.
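A brief sketch of a Deconvolution2D configuration (kernel, stride and channel counts are placeholders):

Deconvolution2D deconv = new Deconvolution2D.Builder()
        .kernelSize(2, 2)
        .stride(2, 2)
        .nIn(64)                                //input channels
        .nOut(32)                               //output channels (number of filters)
        .convolutionMode(ConvolutionMode.Same)
        .build();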

hasBias

public boolean hasBias()

Deconvolution2D layer. nIn in the input layer is the number of channels; nOut is the number of filters to be used in the net (in other words, the number of output channels). The builder specifies the filter/kernel size, the stride and the padding.

convolutionMode

public Builder convolutionMode(ConvolutionMode convolutionMode)

Set the convolution mode for the Convolution layer. See {@link ConvolutionMode} for more details

  • param convolutionMode Convolution mode for layer

kernelSize

public Builder kernelSize(int... kernelSize)

Size of the convolution rows/columns

  • param kernelSize the height and width of the kernel

Cropping1D

Cropping layer for convolutional (1d) neural networks. Allows cropping to be done separately for top/bottom

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)
  • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

build

public Cropping1D build()
  • param cropping Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

Cropping2D

Cropping layer for convolutional (2d) neural networks. Allows cropping to be done separately for top/bottom/left/right

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)
  • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

  • param cropLeftRight Amount of cropping to apply to both the left and the right of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount for top/bottom/left/right (in that order). A length 4 array.

build

public Cropping2D build()
  • param cropping Cropping amount for top/bottom/left/right (in that order). Must be length 4 array.

Cropping3D

Cropping layer for convolutional (3d) neural networks. Allows cropping to be done separately for upper and lower bounds of depth, height and width dimensions.

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)
  • param cropDepth Amount of cropping to apply to both depth boundaries of the input activations

  • param cropHeight Amount of cropping to apply to both height boundaries of the input activations

  • param cropWidth Amount of cropping to apply to both width boundaries of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount, a length 6 array, i.e. crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

build

public Cropping3D build()
  • param cropping Cropping amount, must be length 3 or 6 array, i.e. either crop depth, crop height, crop width or crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

DataSet Iterators

Data iteration tools for loading into neural networks.

What is an iterator?

A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data, or automatic loading of common benchmarking datasets such as MNIST and IRIS.

Usage

For most use cases, initializing an iterator and passing a reference to a MultiLayerNetwork or ComputationGraph fit() method is all you need to begin a task for training:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

// pass an MNIST data iterator that automatically fetches data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
model.fit(mnistTrain);

Many other methods also accept iterators for tasks such as evaluation:

// passing the iterator directly to the network's evaluate method
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
Evaluation evaluation = model.evaluate(mnistTest);

// using an evaluation class manually
mnistTest.reset(); //rewind the iterator before reusing it
Evaluation eval = new Evaluation(10); //create an evaluation object with 10 possible classes
while(mnistTest.hasNext()){
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatures()); //get the network's prediction
    eval.eval(next.getLabels(), output); //check the prediction against the true class
}

Available iterators

MnistDataSetIterator

UciSequenceDataSetIterator

UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift

UciSequenceDataSetIterator

public UciSequenceDataSetIterator(int batchSize)

Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123

  • param batchSize Minibatch size

Cifar10DataSetIterator

CifarDataSetIterator is an iterator for CIFAR-10 dataset - 10 classes, with 32x32 images with 3 channels (RGB)

Cifar10DataSetIterator

public Cifar10DataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

  • param batchSize Minibatch size for the iterator

IrisDataSetIterator

IrisDataSetIterator

public IrisDataSetIterator()

next

public DataSet next()

IrisDataSetIterator handles traversing through the Iris Data Set.

  • param batch Batch size

  • param numExamples Total number of examples

LFWDataSetIterator

LFWDataSetIterator

public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
                    PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
                    ImageTransform imageTransform, Random rng)

Create LFW data specific iterator

  • param batchSize the batch size of the examples

  • param numExamples the overall number of examples

  • param imgDim an array of height, width and channels

  • param numLabels the overall number of examples

  • param useSubset use a subset of the LFWDataSet

  • param labelGenerator path label generator to use

  • param train true if use train value

  • param splitTrainTest the percentage to split data for train and remainder goes to test

  • param imageTransform how to transform the image

  • param rng random number to lock in batch shuffling

TinyImageNetDataSetIterator

Tiny ImageNet is a subset of the ImageNet database. TinyImageNet is the default course challenge for CS231n at Stanford University.

Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.

TinyImageNetDataSetIterator

public TinyImageNetDataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

  • param batchSize Minibatch size for the iterator

EmnistDataSetIterator

EMNIST DataSetIterator

  • COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes

  • MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z

  • BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)

  • LETTERS: 145,600 examples total. 26 balanced classes

  • DIGITS: 280,000 examples total. 10 balanced classes

EmnistDataSetIterator

public EmnistDataSetIterator(Set dataSet, int batch, boolean train) throws IOException

EMNIST dataset has multiple different subsets. See the {@link EmnistDataSetIterator} Javadoc for details.
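For example, an iterator over the BALANCED training split can be created as follows (note that the constructor may download the dataset on first use and declares IOException):

int batchSize = 32;
DataSetIterator emnistTrain = new EmnistDataSetIterator(EmnistDataSetIterator.Set.BALANCED, batchSize, true);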

numExamplesTrain

public static int numExamplesTrain(Set dataSet)

Get the number of training examples for the specified subset

  • param dataSet Subset to get

  • return Number of examples for the specified subset

numExamplesTest

public static int numExamplesTest(Set dataSet)

Get the number of test examples for the specified subset

  • param dataSet Subset to get

  • return Number of examples for the specified subset

numLabels

public static int numLabels(Set dataSet)

Get the number of labels for the specified subset

  • param dataSet Subset to get

  • return Number of labels for the specified subset

isBalanced

public static boolean isBalanced(Set dataSet)

Are the labels balanced for the specified subset (i.e., equal number of examples for each class)?

  • param dataSet Subset to check

  • return True if balanced, false otherwise

RecordReaderDataSetIterator

Record reader dataset iterator. Takes a DataVec RecordReader as input and handles conversion to DataSet objects as well as producing minibatches from individual records.

Example 1: Image classification, batch size 32, 10 classes

//rr: an ImageRecordReader, constructed with the desired height, width, channels and a label generator
rr.initialize(new FileSplit(new File("/path/to/directory")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
        //Label index (first arg): Always value 1 when using ImageRecordReader. For CSV etc: use index of the column
        //  that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
        //  at 0. Number of classes (second arg): number of label classes (i.e., 10 for MNIST - 10 digits)
        .classification(1, nClasses)
        .preProcessor(new ImagePreProcessingScaler())      //For normalization of image values 0-255 to 0-1
        .build();

Example 2: Multi-output regression from CSV, batch size 128

//rr: a CSVRecordReader, created as in the earlier CSV examples
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
        //Specify the columns that the regression labels/targets appear in. Note that all other columns will be
        //  treated as features. Column indexes start at 0
        .regression(labelColFrom, labelColTo)
        .build();

RecordReaderDataSetIterator

public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize)

Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels() Note that if RecordReader.getLabels() returns null, no output labels will be produced

  • param recordReader Record reader to use as the source of data

  • param batchSize Minibatch size, for each call of .next()

setCollectMetaData

public void setCollectMetaData(boolean collectMetaData)

Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.

  • param recordReader RecordReader: provides the source of the data

  • param batchSize Batch size (number of examples) for the output DataSet objects

  • param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()

  • param numPossibleLabels Number of classes (possible labels) for classification

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using {@link #loadFromMetaData(List)}

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return DataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a DataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor

  • return DataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

writableConverter

public Builder writableConverter(WritableConverter converter)

Builder class for RecordReaderDataSetIterator

maxNumBatches

public Builder maxNumBatches(int maxNumBatches)

Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.

  • param maxNumBatches Maximum number of minibatches per epoch / reset

regression

public Builder regression(int labelIndex)

Use this for single output regression (i.e., 1 output/regression target)

  • param labelIndex Column index that contains the regression target (indexes start at 0)

regression

public Builder regression(int labelIndexFrom, int labelIndexTo)

Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)

  • param labelIndexFrom Column index of the first regression target (indexes start at 0)

  • param labelIndexTo Column index of the last regression target (inclusive)

classification

public Builder classification(int labelIndex, int numClasses)

Use this for classification

  • param labelIndex Index that contains the label index. Column (indexes start from 0) be an integer value, and contain values 0 to numClasses-1

  • param numClasses Number of label classes (i.e., number of categories/classes in the dataset)

preProcessor

public Builder preProcessor(DataSetPreProcessor preProcessor)

Optional arg. Allows the preprocessor to be set

  • param preProcessor Preprocessor to use

collectMetaData

public Builder collectMetaData(boolean collectMetaData)

When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

  • param collectMetaData Whether metadata should be collected or not
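A sketch of combining metadata collection with loadFromMetaData (assuming rr is an initialized RecordReader and the surrounding code handles the declared IOException):

RecordReaderDataSetIterator iter = new RecordReaderDataSetIterator(rr, 32);
iter.setCollectMetaData(true);    //include RecordMetaData with each returned DataSet

DataSet ds = iter.next();
List<RecordMetaData> meta = ds.getExampleMetaData(RecordMetaData.class);

//Later: reload exactly the same examples (for example, to inspect misclassified records)
DataSet sameExamples = iter.loadFromMetaData(meta);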

RecordReaderMultiDataSetIterator

The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator).

RecordReaderMultiDataSetIterator

public RecordReaderMultiDataSetIterator build()

When dealing with time series data of different lengths, how should we align the input/labels time series? For equal length: use EQUAL_LENGTH For sequence classification: use ALIGN_END

loadFromMetaData

public MultiDataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a MultiDataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using {@link #loadFromMetaData(List)}

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return MultiDataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public MultiDataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a MultiDataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record readers provided to the RecordReaderMultiDataSetIterator

  • return MultiDataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

SequenceRecordReaderDataSetIterator

Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels) generate time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with a different number of input vs. label time steps).

SequenceRecordReaderDataSetIterator

public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader, SequenceRecordReader labels,
                    int miniBatchSize, int numPossibleLabels)

Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.

  • param featuresReader SequenceRecordReader for the features

  • param labels Labels: assume single value per time step, where values are integers in the range 0 to numPossibleLabels-1

  • param miniBatchSize Minibatch size for each call of next()

  • param numPossibleLabels Number of classes for the labels

hasNext

public boolean hasNext()

Constructor where features and labels come from different RecordReaders (for example, different files)

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using {@link #loadFromMetaData(List)}

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return DataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor

  • return DataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

AsyncMultiDataSetIterator

Async prefetching iterator wrapper for MultiDataSetIterator implementations This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

next

public MultiDataSet next(int num)

We want to ensure that the background thread will have the same thread->device affinity as the master thread

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

We want to ensure that the background thread will have the same thread->device affinity as the master thread

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

IteratorDataSetIterator

A DataSetIterator that works on an Iterator&lt;DataSet&gt;, combining and splitting the input DataSet objects as required to get the specified batch size.

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

AsyncDataSetIterator

Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

AsyncDataSetIterator

public AsyncDataSetIterator(DataSetIterator baseIterator)

Create an Async iterator with the default queue size of 8

  • param baseIterator Underlying iterator to wrap and fetch asynchronously from

next

public DataSet next(int num)

Create an Async iterator with the default queue size of 8

  • param iterator Underlying iterator to wrap and fetch asynchronously from

  • param queue Queue size - number of minibatches to prefetch from the underlying iterator

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

This ensures that the background thread has the same thread->device affinity as the master thread.

batch

public int batch()

Batch size

  • return

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

  • param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements.

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

DoublesDataSetIterator

First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

DoublesDataSetIterator

public DoublesDataSetIterator(@NonNull Iterable<Pair<double[], double[]>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects
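
A minimal sketch of building a DoublesDataSetIterator from an in-memory list follows. Note that the Pair import path is an assumption and differs between ND4J versions (org.nd4j.linalg.primitives vs org.nd4j.common.primitives).

import java.util.ArrayList;
import java.util.List;

import org.deeplearning4j.datasets.iterator.DoublesDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.primitives.Pair; // may be org.nd4j.common.primitives.Pair in newer versions

public class DoublesIteratorSketch {
    public static void main(String[] args) {
        // Toy in-memory data: 100 examples, 3 features and 2 labels each
        List<Pair<double[], double[]>> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            double[] features = {i, i * 0.5, i * 0.25};
            double[] labels = {i % 2, 1 - (i % 2)};
            data.add(new Pair<>(features, labels));
        }

        // Batches of 10 -> each DataSet has 2d features of shape [10,3] and labels [10,2]
        DataSetIterator iter = new DoublesDataSetIterator(data, 10);
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }
    }
}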

IteratorMultiDataSetIterator

A MultiDataSetIterator that works on an Iterator&lt;MultiDataSet&gt;, combining and splitting the underlying MultiDataSet objects as required to get a specified batch size.

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

SamplingDataSetIterator

A wrapper for a dataset to sample from. This will randomly sample from the given dataset.

SamplingDataSetIterator

public SamplingDataSetIterator(DataSet sampleFrom, int batchSize, int totalNumberSamples)
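
A short sketch, assuming the constructor arguments documented above (the DataSet to sample from, the batch size, and the total number of samples to draw); import paths are assumed from recent DL4J versions.

import org.deeplearning4j.datasets.iterator.SamplingDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;

public class SamplingIteratorSketch {
    public static void main(String[] args) {
        // Toy DataSet with 150 random examples (4 features, 3 labels)
        DataSet all = new DataSet(Nd4j.rand(150, 4), Nd4j.rand(150, 3));

        // Draw 1000 random samples from it, returned in minibatches of 32
        DataSetIterator sampling = new SamplingDataSetIterator(all, 32, 1000);
        while (sampling.hasNext()) {
            DataSet minibatch = sampling.next();
        }
    }
}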

INDArrayDataSetIterator

First value in pair is the features vector, second value in pair is the labels.

INDArrayDataSetIterator

public INDArrayDataSetIterator(@NonNull Iterable<Pair<INDArray, INDArray>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects

WorkspacesShieldDataSetIterator

This iterator detaches/migrates DataSets coming out from backed DataSetIterator, thus providing “safe” DataSets. This is typically used for debugging and testing purposes, and should not be used in general by users

WorkspacesShieldDataSetIterator

public WorkspacesShieldDataSetIterator(@NonNull DataSetIterator iterator)
  • param iterator The underlying iterator to detach values from

MultiDataSetIteratorSplitter

This iterator virtually splits a given MultiDataSetIterator into Train and Test parts. For example: if you have 100,000 examples and a batch size of 32, you have 3,125 total batches. With a split ratio of 0.7, that gives you 2,187 training batches and 938 test batches.

PLEASE NOTE: The test iterator can't be used twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: This iterator can't be used if the underlying iterator uses randomization/shuffling between epochs.

MultiDataSetIteratorSplitter

public MultiDataSetIteratorSplitter(@NonNull MultiDataSetIterator baseIterator, long totalBatches, double ratio)
  • param baseIterator

  • param totalBatches - total number of batches in the underlying iterator. This value will be used to determine the number of test/train batches

  • param ratio - the split ratio, which must be in the range 0.0 < X < 1.0. E.g. if 0.7 is provided, 70% of the total examples will be used for training and 30% for testing
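
A sketch of the split described above, using the Iris iterator wrapped as a MultiDataSetIterator purely for illustration. getTestIterator() is assumed to be the counterpart of the documented getTrainIterator(); import paths may differ between versions.

import org.deeplearning4j.datasets.iterator.JointMultiDataSetIterator;
import org.deeplearning4j.datasets.iterator.MultiDataSetIteratorSplitter;
import org.deeplearning4j.datasets.iterator.impl.IrisDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;

public class SplitterSketch {
    public static void main(String[] args) {
        // 150 Iris examples in batches of 10 -> 15 total batches
        MultiDataSetIterator base =
                new JointMultiDataSetIterator(new IrisDataSetIterator(10, 150));

        // Virtual 80/20 train/test split over those 15 batches (12 train, 3 test)
        MultiDataSetIteratorSplitter splitter =
                new MultiDataSetIteratorSplitter(base, 15, 0.8);

        MultiDataSetIterator train = splitter.getTrainIterator();
        MultiDataSetIterator test = splitter.getTestIterator(); // assumed counterpart of getTrainIterator()

        // Use 'train' for fitting first, then 'test' for evaluation (see the notes above)
    }
}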

getTrainIterator

public MultiDataSetIterator getTrainIterator()

This method returns train iterator instance

  • return

next

public MultiDataSet next(int num)

This method returns test iterator instance

  • return

AsyncShieldDataSetIterator

This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetch. This is mainly used for debugging purposes, and for iterators that aren't safe to asynchronously prefetch from.

AsyncShieldDataSetIterator

public AsyncShieldDataSetIterator(@NonNull DataSetIterator iterator)
  • param iterator Iterator to wrap, to disable asynchronous prefetching for

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

  • param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements.

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

DummyBlockDataSetIterator

This class provides baseline implementation of BlockDataSetIterator interface

BaseDatasetIterator

Baseline implementation includes control over the data fetcher and some basic getters for metadata

AsyncShieldMultiDataSetIterator

This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch

next

public MultiDataSet next(int num)

Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples

  • param num Number of examples to fetch

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

RandomMultiDataSetIterator

RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomMultiDataSetIterator

public RandomMultiDataSetIterator(int numMiniBatches, @NonNull List<Triple<long[], Character, Values>> features, @NonNull List<Triple<long[], Character, Values>> labels)
  • param numMiniBatches Number of minibatches per epoch

  • param features Each triple in the list specifies the shape, array order and type of values for the features arrays

  • param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays

addFeatures

public Builder addFeatures(long[] shape, Values values)
  • param shape Shape of the features

  • param values Values to fill the array with

addFeatures

public Builder addFeatures(long[] shape, char order, Values values)

Add a new features array to the iterator

  • param shape Shape of the features

  • param order Order (‘c’ or ‘f’) for the array

  • param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, Values values)

Add a new labels array to the iterator

  • param shape Shape of the labels

  • param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, char order, Values values)

Add a new labels array to the iterator

  • param shape Shape of the labels

  • param order Order (‘c’ or ‘f’) for the array

  • param values Values to fill the array with

generate

public static INDArray generate(long[] shape, Values values)

Generate a random array with the specified shape

  • param shape Shape of the array

  • param values Values to fill the array with

  • return Random array of specified shape + contents

generate

public static INDArray generate(long[] shape, char order, Values values)

Generate a random array with the specified shape and order

  • param shape Shape of the array

  • param order Order of array (‘c’ or ‘f’)

  • param values Values to fill the array with

  • return Random array of specified shape + contents
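
A small sketch of the static generate helpers; the Values enum is assumed here to be the one nested in RandomMultiDataSetIterator, and the available constants (RANDOM_UNIFORM, ZEROS, ONES, etc) may vary by version.

import org.deeplearning4j.datasets.iterator.RandomMultiDataSetIterator;
import org.nd4j.linalg.api.ndarray.INDArray;

public class RandomGenerateSketch {
    public static void main(String[] args) {
        // A [32, 10] array filled according to the chosen Values setting
        INDArray features = RandomMultiDataSetIterator.generate(
                new long[]{32, 10}, RandomMultiDataSetIterator.Values.RANDOM_UNIFORM);

        // Same shape, but with explicit 'f' ordering
        INDArray featuresF = RandomMultiDataSetIterator.generate(
                new long[]{32, 10}, 'f', RandomMultiDataSetIterator.Values.RANDOM_UNIFORM);

        System.out.println(features.shapeInfoToString());
        System.out.println(featuresF.ordering()); // 'f'
    }
}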

EarlyTerminationMultiDataSetIterator

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch, regardless of the value of num. This essentially restricts the data to this specified number of minibatches.

EarlyTerminationMultiDataSetIterator

public EarlyTerminationMultiDataSetIterator(MultiDataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

  • param underlyingIterator, iterator to wrap

  • param terminationPoint, minibatches after which hasNext() will return false

ExistingDataSetIterator

ExistingDataSetIterator

public ExistingDataSetIterator(@NonNull Iterator<DataSet> iterator)

Note that when using this constructor, resetting is not supported

  • param iterator Iterator to wrap
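
A minimal sketch wrapping a plain java.util.Iterator of pre-built DataSets (import paths assumed from recent DL4J versions):

import java.util.ArrayList;
import java.util.List;

import org.deeplearning4j.datasets.iterator.ExistingDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;

public class ExistingIteratorSketch {
    public static void main(String[] args) {
        // A few pre-built DataSets held in memory
        List<DataSet> data = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            data.add(new DataSet(Nd4j.rand(16, 10), Nd4j.rand(16, 3)));
        }

        // Wrap the plain java.util.Iterator - note that reset() is not supported here
        DataSetIterator iter = new ExistingDataSetIterator(data.iterator());
        while (iter.hasNext()) {
            DataSet ds = iter.next();
        }
    }
}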

next

public DataSet next(int num)

Note that when using this constructor, resetting is not supported

  • param iterator Iterator to wrap

  • param labels String labels. May be null.

DummyBlockMultiDataSetIterator

This class provides baseline implementation of BlockMultiDataSetIterator interface

EarlyTerminationDataSetIterator

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch, regardless of the value of num. This essentially restricts the data to this specified number of minibatches.

EarlyTerminationDataSetIterator

public EarlyTerminationDataSetIterator(DataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

  • param underlyingIterator, iterator to wrap

  • param terminationPoint, minibatches after which hasNext() will return false
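
For example, the following sketch restricts the MNIST training data to its first 10 minibatches (MnistDataSetIterator is used only as a convenient source; import paths assumed from recent DL4J versions):

import org.deeplearning4j.datasets.iterator.EarlyTerminationDataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class EarlyTerminationSketch {
    public static void main(String[] args) throws Exception {
        // Full MNIST training set in minibatches of 32...
        DataSetIterator mnist = new MnistDataSetIterator(32, true, 12345);

        // ...restricted to the first 10 minibatches: hasNext() returns false afterwards
        DataSetIterator first10 = new EarlyTerminationDataSetIterator(mnist, 10);

        int count = 0;
        while (first10.hasNext()) {
            first10.next();
            count++;
        }
        System.out.println(count); // 10
    }
}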

ReconstructionDataSetIterator

Wraps a DataSetIterator, setting the features (feature matrix) as the labels - typically used for reconstruction (e.g., autoencoder) training.

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

DataSetIteratorSplitter

This iterator virtually splits a given DataSetIterator into Train and Test parts. For example: if you have 100,000 examples and a batch size of 32, you have 3,125 total batches. With a split ratio of 0.7, that gives you 2,187 training batches and 938 test batches.

PLEASE NOTE: The test iterator can't be used twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: This iterator can't be used if the underlying iterator uses randomization/shuffling between epochs.

DataSetIteratorSplitter

public DataSetIteratorSplitter(@NonNull DataSetIterator baseIterator, long totalBatches, double ratio)

The only constructor

  • param baseIterator - iterator to be wrapped and split

  • param totalBatches - total batches in baseIterator

  • param ratio - train/test split ratio

getTrainIterator

public DataSetIterator getTrainIterator()

This method returns train iterator instance

  • return

next

public DataSet next(int i)

This method returns test iterator instance

  • return

JointMultiDataSetIterator

This dataset iterator combines multiple DataSetIterators into 1 MultiDataSetIterator. Values from each iterator are joined on a per-example basis - i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come from one of the underlying DataSetIterators only (if ‘outcome’ is >= 0) or from all iterators (if outcome is < 0)

JointMultiDataSetIterator

public JointMultiDataSetIterator(DataSetIterator... iterators)
  • param iterators Underlying iterators to wrap
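
A brief sketch combining two DataSetIterators into one MultiDataSetIterator for a two-input network (Iris is used twice here purely for illustration; import paths assumed from recent DL4J versions):

import org.deeplearning4j.datasets.iterator.JointMultiDataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.IrisDataSetIterator;
import org.nd4j.linalg.dataset.api.MultiDataSet;
import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;

public class JointIteratorSketch {
    public static void main(String[] args) {
        // Two independent DataSetIterators (same batch size) feeding a two-input network
        IrisDataSetIterator input1 = new IrisDataSetIterator(10, 150);
        IrisDataSetIterator input2 = new IrisDataSetIterator(10, 150);

        // Each MultiDataSet has one feature array per underlying iterator
        MultiDataSetIterator joint = new JointMultiDataSetIterator(input1, input2);
        MultiDataSet mds = joint.next();
        System.out.println(mds.getFeatures().length); // 2 feature arrays
    }
}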

next

public MultiDataSet next(int num)
  • param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet

  • param iterators Underlying iterators to wrap

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

getPreProcessor

public MultiDataSetPreProcessor getPreProcessor()

Get the {@link MultiDataSetPreProcessor}, if one has previously been set. Returns null if no preprocessor has been set

  • return Preprocessor

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

PLEASE NOTE: This method is NOT implemented

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

FloatsDataSetIterator

First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

FloatsDataSetIterator

public FloatsDataSetIterator(@NonNull Iterable<Pair<float[], float[]>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects

FileSplitDataSetIterator

Simple iterator working with list of files. File to DataSet conversion will be handled via provided FileCallback implementation

FileSplitDataSetIterator

public FileSplitDataSetIterator(@NonNull List<File> files, @NonNull FileCallback callback)
  • param files List of files to iterate over

  • param callback Callback for loading the files

MultipleEpochsIterator

A dataset iterator for doing multiple passes over a dataset

Deprecated: use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

MultiDataSetWrapperIterator

This class is a simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly

PLEASE NOTE: This only works if number of features/labels/masks is 1

MultiDataSetWrapperIterator

public MultiDataSetWrapperIterator(MultiDataSetIterator iterator)
  • param iterator Underlying iterator to wrap

RandomDataSetIterator

RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomDataSetIterator

public RandomDataSetIterator(int numMiniBatches, long[] featuresShape, long[] labelsShape, Values featureValues, Values labelValues)
  • param numMiniBatches Number of minibatches per epoch

  • param featuresShape Features shape

  • param labelsShape Labels shape

  • param featureValues Type of values for the features

  • param labelValues Type of values for the labels
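
A short sketch follows. The Values enum is assumed to be nested in RandomDataSetIterator, and the constants shown (RANDOM_UNIFORM, ONE_HOT) are assumptions that may vary by version; import paths are likewise assumed.

import org.deeplearning4j.datasets.iterator.RandomDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class RandomDataSketch {
    public static void main(String[] args) {
        // 10 minibatches per epoch: [32,784] uniform-random features, [32,10] one-hot labels
        DataSetIterator random = new RandomDataSetIterator(10,
                new long[]{32, 784}, new long[]{32, 10},
                RandomDataSetIterator.Values.RANDOM_UNIFORM, // assumed constant name
                RandomDataSetIterator.Values.ONE_HOT);       // assumed constant name

        while (random.hasNext()) {
            System.out.println(random.next().getFeatures().shapeInfoToString());
        }
    }
}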

MultiDataSetIteratorAdapter

Iterator that adapts a DataSetIterator to a MultiDataSetIterator

1.0.0-beta4

Highlights - 1.0.0-beta4 Release

  • DOUBLE: double precision floating point, 64-bit (8 byte)

  • FLOAT: single precision floating point, 32-bit (4 byte)

  • HALF: half precision floating point, 16-bit (2 byte), "FP16"

  • LONG: long signed integer, 64 bit (8 byte)

  • INT: signed integer, 32 bit (4 byte)

  • SHORT: signed short integer, 16 bit (2 byte)

  • UBYTE: unsigned byte, 8 bit (1 byte), 0 to 255

  • BYTE: signed byte, 8 bit (1 byte), -128 to 127

  • BOOL: boolean type, (0/1, true/false). Uses ubyte storage for easier op parallelization

  • UTF8: String array type, UTF8 format

ND4J Behaviour changes of note:

  • When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)

    • For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray

    • Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively

  • Some operations require matched datatypes for operands

    • For example, if x and y are different datatypes, a cast may be required: x.add(y.castTo(x.dataType()))

  • Some operations have datatype restrictions: for example, sum on a UTF8 array is not supported, nor is variance on a BOOL array. For some operations on boolean arrays (such as sum), casting to an integer or floating point type first may make sense.
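
A small sketch illustrating the datatype inference and casting rules above:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class DataTypeSketch {
    public static void main(String[] args) {
        // Datatype is inferred from the Java primitive array type
        INDArray d = Nd4j.createFromArray(new double[]{1.0, 2.0, 3.0});   // DOUBLE
        INDArray f = Nd4j.createFromArray(new float[]{1.0f, 2.0f, 3.0f}); // FLOAT
        System.out.println(d.dataType() + " " + f.dataType());

        // Mixed-type operations may require an explicit cast
        INDArray sum = d.add(f.castTo(d.dataType()));
        System.out.println(sum.dataType()); // DOUBLE
    }
}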

DL4J Behaviour changes of note:

  • MultiLayerNetwork/ComputationGraph no longer depend in any way on ND4J global datatype.

    • The datatype of a network (the DataType for its parameters and activations) can be set during construction using NeuralNetConfiguration.Builder().dataType(DataType)

    • Networks can be converted from one type to another (double to float, float to half etc) using MultiLayerNetwork/ComputationGraph.convertDataType(DataType) method
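
A minimal sketch of both mechanisms, using a single-output-layer network purely for brevity (the specific layer, loss function and activation are illustrative assumptions, not prescriptive):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class NetworkDataTypeSketch {
    public static void main(String[] args) {
        // Network parameters and activations stored as FLOAT, independent of the global ND4J default
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .dataType(DataType.FLOAT)
                .list()
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(4).nOut(3).activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();

        // Convert an existing network to half precision (FP16)
        MultiLayerNetwork halfPrecision = net.convertDataType(DataType.HALF);
    }
}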

Main new methods:

  • Nd4j.create(), zeros(), ones(), linspace(), etc methods with DataType argument

  • INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another

  • New Nd4j.createFromArray(...) methods for creating INDArrays from Java primitive arrays

ND4J/DL4J: CUDA - 10.1 support added, CUDA 9.0 support dropped

CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.

ND4J: Mac/OSX CUDA support dropped

Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.

DL4J/ND4J: MKL-DNN Support Added

DL4J (and ND4J conv2d etc ops) now support MKL-DNN by default when running on CPU/native backend. MKL-DNN support is implemented for the following layer types:

  • ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)

  • SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)

  • BatchNormalization layer (and BatchNorm ND4J op)

  • LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)

  • Convolution3D layer (and Conv3D/Conv3DDerivative ND4J ops)

MKL-DNN support for other layer types (such as LSTM) will be added in a future release.

MKL-DNN can be disabled globally (ND4J and DL4J) using Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);

MKL-DNN can also be disabled for specific ops by setting the ND4J_MKL_FALLBACK environment variable to the names of the operations that should have MKL-DNN support disabled. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp

ND4J: Improved Performance due to Memory Management Changes

Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default hence periodic GC could frequently be disabled when training DL4J networks). However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.

In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000ms) you can use:

Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(5000);

ND4J: Improved Rank 0/1 Array Support

In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, getting sub-arrays using INDArray.get(NDArrayIndex...) or when creating arrays from Java arrays/scalars. Now, behaviour should be more consistent for these rank 0/1 cases. Note: to maintain the old behaviour for getRow and getColumn (i.e., return a rank 2 array with shape [1,x] and [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.
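
For example, a brief sketch of the new rank behaviour and the keep-dimension overload:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class RowRankSketch {
    public static void main(String[] args) {
        INDArray matrix = Nd4j.rand(3, 4);

        INDArray row = matrix.getRow(0);              // rank 1, shape [4]
        INDArray rowKeepDim = matrix.getRow(0, true); // rank 2, shape [1, 4] (old behaviour)

        System.out.println(row.rank() + " vs " + rowKeepDim.rank()); // 1 vs 2
    }
}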

DL4J: Attention layers added

Deeplearning4J

Deeplearning4J: Features and Enhancements

Deeplearning4J: Bug Fixes and Optimizations

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

  • Added a basic ("technology preview") SameDiff UI. Should be considered early WIP with breaking API changes expected in future releases. Supports plotting of SameDiff graphs as well as various metrics (line charts, histograms, etc)

    • Currently embedded in the DL4J UI - call UIServer.getInstance() then go to localhost:9000/samediff to access.

  • ND4J/SameDiff - new operations added:

  • SameDiff TensorFlow Import

ND4J/SameDiff: API Changes (Transition Guide): 1.0.0-beta3 to 1.0.0-beta4

  • ND4J datatypes - significant changes, see highlights at top of this section

ND4J/SameDiff: Bug Fixes and Optimizations

  • SameDiff: Numerous fixes and enhancements

ND4J: Known Issues

  • Most CustomOperation operations (such as those used in SameDiff) are CPU only until next release. GPU support was not completed in time for 1.0.0-beta4 release.

DataVec

DataVec: Features and Enhancements

DataVec: Optimizations and Bug Fixes

Arbiter

Arbiter: Enhancements

Arbiter: Fixes

0.5.0

  • FP16 support for CUDA

  • Better performance for multi-gpu

  • Including optional P2P memory access support

  • Normalization support for time series and images

  • Normalization support for labels

  • Numerous bug fixes

  • Spark improvements

0.9.0

Deeplearning4J

  • VPTree performance significantly improved

  • Convolution performance improvements, including activation caching

  • Evaluation improvements

    • ComputationGraph and SparkComputationGraph evaluation convenience methods added (evaluateROC, etc)

    • RegressionEvaluation, ROCBinary etc now support per-output masking (in addition to per-example/per-time-step masking)

  • Optimizations: updaters, bias calculation

  • New loss functions:

ND4J

  • Native parallel sort was added

  • New ops added: SELU/SELUDerivative, TAD-based comparisons, percentile/median, Reverse, Tan/TanDerivative, SinH, CosH, Entropy, ShannonEntropy, LogEntropy, AbsoluteMin/AbsoluteMax/AbsoluteSum, Atan2

  • New distance functions added: CosineDistance, HammingDistance, JaccardDistance

DataVec

  • TransformProcess and Transforms now support NDArrayWritables and NDArrayWritable columns

  • Multiple new Transform classes

Arbiter

    • UI now uses Play framework, integrates with DL4J UI (replaces Dropwizard backend). Dependency issues/clashing versions fixed.

    • Supports DL4J StatsStorage and StatsStorageRouter mechanisms (FileStatsStorage, Remote UI via RemoteUIStatsStorageRouter)

    • General UI improvements (additional information, formatting fixes)

This example will use VGG16 to classify images belonging to five categories of flowers. The dataset will automatically download from http://download.tensorflow.org/example_images/flower_photos.tgz.

Deeplearning4j has a new native model zoo. Read about the deeplearning4j-zoo module for more information on using pretrained models. Here, we load a pretrained VGG-16 model initialized with weights trained on ImageNet:

Rational tanh approximation From

Dl4j’s AlexNet model interpretation based on the original paper ImageNet Classification with Deep Convolutional Neural Networks and the imagenetExample code referenced. References:

Darknet19 Reference: ImageNet weights for this model are available and have been converted from using .

There are 2 pretrained models, one for 224x224 images and one fine-tuned for 448x448 images. Call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization. The channels of the input images need to be in RGB order (not BGR), with values normalized within [0, 1]. The output labels are as per .

A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: Also based on the OpenFace implementation:

A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: Also based on the OpenFace implementation:

MNIST weights for this model are available and have been converted from .

Paper: ImageNet weights for this model are available and have been converted from .

Paper: ImageNet weights for this model are available and have been converted from ;.

A simple convolutional network for generic image classification. Reference:

Paper: ImageNet weights for this model are available and have been converted from .

Architecture follows this implementation:

Walt Whitman weights are available for generating text from his works, adapted from .

Tiny YOLO Reference:

ImageNet+VOC weights for this model are available and have been converted from using and the following code.

Paper: Weights are available for image segmentation trained on a synthetic dataset

VGG-16, from Very Deep Convolutional Networks for Large-Scale Image Recognition

Deep Face Recognition

ImageNet weights for this model are available and have been converted from . CIFAR-10 weights for this model are available and have been converted using “approach 2” from . VGGFace weights for this model are available and have been converted from .

VGG-19, from Very Deep Convolutional Networks for Large-Scale Image Recognition ImageNet weights for this model are available and have been converted from .

Paper: ImageNet weights for this model are available and have been converted from .

YOLOv2 Reference:

ImageNet+COCO weights for this model are available and have been converted from using and the following code.

You can find a complete list of models using this .

Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J .

Initialization methods often have an additional parameter named workspaceMode. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see .

The MultiLayerNetwork, which is essentially a stack of neural network layers (with a single input layer and single output layer), and

The ComputationGraph, which allows for greater freedom in network architectures

GoogLeNet, a complex type of convolutional neural network for image classification

The basic idea is that in the ComputationGraph, the core building block is the GraphVertex, instead of layers. Layers (or, more accurately, the LayerVertex objects) are but one type of vertex in the graph. Other types of vertices include:

LayerVertex: Layer vertices (graph vertices with neural network layers) are added using the .addLayer(String,Layer,String...) method. The first argument is the label for the layer, and the last arguments are the inputs to that layer. If you need to manually add an InputPreProcessor (usually this is unnecessary - see next section) you can use the .addLayer(String,Layer,InputPreProcessor,String...) method.

PreProcessorVertex: Occasionally, you might want to use the functionality of an InputPreProcessor without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.

Finally, it is also possible to define custom graph vertices by implementing both a configuration and an implementation class for your custom GraphVertex.

It will automatically add any InputPreProcessors as required. InputPreProcessors are necessary to handle the interaction between for example fully connected (dense) and convolutional layers, or recurrent and fully connected layers.

A DataSet object is basically a pair of INDArrays that hold your training data. In the case of RNNs, it may also include masking arrays (see the relevant section for more details). A DataSetIterator is essentially an iterator over DataSet objects.

By implementing the MultiDataSetIterator interface directly

By using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers

Some basic examples on how to use the RecordReaderMultiDataSetIterator follow. You might also find these unit tests to be useful.

Local response normalization layer See section 3.3 of

Output (loss) layer for YOLOv2 object detection model, based on the papers: YOLO9000: Better, Faster, Stronger - Redmon & Farhadi (2016) - and You Only Look Once: Unified, Real-Time Object Detection - Redmon et al. (2016) - This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.

See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes -

See the paper by Matt Zeiler for details:

For an intuitive guide to convolution arithmetic and shapes, see:

MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see

Details: Data: Image:

This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: .

IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes

see

LFW iterator - Labeled Faces in the Wild dataset. 13233 images total, with 5749 classes.

See: and

See: and

Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-Dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following are supported:

Added MKL-DNN support for Conv/Pool/BatchNorm/LRN layers. MKL-DNN will be used automatically when using nd4j-native backend. (, )

L1/L2 regularization now made into a class; weight decay added, with better control as to when/how it is applied. See for more details on the difference between L2 and weight decay. In general, weight decay should be preferred to L2 regularization. (, )

Added dot product attention layers: , , and

The parameter/activation datatypes for new networks can be set using the dataType(DataType) method on NeuralNetConfiguration.Builder ()

MultiLayerNetwork/ComputationGraph can be converted between (floating point) datatypes FP16/32/64 for the parameters and activations using the MultiLayerNetwork/ComputationGraph.convertDataType(DataType) methods (, )

EmbeddingLayer and EmbeddingSequenceLayer builders now have .weightInit(INDArray) and .weightInit(Word2Vec) methods for initializing parameters from pretrained word vectors ()

PerformanceListener can now be configured to report garbage collection information (number/duration)

Evaluation class will now check for NaNs in the predicted output and throw an exception instead of treating argMax(NaNs) as having value 0 ()

Added ModelAdapter for ParallelInference for convenience and for use cases such as YOLO (allows improved performance by avoiding detached (out-of-workspace) arrays) ()

Added GELU Activation function ()

Added BertIterator (a MultiDataSetIterator for BERT training - supervised and unsupervised)

Added validation to MultiLayerNetwork/ComputationGraph that throws an exception when attempting to perform Regression evaluation on a classifier, or vice-versa (, )

Added ComputationGraph.output(List<String> layers, boolean train, INDArray[] features, INDArray[] featureMasks) method to get the activations for a specific set of layers/vertices only (without redundant calculations) ()

Weight initialization for networks is now implemented as classes (not just enumerations) and hence is now extensible via the IWeightInit interface (); i.e., custom weight initializations are now supported (, )

Added Capsule Network layers (no GPU acceleration until next release) - , and ()

Added Cifar10DataSetIterator to replace CifarDataSetIterator (, )

Keras import: Importing models from InputStream is now supported (, )

Layer/NeuralNetConfiguration builders now have getter/setter methods also, for better Kotlin support ()

Most JavaScript dependencies and fonts for UI have been migrated to WebJars ()

CheckpointListener now has static availableCheckpoints(File), loadCheckpointMLN(File, int) and loadLastCheckpointMLN(File) etc methods ()

MultiLayerNetwork/ComputationGraph now validate and throw an exception in certain incompatible RNN configurations, like truncated backpropagation through time combined with LastTimeStepLayer/Vertex ()

Added BERT WordPiece tokenizers ()

Deeplearning4j UI now has multi-user/multi-session support - use UIServer.getInstance(boolean multiSession, Function<String,StatsStorage>) to start UI in multi-session mode ()

Layer/NeuralNetworkConfiguration builder method validation standardized and improved ()

WordVectorSerializer now supports reading and exporting text format vectors via WordVectorSerializer.writeLookupTable and readLookupTable ()

Updated to JavaCPP, JavaCPP presets, and JavaCV version 1.5 ()

Added EvaluationBinary false alarm rate calculation ()

ComputationGraph GraphBuilder now has an appendLayer method that can be used to add layers connected to the last added layer/vertex ()

Added Wasserstein loss function ()

Keras import: Improved errors/exceptions for lambda layer import ()

Apache Lucene/Solr upgraded from 7.5.0 to 7.7.1 ()

KMeans clustering strategy is now configurable ()

DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly ()

cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown ()

Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training ()

Fixed a bug where dropout instances were incorrectly shared between layers when using transfer learning with dropout (, )

Fixed issue where tensorAlongDimension could result in an incorrect array order for edge cases and hence exceptions in LSTMs ()

Fixed an edge case issue with ComputationGraph.getParam(String) where the layer name contains underscores ()

Fixed an edge case with ParallelInference on CUDA where (very rarely) input array operations (such as normalization) may not be fully completed before transferring an array between threads (, )

Fixed an edge case with KFoldIterator when the total number of examples is not a multiple of the batch size (, )

Fixed an issue where DL4J UI could throw a NoClassDefFoundError on Java 9/10/11 (, )

Keras import: added aliases for weight initialization ()

Fixed issue where dropout instances would not be correctly cloned when network configuration was cloned ()

Fixed workspace issue with ElementwiseVertex with single input ()

Fixed issue with UI where detaching StatsStorage could attempt to remove storage twice, resulting in an exception ()

Fixed issue where LossMultiLabel would generate NaNs when all labels in minibatch are the same class. Now 0 gradient is returned instead. (, )

Fixed an issue where DepthwiseConv2D weight could be wrong shape on restoring network from saved format ()

Fixed issue where BaseDatasetIterator.next() would not apply preprocessors, if one was set ()

Improved default configuration for CenterLossOutputLayer ()

Fixed an issue for UNet non-pretrained configuration ()

Fixed an issue where Word2Vec VocabConstructor could deadlock under some circumstances ()

SkipGram and CBOW (used in Word2Vec) were made native operations for better performance ()

Fixed an issue where references to detached StatsListener instances would be maintained, potentially leading to memory issues when using InMemoryStatsListener ()

Optimization: Workspaces were added to SequenceVectors and Word2Vec ()

Improved validation for RecordReaderDataSetIterator ()

Improved handling of unknown words in WordVectors implementation ()

Yolo2OutputLayer: Added validation for incorrect labels shape. ()

LastTimeStepLayer will now throw an exception when the input mask is all 0s (no data - no last time step) ()

Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate method could lead to invalid updater state in some rare cases ()

Fixed an issue where Conv1D layer would calculate output length incorrectly in MultiLayerNetwork.summary() ()

Async iterators are now used in EarlyStoppingTrainer to improve data loading performance ()

EmbeddingLayer and EmbeddingSequenceLayer performance has been improved on CUDA ()

Removed outdated/legacy scala tools repository (, )

Fixed issues in L2NormalizeVertex equals/hashcode methods ()

Fixed Workspace issue in ConvolutionalListener ()

Fixed EvaluationBinary falsePositiveRate calculation ()

Added validation and useful exception for MultiLayerNetwork.output(DataSetIterator) methods ()

Fixed minor issue where ComputationGraph.summary() would throw a NullPointerException if init() had not already been called ()

Fixed a ComputationGraph issue where an input into a single layer/vertex repeated multiple times could fail during training ()

Improved performance for KMeans implementation ()

Fixed an issue with rnnGetPreviousState for RNNs in 'wrapper' layers such as FrozenLayer ()

Keras import: Fixed an issue with order of words when importing some Keras tokenizers ()

Keras import: fixed issue with possible UnsupportedOperationException in KerasTokenizer class ()

Keras import: fixed an import issue with models combining embeddings, reshape and convolution layers ()

Keras import: fixed an import issue with input type inference for some RNN models ()

Fixed some padding issues in LocallyConnected1D/2D layers ()

Removed reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays ()

Added INDArray.close() method to allow users to manually release off-heap memory immediately ()

SameDiff: Added TensorFlowImportValidator tool to determine if a TensorFlow graph can likely be imported into SameDiff. Reports the operations used and whether they are supported in SameDiff ()

Added Nd4j.createFromNpzFile method to load Numpy npz files ()

Added support for importing BERT models into SameDiff (, )

Added SameDiff GraphTransformUtil for performing transfer learning and other graph modifications (, , )

Evaluation, RegressionEvaluation etc now support 4d (CNN segmentation) data formats; also added Evaluation.setAxis(int) method to support other data formats such as channels-last/NHWC for CNNs and NWC for CNN1D/RNNs. Defaults to axis 1 (which matches DL4J CNN and RNN data formats) (, )

Added DotProductAttention and MultiHeadDotProductAttention operations ()

Added Nd4j.exec(Op) and Nd4j.exec(CustomOp) convenience methods ()

Import of TF Assertions added ()

Support/fixes for control dependencies ()

Support/fixes for TensorArray and related ops (, , )

nd4j-common - tar/tar.gz support added; Zip file listing and single file extraction added (, )

SameDiff: reductions operations now support "dynamic" (non-constant) inputs for axis argument ()

ROCBinary now has .getROC(int outputNum) method ()

SameDiff: L1/L2 regularization added (, )

SameDiff: Added SDVariable.convertToVariable() and convertToConstant() - to change SDVariable type ()

Added checks and useful exceptions for reductions on empty arrays ()

SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields ()

SameDiff TensorFlow import: import can now be overridden for cases such as user-defined functions (, )

Libnd4j (c++) benchmarking framework added ()

Added OpExecutioner.inspectArray(INDArray) method to get summary statistics for analysis/debugging purposes ()

Added INDArray.reshape(char order, boolean enforceView, long... newShape) to reshape array whilst throwing an exception (instead of returning a copy) if the reshape cannot be performed (, )

Added SDVariable method overloads (plus, minus, times, etc) for Kotlin ()

Added SDVariable convenience methods for dot, reshape, permute ()

Added SameDiff SDIndex.point(long, boolean keepDim) method (to keep point indices in output array as size 1 axis) ()

Added SameDiff ProtoBufToFlatBufConversion command line tool for doing TensorFlow frozen model (protobuf) to SameDiff FlatBuffers conversion ()

Improved DataType validation for SameDiff operations ()

nd4j-base64 module (deprecated in beta3) has been removed. Nd4jBase64 class has been moved to nd4j-api ()

When specifying arguments for op execution along dimension (for example, reductions), the reduction axes are now specified in the operation constructor - not separately in the OpExecutioner call. ()

Removed old Java loop-based BooleanIndexing methods. Equivalent native ops should be used instead. ()

Removed Nd4j.ENFORCE_NUMERICAL_STABILITY, Nd4j.copyOnOps, etc ()

SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields ()

Nd4j.emptyLike(INDArray) has been removed. Use Nd4j.like(INDArray) instead ()

org.nd4jutil.StringUtils removed; suggest using Apache commons lang3 StringUtils instead ()

ND4J Jackson RowVector(De)Serializer has been deprecated due to datatype changes; NDArrayText(De)Serializer should be used instead (, )

nd4j-instrumentation module has been removed due to lack of use/maintenance ()

Fixed bug with InvertMatrix.invert() with [1,1] shape matrices ()

Fixed edge case bug for Updater instances with length 1 state arrays ()

Fixed edge case with FileDocumentIterator with empty documents ()

Improved functionality for losses (, , , )

Improved errors for missing/misspelled placeholders ()

Fixed edge cases in loops (, )

Fixed issue with Nd4j.vstack on 1d arrays returning 1d output, not 2d stacked output ()

Conv2D op can infer kernel size from input arrays directly when required (, )

Fixed an issue with Numpy format export - Nd4j.toNpyByteArray(INDArray) ()

Fixes for SameDiff when it is used within an external workspace ()

Fixed an issue where empty NDArrays would be reported as having scalar shape information, length 1 ()

Optimization: libnd4j (c++) indexing for ops will use uint for faster offset calculations when required and possible ()

Optimization: libnd4j loops performance improved for faster execution of some operations (, , )

Local response normalization op optimized (, )

Fixed an issue with INDArray.repeat on some view arrays ()

Improved performance for execution of some operations on view arrays ()

Improved performance on broadcast operations (, , )

Improved performance for non-EWS reduction along dimension operations ()

Improved performance for IndexReduce operations () and small reductions ()

Improved performance of one_hot operation (), tanh operation ()

Improved performance for transform operations ()

Optimization: empty arrays are created only once and cached (as they are immutable) ()

Improved performance on operations using tensor along dimension for parallelization (, )

Improved performance on "reduce 3" reduction operations ()

Improved handling of CUDA contexts in heavily multi-threaded environments ()

Fixed an issue where Evaluation.reset() would incorrectly clear the String class labels ()

SameDiff: Improved gradient calculation performance/efficiency; "gradients" are now no longer defined for non-floating-point variables, and variables that aren't required to calculate loss or parameter gradients ()

Behaviour of IEvaluation instances now no longer depends on the global (default) datatype setting ()

INDArray.get(point(x), y) or .get(y, point(x)) now returns rank 1 arrays when performed on rank 2 arrays ()

Removed reliance on Guava for SameDiff, fixing potential issue for Java 11/12 and when earlier versions of Guava are on the classpath (, )

ND4J indexing (INDArray.get) implementation rewritten for better performance and reliability ()

Fixes for local response normalization backprop op ()

Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See . Workaround: Disable MKL-DNN for conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier) or disable MKL-DNN globally, for Skylake CPUs.

Added PythonTransform (arbitrary python code execution for pre processing) (, )

Added FirstDigit (Benford's law) transform (, )

StringToTimeTransform now supports setting Locale (, )

Added StreamInputSplit for creating local data pipelines where data is stored remotely on storage such as HDFS or S3 (, )

LineRecordReader (and subtypes) now have the option to define the character set ()

Added TokenizerBagOfWordsTermSequenceIndexTransform (TFIDF transform), GazeteerTransform (binary vector for word present) and MultiNlpTransform transforms; added BagOfWordsTransform interface ()

Fixed issue with ImageLoader.scalingIfNeeded ()

Arbiter now supports genetic algorithm search ()

Fixed an issue where early stopping used in Arbiter would result in a serialization exception ()

Removal of Canova and shift to DataVec: Javadoc,

Workspaces feature added (faster training performance + less memory)

SharedTrainingMaster added for Spark network training (improved performance) ,

ParallelInference added - wrapper that serves inference requests using internal batching and queues

ParallelWrapper now able to work with gradient sharing, in addition to the existing parameter averaging mode

CacheMode network configuration option added - improved CNN and LSTM performance at the expense of additional memory use

LSTM layer added, with CuDNN support (Note that the existing GravesLSTM implementation does not support CuDNN)

New native model zoo with pretrained ImageNet, MNIST, and VGG-Face weights

Custom/user defined updaters are now supported

EvaluationBinary, ROCBinary classes added: for evaluation of binary multi-class networks (sigmoid + xent output layers)

Evaluation and others now have G-Measure and Matthews Correlation Coefficient support; also macro + micro-averaging support for Evaluation class metrics

ROC and ROCMultiClass support exact calculation (previous: thresholded calculation was used)

ROC classes now support area under precision-recall curve calculation; getting precision/recall/confusion matrix at specified thresholds (via PrecisionRecallCurve class)

EvaluationCalibration added (residual plots, reliability diagrams, histogram of probabilities)

Evaluation and EvaluationBinary: now supports custom classification threshold or cost array

Network memory estimation functionality added. Memory requirements can be estimated from configuration without instantiating networks

Mixture density loss function

F-Measure loss function

Workspaces feature added

MapFileRecordReader and MapFileSequenceRecordReader added

Spark: Utilities to save and load JavaRDD<List<Writable>> and JavaRDD<List<List<Writable>>> data to Hadoop MapFile and SequenceFile formats

Arbiter UI:

[source]
[source]
[source]
[source]
[source]
[source]
[source]
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
https://arxiv.org/abs/1612.08242
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
[source]
[source]
[source]
[source]
https://arxiv.org/abs/1312.6114
[source]
[source]
[source]
[source]
http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf
https://arxiv.org/abs/1603.07285v1
[source]
[source]
[source]
[source]
http://yann.lecun.com/exdb/mnist/
[source]
https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data
https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg
[source]
https://pjreddie.com/projects/cifar-10-dataset-mirror/
[source]
https://archive.ics.uci.edu/ml/datasets/Iris
https://archive.ics.uci.edu/ml/datasets/Iris
[source]
http://vis-www.cs.umass.edu/lfw/
[source]
http://cs231n.stanford.edu/
https://tiny-imagenet.herokuapp.com/
[source]
https://www.nist.gov/itl/iad/image-group/emnist-dataset
https://arxiv.org/abs/1702.05373
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
datatypes
AttentionVertex
LearnedSelfAttentionLayer
RecurrentAttentionLayer
SelfAttentionLayer
Link
Link
this page
Link
Link
AttentionVertex
LearnedSelfAttentionLayer
RecurrentAttentionLayer
SelfAttentionLayer
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
CapsuleLayer
CapsuleStrengthLayer
PrimaryCapsules
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
1
2
3
Link
Link
NonMaxSuppression
LogMatrixDeterminant
NthElement
TruncateMod
Cholesky Decomposition
Image resize nearest neighbor
crop_and_resize
fake_quant_with_min_max_vars
reduce_logsumexp
pow (broadcastable)
linspace (dynamic args)
ExtractImagePatches
GELU
LSTMBlockCell, LSTMBLock, GRUCell
Standardize and LayerNorm ops
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
1
2
3
4
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Issue 7637
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Github Repo
Link
Link 1
Link 2
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link 1
Link 2
Link
Link 1
Link 2
Link
Link
Link
Link 1
Link 2
Link
Link
GPUs

1.0.0-alpha

Highlights - 1.0.0-alpha Release

  • ND4J: Added SameDiff - Java automatic differentiation library (alpha release) with Tensorflow import (technology preview) and hundreds of new operations

  • ND4J: Added CUDA 9.0 and 9.1 support (with cuDNN), dropped support for CUDA 7.5, continued support for CUDA 8.0

  • ND4J: Native binaries (nd4j-native on Maven Central) now ship with AVX/AVX2/AVX-512 support (Windows/Linux)

  • DL4J: Large number of new layers and API improvements

  • DL4J: Keras 2.0 import support

Deeplearning4J

Deeplearning4J: New Features

  • Layers (new and enhanced)

    • Added support for both iteration-based and epoch-based schedules via ISchedule. Also added support for custom (user defined) schedules

    • Learning rate schedules are configured on the updaters, via the .updater(IUpdater) method

  • Adds ComputationGraphConfiguration GraphBuilder .layer(String, Layer, String...) alias for .addLayer(String, Layer, String...)

  • Added deeplearning4j-ui-standalone module: uber-jar for easy launching of UI server (usage: java -jar deeplearning4j-ui-standalone-1.0.0-alpha.jar -p 9124 -r true -f c:/UIStorage.bin)

  • Weight initializations:

  • Added new model zoo models:

  • New iterators, and iterator improvements:

Deeplearning4J: Bug Fixes and Optimizations

  • Evaluation no-arg constructor could cause NaN evaluation metrics when used on Spark

  • ParallelInference fixes:

Deeplearning4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

  • Previously deprecated updater configuration methods (.learningRate(double), .momentum(double) etc) all removed

    • To configure learning rate: use .updater(new Adam(lr)) instead of .updater(Updater.ADAM).learningRate(lr), as shown in the configuration sketch after this list

    • To configure bias learning rate: use .biasUpdater(IUpdater) method

    • To configure learning rate schedules: use .updater(new Adam(ISchedule)) and similar

  • Updater configuration via enumeration (i.e., .updater(Updater)) has been deprecated; use .updater(IUpdater)

  • .regularization(boolean) config removed; functionality is now always equivalent to .regularization(true)

  • .useDropConnect(boolean) removed; use .weightNoise(new DropConnect(double)) instead

  • .iterations(int) method has been removed (was rarely used and confusing to users)

  • Multiple utility classes (in org.deeplearning4j.util) have been deprecated and/or moved to nd4j-common. Use the same class names in the nd4j-common module (package org.nd4j.util) instead.

  • Previously deprecated .activation(String) has been removed; use .activation(Activation) or .activation(IActivation) instead

  • Layer API change: Custom layers may need to implement applyConstraints(int iteration, int epoch) method

  • Parameter initializer API change: Custom parameter initializers may need to implement isWeightParam(String) and isBiasParam(String) methods

  • GravesBidirectionalLSTM has been deprecated; use new Bidirectional(Bidirectional.Mode.ADD, new GravesLSTM.Builder()....build()) instead
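
To illustrate the new updater-based configuration described in the items above, a minimal sketch (the layer sizes, learning rate and loss function are arbitrary placeholders):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        // learning rate is now configured on the updater, not via .learningRate(...)
        .updater(new Adam(1e-3))
        .list()
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(784).nOut(10)
                .activation(Activation.SOFTMAX)
                .build())
        .build();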

Deeplearning4J: 1.0.0-alpha Known Issues

  • Performance on some networks types may be reduced on CUDA compared to 0.9.1 (with workspaces configured). This will be addressed in the next release

Deeplearning4J: Keras Import

  • Keras 2 support, keeping backward compatibility for keras 1

  • Keras 2 and 1 imports use the exact same API; the Keras version is inferred by DL4J (see the import sketch after this list)

  • Keras unit test coverage increased by 10x, many more real-world integration tests

  • Unit tests for importing and checking layer weights

  • Leaky ReLU, ELU, SELU support for model import

  • All Keras layers can be imported with optional bias terms

  • Old deeplearning4j-keras module removed, old "Model" API removed

  • All Keras initializations (Lecun normal, Lecun uniform, ones, zeros, Orthogonal, VarianceScaling, Constant) supported

  • 1D convolution and pooling supported in DL4J and Keras model import

  • Atrous Convolution 1D and 2D layers supported in Keras model import

  • 1D Zero padding layers supported

  • Keras constraints module fully supported in DL4J and model import

  • Upsampling 1D and 2D layers in DL4J and Keras model import (including GAN examples in tests)

  • Most merge modes supported in Keras model import, Keras 2 Merge layer API supported

  • Separable Convolution 2D layer supported in DL4J and Keras model import

  • Deconvolution 2D layer supported in DL4J and Keras model import

  • Full support of Keras noise layers on import (Alpha dropout, Gaussian dropout and noise)

  • Support for SimpleRNN layer in Keras model import

  • Support for Bidirectional layer wrapper Keras model import

  • Addition of LastTimestepVertex in DL4J to support return_sequences=False for Keras RNN layers.

  • DL4J support for recurrent weight initializations and Keras import integration.

  • SpaceToBatch and BatchToSpace layers in DL4J for better YOLO support, plus end-to-end YOLO Keras import test.

  • Cropping2D support in DL4J and Keras model import
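
As a minimal sketch of the Keras import API referenced above (the .h5 paths are hypothetical; exception handling is omitted):

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

// Keras Sequential model -> DL4J MultiLayerNetwork
MultiLayerNetwork net = KerasModelImport.importKerasSequentialModelAndWeights("path/to/sequential_model.h5");

// Keras functional-API model -> DL4J ComputationGraph
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights("path/to/functional_model.h5");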

Deeplearning4J: Keras Import - API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

Deeplearning4J: Keras Import - Known Issues

  • Embedding layer: In DL4J the output of an embedding layer is 2D by default, unless preprocessors are specified. In Keras the output is always 3D, but depending on specified parameters can be interpreted as 2D. This often leads to difficulties when importing Embedding layers. Many cases have been covered and issues fixed, but inconsistencies remain.

  • Batchnormalization layer: DL4J's batch normalization layer is much more restrictive (in a good way) than Keras' version of it. For instance, DL4J only allows normalizing spatial dimensions for 4D convolutional inputs, while in Keras any axis can be used for normalization. Depending on the dimension ordering (NCHW vs. NHWC) and the specific configuration used by a Keras user, this can lead to expected (!) and unexpected import errors.

  • Support for importing a Keras model for training purposes in DL4J (enforceTrainingConfig == true) is still very limited and will be tackled properly for the next release.

  • Keras Merge layers: seem to work fine with the Keras functional API, but have issues when used in a Sequential model.

  • Reshape layers: can be somewhat unreliable on import. DL4J rarely has a need to explicitly reshape input beyond (inferred) standard input preprocessors. In Keras, Reshape layers are used quite often. Mapping the two paradigms can be difficult in edge cases.

ND4J

ND4J: New Features

  • Hundreds of new operations added

  • Technology preview of tensorflow import added (supports 1.4.0 and up)

  • nVidia CUDA 8/9.0/9.1 now supported

  • Workspaces improvements were introduced to ensure safety: SCOPE_PANIC profiling mode is enabled by default

  • FlatBuffers support for INDArray serde

  • Support for auto-broadcastable operations was added

  • libnd4j, the underlying C++ library, received a functionality boost and now offers an NDArray class and a Graph class, and can be used as a standalone library or executable.

  • Convolution-related ops now support NHWC in addition to NCHW data format.

  • Accumulation ops now have option to keep reduced dimensions.

ND4J: Known Issues

  • Not all op gradients implemented for automatic differentiation

  • Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

ND4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

ND4J - SameDiff

  • Control flow is supported with IF and WHILE primitives.

Features

  • Two execution modes available: Java-driven execution, and Native execution for serialized graphs.

  • SameDiff graphs can be serialized using FlatBuffers

  • Building and running computation graphs built from SameDiff operations (see the sketch after this list).

  • Graphs can run forward pass on input data and compute gradients for the backward pass.

  • Already supports many high-level layers, like dense layers, convolutions (1D-3D), deconvolutions, separable convolutions, pooling and upsampling, batch normalization, local response normalization, LSTMs and GRUs.

  • In total there are about 350 SameDiff operations available, including many basic operations used in building complex graphs.
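
A minimal sketch of building and executing a small graph with SameDiff, as referenced above (the variable names, shapes, and the outputSingle call reflect the current API and are illustrative assumptions):

import java.util.Collections;
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

SameDiff sd = SameDiff.create();
SDVariable in  = sd.placeHolder("in", DataType.FLOAT, -1, 4);      // minibatch x 4 features
SDVariable w   = sd.var("w", Nd4j.rand(DataType.FLOAT, 4, 2));
SDVariable b   = sd.var("b", Nd4j.zeros(DataType.FLOAT, 1, 2));
SDVariable out = sd.nn().sigmoid("out", in.mmul(w).add(b));

// Forward pass for a single random minibatch of 3 examples
INDArray result = sd.outputSingle(Collections.singletonMap("in", Nd4j.rand(DataType.FLOAT, 3, 4)), "out");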

Known Issues and Limitations

  • Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

  • While many of the widely used base operations and high-level layers used in practice are supported, op coverage is still limited. Goal is to achieve feature parity with TensorFlow and fully support import for TF graphs.

  • Some of the existing ops do not have a backward pass implemented (called doDiff in SameDiff).

DataVec

DataVec: New Features

  • Add new RecordReader / SequenceRecordReader implementations:

  • Add new transforms:

DataVec: Fixes

DataVec: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

  • RecordWriter and SequenceRecordWriter APIs have been updated with multiple new methods

Arbiter

Arbiter: New Features

Arbiter: Fixes

Arbiter: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

  • As per DL4J updater API changes: old updater configuration (learningRate, momentum, etc) methods have been removed. Use .updater(IUpdater) or .updater(ParameterSpace<IUpdater>) methods instead

RL4J

  • Add support for LSTM layer to A3C

  • Fix A3C to make it actually work using new ActorCriticLoss and correct use of randomness

  • Fix cases when QLearning would fail (non-flat input, incomplete serialization, incorrect normalization)

  • Fix logic of HistoryProcessor with async algorithms and failures when preprocessing images

  • Tidy up and correct the output of statistics, also allowing the use of IterationListener

  • Fix issues preventing efficient execution with CUDA

  • Provide access to more of the internal structures with NeuralNet.getNeuralNetworks(), Policy.getNeuralNet(), and convenience constructors for Policy

  • Add MDPs for ALE (Arcade Learning Environment) and MALMO to support Atari games and Minecraft

  • Update MDP for Doom to allow using the latest version of VizDoom

ScalNet

  • Can be built with sbt and maven.

  • Project structure is closely aligned to both DL4J model-import module and Keras.

  • Supports the following layers: Convolution2D, Dense, EmbeddingLayer, AvgPooling2D, MaxPooling2D, GravesLSTM, LSTM, Bidirectional layer wrapper, Flatten, Reshape. Additionally, DL4J OutputLayers are supported.

ND4S

  • Scala 2.12 support

Examples Tour

Brief tour of available examples in DL4J.

Prerequisites

Example Content

Projects are based on what functionality the included examples demonstrate to the user and not necessarily which library in the DL4J stack the functionality lives in.

Examples in a project are in general separated into "quickstart" and "advanced".

Each project README also lists all the examples it contains, with a recommended order to explore them in.

Feedback & Contributions

Release

How to conduct a release to Maven Central

Deeplearning4j has several steps to a release. Below is a brief outline with follow on descriptions.

  1. Compile libnd4j for different cpu architectures

  2. Ensure the current javacpp dependencies such as python, mkldnn, cuda, .. are up to date

  3. Run all integration tests on core platforms (windows, mac, linux) with both cpu and gpu

  4. Create a staging repository for testing using github actions running manually on each platform

  5. Update the examples to be compatible with the latest release

  6. Run the deeplearning4j-examples as a litmus test on all platforms (including embedded)

    to sanity check platform specific numerical bugs using the staging repository

  7. Double check any user related bugs to see if they should block a release

  8. Hit release button

  9. Perform follow up release of -platform projects under same version

  10. Tag release

Compile libnd4j on different cpu architectures

  • Platform compatibility

    We currently compile libnd4j on ubuntu 16.04. This means glibc 2.23.

    For our cuda builds, we use gcc7.

    Users of older glibc versions may need to compile from source. For our standard release, we try to keep the toolchain reasonably old, but we do not support end-of-life linux distributions for public builds.

  • Platform specific helpers

Ensure the current javacpp dependencies such as python, mkldnn, cuda, .. are up to date

Of note here is that certain older versions of libraries can use older javacpp versions. It is recommended that the desired version be up to date if possible. Otherwise, if an older version of javacpp is the only version available, this is generally ok.

Run all integration tests on core platforms (windows, mac, linux) with both cpu and gpu

We run all of the major integration tests on the core major platforms where higher end compute is accessible. This is generally a bigger machine. It is expected that some builds can take up to 2 hours depending on the specs of the desired machine.

Update the examples to be compatible with the latest release

To ensure the examples stay compatible with the current release, we also tag the release version to be the latest version found on maven central. This step may also involve adding or removing examples for new or deprecated features respectively.

Ensure different classifiers work

  1. Different supported cuda versions with and without cudnn

  2. Onednn and associated classifiers per platform

Android

Ensure testing happens on the android emulator.

Run the deeplearning4j-examples as a litmus test on all platforms (including embedded)

Double check any user related bugs to see if they should block a release

Hit release button

Ensure a tag exists

After a release happens, a version update to the stable version plus a github tag needs to happen. This is achieved in the desktop app by going to:

  1. History

  2. Right click on the target commit you want to tag

  3. Click tag

  4. Push the revision

  5. Update the version back to snapshot after tagging

Cudnn

Using the NVIDIA cuDNN library with DL4J.

Using Deeplearning4j with cuDNN

There are two ways of using cuDNN with Deeplearning4j. The older approach, described further below, is built into the various Deeplearning4j layers at the Java level.

The newer approach uses the ND4J CUDA bindings, which link to cuDNN at the C++ level. Both are described below: the newer way first, followed by the old way.

Cudnn setup

The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA:

To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (/usr/local/cuda/lib64/ on Linux, /usr/local/cuda/lib/ on Mac OS X, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ on Windows).

 <dependency>
     <groupId>org.bytedeco</groupId>
     <artifactId>cuda-platform-redist</artifactId>
     <version>$CUDA_VERSION-$CUDNN_VERSION-$JAVACPP_VERSION</version>
 </dependency>

The same versioning scheme for redist applies to the cuda bindings that leverage an installed cuda.

Using cuDNN via nd4j

Similar to our avx bindings, nd4j leverages our c++ library libnd4j for running mathematical operations. In order to use cudnn, all you need to do is change the cuda backend dependency from:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>

to

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

For jetson nano cuda 10.2:

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
</dependency>

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
  <classifier>linux-arm64</classifier>
</dependency>

Note that we are only adding an additional dependency. The reason we use an additional classifier is to pull in an optional dependency on cuDNN-based routines. The default does not use cuDNN, but instead uses built-in standalone routines for operations that are otherwise implemented in cuDNN, such as conv2d and lstm.

For users of the -platform dependencies such as nd4j-cuda-11.2-platform, this classifier is still required. The -platform dependencies try to set sane defaults for each platform, but give users the option to include whatever they want. If you need optimizations, please become familiar with this classifier mechanism.

Using cudnn via deeplearning4j

Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer, SubsamplingLayer, etc), and also LSTM and BatchNormalization layers support CuDNN.

The only thing we need to do to have DL4J load cuDNN is to add a dependency on deeplearning4j-cuda-11.0, or deeplearning4j-cuda-11.2, for example:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.0</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

or

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.2</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

The following snippet shows how to set the cuDNN algorithm mode (see the note about the NO_WORKSPACE mode later in this section), either for the whole network or separately for each layer:
    // for the whole network
    new NeuralNetConfiguration.Builder()
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...
    // or separately for each layer
    new ConvolutionLayer.Builder(h, w)
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...

CPU

CPU and AVX support in ND4J/Deeplearning4j

What is AVX, and why does it matter?

Note that AVX only applies to nd4j-native (CPU) backend for x86 devices, not GPUs and not ARM/PPC devices.

Why AVX matters: performance. You want to use the version of ND4J compiled with the highest level of AVX supported by your system.

AVX support for different CPUs - summary:

  • Most modern x86 CPUs: AVX2 is supported

  • Some high-end server CPUs: AVX512 may be supported

  • Old CPUs (pre 2012) and low power x86 (Atom, Celeron): No AVX support (usually)

Note that CPUs supporting later versions of AVX also support all earlier versions. This means it's possible to run a generic x86 or AVX2 binary on a system supporting AVX512. However, it is not possible to run binaries built for later versions (such as AVX512) on a CPU that does not support those instructions.

In version 1.0.0-beta6 and later you may get a warning as follows, if AVX is not configured optimally:

*********************************** CPU Feature Check Warning ***********************************
Warning: Initializing ND4J with Generic x86 binary on a CPU with AVX/AVX2 support
Using ND4J with AVX/AVX2 will improve performance. See deeplearning4j.org/cpu for more details
Or set environment variable ND4J_IGNORE_AVX=true to suppress this warning
************************************************************************************************

This warning has been removed in more recent versions, as it was confusing to users and out of date.

Configure mkl usage

When using the nd4j-native backend on Intel platforms, our OpenBLAS bindings also give the ability to use MKL instead. In order to use MKL, set the following system property, either on launch or before Nd4j is initialized with Nd4j.create():

 System.setProperty("org.bytedeco.openblas.load", "mkl");
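
When setting the property at launch time instead of in code, the equivalent JVM argument for the same system property is:

 -Dorg.bytedeco.openblas.load=mkl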

Configuring AVX in ND4J/DL4J

As noted earlier, for best performance you should use the version of ND4J that matches your CPU's supported AVX level.

ND4J's default configuration (when just including the nd4j-native or nd4j-native-platform dependencies without a Maven classifier) is "generic x86" (no AVX).

To configure AVX2 and AVX512, you need to specify a classifier for the appropriate architecture.

The following binaries (nd4j-native classifiers) are provided for x86 architectures:

  • Generic x86 (no AVX): linux-x86_64, windows-x86_64, macosx-x86_64

  • AVX2: linux-x86_64-avx2, windows-x86_64-avx2, macosx-x86_64-avx2

  • AVX512: linux-x86_64-avx512

  • Generic x86 (no AVX), with onednn: linux-x86_64-onednn, windows-x86_64-onednn, macosx-x86_64-onednn

  • AVX2, with onednn: linux-x86_64-onednn-avx2, windows-x86_64-onednn-avx2, macosx-x86_64-onednn-avx2

  • AVX512, with onednn: linux-x86_64-onednn-avx512

Example: Configuring AVX2 on Windows (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>windows-x86_64-avx2</classifier>
</dependency>

Example: Configuring AVX512 on Linux (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>linux-x86_64-avx512</classifier>
</dependency>

Example: Configuring AVX512 on Linux with onednn (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>linux-x86_64-onednn-avx512</classifier>
</dependency>

Note that you need both nd4j-native dependencies - with and without the classifier.

In the examples above, it is assumed that a Maven property nd4j.version is set to an appropriate ND4J version such as 1.0.0-M1.1

Workspaces

Workspaces are an efficient model for memory paging in DL4J.

What are workspaces?

ND4J offers an additional memory-management model: workspaces. Workspaces allow you to reuse memory for cyclic workloads without relying on the JVM garbage collector to track off-heap memory. In other words, at the end of the workspace loop, the memory content of all INDArrays allocated within it is invalidated. Workspaces are integrated into DL4J for training and inference.

The basic idea is simple: You can do what you need within a workspace (or spaces), and if you want to get an INDArray out of it (i.e. to move result out of the workspace), you just call INDArray.detach() and you'll get an independent INDArray copy.
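
A minimal sketch of that pattern using the ND4J workspace API (the workspace id "MY_WS" and the array shapes are arbitrary):

import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray result;
try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace("MY_WS")) {
    INDArray x = Nd4j.rand(128, 128);          // allocated inside the workspace
    // detach() copies the result out of the workspace so it remains valid after the loop
    result = x.mmul(x).detach();
}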

Neural Networks

For DL4J users, workspaces provide better performance out of the box, and are enabled by default from 1.0.0-alpha onwards. Thus for most users, no explicit workspace configuration is required.

In earlier versions, workspaces had to be enabled explicitly. You can configure the workspace mode using:

.trainingWorkspaceMode(WorkspaceMode.SEPARATE) and/or .inferenceWorkspaceMode(WorkspaceMode.SINGLE) in your neural network configuration.

The difference between SEPARATE and SINGLE workspaces is a tradeoff between the performance & memory footprint:

  • SEPARATE is slightly slower, but uses less memory.

  • SINGLE is slightly faster, but uses more memory.

That said, it’s fine to use different modes for training & inference (i.e. use SEPARATE for training, and use SINGLE for inference, since inference only involves a feed-forward loop without backpropagation or updaters involved).

With workspaces enabled, all memory used during training will be reusable and tracked without the JVM GC interference. The only exclusion is the output() method that uses workspaces (if enabled) internally for the feed-forward loop. Subsequently, it detaches the resulting INDArray from the workspaces, thus providing you with independent INDArray which will be handled by the JVM GC.

Please note: After the 1.0.0-alpha release, workspaces in DL4J were refactored - SEPARATE/SINGLE modes have been deprecated, and users should use ENABLED instead.

Garbage Collector

If your training process uses workspaces, we recommend that you disable (or reduce the frequency of) periodic GC calls. That can be done like so:
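
// this will limit frequency of gc calls to 5000 milliseconds
Nd4j.getMemoryManager().setAutoGcWindow(5000);

// OR you could totally disable it
Nd4j.getMemoryManager().togglePeriodicGc(false);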

Put that somewhere before your model.fit(...) call.

ParallelWrapper & ParallelInference

For ParallelWrapper, the workspace-mode configuration option was also added. As such, each of the trainer threads will use a separate workspace attached to the designated device.
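
For example, a sketch of such a configuration (the workspace mode in the final call is one of the three options named in the comment):

ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
      // DataSets prefetching options. Buffer size per worker.
      .prefetchBuffer(8)

      // set number of workers equal to number of GPUs.
      .workers(2)

      // rare averaging improves performance but might reduce model accuracy
      .averagingFrequency(5)

      // if set to TRUE, on every averaging model score will be reported
      .reportScoreAfterAveraging(false)

      // 3 options here: NONE, SINGLE, SEPARATE
      .workspaceMode(WorkspaceMode.SEPARATE)
      .build();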

Iterators

We provide asynchronous prefetch iterators, AsyncDataSetIterator and AsyncMultiDataSetIterator, which are usually used internally.

These iterators optionally use a special, cyclic workspace mode to obtain a smaller memory footprint. The size of the workspace, in this case, will be determined by the memory requirements of the first DataSet coming out of the underlying iterator, whereas the buffer size is defined by the user. The workspace will be adjusted if memory requirements change over time (e.g. if you’re using variable-length time series).

Caution: If you’re using a custom iterator or the RecordReader, please make sure you’re not initializing something huge within the first next() call. Do that in your constructor to avoid undesired workspace growth.

Caution: With AsyncDataSetIterator being used, DataSets are supposed to be used before calling the next() DataSet. You are not supposed to store them, in any way, without the detach() call. Otherwise, the memory used for INDArrays within DataSet will be overwritten within AsyncDataSetIterator eventually.
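
A sketch of the safe pattern when a DataSet must be kept beyond the next next() call (trainIterator and storedDataSets are placeholder names):

import org.nd4j.linalg.dataset.DataSet;

DataSet current = trainIterator.next();
current.detach();               // copies the underlying INDArrays out of the async workspace
storedDataSets.add(current);    // now safe to keep a reference to it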

If for some reason you don’t want your iterator to be wrapped into an asynchronous prefetch (e.g. for debugging purposes), special wrappers are provided: AsyncShieldDataSetIterator and AsyncShieldMultiDataSetIterator. Basically, those are just thin wrappers that prevent prefetch.

Evaluation

Usually, evaluation assumes use of the model.output() method, which essentially returns an INDArray detached from the workspace. In the case of regular evaluations during training, it might be better to use the built-in methods for evaluation. For example:
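
A sketch of that approach (model and iteratorTest are assumed to be an initialized network and a test-set iterator):

import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.evaluation.classification.ROCMultiClass;

Evaluation eval = new Evaluation();
ROCMultiClass roc = new ROCMultiClass();

// single pass over the test iterator, updating both evaluations
model.doEvaluation(iteratorTest, eval, roc);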

This piece of code will run a single cycle over iteratorTest, and it will update both (or less/more if required by your needs) IEvaluation implementations without any additional INDArray allocation.

Workspace Destruction

There are also some situations, say, where you're short on RAM, and might want to release all workspaces created out of your control, e.g. during evaluation or training.

That could be done like so: Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();

This method will destroy all workspaces that were created within the calling thread. If you've created workspaces in some external threads on your own, you can use the same method in that thread, after the workspaces are no longer needed.

Workspace Exceptions

If workspaces are used incorrectly (such as a bug in a custom layer or data pipeline, for example), you may see a workspace-related exception indicating that an array has been used outside of the workspace it was defined in.

DL4J's LayerWorkspaceMgr

DL4J's Layer API includes the concept of a "layer workspace manager".

The idea with this class is that it allows us to easily and precisely control the location of a given array, given different possible configurations for the workspaces. For example, the activations out of a layer may be placed in one workspace during inference, and another during training; this is for performance reasons. However, with the LayerWorkspaceMgr design, implementers of layers don't need to worry about this.

What does this mean in practice? Usually it's quite simple...

  • When returning activations (activate(boolean training, LayerWorkspaceMgr workspaceMgr) method), make sure the returned array is defined in ArrayType.ACTIVATIONS (i.e., use LayerWorkspaceMgr.create(ArrayType.ACTIVATIONS, ...) or similar)

  • When returning activation gradients (backpropGradient(INDArray epsilon, LayerWorkspaceMgr workspaceMgr)), similarly return an array defined in ArrayType.ACTIVATION_GRAD

You can also leverage an array defined in any workspace to the appropriate workspace using, for example, LayerWorkspaceMgr.leverageTo(ArrayType.ACTIVATIONS, myArray)

Note that if you are not implementing a custom layer (and instead just want to perform forward pass for a layer outside of a MultiLayerNetwork/ComputationGraph) you can use LayerWorkspaceMgr.noWorkspaces().
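
To make the two rules above concrete for custom layer implementers, a sketch of how an activate(...) implementation might allocate its output (the data type and shape handling are simplified assumptions, and 'input' is assumed to have been set by the framework):

import org.deeplearning4j.nn.workspace.ArrayType;
import org.deeplearning4j.nn.workspace.LayerWorkspaceMgr;
import org.nd4j.linalg.api.ndarray.INDArray;

public INDArray activate(boolean training, LayerWorkspaceMgr workspaceMgr) {
    // allocate the returned activations in the ACTIVATIONS workspace
    INDArray out = workspaceMgr.createUninitialized(ArrayType.ACTIVATIONS, input.dataType(), input.shape());

    // ... compute this layer's forward pass into 'out' ...

    // an array created elsewhere can instead be moved into the correct workspace:
    // out = workspaceMgr.leverageTo(ArrayType.ACTIVATIONS, out);
    return out;
}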

Added Yolo2OutputLayer CNN layer for object detection (). See also DataVec's ObjectDetectionRecordReader

Adds support for 'no bias' layers via hasBias(boolean) config (DenseLayer, EmbeddingLayer, OutputLayer, RnnOutputLayer, CenterLossOutputLayer, ConvolutionLayer, Convolution1DLayer). EmbeddingLayer now defaults to no bias ()

Adds support for dilated convolutions (aka 'atrous' convolutions) - ConvolutionLayer, SubsamplingLayer, and 1D versions thereof. ()

Added Upsampling2D layer, Upsampling1D layer (, )

ElementWiseVertex now (additionally) supports Average and Max modes in addition to Add/Subtract/Product ()

Added SeparableConvolution2D layer ()

Added Deconvolution2D layer (aka transpose convolution, fractionally strided convolution layer) ()

Added ReverseTimeSeriesVertex ()

Added RnnLossLayer - no-parameter version of RnnOutputLayer, or RNN equivalent of LossLayer ()

Added CnnLossLayer - no-parameter CNN output layer for use cases such as segmentation, denoising, etc. ()

Added Bidirectional layer wrapper (converts any uni-directional RNN to a bidirectional RNN) ()

Added SimpleRnn layer (aka "vanilla" RNN layer) ()

Added LastTimeStep wrapper layer (wraps a RNN layer to get last time step, accounting for masking if present) ()

Added MaskLayer utility layer that simply zeros out activations on forward pass when a mask array is present ()

Added alpha-version (not yet stable) SameDiff layer support to DL4J (Note: forward pass, CPU only for now)()

Added SpaceToDepth and SpaceToBatch layers (, )

Added Cropping2D layer ()

Added parameter constraints API (LayerConstraint interface), and MaxNormConstraint, MinMaxNormConstraint, NonNegativeConstraint, UnitNormConstraint implementations ()

Significant refactoring of learning rate schedules ()

Added ISchedule interface; added Exponential, Inverse, Map, Poly, Sigmoid and Step schedule implementations ()

Added dropout API (IDropout - previously dropout was available but not a class); added Dropout, AlphaDropout (for use with self-normalizing NNs), GaussianDropout (multiplicative), GaussianNoise (additive). Added support for custom dropout types ()

Added support for dropout schedules via ISchedule interface ()

Added weight/parameter noise API (IWeightNoise interface); added DropConnect and WeightNoise (additive/multiplicative Gaussian noise) implementations (); dropconnect and dropout can now be used simultaneously

Adds layer configuration alias .units(int) equivalent to .nOut(int) ()

Layer index no longer required for MultiLayerConfiguration ListBuilder (i.e., .list().layer(<layer>) can now be used for configs) ()

Added MultiLayerNetwork.summary(InputType) and ComputationGraph.summary(InputType...) methods (shows layer and activation size information) ()

MultiLayerNetwork, ComputationGraph and layerwise trainable layers now track the number of epochs ()

Added .weightInit(Distribution) convenience/overload (previously: required .weightInit(WeightInit.DISTRIBUTION).dist(Distribution)) ()

WeightInit.NORMAL (for self-normalizing neural networks) ()

Ones, Identity weight initialization ()

Added new distributions (LogNormalDistribution, TruncatedNormalDistribution, OrthogonalDistribution, ConstantDistribution) which can be used for weight initialization ()

RNNs: Added ability to specify weight initialization for recurrent weights separately to "input" weights ()

Added layer alias: Convolution2D (ConvolutionLayer), Pooling1D (Subsampling1DLayer), Pooling2D (SubsamplingLayer) ()

Added Spark IteratorUtils - wraps a RecordReaderMultiDataSetIterator for use in Spark network training ()

CuDNN-supporting layers (ConvolutionLayer, etc) now warn the user if using CUDA without CuDNN ()

Binary cross entropy (LossBinaryXENT) now implements clipping (1e-5 to (1 - 1e-5) by default) to avoid numerical underflow/NaNs ()

SequenceRecordReaderDataSetIterator now supports multi-label regression ()

TransferLearning FineTuneConfiguration now has methods for setting training/inference workspace modes ()

IterationListener iterationDone method now reports both current iteration and epoch count; removed unnecessary invoke/invoked methods ()

Added MultiLayerNetwork.layerSize(int), ComputationGraph.layerSize(int)/layerSize(String) to easily determine size of layers ()

Added MultiLayerNetwork.toComputationGraph() method ()

Added NetworkUtils convenience methods to easily change the learning rate of an already initialized network ()

Added MultiLayerNetwork.save(File)/.load(File) and ComputationGraph.save(File)/.load(File) convenience methods ()

Added CheckpointListener to periodically save a copy of the model during training (every N iter/epochs, every T time units) ()

Added ComputationGraph output method overloads with mask arrays ()

New LossMultiLabel loss function for multi-label classification ()

Darknet19 ()

TinyYOLO ()

Added FileDataSetIterator, FileMultiDataSetIterator for flexibly iterating over directories of saved (Multi)DataSet objects ()

UCISequenceDataSetIterator ()

RecordReaderDataSetIterator now has builder pattern for convenience, improved javadoc ()

Added DataSetIteratorSplitter, MultiDataSetIteratorSplitter (, )

Added additional score functions for early stopping (ROC metrics, full set of Evaluation/Regression metrics, etc) ()

Added additional ROC and ROCMultiClass evaluation overloads for MultiLayerNetwork and ComputationGraph ()

Clarified Evaluation.stats() output to refer to "Predictions" instead of "Examples" (former is more correct for RNNs) ()

EarlyStoppingConfiguration now supports Supplier<ScoreCalculator> for use with non-serializable score calculators ()

Improved ModelSerializer exceptions when trying to load a model via wrong method (i.e., try to load ComputationGraph via restoreMultiLayerNetwork) ()

Added SparkDataValidation utility methods to validate saved DataSet and MultiDataSet on HDFS or local ()

ModelSerializer: added restoreMultiLayerNetworkAndNormalizer and restoreComputationGraphAndNormalizer methods ()

ParallelInference now has output overloads with support for input mask arrays ()

Lombok is no longer included as a transitive dependency ()

ComputationGraph can now have a vertex as the output (not just layers) (, )

Performance improvement for J7FileStatsStorage with large amount of history ()

Fixed UI layer sizes for variational autoencoder layers ()

Fixes to avoid HDF5 library crashes (, )

UI Play servers switch to production (PROD) mode ()

Related to the above: users can now set play.crypto.secret system property to manually set the Play application secret; is randomly generated by default ().

SequenceRecordReaderDataSetIterator would apply preprocessor twice ()

CollectScoresIterationListener could recurse endlessly ()

Async(Multi)DataSetIterator calling reset() on underlying iterator could cause issues in some situations ()

In some cases, L2 regularization could be (incorrectly) applied to frozen layers ()

Logging fixes for NearestNeighboursServer ()

Memory optimization for BaseStatsListener ()

ModelGuesser fix for loading Keras models from streams (previously would fail) ()

Various fixes for workspaces in MultiLayerNetwork and ComputationGraph (, , , , , )

Fix for incorrect condition in DuplicateToTimeSeriesVertex ()

Fix for getMemoryReport exception on some valid ComputationGraph networks ()

RecordReaderDataSetIterator when used with preprocessors could cause an exception under some circumstances ()

CnnToFeedForwardPreProcessor could silently reshape invalid input, as long as the input array length matches the expected length ()

ModelSerializer temporary files would not be deleted if JVM crashes; now are deleted immediately when no longer required ()

RecordReaderMultiDataSetIterator may not add mask arrays under some circumstances, when set to ALIGN_END mode ()

ConvolutionIterationListener previously produced an IndexOutOfBoundsException when all convolution layers are frozen ()

PrecisionRecallCurve.getPointAtRecall could return a point with a correct but sub-optimal precision when multiple points had identical recall ()

Setting dropout(0) on transfer learning FineTuneConfiguration did not remove dropout if present on existing layer ()

Under some rare circumstances, Spark evaluation could lead to a NullPointerException ()

ComputationGraph: disconnected vertices were not always detected in configuration validation ()

Activation layers would not always inherit the global activation function configuration ()

RNN evaluation memory optimization: when TBPTT is configured for training, also use TBPTT-style splitting for evaluation (identical result, less memory) (, )

PerformanceListener is now serializable ()

ScoreIterationListener and PerformanceListener now report model iteration, not "iterations since listener creation" ()

Precision/recall curves cached values in ROC class may not be updated after merging ROC instances ()

ROC merging after evaluating a large number of examples may produce IllegalStateException ()

Added checks for invalid input indices to EmbeddingLayer ()

Fixed possible NPE when loading legacy (pre-0.9.0) model configurations from JSON ()

Fixed issues with EvaluationCalibration HTML export chart rendering ()

Fixed possible incorrect rendering of UI/StatsStorage charts with J7FileStatsStorage when used with Spark training ()

MnistDataSetIterator would not always reliably detect and automatically fix/redownload on corrupted download data ()

MnistDataSetIterator / EmnistDataSetIterator: updated download location after hosting URL change (, )

Fixes to propagation of thread interruptions ()

MultiLayerNetwork/ComputationGraph will no longer throw an ND4JIllegalStateException during initialization if a network contains no parameters (, )

Fixes for TSNE posting of data to UI for visualization ()

PerformanceListener now throws a useful exception (in constructor) on invalid frequency argument, instead of runtime ArithmeticException ()

RecordReader(Multi)DataSetIterator now throws more useful exceptions when Writable values are non-numerical ()

UI: Fixed possible character encoding issues for non-English languages when internationalization data .txt files are read from uber JARs ()

UI: Fixed UI incorrectly trying to parse non-DL4J UI resources when loading I18N data ()

Various threading fixes ()

Evaluation: no-arg methods (f1(), precision(), etc) now return single class value for binary case instead of macro-averaged value; clarify values in stats() method and javadoc ()

Early stopping training: TrainingListener opEpochStart/End (etc) methods were not being called correctly ()

Fixes issue where dropout was not always applied to input of RNN layers ()

ModelSerializer: improved validation/exceptions when reading from invalid/empty/closed streams ()

Fixes for variable size inputs (variable length time series, variable size CNN inputs) when using batch mode ()

Fixed an issue where underlying model exceptions during the output method were not properly propagated back to the user ()

Fixed support for 'pre-batched' inputs (i.e., inputs where minibatch size is > 1) ()

Memory optimization for network weight initialization via in-place random ops ()

Fixes for CuDNN with SAME mode padding (, )

Fix for VariationalAutoencoder builder decoder layer size validation ()

Improved Kmeans throughput

Add RPForest to nearest neighbors

Default training workspace mode has been switched to SEPARATE from NONE for MultiLayerNetwork and ComputationGraph ()

Behaviour change: fit(DataSetIterator) and similar methods no longer perform layerwise pretraining followed by backprop - only backprop is performed in these methods. For pretraining, use pretrain(DataSetIterator) and pretrain(MultiDataSetIterator) methods ()

DataSetIterators in DL4J have been moved from deeplearning4j-nn module to new deeplearning4j-datasets, deeplearning4j-datavec-iterators and deeplearning4j-utility-iterators modules. Packages/imports are unchanged; deeplearning4j-core pulls these in as transitive dependencies hence no user changes should be required in most cases ()

RBM (Restricted Boltzmann Machine) layers have been removed entirely. Consider using VariationalAutoencoder layers as a replacement ()

Previously deprecated WordVectorSerializer methods have now been removed ()

Removed deeplearning4j-ui-remote-iterationlisteners module and obsolete RemoteConvolutionalIterationListener ()

Some issues have been noted with FP16 support on CUDA ()

In 0.9.1 deprecated Model and ModelConfiguration have been permanently removed. Use KerasModelImport instead, which is now the only entry point for Keras model import.

New DifferentialFunction api with automatic differentiation (see samediff section)

Apache Arrow serialization added supporting new tensor API

Add support for AVX/AVX2 and AVX-512 instruction sets for Windows/Linux for nd4j-native backend

Initial tech preview

Alpha release of auto-differentiation engine for ND4J.

Supports rudimentary import of TensorFlow and ONNX graphs for inference.

TFOpTests is a dedicated project for creating test resources for TensorFlow import.

Added ObjectDetectionRecordReader - for use with DL4J's Yolo2OutputLayer () (also supports image transforms: )

Added ImageObjectLabelProvider, VocLabelProvider and SvhnLabelProvider (Streetview house numbers) for use with ObjectDetectionRecordReader (, )

Added LocalTransformExecutor for single machine execution (without Spark dependency) ()

Added ArrowRecordReader (for reading Apache Arrow format data) ()

Added RecordMapper class for conversion between RecordReader and RecordWriter ()

RecordWriter and InputSplit APIs have been improved; more flexible and support for partitioning across all writers (, , )

Added ArrowWritableRecordBatch and NDArrayRecordBatch for efficient batch storage (List<List<Writable>>) (, )

Added BoxImageTransform - an ImageTransform that either crops or pads without changing aspect ratio ()

TransformProcess now has executeToSequence(List<Writable>), executeSequenceToSingle(List<List<Writable>>) and executeToSequenceBatch(List<List<Writable>>) methods (, )

Added CSVVariableSlidingWindowRecordReader ()

ImageRecordReader: supports regression use cases for labels (previously: only classification) ()

ImageRecordReader: supports multi-class and multi-label image classification (via PathMultiLabelGenerator interface) (, )

DataAnalysis/AnalyzeSpark now includes quantiles (via t-digest) ()

Added AndroidNativeImageLoader.asBitmap(), Java2DNativeImageLoader.asBufferedImage() ()

datavec-excel module and ExcelRecordReader ()

JacksonLineRecordReader ()

ConcatenatingRecordReader ()

TextToTermIndexSequenceTransform ()

ConditionalReplaceValueTransformWithDefault ()

GeographicMidpointReduction ()

StringToTimeTransform will now try to guess the time format if a format isn't provided ()

Improved performance for NativeImageLoader on Android ()

Added BytesWritable (Writable for byte[] data) ()

Added TransformProcess.inferCategories methods to auto-infer categories from a RecordReader ()

Lombok is no longer included as a transitive dependency ()

MapFileRecordReader and MapFileSequenceRecordReader can handle empty partitions/splits for multi-part map files ()

CSVRecordReader is now properly serializable using Java serialization () and Kryo serialization ()

Writables: equality semantics have been changed: for example, now DoubleWritable(1.0) is equal to IntWritable(1) ()

NumberedFileInputSplit now supports leading zeros ()

CSVSparkTransformServer and ImageSparkTransformServer Play servers changed to production mode ()

Fix for JSON subtype info for FloatMetaData ()

Serialization fixes for JacksonRecordReader, RegexSequenceRecordReader ()

Added RecordReader.resetSupported() method ()

SVMLightRecordReader now implements nextRecord() method ()

Fix for custom reductions when using conditions ()

SequenceLengthAnalysis is now serializable () and supports to/from JSON ()

Fixes for FFT functionality (, )

Remove use of backported java.util.functions; use ND4J functions API instead ()

Fix for transforms data quality analysis for time columns ()

Many of the util classes (in org.datavec.api.util mainly) have been deprecated or removed; use the equivalently named util classes in the nd4j-common module ()

RecordReader.next(int) method now returns List<List<Writable>> for batches, not List<Writable>. See also NDArrayRecordBatch

Workspace support added (, )

Added new layer spaces: LSTM, CenterLoss, Deconvolution2D, LossLayer, Bidirectional layer wrapper (, )

As per DL4J API changes: Updater configuration options (learning rate, momentum, epsilon, rho etc) have been moved to ParameterSpace instead. Updater spaces (AdamSpace, AdaGradSpace etc) introduced ()

As per DL4J API changes: Dropout configuration is now via ParameterSpace<IDropout>, DropoutSpace introduced ()

RBM layer spaces removed ()

ComputationGraphSpace: added layer/vertex methods with overloads for preprocessors ()

Added support to specify 'fixed' layers using DL4J layers directly (instead of using LayerSpaces, even for layers without hyperparameters) ()

Added LogUniformDistribution ()

Improvements to score functions; added ROC score function ()

Learning rate schedule support added ()

Add math ops for ParameterSpace<Double> and ParameterSpace<Integer> ()

Fix parallel job execution (when using multiple execution threads) (, )

Improved logging for failed task execution ()

Fix for UI JSON serialization ()

Fix threading issues when running on CUDA and multiple execution threads (, , )

Rename saved model file to model.bin ()

Fix threading issues with non thread-safe candidates / parameter spaces ()

Lombok is no longer included as a transitive dependency ()

First release of ScalNet, a Scala API which closely resembles Keras' API.

Supports both Keras-inspired Sequential models, corresponding to DL4J's MultiLayerNetwork, and Model, corresponding to ComputationGraph.

Deeplearning4J has a wealth of examples of how to use its many parts. You can find the examples in the examples repository.

The example repository consists of several separate Maven Java projects, each with their own pom files. Maven is a popular build automation tool for Java projects. The contents of a pom.xml file dictate the configurations; read more about how to configure Maven in the Maven documentation.

Users can also refer to the simple sample project provided to get started with a clean project from scratch.

Build tools are considered standard software engineering best practice. Besides this, the complexities posed by the projects in the DL4J ecosystem make dependencies too difficult to manage manually. All the projects in the DL4J ecosystem can also be used with other build tools like Gradle, SBT, etc. More information on that can be found elsewhere in the documentation.

This project contains a set of examples that demonstrate use of the high level DL4J API to build a variety of neural networks. Some of these examples are end to end, in the sense they start with raw data, process it and then build and train neural networks on it.

This project contains a set of examples that demonstrate how to import Keras h5 models and TensorFlow frozen pb models into the DL4J ecosystem. Once imported into DL4J these models can be treated like any other DL4J model - meaning you can continue to run training on them or modify them with the transfer learning API or simply run inference on them.

This project contains a set of examples that demonstrate how to do distributed training, inference and evaluation in DL4J on Apache Spark. DL4J distributed training employs a "hybrid" asynchronous SGD approach - further details can be found in the distributed deep learning documentation

This project contains a set of examples that demonstrate how to leverage multiple GPUs for data-parallel training of neural networks for increased performance.

This project contains a set of examples that demonstrate the SameDiff API. SameDiff (which is part of the ND4J library) can be used to build lower level auto-differentiating computation graphs. An analogue to the SameDiff API vs the DL4J API is the low level TensorFlow API vs the higher level of abstraction Keras API.

This project contains a set of examples that demonstrate how raw data in various formats can be loaded, split and preprocessed to build serializable (and hence reproducible) ETL pipelines.

This project contains a set of examples that demonstrate how to manipulate NDArrays. The functionality of ND4J demonstrated here can be likened to NumPy.

This project contains a set of examples that demonstrate usage of the Arbiter library for hyperparameter tuning of Deeplearning4J neural networks.

This project contains examples of using RL4J, the reinforcement learning library in DL4J.

This project contains an Android example project, that shows DL4J being used in an Android application.

While this set of examples doesn't cover all the features available in DL4J, the intent is to cover functionality required for most users - beginners and advanced. File an issue if you have feedback or feature requests that are not covered here. We are also available via our community forum for questions. We welcome contributions from the community. We love hearing from you. Cheers!

Compiling libnd4j on different cpu architectures ensures there is platform-optimized math in c++ for each platform. The libnd4j code base is a self-contained cmake project that can be run on different platforms. Each github actions workflow contains steps for deploying for each platform.

At the core of compiling libnd4j from source is a maven pom.xml that is run as part of the overall build process and invokes our build script with various parameters, which then get passed to our overall cmake structure for compilation. This script exists to formalize some of the required parameters for invoking cmake. Any developer is welcome to invoke cmake directly.

Each build of libnd4j links against an accelerated backend for BLAS and convolution operations, such as onednn, cudnn, or armcompute. The implementations for each platform can be found in the libnd4j source tree.

This is a step that just ensures that the dl4j release matches the current state of the dependencies provided by javacpp on maven central. This affects every module including python4j, nd4j-native/cuda, datavec-image, among others. The versions of everything can be found in the top-level deeplearning4j pom. The general convention is the library version followed by a - and the version of javacpp that that library version uses.

This step may also involve invoking tests with specific tags if only running a subset of tests is desired. This can be achieved using the -Dgroups flag.

The examples contain a set of tests which just allow us to run maven clean test on a small number of examples. Instead of picking examples manually, we can just run mvn clean test on any platform we need by specifying a version of dl4j to depend on and, usually, a staging repository.

Generally, sometimes users will raise issues right before a release that can be critical. It is at the sole discretion of the maintainers to ask the user to use snapshots or to wait for a follow-on version. For certain fixes, we will publish quick bugfix releases. If your team has specific requirements on a release, please contact us on the community forums.

This means that, after closing the staging repository, hitting the release button initiates a sync of the staging repository with the desired version to maven central. The sync usually takes 2 hours or less.

Note that there are multiple supported combinations of cuDNN and CUDA. Deeplearning4j's CUDA support is based on JavaCPP's cuda bindings. The way to read the versioning is: cuda version - cudnn version - javacpp version. For example, if the CUDA version is set to 11.2, you can expect us to support cuDNN 8.1.

Alternatively, in the case of the most recent supported CUDA version, cuDNN comes bundled with the "redist" package of the JavaCPP Presets for CUDA. After agreeing to the license, we can add the corresponding "redist" dependencies instead of installing CUDA and cuDNN.

Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try to reduce memory usage by using the NO_WORKSPACE mode settable via the network configuration, instead of the default of ConvolutionLayer.AlgoMode.PREFER_FASTEST, for example:
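A minimal sketch of what that looks like on a convolution layer (the kernel size and layer sizes below are illustrative, not taken from the original document):

import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;

// Trade some speed for lower cuDNN memory use on this convolution layer
ConvolutionLayer layer = new ConvolutionLayer.Builder(5, 5)
        .nIn(1)
        .nOut(20)
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
        .build();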

AVX (Advanced Vector Extensions) is a set of CPU instructions for accelerating numerical computations. See Wikipedia for more details.

As of 1.0.0-M1, additional combinations are also possible with onednn.

// this will limit frequency of gc calls to 5000 milliseconds
Nd4j.getMemoryManager().setAutoGcWindow(5000);

// OR you could totally disable it
Nd4j.getMemoryManager().togglePeriodicGc(false);
ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
      // DataSets prefetching options. Buffer size per worker.
      .prefetchBuffer(8)

      // set number of workers equal to number of GPUs.
      .workers(2)

      // rare averaging improves performance but might reduce model accuracy
      .averagingFrequency(5)

      // if set to TRUE, on every averaging model score will be reported
      .reportScoreAfterAveraging(false)

      // 3 options here: NONE, SINGLE, SEPARATE
      .workspaceMode(WorkspaceMode.SINGLE)

      .build();
Evaluation eval = new Evaluation(outputNum);
ROC roceval = new ROC(outputNum);
model.doEvaluation(iteratorTest, eval, roceval);
org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses leaked workspace pointer from workspace [LOOP_EXTERNAL]
For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic

Noise Layers

KerasGaussianNoise

Keras wrapper for DL4J dropout layer with GaussianNoise.

KerasGaussianNoise

public KerasGaussianNoise(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getGaussianNoiseLayer

public DropoutLayer getGaussianNoiseLayer()

Get DL4J DropoutLayer with Gaussian noise.

  • return DropoutLayer

KerasAlphaDropout

Keras wrapper for DL4J dropout layer with AlphaDropout.

KerasAlphaDropout

public KerasAlphaDropout(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getAlphaDropoutLayer

public DropoutLayer getAlphaDropoutLayer()

Get DL4J DropoutLayer with Alpha dropout.

  • return DropoutLayer

KerasGaussianDropout

Keras wrapper for DL4J dropout layer with GaussianDropout.

KerasGaussianDropout

public KerasGaussianDropout(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getGaussianDropoutLayer

public DropoutLayer getGaussianDropoutLayer()

Get DL4J DropoutLayer with Gaussian dropout.

  • return DropoutLayer

Normalization Layers

KerasBatchNormalization

Imports a BatchNormalization layer from Keras.

KerasBatchNormalization

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getBatchNormalizationLayer

Get DL4J BatchNormalization layer.

  • return BatchNormalization layer

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

  • return number of trainable parameters (4)

setWeights

Set weights for layer.

  • param weights Map from parameter name to INDArray.

Advanced Activations

KerasPReLU

Imports PReLU layer from Keras

KerasPReLU

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getPReLULayer

Get DL4J PReLULayer.

  • return PReLULayer

setWeights

Set weights for layer.

  • param weights Dense layer weights

KerasThresholdedReLU

Imports ThresholdedReLU layer from Keras

KerasThresholdedReLU

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getActivationLayer

Get DL4J ActivationLayer.

  • return ActivationLayer

KerasLeakyReLU

Imports LeakyReLU layer from Keras

KerasLeakyReLU

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getActivationLayer

Get DL4J ActivationLayer.

  • return ActivationLayer

public KerasBatchNormalization(Integer kerasVersion) throws UnsupportedKerasConfigurationException
public BatchNormalization getBatchNormalizationLayer()
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public int getNumParams()
public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException
public KerasPReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public PReLULayer getPReLULayer()
public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException
public KerasThresholdedReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public ActivationLayer getActivationLayer()
public KerasLeakyReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public ActivationLayer getActivationLayer()

Constraints

Supported Keras constraints.

  • max_norm

  • non_neg

  • unit_norm

  • min_max_norm

All Keras constraints are supported. Mapping Keras to DL4J constraints happens in KerasConstraintUtils.

Embedding Layers

KerasEmbedding

Imports an Embedding layer from Keras.

KerasEmbedding

Pass through constructor for unit tests

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getEmbeddingLayer

Get DL4J EmbeddingSequenceLayer.

  • return EmbeddingSequenceLayer

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

  • return number of trainable parameters (1)

setWeights

Set weights for layer.

  • param weights Embedding layer weights

public KerasEmbedding() throws UnsupportedKerasConfigurationException
public EmbeddingSequenceLayer getEmbeddingLayer()
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public int getNumParams()
public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Early Stopping

Terminate a training session given certain conditions.

What is early stopping?

When training neural networks, numerous decisions need to be made regarding the settings (hyperparameters) used, in order to obtain good performance. One such hyperparameter is the number of training epochs: that is, how many full passes of the data set (epochs) should be used? If we use too few epochs, we might underfit (i.e., not learn everything we can from the training data); if we use too many epochs, we might overfit (i.e., fit the 'noise' in the training data, and not the signal).

Early stopping attempts to remove the need to manually set this value. It can also be considered a type of regularization method (like L1/L2 weight decay and dropout) in that it can stop the network from overfitting.

The idea behind early stopping is relatively simple:

  • Split data into training and test sets

  • At the end of each epoch (or, every N epochs):

    • evaluate the network performance on the test set

    • if the network outperforms the previous best model: save a copy of the network at the current epoch

  • Take as our final model the model that has the best test set performance

This is shown graphically below:

The best model is the one saved at the time of the vertical dotted line - i.e., the model with the best accuracy on the test set.

Using DL4J's early stopping functionality requires you to provide a number of configuration options:

  • A score calculator, such as the DataSetLossCalculator for a MultiLayerNetwork or DataSetLossCalculatorCG for a ComputationGraph, used to calculate the score at every epoch (for example: the loss function value on a test set, or the accuracy on the test set)

  • How frequently we want to calculate the score function (default: every epoch)

  • One or more termination conditions, which tell the training process when to stop. There are two classes of termination conditions:

    • Epoch termination conditions: evaluated every N epochs

    • Iteration termination conditions: evaluated once per minibatch

  • A model saver, that defines how models are saved

An example, with an epoch termination condition of maximum of 30 epochs, a maximum of 20 minutes training time, calculating the score every epoch, and saving the intermediate results to disk:

MultiLayerConfiguration myNetworkConfiguration = ...;
DataSetIterator myTrainData = ...;
DataSetIterator myTestData = ...;

EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
        .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
        .scoreCalculator(new DataSetLossCalculator(myTestData, true))
        .evaluateEveryNEpochs(1)
        .modelSaver(new LocalFileModelSaver(directory))
        .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf,myNetworkConfiguration,myTrainData);

//Conduct early stopping training:
EarlyStoppingResult result = trainer.fit();

//Print out the results:
System.out.println("Termination reason: " + result.getTerminationReason());
System.out.println("Termination details: " + result.getTerminationDetails());
System.out.println("Total epochs: " + result.getTotalEpochs());
System.out.println("Best epoch number: " + result.getBestModelEpoch());
System.out.println("Score at best epoch: " + result.getBestModelScore());

//Get the best model:
MultiLayerNetwork bestModel = result.getBestModel();

You can also implement your own iteration and epoch termination conditions.
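For example, a sketch combining several of the built-in conditions (this assumes ScoreImprovementEpochTerminationCondition and InvalidScoreIterationTerminationCondition from the org.deeplearning4j.earlystopping.termination package are available in your version; myTestData and directory are the same placeholders as in the example above):

EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        // stop after at most 100 epochs...
        .epochTerminationConditions(
                new MaxEpochsTerminationCondition(100),
                // ...or earlier, if the score has not improved for 5 consecutive epochs
                new ScoreImprovementEpochTerminationCondition(5))
        // also stop immediately if the score ever becomes NaN or infinite
        .iterationTerminationConditions(new InvalidScoreIterationTerminationCondition())
        .scoreCalculator(new DataSetLossCalculator(myTestData, true))
        .modelSaver(new LocalFileModelSaver(directory))
        .build();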

Early Stopping w/ Parallel Wrapper

The early stopping implementation described above will only work with a single device. However, EarlyStoppingParallelTrainer provides similar functionality as early stopping and allows you to optimize for either multiple CPUs or GPUs. EarlyStoppingParallelTrainer wraps your model in a ParallelWrapper class and performs localized distributed training.

Note that EarlyStoppingParallelTrainer doesn't support all of the functionality of its single-device counterpart. It is not UI-compatible and may not work with complex iteration listeners. This is due to how the model is distributed and copied in the background.

Evaluation

Tools and classes for evaluating neural network performance

Why evaluate?

When training or deploying a neural network, it is useful to know the accuracy of your model. In DL4J, the Evaluation class and its variants are available to evaluate your model's performance.

Evaluation for Classification

The Evaluation class is used to evaluate the performance of binary and multi-class classifiers (including time series classifiers). This section covers basic usage of the Evaluation class.

Given a dataset in the form of a DataSetIterator, the easiest way to perform evaluation is to use the built-in evaluate methods on MultiLayerNetwork and ComputationGraph:

DataSetIterator myTestData = ...
Evaluation eval = model.evaluate(myTestData);

However, evaluation can be performed on individual minibatches also. Here is an example taken from our dataexamples/CSVExample in the Examples project. The CSV example has CSV data for 3 classes of flowers and builds a simple feed forward neural network to classify the flowers based on 4 measurements.

Evaluation eval = new Evaluation(3);
INDArray output = model.output(testData.getFeatures());
eval.eval(testData.getLabels(), output);
log.info(eval.stats());

The first line creates an Evaluation object with 3 classes. The second line gets the model's predictions (the output) for our test dataset. The third line uses the eval method to compare the true labels from the test data with the predictions generated by the model. The fourth line logs the evaluation results to the console.

The output:

Examples labeled as 0 classified by model as 0: 24 times
Examples labeled as 1 classified by model as 1: 11 times
Examples labeled as 1 classified by model as 2: 1 times
Examples labeled as 2 classified by model as 2: 17 times


==========================Scores========================================
 # of classes:    3
 Accuracy:        0.9811
 Precision:       0.9815
 Recall:          0.9722
 F1 Score:        0.9760
Precision, recall & F1: macro-averaged (equally weighted avg. of 3 classes)
========================================================================

By default the .stats() method displays the confusion matrix entries (one per line), Accuracy, Precision, Recall and F1 Score. Additionally the Evaluation Class can also calculate and return the following values:

  • Confusion Matrix

  • False Positive/Negative Rate

  • True Positive/Negative

  • Class Counts

  • F-beta, G-measure, Matthews Correlation Coefficient and more - see the Evaluation JavaDoc
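These can also be queried programmatically once eval.eval(...) has been called; a brief sketch (class index 0 is illustrative):

double accuracy  = eval.accuracy();
double precision = eval.precision();           // macro-averaged across classes
double recall    = eval.recall();              // macro-averaged across classes
double f1        = eval.f1();                  // macro-averaged across classes
double fprClass0 = eval.falsePositiveRate(0);  // false positive rate for class 0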

Display the Confusion Matrix.

System.out.println(eval.confusionToString());

Displays

Predicted:         0      1      2
Actual:
0  0          |      16      0      0
1  1          |       0     19      0
2  2          |       0      0     18

Additionally, the confusion matrix can be accessed directly, and converted to CSV or HTML, using:

eval.getConfusionMatrix() ;
eval.getConfusionMatrix().toHTML();
eval.getConfusionMatrix().toCSV();

Evaluation for Regression

To evaluate a network performing regression, use the RegressionEvaluation class.

As with the Evaluation class, RegressionEvaluation on a DataSetIterator can be performed as follows:

DataSetIterator myTestData = ...
RegressionEvaluation eval = model.evaluateRegression(myTestData);

Here is a code snippet with a single column; in this case the neural network is predicting the age of shellfish based on measurements.

RegressionEvaluation eval =  new RegressionEvaluation(1);

Print the statistics for the Evaluation.

System.out.println(eval.stats());

Returns

Column    MSE            MAE            RMSE           RSE            R^2            
col_0     7.98925e+00    2.00648e+00    2.82653e+00    5.01481e-01    7.25783e-01

Columns are Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, Relative Squared Error, and the R^2 Coefficient of Determination. See the RegressionEvaluation JavaDoc.
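Putting the pieces together for a single output column, a minimal sketch (testData here is assumed to be a DataSet holding the features and true target values):

RegressionEvaluation eval = new RegressionEvaluation(1);
INDArray predictions = model.output(testData.getFeatures());
eval.eval(testData.getLabels(), predictions);    // compare true values with predictions
System.out.println(eval.stats());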

Performing Multiple Evaluations Simultaneously

When performing multiple types of evaluations (for example, Evaluation and ROC on the same network and dataset) it is more efficient to do this in one pass of the dataset, as follows:

DataSetIterator testData = ...
Evaluation eval = new Evaluation();
ROC roc = new ROC();
model.doEvaluation(testData, eval, roc);

For most users, it is simply sufficient to use the MultiLayerNetwork.evaluate(DataSetIterator) or MultiLayerNetwork.evaluateRegression(DataSetIterator) and similar methods. These methods will properly handle masking, if mask arrays are present.

Evaluation of Time Series

Time series evaluation is very similar to the above evaluation approaches. Evaluation in DL4J is performed on all (non-masked) time steps separately - for example, a time series of length 10 will contribute 10 predictions/labels to an Evaluation object. One difference with time series is the (optional) presence of mask arrays, which are used to mark some time steps as missing or not present. See Using RNNs - Masking for more details on masking.

Evaluation for Binary Classifiers

The EvaluationBinary is used for evaluating networks with binary classification outputs - these networks usually have Sigmoid activation functions and XENT loss functions. The typical classification metrics, such as accuracy, precision, recall, F1 score, etc. are calculated for each output. See the EvaluationBinary JavaDoc.

EvaluationBinary eval = new EvaluationBinary(size);    // size = number of binary output columns
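A short usage sketch (the output count and iterator name are illustrative):

// Network with 3 independent sigmoid outputs, evaluated over a test set iterator
EvaluationBinary eval = new EvaluationBinary(3);
model.doEvaluation(myTestData, eval);
System.out.println(eval.stats());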

ROC

ROC (Receiver Operating Characteristic) is another commonly used evaluation metric for the evaluation of classifiers. Three ROC variants exist in DL4J:

  • ROC - for single binary label (as a single column probability, or 2 column 'softmax' probability distribution).

  • ROCBinary - for multiple binary labels

  • ROCMultiClass - for evaluation of non-binary classifiers, using a "one vs. all" approach

These classes have the ability to calculate the area under the ROC curve (AUROC) and the area under the Precision-Recall curve (AUPRC), via the calculateAUC() and calculateAUCPR() methods. Furthermore, the ROC and Precision-Recall curves can be obtained using getRocCurve() and getPrecisionRecallCurve().

The ROC and Precision-Recall curves can be exported to HTML for viewing using: EvaluationTools.exportRocChartsToHtmlFile(ROC, File), which will export a HTML file with both ROC and P-R curves, that can be viewed in a browser.
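A brief sketch of computing AUROC and exporting the charts (the iterator and output file name are illustrative):

ROC roc = new ROC(0);                     // 0 bins: exact AUROC/AUPRC calculation
model.doEvaluation(myTestData, roc);
double auroc = roc.calculateAUC();        // area under the ROC curve
EvaluationTools.exportRocChartsToHtmlFile(roc, new File("roc.html"));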

Note that all three support two modes of operation/calculation

  • Thresholded (approximate AUROC/AUPRC calculation, no memory issues)

  • Exact (exact AUROC/AUPRC calculation, but can require large amount of memory with very large datasets - i.e., datasets with many millions of examples)

The number of bins for thresholded mode can be set using the constructors. Exact mode can be selected using the default constructor, new ROC(), or explicitly using new ROC(0). See also the ROCBinary JavaDoc, which is used to evaluate binary classifiers.

Evaluating Classifier Calibration

Deeplearning4j also has the EvaluationCalibration class, which is designed to analyze the calibration of a classifier. It provides a number of tools for this purpose:

  • Counts of the number of labels and predictions for each class

  • Reliability diagram (or reliability curve)

  • Residual plot (histogram)

  • Histograms of probabilities, including probabilities for each class separately

Evaluation of a classifier using EvaluationCalibration is performed in a similar manner to the other evaluation classes. The various plots/histograms can be exported to HTML for viewing using EvaluationTools.exportevaluationCalibrationToHtmlFile(EvaluationCalibration, File).
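A minimal sketch (the iterator and output file name are illustrative):

EvaluationCalibration ec = new EvaluationCalibration();
model.doEvaluation(myTestData, ec);
System.out.println(ec.stats());
EvaluationTools.exportevaluationCalibrationToHtmlFile(ec, new File("calibration.html"));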

Distributed Evaluation for Spark Networks

SparkDl4jMultiLayer and SparkComputationGraph both have similar methods for evaluation:

Evaluation eval = SparkDl4jMultiLayer.evaluate(JavaRDD<DataSet>);

//Multiple evaluations in one pass:
SparkDl4jMultiLayer.doEvaluation(JavaRDD<DataSet>, IEvaluation...);

Evaluation for Multi-task Networks

A multi-task network is a network that is trained to produce multiple outputs. For example, a network given audio samples can be trained to both predict the language spoken and the gender of the speaker. Multi-task configuration is briefly described here.

Available evaluations useful for multi-task networks:

  • ROCMultiClass - see the ROCMultiClass JavaDoc

  • ROCBinary - see the ROCBinary JavaDoc
