Release Notes

1.0.0-beta6

Highlights - 1.0.0-beta6 Release

  • Added support for CUDA 10.2. 1.0.0-beta6 released with CUDA 9.2, 10.0, 10.1 and 10.2 support

  • SameDiff optimizations - memory use for inference and training significantly reduced, with some performance improvements also

    • Note: No API changes, only artifact ID change: replace deeplearning4j-ui_2.1x with deeplearning4j-ui

    • Note that additional ND4J namespaces API will have additions (new namespaces and methods), and may have some API changes, in the next release

  • OpenMP replaced with a thread pool C++ parallelism framework, enabling C++-level parallelism for operations on platforms without OpenMP support

Deeplearning4J

Deeplearning4J: Features and Enhancements

  • DNNL (MKL-DNN) upgraded to version 1.1

Deeplearning4J: Bug Fixes and Optimizations

Deeplearning4j: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

  • Deeplearning4j UI artifact ID has changed: replace deeplearning4j-ui_2.1x (beta5 and earlier) with deeplearning4j-ui

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

ND4J/SameDiff: Bug Fixes and Optimizations

ND4J: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

DataVec

DataVec: Bug Fixes and Optimizations

RL4J

RL4J: Features and Enhancements

RL4J: Bug Fixes and Optimizations

PyDataVec

PyDataVec Features and Enhancements

PyDataVec Bug Fixes and Optimizations

Deeplearning4j UI - Play framework replaced with Vertx; deeplearning4j-ui dependency now no longer has Scala dependency or Scala version suffix

ND4j namespace operation methods: operations are available through the Nd4j.math, Nd4j.random, Nd4j.bitwise and Nd4j.nn (neural network) namespaces, for example Nd4j.math.abs(INDArray), Nd4j.random.logNormal, etc.

Added causal convolution mode for Convolution1D layer (ConvolutionMode.Causal) and added causal conv1d support for Keras import
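
As a minimal sketch of the new causal mode, the snippet below builds a Convolution1D layer with ConvolutionMode.Causal; the kernel and layer sizes are hypothetical, and the builder method names assume the standard DL4J layer-builder API.

```java
import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.layers.Convolution1DLayer;

public class CausalConv1DSketch {
    public static void main(String[] args) {
        // Hypothetical sizes; the point is the ConvolutionMode.Causal setting.
        Convolution1DLayer causalConv = new Convolution1DLayer.Builder()
                .kernelSize(3)
                .nIn(8)
                .nOut(16)
                .convolutionMode(ConvolutionMode.Causal) // causal padding mode added in 1.0.0-beta6
                .build();
        System.out.println(causalConv);
    }
}
```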

Keras import now supports scaled identity weight initialization

Added Mish activation function

BertIterator now has a BertIterator.featurizeSentences(List<String>) method for inference

BertIterator now supports sentence pairs for supervised training

Added Sparse multi-class cross entropy for both Deeplearning4j and Keras import

Deeplearning4j UI: migrated from Play to Vertx for web serving backend, also removing dependency on Scala libraries; no API changes, only artifact ID change - replace deeplearning4j-ui_2.1x with deeplearning4j-ui

Added TimeDistributed wrapper layer

KDTree implementation optimized

Deeplearning4j zoo models and datasets hosting location updated

Fixed nIn validation for Deconv2D layer

Fixed an issue with incorrect Deconvolution2d results for Keras import models

Added DNNL/MKLDNN support for batch normalization layer

Fixed various integer casts to avoid overflows for very large arrays (with dimensions or length > Integer.MAX_VALUE)

Fixed an issue with UNet non-pretrained model architecture (last layer kernel size)

Deeplearning4j SameDiff layers now use DL4J workspaces for better performance and reduced memory consumption

Updated broken links in a few error messages

Cleaned up a few unused dependencies in various modules

Cleaned up duplicate SamplingDataSetIterator class

Fixed an issue where ComputationGraph instances with a single input going into multiple embedding layers could throw a NPE

Fixed an issue where loss function weights were not automatically cast to network datatype, resulting in an exception if not already correct type

Shaded Jackson version upgraded from 2.9.9/2.9.9.3 to 2.10.1

Fixed an issue with KNN where getMostPopulatedClusters actually returned the least populated clusters

Added support for CUDA 10.2

DNNL (MKL-DNN) upgraded to version 1.1

Added ND4j namespaces to match SameDiff: Nd4j.math, Nd4j.random, Nd4j.bitwise, Nd4j.nn (neural network)

Added SameDiff.calculateGradientsAndOutputs method

Additional SameDiff single batch .output method overloads for DataSet/MultiDataSet added

TensorFlow import ops coverage enhanced (significant number of additional ops supported)

PRelu op added

adjust_contrast, igamma and igammac ops added

ND4J/SameDiff: BitCast, CompareAndBitpack, DivideNoNan, DrawBoundingBoxes, FakeQuantWithMinMaxVarsPerChannel ops added

non_max_suppression_overlaps op added

ImagePreProcessingScaler now supports segmentation use cases

concat operation now supports the concatenation axis being specified via the last input array

Added Gamma and Poisson RNG distributions

SameDiff’s use of DeviceLocal for variables/constants etc is now configurable

Uniform distribution op now supports random integer generation, not just random floating point generation

SameDiff: Added simple OpBenchmarkListener for benchmarking purposes

Added the ability to disable platform helpers (DNNL/MKLDNN etc) via Nd4jCPU.Environment.getInstance().allowHelpers(false); and Nd4jCuda.Environment.getInstance().allowHelpers(false);
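
A minimal sketch of disabling platform helpers on the CPU backend, following the calls quoted above; the exact class/package name is assumed from the JavaCPP-generated CPU backend bindings, and the CUDA backend exposes an analogous Nd4jCuda.Environment.

```java
import org.nd4j.nativeblas.Nd4jCpu;

public class DisableHelpersSketch {
    public static void main(String[] args) {
        // Disable platform helpers (DNNL/MKLDNN etc) on the CPU backend, per the note above.
        Nd4jCpu.Environment.getInstance().allowHelpers(false);
    }
}
```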

Added draw_bounding_boxes operation

Added resize_bicubic operation

Added causal padding mode to conv1d operation

DNNL (MKLDNN) is included and enabled by default for non-AVX builds

Added SameDiff ArraySavingListener for debugging purposes

OpenMP replaced with ThreadPool abstraction, enables parallelism for platforms without OpenMP support

SameDiff memory management overhauled for (in some cases significantly) reduced memory consumption and improved performance

Switched to Clang instead of gcc for OSX compilation to avoid compiler-related issues

Removed SameDiff.outputs() “best guess” output inference due to being unreliable, in favor of explicit SameDiff.setOutputs(String...) call

Fixed an issue with Nd4j.hstack on 1D arrays

SameDiff no longer allows empty arrays for variables

Fixed an issue with Nadam updater LR schedules not being cloned

Cleaned up IActivation interface

Added new LSTM op implementation with DNNL/MKLDNN support (forward pass only so far)

SameDiff API cleaned up; deprecated methods removed

Switched SameDiff variable initialization to non-lazy, to avoid unexpected behaviour when mixing execution and ND4J RNG seed setting

SameDiff.zero and .one methods now create constants, not variables

Moved CUDA build version and device logging to Java logging, from c++ stdout to enable disabling logging (via ND4J config or slf4j config)

Added DNNL/MKLDNN support for batch normalization

SameDiff: Fixed an issue where listeners weren’t being called for gradient calculation

Added DNNL/MKLDNN support for deconv2d/3d operations

Fixed an issue with biasadd_bp operation and NHWC data format

Fixed an issue with certain strided slice backprop configurations

Fixed an issue with LogSumExp reduction operation backprop for the along-dimension case

INDArray.toString() now has correct brackets for rank 1+ scalars to avoid ambiguity

Fixed an issue where some ND4J methods could fail when the library is compiled on Java 9+ but run on Java 8

Fixed empty array input case for is_strictly_increasing, non_decreasing and non_max_suppression ops

Fixed empty input arrays for legacy ops (transform, scalar, pairwise, broadcast)

CUDA compute capability 3.0 is supported again

Improved performance for Scatter operations (1D case) + index validation

Fixed an issue where SameDiff TrainingConfig serialization would fail if evaluation instances are set

SameDiff execution will now throw an exception when assertion operations in the graph fail

PolyGamma function now returns NaNs when passed double for args requiring integer values

Fixed some issues for pad and mirror_pad ops to ensure they conform with Tensorflow for imported networks

Updated and fixed some issues for TensorFlow graph runner

Improved performance for Reverse operation

Removed/cleaned up unused ND4J list functionality

Fixed reduce bool operation results (such as any, all, IsInf, etc) for empty array inputs

SameDiff.outputs() now requires user to call SameDiff.setOutputs(String...) first; previous “best guess” output inference was unreliable
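
A small sketch of the new explicit-outputs workflow; the tiny graph (a placeholder plus a named elementwise add) is hypothetical and only serves to show setOutputs being called before outputs().

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class SetOutputsSketch {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 3);
        SDVariable out = in.add("out", 1.0);  // tiny hypothetical graph: just an elementwise add
        sd.setOutputs("out");                 // explicit output declaration, required before outputs()
        System.out.println(sd.outputs());     // -> [out]
    }
}
```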

SameDiff.zero and .one methods now create constants, not variables

NativeImageLoader now checks for empty input streams and throws an exception instead of crashing

NDArrayScalarOpTransform now supports modulus operator

Added AsyncTrainingListener

Replaced multiple uses of java.util.Random with ND4J Random

Added Observable and LegacyMDPWrapper

Refactored RL4J video recording to separate VideoRecorder class

Fixed an issue with target for DQN

Refactoring for DQN and double DQN for improved maintainability

Internal refactoring and various bug fixes

PyDataVec TransformProcess now supports non-inplace operations

Fixed various issues with PyDataVec

Fixed an issue with data locality that could cause incorrect results under some circumstances when running on CUDA

1.0.0-M2

Highlights

As part of the same work, flatbuffers has been upgraded to 1.12.1. This affects the samediff file format and the user interface. Flatbuffers as a file format is forwards and backwards compatible, but if you have any issues please do let us know. The relevant files have been updated using the flatc compiler.

Removed rl4j: continuing the effort to cut unmaintained modules, the 1.0 release focuses the framework on a few key use cases. This invites other folks to build external modules around a tightly maintained core that focuses on deployment, framework interop and training models in java.

Consolidated tests to platform-tests to allow for easy testing of behavior against different backends.

Adds proper support for Jetson Nano with curated binaries and an updated CUDA 10.2 build

Nd4j/Samediff/Libnd4j

Features and Enhancements

Bug Fixes

  1. Update samediff api to allow dimensions as variables

Deeplearning4j

Features and Enhancements

Bug Fixes

Datavec

Features and Enhancements

Bug Fixes

Omnihub

Launches new Omnihub module. Allows access to models from: https://github.com/KonduitAI/omnihub-zoo

A pretrained omnihub module will provide access to pretrained samediff and dl4j models. This will also supplant the old dl4j zoo.

Python4j

Clean up tests/consolidate tests to platform-tests

Deeplearning4j Suite Overview

Introduction to core Deeplearning4j concepts.

Eclipse DeepLearning4J

Eclipse Deeplearning4j is a suite of tools for running deep learning on the JVM. It's the only framework that allows you to train models from java while interoperating with the python ecosystem through a mix of python execution via our cpython bindings, model import support, and interop with other runtimes such as tensorflow-java and onnxruntime.

The use cases include importing and retraining models (Pytorch, Tensorflow, Keras) and deploying them in JVM microservice environments, mobile devices, IoT, and Apache Spark. It is a great complement to your python environment for running models built in python, deployed to or packaged for other environments.

Deeplearning4j has several submodules including:

  1. Samediff: a tensorflow/pytorch like framework for execution of complex graphs. This framework is lower level, but very flexible. It's also the base api for running onnx and tensorflow graphs.

  2. Nd4j: numpy ++ for java. Contains a mix of numpy operations and tensorflow/pytorch operations.

  3. Libnd4j: A lightweight, standalone c++ library enabling math code to run on different devices, optimizable for running on a wide variety of hardware.

  4. Python4j: A python script execution framework easing deployment of python scripts into production.

  5. Apache Spark Integration: An integration with the Apache Spark framework enabling execution of deep learning pipelines on spark

  6. Datavec: A data transformation library converting raw input data to tensors suitable for running neural networks on.

How to use this website

  1. Multi project contains all cross project documentation such as end to end training and other whole project related documentation. This should be the default entry point for those getting started.

  2. Deeplearning4j contains all of the documentation related to the core deeplearning4j apis such as the multi layer network and the computation graph. Consider this the high level framework for building neural networks. If you would like something lower level like tensorflow or pytorch, consider using samediff

  3. Samediff contains all the documentation related to the samediff submodule of ND4j. Samediff is a lower level api for building neural networks similar to pytorch or tensorflow with built in automatic differentiation.

  4. Datavec contains all the documentation related to our data transformation library datavec.

  5. Python4j contains all the documentation related to our cpython execution framework python4j.

  6. Libnd4j contains all the documentation related to our underlying C++ framework libnd4j.

  7. Apache Spark contains all of the documentation related to our Apache Spark integration.

  8. Concepts/Theory contains all of the documentation related to general mathematical or computer science theory needed to understand various aspects of the framework.

Open Source

JVM/Python/C++

Deeplearning4j can either be a complement to your existing workflows in python and c++ or a standalone library for you to build and deploy models. Use what components you find useful.

1.0.0-M1.1

Highlights

Thanks to feedback from the community following the M1 release, a number of bug fixes allowed us to quickly sort out a few issues. This is a minor bug fix release to address shortcomings found with M1. Most fixes were related to keras import, the cnn/rnn helpers, and python4j.

Added backwards compatibility for centos 6 via a new linux-x86_64-compat classifier enabling use of older glibcs on centos 7.

Known issues

Deeplearning4j

Features and Enhancements

Bug fixes

Nd4j

Features and Enhancements

Bug fixes

Datavec

Features and Enhancements

Bug fixes

Python4j

Features and Enhancements

Bug fixes

Samediff

Features and Enhancements

Bug fixes

1.0.0-beta7

Version 1.0.0-beta7

Deeplearning4j

Features and Enhancements

    • Full inference and training support is available for ops/layers in the tf.keras namespace; inference only for general Tensorflow operations outside of the tf.keras namespace

    • Note also improvements to Keras import for reshape, permute, etc operations due to NHWC and NWC support in DL4J

Bug Fixes and Optimizations

ND4J/SameDiff:

Features and Enhancements

  • Added new Image operations namespace operations:

  • Added new Random operations namespace operations:

  • Added new Math namespace operations:

  • Added new NN namespace operations:

  • Added new CNN namespace operations:

  • Added new linalg operations namespace

  • Added new RNN operation namespace operations:

  • Mapped operations for Tensorflow import:

Bug Fixes and Optimizations

  • Improved performance for bias_add operation

DataVec

Features and Enhancements

Bug Fixes and Optimizations

RL4J

Features and Enhancements

Arbiter

Bug Fixes and Optimizations

1.0.0-M1

Highlights

In light of the coming 1.0, the project has decided to cut a number of modules before the final release. These modules have not had many users in the past and have created confusion for many users just trying to use a few simple apis. Many of these modules have not been maintained.

There will likely be 1 or 2 more milestone releases before the final 1.0. These should be considered checkpoints.

These modules include:

  1. Arbiter

  2. Jumpy

  3. Datavec modules for video, audio, and sound. The computer vision datavec module will continue to be available.

  4. Tokenizers: The tokenizers for chinese, japanese and korean were imported from other frameworks and not really updated.

  5. Scalnet, Nd4s: We removed the scala modules due to the small user base. We welcome 3rd party enhancements to the framework for syntactic sugar such as kotlin and scala. The framework's focus will be on providing

TVM: We now support running TVM modules. Docs coming soon.

We've updated our shaded modules to newer versions to mitigate security risks. These modules include jackson and guava.

Cuda 11: We've upgraded dl4j and associated modules to support cuda 11 and 11.2.

A more modular model import framework supporting tensorflow and onnx:

  1. Model mapping procedures loadable as protobuf

  2. Defining custom rules for import to work around unsupported or custom layers/operations

  3. Op descriptor for all operations in nd4j

This will enable users to override model import behavior to run their own custom models. This means, in most circumstances, there will be no need to modify model import core code anymore. Instead, users will be able to provide definitions and custom rules for their graphs.

Users will be expected to convert their models in an external process. This means running standalone conversions for their models. This extends to keras import as well. Sometimes users convert their models in production directly from keras.

The workflow going forward is to ensure that your model is converted ahead of time to avoid performance issues with converting large models.

Removed ppc from nd4j-native-platform and nd4j-cuda-platform. If you need this architecture, please contact us or build from source.

We've upgraded arrow to 4.0.0 enabling the associated nd4j-arrow and datavec-arrow modules to be used without netty clashes.

Deeplearning4j

Bug fixes

  • Improved keras model import support for NHWC as well as NCHW input formats for both rnn and cnn

Nd4j

Features and Enhancements

Bug fixes

Python4j

Features and Enhancements

Rewritten and more stable python execution. This allows better support for multi-threaded environments.

Bug fixes

1.0.0-beta2

Highlights - 1.0.0-beta2 Release

  • ND4J/Deeplearning4j: Added support for CUDA 9.2. Dropped support for CUDA 9.1. (1.0.0-beta2 release has CUDA 8.0, 9.0 and 9.2 support)

  • Deeplearning4j resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property

  • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

  • ND4J: nd4j-native-platform will now use Intel MKL-DNN as the default/bundled BLAS implementation (replacing OpenBLAS as the previous default)

  • Deeplearning4j: Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs (to assist with debugging and tuning memory configuration).

Deeplearning4J

Deeplearning4J: New Features

Deeplearning4J: Bug Fixes and Optimizations

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

  • GravesLSTM has been deprecated in favor of LSTM due to lack of CuDNN support but otherwise similar accuracy in practice. Use the LSTM class instead.

Deeplearning4J: 1.0.0-beta2 Known Issues

Deeplearning4J: Keras Import

  • Keras model import now imports every Keras application

  • Supports GlobalPooling3D layer import

  • Supports RepeatVector layer import

  • Supports LocallyConnected1D and LocallyConnected2D layers

  • Keras Lambda layers can now be imported by registering custom SameDiff layers

  • All Keras optimizers are now supported

  • All advanced activation functions can now be imported.

  • Many minor bugs have been fixed, including proper weight setting for all configurations of BatchNormalization, improvements to Reshape and SeparableConvolution2D, and full support of Bidirectional layers.

ND4J

ND4J: New Features

  • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

  • SameDiff: A significant number of new ops, and backprop implementations for existing ops

ND4J: Bug Fixes and Optimizations

  • SameDiff: a significant number of bug fixes for execution and individual ops

ND4J: Known Issues

ND4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

  • CUDA 9.1 support has been removed. CUDA 8.0, 9.0 and 9.2 support is available

  • Due to long indexing changes, long/long[] should be used in place of int/int[] in some places (such as INDArray.size(int), INDArray.shape()); see the sketch after this list

  • Unused and not properly tested/maintained utility class BigDecimalMath has been removed. Users should find an alternative library for this functionality, if required.
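
Following the transition note above, a minimal sketch of the long-based accessors: shape() returns long[] and size(int) returns long.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Arrays;

public class LongIndexingSketch {
    public static void main(String[] args) {
        INDArray arr = Nd4j.rand(2, 3);
        long[] shape = arr.shape(); // long[] rather than int[]
        long rows = arr.size(0);    // long rather than int
        System.out.println(Arrays.toString(shape) + ", rows = " + rows);
    }
}
```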

DataVec

DataVec: New Features

DataVec: Optimizations and Bug Fixes

DataVec: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

Arbiter

Arbiter: New Features

Arbiter: Fixes

  • DataProvider has been deprecated. Use DataSource instead.

RL4J

1.0.0-beta3

Highlights - 1.0.0-beta3 Release

  • ND4J/Deeplearning4j: Added support for CUDA 10.0. Dropped support for CUDA 8.0. (1.0.0-beta3 release has CUDA 9.0, 9.2 and 10.0 support)

  • SameDiff now supports training and evaluation from DataSetIterator and MultiDataSetIterator. Evaluation classes have been moved to ND4J.

  • DL4J Spark training (gradient sharing) is now fully fault tolerant, and has improvements for threshold adaption (potentially more robust convergence). Ports can now be easily configured independently on master/workers.

Deeplearning4J

Deeplearning4J: New Features

Deeplearning4J: Bug Fixes and Optimizations

    • Note that learning rates may need to be decreased for some updaters (such as Adam) to account for this change vs. earlier versions. Some other updaters (such as SGD, NoOp, etc) should be unaffected.

    • Note that deserialized (loaded) configurations/networks saved in 1.0.0-beta2 or earlier will default to old behaviour for backward compatibility. All new networks (created in 1.0.0-beta3) will default to the new behaviour.

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

  • IEvaluation classes in DL4J have been deprecated and moved to ND4J so they are available for SameDiff training. Functionality and APIs are unchanged

Deeplearning4J: Known issues: 1.0.0-beta3

Deeplearning4J: Keras Import

ND4J

ND4J: New Features

  • Libnd4j new ops:

ND4J: Bug Fixes and Optimizations

  • Libnd4j native op fixes:

ND4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

  • CUDA 8.0 support has been removed. CUDA 9.0, 9.2 and 10.0 support is available in 1.0.0-beta3

ND4J: Known issues: 1.0.0-beta3

  • Android users may need to manually exclude the (now deprecated) module nd4j-base64. This is due to org.nd4j.serde.base64.Nd4jBase64 class being present in both nd4j-api and nd4j-base64 modules. Both versions have identical content. Use exclude group: 'org.nd4j', module: 'nd4j-base64' to exclude.

DataVec

DataVec: New Features

DataVec: Optimizations and Bug Fixes

Arbiter

Arbiter: Fixes

ND4S

Adds proper support for java 9 modules.

Added new model zoo module called omnihub for dl4j and new samediff models. See more in the new omnihub section.

Migrated the snapshots to sonatype's new repository https://s01.oss.sonatype.org.

Adds Spark 3 support.

Reduce binary size using selective compilation.

Remove scala 11 support. Only supporting scala 2.12.

Extensive enhancements for samediff model training.

Add beginnings of graph optimization framework.

Many onnx model import improvements (add new ops).

Add new op subset frameworks: allows selective inclusion of operations to enable users to reduce binary size.

Update onednn to 2.2.

Add updated jetson nano support.

Enhance codegen exposing more functions for samediff.

Add new samediff eager mode (mainly used for model import use cases).

Add dimensions as input variables.

Fix cuda shuffle.

Fix up conditions/matching.

ImageResize updates to improve compatibility with onnx.

Rewrite compat sparse to dense op.

Fix creation of string scalar ndarrays.

Fix serialization with conv/pooling3d.

Add Spark 3 support.

Added Deconvolution3D for keras import

Add full channels last support for 3d convolutions.

Fix confusion matrix count increments.

Fix Conv3D data format serialization.

Add LabelsSource to BagOfWordsVectorizer (thanks to XAI!).

Performance enhancement for mnist related datasetiterators.

Fix memory leak in datavec-arrow.

Modules will be made available from a Pretrained class.

This website has several sections of documentation following a common layout. Below is an overview of the sections of the site:

The libraries are completely open-source, Apache 2.0, under open governance at the Eclipse Foundation. The Eclipse Deeplearning4j project welcomes all contributions.

Snapshots will also be published every 2 days automatically now to get around sonatype ossrh deletion of snapshots every 3 days. This should increase robustness of the snapshots.

Worked around an issue with github actions preemptively upgrading visual studio and breaking the cuda builds.

A number of bugs were fixed with LSTM and CUDNN.

Avoid shuffle operations on gpu; pre-save data on cpu in mini batches. For more help, please post on the forums at https://community.konduit.ai/

Add batch normalization support for RNNs.

Disable old helpers by default

Minor unit test fixes.

Add keras support for cnn 1d NWC.

Moved the version check warning to trace-level logging so it stops confusing users during normal usage.

Allow 1d convolutions to accept feed forward as an input type.

Remove the old benchmark suite and migrate it to contrib.

Remove old MKLDNNLSTM helper (it never fully functioned anyway).

Fixed an issue with helper reflection, ensuring the classes would be loaded properly

Fix minor workspace activation bug.

Fixed compilation error when running on anything newer than jdk 8 with NIO buffers.

Move logback to be a test dependency for some modules.

Keras model import fixes for GlobalPooling.

Added the Eigen op as public, ensuring easier use when running eigenvalue decomposition

Fixes minor issue with choice(..) op thanks to

Minor applyScalar typo fix.

Fixed serialization bug with StringToTimeTransform, thanks to community member

Made python4j's python path setting more robust by migrating from set path calls to add path calls.

Fixes a bug where numpy import array could crash the jvm.

Fixed inconsistent conventions between SameDiffVariable getArr and getArrForName().

Read the announcement at for the highlights of this release.

Added Keras model import support for tf.keras models
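
A minimal sketch of importing a saved Keras/tf.keras model, assuming the KerasModelImport entry point from the Keras model import module; the file path is a placeholder, and importKerasSequentialModelAndWeights would be used instead for Sequential models.

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

public class KerasImportSketch {
    public static void main(String[] args) throws Exception {
        // "model.h5" is a placeholder path to a saved tf.keras (functional API) model.
        ComputationGraph model = KerasModelImport.importKerasModelAndWeights("model.h5");
        System.out.println(model.summary());
    }
}
```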

DL4J now supports NHWC (channels last) data format for all CNN 2D layers, in addition to NCHW

DL4J now supports NWC (channels last - [minibatch, sequence_length, size]) for all RNN and CNN 1D layers, in addition to NCW

Added Deconvolution3D layer

Keras import: added ReLU, ELU and Softmax advanced activation layers and Swish activation function

Added DL4J SameDiffLoss class (for easily-defined DL4J ILossFunction's via SameDiff)

Useful exceptions are now thrown when attempting to perform unsupported operations on FastText

Added MultiLayerNetwork.evaluate(MultiDataSetIterator) and .evaluateRegression(MultiDataSetIterator) methods

Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff

Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules)

Note: this is a breaking change for some class packages/imports. See for details on exact package changes

Deeplearning4j UI: Webjars versions locked down using dependency management to avoid check on each build

Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff

Refactored/merged modules dl4j-perf and dl4j-util into deeplearning4j-core

Fixed an issue with BertWordPieceTokenizer - potential StackOverflowError with certain inputs

Fixed an issue with GlobalPooling layer with masks of different datatype to the activations datatype

Fixed an issue with DL4JModelValidator for ComputationGraph

Fixed an issue where SameDiff layers in DL4J could throw an exception when used with transfer learning

Weight initialization for EmbeddingLayer and EmbeddingSequenceLayer now no longer depend on the vocabulary size (only the vector size)

Fixed an issue with Keras import with bidirectional layers + preprocessors

DL4J UI: added redirect from /train to /train/overview

Fixed an issue where RecordReaderDataSetIterator builder collectMetaData configuration was not being applied

Fixed an issue where MultiLayerNetwork evaluation was not passing metadata to the IEvaluation instances during evaluation

Fixed an issue with Spark training SharedTrainingMaster when training with a ComputationGraph and MultiDataSets

Assorted fixes for edge cases for DL4J Keras import

deeplearning4j-nlp-korean will no longer be released for Scala 2.12 due to the required dependency only having a Scala 2.11 version available

Fix for ConvolutionalIterationListener for ComputationGraph

Fixed an issue where dataset and model zoo downloads could get stuck if the server fails to send any data (now: timeout + retry)

DL4J ModelSerializer no longer writes temporary files when restoring models from InputStream

Fixes issues with UIServer multi session mode, and potential shutdown race condition

Fixed an issue where TfidfVectorizer.vectorize() could throw a NPE when fit from LabelAwareIterator

SameDiff multi-threaded inference enhanced (and fixed) - a single SameDiff instance can now be used for inference safely and efficiently from multiple threads

cuDNN support added to SameDiff (automatically enabled for nd4j-cuda-10.x backend)

Added ND4J namespaces: Nd4j.cnn, Nd4j.rnn, Nd4j.image

rgbToHsv, hsvToRgb

rgbToYiq, yiqToRgb, rgbToYuv, yuvToRgb

imageResize

gamma, poisson, shuffle

clipByAvgNorm, embeddingLookup

mergeMaxIndex

cReLU

upsampling3d

triangular_solve

tri operation

triu operation

lstmLayer (note old lstmLayer method renamed to lstmBlock)

gru

Added new Loss operations namespace - Nd4j.loss

HSVToRGB, RGBToHSV, Igamma, Igammac, RandomGamma, RandomPoisson, RandomPoissonV2, RandomShuffle

Added SameDiff ProfilingListener - writes op performance profiles in Chrome profiler format (load in chrome://tracing/)

Added SameDiff ProfileAnalyzer tool to compare profiles output from ProfilingListener (or Tensorflow)

SameDiff listener API: added frame and iteration information for listener methods

Added (non-backend-specific) method of accessing Nd4j environment: Nd4j.getEnvironment() method (environment info and low-level configuration options)

Improved memory limits/configuration support for libnd4j (c++)

Added pairwise (broadcastable) power backprop operation

Updated JavaCPP presets MKL version to 2020.0 from 2019.5

Added DynamicCustomOp dargs - datatype arguments

Output datatype configuration for Range op, SequenceOp, ConfusionMatrix

Added tensormmul_bp op

OpenBLAS version upgraded to 0.3.8

libnd4j (c++ codebase underlying DL4J, ND4J and SameDiff) refactored to be more easily embeddable in other C++ projects

ImagePreProcessingScaler now supports preprocessing of labels (for segmentation)

Additional datatypes now supported for nd4j-tensorflow TensorflowConversion

SameDiff operation namespaces (sd.math, sd.image, etc) are now code generated to ensure SameDiff and ND4J namespaces are identical (all operations included, same API)

Added ND4J ArchiveUtils.unzipFileTo(String, String, boolean logFiles) overload to enable/disable extracted file path logging

Added weight format configuration for following operations: conv1D, conv2D, conv3D, deconv2d, deconv3d, depthwiseConv2d, pointwiseConv2d, sconv2d

Added backprop operation implementations for mergemax, mergeadd, mergeavg operations

MKL version upgraded from 2020.0 to 2020.1; OpenCV upgraded from 4.2.0 to 4.3.0

SameDiff: DifferentialFunctionFactory class removed in favor of namespace methods (sd.math, sd.linalg, etc)

Added lstmLayer_bp operation

Added gru_bp operation

linspace operation can now use both targs and arrays for start/end/size arguments

Assorted dependency updates - OpenBLAS (0.3.9), OpenCV (4.3.0), Leptonica (1.79.0)

Upgraded assorted dependency versions: javax.activation:activation (1.1 -> 1.1.1), stream analytics (2.7.0->2.9.8), Apache Spark (2.4.3->2.4.5), Jackson databind (2.10.1 -> 2.10.3), Vertx (3.8.3 -> 3.9.0)

Added nd4j-common-tests ResourceUtils.listClassPathFiles method

Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff

SameDiff - added CuDNN support

Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules)

Note: this is a breaking change for some class packages/imports. See for details on exact package changes

Fixed some issues with Tensorflow import of FusedBatchNorm operation

Fixed an issue where the Roll operation did not match Tensorflow operation

Fixed an issue where ArchiveUtils could fail to create the top level destination directory when it does not exist

Fixed an issue where resize_bicubic operation did not match Tensorflow for some configuration values

Pad operation now supports long/int64 values for padding array

Fixed an issue where hashcode operation shape function wasn't always returning int64/long dtype

Fixed an issue with reshape operation on empty arrays with -1s

Improved performance for the concat operation on CUDA and CPU/GPU:

  • On CPU for the NHWC case

  • Generally

  • On CUDA for the 2D case

Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff

Fixed a small SameDiff execution issue for switch operation where the predicate is a constant

Fixed an issue with batchnorm operation when input arrays have unusual strides

Merged nd4j-buffer, nd4j-content modules into nd4j-api

Deleted deprecated nd4j-jackson module (remaining functionality available in nd4j-api)

Deleted unused/unmaintained nd4j-camel and nd4j-gson modules

Optimization for legacy random ops

Optimization for broadcast operations

Performance optimization for multiple operations: softmax, squeeze, expand_dims, tanh

Optimization for transpose/permute operations

Performance enhancement: MKLDNN matmul used for some mmul operation cases

Optimization for gather operation on CPU

Optimization for stack/unstack operations on CPU

Optimization for split operation (CPU and CUDA)

ND4J initialization no longer logs number of OpenMP BLAS threads for CUDA

Optimization: Fixed issues with auto-vectorization on multiple CPU operations

Optimization for reshape operation

Fixed an issue where INDArray.hashCode() could cause an exception on some datatypes

Optimization for CPU: MKLDNN is now used for softmax, tanh, softmax_bp and tanh_bp operations

Fixed random_exponential operation

Improved performance on C++ SameDiff graph execution via reduced array zeroing where safe to do so

Improved C++ indexing implementation impacting CPU performance on some operations

Fixed an issue where Split operation could have incorrect output shapes for empty arrays

Fixed some issues with SameDiff.equals method

Fixed an issue with reshape operation output shape on empty arrays

Nd4j.gemm now uses Mmul operation internally to avoid potential threading issues with direct BLAS calls on CUDA

Fixed an edge case issue with percentile operation

Fixed an edge case issue for cusolver (CUDA) in libnd4j

Fixed an issue with error formatting for segment operations for incorrect lengths

Fixed an issue where ND4J workspaces were not guaranteed to be unique

Fixed some operation implementations when operating on views (Batch/Space to Space/Batch/Depth; batchnorm_bp)

Fixed an issue where exponential distribution random number generation operation could produce infinities extremely rarely (~1 in 10^9 values)

Fixed an issue with long file paths for memory mapped workspaces on Windows

Memory for memory mapped workspaces are now deallocated immediately when workspace is destroyed, instead of waiting for GC to free memory

Fall-back to other BLAS implementation for cases where MKLDNN GEMM implementation is slow

Set nd4j-native source/target to Java 7

datavec-python: added zero-copy support for bytes/byte buffers

datavec-python: Python exceptions are now thrown as Java exceptions

datavec-python: Added support for additional NumPy datatypes

datavec-python: Python version upgraded from 3.7.6 to 3.7.7

Deleted not properly maintained modules: datavec-camel, datavec-perf

Fixed missing BOOL datatype support for arrow conversion functionality

Assorted fixes for datavec-python

Fixed an issue with LineRecordReader where initialization was performed unnecessarily (adding performance overhead)

Refactoring to decouple configuration and learning methods from their implementations

Added builder patterns for all configuration classes

Fixes an issue with GridSearchCandidateGenerator not working correctly for some cases

the underlying technology rather than the de facto interfaces. If there is interest in something higher level, please discuss it on

ARM support: We have included armcompute modules for core convolution routines. These routines can be found

Added more support for avx/mkldnn/cudnn linked acceleration in our c++ library. We now have the ability to distribute more combinations of precompiled math kernels via different combinations of classifiers. See the for more details.

This is useful for OSGI and application server environments.

We now have basic support for CTC loss in nd4j. This will enable the import of CTC loss based models for speech recognition as well as OCR.

Contributors:

Deeplearning4j: New SameDiff layers with training support

Deeplearning4j - new layers: Locally connected 1d, Locally connected 2d

Added new SameDiff layers (automatic differentiation - only single class, forward pass definition required) to DL4J with full training support - SameDiffLayer, SameDiffVertex, SameDiffOutputLayer, SameDiffLambdaLayer, SameDiffLambdaVertex - note that these are CPU-only execution for now

Resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property. Note that it is also possible to set a different base location for downloads (for local mirrors of DL4J resources)
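
A minimal sketch of the two configuration options mentioned above; the directory path is illustrative, and the DL4JResources import path is assumed from the deeplearning4j common module.

```java
import org.deeplearning4j.common.resources.DL4JResources;

import java.io.File;

public class ResourceDirectorySketch {
    public static void main(String[] args) {
        // Option 1: system property, set before any DL4J resource download (path is illustrative).
        System.setProperty("org.deeplearning4j.resources.directory", "/data/dl4j-resources");
        // Option 2: programmatic configuration.
        DL4JResources.setBaseDirectory(new File("/data/dl4j-resources"));
    }
}
```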

Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs. Same information is available (without a crash) for MultiLayerNetwork/ComputationGraph.memoryInfo methods. Can be disabled, or the output directory set.

Added Composite[Multi]DataSetPreProcessor to enable multiple [Multi]DataSetPreProcessors to be applied in a single iterator

Added ComputationGraph evaluate methods for multi-output networks: evaluate(DataSetIterator, Map<Integer,IEvaluation[]>) and evaluate(MultiDataSetIterator, Map<Integer,IEvaluation[]>)

Added JointMultiDataSetIterator - utility iterator used to create MultiDataSetIterator from multiple DataSetIterators

GraphVertices may now have trainable parameters directly (not just enclose layers with trainable parameters)

Added MultiLayerNetwork/ComputationGraph getLearningRate methods

Added RandomDataSetIterator and RandomMultiDataSetIterator (mainly for testing/debugging)

Added cyclical "1cycle" schedule for learning rate schedules etc

RDD repartitioning for Spark training is more configurable (adds Repartitioner interface)

Added ComputationGraph.getIterationCount() and .getEpochCount() for consistency with MultiLayerNetwork

Added locally connected 1d layer

Spark "data loader" API (mainly for Spark)

Spark evaluation: added evaluation method overloads that allow specifying the number of evaluation workers (less than number of Spark threads)

CnnSentenceDataSetIterator now has a Format argument, and supports outputting data for RNNs and 1D CNNs

Added ComputationGraph/MultiLayerNetwork.pretrain((Multi)DataSetIterator, int epochs) method overloads

MultiLayerNetwork and ComputationGraph now have output method overloads where the network output can be placed in the user-specified workspace, instead of being detached. This can be used to avoid creating INDArrays that need to be garbage collected before native memory can be freed.

EmbeddingSequenceLayer now supports [minibatch,1,seqLength] format sequence data in addition to [minibatch,seqLength] format data

CuDNN batch norm implementation will now be used for rank 2 input, not just rank 4 input

Environment variables and system properties for DL4J have been centralized into DL4JResources and DL4JEnvironmentVars classes, with proper descriptions

MultiLayerNetwork and ComputationGraph output/feedForward/fit methods are now thread-safe via synchronization. Note that concurrent use is not recommended due to performance (instead: use ParallelInference); however the now-synchronized methods should avoid obscure errors due to concurrent modifications

BarnesHutTSNE now throws a useful exception in the case where the distance metric is undefined (for example, all zeros plus cosine similarity)

ComputationGraph.addListeners was not working correctly if listeners were already present

TinyImageNetDataSetIterator did not validate/correctly use input shape configuration

BatchNormalization layer now correctly asserts that nOut is set if required (instead of unfriendly shape errors later)

Fixed issue where OutputLayer may not initialize parameter constraints correctly

Fixed performance issue with Nesterov updater using CPU-only op for CUDA execution

Removed TerminationCondition for DL4J optimizers - was not used in practice, and had minor overhead

Fixed issue where EvaluativeListener could hit a workspace validation exception when workspaces are enabled

Fixed issue where TrainingListener.onEpochStart/onEpochEnd were not being called correctly for ComputationGraph

Fixed workspace issue with TensorFlowCnnToFeedForwardPreProcessor

Performance optimization for BatchNormalization when using CuDNN

Performance optimization: Dropout will be applied in-place when safe to do so, avoiding a copy

Added CuDNN implementation of Dropout

Reduced memory use for CuDNN: CuDNN working memory is now shared and reused between layers within a network

CuDNN batch normalization implementation would fail with FP16 datatype

Fixed issue Bidirectional LSTM may incorrectly use workspaces causing an exception

Fixed issue with early stopping where scores to be maximized (accuracy, f1, etc) were not properly triggering termination conditions

Fixed issue where label mask counter could be incorrectly incremented in ComputationGraph.computeGradientAndScore()

ComputationGraph was not setting lastEtlTime field during training

Fixed issue with AutoEncoder layer when workspaces are enabled

Fixed issue with EmbeddingSequenceLayer use of mask arrays

Lombok is now provided scope everywhere, so it isn't on the user classpath when using DL4J

Fixed issue with WordVectorSerializer.readParagraphVectors(File) initialization of the label source

Spark training (gradient sharing) now properly handles empty partition edge case when encountered during training

Errors are propagated better/more consistently for Spark gradient sharing training

Fixed issue with 1D CNN layers with mask arrays and stride > 1 (masks not being correctly downsized)

DL4J Batch norm implementation was not correctly adding epsilon value during inference, only during training (CuDNN unaffected)

CuDNN subsampling layers with max pooling and ConvolutionMode.SAME may have taken padding value (0) as the maximum for border values when all non-padding values are less than 0

Spark training with gradient sharing now passes listeners to workers correctly

Fixed rare (and non-terminal) concurrent modification issue with UI and FileStatsStorage

CuDNN convolution layer now supports dilation > 2 (previously: used DL4J conv layer implementation as a fallback)

Yolo2OutputLayer now implements computeScoreForExamples()

SequenceRecordReaderDataSetIterator now handles the "no labels" case correctly

Fixed issue where BarnesHutTSNE could hit a workspace validation exception

EMNIST iterator could produce incorrect data in some cases after a reset

deeplearning4j-modelexport-solr: now uses Lucene/Solr version 7.4.0 (was 7.3.0)

Mask arrays for CNN2d layers must be in broadcastable 4d format: [minibatch,depth or 1, height or 1, width or 1] - previously they were 2d with shape [minibatch,height] or [minibatch,width]. This prevents ambiguity in later cases (pooling layers), and allows for more complex masking scenarios (such as masking for different image sizes in same minibatch).

Some older/deprecated Model and Layer methods have been removed. (validateInput(), initParams()). Some custom layers may need to be updated as a result

Windows users are unable to load the HDF5 files used in SvhnLabelProvider (used in HouseNumberDetection example). Linux/Mac users are unaffected. A workaround for windows users is to add the sonatype snapshot dependency org.bytedeco.javacpp-presets:hdf5-platform:jar:1.10.2-1.4.3-SNAPSHOT

Added the ability to write Numpy .npy format using Nd4j.writeAsNumpy(INDArray,File) and convert an INDArray to a numpy array in-memory using Nd4j.convertToNumpy(INDArray)
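
A minimal sketch of writing an array in .npy format using the method named above; the file name is illustrative.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;

public class NumpyExportSketch {
    public static void main(String[] args) throws Exception {
        INDArray arr = Nd4j.rand(3, 4);
        // Writes a .npy file that numpy.load can read; the file name is illustrative.
        Nd4j.writeAsNumpy(arr, new File("arr.npy"));
    }
}
```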

ND4j-common ClassPathResource: added ClassPathResource.copyDirectory(File)

Added Nd4j.randomBernoulli/Binomial/Exponential convenience methods

Added way to disable/suppress ND4J initialization logging via org.nd4j.log.initialization system property

SameDiff class - most op/constructor methods now have complete/useful javadoc

Workspaces can now be disabled globally, ignoring workspace configuration. This is mainly used for debugging; use Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED) or Nd4j.getWorkspaceManager().setDebugMode(DebugMode.SPILL_EVERYTHING) to enable this.
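
A minimal sketch of the global workspace debug switch quoted above; the DebugMode import path is assumed.

```java
import org.nd4j.linalg.api.memory.enums.DebugMode;
import org.nd4j.linalg.factory.Nd4j;

public class WorkspaceDebugSketch {
    public static void main(String[] args) {
        // Globally disable workspaces for debugging, as described above.
        Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED);
        // Alternatively: Nd4j.getWorkspaceManager().setDebugMode(DebugMode.SPILL_EVERYTHING);
    }
}
```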

Added EnvironmentalAction API for environment variable processing

ND4J environment variables and system properties have been centralized in ND4jEnvironmentVars and ND4jSystemProperties classes and

Fixed issue with INDArray.toDoubleArray() for true scalars (rank 0 arrays)

Fixed issue with DataSet.sample() not working for rank 3+ features

IActivation implementations now validate/enforce same shape for activations and gradients

Fixed issue with muliColumnVector where vector is 1d

ImagePreProcessingScaler now supports serialization via NormalizerSerializerStrategy and ModelSerializer

Performance optimization for threshold encoding used in DL4J's Spark gradient sharing distributed training implementation

SameDiff: Fixed issue where memory wasn't always released after execution

DataSet.save() and MultiDataSet.save() methods now save example metadata when present

Fixed issue with KFoldIterator when dataset does not divide equally into folds with no remainder

Fixed issue where version check functionality could fail to load resources if resources are on a path with spaces

Simplified DataSetIterator API: totalExamples(), cursor() and numExamples() - these were unsupported on most DataSetIterator implementations, and not used in practice for training. Custom iterators should remove these methods also

Long-deprecated DataSet.getFeatureMatrix() has been removed. Use DataSet.getFeatures() instead.
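
A minimal sketch of the replacement call; the random features and labels are purely illustrative.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

public class GetFeaturesSketch {
    public static void main(String[] args) {
        // Random 4x3 features and 4x2 labels, purely illustrative.
        DataSet ds = new DataSet(Nd4j.rand(4, 3), Nd4j.rand(4, 2));
        INDArray features = ds.getFeatures(); // replaces the removed getFeatureMatrix()
        System.out.println(java.util.Arrays.toString(features.shape()));
    }
}
```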

Not properly maintained complex number support classes (IComplexNumber, IComplexNDArray) have been removed entirely

Added AnalyzeLocal class to mirror functionality of AnalyzeSpark (but without Spark dependency)

Added JacksonLineSequenceRecordReader: RecordReader used for multi-example JSON/XML where each line in a file is an independent example

Added RecordConvert.toRecord(Schema, List<Object>)

Added missing FloatColumnCondition

Added CSVLineSequenceRecordReader for "each line in CSV is a sequence, and sequence is single-valued/univariate"

Added CSVMultiSequenceRecordReader for "multiple multi-valued sequences in a single CSV" data

Fixed issue with NativeImageLoader on Android

Fixed issue with ExcelRecordReader

Fixed issue where bad args for CSVRecordReader.next(int) could cause an unnecessarily large list to be generated

Added DataSource interface. Unlike old DataProvider, this does not require JSON serializability (only a no-arg constructor)

Added numerous enhancements and missing configuration options (constraints, dilation, etc)

stepCounter, epochCounter and historyProcessor can now be set

Random seed is now loaded when ACPolicy is loaded

Added OutputAdapter interface and MultiLayerNetwork/ComputationGraph.output method overloads using OutputAdapter (avoids allocating off-heap memory that needs to be cleaned up by GC)

Added ComputationGraph/MultiLayerNetwork rnnTimeStep overload with user-specified workspace.

Added Cnn3DLossLayer

ParallelInference: Instances can now update the model in real-time (without re-init)

ParallelInference: Added ParallelInference INPLACE mode

Added validation for incompatible loss/activation function combinations (such as softmax+nOut=1, or sigmoid+mcxent). New validation can be disabled using outputValidation(false)

Spark training: Added full fault tolerance (robust failure recovery) for gradient sharing implementation

Spark training now supports configuring ports more flexibly (and differently for different workers) using PortSupplier

Spark training: overhauled gradient sharing threshold adaption algorithms; made it possible to customize threshold settings, plus made defaults more robust to initial threshold configuration improving convergence speed in some cases.

Spark training: implemented chunked messaging to reduce memory requirements (and insufficient buffer length issues) for large messages

Spark training: Added MeshBuildMode configuration for improved scalability for large clusters

Spark network data pipelines: added FileBatch, FileBatchRecordReader etc for "small files" (images etc) distributed training use cases

Added FailureTestingListener for fault tolerance/debugging purposes

Upgraded Apache Lucene/Solr to version 7.5.0 (from 7.4.0)

Added system properties (org.deeplearning4j.tempdir and org.nd4j.tempdir) to allow overriding of the temporary directories ND4J and DL4J use
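
A minimal sketch of overriding the temporary directories via the system properties named above; the paths are illustrative and should be set before ND4J/DL4J classes are first used.

```java
public class TempDirSketch {
    public static void main(String[] args) {
        // Illustrative paths; set these before ND4J/DL4J classes are first used.
        System.setProperty("org.nd4j.tempdir", "/mnt/scratch/nd4j-tmp");
        System.setProperty("org.deeplearning4j.tempdir", "/mnt/scratch/dl4j-tmp");
    }
}
```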

Made MultiLayerNetwork/ComputationGraph.clearLayerStates methods public (were protected)

AbstractLayer.layerConf() method is now public

ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper

Improved validation and error messages for invalid inputs/labels in Yolo2OutputLayer

Spark training: added SharedTrainingMaster.Builder.workerTogglePeriodicGC and .workerPeriodicGCFrequency to easily configure the ND4J garbage collection configuration on workers. Set default GC to 5 seconds on workers

Spark training: added threshold encoding debug mode (logs current threshold and encoding statistics on each worker during training). Enable using SharedTrainingConfiguration.builder.encodingDebugMode(true). Note this operation has computational overhead.

Fixed an issue where L1/L2 and updaters (Adam, Nesterov, etc) were applied before dividing gradients by minibatch to obtain average gradient. To maintain old behaviour, use NeuralNetConfiguration.Builder.legacyBatchScaledL2(true).
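
A minimal sketch of opting back into the old behaviour via the builder flag named above; the rest of the network configuration is omitted.

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

public class LegacyBatchScaledL2Sketch {
    public static void main(String[] args) {
        // Opt back in to the pre-1.0.0-beta3 gradient scaling behaviour described above;
        // the rest of the network configuration is omitted.
        NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
                .legacyBatchScaledL2(true);
        System.out.println(builder.getClass().getSimpleName() + " configured");
    }
}
```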

Fixed an issue where EarlyStoppingScoreCalculator would not correctly handle "maximize score" cases instead of minimizing

Fixed order (BGR vs. RGB) for VGG16ImagePreProcessor channel offset values

Fixed bug with variational autoencoders using weight noise

Fixed issue with BaseDataSetIterator not respecting the 'maximum examples' configuration

Optimization: A workspace is now used for ComputationGraph/MultiLayerNetwork evaluation methods (avoids allocating off-heap memory during evaluation that must be cleaned up by garbage collector)

Fixed an issue where shuffling combined with a subset for MnistDataSetIterator would not maintain the same subset between resets

Fixed issue with StackVertex.getOutputType

Fix issue with CNN to/from RNN preprocessors handling of mask arrays

Fixed issue with VGG16 non-pretrained configuration in model zoo

Fixed issue with TransferLearning nOutReplace where multiple layers in a row are modified

Fixed issue with CuDNN workspaces where backpropagation is performed outside of a standard fit call

Fixed an issue with dropout masks being cleared prematurely on output layers in ComputationGraph

RecordReaderMultiDataSetIterator now supports 5D arrays (for 3D CNNs)

Fixed bug in multi input/output ComputationGraphs with TBPTT combined with both masking and different number of input/output arrays

Improved input validation/exceptions for batch normalization layer

Fixed bug with TransferLearning GraphBuilder nOutReplace when combined with subsampling layers

SimpleRnnParamInitializer now properly respects bias initialization configuration

Fixed SqueezeNet zoo model non-pretrained configuration

Fixed Xception zoo model non-pretrained configuration

Fixed an issue with some evaluation signatures for multi-output ComputationGraphs

Improved MultiLayerNetwork/ComputationGraph summary method formatting for large nets

Fixed an issue where gradient normalization could result in NaNs if gradient is exactly 0.0 for all parameters in a layer

Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate could throw an exception for SGD and NoOp updaters

Fixed an issue with StackVertex plus masking in some rare cases

Fixed an issue with JSON deserialization of frozen layers in pre-1.0.0-alpha format

Fixed an issue where GraphBuilder.removeVertex can fail under some limited circumstances

Fixed a bug in CacheableExtractableDataSetFetcher

DL4J Spark training: Fixed issues with thread/device affinity for multi-GPU training + evaluation

DL4J Spark training: Made all Aeron threads daemon threads to prevent Aeron from stopping JVM shutdown when all other threads have completed

Added cudnnAllowFallback configuration for BatchNormalization layer (fallback to built-in implementation if CuDNN fails unexpectedly)

Fixed some rare concurrency issues with multi-worker (multi-GPU) nodes for Spark training

Fixed an issue with BatchNormalization layers that prevented the mean/variance estimates from being synced properly on each worker for GradientSharing training, causing convergence issues

Added a check to detect ZipSlip CVE attempts in ArchiveUtils

DL4J Spark training and evaluation: methods now use Hadoop Configuration from Spark context to ensure runtime-set configuration is available in Spark functions reading directly from remote storage (HDFS etc)

MultiLayerNetwork and ComputationGraph now properly support more than Integer.MAX_VALUE parameters

Added data validation for Nd4j.readTxt - now throws exception on invalid input instead of returning incorrect values

Fixed an issue with KNN implementation where a deadlock could occur if an invalid distance function (one returning "distances" less than 0) was utilized

Added synchronization to loading of Keras import models to avoid thread safety issues in the underlying HDF5 library used for loading

Fixed rare issue for Async(Multi)DataSetIterator with large prefetch values

MultiLayerConfiguration/ComputationGraphConfiguration pretrain(boolean) and backprop(boolean) have been deprecated and are no longer used. Use fit and pretrain/pretrainLayer methods instead.
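
As a rough guide, the replacement workflow looks like the following sketch, where conf and trainIter are placeholders for an existing MultiLayerConfiguration and DataSetIterator:

MultiLayerNetwork net = new MultiLayerNetwork(conf);   // conf no longer needs the .pretrain(...)/.backprop(...) flags
net.init();
net.pretrainLayer(0, trainIter);   // optional unsupervised layer-wise pretraining, where the layer supports it
net.fit(trainIter, 3);             // supervised (backprop) training for 3 epochs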

ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper which should be used instead

deeplearning4j-nlp-korean module now has Scala version suffix due to scala dependencies; new artifact ID is deeplearning4j-nlp-korean_2.10 and deeplearning4j-nlp-korean_2.11

Running multiple Spark training jobs simultaneously on one physical node (i.e., multiple JVMs from one or more Spark jobs) may cause problems with network communication. A workaround is to manually set a unique stream ID in the VoidConfiguration, using a unique (or random) integer value for different jobs

Fixed import issue due to Keras JSON format changes for Keras 2.2.3+

Added Keras import for timeseries preprocessing

Elephas

Fixed issue with importing models with reshaping after an embedding layer

Added support for Keras masking layers

Fixed JSON deserialization issue with some layers/preprocessors, such as Permute

Fixed issue with Keras import of Nadam configuration

Added SameDiff training and evaluation: SameDiff instances can now be trained directly using DataSetIterator and MultiDataSetIterator, and evaluated using IEvaluation instances (that have been moved from ND4J to DL4J)
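
A minimal sketch of the new training and evaluation flow is shown below. Here sd, trainIter and testIter are placeholders, the placeholder names "input"/"label" and the output variable name "out" are assumptions, and exact builder/method signatures may differ slightly between releases:

sd.setTrainingConfig(new TrainingConfig.Builder()
        .updater(new Adam(1e-3))                 // any IUpdater
        .dataSetFeatureMapping("input")          // DataSet features feed the "input" placeholder
        .dataSetLabelMapping("label")            // DataSet labels feed the "label" placeholder
        .build());
sd.fit(trainIter, 2);                            // train directly from a DataSetIterator for 2 epochs
Evaluation eval = new Evaluation();
sd.evaluate(testIter, "out", eval);              // IEvaluation-based evaluation of the "out" variable
System.out.println(eval.stats());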

Added GraphServer implementation: c++ inference server for SameDiff (and Tensorflow, via TF import) with Java API

SameDiff instances can now be loaded from serialized FlatBuffers format (SameDiff.asFlatFile plus fromFlatFile)

Added MKL-DNN support for some operations (Conv2d, etc)

Upgraded ND4J (and DataVec) to Arrow 0.11.0, which also fixes a related issue

Added Nd4j.where op method (same semantics as numpy.where)

Added Nd4j.stack op method (combine arrays + increase array rank by 1)
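
For example (a small sketch; values are arbitrary and the axis-first argument order is assumed):

INDArray a = Nd4j.create(new double[]{1, 2, 3});
INDArray b = Nd4j.create(new double[]{4, 5, 6});
INDArray stacked = Nd4j.stack(0, a, b);   // stacks a and b along a new leading dimension, so rank increases by 1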

Matrix band part

Scatter ND, ND-add, ND-sub and ND-update ops

Sparse softmax cross entropy loss with logits

Histogram fixed width op

broadcast_to op

deconv3d op added

Unsorted segment ops added

Segment_X backprop ops added

batchnorm_new op added that supports multiple axes for mean/variance

GRU cell backprop added

Nd4j Preconditions class now has methods for formatting INDArray arguments

SameDiff loss functions: cleanup plus forward pass implementation

CudaGridExecutioner now warns that exception stack traces may be delayed, to avoid confusion when debugging exceptions occurring during asynchronous execution of ops

JavaCPP and JavaCPP-presets have been upgraded to version 1.4.3

Improved Javadoc on SDVariable class

Fixes for android: Remove use of RawIndexer

Libnd4j custom ops: conv op weight layouts are now not dependent on the input format (NCHW/NHWC) - now always [kH, kW, inChannels, outChannels] for 2d CNNs, [kH, kW, kD, inChannels, outChannels] for 3d CNNs.

Dot operation backprop, determinant

Backprop op fix for the broadcast case for some pairwise transform custom op implementations

Fix for reverse custom op with rank 1 inputs

ATan2 op is now broadcastable

Boolean custom op broadcast fixes/additions

Scatter op edge case fixes

ArgMin shape function fix, negative axis fix

Unique op fix

Pad op fix

Fixed where op shape function

SVD rank 1 edge case fix

Range op

Split and space_to_batch fixes

Broadcast dynamic shape

embedding_lookup op now supports multiple input arrays

Matrix determinant op edge case (rank 0 result) shape fix

SameDiff TensorFlow import: fixes for multiple operations

SameDiff: Improved error handling for multiple outputs case

Fixed issue where INDArray.permute would not correctly throw an exception for invalid length case

Fixed issues with INDArray.get/put with SpecifiedIndex

Minor change to DataSet.merge - signature now accepts any DataSet subtypes

Fixed issue where the INDArray.transposei operation was not applied in-place

Fixed issues with INDArray.mmul with MMulTranspose

Added additional order validation for ND4J creation methods (create, rand, etc)

Fix for ND4J binary deserialization (BinarySerde) when deserializing from heap byte buffers

Fixed issue with Nd4j-common ClassPathResource path resolution in some IDEs

Fixed issue where INDArray.get(interval) on rank 1 array would return rank 2 array

Fixed a validation issue with Nd4j.gemm/mmuli on views

INDArray.assign(INDArray) no longer allows assigning different shape arrays (other than scalar/vector cases)

NDarrayStrings (and INDArray.toString()) now always uses US locale when formatting numbers

Fixed an issue with GaussianDistribution specific to V100 GPUs

Fixed an issue with bitmap compression/encoding specific to V100 GPUs

Transforms.softmax now throws an error on unsupported shapes instead of simply not applying operation

VersionCheck functionality: handle case where SimpleFileVisitor is not available on earlier versions of Android

SameDiff convolution layer configuration (Conv2dConfig/Conv3dConfig/Pooling3dConfig etc) have had parameter names aligned

nd4j-base64 module contents have been deprecated; use the equivalent classes in nd4j-api from now on

Some classes in the nd4j-jackson module have been deprecated; use the equivalent classes in nd4j-api from now on

Added NativeImageLoader method overloads for org.opencv.core.Mat and String as filename

Fix for JDBCRecordReader handling of null values

Improved errors/validation for ObjectDetectionRecordReader for invalid input (where image object centers are outside of image bounds)

Fixed issue where FileSplit used methods that are unavailable on earlier versions of Android

Added SerializableHadoopConfiguration and BroadcastHadoopConfigHolder for cases where a Hadoop configuration is required in Spark functions

Fixed issue with JDBCRecordReader's handling of real-valued column result types

Added validation and useful exception for CSVRecordReader/LineRecordReader being used without initialization

Fixed some issues with dropout layers

Added conversion between org.nd4j.linalg.primitives.Pair/Triple and Scala Tuple

The class loader is now overridable
Added Adabelief updater
Added maximum merge for Keras import
Keras cropping 2d validation fixes
Lenet input shape fix
Fix for obtaining the UI port from a property
CTC Loss
tensormmul_bp is now run from C++
ARM Compute support added for conv2d and pooling operations
Add IndexUtils containing ravelMultiIndex and unravelIndex methods
Updated sortCooIndicesGeneric to take any data type
Add TVM runner
compare_and_bitpack now functions properly
Fix null pointer in cuda op executioner
Fix for samediff array cache removal during training
Fix for SD_FORBID_HELPERS environment variable
Fixed CUDA bug in summary stats (mean, variance)

0.6.0

  • Custom layer support

  • Support for custom loss functions

  • Support for compressed INDArrays, for memory saving on huge data

  • Native support for BooleanIndexing where applicable

  • Initial support for combined operations on CUDA

  • Significant performance improvements on CPU & CUDA backends

  • Better support for Spark environments using CUDA & cuDNN with multi-gpu clusters

  • New UI tools: FlowIterationListener and ConvolutionIterationListener, for better insight into processes within the NN.

  • Special IterationListener implementation for performance tracking: PerformanceListener

  • Inference implementation added for ParagraphVectors, together with option to use existing Word2Vec model

  • Significantly reduced file size of the deeplearning4j API

  • nd4j-cuda-8.0 backend is available now for cuda 8 RC

  • Added multiple new built-in loss functions

  • Custom preprocessor support

  • Performance improvements to Spark training implementation

  • Improved network configuration validation using InputType functionality

0.7.1

  • RBM and AutoEncoder key fixes:

    • Ensured visible bias is updated and applied during pretraining.

    • RBM HiddenUnit is the activation function for this layer; derivative calculations for backprop are now established according to the respective HiddenUnit.

  • RNG performance issues fixed for CUDA backend

  • OpenBLAS issues fixed for macOS, powerpc, linux.

  • DataVec is back to Java 7 now.

  • Multiple minor bugs fixed for ND4J/DL4J

0.9.1

Deeplearning4J

  • Fixed issue with incorrect version dependencies in 0.9.0

  • Numerical stability improvements to LossMCXENT / LossNegativeLogLikelihood with softmax (should reduce NaNs with very large activations)

ND4J

Known Issues

  • Deeplearning4j: Use of Evaluation class no-arg constructor (i.e., new Evaluation()) can result in accuracy/stats being reported as 0.0. Other Evaluation class constructors, and ComputationGraph/MultiLayerNetwork.evaluate(DataSetIterator) methods work as expected.

    • This also impacts Spark (distributed) evaluation: workaround is to replace sparkNet.evaluate(testData); with sparkNet.doEvaluation(testData, 64, new Evaluation(10))[0];, where 10 is the number of classes and 64 is the evaluation minibatch size to use (see the sketch after this list).

  • SequenceRecordReaderDataSetIterator applies preprocessors (such as normalization) twice to each DataSet (possible workaround: use RecordReaderMultiDataSetIterator + MultiDataSetWrapperIterator)

  • TransferLearning: ComputationGraph may incorrectly apply l1/l2 regularization (defined in FinetuneConfiguration) to frozen layers. Workaround: set 0.0 l1/l2 on FineTuneConfiguration, and required l1/l2 on new/non-frozen layers directly. Note that MultiLayerNetwork with TransferLearning appears to be unaffected.
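
A minimal sketch of the evaluation workaround above, assuming a SparkDl4jMultiLayer named sparkNet, a JavaRDD<DataSet> named testData, 10 classes and an evaluation minibatch size of 64:

Evaluation eval = sparkNet.doEvaluation(testData, 64, new Evaluation(10))[0];   // explicit class count avoids the no-arg constructor issue
System.out.println(eval.stats());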

1.0.0-beta

Highlights - 1.0.0-beta Release

  • Performance and memory optimizations for DL4J

Deeplearning4J

Deeplearning4J: New Features

  • New or enhanced layers:

Deeplearning4J: Bug Fixes and Optimizations

    • Fixes issues with custom and some Keras import layers on Android

  • Added new model zoo models:

    • (to do)

Deeplearning4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

  • WorkspaceMode.SINGLE and SEPARATE have been deprecated; use WorkspaceMode.ENABLED instead

  • Internal layer API changes: custom layers will need to be updated to the new Layer API - see built-in layers or custom layer example

  • Custom layers etc in pre-1.0.0-beta JSON (ModelSerializer) format need to be registered before they can be deserialized due to JSON format change. Built-in layers and models saved in 1.0.0-beta or later do not require this. Use NeuralNetConfiguration.registerLegacyCustomClassesForJSON(Class) for this purpose

  • ExistingDataSetIterator has been deprecated; use fit(DataSetIterator, int numEpochs) method instead
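
The items above can be combined roughly as in the following sketch; MyCustomLayer, trainIter and the layer sizes are placeholders, and the registration call is only needed when loading pre-1.0.0-beta JSON that contains custom layers:

NeuralNetConfiguration.registerLegacyCustomClassesForJSON(MyCustomLayer.class);   // placeholder custom layer class
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .trainingWorkspaceMode(WorkspaceMode.ENABLED)     // replaces the deprecated SINGLE/SEPARATE modes
        .inferenceWorkspaceMode(WorkspaceMode.ENABLED)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(128).build())
        .layer(1, new OutputLayer.Builder().nIn(128).nOut(10).build())
        .build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.fit(trainIter, 5);   // multi-epoch fit overload, in place of the deprecated iterator wrappers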

Deeplearning4J: 1.0.0-beta Known Issues

  • ComputationGraph TrainingListener onEpochStart and onEpochEnd methods are not being called correctly

  • DL4J Zoo Model FaceNetNN4Small2 model configuration is incorrect, causing issues during forward pass

  • Early stopping score calculators with values that should be maximized (accuracy, f1 etc) are not working properly (values are minimized not maximized). Workaround: override ScoreCalculator.calculateScore(...) and return 1.0 - super.calculateScore(...).

Deeplearning4J: Keras Import

Deeplearning4J: Keras Import - API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

ND4J

ND4J: New Features

ND4J: Known Issues

  • Not all op gradients implemented for automatic differentiation

  • Vast majority of new operations added in 1.0.0-beta do NOT use GPU yet.

ND4J: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

DataVec

DataVec: New Features

DataVec: Optimizations and Bug Fixes

DataVec: API Changes (Transition Guide): 1.0.0-alpha to 1.0.0-beta

Arbiter

Arbiter: New Features

  • Added LayerSpace for OCNN (one-class neural network)

Arbiter: Fixes

0.8.0

0.8.0 -> 0.9.0 Transition Notes

Deeplearning4j

  • Updater configuration methods such as .momentum(double) and .epsilon(double) have been deprecated. Instead: use .updater(new Nesterovs(0.9)) and .updater(Adam.builder().beta1(0.9).beta2(0.999).build()) etc to configure

DataVec

  • CsvRecordReader constructors: now uses characters for delimiters, instead of Strings (i.e., ',' instead of ",")
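
A small sketch of the change (note the class name is CSVRecordReader; the skip-lines value is arbitrary):

CSVRecordReader rr = new CSVRecordReader(0, ',');   // delimiter is now the char ',' rather than the String ","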

Arbiter

  • Arbiter UI is now a separate module, with Scala version suffixes: arbiter-ui_2.10 and arbiter-ui_2.11

Version 0.8.0

  • Spark 2.0 support (DL4J and DataVec; see transition notes below)

  • New layers

  • New ComputationGraph vertices

    • L2 distance vertex

    • L2 normalization vertex

  • Per-output masking is now supported for most loss functions (for per output masking, use a mask array equal in size/shape to the labels array; previous masking functionality was per-example for RNNs)

  • L1 and L2 regularization can now be configured for biases (via l1Bias and l2Bias configuration options)

  • Evaluation improvements:

    • For both MultiLayerNetwork and SparkDl4jMultiLayer: added evaluateRegression, evaluateROC, evaluateROCMultiClass convenience methods

    • TSNE re-added to new UI

    • Training UI: now usable without an internet connection (no longer relies on externally hosted fonts)

    • UI: improvements to error handling for ‘no data’ condition

  • Epsilon configuration now used for Adam and RMSProp updaters

  • Fix for bidirectional LSTMs + variable-length time series (using masking)

  • Spark + Kryo: now test serialization + throw exception if misconfigured (instead of logging an error that can be missed)

  • MultiLayerNetwork now adds default layer names if no name is specified

  • DataVec:

    • JSON/YAML support for DataAnalysis, custom Transforms etc

    • ImageRecordReader refactored to reduce garbage collection load (hence improve performance with large training sets)

    • Faster quality analysis.

  • Arbiter: added new layer types to match DL4J

    • Performance improvement for Word2Vec/ParagraphVectors tokenization & training.

  • Batched inference introduced for ParagraphVectors

  • Nd4j improvements

    • New native operations available for ND4j: firstIndex, lastIndex, remainder, fmod, or, and, xor.

    • OpProfiler NAN_PANIC & INF_PANIC now also checks result of BLAS calls.

    • Nd4j.getMemoryManager() now provides methods to tweak GC behavior.

  • Alpha version of parameter server for Word2Vec/ParagraphVectors was introduced for Spark. Please note: it’s not recommended for production use yet.

  • Performance improvements for CNN inference

0.7.2 -> 0.8.0 Transition Notes

  • Spark versioning schemes: with the addition of Spark 2 support, the versions for the Deeplearning4j and DataVec Spark modules have changed

    • For Spark 1: use <version>0.8.0_spark_1</version>

    • For Spark 2: use <version>0.8.0_spark_2</version>

    • Also note: Modules with Spark 2 support are released with Scala 2.11 support only. Spark 1 modules are released with both Scala 2.10 and 2.11 support

0.8.0 Known Issues (At Launch)

  • Keras 1D convolutional and pooling layers cannot be imported yet. Will be supported in forthcoming release.

  • Keras v2 model configurations cannot be imported yet. Will be supported in forthcoming release.

1.0.0-beta5

Highlights - 1.0.0-beta5 Release

  • Added model server - remote inference of SameDiff and DL4J models using JSON or (optionally) binary serialization

  • Added Scala 2.12 support, dropped Scala 2.10 support. Modules with Scala dependencies are now released with Scala 2.11 and 2.12 versions

  • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: Spark version suffix dropped: For upgrading: 1.0.0-beta4_spark2 -> 1.0.0-beta5

  • Added FastText support to deeplearning4j-nlp

  • CUDA support for all ND4J/SameDiff Operations

    • In 1.0.0-beta4, some operations were CPU only. Now, all operations have full CUDA support

  • Added support for new data types in ND4J (and DL4J/SameDiff): BFLOAT16, UINT16, UINT32, UINT64

  • ND4J: Implicit broadcasting support added to INDArray (already present in SameDiff - for example shape [3,1]+[3,2]=[3,2])

  • CUDA 9.2, 10.0 and 10.1-Update2 still supported

    • NOTE: For CUDA 10.1, CUDA 10.1 update 2 is recommended. CUDA 10.1 and 10.1 Update 1 will still run, but rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems

  • Dependency upgrades: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, and shaded to avoid dependency clashes)

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer

Deeplearning4J

Deeplearning4J: Features and Enhancements

Deeplearning4J: Bug Fixes and Optimizations

Deeplearning4j: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

  • DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J, use org.nd4j.linalg.dataset.Async(Multi)DataSetIterator instead

  • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: Spark version suffix dropped: For upgrading, change versions as follows: 1.0.0-beta4_spark2 -> 1.0.0-beta5

  • Scala 2.10 dropped, Scala 2.12 added (for modules with Scala dependencies)

Deeplearning4j: 1.0.0-beta5 Known Issues

  • Some layers (such as LSTM) may run slower on 1.0.0-beta5 than 1.0.0-beta4 on CUDA when not using cuDNN, due to added synchronization. This synchronization will be removed in the next release after 1.0.0-beta5

  • CUDA 10.1: Rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems, when running CUDA 10.1 Update 1 (and maybe 10.1). CUDA 10.1 update 2 is recommended.

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer

ND4J/SameDiff: Bug Fixes and Optimizations

ND4J: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

  • OldAddOp, OldSubOp, etc removed: Replace with AddOp, SubOp, etc

  • Nd4j.trueScalar and trueVector removed; use Nd4j.scalar and Nd4j.createFromArray methods

  • INDArray.javaTensorAlongDimension removed; use INDArray.tensorAlongDimension instead

  • INDArray.lengthLong() removed; use INDArray.length() instead
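
A rough migration sketch for the removed methods listed above:

INDArray s = Nd4j.scalar(3.0f);                        // replaces Nd4j.trueScalar(3.0f)
INDArray v = Nd4j.createFromArray(1.0f, 2.0f, 3.0f);   // replaces Nd4j.trueVector(...)
long length = v.length();                              // replaces the removed lengthLong()
INDArray tad = v.tensorAlongDimension(0, 0);           // replaces javaTensorAlongDimension(...)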

ND4J: 1.0.0-beta5 Known Issues

DataVec

DataVec: Features and Enhancements

DataVec: Bug Fixes and Optimizations

RL4J

RL4J: Features and Enhancements

RL4J: Bug Fixes and Optimizations

Arbiter

Bug Fixes and Optimizations

Arbiter: Known Issues

ND4S

ND4S Features and Enhancements

0.7.2

  • Activation function refactor

    • New activation functions added: hard sigmoid, randomized leaky rectified linear units (RReLU)

  • Multiple fixes/improvements for Keras model import

  • Added P-norm pooling for CNNs (option as part of SubsamplingLayer configuration)

  • Iteration count persistence: stored/persisted properly in model configuration + fixes to learning rate schedules for Spark network training

  • LSTM: gate activation function can now be configured (previously: hard-coded to sigmoid)

  • UI:

    • Added Chinese translation

    • Fixes for UI + pretrain layers

    • Improvements in front-end for handling NaNs

    • Added UIServer.stop() method

    • Fixed score vs. iteration moving average line (with subsampling)

  • Solved Jaxb/Jackson issue with Spring Boot based applications

  • RecordReaderDataSetIterator now supports NDArrayWritable for the labels (set regression == true; used for multi-label classification + images, etc)

0.7.1 -> 0.7.2 Transition Notes

  • Activation functions (built-in): now specified using Activation enumeration, not String (String-based configuration has been deprecated)

0.4.0

  • Initial multi-GPU support viable for standalone and Spark.

  • Refactored the Spark API significantly

  • Added CuDNN wrapper

  • Performance improvements for ND4J

  • New DataSetIterators for feeding neural nets with existing data: ExistingDataSetIterator, Floats(Double)DataSetIterator, IteratorDataSetIterator

  • New learning algorithms for word2vec and paravec: CBOW and PV-DM respectively

  • New native ops for better performance: DropOut, DropOutInverted, CompareAndSet, ReplaceNaNs

  • Shadow asynchronous datasets prefetch enabled by default for both MultiLayerNetwork and ComputationGraph

  • Better memory handling with JVM GC and CUDA backend, resulting in significantly lower memory footprint

Beginners

Road map for beginners new to deep learning.

How Do I Start Using Deep Learning?

Where you start depends on what you already know.

The prerequisites for really understanding deep learning are linear algebra, calculus and statistics, as well as programming and some machine learning. The prerequisites for applying it are just learning how to deploy a model.

In the case of Deeplearning4j, you should know Java well and be comfortable with tools like the IntelliJ IDE and the automated build tool Maven.

Below you'll find a list of resources. The sections are roughly organized in the order they will be useful.

Free Machine- and Deep-learning Courses Online

Math

The math involved with deep learning is basically linear algebra, calculus and probability, and if you have studied those at the undergraduate level, you will be able to understand most of the ideas and notation in deep-learning papers. If you haven't studied those in college, never fear. There are many free resources available (and some on this website).

Programming

If you do not know how to program yet, you can start with Java, but you might find other languages easier. Python and Ruby resources can convey the basic ideas in a faster feedback loop. "Learn Python the Hard Way" and "Learn to Program (Ruby)" are two great places to start.

Python

Java

Once you have programming basics down, tackle Java, the world's most widely used programming language. Most large organizations in the world operate on huge Java code bases. (There will always be Java jobs.) The big data stack -- Hadoop, Spark, Kafka, Lucene, Solr, Cassandra, Flink -- has largely been written for Java's compute environment, the JVM.

Deeplearning4j

Other Resources

0.7.0

  • Weighted loss functions: Loss functions now support a per-output weight array (row vector)

  • Improved error messages on invalid configuration or data; improved validation on both

  • Removed Jackson as core dependency (shaded); users can now use any version of Jackson without issue

  • Added LossLayer: version of OutputLayer that only applies loss function (unlike OutputLayer: it has no weights/biases)

  • Functionality required to build triplet embedding model (L2 vertex, LossLayer, Stack/Unstack vertices etc)

  • Reduced DL4J and ND4J ‘cold start’ initialization/start-up time

  • Pretrain default changed to false and backprop default changed to true. These no longer need to be set when configuring a network unless non-default values are required.

  • Numerous bug fixes across DL4J and ND4J

  • Performance improvements for nd4j-native & nd4j-cuda backends

  • Standalone Word2Vec/ParagraphVectors overhaul:

    • Performance improvements

    • ParaVec inference available for both PV-DM & PV-DBOW

    • Parallel tokenization support was added, to address computation-heavy tokenizers.

  • Native RNG introduced for better reproducibility within multi-threaded execution environments.

  • Additional RNG calls added: Nd4j.choice(), and BernoulliDistribution op.

  • Off-GPU storage introduced to keep large objects, such as Word2Vec models, in host memory. Available via WordVectorSerializer.loadStaticModel() (see the sketch after this list)

  • Two new options for performance tuning on nd4j-native backend: setTADThreshold(int) & setElementThreshold(int)
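
A rough sketch of loading a Word2Vec model into host (off-GPU) memory; the file path is a placeholder:

WordVectors vectors = WordVectorSerializer.loadStaticModel(new File("/path/to/word2vec-model.bin"));
double[] day = vectors.getWordVector("day");   // vectors are looked up from host memory rather than keeping the full model on-device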

0.6.0 -> 0.7.0 Transition Notes

Notable changes for upgrading codebases based on 0.6.0 to 0.7.0:

  • UI: new UI package name is deeplearning4j-ui_2.10 or deeplearning4j-ui_2.11 (previously: deeplearning4j-ui). Scala version suffix is necessary due to Play framework (written in Scala) being used now.

  • DataVec ImageRecordReader: labels are now sorted alphabetically by default before assigning an integer class index to each - previously (0.6.0 and earlier) they were according to file iteration order. Use .setLabels(List) to manually specify the order if required.

  • CNNs: configuration validation is now less strict. With new ConvolutionMode option, 0.6.0 was equivalent to ‘Strict’ mode, but new default is ‘Truncate’

  • Xavier weight initialization change for CNNs and LSTMs: Xavier now aligns better with original Glorot paper and other libraries. Xavier weight init. equivalent to 0.6.0 is available as XAVIER_LEGACY

  • DataVec: Custom RecordReader and SequenceRecordReader classes require additional methods, for the new metadata functionality. Refer to existing record reader implementations for how to implement these methods.

  • Word2Vec/ParagraphVectors:

    • Few new builder methods:

      • allowParallelTokenization(boolean)

      • useHierarchicSoftmax(boolean)

    • Behaviour change: batchSize: now batch size is ALSO used as threshold to execute number of computational batches for sg/cbow

Added EmnistDataSetIterator

Added runtime version checking for ND4J, DL4J, RL4J, Arbiter, DataVec

Added Cropping1D layer

Added Convolution3D, Cropping3D, UpSampling3D, ZeroPadding3D, Subsampling3D layers (all with Keras import support):

Added EmbeddingSequenceLayer (EmbeddingLayer for time series)

Added OCNNOutputLayer (one-class neural network)

Added FrozenLayerWithBackprop layer

Added DepthwiseConvolution2D layer

Added ComputationGraph.output(DataSetIterator) method

Added MultiLayerNetwork/ComputationGraph.layerInputSize methods

Added SparkComputationGraph.feedForwardWithKey overload with feature mask support

Added MultiLayerNetwork.calculateGradients method (for easily getting parameter and input gradients, for example for some model interpretability approaches)

Added support to get input/activation types for each layer from configuration: ComputationGraphConfiguration.getLayerActivationTypes(InputType...), ComputationGraphConfiguration.GraphBuilder.getLayerActivationTypes(), NeuralNetConfiguration.ListBuilder.getLayerActivationTypes(), MultiLayerConfiguration.getLayerActivationTypes(InputType) methods

Evaluation.stats() now prints confusion matrix in easier to read matrix format, rather than list format

Added ModelSerializer.addObjectToFile, .getObjectFromFile and .listObjectsInFile for storing arbitrary Java objects in same file as saved network
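
A rough sketch of these methods; the file name, key and the net/normalizer objects are placeholders, and the exact parameter order is an assumption:

File modelFile = new File("myNetwork.zip");
ModelSerializer.writeModel(net, modelFile, true);                             // save the network (with updater) first
ModelSerializer.addObjectToFile(modelFile, "dataNormalizer", normalizer);     // attach an arbitrary (serializable) object under a key
Object restored = ModelSerializer.getObjectFromFile(modelFile, "dataNormalizer");
List<String> keys = ModelSerializer.listObjectsInFile(modelFile);             // list all attached object keys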

Added SpatialDropout support (with Keras import support)

Added MultiLayerNetwork/ComputationGraph.fit((Multi)DataSetIterator, int numEpochs) overloads

Added performance (hardware) listeners: SystemInfoPrintListener and SystemInfoFilePrintListener

Performance and memory optimizations via optimizations of internal use of workspaces

Reflections library has been entirely removed from DL4J and is no longer required for custom layer serialization/deserialization

RecordReaderMultiDataSetIterator will no longer try to convert unused columns to numerical values

Fixes for Android compilation (removed duplicate classes, aligned versions, removed some dependencies)

Fix for RecordReaderMultiDataSetIterator where output could be incorrect for some constructors

Non-frozen layers before a frozen layer will no longer be skipped during backprop (useful for GANs and similar architectures)

Fixed issue where ComputationGraph topological sort may not be consistent on all platforms; could sometimes break ComputationGraphs (with multiple valid topological orderings) trained on PC and deployed on Android

Fixed issue with CuDNN batch norm using 1-decay instead of decay

deeplearning4j-cuda no longer throws exceptions if present on classpath with nd4j-native backend set to higher priority

Added RNG control for CifarDataSetIterator

WordVectorSerializer now deletes temp files immediately once done

IterationListener has been deprecated in favor of TrainingListener. For existing custom listeners, switch from implements TrainingListener to extends BaseTrainingListener
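
A rough sketch of a custom listener written this way; the three-argument iterationDone signature is taken from the TrainingListener interface and may differ slightly between versions:

public class ScoreLoggingListener extends BaseTrainingListener {
    @Override
    public void iterationDone(Model model, int iteration, int epoch) {
        System.out.println("Epoch " + epoch + ", iteration " + iteration + ", score: " + model.score());
    }
}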

ImageRecordReader now logs number of inferred label classes (to reduce risk of users missing a problem if something is misconfigured)

Added AnalyzeSpark.getUnique overload for multiple columns

Added performance/timing module

Reduced ImageRecordReader garbage generation via buffer reuse

Fixes for Android compilation (aligned versions, removed some dependencies)

Removed Reflections library use in DataVec

Fix for TransformProcessRecordReader batch support

Fix for TransformProcessRecordReader with filter operations

Fixed issue with ImageRecordReader/ParentPathLabelGenerator incorrectly filtering directories containing . character(s)

ShowImageTransform now initializes frame lazily to avoid blank windows

DataVec ClassPathResource has been deprecated; use nd4j-common version instead

Fixed timestamp issue that could cause incorrect rendering of first model's results in UI

Execution now waits for last model(s) to complete before returning when a termination condition is hit

As per DL4J etc: use of Reflections library has been removed entirely from Arbiter

Remove use of Eclipse Collections library due to issues with Android compilation

Improved cleanup of completed models to reduce maximum memory requirements for training

Added transfer learning API

Global pooling (aka "pooling over time"; usable with both RNNs and CNNs)

Center loss output layer

1D Convolution and subsampling layers

ZeroPaddingLayer

DL4J now has an IEvaluation class (which Evaluation, RegressionEvaluation, etc. all implement; this also allows custom evaluation on Spark)

Added multi-class (one vs. all) ROC: ROCMultiClass

HTML export functionality added for ROC charts

Added CnnSentenceDataSetIterator (for use with ‘CNN for Sentence Classification’ architecture)

UI/CUDA/Linux issue:

Dirty shutdown on JVM exit is possible for CUDA backend sometimes:

Issues with RBM implementation

Server: See

Client: See

Tests/examples: See and

Added FastText - inference and training, including OOV (out of vocabulary) support ()

Scala 2.12 support added, Scala 2.10 support dropped ()

Added model server (DL4J and SameDiff models, JSON and binary communication) - , , ,

Added saved model format validation utilities - DL4JModelValidator, DL4JKerasModelValidator ()

Added LabelLastTimeStepPreProcessor ()

BertIterator: added option to prepend token to the output (such as [cls] expected by some models) ()

Added trace level logging to MultiLayerNetwork and ComputationGraph to assist with debugging certain issues ()

Upsampling3D: Added NDHWC support ()

MergeVertex now supports broadcasting ()

LSTM and Dropout will now fall back on built-in implementations if an exception is encountered from cuDNN (same as Subsampling/ConvolutionLayer) ()

Improved JavaDoc and cleaned up API for WordVectorSerializer (, )

Updated deeplearning4j-ui theme ()

Fixed an issue with MergeVertex and CNN3D activations ()

Fixed typo in Yolo2OutputLayer builder/configuration method name ()

Improved ComputationGraph builder InputType validation ()

Removed dl4j-spark-ml module until it can be properly maintained ()

Fixed an issue with BertWordPieceTokenizerFactory and bad character encoding ()

Fixed an issue with LearnedSelfAttentionLayer and variable minibatch size (, )

Fixed issue with SharedTrainingMaster controller address when set from environment variable ()

Fixed issue with SameDiffOutputLayer initialization under some circumstances ()

https is now used by default for data and zoo model downloads (, )

Fixed an issue where UI WebJars dependencies would check for updates on every single build (, )

Fixed issue where Upsampling layer memory report could produce an OOM exception ()

Improved UX/validation for RecordReaderDataSetIterator ()

Fixed an issue where EmbeddingSequenceLayer would not check mask array datatype ()

Improved validation when initializing networks with a non rank-2 (shape [1, numParams]) array ()

Fixed a DataType issue for BertIterator ()

Fixed Word2Vec model backward compatibility (beta3 and earlier models now loadable again)

Fixed issue where some Keras import models could fail with Could not read abnormally long HDF5 attribute ()

Added validation for RnnOutputLayer - feature/label array lengths ()

Fixed an issue where SameDiffOutputLayer would not support variable minibatch size ()

Fixed DL4J SameDiff layer mask support ()

DL4J UI: Fixed an issue where tab switching did not work when visualizing saved/stored data (, )

DL4J UI: Fixed a rare UI threading issue ()

Fixed a Keras import issue with JSON format change ()

Fixed a Keras import issue where updater learning rate schedule could be imported incorrectly ()

Fixed an issue with CnnSentenceDataSetIterator when using UnknownWordHandling.UseUnknownVector (, )

Fixes and optimizations to DL4J SameDiff layers ()

MultiLayerNetwork/ComputationGraph will now log the original exception if a second exception occurs during workspace closing, instead of swallowing it (inference/fit operation try/finally blocks) ()

Upgraded dependencies: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, shaded to avoid dependency clashes) ()

Logging framework can now be configured for DL4J UI (due to Play framework dependency upgrade) ()

Reduced amount of garbage produced by MnistDataFetcher (impacts MNIST and EMNIST DataSetIterators) ()

Activation function backpropagation has been optimized for many activation functions (, )

Saved models with custom layers from 1.0.0-alpha and before can no longer be loaded. Workaround: load in 1.0.0-beta4, and re-save the model (). Models without custom layers can still be loaded back to 0.5.0

dl4j-spark_2.11 and _2.12 dependencies incorrectly pull in datavec-spark_2.11/2.12 version 1.0.0-SNAPSHOT. Workaround: control version using dependency management as per or

Added new data types: BFLOAT16, UINT16, UINT32, UINT64 ()

CUDA support for all operations without CUDA implementations (, , , , )

Added model server (DL4J and SameDiff models, JSON and binary communication) - , , ,

Added support for empty arrays with zeros in shape, for compatibility with TensorFlow import ()

Improved SameDiff training API - added "in line" test set evaluation, returning History object with loss curve, etc ()

Added saved model format validation utilities - Nd4jValidator, Nd4jCommonValidator ()

Added SameDiff ScoreListener (equivalent to DL4J ScoreIterationListener/PerformanceListener) (, )

Added SameDiff.convertDataTypes method, for variable dtype conversion ()

Added crop and resize op ()

DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J

Added basic/MVP SameDiff UI listener ()

Added SameDiff CheckpointListener (, )

Added SameDiff name scopes ()

SameDiff: Updater state and training configuration is now written to FlatBuffers format ()

Added c++ benchmark suite callable from Java - call using Nd4j.getExecutioner().runLightBenchmarkSuit() and Nd4j.getExecutioner().runFullBenchmarkSuit() ()

Added SameDiff.save/load methods with InputStream/OutputStream arguments (, )

Added axis configuration for evaluation instances (Evaluation, RegressionEvaluation, ROC, etc - getAxis and setAxis methods) to allow different data formats (NCHW vs. NHWC for CNNs, for example) ()

SameDiff: Added support to convert constants to placeholders, via SDVariable.convertToConstant() method ()

SameDiff: Added GradCheckUtil.checkActivationGradients method to check activation gradients for SameDiff instance (not just parameter gradients as in existing gradient check methods) ()

Added CheckNumerics op ()

Added FakeQuantWithMinMaxArgs and FakeQuantWithMinMaxVars ops ()

Added INDArray reduction methods with "keep dimensions" option - for example, INDArray.mean(boolean, int... dimension) ()

Added Nd4j SystemInfo class - SystemInfo.getSystemInfo, .writeSystemInfo(File) to aid with debugging issues (, )

Added INDArray.toString(NDArrayStrings options), toStringFull() and toString overloads for easier control of array printing ()

Added HashCode op, INDArray.hashCode() ()

SameDiff: added whileLoop, ifCond methods for loops/conditional ops ()

Cleaned up some infrequently used Nd4j methods (, , , )

Added bitwise integer operations: left/right bit shift, left/right cyclical bit shift, bitwise Hamming distance (, , , , )

deeplearning4j-nlp: renamed AggregatingSentencePreProcessor to sentencePreProcessor method ()

Upgraded (and shaded) Protobuf version - 3.5.1 to 3.8.0 ()

Switched to C-style error handling for libnd4j native operations ()

Renamed FlatBuffers enum org.nd4j.graph.DataType to org.nd4j.graph.DType to avoid users importing incorrect type when using Nd4j methods (, )

Added SameDiff.bitwise namespace for bitwise ops (, )

Updated to JavaCPP/JavaCV 1.5.1-1 ()

SameDiff: Placeholders must now only be provided if required to calculate the requested variables ()

SameDiff: Fixed an issue with duplicate variable name validation ()

SameDiff: Fixed an issue with SDVariable.getArr for scalars ()

Added delayed mode to DeviceLocalNDArray (don't replicate to device until needed) ()

ND4J: Fixed an issue with writing 0d (scalar) NDArrays in numpy .npy format ()

Fixed an issue with Pad operation for some constant cases ()

Fixed some issues with strided_slice operation (, , )

SameDiff: Fixed issue with DataType inference for some ops using ND4J default datatype ()

INDArray.castTo(DataType) is now a no-op when array is already the correct type ()

SameDiff: Fixed an issue with training mixed precision networks ()

Fixed an issue where Evaluation class was incorrectly reporting macro-averaged precision for binary case ()

Removed trainableParams config/field from SameDiff TrainingConfig (no longer required) ()

Improvements and cleanup to ND4J Javadoc (, , , )

Fixed an issue with Cholesky Lapack op on CUDA (, )

Fixed an issue where [1,N] and [N,1] arrays were not considered a matrix (rank 2 array) according to INDArray.isMatrix() ()

Fixed RegressionEvaluation for 4D arrays (CNNs / segmentation) (, )

Fixed issue with INDArray.median(int... dimension) ()

Fixed NPE that could occur when executing gather operation backprop ()

Fixed issue with LogSumExp operation Java/C++ mapping ()

Added header validation when reading Numpy .npy files, to ensure file is valid ()

Fixed a possible issue with reading Numpy .npy files on CUDA ()

Fixed an issue when reading Numpy .npy boolean files ()

Various fixes for TensorFlow import ()

Fixed an issue with a small number of Nd4j.create methods not creating arrays corresponding to the java primitive ()

Improved shape validation for some Nd4j.create methods ()

Cleaned up unmaintained Nd4j.createSparse methods ()

Fixed a CUDA issue for CUDA GPUs with CC 3.0 ()

Fixed some possible integer overflows in c++ code ()

Removed deprecated methods: Nd4j.trueScalar and Nd4j.trueVector (, )

Fixed an issue where some JVMs could warn about "Illegal reflective access" due to a (now removed) SameDiff dependency ()

SDVariable now no longer extends DifferentialFunction ()

Moved numerous operation calculateOutputShape instances from Java to C++ ()

Fixed an issue where maxpool2d_bp could throw an exception when NaN values are present ()

Fixed an issue with concatenation of empty shapes (with zeros) ()

Removed INDArray.javaTensorAlongDimension ()

LayerNorm operation now properly supports axis arg, NCHW format data ()

libnd4j: cuBLAS hgemm (FP16 gemm) will only be called for devices with compute capability >= 5.3 due to cuBLAS limitations ()

Nd4j.readNumpy optimized ()

Added configurable alpha parameter to ELU and lrelu_bp operations in c++ ()

Cleaned up SameDiff SDCNN/SDRNN (SameDiff.cnn, .rnn) API/methods (, )

nd4j-native on some OSX systems can fail with Symbol not found: ___emutls_get_address - See

SBT 1.3.0 can fail with an Illegal character in path error; SBT 1.2.8 is OK. This is an SBT issue, not an ND4J issue. See for details

ImageRecordReader: Support for 16-bit TIFF added ()

Added SequenceTrimToLengthTransform ()

Fixed an issue with AnalyzeSpark and String columns ()

Fixed an issue with URL scheme detection in NumberedFileInputScheme ()

Fixed an issue with RandomPathFilter sampling being biased (, )

API cleanup and refactoring (, , , )

Fixed issue with compression for HistoryProcessor ()

Updated EvaluationScoreFunction to use ND4J Evaluation class metrics ()

Fixed incorrect search size in GridSearchCandidateGenerator ()

The Jackson version upgrade necessitated a change to how generic object serialization was performed; Arbiter JSON data stored in 1.0.0-beta4 or earlier format may not be readable in 1.0.0-beta5 ()

Added full data type support to ND4S as per ND4J ()

Added syntactic sugar for SameDiff (implicits, operator overloads) ()

Added variational autoencoder

Activation functions are now an interface

Configuration now via enumeration, not via String (see examples)

Custom activation functions now supported

Added Java 7 compatibility for stats collection

Introducing DataVec: lots of new functionality for transforming, preprocessing and cleaning data. (This replaces Canova.)


If you want to jump into deep learning from here without Java, we recommend Theano and the various Python frameworks built atop it, including Keras and Lasagne.

With that under your belt, we recommend you approach Deeplearning4j through its examples.

Most of what we know about deep learning is contained in academic papers; you can find some of the major research groups online.

While individual courses have limits on what they can teach, the Internet does not. Most math and programming questions can be answered by Googling and searching sites like Stackoverflow and Math Stackexchange.

UI overhaul: new training UI has considerably more information, supports persistence (saving info and loading later), Japanese/Korean/Russian support. Replaced Dropwizard with Play framework.

Import of models configured and trained using Keras

Imports both Keras model configurations and stored weights

Supported models: Sequential models

Supported layers: Dense, Dropout, Activation, Convolution2D, MaxPooling2D, LSTM

Added ‘Same’ padding mode for CNNs (ConvolutionMode network configuration option)

ROC and AUC added for binary classifiers

Added metadata functionality: track source of data (file, line number, etc) from data import to evaluation. Loading a subset of examples/data from this metadata is now supported.

Added TrainingListener interface (extends IterationListener). Provides access to more information/state as network training occurs

Histogram and Flow iteration listeners deprecated. They are still functional, but using new UI is recommended

See ConvolutionMode javadoc for more details:


Javacpp

DL4J and Javacpp

DL4J and Javacpp overview

The following modules rely on javacpp as part of their build process:

  1. nd4j-native

  2. nd4j-native-presets

  3. nd4j-cuda

  4. nd4j-cuda-presets

Each backend consists of 2 modules

  1. The codebase: This represents the actual nd4j backend logic for specific platforms. Conceptually, this logic will be anything that a developer should need to control such as memory management, environment variables, or other execution logic.

Compilation flow

Next, the actual backend is compiled with a dependency on the above presets code base. The javacpp plugin will leverage the description from the presets we specify as a dependency and facilitate linking against a LIBND4J_HOME (a folder which contains the platform specific libnd4j binaries and include sources) specified by the user. In the actual plugin declaration on the backend pom.xml we include the target presets class to use for our particular backend.

Note: This still requires the native platform specific tools to be installed since binaries are generated for each platform. Please see our github actions for instructions on specific platforms.

-platform dependencies

Caution to users: By default, this means that a large number of dependencies for all platforms will be included. If you do not need dependencies for all platforms, then please read the above documentation to figure out how to build a jar for your specific platform.

Generally, the main thing to know is when you build your application, use:

mvn -Djavacpp.platform=your-target-platform
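
For example, to build only for 64-bit Linux (linux-x86_64 is one of the standard javacpp platform identifiers; substitute your own target platform):

mvn -Djavacpp.platform=linux-x86_64 clean package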

Javacpp platform specific profiles

Running javacpp on termux + android/lineagos

In order to bootstrap this environment, a from-scratch install of the latest LineageOS, flashed to an SD card for the Raspberry Pi, is suggested.

Afterwards, install

In order to properly set up the test environment, you need to execute your test from the command line as follows:

mvn -DargLine="-Dorg.bytedeco.javacpp.pathsfirst=true -Djavacpp.platform=android-arm" -Dorg.bytedeco.javacpp.pathsfirst=true -Djavacpp.platform=android-arm clean test

A proper execution environment after the above jdk is installed involves manually setting the environment as follows:

export JAVA_HOME=/data/data/com.termux/files/usr/lib/jvm/openjdk-9
export PATH=$PATH:$HOME/apache-maven-3.8.1/bin
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$JAVA_HOME/lib/:$JAVA_HOME/lib/jli"
export MAVEN_OPTS="-Dmaven.wagon.http.ssl.insecure=true -Dmaven.wagon.http.ssl.allowall=true -Dmaven.wagon.http.ssl.ignore.validity.dates=true"

This will set up the JDK + Maven to ignore SSL errors due to issues with cacerts + Termux. This is largely irrelevant for our small testing use case, but not recommended for production environments.

Redist artifacts

Redist artifacts are easy ways of distributing dependencies without installation.

Note that for the presets that are part of nd4j (nd4j-cuda-presets and nd4j-native-presets), only the latest versions support redist artifacts. The presets only support preloading (e.g., linking against libraries from the javacpp cache) against the latest version, because certain version numbers are checked for during preloading.

Quickstart

Quickstart for Java using Maven

Get started

This is everything you need to run DL4J examples and begin your own projects.

We are currently reworking the Getting Started Guide.

A quick overview

Deeplearning4j started as a domain-specific language to configure deep neural networks, and evolved into a suite of tools developers use to do everything from train models in Java to deploy models to production.

Prerequisites

You should have these installed to use this QuickStart guide. DL4J targets professional Java developers who are familiar with production deployments, IDEs and automated build tools. Working with DL4J will be easiest if you already have experience with these.

java -version

Please make sure you have a 64-bit version of Java installed; if you try to use a 32-bit version instead, you will see an error telling you no jnind4j in java.library.path. Make sure the JAVA_HOME environment variable is set.

mvn --version

If you are working on a Mac, you can simply enter the following into the command line:

brew install maven

If you want to install or update Git itself, you can clone the Git source repository:

$ git clone git://git.kernel.org/pub/scm/git/git.git

The latest version of macOS Mojave breaks git, producing the following error message:

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

This can be fixed by running:

xcode-select --install
  1. Use the command line to enter the following:

    git clone https://github.com/eclipse/deeplearning4j-examples.git
  2. Open IntelliJ and choose Import Project. Then select the dl4j-examples directory.

  3. Choose 'Import project from external model' and ensure that Maven is selected.

  4. Continue through the wizard's options. Select the SDK that begins with jdk. (You may need to click on a plus sign to see your options...) Then click Finish. Wait a moment for IntelliJ to download all the dependencies. You'll see the horizontal bar working on the lower right.

  5. Pick an example from the file tree on the left. Right-click the file to run.

The example repository contains multiple example projects that are grouped by different levels of functionality. The dl4j-examples project you just opened has the simplest examples, but feel free to explore the other projects too!

Using DL4J In Your Own Projects: Configuring the POM.xml File

To run DL4J in your own projects, we highly recommend using Maven for Java users, or a tool such as SBT for Scala. The basic set of dependencies and their versions are shown below. This includes:

  • deeplearning4j-core, which contains the neural network implementations

  • nd4j-native-platform, the CPU version of the ND4J library that powers DL4J

  • datavec-api - DataVec is our library for vectorizing and loading data

To run the example, right click on it and select the green button in the drop-down menu. You will see, in IntelliJ's bottom window, a series of scores. The rightmost number is the error score for the network's classifications. If your network is learning, then that number will decrease over time with each batch it processes. At the end, this window will tell you how accurate your neural-network model has become:

In another window, a graph will appear, showing you how the multilayer perceptron (MLP) has classified the data in the example. It will look like this:

Congratulations! You just trained your first neural network with Deeplearning4j.

Next Steps

Additional links

Troubleshooting

Q: I'm using a 64-Bit Java on Windows and still get the no jnind4j in java.library.path error

A: You may have incompatible DLLs on your PATH. To tell DL4J to ignore those, you have to add the following as a VM parameter (Run -> Edit Configurations -> VM Options in IntelliJ):

-Djava.library.path=""

Q: SPARK ISSUES I am running the examples and having issues with the Spark based examples such as distributed training or datavec transform options.

Troubleshooting: Debugging UnsatisfiedLinkError on Windows

Windows users might be seeing something like:

Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:624)
at org.deeplearning4j.examples.feedforward.anomalydetection.MNISTAnomalyExample.main(MNISTAnomalyExample.java:46)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5556)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:189)
... 2 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:259)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5553)
... 3 more

Quickstart template

Now that you've learned how to run the different examples, we've made a template available for you that has a basic MNIST trainer with simple evaluation code.

To use the template:

  1. Copy the standalone-sample-project from the examples and give it the name of your project.

  2. Import the folder into IntelliJ.

  3. Start coding!

More about Eclipse Deeplearning4j

Deeplearning4j is a framework that lets you pick and choose with everything available from the beginning. We're not Tensorflow (a low-level numerical computing library with automatic differentiation) or Pytorch. Deeplearning4j has several subprojects that make it easy-ish to build end-to-end applications.

Deeplearning4j has two other notable components:

Contribute

How to contribute to the Eclipse Deeplearning4j source code.

Prerequisites

  • DeepLearning4J: Contains all of the code for learning neural networks, both on a single machine and distributed.

  • ND4J: “N-Dimensional Arrays for Java”. ND4J is the mathematical backend upon which DL4J is built. All of DL4J’s neural networks are built using the operations (matrix multiplications, vector operations, etc) in ND4J. ND4J is how DL4J supports both CPU and GPU training of networks, without any changes to the networks themselves. Without ND4J, there would be no DL4J.

  • DataVec: DataVec handles the data import and conversion side of the pipeline. If you want to import images, video, audio or simply CSV data into DL4J: you probably want to use DataVec to do this.

  • RL4J: Reinforcement Learning for Java. This set of libraries contains the ability to do reinforcement learning built on the deeplearning4j library.

  • Samediff: Built within the nd4j library, this library contains a tensorflow/pytorch like library for building data flow graphs.

Ways to contribute

There are numerous ways to contribute to DeepLearning4J (and related projects), depending on your interests and experience. Here’s some ideas:

  • Add new types of neural network layers (for example: different types of RNNs, locally connected networks, etc)

  • Add a new training feature

  • Bug fixes

  • DL4J examples: Is there an application or network architecture that we don’t have examples for?

  • Testing performance and identifying bottlenecks or areas to improve

  • Improve website documentation (or write tutorials, etc)

  • Improve the JavaDocs

There are a number of different ways to find things to work on. These include:

  • Looking at the issue trackers:

  • Reviewing our Roadmap

  • Reviewing recent papers and blog posts on training features, network architectures and applications

  • Reviewing the website and examples - what seems missing, incomplete, or would simply be useful (or cool) to have?

General guidelines

Before you dive in, there’s a few things you need to know. In particular, the tools we use:

  • Maven: a dependency management and build tool, used for all of our projects. See this for details on Maven.

  • Git: the version control system we use

  • Project Lombok: Project Lombok is a code generation/annotation tool that is aimed to reduce the amount of ‘boilerplate’ code (i.e., standard repeated code) needed in Java. To work with source, you’ll need to install the Project Lombok plugin for your IDE

  • VisualVM: A profiling tool, most useful to identify performance issues and bottlenecks.

  • IntelliJ IDEA: This is our IDE of choice, though you may of course use alternatives such as Eclipse and NetBeans. You may find it easier to use the same IDE as the developers in case you run into any issues. But this is up to you.

Things to keep in mind:

  • Code should be Java 7 compliant

  • If you are adding a new method or class: add JavaDocs

  • You are welcome to add an author tag for significant additions of functionality. This can also help future contributors, in case they need to ask questions of the original author. If multiple authors are present for a class: provide details on who did what (“original implementation”, “added feature x” etc)

  • Provide informative comments throughout your code. This helps to keep all code maintainable.

  • Any new functionality should include unit tests (using JUnit) to test your code. This should include edge cases.

  • If you add a new layer type, you must include numerical gradient checks, as per these unit tests. These are necessary to confirm that the calculated gradients are correct

  • If you are adding significant new functionality, consider also updating the relevant section(s) of the website, and providing an example. After all, functionality that nobody knows about (or nobody knows how to use) isn’t that helpful. Adding documentation is definitely encouraged when appropriate, but strictly not required.

Eclipse Contributors

IP/Copyright requirements for Eclipse Foundation Projects

Contributors (anyone who wants to commit code to the repository) need to do two things, before their code can be merged:

  1. Sign the Eclipse Contributor Agreement (once)

  2. Sign commits (each time)

Why Is This Required?

By signing the ECA, you are essentially asserting that the code you are submitting is something that either you wrote, or that you have the right to contribute to the project. This is a necessary legal protection to avoid copyright issues.

By signing your commits, you are asserting that the code in that particular commit is your own.

Signing the Eclipse Contributor Agreement

You only need to sign the Eclipse Contributor Agreement (ECA) once. Here's the process:

Step 1: Sign up for an Eclipse account

Note: You must register using the same email as your GitHub account (the GitHub account you want to submit pull requests from).

Step 2: Sign the ECA

Signing Your Commits

Signing a New Commit

There are a few ways to sign commits. Note that you can use any of these options.

Option 1: Use -s When Committing on Command Line

Signing commits here is simple:

git commit -s -m "My signed commit"

Note the use of -s (lower case s) - upper-case S (i.e., -S) is for GPG signing (see below).

Option 2: Set up Bash Alias (or Windows cmd Alias) for Automated Signing

For example, you could set up the following alias in Bash:

alias gcm='git commit -s -m'

Then committing would be done with the following:

gcm "My Commit"

One simple way is to create a gcm.bat file with the following contents, and add it to your system path:

@echo off
echo.
git commit -s -m %*

You can then commit using the same process as above (i.e., gcm "My Commit")

Option 3: Use GPG Signing

Note that this option can be combined with aliases (above), as in alias gcm='git commit -S -m' - note the upper case -S for GPG signing.

Option 4: Commit using IntelliJ with Auto Signing

Checking If A Commit Is Signed

After performing a commit, you can check in a few different ways. One way is to use git log --show-signature -1 to show the signature for the last commit (use -5 to show the last 5 commits, for example)

The output will look like:

$ git log --show-signature -2
commit 81681455918371e29da1490d3f0ca3deecaf0490 (HEAD -> commit_test_branch)
Author: YourName <you@email.com>
Date:   Fri Jun 21 22:27:50 2019 +1000

    This commit is unsigned

commit 2349c6aa3497bd65866d7d0a18fe82bb691bb868
Author: YourName <you@email.com>
Date:   Fri Jun 21 21:42:38 2019 +1000

    My signed commit

    Signed-off-by: YourName <you@email.com>

The top commit is unsigned, and the bottom commit is signed (note the presence of the Signed-off-by).

If You Forget to Sign a Commit - Amending the Last Commit

If you forgot to sign the last commit, you can use the following command:

git commit --amend --signoff

If You Forget to Sign Multiple Commits

Suppose your branch has 3 new commits, all of which are unsigned:

$ git log -4 --oneline
4b164026 (HEAD -> commit_test_branch) Your new commit 3
d7799615 Your new commit 2
6bb6113a Your new commit 1
ef09606c This commit already exists

One simple way is to squash and sign these commits. To do this for the last 3 commits, use the following: (note you might want to make a backup first)

git reset --soft HEAD~3
git commit -s -m "Squashed and signed"

The result:

$ git log -2 --oneline
31658e11 (HEAD -> commit_test_branch) Squashed and signed
ef09606c This commit already exists

You can confirm that the commit is signed using git log -1 --show-signature as shown earlier.

Note that your commits will be squashed once they are merged to master anyway, so the loss of the commit history does not matter.

If you are updating an existing PR, you may need to force push using -f (as in git push X -f).

Github Actions/Build Infra

Github actions Configuration Overview

Overview of a Github Actions Configuration

Most workflows implement a matrix structure for handling different combinations of builds related to the following:

  1. Platform specific optimizations: On Windows/Linux/Mac we allow CPU builds with optional linking against mkldnn. Each combination is enumerated and run as part of a matrix build on github actions.

  2. CUDA, optional cuDNN: We also allow optional linking against cuDNN for GPU routines.

Input parameters:

  1. buildThreads: This is the number of build threads used for compilation in libnd4j. This is the equivalent of make -j. For specific platforms that use more memory, 1 is the recommended value. On self hosted setups, you may use more threads to make builds run faster.

  2. deployToReleaseStaging: 0 or 1. If 1, this will create a staging repository on oss sonatype. Otherwise, it will deploy to ossrh snapshots. Deploying to snapshots is the default.

  3. snapshotVersion: The current in development snapshot version

  4. releaseRepoId: If blank, then a new staging repository for a version is created. Otherwise, a staging repository id should be obtained from the ossrh sonatype nexus interface. This releaseRepoId should be passed to subsequent builds so all of the artifacts associated with a version get propagated to one place.

  5. serverId: This should be ossrh 90% of the time. A github profile is also available for use with github actions.

  6. mvnFlags: Extra maven flags to pass to the build, for example --pl !libnd4j to skip a libnd4j compile. This can speed builds up significantly.

  7. runsOn: This is the operating system upon which to run the build. For linux, this defaults to ubuntu-16.04. For windows, windows-2019. self-hosted can also be specified for faster builds.

Matrix builds

Many configurations on cpu and cuda require a matrix based build structure to capture the various combinations of optimization and software versions people may want to use. In order to accommodate these workflows, we need to attach variables proxying the values of the manual inputs to the individual matrix workers themselves. These parameters are analogous to the input parameters described above, so we will not repeat the descriptions here; the values are passed through in the form ${{ github.event.inputs.SOME_VALUE }}, where SOME_VALUE is one of the inputs above.

The configuration to look for is as follows:

          - mvn_ext: ${{ github.event.inputs.mvnFlags }}
            experimental: true
            name: Extra maven flags

          - debug_enabled: ${{ github.event.inputs.debug_enabled }}
            experimental: true
            name: Debug enabled

          - runs_on: ${{ github.event.inputs.runsOn }}
            experimental: true
            name: OS to run on

          - libnd4j_file_download: ${{ github.event.inputs.libnd4jDownload }}
            experimental: true
            name: OS to run on

          - deploy_to_release_staging: ${{ github.event.inputs.deployToReleaseStaging }}
            experimental: true
            name: Whether to deploy to release staging or not

          - release_version: ${{ github.event.inputs.releaseVersion }}
            experimental: true
            name: Release version

          - snapshot_version: ${{ github.event.inputs.snapshotVersion }}
            experimental: true
            name: Snapshot version

          - server_id: ${{ github.event.inputs.serverId }}
            experimental: true
            name: Server id

          - release_repo_id: ${{ github.event.inputs.releaseRepoId }}
            experimental: true
            name: The release repository to run on

          - mvn_flags: ${{ github.event.inputs.mvnFlags }}
            experimental: true
            name: Extra maven flags to use as part of the build

          - build_threads: ${{ github.event.inputs.buildThreads }}
            experimental: true
            name: The number of threads to build libnd4j with

Expected timings

  1. CUDA: Most cuda builds take 4-5 hours. Both windows and linux on GH actions just download the cuda distribution and compile things on their respective platforms.

  2. CPU builds: From scratch libnd4j + cpu builds typically take 1-2 hours max. If a build takes much longer than that, something may be wrong with it.

Build error causes

  1. Out of disk: It is very common for a github actions VM to run out of disk. If a build fails with no logs and all steps terminated, this may be one of the reasons.

  2. Out of memory: Sometimes builds run out of memory. A few common causes include:

    • Clang out of memory on Android: depending on the number of build threads assigned, it is easy for clang to run out of memory

    • Maven javadoc: The maven javadoc plugin for bigger projects can use a ton of ram and crash a job

  3. Network failures: Maven can sometimes (rarely) fail to download certain dependencies in the middle of a job

Environment variables:

  1. MAVEN_GPG_KEY: The maven gpg key secret for a release

  2. CROSS_COMPILER_DIR: For the pi_build.sh script in libnd4j. This contains the root directory for cross compiler invocation. We need this because all cross compilation for the various libnd4j builds happens on x86. We cross compile for speed, which also allows us to easily run on github actions.

  3. DEBIAN_FRONTEND: This ensures that all debian/apt commands run non-interactively and don't prompt for yes/no confirmation by default

  4. GITHUB_TOKEN: This is for authentication with github actions

  5. BUILD_USING_MAVEN: This is for pi_build.sh. This toggles (0 or 1) whether to use maven or to invoke buildnativeoperations.sh in the libnd4j root directory directly.

  6. NDK_VERSION: Default is r21d. Libnd4j's Android builds are currently compiled with NDK r21.

  7. CURRENT_TARGET: This variable is for pi_build.sh. It tells pi_build.sh which architecture to build for.

  8. PUBLISH_TO: The repo to publish to for releases or snapshots. Valid values are github or ossrh.

    These are repositories defined in the deeplearning4j root pom.

  9. OPENBLAS_PATH: We compile libnd4j against openblas for several different cpus. Openblas is manually downloaded and linked against.

    This specifies the path to the download for the libnd4j cmake invocation.

  10. MAVEN_USERNAME: The user name to login to for the ossrh maven repository

  11. MAVEN_PASSWORD: The password to login to for the ossrh maven repository

  12. MAVEN_GPG_PASSPHRASE: The gpg passphrase for signing artifacts for uploading to maven central

  13. DEPLOY_TO: Valid values are either ossrh or github.

  14. LIBND4J_BUILD_THREADS: This is the equivalent of make -j. It specifies the number of threads

    that should be used to compile libnd4j

  15. PERFORM_RELEASE: Whether to perform a release or not (0 or 1)

  16. RELEASE_VERSION: The version to be released to maven central. change-versions.sh will be run

    to change versions throughout the code base from the snapshot version to the intended release version.

  17. SNAPSHOT_VERSION: The current snapshot version to be changed when performing a release.

    After a release is conducted, this should generally be the next development version.

  18. RELEASE_REPO_ID: Leave this empty when first creating a release repository in combination with

    DEPLOY set to 1. Afterwards, note which staging repository id gets created in the ossrh interface when publishing

    to maven central. Use that id for further builds to ensure that all uploads for one version are synchronized to one staging repository.

  19. MODULES: Extra maven flags for pi_build.sh if more flags are needed (such as for debugging or only building specific modules)

  20. LIBND4J_URL: Used when building nd4j-native. If a user does not want to recompile libnd4j for their particular build, you can instead

    skip this step and specify a libnd4j zip file download (generally built with the maven assembly plugin)

Import in to your favorite IDE

Pre requisites

Ensure that you clone the deeplearning4j project locally.

git clone https://github.com/eclipse/deeplearning4j

Before importing the project, a few things of note no matter what IDE you use:

  1. One submodule (libnd4j) is a c++ project that uses maven to invoke a cmake build. You may wish to edit libnd4j separately in a cmake oriented IDE like VS Code, Clion, or Eclipse c/c++. In order to build a particular nd4j backend, libnd4j should already be compiled. By default, relevant nd4j backends all look for a pre compiled libnd4j in the libnd4j directory included within the same project.

Intellij

Once imported, please give the project time to download associated dependencies. You can verify the status of the project in the bottom right corner.

In order to enable the project to work, the following modifications need to be made.

Shaded modules

Eclipse Deeplearning4j has a set of shaded modules. Shaded modules are artifacts that re-namespace a dependency to a different package so it can be used as a set of private dependencies that do not clash with other libraries that may also depend on it.

Intellij does not handle this very well. In order to work around this, you need to exclude all projects under the nd4j/nd4j-shade folder individually. Right click on each folder. Go to Maven -> Ignore Projects.

Assuming you follow the other steps above (Lombok, libnd4j, ...) then you should be able to run any module you want.

Eclipse

When first finishing the import of the project, a number of maven connector errors may be highlighted. Just click "Resolve All Later" and finish. Let Eclipse finish downloading sources and javadoc.

As of the latest version of eclipse, build errors may occur.

Testing

How to conduct a release to Maven Central

Parameters for testing

  1. test.heap.size: The heap size used for maven surefire plugin sub processes

  2. test.offheap.size: The off heap size used for maven surefire sub processes. This is very important for

    configuration (especially on gpu systems)

Test resources

Test profiles for enabling nd4j backends

When running deeplearning4j's tests, there are 2 main profiles to be aware of: nd4j-tests-cpu and nd4j-tests-cuda. These each enable running cpu or gpu tests respectively across the whole code base. Please ensure one of these is selected when running tests.

testresources: Used to add the test resources used for nd4j.

Test categories

GPUs and multi threaded boxes

Note that when running gpu tests on a box with more than one GPU, tests can/will run out of memory if test.heap.size is not at least 4g.

Beginners

Road map for beginners new to deep learning.

How Do I Start Using Deep Learning?

Where you start depends on what you already know.

The prerequisites for really understanding deep learning are linear algebra, calculus and statistics, as well as programming and some machine learning. The prerequisites for applying it are just learning how to deploy a model.

In the case of Deeplearning4j, you should know Java well and be comfortable with tools like the IntelliJ IDE and the automated build tool Maven.

Below you'll find a list of resources. The sections are roughly organized in the order they will be useful.

Free Machine- and Deep-learning Courses Online

Math

The math involved with deep learning is basically linear algebra, calculus and probability, and if you have studied those at the undergraduate level, you will be able to understand most of the ideas and notation in deep-learning papers. If you haven't studied those in college, never fear. There are many free resources available (and some on this website).

Programming

If you do not know how to program yet, you can start with Java, but you might find other languages easier. Python and Ruby resources can convey the basic ideas in a faster feedback loop. "Learn Python the Hard Way" and "Learn to Program (Ruby)" are two great places to start.

Python

Java

Once you have programming basics down, tackle Java, the world's most widely used programming language. Most large organizations in the world operate on huge Java code bases. (There will always be Java jobs.) The big data stack -- Hadoop, Spark, Kafka, Lucene, Solr, Cassandra, Flink -- has largely been written for Java's compute environment, the JVM.

Deeplearning4j

Other Resources

Build From Source

Instructions to build all DL4J libraries from source.

Core steps:

  1. Building libnd4j for your specific platform

  2. Linking the nd4j backend you want to compile for against libnd4j via JavaCPP

  3. Compiling the rest of the code in to jar files

Key concepts

  1. Libnd4j is a CMake based c++ project that supports running optimized math code on different architectures. Its sole focus is being a tiny self contained library for running math kernels. It can link against optimized BLAS routines, platform specific CNN libraries such as OneDNN and CuDNN, and contains hundreds of math kernels for implementing neural networks and other math routines.

  2. Maven: Maven is the core build tool for deeplearning4j. Understanding maven is key to building deeplearning4j from source

  3. Maven and CMake: For compiling libnd4j, we invoke a buildnativeoperations.sh wrapper script via maven. buildnativeoperations.sh in turn automatically sets up CMake to then build the c++ project

  4. pi_build.sh: This is our build script for embedded and ARM based platforms. It focuses on cross compilation running on a Linux x86 based platform.

  5. buildnativeoperations.sh: The main build script for libnd4j. It initializes CMake and invokes CMake compilation for the user on whatever platform the user is currently on unless the user specifies an alternative platform. Specifying a different platform is possible for android for example.

Building for x86_64

The main considerations for building on x86_64 are:

  1. Whether to compile for avx2 or avx512

  2. Whether to use OpenBLAS or MKL

  3. Whether to link against OneDNN

Building for ARM

pi_build.sh mainly focuses on cross compilation.

In order to properly use the pi_build.sh script, a number of environment variables should be set. Per platform, you can find these environment variables in the final build step under the environment section.

If you would like to compile deeplearning4j on an actual ARM device, please use the normal buildnativeoperations.sh workflow.

Building for CUDA

In order to compile deeplearning4j for a particular CUDA version, you must first invoke change-cuda-versions.sh in the root directory:

./change-cuda-versions.sh $YOUR_CUDA_VERSION

Afterwards, add a dependency on the matching nd4j CUDA backend for that version, for example:
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
</dependency>

Note for windows users

Benchmark

General guidelines for benchmarking in DL4J and ND4J.

General Benchmarking Guidelines

Guideline 1: Run Warm-Up Iterations Before Benchmarking

A warm-up period is where you run a number of iterations (for example, a few hundred) of your benchmark without timing, before commencing timing for further iterations.

Why is a warm-up required? The first few iterations of any ND4J/DL4J execution may be slower than those that come later, for a number of reasons:

  1. In the initial benchmark iterations, the JVM has not yet had time to perform just-in-time compilation of code. Once JIT has completed, code is likely to execute faster for all subsequent operations

  2. ND4J and DL4J (and some other libraries) have some degree of lazy initialization: the first operation may trigger some one-off execution code.

  3. DL4J or ND4J (when using workspaces) can take some iterations to learn memory requirements for execution. During this learning phase, performance will be lower than after its completion.

Guideline 2: Run Multiple Iterations of All Benchmarks

Your benchmark isn't the only thing running on your computer (not to mention if you are using cloud hardware, that might have shared resources). And operation runtime is not perfectly deterministic.

For benchmark results to be reliable, it is important to run multiple iterations - and ideally report both mean and standard deviation for the runtime. Without this, it's impossible to compare the performance of operations, as performance differences may simply be due to random variation.
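
To illustrate Guidelines 1 and 2 together, here is a minimal, hypothetical micro-benchmark sketch (the array sizes, warm-up count and iteration count are arbitrary illustrative choices, not recommendations):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class MatmulBenchmark {
    public static void main(String[] args) {
        INDArray a = Nd4j.rand(1024, 1024);
        INDArray b = Nd4j.rand(1024, 1024);

        // Guideline 1: warm-up iterations (not timed) so that JIT compilation
        // and lazy initialization complete before measurement starts
        for (int i = 0; i < 200; i++) {
            a.mmul(b);
        }

        // Guideline 2: multiple timed iterations, reporting mean and standard deviation
        int n = 50;
        double[] timesMs = new double[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            a.mmul(b);
            timesMs[i] = (System.nanoTime() - start) / 1e6;
        }

        double mean = 0;
        for (double t : timesMs) mean += t;
        mean /= n;
        double var = 0;
        for (double t : timesMs) var += (t - mean) * (t - mean);
        double std = Math.sqrt(var / n);

        System.out.println("1024x1024 mmul: mean=" + mean + " ms, std=" + std + " ms over " + n + " runs");
    }
}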

Guideline 3: Pay Careful Attention to What You Are Benchmarking

This is especially important when comparing frameworks. Before you declare that "performance on operation X is Y" or "A is faster than B", make sure that:

You are bench-marking only the operations of interest.

If your goal is to check the performance of an operation, make sure that only this operation is being timed.

You should carefully check whether you are unintentionally including other things - for example, does it include: JVM initialization time? Library initialization time? Result array allocation time? Garbage collection time? Data loading time?

Ideally, these should be excluded from any timing/performance results you report. If they cannot be excluded, make sure you note this whenever making performance claims.

  1. What native libraries are you using?

    For example: what BLAS implementation (MKL, OpenBLAS, etc)? If you are using CUDA, are you using CuDNN? ND4J and DL4J can use these libraries (MKL, CuDNN) when they are available - but they are not always available by default. If they are not made available, performance can be lower - sometimes considerably.

    This is especially important when comparing results between libraries: for example, if you compared two libraries (one using OpenBLAS, another using MKL) your results may simply reflect the performance differences in the BLAS libraries being used - and not the performance of the libraries being tested. Similarly, one library with CuDNN and another without CuDNN may simply reflect the performance benefit of using CuDNN.

  2. How are things configured?

    For better or worse, DL4J and ND4J allow a lot of configuration. The default values for a lot of this configuration is adequate for most users - but sometimes manual configuration is required for optimal performance. This can be especially true in some benchmarks! Some of these configuration options allow users to trade off higher memory use for better performance, for example. Some configuration options of note: (a) Memory configuration (b) Workspaces and garbage collection (c) CuDNN (d) DL4J Cache Mode (enable using .cacheMode(CacheMode.DEVICE))

If you aren't sure if you are only measuring what you intend to measure when running DL4J or ND4J code, you can use a profiler such as VisualVM or YourKit Profilers.

  1. What versions are you using? When benchmarking, you should use the latest version of whatever libraries you are benchmarking. There's no point identifying and reporting a bottleneck that was fixed 6 months ago. An exception to this would be when you are comparing performance over time between versions. Note also that snapshot versions of DL4J and ND4J are also available - these may contain performance improvements (feel free to ask)

Guideline 4: Focus on Real-World Use Cases - And Run a Range of Sizes

Consider, for example, a benchmark that adds two numbers:

double x = 0;
//<start timing>
x += 1.0;
//<end timing>

And something equivalent in ND4J:

INDArray x = Nd4j.create(1);
//<start timing>
x.addi(1.0);
//<end timing>

Of course, the ND4J benchmark above is going to be much slower - method calls are required, input validation is performed, native code has to be called (with context switching overhead), and so on. One must ask the question, however: is this what users will actually be doing with ND4J or an equivalent linear algebra library? It's an extreme example - but the general point is a valid one.

Note also that performance on mathematical operations can be size - and shape - specific. For example, if you are benchmarking the performance on matrix multiplication - the matrix dimensions can matter a lot. In some internal benchmarks, we found that different BLAS implementations (MKL vs OpenBLAS) - and different backends (CPU vs GPU) - can perform very differently with different matrix dimensions. None of the BLAS implementations (OpenBLAS, MKL, CUDA) we have tested internally were uniformly faster than others for all input shapes and sizes.

Therefore - whenever you are running benchmarks, it's important to run those benchmarks with multiple different input shapes/sizes, to get the full performance picture.

Guideline 5: Understand Your Hardware

When comparing different hardware, it's important to be aware of what it excels at. For example, you might find that neural network training performs faster on a CPU with minibatch size 1 than on a GPU - yet larger minibatch sizes show exactly the opposite. Similarly, small layer sizes may not be able to adequately utilize the power of a GPU.

Furthermore, some deep learning distributions may need to be specifically compiled to provide support for hardware features such as AVX2 (note that recent versions of ND4J are packaged with binaries for CPUs that support these features). When running benchmarks, the utilization (or lack thereof) of these features can make a considerable difference to performance.

Guideline 6: Make It Reproducible

When running benchmarks, it's important to make your benchmarks reproducible. Why? Good or bad performance may only occur under certain limited circumstances.

And finally - remember that (a) ND4J and DL4J are in constant development, and (b) benchmarks do sometimes identify performance bottlenecks (after all, ND4J includes literally hundreds of distinct operations). If you identify a performance bottleneck, great - we want to know about it - so we can fix it. Any time a potential bottleneck is identified, we first need to reproduce it - so that we can study it, understand it and ultimately fix it.

Guideline 7: Understand the Limitations of Your Benchmarks

Linear algebra libraries contain hundreds of distinct operations. Neural network libraries contain dozens of layer types. When benchmarking, it's important to understand the limitations of those benchmarks. Benchmarking one type of operation or layer cannot tell you anything about the performance on other types of layers or operations - unless they share code that has been identified to be a performance bottleneck.

Guideline 8: If You Aren't Sure - Ask

And if you do happen to find a performance issue - let us know!

ND4J Specific Benchmarking

A Note on BLAS and Array Orders

BLAS - or Basic Linear Algebra Subprograms - refers to an interface and set of methods used for linear algebra operations. Some examples include 'gemm' - General Matrix Multiplication - and 'axpy', which implements Y = a*X + Y.

Note that ND4J will log the BLAS backend used when it initializes. For example:

14:17:34,169 INFO  ~ Loaded [CpuBackend] backend
14:17:34,672 INFO  ~ Number of threads used for NativeOps: 8
14:17:34,823 INFO  ~ Number of threads used for BLAS: 8
14:17:34,831 INFO  ~ Backend used: [CPU]; OS: [Windows 10]
14:17:34,831 INFO  ~ Cores: [16]; Memory: [7.1GB];
14:17:34,831 INFO  ~ Blas vendor: [OPENBLAS]

Performance can depend on the available BLAS library - in internal tests, we have found that OpenBLAS has been between 30% faster and 8x slower than MKL - depending on the array sizes and array orders.

Arrays in ND4J can be stored in either row-major ('c') or column-major ('f') order. For matrix multiplication, this means there are 8 possible combinations of array orders (c/f for each of input 1, input 2 and result arrays). Performance won't be the same for all cases.

Similarly, an operation such as element-wise addition (i.e., z=x+y) will be much faster for some combinations of input orders than others - notably, when x, y and z are all the same order. In short, this is due to memory striding: it's cheaper to read a sequence of memory addresses when those memory addresses are adjacent to each other in memory, as compared to being spread far apart.

Note that, by default, ND4J expects result arrays (for matrix multiplication) to be defined in column major ('f') order, to be consistent across backends, given that CuBLAS (i.e., NVIDIA's BLAS library for CUDA) requires results to be in f order. As a consequence, some ways of performing matrix multiplication with the result array being in c order will have lower performance than if the same operation was executed with an 'f' order array.

Finally, when it comes to CUDA: array orders/striding can matter even more than when running on CPU. For example, certain combinations of orders can be much faster than others - and input/output dimensions that are even multiples of 32 or 64 typically perform faster (sometimes considerably) than when input/output dimensions are not multiples of 32.
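
To make the order discussion concrete, here is a small hypothetical sketch showing 'c' and 'f' ordered arrays and an explicitly 'f' ordered result array for matrix multiplication (the shapes and variable names are illustrative only):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray x = Nd4j.rand(512, 512);                   // created in the default 'c' (row-major) order
INDArray y = x.dup('f');                            // same values, column-major ('f') layout

// Matrix multiplication writing into an explicitly 'f' ordered result array,
// matching ND4J's default expectation for mmul results
INDArray result = Nd4j.create(new int[]{512, 512}, 'f');
x.mmul(y, result);

// Element-wise addition is typically fastest when x, y and z all share the same order
INDArray z = x.add(x);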

DL4J Specific Benchmarking

Most of what has been said for ND4J also applies to DL4J.

In addition:

  1. If you are using the nd4j-native (CPU) backend, ensure you are using Intel MKL. This is faster than the default of OpenBLAS in most cases.

  2. Watch out for ETL bottlenecks. You can add PerformanceListener to your network training to see if ETL is a bottleneck (see the sketch after this list).

  3. Don't forget that performance is dependent on minibatch sizes. Don't benchmark with minibatch size 1 - use something more realistic.

  4. If you need multi-GPU training or inference support, use ParallelWrapper or ParallelInference.

  5. Don't forget that CuDNN is configurable: you can specify DL4J/CuDNN to prefer performance - at the expense of memory - using .cudnnAlgoMode(ConvolutionLayer.AlgoMode.PREFER_FASTEST) configuration on convolution layers

  6. When using GPUs, multiples of 8 (or 32) for input sizes and layer sizes may perform better.

  7. When using RNNs (and manually creating INDArrays), use 'f' ordered arrays for both features and (RnnOutputLayer) labels. Otherwise, use 'c' ordered arrays. This is for faster memory access.
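
As a brief sketch of point 2 above, assuming an already-built and initialized network called net (the reporting frequency of 10 iterations is an arbitrary choice):

import org.deeplearning4j.optimize.listeners.PerformanceListener;

// Reports ETL time, iteration time and samples/sec during fit();
// consistently high ETL times suggest the data pipeline is the bottleneck.
net.setListeners(new PerformanceListener(10, true));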

Common Benchmark Mistakes

Finally, here's a summary list of common benchmark mistakes:

  1. Not using the latest version of ND4J/DL4J (there's no point identifying a bottleneck that was fixed many releases back). Consider trying snapshots to get the latest performance improvements.

  2. Not paying attention to what native libraries (MKL, OpenBLAS, CuDNN etc) are being used

  3. Providing no warm-up period before benchmarking begins

  4. Running only a single (or too few) iterations, or not reporting mean, standard deviation and number of iterations

  5. Not configuring workspaces, garbage collection, etc

  6. Running only one possible case - for example, benchmarking a single set of array dimensions/orders when benchmarking BLAS operations

  7. Running unusually small inputs - for example, minibatch size 1 on a GPU (which might be slower - but isn't realistic!)

  8. Not measuring exactly - and only - what you claim to be measuring (for example, not accounting for array allocation, initialization or garbage collection time)

  9. Not making your benchmarks reproducible (does the benchmark conclusion generalize? are there problems with the benchmark? what can we do to fix it?)

  10. Comparing results across different hardware, not accounting for differences (for example, testing on one machine with AVX2 support, and on another without)

How to Run Deeplearning4j Benchmarks - A Guide

Total training time is always ETL plus computation. That is, both the data pipeline and the matrix manipulations determine how long a neural network takes to train on a dataset.

When programmers familiar with Python try to run benchmarks comparing Deeplearning4j to well-known Python frameworks, they usually end up comparing ETL + computation on DL4J to just computation on the Python framework. That is, they're comparing apples to oranges. We'll explain how to optimize several parameters below.

The JVM has knobs to tune, and if you know how to tune them, you can make it a very fast environment for deep learning. There are several things to keep in mind on the JVM. You need to:

  • Get garbage collection right

  • Make ETL asynchronous

  • Presave datasets (aka pickling)

Setting Heap Space

Users have to reconfigure their JVMs themselves, including setting the heap space. We can't give it to you preconfigured, but we can show you how to do it. Here are the two most important knobs for heap space.

  • Xms sets the minimum heap space

  • Xmx sets the maximum heap space

You can set these in IDEs like IntelliJ and Eclipse, as well as via the CLI like so:

    java -Xms256m -Xmx1024m YourClassNameHere

What’s the ideal amount to set Xmx to? That depends on how much RAM is on your computer. In general, allocate as much heap space as you think the JVM will need to get work done. Let’s say you’re on a 16G RAM laptop — allocate 8G of RAM to the JVM. A sound minimum on laptops with less RAM would be 3g, so

    java -Xmx3g

It may seem counterintuitive, but you want the min and max to be the same; i.e. Xms should equal Xmx. If they are unequal, the JVM will progressively allocate more memory as needed until it reaches the max, and that process of gradual allocation slows things down. You want to pre-allocate it at the beginning. So

    java -Xms3g -Xmx3g YourClassNameHere

Another way to do this is by setting your environmental variables. Here, you would alter your hidden .bash_profile file, which adds environmental variables to bash. To see those variables, enter env in the command line. To add more heap space, enter this command in your console:

    echo 'export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=512m"' >> ~/.bash_profile

We need to increase heap space because Deeplearning4j loads data in the background, which means we're taking more RAM in memory. By allowing more heap space for the JVM, we can cache more data in memory.

Garbage Collection

A garbage collector is a program which runs on the JVM and gets rid of objects no longer used by a Java application. It is automatic memory management. Creating a new object in Java takes on-heap memory: even an empty Java object takes at least 8 bytes (typically 16 on a 64-bit JVM, once headers and alignment are counted). So every new DataSetIterator you create adds its own on-heap footprint.

You may need to alter the garbage collection algorithm that Java is using. This can be done via the command line like so:

    java -XX:+UseG1GC

JavaCPP, created by a Skymind engineer, relies on the garbage collector to tell it what has been done. We rely on the Java GC to tell us what to collect; the Java GC points at things, and we know how to de-allocate them with JavaCPP. This applies equally to how we work with GPUs.

The larger the batch size you use, the more RAM you’re taking in memory.

ETL & Asynchronous ETL

In our dl4j-examples repo, we don't make the ETL asynchronous, because the point of examples is to keep them simple. But for real-world problems, you need asynchronous ETL, and we'll show you how to do it with examples.

Data is stored on disk and disk is slow. That’s the default. So you run into bottlenecks when loading data from your hard drive. When optimizing throughput, the slowest component is always the bottleneck. For example, a distributed Spark job using three GPU workers and one CPU worker will have a bottleneck with the CPU. The GPUs have to wait for that CPU to finish.

The Deeplearning4j class DataSetIterator hides the complexity of loading data from disk. The code for using any DataSetIterator looks the same regardless of implementation, but the implementations work differently:

  • one loads from disk

  • one loads asynchronously

  • one loads pre-saved from RAM

Here's how the DatasetIterator is uniformly invoked for MNIST:

        while(mnistTest.hasNext()){
                DataSet ds = mnistTest.next();
                INDArray output = model.output(ds.getFeatures(), false);
                eval.eval(ds.getLabels(), output);
        }

You can optimize by using an asynchronous loader in the background. Java can do real multi-threading. It can load data in the background while other threads take care of compute. So you load data into the GPU at the same time that compute is being run. The neural net trains even as you grab new data from memory.

    MultiDataSetIterator iterator;
    if (prefetchSize > 0 && source.asyncSupported()) {
        iterator = new AsyncMultiDataSetIterator(source, prefetchSize);
    } else iterator = source;

Notice in the code above that prefetchSize is another parameter to set. Normal batch size might be 1000 examples, but if you set prefetchSize to 3, it would pre-fetch 3,000 instances.

ETL: Comparing Python frameworks With Deeplearning4j

Java has robust tools for moving big data and, when compared correctly, is much faster than Python. The Deeplearning4j community has reported up to 3700% increases in speed over Python frameworks, when ETL and computation are optimized.

We try to be more flexible. That means you can point DL4J at raw photos, and it will load the image, run the transforms and put it into an NDArray to generate a dataset on the fly.

But if your training pipeline is doing that every time, Deeplearning4j will seem about 10x slower than other frameworks, because you’re spending your time creating datasets. Every time you call fit, you're recreating a dataset, over and over again. We allow it to happen for ease of use, but we can show you how to speed things up. There are ways to make it just as fast.

One way is to pre-save the datasets, in a manner similar to the Python frameworks. (Pickles are pre-formatted data.) When you pre-save the dataset, you create a separate class.

A RecordReaderDataSetIterator talks to DataVec and outputs DataSets for DL4J.

Line 90 is where you see the asynchronous ETL. In this case, it's wrapping the pre-saved iterator, so you're taking advantage of both methods, with the async iterator loading the pre-saved data in the background as the net trains.

MKL and Inference on CPUs

If you are running inference benchmarks on CPUs, make sure you are using Deeplearning4j with Intel's MKL library, which is available via a clickwrap license; unlike Anaconda, which is used by libraries like PyTorch, Deeplearning4j does not bundle MKL.

Memory

Setting available Memory/RAM for a DL4J application

Memory Management for ND4J/DL4J: How does it work?

ND4J uses off-heap memory to store NDArrays, to provide better performance while working with NDArrays from native code such as BLAS and CUDA libraries.

"Off-heap" means that the memory is allocated outside of the JVM (Java Virtual Machine) and hence isn't managed by the JVM's garbage collection (GC). On the Java/JVM side, we only hold pointers to the off-heap memory, which can be passed to the underlying C++ code via JNI for use in ND4J operations.

To manage memory allocations, we use two approaches:

  • JVM Garbage Collector (GC) and WeakReference tracking

  • MemoryWorkspaces

Despite the differences between these two approaches, the idea is the same: once an NDArray is no longer required on the Java side, the off-heap associated with it should be released so that it can be reused later. The difference between the GC and MemoryWorkspaces approaches is in when and how the memory is released.

  • For JVM/GC memory: whenever an INDArray is collected by the garbage collector, its off-heap memory will be deallocated, assuming it is not used elsewhere.

  • For MemoryWorkspaces: whenever an INDArray leaves the workspace scope - for example, when a layer has finished its forward pass/predictions - its memory may be reused without deallocation and reallocation. This results in better performance for cyclical workloads like neural network training and inference (see the sketch below).
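
A minimal sketch of the workspace scoping idea (the workspace id "MY_WORKSPACE" is arbitrary; when training or running inference with DL4J networks, workspaces are managed for you):

import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace("MY_WORKSPACE")) {
    // Arrays created here live in the workspace; when the scope closes,
    // their memory can be reused on the next pass rather than deallocated.
    INDArray activations = Nd4j.rand(128, 128);
}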

Configuring Memory Limits

With DL4J/ND4J, there are two types of memory limits to be aware of and configure: The on-heap JVM memory limit, and the off-heap memory limit, where NDArrays live. Both limits are controlled via Java command-line arguments:

  • -Xms - this defines how much memory JVM heap will use at application start.

  • -Xmx - this allows you to specify JVM heap memory limit (maximum, at any point). Only allocated up to this amount (at the discretion of the JVM) if required.

  • -Dorg.bytedeco.javacpp.maxbytes - this allows you to specify the off-heap memory limit. This can also be a percentage, in which case it would apply to maxMemory.

  • -Dorg.bytedeco.javacpp.maxphysicalbytes - this specifies the maximum bytes for the entire process - usually set to maxbytes plus Xmx plus a bit extra, in case other libraries require some off-heap memory also. Unlike maxbytes, setting maxphysicalbytes is optional. This can also be a percentage (>100%), in which case it would apply to maxMemory.

Example: Configuring 1GB initial on-heap, 2GB max on-heap, 8GB off-heap, 10GB maximum for process (the class name is a placeholder):

java -Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G YourClassNameHere

Gotchas: A few things to watch out for

  • With GPU systems, the maxbytes and maxphysicalbytes settings currently also effectively define the memory limit for the GPU, since the off-heap memory is mapped (via NDArrays) to the GPU - read more about this in the GPU-section below.

  • For many applications, you want less RAM to be used in JVM heap, and more RAM to be used in off-heap, since all NDArrays are stored there. If you allocate too much to the JVM heap, there will not be enough memory left for the off-heap memory.

  • If you get a "RuntimeException: Can't allocate [HOST] memory: xxx; threadId: yyy", you have run out of off-heap memory. You should most often use a WorkspaceConfiguration to handle your NDArrays allocation, in particular in e.g. training or evaluation/inference loops - if you do not, the NDArrays and their off-heap (and GPU) resources are reclaimed using the JVM GC, which might introduce severe latency and possible out of memory situations.

  • If you don't specify JVM heap limit, it will use 1/4 of your total system RAM as the limit, by default.

  • If you don't specify off-heap memory limit, the JVM heap limit (Xmx) will be used by default. i.e. -Xmx8G will mean that 8GB can be used by JVM heap, and an additional 8GB can be used by ND4j in off-heap.

  • In limited memory environments, it's usually a bad idea to use high -Xmx value together with -Xms option. That is because doing so won't leave enough off-heap memory. Consider a 16GB system in which you set -Xms14G: 14GB of 16GB would be allocated to the JVM, leaving only 2GB for the off-heap memory, the OS and all other programs.

Memory-mapped files

ND4J supports the use of a memory-mapped file instead of RAM when using the nd4j-native backend. On one hand, it's slower than RAM, but on the other hand, it allows you to allocate memory chunks in a manner impossible otherwise.

Here's sample code:
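
A minimal sketch, assuming the WorkspaceConfiguration and LocationPolicy APIs from nd4j (exact builder options may differ between versions):

import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.LocationPolicy;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
        .initialSize(1000000000)               // ~1GB backing file
        .policyLocation(LocationPolicy.MMAP)   // back this workspace with a memory-mapped file instead of RAM
        .build();

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
    INDArray x = Nd4j.create(10000);           // allocated inside the memory-mapped workspace
}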

In this case, a 1GB temporary file will be created and mmap'ed, and NDArray x will be created in that space. Obviously, this option is mostly viable for cases when you need NDArrays that can't fit into your RAM.

GPUs

When using GPUs, oftentimes your CPU RAM will be greater than GPU RAM. When GPU RAM is less than CPU RAM, you need to monitor how much RAM is being used off-heap. You can check this based on the JavaCPP options specified above.

We allocate memory on the GPU equivalent to the amount of off-heap memory you specify. We don't use any more of your GPU than that. You can also specify an off-heap limit that is larger than your GPU's memory (that's not encouraged, but it's possible). If you do so, your GPU will run out of RAM when trying to run jobs.

We also allocate off-heap memory on the CPU RAM as well. This is for efficient communication between CPU and GPU, and so the CPU can access data from an NDArray without having to fetch it from the GPU each time you call for it.

If JavaCPP or your GPU throw an out-of-memory error (OOM), or even if your compute slows down due to GPU memory being limited, then you may want to either decrease batch size or increase the amount of off-heap memory that JavaCPP is allowed to allocate, if that's possible.

Try to run with an off-heap memory equal to your GPU's RAM. Also, always remember to set up a small JVM heap space using the Xmx option.

Note that if your GPU has < 2g of RAM, it's probably not usable for deep learning. You should consider using your CPU if this is the case. Typical deep-learning workloads should have 4GB of RAM at minimum. Even that is small. 8GB of RAM on a GPU is recommended for deep learning workloads.

It is possible to use HOST-only memory with a CUDA backend. That can be done using workspaces.

Example:
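
A minimal sketch, assuming the workspace configuration enums from nd4j (option names may differ slightly between versions):

import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
import org.nd4j.linalg.api.memory.enums.LearningPolicy;
import org.nd4j.linalg.api.memory.enums.MirroringPolicy;
import org.nd4j.linalg.api.memory.enums.SpillPolicy;

WorkspaceConfiguration hostOnlyConfig = WorkspaceConfiguration.builder()
        .policyAllocation(AllocationPolicy.STRICT)
        .policyLearning(LearningPolicy.FIRST_LOOP)
        .policyMirroring(MirroringPolicy.HOST_ONLY)   // keep arrays in this workspace in HOST (CPU) memory only
        .policySpill(SpillPolicy.EXTERNAL)
        .build();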

It's not recommended to use HOST-only arrays directly, since they will dramatically reduce performance. But they might be useful as an in-memory cache, in combination with the INDArray.unsafeDuplication() method.

The core workflow

An overview of the core deeplearning4j workflow

Introduction

An end to end workflow involves the following:

  1. Preparing your data

  2. Normalization

  3. Building a model

  4. Tuning a model

  5. Preparing for deployment

This page will try to cover considerations for each step of the workflow and link to additional resources for handling steps that may be specific to particular use cases.

Preparing your data

Data always needs to be preprocessed. This means converting data from a raw source of different data types to ndarrays to be processed by a neural network. In the deeplearning4j suite there can be a few ways to do this:

  1. The datavec module: Using a record reader abstraction, data can be read in batches via a data set iterator to train models

  2. Pre-process using embedded Python code in Python4J: using the Python ecosystem (such as pandas and OpenCV), you can embed Python scripts and output numpy arrays for training

We recommend the following for the various data types:

Once you have figured out how you will convert your data, you will need to figure out how to split it up into training and validation sets. DL4J allows you to do this in a few ways.

If all of your data is in memory, you can use the DataSet API's splitTestAndTrain method, as shown in the sketch below.
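
A minimal sketch, assuming a DataSet named allData is already loaded in memory (the 0.8 split fraction is just an example):

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;

allData.shuffle();                                          // shuffle examples before splitting
SplitTestAndTrain split = allData.splitTestAndTrain(0.8);   // 80% training, 20% held out
DataSet trainingData = split.getTrain();
DataSet testData = split.getTest();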

Normalization

Once your input data has been created and converted to ndarrays, you still need to decide how to normalize your data. DL4J has a set of normalizers that cover standard preprocessing, such as NormalizerStandardize (zero mean, unit variance) and NormalizerMinMaxScaler (rescaling to a fixed range); a minimal usage sketch follows.
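
A minimal sketch, assuming the trainingData and testData DataSets from the split above:

import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainingData);         // collect mean/std statistics from the training data only
normalizer.transform(trainingData);   // normalize in place
normalizer.transform(testData);       // apply the SAME training statistics to the test data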

Building a model

Once you have figured out how you will serialize your data as ndarrays you need to figure out how you will want to build your model.

When building a model, you can choose one of the following:

  1. Import a model from another framework such as TensorFlow, Keras or PyTorch.

If you are going to import a model, there are a few things to be aware of.

  1. TensorFlow import: This uses SameDiff. SameDiff has two forms of TensorFlow import; the newer one, which uses a more extensible model import framework, is the recommended path forward.

  2. PyTorch: Right now, PyTorch models must be imported via ONNX. Please use PyTorch's ONNX model export to import a PyTorch model into deeplearning4j.

For more advanced models, it is suggested that the user pick the samediff framework. Going forward, that will be the preferred way to train and run models.

When saving a model, note that the higher level DL4J interface and SameDiff use different file formats. Also note that the normalizers described above are not stored with the model automatically - it is advised to save both, as in the sketch below.
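
A minimal sketch for the DL4J (MultiLayerNetwork/ComputationGraph) case, assuming a trained network net and the normalizer from above (the file name is a placeholder):

import java.io.File;
import org.deeplearning4j.util.ModelSerializer;

File modelFile = new File("my-model.zip");
ModelSerializer.writeModel(net, modelFile, true);             // true = also save the updater state, needed to resume training
ModelSerializer.addNormalizerToModel(modelFile, normalizer);  // stores the normalizer alongside the model in the same zip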

Tuning a model

Deploying a model

When deploying a machine learning model, the first consideration is to figure out what you are deploying. Generally a model deployment contains:

  1. A normalizer file which is loaded and used during inference

  2. A model file (either a dl4j zip file or a samediff flatbuffers file)

  3. Data pipeline code that converts raw data from production to an appropriate format (usually ndarrays) for consumption by the neural network.

These 3 aspects of a deployment should all be treated as software assets just like code and be versioned. Optionally, a user may want to consider how to implement versioned deployments. There are a number of tools that can handle this.

Another consideration is performance. Depending on the nd4j backend you pick and the cpus you are deploying on, you may be able to add specialized performance increases such as:

  1. Compatibility: if you need to run on a very old Linux distribution, we also provide a CentOS 6 compatible compat classifier.

If you are going to just be deploying a model embedded in your application, then please remember the above artifacts for a model deployment when including resources for your micro service.

Performance Issues

How to Debug Performance Issues

This page is a how-to guide for debugging performance issues encountered when training neural networks with Deeplearning4j. Much of the information also applies to debugging performance issues encountered when using ND4J.

Deeplearning4j and ND4J provide excellent performance in most cases (utilizing optimized c++ code for all numerical operations as well as high performance libraries such as NVIDIA cuDNN and Intel MKL). However, sometimes bottlenecks or misconfiguration issues may limit performance to well below the maximum. This page is intended to be a guide to help users identify the cause of poor performance, and provide steps to fix these issues.

Performance issues may include:

  1. Poor CPU/GPU utilization

  2. Slower than expected training or operation execution

To start, here’s a summary of some possible causes of performance issues:

  1. Wrong ND4J backend is used (for example, CPU backend when GPU backend is expected)

  2. Not using cuDNN when using CUDA GPUs

  3. ETL (data loading) bottlenecks

  4. Garbage collection overheads

  5. Small batch sizes

  6. Multi-threaded use of MultiLayerNetwork/ComputationGraph for inference (not thread safe)

  7. Double precision floating point data type used when single precision should be used

  8. Not using workspaces for memory management (enabled by default)

  9. Poorly configured network

  10. Layer or operation is CPU-only

  11. CPU: Lack of hardware support for modern AVX etc extensions

  12. Other processes using CPU or GPU resources

  13. CPU: Lack of configuration of OMP_NUM_THREADS when using many models/threads simultaneously

Step 1: Check if correct backend is used

ND4J (and by extension, Deeplearning4j) can perform computation on either the CPU or GPU. The device used for computation is determined by your project dependencies - you include nd4j-native-platform to use CPUs for computation or nd4j-cuda-x.x-platform to use GPUs for computation (where x.x is your CUDA version - such as 9.2, 10.0 etc).

It is straightforward to check which backend is used. ND4J will log the backend upon initialization.

For CPU execution, you will expect output that looks something like:

For CUDA execution, you would expect the output to look something like:

Pay attention to the Loaded [X] backend and Backend used: [X] messages to confirm that the correct backend is used. If the incorrect backend is being used, check your program dependencies to ensure the correct backend has been included.

Step 2: Check for cuDNN

If you are using CPUs only (nd4j-native backend) then you can skip to step 3 as cuDNN only applies when using NVIDIA GPUs (nd4j-cuda-x.x-platform dependency).

cuDNN is NVIDIA’s library for accelerating neural network training on NVIDIA GPUs. Deeplearning4j can make use of cuDNN to accelerate a number of layers - including ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM. When training on GPUs, cuDNN should always be used if possible as it is usually much faster than the built-in layer implementations.

How to determine if cuDNN is used or not

Not all DL4J layer types are supported in cuDNN. DL4J layers with cuDNN support include ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM.

To check if cuDNN is being used, the simplest approach is to look at the log output when running inference or training: If cuDNN is NOT available when you are using a layer that supports it, you will see a message such as:

If cuDNN is available and was loaded successfully, no message will be logged.

Alternatively, you can confirm that cuDNN is used by using the following code:
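A minimal sketch, assuming net is your MultiLayerNetwork and layer 0 is a layer type with cuDNN support:

```java
import org.deeplearning4j.nn.api.layers.LayerHelper;

// Run at least one forward pass or fit call first, so the layer helper is initialized
LayerHelper h = net.getLayer(0).getHelper();   // index 0: assumes layer 0 supports cuDNN (e.g. a ConvolutionLayer)
System.out.println("Layer helper: " + (h == null ? null : h.getClass().getName()));
```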

Note that you will need to do at least one forward pass or fit call to initialize the cuDNN layer helper.

If cuDNN is available and was loaded successfully, you will see the following printed:

whereas if cuDNN is not available or could not be loaded successfully (you will get a warning or error logged also):

Step 3: Check for ETL (Data Loading) Bottlenecks

Neural network training requires data to be in memory before training can proceed. If the data is not loaded fast enough, the network will have to wait until data is available. DL4J uses asynchronous prefetch of data to improve performance by default. Under normal circumstances, this asynchronous prefetching means the network should never be waiting around for data (except on the very first iteration) - the next minibatch is loaded in another thread while training is proceeding in the main thread.

However, when data loading takes longer than the iteration time, data can be a bottleneck. For example, if a network takes 100ms to perform fitting on a single minibatch, but data loading takes 200ms, then we have a bottleneck: the network will have to wait 100ms per iteration (200ms loading - 100ms loading in parallel with training) before continuing the next iteration. Conversely, if the network fit operation takes 100ms and data loading takes 50ms, then no data loading bottleneck will occur, as the 50ms loading time can be completed asynchronously within one iteration.

How to check for ETL / data loading bottlenecks

The way to identify ETL bottlenecks is simple: add PerformanceListener to your network, and train as normal. For example:
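A minimal sketch, assuming net is your MultiLayerNetwork or ComputationGraph (a frequency of 1 means statistics are reported every iteration):

```java
import org.deeplearning4j.optimize.listeners.PerformanceListener;

// Report iteration time, ETL time, samples/sec and batches/sec every iteration
net.setListeners(new PerformanceListener(1, true));
```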

When training, you will see output such as:

The above output shows that there is no ETL bottleneck (i.e., ETL: 0 ms). However, if ETL time is greater than 0 consistently (after the first iteration), an ETL bottleneck is present.

How to identify the cause of an ETL bottleneck

There are a number of possible causes of ETL bottlenecks. These include (but are not limited to):

  • Slow hard drives

  • Network latency or throughput issues (when reading from remote or network storage)

  • Computationally intensive or inefficient ETL (especially for custom ETL pipelines)

Step 4: Check for Garbage Collection Overhead

Even though DL4J/ND4J array memory is off-heap, garbage collection can still cause performance issues.

In summary:

  • Garbage collection will sometimes (temporarily and briefly) pause/stop application execution (“stop the world”)

  • These GC pauses slow down program execution

  • The overall performance impact of GC pauses depends on both the frequency of GC pauses, and the duration of GC pauses

  • The frequency is controllable (in part) by ND4J, using Nd4j.getMemoryManager().setAutoGcWindow(10000); and Nd4j.getMemoryManager().togglePeriodicGc(false);

  • Not every GC event is caused by or controlled by the above ND4J configuration.

In our experience, garbage collection time depends strongly on the number of objects in the JVM heap memory. As a rough guide:

  • Less than 100,000 objects in heap memory: short GC events (usually not a performance problem)

  • 100,000-500,000 objects: GC overhead becomes noticeable, often in the 50-250ms range per full GC event

  • 500,000 or more objects: GC can be a bottleneck if performed frequently. Performance may still be good if GC events are infrequent (for example, every 10 seconds or less).

  • 10 million or more objects: GC is a major bottleneck even if called infrequently, with each full GC taking multiple seconds

How to configure ND4J garbage collection settings

In simple terms, there are two settings of note: whether ND4J's periodic garbage collection is enabled (togglePeriodicGc), and how frequently it may run (setAutoGcWindow).
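A sketch of both settings, using the calls mentioned above:

```java
import org.nd4j.linalg.factory.Nd4j;

// 1. Enable or disable ND4J's periodic System.gc() calls entirely
Nd4j.getMemoryManager().togglePeriodicGc(false);

// 2. If periodic GC is enabled, control how often it may be triggered (here: at most every 10 seconds)
Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(10000);    // milliseconds
```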

How to determine GC impact using PerformanceListener

NOTE: this feature was added after 1.0.0-beta3 and will be available in future releases. To determine the impact of garbage collection using PerformanceListener, you can use the following:
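A sketch, assuming net is your network and that your version includes the GC-reporting constructor:

```java
import org.deeplearning4j.optimize.listeners.PerformanceListener;

int listenerFrequency = 1;      // report every iteration
boolean reportScore = true;
boolean reportGC = true;        // include garbage collection statistics in the output
net.setListeners(new PerformanceListener(listenerFrequency, reportScore, reportGC));
```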

This will report GC activity:

The garbage collection activity is reported for all available garbage collectors - the GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)] output means that garbage collection was performed 2 times since the last PerformanceListener reporting, and took 1ms and 24ms total for the two GC algorithms, respectively.

Keep in mind: PerformanceListener reports GC events every N iterations (as configured by the user). Thus, if PerformanceListener is configured to report statistics every 10 iterations, the garbage collection stats would be for the period of time corresponding to the last 10 iterations.

How to determine GC impact using -verbose:gc

To enable GC logging, add the -verbose:gc, -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps command-line options when launching the JVM. When these options are enabled, you will have information reported on each GC event, such as:

This information can be used to determine the frequency, cause (System.gc() calls, allocation failure, etc) and duration of GC events.

How to determine GC impact using a profiler

An alternative approach is to use a profiler to collect garbage collection information.

How to determine number (and type) of JVM heap objects using memory dumps

If you determine that garbage collection is a problem, and suspect that this is due to the number of objects in memory, you can perform a heap dump.

To perform a heap dump:

  • Step 1: Run your program

  • Step 2: While running, determine the process ID

    • One approach is to use jps:

      • For basic details, run jps on the command line. If jps is not on the system PATH, it can be found (on Windows) at C:\Program Files\Java\jdk<VERSION>\bin\jps.exe

      • For more details on each process, run jps -lv instead

    • Alternatively, you can use the top command on Linux or Task Manager (Windows) to find the PID (on Windows, the PID column may not be enabled by default)

  • Step 3: Create a heap dump using jmap -dump:format=b,file=file_name.hprof 123 where 123 is the process id (PID) to create the heap dump for

After a memory dump has been collected, it can be opened in tools such as YourKit profiler and VisualVM to determine the number, type and size of objects. With this information, you should be able to pinpoint the cause of the large number of objects and make changes to your code to reduce or eliminate the objects that are causing the garbage collection overhead.

Step 5: Check Minibatch Size

Another common cause of performance issues is a poorly chosen minibatch size. A minibatch is a number of examples used together for one step of inference and training. Minibatch sizes of 32 to 128 are commonly used, though smaller or larger are sometimes used.

In summary:

  • If minibatch size is too small (for example, training or inference with 1 example at a time), poor hardware utilization and lower overall throughput is expected

  • If minibatch size is too large

    • Hardware utilization will usually be good

    • Iteration times will slow down

    • Memory utilization may be too high (leading to out-of-memory errors)

For inference, avoid using minibatch size of 1, as throughput will suffer. Unless there are strict latency requirements, you should use larger minibatch sizes as this will give you the best hardware utilization and hence throughput, and is especially important for GPUs.

For training, you should never use a minibatch size of 1 as overall performance and hardware utilization will be reduced. Network convergence may also suffer. Start with a minibatch size of 32-128, if memory will allow this to be used.

Step 6: Ensure you are not using a single MultiLayerNetwork/ComputationGraph for inference from multiple threads

MultiLayerNetwork and ComputationGraph are not considered thread-safe, and should not be used from multiple threads. That said, most operations such as fit, output, etc. use synchronized blocks. While these synchronized methods avoid hard-to-understand exceptions (race conditions due to concurrent use), they will limit throughput to a single thread (though note that native operation parallelism will still apply as normal). In summary, using a single network from multiple threads should be avoided, as it is not thread safe and can be a performance bottleneck.

Step 7: Check Data Types

In 1.0.0-beta3 and earlier, ND4J has a global datatype setting that determines the datatype of all arrays. The default value is 32-bit floating point. The data type can be set using Nd4j.setDataType(DataBuffer.Type.FLOAT); for example.

Performance on CPUs can also be reduced for double precision due to the additional memory bandwidth requirements compared to float precision.

You can check the data type setting using:
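For example:

```java
import org.nd4j.linalg.factory.Nd4j;

System.out.println("ND4J data type: " + Nd4j.dataType());   // FLOAT is expected in most cases
```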

Step 8: Check workspace configuration for memory management (enabled by default)

In summary, workspaces are enabled by default for all Deeplearning4j networks, and enabling them improves performance and reduces memory requirements. There are very few reasons to disable workspaces.

You can check that workspaces are enabled for your MultiLayerNetwork using:
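For example (assuming net is your MultiLayerNetwork):

```java
System.out.println("Training workspace config:  " + net.getLayerWiseConfigurations().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + net.getLayerWiseConfigurations().getInferenceWorkspaceMode());
```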

or for a ComputationGraph using:
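For example (assuming cg is your ComputationGraph):

```java
System.out.println("Training workspace config:  " + cg.getConfiguration().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + cg.getConfiguration().getInferenceWorkspaceMode());
```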

You want to see ENABLED for both training and inference. To change the workspace configuration, use the setter methods, for example: net.getLayerWiseConfigurations().setTrainingWorkspaceMode(WorkspaceMode.ENABLED);

Step 9: Check for a badly configured network or network with layer bottlenecks

Another possible cause (especially for newer users) is a poorly designed network. A network may be poorly designed if:

  • It has too many layers. A rough guideline:

    • More than about 100 layers for a CNN may be too many

    • More than about 10 layers for a RNN/LSTM network may be too many

    • More than about 20 feed-forward layers may be too many for a MLP

  • The input/activations are too large

    • For CNNs, inputs in the range of 224x224 (for image classification) to 600x600 (for object detection and segmentation) are used. Large image sizes (such as 500x500) are computationally demanding, and much larger than this should be considered too large in most cases.

  • The output number of classes is too large

    • Classification with more than about 10,000 classes can become a performance bottleneck with standard softmax output layers

  • The layers are too large

    • For CNNs, most layers have kernel sizes in the range 2x2 to 7x7, with channels equal to 32 to 1024 (with larger number of channels appearing later in the network). Much larger than this may cause a performance bottleneck.

    • For MLPs, most layers have at most 2048 units/neurons (often much smaller). Much larger than this may be too large.

    • For RNNs such as LSTMs, layers are typically in the range of 128 to 512, though the largest RNNs may use around 1024 units per layer.

  • The network has too many parameters

    • This is usually a consequence of the other issues already mentioned - too many layers, too large input, too many output classes

    • For comparison, less than 1 million parameters would be considered small, and more than about 100 million parameters would be considered very large.

    • You can check the number of parameters using MultiLayerNetwork/ComputationGraph.numParams() or MultiLayerNetwork/ComputationGraph.summary()
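For example:

```java
System.out.println("Number of parameters: " + net.numParams());
System.out.println(net.summary());
```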

Note that these are guidelines only, and some reasonable networks may exceed the numbers specified here. Some networks can become very large, such as those commonly used for ImageNet classification or object detection. However, in these cases, the network is usually carefully designed to provide a good tradeoff between accuracy and computation time.

If your network architecture is significantly outside of the guidelines specified here, you may want to reconsider the design to improve performance.

Step 10: Check for CPU-only ops (when using GPUs)

If you are using CPUs only (nd4j-native backend), you can skip this step, as it only applies when using the GPU (nd4j-cuda) backend.

As of 1.0.0-beta3, a handful of recently added operations do not yet have GPU implementations. Thus, when these layers are used in a network, they will execute on CPU only, irrespective of the nd4j-backend used. GPU support for these layers will be added in an upcoming release.

The layers without GPU support as of 1.0.0-beta3 include:

  • Convolution3D

  • Upsampling1D/2D/3D

  • Deconvolution2D

  • LocallyConnected1D/2D

  • SpaceToBatch

  • SpaceToDepth

Unfortunately, there is no workaround or fix for now, until these operations have GPU implementations completed.

Step 11: Check CPU support for hardware extensions (AVX etc)

If you are running on a GPU, this section does not apply.

When running on older CPUs or those that lack modern AVX extensions such as AVX2 and AVX512, performance will be reduced compared to running on CPUs with these features. Though there is not much you can do about the lack of such features, it is worth knowing about if you are comparing performance between different CPU models.

In summary, CPU models with AVX2 support will perform better than those without it; similarly, AVX512 is an improvement over AVX2.

Step 12: Check other processes using CPU or GPU resources

Another obvious cause of performance issues is other processes using CPU or GPU resources.

For CPU, it is straightforward to see if other processes are using resources using tools such as top (for Linux) or Task Manager (for Windows).

For NVIDIA CUDA GPUs, nvidia-smi can be used. nvidia-smi is usually installed with the NVIDIA display drivers, and (when run) shows the overall GPU and memory utilization, as well as the GPU utilization of programs running on the system.

On Linux, this is usually on the system path by default. On Windows, it may be found at C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi

Step 13: Check OMP_NUM_THREADS performing concurrent inference using CPU in multiple threads simultaneously

If you are using GPUs (nd4j-cuda backend), you can skip this section.

One issue to be aware of when running multiple DL4J networks (or ND4J operations generally) concurrently in multiple threads is the OpenMP number of threads setting. In summary, ND4J uses OpenMP parallelism at the C++ level to increase operation performance. By default, ND4J will use a value equal to the number of physical CPU cores (not logical cores), as this gives optimal performance for a single network running in a single thread. When running many models or threads concurrently, consider setting the OMP_NUM_THREADS environment variable to a smaller value (for example, the number of physical cores divided by the number of concurrent threads), so that the threads do not oversubscribe the CPU.

This also applies if the CPU resources are shared with other computationally demanding processes.

Debugging Performance Issues with JVM Profiling

Profiling is a process whereby you can trace how long each method in your code takes to execute, to identify and debug performance bottlenecks.

A full guide to profiling is beyond the scope of this page, but the summary is that you can trace how long each method takes to execute (and where it is being called from) using a profiling tool. This information can then be used to identify bottlenecks (and their causes) in your program.

How to Perform Profiling

The YourKit profiling documentation is quite good. To perform profiling with YourKit:

  • Install and start YourKit Profiler

  • Collect a snapshot and analyze

Profiling on Spark

When debugging performance issues for Spark training or inference jobs, it can often be useful to perform profiling here also.

One approach that we have used internally is to combine manual profiling settings (-agentpath JVM argument) with spark-submit arguments for YourKit profiler.

To perform profiling in this manner, 5 steps are required:

  1. Download YourKit profiler to a location on each worker (must be the same location on each worker) and (optionally) the driver

  2. [Optional] Copy the profiling configuration onto each worker (must be the same location on each worker)

  3. Create a local output directory for storing the profiling result files on each worker

  4. Launch the Spark job with the appropriate configuration (see example below)

  5. The snapshots will be saved when the Spark job completes (or is cancelled) to the specified directories.

For example, to perform tracing on both the driver and the workers,

The configuration (tracing_settings_path) is optional. A sample tracing settings file is provided below:

Backends

Hardware setup for Eclipse Deeplearning4j, including GPUs and CUDA.

ND4J works atop so-called backends, or linear-algebra libraries, such as nd4j-native (CPUs) and nd4j-cuda-10.2 (GPUs), which you can select by pasting the right dependency into your project’s POM.xml file.

ND4J backends for GPUs and CPUs

You can choose GPUs or native CPUs for your backend linear algebra operations by changing the dependencies in ND4J's POM.xml file. Your selection will affect both ND4J and DL4J being used in your application.

If you have CUDA v9.2+ installed and NVIDIA-compatible hardware, then your dependency declaration will look like:

As of now, the artifactId for the CUDA versions can be one of nd4j-cuda-11.0 or nd4j-cuda-11.2. Generally, the last two CUDA versions are supported for a given release.

Otherwise you will need to use the native implementation of ND4J as a CPU backend:

Building for Multiple Operating Systems

If you are developing your project on multiple operating systems/system architectures, you can add -platform to the end of your artifactId which will download binaries for most major systems.

Bundling multiple Backends

For enabling different backends at runtime, you set the backend priority via the environment variables BACKEND_PRIORITY_CPU and BACKEND_PRIORITY_GPU.

The backend with the higher priority will be selected dynamically at runtime.

CuDNN

CUDA Installation

Troubleshooting

Nd4jBackend$NoAvailableBackendException

There are multiple reasons why you might run into this error message.

  1. You haven't configured an ND4J backend at all.

  2. You have a jar file that doesn't contain a backend for your platform.

  3. You have a jar file that doesn't contain service loader files.

You haven't configured any ND4J Backend

Read this page and add a ND4J Backend to your dependencies:

You have a jar file that doesn't contain a backend for your platform.

This happens when you use a backend dependency without the -platform suffix. In this case, only the backend for the system that the jar file was built on will be included.

To solve this issue, use nd4j-native-platform instead of nd4j-native, if you are running on CPU and nd4j-cuda-11.2-platform instead of nd4j-cuda-11.2 when using the GPU backend.

If the jar file only contains the GPU backend, but your system has no CUDA capable (CC >= 3.5) GPU or CUDA isn't installed on the system, the CPU Backend should be used instead.

You have a jar file that doesn't contain service loader files.

To double check that the required files are included, open your uberjar and make sure it contains /META-INF/services/org.nd4j.linalg.factory.Nd4jBackend. Then open the file, and make sure there are entries for all of your configured backends.

Language Processing

Overview of language processing in DL4J

Although not designed to be comparable to tools such as Stanford CoreNLP or NLTK, Deeplearning4j does include some core text processing tools, which are described here.

Deeplearning4j's NLP support contains interfaces for different NLP libraries. A user wraps third-party libraries via our interfaces. Deeplearning4j, as of M1, does not support any third-party libraries directly, due to the lack of maintenance and the custom work needed to make them work well for users. Instead, we expose interfaces that allow users to implement their own tokenizers.

SentenceIterator

There are several steps involved in processing natural language. The first is to iterate over your corpus to create a list of documents, which can be as short as a tweet, or as long as a newspaper article. This is performed by a SentenceIterator, which will appear like this:
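A minimal sketch, assuming a plain text corpus with one document per line (the file path is illustrative):

```java
import org.deeplearning4j.text.sentenceiterator.LineSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import java.io.File;

// One sentence (or tweet, or document) per line in the file
SentenceIterator iter = new LineSentenceIterator(new File("/path/to/your/corpus.txt"));
while (iter.hasNext()) {
    String sentence = iter.nextSentence();
    // feed 'sentence' into your tokenizer / vectorization pipeline
}
```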

The SentenceIterator encapsulates a corpus or text, organizing it, say, as one Tweet per line. It is responsible for feeding text piece by piece into your natural language processor. The SentenceIterator is not analogous to a similarly named class, the DatasetIterator, which creates a dataset for training a neural net. Instead it creates a collection of strings by segmenting a corpus.

Tokenizer

A Tokenizer further segments the text at the level of single words, or alternatively as n-grams. ClearTK contains the underlying tokenizers, such as parts of speech (PoS) and parse trees, which allow for both dependency and constituency parsing, like that employed by a recursive neural tensor network (RNTN).

Both Tokenizers and SentenceIterators work with Preprocessors to deal with anomalies in messy text like Unicode, and to render such text, say, as lowercase characters uniformly.

Vocab

Each document has to be tokenized to create a vocab, the set of words that matter for that document or corpus. Those words are stored in the vocab cache, which contains statistics about a subset of words counted in the document, the words that "matter". The line separating significant and insignificant words is mobile, but the basic idea of distinguishing between the two groups is that words occurring only once (or less than, say, five times) are hard to learn and their presence represents unhelpful noise.

The vocab cache stores metadata for methods such as Word2vec and Bag of Words, which treat words in radically different ways. Word2vec creates representations of words, or neural word embeddings, in the form of vectors that are hundreds of coefficients long. Those coefficients help neural nets predict the likelihood of a word appearing in any given context; for example, after another word. Here's Word2vec, configured:
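A minimal sketch of such a configuration (hyperparameter values and the corpus path are illustrative):

```java
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.LineSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;
import java.io.File;

SentenceIterator iter = new LineSentenceIterator(new File("/path/to/your/corpus.txt"));
TokenizerFactory tokenizer = new DefaultTokenizerFactory();

Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)       // ignore words seen fewer than 5 times
        .layerSize(100)            // dimensionality of the word vectors
        .windowSize(5)             // context window size
        .seed(42)
        .iterate(iter)
        .tokenizerFactory(tokenizer)
        .build();
vec.fit();
```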

Once you obtain word vectors, you can feed them into a deep net for classification, prediction, sentiment analysis and the like.

Build Tools

Configure the build tools for Deeplearning4j.

Configuring your build tool

While we encourage Deeplearning4j, ND4J and DataVec users to employ Maven, it's worthwhile documenting how to configure build files for other tools, like Ivy, Gradle and SBT -- particularly since Google prefers Gradle over Maven for Android projects.

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Gradle

You can use Deeplearning4j with Gradle by adding the following to your build.gradle in the dependencies block:

Add a backend by adding the following:

SBT

You can use Deeplearning4j with SBT by adding the following to your build.sbt:

Add a backend by adding the following:

Ivy

You can use Deeplearning4j with ivy by adding the following to your ivy.xml:

Add a backend by adding the following:

Leiningen

NOTE: You'll still need to download ND4J, DataVec and Deeplearning4j, or double-click their respective JAR files downloaded by Maven / Ivy / Gradle, to install them in your Eclipse installation.

Doc2Vec

Doc2Vec and arbitrary documents for language processing in DL4J.

The main purpose of Doc2Vec is associating arbitrary documents with labels, so labels are required. Doc2Vec is an extension of word2vec that learns to correlate labels and words, rather than words with other words. Deeplearning4j's implementation is intended to serve the Java, Scala and Clojure communities.

The first step is coming up with a vector that represents the "meaning" of a document, which can then be used as input to a supervised machine learning algorithm to associate documents with labels.

In the ParagraphVectors builder pattern, the labels() method points to the labels to train on. In the example below, you can see labels related to sentiment analysis:

Further Reading

Sentence Iterator

Iteration of words, documents, and sentences for language processing in DL4J.

A sentence iterator feeds bits of text into a neural network in the form of vectors, and also covers the concept of documents in text processing.

In natural-language processing, a document or sentence is typically used to encapsulate a context which an algorithm should learn.

Some typical examples are below:
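A sketch using a line-based iterator (the file path is illustrative):

```java
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;

SentenceIterator iter = new BasicLineIterator("/path/to/your/corpus.txt");
```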

This assumes that each line in a file is a sentence.

You can also do list of strings as sentence as follows:
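For example:

```java
import org.deeplearning4j.text.sentenceiterator.CollectionSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import java.util.Arrays;

SentenceIterator iter = new CollectionSentenceIterator(
        Arrays.asList("First document.", "Second document.", "Third document."));
```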

This will assume that each string is a sentence (document). Remember this could be a list of Tweets or articles -- both are applicable.

You can iterate over files as follows:
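For example (the directory path is illustrative):

```java
import org.deeplearning4j.text.sentenceiterator.FileSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import java.io.File;

// Iterates over every file in the directory, returning sentences line by line
SentenceIterator iter = new FileSentenceIterator(new File("/path/to/your/dir"));
```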

This will parse the files line by line and return individual sentences on each one.

For anything more complex, we recommend a pipeline that provides more in-depth support than space-separated tokens.

Tokenization

Breaking text into individual words for language processing in DL4J.

Notes to write on: 1. Tokenizer factory interface 2. Tokenizer interface 3. How to write your own factory and tokenizer

Tokenization

What is Tokenization?

Example

Here's an example of tokenization done with DL4J tools:
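A minimal sketch of the pattern, using the default tokenizer factory with a token preprocessor attached (a stemming TokenPreProcess implementation can be plugged in the same way):

```java
import org.deeplearning4j.text.tokenization.tokenizer.Tokenizer;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

TokenizerFactory factory = new DefaultTokenizerFactory();
// CommonPreprocessor lower-cases tokens and strips punctuation; a custom TokenPreProcess
// implementation (for example, one wrapping a stemmer) can be attached in the same way.
factory.setTokenPreProcessor(new CommonPreprocessor());

Tokenizer tokenizer = factory.create("The quick brown foxes jumped over the lazy dogs");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
```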

With a stemming token preprocessor attached in this way, the tokenizer is capable of stemming.

In Word2Vec, that's the recommended way of creating a vocabulary, because it averts various vocabulary quirks, such as the singular and plural of the same noun being counted as two different words.

Vocabulary Cache

Mechanism for handling general NLP tasks in DL4J.

The vocabulary cache, or vocab cache, is a mechanism for handling general-purpose natural-language tasks in Deeplearning4j, including normal TF-IDF, word vectors and certain information-retrieval techniques. The goal of the vocab cache is to be a one-stop shop for text vectorization, encapsulating techniques common to bag of words and word vectors, among others.

Vocab cache handles storage of tokens, word-count frequencies, inverse-document frequencies and document occurrences via an inverted index. The InMemoryLookupCache is the reference implementation.

In order to use a vocab cache as you iterate over text and index tokens, you need to figure out if the tokens should be included in the vocab. The criterion is usually if tokens occur with more than a certain pre-configured frequency in the corpus. Below that frequency, an individual token isn't a vocab word, and it remains just a token.

We track tokens as well. In order to track tokens, do the following:
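A sketch, assuming vocabCache is a VocabCache&lt;VocabWord&gt; implementation such as the InMemoryLookupCache mentioned above:

```java
import org.deeplearning4j.models.word2vec.VocabWord;

// Record a token (with an initial frequency of 1.0) in the cache
vocabCache.addToken(new VocabWord(1.0, "myword"));
```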

When you want to add a vocab word, do the following:
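For example:

```java
vocabCache.addWordToIndex(0, "myword");   // assign the word an index
vocabCache.putVocabWord("myword");        // declare it a vocab word (pulls the word from the index)
```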

Adding the word to the index sets the index. Then you declare it as a vocab word. (Declaring it as a vocab word will pull the word from the index.)

Sequential Models

Importing the functional model.

Getting started with importing Keras Sequential models

Let's say you start with defining a simple MLP using Keras:

In Keras there are several ways to save a model. You can store the whole model (model definition, weights and training configuration) as HDF5 file, just the model configuration (as JSON or YAML file) or just the weights (as HDF5 file). Here's how you do each:

If you decide to save the full model, you will have access to the training configuration of the model, otherwise you don't. So if you want to further train your model in DL4J after import, keep that in mind and use model.save(...) to persist your model.

Loading your Keras model

Let's start with the recommended way, loading the full model back into DL4J (we assume it's on your class path):
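A minimal sketch (the file name matches the earlier Keras example; substitute your own path or classpath resource):

```java
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

// "simple_mlp.h5" is the full Keras model (architecture + weights + training config) saved with model.save(...)
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights("simple_mlp.h5");
```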

In case you didn't compile your Keras model, it will not come with a training configuration. In that case you need to explicitly tell model import to ignore training configuration by setting the enforceTrainingConfig flag to false like this:
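For example:

```java
MultiLayerNetwork model =
        KerasModelImport.importKerasSequentialModelAndWeights("simple_mlp.h5", false);  // false = ignore training config
```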

To load just the model configuration from JSON, you use KerasModelImport as follows:
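For example (the JSON file name is illustrative):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;

MultiLayerConfiguration config =
        KerasModelImport.importKerasSequentialConfiguration("simple_mlp_config.json");
```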

If additionally you also want to load the model weights with the configuration, here's what you do:
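For example (file names are illustrative):

```java
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(
        "simple_mlp_config.json", "simple_mlp_weights.h5");
```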

In the latter two cases no training configuration will be read.

Maven

Configure the Maven build tool for Deeplearning4j.

Configuring the Maven build tool

You can use Deeplearning4j with Maven by adding the following to your pom.xml:

The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.

Add a backend

DL4J relies on ND4J for hardware-specific implementations and tensor operations. Add a backend by pasting the following snippet into your pom.xml:

Keras Import

Overview of model import.

Deeplearning4j: Keras model import

Getting started: Import a Keras model in 60 seconds

If you put this model file (simple_mlp.h5) into the base of your resource folder of your project, you can load the Keras model as DL4J MultiLayerNetwork as follows
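A minimal sketch:

```java
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights("simple_mlp.h5");
```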

You can now use your imported model for inference (here with dummy data for simplicity)
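For example (the input shape here is illustrative; use your model's actual input shape):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Dummy input: batch of 10 examples with 100 features each
INDArray input = Nd4j.rand(10, 100);
INDArray output = model.output(input);
```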

Here's how you do training in DL4J for your imported model:
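A minimal sketch (labels here are dummy placeholders):

```java
// Dummy labels: 10 examples, 10 output classes (adjust to your model's output shape)
INDArray labels = Nd4j.zeros(10, 10);
model.fit(input, labels);
```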

Project setup

To use Keras model import in your existing project, all you need to do is add the following dependency to your pom.xml.

Backend

DL4J Keras model import is backend agnostic. No matter which backend you choose (TensorFlow, Theano, CNTK), your models can be imported into DL4J.

Popular models and applications

  • Deep convolutional and Wasserstein GANs

  • UNET

  • ResNet50

  • SqueezeNet

  • MobileNet

  • Inception

  • Xception

Troubleshooting and support

An IncompatibleKerasConfigurationException message indicates that you are attempting to import a Keras model configuration that is not currently supported in Deeplearning4j (either because model import does not cover it, or DL4J does not implement the layer, or feature).

Once you have imported your model, we recommend our own ModelSerializer class for further saving and reloading of your model.

Why Keras model import?

Keras is a popular and user-friendly deep learning library written in Python. The intuitive API of Keras makes defining and running your deep learning models in Python easy. Keras allows you to choose which lower-level library it runs on, but provides a unified API for each such backend. Currently, Keras supports Tensorflow, CNTK and Theano backends.

There is often a gap between the production system of a company and the experimental setup of its data scientists. Keras model import allows data scientists to write their models in Python, but still seamlessly integrates with the production stack.

Keras model import is targeted at users mainly familiar with writing their models in Python with Keras. With model import you can bring your Python models to production by allowing users to import their models into the DL4J ecosystem for either further training or evaluation purposes.

Keras Import API Overview

Keras model import API

KerasModelImport

Reads stored Keras configurations and weights from one of two archives: either as

  • a single HDF5 file storing model and training JSON configurations and weights

  • a separate text file storing the model JSON configuration plus an HDF5 file storing the weights.

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Model

  • return ComputationGraph

  • see ComputationGraph

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Sequential model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Stream InputStream containing HDF5 archive storing Keras Sequential model

  • return MultiLayerNetwork

  • see MultiLayerNetwork

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Model

  • param inputShape optional input shape for models that come without such (e.g. notop = false models)

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Model

  • return ComputationGraph

  • throws IOException IO exception

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

  • see ComputationGraph

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Sequential model

  • param inputShape optional input shape for models that come without such (e.g. notop = false models)

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Sequential model

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model saved using model.save_model(…).

  • param modelHdf5Filename path to HDF5 archive storing Keras Sequential model

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasModelAndWeights

Load Keras (Functional API) Model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasModelAndWeights

Load Keras (Functional API) Model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasSequentialModelAndWeights

Load Keras Sequential model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialModelAndWeights

Load Keras Sequential model for which the configuration and weights were saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • param weightsHdf5Filename path to HDF5 archive storing Keras model weights

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasModelConfiguration

Load Keras (Functional API) Model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • param enforceTrainingConfig whether to enforce training configuration options

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasModelConfiguration

Load Keras (Functional API) Model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Model configuration

  • return ComputationGraph

  • throws IOException IO exception

  • see ComputationGraph

importKerasSequentialConfiguration

Load Keras Sequential model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • param enforceTrainingConfig whether to enforce training configuration options

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

importKerasSequentialConfiguration

Load Keras Sequential model for which the configuration was saved separately using calls to model.to_json() and model.save_weights(…).

  • param modelJsonFilename path to JSON file storing Keras Sequential model configuration

  • return MultiLayerNetwork

  • throws IOException IO exception

  • see MultiLayerNetwork

Quick Start

Quickstart for Java using Maven

Get started

This is everything you need to run DL4J examples and begin your own projects.

We are currently reworking the Getting Started Guide.

A Taste of Code

Deeplearning4j is a domain-specific language to configure deep neural networks, which are made of multiple layers. Everything starts with a MultiLayerConfiguration, which organizes those layers and their hyperparameters.

Hyperparameters are variables that determine how a neural network learns. They include how many times to update the weights of the model, how to initialize those weights, which activation function to attach to the nodes, which optimization algorithm to use, and how fast the model should learn. This is what one configuration would look like:
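A minimal sketch of such a configuration (layer sizes, updater and seed are illustrative):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)                                   // reproducible weight initialization
        .updater(new Adam(1e-3))                     // optimization algorithm and learning rate
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(100)
                .activation(Activation.RELU).build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(100).nOut(10)
                .activation(Activation.SOFTMAX).build())
        .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
```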

With Deeplearning4j, you add a layer by calling layer on the NeuralNetConfiguration.Builder(), specifying its place in the order of layers (the zero-indexed layer below is the input layer), the number of input and output nodes, nIn and nOut, as well as the type: DenseLayer.

Once you've configured your net, you train the model with model.fit.

Prerequisites

You should have these installed to use this QuickStart guide. DL4J targets professional Java developers who are familiar with production deployments, IDEs and automated build tools. Working with DL4J will be easiest if you already have experience with these.

Please make sure you have a 64-Bit version of java installed, as you will see an error telling you no jnind4j in java.library.path if you decide to try to use a 32-Bit version instead. Make sure the JAVA_HOME environment variable is set.

If you are working on a Mac, you can simply enter the following into the command line:

The latest version of Mac's Mojave OS breaks git, producing the following error message:

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

This can be fixed by running xcode-select --install from the command line.

  1. Use the command line to enter the following:

  2. Open IntelliJ and choose Import Project. Then select the main 'dl4j-examples' directory. (Note: the example in the illustration below refers to an outdated repository named dl4j-0.4-examples. However, the repository that you will download and install will be called dl4j-examples).![select directory](../../.gitbook/assets/install_intj_1%20(2).png)

  3. Choose 'Import project from external model' and ensure that Maven is selected.

    ![select directory](../../.gitbook/assets/install_intj_2%20(2).png)

  4. Continue through the wizard's options. Select the SDK that begins with jdk. (You may need to click on a plus sign to see your options...) Then click Finish. Wait a moment for IntelliJ to download all the dependencies. You'll see the horizontal bar working on the lower right.

  5. Pick an example from the file tree on the left. Right-click the file to run.

    ![run IntelliJ example](../../.gitbook/assets/install_intj_3%20(3).png)

Using DL4J In Your Own Projects: Configuring the POM.xml File

To run DL4J in your own projects, we highly recommend using Maven for Java users, or a tool such as SBT for Scala. The basic set of dependencies and their versions are shown below. This includes:

  • deeplearning4j-core, which contains the neural network implementations

  • nd4j-native-platform, the CPU version of the ND4J library that powers DL4J

  • datavec-api - DataVec is our library for vectorizing and loading data

To run the example, right click on it and select the green button in the drop-down menu. You will see, in IntelliJ's bottom window, a series of scores. The rightmost number is the error score for the network's classifications. If your network is learning, then that number will decrease over time with each batch it processes. At the end, this window will tell you how accurate your neural-network model has become:

![](../../.gitbook/assets/mlp_classifier_results%20(4).png)

In another window, a graph will appear, showing you how the multilayer perceptron (MLP) has classified the data in the example. It will look like this:

Congratulations! You just trained your first neural network with Deeplearning4j.

Next Steps

Additional links

Troubleshooting

Q: I'm using a 64-Bit Java on Windows and still get the no jnind4j in java.library.path error

A: You may have incompatible DLLs on your PATH. To tell DL4J to ignore those, you have to add the following as a VM parameter (Run -> Edit Configurations -> VM Options in IntelliJ):

Q: SPARK ISSUES I am running the examples and having issues with the Spark based examples such as distributed training or datavec transform options.

Troubleshooting: Debugging UnsatisfiedLinkError on Windows

Windows users might be seeing something like:

Quickstart template

Now that you've learned how to run the different examples, we've made a template available for you that has a basic MNIST trainer with simple evaluation code.

To use the template:

  1. Copy the standalone-sample-project from the examples and give it the name of your project.

  2. Import the folder into IntelliJ.

  3. Start coding!

Custom Layers

How to implement custom Keras layers for import in Deeplearning4J.

Many more advanced models will contain custom layers, i.e. layers that aren't included in Keras.

You can import those models too, but you will have to provide an implementation of that layer yourself, as the exported model file only provides us with a name for it.

Usually, you will have found out about needing to implement a custom layer, when you saw an exception like the following:

or

Implementing a custom layer for Keras import

There are two ways of implementing a custom layer for Keras import. Which one is the right approach for you, depends on the type of layer you need to implement.

  1. SameDiffLambdaLayer Use this approach if your layer doesn't have any weights and defines just a computation. It is most useful when you have to define a custom layer because you are using a lambda in your model definition. This is the approach you should be using when you've gotten the exception about no lambda layer being found.

  2. KerasLayer Use this approach if your layer needs its own weights. It is most useful when you have to define some complex layer that is more than just a simple computation. This is the approach you should be using when you've gotten the exception about an unsupported layer type.

SameDiffLambdaLayer

Using a SameDiffLambdaLayer is pretty easy. You create a new class that extends it, and override the defineLayer and getOutputType methods.
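A minimal sketch (the class name is illustrative):

```java
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.samediff.SameDiffLambdaLayer;
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;

public class TimesThreeLambdaLayer extends SameDiffLambdaLayer {

    @Override
    public SDVariable defineLayer(SameDiff sd, SDVariable input) {
        // The layer's computation: multiply every input value by 3
        return input.mul(3.0);
    }

    @Override
    public InputType getOutputType(int layerIndex, InputType inputType) {
        // Element-wise operation: the output type/shape is the same as the input
        return inputType;
    }
}
```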

This simple lambda layer just multiplies its input by 3.

defineLayer will only be called once to create the SameDiff graph that is used as the definition of this layer. Do not use information about the size of the inputs or other non-static sizes, like batch size, when defining the layer, or it may fail later on.

After defining your layer, you have to register it to make it available on import.
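A sketch of the registration call (the lambda layer name here is a placeholder; use the name from your model or the exception message):

```java
import org.deeplearning4j.nn.modelimport.keras.KerasLayer;

// "lambda_1" must match the lambda layer's name in the imported Keras model
KerasLayer.registerLambdaLayer("lambda_1", new TimesThreeLambdaLayer());
```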

The correct name for your lambda layer will depend on the model you are importing. As you, most likely, were made aware of needing to implement the lambda layer by an exception, this exception should have given you the proper name already.

KerasLayer

Implementing a full layer with weights is more complex than defining a lambda layer. You will have to create a new class that extends KerasLayer and that reads the configuration of that layer and defines it appropriately.

After you've defined your layer, you will have to register it to make it available on import:
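A sketch (both names here are placeholders for your own Keras layer name and KerasLayer subclass):

```java
import org.deeplearning4j.nn.modelimport.keras.KerasLayer;

// "MyCustomLayer" must match the layer's class name in the Keras model;
// MyCustomKerasLayer is your own class extending KerasLayer (hypothetical name)
KerasLayer.registerCustomLayer("MyCustomLayer", MyCustomKerasLayer.class);
```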

Again, the appropriate name will be apparent from the exception that notified you about needing to implement the custom layer in the first place.

Custom Layers

Extend DL4J functionality for custom layers.

There are two components to adding a custom layer:

  1. Adding the layer configuration class: extends org.deeplearning4j.nn.conf.layers.Layer

  2. Adding the layer implementation class: implements org.deeplearning4j.nn.api.Layer

The configuration layer ((1) above) class handles the settings. It's the one you would use when constructing a MultiLayerNetwork or ComputationGraph. You can add custom settings here, and use them in your layer.

The implementation layer ((2) above) class has parameters, and handles network forward pass, backpropagation, etc. It is created from the org.deeplearning4j.nn.conf.layers.Layer.instantiate(...) method. In other words: the instantiate method is how we go from the configuration to the implementation; MultiLayerNetwork or ComputationGraph will call this method when initializing the network.

An example of these are CustomLayer (the configuration class) and CustomLayerImpl (the implementation class). Both of these classes have extensive comments regarding their methods.

You'll note that in Deeplearning4j there are two DenseLayer classes, two GravesLSTM classes, etc.: the reason is that one is for the configuration and one is for the implementation. We have not followed this "same name" pattern here, to hopefully avoid confusion.

Testing Your Custom Layer

Once you have added a custom layer, it is necessary to run some tests to ensure it is correct.

These tests should at a minimum include the following:

  1. Tests to ensure that the JSON configuration (to/from JSON) works correctly

    This is necessary for networks with your custom layer to function with both

    model serialization (saving) and Spark training.

  2. Gradient checks to ensure that the implementation is correct.

Example

DL4J heavily depends on JavaCPP for its interop between Java and platform-optimized C++ libraries. However, due to our usage of JNI, this comes with certain build complexities that anyone should be aware of.

These libraries are what comprise our nd4j backends. Leveraging libnd4j, javacpp handles linking each nd4j backend against the libnd4j C++ codebase. This linking is done using a libnd4j home, which contains all of the include files and necessary binary files for specific platforms. By default, nd4j backends and the libnd4j code base are compiled within the same build step. This is the recommended default, but for specific circumstances a libnd4j release (also uploaded to Maven Central as a zip file) can be used in place of libnd4j compilation. See our documentation for more information on this.

The presets: this is similar in spirit to the javacpp presets. In order to avoid a race condition between the backend and the presets compilation, this is a separate dependency that exists just to handle interop between the libnd4j code base and the Java frontend. The backend above then contains the rest of the logic needed for execution of the math operations on specific platforms.

After a libnd4j build is executed for a specific platform, we need to leverage javacpp to actually link against libnd4j to create a complete libnd4j backend. When invoking a Maven build, the JavaCPP Maven plugin is used to actually perform the build. The presets will be compiled first. Generally the presets are just 1 or 2 classes containing a description of how to map the actual nd4j code base to the libnd4j codebase.

Nd4j reuses javacpp's notion of a -platform library. This is a curated set of dependencies most users will use as part of a build. Each backend will have an associated -platform artifact so users don't have to deal with maven classifiers. See for how to leverage this artifact.

A comprehensive list of classifiers is published with each artifact. Note that each library we link against will also have a similar set of classifiers.

Throughout the dl4j pom.xml files, platform-specific profiles that set up dependencies exist; an example of such a profile can be found in those files. This helps us dynamically figure out which platform someone is building for.

A testing setup the team uses for testing android involves lineageos, termux, and some arm32 based open jdk debian files that can be found

We recommend that you join our community forums. There you can request help and give feedback, but please do use this guide before asking questions we've answered below. If you are new to deep learning, we've included links to courses, readings and other resources.

If you just want to get started, please consider reading our .

If you find that you have trouble following along here, take a look at the Konduit blog, as it features .

Use cases include: 1. Numerical computation. See:

2. Define and train models using a tensorflow/pytorch like interface. See:

3. Model import and deployment. See:

4. Running models on spark. See:

5. A small self contained library for running math code. See:

Other use cases are available as well, please feel free to check more of our

Java 1.8 or later (only 64-bit versions supported)

Apache Maven (automated build and dependency manager)

IntelliJ IDEA or Eclipse

If you are new to Java or unfamiliar with these tools, read the details below for help with installation and setup. Otherwise, skip to .

If you don't have Java 1.8 or later, download the current Java Development Kit (JDK). To check if you have a compatible version of Java installed, run java -version from the command line.

Maven is a dependency management and automated build tool for Java projects. It works well with IDEs such as IntelliJ and lets you install DL4J project libraries easily. Install or update Maven to the latest release following the instructions for your system. To check if you have the most recent version of Maven installed, run mvn --version.

Maven is widely used among Java developers and it's pretty much mandatory for working with DL4J. If you come from a different background and Maven is new to you, check out the Apache Maven overview and our introduction to Maven for non-Java programmers, which includes some additional troubleshooting tips. Other build tools such as Ivy and Gradle can also work, but we support Maven best.

An Integrated Development Environment (IDE) allows you to work with our API and configure neural networks in a few steps. We strongly recommend using IntelliJ IDEA, which communicates with Maven to handle dependencies. The IntelliJ Community Edition is free.

There are other popular IDEs such as Eclipse and NetBeans. However, IntelliJ is preferred, and using it will make finding help on the community forums easier if you need it.

Install the latest version of Git. If you already have Git, you can update to the latest version using Git itself:

Every Maven project has a POM file. Here is when you run your examples.

Within IntelliJ, you will need to choose the first Deeplearning4j example you're going to run. We suggest MLPClassifierLinear, as you will almost immediately see the network classify two groups of data in our UI. The file on .

Join our community forums on .

Read the .

Check out the more detailed .

Python folks: If you plan to run benchmarks on Deeplearning4j comparing it to well-known Python framework [x], please read our notes on how to optimize heap space, garbage collection and ETL on the JVM. By following them, you will see at least a 10x speedup in training time.

A: You may be missing some dependencies that Spark requires. See this for a discussion of potential dependency issues. Windows users may need the winutils.exe from Hadoop.

Download winutils.exe from and put it into the null/bin/winutils.exe (or create a hadoop folder and add that to HADOOP_HOME)

If that is the issue, see . In this case replace with "Nd4jCpu".

The Quickstart template is available at .

If you'd like to deploy models to production, you might like our .

Deeplearning4j has several submodules. These range from a visualization UI to distributed training on Spark. For an overview of these modules, please look at the .

To get started with a simple desktop app, you need two things: An and deeplearning4j-core. For more code, see the .

If you want a flexible deep-learning API, there are two ways to go. You can use nd4j standalone See our or the .

If you want distributed training on Spark, you can see our . Keep in mind that we cannot setup Spark for you. If you want to set up distributed Spark and GPUs, that is largely up to you. Deeplearning4j simply deploys as a JAR file on an existing Spark cluster.

If you want Spark with GPUs, we recommend .

If you want to deploy on mobile, you can see our .

We deploy optimized code for various hardware architectures natively. We use C++ based for loops just like everybody else. For that, please see our .

Deeplearning4j is meant to be an end-to-end platform for building real applications, not just a tensor library with automatic differentiation. If you want a tensor library with autodiff, please see ND4J and . Samediff is still in beta, but if you want to contribute, please join our .

Lastly, if you are benchmarking Deeplearning4j, please consider coming to our community forum and asking for tips. Deeplearning4j has all the knobs, but some may not work exactly like the Python frameworks do.

Before contributing, make sure you know the structure of all of the Eclipse Deeplearning4j libraries. As of early 2018, all libraries now live in the Deeplearning4j monorepo. These include:

We also have an extensive examples repository at .

Talking to the developers on the

If you are unsure about something - ask us on the !

This page explains steps required to contribute code to the projects in the eclipse/deeplearning4j GitHub repository:

These two requirements must be satisfied for all Eclipse Foundation projects, not just DL4J and ND4J. A full list of Eclipse Foundation Projects can be found here:

This can be done at

Go to and follow the instructions.

For Windows command line, similar options are available through a few mechanisms (see )

For details on GPG signing, see

IntelliJ can be used to perform git commits, including through signed commits. See for details.

Each github actions workflow has 10 parameters for manually invoking builds. The reason this is manual is due to the different ways a release can break. Being manual also allows us to re-invoke only the parts of a build we need, rather than the whole release pipeline.

releaseVersion: This is the intended release version to be converted to from snapshots. The update-versions.sh script is run to convert the version of every module to the specific version intended for release. This is what will get uploaded to a staging repository for release. Otherwise, all intended versions should be SNAPSHOT.

modules: The maven modules to build. This is fairly raw and error prone. The intended usage is with Maven's -pl/--projects flag. Typical usage is to skip libnd4j builds by passing a module list that excludes libnd4j.

libnd4jDownload/libnd4jUrl: In tandem with modules, you can specify a libnd4j zip file distribution that was compiled beforehand for download. The builds will download that libnd4j distribution and use it for linking. This can be handy when recompiling the nd4j-native/nd4j-cuda backends for a specific platform without needing to recompile the whole C++ codebase. A URL in a matrix build will be sourced from a hard-coded file name from this repo - each file name will be updated to point to a zip file distribution appropriate for an individual matrix build. This was done because one URL is not going to be suitable for all of the individual matrix builds.

Maven profiles for deeplearning4j matter a lot, especially if you want to run tests. Read more on the test profiles here. For most code, nd4j-tests-cpu should probably be the main profile you use.

Deeplearning4j uses Lombok throughout its codebase. Ensure you install the Lombok plugin for your favorite IDE in order to use the project. Please follow the baeldung guide for setting this up in your IDE.

Once cloned locally, open IntelliJ. Please follow the guide to import from external Maven sources.

Note: for now, the latest version of Eclipse appears to fail upon first import. Any suggestions may be reported on the community forums.

Once cloned locally, open Eclipse. Please follow the guide to import from external Maven sources. Importing your project into Eclipse may take a while. Note that, due to the profile-sensitive nature of the deeplearning4j suite, there may be issues when opening and building the project.

In order to run the deeplearning4j tests, many pretrained models and other resources are required. Ensure the dl4j test resources are on your classpath as a dependency. It is a big repository that needs to be mvn clean installed in order to run the tests properly. You can do this by adding -Ptestresources to your test execution when running the tests from Maven.

Deeplearning4j uses JUnit 5's tags to categorize tests into different types. All of the tag names used throughout the code base can be found here. Nd4j-common-tests is included as a dependency for all tests and has a few reusable utilities used throughout the code base for tests. This makes it a great location to put common utilities we want to use throughout the code base. The tag names are mainly there to categorize tests that can take longer or use more resources, so we can avoid running those dynamically depending on the size of the machine we are running tests on.
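
As an illustration of how such a tag looks in practice, here is a minimal sketch of a JUnit 5 test carrying a tag; the tag name used here is purely illustrative, so substitute one of the tag constants actually defined in the code base:

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class TaggedTestSketch {

    @Test
    @Tag("long-running")   // hypothetical tag name - use the project's real tag constants
    void resourceHeavyTest() {
        // a test that takes a long time or needs a large machine would go here
        assertEquals(4, 2 + 2);
    }
}

With the Maven Surefire plugin, tagged tests can then be included or excluded via its groups/excludedGroups configuration, which is how such tests are typically filtered by machine size.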

(For those interested in a survey of artificial intelligence.)

(For those interested in image recognition.)

; Patrick van der Smagt

(Vim is an editor accessible from the command line.)

If you want to jump into deep-learning from here without Java, we recommend and the various Python frameworks built atop it, including and .

With that under your belt, we recommend you approach Deeplearning4j through its .

Most of what we know about deep learning is contained in academic papers. You can find some of the major research groups .

While individual courses have limits on what they can teach, the Internet does not. Most math and programming questions can be answered by Googling and searching sites like and .

A reference for building dl4j from source can be found for every platform in our workflows. For maintenance reasons, we would prefer to have a canonical source of up-to-date build information for users rather than out-of-date install instructions in this guide. This guide will contain specific long-lived tips for how to interpret the workflows and what to consider when building.

For an overview of the GitHub actions workflows, see the overview doc.

This document will cover the specific components of the build by platform rather than step through what's already in the workflows. If you have suggestions for improving this document, please comment over at the community forums.

From there, the normal platform-specific libraries should be installed beforehand. Up-to-date install instructions can be found in our CPU builds for Windows, Mac and Linux.

ARM based builds all link against the armcompute library by default and, as mentioned above, use the pi_build.sh script for building libnd4j on specific platforms. Note that pi_build.sh can also be used to compile all of dl4j for a specific project.

This will ensure that all library versions are set to the appropriate version. Ensure that the CUDA toolkit you need is installed. If you intend on using CuDNN, ensure that it is also installed correctly. For installing CUDA, consider using our install scripts as a reference if you intend on doing automated installs.

Jetson nano users: please see this thread for successfully compiling deeplearning4j on Jetson nano.

In short: it relies on CUDA 10.0. The JavaCPP presets for CUDA are also only compiled for arm64 for CUDA 10.0. You can find the supported CUDA versions here. If you would like something more up to date, please feel free to contact us over at our forums. As of 1.0.0-M1.1 you can also use updated dependencies:

We use msys2 for compiling libnd4j. CUDA requires MSVC to be installed in order to properly compile CUDA kernels. If you want to compile libnd4j for CUDA from source, please ensure you first invoke the vcvars.bat script in a cmd terminal, then launch msys2 manually. For more specifics, please see our Windows and CUDA build files.

The DL4J/ND4J developers are available on discourse. You can ask questions about benchmarking and performance there:

ND4J can use multiple BLAS implementations - versions up to and including 1.0.0-beta6 have defaulted to OpenBLAS. However, if Intel MKL (free versions are available here) is installed and available, ND4J will link with it for improved performance in many BLAS operations.

Regarding array orders: this also matters for performance. ND4J can represent arrays in either row major ('c') or column major ('f') order. See this Wikipedia page for more details. Performance in operations such as matrix multiplication - but also more general ND4J operations - depends on the input and result array orders.

If you are using CUDA, ensure you are using CuDNN.

Check the Workspaces and Memory guides. The defaults are usually good - but sometimes better performance can be obtained with some tweaking. This is especially important if you have a lot of Java objects (such as Word2Vec vectors) in memory while training.

Not asking the devs (via Discourse) - we are happy to provide suggestions and investigate if performance isn't where it should be!

Increase the heap space.

In IntelliJ, this is a VM parameter, not a program argument. When you hit run in IntelliJ (the green button), that sets up a run-time configuration. IntelliJ starts a Java VM for you with the configurations you specify.

IntelliJ will automatically specify the Java main class in question.

Better garbage collection increases throughput. For a more detailed exploration of the issue, please read this InfoQ article.

DL4J is tightly linked to the garbage collector. JavaCPP, the bridge between the JVM and C++, adheres to the heap space you set with Xmx and works extensively with off-heap memory. The off-heap memory will not surpass the amount of heap space you specify.

This is the relevant code, in particular the third line:

There are actually two types of asynchronous dataset iterators. The AsyncDataSetIterator is what you would use most of the time. It's described in the Javadoc here.

For special cases such as recurrent neural nets applied to time series, or for computation graphs, you would use an AsyncMultiDataSetIterator, described in the Javadoc here.

In Python, programmers are converting their data into pickles, or binary data objects. And if they're working with a smallish toy dataset, they're loading all those pickles into RAM. So they're effectively sidestepping a major task in dealing with larger datasets. At the same time, when benchmarking against DL4J, they're not loading all the data into RAM. So they're effectively comparing DL4J speed for training computations + ETL against only training computation time for Python frameworks.

Deeplearning4j uses DataVec as its ETL and vectorization library. Unlike other deep-learning tools, DataVec does not force a particular format on your dataset. (Caffe forces you to use hdf5, for example.)

Here's how you pre-save datasets.

Here's how you load a pre-saved dataset.
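
For illustration, here is a minimal sketch of saving a DataSet to disk once and loading it back later; the toy shapes and the file name are assumptions made for this example:

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;

public class PreSaveSketch {
    public static void main(String[] args) {
        // Build a toy DataSet (features + labels) and save it to disk once...
        DataSet ds = new DataSet(Nd4j.rand(10, 4), Nd4j.rand(10, 3));
        File f = new File("presaved-batch-0.bin");   // hypothetical file name
        ds.save(f);

        // ...then load it back cheaply during later training runs.
        DataSet loaded = new DataSet();
        loaded.load(f);
        System.out.println("Loaded features shape: " + loaded.getFeatures().shapeInfoToString());
    }
}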

MemoryWorkspaces - see the Workspaces guide for details

Custom java code: using 3rd party libraries such as tablesaw and javacv

CSV: The CSV record reader in DataVec is fairly good for this if you have a lot of data. The reason is that the record readers assume the data you are using is too large to fit in memory. If you have a smaller dataset that can fit in memory, you can look at our tablesaw example. If you have a large amount of CSV data, then our CSV example should work well.
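
As a rough sketch of such a CSV pipeline (the file name, label index and class count below are assumptions for a small iris-style file; adjust them for your own data):

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

import java.io.File;

public class CsvPipelineSketch {
    public static void main(String[] args) throws Exception {
        // Assumes a CSV file where the last column (index 4) is an integer class label 0..2
        RecordReader rr = new CSVRecordReader(0, ',');          // skip 0 header lines, comma delimited
        rr.initialize(new FileSplit(new File("iris.csv")));     // hypothetical path

        int batchSize = 32;
        int labelIndex = 4;
        int numClasses = 3;
        DataSetIterator iter = new RecordReaderDataSetIterator(rr, batchSize, labelIndex, numClasses);

        while (iter.hasNext()) {
            System.out.println(iter.next().getFeatures().shapeInfoToString());
        }
    }
}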

Images: The native image loader and image record reader based on javacv handle loading images of any format, and these are easily converted to labeled image datasets. We have a comprehensive image example here.
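
A minimal sketch of that image pipeline is shown below, assuming an images/<label>/<file> directory layout; the path, image dimensions and class count are illustrative:

import org.datavec.api.io.labels.ParentPathLabelGenerator;
import org.datavec.api.split.FileSplit;
import org.datavec.image.loader.NativeImageLoader;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

import java.io.File;
import java.util.Random;

public class ImagePipelineSketch {
    public static void main(String[] args) throws Exception {
        // Assumes images/<labelName>/<file>.jpg, so the parent directory name becomes the label
        ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
        ImageRecordReader rr = new ImageRecordReader(64, 64, 3, labelMaker);   // height, width, channels
        rr.initialize(new FileSplit(new File("images/"), NativeImageLoader.ALLOWED_FORMATS, new Random(42)));

        int batchSize = 16;
        int labelIndex = 1;      // the image record reader writes the label as the second writable
        int numClasses = 5;      // set to the number of label directories you actually have
        DataSetIterator iter = new RecordReaderDataSetIterator(rr, batchSize, labelIndex, numClasses);
        System.out.println(iter.next().getFeatures().shapeInfoToString());
    }
}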

NLP: The DL4J suite has a core tokenizer API where a user can supply a tokenizer and build an iterator from that. A combination of that interface and something like our BERT iterator allows usage of the latest transformer models. If you are looking for word2vec, we also have examples for that here.

Audio: We do have a midi example here. Audio should be treated as time series. For your workflow, javacpp (which our ndarray library nd4j supports internally) has ffmpeg bindings. Due to licensing restrictions for the project (basically no GPL code) we cannot directly include ffmpeg in the project, but you are welcome to ask questions on the community forums.

Video: DL4J does not directly support video, but it does have 3D convolutional layers for processing video frames. It is suggested to use javacv or the ffmpeg bindings mentioned above to process videos and convert them into frames. Please use our forums for additional support.

An example of that workflow may be found here. If your data may not fit in memory, it may be worth looking into our minibatch pipelines and ways of creating your test/train splits over minibatches. Our image examples cover this. For larger input data like images, it is highly suggested to do minibatch partitioning of your data.
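
For the in-memory case, a minimal sketch of creating a test/train split looks roughly like this (the toy shapes and the 80/20 split fraction are assumptions):

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;
import org.nd4j.linalg.factory.Nd4j;

public class SplitSketch {
    public static void main(String[] args) {
        // Toy in-memory dataset: 150 examples, 4 features, 3 classes (shapes are illustrative)
        DataSet allData = new DataSet(Nd4j.rand(150, 4), Nd4j.rand(150, 3));
        allData.shuffle();                                        // shuffle before splitting
        SplitTestAndTrain split = allData.splitTestAndTrain(0.8); // 80% train, 20% test
        DataSet train = split.getTrain();
        DataSet test  = split.getTest();
        System.out.println("Train examples: " + train.numExamples() + ", test examples: " + test.numExamples());
    }
}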

- note that this can also be used to scale to a min and max, for example between 1 and 255 for images.

Normalizers, like the models discussed below, can be saved and loaded as part of your pipeline. Models must have their accompanying normalizers even during deployment. An example of serializing normalizers can be found here.
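
A minimal sketch of fitting, applying and serializing a normalizer, under the assumption of toy in-memory data and an arbitrary file name:

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;
import org.nd4j.linalg.dataset.api.preprocessor.serializer.NormalizerSerializer;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;

public class NormalizerSketch {
    public static void main(String[] args) throws Exception {
        DataSet train = new DataSet(Nd4j.rand(100, 4), Nd4j.rand(100, 3));   // toy data

        // Zero mean, unit variance normalization fitted on the training data only
        NormalizerStandardize normalizer = new NormalizerStandardize();
        normalizer.fit(train);
        normalizer.transform(train);

        // Persist the fitted statistics so the exact same transform can be applied at deployment time
        File normFile = new File("normalizer.bin");                 // hypothetical file name
        NormalizerSerializer.getDefault().write(normalizer, normFile);

        NormalizerStandardize restored = NormalizerSerializer.getDefault().restore(normFile);
        System.out.println("Restored normalizer mean: " + restored.getMean());
    }
}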

Train a model using the higher-level DL4J interface. One quick example can be found here.
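
To make the shape of that workflow concrete, here is a small, self-contained sketch of configuring and fitting a tiny MLP with the higher-level interface; the layer sizes, updater and toy data are illustrative choices, not a recommended architecture:

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TrainingSketch {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(42)
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(1e-3))
                .list()
                .layer(0, new DenseLayer.Builder().nIn(4).nOut(16).activation(Activation.RELU).build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(16).nOut(3).activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();

        // Toy in-memory batch; in practice you would call net.fit(dataSetIterator, numEpochs)
        DataSet toy = new DataSet(Nd4j.rand(32, 4), Nd4j.rand(32, 3));
        net.fit(toy);
        System.out.println("Score after one fit call: " + net.score());
    }
}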

Train a model using samediff: lower level but more flexible. An example can be found .

Keras: The Keras h5 format integration is a bit older and uses the higher-level DL4J interface. Keras model import for non-sequential models uses the computation graph. An example can be found here. Sequential models can be found here.

Tuning a model can be difficult. Our tuning guide can help navigate this. It uses the Deeplearning4j UI to monitor the gradients and ensure that they converge quickly. It is recommended to run the DL4J UI in a separate process to avoid dependency clashes. An example of how to run the UI server in a separate process can be found here.

When evaluating models, it is suggested to pair the workflow here with the data set splitting considerations above. Our evaluation API takes in ndarrays and tracks evaluations in bits. An example of the higher-level DL4J interface's evaluate call can be seen here.
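
In snippet form (following the placeholder style used elsewhere on this page, and assuming a recent version where Evaluation lives in org.nd4j.evaluation.classification), the evaluate call looks roughly like this:

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

MultiLayerNetwork net = ...;        // a trained network
DataSetIterator testIter = ...;     // iterator over the held-out test set

Evaluation eval = net.evaluate(testIter);   // runs inference and accumulates classification stats
System.out.println(eval.stats());           // accuracy, precision, recall, F1 and the confusion matrix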

A samediff model also has a similar evaluate call. In samediff, you pass an evaluation object into a training configuration. Results for the validation set will be streamed into this object. An example can be found here.

After a model has been built and deployed, usually the next thing users will want to do is set up the environment in which the model will run. One immediate suggestion is to optimize your dependencies. Since the whole deeplearning4j suite heavily relies on javacpp for its underlying dependencies, the docs from javacpp are recommended reading as a next step for optimizing your binaries.

Helpers: accelerated libraries for faster platform-specific math routines, including onednn, armcompute, and cudnn.

AVX: We pre-compile our binaries for specific Intel CPUs, including AVX2 and AVX512. Various classifiers are available for developers, which can be found here.

For building deployment pipelines, it is recommended to use konduit-serving, which is built on the same technology and is usually co-released alongside deeplearning4j.

Finally, this page has a short section on Debugging Performance Issues with JVM Profiling.

Instructions for configuring CuDNN can be found here. In summary, include the deeplearning4j-cuda-x.x dependency (where x.x is your CUDA version - such as 9.2 or 10.0). The network configuration does not need to change to utilize cuDNN - cuDNN simply needs to be available along with the deeplearning4j-cuda module.

One useful way to get more information is to perform profiling, as described in the profiling section later in this page. For custom ETL pipelines, adding logging for the various stages can help. Finally, another approach is to use a process of elimination - for example, measuring the latency and throughput of reading raw files from disk or from remote storage vs. measuring the time to actually process the data from its raw format.

Java uses garbage collection for management of on-heap memory (see this link for an explanation, for example). Note that DL4J and ND4J use off-heap memory for storage of all INDArrays (see the memory page for details).

If you suspect garbage collection overhead is having an impact on performance, try changing these settings. The main downside to reducing the frequency or disabling periodic GC entirely is when you are not using workspaces, though workspaces are enabled by default for all neural networks in Deeplearning4j.

Side note: if you are using DL4J for training on Spark, setting these values on the master/driver will not impact the settings on the workers. Instead, see this guide.

Other useful tools are the -verbose:gc, -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps command-line options. For more details, see Oracle Command Line Options and the Oracle GC Portal Documentation.

These options can be passed to the JVM on launch (when using java -jar or java -cp) or can be added to IDE launch options (for example, in IntelliJ: these should be placed in the “VM Options” field in Run/Debug Configurations - see )

For example, the YourKit Java Profiler can be used to determine both the frequency and duration of garbage collection - see Garbage collection telemetry for more details.

Other tools, such as VisualVM, can also be used to monitor GC activity.

A number of alternatives for generating heap dumps can be found .

For serving predictions in multi-threaded applications (such as a web server), ParallelInference should be used.

For inference from multiple threads, you should use one model per thread (as this avoids locks) or, for serving predictions in multi-threaded applications (such as a web server), use ParallelInference.
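
A minimal sketch of wrapping a loaded model in ParallelInference might look as follows; the inference mode, batch limit and worker count are illustrative values to tune for your hardware:

import org.deeplearning4j.nn.api.Model;
import org.deeplearning4j.parallelism.ParallelInference;
import org.deeplearning4j.parallelism.inference.InferenceMode;
import org.nd4j.linalg.api.ndarray.INDArray;

Model model = ...;                  // a loaded MultiLayerNetwork or ComputationGraph
INDArray features = ...;            // a single request's input

ParallelInference pi = new ParallelInference.Builder(model)
        .inferenceMode(InferenceMode.BATCHED)   // transparently batch concurrent requests
        .batchLimit(32)                         // maximum examples per internal batch
        .workers(2)                             // number of model copies (e.g. one per device)
        .build();

// output(...) is safe to call from many request threads
INDArray prediction = pi.output(features);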

For best performance, this value should be left at its default. If 64-bit floating point precision (double precision) is used instead, performance can be significantly reduced, especially on GPUs - most consumer NVIDIA GPUs have very poor double precision (and half precision/FP16) performance. On Tesla series cards, double precision performance is usually much better than for consumer (GeForce) cards, though it is still usually half or less of the single precision performance. Wikipedia has a summary of the single and double precision performance of NVIDIA GPUs here.
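
For reference, a short sketch of checking and, if needed, setting the default data type; FLOAT is already the default, so the explicit call is only required if something else has changed it:

import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

System.out.println("Default data type: " + Nd4j.dataType());
// First argument: default array data type; second: default floating point math type
Nd4j.setDefaultDataTypes(DataType.FLOAT, DataType.FLOAT);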

For details on workspaces, see the workspaces page.

For RNNs, the sequence length matters. If you are using sequences longer than a few hundred steps, you should use truncated backpropagation through time if possible.
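
A rough sketch of enabling truncated BPTT in a network configuration is shown below; the layer sizes and the 100-step segment length are illustrative:

import org.deeplearning4j.nn.conf.BackpropType;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .list()
    .layer(0, new LSTM.Builder().nIn(10).nOut(32).activation(Activation.TANH).build())
    .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
            .nIn(32).nOut(5).activation(Activation.SOFTMAX).build())
    .backpropType(BackpropType.TruncatedBPTT)   // split long sequences into shorter segments
    .tBPTTForwardLength(100)                    // forward-pass segment length, in time steps
    .tBPTTBackwardLength(100)                   // backward-pass segment length
    .build();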

For more details on AVX, see the Wikipedia AVX article.

In either case, you may see better overall throughput by reducing the number of OpenMP threads by setting the OMP_NUM_THREADS environment variable - see ND4JEnvironmentVars for details.

One reason reducing OMP_NUM_THREADS can improve overall performance is reduced cache thrashing.

Multiple options are available for performing profiling locally. We suggest using either YourKit Java Profiler or VisualVM for profiling.

Start your application with the profiler enabled. For details, see Running applications with the profiler and Local profiling.

Note that IDE integrations are available - see IDE integration.

Note that YourKit provides multiple different types of profiling: sampling, tracing, and call counting. Each type of profiling has different pros and cons, such as accuracy vs. overhead. For more details, see Sampling, tracing, call counting.

VisualVM also supports profiling - see the Profiling Applications section of the VisualVM documentation for more details.

You can also find the available CUDA versions via Maven Central search or in the Release Notes.

See our page on CuDNN.

Check the NVIDIA guides for instructions on setting up CUDA on the NVIDIA website.

ND4J uses the Java ServiceLoader in order to detect which backends are available on the classpath. Depending on your uberjar packaging configuration, those service files might be stripped away or broken.

If your uberjar does not contain that file, or if not all of the configured backends are listed there, you will have to reconfigure your shade plugin. See the ServicesResourceTransformer documentation for how to do that.

A Tokenizer is created and wrapped by a TokenizerFactory. The default tokens are words separated by spaces. The tokenization process also involves some machine learning to differentiate between ambiguous symbols like ".", which can end a sentence but also appears in abbreviations such as Mr. and vs.

You can also swap the standard CPU implementation for GPUs.

You can also swap the standard CPU implementation for GPUs.

You can also swap the standard CPU implementation for GPUs.

Clojure programmers may want to use Leiningen or Boot to work with Maven. A Leiningen tutorial is here.

Here's a full working example of classification with paragraph vectors:

A sentence iterator is used in both Word2vec and Bag of Words.

A few examples include analyzing Tweets and full-blown news articles. The purpose of the sentence iterator is to divide text into processable bits. Note the sentence iterator is input agnostic. So bits of text (a document) can come from a file system, the Twitter API or Hadoop.

Depending on how input is processed, the output of a sentence iterator will then be passed to a tokenizer for the processing of individual tokens, which are usually words, but could also be ngrams, skipgrams or other units. The tokenizer is created on a per-sentence basis by a tokenizer factory. The tokenizer factory is what is passed into a text-processing vectorizer.

Tokenization is the process of breaking text down into individual words. Word windows are also composed of tokens. Word2Vec can output text windows that comprise training examples for input into neural nets, as seen here.

You can also swap the standard CPU implementation for GPUs.

Keras model import provides routines for importing neural network models originally configured and trained using Keras, a popular Python deep learning library.

Once you have imported your model into DL4J, our full production stack is at your disposal. We support import of all Keras model types, most layers and practically all utility functionality. Please check for a complete list of supported Keras features.

Note to users: tf.keras models are also supported. Please check for an overview of what to expect for tf.keras as well as other features. Our documentation needs to be updated to reflect the changes between keras and tf.keras; for now, users should be aware of this as they read the docs below. Migrating from keras to tf.keras mainly involves changing the imports in your Python script. Equivalent changes needed to happen for model import in Deeplearning4j; those changes happened in beta7.

To import a Keras model, you need to create and serialize such a model first. Here's a simple example that you can use. The model is a simple MLP that takes mini-batches of vectors of length 100, has two Dense layers and predicts a total of 10 categories. After defining the model, we serialize it in HDF5 format.

This shows only how to import a Keras Sequential model. For more details take a look at both Functional Model import and Sequential Model import.

That's it! KerasModelImport is your main entry point to model import; the class takes care of mapping Keras concepts to DL4J concepts internally. As a user you just have to provide your model file; see our DL4J examples for more details and options to load Keras models into DL4J.

The full example just shown can be found in our DL4J examples.

If you need a project to get started in the first place, consider cloning and follow the instructions in the repository to build the project.

We support import for a growing number of applications; check here for a full list of currently covered models. These applications include:

You can inquire further by visiting the community forums. You might consider filing a feature request via Github so that this missing functionality can be placed on the DL4J development roadmap, or even sending us a pull request with the necessary changes!

You should use this module when the experimentation phase of your project is completed and you need to ship your models to production. Konduit provides commercial support for Keras implementations in enterprise.

We recommend that you join our . There you can request help and give feedback, but please do use this guide before asking questions we've answered below. If you are new to deep learning, we've included with links to courses, readings and other resources.

If you find that you have trouble following along here, take a look at the Konduit blog, as it features .

1.7 or later (Only 64-Bit versions supported)

(automated build and dependency manager)

or Eclipse

If you are new to Java or unfamiliar with these tools, read the details below for help with installation and setup. Otherwise, .

If you don't have Java 1.7 or later, download the current . To check if you have a compatible version of Java installed, use the following command:

Maven is a dependency management and automated build tool for Java projects. It works well with IDEs such as IntelliJ and lets you install DL4J project libraries easily. to the latest release following for your system. To check if you have the most recent version of Maven installed, enter the following:

Maven is widely used among Java developers and it's pretty much mandatory for working with DL4J. If you come from a different background, and Maven is new to you, check out and our , which includes some additional troubleshooting tips. such as Ivy and Gradle can also work, but we support Maven best.

An Integrated Development Environment () allows you to work with our API and configure neural networks in a few steps. We strongly recommend using , which communicates with Maven to handle dependencies. The is free.

There are other popular IDEs such as and . However, IntelliJ is preferred, and using it will make finding help on the easier if you need it.

Install the . If you already have Git, you can update to the latest version using Git itself:

Every Maven project has a POM file. Here is when you run your examples.

Within IntelliJ, you will need to choose the first Deeplearning4j example you're going to run. We suggest MLPClassifierLinear, as you will almost immediately see the network classify two groups of data in our UI. The file on .

Join our community forums on .

Read the .

Check out the more detailed .

Python folks: If you plan to run benchmarks on Deeplearning4j comparing it to well-known Python framework [x], please read on how to optimize heap space, garbage collection and ETL on the JVM. By following them, you will see at least a 10x speedup in training time.

A: You may be missing some dependencies that Spark requires. See this for a discussion of potential dependency issues. Windows users may need the winutils.exe from Hadoop.

Download winutils.exe from and put it into the null/bin/winutils.exe (or create a hadoop folder and add that to HADOOP_HOME)

If that is the issue, see . In this case replace with "Nd4jCpu".

The Quickstart template is available at .

For examples on how this was done, take a look at KerasLRN and KerasPoolHelper, which are custom layers that were needed to be able to import GoogLeNet.

A full custom layer example is available in our examples repository.

javacpp
Github actions overview libnd4jUrl parameter
official javacpp presets
javacpp maven plugin
docs from javacpp
here
openblas
example
here
community forum
a road map for beginners
core workflow guide
some getting started guides from the community
https://github.com/eclipse/deeplearning4j-examples/tree/master/nd4j-ndarray-examples
https://github.com/eclipse/deeplearning4j-examples/tree/master/samediff-examples
https://github.com/eclipse/deeplearning4j-examples/tree/master/tensorflow-keras-import-examples
https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-distributed-training-examples
https://github.com/eclipse/deeplearning4j/tree/master/libnd4j
examples
Java (developer version)
Apache Maven
IntelliJ IDEA
Git
Java
Java Development Kit (JDK) here
Apache Maven
Install or update Maven
their instructions
Apache's Maven overview
introduction to Maven for non-Java programmers
Other build tools
Paul Dubs' guide to maven
Maven In Five Minutes
IntelliJ IDEA
IDE
IntelliJ
community edition of IntelliJ
Eclipse
Netbeans
community forums
Git
latest version of Git
DL4J Examples in a Few Easy Steps
how the POM file should appear
Github can be found here
community.konduit.ai
introduction to deep neural networks
Comprehensive Setup Guide
these instructions
Deeplearning4j artifacts on Maven Central
ND4J artifacts on Maven Central
Datavec artifacts on Maven Central
Scala code for UCI notebook
Stack Overflow discussion
https://github.com/steveloughran/winutils
this page
https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template
model import from Keras
Deeplearning4j examples on Github
nd4j backend
simpler examples submodule
nd4j examples
computation graph API
Spark page
Spark with Mesos
Android page
C++ framework libnd4j
Arbiter: hyperparameter optimization and model evaluation
DataVec: built-in ETL for machine-learning data pipelines
Samediff
community forum
community forum
all the knobs
monorepo
dl4j-examples
https://github.com/eclipse/deeplearning4j/issues
https://github.com/eclipse/deeplearning4j-examples/issues
community forums
community forums
https://github.com/eclipse/deeplearning4j
https://projects.eclipse.org/
https://accounts.eclipse.org/user/register
https://accounts.eclipse.org/user/eca
here
this link
this page
github actions workflow
update-versions.sh
-pl/--projects flag
this repo
baeldung guide
here
community forums
here
dl4j test resources
here
Andrew Ng's Machine-Learning Class on YouTube
Geoff Hinton's Neural Networks Class on YouTube
Patrick Winston's Introduction to Artificial Intelligence @MIT
Andrej Karpathy's Convolutional Neural Networks Class at Stanford
ML@B: Machine Learning Crash Course: Part 1
ML@B: Machine Learning Crash Course: Part 2
Gradient descent, how neural networks learn, Deep learning, part 2
Calculus Made Easy, by Silvanus P. Thompson
Seeing Theory: A Visual Introduction to Probability and Statistics
Andrew Ng's 6-Part Review of Linear Algebra
Khan Academy's Linear Algebra Course
Linear Algebra for Machine Learning
CMU's Linear Algebra Review
Math for Machine Learning
Immersive Linear Algebra
Probability Cheatsheet
The best linear algebra books
Markov Chains, Visually Explained
An Introduction to MCMC for Machine Learning
Eigenvectors, Eigenvalues, PCA, Covariance and Entropy
Markov Chain Monte Carlo (MCMC) & Machine Learning
Relearning Matrices as Linear Functions
Scratch: A Visual Programming Environment From MIT
Learn to Program (Ruby)
Grasshopper: A Mobile App to Learn Basic Coding (Javascript)
Intro to the Command Line
Additional command-line tutorial
A Vim Tutorial and Primer
Intro to Computer Science (CS50 @Harvard edX)
A Gentle Introduction to Machine Fundamentals
Teaching C
Theano
Keras
Lasagne
Learn Python the Hard Way
Google's Python Class
Udemy: Complete Python 3 Masterclass Journey
MIT: Introduction to Computer Science and Python Programming
David Beazley: Python Tutorials
CS231n: Python Numpy Tutorial
Pyret: A Python Learning Environment
Think Java: Interactive Web-based Dev Environment
Learn Java The Hard Way
Introduction to JShell
JShell in 5 Minutes
Java Resources
Java Ranch: A Community for Java Beginners
Intro to Programming in Java @Princeton
Head First Java
Java in a Nutshell
Java Programming for Complete Beginners in 250 Steps
examples
Quickstart
here
Stackoverflow
Math Stackexchange
workflows
overview doc
the community forums
Windows
Mac
Linux
armcompute library
our install scripts
this thread
JavaCPP presets
here
our forums
CUDA 11
11.2
https://community.konduit.ai/c/dl4j
here
this Wikipedia page
link
Workspaces
Memory
Discourse
heap space
IntelliJ, this is a VM parameter
Java main class
InfoQ article
JavaCPP
relevant code
Javadoc here
Javadoc here
pickles
hdf5
pre-save datasets
load a pre-saved dataset
DL4J Examples
here
-Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G
WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
                .initialSize(1000000000)
                .policyLocation(LocationPolicy.MMAP)
                .build();

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
    INDArray x = Nd4j.create(10000);
}
WorkspaceConfiguration basicConfig = WorkspaceConfiguration.builder()
    .policyAllocation(AllocationPolicy.STRICT)
    .policyLearning(LearningPolicy.FIRST_LOOP)
    .policyMirroring(MirroringPolicy.HOST_ONLY) // <--- this option does this trick
    .policySpill(SpillPolicy.EXTERNAL)
    .build();
o.n.l.f.Nd4jBackend - Loaded [CpuBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for NativeOps: 8
o.n.n.Nd4jBlas - Number of threads used for BLAS: 8
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Windows 10]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [MKL]
13:08:09,042 INFO  ~ Loaded [JCublasBackend] backend
13:08:13,061 INFO  ~ Number of threads used for NativeOps: 32
13:08:14,265 INFO  ~ Number of threads used for BLAS: 0
13:08:14,274 INFO  ~ Backend used: [CUDA]; OS: [Windows 10]
13:08:14,274 INFO  ~ Cores: [16]; Memory: [7.1GB];
13:08:14,274 INFO  ~ Blas vendor: [CUBLAS]
13:08:14,274 INFO  ~ Device Name: [TITAN X (Pascal)]; CC: [6.1]; Total/free memory: [12884901888]
o.d.n.l.c.ConvolutionLayer - cuDNN not found: use cuDNN for better GPU performance by including the deeplearning4j-cuda module. For more information, please refer to: https://deeplearning4j.org/cudnn
java.lang.ClassNotFoundException: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
MultiLayerNetwork net = ...
LayerHelper h = net.getLayer(0).getHelper();    //Index 0: assume layer 0 is a ConvolutionLayer in this example
System.out.println("Layer helper: " + (h == null ? null : h.getClass().getName()));
Layer helper: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
Layer helper: null
MultiLayerNetwork net = ...
net.setListeners(new PerformanceListener(1));       //Logs ETL and iteration speed on each iteration
.d.o.l.PerformanceListener - ETL: 0 ms; iteration 16; iteration time: 65 ms; samples/sec: 492.308; batches/sec: 15.384;
Nd4j.getMemoryManager().setAutoGcWindow(10000);             //Set to 10 seconds (10000ms) between System.gc() calls
Nd4j.getMemoryManager().togglePeriodicGc(false);            //Disable periodic GC calls
int listenerFrequency = 1;
boolean reportScore = true;
boolean reportGC = true;
net.setListeners(new PerformanceListener(listenerFrequency, reportScore, reportGC));
o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 30; iteration time: 17 ms; samples/sec: 588.235; batches/sec: 58.824; score: 0.7229335801186025; GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)];
5.938: [GC (System.gc()) [PSYoungGen: 5578K->96K(153088K)] 9499K->4016K(502784K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
5.939: [Full GC (System.gc()) [PSYoungGen: 96K->0K(153088K)] [ParOldGen: 3920K->3911K(349696K)] 4016K->3911K(502784K), [Metaspace: 22598K->22598K(1069056K)], 0.0117132 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
System.out.println("ND4J Data Type Setting: " + Nd4j.dataType());
System.out.println("Training workspace config: " + net.getLayerWiseConfigurations().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + net.getLayerWiseConfigurations().getInferenceWorkspaceMode());
System.out.println("Training workspace config: " + cg.getConfiguration().getTrainingWorkspaceMode());
System.out.println("Inference workspace config: " + cg.getConfiguration().getInferenceWorkspaceMode());
spark-submit
    --conf 'spark.executor.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/executor/,tracing_settings_path=/home/user/yourkitconf.txt'
    --conf 'spark.driver.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/driver/,tracing_settings_path=/home/user/yourkitconf.txt'
    <other spark submit arguments>
walltime=*
adaptive=true
adaptive_min_method_invocation_count=1000
adaptive_max_average_method_time_ns=100000
<dependency>
 <groupId>org.nd4j</groupId>
 <artifactId>nd4j-cuda-11.2</artifactId>
 <version>1.0.0-M1.1</version>
</dependency>
<dependency>
 <groupId>org.nd4j</groupId>
 <artifactId>nd4j-native</artifactId>
 <version>1.0.0-M1.1</version>
</dependency>
<dependency>
 ...
 <artifactId>nd4j-native-platform</artifactId>
 ...
</dependency>
BACKEND_PRIORITY_CPU=SOME_NUM
BACKEND_PRIORITY_GPU=SOME_NUM
 org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
    at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:221)
    at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5091)
    ... 2 more
// Gets Path to Text file
String filePath = new File(dataLocalPath,"raw_sentences.txt").getAbsolutePath();
// Strip white space before and after for each line
SentenceIterator iter = new BasicLineIterator(filePath);
 public static void main(String[] args) throws Exception {

        dataLocalPath = DownloaderUtility.NLPDATA.Download();
        // Gets Path to Text file
        String filePath = new File(dataLocalPath,"raw_sentences.txt").getAbsolutePath();

        log.info("Load & Vectorize Sentences....");
        // Strip white space before and after for each line
        SentenceIterator iter = new BasicLineIterator(filePath);
        // Split on white spaces in the line to get words
        TokenizerFactory t = new DefaultTokenizerFactory();

        /*
            CommonPreprocessor will apply the following regex to each token: [\d\.:,"'\(\)\[\]|/?!;]+
            So, effectively all numbers, punctuation symbols and some special symbols are stripped off.
            Additionally it forces lower case for all tokens.
         */
        t.setTokenPreProcessor(new CommonPreprocessor());
package org.deeplearning4j.examples.nlp.word2vec;

import org.deeplearning4j.examples.download.DownloaderUtility;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.util.Collection;

/**
 * Created by agibsonccc on 10/9/14.
 *
 * Neural net that processes text into wordvectors. See below url for an in-depth explanation.
 * https://deeplearning4j.org/word2vec.html
 */
public class Word2VecRawTextExample {

    private static Logger log = LoggerFactory.getLogger(Word2VecRawTextExample.class);

    public static String dataLocalPath;


    public static void main(String[] args) throws Exception {

        dataLocalPath = DownloaderUtility.NLPDATA.Download();
        // Gets Path to Text file
        String filePath = new File(dataLocalPath,"raw_sentences.txt").getAbsolutePath();

        log.info("Load & Vectorize Sentences....");
        // Strip white space before and after for each line
        SentenceIterator iter = new BasicLineIterator(filePath);
        // Split on white spaces in the line to get words
        TokenizerFactory t = new DefaultTokenizerFactory();

        /*
            CommonPreprocessor will apply the following regex to each token: [\d\.:,"'\(\)\[\]|/?!;]+
            So, effectively all numbers, punctuation symbols and some special symbols are stripped off.
            Additionally it forces lower case for all tokens.
         */
        t.setTokenPreProcessor(new CommonPreprocessor());

        log.info("Building model....");
        Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)
                .iterations(1)
                .layerSize(100)
                .seed(42)
                .windowSize(5)
                .iterate(iter)
                .tokenizerFactory(t)
                .build();

        log.info("Fitting Word2Vec model....");
        vec.fit();

        log.info("Writing word vectors to text file....");

        // Prints out the closest 10 words to "day". An example on what to do with these Word Vectors.
        log.info("Closest Words:");
        Collection<String> lst = vec.wordsNearestSum("day", 10);
        log.info("10 Words closest to 'day': {}", lst);
    }
}
implementation "org.deeplearning4j:deeplearning4j-core:1.0.0-M1"
implementation "org.nd4j:nd4j-native-platform:1.0.0-M1"
libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-M1"
libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "1.0.0-M1"
<dependency org="org.deeplearning4j" name="deeplearning4j-core" rev="1.0.0-M1" conf="build" />
<dependency org="org.nd4j" name="nd4j-native-platform" rev="1.0.0-M1" conf="build" />
    .labels(Arrays.asList("negative", "neutral","positive"))
public void testDifferentLabels() throws Exception {
    ClassPathResource resource = new ClassPathResource("/labeled");
    File file = resource.getFile();
    LabelAwareSentenceIterator iter = LabelAwareUimaSentenceIterator.createWithPath(file.getAbsolutePath());

    TokenizerFactory t = new UimaTokenizerFactory();

    ParagraphVectors vec = new ParagraphVectors.Builder()
            .minWordFrequency(1).labels(Arrays.asList("negative", "neutral","positive"))
            .layerSize(100)
            .stopWords(new ArrayList<String>())
            .windowSize(5).iterate(iter).tokenizerFactory(t).build();

    vec.fit();

    assertNotEquals(vec.lookupTable().vector("UNK"), vec.lookupTable().vector("negative"));
    assertNotEquals(vec.lookupTable().vector("UNK"),vec.lookupTable().vector("positive"));
    assertNotEquals(vec.lookupTable().vector("UNK"),vec.lookupTable().vector("neutral"));}
SentenceIterator iter = new LineSentenceIterator(new File("your file"));
Collection<String> sentences = ...;
SentenceIterator iter = new CollectionSentenceIterator(sentences);
SentenceIterator iter = new FileSentenceIterator(new File("your dir or file"));
TokenizerFactory tokenizerFactory = new DefaultTokenizerFactory();
Tokenizer tokenizer = tokenizerFactory.tokenize("mystring");

//iterate over the tokens
while(tokenizer.hasMoreTokens()) {
      String token = tokenizer.nextToken();
}

//get the whole list of tokens
List<String> tokens = tokenizer.getTokens();
addToken(new VocabWord(1.0,"myword"));
addWordToIndex(0, Word2Vec.UNK);
putVocabWord(Word2Vec.UNK);
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])
model.save('full_model.h5')  # save everything in HDF5 format

model_json = model.to_json()  # save just the config. replace with "to_yaml" for YAML serialization
with open("model_config.json", "w") as f:
    f.write(model_json)

model.save_weights('model_weights.h5') # save just the weights.
String fullModel = new ClassPathResource("full_model.h5").getFile().getPath();
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(fullModel);
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(fullModel, false);
String modelJson = new ClassPathResource("model_config.json").getFile().getPath();
MultiLayerConfiguration modelConfig = KerasModelImport.importKerasSequentialConfiguration(modelJson);
String modelWeights = new ClassPathResource("model_weights.h5").getFile().getPath();
MultiLayerNetwork network = KerasModelImport.importKerasSequentialModelAndWeights(modelJson, modelWeights);
<dependencies>
  <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>
<dependencies>
  <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])

model.save('simple_mlp.h5')
String simpleMlp = new ClassPathResource("simple_mlp.h5").getFile().getPath();
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(simpleMlp);
INDArray input = Nd4j.create(DataType.FLOAT, 256, 100);
INDArray output = model.output(input);
model.fit(input, output);
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-modelimport</artifactId>
    <version>1.0.0-beta6</version> <!-- This version should match that of your other DL4J project dependencies. -->
</dependency>
public static ComputationGraph importKerasModelAndWeights( InputStream modelHdf5Stream, boolean enforceTrainingConfig)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(InputStream modelHdf5Stream) throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(InputStream modelHdf5Stream,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(InputStream modelHdf5Stream)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelHdf5Filename, int[] inputShape,
                                                              boolean enforceTrainingConfig)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelHdf5Filename, boolean enforceTrainingConfig)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelHdf5Filename)
            throws IOException, UnsupportedKerasConfigurationException, InvalidKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelHdf5Filename,
                                                                         int[] inputShape,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelHdf5Filename,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelHdf5Filename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelJsonFilename, String weightsHdf5Filename,
                                                              boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraph importKerasModelAndWeights(String modelJsonFilename, String weightsHdf5Filename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelJsonFilename,
                                                                         String weightsHdf5Filename,
                                                                         boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerNetwork importKerasSequentialModelAndWeights(String modelJsonFilename,
                                                                         String weightsHdf5Filename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraphConfiguration importKerasModelConfiguration(String modelJsonFilename,
                                                                              boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static ComputationGraphConfiguration importKerasModelConfiguration(String modelJsonFilename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerConfiguration importKerasSequentialConfiguration(String modelJsonFilename,
                                                                             boolean enforceTrainingConfig)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public static MultiLayerConfiguration importKerasSequentialConfiguration(String modelJsonFilename)
            throws IOException, InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Sgd(0.05))
        // ... other hyperparameters
        .list()
        .backprop(true)
        .build();
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
                .build())
java -version
mvn --version
brew install maven
$ git clone git://git.kernel.org/pub/scm/git/git.git
xcode-select --install
git clone https://github.com/eclipse/deeplearning4j-examples.git
cd dl4j-examples/
mvn clean install
-Djava.library.path=""
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:624)
at org.deeplearning4j.examples.feedforward.anomalydetection.MNISTAnomalyExample.main(MNISTAnomalyExample.java:46)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5556)
at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:189)
... 2 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:259)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5553)
... 3 more
org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException:
No SameDiff Lambda layer found for Lambda layer lambda_123. You can register a SameDiff Lambda layer using 
KerasLayer.registerLambdaLayer(lambdaLayerName, sameDiffLambdaLayer);
org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: 
Unsupported keras layer type LayerName.
public class TimesThreeLambda extends SameDiffLambdaLayer {
    @Override
    public SDVariable defineLayer(SameDiff sd, SDVariable x) { 
        return x.mul(3); 
    }

    @Override
    public InputType getOutputType(int layerIndex, InputType inputType) {
        return inputType; 
    }
}
KerasLayer.registerLambdaLayer("lambda_2", new TimesThreeLambda());
KerasLayer.registerCustomLayer("PoolHelper", KerasPoolHelper.class);

Functional Models

Importing the functional model.

Getting started with importing Keras functional Models

Let's say you start with defining a simple MLP using Keras' functional API:

from keras.models import Model
from keras.layers import Dense, Input

inputs = Input(shape=(100,))
x = Dense(64, activation='relu')(inputs)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(loss='categorical_crossentropy',optimizer='sgd', metrics=['accuracy'])

In Keras there are several ways to save a model. You can store the whole model (model definition, weights and training configuration) as HDF5 file, just the model configuration (as JSON or YAML file) or just the weights (as HDF5 file). Here's how you do each:

model.save('full_model.h5')  # save everything in HDF5 format

model_json = model.to_json()  # save just the config. replace with "to_yaml" for YAML serialization
with open("model_config.json", "w") as f:
    f.write(model_json)

model.save_weights('model_weights.h5') # save just the weights.

If you decide to save the full model, you will have access to the training configuration of the model, otherwise you don't. So if you want to further train your model in DL4J after import, keep that in mind and use model.save(...) to persist your model.

Loading your Keras model

Let's start with the recommended way, loading the full model back into DL4J (we assume it's on your class path):

String fullModel = new ClassPathResource("full_model.h5").getFile().getPath();
ComputationGraph model = KerasModelImport.importKerasModelAndWeights(fullModel);

In case you didn't compile your Keras model, it will not come with a training configuration. In that case you need to explicitly tell model import to ignore training configuration by setting the enforceTrainingConfig flag to false like this:

ComputationGraph model = KerasModelImport.importKerasModelAndWeights(fullModel, false);

To load just the model configuration from JSON, you use KerasModelImport as follows:

String modelJson = new ClassPathResource("model_config.json").getFile().getPath();
ComputationGraphConfiguration modelConfig = KerasModelImport.importKerasModelConfiguration(modelJson);

If additionally you also want to load the model weights with the configuration, here's what you do:

String modelWeights = new ClassPathResource("model_weights.h5").getFile().getPath();
ComputationGraph network = KerasModelImport.importKerasModelAndWeights(modelJson, modelWeights);

In the latter two cases no training configuration will be read.


Optimizers

Supported Keras optimizers

All standard Keras optimizers are supported, but importing custom TensorFlow optimizers won't work:

  • SGD

  • RMSprop

  • Adagrad

  • Adadelta

  • Adam

  • Adamax

  • Nadam

  • TFOptimizer (custom TensorFlow optimizers; not supported)

Snapshots

Using daily builds for access to the latest Eclipse Deeplearning4j features.


We provide automated daily builds of repositories such as ND4J, DataVec, DeepLearning4j and RL4J, so the newest functionality and the most recent bug fixes are available daily.

Snapshots work like any other Maven dependency. The only difference is that they are served from a custom repository rather than from Maven Central.

Due to ongoing development, snapshots should be considered less stable than releases: breaking changes or bugs can in principle be introduced at any point during the course of normal development. Typically, releases (not snapshots) should be used when possible, unless a bug fix or new feature is required.

Step 1: To use snapshots in your project, you should add snapshot repository information like this to your pom.xml file:

<repositories>
    <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>  <!-- Optional, update daily -->
        </snapshots>
    </repository>
</repositories>

If you are using version properties, as the DL4J examples do, change them as follows. From version:

<dl4j.version>1.0.0-beta6</dl4j.version>
<nd4j.version>1.0.0-beta6</nd4j.version>

To version:

<dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
<nd4j.version>1.0.0-SNAPSHOT</nd4j.version>

Sample pom.xml using Snapshots

Both -platform (all operating systems) and single OS (non-platform) snapshot dependencies are released. Due to the multi-platform build nature of snapshots, it is possible (though rare) for the -platform artifacts to temporarily get out of sync, which can cause build issues.

If you are building and deploying on just one platform, it is safer to use the non-platform artifacts, such as:

        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-native</artifactId>
            <version>${nd4j.version}</version>
        </dependency>

Two Maven command-line options can be useful when working with snapshot dependencies:

1. -U - for example, mvn package -U. This option forces Maven to check for (and if necessary, download) new snapshot releases. It can be useful if you need to be sure you have the absolute latest snapshot release.

2. -nsu - for example, mvn package -nsu. This option stops Maven from checking for new snapshot releases. Note, however, that your build will only succeed with this option if you already have the required snapshot dependencies downloaded into your local Maven cache (the .m2 directory).

An alternative to (1) is to set <updatePolicy>always</updatePolicy> in the <repositories> section shown earlier on this page; an alternative to (2) is to set <updatePolicy>never</updatePolicy> there instead.

Snapshots will not work with Gradle. You must use Maven to download the files. After that, you may try using your local Maven repository with mavenLocal().

To download specific snapshot artifacts into your local Maven repository, you can run the following Maven command:

mvn dependency:get -DremoteRepositories=snapshots::::https://oss.sonatype.org/content/repositories/snapshots -Dartifact=org.nd4j:nd4j-native:1.0.0-SNAPSHOT:jar:macosx-x86_64

In this example, the command downloads the nd4j-native (CPU backend) artifact for macOS, using the same macosx-x86_64 classifier as the Gradle example below. If you are on Windows or Linux, you'd use windows-x86_64 or linux-x86_64 respectively.

version '1.0-SNAPSHOT'

apply plugin: 'java'

sourceCompatibility = 1.8

repositories {
    maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
    mavenCentral()
}

dependencies {
    compile group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '1.0.0-SNAPSHOT'
    compile group: 'org.deeplearning4j', name: 'deeplearning4j-modelimport', version: '1.0.0-SNAPSHOT'
    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT"
    // Use windows-x86_64 or linux-x86_64 if you are not on macos
    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT:macosx-x86_64"
    testCompile group: 'junit', name: 'junit', version: '4.12'

}

Convolutional Layers

KerasConvolution2D

Imports a 2D Convolution layer from Keras.

KerasConvolution2D

public KerasConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getConvolution2DLayer

public ConvolutionLayer getConvolution2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping2D

Imports a Keras Cropping 2D layer.

KerasCropping2D

public KerasCropping2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping2DLayer

public Cropping2D getCropping2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasUpsampling3D

Keras Upsampling3D layer support

KerasUpsampling3D

public KerasUpsampling3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling3DLayer

public Upsampling3D getUpsampling3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Invalid Keras configuration exception

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolution1D

Imports a 1D Convolution layer from Keras.

KerasConvolution1D

public KerasConvolution1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException

getConvolution1DLayer

public Convolution1DLayer getConvolution1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException

  • throws UnsupportedKerasConfigurationException

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights Map from parameter name to INDArray.

KerasUpsampling1D

Keras Upsampling1D layer support

KerasUpsampling1D

public KerasUpsampling1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling1DLayer

public Upsampling1D getUpsampling1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Invalid Keras configuration exception

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasAtrousConvolution2D

Keras 2D atrous / dilated convolution layer. Note that in keras 2 this layer has been removed and dilations are now available through the “dilated” argument in regular Conv2D layers

author: Max Pumperla

KerasAtrousConvolution2D

public KerasAtrousConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getAtrousConvolution2D

public ConvolutionLayer getAtrousConvolution2D()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasAtrousConvolution1D

Keras 1D atrous / dilated convolution layer. Note that in keras 2 this layer has been removed and dilations are now available through the “dilated” argument in regular Conv1D layers

author: Max Pumperla

KerasAtrousConvolution1D

public KerasAtrousConvolution1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getAtrousConvolution1D

public Convolution1DLayer getAtrousConvolution1D()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping3D

Imports a Keras Cropping 3D layer.

KerasCropping3D

public KerasCropping3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping3DLayer

public Cropping3D getCropping3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding2D

Imports a Keras ZeroPadding 2D layer.

KerasZeroPadding2D

public KerasZeroPadding2D(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding2DLayer

public ZeroPaddingLayer getZeroPadding2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolution3D

Imports a 3D Convolution layer from Keras.

KerasConvolution3D

public KerasConvolution3D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getConvolution3DLayer

public ConvolutionLayer getConvolution3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasDeconvolution2D

Imports a 2D Deconvolution layer from Keras.

KerasDeconvolution2D

public KerasDeconvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getDeconvolution2DLayer

public Deconvolution2D getDeconvolution2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding3D

Imports a Keras ZeroPadding 3D layer.

KerasZeroPadding3D

public KerasZeroPadding3D(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding3DLayer

public ZeroPadding3DLayer getZeroPadding3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasConvolutionUtils

Utility functionality for Keras convolution layers.

getConvolutionModeFromConfig

public static ConvolutionMode getConvolutionModeFromConfig(Map<String, Object> layerConfig,
                                                               KerasLayerConfiguration conf)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Get convolution border mode from Keras layer configuration.

  • param layerConfig dictionary containing Keras layer configuration

  • return DL4J ConvolutionMode

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasZeroPadding1D

Imports a Keras ZeroPadding 1D layer.

KerasZeroPadding1D

public KerasZeroPadding1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getZeroPadding1DLayer

public ZeroPadding1DLayer getZeroPadding1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasCropping1D

Imports a Keras Cropping 1D layer.

KerasCropping1D

public KerasCropping1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getCropping1DLayer

public Cropping1D getCropping1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasSpaceToDepth

Constructor from parsed Keras layer configuration dictionary.

KerasSpaceToDepth

public KerasSpaceToDepth(Map<String, Object> layerConfig, boolean enforceTrainingConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getSpaceToDepthLayer

public SpaceToDepthLayer getSpaceToDepthLayer()

Get DL4J SpaceToDepth layer.

  • return SpaceToDepth layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasUpsampling2D

Keras Upsampling2D layer support

KerasUpsampling2D

public KerasUpsampling2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration exception

getUpsampling2DLayer

public Upsampling2D getUpsampling2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • throws UnsupportedKerasConfigurationException Invalid Keras configuration exception

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasSeparableConvolution2D

Keras separable convolution 2D layer support

KerasSeparableConvolution2D

public KerasSeparableConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras configuration

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

getSeparableConvolution2DLayer

public SeparableConvolution2D getSeparableConvolution2DLayer()

Get DL4J SeparableConvolution2D.

  • return SeparableConvolution2D

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasDepthwiseConvolution2D

Keras depth-wise convolution 2D layer support

KerasDepthwiseConvolution2D

public KerasDepthwiseConvolution2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras configuration

  • throws UnsupportedKerasConfigurationException Unsupported Keras configuration

getDepthwiseConvolution2DLayer

public DepthwiseConvolution2D getDepthwiseConvolution2DLayer()

Get DL4J DepthwiseConvolution2D.

  • return DepthwiseConvolution2D

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

Local Layers

KerasLocallyConnected1D

Imports a 1D locally connected layer from Keras.

KerasLocallyConnected1D

public KerasLocallyConnected1D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getLocallyConnected1DLayer

public LocallyConnected1D getLocallyConnected1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for 1D locally connected layer.

  • param weights Map from parameter name to INDArray.

KerasLocallyConnected2D

Imports a 2D locally connected layer from Keras.

KerasLocallyConnected2D

public KerasLocallyConnected2D(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getLocallyConnected2DLayer

public LocallyConnected2D getLocallyConnected2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for 2D locally connected layer.

  • param weights Map from parameter name to INDArray.

Core Layers

KerasPermute

Imports Permute layer from Keras

KerasPermute

public KerasPermute(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

isInputPreProcessor

public boolean isInputPreProcessor()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws
            InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasFlatten

Imports a Keras Flatten layer as a DL4J {Cnn,Rnn}ToFeedForwardInputPreProcessor.

KerasFlatten

public KerasFlatten(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

isInputPreProcessor

public boolean isInputPreProcessor()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasReshape

Imports Reshape layer from Keras

KerasReshape

public KerasReshape(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

isInputPreProcessor

public boolean isInputPreProcessor()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasMerge

Imports a Keras Merge layer as a DL4J Merge (graph) vertex.

TODO: handle axes arguments that alter merge behavior (requires changes to DL4J?)

KerasMerge

public KerasMerge(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType)

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

KerasDropout

Imports a Dropout layer from Keras.

KerasDropout

public KerasDropout(Map<String, Object> layerConfig)
                    throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getDropoutLayer

public DropoutLayer getDropoutLayer()

Get DL4J DropoutLayer.

  • return DropoutLayer

KerasMasking

Imports Keras masking layers.

KerasMasking

public KerasMasking(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getMaskingLayer

public MaskZeroLayer getMaskingLayer()

Get DL4J MaskZeroLayer.

  • return MaskZeroLayer

KerasSpatialDropout

Keras wrapper for DL4J dropout layer with SpatialDropout, works 1D-3D.

KerasSpatialDropout

public KerasSpatialDropout(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSpatialDropoutLayer

public DropoutLayer getSpatialDropoutLayer()

Get DL4J DropoutLayer with spatial dropout.

  • return DropoutLayer

KerasLambda

Wraps a DL4J SameDiffLambda into a KerasLayer

KerasLambda

public KerasLambda(Map<String, Object> layerConfig, SameDiffLayer sameDiffLayer)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSameDiffLayer

public SameDiffLayer getSameDiffLayer()

Get DL4J SameDiffLayer.

  • return SameDiffLayer

KerasActivation

Imports an Activation layer from Keras.

KerasActivation

public KerasActivation(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getActivationLayer

public ActivationLayer getActivationLayer()

Get DL4J ActivationLayer.

  • return ActivationLayer

KerasDense

Imports a Dense layer from Keras.

KerasDense

public KerasDense(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getDenseLayer

public DenseLayer getDenseLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters (2)

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights Dense layer weights

KerasRepeatVector

Imports a Keras RepeatVector layer

KerasRepeatVector

public KerasRepeatVector(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getRepeatVectorLayer

public RepeatVector getRepeatVectorLayer()

Get DL4J RepeatVector.

  • return RepeatVector

Activations

Supported Keras activations.

  • softmax

  • elu

  • selu

  • softplus

  • softsign

  • relu

  • tanh

  • sigmoid

  • hard_sigmoid

  • linear

Pooling Layers

KerasPooling1D

Imports a Keras 1D Pooling layer as a DL4J Subsampling layer.

KerasPooling1D

public KerasPooling1D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling1DLayer

public Subsampling1DLayer getSubsampling1DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasPoolingUtils

Utility functionality for Keras pooling layers.

mapPoolingType

public static PoolingType mapPoolingType(String className, KerasLayerConfiguration conf)
            throws UnsupportedKerasConfigurationException

Map Keras pooling layers to DL4J pooling types.

  • param className name of the Keras pooling class

  • return DL4J pooling type

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

KerasPooling3D

Imports a Keras 3D Pooling layer as a DL4J Subsampling3D layer.

KerasPooling3D

public KerasPooling3D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling3DLayer

public Subsampling3DLayer getSubsampling3DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasGlobalPooling

Imports a Keras Pooling layer as a DL4J Subsampling layer.

KerasGlobalPooling

public KerasGlobalPooling(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getGlobalPoolingLayer

public GlobalPoolingLayer getGlobalPoolingLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras config

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

KerasPooling2D

Imports a Keras 2D Pooling layer as a DL4J Subsampling layer.

KerasPooling2D

public KerasPooling2D(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSubsampling2DLayer

public SubsamplingLayer getSubsampling2DLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • param enforceTrainingConfig whether to enforce training-related configuration options

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

Supported Features Overview

Supported Keras features.

Keras Model Import: Supported Features

Note that importing tf.keras models is also supported. The format changed only slightly from keras to tf.keras; this transition is handled from beta7 and above.
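
As a minimal sketch (the file name is a placeholder), a model saved from tf.keras with model.save(...) to a single HDF5 file goes through the same KerasModelImport entry point shown earlier:

// Import a tf.keras model saved as an HDF5 file; 'tf_keras_model.h5' is a placeholder name
ComputationGraph model = KerasModelImport.importKerasModelAndWeights("tf_keras_model.h5");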

  • ❌ ActivityRegularization

  • ❌ SeparableConv1D

  • ❌ Conv3DTranspose

  • ❌ GRU

  • ❌ ConvLSTM2D

Merge Layers

  • ✅ Add / add

  • ✅ Multiply / multiply

  • ✅ Subtract / subtract

  • ✅ Average / average

  • ✅ Maximum / maximum

  • ✅ Concatenate / concatenate

  • ❌ Dot / dot

Advanced Activation Layers

  • ✅ ELU

Noise Layers

Layer Wrappers

  • ❌ TimeDistributed

Losses

  • ✅ mean_squared_error

  • ✅ mean_absolute_error

  • ✅ mean_absolute_percentage_error

  • ✅ mean_squared_logarithmic_error

  • ✅ squared_hinge

  • ✅ hinge

  • ✅ categorical_hinge

  • ❌ logcosh

  • ✅ categorical_crossentropy

  • ✅ sparse_categorical_crossentropy

  • ✅ binary_crossentropy

  • ✅ kullback_leibler_divergence

  • ✅ poisson

  • ✅ cosine_proximity

Activations

  • ✅ softmax

  • ✅ elu

  • ✅ selu

  • ✅ softplus

  • ✅ softsign

  • ✅ relu

  • ✅ tanh

  • ✅ sigmoid

  • ✅ hard_sigmoid

  • ✅ linear

Initializers

  • ✅ Zeros

  • ✅ Ones

  • ✅ Constant

  • ✅ RandomNormal

  • ✅ RandomUniform

  • ✅ TruncatedNormal

  • ✅ VarianceScaling

  • ✅ Orthogonal

  • ✅ Identity

  • ✅ lecun_uniform

  • ✅ lecun_normal

  • ✅ glorot_normal

  • ✅ glorot_uniform

  • ✅ he_normal

  • ✅ he_uniform

Regularizers

  • ✅ l1

  • ✅ l2

  • ✅ l1_l2

Constraints

  • ✅ max_norm

  • ✅ non_neg

  • ✅ unit_norm

  • ✅ min_max_norm

Optimizers

  • ✅ SGD

  • ✅ RMSprop

  • ✅ Adagrad

  • ✅ Adadelta

  • ✅ Adam

  • ✅ Adamax

  • ✅ Nadam

  • ❌ TFOptimizer

Recurrent Layers

KerasSimpleRnn

Imports a Keras SimpleRNN layer as a DL4J SimpleRnn layer.

KerasSimpleRnn

public KerasSimpleRnn(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getSimpleRnnLayer

public Layer getSimpleRnnLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters (12)

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

getUnroll

public boolean getUnroll()

Get whether SimpleRnn layer should be unrolled (for truncated BPTT).

  • return whether RNN should be unrolled (boolean)

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights Simple RNN weights

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

KerasRnnUtils

Utility functions for Keras RNN layers

getUnrollRecurrentLayer

public static boolean getUnrollRecurrentLayer(KerasLayerConfiguration conf, Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException

Get unroll parameter to decide whether to unroll RNN with BPTT or not.

  • param conf KerasLayerConfiguration

  • param layerConfig dictionary containing Keras layer properties

  • return boolean unroll parameter

  • throws InvalidKerasConfigurationException Invalid Keras configuration

getRecurrentDropout

public static double getRecurrentDropout(KerasLayerConfiguration conf, Map<String, Object> layerConfig)
            throws UnsupportedKerasConfigurationException, InvalidKerasConfigurationException

Get recurrent weight dropout from Keras layer configuration. Non-zero dropout rates are currently not supported.

  • param conf KerasLayerConfiguration

  • param layerConfig dictionary containing Keras layer properties

  • return recurrent dropout rate

  • throws InvalidKerasConfigurationException Invalid Keras configuration

KerasLSTM

Imports a Keras LSTM layer as a DL4J LSTM layer.

KerasLSTM

public KerasLSTM(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getLSTMLayer

public Layer getLSTMLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration.

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters (12)

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for layer.

  • param weights LSTM layer weights

getUnroll

public boolean getUnroll()

Get whether LSTM layer should be unrolled (for truncated BPTT).

  • return whether to unroll the LSTM

getGateActivationFromConfig

public IActivation getGateActivationFromConfig(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Get LSTM gate activation function from Keras layer configuration.

  • param layerConfig dictionary containing Keras layer configuration

  • return LSTM inner activation function

  • throws InvalidKerasConfigurationException Invalid Keras config

getForgetBiasInitFromConfig

public double getForgetBiasInitFromConfig(Map<String, Object> layerConfig, boolean train)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException

Get LSTM forget gate bias initialization from Keras layer configuration.

  • param layerConfig dictionary containing Keras layer configuration

  • return LSTM forget gate bias init

  • throws InvalidKerasConfigurationException Unsupported Keras config

Wrapper Layers

KerasBidirectional

Builds a DL4J Bidirectional layer from a Keras Bidirectional layer wrapper

KerasBidirectional

public KerasBidirectional(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getUnderlyingRecurrentLayer

public Layer getUnderlyingRecurrentLayer()

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getBidirectionalLayer

public Bidirectional getBidirectionalLayer()

Get DL4J Bidirectional layer.

  • return Bidirectional Layer

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

public int getNumParams()

Returns number of trainable parameters in layer.

  • return number of trainable parameters

getInputPreprocessor

public InputPreProcessor getInputPreprocessor(InputType... inputType) throws InvalidKerasConfigurationException

Gets appropriate DL4J InputPreProcessor for given InputTypes.

  • param inputType Array of InputTypes

  • return DL4J InputPreProcessor

  • throws InvalidKerasConfigurationException Invalid Keras configuration exception

  • see org.deeplearning4j.nn.conf.InputPreProcessor

setWeights

public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Set weights for Bidirectional layer.

  • param weights Map of weights

Losses

Supported Keras loss functions.

  • mean_squared_error

  • mean_absolute_error

  • mean_absolute_percentage_error

  • mean_squared_logarithmic_error

  • squared_hinge

  • hinge

  • categorical_hinge

  • logcosh

  • categorical_crossentropy

  • sparse_categorical_crossentropy

  • binary_crossentropy

  • kullback_leibler_divergence

  • poisson

  • cosine_proximity

Visualization

How to visualize, monitor and debug neural network learning.

Contents

Note: The information on this page pertains to DL4J versions 1.0.0-beta6 and later.

DL4J provides a user interface to visualize in your browser (in real time) the current network status and progress of training. The UI is typically used to help with tuning neural networks - i.e., the selection of hyperparameters (such as the learning rate) to obtain good performance for a network.

Step 1: Add the Deeplearning4j UI dependency to your project.

Step 2: Enable the UI in your project

This is relatively straightforward:

To access the UI, open your browser and go to http://localhost:9000/train/overview. You can set the port by using the org.deeplearning4j.ui.port system property: i.e., to use port 9001, pass the following to the JVM on launch: -Dorg.deeplearning4j.ui.port=9001

Information will then be collected and routed to the UI when you call the fit method on your network.

The overview page (one of 3 available pages) contains the following information:

  • Top left: score vs iteration chart - this is the value of the loss function on the current minibatch

  • Top right: model and training information

  • Bottom left: Ratio of parameters to updates (by layer) for all network weights vs. iteration

  • Bottom right: Standard deviations (vs. time) of: activations, gradients and updates

Note that for the bottom two charts, these are displayed as the logarithm (base 10) of the values. Thus a value of -3 on the update:parameter ratio chart corresponds to a ratio of 10^-3 = 0.001.

The ratio of updates to parameters is specifically the ratio of mean magnitudes of these values (i.e., log10(mean(abs(updates))/mean(abs(parameters)))).
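As a rough illustration of the quantity being plotted, this ratio can be computed by hand with ND4J. This is a minimal sketch, not part of the UI code; the two arrays below are stand-ins for a layer's parameters and the updates applied to them.

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

// Stand-in arrays: in practice these would be a layer's parameters and its updates
INDArray parameters = Nd4j.rand(1, 1000).subi(0.5);           // roughly in [-0.5, 0.5]
INDArray updates = Nd4j.rand(1, 1000).subi(0.5).muli(1e-3);

double meanMagUpdates = Transforms.abs(updates).meanNumber().doubleValue();
double meanMagParams = Transforms.abs(parameters).meanNumber().doubleValue();

// The UI plots the base-10 logarithm of this ratio; around -3 (i.e., 0.001) is a common target
double logRatio = Math.log10(meanMagUpdates / meanMagParams);
System.out.println("log10(update:parameter ratio) = " + logRatio);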

See the later section of this page on how to use these values in practice.

The model page contains a graph of the neural network layers, which operates as a selection mechanism. Click on a layer to display information for it.

On the right, the following charts are available, after selecting a layer:

  • Table of layer information

  • Update to parameter ratio for this layer, as per the overview page. The components of this ratio (the parameter and update mean magnitudes) are also available via tabs.

  • Layer activations (mean and mean +/- 2 standard deviations) over time

  • Histograms of parameters and updates, for each parameter type

  • Learning rate vs. time (note this will be flat, unless learning rate schedules are used)

Note: parameters are labeled as follows: weights (W) and biases (b). For recurrent neural networks, W refers to the weights connecting the layer to the layer below, and RW refers to the recurrent weights (i.e., those between time steps).

The DL4J UI can be used with Spark. However, as of 0.7.0, conflicting dependencies mean that running the UI and Spark in the same JVM can be difficult.

Two alternatives are available:

  1. Collect and save the relevant stats, to be visualized (offline) at a later point

  2. Run the UI in a separate server, and use the remote UI functionality to upload the data from the Spark master to your UI instance

Collecting Stats for Later Offline Use

Then, later you can load and display the saved information using:

Using the Remote UI Functionality

First, in the JVM running the UI (note this is the server):

This will require the deeplearning4j-ui dependency. (Note: this is the server, not the client - see below for the client, which uses deeplearning4j-ui-model.)

To avoid dependency conflicts with Spark, you should use the deeplearning4j-ui-model dependency to get the StatsListener, not the full deeplearning4j-ui UI dependency.

Note: you should replace UI_MACHINE_IP with the IP address of the machine running the user interface instance.

Tuning neural networks is often more an art than a science. However, here are some ideas that may be useful:

Overview Page - Model Score vs. Iteration Chart

The score vs. iteration should (overall) go down over time.

  • If the score increases consistently, your learning rate is likely set too high. Try reducing it until scores become more stable.

  • Increasing scores can also be indicative of other network issues, such as incorrect data normalization

  • If the score is flat or decreases very slowly (over a few hundred iterations) (a) your learning rate may be too low, or (b) you might be having difficulties with optimization. In the latter case, if you are using the SGD updater, try a different updater such as Nesterovs (momentum), RMSProp or Adagrad.

  • Note that data that isn't shuffled (i.e., each minibatch contains only one class, for classification) can result in very rough or abnormal-looking score vs. iteration graphs

  • Some noise in this line chart is expected (i.e., the line will go up and down within a small range). However, if the scores vary quite significantly between runs, or the variation is very large, this can be a problem

    • The issues mentioned above (learning rate, normalization, data shuffling) may contribute to this.

    • Setting the minibatch size to a very small number of examples can also contribute to noisy score vs. iteration graphs, and might lead to optimization difficulties

Overview Page and Model Page - Using the Update: Parameter Ratio Chart

  • The ratio of mean magnitude of updates to parameters is provided on both the overview and model pages

    • "Mean magnitude" = the average of the absolute value of the parameters or updates at the current time step

  • The most important use of this ratio is in selecting a learning rate. As a rule of thumb: this ratio should be around 1:1000 = 0.001. On the (log10) chart, this corresponds to a value of -3 (i.e., 10^-3 = 0.001)

    • Note that this is a rough guide only, and may not be appropriate for all networks. It's often a good starting point, however.

    • If the ratio diverges significantly from this (for example, > -2 (i.e., 10^-2 = 0.01) or < -4 (i.e., 10^-4 = 0.0001)), your parameters may be too unstable to learn useful features, or may change too slowly to learn useful features

    • To change this ratio, adjust your learning rate (or sometimes, parameter initialization). In some networks, you may need to set the learning rate differently for different layers.

  • Keep an eye out for unusually large spikes in the ratio: this may indicate exploding gradients

Model Page: Layer Activations (vs. Time) Chart

This chart can be used to detect vanishing or exploding activations (due to poor weight initialization, too much regularization, lack of data normalization, or too high a learning rate).

  • This chart should ideally stabilize over time (usually a few hundred iterations)

  • A good standard deviation for the activations is on the order of 0.5 to 2.0. Significantly outside of this range may indicate one of the problems mentioned above.

Model Page: Layer Parameters Histogram

The layer parameters histogram is displayed for the most recent iteration only.

  • For weights, these histograms should have an approximately Gaussian (normal) distribution, after some time

  • For biases, these histograms will generally start at 0, and will usually end up being approximately Gaussian

    • One exception to this is for LSTM recurrent neural network layers: by default, the biases for one gate (the forget gate) are set to 1.0 (though this is configurable), to help in learning dependencies across long time periods. This results in the bias graphs initially having many biases around 0.0, with another set of biases around 1.0

  • Keep an eye out for parameters that are diverging to +/- infinity: this may be due to too high a learning rate, or insufficient regularization (try adding some L2 regularization to your network).

  • Keep an eye out for biases that become very large. This can sometimes occur in the output layer for classification, if the distribution of classes is very imbalanced

Model Page: Layer Updates Histogram

The layer update histogram is displayed for the most recent iteration only.

  • Note that these are the updates - i.e., the gradients after applying learning rate, momentum, regularization etc

  • As with the parameter graphs, these should have an approximately Gaussian (normal) distribution

  • Keep an eye out for very large values: this can indicate exploding gradients in your network

    • Exploding gradients are problematic as they can 'mess up' the parameters of your network

    • In this case, it may indicate a weight initialization, learning rate or input/labels data normalization issue

Model Page: Parameter Learning Rates Chart

This chart simply shows the learning rates of the parameters of the selected layer, over time.

If you are not using learning rate schedules, the chart will be flat. If you are using learning rate schedules, you can use this chart to track the current value of the learning rate (for each parameter), over time.

The recommended solution (for Maven) is to use the Maven Shade plugin to produce an uber-jar, configured as follows:

Then, create your uber-jar with mvn package and run via cd target && java -cp dl4j-examples-0.9.1-bin.jar org.deeplearning4j.examples.userInterface.UIExample. Note the "-bin" suffix for the generated JAR file: this includes all dependencies.

Note also that this Maven Shade approach is configured for DL4J's examples repository.

Troubleshooting Training

Understanding common errors like NaNs and tuning hyperparameters.

Troubleshooting Neural Net Training

Neural networks can be difficult to tune. If the network hyperparameters are poorly chosen, the network may learn slowly, or perhaps not at all. This page aims to provide some baseline steps you should take when tuning your network.

Many of these tips have already been discussed in the academic literature. Our purpose is to consolidate them in one site and express them as clearly as possible.

Contents

What's the distribution of your data? Are you scaling it properly? As a general rule:

  • For continuous values: you want these to be in the range of -1 to 1, 0 to 1, or distributed normally with mean 0 and standard deviation 1. This does not have to be exact, but ensuring your inputs are approximately in this range can help during training. Scale down large inputs, and scale up small inputs.

  • For discrete classes (and, for classification problems, for the output), generally use a one-hot representation. That is, if you have 3 classes, then your data will be represented as [1,0,0], [0,1,0] or [0,0,1] for each of the 3 classes respectively.

Note that it's very important to use the exact same normalization method for both the training data and testing data.

Deeplearning4j supports several different kinds of weight initializations with the weightInit parameter. These are set using the .weightInit(WeightInit) method in your configuration.

You need to make sure your weights are neither too big nor too small. Xavier weight initialization is usually a good choice for this. For networks with rectified linear (relu) or leaky relu activations, RELU weight initialization is a sensible choice.
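For example, a minimal sketch of setting weight initialization globally and overriding it per layer (layer sizes here are illustrative only):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)                   // default for all layers
        .list()
        .layer(new DenseLayer.Builder().nIn(784).nOut(256)
                .activation(Activation.RELU)
                .weightInit(WeightInit.RELU)             // per-layer override, sensible for relu activations
                .build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(256).nOut(10).build())
        .build();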

An epoch is defined as a full pass of the data set.

The learning rate is one of, if not the most important hyperparameter. If this is too large or too small, your network may learn very poorly, very slowly, or not at all. Typical values for the learning rate are in the range of 0.1 to 1e-6, though the optimal learning rate is usually data (and network architecture) specific. Some simple advice is to start by trying three different learning rates – 1e-1, 1e-3, and 1e-6 – to get a rough idea of what it should be, before further tuning this. Ideally, you can run models with different learning rates simultaneously to save time.

For training neural networks in a distributed manner, you may need a different (frequently higher) learning rate compared to training the same network on a single machine.

Policies and Scheduling

Note that if you're using multiple GPUs, this will affect your scheduling. For example, if you have 2x GPUs, then you will need to divide the iterations in your schedule by 2, since the throughput of your training process will be double, and the learning rate schedule is only applicable to the local GPU.
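As a minimal sketch (assuming the MapSchedule and ScheduleType classes from ND4J; iteration numbers and rates are illustrative only), a schedule can be passed to the updater in place of a fixed learning rate:

import java.util.HashMap;
import java.util.Map;

import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.schedule.MapSchedule;
import org.nd4j.linalg.schedule.ScheduleType;

Map<Integer, Double> lrSchedule = new HashMap<>();
lrSchedule.put(0, 0.01);      // from iteration 0: learning rate 0.01
lrSchedule.put(1000, 0.005);  // from iteration 1000: learning rate 0.005
lrSchedule.put(3000, 0.001);  // from iteration 3000: learning rate 0.001

// Pass the schedule to the updater (e.g. via .updater(updater) in the configuration)
Nesterovs updater = new Nesterovs(new MapSchedule(ScheduleType.ITERATION, lrSchedule), 0.9);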

There are two aspects to be aware of, with regard to the choice of activation function.

First, the activation function of the hidden (non-output) layers. As a general rule, 'relu' or 'leakyrelu' activations are good choices for this. Some other activation functions (tanh, sigmoid, etc) are more prone to vanishing gradient problems, which can make learning much harder in deep neural networks. However, for LSTM layers, the tanh activation function is still commonly used.

Second, regarding the activation function for the output layer: this is usually application specific. For classification problems, you generally want to use the softmax activation function, combined with the negative log likelihood / MCXENT (multi-class cross entropy). The softmax activation function gives you a probability distribution over classes (i.e., outputs sum to 1.0). For regression problems, the "identity" activation function is frequently a good choice, in conjunction with the MSE (mean squared error) loss function.
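For example, a minimal sketch of the two common output-layer setups described above (layer sizes are illustrative only):

import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Classification: softmax activation with multi-class cross entropy (MCXENT)
OutputLayer classificationOutput = new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation(Activation.SOFTMAX)
        .nIn(128).nOut(10)
        .build();

// Regression: identity activation with mean squared error (MSE)
OutputLayer regressionOutput = new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
        .activation(Activation.IDENTITY)
        .nIn(128).nOut(1)
        .build();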

Loss functions for each neural network layer can either be used in pretraining, to learn better weights, or in classification (on the output layer) for achieving some result. (In the example above, classification happens in the override section.)

Your net's purpose will determine the loss function you use. For pretraining, choose reconstruction entropy. For classification, use multiclass cross entropy.

Regularization methods can help to avoid overfitting during training. Overfitting occurs when the network predicts the training set very well, but makes poor predictions on data the network has never seen. One way to think about overfitting is that the network memorizes the training data (instead of learning the general relationships in it).

Common types of regularization include:

  • l1 and l2 regularization penalizes large network weights, and avoids weights becoming too large. Some level of l2 regularization is commonly used in practice. However, note that if the l1 or l2 regularization coefficients are too high, they may over-penalize the network, and stop it from learning. Common values for l2 regularization are 1e-3 to 1e-6.

  • Dropconnect (conceptually similar to dropout, but used much less frequently)

  • Restricting the total number of network size (i.e., limit the number of layers and size of each layer)

To use l1/l2/dropout regularization, use .l1(x), .l2(y), and .dropOut(z) respectively in your network configuration. Note that z in dropOut(z) is the probability of retaining an activation.
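A minimal sketch of the above (the coefficients are illustrative; 0.8 here means 80% of activations are retained during training):

import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;

NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
        .l2(1e-4)       // l2 weight penalty, applied to all layers unless overridden
        .dropOut(0.8)   // probability of retaining an activation during training
        .list()
        .layer(new DenseLayer.Builder().nIn(784).nOut(256).build());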

A minibatch refers to the number of examples used at a time, when computing gradients and parameter updates. In practice (for all but the smallest data sets), it is standard to break your data set up into a number of minibatches.

The ideal minibatch size will vary. For example, a minibatch size of 10 is frequently too small for GPUs, but can work on CPUs. A minibatch size of 1 will allow a network to train, but will not reap the benefits of parallelism. 32 may be a sensible starting point to try, with minibatches in the range of 16-128 (sometimes smaller or larger, depending on the application and type of network) being common.
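The minibatch size is usually set when constructing the DataSetIterator. A minimal sketch, assuming the built-in MNIST iterator from deeplearning4j-datasets:

import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

int batchSize = 32;   // a common starting point; values in roughly the 16-128 range are typical
// Note: this constructor may throw IOException while downloading the data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, 12345);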

In DL4J, the term 'updater' refers to training mechanisms such as momentum, RMSProp, adagrad, and others. Using one of these methods can result in much faster network training compared to 'vanilla' stochastic gradient descent. You can set the updater using the .updater(...) configuration method, passing an updater instance such as new Adam(1e-3) or new Nesterovs(0.01, 0.9).

The optimization algorithm is how updates are made, given the gradient. The simplest (and most commonly used) method is stochastic gradient descent (SGD), however DL4J also provides SGD with line search, conjugate gradient and LBFGS optimization algorithms. These latter algorithms are more powerful compared to SGD, but considerably more costly per parameter update due to a line search component, and aren't used as much in practice. Note that you can in principle combine any updater with any optimization algorithm.

A good default choice in most cases is to use the stochastic gradient descent optimization algorithm combined with one of the momentum/rmsprop/adagrad updaters, with momentum frequently being used in practice. Note that for momentum, the updater is called Nesterovs (a reference to the Nesterovs variant of momentum), and the momentum rate can be set when constructing the Nesterovs updater.
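Putting the two together, a minimal sketch (values are illustrative only):

import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.learning.config.Nesterovs;

NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Nesterovs(0.01, 0.9));   // learning rate 0.01, momentum 0.9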

Q. Why is my neural network throwing NaN values?

A. Backpropagation involves the multiplication of very small gradients; due to limited precision when representing real numbers, values very close to zero cannot be represented. The term for this issue is arithmetic underflow. If your neural network is throwing NaNs, the solution is to retune your network to avoid the very small gradients. This is more likely to be an issue with deeper neural networks.

You can try using the double data type, but it's usually recommended to retune the net first.
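If you do want to experiment with double precision, ND4J's default data types can be switched globally. A minimal sketch, assuming a recent ND4J version where Nd4j.setDefaultDataTypes is available:

import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

// Set both the default array data type and the default floating-point math type to double.
// Do this before any networks or arrays are created.
Nd4j.setDefaultDataTypes(DataType.DOUBLE, DataType.DOUBLE);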

Following the basic tuning tips and monitoring the results is the way to ensure NaNs don't show up anymore.

Regularizers

Supported Keras regularizers.

All Keras regularizers are supported by DL4J model import:

  • l1

  • l2

  • l1_l2

Initializers

Supported Keras weight initializers.

  • Zeros

  • Ones

  • Constant

  • RandomNormal

  • RandomUniform

  • TruncatedNormal

  • VarianceScaling

  • Orthogonal

  • Identity

  • lecun_uniform

  • lecun_normal

  • glorot_normal

  • glorot_uniform

  • he_normal

  • he_uniform

Step 2: Make sure to specify the snapshot version. We follow a simple rule: If the latest stable release version is A.B.C, the snapshot version will be A.B.(C+1)-SNAPSHOT. The current snapshot version is 1.0.0-SNAPSHOT. For more details on the repositories section of the pom.xml file, see the Maven documentation.

A sample pom.xml is provided here: it has been taken from the DL4J standalone sample project and modified using steps 1 and 2 above. The original (using the last release) can be found in that project.

A bare minimum file like the following should work in theory, but it does not, due to a bug in Gradle: Gradle with snapshots and Maven classifiers appears to be a problem.

Of note when using the nd4j-native backend (in contrast to nd4j-native-platform) on Gradle (and SBT - but not Maven), you need to add openblas as a dependency. We do this for you in the -platform pom. Reference the -platform pom to double check your dependencies. Note that these are version properties. See the <properties> section of the pom for current versions of the openblas and javacpp presets required to run nd4j-native.

We support all Keras activation functions.

The mapping of Keras to DL4J activation functions is defined in KerasActivationUtils.

While not every concept in DL4J has an equivalent in Keras and vice versa, many of the key concepts can be matched. Importing Keras models into DL4J is done in our deeplearning4j-modelimport module. Below is a comprehensive list of currently supported features.

Mapping Keras to DL4J layers is done in the layers sub-module of model import. The structure of this project loosely reflects the structure of Keras.


DL4J supports all Keras loss functions (except for logcosh).

The mapping of Keras loss functions can be found in KerasLossUtils.

Example:

The full set of UI examples is available in the DL4J examples repository.

Client (both Spark and standalone neural networks using plain deeplearning4j-nn): second, for your neural net (note this example is for Spark, but ComputationGraph and MultiLayerNetwork both have the equivalent setListeners method with the same usage):

Here's an excellent web page by Andrej Karpathy about visualizing neural net training. It is worth reading and understanding that page first.

In the case of recurrent neural networks, adding some gradient normalization or gradient clipping may help.

Too few epochs don't give your network enough time to learn good parameters; too many and you might overfit the training data. One way to choose the number of epochs is to use early stopping. Early stopping can also help to prevent the neural network from overfitting (i.e., can help the net generalize better to unseen data).
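A minimal sketch of early stopping (assuming trainIter and validationIter are existing DataSetIterators and conf is a MultiLayerConfiguration; the termination settings are illustrative only):

import org.deeplearning4j.earlystopping.EarlyStoppingConfiguration;
import org.deeplearning4j.earlystopping.EarlyStoppingResult;
import org.deeplearning4j.earlystopping.saver.InMemoryModelSaver;
import org.deeplearning4j.earlystopping.scorecalc.DataSetLossCalculator;
import org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition;
import org.deeplearning4j.earlystopping.trainer.EarlyStoppingTrainer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(50))   // never train more than 50 epochs
        .scoreCalculator(new DataSetLossCalculator(validationIter, true))    // score on the validation set
        .evaluateEveryNEpochs(1)
        .modelSaver(new InMemoryModelSaver())                                // keep the best model in memory
        .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, conf, trainIter);
EarlyStoppingResult result = trainer.fit();
MultiLayerNetwork bestModel = (MultiLayerNetwork) result.getBestModel();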

The usual approach to selecting an appropriate learning rate is to use DL4J's visualization interface to visualize the progress of training. You want to pay attention to both the loss over time, and the ratio of update magnitudes to parameter magnitudes (a ratio of approximately 1:1000 is a good place to start). For more information on tuning the learning rate, see the visualization page earlier in this document.

You can optionally define a learning rate policy for your neural network. A policy will change the learning rate over time, achieving better results since the learning rate can "slow down" to find closer local minima for convergence. A common policy used is scheduling. See the LeNet example for a learning rate schedule used in practice.

Dropout is a frequently used regularization method and can be very effective. Dropout is most commonly used with a dropout rate of 0.5.

When training a neural network, it can sometimes be helpful to apply gradient normalization, to avoid the gradients being too large (the so-called exploding gradient problem, common in recurrent neural networks) or too small. This can be applied using the .gradientNormalization(GradientNormalization) and .gradientNormalizationThreshold(double) methods. For an example of gradient normalization, see GradientNormalization.java and its associated test code.
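A minimal sketch of gradient clipping in a configuration (the strategy and threshold here are illustrative only):

import org.deeplearning4j.nn.conf.GradientNormalization;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(1.0);   // clip individual gradient elements to [-1.0, 1.0]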

When training recurrent networks with long time series, it is generally advisable to use truncated backpropagation through time. With 'standard' backpropagation through time (the default in DL4J) the cost per parameter update can become prohibitive. For more details, see Recurrent Neural Networks: Truncated Backpropagation through Time.
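A minimal sketch of enabling truncated BPTT on a recurrent network configuration (layer sizes and the truncation length of 100 time steps are illustrative only):

import org.deeplearning4j.nn.conf.BackpropType;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(new LSTM.Builder().nIn(10).nOut(20).activation(Activation.TANH).build())
        .layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nIn(20).nOut(5).build())
        .backpropType(BackpropType.TruncatedBPTT)   // use truncated BPTT instead of full BPTT
        .tBPTTForwardLength(100)
        .tBPTTBackwardLength(100)
        .build();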

The mapping of regularizers can be found in KerasRegularizerUtils.

DL4J supports all available Keras initializers.

The mapping of Keras to DL4J initializers can be found in KerasInitilizationUtils.

Introduction to Snapshots
Setup Instructions
Limitations
Note to Gradle Users
Overview/Introduction
Setup Instructions
Maven documentation
sample pom.xml using snapshots
here
Limitations
Useful Maven Commands for Snapshots
Note to Gradle users
a bug in Gradle
here
Keras activation functions
KerasActivationUtils
deeplearning4j-modelimport
Layers
Losses
Activations
Initializers
Regularizers
Constraints
Metrics
Optimizers
Layers
layers
Core Layers
Dense
Activation
Dropout
Flatten
Reshape
Merge
Permute
RepeatVector
Lambda
Masking
SpatialDropout1D
SpatialDropout2D
SpatialDropout3D
Convolutional Layers
Conv1D
Conv2D
Conv3D
AtrousConvolution1D
AtrousConvolution2D
SeparableConv2D
Conv2DTranspose
Cropping1D
Cropping2D
Cropping3D
UpSampling1D
UpSampling2D
UpSampling3D
ZeroPadding1D
ZeroPadding2D
ZeroPadding3D
Pooling Layers
MaxPooling1D
MaxPooling2D
MaxPooling3D
AveragePooling1D
AveragePooling2D
AveragePooling3D
GlobalMaxPooling1D
GlobalMaxPooling2D
GlobalMaxPooling3D
GlobalAveragePooling1D
GlobalAveragePooling2D
GlobalAveragePooling3D
Locally-connected Layers
LocallyConnected1D
LocallyConnected2D
Recurrent Layers
SimpleRNN
LSTM
Embedding Layers
Embedding
Merge Layers
Advanced Activation Layers
LeakyReLU
PReLU
ThresholdedReLU
Normalization Layers
BatchNormalization
GaussianNoise
GaussianDropout
AlphaDropout
Bidirectional
Losses
Activations
Initializers
Regularizers
Constraints
Optimizers
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-ui</artifactId>
        <version>{{ page.version }}</version>
    </dependency>
    MultiLayerNetwork net = ...;
    //Also CompuptationGraph
    //ComputationGraph net = ...;
    //Initialize the user interface backend
    UIServer uiServer = UIServer.getInstance();

    //Configure where the network information (gradients, score vs. time etc) is to be stored. Here: store in memory.
    StatsStorage statsStorage = new InMemoryStatsStorage();         //Alternative: new FileStatsStorage(File), for saving and loading later

    //Attach the StatsStorage instance to the UI: this allows the contents of the StatsStorage to be visualized
    uiServer.attach(statsStorage);

    //Then add the StatsListener to collect this information from the network, as it trains
    net.setListeners(new StatsListener(statsStorage));
    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);

    StatsStorage ss = new FileStatsStorage(new File("myNetworkTrainingStats.dl4j"));
    sparkNet.setListeners(ss, Collections.singletonList(new StatsListener(null)));
    StatsStorage statsStorage = new FileStatsStorage(statsFile);    //If file already exists: load the data from it
    UIServer uiServer = UIServer.getInstance();
    uiServer.attach(statsStorage);
    UIServer uiServer = UIServer.getInstance();
    uiServer.enableRemoteListener();        //Necessary: remote support is not enabled by default
    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);

    StatsStorageRouter remoteUIRouter = new RemoteUIStatsStorageRouter("http://UI_MACHINE_IP:9000");
    sparkNet.setListeners(remoteUIRouter, Collections.singletonList(new StatsListener(null)));
    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>${exec-maven-plugin.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade-plugin.version}</version>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>${shadedClassifier}</shadedClassifierName>
                    <createDependencyReducedPom>true</createDependencyReducedPom>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <!--<exclude>org/datanucleus/**</exclude>-->
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>

                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer" />
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        <plugins>
    <build>
Keras losses
KerasLossUtils
Visualizing Network Training with the Deeplearning4j Training UI
Deeplearning4j UI: The Overview Page
Deeplearning4j UI: The Model Page
Deeplearning4J UI and Spark Training
Using the UI to Tune Your Network
TSNE and Word2Vec
Fixing UI Issue: "No configuration setting" exception
Visualizing Network Training with the Deeplearning4j Training UI
See a UI example here
here
Deeplearning4j UI: The Overview Page
Deeplearning4j UI: The Model Page
Deeplearning4J UI and Spark Training
example found here
Using the UI to Tune Your Network
web page by Andrej Karpathy
gradient normalization or gradient clipping
Data Normalization
Weight Initialization
Epochs and Iterations
Learning Rate
Activation Function
Loss Function
Regularization
Minibatch Size
Updater and Optimization Algorithm
Gradient Normalization
Recurrent Neural Networks
Deep Belief Network
NaN, Not a Number issues
Data Normalization
Weight Initialization
Number of Epochs and Number of Iterations
Early stopping
Learning Rate
DL4J's visualization interface
this link
LeNet example
Activation Function
Loss Function
Regularization
Dropout
Early stopping
Minibatch Size
Updater and Optimization Algorithm
Gradient Normalization
GradientNormalization.java
here
Recurrent Neural Networks: Truncated Backpropagation through Time
this page
NaN, Not a Number Errors
KerasRegularizerUtils
Keras initializers
KerasInitilizationUtils

Transfer Learning

DL4J’s Transfer Learning API

The DL4J transfer learning API enables users to:

  • Modify the architecture of an existing model

  • Fine tune learning configurations of an existing model.

  • Hold parameters of a specified layer constant during training, also referred to as "frozen"

Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, the transformed version being the intermediate outputs at the boundary of the frozen layers. This is the process of “feature extraction” from the input data and will be referred to as “featurizing” in this document.

The transfer learning helper

The forward pass to “featurize” the input data on large, pretrained networks can be time consuming. DL4J also provides a TransferLearningHelper class with the following capabilities:

  • Featurize an input dataset to save for future use

  • Fit the model with frozen layers with a featurized dataset

  • Output from the model with frozen layers given a featurized input.

When running multiple epochs users will save on computation time since the expensive forward pass on the frozen layers/vertices will only have to be conducted once.

Show me the code

I. Import a zoo model

ZooModel zooModel = VGG16.builder().build();
ComputationGraph pretrainedNet = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);

II. Set up a fine-tune configuration

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Nesterovs(5e-5))
            .seed(seed)
            .build();

III. Build new models based on VGG16

A. Modifying only the last layer, keeping the others frozen

The final layer of VGG16 does a softmax regression over the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes, keeping the other layers frozen.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
    .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("fc2")
              .removeVertexKeepConnections("predictions") 
              .addLayer("predictions", 
        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(numClasses)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX).build(), "fc2")
              .build();

After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy > 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.

B. Attach new layers to the bottleneck (block5_pool)

Here we hold all but the last three dense layers frozen and attach new dense layers onto it. Note that the primary intent here is to demonstrate the use of the API; what might give better results is secondary.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
              .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("block5_pool")
              .nOutReplace("fc2",1024, WeightInit.XAVIER)
              .removeVertexAndConnections("predictions") 
              .addLayer("fc3",new DenseLayer.Builder()
              .activation(Activation.RELU)
              .nIn(1024).nOut(256).build(),"fc2") 
              .addLayer("newpredictions",new OutputLayer
              .Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                                .activation(Activation.SOFTMAX)
                                .nIn(256).nOut(numClasses).build(),"fc3") 
              .setOutputs("newpredictions") 
              .build();

C. Fine tune layers from a previously saved model

Say we have saved off our model from (B) and now want to allow “block_5” layers to train.

ComputationGraph vgg16FineTune = new TransferLearning.GraphBuilder(vgg16Transfer)
              .fineTuneConfiguration(fineTuneConf)
              .setFeatureExtractor("block4_pool")
              .build();

IV. Saving “featurized” datasets and training with them.

We use the transfer learning helper API. Note this freezes the layers of the model passed in.

Here is how you obtain the featurized version of the dataset at the specified layer “fc2”.

TransferLearningHelper transferLearningHelper = 
    new TransferLearningHelper(pretrainedNet, "fc2");
while(trainIter.hasNext()) {
        DataSet currentFeaturized = transferLearningHelper.featurize(trainIter.next());
        saveToDisk(currentFeaturized,trainDataSaved,true);
        trainDataSaved++;
}

Here is how you can fit with a featurized dataset. vgg16Transfer is the model set up in (A) of section III.

TransferLearningHelper transferLearningHelper = 
    new TransferLearningHelper(vgg16Transfer);
while (trainIter.hasNext()) {
       transferLearningHelper.fitFeaturized(trainIter.next());
}

Notes

  • The TransferLearning builder returns a new instance of a dl4j model.

Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take into consideration memory requirements and adjust your JVM heap space accordingly.

  • The trained model helper imports models from Keras without enforcing a training configuration.

Therefore the last layer (as seen when printing the summary) is a dense layer and not an output layer with a loss function. Therefore, to modify nOut of an output layer, we delete the layer vertex, keeping its connections, and add back in a new output layer with the same name, a different nOut, the suitable loss function, etc.

  • Changing nOuts at a layer/vertex will modify nIn of the layers/vertices it fans into.

When changing nOut users can specify a weight initialization scheme or a distribution for the layer as well as a separate weight initialization scheme or distribution for the layers it fans out to.

  • Frozen layer configurations are not saved when writing the model to disk.

In other words, a model with frozen layers when serialized and read back in will not have any frozen layers. To continue training holding specific layers constant the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to “freeze” layers in a dl4j model.

  • On a copy: With the transfer learning API which will return a new model with the relevant frozen layers

  • In place: With the transfer learning helper API which will apply the frozen layers to the given model.

  • FineTune configurations will selectively update learning parameters.

For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.

Utilities

Activations

Supported activation functions.

What are activations?

At a simple level, activation functions help decide whether a neuron should be activated. This helps determine whether the information that the neuron is receiving is relevant for the input. The activation function is a non-linear transformation that happens over an input signal, and the transformed output is sent to the next neuron.

Usage

The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:

GraphBuilder graphBuilder = new NeuralNetConfiguration.Builder()
    // add hyperparameters and other layers
    .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
    // add more layers and output
    .build();

Available activations

ActivationRectifiedTanh

Rectified tanh

Essentially max(0, tanh(x))

Underlying implementation is in native code

ActivationELU

f(x) = alpha * (exp(x) - 1.0) for x < 0; f(x) = x for x >= 0

alpha defaults to 1, if not specified

ActivationReLU

f(x) = max(0, x)

ActivationRationalTanh

f(x) = 1.7159 * tanh(2x/3), where tanh is approximated as: tanh(y) ~ sgn(y) * (1 - 1/(1 + |y| + y^2 + 1.41645*y^4))

Underlying implementation is in native code

ActivationThresholdedReLU

Thresholded RELU

f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0

ActivationReLU6

f(x) = min(max(input, cutoff), 6)

ActivationHardTanH

          ⎧  1, if x >  1
 f(x) =   ⎨ -1, if x < -1
          ⎩  x, otherwise

ActivationSigmoid

f(x) = 1 / (1 + exp(-x))

ActivationGELU

GELU activation function - Gaussian Error Linear Units

ActivationPReLU

Parametrized Rectified Linear Unit (PReLU)

f(x) = alpha x for x < 0, f(x) = x for x >= 0

alpha has the same shape as x and is a learned parameter.

ActivationIdentity

f(x) = x

ActivationSoftSign

f_i(x) = x_i / (1 + |x_i|)

ActivationHardSigmoid

f(x) = min(1, max(0, 0.2x + 0.5))

ActivationSoftmax

f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)

ActivationCube

f(x) = x^3

ActivationRReLU

f(x) = max(0,x) + alpha min(0, x)

alpha is drawn from uniform(l, u) during training and is set to (l+u)/2 during test; l and u default to 1/8 and 1/3 respectively

ActivationTanH

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

ActivationSELU

ActivationLReLU

Leaky ReLU: f(x) = max(0, x) + alpha * min(0, x); alpha defaults to 0.01

ActivationSwish

f(x) = x sigmoid(x)

ActivationSoftPlus

f(x) = log(1+e^x)

Zoo Models

Available models

AlexNet

AlexNet

The model is built in DL4J based on available functionality, and notes indicate where there are gaps waiting for enhancements.

Bias initialization in the paper is 1 in certain layers but 0.1 in the imagenetExample code. The weight distribution uses 0.1 std for all layers in the paper but 0.005 in the dense layers in the imagenetExample code.

Darknet19

FaceNetNN4Small2

InceptionResNetV1

LeNet

LeNet was an early, promising convolutional network, originally applied to handwritten digit recognition (MNIST). References:

NASNet

Implementation of NASNet-A in Deeplearning4j. NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest.

This implementation uses 1056 penultimate filters and an input shape of (3, 224, 224). You can change this.

ResNet50

Residual networks for deep learning.

SimpleCNN

SqueezeNet

An implementation of SqueezeNet. Touts similar accuracy to AlexNet with a fraction of the parameters.

TextGenerationLSTM

LSTM designed for text generation. Can be trained on a corpus of text. For this model, numClasses is

TinyYOLO

String filename = "tiny-yolo-voc.h5"; ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false); INDArray priors = Nd4j.create(priorBoxes);

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder() .seed(seed) .iterations(iterations) .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer) .gradientNormalizationThreshold(1.0) .updater(new Adam.Builder().learningRate(1e-3).build()) .l2(0.00001) .activation(Activation.IDENTITY) .trainingWorkspaceMode(workspaceMode) .inferenceWorkspaceMode(workspaceMode) .build();

ComputationGraph model = new TransferLearning.GraphBuilder(graph) .fineTuneConfiguration(fineTuneConf) .addLayer("outputs", new Yolo2OutputLayer.Builder() .boundingBoxPriors(priors) .build(), "conv2d_9") .setOutputs("outputs") .build();

System.out.println(model.summary(InputType.convolutional(416, 416, 3)));

ModelSerializer.writeModel(model, "tiny-yolo-voc_dl4j_inference.v1.zip", false);

The channels of the 416x416 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

UNet

U-Net

An implementation of U-Net, a deep learning network for image segmentation in Deeplearning4j. The u-net is convolutional network architecture for fast and precise segmentation of images. Up to now it has outperformed the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

VGG16

VGG19

Xception

An implementation of Xception in Deeplearning4j. A novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions.

YOLO2

String filename = "yolo.h5";
KerasLayer.registerCustomLayer("Lambda", KerasSpaceToDepth.class);
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(filename, false);
INDArray priors = Nd4j.create(priorBoxes);
FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
 .seed(seed)
 .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
 .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
 .gradientNormalizationThreshold(1.0)
 .updater(new Adam.Builder().learningRate(1e-3).build())
 .l2(0.00001)
 .activation(Activation.IDENTITY)
 .trainingWorkspaceMode(workspaceMode)
 .inferenceWorkspaceMode(workspaceMode)
 .build();
ComputationGraph model = new TransferLearning.GraphBuilder(graph)
 .fineTuneConfiguration(fineTuneConf)
 .addLayer("outputs", new Yolo2OutputLayer.Builder()
                      .boundingBoxPriors(priors)
                      .build(), "conv2d_23")
 .setOutputs("outputs")
 .build();
System.out.println(model.summary(InputType.convolutional(608, 608, 3)));
ModelSerializer.writeModel(model, "yolo2_dl4j_inference.v1.zip", false);

The channels of the 608x608 input images need to be in RGB order (not BGR), with values normalized within [0, 1].

pretrainedUrl

public String pretrainedUrl(PretrainedType pretrainedType)

Default prior boxes for the model

Model Zoo

Prebuilt model architectures and weights for out-of-the-box application.

Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets that are downloaded automatically and checked for integrity using a checksum mechanism.

If you want to use the new model zoo, you will need to add it as a dependency. A Maven POM would add the following:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-zoo</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

Getting started

Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the ZooModel abstract class and uses the InstantiableModel interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.

Initializing fresh configurations

You can instantly instantiate a model from the zoo using the .init() method. For example, if you want to instantiate a fresh, untrained network of AlexNet you can use the following code:

import org.deeplearning4j.zoo.model.AlexNet
import org.deeplearning4j.zoo.*;

...

int numberOfClassesInYourData = 1000;
int randomSeed = 123;

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
Model net = zooModel.init();

If you want to tune parameters or change the optimization algorithm, you can obtain a reference to the underlying network configuration:

ZooModel zooModel = AlexNet.builder()
                .numClasses(numberOfClassesInYourData)
                .seed(randomSeed)
                .build();
MultiLayerConfiguration net = ((AlexNet) zooModel).conf();

Initializing pretrained weights

Some models have pretrained weights available, and a small number of models are pretrained across different datasets. PretrainedType is an enumerator that outlines different weight types, which includes IMAGENET, MNIST, CIFAR10, and VGGFACE.

For example, you can initialize a VGG-16 model with ImageNet weights like so:

import org.deeplearning4j.zoo.model.VGG16;
import org.deeplearning4j.zoo.*;

...

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.IMAGENET);

And initialize another VGG16 model with weights trained on VGGFace:

ZooModel zooModel = VGG16.builder().build();
Model net = zooModel.initPretrained(PretrainedType.VGGFACE);

If you're not sure whether a model contains pretrained weights, you can use the .pretrainedAvailable() method which returns a boolean. Simply pass a PretrainedType enum to this method, which returns true if weights are available.
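For example, a minimal sketch:

ZooModel zooModel = VGG16.builder().build();
if (zooModel.pretrainedAvailable(PretrainedType.IMAGENET)) {
    // Only attempt the download if ImageNet weights exist for this model
    Model net = zooModel.initPretrained(PretrainedType.IMAGENET);
}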

Note that for convolutional models, input shape information follows the NCHW convention. So if a model's input shape default is new int[]{3, 224, 224}, this means the model has 3 channels and height/width of 224.

What's in the zoo?

The model zoo comes with well-known image recognition configurations in the deep learning community. The zoo also includes an LSTM for text generation, and a simple CNN for general image recognition.

This includes ImageNet models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.

Advanced usage

The zoo comes with a couple additional features if you're looking to use the models for different use cases.

Changing Inputs

Aside from passing certain configuration information to the constructor of a zoo model, you can also change its input shape using .setInputShape().

NOTE: this applies to fresh configurations only, and will not affect pretrained models:

int numberOfClassesInYourData = 10;
int randomSeed = 123;

ZooModel zooModel = ResNet50.builder()
        .numClasses(numberOfClassesInYourData)
        .seed(randomSeed)
        .build();
zooModel.setInputShape(new int[][]{{3, 28, 28}});

Transfer Learning

Workspaces

Computation Graph

How to build complex networks with DL4J computation graph.

Building Complex Network Architectures with Computation Graph

This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.

Overview of Computation Graph

DL4J has two types of networks comprised of multiple layers: the MultiLayerNetwork and the ComputationGraph.

Specifically, the ComputationGraph allows for networks to be built with the following features:

  • Multiple network input arrays

  • Multiple network outputs (including mixed classification/regression architectures)

  • Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)

As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure: MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well - though the configuration may be a little more complicated.

Computation Graph: Some Example Use Cases

Examples of some architectures that can be built using ComputationGraph include:

  • Multi-task learning architectures

  • Recurrent neural networks with skip connections

Configuring a Computation Graph

Types of Graph Vertices

  • Input Vertices

  • Element-wise operation vertices

  • Merge vertices

  • Subset vertices

  • Preprocessor vertices

These types of graph vertices are described briefly below.

InputVertex: Input vertices are specified by the addInputs(String...) method in your configuration. The strings used as inputs can be arbitrary - they are user-defined labels, and can be referenced later in the configuration. The number of strings provided define the number of inputs; the order of the input also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).

ElementWiseVertex: Element-wise operation vertices do for example an element-wise addition or subtraction of the activations out of one or more other vertices. Thus, the activations used as input for the ElementWiseVertex must all be the same size, and the output size of the elementwise vertex is the same as the inputs.

MergeVertex: The MergeVertex concatenates/merges the input activations. For example, if a MergeVertex has 2 inputs of size 5 and 10 respectively, then output size will be 5+10=15 activations. For convolutional network activations, examples are merged along the depth: so suppose the activations from one layer have 4 features and the other has 5 features (both with (4 or 5) x width x height activations), then the output will have (4+5) x width x height activations.

SubsetVertex: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use .addVertex("subset1", new SubsetVertex(0,4), "layer1"): this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.

Example 1: Recurrent Network with Skip Connections

Suppose we wish to build the following recurrent neural network architecture:

For the sake of this example, lets assume our input data is of size 5. Our configuration would be as follows:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input") //can use any label for this
    .addLayer("L1", new GravesLSTM.Builder().nIn(5).nOut(5).build(), "input")
    .addLayer("L2",new RnnOutputLayer.Builder().nIn(5+5).nOut(5).build(), "input", "L1")
    .setOutputs("L2")    //We need to specify the network outputs and their order
    .build();

ComputationGraph net = new ComputationGraph(conf);
net.init();

Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.

Example 2: Multiple Inputs and Merge Vertex

Consider the following architecture:

Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)) then the output size of the merge vertex is 4+4=8 activations.

To build the above network, we use the following configuration:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
    .graphBuilder()
    .addInputs("input1", "input2")
    .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
    .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
    .addVertex("merge", new MergeVertex(), "L1", "L2")
    .addLayer("out", new OutputLayer.Builder().nIn(4+4).nOut(3).build(), "merge")
    .setOutputs("out")
    .build();

Example 3: Multi-Task Learning

In multi-task learning, a neural network is used to make multiple independent predictions. Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.

In this case, the network configuration is:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
        .addLayer("out1", new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(4).nOut(3).build(), "L1")
        .addLayer("out2", new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.MSE)
                .nIn(4).nOut(2).build(), "L1")
        .setOutputs("out1","out2")
        .build();

Automatically Adding PreProcessors and Calculating nIns

One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the .setInputTypes(InputType...) method in the configuration.

The setInputType method has two effects:

  1. It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the setInputTypes(InputType...) functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If the .nIn(x) is specified for a layer, the network will not override this when using the InputType functionality.

  2. It will automatically add any InputPreProcessors that are required (for example, a CNN-to-feed-forward preprocessor between a convolutional layer and a dense layer).

For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use .setInputTypes(InputType.convolutional(height,width,depth), InputType.feedForward(feedForwardInputSize))
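A minimal sketch for the single-input case (sizes are illustrative only): the .nIn(x) values are omitted below and are inferred from the declared input type, and the necessary preprocessor is added between the convolutional and dense layers automatically.

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .graphBuilder()
        .addInputs("input")
        .addLayer("cnn", new ConvolutionLayer.Builder(5, 5).nOut(16).build(), "input")
        .addLayer("dense", new DenseLayer.Builder().nOut(64).build(), "cnn")
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nOut(10).build(), "dense")
        .setOutputs("out")
        .setInputTypes(InputType.convolutional(28, 28, 1))   // height, width, depth (channels)
        .build();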

Training Data for ComputationGraph

There are two types of data that can be used with the ComputationGraph.

DataSet and the DataSetIterator

The DataSet class was originally designed for use with the MultiLayerNetwork; however, it can also be used with ComputationGraph - but only if that computation graph has a single input and output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).

MultiDataSet and the MultiDataSetIterator

MultiDataSet is the multiple input and/or multiple output version of DataSet. It may also include multiple mask arrays (for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator, unless you are dealing with multiple inputs and/or multiple outputs.

There are currently two ways to use a MultiDataSetIterator: by implementing one manually, or by using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers.

The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:

  • Multiple DataVec RecordReaders may be used simultaneously

  • The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader

  • It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output

  • It is possible to convert single columns from a class index to a one-hot representation

Example 1: Regression Data (RecordReaderMultiDataSetIterator)

Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:

int numLinesToSkip = 0;
String fileDelimiter = ",";
RecordReader rr = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String csvPath = "/path/to/my/file.csv";
rr.initialize(new FileSplit(new File(csvPath)));

int batchSize = 4;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("myReader",rr)
        .addInput("myReader",0,2)  //Input: columns 0 to 2 inclusive
        .addOutput("myReader",3,4) //Output: columns 3 to 4 inclusive
        .build();

Example 2: Classification and Multi-Task Learning (RecordReaderMultiDataSetIterator)

Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture, whereby we have two outputs - one for regression and one for classification. For this example, let's assume the data is as follows:

  • Input file: myInput.csv, and we want to use all columns as input (without modification)

  • Output file: myOutput.csv.

    • Network output 1 - regression: columns 0 to 3

    • Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.

In this case, we can build our iterator as follows:

int numLinesToSkip = 0;
String fileDelimiter = ",";

RecordReader featuresReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String featuresCsvPath = "/path/to/my/myInput.csv";
featuresReader.initialize(new FileSplit(new File(featuresCsvPath)));

RecordReader labelsReader = new CSVRecordReader(numLinesToSkip,fileDelimiter);
String labelsCsvPath = "/path/to/my/myOutput.csv";
labelsReader.initialize(new FileSplit(new File(labelsCsvPath)));

int batchSize = 4;
int numClasses = 3;
MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
        .addReader("csvInput", featuresReader)
        .addReader("csvLabels", labelsReader)
        .addInput("csvInput") //Input: all columns from input reader
        .addOutput("csvLabels", 0, 3) //Output 1: columns 0 to 3 inclusive
        .addOutputOneHot("csvLabels", 4, numClasses)   //Output 2: column 4 -> convert to one-hot for classification
        .build();
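Once built, a MultiDataSetIterator can be passed directly to the ComputationGraph fit methods. A minimal sketch (assuming conf is a ComputationGraphConfiguration whose inputs/outputs match the iterator, and nEpochs is the number of training epochs):

ComputationGraph net = new ComputationGraph(conf);
net.init();

for (int i = 0; i < nEpochs; i++) {
    net.fit(iterator);   //fit() accepts a MultiDataSetIterator directly
}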

Layers

Supported neural network layers.

What are layers?

Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they form a deep neural network.

Using layers

All layers available in Eclipse Deeplearning4j can be used either in a MultiLayerNetwork or ComputationGraph. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.
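For example (a minimal sketch with arbitrary layer sizes), a MultiLayerNetwork is configured by passing layer configurations to the list builder; the layers themselves are instantiated when init() is called:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Sgd(0.01))
        .list()
        .layer(new DenseLayer.Builder().nIn(784).nOut(100).activation(Activation.RELU).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(100).nOut(10).activation(Activation.SOFTMAX).build())
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();   //layer configurations are instantiated here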

Layers vs. vertices

If you are configuring complex networks such as InceptionV4, you will need to use the ComputationGraph API and join different branches together using vertices. See the vertices documentation for more information.

General layers

ActivationLayer

Activation layer is a simple layer that applies the specified activation function to the input activations

clone

public ActivationLayer clone()
  • param activation Activation function for the layer

activation

public Builder activation(String activationFunction)

Activation function for the layer

activation

public Builder activation(IActivation activationFunction)
  • param activationFunction Activation function for the layer

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

DenseLayer

Dense layer: a standard fully connected feed forward layer

hasBias

public Builder hasBias(boolean hasBias)

If true (default): include bias parameters in the model. False: no bias.

hasLayerNorm

public Builder hasLayerNorm(boolean hasLayerNorm)

If true (default = false): enable layer normalization on this layer
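A short sketch of a DenseLayer configuration using these builder options (the sizes and activation function are arbitrary):

DenseLayer dense = new DenseLayer.Builder()
        .nIn(128)
        .nOut(64)
        .activation(Activation.RELU)
        .hasBias(true)        //default: bias enabled
        .hasLayerNorm(false)  //default: layer normalization disabled
        .build();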

DropoutLayer

Dropout layer. This layer simply applies dropout at training time, and passes activations through unmodified at test

build

public DropoutLayer build()

Create a dropout layer with standard {@link Dropout}, with the specified probability of retaining the input activation. See {@link Dropout} for the full details

  • param dropout Activation retain probability.

EmbeddingLayer

Embedding layer: feed-forward layer that expects single integers per example as input (class numbers, in range 0 to numClasses-1). This input has shape [numExamples, 1] rather than [numExamples, numClasses] for the equivalent one-hot representation. Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however, it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply with all but one value being zero). Note: can only be used as the first layer for a network Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding for each example. Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)
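A minimal sketch of an EmbeddingLayer configuration (the vocabulary and embedding sizes are arbitrary placeholders):

EmbeddingLayer embedding = new EmbeddingLayer.Builder()
        .nIn(10000)                      //number of classes / vocabulary size
        .nOut(300)                       //embedding vector size
        .activation(Activation.IDENTITY) //identity: output the embedding vectors unmodified
        .hasBias(false)                  //default: no bias
        .build();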

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

  • param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

  • param vectors Vectors to initialize the embedding layer with

EmbeddingSequenceLayer

Embedding layer for sequences: feed-forward layer that expects fixed-length number (inputLength) of integers/indices per example as input, ranged from 0 to numClasses - 1. This input thus has shape [numExamples, inputLength] or shape [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), namely of shape [numExamples, nOut, inputLength]. Note: can only be used as the first layer for a network Note 2: For a given example index i, the output is activationFunction(weights.getRow(i) + bias), hence the weight rows can be considered a vector/embedding of each index. Note also that embedding layer has an activation function (set to IDENTITY to disable) and optional bias (which is disabled by default)

hasBias

public Builder hasBias(boolean hasBias)

If true: include bias parameters in the layer. False (default): no bias.

inputLength

public Builder inputLength(int inputLength)

Set input sequence length for this embedding layer.

  • param inputLength input sequence length

  • return Builder

inferInputLength

public Builder inferInputLength(boolean inferInputLength)

Set input sequence inference mode for embedding layer.

  • param inferInputLength whether to infer input length

  • return Builder

weightInit

public Builder weightInit(EmbeddingInitializer embeddingInitializer)

Initialize the embedding layer using the specified EmbeddingInitializer - such as a Word2Vec instance

  • param embeddingInitializer Source of the embedding layer weights

weightInit

public Builder weightInit(INDArray vectors)

Initialize the embedding layer using values from the specified array. Note that the array should have shape [vocabSize, vectorSize]. After copying values from the array to initialize the network parameters, the input array will be discarded (so that, if necessary, it can be garbage collected)

  • param vectors Vectors to initialize the embedding layer with

GlobalPoolingLayer

Global pooling layer - used to do pooling over time for RNNs, and 2d pooling for CNNs. Supports the following pooling types: MAX, AVG, SUM, PNORM

Global pooling layer can also handle mask arrays when dealing with variable length inputs. Mask arrays are assumed to be 2d, and are fed forward through the network during training or post-training forward pass:

  • Time series: mask arrays are shape [miniBatchSize, maxTimeSeriesLength] and contain values 0 or 1 only

  • CNNs: mask arrays have shape [miniBatchSize, height] or [miniBatchSize, width]. Important: the current implementation assumes that for CNNs + variable length (masking), the input shape is [miniBatchSize, channels, height, 1] or [miniBatchSize, channels, 1, width] respectively. This is the case with global pooling in architectures like CNN for sentence classification.

Behaviour with default settings:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

Alternatively, by setting collapseDimensions = false in the configuration, it is possible to retain the reduced dimensions as 1s: this gives

  • [miniBatchSize, vectorSize, 1] for RNN output,

  • [miniBatchSize, channels, 1, 1] for CNN output, and

  • [miniBatchSize, channels, 1, 1, 1] for CNN3D output.
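A short sketch of a GlobalPoolingLayer configuration using these options:

GlobalPoolingLayer pooling = new GlobalPoolingLayer.Builder()
        .poolingType(PoolingType.MAX)   //MAX, AVG, SUM or PNORM
        .collapseDimensions(true)       //default: collapse the pooled dimensions
        .build();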

poolingDimensions

public Builder poolingDimensions(int... poolingDimensions)

Pooling dimensions for global pooling

poolingType

public Builder poolingType(PoolingType poolingType)
  • param poolingType Pooling type for global pooling

collapseDimensions

public Builder collapseDimensions(boolean collapseDimensions)

Whether to collapse dimensions when pooling or not. Usually you do want to do this. Default: true. If true:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 2d output [miniBatchSize, vectorSize]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 2d output [miniBatchSize, channels]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 2d output [miniBatchSize, channels]

If false:

  • 3d (time series) input with shape [miniBatchSize, vectorSize, timeSeriesLength] -> 3d output [miniBatchSize, vectorSize, 1]

  • 4d (CNN) input with shape [miniBatchSize, channels, height, width] -> 4d output [miniBatchSize, channels, 1, 1]

  • 5d (CNN3D) input with shape [miniBatchSize, channels, depth, height, width] -> 5d output [miniBatchSize, channels, 1, 1, 1]

  • param collapseDimensions Whether to collapse the dimensions or not

pnorm

public Builder pnorm(int pnorm)

P-norm constant. Only used if using {@link PoolingType#PNORM} for the pooling type

  • param pnorm P-norm constant

LocalResponseNormalization

k

public Builder k(double k)

LRN scaling constant k. Default: 2

n

public Builder n(double n)

Number of adjacent kernel maps to use when doing LRN. default: 5

  • param n Number of adjacent kernel maps

alpha

public Builder alpha(double alpha)

LRN scaling constant alpha. Default: 1e-4

  • param alpha Scaling constant

beta

public Builder beta(double beta)

Scaling constant beta. Default: 0.75

  • param beta Scaling constant

cudnnAllowFallback

public Builder cudnnAllowFallback(boolean allowFallback)

When using CuDNN and an error is encountered, should fallback to the non-CuDNN implementation be allowed? If set to false, an exception in CuDNN will be propagated back to the user. If true, the built-in (non-CuDNN) implementation for LocalResponseNormalization will be used

  • param allowFallback Whether fallback to non-CuDNN implementation should be used

LocallyConnected1D

SameDiff version of a 1D locally connected layer.

nIn

public Builder nIn(int nIn)

Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)
  • param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

kernelSize

public Builder kernelSize(int k)
  • param k Kernel size for the layer

stride

public Builder stride(int s)
  • param s Stride for the layer

padding

public Builder padding(int p)
  • param p Padding for the layer. Not used if {@link ConvolutionMode#Same} is set

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)
  • param cm Convolution mode for the layer. See {@link ConvolutionMode} for details

dilation

public Builder dilation(int d)
  • param d Dilation for the layer

hasBias

public Builder hasBias(boolean hasBias)
  • param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int inputSize)

Set input filter size for this locally connected 1D layer

  • param inputSize height of the input filters

  • return Builder

LocallyConnected2D

SameDiff version of a 2D locally connected layer.

setKernel

public void setKernel(int... kernel)

Kernel size for the layer. Must be 2 values (height/width)

setStride

public void setStride(int... stride)
  • param stride Stride for the layer. Must be 2 values (height/width)

setPadding

public void setPadding(int... padding)
  • param padding Padding for the layer. Not used if {@link ConvolutionMode#Same} is set. Must be 2 values (height/width)

setDilation

public void setDilation(int... dilation)
  • param dilation Dilation for the layer. Must be 2 values (height/width)

nIn

public Builder nIn(int nIn)
  • param nIn Number of inputs to the layer (input size)

nOut

public Builder nOut(int nOut)
  • param nOut Number of outputs (output size)

activation

public Builder activation(Activation activation)
  • param activation Activation function for the layer

kernelSize

public Builder kernelSize(int... k)
  • param k Kernel size for the layer. Must be 2 values (height/width)

stride

public Builder stride(int... s)
  • param s Stride for the layer. Must be 2 values (height/width)

padding

public Builder padding(int... p)
  • param p Padding for the layer. Not used if {@link ConvolutionMode#Same} is set. Must be 2 values (height/width)

convolutionMode

public Builder convolutionMode(ConvolutionMode cm)
  • param cm Convolution mode for the layer. See {@link ConvolutionMode} for details

dilation

public Builder dilation(int... d)
  • param d Dilation for the layer. Must be 2 values (height/width)

hasBias

public Builder hasBias(boolean hasBias)
  • param hasBias If true (default is false) the layer will have a bias

setInputSize

public Builder setInputSize(int... inputSize)

Set input filter size (h,w) for this locally connected 2D layer

  • param inputSize pair of height and width of the input filters to this layer

  • return Builder

LossLayer

LossLayer is a flexible output layer that performs a loss function on an input without MLP logic. LossLayer does not have any parameters. Consequently, setting nIn/nOut isn’t supported - the output size is the same size as the input activations.

nIn

public Builder nIn(int nIn)
  • param lossFunction Loss function for the loss layer

OutputLayer

Output layer used for training via backpropagation based on labels and a specified loss function. Can be configured for both classification and regression. Note that OutputLayer has parameters - it contains a fully-connected layer (effectively contains a DenseLayer) internally. This allows the output size to be different to the layer input size.
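For example, a softmax classification output layer might be configured as follows (a sketch; the sizes are placeholders):

OutputLayer out = new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(128)                       //size of the previous layer's activations
        .nOut(10)                       //number of classes
        .activation(Activation.SOFTMAX)
        .build();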

build

public OutputLayer build()
  • param lossFunction Loss function for the output layer

Pooling1D

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Pooling2D

Supports the following pooling types: MAX, AVG, SUM, PNORM, NONE

Subsampling1DLayer

1D (temporal) subsampling (pooling) layer. Expects RNN-style input activations of shape [minibatch, channels, sequenceLength]. This layer accepts RNN InputTypes instead of CNN InputTypes.

Supports the following pooling types: MAX, AVG, SUM, PNORM

setKernelSize

public void setKernelSize(int... kernelSize)

Kernel size

  • param kernelSize kernel size

setStride

public void setStride(int... stride)

Stride

  • param stride stride value

setPadding

public void setPadding(int... padding)

Padding

  • param padding padding value

Upsampling1D

Upsampling 1D layer. Repeats each step of the input size times along the sequence dimension: input of shape [minibatch, channels, sequenceLength] produces output of shape [minibatch, channels, size * sequenceLength]. Example:

If input (for a single example, with channels down page, and sequence from left to right) is:
[ A1, A2, A3]
[ B1, B2, B3]
Then output with size = 2 is:
[ A1, A1, A2, A2, A3, A3]
[ B1, B1, B2, B2, B3, B3]

size

public Builder size(int size)

Upsampling size

  • param size upsampling size in single spatial dimension of this 1D layer

size

public Builder size(int[] size)

Upsampling size int array with a single element. Array must be length 1

  • param size upsampling size in single spatial dimension of this 1D layer

Upsampling2D

Upsampling 2D layer. Repeats each value (or rather, set of depth values) in the height and width dimensions by size[0] and size[1] times respectively. Example:

Input (slice for one example and channel)
[ A, B ]
[ C, D ]
Size = [2, 2]
Output (slice for one example and channel)
[ A, A, B, B ]
[ A, A, B, B ]
[ C, C, D, D ]
[ C, C, D, D ]

size

public Builder size(int size)

Upsampling size int, used for both height and width

  • param size upsampling size in height and width dimensions

size

public Builder size(int[] size)

Upsampling size array

  • param size upsampling size in height and width dimensions

Upsampling3D

Upsampling 3D layer. Repeats each value (all channel values for each x/y/z location) by size[0], size[1] and size[2] times respectively in the depth, height and width dimensions: input of shape [minibatch, channels, depth, height, width] produces output of shape [minibatch, channels, size[0] * depth, size[1] * height, size[2] * width].

size

public Builder size(int size)

Upsampling size as int, so same upsampling size is used for depth, width and height

  • param size upsampling size in height, width and depth dimensions

size

public Builder size(int[] size)

Upsampling size as an int array of length 3, so that different upsampling sizes can be used for the depth, height and width dimensions

  • param size upsampling size in height, width and depth dimensions

ZeroPadding1DLayer

Zero padding 1D layer for convolutional neural networks. Allows padding to be done separately for top and bottom.

setPadding

public void setPadding(int... padding)

Padding value for left and right. Must be length 2 array

build

public ZeroPadding1DLayer build()
  • param padding Padding for both the left and right

ZeroPadding3DLayer

Zero padding 3D layer for convolutional neural networks. Allows padding to be done separately for “left” and “right” in all three spatial dimensions.

setPadding

public void setPadding(int... padding)

[padLeftD, padRightD, padLeftH, padRightH, padLeftW, padRightW]

build

public ZeroPadding3DLayer build()
  • param padding Padding for both the left and right in all three spatial dimensions

ZeroPaddingLayer

Zero padding layer for convolutional neural networks (2D CNNs). Allows padding to be done separately for top/bottom/left/right

setPadding

public void setPadding(int... padding)

Padding value for top, bottom, left, and right. Must be length 4 array

build

public ZeroPaddingLayer build()
  • param padHeight Padding for both the top and bottom

  • param padWidth Padding for both the left and right

ElementWiseMultiplicationLayer

Elementwise multiplication layer with weights: implements out = activationFn(input .* w + b), where:

  • w is a learnable weight vector of length nOut

  • “.” is element-wise multiplication

  • b is a bias vector

Note that the input and output sizes of the element-wise layer are the same for this layer

getMemoryReport

public LayerMemoryReport getMemoryReport(InputType inputType)

This is a report of the estimated memory consumption for the given layer

  • param inputType Input type to the layer. Memory consumption is often a function of the input type

  • return Memory report for the layer

RepeatVector

RepeatVector layer configuration.

RepeatVector takes a mini-batch of vectors of shape (mb, length) and a repeat factor n and outputs a 3D tensor of shape (mb, n, length) in which each input vector is repeated n times.

getRepetitionFactor

public int getRepetitionFactor()

Get the repetition factor for the RepeatVector layer

setRepetitionFactor

public void setRepetitionFactor(int n)

Set the repetition factor for the RepeatVector layer

  • param n Repetition factor (number of times each input vector is repeated)

repetitionFactor

public Builder repetitionFactor(int n)

Set the repetition factor for the RepeatVector layer

  • param n Repetition factor (number of times each input vector is repeated)

Yolo2OutputLayer

Note: Input activations to the Yolo2OutputLayer should have shape: [minibatch, b(5+c), H, W], where: b = number of bounding boxes (determined by config - see papers for details) c = number of classes H = output/label height W = output/label width

Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer should have output depth of b(5+c). Thus if you change the number of bounding boxes, or change the number of object classes, the number of channels (nOut of the last convolution layer) needs to also change. Label format: [minibatch, 4+C, H, W] Order for labels depth: [x1,y1,x2,y2,(class labels)] x1 = box top left position y1 = as above, y axis x2 = box bottom right position y2 = as above y axis Note: labels are represented as a multiple of grid size - for a 13x13 grid, (0,0) is top left, (13,13) is bottom right Note also that mask arrays are not required - this implementation infers the presence or absence of objects in each grid cell from the class labels (which should be 1-hot if an object is present, or all 0s otherwise).
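A sketch of configuring the output layer with bounding box priors (the prior values below are arbitrary placeholders; in practice priors are often estimated from the training data, for example via k-means over the label boxes):

//Bounding box priors: shape [numBoxes, 2] = [width, height], in grid-cell units (1.0 = one grid cell)
INDArray priors = Nd4j.create(new double[][]{
        {1.0, 1.0},
        {2.5, 2.5},
        {5.0, 5.0}});

Yolo2OutputLayer yoloOutput = new Yolo2OutputLayer.Builder()
        .boundingBoxPriors(priors)
        .build();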

lambdaCoord

public Builder lambdaCoord(double lambdaCoord)

Loss function coefficient for position and size/scale components of the loss function. Default (as per paper): 5

lambbaNoObj

public Builder lambbaNoObj(double lambdaNoObj)

Loss function coefficient for the “no object confidence” components of the loss function. Default (as per paper): 0.5

  • param lambdaNoObj Lambda value for no-object (confidence) component of the loss function

lossPositionScale

public Builder lossPositionScale(ILossFunction lossPositionScale)

Loss function for position/scale component of the loss function

  • param lossPositionScale Loss function for position/scale

lossClassPredictions

public Builder lossClassPredictions(ILossFunction lossClassPredictions)

Loss function for the class predictions - defaults to L2 loss (i.e., sum of squared errors, as per the paper), however LossMCXENT could also be used (which is more common for classification).

  • param lossClassPredictions Loss function for the class prediction error component of the YOLO loss function

boundingBoxPriors

public Builder boundingBoxPriors(INDArray boundingBoxes)

Bounding box priors dimensions [width, height]. For N bounding boxes, input has shape [rows, columns] = [N, 2] Note that dimensions should be specified as fraction of grid size. For example, a network with 13x13 output, a value of 1.0 would correspond to one grid cell; a value of 13 would correspond to the entire image.

  • param boundingBoxes Bounding box prior dimensions (width, height)

MaskLayer

MaskLayer applies the mask array to the forward pass activations, and backward pass gradients, passing through this layer. It can be used with 2d (feed-forward), 3d (time series) or 4d (CNN) activations.

MaskZeroLayer

Wrapper which masks timesteps with activation equal to the specified masking value (0.0 default). Assumes that the input shape is [batch_size, input_size, timesteps].

Auto Encoders

What are autoencoders?

Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.

Where’s Restricted Boltzmann Machine?

RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.

Supported layers

AutoEncoder

Autoencoder layer. Adds noise to the input and learns a reconstruction function.

corruptionLevel

public Builder corruptionLevel(double corruptionLevel)

Level of corruption - 0.0 (none) to 1.0 (all values corrupted)

sparsity

public Builder sparsity(double sparsity)

Autoencoder sparsity parameter

  • param sparsity Sparsity

VariationalAutoencoder

Variational Autoencoder layer

This implementation allows multiple encoder and decoder layers, the number and sizes of which can be set independently.

A note on scores during pretraining: This implementation minimizes the negative of the variational lower bound objective as described in Kingma & Welling; the mathematics in that paper is based on maximization of the variational lower bound instead. Thus, scores reported during pretraining in DL4J are the negative of the variational lower bound equation in the paper. The backpropagation and learning procedure is otherwise as described there.
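A minimal sketch of a VariationalAutoencoder configuration for binary-valued data (the sizes are placeholders):

VariationalAutoencoder vae = new VariationalAutoencoder.Builder()
        .nIn(784)                          //input size
        .nOut(32)                          //size of the latent state Z
        .encoderLayerSizes(256, 128)
        .decoderLayerSizes(128, 256)
        .pzxActivationFunction(Activation.IDENTITY)
        .reconstructionDistribution(new BernoulliReconstructionDistribution(Activation.SIGMOID.getActivationFunction()))
        .build();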

encoderLayerSizes

public Builder encoderLayerSizes(int... encoderLayerSizes)

Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers (set via {@link #decoderLayerSizes(int...)}) is similar to the encoder layers.

setEncoderLayerSizes

public void setEncoderLayerSizes(int... encoderLayerSizes)

Size of the encoder layers, in units. Each encoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers (set via {@link #decoderLayerSizes(int...)}) is similar to the encoder layers.

  • param encoderLayerSizes Size of each encoder layer in the variational autoencoder

decoderLayerSizes

public Builder decoderLayerSizes(int... decoderLayerSizes)

Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers is similar to the encoder layers (set via {@link #encoderLayerSizes(int...)}).

  • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

setDecoderLayerSizes

public void setDecoderLayerSizes(int... decoderLayerSizes)

Size of the decoder layers, in units. Each decoder layer is functionally equivalent to a {@link org.deeplearning4j.nn.conf.layers.DenseLayer}. Typically the number and size of the decoder layers is similar to the encoder layers (set via {@link #encoderLayerSizes(int...)}).

  • param decoderLayerSizes Size of each decoder layer in the variational autoencoder

reconstructionDistribution

public Builder reconstructionDistribution(ReconstructionDistribution distribution)

The reconstruction distribution for the data given the hidden state - i.e., P(data|Z). This should be selected carefully based on the type of data being modelled. For example:

  • {@link GaussianReconstructionDistribution} + {identity or tanh} for real-valued (Gaussian) data

  • {@link BernoulliReconstructionDistribution} + sigmoid for binary-valued (0 or 1) data

  • param distribution Reconstruction distribution

lossFunction

public Builder lossFunction(IActivation outputActivationFn, LossFunctions.LossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

lossFunction

public Builder lossFunction(Activation outputActivationFn, LossFunctions.LossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

lossFunction

public Builder lossFunction(IActivation outputActivationFn, ILossFunction lossFunction)

Configure the VAE to use the specified loss function for the reconstruction, instead of a ReconstructionDistribution. Note that this is NOT following the standard VAE design (as per Kingma & Welling), which assumes a probabilistic output - i.e., some p(x|z). It is however a valid network configuration, allowing for optimization of more traditional objectives such as mean squared error. Note: clearly, setting the loss function here will override any previously set reconstruction distribution

  • param outputActivationFn Activation function for the output/reconstruction

  • param lossFunction Loss function to use

pzxActivationFn

public Builder pzxActivationFn(IActivation activationFunction)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

  • param activationFunction Activation function for p(z| x)

pzxActivationFunction

public Builder pzxActivationFunction(Activation activation)

Activation function for the input to P(z|data). Care should be taken with this, as some activation functions (relu, etc) are not suitable due to being bounded in range [0,infinity).

  • param activation Activation function for p(z | x)

nOut

public Builder nOut(int nOut)

Set the size of the VAE state Z. This is the output size during standard forward pass, and the size of the distribution P(Z|data) during pretraining.

  • param nOut Size of P(Z | data) and output size

numSamples

public Builder numSamples(int numSamples)

Set the number of samples per data point (from VAE state Z) used when doing pretraining. Default value: 1.

This is parameter L from Kingma and Welling: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”

  • param numSamples Number of samples per data point for pretraining

Convolutional Layers

Also known as CNN.

Available layers

Convolution1D

1D convolution layer. Expects input activations of shape [minibatch,channels,sequenceLength]

Convolution2D

2D convolution layer

Convolution3D

3D convolution layer configuration

hasBias

public boolean hasBias()

An optional dataFormat: “NDHWC” or “NCDHW”. Defaults to “NCDHW”. The data format of the input and output data. For “NCDHW” (also known as ‘channels first’ format), the data storage order is: [batchSize, inputChannels, inputDepth, inputHeight, inputWidth]. For “NDHWC” (‘channels last’ format), the data is stored in the order of: [batchSize, inputDepth, inputHeight, inputWidth, inputChannels].

kernelSize

public Builder kernelSize(int... kernelSize)

Set kernel size for 3D convolutions in (depth, height, width) order

  • param kernelSize kernel size

  • return 3D convolution layer builder

stride

public Builder stride(int... stride)

Set stride size for 3D convolutions in (depth, height, width) order

  • param stride stride size

  • return 3D convolution layer builder

padding

public Builder padding(int... padding)

Set padding size for 3D convolutions in (depth, height, width) order

  • param padding padding size

  • return 3D convolution layer builder

dilation

public Builder dilation(int... dilation)

Set dilation size for 3D convolutions in (depth, height, width) order

  • param dilation dilation size

  • return 3D convolution layer builder

dataFormat

public Builder dataFormat(DataFormat dataFormat)

The data format for input and output activations. NCDHW: activations (in/out) should have shape [minibatch, channels, depth, height, width] NDHWC: activations (in/out) should have shape [minibatch, depth, height, width, channels]

  • param dataFormat Data format to use for activations

setKernelSize

public void setKernelSize(int... kernelSize)

Set kernel size for 3D convolutions in (depth, height, width) order

  • param kernelSize kernel size

setStride

public void setStride(int... stride)

Set stride size for 3D convolutions in (depth, height, width) order

  • param stride stride size

setPadding

public void setPadding(int... padding)

Set padding size for 3D convolutions in (depth, height, width) order

  • param padding padding size

setDilation

public void setDilation(int... dilation)

Set dilation size for 3D convolutions in (depth, height, width) order

  • param dilation dilation size

Deconvolution2D

2D deconvolution layer configuration

Deconvolutions are also known as transpose convolutions or fractionally strided convolutions. In essence, deconvolutions swap forward and backward pass with regular 2D convolutions.
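A brief sketch of a Deconvolution2D configuration (kernel, stride and channel counts are placeholders):

Deconvolution2D deconv = new Deconvolution2D.Builder()
        .kernelSize(2, 2)
        .stride(2, 2)
        .nIn(64)                                //input channels
        .nOut(32)                               //output channels (number of filters)
        .convolutionMode(ConvolutionMode.Same)
        .build();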

hasBias

public boolean hasBias()

Deconvolution2D layer. nIn in the input layer is the number of channels; nOut is the number of filters to be used in the net (in other words, the number of output channels). The builder specifies the filter/kernel size, the stride and the padding.

convolutionMode

public Builder convolutionMode(ConvolutionMode convolutionMode)

Set the convolution mode for the Convolution layer. See {@link ConvolutionMode} for more details

  • param convolutionMode Convolution mode for layer

kernelSize

public Builder kernelSize(int... kernelSize)

Size of the convolution rows/columns

  • param kernelSize the height and width of the kernel

Cropping1D

Cropping layer for convolutional (1d) neural networks. Allows cropping to be done separately for top/bottom

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)
  • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

build

public Cropping1D build()
  • param cropping Cropping amount for top/bottom (in that order). Must be length 1 or 2 array.

Cropping2D

Cropping layer for convolutional (2d) neural networks. Allows cropping to be done separately for top/bottom/left/right

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)
  • param cropTopBottom Amount of cropping to apply to both the top and the bottom of the input activations

  • param cropLeftRight Amount of cropping to apply to both the left and the right of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount for top/bottom/left/right (in that order). A length 4 array.

build

public Cropping2D build()
  • param cropping Cropping amount for top/bottom/left/right (in that order). Must be length 4 array.

Cropping3D

Cropping layer for convolutional (3d) neural networks. Allows cropping to be done separately for upper and lower bounds of depth, height and width dimensions.

getOutputType

public InputType getOutputType(int layerIndex, InputType inputType)
  • param cropDepth Amount of cropping to apply to both depth boundaries of the input activations

  • param cropHeight Amount of cropping to apply to both height boundaries of the input activations

  • param cropWidth Amount of cropping to apply to both width boundaries of the input activations

setCropping

public void setCropping(int... cropping)

Cropping amount, a length 6 array, i.e. crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

build

public Cropping3D build()
  • param cropping Cropping amount, must be length 3 or 6 array, i.e. either crop depth, crop height, crop width or crop left depth, crop right depth, crop left height, crop right height, crop left width, crop right width

DataSet Iterators

Data iteration tools for loading into neural networks.

What is an iterator?

A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data, or automatic loading of common benchmarking datasets such as MNIST and IRIS.

Usage

For most use cases, initializing an iterator and passing a reference to a MultiLayerNetwork or ComputationGraph fit() method is all you need to begin a task for training:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

// pass an MNIST data iterator that automatically fetches data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
model.fit(mnistTrain);

Many other methods also accept iterators for tasks such as evaluation:

// passing the iterator directly to the network's evaluate method
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
Evaluation evaluation = model.evaluate(mnistTest);

// using an evaluation class manually
mnistTest.reset(); //rewind the iterator before reusing it
Evaluation eval = new Evaluation(10); //create an evaluation object with 10 possible classes
while(mnistTest.hasNext()){
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatures()); //get the network's prediction
    eval.eval(next.getLabels(), output); //check the prediction against the true class
}

Available iterators

MnistDataSetIterator

UciSequenceDataSetIterator

UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories: Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift

UciSequenceDataSetIterator

public UciSequenceDataSetIterator(int batchSize)

Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123

  • param batchSize Minibatch size

Cifar10DataSetIterator

CifarDataSetIterator is an iterator for CIFAR-10 dataset - 10 classes, with 32x32 images with 3 channels (RGB)

Cifar10DataSetIterator

public Cifar10DataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

  • param batchSize Minibatch size for the iterator

IrisDataSetIterator

IrisDataSetIterator

public IrisDataSetIterator()

next

public DataSet next()

IrisDataSetIterator handles traversing through the Iris Data Set.

  • param batch Batch size

  • param numExamples Total number of examples

LFWDataSetIterator

LFWDataSetIterator

public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
                    PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
                    ImageTransform imageTransform, Random rng)

Create LFW data specific iterator

  • param batchSize the batch size of the examples

  • param numExamples the overall number of examples

  • param imgDim an array of height, width and channels

  • param numLabels the overall number of examples

  • param useSubset use a subset of the LFWDataSet

  • param labelGenerator path label generator to use

  • param train true if use train value

  • param splitTrainTest the percentage to split data for train and remainder goes to test

  • param imageTransform how to transform the image

  • param rng random number to lock in batch shuffling

TinyImageNetDataSetIterator

Tiny ImageNet is a subset of the ImageNet database. TinyImageNet is the default course challenge for CS231n at Stanford University.

Tiny ImageNet has 200 classes, each consisting of 500 training images. Images are 64x64 pixels, RGB.

TinyImageNetDataSetIterator

public TinyImageNetDataSetIterator(int batchSize)

Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)

  • param batchSize Minibatch size for the iterator

EmnistDataSetIterator

EMNIST DataSetIterator

  • COMPLETE: Also known as 'ByClass' split. 814,255 examples total (train + test), 62 classes

  • MERGE: Also known as 'ByMerge' split. 814,255 examples total. 47 unbalanced classes. Combines lower and upper case characters (that are difficult to distinguish) into one class for each letter (instead of 2), for letters C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z

  • BALANCED: 131,600 examples total. 47 classes (equal number of examples in each class)

  • LETTERS: 145,600 examples total. 26 balanced classes

  • DIGITS: 280,000 examples total. 10 balanced classes

EmnistDataSetIterator

public EmnistDataSetIterator(Set dataSet, int batch, boolean train) throws IOException

EMNIST dataset has multiple different subsets. See the {@link EmnistDataSetIterator} Javadoc for details.
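For example, an iterator over the BALANCED training split can be created as follows (note that the constructor may download the dataset on first use and declares IOException):

int batchSize = 32;
DataSetIterator emnistTrain = new EmnistDataSetIterator(EmnistDataSetIterator.Set.BALANCED, batchSize, true);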

numExamplesTrain

public static int numExamplesTrain(Set dataSet)

Get the number of training examples for the specified subset

  • param dataSet Subset to get

  • return Number of examples for the specified subset

numExamplesTest

public static int numExamplesTest(Set dataSet)

Get the number of test examples for the specified subset

  • param dataSet Subset to get

  • return Number of examples for the specified subset

numLabels

public static int numLabels(Set dataSet)

Get the number of labels for the specified subset

  • param dataSet Subset to get

  • return Number of labels for the specified subset

isBalanced

public static boolean isBalanced(Set dataSet)

Are the labels balanced for the specified subset (i.e., equal number of examples for each class)?

  • param dataSet Subset to check

  • return True if balanced, false otherwise

RecordReaderDataSetIterator

Record reader dataset iterator. Takes a DataVec RecordReader as input and handles conversion to DataSet objects as well as producing minibatches from individual records.

Example 1: Image classification, batch size 32, 10 classes

//rr: an ImageRecordReader, constructed with the desired height, width, channels and a label generator
rr.initialize(new FileSplit(new File("/path/to/directory")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
        //Label index (first arg): Always value 1 when using ImageRecordReader. For CSV etc: use index of the column
        //  that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
        //  at 0. Number of classes (second arg): number of label classes (i.e., 10 for MNIST - 10 digits)
        .classification(1, nClasses)
        .preProcessor(new ImagePreProcessingScaler())      //For normalization of image values 0-255 to 0-1
        .build();

Example 2: Multi-output regression from CSV, batch size 128

//rr: a CSVRecordReader, created as in the earlier CSV examples
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));

DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
        //Specify the columns that the regression labels/targets appear in. Note that all other columns will be
        //  treated as features. Column indexes start at 0
        .regression(labelColFrom, labelColTo)
        .build();

RecordReaderDataSetIterator

public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize)

Constructor for classification, where: (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels() Note that if RecordReader.getLabels() returns null, no output labels will be produced

  • param recordReader Record reader to use as the source of data

  • param batchSize Minibatch size, for each call of .next()

setCollectMetaData

public void setCollectMetaData(boolean collectMetaData)

Main constructor for classification. This will convert the input class index (at position labelIndex, with integer values 0 to numPossibleLabels-1 inclusive) to the appropriate one-hot output/labels representation.

  • param recordReader RecordReader: provides the source of the data

  • param batchSize Batch size (number of examples) for the output DataSet objects

  • param labelIndex Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()

  • param numPossibleLabels Number of classes (possible labels) for classification

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using {@link #loadFromMetaData(List)}

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return DataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a DataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor

  • return DataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

writableConverter

public Builder writableConverter(WritableConverter converter)

Builder class for RecordReaderDataSetIterator

maxNumBatches

public Builder maxNumBatches(int maxNumBatches)

Optional argument, usually not used. If set, can be used to limit the maximum number of minibatches that will be returned (between resets). If not set, will always return as many minibatches as there is data available.

  • param maxNumBatches Maximum number of minibatches per epoch / reset

regression

public Builder regression(int labelIndex)

Use this for single output regression (i.e., 1 output/regression target)

  • param labelIndex Column index that contains the regression target (indexes start at 0)

regression

public Builder regression(int labelIndexFrom, int labelIndexTo)

Use this for multiple output regression (1 or more output/regression targets). Note that all regression targets must be contiguous (i.e., positions x to y, without gaps)

  • param labelIndexFrom Column index of the first regression target (indexes start at 0)

  • param labelIndexTo Column index of the last regression target (inclusive)

classification

public Builder classification(int labelIndex, int numClasses)

Use this for classification

  • param labelIndex Index that contains the label index. Column (indexes start from 0) be an integer value, and contain values 0 to numClasses-1

  • param numClasses Number of label classes (i.e., number of categories/classes in the dataset)

preProcessor

public Builder preProcessor(DataSetPreProcessor preProcessor)

Optional arg. Allows the preprocessor to be set

  • param preProcessor Preprocessor to use

collectMetaData

public Builder collectMetaData(boolean collectMetaData)

When set to true: metadata for the current examples will be present in the returned DataSet. Disabled by default.

  • param collectMetaData Whether metadata should be collected or not
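A sketch of combining metadata collection with loadFromMetaData (assuming rr is an initialized RecordReader and the surrounding code handles the declared IOException):

RecordReaderDataSetIterator iter = new RecordReaderDataSetIterator(rr, 32);
iter.setCollectMetaData(true);    //include RecordMetaData with each returned DataSet

DataSet ds = iter.next();
List<RecordMetaData> meta = ds.getExampleMetaData(RecordMetaData.class);

//Later: reload exactly the same examples (for example, to inspect misclassified records)
DataSet sameExamples = iter.loadFromMetaData(meta);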

RecordReaderMultiDataSetIterator

The idea: generate multiple inputs and multiple outputs from one or more Sequence/RecordReaders. Inputs and outputs may be obtained from subsets of the RecordReader and SequenceRecordReader columns (for example, some inputs and outputs as different columns in the same record/sequence); it is also possible to mix different types of data (for example, using both RecordReaders and SequenceRecordReaders in the same RecordReaderMultiDataSetIterator).

RecordReaderMultiDataSetIterator

public RecordReaderMultiDataSetIterator build()

When dealing with time series data of different lengths, how should we align the input/labels time series? For equal length: use EQUAL_LENGTH For sequence classification: use ALIGN_END

loadFromMetaData

public MultiDataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single example to a MultiDataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using {@link #loadFromMetaData(List)}

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return MultiDataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public MultiDataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple examples to a MultiDataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record readers provided to the RecordReaderMultiDataSetIterator

  • return MultiDataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

SequenceRecordReaderDataSetIterator

Sequence record reader data set iterator. Given a record reader (and optionally another record reader for the labels) generate time series (sequence) data sets. Supports padding for one-to-many and many-to-one type data loading (i.e., with a different number of input vs. label time steps).

SequenceRecordReaderDataSetIterator

public SequenceRecordReaderDataSetIterator(SequenceRecordReader featuresReader, SequenceRecordReader labels,
                    int miniBatchSize, int numPossibleLabels)

Constructor where features and labels come from different RecordReaders (for example, different files), and labels are for classification.

  • param featuresReader SequenceRecordReader for the features

  • param labels Labels: assume single value per time step, where values are integers in the range 0 to numPossibleLabels-1

  • param miniBatchSize Minibatch size for each call of next()

  • param numPossibleLabels Number of classes for the labels

hasNext

public boolean hasNext()

Constructor where features and labels come from different RecordReaders (for example, different files)

loadFromMetaData

public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException

Load a single sequence example to a DataSet, using the provided RecordMetaData. Note that it is more efficient to load multiple instances at once, using {@link #loadFromMetaData(List)}

  • param recordMetaData RecordMetaData to load from. Should have been produced by the given record reader

  • return DataSet with the specified example

  • throws IOException If an error occurs during loading of the data

loadFromMetaData

public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException

Load multiple sequence examples to a DataSet, using the provided RecordMetaData instances.

  • param list List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the SequenceRecordReaderDataSetIterator constructor

  • return DataSet with the specified examples

  • throws IOException If an error occurs during loading of the data

AsyncMultiDataSetIterator

Async prefetching iterator wrapper for MultiDataSetIterator implementations This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

next

public MultiDataSet next(int num)

We want to ensure that the background thread will have the same thread->device affinity as the master thread

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

We want to ensure that the background thread will have the same thread->device affinity as the master thread

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

IteratorDataSetIterator

A DataSetIterator that works on an Iterator&lt;DataSet&gt;, combining and splitting the input DataSet objects as required to get the specified batch size.

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

AsyncDataSetIterator

Async prefetching iterator wrapper for DataSetIterator implementations. This will asynchronously prefetch the specified number of minibatches from the underlying iterator. Also has the option (enabled by default for most constructors) to use a cyclical workspace to avoid creating INDArrays with off-heap memory that needs to be cleaned up by the JVM garbage collector.

Note that appropriate DL4J fit methods automatically utilize this iterator, so users don’t need to manually wrap their iterators when fitting a network

AsyncDataSetIterator

public AsyncDataSetIterator(DataSetIterator baseIterator)

Create an Async iterator with the default queue size of 8

  • param baseIterator Underlying iterator to wrap and fetch asynchronously from

next

public DataSet next(int num)

Create an Async iterator with the default queue size of 8

  • param iterator Underlying iterator to wrap and fetch asynchronously from

  • param queue Queue size - number of minibatches to prefetch from the underlying iterator

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? Most DataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

shutdown

public void shutdown()

This ensures that the background thread has the same thread->device affinity as the master thread.

batch

public int batch()

Batch size

  • return

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

  • param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements.

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

DoublesDataSetIterator

First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

DoublesDataSetIterator

public DoublesDataSetIterator(@NonNull Iterable<Pair<double[], double[]>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects
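
A minimal sketch of building a DoublesDataSetIterator from an in-memory list follows. Note that the Pair import path is an assumption and differs between ND4J versions (org.nd4j.linalg.primitives vs org.nd4j.common.primitives).

import java.util.ArrayList;
import java.util.List;

import org.deeplearning4j.datasets.iterator.DoublesDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.primitives.Pair; // may be org.nd4j.common.primitives.Pair in newer versions

public class DoublesIteratorSketch {
    public static void main(String[] args) {
        // Toy in-memory data: 100 examples, 3 features and 2 labels each
        List<Pair<double[], double[]>> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            double[] features = {i, i * 0.5, i * 0.25};
            double[] labels = {i % 2, 1 - (i % 2)};
            data.add(new Pair<>(features, labels));
        }

        // Batches of 10 -> each DataSet has 2d features of shape [10,3] and labels [10,2]
        DataSetIterator iter = new DoublesDataSetIterator(data, 10);
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }
    }
}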

IteratorMultiDataSetIterator

A MultiDataSetIterator that works on an Iterator&lt;MultiDataSet&gt;, combining and splitting the underlying MultiDataSet objects as required to get a specified batch size.

Typically used in Spark training, but may be used elsewhere. NOTE: reset method is not supported here.

SamplingDataSetIterator

A wrapper for a dataset to sample from. This will randomly sample from the given dataset.

SamplingDataSetIterator

public SamplingDataSetIterator(DataSet sampleFrom, int batchSize, int totalNumberSamples)
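
A short sketch, assuming the constructor arguments documented above (the DataSet to sample from, the batch size, and the total number of samples to draw); import paths are assumed from recent DL4J versions.

import org.deeplearning4j.datasets.iterator.SamplingDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;

public class SamplingIteratorSketch {
    public static void main(String[] args) {
        // Toy DataSet with 150 random examples (4 features, 3 labels)
        DataSet all = new DataSet(Nd4j.rand(150, 4), Nd4j.rand(150, 3));

        // Draw 1000 random samples from it, returned in minibatches of 32
        DataSetIterator sampling = new SamplingDataSetIterator(all, 32, 1000);
        while (sampling.hasNext()) {
            DataSet minibatch = sampling.next();
        }
    }
}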

INDArrayDataSetIterator

First value in pair is the features vector, second value in pair is the labels.

INDArrayDataSetIterator

public INDArrayDataSetIterator(@NonNull Iterable<Pair<INDArray, INDArray>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects

WorkspacesShieldDataSetIterator

This iterator detaches/migrates DataSets coming out from backed DataSetIterator, thus providing “safe” DataSets. This is typically used for debugging and testing purposes, and should not be used in general by users

WorkspacesShieldDataSetIterator

public WorkspacesShieldDataSetIterator(@NonNull DataSetIterator iterator)
  • param iterator The underlying iterator to detach values from

MultiDataSetIteratorSplitter

This iterator virtually splits a given MultiDataSetIterator into Train and Test parts. For example: if you have 100,000 examples and a batch size of 32, you have 3,125 total batches. With a split ratio of 0.7, that gives you 2,187 training batches and 938 test batches.

PLEASE NOTE: The test iterator can't be used twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: This iterator can't be used if the underlying iterator uses randomization/shuffling between epochs.

MultiDataSetIteratorSplitter

public MultiDataSetIteratorSplitter(@NonNull MultiDataSetIterator baseIterator, long totalBatches, double ratio)
  • param baseIterator

  • param totalBatches - total number of batches in the underlying iterator. This value will be used to determine the number of test/train batches

  • param ratio - the split ratio, which must be in the range 0.0 < X < 1.0. E.g. if 0.7 is provided, 70% of the total examples will be used for training and 30% for testing
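
A sketch of the split described above, using the Iris iterator wrapped as a MultiDataSetIterator purely for illustration. getTestIterator() is assumed to be the counterpart of the documented getTrainIterator(); import paths may differ between versions.

import org.deeplearning4j.datasets.iterator.JointMultiDataSetIterator;
import org.deeplearning4j.datasets.iterator.MultiDataSetIteratorSplitter;
import org.deeplearning4j.datasets.iterator.impl.IrisDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;

public class SplitterSketch {
    public static void main(String[] args) {
        // 150 Iris examples in batches of 10 -> 15 total batches
        MultiDataSetIterator base =
                new JointMultiDataSetIterator(new IrisDataSetIterator(10, 150));

        // Virtual 80/20 train/test split over those 15 batches (12 train, 3 test)
        MultiDataSetIteratorSplitter splitter =
                new MultiDataSetIteratorSplitter(base, 15, 0.8);

        MultiDataSetIterator train = splitter.getTrainIterator();
        MultiDataSetIterator test = splitter.getTestIterator(); // assumed counterpart of getTrainIterator()

        // Use 'train' for fitting first, then 'test' for evaluation (see the notes above)
    }
}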

getTrainIterator

public MultiDataSetIterator getTrainIterator()

This method returns train iterator instance

  • return

next

public MultiDataSet next(int num)

This method returns test iterator instance

  • return

AsyncShieldDataSetIterator

This wrapper takes your existing DataSetIterator implementation and prevents asynchronous prefetch. This is mainly used for debugging purposes, and for iterators that aren't safe to asynchronously prefetch from.

AsyncShieldDataSetIterator

public AsyncShieldDataSetIterator(@NonNull DataSetIterator iterator)
  • param iterator Iterator to wrap, to disable asynchronous prefetching for

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return

setPreProcessor

public void setPreProcessor(DataSetPreProcessor preProcessor)

Set a pre processor

  • param preProcessor a pre processor to set

getPreProcessor

public DataSetPreProcessor getPreProcessor()

Returns preprocessors, if defined

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements.

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

DummyBlockDataSetIterator

This class provides baseline implementation of BlockDataSetIterator interface

BaseDatasetIterator

Baseline implementation includes control over the data fetcher and some basic getters for metadata

AsyncShieldMultiDataSetIterator

This wrapper takes your existing MultiDataSetIterator implementation and prevents asynchronous prefetch

next

public MultiDataSet next(int num)

Fetch the next ‘num’ examples. Similar to the next method, but returns a specified number of examples

  • param num Number of examples to fetch

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects?

PLEASE NOTE: This iterator ALWAYS returns FALSE

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

RandomMultiDataSetIterator

RandomMultiDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomMultiDataSetIterator

public RandomMultiDataSetIterator(int numMiniBatches, @NonNull List<Triple<long[], Character, Values>> features, @NonNull List<Triple<long[], Character, Values>> labels)
  • param numMiniBatches Number of minibatches per epoch

  • param features Each triple in the list specifies the shape, array order and type of values for the features arrays

  • param labels Each triple in the list specifies the shape, array order and type of values for the labels arrays

addFeatures

public Builder addFeatures(long[] shape, Values values)
  • param shape Shape of the features

  • param values Values to fill the array with

addFeatures

public Builder addFeatures(long[] shape, char order, Values values)

Add a new features array to the iterator

  • param shape Shape of the features

  • param order Order (‘c’ or ‘f’) for the array

  • param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, Values values)

Add a new labels array to the iterator

  • param shape Shape of the labels

  • param values Values to fill the array with

addLabels

public Builder addLabels(long[] shape, char order, Values values)

Add a new labels array to the iterator

  • param shape Shape of the labels

  • param order Order (‘c’ or ‘f’) for the array

  • param values Values to fill the array with

generate

public static INDArray generate(long[] shape, Values values)

Generate a random array with the specified shape

  • param shape Shape of the array

  • param values Values to fill the array with

  • return Random array of specified shape + contents

generate

public static INDArray generate(long[] shape, char order, Values values)

Generate a random array with the specified shape and order

  • param shape Shape of the array

  • param order Order of array (‘c’ or ‘f’)

  • param values Values to fill the array with

  • return Random array of specified shape + contents
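
A small sketch of the static generate helpers; the Values enum is assumed here to be the one nested in RandomMultiDataSetIterator, and the available constants (RANDOM_UNIFORM, ZEROS, ONES, etc) may vary by version.

import org.deeplearning4j.datasets.iterator.RandomMultiDataSetIterator;
import org.nd4j.linalg.api.ndarray.INDArray;

public class RandomGenerateSketch {
    public static void main(String[] args) {
        // A [32, 10] array filled according to the chosen Values setting
        INDArray features = RandomMultiDataSetIterator.generate(
                new long[]{32, 10}, RandomMultiDataSetIterator.Values.RANDOM_UNIFORM);

        // Same shape, but with explicit 'f' ordering
        INDArray featuresF = RandomMultiDataSetIterator.generate(
                new long[]{32, 10}, 'f', RandomMultiDataSetIterator.Values.RANDOM_UNIFORM);

        System.out.println(features.shapeInfoToString());
        System.out.println(featuresF.ordering()); // 'f'
    }
}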

EarlyTerminationMultiDataSetIterator

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch, regardless of the value of num. This essentially restricts the data to this specified number of minibatches.

EarlyTerminationMultiDataSetIterator

public EarlyTerminationMultiDataSetIterator(MultiDataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

  • param underlyingIterator, iterator to wrap

  • param terminationPoint, minibatches after which hasNext() will return false

ExistingDataSetIterator

ExistingDataSetIterator

public ExistingDataSetIterator(@NonNull Iterator<DataSet> iterator)

Note that when using this constructor, resetting is not supported

  • param iterator Iterator to wrap
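
A minimal sketch wrapping a plain java.util.Iterator of pre-built DataSets (import paths assumed from recent DL4J versions):

import java.util.ArrayList;
import java.util.List;

import org.deeplearning4j.datasets.iterator.ExistingDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;

public class ExistingIteratorSketch {
    public static void main(String[] args) {
        // A few pre-built DataSets held in memory
        List<DataSet> data = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            data.add(new DataSet(Nd4j.rand(16, 10), Nd4j.rand(16, 3)));
        }

        // Wrap the plain java.util.Iterator - note that reset() is not supported here
        DataSetIterator iter = new ExistingDataSetIterator(data.iterator());
        while (iter.hasNext()) {
            DataSet ds = iter.next();
        }
    }
}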

next

public DataSet next(int num)

Note that when using this constructor, resetting is not supported

  • param iterator Iterator to wrap

  • param labels String labels. May be null.

DummyBlockMultiDataSetIterator

This class provides baseline implementation of BlockMultiDataSetIterator interface

EarlyTerminationDataSetIterator

Builds an iterator that terminates once the number of minibatches returned with .next() is equal to a specified number. Note that a call to .next(num) is counted as a call to return a minibatch, regardless of the value of num. This essentially restricts the data to this specified number of minibatches.

EarlyTerminationDataSetIterator

public EarlyTerminationDataSetIterator(DataSetIterator underlyingIterator, int terminationPoint)

Constructor takes the iterator to wrap and the number of minibatches after which the call to hasNext() will return false

  • param underlyingIterator, iterator to wrap

  • param terminationPoint, minibatches after which hasNext() will return false
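
For example, the following sketch restricts the MNIST training data to its first 10 minibatches (MnistDataSetIterator is used only as a convenient source; import paths assumed from recent DL4J versions):

import org.deeplearning4j.datasets.iterator.EarlyTerminationDataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class EarlyTerminationSketch {
    public static void main(String[] args) throws Exception {
        // Full MNIST training set in minibatches of 32...
        DataSetIterator mnist = new MnistDataSetIterator(32, true, 12345);

        // ...restricted to the first 10 minibatches: hasNext() returns false afterwards
        DataSetIterator first10 = new EarlyTerminationDataSetIterator(mnist, 10);

        int count = 0;
        while (first10.hasNext()) {
            first10.next();
            count++;
        }
        System.out.println(count); // 10
    }
}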

ReconstructionDataSetIterator

Wraps a DataSetIterator, setting the features (feature matrix) as the labels - typically used for reconstruction (e.g., autoencoder) training.

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public DataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

DataSetIteratorSplitter

This iterator virtually splits a given DataSetIterator into Train and Test parts. For example: if you have 100,000 examples and a batch size of 32, you have 3,125 total batches. With a split ratio of 0.7, that gives you 2,187 training batches and 938 test batches.

PLEASE NOTE: The test iterator can't be used twice in a row; the train iterator should be used before each use of the test iterator. PLEASE NOTE: This iterator can't be used if the underlying iterator uses randomization/shuffling between epochs.

DataSetIteratorSplitter

public DataSetIteratorSplitter(@NonNull DataSetIterator baseIterator, long totalBatches, double ratio)

The only constructor

  • param baseIterator - iterator to be wrapped and split

  • param totalBatches - total batches in baseIterator

  • param ratio - train/test split ratio

getTrainIterator

public DataSetIterator getTrainIterator()

This method returns train iterator instance

  • return

next

public DataSet next(int i)

This method returns test iterator instance

  • return

JointMultiDataSetIterator

This dataset iterator combines multiple DataSetIterators into 1 MultiDataSetIterator. Values from each iterator are joined on a per-example basis - i.e., the values from each DataSet are combined as different feature arrays for a multi-input neural network. Labels can come from one of the underlying DataSetIterators only (if ‘outcome’ is >= 0) or from all iterators (if outcome is < 0)

JointMultiDataSetIterator

public JointMultiDataSetIterator(DataSetIterator... iterators)
  • param iterators Underlying iterators to wrap
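
A brief sketch combining two DataSetIterators into one MultiDataSetIterator for a two-input network (Iris is used twice here purely for illustration; import paths assumed from recent DL4J versions):

import org.deeplearning4j.datasets.iterator.JointMultiDataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.IrisDataSetIterator;
import org.nd4j.linalg.dataset.api.MultiDataSet;
import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;

public class JointIteratorSketch {
    public static void main(String[] args) {
        // Two independent DataSetIterators (same batch size) feeding a two-input network
        IrisDataSetIterator input1 = new IrisDataSetIterator(10, 150);
        IrisDataSetIterator input2 = new IrisDataSetIterator(10, 150);

        // Each MultiDataSet has one feature array per underlying iterator
        MultiDataSetIterator joint = new JointMultiDataSetIterator(input1, input2);
        MultiDataSet mds = joint.next();
        System.out.println(mds.getFeatures().length); // 2 feature arrays
    }
}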

next

public MultiDataSet next(int num)
  • param outcome Index to get the label from. If < 0, labels from all iterators will be used to create the final MultiDataSet

  • param iterators Underlying iterators to wrap

setPreProcessor

public void setPreProcessor(MultiDataSetPreProcessor preProcessor)

Set the preprocessor to be applied to each MultiDataSet, before each MultiDataSet is returned.

  • param preProcessor MultiDataSetPreProcessor. May be null.

getPreProcessor

public MultiDataSetPreProcessor getPreProcessor()

Get the {@link MultiDataSetPreProcessor}, if one has previously been set. Returns null if no preprocessor has been set

  • return Preprocessor

resetSupported

public boolean resetSupported()

Is resetting supported by this DataSetIterator? Many DataSetIterators do support resetting, but some don’t

  • return true if reset method is supported; false otherwise

asyncSupported

public boolean asyncSupported()

Does this MultiDataSetIterator support asynchronous prefetching of multiple MultiDataSet objects? Most MultiDataSetIterators do, but in some cases it may not make sense to wrap this iterator in an iterator that does asynchronous prefetching. For example, it would not make sense to use asynchronous prefetching for the following types of iterators: (a) Iterators that store their full contents in memory already (b) Iterators that re-use features/labels arrays (as future next() calls will overwrite past contents) (c) Iterators that already implement some level of asynchronous prefetching (d) Iterators that may return different data depending on when the next() method is called

  • return true if asynchronous prefetching from this iterator is OK; false if asynchronous prefetching should not be used with this iterator

reset

public void reset()

Resets the iterator back to the beginning

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

next

public MultiDataSet next()

Returns the next element in the iteration.

  • return the next element in the iteration

remove

public void remove()

PLEASE NOTE: This method is NOT implemented

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

  • implSpec The default implementation throws an instance of {@link UnsupportedOperationException} and performs no other action.

FloatsDataSetIterator

First value in pair is the features vector, second value in pair is the labels. Supports generating 2d features/labels only

FloatsDataSetIterator

public FloatsDataSetIterator(@NonNull Iterable<Pair<float[], float[]>> iterable, int batchSize)
  • param iterable Iterable to source data from

  • param batchSize Batch size for generated DataSet objects

FileSplitDataSetIterator

Simple iterator working with list of files. File to DataSet conversion will be handled via provided FileCallback implementation

FileSplitDataSetIterator

public FileSplitDataSetIterator(@NonNull List<File> files, @NonNull FileCallback callback)
  • param files List of files to iterate over

  • param callback Callback for loading the files

MultipleEpochsIterator

A dataset iterator for doing multiple passes over a dataset

Deprecated: use MultiLayerNetwork/ComputationGraph.fit(DataSetIterator, int numEpochs) instead

next

public DataSet next(int num)

Like the standard next method but allows a customizable number of examples returned

  • param num the number of examples

  • return the next data set

inputColumns

public int inputColumns()

Input columns for the dataset

  • return

totalOutcomes

public int totalOutcomes()

The number of labels for the dataset

  • return

reset

public void reset()

Resets the iterator back to the beginning

batch

public int batch()

Batch size

  • return

hasNext

public boolean hasNext()

Returns {@code true} if the iteration has more elements. (In other words, returns {@code true} if {@link #next} would return an element rather than throwing an exception.)

  • return {@code true} if the iteration has more elements

remove

public void remove()

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to {@link #next}. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

  • throws UnsupportedOperationException if the {@code remove} operation is not supported by this iterator

  • throws IllegalStateException if the {@code next} method has not yet been called, or the {@code remove} method has already been called after the last call to the {@code next} method

MultiDataSetWrapperIterator

This class is a simple wrapper that takes single-input MultiDataSets and converts them to DataSets on the fly

PLEASE NOTE: This only works if number of features/labels/masks is 1

MultiDataSetWrapperIterator

public MultiDataSetWrapperIterator(MultiDataSetIterator iterator)
  • param iterator Underlying iterator to wrap

RandomDataSetIterator

RandomDataSetIterator: Generates random values (or zeros, ones, integers, etc) according to some distribution. Note: This is typically used for testing, debugging and benchmarking purposes.

RandomDataSetIterator

public RandomDataSetIterator(int numMiniBatches, long[] featuresShape, long[] labelsShape, Values featureValues, Values labelValues)
  • param numMiniBatches Number of minibatches per epoch

  • param featuresShape Features shape

  • param labelsShape Labels shape

  • param featureValues Type of values for the features

  • param labelValues Type of values for the labels
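
A short sketch follows. The Values enum is assumed to be nested in RandomDataSetIterator, and the constants shown (RANDOM_UNIFORM, ONE_HOT) are assumptions that may vary by version; import paths are likewise assumed.

import org.deeplearning4j.datasets.iterator.RandomDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class RandomDataSketch {
    public static void main(String[] args) {
        // 10 minibatches per epoch: [32,784] uniform-random features, [32,10] one-hot labels
        DataSetIterator random = new RandomDataSetIterator(10,
                new long[]{32, 784}, new long[]{32, 10},
                RandomDataSetIterator.Values.RANDOM_UNIFORM, // assumed constant name
                RandomDataSetIterator.Values.ONE_HOT);       // assumed constant name

        while (random.hasNext()) {
            System.out.println(random.next().getFeatures().shapeInfoToString());
        }
    }
}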

MultiDataSetIteratorAdapter

Iterator that adapts a DataSetIterator to a MultiDataSetIterator

1.0.0-beta4

Highlights - 1.0.0-beta4 Release

  • DOUBLE: double precision floating point, 64-bit (8 byte)

  • FLOAT: single precision floating point, 32-bit (4 byte)

  • HALF: half precision floating point, 16-bit (2 byte), "FP16"

  • LONG: long signed integer, 64 bit (8 byte)

  • INT: signed integer, 32 bit (4 byte)

  • SHORT: signed short integer, 16 bit (2 byte)

  • UBYTE: unsigned byte, 8 bit (1 byte), 0 to 255

  • BYTE: signed byte, 8 bit (1 byte), -128 to 127

  • BOOL: boolean type, (0/1, true/false). Uses ubyte storage for easier op parallelization

  • UTF8: String array type, UTF8 format

ND4J Behaviour changes of note:

  • When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)

    • For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray

    • Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively

  • Some operations require matched datatypes for operands

    • For example, if x and y are different datatypes, a cast may be required: x.add(y.castTo(x.dataType()))

  • Some operations have datatype restrictions: for example, sum on a UTF8 array is not supported, nor is variance on a BOOL array. For some operations on boolean arrays (such as sum), casting to an integer or floating point type first may make sense.
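
A small sketch illustrating the datatype inference and casting rules above:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class DataTypeSketch {
    public static void main(String[] args) {
        // Datatype is inferred from the Java primitive array type
        INDArray d = Nd4j.createFromArray(new double[]{1.0, 2.0, 3.0});   // DOUBLE
        INDArray f = Nd4j.createFromArray(new float[]{1.0f, 2.0f, 3.0f}); // FLOAT
        System.out.println(d.dataType() + " " + f.dataType());

        // Mixed-type operations may require an explicit cast
        INDArray sum = d.add(f.castTo(d.dataType()));
        System.out.println(sum.dataType()); // DOUBLE
    }
}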

DL4J Behaviour changes of note:

  • MultiLayerNetwork/ComputationGraph no longer depend in any way on ND4J global datatype.

    • The datatype of a network (the DataType for its parameters and activations) can be set during construction using NeuralNetConfiguration.Builder().dataType(DataType)

    • Networks can be converted from one type to another (double to float, float to half etc) using MultiLayerNetwork/ComputationGraph.convertDataType(DataType) method
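
A minimal sketch of both mechanisms, using a single-output-layer network purely for brevity (the specific layer, loss function and activation are illustrative assumptions, not prescriptive):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class NetworkDataTypeSketch {
    public static void main(String[] args) {
        // Network parameters and activations stored as FLOAT, independent of the global ND4J default
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .dataType(DataType.FLOAT)
                .list()
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(4).nOut(3).activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();

        // Convert an existing network to half precision (FP16)
        MultiLayerNetwork halfPrecision = net.convertDataType(DataType.HALF);
    }
}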

Main new methods:

  • Nd4j.create(), zeros(), ones(), linspace(), etc methods with DataType argument

  • INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another

  • New Nd4j.createFromArray(...) methods for creating INDArrays from Java primitive arrays

ND4J/DL4J: CUDA - 10.1 support added, CUDA 9.0 support dropped

CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.

ND4J: Mac/OSX CUDA support dropped

Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.

DL4J/ND4J: MKL-DNN Support Added

DL4J (and ND4J conv2d etc ops) now support MKL-DNN by default when running on CPU/native backend. MKL-DNN support is implemented for the following layer types:

  • ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)

  • SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)

  • BatchNormalization layer (and BatchNorm ND4J op)

  • LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)

  • Convolution3D layer (and Conv3D/Conv3DDerivative ND4J ops)

MKL-DNN support for other layer types (such as LSTM) will be added in a future release.

MKL-DNN can be disabled globally (ND4J and DL4J) using Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);

MKL-DNN can also be disabled for specific ops by setting the ND4J_MKL_FALLBACK environment variable to the names of the operations that should have MKL-DNN support disabled. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp

ND4J: Improved Performance due to Memory Management Changes

Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default hence periodic GC could frequently be disabled when training DL4J networks). However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.

In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000ms) you can use:

Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(5000);

ND4J: Improved Rank 0/1 Array Support

In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, getting sub-arrays using INDArray.get(NDArrayIndex...) or when creating arrays from Java arrays/scalars. Now, behaviour should be more consistent for these rank 0/1 cases. Note: to maintain the old behaviour for getRow and getColumn (i.e., return a rank 2 array with shape [1,x] and [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.
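
For example, a brief sketch of the new rank behaviour and the keep-dimension overload:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class RowRankSketch {
    public static void main(String[] args) {
        INDArray matrix = Nd4j.rand(3, 4);

        INDArray row = matrix.getRow(0);              // rank 1, shape [4]
        INDArray rowKeepDim = matrix.getRow(0, true); // rank 2, shape [1, 4] (old behaviour)

        System.out.println(row.rank() + " vs " + rowKeepDim.rank()); // 1 vs 2
    }
}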

DL4J: Attention layers added

Deeplearning4J

Deeplearning4J: Features and Enhancements

Deeplearning4J: Bug Fixes and Optimizations

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

  • Added a basic ("technology preview") SameDiff UI. Should be considered early WIP with breaking API changes expected in future releases. Supports plotting of SameDiff graphs as well as various metrics (line charts, histograms, etc)

    • Currently embedded in the DL4J UI - call UIServer.getInstance() then go to localhost:9000/samediff to access.

  • ND4J/SameDiff - new operations added:

  • SameDiff TensorFlow Import

ND4J/SameDiff: API Changes (Transition Guide): 1.0.0-beta3 to 1.0.0-beta4

  • ND4J datatypes - significant changes, see highlights at top of this section

ND4J/SameDiff: Bug Fixes and Optimizations

  • SameDiff: Numerous fixes and enhancements

ND4J: Known Issues

  • Most CustomOperation operations (such as those used in SameDiff) are CPU only until next release. GPU support was not completed in time for 1.0.0-beta4 release.

DataVec

DataVec: Features and Enhancements

DataVec: Optimizations and Bug Fixes

Arbiter

Arbiter: Enhancements

Arbiter: Fixes

0.5.0

  • FP16 support for CUDA

  • Better performance for multi-gpu

  • Including optional P2P memory access support

  • Normalization support for time series and images

  • Normalization support for labels

  • Numerous bug fixes

  • Spark improvements

0.9.0

Deeplearning4J

  • VPTree performance significantly improved

  • Convolution performance improvements, including activation caching

  • Evaluation improvements

    • ComputationGraph and SparkComputationGraph evaluation convenience methods added (evaluateROC, etc)

    • RegressionEvaluation, ROCBinary etc now support per-output masking (in addition to per-example/per-time-step masking)

  • Optimizations: updaters, bias calculation

  • New loss functions:

ND4J

  • Native parallel sort was added

  • New ops added: SELU/SELUDerivative, TAD-based comparisons, percentile/median, Reverse, Tan/TanDerivative, SinH, CosH, Entropy, ShannonEntropy, LogEntropy, AbsoluteMin/AbsoluteMax/AbsoluteSum, Atan2

  • New distance functions added: CosineDistance, HammingDistance, JaccardDistance

DataVec

  • TransformProcess and Transforms now support NDArrayWritables and NDArrayWritable columns

  • Multiple new Transform classes

Arbiter

    • UI now uses Play framework, integrates with DL4J UI (replaces Dropwizard backend). Dependency issues/clashing versions fixed.

    • Supports DL4J StatsStorage and StatsStorageRouter mechanisms (FileStatsStorage, Remote UI via RemoteUIStatsStorageRouter)

    • General UI improvements (additional information, formatting fixes)

This example will use VGG16 to classify images belonging to five categories of flowers. The dataset will automatically download from http://download.tensorflow.org/example_images/flower_photos.tgz.

Deeplearning4j has a new native model zoo. Read about the deeplearning4j-zoo module for more information on using pretrained models. Here, we load a pretrained VGG-16 model initialized with weights trained on ImageNet:

Rational tanh approximation From

Dl4j’s AlexNet model interpretation based on the original paper ImageNet Classification with Deep Convolutional Neural Networks and the imagenetExample code referenced. References:

Darknet19 Reference: ImageNet weights for this model are available and have been converted from using .

There are 2 pretrained models, one for 224x224 images and one fine-tuned for 448x448 images. Call setInputShape() with either {3, 224, 224} or {3, 448, 448} before initialization. The channels of the input images need to be in RGB order (not BGR), with values normalized within [0, 1]. The output labels are as per .

A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: Also based on the OpenFace implementation:

A variant of the original FaceNet model that relies on embeddings and triplet loss. Reference: Also based on the OpenFace implementation:

MNIST weights for this model are available and have been converted from .

Paper: ImageNet weights for this model are available and have been converted from .

Paper: ImageNet weights for this model are available and have been converted from ;.

A simple convolutional network for generic image classification. Reference:

Paper: ImageNet weights for this model are available and have been converted from .

Architecture follows this implementation:

Walt Whitman weights are available for generating text from his works, adapted from .

Tiny YOLO Reference:

ImageNet+VOC weights for this model are available and have been converted from using and the following code.

Paper: Weights are available for image segmentation trained on a synthetic dataset

VGG-16, from Very Deep Convolutional Networks for Large-Scale Image Recognition

Deep Face Recognition

ImageNet weights for this model are available and have been converted from . CIFAR-10 weights for this model are available and have been converted using “approach 2” from . VGGFace weights for this model are available and have been converted from .

VGG-19, from Very Deep Convolutional Networks for Large-Scale Image Recognition ImageNet weights for this model are available and have been converted from .

Paper: ImageNet weights for this model are available and have been converted from .

YOLOv2 Reference:

ImageNet+COCO weights for this model are available and have been converted from using and the following code.

You can find a complete list of models using this .

Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J .

Initialization methods often have an additional parameter named workspaceMode. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass WorkspaceMode.SINGLE for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see .

The MultiLayerNetwork, which is essentially a stack of neural network layers (with a single input layer and single output layer), and

The ComputationGraph, which allows for greater freedom in network architectures

GoogLeNet, a complex type of convolutional neural network for image classification

The basic idea is that in the ComputationGraph, the core building block is the GraphVertex, instead of layers. Layers (or, more accurately, the LayerVertex objects) are but one type of vertex in the graph. Other types of vertices include:

LayerVertex: Layer vertices (graph vertices with neural network layers) are added using the .addLayer(String,Layer,String...) method. The first argument is the label for the layer, and the last arguments are the inputs to that layer. If you need to manually add an InputPreProcessor (usually this is unnecessary - see next section) you can use the .addLayer(String,Layer,InputPreProcessor,String...) method.

PreProcessorVertex: Occasionally, you might want to use the functionality of an InputPreProcessor without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.

Finally, it is also possible to define custom graph vertices by implementing both a configuration and an implementation class for your custom GraphVertex.

It will automatically add any InputPreProcessors as required. InputPreProcessors are necessary to handle the interaction between for example fully connected (dense) and convolutional layers, or recurrent and fully connected layers.

A DataSet object is basically a pair of INDArrays that hold your training data. In the case of RNNs, it may also include masking arrays (see the relevant section for more details). A DataSetIterator is essentially an iterator over DataSet objects.

By implementing the MultiDataSetIterator interface directly

By using the RecordReaderMultiDataSetIterator in conjunction with DataVec record readers

Some basic examples on how to use the RecordReaderMultiDataSetIterator follow. You might also find these unit tests to be useful.

Local response normalization layer See section 3.3 of

Output (loss) layer for YOLOv2 object detection model, based on the papers: YOLO9000: Better, Faster, Stronger - Redmon & Farhadi (2016) - and You Only Look Once: Unified, Real-Time Object Detection - Redmon et al. (2016) - This loss function implementation is based on the YOLOv2 version of the paper. However, note that it doesn’t currently support simultaneous training on both detection and classification datasets as described in the YOLO9000 paper.

See: Kingma & Welling, 2013: Auto-Encoding Variational Bayes -

See the paper by Matt Zeiler for details:

For an intuitive guide to convolution arithmetic and shapes, see:

MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale). For further details, see

Details: Data: Image:

This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: .

IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes

see

LFW iterator - Labeled Faces in the Wild dataset. 13233 images total, with 5749 classes.

See: and

See: and

Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-Dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following are supported:

Added MKL-DNN support for Conv/Pool/BatchNorm/LRN layers. MKL-DNN will be used automatically when using nd4j-native backend. (, )

L1/L2 regularization now made into a class; weight decay added, with better control as to when/how it is applied. See for more details on the difference between L2 and weight decay. In general, weight decay should be preferred to L2 regularization. (, )

Added dot product attention layers: , , and

The parameter/activation datatypes for new networks can be set using the dataType(DataType) method on NeuralNetConfiguration.Builder ()

MultiLayerNetwork/ComputationGraph can be converted between (floating point) datatypes FP16/32/64 for the parameters and activations using the MultiLayerNetwork/ComputationGraph.convertDataType(DataType) methods (, )

EmbeddingLayer and EmbeddingSequenceLayer builders now have .weightInit(INDArray) and .weightInit(Word2Vec) methods for initializing parameters from pretrained word vectors ()

PerformanceListener can now be configured to report garbage collection information (number/duration)

Evaluation class will now check for NaNs in the predicted output and throw an exception instead of treating argMax(NaNs) as having value 0 ()

Added ModelAdapter for ParallelInference for convenience and for use cases such as YOLO (allows improved performance by avoiding detached (out-of-workspace) arrays) ()

Added GELU Activation function ()

Added BertIterator (a MultiDataSetIterator for BERT training - supervised and unsupervised)

Added validation to MultiLayerNetwork/ComputationGraph that throws an exception when attempting to perform Regression evaluation on a classifier, or vice-versa (, )

Added ComputationGraph.output(List<String> layers, boolean train, INDArray[] features, INDArray[] featureMasks) method to get the activations for a specific set of layers/vertices only (without redundant calculations) ()

Weight initialization for networks is now implemented as classes (not just enumerations) and hence is now extensible via the IWeightInit interface (); i.e., custom weight initializations are now supported (, )

Added Capsule Network layers (no GPU acceleration until next release) - , and ()

Added Cifar10DataSetIterator to replace CifarDataSetIterator (, )

Keras import: Importing models from InputStream is now supported (, )

Layer/NeuralNetConfiguration builders now have getter/setter methods also, for better Kotlin support ()

Most JavaScript dependencies and fonts for UI have been migrated to WebJars ()

CheckpointListener now has static availableCheckpoints(File), loadCheckpointMLN(File, int) and loadLastCheckpointMLN(File) etc methods ()

MultiLayerNetwork/ComputationGraph now validate and throw an exception in certain incompatible RNN configurations, like truncated backpropagation through time combined with LastTimeStepLayer/Vertex ()

Added BERT WordPiece tokenizers ()

Deeplearning4j UI now has multi-user/multi-session support - use UIServer.getInstance(boolean multiSession, Function<String,StatsStorage>) to start UI in multi-session mode ()

Layer/NeuralNetworkConfiguration builder method validation standardized and improved ()

WordVectorSerializer now supports reading and exporting text format vectors via WordVectorSerializer.writeLookupTable and readLookupTable ()

Updated to JavaCPP, JavaCPP presets, and JavaCV version 1.5 ()

Added EvaluationBinary false alarm rate calculation ()

ComputationGraph GraphBuilder now has an appendLayer method that can be used to add layers connected to the last added layer/vertex ()

Added Wasserstein loss function ()

Keras import: Improved errors/exceptions for lambda layer import ()

Apache Lucene/Solr upgraded from 7.5.0 to 7.7.1 ()

KMeans clustering strategy is now configurable ()

DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly ()

cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown ()

Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training ()

Fixed a bug where dropout instances were incorrectly shared between layers when using transfer learning with dropout (, )

Fixed issue where tensorAlongDimension could result in an incorrect array order for edge cases and hence exceptions in LSTMs ()

Fixed an edge case issue with ComputationGraph.getParam(String) where the layer name contains underscores ()

Fixed an edge case with ParallelInference on CUDA where (very rarely) input array operations (such as normalization) may not be fully completed before transferring an array between threads (, )

Fixed an edge case with KFoldIterator when the total number of examples is not a multiple of the batch size (, )

Fixed an issue where DL4J UI could throw a NoClassDefFoundError on Java 9/10/11 (, )

Keras import: added aliases for weight initialization ()

Fixed issue where dropout instances would not be correctly cloned when network configuration was cloned ()

Fixed workspace issue with ElementwiseVertex with single input ()

Fixed issue with UI where detaching StatsStorage could attempt to remove storage twice, resulting in an exception ()

Fixed issue where LossMultiLabel would generate NaNs when all labels in minibatch are the same class. Now 0 gradient is returned instead. (, )

Fixed an issue where DepthwiseConv2D weight could be wrong shape on restoring network from saved format ()

Fixed issue where BaseDatasetIterator.next() would not apply preprocessors, if one was set ()

Improved default configuration for CenterLossOutputLayer ()

Fixed an issue for UNet non-pretrained configuration ()

Fixed an issue where Word2Vec VocabConstructor could deadlock under some circumstances ()

SkipGram and CBOW (used in Word2Vec) were made native operations for better performance ()

Fixed an issue where references to detached StatsListener instances would be maintained, potentially leading to memory issues when using InMemoryStatsListener ()

Optimization: Workspaces were added to SequenceVectors and Word2Vec ()

Improved validation for RecordReaderDataSetIterator ()

Improved handling of unknown words in WordVectors implementation ()

Yolo2OutputLayer: Added validation for incorrect labels shape. ()

LastTimeStepLayer will now throw an exception when the input mask is all 0s (no data - no last time step) ()

Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate method could lead to invalid updater state in some rare cases ()

Fixed an issue where Conv1D layer would calculate output length incorrectly in MultiLayerNetwork.summary() ()

Async iterators are now used in EarlyStoppingTrainer to improve data loading performance ()

EmbeddingLayer and EmbeddingSequenceLayer performance has been improved on CUDA ()

Removed outdated/legacy scala tools repository (, )

Fixed issues in L2NormalizeVertex equals/hashcode methods ()

Fixed Workspace issue in ConvolutionalListener ()

Fixed EvaluationBinary falsePositiveRate calculation ()

Added validation and useful exception for MultiLayerNetwork.output(DataSetIterator) methods ()

Fixed minor issue where ComputationGraph.summary() would throw a NullPointerException if init() had not already been called ()

Fixed a ComputationGraph issue where an input into a single layer/vertex repeated multiple times could fail during training ()

Improved performance for KMeans implementation ()

Fixed an issue with rnnGetPreviousState for RNNs in 'wrapper' layers such as FrozenLayer ()

Keras import: Fixed an issue with order of words when importing some Keras tokenizers ()

Keras import: fixed issue with possible UnsupportedOperationException in KerasTokenizer class ()

Keras import: fixed an import issue with models combining embeddings, reshape and convolution layers ()

Keras import: fixed an import issue with input type inference for some RNN models ()

Fixed some padding issues in LocallyConnected1D/2D layers ()

Removed reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays ()

Added INDArray.close() method to allow users to manually release off-heap memory immediately ()

SameDiff: Added TensorFlowImportValidator tool to determine if a TensorFlow graph can likely be imported into SameDiff. Reports the operations used and whether they are supported in SameDiff ()

Added Nd4j.createFromNpzFile method to load Numpy npz files ()

Added support for importing BERT models into SameDiff (, )

Added SameDiff GraphTransformUtil for performing transfer learning and other graph modifications (, , )

Evaluation, RegressionEvaluation etc now support 4d (CNN segmentation) data formats; also added Evaluation.setAxis(int) method to support other data formats such as channels-last/NHWC for CNNs and NWC for CNN1D/RNNs. Defaults to axis 1 (which matches DL4J CNN and RNN data formats) (, )

Added DotProductAttention and MultiHeadDotProductAttention operations ()

Added Nd4j.exec(Op) and Nd4j.exec(CustomOp) convenience methods ()

Import of TF Assertions added ()

Support/fixes for control dependencies ()

Support/fixes for TensorArray and related ops (, , )

nd4j-common - tar/tar.gz support added; Zip file listing and single file extraction added (, )

SameDiff: reductions operations now support "dynamic" (non-constant) inputs for axis argument ()

ROCBinary now has .getROC(int outputNum) method ()

SameDiff: L1/L2 regularization added (, )

SameDiff: Added SDVariable.convertToVariable() and convertToConstant() - to change SDVariable type ()

Added checks and useful exceptions for reductions on empty arrays ()

SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields ()

SameDiff TensorFlow import: import can now be overridden for cases such as user-defined functions (, )

Libnd4j (c++) benchmarking framework added ()

Added OpExecutioner.inspectArray(INDArray) method to get summary statistics for analysis/debugging purposes ()

Added INDArray.reshape(char order, boolean enforceView, long... newShape) to reshape array whilst throwing an exception (instead of returning a copy) if the reshape cannot be performed (, )

Added SDVariable method overloads (plus, minus, times, etc) for Kotlin ()

Added SDVariable convenience methods for dot, reshape, permute ()

Added SameDiff SDIndex.point(long, boolean keepDim) method (to keep point indices in output array as size 1 axis) ()

Added SameDiff ProtoBufToFlatBufConversion command line tool for doing TensorFlow frozen model (protobuf) to SameDiff FlatBuffers conversion ()

Improved DataType validation for SameDiff operations ()

nd4j-base64 module (deprecated in beta3) has been removed. Nd4jBase64 class has been moved to nd4j-api ()

When specifying arguments for op execution along dimension (for example, reductions), the reduction axes are now specified in the operation constructor - not separately in the OpExecutioner call. ()

Removed old Java loop-based BooleanIndexing methods. Equivalent native ops should be used instead. ()

Removed Nd4j.ENFORCE_NUMERICAL_STABILITY, Nd4j.copyOnOps, etc ()

SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields ()

Nd4j.emptyLike(INDArray) has been removed. Use Nd4j.like(INDArray) instead ()

org.nd4jutil.StringUtils removed; suggest using Apache commons lang3 StringUtils instead ()

ND4J Jackson RowVector(De)Serializer has been deprecated due to datatype changes; NDArrayText(De)Serializer should be used instead (, )

nd4j-instrumentation module has been removed due to lack of use/maintenance ()

Fixed bug with InvertMatrix.invert() with [1,1] shape matrices ()

Fixed edge case bug for Updater instances with length 1 state arrays ()

Fixed edge case with FileDocumentIterator with empty documents ()

Improved functionality for losses (, , , )

Improved errors for missing/misspelled placeholders ()

Fixed edge cases in loops (, )

Fixed issue with Nd4j.vstack on 1d arrays returning 1d output, not 2d stacked output ()

Conv2D op can infer kernel size from input arrays directly when required (, )

Fixed an issue with Numpy format export - Nd4j.toNpyByteArray(INDArray) ()

Fixes for SameDiff when it is used within an external workspace ()

Fixed an issue where empty NDArrays would be reported as having scalar shape information, length 1 ()

Optimization: libnd4j (c++) indexing for ops will use uint for faster offset calculations when required and possible ()

Optimization: libnd4j loops performance improved for faster execution of some operations (, , )

Local response normalization op optimized (, )

Fixed an issue with INDArray.repeat on some view arrays ()

Improved performance for execution of some operations on view arrays ()

Improved performance on broadcast operations (, , )

Improved performance for non-EWS reduction along dimension operations ()

Improved performance for IndexReduce operations () and small reductions ()

Improved performance of one_hot operation (), tanh operation ()

Improved performance for transform operations ()

Optimization: empty arrays are created only once and cached (as they are immutable) ()

Improved performance on operations using tensor along dimension for parallelization (, )

Improved performance on "reduce 3" reduction operations ()

Improved handling of CUDA contexts in heavily multi-threaded environments ()

Fixed an issue where Evaluation.reset() would incorrectly clear the String class labels ()

SameDiff: Improved gradient calculation performance/efficiency; "gradients" are now no longer defined for non-floating-point variables, and variables that aren't required to calculate loss or parameter gradients ()

Behaviour of IEvaluation instances now no longer depends on the global (default) datatype setting ()

INDArray.get(point(x), y) or .get(y, point(x)) now returns rank 1 arrays when performed on rank 2 arrays ()

Removed reliance on Guava for SameDiff, fixing potential issue for Java 11/12 and when earlier versions of Guava are on the classpath (, )

ND4J indexing (INDArray.get) implementation rewritten for better performance and reliability ()

Fixes for local response normalization backprop op ()

Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See . Workaround: Disable MKL-DNN for conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier) or disable MKL-DNN globally, for Skylake CPUs.

Added PythonTransform (arbitrary python code execution for pre processing) (, )

Added FirstDigit (Benford's law) transform (, )

StringToTimeTransform now supports setting Locale (, )

Added StreamInputSplit for creating local data pipelines where data is stored remotely on storage such as HDFS or S3 (, )

LineRecordReader (and subtypes) now have the option to define the character set ()

Added TokenizerBagOfWordsTermSequenceIndexTransform (TFIDF transform), GazeteerTransform (binary vector for word present) and MultiNlpTransform transforms; added BagOfWordsTransform interface ()

Fixed issue with ImageLoader.scalingIfNeeded ()

Arbiter now supports genetic algorithm search ()

Fixed an issue where early stopping used in Arbiter would result in a serialization exception ()

Removal of Canova and shift to DataVec: Javadoc,

Workspaces feature added (faster training performance + less memory)

SharedTrainingMaster added for Spark network training (improved performance) ,

ParallelInference added - wrapper that serves inference requests using internal batching and queues

ParallelWrapper now able to work with gradient sharing, in addition to the existing parameter averaging mode

CacheMode network configuration option added - improved CNN and LSTM performance at the expense of additional memory use

LSTM layer added, with CuDNN support (Note that the existing GravesLSTM implementation does not support CuDNN)

New native model zoo with pretrained ImageNet, MNIST, and VGG-Face weights

Custom/user defined updaters are now supported

EvaluationBinary, ROCBinary classes added: for evaluation of binary multi-class networks (sigmoid + xent output layers)

Evaluation and others now have G-Measure and Matthews Correlation Coefficient support; also macro + micro-averaging support for Evaluation class metrics

ROC and ROCMultiClass support exact calculation (previous: thresholded calculation was used)

ROC classes now support area under precision-recall curve calculation; getting precision/recall/confusion matrix at specified thresholds (via PrecisionRecallCurve class)

EvaluationCalibration added (residual plots, reliability diagrams, histogram of probabilities)

Evaluation and EvaluationBinary: now supports custom classification threshold or cost array

Network memory estimation functionality added. Memory requirements can be estimated from configuration without instantiating networks

Mixture density loss function

F-Measure loss function

Workspaces feature added

MapFileRecordReader and MapFileSequenceRecordReader added

Spark: Utilities to save and load JavaRDD<List<Writable>> and JavaRDD<List<List<Writable>>> data to Hadoop MapFile and SequenceFile formats

Arbiter UI:

[source]
[source]
[source]
[source]
[source]
[source]
[source]
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
https://arxiv.org/abs/1612.08242
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
[source]
[source]
[source]
[source]
https://arxiv.org/abs/1312.6114
[source]
[source]
[source]
[source]
http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf
https://arxiv.org/abs/1603.07285v1
[source]
[source]
[source]
[source]
http://yann.lecun.com/exdb/mnist/
[source]
https://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/synthetic_control.data
https://archive.ics.uci.edu/ml/machine-learning-databases/synthetic_control-mld/data.jpeg
[source]
https://pjreddie.com/projects/cifar-10-dataset-mirror/
[source]
https://archive.ics.uci.edu/ml/datasets/Iris
https://archive.ics.uci.edu/ml/datasets/Iris
[source]
http://vis-www.cs.umass.edu/lfw/
[source]
http://cs231n.stanford.edu/
https://tiny-imagenet.herokuapp.com/
[source]
https://www.nist.gov/itl/iad/image-group/emnist-dataset
https://arxiv.org/abs/1702.05373
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
[source]
datatypes
AttentionVertex
LearnedSelfAttentionLayer
RecurrentAttentionLayer
SelfAttentionLayer
Link
Link
this page
Link
Link
AttentionVertex
LearnedSelfAttentionLayer
RecurrentAttentionLayer
SelfAttentionLayer
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
CapsuleLayer
CapsuleStrengthLayer
PrimaryCapsules
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
1
2
3
Link
Link
NonMaxSuppression
LogMatrixDeterminant
NthElement
TruncateMod
Cholesky Decomposition
Image resize nearest neighbor
crop_and_resize
fake_quant_with_min_max_vars
reduce_logsumexp
pow (broadcastable)
linspace (dynamic args)
ExtractImagePatches
GELU
LSTMBlockCell, LSTMBLock, GRUCell
Standardize and LayerNorm ops
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
1
2
3
4
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Issue 7637
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Github Repo
Link
Link 1
Link 2
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link
Link 1
Link 2
Link
Link 1
Link 2
Link
Link
Link
Link 1
Link 2
Link
Link
GPUs

1.0.0-alpha

Highlights - 1.0.0-alpha Release

  • ND4J: Added SameDiff - Java automatic differentiation library (alpha release) with Tensorflow import (technology preview) and hundreds of new operations

  • ND4J: Added CUDA 9.0 and 9.1 support (with cuDNN), dropped support for CUDA 7.5, continued support for CUDA 8.0

  • ND4J: Native binaries (nd4j-native on Maven Central) now ship with AVX/AVX2/AVX-512 support (Windows/Linux)

  • DL4J: Large number of new layers and API improvements

  • DL4J: Keras 2.0 import support

Deeplearning4J

Deeplearning4J: New Features

  • Layers (new and enhanced)

    • Added support for both iteration-based and epoch-based schedules via ISchedule. Also added support for custom (user defined) schedules

    • Learning rate schedules are configured on the updaters, via the .updater(IUpdater) method

  • Adds ComputationGraphConfiguration GraphBuilder .layer(String, Layer, String...) alias for .addLayer(String, Layer, String...)

  • Added deeplearning4j-ui-standalone module: uber-jar for easy launching of UI server (usage: java -jar deeplearning4j-ui-standalone-1.0.0-alpha.jar -p 9124 -r true -f c:/UIStorage.bin)

  • Weight initializations:

  • Added new model zoo models:

  • New iterators, and iterator improvements:

Deeplearning4J: Bug Fixes and Optimizations

  • Evaluation no-arg constructor could cause NaN evaluation metrics when used on Spark

  • ParallelInference fixes:

Deeplearning4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

  • Previously deprecated updater configuration methods (.learningRate(double), .momentum(double) etc) all removed

    • To configure learning rate: use .updater(new Adam(lr)) instead of .updater(Updater.ADAM).learningRate(lr), as shown in the configuration sketch after this list

    • To configure bias learning rate: use .biasUpdater(IUpdater) method

    • To configure learning rate schedules: use .updater(new Adam(ISchedule)) and similar

  • Updater configuration via enumeration (i.e., .updater(Updater)) has been deprecated; use .updater(IUpdater)

  • .regularization(boolean) config removed; functionality is now always equivalent to .regularization(true)

  • .useDropConnect(boolean) removed; use .weightNoise(new DropConnect(double)) instead

  • .iterations(int) method has been removed (was rarely used and confusing to users)

  • Multiple utility classes (in org.deeplearning4j.util) have been deprecated and/or moved to nd4j-common. Use the same class names in the nd4j-common module (package org.nd4j.util) instead.

  • Previously deprecated .activation(String) has been removed; use .activation(Activation) or .activation(IActivation) instead

  • Layer API change: Custom layers may need to implement applyConstraints(int iteration, int epoch) method

  • Parameter initializer API change: Custom parameter initializers may need to implement isWeightParam(String) and isBiasParam(String) methods

  • GravesBidirectionalLSTM has been deprecated; use new Bidirectional(Bidirectional.Mode.ADD, new GravesLSTM.Builder()....build()) instead
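
To illustrate the new updater-based configuration described in the items above, a minimal sketch (the layer sizes, learning rate and loss function are arbitrary placeholders):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        // learning rate is now configured on the updater, not via .learningRate(...)
        .updater(new Adam(1e-3))
        .list()
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(784).nOut(10)
                .activation(Activation.SOFTMAX)
                .build())
        .build();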

Deeplearning4J: 1.0.0-alpha Known Issues

  • Performance on some networks types may be reduced on CUDA compared to 0.9.1 (with workspaces configured). This will be addressed in the next release

Deeplearning4J: Keras Import

  • Keras 2 support, keeping backward compatibility for keras 1

  • Keras 2 and 1 imports use the exact same API; the Keras version is inferred by DL4J (see the import sketch after this list)

  • Keras unit test coverage increased by 10x, many more real-world integration tests

  • Unit tests for importing and checking layer weights

  • Leaky ReLU, ELU, SELU support for model import

  • All Keras layers can be imported with optional bias terms

  • Old deeplearning4j-keras module removed, old "Model" API removed

  • All Keras initializations (Lecun normal, Lecun uniform, ones, zeros, Orthogonal, VarianceScaling, Constant) supported

  • 1D convolution and pooling supported in DL4J and Keras model import

  • Atrous Convolution 1D and 2D layers supported in Keras model import

  • 1D Zero padding layers supported

  • Keras constraints module fully supported in DL4J and model import

  • Upsampling 1D and 2D layers in DL4J and Keras model import (including GAN examples in tests)

  • Most merge modes supported in Keras model import, Keras 2 Merge layer API supported

  • Separable Convolution 2D layer supported in DL4J and Keras model import

  • Deconvolution 2D layer supported in DL4J and Keras model import

  • Full support of Keras noise layers on import (Alpha dropout, Gaussian dropout and noise)

  • Support for SimpleRNN layer in Keras model import

  • Support for Bidirectional layer wrapper Keras model import

  • Addition of LastTimestepVertex in DL4J to support return_sequences=False for Keras RNN layers.

  • DL4J support for recurrent weight initializations and Keras import integration.

  • SpaceToBatch and BatchToSpace layers in DL4J for better YOLO support, plus end-to-end YOLO Keras import test.

  • Cropping2D support in DL4J and Keras model import
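
As a minimal sketch of the Keras import API referenced above (the .h5 paths are hypothetical; exception handling is omitted):

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

// Keras Sequential model -> DL4J MultiLayerNetwork
MultiLayerNetwork net = KerasModelImport.importKerasSequentialModelAndWeights("path/to/sequential_model.h5");

// Keras functional-API model -> DL4J ComputationGraph
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights("path/to/functional_model.h5");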

Deeplearning4J: Keras Import - API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

Deeplearning4J: Keras Import - Known Issues

  • Embedding layer: In DL4J the output of an embedding layer is 2D by default, unless preprocessors are specified. In Keras the output is always 3D, but depending on specified parameters can be interpreted as 2D. This often leads to difficulties when importing Embedding layers. Many cases have been covered and issues fixed, but inconsistencies remain.

  • Batchnormalization layer: DL4J's batch normalization layer is much more restrictive (in a good way) than Keras' version of it. For instance, DL4J only allows normalizing spatial dimensions for 4D convolutional inputs, while in Keras any axis can be used for normalization. Depending on the dimension ordering (NCHW vs. NHWC) and the specific configuration used by a Keras user, this can lead to expected (!) and unexpected import errors.

  • Support for importing a Keras model for training purposes in DL4J (enforceTrainingConfig == true) is still very limited and will be tackled properly for the next release.

  • Keras Merge layers: seem to work fine with the Keras functional API, but have issues when used in a Sequential model.

  • Reshape layers: can be somewhat unreliable on import. DL4J rarely has a need to explicitly reshape input beyond (inferred) standard input preprocessors. In Keras, Reshape layers are used quite often. Mapping the two paradigms can be difficult in edge cases.

ND4J

ND4J: New Features

  • Hundreds of new operations added

  • Technology preview of tensorflow import added (supports 1.4.0 and up)

  • nVidia CUDA 8/9.0/9.1 now supported

  • Workspaces improvements were introduced to ensure safety: SCOPE_PANIC profiling mode is enabled by default

  • FlatBuffers support for INDArray serde

  • Support for auto-broadcastable operations was added

  • libnd4j, the underlying C++ library, received a functionality boost and now offers an NDArray class and a Graph class, and can be used as a standalone library or executable.

  • Convolution-related ops now support NHWC in addition to NCHW data format.

  • Accumulation ops now have option to keep reduced dimensions.

ND4J: Known Issues

  • Not all op gradients implemented for automatic differentiation

  • Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

ND4J: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

ND4J - SameDiff

  • Control flow is supported with IF and WHILE primitives.

Features

  • Two execution modes available: Java-driven execution, and Native execution for serialized graphs.

  • SameDiff graphs can be serialized using FlatBuffers

  • Building and running computation graphs built from SameDiff operations (see the sketch after this list).

  • Graphs can run forward pass on input data and compute gradients for the backward pass.

  • Already supports many high-level layers, like dense layers, convolutions (1D-3D), deconvolutions, separable convolutions, pooling and upsampling, batch normalization, local response normalization, LSTMs and GRUs.

  • In total there are about 350 SameDiff operations available, including many basic operations used in building complex graphs.
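
A minimal sketch of building and executing a small graph with SameDiff, as referenced above (the variable names, shapes, and the outputSingle call reflect the current API and are illustrative assumptions):

import java.util.Collections;
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

SameDiff sd = SameDiff.create();
SDVariable in  = sd.placeHolder("in", DataType.FLOAT, -1, 4);      // minibatch x 4 features
SDVariable w   = sd.var("w", Nd4j.rand(DataType.FLOAT, 4, 2));
SDVariable b   = sd.var("b", Nd4j.zeros(DataType.FLOAT, 1, 2));
SDVariable out = sd.nn().sigmoid("out", in.mmul(w).add(b));

// Forward pass for a single random minibatch of 3 examples
INDArray result = sd.outputSingle(Collections.singletonMap("in", Nd4j.rand(DataType.FLOAT, 3, 4)), "out");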

Known Issues and Limitations

  • Vast majority of new operations added in 1.0.0-alpha do NOT use GPU yet.

  • While many of the widely used base operations and high-level layers used in practice are supported, op coverage is still limited. Goal is to achieve feature parity with TensorFlow and fully support import for TF graphs.

  • Some of the existing ops do not have a backward pass implemented (called doDiff in SameDiff).

DataVec

DataVec: New Features

  • Add new RecordReader / SequenceRecordReader implementations:

  • Add new transforms:

DataVec: Fixes

DataVec: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

  • RecordWriter and SequenceRecordWriter APIs have been updated with multiple new methods

Arbiter

Arbiter: New Features

Arbiter: Fixes

Arbiter: API Changes (Transition Guide): 0.9.1 to 1.0.0-alpha

  • As per DL4J updater API changes: old updater configuration (learningRate, momentum, etc) methods have been removed. Use .updater(IUpdater) or .updater(ParameterSpace<IUpdater>) methods instead

RL4J

  • Add support for LSTM layer to A3C

  • Fix A3C to make it actually work using new ActorCriticLoss and correct use of randomness

  • Fix cases when QLearning would fail (non-flat input, incomplete serialization, incorrect normalization)

  • Fix logic of HistoryProcessor with async algorithms and failures when preprocessing images

  • Tidy up and correct the output of statistics, also allowing the use of IterationListener

  • Fix issues preventing efficient execution with CUDA

  • Provide access to more of the internal structures with NeuralNet.getNeuralNetworks(), Policy.getNeuralNet(), and convenience constructors for Policy

  • Add MDPs for ALE (Arcade Learning Environment) and MALMO to support Atari games and Minecraft

  • Update MDP for Doom to allow using the latest version of VizDoom

ScalNet

  • Can be built with sbt and maven.

  • Project structure is closely aligned to both DL4J model-import module and Keras.

  • Supports the following layers: Convolution2D, Dense, EmbeddingLayer, AvgPooling2D, MaxPooling2D, GravesLSTM, LSTM, Bidirectional layer wrapper, Flatten, Reshape. Additionally, DL4J OutputLayers are supported.

ND4S

  • Scala 2.12 support

Examples Tour

Brief tour of available examples in DL4J.

Prerequisites

Example Content

Projects are based on what functionality the included examples demonstrate to the user and not necessarily which library in the DL4J stack the functionality lives in.

Examples in a project are in general separated into "quickstart" and "advanced".

Each project README also lists all the examples it contains, with a recommended order to explore them in.

Feedback & Contributions

Release

How to conduct a release to Maven Central

Deeplearning4j has several steps to a release. Below is a brief outline with follow on descriptions.

  1. Compile libnd4j for different cpu architectures

  2. Ensure the current javacpp dependencies such as python, mkldnn, cuda, .. are up to date

  3. Run all integration tests on core platforms (windows, mac, linux) with both cpu and gpu

  4. Create a staging repository for testing using github actions running manually on each platform

  5. Update the examples to be compatible with the latest release

  6. Run the deeplearning4j-examples as a litmus test on all platforms (including embedded)

    to sanity check platform specific numerical bugs using the staging repository

  7. Double check any user related bugs to see if they should block a release

  8. Hit release button

  9. Perform follow up release of -platform projects under same version

  10. Tag release

Compile libnd4j on different cpu architectures

  • Platform compatibility

    We currently compile libnd4j on ubuntu 16.04. This means glibc 2.23.

    For our cuda builds, we use gcc7.

    Users of older glibc versions may need to compile from source. For our standard release, we try to keep the toolchain reasonably old, but we do not support end-of-life linux distributions for public builds.

  • Platform specific helpers

Ensure the current javacpp dependencies such as python, mkldnn, cuda, .. are up to date

Of note here is that certain older versions of libraries can use older javacpp versions. It is recommended that the desired version be up to date if possible. Otherwise, if an older version of javacpp is the only version available, this is generally ok.

Run all integration tests on core platforms (windows, mac, linux) with both cpu and gpu

We run all of the major integration tests on the core major platforms where higher end compute is accessible. This is generally a bigger machine. It is expected that some builds can take up to 2 hours depending on the specs of the desired machine.

Update the examples to be compatible with the latest release

To ensure the examples stay compatible with the current release, we also tag the release version to be the latest version found on maven central. This step may also involve adding or removing examples for new or deprecated features respectively.

Ensure different classifiers work

  1. Different supported cuda versions with and without cudnn

  2. Onednn and associated classifiers per platform

Android

Ensure testing happens on the android emulator.

Run the deeplearning4j-examples as a litmus test on all platforms (including embedded)

Double check any user related bugs to see if they should block a release

Hit release button

Ensure a tag exists

After a release happens, a version update to the stable version plus a github tag needs to happen. This is achieved in the desktop app by going to:

  1. History

  2. Right click on the target commit you want to tag

  3. Click tag

  4. Push the revision

  5. Update the version back to snapshot after tagging

Cudnn

Using the NVIDIA cuDNN library with DL4J.

Using Deeplearning4j with cuDNN

There are two ways of using cuDNN with Deeplearning4j. The older approach, described further below, is built into the various Deeplearning4j layers at the Java level.

The newer approach uses the ND4J CUDA bindings, which link to cuDNN at the C++ level. Both are described below: the newer way first, followed by the old way.

Cudnn setup

The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA:

To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (/usr/local/cuda/lib64/ on Linux, /usr/local/cuda/lib/ on Mac OS X, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ on Windows).

 <dependency>
     <groupId>org.bytedeco</groupId>
     <artifactId>cuda-platform-redist</artifactId>
     <version>$CUDA_VERSION-$CUDNN_VERSION-$JAVACPP_VERSION</version>
 </dependency>

The same versioning scheme for redist applies to the cuda bindings that leverage an installed cuda.

Using cuDNN via nd4j

Similar to our avx bindings, nd4j leverages our c++ library libnd4j for running mathematical operations. In order to use cudnn, all you need to do is change the cuda backend dependency from:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>

to

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

For jetson nano cuda 10.2:

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
</dependency>

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
  <classifier>linux-arm64</classifier>
</dependency>

Note that we are only adding an additional dependency. The reason we use an additional classifier is to pull in an optional dependency on cuDNN-based routines. The default does not use cuDNN, but instead uses built-in standalone routines for operations that are otherwise implemented in cuDNN, such as conv2d and lstm.

For users of the -platform dependencies such as nd4j-cuda-11.2-platform, this classifier is still required. The -platform dependencies try to set sane defaults for each platform, but give users the option to include whatever they want. If you need optimizations, please become familiar with this classifier mechanism.

Using cudnn via deeplearning4j

Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer, SubsamplingLayer, etc), and also LSTM and BatchNormalization layers support CuDNN.

The only thing we need to do to have DL4J load cuDNN is to add a dependency on deeplearning4j-cuda-11.0, or deeplearning4j-cuda-11.2, for example:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.0</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

or

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.2</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

The following snippet shows how to set the cuDNN algorithm mode (see the note about the NO_WORKSPACE mode later in this section), either for the whole network or separately for each layer:
    // for the whole network
    new NeuralNetConfiguration.Builder()
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...
    // or separately for each layer
    new ConvolutionLayer.Builder(h, w)
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...

CPU

CPU and AVX support in ND4J/Deeplearning4j

What is AVX, and why does it matter?

Note that AVX only applies to nd4j-native (CPU) backend for x86 devices, not GPUs and not ARM/PPC devices.

Why AVX matters: performance. You want to use the version of ND4J compiled with the highest level of AVX supported by your system.

AVX support for different CPUs - summary:

  • Most modern x86 CPUs: AVX2 is supported

  • Some high-end server CPUs: AVX512 may be supported

  • Old CPUs (pre 2012) and low power x86 (Atom, Celeron): No AVX support (usually)

Note that CPUs supporting later versions of AVX also support all earlier versions. This means it's possible to run a generic x86 or AVX2 binary on a system supporting AVX512. However, it is not possible to run binaries built for later versions (such as AVX512) on a CPU that does not support those instructions.

In version 1.0.0-beta6 and later you may get a warning as follows, if AVX is not configured optimally:

*********************************** CPU Feature Check Warning ***********************************
Warning: Initializing ND4J with Generic x86 binary on a CPU with AVX/AVX2 support
Using ND4J with AVX/AVX2 will improve performance. See deeplearning4j.org/cpu for more details
Or set environment variable ND4J_IGNORE_AVX=true to suppress this warning
************************************************************************************************

This warning has been removed in more recent versions, as it was confusing to users and out of date.

Configure mkl usage

When using the nd4j-native backend on Intel platforms, our OpenBLAS bindings also give the ability to use MKL instead. In order to use MKL, set the following system property, either on launch or before Nd4j is initialized with Nd4j.create():

 System.setProperty("org.bytedeco.openblas.load", "mkl");
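
When setting the property at launch time instead of in code, the equivalent JVM argument for the same system property is:

 -Dorg.bytedeco.openblas.load=mkl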

Configuring AVX in ND4J/DL4J

As noted earlier, for best performance you should use the version of ND4J that matches your CPU's supported AVX level.

ND4J's default configuration (when just including the nd4j-native or nd4j-native-platform dependencies without a Maven classifier) is "generic x86" (no AVX).

To configure AVX2 and AVX512, you need to specify a classifier for the appropriate architecture.

The following binaries (nd4j-native classifiers) are provided for x86 architectures:

  • Generic x86 (no AVX): linux-x86_64, windows-x86_64, macosx-x86_64

  • AVX2: linux-x86_64-avx2, windows-x86_64-avx2, macosx-x86_64-avx2

  • AVX512: linux-x86_64-avx512

  • Generic x86 (no AVX), with onednn: linux-x86_64-onednn, windows-x86_64-onednn, macosx-x86_64-onednn

  • AVX2, with onednn: linux-x86_64-onednn-avx2, windows-x86_64-onednn-avx2, macosx-x86_64-onednn-avx2

  • AVX512, with onednn: linux-x86_64-onednn-avx512

Example: Configuring AVX2 on Windows (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>windows-x86_64-avx2</classifier>
</dependency>

Example: Configuring AVX512 on Linux (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>linux-x86_64-avx512</classifier>
</dependency>

Example: Configuring AVX512 on Linux with onednn (Maven pom.xml)

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
</dependency>

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${nd4j.version}</version>
    <classifier>linux-x86_64-onednn-avx512</classifier>
</dependency>

Note that you need both nd4j-native dependencies - with and without the classifier.

In the examples above, it is assumed that a Maven property nd4j.version is set to an appropriate ND4J version such as 1.0.0-M1.1

Workspaces

Workspaces are an efficient model for memory paging in DL4J.

What are workspaces?

ND4J offers an additional memory-management model: workspaces. Workspaces allow you to reuse memory for cyclic workloads without relying on the JVM garbage collector to track off-heap memory. In other words, at the end of the workspace loop, the memory content of all INDArrays allocated within it is invalidated. Workspaces are integrated into DL4J for training and inference.

The basic idea is simple: You can do what you need within a workspace (or spaces), and if you want to get an INDArray out of it (i.e. to move result out of the workspace), you just call INDArray.detach() and you'll get an independent INDArray copy.
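
A minimal sketch of that pattern using the ND4J workspace API (the workspace id "MY_WS" and the array shapes are arbitrary):

import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray result;
try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace("MY_WS")) {
    INDArray x = Nd4j.rand(128, 128);          // allocated inside the workspace
    // detach() copies the result out of the workspace so it remains valid after the loop
    result = x.mmul(x).detach();
}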

Neural Networks

For DL4J users, workspaces provide better performance out of the box, and are enabled by default from 1.0.0-alpha onwards. Thus for most users, no explicit workspace configuration is required.

In earlier versions, workspaces had to be enabled explicitly. You can configure the workspace mode using:

.trainingWorkspaceMode(WorkspaceMode.SEPARATE) and/or .inferenceWorkspaceMode(WorkspaceMode.SINGLE) in your neural network configuration.

The difference between SEPARATE and SINGLE workspaces is a tradeoff between the performance & memory footprint:

  • SEPARATE is slightly slower, but uses less memory.

  • SINGLE is slightly faster, but uses more memory.

That said, it’s fine to use different modes for training & inference (i.e. use SEPARATE for training, and use SINGLE for inference, since inference only involves a feed-forward loop without backpropagation or updaters involved).

With workspaces enabled, all memory used during training will be reusable and tracked without the JVM GC interference. The only exclusion is the output() method that uses workspaces (if enabled) internally for the feed-forward loop. Subsequently, it detaches the resulting INDArray from the workspaces, thus providing you with independent INDArray which will be handled by the JVM GC.

Please note: After the 1.0.0-alpha release, workspaces in DL4J were refactored - SEPARATE/SINGLE modes have been deprecated, and users should use ENABLED instead.

Garbage Collector

If your training process uses workspaces, we recommend that you disable (or reduce the frequency of) periodic GC calls. That can be done like so:
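
// this will limit frequency of gc calls to 5000 milliseconds
Nd4j.getMemoryManager().setAutoGcWindow(5000);

// OR you could totally disable it
Nd4j.getMemoryManager().togglePeriodicGc(false);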

Put that somewhere before your model.fit(...) call.

ParallelWrapper & ParallelInference

For ParallelWrapper, the workspace-mode configuration option was also added. As such, each of the trainer threads will use a separate workspace attached to the designated device.
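
For example, a sketch of such a configuration (the workspace mode in the final call is one of the three options named in the comment):

ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
      // DataSets prefetching options. Buffer size per worker.
      .prefetchBuffer(8)

      // set number of workers equal to number of GPUs.
      .workers(2)

      // rare averaging improves performance but might reduce model accuracy
      .averagingFrequency(5)

      // if set to TRUE, on every averaging model score will be reported
      .reportScoreAfterAveraging(false)

      // 3 options here: NONE, SINGLE, SEPARATE
      .workspaceMode(WorkspaceMode.SEPARATE)
      .build();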

Iterators

We provide asynchronous prefetch iterators, AsyncDataSetIterator and AsyncMultiDataSetIterator, which are usually used internally.

These iterators optionally use a special, cyclic workspace mode to obtain a smaller memory footprint. The size of the workspace, in this case, will be determined by the memory requirements of the first DataSet coming out of the underlying iterator, whereas the buffer size is defined by the user. The workspace will be adjusted if memory requirements change over time (e.g. if you’re using variable-length time series).

Caution: If you’re using a custom iterator or the RecordReader, please make sure you’re not initializing something huge within the first next() call. Do that in your constructor to avoid undesired workspace growth.

Caution: With AsyncDataSetIterator being used, DataSets are supposed to be used before calling the next() DataSet. You are not supposed to store them, in any way, without the detach() call. Otherwise, the memory used for INDArrays within DataSet will be overwritten within AsyncDataSetIterator eventually.
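
A sketch of the safe pattern when a DataSet must be kept beyond the next next() call (trainIterator and storedDataSets are placeholder names):

import org.nd4j.linalg.dataset.DataSet;

DataSet current = trainIterator.next();
current.detach();               // copies the underlying INDArrays out of the async workspace
storedDataSets.add(current);    // now safe to keep a reference to it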

If for some reason you don’t want your iterator to be wrapped into an asynchronous prefetch (e.g. for debugging purposes), special wrappers are provided: AsyncShieldDataSetIterator and AsyncShieldMultiDataSetIterator. Basically, those are just thin wrappers that prevent prefetch.

Evaluation

Usually, evaluation assumes use of the model.output() method, which essentially returns an INDArray detached from the workspace. In the case of regular evaluations during training, it might be better to use the built-in methods for evaluation. For example:
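
A sketch of that approach (model and iteratorTest are assumed to be an initialized network and a test-set iterator):

import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.evaluation.classification.ROCMultiClass;

Evaluation eval = new Evaluation();
ROCMultiClass roc = new ROCMultiClass();

// single pass over the test iterator, updating both evaluations
model.doEvaluation(iteratorTest, eval, roc);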

This piece of code will run a single cycle over iteratorTest, and it will update both (or less/more if required by your needs) IEvaluation implementations without any additional INDArray allocation.

Workspace Destruction

There are also some situations, say, where you're short on RAM, and might want to release all workspaces created out of your control, e.g. during evaluation or training.

That could be done like so: Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();

This method will destroy all workspaces that were created within the calling thread. If you've created workspaces in some external threads on your own, you can use the same method in that thread, after the workspaces are no longer needed.

Workspace Exceptions

If workspaces are used incorrectly (such as a bug in a custom layer or data pipeline, for example), you may see a workspace-related exception indicating that an array has been used outside of the workspace it was defined in.

DL4J's LayerWorkspaceMgr

DL4J's Layer API includes the concept of a "layer workspace manager".

The idea with this class is that it allows us to easily and precisely control the location of a given array, given different possible configurations for the workspaces. For example, the activations out of a layer may be placed in one workspace during inference, and another during training; this is for performance reasons. However, with the LayerWorkspaceMgr design, implementers of layers don't need to worry about this.

What does this mean in practice? Usually it's quite simple...

  • When returning activations (activate(boolean training, LayerWorkspaceMgr workspaceMgr) method), make sure the returned array is defined in ArrayType.ACTIVATIONS (i.e., use LayerWorkspaceMgr.create(ArrayType.ACTIVATIONS, ...) or similar)

  • When returning activation gradients (backpropGradient(INDArray epsilon, LayerWorkspaceMgr workspaceMgr)), similarly return an array defined in ArrayType.ACTIVATION_GRAD

You can also leverage an array defined in any workspace to the appropriate workspace using, for example, LayerWorkspaceMgr.leverageTo(ArrayType.ACTIVATIONS, myArray)

Note that if you are not implementing a custom layer (and instead just want to perform forward pass for a layer outside of a MultiLayerNetwork/ComputationGraph) you can use LayerWorkspaceMgr.noWorkspaces().
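
To make the two rules above concrete for custom layer implementers, a sketch of how an activate(...) implementation might allocate its output (the data type and shape handling are simplified assumptions, and 'input' is assumed to have been set by the framework):

import org.deeplearning4j.nn.workspace.ArrayType;
import org.deeplearning4j.nn.workspace.LayerWorkspaceMgr;
import org.nd4j.linalg.api.ndarray.INDArray;

public INDArray activate(boolean training, LayerWorkspaceMgr workspaceMgr) {
    // allocate the returned activations in the ACTIVATIONS workspace
    INDArray out = workspaceMgr.createUninitialized(ArrayType.ACTIVATIONS, input.dataType(), input.shape());

    // ... compute this layer's forward pass into 'out' ...

    // an array created elsewhere can instead be moved into the correct workspace:
    // out = workspaceMgr.leverageTo(ArrayType.ACTIVATIONS, out);
    return out;
}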

Added Yolo2OutputLayer CNN layer for object detection (). See also DataVec's ObjectDetectionRecordReader

Adds support for 'no bias' layers via hasBias(boolean) config (DenseLayer, EmbeddingLayer, OutputLayer, RnnOutputLayer, CenterLossOutputLayer, ConvolutionLayer, Convolution1DLayer). EmbeddingLayer now defaults to no bias ()

Adds support for dilated convolutions (aka 'atrous' convolutions) - ConvolutionLayer, SubsamplingLayer, and 1D versions thereof. ()

Added Upsampling2D layer, Upsampling1D layer (, )

ElementWiseVertex now (additionally) supports Average and Max modes in addition to Add/Subtract/Product ()

Added SeparableConvolution2D layer ()

Added Deconvolution2D layer (aka transpose convolution, fractionally strided convolution layer) ()

Added ReverseTimeSeriesVertex ()

Added RnnLossLayer - no-parameter version of RnnOutputLayer, or RNN equivalent of LossLayer ()

Added CnnLossLayer - no-parameter CNN output layer for use cases such as segmentation, denoising, etc. ()

Added Bidirectional layer wrapper (converts any uni-directional RNN to a bidirectional RNN) ()

Added SimpleRnn layer (aka "vanilla" RNN layer) ()

Added LastTimeStep wrapper layer (wraps a RNN layer to get last time step, accounting for masking if present) ()

Added MaskLayer utility layer that simply zeros out activations on forward pass when a mask array is present ()

Added alpha-version (not yet stable) SameDiff layer support to DL4J (Note: forward pass, CPU only for now)()

Added SpaceToDepth and SpaceToBatch layers (, )

Added Cropping2D layer ()

Added parameter constraints API (LayerConstraint interface), and MaxNormConstraint, MinMaxNormConstraint, NonNegativeConstraint, UnitNormConstraint implementations ()

Significant refactoring of learning rate schedules ()

Added ISchedule interface; added Exponential, Inverse, Map, Poly, Sigmoid and Step schedule implementations ()

Added dropout API (IDropout - previously dropout was available but not a class); added Dropout, AlphaDropout (for use with self-normalizing NNs), GaussianDropout (multiplicative), GaussianNoise (additive). Added support for custom dropout types ()

Added support for dropout schedules via ISchedule interface ()

Added weight/parameter noise API (IWeightNoise interface); added DropConnect and WeightNoise (additive/multiplicative Gaussian noise) implementations (); dropconnect and dropout can now be used simultaneously

Adds layer configuration alias .units(int) equivalent to .nOut(int) ()

Layer index no longer required for MultiLayerConfiguration ListBuilder (i.e., .list().layer(<layer>) can now be used for configs) ()

Added MultiLayerNetwork.summary(InputType) and ComputationGraph.summary(InputType...) methods (shows layer and activation size information) ()

MultiLayerNetwork, ComputationGraph and layerwise trainable layers now track the number of epochs ()

Added .weightInit(Distribution) convenience/overload (previously: required .weightInit(WeightInit.DISTRIBUTION).dist(Distribution)) ()

WeightInit.NORMAL (for self-normalizing neural networks) ()

Ones, Identity weight initialization ()

Added new distributions (LogNormalDistribution, TruncatedNormalDistribution, OrthogonalDistribution, ConstantDistribution) which can be used for weight initialization ()

RNNs: Added ability to specify weight initialization for recurrent weights separately to "input" weights ()

Added layer alias: Convolution2D (ConvolutionLayer), Pooling1D (Subsampling1DLayer), Pooling2D (SubsamplingLayer) ()

Added Spark IteratorUtils - wraps a RecordReaderMultiDataSetIterator for use in Spark network training ()

CuDNN-supporting layers (ConvolutionLayer, etc) now warn the user if using CUDA without CuDNN ()

Binary cross entropy (LossBinaryXENT) now implements clipping (1e-5 to (1 - 1e-5) by default) to avoid numerical underflow/NaNs ()

SequenceRecordReaderDataSetIterator now supports multi-label regression ()

TransferLearning FineTuneConfiguration now has methods for setting training/inference workspace modes ()

IterationListener iterationDone method now reports both current iteration and epoch count; removed unnecessary invoke/invoked methods ()

Added MultiLayerNetwork.layerSize(int), ComputationGraph.layerSize(int)/layerSize(String) to easily determine size of layers ()

Added MultiLayerNetwork.toComputationGraph() method ()

Added NetworkUtils convenience methods to easily change the learning rate of an already initialized network ()

Added MultiLayerNetwork.save(File)/.load(File) and ComputationGraph.save(File)/.load(File) convenience methods ()

Added CheckpointListener to periodically save a copy of the model during training (every N iter/epochs, every T time units) ()

Added ComputationGraph output method overloads with mask arrays ()

New LossMultiLabel loss function for multi-label classification ()

Darknet19 ()

TinyYOLO ()

Added FileDataSetIterator, FileMultiDataSetIterator for flexibly iterating over directories of saved (Multi)DataSet objects ()

UCISequenceDataSetIterator ()

RecordReaderDataSetIterator now has builder pattern for convenience, improved javadoc ()

Added DataSetIteratorSplitter, MultiDataSetIteratorSplitter (, )

Added additional score functions for early stopping (ROC metrics, full set of Evaluation/Regression metrics, etc) ()

Added additional ROC and ROCMultiClass evaluation overloads for MultiLayerNetwork and ComputationGraph ()

Clarified Evaluation.stats() output to refer to "Predictions" instead of "Examples" (former is more correct for RNNs) ()

EarlyStoppingConfiguration now supports Supplier<ScoreCalculator> for use with non-serializable score calculators ()

Improved ModelSerializer exceptions when trying to load a model via wrong method (i.e., try to load ComputationGraph via restoreMultiLayerNetwork) ()

Added SparkDataValidation utility methods to validate saved DataSet and MultiDataSet on HDFS or local ()

ModelSerializer: added restoreMultiLayerNetworkAndNormalizer and restoreComputationGraphAndNormalizer methods ()

ParallelInference now has output overloads with support for input mask arrays ()

Lombok is no longer included as a transitive dependency ()

ComputationGraph can now have a vertex as the output (not just layers) (, )

Performance improvement for J7FileStatsStorage with large amount of history ()

Fixed UI layer sizes for variational autoencoder layers ()

Fixes to avoid HDF5 library crashes (, )

UI Play servers switch to production (PROD) mode ()

Related to the above: users can now set play.crypto.secret system property to manually set the Play application secret; is randomly generated by default ().

SequenceRecordReaderDataSetIterator would apply preprocessor twice ()

CollectScoresIterationListener could recurse endlessly ()

Async(Multi)DataSetIterator calling reset() on underlying iterator could cause issues in some situations ()

In some cases, L2 regularization could be (incorrectly) applied to frozen layers ()

Logging fixes for NearestNeighboursServer ()

Memory optimization for BaseStatsListener ()

ModelGuesser fix for loading Keras models from streams (previously would fail) ()

Various fixes for workspaces in MultiLayerNetwork and ComputationGraph (, , , , , )

Fix for incorrect condition in DuplicateToTimeSeriesVertex ()

Fix for getMemoryReport exception on some valid ComputationGraph networks ()

RecordReaderDataSetIterator when used with preprocessors could cause an exception under some circumstances ()

CnnToFeedForwardPreProcessor could silently reshape invalid input, as long as the input array length matches the expected length ()

ModelSerializer temporary files would not be deleted if JVM crashes; now are deleted immediately when no longer required ()

RecordReaderMultiDataSetIterator may not add mask arrays under some circumstances, when set to ALIGN_END mode ()

ConvolutionIterationListener previously produced an IndexOutOfBoundsException when all convolution layers are frozen ()

PrecisionRecallCurve.getPointAtRecall could return a point with a correct but sub-optimal precision when multiple points had identical recall ()

Setting dropout(0) on transfer learning FineTuneConfiguration did not remove dropout if present on existing layer ()

Under some rare circumstances, Spark evaluation could lead to a NullPointerException ()

ComputationGraph: disconnected vertices were not always detected in configuration validation ()

Activation layers would not always inherit the global activation function configuration ()

RNN evaluation memory optimization: when TBPTT is configured for training, also use TBPTT-style splitting for evaluation (identical result, less memory) (, )

PerformanceListener is now serializable ()

ScoreIterationListener and PerformanceListener now report model iteration, not "iterations since listener creation" ()

Precision/recall curves cached values in ROC class may not be updated after merging ROC instances ()

ROC merging after evaluating a large number of examples may produce IllegalStateException ()

Added checks for invalid input indices to EmbeddingLayer ()

Fixed possible NPE when loading legacy (pre-0.9.0) model configurations from JSON ()

Fixed issues with EvaluationCalibration HTML export chart rendering ()

Fixed possible incorrect rendering of UI/StatsStorage charts with J7FileStatsStorage when used with Spark training ()

MnistDataSetIterator would not always reliably detect and automatically fix/redownload on corrupted download data ()

MnistDataSetIterator / EmnistDataSetIterator: updated download location after hosting URL change (, )

Fixes to propagation of thread interruptions ()

MultiLayerNetwork/ComputationGraph will no longer throw an ND4JIllegalStateException during initialization if a network contains no parameters (, )

Fixes for TSNE posting of data to UI for visualization ()

PerformanceListener now throws a useful exception (in constructor) on invalid frequency argument, instead of runtime ArithmeticException ()

RecordReader(Multi)DataSetIterator now throws more useful exceptions when Writable values are non-numerical ()

UI: Fixed possible character encoding issues for non-English languages when internationalization data .txt files are read from uber JARs ()

UI: Fixed UI incorrectly trying to parse non-DL4J UI resources when loading I18N data ()

Various threading fixes ()

Evaluation: no-arg methods (f1(), precision(), etc) now return single class value for binary case instead of macro-averaged value; clarify values in stats() method and javadoc ()

Early stopping training: TrainingListener opEpochStart/End (etc) methods were not being called correctly ()

Fixes issue where dropout was not always applied to input of RNN layers ()

ModelSerializer: improved validation/exceptions when reading from invalid/empty/closed streams ()

Fixes for variable size inputs (variable length time series, variable size CNN inputs) when using batch mode ()

Fixed an issue where underlying model exceptions during the output method were not properly propagated back to the user ()

Fixed support for 'pre-batched' inputs (i.e., inputs where minibatch size is > 1) ()

Memory optimization for network weight initialization via in-place random ops ()

Fixes for CuDNN with SAME mode padding (, )

Fix for VariationalAutoencoder builder decoder layer size validation ()

Improved Kmeans throughput

Add RPForest to nearest neighbors

Default training workspace mode has been switched to SEPARATE from NONE for MultiLayerNetwork and ComputationGraph ()

Behaviour change: fit(DataSetIterator) and similar methods no longer perform layerwise pretraining followed by backprop - only backprop is performed in these methods. For pretraining, use pretrain(DataSetIterator) and pretrain(MultiDataSetIterator) methods ()

DataSetIterators in DL4J have been moved from deeplearning4j-nn module to new deeplearning4j-datasets, deeplearning4j-datavec-iterators and deeplearning4j-utility-iterators modules. Packages/imports are unchanged; deeplearning4j-core pulls these in as transitive dependencies hence no user changes should be required in most cases ()

RBM (Restricted Boltzmann Machine) layers have been removed entirely. Consider using VariationalAutoencoder layers as a replacement ()

Previously deprecated WordVectorSerializer methods have now been removed ()

Removed deeplearning4j-ui-remote-iterationlisteners module and obsolete RemoteConvolutionalIterationListener ()

Some issues have been noted with FP16 support on CUDA ()

In 0.9.1 deprecated Model and ModelConfiguration have been permanently removed. Use KerasModelImport instead, which is now the only entry point for Keras model import.

New DifferentialFunction api with automatic differentiation (see samediff section)

Apache Arrow serialization added supporting new tensor API

Add support for AVX/AVX2 and AVX-512 instruction sets for Windows/Linux for nd4j-native backend

Initial tech preview

Alpha release of auto-differentiation engine for ND4J.

Supports rudimentary import of TensorFlow and ONNX graphs for inference.

TFOpTests is a dedicated project for creating test resources for TensorFlow import.

Added ObjectDetectionRecordReader - for use with DL4J's Yolo2OutputLayer () (also supports image transforms: )

Added ImageObjectLabelProvider, VocLabelProvider and SvhnLabelProvider (Streetview house numbers) for use with ObjectDetectionRecordReader (, )

Added LocalTransformExecutor for single machine execution (without Spark dependency) ()

Added ArrowRecordReader (for reading Apache Arrow format data) ()

Added RecordMapper class for conversion between RecordReader and RecordWriter ()

RecordWriter and InputSplit APIs have been improved; more flexible and support for partitioning across all writers (, , )

Added ArrowWritableRecordBatch and NDArrayRecordBatch for efficient batch storage (List<List<Writable>>) (, )

Added BoxImageTransform - an ImageTransform that either crops or pads without changing aspect ratio ()

TransformProcess now has executeToSequence(List<Writable>), executeSequenceToSingle(List<List<Writable>>) and executeToSequenceBatch(List<List<Writable>>) methods (, )

Added CSVVariableSlidingWindowRecordReader ()

ImageRecordReader: supports regression use cases for labels (previously: only classification) ()

ImageRecordReader: supports multi-class and multi-label image classification (via PathMultiLabelGenerator interface) (, )

DataAnalysis/AnalyzeSpark now includes quantiles (via t-digest) ()

Added AndroidNativeImageLoader.asBitmap(), Java2DNativeImageLoader.asBufferedImage() ()

datavec-excel module and ExcelRecordReader ()

JacksonLineRecordReader ()

ConcatenatingRecordReader ()

TextToTermIndexSequenceTransform ()

ConditionalReplaceValueTransformWithDefault ()

GeographicMidpointReduction ()

StringToTimeTransform will now try to guess the time format if a format isn't provided ()

Improved performance for NativeImageLoader on Android ()

Added BytesWritable (Writable for byte[] data) ()

Added TransformProcess.inferCategories methods to auto-infer categories from a RecordReader ()

Lombok is no longer included as a transitive dependency ()

MapFileRecordReader and MapFileSequenceRecordReader can handle empty partitions/splits for multi-part map files ()

CSVRecordReader is now properly serializable using Java serialization () and Kryo serialization ()

Writables: equality semantics have been changed: for example, now DoubleWritable(1.0) is equal to IntWritable(1) ()

NumberedFileInputSplit now supports leading zeros ()

CSVSparkTransformServer and ImageSparkTransformServer Play servers changed to production mode ()

Fix for JSON subtype info for FloatMetaData ()

Serialization fixes for JacksonRecordReader, RegexSequenceRecordReader ()

Added RecordReader.resetSupported() method ()

SVMLightRecordReader now implements nextRecord() method ()

Fix for custom reductions when using conditions ()

SequenceLengthAnalysis is now serializable () and supports to/from JSON ()

Fixes for FFT functionality (, )

Remove use of backported java.util.functions; use ND4J functions API instead ()

Fix for transforms data quality analysis for time columns ()

Many of the util classes (in org.datavec.api.util mainly) have been deprecated or removed; use the equivalently named util classes in the nd4j-common module ()

RecordReader.next(int) method now returns List<List<Writable>> for batches, not List<Writable>. See also NDArrayRecordBatch

Workspace support added (, )

Added new layer spaces: LSTM, CenterLoss, Deconvolution2D, LossLayer, Bidirectional layer wrapper (, )

As per DL4J API changes: Updater configuration options (learning rate, momentum, epsilon, rho etc) have been moved to ParameterSpace instead. Updater spaces (AdamSpace, AdaGradSpace etc) introduced ()

As per DL4J API changes: Dropout configuration is now via ParameterSpace<IDropout>, DropoutSpace introduced ()

RBM layer spaces removed ()

ComputationGraphSpace: added layer/vertex methods with overloads for preprocessors ()

Added support to specify 'fixed' layers using DL4J layers directly (instead of using LayerSpaces, even for layers without hyperparameters) ()

Added LogUniformDistribution ()

Improvements to score functions; added ROC score function ()

Learning rate schedule support added ()

Add math ops for ParameterSpace<Double> and ParameterSpace<Integer> ()

Fix parallel job execution (when using multiple execution threads) (, )

Improved logging for failed task execution ()

Fix for UI JSON serialization ()

Fix threading issues when running on CUDA and multiple execution threads (, , )

Rename saved model file to model.bin ()

Fix threading issues with non thread-safe candidates / parameter spaces ()

Lombok is no longer included as a transitive dependency ()

First release of ScalNet, a Scala API which closely resembles Keras' API.

Supports both Keras-inspired Sequential models, corresponding to DL4J's MultiLayerNetwork, and Model, corresponding to ComputationGraph.

Deeplearning4J has a wealth of examples of how to use its many parts. You can find the examples in the examples repository.

The example repository consists of several separate Maven Java projects, each with their own pom files. Maven is a popular build automation tool for Java projects. The contents of a pom.xml file dictate the configurations; read more about how to configure Maven in the Maven documentation.

Users can also refer to the simple sample project provided to get started with a clean project from scratch.

Build tools are considered standard software engineering best practice. Besides this, the complexities posed by the projects in the DL4J ecosystem make dependencies too difficult to manage manually. All the projects in the DL4J ecosystem can also be used with other build tools like Gradle, SBT, etc. More information on that can be found elsewhere in the documentation.

This project contains a set of examples that demonstrate use of the high level DL4J API to build a variety of neural networks. Some of these examples are end to end, in the sense they start with raw data, process it and then build and train neural networks on it.

This project contains a set of examples that demonstrate how to import Keras h5 models and TensorFlow frozen pb models into the DL4J ecosystem. Once imported into DL4J these models can be treated like any other DL4J model - meaning you can continue to run training on them or modify them with the transfer learning API or simply run inference on them.

This project contains a set of examples that demonstrate how to do distributed training, inference and evaluation in DL4J on Apache Spark. DL4J distributed training employs a "hybrid" asynchronous SGD approach - further details can be found in the distributed deep learning documentation

This project contains a set of examples that demonstrate how to leverage multiple GPUs for data-parallel training of neural networks for increased performance.

This project contains a set of examples that demonstrate the SameDiff API. SameDiff (which is part of the ND4J library) can be used to build lower level auto-differentiating computation graphs. An analogue to the SameDiff API vs the DL4J API is the low level TensorFlow API vs the higher level of abstraction Keras API.

This project contains a set of examples that demonstrate how raw data in various formats can be loaded, split and preprocessed to build serializable (and hence reproducible) ETL pipelines.

This project contains a set of examples that demonstrate how to manipulate NDArrays. The functionality of ND4J demonstrated here can be likened to NumPy.

This project contains a set of examples that demonstrate usage of the Arbiter library for hyperparameter tuning of Deeplearning4J neural networks.

This project contains examples of using RL4J, the reinforcement learning library in DL4J.

This project contains an Android example project, that shows DL4J being used in an Android application.

While this set of examples doesn't cover all the features available in DL4J, the intent is to cover functionality required for most users - beginners and advanced. File an issue if you have feedback or feature requests that are not covered here. We are also available via our community forum for questions. We welcome contributions from the community. We love hearing from you. Cheers!

Compiling libnd4j on different cpu architectures ensures there is platform-optimized math in c++ for each platform. The libnd4j code base is a self-contained cmake project that can be run on different platforms. Each github actions workflow contains steps for deploying for each platform.

At the core of compiling libnd4j from source is a maven pom.xml that is run as part of the overall build process and invokes our build script with various parameters, which then get passed to our overall cmake structure for compilation. This script exists to formalize some of the required parameters for invoking cmake. Any developer is welcome to invoke cmake directly.

Each build of libnd4j links against an accelerated backend for BLAS and convolution operations, such as onednn, cudnn, or armcompute. The implementations for each platform can be found in the libnd4j source tree.

This is a step that just ensures that the dl4j release matches the current state of the dependencies provided by javacpp on maven central. This affects every module including python4j, nd4j-native/cuda, datavec-image, among others. The versions of everything can be found in the top-level deeplearning4j pom. The general convention is the library version followed by a - and the version of javacpp that that library version uses.

This step may also involve invoking tests with specific tags if only running a subset of tests is desired. This can be achieved using the -Dgroups flag.

The examples contain a set of tests which just allow us to run maven clean test on a small number of examples. Instead of picking examples manually, we can just run mvn clean test on any platform we need by specifying a version of dl4j to depend on and, usually, a staging repository.

Generally, sometimes users will raise issues right before a release that can be critical. It is at the sole discretion of the maintainers to ask the user to use snapshots or to wait for a follow-on version. For certain fixes, we will publish quick bugfix releases. If your team has specific requirements on a release, please contact us on the community forums.

This means that, after closing the staging repository, hitting the release button initiates a sync of the staging repository with the desired version to maven central. The sync usually takes 2 hours or less.

Note that there are multiple supported combinations of cuDNN and CUDA. Deeplearning4j's CUDA support is based on JavaCPP's cuda bindings. The way to read the versioning is: cuda version - cudnn version - javacpp version. For example, if the CUDA version is set to 11.2, you can expect us to support cuDNN 8.1.

Alternatively, in the case of the most recent supported CUDA version, cuDNN comes bundled with the "redist" package of the JavaCPP Presets for CUDA. After agreeing to the license, we can add the corresponding "redist" dependencies instead of installing CUDA and cuDNN.

Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try to reduce memory usage by using the NO_WORKSPACE mode settable via the network configuration, instead of the default of ConvolutionLayer.AlgoMode.PREFER_FASTEST, for example:
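A minimal sketch of what that looks like on a convolution layer (the kernel size and layer sizes below are illustrative, not taken from the original document):

import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;

// Trade some speed for lower cuDNN memory use on this convolution layer
ConvolutionLayer layer = new ConvolutionLayer.Builder(5, 5)
        .nIn(1)
        .nOut(20)
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
        .build();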

AVX (Advanced Vector Extensions) is a set of CPU instructions for accelerating numerical computations. See Wikipedia for more details.

As of 1.0.0-M1, additional combinations are also possible with onednn.

// this will limit frequency of gc calls to 5000 milliseconds
Nd4j.getMemoryManager().setAutoGcWindow(5000);

// OR you could totally disable it
Nd4j.getMemoryManager().togglePeriodicGc(false);
ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
      // DataSets prefetching options. Buffer size per worker.
      .prefetchBuffer(8)

      // set number of workers equal to number of GPUs.
      .workers(2)

      // rare averaging improves performance but might reduce model accuracy
      .averagingFrequency(5)

      // if set to TRUE, on every averaging model score will be reported
      .reportScoreAfterAveraging(false)

      // 3 options here: NONE, SINGLE, SEPARATE
      .workspaceMode(WorkspaceMode.SINGLE)

      .build();
Evaluation eval = new Evaluation(outputNum);
ROC roceval = new ROC(outputNum);
model.doEvaluation(iteratorTest, eval, roceval);
org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses leaked workspace pointer from workspace [LOOP_EXTERNAL]
For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic

Noise Layers

KerasGaussianNoise

Keras wrapper for DL4J dropout layer with GaussianNoise.

KerasGaussianNoise

public KerasGaussianNoise(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getGaussianNoiseLayer

public DropoutLayer getGaussianNoiseLayer()

Get DL4J DropoutLayer with Gaussian noise.

  • return DropoutLayer

KerasAlphaDropout

Keras wrapper for DL4J dropout layer with AlphaDropout.

KerasAlphaDropout

public KerasAlphaDropout(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getAlphaDropoutLayer

public DropoutLayer getAlphaDropoutLayer()

Get DL4J DropoutLayer with Alpha dropout.

  • return DropoutLayer

KerasGaussianDropout

Keras wrapper for DL4J dropout layer with GaussianDropout.

KerasGaussianDropout

public KerasGaussianDropout(Integer kerasVersion) throws UnsupportedKerasConfigurationException

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getGaussianDropoutLayer

public DropoutLayer getGaussianDropoutLayer()

Get DL4J DropoutLayer with Gaussian dropout.

  • return DropoutLayer

Normalization Layers

KerasBatchNormalization

Imports a BatchNormalization layer from Keras.

KerasBatchNormalization

Pass-through constructor from KerasLayer

  • param kerasVersion major keras version

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getBatchNormalizationLayer

Get DL4J BatchNormalization layer.

  • return BatchNormalization layer

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

  • return number of trainable parameters (4)

setWeights

Set weights for layer.

  • param weights Map from parameter name to INDArray.

Advanced Activations

KerasPReLU

Imports PReLU layer from Keras

KerasPReLU

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getPReLULayer

Get DL4J PReLULayer.

  • return PReLULayer

setWeights

Set weights for layer.

  • param weights Dense layer weights

KerasThresholdedReLU

Imports ThresholdedReLU layer from Keras

KerasThresholdedReLU

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getActivationLayer

Get DL4J ActivationLayer.

  • return ActivationLayer

KerasLeakyReLU

Imports LeakyReLU layer from Keras

KerasLeakyReLU

Constructor from parsed Keras layer configuration dictionary.

  • param layerConfig dictionary containing Keras layer configuration

  • throws InvalidKerasConfigurationException Invalid Keras config

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getActivationLayer

Get DL4J ActivationLayer.

  • return ActivationLayer

public KerasBatchNormalization(Integer kerasVersion) throws UnsupportedKerasConfigurationException
public BatchNormalization getBatchNormalizationLayer()
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public int getNumParams()
public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException
public KerasPReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public PReLULayer getPReLULayer()
public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException
public KerasThresholdedReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public ActivationLayer getActivationLayer()
public KerasLeakyReLU(Map<String, Object> layerConfig)
            throws InvalidKerasConfigurationException, UnsupportedKerasConfigurationException
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public ActivationLayer getActivationLayer()

Constraints

Supported Keras constraints.

  • max_norm

  • non_neg

  • unit_norm

  • min_max_norm

All Keras constraints are supported. Mapping Keras to DL4J constraints happens in KerasConstraintUtils.

Embedding Layers

KerasEmbedding

Imports an Embedding layer from Keras.

KerasEmbedding

Pass through constructor for unit tests

  • throws UnsupportedKerasConfigurationException Unsupported Keras config

getEmbeddingLayer

Get DL4J EmbeddingSequenceLayer.

  • return EmbeddingSequenceLayer

getOutputType

Get layer output type.

  • param inputType Array of InputTypes

  • return output type as InputType

  • throws InvalidKerasConfigurationException Invalid Keras config

getNumParams

Returns number of trainable parameters in layer.

  • return number of trainable parameters (1)

setWeights

Set weights for layer.

  • param weights Embedding layer weights

public KerasEmbedding() throws UnsupportedKerasConfigurationException
public EmbeddingSequenceLayer getEmbeddingLayer()
public InputType getOutputType(InputType... inputType) throws InvalidKerasConfigurationException
public int getNumParams()
public void setWeights(Map<String, INDArray> weights) throws InvalidKerasConfigurationException

Early Stopping

Terminate a training session given certain conditions.

What is early stopping?

When training neural networks, numerous decisions need to be made regarding the settings (hyperparameters) used, in order to obtain good performance. One such hyperparameter is the number of training epochs: that is, how many full passes of the data set (epochs) should be used? If we use too few epochs, we might underfit (i.e., not learn everything we can from the training data); if we use too many epochs, we might overfit (i.e., fit the 'noise' in the training data, and not the signal).

Early stopping attempts to remove the need to manually set this value. It can also be considered a type of regularization method (like L1/L2 weight decay and dropout) in that it can stop the network from overfitting.

The idea behind early stopping is relatively simple:

  • Split data into training and test sets

  • At the end of each epoch (or, every N epochs):

    • evaluate the network performance on the test set

    • if the network outperforms the previous best model: save a copy of the network at the current epoch

  • Take as our final model the model that has the best test set performance

This is shown graphically below:

The best model is the one saved at the time of the vertical dotted line - i.e., the model with the best accuracy on the test set.

Using DL4J's early stopping functionality requires you to provide a number of configuration options:

  • A score calculator, such as the DataSetLossCalculator for a MultiLayerNetwork or DataSetLossCalculatorCG for a ComputationGraph, used to calculate the score at every epoch (for example: the loss function value on a test set, or the accuracy on the test set)

  • How frequently we want to calculate the score function (default: every epoch)

  • One or more termination conditions, which tell the training process when to stop. There are two classes of termination conditions:

    • Epoch termination conditions: evaluated every N epochs

    • Iteration termination conditions: evaluated once per minibatch

  • A model saver, that defines how models are saved

An example, with an epoch termination condition of maximum of 30 epochs, a maximum of 20 minutes training time, calculating the score every epoch, and saving the intermediate results to disk:

MultiLayerConfiguration myNetworkConfiguration = ...;
DataSetIterator myTrainData = ...;
DataSetIterator myTestData = ...;

EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
        .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
        .scoreCalculator(new DataSetLossCalculator(myTestData, true))
        .evaluateEveryNEpochs(1)
        .modelSaver(new LocalFileModelSaver(directory))
        .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf,myNetworkConfiguration,myTrainData);

//Conduct early stopping training:
EarlyStoppingResult result = trainer.fit();

//Print out the results:
System.out.println("Termination reason: " + result.getTerminationReason());
System.out.println("Termination details: " + result.getTerminationDetails());
System.out.println("Total epochs: " + result.getTotalEpochs());
System.out.println("Best epoch number: " + result.getBestModelEpoch());
System.out.println("Score at best epoch: " + result.getBestModelScore());

//Get the best model:
MultiLayerNetwork bestModel = result.getBestModel();

You can also implement your own iteration and epoch termination conditions.
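For example, a sketch combining several of the built-in conditions (this assumes ScoreImprovementEpochTerminationCondition and InvalidScoreIterationTerminationCondition from the org.deeplearning4j.earlystopping.termination package are available in your version; myTestData and directory are the same placeholders as in the example above):

EarlyStoppingConfiguration esConf = new EarlyStoppingConfiguration.Builder()
        // stop after at most 100 epochs...
        .epochTerminationConditions(
                new MaxEpochsTerminationCondition(100),
                // ...or earlier, if the score has not improved for 5 consecutive epochs
                new ScoreImprovementEpochTerminationCondition(5))
        // also stop immediately if the score ever becomes NaN or infinite
        .iterationTerminationConditions(new InvalidScoreIterationTerminationCondition())
        .scoreCalculator(new DataSetLossCalculator(myTestData, true))
        .modelSaver(new LocalFileModelSaver(directory))
        .build();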

Early Stopping w/ Parallel Wrapper

The early stopping implementation described above will only work with a single device. However, EarlyStoppingParallelTrainer provides similar functionality as early stopping and allows you to optimize for either multiple CPUs or GPUs. EarlyStoppingParallelTrainer wraps your model in a ParallelWrapper class and performs localized distributed training.

Note that EarlyStoppingParallelTrainer doesn't support all of the functionality of its single-device counterpart. It is not UI-compatible and may not work with complex iteration listeners. This is due to how the model is distributed and copied in the background.

Evaluation

Tools and classes for evaluating neural network performance

Why evaluate?

When training or deploying a neural network, it is useful to know the accuracy of your model. In DL4J, the Evaluation class and its variants are available to evaluate your model's performance.

Evaluation for Classification

The Evaluation class is used to evaluate the performance of binary and multi-class classifiers (including time series classifiers). This section covers basic usage of the Evaluation class.

Given a dataset in the form of a DataSetIterator, the easiest way to perform evaluation is to use the built-in evaluate methods on MultiLayerNetwork and ComputationGraph:

DataSetIterator myTestData = ...
Evaluation eval = model.evaluate(myTestData);

However, evaluation can be performed on individual minibatches also. Here is an example taken from our dataexamples/CSVExample in the Examples project. The CSV example has CSV data for 3 classes of flowers and builds a simple feed forward neural network to classify the flowers based on 4 measurements.

Evaluation eval = new Evaluation(3);
INDArray output = model.output(testData.getFeatures());
eval.eval(testData.getLabels(), output);
log.info(eval.stats());

The first line creates an Evaluation object with 3 classes. The second line gets the model's predictions (the output) for our test dataset. The third line uses the eval method to compare the true labels from the test data with the predictions generated by the model. The fourth line logs the evaluation results to the console.

The output:

Examples labeled as 0 classified by model as 0: 24 times
Examples labeled as 1 classified by model as 1: 11 times
Examples labeled as 1 classified by model as 2: 1 times
Examples labeled as 2 classified by model as 2: 17 times


==========================Scores========================================
 # of classes:    3
 Accuracy:        0.9811
 Precision:       0.9815
 Recall:          0.9722
 F1 Score:        0.9760
Precision, recall & F1: macro-averaged (equally weighted avg. of 3 classes)
========================================================================

By default the .stats() method displays the confusion matrix entries (one per line), Accuracy, Precision, Recall and F1 Score. Additionally the Evaluation Class can also calculate and return the following values:

  • Confusion Matrix

  • False Positive/Negative Rate

  • True Positive/Negative

  • Class Counts

  • F-beta, G-measure, Matthews Correlation Coefficient and more - see the Evaluation JavaDoc
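These can also be queried programmatically once eval.eval(...) has been called; a brief sketch (class index 0 is illustrative):

double accuracy  = eval.accuracy();
double precision = eval.precision();           // macro-averaged across classes
double recall    = eval.recall();              // macro-averaged across classes
double f1        = eval.f1();                  // macro-averaged across classes
double fprClass0 = eval.falsePositiveRate(0);  // false positive rate for class 0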

Display the Confusion Matrix.

System.out.println(eval.confusionToString());

Displays

Predicted:         0      1      2
Actual:
0  0          |      16      0      0
1  1          |       0     19      0
2  2          |       0      0     18

Additionally, the confusion matrix can be accessed directly, and converted to CSV or HTML, using:

eval.getConfusionMatrix() ;
eval.getConfusionMatrix().toHTML();
eval.getConfusionMatrix().toCSV();

Evaluation for Regression

To evaluate a network performing regression, use the RegressionEvaluation class.

As with the Evaluation class, RegressionEvaluation on a DataSetIterator can be performed as follows:

DataSetIterator myTestData = ...
RegressionEvaluation eval = model.evaluateRegression(myTestData);

Here is a code snippet with a single column; in this case the neural network is predicting the age of shellfish based on measurements.

RegressionEvaluation eval =  new RegressionEvaluation(1);

Print the statistics for the Evaluation.

System.out.println(eval.stats());

Returns

Column    MSE            MAE            RMSE           RSE            R^2            
col_0     7.98925e+00    2.00648e+00    2.82653e+00    5.01481e-01    7.25783e-01

Columns are Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, Relative Squared Error, and the R^2 Coefficient of Determination. See the RegressionEvaluation JavaDoc.
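Putting the pieces together for a single output column, a minimal sketch (testData here is assumed to be a DataSet holding the features and true target values):

RegressionEvaluation eval = new RegressionEvaluation(1);
INDArray predictions = model.output(testData.getFeatures());
eval.eval(testData.getLabels(), predictions);    // compare true values with predictions
System.out.println(eval.stats());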

Performing Multiple Evaluations Simultaneously

When performing multiple types of evaluations (for example, Evaluation and ROC on the same network and dataset) it is more efficient to do this in one pass of the dataset, as follows:

DataSetIterator testData = ...
Evaluation eval = new Evaluation();
ROC roc = new ROC();
model.doEvaluation(testData, eval, roc);

For most users, it is simply sufficient to use the MultiLayerNetwork.evaluate(DataSetIterator) or MultiLayerNetwork.evaluateRegression(DataSetIterator) and similar methods. These methods will properly handle masking, if mask arrays are present.

Evaluation of Time Series

Time series evaluation is very similar to the above evaluation approaches. Evaluation in DL4J is performed on all (non-masked) time steps separately - for example, a time series of length 10 will contribute 10 predictions/labels to an Evaluation object. One difference with time series is the (optional) presence of mask arrays, which are used to mark some time steps as missing or not present. See Using RNNs - Masking for more details on masking.

Evaluation for Binary Classifiers

The EvaluationBinary is used for evaluating networks with binary classification outputs - these networks usually have Sigmoid activation functions and XENT loss functions. The typical classification metrics, such as accuracy, precision, recall, F1 score, etc. are calculated for each output. See the EvaluationBinary JavaDoc.

EvaluationBinary eval = new EvaluationBinary(size);    // size = number of binary output columns
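A short usage sketch (the output count and iterator name are illustrative):

// Network with 3 independent sigmoid outputs, evaluated over a test set iterator
EvaluationBinary eval = new EvaluationBinary(3);
model.doEvaluation(myTestData, eval);
System.out.println(eval.stats());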

ROC

ROC (Receiver Operating Characteristic) is another commonly used evaluation metric for the evaluation of classifiers. Three ROC variants exist in DL4J:

  • ROC - for single binary label (as a single column probability, or 2 column 'softmax' probability distribution).

  • ROCBinary - for multiple binary labels

  • ROCMultiClass - for evaluation of non-binary classifiers, using a "one vs. all" approach

These classes have the ability to calculate the area under the ROC curve (AUROC) and the area under the Precision-Recall curve (AUPRC), via the calculateAUC() and calculateAUCPR() methods. Furthermore, the ROC and Precision-Recall curves can be obtained using getRocCurve() and getPrecisionRecallCurve().

The ROC and Precision-Recall curves can be exported to HTML for viewing using: EvaluationTools.exportRocChartsToHtmlFile(ROC, File), which will export a HTML file with both ROC and P-R curves, that can be viewed in a browser.
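A brief sketch of computing AUROC and exporting the charts (the iterator and output file name are illustrative):

ROC roc = new ROC(0);                     // 0 bins: exact AUROC/AUPRC calculation
model.doEvaluation(myTestData, roc);
double auroc = roc.calculateAUC();        // area under the ROC curve
EvaluationTools.exportRocChartsToHtmlFile(roc, new File("roc.html"));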

Note that all three support two modes of operation/calculation

  • Thresholded (approximate AUROC/AUPRC calculation, no memory issues)

  • Exact (exact AUROC/AUPRC calculation, but can require large amount of memory with very large datasets - i.e., datasets with many millions of examples)

The number of bins for thresholded mode can be set using the constructors. Exact mode can be selected using the default constructor, new ROC(), or explicitly using new ROC(0). See also the ROCBinary JavaDoc, which is used to evaluate binary classifiers.

Evaluating Classifier Calibration

Deeplearning4j also has the EvaluationCalibration class, which is designed to analyze the calibration of a classifier. It provides a number of tools for this purpose:

  • Counts of the number of labels and predictions for each class

  • Reliability diagram (or reliability curve)

  • Residual plot (histogram)

  • Histograms of probabilities, including probabilities for each class separately

Evaluation of a classifier using EvaluationCalibration is performed in a similar manner to the other evaluation classes. The various plots/histograms can be exported to HTML for viewing using EvaluationTools.exportevaluationCalibrationToHtmlFile(EvaluationCalibration, File).
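A minimal sketch (the iterator and output file name are illustrative):

EvaluationCalibration ec = new EvaluationCalibration();
model.doEvaluation(myTestData, ec);
System.out.println(ec.stats());
EvaluationTools.exportevaluationCalibrationToHtmlFile(ec, new File("calibration.html"));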

Distributed Evaluation for Spark Networks

SparkDl4jMultiLayer and SparkComputationGraph both have similar methods for evaluation:

Evaluation eval = SparkDl4jMultiLayer.evaluate(JavaRDD<DataSet>);

//Multiple evaluations in one pass:
SparkDl4jMultiLayer.doEvaluation(JavaRDD<DataSet>, IEvaluation...);

Evaluation for Multi-task Networks

A multi-task network is a network that is trained to produce multiple outputs. For example, a network given audio samples can be trained to both predict the language spoken and the gender of the speaker. Multi-task configuration is briefly described here.

Available evaluations useful for multi-task networks:

  • ROCMultiClass - see the ROCMultiClass JavaDoc

  • ROCBinary - see the ROCBinary JavaDoc
