Release Notes

New changes in each release of Eclipse Deeplearning4j.

Version 1.0.0-beta7

Read the announcement at https://blog.konduit.ai/2020/05/14/deeplearning4j-1-0-0-beta7-released/ for the highlights of this release.

Deeplearning4j

Features and Enhancements

  • Added Keras model import support for tf.keras models Link, Link

    • Full inference and training support is available for ops/layers in the tf.keras namespace; inference only for general Tensorflow operations outside of the tf.keras namespace

    • Note also improvements to Keras import for reshape, permute, etc operations due to NHWC and NWC support in DL4J

  • DL4J now supports NHWC (channels last) data format for all CNN 2D layers, in addition to NCHW Link (see the configuration sketch following this list)

  • DL4J now supports NWC (channels last - [minibatch, sequence_length, size]) for all RNN and CNN 1D layers, in addition to NCW Link

  • Added Deconvolution3D layer Link

  • Keras import: added ReLU, ELU and Softmax advanced activation layers Link and Swish activation function Link

  • Added DL4J SameDiffLoss class (for easily-defined DL4J ILossFunction's via SameDiff) Link

  • Useful exceptions are now thrown when attempting to perform unsupported operations on FastText Link

  • Added MultiLayerNetwork.evaluate(MultiDataSetIterator) and .evaluateRegression(MultiDataSetIterator) methods Link, Link
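
As a rough configuration sketch of the new channels-last support (hedged: the dataFormat(CNN2DFormat.NHWC) builder method and CNN2DFormat enum names are taken from the beta7 API as we understand it - check the javadoc for your exact version):

import org.deeplearning4j.nn.conf.CNN2DFormat;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;

// Declare channels-last (NHWC) activations for a CNN 2D layer,
// i.e. inputs/outputs shaped [minibatch, height, width, channels]
ConvolutionLayer conv = new ConvolutionLayer.Builder(3, 3)
        .nIn(3)                          // input channels
        .nOut(16)
        .dataFormat(CNN2DFormat.NHWC)
        .build();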

Bug Fixes and Optimizations

  • Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff Link, Link

  • Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules) Link

    • Note: this is a breaking change for some class packages/imports. See this link for details on exact package changes

  • Deeplearning4j UI: Webjars versions locked down using dependency management to avoid check on each build Link

  • Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff Link

  • Refactored/merged modules dl4j-perf and dl4j-util into deeplearning4j-core Link

  • Fixed an issue with BertWordPieceTokenizer - potential StackOverflowError with certain inputs Link

  • Fixed an issue with GlobalPooling layer with masks of different datatype to the activations datatype Link

  • Fixed an issue with DL4JModelValidator for ComputationGraph Link

  • Fixed an issue where SameDiff layers in DL4J could throw an exception when used with transfer learning Link

  • Weight initialization for EmbeddingLayer and EmbeddingSequenceLayer now no longer depends on the vocabulary size (only the vector size) Link

  • Fixed an issue with Keras import with bidirectional layers + preprocessors Link

  • DL4J UI: added redirect from /train to /train/overview Link

  • Fixed an issue where RecordReaderDataSetIterator builder collectMetaData configuration was not being applied Link

  • Fixed an issue where MultiLayerNetwork evaluation was not passing metadata to the IEvaluation instances during evaluation Link, Link

  • Fixed an issue with Spark training SharedTrainingMaster when training with a ComputationGraph and MultiDataSets Link

  • Assorted fixes for edge cases for DL4J Keras import Link

  • deeplearning4j-nlp-korean will no longer be released for Scala 2.12 due to the required dependency only having a Scala 2.11 version available Link

  • Fix for ConvolutionalIterationListener for ComputationGraph Link

  • Fixed an issue where dataset and model zoo downloads could get stuck if the server fails to send any data (now: timeout + retry) Link

  • DL4J ModelSerializer no longer writes temporary files when restoring models from InputStream Link

  • Fixed issues with UIServer multi-session mode, and a potential shutdown race condition Link

  • Fixed an issue where TfidfVectorizer.vectorize() could throw a NPE when fit from LabelAwareIterator Link

ND4J/SameDiff

Features and Enhancements

  • SameDiff multi-threaded inference enhanced (and fixed) - a single SameDiff instance can now be used for inference safely and efficiently from multiple threads Link Link

  • cuDNN support added to SameDiff (automatically enabled for nd4j-cuda-10.x backend) Link

  • Added ND4J namespaces: Nd4j.cnn, Nd4j.rnn, Nd4j.image Link

  • Added new Image operations namespace operations:

    • rgbToHsv, hsvToRgb Link

    • rgbToYiq, yiqToRgb, rgbToYuv, yuvToRgb Link

    • imageResize Link

  • Added new Random operations namespace operations:

    • gamma, poisson, shuffle Link

  • Added new Math namespace operations:

    • clipByAvgNorm, embeddingLookup Link

    • mergeMaxIndex Link

  • Added new NN namespace operations

  • Added new CNN namespace operations

  • Added new linalg operations namespace

  • Added new RNN operation namespace operations:

    • lstmLayer (note old lstmLayer method renamed to lstmBlock) Link

    • gru Link

  • Added new Loss operations namespace - Nd4j.loss Link

  • Mapped operations for Tensorflow import:

    • HSVToRGB, RGBToHSV, Igamma, Igammac, RandomGamma, RandomPoisson, RandomPoissonV2, RandomShuffle Link

  • Added SameDiff ProfilingListener - writes op performance profiles in Chrome profiler format (load in chrome://tracing/) Link Link

  • Added SameDiff ProfileAnalyzer tool to compare profiles output from ProfilingListener (or Tensorflow) Link Link

  • SameDiff listener API: added frame and iteration information for listener methods Link Link

  • Added (non-backend-specific) method of accessing Nd4j environment: Nd4j.getEnvironment() method (environment info and low-level configuration options) Link Link

  • Improved memory limits/configuration support for libnd4j (c++) Link

  • Added pairwise (broadcastable) power backprop operation Link

  • Updated JavaCPP presets MKL version to 2020.0 from 2019.5 Link

  • Added DynamicCustomOp dargs - datatype arguments Link Link

    • Output datatype configuration for Range op Link, SequenceOp Link, ConfusionMatrix Link

  • Added tensormmul_bp op Link

  • OpenBLAS version upgraded to 0.3.8 Link

  • libnd4j (c++ codebase underlying DL4J, ND4J and SameDiff) refactored to be more easily embeddable in other C++ projects Link

  • ImagePreProcessingScaler now supports preprocessing of labels (for segmentation) Link

  • Additional datatypes now supported for nd4j-tensorflow TensorflowConversion Link

  • SameDiff operation namespaces (sd.math, sd.image, etc) are now code generated to ensure SameDiff and ND4J namespaces are identical (all operations included, same API) Link

  • Added ND4J ArchiveUtils.unzipFileTo(String, String, boolean logFiles) overload to enable/disable extracted file path logging Link

  • Added weight format configuration for following operations: conv1D, conv2D, conv3D, deconv2d, deconv3d, depthwiseConv2d, pointwiseConv2d, sconv2d Link

  • Added backprop operation implementations for mergemax, mergeadd, mergeavg operations Link

  • MKL version upgraded from 2020.0 to 2020.1; OpenCV upgraded from 4.2.0 to 4.3.0 Link

  • SameDiff: DifferentialFunctionFactory class removed in favor of namespace methods (sd.math, sd.linalg, etc) Link

  • Added lstmLayer_bp operation Link

  • Added gru_bp operation Link

  • linspace operation can now use both targs and arrays for start/end/size arguments Link

  • Assorted dependency updates - OpenBLAS (0.3.9), OpenCV (4.3.0), Leptonica (1.79.0) Link

  • Upgraded assorted dependency versions: javax.activation:activation (1.1 -> 1.1.1), stream analytics (2.7.0->2.9.8), Apache Spark (2.4.3->2.4.5), Jackson databind (2.10.1 -> 2.10.3), Vertx (3.8.3 -> 3.9.0) Link

  • Added nd4j-common-tests ResourceUtils.listClassPathFiles method Link

Bug Fixes and Optimizations

  • Updaters (Adam, AdaGrad, etc) optimized via C++ operations (significant training performance boost) for DL4J and SameDiff Link, Link

  • SameDiff - added CuDNN support Link

  • Some packages relocated to avoid split packages (that can be a problem for OSGi and Java 9 modules) Link

    • Note: this is a breaking change for some class packages/imports. See this link for details on exact package changes

  • Fixed some issues with Tensorflow import of FusedBatchNorm operation Link

  • Fixed an issue where the Roll operation did not match Tensorflow operation Link Link

  • Fixed an issue where ArchiveUtils could fail to create the top level destination directory when it does not exist Link

  • Fixed an issue where resize_bicubic operation did not match Tensorflow for some configuration values Link Link

  • Pad operation now supports long/int64 values for padding array Link Link

  • Fixed an issue where hashcode operation shape function wasn't always returning int64/long dtype Link

  • Fixed an issue with reshape operation on empty arrays with -1s Link Link

  • Improved performance on CUDA for concat operation Link and CPU/GPU Link

  • Improved performance for bias_add operation

    • On CPU for NHWC case Link

    • Generally Link

    • On CUDA for 2D case Link

  • Added MKLDNN (DNNL/OneDNN) support for depthwise_conv2d operation for DL4J and SameDiff Link

  • Fixed a small SameDiff execution issue for switch operation where the predicate is a constant Link

  • Fixed an issue with batchnorm operation when input arrays have unusual strides Link

  • Merged nd4j-buffer, nd4j-content modules into nd4j-api Link

  • Deleted deprecated nd4j-jackson module (remaining functionality available in nd4j-api) Link

  • Deleted unused/unmaintained nd4j-camel and nd4j-gson modules Link

  • Optimization for legacy random ops Link

  • Optimization for broadcast operations Link, Link, Link, Link, Link

  • Performance optimization for multiple operations: softmax, squeeze, expand_dims, tanh Link

  • Optimization for transpose/permute operations Link

  • Performance enhancement: MKLDNN matmul used for some mmul operation cases Link

  • Optimization for gather operation on CPU Link

  • Optimization for stack/unstack operations on CPU Link

  • Optimization for split operation (CPU and CUDA) Link Link

  • ND4J initialization no longer logs number of OpenMP BLAS threads for CUDA Link

  • Optimization: Fixed issues with auto-vectorization on multiple CPU operations Link

  • Optimization for reshape operation Link, Link

  • Fixed an issue where INDArray.hashCode() could cause an exception on some datatypes Link

  • Optimization for CPU: MKLDNN is now used for softmax, tanh, softmax_bp and tanh_bp operations Link, Link, Link, Link

  • Fixed random_exponential operation Link

  • Improved performance on C++ SameDiff graph execution via reduced array zeroing where safe to do so Link

  • Improved C++ indexing implementation impacting CPU performance on some operations Link

  • Fixed an issue where Split operation could have incorrect output shapes for empty arrays Link

  • Fixed some issues with SameDiff.equals method Link

  • Fixed an issue with reshape operation output shape on empty arrays Link, Link

  • Nd4j.gemm now uses Mmul operation internally to avoid potential threading issues with direct BLAS calls on CUDA Link

  • Fixed an edge case issue with percentile operation Link

  • Fixed an edge case issue for cuSolver (CUDA) in libnd4j Link

  • Fixed an issue with error formatting for segment operations for incorrect lengths Link

  • Fixed an issue where ND4J workspaces were not guaranteed to be unique Link

  • Fixed some operation implementations when operating on views (Batch/Space to Space/Batch/Depth; batchnorm_bp) Link

  • Fixed an issue where exponential distribution random number generation operation could produce infinities extremely rarely (~1 in 10^9 values) Link

  • Fixed an issue with long file paths for memory mapped workspaces on Windows Link

  • Memory for memory-mapped workspaces is now deallocated immediately when the workspace is destroyed, instead of waiting for GC to free the memory Link

  • Fall-back to other BLAS implementation for cases where MKLDNN GEMM implementation is slow Link

  • Set nd4j-native source/target to Java 7 Link, Link

DataVec

Features and Enhancements

  • datavec-python: added zero-copy support for bytes/byte buffers Link

  • datavec-python: Python exceptions are now thrown as Java exceptions Link

  • datavec-python: Added support for additional NumPy datatypes Link

  • datavec-python: Python version upgraded from 3.7.6 to 3.7.7 Link

Bug Fixes and Optimizations

  • Deleted modules that were not properly maintained: datavec-camel, datavec-perf Link

  • Fixed missing BOOL datatype support for arrow conversion functionality Link

  • Assorted fixes for datavec-python Link Link, Link

  • Fixed an issue with LineRecordReader where initialization was performed unnecessarily (adding performance overhead) Link

RL4J

Features and Enhancements

  • Refactoring to decouple configuration and learning methods from their implementations Link

  • Added builder patterns for all configuration classes Link

Arbiter

Bug Fixes and Optimizations

  • Fixed an issue with GridSearchCandidateGenerator not working correctly for some cases Link, Link

Version 1.0.0-beta6

Highlights - 1.0.0-beta6 Release

  • Added support for CUDA 10.2. 1.0.0-beta6 released with CUDA 9.2, 10.0, 10.1 and 10.2 support

  • SameDiff optimizations - memory use for inference and training significantly reduced, with some performance improvements also

  • Deeplearning4j UI - Play framework replaced with Vertx; the deeplearning4j-ui dependency no longer has a Scala dependency or Scala version suffix Link

    • Note: No API changes, only artifact ID change: replace deeplearning4j-ui_2.1x with deeplearning4j-ui

  • ND4J namespace operation methods: operations are available through the Nd4j.math, Nd4j.random, Nd4j.bitwise and Nd4j.nn (neural network) namespaces - for example Nd4j.math.abs(INDArray), Nd4j.random.logNormal, etc. Link (see the example following this list)

    • Note that the ND4J namespaces API will have additions (new namespaces and methods), and may have some API changes, in the next release

  • OpenMP replaced with a thread pool C++ parallelism framework, enabling C++-level parallelism for operations on platforms without OpenMP support
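
A minimal example of the namespace-style op access quoted above (following the Nd4j.math.abs(INDArray) form given in these notes):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray x = Nd4j.createFromArray(-1.0, 2.0, -3.0);
INDArray y = Nd4j.math.abs(x);   // namespace-based op access
System.out.println(y);           // [1.0000, 2.0000, 3.0000]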

Deeplearning4J

Deeplearning4J: Features and Enhancements

  • DNNL (MKL-DNN) upgraded to version 1.1

  • Added causal convolution mode for Convolution1D layer (ConvolutionMode.Causal) and added causal conv1d support for Keras import Link (see the sketch following this list)

  • Keras import now supports scaled identity weight initialization Link

  • Added Mish activation function Link, Link

  • BertIterator now has a BertIterator.featurizeSentences(List<String>) method for inference Link, Link

  • BertIterator now supports sentence pairs for supervised training Link

  • Added Sparse multi-class cross entropy for both Deeplearning4j and Keras import Link, Link

  • Deeplearning4j UI: migrated from Play to Vertx for web serving backend, also removing dependency on Scala libraries; no API changes, only artifact ID change - replace deeplearning4j-ui_2.1x with deeplearning4j-ui Link, Link

  • Added TimeDistributed wrapper layer Link
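
A minimal sketch of the new causal convolution mode (assuming the standard Convolution1DLayer builder methods; only ConvolutionMode.Causal is new in this release):

import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.layers.Convolution1DLayer;

// Causal 1D convolution: the output at time step t depends only on inputs at time steps <= t
Convolution1DLayer causalConv = new Convolution1DLayer.Builder()
        .kernelSize(3)
        .nIn(8)
        .nOut(16)
        .convolutionMode(ConvolutionMode.Causal)
        .build();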

Deeplearning4J: Bug Fixes and Optimizations

  • KDTree implementation optimized Link

  • Deeplearning4j zoo models and datasets hosting location updated Link

  • Fixed nIn validation for Deconv2D layer Link

  • Fixed an issue with incorrect Deconvolution2d results for Keras import models Link

  • Added DNNL/MKLDNN support for batch normalization layer Link, Link

  • Fixed various integer casts to avoid overflows for very large arrays (with dimensions or length > Integer.MAX_VALUE) Link

  • Fixed an issue with UNet non-pretrained model architecture (last layer kernel size) Link

  • Deeplearning4j SameDiff layers now use DL4J workspaces for better performance and reduced memory consumption Link

  • Updated broken links in a few error messages Link

  • Cleaned up a few unused dependencies in various modules Link

  • Cleaned up duplicate SamplingDataSetIterator class Link

  • Fixed an issue where ComputationGraph instances with a single input going into multiple embedding layers could throw a NPE Link

  • Fixed an issue where loss function weights were not automatically cast to the network datatype, resulting in an exception if they were not already the correct type Link

  • Shaded Jackson version upgraded from 2.9.9/2.9.9.3 to 2.10.1 Link

  • Fixed an issue with KNN where getMostPopulatedClusters actually returned the least populated clusters Link

Deeplearning4j: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

  • Deeplearning4j UI artifact ID has changed: replace deeplearning4j-ui_2.1x (beta5 and earlier) with deeplearning4j-ui

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

  • Added support for CUDA 10.2 Link

  • DNNL (MKL-DNN) upgraded to version 1.1 Link

  • Added ND4j namespaces to match SameDiff: Nd4j.math, Nd4j.random, Nd4j.bitwise, Nd4j.nn (neural network) Link

  • Added SameDiff.calculateGradientsAndOutputs method Link Link

  • Additional SameDiff single batch .output method overloads for DataSet/MultiDataSet added Link

  • TensorFlow import ops coverage enhanced (significant number of additional ops supported) Link, Link, Link, Link, Link

  • PRelu op added Link

  • adjust_contrast, igamma and igammac ops added Link

  • ND4J/SameDiff: BitCast, CompareAndBitpack, DivideNoNan, DrawBoundingBoxes, FakeQuantWithMinMaxVarsPerChannel ops added Link

  • non_max_suppression_overlaps op added Link

  • ImagePreProcessingScaler now supports segmentation use cases Link

  • concat operation now supports the concatenation axis being specified via the last input array Link

  • Added Gamma and Poisson RNG distributions Link

  • SameDiff’s use of DeviceLocal for variables/constants etc is now configurable Link

  • Uniform distribution op now supports random integer generation, not just random floating point generation Link

  • SameDiff: Added simple OpBenchmarkListener for benchmarking purposes Link

  • Added the ability to disable platform helpers (DNNL/MKLDNN etc) via Nd4jCPU.Environment.getInstance().allowHelpers(false); and Nd4jCuda.Environment.getInstance().allowHelpers(false); Link

  • Added draw_bounding_boxes operation Link

  • Added resize_bicubic operation Link

  • Added causal padding mode to conv1d operation Link

  • DNNL (MKLDNN) is included and enabled by default for non-AVX builds Link

  • Added SameDiff ArraySavingListener for debugging purposes Link

ND4J/SameDiff: Bug Fixes and Optimizations

  • OpenMP replaced with ThreadPool abstraction, enables parallelism for platforms without OpenMP support Link

  • SameDiff memory management overhauled for (in some cases significantly) reduced memory consumption and improved performance Link, Link

  • Switched to Clang instead of gcc for OSX compilation to avoid compiler-related issues Link

  • Removed SameDiff.outputs() “best guess” output inference due to being unreliable, in favor of explicit SameDiff.setOutputs(String...) call Link

  • Fixed an issue with Nd4j.hstack on 1D arrays Link

  • SameDiff no longer allows empty arrays for variables Link

  • Fixed an issue with Nadam updater LR schedules not being cloned Link

  • Cleaned up IActivation interface Link

  • Added new LSTM op implementation with DNNL/MKLDNN support (forward pass only so far) Link

  • SameDiff API cleaned up; deprecated methods removed Link

  • Switched SameDiff variable initialization to non-lazy, to avoid unexpected behaviour when mixing execution and ND4J RNG seed setting Link

  • SameDiff.zero and .one methods now create constants, not variables Link

  • Moved CUDA build version and device logging from C++ stdout to Java logging, to enable disabling the logging (via ND4J config or slf4j config) Link

  • Added DNNL/MKLDNN support for batch normalization Link

  • SameDiff: Fixed an issue where listeners weren’t being called for gradient calculation Link

  • Added DNNL/MKLDNN support for deconv2d/3d operations Link

  • Fixed an issue with biasadd_bp operation and NHWC data format Link

  • Fixed an issue with certain strided slice backprop configurations Link, Link

  • Fixed an issue with LogSumExp reduction operation backprop for along dimension case Link, Link

  • INDArray.toString() now has correct brackets for rank 1+ scalars to avoid ambiguity Link

  • Fixed an issue where some ND4J methods could fail when the library is compiled on Java 9+ but run on Java 8 Link

  • Fixed empty array input case for is_strictly_increasing, non_decreasing and non_max_suppression ops Link, Link

  • Fixed empty input arrays for legacy ops (transform, scalar, pairwise, broadcast) Link

  • CUDA compute capability 3.0 is supported again Link

  • Improved performance for Scatter operations (1D case) + index validation Link

  • Fixed an issue where SameDiff TrainingConfig serialization would fail if evaluation instances are set Link, Link

  • SameDiff execution will now throw an exception when assertion operations in the graph fail Link

  • PolyGamma function now returns NaNs when passed double for args requiring integer values Link

  • Fixed some issues for pad and mirror_pad ops to ensure they conform with Tensorflow for imported networks Link

  • Updated and fixed some issues for TensorFlow graph runner Link

  • Improved performance for Reverse operation Link

  • Removed/cleaned up unused ND4J list functionality Link

  • Fixed reduce bool operation results (such as any, all, IsInf, etc) for empty array inputs Link

ND4J: Transition Guide, 1.0.0-beta5 to 1.0.0-beta6

  • SameDiff.outputs() now requires user to call SameDiff.setOutputs(String...) first; previous “best guess” output inference was unreliable Link

  • SameDiff.zero and .one methods now create constants, not variables Link

DataVec

DataVec: Bug Fixes and Optimizations

  • NativeImageLoader now checks for empty input streams and throws an exception instead of crashing Link

  • NDArrayScalarOpTransform now supports modulus operator Link

RL4J

RL4J: Features and Enhancements

  • Added AsyncTrainingListener Link

  • Replaced multiple uses of java.util.Random with ND4J Random Link

  • Added Observable and LegacyMDPWrapper Link

RL4J: Bug Fixes and Optimizations

  • Refactored RL4J video recording to separate VideoRecorder class Link

  • Fixed an issue with target for DQN Link, Link

  • Refactoring for DQN and double DQN for improved maintainability Link

  • Internal refactoring and various bug fixes Link

PyDataVec

PyDataVec: Features and Enhancements

  • PyDataVec TransformProcess now supports non-inplace operations Link

PyDataVec: Bug Fixes and Optimizations

  • Fixed various issues with PyDataVec Link

  • Fixed an issue with data locality that could cause incorrect results under some circumstances when running on CUDA Link

Version 1.0.0-beta5

Highlights - 1.0.0-beta5 Release

  • Added model server - remote inference of SameDiff and DL4J models using JSON or (optionally) binary serialization

  • Added Scala 2.12 support, dropped Scala 2.10 support. Modules with Scala dependencies are now released with Scala 2.11 and 2.12 versions

  • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: the Spark version suffix has been dropped - for upgrading: 1.0.0-beta4_spark2 -> 1.0.0-beta5

  • Added FastText support to deeplearning4j-nlp

  • CUDA support for all ND4J/SameDiff Operations

    • In 1.0.0-beta4, some operations were CPU only. Now, all operations have full CUDA support

  • Added support for new data types in ND4J (and DL4J/SameDiff): BFLOAT16, UINT16, UINT32, UINT64

  • ND4J: Implicit broadcasting support added to INDArray (already present in SameDiff - for example shape [3,1]+[3,2]=[3,2]) (see the example following this list)

  • CUDA 9.2, 10.0 and 10.1-Update2 still supported

    • NOTE: For CUDA 10.1, CUDA 10.1 update 2 is recommended. CUDA 10.1 and 10.1 Update 1 will still run, but rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems

  • Dependency upgrades: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, and shaded to avoid dependency clashes)

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer
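
For example, implicit INDArray broadcasting now behaves as follows (a minimal sketch of the [3,1] + [3,2] case mentioned above):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray col = Nd4j.createFromArray(new double[][]{{1}, {2}, {3}});                  // shape [3, 1]
INDArray mat = Nd4j.createFromArray(new double[][]{{10, 20}, {30, 40}, {50, 60}});   // shape [3, 2]
INDArray sum = col.add(mat);   // column is broadcast along the second dimension -> shape [3, 2]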

Deeplearning4J

Deeplearning4J: Features and Enhancements

  • Added FastText - inference and training, including OOV (out of vocabulary) support (Link)

  • Scala 2.12 support added, Scala 2.10 support dropped (Link)

  • Added model server (DL4J and SameDiff models, JSON and binary communication) - JsonModelServer, JsonRemoteInference, Link, Link

  • Added saved model format validation utilities - DL4JModelValidator, DL4JKerasModelValidator (Link)

  • Added LabelLastTimeStepPreProcessor (Link)

  • BertIterator: added option to prepend token to the output (such as [cls] expected by some models) (Link)

  • Added trace level logging to MultiLayerNetwork and ComputationGraph to assist with debugging certain issues (Link)

  • Upsampling3D: Added NDHWC support (Link)

  • MergeVertex now supports broadcasting (Link)

  • LSTM and Dropout will now fall back on built-in implementations if an exception is encountered from cuDNN (same as Subsampling/ConvolutionLayer) (Link)

  • Improved JavaDoc and cleaned up the API for WordVectorSerializer (Link, Link)

Deeplearning4J: Bug Fixes and Optimizations

  • Updated deeplearning4j-ui theme (Link)

  • Fixed an issue with MergeVertex and CNN3D activations (Link)

  • Fixed typo in Yolo2OutputLayer builder/configuration method name (Link)

  • Improved ComputationGraph builder InputType validation (Link)

  • Removed dl4j-spark-ml module until it can be properly maintained (Link)

  • Fixed an issue with BertWordPieceTokenizerFactory and bad character encoding (Link)

  • Fixed an issue with LearnedSelfAttentionLayer and variable minibatch size (Link, Link)

  • Fixed issue with SharedTrainingMaster controller address when set from environment variable (Link)

  • Fixed issue with SameDiffOutputLayer initialization under some circumstances (Link)

  • https is now used by default for data and zoo model downloads (Link, Link)

  • Fixed an issue where UI WebJars dependencies would check for updates on every single build (Link, Link)

  • Fixed issue where Upsampling layer memory report could produce an OOM exception (Link)

  • Improved UX/validation for RecordReaderDataSetIterator (Link)

  • Fixed an issue where EmbeddingSequenceLayer would not check mask array datatype (Link)

  • Improved validation when initializing networks with a non rank-2 (shape [1, numParams]) array (Link)

  • Fixed a DataType issue for BertIterator (Link)

  • Fixed Word2Vec model backward compatibility (beta3 and earlier models now loadable again) Link

  • Fixed issue where some Keras import models could fail with Could not read abnormally long HDF5 attribute (Link)

  • Added validation for RnnOutputLayer - feature/label array lengths (Link)

  • Fixed an issue where SameDiffOutputLayer would not support variable minibatch size (Link)

  • Fixed DL4J SameDiff layer mask support (Link)

  • DL4J UI: Fixed an issue where tab switching did not work when visualizing saved/stored data (Link, Link)

  • DL4J UI: Fixed a rare UI threading issue (Link)

  • Fixed a Keras import issue with JSON format change (Link)

  • Fixed a Keras import issue where updater learning rate schedule could be imported incorrectly (Link)

  • Fixed an issue with CnnSentenceDataSetIterator when using UnknownWordHandling.UseUnknownVector (Link, Link)

  • Fixes and optimizations to DL4J SameDiff layers (Link)

  • MultiLayerNetwork/ComputationGraph will now log the original exception if a second exception occurs during workspace closing, instead of swallowing it (inference/fit operation try/finally blocks) (Link)

  • Upgraded dependencies: Jackson (2.5.1 to 2.9.9/2.9.9.3), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava: (20.0 to 28.0-jre, shaded to avoid dependency clashes) (Link)

  • Logging framework can now be configured for DL4J UI (due to Play framework dependency upgrade) (Link)

  • Reduced amount of garbage produced by MnistDataFetcher (impacts MNIST and EMNIST DataSetIterators) (Link)

  • Activation function backpropagation has been optimized for many activation functions (Link, Link)

Deeplearning4j: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

  • DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J, use org.nd4j.linalg.dataset.Async(Multi)DataSetIterator instead

  • Saved models with custom layers from 1.0.0-alpha and before can no longer be loaded. Workaround: load in 1.0.0-beta4, and re-save the model (Link). Models without custom layers can still be loaded back to 0.5.0

  • Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: Spark version suffix dropped: For upgrading, change versions as follows: 1.0.0-beta4_spark2 -> 1.0.0-beta5

  • Scala 2.10 dropped, Scala 2.12 added (for modules with Scala dependencies)

Deeplearning4j: 1.0.0-beta5 Known Issues

  • dl4j-spark_2.11 and _2.12 dependencies incorrectly pull in datavec-spark_2.11/2.12 version 1.0.0-SNAPSHOT. Workaround: control version using dependency management as per here or here

  • Some layers (such as LSTM) may run slower on 1.0.0-beta5 than 1.0.0-beta4 on CUDA when not using cuDNN, due to added synchronization. This synchronization will be removed in the next release after 1.0.0-beta5

  • CUDA 10.1: Rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems, when running CUDA 10.1 Update 1 (and maybe 10.1). CUDA 10.1 update 2 is recommended.

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

  • Added new data types: BFLOAT16, UINT16, UINT32, UINT64 (Link)

  • CUDA support for all operations without CUDA implementations (Link, Link, Link, Link, Link)

  • Added model server (DL4J and SameDiff models, JSON and binary communication) - JsonModelServer, JsonRemoteInference, Link, Link

  • Added support for empty arrays with zeros in shape, for compatibility with TensorFlow import (Link)

  • CUDA: now host (RAM) buffers are only allocated when required (previously: host buffers were always allocated), in addition to device (GPU) buffer

  • Improved SameDiff training API - added "in line" test set evaluation, returning History object with loss curve, etc (Link)

  • Added saved model format validation utilities - Nd4jValidator, Nd4jCommonValidator (Link)

  • Added SameDiff ScoreListener (equivalent to DL4J ScoreIterationListener/PerformanceListener) (Link, Link)

  • Added SameDiff.convertDataTypes method, for variable dtype conversion (Link)

  • Added crop and resize op (Link)

  • DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J Link

  • Added basic/MVP SameDiff UI listener (Link)

  • Added SameDiff CheckpointListener (Link, Link)

  • Added SameDiff name scopes (Link)

  • SameDiff: Updater state and training configuration is now written to FlatBuffers format (Link)

  • Added c++ benchmark suite callable from Java - call using Nd4j.getExecutioner().runLightBenchmarkSuit() and Nd4j.getExecutioner().runFullBenchmarkSuit() (Link)

  • Added SameDiff.save/load methods with InputStream/OutputStream arguments (Link, Link)

  • Added axis configuration for evaluation instances (Evaluation, RegressionEvaluation, ROC, etc - getAxis and setAxis methods) to allow different data formats (NCHW vs. NHWC for CNNs, for example) (Link)

  • SameDiff: Added support to convert constants to placeholders, via SDVariable.convertToConstant() method (Link)

  • SameDiff: Added GradCheckUtil.checkActivationGradients method to check activation gradients for SameDiff instance (not just parameter gradients as in existing gradient check methods) (Link)

  • Added CheckNumerics op (Link)

  • Added FakeQuantWithMinMaxArgs and FakeQuantWithMinMaxVars ops (Link)

  • Added INDArray reduction methods with "keep dimensions" option - for example, INDArray.mean(boolean, int... dimension) (Link)

  • Added Nd4j SystemInfo class - SystemInfo.getSystemInfo, .writeSystemInfo(File) to aid with debugging issues (Link, Link)

  • Added INDArray.toString(NDArrayStrings options), toStringFull() and toString overloads for easier control of array printing (Link)

  • Added HashCode op, INDArray.hashCode() (Link)

  • SameDiff: added whileLoop, ifCond methods for loops/conditional ops (Link)

  • Cleaned up some infrequently used Nd4j methods (Link, Link, Link, Link)

  • Added bitwise integer operations: left/right bit shift, left/right cyclical bit shift, bitwise Hamming distance (Link, Link, Link, Link, Link)

  • deeplearning4j-nlp: renamed AggregatingSentencePreProcessor to sentencePreProcessor method (Link)

  • Upgraded (and shaded) Protobuf version - 3.5.1 to 3.8.0 (Link)

  • Switched to C-style error handling for libnd4j native operations (Link)

  • Renamed FlatBuffers enum org.nd4j.graph.DataType to org.nd4j.graph.DType to avoid users importing incorrect type when using Nd4j methods (Link, Link)

  • Added SameDiff.bitwise namespace for bitwise ops (Link, Link)

ND4J/SameDiff: Bug Fixes and Optimizations

  • Updated to JavaCPP/JavaCV 1.5.1-1 (Link)

  • SameDiff: Placeholders must now only be provided if required to calculate the requested variables (Link)

  • SameDiff: Fixed an issue with duplicate variable name validation (Link)

  • SameDiff: Fixed an issue with SDVariable.getArr for scalars (Link)

  • Added delayed mode to DeviceLocalNDArray (don't replicate to device until needed) (Link)

  • ND4J: Fixed an issue with writing 0d (scalar) NDArrays in numpy .npy format (Link)

  • Fixed an issue with Pad operation for some constant cases (Link)

  • Fixed some issues with strided_slice operation (Link, Link, Link)

  • SameDiff: Fixed issue with DataType inference for some ops using ND4J default datatype (Link)

  • INDArray.castTo(DataType) is now a no-op when array is already the correct type (Link)

  • SameDiff: Fixed an issue with training mixed precision networks (Link)

  • Fixed an issue where Evaluation class was incorrectly reporting macro-averaged precision for binary case (Link)

  • Removed trainableParams config/field from SameDiff TrainingConfig (no longer required) (Link)

  • Improvements and cleanup to ND4J Javadoc (Link, Link, Link, Link)

  • Fixed an issue with Cholesky Lapack op on CUDA (Link, Link)

  • Fixed an issue where [1,N] and [N,1] arrays were not considered a matrix (rank 2 array) according to INDArray.isMatrix() (Link)

  • Fixed RegressionEvaluation for 4D arrays (CNNs / segmentation) (Link, Link)

  • Fixed issue with INDArray.median(int... dimension) (Link)

  • Fixed NPE that could occur when executing gather operation backprop (Link)

  • Fixed issue with LogSumExp operation Java/C++ mapping (Link)

  • Added header validation when reading Numpy .npy files, to ensure file is valid (Link)

  • Fixed a possible issue with reading Numpy .npy files on CUDA (Link)

  • Fixed an issue when reading Numpy .npy boolean files (Link)

  • Various fixes for TensorFlow import (Link)

  • Fixed an issue with a small number of Nd4j.create methods not creating arrays corresponding to the java primitive (Link)

  • Improved shape validation for some Nd4j.create methods (Link)

  • Cleaned up unmaintained Nd4j.createSparse methods (Link)

  • Fixed a CUDA issue for CUDA GPUs with CC 3.0 (Link)

  • Fixed some possible integer overflows in c++ code (Link)

  • Removed deprecated methods: Nd4j.trueScalar and Nd4j.trueVector (Link, Link)

  • Fixed an issue where some JVMs could warn about "Illegal reflective access" due to a (now removed) SameDiff dependency (Link)

  • SDVariable now no longer extends DifferentialFunction (Link)

  • Moved numerous operation calculateOutputShape instances from Java to C++ (Link)

  • Fixed an issue where maxpool2d_bp could throw an exception when NaN values are present (Link)

  • Fixed an issue with concatenation of empty shapes (with zeros) (Link)

  • Removed INDArray.javaTensorAlongDimension (Link)

  • LayerNorm operation now properly supports axis arg, NCHW format data (Link)

  • libnd4j: cuBLAS hgemm (FP16 gemm) will only be called for devices with compute capability >= 5.3 due to cuBLAS limitations (Link)

  • Nd4j.readNumpy optimized (Link)

  • Added configurable alpha parameter to ELU and lrelu_bp operations in c++ (Link)

  • Cleaned up SameDiff SDCNN/SDRNN (SameDiff.cnn, .rnn) API/methods (Link, Link)

ND4J: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

  • OldAddOp, OldSubOp, etc removed: Replace with AddOp, SubOp, etc

  • Nd4j.trueScalar and trueVector removed; use Nd4j.scalar and Nd4j.createFromArray methods

  • INDArray.javaTensorAlongDimension removed; use INDArray.tensorAlongDimension instead

  • INDArray.lengthLong() removed; use INDArray.length() instead

ND4J: 1.0.0-beta5 Known Issues

  • nd4j-native on some OSX systems can fail with Symbol not found: ___emutls_get_address - See this link

  • SBT 1.3.0 can fail with an Illegal character in path error; SBT 1.2.8 is OK. This is an SBT issue, not an ND4J issue. See this link for details

DataVec

DataVec: Features and Enhancements

  • ImageRecordReader: Support for 16-bit TIFF added (Link)

  • Added SequenceTrimToLengthTransform (Link)

DataVec: Bug Fixes and Optimizations

  • Fixed an issue with AnalyzeSpark and String columns (Link)

  • Fixed an issue with URL scheme detection in NumberedFileInputSplit (Link)

  • Fixed an issue with RandomPathFilter sampling being biased (Link, Link)

RL4J

RL4J: Features and Enhancements

RL4J: Bug Fixes and Optimizations

  • Fixed issue with compression for HistoryProcessor (Link)

Arbiter

Bug Fixes and Optimizations

  • Updated EvaluationScoreFunction to use ND4J Evaluation class metrics (Link)

  • Fixed incorrect search size in GridSearchCandidateGenerator (Link)

Arbiter: Known Issues

  • The Jackson version upgrade necessitated a change to how generic object serialization was performed; Arbiter JSON data stored in 1.0.0-beta4 or earlier format may not be readable in 1.0.0-beta5 (Link)

ND4S

ND4S: Features and Enhancements

  • Added full data type support to ND4S as per ND4J (Link)

  • Added syntactic sugar for SameDiff (implicits, operator overloads) (Link)

Version 1.0.0-beta4

Highlights - 1.0.0-beta4 Release

Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-Dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following datatypes are supported:

  • DOUBLE: double precision floating point, 64-bit (8 byte)

  • FLOAT: single precision floating point, 32-bit (4 byte)

  • HALF: half precision floating point, 16-bit (2 byte), "FP16"

  • LONG: long signed integer, 64 bit (8 byte)

  • INT: signed integer, 32 bit (4 byte)

  • SHORT: signed short integer, 16 bit (2 byte)

  • UBYTE: unsigned byte, 8 bit (1 byte), 0 to 255

  • BYTE: signed byte, 8 bit (1 byte), -128 to 127

  • BOOL: boolean type, (0/1, true/false). Uses ubyte storage for easier op parallelization

  • UTF8: String array type, UTF8 format

ND4J Behaviour changes of note:

  • When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)

    • For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray

    • Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively

  • Some operations require matched datatypes for operands

    • For example, if x and y are different datatypes, a cast may be required: x.add(y.castTo(x.dataType())) (see the sketch following this list)

  • Some operations have datatype restrictions: for example, sum on a UTF8 array is not supported, nor is variance on a BOOL array. For some operations on boolean arrays (such as sum), casting to an integer or floating point type first may make sense.
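
A minimal sketch of the datatype behaviour described above (array datatypes inferred from the Java primitive types, with an explicit cast for mixed-type operands):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray d = Nd4j.createFromArray(1.0, 2.0, 3.0);     // DOUBLE, inferred from the double values
INDArray i = Nd4j.createFromArray(10, 20, 30);        // INT, inferred from the int values
INDArray f = Nd4j.scalar(1.0f);                       // FLOAT scalar

// Mixed-datatype operands may require an explicit cast:
INDArray sum = d.add(i.castTo(d.dataType()));         // cast INT -> DOUBLE before adding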

DL4J Behaviour changes of note:

  • MultiLayerNetwork/ComputationGraph no longer depend in any way on ND4J global datatype.

    • The datatype of a network (the DataType for its parameters and activations) can be set during construction using NeuralNetConfiguration.Builder().dataType(DataType) (see the sketch following this list)

    • Networks can be converted from one type to another (double to float, float to half etc) using MultiLayerNetwork/ComputationGraph.convertDataType(DataType) method
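
A minimal configuration sketch of the network datatype options above (builder usage abbreviated; see the DL4J examples for complete network configurations):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.buffer.DataType;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .dataType(DataType.FLOAT)    // parameter/activation datatype for this network
        .list()
        .layer(new DenseLayer.Builder().nIn(10).nOut(10).build())
        .layer(new OutputLayer.Builder().nIn(10).nOut(2).build())
        .build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();

// Convert an existing network to another floating point type:
MultiLayerNetwork fp16Net = net.convertDataType(DataType.HALF);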

Main new methods:

  • Nd4j.create(), zeros(), ones(), linspace(), etc methods with DataType argument

  • INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another

  • New Nd4j.createFromArray(...) methods for creating INDArrays from Java arrays

ND4J/DL4J: CUDA - 10.1 support added, CUDA 9.0 support dropped

CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.

ND4J: Mac/OSX CUDA support dropped

Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.

DL4J/ND4J: MKL-DNN Support Added

DL4J (and ND4J conv2d etc ops) now support MKL-DNN by default when running on the CPU/native backend. MKL-DNN support is implemented for the following layer types:

  • ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)

  • SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)

  • BatchNormalization layer (and BatchNorm ND4J op)

  • LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)

  • Convolution3D layer (and Conv3D/Conv3DDerivative ND4J ops)

MKL-DNN support for other layer types (such as LSTM) will be added in a future release.

MKL-DNN can be disabled globally (ND4J and DL4J) using Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);

MKL-DNN can also be disabled for specific operations by setting the ND4J_MKL_FALLBACK environment variable to the names of the operations for which MKL-DNN support should be disabled. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp

ND4J: Improved Performance due to Memory Management Changes

Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default hence periodic GC could frequently be disabled when training DL4J networks). However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.

In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000ms) you can use:

Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(5000);

ND4J: Improved Rank 0/1 Array Support

In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, getting sub-arrays using INDArray.get(NDArrayIndex...) or when creating arrays from Java arrays/scalars. Now, behaviour should be more consistent for these rank 0/1 cases. Note to maintain old behaviour for getRow and getColumn (i.e., return rank 2 array with shape [1,x] and [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.
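
For example (a minimal sketch of the getRow behaviour described above):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray m = Nd4j.createFromArray(new double[][]{{1, 2, 3}, {4, 5, 6}});
INDArray row = m.getRow(0);              // rank 1, shape [3] - new behaviour
INDArray rowAsRank2 = m.getRow(0, true); // rank 2, shape [1, 3] - old behaviour, via the boolean flag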

DL4J: Attention layers added

Deeplearning4J

Deeplearning4J: Features and Enhancements

  • Added MKL-DNN support for Conv/Pool/BatchNorm/LRN layers. MKL-DNN will be used automatically when using nd4j-native backend. (Link, Link)

  • L1/L2 regularization now made into a class; weight decay added, with better control as to when/how it is applied. See this page for more details on the difference between L2 and weight decay. In general, weight decay should be preferred to L2 regularization. (Link, Link)

  • The parameter/activation datatypes for new models can be set for new networks using the dataType(DataType) method on NeuralNetConfiguration.Builder (Link)

  • MultiLayerNetwork/ComputationGraph can be converted between (floating point) datatypes FP16/32/64 for the parameters and activations using the MultiLayerNetwork/ComputationGraph.convertDataType(DataType) methods (Link, Link)

  • EmbeddingLayer and EmbeddingSequenceLayer builders now have .weightInit(INDArray) and .weightInit(Word2Vec) methods for initializing parameters from pretrained word vectors (Link)

  • PerformanceListener can now be configured to report garbage collection information (number/duration) Link

  • Evaluation class will now check for NaNs in the predicted output and throw an exception instead of treating argMax(NaNs) as having value 0 (Link)

  • Added ModelAdapter for ParallelInference for convenience and for use cases such as YOLO (allows improved performance by avoiding detached (out-of-workspace) arrays) (Link)

  • Added GELU Activation function (Link)

  • Added BertIterator (a MultiDataSetIterator for BERT training - supervised and unsupervised) Link

  • Added validation to MultiLayerNetwork/ComputationGraph that throws an exception when attempting to perform Regression evaluation on a classifier, or vice-versa (Link, Link)

  • Added ComputationGraph.output(List<String> layers, boolean train, INDArray[] features, INDArray[] featureMasks) method to get the activations for a specific set of layers/vertices only (without redundant calculations) (Link)

  • Weight initialization for networks is now implemented as classes (not just enumerations) and hence is now extensible via the IWeightInit interface (Link); i.e., custom weight initializations are now supported (Link, Link)

  • Added Capsule Network layers (no GPU acceleration until next release) - CapsuleLayer, CapsuleStrengthLayer and PrimaryCapsules (Link)

  • Added Cifar10DataSetIterator to replace CifarDataSetIterator (Link, Link)

  • Keras import: Importing models from InputStream is now supported (Link, Link)

  • Layer/NeuralNetConfiguration builders now have getter/setter methods also, for better Kotlin support (Link)

  • Most JavaScript dependencies and fonts for UI have been migrated to WebJars (Link)

  • CheckpointListener now has static availableCheckpoints(File), loadCheckpointMLN(File, int) and loadLastCheckpointMLN(File) etc methods (Link)

  • MultiLayerNetwork/ComputationGraph now validate and throw an exception in certain incompatible RNN configurations, like truncated backpropagation through time combined with LastTimeStepLayer/Vertex (Link)

  • Added BERT WordPiece tokenizers (Link)

  • Deeplearning4j UI now has multi-user/multi-session support - use UIServer.getInstance(boolean multiSession, Function<String,StatsStorage>) to start UI in multi-session mode (Link)

  • Layer/NeuralNetworkConfiguration builder method validation standardized and improved (Link)

  • WordVectorSerializer now supports reading and exporting text format vectors via WordVectorSerializer.writeLookupTable and readLookupTable (Link)

  • Updated to JavaCPP, JavaCPP presets, and JavaCV version 1.5 (Link)

  • Added EvaluationBinary false alarm rate calculation (Link)

  • ComputationGraph GraphBuilder now has an appendLayer method that can be used to add layers connected to the last added layer/vertex (Link)

  • Added Wasserstein loss function (Link)

  • Keras import: Improved errors/exceptions for lambda layer import (Link)

  • Apache Lucene/Solr upgraded from 7.5.0 to 7.7.1 (Link)

  • KMeans clustering strategy is now configurable (Link)

Deeplearning4J: Bug Fixes and Optimizations

  • DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly (Link)

  • cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown (Link)

  • Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training (Link)

  • Fixed a bug where dropout instances were incorrectly shared between layers when using transfer learning with dropout (Link, Link)

  • Fixed issue where tensorAlongDimension could result in an incorrect array order for edge cases and hence exceptions in LSTMs (Link)

  • Fixed an edge case issue with ComputationGraph.getParam(String) where the layer name contains underscores (Link)

  • Fixed an edge case with ParallelInference on CUDA where (very rarely) input array operations (such as normalization) may not be fully completed before transferring an array between threads (Link, Link)

  • Fixed an edge case with KFoldIterator when the total number of examples is not a multiple of the batch size (Link, Link)

  • Fixed an issue where DL4J UI could throw a NoClassDefFoundError on Java 9/10/11 (Link, Link)

  • Keras import: added aliases for weight initialization (Link)

  • Fixed issue where dropout instances would not be correctly cloned when network configuration was cloned (Link)

  • Fixed workspace issue with ElementwiseVertex with single input (Link)

  • Fixed issue with UI where detaching StatsStorage could attempt to remove storage twice, resulting in an exception (Link)

  • Fixed issue where LossMultiLabel would generate NaNs when all labels in minibatch are the same class. Now 0 gradient is returned instead. (Link, Link)

  • Fixed an issue where DepthwiseConv2D weight could be wrong shape on restoring network from saved format (Link)

  • Fixed issue where BaseDatasetIterator.next() would not apply preprocessors, if one was set (Link)

  • Improved default configuration for CenterLossOutputLayer (Link)

  • Fixed an issue for UNet non-pretrained configuration (Link)

  • Fixed an issue where Word2Vec VocabConstructor could deadlock under some circumstances (Link)

  • SkipGram and CBOW (used in Word2Vec) were made native operations for better performance (Link)

  • Fixed an issue where references to detached StatsListener instances would be maintained, potentially leading to memory issues when using InMemoryStatsListener (Link)

  • Optimization: Workspaces were added to SequenceVectors and Word2Vec (Link)

  • Improved validation for RecordReaderDataSetIterator (Link)

  • Improved handling of unknown words in WordVectors implementation (Link)

  • Yolo2OutputLayer: Added validation for incorrect labels shape. (Link)

  • LastTimeStepLayer will now throw an exception when the input mask is all 0s (no data - no last time step) (Link)

  • Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate method could lead to invalid updater state in some rare cases (Link)

  • Fixed an issue where the Conv1D layer would calculate the output length incorrectly in MultiLayerNetwork.summary() (Link)

  • Async iterators are now used in EarlyStoppingTrainer to improve data loading performance (Link)

  • EmbeddingLayer and EmbeddingSequenceLayer performance has been improved on CUDA (Link)

  • Removed outdated/legacy scala tools repository (Link, Link)

  • Fixed issues in L2NormalizeVertex equals/hashcode methods (Link)

  • Fixed Workspace issue in ConvolutionalListener (Link)

  • Fixed EvaluationBinary falsePositiveRate calculation (Link)

  • Added validation and useful exception for MultiLayerNetwork.output(DataSetIterator) methods (Link)

  • Fixed minor issue where ComputationGraph.summary() would throw a NullPointerException if init() had not already been called (Link)

  • Fixed a ComputationGraph issue where an input into a single layer/vertex repeated multiple times could fail during training (Link)

  • Improved performance for KMeans implementation (Link)

  • Fixed an issue with rnnGetPreviousState for RNNs in 'wrapper' layers such as FrozenLayer (Link)

  • Keras import: Fixed an issue with order of words when importing some Keras tokenizers (Link)

  • Keras import: fixed issue with possible UnsupportedOperationException in KerasTokenizer class (Link)

  • Keras import: fixed an import issue with models combining embeddings, reshape and convolution layers (Link)

  • Keras import: fixed an import issue with input type inference for some RNN models (Link)

  • Fixed some padding issues in LocallyConnected1D/2D layers (Link)

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

  • Removed reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays (Link)

  • Added INDArray.close() method to allow users to manually release off-heap memory immediately (Link)

  • SameDiff: Added TensorFlowImportValidator tool to determine if a TensorFlow graph can likely be imported into SameDiff. Reports the operations used and whether they are supported in SameDiff (Link)

  • Added Nd4j.createFromNpzFile method to load Numpy npz files (Link)

  • Added support for importing BERT models into SameDiff (Link, Link)

  • Added SameDiff GraphTransformUtil for performing transfer learning and other graph modifications (Link, Link, Link)

  • Evaluation, RegressionEvaluation etc now support 4d (CNN segmentation) data formats; also added Evaluation.setAxis(int) method to support other data formats such as channels-last/NHWC for CNNs and NWC for CNN1D/RNNs. Defaults to axis 1 (which matches DL4J CNN and RNN data formats) (Link, Link)

  • Added a basic ("technology preview") version of the SameDiff UI. Should be considered early WIP with breaking API changes expected in future releases. Supports plotting of SameDiff graphs as well as various metrics (line charts, histograms, etc)

    • Currently embedded in the DL4J UI - call UIServer.getInstance() then go to localhost:9000/samediff to access it.

    • For more details, see 1, 2, 3

  • Added DotProductAttention and MultiHeadDotProductAttention operations (Link)

  • Added Nd4j.exec(Op) and Nd4j.exec(CustomOp) convenience methods (Link)

  • SameDiff TensorFlow Import

    • Import of TF Assertions added (Link)

    • Support/fixes for control dependencies (Link)

    • Support/fixes for TensorArray and related ops (Link, Link, Link)

  • nd4j-common - tar/tar.gz support added; Zip file listing and single file extraction added (Link, Link)

  • SameDiff: reductions operations now support "dynamic" (non-constant) inputs for axis argument (Link)

  • ROCBinary now has .getROC(int outputNum) method (Link)

  • SameDiff: L1/L2 regularization added (Link, Link)

  • SameDiff: Added SDVariable.convertToVariable() and convertToConstant() - to change SDVariable type (Link)

  • Added checks and useful exceptions for reductions on empty arrays (Link)

  • SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Link)

  • SameDiff TensorFlow import: import can now be overridden for cases such as user-defined functions (Link, Link)

  • Libnd4j (c++) benchmarking framework added (Link)

  • Added OpExecutioner.inspectArray(INDArray) method to get summary statistics for analysis/debugging purposes (Link)

  • Added INDArray.reshape(char order, boolean enforceView, long... newShape) to reshape array whilst throwing an exception (instead of returning a copy) if the reshape cannot be performed (Link, Link)

  • Added SDVariable method overloads (plus, minus, times, etc) for Kotlin (Link)

  • Added SDVariable convenience methods for dot, reshape, permute (Link)

  • Added SameDiff SDIndex.point(long, boolean keepDim) method (to keep point indices in output array as size 1 axis) (Link)

  • Added SameDiff ProtoBufToFlatBufConversion command line tool for doing TensorFlow frozen model (protobuf) to SameDiff FlatBuffers conversion (Link)

  • Improved DataType validation for SameDiff operations (Link)
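
A minimal sketch of the namespace-based op creators mentioned in the list above:

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

SameDiff sd = SameDiff.create();
SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 10);
SDVariable out = sd.math().tanh(in);   // previously sd.tanh(in); creators now live in sd.math(), sd.nn(), etc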

ND4J/SameDiff: API Changes (Transition Guide): 1.0.0-beta3 to 1.0.0-beta4

  • ND4J datatypes - significant changes, see highlights at top of this section

  • nd4j-base64 module (deprecated in beta3) has been removed. Nd4jBase64 class has been moved to nd4j-api (Link)

  • When specifying arguments for op execution along dimension (for example, reductions) the reduction axis are now specified in the operation constructor - not separately in the OpExecutioner call. (Link)

  • Removed old Java loop-based BooleanIndexing methods. Equivalent native ops should be used instead. (Link)

  • Removed Nd4j.ENFORCE_NUMERICAL_STABILITY, Nd4j.copyOnOps, etc (Link)

  • SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Link)

  • Nd4j.emptyLike(INDArray) has been removed. Use Nd4j.like(INDArray) instead (Link)

  • org.nd4j.util.StringUtils removed; use Apache Commons Lang 3 StringUtils instead (Link)

  • ND4J Jackson RowVector(De)Serializer has been deprecated due to datatype changes; NDArrayText(De)Serializer should be used instead (Link, Link)

  • nd4j-instrumentation module has been removed due to lack of use/maintenance (Link)
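
For the Nd4j.emptyLike removal noted above, a minimal before/after sketch (the array here is illustrative):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray arr = Nd4j.rand(2, 3);
// 1.0.0-beta3 and earlier:
// INDArray out = Nd4j.emptyLike(arr);
// 1.0.0-beta4 onwards:
INDArray out = Nd4j.like(arr);
```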

ND4J/SameDiff: Bug Fixes and Optimizations

  • Fixed bug with InvertMatrix.invert() with [1,1] shape matrices (Link)

  • Fixed edge case bug for Updater instances with length 1 state arrays (Link)

  • Fixed edge case with FileDocumentIterator with empty documents (Link)

  • SameDiff: Numerous fixes and enhancements

    • 1, 2, 3, 4

    • Improved functionality for losses (Link, Link, Link, Link)

    • Improved errors for missing/misspelled placeholders (Link)

    • Fixed edge cases in loops (Link, Link)

  • Fixed issue with Nd4j.vstack on 1d arrays returning 1d output, not 2d stacked output (Link)

  • Conv2D op can infer kernel size from input arrays directly when required (Link, Link)

  • Fixed an issue with Numpy format export - Nd4j.toNpyByteArray(INDArray) (Link)

  • Fixes for SameDiff when it is used within an external workspace (Link)

  • Fixed an issue where empty NDArrays would be reported as having scalar shape information, length 1 (Link)

  • Optimization: libnd4j (c++) indexing for ops will use uint for faster offset calculations when required and possible (Link)

  • Optimization: libnd4j loops performance improved for faster execution of some operations (Link, Link, Link)

  • Local response normalization op optimized (Link, Link)

  • Fixed an issue with INDArray.repeat on some view arrays (Link)

  • Improved performance for execution of some operations on view arrays (Link)

  • Improved performance on broadcast operations (Link, Link, Link)

  • Improved performance for non-EWS reduction along dimension operations (Link)

  • Improved performance for IndexReduce operations (Link) and small reductions (Link)

  • Improved performance of the one_hot operation (Link) and tanh operation (Link)

  • Improved performance for transform operations (Link)

  • Optimization: empty arrays are created only once and cached (as they are immutable) (Link)

  • Improved performance on operations using tensor along dimension for parallelization (Link, Link)

  • Improved performance on "reduce 3" reduction operations (Link)

  • Improved handling of CUDA contexts in heavily multi-threaded environments (Link)

  • Fixed an issue where Evaluation.reset() would incorrectly clear the String class labels (Link)

  • SameDiff: Improved gradient calculation performance/efficiency; "gradients" are now no longer defined for non-floating-point variables, and variables that aren't required to calculate loss or parameter gradients (Link)

  • Behaviour of IEvaluation instances now no longer depends on the global (default) datatype setting (Link)

  • INDArray.get(point(x), y) or .get(y, point(x)) now returns rank 1 arrays when performed on rank 2 arrays (Link) - see the sketch after this list

  • Removed reliance on Guava for SameDiff, fixing potential issue for Java 11/12 and when earlier versions of Guava are on the classpath (Link, Link)

  • ND4J indexing (INDArray.get) implementation rewritten for better performance and reliability (Link)

  • Fixes for local response normalization backprop op (Link)
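
A small sketch of the point-index behaviour change noted above; the shapes in the comments reflect the new behaviour:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import static org.nd4j.linalg.indexing.NDArrayIndex.all;
import static org.nd4j.linalg.indexing.NDArrayIndex.point;

INDArray m = Nd4j.rand(3, 4);           // rank 2, shape [3, 4]
INDArray row = m.get(point(1), all());  // now rank 1, shape [4] (previously rank 2, shape [1, 4])
```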

ND4J: Known Issues

  • Most CustomOperation operations (such as those used in SameDiff) are CPU only until the next release. GPU support was not completed in time for the 1.0.0-beta4 release.

  • Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See Issue 7637. Workaround: for Skylake CPUs, disable MKL-DNN for the conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier), or disable MKL-DNN globally.

DataVec

DataVec: Features and Enhancements

  • Added PythonTransform (arbitrary python code execution for pre processing) (Link, Link)

  • Added FirstDigit (Benford's law) transform (Link, Link)

  • StringToTimeTransform now supports setting Locale (Link, Link)

  • Added StreamInputSplit for creating local data pipelines where data is stored remotely on storage such as HDFS or S3 (Link, Link)

  • LineRecordReader (and subtypes) now have the option to define the character set (Link)

  • Added TokenizerBagOfWordsTermSequenceIndexTransform (TFIDF transform), GazeteerTransform (binary vector for word present) and MultiNlpTransform transforms; added BagOfWordsTransform interface (Link)

DataVec: Optimizations and Bug Fixes

  • Fixed issue with ImageLoader.scalingIfNeeded (Link)

Arbiter

Arbiter: Enhancements

  • Arbiter now supports genetic algorithm search (Link)

Arbiter: Fixes

  • Fixed an issue where early stopping used in Arbiter would result in a serialization exception (Link)

Version 1.0.0-beta3

Highlights - 1.0.0-beta3 Release

  • ND4J/Deeplearning4j: Added support for CUDA 10.0. Dropped support for CUDA 8.0. (1.0.0-beta3 release has CUDA 9.0, 9.2 and 10.0 support)

  • SameDiff now supports training and evaluation from DataSetIterator and MultiDataSetIterator. Evaluation classes have been moved to ND4J.

  • DL4J Spark training (gradient sharing) is now fully fault tolerant, and has improvements for threshold adaption (potentially more robust convergence). Ports can now be easily configured independently on master/workers.

Deeplearning4J

Deeplearning4J: New Features

  • Added OutputAdapter interface and MultiLayerNetwork/ComputationGraph.output method overloads using OutputAdapter (avoids allocating off-heap memory that needs to be cleaned up by GC) Link, Link, Link

  • Added ComputationGraph/MultiLayerNetwork rnnTimeStep overload with user-specified workspace. Link

  • Added Cnn3DLossLayer Link

  • ParallelInference: Instances can now update the model in real-time (without re-init) Link

  • ParallelInference: Added ParallelInference INPLACE mode Link

  • Added validation for incompatible loss/activation function combinations (such as softmax+nOut=1, or sigmoid+mcxent). New validation can be disabled using outputValidation(false) Link

  • Spark training: Added full fault tolerance (robust failure recovery) for gradient sharing implementation Link Link

  • Spark training now supports configuring ports more flexibly (and differently for different workers) using PortSupplier Link Link

  • Spark training: overhauled gradient sharing threshold adaption algorithms; made it possible to customize threshold settings, and made defaults more robust to the initial threshold configuration, improving convergence speed in some cases. Link

  • Spark training: implemented chunked messaging to reduce memory requirements (and insufficient buffer length issues) for large messages Link

  • Spark training: Added MeshBuildMode configuration for improved scalability for large clusters Link

  • Spark network data pipelines: added FileBatch, FileBatchRecordReader etc for "small files" (images etc) distributed training use cases Link

  • Added FailureTestingListener for fault tolerance/debugging purposes Link

  • Upgraded Apache Lucene/Solr to version 7.5.0 (from 7.4.0) Link

  • Added system properties (org.deeplearning4j.tempdir and org.nd4j.tempdir) to allow overriding of the temporary directories ND4J and DL4J use (see the sketch after this list) Link Link

  • Made MultiLayerNetwork/ComputationGraph.clearLayerStates methods public (was protected) Link

  • AbstractLayer.layerConf() method is now public Link

  • ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper Link

  • Improved validation and error messages for invalid inputs/labels in Yolo2OutputLayer Link

  • Spark training: added SharedTrainingMaster.Builder.workerTogglePeriodicGC and .workerPeriodicGCFrequency to easily configure the ND4J garbage collection configuration on workers. Set default GC to 5 seconds on workers Link

  • Spark training: added threshold encoding debug mode (logs current threshold and encoding statistics on each worker during training). Enable using SharedTrainingConfiguration.builder.encodingDebugMode(true). Note this operation has computational overhead. Link
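
As referenced above, a minimal sketch of overriding the temporary directories via the new system properties; the paths are placeholders, and the properties should be set before any DL4J/ND4J classes initialize (or passed as -D JVM arguments):

```java
// Hypothetical paths - adjust to your environment
System.setProperty("org.deeplearning4j.tempdir", "/data/tmp/dl4j");
System.setProperty("org.nd4j.tempdir", "/data/tmp/nd4j");
```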

Deeplearning4J: Bug Fixes and Optimizations

  • Fixed an issue where L1/L2 and updaters (Adam, Nesterov, etc) were applied before dividing gradients by the minibatch size to obtain the average gradient. To maintain the old behaviour, use NeuralNetConfiguration.Builder.legacyBatchScaledL2(true) (see the sketch after this list) Link.

    • Note that learning rates may need to be decreased for some updaters (such as Adam) to account for this change vs. earlier versions. Some other updaters (such as SGD, NoOp, etc) should be unaffected.

    • Note that deserialized (loaded) configurations/networks saved in 1.0.0-beta2 or earlier will default to old behaviour for backward compatibility. All new networks (created in 1.0.0-beta3) will default to the new behaviour.

  • Fixed an issue where EarlyStoppingScoreCalculator would not correctly handle "maximize score" cases instead of minimizing Link

  • Fixed order (BGR vs. RGB) for VGG16ImagePreProcessor channel offset values Link

  • Fixed bug with variational autoencoders using weight noise Link

  • Fixed issue with BaseDataSetIterator not respecting the 'maximum examples' configuration Link

  • Optimization: A workspace is now used for ComputationGraph/MultiLayerNetwork evaluation methods (avoids allocating off-heap memory during evaluation that must be cleaned up by garbage collector) Link

  • Fixed an issue where shuffling combined with a subset for MnistDataSetIterator would not maintain the same subset between resets Link

  • Fixed issue with StackVertex.getOutputType Link

  • Fixed issue with CNN to/from RNN preprocessors' handling of mask arrays Link

  • Fixed issue with VGG16 non-pretrained configuration in model zoo Link

  • Fixed issue with TransferLearning nOutReplace where multiple layers in a row are modified Link

  • Fixed issue with CuDNN workspaces where backpropagation is performed outside of a standard fit call Link

  • Fixed an issue with dropout masks being cleared prematurely on output layers in ComputationGraph Link

  • RecordReaderMultiDataSetIterator now supports 5D arrays (for 3D CNNs) Link

  • Fixed bug in multi input/output ComputationGraphs with TBPTT combined with both masking and different number of input/output arrays Link

  • Improved input validation/exceptions for batch normalization layer Link

  • Fixed bug with TransferLearning GraphBuilder nOutReplace when combined with subsampling layers Link

  • SimpleRnnParamInitializer now properly respects bias initialization configuration Link

  • Fixed SqueezeNet zoo model non-pretrained configuration Link

  • Fixed Xception zoo model non-pretrained configuration Link

  • Fixed an issue with some evaluation signatures for multi-output ComputationGraphs Link

  • Improved MultiLayerNetwork/ComputationGraph summary method formatting for large nets Link

  • Fixed an issue where gradient normalization could result in NaNs if gradient is exactly 0.0 for all parameters in a layer Link

  • Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate could throw an exception for SGD and NoOp updaters Link

  • Fixed an issue with StackVertex plus masking in some rare cases Link

  • Fixed an issue with JSON deserialization of frozen layers in pre-1.0.0-alpha format Link

  • Fixed an issue where GraphBuilder.removeVertex can fail under some limited circumstances Link

  • Fixed a bug in CacheableExtractableDataSetFetcher Link

  • DL4J Spark training: Fixed issues with thread/device affinity for multi-GPU training + evaluation Link

  • DL4J Spark training: Made all Aeron threads daemon threads to prevent Aeron from stopping JVM shutdown when all other threads have completed Link

  • Added cudnnAllowFallback configuration for BatchNormalization layer (fallback to built-in implementation if CuDNN fails unexpectedly) Link

  • Fixed some rare concurrency issues with multi-worker (multi-GPU) nodes for Spark training Link Link

  • Fixed an issue with BatchNormalization layers that prevented the mean/variance estimates from being synced properly on each worker for GradientSharing training, causing convergence issues Link

  • Added a check to detect ZipSlip CVE attempts in ArchiveUtils Link

  • DL4J Spark training and evaluation: methods now use Hadoop Configuration from Spark context to ensure runtime-set configuration is available in Spark functions reading directly from remote storage (HDFS etc) Link

  • MultiLayerNetwork and ComputationGraph now properly support more than Integer.MAX_VALUE parameters Link Link

  • Added data validation for Nd4j.readTxt - now throws exception on invalid input instead of returning incorrect values Link

  • Fixed an issue with KNN implementation where a deadlock could occur if an invalid distance function (one returning "distances" less than 0) was utilized Link

  • Added synchronization to loading of Keras import models to avoid thread safety issues in the underlying HDFS library used for loading Link

  • Fixed rare issue for Async(Multi)DataSetIterator with large prefetch values Link
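
A sketch of opting back into the pre-beta3 gradient scaling behaviour mentioned above; the layers and hyperparameters here are placeholders, and only legacyBatchScaledL2(true) is the relevant setting:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .legacyBatchScaledL2(true)   // restore pre-1.0.0-beta3 L1/L2 + updater ordering
        .updater(new Adam(1e-3))
        .list()
        .layer(new DenseLayer.Builder().nIn(10).nOut(10).activation(Activation.RELU).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(10).nOut(3).activation(Activation.SOFTMAX).build())
        .build();
```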

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

  • IEvaluation classes in DL4J have been deprecated and moved to ND4J so they are available for SameDiff training. Functionality and APIs are unchanged

  • MultiLayerConfiguration/ComputationGraphConfiguration pretrain(boolean) and backprop(boolean) have been deprecated and are no longer used. Use fit and pretrain/pretrainLayer methods instead. Link

  • ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper which should be used instead Link

  • deeplearning4j-nlp-korean module now has a Scala version suffix due to Scala dependencies; new artifact IDs are deeplearning4j-nlp-korean_2.10 and deeplearning4j-nlp-korean_2.11 Link

Deeplearning4J: Known issues: 1.0.0-beta3

  • Running multiple Spark training jobs simultaneously on one physical node (i.e., multiple JVMs from one or more Spark jobs) may cause problems with network communication. A workaround is to manually set a unique stream ID in the VoidConfiguration. Use a unique (or random) integer value for different jobs Link

Deeplearning4J: Keras Import

  • Fixed import issue due to Keras JSON format changes for Keras 2.2.3+ Link

  • Added Keras import for timeseries preprocessing Link

  • Elephas Link

  • Fixed issue with importing models with reshaping after an embedding layer Link

  • Added support for Keras masking layers Link

  • Fixed JSON deserialization issue with some layers/preprocessors, such as Permute Link

  • Fixed issue with Keras import of Nadam configuration Link

ND4J

ND4J: New Features

  • Added SameDiff training and evaluation: SameDiff instances can now be trained directly using DataSetIterator and MultiDataSetIterator, and evaluated using IEvaluation instances (that have been moved from ND4J to DL4J) Link

  • Added GraphServer implementation: c++ inference server for SameDiff (and Tensorflow, via TF import) with Java API Link

  • SameDiff instances can now be loaded from serialized FlatBuffers format (SameDiff.asFlatFile plus fromFlatFile) Link Link

  • Added MKL-DNN support for some operations (Conv2d, etc) Link

  • Upgraded ND4J (and DataVec) to Arrow 0.11.0 Link, which also fixes Link

  • Added Nd4j.where op method (same semantics as numpy.where) Link

  • Added Nd4j.stack op method (combine arrays + increase array rank by 1) Link

  • Libnd4j new ops:

    • Matrix band part Link

    • Scatter ND, ND-add, ND-sub and ND-update ops Link

    • Sparse softmax cross entropy loss with logits Link

    • Histogram fixed width op Link

    • broadcast_to op Link

    • deconv3d op added Link

    • Unsorted segment ops added Link

    • Segment_X backprop ops added Link

    • batchnorm_new op added that supports multiple axes for mean/variance Link

    • GRU cell backprop added Link

  • Nd4j Preconditions class now has methods for formatting INDArray arguments Link, Link

  • SameDiff loss functions: cleanup plus forward pass implementation Link

  • CudaGridExecutioner now warns that exception stack traces may be delayed to avoid confusion in debugging exceptions occurring during asynchronous execution of ops Link

  • JavaCPP and JavaCPP-presets have been upgraded to version 1.4.3 Link

  • Improved Javadoc on SDVariable class Link

ND4J: Bug Fixes and Optimizations

  • Fixes for android: Remove use of RawIndexer Link

  • Libnd4j custom ops: conv op weight layouts are now not dependent on the input format (NCHW/NHWC) - now always [kH, kW, inChannels, outChannels] for 2d CNNs, [kH, kW, kD, inChannels, outChannels] for 3d CNNs. Link, Link

  • Libnd4j native op fixes:

    • Dot operation backprop Link, determinant Link

    • Backprop op fix for the broadcast case for some pairwise transform custom op implementations Link

    • Fix for reverse custom op with rank 1 inputs Link

    • ATan2 op is now broadcastable Link

    • Boolean custom op broadcast fixes/additions Link

    • Scatter op edge case fixes Link

    • ArgMin shape function fix Link, negative axis fix Link

    • Unique op fix Link

    • Pad op fix Link

    • Fixed where op shape function Link

    • SVD rank 1 edge case fix Link

    • Range op Link

    • Split and space_to_batch fixes Link

    • Broadcast dynamic shape Link

    • embedding_lookup op now supports multiple input arrays Link

    • Matrix determinant op edge case (rank 0 result) shape fix Link

  • SameDiff TensorFlow import: fixes for multiple operations Link, Link, Link, Link

  • SameDiff: Improved error handling for multiple outputs case Link

  • Fixed issue where INDArray.permute would not correctly throw an exception for invalid length case Link

  • Fixed issues with INDArray.get/put with SpecifiedIndex Link, Link

  • Minor change to DataSet.merge - signature now accepts any DataSet subtypes Link

  • Fixed issue where the INDArray.transposei operation was not applied in-place Link

  • Fixed issues with INDArray.mmul with MMulTranspose Link

  • Added additional order validation for ND4J creation methods (create, rand, etc) Link

  • Fix for ND4J binary deserialization (BinarySerde) when deserializing from heap byte buffers Link

  • Fixed issue with Nd4j-common ClassPathResource path resolution in some IDEs Link

  • Fixed issue where INDArray.get(interval) on rank 1 array would return rank 2 array Link

  • Fixed a validation issue with Nd4j.gemm/mmuli on views Link Link

  • INDArray.assign(INDArray) no longer allows assigning different shape arrays (other than scalar/vector cases) Link

  • NDArrayStrings (and INDArray.toString()) now always use the US locale when formatting numbers Link

  • Fixed an issue with GaussianDistribution specific to V100 GPUs Link

  • Fixed an issue with bitmap compression/encoding specific to V100 GPUs Link

  • Transforms.softmax now throws an error on unsupported shapes instead of simply not applying operation Link

  • VersionCheck functionality: handle case where SimpleFileVisitor is not available on earlier versions of Android Link

  • SameDiff convolution layer configuration (Conv2dConfig/Conv3dConfig/Pooling3dConfig etc) have had parameter names aligned Link

ND4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

  • CUDA 8.0 support has been removed. CUDA 9.0, 9.2 and 10.0 support is available in 1.0.0-beta3

  • nd4j-base64 module contents have been deprecated; use the equivalent classes in nd4j-api from now on Link

  • Some classes in the nd4j-jackson module have been deprecated; use the equivalent classes in nd4j-api from now on Link

ND4J: Known issues: 1.0.0-beta3

  • Android users may need to manually exclude the (now deprecated) module nd4j-base64. This is due to org.nd4j.serde.base64.Nd4jBase64 class being present in both nd4j-api and nd4j-base64 modules. Both versions have identical content. Use exclude group: 'org.nd4j', module: 'nd4j-base64' to exclude.

DataVec

DataVec: New Features

  • Added NativeImageLoader method overloads for org.opencv.core.Mat and String as filename Link

DataVec: Optimizations and Bug Fixes

  • Fix for JDBCRecordReader handling of null values Link

  • Improved errors/validation for ObjectDetectionRecordReader for invalid input (where image object centers are outside of image bounds) Link

  • Fixed issue where FileSplit using methods that are unavailable on earlier versions of Android Link

  • Added SerializableHadoopConfiguration and BroadcastHadoopConfigHolder for cases where a Hadoop configuration is required in Spark functions Link Link

  • Fixed issue with JDBCRecordReader's handling of real-valued column result types Link

  • Added validation and useful exception for CSVRecordReader/LineRecordReader being used without initialization Link

Arbiter

Arbiter: Fixes

  • Fixed some issues with dropout layers Link

ND4S

  • Added conversion between org.nd4j.linalg.primitives.Pair/Triple and Scala Tuple Link

Version 1.0.0-beta2

Highlights - 1.0.0-beta2 Release

  • ND4J/Deeplearning4j: Added support for CUDA 9.2. Dropped support for CUDA 9.1. (1.0.0-beta2 release has CUDA 8.0, 9.0 and 9.2 support)

  • Deeplearning4j: New SameDiff layers with training support - Link Link

  • Deeplearning4j resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property

  • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

  • ND4J: nd4j-native-platform will now use Intel MKL-DNN as the default/bundled BLAS implementation (replacing OpenBLAS as the previous default)

  • Deeplearning4j: Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs (to assist with debugging and tuning memory configuration).

  • Deeplearning4j - new layers: Locally connected 1d Link, Locally connected 2d Link

Deeplearning4J

Deeplearning4J: New Features

  • Added new SameDiff layers (automatic differentiation - only single class, forward pass definition required) to DL4J with full training support - SameDiffLayer, SameDiffVertex, SameDiffOutputLayer, SameDiffLambdaLayer, SameDiffLambdaVertex - note that these are CPU-only execution for now Link Link Link

  • Resource (datasets, pretrained models) storage directory can now be configured via DL4JResources.setBaseDirectory method or org.deeplearning4j.resources.directory system property. Note that it is also possible to set a different base location for downloads (for local mirrors of DL4J resources) (see the sketch after this list) Link

  • Added Out-of-memory (OOM) crash dump reporting functionality. Provides a dump with memory use and configuration if training/inference OOMs. Same information is available (without a crash) for MultiLayerNetwork/ComputationGraph.memoryInfo methods. Can be disabled (or output directory set) using system properties - Link

  • Added Composite[Multi]DataSetPreProcessor to enable multiple [Multi]DataSetPreProcessors to be applied in a single iterator Link

  • Added ComputationGraph evaluate methods for multi-output networks: evaluate(DataSetIterator, Map<Integer,IEvaluation[]>) and evaluate(MultiDataSetIterator, Map<Integer,IEvaluation[]>) Link

  • Added JointMultiDataSetIterator - utility iterator used to create MultiDataSetIterator from multiple DataSetIterators Link

  • GraphVertices may now have trainable parameters directly (not just enclose layers with trainable parameters) Link

  • Added MultiLayerNetwork/ComputationGraph getLearningRate methods Link

  • Added RandomDataSetIterator and RandomMultiDataSetIterator (mainly for testing/debugging) Link Link

  • Added cyclical "1cycle" schedule for learning rate schedules etc - Link

  • RDD repartitioning for Spark training is more configurable (adds Repartitioner interface) Link

  • Added ComputationGraph.getIterationCount() and .getEpochCount() for consistency with MultiLayerNetwork Link

  • Added locally connected 1d layer Link Link

  • Spark "data loader" API (mainly for Spark) Link Link Link

  • Spark evaluation: added evaluation method overloads that allow specifying the number of evaluation workers (less than number of Spark threads) Link

  • CnnSentenceDataSetIterator now has a Format argument, and supports outputting data for RNNs and 1D CNNs Link

  • Added ComputationGraph/MultiLayerNetwork.pretrain((Multi)DataSetIterator, int epochs) method overloads Link

  • MultiLayerNetwork and ComputationGraph now have output method overloads where the network output can be placed in the user-specified workspace, instead of being detached Link Link. This can be used to avoid creating INDArrays that need to be garbage collected before native memory can be freed.

  • EmbeddingSequenceLayer now supports [minibatch,1,seqLength] format sequence data in addition to [minibatch,seqLength] format data Link

  • CuDNN batch norm implementation will now be used for rank 2 input, not just rank 4 input Link

  • Environment variables and system properties for DL4J have been centralized into DL4JResources and DL4JEnvironmentVars classes, with proper descriptions Link Link

  • MultiLayerNetwork and ComputationGraph output/feedForward/fit methods are now thread-safe via synchronization. Note that concurrent use is not recommended due to performance (instead: use ParallelInference); however the now-synchronized methods should avoid obscure errors due to concurrent modifications Link

  • BarnesHutTSNE now throws a useful exception in the case where the distance metric is undefined (for example, all zeros plus cosine similarity) Link
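
A short sketch of pointing the DL4J resource directory at a custom location, as described above; the path is a placeholder, and the DL4JResources package and File-based argument type are assumed here:

```java
import java.io.File;
import org.deeplearning4j.common.resources.DL4JResources;

// Option 1: system property (set before any DL4J classes initialize)
System.setProperty("org.deeplearning4j.resources.directory", "/data/dl4j-resources");

// Option 2: programmatic configuration (assumed File-based signature)
DL4JResources.setBaseDirectory(new File("/data/dl4j-resources"));
```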

Deeplearning4J: Bug Fixes and Optimizations

  • ComputationGraph.addListeners was not working correctly if listeners were already present Link, Link

  • TinyImageNetDataSetIterator did not validate/correctly use input shape configuration Link, Link

  • BatchNormalization layer now correctly asserts that nOut is set if required (instead of unfriendly shape errors later) Link

  • Fixed issue where OutputLayer may not initialize parameter constraints correctly Link

  • Fixed performance issue with Nesterov updater using CPU-only op for CUDA execution Link

  • Removed TerminationCondition for DL4J optimizers - was not used in practice, and had minor overhead Link

  • Fixed issue where EvaluativeListener could hit a workspace validation exception when workspaces are enabled Link

  • Fixed issue where TrainingListener.onEpochStart/onEpochEnd were not being called correctly for ComputationGraph Link

  • Fixed workspace issue with TensorFlowCnnToFeedForwardPreProcessor Link

  • Performance optimization for BatchNormalization when using CuDNN Link

  • Performance optimization: Dropout will be applied in-place when safe to do so, avoiding a copy Link

  • Added CuDNN implementation of Dropout Link

  • Reduced memory use for CuDNN: CuDNN working memory is now shared and reused between layers within a network Link

  • CuDNN batch normalization implementation would fail with FP16 datatype Link

  • Fixed issue where Bidirectional LSTM could incorrectly use workspaces, causing an exception Link

  • Fixed issue with early stopping where scores to be maximized (accuracy, f1, etc) were not properly triggering termination conditions Link

  • Fixed issue where label mask counter could be incorrectly incremented in ComputationGraph.computeGradientAndScore() Link

  • ComputationGraph was not setting lastEtlTime field during training Link

  • Fixed issue with AutoEncoder layer when workspaces are enabled Link

  • Fixed issue with EmbeddingSequenceLayer use of mask arrays Link

  • Lombok now uses provided scope everywhere, so it is not on the user classpath when using DL4J Link

  • Fixed an issue with WordVectorSerializer.readParagraphVectors(File) initialization of the label source Link

  • Spark training (gradient sharing) now properly handles empty partition edge case when encountered during training Link

  • Errors are propagated better/more consistently for Spark gradient sharing training Link

  • Fixed issue with 1D CNN layers with mask arrays and stride > 1 (masks not being correctly downsized) Link

  • DL4J Batch norm implementation was not correctly adding epsilon value during inference, only during training (CuDNN unaffected) Link

  • CuDNN subsampling layers with max pooling and ConvolutionMode.SAME may have taken padding value (0) as the maximum for border values when all non-padding values are less than 0 Link

  • Spark training with gradient sharing now passes listeners to workers correctly Link

  • Fixed rare (and non-terminal) concurrent modification issue with UI and FileStatsStorage Link

  • CuDNN convolution layer now supports dilation > 2 (previously: used DL4J conv layer implementation as a fallback) Link

  • Yolo2OutputLayer now implements computeScoreForExamples() Link

  • SequenceRecordReaderDataSetIterator now handles the "no labels" case correctly Link

  • Fixed issue where BarnesHutTSNE could hit a workspace validation exception Link

  • EMNIST iterator could produce incorrect data in some cases after a reset Link

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta to 1.0.0-beta2

  • GravesLSTM has been deprecated in favor of LSTM due to its lack of CuDNN support but otherwise similar accuracy in practice. Use the LSTM class instead.

  • deeplearning4j-modelexport-solr: now uses Lucene/Solr version 7.4.0 (was 7.3.0) Link

  • Mask arrays for CNN2d layers must be in broadcastable 4d format: [minibatch, depth or 1, height or 1, width or 1] - previously they were 2d with shape [minibatch,height] or [minibatch,width]. This prevents ambiguity in later cases (pooling layers), and allows for more complex masking scenarios (such as masking for different image sizes in the same minibatch). Link

  • Some older/deprecated Model and Layer methods have been removed. (validateInput(), initParams()). Some custom layers may need to be updated as a result Link

Deeplearning4J: 1.0.0-beta2 Known Issues

  • Windows users are unable to load the HDF5 files used in SvhnLabelProvider (used in the HouseNumberDetection example). Linux/Mac users are unaffected. A workaround for Windows users is to add the sonatype snapshot dependency org.bytedeco.javacpp-presets:hdf5-platform:jar:1.10.2-1.4.3-SNAPSHOT Link

Deeplearning4J: Keras Import

  • Keras model import now imports every Keras application

  • Supports GlobalPooling3D layer import

  • Supports RepeatVector layer import

  • Supports LocallyConnected1D and LocallyConnected2D layers

  • Keras Lambda layers can now be imported by registering custom SameDiff layers

  • All Keras optimizers are now supported

  • All advanced activation functions can now be imported.

  • Many minor bugs have been fixed, including proper weight setting for all configurations of BatchNormalization, improvements to Reshape and SeparableConvolution2D, and full support of Bidirectional layers.

ND4J

ND4J: New Features

  • ND4J: all indexing is now done with longs instead of ints to allow for arrays with dimensions and lengths greater than Integer.MAX_VALUE (approx. 2.1 billion)

  • Added the ability to write Numpy .npy format using Nd4j.writeAsNumpy(INDArray,File) and to convert an INDArray to an in-memory Numpy representation using Nd4j.convertToNumpy(INDArray) Link

  • ND4j-common ClassPathResource: added ClassPathResource.copyDirectory(File) Link

  • SameDiff: A significant number of new ops, and backprop implementations for existing ops

  • Added Nd4j.randomBernoulli/Binomial/Exponential convenience methods Link

  • Added way to disable/suppress ND4J initialization logging via org.nd4j.log.initialization system property Link

  • SameDiff class - most op/constructor methods now have complete/useful javadoc Link

  • Workspaces can now be disabled globally, ignoring workspace configuration. This is mainly used for debugging; to enable this, use Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED) or Nd4j.getWorkspaceManager().setDebugMode(DebugMode.SPILL_EVERYTHING) (see the sketch after this list). Link, Link

  • Added EnvironmentalAction API for environment variable processing Link

  • ND4J environment variables and system properties have been centralized in ND4jEnvironmentVars and ND4jSystemProperties classes Link and Link
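
A minimal sketch of globally disabling workspaces for debugging, as noted above; the DebugMode enum package is assumed here:

```java
import org.nd4j.linalg.api.memory.enums.DebugMode;
import org.nd4j.linalg.factory.Nd4j;

// Ignore all workspace configuration (debugging only - has a performance cost)
Nd4j.getWorkspaceManager().setDebugMode(DebugMode.DISABLED);
```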

ND4J: Bug Fixes and Optimizations

  • SameDiff: a significant number of bug fixes for execution and individual ops

  • Fixed an issue with INDArray.toDoubleArray() on true scalars (rank 0 arrays) Link

  • Fixed issue with DataSet.sample() not working for rank 3+ features Link

  • IActivation implementations now validate/enforce same shape for activations and gradients Link