Highlights - 1.0.0-beta3 Release

    ND4J/Deeplearning4j: Added support for CUDA 10.0. Dropped support for CUDA 8.0. (1.0.0-beta3 release has CUDA 9.0, 9.2 and 10.0 support)
    SameDiff now supports training and evaluation from DataSetIterator and MultiDataSetIterator. Evaluation classes have been moved to ND4J.
    DL4J Spark training (gradient sharing) is now fully fault tolerant, and has improvements for threshold adaption (potentially more robust convergence). Ports can now be easily configured independently on master/workers.


Deeplearning4J: New Features

    Added OutputAdapter interface and MultiLayerNetwork/ComputationGraph.output method overloads using OutputAdapter (avoids allocating off-heap memory that needs to be cleaned up by GC) Link, Link, Link
    Added ComputationGraph/MultiLayerNetwork rnnTimeStep overload with user-specified workspace. Link
    Added Cnn3DLossLayer Link
    ParallelInference: Instances can now update the model in real-time (without re-init) Link
    ParallelInference: Added ParallelInference INPLACE mode Link
    Added validation for incompatible loss/activation function combinations (such as softmax+nOut=1, or sigmoid+mcxent). New validation can be disabled using outputValidation(false) Link
    Spark training: Added full fault tolerance (robust failure recovery) for gradient sharing implementation Link Link
    Spark training now supports configuring ports more flexibly (and differently for different workers) using PortSupplier Link Link
    Spark training: overhauled gradient sharing threshold adaption algorithms. Threshold settings can now be customized, and the defaults are more robust to the initial threshold configuration, which improves convergence speed in some cases. Link
    Spark training: implemented chunked messaging to reduce memory requirements (and insufficient buffer length issues) for large messages Link
    Spark training: Added MeshBuildMode configuration for improved scalability for large clusters Link
    Spark network data pipelines: added FileBatch, FileBatchRecordReader etc for "small files" (images etc) distributed training use cases Link
    Added FailureTestingListener for fault tolerance/debugging purposes Link
    Upgraded Apache Lucene/Solr to version 7.5.0 (from 7.4.0) Link
    Added system properties (org.deeplearning4j.tempdir and org.nd4j.tempdir) to allow overriding of the temporary directories ND4J and DL4J use Link Link
    Made MultiLayerNetwork/ComputationGraph.clearLayerStates methods public (was protected) Link
    AbstractLayer.layerConf() method is now public Link
    ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper Link
    Improved validation and error messages for invalid inputs/labels in Yolo2OutputLayer Link
    Spark training: added SharedTrainingMaster.Builder.workerTogglePeriodicGC and .workerPeriodicGCFrequency to easily configure the ND4J garbage collection configuration on workers. The default periodic GC frequency on workers is now 5 seconds Link
    Spark training: added threshold encoding debug mode (logs current threshold and encoding statistics on each worker during training). Enable using SharedTrainingConfiguration.builder.encodingDebugMode(true). Note this operation has computational overhead. Link
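
The temporary directory properties listed above (org.deeplearning4j.tempdir and org.nd4j.tempdir) can be set like any other JVM system property. The following is a minimal sketch using only the JDK; the class name, helper method and directory name are hypothetical, and it assumes the properties are set before any ND4J/DL4J class reads them.

```java
import java.io.File;

final class TempDirConfig {
    // Property names as given in the release notes.
    static final String DL4J_TEMP_DIR = "org.deeplearning4j.tempdir";
    static final String ND4J_TEMP_DIR = "org.nd4j.tempdir";

    /** Points both properties at the same directory, creating it if needed. */
    static File useTempDir(String path) {
        File dir = new File(path);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IllegalStateException("Could not create temp dir: " + dir);
        }
        System.setProperty(DL4J_TEMP_DIR, dir.getAbsolutePath());
        System.setProperty(ND4J_TEMP_DIR, dir.getAbsolutePath());
        return dir;
    }

    public static void main(String[] args) {
        // Hypothetical location; assumed to run before any ND4J/DL4J class is used.
        String base = System.getProperty("java.io.tmpdir");
        File dir = useTempDir(base + File.separator + "dl4j-scratch");
        System.out.println("Temp dir: " + dir.getAbsolutePath());
    }
}
```

The same effect can be achieved without code by passing the properties on the command line, for example -Dorg.nd4j.tempdir=/some/path.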

Deeplearning4J: Bug Fixes and Optimizations

    Fixed an issue where L1/L2 and updaters (Adam, Nesterov, etc) were applied before dividing gradients by minibatch to obtain average gradient. To maintain old behaviour, use NeuralNetConfiguration.Builder.legacyBatchScaledL2(true) Link.
      Note that learning rates may need to be decreased for some updaters (such as Adam) to account for this change vs. earlier versions. Some other updaters (such as SGD, NoOp, etc) should be unaffected.
      Note that deserialized (loaded) configurations/networks saved in 1.0.0-beta2 or earlier will default to old behaviour for backward compatibility. All new networks (created in 1.0.0-beta3) will default to the new behaviour.
    Fixed an issue where EarlyStoppingScoreCalculator would not correctly handle cases where the score should be maximized rather than minimized Link
    Fixed order (BGR vs. RGB) for VGG16ImagePreProcessor channel offset values Link
    Fixed bug with variational autoencoders using weight noise Link
    Fixed issue with BaseDataSetIterator not respecting the 'maximum examples' configuration Link
    Optimization: A workspace is now used for ComputationGraph/MultiLayerNetwork evaluation methods (avoids allocating off-heap memory during evaluation that must be cleaned up by garbage collector) Link
    Fixed an issue where shuffling combined with a subset for MnistDataSetIterator would not maintain the same subset between resets Link
    Fixed issue with StackVertex.getOutputType Link
    Fixed issue with CNN to/from RNN preprocessors handling of mask arrays Link
    Fixed issue with VGG16 non-pretrained configuration in model zoo Link
    Fixed issue with TransferLearning nOutReplace where multiple layers in a row are modified Link
    Fixed issue with CuDNN workspaces where backpropagation is performed outside of a standard fit call Link
    Fixed an issue with dropout masks being cleared prematurely on output layers in ComputationGraph Link
    RecordReaderMultiDataSetIterator now supports 5D arrays (for 3D CNNs) Link
    Fixed bug in multi input/output ComputationGraphs with TBPTT combined with both masking and different number of input/output arrays Link
    Improved input validation/exceptions for batch normalization layer Link
    Fixed bug with TransferLearning GraphBuilder nOutReplace when combined with subsampling layers Link
    SimpleRnnParamInitializer now properly respects bias initialization configuration Link
    Fixed SqueezeNet zoo model non-pretrained configuration Link
    Fixed Xception zoo model non-pretrained configuration Link
    Fixed an issue with some evaluation signatures for multi-output ComputationGraphs Link
    Improved MultiLayerNetwork/ComputationGraph summary method formatting for large nets Link
    Fixed an issue where gradient normalization could result in NaNs if gradient is exactly 0.0 for all parameters in a layer Link
    Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate could throw an exception for SGD and NoOp updaters Link
    Fixed an issue with StackVertex plus masking in some rare cases Link
    Fixed an issue with JSON deserialization of frozen layers in pre-1.0.0-alpha format Link
    Fixed an issue where GraphBuilder.removeVertex can fail under some limited circumstances Link
    Fixed a bug in CacheableExtractableDataSetFetcher Link
    DL4J Spark training: Fixed issues with thread/device affinity for multi-GPU training + evaluation Link
    DL4J Spark training: Made all Aeron threads daemon threads to prevent Aeron from stopping JVM shutdown when all other threads have completed Link
    Added cudnnAllowFallback configuration for BatchNormalization layer (fallback to built-in implementation if CuDNN fails unexpectedly) Link
    Fixed some rare concurrency issues with multi-worker (multi-GPU) nodes for Spark training Link Link
    Fixed an issue with BatchNormalization layers that prevented the mean/variance estimates from being synced properly on each worker for GradientSharing training, causing convergence issues Link
    Added a check to detect ZipSlip CVE attempts in ArchiveUtils Link
    DL4J Spark training and evaluation: methods now use Hadoop Configuration from Spark context to ensure runtime-set configuration is available in Spark functions reading directly from remote storage (HDFS etc) Link
    MultiLayerNetwork and ComputationGraph now properly support more than Integer.MAX_VALUE parameters Link Link
    Added data validation for Nd4j.readTxt - now throws exception on invalid input instead of returning incorrect values Link
    Fixed an issue with KNN implementation where a deadlock could occur if an invalid distance function (one returning "distances" less than 0) was utilized Link
    Added synchronization to loading of Keras import models to avoid thread safety issues in the underlying HDFS library used for loading Link
    Fixed rare issue for Async(Multi)DataSetIterator with large prefetch values Link
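
The first fix in this list (L1/L2 applied before dividing gradients by the minibatch size) is easier to see with numbers. The sketch below is a simplified illustration for a single weight, assuming an L2 gradient term of l2 * weight and no updater; it is not DL4J's actual code, and all names are hypothetical.

```java
final class BatchScaledL2 {
    /** Legacy order: add the L2 term to the summed gradient, then divide by minibatch size. */
    static double legacyGradient(double summedGrad, double weight, double l2, int minibatch) {
        return (summedGrad + l2 * weight) / minibatch;
    }

    /** New order: average the gradient over the minibatch first, then add the L2 term. */
    static double averagedGradient(double summedGrad, double weight, double l2, int minibatch) {
        return summedGrad / minibatch + l2 * weight;
    }

    public static void main(String[] args) {
        double summedGrad = 8.0; // gradient summed over all examples in the minibatch
        double weight = 2.0;
        double l2 = 0.5;
        int minibatch = 32;
        System.out.println(legacyGradient(summedGrad, weight, l2, minibatch));   // prints 0.28125
        System.out.println(averagedGradient(summedGrad, weight, l2, minibatch)); // prints 1.25
    }
}
```

In the legacy order the L2 contribution is effectively scaled by 1/minibatch, so the same l2 coefficient regularizes more strongly under the new behaviour.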

Deeplearning4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

    IEvaluation classes in DL4J have been deprecated and moved to ND4J so they are available for SameDiff training. Functionality and APIs are unchanged
    MultiLayerConfiguration/ComputationGraphConfiguration pretrain(boolean) and backprop(boolean) have been deprecated and are no longer used. Use fit and pretrain/pretrainLayer methods instead. Link
    ParallelWrapper module now no longer has a Scala version suffix for artifact id; new artifact id is deeplearning4j-parallel-wrapper which should be used instead Link
    deeplearning4j-nlp-korean module now has a Scala version suffix due to Scala dependencies; the new artifact IDs are deeplearning4j-nlp-korean_2.10 and deeplearning4j-nlp-korean_2.11 Link

Deeplearning4J: Known issues: 1.0.0-beta3

    Running multiple Spark training jobs simultaneously on a single physical node (i.e., multiple JVMs from one or more Spark jobs) may cause problems with network communication. As a workaround, manually set a unique stream ID in the VoidConfiguration, using a unique (or random) integer value for each job Link
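
One way to obtain a per-job stream ID for the workaround above is to draw a random positive integer at job start. This is a minimal sketch using only the JDK; the class and method names are hypothetical, the valid ID range is an assumption, and the call that passes the value to VoidConfiguration is deliberately not shown.

```java
import java.util.concurrent.ThreadLocalRandom;

final class StreamIds {
    /** Returns a random positive int (1 inclusive to Integer.MAX_VALUE exclusive). */
    static int randomStreamId() {
        return ThreadLocalRandom.current().nextInt(1, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int streamId = randomStreamId();
        // The value would then be set as the stream ID on the job's VoidConfiguration.
        System.out.println("Stream ID for this job: " + streamId);
    }
}
```

A random value makes collisions between jobs unlikely but does not rule them out; assigning IDs explicitly per job is the stricter option.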

Deeplearning4J: Keras Import

    Fixed import issue due to Keras JSON format changes for Keras 2.2.3+ Link
    Added Keras import for timeseries preprocessing Link
    Elephas Link
    Fixed issue with importing models with reshaping after an embedding layer Link
    Added support for Keras masking layers Link
    Fixed JSON deserialization issue with some layers/preprocessors, such as Permute Link
    Fixed issue with Keras import of Nadam configuration Link


ND4J: New Features

    Added SameDiff training and evaluation: SameDiff instances can now be trained directly using DataSetIterator and MultiDataSetIterator, and evaluated using IEvaluation instances (that have been moved from DL4J to ND4J) Link
    Added GraphServer implementation: c++ inference server for SameDiff (and Tensorflow, via TF import) with Java API Link
    SameDiff instances can now be loaded from serialized FlatBuffers format (SameDiff.asFlatFile plus fromFlatFile) Link Link
    Added MKL-DNN support for some operations (Conv2d, etc) Link
    Upgraded ND4J (and DataVec) to Arrow 0.11.0 Link, which also fixes Link
    Added Nd4j.where op method (same semantics as numpy.where) Link
    Added Nd4j.stack op method (combine arrays + increase array rank by 1) Link
    Libnd4j new ops:
      Matrix band part Link
      Scatter ND, ND-add, ND-sub and ND-update ops Link
      Sparse softmax cross entropy loss with logits Link
      Histogram fixed width op Link
      broadcast_to op Link
      deconv3d op added Link
      Unsorted segment ops added Link
      Segment_X backprop ops added Link
      batchnorm_new op added that supports multiple axes for mean/variance Link
      GRU cell backprop added Link
    Nd4j Preconditions class now has methods for formatting INDArray arguments Link, Link
    SameDiff loss functions: cleanup plus forward pass implementation Link
    CudaGridExecutioner now warns that exception stack traces may be delayed to avoid confusion in debugging exceptions occurring during asynchronous execution of ops Link
    JavaCPP and JavaCPP-presets have been upgraded to version 1.4.3 Link
    Improved Javadoc on SDVariable class Link
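
The semantics described above for the new where and stack ops can be illustrated on plain Java arrays. The sketch below is not the ND4J API: it shows only the three-argument, element-wise form of numpy.where and stacking rank 1 arrays along a new leading axis, without broadcasting. All names are hypothetical.

```java
import java.util.Arrays;

final class WhereStackSketch {
    /** Element-wise select: out[i] = condition[i] ? x[i] : y[i]. */
    static double[] where(boolean[] condition, double[] x, double[] y) {
        if (condition.length != x.length || x.length != y.length) {
            throw new IllegalArgumentException("All inputs must have the same length");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = condition[i] ? x[i] : y[i];
        }
        return out;
    }

    /** Combines N rank 1 arrays of length L into one rank 2 array of shape [N, L]. */
    static double[][] stack(double[]... arrays) {
        if (arrays.length == 0) {
            throw new IllegalArgumentException("At least one array is required");
        }
        int length = arrays[0].length;
        double[][] out = new double[arrays.length][];
        for (int i = 0; i < arrays.length; i++) {
            if (arrays[i].length != length) {
                throw new IllegalArgumentException("All arrays must have the same length");
            }
            out[i] = arrays[i].clone(); // copy so the result does not alias the inputs
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {10, 20, 30};
        boolean[] cond = {true, false, true};
        System.out.println(Arrays.toString(where(cond, a, b))); // prints [1.0, 20.0, 3.0]
        System.out.println(Arrays.deepToString(stack(a, b)));   // prints [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    }
}
```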

ND4J: Bug Fixes and Optimizations

    Fixes for android: Remove use of RawIndexer Link
    Libnd4j custom ops: conv op weight layouts are now not dependent on the input format (NCHW/NHWC) - now always [kH, kW, inChannels, outChannels] for 2d CNNs, [kH, kW, kD, inChannels, outChannels] for 3d CNNs. Link, Link
    Libnd4j native op fixes:
      Dot operation backprop Link, determinant Link
      Backprop op fix for the broadcast case for some pairwise transform custom op implementations Link
      Fix for reverse custom op with rank 1 inputs Link
      ATan2 op is now broadcastable Link
      Boolean custom op broadcast fixes/additions Link
      Scatter op edge case fixes Link
      ArgMin shape function fix Link, negative axis fix Link
      Unique op fix Link
      Pad op fix Link
      Fixed where op shape function Link
      SVD rank 1 edge case fix Link
      Range op Link
      Split and space_to_batch fixes Link
      Broadcast dynamic shape Link
      embedding_lookup op now supports multiple input arrays Link
      Matrix determinant op edge case (rank 0 result) shape fix Link
    SameDiff TensorFlow import: fixes for multiple operations Link, Link, Link, Link
    SameDiff: Improved error handling for multiple outputs case Link
    Fixed issue where INDArray.permute would not correctly throw an exception for invalid length case Link
    Fixed issues with INDArray.get/put with SpecifiedIndex Link, Link
    Minor change to DataSet.merge - signature now accepts any DataSet subtypes Link
    Fixed issue where INDArray.transposei operation was not in-place Link
    Fixed issues with INDArray.mmul with MMulTranspose Link
    Added additional order validation for ND4J creation methods (create, rand, etc) Link
    Fix for ND4J binary deserialization (BinarySerde) when deserializing from heap byte buffers Link
    Fixed issue with Nd4j-common ClassPathResource path resolution in some IDEs Link
    Fixed issue where INDArray.get(interval) on rank 1 array would return rank 2 array Link
    Fixed a validation issue with Nd4j.gemm/mmuli on views Link Link
    INDArray.assign(INDArray) no longer allows assigning different shape arrays (other than scalar/vector cases) Link
    NDArrayStrings (and INDArray.toString()) now always uses US locale when formatting numbers Link
    Fixed an issue with GaussianDistribution specific to V100 GPUs Link
    Fixed an issue with bitmap compression/encoding specific to V100 GPUs Link
    Transforms.softmax now throws an error on unsupported shapes instead of simply not applying operation Link
    VersionCheck functionality: handle case where SimpleFileVisitor is not available on earlier versions of Android Link
    SameDiff convolution layer configuration (Conv2dConfig/Conv3dConfig/Pooling3dConfig etc) have had parameter names aligned Link
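
The US locale change noted above matters because number formatting in Java depends on the locale passed to the formatter. The sketch below uses only the JDK to show the difference; the class name and the four-decimal format string are hypothetical and are not taken from ND4J.

```java
import java.util.Locale;

final class LocaleFormatting {
    /** Formats with a fixed US locale, so the decimal separator is always '.'. */
    static String formatUs(double value) {
        return String.format(Locale.US, "%.4f", value);
    }

    /** Formats with an explicit locale, for comparison. */
    static String formatWith(Locale locale, double value) {
        return String.format(locale, "%.4f", value);
    }

    public static void main(String[] args) {
        System.out.println(formatUs(1234.5));                   // prints 1234.5000
        System.out.println(formatWith(Locale.GERMANY, 1234.5)); // prints 1234,5000
    }
}
```

Fixing the locale keeps printed arrays identical across machines, regardless of the JVM's default locale.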

ND4J: API Changes (Transition Guide): 1.0.0-beta2 to 1.0.0-beta3

    CUDA 8.0 support has been removed. CUDA 9.0, 9.2 and 10.0 support is available in 1.0.0-beta3
    nd4j-base64 module contents have been deprecated; use the equivalent classes in nd4j-api from now on Link
    Some classes in nd4j-jackson module have been deprecated; use the equivalent classes in nd4j-api from now on Link

ND4J: Known issues: 1.0.0-beta3

    Android users may need to manually exclude the (now deprecated) module nd4j-base64. This is due to org.nd4j.serde.base64.Nd4jBase64 class being present in both nd4j-api and nd4j-base64 modules. Both versions have identical content. Use exclude group: 'org.nd4j', module: 'nd4j-base64' to exclude.


DataVec: New Features

    Added NativeImageLoader method overloads for org.opencv.core.Mat and String as filename Link

DataVec: Optimizations and Bug Fixes

    Fix for JDBCRecordReader handling of null values Link
    Improved errors/validation for ObjectDetectionRecordReader for invalid input (where image object centers are outside of image bounds) Link
    Fixed issue where FileSplit used methods that are unavailable on earlier versions of Android Link
    Added SerializableHadoopConfiguration and BroadcastHadoopConfigHolder for cases where a Hadoop configuration is required in Spark functions Link Link
    Fixed issue with JDBCRecordReader's handling of real-valued column result types Link
    Added validation and useful exception for CSVRecordReader/LineRecordReader being used without initialization Link


Arbiter: Fixes

    Fixed some issues with dropout layers Link


    Added conversion between org.nd4j.linalg.primitives.Pair/Triple and Scala Tuple Link