Highlights - 1.0.0-beta5 Release

    Added model server - remote inference of SameDiff and DL4J models using JSON or (optionally) binary serialization
    Added Scala 2.12 support, dropped Scala 2.10 support. Modules with Scala dependencies are now released with Scala 2.11 and 2.12 versions
    Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: the Spark version suffix has also been dropped. When upgrading, change versions as follows: 1.0.0-beta4_spark2 -> 1.0.0-beta5
    Added FastText support to deeplearning4j-nlp
    CUDA support for all ND4J/SameDiff Operations
      In 1.0.0-beta4, some operations were CPU only. Now, all operations have full CUDA support
    Added support for new data types in ND4J (and DL4J/SameDiff): BFLOAT16, UINT16, UINT32, UINT64
    ND4J: Implicit broadcasting support added to INDArray (already present in SameDiff - for example shape [3,1]+[3,2]=[3,2])
    CUDA 9.2, 10.0 and 10.1-Update2 still supported
      NOTE: For CUDA 10.1, CUDA 10.1 Update 2 is recommended. CUDA 10.1 and 10.1 Update 1 will still run, but rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems
    Dependency upgrades: Jackson (2.5.1 to 2.9.9), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava (20.0 to 28.0-jre, and shaded to avoid dependency clashes)
    CUDA: host (RAM) buffers are now allocated only when required; previously, a host buffer was always allocated in addition to the device (GPU) buffer
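
The implicit broadcasting rule mentioned in the highlights (shape [3,1]+[3,2]=[3,2]) can be illustrated with a minimal plain-Java sketch. This is not the ND4J API: the class BroadcastSketch and its helpers are hypothetical names used only to show how a dimension of size 1 is stretched to match the other operand, assuming non-empty rectangular rank-2 inputs.

```java
import java.util.Arrays;

// Plain-Java illustration of the broadcasting rule; NOT the ND4J implementation.
final class BroadcastSketch {

    /** Adds two rank-2 arrays, broadcasting any dimension of size 1 (hypothetical helper). */
    static double[][] broadcastAdd(double[][] a, double[][] b) {
        int rows = broadcastDim(a.length, b.length);
        int cols = broadcastDim(a[0].length, b[0].length);
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // A size-1 dimension is "stretched": always read index 0 along it.
                double x = a[a.length == 1 ? 0 : i][a[0].length == 1 ? 0 : j];
                double y = b[b.length == 1 ? 0 : i][b[0].length == 1 ? 0 : j];
                out[i][j] = x + y;
            }
        }
        return out;
    }

    /** Two sizes are compatible if they are equal or if one of them is 1. */
    static int broadcastDim(int m, int n) {
        if (m == n || n == 1) return m;
        if (m == 1) return n;
        throw new IllegalArgumentException("Incompatible sizes: " + m + " vs " + n);
    }

    public static void main(String[] args) {
        double[][] column = {{1}, {2}, {3}};                 // shape [3,1]
        double[][] matrix = {{10, 20}, {30, 40}, {50, 60}};  // shape [3,2]
        double[][] sum = broadcastAdd(column, matrix);       // shape [3,2]
        System.out.println(Arrays.deepToString(sum));
        // prints [[11.0, 21.0], [32.0, 42.0], [53.0, 63.0]]
    }
}
```

Each row of the [3,2] operand has the single value from the matching row of the [3,1] operand added to both of its columns.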


Deeplearning4j: Features and Enhancements

    Added FastText - inference and training, including OOV (out of vocabulary) support (Link)
    Scala 2.12 support added, Scala 2.10 support dropped (Link)
    Added model server (DL4J and SameDiff models, JSON and binary communication) - JsonModelServer, JsonRemoteInference (Link, Link)
    Added saved model format validation utilities - DL4JModelValidator, DL4JKerasModelValidator (Link)
    Added LabelLastTimeStepPreProcessor (Link)
    BertIterator: added option to prepend token to the output (such as [cls] expected by some models) (Link)
    Added trace level logging to MultiLayerNetwork and ComputationGraph to assist with debugging certain issues (Link)
    Upsampling3D: Added NDHWC support (Link)
    MergeVertex now supports broadcasting (Link)
    LSTM and Dropout will now fall back on built-in implementations if an exception is encountered from cuDNN (same as Subsampling/ConvolutionLayer) (Link)
    Improved JavaDoc and cleaned up API for WordVectorSerializer (Link, Link)

Deeplearning4j: Bug Fixes and Optimizations

    Updated deeplearning4j-ui theme (Link)
    Fixed an issue with MergeVertex and CNN3D activations (Link)
    Fixed typo in Yolo2OutputLayer builder/configuration method name (Link)
    Improved ComputationGraph builder InputType validation (Link)
    Removed dl4j-spark-ml module until it can be properly maintained (Link)
    Fixed an issue with BertWordPieceTokenizerFactory and bad character encoding (Link)
    Fixed an issue with LearnedSelfAttentionLayer and variable minibatch size (Link, Link)
    Fixed issue with SharedTrainingMaster controller address when set from environment variable (Link)
    Fixed issue with SameDiffOutputLayer initialization under some circumstances (Link)
    https is now used by default for data and zoo model downloads (Link, Link)
    Fixed an issue where UI WebJars dependencies would check for updates on every single build (Link, Link)
    Fixed issue where Upsampling layer memory report could produce an OOM exception (Link)
    Improved UX/validation for RecordReaderDataSetIterator (Link)
    Fixed an issue where EmbeddingSequenceLayer would not check mask array datatype (Link)
    Improved validation when initializing networks with a non rank-2 (shape [1, numParams]) array (Link)
    Fixed a DataType issue for BertIterator (Link)
    Fixed Word2Vec model backward compatibility (beta3 and earlier models now loadable again) (Link)
    Fixed issue where some Keras import models could fail with Could not read abnormally long HDF5 attribute (Link)
    Added validation for RnnOutputLayer - feature/label array lengths (Link)
    Fixed an issue where SameDiffOutputLayer would not support variable minibatch size (Link)
    Fixed DL4J SameDiff layer mask support (Link)
    DL4J UI: Fixed an issue where tab switching did not work when visualizing saved/stored data (Link, Link)
    DL4J UI: Fixed a rare UI threading issue (Link)
    Fixed a Keras import issue with JSON format change (Link)
    Fixed a Keras import issue where updater learning rate schedule could be imported incorrectly (Link)
    Fixed an issue with CnnSentenceDataSetIterator when using UnknownWordHandling.UseUnknownVector (Link, Link)
    Fixes and optimizations to DL4J SameDiff layers (Link)
    MultiLayerNetwork/ComputationGraph will now log the original exception if a second exception occurs during workspace closing, instead of swallowing it (inference/fit operation try/finally blocks) (Link)
    Upgraded dependencies: Jackson (2.5.1 to 2.9.9), Commons Compress (1.16.1 to 1.18), Play Framework (2.4.8 to 2.7.3), Guava (20.0 to 28.0-jre, shaded to avoid dependency clashes) (Link)
    Logging framework can now be configured for DL4J UI (due to Play framework dependency upgrade) (Link)
    Reduced amount of garbage produced by MnistDataFetcher (impacts MNIST and EMNIST DataSetIterators) (Link)
    Activation function backpropagation has been optimized for many activation functions (Link, Link)
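
The workspace-closing item above (logging the original exception when cleanup in a try/finally block also fails) describes a general Java pitfall: an exception thrown from a finally block replaces the one already in flight. The following is a minimal sketch of that general pattern, not DL4J's actual code. The class name, the Runnable stand-ins for the fit/inference operation and the workspace close, and the log callback are all assumptions made for illustration.

```java
import java.util.function.Consumer;

// Generic try/finally pattern; hypothetical names, not taken from DL4J.
final class PreserveOriginalException {

    /**
     * Runs an operation, then a cleanup step. If both fail, the original
     * failure is reported through the log callback before the cleanup
     * failure propagates, so the root cause is not silently lost.
     */
    static void runThenClose(Runnable operation, Runnable cleanup, Consumer<String> log) {
        Throwable original = null;
        try {
            operation.run();
        } catch (RuntimeException | Error t) {
            original = t;   // remember the root cause
            throw t;
        } finally {
            try {
                cleanup.run();
            } catch (RuntimeException cleanupFailure) {
                if (original != null) {
                    // Without this line, the caller would only ever see cleanupFailure.
                    log.accept("Original exception (cleanup also failed): " + original);
                }
                throw cleanupFailure;
            }
        }
    }

    public static void main(String[] args) {
        try {
            runThenClose(
                    () -> { throw new IllegalStateException("fit failed"); },
                    () -> { throw new IllegalArgumentException("close failed"); },
                    System.err::println);
        } catch (RuntimeException e) {
            System.err.println("Caller sees: " + e);
        }
    }
}
```

In this sketch the caller still receives the cleanup failure, but the log now contains the exception that triggered the problem in the first place.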

Deeplearning4j: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

    DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J, use org.nd4j.linalg.dataset.Async(Multi)DataSetIterator instead
    Saved models with custom layers from 1.0.0-alpha and before can no longer be loaded. Workaround: load in 1.0.0-beta4, and re-save the model (Link). Models without custom layers can still be loaded back to 0.5.0
    Apache Spark 1.x support dropped (now only Spark 2.x is supported). Note: the Spark version suffix has also been dropped. When upgrading, change versions as follows: 1.0.0-beta4_spark2 -> 1.0.0-beta5
    Scala 2.10 dropped, Scala 2.12 added (for modules with Scala dependencies)

Deeplearning4j: 1.0.0-beta5 Known Issues

    dl4j-spark_2.11 and _2.12 dependencies incorrectly pull in datavec-spark_2.11/2.12 version 1.0.0-SNAPSHOT. Workaround: control version using dependency management as per here or here
    Some layers (such as LSTM) may run slower on 1.0.0-beta5 than 1.0.0-beta4 on CUDA when not using cuDNN, due to added synchronization. This synchronization will be removed in the next release after 1.0.0-beta5
    CUDA 10.1: Rare internal cuBLAS issues may be encountered in heavily multi-threaded code on some systems when running CUDA 10.1 Update 1 (and possibly 10.1). CUDA 10.1 Update 2 is recommended.

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

    Added new data types: BFLOAT16, UINT16, UINT32, UINT64 (Link)
    Added CUDA support for all operations that previously lacked CUDA implementations (Link, Link, Link, Link, Link)
    Added model server (DL4J and SameDiff models, JSON and binary communication) - JsonModelServer, JsonRemoteInference (Link, Link)
    Added support for empty arrays with zeros in shape, for compatibility with TensorFlow import (Link)
    CUDA: host (RAM) buffers are now allocated only when required; previously, a host buffer was always allocated in addition to the device (GPU) buffer
    Improved SameDiff training API - added "in line" test set evaluation, returning History object with loss curve, etc (Link)
    Added saved model format validation utilities - Nd4jValidator, Nd4jCommonValidator (Link)
    Added SameDiff ScoreListener (equivalent to DL4J ScoreIterationListener/PerformanceListener) (Link, Link)
    Added SameDiff.convertDataTypes method, for variable dtype conversion (Link)
    Added crop and resize op (Link)
    DL4J AsyncDataSetIterator and AsyncMultiDataSetIterator moved to ND4J (Link)
    Added basic/MVP SameDiff UI listener (Link)
    Added SameDiff CheckpointListener (Link, Link)
    Added SameDiff name scopes (Link)
    SameDiff: Updater state and training configuration is now written to FlatBuffers format (Link)
    Added C++ benchmark suite callable from Java - call using Nd4j.getExecutioner().runLightBenchmarkSuit() and Nd4j.getExecutioner().runFullBenchmarkSuit() (Link)
    Added SameDiff.save/load methods with InputStream/OutputStream arguments (Link, Link)
    Added axis configuration for evaluation instances (Evaluation, RegressionEvaluation, ROC, etc - getAxis and setAxis methods) to allow different data formats (NCHW vs. NHWC for CNNs, for example) (Link)
    SameDiff: Added support to convert placeholders to constants, via SDVariable.convertToConstant() method (Link)
    SameDiff: Added GradCheckUtil.checkActivationGradients method to check activation gradients for SameDiff instance (not just parameter gradients as in existing gradient check methods) (Link)
    Added CheckNumerics op (Link)
    Added FakeQuantWithMinMaxArgs and FakeQuantWithMinMaxVars ops (Link)
    Added INDArray reduction methods with "keep dimensions" option - for example, INDArray.mean(boolean, int... dimension) (Link)
    Added Nd4j SystemInfo class - SystemInfo.getSystemInfo, .writeSystemInfo(File) to aid with debugging issues (Link, Link)
    Added INDArray.toString(NDArrayStrings options), toStringFull() and toString overloads for easier control of array printing (Link)
    Added HashCode op, INDArray.hashCode() (Link)
    SameDiff: added whileLoop, ifCond methods for loops/conditional ops (Link)
    Cleaned up some infrequently used Nd4j methods (Link, Link, Link, Link)
    Added bitwise integer operations: left/right bit shift, left/right cyclical bit shift, bitwise Hamming distance (Link, Link, Link, Link, Link)
    deeplearning4j-nlp: renamed AggregatingSentencePreProcessor to sentencePreProcessor method (Link)
    Upgraded (and shaded) Protobuf version - 3.5.1 to 3.8.0 (Link)
    Switched to C-style error handling for libnd4j native operations (Link)
    Renamed FlatBuffers enum org.nd4j.graph.DataType to org.nd4j.graph.DType to avoid users importing incorrect type when using Nd4j methods (Link, Link)
    Added SameDiff.bitwise namespace for bitwise ops (Link, Link)
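
The "keep dimensions" option for reductions listed above is easiest to see in terms of output shapes. The sketch below uses plain Java arrays rather than the ND4J API, and the class and method names are hypothetical. It assumes the usual meaning of the option: a reduced dimension is either dropped from the result or kept with size 1.

```java
import java.util.Arrays;

// Plain-Java illustration of "keep dimensions" semantics; NOT the ND4J implementation.
final class KeepDimsSketch {

    /** Mean over dimension 1 of a [rows, cols] array; the reduced dimension is dropped, giving shape [rows]. */
    static double[] meanDim1(double[][] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double sum = 0;
            for (double v : x[i]) sum += v;
            out[i] = sum / x[i].length;
        }
        return out;
    }

    /** Same reduction, but the reduced dimension is kept with size 1, giving shape [rows, 1]. */
    static double[][] meanDim1KeepDims(double[][] x) {
        double[] means = meanDim1(x);
        double[][] out = new double[x.length][1];
        for (int i = 0; i < x.length; i++) out[i][0] = means[i];
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2, 3}, {4, 5, 6}};                         // shape [2,3]
        System.out.println(Arrays.toString(meanDim1(x)));              // prints [2.0, 5.0]
        System.out.println(Arrays.deepToString(meanDim1KeepDims(x)));  // prints [[2.0], [5.0]]
    }
}
```

Keeping the reduced dimension leaves the result with the same rank as the input, so a [2,1] result can be combined directly with the original [2,3] array under the broadcasting rules, for example to subtract each row's mean.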

ND4J/SameDiff: Bug Fixes and Optimizations

    Updated to JavaCPP/JavaCV 1.5.1-1 (Link)
    SameDiff: Placeholders now need to be provided only if they are required to calculate the requested variables (Link)
    SameDiff: Fixed an issue with duplicate variable name validation (Link)
    SameDiff: Fixed an issue with SDVariable.getArr for scalars (Link)
    Added delayed mode to DeviceLocalNDArray (don't replicate to device until needed) (Link)
    ND4J: Fixed an issue with writing 0d (scalar) NDArrays in numpy .npy format (Link)
    Fixed an issue with Pad operation for some constant cases (Link)
    Fixed some issues with strided_slice operation (Link, Link, Link)
    SameDiff: Fixed issue with DataType inference for some ops using ND4J default datatype (Link)
    INDArray.castTo(DataType) is now a no-op when array is already the correct type (Link)
    SameDiff: Fixed an issue with training mixed precision networks (Link)
    Fixed an issue where Evaluation class was incorrectly reporting macro-averaged precision for binary case (Link)
    Removed trainableParams config/field from SameDiff TrainingConfig (no longer required) (Link)
    Improvements and cleanup to ND4J Javadoc (Link, Link, Link, Link)
    Fixed an issue with Cholesky Lapack op on CUDA (Link, Link)
    Fixed an issue where [1,N] and [N,1] arrays were not considered a matrix (rank 2 array) according to INDArray.isMatrix() (Link)
    Fixed RegressionEvaluation for 4D arrays (CNNs / segmentation) (Link, Link)
    Fixed issue with INDArray.median(int... dimension) (Link)
    Fixed NPE that could occur when executing gather operation backprop (Link)
    Fixed issue with LogSumExp operation Java/C++ mapping (Link)
    Added header validation when reading Numpy .npy files, to ensure file is valid (Link)
    Fixed a possible issue with reading Numpy .npy files on CUDA (Link)
    Fixed an issue when reading Numpy .npy boolean files (Link)
    Various fixes for TensorFlow import (Link)
    Fixed an issue with a small number of Nd4j.create methods not creating arrays with the data type corresponding to the Java primitive type (Link)
    Improved shape validation for some Nd4j.create methods (Link)
    Cleaned up unmaintained Nd4j.createSparse methods (Link)
    Fixed a CUDA issue for CUDA GPUs with CC 3.0 (Link)
    Fixed some possible integer overflows in C++ code (Link)
    Removed deprecated methods: Nd4j.trueScalar and Nd4j.trueVector (Link, Link)
    Fixed an issue where some JVMs could warn about "Illegal reflective access" due to a (now removed) SameDiff dependency (Link)
    SDVariable no longer extends DifferentialFunction (Link)
    Moved numerous operation calculateOutputShape instances from Java to C++ (Link)
    Fixed an issue where maxpool2d_bp could throw an exception when NaN values are present (Link)
    Fixed an issue with concatenation of empty shapes (with zeros) (Link)
    Removed INDArray.javaTensorAlongDimension (Link)
    LayerNorm operation now properly supports axis arg, NCHW format data (Link)
    libnd4j: cuBLAS hgemm (FP16 gemm) will only be called for devices with compute capability >= 5.3 due to cuBLAS limitations (Link)
    Nd4j.readNumpy optimized (Link)
    Added configurable alpha parameter to ELU and lrelu_bp operations in C++ (Link)
    Cleaned up SameDiff SDCNN/SDRNN (SameDiff.cnn, .rnn) API/methods (Link, Link)
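
For the .npy header validation item above, the notes do not say which checks ND4J performs. As a rough sketch of what such validation can look like, the NumPy .npy format begins with the six magic bytes 0x93 'N' 'U' 'M' 'P' 'Y', followed by one byte each for the major and minor format version. The class and method names below are hypothetical and the check is deliberately minimal.

```java
// Minimal .npy magic-byte check; an illustration only, not ND4J's validation code.
final class NpyHeaderCheck {

    // Magic string that opens every .npy file: 0x93 followed by "NUMPY".
    private static final byte[] MAGIC = {(byte) 0x93, 'N', 'U', 'M', 'P', 'Y'};

    /** Returns true if the bytes start with the .npy magic string plus two version bytes. */
    static boolean looksLikeNpy(byte[] fileStart) {
        if (fileStart == null || fileStart.length < MAGIC.length + 2) {
            return false;   // too short to hold the magic string and version
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (fileStart[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] npyStart = {(byte) 0x93, 'N', 'U', 'M', 'P', 'Y', 1, 0};
        byte[] zipStart = {'P', 'K', 3, 4, 0, 0, 0, 0};
        System.out.println(looksLikeNpy(npyStart));  // prints true
        System.out.println(looksLikeNpy(zipStart));  // prints false
    }
}
```

A fuller validator would also read the header length and parse the header dictionary (dtype, order, shape); this sketch only rejects files that are clearly not .npy.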

ND4J: Transition Guide, 1.0.0-beta4 to 1.0.0-beta5

    OldAddOp, OldSubOp, etc removed: Replace with AddOp, SubOp, etc
    Nd4j.trueScalar and trueVector removed; use Nd4j.scalar and Nd4j.createFromArray methods
    INDArray.javaTensorAlongDimension removed; use INDArray.tensorAlongDimension instead
    INDArray.lengthLong() removed; use INDArray.length() instead

ND4J: 1.0.0-beta5 Known Issues

    nd4j-native on some OSX systems can fail with Symbol not found: ___emutls_get_address - See this link
    SBT 1.3.0 can fail with an Illegal character in path error; SBT 1.2.8 is OK. This is an SBT issue, not an ND4J issue. See this link for details


DataVec: Features and Enhancements

    ImageRecordReader: Support for 16-bit TIFF added (Link)
    Added SequenceTrimToLengthTransform (Link)

DataVec: Bug Fixes and Optimizations

    Fixed an issue with AnalyzeSpark and String columns (Link)
    Fixed an issue with URL scheme detection in NumberedFileInputScheme (Link)
    Fixed an issue with RandomPathFilter sampling being biased (Link, Link)


RL4J: Features and Enhancements

RL4J: Bug Fixes and Optimizations

    Fixed issue with compression for HistoryProcessor (Link)


Arbiter: Bug Fixes and Optimizations

    Updated EvaluationScoreFunction to use ND4J Evaluation class metrics (Link)
    Fixed incorrect search size in GridSearchCandidateGenerator (Link)

Arbiter: Known Issues

    The Jackson version upgrade necessitated a change to how generic object serialization was performed; Arbiter JSON data stored in 1.0.0-beta4 or earlier format may not be readable in 1.0.0-beta5 (Link)


ND4S: Features and Enhancements

    Added full data type support to ND4S as per ND4J (Link)
    Added syntactic sugar for SameDiff (implicits, operator overloads) (Link)