1 of 1

1.0.0-beta4

Highlights - 1.0.0-beta4 Release

Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-Dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following datatypes are supported:

DOUBLE: double precision floating point, 64-bit (8 byte)
FLOAT: single precision floating point, 32-bit (4 byte)
HALF: half precision floating point, 16-bit (2 byte), "FP16"
LONG: long signed integer, 64 bit (8 byte)
INT: signed integer, 32 bit (4 byte)
SHORT: signed short integer, 16 bit (2 byte)
UBYTE: unsigned byte, 8 bit (1 byte), 0 to 255
BYTE: signed byte, 8 bit (1 byte), -128 to 127
BOOL: boolean type, (0/1, true/false). Uses ubyte storage for easier op parallelization
UTF8: String array type, UTF8 format

ND4J Behaviour changes of note:

When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)
- For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray
- Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively

DL4J Behaviour changes of note:

MultiLayerNetwork/ComputationGraph no longer depend in any way on ND4J global datatype.
- The datatype of a network (DataType for it's parameters and activations) can be set during construction using NeuralNetConfigutation.Builder().dataType(DataType)
- Networks can be converted from one type to another (double to float, float to half etc) using MultiLayerNetwork/ComputationGraph.convertDataType(DataType) method

Main new methods:

Nd4j.create(), zeros(), ones(), linspace(), etc methods with DataType argument
INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another
New Nd4j.createFromArray(...) methods for

ND4J/DL4J: CUDA - 10.1 support added, CUDA 9.0 support dropped

CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.

ND4J: Mac/OSX CUDA support dropped

Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.

DL4J/ND4J: MKL-DNN Support Added DL4J (and ND4J conv2d etc ops) now support MKL-DNN by default when running on CPU/native backend. MKL-DNN support is implemented for the following layer types:

ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)
SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)
BatchNormalization layer (and BatchNorm ND4J op)
LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)

MKL-DNN support for other layer types (such as LSTM) will be added in a future release.

MKL-DNN can be disabled globally (ND4J and DL4J) using Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);

MKL-DNN can be disabled globally for specific ops by setting ND4J_MKL_FALLBACK environment variable to the name of the operations to have MKL-DNN support disabled for. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp

ND4J: Improved Performance due to Memory Management Changes

Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default hence periodic GC could frequently be disabled when training DL4J networks). However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.

In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000ms) you can use:

ND4J: Improved Rank 0/1 Array Support

In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, getting sub-arrays using INDArray.get(NDArrayIndex...) or when creating arrays from Java arrays/scalars. Now, behaviour should be more consistent for these rank 0/1 cases. Note to maintain old behaviour for getRow and getColumn (i.e., return rank 2 array with shape [1,x] and [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.

DL4J: Attention layers added

Deeplearning4J

Deeplearning4J: Features and Enhancements

Added MKL-DNN support for Conv/Pool/BatchNorm/LRN layers. MKL-DNN will be used automatically when using nd4j-native backend. (, )
L1/L2 regularization now made into a class; weight decay added, with better control as to when/how it is applied. See for more details on the difference between L2 and weight decay. In general, weight decay should be preferred to L2 regularization. (, )
Added dot product attention layers: , , and

Deeplearning4J: Bug Fixes and Optimizations

DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly ()
cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown ()
Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training ()

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

Removed reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays ()
Added INDArray.close() method to allow users to manually release off-heap memory immediately ()
SameDiff: Added TensorFlowImportValidator tool to determine if a TensorFlow graph can likely be imported into SameDiff. Reports the operations used and whether they are supported in SameDiff ()

ND4J/SameDiff: API Changes (Transition Guide): 1.0.0-beta3 to 1.0.0-beta4

ND4J datatypes - significant changes, see highlights at top of this section
nd4j-base64 module (deprecated in beta3) has been removed. Nd4jBase64 class has been moved to nd4j-api ()
When specifying arguments for op execution along dimension (for example, reductions) the reduction axis are now specified in the operation constructor - not separately in the OpExecutioner call. ()

ND4J/SameDiff: Bug Fixes and Optimizations

Fixed bug with InvertMatrix.invert() with [1,1] shape matrices ()
Fixed edge case bug for Updater instances with length 1 state arrays ()
Fixed edge case with FileDocumentIterator with empty documents ()

ND4J: Known Issues

Most CustomOperation operations (such as those used in SameDiff) are CPU only until next release. GPU support was not completed in time for 1.0.0-beta4 release.
Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See . Workaround: Disable MKL-DNN for conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier) or disable MKL-DNN globally, for Skylake CPUs.

DataVec

DataVec: Features and Enhancements

Added PythonTransform (arbitrary python code execution for pre processing) (, )
Added FirstDigit (Benford's law) transform (, )
StringToTimeTransform now supports setting Locale (, )

DataVec: Optimizations and Bug Fixes

Fixed issue with ImageLoader.scalingIfNeeded ()

Arbiter

Arbiter: Enhancements

Arbiter now supports genetic algorithm search ()

Arbiter: Fixes

Fixed an issue where early stopping used in Arbiter would result in a serialization exception ()