
1.0.0-beta4

Highlights - 1.0.0-beta4 Release

Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following datatypes are supported:

  • DOUBLE: double precision floating point, 64-bit (8 byte)

  • FLOAT: single precision floating point, 32-bit (4 byte)

  • HALF: half precision floating point, 16-bit (2 byte), "FP16"

  • LONG: long signed integer, 64 bit (8 byte)

  • INT: signed integer, 32 bit (4 byte)

  • SHORT: signed short integer, 16 bit (2 byte)

  • UBYTE: unsigned byte, 8 bit (1 byte), 0 to 255

  • BYTE: signed byte, 8 bit (1 byte), -128 to 127

  • BOOL: boolean type (0/1, true/false). Uses ubyte storage for easier op parallelization

  • UTF8: String array type, UTF8 format

ND4J Behaviour changes of note:

  • When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)

    • For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray

    • Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively

  • Some operations require matched datatypes for operands

    • For example, if x and y are different datatypes, a cast may be required: x.add(y.castTo(x.dataType())) (see the sketch after this list)

  • Some operations have datatype restrictions: for example, sum on a UTF8 array is not supported, nor is variance on a BOOL array. For some operations on boolean arrays (such as sum), casting to an integer or floating point type first may make sense.
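To make the inference and casting rules concrete, here is a minimal sketch (the class name is illustrative; the behaviour shown is as described in the list above):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class DataTypeInferenceExample {
    public static void main(String[] args) {
        // Datatype is inferred from the Java primitive array type
        INDArray d = Nd4j.createFromArray(new double[]{1.0, 2.0, 3.0});   // DOUBLE
        INDArray f = Nd4j.createFromArray(new float[]{1.0f, 2.0f, 3.0f}); // FLOAT

        // Scalar creation follows the same rule
        INDArray i = Nd4j.scalar(1);  // INT
        INDArray l = Nd4j.scalar(1L); // LONG

        // Operands of different datatypes may need an explicit cast
        INDArray sum = d.add(f.castTo(d.dataType()));
        System.out.println(sum.dataType()); // DOUBLE
    }
}
```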

DL4J Behaviour changes of note:

  • MultiLayerNetwork/ComputationGraph no longer depend in any way on the ND4J global datatype.

    • The datatype of a network (the DataType for its parameters and activations) can be set during construction using NeuralNetConfiguration.Builder().dataType(DataType)

    • Networks can be converted from one type to another (double to float, float to half, etc.) using the MultiLayerNetwork/ComputationGraph.convertDataType(DataType) method (see the sketch below)
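A minimal sketch of both mechanisms follows (the layer configuration is illustrative and not from the release notes; convertDataType is assumed here to return the converted network):

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.buffer.DataType;

public class NetworkDataTypeExample {
    public static void main(String[] args) {
        // Set the datatype of parameters and activations at construction time
        MultiLayerNetwork net = new MultiLayerNetwork(new NeuralNetConfiguration.Builder()
                .dataType(DataType.FLOAT)
                .list()
                .layer(new DenseLayer.Builder().nIn(10).nOut(5).build())
                .layer(new OutputLayer.Builder().nIn(5).nOut(2).build())
                .build());
        net.init();

        // Convert an existing network to half precision (FP16)
        MultiLayerNetwork halfNet = net.convertDataType(DataType.HALF);
    }
}
```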

Main new methods:

  • Nd4j.create(), zeros(), ones(), linspace(), etc. methods with a DataType argument

  • INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another

  • New Nd4j.createFromArray(...) methods for creating INDArrays from Java primitive arrays (see the example below)
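A brief sketch of these creation methods (the exact overloads shown, taking DataType as the first argument, are assumptions based on the list above):

```java
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class CreationMethodsExample {
    public static void main(String[] args) {
        // Create arrays with an explicit datatype, independent of any global default
        INDArray zeros = Nd4j.zeros(DataType.FLOAT, 3, 4); // 3x4 FLOAT array of zeros
        INDArray ones  = Nd4j.ones(DataType.LONG, 5);      // length-5 LONG vector of ones
        INDArray raw   = Nd4j.create(DataType.HALF, 2, 2); // 2x2 FP16 array

        // Convert between datatypes explicitly
        INDArray asDouble = zeros.castTo(DataType.DOUBLE);
        System.out.println(asDouble.dataType()); // DOUBLE
    }
}
```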

ND4J/DL4J: CUDA 10.1 support added, CUDA 9.0 support dropped

CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.

ND4J: Mac/OSX CUDA support dropped

Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.

DL4J/ND4J: MKL-DNN Support Added

DL4J (and ND4J conv2d etc. ops) now support MKL-DNN by default when running on the CPU/native backend. MKL-DNN support is implemented for the following layer types:

  • ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)

  • SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)

  • BatchNormalization layer (and BatchNorm ND4J op)

  • LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)

  • Convolution3D layer (and Conv3D/Conv3DDerivative ND4J ops)

MKL-DNN support for other layer types (such as LSTM) will be added in a future release.

MKL-DNN can be disabled globally (ND4J and DL4J) using Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);

MKL-DNN can also be disabled for specific ops by setting the ND4J_MKL_FALLBACK environment variable to the names of the operations for which MKL-DNN support should be disabled. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp

ND4J: Improved Performance due to Memory Management Changes

Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default, hence periodic GC could frequently be disabled when training DL4J networks.) However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.

In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000ms) you can use:
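```java
import org.nd4j.linalg.factory.Nd4j;

// Restore the beta3-style behaviour via ND4J's memory manager:
// re-enable periodic GC and invoke it at most once every 5000ms
Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(5000);
```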

ND4J: Improved Rank 0/1 Array Support

In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, getting sub-arrays using INDArray.get(NDArrayIndex...), or when creating arrays from Java arrays/scalars. Behaviour should now be more consistent for these rank 0/1 cases. Note that to maintain the old behaviour for getRow and getColumn (i.e., return a rank 2 array with shape [1,x] or [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.
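A minimal sketch of the new behaviour (shapes in the comments follow the description above):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class RankExample {
    public static void main(String[] args) {
        INDArray matrix = Nd4j.create(new double[][]{{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}});

        // New default: a rank 1 vector with shape [4]
        INDArray row = matrix.getRow(0);
        System.out.println(row.rank()); // 1

        // Old behaviour on request: a rank 2 array with shape [1, 4]
        INDArray rowKeepDims = matrix.getRow(0, true);
        System.out.println(rowKeepDims.rank()); // 2
    }
}
```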

DL4J: Attention layers added

Deeplearning4J

Deeplearning4J: Features and Enhancements

Deeplearning4J: Bug Fixes and Optimizations

  • DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly (Link)

  • cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown (Link)

  • Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training (Link)

  • Fixed a bug where dropout instances were incorrectly shared between layers when using transfer learning with dropout (Link, Link)

  • Fixed an issue where tensorAlongDimension could result in an incorrect array order for edge cases, and hence exceptions in LSTMs (Link)

  • Fixed an edge case issue with ComputationGraph.getParam(String) where the layer name contains underscores (Link)

  • Fixed an edge case with ParallelInference on CUDA where (very rarely) input array operations (such as normalization) may not be fully completed before transferring an array between threads (Link, Link)

  • Fixed an edge case with KFoldIterator when the total number of examples is not a multiple of the batch size (Link, Link)

  • Fixed an issue where the DL4J UI could throw a NoClassDefFoundError on Java 9/10/11 (Link, Link)

  • Keras import: added aliases for weight initialization (Link)

  • Fixed an issue where dropout instances would not be correctly cloned when the network configuration was cloned (Link)

  • Fixed a workspace issue with ElementwiseVertex with a single input (Link)

  • Fixed a UI issue where detaching StatsStorage could attempt to remove storage twice, resulting in an exception (Link)

  • Fixed an issue where LossMultiLabel would generate NaNs when all labels in a minibatch are the same class. A zero gradient is now returned instead. (Link, Link)

  • Fixed an issue where DepthwiseConv2D weights could have the wrong shape when restoring a network from saved format (Link)

  • Fixed an issue where BaseDatasetIterator.next() would not apply preprocessors if one was set (Link)

  • Improved default configuration for CenterLossOutputLayer (Link)

  • Fixed an issue with the UNet non-pretrained configuration (Link)

  • Fixed an issue where Word2Vec VocabConstructor could deadlock under some circumstances (Link)

  • SkipGram and CBOW (used in Word2Vec) were made native operations for better performance (Link)

  • Fixed an issue where references to detached StatsListener instances would be maintained, potentially leading to memory issues when using InMemoryStatsListener (Link)

  • Optimization: workspaces were added to SequenceVectors and Word2Vec (Link)

  • Improved validation for RecordReaderDataSetIterator (Link)

  • Improved handling of unknown words in the WordVectors implementation (Link)

  • Yolo2OutputLayer: added validation for incorrect labels shape (Link)

  • LastTimeStepLayer will now throw an exception when the input mask is all 0s (no data - no last time step) (Link)

  • Fixed an issue where the MultiLayerNetwork/ComputationGraph.setLearningRate methods could lead to invalid updater state in some rare cases (Link)

  • Fixed an issue where the Conv1D layer would calculate its output length incorrectly in MultiLayerNetwork.summary() (Link)

  • Async iterators are now used in EarlyStoppingTrainer to improve data loading performance (Link)

  • EmbeddingLayer and EmbeddingSequenceLayer performance has been improved on CUDA (Link)

  • Removed outdated/legacy Scala tools repository (Link, Link)

  • Fixed issues in L2NormalizeVertex equals/hashCode methods (Link)

  • Fixed a workspace issue in ConvolutionalListener (Link)

  • Fixed the EvaluationBinary falsePositiveRate calculation (Link)

  • Added validation and a useful exception for MultiLayerNetwork.output(DataSetIterator) methods (Link)

  • Fixed a minor issue where ComputationGraph.summary() would throw a NullPointerException if init() had not already been called (Link)

  • Fixed a ComputationGraph issue where an input into a single layer/vertex repeated multiple times could fail during training (Link)

  • Improved performance of the KMeans implementation (Link)

  • Fixed an issue with rnnGetPreviousState for RNNs in 'wrapper' layers such as FrozenLayer (Link)

  • Keras import: fixed an issue with the order of words when importing some Keras tokenizers (Link)

  • Keras import: fixed an issue with a possible UnsupportedOperationException in the KerasTokenizer class (Link)

  • Keras import: fixed an import issue with models combining embeddings, reshape and convolution layers (Link)

  • Keras import: fixed an import issue with input type inference for some RNN models (Link)

  • Fixed some padding issues in LocallyConnected1D/2D layers (Link)

ND4J and SameDiff

ND4J/SameDiff: Features and Enhancements

ND4J/SameDiff: API Changes (Transition Guide): 1.0.0-beta3 to 1.0.0-beta4

  • ND4J datatypes - significant changes, see highlights at top of this section

  • The nd4j-base64 module (deprecated in beta3) has been removed. The Nd4jBase64 class has been moved to nd4j-api (Link)

  • When specifying arguments for op execution along dimension (for example, reductions), the reduction axes are now specified in the operation constructor - not separately in the OpExecutioner call (Link)

  • Removed old Java loop-based BooleanIndexing methods. Equivalent native ops should be used instead. (Link)

  • Removed Nd4j.ENFORCE_NUMERICAL_STABILITY, Nd4j.copyOnOps, etc. (Link)

  • SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc.) have been moved to subclasses - access creators via the SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or the SameDiff.math/random/nn/cnn/rnn/loss fields (see the sketch after this list) (Link)

  • Nd4j.emptyLike(INDArray) has been removed. Use Nd4j.like(INDArray) instead (Link)

  • org.nd4j.util.StringUtils removed; suggest using Apache Commons Lang 3 StringUtils instead (Link)

  • ND4J Jackson RowVector(De)Serializer has been deprecated due to datatype changes; NDArrayText(De)Serializer should be used instead (Link, Link)

  • The nd4j-instrumentation module has been removed due to lack of use/maintenance (Link)
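To illustrate the relocated op creators, here is a minimal SameDiff sketch (the variable name and shape are illustrative):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class SameDiffNamespaceExample {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.var("x", DataType.FLOAT, 2, 2);

        // beta3 style was sd.tanh(x); in beta4 the creator lives in the math namespace
        SDVariable y = sd.math().tanh(x);
    }
}
```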

ND4J/SameDiff: Bug Fixes and Optimizations

ND4J: Known Issues

  • Most CustomOperation operations (such as those used in SameDiff) are CPU-only until the next release. GPU support was not completed in time for the 1.0.0-beta4 release.

  • Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See Issue 7637. Workaround: for Skylake CPUs, disable MKL-DNN for the conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier) or disable MKL-DNN globally.

DataVec

DataVec: Features and Enhancements

DataVec: Optimizations and Bug Fixes

Arbiter

Arbiter: Enhancements

Arbiter: Fixes

  • Fixed an issue where early stopping used in Arbiter would result in a serialization exception (Link)
