1.0.0-beta4
Main highlight: full multi-datatype support for ND4J and DL4J. In past releases, all N-Dimensional arrays in ND4J were limited to a single datatype (float or double), set globally. Now, arrays of all datatypes may be used simultaneously. The following datatypes are supported:
- DOUBLE: double precision floating point, 64-bit (8 byte)
- FLOAT: single precision floating point, 32-bit (4 byte)
- HALF: half precision floating point, 16-bit (2 byte), "FP16"
- LONG: long signed integer, 64-bit (8 byte)
- INT: signed integer, 32-bit (4 byte)
- SHORT: signed short integer, 16-bit (2 byte)
- UBYTE: unsigned byte, 8-bit (1 byte), 0 to 255
- BYTE: signed byte, 8-bit (1 byte), -128 to 127
- BOOL: boolean type, (0/1, true/false). Uses ubyte storage for easier op parallelization
- UTF8: String array type, UTF8 format
ND4J Behaviour changes of note:
- When creating an INDArray from a Java primitive array, the INDArray datatype will be determined by the primitive array type (unless a datatype is specified)
- For example: Nd4j.createFromArray(double[]) -> DOUBLE datatype INDArray
- Similarly, Nd4j.scalar(1), Nd4j.scalar(1L), Nd4j.scalar(1.0) and Nd4j.scalar(1.0f) will produce INT, LONG, DOUBLE and FLOAT type scalar INDArrays respectively
- Some operations require matched datatypes for operands
- For example, if x and y are different datatypes, a cast may be required: x.add(y.castTo(x.dataType()))
- Some operations have datatype restrictions: for example, sum on a UTF8 array is not supported, nor is variance on a BOOL array. For some operations on boolean arrays (such as sum), casting to an integer or floating point type first may make sense.
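For example, a minimal sketch of these behaviours (the array contents are arbitrary):
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray x = Nd4j.createFromArray(new double[]{1.0, 2.0, 3.0});   // DOUBLE datatype inferred
INDArray y = Nd4j.createFromArray(new int[]{1, 2, 3});            // INT datatype inferred
INDArray i = Nd4j.scalar(1);                                      // INT scalar
INDArray f = Nd4j.scalar(1.0f);                                   // FLOAT scalar
// Operands with different datatypes may require an explicit cast so the types match:
INDArray z = x.add(y.castTo(x.dataType()));                       // result is DOUBLE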
DL4J Behaviour changes of note:
- MultiLayerNetwork/ComputationGraph no longer depend in any way on the ND4J global datatype.
- The datatype of a network (the DataType for its parameters and activations) can be set during construction using NeuralNetConfiguration.Builder().dataType(DataType)
- Networks can be converted from one type to another (double to float, float to half, etc.) using the MultiLayerNetwork/ComputationGraph.convertDataType(DataType) method
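An illustrative sketch of both of the above (the single-layer configuration here is a placeholder, not something prescribed by the release notes):
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.buffer.DataType;

// Set the parameter/activation datatype when building the configuration
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .dataType(DataType.FLOAT)
        .list()
        .layer(new OutputLayer.Builder().nIn(10).nOut(2).activation(Activation.SOFTMAX).build())
        .build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();

// Convert an existing network from one datatype to another (here FLOAT -> HALF)
MultiLayerNetwork halfNet = net.convertDataType(DataType.HALF);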
Main new methods:
- Nd4j.create(), zeros(), ones(), linspace(), etc. methods with a DataType argument
- INDArray.castTo(DataType) method - to convert INDArrays from one datatype to another
- New Nd4j.createFromArray(...) methods for creating INDArrays directly from Java primitive arrays
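For instance, a brief sketch of the DataType-accepting factory methods and castTo (shapes and values are arbitrary):
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray zeros = Nd4j.zeros(DataType.FLOAT, 3, 4);   // 3x4 FLOAT array of zeros
INDArray ones  = Nd4j.ones(DataType.HALF, 2, 2);     // 2x2 HALF array of ones
INDArray asInt = ones.castTo(DataType.INT);          // INT copy of the HALF array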
ND4J/DL4J: CUDA - 10.1 support added, CUDA 9.0 support dropped
CUDA versions supported in 1.0.0-beta4: CUDA 9.2, 10.0, 10.1.
ND4J: Mac/OSX CUDA support dropped
Mac (OSX) CUDA binaries are no longer provided. Linux (x86_64, ppc64le) and Windows (x86_64) CUDA support remains. OSX CPU support (x86_64) is still available.
DL4J/ND4J: MKL-DNN Support Added
DL4J (and ND4J conv2d etc. ops) now support MKL-DNN by default when running on the CPU/native backend. MKL-DNN support is implemented for the following layer types:
- ConvolutionLayer and Convolution1DLayer (and Conv2D/Conv2DDerivative ND4J ops)
- SubsamplingLayer and Subsampling1DLayer (and MaxPooling2D/AvgPooling2D/Pooling2DDerivative ND4J ops)
- BatchNormalization layer (and BatchNorm ND4J op)
- LocalResponseNormalization layer (and LocalResponseNormalization ND4J op)
- Convolution3D layer (and Conv3D/Conv3DDerivative ND4J ops)
MKL-DNN support for other layer types (such as LSTM) will be added in a future release.
MKL-DNN can be disabled globally (ND4J and DL4J) using:
Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);
MKL-DNN can also be disabled for specific operations by setting the ND4J_MKL_FALLBACK environment variable to the names of the operations that should not use MKL-DNN. For example: ND4J_MKL_FALLBACK=conv2d,conv2d_bp
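A small usage sketch of the global switch above (the import path for the generated Nd4jCpu class is my assumption; this applies to the CPU/native backend):
import org.nd4j.nativeblas.Nd4jCpu;

// Disable MKL-DNN for all ops (call before building/running networks)
Nd4jCpu.Environment.getInstance().setUseMKLDNN(false);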
ND4J: Improved Performance due to Memory Management Changes
Prior releases of ND4J used periodic garbage collection (GC) to release memory that was not allocated in a memory workspace. (Note that DL4J uses workspaces for almost all operations by default, so periodic GC could frequently be disabled when training DL4J networks.) However, the reliance on garbage collection resulted in a performance overhead that scaled with the number of objects in the JVM heap.
In 1.0.0-beta4, the periodic garbage collection is disabled by default; instead, GC will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.
To re-enable periodic GC (as per the default in beta3) and set the GC frequency to every 5 seconds (5000 ms), you can use:
Nd4j.getMemoryManager().togglePeriodicGc(true);
Nd4j.getMemoryManager().setAutoGcWindow(5000);
ND4J: Improved Rank 0/1 Array Support
In prior versions of ND4J, scalars and vectors would sometimes be rank 2 instead of rank 0/1 when getting rows/columns, getting sub-arrays using INDArray.get(NDArrayIndex...), or when creating arrays from Java arrays/scalars. Behaviour should now be more consistent for these rank 0/1 cases. Note that to maintain the old behaviour for getRow and getColumn (i.e., return a rank 2 array with shape [1,x] and [x,1] respectively), the getRow(long,boolean) and getColumn(long,boolean) methods can be used.
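An illustrative sketch of the difference for getRow (array contents are arbitrary):
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray m = Nd4j.createFromArray(new double[][]{{1, 2, 3}, {4, 5, 6}});
INDArray row     = m.getRow(0);         // rank 1, shape [3] - new behaviour
INDArray rowAs2d = m.getRow(0, true);   // rank 2, shape [1,3] - old behaviour preserved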
DL4J: Attention layers added
- Added dot product attention layers: AttentionVertex, LearnedSelfAttentionLayer, RecurrentAttentionLayer and SelfAttentionLayer
- The parameter/activation datatypes for new models can be set using the dataType(DataType) method on NeuralNetConfiguration.Builder (Link)
- EmbeddingLayer and EmbeddingSequenceLayer builders now have .weightInit(INDArray) and .weightInit(Word2Vec) methods for initializing parameters from pretrained word vectors (Link)
- PerformanceListener can now be configured to report garbage collection information (number/duration) (Link)
- Evaluation class will now check for NaNs in the predicted output and throw an exception instead of treating argMax(NaNs) as having value 0 (Link)
- Added ModelAdapter for ParallelInference for convenience and for use cases such as YOLO (allows improved performance by avoiding detached (out-of-workspace) arrays) (Link)
- Added ComputationGraph.output(List<String> layers, boolean train, INDArray[] features, INDArray[] featureMasks) method to get the activations for a specific set of layers/vertices only (without redundant calculations) (Link)
- Added Capsule Network layers (no GPU acceleration until next release) - CapsuleLayer, CapsuleStrengthLayer and PrimaryCapsules (Link)
- Layer/NeuralNetConfiguration builders now have getter/setter methods also, for better Kotlin support (Link)
- CheckpointListener now has static availableCheckpoints(File), loadCheckpointMLN(File, int) and loadLastCheckpointMLN(File) etc. methods (Link)
- MultiLayerNetwork/ComputationGraph now validate and throw an exception in certain incompatible RNN configurations, like truncated backpropagation through time combined with LastTimeStepLayer/Vertex (Link)
- Deeplearning4j UI now has multi-user/multi-session support - use UIServer.getInstance(boolean multiSession, Function<String,StatsStorage>) to start the UI in multi-session mode (Link)
- WordVectorSerializer now supports reading and exporting text format vectors via WordVectorSerializer.writeLookupTable and readLookupTable (Link)
- ComputationGraph GraphBuilder now has an appendLayer method that can be used to add layers connected to the last added layer/vertex (Link)
- DL4J Spark training: fix for shared clusters (multiple simultaneous training jobs) - Aeron stream ID now generated randomly (Link)
- cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown (Link)
- Batch normalization global variance reparameterized to avoid underflow and zero/negative variance in some cases during distributed training (Link)
- Fixed issue where tensorAlongDimension could result in an incorrect array order for edge cases and hence exceptions in LSTMs (Link)
- Fixed an edge case issue with ComputationGraph.getParam(String) where the layer name contains underscores (Link)
- Fixed issue where dropout instances would not be correctly cloned when network configuration was cloned (Link)
- Fixed issue with UI where detaching StatsStorage could attempt to remove storage twice, resulting in an exception (Link)
- Fixed an issue where DepthwiseConv2D weight could be wrong shape on restoring network from saved format (Link)
- Fixed an issue where references to detached StatsListener instances would be maintained, potentially leading to memory issues when using InMemoryStatsListener (Link)
- LastTimeStepLayer will now throw an exception when the input mask is all 0s (no data - no last time step) (Link)
- Fixed an issue where MultiLayerNetwork/ComputationGraph.setLearningRate method could lead to invalid updater state in some rare cases (Link)
- Fixed an issue where the Conv1D layer would calculate its output length incorrectly in MultiLayerNetwork.summary() (Link)
- Fixed minor issue where ComputationGraph.summary() would throw a NullPointerException if init() had not already been called (Link)
- Fixed a ComputationGraph issue where an input into a single layer/vertex repeated multiple times could fail during training (Link)
- Keras import: fixed issue with possible UnsupportedOperationException in KerasTokenizer class (Link)
- Keras import: fixed an import issue with models combining embeddings, reshape and convolution layers (Link)
- Removed reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays (Link)
- SameDiff: Added TensorFlowImportValidator tool to determine if a TensorFlow graph can likely be imported into SameDiff. Reports the operations used and whether they are supported in SameDiff (Link)
- Added a basic ("technology preview") version of the SameDiff UI. It should be considered early WIP, with breaking API changes expected in future releases. Supports plotting of SameDiff graphs as well as various metrics (line charts, histograms, etc.)
- ND4J/SameDiff - new operations added:
- SameDiff: reductions operations now support "dynamic" (non-constant) inputs for axis argument (Link)
- SameDiff: Added SDVariable.convertToVariable() and convertToConstant() - to change SDVariable type (Link)
- SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Link)
- Added OpExecutioner.inspectArray(INDArray) method to get summary statistics for analysis/debugging purposes (Link)
- Added SameDiff SDIndex.point(long, boolean keepDim) method (to keep point indices in output array as size 1 axis) (Link)
- Added SameDiff ProtoBufToFlatBufConversion command line tool for doing TensorFlow frozen model (protobuf) to SameDiff FlatBuffers conversion (Link)
- ND4J datatypes - significant changes, see highlights at top of this section
- nd4j-base64 module (deprecated in beta3) has been removed. Nd4jBase64 class has been moved to nd4j-api (Link)
- When specifying arguments for op execution along dimension (for example, reductions), the reduction axes are now specified in the operation constructor - not separately in the OpExecutioner call. (Link)
- Removed old Java loop-based BooleanIndexing methods. Equivalent native ops should be used instead. (Link)
- SameDiff "op creator" methods (SameDiff.tanh(), SameDiff.conv2d(...) etc) have been moved to subclasses - access creators via SameDiff.math()/random()/nn()/cnn()/rnn()/loss() methods or SameDiff.math/random/nn/cnn/rnn/loss fields (Link)
- Fixed an issue where empty NDArrays would be reported as having scalar shape information, length 1 (Link)
- Optimization: libnd4j (c++) indexing for ops will use uint for faster offset calculations when required and possible (Link)
- SameDiff: Improved gradient calculation performance/efficiency; "gradients" are now no longer defined for non-floating-point variables, and variables that aren't required to calculate loss or parameter gradients (Link)
- Behaviour of IEvaluation instances now no longer depends on the global (default) datatype setting (Link)
- INDArray.get(point(x), y) or .get(y, point(x)) now returns rank 1 arrays when performed on rank 2 arrays (Link)
- Most CustomOperation operations (such as those used in SameDiff) are CPU only until next release. GPU support was not completed in time for 1.0.0-beta4 release.
- Some users with Intel Skylake CPUs have reported deadlocks on MKL-DNN convolution 2d backprop operations (DL4J ConvolutionLayer backprop, ND4J "conv2d_bp" operation) when OMP_NUM_THREADS is set to 8 or higher. Investigations suggest this is likely an issue with MKL-DNN, not DL4J/ND4J. See Issue 7637. Workaround for affected Skylake CPUs: disable MKL-DNN for the conv2d_bp operation via ND4J_MKL_FALLBACK (see earlier), or disable MKL-DNN globally.
- Added TokenizerBagOfWordsTermSequenceIndexTransform (TF-IDF transform), GazeteerTransform (binary vector indicating word presence) and MultiNlpTransform transforms; added BagOfWordsTransform interface (Link)
- Fixed an issue where early stopping used in Arbiter would result in a serialization exception (Link)