Initial multi-GPU support, usable both standalone and on Spark.
Refactored the Spark API significantly
Added a cuDNN wrapper
Performance improvements for ND4J
Introducing DataVec, which replaces Canova: lots of new functionality for transforming, preprocessing, and cleaning data
New DataSetIterators for feeding neural nets with existing data: ExistingDataSetIterator, FloatsDataSetIterator, DoublesDataSetIterator, IteratorDataSetIterator
New learning algorithms for Word2Vec and ParagraphVectors: CBOW and PV-DM, respectively
New native ops for better performance: DropOut, DropOutInverted, CompareAndSet, ReplaceNaNs
Asynchronous (shadow) DataSet prefetching is now enabled by default for both MultiLayerNetwork and ComputationGraph
Better memory handling with the JVM GC and the CUDA backend, resulting in a significantly lower memory footprint
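For readers new to CBOW and PV-DM: in CBOW the hidden layer is the average of the context words' input vectors, and the averaging variant of PV-DM also folds a per-document paragraph vector into that average. The plain-Java sketch below shows only this hidden-layer step (no output layer, negative sampling, or training loop). It is not DL4J's implementation; the class name, method names, and toy embedding table are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only (hypothetical names, not DL4J code): CBOW / PV-DM hidden layer. */
public class CbowSketch {

    /** CBOW: average the embedding rows of the context word indices. */
    static double[] contextAverage(double[][] embeddings, int[] contextIdx) {
        int dim = embeddings[0].length;
        double[] h = new double[dim];
        for (int idx : contextIdx) {
            for (int d = 0; d < dim; d++) {
                h[d] += embeddings[idx][d];
            }
        }
        for (int d = 0; d < dim; d++) {
            h[d] /= contextIdx.length;
        }
        return h;
    }

    /** PV-DM (averaging variant): the paragraph vector is averaged together with the context words. */
    static double[] pvdmHidden(double[] paragraphVec, double[][] embeddings, int[] contextIdx) {
        int dim = paragraphVec.length;
        double[] h = paragraphVec.clone();
        for (int idx : contextIdx) {
            for (int d = 0; d < dim; d++) {
                h[d] += embeddings[idx][d];
            }
        }
        for (int d = 0; d < dim; d++) {
            h[d] /= (contextIdx.length + 1); // +1 for the paragraph vector itself
        }
        return h;
    }

    public static void main(String[] args) {
        double[][] emb = { {1, 0}, {0, 1}, {1, 1} };
        int[] context = {0, 2}; // words surrounding some target word
        System.out.println(Arrays.toString(contextAverage(emb, context))); // [1.0, 0.5]
        System.out.println(Arrays.toString(pvdmHidden(new double[] {2, 2}, emb, context)));
    }
}
```

The original PV-DM formulation also allows concatenating the vectors instead of averaging; the sketch picks averaging for brevity.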
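The new native ops run inside ND4J, but their element-wise semantics can be sketched on plain arrays. The code below is our reading of what the op names suggest, not ND4J's kernels: plain dropout zeroes elements without rescaling during training, inverted dropout rescales survivors by 1 / keepProb so that inference needs no correction, CompareAndSet overwrites values within eps of a comparison value, and ReplaceNaNs swaps NaN for a fixed value. The parameter name keepProb and all method names are hypothetical; DL4J's own dropout parameter convention may differ.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only (hypothetical names): element-wise semantics of the new ops. */
public class NativeOpsSketch {

    /** Plain dropout: keep each element with probability keepProb, no rescaling. */
    static double[] dropOut(double[] x, double keepProb, Random rng) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = rng.nextDouble() < keepProb ? x[i] : 0.0;
        }
        return out;
    }

    /** Inverted dropout: survivors are scaled by 1 / keepProb at training time. */
    static double[] dropOutInverted(double[] x, double keepProb, Random rng) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = rng.nextDouble() < keepProb ? x[i] / keepProb : 0.0;
        }
        return out;
    }

    /** Replace every element within eps of 'compare' with 'set'. */
    static double[] compareAndSet(double[] x, double compare, double set, double eps) {
        double[] out = x.clone();
        for (int i = 0; i < out.length; i++) {
            if (Math.abs(out[i] - compare) <= eps) {
                out[i] = set;
            }
        }
        return out;
    }

    /** Replace every NaN with 'value'. */
    static double[] replaceNaNs(double[] x, double value) {
        double[] out = x.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = value;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        System.out.println(Arrays.toString(dropOutInverted(new double[] {1, 2, 3, 4}, 0.5, rng)));
        System.out.println(Arrays.toString(compareAndSet(new double[] {1.0, 2.0, 1.0000001}, 1.0, 9.0, 1e-6))); // [9.0, 2.0, 9.0]
        System.out.println(Arrays.toString(replaceNaNs(new double[] {Double.NaN, 1.0}, 0.0))); // [0.0, 1.0]
    }
}
```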
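Asynchronous prefetching means a background thread loads the next batches into a bounded buffer while the network trains on the current one. The generic wrapper below is a minimal sketch of that producer/consumer pattern using java.util.concurrent; it is not DL4J's own prefetching wrapper, and the class name and buffer size are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative sketch only: prefetches elements of 'source' on a background daemon thread. */
public class AsyncPrefetchIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // sentinel marking exhaustion of the source
    private final BlockingQueue<Object> queue;
    private Object next; // element taken from the queue but not yet returned

    /** Assumes 'source' yields non-null elements (ArrayBlockingQueue rejects null). */
    public AsyncPrefetchIterator(Iterator<T> source, int bufferSize) {
        this.queue = new ArrayBlockingQueue<>(bufferSize);
        Thread producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks while the buffer is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take(); // blocks until the producer delivers something
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return next != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) next;
        next = null;
        return value;
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new AsyncPrefetchIterator<>(Arrays.asList(1, 2, 3).iterator(), 2);
        int sum = 0;
        while (it.hasNext()) {
            sum += it.next();
        }
        System.out.println(sum); // 6
    }
}
```

The bounded queue caps how many batches sit in memory at once, which is the usual trade-off between hiding I/O latency and keeping the memory footprint small.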