cuDNN

Using the NVIDIA cuDNN library with DL4J.


Using Deeplearning4j with cuDNN

There are two ways of using cuDNN with Deeplearning4j. The newer way uses the ND4J CUDA bindings, which link against cuDNN at the C++ level. The older way is built into the various Deeplearning4j layers at the Java level. Both are described below: the newer way first, followed by the old way.

Cudnn setup

The actual library for cuDNN is not bundled, so be sure to download and install the appropriate NVIDIA cuDNN package for your platform from NVIDIA.

Note there are multiple combinations of cuDNN and CUDA supported. Deeplearning4j's CUDA support is based on javacpp's cuda bindings. The way to read the versioning is: cuda version - cudnn version - javacpp version. For example, if the cuda version is set to 11.2, you can expect us to support cudnn 8.1.

To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (/usr/local/cuda/lib64/ on Linux, /usr/local/cuda/lib/ on Mac OS X, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ on Windows).
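
If you are unsure which directories the JVM will search for native libraries such as cuDNN, a quick check is to print the JVM's library path. This is a minimal sketch; note that JavaCPP also honors the OS loader path (e.g. PATH or LD_LIBRARY_PATH), which java.library.path typically reflects:

public class LibraryPathCheck {
    public static void main(String[] args) {
        // Directories the JVM searches when loading native libraries
        System.out.println(System.getProperty("java.library.path"));
    }
}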

Alternatively, in the case of the most recent supported CUDA version, cuDNN comes bundled with the "redist" package of the JavaCPP Presets for CUDA. After agreeing to the license, we can add the following dependency instead of installing CUDA and cuDNN:

<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>cuda-platform-redist</artifactId>
    <version>$CUDA_VERSION-$CUDNN_VERSION-$JAVACPP_VERSION</version>
</dependency>

The same versioning scheme for redist applies to the cuda bindings that leverage an installed cuda.
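
As a concrete illustration of this versioning scheme, the redist dependency below assumes CUDA 11.2, cuDNN 8.1, and JavaCPP 1.5.5 (the combination used around the 1.0.0-M1.1 release; check the JavaCPP Presets for CUDA for the exact versions matching your release):

<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>cuda-platform-redist</artifactId>
    <!-- cuda version - cudnn version - javacpp version -->
    <version>11.2-8.1-1.5.5</version>
</dependency>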

Using cuDNN via nd4j

Similar to our AVX bindings, ND4J leverages our C++ library libnd4j for running mathematical operations. In order to use cuDNN, all you need to do is change the CUDA backend dependency from:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>

to:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

or for cuda 11.0:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.0</artifactId>
    <version>1.0.0-M1</version>
    <classifier>linux-x86_64-cudnn</classifier>
</dependency>

For the Jetson Nano (CUDA 10.2):

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
</dependency>

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-10.2</artifactId>
  <version>1.0.0-M1.1</version>
  <classifier>linux-arm64</classifier>
</dependency>

Note that we are only adding an additional dependency. The additional classifier pulls in an optional dependency on cuDNN-based routines. The default does not use cuDNN; instead, it uses built-in standalone routines for the various operations that also have cuDNN implementations, such as conv2d and lstm.

For users of the -platform dependencies such as nd4j-cuda-11.2-platform, this classifier is still required. The -platform dependencies try to set sane defaults for each platform, but give users the option to include whatever they want. If you need the cuDNN optimizations, please become familiar with this classifier mechanism.
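
To confirm that the CUDA backend actually loaded at runtime (with or without the cuDNN classifier), a minimal sanity check such as the following can help. This is a sketch using the standard ND4J API, not an official diagnostic tool:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class BackendCheck {
    public static void main(String[] args) {
        // Creating an array forces backend initialization
        INDArray x = Nd4j.rand(2, 2);
        System.out.println(x);

        // Prints the backend class that was loaded; expect a CUDA backend
        // when nd4j-cuda-* is on the classpath
        System.out.println("Backend: " + Nd4j.getBackend().getClass().getName());
    }
}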

Using cuDNN via deeplearning4j

Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer, SubsamplingLayer, etc.), as well as LSTM and BatchNormalization layers, support cuDNN.

The only thing we need to do to have DL4J load cuDNN is to add a dependency on deeplearning4j-cuda-11.0 or deeplearning4j-cuda-11.2, for example:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.0</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

or

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.2</artifactId>
    <version>1.0.0-M1.1</version>
</dependency>

Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try to reduce memory usage by using the NO_WORKSPACE mode settable via the network configuration, instead of the default of ConvolutionLayer.AlgoMode.PREFER_FASTEST, for example:

// for the whole network
new NeuralNetConfiguration.Builder()
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
        // ...

// or separately for each layer
new ConvolutionLayer.Builder(h, w)
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
        // ...
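
For context, here is a minimal sketch of a complete configuration with NO_WORKSPACE set network-wide. The layer sizes and the 28x28 single-channel input are illustrative assumptions, not values from this page:

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class NoWorkspaceExample {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                // Trade some speed for lower cuDNN memory usage
                .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
                .list()
                .layer(new ConvolutionLayer.Builder(3, 3)
                        .nIn(1).nOut(16)
                        .activation(Activation.RELU)
                        .build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nOut(10)
                        .activation(Activation.SOFTMAX)
                        .build())
                // nIn values for later layers are inferred from the input type
                .setInputType(InputType.convolutionalFlat(28, 28, 1))
                .build();
        System.out.println(conf.toJson());
    }
}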
