Native operations for nd4j. Build using CMake.
GCC 4.9+
CUDA Toolkit Versions 10 or 11
CMake 3.8 (as of Nov 2017, in near future will require 3.9)
There are a few additional arguments you can pass to the buildnativeoperations.sh script:
More about AutoVectorization report
You can look up the compute capability for your card on the NVIDIA website, or use auto. Please also check your CUDA Toolkit release notes for supported and dropped features. The same information for older Toolkit versions is available in the CUDA archives.
Download the NDK, extract it somewhere, and execute the following commands, replacing android-xxx with either android-arm or android-x86:
Run ./setuposx.sh (please ensure you have brew installed).
This depends on the distro. Ask in the earlyadopters channel for distro-specific instructions.
The standard development headers are needed.
See Windows.md
Set the LIBND4J_HOME environment variable to the libnd4j folder you obtained from Git.
Note: this is required for building nd4j as well.
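As a minimal sketch of this step, the variable can be exported in the current shell session. The checkout location $HOME/libnd4j is an assumption made for illustration; substitute the directory where you cloned the repository.

```shell
# Assumption: the repository was cloned to $HOME/libnd4j; adjust to your checkout.
export LIBND4J_HOME="$HOME/libnd4j"

# Print the value so it can be checked before starting a build.
echo "LIBND4J_HOME=$LIBND4J_HOME"
```

To keep the setting across sessions, the same export line would typically go into your shell profile.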
To set up the cpu build followed by the gpu build, run the following on the command line:
For standard builds:
For Debug builds:
For release builds (default):
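The build variants listed above can be sketched as follows. This is an illustration under stated assumptions rather than the project's documented procedure: the run wrapper and the DRY_RUN variable are hypothetical conveniences that print each command instead of executing it, -b release is assumed to be accepted in the same way as the -b debug flag used elsewhere in this document, and -cc auto asks the script to detect the compute capability.

```shell
# Hypothetical wrapper: prints each command unless DRY_RUN=0 is set,
# in which case the command is executed from the libnd4j checkout.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cpu first, then gpu.
run ./buildnativeoperations.sh
run ./buildnativeoperations.sh -c cuda -cc auto

# Debug build.
run ./buildnativeoperations.sh -b debug

# Release build, stated explicitly (assumed equivalent to the default).
run ./buildnativeoperations.sh -b release
```

Running the snippet as written only prints the four command lines, which makes it easy to review them before setting DRY_RUN=0.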
OpenMP 4.0+ should be used to compile libnd4j. This shouldn't be any trouble, since OpenMP 4.0 was released in 2013 and should be available on all major platforms.
We can link with MKL either at build time, or at runtime with binaries initially linked with another BLAS implementation such as OpenBLAS. In either case, simply add the path containing libmkl_rt.so (or mkl_rt.dll on Windows), say /path/to/intel64/lib/, to the LD_LIBRARY_PATH environment variable on Linux (or PATH on Windows), and build or run your Java application as usual. If you get an error message like undefined symbol: omp_get_num_procs, it probably means that libiomp5.so, libiomp5.dylib, or libiomp5md.dll is not present on your system. In that case though, it is still possible to use the GNU version of OpenMP by setting these environment variables on Linux, for example:
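A minimal sketch of such settings, assuming a Linux system: MKL_THREADING_LAYER=GNU selects MKL's GNU OpenMP threading layer, and preloading libgomp makes the OpenMP symbols available. The libgomp path and the GOMP_LIB variable are assumptions for illustration; the library location differs between distributions.

```shell
# Ask MKL to use the GNU OpenMP threading layer instead of Intel's libiomp5.
export MKL_THREADING_LAYER=GNU

# Assumption: libgomp lives at this path; it varies by distribution
# (for example /usr/lib/x86_64-linux-gnu/libgomp.so.1 on Debian or Ubuntu).
GOMP_LIB="${GOMP_LIB:-/usr/lib64/libgomp.so.1}"

if [ -f "$GOMP_LIB" ]; then
    # Preload GNU OpenMP so symbols such as omp_get_num_procs resolve.
    export LD_PRELOAD="$GOMP_LIB${LD_PRELOAD:+:$LD_PRELOAD}"
else
    echo "libgomp not found at $GOMP_LIB; set GOMP_LIB to the correct path" >&2
fi
```

The existence check avoids exporting an LD_PRELOAD entry that the dynamic linker would warn about on every subsequent command.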
Sometimes the steps above are not enough. An additional step that may be needed is to add:
This ensures that MKL will be found first and linked to.
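As a sketch of what putting MKL first on the library search path can look like on Linux: the directories below assume an installation under /opt/intel and are illustrative only, and MKL_LIB_DIRS is a hypothetical helper variable.

```shell
# Assumption: MKL is installed under /opt/intel; replace with your install prefix.
MKL_LIB_DIRS="/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64"

# Prepend, so the dynamic linker searches the MKL directories before any
# directory that holds another BLAS implementation.
export LD_LIBRARY_PATH="$MKL_LIB_DIRS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "$LD_LIBRARY_PATH"
```

Prepending rather than appending is the point here: the linker uses the first matching library it finds along the path.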
If you are on Ubuntu (14.04 or above) or CentOS (6 or above), this repository is also set up to create packages for your distribution. Let's assume you have built:
for the cpu, your command-line was ./buildnativeoperations.sh ...:
for the gpu, your command-line was ./buildnativeoperations.sh -c cuda ...:
The package upload script is in packaging. The upload command for an rpm built for cpu is:
The upload command for a deb package built for cuda is:
Tests are written with gtest, run using cmake. Tests are currently under tests_cpu/
There are 2 directories for running tests:
libnd4j_tests: These are older legacy ops tests.
layers_tests: This covers the newer graph operations and ops associated with samediff.
We currently run the tests using cmake or CLion.
Running the tests with the CUDA backend is a very similar process:
./buildnativeoperations.sh -c cuda -cc YOUR_ARCH -b debug -t -j NUMBER_OF_CORES
./blasbuild/cuda/tests_cpu/layers_tests/runtests (.exe on Windows)
To extend and update libnd4j, it is key to understand its various cmake flags. Many of them are in buildnativeoperations.sh. The pom.xml is used to integrate and auto-configure the project for building with deeplearning4j.
At a minimum, you will want to enable tests. An example default set of flags for running tests and getting cpu builds working is as follows:
The main build script dynamically generates a set of flags suitable for building the project. Understanding the build script will go a long way toward configuring cmake for your particular IDE.
| -cc and --compute option examples | description |
|---|---|
| -cc all | builds for common GPUs |
| -cc auto | tries to detect automatically |
| -cc Maxwell | GPU microarchitecture codename |
| -cc 75 | compute capability 7.5 without a dot |
| -cc 7.5 | compute capability 7.5 with a dot |
| -cc "Maxwell 6.0 7.5" | space-separated multiple arguments within quotes (note: numbers only with a dot) |
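The rule in the last entry above (numbers in a multi-value list must contain a dot) can be checked before starting a long build. The cc_list_ok function below is a hypothetical helper written for illustration; it is not part of buildnativeoperations.sh and only mirrors the rule as stated here.

```shell
# Hypothetical helper: returns 0 if a -cc value follows the rule above.
# A single value may be written with or without a dot (75 or 7.5);
# in a space-separated list every numeric entry must contain a dot.
cc_list_ok() {
    set -- $1                      # split the value on whitespace
    if [ "$#" -le 1 ]; then
        return 0
    fi
    for item in "$@"; do
        case "$item" in
            *[!0-9.]*) ;;          # not purely numeric: treated as a codename such as Maxwell
            *.*) ;;                # number with a dot, such as 7.5
            *) return 1 ;;         # bare number such as 75 inside a list
        esac
    done
    return 0
}

cc_list_ok "Maxwell 6.0 7.5" && echo "accepted: Maxwell 6.0 7.5"
cc_list_ok "Maxwell 60 75" || echo "rejected: Maxwell 60 75"
```

A value that passes the check would then be handed to the script in quotes, as in -cc "Maxwell 6.0 7.5".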