# Overview of working with libnd4j

Native operations for nd4j, built using cmake.

## Prerequisites

* GCC 4.9+
* CUDA Toolkit versions 10 or 11
* CMake 3.8 (as of Nov 2017; 3.9 will be required in the near future)
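Assuming a POSIX shell with GNU sort available, a quick check that the local toolchain meets these minimums might look like the sketch below (the version_at_least helper is ours, not part of the repository):

```shell
# version_at_least A B: succeeds if version A >= version B (relies on GNU `sort -V`)
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check gcc (needs 4.9+) and cmake (needs 3.8+); either tool may be absent, hence the fallbacks
gcc_ver=$(gcc -dumpversion 2>/dev/null || echo 0)
cmake_ver=$(cmake --version 2>/dev/null | head -n1 | awk '{print $3}')
version_at_least "${gcc_ver}" 4.9      && echo "gcc ${gcc_ver}: OK"     || echo "gcc too old or missing"
version_at_least "${cmake_ver:-0}" 3.8 && echo "cmake ${cmake_ver}: OK" || echo "cmake too old or missing"
```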

## Additional build arguments

There are a few additional arguments for the buildnativeoperations.sh script you can use:

```shell
-a XXXXXXXX             // shortcut for -march/-mtune, i.e. -a native
-b release OR -b debug  // enables/disables debug builds. release is the default
-j XX                   // how many threads will be used to build binaries on your box, i.e. -j 8
-cc XX                  // CUDA-only argument, builds binaries only for the target GPU architecture. Use this for fast builds
--check-vectorization   // auto-vectorization report for developers (currently, only GCC is supported)
```

More about the AutoVectorization report

You can provide the compute capability for your card (listed on the NVIDIA website) or use auto. Please also check your CUDA Toolkit release notes for supported and dropped features; see the latest CUDA Toolkit release note. You can find the same information for older Toolkit versions in the CUDA archives.

| -cc and --compute option examples | description |
| --- | --- |
| -cc all | builds for common GPUs |
| -cc auto | tries to detect automatically |
| -cc Maxwell | GPU microarchitecture codename |
| -cc 75 | compute capability 7.5 without a dot |
| -cc 7.5 | compute capability 7.5 with a dot |
| -cc "Maxwell 6.0 7.5" | space-separated multiple arguments within quotes (note: numbers only with a dot) |

For standard builds:

```shell
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
```

For debug builds:

```shell
./buildnativeoperations.sh blas -b debug
./buildnativeoperations.sh blas -c cuda -cc YOUR_DEVICE_ARCH -b debug
```

For release builds (default):

```shell
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
```
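To find the value to pass to -cc, you can ask the driver directly. This is a sketch: the compute_cap query field only exists in recent nvidia-smi versions, and the cc_for_build helper is ours, not part of the repository:

```shell
# cc_for_build: turn a compute capability like 7.5 into the dotless form -cc expects
cc_for_build() { echo "$1" | tr -d '.'; }

# on a machine with a recent NVIDIA driver this queries the card directly;
# older drivers may not support the compute_cap field, hence the fallback message
nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null \
    || echo "could not query GPU; pass -cc manually or use -cc auto"

cc_for_build 7.5   # prints 75
```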

## OS Specific Requirements

### Android

Download the NDK, extract it somewhere, and execute the following commands, replacing android-xxx with either android-arm or android-x86:

```shell
git clone https://github.com/eclipse/deeplearning4j
export ANDROID_NDK=/path/to/android-ndk/
cd deeplearning4j/libnd4j
bash buildnativeoperations.sh -platform android-xxx
cd ../nd4j
mvn clean install -Djavacpp.platform=android-xxx -DskipTests -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'
```

### OSX

Run ./setuposx.sh (please ensure you have brew installed).

See macOSx10 CPU only.md.

### Linux

Depends on the distro - ask in the earlyadopters channel for specifics on your distro.

#### Ubuntu Linux 15.10

```shell
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo apt-get install cmake
sudo apt-get install gcc-4.9
sudo apt-get install g++-4.9
sudo apt-get install git
git clone https://github.com/deeplearning4j/libnd4j
cd libnd4j/
export LIBND4J_HOME=~/libnd4j/
sudo rm /usr/bin/gcc
sudo rm /usr/bin/g++
sudo ln -s /usr/bin/gcc-4.9 /usr/bin/gcc
sudo ln -s /usr/bin/g++-4.9 /usr/bin/g++
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
```

#### Ubuntu Linux 16.04

The standard development headers are needed.

```shell
sudo apt install cmake
sudo apt install nvidia-cuda-dev nvidia-cuda-toolkit nvidia-361
export TRICK_NVCC=YES
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
```

#### CentOS 6

```shell
yum install centos-release-scl-rh epel-release
yum install devtoolset-3-toolchain maven30 cmake3 git
scl enable devtoolset-3 maven30 bash
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
```

### Windows

See Windows.md.

## Setup for All OS

1. Set LIBND4J_HOME as an environment variable pointing to the libnd4j folder you obtained from Git.
   * Note: this is required for building nd4j as well.
2. To build for cpu followed by gpu, run the following on the command line:

```shell
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
```
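Step 1 above can be sketched as follows; the clone location is only an example, so adjust the path to wherever you checked out libnd4j:

```shell
# point LIBND4J_HOME at your libnd4j checkout (example path; adjust as needed)
export LIBND4J_HOME="$HOME/libnd4j"

# warn early if the path does not exist yet, since the nd4j build relies on it too
[ -d "$LIBND4J_HOME" ] || echo "note: $LIBND4J_HOME does not exist yet - clone libnd4j there first"
```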

## OpenMP support

OpenMP 4.0+ should be used to compile libnd4j. This shouldn't be any trouble, since OpenMP 4.0 was released in 2013 and is available on all major platforms.
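One way to confirm which OpenMP spec your compiler supports is to inspect the _OPENMP date macro it predefines. The mapping function below is our own sketch, and the probe assumes gcc is on the PATH:

```shell
# openmp_version: map the _OPENMP date macro (yyyymm) to the spec version it denotes
openmp_version() {
    case "$1" in
        201811) echo "5.0" ;;
        201511) echo "4.5" ;;
        201307) echo "4.0" ;;
        201107) echo "3.1" ;;
        *)      echo "unknown" ;;
    esac
}

# probe the compiler: with -fopenmp, gcc predefines _OPENMP to the supported spec date
date_macro=$(echo | gcc -fopenmp -dM -E - 2>/dev/null | awk '$2 == "_OPENMP" {print $3}')
echo "compiler supports OpenMP $(openmp_version "${date_macro:-0}")"
```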

## Linking with MKL

We can link with MKL either at build time, or at runtime with binaries initially linked with another BLAS implementation such as OpenBLAS. In either case, simply add the path containing libmkl_rt.so (or mkl_rt.dll on Windows), say /path/to/intel64/lib/, to the LD_LIBRARY_PATH environment variable on Linux (or PATH on Windows), and build or run your Java application as usual. If you get an error message like undefined symbol: omp_get_num_procs, it probably means that libiomp5.so, libiomp5.dylib, or libiomp5md.dll is not present on your system. In that case, it is still possible to use the GNU version of OpenMP by setting these environment variables on Linux, for example:

```shell
export MKL_THREADING_LAYER=GNU
export LD_PRELOAD=/usr/lib64/libgomp.so.1
```
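When extending LD_LIBRARY_PATH it is easy to accumulate duplicates across shells; a small helper like the sketch below avoids that. The helper name and the MKL directory are our own examples, not part of the repository:

```shell
# prepend_ld_path DIR: put DIR at the front of LD_LIBRARY_PATH unless it is already there
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, do nothing
        *) export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
}

# example MKL location; your install prefix may differ
prepend_ld_path /opt/intel/mkl/lib/intel64
echo "$LD_LIBRARY_PATH"
```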

## Troubleshooting MKL

Sometimes the steps above might not be all you need to do. An additional step might be to add MKL to the library path:

```shell
export LD_LIBRARY_PATH=/opt/intel/lib/intel64/:/opt/intel/mkl/lib/intel64
```

This ensures that MKL will be found first and linked to.

## Packaging

If on Ubuntu (14.04 or above) or CentOS (6 or above), this repository is also set up to create packages for your distribution. Let's assume you have built:

* for the cpu, with the command line ./buildnativeoperations.sh ...:

```shell
cd blasbuild/cpu
make package
```

* for the gpu, with the command line ./buildnativeoperations.sh -c cuda ...:

```shell
cd blasbuild/cuda
make package
```

## Uploading package to Bintray

The package upload script is in packaging. The upload command for an rpm built for cpu is:

```shell
./packages/push_to_bintray.sh myAPIUser myAPIKey deeplearning4j blasbuild/cpu/libnd4j-0.8.0.fc7.3.1611.x86_64.rpm https://github.com/deeplearning4j
```

The upload command for a deb package built for cuda is:

```shell
./packages/push_to_bintray.sh myAPIUser myAPIKey deeplearning4j blasbuild/cuda/libnd4j-0.8.0.fc7.3.1611.x86_64.deb https://github.com/deeplearning4j
```

## Running tests

Tests are written with gtest and run using cmake. Tests are currently under tests_cpu/.

There are 2 directories for running tests:

1. libnd4j_tests: these are older legacy ops tests.
2. layers_tests: this covers the newer graph operations and ops associated with samediff.

We currently use cmake or CLion to run the tests.

Running tests with the CUDA backend is much the same process:

1. ./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH -b debug -t -j XX
2. ./blasbuild/cuda/tests_cpu/layers_tests/runtests (.exe on Windows)
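Since the test runner is a gtest binary, standard gtest flags can narrow a run to a single suite. The binary path below assumes a default cpu build and the filter pattern is only an example:

```shell
# path produced by a default cpu build; adjust for cuda (blasbuild/cuda/...) or Windows (.exe)
RUNTESTS=./blasbuild/cpu/tests_cpu/layers_tests/runtests

if [ -x "$RUNTESTS" ]; then
    "$RUNTESTS" --gtest_list_tests                     # enumerate available tests
    "$RUNTESTS" --gtest_filter='DeclarableOpsTests*'   # run only matching suites
else
    echo "runtests not built yet - build with ./buildnativeoperations.sh ... -t first"
fi
```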

## Development

In order to extend and update libnd4j, understanding libnd4j's various cmake flags is key. Many of them are set in buildnativeoperations.sh. The pom.xml is used to integrate and auto-configure the project for building with deeplearning4j.

At a minimum, you will want to enable tests. An example default set of flags for running tests and getting cpu builds working is as follows:

```shell
-DSD_CPU=true -DBLAS=TRUE -DSD_ARCH=x86-64 -DSD_EXTENSION= -DSD_LIBRARY_NAME=nd4jcpu -DSD_CHECK_VECTORIZATION=OFF -DSD_SHARED_LIB=ON -DSD_STATIC_LIB=OFF -DSD_BUILD_MINIFIER=false -DSD_ALL_OPS=true -DCMAKE_BUILD_TYPE=Release -DPACKAGING=none -DSD_BUILD_TESTS=OFF -DCOMPUTE=all -DOPENBLAS_PATH=C:/Users/agibs/.javacpp/cache/openblas-0.3.10-1.5.4-windows-x86_64.jar/org/bytedeco/openblas/windows-x86_64 -DDEV=FALSE -DCMAKE_NEED_RESPONSE=YES -DMKL_MULTI_THREADED=TRUE -DSD_BUILD_TESTS=YES
```

The main build script dynamically generates a set of flags suitable for building the project. Understanding the build script will go a long way toward configuring cmake for your particular IDE.
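To experiment with a smaller flag set outside the build script, a configure step might look like this sketch. The flag subset and directory layout are assumptions; buildnativeoperations.sh normally generates all of this for you:

```shell
# a reduced flag set for a cpu build with tests enabled (subset of the example above)
SD_FLAGS="-DSD_CPU=true -DSD_ALL_OPS=true -DSD_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Debug"
echo "would configure with: cmake $SD_FLAGS <libnd4j source dir>"

# inside a real libnd4j checkout you would run something like:
#   mkdir -p blasbuild/cpu && cd blasbuild/cpu
#   cmake $SD_FLAGS ../..
#   make -j 4
```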
