How to conduct a release to Maven Central
A Deeplearning4j release has several steps. Below is a brief outline, followed by more detailed descriptions of each step.
1. Compile libnd4j for the different cpu architectures
2. Ensure the current javacpp dependencies (python, mkldnn, cuda, ..) are up to date
3. Run all integration tests on the core platforms (windows, mac, linux) with both cpu and gpu
4. Create a staging repository for testing, using github actions run manually on each platform
5. Update the examples to be compatible with the latest release
6. Run the deeplearning4j-examples as a litmus test on all platforms (including embedded) to sanity check platform specific numerical bugs using the staging repository
7. Double check any user related bugs to see if they should block the release
8. Hit the release button
9. Perform a follow-up release of the -platform projects under the same version
10. Tag the release
Compiling libnd4j separately for different cpu architectures ensures there is platform-optimized math in c++ for each platform. The single code base is a self contained cmake project that can be run on different platforms. Each github actions workflow contains steps for building and deploying on each platform.
At the core of compiling libnd4j from source is a maven pom.xml that, as part of the overall build process, invokes our build script with various parameters, which are then passed to the overall cmake structure for compilation. This script exists to formalize some of the required parameters for invoking cmake. Any developer is welcome to invoke cmake directly.
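For reference, a minimal sketch of what a direct invocation of the build script might look like, assuming the buildnativeoperations.sh script in the libnd4j root directory (referenced later in this document). The flag names shown are assumptions; check the script's help output for the authoritative list.

```bash
cd libnd4j
# Hedged sketch: flag names (-b build type, -c chip, -j build threads) are assumptions;
# run ./buildnativeoperations.sh -h to confirm them before relying on this.
./buildnativeoperations.sh -b release -c cpu -j 4
```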
Platform compatibility
We currently compile libnd4j on ubuntu 16.04. This means glibc 2.23.
For our cuda builds, we use gcc7.
Users of older glibc versions may need to compile from source. For our standard release, we try to keep the baseline reasonably old, but we do not support end-of-life linux distributions for public builds.
Platform specific helpers
Each build of libnd4j links against an accelerated backend for blas and convolution operations such as onednn, cudnn, or armcompute. The implementations for each platform can be found here.
This step just ensures that the dl4j release matches the current state of the dependencies provided by javacpp on maven central. This affects every module, including python4j, nd4j-native/cuda, and datavec-image, among others. The versions of everything can be found in the top level deeplearning4j pom. The general convention is the library version followed by a - and the version of javacpp that that library build uses.
Of note here is that certain older versions of libraries can use older javacpp versions. It is recommended that the desired version be up to date if possible. Otherwise, if an older version of javacpp is the only version available, this is generally ok.
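As an illustration of the convention, a version string such as 0.3.10-1.5.4 would mean the library version 0.3.10 paired with javacpp 1.5.4. A hedged sketch for checking what the top level pom currently declares; the library names grepped for are examples and may not all appear as properties in the actual pom:

```bash
# List version properties in the top-level pom and keep the javacpp-related ones.
# The filter terms are examples only; adjust to the dependencies you care about.
grep -n "\.version>" pom.xml | grep -E "javacpp|openblas|cuda|cudnn"
```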
We run all of the major integration tests on the core platforms where higher end compute is accessible. This generally means a bigger machine. It is expected that some test runs can take up to 2 hours depending on the specs of the machine.
This step may also involve invoking tests with specific tags if only running a subset of tests is desired. This can be achieved using the surefire plugin -Dgroups flag.
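For example, a hedged sketch of running only a tagged subset of tests; the tag name used here is a placeholder and should be replaced with one of the actual tag names from the code base:

```bash
# Run only tests carrying a given junit 5 tag via the surefire -Dgroups flag.
# "long-running-tests" is a placeholder tag name; substitute a real tag from the code base.
mvn test -Pnd4j-tests-cpu -Dgroups="long-running-tests"
```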
To ensure the examples stay compatible with the current release, we also tag the release version to be the latest version found on maven central. This step may also involve adding or removing examples for new or deprecated features respectively. This also means checking the following:
Different supported cuda versions with and without cudnn
Onednn and associated classifiers per platform
Ensure testing happens on the android emulator.
The examples contain a set of tests which allow us to run maven clean test on a small number of examples. Instead of picking examples manually, we can run mvn clean test on any platform we need by specifying a version of dl4j to depend on and, usually, a staging repository.
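A hedged sketch of that litmus test, assuming the staging repository has already been added to the examples' pom or settings.xml and that the examples expose a property for the dl4j version (the property name used here is an assumption; check the examples' pom for the real one):

```bash
cd deeplearning4j-examples
# Run the examples' smoke tests against a release candidate from the staging repository.
# -Ddl4j.version is a hypothetical property name; the version value is only an example.
mvn clean test -Ddl4j.version=1.0.0-M2
```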
Sometimes users will raise issues right before a release that can be critical. It is at the sole discretion of the maintainers whether to ask the user to use snapshots or to wait for a follow-up version. For certain fixes, we will publish quick bugfix releases. If your team has specific requirements for a release, please contact us on the community forums.
After closing a staging repository, hitting the release button initiates a sync of the staging repository with the desired version to maven central. The sync usually takes 2 hours or less.
After a release happens, a version update to the stable version plus a github tag needs to happen. This is achieved in the desktop app by going to:
1. History
2. Right click on the target commit you want to tag
3. Click tag
4. Push the revision
5. Update the version back to snapshot after the tag.
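The same tagging can also be done from the git command line; a minimal sketch, where the tag name is only an example and the commit hash is a placeholder you must fill in:

```bash
# Tag the release commit and push the tag. RELEASE_COMMIT is a placeholder for
# the hash of the commit being released; the tag name is an example value.
RELEASE_COMMIT="<fill in commit hash>"
git tag -a -m "1.0.0-M2 release" 1.0.0-M2 "$RELEASE_COMMIT"
git push origin 1.0.0-M2
```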
test.heap.size: The heap size used for maven surefire plugin sub processes
test.offheap.size: The off heap size used for maven surefire sub processes. This is very important for configuration (especially on gpu systems).
In order to run the deeplearning4j tests, many pretrained models and other resources are required. Ensure the dl4j test resources are on your classpath as a dependency. It is a big repository that needs to be mvn clean installed in order to run the tests properly. You can put the resources on the classpath by adding -Ptestresources to your test execution when running the tests from maven.
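A hedged sketch of preparing those resources, assuming the test resources repository has already been checked out locally (the directory name is a placeholder):

```bash
# Install the test resources into the local maven repository. This is a large build;
# the directory name below is a placeholder for wherever the repository was cloned.
cd dl4j-test-resources
mvn clean install -DskipTests
```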
When running deeplearning4j's tests, there are 2 main profiles to be aware of: nd4j-tests-cpu and nd4j-tests-cuda. These each enable running cpu or gpu tests respectively across the whole code base. Please ensure one of these is selected when running tests.
testresources: Used to add the test resources used for nd4j.
Deeplearning4j uses junit 5's tags to categorize tests into different types. All of the tag names used throughout the code base can be found here. Nd4j-common-tests is included as a dependency for all tests and has a few reusable utilities used throughout the code base for tests. This makes it a great location to put common utilities we want to use throughout the code base. The tag names are mainly there to categorize tests that can take longer or use more resources, so we can avoid running those dynamically depending on the size of the machine we are running tests on.
Note that when running gpu tests on a box with more than 1 gpu, tests can and will run out of memory if test.heap.size is not at least 4g.
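Putting the above together, a hedged sketch of a full local test run; the heap and off heap sizes are illustrative values rather than recommendations:

```bash
# Run the cpu test profile with the test resources profile enabled, and give the
# surefire sub processes explicit heap / off heap sizes (values are illustrative).
mvn test -Pnd4j-tests-cpu -Ptestresources \
    -Dtest.heap.size=4g -Dtest.offheap.size=4g
```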
Github Actions Configuration Overview
Each workflow has 10 parameters for manually invoking builds. The reason this is manual is the number of different ways a release can break. Being manual also allows us to re-invoke only the parts of a build we need, rather than the whole release pipeline.
Most workflows implement a matrix structure for handling the different combinations of builds related to the following:
1. Platform specific optimizations: on windows/linux/mac we allow cpu plus optional linking against mkldnn. Each combination is enumerated and run as part of a matrix build on github actions.
2. Cuda, optional cudnn: we also allow optional linking against cudnn for gpu routines.
buildThreads: This is the number of build threads used for compilation in libnd4j. This is the equivalent of make -j. For specific platforms that use more memory, 1 is the recommended value. On self hosted setups, you may use more threads to make builds run faster.
deployToReleaseStaging: 0 or 1. If 1, this will create a staging repository on oss sonatype. Otherwise, it will deploy to ossrh snapshots. Deploying to snapshots is the default.
releaseVersion: This is the intended release version that snapshots will be converted to. The version-change script is run to convert every module from the snapshot version to this specific release version, which is what gets uploaded to a staging repository for release. Otherwise, all versions should remain SNAPSHOT.
snapshotVersion: The current in development snapshot version
releaseRepoId: If blank, then a new staging repository for a version is created. Otherwise, a staging repository id should be obtained from the ossrh nexus sonatype. This releaseRepoId should be passed to subsequent builds so all of the artifacts associated with a version get propagated to 1 place.
serverId: This should be ossrh 90% of the time. A github profile is also available for use with github actions.
modules: The maven modules to build. This is fairly raw and error prone. Typical usage is to pass flags that skip the libnd4j compile (see the hedged example after this parameter list), which can speed builds up significantly.
libnd4jDownload/libnd4jUrl: In tandem with modules, you can specify a previously compiled libnd4j zip file distribution for download. The build will download that libnd4j distribution and use it for linking. This can be handy when recompiling the nd4j-native/nd4j-cuda backends for a specific platform without needing to recompile the whole c++ codebase. In a matrix build, the url is sourced from a hard coded file name - each file name will be updated to point to a zip file distribution appropriate for an individual matrix build. This was done because one url is not going to be suitable for all of the individual matrix builds.
runsOn: This is the operating system upon which to run the build. For linux, this defaults to ubuntu-16.04. For windows, windows-2019. self-hosted can also be specified for faster builds.
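As referenced in the modules parameter above, here is a hedged sketch of manually dispatching one of these workflows from the command line with the github cli. The workflow file name and the exact maven module exclusion syntax are assumptions; the input names mirror the parameters described above:

```bash
# Hypothetical workflow file name; list the real ones with: gh workflow list
gh workflow run build-deploy-linux-x86_64.yml \
  -f buildThreads=2 \
  -f deployToReleaseStaging=1 \
  -f releaseVersion=1.0.0-M2 \
  -f snapshotVersion=1.0.0-SNAPSHOT \
  -f serverId=ossrh \
  -f modules='-pl !libnd4j'   # hedged example of skipping the libnd4j compile via maven module exclusion
```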
Many configurations on cpu and cuda require a matrix based build structure to capture the various combinations of optimization and software versions people may want to use. In order to accommodate these workflows, we need to attach variables proxying the values of the manual inputs to the individual matrix workers themselves. These parameters are analogous to the parameters described above. We will not repeat the descriptions here; the values appear in the form of $SOME_VALUE, where SOME_VALUE is one of the values above.
The configuration to look for is as follows:
CUDA: Most cuda builds take 4-5 hours. Both windows and linux on GH actions just download the cuda distribution and compile things on their respective platforms.
CPU builds: From scratch libnd4j + cpu builds typically take 1-2 hours at most. If a build takes much longer than that, something may be wrong.
Out of disk: It is very common for a github actions VM to run out of disk. If a build fails with no logs and all steps terminated, this may be one of the reasons.
Out of memory: Sometimes builds run out of memory. A few common causes include:
Clang out of memory on android: depending on the number of build threads assigned, it is easy for clang to run out of memory
Maven javadoc: The maven javadoc plugin for bigger projects can use a ton of ram and crash a job
Network failures: Maven can sometimes (rarely) fail to download certain dependencies in the middle of a job
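Where the javadoc plugin is the culprit, one hedged mitigation for a diagnostic run is to skip javadoc generation; this uses a standard maven javadoc plugin property rather than anything specific to these workflows:

```bash
# Skip javadoc generation to reduce memory pressure while diagnosing a failing job.
mvn -Dmaven.javadoc.skip=true clean install
```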
MAVEN_GPG_KEY: The maven gpg key secret for a release
CROSS_COMPILER_DIR: For the pi_build.sh script in libnd4j. This contains the root directory for cross compiler invocation. We need this because all cross compilation for the various libnd4j builds happens on x86. We cross compile for speed reasons, which also easily allows us to run on github actions.
DEBIAN_FRONTEND: This is set to ensure that debian (apt) commands do not prompt for yes/no confirmation by default.
GITHUB_TOKEN: This is for authentication with github actions
BUILD_USING_MAVEN: This is for pi_build.sh. It toggles (0 or 1) whether to use maven or to use buildnativeoperations.sh in the libnd4j root directory directly.
NDK_VERSION: Default is r21d. Libnd4j's android builds are currently compiled with android NDK r21.
CURRENT_TARGET: This variable is for pi_build.sh. It tells pi_build.sh which architecture to build for.
PUBLISH_TO: The repo to publish to for releases or snapshots. Valid values are github or ossrh.
These are repositories defined in the deeplearning4j root pom.
OPENBLAS_PATH: We compile libnd4j against openblas for several different cpus. Openblas is manually downloaded and linked against; this variable specifies the path to that download for the libnd4j cmake invocation.
MAVEN_USERNAME: The username used to log in to the ossrh maven repository
MAVEN_PASSWORD: The password used to log in to the ossrh maven repository
MAVEN_GPG_PASSPHRASE: The gpg passphrase used for signing artifacts for uploading to maven central
DEPLOY_TO: Valid values are either ossrh or github.
LIBND4J_BUILD_THREADS: This is the equivalent of make -j. It specifies the number of threads
that should be used to compile libnd4j
PERFORM_RELEASE: Whether to perform a release or not (0 or 1)
RELEASE_VERSION: The version to be released to maven central. change-versions.sh will be run
to change versions throughout the code base from the snapshot version to the intended release version.
SNAPSHOT_VERSION: The current snapshot version to be changed when performing a release.
After a release is conducted, this should generally be the next development version.
RELEASE_REPO_ID: Leave this empty when first creating a release repository in combination with
DEPLOY set to 1. Afterwards, note which staging repository id gets created in the ossrh interface when publishing
to maven central. Use that id for further builds to ensure that all uploads for one version are synchronized to one staging repository.
MODULES: Extra maven flags for pi_build.sh if more flags are needed (such as for debugging or only building specific modules)
LIBND4J_URL: Used when building nd4j-native. If you do not want to recompile libnd4j for a particular build, you can instead skip that step and specify a libnd4j zip file distribution to download (generally built with the maven assembly plugin).
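For a local reproduction of what the workflows do, a hedged sketch of exporting a few of the variables above before invoking pi_build.sh; the values are placeholders, and only the variable names come from the descriptions in this document:

```bash
# Placeholder values; only the variable names are taken from the descriptions above.
export CURRENT_TARGET=armv7          # target architecture for pi_build.sh (placeholder value)
export BUILD_USING_MAVEN=1           # 1 = drive the build through maven
export PUBLISH_TO=ossrh              # repository to publish to
export DEPLOY_TO=ossrh
export LIBND4J_BUILD_THREADS=2       # equivalent of make -j
export NDK_VERSION=r21d              # default noted above
bash libnd4j/pi_build.sh
```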