Github actions Configuration Overview
Each github actions workflow has 10 parameters for manually invoking builds. The reason this is manual is due to the different ways a release can break. Being manual also allows us to re invoke only the parts of a build we need, rather than the whole release pipeline.
Most workflows implement a matrix structure for handling different combinations of builds related to the following: 1. Platform specific optimizations: On windows/linux/mac we allow cpu + optional linking against mkldnn. Each combination is enumerated and ran as part of a matrix build on github actions.
Cuda, optional cudnn: We also allow optional linking against cudnn for gpu routines.
buildThreads: This is the number of builds threads used for compilation in linbnd4j. This is the equivalent of make -j. For specific platforms that use more memory, 1 is the recommended value. On self hosted setups, you may use more threads to make builds run faster.
deployToReleaseStaging: 0 or 1. If 1, this will create a staging repository on oss sonatype. Otherwise, it will deploy to ossrh snapshots. Snapshots is the default.
releaseVersion: This is the intended release version to be converted to from snapshots. The update-versions.sh script is run converting the versions of every module to that specific version intended for release. This is what will get uploaded to a staging repository for release. Otherwise, all intended versions should be SNAPSHOT.
snapshotVersion: The current in development snapshot version
releaseRepoId: If blank, then a new staging repository for a version is created. Otherwise, a staging repository id should be obtained from the ossrh nexus sonatype. This releaseRepoId should be passed to subsequent builds so all of the artifacts associated with a version get propagated to 1 place.
serverId: This should be ossrh 90% of the time. A github profile is also available for use with github actions.
modules: The maven modules to build. This is fairly raw and error prone. The intended usage is with the -pl/--projects flag Typical usage is to skip libnd4j builds with something like:
to skip a libnd4j compile. This can speed builds up significantly.
libnd4jDownload/libnd4jUrl: In tandem with modules, you can specify a libnd4j zip file distribution that was compiled before for download. The builds will download a libnd4j distribution and use that for linking. This can be handy when recompiling the nd4j-native/nd4j-cuda backends for a specific platform without needing to recompile the whole c++ codebase. A url in a matrix build will be sourced from a hard coded file name from this repo - each file name will be updated to point to a zip file distribution appropriate for an individual matrix build. This was done because 1 url is not going to be suitable for individual matrix builds.
runsOn: This is the operating system upon which to run the build. For linux, this defaults to ubuntu-16.04. For windows, windows-2019. self-hosted can also be specified for faster builds.
Many configurations on cpu and cuda require a matrix based build structure to capture the various combinations of optimization and software versions people may want to use. In order to accomodate these workflows, we need to attach variables proxying the values of the manual inputs to the individual matrix workers themselves. These parameters are analogous to the above described parameters. Note we will not repeat the descriptions here, but the values can be seen from their values in the form of $ where SOME_VALUE is one of the values above.
The configuration to look for is as follows:
CUDA: Most cuda builds take 4-5 hours. Both windows and linux on GH actions just download the cuda distribution and compile things on their respective platforms.
CPU builds: From scratch libnd4j + cpu builds typically take 1-2 hours max. Anything more than that, your build may have something wrong.
Out of disk: It is very common for a github actions VM to run out of disk. If a build fails with no logs after and all steps terminated, this maybe one of the reasons.
Out of memory: Sometimes builds run out of memory. A few common causes include:
Clang out of memory on android, depending on the number of builds threads assigned, it is easy for clang to run out of memory
Maven javadoc: The maven javadoc plugin for bigger projects can use a ton of ram and crash a job
Network failures: Maven can sometimes (rarely) fail to download certain dependencies in the middle of a job
MAVEN_GPG_KEY: The maven gpg key secret for a release
CROSS_COMPILER_DIR: For the pi_build.sh script in libnd4j. This contains the root directory
for cross compiler invocation. We need this because all cross compilation for various libnd4j builds happens
on x86. We cross compile for speed reasons also easily allowing us to run on github actions.
Debian frontend: This is to ensure that all debian commands by default don't prompt for yes/no
GITHUB_TOKEN: This is for authentication with github actions
BUILD_USING_MAVEN: This is for pi_build.sh. This toggles (0 or 1) whether to use maven or buildnativeoperation.sh
in the libnd4j root directory directly.
NDK_VERSION: Default is r21d. Libnd4j's android is compiled with the android r21 currently.
CURRENT_TARGET: This variable is for pi_build.sh. It tells pi_build.sh which architecture to build for.
PUBLISH_TO: The repo to publish to for releases or snapshots. Valid values are github or ossrh.
These are repositories defined in the deeplearning4j root pom.
OPENBLAS_PATH: We compile libnd4j against openblas for several different cpus. Openblas is manually downloaded and linked against.
This specifies the path to the download for the libnd4j cmake invocation.
MAVEN_USERNAME: The user name to login to for the ossrh maven repository
MAVEN_PASSWORD: The password to login to for the ossrh maven repository
MAVEN_GPG_PASSPHRSE: The gpg password for signing artifacts for uploading to maven central
DEPLOY_TO> Valid values are either ossrh or github.
LIBND4J_BUILD_THREADS: This is the equivalent of make -j. It specifies the number of threads
that should be used to compile libnd4j
PERFORM_RELEASE: Whether to perform a release or not (0 or 1)
RELEASE_VERSION: The version to be released to maven central. change-versions.sh will be run
to change versions throughout the code base from the snapshot verison to the intended release version.
SNAPSHOT_VERSION: The current snapshot version to be changed when performing a release.
After a release is conducted, this should generally be the next development version.
RELEASE_REPO_ID: Leave this empty when first creating a release repository in combination with
DEPLOY set to 1. Afterwards, note which staging repository id gets created in the ossrh interface when publishing
to maven central. Use that id for further buidls to ensure that all uploads for 1 version are synchronized to 1 staging repository.
MODULES: Extra maven flags for pi_build.sh if more flags are needed (such as for debugging or only building specific modules)
LIBND4J_URL: Used when building nd4j-native. If a user does not want to recompile libnd4j for their particular build, you can instead
skip this step and specify a libnd4j zip file download (generally built with the maven assembly plugin)