How to contribute to the Eclipse Deeplearning4j source code.
Before contributing, make sure you know the structure of all of the Eclipse Deeplearning4j libraries. As of early 2018, all libraries now live in the Deeplearning4j monorepo. These include:
DeepLearning4J: Contains all of the code for learning neural networks, both on a single machine and distributed.
ND4J: “N-Dimensional Arrays for Java”. ND4J is the mathematical backend upon which DL4J is built. All of DL4J’s neural networks are built using the operations (matrix multiplications, vector operations, etc) in ND4J. ND4J is how DL4J supports both CPU and GPU training of networks, without any changes to the networks themselves. Without ND4J, there would be no DL4J.
DataVec: DataVec handles the data import and conversion side of the pipeline. If you want to import images, video, audio or simply CSV data into DL4J, you probably want to use DataVec to do this.
RL4J: Reinforcement Learning for Java. This set of libraries provides reinforcement learning built on top of the Deeplearning4j library.
SameDiff: Built within the ND4J library, SameDiff is a TensorFlow/PyTorch-like library for building data flow graphs.
We also have an extensive examples repository at dl4j-examples.
There are numerous ways to contribute to DeepLearning4J (and related projects), depending on your interests and experience. Here are some ideas:
Add new types of neural network layers (for example: different types of RNNs, locally connected networks, etc)
Add a new training feature
Bug fixes
DL4J examples: Is there an application or network architecture that we don’t have examples for?
Testing performance and identifying bottlenecks or areas to improve
Improve website documentation (or write tutorials, etc)
Improve the JavaDocs
There are a number of different ways to find things to work on. These include:
Looking at the issue trackers:
Reviewing our Roadmap
Talking to the developers on the community forums
Reviewing recent papers and blog posts on training features, network architectures and applications
Reviewing the website and examples - what seems missing, incomplete, or would simply be useful (or cool) to have?
Before you dive in, there are a few things you need to know. In particular, the tools we use:
Maven: a dependency management and build tool, used for all of our projects. See this for details on Maven.
Git: the version control system we use
Project Lombok: Project Lombok is a code generation/annotation tool that aims to reduce the amount of ‘boilerplate’ code (i.e., standard repeated code) needed in Java. To work with the source, you’ll need to install the Project Lombok plugin for your IDE.
VisualVM: A profiling tool, most useful to identify performance issues and bottlenecks.
IntelliJ IDEA: This is our IDE of choice, though you may of course use alternatives such as Eclipse and NetBeans. You may find it easier to use the same IDE as the developers in case you run into any issues. But this is up to you.
Things to keep in mind:
Code should be Java 7 compliant
If you are adding a new method or class: add JavaDocs
You are welcome to add an author tag for significant additions of functionality. This can also help future contributors, in case they need to ask questions of the original author. If multiple authors are present for a class: provide details on who did what (“original implementation”, “added feature x” etc)
Provide informative comments throughout your code. This helps to keep all code maintainable.
Any new functionality should include unit tests (using JUnit) to test your code. This should include edge cases.
If you add a new layer type, you must include numerical gradient checks, as per these unit tests. These are necessary to confirm that the calculated gradients are correct
If you are adding significant new functionality, consider also updating the relevant section(s) of the website, and providing an example. After all, functionality that nobody knows about (or nobody knows how to use) isn’t that helpful. Adding documentation is definitely encouraged when appropriate, but not strictly required.
If you are unsure about something - ask us on the community forums!
IP/Copyright requirements for Eclipse Foundation Projects
This page explains steps required to contribute code to the projects in the eclipse/deeplearning4j GitHub repository: https://github.com/eclipse/deeplearning4j
Contributors (anyone who wants to commit code to the repository) need to do two things, before their code can be merged:
Sign the Eclipse Contributor Agreement (once)
Sign commits (each time)
These two requirements must be satisfied for all Eclipse Foundation projects, not just DL4J and ND4J. A full list of Eclipse Foundation Projects can be found here: https://projects.eclipse.org/
By signing the ECA, you are essentially asserting that the code you are submitting is something that either you wrote, or that you have the right to contribute to the project. This is a necessary legal protection to avoid copyright issues.
By signing your commits, you are asserting that the code in that particular commit is your own.
You only need to sign the Eclipse Contributor Agreement (ECA) once. Here's the process:
Step 1: Sign up for an Eclipse account
This can be done at https://accounts.eclipse.org/user/register
Note: You must register using the same email as your GitHub account (the GitHub account you want to submit pull requests from).
Step 2: Sign the ECA
Go to https://accounts.eclipse.org/user/eca and follow the instructions.
There are a few ways to sign commits. Note that you can use any of these options.
Option 1: Use -s When Committing on Command Line
Signing commits here is simple: pass -s to git commit, as in git commit -s -m "My signed commit". Note the use of -s (lower case s); upper-case S (i.e., -S) is for GPG signing (see below).
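As a minimal sketch of Option 1, the script below makes one signed-off commit in a throwaway repository and prints the resulting commit message. The repository, the file name and the identity (Example Contributor, contributor@example.com) are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so the sketch does not touch a real checkout.
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"

# Hypothetical identity, used only for this sketch.
git -C "$repo" config user.name "Example Contributor"
git -C "$repo" config user.email "contributor@example.com"

echo "hello" > "$repo/README.txt"
git -C "$repo" add README.txt

# -s (lower case) appends a Signed-off-by line built from user.name and user.email.
git -C "$repo" commit -q -s -m "My signed commit"

# Print the full message of the last commit, including the sign-off line.
last_message="$(git -C "$repo" log -1 --format=%B)"
printf '%s\n' "$last_message"
```

The sign-off line is generated from the configured user.name and user.email, so it is worth checking those settings before committing.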
Option 2: Set up Bash Alias (or Windows cmd Alias) for Automated Signing
For example, you could set up an alias in Bash such as alias gcm='git commit -s -m'. Committing would then be done with gcm "My Commit".
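The alias approach can be sketched as follows. In an interactive shell the alias itself (alias gcm='git commit -s -m') would normally go in ~/.bashrc. Bash does not expand aliases in non-interactive scripts by default, so this sketch uses an equivalent shell function instead. The throwaway repository and the identity are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Function equivalent of: alias gcm='git commit -s -m'
# (a function is used because Bash scripts do not expand aliases by default).
gcm() {
    git commit -s -m "$@"
}

# Throwaway repository with a hypothetical identity, for illustration only.
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo"
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

echo "hello" > README.txt
git add README.txt

# Same usage as described in the text.
gcm "My Commit"

last_subject="$(git log -1 --format=%s)"
last_message="$(git log -1 --format=%B)"
printf '%s\n' "$last_message"
```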
For Windows command line, similar options are available through a few mechanisms (see here)
One simple way is to create a gcm.bat file containing a single line such as git commit -s -m %*, and add it to your system path. You can then commit using the same process as above (i.e., gcm "My Commit").
Option 3: Use GPG Signing
For details on GPG signing, see this link
Note that this option can be combined with aliases (above), as in alias gcm='git commit -S -m' (note the upper case -S for GPG signing).
Option 4: Commit using IntelliJ with Auto Signing
IntelliJ can be used to perform git commits, including signed commits. See this page for details.
After performing a commit, you can check that it is signed in a few different ways. One way is to use git log --show-signature -1 to show the signature for the last commit (use -5 to show the last 5 commits, for example).
A signed commit has a Signed-off-by line at the end of its commit message; an unsigned commit does not.
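One possible way to run this check over several commits at once is sketched below. It is not part of the project's tooling: it builds a throwaway repository with one signed-off and one unsigned commit (hypothetical names and identity), then reports which commits lack a Signed-off-by line.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository with a hypothetical identity, for illustration only.
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
git -C "$repo" config user.name "Example Contributor"
git -C "$repo" config user.email "contributor@example.com"

# One signed-off commit followed by one commit without a sign-off.
echo "one" > "$repo/one.txt"
git -C "$repo" add one.txt
git -C "$repo" commit -q -s -m "Signed change"
echo "two" > "$repo/two.txt"
git -C "$repo" add two.txt
git -C "$repo" commit -q -m "Unsigned change"

# Walk every commit reachable from HEAD and look for a Signed-off-by line.
unsigned_count=0
for commit in $(git -C "$repo" rev-list HEAD); do
    summary="$(git -C "$repo" log -1 --format='%h %s' "$commit")"
    if git -C "$repo" log -1 --format=%B "$commit" | grep -q '^Signed-off-by: '; then
        echo "$summary: signed off"
    else
        echo "$summary: NOT signed off"
        unsigned_count=$((unsigned_count + 1))
    fi
done
echo "Commits missing a sign-off: $unsigned_count"
```

In a real checkout, the rev-list argument could be narrowed to the commits on your branch only.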
If you forgot to sign the last commit, you can add the sign-off by amending it with git commit --amend --signoff.
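A minimal sketch of fixing the last commit, using a throwaway repository and a hypothetical identity. The --no-edit flag is added here only so that the script does not open an editor; it keeps the existing commit message.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository with a hypothetical identity, for illustration only.
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
git -C "$repo" config user.name "Example Contributor"
git -C "$repo" config user.email "contributor@example.com"

# A commit made without -s, i.e. missing the sign-off.
echo "hello" > "$repo/README.txt"
git -C "$repo" add README.txt
git -C "$repo" commit -q -m "Forgot to sign"

# Rewrite the last commit in place, adding the Signed-off-by line.
git -C "$repo" commit -q --amend --signoff --no-edit

commit_count="$(git -C "$repo" rev-list --count HEAD)"
last_subject="$(git -C "$repo" log -1 --format=%s)"
last_message="$(git -C "$repo" log -1 --format=%B)"
printf '%s\n' "$last_message"
```

Amending replaces the commit with a new one, so a commit that was already pushed needs a force push afterwards.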
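Suppose your branch has 3 new commits, all of which are unsigned.
One simple way to fix this is to squash and sign these commits. To do this for the last 3 commits, run git reset --soft HEAD~3 followed by git commit -s -m "Squashed and signed" (note you might want to make a backup first).
The result is a single signed commit that contains the changes from all 3 original commits.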
You can confirm that the commit is signed using git log -1 --show-signature as shown earlier.
Note that your commits will be squashed once they are merged to master anyway, so the loss of the commit history does not matter.
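The squash-and-sign procedure can be sketched end to end as follows. The throwaway repository, file names, identity and the backup branch name (backup-before-squash) are all hypothetical. The backup branch is one way to follow the advice about making a backup first.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository with a hypothetical identity, for illustration only.
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
git -C "$repo" config user.name "Example Contributor"
git -C "$repo" config user.email "contributor@example.com"

# A signed-off base commit, standing in for the existing history.
echo "base" > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -s -m "Base commit"

# Three new commits, all missing the sign-off.
for n in 1 2 3; do
    echo "change $n" > "$repo/file$n.txt"
    git -C "$repo" add "file$n.txt"
    git -C "$repo" commit -q -m "Unsigned change $n"
done

# Keep a backup of the original history before rewriting it.
git -C "$repo" branch backup-before-squash

# Move the branch back 3 commits while keeping all their changes staged,
# then record those changes as one signed-off commit.
git -C "$repo" reset -q --soft HEAD~3
git -C "$repo" commit -q -s -m "Squashed and signed"

commit_count="$(git -C "$repo" rev-list --count HEAD)"
backup_count="$(git -C "$repo" rev-list --count backup-before-squash)"
last_message="$(git -C "$repo" log -1 --format=%B)"
printf '%s\n' "$last_message"
```

The backup branch still points at the three original commits, so they can be recovered if the squash goes wrong.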
If you are updating an existing PR, you may need to force push using -f (as in git push X -f).