Overview
Comprehensive programming guide for ND4J. This user guide is designed to explain (and provide examples for) the main functionality in ND4J.
An NDArray is, in essence, an n-dimensional array: that is, a rectangular array of numbers with some number of dimensions.
Some concepts you should be familiar with:
- The rank of an NDArray is the number of dimensions. 2d NDArrays have a rank of 2, 3d arrays have a rank of 3, and so on. You can create NDArrays with any arbitrary rank.
- The shape of an NDArray defines the size of each of the dimensions. Suppose we have a 2d array with 3 rows and 5 columns. This NDArray would have shape [3,5].
- The length of an NDArray defines the total number of elements in the array. The length is always equal to the product of the values that make up the shape.
- The stride of an NDArray is defined as the separation (in the underlying data buffer) of contiguous elements in each dimension. Stride is defined per dimension, so a rank N NDArray has N stride values, one for each dimension. Note that most of the time, you don't need to know (or concern yourself with) the stride - just be aware that this is how ND4J operates internally. The next section has an example of strides.
- The data type of an NDArray refers to the type of data of an NDArray (for example, float or double precision). Note that this is set globally in ND4J, so all NDArrays should have the same data type. Setting the data type is discussed later in this document.
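The relationship between rank, shape and length can be sketched in plain Java. This is only an illustration of the definitions above and does not use ND4J; the class name ShapeBasics and its methods are hypothetical.

```java
import java.util.Arrays;

/** Plain-Java illustration of rank, shape and length (hypothetical helper, no ND4J dependency). */
final class ShapeBasics {

    /** The rank is simply the number of entries in the shape. */
    static int rank(long[] shape) {
        return shape.length;
    }

    /** The length is the product of all entries in the shape. */
    static long length(long[] shape) {
        long product = 1;
        for (long size : shape) {
            product *= size;
        }
        return product;
    }

    public static void main(String[] args) {
        long[] shape = {3, 5}; // 3 rows, 5 columns
        System.out.println("shape  = " + Arrays.toString(shape));
        System.out.println("rank   = " + rank(shape));   // 2
        System.out.println("length = " + length(shape)); // 3 * 5 = 15
    }
}
```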
In terms of indexing there are a few things to know. First, rows are dimension 0, and columns are dimension 1: thus INDArray.size(0) is the number of rows, and INDArray.size(1) is the number of columns. Like normal arrays in most programming languages, indexing is zero-based: thus rows have indexes 0 to INDArray.size(0)-1, and so on for the other dimensions.
Throughout this document, we'll use the term NDArray to refer to the general concept of an n-dimensional array; the term INDArray refers specifically to the Java interface that ND4J defines. In practice, these two terms can be used interchangeably.
The next few paragraphs describe some of the architecture behind ND4J. Understanding this is not strictly necessary in order to use ND4J, but it may help you to understand what is going on behind the scenes. NDArrays are stored in memory as a single flat array of numbers (or more generally, as a single contiguous block of memory), and hence differ a lot from typical Java multidimensional arrays such as a float[][] or double[][][].
Physically, the data that backs an INDArray is stored off-heap: that is, it is stored outside of the heap managed by the Java Virtual Machine (JVM). This has numerous benefits, including performance, interoperability with high-performance BLAS libraries, and the ability to avoid some shortcomings of the JVM in high-performance computing (such as issues with Java arrays being limited to 2^31 - 1 (approximately 2.15 billion) elements due to integer indexing).
In terms of encoding, an NDArray can be encoded in either C (row-major) or Fortran (column-major) order. For more details on row vs. column major order, see Wikipedia. ND4J may use a combination of C and F order arrays together, at the same time. Most users can just use the default array ordering, but note that it is possible to use a specific ordering for a given array, should the need arise.
The following image shows how a simple 3x3 (2d) NDArray is stored in memory,

In the above array, we have:
- Shape = [3,3] (3 rows, 3 columns)
- Rank = 2 (2 dimensions)
- Length = 9 (3x3=9)
- Stride
  - C order stride: [3,1]: the values in consecutive rows are separated in the buffer by 3, and the values in consecutive columns are separated in the buffer by 1
  - F order stride: [1,3]: the values in consecutive rows are separated in the buffer by 1, and the values in consecutive columns are separated in the buffer by 3
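The stride values above can be reproduced with a short plain-Java sketch. It assumes a densely packed buffer with no offset or padding, and it is not how ND4J itself is implemented; the class StrideDemo and its method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper showing how strides follow from shape and ordering. */
final class StrideDemo {

    /** C (row-major) strides: the last dimension is contiguous in the buffer. */
    static int[] cOrderStrides(int[] shape) {
        int[] stride = new int[shape.length];
        int step = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            stride[d] = step;
            step *= shape[d];
        }
        return stride;
    }

    /** F (column-major) strides: the first dimension is contiguous in the buffer. */
    static int[] fOrderStrides(int[] shape) {
        int[] stride = new int[shape.length];
        int step = 1;
        for (int d = 0; d < shape.length; d++) {
            stride[d] = step;
            step *= shape[d];
        }
        return stride;
    }

    /** Position of an element in the flat buffer: the sum of index[d] * stride[d]. */
    static int bufferOffset(int[] index, int[] stride) {
        int offset = 0;
        for (int d = 0; d < index.length; d++) {
            offset += index[d] * stride[d];
        }
        return offset;
    }

    public static void main(String[] args) {
        int[] shape = {3, 3};
        int[] c = cOrderStrides(shape);
        int[] f = fOrderStrides(shape);
        System.out.println("C strides: " + Arrays.toString(c)); // [3, 1]
        System.out.println("F strides: " + Arrays.toString(f)); // [1, 3]

        int[] index = {1, 2}; // row 1, column 2
        System.out.println("C offset: " + bufferOffset(index, c)); // 1*3 + 2*1 = 5
        System.out.println("F offset: " + bufferOffset(index, f)); // 1*1 + 2*3 = 7
    }
}
```

The same element lands at a different buffer position depending on the ordering, which is why the stride has to be tracked per array.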
A key concept in ND4J is the fact that two NDArrays can actually point to the same underlying data in memory. Usually, we have one NDArray referring to some subset of another array, and this only occurs for certain operations (such as INDArray.get(), INDArray.transpose(), INDArray.getRow(), etc.). This is a powerful concept, and one that is worth understanding.
There are two primary motivations for this:
1. There are considerable performance benefits, most notably in avoiding copying arrays
2. We gain a lot of power in terms of how we can perform operations on our NDArrays
Consider a simple operation like a matrix transpose on a large (10,000 x 10,000) matrix. Using views, we can perform this matrix transpose in constant time without performing any copies (i.e., O(1) in big O notation), avoiding the considerable cost of copying all of the array elements. Of course, sometimes we do want to make a copy, at which point we can use INDArray.dup(). For example, to get a copy of a transposed matrix, use INDArray out = myMatrix.transpose().dup(). After this dup() call, there will be no link between the original array myMatrix and the array out (thus, changes to one will not impact the other).
To see how views can be powerful, consider a simple task: adding 1.0 to the first row of a larger array, myArray. We can do this easily, in one line:
myArray.getRow(0).addi(1.0)
Let's break down what is happening here. First, the getRow(0) operation returns an INDArray that is a view of the original. Note that both myArray and myArray.getRow(0) point to the same area in memory:
then, after the addi(1.0) is performed, we have the following situation:
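The same situation can be sketched in plain Java. This is not ND4J code: RowView is a hypothetical stand-in that shows what a view amounts to, namely a reference to the shared buffer plus an offset and a length, under the assumption of a C-order matrix.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a row view over a shared flat buffer; not an ND4J class. */
final class RowView {
    private final double[] buffer; // shared with the parent array, never copied
    private final int offset;      // buffer position of the row's first element
    private final int length;      // number of columns in the row

    RowView(double[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    /** Row r of a C-order matrix with nCols columns starts at position r * nCols. */
    static RowView ofRow(double[] buffer, int nCols, int row) {
        return new RowView(buffer, row * nCols, nCols);
    }

    /** In-place add that writes straight into the shared buffer. */
    RowView addi(double value) {
        for (int i = 0; i < length; i++) {
            buffer[offset + i] += value;
        }
        return this;
    }

    /** Detached copy of the row, analogous in spirit to dup(). */
    double[] dup() {
        return Arrays.copyOfRange(buffer, offset, offset + length);
    }

    public static void main(String[] args) {
        double[] myArray = new double[6];          // a 2x3 matrix of zeros in C order
        RowView firstRow = ofRow(myArray, 3, 0);
        double[] copy = firstRow.dup();            // taken before the update
        firstRow.addi(1.0);
        System.out.println(Arrays.toString(myArray)); // [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(copy));    // [0.0, 0.0, 0.0]
    }
}
```

Creating the view costs the same regardless of the array size, because only the offset and length are stored.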

As we can see, changes to the NDArray returned by myArray.getRow(0) will be reflected in the original array myArray; similarly, changes to myArray will be reflected in the row vector.
Two of the most commonly used methods of creating arrays are:
Nd4j.zeros(int...)
Nd4j.ones(int...)
The shape of the arrays is specified as integers. For example, to create a zero-filled array with 3 rows and 5 columns, use Nd4j.zeros(3,5).
These can often be combined with other operations to create arrays with other values. For example, to create an array filled with 10s:
INDArray tens = Nd4j.zeros(3,5).addi(10)
The above initialization works in two steps: first by allocating a 3x5 array filled with zeros, and then by adding 10 to each value.
Nd4j provides a few methods to generate INDArrays, where the contents are pseudo-random numbers.
To generate uniform random numbers in the range 0 to 1, use Nd4j.rand(int nRows, int nCols) (for 2d arrays), or Nd4j.rand(int[]) (for 3 or more dimensions).
Similarly, to generate Gaussian random numbers with mean zero and standard deviation 1, use Nd4j.randn(int nRows, int nCols) or Nd4j.randn(int[]).
For repeatability (i.e., to set Nd4j's random number generator seed) you can use Nd4j.getRandom().setSeed(long).
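What seeding buys you can be illustrated with the JDK's own generator. The sketch below uses java.util.Random rather than ND4J's generator, so the numbers it produces will not match what Nd4j.rand or Nd4j.randn return; it only demonstrates that the same seed yields the same sequence. The class name SeedDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of seeded, repeatable pseudo-random draws using the JDK only. */
final class SeedDemo {

    /** Draws n uniform values in [0, 1) from a generator created with the given seed. */
    static double[] uniform(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextDouble();
        }
        return out;
    }

    /** Draws n Gaussian values (mean 0, standard deviation 1) with the given seed. */
    static double[] gaussian(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce identical sequences.
        boolean sameUniform = Arrays.equals(uniform(12345L, 5), uniform(12345L, 5));
        boolean sameGaussian = Arrays.equals(gaussian(12345L, 5), gaussian(12345L, 5));
        System.out.println("uniform repeatable:  " + sameUniform);
        System.out.println("gaussian repeatable: " + sameGaussian);
    }
}
```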
Nd4j provides convenience methods for the creation of arrays from Java float and double arrays.
To create a 1d NDArray from a 1d Java array, use:
- Row vector: Nd4j.create(float[]) or Nd4j.create(double[])
- Column vector: Nd4j.create(float[],new int[]{length,1}) or Nd4j.create(double[],new int[]{length,1})
For 2d arrays, use Nd4j.create(float[][]) or Nd4j.create(double[][]).
For creating NDArrays from Java primitive arrays with 3 or more dimensions (double[][][] etc), one approach is to use the following:
double[] flat = ArrayUtil.flattenDoubleArray(myDoubleArray);
int[] shape = ...; //Array shape here
INDArray myArr = Nd4j.create(flat,shape,'c');
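To show what the flat array and shape look like for a concrete input, here is a plain-Java sketch of a row-major flattening. It assumes the input is non-empty and rectangular, and it assumes that a C-order flattening is what the 'c' argument above expects; the class Flatten3d is hypothetical and is not the ArrayUtil implementation.

```java
import java.util.Arrays;

/** Hypothetical helper: flattens a rectangular double[][][] in C (row-major) order. */
final class Flatten3d {

    /** Shape of a non-empty, rectangular 3d array. */
    static int[] shapeOf(double[][][] in) {
        return new int[]{in.length, in[0].length, in[0][0].length};
    }

    /** Copies the elements so that the last dimension varies fastest. */
    static double[] flatten(double[][][] in) {
        int[] shape = shapeOf(in);
        double[] flat = new double[shape[0] * shape[1] * shape[2]];
        int pos = 0;
        for (double[][] matrix : in) {
            for (double[] row : matrix) {
                for (double value : row) {
                    flat[pos++] = value;
                }
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        double[][][] data = {
            {{1, 2}, {3, 4}},
            {{5, 6}, {7, 8}}
        };
        System.out.println(Arrays.toString(shapeOf(data))); // [2, 2, 2]
        System.out.println(Arrays.toString(flatten(data))); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    }
}
```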
There are three primary ways of creating arrays from other arrays:
- Create an exact copy of an existing NDArray using INDArray.dup()
- Create the array as a subset of an existing NDArray
- Combine a number of existing NDArrays to create a new NDArray
For the second case, you can use getRow(), get(), etc. See Getting and Setting Parts of NDArrays for details on this.
Two methods for combining NDArrays are Nd4j.hstack(INDArray...) and Nd4j.vstack(INDArray...).
hstack (horizontal stack) takes as argument a number of matrices that have the same number of rows, and stacks them horizontally to produce a new array. The input NDArrays can have a different number of columns, however.
Example:
int nRows = 2;
int nColumns = 2;
// Create INDArray of zeros
INDArray zeros = Nd4j.zeros(nRows, nColumns);
// Create one of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
//hstack
INDArray hstack = Nd4j.hstack(ones,zeros);
System.out.println("### HSTACK ####");
System.out.println(hstack);
Output:
### HSTACK ####
[[1.00, 1.00, 0.00, 0.00],
[1.00, 1.00, 0.00, 0.00]]
vstack (vertical stack) is the vertical equivalent of hstack. The input arrays must have the same number of columns.
Example:
int nRows = 2;
int nColumns = 2;
// Create INDArray of zeros
INDArray zeros = Nd4j.zeros(nRows, nColumns);
// Create one of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
//vstack
INDArray vstack = Nd4j.vstack(ones,zeros);
System.out.println("### VSTACK ####");
System.out.println(vstack);
Output:
### VSTACK ####
[[1.00, 1.00],
[1.00, 1.00],
[0.00, 0.00],
[0.00, 0.00]]
Nd4j.concat combines arrays along a dimension.
Example:
int nRows = 2;
int nColumns = 2;
//INDArray of zeros
INDArray zeros = Nd4j.zeros(nRows, nColumns);
// Create one of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
// Concat on dimension 0
INDArray combined = Nd4j.concat(0,zeros,ones);
System.out.println("### COMBINED dimension 0####");
System.out.println(combined);
//Concat on dimension 1
INDArray combined2 = Nd4j.concat(1,zeros,ones);
System.out.println("### COMBINED dimension 1 ####");
System.out.println(combined2);
Output:
### COMBINED dimension 0####
[[0.00, 0.00],
[0.00, 0.00],
[1.00, 1.00],
[1.00, 1.00]]
### COMBINED dimension 1 ####
[[0.00, 0.00, 1.00, 1.00],
[0.00, 0.00, 1.00, 1.00]]
Nd4j.pad is used to pad an array.
Example:
int nRows = 2;
int nColumns = 2;
// Create INDArray of all ones
INDArray ones = Nd4j.ones(nRows, nColumns);
// pad the INDArray
INDArray padded = Nd4j.pad(ones, new int[]{1,1}, Nd4j.PadMode.CONSTANT );
System.out.println("### Padded ####");
System.out.println(padded);
Output:
### Padded ####
[[0.00, 0.00, 0.00, 0.00],
[0.00, 1.00, 1.00, 0.00],
[0.00, 1.00, 1.00, 0.00],
[0.00, 0.00, 0.00, 0.00]]
One other method that can occasionally be useful is Nd4j.diag(INDArray in). This method has two uses, depending on the argument in:
- If in is a vector, diag outputs an NxN matrix with the diagonal equal to the array in (where N is the length of in)
- If in is an NxN matrix, diag outputs a vector taken from the diagonal of in
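The two behaviours can be sketched on plain Java arrays. This only mirrors the description above and is not the ND4J implementation; the class DiagDemo and its method names are hypothetical, and the matrix variant assumes a square input.

```java
import java.util.Arrays;

/** Hypothetical plain-Java illustration of the two uses of diag. */
final class DiagDemo {

    /** Vector in, NxN matrix out: the vector is placed on the diagonal, zeros elsewhere. */
    static double[][] diagFromVector(double[] v) {
        double[][] out = new double[v.length][v.length];
        for (int i = 0; i < v.length; i++) {
            out[i][i] = v[i];
        }
        return out;
    }

    /** NxN matrix in, vector out: the diagonal elements are extracted. */
    static double[] diagFromMatrix(double[][] m) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            out[i] = m[i][i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] matrix = diagFromVector(new double[]{1, 2, 3});
        System.out.println(Arrays.deepToString(matrix));
        // [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
        System.out.println(Arrays.toString(diagFromMatrix(matrix)));
        // [1.0, 2.0, 3.0]
    }
}
```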
To create a row vector with elements [a, a+1, a+2, ..., b] you can use the linspace command:
Nd4j.linspace(a, b, b-a+1)
Linspace can be combined with a reshape operation to get other shapes. For example, if you want a 2d NDArray with 5 rows and 5 columns, with values 1 to 25 inclusive, you can use the following:
Nd4j.linspace(1,25,25).reshape(5,5)
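To see which value ends up where after the reshape, the following plain-Java sketch reproduces the arithmetic. It assumes evenly spaced values with both endpoints included and a C-order (row-major) reshape; the class LinspaceDemo and its methods are hypothetical and do not call ND4J.

```java
import java.util.Arrays;

/** Hypothetical plain-Java illustration of linspace followed by a C-order reshape. */
final class LinspaceDemo {

    /** num evenly spaced values from lower to upper, both endpoints included. */
    static double[] linspace(double lower, double upper, int num) {
        double[] out = new double[num];
        if (num == 1) {
            out[0] = lower;
            return out;
        }
        double step = (upper - lower) / (num - 1);
        for (int i = 0; i < num; i++) {
            out[i] = lower + i * step;
        }
        return out;
    }

    /** Row-major reshape: element (r, c) is taken from flat position r * cols + c. */
    static double[][] reshape(double[] flat, int rows, int cols) {
        if (flat.length != rows * cols) {
            throw new IllegalArgumentException("length must equal rows * cols");
        }
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = flat[r * cols + c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] grid = reshape(linspace(1, 25, 25), 5, 5);
        for (double[] row : grid) {
            System.out.println(Arrays.toString(row));
        }
        // The first row is [1.0, 2.0, 3.0, 4.0, 5.0]; the last row ends with 25.0.
    }
}
```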
For an INDArray, you can get or set values using the indexes of the element you want to get or set. For a rank N array (i.e., an array with N dimensions) you need N indices.
Note: getting or setting values individually (for example, one at a time in a for loop) is generally a bad idea in terms of performance. When possible, try to use other INDArray methods that operate on a large number of elements at a time.
To get values from a 2d array, you can use:
INDArray.getDouble(int row, int column)
For arrays of any dimensionality, you can use INDArray.getDouble(int...). For example, to get the value at index i,j,k use INDArray.getDouble(i,j,k).
To set values, use one of the putScalar methods:
INDArray.putScalar(int[],double)
INDArray.putScalar(int[],float)
INDArray.putScalar(int[],int)
Here, the int[] is the index, and the double/float/int is the value to be placed at that index.
Some additional functionality that might be useful in certain circumstances is the NdIndexIterator class. The NdIndexIterator allows you to get the indexes in a defined order (specifically, the C-order traversal order: [0,0,0], [0,0,1], [0,0,2], ..., [0,1,0], ... etc for a rank 3 array).
To iterate over the values in a 2d array, you can use:
NdIndexIterator iter = new NdIndexIterator(nRows, nCols);
while (iter.hasNext()) {
    int[] nextIndex = iter.next();
    double nextVal = myArray.getDouble(nextIndex);
    //do something with the value
}
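The C-order traversal itself can be illustrated without ND4J. The sketch below lists every index for a given shape with the last dimension changing fastest; COrderIndexes is a hypothetical class written for this illustration and is not how NdIndexIterator is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that enumerates indexes in C order (last dimension changes fastest). */
final class COrderIndexes {

    /** Returns every index of the given shape, in C-order traversal order. */
    static List<int[]> all(int... shape) {
        int total = 1;
        for (int size : shape) {
            total *= size;
        }
        List<int[]> out = new ArrayList<>();
        int[] current = new int[shape.length];
        for (int n = 0; n < total; n++) {
            out.add(current.clone());
            // Odometer-style increment: bump the last dimension and carry leftwards.
            for (int d = shape.length - 1; d >= 0; d--) {
                if (++current[d] < shape[d]) {
                    break;
                }
                current[d] = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] index : all(2, 3)) {
            System.out.println(Arrays.toString(index));
        }
        // Prints [0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], one per line.
    }
}
```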
In order to get a single row from an INDArray, you can use INDArray.getRow(int). This returns a row vector. Of note here is that this row is a view: changes to the returned row will impact the original array. This can be quite useful at times (for example: myArr.getRow(3).addi(1.0) to add 1.0 to the fourth row, at index 3, of a larger array); if you want a copy of a row, use getRow(int).dup().
Similarly, to get multiple rows, use INDArray.getRows(int...). This returns an array with the rows stacked; note however that this will be a copy (not a view) of the original rows, because a view is not possible here due to the way NDArrays are stored in memory.
For setting a single row, you can use myArray.putRow(int rowIdx,INDArray row). This will set the rowIdx-th row of myArray to the values contained in the INDArray row.