Iterators
Data iteration tools for loading into neural networks.
A dataset iterator makes it easy to load data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j work with either user-provided data or automatically loaded benchmark datasets such as MNIST and Iris.
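To make the batching role concrete, here is a minimal plain-Java sketch of an iterator that hands out fixed-size minibatches from an in-memory array. The class name MiniBatchSketch and its methods are hypothetical and exist only for illustration; this is not DL4J's DataSetIterator, which also handles labels, conversion, and masking.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only: NOT DL4J's DataSetIterator.
class MiniBatchSketch implements Iterator<double[][]> {
    private final double[][] examples; // one row per example
    private final int batchSize;
    private int cursor = 0;            // index of the next unread example

    MiniBatchSketch(double[][] examples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.examples = examples;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return cursor < examples.length;
    }

    @Override
    public double[][] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // The final batch may be smaller than batchSize.
        int end = Math.min(cursor + batchSize, examples.length);
        double[][] batch = Arrays.copyOfRange(examples, cursor, end);
        cursor = end;
        return batch;
    }

    // Counts how many batches the iterator yields for the given data.
    static int countBatches(double[][] examples, int batchSize) {
        MiniBatchSketch it = new MiniBatchSketch(examples, batchSize);
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }
}
```

With 10 examples and a batch size of 4, this sketch yields batches of 4, 4, and 2 examples.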
For most use cases, initializing an iterator and passing it to the fit() method of a MultiLayerNetwork or ComputationGraph is all you need to begin training:
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
// pass an MNIST data iterator that automatically fetches data
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
model.fit(mnistTrain);
Many other methods also accept iterators for tasks such as evaluation:
// passing directly to the neural network
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
model.evaluate(mnistTest);
// using an evaluation class
Evaluation eval = new Evaluation(10); //create an evaluation object with 10 possible classes
while (mnistTest.hasNext()) {
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatures()); // get the network's prediction
    eval.eval(next.getLabels(), output); // check the prediction against the true class
}
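What "checking the prediction against the true class" amounts to can be sketched in plain Java: take the highest-scoring class for each example and compare it with the one-hot label. The class AccuracySketch below is hypothetical and computes accuracy only; DL4J's Evaluation class tracks further metrics such as precision, recall, and F1, and is not implemented this way.

```java
// Hypothetical illustration only: NOT DL4J's Evaluation class.
class AccuracySketch {

    // Index of the largest value; ties resolve to the lowest index.
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    // Fraction of rows where the predicted class matches the labeled class.
    static double accuracy(double[][] oneHotLabels, double[][] predictions) {
        if (oneHotLabels.length == 0 || oneHotLabels.length != predictions.length) {
            throw new IllegalArgumentException("need equally sized, non-empty inputs");
        }
        int correct = 0;
        for (int i = 0; i < oneHotLabels.length; i++) {
            if (argMax(oneHotLabels[i]) == argMax(predictions[i])) {
                correct++;
            }
        }
        return (double) correct / oneHotLabels.length;
    }
}
```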
MNIST data set iterator - 60000 training digits, 10000 test digits, 10 classes. Digits have 28x28 pixels and 1 channel (grayscale).
For further details, see http://yann.lecun.com/exdb/mnist/
UCI synthetic control chart time series dataset. This dataset is useful for classification of univariate time series with six categories:
Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift
UciSequenceDataSetIterator
public UciSequenceDataSetIterator(int batchSize)
Create an iterator for the training set, with the specified minibatch size. Randomized with RNG seed 123
- param batchSize Minibatch size
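The practical effect of a fixed RNG seed is that the shuffled example order is identical on every run. The plain-Java sketch below illustrates that idea with java.util.Random; the class SeededOrderSketch is hypothetical, and the iterator's internal shuffling may well differ in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: not the iterator's actual shuffling code.
class SeededOrderSketch {

    // Returns the indices 0..numExamples-1 in an order determined by the seed.
    static List<Integer> shuffledOrder(int numExamples, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            order.add(i);
        }
        // The same seed always produces the same permutation.
        Collections.shuffle(order, new Random(seed));
        return order;
    }
}
```

Calling shuffledOrder(10, 123) twice returns two equal lists, which is what makes a seeded training run repeatable.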
Cifar10DataSetIterator is an iterator for the CIFAR-10 dataset - 10 classes, with 32x32 images with 3 channels (RGB)
This fetcher uses a cached version of the CIFAR dataset which is converted to PNG images, see: https://pjreddie.com/projects/cifar-10-dataset-mirror/.
Cifar10DataSetIterator
public Cifar10DataSetIterator(int batchSize)
Create an iterator for the training set, with random iteration order (RNG seed fixed to 123)
- param batchSize Minibatch size for the iterator
IrisDataSetIterator: An iterator for the well-known Iris dataset. 4 features, 3 label classes
https://archive.ics.uci.edu/ml/datasets/Iris
IrisDataSetIterator
public IrisDataSetIterator()
next
public DataSet next()
IrisDataSetIterator handles traversing through the Iris Data Set.
- param batch Batch size
- param numExamples Total number of examples
LFW iterator - Labeled Faces in the Wild dataset
See http://vis-www.cs.umass.edu/lfw/
13233 images total, with 5749 classes.
LFWDataSetIterator
public LFWDataSetIterator(int batchSize, int numExamples, int[] imgDim, int numLabels, boolean useSubset,
PathLabelGenerator labelGenerator, boolean train, double splitTrainTest,
ImageTransform imageTransform, Random rng)
Create an LFW-specific data iterator
- param batchSize the batch size of the examples
- param numExamples the overall number of examples
- param imgDim an array of height, width and channels
- param numLabels the overall number of labels
- param useSubset use a subset of the LFWDataSet
- param labelGenerator path label generator to use
- param train true to iterate over the training portion of the split
- param splitTrainTest the percentage of the data to use for training; the remainder goes to test
- param imageTransform how to transform the image
- param rng random number generator used to lock in batch shuffling
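How numExamples, splitTrainTest, and rng can interact is sketched below in plain Java: indices are shuffled with the supplied generator, and the leading fraction becomes the training set. The class TrainTestSplitSketch is hypothetical, and rounding the training count down is an assumption; this is not how LFWDataSetIterator is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: NOT the LFWDataSetIterator implementation.
class TrainTestSplitSketch {

    // Returns two index lists: element 0 is the training set, element 1 the test set.
    static List<List<Integer>> split(int numExamples, double splitTrainTest, Random rng) {
        if (splitTrainTest < 0.0 || splitTrainTest > 1.0) {
            throw new IllegalArgumentException("splitTrainTest must be in [0, 1]");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        // A seeded rng makes the split reproducible.
        Collections.shuffle(indices, rng);
        // Assumption: the training count is rounded down.
        int numTrain = (int) (numExamples * splitTrainTest);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, numTrain)));
        parts.add(new ArrayList<>(indices.subList(numTrain, numExamples)));
        return parts;
    }
}
```

For example, split(8, 0.75, new Random(42)) puts 6 indices in the training list and the remaining 2 in the test list.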