public class RecordReaderDataSetIterator extends Object implements org.nd4j.linalg.dataset.api.iterator.DataSetIterator
Takes a RecordReader as input, and handles the conversion to ND4J DataSet objects as well as producing minibatches from individual records.
A RecordReaderDataSetIterator.Builder class is also available.
Example 1: image classification, batch size 32
RecordReader rr = new ImageRecordReader(28,28,3); //28x28 RGB images
rr.initialize(new FileSplit(new File("/path/to/directory")));
DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
        //Label index (first arg): Always value 1 when using ImageRecordReader. For CSV etc: use index of the column
        // that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
        // at 0. Number of classes (second arg): number of label classes (i.e., 10 for MNIST - 10 digits)
        .classification(1, nClasses)
        .preProcessor(new ImagePreProcessingScaler()) //For normalization of image values 0-255 to 0-1
        .build();
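The builder comments above describe a classification label as an integer from 0 to nClasses-1. For training, such an index is commonly expanded into a one-hot vector of length nClasses. The following is a minimal plain-Java sketch of that conversion, not the library's implementation; the class name `OneHotLabelSketch` and the method `oneHot` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only: OneHotLabelSketch and oneHot are hypothetical names,
// not part of DL4J, DataVec, or ND4J.
class OneHotLabelSketch {

    /** Converts an integer class index into a one-hot vector of length nClasses. */
    static float[] oneHot(int labelValue, int nClasses) {
        if (labelValue < 0 || labelValue >= nClasses) {
            throw new IllegalArgumentException(
                    "Label " + labelValue + " is outside the range 0.." + (nClasses - 1));
        }
        float[] out = new float[nClasses]; // Java initializes every element to 0.0f
        out[labelValue] = 1.0f;            // mark the position of the class
        return out;
    }

    public static void main(String[] args) {
        // Label 3 in a 10-class problem such as MNIST
        System.out.println(Arrays.toString(oneHot(3, 10)));
    }
}
```

The range check mirrors the requirement stated in the comments: a label value outside 0 to nClasses-1 has no valid position in the output vector, so the sketch rejects it rather than guessing.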
Example 2: multi-output regression from CSV, batch size 128
RecordReader rr = new CsvRecordReader(0, ','); //Skip 0 header lines, comma separated
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));
DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
        //Specify the columns that the regression labels/targets appear in. Note that all other columns will be
        // treated as features. Column indexes start at 0
        .regression(labelColFrom, labelColTo)
        .build();
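The regression example states that the columns from labelColFrom to labelColTo hold the targets and that all other columns are treated as features. The sketch below models that split for a single numeric row in plain Java. It assumes an inclusive range, as the constructor documentation describes for labelIndexTo; `RegressionColumnSplitSketch` and `split` are hypothetical names, and this is not how the library itself is implemented.

```java
import java.util.Arrays;

// Illustrative sketch only: the class and method names are hypothetical.
class RegressionColumnSplitSketch {

    /**
     * Splits one row into features and labels.
     * Columns labelColFrom..labelColTo (inclusive, 0-based) become labels;
     * every other column becomes a feature, in its original order.
     * Returns a two-element array: {features, labels}.
     */
    static double[][] split(double[] row, int labelColFrom, int labelColTo) {
        if (labelColFrom < 0 || labelColTo >= row.length || labelColFrom > labelColTo) {
            throw new IllegalArgumentException("Invalid label column range");
        }
        int nLabels = labelColTo - labelColFrom + 1;
        double[] labels = new double[nLabels];
        double[] features = new double[row.length - nLabels];
        int f = 0; // next free position in the features array
        for (int col = 0; col < row.length; col++) {
            if (col >= labelColFrom && col <= labelColTo) {
                labels[col - labelColFrom] = row[col];
            } else {
                features[f++] = row[col];
            }
        }
        return new double[][] {features, labels};
    }

    public static void main(String[] args) {
        double[] row = {0.1, 0.2, 0.3, 7.0, 8.0};
        double[][] parts = split(row, 3, 4); // last two columns are regression targets
        System.out.println("features=" + Arrays.toString(parts[0]));
        System.out.println("labels=" + Arrays.toString(parts[1]));
    }
}
```

Because the label range may sit anywhere in the row, the sketch walks every column instead of assuming the labels come last.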

| Modifier and Type | Class and Description |
|---|---|
| static class | RecordReaderDataSetIterator.Builder: Builder class for RecordReaderDataSetIterator |

| Modifier and Type | Field and Description |
|---|---|
| protected int | batchNum |
| protected int | batchSize |
| protected org.datavec.api.io.WritableConverter | converter |
| protected int | labelIndex |
| protected int | labelIndexTo |
| protected org.nd4j.linalg.dataset.DataSet | last |
| protected int | maxNumBatches |
| protected int | numPossibleLabels |
| protected org.nd4j.linalg.dataset.api.DataSetPreProcessor | preProcessor |
| protected org.datavec.api.records.reader.RecordReader | recordReader |
| protected boolean | regression |
| protected Iterator<List<org.datavec.api.writable.Writable>> | sequenceIter |
| protected boolean | useCurrent |

| Modifier | Constructor and Description |
|---|---|
| protected | RecordReaderDataSetIterator(RecordReaderDataSetIterator.Builder b) |
| | RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader, int batchSize): Constructor for classification, where (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels(). Note that if RecordReader.getLabels() returns null, no output labels will be produced. |
| | RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader, int batchSize, int labelIndex, int numPossibleLabels): Main constructor for classification. |
| | RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader, int batchSize, int labelIndexFrom, int labelIndexTo, boolean regression): Main constructor for multi-label regression (i.e., regression with multiple outputs). |
| | RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader, int batchSize, int labelIndex, int numPossibleLabels, int maxNumBatches): Constructor for classification, where the maximum number of returned batches is limited to the specified value. |
| | RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader, org.datavec.api.io.WritableConverter converter, int batchSize, int labelIndexFrom, int labelIndexTo, int numPossibleLabels, int maxNumBatches, boolean regression): Main constructor. |

| Modifier and Type | Method and Description |
|---|---|
| boolean | asyncSupported() |
| int | batch() |
| int | cursor() |
| List<String> | getLabels() |
| boolean | hasNext() |
| int | inputColumns() |
| org.nd4j.linalg.dataset.DataSet | loadFromMetaData(List<org.datavec.api.records.metadata.RecordMetaData> list): Load multiple examples to a DataSet, using the provided RecordMetaData instances. |
| org.nd4j.linalg.dataset.DataSet | loadFromMetaData(org.datavec.api.records.metadata.RecordMetaData recordMetaData): Load a single example to a DataSet, using the provided RecordMetaData. |
| org.nd4j.linalg.dataset.DataSet | next() |
| org.nd4j.linalg.dataset.DataSet | next(int num) |
| int | numExamples() |
| void | remove() |
| void | reset() |
| boolean | resetSupported() |
| void | setCollectMetaData(boolean collectMetaData): When set to true, metadata for the current examples will be present in the returned DataSet. |
| void | setPreProcessor(org.nd4j.linalg.dataset.api.DataSetPreProcessor preProcessor) |
| int | totalExamples() |
| int | totalOutcomes() |
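
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator: getPreProcessor
Methods inherited from interface Iterator: forEachRemaining

protected org.datavec.api.records.reader.RecordReader recordReader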
protected org.datavec.api.io.WritableConverter converter
protected int batchSize
protected int maxNumBatches
protected int batchNum
protected int labelIndex
protected int labelIndexTo
protected int numPossibleLabels
protected org.nd4j.linalg.dataset.DataSet last
protected boolean useCurrent
protected boolean regression
protected org.nd4j.linalg.dataset.api.DataSetPreProcessor preProcessor

public RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader,
int batchSize)
Parameters:
recordReader - Record reader to use as the source of data
batchSize - Minibatch size, for each call of .next()

public RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader,
int batchSize,
int labelIndex,
int numPossibleLabels)
Parameters:
recordReader - RecordReader: provides the source of the data
batchSize - Batch size (number of examples) for the output DataSet objects
labelIndex - Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()
numPossibleLabels - Number of classes (possible labels) for classification

public RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader,
int batchSize,
int labelIndex,
int numPossibleLabels,
int maxNumBatches)
Parameters:
recordReader - the RecordReader to use
labelIndex - the index/column of the label (for classification)
numPossibleLabels - the number of possible labels for classification. Not used if regression == true
maxNumBatches - The maximum number of batches to return between resets. Set to -1 to return all available data

public RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader,
int batchSize,
int labelIndexFrom,
int labelIndexTo,
boolean regression)
Parameters:
recordReader - RecordReader to get data from
labelIndexFrom - Index of the first regression target
labelIndexTo - Index of the last regression target, inclusive
batchSize - Minibatch size
regression - Must be true. Mainly included to avoid clashing with other, previously defined constructors

public RecordReaderDataSetIterator(org.datavec.api.records.reader.RecordReader recordReader,
org.datavec.api.io.WritableConverter converter,
int batchSize,
int labelIndexFrom,
int labelIndexTo,
int numPossibleLabels,
int maxNumBatches,
boolean regression)
Parameters:
recordReader - the RecordReader to use
converter - Converter. May be null.
batchSize - Minibatch size: number of examples returned for each call of .next()
labelIndexFrom - the index of the label (for classification), or the first index of the labels for multi-output regression
labelIndexTo - only used if regression == true. The last index, inclusive, of the multi-output regression
numPossibleLabels - the number of possible labels for classification. Not used if regression == true
maxNumBatches - Maximum number of batches to return
regression - if true: regression. If false: classification (assume labelIndexFrom is the class it belongs to)

protected RecordReaderDataSetIterator(RecordReaderDataSetIterator.Builder b)
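
The batchSize and maxNumBatches parameters documented for the constructors interact: each call to next() returns up to batchSize examples, and maxNumBatches caps how many batches are returned between resets, with -1 meaning all available data. The plain-Java sketch below models that contract over an in-memory list. `MiniBatchSketch` is a hypothetical class, not the library's implementation, and returning a smaller final batch when the data runs out is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative sketch only: MiniBatchSketch is a hypothetical class that models
// the batchSize / maxNumBatches contract described in the parameter docs.
class MiniBatchSketch<T> implements Iterator<List<T>> {
    private final List<T> records;
    private final int batchSize;
    private final int maxNumBatches; // -1 means "no limit"
    private int cursor = 0;          // index of the next record to return
    private int batchNum = 0;        // batches returned since the last reset

    MiniBatchSketch(List<T> records, int batchSize, int maxNumBatches) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.records = records;
        this.batchSize = batchSize;
        this.maxNumBatches = maxNumBatches;
    }

    @Override
    public boolean hasNext() {
        boolean dataLeft = cursor < records.size();
        boolean underLimit = maxNumBatches < 0 || batchNum < maxNumBatches;
        return dataLeft && underLimit;
    }

    @Override
    public List<T> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Assumption: the final batch may hold fewer than batchSize records
        int end = Math.min(cursor + batchSize, records.size());
        List<T> batch = new ArrayList<>(records.subList(cursor, end));
        cursor = end;
        batchNum++;
        return batch;
    }

    /** Starts again from the first record and clears the batch counter. */
    void reset() {
        cursor = 0;
        batchNum = 0;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        MiniBatchSketch<Integer> iter = new MiniBatchSketch<>(data, 3, 2);
        while (iter.hasNext()) {
            System.out.println(iter.next()); // stops after two batches
        }
    }
}
```

Tracking the batch counter separately from the cursor is what lets reset() restore both the position and the batch budget, matching the "between resets" wording in the parameter description.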

public void setCollectMetaData(boolean collectMetaData)
Parameters:
collectMetaData - Whether to collect metadata or not

public org.nd4j.linalg.dataset.DataSet next(int num)
Specified by: next in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public int totalExamples()
Specified by: totalExamples in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public int inputColumns()
Specified by: inputColumns in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public int totalOutcomes()
Specified by: totalOutcomes in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public boolean resetSupported()
Specified by: resetSupported in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public boolean asyncSupported()
Specified by: asyncSupported in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public void reset()
Specified by: reset in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public int batch()
Specified by: batch in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public int cursor()
Specified by: cursor in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public int numExamples()
Specified by: numExamples in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public void setPreProcessor(org.nd4j.linalg.dataset.api.DataSetPreProcessor preProcessor)
Specified by: setPreProcessor in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public boolean hasNext()

public org.nd4j.linalg.dataset.DataSet next()

public void remove()

public List<String> getLabels()
Specified by: getLabels in interface org.nd4j.linalg.dataset.api.iterator.DataSetIterator

public org.nd4j.linalg.dataset.DataSet loadFromMetaData(org.datavec.api.records.metadata.RecordMetaData recordMetaData)
throws IOException
See also: loadFromMetaData(List)
Parameters:
recordMetaData - RecordMetaData to load from. Should have been produced by the given record reader
Throws:
IOException - If an error occurs during loading of the data

public org.nd4j.linalg.dataset.DataSet loadFromMetaData(List<org.datavec.api.records.metadata.RecordMetaData> list) throws IOException
Parameters:
list - List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor
Throws:
IOException - If an error occurs during loading of the data

Copyright © 2018. All rights reserved.