因为数据集通常是比较大的,所以你可以决定最适合你需要的执行机制。例如,如果你正在对大型训练数据集进行向量化,则可以在分布式spark集群中处理它。但是,如果需要进行实时推理,数据向量还提供不需要任何附加设置的本地执行器。
import org.datavec.local.transforms.LocalTransformExecutor;
List<List<Writable>> transformed = LocalTransformExecutor.execute(recordReader, transformProcess)
List<List<List<Writable>>> transformedSeq = LocalTransformExecutor.executeToSequence(sequenceReader, transformProcess)
List<List<Writable>> joined = LocalTransformExecutor.executeJoin(join, leftReader, rightReader)
import org.datavec.spark.transforms.SparkTransformExecutor;
JavaRDD<List<Writable>> transformed = SparkTransformExecutor.execute(inputRdd, transformProcess)
JavaRDD<List<List<Writable>>> transformedSeq = SparkTransformExecutor.executeToSequence(inputSequenceRdd, transformProcess)
JavaRDD<List<Writable>> joined = SparkTransformExecutor.executeJoin(join, leftRdd, rightRdd)