Analysis
Gather statistics on datasets.
Analysis of data
Using Spark for analysis
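import java.util.List;
import org.datavec.spark.transform.AnalyzeSpark;
import org.datavec.api.writable.Writable;
import org.datavec.api.transform.analysis.*;
import org.datavec.api.transform.quality.DataQualityAnalysis;
// Per-column statistics; histograms use at most maxHistogramBuckets buckets
int maxHistogramBuckets = 10;
DataAnalysis analysis = AnalyzeSpark.analyze(mySchema, javaRdd, maxHistogramBuckets);
// Per-column data quality report (named differently so it does not clash with "analysis")
DataQualityAnalysis qualityAnalysis = AnalyzeSpark.analyzeQuality(mySchema, javaRdd);
// Largest value in a single column
Writable max = AnalyzeSpark.max(javaRdd, "myColumn", mySchema);
// Sample a few values from a single column
int numSamples = 5;
List<Writable> sample = AnalyzeSpark.sampleFromColumn(numSamples, "myColumn", mySchema, javaRdd);
Analyzing locally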
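To make concrete what an analysis pass over a single numeric column gathers, here is a minimal, library-free sketch. It does not use DataVec: the class ColumnStats and its of method are hypothetical names introduced only for illustration, and the equal-width bucketing is an assumption about how a histogram capped at maxHistogramBuckets might be built, not a description of DataVec's internals.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of DataVec: summary statistics for one numeric column.
final class ColumnStats {
    final double min;
    final double max;
    final double mean;
    final long[] histogram; // counts per equal-width bucket between min and max

    private ColumnStats(double min, double max, double mean, long[] histogram) {
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.histogram = histogram;
    }

    static ColumnStats of(List<Double> values, int maxHistogramBuckets) {
        if (values.isEmpty() || maxHistogramBuckets < 1) {
            throw new IllegalArgumentException("need at least one value and one bucket");
        }
        // First pass: min, max and sum (for the mean).
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        // Second pass: assign each value to an equal-width bucket.
        long[] histogram = new long[maxHistogramBuckets];
        double width = (max - min) / maxHistogramBuckets;
        for (double v : values) {
            // A constant column has width 0, so every value goes in bucket 0.
            int bucket = width == 0.0 ? 0 : (int) ((v - min) / width);
            // The maximum value would index one past the end; clamp it into the last bucket.
            histogram[Math.min(bucket, maxHistogramBuckets - 1)]++;
        }
        return new ColumnStats(min, max, sum / values.size(), histogram);
    }

    public static void main(String[] args) {
        ColumnStats stats = ColumnStats.of(Arrays.asList(1.0, 2.0, 2.5, 4.0, 9.0), 4);
        System.out.println("min=" + stats.min + " max=" + stats.max + " mean=" + stats.mean);
        System.out.println("histogram=" + Arrays.toString(stats.histogram));
    }
}
```

The sketch holds the whole column in memory and scans it twice, which is only reasonable for data small enough to analyze on one machine.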
Utilities
AnalyzeLocal
analyze
analyzeQualitySequence
analyzeQuality
AnalyzeSpark
analyzeSequence
analyze
analyzeQualitySequence
analyzeQuality
min
max
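The analyzeQuality entries listed above produce a quality report rather than statistics. As a rough illustration, and assuming such a report boils down to counting how many values in a column are valid, invalid, or missing, the idea can be sketched without DataVec as follows. The class IntegerColumnQuality, its fields, and the rule that a blank string counts as missing are all hypothetical choices made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of DataVec: quality counts for a column expected to hold integers.
final class IntegerColumnQuality {
    long countValid;
    long countInvalid;
    long countMissing;

    static IntegerColumnQuality of(List<String> rawValues) {
        IntegerColumnQuality q = new IntegerColumnQuality();
        for (String raw : rawValues) {
            // Assumption: null or blank entries are treated as missing.
            if (raw == null || raw.trim().isEmpty()) {
                q.countMissing++;
                continue;
            }
            try {
                Integer.parseInt(raw.trim());
                q.countValid++;
            } catch (NumberFormatException e) {
                // Present but not parseable as an integer.
                q.countInvalid++;
            }
        }
        return q;
    }

    long countTotal() {
        return countValid + countInvalid + countMissing;
    }

    public static void main(String[] args) {
        IntegerColumnQuality q =
                IntegerColumnQuality.of(Arrays.asList("12", "", "abc", "7", null, "3.5"));
        System.out.println("valid=" + q.countValid + " invalid=" + q.countInvalid
                + " missing=" + q.countMissing + " total=" + q.countTotal());
    }
}
```

Counts like these make it easy to decide whether a column needs cleaning, such as filtering invalid rows or filling in missing values, before any transform is applied.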