分析
收集数据集的统计信息。
数据分析
用spark进行数据分析
import org.datavec.spark.transform.AnalyzeSpark;
import org.datavec.api.writable.Writable;
import org.datavec.api.transform.analysis.*;
int maxHistogramBuckets = 10
DataAnalysis analysis = AnalyzeSpark.analyze(mySchema, javaRdd, maxHistogramBuckets)
DataQualityAnalysis analysis = AnalyzeSpark.analyzeQuality(mySchema, javaRdd)
Writable max = AnalyzeSpark.max(javaRdd, "myColumn", mySchema)
int numSamples = 5
List<Writable> sample = AnalyzeSpark.sampleFromColumn(numSamples, "myColumn", mySchema, javaRdd)本地分析
实用工具
Last updated
Was this helpful?