The `AnalyzeSpark` class can generate histograms, collect statistics, and return information about the quality of the data. Assuming you have already loaded your data into a Spark RDD, pass the `JavaRDD` and the `Schema` to the class.

If your data is held in Spark's `RDD` class rather than a `JavaRDD`, you can convert it by calling `.toJavaRDD()`, which returns a `JavaRDD`. If you need to convert it back, call `rdd()`.

In what follows, the RDD is named `javaRdd` and the schema `mySchema`.
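
The Spark workflow described here can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the `datavec-spark` and Apache Spark dependencies are on the classpath, and the class name `AnalyzeSparkSketch`, the column names `id` and `value`, and the three sample rows are all hypothetical, made up for this example.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.datavec.api.transform.analysis.DataAnalysis;
import org.datavec.api.transform.quality.DataQualityAnalysis;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Writable;
import org.datavec.spark.transform.AnalyzeSpark;

public class AnalyzeSparkSketch {

    // Hypothetical two-column schema, used only for this illustration.
    static Schema buildSchema() {
        return new Schema.Builder()
                .addColumnInteger("id")
                .addColumnDouble("value")
                .build();
    }

    // Three made-up rows that match the schema above.
    static List<List<Writable>> sampleRows() {
        return Arrays.asList(
                Arrays.<Writable>asList(new IntWritable(1), new DoubleWritable(0.5)),
                Arrays.<Writable>asList(new IntWritable(2), new DoubleWritable(1.5)),
                Arrays.<Writable>asList(new IntWritable(3), new DoubleWritable(4.0)));
    }

    static DataAnalysis analyze(JavaSparkContext sc) {
        Schema mySchema = buildSchema();
        JavaRDD<List<Writable>> javaRdd = sc.parallelize(sampleRows());

        // Round trip between the two RDD wrappers: rdd() yields the
        // underlying RDD, and toJavaRDD() wraps it as a JavaRDD again.
        RDD<List<Writable>> plainRdd = javaRdd.rdd();
        javaRdd = plainRdd.toJavaRDD();

        // Per-column statistics and histograms.
        int maxHistogramBuckets = 10;
        DataAnalysis analysis = AnalyzeSpark.analyze(mySchema, javaRdd, maxHistogramBuckets);

        // Per-column data quality report (missing or invalid values).
        DataQualityAnalysis quality = AnalyzeSpark.analyzeQuality(mySchema, javaRdd);
        System.out.println(quality);

        return analysis;
    }

    public static void main(String[] args) {
        // Local master so the sketch runs without a cluster.
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("analyze-spark-sketch");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(analyze(sc));
        }
    }
}
```

In a real job, `javaRdd` would normally come from reading and parsing a dataset rather than from `parallelize`; the in-memory rows only keep the sketch short.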

The `AnalyzeLocal` class works very similarly to its Spark counterpart and has a similar API. Instead of an RDD, it accepts a `RecordReader`, which allows it to iterate over the dataset.
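
A local analysis might look like the sketch below. It assumes the `datavec-local` dependency is available; the class name `AnalyzeLocalSketch`, the schema, and the sample rows are hypothetical. A `CollectionRecordReader` over in-memory rows stands in for whatever `RecordReader` you actually use (for example, one reading a CSV file).

```java
import java.util.Arrays;
import java.util.List;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.collection.CollectionRecordReader;
import org.datavec.api.transform.analysis.DataAnalysis;
import org.datavec.api.transform.quality.DataQualityAnalysis;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.AnalyzeLocal;

public class AnalyzeLocalSketch {

    static DataAnalysis analyze() throws Exception {
        // Hypothetical two-column schema, used only for this illustration.
        Schema mySchema = new Schema.Builder()
                .addColumnInteger("id")
                .addColumnDouble("value")
                .build();

        // Three made-up rows that match the schema.
        List<List<Writable>> rows = Arrays.asList(
                Arrays.<Writable>asList(new IntWritable(1), new DoubleWritable(0.5)),
                Arrays.<Writable>asList(new IntWritable(2), new DoubleWritable(1.5)),
                Arrays.<Writable>asList(new IntWritable(3), new DoubleWritable(4.0)));

        // In-memory reader standing in for a file-backed RecordReader.
        RecordReader recordReader = new CollectionRecordReader(rows);

        // Per-column statistics and histograms.
        int maxHistogramBuckets = 10;
        DataAnalysis analysis = AnalyzeLocal.analyze(mySchema, recordReader, maxHistogramBuckets);

        // The first pass consumed the reader, so rewind it before the quality pass.
        recordReader.reset();
        DataQualityAnalysis quality = AnalyzeLocal.analyzeQuality(mySchema, recordReader);
        System.out.println(quality);

        return analysis;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(analyze());
    }
}
```

Because the reader is iterated once per analysis, each additional pass over the same reader needs a `reset()` first, or a fresh reader.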