Operations
Implementations for advanced transformation.
Usage
Operations, such as a Function, help execute transforms and load data into DataVec. Operations are a low-level concept, so most of the time you will not need to worry about them.
Loading data into Spark
If you're using Apache Spark, functions iterate over the dataset, load it into a Spark RDD, and convert the raw data format into a Writable.
Loading a CSV file this way yields a 2D Java RDD. Once your RDD is loaded, you can transform it, perform joins, and use reducers to wrangle the data any way you want.
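As a rough illustration of the per-line conversion, the sketch below turns raw CSV lines into a two-dimensional structure (a list of records, each a list of values). It uses plain Java only: `CsvLineToRecordSketch`, `toRecord`, and `parse` are hypothetical names, `Object` merely stands in for a Writable, and the type guessing is a simplification of this sketch rather than the behavior of DataVec's own readers or Spark functions.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java stand-in for the per-line conversion step; not DataVec or Spark API. */
final class CsvLineToRecordSketch {

    // Convert one raw CSV line into a record: a list of values.
    // Object stands in for a Writable in this sketch.
    static List<Object> toRecord(String line) {
        List<Object> record = new ArrayList<>();
        for (String field : line.split(",", -1)) {
            record.add(parse(field.trim()));
        }
        return record;
    }

    // Try a long first, then a double; otherwise keep the field as text.
    static Object parse(String field) {
        try {
            return Long.parseLong(field);
        } catch (NumberFormatException ignored) {
            // not an integer, fall through
        }
        try {
            return Double.parseDouble(field);
        } catch (NumberFormatException ignored) {
            // not a number, keep as text
        }
        return field;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1,alice,3.5", "2,bob,4.0");
        // One record per line: the "2D" shape the text refers to.
        List<List<Object>> records = new ArrayList<>();
        for (String line : lines) {
            records.add(toRecord(line));
        }
        System.out.println(records);
    }
}
```

In a Spark job the same per-line function would be applied through a map over an RDD of lines instead of a local loop.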
Available ops
AggregableCheckingOp
Created by huitseeker on 5/8/17.
AggregableMultiOp
Executes many reduction operations in parallel on the same column (see datavec#238).
Created by huitseeker on 5/8/17.
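The idea of running several reductions over one column can be sketched in plain Java as below. This is not the AggregableMultiOp API: `MultiReductionSketch` and `reduceAll` are hypothetical names, and "in parallel" is modeled here as several reducers advancing side by side in a single pass, not as multithreading.

```java
import java.util.function.DoubleBinaryOperator;

/** Plain-Java sketch of several reductions over one column in one pass. */
final class MultiReductionSketch {

    // Apply every reducer to every value of the column in a single pass.
    // reducers[i] folds into results[i], starting from identities[i].
    static double[] reduceAll(double[] column,
                              DoubleBinaryOperator[] reducers,
                              double[] identities) {
        double[] results = identities.clone();
        for (double value : column) {
            for (int i = 0; i < reducers.length; i++) {
                results[i] = reducers[i].applyAsDouble(results[i], value);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        double[] scores = {3.0, 1.0, 4.0, 1.5};
        // Sum, minimum, and maximum of the same column.
        DoubleBinaryOperator[] reducers = {Double::sum, Math::min, Math::max};
        double[] identities = {0.0, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        double[] out = reduceAll(scores, reducers, identities);
        System.out.println("sum=" + out[0] + " min=" + out[1] + " max=" + out[2]);
    }
}
```

Reading the column once and updating every accumulator per value avoids one full scan per reduction.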
ByteWritableOp
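Converts an aggregable reduce operation on Byte values into one operating on Writable instances. It is expected to work only if that Writable supports a conversion to Byte.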
Created by huitseeker on 5/14/17.
DispatchOp
Created by huitseeker on 5/14/17.
DispatchWithConditionOp
Tests the input element against a condition before dispatching the appropriate column of this element to its operation.
Created by huitseeker on 5/14/17.
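A minimal plain-Java sketch of conditional dispatch follows. It assumes the condition is evaluated per operation against the whole element (row) and that column i is routed to operation i. Neither assumption is confirmed here, and `ConditionalDispatchSketch` and `reduce` are hypothetical names, not DataVec API.

```java
import java.util.List;
import java.util.function.DoubleBinaryOperator;
import java.util.function.Predicate;

/** Plain-Java sketch: route each column to its own reducer, guarded by a condition. */
final class ConditionalDispatchSketch {

    // Column i of each row is folded by ops[i], but only when
    // conditions.get(i) accepts the whole row first.
    static double[] reduce(List<double[]> rows,
                           DoubleBinaryOperator[] ops,
                           List<Predicate<double[]>> conditions,
                           double[] identities) {
        double[] results = identities.clone();
        for (double[] row : rows) {
            for (int i = 0; i < ops.length; i++) {
                if (conditions.get(i).test(row)) {
                    results[i] = ops[i].applyAsDouble(results[i], row[i]);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<double[]> rows = List.of(
                new double[]{1, 10}, new double[]{2, -5}, new double[]{3, 20});
        DoubleBinaryOperator[] ops = {Double::sum, Math::max};
        // Sum column 0 only for rows whose column 1 is non-negative; always track max of column 1.
        List<Predicate<double[]>> conditions = List.of(row -> row[1] >= 0, row -> true);
        double[] out = reduce(rows, ops, conditions,
                new double[]{0.0, Double.NEGATIVE_INFINITY});
        System.out.println("sum=" + out[0] + " max=" + out[1]);
    }
}
```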
DoubleWritableOp
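Converts an aggregable reduce operation on Double values into one operating on Writable instances. It is expected to work only if that Writable supports a conversion to Double.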
Created by huitseeker on 5/14/17.
FloatWritableOp
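Converts an aggregable reduce operation on Float values into one operating on Writable instances. It is expected to work only if that Writable supports a conversion to Float.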
Created by huitseeker on 5/14/17.
IntWritableOp
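Converts an aggregable reduce operation on Integer values into one operating on Writable instances. It is expected to work only if that Writable supports a conversion to Integer.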
Created by huitseeker on 5/14/17.
LongWritableOp
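Converts an aggregable reduce operation on Long values into one operating on Writable instances. It is expected to work only if that Writable supports a conversion to Long.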
Created by huitseeker on 5/14/17.
StringWritableOp
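Converts an aggregable reduce operation on String values into one operating on Writable instances. It is expected to work only if that Writable supports a conversion to TextWritable.
Created by huitseeker on 5/14/17.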
CalculateSortedRank
CalculateSortedRank: calculate the rank of each example after sorting the examples. For example, we might have some numerical “score” column, and we want to know the rank (sort order) of each example according to that column. The rank of each example (after sorting) will be added in a new Long column. Indexing is done from 0; examples will have values 0 to dataSetSize - 1.
Currently, CalculateSortedRank can only be applied to standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.
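The ranking rule described above can be sketched in plain Java as follows. This is an illustration of the 0-based rank semantics only, not the CalculateSortedRank implementation: `SortedRankSketch` and `ranks` are hypothetical names, and the tie handling (ties keep their input order) is a property of this sketch, not a documented guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch of 0-based rank-after-sorting on a single column. */
final class SortedRankSketch {

    // Returns, for each example in its original position, the 0-based rank
    // it would have after sorting all examples with the given comparator.
    static <T> long[] ranks(List<T> examples, Comparator<? super T> comparator) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < examples.size(); i++) {
            order.add(i);
        }
        // Sort indices by the examples they point to. List.sort is stable,
        // so equal examples keep their input order in this sketch.
        order.sort((a, b) -> comparator.compare(examples.get(a), examples.get(b)));
        long[] rank = new long[examples.size()];
        for (int position = 0; position < order.size(); position++) {
            rank[order.get(position)] = position;
        }
        return rank;
    }

    public static void main(String[] args) {
        // A numerical "score" column; ranks run from 0 to dataSetSize - 1.
        List<Double> scores = List.of(0.7, 0.2, 0.9);
        long[] r = ranks(scores, Comparator.<Double>naturalOrder());
        System.out.println(Arrays.toString(r));
    }
}
```

The returned array plays the role of the new Long column: one rank per example, aligned with the original row order.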
transform
param newColumnName Name of the new column (will contain the rank for each example)
param sortOnColumn Name of the column to sort on
param comparator Comparator used to sort examples
outputColumnName
The output column name after the operation has been applied
return the output column name
columnName
The output column names. These will often be the same as the input.
return the output column names