EN 1.0.0-M1.1


Transforms

Data wrangling and mapping from one schema to another.

Data wrangling

Transformations are one of the key tools in DataVec. DataVec helps the user map a dataset from one schema to another, and provides a list of operations to convert types, format data, and convert a 2D dataset to sequence data.

Building a transform process

A transform process requires a `Schema` to successfully transform data. Both the schema and transform process classes come with a helper `Builder` class, which is useful for organizing code and avoiding complex constructors. When both are combined, they look like the sample code below. Note how `inputDataSchema` is passed into the `Builder` constructor; your transform process will fail to compile without it.

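```java
import java.util.Arrays;
import java.util.HashSet;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.condition.ConditionOp;
import org.datavec.api.transform.condition.column.CategoricalColumnCondition;
import org.datavec.api.transform.condition.column.DoubleColumnCondition;
import org.datavec.api.transform.filter.ConditionFilter;
import org.datavec.api.transform.transform.time.DeriveColumnsFromTimeTransform;
import org.datavec.api.writable.DoubleWritable;
import org.joda.time.DateTimeFieldType;
import org.joda.time.DateTimeZone;

// inputDataSchema is the Schema describing the raw input data (defined elsewhere)
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    .removeColumns("CustomerID", "MerchantID")
    // Remove every example whose merchant country is not USA or CAN
    .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA", "CAN")))))
    .conditionalReplaceValueTransform(
        "TransactionAmountUSD",     // Column to operate on
        new DoubleWritable(0.0),    // New value to use when the condition is satisfied
        new DoubleColumnCondition("TransactionAmountUSD", ConditionOp.LessThan, 0.0)) // Condition: amount < 0.0
    // Joda-Time pattern: lowercase yyyy and dd (uppercase DD would mean day-of-year)
    .stringToTimeTransform("DateTimeString", "yyyy-MM-dd HH:mm:ss.SSS", DateTimeZone.UTC)
    .renameColumn("DateTimeString", "DateTime")
    .transform(new DeriveColumnsFromTimeTransform.Builder("DateTime").addIntegerDerivedColumn("HourOfDay", DateTimeFieldType.hourOfDay()).build())
    .removeColumns("DateTime")
    .build();
```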

Executing a transformation

Different "backends" for executors are available. Using the `tp` transform process above, here's how you can execute it locally using plain DataVec.

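```java
import java.util.List;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;

// originalData: the raw records as List<List<Writable>>, e.g. collected from a RecordReader
List<List<Writable>> processedData = LocalTransformExecutor.execute(originalData, tp);
```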

Debugging

Each operation in a transform process represents a "step" in schema changes. Sometimes, the resulting transformation is not the intended result. You can debug this by printing the schema after each step of the transform process `tp`, as follows:

```java
// Print the schema after each step of the transform process
int numActions = tp.getActionList().size();

for (int i = 0; i < numActions; i++) {
    System.out.println("\n\n==================================================");
    System.out.println("-- Schema after step " + i + " (" + tp.getActionList().get(i) + ") --");

    System.out.println(tp.getSchemaAfterStep(i));
}
```

Available transformations and conversions

TransformProcess

A TransformProcess defines an ordered list of transformations to be executed on some data.

```java
public Schema getFinalSchema()
```

Get the Schema of the output data, after all steps of the transform process have been executed

return Schema of the final output data

```java
public Schema getSchemaAfterStep(int step)
```

Return the schema after executing all steps up to and including the specified step. Steps are indexed from 0: so getSchemaAfterStep(0) is after one transform has been executed.

param step Index of the step

return Schema of the data, after that (and all prior) steps have been executed

```java
public String toJson()
```

Convert the TransformProcess to a JSON string

return TransformProcess, as JSON

A related method, `execute(List<Writable> input)`, executes the full sequence of transformations for a single example. It may return null if the example is filtered out. **NOTE:** Some TransformProcess operations cannot be done on examples individually. Most notably, ConvertToSequence and ConvertFromSequence operations require the full data set to be processed at once.

```java
public String toYaml()
```

Convert the TransformProcess to a YAML string

return TransformProcess, as YAML

```java
public static TransformProcess fromJson(String json)
```

Deserialize a JSON String (created by `toJson()`) to a TransformProcess

return TransformProcess, from JSON

```java
public static TransformProcess fromYaml(String yaml)
```

Deserialize a YAML String (created by `toYaml()`) to a TransformProcess

return TransformProcess, from YAML

```java
public Builder transform(Transform transform)
```

Add a transformation to be executed after the previously-added operations have been executed

param transform Transformation to execute

A related static method, `inferCategories(RecordReader recordReader, int columnIndex)`, infers the categories for a particular column of the given record reader. Each "column index" is a column in the context of: List record = ...; record.get(columnIndex);

Anything passed in as a column will be automatically converted to a string for categorical purposes.

The expected input is strings or numbers (which have sensible toString() representations).

The returned categories are sorted alphabetically.

param recordReader the record reader to iterate through

param columnIndex the column index to get categories for

return the categories found in that column

```java
public Builder filter(Filter filter)
```

Add a filter operation to be executed after the previously-added operations have been executed

param filter Filter operation to execute

```java
public Builder filter(Condition condition)
```

Add a filter operation, based on the specified condition.

If condition is satisfied (returns true): remove the example or sequence
If condition is not satisfied (returns false): keep the example or sequence

param condition Condition to filter on
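The keep/remove rule above is easy to get backwards. The following is a plain-Java sketch of that rule only, not DataVec's implementation. Records are modelled as `List<String>`, and `FilterSketch` with its `filter` method are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of filter(Condition) semantics; not part of DataVec.
final class FilterSketch {
    // Keeps a record only when the condition is NOT satisfied:
    // condition true -> record removed, condition false -> record kept.
    static List<List<String>> filter(List<List<String>> records, Predicate<List<String>> condition) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> record : records) {
            if (!condition.test(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical records: [MerchantCountryCode, TransactionAmountUSD]
        List<List<String>> records = List.of(
                List.of("USA", "10.0"), List.of("FRA", "5.0"), List.of("CAN", "7.5"));
        Set<String> allowed = Set.of("USA", "CAN");
        // Condition "country NOT in {USA, CAN}" is true for FRA, so that record is removed
        System.out.println(filter(records, r -> !allowed.contains(r.get(0))));
        // prints [[USA, 10.0], [CAN, 7.5]]
    }
}
```

This mirrors the `ConditionFilter` with `ConditionOp.NotInSet` in the first example on this page: the condition describes what to drop, not what to keep.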

```java
public Builder removeColumns(String... columnNames)
```

Remove all of the specified columns, by name

param columnNames Names of the columns to remove

```java
public Builder removeColumns(Collection<String> columnNames)
```

Remove all of the specified columns, by name

param columnNames Names of the columns to remove

```java
public Builder removeAllColumnsExceptFor(String... columnNames)
```

Remove all columns, except for those that are specified here

param columnNames Names of the columns to keep

```java
public Builder removeAllColumnsExceptFor(Collection<String> columnNames)
```

Remove all columns, except for those that are specified here

param columnNames Names of the columns to keep

```java
public Builder renameColumn(String oldName, String newName)
```

Rename a single column

param oldName Original column name

param newName New column name

```java
public Builder renameColumns(List<String> oldNames, List<String> newNames)
```

Rename multiple columns

param oldNames List of original column names

param newNames List of new column names

```java
public Builder reorderColumns(String... newOrder)
```

Reorder the columns using a partial or complete new ordering. If only some of the column names are specified for the new order, the remaining columns will be placed at the end, according to their current relative ordering

param newOrder Names of the columns, in the order they will appear in the output
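To make the partial-ordering rule concrete, here is a plain-Java sketch that operates on column names only. `ReorderSketch` is a hypothetical helper, not DataVec code. Rejecting unknown column names is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of reorderColumns(...) semantics; not part of DataVec.
final class ReorderSketch {
    static List<String> reorder(List<String> currentOrder, String... newOrder) {
        List<String> result = new ArrayList<>();
        for (String name : newOrder) {
            // Assumption: naming a column that does not exist is treated as an error
            if (!currentOrder.contains(name)) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            result.add(name);
        }
        // Columns not mentioned in newOrder keep their current relative order, at the end
        for (String name : currentOrder) {
            if (!result.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("a", "b", "c", "d");
        System.out.println(reorder(columns, "c", "a")); // prints [c, a, b, d]
    }
}
```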

```java
public Builder duplicateColumn(String column, String newName)
```

Duplicate a single column

param column Name of the column to duplicate

param newName Name of the new (duplicate) column

```java
public Builder duplicateColumns(List<String> columnNames, List<String> newNames)
```

Duplicate a set of columns

param columnNames Names of the columns to duplicate

param newNames Names of the new (duplicated) columns

```java
public Builder integerMathOp(String column, MathOp mathOp, int scalar)
```

Perform a mathematical operation (add, subtract, scalar max etc) on the specified integer column, with a scalar

param column The integer column to perform the operation on

param mathOp The mathematical operation

param scalar The scalar value to use in the mathematical operation

```java
public Builder integerColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
```

Calculate and add a new integer column by performing a mathematical operation on a number of existing columns. New column is added to the end.

param newColumnName Name of the new/derived column

param mathOp Mathematical operation to execute on the columns

param columnNames Names of the columns to use in the mathematical operation

```java
public Builder longMathOp(String columnName, MathOp mathOp, long scalar)
```

Perform a mathematical operation (add, subtract, scalar max etc) on the specified long column, with a scalar

param columnName The long column to perform the operation on

param mathOp The mathematical operation

param scalar The scalar value to use in the mathematical operation

```java
public Builder longColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
```

Calculate and add a new long column by performing a mathematical operation on a number of existing columns. New column is added to the end.

param newColumnName Name of the new/derived column

param mathOp Mathematical operation to execute on the columns

param columnNames Names of the columns to use in the mathematical operation

```java
public Builder floatMathOp(String columnName, MathOp mathOp, float scalar)
```

Perform a mathematical operation (add, subtract, scalar max etc) on the specified float column, with a scalar

param columnName The float column to perform the operation on

param mathOp The mathematical operation

param scalar The scalar value to use in the mathematical operation

```java
public Builder floatColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
```

Calculate and add a new float column by performing a mathematical operation on a number of existing columns. New column is added to the end.

param newColumnName Name of the new/derived column

param mathOp Mathematical operation to execute on the columns

param columnNames Names of the columns to use in the mathematical operation

```java
public Builder floatMathFunction(String columnName, MathFunction mathFunction)
```

Perform a mathematical operation (such as sin(x), ceil(x), exp(x) etc) on a column

param columnName Column name to operate on

param mathFunction MathFunction to apply to the column

```java
public Builder doubleMathOp(String columnName, MathOp mathOp, double scalar)
```

Perform a mathematical operation (add, subtract, scalar max etc) on the specified double column, with a scalar

param columnName The double column to perform the operation on

param mathOp The mathematical operation

param scalar The scalar value to use in the mathematical operation

```java
public Builder doubleColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
```

Calculate and add a new double column by performing a mathematical operation on a number of existing columns. New column is added to the end.

param newColumnName Name of the new/derived column

param mathOp Mathematical operation to execute on the columns

param columnNames Names of the columns to use in the mathematical operation

```java
public Builder doubleMathFunction(String columnName, MathFunction mathFunction)
```

Perform a mathematical operation (such as sin(x), ceil(x), exp(x) etc) on a column

param columnName Column name to operate on

param mathFunction MathFunction to apply to the column

```java
public Builder timeMathOp(String columnName, MathOp mathOp, long timeQuantity, TimeUnit timeUnit)
```

Perform a mathematical operation (add, subtract, scalar min/max only) on the specified time column

param columnName The time column to perform the operation on

param mathOp The mathematical operation

param timeQuantity The quantity used in the mathematical op

param timeUnit The unit that timeQuantity is specified in

```java
public Builder categoricalToOneHot(String... columnNames)
```

Convert the specified column(s) from a categorical representation to a one-hot representation. This involves the creation of multiple new columns each.

param columnNames Names of the categorical column(s) to convert to a one-hot representation

```java
public Builder categoricalToInteger(String... columnNames)
```

Convert the specified column(s) from a categorical representation to an integer representation. This will replace the specified categorical column(s) with an integer representation, where each integer has a value from 0 to numCategories-1.

param columnNames Name of the categorical column(s) to convert to an integer representation
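The two categorical conversions above can be pictured with a plain-Java sketch. It assumes that a category's integer code is its position in the column's list of states, in the order the schema declares them. `CategoricalSketch` is hypothetical and is not DataVec's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of categoricalToInteger / categoricalToOneHot semantics; not DataVec code.
final class CategoricalSketch {
    // Integer code = position of the value in the column's list of states (0..numCategories-1)
    static int toInteger(List<String> states, String value) {
        int index = states.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        return index;
    }

    // One-hot: one new column per state, 1 for the matching state and 0 everywhere else
    static int[] toOneHot(List<String> states, String value) {
        int[] oneHot = new int[states.size()];
        oneHot[toInteger(states, value)] = 1;
        return oneHot;
    }

    public static void main(String[] args) {
        List<String> states = List.of("red", "green", "blue");
        System.out.println(toInteger(states, "blue"));                   // prints 2
        System.out.println(Arrays.toString(toOneHot(states, "green")));  // prints [0, 1, 0]
    }
}
```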

```java
public Builder integerToCategorical(String columnName, List<String> categoryStateNames)
```

Convert the specified column from an integer representation (values are assumed to be 0 to numCategories-1) to a categorical representation, given the specified state names

param columnName Name of the column to convert

param categoryStateNames Names of the states for the categorical column

```java
public Builder integerToCategorical(String columnName, Map<Integer, String> categoryIndexNameMap)
```

Convert the specified column from an integer representation to a categorical representation, given the specified mapping between integer indexes and state names

param columnName Name of the column to convert

param categoryIndexNameMap Names of the states for the categorical column

```java
public Builder integerToOneHot(String columnName, int minValue, int maxValue)
```

Convert an integer column to a set of one-hot columns, based on the value in the integer column

param columnName Name of the integer column

param minValue Minimum value possible for the integer column (inclusive)

param maxValue Maximum value possible for the integer column (inclusive)
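Because both bounds are inclusive, the number of output columns is `maxValue - minValue + 1`. The plain-Java sketch below shows that arithmetic. `IntegerOneHotSketch` is hypothetical, and throwing on out-of-range values is an assumption of the sketch, not documented DataVec behaviour.

```java
import java.util.Arrays;

// Hypothetical illustration of integerToOneHot(column, minValue, maxValue); not DataVec code.
final class IntegerOneHotSketch {
    static int[] toOneHot(int value, int minValue, int maxValue) {
        // Assumption: values outside [minValue, maxValue] are rejected
        if (value < minValue || value > maxValue) {
            throw new IllegalArgumentException(value + " is outside [" + minValue + ", " + maxValue + "]");
        }
        // Both bounds are inclusive, so there are (maxValue - minValue + 1) output columns
        int[] oneHot = new int[maxValue - minValue + 1];
        oneHot[value - minValue] = 1;
        return oneHot;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toOneHot(3, 1, 4))); // prints [0, 0, 1, 0]
    }
}
```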

```java
public Builder addConstantColumn(String newColumnName, ColumnType newColumnType, Writable fixedValue)
```

Add a new column, where all values in the column are identical and as specified.

param newColumnName Name of the new column

param newColumnType Type of the new column

param fixedValue Value in the new column for all records

```java
public Builder addConstantDoubleColumn(String newColumnName, double value)
```

Add a new double column, where the value for that column is identical for all records

param newColumnName Name of the new column

param value Value in the new column for all records

```java
public Builder addConstantIntegerColumn(String newColumnName, int value)
```

Add a new integer column, where the value for that column is identical for all records

param newColumnName Name of the new column

param value Value of the new column for all records

```java
public Builder addConstantLongColumn(String newColumnName, long value)
```

Add a new long column, where the value for that column is identical for all records

param newColumnName Name of the new column

param value Value in the new column for all records

```java
public Builder convertToString(String inputColumn)
```

Convert the specified column to a string.

param inputColumn the input column to convert

return builder pattern

```java
public Builder convertToDouble(String inputColumn)
```

Convert the specified column to a double.

param inputColumn the input column to convert

return builder pattern

```java
public Builder convertToInteger(String inputColumn)
```

Convert the specified column to an integer.

param inputColumn the input column to convert

return builder pattern

```java
public Builder normalize(String column, Normalize type, DataAnalysis da)
```

Normalize the specified column with a given type of normalization

param column Column to normalize

param type Type of normalization to apply

param da DataAnalysis object

```java
public Builder convertToSequence(String keyColumn, SequenceComparator comparator)
```

Convert a set of independent records/examples into a sequence, according to some key. Within each sequence, values are ordered using the provided `SequenceComparator`.

param keyColumn Column to use as a key (values with the same key will be combined into sequences)

param comparator A SequenceComparator to order the values within each sequence (for example, by time or String order)

```java
public Builder convertToSequence()
```

Convert a set of independent records/examples into a sequence; each example is simply treated as a sequence of length 1, without any join/group operations. Note that more commonly, joining/grouping is required; use `convertToSequence(List, SequenceComparator)` for this functionality.

```java
public Builder convertToSequence(List<String> keyColumns, SequenceComparator comparator)
```

Convert a set of independent records/examples into a sequence, where each sequence is grouped according to one or more key values (i.e., the values in one or more columns). Within each sequence, values are ordered using the provided `SequenceComparator`.

param keyColumns Columns to use as keys (values with the same keys will be combined into sequences)

param comparator A SequenceComparator to order the values within each sequence (for example, by time or String order)

```java
public Builder convertFromSequence()
```

Convert a sequence to a set of individual values (by treating each value in each sequence as a separate example)
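The grouping behind `convertToSequence(List<String> keyColumns, SequenceComparator comparator)`, and the flattening done by `convertFromSequence()`, can be sketched in plain Java. In this sketch, records are `List<String>`, key columns are given by index, and a `java.util.Comparator` stands in for `SequenceComparator`. `SequenceSketch` is hypothetical and is not the DataVec implementation. Groups are kept in first-seen order purely so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of convertToSequence / convertFromSequence semantics; not DataVec code.
final class SequenceSketch {
    // Group records by the values in the key columns, then order each group with the comparator
    static List<List<List<String>>> toSequences(List<List<String>> records, List<Integer> keyIndices,
                                                Comparator<List<String>> comparator) {
        Map<List<String>, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> record : records) {
            List<String> key = new ArrayList<>();
            for (int index : keyIndices) {
                key.add(record.get(index));
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
        }
        List<List<List<String>>> sequences = new ArrayList<>();
        for (List<List<String>> group : groups.values()) {
            group.sort(comparator);
            sequences.add(group);
        }
        return sequences;
    }

    // Inverse direction: every time step of every sequence becomes a separate example
    static List<List<String>> fromSequences(List<List<List<String>>> sequences) {
        List<List<String>> examples = new ArrayList<>();
        for (List<List<String>> sequence : sequences) {
            examples.addAll(sequence);
        }
        return examples;
    }

    public static void main(String[] args) {
        // Hypothetical records: [customerId, timestampMillis]
        List<List<String>> records = List.of(
                List.of("c1", "300"), List.of("c2", "100"), List.of("c1", "200"));
        Comparator<List<String>> byTime = Comparator.comparingLong(r -> Long.parseLong(r.get(1)));
        List<List<List<String>>> sequences = toSequences(records, List.of(0), byTime);
        System.out.println(sequences);                // prints [[[c1, 200], [c1, 300]], [[c2, 100]]]
        System.out.println(fromSequences(sequences)); // prints [[c1, 200], [c1, 300], [c2, 100]]
    }
}
```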

```java
public Builder splitSequence(SequenceSplit split)
```

Split sequences into 1 or more other sequences. Used for example to split large sequences into a set of smaller sequences

param split SequenceSplit that defines how splits will occur

```java
public Builder trimSequence(int numStepsToTrim, boolean trimFromStart)
```

SequenceTrimTransform removes the first or last N values in a sequence. Note that the resulting sequence may be of length 0, if the input sequence length is less than or equal to N.

param numStepsToTrim Number of time steps to trim from the sequence

param trimFromStart If true: Trim values from the start of the sequence. If false: trim values from the end.
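A plain-Java sketch of the trimming rule, including the length-0 case, applied to a single sequence represented as a `List`. `TrimSketch` is hypothetical and is not DataVec code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of trimSequence(numStepsToTrim, trimFromStart); not DataVec code.
final class TrimSketch {
    static <T> List<T> trim(List<T> sequence, int numStepsToTrim, boolean trimFromStart) {
        // Sequences no longer than N become empty (length 0)
        if (numStepsToTrim >= sequence.size()) {
            return new ArrayList<>();
        }
        if (trimFromStart) {
            return new ArrayList<>(sequence.subList(numStepsToTrim, sequence.size()));
        }
        return new ArrayList<>(sequence.subList(0, sequence.size() - numStepsToTrim));
    }

    public static void main(String[] args) {
        List<Integer> steps = List.of(1, 2, 3, 4, 5);
        System.out.println(trim(steps, 2, true));  // prints [3, 4, 5]
        System.out.println(trim(steps, 2, false)); // prints [1, 2, 3]
        System.out.println(trim(steps, 5, true));  // prints []
    }
}
```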

```java
public Builder offsetSequence(List<String> columnsToOffset, int offsetAmount,
                              SequenceOffsetTransform.OperationType operationType)
```

Perform a sequence offset operation on the specified columns. Note that this also truncates sequences by the specified offset amount by default. Use `transform(new SequenceOffsetTransform(…))` to change this. See `SequenceOffsetTransform` for details on exactly what this operation does and how.

param columnsToOffset Columns to offset

param offsetAmount Amount to offset the specified columns by (positive offset: ‘columnsToOffset’ are moved to later time steps)

param operationType Whether the offset should be done in-place or by adding a new column

```java
public Builder reduce(IAssociativeReducer reducer)
```

Reduce (i.e., aggregate/combine) a set of examples (typically by key). **Note**: In the current implementation, reduction operations can be performed only on standard (i.e., non-sequence) data.

param reducer Reducer to use

```java
public Builder reduceSequence(IAssociativeReducer reducer)
```

Reduce (i.e., aggregate/combine) a set of sequence examples - for each sequence individually. **Note**: This method results in non-sequence data. If you would instead prefer sequences of length 1 after the reduction, use `transform(new ReduceSequenceTransform(reducer))`.

param reducer Reducer to use to reduce each window

```java
public Builder reduceSequenceByWindow(IAssociativeReducer reducer, WindowFunction windowFunction)
```

Reduce (i.e., aggregate/combine) a set of sequence examples - for each sequence individually - using a window function. For example, take all records/examples in each 24-hour period (i.e., using a window function), and convert them into a single value (using the reducer). In this example, the output is a sequence, with a time period of 24 hours.

param reducer Reducer to use to reduce each window

param windowFunction Window function to apply on each sequence individually

```java
public Builder sequenceMovingWindowReduce(String columnName, int lookback, ReduceOp op)
```

SequenceMovingWindowReduceTransform: Adds a new column, where the value is derived by:
(a) using a window of the last N values in a single column, and
(b) applying a reduction op on the window to calculate a new value.
For example, this transform can be used to implement a simple moving average of the last N values, or to determine the minimum or maximum values in the last N time steps.

For example, for a simple moving average of length 20: `new SequenceMovingWindowReduceTransform("myCol", 20, ReduceOp.Mean)`

param columnName Column name to perform windowing on

param lookback Look back period for windowing

param op Reduction operation to perform on each window
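As an illustration of the moving-average case, here is a plain-Java sketch over one numeric column of one sequence. It is not DataVec's implementation. Two assumptions are made: the window includes the current time step, and the first N-1 steps average only the values available so far. DataVec's own edge-case handling may differ. `MovingWindowSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical illustration of a moving-window mean (ReduceOp.Mean); not DataVec code.
final class MovingWindowSketch {
    static double[] movingMean(double[] values, int lookback) {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Window = the current value plus up to (lookback - 1) previous values.
            // Assumption: for the first steps, only the values available so far are averaged.
            int start = Math.max(0, i - lookback + 1);
            double sum = 0.0;
            for (int j = start; j <= i; j++) {
                sum += values[j];
            }
            result[i] = sum / (i - start + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] column = {1.0, 2.0, 3.0, 4.0};
        System.out.println(Arrays.toString(movingMean(column, 2))); // prints [1.0, 1.5, 2.5, 3.5]
    }
}
```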

```java
public Builder calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator)
```

CalculateSortedRank: calculate the rank of each example, after sorting examples. For example, we might have some numerical “score” column, and we want to know the rank (sort order) of each example, according to that column.
The rank of each example (after sorting) will be added in a new Long column. Indexing is done from 0; examples will have values 0 to dataSetSize-1.

Currently, CalculateSortedRank can only be applied on standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.

param newColumnName Name of the new column (will contain the rank for each example)

param sortOnColumn Column to sort on

param comparator Comparator used to sort examples

```java
public Builder calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator,
                                   boolean ascending)
```

CalculateSortedRank: calculate the rank of each example, after sorting examples. For example, we might have some numerical “score” column, and we want to know the rank (sort order) of each example, according to that column.
The rank of each example (after sorting) will be added in a new Long column. Indexing is done from 0; examples will have values 0 to dataSetSize-1.

Currently, CalculateSortedRank can only be applied on standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.

param newColumnName Name of the new column (will contain the rank for each example)

param sortOnColumn Column to sort on

param comparator Comparator used to sort examples

param ascending If true: sort ascending. False: descending
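The rank column can be pictured with a plain-Java sketch on a single numeric "score" column. It is not DataVec code, and `SortedRankSketch` is hypothetical. Each example receives its 0-based position after sorting. How DataVec orders tied scores is not stated above; this sketch keeps ties in input order, which is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of calculateSortedRank semantics; not DataVec code.
final class SortedRankSketch {
    // Returns, for each example, its 0-based position after sorting on the score column
    static long[] sortedRank(double[] scores, boolean ascending) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Comparator<Integer> byScore = Comparator.comparingDouble(i -> scores[i]);
        // Arrays.sort on objects is stable, so tied scores keep their input order (assumption)
        Arrays.sort(order, ascending ? byScore : byScore.reversed());
        long[] rank = new long[scores.length];
        for (int position = 0; position < order.length; position++) {
            rank[order[position]] = position;
        }
        return rank;
    }

    public static void main(String[] args) {
        double[] scores = {0.5, 0.9, 0.1};
        System.out.println(Arrays.toString(sortedRank(scores, true)));  // prints [1, 2, 0]
        System.out.println(Arrays.toString(sortedRank(scores, false))); // prints [1, 0, 2]
    }
}
```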

```java
public Builder stringToCategorical(String columnName, List<String> stateNames)
```

Convert the specified String column to a categorical column. The state names must be provided.

param columnName Name of the String column to convert to categorical

param stateNames State names of the category

```java
public Builder stringRemoveWhitespaceTransform(String columnName)
```

Remove all whitespace characters from the values in the specified String column

param columnName Name of the column to remove whitespace from

```java
public Builder stringMapTransform(String columnName, Map<String, String> mapping)
```

Replace one or more String values in the specified column with new values.

Keys in the map are the original values; the Values in the map are their replacements. If a String appears in the data but does not appear in the provided map (as a key), that String value will not be modified.

param columnName Name of the column in which to do replacement

param mapping Map of oldValues -> newValues
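A plain-Java sketch of the lookup rule on a single column of strings. It assumes whole-value matching, in line with the description above (for regex-based replacement, see `replaceStringTransform`). `StringMapSketch` is hypothetical and is not DataVec code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of stringMapTransform semantics; not DataVec code.
final class StringMapSketch {
    static List<String> mapValues(List<String> column, Map<String, String> mapping) {
        List<String> result = new ArrayList<>();
        for (String value : column) {
            // Values that are not keys in the map pass through unchanged
            result.add(mapping.getOrDefault(value, value));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of("yes", "1", "no", "0");
        System.out.println(mapValues(List.of("yes", "no", "maybe"), mapping)); // prints [1, 0, maybe]
    }
}
```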

```java
public Builder stringToTimeTransform(String column, String format, DateTimeZone dateTimeZone)
```

Convert a String column (containing a date/time String) to a time column (by parsing the date/time String)

param column String column containing the date/time Strings

param format Format of the strings. Time format is specified as per http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html

param dateTimeZone Timezone of the column

```java
public Builder stringToTimeTransform(String column, String format, DateTimeZone dateTimeZone, Locale locale)
```

Convert a String column (containing a date/time String) to a time column (by parsing the date/time String)

param column String column containing the date/time Strings

param format Format of the strings. Time format is specified as per http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html

param dateTimeZone Timezone of the column

param locale Locale of the column

```java
public Builder appendStringColumnTransform(String column, String toAppend)
```

Append a String to a specified column

param column Column to append the value to

param toAppend String to append to the end of each writable

```java
public Builder conditionalReplaceValueTransform(String column, Writable newValue, Condition condition)
```

Replace the values in a specified column with a specified new value, if some condition holds. If the condition does not hold, the original values are not modified.

param column Column to operate on

param newValue Value to use as replacement, if condition is satisfied

param condition Condition that must be satisfied for replacement

```java
public Builder conditionalReplaceValueTransformWithDefault(String column, Writable yesVal, Writable noVal, Condition condition)
```

Replace the values in a specified column with a specified “yes” value, if some condition holds. Replace it with a “no” value, otherwise.

param column Column to operate on

param yesVal Value to use as replacement, if condition is satisfied

param noVal Value to use as replacement, if condition is not satisfied

param condition Condition that must be satisfied for replacement

```java
public Builder conditionalCopyValueTransform(String columnToReplace, String sourceColumn, Condition condition)
```

Replace the value in a specified column with a new value taken from another column, if a condition is satisfied/true.
Note that the condition can be any generic condition, including on other column(s), different to the column that will be modified if the condition is satisfied/true.

param columnToReplace Name of the column in which values will be replaced (if condition is satisfied)

param sourceColumn Name of the column from which the new values will be taken

param condition Condition to use

```java
public Builder replaceStringTransform(String columnName, Map<String, String> mapping)
```

Replace one or more String values in the specified column that match regular expressions.

Keys in the map are the regular expressions; the Values in the map are their String replacements. For example:

| Original | Regex | Replacement | Result |
|---|---|---|---|
| `Data_Vec` | `_` | (empty string) | `DataVec` |
| `B1C2T3` | `\d` | `one` | `BoneConeTone` |
| `' 4.25 '` | `^\s+\|\s+$` | (empty string) | `'4.25'` |

param columnName Name of the column in which to do replacement

param mapping Map of old values or regular expression to new values
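The three table rows can be reproduced with plain `String.replaceAll`. The sketch below is not DataVec's implementation. It assumes that, when the map holds several entries, they are applied one after another in the map's iteration order. Note also that Java replacement strings treat `$` and `\` specially. `RegexReplaceSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of replaceStringTransform semantics; not DataVec code.
final class RegexReplaceSketch {
    // Applies each (regex -> replacement) entry in turn, using String.replaceAll.
    // Assumption: entries are applied in the map's iteration order.
    static String replace(String value, Map<String, String> mapping) {
        String result = value;
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            result = result.replaceAll(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The three rows of the example table, each with its own one-entry mapping
        System.out.println(replace("Data_Vec", Map.of("_", "")));                      // prints DataVec
        System.out.println(replace("B1C2T3", Map.of("\\d", "one")));                   // prints BoneConeTone
        System.out.println("'" + replace(" 4.25 ", Map.of("^\\s+|\\s+$", "")) + "'");  // prints '4.25'
        // Several rules at once: a LinkedHashMap keeps a predictable order
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("^\\s+|\\s+$", "");
        rules.put("_", "");
        System.out.println(replace(" Data_Vec ", rules)); // prints DataVec
    }
}
```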

```java
public Builder ndArrayScalarOpTransform(String columnName, MathOp op, double value)
```

Element-wise NDArray math operation (add, subtract, etc) on an NDArray column

param columnName Name of the NDArray column to perform the operation on

param op Operation to perform

param value Value for the operation

```java
public Builder ndArrayColumnsMathOpTransform(String newColumnName, MathOp mathOp, String... columnNames)
```

Perform an element-wise mathematical operation (such as add, subtract, multiply) on NDArray columns. The existing columns are unchanged; a new NDArray column is added.

param newColumnName Name of the new NDArray column

param mathOp Operation to perform

param columnNames Name of the columns used as input to the operation

```java
public Builder ndArrayMathFunctionTransform(String columnName, MathFunction mathFunction)
```

Apply an element wise mathematical function (sin, tanh, abs etc) to an NDArray column. This operation is performed in place.

param columnName Name of the column to perform the operation on

param mathFunction Mathematical function to apply

```java
public Builder ndArrayDistanceTransform(String newColumnName, Distance distance, String firstCol,
                                        String secondCol)
```

Calculate a distance (cosine similarity, Euclidean, Manhattan) on two equal-sized NDArray columns. This operation adds a new Double column (with the specified name) with the result.

param newColumnName Name of the new column (result) to add

param distance Distance to apply

param firstCol first column to use in the distance calculation

param secondCol second column to use in the distance calculation

```java
public Builder firstDigitTransform(String inputColumn, String outputColumn)
```

FirstDigitTransform converts a column to a categorical column, with values being the first digit of the number.
For example, “3.1415” becomes “3” and “2.0” becomes “2”.
Negative numbers ignore the sign: “-7.123” becomes “7”.
Note that two `FirstDigitTransform.Mode` values are supported, which determine how non-numerical entries are handled:
EXCEPTION_ON_INVALID: output has 10 category values (“0”, …, “9”), and any non-numerical values result in an exception
INCLUDE_OTHER_CATEGORY: output has 11 category values (“0”, …, “9”, “Other”), and all non-numerical values are mapped to “Other”

FirstDigitTransform is useful (combined with `CategoricalToOneHotTransform` and reductions) to implement Benford’s law.

param inputColumn Input column name

param outputColumn Output column name. If same as input, input column is replaced

```java
public Builder firstDigitTransform(String inputColumn, String outputColumn, FirstDigitTransform.Mode mode)
```

FirstDigitTransform converts a column to a categorical column, with values being the first digit of the number.
For example, “3.1415” becomes “3” and “2.0” becomes “2”.
Negative numbers ignore the sign: “-7.123” becomes “7”.
Note that two `FirstDigitTransform.Mode` values are supported, which determine how non-numerical entries are handled:
EXCEPTION_ON_INVALID: output has 10 category values (“0”, …, “9”), and any non-numerical values result in an exception
INCLUDE_OTHER_CATEGORY: output has 11 category values (“0”, …, “9”, “Other”), and all non-numerical values are mapped to “Other”

FirstDigitTransform is useful (combined with `CategoricalToOneHotTransform` and reductions) to implement Benford’s law.

param inputColumn Input column name

param outputColumn Output column name. If same as input, input column is replaced

param mode See `FirstDigitTransform.Mode`
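The documented examples and the two modes can be checked against a plain-Java sketch. It is not DataVec's implementation. The sketch treats a value as numeric if `Double.parseDouble` accepts it, and then returns the first digit character after any sign. As a result, something like "0.25" maps to "0" here. That handling of leading zeros is an assumption, because the description above does not cover it. `FirstDigitSketch` and its nested `Mode` enum are hypothetical stand-ins for `FirstDigitTransform.Mode`.

```java
// Hypothetical illustration of firstDigitTransform semantics; not DataVec code.
final class FirstDigitSketch {
    enum Mode { EXCEPTION_ON_INVALID, INCLUDE_OTHER_CATEGORY }

    static String firstDigit(String value, Mode mode) {
        String s = value.trim();
        boolean numeric;
        try {
            Double.parseDouble(s);
            numeric = true;
        } catch (NumberFormatException e) {
            numeric = false;
        }
        if (numeric) {
            // The sign is ignored: the first digit character found is the category
            for (char c : s.toCharArray()) {
                if (Character.isDigit(c)) {
                    return String.valueOf(c);
                }
            }
        }
        // Non-numerical input (or a numeric string with no digits, such as "NaN")
        if (mode == Mode.INCLUDE_OTHER_CATEGORY) {
            return "Other";
        }
        throw new IllegalArgumentException("Not a number: " + value);
    }

    public static void main(String[] args) {
        Mode mode = Mode.INCLUDE_OTHER_CATEGORY;
        System.out.println(firstDigit("3.1415", mode)); // prints 3
        System.out.println(firstDigit("-7.123", mode)); // prints 7
        System.out.println(firstDigit("abc", mode));    // prints Other
    }
}
```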

```java
public TransformProcess build()
```

Create the TransformProcess object

CategoricalToIntegerTransform

Created by Alex on 4/03/2016.

```java
public Object map(Object input)
```

Transform an object into another object

param input the record to transform

return the transformed writable

```java
public Object mapSequence(Object sequence)
```

Transform a sequence

param sequence

```java
public String outputColumnName()
```

The output column name after the operation has been applied

return the output column name

```java
public String columnName()
```

The output column names. This will often be the same as the input.

return the output column names

CategoricalToOneHotTransform

Created by Alex on 4/03/2016.

```java
public Object map(Object input)
```

Transform an object into another object

param input the record to transform

return the transformed writable

```java
public Object mapSequence(Object sequence)
```

Transform a sequence

param sequence

```java
public String outputColumnName()
```

The output column name after the operation has been applied

return the output column name

```java
public String columnName()
```

The output column names. This will often be the same as the input.

return the output column names

IntegerToCategoricalTransform

```java
public Object map(Object input)
```

Transform an object into another object

param input the record to transform

return the transformed writable

```java
public Object mapSequence(Object sequence)
```

Transform a sequence

param sequence

PivotTransform

Pivot transform operates on two columns:

a categorical column that operates as a key, and

another column that contains a value.

Essentially, Pivot transform takes key-value pairs and breaks them out into separate columns.

For example, with schema [col0, key, value, col3] and values with key in {a,b,c}, the output schema is [col0, key[a], key[b], key[c], col3], and input (col0Val, b, x, col3Val) gets mapped to (col0Val, 0, x, 0, col3Val).

When expanding columns, a default value is used - for example 0 for numerical columns.

```java
public Schema transform(Schema inputSchema)
```

param keyColumnName Key column to expand

param valueColumnName Name of the column that contains the value
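The pivot example above can be traced with a plain-Java sketch that handles a single record of strings. It is not DataVec's implementation. It assumes the expanded key columns take the key column's position and the value column is dropped, which matches the example schema. `PivotSketch` is hypothetical, and the default value is passed in explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of PivotTransform semantics on one record; not DataVec code.
final class PivotSketch {
    static List<String> pivot(List<String> record, int keyIndex, int valueIndex,
                              List<String> keyStates, String defaultValue) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < record.size(); i++) {
            if (i == valueIndex) {
                continue; // the value column is absorbed into the key[...] columns
            }
            if (i == keyIndex) {
                // One output column per key state: the value goes under the matching key
                for (String state : keyStates) {
                    result.add(state.equals(record.get(keyIndex)) ? record.get(valueIndex) : defaultValue);
                }
            } else {
                result.add(record.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> record = List.of("col0Val", "b", "x", "col3Val");
        System.out.println(pivot(record, 1, 2, List.of("a", "b", "c"), "0"));
        // prints [col0Val, 0, x, 0, col3Val]
    }
}
```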

StringToCategoricalTransform

Convert a String column to a categorical column

```java
public Object map(Object input)
```

Transform an object into another object

param input the record to transform

return the transformed writable

```java
public Object mapSequence(Object sequence)
```

Transform a sequence

param sequence

AddConstantColumnTransform

Add a new column, where the values in that column for all records are identical (according to the specified value)

DuplicateColumnsTransform

Duplicate one or more columns. The duplicated columns are placed immediately after the original columns.

```java
public Schema transform(Schema inputSchema)
```

param columnsToDuplicate List of columns to duplicate

param newColumnNames List of names for the new (duplicate) columns
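To show where the new columns end up, here is a plain-Java sketch that works on column names only. It assumes each duplicate is inserted directly after its own original column. `DuplicateColumnsSketch` is hypothetical and is not DataVec code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of where DuplicateColumnsTransform places new columns; not DataVec code.
final class DuplicateColumnsSketch {
    static List<String> duplicateColumns(List<String> columnNames, List<String> columnsToDuplicate,
                                         List<String> newColumnNames) {
        List<String> result = new ArrayList<>();
        for (String name : columnNames) {
            result.add(name);
            int index = columnsToDuplicate.indexOf(name);
            if (index >= 0) {
                // The duplicate is inserted directly after its original column
                result.add(newColumnNames.get(index));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(duplicateColumns(List.of("a", "b", "c"), List.of("a", "c"), List.of("a2", "c2")));
        // prints [a, a2, b, c, c2]
    }
}
```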

```java
public Object map(Object input)
```

Transform an object into another object

param input the record to transform

return the transformed writable

```java
public Object mapSequence(Object sequence)
```

Transform a sequence

param sequence

```java
public String outputColumnName()
```

The output column name after the operation has been applied

return the output column name

```java
public String columnName()
```

The output column names. This will often be the same as the input.

return the output column names

RemoveAllColumnsExceptForTransform

Transform that removes all columns except for those that are explicitly specified as ones to keep. To specify only the columns to remove, use `RemoveColumnsTransform`.

```java
public Object map(Object input)
```