Filters
Selection of data using conditions.

Using filters

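Filters are part of transforms and give you a DSL for keeping parts of your dataset. A filter can be a one-liner for a single condition or can combine several conditions with complex boolean logic. Note that a filter removes the examples that match its condition, so the example below drops every record whose MerchantCountryCode is not USA or CAN.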
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA","CAN")))))
    .build();
You can also write your own filters by implementing the Filter interface, though more often you will want to create a custom condition instead.

Available filters

ConditionFilter

If the condition is satisfied (returns true): remove the example or sequence.
If the condition is not satisfied (returns false): keep the example or sequence.
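Because the condition selects what is removed rather than what is kept, it is easy to get the logic backwards. The following is a minimal plain-Java sketch of that rule, not the DataVec API: the class `RemoveWhenTrueSketch` and the method `keepSurvivors` are hypothetical names used only for illustration, and a `java.util.function.Predicate` stands in for a condition such as `ConditionOp.NotInSet`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Plain-Java model of "remove when the condition is true"; not the DataVec API. */
class RemoveWhenTrueSketch {

    /** Returns the values that survive the filter: those for which the condition is false. */
    static List<String> keepSurvivors(List<String> countryCodes, Predicate<String> removeIf) {
        List<String> kept = new ArrayList<>();
        for (String code : countryCodes) {
            if (!removeIf.test(code)) {
                kept.add(code);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("USA", "CAN"));
        // Stand-in for a NotInSet condition: true for codes outside the set
        Predicate<String> notInSet = code -> !allowed.contains(code);
        List<String> input = Arrays.asList("USA", "MEX", "CAN", "GBR");
        // MEX and GBR satisfy the condition, so they are removed
        System.out.println(keepSurvivors(input, notInSet)); // prints [USA, CAN]
    }
}
```

To keep the matching records instead, invert the condition (here, `notInSet.negate()`).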
removeExample
public boolean removeExample(Object writables)
    param writables Example
    return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(Object sequence)
    param sequence sequence example
    return true if example should be removed, false to keep
transform
public Schema transform(Schema inputSchema)
Get the output schema for this transformation, given an input schema
    param inputSchema
outputColumnName
public String outputColumnName()
The output column name after the operation has been applied
    return the output column name
columnName
public String columnName()
The output column names. This will often be the same as the input.
    return the output column names

Filter

Filter: a method of removing examples (or sequences) according to some condition

FilterInvalidValues

FilterInvalidValues: a filter operation that removes any example (or sequence) that contains invalid values in any of a specified set of columns. Invalid values are determined with respect to the schema.
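The rule described above can be sketched in plain Java. This is an illustration under stated assumptions, not the DataVec implementation: `InvalidValueFilterSketch` and `isValidInteger` are hypothetical names, "valid" is simplified to "parses as an integer" in place of a real schema check, and a sequence is assumed to be removed as soon as any of its steps is invalid.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of per-column validity filtering; not the DataVec API. */
class InvalidValueFilterSketch {

    /** Hypothetical validity rule: a value is valid if it parses as an integer. */
    static boolean isValidInteger(String value) {
        try {
            Integer.parseInt(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** True if the record should be removed: any checked column holds an invalid value. */
    static boolean removeExample(List<String> record, int... columnsToCheck) {
        for (int col : columnsToCheck) {
            if (!isValidInteger(record.get(col))) {
                return true;
            }
        }
        return false;
    }

    /** Assumed rule: a sequence is removed if any of its steps would be removed. */
    static boolean removeSequence(List<List<String>> sequence, int... columnsToCheck) {
        for (List<String> step : sequence) {
            if (removeExample(step, columnsToCheck)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("42", "free text");
        List<String> bad = Arrays.asList("n/a", "free text");
        System.out.println(removeExample(good, 0));                      // false: column 0 is valid
        System.out.println(removeExample(bad, 0));                       // true: column 0 is invalid
        System.out.println(removeSequence(Arrays.asList(good, bad), 0)); // true: one step is invalid
    }
}
```

Only the listed columns are checked, so the non-numeric text in column 1 does not cause a removal here.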
transform
public Schema transform(Schema inputSchema)
    param columnsToFilterIfInvalid Columns to check for invalid values
removeExample
public boolean removeExample(Object writables)
    param writables Example
    return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(Object sequence)
    param sequence sequence example
    return true if example should be removed, false to keep
outputColumnName
public String outputColumnName()
The output column name after the operation has been applied
    return the output column name
columnName
public String columnName()
The output column names. This will often be the same as the input.
    return the output column names

InvalidNumColumns

Remove invalid records of a certain size.
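One way to read "a certain size" is as a column-count check: a record is invalid if it does not have as many values as the schema has columns. The sketch below models that reading in plain Java. It is an assumption about the behavior, not the DataVec source, and `ColumnCountFilterSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a column-count check; not the DataVec API. */
class ColumnCountFilterSketch {

    /** True if the record should be removed: its size differs from the expected column count. */
    static boolean removeExample(List<String> record, int expectedColumns) {
        return record.size() != expectedColumns;
    }

    /** Assumed rule: remove the sequence if any step has the wrong number of values. */
    static boolean removeSequence(List<List<String>> sequence, int expectedColumns) {
        for (List<String> step : sequence) {
            if (removeExample(step, expectedColumns)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int schemaColumns = 3; // stand-in for the number of columns in the schema
        List<String> complete = Arrays.asList("2024-01-01", "USA", "19.99");
        List<String> truncated = Arrays.asList("2024-01-02", "CAN");
        System.out.println(removeExample(complete, schemaColumns));  // false: 3 values, 3 columns
        System.out.println(removeExample(truncated, schemaColumns)); // true: 2 values, 3 columns
    }
}
```

A check like this is typically useful for dropping malformed rows, such as truncated CSV lines, before any column-level transform runs.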
removeExample
public boolean removeExample(Object writables)
    param writables Example
    return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(Object sequence)
    param sequence sequence example
    return true if example should be removed, false to keep
removeExample
public boolean removeExample(List<Writable> writables)
    param writables Example
    return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(List<List<Writable>> sequence)
    param sequence sequence example
    return true if example should be removed, false to keep
transform
public Schema transform(Schema inputSchema)
Get the output schema for this transformation, given an input schema
    param inputSchema
outputColumnName
public String outputColumnName()
The output column name after the operation has been applied
    return the output column name
columnName
public String columnName()
The output column names. This will often be the same as the input.
    return the output column names