Filters

Selection of data using conditions.

Using filters

Filters are part of transforms and provide a DSL for keeping only the parts of your dataset that you want. A filter can be a one-liner for a single condition, or it can combine several conditions with complex boolean logic.

You can also write your own filter by implementing the Filter interface, though in most cases you will want to create a custom condition instead.
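As a rough illustration of what a custom filter involves, the sketch below uses a hypothetical, stripped-down `SimpleFilter` interface with only the two removal methods. It is not the DataVec `Filter` interface, which also exposes the methods listed on this page such as `transform`, `outputColumnName` and `columnName`. `Double` is used here as a stand-in for DataVec's `Writable` type, and `MaxValueFilter` is an invented name.

```java
import java.util.List;

// Hypothetical stand-in for a filter contract: returning true means "remove".
interface SimpleFilter {
    boolean removeExample(List<Double> example);
    boolean removeSequence(List<List<Double>> sequence);
}

// Illustrative custom filter: removes an example when one column exceeds a limit.
class MaxValueFilter implements SimpleFilter {
    private final int columnIndex;
    private final double limit;

    MaxValueFilter(int columnIndex, double limit) {
        this.columnIndex = columnIndex;
        this.limit = limit;
    }

    @Override
    public boolean removeExample(List<Double> example) {
        return example.get(columnIndex) > limit;
    }

    // Choice made for this sketch: drop the whole sequence if any time step would be removed.
    @Override
    public boolean removeSequence(List<List<Double>> sequence) {
        for (List<Double> step : sequence) {
            if (removeExample(step)) {
                return true;
            }
        }
        return false;
    }
}
```

With `new MaxValueFilter(1, 100.0)`, an example whose second column is 150.0 would be removed, and so would any sequence containing such a step.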

Available filters

ConditionFilter

  • If the condition is satisfied (returns true): remove the example or sequence

  • If the condition is not satisfied (returns false): keep the example or sequence
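Because "true means remove" is easy to read backwards, here is a minimal plain-Java sketch of that rule. It does not use DataVec: `ConditionFilterSketch` and its `filter` method are hypothetical names, and a `Predicate` stands in for a DataVec condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class ConditionFilterSketch {
    // Keeps only the examples for which the condition returns false.
    static <T> List<T> filter(List<T> examples, Predicate<T> removeIf) {
        List<T> kept = new ArrayList<>();
        for (T example : examples) {
            if (!removeIf.test(example)) {
                kept.add(example);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("USA", "CAN"));
        // The condition is satisfied when the code is NOT in the set,
        // so exactly those examples are removed.
        Predicate<String> notInSet = code -> !allowed.contains(code);
        // prints [USA, CAN]
        System.out.println(filter(Arrays.asList("USA", "FRA", "CAN", "DEU"), notInSet));
    }
}
```

A "not in set" condition therefore keeps the members of the set, since every example that satisfies the condition is dropped.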

removeExample

  • param writables Example

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example

  • return true if example should be removed, false to keep

transform

Get the output schema for this transformation, given an input schema

  • param inputSchema

outputColumnName

The output column name after the operation has been applied

  • return the output column name

columnName

The output column names. This will often be the same as the input.

  • return the output column names

Filter

Filter: a method of removing examples (or sequences) according to some condition

FilterInvalidValues

FilterInvalidValues: a filter operation that removes any example (or sequence) that contains invalid values in any of a specified set of columns. Invalid values are determined with respect to the schema.
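The idea of checking only a chosen set of columns against schema rules can be sketched in plain Java as follows. This is not the DataVec implementation. The class name, the map of per-column validity rules (a stand-in for a schema) and the `isInteger` helper are all hypothetical, and the sketch assumes every checked column has a rule.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class InvalidValueFilterSketch {
    // Hypothetical "schema": one validity rule per column name.
    private final Map<String, Predicate<String>> validators;
    // Only these columns are checked; other columns are ignored.
    private final List<String> columnsToCheck;

    InvalidValueFilterSketch(Map<String, Predicate<String>> validators, List<String> columnsToCheck) {
        this.validators = validators;
        this.columnsToCheck = columnsToCheck;
    }

    // Returns true (remove) if any checked column holds a value its rule rejects.
    boolean removeExample(Map<String, String> example) {
        for (String column : columnsToCheck) {
            if (!validators.get(column).test(example.get(column))) {
                return true;
            }
        }
        return false;
    }

    // Returns true (remove) if any time step of the sequence would be removed.
    boolean removeSequence(List<Map<String, String>> sequence) {
        for (Map<String, String> step : sequence) {
            if (removeExample(step)) {
                return true;
            }
        }
        return false;
    }

    // Example rule: the value must parse as an integer (null is rejected too).
    static boolean isInteger(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

An example with an invalid value in a column that is not in the checked set is kept, which mirrors the "specified set of columns" wording above.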

transform

  • param columnsToFilterIfInvalid Columns to check for invalid values

removeExample

  • param writables Example

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example

  • return true if example should be removed, false to keep

outputColumnName

The output column name after the operation has been applied

  • return the output column name

columnName

The output column names. This will often be the same as the input.

  • return the output column names

InvalidNumColumns

Remove invalid records of a certain size.
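One plausible reading of "a certain size" is that a record is invalid when its number of columns differs from the number the schema expects. The plain-Java sketch below illustrates that reading only. It is an assumption rather than a statement of how DataVec implements this filter, and `InvalidNumColumnsSketch` is a hypothetical name.

```java
import java.util.List;

class InvalidNumColumnsSketch {
    // Number of columns a valid record is assumed to have.
    private final int expectedColumns;

    InvalidNumColumnsSketch(int expectedColumns) {
        this.expectedColumns = expectedColumns;
    }

    // Returns true (remove) when the record has the wrong number of values.
    boolean removeExample(List<?> example) {
        return example.size() != expectedColumns;
    }

    // Returns true (remove) when any time step has the wrong number of values.
    boolean removeSequence(List<? extends List<?>> sequence) {
        for (List<?> step : sequence) {
            if (removeExample(step)) {
                return true;
            }
        }
        return false;
    }
}
```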

removeExample

  • param writables Example

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example

  • return true if example should be removed, false to keep

removeExample

  • param writables Example

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example

  • return true if example should be removed, false to keep

transform

Get the output schema for this transformation, given an input schema

  • param inputSchema

outputColumnName

The output column name after the operation has been applied

  • return the output column name

columnName

The output column names. This will often be the same as the input.

  • return the output column names

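// Keep only examples whose MerchantCountryCode is "USA" or "CAN":
// ConditionFilter removes every example for which the condition (NotInSet) is true.
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    .filter(new ConditionFilter(new CategoricalColumnCondition(
        "MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA", "CAN")))))
    .build();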

ConditionFilter method signatures:

public boolean removeExample(Object writables)
public boolean removeSequence(Object sequence)
public Schema transform(Schema inputSchema)
public String outputColumnName()
public String columnName()

FilterInvalidValues method signatures:

public Schema transform(Schema inputSchema)
public boolean removeExample(Object writables)
public boolean removeSequence(Object sequence)
public String outputColumnName()
public String columnName()

InvalidNumColumns method signatures:

public boolean removeExample(Object writables)
public boolean removeSequence(Object sequence)
public boolean removeExample(List<Writable> writables)
public boolean removeSequence(List<List<Writable>> sequence)
public Schema transform(Schema inputSchema)
public String outputColumnName()
public String columnName()