Filters

Selection of data using conditions.

Using filters

Filters are part of transforms and give you a DSL for keeping only parts of your dataset. Filters can be one-liners for single conditions or include complex boolean logic.

You can also write your own filters by implementing the Filter interface, though more often you will want to create a custom condition instead.

Available filters

ConditionFilter

If the condition is satisfied (returns true): remove the example or sequence.
If the condition is not satisfied (returns false): keep the example or sequence.
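The remove-on-true rule is easy to read backwards, so a minimal sketch may help. It uses only the Java standard library: `ConditionFilterSketch` and its `apply` helper are hypothetical stand-ins invented for illustration, not the DataVec classes, and the condition is modelled as a plain `Predicate`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration; not the DataVec ConditionFilter class.
class ConditionFilterSketch {

    // The condition decides removal: true means "remove", false means "keep".
    private final Predicate<List<String>> condition;

    ConditionFilterSketch(Predicate<List<String>> condition) {
        this.condition = condition;
    }

    // Mirrors removeExample: returns true if the example should be removed.
    boolean removeExample(List<String> example) {
        return condition.test(example);
    }

    // Applies the filter to a dataset and returns only the examples that are kept.
    List<List<String>> apply(List<List<String>> examples) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> example : examples) {
            if (!removeExample(example)) {
                kept.add(example);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("USA", "CAN"));
        // The condition is true (remove) when column 0 is NOT in the allowed set.
        ConditionFilterSketch filter =
                new ConditionFilterSketch(ex -> !allowed.contains(ex.get(0)));

        List<List<String>> data = Arrays.asList(
                Arrays.asList("USA", "12.50"),
                Arrays.asList("FRA", "7.00"),
                Arrays.asList("CAN", "3.25"));

        System.out.println(filter.apply(data)); // [[USA, 12.50], [CAN, 3.25]]
    }
}
```

Note that the condition describes what to drop, not what to keep: to keep only USA and CAN rows, the condition has to match everything else.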

removeExample

  • param writables Example

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example

  • return true if sequence should be removed, false to keep

transform

Get the output schema for this transformation, given an input schema

  • param inputSchema

outputColumnName

The output column name after the operation has been applied

  • return the output column name

columnName

The output column names. This will often be the same as the input.

  • return the output column names

Filter

Filter: a method of removing examples (or sequences) according to some condition

FilterInvalidValues

FilterInvalidValues: a filter operation that removes any example (or sequence) that contains an invalid value in any of a specified set of columns. Invalid values are determined with respect to the schema.
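The behaviour described above can be sketched with the standard library alone. Everything here is an assumption made for illustration: `InvalidValueFilterSketch` is a hypothetical class, not the DataVec implementation, and the "schema" is reduced to a map from column name to a validity predicate. A sequence is treated as removable if any of its steps contains an invalid value in a checked column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration; not the DataVec FilterInvalidValues class.
class InvalidValueFilterSketch {

    private final List<String> columnNames;                  // column order of each example
    private final Map<String, Predicate<String>> validators; // per-column validity rule (the "schema")
    private final List<String> columnsToCheck;               // only these columns are validated

    InvalidValueFilterSketch(List<String> columnNames,
                             Map<String, Predicate<String>> validators,
                             List<String> columnsToCheck) {
        this.columnNames = columnNames;
        this.validators = validators;
        this.columnsToCheck = columnsToCheck;
    }

    // Returns true (remove) if any checked column holds an invalid value.
    boolean removeExample(List<String> example) {
        for (String column : columnsToCheck) {
            String value = example.get(columnNames.indexOf(column));
            if (!validators.get(column).test(value)) {
                return true;
            }
        }
        return false;
    }

    // Returns true (remove) if any step of the sequence would be removed.
    boolean removeSequence(List<List<String>> sequence) {
        for (List<String> step : sequence) {
            if (removeExample(step)) {
                return true;
            }
        }
        return false;
    }

    // Example validity rule: the value must parse as an int.
    static boolean isInteger(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Predicate<String>> validators = new LinkedHashMap<>();
        validators.put("age", InvalidValueFilterSketch::isInteger);
        validators.put("country", value -> !value.isEmpty());

        // Only the "age" column is checked; "country" is ignored by this filter.
        InvalidValueFilterSketch filter = new InvalidValueFilterSketch(
                Arrays.asList("age", "country"), validators, Arrays.asList("age"));

        System.out.println(filter.removeExample(Arrays.asList("42", "USA")));  // false
        System.out.println(filter.removeExample(Arrays.asList("abc", "CAN"))); // true
        System.out.println(filter.removeExample(Arrays.asList("7", "")));      // false
    }
}
```

The last call returns false even though the country is empty, because only the listed columns are checked.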

transform

  • param columnsToFilterIfInvalid Columns to check for invalid values

removeExample

  • param writables Example

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example

  • return true if sequence should be removed, false to keep

outputColumnName

The output column name after the operation has been applied

  • return the output column name

columnName

The output column names. This will often be the same as the input.

  • return the output column names

InvalidNumColumns

Remove invalid records of a certain size.
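The one-line description above is terse, so here is a rough sketch of one plausible reading: a record is "invalid" when its number of values differs from the number of columns the schema expects. That reading is an assumption, and `ColumnCountFilterSketch` is a hypothetical class written for illustration, not the DataVec implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not the DataVec InvalidNumColumns class.
class ColumnCountFilterSketch {

    // Number of columns the (assumed) schema defines.
    private final int expectedColumns;

    ColumnCountFilterSketch(int expectedColumns) {
        this.expectedColumns = expectedColumns;
    }

    // Returns true (remove) if the example has the wrong number of values.
    boolean removeExample(List<String> example) {
        return example.size() != expectedColumns;
    }

    // Returns true (remove) if any step of the sequence has the wrong number of values.
    boolean removeSequence(List<List<String>> sequence) {
        for (List<String> step : sequence) {
            if (removeExample(step)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ColumnCountFilterSketch filter = new ColumnCountFilterSketch(3);

        System.out.println(filter.removeExample(Arrays.asList("a", "b", "c"))); // false
        System.out.println(filter.removeExample(Arrays.asList("a", "b")));      // true
    }
}
```

A filter like this is typically useful early in a pipeline, before column-based conditions run on records that may be truncated.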

removeExample

  • param writables Example

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example

  • return true if sequence should be removed, false to keep

removeExample

  • param writables Example, as a List<Writable>

  • return true if example should be removed, false to keep

removeSequence

  • param sequence sequence example, as a List<List<Writable>>

  • return true if sequence should be removed, false to keep

transform

Get the output schema for this transformation, given an input schema

  • param inputSchema

outputColumnName

The output column name after the operation has been applied

  • return the output column name

columnName

The output column names. This will often be the same as the input.

  • return the output column names

// Remove every example whose MerchantCountryCode is not "USA" or "CAN"
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA","CAN")))))
    .build();
ConditionFilter

public boolean removeExample(Object writables)
public boolean removeSequence(Object sequence)
public Schema transform(Schema inputSchema)
public String outputColumnName()
public String columnName()

FilterInvalidValues

public Schema transform(Schema inputSchema)
public boolean removeExample(Object writables)
public boolean removeSequence(Object sequence)
public String outputColumnName()
public String columnName()

InvalidNumColumns

public boolean removeExample(Object writables)
public boolean removeSequence(Object sequence)
public boolean removeExample(List<Writable> writables)
public boolean removeSequence(List<List<Writable>> sequence)
public Schema transform(Schema inputSchema)
public String outputColumnName()
public String columnName()