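A transform process needs a schema to transform data successfully. Both the Schema and TransformProcess classes come with a helper Builder class, which is useful for organizing code and avoiding complex constructors. Combined, they look like the sample code below. Note how inputDataSchema is passed to the Builder constructor: without it, your transform process will not compile.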
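import java.util.Arrays;
import java.util.HashSet;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.condition.ConditionOp;
import org.datavec.api.transform.condition.column.CategoricalColumnCondition;
import org.datavec.api.transform.condition.column.DoubleColumnCondition;
import org.datavec.api.transform.filter.ConditionFilter;
import org.datavec.api.transform.transform.time.DeriveColumnsFromTimeTransform;
import org.datavec.api.writable.DoubleWritable;
import org.joda.time.DateTimeFieldType;
import org.joda.time.DateTimeZone;

TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    .removeColumns("CustomerID", "MerchantID")
    // ConditionFilter removes examples for which the condition holds: here, any country other than USA or CAN
    .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA", "CAN")))))
    .conditionalReplaceValueTransform(
        "TransactionAmountUSD",     // column to operate on
        new DoubleWritable(0.0),    // new value to use if the condition is satisfied
        new DoubleColumnCondition("TransactionAmountUSD", ConditionOp.LessThan, 0.0)) // condition: amount < 0.0
    .stringToTimeTransform("DateTimeString", "yyyy-MM-dd HH:mm:ss.SSS", DateTimeZone.UTC) // Joda pattern: dd = day of month (DD would be day of year)
    .renameColumn("DateTimeString", "DateTime")
    .transform(new DeriveColumnsFromTimeTransform.Builder("DateTime").addIntegerDerivedColumn("HourOfDay", DateTimeFieldType.hourOfDay()).build())
    .removeColumns("DateTime")
    .build();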
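The following plain-Java sketch makes the per-record effect of this TransformProcess concrete. It does not use DataVec and applies roughly the same steps to a single record. The record is held as a map from column name to string value. The class name `RecordPipelineSketch`, the map representation and the sample values are hypothetical. The sketch also uses `java.time` in place of Joda-Time, so treat it as an approximation of the DataVec behavior rather than a reproduction of it.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class RecordPipelineSketch {
    // Hypothetical plain-Java mirror of the TransformProcess above; not DataVec code.
    private static final Set<String> ALLOWED_COUNTRIES = Set.of("USA", "CAN");
    // java.time pattern: yyyy = year, dd = day of month
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Returns the transformed record, or Optional.empty() if the filter removes it.
    public static Optional<Map<String, Object>> apply(Map<String, String> in) {
        // Filter step: drop records whose country is missing or not USA/CAN
        String country = in.get("MerchantCountryCode");
        if (country == null || !ALLOWED_COUNTRIES.contains(country)) {
            return Optional.empty();
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            String col = e.getKey();
            if (col.equals("CustomerID") || col.equals("MerchantID")) {
                continue; // removeColumns("CustomerID", "MerchantID")
            }
            if (col.equals("TransactionAmountUSD")) {
                // Conditional replace: negative amounts become 0.0
                double amount = Double.parseDouble(e.getValue());
                out.put(col, amount < 0.0 ? 0.0 : amount);
            } else if (col.equals("DateTimeString")) {
                // Parse the timestamp and keep only the derived hour of day,
                // since the DateTime column itself is removed in the last step
                LocalDateTime t = LocalDateTime.parse(e.getValue(), FORMAT);
                out.put("HourOfDay", t.getHour());
            } else {
                out.put(col, e.getValue());
            }
        }
        return Optional.of(out);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("CustomerID", "c-001");
        row.put("MerchantID", "m-042");
        row.put("MerchantCountryCode", "USA");
        row.put("TransactionAmountUSD", "-12.5");
        row.put("DateTimeString", "2017-03-14 09:26:53.589");
        // prints {MerchantCountryCode=USA, TransactionAmountUSD=0.0, HourOfDay=9}
        System.out.println(apply(row).orElse(null));
    }
}
```

Returning `Optional.empty()` for filtered records keeps removal explicit. A real TransformProcess, by contrast, simply emits fewer records.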
import java.util.List;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;
List<List<Writable>> processedData = LocalTransformExecutor.execute(originalData, tp);
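`LocalTransformExecutor.execute` runs the process over records held in memory. The sketch below offers a rough, hypothetical mental model of that, not DataVec's implementation, which may differ. Each step is applied in turn to the whole list of records, and filter steps can shrink the list. `LocalExecutorSketch`, `Step`, `map` and `removeIf` are illustrative names only. Rows are plain string lists rather than `Writable`s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class LocalExecutorSketch {
    // Hypothetical step type: takes the whole in-memory dataset and returns a new one.
    public interface Step extends UnaryOperator<List<List<String>>> {}

    // Wraps a per-record transform so it is applied to every row.
    public static Step map(UnaryOperator<List<String>> f) {
        return rows -> rows.stream().map(f).collect(Collectors.toList());
    }

    // Filter step: like DataVec's ConditionFilter, it removes the rows that match.
    public static Step removeIf(Predicate<List<String>> condition) {
        return rows -> rows.stream().filter(condition.negate()).collect(Collectors.toList());
    }

    // Applies the steps in order; each step sees the output of the previous one.
    // The input list itself is never modified.
    public static List<List<String>> execute(List<List<String>> data, List<Step> steps) {
        List<List<String>> current = new ArrayList<>(data);
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Columns: CustomerID, MerchantCountryCode, TransactionAmountUSD
        List<List<String>> data = List.of(
                List.of("c1", "USA", "-3.0"),
                List.of("c2", "MEX", "10.0"),
                List.of("c3", "CAN", "7.5"));
        List<Step> steps = List.of(
                // remove rows whose country is not USA or CAN
                removeIf(r -> !(r.get(1).equals("USA") || r.get(1).equals("CAN"))),
                // replace negative amounts with 0.0
                map(r -> Double.parseDouble(r.get(2)) < 0.0 ? List.of(r.get(0), r.get(1), "0.0") : r));
        // prints [[c1, USA, 0.0], [c3, CAN, 7.5]]
        System.out.println(execute(data, steps));
    }
}
```

Because the whole dataset is kept in memory, an executor like this only suits data that fits on one machine.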
// Now, print the schema after each step:
int numActions = tp.getActionList().size();
for (int i = 0; i < numActions; i++) {
    System.out.println("\n\n==================================================");
    System.out.println("-- Schema after step " + i + " (" + tp.getActionList().get(i) + ") --");
    System.out.println(tp.getSchemaAfterStep(i));
}
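Printing the schema after every step is possible because the process starts from a known input schema. Each step can then derive its output schema from the one before it. The toy version below sketches that idea. Here a schema is just an ordered list of column names, and only two kinds of step are modeled. `SchemaTraceSketch` and its methods are hypothetical and are not part of DataVec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class SchemaTraceSketch {
    // Hypothetical schema step: maps an ordered list of column names to a new one.
    public static UnaryOperator<List<String>> removeColumns(String... names) {
        return cols -> {
            List<String> out = new ArrayList<>(cols);
            out.removeAll(Arrays.asList(names));
            return out;
        };
    }

    public static UnaryOperator<List<String>> renameColumn(String from, String to) {
        return cols -> {
            int i = cols.indexOf(from);
            if (i < 0) {
                throw new IllegalArgumentException("No such column: " + from); // fail fast
            }
            List<String> out = new ArrayList<>(cols);
            out.set(i, to);
            return out;
        };
    }

    // Returns the schema after each step; entry i is the schema after step i.
    public static List<List<String>> trace(List<String> input,
                                           List<UnaryOperator<List<String>>> steps) {
        List<List<String>> history = new ArrayList<>();
        List<String> current = input;
        for (UnaryOperator<List<String>> step : steps) {
            current = step.apply(current);
            history.add(current);
        }
        return history;
    }

    public static void main(String[] args) {
        List<String> input = List.of("CustomerID", "MerchantID", "MerchantCountryCode",
                "TransactionAmountUSD", "DateTimeString");
        List<UnaryOperator<List<String>>> steps = List.of(
                removeColumns("CustomerID", "MerchantID"),
                renameColumn("DateTimeString", "DateTime"));
        List<List<String>> history = trace(input, steps);
        for (int i = 0; i < history.size(); i++) {
            System.out.println("-- Schema after step " + i + " --");
            System.out.println(history.get(i));
        }
        // prints [MerchantCountryCode, TransactionAmountUSD, DateTimeString] after step 0
        // and [MerchantCountryCode, TransactionAmountUSD, DateTime] after step 1
    }
}
```

If the input schema is missing, the first step has nothing to operate on, which is why the schema has to be supplied up front.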