# Reductions

Reductions aggregate multiple records (or time steps within a sequence) into a single record. They are the primary tool for:

* Collapsing a group of rows that share the same key into one summarized row
* Reducing sequences into a fixed-size feature vector
* Computing geographic midpoints from coordinate strings

DataVec provides two core reducer classes: `Reducer` for numerical and general column data, and `StringReducer` for string-column aggregation.

## Reducer

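`Reducer` (in `org.datavec.api.transform.reduce`) collapses each group of records into a single record. Records are grouped by the key columns, whose values pass through unchanged; every other column is aggregated, using its per-column operation where one is configured and the builder's default `ReduceOp` otherwise.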

### Basic usage

```java
import org.datavec.api.transform.reduce.Reducer;
import org.datavec.api.transform.ReduceOp;

Reducer reducer = new Reducer.Builder(ReduceOp.Mean)
    .keyColumns("CustomerID")          // group by this column
    .meanColumns("TransactionAmount")  // compute mean of this column
    .countColumns("TransactionAmount") // add a count column
    .sumColumns("Refunds")
    .build();
```
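
To make the grouping semantics concrete, the following is a minimal plain-Java sketch of what the configuration above is expected to compute: one output row per `CustomerID`, holding the mean and count of `TransactionAmount` and the sum of `Refunds`. It uses only the Java standard library rather than DataVec, and the `Tx` class, method names, and sample values are hypothetical stand-ins for illustration.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupReduceSketch {
    /** Hypothetical input row: one transaction for one customer. */
    static final class Tx {
        final String customerId;
        final double amount;
        final double refund;

        Tx(String customerId, double amount, double refund) {
            this.customerId = customerId;
            this.amount = amount;
            this.refund = refund;
        }
    }

    /** Groups rows by customer ID and summarizes the amounts (count, sum, mean) per group. */
    static Map<String, DoubleSummaryStatistics> amountStatsByCustomer(List<Tx> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Tx t) -> t.customerId,
                TreeMap::new,
                Collectors.summarizingDouble((Tx t) -> t.amount)));
    }

    /** Groups rows by customer ID and sums the refunds per group. */
    static Map<String, Double> refundSumByCustomer(List<Tx> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Tx t) -> t.customerId,
                TreeMap::new,
                Collectors.summingDouble((Tx t) -> t.refund)));
    }

    public static void main(String[] args) {
        List<Tx> rows = Arrays.asList(
                new Tx("A", 10.0, 1.0),
                new Tx("A", 30.0, 0.0),
                new Tx("B", 5.0, 2.0));
        Map<String, DoubleSummaryStatistics> stats = amountStatsByCustomer(rows);
        Map<String, Double> refunds = refundSumByCustomer(rows);
        // One summarized line per key, mirroring one reduced record per CustomerID
        for (String id : stats.keySet()) {
            System.out.println(id + " mean=" + stats.get(id).getAverage()
                    + " count=" + stats.get(id).getCount()
                    + " refunds=" + refunds.get(id));
        }
    }
}
```

The sketch keeps each aggregate in a separate map for readability; a reducer instead emits all aggregates for a key as columns of a single record.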

### Reduction operations

`ReduceOp` specifies the default operation applied to any column not individually configured.

| `ReduceOp`           | Description                       |
| -------------------- | --------------------------------- |
| `Sum`                | Sum of all values                 |
| `Mean`               | Arithmetic mean                   |
| `Count`              | Number of records                 |
| `CountUnique`        | Number of distinct values         |
| `TakeFirst`          | First value (in encounter order)  |
| `TakeLast`           | Last value (in encounter order)   |
| `Min`                | Minimum value                     |
| `Max`                | Maximum value                     |
| `Range`              | max - min                         |
| `Stdev`              | Sample standard deviation         |
| `Variance`           | Sample variance                   |
| `UncorrectedStdDev`  | Population standard deviation     |
| `PopulationVariance` | Population variance               |
| `Prod`               | Product of all values             |
| `Append`             | Concatenate string values         |
| `Prepend`            | Prepend-concatenate string values |
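
The sample and population variants in the table differ only in the divisor: the sample forms (`Stdev`, `Variance`) divide the sum of squared deviations by n - 1, while the population forms (`UncorrectedStdDev`, `PopulationVariance`) divide by n. The sketch below shows the standard textbook definitions in plain Java; it is not DataVec's implementation, and the class and method names are illustrative.

```java
public class VarianceSketch {
    /** Arithmetic mean of the values. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Sum of squared deviations from the mean. */
    static double sumSquaredDeviations(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss;
    }

    /** Sample variance: divides by n - 1 (Bessel's correction). */
    static double sampleVariance(double[] x) {
        return sumSquaredDeviations(x) / (x.length - 1);
    }

    /** Population variance: divides by n. */
    static double populationVariance(double[] x) {
        return sumSquaredDeviations(x) / x.length;
    }

    public static void main(String[] args) {
        double[] x = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("sample variance     = " + sampleVariance(x));
        System.out.println("sample stdev        = " + Math.sqrt(sampleVariance(x)));
        System.out.println("population variance = " + populationVariance(x));
        System.out.println("population stdev    = " + Math.sqrt(populationVariance(x)));
    }
}
```

For small groups the two can differ noticeably, so it is worth choosing the variant deliberately when groups contain only a handful of records.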

### Column-level configuration

You can override the default operation per column using the builder methods:

```java
Reducer reducer = new Reducer.Builder(ReduceOp.TakeFirst)
    .keyColumns("OrderID")
    .sumColumns("Quantity", "Price")
    .maxColumns("Rating")
    .minColumns("DeliveryDays")
    .countColumns("ItemID")
    .customReduction("Tags", myCustomColumnReduction)
    .setIgnoreInvalid("Price")     // ignore invalid Price values when aggregating
    .build();
```

### Ignoring invalid values

`setIgnoreInvalid(String... columns)` configures the reducer to skip invalid values in the listed columns when computing the aggregate. Invalid is defined relative to the column's `ColumnMetaData`.
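
As a rough illustration of the idea, the sketch below computes a mean while skipping values that fail a validity check. It is plain Java, not DataVec code: the rule used here (a value is valid if it parses as a finite double) is an assumption chosen for the example, whereas in DataVec validity comes from the column's `ColumnMetaData`.

```java
import java.util.Arrays;
import java.util.List;

public class IgnoreInvalidSketch {
    /** Hypothetical validity rule: the raw value must parse as a finite double. */
    static boolean isValid(String raw) {
        if (raw == null) {
            return false;
        }
        try {
            return Double.isFinite(Double.parseDouble(raw.trim()));
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Mean over the valid values only; returns NaN when no value is valid. */
    static double meanIgnoringInvalid(List<String> rawValues) {
        double sum = 0.0;
        int count = 0;
        for (String raw : rawValues) {
            if (!isValid(raw)) {
                continue; // skip the invalid value instead of failing the whole reduction
            }
            sum += Double.parseDouble(raw.trim());
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("10", "abc", "20", "NaN");
        System.out.println(meanIgnoringInvalid(prices));
    }
}
```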

### Custom column reductions

Implement `ColumnReduction` to provide a custom aggregation function for a specific column:

```java
public interface ColumnReduction {
    Writable reduceColumn(List<Writable> columnValues);
    String getColumnOutputName(String columnInputName);
    ColumnMetaData getColumnOutputMetaData(String newColumnName, ColumnMetaData columnInputMeta);
}
```

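Then register it on the builder, before calling `build()`:

```java
Reducer.Builder builder = new Reducer.Builder(ReduceOp.TakeFirst).keyColumns("OrderID");
builder.customReduction("myColumn", new MyColumnReduction());
```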

***

## StringReducer

`StringReducer` is a reducer that operates on string columns. It supports the same grouping concept as `Reducer` but provides string-specific operations: append, prepend, merge, and replace.

[source](https://github.com/eclipse/deeplearning4j/tree/master/datavec/datavec-api/src/main/java/org/datavec/api/transform/stringreduce/StringReducer.java)

```java
import org.datavec.api.transform.StringReduceOp;
import org.datavec.api.transform.stringreduce.StringReducer;

StringReducer reducer = new StringReducer.Builder(StringReduceOp.Merge)
    .keyColumns("ProductID")
    .appendColumns("Tags")       // append all tag values
    .prependColumns("Prefix")    // prepend all prefix values
    .replaceColumn("Status")     // use last value (replace)
    .mergeColumns("Description") // merge all values with separator
    .build();
```

### Builder methods

| Method                                           | Effect                                                      |
| ------------------------------------------------ | ----------------------------------------------------------- |
| `appendColumns(String... cols)`                  | Concatenate all values, appending each new value to the end |
| `prependColumns(String... cols)`                 | Concatenate values, prepending each new value to the start  |
| `mergeColumns(String... cols)`                   | Merge all values using a separator                          |
| `replaceColumn(String... cols)`                  | Replace previous value with each new value (keeps last)     |
| `customReduction(String col, ColumnReduction r)` | Use a custom reduction for the named column                 |
| `setIgnoreInvalid(String... cols)`               | Skip invalid values during reduction                        |
| `outputColumnName(String name)`                  | Set the output column name                                  |

***

## GeographicMidpointReduction

[source](https://github.com/eclipse/deeplearning4j/tree/master/datavec/datavec-api/src/main/java/org/datavec/api/transform/reduce/impl/GeographicMidpointReduction.java)

A specialized reduction that computes the geographic midpoint from a column of latitude/longitude coordinate strings. This is useful when you have a set of GPS pings or location records and want to reduce them to a single representative point.

The algorithm follows the method described at [geomidpoint.com](http://www.geomidpoint.com/methods.html), which converts spherical coordinates to Cartesian, computes the average, and converts back.

```java
import org.datavec.api.transform.reduce.impl.GeographicMidpointReduction;

// Column "Coordinates" contains strings like "lat,long"
GeographicMidpointReduction geoReduction = new GeographicMidpointReduction(",");
```

Constructor parameter: `delim` — the delimiter used to separate latitude and longitude within the string, for example `","` for `"40.7128,-74.0060"`.

The output column will also be a string in `"lat,long"` format representing the computed midpoint.
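
The convert, average, and convert-back steps described above can be sketched in plain Java as follows. This is an unweighted illustration of the method written for this page, not the source of `GeographicMidpointReduction`; the class name, method name, and return type are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class GeoMidpointSketch {
    /**
     * Computes the geographic midpoint of "lat<delim>long" strings (degrees).
     * Returns {latitude, longitude} in degrees.
     */
    static double[] midpoint(List<String> coords, String delim) {
        double x = 0.0, y = 0.0, z = 0.0;
        for (String c : coords) {
            String[] parts = c.split(Pattern.quote(delim));
            double lat = Math.toRadians(Double.parseDouble(parts[0].trim()));
            double lon = Math.toRadians(Double.parseDouble(parts[1].trim()));
            // Spherical -> Cartesian coordinates on the unit sphere
            x += Math.cos(lat) * Math.cos(lon);
            y += Math.cos(lat) * Math.sin(lon);
            z += Math.sin(lat);
        }
        // Average the Cartesian coordinates
        int n = coords.size();
        x /= n;
        y /= n;
        z /= n;
        // Cartesian -> spherical
        double lon = Math.atan2(y, x);
        double lat = Math.atan2(z, Math.sqrt(x * x + y * y));
        return new double[] {Math.toDegrees(lat), Math.toDegrees(lon)};
    }

    public static void main(String[] args) {
        double[] mid = midpoint(Arrays.asList("0,0", "0,90"), ",");
        System.out.println(mid[0] + "," + mid[1]);
    }
}
```

Averaging in Cartesian space avoids the wrap-around problem of averaging raw longitudes, where points on either side of the 180° meridian would otherwise average to a location on the opposite side of the globe.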

***

## Executing reductions

### Locally

```java
import java.util.List;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;

// Execute a reduction inline in a transform process
TransformProcess tp = new TransformProcess.Builder(schema)
    .reduce(reducer)
    .build();

List<List<Writable>> reduced = LocalTransformExecutor.execute(data, tp);
```

### On Spark

```java
import org.apache.spark.api.java.JavaRDD;
import org.datavec.spark.transform.SparkTransformExecutor;

JavaRDD<List<Writable>> reduced = SparkTransformExecutor.execute(inputRdd, tp);
```

***

## Joins (group and combine)

`Join` combines two datasets on a common key. Combined with `Reducer`, joins can produce rich aggregated views over multiple data sources.

```java
import org.datavec.api.transform.join.Join;
import org.datavec.local.transforms.LocalTransformExecutor;

Join join = new Join.Builder(Join.JoinType.Inner)
    .setJoinColumns("CustomerID")
    .setSchemas(leftSchema, rightSchema)
    .build();

List<List<Writable>> joined = LocalTransformExecutor.executeJoin(join, leftData, rightData);
```

Join types:

| Type         | Description                                               |
| ------------ | --------------------------------------------------------- |
| `Inner`      | Only records with matching keys in both datasets          |
| `LeftOuter`  | All left records; matched right records or nulls          |
| `RightOuter` | All right records; matched left records or nulls          |
| `FullOuter`  | All records from both datasets; unmatched sides get nulls |
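
The four join types differ only in which keys survive. The plain-Java sketch below illustrates this on two small key-to-value tables; it does not use DataVec, assumes keys are unique within each table, and uses Java `null` as a stand-in for whatever missing-value marker the library emits. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JoinSketch {
    enum Type { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /**
     * Joins two key -> value tables on the key.
     * Each output row is [key, leftValue, rightValue]; a missing side is null.
     */
    static List<List<String>> join(Type type, Map<String, String> left, Map<String, String> right) {
        // Visit every key from either side, left keys first
        Set<String> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());

        List<List<String>> out = new ArrayList<>();
        for (String k : keys) {
            boolean inLeft = left.containsKey(k);
            boolean inRight = right.containsKey(k);
            boolean keep;
            switch (type) {
                case INNER:       keep = inLeft && inRight; break;
                case LEFT_OUTER:  keep = inLeft;            break;
                case RIGHT_OUTER: keep = inRight;           break;
                default:          keep = true;              break; // FULL_OUTER
            }
            if (keep) {
                out.add(Arrays.asList(k, left.get(k), right.get(k)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> customers = new LinkedHashMap<>();
        customers.put("1", "Ann");
        customers.put("2", "Bob");
        Map<String, String> orders = new LinkedHashMap<>();
        orders.put("2", "Laptop");
        orders.put("3", "Phone");

        for (Type t : Type.values()) {
            System.out.println(t + " -> " + join(t, customers, orders));
        }
    }
}
```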

***

## Sequence reduction

When working with sequence data (time series), reductions can collapse a variable-length sequence into a fixed-size feature vector. The same `Reducer` API applies: each time step is treated as one record, and the reduction yields one summary record per sequence, or one per window when the reduction is windowed.

```java
// Reduce each window of each sequence to one record using the reducer's per-column operations
TransformProcess tp = new TransformProcess.Builder(sequenceSchema)
    .reduceSequenceByWindow(reducer, windowFunction)
    .build();
```

See `ReduceSequenceByWindowTransform` for windowed reductions, which are useful for sliding-window feature engineering on time series.
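
To show the difference between reducing a whole sequence and reducing it window by window, here is a minimal plain-Java sketch. It does not use DataVec: the windows here are fixed-size, non-overlapping runs of time steps, which is an assumption made for simplicity, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowReduceSketch {
    /** Reduces a whole sequence to a fixed-size feature vector: {min, max, mean}. */
    static double[] summarize(double[] sequence) {
        if (sequence.length == 0) {
            throw new IllegalArgumentException("sequence must not be empty");
        }
        double min = sequence[0];
        double max = sequence[0];
        double sum = 0.0;
        for (double v : sequence) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new double[] {min, max, sum / sequence.length};
    }

    /** Splits the sequence into consecutive windows of `size` steps and returns each window's mean. */
    static List<Double> meanByWindow(double[] sequence, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<Double> out = new ArrayList<>();
        for (int start = 0; start < sequence.length; start += size) {
            int end = Math.min(start + size, sequence.length); // last window may be shorter
            double sum = 0.0;
            for (int i = start; i < end; i++) {
                sum += sequence[i];
            }
            out.add(sum / (end - start));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sequence = {1, 2, 3, 4, 5};
        double[] summary = summarize(sequence);
        System.out.println("min=" + summary[0] + " max=" + summary[1] + " mean=" + summary[2]);
        System.out.println("window means: " + meanByWindow(sequence, 2));
    }
}
```

Whole-sequence reduction always yields the same number of features regardless of sequence length, while windowed reduction yields a shorter sequence whose length still depends on the input.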
