# Loss Functions

Loss functions measure the discrepancy between a network's predictions and the true labels. During training, the optimizer minimizes the loss to improve predictions. All loss functions in the DL4J ecosystem implement the `ILossFunction` interface at `org.nd4j.linalg.lossfunctions.ILossFunction`.

## Usage

### In Output Layers (Preferred)

Pass an `ILossFunction` instance to the output layer builder:

```java
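import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.impl.LossMCXENT;

new OutputLayer.Builder(new LossMCXENT())
    .nIn(256).nOut(10)
    .activation(Activation.SOFTMAX)
    .build()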
```

### Using the LossFunction Enum (Legacy)

The convenience enum still works, but passing an `ILossFunction` instance directly is preferred in M2.1:

```java
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;

new OutputLayer.Builder(LossFunction.MSE)
    .nIn(256).nOut(1)
    .activation(Activation.IDENTITY)
    .build()
```

### In SameDiff

Use the `sd.loss` namespace:

```java
SDVariable loss = sd.loss.softmaxCrossEntropy("loss", labels, logits, null);
```

## Classification Loss Functions

### LossMCXENT — Multi-Class Cross Entropy

```java
new LossMCXENT()
new LossMCXENT(weights)  // class weights as INDArray
```

The standard loss for **multi-class classification** tasks. Measures the cross entropy between the true distribution (one-hot labels) and the predicted distribution (softmax outputs).

**Formula:** `L = -sum(y_true * log(y_pred))`

**Pair with:** `Activation.SOFTMAX`

**Class weighting:** To handle class imbalance, pass an `INDArray` row vector with one weight per class (shape `[1, numClasses]`). Each class's loss term is multiplied by its weight:

```java
INDArray classWeights = Nd4j.create(new double[]{1.0, 2.5, 1.0, 3.0});
new OutputLayer.Builder(new LossMCXENT(classWeights))
    .activation(Activation.SOFTMAX)
    .nIn(128).nOut(4)
    .build()
```
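The formula and the class weighting can be checked by hand with a few lines of plain Java. This is only an illustrative sketch on `double[]` arrays for a single example, not the ND4J implementation; the class and method names (`WeightedCrossEntropySketch`, `softmax`, `crossEntropy`) are hypothetical, and it assumes each weight simply multiplies the loss term of its class.

```java
import java.util.Arrays;

final class WeightedCrossEntropySketch {

    /** Numerically stable softmax over one example's logits. */
    static double[] softmax(double[] logits) {
        double max = Arrays.stream(logits).max().getAsDouble();
        double[] out = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    /** L = -sum_i w_i * y_i * log(p_i); pass null weights for the unweighted loss. */
    static double crossEntropy(double[] yTrue, double[] yPred, double[] weights) {
        double loss = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yTrue[i] == 0.0) {
                continue; // zero-label terms contribute nothing
            }
            double w = (weights == null) ? 1.0 : weights[i];
            loss -= w * yTrue[i] * Math.log(yPred[i]);
        }
        return loss;
    }

    public static void main(String[] args) {
        double[] label = {0, 1, 0, 0};                         // one-hot, true class = 1
        double[] probs = softmax(new double[]{0.2, 1.5, -0.3, 0.1});
        double[] classWeights = {1.0, 2.5, 1.0, 3.0};
        System.out.println("unweighted: " + crossEntropy(label, probs, null));
        System.out.println("weighted:   " + crossEntropy(label, probs, classWeights));
    }
}
```

With one-hot labels only the true class contributes, so under this assumption the weighted loss is the unweighted loss scaled by that class's weight.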

### LossSparseMCXENT — Sparse Multi-Class Cross Entropy

```java
new LossSparseMCXENT()
```

Same as `LossMCXENT` but accepts **integer labels** instead of one-hot encoded labels. Labels should be a column vector of class indices (shape `[batchSize, 1]`).

**Pair with:** `Activation.SOFTMAX`
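To see why integer labels carry the same information as one-hot labels, the following plain-Java sketch compares the two for a single example. It is not the ND4J code; `SparseLabelSketch` and its methods are hypothetical names used only for illustration.

```java
final class SparseLabelSketch {

    /** Expands a class index into a one-hot vector. */
    static double[] oneHot(int classIndex, int numClasses) {
        double[] v = new double[numClasses];
        v[classIndex] = 1.0;
        return v;
    }

    /** Cross entropy against a one-hot label vector. */
    static double denseCrossEntropy(double[] oneHotLabel, double[] probs) {
        double loss = 0.0;
        for (int i = 0; i < probs.length; i++) {
            if (oneHotLabel[i] != 0.0) {
                loss -= oneHotLabel[i] * Math.log(probs[i]);
            }
        }
        return loss;
    }

    /** Cross entropy against an integer label: only the true class's probability matters. */
    static double sparseCrossEntropy(int classIndex, double[] probs) {
        return -Math.log(probs[classIndex]);
    }

    public static void main(String[] args) {
        double[] probs = {0.1, 0.7, 0.2};
        int label = 1;
        System.out.println("sparse: " + sparseCrossEntropy(label, probs));
        System.out.println("dense:  " + denseCrossEntropy(oneHot(label, probs.length), probs));
    }
}
```

The sparse form avoids materializing a `[batchSize, numClasses]` label array, which matters mainly when the number of classes is large.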

### LossNegativeLogLikelihood — Negative Log Likelihood

```java
new LossNegativeLogLikelihood()
new LossNegativeLogLikelihood(weights)
```

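Mathematically equivalent to `LossMCXENT` when paired with a softmax output. In DL4J it is a thin subclass of `LossMCXENT`, so both produce the same score and gradient; the separate class exists mainly for naming convenience.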

**Pair with:** `Activation.SOFTMAX`

### LossBinaryXENT — Binary Cross Entropy

```java
new LossBinaryXENT()
new LossBinaryXENT(weights)
```

For **binary classification** or **multi-label classification** where each output is an independent binary decision.

**Formula:** `L = -[y * log(p) + (1-y) * log(1-p)]`

**Pair with:** `Activation.SIGMOID`

```java
// Binary classification (single output)
new OutputLayer.Builder(new LossBinaryXENT())
    .nIn(128).nOut(1)
    .activation(Activation.SIGMOID)
    .build()

// Multi-label classification (multiple independent outputs)
new OutputLayer.Builder(new LossBinaryXENT())
    .nIn(128).nOut(5)    // 5 independent binary labels
    .activation(Activation.SIGMOID)
    .build()
```
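The binary cross-entropy formula can be worked through in plain Java. The sketch below is illustrative only and is not the ND4J implementation: `BinaryCrossEntropySketch` is a hypothetical name, and the clipping constant `EPS` is an assumption added here to keep `log` away from 0.

```java
final class BinaryCrossEntropySketch {

    // Assumed clipping constant (not taken from the library) so log() never sees 0.
    static final double EPS = 1e-7;

    /** L = -[y * log(p) + (1 - y) * log(1 - p)] for one output. */
    static double bce(double y, double p) {
        double q = Math.min(Math.max(p, EPS), 1.0 - EPS);
        return -(y * Math.log(q) + (1.0 - y) * Math.log(1.0 - q));
    }

    /** Mean over independent binary outputs, as in the multi-label case. */
    static double meanBce(double[] labels, double[] probs) {
        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            sum += bce(labels[i], probs[i]);
        }
        return sum / labels.length;
    }

    public static void main(String[] args) {
        System.out.println("confident and right: " + bce(1.0, 0.9));
        System.out.println("confident and wrong: " + bce(1.0, 0.1));
        System.out.println("multi-label mean:    "
                + meanBce(new double[]{1, 0, 1, 0, 0}, new double[]{0.8, 0.2, 0.6, 0.4, 0.1}));
    }
}
```

Each output is scored independently, which is why the same loss serves both the single-output binary case and the multi-label case.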

### LossHinge — Hinge Loss

```java
new LossHinge()
```

SVM-style loss for classification. Labels should be -1 or +1.

**Formula:** `L = max(0, 1 - y_true * y_pred)`

**Pair with:** `Activation.TANH` (output range -1 to 1)

### LossSquaredHinge — Squared Hinge Loss

```java
new LossSquaredHinge()
```

Smooth variant of hinge loss that penalizes margin violations quadratically.

**Formula:** `L = max(0, 1 - y_true * y_pred)^2`
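A small plain-Java sketch of the hinge and squared hinge formulas for a single output, with labels in {-1, +1}. The names (`HingeSketch`, `hinge`, `squaredHinge`) are hypothetical and the code is not the ND4J implementation.

```java
final class HingeSketch {

    /** L = max(0, 1 - y * score), with y in {-1, +1}. */
    static double hinge(double y, double score) {
        return Math.max(0.0, 1.0 - y * score);
    }

    /** Squared hinge: the same margin violation, squared. */
    static double squaredHinge(double y, double score) {
        double h = hinge(y, score);
        return h * h;
    }

    public static void main(String[] args) {
        // Correct side, outside the margin: no penalty.
        System.out.println(hinge(1.0, 2.0));
        // Correct side, inside the margin: small penalty.
        System.out.println(hinge(1.0, 0.5));
        // Wrong side: penalty grows linearly (hinge) or quadratically (squared hinge).
        System.out.println(hinge(-1.0, 0.5) + " vs " + squaredHinge(-1.0, 0.5));
    }
}
```

Both losses are zero once the prediction is on the correct side with a margin of at least 1; they differ only in how fast the penalty grows for violations.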

### LossFMeasure — F-Measure Loss

```java
new LossFMeasure()
new LossFMeasure(beta)
```

Directly optimizes the F-measure (F1 score by default). For binary classification only.

* `beta = 1.0` (default): F1 score (equal weight to precision and recall)
* `beta < 1.0`: Favors precision
* `beta > 1.0`: Favors recall
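The effect of `beta` is easiest to see on concrete counts. The sketch below computes the F-beta score from true positives, false positives, and false negatives in plain Java; `FBetaSketch` is a hypothetical name, and the code only illustrates the metric, not how `LossFMeasure` turns it into a differentiable loss.

```java
final class FBetaSketch {

    /**
     * F_beta = (1 + beta^2) * tp / ((1 + beta^2) * tp + beta^2 * fn + fp).
     * Counts are doubles so that "soft" (probabilistic) counts also work.
     */
    static double fBeta(double tp, double fp, double fn, double beta) {
        double b2 = beta * beta;
        double numerator = (1.0 + b2) * tp;
        return numerator / (numerator + b2 * fn + fp);
    }

    public static void main(String[] args) {
        // 8 true positives, 2 false positives, 4 false negatives:
        // precision = 8/10, recall = 8/12, so precision is the stronger of the two.
        double tp = 8, fp = 2, fn = 4;
        System.out.println("F0.5 = " + fBeta(tp, fp, fn, 0.5));
        System.out.println("F1   = " + fBeta(tp, fp, fn, 1.0));
        System.out.println("F2   = " + fBeta(tp, fp, fn, 2.0));
    }
}
```

Because precision exceeds recall in this example, the score rises as `beta` drops below 1 (weighting precision) and falls as `beta` rises above 1 (weighting recall).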

### LossMultiLabel — Multi-Label Loss

```java
new LossMultiLabel()
```

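Specialized ranking loss for multi-label tasks. Labels are binary (multi-hot) vectors, and the loss penalizes predictions that score a label that does not apply above one that does.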

## Regression Loss Functions

### LossMSE — Mean Squared Error

```java
new LossMSE()
```

Standard regression loss. Heavily penalizes large errors.

**Formula:** `L = mean((y_true - y_pred)^2)`

**Pair with:** `Activation.IDENTITY`

```java
new OutputLayer.Builder(new LossMSE())
    .nIn(128).nOut(1)
    .activation(Activation.IDENTITY)
    .build()
```

### LossMAE — Mean Absolute Error

```java
new LossMAE()
```

More robust to outliers than MSE.

**Formula:** `L = mean(|y_true - y_pred|)`

**Pair with:** `Activation.IDENTITY`
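The difference in outlier sensitivity between MSE and MAE shows up clearly on a tiny example. This plain-Java sketch (hypothetical class `MseVsMaeSketch`, not ND4J code) evaluates both formulas on four predictions, one of which is off by 10.

```java
final class MseVsMaeSketch {

    /** L = mean((y_true - y_pred)^2) */
    static double mse(double[] yTrue, double[] yPred) {
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            double d = yTrue[i] - yPred[i];
            sum += d * d;
        }
        return sum / yTrue.length;
    }

    /** L = mean(|y_true - y_pred|) */
    static double mae(double[] yTrue, double[] yPred) {
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            sum += Math.abs(yTrue[i] - yPred[i]);
        }
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        double[] yTrue = {1, 2, 3, 4};
        double[] yPred = {1, 2, 3, 14}; // three exact predictions, one outlier
        System.out.println("MSE = " + mse(yTrue, yPred));
        System.out.println("MAE = " + mae(yTrue, yPred));
    }
}
```

The single bad prediction contributes its squared error to MSE but only its absolute error to MAE, so it dominates the former far more than the latter.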

### LossL1 — L1 Loss

```java
new LossL1()
```

Sum of absolute differences (not averaged). Like `LossMAE`, it grows linearly with the error, so it is less sensitive to outliers than the squared losses.

**Formula:** `L = sum(|y_true - y_pred|)`

### LossL2 — L2 Loss

```java
new LossL2()
```

Sum of squared differences (not averaged).

**Formula:** `L = sum((y_true - y_pred)^2)`

### LossMSLE — Mean Squared Logarithmic Error

```java
new LossMSLE()
```

Useful when target values span several orders of magnitude.

**Formula:** `L = mean((log(y_true + 1) - log(y_pred + 1))^2)`

**Pair with:** `Activation.IDENTITY` (predictions should be non-negative)
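The following plain-Java sketch evaluates the MSLE formula and contrasts it with the squared error on targets of different magnitude. `MsleSketch` is a hypothetical name and the code is illustrative rather than the ND4J implementation; it uses `Math.log1p(x)` for `log(x + 1)`.

```java
final class MsleSketch {

    /** L = mean((log(y_true + 1) - log(y_pred + 1))^2) */
    static double msle(double[] yTrue, double[] yPred) {
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            double d = Math.log1p(yTrue[i]) - Math.log1p(yPred[i]);
            sum += d * d;
        }
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        // Both predictions are off by a factor of two, at very different scales.
        double small = msle(new double[]{10}, new double[]{20});
        double large = msle(new double[]{1000}, new double[]{2000});
        System.out.println("MSLE, 10 vs 20:     " + small);
        System.out.println("MSLE, 1000 vs 2000: " + large);
        // The squared errors for the same pairs differ by four orders of magnitude.
        System.out.println("squared errors:     " + (10.0 * 10.0) + " vs " + (1000.0 * 1000.0));
    }
}
```

Because the error is measured on a log scale, a prediction that is off by a given ratio costs roughly the same regardless of the target's magnitude.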

### LossMAPE — Mean Absolute Percentage Error

```java
new LossMAPE()
```

Percentage-based regression error. It is undefined where `y_true` is zero and unstable where `y_true` is close to zero.

**Formula:** `L = mean(|y_true - y_pred| / |y_true|) * 100`

### LossPoisson — Poisson Loss

```java
new LossPoisson()
```

For count data regression where the target follows a Poisson distribution.

**Formula:** `L = mean(y_pred - y_true * log(y_pred))`
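A quick way to build intuition for the Poisson loss is to check that, for a fixed target, it is smallest when the prediction equals the target. The plain-Java sketch below does that; `PoissonSketch` is a hypothetical name, the code is not the ND4J implementation, and it assumes strictly positive predictions so that `log` is defined.

```java
final class PoissonSketch {

    /** L = mean(y_pred - y_true * log(y_pred)); predictions must be > 0. */
    static double poisson(double[] yTrue, double[] yPred) {
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            sum += yPred[i] - yTrue[i] * Math.log(yPred[i]);
        }
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        double[] target = {3};
        for (double pred : new double[]{2, 3, 4}) {
            System.out.println("pred=" + pred + " loss=" + poisson(target, new double[]{pred}));
        }
    }
}
```

The loss value itself can be negative; only its relative ordering across predictions is meaningful.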

## Distribution and Similarity Loss Functions

### LossKLD — Kullback-Leibler Divergence

```java
new LossKLD()
```

Measures the divergence between two probability distributions. Used in variational autoencoders and distribution matching.

**Formula:** `L = sum(y_true * log(y_true / y_pred))`
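The KL divergence formula can be evaluated directly on two small distributions. The plain-Java sketch below is illustrative only (hypothetical class `KlDivergenceSketch`, not the ND4J code) and skips terms where the true probability is zero, following the usual convention `0 * log(0) = 0`.

```java
final class KlDivergenceSketch {

    /** L = sum(p * log(p / q)), skipping terms with p == 0. */
    static double kld(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.25, 0.75};
        System.out.println("KL(p || p) = " + kld(p, p));
        System.out.println("KL(p || q) = " + kld(p, q));
        System.out.println("KL(q || p) = " + kld(q, p));
    }
}
```

The divergence is zero for identical distributions, positive otherwise, and not symmetric, so the order of labels and predictions matters.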

### LossCosineProximity — Cosine Proximity Loss

```java
new LossCosineProximity()
```

Computes the negative cosine similarity between predictions and labels, so minimizing the loss aligns their directions. Useful for similarity learning tasks where direction matters more than magnitude.

**Formula:** `L = -sum(y_true * y_pred) / (||y_true|| * ||y_pred||)`
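The formula can be tried on a few vectors in plain Java. `CosineProximitySketch` is a hypothetical name, and the sketch is not the ND4J implementation; it assumes neither vector is all zeros.

```java
final class CosineProximitySketch {

    /** L = -sum(a * b) / (||a|| * ||b||); both vectors must be non-zero. */
    static double cosineProximity(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return -dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println("same direction: " + cosineProximity(new double[]{1, 2}, new double[]{2, 4}));
        System.out.println("orthogonal:     " + cosineProximity(new double[]{1, 0}, new double[]{0, 1}));
        System.out.println("opposite:       " + cosineProximity(new double[]{1, 2}, new double[]{-1, -2}));
    }
}
```

Scaling either vector leaves the loss unchanged, which is the sense in which only direction matters.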

### LossWasserstein — Wasserstein Loss

```java
new LossWasserstein()
```

Earth Mover's Distance loss. Commonly used in WGAN (Wasserstein GAN) training.

**Formula:** `L = mean(y_true * y_pred)`

## Specialized Loss Functions

### LossMixtureDensity — Mixture Density Network Loss

```java
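LossMixtureDensity.builder()
    .gaussians(numMixtures)   // number of mixture components
    .labelWidth(labelWidth)   // dimensionality of each label vector
    .build()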
```

For mixture density networks that output parameters of a Gaussian mixture model (means, variances, mixing coefficients).

## Quick Reference

| Task                         | Loss Function         | Activation | Labels               |
| ---------------------------- | --------------------- | ---------- | -------------------- |
| Multi-class classification   | `LossMCXENT`          | `SOFTMAX`  | One-hot              |
| Multi-class (integer labels) | `LossSparseMCXENT`    | `SOFTMAX`  | Integer indices      |
| Binary classification        | `LossBinaryXENT`      | `SIGMOID`  | 0/1                  |
| Multi-label classification   | `LossBinaryXENT`      | `SIGMOID`  | Binary vector        |
| Regression                   | `LossMSE`             | `IDENTITY` | Continuous           |
| Robust regression            | `LossMAE`             | `IDENTITY` | Continuous           |
| Log-scale regression         | `LossMSLE`            | `IDENTITY` | Positive continuous  |
| Count data                   | `LossPoisson`         | `IDENTITY` | Non-negative integer |
| Distribution matching        | `LossKLD`             | `SOFTMAX`  | Probabilities        |
| Similarity learning          | `LossCosineProximity` | varies     | Normalized vectors   |
| SVM-style                    | `LossHinge`           | `TANH`     | -1/+1                |
| GAN (Wasserstein)            | `LossWasserstein`     | `IDENTITY` | -1/+1                |
| Optimize F1 directly         | `LossFMeasure`        | `SIGMOID`  | 0/1                  |

## Custom Loss Functions

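To define your own loss, implement `ILossFunction`. The example below implements a Huber loss, which is quadratic for small errors and linear for large ones: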

```java
import org.nd4j.common.primitives.Pair;
import org.nd4j.linalg.activations.IActivation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.lossfunctions.ILossFunction;
import org.nd4j.linalg.ops.transforms.Transforms;

public class HuberLoss implements ILossFunction {

    private final double delta;

    public HuberLoss(double delta) {
        this.delta = delta;
    }

    @Override
    public double computeScore(INDArray labels, INDArray preOutput,
                               IActivation activationFn, INDArray mask, boolean average) {
        INDArray output = activationFn.getActivation(preOutput.dup(), true);
        INDArray diff = labels.sub(output);
        INDArray absDiff = Transforms.abs(diff);

        // Huber: quadratic for |diff| <= delta, linear for |diff| > delta
        INDArray quadratic = Transforms.min(absDiff, delta);
        INDArray linear = absDiff.sub(quadratic);
        INDArray loss = quadratic.mul(quadratic).mul(0.5).add(linear.mul(delta));

        double score = loss.sumNumber().doubleValue();
        return average ? score / labels.size(0) : score;
    }

    @Override
    public INDArray computeScoreArray(INDArray labels, INDArray preOutput,
                                      IActivation activationFn, INDArray mask) {
        // Per-example loss as a column vector of shape [batchSize, 1].
        // The mask is ignored in this simplified example.
        INDArray output = activationFn.getActivation(preOutput.dup(), true);
        INDArray diff = labels.sub(output);
        INDArray absDiff = Transforms.abs(diff);
        INDArray quadratic = Transforms.min(absDiff, delta);
        INDArray linear = absDiff.sub(quadratic);
        return quadratic.mul(quadratic).mul(0.5).add(linear.mul(delta)).sum(true, 1);
    }

    @Override
    public INDArray computeGradient(INDArray labels, INDArray preOutput,
                                    IActivation activationFn, INDArray mask) {
        INDArray output = activationFn.getActivation(preOutput.dup(), true);
        INDArray diff = output.sub(labels);
        // dL/dOutput: the error clipped to [-delta, delta]
        INDArray dLda = Transforms.min(Transforms.max(diff, -delta), delta);

        // Chain rule: backpropagate through the activation function.
        // The gradient is not divided by the batch size here, matching the
        // built-in losses; DL4J averages over the minibatch in the update step.
        Pair<INDArray, INDArray> p = activationFn.backprop(preOutput, dLda);
        return p.getFirst();
    }

    @Override
    public Pair<Double, INDArray> computeGradientAndScore(
            INDArray labels, INDArray preOutput, IActivation activationFn,
            INDArray mask, boolean average) {
        return new Pair<>(
            computeScore(labels, preOutput, activationFn, mask, average),
            computeGradient(labels, preOutput, activationFn, mask)
        );
    }

    @Override
    public String name() { return "HuberLoss(delta=" + delta + ")"; }
}
```

Use it like any built-in loss:

```java
new OutputLayer.Builder(new HuberLoss(1.0))
    .nIn(128).nOut(1)
    .activation(Activation.IDENTITY)
    .build()
```
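The `min`-based arithmetic in the `HuberLoss` example is a compact way of writing the usual piecewise Huber definition. The plain-Java sketch below checks that equivalence on scalars and shows the clipped gradient; `HuberSketch` and its methods are hypothetical names, and `diff` stands for prediction minus label.

```java
final class HuberSketch {

    /** Textbook definition: 0.5*d^2 if |d| <= delta, else delta*(|d| - 0.5*delta). */
    static double piecewise(double diff, double delta) {
        double a = Math.abs(diff);
        return a <= delta ? 0.5 * a * a : delta * (a - 0.5 * delta);
    }

    /** Branch-free form, as used with Transforms.min in the example above. */
    static double minBased(double diff, double delta) {
        double a = Math.abs(diff);
        double quadratic = Math.min(a, delta); // part of the error handled quadratically
        double linear = a - quadratic;         // remainder, handled linearly
        return 0.5 * quadratic * quadratic + delta * linear;
    }

    /** Derivative with respect to diff: the error clipped to [-delta, delta]. */
    static double gradient(double diff, double delta) {
        return Math.min(Math.max(diff, -delta), delta);
    }

    public static void main(String[] args) {
        double delta = 1.0;
        for (double d : new double[]{-3.0, -0.5, 0.0, 0.5, 3.0}) {
            System.out.println("diff=" + d
                    + " piecewise=" + piecewise(d, delta)
                    + " minBased=" + minBased(d, delta)
                    + " grad=" + gradient(d, delta));
        }
    }
}
```

The branch-free form is convenient for array libraries because it needs only element-wise `min`, subtraction, and multiplication rather than a per-element conditional.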

