# SameDiff

SameDiff is the automatic differentiation (autograd) framework built into ND4J. It lets you define mathematical computation graphs in Java, execute them against real data, and compute gradients automatically — without writing any backpropagation code by hand.

## What SameDiff Is

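At its core, SameDiff represents a computation as a directed acyclic graph (DAG) of variables and operations:

* **Variables** (`SDVariable` instances) represent arrays of numbers: placeholders for input data, trainable parameters, constants, and intermediate results.
* **Operations** consume one or more input variables and produce one or more output variables. The edges of the graph connect each operation to the variables it reads and writes.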

When you write code like:

```java
SameDiff sd = SameDiff.create();
SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 784);
SDVariable w = sd.var("w", DataType.FLOAT, 784, 10);
SDVariable b = sd.var("b", DataType.FLOAT, 10);
SDVariable logits = x.mmul(w).add(b);
SDVariable output = sd.nn.softmax("output", logits);
```

you are **defining** the graph, not executing it. No numeric computation happens yet. The graph is a blueprint that SameDiff stores internally. Execution happens separately when you call `output()`, `exec()`, or `fit()`.

This approach is called **define-and-run** (as opposed to the eager evaluation model where each line immediately computes a result).
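To make the define-and-run split concrete, here is a minimal sketch that builds a smaller version of the graph above (4 input features, 3 classes) and only then executes it with `sd.output()`. The class name `DefineThenRun` and the sizes are illustrative; parameters are given explicit initial values through the `sd.var(String, INDArray)` overload.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Collections;
import java.util.Map;

public class DefineThenRun {

    public static INDArray run() {
        SameDiff sd = SameDiff.create();

        // Define: each call below only adds a node; no arithmetic happens yet
        SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 4);
        SDVariable w = sd.var("w", Nd4j.rand(DataType.FLOAT, 4, 3));
        SDVariable b = sd.var("b", Nd4j.zeros(DataType.FLOAT, 3));
        SDVariable logits = x.mmul(w).add(b);
        sd.nn.softmax("output", logits);

        // Run: bind a concrete batch of 2 rows to "x" and ask for "output"
        INDArray batch = Nd4j.rand(DataType.FLOAT, 2, 4);
        Map<String, INDArray> result = sd.output(Collections.singletonMap("x", batch), "output");
        return result.get("output");
    }

    public static void main(String[] args) {
        // A 2 x 3 matrix of class probabilities; each row sums to 1
        System.out.println(run());
    }
}
```

Nothing is computed until the `sd.output(...)` line; the earlier calls only record nodes and their connections.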

## Automatic Gradient Computation

The major payoff of building a computation graph is that SameDiff can traverse it in reverse to compute gradients with respect to any variable automatically. When training, SameDiff:

1. Runs the forward pass (evaluates all nodes in topological order).
2. Computes the scalar loss value.
3. Runs the backward pass (applies the chain rule through each op in reverse order).
4. Updates trainable `VARIABLE`-type parameters using the configured optimizer.

You never implement `backward()` methods. The gradients for every built-in operation are pre-registered in the framework.
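As a minimal sketch, assuming the `calculateGradients(Map, String...)` method available in recent SameDiff releases, the following computes the gradient of a scalar loss with respect to a parameter without any hand-written backward code. The loss `mean(x · w)` is chosen so the result can be checked by hand: with `x` set to a 3 x 4 matrix of ones, every entry of the gradient is (1/6) · 3 = 0.5. The class name is illustrative.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Collections;
import java.util.Map;

public class GradientSketch {

    /** Returns dLoss/dw for loss = mean(x . w), evaluated at x = ones(3, 4). */
    public static INDArray gradientOfW() {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 4);
        SDVariable w = sd.var("w", Nd4j.rand(DataType.FLOAT, 4, 2));

        // Scalar loss: the mean over every element of x . w
        x.mmul(w).mean("loss");
        sd.setLossVariables("loss");

        // Forward and backward pass in one call; no backward code written by hand
        Map<String, INDArray> grads = sd.calculateGradients(
                Collections.singletonMap("x", Nd4j.ones(DataType.FLOAT, 3, 4)), "w");
        return grads.get("w");
    }

    public static void main(String[] args) {
        // Analytically: dLoss/dw[j][k] = (1/6) * sum_i x[i][j] = 3/6 = 0.5
        System.out.println(gradientOfW());
    }
}
```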

## Key Classes

| Class              | Role                                                                                                            |
| ------------------ | --------------------------------------------------------------------------------------------------------------- |
| `SameDiff`         | The graph container. Holds all variables, ops, and training configuration. Create one with `SameDiff.create()`. |
| `SDVariable`       | A node in the graph. Wraps an `INDArray` (when values are available) and knows its position in the graph.       |
| `TrainingConfig`   | Bundles the updater (optimizer), regularization, loss variable names, and the mapping from `DataSet` features/labels to placeholder names. |
| `History`          | Returned by `fit()`; records loss and metric values epoch by epoch.                                             |
| `InferenceSession` | Low-level execution engine; usually used indirectly via `sd.output()`.                                          |

## When to Use SameDiff vs MultiLayerNetwork / ComputationGraph

DL4J provides three ways to build neural networks. Choose based on your needs:

### MultiLayerNetwork

Use when your network is a simple sequential stack of layers. It is the easiest API:

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .list()
    .layer(new DenseLayer.Builder().nIn(784).nOut(256).activation(Activation.RELU).build())
    .layer(new OutputLayer.Builder().nIn(256).nOut(10).activation(Activation.SOFTMAX).build())
    .build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
```

**Best for:** standard feedforward networks, CNNs with a single input/output, beginners.

### ComputationGraph

Use when your network has multiple inputs, multiple outputs, skip connections, or branching paths (e.g. encoder-decoder, Siamese networks). Still configuration-driven but more flexible than `MultiLayerNetwork`.

**Best for:** complex topologies that can still be described with DL4J's built-in layer types.
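For comparison, a `ComputationGraph` with a skip connection might be configured as below. This is a sketch: the layer sizes mirror the `MultiLayerNetwork` example, and concatenating the raw input with the hidden activations through a `MergeVertex` is just one illustrative topology that a plain layer stack cannot express. The class name is hypothetical.

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.graph.MergeVertex;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class SkipConnectionGraph {

    public static ComputationGraph build() {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
                .graphBuilder()
                .addInputs("in")
                .addLayer("dense", new DenseLayer.Builder().nIn(784).nOut(256)
                        .activation(Activation.RELU).build(), "in")
                // Skip connection: concatenate dense activations (256) with the raw input (784)
                .addVertex("merge", new MergeVertex(), "dense", "in")
                .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(256 + 784).nOut(10).activation(Activation.SOFTMAX).build(), "merge")
                .setOutputs("out")
                .build();

        ComputationGraph model = new ComputationGraph(conf);
        model.init();
        return model;
    }

    public static void main(String[] args) {
        System.out.println(build().summary());
    }
}
```

The output layer's `nIn` must equal the merged width, 256 + 784 = 1040.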

### SameDiff

Use when you need:

* **Custom operations or loss functions** that have no counterpart in the DL4J layer catalogue.
* **Research and experimentation** where you want full symbolic control over every operation.
* **Fine-grained weight sharing** or unusual parameter tying.
* **Importing and fine-tuning TensorFlow/ONNX models** — the model import pipeline internally produces SameDiff graphs.

SameDiff is more verbose than the DL4J layer APIs but gives you complete flexibility over every computation in your model.
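As an illustration of parameter tying, here is a minimal sketch of a tied-weight autoencoder: the decoder reuses the encoder's weight matrix `w` (transposed via `sd.transpose`) instead of owning a second matrix. The class name, sizes, and sigmoid activation are assumptions made for the example.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.VariableType;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Collections;
import java.util.Map;

public class TiedWeights {

    /** Tied-weight autoencoder: 8 inputs -> 3-unit code -> 8 reconstructed outputs. */
    public static SameDiff build() {
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.placeHolder("in", DataType.FLOAT, -1, 8);
        SDVariable w  = sd.var("w", Nd4j.rand(DataType.FLOAT, 8, 3));

        // Encoder uses w directly
        SDVariable code  = sd.nn.sigmoid("code", in.mmul(w));
        // Decoder reuses the same variable, transposed: one parameter, two uses
        SDVariable recon = code.mmul(sd.transpose(w));

        sd.loss.meanSquaredError("loss", in, recon, null);
        sd.setLossVariables("loss");
        return sd;
    }

    /** Number of trainable (VARIABLE-type) parameters in the graph. */
    public static long trainableCount(SameDiff sd) {
        return sd.variables().stream()
                .filter(v -> v.getVariableType() == VariableType.VARIABLE)
                .count();
    }

    /** Shape of dLoss/dw for a random batch of 4 inputs. */
    public static long[] gradientShapeOfW() {
        SameDiff sd = build();
        Map<String, INDArray> grads = sd.calculateGradients(
                Collections.singletonMap("in", Nd4j.rand(DataType.FLOAT, 4, 8)), "w");
        return grads.get("w").shape();
    }
}
```

Because `w` is a single `SDVariable` used twice, automatic differentiation sums the contributions from the encoder and decoder paths into one gradient array.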

## Building a Simple Neural Net in SameDiff

Here is a complete minimal example of a one-hidden-layer network for MNIST classification, from graph definition through to a training loop.

### Step 1: Define the graph

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.TrainingConfig;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.weightinit.impl.XavierInitScheme;

SameDiff sd = SameDiff.create();

// Placeholders receive data at runtime
SDVariable input  = sd.placeHolder("input",  DataType.FLOAT, -1, 784);
SDVariable labels = sd.placeHolder("labels", DataType.FLOAT, -1, 10);

// Trainable parameters — Xavier-initialised
SDVariable w1 = sd.var("w1", new XavierInitScheme('c', 784, 256), DataType.FLOAT, 784, 256);
SDVariable b1 = sd.var("b1", DataType.FLOAT, 256);

SDVariable w2 = sd.var("w2", new XavierInitScheme('c', 256, 10), DataType.FLOAT, 256, 10);
SDVariable b2 = sd.var("b2", DataType.FLOAT, 10);

// Forward pass
SDVariable hidden  = sd.nn.relu("hidden",  input.mmul(w1).add(b1), 0);
SDVariable logits  = hidden.mmul(w2).add(b2);
SDVariable softmax = sd.nn.softmax("softmax", logits);

// Loss — cross-entropy averaged over the minibatch
SDVariable loss = sd.loss.softmaxCrossEntropy("loss", labels, logits, null);
```

### Step 2: Configure training

```java
TrainingConfig config = TrainingConfig.builder()
    .updater(new Adam(1e-3))
    .dataSetFeatureMapping("input")        // DataSet feature -> placeholder name
    .dataSetLabelMapping("labels")         // DataSet label   -> placeholder name
    .build();

sd.setTrainingConfig(config);
```

### Step 3: Train

```java
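import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.autodiff.listeners.records.History;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// MNIST training split: batch size 64, fixed seed (requires the deeplearning4j-datasets module)
DataSetIterator trainIter = new MnistDataSetIterator(64, true, 12345);
int numEpochs = 10;

// fit() returns a History recording the loss epoch by epoch
History history = sd.fit(trainIter, numEpochs);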
```
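To check the training loop without downloading MNIST, a minimal sketch with synthetic in-memory data is shown below. It assumes ND4J's `SingletonDataSetIterator` to wrap a single `DataSet`, and fits a one-parameter linear model to data generated from y = 3x; the class name, learning rate, and epoch count are illustrative.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.TrainingConfig;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.adapter.SingletonDataSetIterator;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Adam;

import java.util.HashMap;
import java.util.Map;

public class TinyFit {

    /** Evaluates the "loss" variable for the given features and labels. */
    static double loss(SameDiff sd, INDArray x, INDArray y) {
        Map<String, INDArray> placeholders = new HashMap<>();
        placeholders.put("in", x);
        placeholders.put("label", y);
        return sd.output(placeholders, "loss").get("loss").sumNumber().doubleValue();
    }

    /** Fits y = w * x to data from y = 3x; returns {lossBefore, lossAfter}. */
    public static double[] run() {
        SameDiff sd = SameDiff.create();
        SDVariable in    = sd.placeHolder("in",    DataType.FLOAT, -1, 1);
        SDVariable label = sd.placeHolder("label", DataType.FLOAT, -1, 1);
        SDVariable w     = sd.var("w", Nd4j.zeros(DataType.FLOAT, 1, 1));
        sd.loss.meanSquaredError("loss", label, in.mmul(w), null);
        sd.setLossVariables("loss");

        sd.setTrainingConfig(new TrainingConfig.Builder()
                .updater(new Adam(0.1))
                .dataSetFeatureMapping("in")
                .dataSetLabelMapping("label")
                .build());

        // Synthetic data: 16 samples with y = 3x
        INDArray x = Nd4j.rand(DataType.FLOAT, 16, 1);
        INDArray y = x.mul(3);

        double before = loss(sd, x, y);
        // Each epoch is one pass over the single in-memory batch
        sd.fit(new SingletonDataSetIterator(new DataSet(x, y)), 200);
        double after = loss(sd, x, y);
        return new double[]{before, after};
    }
}
```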

### Step 4: Run inference

```java
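import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.api.ndarray.INDArray;
import java.util.Map;

// Features of one batch from the MNIST test split
INDArray testInput = new MnistDataSetIterator(32, false, 12345).next().getFeatures();
Map<String, INDArray> results = sd.output(
    Map.of("input", testInput),   // "labels" is not needed to compute "softmax"
    "softmax"
);
INDArray predictions = results.get("softmax");       // [32, 10] class probabilities
INDArray predictedDigits = predictions.argMax(1);    // most likely class per row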
```

## How the Graph Executes

When `sd.output()` is called, SameDiff internally uses an `InferenceSession` that:

1. Resolves which nodes need to be computed in order to produce the requested output variables.
2. Determines a valid topological execution order.
3. Evaluates each op in that order, passing intermediate results through the graph.
4. Returns the values of the requested output nodes.

Only the ops necessary to compute the requested outputs are evaluated — unreachable subgraphs are skipped.
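A minimal sketch of this pruning, using a graph shaped like the MNIST example but with toy sizes: requesting only `softmax` means the `labels` placeholder can be left out, because no op on the path to `softmax` consumes it. The class name and sizes are illustrative.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Collections;
import java.util.Map;

public class PrunedOutput {

    /** Requests only "softmax", supplying only the "input" placeholder. */
    public static INDArray softmaxWithoutLabels() {
        SameDiff sd = SameDiff.create();
        SDVariable input  = sd.placeHolder("input",  DataType.FLOAT, -1, 4);
        SDVariable labels = sd.placeHolder("labels", DataType.FLOAT, -1, 2);
        SDVariable w = sd.var("w", Nd4j.rand(DataType.FLOAT, 4, 2));

        SDVariable logits = input.mmul(w);
        sd.nn.softmax("softmax", logits);
        // The loss branch is the only consumer of "labels"
        sd.loss.softmaxCrossEntropy("loss", labels, logits, null);

        // "softmax" depends on neither "labels" nor "loss", so neither is needed here
        Map<String, INDArray> out = sd.output(
                Collections.singletonMap("input", Nd4j.rand(DataType.FLOAT, 5, 4)), "softmax");
        return out.get("softmax");
    }
}
```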

## Graph Inspection

SameDiff provides several utilities for inspecting the graph you have built:

```java
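import org.nd4j.autodiff.samediff.SDVariable;
import java.util.List;
import java.util.stream.Collectors;

// summary() returns a String describing the variables, their types, and the ops
System.out.println(sd.summary());

// List all variable names
List<String> varNames = sd.variables().stream()
        .map(SDVariable::name)
        .collect(Collectors.toList());

// Get a variable by name
SDVariable v = sd.getVariable("hidden");

// Static shape of a parameter, known at definition time. Activations such as
// "hidden" depend on a placeholder with a -1 dimension, so their full shape
// is generally only known once the graph is executed.
long[] shape = sd.getVariable("w1").getShape();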
```
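Grouping variables by `VariableType` is a quick way to separate trainable parameters from placeholders and intermediate activations. A minimal sketch on a toy graph (class name and sizes are illustrative):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.VariableType;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Map;
import java.util.stream.Collectors;

public class VariableCensus {

    /** Builds a tiny graph and counts its variables by VariableType. */
    public static Map<VariableType, Long> countByType() {
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 4);
        SDVariable w = sd.var("w", Nd4j.rand(DataType.FLOAT, 4, 2));
        SDVariable b = sd.var("b", Nd4j.zeros(DataType.FLOAT, 2));
        x.mmul(w).add("out", b);   // two ARRAY-type results: the matmul and the add

        return sd.variables().stream()
                .collect(Collectors.groupingBy(SDVariable::getVariableType, Collectors.counting()));
    }

    public static void main(String[] args) {
        countByType().forEach((type, n) -> System.out.println(type + ": " + n));
    }
}
```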

## Thread Safety and Multiple Graphs

Each `SameDiff` instance is a self-contained graph. You can have multiple `SameDiff` instances in the same JVM, but variables from one instance cannot be mixed with variables from another. All `SDVariable` objects carry a reference back to their owning `SameDiff`.

`SameDiff` instances are not thread-safe for concurrent mutation. For inference in a multi-threaded server environment, either synchronise access or keep a pool of separate `SameDiff` instances loaded from the same saved file.
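One way to keep such a pool is a `ThreadLocal` that lazily loads a private copy of the saved graph for each worker thread. This is a sketch under stated assumptions: the class name `PerThreadModels` and the toy graph `y = 2x` in `demo()` are hypothetical, and every thread holds a full copy of the parameters, so memory use grows with the number of threads.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadModels {
    private final ThreadLocal<SameDiff> models;

    public PerThreadModels(File savedModel) {
        // Each thread loads its own private copy on first use, then reuses it
        this.models = ThreadLocal.withInitial(() -> loadCopy(savedModel));
    }

    private static SameDiff loadCopy(File file) {
        try {
            return SameDiff.load(file, false);   // false: skip updater state, inference only
        } catch (Exception e) {
            throw new IllegalStateException("Could not load " + file, e);
        }
    }

    /** Runs inference on the calling thread's own SameDiff instance. */
    public INDArray predict(String inputName, INDArray features, String outputName) {
        return models.get()
                .output(Collections.singletonMap(inputName, features), outputName)
                .get(outputName);
    }

    /** Saves a toy graph (y = 2x), queries it from two threads, returns the sum of both outputs. */
    public static double demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            SameDiff sd = SameDiff.create();
            SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 3);
            x.mul("y", 2.0);
            File file = File.createTempFile("samediff-demo", ".fb");
            file.deleteOnExit();
            sd.save(file, false);

            PerThreadModels server = new PerThreadModels(file);
            Future<INDArray> a = pool.submit(() -> server.predict("x", Nd4j.ones(DataType.FLOAT, 1, 3), "y"));
            Future<INDArray> b = pool.submit(() -> server.predict("x", Nd4j.ones(DataType.FLOAT, 1, 3), "y"));
            return a.get().sumNumber().doubleValue() + b.get().sumNumber().doubleValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```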

## User-Defined Functions / Custom Ops (ADR 0023)

SameDiff lets you define your own operations — called User-Defined Functions (UDFs) — that participate in the graph the same way built-in ops do. UDFs are automatically saved and loaded with the graph, support gradient computation for training, and integrate with DSP compilation.

### When to Use UDFs

Use a UDF when you need an operation that has no counterpart in the built-in op catalogue, you want a custom backward pass for training, or you want to wrap a third-party kernel or hardware-specific routine as a first-class graph node.

### Annotating and Registering a UDF

Annotate your class with `@UserDefinedOp`. The annotation scanner discovers all annotated classes on the classpath and registers them with the op registry automatically, so they are available when a saved graph is reloaded.

```java
import org.nd4j.autodiff.samediff.udf.UserDefinedOp;
import org.nd4j.autodiff.samediff.udf.UserDefinedCustomOp;
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ops.impl.transforms.pairwise.arithmetic.AddOp;
import org.nd4j.linalg.api.ops.impl.transforms.pairwise.arithmetic.bp.AddBpOp;
import org.nd4j.linalg.api.shape.LongShapeDescriptor;
import org.nd4j.linalg.api.ops.OpContext;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

@UserDefinedOp                               // registers with the op registry
public class MyAddUdf extends UserDefinedCustomOp {

    // No-arg constructor required for save/load reconstruction
    public MyAddUdf() { super(); }

    public MyAddUdf(SameDiff sameDiff, SDVariable x, SDVariable y) {
        super(sameDiff, new SDVariable[]{x, y});
    }

    @Override
    public String opName() {
        return "my_add_udf";                 // unique name used in serialization
    }

    @Override
    public int getNumOutputs() {
        return 1;
    }

    @Override
    public List<DataType> calculateOutputDataTypes(List<DataType> inputTypes) {
        return Arrays.asList(inputTypes.get(0));
    }

    @Override
    public List<LongShapeDescriptor> calculateOutputShape() {
        return Arrays.asList(inputArguments.get(0).shapeDescriptor());
    }

    @Override
    public List<LongShapeDescriptor> calculateOutputShape(OpContext oc) {
        return Arrays.asList(oc.getInputArrays().get(0).shapeDescriptor());
    }

    @Override
    public void exec() {
        // Implement the forward pass using existing ND4J ops
        AddOp addOp = new AddOp();
        addOp.addInputArgument(inputArguments.get(0), inputArguments.get(1));
        Nd4j.getExecutioner().exec(addOp);
        // Use the public accessor: another op's protected field is not accessible here
        this.outputArguments.addAll(addOp.outputArguments());
    }

    @Override
    public void exec(OpContext opContext) {
        Nd4j.getExecutioner().exec(new AddOp(), opContext);
    }

    @Override
    public List<SDVariable> doDiff(List<SDVariable> grad) {
        // Return one gradient per input — required for training
        return new AddBpOp(sameDiff, larg(), rarg(), grad.get(0)).outputs();
    }

    @Override
    public boolean isInplaceCall() { return false; }

    @Override
    public Map<String, Object> propertiesForFunction() { return Collections.emptyMap(); }

    @Override
    public void setPropertiesForFunction(Map<String, Object> props) {}

    @Override
    public void configureFromArguments() {}

    @Override
    public void configureWithSameDiff(SameDiff sameDiff) {
        this.sameDiff = sameDiff;
    }
}
```

### Using a UDF in a Graph

Pass an instantiated UDF to `sd.doUdf()`. It returns `SDVariable` outputs wired into the graph exactly like any built-in op.

```java
import java.io.File;

SameDiff sd = SameDiff.create();
SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 128);
SDVariable y = sd.placeHolder("y", DataType.FLOAT, -1, 128);

// Register the UDF and wire it into the graph
SDVariable[] outputs = sd.doUdf(new MyAddUdf(sd, x, y));
SDVariable result = outputs[0];
SDVariable loss = result.mean("loss");

// The UDF is saved and loaded transparently
sd.save(new File("model_with_udf.sdnb"), false);
SameDiff reloaded = SameDiff.load(new File("model_with_udf.sdnb"), false);
```

### Optional Properties

If your UDF has configuration parameters (e.g., a threshold, kernel size), expose them via `propertiesForFunction()` and `setPropertiesForFunction()`. These are serialized alongside the graph structure:

```java
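// Assumes the UDF declares these configuration fields
private double threshold;
private int mode;

@Override
public Map<String, Object> propertiesForFunction() {
    return Map.of("threshold", threshold, "mode", mode);
}

@Override
public void setPropertiesForFunction(Map<String, Object> props) {
    // Read via Number: a value may be restored as a different boxed numeric type
    this.threshold = ((Number) props.get("threshold")).doubleValue();
    this.mode = ((Number) props.get("mode")).intValue();
}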
```

***

## Graph Tracing with `Nd4j.graphScope()`

`GraphScope` lets you write ordinary eager ND4J code and have it automatically traced into a SameDiff graph, compiled into a `DynamicShapePlan` (DSP), and executed in a single optimized pass, with plan caching so the compilation cost is paid only once.

This is useful when you have a performance-critical compute kernel written using standard `Nd4j.*` calls and want it to benefit from DSP's CUDA graph capture and graph optimization without rewriting it in the SameDiff define-and-run style.

### How It Works

1. `Nd4j.graphScope()` returns a `GraphScope` instance.
2. Calling `scope.begin()` starts tracing on the current thread.
3. During tracing, every op executed through `Nd4j.exec()` is intercepted: instead of running immediately, it returns a `LazyINDArray` proxy. The proxy carries the inferred output shape and data type, so it can be passed as input to subsequent ops.
4. `scope.end()` compiles the traced ops into a `DynamicShapePlan`, executes it, and materializes all `LazyINDArray` proxies with real values.
5. `scope.close()` (called automatically by `try`-with-resources) cleans up resources.

### Basic Usage

```java
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.factory.GraphScope;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.api.buffer.DataType;

INDArray a = Nd4j.rand(DataType.FLOAT, 128, 256);
INDArray b = Nd4j.rand(DataType.FLOAT, 256, 64);

INDArray mm;
INDArray out;

try (GraphScope scope = Nd4j.graphScope()) {
    scope.begin();

    mm  = Nd4j.matmul(a, b);          // traced — returns LazyINDArray
    out = Nd4j.nn.relu(mm, 0);         // traced: wired from mm's proxy (0 = ReLU cutoff)

    scope.end();                       // compiles + executes; mm and out are now real
}

// After scope.end(), out contains real computed values
System.out.println(out.shapeInfoToString());
```

During tracing, `mm` is a `LazyINDArray` — it knows its shape but contains no data. After `scope.end()`, it is a fully materialized `INDArray` backed by real GPU/CPU memory.
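To confirm that a traced kernel computes the same values as eager execution, you can run both and compare. This sketch assumes the `GraphScope` API behaves as described above; the sizes, tolerance, and class name are illustrative, and `Nd4j.nn.relu` is called with an explicit cutoff of 0.

```java
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.GraphScope;
import org.nd4j.linalg.factory.Nd4j;

public class TraceCheck {

    /** Returns true if the traced result matches eager execution of the same ops. */
    public static boolean tracedMatchesEager() {
        INDArray a = Nd4j.rand(DataType.FLOAT, 8, 16);
        INDArray b = Nd4j.rand(DataType.FLOAT, 16, 4);

        // Eager reference: each call executes immediately
        INDArray expected = Nd4j.nn.relu(Nd4j.matmul(a, b), 0);

        INDArray traced;
        try (GraphScope scope = Nd4j.graphScope()) {
            scope.begin();
            traced = Nd4j.nn.relu(Nd4j.matmul(a, b), 0);   // LazyINDArray while tracing
            scope.end();                                     // compile, execute, materialize
        } catch (Exception e) {
            throw new IllegalStateException("tracing failed", e);
        }
        return expected.equalsWithEps(traced, 1e-5);
    }

    public static void main(String[] args) {
        System.out.println("traced == eager: " + tracedMatchesEager());
    }
}
```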

### Reusable Compiled Functions with `Nd4j.compile()`

For a function you want to call many times (e.g., a custom layer forward pass), use `Nd4j.compile()` to get a `CompiledGraphFunction`. The first call traces and compiles; subsequent calls with the same op sequence replay the cached DSP plan.

```java
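import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.CompiledGraphFunction;
import org.nd4j.linalg.factory.Nd4j;

// Example inputs: activations [32, 128] and weights [128, 64]
INDArray activations    = Nd4j.rand(DataType.FLOAT, 32, 128);
INDArray weights        = Nd4j.rand(DataType.FLOAT, 128, 64);
INDArray newActivations = Nd4j.rand(DataType.FLOAT, 32, 128);
INDArray newWeights     = Nd4j.rand(DataType.FLOAT, 128, 64);

// Define a reusable function: inputs[0] = activations, inputs[1] = weights
CompiledGraphFunction fn = Nd4j.compile(2, inputs -> {
    INDArray mm      = Nd4j.matmul(inputs[0], inputs[1]);
    INDArray normed  = Nd4j.layerNorm(mm, null, null, true, 1);
    INDArray result  = Nd4j.nn.gelu(normed);
    return new INDArray[]{result};
});

// First call: traces and compiles the DSP plan
INDArray[] out1 = fn.execute(activations, weights);

// Subsequent calls: replay the cached plan without recompiling
INDArray[] out2 = fn.execute(newActivations, newWeights);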

fn.close();   // release compiled plan resources
```

### Key Classes

| Class                   | Role                                                                                                                                                     |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GraphScope`            | Trace-compile-replay coordinator. Use via `Nd4j.graphScope()`.                                                                                           |
| `LazyINDArray`          | Proxy returned during tracing. Carries shape/dtype metadata; data access before `scope.end()` throws. Delegates to the real array after materialization. |
| `CompiledGraphFunction` | Wraps a `GraphFunction` lambda with a persistent `GraphScope` and DSP plan cache. Use via `Nd4j.compile()`.                                              |

### Limitations

* Nested `GraphScope` traces on the same thread are not supported.
* Operations involving Java-side conditional logic (e.g., `if (someArray.getDouble(0) > 0.5)`) cannot be captured as graph nodes — only ND4J op calls are intercepted.
* If DSP compilation fails (e.g., due to control flow), `GraphScope` falls back to standard SameDiff interpreted execution automatically.

***

## Next Steps

* [Variables](https://github.com/KonduitAI/deeplearning4j-docs/blob/en-1.0.0-rewrite/docs/m2.1/nd4j/samediff/variables/README.md) — learn about `SDVariable` types (`VARIABLE`, `CONSTANT`, `PLACEHOLDER`, `ARRAY`), data types, and type conversion.
* [Operations](https://github.com/KonduitAI/deeplearning4j-docs/blob/en-1.0.0-rewrite/docs/m2.1/nd4j/samediff/operations/README.md) — explore the op namespaces: `sd.math`, `sd.nn`, `sd.cnn`, `sd.rnn`, `sd.loss`, `sd.random`.
* [Training](https://github.com/KonduitAI/deeplearning4j-docs/blob/en-1.0.0-rewrite/docs/m2.1/nd4j/samediff/training/README.md) — configure `TrainingConfig`, run `fit()`, and track progress with `History`.
* [Execution and Inference](https://github.com/KonduitAI/deeplearning4j-docs/blob/en-1.0.0-rewrite/docs/m2.1/nd4j/samediff/execution/README.md) — understand `sd.output()`, placeholder binding, and batch inference.
* [Serialization](https://github.com/KonduitAI/deeplearning4j-docs/blob/en-1.0.0-rewrite/docs/m2.1/nd4j/samediff/serialization/README.md) — save and load graphs with `sd.save()` and `SameDiff.load()`.