Updaters/Optimizers

Special algorithms for gradient descent.


What are updaters?

The main difference among the updaters is how they treat the learning rate. Stochastic gradient descent (SGD), the most common learning algorithm in deep learning, relies on theta (the model's weights) and alpha (the learning rate). Different updaters adapt the learning rate, and often the gradient itself, so that the network converges more reliably on a well-performing set of weights.
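
For reference, plain SGD applies the gradient scaled by the learning rate directly:

\theta_{t+1} = \theta_t - \alpha \, \nabla_\theta L(\theta_t)

The updaters listed below modify this rule, for example by adding momentum or by adapting the step size of each parameter individually.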

Usage

To use an updater, pass an instance of the updater class to the updater() method when configuring either a ComputationGraph or a MultiLayerNetwork.

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(0.01))
    .graphBuilder()
    // add your inputs, layers, outputs and other hyperparameters below
    .build();
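
For a more complete picture, a minimal MultiLayerNetwork configuration that sets an updater might look like the sketch below; the layer sizes, updater choice, and hyperparameters are illustrative only.

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Illustrative network: 784 -> 100 -> 10 (e.g. MNIST-sized inputs)
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Nesterovs(0.01, 0.9))   // learning rate 0.01, momentum 0.9
    .list()
    .layer(new DenseLayer.Builder().nIn(784).nOut(100)
            .activation(Activation.RELU).build())
    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(100).nOut(10)
            .activation(Activation.SOFTMAX).build())
    .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();   // the configured updater is applied during net.fit(...)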

Available updaters

NadamUpdater

The Nadam updater (Nesterov-accelerated Adam). Reference: https://arxiv.org/pdf/1609.04747.pdf

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for

  • param iteration

  • return the gradient

NesterovsUpdater

Nesterov's momentum. Keeps track of the previous update (the velocity) and combines it with the current gradient to form the next update; see the sketch at the end of this section.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Get the Nesterov update

  • param gradient the gradient to get the update for

  • param iteration

  • return
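
A common formulation of Nesterov momentum (a sketch of the standard rule; DL4J's internal implementation may organize the terms differently):

v_t = \mu \, v_{t-1} - \alpha \, g_t
\theta_t = \theta_{t-1} - \mu \, v_{t-1} + (1 + \mu) \, v_t

where \mu is the momentum coefficient, \alpha the learning rate, and g_t the current gradient.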

RmsPropUpdater

The RMSProp updater. References: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf and http://cs231n.github.io/neural-networks-3/#ada
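
In the standard formulation (a sketch; symbols follow the references above), RMSProp keeps an exponential moving average of squared gradients and divides each step by its square root:

E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{E[g^2]_t + \epsilon}} \, g_t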

AdaGradUpdater

Vectorized learning rate used per connection weight. Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent. See also: http://cs231n.github.io/neural-networks-3/#ada

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Gets feature-specific learning rates. AdaGrad keeps a history of the gradients passed in, and each parameter's step is adapted over time based on that history, hence the name AdaGrad (sketched after this section).

  • param gradient the gradient to get learning rates for

  • param iteration
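
The standard AdaGrad rule (a sketch) accumulates all squared gradients, so each parameter's effective learning rate shrinks as its gradient history grows:

G_t = G_{t-1} + g_t^2
\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{G_t} + \epsilon} \, g_t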

AdaMaxUpdater

The AdaMax updater, a variant of Adam. Reference: http://arxiv.org/abs/1412.6980

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for

  • param iteration

  • return the gradient

NoOpUpdater

NoOp updater: gradient updater that makes no changes to the gradient

AdamUpdater

The Adam updater. Reference: http://arxiv.org/abs/1412.6980

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for

  • param iteration

  • return the gradient
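
For context, the standard Adam rule (a sketch based on the paper referenced above; DL4J's internal details may differ slightly) keeps bias-corrected estimates of the first and second moments of the gradient:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) \, g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) \, g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)
\theta_{t+1} = \theta_t - \frac{\alpha \, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}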

AdaDeltaUpdater

AdaDelta updater. A more robust variant of AdaGrad that keeps an exponentially decaying window average of past squared gradients, rather than AdaGrad's ever-growing accumulation that makes the learning rate decay toward zero; see the sketch at the end of this section. References: http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf and https://arxiv.org/pdf/1212.5701v1.pdf

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Get the updated gradient for the given gradient and also update the state of ada delta.

  • param gradient the gradient to get the updated gradient for

  • param iteration

  • return the updated gradient
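
A sketch of the standard AdaDelta rule from the Zeiler papers referenced above (notation simplified):

E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
\Delta\theta_t = - \frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
E[\Delta\theta^2]_t = \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2
\theta_{t+1} = \theta_t + \Delta\theta_t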

SgdUpdater

The SGD updater applies the learning rate only, with no momentum or per-parameter adaptation.

GradientUpdater

Gradient modifications: calculates an update from a gradient and tracks the related state (for example momentum or gradient history) needed to handle updates over time.

AMSGradUpdater

The AMSGrad updater, a variant of Adam. Reference: On the Convergence of Adam and Beyond - https://openreview.net/forum?id=ryQu7f-RZ
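
Relative to Adam, AMSGrad's change (a sketch, following the paper above) is to use the running maximum of the second-moment estimate, which keeps the effective per-parameter learning rate from increasing between steps:

\hat{v}_t = \max(\hat{v}_{t-1}, v_t)
\theta_{t+1} = \theta_t - \frac{\alpha \, m_t}{\sqrt{\hat{v}_t} + \epsilon}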