Deeplearning4j
EN 1.0.0-beta7
Updaters

Special algorithms for gradient descent.


Last updated 5 years ago


What are updaters?

Updaters differ mainly in how they treat the learning rate. Stochastic gradient descent (SGD), the most common learning algorithm in deep learning, depends on theta (the network's weights) and alpha (the learning rate). Different updaters adapt the learning rate during training to help the network converge on its best-performing state.
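As a point of reference, the plain SGD rule can be written as theta = theta - alpha * gradient. The sketch below shows that single step in plain Java. The class name SgdStepSketch and the sample values are hypothetical and chosen only for illustration; this is not DL4J's implementation.

```java
import java.util.Arrays;

// Hypothetical illustration of one plain SGD step; not part of DL4J.
final class SgdStepSketch {

    // Returns theta - alpha * gradient, computed element by element.
    static double[] step(double[] theta, double[] gradient, double alpha) {
        double[] next = new double[theta.length];
        for (int i = 0; i < theta.length; i++) {
            next[i] = theta[i] - alpha * gradient[i];
        }
        return next;
    }

    public static void main(String[] args) {
        double[] theta = {1.0, -2.0};    // current weights
        double[] gradient = {0.5, -1.0}; // gradient of the loss w.r.t. the weights
        System.out.println(Arrays.toString(step(theta, gradient, 0.1)));
    }
}
```

Every updater on this page can be read as a different way of turning the raw gradient into the quantity that is subtracted from theta.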

Usage

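To use an updater, pass an instance of its configuration class (for example, new Adam(0.01)) to the updater() method of NeuralNetConfiguration.Builder when configuring either a ComputationGraph or a MultiLayerNetwork.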

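import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.learning.config.Adam;

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(0.01)) // Adam with a learning rate of 0.01
    .graphBuilder()          // switch to the ComputationGraph builder
    // add your inputs, layers, outputs, and hyperparameters here
    .build();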

Available updaters

NadamUpdater

The Nadam updater (Adam with Nesterov momentum). Reference: https://arxiv.org/pdf/1609.04747.pdf

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch) 

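Calculate the update based on the given gradient. The method returns nothing; the update is written into the gradient array in place.

  • param gradient the gradient to get the update for

  • param iteration the current iteration number

  • param epoch the current epoch number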
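The following is a minimal sketch of a Nadam-style update in plain Java, using the form given in the overview paper linked above: Adam's moment estimates, with the current gradient blended into the bias-corrected first moment. The class name NadamSketch, the zero-based iteration argument, and the hyperparameter values are assumptions for illustration. DL4J's own NadamUpdater may differ in detail.

```java
// Hypothetical plain-Java sketch of a Nadam-style update; not DL4J's code.
final class NadamSketch {
    final double lr, beta1, beta2, eps;
    final double[] m; // first-moment estimate
    final double[] v; // second-moment estimate

    NadamSketch(int size, double lr, double beta1, double beta2, double eps) {
        this.lr = lr;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.eps = eps;
        this.m = new double[size];
        this.v = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    // iteration is assumed to be zero-based, so the time step is iteration + 1.
    void applyUpdater(double[] gradient, int iteration) {
        int t = iteration + 1;
        double c1 = 1.0 - Math.pow(beta1, t); // bias correction for m
        double c2 = 1.0 - Math.pow(beta2, t); // bias correction for v
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            m[i] = beta1 * m[i] + (1.0 - beta1) * g;
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g;
            double mHat = m[i] / c1;
            double vHat = v[i] / c2;
            // Nesterov-style look-ahead: mix the corrected momentum with the current gradient.
            double lookAhead = beta1 * mHat + (1.0 - beta1) * g / c1;
            gradient[i] = lr * lookAhead / (Math.sqrt(vHat) + eps);
        }
    }
}
```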

NesterovsUpdater

Nesterov’s momentum. Keeps track of the previous update (the velocity) and uses it to adjust the current gradient.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch) 

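Get the Nesterov update. The method returns nothing; the update is written into the gradient array in place.

  • param gradient the gradient to get the update for

  • param iteration the current iteration number

  • param epoch the current epoch number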
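A minimal sketch of Nesterov momentum in plain Java, following the widely used formulation in which a velocity is carried between calls. The class name NesterovSketch and the sample learning rate and momentum are hypothetical; treat this as an illustration of the idea rather than DL4J's implementation.

```java
// Hypothetical plain-Java sketch of Nesterov momentum; not DL4J's code.
final class NesterovSketch {
    final double learningRate;
    final double momentum;
    final double[] velocity; // state carried between calls, starts at zero

    NesterovSketch(int size, double learningRate, double momentum) {
        this.learningRate = learningRate;
        this.momentum = momentum;
        this.velocity = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    void applyUpdater(double[] gradient) {
        for (int i = 0; i < gradient.length; i++) {
            double vPrev = velocity[i];
            velocity[i] = momentum * vPrev - learningRate * gradient[i];
            // Equivalent to: params += -momentum * vPrev + (1 + momentum) * velocity
            gradient[i] = momentum * vPrev - (1.0 + momentum) * velocity[i];
        }
    }
}
```

With a constant gradient, the update grows from one call to the next as the velocity builds up, which is the acceleration effect momentum is meant to provide.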

RmsPropUpdater

RMSProp updates. References:

http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
http://cs231n.github.io/neural-networks-3/#ada
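As a rough sketch of the RMSProp rule described in the references above: a decaying average of squared gradients is kept per weight, and each gradient is divided by the square root of that average. The class name RmsPropSketch and the hyperparameter values are hypothetical, and the exact placement of epsilon varies between implementations, so this is not necessarily what DL4J does.

```java
// Hypothetical plain-Java sketch of RMSProp; not DL4J's code.
final class RmsPropSketch {
    final double learningRate, decay, eps;
    final double[] cache; // decaying average of squared gradients

    RmsPropSketch(int size, double learningRate, double decay, double eps) {
        this.learningRate = learningRate;
        this.decay = decay;
        this.eps = eps;
        this.cache = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    void applyUpdater(double[] gradient) {
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            cache[i] = decay * cache[i] + (1.0 - decay) * g * g;
            gradient[i] = learningRate * g / (Math.sqrt(cache[i]) + eps);
        }
    }
}
```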

AdaGradUpdater

Vectorized learning rate used per connection weight.

Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent
See also: http://cs231n.github.io/neural-networks-3/#ada

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch) 

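Gets feature-specific learning rates. AdaGrad keeps a history of the gradients passed in. Each gradient passed in is adapted over time, hence the name AdaGrad. The method returns nothing; the result is written into the gradient array in place.

  • param gradient the gradient to get learning rates for

  • param iteration the current iteration number

  • param epoch the current epoch number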
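The history-keeping described above can be sketched in a few lines of plain Java. Here each weight accumulates the sum of its squared gradients, so frequently updated weights receive progressively smaller steps. The class name AdaGradSketch and the sample values are hypothetical; this is an illustration, not DL4J's implementation.

```java
// Hypothetical plain-Java sketch of AdaGrad; not DL4J's code.
final class AdaGradSketch {
    final double learningRate, eps;
    final double[] history; // running sum of squared gradients, per weight

    AdaGradSketch(int size, double learningRate, double eps) {
        this.learningRate = learningRate;
        this.eps = eps;
        this.history = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    void applyUpdater(double[] gradient) {
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            history[i] += g * g;
            gradient[i] = learningRate * g / (Math.sqrt(history[i]) + eps);
        }
    }
}
```

Because the history only grows, the effective learning rate only shrinks, which is the behavior AdaDelta and RMSProp were designed to relax.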

AdaMaxUpdater

The AdaMax updater, a variant of Adam. http://arxiv.org/abs/1412.6980

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch) 

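Calculate the update based on the given gradient. The method returns nothing; the update is written into the gradient array in place.

  • param gradient the gradient to get the update for

  • param iteration the current iteration number

  • param epoch the current epoch number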
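A minimal sketch of the AdaMax rule from the Adam paper linked above, in plain Java. It replaces Adam's squared-gradient average with an exponentially weighted infinity norm. The class name AdaMaxSketch, the zero-based iteration argument, and the small epsilon added to avoid division by zero are assumptions made for this illustration, not DL4J's code.

```java
// Hypothetical plain-Java sketch of AdaMax; not DL4J's code.
final class AdaMaxSketch {
    final double lr, beta1, beta2, eps;
    final double[] m; // first-moment estimate
    final double[] u; // exponentially weighted infinity norm

    AdaMaxSketch(int size, double lr, double beta1, double beta2, double eps) {
        this.lr = lr;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.eps = eps;
        this.m = new double[size];
        this.u = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    // iteration is assumed to be zero-based, so the time step is iteration + 1.
    void applyUpdater(double[] gradient, int iteration) {
        int t = iteration + 1;
        double stepSize = lr / (1.0 - Math.pow(beta1, t)); // bias-corrected step size
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            m[i] = beta1 * m[i] + (1.0 - beta1) * g;
            u[i] = Math.max(beta2 * u[i], Math.abs(g));
            gradient[i] = stepSize * m[i] / (u[i] + eps); // eps is an added safeguard
        }
    }
}
```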

NoOpUpdater

NoOp updater: a gradient updater that makes no changes to the gradient.

AdamUpdater

The Adam updater. http://arxiv.org/abs/1412.6980

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch) 

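Calculate the update based on the given gradient. The method returns nothing; the update is written into the gradient array in place.

  • param gradient the gradient to get the update for

  • param iteration the current iteration number

  • param epoch the current epoch number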
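For orientation, here is a plain-Java sketch of the Adam rule as described in the paper linked above: decaying averages of the gradient and the squared gradient, each bias-corrected, combine into a per-weight step. The class name AdamSketch and the zero-based iteration argument are assumptions for illustration; DL4J's AdamUpdater may organize the computation differently.

```java
// Hypothetical plain-Java sketch of Adam; not DL4J's code.
final class AdamSketch {
    final double lr, beta1, beta2, eps;
    final double[] m; // decaying average of gradients
    final double[] v; // decaying average of squared gradients

    AdamSketch(int size, double lr, double beta1, double beta2, double eps) {
        this.lr = lr;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.eps = eps;
        this.m = new double[size];
        this.v = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    // iteration is assumed to be zero-based, so the time step is iteration + 1.
    void applyUpdater(double[] gradient, int iteration) {
        int t = iteration + 1;
        double c1 = 1.0 - Math.pow(beta1, t);
        double c2 = 1.0 - Math.pow(beta2, t);
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            m[i] = beta1 * m[i] + (1.0 - beta1) * g;
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g;
            double mHat = m[i] / c1; // bias-corrected first moment
            double vHat = v[i] / c2; // bias-corrected second moment
            gradient[i] = lr * mHat / (Math.sqrt(vHat) + eps);
        }
    }
}
```

On the very first call the bias corrections cancel the decay factors, so the step has magnitude close to the learning rate regardless of the gradient's scale.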

AdaDeltaUpdater

http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
https://arxiv.org/pdf/1212.5701v1.pdf

AdaDelta updater. A more robust variant of AdaGrad that keeps a moving-window average of the gradient rather than relying on AdaGrad's ever-decaying learning rates.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch) 

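Get the updated gradient for the given gradient and also update the state of AdaDelta. The method returns nothing; the updated gradient is written into the gradient array in place.

  • param gradient the gradient to get the updated gradient for

  • param iteration the current iteration number

  • param epoch the current epoch number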
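The state that AdaDelta maintains can be sketched as two decaying averages, following the paper linked above: one of squared gradients and one of squared updates. Note that no learning rate appears. The class name AdaDeltaSketch and the values of rho and epsilon are hypothetical, and this is an illustration rather than DL4J's implementation.

```java
// Hypothetical plain-Java sketch of AdaDelta; not DL4J's code.
final class AdaDeltaSketch {
    final double rho, eps;
    final double[] avgSqGrad;   // decaying average of squared gradients
    final double[] avgSqUpdate; // decaying average of squared updates

    AdaDeltaSketch(int size, double rho, double eps) {
        this.rho = rho;
        this.eps = eps;
        this.avgSqGrad = new double[size];
        this.avgSqUpdate = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    void applyUpdater(double[] gradient) {
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            avgSqGrad[i] = rho * avgSqGrad[i] + (1.0 - rho) * g * g;
            // Ratio of RMS(previous updates) to RMS(gradients) acts as the step size.
            double update = Math.sqrt(avgSqUpdate[i] + eps) / Math.sqrt(avgSqGrad[i] + eps) * g;
            avgSqUpdate[i] = rho * avgSqUpdate[i] + (1.0 - rho) * update * update;
            gradient[i] = update;
        }
    }
}
```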

SgdUpdater

The SGD updater applies a learning rate only.

GradientUpdater

Gradient modifications: calculates an update from a gradient and tracks the related state needed to handle updates as gradients change over time.

AMSGradUpdater

The AMSGrad updater. Reference: On the Convergence of Adam and Beyond, https://openreview.net/forum?id=ryQu7f-RZ
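A minimal plain-Java sketch of the AMSGrad idea from the referenced paper: it keeps Adam's two moment estimates but divides by the running maximum of the second moment, so the effective step size never increases. The class name AMSGradSketch is hypothetical, bias correction is omitted as in the paper's basic algorithm, and DL4J's AMSGradUpdater may differ in detail.

```java
// Hypothetical plain-Java sketch of AMSGrad; not DL4J's code.
final class AMSGradSketch {
    final double lr, beta1, beta2, eps;
    final double[] m;    // decaying average of gradients
    final double[] v;    // decaying average of squared gradients
    final double[] vHat; // running maximum of v

    AMSGradSketch(int size, double lr, double beta1, double beta2, double eps) {
        this.lr = lr;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.eps = eps;
        this.m = new double[size];
        this.v = new double[size];
        this.vHat = new double[size];
    }

    // Overwrites gradient with the update; the caller subtracts it from the parameters.
    void applyUpdater(double[] gradient) {
        for (int i = 0; i < gradient.length; i++) {
            double g = gradient[i];
            m[i] = beta1 * m[i] + (1.0 - beta1) * g;
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g;
            vHat[i] = Math.max(vHat[i], v[i]); // never let the denominator shrink
            gradient[i] = lr * m[i] / (Math.sqrt(vHat[i]) + eps);
        }
    }
}
```

If a large gradient is followed by a small one, v decays but vHat holds its earlier value, which is the difference from Adam.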
