
Updaters/Optimizers

Special algorithms for gradient descent.

What are updaters?

The main difference among updaters is how they treat the learning rate. Stochastic gradient descent, the most common learning algorithm in deep learning, relies on Theta (the network's weights) and alpha (the learning rate). Different updaters adapt the learning rate during training to help the neural network converge on its most performant state.
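In plain terms, vanilla SGD moves each weight against its gradient, scaled by the learning rate. A minimal plain-Java sketch of that rule (the class and method names here are illustrative, not part of the DL4J API):

```java
// Vanilla SGD: theta <- theta - alpha * gradient (illustrative sketch)
public class SgdSketch {
    public static double[] step(double[] theta, double[] grad, double alpha) {
        double[] next = theta.clone();
        for (int i = 0; i < next.length; i++) {
            next[i] -= alpha * grad[i]; // move each weight against its gradient
        }
        return next;
    }
}
```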

Usage

To use an updater, pass a new updater instance to the updater() method when building a ComputationGraph or MultiLayerNetwork configuration.
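For example, using the Adam updater with a learning rate of 0.01 (this is the configuration snippet from this page; it requires the DL4J library):

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(0.01))
    // add your layers and hyperparameters below
    .build();
```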

Available updaters

NadamUpdater

The Nadam updater (Adam with Nesterov momentum).

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculates the update based on the given gradient; the gradient array is modified in place to become the update.

  • param gradient the gradient to get the update for

  • param iteration the current iteration

  • param epoch the current epoch
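Nadam combines Adam's adaptive moment estimates with a Nesterov-style look-ahead on the momentum term. A plain-Java sketch of the commonly stated update rule (illustrative names and default-free constructor; DL4J's actual implementation and hyperparameter defaults may differ):

```java
// Sketch of the Nadam update rule for a single scalar parameter.
public class NadamSketch {
    private double m = 0.0, v = 0.0;   // first and second moment estimates
    private final double lr, beta1, beta2, eps;

    public NadamSketch(double lr, double beta1, double beta2, double eps) {
        this.lr = lr; this.beta1 = beta1; this.beta2 = beta2; this.eps = eps;
    }

    /** Returns the step to subtract from the parameter for gradient g at step t (t >= 1). */
    public double update(double g, int t) {
        m = beta1 * m + (1 - beta1) * g;
        v = beta2 * v + (1 - beta2) * g * g;
        double mHat = m / (1 - Math.pow(beta1, t)); // bias-corrected momentum
        double vHat = v / (1 - Math.pow(beta2, t)); // bias-corrected second moment
        // Nesterov look-ahead: blend bias-corrected momentum with the raw gradient
        double mNesterov = beta1 * mHat + (1 - beta1) * g / (1 - Math.pow(beta1, t));
        return lr * mNesterov / (Math.sqrt(vHat) + eps);
    }
}
```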

NesterovsUpdater

Nesterov's momentum. Keeps a velocity term built from previous updates and uses it to accelerate and smooth the current update.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Gets the Nesterov update; the gradient array is modified in place to become the update.

  • param gradient the gradient to get the update for

  • param iteration the current iteration

  • param epoch the current epoch
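A plain-Java sketch of the Nesterov momentum step in its common "look-ahead" form (illustrative names, not the DL4J API; DL4J's implementation operates on INDArrays):

```java
// Sketch of a Nesterov momentum step for a single scalar parameter.
public class NesterovSketch {
    private double v = 0.0;            // velocity carried between updates
    private final double lr, momentum;

    public NesterovSketch(double lr, double momentum) {
        this.lr = lr; this.momentum = momentum;
    }

    /** Returns the new parameter value after a Nesterov momentum step for gradient g. */
    public double step(double theta, double g) {
        double vPrev = v;
        v = momentum * v - lr * g;
        // "look ahead": back out the old velocity, apply the new one with a correction
        return theta - momentum * vPrev + (1 + momentum) * v;
    }
}
```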

RmsPropUpdater

The RMSProp updater: keeps an exponentially decaying average of squared gradients and divides the learning rate by its square root, so the step size adapts per parameter.
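A plain-Java sketch of that rule (illustrative names; the decay constant and epsilon here are assumptions, not DL4J's defaults):

```java
// Sketch of the RMSProp update rule for a single scalar parameter.
public class RmsPropSketch {
    private double cache = 0.0;        // decaying average of squared gradients
    private final double lr, decay, eps;

    public RmsPropSketch(double lr, double decay, double eps) {
        this.lr = lr; this.decay = decay; this.eps = eps;
    }

    /** Returns the step to subtract from the parameter for gradient g. */
    public double update(double g) {
        cache = decay * cache + (1 - decay) * g * g;
        return lr * g / (Math.sqrt(cache) + eps); // per-parameter adaptive step
    }
}
```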

AdaGradUpdater

Vectorized learning rate, adapted per connection weight. (See also the references at the end of this page.)

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Gets feature-specific learning rates. AdaGrad keeps a history of the squared gradients passed in; each gradient passed in is adapted by that history over time, hence the op name "adagrad".

  • param gradient the gradient to get learning rates for

  • param iteration the current iteration

  • param epoch the current epoch
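A plain-Java sketch of the AdaGrad rule for a single parameter (illustrative names, not the DL4J API). Because the squared-gradient history only grows, the effective learning rate shrinks monotonically:

```java
// Sketch of the AdaGrad update rule for a single scalar parameter.
public class AdaGradSketch {
    private double history = 0.0;      // accumulated squared gradients; never decays
    private final double lr, eps;

    public AdaGradSketch(double lr, double eps) {
        this.lr = lr; this.eps = eps;
    }

    /** Returns the step to subtract from the parameter for gradient g. */
    public double update(double g) {
        history += g * g;                           // grow the per-weight history
        return lr * g / (Math.sqrt(history) + eps); // effective rate shrinks over time
    }
}
```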

AdaMaxUpdater

The AdaMax updater, a variant of Adam based on the infinity norm of the gradients.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculates the update based on the given gradient; the gradient array is modified in place to become the update.

  • param gradient the gradient to get the update for

  • param iteration the current iteration

  • param epoch the current epoch
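AdaMax replaces Adam's second-moment average with a running infinity norm of the gradients. A plain-Java sketch of the rule from the Adam paper (illustrative names; not the DL4J API, and it assumes a nonzero first gradient so the denominator is positive):

```java
// Sketch of the AdaMax update rule for a single scalar parameter.
public class AdaMaxSketch {
    private double m = 0.0, u = 0.0;   // first moment and infinity-norm accumulator
    private final double lr, beta1, beta2;

    public AdaMaxSketch(double lr, double beta1, double beta2) {
        this.lr = lr; this.beta1 = beta1; this.beta2 = beta2;
    }

    /** Returns the step to subtract from the parameter for gradient g at step t (t >= 1). */
    public double update(double g, int t) {
        m = beta1 * m + (1 - beta1) * g;
        u = Math.max(beta2 * u, Math.abs(g)); // infinity norm instead of squared average
        return (lr / (1 - Math.pow(beta1, t))) * m / u;
    }
}
```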

NoOpUpdater

NoOp updater: a gradient updater that makes no changes to the gradient.

AdamUpdater

The Adam updater (adaptive moment estimation).

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Calculates the update based on the given gradient; the gradient array is modified in place to become the update.

  • param gradient the gradient to get the update for

  • param iteration the current iteration

  • param epoch the current epoch
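A plain-Java sketch of the Adam rule with bias correction (illustrative names, not the DL4J API; hyperparameters are passed explicitly rather than using DL4J's defaults):

```java
// Sketch of the Adam update rule for a single scalar parameter.
public class AdamSketch {
    private double m = 0.0, v = 0.0;   // first and second moment estimates
    private final double lr, beta1, beta2, eps;

    public AdamSketch(double lr, double beta1, double beta2, double eps) {
        this.lr = lr; this.beta1 = beta1; this.beta2 = beta2; this.eps = eps;
    }

    /** Returns the step to subtract from the parameter for gradient g at step t (t >= 1). */
    public double update(double g, int t) {
        m = beta1 * m + (1 - beta1) * g;
        v = beta2 * v + (1 - beta2) * g * g;
        double mHat = m / (1 - Math.pow(beta1, t)); // bias correction for early steps
        double vHat = v / (1 - Math.pow(beta2, t));
        return lr * mHat / (Math.sqrt(vHat) + eps);
    }
}
```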

AdaDeltaUpdater

AdaDelta updater. A more robust AdaGrad that keeps a moving-window average of the squared gradients rather than AdaGrad's ever-decaying learning rates.

applyUpdater

public void applyUpdater(INDArray gradient, int iteration, int epoch)

Gets the updated gradient for the given gradient and also updates the state of AdaDelta; the gradient array is modified in place to become the update.

  • param gradient the gradient to get the updated gradient for

  • param iteration the current iteration

  • param epoch the current epoch
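A plain-Java sketch of the AdaDelta rule from the Zeiler paper (illustrative names, not the DL4J API). Note that AdaDelta needs no learning rate: the ratio of the two running averages sets the step scale by itself:

```java
// Sketch of the AdaDelta update rule for a single scalar parameter.
public class AdaDeltaSketch {
    private double avgSqGrad = 0.0;    // decaying average of squared gradients
    private double avgSqUpdate = 0.0;  // decaying average of squared updates
    private final double rho, eps;

    public AdaDeltaSketch(double rho, double eps) {
        this.rho = rho; this.eps = eps;
    }

    /** Returns the step to subtract from the parameter for gradient g (no learning rate). */
    public double update(double g) {
        avgSqGrad = rho * avgSqGrad + (1 - rho) * g * g;
        double step = g * Math.sqrt(avgSqUpdate + eps) / Math.sqrt(avgSqGrad + eps);
        avgSqUpdate = rho * avgSqUpdate + (1 - rho) * step * step;
        return step;
    }
}
```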

SgdUpdater

SGD updater: applies the learning rate only.

GradientUpdater

The base contract for gradient updaters: calculates an update from a raw gradient and tracks the state needed to adapt updates over time.
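The shape of that contract can be sketched in plain Java, together with a no-op implementation in the spirit of NoOpUpdater (illustrative only; the real DL4J interface operates on INDArrays, and these names are not the DL4J API):

```java
// A minimal analogue of the updater contract: given a raw gradient,
// mutate it in place into the actual update.
interface GradientUpdaterSketch {
    void applyUpdater(double[] gradient, int iteration, int epoch);
}

// A no-op implementation: leaves the gradient untouched.
class NoOpSketch implements GradientUpdaterSketch {
    public void applyUpdater(double[] gradient, int iteration, int epoch) {
        // intentionally empty: the raw gradient is used as the update
    }
}
```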

AMSGradUpdater

The AMSGrad updater. Reference: On the Convergence of Adam and Beyond (https://openreview.net/forum?id=ryQu7f-RZ).
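AMSGrad fixes a convergence issue in Adam by never letting the second-moment denominator shrink. A plain-Java sketch of the rule as given in the paper, without bias correction (illustrative names; implementations, including DL4J's, may differ in details):

```java
// Sketch of the AMSGrad update rule for a single scalar parameter.
public class AmsGradSketch {
    private double m = 0.0, v = 0.0, vMax = 0.0;
    private final double lr, beta1, beta2, eps;

    public AmsGradSketch(double lr, double beta1, double beta2, double eps) {
        this.lr = lr; this.beta1 = beta1; this.beta2 = beta2; this.eps = eps;
    }

    /** Returns the step to subtract from the parameter for gradient g. */
    public double update(double g) {
        m = beta1 * m + (1 - beta1) * g;
        v = beta2 * v + (1 - beta2) * g * g;
        vMax = Math.max(vMax, v);      // never let the denominator shrink
        return lr * m / (Math.sqrt(vMax) + eps);
    }
}
```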

References

  • An overview of gradient descent optimization algorithms (Ruder): https://arxiv.org/pdf/1609.04747.pdf

  • RMSProp lecture slides (Hinton, CSC321, lecture 6): http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

  • CS231n notes on per-parameter adaptive learning rates: http://cs231n.github.io/neural-networks-3/#ada

  • AdaGrad: eliminating learning rates in stochastic gradient descent: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent

  • Adam: A Method for Stochastic Optimization: http://arxiv.org/abs/1412.6980

  • AdaDelta: An Adaptive Learning Rate Method: https://arxiv.org/pdf/1212.5701v1.pdf (also http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf)

  • On the Convergence of Adam and Beyond (AMSGrad): https://openreview.net/forum?id=ryQu7f-RZ