Package-level declarations
Types
Adam optimizer. Based on the research paper: https://arxiv.org/pdf/1412.6980
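The library's actual API is not shown on this page; as a minimal sketch of the update rule Adam implements (per the paper linked above), for a single scalar parameter with the paper's default hyperparameters — all names here are illustrative:

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update for a scalar parameter. t is the 1-based step count.
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

In practice the same update is applied elementwise to every parameter tensor, with `m` and `v` stored per parameter.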
An optimizer that caches intermediate values during the forward pass and uses them to compute gradients during the backward pass.
Allows a SinglePassOptimizer to delegate learning-rate selection, increasing the composability of different optimizers.
An optimizer that performs multiple passes over training data, updating the model parameters multiple times per epoch.
A Trainer tracks the model's activations and derivatives during the forward pass and provides them to the SinglePassOptimizer to update the model parameters.
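A minimal sketch of this caching pattern for a single linear unit: the forward pass stores what the backward pass will need, and the gradient is computed from the cache. The function names are illustrative, not part of the library:

```python
def forward(x, w):
    # Forward pass for a single linear unit y = w * x.
    # Cache the input, since dL/dw = upstream_grad * x needs it later.
    activation = w * x
    cache = x
    return activation, cache

def backward(upstream_grad, cache):
    # Gradient of the loss w.r.t. w, using the cached forward-pass input.
    return upstream_grad * cache
```

A Trainer-like loop would call `forward` for every layer, keep the caches, then walk them in reverse with `backward` before handing the gradients to the optimizer.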
Implemented per the lecture slides: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
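The linked slides (Hinton's CSC321 Lecture 6) are where the RMSProp update rule was introduced; a sketch of that rule for a scalar parameter, with illustrative names — the library's actual signature may differ:

```python
import math

def rmsprop_step(param, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    # RMSProp: divide the step by the root of a moving average of
    # squared gradients, so steep directions get smaller steps.
    cache = decay * cache + (1 - decay) * grad ** 2
    param = param - lr * grad / (math.sqrt(cache) + eps)
    return param, cache
```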
An optimizer that only performs a single pass over training data before updating model parameters.
Stochastic Gradient Descent (SGD) optimizer with adjustable learning rate.
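As a sketch of the plain SGD rule this type implements — a single scalar step, with an adjustable `lr`; the names are illustrative:

```python
def sgd_step(param, grad, lr=0.01):
    # Vanilla SGD: move against the gradient, scaled by the learning rate.
    return param - lr * grad
```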
This learning rate schedule begins at the initialLearningRate and decays exponentially until it reaches initialLearningRate / decayMax over decayPeriod epochs.
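The description above pins down the schedule's shape; a sketch of it, mirroring the documented `initialLearningRate`, `decayMax`, and `decayPeriod` parameters in Python naming (the function itself is illustrative):

```python
def exponential_decay(initial_learning_rate, decay_max, decay_period, epoch):
    # Decays exponentially from initial_learning_rate down to
    # initial_learning_rate / decay_max over decay_period epochs,
    # then holds constant.
    t = min(epoch, decay_period)
    return initial_learning_rate * decay_max ** (-t / decay_period)
```

For example, with `initial_learning_rate=0.1`, `decay_max=10`, and `decay_period=100`, the rate starts at 0.1 and settles at 0.01 from epoch 100 onward.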