Implemented per lecture slides: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Splits the given training cases into batches.
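
A minimal sketch of how such batching might look. The name `make_batches` and the parameters `batch_size`, `shuffle`, and `seed` are hypothetical and do not come from the original code. It assumes the cases fit in memory as a sequence and that the last batch may be smaller than the rest.

```python
import random


def make_batches(cases, batch_size, shuffle=False, seed=None):
    """Split `cases` into consecutive batches of at most `batch_size` items.

    Hypothetical helper: the original signature is not shown in the source.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    cases = list(cases)  # copy so the caller's sequence is left untouched
    if shuffle:
        # A seeded generator keeps the shuffle reproducible across runs.
        random.Random(seed).shuffle(cases)
    # Step through the list in strides of batch_size; the final slice
    # is simply shorter when len(cases) is not a multiple of batch_size.
    return [cases[i:i + batch_size] for i in range(0, len(cases), batch_size)]
```

For example, `make_batches(range(5), 2)` would yield three batches, the last holding a single case.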
Updates the parameters of the model based on the outputs computed during the forward pass.
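
The source does not state which update rule is used. As an illustration only, the sketch below assumes the rmsprop rule described in the referenced lecture slides: keep a running average of each squared gradient and divide the gradient by its square root. The name `rmsprop_update`, its default values, and the small `epsilon` term are assumptions, not part of the original code, and the gradients are taken as already computed.

```python
import math


def rmsprop_update(params, grads, mean_square,
                   learning_rate=0.001, decay=0.9, epsilon=1e-8):
    """Apply one rmsprop step in place to flat lists of floats.

    Hypothetical helper; `epsilon` is an assumed safeguard against
    division by zero and is not taken from the source.
    """
    for i, g in enumerate(grads):
        # Running average of the squared gradient for this parameter.
        mean_square[i] = decay * mean_square[i] + (1.0 - decay) * g * g
        # Scale the step by the root of that running average.
        params[i] -= learning_rate * g / (math.sqrt(mean_square[i]) + epsilon)
    return params, mean_square
```

A caller would keep `mean_square` between steps, initialised to zeros with the same length as `params`.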