Mean Squared Error.
This is the SSE divided by the number of samples m, so the loss does not grow with the dataset size.
loss: E(w) = 1/(2m) * sum{ (wTx - y)**2 }
gradient: dE(w) = 1/m * sum{ (wTx - y) * x }
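The two formulas above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `mse_loss_and_gradient` and the list-of-lists layout for the samples are hypothetical and not taken from the original code.

```python
def mse_loss_and_gradient(w, X, y):
    """Return (loss, gradient) for E(w) = 1/(2m) * sum((wTx - y)**2).

    Assumed layout (hypothetical): w is a list of n weights, X is a list
    of m samples with n features each, y is a list of m targets.
    """
    m = len(X)
    loss = 0.0
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        # residual = wTx - y for one sample
        residual = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
        loss += residual ** 2
        for j, x_ij in enumerate(x_i):
            grad[j] += residual * x_ij
    # normalize by the dataset size m (and by 2 for the loss)
    return loss / (2 * m), [g / m for g in grad]


X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]
loss, grad = mse_loss_and_gradient([0.0, 0.0], X, y)
print(loss)  # 1.25
print(grad)  # [-1.5, -2.5]
```

With w = [0, 0] the residuals are -1 and -2, which gives the loss (1 + 4) / 4 and the gradient shown in the comments.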
Root-Mean-Square Error (RMSE).
The root ensures that RMSE is measured on the same scale (and in the same units) as the target variable. This error function is usually used for evaluation, so we don't provide the gradient. For training, the regular MSE works just as well: the square root is monotonic, so both losses have the same minimizer.
loss: E(w) = sqrt{ MSE }
gradient: dE(w) = not provided (evaluation only)
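As an evaluation metric, RMSE might be computed along the lines of the sketch below. Two assumptions are made here and are not confirmed by the original: the function name `rmse` is hypothetical, and the mean uses the conventional 1/m factor rather than the 1/(2m) factor of the training loss, so that the result is in the units of the target.

```python
import math


def rmse(w, X, y):
    """Root-mean-square error of the linear model wTx against targets y.

    Assumption: plain mean over m samples (factor 1/m), not 1/(2m).
    """
    m = len(X)
    squared_sum = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        squared_sum += (prediction - y_i) ** 2
    return math.sqrt(squared_sum / m)


# Predictions are 1 and 3, targets are 3 and 5: every residual is -2.
print(rmse([1.0], [[1.0], [3.0]], [3.0, 5.0]))  # 2.0
```

Because every prediction is off by exactly 2 target units, the RMSE is also 2, which illustrates the "same units as the target" property.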
Sum of Squares Error
loss: E(w) = 1/2 * sum{ (wTx - y)**2 }
gradient: dE(w) = sum{ (wTx - y) * x }
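A minimal sketch of the SSE loss and its gradient, again in plain Python. The function name `sse_loss_and_gradient` and the data layout are assumptions made for illustration only.

```python
def sse_loss_and_gradient(w, X, y):
    """Return (loss, gradient) for E(w) = 1/2 * sum((wTx - y)**2).

    Assumed layout (hypothetical): w is a list of n weights, X is a list
    of m samples with n features each, y is a list of m targets.
    """
    loss = 0.0
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        # residual = wTx - y for one sample
        residual = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
        loss += 0.5 * residual ** 2
        for j, x_ij in enumerate(x_i):
            # the factor 1/2 cancels the 2 from differentiating the square
            grad[j] += residual * x_ij
    return loss, grad


X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]
loss, grad = sse_loss_and_gradient([0.0, 0.0], X, y)
print(loss)  # 2.5
print(grad)  # [-3.0, -5.0]
```

Unlike a loss normalized by the dataset size, both values here scale with the number of samples, which usually means the learning rate has to be adapted when the dataset grows.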