All credit goes to Kevin P. Murphy (Machine Learning: A Probabilistic Perspective)
But linear regression is not robust to outliers.
=> Use a distribution that assigns higher likelihood to outliers, so the straight line does not have to be perturbed to "explain" them: the Laplace distribution. The robustness comes from penalizing the absolute values of the residuals instead of their squares.
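A minimal sketch of this idea (my own, not the book's code): maximizing a Laplace likelihood is equivalent to minimizing the sum of absolute residuals, so we can compare an ordinary least-squares fit with a least-absolute-deviations fit on data containing a single outlier.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(20)
y[-1] += 5.0  # inject a single outlier

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

# Least squares (Gaussian likelihood): closed form, pulled toward the outlier.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Least absolute deviations (Laplace likelihood): no closed form, so minimize
# the L1 loss numerically; Nelder-Mead copes with the non-smooth objective.
w_lad = minimize(lambda w: np.abs(y - X @ w).sum(), w_ls,
                 method="Nelder-Mead").x

print("least squares:", w_ls)   # noticeably perturbed by the outlier
print("laplace / LAD:", w_lad)  # stays close to the true line (1, 2)
```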
A problem with maximum likelihood (ML) estimation is that it can overfit: it picks the parameter values that best model the training data, but if the data is noisy, those parameters often correspond to overly complex functions.
Adding a Gaussian prior on the parameters of a model to encourage them to be small is called l2 regularization or weight decay. By penalizing the sum of the squared magnitudes of the weights, we keep the function simple: increasing lambda results in smoother functions, and the resulting coefficients also become smaller.
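A minimal sketch (my own, not from the book) of the closed-form ridge estimate w = (X^T X + lambda*I)^{-1} X^T y, showing how the coefficients shrink as lambda grows. For simplicity this sketch penalizes all weights, including the intercept, which is often left unpenalized in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 21)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(21)

degree = 14
X = np.vander(x, degree + 1, increasing=True)  # polynomial features

for lam in [1e-8, 1e-4, 1e-1, 10.0]:
    # Ridge estimate: solve the regularized normal equations.
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda={lam:g}  max|w|={np.abs(w).max():.2f}")
```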
Lambda controls how strongly the model is constrained.
As we increase lambda:
- the error on the training set increases
- the test error follows the characteristic U-shaped curve: the model overfits when lambda is too small and underfits when it is too large
We can use cross validation to pick lambda.
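A minimal sketch of that procedure (the helper names ridge_fit and cv_mse are mine, not from the book): evaluate a grid of lambda values by K-fold cross-validation and keep the one with the lowest average held-out error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(30)
X = np.vander(x, 15, increasing=True)  # degree-14 polynomial features

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam*I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, lam, k=5):
    """Average held-out mean squared error over k folds."""
    idx = rng.permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ w) ** 2))
    return float(np.mean(errs))

lambdas = np.logspace(-8, 2, 11)
scores = [cv_mse(X, y, lam) for lam in lambdas]
best = lambdas[int(np.argmin(scores))]
print(f"best lambda by 5-fold CV: {best:g}")
```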
Regularization is the most common way to avoid overfitting. Another way is to use lots of data. The more training data we have, the better we can learn.