# Regularization & Overfitting

Week 3 | Lesson 2.3

### LEARNING OBJECTIVES

*After this lesson, you will be able to:*

- Explain the concepts of overfitting and underfitting
- Describe how regularization penalizes model complexity
- Apply regularization with scikit-learn
- Use cross-validation to tune regularization parameters

### STUDENT PRE-WORK

*Before this lesson, you should already be able to:*

- Fit a linear model with scikit-learn
- Understand bias and variance

### INSTRUCTOR PREP

*Before this lesson, instructors will need to:*

- Read in / Review any dataset(s) & starter/solution code
- Generate a brief slide deck
- Prepare any specific materials
- Provide students with additional resources

### STARTER CODE

### LESSON GUIDE

| TIMING | TYPE | TOPIC |
|---|---|---|
| 5 min | Opening | Opening |
| 10 min | Introduction | Regularization and Overfitting |
| 20 min | Demo | Regularization Demo |
| 20 min | Guided Practice | Ridge Cross-Validation |
| 25 min | Independent Practice | Boston Housing Data |
| 5 min | Conclusion | Conclusion |

## Opening (5 mins)

- Review squared error and/or bias-variance tradeoff
- Remind students about overfitting risks
- Discuss the real-world relevance of these topics -- we often want a model that generalizes well to new data rather than the best possible fit to a single dataset.

**Check:** Ask students to recall and explain bias and variance.

## Introduction: Regularization and Overfitting (10 mins)

In the lesson on bias and variance we saw examples of both *underfitting*, in
conjunction with bias, and *overfitting* as a companion to variance. Overfitting
is a big issue for anyone modeling data, especially with smaller datasets or
models with many parameters. When the number of parameters is large relative
to the amount of data, the parameters can be over-tuned to the training data, and
when we apply the model to new data we find that the fit is not as good.
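To make this concrete, here is a minimal sketch (synthetic data and hypothetical parameter choices, not tied to this lesson's notebook) showing how a high-degree polynomial achieves a better and better training fit while its test performance collapses:

```python
# Sketch: overfitting with polynomial regression on noisy sine data.
# As the polynomial degree grows, training R^2 rises but test R^2 falls.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

train_scores, test_scores = {}, {}
for degree in (1, 4, 15):  # illustrative degrees
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_scores[degree] = model.score(X_train, y_train)
    test_scores[degree] = model.score(X_test, y_test)
    print(degree, train_scores[degree], test_scores[degree])
```

The degree-15 model memorizes the 22 training points (training R² near 1) yet scores far worse on the held-out points than the simpler fits.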

There are many techniques to avoid overfitting. Obtaining more data is one way,
but sometimes it is very difficult or expensive to obtain more data, or there
is a time lag in the collection of data. *Regularization* involves imposing
a penalty on complex models and is another technique to avoid overfitting.

One way to understand regularization intuitively is in terms of *Occam's razor*,
the scientific heuristic that, all other things being equal, a simpler
explanation is better. In modeling terms, we want the least complex model that
captures the information in our data. If two models explain our data equally
well, the one with fewer parameters is better and less likely to overfit.

If you need a further review of underfitting and overfitting, there is a good example on the scikit-learn website.

**Check:** Ask students to define underfitting and overfitting intuitively. We've already discussed
a principle for avoiding both -- what was it?

Answer: Occam's razor -- find the best-fitting model with the least complexity.

## Demo: Regularization (20 mins)

Use the included Jupyter notebook for the demonstration. Walk through a demonstration of the code or some related concept together as a class.

**Check:** What is a ridge regression? How does it implement regularization?
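A minimal sketch of the answer (synthetic data; this is not the demo notebook itself): ridge regression adds an L2 penalty `alpha * ||coef||^2` to the least-squares loss, so the fitted coefficients are shrunk toward zero relative to ordinary least squares:

```python
# Sketch: ridge regression implements regularization by shrinking coefficients.
# Compare the coefficient norms of OLS and Ridge on the same synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=50, n_features=20, noise=10.0, random_state=1)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls the penalty strength

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(ols_norm, ridge_norm)  # ridge coefficients have the smaller norm
```

Larger values of `alpha` shrink the coefficients more aggressively; `alpha=0` recovers ordinary least squares.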

## Guided Practice: Ridge Cross-Validation (20 mins)

Work through the guided practice section in the Jupyter notebook.
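A minimal sketch of the kind of workflow this practice covers, using scikit-learn's `RidgeCV` on synthetic data (the notebook's own dataset and alpha grid may differ):

```python
# Sketch: tune the ridge penalty alpha with cross-validation via RidgeCV.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

alphas = np.logspace(-3, 3, 13)          # candidate penalty strengths
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)  # 5-fold cross-validation

best_alpha = model.alpha_  # the alpha chosen by cross-validation
print(best_alpha, model.score(X, y))
```

`RidgeCV` refits on the full data with the winning `alpha`, so the fitted model is ready to use directly.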

Check student work against the provided solutions.

## Independent Practice: Boston Housing Data (25 mins)

Work through the independent practice section in the Jupyter notebook.

Check student work against the provided solutions.

## Conclusion (5 mins)

Takeaway messages for this lesson:

- Regularization helps avoid overfitting by limiting model complexity
- Mathematically this works by penalizing models with greater complexity
- Regularized models often generalize to new data better than models that have been overfit to the training data

### ADDITIONAL RESOURCES

- Video on Regularization
- Some more examples of regularization with scikit-learn