Regression Metrics & Loss Functions
Week 3 | Lesson 3.1
LEARNING OBJECTIVES
After this lesson, you will be able to:
- Explain and use RMSE (Root Mean Squared Error)
- Explain and use MAE (Mean Absolute Error)
- Compute these regression metrics with scikit-learn
- Fit a Least Absolute Deviations line to data with statsmodels
STUDENT PRE-WORK
Before this lesson, you should already be able to:
- Fit a linear regression with scikit-learn
- Compute the sum of squared errors
- Understand outliers
INSTRUCTOR PREP
Before this lesson, instructors will need to:
- Read in / Review any dataset(s) & starter/solution code
- Generate a brief slide deck
- Prepare any specific materials
- Provide students with additional resources
STARTER CODE
LESSON GUIDE
TIMING | TYPE | TOPIC |
---|---|---|
5 min | Opening | Opening |
10 min | Introduction | Loss functions |
15 min | Demo | RMSE and MAE |
25 min | Guided Practice | Real Data is Noisy |
25 min | Independent Practice | Explore Scenarios |
5 min | Conclusion | Conclusion |
Opening (5 mins)
- Review prior labs/homework, upcoming projects, or exit tickets, when applicable
- Review lesson objectives
- Discuss real world relevance of these topics
- Relate topics to the Data Science Workflow - i.e. are these concepts typically used to acquire, parse, clean, mine, refine, model, present, or deploy?
Check: Ask students to define, explain, or recall outliers, the sum of squared errors, and loss functions.
Use the included Jupyter notebook for the entire lesson, including the guided and independent practice.
Introduction: Loss functions (10 mins)
Check: What's the difference between MAE and RMSE?
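As a reference point for this check, the two metrics can be written side by side. The notation here is an assumption rather than taken from the notebook: $n$ observations with true values $y_i$ and predictions $\hat{y}_i$.

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
$$

MAE weights every error in proportion to its size, while RMSE squares each error before averaging, so large errors contribute disproportionately. The two are equal only when all absolute errors are the same. Otherwise RMSE is larger.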
Demo: RMSE and MAE (15 mins)
You can do this demo or have the students walk through it in groups.
Compute the RMSE and MAE of the sample data set by hand and compare the sizes of the individual error terms. Then add in the outlier and repeat.
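The by-hand computation can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the values in `y_true` and `y_pred` and the helper names `mae` and `rmse` are hypothetical and chosen so the arithmetic is easy to follow.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average of (y - y_hat)^2."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical sample data (an assumption, not the notebook's data set).
# Every prediction misses by exactly 0.5.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 1.5, 3.5, 3.5, 5.5]

mae_clean = mae(y_true, y_pred)
rmse_clean = rmse(y_true, y_pred)
print(f"clean data:   MAE={mae_clean:.3f}  RMSE={rmse_clean:.3f}")

# Add a single outlier: one point whose prediction misses by 10.
y_true_out = y_true + [16.0]
y_pred_out = y_pred + [6.0]

mae_out = mae(y_true_out, y_pred_out)
rmse_out = rmse(y_true_out, y_pred_out)
print(f"with outlier: MAE={mae_out:.3f}  RMSE={rmse_out:.3f}")
```

With identical errors, the two metrics agree. After the outlier is added, the single squared term of 100 dominates the sum of squares, while its absolute term of 10 is only one part of the sum of absolute errors.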
Check: Which regression metric is more affected by the outlier?
Guided Practice: Real Data is Noisy (25 mins)
Real-world data sets usually contain a lot of noise from many sources. For example, measurement instruments have limited precision, and data entry introduces mistakes. Can you think of any more?
In the guided and independent practice, we'll see how MAE and RMSE perform on noisy datasets.
Independent Practice: Explore Scenarios (25 mins)
Run through the starter code here
Grab the solution code here.
Check: Were students able to create the desired deliverable(s)? Did it meet requirements / constraints?
Conclusion (5 mins)
- Review any independent practice deliverable(s)
- Recap topic(s) covered