Regression Metrics & Loss Functions

Week 3 | Lesson 3.1

LEARNING OBJECTIVES

After this lesson, you will be able to:

• Explain and use RMSE (Root Mean Squared Error)
• Explain and use MAE (Mean Absolute Error)
• Compute these regression metrics with scikit-learn
• Fit a Least Absolute Deviations line to data with statsmodels

STUDENT PRE-WORK

Before this lesson, you should already be able to:

• Fit a linear regression with scikit-learn
• Compute the sum of squared errors
• Understand outliers

INSTRUCTOR PREP

Before this lesson, instructors will need to:

• Read in / Review any dataset(s) & starter/solution code
• Generate a brief slide deck
• Prepare any specific materials
• Provide students with additional resources

LESSON GUIDE

TIMING   TYPE                   TOPIC
5 min    Opening                Opening
10 min   Introduction           Loss functions
15 min   Demo                   RMSE and MAE
25 min   Guided Practice        Real Data is Noisy
25 min   Independent Practice   Explore Scenarios
5 min    Conclusion             Conclusion

Opening (5 mins)

• Review prior labs/homework, upcoming projects, or exit tickets, when applicable
• Review lesson objectives
• Discuss real world relevance of these topics
• Relate topics to the Data Science Workflow - i.e. are these concepts typically used to acquire, parse, clean, mine, refine, model, present, or deploy?

Check: Ask students to define, explain, or recall outliers, sum of squared errors, loss functions.

Use the included Jupyter notebook for the entire lesson, including the guided and independent practice.

Introduction: Loss functions (10 mins)

Check: What's the difference between MAE and RMSE?
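For reference when discussing this check, the two metrics can be written out as below. The notation is an assumption, not taken from the notebook: $y_i$ is the observed value, $\hat{y}_i$ the model's prediction, and $n$ the number of observations.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\qquad
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }
```

Both are in the same units as the target. Because RMSE squares each error before averaging, large errors carry more weight in it than in MAE, and RMSE is never smaller than MAE on the same predictions.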

Demo: RMSE and MAE (15 mins)

You can do this demo or have the students walk through it in groups.

Compute the RMSE and MAE of the sample data set by hand. Compare the size of the terms. Add in the outlier and repeat.
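A minimal sketch of the by-hand computation, using NumPy only. The numbers are hypothetical stand-ins for the notebook's sample data set, chosen so the arithmetic is easy to follow; the helper names `mae` and `rmse` are illustrative.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |error|."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical sample data (not the notebook's): every error has size 1.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 6.0, 6.0, 10.0])

mae_clean = mae(y_true, y_pred)    # (1 + 1 + 1 + 1) / 4 = 1.0
rmse_clean = rmse(y_true, y_pred)  # sqrt((1 + 1 + 1 + 1) / 4) = 1.0

# Add one outlier: a single observation with an error of 20.
y_true_out = np.append(y_true, 30.0)
y_pred_out = np.append(y_pred, 10.0)

mae_outlier = mae(y_true_out, y_pred_out)    # (4 + 20) / 5 = 4.8
rmse_outlier = rmse(y_true_out, y_pred_out)  # sqrt((4 + 400) / 5), about 8.99

print(f"clean:   MAE={mae_clean:.2f}  RMSE={rmse_clean:.2f}")
print(f"outlier: MAE={mae_outlier:.2f}  RMSE={rmse_outlier:.2f}")
```

In this example the outlier contributes 20 of the 24 total absolute error but 400 of the 404 total squared error, which is why RMSE moves further than MAE.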

Check: Which regression metric is more affected by the outlier?

Guided Practice: Real Data is Noisy (25 mins)

Real-world datasets usually contain a lot of noise from many sources. For example, there are errors from imprecise measurement instruments, errors from data entry mistakes, and many others. Can you think of any more?

In the guided and independent practice, we'll see how MAE and RMSE perform on noisy datasets.

Independent Practice: Explore Scenarios (20 mins)

Run through the starter code here.

Grab the solution code here.

Check: Were students able to create the desired deliverable(s)? Did it meet requirements / constraints?

Conclusion (5 mins)

• Review any independent practice deliverable(s)
• Recap topic(s) covered