Spark Case Studies
Week 10 | Lesson 3.4
LEARNING OBJECTIVES
After this lesson, you will be able to:
- Describe a real-world application of Spark Streaming
- Describe a real-world application of Spark MLlib
STUDENT PRE-WORK
Before this lesson, you should already be able to:
- Build simple pipelines in Spark
- Build machine learning models in Spark
- Explain what a transformation is in Spark
- Explain the difference between Spark local and Spark cluster modes
INSTRUCTOR PREP
Before this lesson, instructors will need to:
- Read in / Review any dataset(s) & starter/solution code
- Generate a brief slide deck
- Prepare any specific materials
- Provide students with additional resources
STARTER CODE
LESSON GUIDE
TIMING | TYPE | TOPIC |
---|---|---|
5 min | Opening | Opening |
20 min | Guided-practice | Phase 1: Research |
20 min | Guided-practice | Phase 2: Discussion |
20 min | Guided-practice | Phase 3: Prepare to present |
20 min | Guided-practice | Phase 4: Presentation |
Opening (5 min)
This is going to be a highly interactive class. We'll split into 3 groups, and each group will research one case study.
Phase 1: Research (20 min)
Here are 3 articles detailing case studies of Spark applications:
There is one article for each group. In this first phase, each of you will work independently.
Use the first 10 minutes to read the article assigned to your group, and the next 10 minutes to compile a list of questions and points you have found interesting.
In particular, try to understand the following:
- what is the problem the business is trying to solve?
- what is the proposed solution?
- what are the advantages obtained by using Spark?
- what may be some of the drawbacks/limitations?
Highlight anything you don't understand.
Instructor note: there are a few key points in this activity.
On the technical side, students should understand that:
- Spark can replace Hadoop MapReduce for many batch workloads, and is often faster because it can keep intermediate data in memory
- Spark supports both ETL and ML within a single framework
On the business side, it's important that they start to:
- Critically dissect a data solution to understand how it's built
- Learn to reason about pros/cons of a data solution
Phase 2: Discussion (20 min)
Now that you have formed a personal opinion and highlighted some of the issues, it's time to use the group's collective wisdom to fill in the gaps.
In your group, discuss the points you've highlighted in the first part. Proceed as follows:
- Taking turns, each person selects the top issue they would like to discuss and proposes it to the group.
- The group then has a maximum of 5 minutes to discuss the issue.
- The next person then proposes the next issue.
Phase 3: Prepare to present (20 min)
Now that you have a thorough understanding of the case study, prepare a few slides to explain it to the rest of the class.
Use the same questions from Phase 1 as a guide:
- what is the problem the business is trying to solve?
- what is the proposed solution?
- what are the advantages obtained by using Spark?
- what may be some of the drawbacks/limitations?
Aim to present for no more than 5 minutes. Elect a spokesperson for the group.
Instructor note: allocate 20 minutes in total for the presentations, to leave time for questions.
Phase 4: Presentation (20 min)
It's time to present your findings to the rest of the class. Each group gets a 5-minute slot to explain its case study to the class.