Spark Case Studies

Week 10 | Lesson 3.4

LEARNING OBJECTIVES

After this lesson, you will be able to:

  • describe a real-world application of Spark Streaming
  • describe a real-world application of Spark MLlib

STUDENT PRE-WORK

Before this lesson, you should already be able to:

  • Build simple pipelines in Spark
  • Build machine learning models in Spark
  • Explain what a transformation is in Spark
  • Explain the difference between Spark local and Spark cluster modes

INSTRUCTOR PREP

Before this lesson, instructors will need to:

  • Read in / Review any dataset(s) & starter/solution code
  • Generate a brief slide deck
  • Prepare any specific materials
  • Provide students with additional resources

STARTER CODE

Code Along

LESSON GUIDE

TIMING   TYPE              TOPIC
5 min    Opening           Opening
20 min   Guided-practice   Phase 1: Research
20 min   Guided-practice   Phase 2: Discussion
20 min   Guided-practice   Phase 3: Prepare to present
20 min   Guided-practice   Phase 4: Presentation

Opening (5 min)

This is going to be a highly interactive class. We'll work in 3 groups and we will do research on case studies.

Phase 1: Research (20 min)

Here are 3 articles detailing case studies of Spark applications:

There is one article for each group. In this first phase each of you will work independently.

Use the first 10 minutes to read the article assigned to your group, and the next 10 minutes to compile a list of questions and points you have found interesting.

In particular, try to understand the following:

  1. what is the problem the business is trying to solve?
  2. what is the proposed solution?
  3. what are the advantages obtained by using Spark?
  4. what may be some of the drawbacks/limitations?

Highlight anything you don't understand.

Instructor note: there are a few key points in this activity.

On the technical side, students should understand that:

  • Spark is a great replacement for MapReduce
  • Spark allows both ETL and ML under the same umbrella

On the business side, it's important that they start to:

  • Critically dissect a data solution to understand how it's built
  • Learn to reason about pros/cons of a data solution

Phase 2: Discussion (20 min)

Now that you have formed a personal opinion and highlighted some of the issues, it's time to use the collective wisdom to fill the gaps.

In your group, discuss the points you've highlighted in the first part. Proceed as follows:

  • In turns, each person will select the top issue they would like to discuss and propose it to the group.
  • The group then has a maximum of 5 minutes to discuss the issue.
  • The next person then proposes the next issue.

Phase 3: Prepare to present (20 min)

Now that you have a thorough understanding of the case study, prepare a few slides to explain it to the rest of the class.

Use the same questions from Phase 1 as a guide:

  1. what is the problem the business is trying to solve?
  2. what is the proposed solution?
  3. what are the advantages obtained by using Spark?
  4. what may be some of the drawbacks/limitations?

Aim to present for no more than 5 minutes. Elect a spokesperson for the group.

Instructor note: allocate 20 minutes total for the presentations, in case there are questions.

Phase 4: Presentation (20 min)

It's time to present your findings to the rest of the class. Each group gets a 5-minute slot to explain their case to the class.

ADDITIONAL RESOURCES
