Big Data Review: Case Study

Week 10 | Lesson 5.1

LEARNING OBJECTIVES

After this lesson, you will be able to:

explain how different big data tools are used to aggregate large quantities of weather data
explain the advantages of big data tools to a non-technical audience

STUDENT PRE-WORK

Before this lesson, you should already be able to:

perform queries in Hadoop / Hive
perform queries in SQL
build pipelines in Spark

INSTRUCTOR PREP

Before this lesson, instructors will need to:

Read in / Review any dataset(s) & starter/solution code
Generate a brief slide deck
Prepare any specific materials
Provide students with additional resources

STARTER CODE

Code Along

LESSON GUIDE

TIMING	TYPE	TOPIC
5 min	Opening	Opening
20 min	Guided-practice	Phase 1: Research
20 min	Guided	Phase 2: Discussion
20 min	Demo	Phase 3: Mashup
20 min	Demo	Phase 4: Presentation

Opening (5 min)

In this class we will work through a case study on the analysis of Real-Time and Archived NEXRAD Weather Data on AWS.

Our starting point is this article

We will work in 3 groups once more, and each group will focus on a particular technical aspect of the project.

Group 1 will focus on the database backend
Group 2 will focus on real-time analysis
Group 3 will focus on visualization

Phase 1: Research (20 min)

In this phase, each group should read the article and then look for further information on its specific focus.

Group 1 should look into the data storage tools proposed in the article and build a thorough understanding of the data format, size, and types.
Group 2 should look into the tools for real-time analysis mentioned in the article and clearly understand what each one does. You should also investigate what the requirements are.
Group 3 should look into the requirements for visualization. What is necessary? How big is the data to be visualized? What is the update frequency? If we assume a non-technical audience, how should you present the key findings?

Each group can choose how to carry out this phase: members can work individually or in smaller subgroups.
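
When answering the size and frequency questions above, a quick back-of-envelope estimate can help frame the discussion. The sketch below is only an illustration: the class name `FeedEstimate`, its fields, and every number passed to it are hypothetical placeholders, not figures from the article. Replace them with the values you find during your research.

```python
from dataclasses import dataclass


@dataclass
class FeedEstimate:
    """Rough volume estimate for a periodic sensor feed (all inputs are assumptions)."""

    stations: int             # number of reporting sites (hypothetical input)
    scan_interval_min: float  # minutes between scans at one site (hypothetical input)
    mb_per_scan: float        # average size of one scan file in MB (hypothetical input)

    def scans_per_day(self) -> float:
        # Each site produces one scan every scan_interval_min minutes.
        return self.stations * (24 * 60 / self.scan_interval_min)

    def gb_per_day(self) -> float:
        # Total daily volume across all sites, converted from MB to GB.
        return self.scans_per_day() * self.mb_per_scan / 1024

    def tb_per_year(self) -> float:
        # Yearly volume, converted from GB to TB.
        return self.gb_per_day() * 365 / 1024


# Placeholder numbers chosen only to exercise the arithmetic.
example = FeedEstimate(stations=100, scan_interval_min=5, mb_per_scan=10)
print(f"scans/day: {example.scans_per_day():.0f}")
print(f"GB/day:    {example.gb_per_day():.2f}")
print(f"TB/year:   {example.tb_per_year():.2f}")
```

An estimate like this is useful mainly for the non-technical audience question: "roughly N terabytes per year" is easier to communicate than file counts, and it makes clear whether a single database server is a realistic option.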

Phase 2: Discussion (20 min)

Within your group you should discuss the results of the research phase and draft a document that details:

issues involved
solution proposed
intended audience
risks
limitations
benefits (in relation to audience)

Phase 3: Mashup (20 min)

Now each of the three groups should split in half, forming two subgroups: A and B.

All the A subgroups unite to form a new group
All the B subgroups unite to form a new group

The two newly formed mega-groups will contain experts from each of the 3 original groups. In this phase you have to share your findings and discuss how to implement the system.

In particular, you should come up with a roadmap for implementing the system, including:

data you will access
database constraints, pros/cons
key visualizations

Is the system proposed in the article the best possible solution? Can you suggest improvements?

Note that the system proposed in the article is implemented here (using Java). Can you propose an alternative technology to achieve the same result?

Phase 4: Presentation (20 min)

Each of the 2 groups A and B gets 5 minutes to present their solution and their findings to the class.

The group playing the audience role should look for similarities and differences with their own implementation and ask clarifying questions when a difference is found.
