Intro to Pandas 1

Week 2 | Lesson 1.1

LEARNING OBJECTIVES

After this lesson, you will be able to:

Read a csv file using pandas
View data: head, tail, columns, values, describe
Select data: a single column, slicing by row, by position

STUDENT PRE-WORK

Before this lesson, you should already have pandas installed:

Since we're using Anaconda, pandas should already be there, but make sure you have all of its dependencies installed as well:
- setuptools
- NumPy: 1.7.1 or higher
- python-dateutil: 1.5 or higher
- pytz: needed for time zone support
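One quick way to check is to try importing each package and print its version. This is a minimal sketch that is not part of the original lesson. It only confirms that each package imports; it does not compare against the minimum versions listed above. Note that python-dateutil is imported under the name `dateutil`. pandas also has a built-in `pd.show_versions()` that prints a fuller report.

```python
import importlib

# Import names for the dependencies listed above
# (python-dateutil is imported as "dateutil").
DEPENDENCIES = ["setuptools", "numpy", "dateutil", "pytz", "pandas"]


def check_dependencies(names=DEPENDENCIES):
    """Map each module name to its version string, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed
            continue
        # Most packages expose __version__; fall back to a placeholder if not.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in check_dependencies().items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```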

STARTER CODE

Demo

INSTRUCTOR PREP

Before this lesson, instructors will need to:

Read in / Review any dataset(s) & starter/solution code
Generate a brief slide deck

LESSON GUIDE

TIMING	TYPE	TOPIC
5 min	Introduction	Pandas
10 min	Demo / Guided Practice	Read csv
25 min	Demo / Guided Practice	Viewing data: head/tail, describe
25 min	Demo / Guided Practice	Selection: a single column, slicing by row, by position
20 min	Independent Practice
5 min	Conclusion

Introduction: Pandas (5 mins)

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
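The two core data structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional table whose columns are Series). Here is a small sketch; the product names and numbers are made up for illustration and are not from the lesson's data.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
prices = pd.Series([30000, 10000, 5000], name="Price")

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame(
    {
        "Product": ["CPU", "Software", "Maintenance"],
        "Quantity": [1, 2, 3],
        "Price": [30000, 10000, 5000],
    }
)

print(prices)
print(df)
print(type(df["Price"]))  # a single column comes back as a Series
```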


Demo / Guided Practice: Read csv (10 mins)


In an IPython (Jupyter) notebook, type:

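import pandas as pd              # DataFrames and csv reading
import numpy as np               # numerical arrays (pandas is built on NumPy)
import matplotlib.pyplot as plt  # plotting (not used in this lesson)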

Let's read in a csv file and create a pandas dataframe.

df = pd.read_csv('sales.csv')
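If you don't have `sales.csv` handy, `read_csv` also accepts any file-like object, so you can try it on text held in memory. This is a sketch only: the rows below are hypothetical, and the column names and order are an assumption based on the columns this lesson mentions later (Account, Quantity, Price, Manager, Product), not the real file's layout.

```python
import io

import pandas as pd

# Hypothetical rows standing in for sales.csv; the real file may differ.
csv_text = """Account,Quantity,Price,Manager,Product
100001,1,30000,Alice,CPU
100002,2,10000,Bob,Software
100003,3,5000,Alice,Maintenance
100004,1,35000,Carol,CPU
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred from the text
```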

Check: This looks familiar... didn't we already learn how to read in csv files? Yes, but that was with plain Python and no libraries or packages. In Week 1, Lesson 3.2, it took about 5 lines of Python; with pandas it takes only one. Nice!
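To see the difference side by side, here is a sketch using the standard-library `csv` module. It is not a reproduction of the Week 1 code, which isn't shown here, and the data is made up. Notice that plain Python hands back every value as a string, while pandas also infers column types.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data (not the lesson's sales.csv).
csv_text = "Product,Quantity,Price\nCPU,1,30000\nSoftware,2,10000\n"

# Plain Python: the csv module yields each row as a list of strings.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

# pandas: one call builds a table with a header and typed columns.
df = pd.read_csv(io.StringIO(csv_text))

print(header, records)  # numbers are still strings here
print(df.dtypes)        # Quantity and Price are parsed as integers
```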

Demo / Guided Practice: Viewing data: head/tail, describe (25 mins)

Let's take a summary look at our data. First, let's look at the head and tail.

df.head()

df.tail()
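Both methods show 5 rows by default and take an optional number of rows. The other quick views named in the objectives, `columns` and `values`, are attributes rather than methods, so they take no parentheses. A sketch on a hypothetical 10-row table (in recent pandas versions, `df.to_numpy()` is the recommended alternative to `df.values`):

```python
import pandas as pd

# Hypothetical 10-row table standing in for the sales data.
df = pd.DataFrame({"Quantity": range(1, 11), "Price": range(100, 1100, 100)})

print(df.head())   # first 5 rows by default
print(df.tail(3))  # last 3 rows; the row count is optional
print(df.columns)  # the column labels
print(df.values)   # the underlying data as a NumPy array
```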

Check: What can looking at the head and tail of a dataset tell us?

Let's take a look at summary statistics.

df.describe()

For each numeric column, this gives us: count, mean, std, min, 25%, 50%, 75%, and max. Awesome!
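A text column is skipped by default, and the result is itself a DataFrame, so you can look up a single statistic by its row and column labels. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns.
df = pd.DataFrame(
    {
        "Product": ["CPU", "Software", "CPU", "Maintenance"],
        "Quantity": [1, 2, 3, 2],
        "Price": [30000, 10000, 35000, 5000],
    }
)

# By default, describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)

# Look up one statistic by row label and column label.
print(summary.loc["mean", "Quantity"])  # 2.0
```

Passing `include="all"` to `describe()` adds the non-numeric columns, with count, unique, top, and freq instead of the numeric statistics.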

Check: What was the cautionary tale about relying too heavily on summary stats again?

Demo / Guided Practice: Selection: a single column, slicing by row, by position (25 mins)

Let's select a single column.

df['Account']
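Single brackets with one column name return a Series. Passing a list of names (double brackets) returns a DataFrame instead, even for one column. A small sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"Account": [100001, 100002], "Price": [30000, 10000]})

account = df["Account"]       # one name -> Series
account_df = df[["Account"]]  # a list of names -> DataFrame

print(type(account))
print(type(account_df))
```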

Check: How would you select the 'Quantity' and 'Price' columns separately?

Now, let's slice and select for certain rows.

df[0:3]
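Slicing with square brackets works like slicing a Python list: the start row is included and the stop row is excluded, so `df[0:3]` returns the rows at positions 0, 1, and 2. A quick check on a made-up column:

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [1, 2, 3, 4, 5]})

# Start position 0 is included; stop position 3 is excluded.
first_three = df[0:3]
print(first_three)
```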

Check: How would you slice for rows 9 to 14?

Now, let's try selecting by position. First, let's slice some rows.

df.iloc[1:3, :]
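`iloc` selects purely by integer position, ignoring whatever labels the index holds. In `[1:3, :]`, the part before the comma picks rows and the `:` after it keeps every column. A sketch with hypothetical letter labels, to show that positions rather than labels are used:

```python
import pandas as pd

# Hypothetical index labels that differ from the row positions.
df = pd.DataFrame({"Quantity": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Row positions 1 and 2 (stop excluded), all columns.
rows = df.iloc[1:3, :]
print(rows)  # the rows labeled "b" and "c"
```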

Check: How would you use iloc to slice for rows 9 to 14?

Now, let's slice some columns.

df.iloc[:, 1:3]
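Here the `:` before the comma keeps every row, and `1:3` picks the columns at positions 1 and 2. The column order below is an assumption made for illustration, not the real layout of `sales.csv`.

```python
import pandas as pd

# Hypothetical column order.
df = pd.DataFrame(
    {
        "Account": [100001, 100002],
        "Quantity": [1, 2],
        "Price": [30000, 10000],
        "Manager": ["Alice", "Bob"],
    }
)

# All rows; column positions 1 and 2 (position 3 is excluded).
middle = df.iloc[:, 1:3]
print(middle.columns.tolist())  # ['Quantity', 'Price']
```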

Check: How would you slice for the 'Manager' and 'Product' columns?

Now, let's get a single value by its row and column position.

df.iloc[1,1]
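With a single integer for both the row and the column, `iloc` returns one value rather than a table. pandas also provides `iat`, an accessor meant specifically for fetching one value by position. A sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Account": [100001, 100002], "Quantity": [1, 2]})

value = df.iloc[1, 1]      # row position 1, column position 1
same_value = df.iat[1, 1]  # accessor for a single value by position

print(value, same_value)  # 2 2
```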

Independent Practice (20 mins)

Read in the Star Wars survey csv
Look at its head, tail, and summary stats. What do they tell you about the dataset?
Select a certain column
Slice for a set of rows
Select a data point based on position

Bonus

Convert a column from one data type to another in the Star Wars survey csv
Create a dummy variable for the yes and no answers
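For the bonus, one possible approach is `astype` for the type conversion and `map` for a 0/1 dummy. This is a sketch on a tiny hypothetical table: the column names are made up and are not the real survey's headers. Any answer missing from the mapping dictionary, including a blank, becomes NaN. `pd.get_dummies` is an alternative that creates one indicator column per answer.

```python
import pandas as pd

# Hypothetical survey-style columns; the real survey's names differ.
survey = pd.DataFrame(
    {
        "Respondent": ["1", "2", "3"],
        "Seen any film": ["Yes", "No", "Yes"],
    }
)

# Type conversion: text digits -> integers.
survey["Respondent"] = survey["Respondent"].astype(int)

# Dummy variable: "Yes" -> 1, "No" -> 0.
survey["Seen any film (dummy)"] = survey["Seen any film"].map({"Yes": 1, "No": 0})

print(survey.dtypes)
print(survey)
```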

Conclusion (5 mins)

We read a csv file into a pandas dataframe with just one line of code. Last week, when we used plain Python to read in a csv file, it took about 5 lines. Pandas is already making our data lives easier. We also saw how easily pandas gives us general information about a dataset through its head, tail, and summary stats. Lastly, we started to select and slice our dataset.
