Intro to Pandas 1

Week 2 | Lesson 1.1

LEARNING OBJECTIVES

After this lesson, you will be able to:

  • Read a csv file using pandas
  • View data with head, columns, values, and describe
  • Select a single column, slice by row, and select by position

STUDENT PRE-WORK

Before this lesson, you should already be able to:

  • Since we're using Anaconda, pandas should already be installed, but make sure you have all the dependencies installed as well:
    • setuptools
    • NumPy: 1.7.1 or higher
    • python-dateutil: 1.5 or higher
    • pytz: needed for time zone support

STARTER CODE

Demo

INSTRUCTOR PREP

Before this lesson, instructors will need to:

  • Read in / Review any dataset(s) & starter/solution code
  • Generate a brief slide deck

LESSON GUIDE

TIMING   TYPE                     TOPIC
5 min    Introduction             Pandas
10 min   Demo / Guided Practice   Read csv
25 min   Demo / Guided Practice   Viewing data: head/tail, describe
25 min   Demo / Guided Practice   Selection: a single column, slicing by row, by position
20 min   Independent Practice
5 min    Conclusion

Introduction: Pandas (5 mins)

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Demo / Guided Practice: Read csv (10 mins)

demo code

In an IPython notebook, type:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Let's read in a csv file and create a pandas dataframe.

df = pd.read_csv('sales.csv')

Check: This looks familiar... didn't we already learn how to read in csv files? Yes, but that was using Python without any libraries or packages. In W1 L3.2 it took 5 lines of Python; with pandas it takes only one. Nice!
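To make the comparison concrete, here is a minimal sketch that reads the same data both ways. The csv text is a made-up stand-in for sales.csv, and the plain-Python version uses the standard csv module, which may differ from what the earlier lesson used.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for sales.csv; these rows are invented for illustration.
csv_text = """Account,Quantity,Price
1001,3,9.50
1002,1,20.00
1003,5,4.25
"""

# Plain Python: the csv module returns every field as a string.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# pandas: one call parses the data and infers a type for each column.
df = pd.read_csv(io.StringIO(csv_text))

print(type(rows[0]["Quantity"]))  # fields stay strings in plain Python
print(df.dtypes)                  # pandas inferred numeric column types
```

Besides being shorter, read_csv converts numeric columns for us, so we can compute on them right away.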

Demo / Guided Practice: Viewing data: head/tail, describe (25 mins)

Let's take a summary look at our data. First, let's look at the head and tail.

df.head()
df.tail()
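Both methods return 5 rows by default and accept an optional row count. A small sketch, using an invented 10-row dataframe rather than the sales data:

```python
import pandas as pd

# Hypothetical data: 10 rows, so the head and the tail do not overlap.
df = pd.DataFrame({
    "row_id": range(10),
    "value": [x * 1.5 for x in range(10)],
})

first_five = df.head()   # default: the first 5 rows
last_three = df.tail(3)  # pass a number to change how many rows come back

print(first_five)
print(last_three)
```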

Check: What can looking at the head and tail of a dataset tell us?

Let's take a look at summary statistics.

df.describe()

This gives us: count, mean, std, min, 25%, 50%, 75%, and max. Awesome!
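As a quick illustration, the sketch below runs describe on a small invented dataframe (the column names are placeholders, not the sales columns). By default only numeric columns are summarized; passing include="all" adds the others.

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "plum", "fig"],
    "qty": [1, 2, 3, 4],
    "cost": [2.0, 4.0, 6.0, 8.0],
})

summary = df.describe()                   # numeric columns only by default
summary_all = df.describe(include="all")  # also summarizes the text column

print(summary)
print(summary.loc["mean", "qty"])  # pull a single statistic out of the summary
```

The result of describe is itself a dataframe, so individual statistics can be looked up by row and column label.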

Check: What was the cautionary tale about relying too heavily on summary stats again?

Demo / Guided Practice: Selection: a single column, slicing by row, by position (25 mins)

Let's select a single column.

df['Account']
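It helps to know what comes back from a column selection. In this sketch (invented data and column names), single brackets return a Series, while a list of names inside the brackets returns a dataframe:

```python
import pandas as pd

# Hypothetical data; the column names are placeholders.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "plum", "fig"],
    "qty": [1, 2, 3, 4],
})

one_column = df["qty"]      # single brackets: a pandas Series
as_frame = df[["qty"]]      # list of names: a DataFrame with one column

print(type(one_column))
print(type(as_frame))
```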

Check: How would you select the 'Quantity' and 'Price' columns separately?

Now, let's slice and select for certain rows.

df[0:3]
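Row slices follow the usual Python convention: the start is included and the stop is excluded. A minimal sketch with invented data:

```python
import pandas as pd

# Hypothetical data with the default 0-based index.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

first_three = df[0:3]  # rows at positions 0, 1, and 2; position 3 is excluded

print(first_three)
```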

Check: How would you slice for rows 9 to 14?

Now, let's try selecting by position. First, let's slice some rows.

df.iloc[1:3, :]

Check: How would you slice for rows 9 to 14 by position, using iloc?

Now, let's slice some columns.

df.iloc[:,1:3]
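The two iloc calls above can be tried on a small invented dataframe. The first position in the brackets selects rows and the second selects columns; a bare colon means "everything" along that axis, and stops are excluded just as in plain row slicing.

```python
import pandas as pd

# Hypothetical data: 5 rows and 4 columns with placeholder names.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [10, 20, 30, 40, 50],
    "c": [100, 200, 300, 400, 500],
    "d": [1000, 2000, 3000, 4000, 5000],
})

some_rows = df.iloc[1:3, :]  # rows at positions 1 and 2, all columns
some_cols = df.iloc[:, 1:3]  # all rows, columns at positions 1 and 2

print(some_rows)
print(some_cols)
```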

Check: How would you slice for the 'Manager' and 'Product' columns?

Now, let's get an explicit value only.

df.iloc[1,1]
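With two integers, iloc returns the single value at that row and column position instead of a dataframe. A small sketch on invented data, contrasted with a one-cell slice:

```python
import pandas as pd

# Hypothetical data with placeholder column names.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [10, 20, 30],
    "c": [100, 200, 300],
})

value = df.iloc[1, 1]            # second row, second column: a single value
cell_frame = df.iloc[1:2, 1:2]   # slices keep the result as a 1x1 DataFrame

print(value)
print(cell_frame)
```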

Independent Practice: Topic (20 minutes)

  • Read in this Star Wars survey csv
  • Look at its head, tail, and summary stats. What do they tell you about the dataset?
  • Select a certain column
  • Slice for a set of rows
  • Select a data point based on position

Bonus

  • Convert one data type to another in the Star Wars survey csv
  • Create a dummy variable for the yes and no answers
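As a hint for the bonus items, here is one possible approach on invented data. The column names and values are hypothetical and are not taken from the actual survey file; astype handles the type conversion, and a comparison turned into integers gives a 1/0 dummy variable.

```python
import pandas as pd

# Hypothetical survey-like data; not the real Star Wars survey columns.
survey = pd.DataFrame({
    "seen_film": ["Yes", "No", "Yes"],
    "age": ["18", "45", "30"],  # numbers stored as text
})

# Convert the text column to integers.
survey["age"] = survey["age"].astype(int)

# Dummy variable: 1 where the answer is "Yes", 0 otherwise.
survey["seen_flag"] = (survey["seen_film"] == "Yes").astype(int)

print(survey.dtypes)
print(survey)
```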

Conclusion (5 mins)

We read a csv file into a pandas dataframe with just one line of code. Last week, when we used plain Python to read in a csv file, it took about 5 lines of code. Pandas is already making our data lives easier. We also saw how easy pandas makes it to get some general information about our dataset by looking at the head, tail, and summary stats. Lastly, we started to select and slice our dataset.

ADDITIONAL RESOURCES
