Plotting with Pandas
Week 2 | Lesson 5.1
LEARNING OBJECTIVES
After this lesson, you will be able to:
- Generate bar charts
- Generate scatter plots
- Generate time series plots
INSTRUCTOR PREP
Before this lesson, instructors will need to:
- Read in / Review any dataset(s) & starter/solution code
- Generate a brief slide deck
LESSON GUIDE
TIMING | TYPE | TOPIC |
---|---|---|
5 min | Introduction | Plotting with Pandas |
20 min | Demo / Guided Practice | bar plots |
20 min | Demo / Guided Practice | scatter plots |
20 min | Demo / Guided Practice | time series plots |
20 min | Independent Practice | |
5 min | Conclusion | |
Introduction: Plotting with Pandas (5 mins)
As we learned in Week 1, there are several ways to plot in Python, including seaborn, plotly, and matplotlib. For now we'll focus on pandas' df.plot, which is built on matplotlib. It accepts many parameters, which gives you a lot of control over how your plot looks. Here are all the parameters/arguments that df.plot can take.
Don't be overwhelmed: this is just to give you an idea of everything you can control when you use df.plot.
DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, sharex=None, sharey=False, layout=None, figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, secondary_y=False, sort_columns=False, **kwds)
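As a minimal sketch of how a few of these arguments combine, the example below passes title, figsize, grid, rot, and ylim to df.plot. The DataFrame name df_demo, its columns, and the random values are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 10 rows of random values in two made-up columns.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.standard_normal((10, 2)), columns=["a", "b"])

# A handful of the keyword arguments from the signature above.
ax = df_demo.plot(
    kind="line",        # the default plot type
    title="Demo plot",  # text shown above the axes
    figsize=(8, 4),     # figure width and height in inches
    grid=True,          # draw grid lines
    rot=45,             # rotate the x tick labels by 45 degrees
    ylim=(-3, 3),       # fix the y-axis range
)

# df.plot returns a matplotlib Axes, so it can be adjusted further afterwards.
ax.set_xlabel("row index")
```

In a script, call plt.show() (after import matplotlib.pyplot as plt) to display the figure; in a notebook the plot usually renders on its own.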
Demo/Guided Practice: bar plots (20 mins)
Let's create a small random DataFrame and a bar plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dfBar = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
Now, let's plot it using df.plot:
dfBar.plot(kind='bar')
plt.show()
Set stacked=True to turn it into a stacked bar plot:
dfBar.plot(kind='bar', stacked=True)
plt.show()
To get horizontal bar plots, pass kind='barh':
dfBar.plot(kind='barh', stacked=True)
plt.show()
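Stacked bars are easiest to read when every value is non-negative, because each bar's total height is then simply the row sum. The sketch below assumes a made-up DataFrame df_pos of uniform random values in [0, 1) to show this; the names and data are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative, non-negative data: 5 rows, 3 made-up columns.
rng = np.random.default_rng(1)
df_pos = pd.DataFrame(rng.random((5, 3)), columns=["a", "b", "c"])

# One stacked bar per row; each column contributes one segment to the bar.
ax = df_pos.plot(kind="bar", stacked=True, title="Stacked bars")

# With non-negative values, the top of each stack equals the row total.
row_totals = df_pos.sum(axis=1)
print(row_totals)
```

With np.random.randn data, as in the demo above, negative values stack below zero, so the totals are harder to read off the chart.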
Demo/Guided Practice: scatter plots (20 mins)
You can create scatter plots with DataFrame.plot by passing kind='scatter'. A scatter plot requires numeric columns for the x and y axes, which you specify with the x and y keywords:
dfScatter = pd.DataFrame(np.random.randn(50, 4), columns=['a', 'b', 'c', 'd'])
dfScatter.plot(kind='scatter', x='a', y='b')
plt.show()
To plot multiple column groups on a single axes, call plot again and pass the target axes as 'ax'. Specifying the color and label keywords is recommended to distinguish the groups.
# The first call creates the axes; the second call draws onto the same axes.
ax = dfScatter.plot(kind='scatter', x='a', y='b',
                    color='Red', label='Group 1')
dfScatter.plot(kind='scatter', x='c', y='d',
               color='Green', label='Group 2', ax=ax)
plt.show()
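As an optional extension, a third numeric column can drive the point colors instead of a fixed color. The sketch below assumes a made-up DataFrame df_pts and uses the c, colormap, and s keywords of the scatter plot; treat it as one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Illustrative data: 50 random points with a third made-up column 'c'.
rng = np.random.default_rng(2)
df_pts = pd.DataFrame(rng.standard_normal((50, 3)), columns=["a", "b", "c"])

# c='c' colors each point by its value in column 'c';
# colormap picks the color scale and s sets the marker size.
ax = df_pts.plot(kind="scatter", x="a", y="b", c="c", colormap="viridis", s=40)
```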
Demo/Guided Practice: time series plots (20 mins)
Start with the imports. The %matplotlib inline line is a Jupyter magic, so leave it out when running as a plain script:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Create a small dataframe:
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
'2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
'2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
'2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
'2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]}
df = pd.DataFrame(data, columns = ['date', 'battle_deaths'])
print(df)
Convert df['date'] from string to datetime:
df['date'] = pd.to_datetime(df['date'])
Set df['date'] as the index and delete the column:
df.index = df['date']
del df['date']
df
Find the total value of battle_deaths per day:
# The how= argument was removed from resample; call the aggregation as a method.
df.resample('D').sum()
Plot of the total battle deaths per day:
df.resample('D').sum().plot()
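Daily resampling is not limited to sums. As a sketch, the example below rebuilds the same battle_deaths data as a Series and computes the sum, mean, and max per day in one step, then draws each in its own subplot. The variable names (deaths, summary, axes) are chosen here for illustration.

```python
import pandas as pd

# Same timestamps and values as the lesson data above.
dates = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
         '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
         '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
         '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
         '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']
values = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]

# A Series with a DatetimeIndex, which is what resample needs.
deaths = pd.Series(values, index=pd.to_datetime(dates), name='battle_deaths')

# Group the rows into calendar days and compute three aggregates per day.
summary = deaths.resample('D').agg(['sum', 'mean', 'max'])
print(summary)

# subplots=True draws one panel per column (sum, mean, max).
axes = summary.plot(subplots=True, title='Battle deaths per day')
```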
Independent Practice: stacked bar plots (20 mins)
Using the sales.csv data, do the following:
- Create a stacked bar plot of Rep and Price
- Create a stacked bar plot of Rep and Quantity
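One possible way to approach these tasks is sketched below. Because the contents of sales.csv are not shown here, the example uses a small made-up DataFrame in its place; the Product column and every row value are assumptions, and only the Rep and Price names come from the exercise.

```python
import pandas as pd

# Made-up stand-in for sales.csv; with the real file, use pd.read_csv('sales.csv').
sales = pd.DataFrame({
    'Rep': ['Ann', 'Ann', 'Bob', 'Bob', 'Ann'],
    'Product': ['Pen', 'Desk', 'Pen', 'Desk', 'Pen'],  # hypothetical column
    'Price': [2.0, 150.0, 3.0, 200.0, 2.5],
})

# One row per Rep, one column per Product, cells hold the summed Price.
pivot = sales.pivot_table(index='Rep', columns='Product',
                          values='Price', aggfunc='sum')
print(pivot)

# Each Rep gets one bar, stacked by Product.
ax = pivot.plot(kind='bar', stacked=True, title='Price by Rep')
```

The same pattern applies to Quantity by swapping the values argument, provided the real file has a column suitable for splitting the stack.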
Conclusion (5 mins)
As we saw in the introduction, df.plot can take a lot of parameters. Try adding some of them to the plots you created during independent practice.