Plotting with Pandas

Week 2 | Lesson 5.1


After this lesson, you will be able to:

  • Generate bar charts
  • Generate scatter plots
  • Generate time series plots


Before this lesson, instructors will need to:

  • Read in / Review any dataset(s) & starter/solution code
  • Generate a brief slide deck


5 min    Introduction              Plotting with Pandas
20 min   Demo / Guided Practice    bar plots
20 min   Demo / Guided Practice    scatter plots
20 min   Demo / Guided Practice    time series plots
20 min   Independent Practice
5 min    Conclusion

Introduction: Plotting with Pandas (5 mins)

As we learned in Week 1, there are several libraries for plotting in Python, including seaborn, plotly, and matplotlib. In this lesson we focus on pandas' df.plot, which is built on top of matplotlib. It accepts many parameters, which gives you a lot of control over how your plot looks. Here is a look at the parameters that df.plot can take (the exact list varies between pandas versions).

Don't be overwhelmed; this is just to give you an idea of everything you can control when you use df.plot.

DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, sharex=None, sharey=False, layout=None, figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, secondary_y=False, sort_columns=False, **kwds)
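To make a few of these parameters concrete, here is a minimal sketch that sets kind, title, figsize, grid, and legend on a small made-up DataFrame. The data and the names df_demo and ax_demo are illustrative assumptions, and the Agg backend line is there only so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import numpy as np
import pandas as pd

# Made-up data for illustration: 10 rows, two random columns
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.standard_normal((10, 2)), columns=["a", "b"])

# A handful of the keyword arguments from the signature above
ax_demo = df_demo.plot(
    kind="line",                 # the default plot type
    title="Two random series",   # text shown above the plot
    figsize=(8, 4),              # width and height in inches
    grid=True,                   # draw grid lines
    legend=True,                 # show one legend entry per column
)

# df.plot returns a matplotlib Axes, so it can be customised further
ax_demo.set_xlabel("row index")
```

Because the return value is a regular matplotlib Axes, anything df.plot does not expose directly can still be adjusted afterwards through that object.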

Demo/Guided Practice: bar plots (20 mins)

Let's create a small random DataFrame and a bar plot:

import numpy as np
import pandas as pd
dfBar = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])

Now, let's plot it using df.plot:

dfBar.plot(kind='bar')

Pass stacked=True to turn it into a stacked bar plot:

dfBar.plot(kind='bar', stacked=True)

To get horizontal bar plots, pass kind='barh':

dfBar.plot(kind='barh', stacked=True)
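Stacking places each column's segment on top of the previous one, so for non-negative data the top of every bar equals that row's total. The sketch below uses a small made-up DataFrame (df_counts and its values are assumptions chosen so the totals are easy to check by hand) rather than the random data above, which contains negative values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up, non-negative data so the stacked heights are easy to verify
df_counts = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=["x", "y", "z"],
)

# One bar per row; columns 'a' and 'b' are stacked within each bar
ax_stacked = df_counts.plot(kind="bar", stacked=True)

# The row totals are what the full bar heights represent
row_totals = df_counts.sum(axis=1)
print(row_totals)
```

Swapping kind="bar" for kind="barh" gives the same stacking along the horizontal axis.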

Demo/Guided Practice: scatter plots (20 mins)

You can create scatter plots with DataFrame.plot by passing kind='scatter'. A scatter plot requires numeric columns for the x and y axes; specify them with the x and y keywords:

dfScatter = pd.DataFrame(np.random.randn(50, 4), columns=['a', 'b', 'c', 'd'])
dfScatter.plot(kind='scatter', x='a', y='b');

To plot multiple column groups on a single axes, call plot again and pass the target axes as ax. Specifying the color and label keywords is recommended to distinguish the groups.

ax = dfScatter.plot(kind='scatter', x='a', y='b',
             color='Red', label='Group 1');
dfScatter.plot(kind='scatter', x='c', y='d',
         color='Green', label='Group 2', ax=ax);
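The overlay works because the first call returns a matplotlib Axes and the second call draws onto that same object instead of creating a new figure. The following sketch checks this on seeded random data; the names df_points, ax_first, and ax_second are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import numpy as np
import pandas as pd

# Made-up data for illustration: 50 rows, four random columns
rng = np.random.default_rng(1)
df_points = pd.DataFrame(rng.standard_normal((50, 4)), columns=list("abcd"))

# First group: creates a new figure and returns its Axes
ax_first = df_points.plot(
    kind="scatter", x="a", y="b", color="red", label="Group 1"
)

# Second group: ax=ax_first draws onto the existing Axes
ax_second = df_points.plot(
    kind="scatter", x="c", y="d", color="green", label="Group 2", ax=ax_first
)

# Both calls hand back the same Axes, which now holds two point collections
same_axes = ax_second is ax_first
n_groups = len(ax_first.collections)
print(same_axes, n_groups)
```

Note that the second call also relabels the axes with its own x and y column names, so it can be worth setting neutral axis labels afterwards.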

Demo/Guided Practice: time series plots (20 mins)

from datetime import datetime
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as pyplot

Create a small dataframe:

data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
'2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
'2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
'2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
'2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]}
df = pd.DataFrame(data, columns = ['date', 'battle_deaths'])

Convert df['date'] from string to datetime:

df['date'] = pd.to_datetime(df['date'])

Set df['date'] as the index and delete the column:

df.index = df['date']
del df['date']

Find the total value of battle_deaths per day:

df.resample('D').sum()

Plot the total battle deaths per day:

df.resample('D').sum().plot()
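The whole time series walkthrough can be sketched end to end as follows. In current pandas the aggregation is chained as a method on the resampler (resample('D').sum()); the older how= keyword is no longer accepted. The sketch reuses the lesson's data, swaps in set_index for the assign-then-delete step, and the names df_ts and daily are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Same records as in the lesson
data = {
    "date": [
        "2014-05-01 18:47:05.069722", "2014-05-01 18:47:05.119994",
        "2014-05-02 18:47:05.178768", "2014-05-02 18:47:05.230071",
        "2014-05-02 18:47:05.230071", "2014-05-02 18:47:05.280592",
        "2014-05-03 18:47:05.332662", "2014-05-03 18:47:05.385109",
        "2014-05-04 18:47:05.436523", "2014-05-04 18:47:05.486877",
    ],
    "battle_deaths": [34, 25, 26, 15, 15, 14, 26, 25, 62, 41],
}
df_ts = pd.DataFrame(data)

# Parse the strings into timestamps and move them into the index
df_ts["date"] = pd.to_datetime(df_ts["date"])
df_ts = df_ts.set_index("date")

# Group the rows into calendar days ('D') and sum each day
daily = df_ts.resample("D").sum()
print(daily)

# Line plot with one point per day
ax_daily = daily.plot()
```

The daily totals come out as 59, 70, 51, and 103 for May 1 through May 4, which can be confirmed by adding up the raw values for each date.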

Independent Practice: stacked bar plots (20 mins)

Using the sales.csv data, do the following:

  • Create a stacked bar plot of Rep and Price
  • Create a stacked bar plot of Rep and Quantity

Conclusion (5 mins)

As we saw in the introduction, df.plot can take a lot of parameters. Try adding some of them to the plots you created during independent practice.
