Joins & Pandas
Week 2 | Lesson 4.3
LEARNING OBJECTIVES
After this lesson, you will be able to:
- Join data via concat
- Do left, right, inner, and outer joins
INSTRUCTOR PREP
Before this lesson, instructors will need to:
- Read in / Review any dataset(s) & starter/solution code
- Generate a brief slide deck
STARTER CODE
LESSON GUIDE
TIMING | TYPE | TOPIC |
---|---|---|
5 min | Introduction | Concatenate & Join |
10 min | Demo / Guided Practice | Concatenate |
25 min | Demo / Guided Practice | Left and right joins |
25 min | Demo / Guided Practice | Outer and inner joins |
20 min | Independent Practice | |
5 min | Conclusion |
Introduction: Concatenate & Joins (5 mins)
Concatenation is taking two or more separately located things and placing them side-by-side next to each other so that they can now be treated as one thing. In computer programming and data processing, two or more character strings are sometimes concatenated for the purpose of saving space or so that they can be addressed as a single item.
Joins using pandas happen when columns of two DataFrames are joined either on index or on a key column. Here is a representation of left, right, inner, and outer joins using Venn diagrams.
Check: In pairs draw a left, right, inner, and outer join. Then, show the other person and explain it to them.
Demo / Guided Practice: Left joins (12 mins)
Let's look at a simple example of concatenation. No need to do this yourself, just follow along.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']},
index=[4, 5, 6, 7])
Define the two frames as one variable, frames:
frames = [df1, df2]
Concatenate:
result = pd.concat(frames)
result
Demo / Guided Practice: Left joins (25 mins)
Let's create a small DataFrame to and try a left, right, inner, and outer join:
import pandas as pd
from IPython.display import display
from IPython.display import Image
raw_data = {
'subject_id': ['1', '2', '3', '4', '5'],
'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_a
Create a 2nd DataFrame:
raw_data = {
'subject_id': ['4', '5', '6', '7', '8'],
'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_b
Create a 3rd DataFrame:
raw_data = {
'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
df_n = pd.DataFrame(raw_data, columns = ['subject_id','test_id'])
df_n
Merge with a left join produces a complete set of records from df_a, with the matching records (where available) in df_b. If there is no match, the right side will contain null:
pd.merge(df_a, df_b, on='subject_id', how='left')
Check What do you think a right join will result in?
Merge with a right join produces a complete set of records from df_b, with the matching records (where available) in df_a. If there is no match, the left side will contain null:
pd.merge(df_a, df_b, on='subject_id', how='right')
Demo / Guided Practice: Outer and inner joins (25 mins)
An outer join produces the set of all records in df_a and df_b, with matching records from both sides where available. If there is no match, the missing side will contain null:
pd.merge(df_a, df_b, on='subject_id', how='outer')
Check What do you think an inner join will produce?
An inner join produces only the set of records that match in both df_a and df_b:
pd.merge(df_a, df_b, on='subject_id', how='inner')
Independent Practice: Topic (20 minutes)
Here are two simple DataFrames:
df1 = pd.DataFrame([[1, 2, 3]])
df1
df2 = pd.DataFrame([[1, 7, 8],[4, 9, 9]], columns=[0, 3, 4])
df2
Do a left, right, inner, and outer join.
Bonus If you've used SQL before, joins are probably old hat to you. If so and you finish early, how is your neighbor doing? Remember, you might be quiz at joins, but they might be a whiz at linear regression. Sharing knowledge will make you both smarter.
Conclusion (5 mins)
Pair up and explain a left, a right, an inner, and an outer join to a partner.