More SQL!!

Week 5 | Lesson 3.1

LEARNING OBJECTIVES

After this lesson, you will be able to:

  • Sort results by a column using ORDER BY
  • Simplify our syntax using aliases (AS)
  • Match patterns using LIKE
  • Select distinct items using DISTINCT
  • Aggregate values using GROUP BY
  • Filter on aggregations using HAVING

STUDENT PRE-WORK

Before this lesson, you should already be able to:

  • Connect to a local or remote relational database
  • Add and remove data, create new tables, alter table schemas
  • Perform simple queries using SQL language

INSTRUCTOR PREP

Before this lesson, instructors will need to:

  • Read in / Review any dataset(s) & starter/solution code
  • Generate a brief slide deck
  • Prepare any specific materials
  • Provide students with additional resources
  • Make sure the GA copy of the northwind database is available at:
      psql -h dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com -p 5432 -U dsi_student northwind
      password: gastudents
    

LESSON GUIDE

TIMING TYPE TOPIC
5 mins Opening Opening
10 mins Introduction More SQL
10 mins Guided-practice Guided Practice: ORDER BY
10 mins Guided-practice Guided Practice: Aliases
10 mins Guided-practice Guided Practice: SQL LIKE Operator
10 mins Guided-practice Guided Practice: DISTINCT and LIMIT Operators
10 mins Guided-practice Guided Practice: GROUP BY Operator
10 mins Guided-practice Guided Practice: HAVING Operator
10 mins Ind-practice Independent Practice
5 mins Conclusion Conclusion

Opening (5 mins)

We have seen how to connect to a local sqlite database and to a remote postgresql database.

Check: What SQL commands have we learned so far?

Answer:

  • CREATE
  • INSERT
  • DELETE
  • UPDATE
  • SELECT

Check: What different commands have we learned for SQLite and PostgreSQL?

Answer:

  • How to list schema and tables

More SQL (10 mins)

In this lecture we'll learn a few more useful SQL commands that will give us the ability to perform more complex queries.

In particular we will learn to:

  • Sort results by a column using ORDER BY
  • Simplify our syntax using aliases
  • Match patterns using LIKE
  • Select distinct items using DISTINCT
  • Aggregate values using GROUP BY
  • Filter on aggregations using HAVING

For all the following example we will use the well-known Northwind sample database.

We have imported this database into our PostgreSQL instance that you can connect to as follows:

psql -h dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com -p 5432 -U dsi_student northwind
password: gastudents

We will use a few tables from this database:

customers:

CustomerID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax
ALFKI Alfreds Futterkiste Maria Anders Sales Representative Obere Str. 57 Berlin 12209 Germany 030-0074321 030-0076545
ANATR Ana Trujillo Emparedados y helados Ana Trujillo Owner Avda. de la Constitución 2222 México D.F. 05021 Mexico (5) 555-4729 (5) 555-3745
ANTON Antonio Moreno Taquería Antonio Moreno Owner Mataderos2312 México D.F. 05023 Mexico (5) 555-3932
... ... ... ... ... ... ... ... ... ... ...

orders:

OrderID CustomerID EmployeeID OrderDate RequiredDate ShippedDate ShipVia Freight ShipName ShipAddress ShipCity ShipRegion ShipPostalCode ShipCountry
10248 VINET 5 1996-07-04 1996-08-01 1996-07-16 3 32.38 Vins et alcools Chevalier 59 rue de l'Abbaye Reims 51100 France
10249 TOMSP 6 1996-07-05 1996-08-16 1996-07-10 1 11.61 Toms Spezialitäten Luisenstr. 48 Münster 44087 Germany
10250 HANAR 4 1996-07-08 1996-08-05 1996-07-12 2 65.83 Hanari Carnes Rua do Paço, 67 Rio de Janeiro RJ 05454-876 Brazil
... ... ... ... ... ... ... ... ... ... ...

order_details:

OrderID ProductID UnitPrice Quantity Discount
10248 11 14 12 0
10248 42 9.8 10 0
10248 72 34.8 5 0
10249 14 18.6 9 0
10249 51 42.4 40 0
10250 41 7.7 10 0
... ... ... ... ...

Check: Let's come up with a few queries you'd be curious to run on these tables.

Guided Practice: ORDER BY (10 mins)

The ORDER BY keyword is used to sort the result-set by one or more columns. It sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword.

SQL ORDER BY Syntax

SELECT _column_name_,_ column_name_  
FROM _table_name_  
ORDER BY _column_name _ASC|DESC,_ column_name_ ASC|DESC;

Example

The following SQL statement selects all customers from the "Customers" table, sorted by the "Country" column:

SELECT "CustomerID", "CompanyName", "Country" FROM customers
ORDER BY "Country";
CustomerID CompanyName Country
OCEAN Océano Atlántico Ltda. Argentina
CACTU Cactus Comidas para llevar Argentina
RANCH Rancho grande Argentina
... ... ...

ORDER BY DESC Example

The following SQL statement selects all customers from the "Customers" table, sorted DESCENDING by the "Country" column:

SELECT "CustomerID", "CompanyName", "Country" FROM customers
ORDER BY "Country" DESC;
CustomerID CompanyName Country
GROSR GROSELLA-Restaurante Venezuela
LILAS LILA-Supermercado Venezuela
HILAA HILARION-Abastos Venezuela
... ... ...

ORDER BY Several Columns Example

The following SQL statement selects all customers from the customers table, sorted by the "Country" and the "CompanyName" columns:

SELECT "CustomerID", "CompanyName", "Country" FROM customers
ORDER BY "Country", "CompanyName";
CustomerID CompanyName Country
CACTU Cactus Comidas para llevar Argentina
OCEAN Océano Atlántico Ltda. Argentina
RANCH Rancho grande Argentina
... ... ...

Check: run a few queries on the provided database and try using the ORDER BY command.

Guided Practice: Aliases (10 mins)

SQL aliases are used to give a database table, or a column in a table, a temporary name. Aliases are often created to make column names more readable.

SQL Alias Syntax for Columns

SELECT _column_name_ AS _alias_name_  
FROM _table_name;_

SQL Alias Syntax for Tables

SELECT _column_name(s)_  
FROM _table_name _AS _alias_name;_

Alias Examples for Table Columns

The following SQL statement specifies two aliases, one for the CompanyName column and one for the ContactName column:

SELECT "CompanyName" AS "Customer", "ContactName" AS "[Contact Person]"
FROM customers;
Customer [Contact Person]
Alfreds Futterkiste Maria Anders
Ana Trujillo Emparedados y helados Ana Trujillo
Antonio Moreno Taquería Antonio Moreno
... ...

In the following SQL statement we combine four columns (Address, City, PostalCode and Country) and create an alias named Address:

SELECT "CompanyName" AS "Customer",
    CONCAT("Address",', ',"City",', ', "PostalCode",', ',"Country") AS "Address"
FROM customers;
Customer Address
Alfreds Futterkiste Obere Str. 57, Berlin, 12209, Germany
Ana Trujillo Emparedados y helados Avda. de la Constitución 2222, México D.F., 05021, Mexico
Antonio Moreno Taquería Mataderos 2312, México D.F., 05023, Mexico
... ...

Alias Example for Tables

The following SQL statement selects all the orders from the customer with CustomerID=4 (Around the Horn). We use the customers and orders tables, and give them the table aliases of c and o respectively. (This time we used aliases to make the SQL shorter.)

SELECT o."OrderID", o."OrderDate", c."CompanyName"
FROM customers AS c, orders  AS o  
WHERE c."CompanyName" = 'Around the Horn' AND c."CustomerID"=o."CustomerID";
CustomerID OrderDate CompanyName
10355 1996-11-15 Around the Horn
10383 1996-12-16 Around the Horn
10453 1997-02-21 Around the Horn
10558 1997-06-04 Around the Horn
... ... ...

Check what would be the same SQL statement without aliases?

Aliases can be useful when:

  • More than one table is involved in a query
  • Functions are used in the query
  • Column names are long or not very readable
  • Two or more columns are combined together

Check: Try running the same queries as above with aliases. Are they more readable?

Guided Practice: SQL LIKE Operator (10 mins)

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

SQL LIKE Syntax

SELECT _column_name(s)_  
FROM _table_name_  
WHERE _column_name_ LIKE _pattern_;

SQL LIKE Operator Examples

The following SQL statement selects all customers with a City starting with the letter "S":

SELECT * FROM customers
WHERE "City" LIKE 'S%';
CustomerID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax
BLONP Blondesddsl père et fils Frédérique Citeaux Marketing Manager 24, place Kléber Strasbourg 67000 France 88.60.15.31 88.60.15.32
COMMI Comércio Mineiro Pedro Afonso Sales Associate Av. dos Lusíadas, 23 Sao Paulo SP 05432-043 Brazil (11) 555-7647
FAMIA Familia Arquibaldo Aria Cruz Marketing Assistant Rua Orós, 92 Sao Paulo SP 05442-030 Brazil (11) 555-9857
... ... ... ... ... ... ... ... ... ... ...

Tip: The "%" sign is used to define wildcards (missing letters) both before and after the pattern. Also notice that PostgreSQL is case sensitive.

The following SQL statement selects all customers with a City ending with the letter "s":

SELECT * FROM customers
WHERE "City" LIKE '%s';
CustomerID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax
CACTU Cactus Comidas para llevar Patricio Simpson Sales Agent Cerrito 333 Buenos Aires 1010 Argentina (1) 135-5555 (1) 135-4892
DUMON Du monde entier Janine Labrune Owner 67, rue des Cinquante Otages Nantes 44000 France 40.67.88.88 40.67.89.89
FRANR France restauration Carine Schmitt Marketing Manager 54, rue Royale Nantes 44000 France 40.32.21.21 40.32.21.20
... ... ... ... ... ... ... ... ... ... ...

Using the NOT keyword allows you to select records that do NOT match the pattern. The following SQL statement selects all customers with Country NOT containing the pattern "land":

SELECT * FROM customers
WHERE "Country" NOT LIKE '%land%';
CustomerID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax
ALFKI Alfreds Futterkiste Maria Anders Sales Representative Obere Str. 57 Berlin 12209 Germany 030-0074321 030-0076545
ANATR Ana Trujillo Emparedados y helados Ana Trujillo Owner Avda. de la Constitución 2222 México D.F. 05021 Mexico (5) 555-4729 (5) 555-3745
ANTON Antonio Moreno Taquería Antonio Moreno Owner Mataderos 2312 México D.F. 05023 Mexico (5) 555-3932
... ... ... ... ... ... ... ... ... ... ...

Check: Run a few queries on the provided database and try using the LIKE command.

Guided Practice: DISTINCT and LIMIT Operators (10 mins)

The SELECT DISTINCT statement is used to return only distinct (different) values. In a table, a column may contain many duplicate values; and sometimes you only want to list the different (distinct) values.

SQL SELECT DISTINCT Syntax

SELECT DISTINCT _column_name_,_column_name_  
FROM _table_name_;

SELECT DISTINCT Example

The following SQL statement selects only the distinct values from the "City" columns from the "Customers" table:

SELECT DISTINCT "City" FROM customers;
City
Leipzig
London
Nantes
...

The DISTINCT operator is equivalent to the numpy.unique command.

SQL SELECT LIMIT Syntax

Sometimes we may want to only retrieve a fixed number of records from the database. This is where the LIMIT operator comes in.

SELECT _column_name_,_column_name_  
FROM _table_name_
LIMIT _number_of_records;

SELECT LIMIT Example

The following SQL statement selects only the first 3 values from the "City" columns from the "Customers" table:

SELECT DISTINCT "City"
FROM customers
LIMIT 3;
City
Leipzig
London
Nantes

The LIMIT operator is equivalent to the pandas.Dataframe.head command.

Check: Try running the same queries using DISTINCT and LIMIT

Guided Practice: GROUP BY Operator (10 mins)

A table may contain several records that have a common key. Consider for example the following order_details table:

OrderID ProductID UnitPrice Quantity Discount
10248 11 14 12 0
10248 42 9.8 10 0
10248 72 34.8 5 0
10249 14 18.6 9 0
10249 51 42.4 40 0
10250 41 7.7 10 0
... ... ... ... ...

The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns. For example we may want to know the total number of items purchased in each order.

SQL GROUP BY Syntax

The syntax to perform a GROUP BY operation is the following:

SELECT column_name, aggregate_function(column_name)  
FROM table_name  
WHERE column_name operator value  
GROUP BY column_name;

We select the column we want to aggregate on and the function.

Example

In order to calculate the total amount of items in a specific order we can use the following query:

SELECT "OrderID", SUM("Quantity")
FROM order_details
GROUP BY "OrderID"

Note that in PostgreSQL unquoted names are case-insensitive. Thus SELECT * FROM hello and SELECT * FROM HELLO are equivalent.

However, quoted names are case-sensivite. SELECT * FROM "hello" is not equivalent to SELECT * FROM "HELLO".

To make a "bridge" between quoted names and unquoted names, unquoted names are implicitly lowercased, thus hello, HELLO and HeLLo are equivalent to "hello", but not to "HELLO" or "HeLLo".

Thus, when creating entities (tables, views, procedures, etc) in PostgreSQL, you should specify them either unquoted, or quoted-but-lowercased.

Guided Practice: HAVING Operator (10 mins)

The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions. Let's consider the previous example where we aggregated the Quantity of an order. What if we would like to filter the results for orders where the Maximum discount applied on any item is lower than 10%?

SQL HAVING Syntax

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;
Example
SELECT "OrderID", SUM("Quantity"), MAX("Discount")
FROM order_details
GROUP BY "OrderID"
HAVING MAX("Discount") <= 0.1;
OrderID sum max
10501 20 0
10295 4 0
10827 36 0

Independent Practice (10 minutes)

Now that you've learned a lot more commands, try combining them:

  • Retrieve the top 3 orders by number of items ordered
  • Retrieve the oldest 5 orders
  • Retrieve the 5 most recent orders. Nest that query into another one that ranks these 5 orders by shipping country (alphabetical order)
  • Try some queries on your own

Answers:

SELECT "OrderID", SUM("Quantity") AS "Sum"
FROM order_details
GROUP BY "OrderID"
ORDER BY "Sum" DESC
LIMIT 3;
SELECT "OrderID", "OrderDate"
FROM orders
ORDER BY "OrderDate"
LIMIT 5;
SELECT * FROM (
    SELECT "OrderID", "OrderDate", "ShipCountry"
    FROM orders
    ORDER BY "OrderDate" DESC
    LIMIT 5
    ) AS fivemostrecent
ORDER BY "ShipCountry";

Conclusion (5 mins)

In this lesson we have learned many more new commands to make our SQL queries more powerful.

In particular we learned how to:

  • Sort results by a column using ORDER BY
  • Simplify our syntax using aliases
  • Match patterns using LIKE
  • Select distinct items using DISTINCT
  • Aggregate values using GROUP BY
  • Filter on aggregations using HAVING

Check: can you think of a few more business cases where these are useful?


ADDITIONAL RESOURCES

results matching ""

    No results matching ""