- Describe the uses of advanced queries like subqueries and unions
- Demonstrate ability to order data
- Demonstrate ability to aggregate and combine data
Let's create some data tables that we can run some queries on. Go to a terminal and run psql
. Create a new database named 'advanced':
CREATE DATABASE advanced;
Now connect to it using \c advanced
and create a new 'customers' table inside:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT,
age INTEGER,
country TEXT,
salary INTEGER
);
-- give it some data:
INSERT INTO customers (name, age, country, salary) VALUES ('Bira', 32, 'Brazil', 2000);
INSERT INTO customers (name, age, country, salary) VALUES ('Kaushik', 23, 'Kota', 2000);
INSERT INTO customers (name, age, country, salary) VALUES ('Ramesh', 25, null, 1500);
INSERT INTO customers (name, age, country, salary) VALUES ('Kaushik', 25, 'Mumbai', null);
INSERT INTO customers (name, age, country, salary) VALUES ('Amelia', 27, 'England', 8500);
INSERT INTO customers (name, age, country, salary) VALUES ('Silvana', null, null , 4500);
You should be able to SELECT all the data and see this output:
id | name | age | country | salary
----+---------+-----+---------+--------
1 | Bira | 32 | Brazil | 2000
2 | Kaushik | 23 | Kota | 2000
3 | Ramesh | 25 | | 1500
4 | Kaushik | 25 | Mumbai |
5 | Amelia | 27 | England | 8500
6 | Silvana | | | 4500
(6 rows)
Now let's make a friend for it. Create a new 'orders' table:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
order_num TEXT,
amount DECIMAL,
customer_id INTEGER REFERENCES customers(id)
);
-- Give it some data
INSERT INTO orders (order_num, amount, customer_id) VALUES ('A2067O', 104.09 , 1);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('J9899P', 50.54 , 1);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('N2337B', 954.66 , 1);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('A7786C', 66.33 , 2);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('F5400B', 403.54 , 3);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('F5298H', 669.84 , 3);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('L7800M', 200.03 , 3);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('J5454G', 44.30 , 4);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('F9802B', 43.54 , 6);
INSERT INTO orders (order_num, amount, customer_id) VALUES ('B7780B', 182.72 , 6);
Now SELECT * FROM orders;
and you should see this table:
id | order_num | amount | customer_id
----+-----------+--------+-------------
1 | A2067O | 104.09 | 1
2 | J9899P | 50.54 | 1
3 | N2337B | 954.66 | 1
4 | A7786C | 66.33 | 2
5 | F5400B | 403.54 | 3
6 | F5298H | 669.84 | 3
7 | L7800M | 200.03 | 3
8 | J5454G | 44.30 | 4
9 | F9802B | 43.54 | 6
10 | B7780B | 182.72 | 6
(10 rows)
Remember our 'orders' table:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
order_num TEXT,
amount DECIMAL,
customer_id INTEGER REFERENCES customers(id)
);
That last column we defined is called a FOREIGN KEY. Foreign keys and primary keys are related in that a foreign key is basically a reference to a primary key in another table. In this case, we have a column in our 'orders' table called customer_id
that references the primary key in the 'customers' table. This is the basis for making data relations with JOIN statements as we will see below. To summarize, the foreign key provides a sort of ownership link between the customer who has the primary key and all of that customer's orders in the related table where the customer_id
matches the id from the 'customers' table.
There are four types of JOINs in SQL:
LEFT JOIN
RIGHT JOIN
INNER JOIN
FULL [OUTER] JOIN
Let's look at our table for customers and our table for orders. The customers table looks like this:
id | name | age | country | salary
----+---------+-----+---------+--------
1 | Bira | 32 | Brazil | 2000
2 | Kaushik | 23 | Kota | 2000
3 | Ramesh | 25 | | 1500
4 | Kaushik | 25 | Mumbai |
5 | Amelia | 27 | England | 8500
6 | Silvana | | | 4500
(6 rows)
And the orders table looks like this:
id | order_num | amount | customer_id
----+-----------+--------+-------------
1 | A2067O | 104.09 | 1
2 | J9899P | 50.54 | 1
3 | N2337B | 954.66 | 1
4 | A7786C | 66.33 | 2
5 | F5400B | 403.54 | 3
6 | F5298H | 669.84 | 3
7 | L7800M | 200.03 | 3
8 | J5454G | 44.30 | 4
9 | F9802B | 43.54 | 6
10 | B7780B | 182.72 | 6
(10 rows)
As you can see, there are some customers who haven't placed orders. If we ask for the orders that correspond to customer_id 5, we will receive a value of NULL because they haven't ordered anything.
INNER JOIN
The defualt join type is is inner:
-- the simplist join possible
SELECT *
FROM customers
JOIN orders
ON customers.id=orders.customer_id;
-- selection of only certain cols
SELECT customers.name, orders.order_num
FROM customers
INNER JOIN orders
ON customers.id = orders.customer_id;
-- selection using 'aliasing' so you don't have to type out column names
-- aliases are defined in the FROM and JOIN clauses
-- aliases are refered to in the SELECT clause
SELECT c.name, o.order_num
-- here customers is alises as c and orders is alaised as o
FROM customers c
INNER JOIN orders o
-- aliases are refered to in the ON clause
ON c.id = o.customer_id;
An INNER JOIN
will return a dataset with all the matches from our customer and order tables where there is no NULL value on either side.
name | order_num
---------+-----------
Bira | A2067O
Bira | J9899P
Bira | N2337B
Kaushik | A7786C
Ramesh | F5400B
Ramesh | F5298H
Ramesh | L7800M
Kaushik | J5454G
Silvana | F9802B
Silvana | B7780B
(10 rows)
NOTE: This is the default type of JOIN so if you don't specify the type, SQL will perform an INNER JOIN
.
FULL [OUTER] JOIN
SELECT customers.name, orders.order_num
FROM customers
FULL OUTER JOIN orders
ON customers.id = orders.customer_id;
-- OR
-- select cols outer join with aliases
SELECT c.name, o.order_num
FROM customers c
FULL OUTER JOIN orders o
ON c.id=o.customer_id;
NOTE: The OUTER
is optional
A FULL OUTER JOIN
will do the opposite of an INNER JOIN
, returning you a table with all possible combinations, even if NULL has to be placed in.
name | order_num
---------+-----------
Bira | A2067O
Bira | J9899P
Bira | N2337B
Kaushik | A7786C
Ramesh | F5400B
Ramesh | F5298H
Ramesh | L7800M
Kaushik | J5454G
Silvana | F9802B
Silvana | B7780B
Amelia |
(11 rows)
TIP: The LEFT JOIN
and RIGHT JOIN
below can both be considered types of outer joins
LEFT JOIN
SELECT customers.name, orders.order_num
FROM customers LEFT JOIN orders
ON customers.id = orders.customer_id;
-- OR with aliases
SELECT c.name, o.order_num
FROM customers c LEFT JOIN orders o
ON c.id=o.customer_id;
With a LEFT JOIN
the table returned will have all values in the left table, even if there is no corresponding value on the right side.
name | order_num
---------+-----------
Bira | A2067O
Bira | J9899P
Bira | N2337B
Kaushik | A7786C
Ramesh | F5400B
Ramesh | F5298H
Ramesh | L7800M
Kaushik | J5454G
Silvana | F9802B
Silvana | B7780B
Amelia |
(11 rows)
RIGHT JOIN
With a RIGHT JOIN
the table returned will have all values in the right table, even if there is no corresponding value on the left side. This is a very rare join as it would require us to have orphaned records in the orders table. That is, orders that have no related customer. This is actually impossible with the way we have the tables set up. The foreign key constraint in the orders table basically says that you can't have a value in the customer_id
column in the orders table if that id
doesn't exist in the customers table. So when we run this, it looks exactly like our INNER JOIN above.
name | order_num
---------+-----------
Bira | A2067O
Bira | J9899P
Bira | N2337B
Kaushik | A7786C
Ramesh | F5400B
Ramesh | F5298H
Ramesh | L7800M
Kaushik | J5454G
Silvana | F9802B
Silvana | B7780B
(10 rows)
SELECT customers.name, orders.order_num
FROM customers
RIGHT JOIN orders
ON customers.id = orders.customer_id;
-- OR with aliasing
SELECT c.name, o.order_num
FROM customers c
RIGHT JOIN orders o
ON c.id=o.customer_id;
It's great that we can select all records from a table but we frequently want to limit the results to a smaller set that meets some set of criteria. We saw the WHERE clause in the introduction to SQL lesson and saw how it can help us retrieve specific data. Here are a few more ways we can get more exclusive with our queries.
Remember that in SQL, our comparison operators are a little different. Equality is a single equals =
and inequality is represented by a "greater-than-or-less-than" symbol <>
.
- LIKE - SELECT * FROM customers WHERE name LIKE '%';
- DISTINCT - SELECT DISTINCT name FROM customers;
- ORDER BY - SELECT * FROM customers ORDER BY name DESC;
- COUNT - SELECT count(*) FROM customers;
- MAX - SELECT max(age) FROM customers;
- MIN - SELECT min(age) FROM customers;
- AND - SELECT * from customers WHERE name = 'Kaushik' AND age = 25;
- OR - SELECT * from customers WHERE name = 'Silvana' OR name = 'Bira';
- IN - SELECT * FROM customers WHERE name IN ('Amelia', 'Ramesh');
- NOT IN - SELECT * FROM customers WHERE name NOT IN ('Amelia', 'Ramesh');
- LIMIT - SELECT * FROM customers LIMIT 2;
- OFFSET - SELECT * FROM customers OFFSET 1;
- LIMIT + OFFSET - SELECT * FROM customers LIMIT 2 OFFSET 1;
- % - SELECT * FROM customers WHERE name LIKE '%a';
COUNT() is an aggregate function.
"In database management an aggregate function is a function where the values of multiple rows are grouped together to form a single value of more significant meaning or measurement such as a set, a bag or a list." Read more on wikipedia.
We use an aggregate function to get the total count of customers in a table.
SELECT COUNT(*) FROM customers;
What about getting the count of something more specific in customer, such as the number of rows that have the age datapoint?
SELECT COUNT(age) FROM customers;
GROUP BY is used to pull together identical data points. For example, say we just want to see the different ages we have in our customer table, without having to look through the duplicates too.
SELECT age FROM customers GROUP BY age;
What if we just want to know how many different ages we have? We can combine GROUP BY and COUNT():
SELECT age, COUNT(age) FROM customers GROUP BY age;
Or maybe we want the average salaries of the customers from each country:
SELECT country, AVG(salary) FROM customers GROUP BY country;
Aliases are a piece of a SQL query that allows you to temporarily rename a table or column for the current query.
SELECT country, AVG(salary) AS avgSal FROM customers GROUP BY country;
-- add a data column
ALTER TABLE customers ADD COLUMN date DATE;
-- set the name column to be required when a new entry is made
ALTER TABLE customers ALTER COLUMN name SET NOT NULL;
-- remove the data column
ALTER TABLE customers DROP date;
What if I want to get names of customers with the highest salary.
Let's try it using WHERE
SELECT name, salary FROM customers
WHERE salary = MAX(salary);
That will give us an error, because MAX is an aggregate function and can't be used in WHERE.
This will return the maximum salary, which we need to feed into WHERE.
SELECT name, salary FROM customers
WHERE salary = (
SELECT MAX(salary) FROM customers
);
The CASE statement is used when you want to display different things depending on the data that you've queried from the database. There's two different ways to structure a CASE statement shown below. Note that in the first example you can only compare against single values while in the second example you can use actual expressions for evaluation. Also note that CASE statements require an ELSE statement.
SELECT name, age,
CASE
WHEN age > 30 THEN 'boomer'
WHEN age > 23 THEN 'adult'
WHEN age > 18 THEN 'young adult'
ELSE 'baby'
END AS age_group
FROM customers;
Unions display the results of two or more SELECT statements into one table, so the SELECT statements must have the same number of columns with the same names/data types, in the same order.
Here's a customers table:
id | name
---+---------
1 | Romesh
2 | Sally
3 | Vlad
4 | Poppy
CREATE TABLE subscribers (
id SERIAL PRIMARY KEY,
name TEXT
);
-- Give it some data
INSERT INTO subscribers (name) VALUES ('Romish');
INSERT INTO subscribers (name) VALUES ('Sally');
INSERT INTO subscribers (name) VALUES ('Janice');
INSERT INTO subscribers (name) VALUES ('Poopy');
INSERT INTO subscribers (name) VALUES ('Katie');
Now SELECT * FROM subscribers;
and you should see this table:
id | name
---+---------
1 | Romesh
2 | Sally
3 | Poppy
4 | Janice
5 | Kady
We could use this query to view the ids and names from both the customers and the subscribers tables.
SELECT id, name FROM customers UNION SELECT id, name FROM subscribers ORDER BY id;
id | name
---+---------
1 | Romesh
2 | Sally
3 | Vlad
3 | Poppy
4 | Poppy
4 | Janice
5 | Kady
Notice that the resulting table has fewer rows that the sum of the rows from each table. This is because UNION statements also eliminate any duplicate rows from the result. To include the duplicate rows, use UNION ALL.
SELECT id, name FROM customers UNION ALL SELECT id, name FROM subscribers ORDER BY id;
id | name
---+---------
1 | Romesh
1 | Romesh
2 | Sally
2 | Sally
3 | Vlad
3 | Poppy
4 | Poppy
4 | Janice
5 | Kady
.sql
files can be written and ran like any other langauge. From within the psql
the command \i <relative path to file>.sql
will import and run a .sql
file.
mkdir advanced-sql
to create a folder for the sql filestouch create-example-db.sql
to make your first.sql
file to run- add the following
SQL
example:
-- comments in SQL start with tow dashes btw
/*
multiline line comments work like this
this file will create a db called example_books and connect to it, CREATE a book table, add CREATE information and then READ all
*/
-- create the db
CREATE DATABASE "example_books";
-- connect to it (psql commands are valid)
\connect example_books
-- create the tables
CREATE TABLE books (
id SERIAL PRIMARY KEY,
title TEXT,
author TEXT
);
-- CREATE some data
INSERT INTO books (title, author) VALUES ('Do Androids Dream of Electric Sheep?', 'Phillip K. Dick');
INSERT INTO books (title, author) VALUES ('Ubik', 'Phillip K. Dick');
-- single qoutes are escaped by doubling them up ''
INSERT INTO books (title, author) VALUES ('Cat''s Cradle', 'kurt Vonnegut');
INSERT INTO books (title, author) VALUES ('Breakfast of Champions', 'kurt Vonnegut');
-- READ some data
SELECT * FROM books;
Creating an ER diagram can be useful if you are designing a DB with lots of tables and relationships to one another. It may be useful to revist ER Diagrams after you have a firm understanding of databases. Here are some useful resources:
- gitbook notes on ERDs
- Wikipedia - ER Diagram
- Ultimate Guide To ER Diagrams - Not so ultimate, but a good intro.