Skip to content

Latest commit

 

History

History
1089 lines (756 loc) · 31.7 KB

postgres.md

File metadata and controls

1089 lines (756 loc) · 31.7 KB

SQL

In the next step we will use the Prisma ORM to simplify working with relational databases. Documentation on Prisma can be found here.


What is a RDBMS

RDBMS stands for Relational Database Management System. In this, data can be stored in a tabular structure. The best-known RDBMS are, for example, MySQL or PostgreSQL.

An RDBMS can contain not only one, but several databases. These can be created, deleted or edited via the data management system.

A single database can contain many different tables. For example, one table for unicorns and another for locations.

Unicorns:

id name age location_id
1 Friendcorn 175 1
2 Lazycorn 957 2
3 Strangecorn 5 1

Locations:

id location lat lng
1 Strange Town 175 33
2 Sleepy Hollow 55 199

It is possible to link the different tables together. Like Unicorns and Locations in the example above.

A table is divided into fields (columns) and records (rows). It is important that a record must be assigned a unique id, to be able to identify it later.

Server


What is SQL

SQL is the language that is required to communicate with a relational database. In the core syntax, a distinction is made between Key Words and Identifiers.
The Key Words are specified by the language itself.
The Identifiers result from the various fields in a table.

An acurate list of all Keywords can be found here.

Syntax

Syntax rules:

  1. SQL statements must end with a semicolon (if multiple statements are included in the same command).
  2. SQL statements are case-insensitive. This means that FROM is the same as from (for better identification of keywords, they should still be capitalized).
  3. identifiers can be surrounded with double quotes to avoid conflicts with built-in keywords.

Working with Databases and Tables

When working with databases and tables, the following keywords are important.

  • CREATE is needed to create a database or table.
  • ALTER is needed to update an existing database or table.
  • DROP is needed to delete an existing database or table
CREATE DATABASE cornify;
DROP DATABASE cornify;

Create

The keywords CREATE TABLE followed by a random table name indicate that a table is being created. The following round brackets specify which fields the table contains and which data types are allowed in the respective fields.

A list of the most common data types can be found in the chapter Data Types.

CREATE TABLE unicorns(
    id SERIAL PRIMARY KEY,
    unicorn_name VARCHAR(200),
    salary INT,
);

DROP

Using the keywords DROP TABLE followed by the respective table name. A table can be deleted

DROP TABLE unicorns;

Alter

The keywords ALTER TABLE followed by the table name indicate which table is to be changed.

In the example below, a description field of data type TEXT is added to the unicorns table. More information on this topic will be given later in the Modifying Tables chapter.

ALTER TABLE unicorns ADD COLUMN description TEXT;

Data Types

Data types are needed to specify which value may be entered into the field of a table.

Data types can differ depending on the RDBMS. In this chapter, the most common data types are discussed. Special data types for PostgreSQL can be looked up here, for example.

The most common data types are:

  • Character
  • Numeric
  • Date
  • Boolean

Character Types

The character types can be chosen between CHAR(X), VARCHAR(X) and TEXT (The X indicates how many characters can be stored).

CHAR(X) and VARCHAR(X) are almost identical. The big difference is that the missing characters in CHAR(X) are filled with spaces.

TEXT can be used when a larger amount of text is to be stored. Like a blog post for example. The character type text stores a maximum size of 1GB (PostgreSQL).

Another special character type is the ENUM. In an ENUM you can specify which concrete text values are allowed. For example: Mr | Mrs | Mx

CREATE TYPE unicorn_status AS ENUM ('warrior', 'worker', 'magical_one');

CREATE TABLE unicorns(
    id SERIAL PRIMARY KEY,
    unicorn_name VARCHAR(200),
    unicorn_description TEXT,
    unicorn_type unicorn_status
);

Numeric Types

All Numeric Types for PostgreSQL can be read here. However, the most important ones are SMALLINT and INTEGER. As well as DECIMAL and NUMERIC.

SMALLINT and INTEGER are numbers without decimal places.

DECIMAL and NUMERIC are used to store numbers with decimal places.

REAL and DOUBLE are the same then DECIMAL or NUMERIC. The difference is, that REAL and DOUBLE have variable-precision that means they are inexact. But therefore much faster performance.

CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    area_name VARCHAR(300) NOT NULL,
    lat NUMERIC(5,2) NOT NULL,
    lng NUMERIC(5,2) NOT NULL -- Maximum five characters and two of them in the comma area
);

Date Types

All Date Types for PostgreSQL can be read here. The most important date types are DATE and TIMESTAMP.

DATE contains the current date without taking hours or minutes into account.

TIMESTAMP is needed to store hours and minutes with consideration of the time zone.

With the help of the DEFAULT keyword, default values can be stored for the respective field. For a date type this can be helpful in combination with the CURRENT_TIMESTAMP or CURRENT_DATE keyword.

CREATE TABLE conversations (
    id serial PRIMARY KEY,
    unicorn_name VARCHAR(200),
    clan_name VARCHAR(200),
    message TEXT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)

Boolean Type

The Boolean type accepts truthy or falsy values.

truthy:

  1. true
  2. 'yes'
  3. 'on'
  4. 1

Falsy:

  1. false
  2. 'no'
  3. 'off'
  4. 0

What about files

Databases are used to store primitive values. That means the path to a certain file can be stored in a database, not the file itself.


Default Values

A column can be assigned a default value. When a new row is created and no values are specified for some of the columns, those columns will be filled with their respective default values.

CREATE TABLE unicorns(
    id serial PRIMARY KEY,
    unicorn_name VARCHAR(200),
    salary INT DEFAULT NULL
);

Null:

The NULL value is needed if you want to indicate that there is no data for this field.

NULL VS 0: In the table below, Strangecorn has a value of NULL in the salary field. Let's assume that later on we want to calculate the average of all salaries. In this case, all records with the salary value of NULL would be excluded from the calculation.

However, if a 0 was entered as the value, the record would be included in the calculation and thus falsify the result.

Unicorns:

id unicorn_name salary unicorn_type
1 Friendcorn 5000 1
2 Lazycorn 7000 2
3 Strangecorn NULL 1

With a value of NULL the average can now be calculated correctly and requested via the AVG function.

SELECT AVG(salary) FROM unicorns;

Constraints

A Constraint restrict which values can actually be inserted into a column.

A distinction is made between different constraints, these would be:

  1. Check Constraints
  2. Not-Null Constraints
  3. Unique Constraints
  4. Primary Keys
  5. Foreign Keys
  6. Exclusion Constraints

Check Constraints

The CHECK constraint can be used to specify that the value in a particular column must satisfy a Boolean expression (truth value).

In the example below, it is specified that the column salary may only contain values that are greater than 1000.

CREATE TABLE unicorns(
    id serial PRIMARY KEY,
    unicorn_name VARCHAR(200),
    salary INT CHECK (salary > 1000)
);

With the CONSTRAINT keyword you can also give the constraint a separate name. This clarifies error messages and allows you to refer to the constraint when you need to change it.

CREATE TABLE unicorns(
    id serial PRIMARY KEY,
    unicorn_name VARCHAR(200),
    salary INT CONSTRAINT salary_price CHECK (salary > 1000)
);

Not Null Constraints

A not-null constraint simply specifies that a column must not assume the null value.

The example below specifies that the value for unicorn_name must not be NULL.

CREATE TABLE unicorns(
    id serial PRIMARY KEY,
    unicorn_name VARCHAR(200) NOT NULL,
    salary INT DEFAULT NULL CONSTRAINT salary_price CHECK (salary > 1000)
);

Unique Constraints

The UNIQUE constraint ensure that the value in a column is unique. This can be useful when assigning id's or email addresses.

CREATE TABLE unicorns(
    id SERIAL PRIMARY KEY,
    unicorn_name VARCHAR(200) NOT NULL,
    email: VARCHAR(200) NOT NULL UNIQUE,
    salary INT DEFAULT NULL CONSTRAINT salary_price CHECK (salary > 1000)
);

Primary Keys

The PRIMARY KEY indicates, that a value is unique. A PRIMARY KEY may only be assigned once per table.

CREATE TABLE unicorns(
    id SERIAL PRIMARY KEY,
    unicorn_name VARCHAR(200) NOT NULL,
    email: VARCHAR(200) NOT NULL UNIQUE,
    salary INT DEFAULT NULL CONSTRAINT salary_price CHECK (salary > 1000)
);

The PRIMARY KEY signals to the RDBMS that this field is the primary identifier of a record.


Foreign Keys

Foreign keys are used to ensure relationships between one or more tables.
This can be achieved using the REFERENCES keyword followed by the desired table name.

Unicorns:

id name age location_id
1 Friendcorn 175 1
2 Lazycorn 957 2
3 Strangecorn 5 1

Locations:

id location lat lng
1 Strange Town 175 33
2 Sleepy Hollow 55 199
CREATE TABLE unicorns (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    age INTEGER NOT NULL
    location_id  INTEGER NOT NULL REFERENCES locations (id)
);

CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    location VARCHAR(200) NOT NULL,
    lat INTEGER NOT NULL
    lng INTEGER NOT NULL
);

Please note: More about foreign and primary keys can be looked up in the Related Data chapter.


Modifying Tables

PostgreSQL provides a set of commands to make changes to an existing table. The records in the table will not be deleted.

Postgres provides solutions for the following scenarios.

  1. Add columns
  2. Remove columns
  3. Add constraints
  4. Remove constraints
  5. Change default values
  6. Change column data types
  7. Rename columns
  8. Rename tables

Add columns

As described in the section Working with Databases and Tables, ALTER TABLE is needed to update an existing table.

Using the keywords ADD COLUMN a new column can be created. Also the desired data type for the field must be specified.

ALTER TABLE unicorns ADD COLUMN description text;

Remove columns

The keyword DROP followed by the column name can be used to drop a column.

ALTER TABLE unicorns DROP COLUMN description;

Add constraints

To add a Constaraint to an existing field, the procedure is a little different. With the keywords ADD CONSTRAINT a new constraint can be created. This will be assigned a name of your choice. The constraint is not directly assigned to a field but results from the filter.

ALTER TABLE unicorns
ADD CONSTRAINT salary_range CHECK (salary > 1000 AND salary < 100000);

Remove constraints

To remove a constraint you need to know its name. If you gave it a name then that's easy.

ALTER TABLE unicorns
DROP CONSTRAINT salary_range;

Change default values

1. Changing a Column's Default Value:

After the table has been referenced, the desired column can be changed via ALTER COLUMN. The keyword SET allows to set a default value.

ALTER TABLE unicorns ALTER COLUMN is_happy SET DEFAULT TRUE;

2. Changing a Column's Data Type:

The keywords SET DATA TYPE can be used to change the data type of a column.

ALTER TABLE unicorns ALTER COLUMN salary SET DATA TYPE REAL;

This will succeed only if each existing entry in the column can be converted to the new type by an implicit cast. If a more complex conversion is needed, you can add a USING clause that specifies how to compute the new values from the old.
PostgreSQL will attempt to convert the column's default value (if any) to the new type, as well as any constraints that involve the column. But these conversions might fail, or might produce surprising results. It's often best to drop any constraints on the column before altering its type, and then add back suitably modified constraints afterwards.

3. Renaming a Column:

The keywords RENAME COLUMN can be used to rename a column. This is done by referencing the old column name followed by the keyword TO and the name of the new column.

ALTER TABLE unicorns RENAME COLUMN salary TO yearly_salary;

4. Renaming a Table:

A table can be renamed similarly to a column. An example is included below.

ALTER TABLE unicorns RENAME TO beautiful_unicorns;

CRUD

What are CRUD operations? CRUD stands for create, read, update, delete. So these are the basic data manipulation operations.

The following keywords are required for CRUD operations:

  • Create Data: INSERT INTO
  • Read Data: SELECT
  • Update Data: UPDATE
  • Delete Data: DELETE

Inserting Data

The column names in the round brackets can be omitted. Then, the column values must be entered exactly according to the order of the table.
If column names are used, the values are oriented according to these.

If you don't have values for all the columns, you can omit some of them. In that case, the columns will be filled with their default values.

-- insert single row
INSERT INTO <table name> (<column names>)
VALUES (<column values>);

--insert multiple rows
INSERT INTO <table name> (<column names>)
VALUES
    (<column values>), --row 1
    (<column values>), --row 2
    (<column values>); --row 3

Updating Data

The keyword UPDATE followed by the table name can be used to change column values. The SET keyword is used for this purpose. In one command either one column or several columns can be updated at the same time.

-- update single row
UPDATE <table name>
SET <column name> = <new value>
WHERE <condition>;

--update multiple rows
UPDATE <table name>
SET <column name> = <new value>, --row 1
    <column name> = <new value>, --row 2
    <column name> = <new value> --row 3
WHERE <condition>;

UPDATE also accepts a where clause. This will be discussed in more detail later in the Filtering Data chapter. A simple example of a where clause can be taken from below

UPDATE unicorns
SET
    food_amount = 1000,
    rainbow_hugs = 500
WHERE  is_happy IS NOT TRUE;

Deleting Data

The keywords DELETE FROM can be used to delete one or more columns from a table. Here a WHERE clause also can be used as a filter. If this is not specified, all columns in the table will be deleted.

-- deletes specific columns
DELETE FROM <table name>
WHERE <condition>;

-- deletes all columns in the table
DELETE FROM <table name>;

Query Data

The keyword SELECT can be used to query data from the database. After the SELECT keyword, a comma-separated list can be used to specify which columns from the database are needed. The keyword FROM indicates in which table the data records are located.
Also the SELECT statement accepts a WHERE clause, this can be used as a filter.

A wildcard ( * ) indicates that all columns in a table are needed.

-- get back customer_name and product_name columns
SELECT customer_name,
    product_name,
FROM sales;


-- get back all columns
SELECT *
FROM sales;

Addition: In the comma-separated list, the column names can be changed via the 'AS' keyword. Likewise arithmetic operations are possible.

SELECT customer_name,
    product_name,
    price - (price * 0.25) AS discounted_price
FROM sales
WHERE (price > 100) AND (is_canceled IS TRUE);

Filtering Data

For Update, DELETE and SELECT statements a where-clause can be used as a filter. This section explains in more detail how to write appropriate conditions for the WHERE clause.

The condition must evaluate to true, false, or unknown. It can be a boolean expression or a combination of boolean expressions using the AND and OR operators. The query returns only rows that satisfy the condition in the WHERE clause. In other words, only rows that cause the condition evaluates to true will be included in the result set.

SELECT *
FROM sales
WHERE (price > 100) AND (price < 1000);

SELECT *
FROM sales
WHERE (price > 100) OR (is_disputed IS FALSE);

Operators

To write conditions, operators are needed. A list of the most common operators can be seen below. A list of all operators can be found here.

Operator Description
= Equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
<> or != Not equal
AND Logical operator AND
OR Logical operator OR
IN Return true if a value matches any value in a list
BETWEEN Return true if a value is between a range of values
LIKE Return true if a value matches a pattern
IS NULL Return true if a value is NULL
IS FALSE Return true if a value is FALSE
NOT Negate the result of other operators

Equal Operator

SELECT id, unicorn_type
FROM unicorns
WHERE unicorn_name = 'MathHorn';

AND Operator

SELECT id, unicorn_type
FROM unicorns
WHERE unicorn_name = 'MathHorn' AND unicorn_type = 'magical_one';

OR Operator

SELECT id, unicorn_type
FROM unicorns
WHERE unicorn_name = 'MathHorn' OR unicorn_type = 'magical_one';

IN Operator

If you want to match a string with any string in a list, you can use the IN operator.

For example, the following statement returns an unicorn whose unicorn name is Lazycorn, or Strangecorn, or Pummelcorn.

SELECT id, unicorn_type
FROM unicorns
WHERE unicorn_name IN ('Lazycorn', 'Strangecorn', 'Pummelcorn')

LIKE Operator

To find a string that matches a specified pattern, you use the LIKE operator.

The following example returns all unicorns whose unicorn_name start with the string Laz.

SELECT id, unicorn_type
FROM unicorns
WHERE unicorn_name LIKE 'Laz%'

The Underscore ( _ ): An underscore (_) in pattern stands for (matches) any single character

The Percent Sign ( % ) : A percent sign (%) matches any sequence of zero or more characters.

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself

Examples:

SELECT *
FROM sales
WHERE product_name LIKE 'T%'; -- Finds any values that start with "T"
SELECT *
FROM sales
WHERE product_name LIKE '%JS%'; -- Finds any values that have "JS" in any position
SELECT *
FROM sales
WHERE product_name LIKE '%Book'; -- Finds any values that ends with "Book"

BETWEEN Operator

The BETWEEN operator returns true if a value is in a range of values.

SELECT *
FROM sales
WHERE price BETWEEN 20 AND 50;

Not Equal Operator

In this example, the customer_name must start with "Bernd". But, the customer_email must not be the same as "[email protected]".

SELECT *
FROM sales
WHERE
    customer_name LIKE 'Bernd%' AND
    customer_email <>  '[email protected]'

Working with Dates

The BETWEEN operator can also be used to find a period of time between two dates.

SELECT customer_name,
    product_name,
    price - (price * 0.25) AS discounted_price,
    date_created,
    date_fulfilled
FROM sales
WHERE date_fulfilled BETWEEN date_created AND '2023-05-01';

It is also possible to calculate with the date type. In the following example all sales are to be found where date_fullfilled is less than or equal to five days compared to date_created.

SELECT customer_name,
    product_name,
    price - (price * 0.25) AS discounted_price,
    date_created,
    date_fulfilled
FROM sales
WHERE (date_fulfilled - date_created )<= 5

ORDER BY

The keywords ORDER BY can be used to sort records by a specific column order. The default is by ascending order, with the keyword DESC the order can be changed to descending.

SELECT *
FROM sales
WHERE date_fulfilled IS NOT NULL
ORDER BY date_fulfilled DESC;

Of course, ORDER BY can also be used to order by a certain string. In the example below alphabetically by the column customer_name.

SELECT *
FROM sales
WHERE date_fulfilled IS NOT NULL
ORDER BY customer_name;

Limit

ORDER BY is also often used in connection with the LIMIT keyword. In the example below, only the three most recently completed sales are output.

SELECT *
FROM sales
WHERE date_fulfilled IS NOT NULL
ORDER BY date_fulfilled DESC
LIMIT 3;

OFFSET

The keyword OFFSET is needed to skip a certain number of rows. This can be very useful for a pagination.

SELECT *
FROM sales
WHERE date_fulfilled IS NOT NULL
ORDER BY id
LIMIT 5 OFFSET 3

DISTINCT

The keyword DISTINCT is used for dropping all dublicates in the result set.

SELECT DISTINCT customer_name
FROM sales

Related Data

As described in the previous chapters, it is possible to link several tables together. Via the so-called JOIN keyword the columns of the different tables can then be requested.

Unicorns:

id name age location_id
1 Friendcorn 175 1
2 Lazycorn 957 2
3 Strangecorn 5 1

Locations:

id location lat lng
1 Strange Town 175 33
2 Sleepy Hollow 55 199
SELECT u.name, u.age, l.location
FROM unicorns AS u
INNER JOIN locations AS l ON u.location_id = l.id;

Merged Data:

u.name u.age l.location
Friendcorn 175 Strange Town
Lazycorn 957 Sleepy Hollow
Strangecorn 5 Strange Town

In the example above the JOIN keyword was used to query different tables. This is possible because location_id in the unicorns table refers to the primary key of the location table.

The AS keyword was used to avoid a possible name-clash in the respective columns.


Data Normalization

Data Normalization is a concept that reduces data redundancy and increases data minimality. The goal is to split composite and grouped data into multiple, independent values.

Example: Let's assume a user table, in which the column full_name exists. In this column we cannot assume that data values are always stored in the same order. Is the first name in the first or second place? Are first and last name separated by a comma? A better approach in terms of data normalization would be to create two columns, one for the first name and one for the last name.

Equally important for Data Normalization is to split data items across multiple tables.

Example: Let's assume that the table user also has an address column. According to the principle in the example above, the address column would have to be split into postal code, city, street, house number. Now we can assume that address is a separate table. This can be linked to the column in the user table at any time using a foreign key.

Note when to work on Data Normalization:

  1. if columns in a table use the same prefix (address_number, address_street...).
  2. when several fields contain the same records.

Forms of Data Normalization

There are six forms of normalization, but these are very theoretical. Therefore, in this chapter they are converted into simple and practical rules.

The six normalization forms can be viewed here if you are interested.

Simple Rules:

  1. Avoid mixing data entities in the same table.
  2. Avoid multiple values in a single table cell.
  3. Try to avoid splitting basic data across dozens of tables. (Dont't overengineer)

Joins

Joins are needed to work with related data. A distinction is made here between different joins.

  1. inner join
  2. left join
  3. right join
  4. full outer join

Syntax


Inner Join

An INNER JOIN is needed to get the intersection or data from two tables. These must be connected to each other, for example, via a foreign key.

SELECT * FROM unicorns
INNER JOIN locations ON unicorns.location_id = locations.id;

In the example above the columns of locations and unicorns are requested via an INNER JOIN. To indicate the relationship between the two tables, the ON keyword is required. Behind this keyword it is indicated that the location_id from the unicorns table is linked with the id from the locations table via foreign key. The dot notation is needed to indicate which table it is. Without this it would not be clear whether the id from unicorns or locations is meant.

SELECT  u.unicorn_name, u.unicorn_status, l.area_name
FROM unicorns AS U
INNER JOIN locations AS l ON u.location_id = l.id;

To save characters, aliases can be used. This works via the AS keyword followed by the desired alias name.


Combining Multiple Joins

An SQL statement may contain not only one but several joins. This is needed if, for example, data from several tables are required. An example can be found below.

SELECT  u.unicorn_name, u.unicorn_status, l.area_name, c.city_name, c.population
FROM unicorns AS U
INNER JOIN locations AS l ON u.location_id = l.id
INNER JOIN cities AS c ON l.city_id = c.id;

Filtering

Joins can also work with a where clause as a filter. In the example below, only records are output where the population is below 9000 and the latitude starts with a 5.

::text can be used to cast a number into a string.

SELECT  u.unicorn_name, u.unicorn_status, l.area_name, c.city_name, c.population
FROM unicorns AS U
INNER JOIN locations AS l ON u.location_id = l.id
INNER JOIN cities AS c ON l.city_id = c.id
WHERE c.population >= 9000 AND l.lat::text LIKE '5%'
ORDER BY u.unicorn_name;

Left Join

To understand left joins, two new records are created in the locations table with a null value for city_id. An example can be found here
As in the previous chapter, an INNER JOIN only returns rows that have a location_id. So it takes the intersection.

SELECT u.unicorn_name, u.unicorn_status, l.area_name
FROM unicorns AS U
INNER JOIN locations AS l ON u.location_id = l.id;

A LEFT JOIN on the other hand refers to the left table as a whole. In the example below, this is locations. The column values like c.city_name, and c.population are then filled. Column values that cannot be filled because they do not reference another table are set to NULL.

SELECT c.city_name, c.population, l.area_name, l.lat
FROM locations AS l
LEFT JOIN cities AS c ON l.city_id = c.id

Right Join

In reality, right joins are used rather rarely, this is because all right joins can be replaced by a left join. A right join is simply the opposite of a left join. An example can be found below.

SELECT c.city_name, c.population, l.area_name, l.lat
FROM cities AS c
RIGHT JOIN locations AS l ON l.city_id = c.id

Data Relationships

Relationships between data can be classified into three categories (One-to-Many, Many-to-Many and One-to-One). These categories describe the relationship between the records in different tables.
A great example can be taken from here

One to Many

One-to-Many means, that one record in a table (A) has one or many related records in table (B).

Example: A unicorn belongs to one unicorn herd
but a unicorn herd has many unicorns.

Many to Many

Many-to-Many means, that one record in table (A) has one or many related tables in table (B) - and vice versa.

Example: A unicorn has many unicorn_friend
and every unicorn_friend has multiple unicorns.

One to One

One-to-One means, that one record in table (A) belongs to exactly one record in table (B) - and vice versa.

Example: A unicorn has one unique super_power
and one super_power belongs to exact one unicorn