Update our EDD process documentation (#166)
* Initial pass at updating our EDD database change processes

* Fix file name of new image

* Switch from 'rerunnable' to 'repeatable'

* Push the quick fixes from feedback

* Removed repeated use of Fowler's name as well as the repeated use of EDD

* Fix the image caption

* Removing all added personal pronouns

* Use markdown text styling

* Rename the application code version in the Phase definitions to be more clear

* Update terminology definitions

* Update language to be more focused

* Accepted introduction summary improvements

* Accepted suggested changes.

* Accept revised definition and examples of non-destructive changes

* Update docs/contributing/database-migrations/edd.mdx

Co-authored-by: Thomas Avery <[email protected]>

* additional updates

* revert to JSX to fix the broken image link

* Tweak the EDD doc (#184)

---------

Co-authored-by: Thomas Avery <[email protected]>
Co-authored-by: Oscar Hinton <[email protected]>
3 people authored Sep 26, 2023
1 parent f979c32 commit 9bd0a72
Showing 7 changed files with 224 additions and 69 deletions.
204 changes: 145 additions & 59 deletions docs/contributing/database-migrations/edd.mdx
@@ -1,16 +1,21 @@
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

# Evolutionary Database Design
# Evolutionary database design

At Bitwarden we follow
[Evolutionary Database Design (EDD)](https://en.wikipedia.org/wiki/Evolutionary_database_design).
EDD describes a process where the database schema is continuously updated while still ensuring
compatibility with older releases by using database transition phases.
At Bitwarden we follow [Evolutionary Database Design (EDD)][edd-wiki]. EDD describes a process where
the database schema is continuously updated while still ensuring compatibility with older releases
by defining database transition phases.

In short the Database Schema for the Bitwarden Server **must** support the previous release of the
server. The database migrations will be performed before the code deployment, and in the event of a
release rollback the database schema will **not** be updated.
Bitwarden also needs to support:

- **Zero-downtime deployments**: multiple versions of the application will be running concurrently
  during the deployment window.
- **Code rollback**: it should be possible to roll back code with critical defects to the previous
  version.

To fulfill these additional requirements the database schema **must** support the previous release
of the server.

<bitwarden>

@@ -24,26 +29,76 @@ For background on this decision please see the [Evolutionary Database Design RFD

## Design

### Nullable
Database changes fall into two categories: destructive and non-destructive changes
\[[1](./edd#further-reading)\]. A destructive change prevents existing functionality from working as
expected without an accompanying code change. A non-destructive change is the opposite: a database
change that does not require a code change for the application to continue working as expected.

### Non-destructive changes

Many database changes can be designed in a backwards-compatible manner by using a mix of nullable
fields and default values in the database tables, views, and stored procedures. This ensures that
the stored procedures can be called without the new columns, which allows them to run with both the
old and the new code.
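As a minimal T-SQL sketch of a non-destructive change, assume a hypothetical `[dbo].[Contact]` table and a hypothetical `[dbo].[Contact_Create]` stored procedure (neither is part of the Bitwarden schema). The new column is nullable and the new procedure parameter defaults to `NULL`, so code built for the previous release can ignore both.

```sql
-- Illustration only: create the hypothetical table so this sketch runs on its own.
IF OBJECT_ID(N'[dbo].[Contact]', N'U') IS NULL
    CREATE TABLE [dbo].[Contact] ([Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        [GivenName] NVARCHAR(50) NULL, [Surname] NVARCHAR(50) NULL);
GO

-- The new column is nullable, so rows written by code that does not know
-- about it remain valid. The guard makes the script safe to re-run.
IF COL_LENGTH(N'dbo.Contact', N'Nickname') IS NULL
    ALTER TABLE [dbo].[Contact] ADD [Nickname] NVARCHAR(50) NULL;
GO

-- The new parameter defaults to NULL, so the old code can keep calling the
-- procedure without it. CREATE OR ALTER requires SQL Server 2016 SP1 or later.
CREATE OR ALTER PROCEDURE [dbo].[Contact_Create]
    @Id UNIQUEIDENTIFIER,
    @GivenName NVARCHAR(50),
    @Surname NVARCHAR(50),
    @Nickname NVARCHAR(50) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO [dbo].[Contact] ([Id], [GivenName], [Surname], [Nickname])
    VALUES (@Id, @GivenName, @Surname, @Nickname);
END
GO
```

A call written for the previous release that passes only `@Id`, `@GivenName`, and `@Surname` keeps working, while the new code can also pass `@Nickname`.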

### Destructive changes

Any change that cannot be done in a non-destructive manner is a destructive change. This can be as
simple as adding a non-nullable column whose value needs to be computed from existing fields, or
renaming an existing column. To handle destructive changes, it's necessary to break them up into
three phases: _Start_, _Transition_, and _End_, as shown in the diagram below.

<figure>

![Refactoring Stages](./transitions.png)

<figcaption>Refactoring Phases</figcaption>

</figure>

It's worth noting that the _Refactoring Phases_ are usually rolling: the _End phase_ of one
refactor is the _Transition phase_ of another. The table below details which application releases
need to be supported during each database phase.

Database tables, views and stored procedures should almost always use either nullable fields or have
a default value. Since this will allow stored procedures to omit columns, which is a requirement
when running both old and new code.
| Database Phase | Release X | Release X+1 | Release X+2 |
| -------------- | --------- | ----------- | ----------- |
| Start          | ✅        | ❌          | ❌          |
| Transition     | ✅        | ✅          | ❌          |
| End            | ❌        | ✅          | ✅          |

### EDD Process
### Migrations

The EDD breaks up each database migration into three phases. _Start_, _Transition_ and _End_.
The three migrations described in the diagram above are the _Initial migration_, the _Transition
migration_, and the _Finalization migration_.

![Refactoring Stages](./stages_refactoring.jpg)
[https://www.martinfowler.com/articles/evodb.html#TransitionPhase](https://www.martinfowler.com/articles/evodb.html#TransitionPhase)
#### Initial migration

This necessitates two different database migrations. The first migration adds new content and is
backwards compatible with the existing code. The second migration removes content and is not
backwards compatible with that same code prior to the first migration.
The initial migration runs before the code deployment, and its purpose is to add support for
_Release X+1_ without breaking support for _Release X_. To ensure zero downtime, the migration
should execute quickly and must not contain any costly operations.
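For example, suppose (hypothetically) that a `[dbo].[Contact]` table needs a non-nullable `[DisplayName]` column computed from its existing `[GivenName]` and `[Surname]` columns. A minimal T-SQL sketch of the initial migration adds the column as nullable, which keeps _Release X_ working. Because the column is nullable and has no default, adding it is a metadata-only change in SQL Server, so it stays quick regardless of table size. The table, columns, and sizes are assumptions for illustration.

```sql
-- Illustration only: create the hypothetical table so this sketch runs on its own.
IF OBJECT_ID(N'[dbo].[Contact]', N'U') IS NULL
    CREATE TABLE [dbo].[Contact] ([Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        [GivenName] NVARCHAR(50) NULL, [Surname] NVARCHAR(50) NULL);
GO

-- Initial migration (hypothetical): add the column as nullable so Release X,
-- which never writes it, keeps working. NVARCHAR(101) fits GivenName + ' ' + Surname.
-- The guard makes the script safe to re-run.
IF COL_LENGTH(N'dbo.Contact', N'DisplayName') IS NULL
    ALTER TABLE [dbo].[Contact] ADD [DisplayName] NVARCHAR(101) NULL;
GO
```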

#### Transition migration

The transition migration is run at some point during the transition phase. It provides an optional
place for data migrations that would be too slow, would put too much load on the database, or would
otherwise be unsuitable for the _Initial migration_.

- Compatible with both _Release X_ **and** _Release X+1_ of the application.
- Only data population migrations may be run at this time, and only if they are needed.
- Must be run as a background task during the Transition phase.
- Operations are batched or otherwise optimized to ensure the database stays responsive.
- Schema changes are NOT to be run during this phase.
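Assume a hypothetical `[dbo].[Contact]` table that has just gained a nullable `[DisplayName]` column, which must be filled from the existing `[GivenName]` and `[Surname]` columns. A transition migration for it might look like the T-SQL sketch below, which backfills in small batches; the batch size of 1000 is an arbitrary assumption. Filtering on `[DisplayName] IS NULL` keeps the script repeatable: rerunning it only touches rows that are still missing a value.

```sql
-- Illustration only: recreate the state left by the initial migration.
IF OBJECT_ID(N'[dbo].[Contact]', N'U') IS NULL
    CREATE TABLE [dbo].[Contact] ([Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        [GivenName] NVARCHAR(50) NULL, [Surname] NVARCHAR(50) NULL);
GO
IF COL_LENGTH(N'dbo.Contact', N'DisplayName') IS NULL
    ALTER TABLE [dbo].[Contact] ADD [DisplayName] NVARCHAR(101) NULL;
GO

-- Transition migration (hypothetical): backfill in small batches so each UPDATE
-- holds its locks only briefly and the database stays responsive.
DECLARE @BatchSize INT = 1000;
DECLARE @RowsUpdated INT = 1;

WHILE @RowsUpdated > 0
BEGIN
    -- CONCAT treats NULL inputs as empty strings, so every updated row ends up
    -- non-NULL and the loop always makes progress.
    UPDATE TOP (@BatchSize) [dbo].[Contact]
    SET [DisplayName] = CONCAT([GivenName], N' ', [Surname])
    WHERE [DisplayName] IS NULL;

    SET @RowsUpdated = @@ROWCOUNT;
END
GO
```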

#### Finalization migration

The finalization migration removes the temporary measures that were needed to retain backwards
compatibility with _Release X_; from then on, the database schema only supports _Release X+1_ and
later releases. These migrations are run as part of the deployment of _Release X+2_.
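As a hypothetical example, assume a `[dbo].[Contact]` table gained a nullable `[DisplayName]` column (computed from `[GivenName]` and `[Surname]`) in its initial migration, and that _Release X+1_ always writes it. A finalization migration could then tighten the column to `NOT NULL`. This T-SQL sketch first fills any rows that are still `NULL` (for example, rows written by _Release X_ after a rollback), because `ALTER COLUMN ... NOT NULL` fails while any `NULL` values remain.

```sql
-- Illustration only: recreate the state left by the earlier phases.
IF OBJECT_ID(N'[dbo].[Contact]', N'U') IS NULL
    CREATE TABLE [dbo].[Contact] ([Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        [GivenName] NVARCHAR(50) NULL, [Surname] NVARCHAR(50) NULL);
GO
IF COL_LENGTH(N'dbo.Contact', N'DisplayName') IS NULL
    ALTER TABLE [dbo].[Contact] ADD [DisplayName] NVARCHAR(101) NULL;
GO

-- Finalization migration (hypothetical), run with the deployment of Release X+2.
-- Safety net: fill any rows that are still NULL before adding the constraint.
UPDATE [dbo].[Contact]
SET [DisplayName] = CONCAT([GivenName], N' ', [Surname])
WHERE [DisplayName] IS NULL;
GO

-- Release X is no longer supported, so the column can become NOT NULL.
ALTER TABLE [dbo].[Contact] ALTER COLUMN [DisplayName] NVARCHAR(101) NOT NULL;
GO
```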

### Example

Lets look at an example, the rename column refactor is shown in the image below.
Let's look at an example. The rename column refactor is shown in the image below.

![Rename Column Refactor](./rename-column.gif)

@@ -73,7 +128,7 @@ actions.
:::

<Tabs>
<TabItem value="first" label="First Migration" default>
<TabItem value="first" label="Initial Migration" default>

```sql
-- Add Column
@@ -120,7 +175,7 @@ END
```

</TabItem>
<TabItem value="data" label="Data Migration">
<TabItem value="data" label="Transition Migration">

```sql
UPDATE [dbo].Customer SET
@@ -129,7 +184,7 @@ WHERE FirstName IS NULL
```

</TabItem>
<TabItem value="second" label="Second Migration">
<TabItem value="second" label="Finalization Migration">

```sql
-- Remove Column
```
@@ -173,65 +228,96 @@ END
</TabItem>
</Tabs>

## Workflow
## Deployment orchestration

There are some important constraints to the implementation of the process:

- Bitwarden Production environments are required to be on at all times
- Self-host instances must support the same database change process; however, they do not have the
same always-on application constraint
- Minimization of manual steps in the process

Supporting all of these constraints makes the process complex. Below is an image of a state machine
that helps visualize the process and what it supports. It assumes that all database changes follow
the standards laid out in [Migrations](./).

---

![Bitwarden EDD State Machine](./edd_state_machine.jpg) \[Open Image in a new tab for better
viewing\]

---

The Bitwarden specific workflow for writing migrations are described below.
### Online environments

### Developer
Schema migrations and data migrations are just migrations. The underlying implementation challenge
is orchestrating the runtime constraints on each migration. Eventually, all migrations end up in
`DbScripts`. However, to orchestrate the running of _Transition_ and associated _Finalization_
migrations, they are kept outside of `DbScripts` until the correct time.

The development flow is described in [Migrations](./).
In environments with always-on applications, _Transition_ scripts must be run after the new code has
been rolled out. To execute a full deploy, all new migrations in `DbScripts` are run, the new code
is rolled out, and then all _Transition_ migrations in the `DbScripts_transition` directory are run
as soon as all of the new code services are online. In the case of a critical failure after the new
code is rolled out, a rollback is conducted (see Rollbacks below). _Finalization_ migrations are not
run until the start of the next deploy, when they are moved into `DbScripts`.

### Devops
After this deploy, to prepare for the next release, all migrations in `DbScripts_transition` are
moved to `DbScripts`, and then all migrations in `DbScripts_finalization` are moved to `DbScripts`,
preserving their execution order for a clean install. Under the current branching strategy, these
PRs will be opened against `master` when `rc` is cut to prepare for this release. The PR automation
will also handle renaming the migration file and updating any references to `[dbo_finalization]` to
`[dbo]`.

#### On `rc` cut
The next deploy will pick up the newly added migrations in `DbScripts`, mark the previously
repeatable _Transition_ migrations as no longer repeatable, execute the _Finalization_ migrations,
and then execute any new migrations associated with the code changes that are about to go out.

Create a PR moving the future scripts.
The state of the migrations in the different directories at any one time is saved and versioned in
the Migrator Utility, which supports the phased migration process in both types of environments.

- `DbScripts_future` to `DbScripts`, prefix the script with the current date, but retain the
existing date.
- `dbo_future` to `dbo`.
<bitwarden>
<li>
Create a ticket in Jira with a `Due Date` of the release date to ensure future migrations are
merged in and ready to be executed. Set the ticket that created the future migration as a
blocker.
</li>
</bitwarden>
### Offline environments

#### After server release
The process for offline environments is similar to the always-on ones. However, since they do not
have the constraint of always being on, the _Initial_ and _Transition_ migrations will be run one
after the other:

1. Run whatever data migration scripts might be needed. (This might need to be batched and executed
until all the data has been migrated)
2. After having the server run for a while execute the future migration script to clean up the
database.
- Stop the Bitwarden stack as done today
- Start the database
- Run all new migrations in `DbScripts` (both _Finalization_ migrations from the last deploy and any
_Initial_ migrations from the deploy currently going out)
- Run all _Transition_ migrations
- Restart the Bitwarden stack.

## Rollbacks

In the event that a server release fails and needs to be rolled back, it should be as simple as
re-deploying the previous version. The database will **stay** in the transition phase until a
hotfix can be released, and the server can be updated.
patch can be released and the server can be updated. Once a patch is ready to go out, it is
deployed and the _Transition_ migrations are rerun to verify that the database is in the required
state.

The goal is to resolve the issue quickly and re-deploy the fixed code to minimize the time the
database stays in the transition phase. Should a feature need to be completely pulled, a new
migration needs to be written to undo the database changes and the future migration will also need
to be updated to work with the database changes. This is generally not recommended since pending
migrations (for other releases) will need to be revisited.
Should a feature need to be completely pulled, a new migration needs to be written to undo the
database changes, and the _Finalization_ migration will also need to be updated to work with those
database changes. This is generally not recommended, since pending migrations (for other releases)
will need to be revisited.
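A compensating migration is often a guarded version of the reverse change. The T-SQL sketch below assumes a hypothetical feature that added a `[DisplayName]` column to a hypothetical `[dbo].[Contact]` table. Stored procedures that reference the column should be reverted in the same migration or earlier: SQL Server does not block dropping a column that ordinary procedures reference, so those procedures would instead fail when they next run.

```sql
-- Illustration only: create the hypothetical table so this sketch runs on its own.
IF OBJECT_ID(N'[dbo].[Contact]', N'U') IS NULL
    CREATE TABLE [dbo].[Contact] ([Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        [GivenName] NVARCHAR(50) NULL, [Surname] NVARCHAR(50) NULL);
GO

-- Compensating migration (hypothetical): undo the column added by the pulled feature.
-- Drop any default constraints or indexes on the column first; SQL Server refuses to
-- drop a column that they still depend on. The guard makes the script safe to re-run.
IF COL_LENGTH(N'dbo.Contact', N'DisplayName') IS NOT NULL
    ALTER TABLE [dbo].[Contact] DROP COLUMN [DisplayName];
GO
```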

## Testing

Prior to merging a PR, please ensure that the database changes work with the currently released
version. There is currently no automated test suite for this, so it is up to developers to ensure
that their database changes run correctly against the currently released version.
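One lightweight manual check, sketched here in T-SQL under stated assumptions, is to replay a write the way the currently released version performs it, inside a transaction that is always rolled back. The `[dbo].[Contact]` table and its columns are hypothetical; in practice the replayed statement would mirror the released version's own stored procedure calls.

```sql
-- Illustration only: create the hypothetical table so this sketch runs on its own.
IF OBJECT_ID(N'[dbo].[Contact]', N'U') IS NULL
    CREATE TABLE [dbo].[Contact] ([Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        [GivenName] NVARCHAR(50) NULL, [Surname] NVARCHAR(50) NULL);
GO

-- Replay the released version's insert, which does not know about any newly added
-- columns. If a new column is NOT NULL without a default, this INSERT fails.
BEGIN TRANSACTION;

BEGIN TRY
    INSERT INTO [dbo].[Contact] ([Id], [GivenName], [Surname])
    VALUES (NEWID(), N'Smoke', N'Test');
    PRINT 'The released version''s write path still works.';
END TRY
BEGIN CATCH
    PRINT 'The released version''s write path is broken: ' + ERROR_MESSAGE();
END CATCH

-- Never keep the test row.
ROLLBACK TRANSACTION;
GO
```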

## Further Reading
## Further reading

- [Evolutionary Database Design](https://martinfowler.com/articles/evodb.html) (Particularly
[All database changes are database refactorings](https://martinfowler.com/articles/evodb.html#AllDatabaseChangesAreMigrations))
- [The Agile Data (AD) Method](http://agiledata.org/) (Particularly
[Catalog of Database Refactorings](http://agiledata.org/essays/databaseRefactoringCatalog.html))
- [Refactoring Databases: Evolutionary Database](https://databaserefactoring.com/)
- Refactoring Databases: Evolutionary Database Design (Addison-Wesley Signature Series (Fowler))
ISBN-10: 0321774515
1. [Evolutionary Database Design](https://martinfowler.com/articles/evodb.html) (Particularly
[All database changes are database refactorings](https://martinfowler.com/articles/evodb.html#AllDatabaseChangesAreMigrations))
2. [The Agile Data (AD) Method](http://agiledata.org/) (Particularly
[Catalog of Database Refactorings](http://agiledata.org/essays/databaseRefactoringCatalog.html))
3. [Refactoring Databases: Evolutionary Database](https://databaserefactoring.com/)
4. Refactoring Databases: Evolutionary Database Design (Addison-Wesley Signature Series (Fowler))
ISBN-10: 0321774515

[edd-wiki]: https://en.wikipedia.org/wiki/Evolutionary_database_design
[edd-rfd]:
https://bitwarden.atlassian.net/wiki/spaces/PIQ/pages/177701412/Adopt+Evolutionary+database+design
31 changes: 21 additions & 10 deletions docs/contributing/database-migrations/index.md
@@ -2,9 +2,9 @@
sidebar_position: 2
---

# Database Migrations
# Database migrations

## Applying Migrations
## Applying migrations

We use a `migrate.ps1` PowerShell script to apply migrations to the local development database. This
script handles the different database providers that we support.
@@ -13,12 +13,12 @@ For instructions on how to use `migrate.ps1`, see the Getting Started section fo
[MSSQL](../../getting-started/server/database/mssql/index.md#updating-the-database) and
[Entity Framework](../../getting-started/server/database/ef/index.mdx#migrations).

## Creating Migrations for New Changes
## Creating migrations for new changes

Any database change must be scripted as a migration for both our primary DBMS (MSSQL) and Entity
Framework. Follow the instructions below for each provider.

### MSSQL Migrations
### MSSQL migrations

:::tip

@@ -37,24 +37,24 @@ It is possible that a change may not require a non-backwards-compatible end phas
may be backwards-compatible in their final form). In that case, only one phase of changes is
required.

#### Backwards Compatible Migration
#### Backwards compatible migration

1. Modify the source `.sql` files in `src/Sql/dbo`.
2. Write a migration script, and place it in `util/Migrator/DbScripts`. Each script must be prefixed
with the current date.

#### Non-Backwards Compatible Migration
#### Non-backwards compatible migration

1. Copy the relevant `.sql` files from `src/Sql/dbo` to `src/Sql/dbo_future`.
1. Copy the relevant `.sql` files from `src/Sql/dbo` to `src/Sql/dbo_finalization`.
2. Remove the backwards compatibility that is no longer needed.
3. Write a new Migration and place it in `src/Migrator/DbScripts_future`. Name it
`YYYY-0M-FutureMigration.sql`.
3. Write a new Migration and place it in `src/Migrator/DbScripts_finalization`. Name it
`YYYY-0M-FinalizationMigration.sql`.
   - Typically, migrations are designed to be run in sequence. However, since the migrations in
     `DbScripts_finalization` can be run out of order, care must be taken to ensure they remain
     compatible with the changes to `DbScripts`. To achieve this, we only keep a single migration,
     which executes all backwards-incompatible schema changes.

### EF Migrations
### EF migrations

If you alter the database schema, you must create an EF migration script to ensure that EF databases
keep pace with these changes. Developers must do this and include the migrations with their PR.
@@ -72,4 +72,15 @@ pwsh ef_migrate.ps1 [NAME_OF_MIGRATION]

This will generate the migrations, which should then be included in your PR.

### [Not Yet Implemented] Manual MSSQL migrations

There may be a need to run a migration outside of our normal update process. These types of
migrations should be reserved for exceptional cases. One such case could be an index rebuild.

1. Write a new migration, prefixed with the current date, and place it in
   `src/Migrator/DbScripts_manual`.
2. After it has been run against our Cloud environments and we are satisfied with the outcome,
   create a PR to move it to `DbScripts`. This will enable it to be run by our Migrator processes in
   self-host and clean installs of both cloud and self-host environments.
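For the index rebuild case, a manual migration could be as small as a single statement. The T-SQL sketch below targets a hypothetical `[dbo].[Contact]` table and is not one of Bitwarden's scripts.

```sql
-- Illustration only: create the hypothetical table so this sketch runs on its own.
IF OBJECT_ID(N'[dbo].[Contact]', N'U') IS NULL
    CREATE TABLE [dbo].[Contact] ([Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        [GivenName] NVARCHAR(50) NULL, [Surname] NVARCHAR(50) NULL);
GO

-- Manual migration (hypothetical): rebuild every index on the table.
-- On editions that support online index operations, adding WITH (ONLINE = ON)
-- keeps the table available for queries while the indexes are rebuilt.
ALTER INDEX ALL ON [dbo].[Contact] REBUILD;
GO
```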

[code-style-sql]: ../code-style/sql.md