Add draft ADR for bigint id column migration #4191

johha · 2025-01-28T11:47:10Z

To avoid overflows of the id primary key column we should migrate it to bigint.
The purpose of this ADR is to find a good approach and to document all relevant decisions.

Feedback is highly appreciated 🙂

jochenehret

Are there any considerations for the Ruby sequel library necessary?

decisions/0013-migrating-int-to-bigint-for-primary-keys.md

Add draft ADR for bigint id column migration Co-authored-by: Johannes Haass <[email protected]>

decisions/0013-migrating-int-to-bigint-for-primary-keys.md

Samze · 2025-02-12T17:48:32Z

Just as a heads up, even though MySQL 5.7 is EOL. Since its still available from certain cloud providers, we technically still support it until 2025-07-31.

So if we're thinking of doing this before that date, we'd need to test against that MySQL version.

johha · 2025-02-14T07:49:33Z

Just as a heads up, even though MySQL 5.7 is EOL. Since its still available from certain cloud providers, we technically still support it until 2025-07-31.

So if we're thinking of doing this before that date, we'd need to test against that MySQL version.

👍 According to the mysql docs there should no downtime when renaming columns (source). Still someone who has access to a copy of a productive ccdb should try out the migration procedure.

Samze · 2025-02-24T16:12:27Z

decisions/0013-migrating-int-to-bigint-for-primary-keys.md

+- Backfill will be executed outside the migration due to its potentially long runtime.
+- If necessary the backfill will run for multiple weeks to ensure all records are processed.
+
+#### Step 3a - Migration Pre Check


question Are 3a/3b steps in different cf-deployment releases? And does this mean that people have to upgrade in sequence for this migration to be successful?

Step 3a is just a check to ensure we can start the migration.
The plan is to ship 3a & 3b with the same CAPI/CF-D release. We will have one major release for steps 1 & 2 which prepares the migration and another major for steps 3a & 3b for the actual migration.
Between those two releases there should be reasonable time - maybe something like 60 days.

Whats the behaviour if someone goes from a cfd version before this to a cfd version that contains all the changes?

Will the migration still work just with a long deploy time?

One issue for us is that we do not enforce sequential updates of patch releases for customers. So it'll be difficult for us to do this phased approach.

CF-D has a rule that major versions cannot be skipped (Docs):

No major version must be skipped when updating cf-deployment. If an operator skips a major version, a seamless update is no longer guaranteed. This additional constraint is necessary for features which must be rolled out step-by-step, e.g. complex database migrations.

Samze · 2025-02-24T16:20:36Z

decisions/0013-migrating-int-to-bigint-for-primary-keys.md

+- Add a `CHECK` constraint to verify that id_bigint is fully populated (`id_bigint == id & id_bigint != NULL`).
+- In case the backfill is not yet complete or the `id_bigint` column is not fully populated the migration exits gracefully and is retried in the next deploy.
+#### Step 3b - Actual Migration
+- Remove the `CHECK` constraint once verified.


question Do we need to know that the check constraint was successfully added in 3a before we can start the migration?

Adding the check constraint in 3a would fail if there are still missing entries.

Samze · 2025-02-24T16:21:04Z

decisions/0013-migrating-int-to-bigint-for-primary-keys.md

+
+#### Step 3a - Migration Pre Check
+- Add a `CHECK` constraint to verify that id_bigint is fully populated (`id_bigint == id & id_bigint != NULL`).
+- In case the backfill is not yet complete or the `id_bigint` column is not fully populated the migration exits gracefully and is retried in the next deploy.


question Could you clarify migration exits gracefully? Does the deploy still succeed?

Maybe it would be actually better if the migration fails and then the next deploy would retry it.
We'll check this again.

+1 for letting the migration fail if backfill is not yet complete.

This will be the case when an update over multiple major cf-d versions gets applied so that step 1 and step 3 run directly one after each other (i.e. step 2 = backfill doesn't have a chance to run).
We need a procedure how to resolve such situations. E.g. running the backfill manually as a rake task.

nookala · 2025-02-24T20:05:10Z

decisions/0013-migrating-int-to-bigint-for-primary-keys.md

+Unknown at this point. Further investigation is needed.
+
+### Rollback Mechanism
+The old `id` column is no longer retained, as the `CHECK` constraint ensures correctness during migration.


Why not rename the old id column and drop later if required?

We thought about that but if we keep the column by renaming it to e.g. id_old it will be nil for newly inserted rows as it is no longer the primary key.
By adding some function and trigger similar to step 1 id_old could be still filled but this adds additional complexity and we would need another migration step to clean up id_old.
As the migration step 3b is executed in a single transaction it will be rolled backed automatically if something goes wrong. Thus we don't need the id_old column

abg · 2025-02-25T16:30:40Z

In MySQL these sort PK constraint changes (dropping, adding or dropping+adding concurrently) suggested in this proposal will result in table rebuild(s):

https://dev.mysql.com/doc/refman/8.4/en/innodb-online-ddl-operations.html#online-ddl-primary-key-operations

A MySQL primary key is "clustered" and referenced in every secondary index as a sort of "row pointer" so fiddling with primary keys like this is a very expensive operation in MySQL.

In some MySQL environments, this may not block writes if applied appropriately with the right database configuration. Environments using clustering solutions like Galera (e.g. pxc-release in cf-deployment), DDL is a synchronization point for replication and is going to stall writes to the database until the operation is complete.

Typically the MySQL community's approach to this class of problem is creating a new empty "ghost" / "shadow" table with the desired structure and doing a copy / "backfill" operation (typically similar to the process proposed here), and swapping tables at the end.

Some examples of "state-of-the-art" from the MySQL community:

Foreign keys add a lot of difficulty to this process - although that's not necessarily specific to MySQL.

sethboyles · 2025-02-27T22:30:25Z

@johha what do you think about making this an opt-in process? Either the whole process, or at least making Phase 3 able to be deferred at Operator discretion.

We're still in the process of validating this, but we don't think this is a concern for us in the short to medium term. The fact that this is a multi phase process will make it hard to coordinate (as we don't use CFD). We'd prefer to choose when we have our foundations run these migrations.

I'm not sure how easy it would be to set up a second migrations directory or schema_migrations table, and what issues might arise from having it uncertain if id columns are int or bigint (I think that the ORM will take care of that for us, though).

The tables mentioned (events, delayed_jobs, jobs, and app_usage_events) are probably ones that could easily have their increment markers reset, so if it became an issue for us I think that's the first thing solution we would look toward.

johha · 2025-03-03T15:37:31Z

@sethboyles I understand that smaller foundations will probably never run into this problem and thus would like have to option to skip the bigint migration.
The easiest way to implement this would be to add a config flag and if this flag is set all related migrations will result in a no-op (they would still appear in the schema_versions table). Operators choosing this option would be required to apply those migrations manually when the id gets close to a overflow.
Therefore I'd prefer a opt-out process so that operators need to take a conscious decision.
Would this be a valid option for you?

We're also discussing currently if we should change the default type for id columns to bigint when creating new schemas. With this at least newly created foundations would never run into a overflow situation. Any thoughts on that?

sethboyles · 2025-03-05T22:54:13Z

I think that's a potential solution. There's still a chance that instead of never running into this problem it won't be a problem for 5-10 years, and then we will need to think if we want some different solution. We're still collecting numbers to validate this.

For new foundations to have bigint as the default instead of int, that seems reasonable to me.

johha · 2025-03-06T08:51:35Z

I added a description of the opt out procedure and some mysql specifics to the ADR.

johha force-pushed the bigint-migration-adr branch 3 times, most recently from 8790c52 to 41c4af3 Compare February 4, 2025 13:37

jochenehret reviewed Feb 4, 2025

View reviewed changes

decisions/0013-migrating-int-to-bigint-for-primary-keys.md Outdated Show resolved Hide resolved

decisions/0013-migrating-int-to-bigint-for-primary-keys.md Outdated Show resolved Hide resolved

stephanme reviewed Feb 5, 2025

View reviewed changes

c874d65

Add draft ADR for bigint id column migration Co-authored-by: Johannes Haass <[email protected]>

johha force-pushed the bigint-migration-adr branch from 41c4af3 to c874d65 Compare February 6, 2025 07:40

Remove id_old column related steps

6908e4a

johha requested review from stephanme and jochenehret February 7, 2025 08:23

stephanme requested changes Feb 7, 2025

View reviewed changes

Remove general rollback section

e7ca2eb

stephanme previously approved these changes Feb 24, 2025

View reviewed changes

johha marked this pull request as ready for review February 24, 2025 15:57

Samze reviewed Feb 24, 2025

View reviewed changes

nookala reviewed Feb 24, 2025

View reviewed changes

Add opt-out flag and mysql specifics

ba7d81c

johha dismissed stephanme’s stale review via ba7d81c March 6, 2025 08:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add draft ADR for bigint id column migration #4191

Add draft ADR for bigint id column migration #4191

johha commented Jan 28, 2025

jochenehret left a comment

Samze commented Feb 12, 2025

johha commented Feb 14, 2025

Samze Feb 24, 2025 •

edited

Loading

johha Feb 25, 2025

Samze Feb 25, 2025 •

edited

Loading

johha Feb 25, 2025

Samze Feb 24, 2025

johha Feb 25, 2025

Samze Feb 24, 2025

johha Feb 25, 2025

stephanme Mar 7, 2025

nookala Feb 24, 2025

johha Feb 25, 2025

abg commented Feb 25, 2025

sethboyles commented Feb 27, 2025 •

edited

Loading

johha commented Mar 3, 2025

sethboyles commented Mar 5, 2025

johha commented Mar 6, 2025

Add draft ADR for bigint id column migration #4191

Are you sure you want to change the base?

Add draft ADR for bigint id column migration #4191

Conversation

johha commented Jan 28, 2025

jochenehret left a comment

Choose a reason for hiding this comment

Samze commented Feb 12, 2025

johha commented Feb 14, 2025

Samze Feb 24, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Samze Feb 25, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abg commented Feb 25, 2025

sethboyles commented Feb 27, 2025 • edited Loading

johha commented Mar 3, 2025

sethboyles commented Mar 5, 2025

johha commented Mar 6, 2025

Samze Feb 24, 2025 •

edited

Loading

Samze Feb 25, 2025 •

edited

Loading

sethboyles commented Feb 27, 2025 •

edited

Loading