Back-fill token.user_id #8516

Merged 1 commit into main on Feb 15, 2024
Conversation

seanh (Contributor) commented Feb 13, 2024

No description provided.

Comment on lines +26 to +39
tokens_query = select(Token).where(Token.user_id.is_(None)).limit(1000)
count = 0

while tokens := session.scalars(tokens_query).all():
for token in tokens:
username, authority = split_userid(token.userid)
token.user_id = session.scalars(
select(User.id).where(
User.username == username, User.authority == authority
)
).one()
count += 1
session.commit()
log.info("Back-filled %d token.user_id's", count)
seanh (Contributor, Author):
There are 312,655 tokens in production right now, so this will do 313 separate DB transactions. It'll do 313 SELECT token ... queries, 312,655 SELECT "user".id ... queries, and 312,655 UPDATE token ... queries.

I could add a time.sleep() in the loop to slow it down, to make sure it doesn't put too much pressure on the DB?
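
For illustration, something like this: the same loop as in the diff above, with a pause after each batch's commit (the one-second value is an arbitrary placeholder, not a tuned number):

import time

# Same back-fill loop as the diff above, with a pause between batches.
tokens_query = select(Token).where(Token.user_id.is_(None)).limit(1000)
count = 0

while tokens := session.scalars(tokens_query).all():
    for token in tokens:
        username, authority = split_userid(token.userid)
        token.user_id = session.scalars(
            select(User.id).where(
                User.username == username, User.authority == authority
            )
        ).one()
        count += 1
    session.commit()
    time.sleep(1)  # arbitrary one-second pause between batches to ease DB load

log.info("Back-filled %d token.user_id's", count)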

Member:

For big updates like this, without a DB with the same data it's difficult to make good guesses.

My intuition is that this would be much faster with fewer queries, doing more of the work in SQL instead of Python.

But this seems perfectly doable within the time scale of a migration.
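
For illustration, a rough sketch of the kind of set-based statement that comment alludes to (untested; it assumes token.userid is stored as 'acct:' || username || '@' || authority, i.e. the inverse of what split_userid() parses):

from sqlalchemy import text

# Single set-based UPDATE: joins token to "user" and back-fills
# token.user_id in one statement, assuming the userid encoding above.
session.execute(
    text(
        """
        UPDATE token
        SET user_id = u.id
        FROM "user" AS u
        WHERE token.user_id IS NULL
          AND token.userid = 'acct:' || u.username || '@' || u.authority
        """
    )
)
session.commit()

That would replace the 625,310 per-row queries with one statement, at the cost of one long-running transaction holding locks on the token table.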

seanh (Contributor, Author) commented Feb 14, 2024:

It's not speed I'm going for; I'm trying to make sure that running the migration doesn't disrupt other DB queries, e.g. by using up too many of the DB's resources at once (such as DB CPU) or by holding a lock on a table for too long (in this case the token table, while running the UPDATE query). So it's actually the opposite: I'm trying to make it go slower, and in separate transactions, so that other requests have a chance to jump in and get their work done in between these transactions. Lots of small queries, rather than one big one.

seanh changed the title from "back fill token.user id" to "Back-fill token.user_id" on Feb 13, 2024
seanh merged commit 14fbf34 into main on Feb 15, 2024 (9 checks passed)
seanh deleted the back-fill-token.user_id branch on February 15, 2024 at 08:54
seanh mentioned this pull request on Feb 15, 2024 (27 tasks)