Back-fill token.user_id #8516

Merged 1 commit into main on Feb 15, 2024
Conversation

seanh (Contributor) commented Feb 13, 2024

No description provided.

Comment on lines +26 to +39
tokens_query = select(Token).where(Token.user_id.is_(None)).limit(1000)
count = 0

while tokens := session.scalars(tokens_query).all():
for token in tokens:
username, authority = split_userid(token.userid)
token.user_id = session.scalars(
select(User.id).where(
User.username == username, User.authority == authority
)
).one()
count += 1
session.commit()
log.info("Back-filled %d token.user_id's", count)
seanh (Contributor, Author):
There are 312,655 tokens in production right now, so this will do 313 separate DB transactions. It'll do 313 SELECT token ... queries, 312,655 SELECT "user".id ... queries, and 312,655 UPDATE token ... queries.

I could add a time.sleep() in the loop to slow it down, to make sure it doesn't put too much pressure on the DB?
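
For illustration, something like this: the same loop as in the diff above, with a pause after each batch's commit (the one-second value is an arbitrary placeholder, not a tuned number):

import time

# Same back-fill loop as the diff above, with a pause between batches.
tokens_query = select(Token).where(Token.user_id.is_(None)).limit(1000)
count = 0

while tokens := session.scalars(tokens_query).all():
    for token in tokens:
        username, authority = split_userid(token.userid)
        token.user_id = session.scalars(
            select(User.id).where(
                User.username == username, User.authority == authority
            )
        ).one()
        count += 1
    session.commit()
    time.sleep(1)  # arbitrary one-second pause between batches to ease DB load

log.info("Back-filled %d token.user_id's", count)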

Member:

For big updates like this, without a DB with the same data it's difficult to make good guesses.

My intuition is that this would be much faster with fewer queries, doing more of the work in SQL instead of Python.

But this seems perfectly doable within the time scale of a migration.
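
For illustration, a rough sketch of the kind of set-based statement that comment alludes to (untested; it assumes token.userid is stored as 'acct:' || username || '@' || authority, i.e. the inverse of what split_userid() parses):

from sqlalchemy import text

# Single set-based UPDATE: joins token to "user" and back-fills
# token.user_id in one statement, assuming the userid encoding above.
session.execute(
    text(
        """
        UPDATE token
        SET user_id = u.id
        FROM "user" AS u
        WHERE token.user_id IS NULL
          AND token.userid = 'acct:' || u.username || '@' || u.authority
        """
    )
)
session.commit()

That would replace the 625,310 per-row queries with one statement, at the cost of one long-running transaction holding locks on the token table.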

seanh (Contributor, Author) commented Feb 14, 2024:

It's not speed I'm going for; I'm trying to make sure that running the migration doesn't disrupt other DB queries, e.g. by using up too many of the DB's resources at once (such as DB CPU) or by holding a lock on a table for too long (in this case the token table, while running the UPDATE query). So it's actually the opposite: I'm trying to make it go slower, and in separate transactions, so that other requests have a chance to jump in and get their work done in between these transactions. Lots of small queries, rather than one big one.

seanh changed the title from "back fill token.user id" to "Back-fill token.user_id" on Feb 13, 2024
seanh merged commit 14fbf34 into main on Feb 15, 2024 (9 checks passed)
seanh deleted the back-fill-token.user_id branch on February 15, 2024 at 08:54
seanh mentioned this pull request on Feb 15, 2024 (27 tasks)