Back-fill token.user_id
#8516
Force-pushed from 98f2250 to a9884fd
```python
tokens_query = select(Token).where(Token.user_id.is_(None)).limit(1000)
count = 0

while tokens := session.scalars(tokens_query).all():
    for token in tokens:
        username, authority = split_userid(token.userid)
        token.user_id = session.scalars(
            select(User.id).where(
                User.username == username, User.authority == authority
            )
        ).one()
        count += 1
    session.commit()

log.info("Back-filled %d token.user_id's", count)
```
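The `split_userid()` helper isn't shown in this diff. A hypothetical sketch of what it might look like, assuming userids take the form `acct:username@authority` (an assumption, not confirmed by this PR):

```python
import re


def split_userid(userid):
    """Hypothetical sketch: split a userid assumed to look like
    "acct:username@authority" into (username, authority)."""
    match = re.match(r"^acct:([^@]+)@(.+)$", userid)
    if match is None:
        raise ValueError(f"Unrecognised userid: {userid!r}")
    return match.group(1), match.group(2)


print(split_userid("acct:jsmith@example.com"))  # ('jsmith', 'example.com')
```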
There are 312,655 tokens in production right now, so this will do 313 separate DB transactions. It'll do 313 `SELECT TOKEN ...` queries, 312,655 `SELECT "user".id ...` queries and 312,655 `UPDATE token ...` queries.

I could add a `time.sleep()` in the loop to slow it down, to make sure it doesn't put too much pressure on the DB?
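As a quick sanity check on those numbers, using the figures from the comment above:

```python
import math

total_tokens = 312_655  # tokens with a NULL user_id in production
batch_size = 1_000      # the .limit(1000) in the migration loop

# One transaction (and one SELECT of tokens) per batch, plus one
# user-lookup SELECT and one UPDATE per individual token.
transactions = math.ceil(total_tokens / batch_size)
print(transactions)  # 313
```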
For big updates like this, without a DB with the same data it's difficult to make good guesses. My intuition is that this would be much faster with fewer queries, doing more of the work in SQL instead of Python. But this seems perfectly doable on the time scale of a migration.
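The "more work in SQL" idea could be a single set-based `UPDATE` that joins tokens to users on a rebuilt userid string. A minimal sketch against an in-memory SQLite database; the `'acct:' || username || '@' || authority` userid format is an assumption, not taken from this PR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, username TEXT, authority TEXT);
    CREATE TABLE token (id INTEGER PRIMARY KEY, userid TEXT, user_id INTEGER);
    INSERT INTO "user" VALUES (1, 'jsmith', 'example.com');
    INSERT INTO token VALUES (10, 'acct:jsmith@example.com', NULL);
""")

# One set-based UPDATE: rebuild the userid string from the user row and
# join on it, rather than splitting token.userid in Python row by row.
conn.execute("""
    UPDATE token
    SET user_id = (
        SELECT u.id FROM "user" u
        WHERE 'acct:' || u.username || '@' || u.authority = token.userid
    )
    WHERE user_id IS NULL
""")
conn.commit()

print(conn.execute("SELECT user_id FROM token WHERE id = 10").fetchone())  # (1,)
```

This trades the 600,000+ per-row queries for one statement, at the cost of one long-running transaction.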
It's not speed I'm going for, it's trying to make sure that running the migration doesn't disrupt other DB queries, e.g. by using up too many of the DB's resources at once (e.g. DB CPU usage), or by holding a lock on a table for too long (in this case the `token` table when running the `UPDATE` query). So it's actually the opposite: I'm trying to make it go slower, and in separate transactions, so that other requests have a chance to jump in and get their work done in between these transactions. Lots of small queries, rather than one big one.
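The "lots of small queries" pattern can also be expressed as batched `UPDATE`s, each in its own short transaction so table locks are released between batches. A sketch with SQLite standing in for the real database (table shape and batch size are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO token (user_id) VALUES (?)", [(None,)] * 2500)
conn.commit()

batches = 0
while True:
    # Each batch is its own short transaction, so any lock on the token
    # table is released between batches and other queries can interleave.
    with conn:
        cur = conn.execute(
            """
            UPDATE token SET user_id = 1
            WHERE id IN (SELECT id FROM token WHERE user_id IS NULL LIMIT 1000)
            """
        )
    if cur.rowcount == 0:
        break
    batches += 1

print(batches)  # 2,500 rows in batches of 1,000 -> 3 batches
```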
Force-pushed from a9884fd to 0549e8a