Incremental Stats deployment fixes #3156

Merged
merged 3 commits into from
Jan 29, 2025

Conversation

amCap1712
Member

Currently, a new database is generated for each stat every day, and the previous database for that stat is deleted. However, incremental stats generation produces messages only for users with listens in the incremental dumps, on the assumption that stats for users without new listens are already available on the LB server. To maintain this assumption, each stats database must have a fixed name. Therefore, the LB server no longer sends the database name to Spark in stat generation requests. During stats generation, Spark determines whether the run is full or incremental. For full stats, a new database name is created and sent; for incremental stats, a database prefix is sent so that the LB server can look up the existing database and insert the stats into it.
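
The naming scheme described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names, the message keys (`database`, `database_prefix`), and the dated name format are all assumptions made for the example.

```python
from datetime import date


def stats_database_message(stat_prefix: str, is_full: bool, today: date) -> dict:
    """Spark side (hypothetical): decide what to tell the server about the target database."""
    if is_full:
        # Full run: create a fresh database name and send it.
        return {"database": f"{stat_prefix}_{today:%Y%m%d}"}
    # Incremental run: send only the prefix; the server finds the existing database.
    return {"database_prefix": stat_prefix}


def resolve_database(message: dict, existing: list[str]) -> str:
    """Server side (hypothetical): pick the database that incoming stats are written to."""
    if "database" in message:
        return message["database"]
    prefix = message["database_prefix"] + "_"
    matches = sorted(name for name in existing if name.startswith(prefix))
    if not matches:
        raise LookupError(f"no existing database for prefix {prefix!r}")
    # Dated suffixes sort lexicographically, so the last match is the newest.
    return matches[-1]


full = stats_database_message("artists_week", True, date(2025, 1, 29))
incremental = stats_database_message("artists_week", False, date(2025, 1, 30))
target = resolve_database(incremental, ["artists_week_20250129", "artists_month_20250129"])
```

Under these assumptions, an incremental run on the following day resolves to the database created by the last full run, so users without new listens keep their existing stats.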

@amCap1712 amCap1712 requested a review from mayhem January 28, 2025 19:26
@amCap1712 amCap1712 force-pushed the incremental-cleanup branch from 40f80ec to 18ef743 Compare January 29, 2025 13:40
@amCap1712
Member Author

For listening activity, a CROSS JOIN is needed to add rows for the days on which the user has no listens. I had removed it in a previous PR due to a misunderstanding. It's now added back. For the LEFT JOIN, we need to pay attention to which table the joined attributes (user_id and time_range) are taken from; otherwise they will be NULL for the extra rows that were added.
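
A minimal sketch of that join pattern, using an in-memory SQLite database instead of Spark. The table names, column layout, and sample data are made up for illustration and are not the actual ListenBrainz schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_range (time_range TEXT);
    CREATE TABLE listens (user_id INTEGER, time_range TEXT, listen_count INTEGER);
    INSERT INTO time_range VALUES ('Mon'), ('Tue'), ('Wed');
    INSERT INTO listens VALUES (1, 'Mon', 5), (1, 'Wed', 2), (2, 'Tue', 7);
""")

# The CROSS JOIN builds every (user, day) pair, so days without listens get a row.
# user_id and time_range are selected from the grid side of the LEFT JOIN,
# because the listens side is NULL for the added rows.
good_query = """
    WITH users AS (SELECT DISTINCT user_id FROM listens),
         grid AS (SELECT u.user_id, t.time_range
                    FROM users u CROSS JOIN time_range t)
    SELECT g.user_id, g.time_range, COALESCE(l.listen_count, 0) AS listen_count
      FROM grid g
 LEFT JOIN listens l
        ON l.user_id = g.user_id AND l.time_range = g.time_range
  ORDER BY g.user_id, g.time_range
"""
rows = conn.execute(good_query).fetchall()

# Same join, but taking the attributes from the listens side:
# the added rows come back with NULL user_id and time_range.
bad_query = """
    WITH users AS (SELECT DISTINCT user_id FROM listens),
         grid AS (SELECT u.user_id, t.time_range
                    FROM users u CROSS JOIN time_range t)
    SELECT l.user_id, l.time_range, COALESCE(l.listen_count, 0) AS listen_count
      FROM grid g
 LEFT JOIN listens l
        ON l.user_id = g.user_id AND l.time_range = g.time_range
"""
bad_rows = conn.execute(bad_query).fetchall()
null_rows = [row for row in bad_rows if row[0] is None]
```

In this toy data, `rows` has one entry per user per day with a zero count for days without listens, while `null_rows` holds the three added rows whose `user_id` and `time_range` were lost.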

@amCap1712 amCap1712 merged commit 8d0a9a5 into master Jan 29, 2025
2 of 3 checks passed
@amCap1712 amCap1712 deleted the incremental-cleanup branch January 29, 2025 16:53