Logging table should be periodically truncated #778

Closed
audiodude opened this issue Dec 27, 2024 · 3 comments · Fixed by #823

Comments

@audiodude
Member

The logging table is set up to grow indefinitely. Recently we had an outage because it got too full:

+--------------------+---------------------------------------+------------+
| Database           | Table                                 | Size in MB |
+--------------------+---------------------------------------+------------+
| enwp10_prod        | logging                               |   42164.95 |
| enwp10_prod        | ratings                               |    8176.63 |
| enwp10_prod        | moves                                 |     750.84 |
| enwp10_prod        | global_articles                       |     686.00 |
| enwp10_prod        | selection_data                        |     181.89 |
| enwp10_prod        | cache                                 |      91.86 |
| enwp10_prod        | namespacename                         |      10.55 |
| enwp10_prod        | categories                            |       8.53 |
| enwp10_prod        | reviews                               |       5.03 |
| enwp10_prod        | workingselection                      |       2.52 |
| enwp10_prod        | automatedselection                    |       2.52 |
| enwp10_prod        | builders                              |       2.52 |
| enwp10_prod        | manualselectionlog                    |       0.77 |
| enwp10_prod        | projects                              |       0.34 |
| enwp10_prod        | manualselection                       |       0.30 |
| enwp10_prod        | selections                            |       0.13 |
| enwp10_prod        | zim_files                             |       0.08 |
| enwp10_prod        | users                                 |       0.06 |
| enwp10_prod        | global_rankings                       |       0.03 |
| enwp10_prod        | releases                              |       0.03 |
+--------------------+---------------------------------------+------------+

This was accelerated by the merging of the "Articles by Quality" and "Category class" templates, which generated a superfluous log entry for essentially every article. However, even if that hadn't happened, the logging table would eventually have run out of space, since it only grows over time.

Since we only ever show logs for the past 7 days, we don't need to keep data older than 7 days. Although #49 would make this easier and bring its own benefits, at the very least we could have a cron job that deletes rows older than 7 days, or do the deletion as part of generating the logs.
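
A minimal sketch of the cleanup, written in Python with the standard-library `sqlite3` module standing in for the production database. The column names (`l_article`, `l_timestamp`), the ISO-8601 UTC timestamp format, and the function name `prune_logging` are all hypothetical, not taken from the project. Rows are deleted in small batches so that each transaction stays short on a table that has already grown large; on MariaDB/MySQL the batch would more naturally be written as `DELETE FROM logging WHERE ... LIMIT n`, since MySQL does not accept `LIMIT` inside an `IN` subquery.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical retention window matching "we only ever show the past 7 days".
RETENTION = timedelta(days=7)


def prune_logging(conn, now, retention=RETENTION, batch_size=10_000):
    """Delete logging rows older than `now - retention`, in batches.

    Assumes (hypothetically) that l_timestamp holds ISO-8601 UTC strings
    such as "2024-12-27T00:00:00Z", so string order matches time order.
    Returns the total number of rows deleted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    cutoff = (now - retention).strftime("%Y-%m-%dT%H:%M:%SZ")
    total = 0
    while True:
        # Delete at most `batch_size` old rows, then commit, so no single
        # transaction has to touch millions of rows at once.
        cur = conn.execute(
            "DELETE FROM logging WHERE rowid IN ("
            "  SELECT rowid FROM logging WHERE l_timestamp < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        # A short batch means no old rows are left.
        if cur.rowcount < batch_size:
            return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logging (l_article TEXT, l_timestamp TEXT)")
    conn.executemany(
        "INSERT INTO logging VALUES (?, ?)",
        [
            ("A", "2024-12-01T00:00:00Z"),
            ("B", "2024-12-19T23:59:59Z"),
            ("C", "2024-12-20T00:00:00Z"),
            ("D", "2024-12-26T12:00:00Z"),
        ],
    )
    conn.commit()
    now = datetime(2024, 12, 27, tzinfo=timezone.utc)
    print(prune_logging(conn, now, batch_size=1))  # rows A and B are removed
```

The same function could be called either from a standalone cron job or from the log generation step; only the caller changes.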

@Shiv-aurora

Hi! I'm Shivam. I'm interested in contributing to Kiwix for GSoC 2025, and this issue seems like a great fit for my skills.

Quick question before I start: do you prefer having a separate cron job for truncating logs, or would you prefer integrating this cleanup directly within the log generation process? Happy to work on either approach.
Thanks!

@audiodude
Member Author

> do you prefer having a separate cron job for truncating logs, or would you prefer integrating this cleanup directly within the log generation process? Happy to work on either approach.

I would prefer doing it during generation, because log generation covers a relative window (the last 7 days): at that point we know we are posting a log for those 7 days, so we can safely delete anything older.
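
A hypothetical Python sketch of the generation-time approach: compute one cutoff, read the rows inside the 7-day window, and delete everything older in the same transaction. It uses the standard-library `sqlite3` module as a stand-in for the production database; the `logging` columns (`l_article`, `l_timestamp` stored as ISO-8601 UTC strings) and the name `generate_log_and_prune` are assumptions, not the project's actual schema or code.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical log window: the generated log covers the last 7 days.
WINDOW = timedelta(days=7)


def generate_log_and_prune(conn, now, window=WINDOW):
    """Return (rows_for_log, rows_deleted).

    Reads every row inside the window for the caller to render, then
    deletes every row older than the same cutoff.
    """
    # Hypothetical storage format: ISO-8601 UTC strings, which sort by time.
    cutoff = (now - window).strftime("%Y-%m-%dT%H:%M:%SZ")
    with conn:  # sqlite3: commit on success, roll back on exception
        rows = conn.execute(
            "SELECT l_article, l_timestamp FROM logging "
            "WHERE l_timestamp >= ? ORDER BY l_timestamp",
            (cutoff,),
        ).fetchall()
        deleted = conn.execute(
            "DELETE FROM logging WHERE l_timestamp < ?", (cutoff,)
        ).rowcount
    return rows, deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logging (l_article TEXT, l_timestamp TEXT)")
    conn.executemany(
        "INSERT INTO logging VALUES (?, ?)",
        [
            ("A", "2024-12-01T00:00:00Z"),
            ("B", "2024-12-19T23:59:59Z"),
            ("C", "2024-12-20T00:00:00Z"),
            ("D", "2024-12-26T12:00:00Z"),
        ],
    )
    conn.commit()
    rows, deleted = generate_log_and_prune(
        conn, datetime(2024, 12, 27, tzinfo=timezone.utc)
    )
    print(rows, deleted)
```

Using a single cutoff for both the `SELECT` and the `DELETE` means the cleanup can never remove a row that the log being generated still needs.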

@elfkuzco
Contributor

@Shiv-aurora, how far have you gotten with this? I am currently implementing #49, which could render this obsolete, and would like to know whether you have made significant progress.

3 participants