Optimize sitemap #12064

Draft
wants to merge 7 commits into
base: main

RealOrangeOne
Member

Fixes #1698

This PR encompasses a few small optimisations:

  • Use .specific(defer=True) when getting pages. This means there's no need to fetch the specific page models at all, removing the extra queries that come with doing so.
  • Use the DB to calculate the latest lastmod, rather than looping through all pages.
  • Use a slightly simpler / faster way of determining the last modified date when looping through pages (copied from Django)
  • Cache the Wagtail site on the view
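
As a rough illustration of the second bullet, the sketch below contrasts looping over every row in Python with asking the database for a single aggregated value. It uses the standard-library `sqlite3` module rather than the Django ORM so that it runs on its own; the `page` table and its two timestamp columns are assumptions made for the example, not the PR's actual query. In Django the same idea would typically be expressed with `aggregate()` and `Max`.

```python
import sqlite3

# Hypothetical stand-in for a page table; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "id INTEGER PRIMARY KEY, "
    "last_published_at TEXT, "
    "latest_revision_created_at TEXT)"
)
conn.executemany(
    "INSERT INTO page (last_published_at, latest_revision_created_at) VALUES (?, ?)",
    [
        ("2024-06-01T10:00:00", "2024-06-01T09:00:00"),
        (None, "2024-06-19T12:30:00"),
        ("2024-05-15T08:00:00", "2024-05-14T08:00:00"),
    ],
)


def latest_lastmod_in_python(conn):
    """Fetch every row and compare in Python (the pattern being replaced)."""
    latest = None
    rows = conn.execute(
        "SELECT last_published_at, latest_revision_created_at FROM page"
    )
    for published, revised in rows:
        lastmod = published or revised
        # ISO 8601 strings in the same format sort chronologically.
        if lastmod is not None and (latest is None or lastmod > latest):
            latest = lastmod
    return latest


def latest_lastmod_in_db(conn):
    """Let the database return one aggregated value instead of all rows."""
    row = conn.execute(
        "SELECT MAX(COALESCE(last_published_at, latest_revision_created_at)) "
        "FROM page"
    ).fetchone()
    return row[0]
```

Both functions return the same timestamp, but the second transfers one value rather than one row per page, which is where the saving would be expected on a large site.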

The drop in query count in the tests is deceptive. These changes reduced it by only 2; the rest of the reduction comes from using a non-database cache.

With ~10000 pages (averaged over 10 requests using gunicorn and oha):

Before:

Slowest:      1.4849 secs
Fastest:      1.0741 secs
Average:      1.1292 secs
Requests/sec: 0.8856

35 DB queries

After:

Slowest:      1.8591 secs
Fastest:      1.4128 secs
Average:      1.4964 secs
Requests/sec: 0.6683

18 DB queries


I'm not sure why the updated version appears slower - although I think it's within the margin of error. It's running fewer queries, and defer=True should make the codepaths much simpler, too. The tests were run on bakerydemo and I suspect with more than just a single model the improvements may be greater still.

This removes the need to do subsequent custom queries for a page
`Site.get_for_request` is already cached, but this ensures that even the absence of a site is cached, and keeps the cache closer to the class.
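
A minimal sketch of the caching idea described in the commit note above, assuming a per-view cache built with `functools.cached_property`. The class name `SitemapViewSketch` and the `find_site` callable are hypothetical stand-ins; the PR's actual implementation may differ.

```python
from functools import cached_property


class SitemapViewSketch:
    """Hypothetical view; `find_site` stands in for the real site lookup."""

    def __init__(self, find_site):
        self._find_site = find_site

    @cached_property
    def site(self):
        # cached_property stores whatever is returned, including None,
        # so a lookup that finds no site is not repeated on later accesses.
        return self._find_site()


calls = []


def find_site():
    """Record each lookup and simulate a request that matches no site."""
    calls.append(1)
    return None


view = SitemapViewSketch(find_site)
first = view.site
second = view.site
```

Here the lookup runs once even though it returns `None`, which is the "no site is cached" behaviour the note describes.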
This removes the need to load all pages as part of the sitemap index views.
This code is copied from how Django does it
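
The one-pass approach mentioned above can be sketched roughly as follows. This is modelled on how Django's sitemap framework tracks the latest `lastmod` while iterating, not copied from the PR; the function name `latest_lastmod` is an assumption for the example.

```python
from datetime import datetime


def latest_lastmod(lastmods):
    """Return the newest lastmod, or None if any item has no lastmod."""
    all_items_lastmod = True
    latest = None
    for lastmod in lastmods:
        if all_items_lastmod:
            # Once one item lacks a lastmod, stop tracking altogether.
            all_items_lastmod = lastmod is not None
            if all_items_lastmod and (latest is None or lastmod > latest):
                latest = lastmod
    return latest if all_items_lastmod else None
```

For example, `latest_lastmod([datetime(2024, 1, 1), datetime(2024, 6, 19)])` returns the June date, while a list containing `None` yields `None`, so no misleading "latest" value is reported when some items have no modification date.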

squash-labs bot commented Jun 19, 2024

Manage this branch in Squash

Test this branch here: https://realorangeoneoptimize-sitemap-nule6.squash.io

@RealOrangeOne
Member Author

The cause of the slowdown seems to be related to the use of .specific(defer=True). As this is the main optimisation, I'll do some digging as to why it's so much slower.

@RealOrangeOne RealOrangeOne marked this pull request as draft June 25, 2024 14:40
@RealOrangeOne
Member Author

The issue is indeed from .specific(defer=True), particularly when used as a queryset (as opposed to a singular use of .specific_deferred). I have a rough fix, but it'll pollute this PR too much, so I'll open it as a separate one. This shouldn't be merged until that's in.

Development

Successfully merging this pull request may close these issues.

Sitemap module times out with large site
1 participant