Speed up indexing with cursor_tuple_fraction pg option #11382 #11439

Merged
merged 1 commit into dev/8.0.x from jtw/cursor-tuple-fraction
Oct 2, 2024

Conversation

jacobtylerwalls
Member

@jacobtylerwalls jacobtylerwalls commented Sep 6, 2024

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

See #11382

Issues Solved

Closes #11382

Checklist

  • I targeted one of these branches:
    • dev/8.0.x (under development): features, bugfixes not covered below
  • I added a changelog in arches/releases
  • I submitted a PR to arches-docs (if appropriate)
  • Unit tests pass locally with my changes
  • I added tests that prove my fix is effective or that my feature works
  • My test fails on the target branch

Ticket Background

  • Sponsored by: Farallon

@aarongundel
Contributor

I don't understand what this is doing. I don't see this option listed at https://docs.djangoproject.com/en/5.1/ref/databases/#postgresql-connection-settings. Can you expand on this or point me to some documentation?

@jacobtylerwalls
Member Author

Sorry, the link was buried in the issue:

We use QuerySet.iterator() in a couple places for indexing and deletion, and we tend to fully exhaust those iterators, so we would probably benefit from setting the cursor_tuple_fraction option:

By default, PostgreSQL assumes that only the first 10% of the results of cursor queries will be fetched. The query planner spends less time planning the query and starts returning results faster, but this could diminish performance if more than 10% of the results are retrieved.

Django link
PG link

This is getting a little out in front of the issue, since I don't have an example in hand where we're being bitten by this (e.g. where we're suffering with a slower query plan), but it seems good to go ahead and bias this toward fetching the whole result, as every iterator() we've used is always fully exhausted.

PS -- I'm already using this in the lingo settings, where the concept tree is built up using an iterator().
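
For anyone following along, here is a rough sketch of what setting this option can look like in a Django settings file. It is an illustration only and not necessarily the exact change in this PR: the database name and the `CURSOR_TUPLE_FRACTION` constant are placeholders, and it assumes the option is passed through the libpq `options` connection parameter. A value of 1.0 tells the planner to plan cursor queries as if every row will be fetched (the PostgreSQL default is 0.1).

```python
# Hypothetical settings fragment: names and values here are illustrative,
# not copied from the Arches settings.

# 1.0 = plan cursor queries like regular queries (whole result expected).
# PostgreSQL's default is 0.1 (only the first 10% expected to be fetched).
CURSOR_TUPLE_FRACTION = 1.0

DATABASES = {
    "default": {
        # Assumed backend; a PostGIS project would use its own engine path.
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "example_db",  # placeholder database name
        "OPTIONS": {
            # libpq "options" parameter: "-c name=value" sets a run-time
            # configuration parameter for each new connection.
            "options": f"-c cursor_tuple_fraction={CURSOR_TUPLE_FRACTION}",
        },
    }
}
```

Since this is set per connection, it should only affect queries that go through server-side cursors, such as `QuerySet.iterator()` calls; ordinary queries are planned as usual.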

Contributor

@aarongundel aarongundel left a comment

LGTM

@aarongundel aarongundel merged commit 6989e7f into dev/8.0.x Oct 2, 2024
6 checks passed
@aarongundel aarongundel deleted the jtw/cursor-tuple-fraction branch October 2, 2024 17:12
Development

Successfully merging this pull request may close these issues.

Tune performance of iterator() calls with cursor_tuple_fraction
2 participants