Slow performance for queries that return many items #3544

Closed
dancoates opened this issue Jun 21, 2024 · 7 comments · Fixed by #3549

Comments

@dancoates

dancoates commented Jun 21, 2024

Hello, I'm trying to improve the performance of a graphql query that looks like:

query GetSampleEidMapQuery($project: String!) {
  project(name: $project) {
    samples {
      assays {
        id
        externalIds
        meta
      }
    }
  }
}

This returns around 3000 samples, each with between 1 and 4 assays, so fewer than 10,000 objects in total. The query takes between 3 and 5 seconds and returns around 600 kB of JSON. That's not a small amount of data, but it's not exactly huge either. I initially suspected slow SQL queries, but it turns out that around 85% of the query time is spent in strawberry processing the results. Here's the pyinstrument profile that shows this: pyinstrument.html.zip
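For readers who want to run a similar measurement, here is a minimal sketch using the standard library's cProfile instead of pyinstrument (a swapped-in tool, not the one used for the attached profile). The functions `fetch_rows`, `serialize_rows` and `handle_query` are hypothetical stand-ins for the database call and the result processing; they are not taken from the project.

```python
import cProfile
import pstats


def fetch_rows(n):
    """Hypothetical stand-in for the SQL query that loads the rows."""
    return [{"id": i, "meta": {"k": i}} for i in range(n)]


def serialize_rows(rows):
    """Hypothetical stand-in for the per-object result processing."""
    return [{key: str(value) for key, value in row.items()} for row in rows]


def handle_query():
    return serialize_rows(fetch_rows(10_000))


# Profile a single call to the request handler.
profiler = cProfile.Profile()
profiler.enable()
result = handle_query()
profiler.disable()

stats = pstats.Stats(profiler)

# Map function name -> cumulative seconds spent in it and its callees.
cumulative = {
    func[2]: ct for func, (cc, nc, tt, ct, callers) in stats.stats.items()
}

# Print the five most expensive entries by cumulative time.
stats.sort_stats("cumulative").print_stats(5)
```

Comparing the cumulative time of the data-loading function with that of the processing function gives a rough split like the 85% figure quoted above, although the exact numbers depend on the profiler and its overhead.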

Is there anything that can be done to reduce the time that it takes for strawberry to handle results? I've tried both the ParserCache and ValidationCache as well as disabling validation entirely but unfortunately that made very little difference.

Not sure if it helps but this is our graphql schema: https://github.com/populationgenomics/metamist/blob/dev/api/graphql/schema.py

Thank you!

@erikwrede
Member

Hey dan, thanks for the report. This is a known problem in the GraphQL reference implementation and we're actively investigating ways to fix this. If you have any specific ideas, we'd highly appreciate any input.
Here are some related issues:

@dancoates
Author

Hi Erik, thanks for the quick response. It does sound like a tricky one to solve! Feel free to close this if you'd like, as it sounds like the issue isn't within strawberry and is a well-known issue in graphql-core.

@dancoates
Author

dancoates commented Jun 23, 2024

Actually, sorry, I've just had another look at the profiling I included in the issue, and it does seem like the majority of the time is spent within strawberry. In particular, a lot of time seems to be spent inferring the schema from the types. But I could well be reading the profile wrong; I'm pretty new to Python profiling.

image

@dancoates
Author

I've narrowed this down further to our use of an input with a generic type on the assays field of a sample. Changing this line https://github.com/populationgenomics/metamist/blob/dev/api/graphql/schema.py#L754 to use a non-generic type cuts the execution time from 5 seconds down to 1.

@dancoates
Author

Sorry to spam messages, I've made a minimal repro repo to help show the issue:
https://github.com/dancoates/strawberry-generic-input-repro

@patrick91
Member

@dancoates could you test this pre-release if you have time?

poetry add strawberry-graphql==0.235.1.dev.1719337273

@dancoates
Author

Hi @patrick91, I can confirm that the pre-release is a big performance improvement. Thank you very much for the quick fix!
