Slow performance for queries that return many items #3544

dancoates · 2024-06-21T07:02:26Z

Hello, I'm trying to improve the performance of a graphql query that looks like:

query GetSampleEidMapQuery($project: String!) {
  project(name: $project) {
    samples {
      assays {
        id
        externalIds
        meta
      }
    }
  }
}

This returns around 3,000 samples, each with between 1 and 4 assays, so fewer than 10,000 objects in total. The query takes between 3 and 5 seconds and returns around 600 kB of JSON, so it's not a small amount of data, but it's not exactly huge either. I initially thought the cause might be slow SQL queries, but it turns out that around 85% of the query time is spent in strawberry processing the results. Here's the pyinstrument profile that shows this: pyinstrument.html.zip

Is there anything that can be done to reduce the time that it takes for strawberry to handle results? I've tried both the ParserCache and ValidationCache as well as disabling validation entirely but unfortunately that made very little difference.

Not sure if it helps but this is our graphql schema: https://github.com/populationgenomics/metamist/blob/dev/api/graphql/schema.py

Thank you!

erikwrede · 2024-06-21T08:31:16Z

Hey dan, thanks for the report. This is a known problem in the GraphQL reference implementation and we're actively investigating ways to fix this. If you have any specific ideas, we'd highly appreciate any input.
Here are some related issues:

Significant performance hit when using async resolvers graphql-python/graphql-core#190

dancoates · 2024-06-23T23:32:10Z

Hi Erik, thanks for the quick response. It does sound like a tricky one to solve! Feel free to close this if you'd like, as it sounds like the issue isn't within strawberry and is a well-known issue in graphql-core.

dancoates · 2024-06-23T23:46:41Z

Actually, sorry, I've just had another look at the profiling I included in the issue, and it does seem like the majority of the time is spent within strawberry. In particular, it looks like a lot of time is spent inferring the schema from the types. But I could well be reading the profile wrong; I'm pretty new to Python profiling.
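One way to cross-check a reading like this without extra tooling is the standard library's cProfile, sorted by cumulative time. The sketch below is only an illustration: `fetch_rows`, `infer_type`, `serialize` and `handle_request` are hypothetical stand-ins for a data layer and a per-item processing step, not code from strawberry or metamist.

```python
import cProfile
import io
import pstats


def fetch_rows(n):
    # Hypothetical stand-in for the SQL layer.
    return [{"id": i} for i in range(n)]


def infer_type(annotation):
    # Hypothetical stand-in for per-item type work done by a framework.
    return str(annotation).upper()


def serialize(rows):
    # Calls infer_type once per row, so its cost grows with the result size.
    return [{"id": row["id"], "type": infer_type(dict)} for row in rows]


def handle_request(n=10_000):
    return serialize(fetch_rows(n))


def profile_top(func, limit=10):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" includes time spent in callees, which makes it easy to see
    # which top-level step (fetching vs. processing) dominates.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_top(handle_request)
    print(report)
```

In the report, a function whose call count matches the number of returned items and whose cumulative time is large is a likely per-item hotspot.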

dancoates · 2024-06-24T01:52:12Z

I've narrowed this down further to our usage of an input that has a generic type on the assays field on a sample. Changing this line https://github.com/populationgenomics/metamist/blob/dev/api/graphql/schema.py#L754 to use a non-generic type cuts the execution time from 5 seconds down to 1.

dancoates · 2024-06-24T02:33:51Z

Sorry to spam messages. I've made a minimal repro repo to help show the issue:
https://github.com/dancoates/strawberry-generic-input-repro

patrick91 · 2024-06-25T17:42:48Z

@dancoates could you test this pre-release if you have time?

poetry add strawberry-graphql==0.235.1.dev.1719337273

dancoates · 2024-06-26T01:32:05Z

Hi @patrick91, I can confirm that the pre-release is a big performance improvement. Thank you very much for the quick fix!

patrick91 mentioned this issue Jun 24, 2024

Add benchmark for generic inputs #3547

Merged

dancoates mentioned this issue Jun 24, 2024

Workaround for graphql performance issue when using generic input types populationgenomics/metamist#836

Closed

patrick91 mentioned this issue Jun 24, 2024

Add caching for resolve #3549

Merged

dancoates mentioned this issue Jun 26, 2024

Upgrade to dev version of strawberry graphql to improve performance of resolvers with generic inputs populationgenomics/metamist#841

Merged

patrick91 closed this as completed in #3549 Jun 26, 2024
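The thread does not show what #3549 changed beyond its title, "Add caching for resolve". As a general illustration of why caching helps when the same generic type is resolved once per returned item, here is a minimal sketch using `functools.lru_cache`. `GenericFilter`, `resolve_type_name`, `resolve_cached` and the `CALLS` counter are hypothetical names for this example and are not strawberry's implementation.

```python
import collections
import functools
import typing

T = typing.TypeVar("T")

# Counts how often the (hypothetically expensive) resolution actually runs.
CALLS = collections.Counter()


class GenericFilter(typing.Generic[T]):
    """Hypothetical generic input type, e.g. GenericFilter[int]."""


def resolve_type_name(annotation):
    # Stand-in for expensive work: inspect the generic alias and build a
    # concrete name such as "IntGenericFilter".
    CALLS["resolve"] += 1
    origin = typing.get_origin(annotation) or annotation
    args = typing.get_args(annotation)
    return "".join(arg.__name__.capitalize() for arg in args) + origin.__name__


# Parameterized aliases such as GenericFilter[int] are hashable, so they can
# be used directly as cache keys.
resolve_cached = functools.lru_cache(maxsize=None)(resolve_type_name)


def resolve_many(resolver, annotation, count):
    # Simulates resolving the same argument type once per returned item.
    return [resolver(annotation) for _ in range(count)]


if __name__ == "__main__":
    resolve_many(resolve_type_name, GenericFilter[int], 10_000)
    print("uncached calls:", CALLS["resolve"])
    CALLS.clear()
    resolve_many(resolve_cached, GenericFilter[int], 10_000)
    print("cached calls:", CALLS["resolve"])
```

With the cache in place, the expensive function runs once per distinct type rather than once per item, which is the kind of saving that matters when a query returns thousands of objects.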
