Repartitioning browser gene model Hail Table? #1643

Open
gtiao opened this issue Oct 16, 2024 · 2 comments

gtiao commented Oct 16, 2024

Hi team,

Thank you so much for releasing the browser gene model Hail Table! It's so helpful to have all the relevant transcript and coding sequence annotations pulled together and harmonized in one place.

I wanted to note that for a relatively small dataset (~60k rows), the table has a lot of partitions (~2k), which causes even simple, small joins on it to run relatively slowly. It's not prohibitively slow, but it is a noticeable inefficiency, so I was wondering whether the number of partitions in the release was a deliberate design decision, or whether this is something you could adjust.
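
For anyone hitting the same slowdown in the meantime, one possible workaround is to coalesce the table right after reading it and before joining. This is a minimal sketch, not a maintainer recommendation: the helper name `coalesce_for_joins` and the default of 8 partitions are assumptions for illustration. `n_partitions()` and `naive_coalesce()` are existing Hail `Table` methods; `naive_coalesce` merges adjacent partitions without a shuffle.

```python
def coalesce_for_joins(ht, target_partitions=8):
    """Return `ht` with at most `target_partitions` partitions.

    `ht` is expected to be a Hail Table. `target_partitions=8` is an
    assumed, illustrative default, not a tuned value.
    """
    if target_partitions < 1:
        raise ValueError("target_partitions must be at least 1")

    # Nothing to do if the table is already at or below the target.
    if ht.n_partitions() <= target_partitions:
        return ht

    # naive_coalesce combines adjacent partitions, so no shuffle is needed.
    return ht.naive_coalesce(target_partitions)


# Usage (the path is a placeholder, not the real release location):
# import hail as hl
# genes = hl.read_table("path/to/browser_gene_models.ht")
# genes = coalesce_for_joins(genes)
```

Because `naive_coalesce` does not rebalance rows, the resulting partitions can be uneven, which should matter little for a table of this size.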

Thank you!

Grace

@rileyhgrant

Hey @gtiao! Hope all is well. Thanks for the suggestion, and for filing this issue.

This is certainly something we can adjust. We'll look into a rule of thumb for a better number of partitions for this table, given its number of rows and columns.

Since it sounds like this isn't blocking at the moment, we'll likely add a repartition step to the end of our pipeline rather than re-running the pipeline immediately. As far as I know, the next time we anticipate updating the gene model tables will be to include GTEx v10 when it releases, so at that time the released tables will have fewer partitions. Does that seem reasonable?
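
As a rough illustration of what a rule of thumb plus a final repartition step could look like, here is a sketch under stated assumptions. The function names, the `output_path` argument, and the 5,000 rows-per-partition figure are all hypothetical and are not taken from the actual pipeline; `count()`, `n_partitions()`, `naive_coalesce()`, and `write()` are existing Hail `Table` methods.

```python
import math


def suggest_n_partitions(n_rows, rows_per_partition=5000):
    """Row-count-based partition target (assumed heuristic).

    `rows_per_partition=5000` is an illustrative value only; a real
    rule of thumb would also account for row width (number of columns).
    """
    if n_rows < 0 or rows_per_partition <= 0:
        raise ValueError("n_rows must be >= 0 and rows_per_partition > 0")
    return max(1, math.ceil(n_rows / rows_per_partition))


def write_with_fewer_partitions(ht, output_path, rows_per_partition=5000):
    """Final pipeline step: coalesce a Hail Table, then write it out."""
    target = suggest_n_partitions(ht.count(), rows_per_partition)

    # Only reduce the partition count; never try to increase it here.
    if ht.n_partitions() > target:
        ht = ht.naive_coalesce(target)

    ht.write(output_path, overwrite=True)
    return target


# With the figures from this thread: 60_000 rows -> 12 partitions.
print(suggest_n_partitions(60_000))
```

Using `naive_coalesce` avoids a shuffle; `Table.repartition` would be the alternative if evenly sized partitions matter more than write-time cost.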

@rileyhgrant rileyhgrant self-assigned this Oct 18, 2024

gtiao commented Oct 19, 2024

Hi Riley! Nope, not a blocking issue, just a nice-to-have suggestion -- and no particular rush. Thank you!
