Thank you so much for releasing the browser gene model Hail Table! It's so helpful to have all the relevant transcript and coding sequence annotations pulled together and harmonized in one place.
I wanted to note that for a relatively small dataset (~60k rows), there are a lot of partitions (~2k), and that's causing even simple, small joins on the table to run relatively slowly. It's not prohibitively slow, but it is a noticeable inefficiency, so I was wondering whether the number of partitions in the release was a deliberate design decision, or whether this is something you could adjust.
Thank you!
Grace
Hey @gtiao! Hope all is well. Thanks for the suggestion, and for filing this issue.
This is certainly something we can adjust. We'll look into a rule of thumb for a better number of partitions for this table, given its number of rows and columns.
Since it sounds like this isn't blocking at the moment, we'll likely add a repartition step to the end of our pipeline rather than re-run the pipeline immediately. As far as I know, the next time we anticipate updating the gene model tables is to include GTEx v10 when it releases, so the tables released at that time will have fewer partitions. Does that seem reasonable?
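In the meantime, for anyone who wants a local copy with fewer partitions, here is a minimal sketch of what such a step could look like. The helper names, the paths, and the 5,000-rows-per-partition target are all assumptions for illustration, not a rule we've settled on and not the pipeline's actual code. It uses Hail's `Table.n_partitions()`, `Table.naive_coalesce()`, and `Table.write()`.

```python
import math


def suggested_partitions(n_rows: int, rows_per_partition: int = 5_000) -> int:
    """Hypothetical rule of thumb: one partition per `rows_per_partition` rows.

    The 5,000 default is an illustrative assumption, not a Hail recommendation.
    """
    if n_rows < 0 or rows_per_partition <= 0:
        raise ValueError("n_rows must be >= 0 and rows_per_partition must be > 0")
    return max(1, math.ceil(n_rows / rows_per_partition))


def coalesce_table(path_in: str, path_out: str, rows_per_partition: int = 5_000) -> int:
    """Read a Hail Table, reduce its partition count, and write a new copy.

    `path_in` and `path_out` are placeholder paths supplied by the caller.
    Returns the target partition count that was used.
    """
    import hail as hl  # imported here so the helper above works without Hail installed

    ht = hl.read_table(path_in)
    target = suggested_partitions(ht.count(), rows_per_partition)
    # naive_coalesce merges adjacent partitions without a shuffle,
    # so it can only lower the partition count, never raise it.
    if target < ht.n_partitions():
        ht = ht.naive_coalesce(target)
    ht.write(path_out, overwrite=True)
    return target
```

With the numbers from this issue (~60k rows), `suggested_partitions(60_000)` gives 12 under that assumed target, compared with the ~2k partitions in the current release.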