Data Pipeline Genes Error #1099

Closed
theBobiKing opened this issue Mar 28, 2023 · 2 comments

@theBobiKing

Hey everybody,

For the last few days, I've been trying to run the data pipeline with a genes argument, following this documentation: https://github.com/broadinstitute/gnomad-browser/blob/main/data-pipeline/README.md

Note: Before running the data pipeline, I ran ./deploy setup from here: https://github.com/broadinstitute/gnomad-browser/tree/main/deploy

However, after roughly 14 to 17 minutes of running, the genes pipeline fails with the following error message:

Hail version: 0.2.109-b71b065e4bb6
Error summary: MethodTooLargeException: Method too large: __C1355collect_distributed_array_table_text_writer.__m1423split_InsertFields ()V

It fails on the GTEx genes step (GTEx_Analysis_2016-01-15_v7_RSEMv1.2.22_transcript_tpm.txt.gz, 1.8 GB).

Have any of you had a similar problem, and how did you manage to solve it?

Link to stack trace:
https://pastebin.com/at9aGQ9z

Kind regards

@rileyhgrant
Contributor

Hello! Thanks for reaching out.

Please see issue #914 for some additional context and discussion.

Essentially, an update to Hail that also fixed some security vulnerabilities started causing this step to fail.

Since our team rarely runs this step of the pipeline, it has not been fixed yet, even though it is a known bug. The gnomAD Browser team will most likely fix it ahead of an upcoming dataset release, so a fix is queued up for the relatively near future (most likely within a few months).

If this is hard-blocking you right now, one fix for this step would be to import the data directly as a Matrix Table instead of a Hail Table; another would be to transform the data from a wide format into a long one. Either approach gets around the central problem: Hail raises this error when the Hail Table has too many columns for it to handle.
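
To illustrate the wide-to-long option, here is a minimal sketch in plain Python (standard library only). It is not the pipeline's code. It assumes the GTEx file is tab-separated with two leading identifier columns, `transcript_id` and `gene_id`, followed by one column per sample; the function name `wide_to_long` and the output column names `sample_id` and `tpm` are hypothetical and chosen only for this example.

```python
import csv
import io


def wide_to_long(wide_file, long_file, id_columns=("transcript_id", "gene_id")):
    """Stream a wide TSV (one column per sample) into a long TSV (one row per value).

    Assumption: the first len(id_columns) header fields are exactly id_columns
    and every remaining header field is a sample ID.
    Returns the number of data rows written.
    """
    reader = csv.reader(wide_file, delimiter="\t")
    writer = csv.writer(long_file, delimiter="\t", lineterminator="\n")

    header = next(reader)
    n_ids = len(id_columns)
    if tuple(header[:n_ids]) != tuple(id_columns):
        raise ValueError(f"expected leading columns {id_columns}, got {header[:n_ids]}")
    sample_ids = header[n_ids:]

    # The long format has a fixed, small number of columns regardless of sample count.
    writer.writerow([*id_columns, "sample_id", "tpm"])

    n_rows = 0
    for row in reader:
        ids, values = row[:n_ids], row[n_ids:]
        for sample_id, value in zip(sample_ids, values):
            writer.writerow([*ids, sample_id, value])
            n_rows += 1
    return n_rows


if __name__ == "__main__":
    # Tiny made-up input standing in for the real file.
    wide = io.StringIO(
        "transcript_id\tgene_id\tSAMPLE_A\tSAMPLE_B\n"
        "ENST0001\tENSG0001\t1.5\t0.0\n"
        "ENST0002\tENSG0001\t3.25\t7.0\n"
    )
    long_out = io.StringIO()
    wide_to_long(wide, long_out)
    print(long_out.getvalue())
```

For the real file, the same function could be fed handles from `gzip.open(path, "rt")` so the data is streamed row by row rather than loaded into memory. The resulting table has only four columns, so importing it no longer depends on the number of samples. The Matrix Table route would instead go through Hail's `hl.import_matrix_table`; its exact arguments for this file are not shown here and would need to be checked against the Hail documentation.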

@rileyhgrant
Contributor

Closing this issue, as it's a functional duplicate of #914.

@rileyhgrant rileyhgrant self-assigned this Apr 3, 2023