Update generate caids pipeline: add gnomAD v4 and update Hail usage #1419

Merged: 3 commits, Feb 16, 2024

nadeaujoshua
Contributor

This PR includes 2 updates to the data pipeline that generates a Hail table for caids:

  1. Add gnomAD v4 to the pipeline
  2. Update Hail imports and their usage to be compatible with newer versions of Hail
    • The get_caids module uses a number of Hail utils that have either been changed, removed or replaced since its last update

@nadeaujoshua nadeaujoshua self-assigned this Feb 14, 2024
@nadeaujoshua nadeaujoshua linked an issue Feb 14, 2024 that may be closed by this pull request
@nadeaujoshua nadeaujoshua changed the title from "Update pipeline to generate caids: add gnomAD v4 and update Hail usage" to "Update generate caids pipeline: add gnomAD v4 and update Hail usage" Feb 14, 2024
@nadeaujoshua nadeaujoshua marked this pull request as ready for review February 14, 2024 17:39
Contributor

@mattsolo1 mattsolo1 left a comment

Looks great to me! (without actually running it)

The next step is to add this output to the gnomAD v4 variant pipeline.

# "caids_path": "gs://gnomad-browser-data-pipeline/caids/gnomad_v4_caids.ht",

This also means reloading all variant data in Elasticsearch which is a whole thing unto itself...

with tqdm(total=len(remaining_part_urls)) as progress:
logger.warning(f'\n\nParts Counts\nTotal: {len(all_part_urls)}\nCompleted: {len(completed_parts)}\nRemaining: {len(remaining_part_urls)}\n')

with SimpleCopyToolProgressBar(total=len(remaining_part_urls)) as progress:

Neat, I didn't know this progress bar was a thing.
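
For readers unfamiliar with the pattern in the snippet above, here is a minimal sketch of counting and iterating over remaining parts inside a progress context manager. The names `all_part_urls`, `completed_parts`, and `remaining_part_urls` come from the diff; `remaining_parts`, `process_parts`, `handle_part`, and `CountingProgress` are hypothetical, and `CountingProgress` is only a stand-in, not Hail's `SimpleCopyToolProgressBar`, whose interface is not shown in this PR. It also assumes that the remaining parts are simply all parts minus the completed ones.

```python
import logging

logger = logging.getLogger(__name__)


def remaining_parts(all_part_urls, completed_parts):
    """Return part URLs that are not yet completed, preserving order.

    Assumption: "remaining" means "all minus completed".
    """
    done = set(completed_parts)
    return [url for url in all_part_urls if url not in done]


class CountingProgress:
    """Hypothetical stand-in for a progress bar that takes a total."""

    def __init__(self, total):
        self.total = total
        self.done = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions

    def update(self, n=1):
        self.done += n


def process_parts(all_part_urls, completed_parts, handle_part):
    """Run handle_part on each remaining part and return how many were handled."""
    remaining_part_urls = remaining_parts(all_part_urls, completed_parts)
    logger.warning(
        "Parts Counts: Total: %d, Completed: %d, Remaining: %d",
        len(all_part_urls),
        len(completed_parts),
        len(remaining_part_urls),
    )
    with CountingProgress(total=len(remaining_part_urls)) as progress:
        for url in remaining_part_urls:
            handle_part(url)
            progress.update(1)
    return progress.done
```

Because the progress object is used only through `with ... as progress`, swapping one progress bar for another with a compatible constructor leaves the loop body unchanged, which appears to be what the diff does.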

@nadeaujoshua nadeaujoshua merged commit 85950f5 into main Feb 16, 2024
1 check passed
@nadeaujoshua nadeaujoshua deleted the jn/update-caids-pipeline-v4 branch February 16, 2024 15:09
Development

Successfully merging this pull request may close these issues.

Generate CAIDs for v4 variants
2 participants