Reproduces the results of Chen et al. (2023). "A genomic mutational constraint map using variation in 76,156 human genomes." Nature (2023): 1-11.

haedong31/gnomad-nc-constraint

Code for gnomAD v3 WGS flagship manuscript

Manuscript at: Chen et al., 2023.

Note: Pre-computed genomic constraint scores (called Gnocchi: Genomic Non-Coding Constraint of HaploInsufficient variation) are available in Supplementary Datasets 2 and 3 of the paper and can also be downloaded from the official gnomAD website.

This repository provides:

  • The pipeline for computing the constraint scores from gnomAD v3.1.2 whole-genome sequencing data (N=76,156).
  • The code for generating the figures presented in the manuscript.

Constraint metric

The main components of the computation pipeline are written in Hail 0.2, which allows them to scale to large datasets like gnomAD. The script run_nc_constraint_gnomad_v31_main.py implements the QC and the mutational model described in Chen et al., 2023. The model uses the gnomAD genomes and a dataset of every possible variant (~6 billion well-covered variants) to detect depletions of variation (constraint), and the script generates constraint scores at a 1 kb scale across the human genome.

The constraint scores and all intermediate files can be generated by a single run of

python run_nc_constraint_gnomad_v31_main.py \
  -output_bucket "gs://bucket_name/" \
  -output_dir "/path/to/local/dir_name"

which accesses the public gnomAD data hosted on Google Cloud and saves all output files (including plain text files and Hail tables) to the designated -output_bucket. A copy of the key outputs (tables of estimated mutation rates, possible/observed/expected variant counts, and constraint Z scores) is saved locally to -output_dir.
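As a rough illustration of how observed and expected variant counts relate to a constraint Z score, the sketch below uses one common form, the signed square root of a chi-squared statistic, so that a window with fewer observed variants than expected gets a positive score. This form is an assumption made for illustration and is not copied from run_nc_constraint_gnomad_v31_main.py; the function name constraint_z is hypothetical, and the exact definition should be checked against the Methods of Chen et al., 2023.

```python
import math


def constraint_z(observed: float, expected: float) -> float:
    """Signed depletion score for one genomic window (illustrative only).

    Assumed form: (expected - observed) / sqrt(expected), i.e. the square
    root of (observed - expected)^2 / expected, signed so that depletion
    (observed < expected) is positive.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


if __name__ == "__main__":
    # Made-up counts for three hypothetical 1 kb windows.
    for obs, exp in [(64, 100), (100, 100), (121, 100)]:
        print(obs, exp, round(constraint_z(obs, exp), 2))
```

Under this assumed form, a window matching its expectation scores 0, and the score grows as variation becomes more depleted relative to the mutational model.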

Manuscript figures

All figures presented in the manuscript can be reproduced using generate_manuscript_figures.py, which takes a figure number (a single number from 1-5) or “all” (to generate all figures at once). The script automatically downloads the necessary input files from the public Google Cloud bucket gs://gnomad-nc-constraint-v31-paper/fig_tables/ to a local directory fig_tables/ if they do not already exist there. Please save generate_manuscript_figures.py and fig_utils.py in the same directory. For example,

python generate_manuscript_figures.py -fig 1

will generate Figure_1a.pdf and Figure_1b.pdf in the current directory. Similarly, all extended figures in the manuscript can be reproduced by running

python generate_manuscript_efigures.py -efig x

where x can be a single number from 1-8 or “all”.
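The figure selector described above (a single number or “all”) can be pictured with the minimal sketch below. It is not taken from generate_manuscript_figures.py or generate_manuscript_efigures.py; the helper name parse_selection and its error message are hypothetical, and the real scripts may validate their arguments differently.

```python
def parse_selection(value: str, max_number: int) -> list[int]:
    """Expand a selector such as "3" or "all" into a list of figure numbers.

    max_number would be 5 for the main figures and 8 for the extended
    figures, following the ranges stated in this README.
    """
    if value.strip().lower() == "all":
        return list(range(1, max_number + 1))
    number = int(value)  # raises ValueError for non-numeric input
    if not 1 <= number <= max_number:
        raise ValueError(f"figure number must be between 1 and {max_number}")
    return [number]


if __name__ == "__main__":
    print(parse_selection("1", 5))    # a single main figure
    print(parse_selection("all", 8))  # every extended figure
```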
