This repository contains all scripts for the manuscript "Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart" by Marderstein, Kundu et al.
For any questions, please contact:
Andrew Marderstein & Soumya Kundu 📧 [email protected], [email protected]
These sets of scripts in the preprocess/ directory generate all of the outputs used for the downstream analyses.
- 0_process_data/: Scripts for processing the scATAC-seq data.
- 1_train_chrombpnet/: Scripts for training the ChromBPNet models.
- 2_score_variants/: Scripts for scoring the rare, common, and ASD variants.
- 3_shap_variants/: Scripts for generating DeepLIFT / DeepSHAP contribution scores for both alleles of each variant.
- 4_shap_peaks/: Scripts for generating DeepLIFT / DeepSHAP contribution scores for scATAC-seq peaks.
- 5_run_modisco/: Scripts for running TF-MoDISco to identify the motif patterns learned by each model.
- 6_cluster_motifs/: Scripts for running MotifCompendium to cluster the motif patterns from TF-MoDISco.
- 7_run_finemo/: Scripts for running Fi-NeMo for identifying motif instances in the genome.
These sets of scripts in the analysis/ directory generate all of the results presented in the manuscript.
These scripts process the ChromBPNet variant scoring outputs and compile annotation tables for downstream analyses.
- Extract outputs: Run 1_pull_scores.sh to extract relevant ChromBPNet outputs.
- Analyze model performance: 2a_model_performance.R evaluates performance metrics. 2b_model_performance_plot.R identifies model outliers.
- Annotate variants:
  - Use 3a_bed2vcf.Rare.CADD_VEP.R or 3b_bed2vcf.ASD.CADD_VEP.R to run CADD and VEP.
  - Process outputs with 3c_Process_CADD_VEP.R.
- Merge results: Run 4_mergeData.R to integrate annotations, merge scores, and remove outliers.
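The merge step can be pictured roughly as follows. This is a minimal Python sketch, not the logic of 4_mergeData.R itself: the field names (`variant_id`, `model`, `score`, `cadd`), the model names, and the choice of an inner join are all hypothetical placeholders used for illustration.

```python
def merge_scores(scores, annotations, outlier_models):
    """Join per-variant scores with annotations and drop outlier models.

    scores: list of dicts with hypothetical keys variant_id, model, score.
    annotations: dict mapping variant_id -> dict of annotation fields.
    outlier_models: set of model names flagged as outliers upstream.
    """
    merged = []
    for row in scores:
        if row["model"] in outlier_models:
            continue  # skip models flagged as outliers
        annot = annotations.get(row["variant_id"])
        if annot is None:
            continue  # inner join: keep only variants that have annotations
        merged.append({**row, **annot})
    return merged


# Toy inputs, invented for illustration only.
scores = [
    {"variant_id": "chr1:100:A:G", "model": "model_a", "score": 0.42},
    {"variant_id": "chr1:100:A:G", "model": "model_b", "score": 0.10},
    {"variant_id": "chr2:200:C:T", "model": "model_a", "score": -0.31},
]
annotations = {"chr1:100:A:G": {"cadd": 12.3}}
merged = merge_scores(scores, annotations, outlier_models={"model_b"})
```

Here only the `model_a` row for the annotated variant survives; whether unannotated variants should instead be kept with missing values is a design choice this sketch does not settle.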
These scripts correspond to the manuscript section "Variants effects are shaped by genomic context and TF binding".
They analyze:
- Genomic context – A variant’s proximity to transcribed regions.
- Cell-type specificity – How constrained or widespread variant effects are.
- Regulatory magnitude – The extent of chromatin accessibility and TF binding changes.
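As a rough illustration of the last two quantities, the sketch below summarizes one variant's predicted effects across cell-type models. It is a minimal Python example under stated assumptions: the 0.1 threshold, the summary field names, and the cell-type labels in the toy input are hypothetical and are not taken from the analysis scripts.

```python
def summarize_variant(effects, threshold=0.1):
    """Summarize one variant's predicted effects across cell-type models.

    effects: dict mapping cell-type name -> signed effect score.
    threshold: hypothetical cutoff on |effect| for calling a context affected.
    """
    if not effects:
        raise ValueError("effects must contain at least one cell type")
    affected = sorted(ct for ct, e in effects.items() if abs(e) >= threshold)
    return {
        "n_affected": len(affected),                        # specificity
        "fraction_affected": len(affected) / len(effects),  # specificity
        "max_abs_effect": max(abs(e) for e in effects.values()),  # magnitude
        "affected_contexts": affected,
    }


# Toy scores, invented for illustration only.
example = {"microglia": 0.35, "fetal_neuron": 0.02, "cardiomyocyte": -0.15}
summary = summarize_variant(example)
```

A variant affecting a small fraction of contexts would count as cell-type-specific under this toy definition, while `max_abs_effect` tracks regulatory magnitude independently of how widespread the effect is.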
These scripts support the manuscript sections:
- "Context-specific models reveal regulatory effects of fine-mapped eQTLs"
- "Pinpointing disease-relevant variants using cell-type-specific chromatin models"
- "Microglia-driven mechanisms of Alzheimer’s disease risk"
We use ChromBPNet-predicted regulatory effects to identify candidate causal variants affecting gene regulation and disease risk.
These scripts correspond to:
- "Ultra-rare variants show larger and more shared regulatory effects than common variants"
- "Fetal neurons shape selective constraint in non-coding regions"
They compare rare and common variant effects to understand the selective pressures that influence allele frequency distributions across human populations.
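One simple way to frame such a comparison is to contrast the typical effect magnitude in each frequency class. The Python sketch below is only an illustration of that idea: the class labels are assumed to be assigned upstream, the median of absolute effects is an assumed summary statistic, and the numbers are toy values rather than results from the manuscript.

```python
from statistics import median


def median_abs_effect_by_class(variants):
    """Group variants by frequency class and return the median |effect| per class.

    variants: iterable of (frequency_class, effect) pairs, where the class
    labels (e.g. "rare", "common") are assigned upstream.
    """
    groups = {}
    for freq_class, effect in variants:
        groups.setdefault(freq_class, []).append(abs(effect))
    return {freq_class: median(vals) for freq_class, vals in groups.items()}


# Toy values, invented for illustration only.
toy = [("rare", 0.40), ("rare", -0.20), ("rare", 0.30),
       ("common", 0.05), ("common", -0.10), ("common", 0.15)]
result = median_abs_effect_by_class(toy)
```

A real comparison would also need a statistical test and matching on confounders such as genomic context, which this sketch omits.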
These scripts correspond to:
- "FLARE: a functional genomic model of constraint"
- "FLARE prioritizes de novo non-coding mutations in autism"
Since PhyloP scores are not context-specific, FLARE models the relationship between genomic context, regulatory effects, and evolutionary conservation within cell-type-specific contexts. FLARE:
- Disentangles accessibility and regulatory effects from conservation.
- Integrates multiple functional genomic features into a unified model.
- Captures regulatory potential across multiple cell types.
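To make the idea of a unified model over multiple feature types concrete, here is a minimal Python sketch. It assumes a plain linear model over hand-built features. This README does not specify FLARE's model form, so the feature names, the weights, and the linear form itself are assumptions for illustration; see the FLARE repository for the actual method.

```python
def build_features(distance_to_tss, accessibility, effects):
    """Flatten genomic context and per-cell-type values into one feature dict.

    accessibility, effects: dicts mapping cell-type name -> value.
    All feature names here are hypothetical.
    """
    features = {"distance_to_tss": distance_to_tss}
    for cell_type in sorted(effects):
        features[f"accessibility_{cell_type}"] = accessibility[cell_type]
        features[f"abs_effect_{cell_type}"] = abs(effects[cell_type])
    return features


def linear_score(features, weights, intercept=0.0):
    """Predicted conservation-like score under an assumed linear model."""
    return intercept + sum(weights.get(name, 0.0) * value
                           for name, value in features.items())


# Toy inputs and weights, invented for illustration only.
features = build_features(1000, {"microglia": 2.0}, {"microglia": -0.5})
weights = {"accessibility_microglia": 0.1, "abs_effect_microglia": 1.0}
score = linear_score(features, weights)
```

Keeping accessibility and effect size as separate features per cell type is what would let a fitted model weigh them independently, which is the sense of "disentangles" used in the list above.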
The FLARE method itself is maintained in a separate FLARE repository.
Marderstein^, Kundu^, et al. Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart.