Various code and files for "Regulatory Transposable Elements in the Encyclopedia of DNA Elements"
Included are files containing links to the publicly available datasets/files used for analysis.
Unix commands, python scripts, and R scripts to perform analyses and create visualizations/plots in "Regulatory Transposable Elements in the Encyclopedia of DNA Elements" are provided here.
Code is grouped into directories by common theme of analysis or by the individual who wrote the code. The primary purpose of each directory is listed below.
Commands are generally listed in a commands.sh file. The exception is the "combined_MPRA_feature_comparison_and_TF_distance_turnover" directory, where commands are listed in the jc_code.txt file.
Some commands take input files produced by commands in other directories. For this reason, the directories should be run in the following order.
Create reference files from public files for use in subsequent analyses.
Quantify TE-derived cCREs in humans.
Compare human and mouse cCREs to identify shared and lineage-specific cCREs, using the information to quantify TE contributions to cCREs after human-mouse divergence.
Quantify the origins of cCRE-associated transcription factor (TF) binding sites in TEs at the TE subfamily level.
Quantify genomic distance of TEs (cCRE-associated and non-associated) to non-TE cCREs.
Several analyses combined in one directory:
- Quantify genomic distance of TEs (TF-bound and non-bound) to non-TE TF binding sites.
- Compare MPRA activity of (primarily) TE-derived sequences and non-TE sequences using ENCODE K562 lentiMPRA data.
- Compare feature overlap of TE-derived cCREs and non-TE cCREs. Features are K562 lentiMPRA activity, K562 TF ChIP-seq peak, ATAC-seq peak, and phastCons score.
- Quantify TF binding site turnover between analogous human (K562) and mouse (MEL) cell lines.
Quantify common human population variants (>1% allele frequency) in TE-derived cCREs compared to flanking sequence and non-TE cCREs.
Quantify GWAS SNPs in TE-derived cCREs and compare to all cCREs and non-TE cCREs.
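
The distance analyses listed above (TEs to non-TE cCREs, and TEs to non-TE TF binding sites) come down to finding the nearest interval on the same chromosome. The Python sketch below illustrates that idea only; it is not the code used in this repository, whose actual commands are in the commands.sh and jc_code.txt files. The function names, the 0-based half-open (BED-style) coordinates, and the choice to report overlapping or touching intervals as distance 0 are all assumptions made for illustration.

```python
def interval_gap(a, b):
    """Gap in bp between two (start, end) intervals.

    Assumes 0-based, half-open coordinates (BED-style).
    Overlapping or touching intervals return 0 (an assumed convention).
    """
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def nearest_distance(te, ccres):
    """Distance from one TE interval to the closest cCRE interval.

    `te` is a (start, end) tuple and `ccres` is a non-empty list of
    (start, end) tuples, all assumed to be on the same chromosome.
    This is a brute-force scan, which is fine for illustration but
    slow for genome-scale inputs.
    """
    return min(interval_gap(te, ccre) for ccre in ccres)


# Hypothetical coordinates, for illustration only.
example_te = (100, 200)
example_ccres = [(0, 50), (260, 300)]
example_distance = nearest_distance(example_te, example_ccres)
```

In this hypothetical example the TE lies 50 bp from the first cCRE and 60 bp from the second, so `example_distance` is 50. For genome-scale data, a sorted-interval search or a dedicated interval tool would be more practical than this brute-force scan.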