This directory contains various analysis modules in the OpenPBTA project. See the README of an individual analysis module for more information about that module.
The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules.
This is in service of documenting interdependent analyses.
Note that nearly all modules use the harmonized clinical data file (pbta-histologies.tsv) even when it is not explicitly included in the table below.
Module | Input Files | Brief Description | Output Files Consumed by Other Analyses |
---|---|---|---|
chromosomal-instability | pbta-histologies.tsv pbta-sv-manta.tsv.gz pbta-cnv-cnvkit.seg.gz | Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals | N/A |
cnv-chrom-plot | pbta-cnv-consensus-gistic.zip analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg | Makes plots from GISTIC output as well as seg.mean plots by histology group | N/A |
cnv-comparison | Earlier version of SEG files | Deprecated; compared earlier version of the CNV methods. | N/A |
collapse-rnaseq | pbta-gene-expression-rsem-fpkm.polya.rds pbta-gene-expression-rsem-fpkm.stranded.rds gencode.v27.primary_assembly.annotation.gtf.gz | Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. | results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds (included in data download; too large for tracking via GitHub) |
comparative-RNASeq-analysis | pbta-gene-expression-rsem-tpm.polya.rds pbta-gene-expression-rsem-tpm.stranded.rds | In progress; will produce expression outlier profiles per #229 | N/A |
copy_number_consensus_call | pbta-cnv-cnvkit.seg.gz pbta-cnv-controlfreec.tsv.gz pbta-sv-manta.tsv.gz | Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made | results/cnv_consensus.tsv results/pbta-cnv-consensus.seg.gz ref/cnv_excluded_regions.bed ref/cnv_callable.bed |
create-subset-files | All files | Creates the subset files used in continuous integration | All subset files for continuous integration |
focal-cn-file-preparation | pbta-cnv-cnvkit.seg.gz pbta-cnv-controlfreec.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz | Maps from copy number variant caller segments to gene identifiers; will be updated to take into account changes that affect entire cytobands or chromosome arms (#186) | results/cnvkit_annotated_cn_autosomes.tsv.gz results/cnvkit_annotated_cn_x_and_y.tsv.gz results/controlfreec_annotated_cn_autosomes.tsv.gz results/controlfreec_annotated_cn_x_and_y.tsv.gz results/consensus_seg_annotated_cn_autosomes.tsv.gz results/consensus_seg_annotated_cn_x_and_y.tsv.gz |
fusion_filtering | pbta-fusion-arriba.tsv.gz pbta-fusion-starfusion.tsv.gz | Standardizes, filters, and prioritizes fusion calls | results/pbta-fusion-putative-oncogenic.tsv results/pbta-fusion-recurrent-fusion-byhistology.tsv results/pbta-fusion-recurrent-fusion-bysample.tsv (included in data download) |
fusion-summary | pbta-histologies.tsv pbta-fusion-putative-oncogenic.tsv pbta-fusion-arriba.tsv.gz pbta-fusion-starfusion.tsv.gz | Generates summary tables from fusion files (#398) | results/fusion_summary_embryonal_foi.tsv results/fusion_summary_ependymoma_foi.tsv |
gene-set-enrichment-analysis | pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | In progress; updated gene set enrichment analysis with appropriate RNA-seq expression data | results/gsva_scores_stranded.tsv results/gsva_scores_polya.tsv (for stranded and poly-A expression data, respectively) |
immune-deconv | pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | Immune/Stroma characterization across PBTA (part of #15) | results/deconv-output.RData |
independent-samples | pbta-histologies.tsv | Generates independent specimen lists for WGS/WXS samples | results/independent-specimens.wgs.primary.tsv results/independent-specimens.wgs.primary-plus.tsv results/independent-specimens.wgswxs.primary.tsv results/independent-specimens.wgswxs.primary-plus.tsv (included in data download) |
interaction-plots | independent-specimens.wgs.primary-plus.tsv pbta-snv-consensus-mutation.maf.tsv.gz | Creates interaction plots for mutation mutual exclusivity/co-occurrence (#13); may be updated to include other data types (e.g., fusions) | N/A |
molecular-subtyping-ATRT | analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz pbta-snv-consensus-mutation-tmb-all.tsv pbta-cnv-consensus-gistic.zip | Summarizes data in tabular format to molecularly subtype ATRT samples (#244); this analysis did not work | N/A |
molecular-subtyping-chordoma | analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | In progress; identifying poorly-differentiated chordoma samples per #250 | N/A |
molecular-subtyping-embryonal | analyses/fusion-summary/results/fusion_summary_embryonal_foi.tsv pbta-histologies.tsv pbta-sv-manta.tsv.gz analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_x_and_y.tsv.gz analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_x_and_y.tsv.gz analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_x_and_y.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | In progress; molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors (#251) | N/A |
molecular-subtyping-ependymoma | pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-cnv-consensus-gistic.zip analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv | In progress; molecular subtyping of ependymoma samples | N/A |
molecular-subtyping-HGG | pbta-snv-consensus-mutation.maf.tsv.gz analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz pbta-fusion-putative-oncogenic.tsv pbta-cnv-consensus-gistic.zip pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | In progress; molecular subtyping of high-grade glioma samples (#249) | N/A |
molecular-subtyping-SHH-tp53 | pbta-histologies.tsv pbta-snv-consensus-mutation.maf.tsv.gz | Deprecated; identifies the SHH-classified medulloblastoma samples that have TP53 mutations (#247) | N/A |
mutational-signatures | pbta-snv-consensus-mutation.maf.tsv.gz | Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | N/A |
mutect2-vs-strelka2 | pbta-snv-mutect2.vep.maf.gz pbta-snv-strelka2.vep.maf.gz | Deprecated; comparison of only two SNV callers, subsumed by snv-callers | N/A |
oncoprint-landscape | pbta-snv-consensus-mutation.maf.tsv.gz pbta-fusion-putative-oncogenic.tsv analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz independent-specimens.* | Combines mutation, copy number, and fusion data into an OncoPrint plot (#6); will need to be updated as all data types are refined | N/A |
run-gistic | pbta-histologies.tsv pbta-cnv-consensus.seg.gz | In progress; runs GISTIC 2.0 on SEG files | Currently N/A |
sample-distribution-analysis | pbta-histologies.tsv | Produces plots and tables that illustrate the distribution of different histologies in the PBTA data | N/A |
selection-strategy-comparison | pbta-gene-expression-rsem-fpkm.polya.rds pbta-gene-expression-rsem-fpkm.stranded.rds | Comparison of RNA-seq data from different selection strategies | N/A |
sex-prediction-from-RNASeq | pbta-gene-expression-kallisto.stranded.rds pbta-histologies.tsv | In progress; predicts genetic sex using RNA-seq data (#84) | N/A |
snv-callers | pbta-snv-lancet.vep.maf.gz pbta-snv-mutect2.vep.maf.gz pbta-snv-strelka2.vep.maf.gz pbta-snv-vardict.vep.maf.gz | Generates consensus SNV and indel calls; calculates tumor mutation burden using the consensus calls | results/consensus/pbta-snv-consensus-mutation.maf.tsv.gz results/consensus/pbta-snv-consensus-mutation-tmb.tsv (included in data download; too large for tracking via GitHub) |
ssgsea-hallmark | pbta-gene-counts-rsem-expected_count.stranded.rds | Deprecated; performs GSVA using Hallmark gene sets | N/A |
survival-analysis | TBD | In progress; will eventually contain functions for various types of survival analysis (#18) | N/A |
sv-analysis | pbta-sv-manta.tsv.gz independent-specimens.wgs.primary-plus.tsv | In progress; chromothripsis analysis per #27 | N/A |
telomerase-activity-prediction | pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-counts-rsem-expected_count.stranded.rds pbta-gene-counts-rsem-expected_count.polya.rds | Quantifies telomerase activity across pediatric brain tumors (part of #148) | results/TelomeraseScores_PTBAPolya_counts results/TelomeraseScores_PTBAPolya_FPKM.txt results/TelomeraseScores_PTBAStranded_counts.txt results/TelomeraseScores_PTBAStranded_FPKM.txt |
tmb-compare-tcga | pbta-snv-consensus-mutation-tmb-coding.tsv | Compares PBTA tumor mutation burden to adult TCGA data; may need to be updated per #257 | N/A |
tp53_nf1_score | pbta-snv-consensus-mutation.maf.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data | N/A |
transcriptomic-dimension-reduction | pbta-gene-expression-rsem-fpkm.polya.rds pbta-gene-expression-rsem-fpkm.stranded.rds pbta-gene-expression-kallisto.polya.rds pbta-gene-expression-kallisto.stranded.rds | Dimension reduction and visualization of RNA-seq data (part of #9) | N/A |
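
Because the table records which files each module consumes and produces, it can also be read as a dependency graph. The following is a minimal sketch of that idea, not part of the project's tooling: the `MODULES` dictionary is a hand-copied subset of the rows above, and the helper names `module_dependencies` and `run_order` are hypothetical. Files are matched by base name only, on the assumption that an input path such as `analyses/<module>/results/<file>` refers to the `results/<file>` output listed for that module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import PurePosixPath

# Hand-copied subset of the table above (illustrative only, not the full table).
MODULES = {
    "collapse-rnaseq": {
        "inputs": ["pbta-gene-expression-rsem-fpkm.stranded.rds"],
        "outputs": ["results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"],
    },
    "gene-set-enrichment-analysis": {
        "inputs": ["pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"],
        "outputs": ["results/gsva_scores_stranded.tsv"],
    },
    "fusion_filtering": {
        "inputs": ["pbta-fusion-arriba.tsv.gz", "pbta-fusion-starfusion.tsv.gz"],
        "outputs": ["results/pbta-fusion-putative-oncogenic.tsv"],
    },
    "fusion-summary": {
        "inputs": ["pbta-histologies.tsv", "pbta-fusion-putative-oncogenic.tsv"],
        "outputs": ["results/fusion_summary_ependymoma_foi.tsv"],
    },
    "molecular-subtyping-ependymoma": {
        "inputs": [
            "analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv",
            "analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv",
        ],
        "outputs": [],
    },
}


def module_dependencies(modules):
    """Map each module to the set of modules that produce one of its inputs."""
    # Index every output file by its base name so differing path prefixes still match.
    producer = {}
    for name, spec in modules.items():
        for path in spec["outputs"]:
            producer[PurePosixPath(path).name] = name

    deps = {}
    for name, spec in modules.items():
        input_names = (PurePosixPath(path).name for path in spec["inputs"])
        # Inputs with no producing module (e.g. data-release files) add no edge.
        deps[name] = {producer[f] for f in input_names if f in producer} - {name}
    return deps


def run_order(modules):
    """Return one valid execution order: producers always precede consumers."""
    return list(TopologicalSorter(module_dependencies(modules)).static_order())


if __name__ == "__main__":
    for module in run_order(MODULES):
        print(module)
```

Several orders can be valid for the same table, so the printed sequence is only one of them. A cycle between modules would raise `graphlib.CycleError`, which would indicate an inconsistency in the table itself.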