Remove R script/notebook package installs (Part 1 of N) (#698)

* update the for clinical data of version 6
* Get rid of cnv-chrom-plot installs
* Refresh focal-cn and push resulting results
* Rerun script and add files again
* Add note about how much RAM you need in README
AlexsLemonade · May 21, 2020 · 0985505
1 parent d5b13f2

Showing 8 changed files with 192 additions and 295 deletions.
diff --git a/analyses/cnv-chrom-plot/gistic_plot.Rmd b/analyses/cnv-chrom-plot/gistic_plot.Rmd
@@ -21,11 +21,6 @@ Rscript -e "rmarkdown::render('analyses/cnv-chrom-plot/gistic_plot.Rmd',
 ### Set Up
 
 ```{r}
-if (!("ggbio" %in% installed.packages())) {
-  install.packages("BiocManager")
-  BiocManager::install("ggbio")
-}
-
 # Magrittr pipe
 `%>%` <- dplyr::`%>%`
 ```

diff --git a/analyses/cnv-chrom-plot/gistic_plot.nb.html b/analyses/cnv-chrom-plot/gistic_plot.nb.html
diff --git a/analyses/focal-cn-file-preparation/02-add-ploidy-consensus.nb.html b/analyses/focal-cn-file-preparation/02-add-ploidy-consensus.nb.html
diff --git a/analyses/focal-cn-file-preparation/03-add-cytoband-status-consensus.nb.html b/analyses/focal-cn-file-preparation/03-add-cytoband-status-consensus.nb.html
diff --git a/analyses/focal-cn-file-preparation/04-prepare-cn-file.R b/analyses/focal-cn-file-preparation/04-prepare-cn-file.R
@@ -20,39 +20,6 @@
 
 #### Set Up --------------------------------------------------------------------
 
-# We require bioconductor
-if (!("BiocManager" %in% installed.packages())) {
-  install.packages("BiocManager")
-}
-
-# Install GenomicRanges
-if (!("GenomicRanges" %in% installed.packages())) {
-  BiocManager::install("GenomicRanges", update = FALSE)
-}
-
-# Install IRanges
-if (!("IRanges" %in% installed.packages())) {
-  BiocManager::install("IRanges", update = FALSE)
-}
-
-# Install annotatr
-if (!("annotatr" %in% installed.packages())) {
-  BiocManager::install("annotatr", update = FALSE)
-}
-
-# hg38 genome annotations
-if (!("TxDb.Hsapiens.UCSC.hg38.knownGene" %in% installed.packages())) {
-  BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene", update = FALSE)
-}
-
-if (!("org.Hs.eg.db" %in% installed.packages())) {
-  BiocManager::install("org.Hs.eg.db", update = FALSE)
-}
-
-if (!("AnnotationDbi" %in% installed.packages())) {
-  BiocManager::install("AnnotationDbi", update = FALSE)
-}
-
 # Get `magrittr` pipe
 `%>%` <- dplyr::`%>%`
 

diff --git a/analyses/focal-cn-file-preparation/05-define-most-focal-cn-units.nb.html b/analyses/focal-cn-file-preparation/05-define-most-focal-cn-units.nb.html
diff --git a/analyses/focal-cn-file-preparation/README.md b/analyses/focal-cn-file-preparation/README.md
@@ -6,6 +6,7 @@ The copy number data from OpenPBTA are provided as ranges or segments.
 The purpose of this module is to map from those ranges to gene identifiers for consumption by downstream analyses (e.g., OncoPrint plotting).
 
 ### Running this analysis
+*This analysis requires at least ~24 GB of RAM to run to completion*
 
 To run this analysis _only on consensus SEG file_, use the following (from the root directory of the repository):
 
@@ -23,7 +24,7 @@ RUN_ORIGINAL=1 bash analyses/focal-cn-file-preparation/run-prepare-cn.sh
 
 ### Scripts and notebooks
 
-* `01-add-ploidy-cnvkit.Rmd` - The two CNV callers, CNVkit and ControlFreeC, do not handle ploidy in the same way ([A Note on Ploidy](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/de661fbe740717472fcf01c7d9b74fe1b946aece/doc/data-formats.md#a-note-on-ploidy) in the Data Formats documentation). 
+* `01-add-ploidy-cnvkit.Rmd` - The two CNV callers, CNVkit and ControlFreeC, do not handle ploidy in the same way ([A Note on Ploidy](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/de661fbe740717472fcf01c7d9b74fe1b946aece/doc/data-formats.md#a-note-on-ploidy) in the Data Formats documentation).
   This notebook adds the ploidy inferred via ControlFreeC to the CNVkit data and adds a status column that defines gain and loss broadly.
  Specifically, segments with copy number less than ploidy are marked as a loss, segments with copy number greater than ploidy are marked as a gain, and segments where copy number equals ploidy are marked as neutral.
   (Note that [the logic around sex chromosomes in males when ploidy = 3 leaves something to be desired](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/259#discussion_r345354403)).
@@ -36,16 +37,16 @@ See the notebook for more information. This notebook also prepares lists of copy
 
   | `Kids_First_Biospecimen_ID` | chr | cytoband | dominant_status | band_length | callable_fraction | gain_fraction | loss_fraction | chromosome_arm |
   |----------------|--------|-------------|--------|---------|----------|-------------|---------|---------------|
-  
+
 * `04-prepare-cn-file.R` - This script maps ranges to gene annotations using the GENCODE v27 GTF included via the data download step; it takes as input either the ControlFreeC file or a SEG file (e.g., CNVkit, consensus SEG) prepared with `01-add-ploidy-cnvkit.Rmd` or `02-add-ploidy-consensus.Rmd`.
   **The mapping is limited to _exons_.**
   Mapping to cytobands is performed with the [`org.Hs.eg.db`](https://doi.org/doi:10.18129/B9.bioc.org.Hs.eg.db) package.
   A table with the following columns is returned:
-  
+
   | biospecimen_id | status | copy_number | ploidy | ensembl | gene_symbol | cytoband |
   |----------------|--------|-------------|--------|---------|-------------|---------|
   Any segment that is copy neutral is filtered out of this table. In addition, [any segments with copy number > (2 * ploidy) are marked as amplifications](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e2058dd43d9b1dd41b609e0c3429c72f79ff3be6/analyses/focal-cn-file-preparation/03-prepare-cn-file.R#L275) in the `status` column.
-  
+
 * `05-define-most-focal-cn-units.Rmd` - This notebook defines the _most focal_ recurrent copy number units by removing focal changes that are within entire chromosome arm losses and gains.
Here, _most focal_ means that if a chromosome arm is not clearly defined as a gain or loss (and is callable), we look to define the cytoband-level status.
Similarly, if a cytoband is not clearly defined as a gain or loss (and is callable), we then look to define the gene-level status.
@@ -60,7 +61,7 @@ It produces loss/neutral and zero/neutral correlation plots, as well as stacked
 _Note: The shell script's default behavior is to produce these plots using the annotated consensus SEG autosome and sex chromosome files found in this module's `results` directory and listed below._
 
 
-### Output files for downstream consumption 
+### Output files for downstream consumption
 
 **Note:** The output files from `04-prepare-cn-file.R` have neutral calls filtered out to reduce file size.
 

diff --git a/analyses/focal-cn-file-preparation/results/consensus_seg_with_ucsc_cytoband_status.tsv.gz b/analyses/focal-cn-file-preparation/results/consensus_seg_with_ucsc_cytoband_status.tsv.gz