This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Remove R script/notebook package installs (Part 1 of N) (#698)
* update the for clinical data of version 6

* Get rid of cnv-chrom-plot installs

* Refresh focal-cn and push resulting results

* Rerun script and add files again

* Add note about how much RAM you need in README
cansavvy authored May 21, 2020
1 parent d5b13f2 commit 0985505
Showing 8 changed files with 192 additions and 295 deletions.
5 changes: 0 additions & 5 deletions analyses/cnv-chrom-plot/gistic_plot.Rmd
@@ -21,11 +21,6 @@ Rscript -e "rmarkdown::render('analyses/cnv-chrom-plot/gistic_plot.Rmd',
### Set Up

```{r}
if (!("ggbio" %in% installed.packages())) {
  install.packages("BiocManager")
  BiocManager::install("ggbio")
}
# Magrittr pipe
`%>%` <- dplyr::`%>%`
```
200 changes: 58 additions & 142 deletions analyses/cnv-chrom-plot/gistic_plot.nb.html

Large diffs are not rendered by default.


33 changes: 0 additions & 33 deletions analyses/focal-cn-file-preparation/04-prepare-cn-file.R
@@ -20,39 +20,6 @@

#### Set Up --------------------------------------------------------------------

# We require bioconductor
if (!("BiocManager" %in% installed.packages())) {
  install.packages("BiocManager")
}

# Install GenomicRanges
if (!("GenomicRanges" %in% installed.packages())) {
  BiocManager::install("GenomicRanges", update = FALSE)
}

# Install IRanges
if (!("IRanges" %in% installed.packages())) {
  BiocManager::install("IRanges", update = FALSE)
}

# Install annotatr
if (!("annotatr" %in% installed.packages())) {
  BiocManager::install("annotatr", update = FALSE)
}

# hg38 genome annotations
if (!("TxDb.Hsapiens.UCSC.hg38.knownGene" %in% installed.packages())) {
  BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene", update = FALSE)
}

if (!("org.Hs.eg.db" %in% installed.packages())) {
  BiocManager::install("org.Hs.eg.db", update = FALSE)
}

if (!("AnnotationDbi" %in% installed.packages())) {
  BiocManager::install("AnnotationDbi", update = FALSE)
}

# Get `magrittr` pipe
`%>%` <- dplyr::`%>%`
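
With the in-script installs removed, the script depends on these packages already being present when it runs. A minimal sketch of a fail-fast check follows, assuming the packages are installed by whatever provisions the analysis environment. The helper name `check_required_packages` is hypothetical and is not part of the repository; it only reports missing packages and never installs anything.

```r
# Hypothetical helper (not part of the repository): stop with a clear message
# if any required package is unavailable. Uses only base R.
check_required_packages <- function(packages) {
  is_available <- vapply(
    packages,
    function(pkg) requireNamespace(pkg, quietly = TRUE),
    logical(1)
  )
  missing_packages <- packages[!is_available]
  if (length(missing_packages) > 0) {
    stop(
      "Missing required packages: ",
      paste(missing_packages, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Packages whose install blocks were removed from this script
required_packages <- c(
  "GenomicRanges", "IRanges", "annotatr",
  "TxDb.Hsapiens.UCSC.hg38.knownGene", "org.Hs.eg.db", "AnnotationDbi"
)

# Usage (left commented out because it stops if a package is missing):
# check_required_packages(required_packages)
```

A check like this keeps the failure at the top of the script rather than partway through a long run, without reintroducing installation side effects.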


11 changes: 6 additions & 5 deletions analyses/focal-cn-file-preparation/README.md
@@ -6,6 +6,7 @@ The copy number data from OpenPBTA are provided as ranges or segments.
The purpose of this module is to map from those ranges to gene identifiers for consumption by downstream analyses (e.g., OncoPrint plotting).

### Running this analysis
*This analysis requires at least ~24 GB of RAM to run to completion*

To run this analysis _only on consensus SEG file_, use the following (from the root directory of the repository):

@@ -23,7 +24,7 @@ RUN_ORIGINAL=1 bash analyses/focal-cn-file-preparation/run-prepare-cn.sh

### Scripts and notebooks

* `01-add-ploidy-cnvkit.Rmd` - The two CNV callers, CNVkit and ControlFreeC, do not handle ploidy in the same way ([A Note on Ploidy](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/de661fbe740717472fcf01c7d9b74fe1b946aece/doc/data-formats.md#a-note-on-ploidy) in the Data Formats documentation).
* `01-add-ploidy-cnvkit.Rmd` - The two CNV callers, CNVkit and ControlFreeC, do not handle ploidy in the same way ([A Note on Ploidy](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/de661fbe740717472fcf01c7d9b74fe1b946aece/doc/data-formats.md#a-note-on-ploidy) in the Data Formats documentation).
This notebook adds the ploidy inferred via ControlFreeC to the CNVkit data and adds a status column that defines gain and loss broadly.
Specifically, segments with copy number lower than ploidy are marked as losses, segments with copy number greater than ploidy are marked as gains, and segments where copy number equals ploidy are marked as neutral.
(Note that [the logic around sex chromosomes in males when ploidy = 3 leaves something to be desired](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/259#discussion_r345354403)).
@@ -36,16 +37,16 @@ See the notebook for more information. This notebook also prepares lists of copy

| `Kids_First_Biospecimen_ID` | chr | cytoband | dominant_status | band_length | callable_fraction | gain_fraction | loss_fraction | chromosome_arm |
|----------------|--------|-------------|--------|---------|----------|-------------|---------|---------------|

* `04-prepare-cn-file.R` - This script performs the ranges to annotation mapping using the GENCODE v27 GTF included via the data download step; it takes the ControlFreeC file or a SEG (e.g., CNVkit, consensus SEG) file prepared with `01-add-ploidy-cnvkit.Rmd` and `02-add-ploidy-cnvkit.Rmd` as input.
**The mapping is limited to _exons_.**
Mapping to cytobands is performed with the [`org.Hs.eg.db`](https://doi.org/doi:10.18129/B9.bioc.org.Hs.eg.db) package.
A table with the following columns is returned:

| biospecimen_id | status | copy_number | ploidy | ensembl | gene_symbol | cytoband |
|----------------|--------|-------------|--------|---------|-------------|---------|
Any segment that is copy neutral is filtered out of this table. In addition, [any segments with copy number > (2 * ploidy) are marked as amplifications](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e2058dd43d9b1dd41b609e0c3429c72f79ff3be6/analyses/focal-cn-file-preparation/03-prepare-cn-file.R#L275) in the `status` column.
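
The status rules described in this README (loss, gain, neutral, amplification, then dropping neutral segments) can be sketched as below. This is an illustration under stated assumptions, not the repository's implementation: the function name `classify_cn_status`, the exact status labels, and the toy data are all hypothetical.

```r
# Hypothetical helper illustrating the status rules described in the README.
# The label strings are assumptions made for this sketch.
classify_cn_status <- function(copy_number, ploidy) {
  status <- rep("neutral", length(copy_number))
  status[copy_number < ploidy] <- "loss"
  status[copy_number > ploidy] <- "gain"
  # Amplification overrides gain when copy number exceeds twice the ploidy
  status[copy_number > 2 * ploidy] <- "amplification"
  status
}

# Toy data, made up for illustration only
segments <- data.frame(
  biospecimen_id = c("BS_1", "BS_2", "BS_3", "BS_4"),
  copy_number = c(1, 2, 3, 5),
  ploidy = c(2, 2, 2, 2)
)
segments$status <- classify_cn_status(segments$copy_number, segments$ploidy)

# Drop copy-neutral segments, mirroring the filtering described above
non_neutral <- segments[segments$status != "neutral", ]
```

Applying the amplification rule last means a segment that qualifies as both a gain and an amplification ends up with the more specific label.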

* `05-define-most-focal-cn-units.Rmd` - This notebook defines the _most focal_ recurrent copy number units by removing focal changes that are within entire chromosome arm losses and gains.
_Most focal_ here means that if a chromosome arm is not clearly defined as a gain or loss (and is callable), we look to define the cytoband-level status.
Similarly, if a cytoband is not clearly defined as a gain or loss (and is callable), we then look to define the gene-level status.
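
The arm, then cytoband, then gene fallback described above can be sketched as follows. This is a simplified illustration, not the notebook's code: the function name `most_focal_level` is hypothetical, "clearly defined" is assumed to mean the status is exactly `"gain"` or `"loss"`, and the callability check is omitted.

```r
# Hypothetical helper: return the broadest level with a clear gain/loss call.
# Assumes "clearly defined" means the status is "gain" or "loss"; the
# callability condition from the README is not modeled here.
most_focal_level <- function(arm_status, cytoband_status, gene_status) {
  is_clear <- function(status) {
    !is.na(status) && status %in% c("gain", "loss")
  }
  if (is_clear(arm_status)) {
    return("chromosome_arm")
  }
  if (is_clear(cytoband_status)) {
    return("cytoband")
  }
  if (is_clear(gene_status)) {
    return("gene")
  }
  NA_character_
}

# Example calls with made-up statuses
arm_level <- most_focal_level("gain", "gain", "gain")
band_level <- most_focal_level("unstable", "loss", "loss")
gene_level <- most_focal_level("neutral", "neutral", "gain")
```

Checking the broadest level first is what removes focal changes that sit inside a whole-arm gain or loss: once the arm has a clear call, the finer levels are never consulted.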
@@ -60,7 +61,7 @@ It produces loss/neutral and zero/neutral correlation plots, as well as stacked
_Note: The shell script's default behavior is to produce these plots using the annotated consensus SEG autosome and sex chromosome files found in this module's `results` directory and listed below._


### Output files for downstream consumption
### Output files for downstream consumption

**Note:** The output files from `04-prepare-cn-file.R` have neutral calls filtered out to reduce file size.
