Table of Contents
- Scripts required to prepare for a data release
- Generating analysis files for the manuscript
- Other scripts
The overall steps for preparing a data release are as follows:
- Start a release (termed release-vX-YYYYMMDD below) that includes all of the PBTA data files (i.e., upstream files).
- Run scripts/generate-analysis-files-for-release.sh using the PBTA data files in release-vX-YYYYMMDD and commit any changes to files tracked in the repository.
- Add the analysis files in scratch/analysis_files_for_release to release-vX-YYYYMMDD.
- Run scripts/run-for-subtyping.sh using the PBTA data files and analysis files in release-vX-YYYYMMDD and commit any changes to files tracked in the repository.
- Add pbta-histologies.tsv to release-vX-YYYYMMDD.
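The "add the analysis files" step can be sketched as a small helper. This is only an illustration: it assumes the release lives in a local directory, and `stage_analysis_files` is a hypothetical name, not something defined in this repository. How files actually reach the release location may differ.

```shell
# Hypothetical helper: copy every file from an analysis output directory
# into a local release directory (created if it does not exist).
stage_analysis_files() {
  local src_dir="$1"      # e.g., scratch/analysis_files_for_release
  local release_dir="$2"  # local path standing in for release-vX-YYYYMMDD
  mkdir -p "$release_dir"
  cp "$src_dir"/* "$release_dir"/
}

# Possible ordering from the root of the repository (shown as comments, not run here):
#   bash scripts/generate-analysis-files-for-release.sh
#   stage_analysis_files scratch/analysis_files_for_release <path to release-vX-YYYYMMDD>
#   bash scripts/run-for-subtyping.sh
```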
For definitions of the kinds of files in data releases, please see this documentation.
Running the following from this directory will generate all analysis files that are included in data releases and compile them in scratch/analysis_files_for_release for convenience:
bash generate-analysis-files-for-release.sh
This script also generates a file that contains the MD5 checksums for the analysis files (scratch/analysis_files_for_release/analysis_files_md5sum.txt).
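If the checksum file follows the standard `md5sum` output format with paths relative to the directory it sits in (an assumption; check the file before relying on this), the analysis files could be verified with a sketch like the one below. `verify_release_checksums` is a hypothetical helper name, and `md5sum` here is the GNU coreutils tool, which may not be available by default on macOS.

```shell
# Hypothetical helper: verify files in a directory against an md5sum-style manifest.
# Assumes manifest paths are relative to that directory.
verify_release_checksums() {
  local dir="$1"
  local manifest="${2:-analysis_files_md5sum.txt}"
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$dir" && md5sum -c "$manifest" )
}

# Example (from this directory, after the script has finished):
#   verify_release_checksums ../scratch/analysis_files_for_release
```

The function returns a non-zero status if any listed file is missing or does not match its checksum.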
Notes
- Modules run via this script must have options to use the base (pre-subtyping) histologies file pbta-histologies-base.tsv; these options are used in generate-analysis-files-for-release.sh.
- ⚠️ This requires 100 GB of disk space to run, and it may require more than 32 GB of RAM. To test locally, you can use the following:
RUN_LOCAL=1 bash generate-analysis-files-for-release.sh
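Before starting a full run, it may help to check free disk space against the 100 GB figure above. The following is a minimal sketch; `has_free_disk_gb` is a hypothetical helper, not part of the repository, and it treats 1 GB as 1024 × 1024 one-kilobyte blocks as reported by `df -Pk`.

```shell
# Hypothetical helper: succeed if the filesystem holding $1 has at least $2 GB free.
has_free_disk_gb() {
  local path="$1" required_gb="$2"
  local available_kb
  # With -P (POSIX format), the 4th column of the 2nd line is available 1K blocks.
  available_kb="$(df -Pk "$path" | awk 'NR==2 {print $4}')"
  (( available_kb >= required_gb * 1024 * 1024 ))
}

if has_free_disk_gb . 100; then
  echo "At least 100 GB free on this filesystem"
else
  echo "Less than 100 GB free on this filesystem"
fi
```

This only checks disk space; memory availability would need a separate, platform-specific check.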
Molecular subtyping as part of data release can be run with the following from this directory:
bash run-for-subtyping.sh
This will re-run subtyping for the following broad histologies:
molecular-subtyping-EWS
molecular-subtyping-HGG
molecular-subtyping-LGAT
molecular-subtyping-embryonal
molecular-subtyping-CRANIO
molecular-subtyping-EPN
molecular-subtyping-MB
molecular-subtyping-neurocytoma
It will also run any analysis steps used for subtyping that do not generate files included in a release, as well as the molecular-subtyping-pathology and molecular-subtyping-integrate modules, to generate the compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv file containing the molecular_subtype column.
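As a quick sanity check after subtyping, one could confirm that the compiled file's header really includes the `molecular_subtype` column. The sketch below assumes a tab-separated file with a single header line; `has_tsv_column` is a hypothetical helper, and the file's location is left as a placeholder because it is not stated here.

```shell
# Hypothetical helper: succeed if the first line of a TSV file
# contains a column named exactly $2.
has_tsv_column() {
  local file="$1" column="$2"
  # Split the header on tabs and look for an exact, whole-line match.
  head -n 1 "$file" | tr '\t' '\n' | grep -Fxq -- "$column"
}

# Example (path is a placeholder):
#   has_tsv_column <path to>/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv molecular_subtype
```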
For an analysis to be run for subtyping, it must use pbta-histologies-base.tsv as input and, for molecular-subtyping-* modules, it should not depend on the molecular_subtype or integrated_diagnosis columns.
Please set OPENPBTA_BASE_SUBTYPING=1 as a condition to run code with pbta-histologies-base.tsv.
Here is an example from the TP53 classifier module (assumes root of repo):
OPENPBTA_BASE_SUBTYPING=1 bash analyses/tp53_nf1_score/run_classifier.sh
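Inside a module's run script, the switch could be handled along these lines. This is a sketch of the general pattern rather than the code any particular module uses; `select_histology_file` is a hypothetical function, and the `data/` prefix is an assumption about where the histologies files are found.

```shell
# Hypothetical helper: print the histologies file a module should read.
# Uses the base (pre-subtyping) file only when OPENPBTA_BASE_SUBTYPING is 1.
select_histology_file() {
  local run_for_subtyping="${OPENPBTA_BASE_SUBTYPING:-0}"
  if [[ "$run_for_subtyping" == "1" ]]; then
    echo "data/pbta-histologies-base.tsv"   # assumed location
  else
    echo "data/pbta-histologies.tsv"        # assumed location
  fi
}

HISTOLOGY_FILE="$(select_histology_file)"
echo "Using histologies file: ${HISTOLOGY_FILE}"
```

Defaulting the variable to 0 keeps ordinary runs on the released histologies file unless the caller opts in.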
Once a new data release has been cut, analysis modules should be run with the new data release.
Specifically, non-deprecated analyses that appear in the manuscript should be run, as well as certain analyses that were run in generate-analysis-files-for-release.sh and that either export output files in scratch/ needed for figure generation or require disease label information in the released histologies file.
Note that subtyping modules do not need to be re-run, since subtyping was performed to create the data release itself.
The script run-manuscript-analyses.sh can be used for this purpose as:
bash run-manuscript-analyses.sh
By default, this script will run all relevant analyses as described.
However, some of those analyses have significant memory requirements which are generally not available on local machines.
Therefore, to run only analyses that can be run locally, set RUN_LOCAL=1:
RUN_LOCAL=1 bash run-manuscript-analyses.sh
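One way a script can gate memory-intensive steps on `RUN_LOCAL` is sketched below. This illustrates the pattern only and is not necessarily how run-manuscript-analyses.sh is written; `run_unless_local` is a hypothetical wrapper, and the module path in the usage comment is a placeholder.

```shell
# Hypothetical wrapper: run the given command unless RUN_LOCAL is set to 1,
# in which case report that the step was skipped.
run_unless_local() {
  if [[ "${RUN_LOCAL:-0}" == "1" ]]; then
    echo "Skipping (RUN_LOCAL=1): $*"
  else
    "$@"
  fi
}

# Example (placeholder module path):
#   run_unless_local bash analyses/<memory-intensive-module>/run.sh
```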
- download-ci-files.sh allows you to download the CI files locally, e.g., for debugging. See these docs.
- install_bioc.R is used to install R packages on the project Docker image. See these docs.
- check-python.sh is used in CI to ensure all Python packages on the project Docker image match what is in the requirements.txt file in the root of the repository. See these docs.