UCSC Treehouse Childhood Cancer Initiative
Documentation is in progress; the interface of this program is in active development.
To compare your own RNA-seq sample to the Treehouse Compendium, you will need an rsem_genes.results file that has been generated by a pipeline equivalent to that used by the Toil RNAseq Recompute. Treehouse provides such a pipeline in our pipelines repository. CARE may accept files generated by other RSEM pipelines as input, but because they were not created in a comparable manner, any outliers found may not be truly supported by the underlying data.
Follow the instructions on the Treehouse Pipelines repository with your own FASTQ files to generate an outputs/ directory tree.
(Replace SAMPLE_ID with your sample's identifier in all that follows.)
- Extract the files under outputs/expression/SAMPLE_ID.tar.gz and place directly in the expression parent dir. (expression will then contain dirs RSEM, Kallisto, QC.)
- Rename parent dirs as follows:
  - outputs -> secondary
  - expression -> ucsc_cgl-rnaseq-cgl-pipeline-0.0.0-0000000
  - fusion -> ucsctreehouse-fusion-0.0.0-0000000
  - qc -> ucsctreehouse-bam-umend-qc-0.0.0-0000000
  - variants -> ucsctreehouse-mini-var-call-0.0.0-0000000
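The renames listed above can also be scripted. The following is a minimal Python sketch, not part of CARE itself: the function name `rename_pipeline_dirs` is hypothetical, and it assumes that expression, fusion, qc, and variants all sit directly under the outputs dir. It does not perform the tarball extraction step.

```python
from pathlib import Path

# Mapping copied from the rename list: original dir name -> name CARE expects.
CHILD_RENAMES = {
    "expression": "ucsc_cgl-rnaseq-cgl-pipeline-0.0.0-0000000",
    "fusion": "ucsctreehouse-fusion-0.0.0-0000000",
    "qc": "ucsctreehouse-bam-umend-qc-0.0.0-0000000",
    "variants": "ucsctreehouse-mini-var-call-0.0.0-0000000",
}


def rename_pipeline_dirs(outputs_dir):
    """Rename the pipeline output dirs in place; return the new top-level path.

    Hypothetical helper: assumes the four child dirs live directly under
    outputs_dir. Missing child dirs are reported and skipped.
    """
    outputs_dir = Path(outputs_dir)
    # Rename the children first, while the parent path is still valid.
    for old_name, new_name in CHILD_RENAMES.items():
        old_path = outputs_dir / old_name
        if old_path.is_dir():
            old_path.rename(outputs_dir / new_name)
        else:
            print(f"skipping missing directory: {old_path}")
    # Finally rename outputs -> secondary.
    secondary_dir = outputs_dir.with_name("secondary")
    outputs_dir.rename(secondary_dir)
    return secondary_dir
```

Calling `rename_pipeline_dirs("outputs")` from the directory that holds the pipeline's outputs tree would leave a secondary dir in its place. Renaming the children before the parent keeps every path in the loop valid.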
Create a SAMPLE_ID dir inside the inputs dir of CARE. Then, move your secondary dir containing the input data into that SAMPLE_ID dir.
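As an illustration of this staging step, here is a small Python sketch. The function name `stage_sample` and its parameters are hypothetical; the only layout it assumes is the one described above, namely inputs/SAMPLE_ID/secondary inside the CARE base directory.

```python
import shutil
from pathlib import Path


def stage_sample(secondary_dir, care_dir, sample_id):
    """Move secondary_dir to care_dir/inputs/sample_id/secondary.

    Hypothetical helper; returns the destination path.
    """
    sample_dir = Path(care_dir) / "inputs" / sample_id
    # Create inputs/SAMPLE_ID (and inputs itself, if it is missing).
    sample_dir.mkdir(parents=True, exist_ok=True)
    destination = sample_dir / "secondary"
    if destination.exists():
        # Refuse to merge into a previously staged sample.
        raise FileExistsError(f"{destination} already exists")
    shutil.move(str(secondary_dir), str(destination))
    return destination
```

The explicit existence check is a deliberate choice: without it, `shutil.move` would nest the new dir inside an existing secondary dir instead of failing.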
Edit the manifest.tsv file in the CARE base directory to replace the test sample with your own samples. The first column contains your SAMPLE_ID; then, separated by a tab character, the second column contains the "harmonized diagnosis" of your focus sample. This diagnosis should match one of those found in the "disease" column of the compendium clinical data. For example, the clinical data for the Tumor Compendium V10 Public PolyA can be found on UCSC Xena.
If no diagnoses match that of your sample, you may leave it blank and the "pan-disease" outlier analysis will be skipped.
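The manifest can be edited by hand, but a short Python sketch of writing it may help show the expected shape. The function name `write_manifest` is hypothetical, and the sketch assumes the file has no header row; check the test manifest shipped with CARE before relying on that. The sample identifiers and the diagnosis in the usage line are placeholders.

```python
import csv


def write_manifest(manifest_path, samples):
    """Write (sample_id, diagnosis) pairs as tab-separated rows.

    Hypothetical helper. Assumes no header row. A diagnosis of None or ""
    leaves the second column blank.
    """
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, diagnosis in samples:
            writer.writerow([sample_id, diagnosis or ""])
```

For example, `write_manifest("manifest.tsv", [("SAMPLE_1", "glioma"), ("SAMPLE_2", None)])` writes one row with a diagnosis and one with the second column left blank.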
make run will now run your desired sample. When complete, results will be available in outputs/SAMPLE_ID.
Each step of the analysis produces a Jupyter Notebook containing the code that has been executed. You can use the Jupyter software or nbviewer to inspect them.
The programmatic output of each Jupyter Notebook is stored in the correspondingly-numbered JSON file.
The JSON files are summarized into two human-readable documents, Summary.html and Slides.html. These documents are automatically populated with the high-level results of the analysis, but need human intervention to display the clinical data.
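To inspect the per-step JSON files without opening the notebooks, something like the following Python sketch could be used. The function name `load_step_outputs` is hypothetical, and since the exact file names and JSON structure are not documented here, it simply loads every .json file it finds in the results directory (which may include annotations.json).

```python
import json
from pathlib import Path


def load_step_outputs(results_dir):
    """Return {file name: parsed JSON} for every *.json file in results_dir.

    Hypothetical helper; makes no assumption about how the files are numbered.
    """
    outputs = {}
    for json_path in sorted(Path(results_dir).glob("*.json")):
        with open(json_path) as handle:
            outputs[json_path.name] = json.load(handle)
    return outputs
```

Iterating over `load_step_outputs("outputs/SAMPLE_ID").items()` and printing each file name is a quick way to see which steps produced output.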
Output for this code includes Summary.html and Slides.html documents. When you first open them, they will have some values populated but in general they will be full of placeholder text. The key to populating them is the annotations.json file. When this file is in the same directory as the HTML file, its contents will be inserted into the HTML file via JavaScript. Update the values in annotations.json, save it, and refresh the HTML file to see your changes appear.
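Editing annotations.json in a text editor is enough, but the update can also be scripted. The Python sketch below is illustrative only: the function name `update_annotations` is hypothetical, and it assumes the file holds a single top-level JSON object. The key used in the usage line is a placeholder, not a real annotation field; consult the generated annotations.json for the actual keys.

```python
import json
from pathlib import Path


def update_annotations(annotations_path, updates):
    """Merge updates into the top-level object of annotations.json and save it.

    Hypothetical helper; assumes the file contains one JSON object.
    Nested values are replaced, not merged.
    """
    path = Path(annotations_path)
    annotations = json.loads(path.read_text())
    annotations.update(updates)
    path.write_text(json.dumps(annotations, indent=2) + "\n")
    return annotations
```

A call such as `update_annotations("outputs/SAMPLE_ID/annotations.json", {"placeholder_key": "new value"})` rewrites the file; refreshing the HTML file afterwards should show the change.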
In addition, creating images with certain names will cause them to appear in predetermined locations in the Summary and Slides:
- pathway.png : pathway diagram
- tumormap.png : tumormap image