Treehouse Comparative Analysis of RNA Expression (CARE)


UCSC Treehouse Childhood Cancer Initiative

Documentation is in progress; the interface of this program is in active development.

Running CARE on my own focus sample

To compare your own RNA-seq sample to the Treehouse Compendium, you will need an rsem_genes.results file generated by a pipeline equivalent to the one used by the Toil RNAseq Recompute. Treehouse provides such a pipeline in our pipelines repository. CARE may accept files generated by other RSEM pipelines as input, but because they were not created in a comparable manner, any outliers found may not be truly supported by the underlying data.

Run Treehouse Pipelines

Follow the instructions in the Treehouse Pipelines repository with your own FASTQ files to generate an outputs/ directory tree.

Rename output directories

(Replace SAMPLE_ID with your sample's identifier in all that follows)

  • Extract the contents of outputs/expression/SAMPLE_ID.tar.gz and place them directly in the expression parent dir (expression will then contain the dirs RSEM, Kallisto, and QC).
  • Rename parent dirs as follows:
    • outputs -> secondary
    • expression -> ucsc_cgl-rnaseq-cgl-pipeline-0.0.0-0000000
    • fusion -> ucsctreehouse-fusion-0.0.0-0000000
    • qc -> ucsctreehouse-bam-umend-qc-0.0.0-0000000
    • variants -> ucsctreehouse-mini-var-call-0.0.0-0000000
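
The extraction and renaming steps above could be scripted roughly as follows. This is a minimal sketch, not part of CARE: the function name prepare_secondary is hypothetical, and it assumes the tarball may unpack into a top-level SAMPLE_ID folder whose contents need to be moved up into expression.

```python
import shutil
import tarfile
from pathlib import Path

# Mapping copied from the list above: pipeline dir name -> name CARE expects.
RENAMES = {
    "expression": "ucsc_cgl-rnaseq-cgl-pipeline-0.0.0-0000000",
    "fusion": "ucsctreehouse-fusion-0.0.0-0000000",
    "qc": "ucsctreehouse-bam-umend-qc-0.0.0-0000000",
    "variants": "ucsctreehouse-mini-var-call-0.0.0-0000000",
}


def prepare_secondary(outputs_dir, sample_id):
    """Unpack the expression tarball, rename the dirs, return the new 'secondary' path."""
    outputs = Path(outputs_dir)
    expression = outputs / "expression"
    tarball = expression / f"{sample_id}.tar.gz"

    if tarball.is_file():
        with tarfile.open(tarball) as tar:
            tar.extractall(expression)
        # Assumption: the archive may contain a top-level SAMPLE_ID folder.
        # If so, move its children (RSEM, Kallisto, QC) directly under expression.
        nested = expression / sample_id
        if nested.is_dir():
            for child in list(nested.iterdir()):
                shutil.move(str(child), str(expression / child.name))
            nested.rmdir()

    # Rename the per-tool directories; missing ones are skipped silently.
    for old_name, new_name in RENAMES.items():
        source = outputs / old_name
        if source.is_dir():
            source.rename(outputs / new_name)

    # Finally rename outputs -> secondary, next to where outputs was.
    secondary = outputs.with_name("secondary")
    outputs.rename(secondary)
    return secondary
```

Calling prepare_secondary("outputs", "SAMPLE_ID") from the directory that holds outputs/ would leave a secondary/ directory in its place. The original tarball is left where it was; delete it yourself if you do not want it carried along.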

Place in CARE input dir

Create a SAMPLE_ID dir inside the inputs dir of CARE. Then, move your secondary dir containing the input data into that SAMPLE_ID dir.
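
As an illustration, the move into the CARE inputs dir might look like the sketch below. The function name place_in_care_inputs and its arguments are hypothetical; the only layout it relies on is the inputs/SAMPLE_ID/secondary structure described above.

```python
import shutil
from pathlib import Path


def place_in_care_inputs(secondary_dir, care_dir, sample_id):
    """Move a prepared 'secondary' dir to <care_dir>/inputs/<sample_id>/secondary."""
    sample_dir = Path(care_dir) / "inputs" / sample_id
    sample_dir.mkdir(parents=True, exist_ok=True)

    destination = sample_dir / "secondary"
    if destination.exists():
        # Refuse to merge into or overwrite an existing sample's data.
        raise FileExistsError(f"{destination} already exists")

    shutil.move(str(secondary_dir), str(destination))
    return destination
```

The explicit existence check is a design choice for this sketch: shutil.move would otherwise nest the directory inside an existing secondary dir instead of failing.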

Create manifest

Edit the manifest.tsv file in the CARE base directory to replace the test sample with your own sample or samples, one per line. The first column contains your SAMPLE_ID; the second column, separated from the first by a tab character, contains the "harmonized diagnosis" of your focus sample. This diagnosis should match one of those found in the "disease" column of the compendium clinical data. For example, the clinical data for the Tumor Compendium V10 Public PolyA can be found on UCSC Xena. If no diagnosis in the compendium matches that of your sample, you may leave the second column blank, and the "pan-disease" outlier analysis will be skipped.
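
If you prefer to generate the manifest rather than edit it by hand, a sketch along these lines may help. It assumes the file has no header row (check the manifest.tsv shipped with CARE before relying on that); the function name write_manifest and the sample IDs and diagnosis in the usage note are placeholders, not real values.

```python
import csv


def write_manifest(manifest_path, samples):
    """Write (sample_id, diagnosis) pairs as tab-separated lines.

    A diagnosis of None or "" leaves the second column blank.
    Assumption: the manifest has no header row.
    """
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, diagnosis in samples:
            writer.writerow([sample_id, diagnosis or ""])
```

For example, write_manifest("manifest.tsv", [("EXAMPLE_SAMPLE_1", "example diagnosis"), ("EXAMPLE_SAMPLE_2", None)]) writes two lines, the second with an empty diagnosis column.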

Run CARE

make run will now run CARE on your sample. When it completes, results will be available in outputs/SAMPLE_ID.

Interpreting the output

Jupyter Notebooks

Each step of the analysis produces a Jupyter Notebook containing the code that has been executed. You can use the Jupyter software or nbviewer to inspect them.

JSON files

The programmatic output of each Jupyter Notebook is stored in the correspondingly-numbered JSON file.
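
To inspect these files programmatically, something like the following sketch could be used. It makes no assumption about the individual file names or their contents: it loads every *.json file found directly in the given directory, so any other JSON files placed there are picked up as well. The function name load_step_outputs is hypothetical.

```python
import json
from pathlib import Path


def load_step_outputs(sample_output_dir):
    """Return {file name: parsed JSON} for every *.json file in the directory."""
    results = {}
    # sorted() orders names lexicographically, so "10.json" sorts before "2.json".
    for json_path in sorted(Path(sample_output_dir).glob("*.json")):
        with json_path.open(encoding="utf-8") as handle:
            results[json_path.name] = json.load(handle)
    return results
```

For instance, load_step_outputs("outputs/SAMPLE_ID") returns a dictionary keyed by file name, which can then be explored interactively.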

HTML files

The JSON files are summarized into two human-readable documents, Summary.html and Slides.html. These documents are automatically populated with the high-level results of the analysis, but need human intervention to display the clinical data.

Customizing the HTML summary and slides

The output of CARE includes the Summary.html and Slides.html documents. When you first open them, some values will be populated, but most of the content will be placeholder text. The key to populating them is the annotations.json file. When this file is in the same directory as the HTML file, its contents are inserted into the HTML file via JavaScript. Update the values in annotations.json, save it, and refresh the HTML file in your browser to see your changes appear.
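
Editing annotations.json by hand works fine; if you would rather script it, the sketch below shows one way. It assumes the file holds a single JSON object at the top level, and the function name update_annotations is hypothetical. The real key names are whatever already appears in your annotations.json; none are assumed here.

```python
import json
from pathlib import Path


def update_annotations(annotations_path, updates):
    """Merge `updates` into the top-level object of annotations.json and save it."""
    path = Path(annotations_path)
    annotations = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(annotations, dict):
        # Assumption of this sketch: the file is a JSON object, not a list.
        raise TypeError("expected a JSON object at the top level")

    annotations.update(updates)
    path.write_text(json.dumps(annotations, indent=2) + "\n", encoding="utf-8")
    return annotations
```

For example, update_annotations("outputs/SAMPLE_ID/annotations.json", {"some_existing_key": "new value"}) replaces one value (the key shown is a placeholder); refresh the HTML file afterwards to see the change.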

In addition, creating images with certain names will cause them to appear in predetermined locations in the Summary and Slides:

  • pathway.png : pathway diagram
  • tumormap.png : tumormap image