cg 2024
tobiasrausch committed Apr 10, 2024
1 parent a5db92f commit b2e7bbb
Showing 1 changed file with 105 additions and 1 deletion.
106 changes: 105 additions & 1 deletion courses/cg/index.html
@@ -4,6 +4,110 @@
<BASE HREF="https://tobiasrausch.com/courses/cg/">
<title>Analytical Methods in Cancer Genomics</title>
</head>
<body>
<body>

<h2>Analytical Methods in Cancer Genomics</h2>

<h3>Course Content</h3>

This course will focus on the analysis of short-read and long-read sequencing data from cancer genomics studies. Bioinformatic concepts, tools and methods required to analyse tumor sequencing data will be introduced. Learning outcomes include an overview of the challenges in the study of cancer genomics, discovery and visualisation of copy-number and structural variants, understanding the principles of tumor purity, heterogeneity and ploidy and an overview of cancer epigenetics. The course covers different sequencing data modalities (short-reads vs. long-reads) and data types (bulk vs. single-cell). Practical data analysis sessions will complement the course.

<h3>Schedule</h3>

<ul>
<li>Thursday 11th April, 12pm-2pm: Course Overview (Zoom), <a href="https://gear.embl.de/data/.slides/CourseOverview.pdf">Slides</a></li>
<li>Thursday 11th April - Wednesday 24th April: Watch pre-recorded lectures. See email for videos; slides are below.</li>
<li><a href="https://gear.embl.de/data/.slides/Lecture1_CancerGenomics.pdf">Lecture1 - Introduction to Cancer Genomics</a></li>
<li><a href="https://gear.embl.de/data/.slides/Lecture2_GenomeVariation.pdf">Lecture2 - Genome Variation</a></li>
<li><a href="https://gear.embl.de/data/.slides/Lecture3_StructuralVariants.pdf">Lecture3 - Structural Variants</a></li>
<li><a href="https://gear.embl.de/data/.slides/Lecture4_Epigenetics.pdf">Lecture4 - Cancer Epigenetics</a></li>
<li>Wednesday 24th April, 9am-4pm: Biocev day (Lectures and Practicals)</li>
<li>Thursday 2nd May, 12pm-2pm: Single-cell lecture (Zoom)</li>
<li>Thursday 9th May: Exercises and Questionnaires are due</li>
</ul>

<h3>Exercise 1: Variant Calling (due date 24th April 2024)</h3>

Please create a GitHub account or log in to your existing account and create a new repository to analyse sequencing data. The goal of this exercise is to create a simple variant calling workflow for human sequencing data. Please describe the steps of your workflow using markdown (<a href="https://docs.github.com/en/get-started/writing-on-github">GitHub Markdown</a>). The workflow should contain steps to align the FASTQ files to the human reference genome (<a href="https://github.com/lh3/bwa">bwa</a>), call variants (<a href="https://samtools.github.io/bcftools/howtos/variant-calling.html">bcftools</a>) and annotate variants (<a href="https://www.ensembl.org/Tools/VEP">VEP</a>). Once you have finished the exercise, send me the URL of your GitHub repository and the likely causative variant via email.

<ul>
<li>Chromosome 7 human reference, <a href="https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr7.fa.gz">chr7</a></li>
<li>FASTQ of read1, <a href="https://gear.embl.de/data/.slides/R1.fastq.gz">Read1</a></li>
<li>FASTQ of read2, <a href="https://gear.embl.de/data/.slides/R2.fastq.gz">Read2</a></li>
</ul>
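As a starting point, the alignment and variant calling steps described above can be sketched as a short Python helper that only assembles the command lines; it does not run them. This is a minimal sketch, not a reference solution: the output names (sample.bam, sample.vcf.gz) are hypothetical, the reference is assumed to have been decompressed to chr7.fa, and the bcftools invocation follows the pattern shown in the bcftools variant calling how-to linked above. Annotation with VEP is left out here and can be done by uploading the resulting VCF to the VEP web interface.

```python
import shlex


def variant_calling_commands(ref="chr7.fa", r1="R1.fastq.gz", r2="R2.fastq.gz",
                             prefix="sample"):
    """Return shell command lines for a minimal align/sort/index/call workflow.

    Assumptions (not from the course page): the reference was decompressed to
    `ref`, and `prefix` is a hypothetical name for the output files.
    """
    q = shlex.quote  # quote file names so paths with spaces stay intact
    bam = f"{prefix}.bam"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Build the BWA index of the reference once.
        f"bwa index {q(ref)}",
        # Align the read pairs and write a coordinate-sorted BAM.
        f"bwa mem {q(ref)} {q(r1)} {q(r2)} | samtools sort -o {q(bam)} -",
        # Index the sorted BAM so that tools can access regions.
        f"samtools index {q(bam)}",
        # Pile up reads against the reference and call variant sites only.
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for command in variant_calling_commands():
        print(command)
```

Printing the commands rather than executing them keeps the sketch easy to inspect; the same lines can be pasted into a bash script or a README.md.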

<h3>Exercise 2: Cancer Genomics Data Analysis (due date 24th April 2024)</h3>
In this exercise we want to analyze a cancer genomics sample, namely a matched tumor-normal sample pair.
You can download the data set <a href="https://gear.embl.de/data/.exercise/">here</a>.
The main objective of this exercise is to align the data to the human reference genome (<a href="https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a>), to sort and index the alignments and to generate a read-depth plot, as discussed in the lectures. Please note that I downsampled the dataset and kept only the data for chromosome X from 20Mbp to 40Mbp (GRCh37/hg19 coordinates) because otherwise every analysis step takes a long time for a human genome. Once you have generated the alignment in BAM format (sorted and indexed) you can subset the BAM to the region of interest using <code>samtools view -b input.bam chrX:20000000-40000000 &gt; output.bam</code>.
<br>
Please write up your analysis pipeline using <a href="https://guides.github.com/features/mastering-markdown/">GitHub markdown</a> and use your GitHub repository to store your analysis scripts in your favorite language, i.e., this could be bash scripts, <a href="https://snakemake.readthedocs.io/en/stable/">Snakemake</a> or <a href="https://www.nextflow.io/">Nextflow</a> pipelines, <a href="https://www.r-project.org/">R</a> or <a href="https://www.python.org/">Python</a> scripts.
Likewise, feel free to check in a Makefile or a requirements file for <a href="https://conda.io/projects/conda/en/latest/user-guide/getting-started.html">Bioconda</a> if you use these to install tools.
At the very minimum the repository should contain the produced read-depth plot and a README.md file that explains the steps you have executed to generate the read-depth plot.
Once you are done please email me the repository link again, thanks!
<br>
<b>Optional</b>: Once you have successfully computed a read-depth plot you may also want to call structural variants and overlay these with the read-depth plot as arcs or points that indicate SV breakpoints.
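<br>
The core of a read-depth plot is a table of read counts in fixed-size windows along the region of interest. The following is a minimal pure-Python sketch of that binning step, assuming you have already extracted the alignment start positions on chrX (for example, the POS column of the SAM records). The function name, the 10 kbp window size and the input list are illustrative assumptions; the tools listed under "Tools to compute read counts in windows" below do this directly on BAM files and are the better choice for real data.

```python
from collections import Counter


def window_counts(positions, start=20_000_000, end=40_000_000, window=10_000):
    """Count alignment start positions per fixed-size window in [start, end).

    `positions` is any iterable of integer coordinates on one chromosome.
    Returns a list of (window_start, read_count) tuples covering the region,
    including windows without any reads.
    """
    counts = Counter()
    for pos in positions:
        if start <= pos < end:  # ignore reads outside the region of interest
            counts[(pos - start) // window] += 1
    n_windows = (end - start + window - 1) // window  # round up
    return [(start + i * window, counts.get(i, 0)) for i in range(n_windows)]


if __name__ == "__main__":
    # Hypothetical toy positions, only to show the shape of the output.
    toy = [20_000_000, 20_000_500, 20_010_000, 39_999_999, 50_000_000]
    for window_start, n_reads in window_counts(toy)[:3]:
        print(window_start, n_reads)
```

Computing this table separately for the tumor and the normal sample and plotting the counts (or their ratio) against the window start gives the read-depth plot.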


<h3>Exercise 3: Working with count matrices (due date 9th May 2024)</h3>
In this exercise we want to run a differential gene expression analysis using an RNA-Seq count matrix (<a href="https://gear.embl.de/data/.slides/sample.counts">sample.counts</a>).
The sample metadata is available here: <a href="https://gear.embl.de/data/.slides/sample.info">sample.info</a>.
Starting from an <a href="https://gear.embl.de/data/.slides/template.R">Rscript template</a> please run a differential expression analysis, generate PCA, heatmap and MA plots and export the results into a CSV file.
Once you are done please upload your Rscript to your GitHub repository and email me the repository link again, thanks!
<br>
<b>Optional</b>: You may also want to run a gene set enrichment analysis on the differentially expressed genes.
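<br>
To make clear what an MA plot shows, here is a small Python sketch that computes the two plotted quantities per gene: M, the log2 fold change between two groups, and A, the average log2 expression. It is a naive illustration only and not what DESeq2 does: there is no library-size normalisation, dispersion estimation or shrinkage, and the input layout (a dictionary of per-group count lists), the group names and the pseudocount of 1 are assumptions made for this example.

```python
import math


def ma_values(counts, pseudocount=1.0):
    """Compute (M, A) per gene from raw counts of two groups.

    `counts` maps a gene name to a pair (control_counts, treated_counts),
    each a non-empty list of counts. This layout is hypothetical.
    M = log2(treated / control), A = mean of the two log2 values.
    """
    result = {}
    for gene, (control, treated) in counts.items():
        # A pseudocount avoids log2(0) for genes without reads.
        c = sum(control) / len(control) + pseudocount
        t = sum(treated) / len(treated) + pseudocount
        m = math.log2(t) - math.log2(c)
        a = 0.5 * (math.log2(t) + math.log2(c))
        result[gene] = (m, a)
    return result


if __name__ == "__main__":
    # Hypothetical toy counts for two genes and two samples per group.
    toy = {"geneA": ([3, 3], [15, 15]), "geneB": ([7, 7], [7, 7])}
    for gene, (m, a) in ma_values(toy).items():
        print(gene, round(m, 2), round(a, 2))
```

Genes far from M = 0 at a given A are the candidates that a proper statistical test, as in the Rscript template, then evaluates.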

<h3>Exercise 4: Fill out the questionnaires (Google Forms, due date 9th May 2024)</h3>

To be sent via email on the 25th or 26th April.

<h3>Useful links</h3>

Below are a couple of links to commonly used Bioinformatics tools in Cancer Genomics (certainly not comprehensive).
<br>
Next-generation sequencing analysis tutorials
<ul>
<li><a href="https://github.com/ekg/alignment-and-variant-calling-tutorial">Alignment and variant calling</a></li>
</ul>
Commonly used alignment tools
<ul>
<li><a href="https://github.com/lh3/bwa">BWA</a></li>
<li><a href="http://bowtie-bio.sourceforge.net/bowtie2/index.shtml">Bowtie2</a></li>
</ul>
Tools for working with alignment files (BAM files)
<ul>
<li><a href="https://github.com/samtools/htslib">HTSlib</a></li>
<li><a href="https://github.com/samtools/samtools">SAMtools</a></li>
<li><a href="https://github.com/arq5x/bedtools2">bedtools</a></li>
</ul>
Tools to compute read counts in windows
<ul>
<li><a href="https://github.com/brentp/mosdepth">mosdepth</a></li>
<li><a href="https://github.com/tobiasrausch/alfred">alfred</a></li>
<li><a href="https://github.com/samtools/samtools">SAMtools</a></li>
<li><a href="https://github.com/dellytools/delly">delly</a></li>
</ul>
Tools for short variant calling, i.e., point mutations (SNVs) and short insertions and deletions (InDels)
<ul>
<li><a href="https://github.com/Illumina/strelka">Strelka</a></li>
<li><a href="https://github.com/freebayes/freebayes">FreeBayes</a></li>
</ul>
Tools for structural variant (SV) calling
<ul>
<li><a href="https://github.com/dellytools/delly">delly</a></li>
<li><a href="https://github.com/arq5x/lumpy-sv">lumpy</a></li>
</ul>
Tools for working with variant call files (VCF/BCF)
<ul>
<li><a href="https://github.com/samtools/htslib">HTSlib</a></li>
<li><a href="https://github.com/samtools/bcftools">BCFtools</a></li>
</ul>
Working with count matrices
<ul>
<li><a href="http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html">DESeq2 Tutorial</a></li>
<li><a href="https://www.gsea-msigdb.org/gsea/">Gene set enrichment analysis (GSEA)</a></li>
<li><a href="https://maayanlab.cloud/Enrichr/">Enrichr</a></li>
</ul>
</body>
</html>
