Commit

close course
tobiasrausch committed May 23, 2023
1 parent e536bdf commit 3879a86
Showing 1 changed file with 1 addition and 107 deletions.
108 changes: 1 addition & 107 deletions courses/cg/index.html
@@ -4,112 +4,6 @@
<BASE HREF="https://tobiasrausch.com/courses/cg/">
<title>Analytical Methods in Cancer Genomics</title>
</head>
<body>

<h2>Analytical Methods in Cancer Genomics</h2>

<h3>Course Content</h3>

This course will focus on the analysis of short-read and long-read sequencing data from cancer genomics studies. Bioinformatic concepts, tools and methods required to analyse tumor sequencing data will be introduced. Learning outcomes include an overview of the challenges in the study of cancer genomics, discovery and visualisation of copy-number and structural variants, understanding the principles of tumor purity, heterogeneity and ploidy and an overview of cancer epigenetics. The course covers different sequencing data modalities (short-reads vs. long-reads) and data types (bulk vs. single-cell). Practical data analysis sessions will complement the course.

<h3>Schedule</h3>

<ul>
<li>Thursday 16th March, 2pm-3.30pm: Genome Variation, <a href="https://gear.embl.de/data/.slides/Lecture1_GenomeVariation.pdf">Slides</a></li>
<li>Thursday 23rd March, 2pm-3.30pm: Cancer Genomics, <a href="https://gear.embl.de/data/.slides/Exercise1_MultiOmics.pdf">Exercise</a>, <a href="https://gear.embl.de/data/.slides/Lecture2_CancerGenomics.pdf">Slides</a></li>
<li>Thursday 30th March, 2pm-3.30pm: Mutation Calling, <a href="https://gear.embl.de/data/.slides/Lecture2_CancerGenomics.pdf">Slides</a></li>
<li>Thursday 6th April, 2pm-3.30pm: Structural Variants and copy-number variants, <a href="https://gear.embl.de/data/.slides/Lecture3_StructuralVariants.pdf">Slides</a></li>
<li>Thursday 13th April, 2pm-3.30pm: Long reads in cancer genomics, <a href="https://gear.embl.de/data/.slides/Lecture4_LongReads.pdf">Slides</a></li>
<li>Thursday 20th April, 2pm-3.30pm: Cancer transcriptomics using bulk RNA-Seq, <a href="https://gear.embl.de/data/.slides/Lecture5_RNASeq.pdf">Slides</a></li>
<li>Thursday 27th April, 2pm-3.30pm: Single-cell applications in cancer genomics, <a href="https://gear.embl.de/data/.slides/Lecture6_scRNA.pdf">Slides</a></li>
<li>Thursday 4th May, 2pm-3.30pm: Cancer epigenetics, <a href="https://gear.embl.de/data/.slides/Lecture7_Epigenetics.pdf">Slides</a></li>
<li>Wednesday 10th May, 2pm-4pm: Journal Club, Cancer Genomics Papers (Prague, Vinicna 7, room P311), <a href="https://gear.embl.de/data/.slides/JournalClub.pdf">Papers</a></li>
</ul>

<h3>Exercise 1: SARS-CoV-2 Lineage Determination (due date 30th March 2023)</h3>

Please create a GitHub account, or log in to your existing account, and create a new repository to analyse SARS-CoV-2 amplicon sequencing data. The goal of this exercise is to create a simple variant calling workflow for SARS-CoV-2 and to describe the steps of your workflow using markdown (<a href="https://docs.github.com/en/get-started/writing-on-github">GitHub Markdown</a>). The workflow should contain steps to align the FASTQ files to the SARS-CoV-2 reference genome (<a href="https://github.com/lh3/bwa">bwa</a>), call variants (<a href="https://samtools.github.io/bcftools/howtos/variant-calling.html">bcftools</a>) and annotate variants (<a href="https://covid-19.ensembl.org/Tools/VEP">VEP</a>). Alternatively, you can create a consensus sequence (<a href="https://samtools.github.io/bcftools/howtos/consensus-sequence.html">bcftools</a>) and classify the lineage (<a href="https://clades.nextstrain.org/">nextclade</a>). Determine the most likely SARS-CoV-2 lineage (Alpha, Beta, Gamma, Delta or Omicron) for the two data sets below using the variant calls and/or the consensus sequence, and send me the URL of your GitHub repository via email.

<ul>
<li>SARS-CoV-2 reference, <a href="https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=fasta">Reference</a></li>
<li>Data set 1, <a href="https://gear.embl.de/data/.slides/Plate42B2.R1.fastq.gz">Read1</a></li>
<li>Data set 1, <a href="https://gear.embl.de/data/.slides/Plate42B2.R2.fastq.gz">Read2</a></li>
<li>Data set 2, <a href="https://gear.embl.de/data/.slides/Plate135H10.R1.fastq.gz">Read1</a></li>
<li>Data set 2, <a href="https://gear.embl.de/data/.slides/Plate135H10.R2.fastq.gz">Read2</a></li>
</ul>
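
The alignment and variant calling steps described above can be sketched as a small Python helper that assembles the shell commands. This is only one possible layout, not the expected solution: the function names, the local reference file name <code>NC_045512.2.fasta</code> and the output prefix are assumptions made for illustration, and bwa, samtools and bcftools must be installed for the commands to run.

```python
import shlex
import subprocess


def build_workflow(reference, read1, read2, prefix):
    """Return the shell commands of a minimal alignment and variant calling workflow."""
    ref, r1, r2 = (shlex.quote(x) for x in (reference, read1, read2))
    bam = shlex.quote(prefix + ".bam")
    vcf = shlex.quote(prefix + ".vcf.gz")
    return [
        # Build the BWA index of the reference genome (needed only once).
        f"bwa index {ref}",
        # Align the paired-end reads and write a coordinate-sorted BAM file.
        f"bwa mem {ref} {r1} {r2} | samtools sort -o {bam} -",
        # Index the sorted BAM file.
        f"samtools index {bam}",
        # Pile up the reads and call variants (multiallelic caller, variant sites only).
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


def run_workflow(commands):
    """Run the commands one after another and stop at the first failure.

    With shell=True only the exit status of the last command in a pipe is checked.
    """
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)


# Hypothetical local file names for data set 1; adjust them to your downloads.
commands = build_workflow(
    "NC_045512.2.fasta",
    "Plate42B2.R1.fastq.gz",
    "Plate42B2.R2.fastq.gz",
    "Plate42B2",
)
for cmd in commands:
    print(cmd)
```

Calling <code>run_workflow(commands)</code> would execute the steps. Since SARS-CoV-2 is haploid, you may also want to look at the ploidy option of <code>bcftools call</code>. Variant annotation and lineage classification are left out of this sketch.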

<h3>Exercise 2: Cancer Genomics Data Analysis (due date 20th April 2023)</h3>
In this exercise we want to analyze a cancer genomics data set, namely a matched tumor-normal sample pair.
You can download the data set <a href="https://gear.embl.de/data/.exercise/">here</a>.
The main objective of this exercise is to align the data to the human reference genome (<a href="https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a>), to sort and index the alignments and to generate a read-depth plot, as discussed in the lectures. Please note that I downsampled the data set and kept only the data for chromosome X from 20Mbp to 40Mbp (GRCh37/hg19 coordinates), because otherwise all analyses would take a long time for a human genome. Once you have generated the sorted and indexed alignment in BAM format, you can subset the BAM to the region of interest using `samtools view -b input.bam chrX:20000000-40000000 > output.bam`.
<br>
Please write up your analysis pipeline using <a href="https://guides.github.com/features/mastering-markdown/">GitHub markdown</a> and use your GitHub repository to store your analysis scripts in your favorite language, e.g., bash scripts, <a href="https://snakemake.readthedocs.io/en/stable/">Snakemake</a> or <a href="https://www.nextflow.io/">Nextflow</a> pipelines, <a href="https://www.r-project.org/">R</a> or <a href="https://www.python.org/">Python</a> scripts.
Likewise, feel free to check in a Makefile or a requirements file for <a href="https://conda.io/projects/conda/en/latest/user-guide/getting-started.html">Bioconda</a> if you use these to install tools.
At the very minimum the repository should contain the produced read-depth plot and a README.md file that explains the steps you have executed to generate the read-depth plot.
Once you are done please email me again the repository link, thanks!
<br>
**Optional**: Once you have successfully computed a read-depth plot you may also want to call structural variants and overlay these with the read-depth plot as arcs or points that indicate SV breakpoints.
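
The numbers behind the read-depth plot in this exercise could be computed along the following lines. This is a minimal sketch, not the method from the lectures: it assumes the input is the three-column, tab-separated output of `samtools depth` (chromosome, 1-based position, depth), and the function name and the default window size are chosen only for illustration.

```python
from collections import defaultdict


def window_mean_depth(depth_lines, window=10000):
    """Average per-base depth in fixed-size, non-overlapping windows.

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' lines (1-based positions).
    Returns a sorted list of (chrom, window_start, mean_depth) tuples,
    where window_start is 0-based.
    """
    sums = defaultdict(int)
    for line in depth_lines:
        if not line.strip():
            continue  # skip empty lines
        chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        start = (int(pos) - 1) // window * window
        sums[(chrom, start)] += int(depth)
    # Positions missing from the input are treated as depth 0,
    # so every window is divided by the full window size.
    return [(chrom, start, total / window)
            for (chrom, start), total in sorted(sums.items())]


example = ["chrX\t1\t10", "chrX\t2\t20", "chrX\t5\t30", "chrX\t6\t40"]
print(window_mean_depth(example, window=4))
# [('chrX', 0, 7.5), ('chrX', 4, 17.5)]
```

With real data, the lines would come from a file produced by `samtools depth` on the subset BAM. The windowed values for the tumor and the normal sample can then be plotted against the window start position with the plotting library of your choice.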


<h3>Exercise 3: Working with count matrices (due date 11th May 2023)</h3>
In this exercise we want to run a differential gene expression analysis using an RNA-Seq count matrix (<a href="https://gear.embl.de/data/.slides/sample.counts">sample.counts</a>).
The sample metadata is available here: <a href="https://gear.embl.de/data/.slides/sample.info">sample.info</a>.
Starting from an <a href="https://gear.embl.de/data/.slides/template.R">Rscript template</a>, please run a differential expression analysis, generate PCA, heatmap and MA plots, and export the results to a CSV file.
Once you are done please upload your Rscript to your GitHub repository and email me again the repository link, thanks!
<br>
**Optional**: You may also want to run a gene set enrichment analysis on the differentially expressed genes.
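
As a reminder of what an MA plot shows, the two coordinates can be written down for a pair of count vectors. This is a simplified illustration and not a replacement for the Rscript template: it uses raw counts with a pseudocount, whereas dedicated packages work with normalized counts and a statistical model. The function name and the pseudocount are assumptions made for this sketch.

```python
import math


def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return one (A, M) pair per gene for two conditions.

    A is the mean of the two log2 expression values (x-axis of an MA plot),
    M is the log2 fold change of condition b over condition a (y-axis).
    """
    values = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        values.append(((log_a + log_b) / 2, log_b - log_a))
    return values


# Gene 1: 3 vs. 15 reads, gene 2: no reads in either condition.
print(ma_values([3, 0], [15, 0]))
# [(3.0, 2.0), (0.0, 0.0)]
```

Genes far from M = 0 are candidates for differential expression, but a real analysis needs replicates and a proper statistical test.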

<h3>Exercise 4: Please prepare a short presentation for the journal club on 10th May 2023</h3>
Please summarize the key points of your <a href="https://gear.embl.de/data/.slides/JournalClub.pdf">assigned paper</a> in a journal club format (~15min presentation + ~5min discussion).


<h3>Useful links</h3>

Below are links to bioinformatics tools commonly used in cancer genomics (the list is certainly not comprehensive).
<br>
Next-generation sequencing analysis tutorials
<ul>
<li><a href="https://tobiasrausch.com/courses/vc/">NGS tutorial</a></li>
<li><a href="https://github.com/ekg/alignment-and-variant-calling-tutorial">Alignment and variant calling</a></li>
</ul>
Commonly used alignment tools
<ul>
<li><a href="https://github.com/lh3/bwa">BWA</a></li>
<li><a href="http://bowtie-bio.sourceforge.net/bowtie2/index.shtml">Bowtie2</a></li>
</ul>
Tools for working with alignment files (BAM files)
<ul>
<li><a href="https://github.com/samtools/htslib">HTSlib</a></li>
<li><a href="https://github.com/samtools/samtools">SAMtools</a></li>
<li><a href="https://github.com/arq5x/bedtools2">bedtools</a></li>
</ul>
Tools to compute read counts in windows
<ul>
<li><a href="https://github.com/brentp/mosdepth">mosdepth</a></li>
<li><a href="https://github.com/tobiasrausch/alfred">alfred</a></li>
<li><a href="https://github.com/samtools/samtools">SAMtools</a></li>
</ul>
Tools for short variant calling, i.e., point mutations (SNVs) and short insertions and deletions (InDels)
<ul>
<li><a href="https://github.com/Illumina/strelka">Strelka</a></li>
<li><a href="https://github.com/freebayes/freebayes">FreeBayes</a></li>
</ul>
Tools for structural variant (SV) calling
<ul>
<li><a href="https://github.com/dellytools/delly">delly</a></li>
<li><a href="https://github.com/arq5x/lumpy-sv">lumpy</a></li>
</ul>
Tools for working with variant call files (VCF/BCF)
<ul>
<li><a href="https://github.com/samtools/htslib">HTSlib</a></li>
<li><a href="https://github.com/samtools/bcftools">BCFtools</a></li>
</ul>
Working with count matrices
<ul>
<li><a href="http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html">DESeq2 Tutorial</a></li>
<li><a href="https://www.gsea-msigdb.org/gsea/">Gene set enrichment analysis (GSEA)</a></li>
<li><a href="https://maayanlab.cloud/Enrichr/">Enrichr</a></li>
</ul>
</body>
</html>
