Commit

improve cross-referencing in case studies
tavareshugo committed Jan 29, 2024
1 parent 536e75d commit ae8761b
Showing 7 changed files with 49 additions and 11 deletions.
2 changes: 1 addition & 1 deletion materials/02-isolates/01-consensus.md
@@ -20,7 +20,7 @@ This section has an accompanying <a href="https://docs.google.com/presentation/d
::: -->


## SARS-CoV-2 Consensus Assembly
## SARS-CoV-2 Consensus Assembly {#sec-consensus}

As we discussed [earlier in the course](../01-intro/01-surveillance.md), the starting material for sequencing SARS-CoV-2 samples from infected patients is PCR-amplified DNA generated with a panel of primers that covers the whole SARS-CoV-2 genome (for example the primers developed by the ARTIC network).
This material can then be sequenced using either _Illumina_ or _Nanopore_ platforms.
2 changes: 1 addition & 1 deletion materials/02-isolates/02-qc.md
@@ -38,7 +38,7 @@ We highlight some of the main files of interest below.
:::


## Quality Control
## Quality Control {#sec-consensus-qc}

The `viralrecon` pipeline produces many quality control metrics, which are conveniently compiled in an interactive report with _MultiQC_, as mentioned above.
We will not detail here every section of the report (check the [pipeline documentation](https://nf-co.re/viralrecon/2.6.0/docs/output) for a full description), but only highlight some of the sections that can be used for a first assessment of the quality of our samples.
2 changes: 1 addition & 1 deletion materials/02-isolates/03-lineages.md
@@ -20,7 +20,7 @@ This section has an accompanying <a href="https://docs.google.com/presentation/d
::: -->


## SARS-CoV-2 Variants
## SARS-CoV-2 Variants {#sec-lineages}

As viruses (or any other organism) evolve, random DNA changes occur in the population, for example due to replication errors.
Many of these changes are likely to be _neutral_, meaning that they do not change the characteristics of the virus in any significant way.
2 changes: 1 addition & 1 deletion materials/02-isolates/04-phylogeny.md
@@ -88,7 +88,7 @@ This tool was developed to be very efficient at working with millions of samples
-
-->

## SARS-CoV-2 Phylogeny
## SARS-CoV-2 Phylogeny {#sec-phylogeny}

<img src="https://raw.githubusercontent.com/roblanf/sarscov2phylo/master/tree_image.jpg" alt="Global Phylogeny" style="float:right;width:20%">

19 changes: 18 additions & 1 deletion materials/03-case_studies/01-switzerland.md
@@ -121,11 +121,16 @@ Here is some of the information we have available for these samples:

## Consensus Assembly

:::{.callout-note}
See @sec-consensus, if you need to revise how the `nf-core/viralrecon` pipeline works.
:::

The first step in the bioinformatic analysis is to run the `nf-core/viralrecon` pipeline.
But first we need to prepare our input files.


### Samplesheet

But first we need to prepare our input files.
For _Nanopore_ data, we need a **samplesheet CSV file** with two columns, indicating sample name (first column) and the respective barcode number (second column).

We produced this table in _Excel_ and saved it as a CSV file.
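As a rough illustration of the two-column layout described above, the same table could also be generated with a short script rather than in _Excel_. This is only a sketch: the sample names (`CH01`, `CH02`, `CH03`), the barcode numbers and the `build_samplesheet` helper are hypothetical, and the `sample,barcode` header is an assumption that should be checked against the `nf-core/viralrecon` documentation.

```python
import csv
import io


def build_samplesheet(samples):
    """Return CSV text with one row per (sample name, barcode number) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: column names are an assumption, check the pipeline docs.
    writer.writerow(["sample", "barcode"])
    for name, barcode in samples:
        writer.writerow([name, barcode])
    return buffer.getvalue()


# Hypothetical sample names and barcode numbers, for illustration only.
example_samples = [("CH01", 1), ("CH02", 2), ("CH03", 3)]
samplesheet_text = build_samplesheet(example_samples)
print(samplesheet_text)
```

The resulting text can be written to a file (for example `samplesheet.csv`, a hypothetical name) and passed to the pipeline as its input.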
@@ -206,6 +211,10 @@ This is very useful when commands are very long, because it makes the code more

## Consensus Quality

:::{.callout-note}
See @sec-consensus-qc, if you need to revise how to assess the quality of consensus sequences.
:::

### General Metrics

We used the _MultiQC_ report to assess the initial quality of our samples.
@@ -315,6 +324,10 @@ Based on the clean consensus sequences, we then perform several downstream analy

### Lineage Assignment

:::{.callout-note}
See @sec-lineages, if you need to revise how lineage assignment works.
:::

Although the _Viralrecon_ pipeline runs _Pangolin_ and _Nextclade_, it does not use the latest version of these programs (because lineages evolve so fast, the nomenclature constantly changes).
An up-to-date run of both of these tools can be done using each of their web applications:

@@ -379,6 +392,10 @@ Like before, we will do further analysis (and visualisation) of these data using

### Phylogeny

:::{.callout-note}
See @sec-phylogeny, if you need to revise how to build phylogenetic trees.
:::

Although a tool such as _Nextclade_ can place our samples in a global phylogeny context, sometimes it may be convenient to build our own phylogenies.
This requires three steps:

16 changes: 16 additions & 0 deletions materials/03-case_studies/02-southafrica.md
@@ -122,6 +122,10 @@ Here is some of the information we have available for these samples:

## Consensus Assembly

:::{.callout-note}
See @sec-consensus, if you need to revise how the `nf-core/viralrecon` pipeline works.
:::

The first step in the bioinformatic analysis is to run the `nf-core/viralrecon` pipeline.
But first we need to prepare our input files.

@@ -263,6 +267,10 @@ We will investigate this further in the next section.

## Consensus Quality

:::{.callout-note}
See @sec-consensus-qc, if you need to revise how to assess the quality of consensus sequences.
:::

### General Metrics

We used the _MultiQC_ report to assess the initial quality of our samples.
@@ -403,6 +411,10 @@ Based on the clean consensus sequences, we then perform several downstream analy

### Lineage Assignment

:::{.callout-note}
See @sec-lineages, if you need to revise how lineage assignment works.
:::

Although the _Viralrecon_ pipeline runs _Pangolin_ and _Nextclade_, it does not use the latest version of these programs (because lineages evolve so fast, the nomenclature constantly changes).
An up-to-date run of both of these tools can be done using each of their web applications:

@@ -471,6 +483,10 @@ Like before, we will do further analysis (and visualisation) of these data using

### Phylogeny

:::{.callout-note}
See @sec-phylogeny, if you need to revise how to build phylogenetic trees.
:::

Although a tool such as _Nextclade_ can place our samples in a global phylogeny context, sometimes it may be convenient to build our own phylogenies.
This requires three steps:

17 changes: 11 additions & 6 deletions materials/03-case_studies/03-eqa.md
@@ -212,7 +212,7 @@ At this point we are ready to start our analysis with the first step: generating
We will use a standardised pipeline called _viralrecon_, which automates most of this process for us, helping us be more efficient and reproducible in our analysis.

:::{.callout-note}
If you need to revise how the `nf-core/viralrecon` pipeline works, please consult the [Consensus Assembly](../02-isolates/01-consensus.md) section of the materials.
See @sec-consensus, if you need to revise how the `nf-core/viralrecon` pipeline works.
:::

### Samplesheet
@@ -348,7 +348,7 @@ However, make sure to set these options to the maximum resources available on th
Once your workflow is complete, it's time to assess the quality of the assembly.

:::{.callout-note}
If you need to revise how to interpret the quality control report, please consult the [Quality Control](../02-isolates/02-qc.md) section of the materials.
See @sec-consensus-qc, if you need to revise how to assess the quality of consensus sequences.
:::

### Coverage
@@ -502,12 +502,13 @@ We will focus on these:
- **Clustering:** assess how many clusters of sequences we have, based on a phylogenetic analysis.
- **Integration & Visualisation:** cross-reference different results tables and produce visualisations of how variants changed over time.

:::{.callout-note}
If you need to revise these topics, please consult the [Lineage Assignment](../02-isolates/03-lineages.md) and [Phylogenetics](../02-isolates/04-phylogeny.md) sections of the materials.
:::

### Lineage Assignment

:::{.callout-note}
See @sec-lineages, if you need to revise how lineage assignment works.
:::

Although the _Viralrecon_ pipeline can run _Pangolin_ and _Nextclade_, it does not use the latest version of these programs (because lineages evolve so fast, the nomenclature constantly changes).
Although it is possible to [configure _viralrecon_](https://nf-co.re/viralrecon/2.6.0/docs/usage#updating-containers-advanced-users) to use more recent versions of these tools, it requires more advanced use of configuration files with the pipeline.

@@ -614,6 +615,10 @@ Then:

### Phylogeny

:::{.callout-note}
See @sec-phylogeny, if you need to revise how to build phylogenetic trees.
:::

Although tools such as _Nextclade_ and _civet_ can place our samples in a phylogeny, sometimes it may be convenient to build our own phylogenies.
This requires three steps:

@@ -777,7 +782,7 @@ Conversely, if you have a low-coverage genome (say <50%) but very high-quality s
![](images/precision_sensitivity.svg)

If you are submitting your samples to _GenQA_'s platform, they will also provide you with a general accuracy score called the _F-score_.
This is calculated as the harmonic mean of precision and sensitivity:
This is calculated as the [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean#Two_numbers) of precision and sensitivity:
$F_{score} = \frac{2 \times Precision \times Sensitivity}{Precision + Sensitivity}$
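
To make the formula concrete, here is a minimal sketch of the calculation in Python. The function name `f_score` and the example values (precision 0.9, sensitivity 0.6) are made up for illustration and are not taken from any real report; returning 0 when both inputs are 0 is also an assumption, made here only to avoid a division by zero.

```python
def f_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (both between 0 and 1)."""
    if precision + sensitivity == 0:
        # Assumed convention: avoid dividing by zero when both are 0.
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical values, for illustration only.
example = f_score(0.9, 0.6)
print(round(example, 2))
```

With these values the F-score is about 0.72, which is lower than the arithmetic mean of 0.75: the harmonic mean is pulled towards the smaller of the two numbers, so a sample cannot score well on precision alone.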