minor fixes; fix #37; fix #31
tavareshugo committed Jan 30, 2024
1 parent ae8761b commit 4989622
Showing 7 changed files with 58 additions and 27 deletions.
7 changes: 6 additions & 1 deletion materials/02-isolates/04-phylogeny.md
@@ -504,9 +504,11 @@ To highlight these:
- Select the branch corresponding to the base of the group of samples classified as _Alpha_. This should highlight all those branches.
- Click the "Highlight" button at the top and choose a colour.

<!--
The final result should look similar to what is shown here.
![TODO]()
![TODO]()
-->

**Question 4**

@@ -518,9 +520,12 @@ treetime --tree results/iqtree/india.treefile --dates sample_annotation.tsv --al

Once complete, we can open the `india.nexus` tree with _FigTree_.
We can annotate the internal nodes of the tree with the dates inferred by `treetime` by clicking on the <kbd>Node Labels</kbd> menu on the left and selecting "Display" to be "date".

<!--
This should result in a tree similar to the one shown here.
![TODO]()
-->

**Question 5**

13 changes: 11 additions & 2 deletions materials/03-case_studies/01-switzerland.md
@@ -304,10 +304,19 @@ seqkit locate -i -P -G -M -r -p "N+" report/consensus.fa > results/missing_inter
```

This software outputs a tab-delimited table, which we saved as `results/missing_intervals.tsv`.
The table looks like this:
The table looks like this (only the top few rows are shown):

```
TODO
seqID patternName pattern strand start end
CH01 N+ N+ + 1 54
CH01 N+ N+ + 1193 1264
CH01 N+ N+ + 4143 4322
CH01 N+ N+ + 6248 6294
CH01 N+ N+ + 7561 7561
CH01 N+ N+ + 9243 9311
CH01 N+ N+ + 10367 10367
CH01 N+ N+ + 11361 11370
CH01 N+ N+ + 13599 13613
```

We opened this file `missing_intervals.tsv` in _Excel_ and quickly calculated the length of each interval.
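
As an alternative to a spreadsheet, the interval lengths can also be computed on the command line. The following is a minimal sketch, assuming the table is tab-delimited with a header row containing `start` and `end` columns, as in the output shown above. The function name `interval_lengths` is a hypothetical helper introduced here for illustration, not part of `seqkit`.

```bash
# Hypothetical helper (not part of seqkit): append a "length" column,
# computed as end - start + 1, to a tab-delimited table with a header row.
interval_lengths() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 {
         # find the "start" and "end" columns by name rather than by position
         for (i = 1; i <= NF; i++) {
           if ($i == "start") s = i
           if ($i == "end") e = i
         }
         print $0, "length"
         next
       }
       { print $0, $e - $s + 1 }' "$1"
}

# Example usage, assuming the file produced by the command above exists
if [ -f results/missing_intervals.tsv ]; then
  interval_lengths results/missing_intervals.tsv
fi
```

Looking up the columns by name keeps the sketch working even if the column order differs from the one shown here.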
28 changes: 13 additions & 15 deletions materials/03-case_studies/02-southafrica.md
@@ -382,21 +382,19 @@ seqkit locate -i -P -G -M -r -p "N+" report/consensus.fa > results/missing_inter
```

This software outputs a tab-delimited table, which we saved as `results/missing_intervals.tsv`.
The table looks like this:

```
seqID patternName pattern strand start end
ZA01 N+ N+ + 1 54
ZA01 N+ N+ + 22771 22926
ZA01 N+ N+ + 23603 23835
ZA01 N+ N+ + 26948 26948
ZA01 N+ N+ + 26968 27137
ZA01 N+ N+ + 29801 29867
ZA02 N+ N+ + 1 54
ZA02 N+ N+ + 22771 22921
ZA02 N+ N+ + 23603 23835
... MORE LINES OMITTED ...
The table looks like this (only the top few rows are shown):

```
seqID patternName pattern strand start end
ZA01 N+ N+ + 1 54
ZA01 N+ N+ + 22771 22926
ZA01 N+ N+ + 23603 23835
ZA01 N+ N+ + 26948 26948
ZA01 N+ N+ + 26968 27137
ZA01 N+ N+ + 29801 29867
ZA02 N+ N+ + 1 54
ZA02 N+ N+ + 22771 22921
ZA02 N+ N+ + 23603 23835
```

We opened this file `missing_intervals.tsv` in _Excel_ and quickly calculated the length of each interval.
17 changes: 10 additions & 7 deletions materials/03-case_studies/03-eqa.md
@@ -301,7 +301,7 @@ In our example, if our version of _Guppy_ was 6.1.5 we would use the same model
#### Illumina

```bash
nextflow run nf-core/viralrecon
nextflow run nf-core/viralrecon \
-r 2.6.0 -profile singularity \
--max_memory '15.GB' --max_cpus 4 \
--platform illumina \
@@ -483,7 +483,7 @@ The meaning of the options is detailed in [`seqkit`'s documentation](https://bio

:::{.callout-exercise}

Open the file you created in the previous step (`results/consensus_miss_intervals.tsv`) in a spreadsheet program.
Open the file you created in the previous step (`results/missing_intervals.tsv`) in a spreadsheet program.
Create a new column with the length of each interval (`end - start + 1`).

Note if any missing intervals are larger than 1Kb, and whether they overlap with the _Spike_ gene.
@@ -558,7 +558,7 @@ For the **Pangolin** analysis:
<DATA_UPDATE_COMMAND>
# run pangolin
pangolin --outdir results/pangolin/ --outfile report.csv <INPUT>
pangolin --outdir results/pangolin/ --outfile pango_report.csv <INPUT>
```

- Fix the code:
@@ -569,7 +569,7 @@ For the **Pangolin** analysis:
- Save the file.
- Activate the software environment: `mamba activate pangolin`.
- Run your script using `bash`.
- Once the analysis completes, open the file `results/pangolin/report.csv` in _Excel_ and see if there were any samples for which the analysis failed.
- Once the analysis completes, open the file `results/pangolin/pango_report.csv` in _Excel_ and see if there were any samples for which the analysis failed.
If there were any failed samples, check if they match the report from _Nextclade_.

:::
@@ -722,7 +722,7 @@ At this point in our analysis, we have several tables with different pieces of i
- `sample_info.csv` → the original table with metadata for our samples.
- `results/viralrecon/multiqc/medaka/summary_variants_metrics_mqc.csv` → quality metrics from the _MultiQC_ report generated by the _viralrecon_ pipeline.
- `results/nextclade/nextclade.tsv` → the results from _Nextclade_.
- `results/pangolin/report.csv` → the results from _Pangolin_.
- `results/pangolin/pango_report.csv` → the results from _Pangolin_.
- (optional) `results/civet/master_metadata.csv` → the results from the _civet_ analysis, namely the catchment (or cluster) that each of our samples was grouped into.

Each of these tables stores different pieces of information, and it would be great if we could _integrate_ them together, to facilitate their interpretation and generate some visualisations.
@@ -756,10 +756,13 @@ You can export these plots from within RStudio using the "Export" button on the
:::{.callout-exercise}
**Annotating Phylogenetic Tree**
Use the file `report/consensus_metrics.tsv` (created in the Data Integration exercise) to annotate your phylogenetic tree in FigTree and display the lineages assigned to each sample as the tip labels.
Using FigTree, import two annotation files (**File** > **Import Annotations...**):
If you need a reminder of how to load annotations in FigTree, check the "[Building phylogenetic trees](../02-isolates/04-phylogeny.md#visualising-trees)" section of the materials.
- `report/consensus_metrics.tsv`, which was created in the Data Integration exercise.
- `resources/eqa_collaborators/metadata.tsv`, which has lineage assignment for EQA samples sequenced by other labs.
After importing both files, annotate your phylogenetic tree to display the lineages assigned to each sample as the tip labels.
See @sec-figtree if you need a reminder of how to annotate trees using FigTree.
:::
16 changes: 16 additions & 0 deletions materials/05-software/03-software_setup.md
@@ -79,6 +79,22 @@ Another way to run Linux within Windows (or macOS) is to install a Virtual Machi
However, this is mostly suitable for practicing and **not suitable for real data analysis**.

Details for installing Ubuntu on VirtualBox are given on [this page](https://ubuntu.com/tutorials/how-to-run-ubuntu-desktop-on-a-virtual-machine-using-virtualbox#1-overview).
Make sure to do the following while you are setting it up:

- In Step 2 "Create a user profile": make sure to tick the Guest Additions option.
- In Step 2 "Define the Virtual Machine’s resources":
  - Assign at least 4 CPUs and 16000MB of RAM. At the very minimum you need 2 CPUs to run an Ubuntu VM.
  - Set the disk size to at least 100GB, or more if you have it available. This will not take up 100GB of space on your computer straight away; it is the maximum the virtual disk is allowed to use, which is useful as we are working with sequencing data.

Once the installation completes, log in to the Ubuntu virtual machine, open a terminal and run the following commands:

```bash
# switch to the root account (the new user cannot use sudo yet)
su -
usermod -a -G sudo YOUR-USERNAME-HERE
```

These commands add your newly created user to the "sudo" (admin) group.
Then close the terminal and restart the virtual machine, so that the change takes effect.
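
After restarting, it may be worth confirming that the group change took effect. This is a minimal sketch, assuming a standard Ubuntu shell with the `id` and `grep` utilities available. The function name `in_group` is a hypothetical helper introduced here for illustration.

```bash
# Hypothetical helper: succeeds if a user belongs to a group.
# Usage: in_group GROUP [USER]   (USER defaults to the current user)
in_group() {
  group_name="$1"
  user_name="${2:-$(id -un)}"
  # id -nG lists the user's group names separated by spaces
  id -nG "$user_name" | tr ' ' '\n' | grep -qx "$group_name"
}

if in_group sudo; then
  echo "Current user is in the sudo group"
else
  echo "Current user is NOT in the sudo group"
fi
```

If the check reports that the user is not in the group, repeating the two commands above and restarting the virtual machine again is a reasonable first step.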
:::


2 changes: 1 addition & 1 deletion utils/eqa/scripts_illumina/03-missing_intervals.sh
@@ -1,3 +1,3 @@
#!/bin/bash

seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > report/missing_intervals.tsv
seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > results/missing_intervals.tsv
2 changes: 1 addition & 1 deletion utils/eqa/scripts_nanopore/03-missing_intervals.sh
@@ -1,3 +1,3 @@
#!/bin/bash

seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > report/missing_intervals.tsv
seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > results/missing_intervals.tsv
