diff --git a/materials/02-isolates/04-phylogeny.md b/materials/02-isolates/04-phylogeny.md
index 70f7159..d90082c 100644
--- a/materials/02-isolates/04-phylogeny.md
+++ b/materials/02-isolates/04-phylogeny.md
@@ -504,9 +504,11 @@ To highlight these:
- Select the branch corresponding to the base of the group of samples classified as _Alpha_. This should highlight all those branches.
- Click the "Highlight" button at the top and choose a colour.
+
**Question 4**
@@ -518,9 +520,12 @@ treetime --tree results/iqtree/india.treefile --dates sample_annotation.tsv --al
Once complete, we can open the `india.nexus` tree with _FigTree_.
We can annotate the internal nodes of the tree with the dates inferred by `treetime` by clicking on the Node Labels menu on the left and selecting "Display" to be "date".
+
+
**Question 5**
diff --git a/materials/03-case_studies/01-switzerland.md b/materials/03-case_studies/01-switzerland.md
index 9b3f7b8..c62fff3 100644
--- a/materials/03-case_studies/01-switzerland.md
+++ b/materials/03-case_studies/01-switzerland.md
@@ -304,10 +304,19 @@ seqkit locate -i -P -G -M -r -p "N+" report/consensus.fa > results/missing_inter
```
This software outputs a tab-delimited table, which we saved as `results/missing_intervals.tsv`.
-The table looks like this:
+The table looks like this (only the top few rows are shown):
```
-TODO
+seqID patternName pattern strand start end
+CH01 N+ N+ + 1 54
+CH01 N+ N+ + 1193 1264
+CH01 N+ N+ + 4143 4322
+CH01 N+ N+ + 6248 6294
+CH01 N+ N+ + 7561 7561
+CH01 N+ N+ + 9243 9311
+CH01 N+ N+ + 10367 10367
+CH01 N+ N+ + 11361 11370
+CH01 N+ N+ + 13599 13613
```
We opened this file `missing_intervals.tsv` in _Excel_ and quickly calculated the length of each interval.
diff --git a/materials/03-case_studies/02-southafrica.md b/materials/03-case_studies/02-southafrica.md
index 1a7c897..fc2b3e3 100644
--- a/materials/03-case_studies/02-southafrica.md
+++ b/materials/03-case_studies/02-southafrica.md
@@ -382,21 +382,19 @@ seqkit locate -i -P -G -M -r -p "N+" report/consensus.fa > results/missing_inter
```
This software outputs a tab-delimited table, which we saved as `results/missing_intervals.tsv`.
-The table looks like this:
-
-```
-seqID patternName pattern strand start end
-ZA01 N+ N+ + 1 54
-ZA01 N+ N+ + 22771 22926
-ZA01 N+ N+ + 23603 23835
-ZA01 N+ N+ + 26948 26948
-ZA01 N+ N+ + 26968 27137
-ZA01 N+ N+ + 29801 29867
-ZA02 N+ N+ + 1 54
-ZA02 N+ N+ + 22771 22921
-ZA02 N+ N+ + 23603 23835
-
-... MORE LINES OMITTED ...
+The table looks like this (only the top few rows are shown):
+
+```
+seqID patternName pattern strand start end
+ZA01 N+ N+ + 1 54
+ZA01 N+ N+ + 22771 22926
+ZA01 N+ N+ + 23603 23835
+ZA01 N+ N+ + 26948 26948
+ZA01 N+ N+ + 26968 27137
+ZA01 N+ N+ + 29801 29867
+ZA02 N+ N+ + 1 54
+ZA02 N+ N+ + 22771 22921
+ZA02 N+ N+ + 23603 23835
```
We opened this file `missing_intervals.tsv` in _Excel_ and quickly calculated the length of each interval.
diff --git a/materials/03-case_studies/03-eqa.md b/materials/03-case_studies/03-eqa.md
index 6f287ef..913cb48 100644
--- a/materials/03-case_studies/03-eqa.md
+++ b/materials/03-case_studies/03-eqa.md
@@ -301,7 +301,7 @@ In our example, if our version of _Guppy_ was 6.1.5 we would use the same model
#### Illumina
```bash
-nextflow run nf-core/viralrecon
+nextflow run nf-core/viralrecon \
-r 2.6.0 -profile singularity \
--max_memory '15.GB' --max_cpus 4 \
--platform illumina \
@@ -483,7 +483,7 @@ The meaning of the options is detailed in [`seqkit`'s documentation](https://bio
:::{.callout-exercise}
-Open the file you created in the previous step (`results/consensus_miss_intervals.tsv`) in a spreadsheet program.
+Open the file you created in the previous step (`results/missing_intervals.tsv`) in a spreadsheet program.
Create a new column with the length of each interval (`end - start + 1`).
Note if any missing intervals are larger than 1Kb, and whether they overlap with the _Spike_ gene.
@@ -558,7 +558,7 @@ For the **Pangolin** analysis:
# run pangolin
- pangolin --outdir results/pangolin/ --outfile report.csv
+ pangolin --outdir results/pangolin/ --outfile pango_report.csv
```
- Fix the code:
@@ -569,7 +569,7 @@ For the **Pangolin** analysis:
- Save the file.
- Activate the software environment: `mamba activate pangolin`.
- Run your script using `bash`.
-- Once the analysis completes, open the file `results/pangolin/report.csv` in _Excel_ and see if there were any samples for which the analysis failed.
+- Once the analysis completes, open the file `results/pangolin/pango_report.csv` in _Excel_ and see if there were any samples for which the analysis failed.
If there were any failed samples, check if they match the report from _Nextclade_.
:::
@@ -722,7 +722,7 @@ At this point in our analysis, we have several tables with different pieces of i
- `sample_info.csv` → the original table with metadata for our samples.
- `results/viralrecon/multiqc/medaka/summary_variants_metrics_mqc.csv` → quality metrics from the _MultiQC_ report generated by the _viralrecon_ pipeline.
- `results/nextclade/nextclade.tsv` → the results from _Nextclade_.
-- `results/pangolin/report.csv` → the results from _Pangolin_.
+- `results/pangolin/pango_report.csv` → the results from _Pangolin_.
- (optional) `results/civet/master_metadata.csv` → the results from the _civet_ analysis, namely the catchment (or cluster) that each of our samples was grouped into.
Each of these tables stores different pieces of information, and it would be great if we could _integrate_ them together, to facilitate their interpretation and generate some visualisations.
@@ -756,10 +756,13 @@ You can export these plots from within RStudio using the "Export" button on the
:::{.callout-exercise}
**Annotating Phylogenetic Tree**
-Use the file `report/consensus_metrics.tsv` (created in the Data Integration exercise) to annotate your phylogenetic tree in FigTree and display the lineages assigned to each sample as the tip labels.
+Using FigTree, import two annotation files (**File** > **Import Annotations...**):
-If you need a reminder of how to load annotations in FigTree, check the "[Building phylogenetic trees](../02-isolates/04-phylogeny.md#visualising-trees)" section of the materials.
+- `report/consensus_metrics.tsv`, which was created in the Data Integration exercise.
+- `resources/eqa_collaborators/metadata.tsv`, which has lineage assignments for EQA samples sequenced by other labs.
+After importing both files, annotate your phylogenetic tree to display the lineages assigned to each sample as the tip labels.
+See @sec-figtree if you need a reminder of how to annotate trees using FigTree.
:::
diff --git a/materials/05-software/03-software_setup.md b/materials/05-software/03-software_setup.md
index 19a428a..5fe3b50 100644
--- a/materials/05-software/03-software_setup.md
+++ b/materials/05-software/03-software_setup.md
@@ -79,6 +79,22 @@ Another way to run Linux within Windows (or macOS) is to install a Virtual Machi
However, this is mostly suitable for practicing and **not suitable for real data analysis**.
Details for installing Ubuntu on VirtualBox are given on [this page](https://ubuntu.com/tutorials/how-to-run-ubuntu-desktop-on-a-virtual-machine-using-virtualbox#1-overview).
+While setting it up, make sure to do the following:
+
+- In Step 2 "Create a user profile": make sure to tick the Guest Additions option.
+- In Step 2 "Define the Virtual Machine’s resources":
+ - Assign at least 4 CPUs and 16000MB of RAM. At the very minimum you need 2 CPUs to run an Ubuntu VM.
+ - Set the disk size to at least 100GB, or more if you have it available. This will not immediately take up 100GB of space on your computer; it is the maximum the virtual machine is allowed to use, which is useful as we are working with sequencing data.
+
+Once the installation completes, log in to the Ubuntu virtual machine, open a terminal and run the following commands:
+
+```bash
+su -
+usermod -a -G sudo YOUR-USERNAME-HERE
+```
+
+These commands add your newly created user to the "sudo" (admin) group.
+Then close the terminal and restart the virtual machine for the change to take effect.
:::
diff --git a/utils/eqa/scripts_illumina/03-missing_intervals.sh b/utils/eqa/scripts_illumina/03-missing_intervals.sh
index d09e10e..3434232 100644
--- a/utils/eqa/scripts_illumina/03-missing_intervals.sh
+++ b/utils/eqa/scripts_illumina/03-missing_intervals.sh
@@ -1,3 +1,3 @@
#!/bin/bash
-seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > report/missing_intervals.tsv
\ No newline at end of file
+seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > results/missing_intervals.tsv
\ No newline at end of file
diff --git a/utils/eqa/scripts_nanopore/03-missing_intervals.sh b/utils/eqa/scripts_nanopore/03-missing_intervals.sh
index d09e10e..3434232 100644
--- a/utils/eqa/scripts_nanopore/03-missing_intervals.sh
+++ b/utils/eqa/scripts_nanopore/03-missing_intervals.sh
@@ -1,3 +1,3 @@
#!/bin/bash
-seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > report/missing_intervals.tsv
\ No newline at end of file
+seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > results/missing_intervals.tsv
\ No newline at end of file