minor fixes; fix #37; fix #31

cambiotraining · Jan 30, 2024 · 4989622
1 parent ae8761b
commit 4989622

Showing 7 changed files with 58 additions and 27 deletions.
diff --git a/materials/02-isolates/04-phylogeny.md b/materials/02-isolates/04-phylogeny.md
@@ -504,9 +504,11 @@ To highlight these:
 - Select the branch corresponding to the base of the group of samples classified as _Alpha_. This should highlight all those branches. 
 - Click the "Highlight" button at the top and choose a colour. 
 
+<!-- 
 The final result should look similar to what is shown here.
 
-![TODO]()
+![TODO]() 
+-->
 
 **Question 4**
 
@@ -518,9 +520,12 @@ treetime --tree results/iqtree/india.treefile --dates sample_annotation.tsv --al
 
 Once complete, we can open the `india.nexus` tree with _FigTree_.
 We can annotate the internal nodes of the tree with the dates inferred by `treetime` by clicking on the <kbd>Node Labels</kbd> menu on the left and selecting "Display" to be "date". 
+
+<!-- 
 This should result in a tree similar to the one shown here.
 
 ![TODO]()
+-->
 
 **Question 5**
 

diff --git a/materials/03-case_studies/01-switzerland.md b/materials/03-case_studies/01-switzerland.md
@@ -304,10 +304,19 @@ seqkit locate -i -P -G -M -r -p "N+" report/consensus.fa > results/missing_inter
 ```
 
 This software outputs a tab-delimited table, which we saved as `results/missing_intervals.tsv`. 
-The table looks like this: 
+The table looks like this (only the top few rows are shown): 
 
 ```
-TODO
+seqID  patternName  pattern  strand  start  end
+CH01   N+           N+       +       1      54
+CH01   N+           N+       +       1193   1264
+CH01   N+           N+       +       4143   4322
+CH01   N+           N+       +       6248   6294
+CH01   N+           N+       +       7561   7561
+CH01   N+           N+       +       9243   9311
+CH01   N+           N+       +       10367  10367
+CH01   N+           N+       +       11361  11370
+CH01   N+           N+       +       13599  13613
 ```
 
 We opened this file `missing_intervals.tsv` in _Excel_ and quickly calculated the length of each interval. 
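The interval-length step done in _Excel_ above (the EQA exercise asks for the same `end - start + 1` column) can also be scripted. The following is a minimal sketch and is not part of the course materials. It defines a hypothetical bash function, `annotate_missing_intervals`, that uses `awk` to append three columns: the interval length, a flag for intervals longer than 1 kb, and a flag for any overlap with the _Spike_ gene. It assumes the tab-delimited column order shown above (`start` in column 5, `end` in column 6). It also assumes, as a labelled approximation, that Spike spans positions 21563-25384 of the Wuhan-Hu-1 reference (MN908947.3). Consensus positions only line up with those coordinates when no indels shift them.

```bash
#!/bin/bash

# Hypothetical helper (not part of the course scripts): annotate the output of
# `seqkit locate` with interval length, a >1 kb flag and a Spike-overlap flag.
# ASSUMPTION: Spike spans 21563-25384 on the Wuhan-Hu-1 reference (MN908947.3).
annotate_missing_intervals() {
  local infile="$1"
  local spike_start=21563
  local spike_end=25384

  awk -F '\t' -v OFS='\t' -v ss="$spike_start" -v se="$spike_end" '
    # header row: add the names of the new columns
    NR == 1 { print $0, "length", "gt_1kb", "overlaps_spike"; next }
    {
      len = $6 - $5 + 1                             # end - start + 1
      big = (len > 1000) ? "yes" : "no"             # longer than 1 kb?
      spike = ($5 <= se && $6 >= ss) ? "yes" : "no" # any overlap with Spike?
      print $0, len, big, spike
    }
  ' "$infile"
}
```

For example, `annotate_missing_intervals results/missing_intervals.tsv > results/missing_intervals_annotated.tsv` writes an annotated copy of the table. The output file name here is only an illustration. The result can be opened in a spreadsheet just like the original table.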

diff --git a/materials/03-case_studies/02-southafrica.md b/materials/03-case_studies/02-southafrica.md
@@ -382,21 +382,19 @@ seqkit locate -i -P -G -M -r -p "N+" report/consensus.fa > results/missing_inter
 ```
 
 This software outputs a tab-delimited table, which we saved as `results/missing_intervals.tsv`. 
-The table looks like this: 
-
-```
-seqID	patternName	pattern	strand	start	end
-ZA01	N+	N+	+	1	54
-ZA01	N+	N+	+	22771	22926
-ZA01	N+	N+	+	23603	23835
-ZA01	N+	N+	+	26948	26948
-ZA01	N+	N+	+	26968	27137
-ZA01	N+	N+	+	29801	29867
-ZA02	N+	N+	+	1	54
-ZA02	N+	N+	+	22771	22921
-ZA02	N+	N+	+	23603	23835
-
-... MORE LINES OMITTED ...
+The table looks like this (only the top few rows are shown): 
+
+```
+seqID  patternName  pattern  strand  start  end
+ZA01   N+           N+       +       1      54
+ZA01   N+           N+       +       22771  22926
+ZA01   N+           N+       +       23603  23835
+ZA01   N+           N+       +       26948  26948
+ZA01   N+           N+       +       26968  27137
+ZA01   N+           N+       +       29801  29867
+ZA02   N+           N+       +       1      54
+ZA02   N+           N+       +       22771  22921
+ZA02   N+           N+       +       23603  23835
 ```
 
 We opened this file `missing_intervals.tsv` in _Excel_ and quickly calculated the length of each interval. 

diff --git a/materials/03-case_studies/03-eqa.md b/materials/03-case_studies/03-eqa.md
@@ -301,7 +301,7 @@ In our example, if our version of _Guppy_ was 6.1.5 we would use the same model
 #### Illumina
 
 ```bash
-nextflow run nf-core/viralrecon 
+nextflow run nf-core/viralrecon \
   -r 2.6.0 -profile singularity \
   --max_memory '15.GB' --max_cpus 4 \
   --platform illumina \
@@ -483,7 +483,7 @@ The meaning of the options is detailed in [`seqkit`'s documentation](https://bio
 
 :::{.callout-exercise}
 
-Open the file you created in the previous step (`results/consensus_miss_intervals.tsv`) in a spreadsheet program. 
+Open the file you created in the previous step (`results/missing_intervals.tsv`) in a spreadsheet program. 
 Create a new column with the length of each interval (`end - start + 1`). 
 
 Note if any missing intervals are larger than 1Kb, and whether they overlap with the _Spike_ gene. 
@@ -558,7 +558,7 @@ For the **Pangolin** analysis:
     <DATA_UPDATE_COMMAND>
 
     # run pangolin
-    pangolin --outdir results/pangolin/ --outfile report.csv <INPUT>
+    pangolin --outdir results/pangolin/ --outfile pango_report.csv <INPUT>
     ```
 
 - Fix the code:
@@ -569,7 +569,7 @@ For the **Pangolin** analysis:
 - Save the file.
 - Activate the software environment: `mamba activate pangolin`.
 - Run your script using `bash`.
-- Once the analysis completes, open the file `results/pangolin/report.csv` in _Excel_ and see if there were any samples for which the analysis failed. 
+- Once the analysis completes, open the file `results/pangolin/pango_report.csv` in _Excel_ and see if there were any samples for which the analysis failed. 
   If there were any failed samples, check if they match the report from _Nextclade_.
 
 :::
@@ -722,7 +722,7 @@ At this point in our analysis, we have several tables with different pieces of i
 - `sample_info.csv` → the original table with metadata for our samples. 
 - `results/viralrecon/multiqc/medaka/summary_variants_metrics_mqc.csv` → quality metrics from the _MultiQC_ report generated by the _viralrecon_ pipeline.
 - `results/nextclade/nextclade.tsv` → the results from _Nextclade_. 
-- `results/pangolin/report.csv` → the results from _Pangolin_.
+- `results/pangolin/pango_report.csv` → the results from _Pangolin_.
 - (optional) `results/civet/master_metadata.csv` → the results from the _civet_ analysis, namely the catchment (or cluster) that each of our samples was grouped into.
 
 Each of these tables stores different pieces of information, and it would be great if we could _integrate_ them together, to facilitate their interpration and generate some visualisations. 
@@ -756,10 +756,13 @@ You can export these plots from within RStudio using the "Export" button on the
 :::{.callout-exercise}
 **Annotating Phylogenetic Tree**
 
-Use the file `report/consensus_metrics.tsv` (created in the Data Integration exercise) to annotate your phylogenetic tree in FigTree and display the lineages assigned to each sample as the tip labels. 
+Using FigTree, import two annotation files (**File** > **Import Annotations...**):
 
-If you need a reminder of how to load annotations in FigTree, check the "[Building phylogenetic trees](../02-isolates/04-phylogeny.md#visualising-trees)" section of the materials. 
+- `report/consensus_metrics.tsv`, which was created in the Data Integration exercise.
+- `resources/eqa_collaborators/metadata.tsv`, which has the lineage assignments for EQA samples sequenced by other labs. 
 
+After importing both files, annotate your phylogenetic tree to display the lineages assigned to each sample as the tip labels. 
+See @sec-figtree if you need a reminder of how to annotate trees using FigTree.
 :::
 
 

diff --git a/materials/05-software/03-software_setup.md b/materials/05-software/03-software_setup.md
@@ -79,6 +79,22 @@ Another way to run Linux within Windows (or macOS) is to install a Virtual Machi
 However, this is mostly suitable for practicing and **not suitable for real data analysis**.
 
 Details for installing Ubuntu on VirtualBox is given on [this page](https://ubuntu.com/tutorials/how-to-run-ubuntu-desktop-on-a-virtual-machine-using-virtualbox#1-overview).
+Make sure to do the following while setting it up:
+
+- In Step 2 "Create a user profile": make sure to tick the Guest Additions option.
+- In Step 2 "Define the Virtual Machine’s resources": 
+  - Assign at least 4 CPUs and 16000MB of RAM. At the very minimum, you need 2 CPUs to run an Ubuntu VM.
+  - Set at least 100GB as disk size, or more if you have it available (note that this does not immediately use 100GB of space on your computer; it sets the maximum size the virtual disk can grow to, which is useful as we are working with sequencing data).
+
+Once the installation completes, log in to the Ubuntu virtual machine, open a terminal and run the following commands: 
+
+```bash
+su -
+usermod -a -G sudo YOUR-USERNAME-HERE
+```
+
+These commands add your newly created user to the "sudo" (admin) group. 
+Close the terminal and restart the virtual machine so that the change takes effect. 
 :::
 
 
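After the restart, you may want to confirm that the `usermod` command worked. The sketch below assumes a standard Ubuntu system, where `id -nG USER` lists the names of all groups a user belongs to. The function `user_in_group` is hypothetical and only illustrates the check. Note that `id` with a username reads the group database, so it may report `sudo` even before the restart. Your terminal sessions only gain the new membership after you log in again.

```bash
#!/bin/bash

# Hypothetical helper: succeed (exit status 0) if USER belongs to GROUP.
# Usage: user_in_group USER GROUP
user_in_group() {
  local user="$1" group="$2"
  # `id -nG` prints the user's group names separated by spaces;
  # put one per line and look for an exact match
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

me="$(whoami)"
if user_in_group "$me" sudo; then
  echo "$me is in the sudo group"
else
  echo "$me is not in the sudo group yet"
fi
```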

diff --git a/utils/eqa/scripts_illumina/03-missing_intervals.sh b/utils/eqa/scripts_illumina/03-missing_intervals.sh
@@ -1,3 +1,3 @@
 #!/bin/bash
 
-seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > report/missing_intervals.tsv
+seqkit locate --ignore-case --only-positive-strand --non-greedy --hide-matched -r -p "N+" report/consensus.fa > results/missing_intervals.tsv
diff --git a/utils/eqa/scripts_nanopore/03-missing_intervals.sh b/utils/eqa/scripts_nanopore/03-missing_intervals.sh
@@ -1,3 +1,3 @@
 #!/bin/bash
 
-seqkit locate --ignore-case --only-positive-strand --hide-matched -r -p "N+" report/consensus.fa > report/missing_intervals.tsv
+seqkit locate --ignore-case --only-positive-strand --non-greedy --hide-matched -r -p "N+" report/consensus.fa > results/missing_intervals.tsv