From afd370908112f0cb414559e616de6aa42578e783 Mon Sep 17 00:00:00 2001
From: Heli Juottonen
Date: Wed, 16 Aug 2023 10:17:20 +0300
Subject: [PATCH] Minor changes to Ion Torrent exercises

---
 docs/IonTorrent/Exercises_day1.html               |  2 +-
 docs/IonTorrent/Exercises_day2.html               | 12 +++++++++---
 eLena_md/IonTorrent/Exercises_IonTorrent_day1.Rmd |  2 +-
 eLena_md/IonTorrent/Exercises_IonTorrent_day2.Rmd |  4 ++--
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/docs/IonTorrent/Exercises_day1.html b/docs/IonTorrent/Exercises_day1.html
index 3505e1f..6023971 100644
--- a/docs/IonTorrent/Exercises_day1.html
+++ b/docs/IonTorrent/Exercises_day1.html
@@ -1006,7 +1006,7 @@

Day 1: Data pre-processing

1) they aligned outside the common alignment range, or 2) they contained too long homopolymers?

Step 11. Remove gaps and overhangs from the alignment. If
-this creates new identical sequences, remove them

+this creates new identical sequences, they will be removed

Choose screened.fasta.gz and screened.count_table and run the tool Filter sequence alignment.

diff --git a/docs/IonTorrent/Exercises_day2.html b/docs/IonTorrent/Exercises_day2.html
index 5468a4b..d251de5 100644
--- a/docs/IonTorrent/Exercises_day2.html
+++ b/docs/IonTorrent/Exercises_day2.html
@@ -849,7 +849,10 @@

Getting the data into phyloseq

files

Choose chimeras.removed.fasta.gz, chimeras.removed.count_table and
-sequences-taxonomy-assignment.txt. Next, run the tool
+sequences-taxonomy-assignment.txt. Check in
+Parameters that these files are in the correct locations under
+Input files and correct if needed.
+Next, run the tool
Microbial amplicon dta preprocessing for OTU / Generate input files for phyloseq so that you select the correct data type (16S or 18S) and set a cut-off of 0.03 (i.e. 3%, corresponding to 97% sequence
@@ -940,8 +943,11 @@

Tidying and inspecting the data

  • Proportional prevalence filtering (for removing OTUs that occur in less than specific % of samples)
-Selecting ps_ind.Rda, run the former tool, making sure
-that both singletons and doubletons are removed.
+Selecting ps_ind.Rda, run the tool
+Remove OTUs with 0-2 occurrences, making sure that both
+singletons and doubletons are removed. Feel free to test the prevalence
+filtering tool too if you have time, but it is not necessary for the
+following exercises.

    Why would we want to remove singletons and doubletons from the data?
     Can you think of situations where these should be kept as part of the dataset?

    Step 20. Sequence numbers, rarefaction curve and alpha
diff --git a/eLena_md/IonTorrent/Exercises_IonTorrent_day1.Rmd b/eLena_md/IonTorrent/Exercises_IonTorrent_day1.Rmd
index 22ad398..84a9c8c 100644
--- a/eLena_md/IonTorrent/Exercises_IonTorrent_day1.Rmd
+++ b/eLena_md/IonTorrent/Exercises_IonTorrent_day1.Rmd
@@ -168,7 +168,7 @@ Were these sequences removed because:
 2) they contained too long homopolymers?
 ```
-**Step 11. Remove gaps and overhangs from the alignment. If this creates new identical sequences, remove them**
+**Step 11. Remove gaps and overhangs from the alignment. If this creates new identical sequences, they will be removed**
 Choose `screened.fasta.gz` and `screened.count_table` and run the tool `Filter sequence alignment`.
diff --git a/eLena_md/IonTorrent/Exercises_IonTorrent_day2.Rmd b/eLena_md/IonTorrent/Exercises_IonTorrent_day2.Rmd
index 04ded48..df17579 100644
--- a/eLena_md/IonTorrent/Exercises_IonTorrent_day2.Rmd
+++ b/eLena_md/IonTorrent/Exercises_IonTorrent_day2.Rmd
@@ -26,7 +26,7 @@ opts_knit$set(width=75)
 **Step 16. Creating `phyloseq` input files**
-Choose `chimeras.removed.fasta.gz`, `chimeras.removed.count_table` and `sequences-taxonomy-assignment.txt`.
+Choose `chimeras.removed.fasta.gz`, `chimeras.removed.count_table` and `sequences-taxonomy-assignment.txt`. Check in *Parameters* that these files are in the correct locations under *Input files* and correct if needed.
 Next, run the tool `Microbial amplicon dta preprocessing for OTU / Generate input files for phyloseq` so that you select the correct data type (`16S or 18S`) and set a cut-off of 0.03 (i.e. 3%, corresponding to 97% sequence similarity) for OTU clustering.
 ```
@@ -93,7 +93,7 @@ iv) There are two further tools for data tidying:
 - Remove OTUs with 0-2 occurrences
 - Proportional prevalence filtering (for removing OTUs that occur in less than specific % of samples)
-Selecting `ps_ind.Rda`, run the former tool, making sure that both singletons and doubletons are removed. 
+Selecting `ps_ind.Rda`, run the tool `Remove OTUs with 0-2 occurrences`, making sure that both singletons and doubletons are removed. Feel free to test the prevalence filtering tool too if you have time, but it is not necessary for the following exercises.
 ```
 Why would we want to remove singletons and doubletons from the data?