Merge pull request #7 from csc-training/IonTorrentJYU23
Update tool names
helijuottonen committed Sep 12, 2023
2 parents b3f9973 + b5d08b8 commit 086f7cc
Showing 4 changed files with 32 additions and 12 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
**/.DS_Store
19 changes: 10 additions & 9 deletions docs/IonTorrent/Exercises_day2.html
@@ -853,7 +853,7 @@ <h3><strong>Getting the data into <code>phyloseq</code></strong></h3>
<em>Parameters</em> that these files are in the correct locations under
<em>Input files</em> and correct if needed.<br />
Next, run the tool
-<code>Microbial amplicon dta preprocessing for OTU / Generate input files for phyloseq</code>
+<code>Microbial amplicon dta preprocessing for OTU / Cluster sequences to OTUs and classify them</code>
so that you select the correct data type (<code>16S or 18S</code>) and
set a cut-off of 0.03 (i.e. 3%, corresponding to 97% sequence
similarity) for OTU clustering.</p>
@@ -869,7 +869,7 @@ <h3><strong>Getting the data into <code>phyloseq</code></strong></h3>
<em>Selected files</em> choose the tab <em>Phenodata</em>.</p></li>
<li><p>In the column <code>description</code>, write the sample names as
you want them to appear in result plots. For example, <em>HPc1</em>,
-<em>HPc2</em> etc.</p></li>
+<em>HPc2</em> etc. The names must be unique for each sample.</p></li>
<li><p>Create new columns called <code>site</code> and
<code>bagging</code> by clicking <code>+ Add column</code>. You can delete
the column <code>chiptype</code> or adjust the width of the columns to
@@ -895,8 +895,8 @@ <h3><strong>Getting the data into <code>phyloseq</code></strong></h3>
constaxonomy file <code>file.opti_mcc.0.03.cons.taxonomy</code>. Select
the tool <code>Convert Mothur files into phyloseq object</code>. In
<em>Parameters</em>, specify the phenodata column including unique IDs
-for each community profile (the column <code>sample</code>). Run the
-tool.</p>
+for each community profile (the column <code>description</code>). Run
+the tool.</p>
<p>This tool produces two files:</p>
<ul>
<li>a <code>phyloseq</code> object (stored as <code>ps.Rda</code>)</li>
@@ -932,10 +932,10 @@ <h3><strong>Tidying and inspecting the data</strong></h3>
get an overview of the distribution of OTUs in our data.</p>
<ol start="3" style="list-style-type: lower-roman">
<li>Selecting <code>ps_ind.Rda</code>, run the
-<code>Additional prevalence summaries</code> tool. This will produce
-both a prevalence plot (<code>ps_prevalence.pdf</code>) and a text
-summary (<code>ps_low.txt</code>). The plot has a prevalence threshold
-of 5% drawn as a default guess for prevalence filtering.</li>
+<code>Prevalence summaries</code> tool. This will produce both a
+prevalence plot (<code>ps_prevalence.pdf</code>) and a text summary
+(<code>ps_low.txt</code>). The plot has a prevalence threshold of 5%
+drawn as a default guess for prevalence filtering.</li>
</ol>
<pre><code>How many doubletons are there in the data set?
Do you have an advance idea about what the term &quot;prevalence&quot; refers to?
@@ -996,7 +996,8 @@ <h3><strong>Taking a closer look at patterns</strong></h3>
<li>1 in Relative abundance cut-off threshold (%) for excluding
OTUs</li>
<li>Class as the level of biological organisation</li>
-<li>site as the phenodata variable 1 for plot faceting</li>
+<li>site as the phenodata variable 1 for dividing the plot into
+subplots</li>
</ul>
<p>The result should look close to this (click on the thumbnail to
expand the image):</p>
6 changes: 3 additions & 3 deletions eLena_md/IonTorrent/Exercises_IonTorrent_day2.Rmd
@@ -27,7 +27,7 @@ opts_knit$set(width=75)
**Step 16. Creating `phyloseq` input files**

Choose `chimeras.removed.fasta.gz`, `chimeras.removed.count_table` and `sequences-taxonomy-assignment.txt`. Check in *Parameters* that these files are in the correct locations under *Input files* and correct if needed.
-Next, run the tool `Microbial amplicon dta preprocessing for OTU / Generate input files for phyloseq` so that you select the correct data type (`16S or 18S`) and set a cut-off of 0.03 (i.e. 3%, corresponding to 97% sequence similarity) for OTU clustering.
+Next, run the tool `Microbial amplicon dta preprocessing for OTU / Cluster sequences to OTUs and classify them` so that you select the correct data type (`16S or 18S`) and set a cut-off of 0.03 (i.e. 3%, corresponding to 97% sequence similarity) for OTU clustering.

```
Why are we using a dissimilarity threshold of 3%?
@@ -82,7 +82,7 @@ sequences in a bacterial dataset - isn't that a little strange?

There are a few more additional tools for data tidying. Let's first get an overview of the distribution of OTUs in our data.

-iii) Selecting `ps_ind.Rda`, run the `Additional prevalence summaries` tool. This will produce both a prevalence plot (`ps_prevalence.pdf`) and a text summary (`ps_low.txt`). The plot has a prevalence threshold of 5% drawn as a default guess for prevalence filtering.
+iii) Selecting `ps_ind.Rda`, run the `Prevalence summaries` tool. This will produce both a prevalence plot (`ps_prevalence.pdf`) and a text summary (`ps_low.txt`). The plot has a prevalence threshold of 5% drawn as a default guess for prevalence filtering.

```
How many doubletons are there in the data set?
@@ -135,7 +135,7 @@ This will produce a file called `ps_relabund.Rda`. Select it and run the `OTU re

- 1 in Relative abundance cut-off threshold (%) for excluding OTUs
- Class as the level of biological organisation
-- site as the phenodata variable 1 for plot faceting
+- site as the phenodata variable 1 for dividing the plot into subplots

The result should look close to this (click on the thumbnail to expand the image):

18 changes: 18 additions & 0 deletions eLena_md/updating_instructions.txt
@@ -0,0 +1,18 @@
Updating the chipster-microbial repo (CSC internal instructions):

1. Access:

Ask an owner for access to the csc-training repo if you are not already a member.

2. Structure of the repo:

eLena_md: R markdown files for the exercises etc.
exercises-d1, exercises-d2: MiSeq 16S materials
IonTorrent: Ion Torrent materials
docs: html versions of the same markdown files (can be linked to course participants)
README.md: determines what is shown on the front page

3. Editing:

Because the markdown files need to be converted to html, the easiest way is to edit the markdown files on your own computer, convert them into html and then upload both files to GitHub (markdown in the folder eLena_md, html in the folder docs). So start with git clone in your own terminal. Create a new branch, make the changes in the markdown file (for example in RStudio), use 'knit' to convert into html, move the html file to the docs folder. Add, commit, push, create a pull request, merge.
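
The editing steps above can be sketched as a shell session. This is only an illustration, not part of the committed instructions: the branch name, commit identity and file contents are placeholders, and a throwaway local bare repository stands in for the GitHub remote so that the commands run offline. In real use you would clone the repository's actual URL, knit the R Markdown file in RStudio, and open the pull request on GitHub.

```shell
#!/bin/sh
# Sketch of the update workflow; every name below is a placeholder (assumption).
set -eu

work=$(mktemp -d)

# Stand-in for the GitHub remote so the sketch runs offline.
# In real use, clone the repository's actual URL instead.
git init --quiet --bare "$work/remote.git"

# 1. Clone and create a new branch.
git clone --quiet "$work/remote.git" "$work/repo"
cd "$work/repo"
git checkout --quiet -b update-tool-names   # hypothetical branch name

# 2. Edit the markdown source (placeholder content written here).
mkdir -p eLena_md/IonTorrent docs/IonTorrent
printf 'edited exercise text\n' > eLena_md/IonTorrent/Exercises_IonTorrent_day2.Rmd

# 3. Knit to html (done in RStudio) and move the result into docs/.
#    A placeholder html file is written here in place of the knitted output.
printf '<p>edited exercise text</p>\n' > docs/IonTorrent/Exercises_day2.html

# 4. Add, commit, push.
git add eLena_md docs
git -c user.name="Example User" -c user.email="user@example.org" \
    commit --quiet -m "Update tool names"
git push --quiet -u origin update-tool-names

# 5. Create a pull request on GitHub and merge it (not scriptable with git alone).
```

Committing the markdown source and the knitted html together keeps the two folders in sync, which is why both are staged in the same `git add`.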
