add mamba activate commands
tavareshugo committed Oct 24, 2024
1 parent 5d67b55 commit 2a3312d
Showing 3 changed files with 33 additions and 13 deletions.
14 changes: 11 additions & 3 deletions materials/22-pract.md
@@ -14,6 +14,16 @@ The following practical simulates the situation when we know what we are looking
- If you don't have your own reference genome, try to find one in public databases that is potentially the closest to your geographical location but also a recent isolate.
:::

:::{.callout-important}
#### Activate your software environment

For this practical we need to activate the software environment called `alignment`:

```bash
mamba activate alignment
```
:::

### Standard quality control and pre-processing of shotgun metagenomics raw data

Before we perform any analysis on the raw data it is important to perform the basic quality control checks and if needed certain pre-processing and filtering steps to ensure that we are working with high quality data. When you open a new terminal in the training environment, your working directory should be the `~/Course_Materials` folder. You can always check where you are in the filesystem by using the `pwd` command or just by checking your `bash` prompt.
@@ -33,11 +43,9 @@ FastQC generates graphical output report in `.html` format. This is often placed
Open the HTML files, go through the graphs, and discuss what you see. For future reference, and to see more examples (good and bad data), please visit the [FastQC website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
:::

The next standard pre-processing step is to remove adapter, primer and other unwanted sequences from your reads. These sequences are by-products of the next-generation sequencing technique, and as they do not come from the template DNA, they can interfere with many downstream pipeline steps. One of the most commonly used tools for this purpose is [cutadapt](https://cutadapt.readthedocs.io/en/stable/). Check the command line help for the application and run the filtering step on the raw sequencing data (please note that you have to activate the `metagenomics` conda environment if it is not yet active).
The next standard pre-processing step is to remove adapter, primer and other unwanted sequences from your reads. These sequences are by-products of the next-generation sequencing technique, and as they do not come from the template DNA, they can interfere with many downstream pipeline steps. One of the most commonly used tools for this purpose is [cutadapt](https://cutadapt.readthedocs.io/en/stable/). Check the command line help for the application and run the filtering step on the raw sequencing data.

```bash
conda activate metagenomics

cutadapt -h

cutadapt -a CTGTCTCTTATACACATCT -A ATGTGTATAAGAGACA \
21 changes: 11 additions & 10 deletions materials/32-pract.md
@@ -6,19 +6,25 @@ title: Practical

In this practical we are simulating a virus infection that is caused by a completely unknown virus. In this simulated dataset a novel genome is hidden in the raw data with a huge background of human genome sequence. This simulates a real scenario when a pathogen is in the blood or other (otherwise sterile) body fluid. To make the bioinformatics step faster we generated the human "background" from chromosome 22, so the database will be relatively small.

:::{.callout-important}
#### Activate your software environment

For this practical we need to activate the software environment called `assembly`:

```bash
mamba activate assembly
```
:::


### QC and Pre-processing

The raw data quality control and pre-processing follow the same steps as for the mixed community data. For details on these steps, please refer to the [Day 2 practical material](22-pract.html#standard-quality-control-and-pre-processing-of-shotgun-metagenomics-raw-data).

```bash
# Deactivate the metagenomics environment if you are in that
conda deactivate

cd sg_raw_data/
fastqc unknown_pathogen_R1.fastq unknown_pathogen_R2.fastq

conda activate metagenomics

cutadapt -a CTGTCTCTTATACACATCT -A ATGTGTATAAGAGACA \
-o unknown_pathogen_noadapt_R1.fastq -p unknown_pathogen_noadapt_R2.fastq \
unknown_pathogen_R1.fastq unknown_pathogen_R2.fastq
@@ -116,11 +122,6 @@ The de novo assembly QC and pre-processing has the same first steps as any other
As the time and resources needed by the assembly step scale with the amount of input data, we can use methods to reduce the number of raw reads without losing important data. We will use the `clumpify.sh` script (from the `bbmap` package) to remove duplicates (PCR or optical). This algorithm removes completely matching reads or read-pairs.

```bash

# Be sure you have the metagenomics environment activated
# if not...
conda activate metagenomics

clumpify.sh

clumpify.sh in=mixedcomm_forward_paired.fq.gz in2=mixedcomm_reverse_paired.fq.gz \
11 changes: 11 additions & 0 deletions materials/42-pract.md
@@ -6,6 +6,17 @@ title: Practical

During this practical session we will bin the contigs into multiple clusters forming metagenome assembled genomes (MAGs). These MAGs ideally represent individual genomes. We will assign quality measures to them and perform de novo gene annotation and specific gene discovery.

:::{.callout-important}
#### Activate your software environment

For this practical we need to activate the software environment called `mags`:

```bash
mamba activate mags
```
:::


### Reference independent binning

We will use the `maxbin2` algorithm to reconstruct the individual genomes of those bacteria that were sequenced in the artificial mixed community. The method uses genomic properties and contig coverage values to find clusters of contigs that are potentially coming from the same source (same genome).
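
Each practical touched by this commit now starts by activating a named environment (`alignment`, `assembly`, `mags`). As a minimal sketch, and not part of the commit itself, a small shell helper could confirm that the expected environment is active before running a practical's commands. The function name `require_env` is hypothetical, and the check assumes that activation exports the `CONDA_DEFAULT_ENV` variable, as `conda activate` does; this may differ depending on how `mamba` is set up.

```bash
# Hypothetical helper, not part of the course materials.
# Assumption: the activation step exports CONDA_DEFAULT_ENV with the
# name of the active environment (this is what conda does).
require_env() {
  local expected="$1"
  local active="${CONDA_DEFAULT_ENV:-}"
  if [ "$active" = "$expected" ]; then
    # The expected environment is active.
    echo "ok: '$expected' is active"
  else
    # Report the mismatch on stderr and signal failure to the caller.
    echo "expected '$expected' but found '${active:-none}'; run: mamba activate $expected" >&2
    return 1
  fi
}
```

It could then guard a command, for example `require_env alignment && cutadapt -h`, so that the tool only runs when the matching environment is active.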