Traditionally, you would need to run at least three separate steps to map reads to the reference genome, identify variants, and filter out low-quality variants. The snippy software package wraps these three steps together and simplifies the process. Note that snippy works well for bacterial genomes like the ones we are analyzing, but it is not well suited to identifying variants in more complex eukaryotic genomes.
By this point you should be getting comfortable with many BASH commands. In this tutorial we will provide fewer explicit commands and expect you to fall back on the skills you built in the earlier exercises to work out the commands yourself. We have tried to bold many of the actions you need to take so that you do not miss anything. If you get stuck, ask for help!
-
You will perform this analysis on the Biomix cluster, so use the ssh command to log in to Biomix as you learned previously.
-
The analysis will run in a new subdirectory of the same OUTDIR location we set up in Tutorial 1. Change directory to your OUTDIR.
-
You will need to collect a few file paths to start the analysis.
- Find the trimmed read 1 and read 2 fastq files (fq.gz) that TrimGalore! produced in the previous analysis.
- Use the realpath command to get the complete (absolute) file path to each. You will use these paths in the next step.
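If you are unsure where TrimGalore! wrote its output, one possible approach (try it yourself first) is sketched below. It searches a directory for .fq.gz files and prints the absolute path of each match in sorted order. It assumes GNU-style find and realpath are available, as on most Linux clusters; the function name find_trimmed_reads is purely illustrative and is not part of the tutorial scripts.

```shell
# Hypothetical helper: print the absolute path of every .fq.gz file under a directory.
# Usage (from your OUTDIR): find_trimmed_reads .
find_trimmed_reads() {
  local search_dir="${1:-.}"   # directory to search; defaults to the current one
  # -type f: regular files only; -name: match the .fq.gz extension;
  # realpath turns each relative match into a full path
  find "$search_dir" -type f -name '*.fq.gz' -exec realpath {} \; | sort
}
```

For paired-end data, TrimGalore! typically ends its output names with _val_1.fq.gz and _val_2.fq.gz, so the read 1 and read 2 files should be easy to tell apart. If your raw reads also end in .fq.gz they will be listed too, so pick the trimmed ones.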
-
There is a script provided in the scripts folder called B-variant.slurm. Open this file for editing. Examine the script and try to understand what it is doing. You will notice it has a very similar structure to the trimming script, but with some differences near the end. We will go over the script together in the workshop. Edit the SAMPLE_NAME, READ1, and READ2 variables in the ##SAMPLE INFORMATION section:
- SAMPLE_NAME should match what was in your A-trim.slurm script (i.e. the first part of the read filenames).
- READ1 and READ2 should be the full paths to the trimmed fastq files (fq.gz) that you recorded in the previous step.
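As an illustration only, an edited ##SAMPLE INFORMATION section might look like the sketch below. The exact layout of B-variant.slurm may differ, and the sample name and paths are made-up placeholders: use your own sample name from A-trim.slurm and the paths that realpath printed. Quoting the values is a safe habit in case a path ever contains spaces.

```shell
##SAMPLE INFORMATION
# Placeholder values: replace with your own sample name and full read paths.
SAMPLE_NAME="sampleA"
READ1="/full/path/from/realpath/sampleA_R1_val_1.fq.gz"
READ2="/full/path/from/realpath/sampleA_R2_val_2.fq.gz"
```

Note that there are no spaces around the = sign: BASH reads SAMPLE_NAME = "sampleA" as a command named SAMPLE_NAME, not as an assignment.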
-
You are now ready to submit your script to Slurm.
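One way to submit is with sbatch, run from the OUTDIR that contains the scripts folder (this sketch assumes the script expects to be launched from there). A plain sbatch scripts/B-variant.slurm is enough; the --parsable option used here makes sbatch print only the job ID, which can be convenient when monitoring the job. The wrapper function name is hypothetical.

```shell
# Hypothetical wrapper around sbatch; run it from your OUTDIR.
submit_variant_job() {
  # --parsable: print just the job ID (followed by ";cluster" on multi-cluster sites)
  sbatch --parsable scripts/B-variant.slurm
}
# Usage:
#   JOBID=$(submit_variant_job)
#   echo "Submitted job $JOBID"
```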
-
Monitor the queue and make sure your job is running.
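The squeue command lists jobs in the queue, and restricting it to your own user keeps the output short. In the ST (state) column, PD means pending and R means running; shortly after the job finishes it drops out of the list. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: show only your own jobs in the Slurm queue.
my_jobs() {
  squeue -u "$USER"   # -u filters the queue by user name
}
# Usage: my_jobs   (repeat every so often until your job leaves the queue)
```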
-
When your job completes, check the Slurm output file and make sure there were no errors during the run.
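By default Slurm writes a job's output to a file named slurm-<jobid>.out in the directory you submitted from, but B-variant.slurm may set its own name with an #SBATCH --output (or -o) line, so check the script. As a rough first pass, the sketch below searches the log for common failure words and otherwise shows the last few lines. The function name and the word list are assumptions, and a tool can mention "error" in a harmless message, so still read the log yourself.

```shell
# Hypothetical helper: flag lines in a Slurm log that look like problems.
# Usage: check_slurm_log slurm-12345.out
check_slurm_log() {
  local log="$1"
  if [ ! -r "$log" ]; then
    echo "Cannot read $log" >&2
    return 2
  fi
  # -i: ignore case; -n: show line numbers; -E: extended regex for the word list
  if grep -inE 'error|fail|killed|cancelled' "$log"; then
    echo "Possible problems found in $log" >&2
    return 1
  fi
  echo "No obvious errors in $log; last lines:"
  tail -n 5 "$log"   # most tools report how they finished at the end of the log
}
```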
-
Examine the new files that were produced. They should be in a subdirectory (named after the sample) of a new directory called 4-snippy. Change to that directory and list its contents. Snippy runs several steps, so it produces a lot of output files. Some are useful to you, and others are intermediate files that you might not need. We will explore these together in the workshop, but in the meantime try exploring the files to see which ones may be useful. NOTE: all files except for the .bam file can be opened with the less command.
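To see at a glance which files are worth opening, a small sketch like the one below lists the regular files in a snippy output directory while skipping BAM alignments and their .bai index files, which are binary and not meant to be read with less. The function name is hypothetical; open any file it lists with less (press q to quit).

```shell
# Hypothetical helper: list files in a directory, skipping binary BAM/BAI files.
# Usage (inside 4-snippy/<sample>): list_text_outputs .
list_text_outputs() {
  local dir="${1:-.}"
  # -maxdepth 1: do not descend into subfolders; ! -name: exclude binary formats
  find "$dir" -maxdepth 1 -type f ! -name '*.bam' ! -name '*.bai' | sort
}
```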