
ISAAC Intro


Get set up on ISAAC

  • Navigate to https://portal.acf.utk.edu/accounts/request
  • Click on “I have a UT NetID”
  • Authenticate with NetID, password and Duo two factor
  • A form will then be presented, pre-filled with information collected from the University. There are at least two fields you will need to fill in, such as Salutation (Mr., Dr., etc.) and Citizenship. Look for the required fields marked with an *
  • Once the form is filled out click through to the next item
  • Type the project name ISAAC-UTK0208 (with the alphabetic characters in uppercase) to request to be added. That should be it.

Notes

  • You should never run jobs on the login node! It is only to set up your scripts to launch your jobs through the scheduler.
  • Keep the documentation handy!
  • The user portal will enable you to see what projects you are a part of and where you can store data and how much
  • You should be a part of ISAAC-UTK0208 and see storage at /lustre/isaac/proj/UTK0208
  • By default, SLURM scheduler assumes the working directory to be the directory from which the jobs are being submitted
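
As a quick illustration of that last note, here is a minimal sketch of two equivalent ways to control where a job runs (myjob.qsh is a hypothetical script name; --chdir is a standard sbatch option):

# run sbatch from the directory where you want the output to land
cd /lustre/isaac/proj/UTK0208/isaac_practice/<yourusername>
sbatch myjob.qsh

# or tell sbatch the working directory explicitly
sbatch --chdir=/lustre/isaac/proj/UTK0208/isaac_practice/<yourusername> myjob.qsh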

Log into ISAAC Next Gen. After you enter your password, it will send you a Duo push.

ssh <yourusername>@login.isaac.utk.edu

The software system works just like spack, only the command is "module". (It's actually spack underneath, but "module" is the command used in the documentation.) Let's see what is available.

module avail

BWA is something we have used before, so let's see if it's installed

module avail bwa
module load bwa
bwa
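
A few other module subcommands are worth knowing (these are standard in environment-modules/Lmod setups, so they should behave the same here):

module list        # show everything currently loaded
module unload bwa  # remove bwa from your environment
module purge       # unload all loaded modules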

Go to the project directory

cd /lustre/isaac/proj/UTK0208/

You will see a directory set up for our practice. cd into it and create a directory for your lab

cd isaac_practice
mkdir <yourusername>
cd <yourusername>

I've already downloaded our old solenopsis data and the genome, and indexed the genome with bwa.
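
For reference, building that index looks roughly like the sketch below (you do not need to run this; the genome is already indexed):

bwa index /lustre/isaac/proj/UTK0032/data/solenopsis/UNIL_Sinv_3.0.Chr.fasta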

Simple example - single sbatch command on the command line

You can load the software into your environment and then submit the job; the job will inherit your environment (i.e., the software will still be loaded).

Let's just get a quick test command going

echo Worked! > output.txt

Delete output.txt and let's try to run that same command through the scheduler.

sbatch -n 1 -N 1 -A ISAAC-UTK0208 -p condo-epp622 -q condo -t 00:01:00 --wrap="echo Worked! > output.txt"

The flags tell the job scheduler all about your job, including what kind of resources it needs. This can be tricky - you need to know how many threads and how much RAM your job will take, so that you request a sufficient amount.
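
As an illustration, a hypothetical job that needs 4 threads and 8 GB of RAM might be submitted like this (--cpus-per-task and --mem are standard sbatch options; the command and the numbers here are made up):

sbatch -n 1 -N 1 --cpus-per-task=4 --mem=8G -A ISAAC-UTK0208 -p condo-epp622 -q condo -t 00:10:00 --wrap="some_multithreaded_command"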

To see where our job is in the queue

squeue -u mstaton1

Did it work? Do you see output.txt? What is the slurm file that got created?
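
A few quick ways to check (slurm-<jobid>.out is SLURM's default output file name, and sacct is the standard SLURM accounting command):

ls                # output.txt plus a slurm-<jobid>.out file should appear
cat output.txt
cat slurm-*.out   # anything the job printed to stdout/stderr
sacct -j <jobid>  # accounting info, e.g. whether the job COMPLETED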

Now let's try a real command. Increase the time to 10 minutes and run bwa mem through the scheduler.

sbatch -n 1 -N 1 -A ISAAC-UTK0208 -p condo-epp622 -q condo -t 00:10:00 --wrap="bwa mem -o SRR6922311.sam /lustre/isaac/proj/UTK0032/data/solenopsis/UNIL_Sinv_3.0.Chr.fasta /lustre/isaac/proj/UTK0032/data/solenopsis/SRR6922311_1.fastq"

Simple example - single command in an sbatch script

It's more typical and more readable to create a submission script

In simple-bwa.qsh, put

#!/bin/bash
#SBATCH -J bwa
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -A ISAAC-UTK0208
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:10:00

module load bwa

bwa mem \
-o SRR6922311.sam \
/lustre/isaac/proj/UTK0032/data/solenopsis/UNIL_Sinv_3.0.Chr.fasta \
/lustre/isaac/proj/UTK0032/data/solenopsis/SRR6922311_1.fastq

The directives at the top tell the job scheduler all about your job, just like the flags did. bwa mem by default only needs one thread, so we'll keep that.
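
If you ever do want bwa to use more threads, you have to both request the CPUs from SLURM and pass them to bwa. A sketch (4 threads is an arbitrary example, not what we are doing here) would add a directive and a -t flag:

#SBATCH --cpus-per-task=4

# -t tells bwa mem how many threads to use; match it to --cpus-per-task
bwa mem \
-t 4 \
-o SRR6922311.sam \
/lustre/isaac/proj/UTK0032/data/solenopsis/UNIL_Sinv_3.0.Chr.fasta \
/lustre/isaac/proj/UTK0032/data/solenopsis/SRR6922311_1.fastq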

Submit the script

sbatch simple-bwa.qsh

You can again track progress with squeue and look at the output files.
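
For example (ls and head are just quick sanity checks; bwa mem writes @SQ and @PG header lines at the top of the SAM):

squeue -u <yourusername>
ls -lh SRR6922311.sam   # does it exist and have a reasonable size?
head SRR6922311.sam     # header lines, then alignments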

Multiple jobs - single sbatch command inside a for loop

We learned about for loops in class, and we can use them here too.

First, link to the files

ln -s /lustre/isaac/proj/UTK0032/data/solenopsis/*fastq .

Now we can build a for loop

for FILE in *.fastq
do
echo ${FILE}
sbatch -n 1 -N 1 -A ISAAC-UTK0208 -p condo-epp622 -q condo -t 00:03:00 --wrap="bwa mem -o ${FILE}.sam /lustre/isaac/proj/UTK0032/data/solenopsis/UNIL_Sinv_3.0.Chr.fasta ${FILE}"
sleep 1 # pause to be kind to the scheduler
done
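
Once the loop has submitted everything, you can watch the jobs or cancel one that went wrong (scancel is the standard SLURM cancel command; <jobid> is whatever squeue reports):

squeue -u <yourusername>   # one line per submitted job
scancel <jobid>            # cancel a single job if needed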

This gets super messy fast, so it would be better to move to a script again. But a for loop inside a single submission script won't work if you want the jobs to run in parallel: if you put all the jobs in the background, the main script will finish, the scheduler will think you are done, and your jobs will be killed before they complete. Instead, we are going to use an array.

Multiple jobs - array script

Remove the slurm and sam files.

Task arrays are great, but the scheduler assumes all data can be referred to by a convenient range of numbers, in this case 1 to 4. Let's see how this works. In array-example.qsh, put

#!/bin/bash
#SBATCH -J bwa
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -A ISAAC-UTK0208
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:10:00
#SBATCH --array=1-4

echo "${SLURM_ARRAY_TASK_ID}" > ${SLURM_ARRAY_TASK_ID}.txt

Note that --array=1-4 creates four separate array tasks; each task is scheduled as its own job with the resources requested above, so ntasks=1 is enough for each one.

Submit the script

sbatch array-example.qsh

New files are created, one per array task, along with a new set of slurm output files.
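
For example (slurm-<arrayjobid>_<taskid>.out is SLURM's default output naming for array jobs):

ls
cat 3.txt   # each file just contains its own task id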

But our fastq files aren't named 1-4. We could symlink them to those names, OR we can use some more clever sed.

Start by putting those names in a file:

ls *fastq > filenames.txt

You can check it worked with nano or cat.
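
cat -n is handy here because it shows line numbers, which are exactly what ${SLURM_ARRAY_TASK_ID} will index into:

cat -n filenames.txt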

Now let's tweak the submission script. First, we'll use sed to take the number provided in ${SLURM_ARRAY_TASK_ID} and pull out the corresponding line from filenames.txt. Next, we'll use sed again to create the output filename (using a regex on the input filename). In array-bwa.qsh, put

#!/bin/bash
#SBATCH -J bwa
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -A ISAAC-UTK0208
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:10:00
#SBATCH --array=1-4

infile=$(sed -n -e "${SLURM_ARRAY_TASK_ID} p" filenames.txt)

outfile=$(echo $infile | sed 's/_1.fastq/.sam/')

bwa mem \
-o ${outfile} \
/lustre/isaac/proj/UTK0032/data/solenopsis/UNIL_Sinv_3.0.Chr.fasta \
${infile}

Submit and see if it worked.
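
One way to check, sketched here, is to test the sed logic by hand with a fake task id and then look for the expected output (the variable assignment just mimics what SLURM sets inside each array task):

# pretend to be array task 2 and confirm the right filenames come out
SLURM_ARRAY_TASK_ID=2
sed -n -e "${SLURM_ARRAY_TASK_ID} p" filenames.txt
sed -n -e "${SLURM_ARRAY_TASK_ID} p" filenames.txt | sed 's/_1.fastq/.sam/'

# once the jobs finish, each fastq should have a matching sam file
ls *.sam
squeue -u <yourusername>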