proc_bash

rcruces edited this page Aug 31, 2023 · 1 revision

Bash Basics

An all-encompassing tutorial of Bash is far beyond the scope of this wiki. For Bash/shell basics I refer you to this excellent tutorial. I highly recommend you have at least a cursory look at that page before you continue, as I'll assume you're familiar with all concepts presented there.

MICA Bash Profile

First things first, you'll need to set up your ~/.bashrc. A minimalist example can be found in micasoft/bash/example_bashrc. You can replace your bashrc with the micasoft version. Make sure to replace every occurrence of "reinder" with the name you use on our drives. This bashrc relies on noelsoft, micasoft, and micaopen. The latter two can be cloned from GitHub; the former you'll have to copy from a labmate.

Son of Grid Engine (SGE)

The basics

Sometimes, you'll have to process a lot of data. Far more than even one of the MICA monster computers can handle. SGE is a tool for parallel processing on the MICA or BIC network so you can make full use of all our computational resources. Let's start with an overview of the commands. For a complete overview please see the SGE man pages (e.g. qsub man page).

qsub : Submits jobs to SGE.

qstat : Show the current status of SGE.

qdel : Delete submitted/running jobs.

These three commands are the basics of controlling SGE. Before we start with a tutorial on submitting jobs, let's first check whether there's space in the mica queue. Run qstat -f to see all nodes on the network. This returns a rather large list which includes other labs' nodes. We can filter just the MICA nodes with qstat -f | grep mica.q. You'll now see a list that looks something like this:

[Wed Nov 27 12:23] reinder@login2: /data/mica1/03_projects/reinder
$ qstat -f | grep mica.q
[email protected] BIP   0/0/36         10.80    lx-amd64     
[email protected] BIP   0/0/24         11.07    lx-amd64     
[email protected] BIP   0/0/30         16.59    lx-amd64     
[email protected]. BIP   0/0/24         19.91    lx-amd64     
[email protected]. BIP   0/0/30         53.54    lx-amd64      aA
[email protected] BIP   0/0/30         8.49     lx-amd64     
[email protected] BIP   0/3/36         3.43     lx-amd64     
[email protected]. BIP   0/0/30         11.68    lx-amd64     

The first column contains the name of the computer. The third column tells us how many threads are reserved by SGE, how many threads are used by SGE, and the maximum number of threads available to SGE, in that order. In this example, there are three threads being used on varro, and none on any other machine. The fourth column is the load average of that computer, which also counts processes running outside of SGE. We can see that oncilla (at 53.54) must have a lot of processing going on outside of SGE, since none of its SGE threads are in use. In any case, there is space on mica.q for our tutorial so we can continue.
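If you'd rather not count free threads by eye, the third column can be split with awk. This is only a sketch: the function name free_slots and the host names hostA and hostB are made up for illustration, and it assumes the column layout shown above. It counts total minus used and ignores reservations.

```shell
# Report free SGE slots (total minus used) per mica.q node, plus a grand total.
# Reads `qstat -f` style lines from stdin.
free_slots() {
    awk '$1 ~ /^mica\.q@/ {
        split($3, s, "/")          # s[1]=reserved, s[2]=used, s[3]=total
        free = s[3] - s[2]
        printf "%s %d\n", $1, free
        sum += free
    }
    END { printf "total %d\n", sum }'
}

# Hypothetical sample input; on the cluster you would run: qstat -f | free_slots
sample='mica.q@hostA BIP   0/0/36         10.80    lx-amd64
mica.q@hostB BIP   0/3/36         3.43     lx-amd64'
printf '%s\n' "$sample" | free_slots
```

A node with many free slots but a high load average (like oncilla above) is still busy, so treat the total as an upper bound.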

Let's make a simple script, and submit it to the MICA queue. Usage of qsub is as follows: qsub [options] script [arguments to script].

echo "echo 'Hello World'; echo 'Goodbye Cruel World' 1>&2; sleep 999999" > ~/test_sge_script.sh
qsub -q mica.q ~/test_sge_script.sh

This script will echo "Hello World" to STDOUT, "Goodbye Cruel World" to STDERR, and then go to sleep. You can observe the progress of your script with qstat. You'll see this job's status as either "qw" (waiting), "r" (running), or "E" (error; if you see this ask a lab member for help). Wait for the job to be running and then delete it by copying its job ID (the first column in qstat) and running qdel JOB_ID.
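Copying the job ID by hand gets old quickly. Below is a small sketch that pulls the ID out of the confirmation message qsub prints on submission. The helper name job_id_from_message is made up, and the message format ("Your job N ... has been submitted") is an assumption about what your qsub prints; check it on your system first.

```shell
# Extract the numeric job ID from qsub's confirmation message (read from stdin).
# Prints nothing if the message does not match the assumed format.
job_id_from_message() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Assumed message format, shown here with a made-up job ID.
# On the cluster: qsub -q mica.q ~/test_sge_script.sh | job_id_from_message
echo 'Your job 123456 ("test_sge_script.sh") has been submitted' | job_id_from_message
```

You can store the result in a variable and pass it to qdel later. If your qsub supports the -terse option, it prints only the job ID, which makes the parsing unnecessary.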

Customizing qsub

Have a look at your home directory. You should see two new files that start with the name of the job. These are the log files. One will end in "o[JOB_ID]" and the other in "e[JOB_ID]". The former should contain the STDOUT, and the latter the STDERR. You can send these log files to another path with the -o and -e arguments, e.g.

qsub -q mica.q -o ~/my_new_stdout_path.log \
     -e ~/my_new_stderr_path.log \
     ~/test_sge_script.sh

Note that -e and -o may also refer to the same file. Next, we can also give the job a custom name with the -N parameter, e.g.

qsub -q mica.q -N my_custom_name ~/test_sge_script.sh

Sometimes there are dependencies between jobs. We can tell a job to wait until another one finishes using the -hold_jid argument, e.g.

qsub -q mica.q -N my_custom_name ~/test_sge_script.sh
qsub -q mica.q -hold_jid my_custom_name \
     -N my_custom_name_2 ~/test_sge_script.sh

The second job will remain in "hqw" (hold, waiting) until the first job finishes or is deleted. Try it!
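For longer pipelines the same pattern repeats: every job holds on the one before it. Here is a dry-run sketch that only prints the qsub commands for such a chain, so you can inspect them before submitting anything. The function name print_chain, the job names step_1, step_2, ... and the script names are all made up for illustration.

```shell
# Print (not run) qsub commands for a chain of scripts in which each job
# waits for the previous one via -hold_jid.
# Usage: print_chain QUEUE script1 [script2 ...]
print_chain() {
    local queue=$1; shift
    local i=1 prev="" name script
    for script in "$@"; do
        name="step_$i"                     # hypothetical job name
        if [ -z "$prev" ]; then
            echo "qsub -q $queue -N $name $script"
        else
            echo "qsub -q $queue -hold_jid $prev -N $name $script"
        fi
        prev=$name
        i=$((i + 1))
    done
}

print_chain mica.q a.sh b.sh c.sh
```

Once the printed commands look right, you could pipe them to bash to actually submit them.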

I REQUIRE MORE POWER!

Let's take a step back from the commands and consider how SGE allocates computational resources. By default, each job is allocated one thread and total_memory/number_of_threads_on_SGE memory. If you use more than one thread, your job will continue, but you risk overloading the computer, which may slow it down or in extreme cases crash it. If you use more memory than requested, the job is automatically killed. Therefore, you always want to come up with a good estimate of your required resources before you submit jobs. Imagine our job needs 20 threads and 4 gigabytes of memory per thread. We could submit this as follows:

qsub -q mica.q -l h_vmem=4G -pe smp 20 ~/test_sge_script.sh

-l h_vmem=4G tells SGE to use 4 gigabytes per thread and -pe smp 20 tells SGE to use twenty threads. We can also allocate between five and ten threads with -pe smp 5-10. Inside your job script the environment variable NSLOTS will contain the number of threads available.
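Inside a job script you'll usually want to read NSLOTS rather than hard-code a thread count, especially when you request a range. A minimal sketch follows; the function name describe_resources is made up, and the fallback to one thread when NSLOTS is unset (e.g. when testing the script outside SGE) is my own choice.

```shell
# Report the threads and total memory a job may use.
# $1: gigabytes requested per thread (must match the -l h_vmem value at submission).
describe_resources() {
    local mem_per_thread_gb=$1
    local threads=${NSLOTS:-1}             # fall back to 1 outside SGE
    local total_mem_gb=$((threads * mem_per_thread_gb))
    echo "Using $threads thread(s), about ${total_mem_gb}G in total"
}

describe_resources 4
```

With -pe smp 20 and h_vmem=4G this reports 20 threads and about 80G. You would then pass the thread count on to whatever tool you're running, using that tool's own threads option.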

MOAR POWAH!

So the MICA cluster isn't good enough for you, huh? Well, we also have access to the BIC cluster (all.q). You can submit jobs to this cluster with -q all.q or to both clusters with -q mica.q,all.q. However, the nodes on all.q can only access our network drives (i.e. /data/) but not storage on our computers. So if you run on all.q make sure all your input and output data are on the MICA drives.
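A quick sanity check before submitting to all.q can save you a failed job. The sketch below only tests whether each path starts with /data/. The function name all_on_network_drive and the example paths are made up, and the check does not resolve symlinks or relative paths.

```shell
# Succeed only if every given path lies under /data/ (plain prefix check).
all_on_network_drive() {
    local p
    for p in "$@"; do
        case $p in
            /data/*) ;;                                     # fine, keep checking
            *) echo "not on a network drive: $p" >&2; return 1 ;;
        esac
    done
}

# Hypothetical input and output locations.
if all_on_network_drive /data/mica1/my_input /data/mica1/my_output; then
    echo "safe to submit with -q mica.q,all.q"
fi
```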
