diff --git a/doc/web/images/pycharm.png b/doc/web/images/pycharm.png new file mode 100644 index 0000000..d595105 Binary files /dev/null and b/doc/web/images/pycharm.png differ diff --git a/doc/web/images/rosetta-mapping.png b/doc/web/images/rosetta-mapping.png new file mode 100644 index 0000000..a1150b9 Binary files /dev/null and b/doc/web/images/rosetta-mapping.png differ diff --git a/doc/web/images/slurm-arch.png b/doc/web/images/slurm-arch.png new file mode 100644 index 0000000..0a1f3fc Binary files /dev/null and b/doc/web/images/slurm-arch.png differ diff --git a/doc/web/images/speed-pics.png b/doc/web/images/speed-pics.png new file mode 100644 index 0000000..080ead2 Binary files /dev/null and b/doc/web/images/speed-pics.png differ diff --git a/doc/web/index.html b/doc/web/index.html index b05bcba..ea57c35 100644 --- a/doc/web/index.html +++ b/doc/web/index.html @@ -19,98 +19,119 @@

Speed: The GCS ENCS Cluster

-
Serguei A. Mokhov
Gillian A. Roper
Network, Security and HPC Group -
Gina Cody School of Engineering and Computer Science -
Concordia University -
Montreal, Quebec, Canada -
rt-ex-hpc~AT~encs.concordia.ca

-
Version 6.6 (final GE version)
+
Serguei A. Mokhov
Gillian A. Roper
Carlos Alarcón Meza
Network, Security and HPC Group +
Gina Cody School of Engineering and Computer Science +
Concordia University +
Montreal, Quebec, Canada +
rt-ex-hpc~AT~encs.concordia.ca

+
Version 7.0

The group acknowledges the initial manual version VI produced by Dr. Scott Bunnell while with -us.
+us as well as Dr. Tariq Daradkeh for his instructional support of the users and contribution of +examples.

Abstract

-

This document primarily presents a quick start guide to the usage of the Gina Cody - School of Engineering and Computer Science compute server farm called “Speed” – the - GCS ENCS Speed cluster, managed by HPC/NAG of GCS ENCS, Concordia University, - Montreal, Canada. +

This document presents a quick start guide to the usage of the Gina Cody School of + Engineering and Computer Science compute server farm called “Speed” – the GCS Speed + cluster, managed by the HPC/NAG group of the Academic Information Technology + Services (AITS) at GCS, Concordia University, Montreal, Canada.

Contents

-  1 Introduction -
  1.1 Resources -
  1.2 Team -
  1.3 What Speed Comprises -
  1.4 What Speed Is Ideal For -
  1.5 What Speed Is Not -
  1.6 Available Software -
  1.7 Requesting Access -
 2 Job Management -
  2.1 Getting Started -
   2.1.1 SSH Connections -
   2.1.2 Environment Set Up -
  2.2 Job Submission Basics -
   2.2.1 Directives -
   2.2.2 Module Loads -
   2.2.3 User Scripting -
  2.3 Sample Job Script -
  2.4 Common Job Management Commands Summary -
  2.5 Advanced qsub Options -
  2.6 Array Jobs - - - -
  2.7 Requesting Multiple Cores (i.e., Multithreading Jobs) -
  2.8 Interactive Jobs -
  2.9 Scheduler Environment Variables -
  2.10 SSH Keys For MPI -
  2.11 Creating Virtual Environments -
   2.11.1 Anaconda -
  2.12 Example Job Script: Fluent -
  2.13 Example Job: efficientdet -
  2.14 Java Jobs -
  2.15 Scheduling On The GPU Nodes -
   2.15.1 CUDA -
   2.15.2 Special Notes for sending CUDA jobs to the GPU Queue -
   2.15.3 OpenISS Examples -
  2.16 Singularity Containers -
 3 Conclusion -
  3.1 Important Limitations -
  3.2 Tips/Tricks -
  3.3 Use Cases -
 A History -
  A.1 Acknowledgments -
  A.2 Phase 3 -
  A.3 Phase 2 -
  A.4 Phase 1 -
 B Frequently Asked Questions -
  B.1 Where do I learn about Linux? -
  B.2 How to use the “bash shell” on Speed? -
   B.2.1 How do I set bash as my login shell? -
   B.2.2 How do I move into a bash shell on Speed? -
   B.2.3 How do I run scripts written in bash on Speed? -
  B.3 How to resolve “Disk quota exceeded” errors? -
   B.3.1 Probable Cause -
   B.3.2 Possible Solutions -
   B.3.3 Example of setting working directories for COMSOL -
   B.3.4 Example of setting working directories for Python Modules -
  B.4 How do I check my job’s status? -
  B.5 Why is my job pending when nodes are empty? -
   B.5.1 Disabled nodes -
   B.5.2 Error in job submit request. -
 C Sister Facilities -
 Annotated Bibliography + 1 Introduction +
 1.1 Resources +
 1.2 Team +
 1.3 What Speed Consists of +
 1.4 What Speed Is Ideal For +
 1.5 What Speed Is Not +
 1.6 Available Software +
 1.7 Requesting Access +
2 Job Management +
 2.1 Getting Started +
  2.1.1 SSH Connections +
  2.1.2 Environment Set Up +
 2.2 Job Submission Basics +
  2.2.1 Directives +
  2.2.2 Module Loads +
  2.2.3 User Scripting +
 2.3 Sample Job Script +
 2.4 Common Job Management Commands Summary +
 2.5 Advanced sbatch Options + + + +
 2.6 Array Jobs +
 2.7 Requesting Multiple Cores (i.e., Multithreading Jobs) +
 2.8 Interactive Jobs +
  2.8.1 Command Line +
  2.8.2 Graphical Applications +
 2.9 Scheduler Environment Variables +
 2.10 SSH Keys For MPI +
 2.11 Creating Virtual Environments +
  2.11.1 Anaconda +
  2.11.2 Python +
 2.12 Example Job Script: Fluent +
 2.13 Example Job: efficientdet +
 2.14 Java Jobs +
 2.15 Scheduling On The GPU Nodes +
  2.15.1 CUDA +
  2.15.2 Special Notes for sending CUDA jobs to the GPU Queue +
  2.15.3 OpenISS Examples +
 2.16 Singularity Containers +
3 Conclusion +
 3.1 Important Limitations +
 3.2 Tips/Tricks +
 3.3 Use Cases +
A History +
 A.1 Acknowledgments +
 A.2 Migration from UGE to SLURM +
 A.3 Phases +
  A.3.1 Phase 4 +
  A.3.2 Phase 3 +
  A.3.3 Phase 2 +
  A.3.4 Phase 1 +
B Frequently Asked Questions +
 B.1 Where do I learn about Linux? +
 B.2 How to use the “bash shell” on Speed? +
  B.2.1 How do I set bash as my login shell? +
  B.2.2 How do I move into a bash shell on Speed? +
  B.2.3 How do I use the bash shell in an interactive session on Speed? +
  B.2.4 How do I run scripts written in bash on Speed? +
 B.3 How to resolve “Disk quota exceeded” errors? +
  B.3.1 Probable Cause +
  B.3.2 Possible Solutions +
  B.3.3 Example of setting working directories for COMSOL +
  B.3.4 Example of setting working directories for Python Modules +
 B.4 How do I check my job’s status? +
 B.5 Why is my job pending when nodes are empty? +
  B.5.1 Disabled nodes +
  B.5.2 Error in job submit request. +
C Sister Facilities +
Annotated Bibliography + + +

1 Introduction

-

This document contains basic information required to use “Speed” as well as tips and tricks, -examples, and references to projects and papers that have used Speed. User contributions of sample -jobs and/or references are welcome. Details are sent to the hpc-ml mailing list. -

+

This document contains basic information required to use “Speed” as well as tips and tricks, examples, and references to projects and papers that have used Speed. User contributions of sample jobs and/or references are welcome. Details are sent to the hpc-ml mailing list.

Note: On October 20, 2023, following preparatory workshops, we completed the migration from Grid Engine (UGE/AGE) to SLURM as our job scheduler (see Figure 2), so this manual has been ported to SLURM’s syntax and commands. If you are a long-time GE user, see Appendix A.2 for the key highlights needed to translate your GE jobs to SLURM, as well as the associated environment changes. These changes are also elaborated throughout this document and its examples.
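For long-time GE users, the most common day-to-day command translations look roughly as follows (a quick sketch of typical equivalents only; exact options differ, so consult Appendix A.2 and the respective man pages):

```
# GE (UGE/AGE)          SLURM (typical equivalent)
# qsub script.sh   ->   sbatch script.sh
# qstat -u $USER   ->   squeue -u $USER
# qdel <jobid>     ->   scancel <jobid>
# qlogin           ->   salloc
```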

If you wish to cite this work in your acknowledgements, you can use our general DOI found on our +GitHub page https://dx.doi.org/10.5281/zenodo.5683642 or a specific version of the manual and +scripts from that link individually. +

1.1 Resources

-

-

-

1.2 Team

+

+

+

1.2 Team

+

Speed is supported by: +

-

We receive support from the rest of AITS teams, such as NAG, SAG, FIS, and DOG. -

+

  • Carlos Alarcón Meza, Systems Administrator, HPC and Networking, AITS

    We receive support from the rest of AITS teams, such as NAG, SAG, FIS, and DOG.
    https://www.concordia.ca/ginacody/aits.html +

    -

    1.3 What Speed Comprises

    +

    1.3 What Speed Consists of

    +
  • 4 VIDPRO nodes, with 6 P6 cards, 6 V100 cards (32GB), and 256GB of RAM.
  • 7 new SPEED2 servers, each with 64 CPU cores and 4x A100 80GB GPUs, partitioned into 4x 20GB each; larger local storage for TMPDIR.
  • One AMD FirePro S7150 GPU, with 8GB of memory (compatible with the DirectX, OpenGL, OpenCL, and Vulkan APIs).
    + + + + + + + + +

    PIC

    Figure 1: Speed

    PIC

    Figure 2: Speed SLURM Architecture

    1.4 What Speed Is Ideal For

    +
  • CUDA GPU jobs (speed-01|-03|-05, speed-17, speed-37–speed-43).
  • Non-CUDA GPU jobs using OpenCL (speed-19 and -01|03|05|17|25|27|37-43).

    1.5 What Speed Is Not

    -

    +

    1.6 Available Software

    -

    We have a great number of open-source software available and installed on Speed – various Python, -CUDA versions, C++/Java compilers, OpenGL, OpenFOAM, OpenCV, TensorFlow, OpenMPI, -OpenISS, MARF [21], etc. There are also a number of commercial packages, subject to -licensing contributions, available, such as MATLAB [1020], Abaqus [1], Ansys, Fluent [2], +

A great number of open-source software packages are available and installed on “Speed” – various Python and CUDA versions, C++/Java compilers, OpenGL, OpenFOAM, OpenCV, TensorFlow, OpenMPI, OpenISS, MARF [24], etc. There are also a number of commercial packages, subject to licensing contributions, available, such as MATLAB [13, 23], Abaqus [1], Ansys, Fluent [2], etc.

    To see the packages available, run ls -al /encs/pkg/ on speed.encs. -

    In particular, there are over 2200 programs available in /encs/bin and /encs/pkg under Scientific -Linux 7 (EL7). +

    To see the packages available, run ls -al /encs/pkg/ on speed.encs. In particular, there are +over 2200 programs available in /encs/bin and /encs/pkg under Scientific Linux 7 (EL7). We are +building an equivalent array of programs for the EL9 SPEED2 nodes.

    - +

    +

    +

    2.5 Advanced sbatch Options

    +

    In addition to the basic sbatch options presented earlier, there are a few additional options that are generally useful:

    - - - -

    +

The many sbatch options available can be reviewed with man sbatch. Note that sbatch options can also be specified on the job-submission command line, where they override the corresponding script options (if present). The syntax is sbatch [options] PATHTOSCRIPT; unlike in the script, the options are specified without the leading #SBATCH (e.g., sbatch -J sub-test --chdir=./ --mem=1G ./tcsh.sh).

    2.6 Array Jobs

    -

    Array jobs are those that start a batch job or a parallel job multiple times. Each iteration of the job -array is called a task and receives a unique job ID. -

    To submit an array job, use the t option of the qsub command as follows: +

Array jobs are those that start a batch job or a parallel job multiple times. Each iteration of the job array is called a task and receives a unique job ID. Arrays are only supported for batch jobs; submitting an array takes under a second, compared to repeatedly submitting the same regular job over and over, even from a script.

    To submit an array job, use the --array option of the sbatch command as follows:

    -
    -qsub -t n[-m[:s]] <batch_script>
    +   
+sbatch --array=n[-m[:s]] <batch_script>
     
    -

    -

    -t Option Syntax:

    +

    +

--array Option Syntax:

    -

    Examples:

    +

    Examples:

    -

    Output files for Array Jobs: -

    The default and output and error-files are job_name.[o|e]job_id and
    job_name.[o|e]job_id.task_id. This means that Speed creates an output and an error-file for each -task generated by the array-job as well as one for the super-ordinate array-job. To alter this behavior -use the -o and -e option of qsub. -

    For more details about Array Job options, please review the manual pages for qsub by executing -the following at the command line on speed-submit man qsub. -

    +

  • sbatch --array=3-15:3 array.sh: submits a job with 5 tasks numbered 3 to 15 with step size 3 (task IDs 3, 6, 9, 12, 15).

    Output files for Array Jobs: +

The default output and error files are slurm-job_id_task_id.out. This means that Speed creates an output and an error file for each task generated by the array job, as well as one for the superordinate array job. To alter this behavior, use the -o and -e options of sbatch.

For more details about Array Job options, please review the manual pages for sbatch by executing man sbatch at the command line on speed-submit.
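As an illustrative sketch (the job name and the input/part-*.dat file layout are hypothetical), an array-job script can use $SLURM_ARRAY_TASK_ID to select a per-task input; bash is used here for brevity:

```shell
#!/encs/bin/bash

#SBATCH --job-name=array-demo  ## hypothetical job name
#SBATCH --array=1-4            ## spawns tasks with IDs 1,2,3,4
#SBATCH --mem=1G

# Each task selects its own input file based on its task index.
# (Falls back to 1 so the logic can be exercised outside the scheduler.)
TASK="${SLURM_ARRAY_TASK_ID:-1}"
INPUT="input/part-${TASK}.dat"
echo "task ${TASK} processing ${INPUT}"
```

Each of the four tasks runs this same script; only the task ID (and hence the chosen input file) differs.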

    2.7 Requesting Multiple Cores (i.e., Multithreading Jobs)

    -

    For jobs that can take advantage of multiple machine cores, up to 32 cores (per job) can be requested +

    For jobs that can take advantage of multiple machine cores, up to 32 cores (per job) can be requested in your script with:

    -
    -#$ -pe smp [#cores]
    +   
    +#SBATCH -n [#cores]
     
    -

    -

    Do not request more cores than you think will be useful, as larger-core jobs -are more difficult to schedule. On the flip side, though, if you are going to be running -a program that scales out to the maximum single-machine core count available, please -(please) request 32 cores, to avoid node oversubscription (i.e., to avoid overloading the +

    +

Both sbatch and salloc support -n on the command line, and it should always be used either in the script or on the command line, since the default is \(n=1\). Do not request more cores than you think will be useful, as larger-core jobs are more difficult to schedule. On the flip side, though, if you are going to be running a program that scales out to the maximum single-machine core count available, please (please) request 32 cores, to avoid node oversubscription (i.e., to avoid overloading the CPUs).

    Core count associated with a job appears under, “states”, in the, qstat -f -u "*", -output. -

    +

Important note: --ntasks or --ntasks-per-node (-n) refers to processes (usually the ones run with srun), while --cpus-per-task (-c) corresponds to threads per process. Some programs consider them equivalent, some do not. Fluent, for example, uses --ntasks-per-node=8 and --cpus-per-task=1, while others just set --cpus-per-task=8 and --ntasks-per-node=1. If either of them is not \(1\), some applications need to be told to use \(n*c\) total cores.
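For example, a single multithreaded (OpenMP-style) process would typically be requested as one task with several CPUs per task, deriving the thread count from the allocation. This fragment is a sketch, not tied to any particular application:

```shell
#SBATCH --ntasks=1           ## one process ...
#SBATCH --cpus-per-task=8    ## ... with 8 threads (n*c = 8 cores total)

# Derive the thread count from the allocation; falls back to 8 outside SLURM.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-8}"
echo "running with ${OMP_NUM_THREADS} threads"
```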

The core count associated with a job appears under “AllocCPUS” in the sacct -j output.

    +
    +[serguei@speed-submit src] % squeue -l
    +Thu Oct 19 20:32:32 2023
    +JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
    + 2652        ps interact   a_user  RUNNING   9:35:18 1-00:00:00      1 speed-07
    +[serguei@speed-submit src] % sacct -j 2652
    +JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
    +------------ ---------- ---------- ---------- ---------- ---------- --------
    +2652         interacti+         ps     speed1         20    RUNNING      0:0
    +2652.intera+ interacti+                speed1         20    RUNNING      0:0
    +2652.extern      extern                speed1         20    RUNNING      0:0
    +2652.0       gydra_pmi+                speed1         20  COMPLETED      0:0
    +2652.1       gydra_pmi+                speed1         20  COMPLETED      0:0
    +2652.2       gydra_pmi+                speed1         20     FAILED      7:0
    +2652.3       gydra_pmi+                speed1         20     FAILED      7:0
    +2652.4       gydra_pmi+                speed1         20  COMPLETED      0:0
    +2652.5       gydra_pmi+                speed1         20  COMPLETED      0:0
    +2652.6       gydra_pmi+                speed1         20  COMPLETED      0:0
    +2652.7       gydra_pmi+                speed1         20  COMPLETED      0:0
    +
    +

    +

    2.8 Interactive Jobs

    -

    Job sessions can be interactive, instead of batch (script) based. Such sessions can be useful for testing -and optimising code and resource requirements prior to batch submission. To request an interactive -job session, use, qlogin [options], similarly to a qsub command-line job (e.g., qlogin -N -qlogin-test -l h_vmem=1G). Note that the options that are available for qsub are not necessarily -available for qlogin, notably, -cwd, and, -v. -

    +

Job sessions can be interactive, instead of batch (script) based. Such sessions can be useful for testing, debugging, and optimising code and resource requirements, for setting up conda or Python virtual environments, or for any similar preparatory work prior to batch submission.

    +

    +
    2.8.1 Command Line
    +

    To request an interactive job session, use, salloc [options], similarly to a sbatch command-line +job, e.g., + + + +

    +
    +salloc -J interactive-test --mem=1G -p ps -n 8
    +
    +

Inside the allocated salloc session you can run shell commands as usual; it is recommended to use srun for the heavy compute steps inside salloc. If it is a quick, short job, just to compile something on, e.g., a GPU node, you can use an interactive srun directly (note that srun cannot run within srun), e.g., with a 1-hour allocation:

    For tcsh: + + + +

    +
    +srun --pty -n 8 -p pg --gpus=1 -t 60 /encs/bin/tcsh
    +
    +

    +

    For bash: + + + +

    +
    +srun --pty -n 8 -p pg --gpus=1 -t 60 /encs/bin/bash
    +
    +

    +

    +

    +
    2.8.2 Graphical Applications
    +

If you need to run an on-Speed graphical UI application (e.g., MATLAB, Abaqus CME, etc.), or an IDE (PyCharm, VSCode, Eclipse) to develop and test your job’s code interactively, you need to enable X11 forwarding from your client machine to Speed and then to the compute node. To do so:

    +

      +
    1. You need to run an X server on your client machine, such as:

       • on Windows: MobaXterm with X turned on, or Xming + PuTTY with X11
         forwarding, or XOrg under Cygwin
       • on macOS: XQuartz – use its xterm and ssh -X
       • on Linux: just use ssh -X speed.encs.concordia.ca

       See https://www.concordia.ca/ginacody/aits/support/faq/xserver.html for
       details.

    2. Verify your X connection was properly forwarded by printing the DISPLAY
       variable: echo $DISPLAY. If it has no output, then your X forwarding is
       not on and you may need to re-login to Speed.

    3. Use the --x11 option with salloc or srun: salloc ... --x11=first ...

    4. Once landed on a compute node, verify DISPLAY again.

    5. While running under the scheduler, unset XDG_RUNTIME_DIR.

    6. Launch your graphical application: module load the required version,
       then matlab, or abaqus cme, etc.

    +

Here is an example of starting PyCharm, of which we made a sample local installation; you can make a similar install under your own directory. Note that VSCode is currently only supported with the --no-sandbox option.

    -

    2.9 Scheduler Environment Variables

    -

    The scheduler presents a number of environment variables that can be used in your jobs. Three of the -more useful are TMPDIR, SGE_O_WORKDIR, and NSLOTS: +

    +bash-3.2$ ssh -X speed (XQuartz xterm, PuTTY or MobaXterm have X11 forwarding too)
    +serguei@speed’s password:
    +[serguei@speed-submit ~] % echo $DISPLAY
    +localhost:14.0
    +[serguei@speed-submit ~] % srun -p ps --pty --x11=first --mem 4000 -t 0-06:00 /encs/bin/bash
    +bash-4.4$ echo $DISPLAY
    +localhost:77.0
    +bash-4.4$ hostname
    +speed-01.encs.concordia.ca
    +bash-4.4$ unset XDG_RUNTIME_DIR
    +bash-4.4$ /speed-scratch/nag-public/bin/pycharm.sh
    +
    +

    +

    +
    + + + + + + + + +

    PIC +

    +
    Figure 4: PyCharm Starting up on a Speed Node
    + + + +
    +

    2.9 Scheduler Environment Variables

    +

The scheduler presents a number of environment variables that can be used in your jobs. You can invoke env or printenv in your job to see what those are (most begin with the prefix SLURM). Some of the more useful ones are:

    -

    In Figure 2 is a sample script, using all three. +

  • $SLURM_JOBID – your current job’s ID, useful for some manipulation and reporting.
  • $SLURM_JOB_NODELIST – the nodes participating in your job.
  • $SLURM_ARRAY_TASK_ID – the task ID, for array jobs (see Section 2.6).

See a more complete list here:

Figure 5 shows a sample script using some of these.

    - + -
    #!/encs/bin/tcsh 
    - 
    -#$ -N envs 
    -#$ -cwd 
    -#$ -pe smp 8 
    -#$ -l h_vmem=32G 
    - 
    -cd $TMPDIR 
    -mkdir input 
    -rsync -av $SGE_O_WORKDIR/references/ input/ 
    -mkdir results 
    -STAR --inFiles $TMPDIR/input --parallel $NSLOTS --outFiles $TMPDIR/results 
    -rsync -av $TMPDIR/results/ $SGE_O_WORKDIR/processed/
    +
    #!/encs/bin/tcsh 
    + 
    +#SBATCH --job-name=tmpdir      ## Give the job a name 
    +#SBATCH --mail-type=ALL        ## Receive all email type notifications 
    +#SBATCH --mail-user=$USER 
+#SBATCH --chdir=./             ## Use current directory as working directory 
    +#SBATCH --nodes=1 
    +#SBATCH --ntasks=1 
    +#SBATCH --cpus-per-task=8      ## Request 8 cores 
    +#SBATCH --mem=32G              ## Assign 32G memory per node 
    + 
    +cd $TMPDIR 
    +mkdir input 
    +rsync -av $SLURM_SUBMIT_DIR/references/ input/ 
    +mkdir results 
    +srun STAR --inFiles $TMPDIR/input --parallel $SRUN_CPUS_PER_TASK --outFiles $TMPDIR/results 
    +rsync -av $TMPDIR/results/ $SLURM_SUBMIT_DIR/processed/
     
    -
    Figure 2: Source code for tmpdir.sh
    +
    Figure 5: Source code for tmpdir.sh
    -

    2.10 SSH Keys For MPI

    -

    Some programs effect their parallel processing via MPI (which is a communication protocol). An +

    2.10 SSH Keys For MPI

    +

    Some programs effect their parallel processing via MPI (which is a communication protocol). An example of such software is Fluent. MPI needs to have ‘passwordless login’ set up, which means SSH keys. In your NFS-mounted home directory:

    @@ -860,118 +1087,154 @@

    2.10
  • Set file permissions of authorized_keys to 600; of your NFS-mounted home to 700 (note that you likely will not have to do anything here, as most people will have those permissions by default).
  • -
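The key-setup steps above can be sketched as follows; for safety the sketch works against a throwaway directory, whereas on Speed the target would be ~/.ssh in your NFS-mounted home:

```shell
# Generate a passphrase-less key pair and authorize it for passwordless login.
SSHDIR="$(mktemp -d)/.ssh"                         # stand-in for ~/.ssh
mkdir -p "$SSHDIR" && chmod 700 "$SSHDIR"
ssh-keygen -q -t ed25519 -N "" -f "$SSHDIR/id_ed25519"
cat "$SSHDIR/id_ed25519.pub" >> "$SSHDIR/authorized_keys"
chmod 600 "$SSHDIR/authorized_keys"                # required permissions
```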

    +

    -

    2.11 Creating Virtual Environments

    -

    The following documentation is specific to the Speed HPC Facility at the Gina Cody School of -Engineering and Computer Science. -

    +

    2.11 Creating Virtual Environments

    +

The following documentation is specific to the Speed HPC Facility at the Gina Cody School of Engineering and Computer Science. Virtual environments are typically instantiated via Conda or Python. Another option is Singularity, detailed in Section 2.16.

    -
    2.11.1 Anaconda
    -

    To create an anaconda environment in your speed-scratch directory, use the prefix option when -executing conda create. For example, to create an anaconda environment for ai_user, execute the +

    2.11.1 Anaconda
    +

    To create an anaconda environment in your speed-scratch directory, use the prefix option when +executing conda create. For example, to create an anaconda environment for a_user, execute the following at the command line:

    -
    +   
     conda create --prefix /speed-scratch/a_user/myconda
     
    -

    -

    Note: Without the prefix option, the conda create command creates the environment in -texttta_user’s home directory by default. +

    +

    Note: Without the prefix option, the conda create command creates the environment in a_user’s +home directory by default.

    -

    List Environments. +

    List Environments. To view your conda environments, type: conda info --envs

    -
    +   
     # conda environments:
     #
     base                  *  /encs/pkg/anaconda3-2019.07/root
                              /speed-scratch/a_user/myconda
     
    -

    +

    -

    Activate an Environment. +

Activate an Environment. Activate the environment /speed-scratch/a_user/myconda as follows:

    -
    +   
     conda activate /speed-scratch/a_user/myconda
     
    -

    After activating your environment, add pip to your environment by using +

    After activating your environment, add pip to your environment by using

    -
    +   
     conda install pip
     
    -

    This will install pip and pip’s dependencies, including python, into the environment. -

    Important Note: pip (and pip3) are used to install modules from the python distribution while +

    This will install pip and pip’s dependencies, including python, into the environment. +

    Important Note: pip (and pip3) are used to install modules from the python distribution while conda install installs modules from anaconda’s repository. -

    +

    +

    +
    2.11.2 Python
    +

Setting up a Python virtual environment is fairly straightforward. We have a simple example that uses a Python virtual environment:

    + +
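A minimal sketch of the usual steps (on Speed you would first module load python/3.8.3 and create the environment under /speed-scratch/<encs_username>; a temporary directory is used here so the commands can be replayed anywhere):

```shell
# On Speed: module load python/3.8.3, then create the venv under
# /speed-scratch/<encs_username>/. Demonstrated in a temporary directory.
ENVDIR="$(mktemp -d)/venv-demo"
python3 -m venv --without-pip "$ENVDIR"   # --without-pip keeps the sketch offline
. "$ENVDIR/bin/activate"                  # tcsh users: source bin/activate.csh
echo "venv python: $(command -v python)"
deactivate
```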

    -

    2.12 Example Job Script: Fluent

    +

    2.12 Example Job Script: Fluent

    - - - - -
    #!/encs/bin/tcsh 
    - 
    -#$ -N flu10000 
    -#$ -cwd 
    -#$ -m bea 
    -#$ -pe smp 8 
    -#$ -l h_vmem=160G 
    - 
    -module load ansys/19.0/default 
    -cd $TMPDIR 
    - 
    -fluent 3ddp -g -i $SGE_O_WORKDIR/fluentdata/info.jou -sgepe smp > call.txt 
    - 
    -rsync -av $TMPDIR/ $SGE_O_WORKDIR/fluentparallel/
    +
    +                                                                               
    +
    +                                                                               
    +
    #!/encs/bin/tcsh 
    + 
    +#SBATCH --job-name=flu10000    ## Give the job a name 
    +#SBATCH --mail-type=ALL        ## Receive all email type notifications 
    +#SBATCH --mail-user=$USER@encs.concordia.ca 
+#SBATCH --chdir=./             ## Use current directory as working directory 
    +#SBATCH --nodes=1              ## Number of nodes to run on 
    +#SBATCH --ntasks-per-node=32   ## Number of cores 
    +#SBATCH --cpus-per-task=1      ## Number of MPI threads 
    +#SBATCH --mem=160G             ## Assign 160G memory per node 
    + 
    +date 
    + 
    +module avail ansys 
    + 
    +module load ansys/19.2/default 
    +cd $TMPDIR 
    + 
+set FLUENTNODES = "`scontrol show hostnames`" 
+set FLUENTNODES = `echo $FLUENTNODES | tr ' ' ','` 
    + 
    +date 
    + 
    +srun fluent 3ddp \ 
    +        -g -t$SLURM_NTASKS \ 
+        -cnf=$FLUENTNODES \ 
    +        -i $SLURM_SUBMIT_DIR/fluentdata/info.jou > call.txt 
    + 
    +date 
    + 
    +srun rsync -av $TMPDIR/ $SLURM_SUBMIT_DIR/fluentparallel/ 
    + 
    +date
     
    -
    Figure 3: Source code for fluent.sh
    +
    Figure 6: Source code for fluent.sh
    -

    The job script in Figure 3 runs Fluent in parallel over 32 cores. Of note, we have requested e-mail -notifications (-m), are defining the parallel environment for, fluent, with, -sgepe smp (very -important), and are setting $TMPDIR as the in-job location for the “moment” rfile.out file (in-job, -because the last line of the script copies everything from $TMPDIR to a directory in the user’s -NFS-mounted home). Job progress can be monitored by examining the standard-out file (e.g., -flu10000.o249), and/or by examining the “moment” file in /disk/nobackup/<yourjob> (hint: it -starts with your job-ID) on the node running the job. Caveat: take care with journal-file file +

The job script in Figure 6 runs Fluent in parallel over 32 cores. Of note, we have requested e-mail notifications (--mail-type), are defining the parallel environment for fluent with -t$SLURM_NTASKS and -cnf=$FLUENTNODES (very important), and are setting $TMPDIR as the in-job location for the “moment” rfile.out file (in-job, because the last line of the script copies everything from $TMPDIR to a directory in the user’s NFS-mounted home). Job progress can be monitored by examining the standard-out file (e.g., slurm-249.out), and/or by examining the “moment” file in /disk/nobackup/<yourjob> (hint: it starts with your job-ID) on the node running the job. Caveat: take care with journal-file paths.

    -

    2.13 Example Job: efficientdet

    -

    The following steps describing how to create an efficientdet environment on Speed, were submitted by +

    2.13 Example Job: efficientdet

    +

The following steps, describing how to create an efficientdet environment on Speed, were submitted by a member of Dr. Amer’s research group.

      -
  • Enter your ENCS user account’s speed-scratch directory:
    cd /speed-scratch/<encs_username>

Next:

  • load python: module load python/3.8.3
  • create virtual environment: python3 -m venv <env_name>
  • activate virtual environment: source <env_name>/bin/activate.csh
  • install DL packages for Efficientdet:
    +
     pip install tensorflow==2.7.0
     pip install lxml>=4.6.1
     pip install absl-py>=0.10.0
    @@ -987,197 +1250,178 @@ 

    -

    -

    +

    +

    -

    2.14 Java Jobs

    -

    Jobs that call java have a memory overhead, which needs to be taken into account when assigning a -value to h_vmem. Even the most basic java call, java -Xmx1G -version, will need to have, -l -h_vmem=5G, with the 4-GB difference representing the memory overhead. Note that this memory +

    2.14 Java Jobs

    +

    Jobs that call java have a memory overhead, which needs to be taken into account when assigning a +value to --mem. Even the most basic java call, java -Xmx1G -version, will need to have, +--mem=5G, with the 4-GB difference representing the memory overhead. Note that this memory overhead grows proportionally with the value of -Xmx. To give you an idea, when -Xmx has a -value of 100G, h_vmem has to be at least 106G; for 200G, at least 211G; for 300G, at least +value of 100G, --mem has to be at least 106G; for 200G, at least 211G; for 300G, at least 314G. -
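The arithmetic for the basic case can be sketched as follows (the +4G figure applies to the -Xmx1G example above; for larger heaps the overhead grows, as the 100G/200G/300G figures show):

```shell
# Rule of thumb for the basic case: --mem = -Xmx + ~4G JVM overhead.
XMX_G=1
MEM_G=$((XMX_G + 4))
echo "java -Xmx${XMX_G}G needs roughly: sbatch --mem=${MEM_G}G"
```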

    +

    -

    2.15 Scheduling On The GPU Nodes

    -

    The primary cluster has two GPU nodes, each with six Tesla (CUDA-compatible) P6 cards: each card +

    2.15 Scheduling On The GPU Nodes

    +

    The primary cluster has two GPU nodes, each with six Tesla (CUDA-compatible) P6 cards: each card has 2048 cores and 16GB of RAM. Though note that the P6 is mainly a single-precision card, so unless you need the GPU double precision, double-precision calculations will be faster on a CPU node. -

    Job scripts for the GPU queue differ in that they do not need these statements: - - - -

    -
    -#$ -pe smp <threadcount>
    -#$ -l h_vmem=<memory>G
    -
    -

    -

    But do need this statement, which attaches either a single GPU, or, two GPUs, to the -job: +

Job scripts for the GPU queue differ in that they need this statement, which attaches either a single GPU or two GPUs to the job:

    -
    -#$ -l gpu=[1|2]
    +   
    +#SBATCH --gpus=[1|2]
     
    -

    -

    Single-GPU jobs are granted 5 CPU cores and 80GB of system memory, and dual-GPU jobs are -granted 10 CPU cores and 160GB of system memory. A total of four GPUs can be actively attached -to any one user at any given time. -

    Once that your job script is ready, you can submit it to the GPU queue with: +

    +

Once your job script is ready, you can submit it to the GPU partition (queue) with:

    -
    -qsub -q g.q ./<myscript>.sh
    +   
    +sbatch -p pg ./<myscript>.sh
     
    -

    -

    And you can query nvidia-smi on the node that is running your job with: +

    +

    And you can query nvidia-smi on the node that is running your job with:

    -
    -ssh <username>@speed[-05|-17] nvidia-smi
    +   
+ssh <username>@speed-[05|17|37-43] nvidia-smi
     
    -

    -

    Status of the GPU queue can be queried with: +

    +

    Status of the GPU queue can be queried with:

    -
    -qstat -f -u "*" -q g.q
    +   
    +sinfo -p pg --long --Node
     
    -

    -

    Very important note regarding TensorFlow and PyTorch: if you are planning to run TensorFlow -and/or PyTorch multi-GPU jobs, do not use the tf.distribute and/or
    torch.nn.DataParallel functions, as they will crash the compute node (100% certainty). This -appears to be the current hardware’s architecture’s defect. The workaround is to either manually -effect GPU parallelisation (TensorFlow has an example on how to do this), or to run on a single -GPU. -

    Important -

    Users without permission to use the GPU nodes can submit jobs to the g.q queue but those jobs -will hang and never run. -

    There are two GPUs in both speed-05 and speed-17, and one in speed-19. Their availability is -seen with, qstat -F g (note the capital): +

    +

    Very important note regarding TensorFlow and PyTorch: if you are planning to run TensorFlow +and/or PyTorch multi-GPU jobs, do not use the tf.distribute and/or
torch.nn.DataParallel functions on speed-01, speed-05, or speed-17, as they will crash the compute
+node (100% certainty). This appears to be a defect of the current hardware architecture. The
+workaround is either to effect GPU parallelisation manually (TensorFlow has an example of how to
+do this) or to run on a single GPU.

    Important +

Users without permission to use the GPU nodes can submit jobs to the pg partition, but those
+jobs will hang and never run. The availability of the GPU nodes can be seen with:

    -
    -queuename                      qtype resv/used/tot. load_avg arch          states
    ----------------------------------------------------------------------------------
    -...
    ----------------------------------------------------------------------------------
    -g.q@speed-05.encs.concordia.ca BIP   0/0/32         0.04     lx-amd64
    -        hc:gpu=6
    ----------------------------------------------------------------------------------
    -g.q@speed-17.encs.concordia.ca BIP   0/0/32         0.01     lx-amd64
    -        hc:gpu=6
    ----------------------------------------------------------------------------------
    -...
    ----------------------------------------------------------------------------------
    -s.q@speed-19.encs.concordia.ca BIP   0/32/32        32.37    lx-amd64
    -        hc:gpu=1
    ----------------------------------------------------------------------------------
    -etc.
    +   
    +[serguei@speed-submit src] % sinfo -p pg --long --Node
    +Thu Oct 19 22:31:04 2023
    +NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
    +speed-05       1        pg        idle 32     2:16:1 515490        0      1    gpu16 none
    +speed-17       1        pg     drained 32     2:16:1 515490        0      1    gpu16 UGE
    +speed-25       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
    +speed-27       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
    +[serguei@speed-submit src] % sinfo -p pt --long --Node
    +Thu Oct 19 22:32:39 2023
    +NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
    +speed-37       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
    +speed-38       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
    +speed-39       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
    +speed-40       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
    +speed-41       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
    +speed-42       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
    +speed-43       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
     
    -

    -

    This status demonstrates that all five are available (i.e., have not been requested as resources). To -specifically request a GPU node, add, -l g=[#GPUs], to your qsub (statement/script) or qlogin -(statement) request. For example, qsub -l h_vmem=1G -l g=1 ./count.sh. You will see that this -job has been assigned to one of the GPU nodes: +

    +

This status demonstrates that most nodes are available (i.e., have not been requested as resources). To specifically request a
+GPU node, add --gpus=[#GPUs] to your sbatch (statement/script) or salloc (statement) request. For example,
+sbatch -t 10 --mem=1G --gpus=1 -p pg ./tcsh.sh. You will see that this job has been assigned to one of the GPU
+nodes.

    -
    -queuename                      qtype resv/used/tot. load_avg arch          states
    ----------------------------------------------------------------------------------
    -g.q@speed-05.encs.concordia.ca BIP 0/0/32 0.01 lx-amd64  hc:gpu=6
    ----------------------------------------------------------------------------------
    -g.q@speed-17.encs.concordia.ca BIP 0/0/32 0.01 lx-amd64  hc:gpu=6
    ----------------------------------------------------------------------------------
    -s.q@speed-19.encs.concordia.ca BIP 0/1/32 0.04 lx-amd64  hc:gpu=0 (haff=1.000000)
    -       538 100.00000 count.sh   sbunnell     r     03/07/2019 02:39:39     1
    ----------------------------------------------------------------------------------
    -etc.
    +   
    +[serguei@speed-submit src] % squeue -p pg -o "%15N %.6D %7P %.11T %.4c %.8z %.6m %.8d %.6w %.8f %20G %20E"
    +NODELIST         NODES PARTITI       STATE MIN_    S:C:T MIN_ME MIN_TMP_  WCKEY FEATURES GROUP DEPENDENCY
    +speed-05             1 pg          RUNNING    1    *:*:*     1G        0 (null)   (null) 11929     (null)
    +[serguei@speed-submit src] % sinfo -p pg -o "%15N %.6D %7P %.11T %.4c %.8z %.6m %.8d %.6w %.8f %20G %20E"
    +NODELIST         NODES PARTITI       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE GRES      REASON
    +speed-17             1 pg          drained   32   2:16:1 515490        0      1    gpu16 gpu:6        UGE
    +speed-05             1 pg            mixed   32   2:16:1 515490        0      1    gpu16 gpu:6       none
    +speed-[25,27]        2 pg             idle   32   2:16:1 257458        0      1    gpu32 gpu:2       none
     
    -

    -

    And that there are no more GPUs available on that node (hc:gpu=0). Note that no more than two -GPUs can be requested for any one job. -

    +

    +

    -
    2.15.1 CUDA
    -

    When calling CUDA within job scripts, it is important to create a link to the desired CUDA libraries and +

    2.15.1 CUDA
    +

    When calling CUDA within job scripts, it is important to create a link to the desired CUDA libraries and set the runtime link path to the same libraries. For example, to use the cuda-11.5 libraries, specify -the following in your Makefile. +the following in your Makefile.

    -
    +   
     -L/encs/pkg/cuda-11.5/root/lib64 -Wl,-rpath,/encs/pkg/cuda-11.5/root/lib64
     
    -

    -

    In your job script, specify the version of gcc to use prior to calling cuda. For example: module +

    +

In your job script, specify the version of gcc to use prior to calling CUDA. For example: module
load gcc/8.4 or module load gcc/9.3
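For instance, compiling a CUDA source file against the cuda-11.5 tree might look like this inside a job script (hello.cu is a placeholder, and the nvcc path under /encs/pkg/cuda-11.5/root is an assumption based on the library paths above):

```shell
#!/encs/bin/bash
# Select a compatible gcc before invoking CUDA
module load gcc/8.4

# Link against the cuda-11.5 libraries and set the matching runtime link path
/encs/pkg/cuda-11.5/root/bin/nvcc -o hello hello.cu \
    -L/encs/pkg/cuda-11.5/root/lib64 -Wl,-rpath,/encs/pkg/cuda-11.5/root/lib64
```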

    +

    -
    2.15.2 Special Notes for sending CUDA jobs to the GPU Queue
    -

    It is not possible to create a qlogin session on to a node in the GPU Queue (g.q). As direct logins -to these nodes is not available, jobs must be submitted to the GPU Queue in order to compile and -link. -

    We have several versions of CUDA installed in: +

    2.15.2 Special Notes for sending CUDA jobs to the GPU Queue
    +

Interactive jobs (Section 2.8) must be submitted to the GPU partition in order to compile and link
+CUDA programs. We have several versions of CUDA installed in:

    -
    +   
     /encs/pkg/cuda-11.5/root/
     /encs/pkg/cuda-10.2/root/
     /encs/pkg/cuda-9.2/root
     
    -

    -

    For CUDA to compile properly for the GPU queue, edit your Makefile replacing usrlocalcuda -with one of the above. -

    +

    +

For CUDA to compile properly for the GPU partition, edit your Makefile, replacing
+/usr/local/cuda with one of the above.

    -
    2.15.3 OpenISS Examples
    -

    These represent more comprehensive research-like examples of jobs for computer vision and other +

    2.15.3 OpenISS Examples
    +

These are more comprehensive, research-like examples of jobs for computer vision and other
tasks with much longer runtimes (subject to the number of epochs and other parameters), derived
from the actual research work of students and their theses. These jobs require the use of CUDA and
GPUs. These examples are available as “native” jobs on Speed and as Singularity containers.

    -

    OpenISS and REID - +

OpenISS and REID
+
The example openiss-reid-speed.sh illustrates a job for computer-vision based person
re-identification (e.g., motion capture-based tracking for stage performance), part of the OpenISS
-project by Haotao Lai [7] using TensorFlow and Keras. The fork of the original repo [9] adjusted to
+project by Haotao Lai [10] using TensorFlow and Keras. The fork of the original repo [12], adjusted
to run on Speed, is here:

    -

    and its detailed description on how to run it on Speed is in the README: +

    and its detailed description on how to run it on Speed is in the README:

    -

    OpenISS and YOLOv3 - - The related code using YOLOv3 framework is in the the fork of the original repo [8] adjusted to +

OpenISS and YOLOv3
+
+ The related code using the YOLOv3 framework is in the fork of the original repo [11], adjusted
+to run on Speed, here:

    -

    Its example job scripts can run on both CPUs and GPUs, as well as interactively using +

    Its example job scripts can run on both CPUs and GPUs, as well as interactively using TensorFlow:

    -

    The detailed description on how to run these on Speed is in the README at: +

    The detailed description on how to run these on Speed is in the README at:

    -

    +

    -

    2.16 Singularity Containers

    -

    If the /encs software tree does not have a required software instantaneously available, another option +

    2.16 Singularity Containers

    +

If the /encs software tree does not have the software you require readily available, another option
is to run Singularity containers. We run the EL7 flavor of Linux, and if some projects require
Ubuntu or other distributions, it is possible to run that software as a container, including
containers translated from Docker.

    The example lambdal-singularity.sh showcases an immediate use of a container built for the +

The example lambdal-singularity.sh showcases the immediate use of a container built for the
Ubuntu-based LambdaLabs software stack, originally built as a Docker image and then pulled in as a
Singularity container, which is immediately available for use, as that job example illustrates. The
source material used for the Docker image was our fork of their official repo:
@@ -1205,26 +1449,26 @@

    2 -

    NOTE: It is important if you make your own containers or pull from DockerHub, use your +

NOTE: If you make your own containers or pull from DockerHub, it is important to use your
/speed-scratch/$USER directory, as these images can easily consume gigabytes of space in your home
directory and you would run out of quota there very fast.

    TIP: To check for your quota, and the corresponding commands to find big files, see: +

    TIP: To check for your quota, and the corresponding commands to find big files, see: https://www.concordia.ca/ginacody/aits/encs-data-storage.html -

    We likewise built equivalent OpenISS (Section 2.15.3) containers from their Docker -counter parts as they were used for teaching and research [11]. The images from +

We likewise built equivalent OpenISS (Section 2.15.3) containers from their Docker
+counterparts as they were used for teaching and research [14]. The images from
https://github.com/NAG-DevOps/openiss-dockerfiles and their DockerHub equivalents
https://hub.docker.com/u/openiss are found in the same public directory on
/speed-scratch/nag-public as the LambdaLabs Singularity image. They all have the .sif
extension. Some of them can be run in both batch and interactive mode; some make more
sense to run interactively. They cover some basics with CUDA, OpenGL rendering, and
computer vision tasks, with examples from the OpenISS library and other libraries, including the base
-images that use diffrent distros. We also include Jupyter notebook example with Conda
+images that use different distros. We also include a Jupyter notebook example with Conda
support.

    -
    +   
     /speed-scratch/nag-public:
     
     openiss-cuda-conda-jupyter.sif
    @@ -1235,14 +1479,14 @@ 

    2 openiss-reid.sif openiss-xeyes.sif

    -

    -

    The currently recommended version of Singularity is singularity/3.10.4/default. -

    This section comprises an introduction to working with Singularity, its containers, and what can +

    +

    The currently recommended version of Singularity is singularity/3.10.4/default. +

    This section comprises an introduction to working with Singularity, its containers, and what can and cannot be done with Singularity on the ENCS infrastructure. It is not intended to be an exhaustive presentation of Singularity: the program’s authors do a good job of that here: https://www.sylabs.io/docs/. It also assumes that you have successfully installed Singularity on a user-managed/personal system (see next paragraph as to why). -

    Singularity containers are essentially either built from an existing container, or are built from +

Singularity containers are essentially either built from an existing container, or are built from
scratch. Building from scratch requires a recipe file (think of it as a Dockerfile), and the operation
must be effected as root. You will not have root on the ENCS infrastructure, so any built-from-scratch
containers must be created on a user-managed/personal system. Root-level permissions are also
@@ -1253,127 +1497,122 @@

    2 containers are essentially a directory in an existing read-write space, and squashfs containers are a read-only compressed “file”. Note that file-system containers cannot be resized once built. -

    Note that the default build is a squashfs one. Also note what Singularity’s authors have to say +

    Note that the default build is a squashfs one. Also note what Singularity’s authors have to say about the builds, “A common workflow is to use the “sandbox” mode for development of the container, and then build it as a default (squashfs) Singularity image when done.” File-system containers are considered to be, “legacy”, at this point in time. When built, a very small overhead is allotted to a file-system container (think, MB), and that cannot be changed. -

    Probably for the most of your workflows you might find there is a Docker container exists for your +

For most of your workflows you will probably find that a Docker container already exists for your
task; in this case, you can use the docker pull function of Singularity as part of your virtual
environment setup, within an interactive job allocation:

    -
    -qlogin
    +   
    +salloc --gpus=1 -n8 -t60
     cd /speed-scratch/$USER/
     singularity pull openiss-cuda-devicequery.sif docker://openiss/openiss-cuda-devicequery
     INFO:    Converting OCI blobs to SIF format
     INFO:    Starting build...
     
    -

    -

    This method can be used for converting Docker containers directly on Speed. On GPU nodes make +

    +

This method can be used for converting Docker containers directly on Speed. On GPU nodes, make
sure to pass the --nv flag to Singularity so that its containers can access the GPUs. See the linked
example.
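For instance, running the container pulled above on a GPU node might look like this (the image path assumes the singularity pull example earlier in this section):

```shell
# --nv exposes the host's NVIDIA GPUs inside the container
singularity run --nv /speed-scratch/$USER/openiss-cuda-devicequery.sif
```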

    +

    -

    3 Conclusion

    -

    The cluster is, “first come, first served”, until it fills, and then job position in the queue is +

    3 Conclusion

    +

The cluster is “first come, first served” until it fills, and then job position in the queue is
based upon past usage. The scheduler does attempt to fill gaps, though, so sometimes a
single-core job of lower priority will schedule before a multi-core job of higher priority, for
example.

    +

    -

    3.1 Important Limitations

    +

    3.1 Important Limitations

• New users are restricted to a total of 32 cores: write to rt-ex-hpc@encs.concordia.ca
- if you need more temporarily (256 is the maximum possible, or, 8 jobs of 32 cores each).
+ if you need more temporarily (192 is the maximum, i.e., 6 jobs of 32 cores each).
    • -
    • Job sessions are a maximum of one week in length (only 24 hours, though, for interactive - jobs). +
    • Batch job sessions are a maximum of one week in length (only 24 hours, though, for + interactive jobs, see Section 2.8).
    • -

      Scripts can live in your NFS-provided home, but any substantial data need to be in your +

Scripts can live in your NFS-provided home, but any substantial data needs to be in your
cluster-specific directory (located at /speed-scratch/<ENCSusername>/).

      NFS is great for acute activity, but is not ideal for chronic activity. Any data that a - job will read more than once should be copied at the start to the scratch disk of a - compute node using $TMPDIR (and, perhaps, $SGE_O_WORKDIR), any intermediary job data - should be produced in $TMPDIR, and once a job is near to finishing, those data should +

      NFS is great for acute activity, but is not ideal for chronic activity. Any data that a job will + read more than once should be copied at the start to the scratch disk of a compute node + using $TMPDIR (and, perhaps, $SLURM_SUBMIT_DIR), any intermediary job data should be + produced in $TMPDIR, and once a job is near to finishing, those data should be copied - be copied to your NFS-mounted home (or other NFS-mounted space) from $TMPDIR (to, - perhaps, $SGE_O_WORKDIR). In other words, IO-intensive operations should be effected - locally whenever possible, saving network activity for the start and end of jobs. + to your NFS-mounted home (or other NFS-mounted space) from $TMPDIR (to, perhaps, + $SLURM_SUBMIT_DIR). In other words, IO-intensive operations should be effected locally + whenever possible, saving network activity for the start and end of jobs.

    • Your current resource allocation is based upon past usage, which is an amalgamation of approximately one week’s worth of past wallclock (i.e., time spent on the node(s)) and - CPU activity (on the node(s)). + compute activity (on the node(s)).
    • Jobs should NEVER be run outside of the province of the scheduler. Repeat offenders risk loss of cluster access.
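The $TMPDIR workflow described above can be sketched as a job script (dataset.tar, the results directory, and the processing step are placeholders):

```shell
#!/encs/bin/bash
#SBATCH --mem=4G

# Copy input data from NFS-mounted space to the node-local scratch disk once
cp "$SLURM_SUBMIT_DIR/dataset.tar" "$TMPDIR/"
cd "$TMPDIR"
tar -xf dataset.tar

# ... produce intermediary data locally in $TMPDIR ...

# Near the end of the job, copy the results back to NFS-mounted space
cp -r results "$SLURM_SUBMIT_DIR/"
```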

    -

    3.2 Tips/Tricks

    +

    3.2 Tips/Tricks

      -
    • Files/scripts must have Linux line breaks in them (not Windows ones). +
• Files/scripts must have Linux line breaks in them (not Windows ones). Use the file command
+ to verify, and the dos2unix command to convert.
    • -
    • Use rsync, not scp, when moving data around. +
    • Use rsync, not scp, when moving a lot of data around.
    • If you are going to move many many files between NFS-mounted storage and the cluster, tar everything up first.
    • -
    • If you intend to use a different shell (e.g., bash [19]), you will need to source a different - scheduler file, and will need to change the shell declaration in your script(s). +
    • If you intend to use a different shell (e.g., bash [22]), you will need to change the shell + declaration in your script(s).
    • -
    • The load displayed in qstat by default is np_load, which is load/#cores. That means - that a load of, “1”, which represents a fully active core, is displayed as \(0.03\) on the node in - question, as there are 32 cores on a node. To display load “as is” (such that a node with - a fully active core displays a load of approximately \(1.00\)), add the following to your .tcshrc - file: setenv SGE_LOAD_AVG load_avg +
    • Try to request resources that closely match what your job will use: requesting + many more cores or much more memory than will be needed makes a job + more difficult to schedule when resources are scarce. -
    • -
    • Try to request resources that closely match what your job will use: requesting many more - cores or much more memory than will be needed makes a job more difficult to schedule - when resources are scarce.
    • E-mail, rt-ex-hpc AT encs.concordia.ca, with any concerns/questions.
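A few of the tips above as concrete commands, run on Speed (file names and paths are placeholders):

```shell
# Check for Windows line endings and convert them
file myscript.sh        # reports "CRLF line terminators" for Windows-style files
dos2unix myscript.sh

# Tar up many small files first, then move the archive with rsync rather than scp
tar -cf results.tar results/
rsync -av results.tar speed-submit.encs.concordia.ca:/speed-scratch/$USER/
```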
    -

    +

    -

    3.3 Use Cases

    +

    3.3 Use Cases

    • -

      HPC Committee’s initial batch about 6 students (end of 2019):

      +

      HPC Committee’s initial batch about 6 students (end of 2019):

      • 10000 iterations job in Fluent finished in \(<26\) hours vs. 46 hours in Calcul Quebec
    • -

      NAG’s MAC spoofer analyzer [1514], such as https://github.com/smokhov/atsm/tree/master/examples/flucid +

      NAG’s MAC spoofer analyzer [1817], such as https://github.com/smokhov/atsm/tree/master/examples/flucid

      • compilation of forensic computing reasoning cases about false or true positives of hardware address spoofing in the labs
    • -

      S4 LAB/GIPSY R&D Group’s:

      +

      S4 LAB/GIPSY R&D Group’s:

      • MARFCAT and MARFPCAT (OSS signal processing and machine learning tools for - vulnerable and weak code analysis and network packet capture analysis) [17123] + vulnerable and weak code analysis and network packet capture analysis) [20156]
      • Web service data conversion and analysis - - -
      • -
      • Forensic Lucid encoders (translation of large log data into Forensic Lucid [13] for +
      • Forensic Lucid encoders (translation of large log data into Forensic Lucid [16] for forensic analysis)
      • Genomic alignment exercises
      + + +
    • Serguei Mokhov, Jonathan Llewellyn, Carlos Alarcon Meza, Tariq Daradkeh, and Gillian Roper. The use of containers in OpenGL, ML and HPC for teaching and research support. In @@ -1389,57 +1628,150 @@

      3.3 tracking. In 34th British Machine Vision Conference (BMVC), Aberdeen, UK, November 2023. https://arxiv.org/abs/2309.05829 and https://github.com/goutamyg/MVT

    • +
    • Belkacem Belabes and Marius Paraschivoiu. CFD modeling of vertical-axis wind turbine wake + interaction. Transactions of the Canadian Society for Mechanical Engineering, pages 1–10, 2023. + https://doi.org/10.1139/tcsme-2022-0149 +
    • +
    • Belkacem Belabes and Marius Paraschivoiu. CFD study of the aerodynamic performance of a + vertical axis wind turbine in the wake of another turbine. In Proceedings of the CSME + International Congress, 2022. https://doi.org/10.7939/r3-rker-1746 +
    • +
    • Belkacem Belabes and Marius Paraschivoiu. Numerical study of the effect of turbulence intensity on + VAWT performance. Energy, 233:121139, 2021. https://doi.org/10.1016/j.energy.2021.121139 +
    • Parna Niksirat, Adriana Daca, and Krzysztof Skonieczny. The effects of reduced-gravity on planetary rover mobility. International Journal of Robotics Research, 39(7):797–811, 2020. https://doi.org/10.1177/0278364920913945
    • -

      The work “Haotao Lai. An OpenISS framework specialization for deep learning-based +

      The work “Haotao Lai. An OpenISS framework specialization for deep learning-based person re-identification. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada, August 2019. https://spectrum.library.concordia.ca/id/eprint/985788/” using TensorFlow and Keras on OpenISS adjusted to run on Speed based on the repositories:

      - -

      +

      -

      A History

      -

      +

      A History

      +

      -

      A.1 Acknowledgments

      +

      A.1 Acknowledgments

      • The first 6 (to 6.5) versions of this manual and early UGE job script samples, Singularity testing and user support were produced/done by Dr. Scott Bunnell during his time at Concordia as a part of the NAG/HPC group. We thank him for his contributions.
      • -
      • The HTML version with devcontainer support was contributed by Anh H Nguyen.
      -

      +

    • The HTML version with devcontainer support was contributed by Anh H Nguyen. +
    • +
• Dr. Tariq Daradkeh was our IT Instructional Specialist from August 2022 to September
+ 2023, working on the scheduler, scheduling research, end-user support, and integration
+ of examples, such as YOLOv3 in Section 2.15.3, among other tasks. We have a continued
+ collaboration on HPC/scheduling research.
    + + + +

    +

    +

    A.2 Migration from UGE to SLURM

    +

For long-term users who started off with Grid Engine, here are some resources to ease the transition,
+mapping the Grid Engine job submission process to SLURM.

    +
      +
    • +

      Queues are called “partitions” in SLURM. Our mapping from the GE queues to SLURM + partitions is as follows: + + + +

      +
      +     GE  => SLURM
      +     s.q    ps
      +     g.q    pg
      +     a.q    pa
      +
      +

      We also have a new partition pt that covers SPEED2 nodes, which previously did not + exist. +

    • +
    • +

      Commands and command options mappings are found in Figure 7 from
      https://slurm.schedmd.com/rosetta.pdf
      https://slurm.schedmd.com/pdfs/summary.pdf
Other related helpful resources from similar organizations that either have used SLURM for a while or
+ also transitioned to it:
      https://docs.alliancecan.ca/wiki/Running_jobs
      https://www.depts.ttu.edu/hpcc/userguides/general_guides/Conversion_Table_1.pdf
      https://docs.mpcdf.mpg.de/doc/computing/clusters/aux/migration-from-sge-to-slurm +

      +
      + PIC +
      Figure 7: Rosetta Mappings of Scheduler Commands from SchedMD
      +
      +
    • +
    • +

NOTE: If you have used UGE commands in the past, you probably still have lines like these in your
+ shell startup files; they should now be removed, as they have no use in SLURM and will start
+ giving “command not found” errors on login once the UGE software is removed:

      csh/tcsh: Sample .tcshrc file: + + + +

      +
      +     # Speed environment set up
      +     if ($HOSTNAME == speed-submit.encs.concordia.ca) then
      +        source /local/pkg/uge-8.6.3/root/default/common/settings.csh
      +     endif
      +
      +

      +

      Bourne shell/bash: Sample .bashrc file: + + + +

      +
      +     # Speed environment set up
      +     if [ $HOSTNAME = "speed-submit.encs.concordia.ca" ]; then
      +         . /local/pkg/uge-8.6.3/root/default/common/settings.sh
      +         printenv ORGANIZATION | grep -qw ENCS || . /encs/Share/bash/profile
      +     fi
      +
      +

      +

      Note that you will need to either log out and back in, or execute a new shell, for the + environment changes in the updated .tcshrc or .bashrc file to be applied (important). +
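As a quick reference, some of the most frequent command mappings from the Rosetta document cited above are:

```
     GE                   => SLURM
     qsub script.sh          sbatch script.sh
     qstat -f -u $USER       squeue -u $USER
     qdel <job-id>           scancel <job-id>
     qlogin                  salloc
```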

      +
    +

    -

    A.2 Phase 3

    -

    Phase 3 had 4 vidpro nodes added from Dr. Amer totalling 6x P6 and 6x V100 GPUs +

    A.3 Phases

    +

    Brief summary of Speed evolution phases. +

    +

    +
    A.3.1 Phase 4
    +

Phase 4 had 7 SuperMicro servers with 4x A100 80GB GPUs each added, dubbed “SPEED2”. We
+also moved from Grid Engine to SLURM.

    +

    +
    A.3.2 Phase 3
    +

Phase 3 had 4 vidpro nodes added from Dr. Amer, totalling 6x P6 and 6x V100 GPUs.

    +

    -

    A.3 Phase 2

    -

    Phase 2 saw 6x NVIDIA Tesla P6 added and 8x more compute nodes. The P6s replaced 4x of FirePro +

    A.3.3 Phase 2
    +

    Phase 2 saw 6x NVIDIA Tesla P6 added and 8x more compute nodes. The P6s replaced 4x of FirePro S7150. -

    +

    -

    A.4 Phase 1

    -

    Phase 1 of Speed was of the following configuration: +

    A.3.4 Phase 1
    +

    Phase 1 of Speed was of the following configuration:

    • Sixteen, 32-core nodes, each with 512 GB of memory and approximately 1 TB of @@ -1449,21 +1781,21 @@

      A.4

    -

    B Frequently Asked Questions

    +

    B Frequently Asked Questions

    -

    B.1 Where do I learn about Linux?

    -

    All Speed users are expected to have a basic understanding of Linux and its commonly used +

    B.1 Where do I learn about Linux?

    +

    All Speed users are expected to have a basic understanding of Linux and its commonly used commands. -

    +

    -
    Software Carpentry
    -

    Software Carpentry provides free resources to learn software, including a workshop on the Unix shell. +

    Software Carpentry
    +

    Software Carpentry provides free resources to learn software, including a workshop on the Unix shell. https://software-carpentry.org/lessons/ -

    +

    -
    Udemy
    -

    There are a number of Udemy courses, including free ones, that will assist you in learning Linux. +

    Udemy
    +

    There are a number of Udemy courses, including free ones, that will assist you in learning Linux. Active Concordia faculty, staff and students have access to Udemy courses such as Linux Mastery: Master the Linux Command Line in 11.5 Hours is a good starting point for beginners. Visit https://www.concordia.ca/it/services/udemy.html to learn how Concordians may access @@ -1471,229 +1803,270 @@

    Udemy
    -

    +

    -

    B.2 How to use the “bash shell” on Speed?

    -

    This section describes how to use the “bash shell” on Speed. Review Section 2.1.2 to ensure that your +

    B.2 How to use the “bash shell” on Speed?

    +

    This section describes how to use the “bash shell” on Speed. Review Section 2.1.2 to ensure that your bash environment is set up. -

    +

    -
    B.2.1 How do I set bash as my login shell?
    -

    In order to set your login shell to bash on Speed, your login shell on all GCS servers must be changed -to bash. To make this change, create a ticket with the Service Desk (or email help at concordia.ca) to -request that bash become your default login shell for your ENCS user account on all GCS +

    B.2.1 How do I set bash as my login shell?
    +

    In order to set your login shell to bash on Speed, your login shell on all GCS servers must be changed +to bash. To make this change, create a ticket with the Service Desk (or email help at concordia.ca) +to request that bash become your default login shell for your ENCS user account on all GCS servers. -

    +

    -
    B.2.2 How do I move into a bash shell on Speed?
    -

    To move to the bash shell, type bash at the command prompt. For example: +

    B.2.2 How do I move into a bash shell on Speed?
    +

    To move to the bash shell, type bash at the command prompt. For example:

    -
    +   
     [speed-submit] [/home/a/a_user] > bash
     bash-4.4$ echo $0
     bash
     
    -

    -

    Note how the command prompt changed from [speed-submit] [/home/a/a_user] > to +

    +

    Note how the command prompt changed from [speed-submit] [/home/a/a_user] > to bash-4.4$ after entering the bash shell. -

    +

    -
    B.2.3 How do I run scripts written in bash on Speed?
    -

    To execute bash scripts on Speed: -

      -
    1. Ensure that the shebang of your bash job script is #/encs/bin/bash! -
    2. -
    3. Use the qsub command to submit your job script to the scheduler.
    -

    The Speed GitHub contains a sample bash job script. -

    +

    B.2.3 How do I use the bash shell in an interactive session on Speed?
    +

If you use one of the commands below (make sure that job request settings such as memory, cores, etc.
+are set), they will allocate your interactive job session with bash as the shell on the compute
+node:


  • salloc ... /encs/bin/bash
  • srun ... --pty /encs/bin/bash
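Here the ellipses stand for your resource requests. A minimal sketch, assuming illustrative memory, core, and time values (adjust to your job's needs; the options are standard Slurm ones):

```
salloc --mem=8G --cpus-per-task=4 --time=01:00:00 /encs/bin/bash
# or, with a pseudo-terminal attached:
srun --mem=8G --cpus-per-task=4 --time=01:00:00 --pty /encs/bin/bash
```

Once the allocation is granted, you are dropped into a bash prompt on one of the compute nodes.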

B.2.4 How do I run scripts written in bash on Speed?

To execute bash scripts on Speed:

  1. Ensure that the shebang of your bash job script is #!/encs/bin/bash
  2. Use the sbatch command to submit your job script to the scheduler.

The Speed GitHub contains a sample bash job script.
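A minimal sketch of such a job script follows; the job name, memory value, and script body are illustrative assumptions, so consult the Speed GitHub sample for the authoritative version:

```
#!/encs/bin/bash
#SBATCH --job-name=bash-demo   ## illustrative job name
#SBATCH --mem=1G               ## illustrative memory request

echo "Hello from $(hostname)"
```

Saved as, e.g., bash-demo.sh, it would be submitted with sbatch bash-demo.sh.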


B.3 How to resolve “Disk quota exceeded” errors?

B.3.1 Probable Cause

The “Disk quota exceeded” error occurs when your application has run out of disk space to write to. On Speed this error can be returned when:

  1. Your NFS-provided home is full and cannot be written to. You can verify this using the quota and bigfiles commands.
  2. The /tmp directory on the speed node your application is running on is full and cannot be written to.

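A quick way to see which of the two locations is full is to check the free space on each filesystem you write to (df is a standard tool; quota and bigfiles are the ENCS-provided commands mentioned above):

```shell
# Free space on /tmp and on the filesystem holding your home directory:
df -h /tmp "$HOME"
```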
    -
B.3.2 Possible Solutions

  1. Use the --chdir job script option to set the job working directory to the directory from which the job script was submitted. The job working directory is the directory in which the job will write output files.

  2. Use local disk space. The use of local disk space is generally recommended for I/O-intensive operations. However, as the size of /tmp on speed nodes is 1TB, it can be necessary for scripts to store temporary data elsewhere. Review the documentation for each module called within your script to determine how to set working directories for that application. The basic steps for this solution are:

      • Review the documentation on how to set working directories for each module called by the job script.

      • Create a working directory in speed-scratch for output files. For example, this command will create a subdirectory called output in your speed-scratch directory:

                  mkdir -m 750 /speed-scratch/$USER/output
                   
         

      • To create a subdirectory for recovery files:

                  mkdir -m 750 /speed-scratch/$USER/recovery
         

      • Update the job script to write output to the subdirectories you created in your speed-scratch directory, e.g., /speed-scratch/$USER/output.

In the above example, $USER is an environment variable containing your ENCS username.
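With solution 1, the working directory can also be supplied at submission time. Note that $USER is expanded by your shell before sbatch sees it; the script name below is an illustrative assumption:

```
sbatch --chdir=/speed-scratch/$USER/output job-script.sh
```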

    -
B.3.3 Example of setting working directories for COMSOL

      • Create directories for recovery, temporary, and configuration files. For example, to create these directories for your GCS ENCS user account:

            mkdir -m 750 -p /speed-scratch/$USER/comsol/{recovery,tmp,config}
       

      • Add the following command switches to the COMSOL command to use the directories created above:

            -recoverydir /speed-scratch/$USER/comsol/recovery
            -tmpdir /speed-scratch/$USER/comsol/tmp
          -configuration /speed-scratch/$USER/comsol/config
       
      -

    -

In the above example, $USER is an environment variable containing your ENCS username.
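Assembled into one invocation, a COMSOL batch run using these directories might look as follows; comsol batch and -inputfile are standard COMSOL batch-mode options, and the model file name is an illustrative assumption:

```
comsol batch -inputfile model.mph \
    -recoverydir /speed-scratch/$USER/comsol/recovery \
    -tmpdir /speed-scratch/$USER/comsol/tmp \
    -configuration /speed-scratch/$USER/comsol/config
```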

    -
B.3.4 Example of setting working directories for Python Modules

By default, when adding a Python module, the /tmp directory is set as the temporary repository for file downloads. The size of the /tmp directory on speed-submit is too small for PyTorch. To add a Python module:


      • Create your own tmp directory in your speed-scratch directory:

              mkdir /speed-scratch/$USER/tmp
       

      • Use the tmp directory you created:

              setenv TMPDIR /speed-scratch/$USER/tmp
       

      • Attempt the installation of PyTorch.

In the above example, $USER is an environment variable containing your ENCS username.
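Note that setenv is tcsh syntax. If your session runs bash, as elsewhere in this appendix, the equivalent steps would be (the pip invocation is illustrative):

```
export TMPDIR=/speed-scratch/$USER/tmp   # bash equivalent of the tcsh setenv above
pip install torch                        # pip stages its downloads and builds under $TMPDIR
```

pip honours TMPDIR through Python's tempfile module, so its temporary files land in your scratch space instead of /tmp.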


B.4 How do I check my job’s status?

When a job with a job id of 1234 is running or has terminated, its status can be tracked using ‘sacct -j 1234’. While the job is sitting in the queue or running, ‘squeue -j 1234’ shows its state, and ‘sstat -j 1234’ reports its resource-usage statistics while it runs. Long-term statistics remain available via ‘sacct -j 1234’ after slurmctld purges the job’s tracking state into the accounting database.
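For example (the job id is illustrative; the sacct format fields are standard Slurm ones):

```
squeue -j 1234                                       # state and reason while queued or running
sacct -j 1234 --format=JobID,JobName,State,Elapsed   # accounting record, also after termination
```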


    B.5 Why is my job pending when nodes are empty?


    B.5.1 Disabled nodes

It is possible that one or more of the Speed nodes are disabled. Nodes are disabled if they require maintenance. To verify whether Speed nodes are disabled, see if they are in a draining or drained state:

    +
[serguei@speed-submit src] % sinfo --long --Node
Thu Oct 19 21:25:12 2023
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
speed-01       1        pa        idle 32     2:16:1 257458        0      1    gpu16 none
speed-03       1        pa        idle 32     2:16:1 257458        0      1    gpu32 none
speed-05       1        pg        idle 32     2:16:1 515490        0      1    gpu16 none
speed-07       1       ps*       mixed 32     2:16:1 515490        0      1    cpu32 none
speed-08       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-09       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-10       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-11       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-12       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-15       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-16       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-17       1        pg     drained 32     2:16:1 515490        0      1    gpu16 UGE
speed-19       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-20       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-21       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-22       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-23       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-24       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-25       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
speed-25       1        pa        idle 32     2:16:1 257458        0      1    gpu32 none
speed-27       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
speed-27       1        pa        idle 32     2:16:1 257458        0      1    gpu32 none
speed-29       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-30       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-31       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-32       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-33       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-34       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-35       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-36       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-37       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-38       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-39       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-40       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-41       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-42       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-43       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
     
    -

    -

Note which nodes are in the drained state. The reason a node is drained can be found in the REASON column.

Your job will run once an occupied node becomes available or the maintenance has been completed and the disabled nodes have a state of idle.
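The reason for each unavailable node can also be listed directly with Slurm's standard -R option:

```
sinfo -R    # one line per down/drained/draining node, with its reason
```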

    +

    +
B.5.2 Error in job submit request.

It is possible that your job is pending because it requested resources that are not available within Speed. To verify why job id 1234 is not running, execute ‘sacct -j 1234’. A summary of the reasons is available via the squeue command.
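For a pending job, the reason code can be printed explicitly via squeue's standard output-format option (%R is the reason field; the job id is illustrative):

```
squeue -j 1234 -o "%.10i %.9P %.8T %R"   # job id, partition, state, reason
```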


    C Sister Facilities


    Below is a list of resources and facilities similar to Speed at various capacities. Depending on your research group and needs, they might be available to you. They are not managed by HPC/NAG of AITS, so contact their respective representatives.

  • computation.encs: CPU-only 3-machine cluster, running longer jobs without a scheduler at the moment
  • apini.encs: cluster for teaching and MPI programming (see the corresponding course in CSSE)

  • There are various Lambda Labs and other GPU servers and similar computers acquired by individual researchers; if you are a member of their research group, contact them directly. These resources are not managed by us.

      • Dr. Nizar Bouguila’s xailab.encs Lambda Labs station
      • Dr. Roch Glitho’s femto.encs server
      • Dr. Maria Amer’s venom.encs Lambda Labs station
      • Dr. Leon Wang’s guerrera.encs DGX station
  • Dr. Ivan Contreras’ servers (managed by AITS)
  • If you are a member of the School of Health (formerly PERFORM Center), you may have access to their local PERFORM High Performance Computing (HPC) Cluster. Contact Thomas Beaudry for details and how to obtain access.
  • Digital Research Alliance Canada (Compute Canada / Calcul Quebec), https://alliancecan.ca/. Follow this link for information on how to obtain access (students need to be sponsored by their supervising faculty members, who should create accounts first). Their SLURM examples are here: https://docs.alliancecan.ca/wiki/Running_jobs
    -

    -

References

       http://www.ansys.com/Products/Simulation+Technology/Fluid+Dynamics/ANSYS+FLUENT.

 [3]   Belkacem Belabes and Marius Paraschivoiu. Numerical study of the effect of turbulence intensity on VAWT performance. Energy, 233:121139, 2021. https://doi.org/10.1016/j.energy.2021.121139.

 [4]   Belkacem Belabes and Marius Paraschivoiu. CFD study of the aerodynamic performance of a vertical axis wind turbine in the wake of another turbine. In Proceedings of the CSME International Congress, 2022. https://doi.org/10.7939/r3-rker-1746.

 [5]   Belkacem Belabes and Marius Paraschivoiu. CFD modeling of vertical-axis wind turbine wake interaction. Transactions of the Canadian Society for Mechanical Engineering, pages 1–10, 2023. https://doi.org/10.1139/tcsme-2022-0149.

 [6]   Amine Boukhtouta, Nour-Eddine Lakhdari, Serguei A. Mokhov, and Mourad Debbabi. Towards fingerprinting malicious traffic. In Proceedings of ANT’13, volume 19, pages 548–555. Elsevier, June 2013.

 [7]   Amy Brown and Greg Wilson, editors. The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks, volume I. aosabook.org, March 2012. Online at http://aosabook.org.

 [8]   Goutam Yelluru Gopal and Maria Amer. Mobile vision transformer-based visual object tracking. In 34th British Machine Vision Conference (BMVC), Aberdeen, UK, November 2023. https://arxiv.org/abs/2309.05829 and https://github.com/goutamyg/MVT.

 [9]   Goutam Yelluru Gopal and Maria Amer. Separable self and mixed attention transformers for efficient object tracking. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, Hawaii, January 2024. https://arxiv.org/abs/2309.03979 and https://github.com/goutamyg/SMAT.

 [10]   Haotao Lai. An OpenISS framework specialization for deep learning-based person re-identification. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada, August 2019. https://spectrum.library.concordia.ca/id/eprint/985788/.

 [11]   Haotao Lai et al. OpenISS keras-yolo3 v0.1.0, June 2021. https://github.com/OpenISS/openiss-yolov3.

 [12]   Haotao Lai et al. OpenISS person re-identification baseline v0.1.1, June 2021. https://github.com/OpenISS/openiss-reid-tfk.

 [13]   MathWorks. MATLAB. [online], 2000–2012. http://www.mathworks.com/products/matlab/.

 [14]   Serguei Mokhov, Jonathan Llewellyn, Carlos Alarcon Meza, Tariq Daradkeh, and Gillian Roper. The use of containers in OpenGL, ML and HPC for teaching and research support. In ACM SIGGRAPH 2023 Posters, SIGGRAPH ’23, New York, NY, USA, 2023. ACM. https://doi.org/10.1145/3588028.3603676.

 [15]   Serguei A. Mokhov. The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT. Technical Report NIST SP 500-283, NIST, October 2011. Report: http://www.nist.gov/manuscript-publication-search.cfm?pub_id=909407, online e-print at http://arxiv.org/abs/1010.2511.

 [16]   Serguei A. Mokhov. Intensional Cyberforensics. PhD thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada, September 2013. Online at http://arxiv.org/abs/1312.0466.

 [17]   Serguei A. Mokhov, Michael J. Assels, Joey Paquet, and Mourad Debbabi. Automating MAC spoofer evidence gathering and encoding for investigations. In Frederic Cuppens et al., editors, Proceedings of The 7th International Symposium on Foundations & Practice of Security (FPS’14), LNCS 8930, pages 168–183. Springer, November 2014. Full paper.

 [18]   Serguei A. Mokhov, Michael J. Assels, Joey Paquet, and Mourad Debbabi. Toward automated MAC spoofer investigations. In Proceedings of C3S2E’14, pages 179–184. ACM, August 2014. Short paper.

 [19]   Serguei A. Mokhov and Scott Bunnell. Speed server farm: Gina Cody School of ENCS HPC facility. [online], 2018–2019. https://docs.google.com/presentation/d/1bWbGQvYsuJ4U2WsfLYp8S3yb4i7OdU7QDn3l_Q9mYis.

 [20]   Serguei A. Mokhov, Joey Paquet, and Mourad Debbabi. The use of NLP techniques in static code analysis to detect weaknesses and vulnerabilities. In Maria Sokolova and Peter van Beek, editors, Proceedings of Canadian Conference on AI’14, volume 8436 of LNAI, pages 326–332. Springer, May 2014. Short paper.

 [21]   Parna Niksirat, Adriana Daca, and Krzysztof Skonieczny. The effects of reduced-gravity on planetary rover mobility. International Journal of Robotics Research, 39(7):797–811, 2020. https://doi.org/10.1177/0278364920913945.

 [22]   Chet Ramey. The Bourne-Again Shell. In Brown and Wilson [7]. http://aosabook.org/en/bash.html.

 [23]   Rob Schreiber. MATLAB. Scholarpedia, 2(6):2929, 2007. http://www.scholarpedia.org/article/MATLAB.

 [24]   The MARF Research and Development Group. The Modular Audio Recognition Framework and its Applications. [online], 2002–2014. http://marf.sf.net and http://arxiv.org/abs/0905.1235, last viewed May 2015.
