- Frequently Asked Questions (FAQ)
- For Management
- For Users
- For Administrators
- How do I contribute to slurm-gcp or slurm?
- How do I use Terraform?
- How do I modify Slurm config files?
- What are GCP preemptible VMs?
- How do I reduce compute costs?
- How do I limit user access to only using login nodes?
- What Slurm image do I use for production?
- What operating systems can I use slurm-gcp with?
- Should I disable Simultaneous Multithreading (SMT)?
- How do I automate custom cluster configurations?
- How do I replace the controller?
https://slurm.schedmd.com/faq.html#foss
Free Open Source Software (FOSS) does not mean that it is without cost. It does mean that you have access to the code so that you are free to use it, study it, and/or enhance it. These reasons contribute to Slurm (and FOSS in general) being subject to active research and development worldwide, displacing proprietary software in many environments. If the software is large and complex, like Slurm or the Linux kernel, then while there is no license fee, its use is not without cost.
This is the official and supported solution from SchedMD in partnership with Google for Slurm on Google Cloud Platform.
slurm-gcp provides Terraform modules. These make standing up a cluster easy and integrate into your existing infrastructure.
Please visit SchedMD Support and reach out. Tickets can be submitted via SchedMD's Bugzilla.
- Check the GCP Console Logs Viewer.
- On Slurm cloud nodes, check /var/log/slurm/*.log.
- Otherwise check /var/log/messages (RHEL/CentOS) or /var/log/syslog (Debian/Ubuntu).
- Enable debug logging using the Terraform enable_debug_logging variable.
- If you need more, such as verbose GCP API request information, enable the appropriate logging flag using the Terraform extra_logging_flags variable (see the logging_flags variable in scripts/util.py for the list of supported log flags).
  - For verbose API request information, use the trace_api logging flag.
  - These increase the logging to Slurm-GCP script logs only, such as resume.log and suspend.log.
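As a rough illustration of the two logging variables above, a tfvars fragment might look like the following. It assumes enable_debug_logging takes a boolean and extra_logging_flags takes a map from flag name to boolean; check the module README for the exact types before relying on it.

```hcl
# Sketch only: the variable types here are assumptions, not confirmed by this FAQ.
enable_debug_logging = true

extra_logging_flags = {
  # trace_api is the flag named above for verbose API request logging.
  trace_api = true
}
```

Because these settings only affect the Slurm-GCP script logs, the extra output should appear in files such as resume.log and suspend.log rather than in the Slurm daemon logs.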
Data can be migrated to and from external sources using a workflow of dependent jobs. A workflow submission script and helper jobs are provided. See README for more information.
- If the compute nodes have external IPs, you can connect to them directly. From the VM Instances page, the SSH drop-down next to each compute instance gives several options for connecting.
- With IAP enabled, you can SSH to the nodes whether or not they have external IPs.
- Use Slurm to get an allocation on the nodes. For example:

      $ srun --pty $SHELL
      [g1-debug-test-0 ~]$
Enhancement requests can be submitted to SchedMD's Bugzilla.
Please see Terraform documentation.
For the Slurm Terraform modules, please refer to their module API as documented in their READMEs. Additionally, please see the Slurm Terraform examples for sample usage.
Assuming the slurm_cluster Terraform module was used to deploy the cluster, see these input parameters:
- slurm_conf_tpl
- cgroup_conf_tpl
- slurmdbd_conf_tpl
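A minimal sketch of how these inputs might be set in a tfvars file, assuming each one takes a path to a template file. The paths shown are hypothetical placeholders, not files shipped at those locations.

```hcl
# Hypothetical paths; assumes each *_tpl input is a path to a template file.
slurm_conf_tpl    = "./etc/slurm.conf.tpl"
cgroup_conf_tpl   = "./etc/cgroup.conf.tpl"
slurmdbd_conf_tpl = "./etc/slurmdbd.conf.tpl"
```

Keeping customized templates alongside the tfvars file means config changes go through the same review and terraform apply workflow as the rest of the cluster definition.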
Preemptible instances are cheaper than on-demand instances; however, they can be reclaimed, as permitted by their Service Level Agreement (SLA). Google Cloud offers two types of preemptible VMs: preemptible (v1) and spot (beta). Spot VMs offer more features and better control over how and when they can be reclaimed.
As far as Slurm is concerned, all preemptible-type instances are treated the same. When reclaimed (terminated or stopped), they are marked as "down" and their running jobs are requeued if they can be, otherwise canceled. slurmsync will detect this activity and clear the "down" state from the node so that it may be allocated jobs again.
- In partition_conf, set a lower SuspendTime for a given slurm_partition. For example:

      partition_conf = {
        SuspendTime = 120
      }

- For compute nodes within a given slurm_partition, use preemptible VM instances. For example:

      partition_nodes = [
        {
          ...
          preemptible = true
          ...
        }
      ]

- For compute nodes within a given slurm_partition, use SPOT VM instances. For example:

      partition_nodes = [
        {
          ...
          enable_spot_vm = true
          ...
        }
      ]
By default, all instances are configured with OS Login. This keeps UID and GID of users consistent across all instances and allows easy user control with IAM Roles.
- Create a group for all users in admin.google.com.
- At the project level in IAM, grant the Compute Viewer and Service Account User roles to the group.
- At the instance level for each login node, grant the Compute OS Login role to the group.
  - Make sure the Info Panel is shown on the right.
  - On the compute instances page, select the boxes to the left of the login nodes.
  - Click Add Members and add the Compute OS Login role to the group.
- At the organization level, grant the Compute OS Login External User role to the group if the users are not part of the organization.
- To allow ssh to login nodes without external IPs, configure IAP for the group.
  - Go to the Identity-Aware Proxy page.
  - Select the project.
  - Click the SSH AND TCP RESOURCES tab.
  - Select the boxes for the login nodes.
  - Add the group as a member with the IAP-secured Tunnel User role. Please see Enabling IAP for Compute Engine for more information.
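The console steps above could also be expressed in Terraform. The following is only a sketch using IAM resources from the Google provider; the project, zone, login node name, and group address are hypothetical placeholders, and the organization-level role for external users is left out.

```hcl
# Sketch only: project, zone, instance, and group values are hypothetical.
locals {
  project     = "my-project"
  zone        = "us-central1-a"
  login_nodes = ["mycluster-login-0"]
  group       = "group:slurm-users@example.com"
}

# Project level: Compute Viewer and Service Account User.
resource "google_project_iam_member" "viewer" {
  project = local.project
  role    = "roles/compute.viewer"
  member  = local.group
}

resource "google_project_iam_member" "sa_user" {
  project = local.project
  role    = "roles/iam.serviceAccountUser"
  member  = local.group
}

# Instance level: Compute OS Login on the login nodes only.
resource "google_compute_instance_iam_member" "os_login" {
  for_each      = toset(local.login_nodes)
  project       = local.project
  zone          = local.zone
  instance_name = each.value
  role          = "roles/compute.osLogin"
  member        = local.group
}

# IAP: IAP-secured Tunnel User on the login nodes.
resource "google_iap_tunnel_instance_iam_member" "tunnel" {
  for_each = toset(local.login_nodes)
  project  = local.project
  zone     = local.zone
  instance = each.value
  role     = "roles/iap.tunnelResourceAccessor"
  member   = local.group
}
```

Granting the OS Login and tunnel roles per instance, rather than at the project level, is what restricts the group to the login nodes.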
By default, the slurm_cluster Terraform module uses the latest Slurm image family (e.g. slurm-gcp-6-2-hpc-rocky-linux-8). As new Slurm image families are released, coinciding with periodic Slurm releases, the Terraform module will be updated to track the newest image family by setting it as the new default. This update can be considered a breaking change.
In a production setting, it is recommended to explicitly set an image family. Doing so will prevent slurm-gcp changes to the default image family from negatively impacting your cluster. Moreover, the controller and all other instances may be force replaced (destroyed, then deployed) when terraform apply detects that the image family of Slurm instances has changed.
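A sketch of what pinning the image family might look like for compute nodes. The input names source_image_project and source_image_family, and the project value, are assumptions not stated in this FAQ; confirm them against the module README. The family name is the example given above.

```hcl
# Sketch only: source_image_project and source_image_family are assumed input names.
partition_nodes = [
  {
    # ... other node settings ...
    source_image_project = "schedmd-slurm-public" # assumed image project
    source_image_family  = "slurm-gcp-6-2-hpc-rocky-linux-8"
  }
]
```

The same pin would presumably need to be applied to the controller and login node configuration so that every instance in the cluster stays on one image family.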
Optionally, you may generate and use your own Slurm images. See custom image creation for more information.
You may use any OS supported by the image build process.
See image docs for more information.
Some HPC applications get better performance by disabling Simultaneous Multithreading (SMT) in the guest OS. Simultaneous Multithreading, commonly known as Intel Hyper-threading, allocates two virtual cores (vCPU) per physical core on the node. For many general computing tasks or tasks that require lots of I/O, SMT can increase application throughput significantly. For compute-bound jobs in which both virtual cores are compute-bound, SMT can hinder overall application performance and can add unpredictable variance to jobs. Turning off SMT allows more predictable performance and can decrease job times.
Important: Disabling SMT changes the way cores are counted, and may increase the cost per core of the cluster depending on how you count cores. Although cost per core is a common metric for on-premises hardware, a more appropriate metric for the cloud is cost per workload or cost per job. For compute-bound jobs, you pay for what you use. Turning off Hyper-Threading can reduce the overall runtime, which can reduce the overall cost of the job. We recommend that you benchmark your application and use this feature where it is beneficial.
You can disable Simultaneous Multithreading at VM creation on all VM types with the following exceptions:
- VMs that run on machine types that have fewer than 2 vCPUs (such as n1-standard-1) or shared-core machines (such as e2-small).
- VMs that run on the Tau T2D machine type.
When using slurm-gcp Terraform modules, use the disable_smt option to toggle Simultaneous Multithreading (SMT) on or off.
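For instance, assuming disable_smt is a boolean that is set per node group, in the same place as the other per-node options, a tfvars fragment could look like this:

```hcl
# Sketch only: the placement of disable_smt inside partition_nodes is an assumption.
partition_nodes = [
  {
    # ... other node settings ...
    disable_smt = true # one vCPU per physical core for compute-bound jobs
  }
]
```

Benchmarking the same job with the option on and off is the most direct way to see whether it helps a given workload.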
The Slurm cluster module provides multiple variables (controller_startup_scripts, compute_startup_scripts, partition_startup_scripts) which allow you to input a list of scripts to be run on different sets of hosts at set-up time. The scripts are run synchronously, and a non-zero exit will fail the setup step of the instance. Generally, controller_startup_scripts will run only on the controller node; compute_startup_scripts will run on the login and all compute nodes; and partition_startup_scripts will run on all compute nodes within that partition. See Slurm cluster module variables for details.
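A minimal sketch of passing a startup script, assuming each list entry is an object with filename and content attributes. The file names and script bodies are hypothetical.

```hcl
# Sketch only: assumes each entry is an object with filename and content.
controller_startup_scripts = [
  {
    filename = "10-controller-setup.sh" # hypothetical name
    content  = <<-EOT
      #!/bin/bash
      # A non-zero exit here fails the setup step of the instance.
      set -e
      echo "controller setup complete"
    EOT
  }
]

compute_startup_scripts = [
  {
    filename = "10-compute-setup.sh" # hypothetical name
    content  = <<-EOT
      #!/bin/bash
      set -e
      echo "compute setup complete"
    EOT
  }
]
```

Since the scripts run synchronously during setup, keeping them short helps bursted nodes become available quickly.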
If you want to install software, it is recommended to bake it into the image. Doing so will speed up the deployment of bursted compute nodes. See customize image for more information.
Replacing the controller instance is a hazardous action.
It is recommended to:
- Drain the cluster of all jobs.
  - Optionally, state=power_down all nodes.
- Save and export all local data off the controller.
  - By default, the database (mariadb) and /home (NFS mounted) are local.
- Replace the controller instance by either:
  - Updating the tfvars configuration, then running terraform apply.
  - Or, manually terminating the controller instance, then running terraform apply.
- Reboot all instances with NFS mounts to the controller.
  - By default, this includes all login and compute nodes.