Skip to content

Latest commit

 

History

History
137 lines (111 loc) · 4.74 KB

File metadata and controls

137 lines (111 loc) · 4.74 KB

Module: Slurm Partition

FAQ | Troubleshooting | Glossary

Overview

This is a submodule of slurm_cluster. It creates a Slurm partition for slurm_controller_instance or slurm_controller_hybrid.

Conceptutally, a Slurm partition is a queue that is associated with compute resources, limits, and access controls. Users submit jobs to one or more partitions to have their jobs be completed against requested resources within their allotted limits and access.

This module defines a partition and its resources -- most notably, compute nodes. Sets of compute nodes reside within a partition. Each set of compute nodes must resolve to an instance template. Either the instance template is: created by definition -- module creates an instance template using subset of input parameters; or by the self link of an instance template that is managed outside of this module. Additionally, there are compute node parameters that will override certain properties of the instance template when instanceated as a VM.

Compute instances created by slurm_controller_instance, using this partition, run slurmd and slurmstepd.

Usage

See examples directory for sample usages.

See below for a simple inclusion within your own terraform project.

module "slurm_partition" {
  source = "[email protected]:SchedMD/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition?ref=v5.0.0"

  project_id = "<PROJECT_ID>"

  slurm_cluster_name = "<SLURM_CLUSTER_NAME>"

  partition_name = "debug"
  partition_nodes = {
    count_static  = 0
    count_dynamic = 10
    group_name    = "test"
    node_conf     = {}

    # Template by Definition
    additional_disks       = []
    can_ip_forward         = false
    disable_smt            = false
    disk_auto_delete       = true
    disk_labels            = {}
    disk_size_gb           = null
    disk_type              = null
    enable_confidential_vm = false
    enable_oslogin         = true
    enable_shielded_vm     = false
    gpu                    = null
    labels                 = {}
    machine_type           = "n1-standard-1"
    metadata               = {}
    min_cpu_platform       = null
    on_host_maintenance    = null
    preemptible            = false
    service_account = {
      email  = "<COMPUTE_SA_EMAIL>"
      scopes = ["https://www.googleapis.com/auth/cloud-platform"]
    }
    shielded_instance_config = null
    source_image_family      = null
    source_image_project     = null
    source_image             = null
    tags                     = []

    # Template by Source
    instance_template = null
  }
  region     = "us-central1"
  subnetwork = "default"
}

NOTE: Because this module is not hosted on Terraform Registry, the version must be strictly controlled via revision syntax on the source line.

Service Account

It is recommended to generate a compute type service account via slurm_sa_iam.

Otherwise reference compute service account and IAM to create a self managed compute service account and IAM.

Dependencies

  • Terraform is installed.
  • Compute Engine API is enabled.
  • Python is installed.
    • Required Version: >= 3.6.0, < 4.0.0
  • Pip packages are installed.
    • pip3 install -r ../../../scripts/requirements.txt --user

Module API

For the terraform module API reference, please see README_TF.md.