- Config `skuTypes`

  **Description:**

  A `skuType` defines a resource unit in all resource dimensions.

  **Notes:**
  - It is like the Azure VM Series or GCP Machine Types.
  - Currently, `skuTypes` is not directly used by HivedScheduler, but it is used by OpenPAI RestServer to set up proportional Pod resource requests and limits. So, if you are not using OpenPAI RestServer, you can skip configuring it.
  **Example:**

  Assume you have some `K80` nodes of the same SKU in your cluster, and you want to schedule Pods on them:
  - Use `kubectl describe nodes` to check whether these `K80` nodes have nearly the same (Allocatable Resources - All Daemon Pods Requests, such as Pods for Device Plugin, Network Plugin, etc.), especially for gpu, cpu and memory. If not, please fix it. Assume the aligned minimal resources are: 4 gpus, 23 cpus, and 219GB memory.
  - Then proportionally, each gpu request should also carry floor(23/4)=5 cpus and floor(219/4)=54GB memory along with it, so config the `K80` `skuType` as below:

    ```yaml
    physicalCluster:
      skuTypes:
        K80:
          gpu: 1
          cpu: 5
          memory: 54Gi
    ```
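  For instance, if a job asks for 2 `K80` SKUs, OpenPAI RestServer would scale these per-gpu proportions accordingly. A minimal sketch of the resulting Pod resources, assuming the usual K8S resource fields (the exact fields RestServer sets are an assumption here):

  ```yaml
  # Hypothetical Pod resources for a 2-SKU K80 request, scaled from the
  # per-gpu proportions above: 2 gpus, 2*5=10 cpus, 2*54Gi=108Gi memory.
  resources:
    limits:
      nvidia.com/gpu: 2
      cpu: 10
      memory: 108Gi
  ```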
- Config `cellTypes`

  **Description:**

  A `cellType` defines a resource topology of a `skuType`.

  **Notes:**
  - `skuTypes` are also `cellTypes`, but they are all leaf `cellTypes` which have no further internal topology.
  **Example:**
  - Use `nvidia-smi topo --matrix` to figure out the gpu topology on one of the above `K80` nodes:

    ```
            GPU0    GPU1    GPU2    GPU3    CPU Affinity
    GPU0     X      NODE    NODE    NODE    0-11
    GPU1    NODE     X      NODE    NODE    0-11
    GPU2    NODE    NODE     X      NODE    0-11
    GPU3    NODE    NODE    NODE     X      0-11
    ```
  - These 4 gpus are equivalent under the node, so config the `K80-NODE` `cellType` as below:

    ```yaml
    physicalCluster:
      cellTypes:
        K80-NODE:
          childCellType: K80
          childCellNumber: 4
          isNodeLevel: true
    ```
  - Assume you have 3 of the above `K80` nodes under the same network switch or in the same pool, so config the `K80-NODE-POOL` `cellType` as below:

    ```yaml
    physicalCluster:
      cellTypes:
        K80-NODE-POOL:
          childCellType: K80-NODE
          childCellNumber: 3
    ```
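  `cellTypes` can also nest more than one level deep. A hypothetical sketch (this is not the flat `K80` topology above): if the 4 gpus were instead paired under 2 CPU sockets, an intermediate, non-node-level `cellType` could model that:

  ```yaml
  # Hypothetical deeper topology (assumed layout): 2 sockets per node,
  # 2 gpus per socket; only K80-NODE is marked as node level.
  physicalCluster:
    cellTypes:
      K80-SOCKET:
        childCellType: K80
        childCellNumber: 2
      K80-NODE:
        childCellType: K80-SOCKET
        childCellNumber: 2
        isNodeLevel: true
  ```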
- Config `physicalCells`

  **Description:**

  A `physicalCell` defines a resource instance, i.e. a `cellType` instantiated by a specific set of physical devices.

  **Example:**
  - Assume the above 3 `K80` nodes have K8S node names `node1`, `node2` and `node3`, so config a `K80-NODE-POOL` `physicalCell` as below:

    ```yaml
    physicalCluster:
      physicalCells:
      - cellType: K80-NODE-POOL
        cellChildren:
        - cellAddress: node1
        - cellAddress: node2
        - cellAddress: node3
    ```
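  If the cluster grows, more instances can be listed side by side. A sketch with a hypothetical second pool (node names `node4`, `node5` and `node6` are assumptions):

  ```yaml
  # Hypothetical extension: two K80-NODE-POOL instances; node4-node6 are
  # assumed names for a second set of K80 nodes.
  physicalCluster:
    physicalCells:
    - cellType: K80-NODE-POOL
      cellChildren:
      - cellAddress: node1
      - cellAddress: node2
      - cellAddress: node3
    - cellType: K80-NODE-POOL
      cellChildren:
      - cellAddress: node4
      - cellAddress: node5
      - cellAddress: node6
  ```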
- Config `virtualClusters`

  **Description:**

  A `virtualCluster` defines a resource guaranteed quota in terms of `cellTypes`.

  **Example:**
  - Assume you want to partition the above 3 `K80` nodes into 2 virtual clusters: vc1 with 1 node and vc2 with 2 nodes, so config the `vc1` and `vc2` `virtualCluster` as below:

    ```yaml
    virtualClusters:
      vc1:
        virtualCells:
        - cellType: K80-NODE-POOL.K80-NODE
          cellNumber: 1
      vc2:
        virtualCells:
        - cellType: K80-NODE-POOL.K80-NODE
          cellNumber: 2
    ```

    **Notes:**
    - The name of a `virtualCluster` should be constrained by the K8S naming convention.
    - The `virtualCells.cellType` should be fully qualified and should start with a `cellType` which is explicitly referred to in `physicalCells`.
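    Quota can also be guaranteed at a finer granularity than whole nodes. A hypothetical sketch, assuming the fully qualified path can be extended down to the leaf `cellType` (this split is an assumption, not part of the partition above):

    ```yaml
    # Hypothetical alternative: guarantee vc1 two single K80 gpus instead
    # of a whole node, by qualifying the path down to the leaf cellType.
    virtualClusters:
      vc1:
        virtualCells:
        - cellType: K80-NODE-POOL.K80-NODE.K80
          cellNumber: 2
    ```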
- Put it together

  **Example:**

  Finally, after the above steps, your config would be:

  ```yaml
  physicalCluster:
    skuTypes:
      K80:
        gpu: 1
        cpu: 5
        memory: 54Gi
    cellTypes:
      K80-NODE:
        childCellType: K80
        childCellNumber: 4
        isNodeLevel: true
      K80-NODE-POOL:
        childCellType: K80-NODE
        childCellNumber: 3
    physicalCells:
    - cellType: K80-NODE-POOL
      cellChildren:
      - cellAddress: node1
      - cellAddress: node2
      - cellAddress: node3
  virtualClusters:
    vc1:
      virtualCells:
      - cellType: K80-NODE-POOL.K80-NODE
        cellNumber: 1
    vc2:
      virtualCells:
      - cellType: K80-NODE-POOL.K80-NODE
        cellNumber: 2
  ```
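With this config deployed, a Pod can then request cells from a virtual cluster through the HivedScheduler pod-scheduling-spec annotation. A minimal sketch follows; the annotation field names shown (`leafCellType`, `leafCellNumber`, `affinityGroup`) are assumptions about the scheduler's request format and may differ across versions:

```yaml
# Hypothetical Pod requesting 1 K80 leaf cell from vc1; the annotation
# fields below are assumed and should be checked against your version.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  annotations:
    hivedscheduler.microsoft.com/pod-scheduling-spec: |-
      virtualCluster: vc1
      priority: 1000
      leafCellType: K80
      leafCellNumber: 1
      affinityGroup: null
spec:
  schedulerName: hivedscheduler
  containers:
  - name: main
    image: ubuntu:18.04
    command: ["sleep", "infinity"]
```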
To leverage this scheduler to schedule GPUs, if one container in the Pod wants to use the GPUs allocated for the whole Pod, it could contain the below environment variables:
- NVIDIA GPUs

  ```yaml
  env:
  - name: NVIDIA_VISIBLE_DEVICES
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']
  ```

  The scheduler directly delivers the GPU isolation decision to nvidia-container-runtime through the Pod Env `NVIDIA_VISIBLE_DEVICES`.

- AMD GPUs

  ```yaml
  env:
  - name: AMD_VISIBLE_DEVICES
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']
  ```

  The scheduler directly delivers the GPU isolation decision to rocm-container-runtime through the Pod Env `AMD_VISIBLE_DEVICES`.
The annotation referred to by the env will be populated by the scheduler when it binds the Pod.
If multiple containers in the Pod contain the env, the allocated GPUs are all visible to them, so it is up to these containers to control how to share these GPUs.
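For example, a minimal sketch of two containers that both declare the env (the container names `trainer` and `monitor` are hypothetical); each sees all leaf cells allocated to the Pod:

```yaml
# Both containers read the same isolation annotation, so both see every
# gpu allocated to the Pod; how they share the gpus is up to them.
containers:
- name: trainer
  env:
  - name: NVIDIA_VISIBLE_DEVICES
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']
- name: monitor
  env:
  - name: NVIDIA_VISIBLE_DEVICES
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']
```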