  1. Config skuTypes

    Description:

    A skuType defines a resource unit in all resource dimensions.

    Notes:

    1. It is like the Azure VM Series or GCP Machine Types.
    2. Currently, skuTypes are not directly used by the HivedScheduler itself; they are used by the OpenPAI RestServer to set up proportional Pod resource requests and limits. So, if you are not using the OpenPAI RestServer, you can skip configuring them.

    Example:

    Assume you have some K80 nodes of the same SKU in your cluster, and you want to schedule Pods on them:

    1. Use kubectl describe nodes to check whether these K80 nodes have nearly the same remaining resources (Allocatable Resources minus the requests of all daemon Pods, such as the Pods for the Device Plugin, Network Plugin, etc.), especially for gpu, cpu, and memory. If not, please fix it. Assume the aligned minimal resources are: 4 gpus, 23 cpus, and 219GB memory. A sketch of what to look for follows.
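      For instance, the Allocatable section in the kubectl describe nodes output looks roughly like the snippet below (the values here are illustrative, and the nvidia.com/gpu resource name assumes the NVIDIA device plugin is deployed); remember to subtract all daemon Pod requests from it before aligning:

        Allocatable:
          cpu:             24
          memory:          225Gi
          nvidia.com/gpu:  4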

    2. Then, proportionally, each gpu request should also carry floor(23/4)=5 cpus and floor(219/4)=54GB memory along with it, so configure the K80 skuType as below:

      physicalCluster:
        skuTypes:
          K80:
            gpu: 1
            cpu: 5
            memory: 54Gi
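      As a hedged illustration of that proportionality (the exact resource fields the RestServer sets are not documented here, so treat this shape as an assumption), a Pod allocated 2 K80 cells would proportionally receive:

        resources:
          requests:
            cpu: "10"       # 2 x 5 cpus from the skuType
            memory: 108Gi   # 2 x 54Gi from the skuType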
  2. Config cellTypes

    Description:

    A cellType defines a resource topology of a skuType.

    Notes:

    1. skuTypes are also cellTypes, but they are all leaf cellTypes, which have no further internal topology.

    Example:

    1. Use nvidia-smi topo --matrix to figure out the gpu topology on one of the above K80 nodes:

              GPU0    GPU1    GPU2    GPU3    CPU Affinity
      GPU0     X      NODE    NODE    NODE    0-11
      GPU1    NODE     X      NODE    NODE    0-11
      GPU2    NODE    NODE     X      NODE    0-11
      GPU3    NODE    NODE    NODE     X      0-11
      
    2. In this matrix, NODE means the connection traverses PCIe and the interconnect between PCIe host bridges within a NUMA node, so these 4 gpus are pairwise equivalent under the node; configure the K80-NODE cellType as below:

      physicalCluster:
        cellTypes:
          K80-NODE:
            childCellType: K80
            childCellNumber: 4
            isNodeLevel: true
    3. Assume you have 3 of the above K80 nodes under the same network switch or in the same pool, so configure the K80-NODE-POOL cellType as below:

      physicalCluster:
        cellTypes:
          K80-NODE-POOL:
            childCellType: K80-NODE
            childCellNumber: 3
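      Together, these two cellTypes describe the whole hierarchy: one K80-NODE-POOL contains 3 K80-NODE cells, i.e. 3 x 4 = 12 K80 leaf cells (12 gpus) in total.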
  3. Config physicalCells

    Description:

    A physicalCell defines a resource instance, i.e. a cellType instantiated by a specific set of physical devices.

    Example:

    1. Assume the above 3 K80 nodes have the K8S node names node1, node2 and node3, so configure a K80-NODE-POOL physicalCell as below:
      physicalCluster:
        physicalCells:
        - cellType: K80-NODE-POOL
          cellChildren:
          - cellAddress: node1
          - cellAddress: node2
          - cellAddress: node3
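      You can confirm these names with kubectl get nodes; the cellAddress of each node-level cell is its K8S node name.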
  4. Config virtualClusters

    Description:

    A virtualCluster defines a guaranteed resource quota in terms of cellTypes.

    Example:

    1. Assume you want to partition the above 3 K80 nodes into 2 virtual clusters: vc1 with 1 node and vc2 with 2 nodes, so configure the vc1 and vc2 virtualClusters as below:
      virtualClusters:
        vc1:
          virtualCells:
          - cellType: K80-NODE-POOL.K80-NODE
            cellNumber: 1
        vc2:
          virtualCells:
          - cellType: K80-NODE-POOL.K80-NODE
            cellNumber: 2
      Notes:
      1. The name of a virtualCluster should conform to the K8S naming convention.
      2. The virtualCells.cellType should be fully qualified and should start with a cellType that is explicitly referenced in physicalCells.
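      3. For example, K80-NODE-POOL.K80-NODE above is fully qualified and starts with K80-NODE-POOL, which is explicitly referenced in physicalCells; a bare K80-NODE would not satisfy this.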
  5. Put it together

    Example:

    Finally, after the above steps, your config would be:

    physicalCluster:
      skuTypes:
        K80:
          gpu: 1
          cpu: 5
          memory: 54Gi
      cellTypes:
        K80-NODE:
          childCellType: K80
          childCellNumber: 4
          isNodeLevel: true
        K80-NODE-POOL:
          childCellType: K80-NODE
          childCellNumber: 3
      physicalCells:
      - cellType: K80-NODE-POOL
        cellChildren:
        - cellAddress: node1
        - cellAddress: node2
        - cellAddress: node3
    
    virtualClusters:
      vc1:
        virtualCells:
        - cellType: K80-NODE-POOL.K80-NODE
          cellNumber: 1
      vc2:
        virtualCells:
        - cellType: K80-NODE-POOL.K80-NODE
          cellNumber: 2

Detail Example

To leverage this scheduler to schedule GPUs, if one container in the Pod wants to use the GPUs allocated to the whole Pod, it can include the below environment variables:

  • NVIDIA GPUs

    env:
    - name: NVIDIA_VISIBLE_DEVICES
      valueFrom:
        fieldRef:
          fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']

    The scheduler directly delivers the GPU isolation decision to nvidia-container-runtime through the Pod env NVIDIA_VISIBLE_DEVICES.

  • AMD GPUs

    env:
    - name: AMD_VISIBLE_DEVICES
      valueFrom:
        fieldRef:
          fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']

    The scheduler directly delivers the GPU isolation decision to rocm-container-runtime through the Pod env AMD_VISIBLE_DEVICES.

The annotation referenced by the env will be populated by the scheduler when it binds the Pod.

If multiple containers in the Pod include the env, the allocated GPUs are all visible to each of them, so it is up to these containers to decide how to share the GPUs.
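Putting it together, below is a minimal sketch of a Pod that requests one K80 leaf cell from vc1 and consumes the isolation decision through the env above. The pod-scheduling-spec annotation fields (virtualCluster, priority, leafCellType, leafCellNumber), the scheduler name, and the container image are assumptions here; verify them against your HivedScheduler deployment and its bundled examples.

  apiVersion: v1
  kind: Pod
  metadata:
    name: demo-pod  # hypothetical name
    annotations:
      hivedscheduler.microsoft.com/pod-scheduling-spec: |-
        virtualCluster: vc1
        priority: 0
        leafCellType: K80
        leafCellNumber: 1
  spec:
    schedulerName: hivedscheduler  # assumes the scheduler is deployed under this name
    containers:
    - name: worker
      image: nvidia/cuda:11.0-base  # hypothetical image
      env:
      - name: NVIDIA_VISIBLE_DEVICES
        valueFrom:
          fieldRef:
            fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']

Once the Pod is bound, the pod-leaf-cell-isolation annotation carries the isolation decision, so NVIDIA_VISIBLE_DEVICES restricts the container to exactly the allocated devices.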