
KEP: add scheduler policy design doc and code #152

Merged
merged 1 commit on Apr 22, 2024

Conversation

lengrongfu (Member)

issue: #141

@archlitchi (Collaborator)

Yes, but we need to take NUMA information into consideration. Because we bind GPUs to tasks in hami-scheduler, we can't rely on the kubelet's 'TopologyManager' configuration. If we implement the 'binpack' and 'spread' schedule policies combined with NUMA, we need to implement 4 strategies: 'binpack-numaEnforce', 'binpack-numaBesteffort', 'spread-numaEnforce', and 'spread-numaBesteffort'.

Information about TopologyManager and NUMA policy: https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/topology-manager/
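
A sketch of how the four combined strategies could be requested per pod, reusing the hami.io annotation pattern from the test cases below; the combined values are only the names proposed in this comment, not an implemented API:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-numa
  annotations:
    # Proposed combined value; the other three would be
    # binpack-numaBesteffort, spread-numaEnforce, and spread-numaBesteffort.
    hami.io/gpu-scheduler-policy: "binpack-numaEnforce"
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10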

@lengrongfu (Member, Author)

(Quoting @archlitchi's comment above.)

For NUMA scheduling we can use the noderesourcetopology plugin from the scheduler-plugins project: https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/noderesourcetopology
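
For reference, a minimal sketch of enabling that plugin in a scheduler profile, following the scheduler-plugins README; the apiVersion and profile name vary by Kubernetes release, and this is an illustrative sketch rather than HAMi configuration:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: topology-aware-scheduler   # illustrative profile name
    plugins:
      filter:
        enabled:
          - name: NodeResourceTopologyMatch   # filter nodes whose NUMA zones cannot fit the pod
      score:
        enabled:
          - name: NodeResourceTopologyMatch   # score nodes by NUMA-zone fit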

@lengrongfu lengrongfu force-pushed the add_scheduler_policy branch 2 times, most recently from de5313b to 8294392 on March 12, 2024 10:15
@lengrongfu lengrongfu changed the title from "KEP: add scheduler policy design doc" to "KEP: add scheduler policy design doc and code" on Mar 12, 2024
@lengrongfu lengrongfu force-pushed the add_scheduler_policy branch 3 times, most recently from 9b001e8 to e8f1ccb on March 19, 2024 08:07
@lengrongfu (Member, Author)

@archlitchi @wawa0210 PTAL.

@lengrongfu lengrongfu force-pushed the add_scheduler_policy branch 3 times, most recently from 5a56fec to b927614 on April 10, 2024 01:51
@lengrongfu (Member, Author)

Test Cluster

One cluster with two nodes, and two GPU devices per node.
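
One way to check the GPU capacity each node registers (node names are taken from the test results below; the reported count can differ when device splitting is configured):

$ kubectl describe node controller-node-1 | grep -A 2 'nvidia.com/gpu'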


Test Case

Node binpack policy, GPU binpack policy

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    hami.io/node-scheduler-policy: "binpack"
    hami.io/gpu-scheduler-policy: "binpack"
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    hami.io/node-scheduler-policy: "binpack"
    hami.io/gpu-scheduler-policy: "binpack"  
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10
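
Apply the manifest above and wait for both pods to run (the filename is illustrative):

$ kubectl apply -f binpack-binpack.yaml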

Test Result:

  • The scheduler places both pods on the same node
$ kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE     IP              NODE                NOMINATED NODE   READINESS GATES
gpu-pod    1/1     Running   0          6m55s   10.233.74.99    controller-node-1   <none>           <none>
gpu-pod1   1/1     Running   0          6m55s   10.233.74.114   controller-node-1   <none>           <none>
  • Both pods are allocated the same GPU device; each annotation entry lists device UUID, vendor, memory (MB), and core percentage, matching the requested gpumem and gpucores
$ kubectl get pods gpu-pod -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-e441928e-e386-c020-4f78-dddd4debb238,NVIDIA,1000,10:;
$ kubectl get pods gpu-pod1 -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-e441928e-e386-c020-4f78-dddd4debb238,NVIDIA,1000,10:;

Node binpack policy, GPU spread policy

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    hami.io/node-scheduler-policy: "binpack"
    hami.io/gpu-scheduler-policy: "spread"
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    hami.io/node-scheduler-policy: "binpack"
    hami.io/gpu-scheduler-policy: "spread"  
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10

Test Result:

  • The scheduler places both pods on the same node
$ kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE     IP              NODE            NOMINATED NODE   READINESS GATES
gpu-pod    1/1     Running   0          2m13s   10.233.84.237   worker-node-1   <none>           <none>
gpu-pod1   1/1     Running   0          2m13s   10.233.84.198   worker-node-1   <none>           <none>
  • The two pods are allocated two different GPU devices
$ kubectl get pods gpu-pod -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-a784a920-1cc2-5aee-072f-6d4ea477e2b4,NVIDIA,1000,10:;
$ kubectl get pods gpu-pod1 -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;

Node spread policy, GPU binpack policy

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    hami.io/node-scheduler-policy: "spread"
    hami.io/gpu-scheduler-policy: "binpack"
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    hami.io/node-scheduler-policy: "spread"
    hami.io/gpu-scheduler-policy: "binpack"  
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod2
  annotations:
    hami.io/node-scheduler-policy: "spread"
    hami.io/gpu-scheduler-policy: "binpack"  
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10

Test Result:

  • The scheduler spreads the three pods across the two nodes
$ kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP              NODE                NOMINATED NODE   READINESS GATES
gpu-pod    1/1     Running   0          93s   10.233.74.83    controller-node-1   <none>           <none>
gpu-pod1   1/1     Running   0          93s   10.233.84.247   worker-node-1       <none>           <none>
gpu-pod2   1/1     Running   0          93s   10.233.74.68    controller-node-1   <none>           <none>
  • The three pods are allocated two GPU devices (gpu-pod and gpu-pod2 share one device)
$ kubectl get pods gpu-pod -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-e441928e-e386-c020-4f78-dddd4debb238,NVIDIA,1000,10:;
$ kubectl get pods gpu-pod1 -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;
$ kubectl get pods gpu-pod2 -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-e441928e-e386-c020-4f78-dddd4debb238,NVIDIA,1000,10:;

Node spread policy, GPU spread policy

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    hami.io/node-scheduler-policy: "spread"
    hami.io/gpu-scheduler-policy: "spread"
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    hami.io/node-scheduler-policy: "spread"
    hami.io/gpu-scheduler-policy: "spread"  
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod2
  annotations:
    hami.io/node-scheduler-policy: "spread"
    hami.io/gpu-scheduler-policy: "spread"  
spec:
  containers:
    - name: ubuntu-container
      image: chrstnhntschl/gpu_burn
      args:
      - "6000"
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1000
          nvidia.com/gpucores: 10          

Test Result:

  • The scheduler spreads the three pods across the two nodes
$ kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP              NODE                NOMINATED NODE   READINESS GATES
gpu-pod    1/1     Running   0          25s   10.233.74.125   controller-node-1   <none>           <none>
gpu-pod1   1/1     Running   0          25s   10.233.84.241   worker-node-1       <none>           <none>
gpu-pod2   1/1     Running   0          25s   10.233.74.127   controller-node-1   <none>           <none>
  • The three pods are allocated three different GPU devices
$ kubectl get pods gpu-pod -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-70a7e30d-99a5-1117-8e85-759a592fb582,NVIDIA,1000,10:;
$ kubectl get pods gpu-pod1 -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-a784a920-1cc2-5aee-072f-6d4ea477e2b4,NVIDIA,1000,10:;
$ kubectl get pods gpu-pod2 -o jsonpath="{.metadata.annotations['hami\.io/vgpu-devices-allocated']}"
GPU-e441928e-e386-c020-4f78-dddd4debb238,NVIDIA,1000,10:;

@lengrongfu lengrongfu force-pushed the add_scheduler_policy branch 3 times, most recently from 5db7c18 to ee98a74 on April 22, 2024 05:18
@archlitchi archlitchi merged commit 4f1a323 into Project-HAMi:master Apr 22, 2024
3 checks passed