Commit 362da3d

win5923, troychiu, Future-Outlier, and rueian authored
RayJob Volcano Integration (#3972)
* modify batch scheduler interface to support CRD other than RayCluster
  Signed-off-by: Troy Chiu <[email protected]>
* [Feature] RayJob Volcano integration
  Signed-off-by: win5923 <[email protected]>
* Modify kai scheduler and scheduler plugins
  Signed-off-by: win5923 <[email protected]>
* Revert interface migration
  Signed-off-by: win5923 <[email protected]>
* Append submitter resources
  Signed-off-by: win5923 <[email protected]>
* Remove empty ResourceList
  Signed-off-by: win5923 <[email protected]>
* Add K8sJobMode check to prevent YuniKorn from adding submitter pod annotations
  Signed-off-by: win5923 <[email protected]>
* Apply Troy's comments
  Signed-off-by: win5923 <[email protected]>
* Log more information
  Signed-off-by: win5923 <[email protected]>
* Add sidecarmode
  Signed-off-by: win5923 <[email protected]>
* fix
  Signed-off-by: Future-Outlier <[email protected]>
* Update Rueian's advice
  Signed-off-by: Future-Outlier <[email protected]>
  Co-authored-by: Rueian <[email protected]>

---------

Signed-off-by: Troy Chiu <[email protected]>
Signed-off-by: win5923 <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: Troy Chiu <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Co-authored-by: Rueian <[email protected]>
1 parent c6bafa3 commit 362da3d

4 files changed: +581 -61 lines changed
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: kuberay-test-queue
+spec:
+  weight: 1
+  capability:
+    cpu: 4
+    memory: 6Gi
+---
+apiVersion: ray.io/v1
+kind: RayJob
+metadata:
+  name: rayjob-sample-0
+  labels:
+    ray.io/scheduler-name: volcano
+    volcano.sh/queue-name: kuberay-test-queue
+spec:
+  entrypoint: python /home/ray/samples/sample_code.py
+  runtimeEnvYAML: |
+    pip:
+      - requests==2.26.0
+      - pendulum==2.1.2
+    env_vars:
+      counter_name: "test_counter"
+  rayClusterSpec:
+    rayVersion: '2.46.0'
+    headGroupSpec:
+      rayStartParams: {}
+      template:
+        spec:
+          containers:
+          - name: ray-head
+            image: rayproject/ray:2.46.0
+            ports:
+            - containerPort: 6379
+              name: gcs-server
+            - containerPort: 8265
+              name: dashboard
+            - containerPort: 10001
+              name: client
+            resources:
+              limits:
+                cpu: "1"
+                memory: "2Gi"
+              requests:
+                cpu: "1"
+                memory: "2Gi"
+            volumeMounts:
+            - mountPath: /home/ray/samples
+              name: code-sample
+          volumes:
+          - name: code-sample
+            configMap:
+              name: ray-job-code-sample
+              items:
+              - key: sample_code.py
+                path: sample_code.py
+    workerGroupSpecs:
+    - replicas: 2
+      minReplicas: 2
+      maxReplicas: 2
+      groupName: small-group
+      rayStartParams: {}
+      template:
+        spec:
+          containers:
+          - name: ray-worker
+            image: rayproject/ray:2.46.0
+            resources:
+              limits:
+                cpu: "1"
+                memory: "1Gi"
+              requests:
+                cpu: "1"
+                memory: "1Gi"
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ray-job-code-sample
+data:
+  sample_code.py: |
+    import ray
+    import os
+    import requests
+
+    ray.init()
+
+    @ray.remote
+    class Counter:
+        def __init__(self):
+            # Used to verify runtimeEnv
+            self.name = os.getenv("counter_name")
+            assert self.name == "test_counter"
+            self.counter = 0
+
+        def inc(self):
+            self.counter += 1
+
+        def get_counter(self):
+            return "{} got {}".format(self.name, self.counter)
+
+    counter = Counter.remote()
+
+    for _ in range(5):
+        ray.get(counter.inc.remote())
+        print(ray.get(counter.get_counter.remote()))
+
+    # Verify that the correct runtime env was used for the job.
+    assert requests.__version__ == "2.26.0"
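
A quick sanity check on the sample above: the head pod requests 1 CPU / 2Gi and the two workers request 1 CPU / 1Gi each, which should fit inside the `kuberay-test-queue` capability of 4 CPU / 6Gi. The sketch below works through that arithmetic in plain Python. It is not KubeRay or Volcano code; `parse_quantity`, `total_requests`, and `fits` are hypothetical helpers written for this illustration, and the parser only handles the common quantity suffixes. The commit message mentions appending submitter resources, but the actual submitter figures are not shown in this diff, so the submitter entry is a placeholder assumption.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (illustrative subset).
_SUFFIXES = {
    "Ki": Decimal(2**10), "Mi": Decimal(2**20),
    "Gi": Decimal(2**30), "Ti": Decimal(2**40),
    "m": Decimal("0.001"), "k": Decimal(10**3),
    "M": Decimal(10**6), "G": Decimal(10**9),
}


def parse_quantity(value) -> Decimal:
    """Convert a quantity such as "6Gi", "500m", or 4 into a plain number."""
    text = str(value)
    # Try two-character suffixes first so "Mi" is not mistaken for a bare "i".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def total_requests(pods):
    """Sum cpu and memory over (replicas, requests) pairs."""
    totals = {"cpu": Decimal(0), "memory": Decimal(0)}
    for replicas, requests in pods:
        for name in totals:
            totals[name] += replicas * parse_quantity(requests[name])
    return totals


def fits(totals, capability) -> bool:
    """True if every summed resource is within the queue capability."""
    return all(totals[name] <= parse_quantity(limit)
               for name, limit in capability.items())


# Values copied from the manifest in this diff.
queue_capability = {"cpu": 4, "memory": "6Gi"}
ray_pods = [
    (1, {"cpu": "1", "memory": "2Gi"}),  # head group
    (2, {"cpu": "1", "memory": "1Gi"}),  # small-group workers
]
totals = total_requests(ray_pods)
cluster_fits = fits(totals, queue_capability)

# Hypothetical submitter pod; these figures are an assumption, not from the diff.
submitter = (1, {"cpu": "500m", "memory": "200Mi"})
totals_with_submitter = total_requests(ray_pods + [submitter])
job_fits = fits(totals_with_submitter, queue_capability)
```

Under these assumptions the Ray pods alone total 3 CPU and 4Gi, leaving headroom in the queue for a modest submitter pod. Submitting a second RayJob of the same shape to this queue would exceed the 4 CPU capability.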
