update docs and yaml
Signed-off-by: limengxuan <[email protected]>
archlitchi committed Sep 27, 2024
1 parent 9041e57 commit d7ccf67
Showing 4 changed files with 42 additions and 5 deletions.
14 changes: 12 additions & 2 deletions README.md
@@ -24,7 +24,7 @@ docker buildx build -t $IMAGE_NAME .

## Deployment

Because of its dependencies on HAMi, the deployment is integrated into the HAMi deployment; you need to set 'devices.ascend.enabled=true'. The device plugin is deployed automatically. For more details, see the 'devices' section in values.yaml.
Because of its dependencies on HAMi, you need to set 'devices.ascend.enabled=true' during HAMi installation. For more details, see the 'devices' section in values.yaml.

```yaml
devices:
@@ -45,9 +45,19 @@ devices:
- huawei.com/Ascend310P-memory
```
Note that the resources here (huawei.com/Ascend910A, huawei.com/Ascend910B, ...) are managed in the hami-scheduler-device ConfigMap, which defines three different templates (910A, 910B, 310P).
Deploy ascend-device-plugin by running:
```bash
kubectl apply -f ascend-device-plugin.yaml
```


## Usage

You can allocate a slice of an NPU by specifying both the resource count and the resource memory. For more examples, see [examples](./examples/).

```yaml
...
containers:
@@ -56,6 +66,6 @@ devices:
resources:
limits:
huawei.com/Ascend910B: "1"
# If device memory is not specified, the whole card is used by default
# If you don't specify Ascend910B-memory, a whole NPU is used.
huawei.com/Ascend910B-memory: "4096"
```
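
For contrast with the sliced request above, here is a minimal sketch of a Pod that takes a whole NPU by leaving out the memory limit, as the comment in the snippet describes. The Pod name `ascend910b-whole-npu` is hypothetical; the image and command are borrowed from `examples/ascendjob-910b.yaml` and are assumptions about what you would actually run.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ascend910b-whole-npu   # hypothetical name, for illustration only
spec:
  containers:
    - name: ubuntu-container
      image: ascendhub.huawei.com/public-ascendhub/ascend-mindspore:23.0.RC3-centos7
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          # No huawei.com/Ascend910B-memory entry: per the note above,
          # the container is then given a whole NPU rather than a slice.
          huawei.com/Ascend910B: 1
```

Adding `huawei.com/Ascend910B-memory` back to the limits turns this into a sliced request again.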
7 changes: 4 additions & 3 deletions ascend-device-plugin.yaml
@@ -1,3 +1,4 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
@@ -9,7 +10,7 @@ rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
----
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
@@ -22,15 +23,15 @@ roleRef:
  kind: ClusterRole
  name: hami-ascend
  apiGroup: rbac.authorization.k8s.io
----
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hami-ascend
  namespace: kube-system
  labels:
    app.kubernetes.io/component: "hami-ascend"
----
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
13 changes: 13 additions & 0 deletions examples/ascendjob-310p.yaml
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Pod
metadata:
  name: ascend310p-job
spec:
  containers:
    - name: ubuntu-container
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          huawei.com/Ascend310P: 1 # requesting 1 NPU
          huawei.com/Ascend310P-memory: 2000 # requesting 2000m device memory
13 changes: 13 additions & 0 deletions examples/ascendjob-910b.yaml
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Pod
metadata:
  name: ascend910b-job
spec:
  containers:
    - name: ubuntu-container
      image: ascendhub.huawei.com/public-ascendhub/ascend-mindspore:23.0.RC3-centos7
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          huawei.com/Ascend910B: 1 # requesting 1 NPU
          huawei.com/Ascend910B-memory: 2000 # requesting 2000m device memory
