Merge pull request #1 from Project-HAMi/update
Add README.md and corresponding yaml file
archlitchi authored Sep 14, 2024
2 parents aa1f98a + ced24cb commit b6df990
Showing 3 changed files with 184 additions and 10 deletions.
21 changes: 11 additions & 10 deletions README.md
@@ -1,30 +1,30 @@
 # Ascend Device Plugin

-## Description
+## Introduction

-An Ascend device plugin based on the [HAMi](https://github.com/Project-HAMi/HAMi) scheduling mechanism.
+This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) scheduling.

-Scheduling by device memory is supported. Device memory is sliced according to Ascend virtualization templates, and the smallest template that satisfies the memory request is used as the container's device memory.
+Memory slicing is supported based on virtualization templates; the smallest available template that satisfies the memory request is used automatically. For details, see the [template](./config.yaml) file.

-Starting containers depends on [ascend-docker-runtime](https://gitee.com/ascend/ascend-docker-runtime)
+## Prerequisites

-## Build
+[ascend-docker-runtime](https://gitee.com/ascend/ascend-docker-runtime)

-### Build the binary
+## Compile

 ```bash
 make all
 ```

-### Build the image
+### Build

 ```bash
 docker buildx build -t $IMAGE_NAME .
 ```

-## Deployment
+## Deployment

-Because of dependencies on HAMi, deployment is integrated into the HAMi deployment; modify the following part of the HAMi chart values.
+Because of its dependencies on HAMi, deployment is integrated into the HAMi deployment: set `devices.ascend.enabled=true` and the device plugin is deployed automatically. For more details, see the `devices` section in values.yaml.

 ```yaml
 devices:
@@ -45,7 +45,8 @@ devices:
       - huawei.com/Ascend310P-memory
 ```
-## Usage
+## Usage
 ```yaml
 ...
60 changes: 60 additions & 0 deletions README_cn.md
@@ -0,0 +1,60 @@
# Ascend Device Plugin

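## Description

An Ascend device plugin built on the [HAMi](https://github.com/Project-HAMi/HAMi) scheduling mechanism.

Scheduling by device memory is supported. Device memory is sliced according to Ascend virtualization templates: the smallest template that satisfies the requested memory is selected and used as the container's device memory.

Starting containers requires [ascend-docker-runtime](https://gitee.com/ascend/ascend-docker-runtime).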

## Build

### Build the binary

```bash
make all
```

### Build the image

```bash
docker buildx build -t $IMAGE_NAME .
```

## Deployment

Because of its dependencies on HAMi, deployment is integrated into the HAMi deployment; modify the following part of the HAMi chart values.

```yaml
devices:
  ascend:
    enabled: true
    image: "ascend-device-plugin:master"
    imagePullPolicy: IfNotPresent
    extraArgs: []
    nodeSelector:
      ascend: "on"
    tolerations: []
    resources:
      - huawei.com/Ascend910A
      - huawei.com/Ascend910A-memory
      - huawei.com/Ascend910B
      - huawei.com/Ascend910B-memory
      - huawei.com/Ascend310P
      - huawei.com/Ascend310P-memory
```
## Usage
```yaml
...
containers:
  - name: npu_pod
    ...
    resources:
      limits:
        huawei.com/Ascend910B: "1"
        # If memory is not specified, the whole card is used by default
        huawei.com/Ascend910B-memory: "4096"
```
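
For reference, the usage fragment above could be expanded into a complete Pod manifest along the following lines. This is a minimal sketch, not a tested configuration: the Pod name, container name, image, and command are hypothetical placeholders, and only the two `huawei.com/Ascend910B*` resource names and their values come from the README. Kubernetes requires container names to be valid DNS labels, so the sketch uses a hyphenated name.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ascend-demo                # hypothetical Pod name
spec:
  containers:
    - name: npu-container          # hypothetical; must be a valid DNS label
      image: ubuntu:22.04          # placeholder; substitute an image built for Ascend workloads
      command: ["sleep", "infinity"]
      resources:
        limits:
          # Request one Ascend 910B device
          huawei.com/Ascend910B: "1"
          # Requested device memory; omit this line to use the whole card
          huawei.com/Ascend910B-memory: "4096"
```

Per the README, the plugin would map the memory request to the smallest virtualization template that satisfies it, so the memory actually assigned to the container may be larger than the value requested.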
113 changes: 113 additions & 0 deletions ascend-device-plugin.yaml
@@ -0,0 +1,113 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hami-ascend
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "update", "watch", "patch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hami-ascend
subjects:
  - kind: ServiceAccount
    name: hami-ascend
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: hami-ascend
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hami-ascend
  namespace: kube-system
  labels:
    app.kubernetes.io/component: "hami-ascend"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hami-ascend-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/component: hami-ascend-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: hami-ascend-device-plugin
      hami.io/webhook: ignore
  template:
    metadata:
      labels:
        app.kubernetes.io/component: hami-ascend-device-plugin
        hami.io/webhook: ignore
    spec:
      priorityClassName: "system-node-critical"
      serviceAccountName: hami-ascend
      containers:
        - image: projecthami/ascend-device-plugin:main
          imagePullPolicy: IfNotPresent
          name: device-plugin
          resources:
            requests:
              memory: 500Mi
              cpu: 500m
            limits:
              memory: 500Mi
              cpu: 500m
          args:
            - --config_file
            - /ascend-config.yaml
          securityContext:
            privileged: true
            readOnlyRootFilesystem: false
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: pod-resource
              mountPath: /var/lib/kubelet/pod-resources
            - name: hiai-driver
              mountPath: /usr/local/Ascend/driver
              readOnly: true
            - name: log-path
              mountPath: /var/log/mindx-dl/devicePlugin
            - name: tmp
              mountPath: /tmp
            - name: device-config
              mountPath: /ascend-config.yaml
              subPath: ascend-config.yaml
              readOnly: true
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: pod-resource
          hostPath:
            path: /var/lib/kubelet/pod-resources
        - name: hiai-driver
          hostPath:
            path: /usr/local/Ascend/driver
        - name: log-path
          hostPath:
            path: /var/log/mindx-dl/devicePlugin
            type: Directory
        - name: tmp
          hostPath:
            path: /tmp
        - name: device-config
          configMap:
            name: hami-scheduler-device
      nodeSelector:
        ascend: "on"
