Deploying paddlenlp-uie inference on a Huawei Ascend NPU 910B4 under k8s with paddlepaddle==2.5.2 / paddlenlp==2.6.1 / paddleocr==2.6.1.3 — inference is extremely slow. What is going on? Guidance appreciated. #8606
Comments
The NPU k8s deployment file is as follows:
The Dockerfile is as follows:

```dockerfile
RUN pip install --disable-pip-version-check --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple paddlepaddle==2.5.2
# Copy the code into the working directory
COPY . /usr/src/app/
WORKDIR /usr/src/app
# Command executed when the container starts
CMD ["python", "/usr/src/app/medical_report_ocr.py"]
```
```
(base) PS C:\Users\12133> kubectl get pod -n hwei
(base) PS C:\Users\12133> kubectl describe pod hwei-ocr -n hwei
(base) PS C:\Users\12133> kubectl logs -f hwei-ocr -n hwei
[2024-06-14 12:52:16,925] [    INFO] - All the weights of UIEM were initialized from the model checkpoint at /home/ma-user/.paddlenlp/taskflow/information_extraction/uie-m-base.
```

Inference does run, but it is extremely slow: information extraction on a single hospital test report took 5840.636544704437 s. What could be causing this? Are the paddlepaddle / paddlenlp / paddleocr versions wrong, or is some other configuration the problem? Please advise, thanks!
Do I need to build armv8 versions of paddlepaddle, paddlenlp, and paddleocr myself? Or do I need to pin a specific NPU when running inference, e.g. with a command like `python -npu -device app.py`? The CPU and NPU resources I requested are as follows:
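For reference, an Ascend NPU request in a pod spec typically looks like the sketch below. The resource name (`huawei.com/Ascend910` here) and the CPU/memory figures are assumptions — the exact name depends on the Ascend device plugin version deployed in the cluster:

```yaml
# Hypothetical pod spec fragment: requesting one Ascend 910 card via the
# Huawei k8s device plugin (resource name varies by plugin version).
resources:
  requests:
    cpu: "4"
    memory: 16Gi
    huawei.com/Ascend910: 1
  limits:
    huawei.com/Ascend910: 1
```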
The medical_report_ocr.py referenced by `CMD ["python", "/usr/src/app/medical_report_ocr.py"]` begins as follows:

```python
import time

import paddlenlp
import paddleocr

print(paddlenlp.__version__)  # paddlenlp.version is not the version string

start = time.time()
start1 = time.time()
```
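To find out where the 5840 s actually goes, it helps to time the OCR stage and the UIE extraction stage separately instead of the whole script. A minimal, library-free sketch of that pattern — the two stage functions are placeholders standing in for the real paddleocr and Taskflow calls:

```python
import time

def run_ocr(image_path):
    # Placeholder for the paddleocr call on the report image.
    return ["line 1", "line 2"]

def run_uie(text_lines):
    # Placeholder for the paddlenlp Taskflow information-extraction call.
    return {"fields": text_lines}

def timed(label, fn, *args):
    """Run fn(*args), print how long it took, and return its result."""
    t0 = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.3f}s")
    return result

lines = timed("ocr", run_ocr, "report.png")
fields = timed("uie", run_uie, lines)
```

Whichever label dominates the printed timings tells you which stage (and which library) to investigate first.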
@guoshengCS Could you please take a look? Many thanks!
@AllenMeng2009 I'd like to ask: did loading openvino in Taskflow give you any speedup?
@daytime25 Hi! I haven't used openvino acceleration. If I want to, is it just `pip install --upgrade --user openvino-dev`, or do I also need the paddlenlp_ov.zip package? (Where can that be downloaded?) Only then would the statement below take effect, right? Thanks!
@daytime25 Hi! I have now installed it with `pip install --upgrade --user openvino-dev` and set `predictor_type="openvino-inference"` in Taskflow, but it had no effect. Do I need to download paddlenlp_ov.zip? And then
@AllenMeng2009 You need to download paddlenlp_ov.zip; its code differs from the original paddlenlp. I hit an ENABLE_TORCH_CHECKPOINT error and fixed it by changing `from paddlenlp.utils.env` to `from paddlenlp_ov.utils.env` in model_utils.py. In my case (an Intel CPU) this cut the time from 30-odd seconds down to 18 seconds. The remaining problem is that deploying it as serving errors out; a post says the output has to be changed for that to work.
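The one-line import rewrite described above can be applied mechanically. A self-contained sketch — the file content is an in-memory string here, and the imported symbol name is an assumption based on the error quoted above:

```python
# Illustrative: rewrite the model_utils.py import the way described above.
original = "from paddlenlp.utils.env import ENABLE_TORCH_CHECKPOINT\n"
patched = original.replace("from paddlenlp.utils.env",
                           "from paddlenlp_ov.utils.env")
print(patched)
```

In practice you would read model_utils.py, apply the same `str.replace`, and write the file back.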
@daytime25 Hi! Where can paddlenlp_ov.zip be downloaded? Does openvino acceleration also help on NVIDIA GPUs? I have since switched to two NVIDIA A800 80G GPUs; loading the uie-x-base model takes about 7 s and inference on one medical test report takes 5-10 s, which is still slow. Is there any other way to speed this up? Thanks!
Hi, I'd like to ask: did you deploy with the project's docker image? On the Ascend server, what versions of the driver, firmware, and so on did you install — on the host machine or inside the container — and what packages and version requirements are there?
Please describe your question
See the title.