Paddle installed via Docker on Ascend Atlas 300I Duo fails at runtime and cannot use the NPU #71882

Open
lezhizhe opened this issue Mar 25, 2025 · 1 comment
lezhizhe commented Mar 25, 2025

After installing the Paddle Docker image, running the following command fails:

python -c "import paddle; paddle.utils.run_check()"

I0325 10:26:09.328598   449 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0325 10:26:09.328639   449 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0325 10:26:09.949532   449 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I0325 10:26:09.954984   449 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I0325 10:26:09.955219   449 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0325 10:26:09.955257   449 init.cc:242] CustomDevice: npu, visible devices count: 8
Running verify PaddlePaddle program ... 
I0325 10:26:10.864704   449 pir_interpreter.cc:1480] New Executor is Running ...
I0325 10:26:10.867235   449 pir_interpreter.cc:1506] pir interpreter is running by multi-thread mode ...
W0325 10:26:12.564280   639 pir_interpreter.cc:1980] Instruction OP id: 7, Ir OP id: 66, pd_op.sum_grad raises an EnforceNotMet exception common::enforce::EnforceNotMet
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/paddle/utils/install_check.py", line 273, in run_check
    _run_static_single(use_cuda, use_xpu, use_custom, custom_device_name)
  File "/usr/local/lib/python3.10/dist-packages/paddle/utils/install_check.py", line 153, in _run_static_single
    exe.run(
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/executor.py", line 1896, in run
    res = self._run_pir_impl(
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/executor.py", line 2242, in _run_pir_impl
    ret = new_exe.run(list(feed.keys()), return_numpy)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/executor.py", line 845, in run
    tensors = self._new_exe.run(
OSError: In user code:


    ExternalError:  ACL error, the error code is : 500001.  (at /paddle/backends/npu/kernels/funcs/npu_op_runner.cc:445)
      [operator < pd_kernel.phi_kernel > error]
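
When comparing several runs of this check, it can help to reduce each log to the few fields that matter. The following is a minimal sketch, not part of Paddle: the `summarize` helper and its regular expressions are hypothetical and only assume that the log lines keep the wording shown above.

```python
import re

# Sample lines copied from the run_check output above.
LOG = """\
I0325 10:26:09.955257   449 init.cc:242] CustomDevice: npu, visible devices count: 8
W0325 10:26:12.564280   639 pir_interpreter.cc:1980] Instruction OP id: 7, Ir OP id: 66, pd_op.sum_grad raises an EnforceNotMet exception common::enforce::EnforceNotMet
    ExternalError:  ACL error, the error code is : 500001.  (at /paddle/backends/npu/kernels/funcs/npu_op_runner.cc:445)
"""

# Patterns assume the exact wording seen in this log (an assumption).
DEVICE_RE = re.compile(r"CustomDevice: (\w+), visible devices count: (\d+)")
OP_RE = re.compile(r"(pd_op\.\w+) raises an EnforceNotMet")
ACL_RE = re.compile(r"ACL error, the error code is : (\d+)")


def summarize(log: str) -> dict:
    """Extract device type and count, the failing op, and the ACL error code."""
    summary = {"device": None, "count": None, "failed_op": None, "acl_code": None}
    device = DEVICE_RE.search(log)
    if device:
        summary["device"] = device.group(1)
        summary["count"] = int(device.group(2))
    op = OP_RE.search(log)
    if op:
        summary["failed_op"] = op.group(1)
    acl = ACL_RE.search(log)
    if acl:
        summary["acl_code"] = int(acl.group(1))
    return summary


if __name__ == "__main__":
    print(summarize(LOG))
```

For this log the summary shows that the plugin loaded and enumerated the devices, and that the failure only appears once a kernel (`pd_op.sum_grad`) is actually executed.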

The docker-compose.yaml file is as follows:

version: '3' 

services:
  paddle-npu-dev:
    container_name: paddle-npu  # container name
    #image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84  # base image (replace with the one matching your architecture)
    build: 
        ./
    stdin_open: true  # keep STDIN open (equivalent to -i)
    tty: true         # allocate a pseudo-TTY (equivalent to -t)
    privileged: true  # privileged mode (access to all devices)
    network_mode: host  # use the host network
    working_dir: /work  # default working directory in the container
    shm_size: 128G     # shared memory size
    environment:
      - ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7  # NPU devices visible to the container
    volumes:
      # mount the working directory at /work in the container (replace with your actual path)
      - /www/paddlespeech/work:/work
      # mount the NPU driver files
      - /usr/local/Ascend/driver:/usr/local/Ascend/driver
      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
      - /usr/local/dcmi:/usr/local/dcmi
    command: /bin/bash  # container startup command
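
As a quick sanity check on the `environment` entry, the device list in `ASCEND_RT_VISIBLE_DEVICES` can be compared with the count Paddle prints at startup (`visible devices count: 8` in the log). This is a minimal sketch; `parse_visible_devices` and `check_device_count` are hypothetical helpers, not Paddle or CANN APIs.

```python
def parse_visible_devices(value: str) -> list[int]:
    """Parse a comma-separated device list such as "0,1,2,3" into integers."""
    return [int(part) for part in value.split(",") if part.strip()]


def check_device_count(env_value: str, reported_count: int) -> bool:
    """True when the list has no duplicates and matches the reported count."""
    ids = parse_visible_devices(env_value)
    return len(set(ids)) == len(ids) == reported_count


if __name__ == "__main__":
    env_value = "0,1,2,3,4,5,6,7"  # value from docker-compose.yaml
    print(check_device_count(env_value, 8))  # 8 is the count from the log
```

Here the two agree, which suggests that device visibility is configured consistently and the error arises later, at kernel execution.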

The Dockerfile is as follows:

# Use the specified base image
FROM ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84

# Set the working directory
WORKDIR /work

# Configure pip to use the Alibaba Cloud mirror
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple

RUN python -m pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/ && \
    python -m pip install paddle-custom-npu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/npu/

# Set the working directory
WORKDIR /work
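
Because the Dockerfile installs two separate wheels, one thing worth ruling out inside the built container is a mismatch between the installed `paddlepaddle` and `paddle-custom-npu` versions. The sketch below uses only the standard library; `installed_version` and `versions_match` are hypothetical helper names, and matching versions alone would not prove that the pair is compatible with the CANN release in the base image.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def versions_match(core, plugin) -> bool:
    """True only when both packages are installed and report the same version."""
    return core is not None and core == plugin


if __name__ == "__main__":
    core = installed_version("paddlepaddle")
    plugin = installed_version("paddle-custom-npu")
    print(f"paddlepaddle={core} paddle-custom-npu={plugin} "
          f"match={versions_match(core, plugin)}")
```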

Server information:

Server configuration:
CPU: 2*Kunpeng 920-5250
GPU: 4*300I Duo
Memory: 32*32GB

Operating system: BCLinux-for-Euler-21.10-dvd-aarch64-230324

Additional Supplementary Information

No response

z-one commented Apr 8, 2025

Does this mean that the Atlas 300I Duo does not support Paddle?
