[Bug] Training fails on Ascend910b after installing mmcv #3203

Open
2 tasks done
BoomSky0416 opened this issue Nov 25, 2024 · 1 comment

@BoomSky0416

Prerequisite

Environment

sys.platform: linux
Python: 3.7.5 (default, Mar 20 2023, 04:32:29) [GCC 7.5.0]
CUDA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.0a0+56b43f4
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=generic, BUILD_TYPE=Release, CXX_COMPILER=/opt/buildtools/gcc-7.3.0/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, LAPACK_INFO=generic, TORCH_VERSION=1.8.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.9.1
OpenCV: 4.10.0
MMEngine: 0.7.3
MMCV: 2.0.1
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: not available


absl-py 2.1.0
addict 2.4.0
albumentations 1.3.1
apex 0.1+ascend
attrs 22.2.0
auto-tune 0.1.0
cachetools 5.5.0
certifi 2022.12.7
cffi 1.12.3
charset-normalizer 3.1.0
chumpy 0.70
click 8.1.7
cycler 0.11.0
Cython 3.0.11
decorator 5.1.1
DLLogger 1.0.0
easydict 1.9
einops 0.6.1
exceptiongroup 1.1.1
fonttools 4.38.0
google-auth 2.36.0
google-auth-oauthlib 0.4.6
grpcio 1.51.3
grpcio-tools 1.51.3
hccl 0.1.0
idna 3.4
imageio 2.31.2
imgaug 0.4.0
importlib-metadata 6.0.0
iniconfig 2.0.0
joblib 1.2.0
json-tricks 3.17.3
kiwisolver 1.4.4
lmdb 1.5.1
lxml 4.5.2
Markdown 3.4.4
markdown-it-py 2.2.0
MarkupSafe 2.1.5
mat4py 0.6.0
matplotlib 3.5.3
mdurl 0.1.2
mmcv 2.0.1
mmdet 3.1.0 /workspace/open-mmlab-2.0/mmdetection
mmengine 0.7.3 /workspace/open-mmlab-2.0/mmengine
mmocr 1.0.1 /workspace/open-mmlab-2.0/mmocr
mmpose 1.2.0 /workspace/open-mmlab-2.0/mmpose
mmpretrain 1.0.0rc8 /workspace/open-mmlab-2.0/mmpretrain
mmrazor 1.0.0 /workspace/open-mmlab-2.0/mmrazor
mmsegmentation 1.1.0 /workspace/open-mmlab-2.0/mmsegmentation
model-index 0.1.11
modelindex 0.0.2
mpmath 1.3.0
munkres 1.1.4
networkx 2.6.3
numexpr 2.8.4
numpy 1.21.6
oauthlib 3.2.2
opc-tool 0.1.0
opencv-python 4.10.0.84
ordered-set 4.1.0
packaging 23.0
pandas 1.3.5
pathlib2 2.3.7.post1
Pillow 9.1.0
pip 23.0.1
pluggy 1.0.0
prettytable 3.7.0
protobuf 3.20.3
pyasn1 0.5.1
pyasn1-modules 0.3.0
pyclipper 1.3.0.post6
pycocotools 2.0.6
pycparser 2.21
Pygments 2.17.2
pyparsing 3.0.9
pytest 7.2.2
python-dateutil 2.8.2
pytz 2022.7.1
PyWavelets 1.3.0
PyYAML 6.0
qudida 0.0.4
rapidfuzz 3.4.0
requests 2.28.2
requests-oauthlib 2.0.0
rich 13.8.1
rsa 4.9
schedule-search 0.0.1
scikit-image 0.19.3
scikit-learn 1.0.2
scipy 1.7.3
setuptools 41.2.0
shapely 2.0.6
six 1.16.0
sklearn 0.0
sympy 1.4
tables 3.6.1
te 0.4.0
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
termcolor 2.3.0
terminaltables 3.1.10
threadpoolctl 3.1.0
tifffile 2021.11.2
tomli 2.0.1
topi 0.4.0
torch 1.8.0a0+56b43f4
torch-npu 1.8.1
torchvision 0.9.1
tqdm 4.67.0
typing_extensions 4.5.0
urllib3 1.26.15
wcwidth 0.2.13
Werkzeug 2.2.3
wheel 0.40.0
xdoctest 1.1.0
xtcocotools 1.14.3
yapf 0.32.0
zipp 3.15.0

Reproduces the problem - code sample

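Training is run through the MMEngine `Runner` from a custom script, `tools/caip_train.py`, whose `main()` calls `runner.train()`.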

Reproduces the problem - command or script

The custom training script `tools/caip_train.py` is launched directly; it drives training with the MMEngine `Runner`.

Reproduces the problem - error message

Traceback (most recent call last):
  File "tools/caip_train.py", line 725, in <module>
    main()
  File "tools/caip_train.py", line 721, in main
    runner.train()
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/runner/runner.py", line 1707, in train
    self._init_model_weights()
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/runner/runner.py", line 899, in _init_model_weights
    model.init_weights()
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/model/base_module.py", line 130, in init_weights
    m.init_weights()
  File "/workspace/open-mmlab-2.0/mmpretrain/mmpretrain/models/backbones/resnet.py", line 638, in init_weights
    super(ResNet, self).init_weights()
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/model/base_module.py", line 124, in init_weights
    initialize(self, other_cfgs)
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/model/weight_init.py", line 610, in initialize
    initialize(module, cp_cfg)
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/model/weight_init.py", line 518, in initialize
    func(module)
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/model/weight_init.py", line 437, in __call__
    module.apply(init)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 473, in apply
    module.apply(fn)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 474, in apply
    fn(self)
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/model/weight_init.py", line 435, in init
    self.bias, self.distribution)
  File "/workspace/open-mmlab-2.0/mmengine/mmengine/model/weight_init.py", line 104, in kaiming_init
    module.weight, a=a, mode=mode, nonlinearity=nonlinearity)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/init.py", line 413, in kaiming_normal_
    return tensor.normal_(0, std)
RuntimeError: Run:/usr1/workspace/FPTA_Daily_Plugin_open_date/Plugin/torch_npu/csrc/framework/OpParamMaker.cpp:128 NPU error,NPU error code is:100000
EZ9999: Inner Error, Please contact support engineer!
EZ9999 Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task.cc][LINE:1068]
TraceBack (most recent call last):
Aicpu kernel execute failed, device_id=0, stream_id=3, task_id=0.[FUNC:PrintAicpuErrorInfo][FILE:task.cc][LINE:774]
AICPU Kernel task happen error, retCode=0x2a.[FUNC:GetError][FILE:stream.cc][LINE:1044]
Aicpu kernel execute failed, device_id=0, stream_id=3, task_id=0, flip_num=0, fault so_name=, fault kernel_name=, fault op_name=, extend_info=.[FUNC:GetError][FILE:stream.cc][LINE:1044]
rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
Call rtStreamSynchronize(stream) fail, ret: 0x7BC8A[FUNC:KernelLaunchEx][FILE:model_manager.cc][LINE:145]
Failed to execute init graph[FUNC:Load][FILE:model_v2_executor.cc][LINE:119]
Assert ((executor->Load(arg)) == ge::SUCCESS) failed[FUNC:CreateAndLoad][FILE:stream_executor.cc][LINE:38]
Aicpu kernel execute failed, device_id=0, stream_id=4, task_id=0.[FUNC:PrintAicpuErrorInfo][FILE:task.cc][LINE:774]
Aicpu kernel execute failed, device_id=0, stream_id=4, task_id=0, flip_num=0, fault so_name=, fault kernel_name=, fault op_name=, extend_info=.[FUNC:GetError][FILE:stream.cc][LINE:1044]
Aicpu kernel execute failed, device_id=0, stream_id=5, task_id=0.[FUNC:PrintAicpuErrorInfo][FILE:task.cc][LINE:774]
Aicpu kernel execute failed, device_id=0, stream_id=5, task_id=0, flip_num=0, fault so_name=, fault kernel_name=, fault op_name=, extend_info=.[FUNC:GetError][FILE:stream.cc][LINE:1044]

THPModule_npu_shutdown success.
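The traceback ends in `torch.nn.init.kaiming_normal_` calling `tensor.normal_(0, std)` on the NPU while MMEngine initializes the ResNet backbone, before any mmcv op has run. The following is a minimal sketch, not part of the original report, that tries to isolate that single call outside the `Runner`. It assumes that `import torch_npu` registers an `npu` device (as the torch-npu plugin in this environment is expected to do) and falls back to CPU otherwise. The helper name `init_conv_like_resnet_stem`, the 7x7 stem-conv shape, and the `mode="fan_out"`, `nonlinearity="relu"` arguments are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumption: on Ascend, importing the torch_npu plugin registers the "npu"
# device type with PyTorch. Fall back to CPU so the script also runs elsewhere.
try:
    import torch_npu  # noqa: F401
    DEVICE = "npu:0"
except ImportError:
    DEVICE = "cpu"


def init_conv_like_resnet_stem(device):
    """Hypothetical helper: build a ResNet-style stem conv on `device` and
    Kaiming-initialize its weight in place.

    This mirrors the last frames of the traceback: kaiming_normal_ ends in
    tensor.normal_(0, std), executed on the device that holds the weight.
    """
    conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    conv = conv.to(device)
    nn.init.kaiming_normal_(conv.weight, a=0, mode="fan_out", nonlinearity="relu")
    return conv


if __name__ == "__main__":
    conv = init_conv_like_resnet_stem(DEVICE)
    # Copying to CPU waits for the device work, so a kernel error surfaces here.
    w = conv.weight.detach().cpu()
    print(DEVICE, tuple(w.shape), float(w.std()))
```

If this snippet reproduces the same EZ9999 / aicpu exception on the 910B, that would suggest the problem lies in the torch / torch-npu / CANN stack rather than in mmcv itself; if it passes, the failure may depend on something the `Runner` sets up first.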

Additional information

Could you please provide documentation on which Ascend torch_npu versions are compatible with which mmcv versions?

@BoomSky0416

@momo609
