Docker image for mmdetection3 does not support CUDA #691

Open
xiaoyao9184 opened this issue Jan 8, 2025 · 0 comments

The Docker image heartexlabs/label-studio-ml-backend:mmdetection3-master fails on startup with the following error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

This is the same problem as #560 and #79.

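Overriding the container command to drop gunicorn's --preload flag lets the backend start normally. With --preload the application is imported in the gunicorn master process before the workers are forked, so CUDA is presumably initialized in the parent and cannot be re-initialized in the forked worker.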

services:
  mmdetection3:
    image: heartexlabs/label-studio-ml-backend:mmdetection3-master
    container_name: mmdetection3
    # https://github.com/HumanSignal/label-studio-ml-backend/issues/79
    command: gunicorn --bind :9090 --workers 1 --threads 8 --timeout 0 _wsgi:app
    env_file: external_env-mmdetection3.env
    ports:
      - '9090'
    networks:
      - labelstudio
    volumes:
      - data:/data
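
The effect of dropping --preload can be illustrated with a lazy, per-process loader. This is only a minimal sketch of the general idea, not the backend's actual code: LazyModel and load_fn are hypothetical names, and the loader passed in stands for whatever call first touches CUDA. The model is created on first use inside the process that serves the request, and is reloaded if the object is later used from a different process (for example after a fork).

```python
import os
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive load (e.g. one that touches CUDA) until first use
    in the current process. Hypothetical helper, for illustration only."""

    def __init__(self, load_fn: Callable[[], Any]) -> None:
        self._load_fn = load_fn                 # hypothetical loader callback
        self._model: Optional[Any] = None
        self._owner_pid: Optional[int] = None   # process that performed the load
        self._lock = threading.Lock()           # several gunicorn threads may call get()

    def get(self) -> Any:
        pid = os.getpid()
        with self._lock:
            # Load on first use, or again if we are now in a different
            # process than the one that loaded the model (i.e. after a fork).
            if self._owner_pid != pid:
                self._model = self._load_fn()
                self._owner_pid = pid
            return self._model


if __name__ == "__main__":
    lazy = LazyModel(lambda: f"model loaded in pid {os.getpid()}")
    # Nothing has been loaded yet; the first get() triggers the load.
    print(lazy.get())
```

With this pattern nothing CUDA-related runs at import time, so it would not matter whether the module is imported in the master or in a worker.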

After that change the backend starts, but prediction requests fail with a different error:

[2025-01-08 06:42:00,804] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/label_studio_ml/api.py", line 69, in _predict
    response = model.predict(tasks, context=context, **params)
  File "/app/mmdetection.py", line 160, in predict
    prediction = self.predict_one_task(task)
  File "/app/mmdetection.py", line 167, in predict_one_task
    model_results = inference_detector(model, image_path).pred_instances
  File "/opt/conda/lib/python3.9/site-packages/mmdet/apis/inference.py", line 189, in inference_detector
    results = model.test_step(data_)[0]
  File "/opt/conda/lib/python3.9/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/detectors/single_stage.py", line 110, in predict
    results_list = self.bbox_head.predict(
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 197, in predict
    predictions = self.predict_by_feat(
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/dense_heads/yolo_head.py", line 280, in predict_by_feat
    results = self._bbox_post_process(
  File "/opt/conda/lib/python3.9/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 485, in _bbox_post_process
    det_bboxes, keep_idxs = batched_nms(bboxes, results.scores,
  File "/opt/conda/lib/python3.9/site-packages/mmcv/ops/nms.py", line 303, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
  File "/opt/conda/lib/python3.9/site-packages/mmengine/utils/misc.py", line 395, in new_func
    output = old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/mmcv/ops/nms.py", line 127, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
  File "/opt/conda/lib/python3.9/site-packages/mmcv/ops/nms.py", line 27, in forward
    inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device cuda:0 not found.

This is the same error as open-mmlab/mmdetection#6765.

Reinstalling mmcv inside the container and then restarting the container resolves the issue.

root@f0ea7e29bc01:/app# mim uninstall mmcv
Found existing installation: mmcv 2.1.0
Uninstalling mmcv-2.1.0:
  Would remove:
    /opt/conda/lib/python3.9/site-packages/mmcv-2.1.0.dist-info/*
    /opt/conda/lib/python3.9/site-packages/mmcv/*
Proceed (Y/n)? y
  Successfully uninstalled mmcv-2.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
root@f0ea7e29bc01:/app# mim install mmcv==2.1.0 
Looking in links: https://download.openmmlab.com/mmcv/dist/cu116/torch1.13.0/index.html
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x78bf017571f0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /mmcv/dist/cu116/torch1.13.0/index.html
Collecting mmcv==2.1.0
  Downloading https://download.openmmlab.com/mmcv/dist/cu116/torch1.13.0/mmcv-2.1.0-cp39-cp39-manylinux1_x86_64.whl (97.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.6/97.6 MB 427.3 kB/s eta 0:00:00
Requirement already satisfied: addict in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (2.4.0)
Requirement already satisfied: mmengine>=0.3.0 in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (0.10.3)
Requirement already satisfied: numpy in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (1.26.4)
Requirement already satisfied: packaging in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (24.2)
Requirement already satisfied: Pillow in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (10.4.0)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (6.0.2)
Requirement already satisfied: yapf in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (0.43.0)
Requirement already satisfied: opencv-python>=3 in /opt/conda/lib/python3.9/site-packages (from mmcv==2.1.0) (4.10.0.84)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.9/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (3.9.4)
Requirement already satisfied: rich in /opt/conda/lib/python3.9/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (13.4.2)
Requirement already satisfied: termcolor in /opt/conda/lib/python3.9/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (2.5.0)
Requirement already satisfied: platformdirs>=3.5.1 in /opt/conda/lib/python3.9/site-packages (from yapf->mmcv==2.1.0) (4.3.6)
Requirement already satisfied: tomli>=2.0.1 in /opt/conda/lib/python3.9/site-packages (from yapf->mmcv==2.1.0) (2.2.1)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.3.0)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (4.55.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.4.7)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (3.2.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (2.9.0.post0)
Requirement already satisfied: importlib-resources>=3.2.0 in /opt/conda/lib/python3.9/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (6.4.5)
Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/conda/lib/python3.9/site-packages (from rich->mmengine>=0.3.0->mmcv==2.1.0) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.9/site-packages (from rich->mmengine>=0.3.0->mmcv==2.1.0) (2.18.0)
Requirement already satisfied: zipp>=3.1.0 in /opt/conda/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (3.21.0)
Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.9/site-packages (from markdown-it-py>=2.2.0->rich->mmengine>=0.3.0->mmcv==2.1.0) (0.1.2)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.16.0)
Installing collected packages: mmcv
Successfully installed mmcv-2.1.0
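
To compare the environment before and after the reinstall, a small diagnostic can be run inside the container. The sketch below is an assumption about what is useful to check, not part of the image: cuda_report is a hypothetical helper that reports the torch build, whether torch sees a GPU, and the CUDA version mmcv's compiled ops were built against. All imports are guarded so the script also runs where torch or mmcv is missing.

```python
from typing import Dict


def cuda_report() -> Dict[str, str]:
    """Collect version and CUDA-build information for torch and mmcv.
    Hypothetical helper, for illustration only."""
    report: Dict[str, str] = {}

    try:
        import torch
        report["torch"] = torch.__version__
        report["torch_cuda_build"] = str(torch.version.cuda)  # None for CPU-only builds
        report["cuda_available"] = str(torch.cuda.is_available())
    except ImportError:
        report["torch"] = "not installed"

    try:
        import mmcv
        report["mmcv"] = mmcv.__version__
    except ImportError:
        report["mmcv"] = "not installed"
        return report

    try:
        # Requires the compiled extension; fails on builds without it.
        from mmcv.ops import get_compiling_cuda_version
        report["mmcv_ops_cuda_build"] = str(get_compiling_cuda_version())
    except ImportError:
        report["mmcv_ops_cuda_build"] = "compiled ops missing"

    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If mmcv_ops_cuda_build does not show a CUDA version while cuda_available is True, the installed mmcv ops were likely built without CUDA support, which would be consistent with the nms_impl error above.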

The fix described in a comment on open-mmlab/mmdetection#6765 also resolves the issue.

Change the Dockerfile as follows:

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=${PORT:-9090} \
    PIP_CACHE_DIR=/.cache \
    WORKERS=1 \
    THREADS=8 \
    CUDA_HOME=/usr/local/cuda
ENV PATH="${CUDA_HOME}/bin:${PATH}"
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0"
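
TORCH_CUDA_ARCH_LIST controls which GPU architectures CUDA extensions are compiled for, so it is worth checking that the list covers the GPU in use. The following is a simplified sketch under stated assumptions: parse_arch_list and arch_list_covers are hypothetical helpers, and the coverage rule is reduced to two cases (a binary built for the same major version with an equal or lower minor version, or a +PTX entry for an equal or older architecture).

```python
from typing import List, Tuple


def parse_arch_list(arch_list: str) -> List[Tuple[int, int, bool]]:
    """Parse '8.0;8.6+PTX' into [(8, 0, False), (8, 6, True)]."""
    entries = []
    for token in arch_list.replace(";", " ").split():
        ptx = token.upper().endswith("+PTX")
        number = token[:-4] if ptx else token
        major, minor = number.split(".")
        entries.append((int(major), int(minor), ptx))
    return entries


def arch_list_covers(arch_list: str, capability: Tuple[int, int]) -> bool:
    """Simplified check: is a GPU with this compute capability covered?"""
    cap_major, cap_minor = capability
    for major, minor, ptx in parse_arch_list(arch_list):
        # Binary code runs on the same major version with an equal or higher minor.
        binary_ok = major == cap_major and minor <= cap_minor
        # PTX can be JIT-compiled for any equal or newer architecture.
        ptx_ok = ptx and (major, minor) <= (cap_major, cap_minor)
        if binary_ok or ptx_ok:
            return True
    return False


if __name__ == "__main__":
    arch = "6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0"  # value from the Dockerfile above
    # On a machine with torch and a GPU, the capability could be read with
    # torch.cuda.get_device_capability(); (8, 6) here is just an example.
    print(arch_list_covers(arch, (8, 6)))
```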