
[Bug]: When doing distributed inference with vLLM and MInference, the example fails at startup if tensor_parallel_size is set to a value greater than 1 #63

Closed
zh2333 opened this issue Aug 4, 2024 · 1 comment
Labels
bug Something isn't working

zh2333 commented Aug 4, 2024

Describe the bug

```
(VllmWorkerProcess pid=13977) Process VllmWorkerProcess:
(VllmWorkerProcess pid=13977) Traceback (most recent call last):
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
(VllmWorkerProcess pid=13977)     self.run()
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=13977)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/vllm/executor/multiproc_worker_utils.py", line 211, in _run_worker_process
(VllmWorkerProcess pid=13977)     worker = worker_factory()
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=13977)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/vllm/worker/worker_base.py", line 311, in init_worker
(VllmWorkerProcess pid=13977)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/vllm/worker/worker.py", line 87, in __init__
(VllmWorkerProcess pid=13977)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/vllm/worker/model_runner.py", line 196, in __init__
(VllmWorkerProcess pid=13977)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/vllm/attention/selector.py", line 51, in get_attn_backend
(VllmWorkerProcess pid=13977)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/vllm/attention/selector.py", line 158, in which_attn_to_use
(VllmWorkerProcess pid=13977)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=13977)     prop = get_device_properties(device)
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=13977)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=13977)   File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 279, in _lazy_init
(VllmWorkerProcess pid=13977)     raise RuntimeError(
(VllmWorkerProcess pid=13977) RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

[The identical traceback and RuntimeError are also emitted by the worker processes pid=13978 and pid=13979.]

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/multiproc_worker_utils.py", line 170, in _enqueue_task
    self._task_queue.put((task_id, method, args, kwargs))
  File "/opt/conda/lib/python3.8/multiprocessing/queues.py", line 82, in put
    raise ValueError(f"Queue {self!r} is closed")
ValueError: Queue <multiprocessing.queues.Queue object at 0x7feceed0fbb0> is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/ossfs/workspace/inter_vllm_long_length.py", line 94, in <module>
    test_model(path)
  File "/ossfs/workspace/inter_vllm_long_length.py", line 64, in test_model
    llm = LLM(model=model_path, **kwargs_launcher)
  File "/opt/conda/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 144, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/opt/conda/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 409, in from_engine_args
    engine = cls(
  File "/opt/conda/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 242, in __init__
    self.model_executor = executor_class(
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/multiproc_gpu_executor.py", line 70, in _init_executor
    self._run_workers("init_device")
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/multiproc_gpu_executor.py", line 112, in _run_workers
    worker_outputs = [
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/multiproc_gpu_executor.py", line 113, in <listcomp>
    worker.execute_method(method, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/multiproc_worker_utils.py", line 177, in execute_method
    self._enqueue_task(future, method, args, kwargs)
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/multiproc_worker_utils.py", line 173, in _enqueue_task
    raise ChildProcessError("worker died") from e
ChildProcessError: worker died
```
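The RuntimeError in the worker tracebacks names the underlying problem: CUDA was initialized in the parent process (here, apparently while MInference patched the model) before vLLM forked its tensor-parallel workers, and CUDA cannot be re-initialized in a forked child. A minimal workaround sketch, assuming the installed vLLM honors the VLLM_WORKER_MULTIPROC_METHOD environment variable; the model path and tensor_parallel_size value below are placeholders:

```python
# Hypothetical workaround sketch, not taken from this issue: ask vLLM to
# start its worker processes with the 'spawn' method instead of 'fork',
# as the RuntimeError message suggests.
import os

# Must be set before the LLM engine creates its worker processes.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM

if __name__ == "__main__":  # guard is required with the 'spawn' start method
    llm = LLM(model="/path/to/model", tensor_parallel_size=2)
    outputs = llm.generate(["Hello, my name is"])
    print(outputs[0].outputs[0].text)
```

Calling multiprocessing.set_start_method("spawn", force=True) at the top of the script may also work, depending on how the installed vLLM version creates its worker context.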

Steps to reproduce

No response

Expected Behavior

No response

Logs

No response

Additional Information

No response

iofu728 (Contributor) commented Aug 5, 2024

Closing as a duplicate of #62.

iofu728 closed this as completed Aug 5, 2024
iofu728 self-assigned this Aug 5, 2024