
Does MInference support CUDA 11.8? #56

Open
hensiesp32 opened this issue Jul 29, 2024 · 4 comments
Assignees
Labels
question Further information is requested

Comments

@hensiesp32

Describe the issue

I am wondering whether MInference supports CUDA 11.8. Our devices don't support CUDA 12.3.

@hensiesp32 hensiesp32 added the question Further information is requested label Jul 29, 2024
@iofu728 iofu728 self-assigned this Jul 30, 2024
@iofu728
Contributor

iofu728 commented Jul 30, 2024

Hi @hensiesp32, thanks for your interest in MInference.

It supports CUDA 11.8. We have released the wheel for CUDA 11.8 at this link. If you have any questions, feel free to leave a comment here.

@hensiesp32
Author

hensiesp32 commented Aug 1, 2024

Thanks for your reply. I want to run the needle-in-a-haystack experiment, but I only used one A100-80G, and when the context length reached 300K, an OOM error occurred. I then enabled kv_cache_cpu, but got this error:

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions

So I want to know how you tested needle-in-a-haystack with a 1M context length. Or can we run it on multiple GPUs?

@hensiesp32
Author

I ran experiments/benchmarks, but the results showed that MInference doesn't provide a speedup. I used 4 A100-80G GPUs; the results are shown below:
[screenshot: benchmark results]

@iofu728
Contributor

iofu728 commented Aug 5, 2024

Hi @hensiesp32,

  1. For the benchmark test, the results don't seem right, especially for StreamingLLM. Did you use vLLM for the measurements? Our experiments were conducted on a single A100 using HF or vLLM (details at https://github.com/microsoft/MInference/tree/main/experiments#minference-benchmark-experiments), and I've received feedback that the corresponding kernels aren't replaced in multi-GPU setups. Could you test it on a single A100 for now? We will support multi-GPU mode in the future.

  2. When testing Needle in a Haystack, I used kv_cache_cpu for contexts over 200K. However, this requires enough CPU memory on your machine: around 300 GB for 1M tokens.
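To see why contexts this long exhaust an 80 GB GPU and need hundreds of gigabytes of host memory, here is a back-of-the-envelope KV-cache sizing sketch. The config values below (32 layers, 8 KV heads via GQA, head dim 128, fp16) are illustrative assumptions for a LLaMA-3-8B-style model, not numbers taken from MInference itself:

```python
def kv_cache_bytes(num_tokens, num_layers=32, num_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Estimate KV-cache size for a decoder-only transformer.

    The leading 2x accounts for the separate key and value tensors
    stored per layer; bytes_per_elem=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens

for tokens in (300_000, 1_000_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:6.1f} GiB of KV cache")
```

Under these assumptions the cache alone is roughly 37 GiB at 300K tokens and 122 GiB at 1M tokens, on top of the model weights and activations. A model without GQA (e.g. 32 KV heads) would need about 4x more, so the ~300 GB figure above depends heavily on the model architecture and any extra buffers involved in offloading.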
