I have started MPS with a division factor of 10, but in our application scenario we might need to allocate 2 whole GPUs directly, which is equivalent to requesting nvidia.com/gpu: 20. If I set nvidia.com/gpu > 1, I get the error ‘request for “nvidia.com/gpu”: invalid request: maximum request size for shared resources is 1; found 10’, which is unexpected.
In the meantime, is there any way to request more than 1 replica from each GPU on my node?
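For reference, here is a rough sketch of the setup being described (the replica count, pod name, image, and resource values are illustrative, not taken from an actual cluster):

```yaml
# Device-plugin sharing config (sketch): split each physical GPU into 10 MPS replicas.
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 10
---
# Pod spec (sketch): requesting more than one shared replica, e.g. the equivalent
# of two whole GPUs, triggers the "maximum request size for shared resources is 1"
# error quoted above.
apiVersion: v1
kind: Pod
metadata:
  name: mps-example          # hypothetical name
spec:
  containers:
  - name: cuda-workload
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 20   # rejected: shared (MPS) resources allow at most 1 per container
```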
If failRequestsGreaterThanOne=true were set in either of these configurations (MPS or TimeSlicing) and a user requested more than one nvidia.com/gpu or nvidia.com/gpu.shared resource in their pod spec, then the request would fail with the error you've seen.
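For reference, a hedged sketch of where that option sits in the device plugin's sharing config (shown here for time-slicing; the field names follow the plugin's documented config format, and the replica count is illustrative):

```yaml
version: v1
sharing:
  timeSlicing:
    renameByDefault: false
    # When true, a container requesting more than one shared replica is rejected
    # with the "maximum request size for shared resources is 1" error.
    failRequestsGreaterThanOne: true
    resources:
    - name: nvidia.com/gpu
      replicas: 10
```

With renameByDefault enabled the shared replicas would instead be advertised as nvidia.com/gpu.shared, which is why the error can refer to either resource name.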
I also want to know the answer to this question: you can request multiple GPU resources when not using MPS, but you cannot request multiple GPU resources once MPS is enabled. @agrogov