State of multi-gpu support #626
-
If I had a system with multiple NVIDIA GPUs I would of course just test this by compiling with MATX_MULTI_GPU=ON, but what is the state of multi-GPU support? Do I need to make any library changes (besides the one CMakeLists.txt change) when switching from one to many GPUs? Thanks!
-
Hi @mfzmullen, the state of multi-GPU support is complicated and really depends on your use case. Some of the libraries we use on the backend support multi-GPU to some extent (BLAS, FFT), but not all of them. Can you let us know what you want to use multiple GPUs for? As much detail as you can provide would be appreciated.
Hi @mfzmullen, right now multi-GPU support is possible manually, with the user writing device/stream-specific code using `cudaSetDevice` and related calls. We currently do not support single operations that launch across multiple GPUs. A lot of the time this is best left to the user anyway, since there's a tradeoff with transfer times, and that transfer time is system-dependent. Basic arithmetic on a tensor might fall into that category, where trying to automatically handle the data shuffles could perform worse than doing it manually. cuSPARSE is another that's difficult to support at this time since the tensor types would not be compatible with anything we have now. If we were to support it users…
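To make the current approach concrete, a manual split of independent work across two GPUs might look roughly like the sketch below. This is an untested illustration rather than an official example: the two-device loop, the tensor size, and the simple elementwise add are placeholders, and the tensors just use MatX's default allocation.

```cpp
// Rough sketch: independent work on two GPUs driven by hand with
// cudaSetDevice plus a stream per device. Not an official MatX example.
#include <matx.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
  constexpr int num_gpus = 2;      // illustrative; query cudaGetDeviceCount in real code
  constexpr int N = 1 << 20;

  cudaStream_t streams[num_gpus];
  std::vector<matx::tensor_t<float, 1>> a_t, b_t;

  // Launch work on each device without blocking so the GPUs run concurrently.
  for (int dev = 0; dev < num_gpus; dev++) {
    cudaSetDevice(dev);            // subsequent launches target this GPU
    cudaStreamCreate(&streams[dev]);

    a_t.push_back(matx::make_tensor<float>({N}));
    b_t.push_back(matx::make_tensor<float>({N}));

    // Basic arithmetic launched on this device's stream.
    (a_t[dev] = a_t[dev] + b_t[dev]).run(streams[dev]);
  }

  // Wait for every device to finish its work.
  for (int dev = 0; dev < num_gpus; dev++) {
    cudaSetDevice(dev);
    cudaStreamSynchronize(streams[dev]);
  }
  return 0;
}
```

The key point is that MatX does not coordinate the devices for you: the user selects the device with `cudaSetDevice`, keeps a stream per device, and handles any data movement between GPUs explicitly.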