In short, I am trying to build the engine to support images of 832x512 and a batch size of 4.
This results in the following error message:
```
[W] UNSUPPORTED_STATE Skipping tactic 0 due to insufficient memory on requested size of 11580055552 detected for tactic 0x0000000000000000. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
```
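Following the hint in the warning, a lower workspace cap can be set from Python before building. This is only a sketch: `network` and the 10 GiB figure are illustrative assumptions, not StreamDiffusion's actual build script; the `set_memory_pool_limit` call itself is the TensorRT 8.4+ API.

```python
# Hedged sketch: capping the TensorRT builder workspace pool.
# Everything except the TensorRT calls is an illustrative assumption.

def gib(n: float) -> int:
    """Convert GiB to bytes for set_memory_pool_limit()."""
    return int(n * (1 << 30))

try:
    import tensorrt as trt

    def build_engine_capped(builder, network, workspace_gib=10.0):
        config = builder.create_builder_config()
        # TensorRT 8.4+ API; replaces the deprecated max_workspace_size.
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE,
                                     gib(workspace_gib))
        return builder.build_serialized_network(network, config)
except ImportError:
    pass  # tensorrt not installed; the conversion helper above still works

print(gib(10.0))  # 10737418240 bytes = 10 GiB
```

A smaller workspace lets the build finish by skipping the largest tactics, usually at some cost in the resulting engine's speed.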
I can see memory usage climb to ~14.9GB of my RTX 4080's 16GB of VRAM when the error occurs.
More information:
With the same image dimensions and a batch size of 3, the build process peaks at around ~12GB of VRAM and completes with no error messages. When StreamDiffusion is actually running in this configuration, it uses about 9GB of VRAM and shows a noticeable performance increase over batch size 2.
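A quick back-of-the-envelope check makes the batch-4 failure plausible, under the assumption that build-time peak VRAM scales roughly linearly with batch size (the 12GB figure is the observation above; everything else is extrapolation):

```python
# Assumption: build-time peak VRAM scales ~linearly with batch size.
# Only peak_batch3_gb is an observed number; the rest is extrapolation.

requested_bytes = 11580055552              # size from the tactic warning
print(round(requested_bytes / 2**30, 2))   # ~10.78 GiB for a single tactic

peak_batch3_gb = 12.0                      # observed build peak at batch 3
est_batch4_gb = peak_batch3_gb / 3 * 4     # linear extrapolation to batch 4
print(est_batch4_gb)                       # 16.0 -- right at the card's limit
```

So even before the ~10.8 GiB tactic allocation, a linear estimate already puts the batch-4 build at the 16GB capacity of the card.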
Question for the developers: would it be possible to make a version of the build script that takes advantage of CUDA's memory fallback, i.e. partially using system RAM to build the engine when VRAM is full? It would take longer, but it is a one-time process, similar to training (the original intent of CUDA's memory fallback). Given that there is spare VRAM during inference, I believe there could be performance gains if the engine could be built successfully at a higher batch size.
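One possible direction, purely as a sketch: TensorRT accepts a custom `IGpuAllocator`, so an allocator that falls back to `cudaMallocManaged` (unified memory, which the driver can page to system RAM) when a plain device allocation fails might let an oversized build finish. The class and cuda-python calls below are real APIs, but whether the builder's tactic timing tolerates managed memory is an open question, and the fallback behavior here is an assumption, not a tested solution.

```python
# Hedged sketch of a device-memory-first allocator with a managed-memory
# fallback. Untested assumption: the TensorRT builder works (slowly) when
# some of its allocations land in CUDA unified memory.
try:
    import tensorrt as trt
    from cuda import cudart  # cuda-python bindings

    class FallbackAllocator(trt.IGpuAllocator):
        """Try device memory first; fall back to CUDA managed memory,
        which the driver may spill to system RAM."""

        def allocate(self, size, alignment, flags):
            err, ptr = cudart.cudaMalloc(size)
            if err != cudart.cudaError_t.cudaSuccess:
                err, ptr = cudart.cudaMallocManaged(
                    size, cudart.cudaMemAttachGlobal)
            return ptr if err == cudart.cudaError_t.cudaSuccess else 0

        def deallocate(self, memory):
            return cudart.cudaFree(memory)[0] == cudart.cudaError_t.cudaSuccess

    HAVE_DEPS = True
except ImportError:
    HAVE_DEPS = False  # tensorrt / cuda-python not installed

# Hypothetical usage inside a build script:
#   builder.gpu_allocator = FallbackAllocator()
print("deps available:", HAVE_DEPS)
```

Managed-memory paging over PCIe is slow, so as noted above this would only make sense for the one-time engine build, not for inference.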