insufficient VRAM when building TensorRT engine? #161

Open
ccarmatic opened this issue Jun 27, 2024 · 0 comments

ccarmatic commented Jun 27, 2024

I have the following TensorRT setup:

resolutiondict = {'engine_build_options' : {'opt_image_height': 512, 'opt_image_width': 832}}
stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=4, engine_build_options=resolutiondict
)

In short, I am trying to build the engine to support 832x512 images and a batch size of 4.

This results in the following error message:

[W] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 11580055552 detected for tactic 0x0000000000000000. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
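The warning suggests capping the builder's workspace pool via IBuilderConfig::setMemoryPoolLimit(). A minimal sketch of what that looks like from TensorRT's Python API (assuming TensorRT 8.4+; the 8 GiB cap is an illustrative value I have not tuned, and StreamDiffusion's build script would need to expose it):

```python
# Illustrative 8 GiB workspace cap, in bytes (not a tuned value).
workspace_limit = 8 * (1 << 30)

try:
    import tensorrt as trt

    # Create a builder config and limit its workspace memory pool,
    # as the TensorRT warning recommends.
    builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_limit)
except ImportError:
    pass  # TensorRT not installed; the set_memory_pool_limit call is the relevant line
```

A smaller workspace makes the builder skip memory-hungry tactics instead of failing, usually at some cost in engine performance.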
I can see the memory usage increase to ~14.9GB of my RTX 4080's 16GB of VRAM when the error occurs.
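For scale, the single allocation the builder requested in the warning above works out to roughly 10.8 GiB, which clearly cannot fit alongside ~14.9GB already in use on a 16GB card:

```python
# Requested allocation size, taken verbatim from the TensorRT warning.
requested_bytes = 11_580_055_552
requested_gib = requested_bytes / (1 << 30)  # bytes -> GiB
print(f"{requested_gib:.2f} GiB")  # prints "10.78 GiB"
```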

More information:
With the same image dimensions and a batch size of 3, the build process peaks at around ~12GB of VRAM and completes with no error messages. When StreamDiffusion is actually running in this configuration, it uses about 9GB of VRAM and shows a noticeable performance increase over batch size 2.

Question for the developers: would it be possible to make a version of the build script that takes advantage of CUDA's memory fallback, i.e. partially using system RAM to build the engine when VRAM is full? It would take longer, but it is a one-time process, similar to training (the original intent of CUDA's memory fallback). Since there is spare VRAM during inference, I believe there could be performance gains if the engine could be successfully built with a higher batch size.
