Ideas how to use less VRAM for onnx models? #14472
Unanswered
elephantpanda asked this question in General
I have several ONNX models loaded onto the GPU with InferenceSession("model.onnx") using the DirectML execution provider. My three ONNX files (float16) come to a total of 1.77 GB on disk.
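For reference, this is roughly how I create the sessions (a minimal sketch; the filenames are placeholders, and I'm assuming the standard `DmlExecutionProvider` string is what "DirectML mode" means here):

```python
import onnxruntime as ort

# One InferenceSession per model, all placed on the DirectML execution provider.
# Filenames below are placeholders for my three float16 models.
providers = ["DmlExecutionProvider"]
sessions = [
    ort.InferenceSession(path, providers=providers)
    for path in ["model_a.onnx", "model_b.onnx", "model_c.onnx"]
]
```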
Loading the models uses about 2.5 GB of GPU memory, and a further 5.2 GB is allocated once the models have run their first inference, so overall they take up 8.2 GB of VRAM.
The most common amount of VRAM people have is 8 GB, so I need to cut GPU memory use by roughly 10-20%.
Does anyone have tips on how to reduce the memory footprint a bit without sacrificing too much speed?
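The kind of thing I have been wondering about trying is session-level options. This is just a sketch (the option names are from the onnxruntime Python API, but I have no idea yet whether any of them actually reduce DirectML memory use):

```python
import onnxruntime as ort

# Candidate settings I'm unsure about -- do any of these help with DirectML VRAM?
so = ort.SessionOptions()
so.enable_mem_pattern = False    # skip memory-pattern pre-allocation
so.enable_cpu_mem_arena = False  # don't grow the CPU-side arena
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

sess = ort.InferenceSession(
    "model_a.onnx",              # placeholder filename
    sess_options=so,
    providers=["DmlExecutionProvider"],
)
```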