Ideas how to use less VRAM for onnx models? #14472
Unanswered
elephantpanda asked this question in General
I have several ONNX models loaded onto the GPU with InferenceSession("model.onnx") using the DirectML execution provider. My three ONNX files (float16) come to a total of 1.77 GB on disk.
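For reference, this is roughly how I create the sessions (a minimal sketch; the filenames are placeholders, and I'm assuming the standard `DmlExecutionProvider` string is what "DirectML mode" means here):

```python
import onnxruntime as ort

# One InferenceSession per model, all placed on the DirectML execution provider.
# Filenames below are placeholders for my three float16 models.
providers = ["DmlExecutionProvider"]
sessions = [
    ort.InferenceSession(path, providers=providers)
    for path in ["model_a.onnx", "model_b.onnx", "model_c.onnx"]
]
```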
Loading the models uses about 2.5 GB of GPU memory, and a further 5.2 GB is allocated once the models have run their first inference, so overall they take up 8.2 GB of VRAM.
The most common amount of VRAM people have is 8 GB, so I need to cut GPU memory use by roughly 10-20%.
Does anyone have tips on how to reduce the memory footprint a bit without sacrificing too much speed?
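The kind of thing I have been wondering about trying is session-level options. This is just a sketch (the option names are from the onnxruntime Python API, but I have no idea yet whether any of them actually reduce DirectML memory use):

```python
import onnxruntime as ort

# Candidate settings I'm unsure about -- do any of these help with DirectML VRAM?
so = ort.SessionOptions()
so.enable_mem_pattern = False    # skip memory-pattern pre-allocation
so.enable_cpu_mem_arena = False  # don't grow the CPU-side arena
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

sess = ort.InferenceSession(
    "model_a.onnx",              # placeholder filename
    sess_options=so,
    providers=["DmlExecutionProvider"],
)
```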