Hello, thanks for open-sourcing this.
I'm running inference.py on a 2080 Ti with 22 GB of VRAM, but I still get an OutOfMemory error:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 GiB. GPU 0 has a total capacity of 22.00 GiB of which 0 bytes is free. Of the allocated memory 38.97 GiB is allocated by PyTorch, and 3.20 GiB is reserved by PyTorch but unallocated.
So how much GPU memory is required for inference?
Any suggestions for reducing memory usage? Could quantization work here?
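For context, here is the back-of-envelope math I'm using to reason about weight memory at different precisions. The parameter count is a placeholder, since I don't know this model's actual size, and this counts weights only (activations and KV cache add more on top):

```python
# Back-of-envelope estimate of model weight memory (my own sketch;
# the 7B parameter count below is a placeholder, not this repo's model).
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just for the weights, in GiB."""
    return n_params * bytes_per_param / 2**30

# Example for a hypothetical 7B-parameter model:
fp16 = weight_memory_gib(7e9, 2.0)   # float16: 2 bytes per parameter
int4 = weight_memory_gib(7e9, 0.5)   # 4-bit quantized: 0.5 bytes per parameter
print(f"fp16 ≈ {fp16:.1f} GiB, int4 ≈ {int4:.1f} GiB")  # → fp16 ≈ 13.0 GiB, int4 ≈ 3.3 GiB
```

If the numbers work out similarly for this model, 4-bit quantization might bring it within 22 GB.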
Thank you.