I want to run the tensorrt_llm program on a server, and I want this execution to be independent of the GPU conditions: the GPU model and the amount of free GPU memory. However, the lora_cache_gpu_memory_fraction parameter allocates a percentage of the currently available GPU memory for the LoRA cache, so the actual allocation varies with the GPU model and with how much memory happens to be free at launch. If possible, please add an alternative parameter that accepts a fixed size, such as 1 GB, for the LoRA cache. That way the memory allocation would always be constant and independent of the execution environment.
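As a possible workaround until a fixed-size parameter exists, the fraction could be computed at launch time from a fixed byte budget and the GPU's currently free memory. The helper below is only a sketch: the function name is hypothetical, and it assumes the free-memory figure is obtained separately (e.g. from nvidia-smi or torch.cuda.mem_get_info()) before the server is started.

```python
def lora_fraction_for_fixed_bytes(fixed_bytes: int, free_gpu_bytes: int) -> float:
    """Translate a fixed LoRA cache budget (e.g. 1 GB) into the
    fraction expected by lora_cache_gpu_memory_fraction.

    fixed_bytes    -- desired constant LoRA cache size in bytes
    free_gpu_bytes -- GPU memory free at launch time, in bytes
    """
    if fixed_bytes > free_gpu_bytes:
        raise ValueError("Requested LoRA cache exceeds free GPU memory")
    return fixed_bytes / free_gpu_bytes

# Example: a fixed 1 GB budget on a GPU with 40 GB free -> fraction 0.025
fraction = lora_fraction_for_fixed_bytes(1 * 1024**3, 40 * 1024**3)
```

This keeps the cache size itself constant across GPUs; only the fraction passed to the existing parameter changes per environment.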