I want to run the tensorrt_llm program on a server, and I want this execution to be independent of the GPU conditions: the GPU model and the amount of free GPU memory. However, the lora_cache_gpu_memory_fraction parameter allocates a percentage of the currently available GPU memory for the LoRA cache, so the actual allocation varies with the GPU model and with how much memory happens to be free at launch. If possible, please add an alternative parameter that accepts a fixed size, such as 1 GB, for the LoRA cache. That way the memory allocation would always be constant and independent of the execution environment.
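As a possible workaround until a fixed-size parameter exists, the fraction could be computed at launch time from a fixed byte budget and the GPU's currently free memory. The helper below is only a sketch: the function name is hypothetical, and it assumes the free-memory figure is obtained separately (e.g. from nvidia-smi or torch.cuda.mem_get_info()) before the server is started.

```python
def lora_fraction_for_fixed_bytes(fixed_bytes: int, free_gpu_bytes: int) -> float:
    """Translate a fixed LoRA cache budget (e.g. 1 GB) into the
    fraction expected by lora_cache_gpu_memory_fraction.

    fixed_bytes    -- desired constant LoRA cache size in bytes
    free_gpu_bytes -- GPU memory free at launch time, in bytes
    """
    if fixed_bytes > free_gpu_bytes:
        raise ValueError("Requested LoRA cache exceeds free GPU memory")
    return fixed_bytes / free_gpu_bytes

# Example: a fixed 1 GB budget on a GPU with 40 GB free -> fraction 0.025
fraction = lora_fraction_for_fixed_bytes(1 * 1024**3, 40 * 1024**3)
```

This keeps the cache size itself constant across GPUs; only the fraction passed to the existing parameter changes per environment.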