lora_cache_gpu_memory_fraction is not a good parameter #665

Open
Alireza3242 opened this issue Dec 22, 2024 · 1 comment
Comments

@Alireza3242

I want to run tensorrt_llm on a server, and I want the deployment to behave the same regardless of the GPU model or how much GPU memory happens to be free. However, the lora_cache_gpu_memory_fraction parameter allocates a percentage of the currently available GPU memory for the LoRA cache, so the amount actually reserved depends on the GPU type and on the free memory at startup. Please consider adding an alternative parameter that lets us specify a fixed size, such as 1 GB, to be allocated for LoRA. That way the memory allocation would always be constant and independent of the execution environment.
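
As a workaround until a fixed-size option exists, something like the following sketch could translate a fixed byte budget into the fraction that lora_cache_gpu_memory_fraction expects, by querying free GPU memory at launch time. This is not an official TensorRT-LLM API: the helper name and the 1 GB budget are illustrative assumptions, and the free-memory query uses `torch.cuda.mem_get_info`.

```python
# Sketch of a launch-time workaround (hypothetical helper, not a TensorRT-LLM API):
# convert a fixed LoRA cache budget in bytes into the fraction that
# lora_cache_gpu_memory_fraction expects, based on currently free GPU memory.
import torch


def fraction_for_fixed_lora_cache(budget_bytes: int, device: int = 0) -> float:
    """Return the fraction of free GPU memory corresponding to budget_bytes."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    if budget_bytes > free_bytes:
        raise ValueError(
            f"Requested LoRA cache of {budget_bytes} bytes exceeds "
            f"free GPU memory ({free_bytes} bytes)"
        )
    return budget_bytes / free_bytes


if __name__ == "__main__":
    # Example: always target ~1 GB for the LoRA cache, regardless of GPU type.
    fraction = fraction_for_fixed_lora_cache(1 * 1024**3)
    print(f"lora_cache_gpu_memory_fraction: {fraction:.4f}")
```

Note that this only approximates a fixed budget: if free memory changes between this query and the moment the server actually sizes the cache, the reserved amount will drift, which is exactly why a true fixed-size parameter would be preferable.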
