RAM offloading like AutoGPTQ? #109
manyotherfunctions started this conversation in Ideas

Are there any plans to add the ability to split a model between VRAM and system RAM, like AutoGPTQ does? For example, the oobabooga webui, through AutoGPTQ, lets you load even a 65B-parameter model on an 8GB VRAM GPU, where only about 1GB is loaded in VRAM and the rest sits in system RAM.
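For concreteness, here is a minimal sketch of the kind of split being asked about, using the accelerate-style `max_memory` map that AutoGPTQ and the oobabooga webui build on. The model id and memory caps are placeholders for illustration, not values from this thread.

```python
# Sketch of an accelerate-style VRAM/RAM split: cap GPU 0 at ~1 GiB and let
# the remaining weights spill over into system RAM. Model id and memory caps
# are illustrative placeholders.
from transformers import AutoModelForCausalLM

model_id = "facebook/opt-1.3b"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                        # let accelerate place layers
    max_memory={0: "1GiB", "cpu": "48GiB"},   # VRAM cap on GPU 0, rest in RAM
)

# Shows which layers ended up on the GPU and which were offloaded to CPU RAM.
print(model.hf_device_map)
```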
Replies: 1 comment
- This would be horrendously slow without a dedicated CPU inference engine. And Llama.cpp is already very well optimized for this use case.
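For reference, the partial offload llama.cpp already supports can be driven from Python through the llama-cpp-python binding, where `n_gpu_layers` controls how many transformer layers stay in VRAM (the same knob as the CLI's `--n-gpu-layers` flag). The model path and layer count below are placeholders.

```python
# Sketch of llama.cpp-style layer offloading via llama-cpp-python:
# n_gpu_layers layers are kept in VRAM, the rest stay in system RAM and run
# on the CPU backend. Path and counts are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-65b.q4_0.gguf",  # placeholder path
    n_gpu_layers=20,   # layers held in VRAM; remaining layers run on CPU
    n_ctx=2048,
)

out = llm("Q: Why offload layers to system RAM? A:", max_tokens=32)
print(out["choices"][0]["text"])
```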