
Drop-in support for libraries expecting a HF Transformer/Tokenizer? #66

Answered by turboderp
andysalerno asked this question in Q&A

Basically, no, there's no easy way to do that. It would be about as involved as using a GGML model in Transformers, because there's very little of the original HF structure left. ExLlama relies on controlling the datatype and stride of the hidden state throughout the forward pass, for instance. That alone is a major obstacle unless you want to slow it down by constantly reshaping and converting tensors. And it wouldn't be possible to just replace any module with an equivalent, compatible module, like what AutoGPTQ and GPTQ-for-LLaMa rely on. PEFT wouldn't work because it relies on hooks, and most of ExLlama's operations are fused in one way or another.

You can pretty trivially make a wrap…
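The reply is cut off above, but the gist is that while module-level compatibility is out, a thin adapter around the generator is straightforward. A minimal sketch of the idea, using a stub in place of the real ExLlama generator (the `FakeExLlamaGenerator` class and the exact call shape are illustrative assumptions, not ExLlama's actual API surface):

```python
# Hedged sketch: wrapping a hypothetical ExLlama-style generator so it can be
# called like a Hugging Face text-generation pipeline. The backend is a stub.

class FakeExLlamaGenerator:
    """Stand-in for an ExLlama-like generator exposing generate_simple()."""

    def generate_simple(self, prompt: str, max_new_tokens: int = 16) -> str:
        # A real backend would run the fused forward pass and sampling loop;
        # the stub just appends a marker so the wrapper can be demonstrated.
        return prompt + " [generated]"


class HFStyleWrapper:
    """Exposes a minimal pipeline-like call interface over any backend
    that provides generate_simple(prompt, max_new_tokens)."""

    def __init__(self, generator):
        self.generator = generator

    def __call__(self, prompt: str, max_new_tokens: int = 16):
        text = self.generator.generate_simple(prompt, max_new_tokens=max_new_tokens)
        # Mimic the list-of-dicts shape HF text-generation pipelines return.
        return [{"generated_text": text}]


pipe = HFStyleWrapper(FakeExLlamaGenerator())
print(pipe("Hello")[0]["generated_text"])
```

This only papers over the calling convention; anything that needs to hook into individual submodules (PEFT, logit processors attached per-layer, etc.) still won't work, for the reasons given in the answer above.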
