
Is ExLlama lossless? #80

Answered by turboderp
ri938 asked this question in Q&A


ExLlama isn't doing approximate attention or anything like that, but it is using FP16 math in some places where other implementations do FP32.
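To make the FP16-vs-FP32 point concrete, here is a minimal sketch (plain NumPy, not ExLlama code) showing how accumulating the same values with a half-precision accumulator can drift from a single-precision accumulation:

```python
# Minimal sketch (not ExLlama code): accumulate the same values with an FP16
# accumulator and an FP32 accumulator and compare the rounding drift.
import numpy as np

rng = np.random.default_rng(0)
vals = rng.standard_normal(4096).astype(np.float32)

acc32 = np.float32(0.0)
acc16 = np.float16(0.0)
for v in vals:
    acc32 = np.float32(acc32 + v)               # partial sums kept in FP32
    acc16 = np.float16(acc16 + np.float16(v))   # partial sums rounded to FP16

print("fp32 sum:", float(acc32))
print("fp16 sum:", float(acc16))
print("abs diff:", abs(float(acc32) - float(acc16)))
```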

I've managed to create a comparative benchmark with GPTQ-for-LLaMA that shows some small but measurable differences in perplexity. I still have to validate that there isn't an off-by-one error or whatever skewing the results, and crucially the differences get (a lot) smaller the larger the model gets. Sometimes they even come out in ExLlama's favor, so it's not that the FP16 math strictly hurts perplexity.
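For reference, perplexity in this kind of comparison is normally computed from the model's logits at every position of the evaluation text. A minimal sketch (assuming PyTorch tensors; this is not the actual benchmark script, and the `perplexity` helper below is hypothetical) might look like:

```python
# Hypothetical sketch of a standard perplexity computation: every position's
# logits contribute through the cross-entropy of the true next token.
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, input_ids: torch.Tensor) -> float:
    # logits: (seq_len, vocab_size), input_ids: (seq_len,)
    # Predict token t+1 from the logits at position t.
    shift_logits = logits[:-1].float()   # upcast before softmax, as most eval code does
    shift_labels = input_ids[1:]
    nll = F.cross_entropy(shift_logits, shift_labels, reduction="mean")
    return torch.exp(nll).item()

# Toy usage with random data standing in for real model outputs.
vocab, seq = 32000, 128
logits = torch.randn(seq, vocab)
ids = torch.randint(0, vocab, (seq,))
print(f"perplexity: {perplexity(logits, ids):.2f}")
```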

It's also not clear if it's a relevant comparison to make in the first place, since all of the logits contribute when calculating perplexity, bu…

Replies: 1 comment

Answer selected by ri938