Layer Skip looks interesting #432
-
It does look interesting, but it requires models specifically trained (or at least finetuned) for early exit. And the speedup isn't amazing compared to using a tiny draft model, so it's questionable whether it's really worth finetuning the large models that would benefit most from it. One big drawback is that you still need to fill out the K/V cache entries of any skipped layers, so you can't just exit early and proceed, even if you already know halfway through the forward pass what token you're going to sample. You can exit early and then do a full, batched pass over some number of early-exit tokens, but at that point you run into the same limitations as other speculative methods, and the speedup ends up being very comparable.
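To make that concrete, here's a rough sketch of what an early-exit / self-speculative loop looks like: draft a few tokens with only the first N layers, then run one full, batched pass over those positions to verify them. This is not exllamav2's actual API; the `model.forward(ids, exit_layer=...)` interface is assumed for illustration, and the sketch recomputes the whole sequence each step instead of managing a K/V cache. In a cached implementation, the full verification pass is also what back-fills the K/V entries for the layers skipped while drafting.

```python
import torch

@torch.inference_mode()
def self_speculative_generate(model, input_ids, exit_layer, num_draft, max_new_tokens):
    # Hypothetical interface, not a real exllamav2 API:
    #   model.forward(ids, exit_layer=None) -> logits [batch, seq, vocab];
    #   passing exit_layer runs only the first `exit_layer` transformer layers.
    seq = input_ids
    while seq.shape[-1] - input_ids.shape[-1] < max_new_tokens:

        # 1) Draft: greedy-decode a few tokens using only the early layers.
        draft_seq = seq
        for _ in range(num_draft):
            logits = model.forward(draft_seq, exit_layer=exit_layer)
            next_tok = logits[:, -1:].argmax(dim=-1)
            draft_seq = torch.cat([draft_seq, next_tok], dim=-1)
        draft = draft_seq[:, seq.shape[-1]:]                        # [1, num_draft]

        # 2) Verify: one full, batched pass over the drafted positions, same as
        #    the verify step in any other speculative-decoding scheme. With a
        #    K/V cache, this pass would also fill entries for the skipped layers.
        full_logits = model.forward(draft_seq)
        target = full_logits[:, seq.shape[-1] - 1:].argmax(dim=-1)  # [1, num_draft + 1]

        # 3) Accept the longest matching prefix of the draft, plus the full
        #    model's token at the first mismatch (or a bonus token if all match).
        matches = (target[:, :-1] == draft).long().cumprod(dim=-1)
        n_accept = int(matches.sum().item())
        accepted = torch.cat([draft[:, :n_accept], target[:, n_accept:n_accept + 1]], dim=-1)
        seq = torch.cat([seq, accepted], dim=-1)

    return seq[:, :input_ids.shape[-1] + max_new_tokens]
```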
-
https://huggingface.co/papers/2404.16710
Hey! :)
I just found this, and the self-speculative decoding looks promising at first glance.
@turboderp What do you think about it?