Allow users to specify a list of tokens that should be treated as atomic units, in the same way that individual bytes are. This will enable correct handling of multi-byte EOS tokens, which currently must be generated one byte at a time.
The `TokenByteTrie` already supports this via the optional `atomic_tokens` argument, which takes a list of tokens to be treated as atomic units rather than split into bytes.
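As a minimal sketch of what "atomic unit" means at the level of the unit alphabet, assume units are represented as `bytes` objects: every single byte is a unit, and each atomic token adds exactly one more unit. This is not how `TokenByteTrie` stores them internally. `build_units` and the example EOS string are hypothetical and are used only for illustration.

```python
from typing import Iterable, List


def build_units(atomic_tokens: Iterable[bytes]) -> List[bytes]:
    """Return a hypothetical unit alphabet: all 256 single bytes plus each atomic token.

    Duplicate tokens and tokens that are already single bytes are skipped,
    because they are already units.
    """
    units = [bytes([b]) for b in range(256)]
    seen = set(units)
    for tok in atomic_tokens:
        if tok not in seen:
            units.append(tok)
            seen.add(tok)
    return units


eos = b"<|endoftext|>"  # hypothetical multi-byte EOS token (13 bytes)
units = build_units([eos])
# The EOS token is now a single unit instead of a run of 13 byte units.
print(len(units), units[-1] == eos, len(eos))
```

With the EOS token registered as a unit, generating it takes one step instead of thirteen byte-level steps.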
We will need to refactor the `prefill` function, which constructs a beam state from a byte sequence (e.g., for prompted generation). This function currently steps the beam one byte at a time. That approach breaks when the byte sequence of an atomic token appears in the input to `prefill`, because the token would be consumed as its individual bytes instead of as a single unit.
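One possible shape for the refactor is to first segment the input into units (single bytes or whole atomic tokens) and then step once per unit. The sketch below assumes a greedy longest-match policy when atomic tokens overlap, and it uses a generic `step` callable as a stand-in for advancing the beam. `segment_units`, `prefill_sketch`, and `step` are hypothetical names, not the actual `prefill` signature.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def segment_units(data: bytes, atomic_tokens: Sequence[bytes]) -> List[bytes]:
    """Split `data` into units, keeping atomic tokens whole.

    At each position, emit the longest atomic token that starts there;
    otherwise emit a single byte. Empty tokens are ignored.
    """
    # Try longer tokens first so overlapping tokens prefer the longest match.
    ordered = sorted({t for t in atomic_tokens if t}, key=len, reverse=True)
    units: List[bytes] = []
    i = 0
    while i < len(data):
        for tok in ordered:
            if data.startswith(tok, i):
                units.append(tok)
                i += len(tok)
                break
        else:
            # No atomic token starts here: fall back to a single byte.
            units.append(data[i:i + 1])
            i += 1
    return units


def prefill_sketch(state: State, data: bytes, atomic_tokens: Sequence[bytes],
                   step: Callable[[State, bytes], State]) -> State:
    """Advance `state` over `data` one unit (byte or atomic token) at a time."""
    for unit in segment_units(data, atomic_tokens):
        state = step(state, unit)
    return state


eos = b"<|endoftext|>"  # hypothetical multi-byte EOS token
print(segment_units(b"hi" + eos, [eos]))
# [b'h', b'i', b'<|endoftext|>']
```

Greedy longest-match is only one policy. The issue does not specify whether an atomic token's bytes inside a prompt should always be read as the atomic unit, so that choice would need to be settled as part of the refactor.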