Commit
Supports a dynamic number of samples in continuous batching.
Prefill is done once, followed by multiple inserts. An aux field stores finished samples, and postprocessing runs after all samples for a request have been received. Batching resources are released after each sample finishes.

Typically, num_live_batches should be set to num_slots // prefill_batch_size or larger.

Since prefill produces the first token, the method is required to provide a new function to resample the initial tokens.

PiperOrigin-RevId: 671886877
Change-Id: Id4ddec1f99e8e13d755bbeb5343aab8ca12f688e
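
The bookkeeping described in this message can be illustrated with a minimal sketch. It is not the code from this commit: `SampleTracker`, `RequestState`, `insert`, `finish`, and `min_live_batches` are hypothetical names, and the sketch assumes that each sample occupies exactly one slot. Prefill itself and the resampling of initial tokens are not modeled.

```python
from dataclasses import dataclass, field


def min_live_batches(num_slots: int, prefill_batch_size: int) -> int:
    """Lower bound suggested in the message: num_slots // prefill_batch_size."""
    return num_slots // prefill_batch_size


@dataclass
class RequestState:
    """Hypothetical stand-in for the aux field that stores finished samples."""
    num_samples: int
    finished: list = field(default_factory=list)


class SampleTracker:
    """Toy slot manager: one prefill per request, one insert per sample (assumed)."""

    def __init__(self, num_slots: int):
        self.free_slots = list(range(num_slots))
        self.requests = {}    # request_id -> RequestState
        self.slot_owner = {}  # slot -> request_id

    def insert(self, request_id, num_samples: int):
        """Claim one slot per sample after a single prefill (prefill not shown)."""
        if num_samples > len(self.free_slots):
            raise RuntimeError("not enough free slots")
        self.requests[request_id] = RequestState(num_samples)
        slots = [self.free_slots.pop() for _ in range(num_samples)]
        for slot in slots:
            self.slot_owner[slot] = request_id
        return slots

    def finish(self, slot, tokens):
        """Store a finished sample and release its slot immediately.

        Returns None until every sample of the request has arrived, then
        returns all samples so that postprocessing can run once.
        """
        request_id = self.slot_owner.pop(slot)
        self.free_slots.append(slot)
        state = self.requests[request_id]
        state.finished.append(tokens)
        if len(state.finished) < state.num_samples:
            return None
        del self.requests[request_id]
        return state.finished
```

In this sketch a slot returns to the free list as soon as its sample finishes, while the request's results are held back until the last sample arrives, which mirrors the split between releasing resources per sample and postprocessing per request.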