a

1

1 what is chunked pipeline parallelism

What bothers me is that supporting both chunked prefill and pipeline parallelism (PP) is not the same as supporting chunked pipeline parallelism (CPP). CPP is chunk-level: for one long prompt, while its first chunk is being computed in the second pipeline stage, the next chunk can already start in the first pipeline stage. PP is request-level: while one request is being computed in the second pipeline stage, the next request can start in the first pipeline stage. Chunked prefill on its own requires the second chunk to start computing only after the first chunk has completely finished computing. So I wonder: is supporting chunked prefill together with PP really equivalent to supporting CPP?
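To make the difference concrete, here is a minimal sketch of the two schedules for a single long prompt. It is a toy model, not any framework's scheduler: the names `schedule` and `makespan` are hypothetical, and it assumes every chunk takes exactly one time step on every stage.

```python
def schedule(num_chunks, num_stages, wait_for_full_chunk):
    """Return start[c][s]: the time step at which chunk c begins on stage s.

    Toy model (assumption): each chunk occupies a stage for exactly one step.
    """
    start = [[0] * num_stages for _ in range(num_chunks)]
    for c in range(num_chunks):
        for s in range(num_stages):
            ready = 0
            if s > 0:
                # A chunk needs its own output from the previous stage.
                ready = max(ready, start[c][s - 1] + 1)
            if c > 0:
                if wait_for_full_chunk:
                    # Chunked prefill as described above: chunk c starts only
                    # after chunk c-1 has finished on the last stage.
                    ready = max(ready, start[c - 1][num_stages - 1] + 1)
                else:
                    # Chunk-level pipelining (CPP): chunk c only waits for
                    # chunk c-1 to finish on this same stage.
                    ready = max(ready, start[c - 1][s] + 1)
            start[c][s] = ready
    return start


def makespan(start):
    """Total number of time steps until the last chunk leaves the last stage."""
    return start[-1][-1] + 1


cpp = schedule(num_chunks=4, num_stages=2, wait_for_full_chunk=False)
serial = schedule(num_chunks=4, num_stages=2, wait_for_full_chunk=True)
print(makespan(cpp), makespan(serial))  # 5 8
```

In this model the chunk-level schedule finishes in `num_chunks + num_stages - 1` steps, while the schedule in which each chunk waits for the previous one to finish completely takes `num_chunks * num_stages` steps. Within a single prompt, the second schedule leaves all but one stage idle at any time, which is the gap the question is pointing at.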
