forked from open-mmlab/mmdeploy
lianxintao edited this page Sep 18, 2025
1
What bothers me is that supporting chunked prefill and pipeline parallelism (PP) is not the same as supporting chunked pipeline parallelism (CPP). CPP means that, for one long prompt, while its first chunk is being computed in the second pipeline stage, the next chunk can already start in the first pipeline stage; the overlap is at chunk level. PP, however, means that while one request is being computed in the second pipeline stage, the next request can start in the first pipeline stage; the overlap is at request level. Chunked prefill on its own requires the second chunk to start computing only after the first chunk has completely finished computing. So I wonder: is supporting chunked prefill plus PP equivalent to supporting CPP?
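To make the difference concrete, here is a minimal toy model, not taken from any serving framework. It assumes every chunk takes exactly one time unit per pipeline stage and ignores communication cost and the growing attention cost of later chunks. The function names `serialized_chunks_makespan` and `chunk_pipelined_makespan` are hypothetical and exist only for this illustration.

```python
def serialized_chunks_makespan(num_chunks: int, num_stages: int) -> int:
    """Chunked prefill without chunk-level overlap (toy model).

    A chunk enters the first stage only after the previous chunk
    has left the last stage.
    """
    finish = 0
    for _ in range(num_chunks):
        start = finish               # wait for the previous chunk to finish all stages
        finish = start + num_stages  # one time unit per stage
    return finish


def chunk_pipelined_makespan(num_chunks: int, num_stages: int) -> int:
    """Chunk-level pipelining as described for CPP (toy model).

    A chunk enters stage s as soon as it has left stage s-1 and the
    previous chunk has left stage s, so the previous chunks' results
    for that stage are already available there.
    """
    stage_free = [0] * num_stages  # time at which each stage becomes idle
    finish = 0
    for _ in range(num_chunks):
        ready = 0  # time at which this chunk is ready for its next stage
        for s in range(num_stages):
            start = max(ready, stage_free[s])
            ready = start + 1      # one time unit per stage
            stage_free[s] = ready
        finish = ready
    return finish


if __name__ == "__main__":
    # One long prompt split into 4 chunks, 2 pipeline stages.
    print(serialized_chunks_makespan(4, 2))  # 8
    print(chunk_pipelined_makespan(4, 2))    # 5
```

Under these assumptions, a single long prompt with `C` chunks on `S` stages takes `C * S` time units when chunks are serialized and `C + S - 1` when chunks overlap across stages. Request-level PP alone does not shorten the first case, because all chunks belong to the same request.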
1 init