forked from open-mmlab/mmdeploy
Home
lianxintao edited this page Sep 18, 2025 · 1 revision
Welcome to the mmdeploy wiki!
What bothers me is that supporting chunked prefill and pipeline parallelism (PP) may not be the same as supporting CPP (chunked pipeline parallelism). With CPP, while the first chunk of a single long prompt is being computed in the second pipeline stage, the next chunk of the same prompt can already start in the first stage, so the overlap is at the chunk level. With PP, while one request is being computed in the second stage, the next request can start in the first stage, so the overlap is at the request level. Chunked prefill on its own only starts computing the second chunk after the first chunk has finished computing through every stage. So I wonder: is supporting chunked prefill plus PP actually equivalent to supporting CPP?
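A toy timing model makes the difference concrete. The sketch below is a hypothetical simulation, not code from mmdeploy or any serving framework (`simulate` is an illustrative name). It assumes every chunk costs the same time on every stage, chunks are issued strictly in order, inter-stage communication is free, and the growing attention cost of later chunks is ignored.

```python
def simulate(num_requests: int, chunks_per_request: int, num_stages: int,
             chunk_level: bool, stage_time: float = 1.0) -> float:
    """Return the time at which the last chunk leaves the last stage.

    Hypothetical toy model with simplifying assumptions:
    - every chunk costs `stage_time` on every stage,
    - chunks are issued in order (request 0's chunks, then request 1's, ...),
    - inter-stage communication is free.

    chunk_level=True  -> CPP: chunk k+1 may enter stage s as soon as chunk k
                         has finished stage s (the KV cache for those layers exists).
    chunk_level=False -> chunked prefill + request-level PP: chunk k+1 of the
                         same request waits until chunk k has left the last stage.
    Different requests never depend on each other in either mode.
    """
    stage_free = [0.0] * num_stages  # time at which each stage becomes idle
    makespan = 0.0
    for _ in range(num_requests):
        prev_finish = None  # per-stage finish times of the previous chunk of this request
        for _ in range(chunks_per_request):
            finish = []
            ready = 0.0  # time at which this chunk's input for the current stage is ready
            for s in range(num_stages):
                start = max(ready, stage_free[s])
                if prev_finish is not None:
                    if chunk_level:
                        # CPP: only wait for the previous chunk to clear this stage
                        start = max(start, prev_finish[s])
                    else:
                        # Plain chunked prefill: wait for the previous chunk to leave the pipeline
                        start = max(start, prev_finish[-1])
                end = start + stage_time
                stage_free[s] = end
                finish.append(end)
                ready = end
            prev_finish = finish
            makespan = max(makespan, finish[-1])
    return makespan


# One long prompt split into 4 chunks on 2 stages
print(simulate(1, 4, 2, chunk_level=False))  # 8.0
print(simulate(1, 4, 2, chunk_level=True))   # 5.0
# Four independent single-chunk requests on 2 stages (request-level PP)
print(simulate(4, 1, 2, chunk_level=False))  # 5.0
```

In this model, one prompt of C chunks on S stages takes C × S time units when each chunk must leave the pipeline before the next starts, but S + C - 1 units with chunk-level overlap. Request-level PP reaches S + C - 1 only when the work arrives as C separate requests, which is the gap the question points at.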