
[Perf] Pipeline-friendly shard task submission in CacheStore #888

Merged
mag1c-h merged 3 commits into ModelEngine-Group:develop from mag1c-h:dev-cache-load-pipeline
Apr 2, 2026

[Perf] Pipeline-friendly shard task submission in CacheStore#888
mag1c-h merged 3 commits intoModelEngine-Group:developfrom
mag1c-h:dev-cache-load-pipeline

Conversation

@mag1c-h (Collaborator) commented Apr 2, 2026

Purpose

Decouple shard-level backend task submission to enable pipelining between the dispatch and transfer stages.

  • The backend's Load() is asynchronous, so multiple shards can be in flight concurrently
  • The transfer stage can start H2D copies on completed shards while waiting for slower ones
  • This reduces latency when shard I/O times are imbalanced

Modifications

  • Submit each shard to the backend independently (not batched)
  • Push ShardTasks to the running queue immediately
  • Give each ShardTask its own backend task handle so it can be waited on independently via Wait()
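The submission pattern described above can be sketched as follows. This is a minimal illustration, not the actual CacheStore code: the names `ShardTask`, `backend_load`, and `submit_shards` are hypothetical, and `concurrent.futures` stands in for the backend's async task handles.

```python
# Hypothetical sketch of per-shard pipelined submission: each shard is
# submitted independently and its task is queued immediately, so the
# consumer (transfer stage) can wait on each shard on its own.
from concurrent.futures import ThreadPoolExecutor, Future
from queue import Queue
import time


class ShardTask:
    """Pairs a shard id with its own backend task handle so each
    shard can be waited on independently."""

    def __init__(self, shard_id: int, handle: Future):
        self.shard_id = shard_id
        self.handle = handle

    def wait(self) -> bytes:
        return self.handle.result()


def backend_load(shard_id: int) -> bytes:
    # Stand-in for the backend's asynchronous shard I/O.
    time.sleep(0.005 * (shard_id % 3))  # simulate imbalanced shard I/O times
    return bytes([shard_id])


def submit_shards(shard_ids, backend_pool: ThreadPoolExecutor, running_queue: Queue):
    # Submit each shard independently (not batched) and push the
    # ShardTask to the running queue right away, so downstream work
    # can start on fast shards while slow ones are still loading.
    for sid in shard_ids:
        handle = backend_pool.submit(backend_load, sid)
        running_queue.put(ShardTask(sid, handle))


# Transfer stage: drain the queue, waiting per shard.
pool = ThreadPoolExecutor(max_workers=4)
running: Queue = Queue()
submit_shards(range(6), pool, running)
results = [running.get().wait() for _ in range(6)]
pool.shutdown()
```

Because each ShardTask carries its own handle, the consumer blocks only on the shard at the head of the queue rather than on the whole batch, which is what allows the H2D transfer of completed shards to overlap with the remaining backend I/O.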

Test

|        | TensorSize | ShardNumber | BlockNumber | Load 100% from backend | Load 100% from Cache |
|--------|------------|-------------|-------------|------------------------|----------------------|
| Before | 64KB       | 64          | 1024        | 314ms                  | 264ms                |
| After  | 64KB       | 64          | 1024        | 274ms                  | 262ms                |

@mag1c-h mag1c-h force-pushed the dev-cache-load-pipeline branch from 5ad2ea6 to 74256e2 on April 2, 2026 03:10
@mag1c-h mag1c-h marked this pull request as ready for review April 2, 2026 03:33
@mag1c-h mag1c-h requested a review from ygwpz as a code owner April 2, 2026 03:33
@mag1c-h mag1c-h merged commit d5520f6 into ModelEngine-Group:develop Apr 2, 2026
17 of 18 checks passed
@mag1c-h mag1c-h deleted the dev-cache-load-pipeline branch April 2, 2026 06:44