docs: update DeepSeek README status section by mvanhorn · Pull Request #348 · openinfer-project/openinfer

mvanhorn · 2026-06-11T09:13:28Z

Summary

Updates the README's DeepSeek coverage so both supported lines are accurately described. The Supported Models table rows are refreshed, and the status paragraph now covers:

- DeepSeek-V4-Flash: --features deepseek-v4, 8-GPU MP8 checkpoint, TileLang build requirement, greedy-only serving with explicit stop_reason for unsupported parameters, no CUDA Graph yet.
- DeepSeek-V2-Lite: --features deepseek-v2-lite, 2-GPU EP path, current correctness-gate status.

Each claim links to the measured-evidence docs under docs/models/deepseek-v4/ and docs/models/deepseek-v2-lite/.

Why this matters

#329 (split from the README rework tracker #122) asks that the DeepSeek section let users understand the supported model lines, feature flags, hardware expectations, serving status, and where the performance evidence lives. The current README covers only the V4 line in its status paragraph, has no V2-Lite status text, and links none of the evidence docs. Every claim in this update is sourced from the in-repo status/gate docs, and the V2-Lite text deliberately stays within the status ledger's claim boundaries (host-staged vs NCCL) rather than overclaiming production continuous batching.

Testing

Docs-only change. All referenced doc paths (docs/models/deepseek-v4/support.md, serving-baseline.md, decode-performance.md, docs/models/deepseek-v2-lite/status.md, hf-accuracy-gate.md) exist in the tree; feature-flag names match the per-crate Cargo.toml definitions.
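The path check described above can be reproduced with a small script. The following is a minimal sketch, not the check that was actually run for this PR: the function name `missing_link_targets` and the regular expression are illustrative assumptions, and it only handles inline markdown links of the form `[text](path)`.

```python
import re
from pathlib import Path

# Matches inline markdown links such as [`support.md`](docs/models/deepseek-v4/support.md).
# Assumption: link targets contain no spaces or closing parentheses.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def missing_link_targets(markdown_text: str, repo_root: Path) -> list[str]:
    """Return repo-relative link targets in markdown_text that do not exist under repo_root."""
    missing = []
    for target in LINK_RE.findall(markdown_text):
        # Skip external URLs and in-page anchors; only repo-relative paths are checked.
        if "://" in target or target.startswith(("#", "mailto:")):
            continue
        path_part = target.split("#", 1)[0]  # drop any trailing #anchor
        if not (repo_root / path_part).exists():
            missing.append(target)
    return missing
```

Calling `missing_link_targets(Path("README.md").read_text(encoding="utf-8"), Path("."))` from the repository root would list any README link whose target file is absent; an empty list means every relative link resolves.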

Fixes #329

CAICAIIs · 2026-06-11T11:05:59Z

-DeepSeek V4 support is intentionally narrower than the Qwen paths in the initial PR: it requires `--features deepseek-v4`, uses CUDA devices `0..7`, serves greedy requests only, terminates unsupported logprobs and non-greedy sampling requests with an explicit `stop_reason`, and does not use CUDA Graph yet.
+DeepSeek support is intentionally narrower than the Qwen paths:
+
+- **DeepSeek-V4-Flash** requires `--features deepseek-v4`, the 8-GPU MP8 checkpoint, and TileLang at build time. The current OpenAI-compatible path is a single-request greedy smoke/direct regression path: unsupported logprobs and non-greedy sampling requests terminate with an explicit `stop_reason`; bs>1 serving, continuous batching, service-level KV management, and CUDA Graph remain follow-up. Evidence: [`support.md`](docs/models/deepseek-v4/support.md), [`serving-baseline.md`](docs/models/deepseek-v4/serving-baseline.md), and [`decode-performance.md`](docs/models/deepseek-v4/decode-performance.md).


README.md:161 still describes DeepSeek-V4 as a single-request path and says bs>1 serving remains follow-up. That is stale for the current tree: the DSV4 scheduler now has the HTTP active-set decode path wired, and docs/models/deepseek-v4/online-throughput.md records active set 2 / decode batch 2 evidence with caveats. Please update this bullet to describe the limited active-set decode state and link online-throughput.md or http-serving-benchmark.md, while keeping continuous batching, multi-request prefill, service-level KV, and CUDA Graph as follow-up work.

CAICAIIs · 2026-07-04T04:55:14Z

Hi @mvanhorn, thanks for opening this PR. Are you still planning to continue updating it?

#329 looks partially covered on current main, but the README still seems to need the current DeepSeek serving-status wording and links to the measured evidence docs before we can close the issue. If you’re not planning to continue, I can take it over or open a fresh PR.

…vidence

The DeepSeek-V4-Flash bullet described the OpenAI-compatible path as single-request-only with bs>1 as follow-up, which is stale: the scheduler now has the HTTP active-set decode path wired, measured at active set 2 / decode batch 2 with caveats recorded in docs/models/deepseek-v4/online-throughput.md.

mvanhorn · 2026-07-04T05:42:14Z

Yes, still on it - thanks for the nudge. Updated in 43ff1ab: the DeepSeek-V4-Flash bullet now describes the current limited active-set decode state (HTTP active-set decode wired through the batch path, measured at active set 2 / decode batch 2, with the TPOT and c2/c4/c8 second-review caveats) and links online-throughput.md as the measured evidence alongside support.md and decode-performance.md.

docs: update DeepSeek README status section

4e07108

Fixes openinfer-project#329

mvanhorn mentioned this pull request Jun 11, 2026

docs(readme): update DeepSeek README status #329

Open

xiaguan requested a review from CAICAIIs June 11, 2026 09:27

CAICAIIs requested changes Jun 11, 2026

mvanhorn wants to merge 2 commits into openinfer-project:main from mvanhorn:fix/329-readme-deepseek-status
