docs: update DeepSeek README status section#348

Open
mvanhorn wants to merge 2 commits into openinfer-project:main from mvanhorn:fix/329-readme-deepseek-status

@mvanhorn

Contributor

Summary

Updates the README's DeepSeek coverage so both supported model lines are accurately described. The Supported Models table rows are refreshed, and the status paragraph now covers:

- **DeepSeek-V4-Flash**: `--features deepseek-v4`, the 8-GPU MP8 checkpoint, the TileLang build requirement, greedy-only serving with an explicit `stop_reason` for unsupported parameters, and no CUDA Graph yet.
- **DeepSeek-V2-Lite**: `--features deepseek-v2-lite`, the 2-GPU EP path, and the current correctness-gate status.

Each claim links to the measured-evidence docs under `docs/models/deepseek-v4/` and `docs/models/deepseek-v2-lite/`.

Why this matters

#329 (split from the README rework tracker #122) asks that the DeepSeek section let users understand the supported model lines, feature flags, hardware expectations, serving status, and where the performance evidence lives. The current README covers only the V4 line in its status paragraph, has no V2-Lite status text, and links none of the evidence docs. Every claim in this update is sourced from the in-repo status/gate docs, and the V2-Lite text deliberately stays within the status ledger's claim boundaries (host-staged vs NCCL) rather than overclaiming production continuous batching.

Testing

Docs-only change. All referenced doc paths (`docs/models/deepseek-v4/support.md`, `serving-baseline.md`, `decode-performance.md`, `docs/models/deepseek-v2-lite/status.md`, `hf-accuracy-gate.md`) exist in the tree, and the feature-flag names match the per-crate `Cargo.toml` definitions.

Fixes #329

Comment thread on `README.md` (outdated)
DeepSeek V4 support is intentionally narrower than the Qwen paths in the initial PR: it requires `--features deepseek-v4`, uses CUDA devices `0..7`, serves greedy requests only, terminates unsupported logprobs and non-greedy sampling requests with an explicit `stop_reason`, and does not use CUDA Graph yet.
DeepSeek support is intentionally narrower than the Qwen paths:

- **DeepSeek-V4-Flash** requires `--features deepseek-v4`, the 8-GPU MP8 checkpoint, and TileLang at build time. The current OpenAI-compatible path is a single-request greedy smoke/direct regression path: unsupported logprobs and non-greedy sampling requests terminate with an explicit `stop_reason`; bs>1 serving, continuous batching, service-level KV management, and CUDA Graph remain follow-up. Evidence: [`support.md`](docs/models/deepseek-v4/support.md), [`serving-baseline.md`](docs/models/deepseek-v4/serving-baseline.md), and [`decode-performance.md`](docs/models/deepseek-v4/decode-performance.md).

Collaborator


README.md:161 still describes DeepSeek-V4 as a single-request path and says bs>1 serving remains follow-up. That is stale for the current tree: the DSV4 scheduler now has the HTTP active-set decode path wired, and docs/models/deepseek-v4/online-throughput.md records active set 2 / decode batch 2 evidence with caveats. Please update this bullet to describe the limited active-set decode state and link online-throughput.md or http-serving-benchmark.md, while keeping continuous batching, multi-request prefill, service-level KV, and CUDA Graph as follow-up work.

@CAICAIIs

CAICAIIs commented Jul 4, 2026

Collaborator

Hi @mvanhorn, thanks for opening this PR. Are you still planning to continue updating it?

#329 looks partially covered on current main, but the README still seems to need the current DeepSeek serving-status wording and links to the measured evidence docs before we can close the issue. If you’re not planning to continue, I can take it over or open a fresh PR.

…vidence

The DeepSeek-V4-Flash bullet described the OpenAI-compatible path as
single-request-only with bs>1 as follow-up, which is stale: the scheduler
now has the HTTP active-set decode path wired, measured at active set 2 /
decode batch 2 with caveats recorded in
docs/models/deepseek-v4/online-throughput.md.
@mvanhorn

mvanhorn commented Jul 4, 2026

Contributor Author

Yes, still on it - thanks for the nudge. Updated in 43ff1ab: the DeepSeek-V4-Flash bullet now describes the current limited active-set decode state (HTTP active-set decode wired through the batch path, measured at active set 2 / decode batch 2, with the TPOT and c2/c4/c8 second-review caveats) and links online-throughput.md as the measured evidence alongside support.md and decode-performance.md.

Development

Successfully merging this pull request may close these issues.

docs(readme): update DeepSeek README status
