docs: update DeepSeek README status section #348
DeepSeek V4 support is intentionally narrower than the Qwen paths in the initial PR: it requires `--features deepseek-v4`, uses CUDA devices `0..7`, serves greedy requests only, terminates unsupported logprobs and non-greedy sampling requests with an explicit `stop_reason`, and does not use CUDA Graph yet.
DeepSeek support is intentionally narrower than the Qwen paths:

- **DeepSeek-V4-Flash** requires `--features deepseek-v4`, the 8-GPU MP8 checkpoint, and TileLang at build time. The current OpenAI-compatible path is a single-request greedy smoke/direct regression path: unsupported logprobs and non-greedy sampling requests terminate with an explicit `stop_reason`; bs>1 serving, continuous batching, service-level KV management, and CUDA Graph remain follow-up. Evidence: [`support.md`](docs/models/deepseek-v4/support.md), [`serving-baseline.md`](docs/models/deepseek-v4/serving-baseline.md), and [`decode-performance.md`](docs/models/deepseek-v4/decode-performance.md).
README.md:161 still describes DeepSeek-V4 as a single-request path and says bs>1 serving remains follow-up. That is stale for the current tree: the DSV4 scheduler now has the HTTP active-set decode path wired, and docs/models/deepseek-v4/online-throughput.md records active set 2 / decode batch 2 evidence with caveats. Please update this bullet to describe the limited active-set decode state and link online-throughput.md or http-serving-benchmark.md, while keeping continuous batching, multi-request prefill, service-level KV, and CUDA Graph as follow-up work.
Hi @mvanhorn, thanks for opening this PR. Are you still planning to continue updating it? #329 looks partially covered on current main, but the README still seems to need the current DeepSeek serving-status wording and links to the measured evidence docs before we can close the issue. If you’re not planning to continue, I can take it over or open a fresh PR.
…vidence

The DeepSeek-V4-Flash bullet described the OpenAI-compatible path as single-request-only with bs>1 as follow-up, which is stale: the scheduler now has the HTTP active-set decode path wired, measured at active set 2 / decode batch 2 with caveats recorded in docs/models/deepseek-v4/online-throughput.md.
Yes, still on it - thanks for the nudge. Updated in 43ff1ab: the DeepSeek-V4-Flash bullet now describes the current limited active-set decode state (HTTP active-set decode wired through the batch path, measured at active set 2 / decode batch 2, with the TPOT and c2/c4/c8 second-review caveats) and links online-throughput.md as the measured evidence alongside support.md and decode-performance.md.
Summary
Updates the README's DeepSeek coverage so both supported lines are accurately described: the Supported Models table rows are refreshed, and the status paragraph now covers DeepSeek-V4-Flash (`--features deepseek-v4`, 8-GPU MP8 checkpoint, TileLang build requirement, greedy-only serving with explicit `stop_reason` for unsupported parameters, no CUDA Graph yet) and DeepSeek-V2-Lite (`--features deepseek-v2-lite`, 2-GPU EP path, current correctness-gate status). Each claim links to the measured-evidence docs under `docs/models/deepseek-v4/` and `docs/models/deepseek-v2-lite/`.
Why this matters
#329 (split from the README rework tracker #122) asks that the DeepSeek section let users understand the supported model lines, feature flags, hardware expectations, serving status, and where the performance evidence lives. The current README covers only the V4 line in its status paragraph, has no V2-Lite status text, and links none of the evidence docs. Every claim in this update is sourced from the in-repo status/gate docs, and the V2-Lite text deliberately stays within the status ledger's claim boundaries (host-staged vs NCCL) rather than overclaiming production continuous batching.
Testing
Docs-only change. All referenced doc paths (`docs/models/deepseek-v4/support.md`, `serving-baseline.md`, `decode-performance.md`, `docs/models/deepseek-v2-lite/status.md`, `hf-accuracy-gate.md`) exist in the tree; feature-flag names match the per-crate `Cargo.toml` definitions.
Fixes #329
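The path check described under Testing could be reproduced with a small script along these lines. This is only a sketch, not part of the PR: `missing_docs` and `REFERENCED_DOCS` are hypothetical names, and the directory of `hf-accuracy-gate.md` is assumed to be the same as `status.md`, since the description gives only its file name.

```python
from pathlib import Path

# Doc paths named in the Testing section.
# Assumption: hf-accuracy-gate.md sits next to status.md; the PR text
# lists only the bare file name.
REFERENCED_DOCS = [
    "docs/models/deepseek-v4/support.md",
    "docs/models/deepseek-v4/serving-baseline.md",
    "docs/models/deepseek-v4/decode-performance.md",
    "docs/models/deepseek-v2-lite/status.md",
    "docs/models/deepseek-v2-lite/hf-accuracy-gate.md",
]


def missing_docs(repo_root, relative_paths):
    """Return the relative paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]
```

Run from the repository root, `missing_docs(".", REFERENCED_DOCS)` returns an empty list when every referenced path resolves to a file, and otherwise lists the paths that need fixing in the README.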