perf(dsv4): keep resident weights across all multi-card cases by YunjiQin · Pull Request #687 · hw-native-sys/pypto-lib

YunjiQin · 2026-07-03T09:53:48Z

Summary

Extend the child_memory resident-weight optimization (already used by prefill_fwd) to every remaining multi-card DeepSeek-V4 case, so static weight shards upload to their card once and skip the per-dispatch H2D/D2H.
- decode_fwd: build RESIDENT_WEIGHT_NAMES from the layer-stacked name lists minus CACHE_POOL_NAMES, plus RoPE tables + head/final-norm weights.
- decode_layer / prefill_layer: mark stacked attention + compressor + MoE weights (and the static tid2eid table) resident="stacked".
- moe: mark routed/shared expert weights, gate, HC-FFN, norm, and tid2eid resident.
- lm_head: mark the TP-sharded lm_head_weight resident="stacked".
Excludes KV/state caches, per-step metadata (slot mappings, block tables, ids, position_ids, sparse indices), input activations, and outputs. All resident names are inputs (is_output=False).
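
The tagging pattern described above can be sketched as follows. This is a minimal illustration rather than the PR's code: the `TensorSpec` dataclass is a hypothetical stand-in for the project's real class (only `name`, `is_output`, and `resident` are modelled), and the tensor names are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TensorSpec:
    """Hypothetical stand-in for the project's TensorSpec (assumed fields only)."""
    name: str
    is_output: bool = False
    resident: Optional[str] = None


# Illustrative names only; the real lists live in the DeepSeek-V4 model files.
RESIDENT_WEIGHT_NAMES = frozenset({"wq", "wkv", "gate_w", "tid2eid"})


def mark_resident(specs):
    """Tag static weights as resident; leave every other spec untouched."""
    for spec in specs:
        if spec.name in RESIDENT_WEIGHT_NAMES:
            spec.resident = "stacked"
    return specs


specs = mark_resident([
    TensorSpec("wq"),                       # static weight: tagged resident
    TensorSpec("kv_cache"),                 # cache: not in the set, untouched
    TensorSpec("position_ids"),             # per-step metadata: untouched
    TensorSpec("x_next", is_output=True),   # output: untouched
])
tagged = [s.name for s in specs if s.resident == "stacked"]
print(tagged)
```

Membership in the name set is the only criterion in this sketch, which is why the exclusions listed above have to be enforced by keeping cache, metadata, and output names out of the set.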

Related Issues

N/A


coderabbitai · 2026-07-03T09:54:17Z

Walkthrough

This PR adds device-residency tagging to tensor spec builders across five DeepSeek v4 model files. Each build_tensor_specs function now defines a RESIDENT_WEIGHT_NAMES set and marks matching static weight specs with spec.resident = "stacked", excluding KV/state caches and per-step metadata.

Changes

Device-resident weight tagging

| Layer / File(s) | Summary |
| --- | --- |
| Decode forward resident weights `models/deepseek/v4/decode_fwd.py` | Defines `RESIDENT_WEIGHT_NAMES` covering stacked attention/MoE, RoPE cos/sin, head, and final-norm weights; `build_tensor_specs` marks matching specs `resident="stacked"`. |
| Decode layer resident weights `models/deepseek/v4/decode_layer.py` | Adds `RESIDENT_WEIGHT_NAMES` for attention and MoE FFN/gate params plus `tid2eid`; loop marks matching specs resident. |
| LM head weight residency `models/deepseek/v4/lm_head.py` | Marks the TP-sharded `lm_head_weight` spec as `resident="stacked"` with explanatory comments. |
| MoE tensor spec residency `models/deepseek/v4/moe.py` | Assigns spec list to a local `specs` variable, adds `RESIDENT_WEIGHT_NAMES` for FFN/gate/expert weights, marks matches resident, then returns `specs`. |
| Prefill layer resident weights `models/deepseek/v4/prefill_layer.py` | Adds `RESIDENT_WEIGHT_NAMES` for attention/RoPE, HCA/CSA weights, output projection, and MoE static weights; loop marks matching specs resident. |

Estimated code review effort: 2 (Simple) | ~12 minutes

Possibly related PRs

hw-native-sys/pypto-lib#640: Both PRs modify the tensor-spec construction path in models/deepseek/v4/decode_fwd.py's build_tensor_specs, one adding resident tagging and the other refactoring the same spec-building logic.

Poem

A rabbit hops through weights so fine,
Tagging each as "stacked" this time,
No more shuttling to and fro,
On-device now, they stay and glow.
🐇✨ Dispatch after dispatch, weights hold still,
Cache and metadata roam at will.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: extending resident-weight handling across DeepSeek-V4 multi-card paths. |
| Description check | ✅ Passed | The description matches the code changes and clearly explains the resident-weight optimization across the affected modules. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gemini-code-assist

Code Review

This pull request configures static weight parameters to be device-resident and sharded per rank (using resident='stacked') across several DeepSeek v4 model components, including decode_fwd, decode_layer, lm_head, moe, and prefill_layer. This optimization avoids per-dispatch host-to-device and device-to-host transfers. Feedback is provided to simplify the definition of RESIDENT_WEIGHT_NAMES in decode_fwd.py by removing redundant unpacking of CSA_LAYER_STACKED_NAMES and HCA_LAYER_STACKED_NAMES, which are already contained within LAYER_STACKED_NAMES.

gemini-code-assist · 2026-07-03T09:57:38Z

+RESIDENT_WEIGHT_NAMES = frozenset(
+    [
+        n
+        for n in (*LAYER_STACKED_NAMES, *CSA_LAYER_STACKED_NAMES, *HCA_LAYER_STACKED_NAMES)
+        if n not in CACHE_POOL_NAMES
+    ]
+    + ["freqs_cos", "freqs_sin"]
+    + HC_HEAD_NAMES
+    + FINAL_NORM_NAMES
+)


Unpacking CSA_LAYER_STACKED_NAMES and HCA_LAYER_STACKED_NAMES is redundant here because LAYER_STACKED_NAMES already contains all of those names. Simplifying the list comprehension improves readability and avoids unnecessary unpacking operations.

Suggested change

 RESIDENT_WEIGHT_NAMES = frozenset(
     [
         n
-        for n in (*LAYER_STACKED_NAMES, *CSA_LAYER_STACKED_NAMES, *HCA_LAYER_STACKED_NAMES)
+        for n in LAYER_STACKED_NAMES
         if n not in CACHE_POOL_NAMES
     ]
     + ["freqs_cos", "freqs_sin"]
     + HC_HEAD_NAMES
     + FINAL_NORM_NAMES
 )
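
A quick way to check that this simplification is behaviour-preserving is to compare the two resulting sets. The sketch below uses hypothetical name lists; whether the real `LAYER_STACKED_NAMES` actually contains every CSA/HCA name is the reviewer's claim and is not verified here.

```python
# Hypothetical name lists, chosen so that the CSA/HCA lists are subsets of
# LAYER_STACKED_NAMES, which is the precondition the review comment assumes.
LAYER_STACKED_NAMES = ["attn_wq", "csa_w", "hca_w", "kv_cache"]
CSA_LAYER_STACKED_NAMES = ["csa_w"]
HCA_LAYER_STACKED_NAMES = ["hca_w"]
CACHE_POOL_NAMES = {"kv_cache"}


def resident_names(candidates):
    """Drop cache-pool names and deduplicate the rest into a frozenset."""
    return frozenset(n for n in candidates if n not in CACHE_POOL_NAMES)


original = resident_names(
    (*LAYER_STACKED_NAMES, *CSA_LAYER_STACKED_NAMES, *HCA_LAYER_STACKED_NAMES)
)
simplified = resident_names(LAYER_STACKED_NAMES)

# The precondition that makes the simplification safe.
is_superset = set(LAYER_STACKED_NAMES) >= (
    set(CSA_LAYER_STACKED_NAMES) | set(HCA_LAYER_STACKED_NAMES)
)
print(is_superset, original == simplified)
```

Because the result is a `frozenset`, duplicate names collapse either way; the two forms differ only if a CSA or HCA name is missing from `LAYER_STACKED_NAMES`, so an equality check like this one could serve as a guard if the lists ever diverge.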

coderabbitai

🧹 Nitpick comments (2)

models/deepseek/v4/moe.py (1)
898-919: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Post-hoc resident assignment bypasses TensorSpec's output/resident guard.

spec.resident = "stacked" is set after construction, so TensorSpec.__post_init__'s check that rejects resident combined with is_output=True never re-runs. x_next (is_output=True) is already in specs when this loop executes, so a future accidental name collision (e.g. a rename/typo adding x_next to RESIDENT_WEIGHT_NAMES) would silently mark an output tensor resident instead of raising, per the documented contract: "resident is only valid for inputs (a resident weight stays device-resident across dispatches); it cannot be combined with is_output=True."

A cheap guard restores the intended safety net. Based on learnings, module-level assert guards are an accepted convention in this codebase's DeepSeek v4 modules and aren't stripped at runtime.
🛡️ Proposed defensive check
     for spec in specs:
         if spec.name in RESIDENT_WEIGHT_NAMES:
+            assert not spec.is_output, f"{spec.name!r} is an output tensor; cannot mark resident"
             spec.resident = "stacked"
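
The bypass and the proposed guard can be reproduced in isolation. The sketch below assumes `TensorSpec` is a dataclass whose `__post_init__` performs the check described above; the class shown is a hypothetical stand-in, and the exception type is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TensorSpec:
    """Hypothetical stand-in; the real class may validate more than this."""
    name: str
    is_output: bool = False
    resident: Optional[str] = None

    def __post_init__(self):
        # Runs only when the object is constructed.
        if self.resident is not None and self.is_output:
            raise ValueError(
                f"{self.name!r}: resident cannot be combined with is_output=True"
            )


# The constructor-time guard fires as expected.
try:
    TensorSpec("x_next", is_output=True, resident="stacked")
    constructor_rejected = False
except ValueError:
    constructor_rejected = True

# Attribute assignment after construction is not re-validated.
spec = TensorSpec("x_next", is_output=True)
spec.resident = "stacked"
bypassed = spec.resident == "stacked"


def mark_resident(specs, resident_names):
    """Tagging loop with the proposed assert as an explicit guard."""
    for s in specs:
        if s.name in resident_names:
            assert not s.is_output, f"{s.name!r} is an output tensor; cannot mark resident"
            s.resident = "stacked"


# With the guard in place, a colliding name raises instead of passing silently.
try:
    mark_resident([TensorSpec("x_next", is_output=True)], {"x_next"})
    guard_fired = False
except AssertionError:
    guard_fired = True

print(constructor_rejected, bypassed, guard_fired)
```

One caveat: Python removes `assert` statements when run with `-O`, so the guard relies on the interpreter not being started in optimized mode.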
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/moe.py` around lines 898 - 919, The post-construction
resident update in the specs loop bypasses TensorSpec’s is_output/resident
validation, so add a defensive module-level assert before setting spec.resident
to stacked in the RESIDENT_WEIGHT_NAMES loop. Use TensorSpec and the
RESIDENT_WEIGHT_NAMES assignment block in moe.py to ensure no entry like x_next
(or any future typo/rename collision) can ever be marked resident when
spec.is_output is true, preserving the constructor’s guard.
Source: Learnings
models/deepseek/v4/prefill_layer.py (1)
836-871: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Same validation-bypass gap as moe.py's build_tensor_specs.

x_next (is_output=True, appended at line 836) is already present in tensor_specs when the RESIDENT_WEIGHT_NAMES loop runs at lines 868-870, so a future name collision would silently set resident on an output tensor without triggering TensorSpec's constructor-time guard against resident + is_output=True.
🛡️ Proposed defensive check
     for spec in tensor_specs:
         if spec.name in RESIDENT_WEIGHT_NAMES:
+            assert not spec.is_output, f"{spec.name!r} is an output tensor; cannot mark resident"
             spec.resident = "stacked"
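
An alternative to the in-loop assert is to rebuild each tagged spec so the constructor check runs again. The sketch below is a different technique from the one proposed above: it assumes `TensorSpec` is a dataclass with all fields accepted by `__init__`, and it uses a hypothetical stand-in class. `dataclasses.replace` creates a new instance through `__init__`, so `__post_init__` is re-invoked.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class TensorSpec:
    """Hypothetical stand-in for the project's TensorSpec."""
    name: str
    is_output: bool = False
    resident: Optional[str] = None

    def __post_init__(self):
        if self.resident is not None and self.is_output:
            raise ValueError(
                f"{self.name!r}: resident cannot be combined with is_output=True"
            )


def mark_resident(specs, resident_names):
    """Return a new list; tagged specs are rebuilt and therefore re-validated."""
    return [
        replace(s, resident="stacked") if s.name in resident_names else s
        for s in specs
    ]


tensor_specs = [TensorSpec("wq"), TensorSpec("x_next", is_output=True)]

# Normal case: only the weight is tagged.
ok = mark_resident(tensor_specs, {"wq"})

# Collision case: the output name sneaks into the resident set and is rejected.
try:
    mark_resident(tensor_specs, {"wq", "x_next"})
    collision_rejected = False
except ValueError:
    collision_rejected = True

print(ok[0].resident, ok[1].resident, collision_rejected)
```

The trade-off is that this returns new objects instead of mutating in place, so it only fits if nothing else holds references to the original specs.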
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/prefill_layer.py` around lines 836 - 871, The `x_next`
output can still be matched by the `RESIDENT_WEIGHT_NAMES` loop and have
`resident="stacked"` assigned after construction, bypassing the `TensorSpec`
output/resident validation. Update the `tensor_specs` post-processing in
`prefill_layer.py` so only non-output specs are eligible for resident marking,
or add an explicit guard before setting `spec.resident`; use the `x_next`,
`RESIDENT_WEIGHT_NAMES`, and `TensorSpec` symbols to keep the check aligned with
the constructor rule.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 7d376c8 and dbb8ac6.

📒 Files selected for processing (5)

models/deepseek/v4/decode_fwd.py
models/deepseek/v4/decode_layer.py
models/deepseek/v4/lm_head.py
models/deepseek/v4/moe.py
models/deepseek/v4/prefill_layer.py

YunjiQin wants to merge 1 commit into hw-native-sys:main from YunjiQin:feat/v4-resident-weights-multicard