docs: Running on WSL2 section (residency limit, spill diagnostic, UVA note) by sztlink · Pull Request #19 · huawei-csl/KVarN

sztlink · 2026-06-11T10:11:11Z

As invited in #15: a README note for WSL2 users covering the two things that bite there.

The VRAM residency limit: silent spill to system RAM past ~90% allocation, with the gpu-memory-utilization guidance and the power-draw diagnostic from the #15 investigation (kvarn_k4v2_g128 throughput does not scale with batch size).
The UVA requirement, with the current one-line workaround.
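
To make the residency guidance concrete, here is a minimal sketch of how a launcher could keep the requested memory fraction under the limit. The function name, the 0.90 cap (taken from the "~90%" observation above), and the 0.02 safety margin are illustrative assumptions, not part of KVarN or of any serving framework.

```python
def clamp_utilization(requested: float,
                      residency_cap: float = 0.90,
                      margin: float = 0.02) -> float:
    """Clamp a requested GPU memory fraction below an assumed residency cap.

    residency_cap: fraction of VRAM past which WSL2 was observed to spill
                   (about 0.90 in the measurements above; an assumption here).
    margin:        hypothetical headroom kept below the cap.
    """
    if not 0.0 < requested <= 1.0:
        raise ValueError("requested fraction must be in (0, 1]")
    return min(requested, residency_cap - margin)


if __name__ == "__main__":
    # A request above the cap is pulled down; one below it is left alone.
    print(clamp_utilization(0.95))
    print(clamp_utilization(0.50))
```

The resulting value would then be passed as the gpu-memory-utilization setting. The right cap for a given machine is something to measure rather than assume.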

One suggestion outside the README scope: an official escape hatch for UVA on WSL (an env var like KVARN_FORCE_UVA=1, or detecting WSL and warning instead of failing) would spare users from editing platform_utils.py by hand. Happy to PR that separately if you want it.
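
A rough sketch of what that escape hatch could look like, to make the proposal concrete. Everything here is hypothetical: `KVARN_FORCE_UVA` is only the name suggested above, `resolve_uva_policy` is not an existing KVarN function, and the "microsoft" substring check on /proc/version is a common heuristic for WSL detection, not a guaranteed interface.

```python
import os


def is_wsl(proc_version: str) -> bool:
    """Heuristic: WSL kernels typically mention 'microsoft' in /proc/version."""
    return "microsoft" in proc_version.lower()


def read_proc_version(path: str = "/proc/version") -> str:
    """Return the kernel version string, or '' when it cannot be read."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except OSError:
        return ""


def resolve_uva_policy(proc_version: str, env: dict) -> str:
    """Pick a policy for the UVA check (hypothetical, for discussion only).

    'force'   -> user set KVARN_FORCE_UVA=1, skip the check
    'warn'    -> running on WSL, emit a warning instead of failing
    'default' -> keep the existing behavior unchanged
    """
    if env.get("KVARN_FORCE_UVA") == "1":
        return "force"
    if is_wsl(proc_version):
        return "warn"
    return "default"


if __name__ == "__main__":
    print(resolve_uva_policy(read_proc_version(), dict(os.environ)))
```

Passing the version string and environment in as arguments keeps the decision testable without a WSL machine.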

Environment for the measurements: WSL 2.7.3 (kernel 6.6.114), Windows 11 build 26100, driver 595.79, RTX 4090.


philippebich

Thanks for adding the WSL2 note and documenting your findings.

I know these observations come from your test setup, but would you mind making this section a bit more general and avoiding references to specific card vendors or branded hardware where possible? Model names used for validation are of course fine, but I'd prefer not to call out vendors or cards in the main guidance for the project.

Thanks for the support and for documenting the WSL2 behavior.

Move the agency to the WDDM paravirtualization layer (the actual source of the residency limit and spill), express the spill diagnostic as the utilization/power divergence rather than a card-specific wattage, and keep the 24 GB GPU + driver only as a validation data point. Add a live-monitor command.

sztlink · 2026-06-12T11:30:48Z

Thanks, that is a fair call. I have generalized the section:

The residency limit and silent spill are now attributed to the WDDM paravirtualization layer rather than to any card, which is also the more accurate framing (it is a property of the Windows graphics virtualization stack, not of the silicon).
The diagnostic is expressed as the utilization/power divergence instead of a card-specific wattage, so it reads on any device.
The 24 GB GPU and driver remain only as a validation data point; the model name is kept for reproducibility.
Added a one-line live-monitor command for the two signals.
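
For reviewers who want to reason about the diagnostic without a specific monitoring tool, here is a vendor-neutral sketch of the divergence check. It assumes that "divergence" means high reported utilization together with power draw well below the device's power limit. The `Sample` type, the function name, and both thresholds are illustrative assumptions, and the samples would come from whatever monitoring tool the platform provides.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    utilization_pct: float  # reported GPU utilization, 0 to 100
    power_w: float          # instantaneous power draw in watts


def looks_like_spill(samples: Sequence[Sample],
                     power_limit_w: float,
                     min_util_pct: float = 90.0,
                     max_power_frac: float = 0.5) -> bool:
    """Return True when most samples show high utilization but low power.

    Thresholds are placeholders, not measured values: a sample is
    'suspicious' if utilization >= min_util_pct while power draw is at
    most max_power_frac of the device's power limit.
    """
    if not samples or power_limit_w <= 0:
        return False
    suspicious = sum(
        1 for s in samples
        if s.utilization_pct >= min_util_pct
        and s.power_w / power_limit_w <= max_power_frac
    )
    # Strict majority, so a single noisy reading does not trigger it.
    return suspicious * 2 > len(samples)


if __name__ == "__main__":
    window = [Sample(99.0, 120.0), Sample(100.0, 130.0), Sample(95.0, 110.0)]
    print(looks_like_spill(window, power_limit_w=450.0))
```

Expressing power as a fraction of the limit is what lets the check read the same on any device, which is the point of the generalization.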

Ready for another look whenever you have a moment, and thanks again for the review.

docs: add Running on WSL2 section (residency limit, spill diagnostic,…

f02dc9d

… UVA)

sztlink mentioned this pull request Jun 11, 2026

kvarn_k4v2_g128 throughput does not scale with batch size (Sinkhorn kernel JIT-recompiles per step under dynamic batching) #15

Open

philippebich requested changes Jun 12, 2026

sztlink wants to merge 2 commits into huawei-csl:main from sztlink:wsl-readme-note