
docs: Running on WSL2 section (residency limit, spill diagnostic, UVA note) #19

Open: sztlink wants to merge 2 commits into huawei-csl:main from sztlink:wsl-readme-note


sztlink (Contributor) commented Jun 11, 2026

As invited in #15: a README note for WSL2 users covering the two things that bite there.

  1. The VRAM residency limit: allocations past ~90% of VRAM silently spill to system RAM. The note includes the gpu-memory-utilization guidance and the power-draw diagnostic from the #15 investigation ("kvarn_k4v2_g128 throughput does not scale with batch size (Sinkhorn kernel JIT-recompiles per step under dynamic batching)").
  2. The UVA (unified virtual addressing) requirement, with the current one-line workaround.
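The ~90% ceiling in item 1 translates into a cap on gpu-memory-utilization. A minimal sketch, assuming that setting is a fraction of total VRAM and that the spill point sits near 0.90 as observed in #15 (the real threshold is empirical and may differ per setup); the function name and the 0.02 safety margin are hypothetical:

```python
# Hypothetical pre-launch helper: keep the requested memory fraction below the
# approximate WSL2 residency threshold observed in #15.
SPILL_THRESHOLD = 0.90  # assumption: empirical figure, not a documented WDDM constant


def safe_memory_utilization(requested: float,
                            margin: float = 0.02,
                            threshold: float = SPILL_THRESHOLD) -> float:
    """Clamp a requested gpu-memory-utilization fraction below the spill threshold.

    Returns `requested` unchanged if it already leaves `margin` of headroom
    under `threshold`; otherwise returns `threshold - margin`.
    """
    if not 0.0 < requested <= 1.0:
        raise ValueError("requested must be a fraction in (0, 1]")
    ceiling = threshold - margin
    return min(requested, ceiling)
```

With the defaults, a requested 0.95 is clamped to about 0.88, while 0.80 passes through unchanged.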

One suggestion outside the README scope: an official escape hatch for UVA on WSL (an env var like KVARN_FORCE_UVA=1, or detecting WSL and warning instead of failing) would spare users from editing platform_utils.py by hand. Happy to PR that separately if you want it.
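The escape hatch proposed above could look roughly like the sketch below, which combines both options. It is illustrative only: KVARN_FORCE_UVA is the proposed (not yet existing) variable, uva_gate and is_wsl are hypothetical helpers, uva_supported stands in for whatever UVA probe the project already runs, and WSL is detected with the common heuristic of looking for "microsoft" in the kernel release string, which is not a formal API. It does not reflect the current contents of platform_utils.py.

```python
import os
import platform
import warnings
from typing import Mapping, Optional


def is_wsl() -> bool:
    # Heuristic: WSL kernels include "microsoft" in their release string
    # (e.g. "...-microsoft-standard-WSL2"). Not a formal detection API.
    return "microsoft" in platform.uname().release.lower()


def uva_gate(uva_supported: bool,
             env: Optional[Mapping[str, str]] = None,
             on_wsl: Optional[bool] = None) -> bool:
    """Return True if startup may continue; raise if UVA is required but absent.

    `env` and `on_wsl` default to the real environment and platform and can be
    overridden for testing.
    """
    env = os.environ if env is None else env
    on_wsl = is_wsl() if on_wsl is None else on_wsl

    if uva_supported:
        return True
    if env.get("KVARN_FORCE_UVA") == "1":
        # Option 1: explicit user override via the proposed env var.
        warnings.warn("KVARN_FORCE_UVA=1 set; continuing without a positive UVA check")
        return True
    if on_wsl:
        # Option 2: detect WSL and warn instead of failing.
        warnings.warn("UVA not reported under WSL2; continuing anyway")
        return True
    raise RuntimeError("UVA is required but not supported on this platform")
```

Keeping the override as an explicit opt-in means users who hit the check outside WSL still get a hard failure by default.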

Environment for the measurements: WSL 2.7.3 (kernel 6.6.114), Windows 11 build 26100, driver 595.79, RTX 4090.

philippebich (Collaborator) left a comment

Thanks for adding the WSL2 note and documenting your findings.

I know these observations come from your test setup, but would you mind making this section a bit more general and avoiding references to specific card vendors or branded hardware where possible? Model names used for validation are of course fine, but I'd prefer not to call out vendors or cards in the main guidance for the project.

Thanks for the support and for documenting the WSL2 behavior.

Attribute the residency limit and silent spill to the WDDM paravirtualization layer (their actual source), express the spill diagnostic as the utilization/power divergence rather than a card-specific wattage, and keep the 24 GB GPU and driver only as a validation data point. Add a live-monitor command.

sztlink (Contributor, Author) commented Jun 12, 2026

Thanks, that is a fair call. I have generalized the section:

  • The residency limit and silent spill are now attributed to the WDDM paravirtualization layer rather than to any card, which is also the more accurate framing (it is a property of the Windows graphics virtualization stack, not of the silicon).
  • The diagnostic is expressed as the utilization/power divergence instead of a card-specific wattage, so it reads on any device.
  • The 24 GB GPU and driver remain only as a validation data point; the model name is kept for reproducibility.
  • Added a one-line live-monitor command for the two signals.
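The divergence diagnostic can be phrased as a vendor-neutral check over sampled telemetry. A sketch, assuming the spill shows up as high reported utilization while power draw stays well below the device's power limit; the Sample type, looks_like_spill, and all three thresholds are hypothetical and uncalibrated, and the samples would come from whatever monitor the reader already uses:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    utilization_pct: float  # reported GPU utilization, 0-100
    power_w: float          # instantaneous power draw in watts
    power_limit_w: float    # device power limit in watts


def looks_like_spill(samples: Sequence[Sample],
                     util_min: float = 90.0,
                     power_frac_max: float = 0.5,
                     min_fraction: float = 0.8) -> bool:
    """Flag a probable spill: most samples show high utilization at low power.

    A sample "diverges" when utilization is at least `util_min` percent while
    power is at most `power_frac_max` of the power limit. Returns True when the
    share of diverging samples reaches `min_fraction`.
    """
    if not samples:
        return False
    diverging = sum(
        1 for s in samples
        if s.utilization_pct >= util_min
        and s.power_w <= power_frac_max * s.power_limit_w
    )
    return diverging / len(samples) >= min_fraction
```

Expressing power as a fraction of each device's own limit is what keeps the check independent of any particular card's wattage.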

Ready for another look whenever you have a moment, and thanks again for the review.
