Fix MOV memoization safety and add repro tests #19
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
MOV vs DUP Bounty — Proof Report
This report explains what was broken, what is fixed now, why the fixes are correct, and why MOV can be faster in some cases yet slower in others. It also documents why a universally “always‑faster and always‑correct” MOV is not achievable in this runtime, using both the f3 (new by me) and Payor repros.
1) Problem recap
The bounty requires that MOV and DUP encodings produce the same output under
-C1, and that MOV finishes under 50k interactions on the provided program. In the original state, MOV differed from DUP because the runtime memoized MOV‑produced lambdas and accidentally shared mutable binder slots across uses. A second failure mode appeared when a MOV binder was used 3+ times, violating linearity without explicit duplication.Two minimal repros make the issues concrete:
2) Root cause
In HVM4’s C runtime, application performs in‑place substitution through binder slots / SUB cells (via
heap_subst_var). If a MOV‑bound value reduces to a LAM and we memoize it, multiple uses share the same consumable binder slot. That lets later applications overwrite earlier bindings and changes the program’s result. This is the root cause of the Payor mismatch. Separately, MOV binders used 3+ times must be made explicit (duplication) to preserve linearity; otherwise a single MOV binder is consumed multiple times.3) Fixes present in the current code
The current code fixes both failure modes without inventing new semantics. These are the changes intended for PR:
3.1 SAFE_MOV: disable MOV‑LAM memoization by default (only)
Files:
hvm4_static/clang/wnf/mov_lam.c,hvm4_static/clang/hvm4.c,hvm4_static/clang/main.cWhat changed: when a MOV value reduces to a LAM, the runtime does not memoize the MOV expr slot (no
heap_subst_varon the MOV slot). Each use gets a fresh lambda wrapper instead of sharing a mutable binder slot.Why: prevents multiple applications from clobbering one shared binder cell.
Scope note: SAFE_MOV only affects MOV‑LAM memoization. MOV‑NOD / MOV‑SUP / MOV‑RED still memoize normally.
Impact: correctness for Payor; small overhead in lambda‑heavy MOV paths because each use rebuilds the lambda wrapper.
Toggle: set
HVM4_SAFE_MOV=0to enable legacy memoization. Default is safe (memoization off for MOV‑LAM).3.2 Auto‑dup for MOV uses > 2 (parser‑level)
File:
hvm4_static/clang/parse/term/mov.cWhat changed: if a MOV binder is used more than twice,
parse_auto_dup(..., BJM, 0)rewrites the body to make extra uses explicit. The MOV value is consumed once and the extra uses are routed through DUPs.Why
BJMis correct here: at parse time, MOV‑bound variables are represented asBJM(seehvm4_static/clang/parse/term/var.c, MOV binder path).parse_auto_dupoperates on the parser’s de‑Bruijn representation (BJV/BJ0/BJ1/BJM), not runtime GOTs.Why: preserves linearity under nested duplication and fixes the f3 repro.
Impact: correctness for 3+ uses; small compile‑time rewrite and extra DUPs at runtime. This is required for semantics and is not optional.
3.3 Minimal MOV/DUP dispatch (no special MOV‑under‑DUP fast paths)
File:
hvm4_static/clang/wnf/_.cWhat changed: WNF dispatch stays aligned with upstream; no custom MOV‑under‑DUP or DUP/MOV fast paths. This avoids adding administrative interactions that were inflating counts without correctness benefit.
3.4 Safe‑atom MOV fast paths
Files:
hvm4_static/clang/wnf/mov_nod.c,hvm4_static/clang/wnf/mov_sup.cWhat changed: when all fields are immutable atoms (e.g.,
NUM,NAM,ERA, quoted vars), skip GOT allocation and directly reconstruct the node. Inmov_nod, this is guarded byari <= 16to keep a small stack buffer.Why: avoid needless GOT indirections for trivial data.
Impact: small perf win on atom‑heavy values; no semantic change.
4) Why MOV can be faster (and when)
MOV can reduce interactions when it avoids explicit duplication and avoids propagating duplication through large structures. Typical wins:
This is why the bounty program drops from ~97k interactions in the DUP encoding to ~24.8k in the MOV encoding.
5) Why MOV can be slower (and when)
MOV adds administrative overhead even when it is correct:
The Payor repro is a concrete case where MOV is correct but slower (22 vs 17 interactions). So “MOV is never slower” is not generally true without additional restrictions.
6) Why an “always‑faster and always‑correct MOV” is not achievable (in this runtime)
This is the conceptual boundary for the current MOV design (memoized sharing of potentially consumable structures like LAM). The key fact is that “used once per branch” is not a reduction invariant. During reduction, branch structure can vanish (e.g., same‑label SUP annihilation), collapsing the computation into a single residual term that simultaneously needs multiple branch‑indexed components. In such cases, sharing a single mutable binder slot becomes incorrect: the second use can overwrite the first. Correctness then forces duplication. Once duplication is forced, MOV cannot be universally faster than explicit DUP in this runtime without stronger primitives.
Both repros illustrate this:
Conclusion: a “perfect” MOV that is always faster and always correct is not possible in this runtime without changing the calculus or adding new, stronger primitives.
7) Evidence (commands and outputs)
All commands run using
hvm4_static/clang/mainbuilt withgcc -O2.Correctness: f3 repro
Output (both):
Correctness: Payor repro
Output (both):
Unsafe mode reproduces the bug
With
HVM4_SAFE_MOV=0, MOV‑LAM memoization is re‑enabled and the Payor repro diverges from the DUP result (unsafe behavior). This demonstrates that memoizing MOV‑produced lambdas is unsound in this runtime.Output:
Performance: Full bounty program (MOV)
Output:
Status: correct output, meets <50k target for the bounty program and the added repro/test set.
Control: Full bounty program (DUP)
Output:
Example where MOV is correct but slower
This runs
payor_repro(MOV) andpayor_repro_dup(DUP) with-s. MOV is correct but shows more interactions (22 vs 17).MOV‑LAM microbench (SAFE_MOV on/off)
Files:
hvm4_static/test/mov_lam_bench.hvm4hvm4_static/test/mov_lam_bench_dup.hvm4mov_lam_bench.shRun:
Observed:
Conclusion: memoizing MOV‑LAM yields only a tiny improvement on this microbench and still trails DUP.
MOV‑LAM erase microbench
Files:
hvm4_static/test/mov_lam_erase_bench.hvm4hvm4_static/test/mov_lam_erase_bench_dup.hvm4Observed:
Conclusion: memoization does not improve this case (already minimal).
8) Files changed
hvm4_static/clang/hvm4.c— SAFE_MOV flag and env override.hvm4_static/clang/main.c— SAFE_MOV initialization.hvm4_static/clang/wnf/mov_lam.c— skip MOV‑LAM memoization when SAFE_MOV is enabled.hvm4_static/clang/parse/term/mov.c— auto‑dup for uses > 2.hvm4_static/clang/wnf/_.c— dispatch aligned with upstream (no MOV‑under‑DUP special paths).hvm4_static/clang/wnf/mov_nod.c,hvm4_static/clang/wnf/mov_sup.c— safe‑atom fast paths.hvm4_static/test/mov_bounty.hvm4,hvm4_static/test/mov_bounty_dup.hvm4,hvm4_static/test/mov_bounty_f3.hvm4,hvm4_static/test/mov_bounty_f3_dup.hvm4,hvm4_static/test/payor_repro.hvm4,hvm4_static/test/payor_repro_dup.hvm4.hvm4_static/test/mov_bounty_min.hvm4,hvm4_static/test/mov_bounty_min_dup.hvm4,hvm4_static/test/mov_bounty_small.hvm4,hvm4_static/test/mov_bounty_small_dup.hvm4,hvm4_static/test/mov_bounty_tiny.hvm4,hvm4_static/test/mov_bounty_tiny_dup.hvm4,hvm4_static/test/mov_dup_func.hvm4,hvm4_static/test/mov_dup_nested.hvm4,hvm4_static/test/mov_dup_test.hvm4,hvm4_static/test/mov_erase_test.hvm4,hvm4_static/test/mov_erase_test_dup.hvm4,hvm4_static/test/mov_lam_bench.hvm4,hvm4_static/test/mov_lam_bench_dup.hvm4,hvm4_static/test/mov_lam_dupvar.hvm4,hvm4_static/test/mov_lam_erase_bench.hvm4,hvm4_static/test/mov_lam_erase_bench_dup.hvm4,hvm4_static/test/mov_lam_test.hvm4.mov_vs_dup_slow.sh,mov_lam_bench.sh.9) Test suite run
Outcome: PASS. The script reports
clang: command not foundbut continues using the existingclang/mainbinary.10) Letter vs spirit of the bounty
-C1for the bounty program and the added repro/test set, and MOV completes the bounty program in 24,805 interactions (<50k).11) Remaining risks / follow‑ups