@HaileyStorm HaileyStorm commented Jan 5, 2026

MOV vs DUP Bounty — Proof Report

This report explains what was broken, what is fixed now, why the fixes are correct, and why MOV can be faster in some cases yet slower in others. It also documents why a universally “always‑faster and always‑correct” MOV is not achievable in this runtime, using both the newly added f3 repro and the Payor repro.

1) Problem recap

The bounty requires that MOV and DUP encodings produce the same output under -C1, and that MOV finishes under 50k interactions on the provided program. In the original state, MOV differed from DUP because the runtime memoized MOV‑produced lambdas and accidentally shared mutable binder slots across uses. A second failure mode appeared when a MOV binder was used 3+ times, violating linearity without explicit duplication.

Two minimal repros make the issues concrete:

  • Payor repro: reduction can erase branch structure and force multiple independent uses into a single residual computation. If a MOV‑produced lambda is memoized, those uses share a mutable binder slot, and later applications can overwrite earlier bindings. This changes semantics.
  • f3 repro: a MOV binder is effectively used three times through nested duplication. Without explicit duplication, the runtime reuses a single MOV slot multiple times, violating linearity and creating aliasing.

2) Root cause

In HVM4’s C runtime, application performs in‑place substitution through binder slots / SUB cells (via heap_subst_var). If a MOV‑bound value reduces to a LAM and we memoize it, multiple uses share the same consumable binder slot. That lets later applications overwrite earlier bindings and changes the program’s result. This is the root cause of the Payor mismatch. Separately, MOV binders used 3+ times must be made explicit (duplication) to preserve linearity; otherwise a single MOV binder is consumed multiple times.

3) Fixes present in the current code

The current code fixes both failure modes without inventing new semantics. These are the changes intended for PR:

3.1 SAFE_MOV: disable MOV‑LAM memoization by default (MOV‑LAM only)

Files: hvm4_static/clang/wnf/mov_lam.c, hvm4_static/clang/hvm4.c, hvm4_static/clang/main.c

What changed: when a MOV value reduces to a LAM, the runtime does not memoize the MOV expr slot (no heap_subst_var on the MOV slot). Each use gets a fresh lambda wrapper instead of sharing a mutable binder slot.

Why: prevents multiple applications from clobbering one shared binder cell.

Scope note: SAFE_MOV only affects MOV‑LAM memoization. MOV‑NOD / MOV‑SUP / MOV‑RED still memoize normally.

Impact: correctness for Payor; small overhead in lambda‑heavy MOV paths because each use rebuilds the lambda wrapper.

Toggle: set HVM4_SAFE_MOV=0 to enable legacy memoization. Default is safe (memoization off for MOV‑LAM).

3.2 Auto‑dup for MOV uses > 2 (parser‑level)

File: hvm4_static/clang/parse/term/mov.c

What changed: if a MOV binder is used more than twice, parse_auto_dup(..., BJM, 0) rewrites the body to make extra uses explicit. The MOV value is consumed once and the extra uses are routed through DUPs.

Why BJM is correct here: at parse time, MOV‑bound variables are represented as BJM (see hvm4_static/clang/parse/term/var.c, MOV binder path). parse_auto_dup operates on the parser’s de‑Bruijn representation (BJV/BJ0/BJ1/BJM), not runtime GOTs.

Why: preserves linearity under nested duplication and fixes the f3 repro.

Impact: correctness for 3+ uses; small compile‑time rewrite and extra DUPs at runtime. This is required for semantics and is not optional.

3.3 Minimal MOV/DUP dispatch (no special MOV‑under‑DUP fast paths)

File: hvm4_static/clang/wnf/_.c

What changed: WNF dispatch stays aligned with upstream; no custom MOV‑under‑DUP or DUP/MOV fast paths. This avoids adding administrative interactions that were inflating counts without correctness benefit.

3.4 Safe‑atom MOV fast paths

Files: hvm4_static/clang/wnf/mov_nod.c, hvm4_static/clang/wnf/mov_sup.c

What changed: when all fields are immutable atoms (e.g., NUM, NAM, ERA, quoted vars), skip GOT allocation and directly reconstruct the node. In mov_nod, this is guarded by ari <= 16 to keep a small stack buffer.

Why: avoid needless GOT indirections for trivial data.

Impact: small perf win on atom‑heavy values; no semantic change.

4) Why MOV can be faster (and when)

MOV can reduce interactions when it avoids explicit duplication and avoids propagating duplication through large structures. Typical wins:

  • Branch‑local uses: when a value is consumed in each branch once and the runtime can preserve branch separation, MOV avoids building a DUP chain and avoids the associated admin work.
  • Value is large and structured: DUP causes propagation through structure; MOV can share the value and expand only when needed.
  • Atom‑heavy values: the safe‑atom fast path avoids GOT allocations altogether.

This is why the bounty program drops from ~97k interactions in the DUP encoding to ~24.8k in the MOV encoding.

5) Why MOV can be slower (and when)

MOV adds administrative overhead even when it is correct:

  1. Extra GOT work: MOV expands values by creating GOTs and redirecting through them. If a value is used only once, those GOTs are pure overhead.
  2. SAFE_MOV and LAMs: correctness requires producing a fresh lambda wrapper for each use of a MOV‑produced lambda. That can be more work than a DUP that propagates once.
  3. MOV interaction overhead: MOV‑NOD / MOV‑SUP / MOV‑RED / MOV‑DRY create extra heap traffic and indirections that can dominate small programs.

The Payor repro is a concrete case where MOV is correct but slower (22 vs 17 interactions). So “MOV is never slower” is not generally true without additional restrictions.

6) Why an “always‑faster and always‑correct MOV” is not achievable (in this runtime)

This is the conceptual boundary for the current MOV design (memoized sharing of potentially consumable structures like LAM). The key fact is that “used once per branch” is not a reduction invariant. During reduction, branch structure can vanish (e.g., same‑label SUP annihilation), collapsing the computation into a single residual term that simultaneously needs multiple branch‑indexed components. In such cases, sharing a single mutable binder slot becomes incorrect: the second use can overwrite the first. Correctness then forces duplication. Once duplication is forced, MOV cannot be universally faster than explicit DUP in this runtime without stronger primitives.

Both repros illustrate this:

  • Payor repro: reduction collapses branch structure, forcing two independent uses into one residual computation. Any memoized shared lambda slot becomes incorrect; correctness requires separation (duplication).
  • f3 repro: nested duplication yields 3 uses of a MOV binder. Without explicit duplication, linearity is violated. The runtime must introduce duplication to preserve semantics.

Conclusion: a “perfect” MOV that is always faster and always correct is not possible in this runtime without changing the calculus or adding new, stronger primitives.

7) Evidence (commands and outputs)

All commands run using hvm4_static/clang/main built with gcc -O2.

Correctness: f3 repro

./hvm4_static/clang/main hvm4_static/test/mov_bounty_f3.hvm4 -C1
./hvm4_static/clang/main hvm4_static/test/mov_bounty_f3_dup.hvm4 -C1

Output (both):

#O{#I{#E{}}}

Correctness: Payor repro

./hvm4_static/clang/main hvm4_static/test/payor_repro.hvm4 -C1
./hvm4_static/clang/main hvm4_static/test/payor_repro_dup.hvm4 -C1

Output (both):

λa.a

Unsafe mode reproduces the bug

With HVM4_SAFE_MOV=0, MOV‑LAM memoization is re‑enabled and the Payor repro diverges from the DUP result (unsafe behavior). This demonstrates that memoizing MOV‑produced lambdas is unsound in this runtime.

HVM4_SAFE_MOV=0 ./hvm4_static/clang/main hvm4_static/test/payor_repro.hvm4 -C1

Output:

λa.λb.λc.c

Performance: Full bounty program (MOV)

./hvm4_static/clang/main hvm4_static/test/mov_bounty.hvm4 -C1 -s

Output:

#O{#O{#O{#O{#I{#O{#E{}}}}}}}  #24805
- Itrs: 24805 interactions

Status: correct output, meets <50k target for the bounty program and the added repro/test set.

Control: Full bounty program (DUP)

./hvm4_static/clang/main hvm4_static/test/mov_bounty_dup.hvm4 -C1 -s

Output:

#O{#O{#O{#O{#I{#O{#E{}}}}}}}  #97885
- Itrs: 97885 interactions

Example where MOV is correct but slower

./mov_vs_dup_slow.sh

This runs payor_repro (MOV) and payor_repro_dup (DUP) with -s. MOV is correct but shows more interactions (22 vs 17).

MOV‑LAM microbench (SAFE_MOV on/off)

Files:

  • hvm4_static/test/mov_lam_bench.hvm4
  • hvm4_static/test/mov_lam_bench_dup.hvm4
  • mov_lam_bench.sh

Run:

./mov_lam_bench.sh

Observed:

  • MOV (SAFE_MOV=1 default): 366 interactions
  • MOV (SAFE_MOV=0, memoize MOV‑LAM): 363 interactions
  • DUP: 237 interactions
    Conclusion: memoizing MOV‑LAM yields only a tiny improvement on this microbench and still trails DUP.

MOV‑LAM erase microbench

Files:

  • hvm4_static/test/mov_lam_erase_bench.hvm4
  • hvm4_static/test/mov_lam_erase_bench_dup.hvm4

Observed:

  • MOV (SAFE_MOV=1 default): 16 interactions
  • MOV (SAFE_MOV=0, memoize MOV‑LAM): 16 interactions
  • DUP: 17 interactions
    Conclusion: memoization does not improve this case (already minimal).

8) Files changed

  • hvm4_static/clang/hvm4.c — SAFE_MOV flag and env override.
  • hvm4_static/clang/main.c — SAFE_MOV initialization.
  • hvm4_static/clang/wnf/mov_lam.c — skip MOV‑LAM memoization when SAFE_MOV is enabled.
  • hvm4_static/clang/parse/term/mov.c — auto‑dup for uses > 2.
  • hvm4_static/clang/wnf/_.c — dispatch aligned with upstream (no MOV‑under‑DUP special paths).
  • hvm4_static/clang/wnf/mov_nod.c, hvm4_static/clang/wnf/mov_sup.c — safe‑atom fast paths.
  • Test additions (bounty and repros): hvm4_static/test/mov_bounty.hvm4, hvm4_static/test/mov_bounty_dup.hvm4, hvm4_static/test/mov_bounty_f3.hvm4, hvm4_static/test/mov_bounty_f3_dup.hvm4, hvm4_static/test/payor_repro.hvm4, hvm4_static/test/payor_repro_dup.hvm4.
  • Additional test coverage and minimizers: hvm4_static/test/mov_bounty_min.hvm4, hvm4_static/test/mov_bounty_min_dup.hvm4, hvm4_static/test/mov_bounty_small.hvm4, hvm4_static/test/mov_bounty_small_dup.hvm4, hvm4_static/test/mov_bounty_tiny.hvm4, hvm4_static/test/mov_bounty_tiny_dup.hvm4, hvm4_static/test/mov_dup_func.hvm4, hvm4_static/test/mov_dup_nested.hvm4, hvm4_static/test/mov_dup_test.hvm4, hvm4_static/test/mov_erase_test.hvm4, hvm4_static/test/mov_erase_test_dup.hvm4, hvm4_static/test/mov_lam_bench.hvm4, hvm4_static/test/mov_lam_bench_dup.hvm4, hvm4_static/test/mov_lam_dupvar.hvm4, hvm4_static/test/mov_lam_erase_bench.hvm4, hvm4_static/test/mov_lam_erase_bench_dup.hvm4, hvm4_static/test/mov_lam_test.hvm4.
  • Scripts: mov_vs_dup_slow.sh, mov_lam_bench.sh.

9) Test suite run

./hvm4_static/test/_all_.sh

Outcome: PASS. The script reports clang: command not found but continues using the existing clang/main binary.

10) Letter vs spirit of the bounty

  • Letter: satisfied. MOV and DUP encodings match under -C1 for the bounty program and the added repro/test set, and MOV completes the bounty program in 24,805 interactions (<50k).
  • Spirit: satisfied in the sense that MOV remains a net win on the bounty program, while correctness comes first. MOV can still be slower on some small programs (the Payor repro), which is an inherent limitation of this runtime’s representation and of the need to avoid unsafe sharing. A fuller explanation of why a “perfect” MOV cannot exist is given above, though it rests in part on Payor’s example.

11) Remaining risks / follow‑ups

  • MOV can be slower in some cases; this is expected and explained above.
  • A universal “never slower” guarantee is not possible without changing the calculus or adding stronger primitives.
  • Further performance work should focus on safe, local optimizations (e.g., more atom fast paths) rather than memoizing MOV‑LAMs.

@HaileyStorm (Author)

I'm hoping the silence so far is a tentative good sign :)

(Other bounty PRs quickly had a comment showing they were invalid.)


HaileyStorm commented Jan 7, 2026

Also, I just realized I had left the mov_nod, mov_sup, and mov_red memoization guards in; I have now committed that change. There was no impact on the validity or interaction count of any of the included tests, but this should speed up some scenarios (and I don't think it will introduce problems the way mov_lam memoization does).

@Lorenzobattistela (Collaborator)

We're going to review that shortly, impressive work anyways!

@Lorenzobattistela (Collaborator)

Hey @HaileyStorm, take a look at the following snippet:

@O = λp. λo. λi. λe. o(p)
@I = λp. λo. λi. λe. i(p)
@E =     λo. λi. λe. e
@N =     λc. λs. λn. n

@view = λxs.
  ! O = λp. #O{@view(p)}
  ! I = λp. #I{@view(p)}
  ! E = #E
  xs(O, I, E)

@rep2 = λ&f. λx. f(f(x))
@rep3 = λ&f. λx. f(f(f(x)))

@insert_mov = λn.
  ! O = &{}
  ! I = (λ&p. λxs. λ&o. λi. λe.
    % f = o
    ! O = λxs. f(@insert_mov(p, xs))
    ! I = λxs. i(@insert_mov(p, xs))
    ! E = f(@insert_mov(p, @N))
    xs(O, I, E))
  ! E = λxs. λo. λ&i. λe.
    % k = i
    ! O = λxs. k(xs)
    ! I = λxs. k(xs)
    ! E = k(@N)
    xs(O, I, E)
  n(O, I, E)

@insert_dup = λn.
  ! O = &{}
  ! I = (λ&p. λxs. λ&o. λi. λe.
    ! O = λxs. o(@insert_dup(p, xs))
    ! I = λxs. i(@insert_dup(p, xs))
    ! E = o(@insert_dup(p, @N))
    xs(O, I, E))
  ! E = λxs. λo. λ&i. λe.
    ! O = λxs. i(xs)
    ! I = λxs. i(xs)
    ! E = i(@N)
    xs(O, I, E)
  n(O, I, E)

@ins_mov = @insert_mov(@I(@I(@E)))
@ins_dup = @insert_dup(@I(@I(@E)))

@main = #T{
  @view(@rep2(@ins_mov, @O(@E))),  // OK:   #O{#O{#I{#E{}}}}
  @view(@rep3(@ins_mov, @O(@E))),  // FAIL: #O{λa.λb.a(...)}
  @view(@rep2(@ins_dup, @O(@E))),  // OK:   #O{#O{#I{#E{}}}}
  @view(@rep3(@ins_dup, @O(@E)))   // OK:   #O{#O{#I{#E{}}}}
}

I ran this on your branch, and got:

> ./main bug.hvm4 -s
#T{#O{#O{#I{#E{}}}},#O{λa.λb.A},#O{#O{#I{#E{}}}},#O{#O{#I{#E{}}}}};%A=A₀(λc.λd.λe.e);!A&F__e=a;
- Itrs: 1052 interactions
- Time: 0.000 seconds
- Perf: 10.85 M interactions/s

Note that the second result should be the same as all the others, but instead it is #O{λa.λb.A}, with pending MOVs. Is this expected? Seems like a bug coming from MOV's sharing location.
