
Add midenc-fuzza differential fuzzing harness #1087

Merged
greenhat merged 12 commits into next from mcfa-cc
May 5, 2026

Conversation

@greenhat
Contributor

@greenhat greenhat commented Apr 24, 2026

This is a PoC of a test suite in which an agent generates Rust programs for differential testing, using the compiler's code coverage as feedback.

@greenhat greenhat changed the title from "Add midenc-fuzza differential fuzzing harness." to "Add midenc-fuzza differential fuzzing harness" Apr 24, 2026
//! plus any helpers it needs. The harness prepends a fixed header
//! (`#![no_std]` + `#[panic_handler]`) before writing the case as `src/lib.rs`
//! of a generated cargo project, builds it twice — natively as a host `cdylib`
//! and via `cargo-miden` to a MASM package — and compares outputs across
Collaborator

Isn't this redundant with our proptest-based tests (which have the added benefit of shrinking to find minimal repros)? Or is it meant more as a proof-of-concept, and the intended use would be for programs where proptest isn't as well suited vs a traditional fuzzer?

Contributor Author

It's a PoC of using an agent to generate Rust programs, with the compiler's code coverage as feedback. I forgot to write a PR description.

let entry: libloading::Symbol<EntryFn> = unsafe { lib.get(b"entrypoint\0") }
.unwrap_or_else(|e| panic!("missing `entrypoint` in {}: {e}", dylib_path.display()));

// Proptest: 16 cases, shrinking disabled — the whole case file IS the
Collaborator

I'm not sure that's actually true? The primary purpose of shrinking is to find the simplest input which triggers an issue, not to find the smallest program that reproduces the issue. As far as I can tell here, the case file is just the program, not the minimal inputs.

Granted, if your only inputs to a program are just plain u32 integer values, shrinking provides less value - but it can still reduce the failing input to something like (u32::MAX, 0, 0), instead of a random set of values like (u32::MAX, 123439, 8631234), which can often highlight what input is the problem (or at least remove some randomness from the inputs that obscure things unnecessarily).
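To illustrate the kind of reduction described above, here is a minimal hand-rolled sketch of greedy shrinking over a `u32` triple. It is not proptest's algorithm; `shrink_triple` and the failure predicate are hypothetical names, and the result is simpler than the starting input but not guaranteed to be minimal.

```rust
/// Greedily simplify a failing `[a, b, c]` input while it keeps failing.
/// Hypothetical helper for illustration only; proptest's real shrinking
/// strategies are more elaborate than this.
fn shrink_triple(mut input: [u32; 3], fails: impl Fn([u32; 3]) -> bool) -> [u32; 3] {
    assert!(fails(input), "shrinking must start from a failing input");
    loop {
        let mut progressed = false;
        for i in 0..3 {
            let x = input[i];
            // Try the simplest value first (0), then halving.
            for candidate in [0, x / 2] {
                if candidate >= x {
                    continue; // not actually simpler
                }
                let mut trial = input;
                trial[i] = candidate;
                if fails(trial) {
                    input = trial;
                    progressed = true;
                    break;
                }
            }
        }
        if !progressed {
            return input;
        }
    }
}
```

With a predicate that fails only when the first component is `u32::MAX`, the other two components collapse to zero, which makes the relevant input obvious at a glance.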

Contributor Author

The comment is a bit off. My reason is two-fold:

  1. Shrinking generates a lot of noise that muddies the feedback for the agent.
  2. I want to capture the exact inputs that triggered the miscompilation. Shrunk inputs might take a different code path (and possibly hit a different miscompilation).

panic = "abort"

[profile.dev]
panic = "abort"
Collaborator

If you compile a cdylib with panic = "abort" and then dynamically-load it into the current process, and execution hits a panic, you'll crash the whole process without having a way to catch the panic (because it will abort, rather than unwind as expected).
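As a rough sketch of what catching a panic looks like when the panicking code is built with the default `panic = "unwind"`: the names `entrypoint` and `run_guarded` below are hypothetical stand-ins, and the example stays inside one crate rather than going through `libloading`. A real dlopened entrypoint would, I believe, also need an ABI that permits unwinding (such as `extern "C-unwind"`) for this to work across the library boundary.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a case's entrypoint: panics on division by zero.
fn entrypoint(a: u32, b: u32) -> u32 {
    a / b
}

/// Run `f(a, b)` and turn an unwinding panic into `None`.
/// This only helps when the panicking code was compiled with
/// `panic = "unwind"`; under `panic = "abort"` the process dies first.
fn run_guarded(f: impl Fn(u32, u32) -> u32, a: u32, b: u32) -> Option<u32> {
    panic::catch_unwind(AssertUnwindSafe(|| f(a, b))).ok()
}
```

Calling `run_guarded(entrypoint, 1, 0)` yields `None` instead of taking down the test process, which is the behavior that `panic = "abort"` rules out.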

Contributor Author

Thank you! I hadn't thought about that.

@greenhat
Contributor Author

I'm pretty happy with how it turned out. During the tuning runs, it discovered #1093, #1094 and #1095.
Instructions for launching an agent are in tests/fuzza/README.md.

@greenhat greenhat requested review from bitwalker and mooori April 30, 2026 11:02
Collaborator

@bitwalker bitwalker left a comment

Looks good! My only request is that we move this under tools (I want tests to contain only actual test sources, not tooling used by the test suite). We still have a pending task to clean up the organization of tests in general, but I mostly want to avoid adding new stuff under tests unless that is the only place that makes sense for it.

I'm marking this approved contingent on that move, so I don't hold things up while I'm offline

Collaborator

@bitwalker bitwalker left a comment

As an aside - can we have the test cases emitted to the tests directory under some appropriate test crate? Those don't feel like they belong under the midenc-fuzza tool itself.

Ideally, midenc-fuzza can be used to generate the cases, and then they would be executed by our standard test suite, without needing midenc-fuzza to run them. In other words, midenc-fuzza solves the problem of identifying interesting test cases, generating them, and validating whether they capture a real regression or improve test coverage; the generated test cases then get merged into our standard test suite.

AIUI, that totally fits within the intended usage of this new tool, but let me know if I'm missing anything

@greenhat
Contributor Author

greenhat commented May 1, 2026

Good point! I'll split it.

@greenhat greenhat marked this pull request as draft May 1, 2026 10:51
greenhat and others added 11 commits May 5, 2026 10:33
Adds a new `tests/fuzza` crate that compiles each case's Rust source twice
— natively as a host cdylib (dlopened via `libloading`) and via
`cargo-miden` to a MASM package — and compares `entrypoint(u32, u32) -> u32`
outputs over 16 random input pairs per case.
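
The comparison loop described above could look roughly like the following sketch. It compares two in-process closures rather than a dlopened cdylib and a MASM package, and `first_divergence` and the xorshift generator are hypothetical stand-ins, not the harness's actual code.

```rust
/// Minimal xorshift32 generator so the sketch needs no external crates.
fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

/// Feed 16 pseudo-random `(u32, u32)` pairs to both implementations and
/// return the first pair on which their outputs differ, if any.
fn first_divergence(
    reference: impl Fn(u32, u32) -> u32,
    candidate: impl Fn(u32, u32) -> u32,
    seed: u32,
) -> Option<(u32, u32)> {
    // xorshift gets stuck at zero, so substitute a fixed non-zero seed.
    let mut state = if seed == 0 { 0x9E37_79B9 } else { seed };
    for _ in 0..16 {
        let a = xorshift32(&mut state);
        let b = xorshift32(&mut state);
        if reference(a, b) != candidate(a, b) {
            return Some((a, b));
        }
    }
    None
}
```

Returning the exact diverging pair keeps the inputs that triggered the mismatch available for a bug report.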

Seed cases: `add`, `sub`, `xor` (green); `muladd` is `#[ignore]`d because
the harness surfaced a real native/MASM divergence on `wrapping_mul` that
needs a separate investigation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Each case used to repeat `#![no_std]` plus a `#[panic_handler]` and
`#[alloc_error_handler]`. The harness now prepends a fixed header with
`#![no_std]` + panic handler before writing the case as `src/lib.rs` of
the generated cargo project, so a case contains only the `entrypoint`
function (and any helpers).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds `cargo make fuzza-cov`, which runs the midenc-fuzza tests under
cargo-llvm-cov, writes a raw JSON report and an HTML browser view under
`target/fuzza-coverage/`, and reduces the JSON with
`tests/fuzza/cov.py` into a fuzza-oriented Markdown summary that
highlights the compiler functions most worth growing coverage on
(filtered to compiler crates, names demangled via `rustfilt`, boring
trait impls dropped). Also adds `fuzza-cov-clean` to reset the
accumulated profile data.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- `fuzza-cov-step` task: re-runs the fuzza tests under cargo-llvm-cov
  with `--no-clean` so the profile data accumulates across iterations,
  rotates the previous report JSON into `report.prev.json`, and tolerates
  failing cases so a divergence or compile error still produces a report.
- `cov.py --prev <json>`: adds a "Delta since previous run" section to
  the Markdown summary, listing newly-exercised functions and functions
  that gained regions — the feedback signal the case-generation agent
  reads each iteration.
- `fuzza-cov-clean`: wipes both profile data and `target/fuzza-coverage/`.
- `tests/fuzza/AGENT-PROMPT.md`: copy-paste prompt template for launching
  a coverage-guided case-generation agent (targets a specific compiler
  area, stops on a plateau instead of an arbitrary % target).
- Adds a `branchy` seed case exercising if/else and u32 div/rem, which
  on its own lifts compiler coverage from ~20% to ~27% of regions.
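
The "Delta since previous run" computation is done by `cov.py`; the following is only a Rust sketch of the same idea, assuming each report can be reduced to a map from function name to covered-region count. The `Coverage` and `Delta` types and the function names in the usage are hypothetical.

```rust
use std::collections::BTreeMap;

/// Covered-region count per function name (assumed report shape).
type Coverage = BTreeMap<String, u64>;

#[derive(Debug, PartialEq)]
struct Delta {
    /// Functions with no covered regions before and at least one now.
    newly_exercised: Vec<String>,
    /// Already-covered functions and how many regions they gained.
    gained_regions: Vec<(String, u64)>,
}

fn coverage_delta(prev: &Coverage, cur: &Coverage) -> Delta {
    let mut delta = Delta { newly_exercised: Vec::new(), gained_regions: Vec::new() };
    for (name, &now) in cur {
        // A function missing from the previous report counts as uncovered.
        let before = prev.get(name).copied().unwrap_or(0);
        if before == 0 && now > 0 {
            delta.newly_exercised.push(name.clone());
        } else if now > before {
            delta.gained_regions.push((name.clone(), now - before));
        }
    }
    delta
}
```

An empty delta across several iterations is one way to recognize the plateau on which the agent is told to stop.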

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…g note to agent prompt

A coverage smoke test surfaced two issues:

- `build_host_cdylib` spawned `cargo build` for the case project without
  clearing `CARGO_TARGET_DIR`. Under `cargo llvm-cov` the parent process
  has it set to `target/llvm-cov-target/`, so the host artifact landed
  there instead of the per-project `target/release/` we look in. Now the
  spawn does `.env_remove("CARGO_TARGET_DIR")`.

- AGENT-PROMPT.md told the agent to "skim source at File:line" without
  warning about the wasm-frontend → HIR-op → emitter routing layer. The
  agent picked `OpEmitter::cast` for its size; Rust `as` casts route via
  HIR `trunc`/`zext`/`sext`, never reaching `cast`. Step 2 of the loop
  now spells this out and points the agent at
  `frontend/wasm/src/code_translator/` for chain sanity checks.
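
The `CARGO_TARGET_DIR` fix in the first item can be sketched with `std::process::Command` as below. The function name `host_build_command` and the exact arguments are assumptions for illustration; only `.env_remove("CARGO_TARGET_DIR")` is taken from the commit message.

```rust
use std::path::Path;
use std::process::Command;

/// Build the `cargo build` invocation for a generated case project.
/// Hypothetical helper: the real `build_host_cdylib` may pass other flags.
fn host_build_command(project_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .arg("--release")
        .current_dir(project_dir)
        // Do not inherit the parent's target dir (e.g. the one set by
        // `cargo llvm-cov`), so artifacts land under the project's own `target/`.
        .env_remove("CARGO_TARGET_DIR");
    cmd
}
```

The command is only constructed here, not spawned, so the sketch can be inspected without running a build.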

Adds the `widening` case the smoke test produced; it now passes under
`fuzza-cov-step` (was `#[ignore]`d due to the harness bug above).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A second smoke run of the fuzza agent prompt produced `case_bitops.rs`,
exercising u32 bitwise / shift / rotate / comparison emitter arms in
`codegen/masm/src/emit/binary.rs`. It surfaces another likely native/MASM
divergence (inputs (4146962468, 1369714330) trigger a MASM `eqz`
assertion at cycle 92), so the test is `#[ignore]`d pending root-cause
investigation; compile-side coverage from the case still counts and
covered ~9 previously-untouched emitter functions in `binary.rs`.

Two minor refinements to AGENT-PROMPT.md from the run:
- Note that a failing case still contributes compile-side coverage so the
  agent doesn't think a divergence wastes the iteration.
- Note that integer literals in Rust source do not generally reach `_imm`
  emitter variants — those require HIR-level canonicalization, not raw
  user code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous host-build artifact lookup walked
`<project>/target/release/lib<name>.{so,dylib,dll}`, which broke under
some CI conditions where the artifact cargo emitted didn't end up at
that exact path (e.g. CARGO_TARGET_DIR redirection from `cargo make`
that the env_remove didn't fully cover, or platform/cargo path quirks).

Switch `build_host_cdylib` to spawn `cargo build` with
`--message-format=json-render-diagnostics`, parse the stream with
`cargo_metadata::Message`, and pick up the cdylib artifact's filename
directly from the build output. This is the canonical way and is robust
to platform naming, target-dir overrides, and cargo internals.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…self

The README documents the harness, how to run it, how to add cases, and the
coverage-guided workflow — including how to fill in and use AGENT-PROMPT.md.
The prompt file now contains only the prompt, with `[area]` defined in one
place at the top.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The fuzza harness was missing Rust cases that exercise structured loop, switch, and nested branch paths in the control-flow pipeline.

Add five focused no_std cases and wire them into the fuzza test list. Four are enabled coverage cases; the unreachable-edge case is kept ignored because it exposes a native/MASM divergence for inputs (363814857, 995348134) while still documenting the compiler path it reaches.

Verified with cargo make fuzza-cov-step during iteration, then cargo make test, cargo make clippy, and cargo make format-rust.
Move the differential fuzzing harness and its 12 cases from the
standalone `midenc-fuzza` crate into
`tests/integration/src/end_to_end/differential/`, where they sit
alongside the other Rust→MASM end-to-end tests. The coverage-driven
agent tooling (`AGENT-PROMPT.md`, `README.md`, `cov.py`) moves to
`tools/fuzza-agent/` so the test code and the agent workflow no longer
share a crate. The `cargo make fuzza-cov*` tasks keep their names but
now invoke the integration suite filtered by `differential`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@greenhat greenhat marked this pull request as ready for review May 5, 2026 09:03
@greenhat greenhat merged commit 59630d5 into next May 5, 2026
15 checks passed
@greenhat greenhat deleted the mcfa-cc branch May 5, 2026 10:40

2 participants