midenc-fuzza differential fuzzing harness
//! plus any helpers it needs. The harness prepends a fixed header
//! (`#![no_std]` + `#[panic_handler]`) before writing the case as `src/lib.rs`
//! of a generated cargo project, builds it twice — natively as a host `cdylib`
//! and via `cargo-miden` to a MASM package — and compares outputs across
Isn't this redundant with our proptest-based tests (which have the added benefit of shrinking to find minimal repros)? Or is it meant more as a proof-of-concept, and the intended use would be for programs where proptest isn't as well suited vs a traditional fuzzer?
It's a PoC of using an agent to generate Rust programs using the compiler's code coverage as feedback. I forgot to add a PR description.
let entry: libloading::Symbol<EntryFn> = unsafe { lib.get(b"entrypoint\0") }
    .unwrap_or_else(|e| panic!("missing `entrypoint` in {}: {e}", dylib_path.display()));

// Proptest: 16 cases, shrinking disabled — the whole case file IS the
I'm not sure that's actually true? The primary purpose of shrinking is to find the simplest input which triggers an issue, not to find the smallest program that reproduces the issue. As far as I can tell here, the case file is just the program, not the minimal inputs.
Granted, if your only inputs to a program are just plain u32 integer values, shrinking provides less value - but it can still reduce the failing input to something like (u32::MAX, 0, 0), instead of a random set of values like (u32::MAX, 123439, 8631234), which can often highlight what input is the problem (or at least remove some randomness from the inputs that obscure things unnecessarily).
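To make the `(u32::MAX, 0, 0)` example concrete, here is a minimal std-only sketch of the idea behind shrinking. It is not proptest's actual algorithm, and `shrink_to_zero` and the `fails` predicate are hypothetical names. It zeroes one component at a time and keeps the change only if the failure still reproduces.

```rust
/// Greedily replace each component with 0 while the failure still reproduces.
/// `fails` stands in for "running the case on this input shows the bug".
fn shrink_to_zero(mut input: [u32; 3], fails: impl Fn([u32; 3]) -> bool) -> [u32; 3] {
    for i in 0..input.len() {
        let mut candidate = input;
        candidate[i] = 0;
        // Keep the simpler input only if it still triggers the failure.
        if fails(candidate) {
            input = candidate;
        }
    }
    input
}
```

With a predicate that fails whenever the first component is `u32::MAX`, calling `shrink_to_zero([u32::MAX, 123439, 8631234], fails)` keeps the first component and zeroes the other two, which is the kind of reduction that points at the offending input.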
The comment is a bit off. My reason is two-fold:
- The shrinking generates a lot of noise that messes up the feedback for the agent.
- I want to capture the exact inputs that triggered the miscompilation. Shrunk inputs might trigger another code path (another miscompilation?).
panic = "abort"

[profile.dev]
panic = "abort"
If you compile a cdylib with panic = "abort" and then dynamically-load it into the current process, and execution hits a panic, you'll crash the whole process without having a way to catch the panic (because it will abort, rather than unwind as expected).
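As a rough illustration of what "catching the panic" would look like on the host side, the sketch below wraps a call in `std::panic::catch_unwind`. `call_guarded` is a hypothetical helper, not something in the PR. It only helps when the panicking code is built with the unwind strategy; whether it applies directly to the generated cases depends on their `#[panic_handler]` and ABI, which this thread does not show.

```rust
use std::panic;

/// Run `f` and turn a panic into `None`.
/// This relies on `panic = "unwind"`: under `panic = "abort"` the process
/// terminates at the panic and `catch_unwind` never gets to return.
fn call_guarded(f: impl FnOnce() -> u32 + panic::UnwindSafe) -> Option<u32> {
    panic::catch_unwind(f).ok()
}
```

A call such as `call_guarded(|| 7)` yields `Some(7)`, while a closure that panics yields `None` instead of taking down the test process.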
Thank you! I hadn't thought about that.
bitwalker
left a comment
Looks good! My only request is that we move this under tools (I want tests to contain only actual test sources, not tooling used by the test suite). We still have a pending task to clean up the organization of tests in general, but I mostly want to avoid adding new stuff under tests unless that is the only place that makes sense for it.
I'm marking this approved contingent on that move, so I don't hold things up while I'm offline
bitwalker
left a comment
As an aside - can we have the test cases emitted to the tests directory under some appropriate test crate? Those don't feel like they belong under the midenc-fuzza tool itself.
Ideally, midenc-fuzza can be used to generate the cases, and then they would be executed by our standard test suite, without the need to use midenc-fuzza for that. In other words, midenc-fuzza solves the problem of identifying interesting test cases, generating them, validating whether they capture a real regression or improve test coverage, and then those generated test cases get merged into our standard test suite.
AIUI, that totally fits within the intended usage of this new tool, but let me know if I'm missing anything
Good point! I'll split it.
Adds a new `tests/fuzza` crate that compiles each case's Rust source twice (natively as a host cdylib, dlopened via `libloading`, and via `cargo-miden` to a MASM package) and compares `entrypoint(u32, u32) -> u32` outputs over 16 random input pairs per case. Seed cases: `add`, `sub`, `xor` (green); `muladd` is `#[ignore]`d because the harness surfaced a real native/MASM divergence on `wrapping_mul` that needs a separate investigation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
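The comparison step described in this commit can be sketched as follows. This is a simplified stand-in under stated assumptions: both sides are plain Rust function pointers (the real harness calls a dlopened symbol on one side and executes a MASM package on the other), inputs come from a small xorshift generator rather than proptest, and `CaseFn`, `xorshift32`, and `first_divergence` are hypothetical names.

```rust
/// Simplified stand-in for the `entrypoint(u32, u32) -> u32` signature.
type CaseFn = fn(u32, u32) -> u32;

/// Tiny xorshift32 generator so the sketch needs no external crates.
fn xorshift32(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}

/// Compare two implementations on `cases` input pairs and return the first
/// pair on which they disagree, together with both outputs.
fn first_divergence(
    native: CaseFn,
    other: CaseFn,
    cases: usize,
    seed: u32,
) -> Option<((u32, u32), u32, u32)> {
    let mut state = seed.max(1); // xorshift must not start at 0
    (0..cases).find_map(|_| {
        let (a, b) = (xorshift32(&mut state), xorshift32(&mut state));
        let (x, y) = (native(a, b), other(a, b));
        (x != y).then_some(((a, b), x, y))
    })
}
```

Returning the exact diverging pair, rather than a boolean, matches the goal stated earlier in the thread of capturing the inputs that triggered a miscompilation.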
Each case used to repeat `#![no_std]` plus a `#[panic_handler]` and `#[alloc_error_handler]`. The harness now prepends a fixed header with `#![no_std]` + panic handler before writing the case as `src/lib.rs` of the generated cargo project, so a case contains only the `entrypoint` function (and any helpers).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
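A minimal sketch of the header-prepending step, assuming the header is a fixed string. The contents of `CASE_HEADER` and the name `render_lib_rs` are illustrative guesses; the actual header in the harness may differ.

```rust
/// Hypothetical fixed header; the real one lives in the harness.
const CASE_HEADER: &str = "\
#![no_std]

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
";

/// Build the `src/lib.rs` contents for a generated case project:
/// the fixed header, a blank line, then the case source unchanged.
fn render_lib_rs(case_source: &str) -> String {
    format!("{}\n{}", CASE_HEADER, case_source)
}
```

Keeping the header in one place means a case file holds only `entrypoint` and its helpers, as the commit message describes.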
Adds `cargo make fuzza-cov`, which runs the midenc-fuzza tests under cargo-llvm-cov, writes a raw JSON report and an HTML browser view under `target/fuzza-coverage/`, and reduces the JSON with `tests/fuzza/cov.py` into a fuzza-oriented Markdown summary that highlights the compiler functions most worth growing coverage on (filtered to compiler crates, names demangled via `rustfilt`, boring trait impls dropped). Also adds `fuzza-cov-clean` to reset the accumulated profile data.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- `fuzza-cov-step` task: re-runs the fuzza tests under cargo-llvm-cov with `--no-clean` so the profile data accumulates across iterations, rotates the previous report JSON into `report.prev.json`, and tolerates failing cases so a divergence or compile error still produces a report.
- `cov.py --prev <json>`: adds a "Delta since previous run" section to the Markdown summary, listing newly-exercised functions and functions that gained regions. This is the feedback signal the case-generation agent reads each iteration.
- `fuzza-cov-clean`: wipes both profile data and `target/fuzza-coverage/`.
- `tests/fuzza/AGENT-PROMPT.md`: copy-paste prompt template for launching a coverage-guided case-generation agent (targets a specific compiler area, stops on a plateau instead of an arbitrary % target).
- Adds a `branchy` seed case exercising if/else and u32 div/rem, which on its own lifts compiler coverage from ~20% to ~27% of regions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
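The "Delta since previous run" computation can be sketched like this. The real logic lives in `cov.py` (Python) and works on the cargo-llvm-cov JSON report; the version below is a Rust illustration over an assumed, already-reduced shape (covered-region count per function name), and `Coverage`, `Delta`, and `coverage_delta` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Assumed reduced form of a report: covered-region count per function name.
type Coverage = BTreeMap<String, u64>;

#[derive(Debug, Default, PartialEq)]
struct Delta {
    /// Functions with no covered regions before and at least one now.
    newly_exercised: Vec<String>,
    /// Already-covered functions, with the number of regions they gained.
    gained_regions: Vec<(String, u64)>,
}

fn coverage_delta(prev: &Coverage, curr: &Coverage) -> Delta {
    let mut delta = Delta::default();
    for (name, &now) in curr {
        // A function missing from the previous report counts as uncovered.
        let before = prev.get(name).copied().unwrap_or(0);
        if before == 0 && now > 0 {
            delta.newly_exercised.push(name.clone());
        } else if now > before {
            delta.gained_regions.push((name.clone(), now - before));
        }
    }
    delta
}
```

An empty `Delta` over several consecutive iterations is one plausible way to detect the plateau that the agent prompt uses as its stopping condition.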
…g note to agent prompt
A coverage smoke test surfaced two issues:
- `build_host_cdylib` spawned `cargo build` for the case project without
clearing `CARGO_TARGET_DIR`. Under `cargo llvm-cov` the parent process
has it set to `target/llvm-cov-target/`, so the host artifact landed
there instead of the per-project `target/release/` we look in. Now the
spawn does `.env_remove("CARGO_TARGET_DIR")`.
- AGENT-PROMPT.md told the agent to "skim source at File:line" without
warning about the wasm-frontend → HIR-op → emitter routing layer. The
agent picked `OpEmitter::cast` for its size; Rust `as` casts route via
HIR `trunc`/`zext`/`sext`, never reaching `cast`. Step 2 of the loop
now spells this out and points the agent at
`frontend/wasm/src/code_translator/` for chain sanity checks.
Adds the `widening` case the smoke test produced; it now passes under
`fuzza-cov-step` (was `#[ignore]`d due to the harness bug above).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
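The `CARGO_TARGET_DIR` fix can be sketched with `std::process::Command`. The function names and the `--release` flag are assumptions for illustration (the message above only mentions `target/release/`); the command is prepared but not spawned.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Prepare (but do not spawn) the host build for a generated case project.
fn host_build_command(project_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .arg("--release")
        .current_dir(project_dir)
        // Drop the variable inherited from `cargo llvm-cov` so the artifact
        // lands in the project's own `target/` directory.
        .env_remove("CARGO_TARGET_DIR");
    cmd
}

/// True if `cmd` is configured to remove `key` from the child's environment.
fn removes_env(cmd: &Command, key: &str) -> bool {
    cmd.get_envs()
        .any(|(k, v)| k == OsStr::new(key) && v.is_none())
}
```

`Command::get_envs` reports explicitly removed variables as `(key, None)`, which makes the behavior checkable in a unit test without running cargo.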
A second smoke run of the fuzza agent prompt produced `case_bitops.rs`, exercising u32 bitwise / shift / rotate / comparison emitter arms in `codegen/masm/src/emit/binary.rs`. It surfaces another likely native/MASM divergence (inputs (4146962468, 1369714330) trigger a MASM `eqz` assertion at cycle 92), so the test is `#[ignore]`d pending root-cause investigation; compile-side coverage from the case still counts and covered ~9 previously-untouched emitter functions in `binary.rs`.

Two minor refinements to AGENT-PROMPT.md from the run:
- Note that a failing case still contributes compile-side coverage so the agent doesn't think a divergence wastes the iteration.
- Note that integer literals in Rust source do not generally reach `_imm` emitter variants; those require HIR-level canonicalization, not raw user code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous host-build artifact lookup walked
`<project>/target/release/lib<name>.{so,dylib,dll}`, which broke under
some CI conditions where the artifact cargo emitted didn't end up at
that exact path (e.g. CARGO_TARGET_DIR redirection from `cargo make`
that the env_remove didn't fully cover, or platform/cargo path quirks).
Switch `build_host_cdylib` to spawn `cargo build` with
`--message-format=json-render-diagnostics`, parse the stream with
`cargo_metadata::Message`, and pick up the cdylib artifact's filename
directly from the build output. This is the canonical way and is robust
to platform naming, target-dir overrides, and cargo internals.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…self

The README documents the harness, how to run it, how to add cases, and the coverage-guided workflow, including how to fill in and use AGENT-PROMPT.md. The prompt file now contains only the prompt, with `[area]` defined in one place at the top.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The fuzza harness was missing Rust cases that exercise structured loop, switch, and nested branch paths in the control-flow pipeline. Add five focused no_std cases and wire them into the fuzza test list. Four are enabled coverage cases; the unreachable-edge case is kept ignored because it exposes a native/MASM divergence for inputs (363814857, 995348134) while still documenting the compiler path it reaches. Verified with cargo make fuzza-cov-step during iteration, then cargo make test, cargo make clippy, and cargo make format-rust.
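For a sense of what a control-flow case might look like, here is a hypothetical example combining a bounded loop with a `match` (switch). It is not one of the five cases added in this commit, and it omits the attributes and ABI a real case would need to export `entrypoint` from the cdylib.

```rust
/// Hypothetical case body: a loop of at most 7 iterations whose body
/// switches on the low two bits of the accumulator.
pub fn entrypoint(a: u32, b: u32) -> u32 {
    let mut acc = a;
    let mut i = 0;
    while i < (b & 7) {
        acc = match acc & 3 {
            0 => acc.wrapping_add(b),
            1 => acc ^ b,
            2 => acc.rotate_left(1),
            _ => acc.wrapping_sub(i),
        };
        i += 1;
    }
    acc
}
```

Using wrapping arithmetic keeps the function total over all `u32` inputs, so any native/MASM disagreement would point at the compiler rather than at a panic path in the case itself.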
Move the differential fuzzing harness and its 12 cases from the standalone `midenc-fuzza` crate into `tests/integration/src/end_to_end/differential/`, where they sit alongside the other Rust→MASM end-to-end tests. The coverage-driven agent tooling (`AGENT-PROMPT.md`, `README.md`, `cov.py`) moves to `tools/fuzza-agent/` so the test code and the agent workflow no longer share a crate. The `cargo make fuzza-cov*` tasks keep their names but now invoke the integration suite filtered by `differential`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
It's a PoC of a test suite where an agent is generating Rust programs for differential testing using the compiler's code coverage as feedback