Skip to content

feat: implement debug info dialect and variable tracking#1097

Merged
bitwalker merged 33 commits intonextfrom
bitwalker/debug-variable-locations
May 4, 2026
Merged

feat: implement debug info dialect and variable tracking#1097
bitwalker merged 33 commits intonextfrom
bitwalker/debug-variable-locations

Conversation

@bitwalker
Copy link
Copy Markdown
Collaborator

This is @djolertrk's work from #820, but rebased on top of #1096, with various refactorings/fixes made to it in preparation for merging to next.

The three tip commits on this branch contain all the changes I made, so that I could preserve @djolertrk's original commits. See the commit message for the HEAD commit, as that has details on all of the more subtle changes/refactorings, and is the one that deserves the most review.

The other two are more obvious, so I'll summarize them here:

  1. I merged the separate debugdump tool into miden-objtool as a subcommand, specifically miden-objtool dump debug-info. The implementation is largely identical, with some superficial refactorings around how the config is constructed, etc.
  2. I removed the temporary workaround code that was added to midenc-compile to emit the custom debug info sections which are now produced by the assembler natively when assembling projects. That means that after this is merged, there will be a short period where we do not have the debug info sections available, until we merge refactor: use project assembler to build a Package #1071 (unless that gets merged first, which may very well be the case, as that is next on my TODO list to review).

Some issues I found while reviewing the changes in #820, and which I have not attempted any fix here:

  1. The debug info tests added in tests/integration/src/rust_masm_tests/debug.rs do not seem to result in any debug ops being emitted in all but one case, and in that one case, the location information is wrong (i.e. "unknown" source file). Presumably this will be addressed by improvements to the frontend, but it is worth noting.
  2. The di.declare and di.kill ops are not currently emitted anywhere, and I'd expect them to be (even though we don't really handle them during codegen, except for di.declare which I added the lowering for). The di.kill op will likely require a new decorator in MAST unfortunately.
  3. The frontend does not handle most DWARF expressions, only a small subset, which means that variable tracking is likely to be inaccurate. I plan to discuss how to handle this with @djolertrk ASAP - but after doing some investigation, I believe we're likely best off emitting DWARF bytecode directly, and then using gimli to decode and evaluate it in the debugger. gimli provides an evaluator that delegates all the machine-specific details to us, which works perfectly for our needs.

Base automatically changed from bitwalker/objtool-cleanup to next April 30, 2026 22:31
@bitwalker bitwalker requested review from djolertrk and greenhat April 30, 2026 22:31
djolertrk and others added 26 commits May 1, 2026 12:39
This commit adds debug information support for tracking
variable locations through the compilation pipeline:

- Add debug info representation on HIR level (builtin.dbg_value ops,
  DIExpression, DILocalVariable attributes)
- Handle DebugVars during lowering from HIR to MASM
- Add DebugInfoBuilder for constructing debug metadata
- Add RemoveDeadDebugOps pass to clean up debug ops with dead operands
- Add miden-debugdump tool for inspecting .debug_info sections in MASP
  packages (similar to llvm-dwarfdump)
- Add documentation for the debug info format (docs/DebugInfoFormat.md,
  docs/DebugInfoMetadata.md)
- Add lit tests for debug functionality

Run tests:
$ litcheck lit run --verbose tests/lit/debugdump
$ litcheck lit run --path bin ./tests/lit/debug/
- Check if WasmLocal value is on Miden operand stack before assuming
  it's in local memory (emit Stack(pos) if on stack)
- Add post-processing in MasmFunctionBuilder::build() to patch
  Local(idx) with correct FMP offset
- Compute num_wasm_locals = params_in_felts + hir_locals for correct
  offset calculation (DWARF WasmLocal uses WASM indexing where params
  come first)
- Add patch_debug_var_locals_in_block() to recursively patch all
  DebugVar decorators in the MASM body
Confirm that cfg --> scf preserves debug info
Debug info operations (debuginfo.value, debuginfo.kill, debuginfo.declare)
are purely observational and were causing panics and assembler validation
errors when they survived through optimization passes into codegen.

- Skip operand scheduling for DebugValue (observational, not consuming)
- Gracefully skip debuginfo ops with no HirLowering in the emitter
- Make SinkOperandDefs and ControlFlowSink debug-info-aware so debug
  uses don't block sinking or prevent dead code elimination
- Strip DebugVar-only procedure bodies that would be rejected by the
  Miden assembler (decorators without instructions have no MAST effect)
Add end-to-end debug variable location tracking through the compiler
pipeline, and patch miden-vm to fix a crash when basic blocks are
deduplicated during assembly.
Fix data_offset_align and raw_entity_metadata_layout_for_value_layout
to correctly handle types with alignment > 8
- add FrameBase lowering
- DWARF parsing (parse variable/params names)
- and fix duplicate DebugVar emissions
bitwalker added 3 commits May 1, 2026 13:02
The initial debug info implementation in #820 defined a new debuginfo
dialect which provides the ability to emit tracking information for
local variables in the source program. There were a couple of issues
that I wanted to address with it before merging, so that has been done
here in this commit:

* Move the new debug dialect into `midenc-hir`, as debug info attributes
  are essentially core/builtin attributes, and having them in a
  downstream dialect was awkward: not only did we need to place them in
  the `builtin` dialect rather than `di`, but debug info ops are
  somewhat special in a way that most other ops are not - this warranted
  making them part of the core IR crate, though still as a separate
  dialect.
* Renamed the dialect to `di` from `debuginfo`, as this better mirrors
  the naming conventions in DWARF, from which these ops are derived in
  part.
* Renamed the debug dialect attribute types to remove the `DI` prefix,
  as it is redundant.
* Added a new `Transparent` operation trait, which is used to indicate
  that an operation is "transparent" with regards to its value uses,
  allowing those uses to be excluded in specific situations, such as
  when determining liveness of operation results. Transparent ops are
  only permitted to have at most a single operand, and may not produce
  results - these restrictions make dealing with transparent ops much
  more straightforward when rewriting the IR.
* Added a new effect type, `DebugEffect`, which is used to represent
  effects that an operation may have on the state of an attached
  debugger. This can be used in the future to ensure that operations are
  not reordered with respect to such effects on a common resource. This
  largely mirrors MemoryEffect for now. There is a corresponding
  DebugEffectOpInterface alias added as well.
* Made all ops of the debug info dialect implement `Transparent`, as
  well as both DebugEffectOpInterface and MemoryEffectOpInterface. The
  memory effects are modeled as empty (i.e. the ops are considered
  pure), which means that those ops are trivially dead by default -
  however I have also modified region DCE to explicitly leave
  transparent ops alone unless we have determined that the defining op
  of their input operand is dead.
* Added a new method to `Value`, called `has_real_uses`, which returns
  true if the value is used and at least one of those uses is from a
  non-transparent op. The existing `is_used` method returns true
  regardless of transparency if any use of the value exists.
* Modified various places where `Value::is_used` was being used, when we
  should use `Value::has_real_uses` to avoid pessimizing a
  canonicalization based on the presence of debug info ops.
* Modified `Rewriter::erase_op` to be more precise about verifying that
  the op to erase has no _real_ uses. If transparent users still exist,
  they are removed before erasing the original target op - this is
  always safe as transparent ops can only ever use a single value.
* Added tests to ensure that transparent ops do not interfere with dead
  code elimination.
* Simplified the places where debug ops were special-cased to instead
  work in terms of transparency (unless the special-casing truly needed
  to be specific to a given op).
* Implemented AttrPrinter/AttrParser for all debug dialect attributes
* Implemented OpPrinter/OpParser for all debug dialect operations
* Implemented `Serializable`/`Deserializable` for debug dialect's
  `ExpressionOp` and `Expression` types, so that complex expressions can
  be serialized for use by the debugger.
* Modified how the frontend was computing storage locations to use
  `Expression` directly, rather than `VariableStorage`, which could not
  represent the full set of location expressions, and was largely
  redundant with `ExpressionOp`. We still only handle a subset of
  possible expression types, but it will be easier to support more going
  forward.
Copy link
Copy Markdown
Collaborator

@djolertrk djolertrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we marked the di.* ops as Transparent, they get deleted in passes like DCE, so we have zero variables (locations) at the end in debug info sections. We need to teach passes not to delete them.

@djolertrk
Copy link
Copy Markdown
Collaborator

djolertrk commented May 4, 2026

Since we marked the di.* ops as Transparent, they get deleted in passes like DCE, so we have zero variables (locations) at the end in debug info sections. We need to teach passes not to delete them.

Latest commits address this, and also adds testing of di.debug_value ops.

@djolertrk
Copy link
Copy Markdown
Collaborator

djolertrk commented May 4, 2026

Since we marked the di.* ops as Transparent, they get deleted in passes like DCE, so we have zero variables (locations) at the end in debug info sections. We need to teach passes not to delete them.

Hmm. As we discussed, we should actually keep DCE perform on these as well, since we do not need to keep dead values in HIR anyway (even though we could salvage some values, like the one depending on constants, or if a variables value can be described in terms of other SSA value that is still alive - but lets not worry about this scenario at this point).

I am working on a fix. For one use case that I am hitting issues with, we could be using di.debug_declare - e.g. if we have a parameter that is placed at an offset from frame register, we won't have any SSA value to connect that with at HIR level, so, in those cases, we can just describe it with a DWARF expression.

@bitwalker bitwalker force-pushed the bitwalker/debug-variable-locations branch from 4dc4292 to 90206ab Compare May 4, 2026 18:10
@bitwalker bitwalker merged commit e69ff5b into next May 4, 2026
15 checks passed
@bitwalker bitwalker deleted the bitwalker/debug-variable-locations branch May 4, 2026 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce variable location representation and tracking Add means of debugging variables in Rust

2 participants