feat: implement debug info dialect and variable tracking#1097
feat: implement debug info dialect and variable tracking#1097
Conversation
This commit adds debug information support for tracking variable locations through the compilation pipeline: - Add debug info representation on HIR level (builtin.dbg_value ops, DIExpression, DILocalVariable attributes) - Handle DebugVars during lowering from HIR to MASM - Add DebugInfoBuilder for constructing debug metadata - Add RemoveDeadDebugOps pass to clean up debug ops with dead operands - Add miden-debugdump tool for inspecting .debug_info sections in MASP packages (similar to llvm-dwarfdump) - Add documentation for the debug info format (docs/DebugInfoFormat.md, docs/DebugInfoMetadata.md) - Add lit tests for debug functionality Run tests: $ litcheck lit run --verbose tests/lit/debugdump $ litcheck lit run --path bin ./tests/lit/debug/
- Check if WasmLocal value is on Miden operand stack before assuming it's in local memory (emit Stack(pos) if on stack) - Add post-processing in MasmFunctionBuilder::build() to patch Local(idx) with correct FMP offset - Compute num_wasm_locals = params_in_felts + hir_locals for correct offset calculation (DWARF WasmLocal uses WASM indexing where params come first) - Add patch_debug_var_locals_in_block() to recursively patch all DebugVar decorators in the MASM body
Confirm that cfg --> scf preserves debug info
Debug info operations (debuginfo.value, debuginfo.kill, debuginfo.declare) are purely observational and were causing panics and assembler validation errors when they survived through optimization passes into codegen. - Skip operand scheduling for DebugValue (observational, not consuming) - Gracefully skip debuginfo ops with no HirLowering in the emitter - Make SinkOperandDefs and ControlFlowSink debug-info-aware so debug uses don't block sinking or prevent dead code elimination - Strip DebugVar-only procedure bodies that would be rejected by the Miden assembler (decorators without instructions have no MAST effect)
Add end-to-end debug variable location tracking through the compiler pipeline, and patch miden-vm to fix a crash when basic blocks are deduplicated during assembly.
Fix data_offset_align and raw_entity_metadata_layout_for_value_layout to correctly handle types with alignment > 8
- add FrameBase lowering - DWARF parsing (parse variable/params names) - and fix duplicate DebugVar emissions
The initial debug info implementation in #820 defined a new debuginfo dialect which provides the ability to emit tracking information for local variables in the source program. There were a couple of issues that I wanted to address with it before merging, so that has been done here in this commit: * Move the new debug dialect into `midenc-hir`, as debug info attributes are essentially core/builtin attributes, and having them in a downstream dialect was awkward: not only did we need to place them in the `builtin` dialect rather than `di`, but debug info ops are somewhat special in a way that most other ops are not - this warranted making them part of the core IR crate, though still as a separate dialect. * Renamed the dialect to `di` from `debuginfo`, as this better mirrors the naming conventions in DWARF, from which these ops are derived in part. * Renamed the debug dialect attribute types to remove the `DI` prefix, as it is redundant. * Added a new `Transparent` operation trait, which is used to indicate that an operation is "transparent" with regards to its value uses, allowing those uses to be excluded in specific situations, such as when determining liveness of operation results. Transparent ops are only permitted to have at most a single operand, and may not produce results - these restrictions make dealing with transparent ops much more straightforward when rewriting the IR. * Added a new effect type, `DebugEffect`, which is used to represent effects that an operation may have on the state of an attached debugger. This can be used in the future to ensure that operations are not reordered with respect to such effects on a common resource. This largely mirrors MemoryEffect for now. There is a corresponding DebugEffectOpInterface alias added as well. * Made all ops of the debug info dialect implement `Transparent`, as well as both DebugEffectOpInterface and MemoryEffectOpInterface. The memory effects are modeled as empty (i.e. the ops are considered pure), which means that those ops are trivially dead by default - however I have also modified region DCE to explicitly leave transparent ops alone unless we have determined that the defining op of their input operand is dead. * Added a new method to `Value`, called `has_real_uses`, which returns true if the value is used and at least one of those uses is from a non-transparent op. The existing `is_used` method returns true regardless of transparency if any use of the value exists. * Modified various places where `Value::is_used` was being used, when we should use `Value::has_real_uses` to avoid pessimizing a canonicalization based on the presence of debug info ops. * Modified `Rewriter::erase_op` to be more precise about verifying that the op to erase has no _real_ uses. If transparent users still exist, they are removed before erasing the original target op - this is always safe as transparent ops can only ever use a single value. * Added tests to ensure that transparent ops do not interfere with dead code elimination. * Simplified the places where debug ops were special-cased to instead work in terms of transparency (unless the special-casing truly needed to be specific to a given op). * Implemented AttrPrinter/AttrParser for all debug dialect attributes * Implemented OpPrinter/OpParser for all debug dialect operations * Implemented `Serializable`/`Deserializable` for debug dialect's `ExpressionOp` and `Expression` types, so that complex expressions can be serialized for use by the debugger. * Modified how the frontend was computing storage locations to use `Expression` directly, rather than `VariableStorage`, which could not represent the full set of location expressions, and was largely redundant with `ExpressionOp`. We still only handle a subset of possible expression types, but it will be easier to support more going forward.
5087d95 to
90ebb54
Compare
djolertrk
left a comment
There was a problem hiding this comment.
Since we marked the di.* ops as Transparent, they get deleted in passes like DCE, so we have zero variables (locations) at the end in debug info sections. We need to teach passes not to delete them.
Latest commits address this, and also adds testing of |
Hmm. As we discussed, we should actually keep I am working on a fix. For one use case that I am hitting issues with, we could be using |
4dc4292 to
90206ab
Compare
This is @djolertrk's work from #820, but rebased on top of #1096, with various refactorings/fixes made to it in preparation for merging to
next.The three tip commits on this branch contain all the changes I made, so that I could preserve @djolertrk's original commits. See the commit message for the
HEADcommit, as that has details on all of the more subtle changes/refactorings, and is the one that deserves the most review.The other two are more obvious, so I'll summarize them here:
debugdumptool intomiden-objtoolas a subcommand, specificallymiden-objtool dump debug-info. The implementation is largely identical, with some superficial refactorings around how the config is constructed, etc.midenc-compileto emit the custom debug info sections which are now produced by the assembler natively when assembling projects. That means that after this is merged, there will be a short period where we do not have the debug info sections available, until we merge refactor: use project assembler to build aPackage#1071 (unless that gets merged first, which may very well be the case, as that is next on my TODO list to review).Some issues I found while reviewing the changes in #820, and which I have not attempted any fix here:
tests/integration/src/rust_masm_tests/debug.rsdo not seem to result in any debug ops being emitted in all but one case, and in that one case, the location information is wrong (i.e."unknown"source file). Presumably this will be addressed by improvements to the frontend, but it is worth noting.di.declareanddi.killops are not currently emitted anywhere, and I'd expect them to be (even though we don't really handle them during codegen, except fordi.declarewhich I added the lowering for). Thedi.killop will likely require a new decorator in MAST unfortunately.gimlito decode and evaluate it in the debugger.gimliprovides an evaluator that delegates all the machine-specific details to us, which works perfectly for our needs.