Refactoring to better support IR references #1051

sampsyo · 2022-06-24T13:51:09Z

sampsyo
Jun 24, 2022
Maintainer

At the risk of opening a can of worms, I was thinking about @calebmkim's progress on calculating dominators in Calyx control programs in #1039 and in particular about the representational problems we're facing. I mentioned in #1039 (reply in thread) that one long-term thing to think about is what changes to our IR would make this kind of analysis easier to implement. (This should not be a blocker for the current work, but it got me thinking.)

That thread pondered switching to an unstructured CFG, which may also be a good idea in the long term, but this post is about a different and orthogonal (and maybe easier) change we could make. Namely, something about the current IR design that is complicating the work is the pointer-based structure of Control. Control structs refer to each other using a Box<Control>, and Control objects are allocated anywhere they please, which is of course the standard way of representing something tree-shaped.

The problem with this (very natural! very normal!) representation for Control is that, in Rust, it is surprisingly hard for client code to refer to specific Control objects. In the current situation, for instance, we want to build up a dominator map, but what should the types in the HashMap<_, _> be? Certainly not Control, which would mean taking ownership of a copy of the control statement, and wouldn't even admit comparisons for identity! The two solutions we have MacGyvered are adding an integer attribute to every relevant control statement and using raw pointers. Neither makes it possible to get back to the Control object from the reference if you need it (without scanning over the entire program to find a matching Control). We could consider switching to using Rc everywhere, but that seems like an unfortunate surrender.
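To make the raw-pointer workaround concrete, here is a minimal sketch. The Control enum below is a simplified, hypothetical stand-in (not Calyx's actual definition), and the helper names are made up for illustration. It shows that addresses do give identity comparisons, but the keys are opaque:

```rust
use std::collections::HashMap;

// Simplified, hypothetical stand-in for Calyx's `Control` type.
#[derive(Debug)]
enum Control {
    Enable(String),
    Seq(Vec<Control>),
    While(Box<Control>),
}

/// Record the nesting depth of every node, keyed by the node's address.
fn depths(c: &Control, depth: usize, out: &mut HashMap<*const Control, usize>) {
    out.insert(c as *const Control, depth);
    match c {
        Control::Enable(_) => {}
        Control::Seq(children) => {
            for child in children {
                depths(child, depth + 1, out);
            }
        }
        Control::While(body) => depths(body, depth + 1, out),
    }
}

/// Build a tiny program and look up the loop body by its address.
/// Returns the number of map entries and the recorded depth of the loop body.
fn raw_pointer_demo() -> (usize, Option<usize>) {
    // Two structurally identical `Enable("a")` nodes that are distinct statements.
    let prog = Control::Seq(vec![
        Control::Enable("a".into()),
        Control::While(Box::new(Control::Enable("a".into()))),
    ]);
    let mut map = HashMap::new();
    depths(&prog, 0, &mut map);

    // To form a key we must already hold a reference to the node.
    let inner = match &prog {
        Control::Seq(children) => match &children[1] {
            Control::While(body) => &**body as *const Control,
            _ => unreachable!(),
        },
        _ => unreachable!(),
    };
    (map.len(), map.get(&inner).copied())
}
```

Going the other way, from a `*const Control` key back to the node, would need `unsafe` (and a guarantee that the tree has not moved) or a scan over the whole program, which is the limitation described above.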

The idea is to borrow a page from other compiler IRs that pack IR objects into dense arrays and—instead of using pointers directly to the objects—refer to them by integer offsets. Compilers often do this for efficiency (better locality, fewer allocations, smaller reference values), but it has the side benefit of enabling easier references in Rust. Your "reference" is really just an integer, and to "dereference" it you just need access to the entire chunk of values. We would not need to stamp the IR with unique IDs nor use *const Control pointers; the canonical way to refer to specific control statements would be through those indices.
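As a rough sketch of what this could look like, assuming hypothetical names (ControlIdx, ControlArena) and a much-simplified Control that is not the real Calyx type:

```rust
/// A "reference" to a control statement: just an index, wrapped in a newtype.
/// (Hypothetical name; nothing like this exists in Calyx today.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ControlIdx(u32);

/// Simplified control statements. Children are indices, not `Box<Control>`.
#[derive(Debug, PartialEq)]
enum Control {
    Enable(String),
    Seq(Vec<ControlIdx>),
    While { body: ControlIdx },
}

/// All control statements for one component, packed into a dense array.
#[derive(Default)]
struct ControlArena {
    nodes: Vec<Control>,
}

impl ControlArena {
    /// Store a statement and return the index that refers to it.
    fn add(&mut self, c: Control) -> ControlIdx {
        let idx = ControlIdx(self.nodes.len() as u32);
        self.nodes.push(c);
        idx
    }

    /// "Dereference" an index; this needs access to the whole arena.
    fn get(&self, idx: ControlIdx) -> &Control {
        &self.nodes[idx.0 as usize]
    }
}

/// Build `while { seq { a; b } }` and return the arena plus the root index.
fn arena_demo() -> (ControlArena, ControlIdx) {
    let mut arena = ControlArena::default();
    let a = arena.add(Control::Enable("a".into()));
    let b = arena.add(Control::Enable("b".into()));
    let body = arena.add(Control::Seq(vec![a, b]));
    let root = arena.add(Control::While { body });
    (arena, root)
}
```

Because ControlIdx is Copy, Eq, and Hash, it can be used directly as a HashMap key, and any code holding the arena can get back to the statement with `get`.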

I even stumbled across a pretty nice implementation of this pattern in Cranelift, in the form of the cranelift_entity crate. This crate provides a facility for making newtypes to wrap integers (improving safety/readability over just using integer types as references directly) and offers workalikes for HashMap and stuff that rely on this pattern.

Anyway, not something we should do right now, but could be an interesting refactoring to consider!

rachitnigam · 2022-06-25T20:26:21Z

rachitnigam
Jun 25, 2022
Maintainer

yeah, interesting problem that I was expecting we would run into as we tried to do complex analyses! My short term approach would've been wrapping control nodes with an Rc or an RRC but that might break some of the nice guarantees we get about ownership of the control tree.

Unstructured CFG might be useful but it represents a trade-off: you might have to reconstruct loops and conditionals instead of having direct access to those structures. Some optimizations, like loop unrolling, become non-trivial with an unstructured representation.

I'd be interested in hearing more about the dense array pattern. It represents significant work so I would hesitate to do it before we have multiple use cases.

1 reply

sampsyo Jun 26, 2022
Maintainer Author

Yes, I think the (far more radical) decision to use unstructured control flow would be best considered separately from this, which is just an internal data structure change.

On places where this is currently useful (aside from @calebmkim's current dominators/etc. work):

- The TDCC pass. As linked above, TDCC currently relies on @NODE_ID attributes to identify FSM states; the proposed system could replace that.
- The interpreter's architecture is complicated by the fact that it hooks into the Control representation using Rc<Control> everywhere. The proposal could plausibly create a more convenient separation between the program and the interpreter's state for interpreting that program.
- The current live range analysis can tell you which components are live for which groups, but it can't tell you which components are live for which specific activations or other control statements. It could do this if we had a better way to refer to specific control statements.
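To illustrate the last point, here is a small sketch of an analysis whose results are keyed by control statement rather than by group. Everything here is an assumption made for illustration (the ControlIdx newtype, the simplified Control enum, and the toy "which groups may run under this statement" analysis); it is not the existing live range analysis:

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical index type referring to one control statement.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ControlIdx(usize);

/// Simplified control statements stored in a flat slice.
enum Control {
    Enable(&'static str),
    Seq(Vec<ControlIdx>),
    If { then_branch: ControlIdx, else_branch: ControlIdx },
}

/// For every statement reachable from `idx`, record the set of groups that
/// may run underneath it. Results are keyed by statement, not by group name.
fn groups_under(
    nodes: &[Control],
    idx: ControlIdx,
    out: &mut HashMap<ControlIdx, BTreeSet<&'static str>>,
) -> BTreeSet<&'static str> {
    let set = match &nodes[idx.0] {
        Control::Enable(g) => {
            let mut s = BTreeSet::new();
            s.insert(*g);
            s
        }
        Control::Seq(children) => {
            let mut s = BTreeSet::new();
            for &c in children {
                s.extend(groups_under(nodes, c, out));
            }
            s
        }
        Control::If { then_branch, else_branch } => {
            let mut s = groups_under(nodes, *then_branch, out);
            s.extend(groups_under(nodes, *else_branch, out));
            s
        }
    };
    out.insert(idx, set.clone());
    set
}

/// Run the analysis on `seq { if { read } else { write }; read }`.
fn analysis_demo() -> HashMap<ControlIdx, BTreeSet<&'static str>> {
    let nodes = vec![
        Control::Enable("read"),  // 0
        Control::Enable("write"), // 1
        Control::If { then_branch: ControlIdx(0), else_branch: ControlIdx(1) }, // 2
        Control::Enable("read"),  // 3: same group as statement 0, different activation
        Control::Seq(vec![ControlIdx(2), ControlIdx(3)]), // 4
    ];
    let mut out = HashMap::new();
    groups_under(&nodes, ControlIdx(4), &mut out);
    out
}
```

Statements 0 and 3 enable the same group but get separate entries, which is the kind of per-activation distinction a group-keyed result cannot express.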

Another place where I can imagine this quickly becoming an issue in the future is in any sort of overlay type system, which we consider from time to time.

For more general background on "slab"-style allocation in Rust, I thought the documentation for the slab crate was pretty helpful. (slab just uses usizes as keys, however.) I wish I could find a good source writing about doing this kind of thing in compilers in particular, but I'm not sure it's out there… but here are a couple examples from within Cranelift!

- The core data flow graph representation is primarily a primary map that holds instructions.
- Source positions are a "secondary map" that adds information to instructions (Inst references).
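The primary/secondary split can be sketched by hand as follows. This is a simplified illustration and deliberately does not use the cranelift_entity API; the Primary and Secondary types, their methods, and the example data are all assumptions made for this sketch:

```rust
/// Hypothetical key type, standing in for an instruction reference.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Inst(u32);

/// Primary map: owns the entities and is the only thing that creates keys.
struct Primary<V> {
    items: Vec<V>,
}

impl<V> Primary<V> {
    fn new() -> Self {
        Primary { items: Vec::new() }
    }

    /// Store a value and return the freshly allocated key for it.
    fn push(&mut self, v: V) -> Inst {
        let k = Inst(self.items.len() as u32);
        self.items.push(v);
        k
    }

    fn get(&self, k: Inst) -> &V {
        &self.items[k.0 as usize]
    }
}

/// Secondary map: attaches extra data to existing keys. Keys that were never
/// written read back as the default value.
struct Secondary<V: Clone> {
    items: Vec<V>,
    default: V,
}

impl<V: Clone> Secondary<V> {
    fn new(default: V) -> Self {
        Secondary { items: Vec::new(), default }
    }

    fn get(&self, k: Inst) -> &V {
        self.items.get(k.0 as usize).unwrap_or(&self.default)
    }

    /// Grow the backing vector on demand, filling gaps with the default.
    fn set(&mut self, k: Inst, v: V) {
        let i = k.0 as usize;
        if i >= self.items.len() {
            self.items.resize(i + 1, self.default.clone());
        }
        self.items[i] = v;
    }
}

/// Two instructions in a primary map, with a source line attached to one.
fn maps_demo() -> (Primary<&'static str>, Secondary<u32>, Inst, Inst) {
    let mut insts = Primary::new();
    let add = insts.push("iadd");
    let mul = insts.push("imul");
    // Line 0 means "unknown" in this sketch.
    let mut srclocs = Secondary::new(0);
    srclocs.set(mul, 42);
    (insts, srclocs, add, mul)
}
```

The design point is that side information (source positions here, analysis results in our case) lives outside the IR objects and is looked up by the same small key.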

Anyway, yes, this would be a lot of work and it is not clear at all whether it's worth it! Just thought it would be fun to discuss. 😃
