# Calyx 2.0: Static Everywhere (#1334)

*rachitnigam started this conversation in Ideas*
Calyx's latency-insensitive abstraction is powerful: you can write programs without worrying about cycle-level timing behavior, and the compiler attempts to generate designs with good performance for you.
In the three years of its existence, we've mostly focused on the kinds of resource optimizations and language constructs that Calyx's abstractions enable. We've added the `comb`, `invoke`, `ref`, and `@sync` constructs to the language and explored optimizations like generalized sharing and unsharing, along with traditional software-like optimizations such as constant propagation, inlining, and unrolling. However, the work on Filament has opened my eyes to two fundamental limitations of the Calyx ecosystem as it exists today.
First, time and again we've seen that timing-based optimizations beat anything Calyx's dynamic optimizations can do. The clock is a powerful abstraction in synchronous hardware design because it makes things like synchronization "free". More importantly, the clock is always there: Calyx does not target any hardware that lacks a clock, so not using the clock will always leave performance on the table.
Second, the clock is a necessary abstraction for interacting with the outside world. Interfaces for hardware modules are often defined in terms of cycle-level behavior, and not being able to talk explicitly about the clock in a Calyx program means we have no way to interface with such modules; we have to wrap the Verilog in some latency-insensitive interface and use that. Furthermore, things like #1274 are much harder to do without a clock.
Given this, here is a proposed guiding principle for Calyx 2.0: **take compositional, latency-insensitive descriptions of computations and turn them into performant, latency-sensitive designs.**
## Driving Frontend
Another thing that's been lacking in Calyx is a driving frontend: one that pushes the compiler and language design forward in order to support real, state-of-the-art accelerator design. Our existing frontends, while numerous, are not very competitive with existing HLS tools or research-grade accelerator design languages. Much like clang drove the design of LLVM, we need to go all in on a frontend that needs Calyx to generate good designs.
The two main candidates for this are
## Pipelining
A surefire way to make sure that we can at least express high-performance designs in Calyx is adding pipelining to the language. Being competitive with HLS requires that Calyx designs be able to express latency-sensitive pipelining in some capacity. However, it is not clear how to integrate arbitrary pipelines into Calyx; several specific problems remain open.
## Virtual Operators
An orthogonal design axis that has shown up is the distinction between virtual and physical operators. For example, #1175 shows that HLS tools delay deciding exactly what timing properties a multiplier should have. Similarly, #1151 proposes separating the physical choices for memories from the logical operations they perform (AMC takes this idea much further).
In general, it seems that we would want frontends to use "virtual" operators with latency-insensitive interfaces to schedule computations, and then have the compiler decide how to implement these virtual operators. Of course, the true power of this idea shows up when the compiler also has visibility into the pipelined behavior of these operators so it can, for example, decide what II a loop needs to be pipelined at.
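To make this concrete, here is a minimal sketch of the idea: a pass that lowers a virtual operator to one of several physical implementations based on the II a surrounding loop needs. All names (`VirtualOp`, `PhysicalOp`, `lower`) and numbers are hypothetical, not the actual Calyx compiler API.

```rust
// Hypothetical sketch: choosing a physical implementation for a
// virtual operator. Illustrative only, not real Calyx internals.

/// A virtual operator only promises a latency-insensitive interface.
struct VirtualOp {
    name: &'static str,
}

/// A physical implementation exposes cycle-level timing that the
/// compiler can schedule against.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PhysicalOp {
    latency: u32, // cycles until the result is valid
    ii: u32,      // initiation interval: cycles between issues
}

/// Pick a candidate whose II supports the loop's target II,
/// preferring the lowest latency among the feasible ones.
fn lower(_op: &VirtualOp, candidates: &[PhysicalOp], target_ii: u32) -> Option<PhysicalOp> {
    candidates
        .iter()
        .copied()
        .filter(|p| p.ii <= target_ii)
        .min_by_key(|p| p.latency)
}
```

The point of the sketch is only the division of labor: the frontend schedules against `VirtualOp`, and the compiler, seeing the pipelined behavior of each `PhysicalOp`, makes the final choice.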
## The Path Forward
Calyx is a big enough project that I don't envision a full rewrite of any form to support the above features. Instead, we need to take a gradual approach. In the short term, we have a set of proposals that we can work on:

- A `static` control operator. This will give us really precise control over the exact scheduling of computations.

Along with the implementation of these proposals, we need to evaluate and orient the compiler's design around them. For example:
- `@sync` into `static` (Support `sync` without `std_sync_reg` #1333)

## Other Proposals
I think the above lays out a useful, wholesale vision for Calyx 2.0. However, some other proposals are worth mentioning:
### `@sync` subsumes `par`

The `par` operator in Calyx implements fork-join parallelism. However, the `@sync` operator is much more general and can subsume `par`. We might still want to keep `par` around because it is easier to reason about, but at some point in the compiler middle-end, we should canonicalize `par` into `@sync`.
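As a rough illustration of what such a canonicalization pass could look like, here is a sketch over a toy control AST. The `Control` enum, the `Threads`/`Sync` encoding of barriers, and the `canonicalize` function are all hypothetical stand-ins, not the real Calyx IR or the actual `@sync` semantics.

```rust
// Hypothetical sketch of canonicalizing fork-join `par` into
// barrier-style synchronization on a simplified stand-in AST.
#[derive(Debug, PartialEq)]
enum Control {
    Enable(String),
    Seq(Vec<Control>),
    Par(Vec<Control>),
    /// Unordered threads coordinated only through barriers.
    Threads(Vec<Control>),
    /// `Sync(b, body)`: run `body`, then signal barrier `b`
    /// (a stand-in for `@sync`).
    Sync(u32, Box<Control>),
}

/// Rewrite every `par` into a set of threads that all meet at a
/// fresh barrier, mirroring the proposed middle-end pass.
fn canonicalize(c: Control, next_barrier: &mut u32) -> Control {
    match c {
        Control::Par(children) => {
            let b = *next_barrier;
            *next_barrier += 1;
            Control::Threads(
                children
                    .into_iter()
                    .map(|ch| Control::Sync(b, Box::new(canonicalize(ch, next_barrier))))
                    .collect(),
            )
        }
        Control::Seq(cs) => Control::Seq(
            cs.into_iter().map(|ch| canonicalize(ch, next_barrier)).collect(),
        ),
        other => other,
    }
}
```

The fork-join structure of `par` becomes explicit barrier traffic, which is exactly why the more general barrier form can express schedules that `par` cannot.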
### Indexed IR
Switching from a pointer-based IR to an index-based IR can be useful for many performance reasons, especially for tools like the interpreter (#1183).
A specific approach is to implement components so that they track all cells and ports defined within them using contiguous arrays. A cell is represented by an index into the `cells` array, and a port by an index into the `ports` array. The cell data structure, instead of tracking its ports directly, simply stores a range of indices into the `ports` array.

The benefit of this approach is that iterating over all ports and cells is very cheap. Furthermore, equality checks on cells and ports are also cheap. The interpreter can easily use this representation to keep a flattened state of the instance tree.
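A minimal sketch of this flattened layout might look as follows; the names (`CellIdx`, `Component`, `ports_of`) are illustrative, not the actual Calyx compiler's types.

```rust
// Hypothetical index-based IR storage: cells and ports live in
// flat arrays, and handles are plain integer indices.
use std::ops::Range;

/// A cell handle is just an index into `Component::cells`, so
/// equality checks are integer comparisons.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct CellIdx(u32);

struct Port {
    name: &'static str,
}

/// Instead of owning its ports, a cell records the contiguous
/// range of `Component::ports` that belongs to it.
struct Cell {
    name: &'static str,
    ports: Range<u32>,
}

struct Component {
    cells: Vec<Cell>,
    ports: Vec<Port>,
}

impl Component {
    /// Looking up a cell's ports is a cheap slice of the flat array.
    fn ports_of(&self, cell: CellIdx) -> &[Port] {
        let r = &self.cells[cell.0 as usize].ports;
        &self.ports[r.start as usize..r.end as usize]
    }
}
```

Because everything lives in two flat `Vec`s, iterating over all ports or cells is a linear scan with no pointer chasing, and an interpreter can mirror the same layout for its flattened instance-tree state.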