Calyx as an infra for compiler generation of programmable accelerators #892
-
Hey @JosseVanDelm, this sounds super exciting. I recommend a few resources:
I would recommend getting started with Calyx by following the language tutorial. We have a lot of tools to support Calyx development, and if you run into problems, you can open an issue in this repository. The programmability question is interesting and, as far as I know, has basically not been addressed by any research, especially in the context of generating programmable accelerators. I would probably start by hand-writing a simple programmable accelerator in Calyx and then seeing what we can do to add that programmability automatically. A good starting point, in my opinion, is the Systolic Array Generator for Calyx, which generates fairly simple, fixed-function systolic arrays. I would first take a 4x4 array and try to add some programmability to it. Once that works, it should be possible to change the generator code so that every systolic array it generates has that programmability, which would make it quite powerful. Let us know how this goes!
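To make the "start from a fixed-function 4x4 array and add programmability" idea concrete, here is a minimal Python sketch. It is not Calyx and not the Systolic Array Generator's code; it is a cycle-level behavioural model of an N x N output-stationary systolic array. The `op` parameter is a hypothetical configuration knob standing in for programmability: `"mac"` reproduces the fixed-function matrix multiply, while `"maxplus"` is an illustrative alternative chosen for this example, not something the generator supports.

```python
from typing import List

Matrix = List[List[int]]

# Hypothetical PE operations selectable by a configuration field.
# Each entry is (combine function, initial accumulator value).
OPS = {
    "mac": (lambda acc, a, b: acc + a * b, 0),                        # fixed-function behaviour
    "maxplus": (lambda acc, a, b: max(acc, a + b), float("-inf")),    # illustrative alternative
}


def systolic_matmul(A: Matrix, B: Matrix, op: str = "mac") -> Matrix:
    """Cycle-level model of an N x N output-stationary systolic array.

    Row i of A enters from the left skewed by i cycles; column j of B enters
    from the top skewed by j cycles. Each PE (i, j) combines the operands it
    currently holds, while A values move right and B values move down.
    """
    n = len(A)
    step, init = OPS[op]
    acc = [[init] * n for _ in range(n)]
    a_reg = [[None] * n for _ in range(n)]  # A operand held in each PE
    b_reg = [[None] * n for _ in range(n)]  # B operand held in each PE
    for t in range(3 * n - 2):  # last useful cycle is t = 3n - 3
        # Shift operands one hop; iterate bottom-right to top-left so each
        # register reads its neighbour's value from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(n)):
                a_reg[i][j] = a_reg[i][j - 1] if j > 0 else (
                    A[i][t - i] if 0 <= t - i < n else None)
                b_reg[i][j] = b_reg[i - 1][j] if i > 0 else (
                    B[t - j][j] if 0 <= t - j < n else None)
        # Every PE with a valid operand pair updates its local accumulator.
        for i in range(n):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] = step(acc[i][j], a_reg[i][j], b_reg[i][j])
    return acc


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(systolic_matmul(A, B))             # prints [[19, 22], [43, 50]]
    print(systolic_matmul(A, B, "maxplus"))  # prints [[9, 10], [11, 12]]
```

The dataflow (operand skewing and forwarding) is identical for both settings; only the per-PE operation changes. That separation is one possible way to think about which parts of a generated array stay fixed and which parts a compiler would need to configure.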
-
Hi, @JosseVanDelm! Just wanted to copy some things I wrote via email, for completeness:
Broadly, I think it's a really interesting question to ask what explicit support for programmability in a Calyx program would look like. I think an important step in getting started, as @rachitnigam suggested in a sibling comment, would be to try designing such a programmable accelerator so you know what "programmability" looks like exactly: is it a handful of configuration bits? More like a processor ISA? Something else entirely? Then you can think about how to take that programming interface and "bake" it into the design of the hardware that implements it.
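If the answer turns out to be "a handful of configuration bits", the programming interface is essentially a register layout that both the hardware and the compiler back-end must agree on. The Python sketch below assumes an entirely hypothetical field layout (`op`, `rows`, `cols`, `accumulate`, with made-up widths) that is not taken from Calyx or from any existing accelerator. It writes the layout down once as data and uses it both for encoding (what a compiler back-end would emit) and decoding (the bit slicing the hardware would perform).

```python
from dataclasses import dataclass

# Hypothetical configuration-register layout: (field name, width in bits),
# packed from the least-significant bit upward.
FIELDS = [("op", 2), ("rows", 3), ("cols", 3), ("accumulate", 1)]


@dataclass
class Config:
    op: int
    rows: int
    cols: int
    accumulate: int


def encode(cfg: Config) -> int:
    """Pack a Config into a single integer word (compiler side)."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = getattr(cfg, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def decode(word: int) -> Config:
    """Slice a word back into fields, mirroring what the hardware would do."""
    values, shift = {}, 0
    for name, width in FIELDS:
        values[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return Config(**values)


if __name__ == "__main__":
    cfg = Config(op=1, rows=4, cols=4, accumulate=1)
    word = encode(cfg)
    print(hex(word))            # prints 0x191
    print(decode(word) == cfg)  # prints True
```

Keeping the layout in one place is a small instance of the broader idea in this thread: if the interface is described once, both the hardware decoder and the compiler's encoder could in principle be generated from that description rather than kept in sync by hand.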
-
Hi everyone!
I'm creating this post as a follow-up to an email I sent to @sampsyo, @rachitnigam, @sgpthomas and @tissue3.
@sampsyo suggested in his reply that we continue the discussion here so others could weigh in as well.
Intro
My name is Josse, and I'm a PhD researcher at KU Leuven (Belgium), where we research programmable heterogeneous accelerator SoCs for TinyML workloads. Part of my current research is "porting" TVM to an existing programmable embedded SoC with a RISC-V core and two coarse-grained accelerators.
Proposal
I'm looking for some kind of HDL or ADL that can describe programmable compute systems at a high level, so that the description can be lowered to two things:
Please note that I'm not proposing an HLS flow. In an HLS flow, a single algorithm is lowered to an efficient hardware equivalent. Instead, I'm proposing a set of hardware design intrinsics (pragmas, maybe?) that can be inserted into a hardware description to make certain compiler-relevant semantics explicit, so that both the programmable hardware and a compiler back-end can be generated from it.
Rationale
Currently, programmable accelerators (and even CPUs and GPUs) are developed separately from their compilers, which makes it quite labour-intensive to:
Discussion
I think (and @sampsyo agrees) that this idea is quite ambitious, so I'm looking for experts who can comment on it so that we can start narrowing it down to a minimum viable solution.
I personally believe that the MLIR, CIRCT, and Calyx/Dahlia projects can serve as a big part of this solution.
We are currently also looking into extending TVM's VTA project as an alternative.
@sampsyo also suggested looking into the Stanford AHA group's CGRAs and their Halide-to-Hardware compiler, though this seems more like an HLS flow. Also, homogeneous, flexible CGRAs are quite different from the fixed-size yet heterogeneous platforms we want to target.
Please let me know if you have any suggestions or questions! Thanks!