Proposal: libm generator for Calyx #686

sampsyo · 2021-09-23T13:54:58Z · Maintainer

I promised a while back to jot down some thoughts and links about the idea of generating numerical functions for Calyx programs.

The idea would be to write a Calyx "frontend" that generates implementations of elementary mathematical functions. It would roughly cover the set of functions in C's libm: exp, log2, tanh, sqrt, and so on. Having these functions is important because lots of realistic accelerators really, really need math functions. For example, our Relay frontend very quickly ran into a need for exponentiation to implement the softmax operator: see #463, #490, #369, etc.

The reason this would need to be a generator rather than a single, fixed-function library is twofold:

We'd want to generate implementations for multiple numerical representations. Supporting the same functions for integer, various floating-point formats, and arbitrary fixed-point formats should be a goal.
We'll want to generate different implementations for different accuracy/performance/area trade-offs. For instance, for some points in this three-way trade-off space, a simple look-up table may be the most expedient. For others, we will want to use a Taylor expansion or similar polynomial approximation. Beyond these two basic approaches, one can imagine scouring the literature for more exotic hardware implementations.

A good place to start reading about generating elementary numerical function implementations (for software, not hardware) would be the very recent RLIBM paper. There's also a nice blog post about the project. However, keep this in mind when reading about that work: we should not strive for correct rounding as RLIBM does. That's a very high bar that requires complex machinery to get right; we should use a much more standard approach to get "roughly good enough" implementations at various accuracy levels. I recommend the paper because it does a good job of surveying the status quo of best-effort, not-correctly-rounded implementations—it's that status quo that we should emulate.

I also recommend reading about (i.e., googling around for papers and such about) the Remez algorithm. It's an example of a practical algorithm for finding a polynomial approximation to a given function. A great place to start, for instance, would be the lolremez tool that uses it to generate polynomial approximations for arbitrary input functions. We could do far worse in v1.0 of this project than essentially porting lolremez from generating C to generating Calyx.
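Whichever method supplies the coefficients (a Taylor expansion or a Remez fit), the generated hardware would ultimately evaluate a fixed-degree polynomial. As a rough sketch, assuming a signed Q16.16 format and using the degree-4 Taylor coefficients of e^x purely for illustration (a Remez fit would produce different coefficients), Horner's rule needs only one multiply and one add per coefficient. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed fixed-point format for this sketch: signed Q16.16.
constexpr int kFracBits = 16;
constexpr double kScale = 1 << kFracBits;

int32_t to_fixed(double v) { return static_cast<int32_t>(std::lround(v * kScale)); }
double to_double(int32_t v) { return v / kScale; }

// Multiply two Q16.16 values: form the 64-bit product, then shift back to Q16.16
// (the low bits are truncated).
int32_t fx_mul(int32_t a, int32_t b) {
    return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> kFracBits);
}

// Horner's rule: c[0] + x*(c[1] + x*(c[2] + ...)), one multiply-add per step.
int32_t horner(const std::vector<int32_t>& coeffs, int32_t x) {
    assert(!coeffs.empty());
    int32_t acc = coeffs.back();
    for (auto it = coeffs.rbegin() + 1; it != coeffs.rend(); ++it) {
        acc = fx_mul(acc, x) + *it;
    }
    return acc;
}

// Degree-4 Taylor coefficients of e^x around 0: 1, 1, 1/2, 1/6, 1/24.
std::vector<int32_t> exp_taylor_coeffs() {
    return {to_fixed(1.0), to_fixed(1.0), to_fixed(1.0 / 2), to_fixed(1.0 / 6),
            to_fixed(1.0 / 24)};
}
```

In hardware terms, the loop corresponds to a chain (or a sequenced reuse) of one fixed-point multiplier and one adder, so the degree directly trades area or latency for accuracy.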

However, I think we should start at an even more basic level: let's just generate look-up tables! That is, let's build a generator whose input is a specification of the function to generate (in whatever representation we can come up with), a value range, a number of table entries, and a numerical format. As output, it produces a Calyx program that just looks up the answer in an appropriately sized ROM. This basic LUT-based approach would be a truly excellent minimum viable product: it will be far better than nothing (which is what we currently have), and it will serve as a baseline against which we can empirically demonstrate how more advanced techniques (e.g., polynomial approximation) outperform it.
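To make the LUT-based minimum viable product concrete, here is a minimal sketch of the table-building half of such a generator. It assumes the function is given as a host-language callback, the input range is split into equal-width bins sampled at their midpoints, and the output format is unsigned fixed point with a configurable number of fractional bits. All names (`LutSpec`, `build_lut`, `lut_lookup`) are hypothetical; a real generator would emit the entries into a Calyx ROM rather than a `std::vector`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical description of one table to generate.
struct LutSpec {
    std::function<double(double)> f;  // function to tabulate
    double lo, hi;                    // input range [lo, hi)
    int entries;                      // number of table entries (ROM depth)
    int frac_bits;                    // fractional bits of the unsigned output format
};

// Sample f at the midpoint of each equal-width bin and quantize to fixed point.
// Assumes f is non-negative on [lo, hi), since the output format is unsigned.
std::vector<uint32_t> build_lut(const LutSpec& s) {
    assert(s.entries > 0 && s.hi > s.lo);
    std::vector<uint32_t> table(s.entries);
    const double step = (s.hi - s.lo) / s.entries;
    for (int i = 0; i < s.entries; ++i) {
        double y = s.f(s.lo + (i + 0.5) * step);
        assert(y >= 0.0);
        table[i] = static_cast<uint32_t>(std::lround(std::ldexp(y, s.frac_bits)));
    }
    return table;
}

// Map an input to its bin index (clamped to the table) and return the stored value.
uint32_t lut_lookup(const LutSpec& s, const std::vector<uint32_t>& table, double x) {
    int i = static_cast<int>(std::floor((x - s.lo) / (s.hi - s.lo) * s.entries));
    if (i < 0) i = 0;
    if (i >= s.entries) i = s.entries - 1;
    return table[i];
}
```

The three knobs in the proposal map directly onto `LutSpec`: the range, the number of entries (ROM depth), and the numerical format (here just `frac_bits`).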

cgyurgyik · 2021-09-30T19:04:44Z · Collaborator

Nice, I definitely agree with the generator aspect.

FWIW, the exp operator, which uses a Taylor series expansion, is still not entirely correct (#505). This is (probably) why we get negative numbers in #504.

In #463, I tried taking the LUT approach but wasn't getting very accurate numbers, hence the move to the Taylor series expansion. I don't recall why the LUT approach wasn't accurate, but #463 may be a good starting place.

One question about the LUT implementation that we've discussed before but should pin down concretely: where do you see LUTs living in a Calyx program?

SystemVerilog takes a switch-style approach, e.g.

reg out;
wire [1:0] in;
always @(in)
case(in)
    2'b00 : out = 0;
    2'b01 : out = 0;
    2'b10 : out = 1;
    2'b11 : out = 1;
endcase

Our options in Calyx:

We could make it an external memory and manually append its contents to the .data file each time we want to use the operator. However, this is tedious and error-prone.
Alternatively, the component could write each table entry itself, e.g., with one group per index:

group write_table_index_0_0 { ... }
group write_table_index_0_1 { ... }
...

However, this seems like a patch for what we really want:

component op_with_LUT {
  cells {
    // A pre-initialized memory:
    my_lut = @LUT std_mem_d1(1, 4, 2) { 1'b0, 1'b0, 1'b1, 1'b1 };

    // Or its own primitive, which lowers to the SystemVerilog case statement above.
    my_lut = std_lut(/*in=*/4, /*out=*/1) { 1'b0, 1'b0, 1'b1, 1'b1 };
  }
  ...
}
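If a primitive like the proposed `std_lut` were lowered to the SystemVerilog `case` statement shown earlier, the compiler (or the generator) would need to print one case arm per table entry. Here is a minimal sketch of that lowering step, assuming a purely combinational module with hypothetical port names `in` and `out` and a module name chosen by the caller. It just emits Verilog text; it is not an existing Calyx backend API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Emit a combinational SystemVerilog module whose `case` statement encodes `table`.
// The input width is the number of bits needed to index every entry.
std::string emit_lut_module(const std::string& name, const std::vector<uint64_t>& table,
                            int out_width) {
    assert(!table.empty() && out_width > 0);
    int in_width = 1;
    while ((1ull << in_width) < table.size()) ++in_width;

    std::ostringstream v;
    v << "module " << name << "(\n"
      << "  input  logic [" << in_width - 1 << ":0] in,\n"
      << "  output logic [" << out_width - 1 << ":0] out\n"
      << ");\n"
      << "  always_comb begin\n"
      << "    case (in)\n";
    for (std::size_t i = 0; i < table.size(); ++i) {
        v << "      " << in_width << "'d" << i << ": out = " << out_width << "'d"
          << table[i] << ";\n";
    }
    v << "      default: out = '0;\n"
      << "    endcase\n"
      << "  end\n"
      << "endmodule\n";
    return v.str();
}
```

Calling `emit_lut_module("my_lut", {0, 0, 1, 1}, 1)` yields the same 2-bit-input, 1-bit-output table as the hand-written `case` example.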

sampsyo Sep 30, 2021
Maintainer Author

Yes! I really like your idea of a std_lut for this purpose (or std_rom???). It seems like we need some sort of primitive support for this, whether the contents of the memory are (a) written right there in the Calyx code as literals or (b) encoded as a flat data file on the side. And good point that this support would be a prerequisite for providing LUT-based implementations (unless we start by just wrapping the Verilog case version).

cgyurgyik · 2021-09-30T19:10:45Z · Collaborator

The second question I have is about testing. We can write simple corner-case tests, but I don't think that will always be sufficient. It would be interesting to set up some comparison with an "industry-standard" version. For example, LLVM libc tests its math library against the battle-hardened GCC version. Is there an equivalent battle-hardened library for fixed-point operations that we could use to validate the Calyx libm generator?

sampsyo Sep 30, 2021
Maintainer Author

That's a great point, and I don't think it has a simple answer. Especially because this effort would be exploring deep into the "approximate computing" space, comparing against an existing reference implementation wouldn't be as straightforward as checking for bit-level equality. We'd probably want to report mean squared error (MSE) or a similar metric instead. And because we'd want to support very funky numerical formats that software often doesn't support (e.g., fixed-point with 3 integer bits and 9 fractional bits), a standard reference implementation may be very hard to come by. Perhaps comparing all precision levels against a single arbitrary-precision implementation like MPFR would be the thing to do.
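As a rough illustration of the MSE-style comparison, here is a sketch that measures an approximation against a higher-precision reference over a uniform grid of inputs. It uses `double` and `std::exp` as a stand-in for the reference; an arbitrary-precision library such as MPFR would take that role in practice. The function names and the choice of a uniform grid are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Mean squared error of `approx` against `reference` over `samples` evenly spaced
// points in [lo, hi]. Any fixed-point quantization is assumed to happen inside `approx`.
double mean_squared_error(const std::function<double(double)>& approx,
                          const std::function<double(double)>& reference,
                          double lo, double hi, int samples) {
    assert(samples > 1 && hi > lo);
    double sum = 0.0;
    for (int i = 0; i < samples; ++i) {
        double x = lo + (hi - lo) * i / (samples - 1);
        double err = approx(x) - reference(x);
        sum += err * err;
    }
    return sum / samples;
}

// Example approximation: e^x rounded to a fixed-point grid with `frac_bits` fractional bits.
std::function<double(double)> quantized_exp(int frac_bits) {
    return [frac_bits](double x) {
        double scale = std::ldexp(1.0, frac_bits);
        return std::round(std::exp(x) * scale) / scale;
    };
}
```

Running the same harness for each candidate format (3.9 fixed point, 8.8, and so on) against one reference gives directly comparable numbers, which fits the "single arbitrary-precision reference" idea.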

cgyurgyik Oct 2, 2021
Collaborator

Yeah, I guess there are two different forms of testing:

1. Mean squared deviation from a reference.
2. Expected behavior for corner cases. For example, what happens if an operator yields the mathematical equivalent of infinity? Or, more simply, how do we treat overflow and underflow?

I could imagine adding boolean ports (e.g., overflow/underflow flags), at least for debugging. Or perhaps this is something the interpreter could cover.
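To make the boolean-port idea concrete, here is a sketch of a fixed-point addition that saturates on overflow and reports it through a flag, the software analogue of an extra 1-bit `overflow` output port. The saturate-and-flag policy and the `FxResult` type are assumptions for illustration; silently wrapping around is the other common choice.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Result of a fixed-point operation plus a flag that would map to a 1-bit port.
struct FxResult {
    int32_t value;
    bool overflow;
};

// Add two values in the same signed 32-bit fixed-point format (e.g., Q16.16).
// The sum is computed in 64 bits, then clamped to the representable range.
FxResult fx_add_saturating(int32_t a, int32_t b) {
    int64_t sum = static_cast<int64_t>(a) + b;
    constexpr int64_t kMax = std::numeric_limits<int32_t>::max();
    constexpr int64_t kMin = std::numeric_limits<int32_t>::min();
    if (sum > kMax) return {static_cast<int32_t>(kMax), true};
    if (sum < kMin) return {static_cast<int32_t>(kMin), true};
    return {static_cast<int32_t>(sum), false};
}
```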

sampsyo Oct 2, 2021
Maintainer Author

Yeah, good point about undefined results. Maybe we could hedge against this by quantifying accuracy within a target window demanded by the application.

calebmkim · 2022-10-17T02:01:26Z · Collaborator

I was trying to add to this project by building an ln(x) generator that would work similarly to the (already implemented) e^x generator. Once we have this, we should be able to compute logarithms and powers for any base and any exponent fairly easily. There are good, relatively simple ways to approximate ln(x). The main problem is division.
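One common way to approximate ln(x) (not necessarily the approach intended here) shows why division is the sticking point: reduce x to m * 2^k with m in [1, 2), then use ln(m) = 2 * atanh(s) with s = (m - 1) / (m + 1), whose odd power series converges quickly because |s| <= 1/3. Computing s requires a fractional quotient. The sketch below uses `double` only to show the structure; a hardware version would perform the same steps in fixed point. `approx_ln` and its default term count are hypothetical choices.

```cpp
#include <cassert>
#include <cmath>

// Approximate ln(x) for x > 0 via range reduction and the atanh series.
// `terms` controls how many odd powers of s are summed.
double approx_ln(double x, int terms = 6) {
    assert(x > 0.0);
    // Range reduction: x = m * 2^k with m in [1, 2). In hardware, k would come
    // from a leading-zero count and m from a shift.
    int k = 0;
    while (x >= 2.0) { x /= 2.0; ++k; }
    while (x < 1.0) { x *= 2.0; --k; }

    // This is the division that needs a fractional (fixed-point) quotient.
    double s = (x - 1.0) / (x + 1.0);
    double s2 = s * s;

    // ln(m) = 2 * (s + s^3/3 + s^5/5 + ...)
    double term = s, sum = 0.0;
    for (int n = 0; n < terms; ++n) {
        sum += term / (2 * n + 1);
        term *= s2;
    }
    constexpr double kLn2 = 0.69314718055994530942;
    return 2.0 * sum + k * kLn2;
}
```

With ln available, log_b(x) = ln(x) / ln(b) and b^y = e^(y * ln(b)), which is the "any base, any exponent" combination, though each of those identities needs one more division or multiplication.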

Currently, the std_div Calyx primitives produce a quotient and a remainder. So if you were to compute 10/8.5, you would get a quotient of 1 with a remainder of 1.5. However, we will often want a single, more exact fixed-point quotient: 1.176 for 10/8.5. If we can get this working, then I think we should be able to approximate ln(x).
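To see where such a quotient/remainder pair comes from, here is a sketch (assuming Q16.16 values and hypothetical helper names) that divides the raw bit patterns of 10 and 8.5 the way a plain integer divider would. Because both operands carry the same scale factor 2^16, the scales cancel: the integer quotient is the whole number 1, while the remainder is still a Q16.16 value, 1.5. Shifting the dividend up by the number of fractional bits first recovers a fractional quotient of about 1.176.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFrac = 16;  // Q16.16: 16 fractional bits

// Encode a real value as a raw Q16.16 bit pattern (exact for the values used here).
constexpr int64_t fx(double v) { return static_cast<int64_t>(v * (1 << kFrac)); }

struct IntDiv {
    int64_t quotient;   // plain integer: the 2^16 scales cancel
    int64_t remainder;  // still scaled by 2^16
};

// What an integer divider computes when fed the raw fixed-point bit patterns.
IntDiv raw_divide(int64_t a, int64_t b) { return {a / b, a % b}; }

// Fixed-point quotient: shift the dividend up first so the result keeps 16 fractional bits.
int64_t fx_divide(int64_t a, int64_t b) { return (a << kFrac) / b; }
```

Here `fx_divide(fx(10.0), fx(8.5))` is 77101, and 77101 / 2^16 is about 1.1765.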

cgyurgyik Oct 22, 2022
Collaborator

Sorry for the delayed reply. It has been a bit of time since I've dealt with fixed point mathematics, but here's what I was thinking about:

#include <cstdint>
#include <cstdio>

int main() {
    // Q16.16: 32-bit values with 16 fractional bits.
    constexpr int64_t x = 0b0000'0000'0000'0011'1000'0000'0000'0000; // 3.5
    constexpr int32_t y = 0b0000'0000'0000'1000'1000'0000'0000'0000; // 8.5
    constexpr int32_t N = 16; // number of fractional bits

    // Expected: 3.5 / 8.5 = 0.4118 (approximately)

    // The shifted dividend needs at least 32 + 16 bits, i.e., this requires a 48-bit integer divider.
    constexpr int64_t q = (x << N) / y;
    // = 26985
    // = 0b0000'0000'0000'0000'0110'1001'0110'1001
    // = 0.25 + 0.125 + 0.03125 + 0.00390625 + ...
    // = 0.4118 (approximately)
    std::printf("%lld = %f\n", static_cast<long long>(q), q / 65536.0);
    return 0;
}

https://godbolt.org/z/3z71jzdrY

I think we could get away with shifting only the dividend, by storing it in a larger intermediate value before performing the integer division.

cgyurgyik Oct 22, 2022
Collaborator

The one thing that you have to be careful about is overflow when shifting left, hence the larger storage.
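Here is a small sketch of that overflow hazard, assuming unsigned 32-bit raw values so that wraparound is well defined in C++. Shifting 3.5 (Q16.16, raw value 229376) left by 16 inside a 32-bit register wraps around to 2^31, and dividing by 8.5 (raw 557056) then gives 3855 (about 0.059) instead of 26985 (about 0.412). Widening to 64 bits before the shift gives the expected result. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Shift the dividend while it is still 32 bits wide: the high bits are lost.
uint32_t divide_narrow(uint32_t a, uint32_t b) { return (a << 16) / b; }

// Widen to 64 bits before shifting, as an at-least-48-bit divider would.
uint32_t divide_wide(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>((static_cast<uint64_t>(a) << 16) / b);
}
```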

sampsyo Oct 25, 2022
Maintainer Author

Awesome, yeah, I think we've converged on something in common here: you need to change the "relative precision" of the numerator and the denominator. You can either shift the denominator down or the numerator up: @calebmkim did one and @cgyurgyik's code did the other. But once you have that, then you can get a useful integer result (which you then have to reinterpret or shift back to get the answer you want).

Surprisingly tricky stuff, huh?
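The two options can be expressed as one hypothetical helper that splits the 16 fractional bits of Q16.16 between shifting the dividend up and shifting the divisor down. Any split keeps the result at scale 2^16, since (X * 2^(16+up)) / (Y * 2^up) = (X / Y) * 2^16. Shifting the divisor down keeps the dividend narrow, but it discards the divisor's low bits: with 8.5 shifted all the way down, the divisor becomes the integer 8 and 3.5/8.5 comes out as 0.4375 instead of about 0.4118.

```cpp
#include <cassert>
#include <cstdint>

// Divide two Q16.16 values, returning a Q16.16 quotient. `up` bits are added to
// the dividend and `16 - up` bits are removed from the divisor; bits shifted off
// the divisor are lost, which is where precision goes.
int64_t fx_div_split(int64_t x, int64_t y, int up) {
    assert(up >= 0 && up <= 16);
    int64_t divisor = y >> (16 - up);
    assert(divisor != 0);
    return (x << up) / divisor;
}
```

For 3.5/8.5, an 8/8 split happens to be exact because the low 15 bits of 8.5's encoding are zero; divisors with more significant low-order bits would start losing precision sooner.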

calebmkim Nov 4, 2022
Collaborator

Ok, so I feel really dumb right now:

I actually implemented this idea, only to find out that Calyx's current fixed-point division primitive already handles this fine.

I seem to remember using Calyx's fixed-point division primitive to compute things like 3.5/8.5 and getting 0. This might have been because I wasn't triggering the go/done ports correctly. It's weird, though, since I seem to remember 3.5/8.5 giving me 0 in out_quotient but 3.5 in out_remainder, which made me think I was triggering the go/done ports correctly.

Either way, Calyx's fixed-point division does seem to work, so this wasn't necessary after all (sorry, guys).

sampsyo Nov 5, 2022
Maintainer Author

No worries! It is good to explore the underlying math!

sampsyo · 2022-11-05T18:15:31Z · Maintainer Author

Just appending one more idea to this one: consider automatically searching the space of alternative implementations for a given math expression, as suggested in #1226 (comment) and subsequent comments. (Perhaps using egraphs??)

The overall message for the project would be that a "libm for hardware" must fundamentally look different than a libm for software. You don't just want a single inventory of math primitives; you want a generator that can implement many different low-level strategies for accomplishing parts of an expression that uses libm functions.

