The PE for the second-generation CGRA (garnet).

Make sure you have Python 3.7; Peak requires it.

- First, install Peak:
  - `git clone https://github.com/phanrahan/peak.git`
  - `cd peak`
  - `python setup.py install --user`
- Install Lassen:
  - `git clone git@github.com:StanfordAHA/lassen.git`
  - `cd lassen`
  - `python setup.py install --user`
- Run the tests with `pytest`.

| Date | Person | Status | Task |
|---|---|---|---|
| Feb 13 | Alex | Complete | Add BFloat16 add and multiply functional model |
| | | | Add Float (configurable width?) type to CoreIR; create float add and multiply operator implementations in CoreIR by wrapping the DesignWare module |
| | | | Add Float type to Magma |
| | | | Generate Verilog from Peak |
| | | | Figure out what to do about different latencies (if they differ for the floating-point ops) |
| | | | Add multi-PE support to Peak |
| | | | Change the CoreIR mapper and PnR to support multi-PE |

Compared to the first-generation PE (Diablo), Lassen shall have two new features:
- BFloat addition and multiplication in every PE.
- Transcendental functions (div, log, e^x, sin, pow) on BFloats, implemented using a cluster of PEs and memory. The PEs and memory will get a few small special instructions to support these operations.

Lassen will not support denormalized numbers, so every BFloat it handles is a normal number of the form $\pm 1.\text{mantissa} \times 2^{\text{exponent}}$, where the mantissa is a 7-bit unsigned integer, the exponent is an 8-bit signed integer, and the dot is the binary point.
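
The normal-number format above corresponds to a 16-bit pattern of 1 sign bit, 8 exponent bits and 7 mantissa bits. The Python sketch below decodes such a pattern, assuming the standard bfloat16 layout with an exponent bias of 127; the helper names are hypothetical and not part of Lassen or Peak.

```python
import struct


def float_to_bf16_bits(x):
    """Truncate a Python float to a bfloat16 bit pattern (top 16 bits of its float32 encoding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def bf16_fields(h):
    """Split a 16-bit pattern into (sign, unbiased exponent, 7-bit mantissa).

    Only normal numbers are accepted, since Lassen does not support denormals.
    """
    sign = (h >> 15) & 1
    exp_field = (h >> 7) & 0xFF          # 8-bit biased exponent
    mantissa = h & 0x7F                  # 7-bit fraction after the implicit leading 1
    if exp_field in (0, 0xFF):
        raise ValueError("zero, denormal, infinity or NaN is not a normal BFloat")
    return sign, exp_field - 127, mantissa


def bf16_value(h):
    """Evaluate +/- 1.mantissa * 2^exponent for a normal BFloat pattern."""
    sign, exponent, mantissa = bf16_fields(h)
    return (-1.0) ** sign * (1 + mantissa / 128) * 2.0 ** exponent
```

For example, `bf16_fields(float_to_bf16_bits(-6.0))` gives `(1, 2, 64)`, because $-6 = -1.5 \times 2^2$ and $0.5 \times 128 = 64$.
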
- div
  - Implements $\mathit{out} = a / b$, where $a$, $b$ and $\mathit{out}$ are all BFloats.
  - It is computed as $\mathit{out} = a \times (1/b)$. BFloat multiply already exists as an instruction, so we only have to implement the reciprocal $1/b$.
  - Let $b = \pm 1.f \times 2^{x}$. Then $1/b = \pm (1/1.f) \times 2^{-x}$.
  - $1/1.f$ is stored as a BFloat in a lookup table in a memory tile. The table has 128 entries because $f$ is 7 bits. Reading the entry for $f$ gives some positive BFloat $1.g \times 2^{y}$.
  - Then $1/b = \pm 1.g \times 2^{y} \times 2^{-x} = \pm 1.g \times 2^{y - x}$, where the sign is the sign of $b$.
  - We implement subtraction of the exponent fields of two BFloats as a new PE instruction.
  - So div boils down to one 16-bit read from a 128-entry table, one 8-bit signed integer subtraction and one BFloat multiply.
  - We must take into account corner cases when $a$, $b$ or $\mathit{out}$ are not normal numbers (for example zero, infinity, NaN, or an exponent $y - x$ outside the normal range).
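
A bit-level Python sketch of the reciprocal path: the 128-entry table is indexed by the 7-bit mantissa $f$ of $b$, and the result exponent is the table entry's exponent minus $x$. The table contents, the round-to-nearest-even helper and all function names are assumptions for illustration; the hardware BFloat multiply is modeled as a Python multiply rounded back to bfloat16.

```python
import struct


def to_bf16(x):
    """Round a finite Python float to a bfloat16 bit pattern (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def from_bf16(h):
    """Widen a bfloat16 bit pattern to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


# Hypothetical memory-tile table: entry f holds 1 / 1.f as a bfloat16 (128 entries, f is 7 bits).
RECIP_LUT = [to_bf16(1.0 / (1.0 + f / 128.0)) for f in range(128)]


def reciprocal(b):
    """1/b for a normal bfloat16 b: one table read plus one exponent subtraction."""
    sign = b & 0x8000
    x = ((b >> 7) & 0xFF) - 127          # unbiased exponent of b
    entry = RECIP_LUT[b & 0x7F]          # 1.g * 2^y, addressed by the mantissa f of b
    y = ((entry >> 7) & 0xFF) - 127
    biased = (y - x) + 127               # exponent of 1.g * 2^(y - x), re-biased
    if not 1 <= biased <= 254:
        raise OverflowError("1/b is not a normal BFloat (corner case)")
    return sign | (biased << 7) | (entry & 0x7F)


def divide(a, b):
    """a / b = a * (1/b); the BFloat multiply is modeled as a float multiply rounded to bfloat16."""
    return to_bf16(from_bf16(a) * from_bf16(reciprocal(b)))
```

For instance, `from_bf16(divide(to_bf16(3.0), to_bf16(4.0)))` evaluates to `0.75`. The only non-multiply work is an integer subtraction on exponent fields, which is what the proposed PE instruction would provide.
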
- ln
  - Implements $\mathit{out} = \ln(a)$, where $a$ and $\mathit{out}$ are BFloats.
  - Let $a = \pm 1.f \times 2^{x}$. $\ln(a)$ should error out when $a < 0$.
  - Otherwise, $\ln(a) = \ln(1.f \times 2^{x}) = \ln(1.f) + x \ln 2$.
  - $\ln(1.f)$ comes from a lookup table, similar to the one for div.
  - We add a special PE instruction to convert the 8-bit signed integer $x$ to a BFloat; $\ln 2$ is also a BFloat constant.
  - So ln boils down to a lookup, an 8-bit signed integer to BFloat conversion, a BFloat multiply and a BFloat add.
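
A Python sketch of the ln decomposition, assuming a hypothetical 128-entry table of $\ln(1.f)$, a hypothetical integer-to-BFloat conversion for the exponent $x$, and float arithmetic rounded to bfloat16 after each step to model the BFloat multiply and add.

```python
import math
import struct


def to_bf16(x):
    """Round a finite Python float to a bfloat16 bit pattern (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def from_bf16(h):
    """Widen a bfloat16 bit pattern to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


# Hypothetical lookup table: entry f holds ln(1.f) as a bfloat16.
LN_LUT = [to_bf16(math.log(1.0 + f / 128.0)) for f in range(128)]
LN2 = to_bf16(math.log(2.0))  # ln(2) stored as a bfloat16 constant


def int8_to_bf16(x):
    """Hypothetical PE instruction: exact, since an 8-bit signed integer fits in 8 significant bits."""
    if not -128 <= x <= 127:
        raise ValueError("not an 8-bit signed integer")
    return to_bf16(float(x))


def bf16_ln(a):
    """ln(a) = ln(1.f) + x * ln(2) for a normal, positive bfloat16 a."""
    if a & 0x8000:  # sign bit set: a is negative
        raise ValueError("ln(a) is undefined for a < 0")
    exp_field = (a >> 7) & 0xFF
    if exp_field in (0, 0xFF):
        raise ValueError("zero, denormal, infinity or NaN: corner case")
    x = exp_field - 127
    product = to_bf16(from_bf16(int8_to_bf16(x)) * from_bf16(LN2))   # BFloat multiply
    return to_bf16(from_bf16(LN_LUT[a & 0x7F]) + from_bf16(product))  # BFloat add
```

With these tables, `from_bf16(bf16_ln(to_bf16(10.0)))` lands within a few thousandths of $\ln 10 \approx 2.3026$; the error comes from rounding each intermediate to 8 significant bits.
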
- e^x
  - $e^{x} = \left(2^{1/\ln 2}\right)^{x} = 2^{x/\ln 2} = 2^{y}$
  - $y$ can be computed with existing instructions: BFloat-multiply $x$ by the constant $1/\ln 2$.
  - Consider the case $y \ge 0$. If $y$ is negative, $2^{y} = 1/2^{-y}$, and we have already implemented the reciprocal.
  - Split the BFloat $y$ into $a + b$, where $a$ is the integer part and $b$ is the fractional part. Any part of $b$ smaller than $2^{-6}$ is treated as zero in BFloat16, so $b$ takes one of 64 values.
  - Look up $2^{b}$ in a table with 64 entries.
  - Then increment the exponent of the looked-up number by $a$.
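
The steps above in a Python sketch: $y = x \cdot (1/\ln 2)$, a 64-entry table of $2^{k/64}$, and an exponent-field increment by the integer part $a$. The table layout and helper names are assumptions, and the negative case uses a rounded float division as a stand-in for the table-based reciprocal from div.

```python
import math
import struct


def to_bf16(x):
    """Round a finite Python float to a bfloat16 bit pattern (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def from_bf16(h):
    """Widen a bfloat16 bit pattern to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


INV_LN2 = to_bf16(1.0 / math.log(2.0))  # constant 1/ln(2) as a bfloat16
# Hypothetical 64-entry table: entry k holds 2^(k/64), i.e. 2^b with b quantized to 6 fractional bits.
EXP2_LUT = [to_bf16(2.0 ** (k / 64.0)) for k in range(64)]


def exp2_nonneg(y):
    """2^y for y >= 0: split y = a + b, look up 2^b, then add a to the exponent field."""
    a = int(y)                         # integer part
    k = int((y - a) * 64)              # fractional part; bits below 2^-6 are dropped
    entry = EXP2_LUT[k]
    biased = ((entry >> 7) & 0xFF) + a
    if biased >= 0xFF:
        raise OverflowError("2^y does not fit in a bfloat16")
    return (biased << 7) | (entry & 0x7F)  # table entries are positive, so the sign bit stays 0


def bf16_exp(x):
    """e^x = 2^(x / ln 2) for a bfloat16 pattern x."""
    y = from_bf16(to_bf16(from_bf16(x) * from_bf16(INV_LN2)))  # BFloat multiply
    if y >= 0:
        return exp2_nonneg(y)
    # 2^y = 1 / 2^(-y); a rounded float division stands in for the reciprocal sequence
    return to_bf16(1.0 / from_bf16(exp2_nonneg(-y)))
```

Note that the exponent increment is a plain integer add on the 8-bit exponent field, which is why no multiply by $2^{a}$ is needed.
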
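- sin
  - To compute $\sin(x)$, first calculate $y = x \bmod (\pi/2)$ and the quadrant $q = \lfloor x / (\pi/2) \rfloor$. Depending on $q \bmod 4$, $\sin(x)$ equals $\sin(y)$, $\sin(\pi/2 - y)$, $-\sin(y)$ or $-\sin(\pi/2 - y)$.
  - If the first-quadrant argument $t$ (either $y$ or $\pi/2 - y$) is below some threshold, return $t$, since $\sin t \approx t$ for small $t$; otherwise look up $\sin t$ in a table. This gets rid of most negative exponents, so the table basically depends only on the mantissa.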
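
A Python sketch of the sin scheme, assuming a hypothetical small-angle threshold of $2^{-4}$ and a table with one entry per BFloat16 value in $[2^{-4}, \pi/2)$, keyed by the 16-bit pattern (that is, by exponent and mantissa). The quadrant handling uses $\sin(y + \pi/2) = \sin(\pi/2 - y)$ and $\sin(y + \pi) = -\sin(y)$; the threshold, table layout and names are illustrative choices, not part of the original design.

```python
import math
import struct


def to_bf16(x):
    """Round a finite Python float to a bfloat16 bit pattern (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def from_bf16(h):
    """Widen a bfloat16 bit pattern to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


HALF_PI = math.pi / 2
SMALL = 2.0 ** -4  # hypothetical threshold: below it, sin(t) is approximated by t

# Hypothetical table: one bfloat16 sin value per bfloat16 input t in [2^-4, pi/2).
SIN_LUT = {}
for exp_field in range(127 - 4, 127 + 1):  # exponents -4 .. 0
    for mantissa in range(128):
        t_bits = (exp_field << 7) | mantissa
        if from_bf16(t_bits) < HALF_PI:
            SIN_LUT[t_bits] = to_bf16(math.sin(from_bf16(t_bits)))


def sin_first_quadrant(t):
    """sin(t) for t in [0, pi/2]: return t itself when small, otherwise read the table."""
    t_bits = to_bf16(t)
    if from_bf16(t_bits) < SMALL:
        return t_bits
    return SIN_LUT[t_bits]


def bf16_sin(x):
    """sin(x) via y = x mod (pi/2) and the quadrant q."""
    xf = from_bf16(x)
    q = math.floor(xf / HALF_PI)
    y = xf - q * HALF_PI                 # y = x mod (pi/2)
    t = y if q % 2 == 0 else HALF_PI - y
    result = sin_first_quadrant(t)
    if q % 4 >= 2:                       # quadrants 2 and 3 are negative
        result ^= 0x8000
    return result
```

With a $2^{-4}$ threshold the table holds five exponents times up to 128 mantissas, which illustrates how the small-angle shortcut keeps the table indexed mostly by mantissa.
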
- pow
  - $a^{x} = e^{\ln(a^{x})} = e^{x \ln a}$
  - We already have BFloat multiply, ln and exponential, so we can implement power.
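
Because pow is only a composition, the Python sketch below models each step as a single BFloat16-rounded operation. `math.log` and `math.exp` stand in for the table-based ln and e^x sequences described in this section, and the function names are hypothetical. Since ln errors out for negative inputs, this route only covers $a > 0$.

```python
import math
import struct


def to_bf16(x):
    """Round a finite Python float to a bfloat16 bit pattern (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def from_bf16(h):
    """Widen a bfloat16 bit pattern to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def bf16_round(v):
    """Round a Python float to the nearest bfloat16 value (returned as a float)."""
    return from_bf16(to_bf16(v))


def bf16_pow(a, x):
    """a^x = e^(x * ln(a)), chaining ln, a BFloat multiply and e^x, each rounded to bfloat16."""
    a, x = bf16_round(a), bf16_round(x)
    if a <= 0:
        raise ValueError("ln(a) is undefined for a <= 0, so this route needs a > 0")
    ln_a = bf16_round(math.log(a))   # stand-in for the table-based ln
    t = bf16_round(x * ln_a)         # BFloat multiply
    return bf16_round(math.exp(t))   # stand-in for the table-based e^x
```

Rounding after every step means the errors of ln, the multiply and e^x compound, so pow is expected to be less accurate than any single operation.
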