generated from mpskex/chisel-docker-build
Showing 16 changed files with 257 additions and 192 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# Neural Core | ||
|
||
[Systolic Arrays](SystolicArray.md) are high-throughput, high-latency computing architectures. They can be very efficient if we control them with care. | ||
|
||
To support more general operations for linear algebra, we need to split the computing logic from the addressing and controlling logic. So the architecture should look like below: | ||
|
||
<div style="text-align: center"> | ||
<img src="../../images/neural_core.png" width=80%/> | ||
</div> | ||
|
||
The overall architecture of the proposed Neural Core looks like a multi-layered 3D grid. Looking along the z-axis, the layers form a pipeline. | ||
|
||
```mermaid | ||
graph LR | ||
subgraph I[Scratch Pad Memory] | ||
C[SPM inbound] | ||
F[SPM outbound] | ||
end | ||
G[DMA] | ||
subgraph Neural Core | ||
A[CU] | ||
B[i-MMU] | ||
D[PE] | ||
E[o-MMU] | ||
end | ||
B --> |addr| C | ||
C --> |data| B | ||
A --> |ctrl| B | ||
A --> |i-base-addr| B | ||
B --> |data| D | ||
A --> |ctrl| D | ||
D --> |data| E | ||
A --> |ctrl| E | ||
E --> |addr| F | ||
E --> |data| F | ||
A --> |o-base-addr| E | ||
I <--> G | ||
``` | ||
|
||
Above is the pipeline of a Neural Unit (NU), which is one element of the processing element pipeline in the Neural Core. NUs can be organized as systolic arrays or as parallel thread cores. | ||
|
||
This flexible architecture is managed by the MMUs, which control all data flow. To reduce the number of active transistors, we fused the systolic design with a parallel design. All $\mu$-CUs and i-$\mu$MMUs follow a stair-like schedule. Though this design choice may lead to high latency, I think it is still quite efficient: it preserves high throughput with a fair number of registers and arithmetic units. Of course, you could use a multiplexed control set to manage this grid, but that would add overhead: for example, you would need one large piece of logic to implement the parallelism and another to avoid bubbles in the Neural Units. | ||
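
The stair-like schedule can be pictured with a small software-only sketch in plain Scala (no Chisel). It assumes that NU (i, j) lags NU (0, 0) by i + j cycles, so the NUs that start together lie on one anti-diagonal of the grid. The names `StairSchedule`, `skew`, `wavefront` and `fillLatency` are hypothetical and exist only for this illustration.

```scala
// Software-only sketch of the stair-like schedule (plain Scala, not hardware).
object StairSchedule {
  // Assumption: NU (i, j) lags NU (0, 0) by i + j cycles.
  def skew(i: Int, j: Int): Int = i + j

  // All NUs of an n x n grid that start on the given cycle.
  // They form one anti-diagonal, i.e. one "stair step".
  def wavefront(n: Int, cycle: Int): Seq[(Int, Int)] =
    for {
      i <- 0 until n
      j <- 0 until n
      if skew(i, j) == cycle
    } yield (i, j)

  // Number of cycles before every NU in the grid has started.
  def fillLatency(n: Int): Int = 2 * n - 1

  def main(args: Array[String]): Unit = {
    val n = 3
    for (c <- 0 until fillLatency(n))
      println(s"cycle $c: " + wavefront(n, c).mkString(" "))
  }
}
```

Under this assumption the fill latency grows linearly with the grid side (2n - 1 cycles), while a new wavefront can enter every cycle, which is the latency-for-throughput trade described above.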
|
||
A Neural Processing Unit (NPU) can have multiple Neural Cores (NCore). Each Neural Core has a two-dimensional grid of Neural Units (NU). Each Neural Unit has its own micro-CU ($\mu$-CU), micro-MMUs for both input and output (i-$\mu$MMU/o-$\mu$MMU), and a [processing element (PE)](ProcessingElement.md). Holding a whole matrix in large registers is impractical, so the design follows other NPU designs and uses a Scratch Pad Memory (SPM) to store input and output data. Each $\mu$MMU is directly connected to the SPM for immediate access to the data. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
// See README.md for license details | ||
package ncore.cu | ||
|
||
import chisel3._ | ||
|
||
/** | ||
* The control unit also uses a systolic array to pass instructions | ||
*/ | ||
class ControlUnit(val n: Int = 8, val ctrl_width: Int = 8) extends Module { | ||
val io = IO(new Bundle { | ||
val cbus_in = Input(UInt(ctrl_width.W)) | ||
val cbus_out = Output(Vec(n * n, UInt(ctrl_width.W))) | ||
}) | ||
// Assign each element its diagonal control signal | ||
val reg = RegInit(VecInit(Seq.fill(2*n-1)(0.U(ctrl_width.W)))) | ||
|
||
// 1D systolic array for control | ||
reg(0) := io.cbus_in | ||
for(i<- 1 until 2*n-1){ | ||
reg(i) := reg(i-1) | ||
} | ||
// Broadcast to all elements in the array | ||
for(i <- 0 until n){ | ||
for(j <- 0 until n){ | ||
io.cbus_out(n*i+j) := reg(i+j) | ||
} | ||
} | ||
} |
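
To make the timing of `ControlUnit` easier to follow, here is a minimal behavioural model in plain Scala (not Chisel, and not part of the commit). It mirrors the `2*n-1` stage shift chain and the `reg(i+j)` fan-out from the module above. The class name `ControlUnitModel` and its methods are hypothetical, and control words are modelled as plain `Int` values.

```scala
// Plain-Scala behavioural model of the ControlUnit above (illustration only).
class ControlUnitModel(n: Int) {
  // Mirrors `reg`: 2n-1 stages, all reset to 0.
  private var reg: Vector[Int] = Vector.fill(2 * n - 1)(0)

  // Mirrors `cbus_out`: element (i, j), stored at index n*i + j, reads stage i + j.
  def out: Vector[Int] =
    Vector.tabulate(n * n)(k => reg(k / n + k % n))

  // One clock edge: stage 0 loads the new word, every other stage
  // takes the previous value of its left neighbour.
  def step(cbusIn: Int): Unit =
    reg = cbusIn +: reg.init
}
```

In this model a word presented on the input shows up at element (i, j) after i + j + 1 calls to `step`, so elements on the same anti-diagonal always see the same control word.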
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
// See README.md for license details | ||
package ncore | ||
|
||
import chisel3._ | ||
|
||
/** | ||
* This is the neural core design | ||
*/ | ||
class NeuralCore(val n: Int = 8, val nbits: Int = 8, val ctrl_width: Int = 8) extends Module { | ||
val io = IO(new Bundle { | ||
val vec_a = Input(Vec(n, UInt(nbits.W))) // vector `a` is the left input | ||
val vec_b = Input(Vec(n, UInt(nbits.W))) // vector `b` is the top input | ||
val ctrl = Input(UInt(ctrl_width.W)) | ||
val out = Output(Vec(n * n, UInt((2 * nbits + 12).W))) | ||
}) | ||
|
||
// Create n x n pe blocks | ||
val pe_io = VecInit(Seq.fill(n * n) {Module(new pe.PE(nbits)).io}) | ||
// Create 2d register for horizontal & vertical | ||
val pe_reg_h = RegInit(VecInit(Seq.fill((n - 1) * n)(0.U(nbits.W)))) | ||
val pe_reg_v = RegInit(VecInit(Seq.fill((n - 1) * n)(0.U(nbits.W)))) | ||
|
||
// we use systolic array to pipeline the instructions | ||
// this will avoid bubble and inst complexity | ||
// while simplifying design with higher efficiency | ||
val ctrl_array = Module(new cu.ControlUnit(n, ctrl_width)) | ||
ctrl_array.io.cbus_in := io.ctrl | ||
|
||
for (i <- 0 until n){ | ||
for (j <- 0 until n) { | ||
// ==== OUTPUT ==== | ||
// pe array's output mapped to the matrix position | ||
io.out(n * i + j) := pe_io(n * i + j).out | ||
|
||
// ==== INPUT ==== | ||
// vertical | ||
if (i==0) { | ||
pe_io(j).in_b := io.vec_b(j) | ||
} else { | ||
pe_io(n * i + j).in_b := pe_reg_v(n * (i - 1) + j) | ||
} | ||
if (i < n - 1 && j < n) | ||
pe_reg_v(n * i + j) := pe_io(n * i + j).in_b | ||
|
||
// horizontal | ||
if (j==0) { | ||
pe_io(n * i).in_a := io.vec_a(i) | ||
} else { | ||
pe_io(n * i + j).in_a := pe_reg_h((n - 1) * i + (j - 1)) | ||
} | ||
if (i < n && j < n - 1) | ||
pe_reg_h((n - 1) * i + j) := pe_io(n * i + j).in_a | ||
|
||
// ==== CONTROL ==== | ||
// Currently we only have one bit control | ||
// which is `ACCUM` | ||
// TODO: | ||
// Add ALU control to pe elements | ||
val ctrl = ctrl_array.io.cbus_out(n * i + j).asBools | ||
pe_io(n * i + j).accum := ctrl(0) | ||
} | ||
} | ||
} |
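
The data movement in `NeuralCore` can be checked with a software model in plain Scala (not Chisel, and not part of the commit). The sketch below makes several assumptions that the diff does not confirm: each PE is treated as a plain multiply-accumulate that is always accumulating, row `i` of `a` is fed with a skew of `i` cycles and column `j` of `b` with a skew of `j` cycles, and the control path is ignored. `SystolicModel`, `feed` and `matmul` are hypothetical names. For simplicity the model keeps an n x n register grid, whereas the hardware only needs `(n - 1) * n` registers per direction.

```scala
// Plain-Scala model of the horizontal/vertical register chains (illustration only).
object SystolicModel {
  type Mat = Vector[Vector[Int]]

  // Value on a skewed input lane: element k if it is in range, otherwise 0.
  private def feed(k: Int, n: Int)(get: Int => Int): Int =
    if (k >= 0 && k < n) get(k) else 0

  def matmul(a: Mat, b: Mat): Mat = {
    val n = a.length
    val regH = Array.fill(n, n)(0) // in_a seen by PE(i, j) on the previous cycle
    val regV = Array.fill(n, n)(0) // in_b seen by PE(i, j) on the previous cycle
    val acc  = Array.fill(n, n)(0) // assumed PE behaviour: acc += in_a * in_b

    // PE(n-1, n-1) receives its last operand pair on cycle 3n - 3.
    for (t <- 0 to 3 * n - 3) {
      val inA = Array.ofDim[Int](n, n)
      val inB = Array.ofDim[Int](n, n)
      for (i <- 0 until n; j <- 0 until n) {
        // Left edge takes the skewed row of `a`; others take the left neighbour's register.
        inA(i)(j) = if (j == 0) feed(t - i, n)(k => a(i)(k)) else regH(i)(j - 1)
        // Top edge takes the skewed column of `b`; others take the upper neighbour's register.
        inB(i)(j) = if (i == 0) feed(t - j, n)(k => b(k)(j)) else regV(i - 1)(j)
        acc(i)(j) += inA(i)(j) * inB(i)(j)
      }
      // Clock edge: every register captures the operand its PE just consumed.
      for (i <- 0 until n; j <- 0 until n) {
        regH(i)(j) = inA(i)(j)
        regV(i)(j) = inB(i)(j)
      }
    }
    acc.map(_.toVector).toVector
  }
}
```

With this skew, PE(i, j) multiplies `a(i)(k)` by `b(k)(j)` on cycle `k + i + j`, so under the stated assumptions the accumulators end up holding the matrix product.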
This file was deleted.