
Commit

replace old pages with pointers to new pages
steveri committed Feb 1, 2024
1 parent c4baa86 commit 5da8dd2
Showing 12 changed files with 57 additions and 986 deletions.
131 changes: 5 additions & 126 deletions _pages/01_docker.md
@@ -5,131 +5,10 @@ date: 2022-01-01 (Date used for order should be 2022-01-XX.)
layout: post
---

# Get Started
Since there can be many different environments we need to
set up for development in the AHA project, we use
**Docker** to make things easier.

This is the old wiki, it has been decommissioned. The new wiki can be found here:
* [https://github.com/StanfordAHA/aha/wiki](https://github.com/StanfordAHA/aha/wiki)

Docker is a tool that allows developers to easily deploy their
applications in a container to run on the host operating system.
The key benefit of Docker is that it allows users to
**package an application with all of its dependencies into a standardized unit**.
The particular page you were looking for can be found here maybe:
* https://github.com/StanfordAHA/aha/wiki/Docker-Setup

But first, you need to contact *Can Wang ([email protected])* to **set up your kiwi account**
and **get added to the docker group**.


> ##### WARNING
>
> You might see a `permission denied` error after running a docker
> command. On Linux, docker commands require root privileges by
> default, so you must either prefix them with `sudo` or be a member
> of the `docker` group. Make sure that *Can* has added your account
> to the docker group so you have permission to use Docker.
{: .block-warning }


# Using docker
Here is a brief overview of **Docker Images** and **Docker Containers**.


## Docker Image
An *image* is a **read-only blueprint** of our application which
**forms the basis of containers**. Often, an image is based on another
image, with some additional customization.

docker images

We can use the `docker images` command to list all images. For example,
we can see that `stanfordaha/garnet` is the name of the Docker image
we will build our containers on top of. Typically, we use the latest
image version, tagged `stanfordaha/garnet:latest`, which we can fetch
from Docker Hub with the `docker pull` command.

docker pull stanfordaha/garnet:latest


## Docker Container
Since *images* are just templates, you cannot start or run them. We
create a *container* from an *image* to run the actual application.
In other words, **a container is a runnable instance of an image**,
which we can read, write, and modify.

You can create, start, stop, move, or delete a container using the Docker
API or CLI.


### List containers

docker ps
docker container ls

We can use the `docker ps` command to show all containers that are currently
running; `docker container ls` does exactly the same thing.


### Create containers

We can use `docker run` to create a container based on a specific image.
The `-it` flag gives us an interactive terminal, which also makes it
easy to kill the container with `ctrl+c`, and the `--rm` flag
automatically removes the container when it exits. The `-d` flag
detaches our terminal, so we can close the terminal and keep the
container running. The `--name` flag assigns the container a name of our
choosing; `<container-name>` is entirely up to you, but typically we use
**first name + usage** when creating a new container.

docker run -it --rm -d -v /cad:/cad --name <container-name> stanfordaha/garnet:latest bash

After running the above command, the new container should appear in the list
when we run `docker ps` again.


### Attach, Detach container

To open a shell inside a Docker container after we create it, we use the `docker exec` command.

docker exec -it <container-name> bash

(`docker attach` instead connects to the container's main process.) While
attached to a container, we can use `ctrl+p ctrl+q` to detach from it
without stopping it.


### Delete container
Now for the dangerous part. While attached to a container's main shell,
`ctrl+d` exits the shell, which stops the container (and deletes it if it was
started with `--rm`). When detached, we can use the `docker stop` command to
stop a running container, and `docker rm` to delete a stopped one.

docker stop <container-name>


# Updating Tools Within Docker
To get everything up to date, run `apt update` first. Since `vim` is
not installed in the image yet, we can install it to make editing easier.

apt update
apt install -y vim



# Other Useful Stuff
To check your shell command history and the git commit history:

history
git log --graph --oneline --all


# Get the latest version
The best way to get the latest version is to create a new container from the latest image. To update the repositories inside an existing container, use:

git submodule update --init --recursive





# Reference

[1] [https://docker-curriculum.com/](https://docker-curriculum.com/)

[2] [https://docs.docker.com/get-started/overview/](https://docs.docker.com/get-started/overview/)
89 changes: 5 additions & 84 deletions _pages/02_design_flow.md
@@ -5,89 +5,10 @@ date: 2022-01-02 (Date used for order should be 2022-01-XX.)
layout: post
---

Our CGRA design flow generates **both the CGRA-based hardware accelerator and the application compiler**. The main feature of the AHA project is the **co-design** of accelerators and compilers, where the compiler updates automatically as the accelerator evolves.

This is the old wiki, it has been decommissioned. The new wiki can be found here:
* [https://github.com/StanfordAHA/aha/wiki](https://github.com/StanfordAHA/aha/wiki)

The particular page you were looking for can be found here maybe:
* https://github.com/StanfordAHA/aha/wiki/CGRA-Design-Flow

# Accelerator
CGRAs generally consist of: **PEs**, **memories**, and an **interconnect**. Accordingly, three high-level **domain-specific hardware specification languages (DSLs)** are used to specify each component. Each DSL is used to generate both the RTL code for the CGRA and the collateral for generating the new compilers.
The three DSLs we use are:
- PEak for PEs
- Lake for memories
- Canal for interconnects

All of these DSLs are embedded in Python and are based on a low-level hardware description DSL called **magma**, which is also embedded in Python.

## Hardware Generation
Inside the Docker container, we generate **Verilog for the CGRA** using the following command. The `width` and `height` flags specify **the size of the array** we want to generate. Although the size needed varies by application, it is generally better to generate a larger array so that more tiles are available. Note that every flag needs to be included in the command.

aha garnet --width 32 --height 16 --verilog --use_sim_sram --rv --sparse-cgra --sparse-cgra-combined

When generating the Verilog for physical design, remove the `--use_sim_sram` flag.

After running the command, the Verilog file is written to `/aha/garnet/garnet.v`.


# Compiler
We will now go through the steps to compile an application onto the CGRA hardware and simulate it.

In most of the experiments, we target applications in the **dense linear algebra** domain.

The application is written in a high-level DSL called **Halide**, which is embedded in C++. The Halide application goes through several stages before we can simulate it.

We will use Gaussian as an example to go through the three steps of the compiler: map, pnr, and test.

## aha map
`aha map apps/gaussian` will first compile the Halide application using the Halide-to-Hardware compiler. It takes in `gaussian_generator.cpp` and `process.cpp`, described in https://stanfordaha.github.io/aha-wiki-page/pages/03_h2h_files/.

Next, `aha map` will use a tool called MetaMapper to map the compute of the application (described in the CoreIR file `bin/gaussian_compute.json`) to PEs. It will produce a CoreIR file called `bin/gaussian_compute_mapped.json`.

Finally, it will use the clockwork tool to schedule the application and map the storage and buffers of the application to memory tiles. This step takes in `bin/gaussian_compute_mapped.json` and `bin/gaussian_memory.cpp` and produces a CoreIR file called `bin/design_top.json` (described in detail here: https://stanfordaha.github.io/aha-wiki-page/pages/08_design_files/).

## aha pnr
Now we have both verilog file and fully mapped CoreIR file. The `aha pnr apps/gaussian --width 32 --height 16` command will now place and route the application, perform pipelining, and generate a bitstream used to configure the CGRA to execute your application.

The `width` and `height` flags specify what portion of the array we would like to map to. This can only be as large as the Verilog that you generated using `aha garnet`, but may be smaller if desired.

After running the command, the bitstream file will be saved as `./bin/gaussian.bs`. Additionally, several design files (described here: https://stanfordaha.github.io/aha-wiki-page/pages/08_design_files/) will be generated as well.
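As a sanity check, the relationship between the `aha garnet` array size and the `aha pnr` region can be written as a small validation function. This is a hypothetical helper for illustration only, not part of the aha CLI:

```python
def pnr_region_fits(pnr_width, pnr_height, garnet_width, garnet_height):
    """Return True if the requested PnR region fits inside the generated array."""
    return pnr_width <= garnet_width and pnr_height <= garnet_height

# A 32x16 array from `aha garnet` can host a 32x16 PnR region, but not 48x16.
print(pnr_region_fits(32, 16, 32, 16))  # True
print(pnr_region_fits(48, 16, 32, 16))  # False
```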

## aha test
Then we can run `aha test apps/gaussian`, which will run a VCS functional simulation of your application running on your CGRA Verilog. On the kiwi server, use `module load base vcs` to load VCS before running `aha test`.

You can optionally generate an FSDB waveform using the `--waveform` flag. This requires Verdi to be loaded: `module load verdi`. The waveform will be called `cgra.fsdb` and will be written to `/aha/garnet/tests/test_app`.

## aha sta
To run your mapped application through a critical-path timing model, we can use `aha sta apps/gaussian`. It prints the maximum frequency at which the CGRA array can run the application.

The `aha sta` command can also generate a visualization of the application's critical path using the `--visualize` (or `-v`) flag. It produces `./bin/pnr_result_{width}.png`. See `aha/aha/util/sta.py` for more details.

## aha regress
When testing architectural or compiler changes, it's often useful to automate the compilation and testing of many applications. `aha regress` is the command we use for this purpose. It iterates through a list of applications and runs `aha map`, `aha pnr`, and `aha test` on each. There are many regression application suites, each with different uses.


- `aha regress fast` will run the smallest sparse and smallest dense image processing application and is intended to run in minutes.
- `aha regress daily` tests more complex applications including gaussian, harris, camera pipeline, unsharp, and resnet. It generally takes hours.
- `aha regress full` tests every application and takes several hours to complete.


To see a list of all regress suites and the applications they include, see `aha/util/regress.py`.

## Environment Variables
To modify the compilation, scheduling, place-and-route, and pipelining of our applications, we use environment variables. Here is a brief description of the important variables that we use:

- `HALIDE_GEN_ARGS` determines scheduling parameters used in the `{app}_generator.cpp` described here: https://stanfordaha.github.io/aha-wiki-page/pages/03_h2h_files. A valid assignment to this variable is a space-separated list of the `GeneratorParam`s listed at the top of `{app}_generator.cpp`. For example, to run gaussian you could set `HALIDE_GEN_ARGS="mywidth=62 myunroll=2 schedule=3"`.
- `PIPELINED` determines whether or not compute pipelining is turned on. Valid values are `PIPELINED=1` (default, compute pipelining on) and `PIPELINED=0` (compute pipelining off).
- `DISABLE_GP` determines whether or not the global placement stage of place-and-route is run. `DISABLE_GP=1` (default) turns off global placement; `DISABLE_GP=0` turns it on.
- `HL_TARGET` has two valid values: `HL_TARGET=host-x86-64` and `HL_TARGET=host-x86-64-enable_ponds`. Use `HL_TARGET=host-x86-64-enable_ponds` to enable the memory mapper to use the register files (ponds) present in our PE tiles. By default, this is set to `HL_TARGET=host-x86-64` for image processing applications and `HL_TARGET=host-x86-64-enable_ponds` for machine learning applications.
- `PNR_PLACER_EXP` is a tuning knob for the placement stage of place-and-route. Generally, a higher value results in an application placement and routing that has a shorter critical path and may reduce the runtime of applications running on the array. By default, this variable is not set, and the place-and-route tool will choose the first value that routes successfully.
- `SWEEP_PNR_PLACER_EXP` will tell the place-and-route tool to try every value of `PNR_PLACER_EXP` from 1 to 30 and choose the result with the shortest critical path. By default it is not set.
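The sweep behavior can be sketched as follows. Here `run_pnr` is a hypothetical stand-in for a full place-and-route run that returns the critical-path length (or `None` on a failed route); it is not a real AHA function:

```python
def sweep_pnr_placer_exp(run_pnr, exponents=range(1, 31)):
    """Mimic SWEEP_PNR_PLACER_EXP: try each PNR_PLACER_EXP value and
    keep the result with the shortest critical path."""
    best_exp, best_cp = None, float("inf")
    for exp in exponents:
        cp = run_pnr(exp)  # critical-path length for this placer exponent
        if cp is not None and cp < best_cp:  # None models a failed route
            best_exp, best_cp = exp, cp
    return best_exp, best_cp

# Toy cost model whose critical path bottoms out at exponent 7.
best_exp, best_cp = sweep_pnr_placer_exp(lambda e: abs(e - 7) + 3)
print(best_exp, best_cp)  # 7 3
```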



To avoid needing to set each of these environment variables every time we run an application, we have included default values in `aha/util/application_parameters.json`. Any AHA flow command will use the `default` entry in the JSON file for the application you are compiling. If you would like to run the fastest (shortest critical path) version of the application, you can run:
`aha halide apps/gaussian --env-parameters fastest`

Some machine learning applications, including ResNet, have significantly different schedules depending on the layer of the network. In the AHA flow, we use one application with different environment parameters to choose which layer and which schedule to use. Default parameters for each layer are saved in `aha/util/application_parameters.json` and can be selected using the `--layer` flag. For example:

`aha map apps/resnet_output_stationary --layer conv1`

`aha pnr apps/resnet_output_stationary --width 32 --height 16 --layer conv1`
22 changes: 4 additions & 18 deletions _pages/03_h2h_files.md
@@ -5,24 +5,10 @@ date: 2022-01-03 (Date used for order should be 2022-01-XX.)
layout: post
---

Under each application folder, there are two files that describe the application in Halide.

`<AHA_path>/aha/Halide-to-Hardware/apps/hardware_benchmarks/apps/<app>/<app>_generator.cpp`

`<AHA_path>/aha/Halide-to-Hardware/apps/hardware_benchmarks/apps/<app>/process.cpp`

In `<app>_generator.cpp`, there are several knobs we can adjust to control how the images are streamed. The `schedule`, `mywidth`, and `myunroll` arguments are the ones we usually focus on. We can change the arguments either by setting `HALIDE_GEN_ARGS` (e.g. `export HALIDE_GEN_ARGS="mywidth=62 myunroll=2 schedule=3"` for gaussian) or by setting them in both `<app>_generator.cpp` and `process.cpp`. To quickly find `HALIDE_GEN_ARGS` values that work for dense apps, check `/aha/aha/util/regress.py`.

# Change the Application Unrolling
In order to change the utilization of a certain application, we can change its unrolling by setting `myunroll`. This is useful if we want to maximize utilization of the CGRA array, or to use less unrolling duplication to get a higher frequency. However, certain rules need to be honored when changing the unrolling.

1. `output width % unroll == 0`
2. `input bank width < 20 || input bank width % 4 == 0`
This is the old wiki, it has been decommissioned. The new wiki can be found here:
* [https://github.com/StanfordAHA/aha/wiki](https://github.com/StanfordAHA/aha/wiki)

The variables can be calculated as follows:
- The `output width` is the `mywidth` set in `<app>_generator.cpp` and `process.cpp`.
- `input bank width = ceiling(input width / unroll)`.
- The `input width` depends on which application you are running. For most dense applications (e.g. gaussian, harris_color, unsharp), `input width = output width + filter size - 1`. The `filter size` can be found in `<app>_generator.cpp`. Take gaussian, for example: `filter size = 3`.
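The rules and formulas above can be checked mechanically. A minimal sketch, with a made-up helper name for illustration, using `filter size = 3` for gaussian as stated in the text:

```python
import math

def unroll_is_valid(output_width, unroll, filter_size):
    """Check the two unrolling rules for a dense application."""
    input_width = output_width + filter_size - 1       # gaussian: 62 + 3 - 1 = 64
    input_bank_width = math.ceil(input_width / unroll)
    rule1 = output_width % unroll == 0                 # output width % unroll == 0
    rule2 = input_bank_width < 20 or input_bank_width % 4 == 0
    return rule1 and rule2

# gaussian with mywidth=62, myunroll=2: bank width = ceil(64 / 2) = 32, both rules hold.
print(unroll_is_valid(62, 2, 3))  # True
print(unroll_is_valid(62, 4, 3))  # False (62 % 4 != 0)
```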
The particular page you were looking for can be found here maybe:
* https://github.com/StanfordAHA/aha/wiki/H2H-Description

# Change the Tile Size
In most cases, the images are not streamed to the CGRA all at once; they are first divided into several tiles and then streamed tile by tile. When you run bitstream generation and the RTL simulation test, you only get the estimated frequency, cycle delays, and simulation result for one tile. We can change the tile size by adjusting `tileWidth` and `tileHeight` in `<app>_generator.cpp` for specific purposes. For example, to reduce the turnaround time of a CI test, we can set a smaller tile size by reducing `mywidth` in `HALIDE_GEN_ARGS`. To make sure the whole image is processed correctly, we can set the tile size to the whole image size to test the functionality of an application.
