TODO #1
- Enable launch in the hierarchical structure (e.g., nested scf loops).
- Enable pattern matching with pre-defined CGRA ops.
- Implement an integration test (on CPU), including:
- Lower the control part (e.g., for loops and the main function) to LLVM IR and execute it on CPU using lli.
- Can include the main func in the same file and run with mlir-cpu-runner: https://github.com/agostini01/llvm-project/blob/83470445ffdf7984c73a41da140c4e651ccad5a6/mlir/test/axi4mlir-runner/run-axi-v1-data-copy.mlir (no need)
- A main.cpp template to call the lowered mlir func.
- Replace the SODA-/CGRA-MLIR part/op in lli/mlir-cpu-runner with a dummy func, but with appropriate input/output data for the other lowered parts.
- Replace with simple template: https://gitlab.pnnl.gov/sodalite/soda-opt/-/blob/main/test/Dialect/SODA/host-generation.mlir
- Replace with user-defined func: agostini01/llvm-project@8347044#diff-1f7239455135ba129f8889b1d5567bcca9c8ac472d08532b5e2cb21ce80bb527
- Lowering to customized func call: https://github.com/agostini01/llvm-project/blob/39bb4786d2a844a91e4e534fc2bc5d69242fb8ee/mlir/lib/Conversion/LinalgToAXI4MLIR/LinalgToAXI4MLIR.cpp
- Generate llvm.call: https://github.com/tancheng/mlir-cgra/blob/main/dev/integration_test/script
- Make the pattern string/id a parameter for the fusion call?
- The generic ops that cannot be fused need to be offloaded into a single module with the id as the func name.
- Scripts to flatten the generic ops and lower them to LLVM IR for conventional CGRA mapping with the appropriate func call.
- A top-level pthread simulator that can mimic data communication, queue blocking, and synchronization, and report execution cycles.
- Determine the target models (just go with attention-based).
- Start with simple/manual model (onnx-mlir or pytorch-mlir).
- A Bert model to go through the entire flow.
- Support all the required basic ops.
- batch_matmul (by tiling).
- Should fusion take care of reduction?
- exp (by lowering).
- Enable a simple tiling strategy with the consideration of on-chip buffer size, double buffering, and DMA Gb/s.
- Refer to the linalg tiling pass so that each op can be tiled by a specific, user-provided, or strategy-generated factor, rather than applying the same factor across all the ops.
- Evaluation (compare with traditional CGRA).
- Make the DMA overhead and generic kernel execution latency accurate.
- Bert golden.
- Bert baseline.
- Bert cgra.
- Others.
Future work:
- Conv -> Gemm (Nico is working on this).
- Connect the dummy interface/CGRA call with OpenCGRA to enable end-to-end simulation.