TODO #1
- Enable launch in the hierarchical structure (e.g., nested scf loops).
- Enable pattern matching with pre-defined CGRA ops.
- Implement an integration test (on CPU), including:
- Lower the control part (e.g., for loops and the main function) to LLVM IR and execute it on CPU using lli.
- Can include the main func in the same file and run with mlir-cpu-runner: https://github.com/agostini01/llvm-project/blob/83470445ffdf7984c73a41da140c4e651ccad5a6/mlir/test/axi4mlir-runner/run-axi-v1-data-copy.mlir (no need)
- A main.cpp template to call the lowered mlir func.
- Replace the SODA-/CGRA-MLIR part/op in lli/mlir-cpu-runner with a dummy func, but with appropriate input/output data for the other lowered parts.
- Replace with simple template: https://gitlab.pnnl.gov/sodalite/soda-opt/-/blob/main/test/Dialect/SODA/host-generation.mlir
- Replace with user-defined func: agostini01/llvm-project@8347044#diff-1f7239455135ba129f8889b1d5567bcca9c8ac472d08532b5e2cb21ce80bb527
- Lowering to customized func call: https://github.com/agostini01/llvm-project/blob/39bb4786d2a844a91e4e534fc2bc5d69242fb8ee/mlir/lib/Conversion/LinalgToAXI4MLIR/LinalgToAXI4MLIR.cpp
- Generate llvm.call: https://github.com/tancheng/mlir-cgra/blob/main/dev/integration_test/script
- Make the pattern string/id a parameter for the fusion call?
- The generic ops that cannot be fused need to be offloaded into a single module with the id as the func name.
- Scripts to flatten the generic ops and lower them to LLVM IR for conventional CGRA mapping with the appropriate func call.
- A top-level pthread simulator that can mimic data communication, queue blocking, and synchronization, and report execution cycles.
- Determine the target models (just go with attention-based).
- Start with simple/manual model (onnx-mlir or pytorch-mlir).
- A Bert model to go through the entire flow.
- Support all the required basic ops.
- batch_matmul (by tiling).
- Should fusion take care of reduction?
- exp (by lowering).
- Enable a simple tiling strategy with the consideration of on-chip buffer size, double buffering, and DMA Gb/s.
- Refer to the linalg tiling pass so that each op can be tiled by a specific, user-provided, or strategy-generated factor, rather than applying the same factor across all the ops.
- Evaluation (compare with traditional CGRA).
- Make the DMA overhead and generic kernel execution latency accurate.
- Bert golden.
- Bert baseline.
- Bert cgra.
- Others.
Future work:
- Conv -> Gemm (Nico is working on this).
- Connect the dummy interface/CGRA call with OpenCGRA to enable end-to-end simulation.