
[vectorization] Apply tiling only if element types vectorizable #476

Merged
merged 4 commits into nod-ai:main on Jun 28, 2024

Conversation

Contributor

@newling newling commented Jun 27, 2024

Vectorization in iree-amd-aie consists of 2 passes:

  1. tile linalg.generic ops in all leading dimensions, so that batched matmuls, and matmuls which have been packed into higher dimensions with multiple reduction dimensions, are replaced by unbatched 'atomic' matmuls inside scf.for loops (a sketch of this follows the list)

  2. lower the 'atomic' matmuls to the vector dialect (vector.contract, plus vector dialect casts and copies).
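For concreteness, here is a minimal sketch of pass 1's effect. The shapes and function names are hypothetical illustrations, not the pass's literal output:

```mlir
// Before: a batched matmul; the leading (batch) dimension is a
// candidate for the tiling in pass 1.
func.func @before(%a: memref<4x32x16xbf16>, %b: memref<4x16x32xbf16>,
                  %c: memref<4x32x32xf32>) {
  linalg.batch_matmul
      ins(%a, %b : memref<4x32x16xbf16>, memref<4x16x32xbf16>)
      outs(%c : memref<4x32x32xf32>)
  return
}

// After: the batch dimension is tiled to size 1, leaving an 'atomic'
// matmul inside an scf.for loop, ready for pass 2.
func.func @after(%a: memref<4x32x16xbf16>, %b: memref<4x16x32xbf16>,
                 %c: memref<4x32x32xf32>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  scf.for %i = %c0 to %c4 step %c1 {
    %sa = memref.subview %a[%i, 0, 0] [1, 32, 16] [1, 1, 1]
        : memref<4x32x16xbf16> to memref<32x16xbf16, strided<[16, 1], offset: ?>>
    %sb = memref.subview %b[%i, 0, 0] [1, 16, 32] [1, 1, 1]
        : memref<4x16x32xbf16> to memref<16x32xbf16, strided<[32, 1], offset: ?>>
    %sc = memref.subview %c[%i, 0, 0] [1, 32, 32] [1, 1, 1]
        : memref<4x32x32xf32> to memref<32x32xf32, strided<[32, 1], offset: ?>>
    linalg.matmul
        ins(%sa, %sb : memref<32x16xbf16, strided<[16, 1], offset: ?>>,
                       memref<16x32xbf16, strided<[32, 1], offset: ?>>)
        outs(%sc : memref<32x32xf32, strided<[32, 1], offset: ?>>)
  }
  return
}
```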

Before this PR, pass 1 applied to all linalg.generics, irrespective of their element types. However, pass 2 only applies to linalg.generics whose operand element types have hardware support for vectorization on AIE. linalg.generics with other types, such as matmuls with i32 operands and result, are not converted to the vector dialect; they are later unrolled by the linalg-to-loops pass.
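For a vectorizable element type like bf16, the kind of IR pass 2 produces looks roughly like the following (again a hand-written sketch with hypothetical shapes, not actual pass output); for i32 operands no such rewrite happens and the op is left for linalg-to-loops to unroll:

```mlir
#lhs = affine_map<(m, n, k) -> (m, k)>
#rhs = affine_map<(m, n, k) -> (k, n)>
#out = affine_map<(m, n, k) -> (m, n)>

func.func @atomic_matmul_vectorized(%a: memref<32x16xbf16>,
                                    %b: memref<16x32xbf16>,
                                    %c: memref<32x32xf32>) {
  %c0 = arith.constant 0 : index
  %pad = arith.constant 0.0 : bf16
  %fpad = arith.constant 0.0 : f32
  // Read the operands and the accumulator into vectors.
  %va = vector.transfer_read %a[%c0, %c0], %pad {in_bounds = [true, true]}
      : memref<32x16xbf16>, vector<32x16xbf16>
  %vb = vector.transfer_read %b[%c0, %c0], %pad {in_bounds = [true, true]}
      : memref<16x32xbf16>, vector<16x32xbf16>
  %vc = vector.transfer_read %c[%c0, %c0], %fpad {in_bounds = [true, true]}
      : memref<32x32xf32>, vector<32x32xf32>
  // Promote the bf16 operands, then contract into the f32 accumulator.
  %wa = arith.extf %va : vector<32x16xbf16> to vector<32x16xf32>
  %wb = arith.extf %vb : vector<16x32xbf16> to vector<16x32xf32>
  %r = vector.contract {indexing_maps = [#lhs, #rhs, #out],
                        iterator_types = ["parallel", "parallel", "reduction"],
                        kind = #vector.kind<add>}
      %wa, %wb, %vc : vector<32x16xf32>, vector<16x32xf32> into vector<32x32xf32>
  vector.transfer_write %r, %c[%c0, %c0] {in_bounds = [true, true]}
      : vector<32x32xf32>, memref<32x32xf32>
  return
}
```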

So pass 1, before this PR, might transform linalg.generics in preparation for vectorization even though pass 2 then leaves them unvectorized. This is not incorrect, but it changes the IR unnecessarily. More importantly, there has been a request to make pass 1 more conservative and leave unvectorizable ops alone, to make a later objectFifo-related pass easier (@yzhang93 @Abhishek-Varma).

That's what this PR does: it makes the loop tiling apply only when the element types are vectorizable for AIE.

@newling newling changed the title from "Apply tiling only if vectorization likely" to "[vectorization] Apply tiling only if element types vectorizable" on Jun 27, 2024
Contributor

@yzhang93 yzhang93 left a comment


LGTM. Any idea what is failing?

Contributor Author

newling commented Jun 27, 2024

> LGTM. Any idea what is failing?

I've been trying to understand the failure for 30 minutes with no success; it's very mysterious to me.

Contributor Author

newling commented Jun 28, 2024


If this is supported in llvm-aie (peano, see Xilinx/llvm-aie#102), then I think this path will work.

Contributor

nirvedhmeshram commented Jun 28, 2024

> If this is supported in llvm-aie (peano, see Xilinx/llvm-aie#102), then I think this path will work.

I think we should still look into this from our side: why do we have <2 x s32> when we were expecting to generate scalar code? Do we know where that happens? If not, @newling, could you please produce the IR dump with the standard *-dump-ir-after-all flags? We may also need to pass the --print-after-all flag to opt over here, if it's not our MLIR lowering doing it.

Contributor Author

newling commented Jun 28, 2024

> Why do we have <2 x s32> when we were expecting to generate scalar code? Do we know where that happens?

I think it's because linalg-to-loops only scalarizes the reduction dimensions of the packed linalg.generic.

So insert-loops-for-vectorization, before this PR, was also scalarizing (i.e. tiling to size 1) the outer m and n dimensions. With this PR, that no longer happens. See below (left is before this PR, right is after this PR).

[image: IR dump comparison, before this PR (left) vs after this PR (right)]

Contributor Author

newling commented Jun 28, 2024


I take that back: it is being completely scalarized both before and after this PR. The only difference is that before, there is an intermediate memref.subview.
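To illustrate the difference, a sketch of the two load patterns with hypothetical shapes (not the actual IR from the dumps above):

```mlir
func.func @load_styles(%a: memref<32x16xi32>, %m: index, %k: index) -> (i32, i32) {
  %c0 = arith.constant 0 : index
  // Before this PR (sketch): the extra tiling left a unit-sized subview,
  // and the scalar load went through it.
  %sv = memref.subview %a[%m, %k] [1, 1] [1, 1]
      : memref<32x16xi32> to memref<1x1xi32, strided<[16, 1], offset: ?>>
  %v0 = memref.load %sv[%c0, %c0] : memref<1x1xi32, strided<[16, 1], offset: ?>>
  // After this PR (sketch): the same element is loaded directly.
  %v1 = memref.load %a[%m, %k] : memref<32x16xi32>
  return %v0, %v1 : i32, i32
}
```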

Contributor Author

newling commented Jun 28, 2024

One more data point: this issue only arises with matmul transpose. Transpose vs non-transpose below:

[image: IR dump comparison, matmul transpose vs plain matmul]

(The peano error arises in the matmul-transpose case, not the matmul case).
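For reference, the transpose case differs from the plain case only in the indexing map of the transposed operand. These are the standard matmul maps, written here from memory rather than copied from the dumps above, and whether A or B is the transposed operand depends on the test:

```mlir
// Plain matmul: C[m, n] += A[m, k] * B[k, n]
#lhs = affine_map<(m, n, k) -> (m, k)>
#rhs = affine_map<(m, n, k) -> (k, n)>
#out = affine_map<(m, n, k) -> (m, n)>

// Matmul with B transposed: C[m, n] += A[m, k] * B[n, k]
#rhs_t = affine_map<(m, n, k) -> (n, k)>
```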

Contributor Author

newling commented Jun 28, 2024

Just to summarize my view of the situation:

  • This is all for matmul with i32, which is not our priority (and shouldn't be a priority for peano either)
  • matmul with i32 should not require any vectorization-specific passes to compile end-to-end
  • Disabling --insert-loops-for-vectorization for i32 makes peano compilation fail
  • So by pure luck, the pass --insert-loops-for-vectorization puts the IR (matmul transpose) in a state which peano handles
  • With or without --insert-loops-for-vectorization, the IR is correctly scalarized by --linalg-to-loops

@newling newling force-pushed the only_tile_if_eltype_vectorizable branch from 82facf6 to 0fda88f on June 28, 2024 at 19:33
@@ -654,13 +669,6 @@ run_matmul_test \
   --acc_type "f32" \
   --m "128" --n "128" --k "2304" \

- run_matmul_test \
-   --name_prefix "packPeel_t_i32" \
-   --pipeline "pack-peel" \
Contributor Author


I've removed this test, as there is a test with the same shapes for bf16.

Comment on lines 510 to 512
# Note I'm not using the --expect_compile_failure flag here,
# as that would require all developers to use the same verion
# of peano, which we currently don't enforce.
Contributor


I don't quite understand this statement.
Nit typo: verion -> version

Contributor Author


If we use the --expect_compile_failure=1 trick here, then after peano has fixed the issue we'll need to do

  1. bump peano past the fix on CI, and for all developers
  2. change to --expect_compile_failure=0

at the same time. But we don't control the version of peano used from within iree-amd-aie (the way we do for iree and third-party repos), so this is basically impossible.

(....I think I'll just remove the comment!)

  run_matmul_test \
-   --name_prefix "transpose_int32" \
-   --lhs_rhs_type "i32" \
+   --name_prefix "transpose_i8_i32" \
Contributor


This test is successful? So it's only the case where the input and output types are both i32 that failed?

Contributor Author


Yup. i8 -> i32 is vectorized.

@newling newling merged commit c8d095b into nod-ai:main Jun 28, 2024
2 checks passed
@newling newling deleted the only_tile_if_eltype_vectorizable branch July 13, 2024 20:28