-
Notifications
You must be signed in to change notification settings - Fork 40
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[LV, VP]VP intrinsics support for the Loop Vectorizer
This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, These architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV, but instead of generating masked intrinsics for memory operations it generates VP intrinsics for loads/stores instructions. Other important part of this approach is how the Explicit Vector Length is computed. (We use active vector length and explicit vector length interchangeably; VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We consider the following three ways to compute the EVL parameter for the VP Intrinsics. - The simplest way is to use the VF as EVL and rely solely on the mask parameter to control predication. The mask parameter is the same as computed for current tail-folding implementation. - The second way is to insert instructions to compute `min(VF, trip_count - index)` for each vector iteration. - For architectures like RISC-V, which have special instruction to compute/set an explicit vector length, we also introduce an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. ===Tentative Development Roadmap=== * Use vp-intrinsics for all possible vector operations. That work has 2 possible implementations: 1. Introduce a new pass which transforms emitted vector instructions to vp intrinsics if the the loop was transformed to use predication for loads/stores. The advantage of this approach is that it does not require many changes in the loop vectorizer itself. The disadvantage is that it may require to copy some existing functionality from the loop vectorizer in a separate patch, have similar code in the different passes and perform the same analysis 2 times, at least. 2. Extend Loop Vectorizer using VectorBuildor and make it emit vp intrinsics automatically in presence of EVL value. The advantage is that it does not require a separate pass, thus it may reduce compile time. Plus, we can avoid code duplication. It requires some extra work in the LoopVectorizer to add VectorBuilder support and smart vector instructions/vp intrinsics emission. Also, to fully support Loop Vectorizer it will require adding a new PHI recipe to handle EVL on the previous iteration + extending several existing recipes with the new operands (depends on the design). * Switch to vp-intrinsics for memory operations for VLS and VLA vectorizations. Differential Revision: https://reviews.llvm.org/D99750
- Loading branch information
1 parent
17afa5b
commit 2c203bb
Showing
24 changed files
with
1,632 additions
and
32 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.