-
Notifications
You must be signed in to change notification settings - Fork 77
Wave Transform to generate SSA Exec mask manipulation instrs #789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: amd-feature/wave-transform
Are you sure you want to change the base?
Wave Transform to generate SSA Exec mask manipulation instrs #789
Conversation
|
Failed to trigger build: |
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
cdevadas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you done the clang-format? Felt like at some places the format wasn't good.
| /// getDomVRegDefInBasicBlock - Return the last machine instr that defines | ||
| /// the specified virtual register in the basic block, searching backwards | ||
| /// from instruction I (exclusive). Returns MBB.end() if no definition is found. | ||
| LLVM_ABI MachineBasicBlock::iterator getDomVRegDefInBasicBlock(Register Reg, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this should go to the MRI. This function is for a very specific usecase and used currently only for AMDGPU. SIRegisterInfo.cpp seems ok to me.
|
Failed to trigger build: |
Addressed in latest commit |
|
I have started reviewing the code change . In the meantime,
Actually, for those ll files, I would suggest that we first have a separate PR to add those run-line and check-result showing what the MIR look like right before wave-transform. Hopefully, those tests are easy to add, they can get merged before this PR. |
| Register Reg, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const { | ||
| if(I == MBB.begin()) return MBB.end(); | ||
| Register Reg, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const { | ||
| if (I == MBB.begin()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code can be simplified into a while (I != MBB.begin()) { .... } loop, right? also return a pointer of MachineInstr seems more straightforward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Turn it into a for/while loop and add this condition check as the loop terminator. I second that idea of returning a MachineInstr*. All the machinery of checking I != MBB.end() at the call-sites (after this function returns) can be simplified with a !MI.
|
Also please make sure all the code comments are up to date with the code changes. For example, any comment mentioning PHI node is likely out of date. Also I feel thtat we should clean up LaneMaskUtil code that does not really get used. For example, for our application, we always assume accumulating == true. If we are not going to maintain the code that assumes accumulating == false, we may want to delete them. I personally would prefer getting the code as simpler as possible |
I thought about that initially. Once this patch gets merged, the next patch will be to enable wave-transform by default, and that would cover all lit tests in the new pipeline. At the moment, most lit tests would break if wave transform is force-enabled as the original implementation depends on the SSAUpdater and introduces PHI nodes. However, it makes sense to add some selected tests to verify the new wave-transform changes.
Better to select some control-flow tests involving loops and if-else and stop-after wave-transform pass. @lalaniket8 can you identify some tests and pre-commit the new changes?
|
Should we also remove the SSAReconstructor class in AMDGPUWaveTransform.cpp since that is not needed anymore?
Yes, I think it a good idea to remove the Default mode and keep only Accumulating mode, it will simplify the code a lot. |
You can add the clean up in this PR itself. |
How about the second part of the SSAReconstructor that deals with the dominance relation between defs and their respective uses?" Keep it for now. Anyway, we disabled the SSAReconstructor.run() invocation for now. Let's see if there is any fixup needed later when we turn on the wave-transform pipeline by default. |
|
Failed to trigger build: |
|
Cleanup looks good to me. In terms of testing, I was suggesting that we first add run-line for those LL tests to STOP-BEFORE wave-transform in a separate PR, I expect that should works (not crashing). If we can add more tests for more control-flow situations, that would be even better. In this PR, we should try to turn those STOP-BEFORE into STOP-after. We got multiple people here to examine those test results to ensure correctness, which should be a good and healthy exercise. |
| // Iterate backwards from I (exclusive) to the beginning of the basic block | ||
| do { | ||
| --I; | ||
| if (I->modifiesRegister(Reg, getTargetRegisterInfo())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (I->modifiesRegister(Reg, getTargetRegisterInfo())) | |
| if (I->definesRegister(Reg, getTargetRegisterInfo())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you looking for those instructions only that fully define Reg use the above suggestion because the existing one also accounts for the case of partial definition!
| /// getDomVRegDefInBasicBlock - Return the last machine instr that defines | ||
| /// the specified virtual register in the basic block, searching backwards | ||
| /// from instruction I (exclusive). Returns MBB.end() if no definition is | ||
| /// found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment seems bit misleading, as the last instruction while searching backward implies the very first definition in the MBB. I think you can separate the two lines : the last instruction encountered in the MBB & you search in the backward manner from I.
| I--; | ||
| BuildMI(*B, I, {}, TII->get(LMU.getLaneMaskConsts().MovOpc), ACC) | ||
| .addImm(0); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you insert all resets one after another once you find the right place rather than searching for right insertion place for every accumulator to reset? Seems bit expensive!
Since we are moving Wave Transform to the middle of Register Allocation after PHI-elimination, the Exec Mask Manipulation instructions added to the code by Wave Transform should not be in SSA.
This PR contains code changes to support this.
We remove SSAUpdater originally used and used a single Accumulator Register to capture contributions from Thread-level CFG predecessors of a basic block. This Accumulator is used to set the appropriate EXEC mask. The Reset Flag Semantics of GCNLaneMaskUpdater is retained and used to reset the Accumulator at correct points in the code.