BPF: extend JSET optimisation to cover select IR pattern#2
BPF: extend JSET optimisation to cover select IR pattern#20xull wants to merge 1 commit intoblueshift-gg:upstream-gallery-21from
Conversation
| continue; | ||
|
|
||
| MachineInstr *AndMI = getPrevNonDebugInstr(BrMI); | ||
| // Scan back to find the AND defining DstReg, skipping non-defining |
There was a problem hiding this comment.
Nice! i feel this is how we're supposed to find AndMI. BUT now that we're skipping instructions in between, the operands could get touched in the middle, right?
so for example
%and = and %0, %1
%0 = changed_here
br %and
then now you're comparing the updated value of %0 after converting and + br to jset
There was a problem hiding this comment.
besides, could you put it in a lambda function like how getPrevNonDebugInstr was implemented?
There was a problem hiding this comment.
thank you, good catch. fixed it by adding a forward scan after finding the AND that bails if any intervening instruction clobbers the AND's source register. also wrapped it in a lambda.
There was a problem hiding this comment.
one more thing! BPF uses Wn registers as the low 32-bit register alias of Rn, so MO.getReg() would return different values for for example, w1 and r1, but they're actually using the same physical r1 when it lowers to BPF code
example having intervening instruction clobbers the AND's source register even after this change,
r1 &= r2
w1 = 0
if r1 != 0 goto bb
There was a problem hiding this comment.
thank you @clairechingching for the catch. replaced the direct register equality checks in both the backward scan and the forward clobber scan with TRI->regsOverlap() to handle Wn/Rn aliasing correctly. if you have any other approach, i'd love to know.
| : nullptr; | ||
| if (DefMI && DefMI->getParent() == BB && | ||
| (DefMI->getOpcode() == BPF::AND_rr || | ||
| DefMI->getOpcode() == BPF::AND_ri)) { |
There was a problem hiding this comment.
line 455, 456 explicitly skip and instruction if any of the operand is not a register, so we don't want to touch DefMI->getOpcode() == BPF::AND_ri, right?
There was a problem hiding this comment.
yes, you're right. i didn't foresee that, lines 455-456 bail if operand 1 is not a register, so AND_ri would never complete the fold anyway. removed it from the detection check.
There was a problem hiding this comment.
ackshually.jpg (haha) we shouldn't drop AND_ri. After checking again, the code i was referring to (now line 471) is actually checking whether LHS and first operand at RHS are both registers and referring to DstReg so for example,
%1 = and %1, %2 instead of %3 = and %1, %2
operand %2 can be either register or immediate
There was a problem hiding this comment.
my bad, misread the original comment. it is true, operand 2 can be register or immediate; added AND_ri back.
658c69e to
f7e356a
Compare
|
looking good! the last i'm hoping for is to have test to verify JSET being actually emitted. i looked around and think
thank you! good job! |
|
hello @clairechingching i've added the following MIR-level tests in |
b5d3188 to
e43af8f
Compare
This patch detects select(SETEQ, AND(a,b), 0, T, F) in EmitInstrWithCustomInserter and emits JNE_ri with swapped PHI operands instead of JEQ_ri. Then extends the existing foldBitTestBranchIntoJSet peephole to fold AND+JNE into JSET.
e43af8f to
0cba6d9
Compare
The existing
foldBitTestBranchIntoJSetpeephole folds AND + JNE dst, 0into JSET for the br IR path. When both branch outcomes are compile-time
constants, LLVM emits select instead of br. select lowers through
EmitInstrWithCustomInserterinto a diamond CFG with JEQ_ri (SETEQpolarity), which the peephole never matched.
This patch extends the optimisation to the select path with two changes:
BPFISelLowering.cpp: detect select(SETEQ, AND(a,b), 0, T, F) inEmitInstrWithCustomInserterand emit JNE_ri instead of JEQ_ri,swapping the PHI operands to compensate for the polarity inversion.
This feeds the existing peephole exactly the AND + JNE structure it
already handles.
BPFMIPeephole.cpp: replace the singlegetPrevNonDebugInstrcall witha backward scan that finds the AND defining DstReg regardless of how
many non-defining instructions (e.g. the MOV materialised by PHI
lowering) intervene between the AND and the JNE.
Result for select_jset(x: u64, m: u64) -> u64:
Before: and64 + mov64 + jeq + mov64 + exit (5 instructions)
After: mov64 + jset + mov64 + exit (4 instructions)
The AND is eliminated.