Skip to content

BPF: extend JSET optimisation to cover select IR pattern#2

Open
0xull wants to merge 1 commit intoblueshift-gg:upstream-gallery-21from
0xull:jset-select-pattern-optimisation
Open

BPF: extend JSET optimisation to cover select IR pattern#2
0xull wants to merge 1 commit intoblueshift-gg:upstream-gallery-21from
0xull:jset-select-pattern-optimisation

Conversation

@0xull
Copy link

@0xull 0xull commented Mar 13, 2026

The existing foldBitTestBranchIntoJSet peephole folds AND + JNE dst, 0
into JSET for the br IR path. When both branch outcomes are compile-time
constants, LLVM emits select instead of br. select lowers through
EmitInstrWithCustomInserter into a diamond CFG with JEQ_ri (SETEQ
polarity), which the peephole never matched.

This patch extends the optimisation to the select path with two changes:

  1. BPFISelLowering.cpp: detect select(SETEQ, AND(a,b), 0, T, F) in
    EmitInstrWithCustomInserter and emit JNE_ri instead of JEQ_ri,
    swapping the PHI operands to compensate for the polarity inversion.
    This feeds the existing peephole exactly the AND + JNE structure it
    already handles.

  2. BPFMIPeephole.cpp: replace the single getPrevNonDebugInstr call with
    a backward scan that finds the AND defining DstReg regardless of how
    many non-defining instructions (e.g. the MOV materialised by PHI
    lowering) intervene between the AND and the JNE.

Result for select_jset(x: u64, m: u64) -> u64:
Before: and64 + mov64 + jeq + mov64 + exit (5 instructions)
After: mov64 + jset + mov64 + exit (4 instructions)

The AND is eliminated.

continue;

MachineInstr *AndMI = getPrevNonDebugInstr(BrMI);
// Scan back to find the AND defining DstReg, skipping non-defining

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! i feel this is how we're supposed to find AndMI. BUT now that we're skipping instructions in between, the operands could get touched in the middle, right?

so for example

%and = and %0, %1
%0 = changed_here
br %and 

then now you're comparing the updated value of %0 after converting and + br to jset

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides, could you put it in a lambda function like how getPrevNonDebugInstr was implemented?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, good catch. fixed it by adding a forward scan after finding the AND that bails if any intervening instruction clobbers the AND's source register. also wrapped it in a lambda.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more thing! BPF uses Wn registers as the low 32-bit register alias of Rn, so MO.getReg() would return different values for for example, w1 and r1, but they're actually using the same physical r1 when it lowers to BPF code

example having intervening instruction clobbers the AND's source register even after this change,

r1 &= r2
w1 = 0
if r1 != 0 goto bb

Copy link
Author

@0xull 0xull Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @clairechingching for the catch. replaced the direct register equality checks in both the backward scan and the forward clobber scan with TRI->regsOverlap() to handle Wn/Rn aliasing correctly. if you have any other approach, i'd love to know.

: nullptr;
if (DefMI && DefMI->getParent() == BB &&
(DefMI->getOpcode() == BPF::AND_rr ||
DefMI->getOpcode() == BPF::AND_ri)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 455, 456 explicitly skip and instruction if any of the operand is not a register, so we don't want to touch DefMI->getOpcode() == BPF::AND_ri, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right. i didn't foresee that, lines 455-456 bail if operand 1 is not a register, so AND_ri would never complete the fold anyway. removed it from the detection check.

Copy link

@clairechingching clairechingching Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ackshually.jpg (haha) we shouldn't drop AND_ri. After checking again, the code i was referring to (now line 471) is actually checking whether LHS and first operand at RHS are both registers and referring to DstReg so for example,

%1 = and %1, %2 instead of %3 = and %1, %2

operand %2 can be either register or immediate

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, misread the original comment. it is true, operand 2 can be register or immediate; added AND_ri back.

@0xull 0xull force-pushed the jset-select-pattern-optimisation branch 2 times, most recently from 658c69e to f7e356a Compare March 17, 2026 14:18
@clairechingching
Copy link

looking good! the last i'm hoping for is to have test to verify JSET being actually emitted. i looked around and think llvm/test/CodeGen/MIR/ is the place to add. could you add a BPF/ folder and follow other existing examples to add few tests?

  1. jset_select.ll is actually emitting JSET instruction
  2. same code with jset_select.ll but with one operand of AND being an immediate value
  3. pattern with intervening instructions clobbering AND operands should not get turned into JSET

thank you! good job!

@0xull
Copy link
Author

0xull commented Mar 19, 2026

hello @clairechingching

i've added the following MIR-level tests in llvm/test/CodeGen/MIR/BPF/:
jset-select-rr.mir — AND_rr + JNE folds to JSET_rr
jset-select-ri.mir — AND_ri + JNE folds to JSET_ri
jset-select-clobber.mir — intervening clobber of AND source prevents fold

@0xull 0xull force-pushed the jset-select-pattern-optimisation branch 4 times, most recently from b5d3188 to e43af8f Compare March 23, 2026 04:15
This patch detects select(SETEQ, AND(a,b), 0, T, F) in EmitInstrWithCustomInserter
and emits JNE_ri with swapped PHI operands instead of JEQ_ri. Then extends the
existing foldBitTestBranchIntoJSet peephole to fold AND+JNE into JSET.
@0xull 0xull force-pushed the jset-select-pattern-optimisation branch from e43af8f to 0cba6d9 Compare March 23, 2026 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants