@kernel-patches-daemon-bpf

Pull request for series with
subject: powerpc64/bpf: Inline helper in powerpc JIT
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1030844

@kernel-patches-daemon-bpf
Author

Upstream branch: 835a507
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1030844
version: 3

@kernel-patches-review-bot

commit 558b423954774b682c4b993fd904da4cf8c9f549
Author: Saket Kumar Bhaskar <[email protected]>

powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU
addrs

This commit adds support for BPF_MOV64_PERCPU_REG instruction to the
powerpc64 JIT compiler, enabling per-CPU address resolution by loading
the per-CPU offset from paca and adding it to the source register.

> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 5e976730b2f5..d53e9cd7563f 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c

[ ... ]

> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 1fe37128c876..816f9d7d9e5d 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -918,6 +918,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
>  	case BPF_ALU | BPF_MOV | BPF_X: /* (u32) dst = src */
>  	case BPF_ALU64 | BPF_MOV | BPF_X: /* dst = src */
>
> +		if (insn_is_mov_percpu_addr(&insn[i])) {
> +			if (IS_ENABLED(CONFIG_SMP)) {
> +				EMIT(PPC_RAW_LD(tmp1_reg, _R13, offsetof(struct paca_struct, data_offset)));
> +				EMIT(PPC_RAW_ADD(dst_reg, src_reg, tmp1_reg));
> +			} else if (src_reg != dst_reg) {
> +				EMIT(PPC_RAW_MR(dst_reg, src_reg));
> +			}
> +		}
> +

Does this code need a break statement here? After handling the per-CPU
MOV instruction, execution falls through to the subsequent MOV handling
code. In the common case where dst_reg != src_reg with CONFIG_SMP
enabled, the code first emits ADD to compute the per-CPU address in
dst_reg, but then continues to the final else-if in this case statement
which emits MR dst_reg, src_reg, overwriting the computed address.

Other architectures (x86 and arm64) break out of their BPF_MOV case
right after handling the per-CPU instruction, so the generic MOV code
never runs for it.

>  		if (insn_is_cast_user(&insn[i])) {
>  			EMIT(PPC_RAW_RLDICL_DOT(tmp1_reg, src_reg, 0, 32));
>  			PPC_LI64(dst_reg, (ctx->user_vm_start & 0xffffffff00000000UL));
>  			PPC_BCC_SHORT(COND_EQ, (ctx->idx + 2) * 4);
>  			EMIT(PPC_RAW_OR(tmp1_reg, dst_reg, tmp1_reg));
>  			EMIT(PPC_RAW_MR(dst_reg, tmp1_reg));
>  			break;
>  		}
>
>  		if (imm == 1) {
>  			/* special mov32 for zext */
>  			EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 0, 0, 31));
>  			break;
>  		} else if (off == 8) {
>  			EMIT(PPC_RAW_EXTSB(dst_reg, src_reg));
>  		} else if (off == 16) {
>  			EMIT(PPC_RAW_EXTSH(dst_reg, src_reg));
>  		} else if (off == 32) {
>  			EMIT(PPC_RAW_EXTSW(dst_reg, src_reg));
>  		} else if (dst_reg != src_reg)
                        ^^^^
			This condition can be true for per-CPU instructions
			(which have off == -1), causing the computed per-CPU
			address to be overwritten with src_reg.

>  			EMIT(PPC_RAW_MR(dst_reg, src_reg));
>  		goto bpf_alu32_trunc;
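
The fall-through described in this review can be modelled outside the kernel. The following is a minimal sketch, assuming CONFIG_SMP is enabled; `struct emit_log`, `log_op()` and `build_percpu_mov()` are hypothetical stand-ins that only record which instruction would be emitted, and are not part of the powerpc JIT.

```c
#include <stdbool.h>

/* Hypothetical mnemonics recorded in place of real instruction words. */
enum op { OP_LD, OP_ADD, OP_MR };

#define MAX_OPS 8

struct emit_log {
	enum op ops[MAX_OPS];
	int n;
};

void log_op(struct emit_log *log, enum op op)
{
	if (log->n < MAX_OPS)
		log->ops[log->n++] = op;
}

/*
 * Models only the control flow of the BPF_MOV | BPF_X case for a per-CPU
 * MOV on an SMP build. leave_case_early selects whether the per-CPU branch
 * leaves the case before the generic MOV tail runs.
 */
void build_percpu_mov(struct emit_log *log, int dst_reg, int src_reg,
		      bool leave_case_early)
{
	log_op(log, OP_LD);	/* tmp1 = per-CPU offset from paca */
	log_op(log, OP_ADD);	/* dst = src + tmp1 */
	if (leave_case_early)
		return;

	/* Generic tail: off is -1 for a per-CPU MOV, so only this test applies. */
	if (dst_reg != src_reg)
		log_op(log, OP_MR);	/* dst = src, discarding the ADD result */
}
```

With distinct registers and no early exit, the last recorded operation is the register move, which is the overwrite the review points at. With the early exit, the sequence ends at the add.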



AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19964922727

@kernel-patches-daemon-bpf
Author

Upstream branch: 81f88f6
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1030844
version: 3

@kernel-patches-daemon-bpf
Author

Upstream branch: 5d9fb42
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1030844
version: 3

Saket Kumar Bhaskar added 2 commits December 5, 2025 17:00
…PU addrs

With the introduction of commit 7bdbf74 ("bpf: add special
internal-only MOV instruction to resolve per-CPU addrs"),
a new BPF instruction BPF_MOV64_PERCPU_REG has been added to
resolve absolute addresses of per-CPU data from their per-CPU
offsets. This update requires enabling support for this
instruction in the powerpc JIT compiler.

As of commit 7a0268f ("[PATCH] powerpc/64: per cpu data
optimisations"), the per-CPU data offset for the CPU is stored in
the paca.

To support this BPF instruction in the powerpc JIT, the following
powerpc instructions are emitted:
	if (IS_ENABLED(CONFIG_SMP))
		ld tmp1_reg, 48(13)		// Load the per-CPU data offset from paca (r13) into tmp1_reg.
		add dst_reg, src_reg, tmp1_reg	// dst_reg = src_reg + per-CPU data offset.
	else if (src_reg != dst_reg)
		mr dst_reg, src_reg		// Move src_reg to dst_reg if they differ.
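
What the emitted sequence computes can be sketched in plain C. This is an illustration only: `struct mock_paca` and `resolve_percpu_addr()` are hypothetical names, and the mock structure carries just the one field the sequence reads, not the kernel's real paca layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the paca; only the field used here. */
struct mock_paca {
	uint64_t data_offset;	/* offset of this CPU's per-CPU area */
};

/*
 * Mirrors the semantics of the emitted instructions:
 *   SMP:     ld tmp1, data_offset(r13); add dst, src, tmp1
 *   non-SMP: mr dst, src (nothing at all when dst == src)
 */
uint64_t resolve_percpu_addr(const struct mock_paca *paca, uint64_t src,
			     bool smp)
{
	if (smp)
		return src + paca->data_offset;
	return src;
}
```

The same per-CPU offset therefore resolves to a different absolute address on each CPU, because each CPU's paca holds its own data offset.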

To evaluate the performance improvements introduced by this change,
the benchmark described in [1] was employed.

Before Change:
glob-arr-inc   :   41.580 ± 0.034M/s
arr-inc        :   39.592 ± 0.055M/s
hash-inc       :   25.873 ± 0.012M/s

After Change:
glob-arr-inc   :   42.024 ± 0.049M/s
arr-inc        :   55.447 ± 0.031M/s
hash-inc       :   26.565 ± 0.014M/s

[1] anakryiko/linux@8dec900975ef

Reviewed-by: Puranjay Mohan <[email protected]>
Signed-off-by: Saket Kumar Bhaskar <[email protected]>
…task/_btf()

Inline the calls to bpf_get_smp_processor_id() and bpf_get_current_task/_btf()
in the powerpc BPF JIT.

powerpc stores the logical processor number (paca_index) and a pointer
to the current task (__current) in the paca.

Here is how the powerpc JITed assembly changes after this commit:

Before:

cpu = bpf_get_smp_processor_id();

addis 12, 2, -517
addi 12, 12, -29456
mtctr 12
bctrl
mr	8, 3

After:

cpu = bpf_get_smp_processor_id();

lhz 8, 8(13)
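
To make the single replacement instruction concrete, here is a small sketch that builds its 32-bit encoding. `enc_lhz()` is a hypothetical helper written from the Power ISA D-form layout (primary opcode 40 for lhz); it is not the kernel's PPC_RAW_* macro, and the register numbers and displacement are simply the ones shown in the disassembly above.

```c
#include <stdint.h>

/*
 * Hypothetical encoder for "lhz RT, D(RA)" (load halfword and zero).
 * D-form layout: opcode in bits 0-5, RT in 6-10, RA in 11-15, D in 16-31.
 */
uint32_t enc_lhz(unsigned int rt, unsigned int ra, int16_t d)
{
	return (40u << 26) |		/* primary opcode for lhz */
	       ((rt & 0x1fu) << 21) |	/* destination register */
	       ((ra & 0x1fu) << 16) |	/* base register (r13 holds the paca) */
	       (uint16_t)d;		/* signed 16-bit displacement */
}
```

For example, `enc_lhz(8, 13, 8)` yields `0xA10D0008`: one load relative to r13 in place of the five-instruction helper call sequence shown under "Before".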

To evaluate the performance improvements introduced by this change,
the benchmark described in [1] was employed.

+---------------+-------------------+-------------------+--------------+
|      Name     |      Before       |        After      |   % change   |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 40.701 ± 0.008M/s | 55.207 ± 0.021M/s |   + 35.64%   |
| arr-inc       | 39.401 ± 0.007M/s | 56.275 ± 0.023M/s |   + 42.83%   |
| hash-inc      | 24.944 ± 0.004M/s | 26.212 ± 0.003M/s |   +  5.08%   |
+---------------+-------------------+-------------------+--------------+

[1] anakryiko/linux@8dec900975ef

Reviewed-by: Puranjay Mohan <[email protected]>
Signed-off-by: Saket Kumar Bhaskar <[email protected]>