polkavm-compiler: Uninline division and remainder instructions #335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

aman4150 wants to merge 1 commit into paritytech:master from aman4150:division-subroutine

Collaborator

aman4150 commented Sep 16, 2025 •

edited

Loading

Instead of compiling individual division and remainder instructions inline, we now compile them as calls to trampoline functions. This will reduce the maximum number of compiled instruction per PolkaVM instruction.

We are doing this because division and remainder instructions are quite rarely used in practice, and inlining them bloats the compiled code size significantly.

Additionally, we have made division and remainder operations branchless.

For easy review, this is the final code for unsigned division operation:

push rax
push rdx
push rbx
mov rax, [rsp+0x20]
xor ebx, ebx
test rcx, rcx
sete bl
cmove rcx, rbx
xor edx, edx
div rcx
neg rbx
cmovne rax, rbx
mov rcx, rax
pop rbx
pop rdx
pop rax
ret

Fixes: #169


          polkavm-compiler: Uninline division and remainder instructions

Instead of compiling individual division and remainder instructions inline,
we now compile them as calls to trampoline functions. This will reduce
the maximum number of compiled instruction per PolkaVM instruction.

We are doing this because division and remainder instructions are quite
rarely used in practice, and inlining them bloats the compiled code size
significantly.

Additionally, we have made division and remainder operations branchless.

Signed-off-by: Aman <[email protected]>

aman4150 requested a review from koute

September 16, 2025 09:00

aman4150 mentioned this pull request

Uninline the division in the AMD64 backend #169

Open

koute reviewed

View reviewed changes

crates/polkavm/src/compiler/amd64.rs

Comment on lines +1791 to +1795

    
                      self.push(mov(RegSize::R64, TMP_REG, conv_reg(s2)));

                      self.push(push(conv_reg(s1)));

                      self.call_to_label(label);

                      self.push(pop(conv_reg(d)));

                      self.push(mov(RegSize::R64, conv_reg(d), TMP_REG));

Collaborator

koute Sep 23, 2025

Now that all of these are constant-length, convert all of these to use asm.reserve (which makes codegen more efficient as it removes bounds checks from every push call).

crates/polkavm/src/compiler/amd64.rs

Comment on lines +1782 to +1783

    
                      self.push(pop(conv_reg(d)));

                      self.push(mov(RegSize::R64, conv_reg(d), TMP_REG));

Collaborator

koute Sep 23, 2025

This looks weird. You're popping a value from the stack into d, and then you overwrite d with TMP_REG. Why? Is it only because you need to reverse the push you did before the call? In which case why don't we store the argument in vmctx? That will make the pop unnecessary, saving us one instruction per div.

Collaborator Author

aman4150 Sep 23, 2025

I did float this idea during our discussion but later I realized that this approach could be faster and smaller (total bytes). I can compare these two approaches on the test bench.

crates/polkavm/src/compiler/amd64.rs

Comment on lines +735 to +738

    
                          match reg_size {

                              RegSize::R32 => self.push(mov_imm(rdx, imm32(cast(i32::MIN).to_unsigned()))),

                              RegSize::R64 => self.push(mov_imm64(rdx, cast(i64::MIN).to_unsigned())),

                          }

Collaborator

koute Sep 23, 2025

Use mov_imm + imm64 instead of mov_imm64.

crates/polkavm/src/compiler/amd64.rs

Comment on lines +744 to +750

    
                          match reg_size {

                              RegSize::R32 => self.push(cmp((TMP_REG, imm32(cast(-1_i32).to_unsigned())))),

                              RegSize::R64 => self.push(cmp((TMP_REG, imm64(-1_i32)))),

                          }

                          self.push(xor((RegSize::R32, r12, r12)));

                          self.push(setcc(Condition::Equal, r12));

Collaborator

koute Sep 23, 2025

I'm confused. You're using cmp to compare TMP_REG with -1, but then you're throwing away the result of this comparison by overwriting the flag with xor?

Collaborator Author

aman4150 Sep 23, 2025

This is definitely incorrect. Not sure how it passed all the tests - I'll fix it and add a test for the same.

crates/polkavm/src/compiler/amd64.rs

    
                  }

                  pub(crate) fn emit_divrem_trampoline(&mut self) {

                      log::trace!("Emitting trampoline: divrem");

Collaborator

koute Sep 23, 2025

Make each trampoline have its own print (so that it's easy to see in the logs the exact assembly emitted for each)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet