-
Notifications
You must be signed in to change notification settings - Fork 74
polkavm-compiler: Uninline division and remainder instructions #335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Instead of compiling individual division and remainder instructions inline, we now compile them as calls to trampoline functions. This will reduce the maximum number of compiled instruction per PolkaVM instruction. We are doing this because division and remainder instructions are quite rarely used in practice, and inlining them bloats the compiled code size significantly. Additionally, we have made division and remainder operations branchless. Signed-off-by: Aman <[email protected]>
self.push(mov(RegSize::R64, TMP_REG, conv_reg(s2))); | ||
self.push(push(conv_reg(s1))); | ||
self.call_to_label(label); | ||
self.push(pop(conv_reg(d))); | ||
self.push(mov(RegSize::R64, conv_reg(d), TMP_REG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that all of these are constant-length, convert all of these to use asm.reserve
(which makes codegen more efficient as it removes bounds checks from every push
call).
self.push(pop(conv_reg(d))); | ||
self.push(mov(RegSize::R64, conv_reg(d), TMP_REG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird. You're popping a value from the stack into d
, and then you overwrite d
with TMP_REG
. Why? Is it only because you need to reverse the push
you did before the call
? In which case why don't we store the argument in vmctx
? That will make the pop
unnecessary, saving us one instruction per div.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did float this idea during our discussion but later I realized that this approach could be faster and smaller (total bytes). I can compare these two approaches on the test bench.
match reg_size { | ||
RegSize::R32 => self.push(mov_imm(rdx, imm32(cast(i32::MIN).to_unsigned()))), | ||
RegSize::R64 => self.push(mov_imm64(rdx, cast(i64::MIN).to_unsigned())), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use mov_imm
+ imm64
instead of mov_imm64
.
match reg_size { | ||
RegSize::R32 => self.push(cmp((TMP_REG, imm32(cast(-1_i32).to_unsigned())))), | ||
RegSize::R64 => self.push(cmp((TMP_REG, imm64(-1_i32)))), | ||
} | ||
|
||
self.push(xor((RegSize::R32, r12, r12))); | ||
self.push(setcc(Condition::Equal, r12)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. You're using cmp
to compare TMP_REG
with -1
, but then you're throwing away the result of this comparison by overwriting the flag with xor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely incorrect. Not sure how it passed all the tests - I'll fix it and add a test for the same.
} | ||
|
||
pub(crate) fn emit_divrem_trampoline(&mut self) { | ||
log::trace!("Emitting trampoline: divrem"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make each trampoline have its own print (so that it's easy to see in the logs the exact assembly emitted for each)
Instead of compiling individual division and remainder instructions inline, we now compile them as calls to trampoline functions. This will reduce the maximum number of compiled instruction per PolkaVM instruction.
We are doing this because division and remainder instructions are quite rarely used in practice, and inlining them bloats the compiled code size significantly.
Additionally, we have made division and remainder operations branchless.
For easy review, this is the final code for unsigned division operation:
Fixes: #169