Used bit packing for evaluating ARM condition codes instead of a switch-case #2

Open: wheremyfoodat wants to merge 2 commits into master


wheremyfoodat

This PR gains a couple of FPS in Pokemon Emerald in my VM, depending on the scene. Tell me if you get any benefit from it.

Whether an ARM condition passes depends on 2 factors:

  • The condition code (4 bits)
  • The upper 4 CPSR bits (the N, Z, C and V flags)

This means that you can use a 256-entry truth table indexed by those 8 bits combined, instead of a switch-case, which would probably compile to an array lookup + indirect jump.
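
As a rough illustration of the 256-entry table idea, here is a minimal C sketch (C is used only for illustration; it is not the code from this PR). It assumes the flag nibble is `CPSR >> 28` with N in bit 3, Z in bit 2, C in bit 1 and V in bit 0, and that condition 0xF is treated as "never". The names `cond_reference`, `truth_table`, `build_truth_table` and `cond_passes` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference predicate: the switch-case a table lookup would replace.
   flags is the top nibble of CPSR: N = bit 3, Z = bit 2, C = bit 1, V = bit 0. */
static bool cond_reference(unsigned cond, unsigned flags)
{
    bool n = (flags & 8) != 0, z = (flags & 4) != 0;
    bool c = (flags & 2) != 0, v = (flags & 1) != 0;

    switch (cond & 0xF) {
    case 0x0: return z;              /* EQ */
    case 0x1: return !z;             /* NE */
    case 0x2: return c;              /* CS */
    case 0x3: return !c;             /* CC */
    case 0x4: return n;              /* MI */
    case 0x5: return !n;             /* PL */
    case 0x6: return v;              /* VS */
    case 0x7: return !v;             /* VC */
    case 0x8: return c && !z;        /* HI */
    case 0x9: return !c || z;        /* LS */
    case 0xA: return n == v;         /* GE */
    case 0xB: return n != v;         /* LT */
    case 0xC: return !z && n == v;   /* GT */
    case 0xD: return z || n != v;    /* LE */
    case 0xE: return true;           /* AL */
    default:  return false;          /* 0xF: assumed "never" in this sketch */
    }
}

/* One byte per (condition, flags) pair: 16 * 16 = 256 entries. */
static uint8_t truth_table[256];

/* Fill the table once at startup from the reference predicate. */
static void build_truth_table(void)
{
    for (unsigned i = 0; i < 256; i++)
        truth_table[i] = cond_reference(i >> 4, i & 0xF);
}

/* Index = condition code (opcode bits 31-28) in the high nibble,
   CPSR flags (CPSR bits 31-28) in the low nibble. */
static bool cond_passes(uint32_t opcode, uint32_t cpsr)
{
    unsigned index = ((opcode >> 28) << 4) | (cpsr >> 28);
    return truth_table[index] != 0;
}
```

`build_truth_table()` has to run once before the first `cond_passes()` call; after that, each check is a single byte load.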

LUTs generally aren't great for the cache, so they shouldn't be abused too much. So here's a neat bit-packing trick that originates from melonDS's ARM interpreter: instead of a switch-case, it uses a packed 32-byte LUT (16 masks of 2 bytes each), indexed by the condition code, to check whether a condition is true. The 16 masks are magic numbers that get ANDed with (1 << CPSR_FLAGS), where CPSR_FLAGS is the upper 4 CPSR bits. The masks are constructed so that masks[conditionCode] & (1 << CPSR_FLAGS) is non-zero if the condition is met, and 0 if not. This way, you can

  • Minimize the dcache overhead of a 256-byte truth table by tightly packing it (32 bytes are fewer than 256 :p)
  • Avoid the switch-case
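
The packed-mask lookup can be sketched in C roughly as follows. This is an illustration under assumptions, not the PR's code: the flag nibble is taken as `CPSR >> 28` (N = bit 3, Z = bit 2, C = bit 1, V = bit 0), condition 0xF is treated as "never", and the mask values were derived by hand from the condition definitions under that layout. The names `cond_masks` and `cond_passes` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit f of cond_masks[cond] is 1 when condition `cond` holds for the
   flag nibble f = CPSR >> 28 (N = bit 3, Z = bit 2, C = bit 1, V = bit 0).
   16 entries * 2 bytes = 32 bytes. */
static const uint16_t cond_masks[16] = {
    0xF0F0, /* EQ: Z            */
    0x0F0F, /* NE: !Z           */
    0xCCCC, /* CS: C            */
    0x3333, /* CC: !C           */
    0xFF00, /* MI: N            */
    0x00FF, /* PL: !N           */
    0xAAAA, /* VS: V            */
    0x5555, /* VC: !V           */
    0x0C0C, /* HI: C && !Z      */
    0xF3F3, /* LS: !C || Z      */
    0xAA55, /* GE: N == V       */
    0x55AA, /* LT: N != V       */
    0x0A05, /* GT: !Z && N == V */
    0xF5FA, /* LE: Z || N != V  */
    0xFFFF, /* AL: always       */
    0x0000  /* 0xF: assumed "never" in this sketch */
};

/* Select the mask by condition code, then test the single bit
   that corresponds to the current flag combination. */
static inline bool cond_passes(uint32_t opcode, uint32_t cpsr)
{
    return (cond_masks[opcode >> 28] & (1u << (cpsr >> 28))) != 0;
}
```

Each mask is simply one 16-entry row of the 256-entry truth table squeezed into 16 bits, so the check becomes one 2-byte load, a shift and an AND.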

I tested with Pokemon Emerald to make sure it works, and with arm.gba, which still passes.
I tried ARMWrestler too, but I couldn't find the start button. It boots, though.
Tell me what you think when you can.

@mattrberry
Owner

Hey thanks so much for submitting this! This is a tricky little change that I don't think I would have thought of haha. However, I just pulled down the changes locally and compared to the current FPS I'm seeing, and I wasn't actually able to see any improvement in Emerald. In fact, I'm seeing ~2 FPS lower on Golden Sun on average across a few runs. I don't really understand why it would be slower for me, since logically it seems like it should just be an improvement. You were able to see an FPS gain though?

@wheremyfoodat
Author

Yeah though nothing too groundbreaking.
Oh well :(

@ITotalJustice

Somewhat related to this issue: I think an easy / free optimisation is to check whether the cond is AL (0xE) first. If so, continue; otherwise, use the LUT (or switch). In the vast majority of cases the cond is going to be AL, so the switch / LUT won't be hit.

If Crystal supports marking branches as likely/unlikely, you can label that if (cond == AL) as likely.
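
For illustration, the AL fast path could look roughly like this in C. This is a sketch, not the project's Crystal code: `LIKELY` wraps the GCC/Clang `__builtin_expect` builtin and falls back to a plain expression elsewhere, and whether Crystal offers an equivalent hint is not something this sketch assumes. `COND_AL`, `cond_masks` and `cond_passes_fast` are hypothetical names, and the mask table uses the same assumed flag layout (N = bit 3, Z = bit 2, C = bit 1, V = bit 0 of `CPSR >> 28`).

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch hint: a GCC/Clang builtin, with a no-op fallback for other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

#define COND_AL 0xEu

/* Packed condition masks, one bit per flag combination (see lead-in). */
static const uint16_t cond_masks[16] = {
    0xF0F0, 0x0F0F, 0xCCCC, 0x3333, /* EQ NE CS CC */
    0xFF00, 0x00FF, 0xAAAA, 0x5555, /* MI PL VS VC */
    0x0C0C, 0xF3F3, 0xAA55, 0x55AA, /* HI LS GE LT */
    0x0A05, 0xF5FA, 0xFFFF, 0x0000  /* GT LE AL, 0xF assumed "never" */
};

static inline bool cond_passes_fast(uint32_t opcode, uint32_t cpsr)
{
    unsigned cond = opcode >> 28;

    /* Most instructions are unconditional, so skip the table for AL. */
    if (LIKELY(cond == COND_AL))
        return true;

    /* Everything else falls back to the mask lookup. */
    return (cond_masks[cond] & (1u << (cpsr >> 28))) != 0;
}
```

Whether the early-out helps depends on how well the branch predictor already handles the lookup path, so it is worth measuring rather than assuming a gain.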

@mattrberry
Owner

@ITotalJustice Thanks for the idea! Tested in 8d9c789, although I didn't see any noticeable improvement in the few games I tested
