double-width load/store for performance #108
Comments
I think we’ve always been leery of the opcode footprint :-). In similar discussions in the past, though, the thinking has been that if you want to strip tags when implementing something memcpy()-like, for example, you’d remove the capability load permission from the base capability, so that you use the same instruction but get the untagged value. This is pretty reasonable for a memcpy()-like routine, where the cost of the register-to-register move is negligible and you don’t have to worry about squashing the original capability; for something more granular, though, that might not be desirable.
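To make the permission-stripping idea concrete, here is a minimal Python model, assuming a simplified capability that carries a set of named permissions. The permission names `load` and `load_cap` and the helpers below are hypothetical, not real CHERI encodings. Clearing the load-cap permission on the authorizing capability makes the same load return the same bits with the tag cleared.

```python
from dataclasses import dataclass, replace

# Hypothetical permission names for this model only; real CHERI encodings differ.
PERM_LOAD = "load"
PERM_LOAD_CAP = "load_cap"


@dataclass(frozen=True)
class Cap:
    perms: frozenset  # permissions granted by the authorizing capability


@dataclass(frozen=True)
class Word:
    value: int  # capability-width bit pattern
    tag: bool   # validity tag held alongside the bits


def load_cap(auth: Cap, mem: dict, addr: int) -> Word:
    """Model of a capability-width load: the bits are always returned,
    but the tag survives only if `auth` carries load-cap permission."""
    if PERM_LOAD not in auth.perms:
        raise PermissionError("load permission missing")
    w = mem[addr]
    return Word(w.value, w.tag and PERM_LOAD_CAP in auth.perms)


def without_load_cap(auth: Cap) -> Cap:
    """Derive a copy-loop authority with load-cap permission removed."""
    return replace(auth, perms=auth.perms - {PERM_LOAD_CAP})


mem = {0: Word(0x1234, True)}
full = Cap(frozenset({PERM_LOAD, PERM_LOAD_CAP}))
print(load_cap(full, mem, 0))                    # tag preserved
print(load_cap(without_load_cap(full), mem, 0))  # same bits, tag cleared
```

The permission is removed once, up front, which is why the approach suits a bulk copy: the per-iteration instruction stays the same.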
Yes, stripping tags on the loads certainly works, but what about storing the data back? I'm not keen on checking the tag on a store and setting the PTE CD bit only if a tag is actually set, as this is a data-dependent exception. This would work better if we adopt the suggestion in #73, where a capability store without store-cap permission just stores a zero tag. Then the PTE CD bit doesn't need to be updated in a data-dependent way; it simply isn't updated if the permission is missing. So the proposal would be:

[Table: CW, CD, Behavior]

I think that gives an elegant solution without the extra encoding space. Additionally, we can say that if the load_cap/store_cap permission isn't set, an implementation may relax the alignment constraints on load/store cap. In effect, the permissions switch these instructions between load/store-double-width and load/store-cap. Then we meet all the objectives of my original request. What do you think?
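A minimal sketch of the store side of this proposal, assuming the #73 rule that a capability store without store-cap permission writes a zero tag. The `store_cap` helper, the `Pte` class and the 16-byte capability size are hypothetical, and every other check (plain store permission, bounds, CW) is omitted. What it illustrates is that whether CD is set depends only on the permission, never on the stored value's tag, and that the alignment check can be dropped when the permission is absent.

```python
from dataclasses import dataclass

CAP_BYTES = 16  # assumption: 128-bit capabilities (XLEN = 64)


@dataclass
class Pte:
    cd: bool = False  # capability-dirty bit of the page


@dataclass(frozen=True)
class Word:
    value: int
    tag: bool


def store_cap(has_store_cap_perm: bool, pte: Pte, mem: dict, addr: int, data: Word) -> None:
    """Proposed semantics: the permission, not data.tag, decides whether a
    tag is written and whether the page's CD bit is set."""
    if has_store_cap_perm:
        if addr % CAP_BYTES:
            raise ValueError("misaligned capability store")
        pte.cd = True  # set whenever the permission is present, tagged or not
        mem[addr] = data  # tag stored as-is
    else:
        # Behaves as a plain double-width store: zero tag, CD untouched,
        # and (per the proposal) no alignment requirement.
        mem[addr] = Word(data.value, False)


pte, mem = Pte(), {}
store_cap(False, pte, mem, 8, Word(7, True))   # misaligned is fine without the permission
print(mem[8], pte.cd)                          # tag zeroed, page still clean
store_cap(True, pte, mem, 16, Word(9, False))
print(mem[16], pte.cd)                         # page dirtied even though the value is untagged
```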
A long long time ago I proposed adding a […] It remains an open question how much dirtying on any tag-capable store would hurt, for example, revocation's cap-dirty tracking. [0] While making stores through !permit-store-cap not tag-capable (and so zeroing tags and not dirtying pages) could further inform or side-step the dependence, I suspect that in practice nearly every […] Also, I suspect there's mileage to be gotten by making the dependence on the value conditional on the TLB result; that is, if the translation comes back as being stores-permissive, there's no need to wait for the data before deciding that the instruction cannot raise a fault here. A plausible middle ground would be to treat tag-capable stores to ...
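The TLB-conditional idea can be sketched as a three-way decision. This rests on a simplifying assumption not spelled out in the thread: a store through a translation that is not stores-permissive faults exactly when the stored value is tagged. The function name is hypothetical, and `None` stands for "the data has not arrived yet".

```python
from typing import Optional


def store_fault_decision(pte_permits_cap_store: bool, data_tag: Optional[bool]) -> Optional[bool]:
    """Return True (fault), False (no fault), or None (must wait for the data).

    data_tag is None while the store data is still unavailable.
    """
    if pte_permits_cap_store:
        return False  # resolved from the TLB result alone; no data dependence
    if data_tag is None:
        return None   # outcome depends on the value, so the store must wait
    return data_tag   # fault only if a tagged value is being stored
```

Only the non-permissive case carries the data dependence, so a pipeline could retire the common case without waiting on the store data.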
[0] There are some other uses of CW/CD in the system, but not much that I think is essential; the primary consumer is revocation. And for revocation we'd really rather do something (anything!) else than have this be associated with virtual pages, because aliases are just a nightmare. But that's a big ask.
This immediately breaks memcpy'ing to an mmap'ed file.
That assumes that we continue to give out permit-cap-store caps to mmap()ed files, doesn't it? If we're revising semantics here, that could perhaps be changed?
There are many ways you can, and need to, end up with that. Partial over-mapping is one such example. Even if you restrict the file mapping to have PROT_*_CAP-less PROT_MAX, the bigger mapping may have PROT_WRITE_CAP in PROT_MAX, and thus Permit_Store_Cap in the capability permissions. Then you would fault if using the strictly-more-permissive capability as your authority, but not if you used the less-permissive one. That's extremely odd and surprising.
Given that CHERI doubles the width of the register file, and that the upper half of each register can easily be accessed using CGetHigh/CSetHigh, has anyone considered XLEN*2-wide loads/stores that access integer data but fill the whole register (clearing the tag)?
XLEN*2 loads/stores would differ from LQ/SQ, which would notionally access a register pair. Instead, they access bits [XLEN*2-1:0] of a capability register.
They could be used to accelerate memcpy without the complication of looking like they might store tags, which would require the SW PTE permission and could lead the OS to think that capabilities are stored there. As an extra advantage, they could execute misaligned.
This seems like a neat trick to get more performance out of the upper half of the register file.
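As a rough illustration of the request, the following Python sketch models hypothetical XLEN*2 integer loads and stores (`load_dw` and `store_dw` are invented names) for XLEN = 64, and uses them in a memcpy that copies 16-byte chunks with no alignment requirement, finishing with a byte-wise tail. A register is modelled as a (value, tag) pair, and the load always clears the tag.

```python
XLEN = 64
CHUNK = 2 * XLEN // 8  # bytes per XLEN*2 access (16 for XLEN = 64)


def load_dw(mem: bytearray, addr: int) -> tuple[int, bool]:
    """Hypothetical XLEN*2 integer load: fills bits [XLEN*2-1:0] of a
    capability register and always clears the tag. No alignment check."""
    value = int.from_bytes(mem[addr:addr + CHUNK], "little")
    return value, False


def store_dw(mem: bytearray, addr: int, reg: tuple[int, bool]) -> None:
    """Hypothetical XLEN*2 integer store: writes the raw bits, never a tag."""
    value, _tag = reg
    mem[addr:addr + CHUNK] = value.to_bytes(CHUNK, "little")


def memcpy_dw(mem: bytearray, dst: int, src: int, n: int) -> None:
    """Copy n bytes using XLEN*2 accesses, then copy the tail byte-wise.
    Like memcpy(), overlapping regions are not supported."""
    i = 0
    while i + CHUNK <= n:
        store_dw(mem, dst + i, load_dw(mem, src + i))
        i += CHUNK
    mem[dst + i:dst + n] = mem[src + i:src + n]


mem = bytearray(range(40)) + bytearray(40)
memcpy_dw(mem, 40, 3, 37)  # misaligned source: two 16-byte chunks plus a 5-byte tail
print(mem[40:77] == bytes(range(3, 40)))
```

Because nothing in this model ever produces a tag, the copy never needs the capability-store PTE permission, which is the property the request relies on.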