Tag clearing vs trapping for store capability violations: resolve TODO? #476

Open
rmn30 opened this issue Dec 11, 2024 · 4 comments

@rmn30

rmn30 commented Dec 11, 2024

https://riscv.github.io/riscv-cheri/#_capability_load_and_store_instructions currently contains a TODO saying that storing a tagged capability via a capability that does not have the C permission may cause a trap in future rather than clearing the tag of the stored capability as it does at the moment.

This is a point of divergence for CHERIoT, which currently traps in this case (but not in the case of a store-local / capability-level violation). Trapping may be slightly preferable for software because it detects the error sooner, but it is harder for hardware because it introduces a data-dependent trap: whether the store traps depends on the tag of the value being stored.
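
To make the two behaviours concrete, here is a minimal Python sketch of the store check. It is an illustration only, not the Sail model: the names `Cap`, `Policy`, `store_cap` and `StoreCapTrap` are hypothetical, `store_cap_perm` stands in for the C permission, and all other checks (bounds, authorising tag, store-local) are omitted.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class Policy(Enum):
    CLEAR_TAG = auto()  # clear the tag of the stored capability
    TRAP = auto()       # raise an exception instead


class StoreCapTrap(Exception):
    """Hypothetical stand-in for the architectural trap."""


@dataclass(frozen=True)
class Cap:
    tag: bool
    store_cap_perm: bool = False  # stands in for the C permission


def store_cap(auth: Cap, value: Cap, policy: Policy) -> Cap:
    """Return the capability as it would be written to memory."""
    # The violation only exists when the *data* is tagged, which is
    # what makes the trapping variant data-dependent.
    if value.tag and not auth.store_cap_perm:
        if policy is Policy.TRAP:
            raise StoreCapTrap("tagged store via capability without C permission")
        return replace(value, tag=False)
    return value
```

Under `Policy.TRAP`, storing an untagged value through the same authorising capability does not trap, so the outcome cannot be decided from the authorising capability alone.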

@davidchisnall

I would not want the standard to mandate trapping in the core specification because, as we learned with Morello, the data-dependent trap in the store can be painful on superscalar implementations. At the same time, it's fairly easy to implement on in-order pipelines.

From a software perspective, the trap is useful. We can avoid traps in memcpy and other forms of type-oblivious copy by propagating the MC permission from the destination to the source (if the destination lacks store-capability, remove load-capability from the source). This lets you avoid taking traps when copying untrusted data, but otherwise storing a capability via a !MC capability is silent data corruption in the tag-clearing case and so catching it early is beneficial.
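
The permission-propagation trick for type-oblivious copies can be sketched as follows. This is a simplified Python model under stated assumptions: `Cap`, `prepare_copy_source`, `load_tag` and `store_would_trap` are hypothetical names, and the two booleans stand in for the load-capability and store-capability permissions mentioned above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cap:
    load_cap: bool   # may load tagged capabilities through this capability
    store_cap: bool  # may store tagged capabilities through this capability


def prepare_copy_source(src: Cap, dst: Cap) -> Cap:
    """Drop load-capability from the source if the destination cannot store capabilities."""
    if not dst.store_cap:
        return replace(src, load_cap=False)
    return src


def load_tag(src: Cap, tag_in_memory: bool) -> bool:
    """A load without load-capability yields untagged data."""
    return tag_in_memory and src.load_cap


def store_would_trap(dst: Cap, value_tag: bool) -> bool:
    """Trapping variant: only a tagged value stored without permission traps."""
    return value_tag and not dst.store_cap
```

With a destination that lacks store-capability, a word loaded through the prepared source is already untagged, so the subsequent store does not trap; without the preparation step the same copy would trap on the first tagged word.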

Prior to ratifying the standard, it would be nice to ensure that both forms are supported. On big systems, the trapping mode may be useful as a per-process debugging feature for things willing to take the performance hit (and as a store barrier for concurrent garbage collection), but the tag-clearing option is a better default. A *NIX profile should at least provide a tag-clearing mode, and optionally a trapping mode. An embedded profile should be free to use only the trapping mode.

@andresag01
Collaborator

I think this TODO dates from the initial version of the spec. I agree that the tag-clearing option should be the default, although this is best controlled through the profiles.

@davidchisnall: When you mentioned "optional" tag-clearing / trapping mode, did you mean behavior controlled through (e.g.) a CSR such that it can be changed at run-time or fixed by the hardware designers? I would have thought that one would generally like either tag-clearing or trapping, but not both options in a system.

@davidchisnall

As a debugging feature, it would be useful to have a CSR that enables a trap on a filtered set of tag-clearing operations. Some of these can be quite fast. Tag clear on arithmetic can be made into a trap fairly easily on a pipeline designed for it but is painful on one that assumes arithmetic doesn’t trap. Tag clearing on store overwriting tagged data turns every store into a read-modify-write, so would have a huge performance impact, but be very useful for finding data corruption.

I’d like a future spec to be able to define these as trapping either unconditionally (and M-mode code can then emulate tag clearing) or conditionally based on a CSR.
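
As a rough illustration of the CSR-gated idea, the Python sketch below models a bitmask that selects which tag-clearing events trap, plus a handler that emulates tag clearing when a trap is taken. Everything here is hypothetical: no such CSR or bit layout is defined in the spec, and the names `TrapOnClear`, `TagClearTrap`, `on_tag_clear` and `m_mode_emulate` are invented for the example.

```python
from enum import IntFlag


class TrapOnClear(IntFlag):
    """Hypothetical CSR bits, one per class of tag-clearing event."""
    NONE = 0
    STORE_WITHOUT_PERM = 1   # tagged store via capability lacking permission
    ARITHMETIC = 2           # arithmetic that would clear the tag
    OVERWRITE_TAGGED = 4     # store that overwrites tagged data


class TagClearTrap(Exception):
    def __init__(self, event: TrapOnClear):
        super().__init__(event.name)
        self.event = event


def on_tag_clear(event: TrapOnClear, csr: TrapOnClear) -> bool:
    """Return the resulting tag (always False), or trap if the CSR selects this event."""
    if event & csr:
        raise TagClearTrap(event)
    return False


def m_mode_emulate(event: TrapOnClear, csr: TrapOnClear) -> bool:
    """Hypothetical handler: on a trap, fall back to plain tag clearing."""
    try:
        return on_tag_clear(event, csr)
    except TagClearTrap:
        return False
```

Setting every bit models the unconditional-trap option, with the handler restoring tag-clearing semantics for software that expects them.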

I think we now have no trapping behaviour in CHERIoT (@nwf / @rmn30, did I forget any?) and it’s a simpler programming model for mutual distrust, but I do not want to guarantee this. As I said, you can avoid these traps on type-oblivious copying by removing capability-load permission from the source of a copy, and getting an early failure when you think you are copying a capability but aren’t might be a nicer model. I think, at this point, if we did that then we’d want to gate it on a CSR and make it a per-compartment property.

@rmn30
Author

rmn30 commented Jan 30, 2025

At the moment, CHERIoT's store-capability instruction still has a data-dependent trap: https://github.com/CHERIoT-Platform/cheriot-sail/blob/81bf3d2261780627193f44beaf90fa2c76bdae4e/src/cheri_insts.sail#L965

Edit:
There are other traps, but they are easier on the pipeline, e.g. traps based on the authorising / address capability of cjalr, loads and stores.

3 participants