-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge first two pull requests of 6.1 development #263
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Peter Maydell <[email protected]>
The driver can query some bits in SMMUv3 IDR5 to learn which translation granules are supported. Arm recommends that SMMUv3 implementations support at least 4K and 64K granules. But in the vSMMUv3, there seems to be no reason not to support 16K translation granule. In addition, if 16K is not supported, vSVA will failed to be enabled in the future for 16K guest kernel. So it'd better to support it. Signed-off-by: Kunkun Jiang <[email protected]> Reviewed-by: Eric Auger <[email protected]> Tested-by: Eric Auger <[email protected]> Signed-off-by: Peter Maydell <[email protected]>
The Arm ARM specifies that for Thumb encodings of the various plain store insns, if the Rn field is 1111 then we must UNDEF. This is different from the Arm encodings, where this case is either UNPREDICTABLE or has well-defined behaviour. The exclusive stores, store-release and STRD do not have this UNDEF case for any encoding. Enforce the UNDEF for this case in the Thumb plain store insns. Fixes: https://bugs.launchpad.net/qemu/+bug/1922887 Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Message-id: [email protected]
We were incorrectly assuming that only the first byte of an MTE access is checked against the tags. But per the ARM, unaligned accesses are pre-decomposed into single-byte accesses. So by the time we reach the actual MTE check in the ARM pseudocode, all accesses are aligned. Therefore, the first failure is always either the first byte of the access, or the first byte of the granule. In addition, some of the arithmetic is off for last-first -> count. This does not become directly visible until a later patch that passes single bytes into this function, so ptr == ptr_last. Buglink: https://bugs.launchpad.net/bugs/1921948 Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Reviewed-by: Peter Maydell <[email protected]> [PMM: tweaked a comment] Signed-off-by: Peter Maydell <[email protected]>
Split out a helper function from mte_checkN to perform all of the checking and address manpulation. So far, just use this in mte_checkN itself. Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Peter Maydell <[email protected]>
We were incorrectly assuming that only the first byte of an MTE access is checked against the tags. But per the ARM, unaligned accesses are pre-decomposed into single-byte accesses. So by the time we reach the actual MTE check in the ARM pseudocode, all accesses are aligned. We cannot tell a priori whether or not a given scalar access is aligned, therefore we must at least check. Use mte_probe_int, which is already set up for checking multiple granules. Buglink: https://bugs.launchpad.net/bugs/1921948 Tested-by: Alex Bennée <[email protected]> Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Buglink: https://bugs.launchpad.net/bugs/1921948 Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
After recent changes, mte_checkN does not use ESIZE, and mte_check1 never used TSIZE. We can combine the two into a single field: SIZEM1. Choose to pass size - 1 because size == 0 is never used, our immediate need in mte_probe_int is for the address of the last byte (ptr + size - 1), and since almost all operations are powers of 2, this makes the immediate constant one bit smaller. Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
The mte_check1 and mte_checkN functions are now identical. Drop mte_check1 and rename mte_checkN to mte_check. Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
For consistency with the mte_check1 + mte_checkN merge to mte_check, rename the probe function as well. Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Now that mte_check1 and mte_checkN have been merged, we can merge sve_cont_ldst_mte_check1 and sve_cont_ldst_mte_checkN. Which means that we can eliminate the function pointer into sve_ldN_r and sve_stN_r, calling sve_cont_ldst_mte_check directly. Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
The log2_esize parameter is not used except trivially. Drop the parameter and the deferral to gen_mte_check1. This fixes a bug in that the parameters as documented in the header file were the reverse from those in the implementation. Which meant that translate-sve.c was passing the parameters in the wrong order. Reviewed-by: Alex Bennée <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
The encoding of size = 2 and size = 3 had the incorrect decode for align, overlapping the stride field. This error was hidden by what should have been unnecessary masking in translate. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
We're about to rearrange the macro expansion surrounding tbflags, and this field name will be expanded using the bit definition of the same name, resulting in a token pasting error. So SCTLR_B -> SCTLR__B in the 3 uses, and document it. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
We're about to rearrange the macro expansion surrounding tbflags, and this field name will be expanded using the bit definition of the same name, resulting in a token pasting error. So PSTATE_SS -> PSTATE__SS in the uses, and document it. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
We're about to split tbflags into two parts. These macros will ensure that the correct part is used with the correct set of bits. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
In preparation for splitting tb->flags across multiple fields, introduce a structure to hold the value(s). So far this only migrates the one uint32_t and fixes all of the places that require adjustment to match. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Now that we have all of the proper macros defined, expanding the CPUARMTBFlags structure and populating the two TB fields is relatively simple. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Now that these bits have been moved out of tb->flags, where TBFLAG_ANY was filling from the top, move AM32 to fill from the top, and A32 and M32 to fill from the bottom. This means fewer changes when adding new bits. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Now that other bits have been moved out of tb->flags, there's no point in filling from the top. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Use this to signal when memory access alignment is required. This value comes from the CCR register for M-profile, and from the SCTLR register for A-profile. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Create a finalize_memop function that computes alignment and endianness and returns the final MemOp for the operation. Split out gen_aa32_{ld,st}_internal_i32 which bypasses any special handling of endianness or alignment. Adjust gen_aa32_{ld,st}_i32 so that s->be_data is not added by the callers. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
This is the only caller. Adjust some commentary to talk about SCTLR_B instead of the vanishing function. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Just because operating on a TCGv_i64 temporary does not mean that we're performing a 64-bit operation. Restrict the frobbing to actual 64-bit operations. This bug is not currently visible because all current users of these two functions always pass MO_64. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Adjust the interface to match what has been done to the TCGv_i32 load/store functions. This is less obvious, because at present the only user of these functions, trans_VLDST_multiple, also wants to manipulate the endianness to speed up loading multiple bytes. Thus we retain an "internal" interface which is identical to the current gen_aa32_{ld,st}_i64 interface. The "new" interface will gain users as we remove the legacy interfaces, gen_aa32_ld64 and gen_aa32_st64. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1905356 Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
In the case of gpr load, merge the size and is_signed arguments; otherwise, simply convert size to memop. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
For 128-bit load/store, use 16-byte alignment. This requires that we perform the two operations in the correct order so that we generate the alignment fault before modifying memory. Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Richard Henderson <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Add 6.1 machine types for arm/i440fx/q35/s390x/spapr. Signed-off-by: Cornelia Huck <[email protected]> Acked-by: Greg Kurz <[email protected]> Message-id: [email protected] Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Peter Maydell <[email protected]>
Currently the gpex PCI controller implements no special behaviour for guest accesses to areas of the PIO and MMIO where it has not mapped any PCI devices, which means that for Arm you end up with a CPU exception due to a data abort. Most host OSes expect "like an x86 PC" behaviour, where bad accesses like this return -1 for reads and ignore writes. In the interests of not being surprising, make host CPU accesses to these windows behave as -1/discard where there's no mapped PCI device. The old behaviour generally didn't cause any problems, because almost always the guest OS will map the PCI devices and then only access where it has mapped them. One corner case where you will see this kind of access is if Linux attempts to probe legacy ISA devices via a PIO window access. So far the only case where we've seen this has been via the syzkaller fuzzer. Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Peter Maydell <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Message-id: [email protected] Fixes: https://bugs.launchpad.net/qemu/+bug/1918917 Signed-off-by: Peter Maydell <[email protected]>
…0210430' into staging target-arm queue: * hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows * hw: add compat machines for 6.1 * Fault misaligned accesses where the architecture requires it * Fix some corner cases of MTE faults (notably with misaligned accesses) * Make Thumb store insns UNDEF for Rn==1111 * hw/arm/smmuv3: Support 16K translation granule # gpg: Signature made Fri 30 Apr 2021 11:33:45 BST # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "[email protected]" # gpg: Good signature from "Peter Maydell <[email protected]>" [ultimate] # gpg: aka "Peter Maydell <[email protected]>" [ultimate] # gpg: aka "Peter Maydell <[email protected]>" [ultimate] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20210430: (43 commits) hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows hw: add compat machines for 6.1 target/arm: Enforce alignment for sve LD1R target/arm: Enforce alignment for aa64 vector LDn/STn (single) target/arm: Enforce alignment for aa64 vector LDn/STn (multiple) target/arm: Use MemOp for size + endian in aa64 vector ld/st target/arm: Enforce alignment for aa64 load-acq/store-rel target/arm: Use finalize_memop for aa64 fpr load/store target/arm: Use finalize_memop for aa64 gpr load/store target/arm: Enforce alignment for VLDn/VSTn (single) target/arm: Enforce alignment for VLDn/VSTn (multiple) target/arm: Enforce alignment for VLDn (all lanes) target/arm: Enforce alignment for VLDR/VSTR target/arm: Enforce alignment for VLDM/VSTM target/arm: Enforce alignment for SRS target/arm: Enforce alignment for RFE target/arm: Enforce alignment for LDM/STM target/arm: Enforce alignment for LDA/LDAH/STL/STLH target/arm: Enforce word alignment for LDRD/STRD target/arm: Adjust gen_aa32_{ld, st}_i64 for align+endianness ... Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
…0210430' into staging target-arm queue: * hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows * hw: add compat machines for 6.1 * Fault misaligned accesses where the architecture requires it * Fix some corner cases of MTE faults (notably with misaligned accesses) * Make Thumb store insns UNDEF for Rn==1111 * hw/arm/smmuv3: Support 16K translation granule * remotes/pmaydell/tags/pull-target-arm-20210430: (43 commits) hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows hw: add compat machines for 6.1 target/arm: Enforce alignment for sve LD1R target/arm: Enforce alignment for aa64 vector LDn/STn (single) target/arm: Enforce alignment for aa64 vector LDn/STn (multiple) target/arm: Use MemOp for size + endian in aa64 vector ld/st target/arm: Enforce alignment for aa64 load-acq/store-rel target/arm: Use finalize_memop for aa64 fpr load/store target/arm: Use finalize_memop for aa64 gpr load/store target/arm: Enforce alignment for VLDn/VSTn (single) target/arm: Enforce alignment for VLDn/VSTn (multiple) target/arm: Enforce alignment for VLDn (all lanes) target/arm: Enforce alignment for VLDR/VSTR target/arm: Enforce alignment for VLDM/VSTM target/arm: Enforce alignment for SRS target/arm: Enforce alignment for RFE target/arm: Enforce alignment for LDM/STM target/arm: Enforce alignment for LDA/LDAH/STL/STLH target/arm: Enforce word alignment for LDRD/STRD target/arm: Adjust gen_aa32_{ld, st}_i64 for align+endianness ... Signed-off-by: Peter Maydell <[email protected]>
Commit a378206 "target/arm: Move mode specific TB flags to tb->cs_base" now uses this field for additional TB flags as it was unused in the Arm target before. We were using this field downstream to store the base of PCC (which IMO is appropriate considering how it is used in other targets). While we could reuse cs_base to store the CHERI flags now, this commit uses a simpler approach of adding another field for pcc_base to avoid future merge conflicts. This can be cleaned up once we have caught up with latest QEMU.
Commit 4479ec3 added support for the alignment checks to upstream, so we can delete this local diff.
This results in QEMU being extremely slow since the TCG tb hash lookups now compare uninitialized values from the stack and are likely to fail. This brings the CheriBSD kernel init benchmark `~/cheri/output/sdk/bin/qemu-system-riscv64cheri.slow -M virt -m 2048 -nographic -kernel ~/cheri/output/kernel-riscv64-purecap.CHERI-QEMU-MFS-ROOT-NODEBUG.full -device virtio-net-device,netdev=net0 -netdev user,id=net0,ipv6=off,hostfwd=tcp::12345-:22 -append init_path=/sbin/startup-benchmark.sh` back down to 2.5 seconds instead of around 60 seconds.
This is now part of #264 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
These were quite awkward to merge so uploading this as a test PR for CI.
CC: @qwattash