Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge first two pull requests of 6.1 development #263

Closed
wants to merge 50 commits into from

Conversation

arichardson
Copy link
Member

These were quite awkward to merge so uploading this as a test PR for CI.

CC: @qwattash

pm215 and others added 30 commits April 30, 2021 11:15
Signed-off-by: Peter Maydell <[email protected]>
The driver can query some bits in SMMUv3 IDR5 to learn which
translation granules are supported. Arm recommends that SMMUv3
implementations support at least 4K and 64K granules. But in
the vSMMUv3, there seems to be no reason not to support 16K
translation granule. In addition, if 16K is not supported,
vSVA will failed to be enabled in the future for 16K guest
kernel. So it'd better to support it.

Signed-off-by: Kunkun Jiang <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Tested-by: Eric Auger <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
The Arm ARM specifies that for Thumb encodings of the various plain
store insns, if the Rn field is 1111 then we must UNDEF.  This is
different from the Arm encodings, where this case is either
UNPREDICTABLE or has well-defined behaviour.  The exclusive stores,
store-release and STRD do not have this UNDEF case for any encoding.

Enforce the UNDEF for this case in the Thumb plain store insns.

Fixes: https://bugs.launchpad.net/qemu/+bug/1922887
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: [email protected]
We were incorrectly assuming that only the first byte of an MTE access
is checked against the tags.  But per the ARM, unaligned accesses are
pre-decomposed into single-byte accesses.  So by the time we reach the
actual MTE check in the ARM pseudocode, all accesses are aligned.

Therefore, the first failure is always either the first byte of the
access, or the first byte of the granule.

In addition, some of the arithmetic is off for last-first -> count.
This does not become directly visible until a later patch that passes
single bytes into this function, so ptr == ptr_last.

Buglink: https://bugs.launchpad.net/bugs/1921948
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Reviewed-by: Peter Maydell <[email protected]>
[PMM: tweaked a comment]
Signed-off-by: Peter Maydell <[email protected]>
Split out a helper function from mte_checkN to perform
all of the checking and address manpulation.  So far,
just use this in mte_checkN itself.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
We were incorrectly assuming that only the first byte of an MTE access
is checked against the tags.  But per the ARM, unaligned accesses are
pre-decomposed into single-byte accesses.  So by the time we reach the
actual MTE check in the ARM pseudocode, all accesses are aligned.

We cannot tell a priori whether or not a given scalar access is aligned,
therefore we must at least check.  Use mte_probe_int, which is already
set up for checking multiple granules.

Buglink: https://bugs.launchpad.net/bugs/1921948
Tested-by: Alex Bennée <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Buglink: https://bugs.launchpad.net/bugs/1921948
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
After recent changes, mte_checkN does not use ESIZE,
and mte_check1 never used TSIZE.  We can combine the
two into a single field: SIZEM1.

Choose to pass size - 1 because size == 0 is never used,
our immediate need in mte_probe_int is for the address
of the last byte (ptr + size - 1), and since almost all
operations are powers of 2, this makes the immediate
constant one bit smaller.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
The mte_check1 and mte_checkN functions are now identical.
Drop mte_check1 and rename mte_checkN to mte_check.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
For consistency with the mte_check1 + mte_checkN merge
to mte_check, rename the probe function as well.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Now that mte_check1 and mte_checkN have been merged, we can
merge sve_cont_ldst_mte_check1 and sve_cont_ldst_mte_checkN.

Which means that we can eliminate the function pointer into
sve_ldN_r and sve_stN_r, calling sve_cont_ldst_mte_check directly.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
The log2_esize parameter is not used except trivially.
Drop the parameter and the deferral to gen_mte_check1.

This fixes a bug in that the parameters as documented
in the header file were the reverse from those in the
implementation.  Which meant that translate-sve.c was
passing the parameters in the wrong order.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
The encoding of size = 2 and size = 3 had the incorrect decode
for align, overlapping the stride field.  This error was hidden
by what should have been unnecessary masking in translate.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
We're about to rearrange the macro expansion surrounding tbflags,
and this field name will be expanded using the bit definition of
the same name, resulting in a token pasting error.

So SCTLR_B -> SCTLR__B in the 3 uses, and document it.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
We're about to rearrange the macro expansion surrounding tbflags,
and this field name will be expanded using the bit definition of
the same name, resulting in a token pasting error.

So PSTATE_SS -> PSTATE__SS in the uses, and document it.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
We're about to split tbflags into two parts.  These macros
will ensure that the correct part is used with the correct
set of bits.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
In preparation for splitting tb->flags across multiple
fields, introduce a structure to hold the value(s).
So far this only migrates the one uint32_t and fixes
all of the places that require adjustment to match.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Now that we have all of the proper macros defined, expanding
the CPUARMTBFlags structure and populating the two TB fields
is relatively simple.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Now that these bits have been moved out of tb->flags,
where TBFLAG_ANY was filling from the top, move AM32
to fill from the top, and A32 and M32 to fill from the
bottom.  This means fewer changes when adding new bits.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Now that other bits have been moved out of tb->flags,
there's no point in filling from the top.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Use this to signal when memory access alignment is required.
This value comes from the CCR register for M-profile, and
from the SCTLR register for A-profile.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Create a finalize_memop function that computes alignment and
endianness and returns the final MemOp for the operation.

Split out gen_aa32_{ld,st}_internal_i32 which bypasses any special
handling of endianness or alignment.  Adjust gen_aa32_{ld,st}_i32
so that s->be_data is not added by the callers.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
This is the only caller.  Adjust some commentary to talk
about SCTLR_B instead of the vanishing function.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Just because operating on a TCGv_i64 temporary does not
mean that we're performing a 64-bit operation.  Restrict
the frobbing to actual 64-bit operations.

This bug is not currently visible because all current
users of these two functions always pass MO_64.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Adjust the interface to match what has been done to the
TCGv_i32 load/store functions.

This is less obvious, because at present the only user of
these functions, trans_VLDST_multiple, also wants to manipulate
the endianness to speed up loading multiple bytes.  Thus we
retain an "internal" interface which is identical to the
current gen_aa32_{ld,st}_i64 interface.

The "new" interface will gain users as we remove the legacy
interfaces, gen_aa32_ld64 and gen_aa32_st64.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1905356
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
rth7680 and others added 19 commits April 30, 2021 11:16
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
In the case of gpr load, merge the size and is_signed arguments;
otherwise, simply convert size to memop.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
For 128-bit load/store, use 16-byte alignment.  This
requires that we perform the two operations in the
correct order so that we generate the alignment fault
before modifying memory.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Add 6.1 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <[email protected]>
Acked-by: Greg Kurz <[email protected]>
Message-id: [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
Currently the gpex PCI controller implements no special behaviour for
guest accesses to areas of the PIO and MMIO where it has not mapped
any PCI devices, which means that for Arm you end up with a CPU
exception due to a data abort.

Most host OSes expect "like an x86 PC" behaviour, where bad accesses
like this return -1 for reads and ignore writes.  In the interests of
not being surprising, make host CPU accesses to these windows behave
as -1/discard where there's no mapped PCI device.

The old behaviour generally didn't cause any problems, because
almost always the guest OS will map the PCI devices and then only
access where it has mapped them. One corner case where you will see
this kind of access is if Linux attempts to probe legacy ISA
devices via a PIO window access. So far the only case where we've
seen this has been via the syzkaller fuzzer.

Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Message-id: [email protected]
Fixes: https://bugs.launchpad.net/qemu/+bug/1918917
Signed-off-by: Peter Maydell <[email protected]>
…0210430' into staging

target-arm queue:
 * hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows
 * hw: add compat machines for 6.1
 * Fault misaligned accesses where the architecture requires it
 * Fix some corner cases of MTE faults (notably with misaligned accesses)
 * Make Thumb store insns UNDEF for Rn==1111
 * hw/arm/smmuv3: Support 16K translation granule

# gpg: Signature made Fri 30 Apr 2021 11:33:45 BST
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Peter Maydell <[email protected]>" [ultimate]
# gpg:                 aka "Peter Maydell <[email protected]>" [ultimate]
# gpg:                 aka "Peter Maydell <[email protected]>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20210430: (43 commits)
  hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows
  hw: add compat machines for 6.1
  target/arm: Enforce alignment for sve LD1R
  target/arm: Enforce alignment for aa64 vector LDn/STn (single)
  target/arm: Enforce alignment for aa64 vector LDn/STn (multiple)
  target/arm: Use MemOp for size + endian in aa64 vector ld/st
  target/arm: Enforce alignment for aa64 load-acq/store-rel
  target/arm: Use finalize_memop for aa64 fpr load/store
  target/arm: Use finalize_memop for aa64 gpr load/store
  target/arm: Enforce alignment for VLDn/VSTn (single)
  target/arm: Enforce alignment for VLDn/VSTn (multiple)
  target/arm: Enforce alignment for VLDn (all lanes)
  target/arm: Enforce alignment for VLDR/VSTR
  target/arm: Enforce alignment for VLDM/VSTM
  target/arm: Enforce alignment for SRS
  target/arm: Enforce alignment for RFE
  target/arm: Enforce alignment for LDM/STM
  target/arm: Enforce alignment for LDA/LDAH/STL/STLH
  target/arm: Enforce word alignment for LDRD/STRD
  target/arm: Adjust gen_aa32_{ld, st}_i64 for align+endianness
  ...

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
…0210430' into staging

target-arm queue:
 * hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows
 * hw: add compat machines for 6.1
 * Fault misaligned accesses where the architecture requires it
 * Fix some corner cases of MTE faults (notably with misaligned accesses)
 * Make Thumb store insns UNDEF for Rn==1111
 * hw/arm/smmuv3: Support 16K translation granule

* remotes/pmaydell/tags/pull-target-arm-20210430: (43 commits)
  hw/pci-host/gpex: Don't fault for unmapped parts of MMIO and PIO windows
  hw: add compat machines for 6.1
  target/arm: Enforce alignment for sve LD1R
  target/arm: Enforce alignment for aa64 vector LDn/STn (single)
  target/arm: Enforce alignment for aa64 vector LDn/STn (multiple)
  target/arm: Use MemOp for size + endian in aa64 vector ld/st
  target/arm: Enforce alignment for aa64 load-acq/store-rel
  target/arm: Use finalize_memop for aa64 fpr load/store
  target/arm: Use finalize_memop for aa64 gpr load/store
  target/arm: Enforce alignment for VLDn/VSTn (single)
  target/arm: Enforce alignment for VLDn/VSTn (multiple)
  target/arm: Enforce alignment for VLDn (all lanes)
  target/arm: Enforce alignment for VLDR/VSTR
  target/arm: Enforce alignment for VLDM/VSTM
  target/arm: Enforce alignment for SRS
  target/arm: Enforce alignment for RFE
  target/arm: Enforce alignment for LDM/STM
  target/arm: Enforce alignment for LDA/LDAH/STL/STLH
  target/arm: Enforce word alignment for LDRD/STRD
  target/arm: Adjust gen_aa32_{ld, st}_i64 for align+endianness
  ...

Signed-off-by: Peter Maydell <[email protected]>
Commit a378206 "target/arm: Move mode specific TB flags to tb->cs_base"
now uses this field for additional TB flags as it was unused in the Arm
target before. We were using this field downstream to store the base of
PCC (which IMO is appropriate considering how it is used in other targets).

While we could reuse cs_base to store the CHERI flags now, this commit uses
a simpler approach of adding another field for pcc_base to avoid future
merge conflicts. This can be cleaned up once we have caught up with latest
QEMU.
Commit 4479ec3 added support for the
alignment checks to upstream, so we can delete this local diff.
@arichardson arichardson requested a review from qwattash November 22, 2024 06:27
This results in QEMU being extremely slow since the TCG tb hash lookups now
compare uninitialized values from the stack and are likely to fail.

This brings the CheriBSD kernel init benchmark
`~/cheri/output/sdk/bin/qemu-system-riscv64cheri.slow -M virt -m 2048 -nographic -kernel ~/cheri/output/kernel-riscv64-purecap.CHERI-QEMU-MFS-ROOT-NODEBUG.full -device virtio-net-device,netdev=net0  -netdev user,id=net0,ipv6=off,hostfwd=tcp::12345-:22 -append init_path=/sbin/startup-benchmark.sh`
back down to 2.5 seconds instead of around 60 seconds.
@qwattash
Copy link
Contributor

qwattash commented Jan 7, 2025

This is now part of #264

@qwattash qwattash closed this Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants