From d6f218b0c6a355930952917e04dc18c4ad60387f Mon Sep 17 00:00:00 2001 From: Claudio Bantaloukas Date: Mon, 25 Nov 2024 12:06:47 +0000 Subject: [PATCH 01/10] Update url to target attribute documentation (#366) The link was pointing to an unmaintained documentation page that happened to be indexed by search engines in preference to the actual documentation. --- name: Update url to target attribute documentation about: Technical issues, document format problems, bugs in scripts or feature proposal. --- **Thank you for submitting a pull request!** If this PR is about a bugfix: Please use the bugfix label and make sure to go through the checklist below. If this PR is about a proposal: We are looking forward to evaluate your proposal, and if possible to make it part of the Arm C Language Extension (ACLE) specifications. We would like to encourage you reading through the [contribution guidelines](https://github.com/ARM-software/acle/blob/main/CONTRIBUTING.md), in particular the section on [submitting a proposal](https://github.com/ARM-software/acle/blob/main/CONTRIBUTING.md#proposals-for-new-content). Please use the proposal label. As for any pull request, please make sure to go through the below checklist. Checklist: (mark with ``X`` those which apply) * [ ] If an issue reporting the bug exists, I have mentioned it in the PR (do not bother creating the issue if all you want to do is fixing the bug yourself). * [ ] I have added/updated the `SPDX-FileCopyrightText` lines on top of any file I have edited. Format is `SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>` (Please update existing copyright lines if applicable. You can specify year ranges with hyphen , as in `2017-2019`, and use commas to separate gaps, as in `2018-2020, 2022`). * [ ] I have updated the `Copyright` section of the sources of the specification I have edited (this will show up in the text rendered in the PDF and other output format supported). The format is the same described in the previous item. * [x] I have run the CI scripts (if applicable, as they might be tricky to set up on non-*nix machines). The sequence can be found in the [contribution guidelines](https://github.com/ARM-software/acle/blob/main/CONTRIBUTING.md#continuous-integration). Don't worry if you cannot run these scripts on your machine, your patch will be automatically checked in the Actions of the pull request. * [x] I have added an item that describes the changes I have introduced in this PR in the section **Changes for next release** of the section **Change Control**/**Document history** of the document. Create **Changes for next release** if it does not exist. Notice that changes that are not modifying the content and rendering of the specifications (both HTML and PDF) do not need to be listed. * [x] When modifying content and/or its rendering, I have checked the correctness of the result in the PDF output (please refer to the instructions on [how to build the PDFs locally](https://github.com/ARM-software/acle/blob/main/CONTRIBUTING.md#continuous-integration)). * [x] The variable `draftversion` is set to `true` in the YAML header of the sources of the specifications I have modified. * [ ] Please *DO NOT* add my GitHub profile to the list of contributors in the [README](https://github.com/ARM-software/acle/blob/main/README.md#contributors-) page of the project. --- main/acle.md | 3 ++- main/design_documents/function-multi-versioning.md | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/main/acle.md b/main/acle.md index ea6caee1..b038a5ee 100644 --- a/main/acle.md +++ b/main/acle.md @@ -423,6 +423,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Fixed range of operand `o0` (too small) in AArch64 system register designations. * Fixed SVE2.1 quadword gather load/scatter store intrinsics. * Removed unnecessary Zd argument from `svcvtnb_mf8[_f32_x2]_fpm`. +* Fixed urls. ### References @@ -2584,7 +2585,7 @@ be found in [[BA]](#BA). This section describes ACLE features that use GNU-style attributes. The general rules for attribute syntax are described in the GCC -documentation . +documentation . Briefly, for this declaration: ``` c diff --git a/main/design_documents/function-multi-versioning.md b/main/design_documents/function-multi-versioning.md index 57283458..73bf08f5 100644 --- a/main/design_documents/function-multi-versioning.md +++ b/main/design_documents/function-multi-versioning.md @@ -25,7 +25,7 @@ derived from a function via FMV: 2. the derived function obey to the same calling convention of the original function. -Currently the `target` [attribute for aarch64](https://gcc.gnu.org/onlinedocs/gcc/extensions-to-the-c-language-family/declaring-attributes-of-functions/aarch64-function-attributes.html) +Currently the `target` [attribute for aarch64](https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html) is used for many purposes, some of which might overlap the functionality introduced by FMV. To avoid confusion, we named the attributes used by FMV with `target_version` and `target_clones`. From f6190ce920cc3e8937bc143bb4608a9f17480110 Mon Sep 17 00:00:00 2001 From: Kerry McLaughlin Date: Tue, 26 Nov 2024 09:37:51 +0000 Subject: [PATCH 02/10] Include SME attributes in the name mangling of types (#358) This change extends the name mangling of types to include the SME streaming and ZA interface. This will avoid naming conflicts which can currently arise such as in the following example: ``` void foo(void (*f)()) { f(); } void foo(void (*f)() __arm_streaming) { f(); } ``` Without this change, both functions 'foo' above will mangle to the same name, despite the function pointers being different. --- main/acle.md | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/main/acle.md b/main/acle.md index b038a5ee..1182f7bb 100644 --- a/main/acle.md +++ b/main/acle.md @@ -424,6 +424,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Fixed SVE2.1 quadword gather load/scatter store intrinsics. * Removed unnecessary Zd argument from `svcvtnb_mf8[_f32_x2]_fpm`. * Fixed urls. +* Changed name mangling of function types to include SME attributes. ### References @@ -10094,6 +10095,65 @@ an [`__arm_streaming`](#arm_streaming) type. See [Changing streaming mode locally](#changing-streaming-mode-locally) for more information. +### C++ mangling of SME keywords + +SME keyword attributes which apply to function types must be included +in the name mangling of the type, if the mangling would normally include +the return type of the function. + +SME attributes are mangled in the same way as a template: + +``` c + template __SME_ATTRS; +``` + +with the arguments: + +``` c + __SME_ATTRS; +``` + +where: + +* normal_function_type is the function type without any SME attributes. + +* sme_state is an unsigned 64-bit integer representing the streaming and ZA + properties of the function's interface. + +The bits are defined as follows: + +| **Bits** | **Value** | **Interface Type** | +| -------- | --------- | ------------------------------ | +| 0 | 0b1 | __arm_streaming | +| 1 | 0b1 | __arm_streaming_compatible | +| 2 | 0b1 | __arm_agnostic("sme_za_state") | +| 3-5 | 0b000 | No ZA state (default) | +| | 0b001 | __arm_in("za") | +| | 0b010 | __arm_out("za") | +| | 0b011 | __arm_inout("za") | +| | 0b100 | __arm_preserves("za") | +| 6-8 | 0b000 | No ZT0 state (default) | +| | 0b001 | __arm_in("zt0") | +| | 0b010 | __arm_out("zt0") | +| | 0b011 | __arm_inout("zt0") | +| | 0b100 | __arm_preserves("zt0") | + +Bits 9-63 are defined to be zero by this revision of the ACLE and are reserved +for future type attributes. + +For example: + +``` c + // Mangled as _Z1fP11__SME_ATTRSIFu10__SVInt8_tvELj1EE + void f(svint8_t (*fn)() __arm_streaming) { fn(); } + + // Mangled as _Z1fP11__SME_ATTRSIFu10__SVInt8_tvELj26EE + void f(svint8_t (*fn)() __arm_streaming_compatible __arm_inout("za")) { fn(); } + + // Mangled as _Z1fP11__SME_ATTRSIFu10__SVInt8_tvELj128EE + void f(svint8_t (*fn)() __arm_out("zt0")) { fn(); } +``` + ## SME types ### Predicate-as-counter From e9cb1e495995aa9eaadf08a8923ce2fc73fc315b Mon Sep 17 00:00:00 2001 From: SpencerAbson Date: Fri, 29 Nov 2024 16:51:38 +0000 Subject: [PATCH 03/10] Change __ARM_NEON_SVE_BRIDGE to refer to the availability of the header (#362) **Afterthought**: Another way of looking at this is that the user should not expect to be able to use intrinsics after specifying the relevant target features via anything other than the command line, it's unclear to me if this is the case. The ACLE suggests the use of the predefined `__ARM_NEON_SVE_BRIDGE` macro to gaurd the inclusion of `arm_neon_sve_bridge.h`. > defines intrinsics for moving data between Neon and SVE vector types; see [NEON-SVE Bridge](https://github.com/ARM-software/acle/blob/main/main/acle.md#neon-sve-bridge) for details. Before including the header, you should test the __ARM_NEON_SVE_BRIDGE macro. The current definition of this macro is >`__ARM_NEON_SVE_BRIDGE` is defined to 1 if [NEON-SVE Bridge](#neon-sve-bridge) intrinsics are available. This implies that the following macros are nonzero > - __ARM_NEON > - __ARM_NEON_FP > - __ARM_FEATURE_SVE The intrinsics described here are not preprocessor guarded (See [change for LLVM]( https://reviews.llvm.org/D132639)). We should expect to be able to use them in any function with the necessary features, whether they are supplied globally on the command line or via a `target` attribute. However, since we cannot make assumptions about the order in which the predefined feature macros are evaluated (see [relevant ACLE](https://github.com/ARM-software/acle/blob/main/main/acle.md#predefined-feature-macros-and-header-file)), we cannot use the `__ARM_NEON_SVE_BRIDGE` macro to guard the inclusion of `arm_neon_sve_bridge.h` **and** expect to use it's builtins in unless the required features are supplied globally on the command line. See an example of this issue (in LLVM Vs. GCC) from @georges-arm - https://godbolt.org/z/6YPvqdjTv. The proposal of this PR is to change the meaning of `__ARM_NEON_SVE_BRIDGE` to refer to the availability of the `arm_neon_sve_bridge.h` header file only, such that it can be unconditionally defined in supporting compilers and it's builtins can be safely used in the context of the example above. --- main/acle.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/main/acle.md b/main/acle.md index 1182f7bb..6589011c 100644 --- a/main/acle.md +++ b/main/acle.md @@ -241,7 +241,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin specifications in [Cortex-M Security Extension (CMSE)](#cortex-m-security-extension-cmse). * Added specification for [NEON-SVE Bridge](#neon-sve-bridge) and - [NEON-SVE Bridge macros](#neon-sve-bridge-macros). + [NEON-SVE Bridge macros](#neon-sve-bridge-macro). * Added feature detection macro for the memcpy family of memory operations (MOPS) at [memcpy family of memory operations standarization instructions - @@ -425,6 +425,9 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Removed unnecessary Zd argument from `svcvtnb_mf8[_f32_x2]_fpm`. * Fixed urls. * Changed name mangling of function types to include SME attributes. +* Changed `__ARM_NEON_SVE_BRIDGE` to refer to the availability of the + [`arm_neon_sve_bridge.h`](#arm_neon_sve_bridge.h) header file, rather + than the [NEON-SVE bridge](#neon-sve-bridge) intrinsics. ### References @@ -1928,14 +1931,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero. are available and if the associated [ACLE features] (#sme-language-extensions-and-intrinsics) are supported. -#### NEON-SVE Bridge macros +#### NEON-SVE Bridge macro -`__ARM_NEON_SVE_BRIDGE` is defined to 1 if [NEON-SVE Bridge](#neon-sve-bridge) -intrinsics are available. This implies that the following macros are nonzero: - -* `__ARM_NEON` -* `__ARM_NEON_FP` -* `__ARM_FEATURE_SVE` +`__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [``](#arm_neon_sve_bridge.h) +header file is available. #### Scalable Matrix Extension (SME) @@ -2570,7 +2569,7 @@ be found in [[BA]](#BA). | [`__ARM_FP_FENV_ROUNDING`](#floating-point-model) | Rounding is configurable at runtime | 1 | | [`__ARM_NEON`](#advanced-simd-architecture-extension-neon) | Advanced SIMD (Neon) extension | 1 | | [`__ARM_NEON_FP`](#neon-floating-point) | Advanced SIMD (Neon) floating-point | 0x04 | -| [`__ARM_NEON_SVE_BRIDGE`](#neon-sve-bridge-macros) | Moving data between Neon and SVE data types | 1 | +| [`__ARM_NEON_SVE_BRIDGE`](#neon-sve-bridge-macro) | Availability of [`arm_neon_sve_brdge.h`](#arm_neon_sve_bridge.h) | 1 | | [`__ARM_PCS`](#procedure-call-standard) | Arm procedure call standard (32-bit-only) | 0x01 | | [`__ARM_PCS_AAPCS64`](#procedure-call-standard) | Arm PCS for AArch64. | 1 | | [`__ARM_PCS_VFP`](#procedure-call-standard) | Arm PCS hardware FP variant in use (32-bit-only) | 1 | From 6d6b40b6cf341b31a87fd04be724ebf04b496a1d Mon Sep 17 00:00:00 2001 From: "allcontributors[bot]" <46447321+allcontributors[bot]@users.noreply.github.com> Date: Fri, 29 Nov 2024 16:54:19 +0000 Subject: [PATCH 04/10] docs: add SpencerAbson as a contributor for content (#367) Adds @SpencerAbson as a contributor for content. This was requested by vhscampos [in this comment](https://github.com/ARM-software/acle/pull/362#issuecomment-2508148895) [skip ci] --------- Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> --- .all-contributorsrc | 9 +++++++++ README.md | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/.all-contributorsrc b/.all-contributorsrc index 657fbc50..560cd3db 100644 --- a/.all-contributorsrc +++ b/.all-contributorsrc @@ -342,6 +342,15 @@ "contributions": [ "content" ] + }, + { + "login": "SpencerAbson", + "name": "SpencerAbson", + "avatar_url": "https://avatars.githubusercontent.com/u/76910239?v=4", + "profile": "https://github.com/SpencerAbson", + "contributions": [ + "content" + ] } ], "contributorsPerLine": 7, diff --git a/README.md b/README.md index 9d95d9f3..a31c1ae7 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ -[![All Contributors](https://img.shields.io/badge/all_contributors-36-orange.svg?style=flat-square)](#contributors-) +[![All Contributors](https://img.shields.io/badge/all_contributors-37-orange.svg?style=flat-square)](#contributors-) ![Continuous Integration](https://github.com/ARM-software/acle/actions/workflows/ci.yml/badge.svg) @@ -134,6 +134,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d Robert Dazi
Robert Dazi

đź–‹ + SpencerAbson
SpencerAbson

🖋 From 33a0cb30f67291862497547662f7c62e6b52e93f Mon Sep 17 00:00:00 2001 From: Alexandros Lamprineas Date: Wed, 4 Dec 2024 09:04:14 +0000 Subject: [PATCH 05/10] [FMV][AArch64] Remove feature dgh since it can be used unconditionally. (#357) The DGH instruction belongs to the hint space. It executes as NOP if the corresponding feature is not present in hardware, so there's no need for runtime dispatch. --- main/acle.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main/acle.md b/main/acle.md index 6589011c..244067ca 100644 --- a/main/acle.md +++ b/main/acle.md @@ -420,6 +420,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Unified Function Multi Versioning features aes and pmull. * Unified Function Multi Versioning features sve2-aes and sve2-pmull128. * Removed Function Multi Versioning features ebf16, memtag3, and rpres. +* Removed Function Multi Versioning feature dgh. * Fixed range of operand `o0` (too small) in AArch64 system register designations. * Fixed SVE2.1 quadword gather load/scatter store intrinsics. * Removed unnecessary Zd argument from `svcvtnb_mf8[_f32_x2]_fpm`. @@ -2803,7 +2804,6 @@ The following table lists the architectures feature mapping for AArch64 | 240 | `FEAT_LRCPC2` | rcpc2 | ```ID_AA64ISAR1_EL1.LRCPC >= 0b0010``` | | 241 | `FEAT_LRCPC3` | rcpc3 | ```ID_AA64ISAR1_EL1.LRCPC >= 0b0011``` | | 250 | `FEAT_FRINTTS` | frintts | ```ID_AA64ISAR1_EL1.FRINTTS >= 0b0001``` | - | 260 | `FEAT_DGH` | dgh | ```ID_AA64ISAR1_EL1.DGH >= 0b0001``` | | 270 | `FEAT_I8MM` | i8mm | ```ID_AA64ISAR1_EL1.I8MM >= 0b0001``` | | 280 | `FEAT_BF16` | bf16 | ```ID_AA64ISAR1_EL1.BF16 >= 0b0001``` | | 310 | `FEAT_SVE` | sve | ```ID_AA64PFR0_EL1.SVE >= 0b0001``` | From 11ce13e67e58c918fb0ce5b3b1c74dc1adf97388 Mon Sep 17 00:00:00 2001 From: Alexandros Lamprineas Date: Fri, 6 Dec 2024 09:49:42 +0000 Subject: [PATCH 06/10] [FMV] Remove features which can be expressed as a combination of other features (#353) All of sve-bf16, sve-ebf16, and sve-i8mm are obsolete. This is already reflected on the second column of the FMV table (we have bf16, ebf16, and i8mm with the same Architecture name). According to https://developer.arm.com/documentation/ddi0487/latest Arm Architecture Reference Manual for A-profile architecture: D23.2.72 ID_AA64ISAR1_EL1, AArch64 Instruction Set Attribute Register 1 ID_AA64ISAR1_EL1.I8MM, bits [55:52] > When Advanced SIMD and SVE are both implemented, this field must return > the same value as ID_AA64ZFR0_EL1.I8MM ID_AA64ISAR1_EL1.BF16, bits [47:44] > When FEAT_SVE or FEAT_SME is implemented, this field must return the > same value as ID_AA64ZFR0_EL1.BF16. So one could write target_version("sve+bf16") or sme+bf16 instead. There is a proposal to explicitely document FMV feature dependences in ACLE, so that the user won't have to write long feature strings on the attributes like sve+simd+i8mm (sve+i8mm should be enough). --- main/acle.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/main/acle.md b/main/acle.md index 244067ca..232b8ba1 100644 --- a/main/acle.md +++ b/main/acle.md @@ -419,6 +419,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Unified Function Multi Versioning features memtag and memtag2. * Unified Function Multi Versioning features aes and pmull. * Unified Function Multi Versioning features sve2-aes and sve2-pmull128. +* Removed Function Multi Versioning features sve-bf16, sve-ebf16, and sve-i8mm. * Removed Function Multi Versioning features ebf16, memtag3, and rpres. * Removed Function Multi Versioning feature dgh. * Fixed range of operand `o0` (too small) in AArch64 system register designations. @@ -2807,9 +2808,6 @@ The following table lists the architectures feature mapping for AArch64 | 270 | `FEAT_I8MM` | i8mm | ```ID_AA64ISAR1_EL1.I8MM >= 0b0001``` | | 280 | `FEAT_BF16` | bf16 | ```ID_AA64ISAR1_EL1.BF16 >= 0b0001``` | | 310 | `FEAT_SVE` | sve | ```ID_AA64PFR0_EL1.SVE >= 0b0001``` | - | 320 | `FEAT_BF16` | sve-bf16 | ```ID_AA64ZFR0_EL1.BF16 >= 0b0001``` | - | 330 | `FEAT_EBF16` | sve-ebf16 | ```ID_AA64ZFR0_EL1.BF16 >= 0b0010``` | - | 340 | `FEAT_I8MM` | sve-i8mm | ```ID_AA64ZFR0_EL1.I8MM >= 0b00001``` | | 350 | `FEAT_F32MM` | f32mm | ```ID_AA64ZFR0_EL1.F32MM >= 0b00001``` | | 360 | `FEAT_F64MM` | f64mm | ```ID_AA64ZFR0_EL1.F64MM >= 0b00001``` | | 370 | `FEAT_SVE2` | sve2 | ```ID_AA64ZFR0_EL1.SVEver >= 0b0001``` | From 73c35a3d26d929244910338ae88db778640a8a30 Mon Sep 17 00:00:00 2001 From: Alexandros Lamprineas Date: Thu, 12 Dec 2024 15:53:14 +0000 Subject: [PATCH 07/10] [FMV] Document feature dependencies and detect at selection. (#368) --- main/acle.md | 50 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 45 insertions(+), 5 deletions(-) diff --git a/main/acle.md b/main/acle.md index 232b8ba1..79ad91c6 100644 --- a/main/acle.md +++ b/main/acle.md @@ -422,6 +422,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Removed Function Multi Versioning features sve-bf16, sve-ebf16, and sve-i8mm. * Removed Function Multi Versioning features ebf16, memtag3, and rpres. * Removed Function Multi Versioning feature dgh. +* Document Function Multi Versioning feature dependencies. * Fixed range of operand `o0` (too small) in AArch64 system register designations. * Fixed SVE2.1 quadword gather load/scatter store intrinsics. * Removed unnecessary Zd argument from `svcvtnb_mf8[_f32_x2]_fpm`. @@ -2675,8 +2676,6 @@ The following attributes trigger the multi version code generation: * The `default` version means the version of the function that would be generated without these attributes. * `name` is the dependent features from the tables below. - * If a feature depends on another feature as defined by the Architecture - Reference Manual then no need to explicitly state in the attribute[^fmv-note-names]. * The dependent features could be joined by the `+` sign. * None of these attributes will enable the corresponding ACLE feature(s) associated to the `name` expressed in the attribute. @@ -2686,9 +2685,6 @@ The following attributes trigger the multi version code generation: * FMV may be disabled in compile time by a compiler flag. In this case the `default` version shall be used. -[^fmv-note-names]: For example the `sve_bf16` feature depends on `sve` - but it is enough to say `target_version("sve_bf16")` in the code. - The attribute `__attribute__((target_version("name")))` expresses the following: @@ -2828,6 +2824,50 @@ The following table lists the architectures feature mapping for AArch64 | 580 | `FEAT_SME2` | sme2 | ```ID_AA64PFR1_EL1.SMEver >= 0b0001``` | | 650 | `FEAT_MOPS` | mops | ```ID_AA64ISAR2_EL1.MOPS >= 0b0001``` | +### Dependencies + +If a feature depends on another feature as defined by the table below then: + +* the depended-on feature *need not* be specified in the attribute, +* the depended-on feature *may* be specified in the attribute. + +These dependencies are taken into account transitively when selecting the +most appropriate version of a function (see section [Selection](#selection)). +The following table lists the feature dependencies for AArch64. + + | **Feature** | **Depends on** | + | ---------------- | ----------------- | + | flagm2 | flagm | + | simd | fp | + | dotprod | simd | + | sm4 | simd | + | rdm | simd | + | sha2 | simd | + | sha3 | sha2 | + | aes | simd | + | fp16 | fp | + | fp16fml | simd, fp16 | + | dpb2 | dpb | + | jscvt | fp | + | fcma | simd | + | rcpc2 | rcpc | + | rcpc3 | rcpc2 | + | frintts | fp | + | i8mm | simd | + | bf16 | simd | + | sve | fp16 | + | f32mm | sve | + | f64mm | sve | + | sve2 | sve | + | sve2-aes | sve2, aes | + | sve2-bitperm | sve2 | + | sve2-sha3 | sve2, sha3 | + | sve2-sm4 | sve2, sm4 | + | sme | fp16, bf16 | + | sme-f64f64 | sme | + | sme-i16i64 | sme | + | sme2 | sme | + ### Selection The following rules shall be followed by all implementations: From ff7467b9f1dae7e3cd38463b3377c5e27d31dd01 Mon Sep 17 00:00:00 2001 From: rsandifo-arm Date: Wed, 18 Dec 2024 15:51:05 +0000 Subject: [PATCH 08/10] Some tweaks to the SVE2p1 load and store intrinsics (#359) The pre-SVE2p1 gather and scatter intrinsics allow vector displacements (offsets or indices) to be either signed or unsigned. svld1q and svst1q instead required them to be unsigned. This patch adds signed versions too, for consistency. Also, the SVE2p1 stores were specified to take pointers to const, but they ought to be pointers to non-const instead. --- main/acle.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/main/acle.md b/main/acle.md index 79ad91c6..3e434b5c 100644 --- a/main/acle.md +++ b/main/acle.md @@ -431,6 +431,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Changed `__ARM_NEON_SVE_BRIDGE` to refer to the availability of the [`arm_neon_sve_bridge.h`](#arm_neon_sve_bridge.h) header file, rather than the [NEON-SVE bridge](#neon-sve-bridge) intrinsics. +* Removed extraneous `const` from SVE2.1 store intrinsics. ### References @@ -9221,11 +9222,13 @@ Gather Load Quadword. // _mf8, _bf16, _f16, _f32, _f64 svint8_t svld1q_gather[_u64base]_s8(svbool_t pg, svuint64_t zn); svint8_t svld1q_gather[_u64base]_offset_s8(svbool_t pg, svuint64_t zn, int64_t offset); + svint8_t svld1q_gather_[s64]offset[_s8](svbool_t pg, const int8_t *base, svint64_t offset); svint8_t svld1q_gather_[u64]offset[_s8](svbool_t pg, const int8_t *base, svuint64_t offset); // Variants are also available for: // _u16, _u32, _s32, _u64, _s64 // _bf16, _f16, _f32, _f64 + svint16_t svld1q_gather_[s64]index[_s16](svbool_t pg, const int16_t *base, svint64_t index); svint16_t svld1q_gather_[u64]index[_s16](svbool_t pg, const int16_t *base, svuint64_t index); svint16_t svld1q_gather[_u64base]_index_s16(svbool_t pg, svuint64_t zn, int64_t index); ``` @@ -9295,14 +9298,14 @@ Contiguous store of single vector operand, truncating from quadword. ``` c // Variants are also available for: // _u32, _s32 - void svst1wq[_f32](svbool_t, const float32_t *ptr, svfloat32_t data); - void svst1wq_vnum[_f32](svbool_t, const float32_t *ptr, int64_t vnum, svfloat32_t data); + void svst1wq[_f32](svbool_t, float32_t *ptr, svfloat32_t data); + void svst1wq_vnum[_f32](svbool_t, float32_t *ptr, int64_t vnum, svfloat32_t data); // Variants are also available for: // _u64, _s64 - void svst1dq[_f64](svbool_t, const float64_t *ptr, svfloat64_t data); - void svst1dq_vnum[_f64](svbool_t, const float64_t *ptr, int64_t vnum, svfloat64_t data); + void svst1dq[_f64](svbool_t, float64_t *ptr, svfloat64_t data); + void svst1dq_vnum[_f64](svbool_t, float64_t *ptr, int64_t vnum, svfloat64_t data); ``` #### ST1Q @@ -9315,12 +9318,14 @@ Scatter store quadwords. // _mf8, _bf16, _f16, _f32, _f64 void svst1q_scatter[_u64base][_s8](svbool_t pg, svuint64_t zn, svint8_t data); void svst1q_scatter[_u64base]_offset[_s8](svbool_t pg, svuint64_t zn, int64_t offset, svint8_t data); - void svst1q_scatter_[u64]offset[_s8](svbool_t pg, const uint8_t *base, svuint64_t offset, svint8_t data); + void svst1q_scatter_[s64]offset[_s8](svbool_t pg, uint8_t *base, svint64_t offset, svint8_t data); + void svst1q_scatter_[u64]offset[_s8](svbool_t pg, uint8_t *base, svuint64_t offset, svint8_t data); // Variants are also available for: // _u16, _u32, _s32, _u64, _s64 // _bf16, _f16, _f32, _f64 - void svst1q_scatter_[u64]index[_s16](svbool_t pg, const int16_t *base, svuint64_t index, svint16_t data); + void svst1q_scatter_[s64]index[_s16](svbool_t pg, int16_t *base, svint64_t index, svint16_t data); + void svst1q_scatter_[u64]index[_s16](svbool_t pg, int16_t *base, svuint64_t index, svint16_t data); void svst1q_scatter[_u64base]_index[_s16](svbool_t pg, svuint64_t zn, int64_t index, svint16_t data); ``` From 80b917a085b967418222a5187260cd216be32649 Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Fri, 20 Dec 2024 15:51:23 +0000 Subject: [PATCH 09/10] [SME] Add __arm_agnostic("sme_za_state") keyword attribute (#336) The `__arm_agnostic` keyword attribute enables the user to specify that a function is agnostic to a specified piece of architectural state. That means that the function must preserve this state when it exists, or otherwise ignores its contents. The reason for not naming this something like `__arm_za_compatible` was so that we might want use the attribute keyword for other architectural state in the future. --- main/acle.md | 62 +++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 52 insertions(+), 10 deletions(-) diff --git a/main/acle.md b/main/acle.md index 3e434b5c..3178149f 100644 --- a/main/acle.md +++ b/main/acle.md @@ -432,6 +432,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin [`arm_neon_sve_bridge.h`](#arm_neon_sve_bridge.h) header file, rather than the [NEON-SVE bridge](#neon-sve-bridge) intrinsics. * Removed extraneous `const` from SVE2.1 store intrinsics. +* Added [`__arm_agnostic`](#arm_agnostic) keyword attribute. ### References @@ -861,6 +862,7 @@ predefine the associated macro to a nonzero value. | **Name** | **Target** | **Predefined macro** | | ----------------------------------------------------------- | --------------------- | --------------------------------- | +| [`__arm_agnostic`](#arm_agnostic) | function type | `__ARM_FEATURE_SME` | | [`__arm_locally_streaming`](#arm_locally_streaming) | function declaration | `__ARM_FEATURE_LOCALLY_STREAMING` | | [`__arm_in`](#ways-of-sharing-state) | function type | Argument-dependent | | [`__arm_inout`](#ways-of-sharing-state) | function type | Argument-dependent | @@ -5059,6 +5061,31 @@ if such a restoration is necessary. For example: } ``` +## `__arm_agnostic` + +A function with the `__arm_agnostic` [keyword attribute](#keyword-attributes) +must preserve the architectural state that is specified by its arguments when +such state exists at runtime. The function is otherwise unconcerned with this +state. + +The `__arm_agnostic` [keyword attribute](#keyword-attributes) applies to +**function types** and accepts the following arguments: + +```"sme_za_state"``` + +* This attribute affects the ABI of a function, which must implement an + [agnostic-ZA interface](#agnostic-za). It is the compiler's responsibility + to ensure that the function's object code honors the ABI requirements. + +* The use of `__arm_agnostic("sme_za_state")` allows writing functions that + are compatible with ZA state without having to share ZA state with the + caller, as required by `__arm_preserves`. The use of this attribute + does not imply that SME is available. + +* It is not valid for a function declaration with + `__arm_agnostic("sme_za_state")` to [share](#shares-state) PSTATE.ZA state + with its caller. + ## Mapping to the Procedure Call Standard [[AAPCS64]](#AAPCS64) classifies functions as having one of the following @@ -5070,13 +5097,21 @@ interfaces: * a “shared-ZA” interface -If a C or C++ function F forms part of the object code's ABI, that -object code function has a shared-ZA interface if and only if at least -one of the following is true: + + +* an "agnostic-ZA" interface + +If a C or C++ function F forms part of the object code's ABI: -* F shares ZA with its caller +* the object code function has a shared-ZA interface if and only if at least + one of the following is true: -* F shares ZT0 with its caller + * F shares ZA with its caller + + * F shares ZT0 with its caller + +* the object code function has an agnostic-ZA interface if and only if F's type + has an `__arm_agnostic("sme_za_state")` attribute. All other functions have a private-ZA interface. @@ -5161,12 +5196,15 @@ function F if at least one of the following is true: Otherwise, ZA can be in any state on entry to A if at least one of the following is true: -* F [uses](#uses-state) `"za"` +* F [uses](#uses-state) `"za"`. + +* F [uses](#uses-state) `"zt0"`. -* F [uses](#uses-state) `"zt0"` +* F's type has an [`__arm_agnostic("sme_za_state")` attribute](#agnostic-za) + and A's clobber-list includes neither `"za"` nor `"zt0"`. -Otherwise, ZA can be off or dormant on entry to A, as for what AAPCS64 -calls “private-ZA” functions. +Otherwise, ZA can be off or dormant on entry to A, in the same way as if F were +to call what the [[AAPCS64]](#AAPCS64) describes as a "private-ZA" function. If ZA is active on entry to A then A's instructions must ensure that ZA is also active when the asm finishes. @@ -5193,7 +5231,11 @@ depend on ZT0 as well as ZA. | off | off | F's uses and A's clobbers are disjoint | | dormant | dormant | " " " | | dormant | off | " " ", and A clobbers `"za"` | -| active | active | F uses `"za"` and/or `"zt0"` | +| active | active | F uses `"za"` and/or `"zt0"`, or | +| | | F's type has an | +| | | `__arm_agnostic("sme_za_state")` | +| | | attribute with A's clobber-list | +| | | including neither `"za"` nor `"zt0"` | The [`__ARM_STATE` macros](#state-strings) indicate whether a compiler is guaranteed to support a particular clobber string. For example, From afd6b56e7109a3e3ef64ca7b10057b4ed48433c5 Mon Sep 17 00:00:00 2001 From: Alfie Richards Date: Mon, 6 Jan 2025 09:26:51 +0000 Subject: [PATCH 10/10] Minor modification to FMV rules for scope and signatures (#363) Hi all, While attempting to implement FMV in the GCC front-end some questions were raised that I think are worth clarifying here. This PR changes the rules to use the default function to determine the signature and scope of the versioned function set. This clears up some cases such as: ```C int fn (int c = 1); int __attribute__((target_version("sve"))) fn (int c = 2); int bar() { return fn(); } ``` Where there are conflicting signatures and which default should be used is not clear at the moment. ```C int fn (int c[]); int __attribute__((target_version("default"))) fn (int c[1]) { } int __attribute__((target_version("sve"))) fn (int c[2]) { } ``` Where if this should be considered a conflicting signature is not clear. ```C int __attribute__((target_version("default"))) fn (int x) { return 1; } void bar () { int __attribute__((target_version("sve2"))) fn (int); fn(1); } ``` Where the scope of multi-versioned functions differs. And ```C // TU 1 #import TU2 int fn (int c = 1); int bar() { return fn(); } // TU 2 int __attribute__((target_version("sve"))) fn (int c = 2); int __attribute__((target_version("sve2"))) fn (int c = 2); int bar() { return fn(); } ``` Where it is possible calls in different TU's could use different default argument values. --- main/acle.md | 45 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 8 deletions(-) diff --git a/main/acle.md b/main/acle.md index 3178149f..fc1146a5 100644 --- a/main/acle.md +++ b/main/acle.md @@ -433,6 +433,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin than the [NEON-SVE bridge](#neon-sve-bridge) intrinsics. * Removed extraneous `const` from SVE2.1 store intrinsics. * Added [`__arm_agnostic`](#arm_agnostic) keyword attribute. +* Refined function versioning scope and signature rules to use the default + version scope and signature. ### References @@ -2675,10 +2677,12 @@ The following attributes trigger the multi version code generation: `__attribute__((target_version("name")))` and `__attribute__((target_clones("name",...)))`. +* Functions are allowed to have the same name and signature when + annotated with these attributes. * These attributes can be mixed with each other. +* `name` is the dependent features from the tables below. * The `default` version means the version of the function that would be generated without these attributes. -* `name` is the dependent features from the tables below. * The dependent features could be joined by the `+` sign. * None of these attributes will enable the corresponding ACLE feature(s) associated to the `name` expressed in the attribute. @@ -2687,21 +2691,46 @@ The following attributes trigger the multi version code generation: * If only the `default` version exist it should be linked directly. * FMV may be disabled in compile time by a compiler flag. In this case the `default` version shall be used. +* All function versions must be declared at the same scope level. +* The default version signature is the signature for calling + the multiversioned functions. Therefore, a versioned function + cannot be called unless the declaration of the default version + is visible in the scope of the call site. +* Non-default versions shall have a type that is convertible to the + type of the default version. +* All the function versions must be declared at the translation + unit in which the definition of the default version resides. The attribute `__attribute__((target_version("name")))` expresses the following: -* when applied to a function it becomes one of the versions. Function - with the same name may exist with multiple versions in the same - or in different translation units. +* When applied to a function it becomes one of the versions. +* Multiple function versions may exist in the same or in different + translation units. * One `default` version of the function is required to be provided in one of the translation units. * Implicitly, without this attribute, * or explicitly providing the `default` in the attribute. -* All instances of the versions shall share the same function - signature and calling convention. -* All the function versions must be declared at the translation - unit in which the definition of the default version resides. + +For example, the below is valid and 2 is used as the default +value for `c` when calling the multiversioned function `f`. + +```cpp +int __attribute__((target_version("simd"))) f (int c = 1); +int __attribute__((target_version("default"))) f (int c = 2); +int __attribute__((target_version("sve"))) f (int c = 3); + +int g() { return f(); } +``` + +Additionally, the below is not valid as the two statements declare +the same entity (the `default` version of `f`) with conflicting +signatures. + +```cpp +int f (int c = 1); +int __attribute__((target_version("default"))) f (int c = 2); +``` The attribute `__attribute__((target_clones("name",...)))` expresses the following: