diff --git a/.all-contributorsrc b/.all-contributorsrc index 657fbc50..560cd3db 100644 --- a/.all-contributorsrc +++ b/.all-contributorsrc @@ -342,6 +342,15 @@ "contributions": [ "content" ] + }, + { + "login": "SpencerAbson", + "name": "SpencerAbson", + "avatar_url": "https://avatars.githubusercontent.com/u/76910239?v=4", + "profile": "https://github.com/SpencerAbson", + "contributions": [ + "content" + ] } ], "contributorsPerLine": 7, diff --git a/README.md b/README.md index 9d95d9f3..a31c1ae7 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ -[![All Contributors](https://img.shields.io/badge/all_contributors-36-orange.svg?style=flat-square)](#contributors-) +[![All Contributors](https://img.shields.io/badge/all_contributors-37-orange.svg?style=flat-square)](#contributors-) ![Continuous Integration](https://github.com/ARM-software/acle/actions/workflows/ci.yml/badge.svg) @@ -134,6 +134,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d Robert Dazi
Robert Dazi

đź–‹ + SpencerAbson
SpencerAbson

🖋 diff --git a/main/acle.md b/main/acle.md index b651aa76..dd3af59c 100644 --- a/main/acle.md +++ b/main/acle.md @@ -241,7 +241,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin specifications in [Cortex-M Security Extension (CMSE)](#cortex-m-security-extension-cmse). * Added specification for [NEON-SVE Bridge](#neon-sve-bridge) and - [NEON-SVE Bridge macros](#neon-sve-bridge-macros). + [NEON-SVE Bridge macros](#neon-sve-bridge-macro). * Added feature detection macro for the memcpy family of memory operations (MOPS) at [memcpy family of memory operations standarization instructions - @@ -419,13 +419,26 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Unified Function Multi Versioning features memtag and memtag2. * Unified Function Multi Versioning features aes and pmull. * Unified Function Multi Versioning features sve2-aes and sve2-pmull128. +* Removed Function Multi Versioning features sve-bf16, sve-ebf16, and sve-i8mm. * Removed Function Multi Versioning features ebf16, memtag3, and rpres. +* Removed Function Multi Versioning feature dgh. +* Document Function Multi Versioning feature dependencies. * Fixed range of operand `o0` (too small) in AArch64 system register designations. * Fixed SVE2.1 quadword gather load/scatter store intrinsics. * Removed unnecessary Zd argument from `svcvtnb_mf8[_f32_x2]_fpm`. +* Fixed urls. +* Changed name mangling of function types to include SME attributes. +* Changed `__ARM_NEON_SVE_BRIDGE` to refer to the availability of the + [`arm_neon_sve_bridge.h`](#arm_neon_sve_bridge.h) header file, rather + than the [NEON-SVE bridge](#neon-sve-bridge) intrinsics. +* Removed extraneous `const` from SVE2.1 store intrinsics. +* Added [`__arm_agnostic`](#arm_agnostic) keyword attribute. +* Refined function versioning scope and signature rules to use the default + version scope and signature. * Changed the status of the SME2p1 ACLE from Alpha to Beta. * Changed the status of the SVE2p1 ACLE from Alpha to Beta. + ### References This document refers to the following documents. @@ -854,6 +867,7 @@ predefine the associated macro to a nonzero value. | **Name** | **Target** | **Predefined macro** | | ----------------------------------------------------------- | --------------------- | --------------------------------- | +| [`__arm_agnostic`](#arm_agnostic) | function type | `__ARM_FEATURE_SME` | | [`__arm_locally_streaming`](#arm_locally_streaming) | function declaration | `__ARM_FEATURE_LOCALLY_STREAMING` | | [`__arm_in`](#ways-of-sharing-state) | function type | Argument-dependent | | [`__arm_inout`](#ways-of-sharing-state) | function type | Argument-dependent | @@ -1928,14 +1942,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero. are available and if the associated [ACLE features] (#sme-language-extensions-and-intrinsics) are supported. -#### NEON-SVE Bridge macros - -`__ARM_NEON_SVE_BRIDGE` is defined to 1 if [NEON-SVE Bridge](#neon-sve-bridge) -intrinsics are available. This implies that the following macros are nonzero: +#### NEON-SVE Bridge macro -* `__ARM_NEON` -* `__ARM_NEON_FP` -* `__ARM_FEATURE_SVE` +`__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [``](#arm_neon_sve_bridge.h) +header file is available. #### Scalable Matrix Extension (SME) @@ -2570,7 +2580,7 @@ be found in [[BA]](#BA). | [`__ARM_FP_FENV_ROUNDING`](#floating-point-model) | Rounding is configurable at runtime | 1 | | [`__ARM_NEON`](#advanced-simd-architecture-extension-neon) | Advanced SIMD (Neon) extension | 1 | | [`__ARM_NEON_FP`](#neon-floating-point) | Advanced SIMD (Neon) floating-point | 0x04 | -| [`__ARM_NEON_SVE_BRIDGE`](#neon-sve-bridge-macros) | Moving data between Neon and SVE data types | 1 | +| [`__ARM_NEON_SVE_BRIDGE`](#neon-sve-bridge-macro) | Availability of [`arm_neon_sve_brdge.h`](#arm_neon_sve_bridge.h) | 1 | | [`__ARM_PCS`](#procedure-call-standard) | Arm procedure call standard (32-bit-only) | 0x01 | | [`__ARM_PCS_AAPCS64`](#procedure-call-standard) | Arm PCS for AArch64. | 1 | | [`__ARM_PCS_VFP`](#procedure-call-standard) | Arm PCS hardware FP variant in use (32-bit-only) | 1 | @@ -2586,7 +2596,7 @@ be found in [[BA]](#BA). This section describes ACLE features that use GNU-style attributes. The general rules for attribute syntax are described in the GCC -documentation . +documentation . Briefly, for this declaration: ``` c @@ -2670,12 +2680,12 @@ The following attributes trigger the multi version code generation: `__attribute__((target_version("name")))` and `__attribute__((target_clones("name",...)))`. +* Functions are allowed to have the same name and signature when + annotated with these attributes. * These attributes can be mixed with each other. +* `name` is the dependent features from the tables below. * The `default` version means the version of the function that would be generated without these attributes. -* `name` is the dependent features from the tables below. - * If a feature depends on another feature as defined by the Architecture - Reference Manual then no need to explicitly state in the attribute[^fmv-note-names]. * The dependent features could be joined by the `+` sign. * None of these attributes will enable the corresponding ACLE feature(s) associated to the `name` expressed in the attribute. @@ -2684,24 +2694,46 @@ The following attributes trigger the multi version code generation: * If only the `default` version exist it should be linked directly. * FMV may be disabled in compile time by a compiler flag. In this case the `default` version shall be used. - -[^fmv-note-names]: For example the `sve_bf16` feature depends on `sve` - but it is enough to say `target_version("sve_bf16")` in the code. +* All function versions must be declared at the same scope level. +* The default version signature is the signature for calling + the multiversioned functions. Therefore, a versioned function + cannot be called unless the declaration of the default version + is visible in the scope of the call site. +* Non-default versions shall have a type that is convertible to the + type of the default version. +* All the function versions must be declared at the translation + unit in which the definition of the default version resides. The attribute `__attribute__((target_version("name")))` expresses the following: -* when applied to a function it becomes one of the versions. Function - with the same name may exist with multiple versions in the same - or in different translation units. +* When applied to a function it becomes one of the versions. +* Multiple function versions may exist in the same or in different + translation units. * One `default` version of the function is required to be provided in one of the translation units. * Implicitly, without this attribute, * or explicitly providing the `default` in the attribute. -* All instances of the versions shall share the same function - signature and calling convention. -* All the function versions must be declared at the translation - unit in which the definition of the default version resides. + +For example, the below is valid and 2 is used as the default +value for `c` when calling the multiversioned function `f`. + +```cpp +int __attribute__((target_version("simd"))) f (int c = 1); +int __attribute__((target_version("default"))) f (int c = 2); +int __attribute__((target_version("sve"))) f (int c = 3); + +int g() { return f(); } +``` + +Additionally, the below is not valid as the two statements declare +the same entity (the `default` version of `f`) with conflicting +signatures. + +```cpp +int f (int c = 1); +int __attribute__((target_version("default"))) f (int c = 2); +``` The attribute `__attribute__((target_clones("name",...)))` expresses the following: @@ -2804,13 +2836,9 @@ The following table lists the architectures feature mapping for AArch64 | 240 | `FEAT_LRCPC2` | rcpc2 | ```ID_AA64ISAR1_EL1.LRCPC >= 0b0010``` | | 241 | `FEAT_LRCPC3` | rcpc3 | ```ID_AA64ISAR1_EL1.LRCPC >= 0b0011``` | | 250 | `FEAT_FRINTTS` | frintts | ```ID_AA64ISAR1_EL1.FRINTTS >= 0b0001``` | - | 260 | `FEAT_DGH` | dgh | ```ID_AA64ISAR1_EL1.DGH >= 0b0001``` | | 270 | `FEAT_I8MM` | i8mm | ```ID_AA64ISAR1_EL1.I8MM >= 0b0001``` | | 280 | `FEAT_BF16` | bf16 | ```ID_AA64ISAR1_EL1.BF16 >= 0b0001``` | | 310 | `FEAT_SVE` | sve | ```ID_AA64PFR0_EL1.SVE >= 0b0001``` | - | 320 | `FEAT_BF16` | sve-bf16 | ```ID_AA64ZFR0_EL1.BF16 >= 0b0001``` | - | 330 | `FEAT_EBF16` | sve-ebf16 | ```ID_AA64ZFR0_EL1.BF16 >= 0b0010``` | - | 340 | `FEAT_I8MM` | sve-i8mm | ```ID_AA64ZFR0_EL1.I8MM >= 0b00001``` | | 350 | `FEAT_F32MM` | f32mm | ```ID_AA64ZFR0_EL1.F32MM >= 0b00001``` | | 360 | `FEAT_F64MM` | f64mm | ```ID_AA64ZFR0_EL1.F64MM >= 0b00001``` | | 370 | `FEAT_SVE2` | sve2 | ```ID_AA64ZFR0_EL1.SVEver >= 0b0001``` | @@ -2831,6 +2859,50 @@ The following table lists the architectures feature mapping for AArch64 | 580 | `FEAT_SME2` | sme2 | ```ID_AA64PFR1_EL1.SMEver >= 0b0001``` | | 650 | `FEAT_MOPS` | mops | ```ID_AA64ISAR2_EL1.MOPS >= 0b0001``` | +### Dependencies + +If a feature depends on another feature as defined by the table below then: + +* the depended-on feature *need not* be specified in the attribute, +* the depended-on feature *may* be specified in the attribute. + +These dependencies are taken into account transitively when selecting the +most appropriate version of a function (see section [Selection](#selection)). +The following table lists the feature dependencies for AArch64. + + | **Feature** | **Depends on** | + | ---------------- | ----------------- | + | flagm2 | flagm | + | simd | fp | + | dotprod | simd | + | sm4 | simd | + | rdm | simd | + | sha2 | simd | + | sha3 | sha2 | + | aes | simd | + | fp16 | fp | + | fp16fml | simd, fp16 | + | dpb2 | dpb | + | jscvt | fp | + | fcma | simd | + | rcpc2 | rcpc | + | rcpc3 | rcpc2 | + | frintts | fp | + | i8mm | simd | + | bf16 | simd | + | sve | fp16 | + | f32mm | sve | + | f64mm | sve | + | sve2 | sve | + | sve2-aes | sve2, aes | + | sve2-bitperm | sve2 | + | sve2-sha3 | sve2, sha3 | + | sve2-sm4 | sve2, sm4 | + | sme | fp16, bf16 | + | sme-f64f64 | sme | + | sme-i16i64 | sme | + | sme2 | sme | + ### Selection The following rules shall be followed by all implementations: @@ -5021,6 +5093,31 @@ if such a restoration is necessary. For example: } ``` +## `__arm_agnostic` + +A function with the `__arm_agnostic` [keyword attribute](#keyword-attributes) +must preserve the architectural state that is specified by its arguments when +such state exists at runtime. The function is otherwise unconcerned with this +state. + +The `__arm_agnostic` [keyword attribute](#keyword-attributes) applies to +**function types** and accepts the following arguments: + +```"sme_za_state"``` + +* This attribute affects the ABI of a function, which must implement an + [agnostic-ZA interface](#agnostic-za). It is the compiler's responsibility + to ensure that the function's object code honors the ABI requirements. + +* The use of `__arm_agnostic("sme_za_state")` allows writing functions that + are compatible with ZA state without having to share ZA state with the + caller, as required by `__arm_preserves`. The use of this attribute + does not imply that SME is available. + +* It is not valid for a function declaration with + `__arm_agnostic("sme_za_state")` to [share](#shares-state) PSTATE.ZA state + with its caller. + ## Mapping to the Procedure Call Standard [[AAPCS64]](#AAPCS64) classifies functions as having one of the following @@ -5032,13 +5129,21 @@ interfaces: * a “shared-ZA” interface -If a C or C++ function F forms part of the object code's ABI, that -object code function has a shared-ZA interface if and only if at least -one of the following is true: + + +* an "agnostic-ZA" interface + +If a C or C++ function F forms part of the object code's ABI: + +* the object code function has a shared-ZA interface if and only if at least + one of the following is true: -* F shares ZA with its caller + * F shares ZA with its caller -* F shares ZT0 with its caller + * F shares ZT0 with its caller + +* the object code function has an agnostic-ZA interface if and only if F's type + has an `__arm_agnostic("sme_za_state")` attribute. All other functions have a private-ZA interface. @@ -5123,12 +5228,15 @@ function F if at least one of the following is true: Otherwise, ZA can be in any state on entry to A if at least one of the following is true: -* F [uses](#uses-state) `"za"` +* F [uses](#uses-state) `"za"`. + +* F [uses](#uses-state) `"zt0"`. -* F [uses](#uses-state) `"zt0"` +* F's type has an [`__arm_agnostic("sme_za_state")` attribute](#agnostic-za) + and A's clobber-list includes neither `"za"` nor `"zt0"`. -Otherwise, ZA can be off or dormant on entry to A, as for what AAPCS64 -calls “private-ZA” functions. +Otherwise, ZA can be off or dormant on entry to A, in the same way as if F were +to call what the [[AAPCS64]](#AAPCS64) describes as a "private-ZA" function. If ZA is active on entry to A then A's instructions must ensure that ZA is also active when the asm finishes. @@ -5155,7 +5263,11 @@ depend on ZT0 as well as ZA. | off | off | F's uses and A's clobbers are disjoint | | dormant | dormant | " " " | | dormant | off | " " ", and A clobbers `"za"` | -| active | active | F uses `"za"` and/or `"zt0"` | +| active | active | F uses `"za"` and/or `"zt0"`, or | +| | | F's type has an | +| | | `__arm_agnostic("sme_za_state")` | +| | | attribute with A's clobber-list | +| | | including neither `"za"` nor `"zt0"` | The [`__ARM_STATE` macros](#state-strings) indicate whether a compiler is guaranteed to support a particular clobber string. For example, @@ -9184,11 +9296,13 @@ Gather Load Quadword. // _mf8, _bf16, _f16, _f32, _f64 svint8_t svld1q_gather[_u64base]_s8(svbool_t pg, svuint64_t zn); svint8_t svld1q_gather[_u64base]_offset_s8(svbool_t pg, svuint64_t zn, int64_t offset); + svint8_t svld1q_gather_[s64]offset[_s8](svbool_t pg, const int8_t *base, svint64_t offset); svint8_t svld1q_gather_[u64]offset[_s8](svbool_t pg, const int8_t *base, svuint64_t offset); // Variants are also available for: // _u16, _u32, _s32, _u64, _s64 // _bf16, _f16, _f32, _f64 + svint16_t svld1q_gather_[s64]index[_s16](svbool_t pg, const int16_t *base, svint64_t index); svint16_t svld1q_gather_[u64]index[_s16](svbool_t pg, const int16_t *base, svuint64_t index); svint16_t svld1q_gather[_u64base]_index_s16(svbool_t pg, svuint64_t zn, int64_t index); ``` @@ -9258,14 +9372,14 @@ Contiguous store of single vector operand, truncating from quadword. ``` c // Variants are also available for: // _u32, _s32 - void svst1wq[_f32](svbool_t, const float32_t *ptr, svfloat32_t data); - void svst1wq_vnum[_f32](svbool_t, const float32_t *ptr, int64_t vnum, svfloat32_t data); + void svst1wq[_f32](svbool_t, float32_t *ptr, svfloat32_t data); + void svst1wq_vnum[_f32](svbool_t, float32_t *ptr, int64_t vnum, svfloat32_t data); // Variants are also available for: // _u64, _s64 - void svst1dq[_f64](svbool_t, const float64_t *ptr, svfloat64_t data); - void svst1dq_vnum[_f64](svbool_t, const float64_t *ptr, int64_t vnum, svfloat64_t data); + void svst1dq[_f64](svbool_t, float64_t *ptr, svfloat64_t data); + void svst1dq_vnum[_f64](svbool_t, float64_t *ptr, int64_t vnum, svfloat64_t data); ``` #### ST1Q @@ -9278,12 +9392,14 @@ Scatter store quadwords. // _mf8, _bf16, _f16, _f32, _f64 void svst1q_scatter[_u64base][_s8](svbool_t pg, svuint64_t zn, svint8_t data); void svst1q_scatter[_u64base]_offset[_s8](svbool_t pg, svuint64_t zn, int64_t offset, svint8_t data); - void svst1q_scatter_[u64]offset[_s8](svbool_t pg, const uint8_t *base, svuint64_t offset, svint8_t data); + void svst1q_scatter_[s64]offset[_s8](svbool_t pg, uint8_t *base, svint64_t offset, svint8_t data); + void svst1q_scatter_[u64]offset[_s8](svbool_t pg, uint8_t *base, svuint64_t offset, svint8_t data); // Variants are also available for: // _u16, _u32, _s32, _u64, _s64 // _bf16, _f16, _f32, _f64 - void svst1q_scatter_[u64]index[_s16](svbool_t pg, const int16_t *base, svuint64_t index, svint16_t data); + void svst1q_scatter_[s64]index[_s16](svbool_t pg, int16_t *base, svint64_t index, svint16_t data); + void svst1q_scatter_[u64]index[_s16](svbool_t pg, int16_t *base, svuint64_t index, svint16_t data); void svst1q_scatter[_u64base]_index[_s16](svbool_t pg, svuint64_t zn, int64_t index, svint16_t data); ``` @@ -10095,6 +10211,65 @@ an [`__arm_streaming`](#arm_streaming) type. See [Changing streaming mode locally](#changing-streaming-mode-locally) for more information. +### C++ mangling of SME keywords + +SME keyword attributes which apply to function types must be included +in the name mangling of the type, if the mangling would normally include +the return type of the function. + +SME attributes are mangled in the same way as a template: + +``` c + template __SME_ATTRS; +``` + +with the arguments: + +``` c + __SME_ATTRS; +``` + +where: + +* normal_function_type is the function type without any SME attributes. + +* sme_state is an unsigned 64-bit integer representing the streaming and ZA + properties of the function's interface. + +The bits are defined as follows: + +| **Bits** | **Value** | **Interface Type** | +| -------- | --------- | ------------------------------ | +| 0 | 0b1 | __arm_streaming | +| 1 | 0b1 | __arm_streaming_compatible | +| 2 | 0b1 | __arm_agnostic("sme_za_state") | +| 3-5 | 0b000 | No ZA state (default) | +| | 0b001 | __arm_in("za") | +| | 0b010 | __arm_out("za") | +| | 0b011 | __arm_inout("za") | +| | 0b100 | __arm_preserves("za") | +| 6-8 | 0b000 | No ZT0 state (default) | +| | 0b001 | __arm_in("zt0") | +| | 0b010 | __arm_out("zt0") | +| | 0b011 | __arm_inout("zt0") | +| | 0b100 | __arm_preserves("zt0") | + +Bits 9-63 are defined to be zero by this revision of the ACLE and are reserved +for future type attributes. + +For example: + +``` c + // Mangled as _Z1fP11__SME_ATTRSIFu10__SVInt8_tvELj1EE + void f(svint8_t (*fn)() __arm_streaming) { fn(); } + + // Mangled as _Z1fP11__SME_ATTRSIFu10__SVInt8_tvELj26EE + void f(svint8_t (*fn)() __arm_streaming_compatible __arm_inout("za")) { fn(); } + + // Mangled as _Z1fP11__SME_ATTRSIFu10__SVInt8_tvELj128EE + void f(svint8_t (*fn)() __arm_out("zt0")) { fn(); } +``` + ## SME types ### Predicate-as-counter diff --git a/main/design_documents/function-multi-versioning.md b/main/design_documents/function-multi-versioning.md index 57283458..73bf08f5 100644 --- a/main/design_documents/function-multi-versioning.md +++ b/main/design_documents/function-multi-versioning.md @@ -25,7 +25,7 @@ derived from a function via FMV: 2. the derived function obey to the same calling convention of the original function. -Currently the `target` [attribute for aarch64](https://gcc.gnu.org/onlinedocs/gcc/extensions-to-the-c-language-family/declaring-attributes-of-functions/aarch64-function-attributes.html) +Currently the `target` [attribute for aarch64](https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html) is used for many purposes, some of which might overlap the functionality introduced by FMV. To avoid confusion, we named the attributes used by FMV with `target_version` and `target_clones`.