52 changes: 52 additions & 0 deletions main/acle.md
@@ -465,6 +465,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin

* Added feature test macro for FEAT_SSVE_FEXPA.
* Added feature test macro for FEAT_CSSC.
* Added [**Alpha**](#current-status-and-anticipated-changes) support
for FEAT_SVE_AES2, FEAT_SSVE_AES intrinsics.

### References

@@ -2147,6 +2149,18 @@ support for the SVE2 AES (FEAT_SVE_AES) instructions and if the associated
ACLE intrinsics are available. This implies that `__ARM_FEATURE_AES`
and `__ARM_FEATURE_SVE2` are both nonzero.

In addition, `__ARM_FEATURE_SVE2_AES2` is defined to `1` if there is hardware
support for the SVE2 AES2 (FEAT_SVE_AES2) instructions and if the associated
Contributor

@CarolineConcatto CarolineConcatto Oct 16, 2025

Why are you saying it is support for SVE2?
I believe it should be SVE AES2.
The description for the instruction does not have sve2, does it?
https://developer.arm.com/documentation/ddi0602/2025-09/SVE-Instructions/AESD--indexed---Multi-vector-AES-single-round-decryption-?lang=en

Contributor Author

Typos. Fixed here and some other places as well. Thanks.

ACLE intrinsics are available.

`__ARM_FEATURE_SSVE_AES` is defined to 1 if there is hardware support for
SVE2 AES2 (FEAT_SVE_AES2) instructions in Streaming SVE mode (FEAT_SSVE_AES)
and if the associated ACLE intrinsics are available.

The specification for SVE2 AES2 (FEAT_SVE_AES2, FEAT_SSVE_AES) instructions is in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.
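
A minimal sketch of how code might test the two macros described above before using the multi-vector AES intrinsics. Only the two `__ARM_FEATURE_*` names come from the text; `HAVE_SVE_AES2`, `HAVE_SSVE_AES` and `have_multivector_aes` are hypothetical names chosen for illustration and are not part of ACLE. The `defined(...)` check keeps the code valid on toolchains that do not know these macros.

```c
#include <stdbool.h>

/* Hypothetical wrappers (not ACLE names): normalise each macro to 0 or 1. */
#if defined(__ARM_FEATURE_SVE2_AES2) && __ARM_FEATURE_SVE2_AES2
#define HAVE_SVE_AES2 1   /* FEAT_SVE_AES2 intrinsics available */
#else
#define HAVE_SVE_AES2 0
#endif

#if defined(__ARM_FEATURE_SSVE_AES) && __ARM_FEATURE_SSVE_AES
#define HAVE_SSVE_AES 1   /* FEAT_SSVE_AES: available in Streaming SVE mode */
#else
#define HAVE_SSVE_AES 0
#endif

/* Hypothetical helper: true if either macro reports support at compile time. */
static bool have_multivector_aes(void)
{
    return HAVE_SVE_AES2 || HAVE_SSVE_AES;
}
```

Because the check happens at compile time, a build for a target without either feature simply takes the fallback path.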

#### SHA2 extension

`__ARM_FEATURE_SHA2` is defined to 1 if the SHA1 & SHA2-256 Crypto
@@ -2642,6 +2656,8 @@ be found in [[BA]](#BA).
| [`__ARM_FEATURE_SVE_VECTOR_OPERATORS`](#scalable-vector-extension-sve) | Level of support for C and C++ operators on SVE predicate types | 1 |
| [`__ARM_FEATURE_SVE2`](#sve2) | SVE version 2 (FEAT_SVE2) | 1 |
| [`__ARM_FEATURE_SVE2_AES`](#aes-extension) | SVE2 support for the AES cryptographic extension (FEAT_SVE_AES) | 1 |
| [`__ARM_FEATURE_SVE2_AES2`](#aes-extension) | SVE2 support for the multi-vector AES cryptographic and 128-bit polynomial multiply long extension (FEAT_SVE_AES2) | 1 |
| [`__ARM_FEATURE_SSVE_AES`](#aes-extension) | SVE2 support for the multi-vector AES cryptographic and 128-bit polynomial multiply long extension (FEAT_SSVE_AES) | 1 |
| [`__ARM_FEATURE_SVE2_BITPERM`](#bit-permute-extension) | SVE2 bit permute extension | 1 |
| [`__ARM_FEATURE_SSVE_BITPERM`](#bit-permute-extension) | SVE2 bit permute extension | 1 |
| [`__ARM_FEATURE_SSVE_FEXPA`](#streaming-sve-fexpa-extension) | Streaming SVE FEXPA extension | 1 |
@@ -9712,6 +9728,42 @@ Lookup table read with 4-bit indices.
svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
```

### SVE2 Multi-vector AES and 128-bit polynomial multiply long instructions

The specification for SVE2 Multi-vector AES and 128-bit polynomial multiply long instructions is in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

#### AESE, AESD, AESEMC, AESDIMC

Multi-vector Advanced Encryption Standard instructions

```c
// Only if __ARM_FEATURE_SVE2_AES2 != 0 or __ARM_FEATURE_SSVE_AES != 0

svuint8x2_t svaese[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
svuint8x4_t svaese[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
svuint8x2_t svaesd[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
svuint8x4_t svaesd[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
svuint8x2_t svaesemc[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
svuint8x4_t svaesemc[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
svuint8x2_t svaesdimc[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
svuint8x4_t svaesdimc[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
```

#### PMULL, PMLAL

Multi-vector 128-bit polynomial multiply long instructions

``` c
// Only if __ARM_FEATURE_SVE2_AES2 != 0 or __ARM_FEATURE_SSVE_AES != 0

// Variants are also available for:
// _s64x2, _f64x2
svuint64x2_t svpmull[_u64x2](svuint64_t zn, svuint64_t zm);
Contributor

I believe this should be:
svpmull_u64_x2. To me the _u64x2 suffix does not look optional: it describes the output, which cannot be deduced from the inputs.
The svpmlal[_u64x2] form is fine, because the first parameter is also the output.

Contributor

I don't think there will ever be a pmull on a 64-bit type that produces a single-vector output. By the nature of the operation, the results need to be 128-bit elements, so two vectors are required to hold them.

Contributor

After internal discussion, it was decided to keep the suffixes optional.

Contributor

I believe we should also have the _n version, as in the LLVM implementation.
I am proposing something like this:

svuint64x2_t svpmull_u64x2(svuint64_t zn, svuint64_t zm);
svuint64x2_t svpmull[_n]_u64x2(svuint64_t zn, uint64_t zm);

svuint64x2_t svpmlal[_u64x2](svuint64x2_t zda, svuint64_t zn, svuint64_t zm);
svuint64x2_t svpmlal[_n_u64x2](svuint64x2_t zda, svuint64_t zn, uint64_t zm);

svuint64x2_t svpmlal[_u64x2](svuint64x2_t zda, svuint64_t zn, svuint64_t zm);
```
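
The point made in the review thread above, that a polynomial multiply of 64-bit elements yields 128-bit results, can be illustrated with a portable scalar model. This is only a sketch under stated assumptions: `clmul64` and `clmul128_t` are hypothetical names, and the code models a single 64x64 carry-less multiply over GF(2). It does not model the vector layout, element selection, or exact semantics of `svpmull` or `svpmlal`, which the text above does not spell out.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit product held as two 64-bit halves. */
typedef struct {
    uint64_t lo;  /* bits 0..63 of the product   */
    uint64_t hi;  /* bits 64..127 of the product */
} clmul128_t;

/* Carry-less (polynomial) multiply of two 64-bit values.
   For every set bit i of b, XOR in a shifted left by i; XOR replaces
   addition, so no carries propagate between bit positions. */
static clmul128_t clmul64(uint64_t a, uint64_t b)
{
    clmul128_t r = {0, 0};
    for (unsigned i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) {
                /* Bits of a that were shifted past bit 63 land in the high half. */
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

With this model, `clmul64(3, 3)` gives `lo == 5` and `hi == 0`, since (x + 1)^2 = x^2 + 1 over GF(2), while `clmul64(0x8000000000000000, 2)` gives `hi == 1` and `lo == 0`. The second case shows why a 64-bit result element cannot hold the product.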

# SME language extensions and intrinsics

The specification for SME is in