diff --git a/main/acle.md b/main/acle.md
index adefa8f1..edede65d 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -400,6 +400,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added a requirement for function version declaration in Function Multi Versioning.
 * Fixed some rendering issues in the online Markdown documentation and fixed a misplaced anchor.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for SME2.1 (FEAT_SME2p1).
 
 ### References
 
@@ -1904,23 +1906,31 @@ intrinsics are available. This implies that the following macros are nonzero:
 
 #### Scalable Matrix Extension (SME)
 
-The specification for SME is in
-[**Beta** state](#current-status-and-anticipated-changes) and may
-change or be extended in the future.
+The specification for SME2.1 is in
+[**Alpha** state](#current-status-and-anticipated-changes) and the
+specification for the rest of SME is in
+[**Beta** state](#current-status-and-anticipated-changes). The
+specifications may change or be extended in the future.
+
+ACLE provides [features](#sme-language-extensions-and-intrinsics)
+for accessing the Scalable Matrix Extension (SME). Each revision
+of SME has an associated preprocessor macro, given in the table below:
 
-`__ARM_FEATURE_SME` is defined to 1 if there is hardware support
-for the FEAT_SME instructions and if the associated [ACLE
-features](#sme-language-extensions-and-intrinsics) are available.
-This implies that `__ARM_FEATURE_SVE` is nonzero.
+| **Feature** | **Macro**               |
+| ----------- | ----------------------- |
+| FEAT_SME    | `__ARM_FEATURE_SME`     |
+| FEAT_SME2   | `__ARM_FEATURE_SME2`    |
+| FEAT_SME2p1 | `__ARM_FEATURE_SME2p1`  |
+
+Each macro is defined if there is hardware support for the associated
+architecture feature and if all of the [ACLE
+features](#sme-language-extensions-and-intrinsics) that are conditional
+on the macro are supported.
In addition, `__ARM_FEATURE_LOCALLY_STREAMING` is defined to 1
 if the [`arm_locally_streaming`](#arm_locally_streaming) attribute
 is available.
 
-`__ARM_FEATURE_SME2` is defined to 1 if the FEAT_SME2 instructions
-are available and if the associated [ACLE
-features](#sme-language-extensions-and-intrinsics) are supported.
-
 #### M-profile Vector Extension
 
 `__ARM_FEATURE_MVE` is defined as a bitmap to indicate M-profile Vector
@@ -1972,6 +1982,16 @@ instructions from Armv8.2-A are supported and intrinsics targeting them
 are available. This implies that `__ARM_FEATURE_FP16_SCALAR_ARITHMETIC`
 is defined to a nonzero value.
 
+#### Half-precision floating-point SME intrinsics
+
+The specification for SME2.1 is in
+[**Alpha** state](#current-status-and-anticipated-changes) and may change or be
+extended in the future.
+
+`__ARM_FEATURE_SME_F16F16` is defined to `1` if there is hardware support
+for the SME2 half-precision (FEAT_SME_F16F16) instructions and if their
+associated intrinsics are available.
+
 #### Brain 16-bit floating-point support
 
 `__ARM_BF16_FORMAT_ALTERNATIVE` is defined to 1 if the Arm
@@ -1997,6 +2017,32 @@ See
 [Half-precision brain floating-point](#half-precision-brain-floating-point)
 for details of half-precision brain floating-point types.
 
+#### Non-widening brain 16-bit floating-point support
+
+The specification for B16B16 is in
+[**Alpha** state](#current-status-and-anticipated-changes) and may change or be
+extended in the future.
+
+`__ARM_FEATURE_SVE_B16B16` is defined to `1` if there is hardware support
+for the FEAT_SVE_B16B16 instructions and if their associated intrinsics
+are available. Specifically, if this macro is defined to `1`, then:
+
+* the SVE subset of the FEAT_SVE_B16B16 intrinsics is available in
+  [non-streaming statements](#non-streaming-statement)
+  if `__ARM_FEATURE_SVE` is nonzero.
+
+* the SVE subset of the FEAT_SVE_B16B16 intrinsics is available in
+  [streaming-compatible statements](#streaming-compatible-statement)
+  if `__ARM_FEATURE_SME` is nonzero.
+
+* all FEAT_SVE_B16B16 intrinsics are available in
+  [streaming statements](#streaming-statement) if `__ARM_FEATURE_SME`
+  is nonzero.
+
+`__ARM_FEATURE_SME_B16B16` is defined to `1` if there is hardware support
+for the FEAT_SME_B16B16 instructions and if their associated intrinsics
+are available.
+
 ### Cryptographic extensions
 
 #### “Crypto” extension
@@ -2390,10 +2436,13 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SM4`](#sm4-extension) | SM4 Crypto extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A) | 1 |
 | [`__ARM_FEATURE_SME`](#scalable-matrix-extension-sme) | Scalable Matrix Extension (FEAT_SME) | 1 |
 | [`__ARM_FEATURE_SME2`](#scalable-matrix-extension-sme) | Scalable Matrix Extension (FEAT_SME2) | 1 |
+| [`__ARM_FEATURE_SME_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point SME intrinsics (FEAT_SME_B16B16) | 1 |
+| [`__ARM_FEATURE_SME_F16F16`](#half-precision-floating-point-sme-intrinsics) | Half-precision floating-point SME intrinsics (FEAT_SME_F16F16) | 1 |
 | [`__ARM_FEATURE_SME_F64F64`](#double-precision-floating-point-outer-product-intrinsics) | Double precision floating-point outer product intrinsics (FEAT_SME_F64F64) | 1 |
 | [`__ARM_FEATURE_SME_I16I64`](#16-bit-to-64-bit-integer-widening-outer-product-intrinsics) | 16-bit to 64-bit integer widening outer product intrinsics (FEAT_SME_I16I64) | 1 |
 | [`__ARM_FEATURE_SME_LOCALLY_STREAMING`](#scalable-matrix-extension-sme) | Support for the `arm_locally_streaming` attribute | 1 |
 | [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
+| [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
 |
[`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 | | [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 | | [`__ARM_FEATURE_SVE_MATMUL_FP32`](#multiplication-of-32-bit-floating-point-matrices) | 32-bit floating-point matrix multiply extension (FEAT_F32MM) | 1 | @@ -8672,8 +8721,8 @@ The specification for B16B16 is in [**Alpha** state](#current-status-and-anticipated-changes) and may change or be extended in the future. -The instructions in this section are available when __ARM_FEATURE_B16B16 is -non-zero. +The instructions in this section are available when `__ARM_FEATURE_SVE_B16B16` +is non-zero. #### BFADD, BFSUB @@ -8744,6 +8793,7 @@ BFloat16 floating-point maximum/minimum number (predicated). ``` #### BFMLA, BFMLS + BFloat16 floating-point fused multiply add or sub vectors. ``` c @@ -10191,17 +10241,16 @@ Replacing `_hor` with `_ver` gives the associated vertical forms. __arm_streaming __arm_inout("za"); ``` -#### FMOPA (non-widening) +#### BFMOPA, FMOPA (non-widening) ``` c + // Variants are also available for: + // _za16[_bf16]_m (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16]_m (only if __ARM_FEATURE_SME_F16F16 != 0) + // _za64[_f64]_m (only if __ARM_FEATURE_SME_F64F64 != 0) void svmopa_za32[_f32]_m(uint64_t tile, svbool_t pn, svbool_t pm, svfloat32_t zn, svfloat32_t zm) __arm_streaming __arm_inout("za"); - - // Only if __ARM_FEATURE_SME_F64F64 != 0 - void svmopa_za64[_f64]_m(uint64_t tile, svbool_t pn, svbool_t pm, - svfloat64_t zn, svfloat64_t zm) - __arm_streaming __arm_inout("za"); ``` #### BFMOPS, FMOPS (widening), SMOPS, UMOPS @@ -10234,17 +10283,16 @@ Replacing `_hor` with `_ver` gives the associated vertical forms. 
__arm_streaming __arm_inout("za"); ``` -#### FMOPS (non-widening) +#### BFMOPS, FMOPS (non-widening) ``` c + // Variants are also available for: + // _za16[_bf16]_m (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16]_m (only if __ARM_FEATURE_SME_F16F16 != 0) + // _za64[_f64]_m (only if __ARM_FEATURE_SME_F64F64 != 0) void svmops_za32[_f32]_m(uint64_t tile, svbool_t pn, svbool_t pm, svfloat32_t zn, svfloat32_t zm) __arm_streaming __arm_inout("za"); - - // Only if __ARM_FEATURE_SME_F64F64 != 0 - void svmops_za64[_f64]_m(uint64_t tile, svbool_t pn, svbool_t pm, - svfloat64_t zn, svfloat64_t zm) - __arm_streaming __arm_inout("za"); ``` #### RDSVL @@ -10462,12 +10510,14 @@ Multi-vector add svint8x4_t svadd[_single_s8_x4](svint8x4_t zdn, svint8_t zm) __arm_streaming; ``` -#### ADD, SUB, FADD, FSUB (accumulate into ZA) +#### ADD, SUB, BFADD, BFSUB, FADD, FSUB (accumulate into ZA) Multi-vector add/sub and accumulate into ZA ``` c // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za32[_s32] // _za32[_u32] @@ -10479,6 +10529,8 @@ Multi-vector add/sub and accumulate into ZA // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za32[_s32] // _za32[_u32] @@ -10490,6 +10542,8 @@ Multi-vector add/sub and accumulate into ZA // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za32[_s32] // _za32[_u32] @@ -10501,6 +10555,8 @@ Multi-vector add/sub and accumulate into ZA // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za32[_s32] // _za32[_u32] @@ -10520,6 +10576,16 @@ Multi-vector floating-point convert from single-precision to 
interleaved half-precision.
 
 ``` c
   svbfloat16_t svcvtn_bf16[_f32_x2](svfloat32x2_t zn) __arm_streaming;
 ```
 
+#### FCVTL
+
+Multi-vector floating-point convert from half-precision to deinterleaved
+single-precision.
+
+``` c
+  // Only if __ARM_FEATURE_SME_F16F16 != 0
+  svfloat32x2_t svcvtl_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
+```
+
 #### FCVT, BFCVT, FCVTZS, FCVTZU, SCVTF, UCVTF
 
 Multi-vector convert to/from floating-point.
@@ -10535,6 +10601,9 @@ Multi-vector convert to/from floating-point.
 
   // Variants are also available for _f32[_u32_x4], _s32[_f32_x4] and _u32[_f32_x4]
   svfloat32x4_t svcvt_f32[_s32_x4](svint32x4_t zn) __arm_streaming;
+
+  // Only if __ARM_FEATURE_SME_F16F16 != 0
+  svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
 ```
 
 #### SQCVT, SQCVTU, UQCVT
@@ -10781,12 +10850,14 @@ Bitwise exclusive NOR population count outer product and accumulate/subtract
     __arm_streaming __arm_inout("za");
 ```
 
-#### FMLA, FMLS (single)
+#### BFMLA, BFMLS, FMLA, FMLS (single)
 
 Multi-vector floating-point fused multiply-add/subtract
 
 ``` c
   // Variants are available for:
+  //   _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0)
+  //   _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0)
   //   _za32[_f32]
   //   _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0)
   void svmla[_single]_za32[_f32]_vg1x2(uint32_t slice, svfloat32x2_t zn,
@@ -10795,6 +10866,8 @@ Multi-vector floating-point fused multiply-add/subtract
 
   // Variants are available for:
+  //   _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0)
+  //   _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0)
   //   _za32[_f32]
   //   _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0)
   void svmla[_single]_za32[_f32]_vg1x4(uint32_t slice, svfloat32x4_t zn,
@@ -10803,6 +10876,8 @@ Multi-vector floating-point fused multiply-add/subtract
 
   // Variants are available for:
+  //   _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0)
+  //   _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0)
   //   _za32[_f32]
   //   _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0)
void svmls[_single]_za32[_f32]_vg1x2(uint32_t slice, svfloat32x2_t zn, @@ -10811,6 +10886,8 @@ Multi-vector floating-point fused multiply-add/subtract // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0) void svmls[_single]_za32[_f32]_vg1x4(uint32_t slice, svfloat32x4_t zn, @@ -10818,12 +10895,14 @@ Multi-vector floating-point fused multiply-add/subtract __arm_streaming __arm_inout("za"); ``` -#### FMLA, FMLS (multi) +#### BFMLA, BFMLS, FMLA, FMLS (multi) Multi-vector floating-point fused multiply-add/subtract ``` c // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0) void svmla_za32[_f32]_vg1x2(uint32_t slice, svfloat32x2_t zn, @@ -10832,6 +10911,8 @@ Multi-vector floating-point fused multiply-add/subtract // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0) void svmla_za32[_f32]_vg1x4(uint32_t slice, svfloat32x4_t zn, @@ -10840,6 +10921,8 @@ Multi-vector floating-point fused multiply-add/subtract // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0) void svmls_za32[_f32]_vg1x2(uint32_t slice, svfloat32x2_t zn, @@ -10848,6 +10931,8 @@ Multi-vector floating-point fused multiply-add/subtract // Variants are available for: + // _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0) + // _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0) // _za32[_f32] // _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0) void 
svmls_za32[_f32]_vg1x4(uint32_t slice, svfloat32x4_t zn,
@@ -10855,12 +10940,14 @@ Multi-vector floating-point fused multiply-add/subtract
     __arm_streaming __arm_inout("za");
 ```
 
-#### FMLA, FMLS (indexed)
+#### BFMLA, BFMLS, FMLA, FMLS (indexed)
 
 Multi-vector floating-point fused multiply-add/subtract
 
 ``` c
   // Variants are available for:
+  //   _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0)
+  //   _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0)
   //   _za32[_f32]
   //   _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0)
   void svmla_lane_za32[_f32]_vg1x2(uint32_t slice, svfloat32x2_t zn,
@@ -10869,6 +10956,8 @@ Multi-vector floating-point fused multiply-add/subtract
 
   // Variants are available for:
+  //   _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0)
+  //   _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0)
   //   _za32[_f32]
   //   _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0)
   void svmla_lane_za32[_f32]_vg1x4(uint32_t slice, svfloat32x4_t zn,
@@ -10877,6 +10966,8 @@ Multi-vector floating-point fused multiply-add/subtract
 
   // Variants are available for:
+  //   _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0)
+  //   _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0)
   //   _za32[_f32]
   //   _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0)
   void svmls_lane_za32[_f32]_vg1x2(uint32_t slice, svfloat32x2_t zn,
@@ -10885,6 +10976,8 @@ Multi-vector floating-point fused multiply-add/subtract
 
   // Variants are available for:
+  //   _za16[_bf16] (only if __ARM_FEATURE_SME_B16B16 != 0)
+  //   _za16[_f16] (only if __ARM_FEATURE_SME_F16F16 != 0)
   //   _za32[_f32]
   //   _za64[_f64] (only if __ARM_FEATURE_SME_F64F64 != 0)
   void svmls_lane_za32[_f32]_vg1x4(uint32_t slice, svfloat32x4_t zn,
@@ -11276,114 +11369,214 @@ Multi-vector multiply-subtract long long (widening)
     __arm_streaming __arm_inout("za");
 ```
 
-#### SMAX, SMIN, UMAX, UMIN, FMAX, FMIN (single)
+#### SMAX, SMIN, UMAX, UMIN, BFMAX, BFMIN, FMAX, FMIN (single)
 
 Multi-vector min/max
 
 ``` c
-  // Variants are also available for _single_s8_x2,
_single_u8_x2, - // _single_s16_x2, _single_u16_x2, _single_s32_x2, _single_u32_x2, - // _single_f32_x2, _single_s64_x2, _single_u64_x2 and _single_f64_x2 + // Variants are also available for: + // _single_s8_x2 + // _single_u8_x2, + // _single_bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_s16_x2 + // _single_u16_x2 + // _single_s32_x2 + // _single_u32_x2, + // _single_f32_x2 + // _single_s64_x2 + // _single_u64_x2 + // _single_f64_x2 svfloat16x2_t svmax[_single_f16_x2](svfloat16x2_t zdn, svfloat16_t zm) __arm_streaming; - // Variants are also available for _single_s8_x4, _single_u8_x4, - // _single_s16_x4, _single_u16_x4, _single_s32_x4, _single_u32_x4, - // _single_f32_x4, _single_s64_x4, _single_u64_x4 and _single_f64_x4 + // Variants are also available for: + // _single_s8_x4 + // _single_u8_x4, + // _single_bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_s16_x4 + // _single_u16_x4 + // _single_s32_x4 + // _single_u32_x4, + // _single_f32_x4 + // _single_s64_x4 + // _single_u64_x4 + // _single_f64_x4 svfloat16x4_t svmax[_single_f16_x4](svfloat16x4_t zdn, svfloat16_t zm) __arm_streaming; - // Variants are also available for _single_s8_x2, _single_u8_x2, - // _single_s16_x2, _single_u16_x2, _single_s32_x2, _single_u32_x2, - // _single_f32_x2, _single_s64_x2, _single_u64_x2 and _single_f64_x2 + // Variants are also available for: + // _single_s8_x2 + // _single_u8_x2, + // _single_bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_s16_x2 + // _single_u16_x2 + // _single_s32_x2 + // _single_u32_x2, + // _single_f32_x2 + // _single_s64_x2 + // _single_u64_x2 + // _single_f64_x2 svfloat16x2_t svmin[_single_f16_x2](svfloat16x2_t zdn, svfloat16_t zm) __arm_streaming; - // Variants are also available for _single_s8_x4, _single_u8_x4, - // _single_s16_x4, _single_u16_x4, _single_s32_x4, _single_u32_x4, - // _single_f32_x4, _single_s64_x4, _single_u64_x4 and _single_f64_x4 + // Variants are also available for: + // _single_s8_x4 + // 
_single_u8_x4, + // _single_bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_s16_x4 + // _single_u16_x4 + // _single_s32_x4 + // _single_u32_x4, + // _single_f32_x4 + // _single_s64_x4 + // _single_u64_x4 + // _single_f64_x4 svfloat16x4_t svmin[_single_f16_x4](svfloat16x4_t zdn, svfloat16_t zm) __arm_streaming; ``` -#### SMAX, SMIN, UMAX, UMIN, FMAX, FMIN (multi) +#### SMAX, SMIN, UMAX, UMIN, BFMAX, BFMIN, FMAX, FMIN (multi) Multi-vector min/max ``` c - // Variants are also available for _s8_x2, _u8_x2, _s16_x2, _u16_x2, - // _s32_x2, _u32_x2, _f32_x2, _s64_x2, _u64_x2 and _f64_x2 + // Variants are also available for: + // _s8_x2 + // _u8_x2 + // _bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _s16_x2 + // _u16_x2, + // _s32_x2 + // _u32_x2 + // _f32_x2 + // _s64_x2 + // _u64_x2 + // _f64_x2 svfloat16x2_t svmax[_f16_x2](svfloat16x2_t zdn, svfloat16x2_t zm) __arm_streaming; - // Variants are also available for _s8_x4, _u8_x4, _s16_x4, _u16_x4, - // _s32_x4, _u32_x4, _f32_x4, _s64_x4, _u64_x4 and _f64_x4 + // Variants are also available for: + // _s8_x4 + // _u8_x4 + // _bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _s16_x4 + // _u16_x4, + // _s32_x4 + // _u32_x4 + // _f32_x4 + // _s64_x4 + // _u64_x4 + // _f64_x4 svfloat16x4_t svmax[_f16_x4](svfloat16x4_t zdn, svfloat16x4_t zm) __arm_streaming; - // Variants are also available for _s8_x2, _u8_x2, _s16_x2, _u16_x2, - // _s32_x2, _u32_x2, _f32_x2, _s64_x2, _u64_x2 and _f64_x2 + // Variants are also available for: + // _s8_x2 + // _u8_x2 + // _bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _s16_x2 + // _u16_x2, + // _s32_x2 + // _u32_x2 + // _f32_x2 + // _s64_x2 + // _u64_x2 + // _f64_x2 svfloat16x2_t svmin[_f16_x2](svfloat16x2_t zdn, svfloat16x2_t zm) __arm_streaming; - // Variants are also available for _s8_x4, _u8_x4, _s16_x4, _u16_x4, - // _s32_x4, _u32_x4, _f32_x4, _s64_x4,_u64_x4 and _f64_x4 + // Variants are also available for: + // _s8_x4 + // _u8_x4 + // _bf16_x4 (only if 
__ARM_FEATURE_SVE_B16B16 != 0) + // _s16_x4 + // _u16_x4, + // _s32_x4 + // _u32_x4 + // _f32_x4 + // _s64_x4 + // _u64_x4 + // _f64_x4 svfloat16x4_t svmin[_f16_x4](svfloat16x4_t zdn, svfloat16x4_t zm) __arm_streaming; ``` -#### FMAXNM, FMINNM (single) +#### BFMAXNM, BFMINNM, FMAXNM, FMINNM (single) Multi-vector floating point min/max number ``` c - // Variants are also available for _single_f32_x2 and _single_f64_x2 + // Variants are also available for: + // _single_bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_f32_x2 + // _single_f64_x2 svfloat16x2_t svmaxnm[_single_f16_x2](svfloat16x2_t zdn, svfloat16_t zm) __arm_streaming; - // Variants are also available for _single_f32_x4 and _single_f64_x4 + // Variants are also available for: + // _single_bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_f32_x4 + // _single_f64_x4 svfloat16x4_t svmaxnm[_single_f16_x4](svfloat16x4_t zdn, svfloat16_t zm) __arm_streaming; - // Variants are also available for _single_f32_x2 and _single_f64_x2 + // Variants are also available for: + // _single_bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_f32_x2 + // _single_f64_x2 svfloat16x2_t svminnm[_single_f16_x2](svfloat16x2_t zdn, svfloat16_t zm) __arm_streaming; - // Variants are also available for _single_f32_x4 and _single_f64_x4 + // Variants are also available for: + // _single_bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_f32_x4 + // _single_f64_x4 svfloat16x4_t svminnm[_single_f16_x4](svfloat16x4_t zdn, svfloat16_t zm) __arm_streaming; ``` -#### FMAXNM, FMINNM (multi) +#### BFMAXNM, BFMINNM, FMAXNM, FMINNM (multi) Multi-vector floating point min/max number ``` c - // Variants are also available for _f32_x2 and _f64_x2 + // Variants are also available for: + // _bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _f32_x2 + // _f64_x2 svfloat16x2_t svmaxnm[_f16_x2](svfloat16x2_t zdn, svfloat16x2_t zm) __arm_streaming; - // Variants are also available for _f32_x4 and _f64_x4 + 
// Variants are also available for: + // _bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _f32_x4 + // _f64_x4 svfloat16x4_t svmaxnm[_f16_x4](svfloat16x4_t zdn, svfloat16x4_t zm) __arm_streaming; - // Variants are also available for _f32_x2 and _f64_x2 + // Variants are also available for: + // _bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _f32_x2 + // _f64_x2 svfloat16x2_t svminnm[_f16_x2](svfloat16x2_t zdn, svfloat16x2_t zm) __arm_streaming; - // Variants are also available for _f32_x4 and _f64_x4 + // Variants are also available for: + // _bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _f32_x4 + // _f64_x4 svfloat16x4_t svminnm[_f16_x4](svfloat16x4_t zdn, svfloat16x4_t zm) __arm_streaming; ``` @@ -11573,22 +11766,40 @@ Move multi-vectors to/from ZA __arm_streaming __arm_inout("za"); ``` -#### UCLAMP, SCLAMP, FCLAMP +#### UCLAMP, SCLAMP, BFCLAMP, FCLAMP Multi-vector clamp to minimum/maximum vector ``` c - // Variants are also available for _single_s8_x2, _single_u8_x2, - // _single_s16_x2, _single_u16_x2, _single_s32_x2, _single_u32_x2, - // _single_f32_x2, _single_s64_x2, _single_u64_x2 and _single_f64_x2 + // Variants are also available for: + // _single_s8_x2 + // _single_u8_x2, + // _single_bf16_x2 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_s16_x2 + // _single_u16_x2 + // _single_s32_x2 + // _single_u32_x2, + // _single_f32_x2 + // _single_s64_x2 + // _single_u64_x2 + // _single_f64_x2 svfloat16x2_t svclamp[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zn, svfloat16_t zm) __arm_streaming; - // Variants are also available for _single_s8_x4, _single_u8_x4, - // _single_s16_x4, _single_u16_x4, _single_s32_x4, _single_u32_x4, - // _single_f32_x4, _single_s64_x4, _single_u64_x4 and _single_f64_x4 + // Variants are also available for: + // _single_s8_x4 + // _single_u8_x4, + // _single_bf16_x4 (only if __ARM_FEATURE_SVE_B16B16 != 0) + // _single_s16_x4 + // _single_u16_x4 + // _single_s32_x4 + // _single_u32_x4, + // _single_f32_x4 + // 
_single_s64_x4
+  //   _single_u64_x4
+  //   _single_f64_x4
   svfloat16x4_t svclamp[_single_f16_x4](svfloat16x4_t zd, svfloat16_t zn,
                                         svfloat16_t zm) __arm_streaming;
@@ -11810,6 +12021,143 @@ element types.
   svint8x4_t svuzpq[_s8_x4](svint8x4_t zn) __arm_streaming;
 ```
 
+### SME2.1 instruction intrinsics
+
+The specification for SME2.1 is in
+[**Alpha** state](#current-status-and-anticipated-changes) and may change or be
+extended in the future.
+
+The intrinsics in this section are defined by the header file
+[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p1` is defined.
+
+#### MOVAZ (tile to vector, single)
+
+Move and zero ZA tile slice to vector register.
+
+``` c
+  // And similarly for u8.
+  svint8_t svreadz_hor_za8_s8(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for u16, bf16 and f16.
+  svint16_t svreadz_hor_za16_s16(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for u32 and f32.
+  svint32_t svreadz_hor_za32_s32(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for u64 and f64.
+  svint64_t svreadz_hor_za64_s64(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64.
+  svint8_t svreadz_hor_za128_s8(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for u8.
+  svint8_t svreadz_ver_za8_s8(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for u16, bf16 and f16.
+  svint16_t svreadz_ver_za16_s16(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for u32 and f32.
+  svint32_t svreadz_ver_za32_s32(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for u64 and f64.
+  svint64_t svreadz_ver_za64_s64(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  // And similarly for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64.
+  svint8_t svreadz_ver_za128_s8(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+```
+
+#### MOVAZ (tile to vector, multiple)
+
+Move and zero multiple ZA tile slices to vector registers.
+
+``` c
+  // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
+  // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
+  // _za64_s64, _za64_u64 and _za64_f64
+  svint8x2_t svreadz_hor_za8_s8_vg2(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+
+  // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
+  // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
+  // _za64_s64, _za64_u64 and _za64_f64
+  svint8x4_t svreadz_hor_za8_s8_vg4(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+
+  // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
+  // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
+  // _za64_s64, _za64_u64 and _za64_f64
+  svint8x2_t svreadz_ver_za8_s8_vg2(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+
+  // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
+  // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
+  // _za64_s64, _za64_u64 and _za64_f64
+  svint8x4_t svreadz_ver_za8_s8_vg4(uint64_t tile, uint32_t slice)
+    __arm_streaming __arm_inout("za");
+```
+
+#### MOVAZ (array to vector)
+
+Move and zero multiple ZA single-vector groups to vector registers.
+
+``` c
+  // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
+  // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
+  // _za64_s64, _za64_u64 and _za64_f64
+  svint8x2_t svreadz_za8_s8_vg1x2(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+
+  // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
+  // _za16_f16, _za16_bf16, _za32_s32, _za32_u32,
_za32_f32,
+  // _za64_s64, _za64_u64 and _za64_f64
+  svint8x4_t svreadz_za8_s8_vg1x4(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+```
+
+#### ZERO (vector groups)
+
+Zero ZA vector groups.
+
+``` c
+  void svzero_za64_vg1x2(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  void svzero_za64_vg1x4(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  void svzero_za64_vg2x1(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  void svzero_za64_vg2x2(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  void svzero_za64_vg2x4(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  void svzero_za64_vg4x1(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  void svzero_za64_vg4x2(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+
+  void svzero_za64_vg4x4(uint32_t slice)
+    __arm_streaming __arm_inout("za");
+```
+
 ### Streaming-compatible versions of standard routines
 
 ACLE provides the following streaming-compatible functions,