Commit eb2e37e
intrinsics: align spec with the GCC implementation surface (#41)
* unprivileged/integrated-matrix: Document __riscv_ime_vlen / __riscv_ime_lambda geometry queries
Add a normative subsection for the two geometry-query intrinsics that
GCC and Clang already implement to enable runtime VLEN/lambda detection:
size_t __riscv_ime_vlen (void);
size_t __riscv_ime_lambda (void);
Both fold to compile-time constants when VLEN is statically known
(-mrvv-vector-bits=zvl) and otherwise emit a small runtime sequence
(csrr vlenb + shift, or csrr vlenb + ctz + shift respectively).
These intrinsics are the supported way for software to discover the
implementation's tile geometry without parsing CSR fields directly,
and are the building blocks for the runtime-dispatch pattern described
in the existing VLEN-portable code subsection.
A note clarifies that __riscv_ime_lambda returns a single representative
value; software that needs to enumerate the WARL set must still use
vsetvl write-readback.
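
A minimal sketch of the "csrr vlenb + shift" step and of a dispatch decision built on it. It runs on any host: on a real target the VLEN value would come from __riscv_ime_vlen(). The helper names (vlen_bits_from_vlenb, pick_kernel, the kernel_variant enum) are hypothetical and not part of the spec or of GCC/Clang. The lambda derivation is not modeled here because its exact sequence is not spelled out above.

```c
#include <assert.h>
#include <stddef.h>

/* vlenb holds VLEN/8 (bytes), so VLEN in bits is vlenb shifted left by 3. */
static size_t vlen_bits_from_vlenb(size_t vlenb)
{
    assert(vlenb != 0);
    return vlenb << 3;
}

/* Hypothetical kernel variants, for illustration only. */
enum kernel_variant { KERNEL_VLEN128, KERNEL_VLEN256, KERNEL_GENERIC };

/* Pick a specialised kernel for known VLENs, else fall back to a generic one. */
static enum kernel_variant pick_kernel(size_t vlen_bits)
{
    switch (vlen_bits) {
    case 128: return KERNEL_VLEN128;
    case 256: return KERNEL_VLEN256;
    default:  return KERNEL_GENERIC;
    }
}
```

When VLEN is statically known, a compiler can fold such a switch to a single branch, which is the point of having the query fold to a constant.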
* unprivileged/integrated-matrix: Document _scaled and _bs{N} qualifiers for microscaled intrinsics
Microscaled multiply-accumulate intrinsics carry two qualifiers that
the existing intrinsics section did not name:
  _scaled - distinguishes the MX-scaled form of vfwmmacc / vfqmmacc /
            vf8wmmacc from their unscaled siblings.
  _bs{N}  - selects the block size (16 or 32). Applies to all MX
            intrinsics, including the integer-input ones (vfwimmacc /
            vfqimmacc / vf8wimmacc), which exist only in the
            microscaled form and therefore do not carry _scaled.
Add a new subsection "Microscaled multiply-accumulate intrinsics"
between the FP multiply-accumulate prototypes and the VLEN-portable
code discussion. The subsection extends the canonical-suffix grammar
already defined in the intrinsics overview, lists representative
prototypes for each (FP, INT)-input case and each block size, and
confirms that the MX scale format is implied by the input data type
(no separate scale-format selector is needed).
Aligns the spec with what GCC and Clang already emit.
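
The qualifier rules above can be restated as two small predicates. This is only an illustrative host-side sketch: the function names are hypothetical, and the mnemonic lists are copied from this message rather than from the spec text itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* _bs{N}: only block sizes 16 and 32 are named in this change. */
static bool mx_block_size_is_valid(unsigned n)
{
    return n == 16 || n == 32;
}

/* _scaled applies to the FP-input mnemonics, which also have unscaled
 * siblings. The integer-input ones (vfwimmacc, vfqimmacc, vf8wimmacc)
 * exist only in microscaled form and therefore do not carry it. */
static bool mx_carries_scaled(const char *mnemonic)
{
    static const char *const fp_input[] = { "vfwmmacc", "vfqmmacc", "vf8wmmacc" };
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof fp_input / sizeof fp_input[0]; i++) {
        if (strcmp(mnemonic, fp_input[i]) == 0)
            return true;
    }
    return false;
}
```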
* unprivileged/integrated-matrix: Add masked tile load/store intrinsics and clarify _L{N} / _m orthogonality
Tile load/store intrinsics support an optional mask through the _m
suffix (the same convention as base V-extension load/store). The
canonical suffix order is updated to allow _m as the final qualifier,
and the spec explicitly states that _L{N} and _m are orthogonal and
may be combined as _L{N}_m.
Add representative masked prototypes for each of the four mnemonics
(vmtl.v, vmts.v, vmttl.v, vmtts.v) and a combined-suffix example
(vmtl_v_i8m1_L4_m, vmts_v_i8m1_L4_m).
The mask type follows V's convention: vbool{N}_t, where N matches
the data element width (vbool8_t for i8, vbool32_t for i32, etc.).
Closes a gap between the spec and what GCC emits today
(test: zvmma-tile-masked.c, zvmma-ofp8-tile-imm-lambda.c).
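
As an illustration of the suffix order, the sketch below assembles a name in the short form used by the examples above (for instance vmtl_v_i8m1_L4_m). The helper tile_intrinsic_name is hypothetical, and using lambda == 0 to mean "no _L{N} qualifier" is an assumption of this sketch, not a spec rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "<mnemonic>_v_<type>[_L<N>][_m]". Returns the length written,
 * or -1 if the buffer is too small. */
static int tile_intrinsic_name(char *buf, size_t cap, const char *mnemonic,
                               const char *type, unsigned lambda, bool masked)
{
    assert(buf != NULL && mnemonic != NULL && type != NULL);
    int n = snprintf(buf, cap, "%s_v_%s", mnemonic, type);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    if (lambda != 0) {                       /* optional immediate-lambda qualifier */
        int m = snprintf(buf + n, cap - (size_t)n, "_L%u", lambda);
        if (m < 0 || (size_t)m >= cap - (size_t)n)
            return -1;
        n += m;
    }
    if (masked) {                            /* _m is always the final qualifier */
        int m = snprintf(buf + n, cap - (size_t)n, "_m");
        if (m < 0 || (size_t)m >= cap - (size_t)n)
            return -1;
        n += m;
    }
    return n;
}
```

Because the two qualifiers are independent flags, all four combinations (neither, _L{N} only, _m only, _L{N}_m) fall out of the same code path.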
* unprivileged/integrated-matrix: Add typed OFP8/OFP4/Int4/BF16 tile load/store intrinsics
Extend the tile load/store intrinsic table to cover the alternate-format
input vector types (OFP8 E4M3 / E5M2, OFP4 E2M1, signed/unsigned Int4,
and BFloat16) so that input tiles can be loaded and stored without an
intervening vreinterpret.
Element widths match the underlying storage:
- 8 bits for OFP8
- 4 bits for OFP4 / Int4
- 16 bits for BF16
Base pointer type is uint8_t * for OFP8, OFP4, and Int4 (since these are
packed into byte-addressable memory), and __bf16 * for BF16.
The note clarifies that the masking (_m) and immediate-lambda (_L{N})
qualifiers extend to these alternate-format intrinsics on the same
orthogonal basis as for the IEEE FP and standard-int types, and that
the same expansion applies to the transposing variants vmttl.v / vmtts.v.
Closes a gap between the spec and what GCC emits today (gap #14;
commits 561cbb5a and 4d74ffa7 in vrull/ime-intrinsics).
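
The element widths listed above determine how many bytes a run of elements occupies behind the base pointer. The sketch below is a host-side illustration only: the enum and function names are hypothetical, and rounding an odd count of 4-bit elements up to a whole byte is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the alternate input formats named in this change. */
enum alt_format { FMT_OFP8, FMT_OFP4, FMT_INT4, FMT_BF16 };

/* Element width in bits, matching the underlying storage. */
static unsigned element_bits(enum alt_format f)
{
    switch (f) {
    case FMT_OFP8: return 8;
    case FMT_OFP4: return 4;
    case FMT_INT4: return 4;
    case FMT_BF16: return 16;
    }
    assert(!"unknown format");
    return 0;
}

/* Bytes needed for `count` elements, rounded up to a whole byte. */
static size_t packed_bytes(enum alt_format f, size_t count)
{
    return (count * element_bits(f) + 7) / 8;
}
```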
* unprivileged/integrated-matrix: Document three-token mixed names and _su_lm{N} examples
Two intrinsic-naming patterns were permitted by the spec grammar but
never illustrated, leaving the surface ambiguous in practice:
1. Three-token long-form names arise when both altfmt_A and altfmt_B
differ from the default *and* the accumulator type itself is the
alternative encoding (e.g. BF16 from vfwmmacc.vv, or non-default
OFP8 accumulator from non-widening vfmmacc.vv). Add three concrete
examples (vfwmmacc bf16<-E4M3xE5M2; vfmmacc OFP8<-E4M3xE5M2;
matching overloaded short form) and an explanatory paragraph that
states the token order: accumulator first, then A and B input
types in order.
2. The _su/_us mixed-sign suffix and the _lm{N} LMUL suffix combine
in the canonical order _su|_us followed by _lm{N}. Add three
examples covering vmmacc / vwmmacc / vqwmmacc, both _su and _us,
and LMUL = 2, 4, 8.
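
The ordering rule in item 2 can be sketched as a helper that emits only the qualifier tail (sign suffix first, then _lm{N}). The helper and enum names are hypothetical. The sketch does not build the full intrinsic name and does not validate which LMUL values the spec permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tags for the two mixed-sign forms. */
enum mixed_sign { SIGN_SU, SIGN_US };

/* Write "_su_lm<N>" or "_us_lm<N>" into buf. Returns the length written,
 * or -1 if the buffer is too small. */
static int sign_lmul_suffix(char *buf, size_t cap, enum mixed_sign s, unsigned lmul)
{
    assert(buf != NULL && lmul != 0);
    const char *sign = (s == SIGN_SU) ? "_su" : "_us";
    int n = snprintf(buf, cap, "%s_lm%u", sign, lmul);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```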
Both patterns are already implemented and tested in GCC; the spec now
shows them explicitly so users do not have to derive them from the
suffix grammar at line 1448.
1 parent e9ce69a commit eb2e37e
1 file changed
Lines changed: 204 additions & 1 deletion