unprivileged/integrated-matrix: Align C tile layout with authoritativ…#40
Merged
joseemoreira merged 1 commit into main on May 2, 2026
…e row-major prose

The geometry-section prose at lines 88-90 establishes that the C tile is row-major in vector registers. Several artefacts elsewhere in the spec treated C as column-major in registers, contradicting that authoritative position:

- Per-instruction descriptions (11 places across vmmacc.vv, vwmmacc.vv, vqwmmacc.vv, v8wmmacc.vv, vfmmacc.vv, vfwmmacc.vv, vfqwmmacc.vv, vf8wmmacc.vv, vfwimmacc.vv, vfqwimmacc.vv, vf8wimmacc.vv) all said "_vd_ as an M x N accumulator tile (column-major register layout)". Changed to "row-major register layout".
- Instruction-selection table at lines 1322-1327 paired the order-preserving load/store (vmtl.v / vmts.v) with column-major C and the transposing variants with row-major C. Swapped so order-preserving maps row-major-in-registers to row-major-in-memory and the transposing variants are used for column-major-in-memory C.
- GEMM example at lines 1336-1380 stored C column-major in memory using the order-preserving vmts.v. Rewritten to store C row-major in memory, with the index expression and stride-comment updated accordingly (i*ldc + j; ldc = N_total).
- SAIL mat_C_idx already used i * M + j with comment "C is stored row-major" - numerically correct (C is square so M = N), but conceptually misleading. Renamed the third parameter to N (the row stride) and updated all four callers (int_gemm, fp_gemm, fp_scaled_gemm, int_scaled_gemm) to pass g.N instead of g.M. No behaviour change.

The out-of-tree QEMU helper, the JIT path, and the example test references in examples/src/test-*.cxx remain non-conforming and are tracked separately.
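The point about mat_C_idx can be illustrated with a minimal sketch. This is illustrative Python, not the SAIL source: the function name `mat_c_idx` and the tile sizes are assumptions made for the example. It shows that a row-major index must use the column count N as the row stride, and that passing M instead only happens to work when the tile is square.

```python
def mat_c_idx(i, j, row_stride):
    """Linear index of C[i][j] for a row-major tile (hypothetical helper)."""
    return i * row_stride + j


def all_indices(m, n, row_stride):
    """Indices visited when walking an m x n tile row by row."""
    return [mat_c_idx(i, j, row_stride) for i in range(m) for j in range(n)]


# Square tile: M == N, so using M as the stride is numerically harmless.
square_with_n = all_indices(3, 3, row_stride=3)
square_with_m = all_indices(3, 3, row_stride=3)

# Non-square tile (M = 2, N = 3): only the stride N gives distinct indices.
rect_with_n = all_indices(2, 3, row_stride=3)  # [0, 1, 2, 3, 4, 5]
rect_with_m = all_indices(2, 3, row_stride=2)  # [0, 1, 2, 2, 3, 4], index 2 collides
```

With the stride named after the quantity it actually represents, the expression stays correct if the square-tile assumption is ever relaxed.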
joseemoreira (Collaborator) approved these changes on May 2, 2026 and left a comment:
Pretty extensive but regular changes.
I am a bit surprised that the tests passed, but I guess the confusion was "consistent" across the load/store and computational instructions. It would have been caught in an application-level test.
Collaborator
Author
In fact, the application-level tests assumed the wrong layout (given that the instructions specified it this way).
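A small sketch of why a consistent layout mix-up can go unnoticed by tests. It is plain Python with hypothetical `pack` and `unpack` helpers standing in for a store and a load; it does not model the actual vmtl.v / vmts.v semantics. When both sides agree on the layout, the round trip is exact whichever layout they picked, and only a consumer that assumes the other layout sees scrambled data.

```python
def pack(matrix, layout):
    """Flatten a 2-D list in the given layout ("row" or "col")."""
    rows, cols = len(matrix), len(matrix[0])
    if layout == "row":
        return [matrix[i][j] for i in range(rows) for j in range(cols)]
    return [matrix[i][j] for j in range(cols) for i in range(rows)]


def unpack(flat, rows, cols, layout):
    """Rebuild a rows x cols 2-D list from a flat buffer in the given layout."""
    if layout == "row":
        return [[flat[i * cols + j] for j in range(cols)] for i in range(rows)]
    return [[flat[j * rows + i] for j in range(cols)] for i in range(rows)]


c = [[1, 2, 3],
     [4, 5, 6]]

# Producer and consumer agree (both column-major): the round trip is exact.
consistent = unpack(pack(c, "col"), 2, 3, "col")

# Producer is column-major, consumer is row-major: the mismatch becomes visible.
mismatched = unpack(pack(c, "col"), 2, 3, "row")
```

A test whose reference data assumes the same layout as the code under test behaves like the `consistent` case, so it passes regardless of which layout the spec intends.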