RFC: Introduction of BGEMM and BGEMV for BFloat16 Matrix Operations in OpenBLAS #5155

nikhil-arm · 2025-02-27T18:00:06Z

RFC: Introduction of BGEMM and BGEMV for BFloat16 Matrix Operations in OpenBLAS

Author: Nikhil Gupta
Date: 2025-02-27
Status: Proposal / Draft

1. Abstract

This RFC proposes the addition of two new BLAS routines—BGEMM and BGEMV—to the OpenBLAS project. These routines perform matrix-matrix multiplication and matrix-vector multiplication, respectively, entirely in BFloat16 (BF16) precision. Unlike the existing sbgemm operation—which consumes BF16 inputs but produces FP32 outputs—the new operations will produce BF16 outputs. For architectures that lack native BF16 multiply–accumulate instructions with BF16 outputs, the implementation may perform accumulation in FP32 and subsequently convert the results to BF16.
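The FP32-accumulate-then-convert fallback mentioned above depends on a pair of BF16/FP32 conversions. A minimal sketch follows, assuming BF16 is stored as the upper 16 bits of an IEEE-754 binary32 value and that narrowing rounds to nearest even. The names `bf16_t`, `bf16_to_f32` and `f32_to_bf16` are hypothetical and are not OpenBLAS identifiers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a BF16 storage type: raw 16 bits. */
typedef uint16_t bf16_t;

/* Widen BF16 to FP32: place the 16 bits in the upper half of a binary32. */
static float bf16_to_f32(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow FP32 to BF16 with round-to-nearest-even (assumed rounding mode). */
static bf16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)   /* NaN: keep it a quiet NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);    /* round to nearest, ties to even */
    return (bf16_t)(bits >> 16);
}
```

Widening is exact, so all of the precision loss in the fallback path comes from the narrowing step.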

2. Motivation

Increased Precision Consistency: Many modern deep learning applications rely on BF16 for both inputs and outputs to reduce memory bandwidth and storage costs while maintaining adequate precision.
Hardware Evolution: With the ongoing evolution of hardware architectures, future systems might offer native BF16 operations. The proposed routines will provide a cleaner pathway to leverage such advancements without altering the API.
Performance Optimization: A dedicated BF16 routine can be optimized for architectures that either support native BF16 arithmetic or that can benefit from mixed-precision strategies (e.g., FP32 accumulation with BF16 conversion).

3. Proposed Changes

Introduce two new BLAS routines into OpenBLAS:

BGEMM: Performs matrix-matrix multiplication on BF16 matrices.
BGEMV: Performs matrix-vector multiplication on BF16 matrices.

Both routines will:

Accept BF16 input matrices/vectors.
Use BF16 scalars for scaling factors (alpha and beta).
Return the result in BF16 precision.
Internally, if necessary, perform FP32 accumulation followed by a conversion to BF16.
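The four points above can be sketched as a plain reference loop. This is an illustration only, not the planned OpenBLAS kernel: `ref_bgemm`, `bf16_t` and the conversion helpers are hypothetical names, arguments are passed by value for brevity, storage is assumed to be column-major, and only the 'N' and 'T' transpose options are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;  /* hypothetical stand-in for a BF16 storage type */

static float bf16_to_f32(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static bf16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)   /* NaN stays a quiet NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);    /* round to nearest even */
    return (bf16_t)(bits >> 16);
}

/* C = alpha * op(A) * op(B) + beta * C, column-major, op(X) = X or X^T.
 * BF16 inputs, BF16 scalars, BF16 output, FP32 accumulation inside. */
static void ref_bgemm(char transa, char transb, int m, int n, int k,
                      bf16_t alpha, const bf16_t *a, int lda,
                      const bf16_t *b, int ldb,
                      bf16_t beta, bf16_t *c, int ldc) {
    const int ta = (transa == 'T' || transa == 't');
    const int tb = (transb == 'T' || transb == 't');
    const float alpha_f = bf16_to_f32(alpha);
    const float beta_f = bf16_to_f32(beta);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;  /* FP32 accumulator for one dot product */
            for (int p = 0; p < k; ++p) {
                const bf16_t av = ta ? a[(size_t)i * lda + p]
                                     : a[(size_t)p * lda + i];
                const bf16_t bv = tb ? b[(size_t)p * ldb + j]
                                     : b[(size_t)j * ldb + p];
                acc += bf16_to_f32(av) * bf16_to_f32(bv);
            }
            bf16_t *cij = &c[(size_t)j * ldc + i];
            float out = alpha_f * acc;
            if (beta_f != 0.0f)          /* C is not read when beta is zero */
                out += beta_f * bf16_to_f32(*cij);
            *cij = f32_to_bf16(out);     /* one rounding to BF16 per element */
        }
    }
}
```

Keeping the accumulator in FP32 and rounding once per output element avoids compounding a BF16 rounding error on each of the k products.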

4. Proposed API Signatures

Below are examples of the proposed function signatures, modeled after the existing BLAS conventions:

BGEMM

void bgemm_(char *transa, char *transb, blasint *m, blasint *n, blasint *k,
            bfloat16 *alpha,
            const bfloat16 *a, blasint *lda,
            const bfloat16 *b, blasint *ldb,
            bfloat16 *beta,
            bfloat16 *c, blasint *ldc);

martin-frbg · 2025-02-27T19:12:33Z

No objection from me, in fact I remember that BGEMM was requested a few months ago (may have been in a discussion rather than an issue ticket though) - Edit: was part of the discussion in #4707 last summer

nikhil-arm · 2025-02-27T19:27:24Z

Thanks for your input, @martin-frbg.
We'll put together a reference implementation and tests for the new feature, then add an optimized kernel for aarch64.

conradsnicta · 2025-03-03T00:46:06Z

@nikhil-arm In the proposed bgemm_() function, shouldn't everything be marked as const, except for bfloat16 *c?

Only c is written to. All other variables are read-only. This should be explicitly described as such in the interface.
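For illustration, a const-qualified variant of the prototype could look like the sketch below. This is one possible reading of the suggestion, not an agreed interface. The two typedefs are stand-ins added so the fragment compiles on its own; in OpenBLAS the width of blasint is chosen at build time.

```c
#include <stdint.h>

typedef uint16_t bfloat16;  /* stand-in: assumed 16-bit storage type */
typedef int blasint;        /* stand-in: real width depends on the build */

/* Every argument except c is read-only, so only c stays non-const. */
void bgemm_(const char *transa, const char *transb,
            const blasint *m, const blasint *n, const blasint *k,
            const bfloat16 *alpha,
            const bfloat16 *a, const blasint *lda,
            const bfloat16 *b, const blasint *ldb,
            const bfloat16 *beta,
            bfloat16 *c, const blasint *ldc);
```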

nikhil-arm · 2025-03-03T10:07:42Z

Thanks for your input, @conradsnicta. This makes sense; we will try to accommodate it.