[TASK] Add A New Quant Ball for FP32-MXFP Conversion #26

@shirohasuki

Deliverables

  • Add an RTL implementation of the MXFP ball to the prototype lib (under the arch path).
  • Open a Pull Request (PR) containing a test written in C for this operation and a README introducing your design.
  • Report the performance results in this issue.

Task Description

  • MXFP is a lower-precision floating-point representation designed to reduce data size and simplify downstream computation. Using MXFP can improve throughput and hardware efficiency in bandwidth-sensitive workloads while maintaining acceptable numerical quality for many ML scenarios.
  • You can learn about this format and its variants starting from the paper "With Shared Microexponents, A Little Shifting Goes a Long Way".
  • As we envisage it, an FP32 matrix is loaded into the banks; your custom MXFP instruction then reads the data from one bank into the ball you implement, and the ball writes its output to another bank.
  • For implementation details, refer to the earlier Pull Request "Completed the development of ReluBall and further improved the operation manual" (#6).
