Support for bfloat16 #22
Hi Pavel, I took a quick glance at bfloat16. If I implement it, I think it would be in a separate project. There would have to be a convenient way for me to compare results with a hardware implementation; I'd like to be able to confirm 100% of float32<-->bfloat16 conversions. float16 was very convenient because the VM I use for coding had hardware instructions (F16C, aka FP16C).
Thank you for the quick response. Yes, it totally makes sense to create bfloat16 as a separate project (though IMHO most of the code will be very similar). As far as I know, bfloat16 is supported in AVX-512 via VCVTNE2PS2BF16, VCVTNEPS2BF16, and …
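The "NE" in those mnemonics stands for round-to-nearest-even. A minimal software sketch of that rounding mode, with placeholder package, type, and function names (not taken from any existing library), might look like:

```go
package bfloat16

import "math"

type BFloat16 uint16

// FromFloat32RNE converts float32 to bfloat16 with round-to-nearest-even,
// the rounding used by the AVX-512 BF16 conversion instructions.
// NaN is handled up front so the rounding carry cannot turn it into +/-Inf.
func FromFloat32RNE(x float32) BFloat16 {
	b := math.Float32bits(x)
	if b&0x7FFFFFFF > 0x7F800000 { // NaN: keep the sign, force a quiet, non-zero mantissa
		return BFloat16(uint16(b>>16) | 0x0040)
	}
	// Rounding bias: 0x7FFF plus the lowest kept bit, so exact ties round to even.
	b += 0x7FFF + (b>>16)&1
	return BFloat16(b >> 16)
}
```

The only difference from the plain truncating shift in the next comment is how the dropped low 16 bits influence the result.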
Looks to me like bfloat16 conversion to and from float32 is a simple and fast shift:

```go
package bfloat16

import (
	"math"
	"strconv"
)

// BFloat16 holds the upper 16 bits of an IEEE 754 float32: 1 sign, 8 exponent, 7 mantissa bits.
type BFloat16 uint16

// ToFloat32 widens x by restoring the dropped low 16 bits as zero.
func ToFloat32(x BFloat16) float32 {
	return math.Float32frombits(uint32(x) << 16)
}

// FromFloat32 narrows x by truncating the low 16 mantissa bits (no rounding to nearest even).
func FromFloat32(x float32) BFloat16 {
	return BFloat16(math.Float32bits(x) >> 16)
}

// FromBits and Bits convert between BFloat16 and its raw 16-bit pattern.
func FromBits(u16 uint16) BFloat16 { return BFloat16(u16) }
func Bits(f BFloat16) uint16 { return uint16(f) }

// String formats f as the float32 value it represents.
func (f BFloat16) String() string {
	return strconv.FormatFloat(float64(ToFloat32(f)), 'f', -1, 32)
}
```
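Because the shift-based conversion only zero-fills and then drops the low 16 bits, every bfloat16 bit pattern should survive a BFloat16 → float32 → BFloat16 round trip. A quick exhaustive check, as a sketch only, assuming the functions above sit in the same package (in a _test.go file) and that NaN payloads pass through float32 untouched on the target platform:

```go
package bfloat16

import "testing"

// TestRoundTripAllBits checks that every 16-bit pattern is unchanged by a
// BFloat16 -> float32 -> BFloat16 round trip using the functions above.
func TestRoundTripAllBits(t *testing.T) {
	for i := 0; i <= 0xFFFF; i++ {
		in := FromBits(uint16(i))
		if got := Bits(FromFloat32(ToFloat32(in))); got != uint16(i) {
			t.Fatalf("round trip changed 0x%04X to 0x%04X", uint16(i), got)
		}
	}
}
```

The lossy direction (float32 → bfloat16) cannot be verified this way; as noted above, that still needs case-by-case comparison against a hardware implementation.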
Support for bfloat16 is also requested in comments at #46.
Is creating a bfloat16 package still in your plans? I'm trying to port the Gemma model to GoMLX, and everything seems to be working. I'm using your suggested code above, since the truth is most numeric computations happen in XLA/GPU anyway, but it would be nice to source the type from the same owner 😄
As a temporary measure I created the simple … Frustrating, the patent-trolling story with bfloat16 … not sure what to make of it …
Thank you for making this very useful and well-tested library! Are you planning to add support for the bfloat16 format, which is used in the ML field? It has different bit widths for the mantissa and exponent, but the other rules are the same as in IEEE 754 formats.
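For context on those bit widths, a reference-only snippet (the constant names are illustrative, not part of any package):

```go
// Field widths in bits (sign / exponent / mantissa):
//   float32 (binary32): 1 / 8 / 23
//   bfloat16          : 1 / 8 / 7   (float32 with the low 16 mantissa bits dropped)
//   float16 (binary16): 1 / 5 / 10
const (
	bfloat16ExponentBits = 8 // same as float32, so the dynamic range is preserved
	bfloat16MantissaBits = 7 // vs. 23 for float32 and 10 for IEEE float16
)
```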