How to cast 16/32-bit to FP8? #965
Hi, how do I cast a float/bfloat16 tensor to FP8? I want to do W8A8 (FP8) quantization, but I couldn't find an example of quantizing activations to the FP8 format.
The easiest approach is to use native PyTorch FP8 dtypes:

```python
x = torch.randn(128, device="cuda", dtype=torch.float32)
y = x.to(dtype=torch.float8_e4m3fn)  # or torch.float8_e5m2
```

You could also use `te.Float8Tensor` or `float8_experimental.Float8Tensor`:

```python
scale = torch.ones(1, device="cuda", dtype=torch.float32)
y1 = te.Float8Tensor.to_float8(x)
y2 = float8_experimental.Float8Tensor.to_float8(x, scale, torch.float8_e4m3fn)
```

These classes are based on each other and have some nice convenience features (support for scaling factors, casting to higher precision for ops that don't support FP8, etc.).

Finally, you could directly use the FP8 kernels from Transformer Engine:

```python
y = te.cpp_extensions.cast_to_fp8(
    x,
    fp8_meta,
    0,
    transformer_engine_torch.DType.kFloat8E4M3,
)
```

I strongly advise against using these internal functions though. Their APIs are unstable, messy, and tightly integrated with TE's logic for computing FP8 scaling factors.
Thanks @timmoon10.
If you just want the performance benefit of FP8 matmuls, I recommend using Transformer Engine modules (like `te.Linear`). If you want more control, you'll have to get a bit into the weeds. I'm not sure if native PyTorch FP8 tensors support matmuls (even if they did, there would be numerical issues without FP8 scaling factors), but I see that …
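For reference, a rough sketch of that recommended route with Transformer Engine's PyTorch API might look like the following. `te.Linear`, `fp8_autocast`, and `DelayedScaling` are real TE entry points, but the recipe settings and shapes here are placeholder choices, not a tuned configuration:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# An ordinary TE Linear layer; its GEMMs run in FP8 inside the autocast region below.
linear = te.Linear(1024, 1024, bias=True).cuda()

x = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16)

# Placeholder recipe: E4M3 format with delayed scaling and a short amax history.
fp8_recipe = DelayedScaling(fp8_format=Format.E4M3, amax_history_len=16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = linear(x)  # inputs and weights are cast to FP8 internally; TE manages the scaling factors
```

This keeps the FP8 scaling-factor bookkeeping inside TE, so you get the FP8 GEMM speedup without touching the internal cast functions mentioned above.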