This repository has been archived by the owner on Aug 7, 2024. It is now read-only.
Commit
Add a Float8LinearInference module to support static, dynamic, and wo quant (#287)

Summary:

# Perf script:
https://gist.github.com/drisspg/f7a553710d64cce013227a2249d582d2

## Performance

In eager this produces:

| Operation | Time (μs) |
|-----------------------------------|------------|
| bf16 | 2667.9172 |
| fp8_dynamic_activations | 2494.7294 |
| fp8_static_activations | 2449.1784 |
| fp8_weight_only_activations | 4084.7190 |

With compile this produces:

| Operation | Time (μs) |
|------------------------------|------------|
| bf16 | 2547.1938 |
| fp8_dynamic_activations | 1542.0729 |
| fp8_static_activations | 1407.0310 |
| fp8_weight_only_activations | 2750.6369 |

## UX

#### Dynamic activation quantization

```Python
original_mlp = FeedForward().to("cuda", dtype=dtype)
original_mlp.reset_parameters()

dynamic_fp8_mlp = copy.deepcopy(original_mlp)
quant_config = QuantConfig(ActivationCasting.DYNAMIC)
quantize_to_float8(dynamic_fp8_mlp, quant_config)
```

#### Static activation quantization

```Python
original_mlp = FeedForward().to("cuda", dtype=dtype)
original_mlp.reset_parameters()

static_fp8_mlp = copy.deepcopy(original_mlp)
quant_config = QuantConfig(
    ActivationCasting.STATIC,
    static_quantization_scale=torch.tensor(
        [1.0], device="cuda", dtype=torch.float32
    ),
)
quantize_to_float8(static_fp8_mlp, quant_config)
```

#### Weight Only quantization

```Python
original_mlp = FeedForward().to("cuda", dtype=dtype)
original_mlp.reset_parameters()

wo_fp8_mlp = copy.deepcopy(original_mlp)
quant_config = QuantConfig(ActivationCasting.WEIGHT_ONLY)
quantize_to_float8(wo_fp8_mlp, quant_config)
```

All of these use per-tensor scaling. A follow-up PR will add row-wise scaling and likely make it the default.

Pull Request resolved: #287

Reviewed By: vkuzo

Differential Revision: D59179113

Pulled By: drisspg

fbshipit-source-id: 7938efbcbc51109d2ff7261275ca04d1b90732d3
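The commit message contrasts per-tensor scaling with the row-wise scaling planned for a follow-up. As a rough illustration of the difference, the sketch below computes both kinds of scale in plain Python. It is not the library's implementation: the helper names, the `EPS` floor, and the `max_value / amax` convention are assumptions made for the example. The only fixed fact is that 448 is the largest finite value of the float8 e4m3fn format.

```python
# Illustrative sketch only: plain-Python scale computation,
# not the code used by Float8LinearInference.
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn
EPS = 1e-12       # hypothetical floor to avoid dividing by zero


def per_tensor_scale(matrix):
    """Return one scale for the whole matrix: E4M3_MAX / max(|x|)."""
    amax = max(abs(v) for row in matrix for v in row)
    return E4M3_MAX / max(amax, EPS)


def row_wise_scales(matrix):
    """Return one scale per row: E4M3_MAX / max(|x|) over that row."""
    return [E4M3_MAX / max(max(abs(v) for v in row), EPS) for row in matrix]


# Hypothetical weights: the second row is much smaller than the first.
weights = [
    [0.5, -2.0, 1.0],
    [0.015625, 0.03125, -0.0625],
]

tensor_scale = per_tensor_scale(weights)  # driven by the largest entry, 2.0
row_scales = row_wise_scales(weights)     # each row gets its own scale
```

With a single per-tensor scale, the small second row is scaled by the same factor as the first and so occupies only a sliver of the float8 range. Row-wise scales let each row use the full range, at the cost of storing one scale per row.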
1 parent 0b60496, commit 36405a7
Showing 15 changed files with 559 additions and 36 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
| | | `@@ -5,7 +5,6 @@` |
| | | `# LICENSE file in the root directory of this source tree.` |
| | | `import collections` |
| | | `import json` |
| | | `import re` |