
Conversation

@shimizust (Contributor) commented Nov 3, 2025

What does this PR do?

  • If you use accelerate with transformers-based Trainers and launch the training job using the config-file based approach, fp8 via torchao doesn't work properly
  • This PR sets reasonable defaults when torchao is specified via accelerate configs, specifically enable_fsdp_float8_all_gather=True and pad_inner_dim=True
  • Added the ability to set these params via the CLI or accelerate configs, for example (a Python sketch of the equivalent defaults follows the examples below):
mixed_precision: fp8
fp8_config:
  backend: AO

or

fp8_config:
  backend: AO
  pad_inner_dim: true
  enable_fsdp_float8_all_gather: true
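
For reference, here is a minimal Python sketch of what these defaults correspond to when configuring Accelerate directly from code instead of a config file. The AORecipeKwargs / Float8LinearConfig wiring below illustrates the same settings and is not taken verbatim from this PR:

from accelerate import Accelerator
from accelerate.utils import AORecipeKwargs
from torchao.float8 import Float8LinearConfig

# Same defaults this PR applies when "backend: AO" is selected in the config file
ao_config = Float8LinearConfig(
    pad_inner_dim=True,                  # pad matmul dims to multiples of 16 for torch._scaled_mm
    enable_fsdp_float8_all_gather=True,  # cast params to FP8 before the FSDP2 all-gather
)
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[AORecipeKwargs(config=ao_config)],
)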

Fixes #3830

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

  • Did you read the contributor guideline,
    Pull Request section?

  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.

  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.

  • Did you write any new necessary tests?

  • Ran pytest tests/test_fp8.py -v successfully

Who can review?

@shimizust marked this pull request as ready for review November 3, 2025 18:30
@SunMarc (Member) left a comment

Thanks a lot for the changes, really appreciate it ! Left a few minor comments to make it better !

Comment on lines 317 to 324
The configuration for the FP8 training. If `None`, a default config will be created with sensible
defaults for most use cases:
- `pad_inner_dim=True`: Pads matrix dimensions to be divisible by 16, required for `torch._scaled_mm`
operations to prevent runtime errors.
- `enable_fsdp_float8_all_gather=True`: Enables FP8 all-gather for FSDP2. This provides memory bandwidth
savings by casting parameters before the all-gather operation, saving 50% bandwidth compared to BF16.
You can override these defaults by providing your own `Float8LinearConfig` instance.
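
As a quick illustration of the override mentioned above, a hedged sketch of building a custom Float8LinearConfig (how it is passed in depends on your setup, e.g. via AORecipeKwargs as in the earlier sketch):

from torchao.float8 import Float8LinearConfig

# Custom config: keep the padding but opt out of the FP8 all-gather behavior
custom_config = Float8LinearConfig(
    pad_inner_dim=True,
    enable_fsdp_float8_all_gather=False,
)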
@SunMarc (Member)

Nice, maybe we can also allow users to easily change that with an env var, and update the cluster.py file, which is responsible for the behavior of accelerate config? Here's a PR that should help with the changes: #2983

import os

env_prefix = "ACCELERATE_FP8_"
# Environment values come in as strings, so parse them into booleans explicitly
enable_fsdp_float8_all_gather = os.environ.get(env_prefix + "ENABLE_FSDP_FLOAT8_ALL_GATHER", "true").lower() == "true"
pad_inner_dim = os.environ.get(env_prefix + "PAD_INNER_DIM", "true").lower() == "true"

@shimizust (Contributor Author)

Makes sense, thanks for the reference. Will add that

@shimizust (Contributor Author)

Updated @SunMarc

@shimizust (Contributor Author) commented Dec 1, 2025

Hi @SunMarc let me know if there's anything else needed

@SunMarc (Member)

nope everything looks good !

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member) left a comment

That's really nice, thanks for fixing this !

@shimizust (Contributor Author)

@SunMarc Sorry, had to fix a style issue. Can you re-approve?

@SunMarc merged commit 75983a5 into huggingface:main Dec 3, 2025
24 of 25 checks passed

Development

Successfully merging this pull request may close these issues.

Torchao fp8 fails if using accelerate config file with Trainer

3 participants