Add BlackForest Flux Support #815
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This PR is stale because it has been open 15 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Massive pull request, impressive! I only have a few minor optional comments, and I think one test should be renamed.
Eventually, we should think about how some of the ModelBuilder-related code can be reused for non-decoder models: it is unclear to me yet whether there are specifics that would prevent that, though.
```python
neuron_model = model_builder.trace(initialize_model_weights=False)

model_builder.shard_checkpoint(serialize_path=output.parent / "weights/")
```
Eventually this could be omitted: the weights can be sharded at loading time only, which makes export a lot faster. You only need to remember where the weights are (local dir or hub).
Do you mean we shard the weights at loading time instead of after the tracing? Indeed, it increases the whole export time, but I would rather spend more time exporting in one shot and have faster loading + warmup during deployment.
Maybe in your case it is not that bad, because sharding does not mean loading the weights on the device, does it?
It is just that you don't cache the sharded weights, so it is only useful when you want to push the exported model to the hub.
This is what I do for decoders:
- when using the optimum-cli: export (cache or fetch NEFFs)
- when using from_pretrained: export (cache or fetch NEFFs) + load_weights
For large models like llama 70B this makes a huge difference as loading weights takes several minutes.
Na na, `shard_checkpoint` just shards the weights; then we either serialize them to disk like I do here or load them to Neuron.
Yeah, I should definitely look into the NEFF cache part of ModelBuilder (not yet the case here) and the work you have done that I could reuse. Will do it next!
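To make the decoder flow described above concrete, here is a minimal sketch of the `from_pretrained` path (export plus weight loading in one call), assuming the standard `NeuronModelForCausalLM` API; the model id and shape values are only examples.

```python
from optimum.neuron import NeuronModelForCausalLM

# Compiles the model (or fetches cached NEFFs from the hub), then loads the
# weights onto the Neuron devices. For very large models the weight loading
# is the slow part, hence the caching discussion above.
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # example model id
    export=True,
    batch_size=1,
    sequence_length=4096,
    num_cores=24,
)
```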
tengomucho left a comment:
Few nits, otherwise LGTM!
Flux is a series of text-to-image generation models based on diffusion transformers.

> [!TIP]
> We recommend using an `inf2.24xlarge` instance with tensor parallel size 8 for the model compilation and inference.
why 24x if we only do TP8?
# What does this PR do?

This PR will allow Flux Kontext to be used for text2img by building upon the newly added Flux support here: #815

Note: This depends on `diffusers >0.34` and the following PR to be merged: huggingface/diffusers#11985

@tengomucho @JingyaHuang

Co-authored-by: JingyaHuang <huang_jingya@outlook.com>
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

What does this PR do?
Fixes #763, #676
Compilation (TP=8 for the Flux transformer 2D) with `neuronx_distributed.trace.model_builder.ModelBuilder`

Export via CLI
```bash
optimum-cli export neuron --model black-forest-labs/FLUX.1-dev --tensor_parallel_size 8 --batch_size 1 --height 768 --width 1360 --num_images_per_prompt 1 --torch_dtype bfloat16 flux_neuron/
```

```bash
optimum-cli export neuron --model hf-internal-testing/tiny-flux-pipe-gated-silu --tensor_parallel_size 2 --batch_size 1 --height 8 --width 8 --num_images_per_prompt 1 --sequence_length 256 --torch_dtype bfloat16 tiny_flux_neuron/
```
Export with `NeuronFluxPipeline` API

You can find an example of compiled artifacts here (`Jingya/flux.1-dev_neuronx_tp8`).
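A minimal sketch of what the API-based export could look like, assuming `NeuronFluxPipeline.from_pretrained` accepts `export=True` together with the same shape and parallelism arguments as the CLI flags above; treat the exact keyword names as assumptions:

```python
import torch
from optimum.neuron import NeuronFluxPipeline

# Compile FLUX.1-dev with the same shapes as the CLI example, then save the
# compiled artifacts so they can be reloaded (or pushed to the hub) later.
pipe = NeuronFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    export=True,
    tensor_parallel_size=8,
    batch_size=1,
    height=768,
    width=1360,
    num_images_per_prompt=1,
    torch_dtype=torch.bfloat16,
)
pipe.save_pretrained("flux_neuron/")
```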
Inference

For generating an image with `NeuronFluxPipeline`:
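(A minimal sketch, assuming the pipeline follows the usual diffusers-style call signature once the compiled artifacts, e.g. the repo above, are loaded; the prompt and step count are only examples.)

```python
from optimum.neuron import NeuronFluxPipeline

# Load pre-compiled artifacts and generate a single image.
pipe = NeuronFluxPipeline.from_pretrained("Jingya/flux.1-dev_neuronx_tp8")
image = pipe(
    prompt="Astronaut riding a horse on Mars, cinematic lighting",
    num_inference_steps=28,
).images[0]
image.save("flux_dev_neuron.png")
```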
Other
cc. @yahavb