Add an HLO backend for LLM models #775
base: main
Conversation
This backend is a backport of features previously implemented in the AWS Neuron SDK transformers-neuronx package.
When using the new HLO backend, the graphs will be slightly modified, so we bump the dev version to avoid trying to reuse the cached test artifacts from the previous dev version.
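A hypothetical sketch of why the bump matters, assuming cached artifacts are looked up under a key that includes the package version (the key layout below is illustrative, not the project's actual scheme):

```python
# Illustrative only: with the version in the key, artifacts compiled by the
# previous dev version can no longer be matched against the new, slightly
# different graphs, so they are recompiled instead of reused.
def artifact_cache_key(model_id: str, package_version: str, input_shapes: tuple) -> str:
    shapes = "x".join(str(dim) for dim in input_shapes)
    return f"{model_id}-{package_version}-{shapes}"
```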
The name of the class is confusing, as there are already NeuronConfig classes in the AWS Neuron SDK (both in NxDI and TnX).
Force-pushed from 370867d to e415a8a.
These tests are taking some time, so it is better to have them separated.
Force-pushed from e415a8a to c510221.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I only had an overview of the backend implementation; the rest LGTM.
optimum/exporters/neuron/__main__.py (outdated)

```diff
@@ -724,7 +730,7 @@ def main():
             submodels = None
         else:
             input_shapes, neuron_config_class = get_input_shapes_and_config_class(task, args)
-            if NeuronDecoderConfig in inspect.getmro(neuron_config_class):
+            if NeuronDecoderExportConfig in inspect.getmro(neuron_config_class):
```
If `is_transformers_neuronx_available()` returns `False`, then `NeuronDecoderExportConfig` will not be defined.
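A sketch of the guard the reviewer seems to be suggesting (hypothetical; the actual fix in the PR may differ): only reference the class when the optional dependency is installed, so the name is never evaluated while undefined.

```python
import inspect

# Hypothetical guard: NeuronDecoderExportConfig only exists when the optional
# transformers-neuronx dependency is installed, so check availability first to
# avoid a NameError on the class reference.
if is_transformers_neuronx_available() and NeuronDecoderExportConfig in inspect.getmro(neuron_config_class):
    ...  # decoder-specific export path
else:
    ...  # generic export path
```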
The HF_ENDPOINT variable is not always taken into account by the huggingface_hub client, depending on the order of imports. This modifies the tests to create temporary directories under the testing user account instead.
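The import-order pitfall can be sketched as follows: huggingface_hub reads HF_ENDPOINT once at import time, so the variable must be exported before the first import (the staging URL below is just an example value):

```python
import os

# HF_ENDPOINT must be set before huggingface_hub is imported, because the
# library reads the variable once at import time.
os.environ["HF_ENDPOINT"] = "https://hub-ci.huggingface.co"  # example endpoint

import huggingface_hub  # picks up the endpoint set above
```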
Force-pushed from 836271c to 9299f94.
What does this PR do?
This backend is a backport of features first implemented in the transformers-neuronx package from the AWS Neuron SDK. Like the original transformers-neuronx implementation, it relies on XLA High Level Operations (HLO) as the compilation language for implementing Neuron-optimized transformer decoder classes. More specifically, it uses a syntax called "PyHLO", the name of an internal Neuron tool for writing and compiling the HLO language in Python.
See backends/hlo/README.md for details. The Llama, Granite and Qwen2 models that previously used transformers-neuronx now use this new backend directly.
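To give a flavor of the style, here is a minimal sketch of a PyHLO-style function, based on the patterns visible in the public transformers-neuronx code; the `scribe` helper and the exact method names should be treated as illustrative rather than as the backend's guaranteed API:

```python
# Minimal PyHLO-style sketch (illustrative): graphs are written as plain Python
# functions that receive a "scribe" handle and emit one HLO instruction per call.
def add_graph(scribe):
    f32 = scribe.f32  # dtype handle; indexing with a shape yields a shaped type
    lhs = f32[2, 2].Parameter(parameter_number=0)  # declare the first input
    rhs = f32[2, 2].Parameter(parameter_number=1)  # declare the second input
    return f32[2, 2].Add(lhs, rhs)  # element-wise sum is the graph output
```

Writing graphs this way keeps the decoder implementations in pure Python while still compiling down to the HLO that the Neuron compiler consumes.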