@riccardofelluga
Collaborator

What does this PR do?

After the latest update to the transformer_engine library (2.10), this executor stopped working. Since TEv2 now provides the same functionality and more, it is time to move on :D

Collaborator

Would there be some script and/or test to make sure there aren't unexpected regressions?

Collaborator Author

For convergence, the tests already compare outputs between the Thunder TE executor and vanilla TE. For performance, there are no dedicated tests, only the benchmarks. To compare performance, we can use benchmark_litgpt.py, something like this:

# Using TE executor
python thunder/benchmarks/benchmark_litgpt.py --model_name Llama-2-7b-hf --compile thunder --checkpoint_activations False --low_precision_mode fp8-default-te --use_sdpa False
# Using TE without Thunder
python thunder/benchmarks/benchmark_litgpt.py --model_name Llama-2-7b-hf --compile eager --checkpoint_activations False --low_precision_mode fp8-default-te --use_sdpa False
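
The two invocations above differ only in the --compile value, so they could be generated from one helper to keep the runs comparable. The following is a minimal sketch, assuming it is launched from the repository root; the names build_command and run_both are hypothetical and not part of Thunder, and the flags are copied verbatim from the commands above. It only prints (and optionally launches) the runs and does not parse or compare the benchmark output.

```python
import shlex
import subprocess

# Path taken from the commands above; assumes the current directory is the repo root.
BENCHMARK_SCRIPT = "thunder/benchmarks/benchmark_litgpt.py"

# Flags shared by both runs, copied from the commands above.
COMMON_FLAGS = [
    "--checkpoint_activations", "False",
    "--low_precision_mode", "fp8-default-te",
    "--use_sdpa", "False",
]


def build_command(compile_mode: str) -> list[str]:
    """Build the benchmark command for one --compile value ("thunder" or "eager")."""
    return [
        "python", BENCHMARK_SCRIPT,
        "--model_name", "Llama-2-7b-hf",
        "--compile", compile_mode,
        *COMMON_FLAGS,
    ]


def run_both(dry_run: bool = True) -> list[list[str]]:
    """Print both commands; launch them one after the other unless dry_run is set."""
    commands = [build_command(mode) for mode in ("thunder", "eager")]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing run instead of continuing silently.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default, so nothing is launched without an explicit opt-in.
    run_both(dry_run=True)
```

Calling run_both(dry_run=False) on a machine with FP8-capable hardware would execute the Thunder run first and the eager run second; the reported numbers would still need to be compared by hand.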

Collaborator

@kshitij12345 left a comment

Woohoo 🎉, thanks @riccardofelluga

class Context:
    """Helper to use torch.autograd.Function as an implementation for a symbol.
    This class provides a minimal interface for saving and retrieving tensors between
Collaborator

for saving tensors and other metadata

@kshitij12345
Collaborator

I still see the CI job test_transformer_engine_v1_executor.py waiting, and it is tagged as required. Does something need to be changed in the GitHub settings (since this PR removes the CI job from the corresponding .yml)?

Collaborator

@KaelanDt left a comment

thank you @riccardofelluga

@KaelanDt merged commit 9642dee into main on Nov 24, 2025
51 checks passed
@KaelanDt deleted the remove-te-v1-ex branch on November 24, 2025 11:03
@riccardofelluga
Collaborator Author

> I still see the CI job test_transformer_engine_v1_executor.py waiting, and it is tagged as required. Does something need to be changed in the GitHub settings (since this PR removes the CI job from the corresponding .yml)?

I've changed the CI file, but maybe the CI will only pick that up after this PR is merged 🤔
