te_llama.py example fails with transformers 4.57+ #2567

@sbhavani

Describe the bug

The te_llama.py example in docs/examples/te_llama/ fails with an AttributeError when used with a recent version of HuggingFace transformers (observed with 4.57.3). TELlamaDecoderLayer.forward() receives hidden_states as a tuple instead of a tensor, and the error is raised when TransformerLayer calls .contiguous() on it. This appears to be a compatibility issue between the example code and newer versions of the transformers library.

Steps/Code to reproduce bug

  1. Start the NVIDIA PyTorch container:

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.08-py3

  2. Clone TransformerEngine and navigate to the example:

git clone https://github.com/NVIDIA/TransformerEngine.git
cd TransformerEngine/docs/examples/te_llama

  3. Install required dependencies:

pip install accelerate datasets peft

  4. Run the tutorial_accelerate_hf_llama_with_te.ipynb notebook, specifically the cells that call init_te_llama_model().

Error message

File "/workspace/TransformerEngine/docs/examples/te_llama/te_llama.py", line 76, in forward
    super().forward(
File "/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/transformer.py", line 700, in forward
    hidden_states = hidden_states.contiguous()
                    ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'contiguous'
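
For readers without a GPU at hand, the last frame of the traceback can be reproduced in isolation. The sketch below is only an illustration: FakeTensor, make_contiguous and try_call are hypothetical stand-ins (not part of Transformer Engine or PyTorch) that mimic the single call hidden_states.contiguous() shown above.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only mimics .contiguous()."""

    def contiguous(self):
        return self


def make_contiguous(hidden_states):
    # Same call as the failing line in the traceback.
    return hidden_states.contiguous()


def try_call(value):
    """Return 'ok' on success, or the AttributeError message on failure."""
    try:
        make_contiguous(value)
        return "ok"
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_call(FakeTensor()))     # a tensor-like object is accepted
    print(try_call((FakeTensor(),)))  # a 1-tuple triggers the AttributeError
```

The point is only that the failure depends on the type of the argument, not on its contents: any tuple, even one wrapping a valid tensor, fails at this call.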

Expected behavior

The TELlamaDecoderLayer should handle hidden_states input correctly and complete the forward pass without errors.

Environment overview

nvcr.io/nvidia/pytorch:25.08-py3

Environment details

Component                    Version
Transformer Engine           2.5.0+f05f12c
transformers (HuggingFace)   4.57.3
Python                       3.12
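
To make reports like this easier to compare, the versions can be collected programmatically. This is a minimal sketch using only the standard library; the function name collect_versions is hypothetical, and the distribution names passed in the usage section ("transformer_engine", "transformers") are assumptions about how the packages are registered in a given environment.

```python
import sys
from importlib import metadata


def collect_versions(distributions):
    """Map each distribution name to its installed version string."""
    versions = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust to match the local environment.
    for name, version in collect_versions(["transformer_engine", "transformers"]).items():
        print(f"{name}: {version}")
```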

Additional context

• The error suggests that hidden_states is arriving as a tuple rather than a tensor in the TELlamaDecoderLayer.forward() method.
• This may be related to changes in how HuggingFace transformers handles decoder layer outputs internally in newer versions.
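
One way the two points above could fit together is sketched below. It assumes, without having verified it against transformers 4.57, that the caller now passes each decoder layer's return value straight to the next layer, while the example layer still returns a 1-tuple. Every name here (FakeTensor, TupleReturningLayer, TolerantLayer, unwrap_hidden_states, run_stack) is a hypothetical stand-in, not code from te_llama.py or transformers.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only mimics .contiguous()."""

    def contiguous(self):
        return self


class TupleReturningLayer:
    """Layer that wraps its output in a 1-tuple (assumed older convention)."""

    def forward(self, hidden_states):
        return (hidden_states.contiguous(),)


def unwrap_hidden_states(value):
    """Return the first element if value is a tuple or list, else value itself."""
    if isinstance(value, (tuple, list)):
        return value[0]
    return value


class TolerantLayer(TupleReturningLayer):
    """Same layer, but it accepts either a tensor or a tuple as input."""

    def forward(self, hidden_states):
        return super().forward(unwrap_hidden_states(hidden_states))


def run_stack(layers, hidden_states):
    # Assumed caller behaviour: each output is fed to the next layer unindexed.
    for layer in layers:
        hidden_states = layer.forward(hidden_states)
    return hidden_states


if __name__ == "__main__":
    x = FakeTensor()
    try:
        run_stack([TupleReturningLayer(), TupleReturningLayer()], x)
    except AttributeError as exc:
        print("second layer failed:", exc)
    out = run_stack([TolerantLayer(), TolerantLayer()], x)
    print("tolerant stack returned the tensor:", unwrap_hidden_states(out) is x)
```

Under this assumption the first layer succeeds and the second one fails, which would be consistent with the traceback. Unwrapping the input is a defensive workaround; returning a bare tensor from the layer is another option, and which one is appropriate depends on what the calling code in transformers actually expects.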

Labels: bug
