
Safetensors files for long-t5-tglobal models fail to load correctly #40635

Description

@pszemraj

System Info

  • transformers version: 4.55.4
  • Platform: Linux-6.1.123+-x86_64-with-glibc2.35
  • Python version: 3.12.11
  • Huggingface_hub version: 0.34.4
  • Safetensors version: 0.6.2
  • Accelerate version: 1.10.1
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.8.0+cu126 (CUDA)
  • Tensorflow version (GPU?): 2.19.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.10.6 (gpu)
  • Jax version: 0.5.3
  • JaxLib version: 0.5.3
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: No
  • GPU type: NVIDIA L4

Who can help?

@ArthurZucker @younesbelkada @amyeroberts (model loading)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

desc/info

Original discussion: https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary/discussions/22

After several rounds of investigation and attempted fixes, there is an issue with loading and running inference on long-t5 models in transformers that previously worked. The simplest way to characterize the issue today is that weights cannot be loaded from .safetensors files, resulting in garbage output.
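
As a quick sanity check (my addition for illustration, not from the original discussion), huggingface_hub can list which weight files a given repo actually ships; repos that only contain model.safetensors cannot fall back to the .bin workaround described below.

from huggingface_hub import list_repo_files

# List the weight files shipped by the repo (illustrative check).
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
weight_files = [f for f in list_repo_files(model_id) if f.endswith((".safetensors", ".bin"))]
print(weight_files)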

repro

Run inference with any long-t5 model in safetensors format

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# This uses safetensors by default and produces incorrect behavior
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
)

# Test generation - model produces garbage output
text = "Summarize: The quick brown fox jumps over the lazy dog. " * 50
inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# thea: and to of
# Output is corrupted/nonsensical

Worth calling out: when the model is loaded, it displays a warning that some weights were not loaded from the checkpoint and were newly initialized:

Some weights of LongT5ForConditionalGeneration were not initialized from the model checkpoint at test_model and are newly initialized: ['decoder.embed_tokens.weight', 'encoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
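
To see where those "missing" weights come from, the tensor names stored in the safetensors file can be inspected directly (an illustrative check I'm adding here, not part of the original report; in T5-style models encoder.embed_tokens and decoder.embed_tokens are normally tied to the shared embedding):

from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download only the safetensors weights and list the stored tensor names.
path = hf_hub_download("pszemraj/long-t5-tglobal-base-sci-simplify", "model.safetensors")
with safe_open(path, framework="pt") as f:
    keys = list(f.keys())

# If only "shared.weight" is stored and there are no encoder/decoder
# embed_tokens keys, the warning points at a failure to re-tie the
# embeddings at load time rather than at a corrupted file.
print("shared.weight" in keys)
print([k for k in keys if "embed_tokens" in k])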

NOTE: I've put the replication and expected functionality in this colab notebook.

Expected behavior

The model should load correctly and produce coherent summaries. It does when explicitly loading from the PyTorch (.bin) checkpoint:

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    use_safetensors=False  # Force PyTorch checkpoint (works)
)

Full example of the working case:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Forcing the PyTorch checkpoint (use_safetensors=False) produces correct behavior
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    use_safetensors=False  # Force PyTorch checkpoint (works)
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
)

text = "Summarize: The quick brown fox jumps over the lazy dog. " * 50
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# SUMMARZE: The Quick Brown Fox Jumps Over The Lazy Dog.
# Imperfect (given the model's size/quality) but coherent

Some later-created models (e.g. this checkpoint) ship only safetensors weights with no .bin file, so I'm not sure whether they can be loaded correctly at all.
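
For those safetensors-only checkpoints, one possible (untested) workaround, based purely on the warning above, is to re-point the encoder/decoder embeddings at the shared embedding after loading; I have not confirmed this restores correct output:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("pszemraj/long-t5-tglobal-base-sci-simplify")

# Re-tie the input embeddings to the shared embedding matrix, which is what
# the .bin loading path appears to do. Untested guess, not a confirmed fix.
model.encoder.embed_tokens.weight = model.shared.weight
model.decoder.embed_tokens.weight = model.shared.weight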
