Description
System Info
- transformers version: 4.55.4
- Platform: Linux-6.1.123+-x86_64-with-glibc2.35
- Python version: 3.12.11
- Huggingface_hub version: 0.34.4
- Safetensors version: 0.6.2
- Accelerate version: 1.10.1
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.8.0+cu126 (CUDA)
- Tensorflow version (GPU?): 2.19.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.10.6 (gpu)
- Jax version: 0.5.3
- JaxLib version: 0.5.3
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: No
- GPU type: NVIDIA L4
Who can help?
@ArthurZucker @younesbelkada @amyeroberts (model loading)
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
desc/info
Original discussion: https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary/discussions/22
After several back-and-forth investigations and attempted fixes, there is an issue with loading and running inference with long-t5 models in transformers that used to work. The simplest way to characterize the issue today is that weights are not loaded correctly from .safetensors files, which results in garbage output.
repro
Run inference with any long-t5 model in safetensors format:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# This uses safetensors by default and produces incorrect behavior
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
model = AutoModelForSeq2SeqLM.from_pretrained(
model_id,
)
tokenizer = AutoTokenizer.from_pretrained(
model_id,
)
# Test generation - model produces garbage output
text = "Summarize: The quick brown fox jumps over the lazy dog. " * 50
inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# thea: and to of
# Output is corrupted/nonsensical
Worth calling out that when the model is loaded, it displays a warning telling you that some weights were newly initialized instead of loaded from the checkpoint:
Some weights of LongT5ForConditionalGeneration were not initialized from the model checkpoint at test_model and are newly initialized: ['decoder.embed_tokens.weight', 'encoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
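To see what the safetensors checkpoint actually contains, the tensor names stored in the file can be listed directly. This is a minimal sketch on my side; it assumes the repo ships a single model.safetensors file (adjust the filename for sharded checkpoints), and the guess that LongT5 keeps its embeddings under shared.weight is mine, not confirmed here:
from huggingface_hub import hf_hub_download
from safetensors import safe_open
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
path = hf_hub_download(model_id, "model.safetensors")
# List every tensor key stored in the file and filter for embedding-related names
with safe_open(path, framework="pt") as f:
    keys = list(f.keys())
print([k for k in keys if "embed" in k or "shared" in k])
# If only a shared embedding tensor is stored and the encoder/decoder embed_tokens
# keys are absent, the warning above would be consistent with the loader no longer
# tying/copying that tensor when loading from safetensors.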
Note: I've put replication and expected functionality in this colab notebook.
Expected behavior
The model should load correctly and produce coherent summaries. This works when explicitly loading from the PyTorch checkpoint:
model = AutoModelForSeq2SeqLM.from_pretrained(
model_id,
use_safetensors=False # Force PyTorch checkpoint (works)
)
In this case:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# Forcing the PyTorch (.bin) checkpoint produces correct behavior
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
model = AutoModelForSeq2SeqLM.from_pretrained(
model_id,
use_safetensors=False # Force PyTorch checkpoint (works)
)
tokenizer = AutoTokenizer.from_pretrained(
model_id,
)
text = "Summarize: The quick brown fox jumps over the lazy dog. " * 50
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# SUMMARZE: The Quick Brown Fox Jumps Over The Lazy Dog.
# Imperfect (model size/quality) but sensical
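To quantify the discrepancy between the two load paths, a quick sketch (assuming the repo contains both the .safetensors and .bin files) is to compare the input embedding matrices directly:
import torch
from transformers import AutoModelForSeq2SeqLM
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
m_sft = AutoModelForSeq2SeqLM.from_pretrained(model_id)  # safetensors (default, garbage output)
m_bin = AutoModelForSeq2SeqLM.from_pretrained(model_id, use_safetensors=False)  # .bin (works)
# If the safetensors path re-initializes embed_tokens, the two should not match
print(torch.allclose(m_sft.get_input_embeddings().weight,
                     m_bin.get_input_embeddings().weight))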
In some cases, for later-created models (ex: this checkpoint), the repo does not have a .bin file at all, so forcing use_safetensors=False is not an option; I'm unsure whether those are dead in the water or if there is a workaround.
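For safetensors-only repos, a possible workaround sketch (untested; it assumes the only tensors being dropped are the embeddings, that the checkpoint stores them under shared.weight, and uses the earlier model_id only as a stand-in for the safetensors-only checkpoint):
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import AutoModelForSeq2SeqLM
model_id = "pszemraj/long-t5-tglobal-base-sci-simplify"
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
# Load the raw safetensors state dict and copy the shared embedding into the
# slots the warning reports as newly initialized
sd = load_file(hf_hub_download(model_id, "model.safetensors"))
if "shared.weight" in sd:
    with torch.no_grad():
        model.shared.weight.copy_(sd["shared.weight"])
        model.encoder.embed_tokens.weight.copy_(sd["shared.weight"])
        model.decoder.embed_tokens.weight.copy_(sd["shared.weight"])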