Description
I am using the example from the README.md, but I am running into this error.
Arch Linux btw
AMD64
NVIDIA RTX 4070 Ti SUPER
python -m venv venv
source venv/bin/activate
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
pip freeze
accelerate==1.10.0
certifi==2025.8.3
charset-normalizer==3.4.3
diffusers @ git+https://github.com/huggingface/diffusers.git@bb1d9a8b7523819b1846053616ddfecc3b857f6b
filelock==3.19.1
fsspec==2025.7.0
hf-xet==1.1.8
huggingface-hub==0.34.4
idna==3.10
importlib_metadata==8.7.0
Jinja2==3.1.6
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.5
numpy==2.3.2
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.3
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvtx-cu12==12.8.90
packaging==25.0
pillow==11.3.0
psutil==7.0.0
PyYAML==6.0.2
regex==2025.7.34
requests==2.32.5
safetensors==0.6.2
scipy==1.16.1
setuptools==80.9.0
sympy==1.14.0
tokenizers==0.21.4
torch==2.8.0
tqdm==4.67.1
transformers==4.55.3
triton==3.4.0
typing_extensions==4.14.1
urllib3==2.5.0
zipp==3.23.0
python test.py
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00, 10.73it/s]
Expected types for language_model: (<class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>,), got <class 'transformers.models.gpt2.modeling_gpt2.GPT2Model'>.
Traceback (most recent call last):
  File "/run/media/mkp/T7/rvc2/test.py", line 10, in <module>
    audio = pipe(prompt, num_inference_steps=200, audio_length_in_s=5.0).audios[0]
            ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/mkp/T7/rvc2/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/run/media/mkp/T7/rvc2/venv/lib/python3.13/site-packages/diffusers/pipelines/audioldm2/pipeline_audioldm2.py", line 1004, in __call__
    prompt_embeds, attention_mask, generated_prompt_embeds = self.encode_prompt(
                                                             ~~~~~~~~~~~~~~~~~~^
        prompt,
        ^^^^^^^
    ...<11 lines>...
        max_new_tokens=max_new_tokens,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/run/media/mkp/T7/rvc2/venv/lib/python3.13/site-packages/diffusers/pipelines/audioldm2/pipeline_audioldm2.py", line 527, in encode_prompt
    generated_prompt_embeds = self.generate_language_model(
        projected_prompt_embeds,
        attention_mask=projected_attention_mask,
        max_new_tokens=max_new_tokens,
    )
  File "/run/media/mkp/T7/rvc2/venv/lib/python3.13/site-packages/diffusers/pipelines/audioldm2/pipeline_audioldm2.py", line 324, in generate_language_model
    model_kwargs = self.language_model._get_initial_cache_position(**cache_position_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/mkp/T7/rvc2/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
        f"'{type(self).__name__}' object has no attribute '{name}'"
    )
AttributeError: 'GPT2Model' object has no attribute '_get_initial_cache_position'
from diffusers import AudioLDM2Pipeline
import torch
import scipy
repo_id = "cvssp/audioldm2"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "Techno music with a strong, upbeat tempo and high melodic riffs."
audio = pipe(prompt, num_inference_steps=200, audio_length_in_s=10.0).audios[0]
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
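The warning and the AttributeError both point at the class of the loaded language_model component, so a small check along these lines might help confirm which class the pipeline actually holds and whether it exposes the method that generate_language_model calls. This is only a sketch: describe_component, FakeBaseModel and FakeHeadModel are hypothetical names made up for illustration, and the two fake classes are stand-ins so the snippet runs without downloading any weights. They are not the real transformers models.

```python
def describe_component(component, required_attr):
    """Return (class name, whether the object exposes required_attr)."""
    return type(component).__name__, hasattr(component, required_attr)


# Hypothetical stand-ins: one class without the method, one with it.
# They mimic the situation in the traceback, not the real library classes.
class FakeBaseModel:
    pass


class FakeHeadModel:
    def _get_initial_cache_position(self, *args, **kwargs):
        return kwargs


for stand_in in (FakeBaseModel(), FakeHeadModel()):
    name, has_method = describe_component(stand_in, "_get_initial_cache_position")
    print(f"{name}: has _get_initial_cache_position = {has_method}")
```

With the real pipeline from the script above, the same helper could be called as describe_component(pipe.language_model, "_get_initial_cache_position") right after from_pretrained, before the pipe(...) call that fails.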