I tried running inference with InspireMusic-Base and everything works, but when I try the InspireMusic-1.5B model it fails with this error:
Exception in thread Thread-8 (llm_job):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/content/InspireMusic/inspiremusic/cli/model.py", line 148, in llm_job
    for i in self.llm.inference(**inference_kwargs):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 57, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "/content/InspireMusic/inspiremusic/llm/llm.py", line 374, in inference
    top_ids = self.sampling_ids(logp, out_tokens, ignore_eos=i < min_len).item()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
The list of tensors is empty for UUID: 3e0648bc-3981-11f0-b707-0242ac1c000c
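To pin down where the assert actually fires, I could rerun with synchronous CUDA launches and sanity-check the sampled ids on CPU first. A minimal sketch (check_token_range and vocab_size are just illustrative names, not anything from the repo):

import os

# Set before torch initializes CUDA so the device-side assert is reported at the
# failing kernel launch instead of at a later synchronization point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

def check_token_range(tokens: torch.Tensor, vocab_size: int) -> None:
    # Hypothetical helper: device-side asserts during sampling are often caused
    # by token ids outside the embedding range, so validating them on CPU gives
    # a readable error instead of the opaque CUDA assert.
    bad = (tokens < 0) | (tokens >= vocab_size)
    if bad.any():
        raise ValueError(f"token ids out of range [0, {vocab_size}): {tokens[bad].tolist()}")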
Do you have any idea why this could be happening?
Thanks for your work on the repo!
P.S. I don't think it's related, but I updated qwen_encoder.py to use attn_implementation="eager" instead of attn_implementation="flash_attention_2" (I just switched the default to eager).
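For reference, the change was essentially along these lines (a paraphrase; the exact model class and checkpoint path used in qwen_encoder.py may differ):

from transformers import AutoModelForCausalLM

# Paraphrase of my local edit: load the Qwen backbone with eager attention
# instead of flash-attention-2. The model id below is only a placeholder;
# the repo loads its own checkpoint path.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",               # placeholder model id
    attn_implementation="eager",     # was: "flash_attention_2"
    torch_dtype="auto",
)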