
demo.py does not work correctly #147

Open
chongkuiqi opened this issue Oct 22, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@chongkuiqi

Notice: In order to resolve issues more efficiently, please raise the issue following the template.

🐛 Bug

When I run demo.py, the error is:

Traceback (most recent call last):
  File "/home/haige/ckq/arm-llm-dev/arm_crl/multimodal/sensevoice.py", line 18, in <module>
    res = model.generate(
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 303, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 376, in inference_with_vad
    res = self.inference(
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 342, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/models/fsmn_vad_streaming/model.py", line 690, in inference
    audio_sample = torch.cat((cache["prev_samples"], audio_sample_list[0]))
TypeError: expected Tensor as element 1 in argument 0, but got str
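For context, this exception means the first element of `audio_sample_list` is still the raw input path string when it reaches `torch.cat`, i.e. the audio file was never decoded into a waveform before the VAD stage. A dependency-free sketch of a guard that would surface this earlier with a clearer message (`check_decoded` is a hypothetical helper, not part of the FunASR API):

```python
def check_decoded(samples):
    """Raise a descriptive error if an audio sample is still a path string.

    The VAD stage expects decoded waveforms (tensors/arrays). If the loader
    is skipped, the raw input path reaches concatenation and fails with an
    opaque TypeError; checking up front makes the cause obvious.
    """
    for i, s in enumerate(samples):
        if isinstance(s, str):
            raise TypeError(
                f"sample {i} is still a string ({s!r}); "
                "the audio file was never decoded into a waveform"
            )
    return samples

# check_decoded(["example/en.mp3"]) raises the descriptive TypeError,
# while a list of decoded sample values passes through unchanged.
```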

Code sample

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "/home/haige/ckq/arm-llm-dev/arm_crl/multimodal/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    trust_remote_code=True,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  #
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

Expected behavior

Environment

  • OS : Ubuntu 20.04
  • FunASR Version : 1.1.12
  • ModelScope Version : 1.15.0
  • PyTorch Version : 2.2.2+cu121
  • How you installed funasr: pip
  • Python version: 3.10
  • GPU : NVIDIA 3090
  • CUDA/cuDNN version : cuda12.1
@chongkuiqi chongkuiqi added the bug Something isn't working label Oct 22, 2024
@chongkuiqi
Author

Problem solved: don't use the model from Hugging Face; use the model auto-downloaded from ModelScope.
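The fix in code form: keep the demo script above but pass the ModelScope model ID instead of a locally downloaded Hugging Face snapshot, so FunASR fetches a matching configuration whose loader decodes the audio path before the VAD stage. A minimal sketch, assuming the `iic/SenseVoiceSmall` ID from the SenseVoice README (the actual call needs funasr installed, a GPU, and network access):

```python
def transcribe(model_id: str, wav: str) -> str:
    """Load SenseVoiceSmall via FunASR from ModelScope and transcribe one file.

    Passing a ModelScope ID such as "iic/SenseVoiceSmall" lets FunASR
    auto-download the model, avoiding the TypeError seen with the
    Hugging Face copy.
    """
    from funasr import AutoModel
    from funasr.utils.postprocess_utils import rich_transcription_postprocess

    model = AutoModel(
        model=model_id,           # ModelScope ID, auto-downloaded
        trust_remote_code=True,
        vad_model="fsmn-vad",
        vad_kwargs={"max_single_segment_time": 30000},
        device="cuda:0",
    )
    res = model.generate(
        input=wav,
        cache={},
        language="auto",
        use_itn=True,
        batch_size_s=60,
        merge_vad=True,
        merge_length_s=15,
    )
    return rich_transcription_postprocess(res[0]["text"])
```

Usage would then be `transcribe("iic/SenseVoiceSmall", "example/en.mp3")`; everything except the model source is unchanged from the failing script.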
