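I'd like to load my fully trained model and run prediction on a single audio file. The functions in the package seem to be built for testing on large datasets, so I've been writing my own script by piecing things together, but I keep getting errors, hence this question. Is there a way to test with just one wav file and see the predicted transcript text? Which part of the package should I refer to? Sorry to bother you.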
Here is my source code:
```python
import torch
import torchaudio
import torchaudio.transforms as T
from omegaconf import DictConfig

from openspeech.models import MODEL_REGISTRY
from openspeech.tokenizers import TOKENIZER_REGISTRY


def hydra_main(configs: DictConfig) -> None:
    use_cuda = configs.eval.use_cuda and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    tokenizer = TOKENIZER_REGISTRY[configs.tokenizer.unit](configs)
    model = MODEL_REGISTRY[configs.model.model_name]
    model = model.load_from_checkpoint(
        configs.eval.checkpoint_path, configs=configs, tokenizer=tokenizer
    )
    model.to(device)

    audio_path = "/home/net/바탕화면/tool/sample/test.wav"
    waveform, sample_rate = torchaudio.load(audio_path)  # waveform: (channels, samples)

    # Process audio
    mel_transform = T.MelSpectrogram(sample_rate=sample_rate, n_mels=80)
    mel_spectrogram = mel_transform(waveform)  # (channels, n_mels, frames)
    input_tensor = mel_spectrogram.unsqueeze(0)

    # Compute input_lengths
    input_lengths = torch.tensor([mel_spectrogram.shape[1]])

    # Run inference
    with torch.no_grad():
        # (batch size, sequence length, feature dim) -> (sequence length, batch size, feature dim)
        input_tensor = input_tensor.transpose(1, 0)
        outputs = model(input_tensor, input_lengths=input_lengths)

    # Convert predicted tokens to text using the tokenizer
    predicted_tokens = outputs["predictions"][0].argmax(dim=-1).tolist()
    predicted_sentence = tokenizer.decode(predicted_tokens)

    # Print predicted sentence
    print(f"Predicted Sentence: {predicted_sentence}")
```
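A separate problem is likely to surface once the forward pass succeeds. Assuming `outputs["predictions"]` already holds token IDs (in openspeech's encoder-decoder models it appears to be computed as `logits.max(-1)[1]`; worth confirming in your model's `forward`), then `outputs["predictions"][0].argmax(dim=-1)` collapses the whole ID sequence into a single index. The sketch below uses random logits, and a hypothetical `toy_decode` with a made-up `id2vocab` and `EOS_ID` in place of the real tokenizer, to show the difference.

```python
import torch

torch.manual_seed(0)
vocab_size, seq_len = 6, 4
logits = torch.randn(1, seq_len, vocab_size)  # (batch, seq_len, vocab)

# Assumption: the model's forward already reduces logits to token IDs like this.
predictions = logits.max(-1)[1]  # (batch, seq_len), int64 token IDs

# What the question's code does: argmax over a 1-D tensor of IDs returns one
# position (a 0-dim tensor), not a sequence of tokens.
collapsed = predictions[0].argmax(dim=-1)

# What decoding needs: the full ID sequence of the first (only) utterance.
token_ids = predictions[0]

# Hypothetical stand-in for tokenizer.decode: map IDs to characters, stop at EOS.
id2vocab = {0: "<pad>", 1: "<sos>", 2: "<eos>", 3: "a", 4: "b", 5: "c"}
EOS_ID = 2


def toy_decode(labels: torch.Tensor) -> str:
    chars = []
    for label in labels:
        if label.item() == EOS_ID:
            break
        chars.append(id2vocab[label.item()])
    return "".join(chars)


print(collapsed, token_ids, toy_decode(token_ids))
```

With the real tokenizer, the corresponding call would probably be `tokenizer.decode(outputs["predictions"][0].cpu())`. Passing a tensor rather than the result of `.tolist()` also seems safer, since openspeech's character tokenizers appear to call `.item()` on each label.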
Here is the error output:
```
Traceback (most recent call last):
  File "/home/net/바탕화면/tool/tool_stt_test.py", line 147, in <module>
    hydra_main(configs)
  File "/home/net/바탕화면/tool/tool_stt_test.py", line 45, in hydra_main
    outputs = model(input_tensor, input_lengths=input_lengths)
  File "/home/net/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/net/바탕화면/supspeaker/tool/openspeech/models/openspeech_encoder_decoder_model.py", line 136, in forward
    encoder_outputs, encoder_logits, encoder_output_lengths = self.encoder(inputs, input_lengths)
  File "/home/net/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/net/바탕화면/supspeaker/tool/openspeech/encoders/lstm_encoder.py", line 121, in forward
    conv_outputs = nn.utils.rnn.pack_padded_sequence(inputs.transpose(0, 1), input_lengths.cpu())
  File "/home/net/.local/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 263, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: Expected `len(lengths)` to be equal to batch_size, but got 1 (batch_size=2)
```
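A likely cause, judging from the traceback: `lstm_encoder.py` calls `inputs.transpose(0, 1)` and then `pack_padded_sequence` with the default `batch_first=False`, so it expects `inputs` shaped `(batch, seq_len, feat_dim)` with one length per batch item. With a stereo WAV, `mel_spectrogram` is `(2, 80, frames)`; after `unsqueeze(0)` and `transpose(1, 0)` it becomes `(2, 1, 80, frames)`, so the encoder sees a batch of 2 but receives only one length (and that length is `shape[1]`, i.e. `n_mels`, not the frame count). The sketch below reproduces this with random tensors instead of real audio. It assumes the file is stereo (which `batch_size=2` suggests), and `stereo_mel` and `pack_like_encoder` are illustrative names, not openspeech APIs.

```python
import torch
import torch.nn as nn

n_mels, frames = 80, 120
# Random stand-in for mel_transform(waveform) on a 2-channel (stereo) file.
stereo_mel = torch.randn(2, n_mels, frames)  # (channels, n_mels, frames)


def pack_like_encoder(inputs: torch.Tensor, input_lengths: torch.Tensor):
    # Same call as lstm_encoder.py line 121 in the traceback:
    # (batch, seq_len, ...) -> (seq_len, batch, ...), then pack with batch_first=False.
    return nn.utils.rnn.pack_padded_sequence(inputs.transpose(0, 1), input_lengths.cpu())


# Layout built in the question: unsqueeze(0) then transpose(1, 0).
asker_inputs = stereo_mel.unsqueeze(0).transpose(1, 0)  # (2, 1, n_mels, frames)
asker_lengths = torch.tensor([stereo_mel.shape[1]])     # [80]: n_mels, not frames

error_message = ""
try:
    pack_like_encoder(asker_inputs, asker_lengths)
except RuntimeError as err:
    error_message = str(err)  # same RuntimeError as in the traceback

# One possible fix: down-mix channels, then put time before the mel bins.
mono_inputs = stereo_mel.mean(dim=0, keepdim=True).transpose(1, 2)  # (1, frames, n_mels)
mono_lengths = torch.tensor([mono_inputs.shape[1]])                  # [frames]
packed = pack_like_encoder(mono_inputs, mono_lengths)
print(error_message)
print(packed.data.shape)
```

Applied to `hydra_main`, this would roughly mean dropping the `transpose(1, 0)` step, building `input_tensor` as `mel_spectrogram.mean(dim=0, keepdim=True).transpose(1, 2).to(device)`, setting `input_lengths = torch.tensor([input_tensor.shape[1]])`, and calling `model.eval()` after loading. Correct shapes alone may not give a sensible transcript: the features should also match what the model was trained on (the audio settings in the openspeech config), which a default `T.MelSpectrogram(sample_rate=sample_rate, n_mels=80)` without log scaling may not.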