Error when processing long Japanese videos #238
Comments
I've changed it to this for now as a stopgap. Looking forward to a more complete way of handling this kind of problem.
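Japanese support isn't great yet. First, Whisper punctuates Japanese poorly; second, LLM support for Japanese isn't perfect either. I'll try swapping in a different wav2vec alignment model later.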
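I stumbled my way through a 3-hour musical and overall the result is decent. I can DM you the finished product if you'd like to see it. I ran it on a Sonnet API endpoint I found myself.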
2024-11-07 21:11:45.306 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\deepf\anaconda3\envs\videolingo\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
result = func()
File "C:\Users\deepf\anaconda3\envs\videolingo\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 590, in code_to_exec
exec(code, module.__dict__)
File "C:\Users\deepf\Desktop\VideoLingo\VideoLingo\st.py", line 117, in <module>
main()
File "C:\Users\deepf\Desktop\VideoLingo\VideoLingo\st.py", line 113, in main
text_processing_section()
File "C:\Users\deepf\Desktop\VideoLingo\VideoLingo\st.py", line 30, in text_processing_section
process_text()
File "C:\Users\deepf\Desktop\VideoLingo\VideoLingo\st.py", line 47, in process_text
step3_1_spacy_split.split_by_spacy()
File "C:\Users\deepf\Desktop\VideoLingo\VideoLingo\core\step3_1_spacy_split.py", line 17, in split_by_spacy
split_by_mark(nlp)
File "C:\Users\deepf\Desktop\VideoLingo\VideoLingo\core\spacy_utils\split_by_mark.py", line 21, in split_by_mark
doc = nlp(input_text)
File "C:\Users\deepf\anaconda3\envs\videolingo\lib\site-packages\spacy\language.py", line 1037, in __call__
doc = self._ensure_doc(text)
File "C:\Users\deepf\anaconda3\envs\videolingo\lib\site-packages\spacy\language.py", line 1128, in _ensure_doc
return self.make_doc(doc_like)
File "C:\Users\deepf\anaconda3\envs\videolingo\lib\site-packages\spacy\language.py", line 1120, in make_doc
return self.tokenizer(text)
File "C:\Users\deepf\anaconda3\envs\videolingo\lib\site-packages\spacy\lang\ja\__init__.py", line 56, in __call__
sudachipy_tokens = self.tokenizer.tokenize(text)
Exception: Tokenization error: Input is too long, it can't be more than 49149 bytes, was 116123
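
The last line of the traceback suggests one possible workaround: cut the transcript into pieces that each stay under the 49149-byte UTF-8 limit before handing them to the tokenizer. Below is a minimal sketch of that idea, not the fix used in VideoLingo. The names `split_by_bytes`, `MAX_BYTES`, and `BREAK_CHARS` are hypothetical, and the choice of break characters is an assumption.

```python
from typing import List

# Limit copied from the error message above (SudachiPy input size, in bytes).
MAX_BYTES = 49149
# Assumption: characters after which it is acceptable to cut Japanese text.
BREAK_CHARS = "。！？!?\n"


def split_by_bytes(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text into chunks whose UTF-8 size is at most max_bytes.

    Each chunk ends right after the last break character that fits;
    if a chunk contains no break character, it is cut at the byte limit.
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")

    chunks: List[str] = []
    start, n = 0, len(text)
    while start < n:
        size = 0
        end = start
        last_break = -1
        # Grow the chunk one character at a time while it still fits.
        while end < n:
            ch_size = len(text[end].encode("utf-8"))
            if size + ch_size > max_bytes:
                break
            size += ch_size
            end += 1
            if text[end - 1] in BREAK_CHARS:
                last_break = end
        # If text remains, back up to the last sentence boundary seen.
        if end < n and last_break > start:
            end = last_break
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk could then be tokenized separately, for example with `nlp.pipe(chunks)` in place of the single `nlp(input_text)` call shown in the traceback. How the per-chunk results should be merged back inside `split_by_mark.py` is not covered here.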