pandas.arrays_to_mgr raises "All arrays must be of the same length" #270

Open
Brzjomo opened this issue Nov 18, 2024 · 7 comments

Comments

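Brzjomo commented Nov 18, 2024

For some videos, translation fails at step5 on pd.DataFrame({'Source': src, 'Translation': remerged}).to_excel(OUTPUT_REMERGED_FILE, index=False), with arrays_to_mgr reporting that all arrays must be of the same length.
I checked, and the file "output/log/translation_results_remerged.xlsx" appears to be used only for audio-only translation, so for now, after commenting out the code related to step5 and step6, the video translation task finishes correctly.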
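The failure comes from building a two-column table out of lists with different lengths. As a minimal sketch (not the project's code), a length check placed before the DataFrame call would report which side is short instead of surfacing the pandas internals. The helper name check_same_length and the sample lines are assumptions made for illustration.

```python
def check_same_length(src, translated):
    """Raise a descriptive error if the two line lists differ in length."""
    if len(src) != len(translated):
        raise ValueError(
            f"line count mismatch: {len(src)} source lines vs "
            f"{len(translated)} translated lines"
        )


# Hypothetical data: the translated side has lost one line.
src = ["Hello.", "How are you?", "Goodbye."]
remerged = ["你好。", "再见。"]

try:
    check_same_length(src, remerged)
    # Only reached when the lengths match; at this point a call such as
    # pd.DataFrame({'Source': src, 'Translation': remerged}) would not
    # raise the length error.
except ValueError as exc:
    message = str(exc)
    print(message)
```

Commenting out step5 and step6 hides the mismatch, while a check like this only makes it easier to see.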

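@Huanshere
Owner

Which LLM are you using? With the recommended models I almost never see the translation come back with fewer lines. This check exists to keep the final subtitles stable.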

asu-gkg commented Nov 19, 2024

Qwen2-72B-Instruct runs into this problem.

@Huanshere
Owner

😂 For now the default recommendation is still claude; Qwen still has a long way to go to catch up.

lanxichan commented Nov 25, 2024

I get this error with OpenAI's gpt-4o as well. The strange part is that for the same YouTube video, 360p works fine, while 720 (an option I added myself) and 1080 both fail.

2024-11-25 10:47:17.855 Uncaught app exception
Traceback (most recent call last):
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 116, in <module>
    main()
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 112, in main
    text_processing_section()
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 30, in text_processing_section
    process_text()
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 54, in process_text
    step5_splitforsub.split_for_sub_main()
  File "/Users/dinochan/bs/ai/VideoLingo/core/step5_splitforsub.py", line 104, in split_for_sub_main
    pd.DataFrame({'Source': src_lines, 'Translation': tr_lines}).to_excel("output/log/translation_results_for_subtitles.xlsx", index=False)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/frame.py", line 778, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")

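@Huanshere
Owner

Hahaha, this has nothing to do with resolution. It is probably a probabilistic failure: gpt4o may not have returned a complete response, or it may have dropped a sentence.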
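If the model sometimes returns fewer lines than it was given, one possible mitigation is to compare line counts and ask again. The sketch below is an assumption for illustration only: translate_with_retry and flaky_translate are hypothetical names, and flaky_translate merely stands in for an LLM call that drops a line on its first attempt.

```python
def translate_with_retry(lines, translate_fn, max_attempts=3):
    """Call translate_fn until it returns one output line per input line."""
    result = []
    for _ in range(max_attempts):
        result = translate_fn(lines)
        if len(result) == len(lines):
            return result
    raise RuntimeError(
        f"translation still had {len(result)} lines for {len(lines)} inputs "
        f"after {max_attempts} attempts"
    )


# Hypothetical stand-in for an LLM call: loses the last line on the first call only.
calls = {"count": 0}


def flaky_translate(lines):
    calls["count"] += 1
    translated = [f"<{line}>" for line in lines]
    return translated[:-1] if calls["count"] == 1 else translated


out = translate_with_retry(["a", "b", "c"], flaky_translate)
print(out, calls["count"])
```

A retry only helps when each attempt is a fresh request; replaying a stored response would return the same short result every time.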

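@lanxichan

> Hahaha, this has nothing to do with resolution. It is probably a probabilistic failure: gpt4o may not have returned a complete response, or it may have dropped a sentence.

That is what I thought too. I only found it odd because it reproduced consistently on consecutive runs during my testing. 😂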

@Huanshere
Owner

gpt_log records every response, and on a rerun the history is read back from it, so if you rerun without deleting the log you will still get the same error~
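To illustrate why a stored response makes the error repeat, here is a small sketch of a response cache. It is not VideoLingo's gpt_log implementation: the JSON file layout, the cached_call helper, and the prompts are all assumptions chosen for the example.

```python
import json
import os
import tempfile


def cached_call(prompt, call_fn, cache_path):
    """Return the stored response for prompt if present, else call and store it."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    if prompt in cache:
        return cache[prompt]  # replayed from history, call_fn is never invoked
    response = call_fn(prompt)
    cache[prompt] = response
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f, ensure_ascii=False)
    return response


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "responses.json")  # hypothetical log file
    prompt = "translate 3 lines"

    # First run: the model returns an incomplete answer, which gets stored.
    first = cached_call(prompt, lambda p: ["line 1", "line 2"], path)
    # Second run: the model would now answer correctly, but the cache wins.
    second = cached_call(prompt, lambda p: ["line 1", "line 2", "line 3"], path)
    # After deleting the log, the fresh answer is used.
    os.remove(path)
    third = cached_call(prompt, lambda p: ["line 1", "line 2", "line 3"], path)

print(len(first), len(second), len(third))
```

Under these assumptions, the second run returns the same incomplete answer as the first, which matches the consistent reproduction described above.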
