Question: Maximum text length #114

Open
gggdroa opened this issue Dec 8, 2023 · 5 comments

Comments

@gggdroa

gggdroa commented Dec 8, 2023

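1. What is the maximum input length for m3e? Is it counted in characters or in tokens?
2. For long texts, would later similarity computations work better if I split them into short sentences and store those in the embedding library?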

@wangyuxinwhy
Owner

  1. It is counted in tokens; the maximum is 512 tokens.
  2. Yes, splitting the text into chunks works better.
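A minimal sketch of the chunking step, assuming Chinese text where one character is treated as roughly one token, so a character budget (the hypothetical parameter `max_chars`) stands in for the 512-token limit with some headroom. The function name `split_into_chunks` and the sentence-boundary regex are illustrative choices, not part of m3e; for an exact count, check each chunk with the model's own tokenizer.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    Assumption: max_chars is a rough proxy for the 512-token limit
    (about one token per Chinese character); the real token count
    depends on the tokenizer, so leave headroom below 512.
    """
    # Split right after Chinese/ASCII sentence-ending punctuation or a newline,
    # keeping the punctuation attached to its sentence; drop blank pieces.
    sentences = [s for s in re.split(r"(?<=[。！？!?；;\n])", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the budget is hard-split into pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new chunk when adding this sentence would exceed the budget.
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then embedded and stored as its own entry, so a query is matched against passages that fit within the model's input limit instead of being silently truncated.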

@gggdroa
Author

gggdroa commented Dec 27, 2023

OK, thanks.

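For question-answering retrieval, do I just compute the embeddings directly? No prompt sentence is needed, right?
The scenario: a user enters a question, and the relevant text passages are returned.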

@wangyuxinwhy
Owner

Yes, that's right. Just compute them directly; no prompt sentence is needed.
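A minimal sketch of the question-to-passage retrieval flow described above: the question and the passages are embedded as-is, with no prompt prefix, and passages are ranked by similarity to the question. The tiny 2-D vectors are made-up placeholders for real m3e embeddings, and cosine similarity as well as the helper names `cosine` and `top_k` are assumptions for illustration, not the project's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    passage_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k passage ids most similar to the query, best first."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Placeholder vectors: in practice these would be m3e embeddings of the raw
# question text and of each stored passage chunk (no instruction prefix).
passages = {"p1": [1.0, 0.0], "p2": [0.6, 0.8], "p3": [0.0, 1.0]}
query = [1.0, 0.1]
print(top_k(query, passages, k=2))
```

Passage embeddings only need to be computed once when the library is built; at search time only the user's question is embedded and compared against them.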

@twwch

twwch commented Jan 4, 2024

Roughly how many characters, or Chinese characters, do 512 tokens correspond to?

@wangyuxinwhy
Owner

Roughly 512 Chinese characters.

3 participants