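Hello developers, I have a question about constructing fine-tuning samples.

Background: for a given query, there is a set of docs to be ranked. These docs also differ in quality (even among positive samples, some are better than others). Can a sample be used as a positive in one record and as a negative in another? For example, from the data shown in the image attached to the original issue, could I construct training data like this:

{"query":"刘亦菲","pos":["刘亦菲百度百科"],"neg":["刘亦菲已出道10年","刘一飞的微博"]}
{"query":"刘亦菲","pos":["刘亦菲已出道10年"],"neg":["刘一飞的微博","华语歌手"]}

If so, will the model be confused because "刘亦菲已出道10年" appears as both a positive and a negative sample?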
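Hello, @LawsonAbs. Training data like this does cause some problems when training with the in-batch negatives strategy. We recommend that queries within the same dataset not be too similar to each other. If a single query has multiple positives, they can be merged into one training record, i.e. `pos` contains multiple positive examples. If you need to account for quality differences among the positives, we recommend scoring the training data with a reranker and using those scores for distillation during training.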
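The merging step suggested above can be sketched as follows. This is only a minimal illustration, not the maintainers' tooling: the helper name `merge_by_query` is hypothetical, the records are assumed to be dicts with `query`, `pos`, and `neg` keys as in the question, and dropping from `neg` any doc that also appears in `pos` for the same query is an assumption about how to resolve the conflict.

```python
def merge_by_query(records):
    """Merge records that share a query into one record with multiple positives.

    Hypothetical helper: assumes each record is a dict with "query", "pos", "neg".
    """
    merged = {}
    for rec in records:
        entry = merged.setdefault(
            rec["query"], {"query": rec["query"], "pos": [], "neg": []}
        )
        # Collect docs per field, preserving first-seen order and skipping duplicates.
        for key in ("pos", "neg"):
            for doc in rec[key]:
                if doc not in entry[key]:
                    entry[key].append(doc)

    # Assumption: a doc that is positive for a query should not also be its negative.
    for entry in merged.values():
        positives = set(entry["pos"])
        entry["neg"] = [doc for doc in entry["neg"] if doc not in positives]

    return list(merged.values())


records = [
    {"query": "刘亦菲", "pos": ["刘亦菲百度百科"], "neg": ["刘亦菲已出道10年", "刘一飞的微博"]},
    {"query": "刘亦菲", "pos": ["刘亦菲已出道10年"], "neg": ["刘一飞的微博", "华语歌手"]},
]
merged = merge_by_query(records)
```

With the two records from the question, this yields a single record whose `pos` holds both "刘亦菲百度百科" and "刘亦菲已出道10年", and whose `neg` holds only "刘一飞的微博" and "华语歌手". The relative quality of the two positives is lost in this form, which is where the reranker-based distillation mentioned above would come in.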