
Why is there repetitive content in the prompt #56

Open
knightmarehs opened this issue Sep 20, 2024 · 3 comments

Comments

@knightmarehs
Contributor

For example, the Goal section and the Target response length and format section are repeated.
Is this intentional repetition, or some kind of error?

PROMPTS[
"local_rag_response"
] = """---Role---

You are a helpful assistant responding to questions about data in the tables provided.

---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Do not include information where the supporting evidence for it is not provided.

---Target response length and format---

{response_type}

---Data tables---

{context_data}

---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.

If you don't know the answer, just say so. Do not make anything up.

Do not include information where the supporting evidence for it is not provided.

---Target response length and format---

{response_type}

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.
"""

@rangehow
Contributor

This is an interesting topic. This prompt is copied directly from the Microsoft/graphrag repository (https://github.com/microsoft/graphrag/blob/16b4ea5dc9c3c74ec6d97b1551db4631b40949ff/graphrag/query/structured_search/local_search/system_prompt.py#L6-L69). I don't think it is a mistake, as repeating the instruction twice may actually lead to performance improvements (https://arxiv.org/abs/2402.15449), perhaps by introducing a certain degree of 'bidirectionality' into the expression. Although the paper I linked focuses on embeddings, the idea of repeating a prompt twice has also been discussed somewhat in general QA. (I forget the paper lol)

@knightmarehs
Contributor Author

@rangehow thanks for sharing. If repeating requests in a prompt really does enhance the model's understanding and generation capabilities, that is quite remarkable. In the long run, though, if models become sufficiently capable, we may no longer need such techniques (after all, it seems a bit odd and also consumes extra tokens).
I will try to run a comparison of the two variants when I have time.
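For what it's worth, the comparison could be set up roughly like this: build the prompt with and without the duplicated Goal / Target sections from the same pieces, then measure any quality difference (and the extra token cost) between the two. This is only a sketch; the section texts are abridged from the prompt above, and everything else is hypothetical:

```python
# Sketch of an A/B setup for the proposed comparison. Section texts are
# abridged from the local_rag_response prompt quoted in this issue.
GOAL = (
    "Generate a response of the target length and format that responds to "
    "the user's question, summarizing all information in the input data "
    "tables appropriate for the response length and format, and "
    "incorporating any relevant general knowledge.\n"
    "If you don't know the answer, just say so. Do not make anything up."
)
TARGET = "---Target response length and format---\n\n{response_type}"


def build_prompt(repeat: bool, response_type: str, context_data: str) -> str:
    """Assemble the prompt, with or without the duplicated sections."""
    parts = [
        "---Role---",
        "You are a helpful assistant responding to questions about data "
        "in the tables provided.",
        "---Goal---",
        GOAL,
        TARGET,
        "---Data tables---",
        context_data,
    ]
    if repeat:  # the current Microsoft/graphrag behaviour
        parts += ["---Goal---", GOAL, TARGET]
    parts.append(
        "Add sections and commentary to the response as appropriate for "
        "the length and format. Style the response in markdown."
    )
    return "\n\n".join(parts).replace("{response_type}", response_type)


single = build_prompt(False, "multiple paragraphs", "<context tables here>")
double = build_prompt(True, "multiple paragraphs", "<context tables here>")
extra_chars = len(double) - len(single)  # rough proxy for the extra token cost
```

Each variant would then be sent to the same small model over the same QA benchmark, so any score difference can be weighed against `extra_chars`.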

@rangehow
Contributor

If you could try this comparison with a small model on some QA benchmarks and obtain quantitative results, that would be very helpful. Otherwise, we should probably keep behaving the same way as Microsoft for now. Once it is confirmed that there is indeed no benefit, you are more than welcome to submit a PR to the repository to modify it. 🤗
