Where are the actual answers from the models? #3
Hi Domenic, I haven't received any notification of this issue in my email, and upon investigation, it seems that it was redirected to the spam folder. Sorry for the inconvenience and thank you for bringing this to my attention via email.
The prompt consists of "instructions" and the "question", both prefaced with "You're a helpful assistant". Figure 8 in the appendix shows an example.
For the few-shot prompt, the prompt additionally contains 5 examples of questions and answers of the same problem size. We generate the few-shot examples on the fly while querying the OpenAI API, and we do not save them. Here's an example for 2-digit by 1-digit multiplication; the same applies to the 5-digit by 5-digit task:
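The example prompt itself isn't reproduced here, but the on-the-fly few-shot generation described above could be sketched roughly like this. This is a hypothetical illustration, not the authors' actual script; the question wording, Q/A format, and helper names are all assumptions:

```python
import random

def make_example(n_digits_a, n_digits_b, rng):
    """Generate one random multiplication question of the given size,
    together with its correct answer (an assumed format, for illustration)."""
    a = rng.randint(10 ** (n_digits_a - 1), 10 ** n_digits_a - 1)
    b = rng.randint(10 ** (n_digits_b - 1), 10 ** n_digits_b - 1)
    return f"What is {a} times {b}?", str(a * b)

def build_few_shot_prompt(n_digits_a, n_digits_b, target_question,
                          n_shots=5, seed=0):
    """Assemble a few-shot prompt: instructions, then n_shots solved
    examples of the same problem size, then the unanswered target question."""
    rng = random.Random(seed)
    parts = ["You're a helpful assistant."]
    for _ in range(n_shots):
        q, ans = make_example(n_digits_a, n_digits_b, rng)
        parts.append(f"Q: {q}\nA: {ans}")
    parts.append(f"Q: {target_question}\nA:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(5, 5, "What is 92137 times 30563?")
print(prompt)
```

Because the examples are freshly sampled per query and never saved, reproducing a run exactly would also require the original random seed.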
For the scratchpad prompt, Figure 9 in the appendix shows an example, but here is one as well:
Regarding the script, the [...]

Hope this answers your questions.
Hi!
I'm trying to understand your results better. In particular, in the data zip files I can only find `{ id, question, prompt }` records, and I cannot understand how these map to the experiments in the paper.

For example, the paper shows that GPT-4 can never successfully multiply 5-digit by 5-digit number problems, either zero-shot, few-shot, or with a scratchpad. Let's say I wanted to reproduce your result for 92137 x 30563, `id` 881 in the data file.

What was your zero-shot prompt? (Probably it is the `question` in the data file? Not the `prompt`?)

What was your few-shot prompt? (I have no guesses here.)

What was your scratchpad prompt? (Maybe you gave it an example, based on the `prompt` from the data file, for a different 5-digit by 5-digit task? And then asked it the `question` for `id` 881 at the end?)

The analysis script seems to imply that there would be a `GPT3 answer` field in the data, but there is none. (And I wonder what happened to the other, non-GPT3 model answers.)

Thanks for your time, and apologies if I misunderstood something about the experiment setup or code!