Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where are the actual answers from the models? #3

Open
domenic opened this issue Jan 10, 2024 · 1 comment
Open

Where are the actual answers from the models? #3

domenic opened this issue Jan 10, 2024 · 1 comment

Comments

@domenic
Copy link

domenic commented Jan 10, 2024

Hi!

I'm trying to understand your results better. In particular, in the data zip files, I can only find { id, question, prompt } pairs. I cannot understand how these map to the experiments in the paper.

For example, the paper shows that GPT-4 can never successfully multiply 5-digit by 5-digit number problems, either zero-shot, few-shot, or with a scratchpad. Let's say I wanted to reproduce your result for 92137 x 30563, id of 881 in the data file.

What was your zero-shot prompt? (Probably it is the question in the data file? Not the prompt?) What was your few-shot prompt? (I have no guesses here.) What was your scratchpad prompt? (Maybe you gave it an example, based on the prompt from the data file, for a different 5-digit by 5-digit task? And then asked it the question for id 881 at the end?)

The analysis script seems to imply that there would be a GPT3 answer field in the data, but there is none. (And, I wonder what happened to the other non-GPT3 model answers.)

Thanks for your time, and apologies if I misunderstood something about the experiment setup or code!

@nouhadziri
Copy link
Owner

Hi Domenic,

I haven't received any notification of this issue in my email, and upon investigation, it seems that it was redirected to the spam folder. Sorry for the inconvenience and thank you for bringing this to my attention via email.
We didn't release models' generations. Here are answers to your questions:

What was your zero-shot prompt?

The prompt consists of "instructions" and the "question", both prefaced with "You're a helpful assistant". Figure 8 in the appendix shows an example.

You're a helpful assistant. To multiply two numbers, start by multiplying the rightmost digit of the
multiplicand by each digit of the multiplier, writing down the products and
carrying over any remainders. Repeat this process for each digit of the
multiplicand, and then add up all the partial products to obtain the final
result. 
Question: what's 76 times 8? Answer:

For the few-shot prompt, the prompt contains additionally 5 examples of questions and answers of the same problem size. We generate few-shot examples in the fly while querying the OpenAI API and we do not save them. Here's an example for 2-digit by 1-digit multiplication, the same applies for the 5-digit by 5-digit task:

You're a helpful assistant. To multiply two numbers, start by multiplying the rightmost digit of the
multiplicand by each digit of the multiplier, writing down the products and
carrying over any remainders. Repeat this process for each digit of the
multiplicand, and then add up all the partial products to obtain the final
result. Here are examples:

Question: what's 22 times 2? Answer 44.
Question: what's 34 times 4? Answer 136.
Question: what's 32 times 7? Answer 224.
Question: what's 67 times 6? Answer 402.
Question: what's 98 times 7? Answer 686.

Question: what's 76 times 8? Answer:

For the scratchpad prompt, Figure 9 in the appendix shows an example but here's also here an example:

You're a helpful assistant. Answer the following question: What is 35 times 90?

Let's perform the multiplication step by step:
Let's multiply 35 by the digit in the ones place of 90, which is 0.
1. Multiply 0 by the digit in the ones place of 35, which is 5. This gives 5 x 0
= 0. Write down the result 0.
2. Multiply 0 by the digit in the tens place of 35, which is 3. This gives 3 x 0
= 0. Write down the result 0.
3. The partial product for this step is A=0 which is the concatenation of the
digits we found in each step.
Now, let's multiply 35 by the digit in the tens place of 90, which is 9.
4. Multiply 9 by the digit in the ones place of 35, which is 5. This gives 5 x 9
= 45. Write down the result 5 and carry over the 4 to the next step.
5. Multiply 9 by the digit in the tens place of 35, which is 3. Add the carryover
from the previous step to account for this. This gives (3 x 9) + 4 = 31. Write
down the result 31.
6. The partial product for this step is B=315 which is the concatenation of the
digits we found in each step.
Now, let's sum the 2 partial products A and B, and take into account the position
of each digit: A=0 (from multiplication by 0) and B=315 (from multiplication by 9
but shifted one place to the left, so it becomes 3150). The final answer is 0 x 1
+ 315 x 10 = 0 + 3150 = 3150.

Question: What is 76 times 80?

Regarding the script, the GPT3 answer can be replaced by other fields (GPT4, GPT3.5, etc). We set nucleus sampling p to 0.7 and temperature to 1 for generations. For each task, we evaluate the performance of each model on 500 test examples.

Hope this answers your questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants