fix bug in python benchmark script #1206


Merged
merged 3 commits into microsoft:main on Mar 12, 2025

Conversation

thevishalagarwal
Contributor

@thevishalagarwal thevishalagarwal commented Jan 29, 2025

Bug: if we use random token ids (with the flag --use_random_token), decoding and then re-encoding produces a different set of tokens that is not equal to the original set. This changes the number of prompt tokens and produces incorrect results during benchmarking, e.g.:

original_tokens = np.random.randint(100, size=(1, 50))
prompt = tokenizer.decode(original_tokens)
new_tokens = tokenizer.encode(prompt)

Here the original number of tokens is 50, but new_tokens may not contain 50 tokens. Updated the code to prevent the prompt length from changing.
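
For reference, a minimal sketch of the intended fix, assuming a tokenizer that exposes encode/decode (the helper name and signature below are illustrative, not the script's actual API): keep the original random ids as the model input and use the decoded text only where a string is needed.

import numpy as np

def build_random_prompt(tokenizer, prompt_length, vocab_size=100, seed=0):
    # Hypothetical helper: draw random token ids of the requested length.
    rng = np.random.default_rng(seed)
    original_tokens = rng.integers(0, vocab_size, size=prompt_length).tolist()

    # Decoding and then re-encoding may merge or split tokens, so the decoded
    # text is kept only for display or tokenization timing, never re-encoded
    # as the model input.
    prompt_text = tokenizer.decode(original_tokens)

    # Feed original_tokens to the model so the prompt length stays exactly
    # prompt_length.
    return original_tokens, prompt_text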

@thevishalagarwal
Contributor Author

@baijumeswani Can you please review this? Thanks!

@aciddelgado
Contributor

Hello @thevishalagarwal, I spoke with the team and we'd like to keep the tokenization metric; perhaps the original random tokens can be used for the benchmark. The decoded version can be tokenized to benchmark tokenization, and that value can then be discarded.
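
For illustration, a rough sketch of this approach under the same encode/decode tokenizer assumption (the helper is hypothetical, not the benchmark script's code): the encode of the decoded text is timed for the tokenization metric and then discarded, while the original token ids remain the model input.

import time

def measure_tokenization(tokenizer, original_tokens):
    # Decode the original random ids to text once.
    prompt_text = tokenizer.decode(original_tokens)

    # Time the encode for the tokenization metric, but throw the result away
    # so the (possibly different-length) re-encoded ids are never used.
    start = time.perf_counter()
    _ = tokenizer.encode(prompt_text)
    tokenize_latency = time.perf_counter() - start

    # The benchmark continues with original_tokens as the prompt.
    return tokenize_latency, original_tokens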

@thevishalagarwal
Contributor Author

@aciddelgado I've updated my changes without removing the tokenization metric. Please review it again. Thanks!

@thevishalagarwal
Contributor Author

BTW, this decode-encode round trip also changes the prompt length (number of input tokens) when using the default option of generating the prompt with the model itself.

If the initial prompt_length argument is 300, then generate_prompt(...) generates 300 tokens, which are decoded to text and then encoded back to tokens. The length of this token sequence is expected to be 300, but I'm getting more than 300.

IMO, this is a bug and prompt_length should not change.
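
One way to guard against this in the model-generated-prompt path (a sketch under the same assumptions; generate_tokens and the truncation policy are illustrative, not the script's actual behavior) is to enforce the requested length after the re-encode:

def enforce_prompt_length(tokenizer, generated_tokens, prompt_length):
    # Re-encode the decoded prompt, as the script does today.
    tokens = tokenizer.encode(tokenizer.decode(generated_tokens))

    # If the round trip inflated the count, truncate back so the benchmark
    # always runs with exactly prompt_length input tokens.
    if len(tokens) > prompt_length:
        tokens = tokens[:prompt_length]
    return tokens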

@baijumeswani baijumeswani merged commit b60ecf0 into microsoft:main Mar 12, 2025
14 checks passed