fix bug in python benchmark script #1206


Merged
merged 3 commits into microsoft:main on Mar 12, 2025

Conversation

thevishalagarwal
Contributor

@thevishalagarwal thevishalagarwal commented Jan 29, 2025

Bug: if we use random token ids (with the flag --use_random_token), decoding and then re-encoding produces a different set of tokens that is not equal to the original set. This changes the number of prompt tokens and produces incorrect results during benchmarking, e.g.:

original_tokens = np.random.randint(100, size=(1, 50))
prompt = tokenizer.decode(original_tokens)
new_tokens = tokenizer.encode(prompt)

Here the original number of tokens is 50, but new_tokens may not contain 50 tokens. Updated the code to prevent the prompt length from changing.
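
For reference, a minimal sketch of the intended fix, assuming a tokenizer that exposes encode/decode (the helper name and signature below are illustrative, not the script's actual API): keep the original random ids as the model input and use the decoded text only where a string is needed.

import numpy as np

def build_random_prompt(tokenizer, prompt_length, vocab_size=100, seed=0):
    # Hypothetical helper: draw random token ids of the requested length.
    rng = np.random.default_rng(seed)
    original_tokens = rng.integers(0, vocab_size, size=prompt_length).tolist()

    # Decoding and then re-encoding may merge or split tokens, so the decoded
    # text is kept only for display or tokenization timing, never re-encoded
    # as the model input.
    prompt_text = tokenizer.decode(original_tokens)

    # Feed original_tokens to the model so the prompt length stays exactly
    # prompt_length.
    return original_tokens, prompt_text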

@thevishalagarwal
Contributor Author

@baijumeswani Can you please review this? Thanks!

@aciddelgado
Contributor

Hello @thevishalagarwal, I spoke with the team and we'd like to keep the tokenization metric; perhaps the original random tokens can be used for the benchmark. The decoded version can be tokenized to benchmark tokenization, and that value can then be discarded.
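
For illustration, a rough sketch of this approach under the same encode/decode tokenizer assumption (the helper is hypothetical, not the benchmark script's code): the encode of the decoded text is timed for the tokenization metric and then discarded, while the original token ids remain the model input.

import time

def measure_tokenization(tokenizer, original_tokens):
    # Decode the original random ids to text once.
    prompt_text = tokenizer.decode(original_tokens)

    # Time the encode for the tokenization metric, but throw the result away
    # so the (possibly different-length) re-encoded ids are never used.
    start = time.perf_counter()
    _ = tokenizer.encode(prompt_text)
    tokenize_latency = time.perf_counter() - start

    # The benchmark continues with original_tokens as the prompt.
    return tokenize_latency, original_tokens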

@thevishalagarwal
Contributor Author

@aciddelgado I've updated my changes without removing the tokenization metric. Please review it again. Thanks!

@thevishalagarwal
Contributor Author

BTW, this decode-encode round trip also changes the prompt length (number of input tokens) when using the default option of generating the prompt with the model itself.

If the initial prompt_length argument is 300, then generate_prompt(...) generates 300 tokens, which are decoded to text and then encoded back to tokens. The length of this token sequence is expected to be 300, but I'm getting more than 300.

IMO, this is a bug and prompt_length should not change.
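
One way to guard against this in the model-generated-prompt path (a sketch under the same assumptions; generate_tokens and the truncation policy are illustrative, not the script's actual behavior) is to enforce the requested length after the re-encode:

def enforce_prompt_length(tokenizer, generated_tokens, prompt_length):
    # Re-encode the decoded prompt, as the script does today.
    tokens = tokenizer.encode(tokenizer.decode(generated_tokens))

    # If the round trip inflated the count, truncate back so the benchmark
    # always runs with exactly prompt_length input tokens.
    if len(tokens) > prompt_length:
        tokens = tokens[:prompt_length]
    return tokens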

@baijumeswani baijumeswani merged commit b60ecf0 into microsoft:main Mar 12, 2025
14 checks passed