
Fair Benchmarking of Faster-Whisper - Parameter equivalents to Huggingface #993

Open
asusdisciple opened this issue Sep 6, 2024 · 2 comments

Comments

@asusdisciple

I want to benchmark faster-whisper against some pipeline implementations of Whisper in Hugging Face.
For the sake of fairness, I would like to parameterize the models as equally as possible.

In HF you have different generation possibilities, which are:

- greedy decoding if num_beams=1 and do_sample=False
- contrastive search if penalty_alpha>0 and top_k>1
- multinomial sampling if num_beams=1 and do_sample=True
- beam-search decoding if num_beams>1 and do_sample=False
- beam-search multinomial sampling if num_beams>1 and do_sample=True
- diverse beam-search decoding if num_beams>1 and num_beam_groups>1
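For reference, the strategies above are selected purely through `generate()` kwargs in transformers. A sketch of all six as keyword dictionaries; the specific numeric values (beam counts, penalty_alpha, etc.) are illustrative assumptions, not benchmark recommendations:

```python
# The six transformers decoding strategies expressed as generate() kwargs.
# Only the flags named above are fixed; the numeric values are illustrative.
STRATEGIES = {
    "greedy": dict(num_beams=1, do_sample=False),
    "contrastive": dict(penalty_alpha=0.6, top_k=4),
    "multinomial": dict(num_beams=1, do_sample=True),
    "beam_search": dict(num_beams=5, do_sample=False),
    "beam_multinomial": dict(num_beams=5, do_sample=True),
    # num_beams must be divisible by num_beam_groups for diverse beam search
    "diverse_beam": dict(num_beams=6, num_beam_groups=3, diversity_penalty=1.0),
}

# Usage (sketch, assuming a loaded Whisper model and prepared input_features):
# predicted_ids = model.generate(input_features, **STRATEGIES["greedy"])
```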

How would I, for example, reproduce greedy decoding in faster-whisper? Is there a do_sample parameter?
Should I set best_of = 1 and beam_size = 1? And if I set do_sample = True in HF, would that be
equivalent to setting best_of = 5? Maybe you can share some insights; ideally I want to reproduce all of the above strategies.
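My current working assumption (which is exactly what I'd like confirmed) is that faster-whisper's `transcribe()` switches between greedy/beam search and sampling via `temperature`, with `best_of` only taking effect at nonzero temperature. A hypothetical translation helper, purely as a sketch:

```python
# Hypothetical mapping from HF generate() settings to faster-whisper
# transcribe() kwargs. This is an assumption to be verified, not a
# confirmed equivalence.

def hf_to_faster_whisper(num_beams: int, do_sample: bool) -> dict:
    """Sketch: translate HF decoding settings into faster-whisper kwargs."""
    if not do_sample:
        # Greedy (num_beams=1) or beam search (num_beams>1);
        # temperature=0.0 should disable sampling entirely.
        return {"beam_size": num_beams, "best_of": 1, "temperature": 0.0}
    # Multinomial sampling: faster-whisper samples at temperature > 0,
    # and best_of sets the number of sampled candidates (assumed to
    # correspond to HF's sampling with num_return_sequences-style candidates).
    return {"beam_size": num_beams, "best_of": 5, "temperature": 1.0}

greedy_kwargs = hf_to_faster_whisper(num_beams=1, do_sample=False)
# model.transcribe(audio, **greedy_kwargs)  # sketch, given a WhisperModel
```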

Best regards

@MahmoudAshraf97
Collaborator

@BBC-Esq and I are currently working on this, check #974

@BBC-Esq
Contributor

BBC-Esq commented Sep 11, 2024

I'll send an invite to the repo if he wants to help out or just kibitz. Like @MahmoudAshraf97, I've been inundated with other stuff, but I do plan to get back to the benchmarking in the very near future.
