
Need to audit baseline benchmarks #121

Open
adamomainz opened this issue Dec 19, 2024 · 1 comment

Comments

@adamomainz
Contributor

The baselines do not all seem to represent the best-in-class version of the kernels. Let's audit them and see where we can improve this.

Example: when flash_v3 is available, that really should be the baseline IMO; when it isn't, we can default to sdpa as we do now.
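The fallback described above could be sketched roughly as follows. This is an illustrative sketch only; the function and backend names are hypothetical, not TritonBench's actual API.

```python
# Hypothetical sketch of baseline selection: prefer flash_v3 when that
# backend is available, otherwise fall back to sdpa (the current default).
# Names here are illustrative, not the repository's real API.

def pick_baseline(available_backends):
    """Return the preferred baseline backend name from an ordered preference list."""
    for candidate in ("flash_v3", "sdpa"):
        if candidate in available_backends:
            return candidate
    raise ValueError("no suitable baseline backend available")
```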

@xuzhao9
Contributor

xuzhao9 commented Dec 27, 2024

First of all, we already allow --baseline to select any backend as the baseline when running the benchmark.

For the default baseline implementation, I think we should prioritize coverage, so that the default baseline is broadly available.
For example, if we assign cutlass (like flash_v3)/cudnn/cublas as the baseline for Triton, they won't be available on AMD.

Maybe we should always use the default torch/aten operator as the baseline, if it is available. Note, however, that the torch/aten implementation might be very slow and might cause OOM on large input sizes.

Or maybe we can avoid having a default baseline backend in the code at all, and always require the user to specify one when they want relative metrics like speedup or memory_compression_ratio.
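The "no default baseline" option could look roughly like this: relative metrics are computed only when the user names a baseline explicitly. A minimal sketch, with hypothetical function and argument names (not the repository's real API):

```python
# Hypothetical sketch: compute speedup relative to an explicitly chosen
# baseline backend; return None (absolute metrics only) when no baseline
# was specified, rather than silently picking a default.

def speedups(latencies, baseline=None):
    """Map backend name -> speedup vs. `baseline` (a key of `latencies`),
    or None if the user did not specify a baseline."""
    if baseline is None:
        return None  # require e.g. --baseline for relative metrics
    base = latencies[baseline]
    return {name: base / t for name, t in latencies.items()}
```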

cc @FindHao
