Need to audit baseline benchmarks #121

adamomainz · 2024-12-19T23:52:37Z

Not all of the baselines seem to represent the best-in-class version of the kernels. Let's audit them and see where we can improve this.

Example: when flash_v3 is available, it really should be the baseline IMO, but when it isn't, we can default to sdpa as we do now.
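
A minimal sketch of that fallback, assuming each backend can report whether it is usable on the current machine. `pick_baseline` and the availability probes are hypothetical names used for illustration only; they are not taken from the benchmark code.

```python
from typing import Callable, Dict, Sequence


def pick_baseline(
    preferred: Sequence[str],
    probes: Dict[str, Callable[[], bool]],
) -> str:
    """Return the first backend in `preferred` whose availability probe passes."""
    for name in preferred:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    raise RuntimeError(f"no baseline backend available among {list(preferred)}")


# Hypothetical probes: pretend flash_v3 is missing on this machine.
probes = {
    "flash_v3": lambda: False,
    "sdpa": lambda: True,
}

# Prefer the best-in-class kernel, fall back to the broadly available one.
baseline = pick_baseline(["flash_v3", "sdpa"], probes)
print(baseline)
```

The priority list keeps the "best available" choice explicit per operator, so an audit only has to review the ordering.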

xuzhao9 · 2024-12-27T21:49:45Z

First of all, the --baseline option already lets users pick any backend as the baseline when running the benchmark.

For the default baseline impl, I think we should prioritize coverage, so that the default baseline is broadly available.
For example, if we make cutlass (like flash_v3), cudnn, or cublas the baseline for Triton kernels, it won't be available on AMD.

Maybe we should always use the default torch/aten operator as the baseline, if it is available. However, note that the torch/aten impl might be very slow and might OOM on large inputs.

Or maybe we can avoid having a default baseline backend in the code at all, and always require the user to specify one when they want relative metrics like speedup or memory_compression_ratio.
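
A rough sketch of that last option, assuming relative metrics are computed from per-backend measurements. `validate_metrics`, `speedup`, and the sample latencies are hypothetical and only illustrate the idea; they are not the benchmark's actual code.

```python
from typing import Dict, List, Optional

# Metrics that are only meaningful relative to a baseline backend.
RELATIVE_METRICS = {"speedup", "memory_compression_ratio"}


def validate_metrics(metrics: List[str], baseline: Optional[str]) -> None:
    """Reject relative metrics when the user did not name a baseline."""
    needs_baseline = sorted(RELATIVE_METRICS.intersection(metrics))
    if needs_baseline and baseline is None:
        raise ValueError(
            f"metrics {needs_baseline} require an explicit --baseline backend"
        )


def speedup(latencies_ms: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Speedup of each non-baseline backend: baseline latency / backend latency."""
    base = latencies_ms[baseline]
    return {
        name: base / ms for name, ms in latencies_ms.items() if name != baseline
    }


# Absolute metrics need no baseline; relative ones do.
validate_metrics(["latency"], baseline=None)
validate_metrics(["speedup"], baseline="aten")

# Made-up latencies, for illustration only.
result = speedup({"aten": 2.0, "triton": 1.0}, baseline="aten")
print(result)
```

With no implicit default, a run that asks for speedup without a baseline fails early instead of silently comparing against a backend the user did not choose.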

cc @FindHao
