Update Inference Benchmarking Scripts - Support AML #868

lekurile · 2024-03-01T01:17:34Z

This PR fixes and updates the inference benchmarking analysis scripts to support the fastgen, vllm, and aml backends. The scripts are also generalized to support models beyond Llama, which was previously hardcoded. A number of bugs and formatting issues are resolved as well. The following scripts were fixed or updated:

plot_effective_throughput.py
plot_latency_percentile.py
plot_repl_scale.py
plot_th_lat.py
plot_tp_sizes.py
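
As a rough illustration of what generalizing a script over backends can look like, the sketch below groups result records by backend instead of branching on a hardcoded name. It is not the code in this PR: `group_by_backend`, the record layout, and the sample values are all hypothetical and exist only to show the pattern.

```python
from collections import defaultdict

# Backends named in the PR description.
BACKENDS = ["fastgen", "vllm", "aml"]


def group_by_backend(records, backends=BACKENDS):
    """Group (clients, throughput) points by backend, ignoring unknown backends.

    `records` is a hypothetical list of dicts with the keys
    "backend", "clients", and "throughput".
    """
    grouped = defaultdict(list)
    for rec in records:
        if rec["backend"] in backends:
            grouped[rec["backend"]].append((rec["clients"], rec["throughput"]))
    # Sort each series by client count so it plots left to right.
    return {b: sorted(grouped[b]) for b in backends if b in grouped}


# Placeholder values, not benchmark results.
records = [
    {"backend": "vllm", "clients": 4, "throughput": 2.0},
    {"backend": "fastgen", "clients": 2, "throughput": 1.5},
    {"backend": "fastgen", "clients": 1, "throughput": 1.0},
    {"backend": "other", "clients": 1, "throughput": 9.0},
]

series = group_by_backend(records)
for backend, points in series.items():
    print(backend, points)
```

Each entry in `series` could then be passed to a single plotting call inside the loop, so adding a backend means extending the list rather than editing the plotting logic.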

Example plots for the scripts:

plot_effective_throughput.py:
plot_latency_percentile.py:
plot_repl_scale.py:
plot_th_lat.py:
NOTE: Do not draw any conclusions from the resulting data. The GPUs, tp_size, etc. differ across the data points; the plot only demonstrates plot generation.
plot_tp_sizes.py:

tohtana

Do you have an example that shows how to run them?

lekurile · 2024-03-05T03:07:37Z

> Do you have an example that shows how to run them?

@tohtana, thank you for the comment. I will update the README with examples showing how to run the scripts.

benchmarks/inference/mii/src/postprocess_results.py

benchmarks/inference/mii/src/plot_effective_throughput.py

benchmarks/inference/mii/src/plot_latency_percentile.py

benchmarks/inference/mii/src/plot_repl_scale.py

Update Inference Benchmarking Scripts - Support AML

b32dff3

lekurile requested a review from mrwyattii March 1, 2024 01:17

lekurile requested review from tjruwase, conglongli, awan-10, eltonzheng, duli2012, arashb and xiaoxiawu-microsoft as code owners March 1, 2024 01:17

lekurile added 8 commits March 1, 2024 22:55

Get plot_latency_percentile.py working, refactor

0fccdd0

Create table for set difference warning

c946c4f

Make plot title consistent

1e51ded

Merge branch 'master' into lekurile/update_bench_scripts

81036e2

small fixes

383313e

Fix plot_effective_throughput.py

3903fbc

Generalize plot_repl_scale.py script

519604e

Clean up args in scripts

0143f94

lekurile requested a review from tohtana March 5, 2024 02:10

Update args and filename of plot_repl_scale.py

7acf692

tohtana reviewed Mar 5, 2024

Get plot_tp_size.py working

742c76d

mrwyattii reviewed Mar 5, 2024

awan-10 approved these changes Mar 5, 2024

lekurile added 6 commits March 5, 2024 23:57

Update title in plot_th_lat.py

38fb586

Update how replica_sets and tp_sets are generated for code readability

5e5c1fb

Update backend_sets generation for code readability

aeb137b

Update AML not implemented assertions

b14369b

Update n_params calculation to account for models < 1B in size

99d2ae1

Generalize plot_effective_throughput by looping over backends

4df7cd9

lekurile added 5 commits March 6, 2024 00:49

remove mii_ to make more generic

7536469

Loop over backends in plot_effective_throughput.py and plot_latency_percentile.py

6c4f44e

Add plt.figure() to plot_th_lat.py

8971b9e

Find intersection of clients across all backends in plot_latency_percentile.py

b11d29a

Update README to include all scripts and example command

4b8459b
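
The commits above mention finding the intersection of clients across all backends and warning about the set difference. A minimal sketch of that idea follows, assuming each backend has a list of client counts it was benchmarked with. `common_clients` and the sample counts are hypothetical and are not taken from the scripts in this PR.

```python
def common_clients(clients_by_backend):
    """Return client counts shared by every backend, plus what each backend loses.

    `clients_by_backend` is a hypothetical mapping from backend name
    to the client counts that backend has results for.
    """
    sets = [set(c) for c in clients_by_backend.values()]
    common = set.intersection(*sets) if sets else set()
    # Per-backend set difference: client counts that would be skipped.
    dropped = {b: sorted(set(c) - common) for b, c in clients_by_backend.items()}
    return sorted(common), dropped


# Placeholder client counts, not benchmark data.
clients_by_backend = {
    "fastgen": [1, 2, 4, 8],
    "vllm": [1, 2, 4],
    "aml": [2, 4, 16],
}

common, dropped = common_clients(clients_by_backend)
print("common:", common)
for backend, skipped in dropped.items():
    if skipped:
        print(f"warning: {backend} results for clients {skipped} are skipped")
```

Restricting the plot to the common client counts keeps the backends comparable point for point, and reporting the dropped counts makes it visible when data is left out.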

mrwyattii approved these changes Mar 6, 2024

Merge branch 'master' into lekurile/update_bench_scripts

cd41679

lekurile merged commit 6e9ada6 into master Mar 6, 2024
2 checks passed
