Update perf_test text output, make columns selectable #3014

travisdowns · 2025-10-02T16:32:17Z

This is a follow-up to an earlier series where we made the columns
more generic. Now we add the ability to choose exactly which columns
are output, and to show the "variance" (in the form of MAD / median)
for any numeric column.
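
For readers unfamiliar with the "MAD / median" figure, here is a minimal sketch of how such a value could be computed and rendered. It assumes MAD means median absolute deviation; the function names and the sample values are illustrative and are not taken from perf_test itself.

```python
from statistics import median

def relative_mad(samples):
    """Return (median, MAD expressed as a percentage of the median)."""
    med = median(samples)
    # Median absolute deviation: median of the distances from the median.
    mad = median(abs(x - med) for x in samples)
    # Guard against a zero median (e.g. a column that is always 0).
    pct = 0.0 if med == 0 else 100.0 * mad / abs(med)
    return med, pct

def format_with_mad(samples, unit=""):
    """Render 'median ± pct%' in the style of the new runtime column."""
    med, pct = relative_mad(samples)
    return f"{med:.2f}{unit} ± {pct:.2f}%"

# Hypothetical per-run runtimes in nanoseconds.
runs_ns = [26.049, 26.094, 26.099]
print(format_with_mad(runs_ns, "ns"))
```

Expressing the spread relative to the median lets a single column carry both the value and its stability, which is what allows the separate mad, min and max columns to be dropped.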

Configurable MAD display, selectable columns, tighter numeric packing

- Add --mad-columns option allowing 'all' or comma-separated list so any
  stats column (runtime, allocs, tasks, inst, cycles, etc.) can show
  relative MAD (±pct of median) instead of being fixed
- Add --columns option to restrict which columns appear in text/markdown
  output (or 'all' to keep previous behavior)
- Improve width fitting: adaptive formatting packs large values and
  scaled durations (ns/µs/ms/s) more tightly while preserving
  significant digits
- Derive header widths from representative sample values for more
  consistent alignment and reduced horizontal space
- Unified duration scaling path reused for both raw durations and
  aggregated statistics
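
As a rough illustration of the duration scaling mentioned in the list above, the following sketch picks the largest of ns/µs/ms/s that keeps the value at or above 1. The helper name and the fixed two-decimal precision are assumptions chosen to resemble the runtime column in the new output; the real code may choose precision differently.

```python
# Hypothetical helper, not perf_test's implementation.
UNITS = [("s", 1e9), ("ms", 1e6), ("µs", 1e3), ("ns", 1.0)]

def format_duration(ns: float) -> str:
    """Scale a nanosecond value to the largest unit that keeps it >= 1."""
    for suffix, scale in UNITS:
        if ns >= scale:
            return f"{ns / scale:.2f}{suffix}"
    # Sub-nanosecond values stay in ns.
    return f"{ns:.2f}ns"

for ns in (0.14, 26.07, 126_720.0, 582_970_000.0):
    print(format_duration(ns))
```

Routing both raw durations and aggregated statistics through one function like this keeps the unit choice consistent between them.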

This shouldn't affect json output at all.
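
Both new options take either 'all' or a comma-separated list. A minimal sketch of how such a value might be parsed is shown here; the column names are taken from the header of the new text output and may not match the names the options actually accept, and keeping the table's own column order is an assumption.

```python
# Column names as they appear in the new table header (assumed spelling).
AVAILABLE = ["iters", "runtime", "allocs", "tasks", "inst", "cycles"]

def parse_column_list(spec, available=AVAILABLE):
    """Turn 'all' or 'a,b,c' into a list of known column names."""
    if spec == "all":
        return list(available)
    requested = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"unknown column(s): {', '.join(unknown)}")
    # Keep the table's column order regardless of the order requested.
    return [name for name in available if name in requested]

print(parse_column_list("runtime,inst"))
print(parse_column_list("all"))
```

A filter of this kind would only need to be consulted by the text and markdown printers, which is consistent with the JSON output being left alone.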

Also in this series we add a few benchmarks which stress the textual output with large/small numbers
of iterations, allocs, and runtime. These are useful for checking that the table formatting works. The output from these
benchmarks is shown below (the new output is run with --mad-columns=runtime,inst):

text output

before

test                                  iterations      median         mad         min         max      allocs       tasks        inst      cycles
output_check.high_iters              396696730000     0.000ns     0.000ns     0.000ns     0.000ns       0.000       0.000         0.0         0.0
output_check.no_runtime                 71041741     0.141ns     0.000ns     0.141ns     0.142ns       0.000       0.000         4.0         0.6
output_check.low_runtime                  382523    26.094ns     0.005ns    26.049ns    26.099ns       0.000       0.000       307.0       103.5
output_check.high_runtime                     79   126.607us    97.873ns   126.509us   126.865us       0.000       0.000   3000012.1    502851.1
output_check.high_runtime_allocs               1   582.727ms   108.715us   582.619ms   587.716ms 100000000.000       0.000 9600000987.3 2325084710.7
output_check.highly_variable_runtime           1    20.089ms    14.525ms     1.404us    34.614ms 3130404.333       0.000 300519388.3  72540181.0

after

test                                      iters            runtime     allocs      tasks               inst     cycles
output_check.high_iters                   4e+11     0.00ns ± 0.05%      0.000      0.000       0.00 ± 0.00%        0.0
output_check.no_runtime                70937442     0.14ns ± 0.44%      0.000      0.000       4.00 ± 0.00%        0.6
output_check.low_runtime                 382914    26.07ns ± 0.03%      0.000      0.000     307.00 ± 0.00%      103.4
output_check.high_runtime                    79   126.72µs ± 0.18%      0.000      0.000  3000012.1 ± 0.00%   502621.3
output_check.high_runtime_allocs              1   582.97ms ± 0.06%  100000000      0.000   9.60e+09 ± 0.00%    2.3e+09
output_check.highly_variable_runtime          3    18.43ms ± 4.35%  3155514.0      0.000  302929658 ± 4.51%   73260844

md output

before

| test | iterations | median | mad | min | max | allocs | inst | cycles |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| output_check.high_iters | 396696730000 | 0.000ns | 0.000ns | 0.000ns | 0.000ns | 0.000 | 0.0 | 0.0 |
| output_check.no_runtime | 71041741 | 0.141ns | 0.000ns | 0.141ns | 0.142ns | 0.000 | 4.0 | 0.6 |
| output_check.low_runtime | 382523 | 26.094ns | 0.005ns | 26.049ns | 26.099ns | 0.000 | 307.0 | 103.5 |
| output_check.high_runtime | 79 | 126.607us | 97.873ns | 126.509us | 126.865us | 0.000 | 3000012.1 | 502851.1 |
| output_check.high_runtime_allocs | 1 | 582.727ms | 108.715us | 582.619ms | 587.716ms | 100000000.000 | 9600000987.3 | 2325084710.7 |
| output_check.highly_variable_runtime | 1 | 20.089ms | 14.525ms | 1.404us | 34.614ms | 3130404.333 | 300519388.3 | 72540181.0 |

after

| test | iters | runtime | allocs | inst | cycles |
|---|---:|---:|---:|---:|---:|
| output_check.high_iters | 4e+11 | 0.00ns ± 0.05% | 0.000 | 0.00 ± 0.00% | 0.0 |
| output_check.no_runtime | 70937442 | 0.14ns ± 0.44% | 0.000 | 4.00 ± 0.00% | 0.6 |
| output_check.low_runtime | 382914 | 26.07ns ± 0.03% | 0.000 | 307.00 ± 0.00% | 103.4 |
| output_check.high_runtime | 79 | 126.72µs ± 0.18% | 0.000 | 3000012.1 ± 0.00% | 502621.3 |
| output_check.high_runtime_allocs | 1 | 582.97ms ± 0.06% | 100000000 | 9.60e+09 ± 0.00% | 2.3e+09 |
| output_check.highly_variable_runtime | 3 | 18.43ms ± 4.35% | 3155514.0 | 302929658 ± 4.51% | 73260844 |

travisdowns · 2025-10-02T17:16:28Z

CI failure looks spurious to me; I don't think this even changes code that is linked into the binary that failed.

avikivity · 2025-10-03T17:52:28Z

"2.3e+09" carries too little information. I'm not a fan of 352352984723945871259.23154123 precision-without-accuracy, but this is too little. Often changes in the 0.5% range are measurable, and this hides it.

travisdowns · 2025-10-04T03:39:01Z

> "2.3e+09" carries too little information. I'm not a fan of 352352984723945871259.23154123 precision-without-accuracy, but this is too little. Often changes in the 0.5% range are measurable, and this hides it.

Fair. In part this is to discourage you from writing tests which fail to normalize the result down to something reasonable which doesn't need this in the first place, where you'd get 6 sig figs. Right now the width is set to 7, which means with scientific notation there is only room for 2 sig figs, since ".e+00" takes the other 5. I can set this to 8 or 9, which gives 3 or 4 sig figs. 3 is usually enough to measure a 0.5% change, but not a 0.1% one. WDYT?

Other option is to use a suffix like K, M, G to represent magnitudes, but this has the disadvantage of jumping by 3 digits every time, which eats into the 4 chars you save by not using scientific notation.
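
The width arithmetic in this reply can be checked with a short sketch. It assumes the column simply spends five characters on the decimal point, 'e', the exponent sign and two exponent digits; the function name is hypothetical, and the sample value is the cycles figure from the high_runtime_allocs row.

```python
def sci(value, width):
    """Format in scientific notation with as many sig figs as `width` allows."""
    # "d.dde+XX": the point, 'e', sign and two exponent digits cost 5 characters.
    sig_figs = max(1, width - 5)
    return f"{value:.{sig_figs - 1}e}"

cycles = 2325084710.7        # from the "before" output
bumped = cycles * 1.005      # a hypothetical 0.5% regression

for width in (7, 8, 9):
    print(width, sci(cycles, width), sci(bumped, width))
```

At width 7 both values print as 2.3e+09, so the 0.5% change is invisible; at width 8 they print as 2.33e+09 and 2.34e+09, and at width 9 as 2.325e+09 and 2.337e+09.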

avikivity · 2025-10-09T11:37:38Z

> > "2.3e+09" carries too little information. I'm not a fan of 352352984723945871259.23154123 precision-without-accuracy, but this is too little. Often changes in the 0.5% range are measurable, and this hides it.

> Fair. In part this is to discourage you from writing tests which fail to normalize the result down to something reasonable which doesn't need this in the first place, where you'd get 6 sig figs. Right now the width is set to 7, which means with scientific notation there is only room for 2 sig figs, since ".e+00" takes the other 5. I can set this to 8 or 9, which gives 3 or 4 sig figs. 3 is usually enough to measure a 0.5% change, but not a 0.1% one. WDYT?

Let's set it to 9.

> Other option is to use a suffix like K, M, G to represent magnitudes, but this has the disadvantage of jumping by 3 digits every time, which eats into the 4 chars you save by not using scientific notation.


This is a follow up for an earlier series where we made the columns more generic. Now we add the ability to choose exactly which columns are output, and also show the "variance" (in the form of MAD / median) for any numeric column.

Configurable MAD display, selectable columns, tighter numeric packing

- Add --mad-columns option allowing 'all' or comma-separated list so any stats column (runtime, allocs, tasks, inst, cycles, etc.) can show relative MAD (±pct of median) instead of being fixed
- Add --columns option to restrict which columns appear in text/markdown output (or 'all' to keep previous behavior)
- Improve width fitting: adaptive formatting packs large values and scaled durations (ns/µs/ms/s) more tightly while preserving significant digits
- Derive header widths from representative sample values for more consistent alignment and reduced horizontal space
- Unified duration scaling path reused for both raw durations and aggregated statistics
- Internal printing path simplified (less branching, clearer per-column MAD decision) without changing JSON output schema
- Default behavior unchanged when new options are not specified

This shouldn't affect json output at all.

travisdowns · 2025-10-15T13:43:18Z

@avikivity wrote:

Let's set it to 9.

Done.

travisdowns changed the title ~~Td perf tests output upstream~~ Update perf_test text output, make columns selectable Oct 2, 2025

xemul force-pushed the master branch from 5b52717 to 8549271 Compare October 10, 2025 08:26

travisdowns added 2 commits October 10, 2025 10:14

perf_test_tests: add some check_output tests

dc2c78e

These benchmarks stress the textual output with large/small amounts of iterations, allocs, runtime. Useful to check that the table formatting works.

travisdowns force-pushed the td-perf-tests-output-upstream branch from 3db285d to db5eda0 Compare October 10, 2025 13:14

avikivity closed this in c7dae27 Oct 16, 2025

avikivity merged commit c7dae27 into scylladb:master Oct 16, 2025
16 checks passed
