
Collect timing information in CI scripts, & detect regressions #54

Open · septract opened this issue Jul 21, 2024 · 7 comments

@septract (Collaborator)

Per @cp526 in rems-project/cerberus#365 (comment)

> We would like the CI to find regressions in timing behaviour, though. So we'll probably want to extend our scripts to collect timing statistics and "publish" the results in a way that allows for easy comparison across commits + fail if the overall sum exceeds some sensible limit.

@septract (Collaborator, Author) commented Jul 25, 2024

I talked with my colleague Kevin Quick about how this should work. Here's one possible approach (rough sketches of the scripting follow the list):

  • For generating logs in a single CI run:
    • Modify the check.sh script to output per-example run times in a .csv file
    • Save this file as an artifact
  • For detecting regressions in a PR:
    • Generate the current performance numbers.
    • Pull the most recent N .csv files through the artifact API
    • Compute a rolling average / trajectory / whatever we need to identify a regression
    • If there has been a regression, apply a label
    • Do whatever else we want in order to raise the saliency of the regression - auto-comment on the PR, block the merge ...
  • For long-term logging:
    • Run a nightly CI job that pulls all the artifacts from the current day
    • Aggregate into a daily log file
    • Generate a performance graph using GnuPlot
    • Commit the log and performance graph to the repo
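
Concretely, the first two items might look something like the sketches below. Everything specific here — the CSV columns, the `tests/*.c` glob, the bare `cn` invocation, the artifact and workflow names, and the 20% threshold — is an assumption, not something settled in this thread.

```bash
#!/usr/bin/env bash
# Sketch: per-example timing inside check.sh.
# Assumed CSV layout: commit,example,seconds,exit_code.
set -u

CSV_OUT="${CSV_OUT:-timings.csv}"
COMMIT=$(git rev-parse --short HEAD)
echo "commit,example,seconds,exit_code" > "$CSV_OUT"

for example in tests/*.c; do
  start=$(date +%s.%N)
  if cn "$example" > /dev/null 2>&1; then status=0; else status=$?; fi
  end=$(date +%s.%N)
  secs=$(awk -v a="$start" -v b="$end" 'BEGIN { printf "%.3f", b - a }')
  printf '%s,%s,%s,%s\n' "$COMMIT" "$example" "$secs" "$status" >> "$CSV_OUT"
done
```

And the regression check, using the `gh` CLI to fetch the previous runs' artifacts:

```bash
#!/usr/bin/env bash
# Sketch: flag a regression against the mean total of the last N runs.
# `gh run list` / `gh run download` are real gh-CLI commands; the workflow
# file (ci.yml), artifact name (timings), and threshold are assumptions.
set -euo pipefail
shopt -s nullglob

N=5
THRESHOLD=1.2   # fail if the new total is >20% above the rolling mean

mkdir -p previous
for run_id in $(gh run list --workflow ci.yml --branch master --limit "$N" \
                  --json databaseId --jq '.[].databaseId'); do
  gh run download "$run_id" --name timings --dir "previous/$run_id" || true
done

csvs=(previous/*/timings.csv)
[ "${#csvs[@]}" -gt 0 ] || exit 0   # nothing to compare against yet

current=$(awk -F, 'NR > 1 { s += $3 } END { print s }' timings.csv)
baseline=$(awk -F, 'FNR > 1 { s += $3 } END { print s / '"${#csvs[@]}"' }' "${csvs[@]}")

if awk -v c="$current" -v b="$baseline" -v t="$THRESHOLD" \
     'BEGIN { exit !(c > b * t) }'; then
  echo "Possible regression: ${current}s vs rolling mean of ${baseline}s"
  exit 1   # the workflow could instead apply a label or auto-comment here
fi
```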

@PeterSewell (Contributor)

Sounds good to me. For simplicity I might skip the aggregation - just keep the commit timestamp in the csv and ask gnuplot to regen the graph on each commit (with x axis either commit index or time).
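
For example, a sketch of that no-aggregation variant, assuming the rolling log keeps one row per commit as `epoch,commit,total_seconds` (the layout and filenames are made up):

```bash
#!/usr/bin/env bash
# Sketch: regenerate the performance graph straight from the rolling CSV
# on each commit, with commit timestamps on the x axis.
gnuplot <<'EOF'
set datafile separator ","
set terminal pngcairo size 800,480
set output "performance.png"
set xdata time
set timefmt "%s"              # Unix epoch timestamps in column 1
set format x "%Y-%m-%d"
set xlabel "commit time"
set ylabel "total runtime (s)"
plot "rolling-timings.csv" using 1:3 with linespoints title "CI total"
EOF
```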

@septract (Collaborator, Author)

GitHub artifacts are deleted after 90 days, so it depends whether we care about long-term retention of performance logs. If we do, we'll need some other approach to aggregation and storage.

@septract (Collaborator, Author)

Oh, maybe you mean we just keep a rolling csv artifact with all the previous logs? That could work.
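
A sketch of how that rolling artifact might work (the artifact and workflow names are hypothetical). One caveat: each run re-uploads the grown file as a fresh artifact, so the 90-day clock only keeps restarting while CI keeps running.

```bash
#!/usr/bin/env bash
# Sketch: extend a rolling CSV artifact on each run (names are assumptions).
set -euo pipefail

# Fetch the rolling log from the most recent successful master run, if any.
last=$(gh run list --workflow ci.yml --branch master --status success \
         --limit 1 --json databaseId --jq '.[0].databaseId' || true)
if [ -n "$last" ]; then
  gh run download "$last" --name rolling-timings --dir . || true
fi
touch rolling-timings.csv

# Append this run's summary row: epoch,commit,total_seconds (assumed layout).
total=$(awk -F, 'NR > 1 { s += $3 } END { print s }' timings.csv)
printf '%s,%s,%s\n' "$(date +%s)" "$(git rev-parse --short HEAD)" "$total" \
  >> rolling-timings.csv
# The workflow then re-uploads rolling-timings.csv as the new artifact.
```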

@PeterSewell (Contributor) commented Jul 26, 2024 via email

@cp526 (Collaborator) commented Jul 26, 2024

> I talked with my colleague Kevin Quick about how this should work. Here's one possible approach: [quotes the full plan above]

That sounds good to me.

And it would indeed be very useful to keep long-term data. However exactly we implement items 2 and 3 (regression detection and long-term logging), I think it's good to have the data from item 3 committed in the repository, to reduce the risk of losing it.

@jprider63

I've opened a PR that runs benchmarks on every update to master.

Next I'll work on running benchmarks on every PR so that we can detect potential regressions. Ideally this will add a comment with graphs of the updated performance numbers. If we can't get that working quickly, the alternative is to have the (optional) benchmarking CI job fail and add a comment if performance degradation exceeds a threshold (2x-3x slower?).
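
A minimal sketch of that fallback gate, assuming the PR and baseline totals are already available as CSVs (the filenames and the 2x factor are placeholders):

```bash
#!/usr/bin/env bash
# Sketch: fail the (optional) benchmark job if the PR's total runtime is
# more than FACTOR times the master baseline. Filenames are assumptions.
set -euo pipefail
FACTOR=2   # per the 2x-3x suggestion above

pr_total=$(awk -F, 'NR > 1 { s += $3 } END { print s }' pr-timings.csv)
base_total=$(awk -F, 'NR > 1 { s += $3 } END { print s }' master-timings.csv)

if awk -v p="$pr_total" -v b="$base_total" -v f="$FACTOR" \
     'BEGIN { exit !(p > b * f) }'; then
  echo "::error::PR total ${pr_total}s vs master ${base_total}s (> ${FACTOR}x)"
  exit 1
fi
```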

For the future, it would also be nice to compare CVC5 vs z3 timings.

dc-mak pushed a commit to rems-project/cerberus that referenced this issue Aug 27, 2024
This PR adds a CI workflow that generates performance graphs on every update to master. It stores the data as JSON in the `gh-pages` branch and renders the graphs on a GitHub Pages site. Here's [an example](https://galoisinc.github.io/cerberus/dev/bench/) of what this looks like with [dummy data](https://github.com/GaloisInc/cerberus/blob/gh-pages/dev/bench/data.js).

This currently runs `cn` on all the (successful) .c files in the test suite. Eventually, we probably want to make a proper benchmark suite using something like [Core_bench](https://blog.janestreet.com/core_bench-micro-benchmarking-for-ocaml/).

Implements part of rems-project/cn-tutorial/issues/54.
dc-mak pushed a commit to rems-project/cerberus that referenced this issue Aug 29, 2024
Same description as the Aug 27 commit above, with one addition: "It relies upon the existence of the (currently empty) `gh-pages` branch."