
Collect timing information in CI scripts, & detect regressions #54

Open · septract opened this issue Jul 21, 2024 · 7 comments

@septract (Collaborator)

Per @cp526 in rems-project/cerberus#365 (comment)

> We would like the CI to find regressions in timing behaviour, though. So we'll probably want to extend our scripts to collect timing statistics and "publish" the results in a way that allows for easy comparison across commits + fail if the overall sum exceeds some sensible limit.

@septract (Collaborator, Author) commented Jul 25, 2024

I talked with my colleague Kevin Quick about how this should work. Here's one possible approach (rough sketches of the scripting follow the list):

  • For generating logs in a single CI run:
    • Modify the check.sh script to output per-example run times in a .csv file
    • Save this file as an artifact
  • For detecting regressions in a PR:
    • Generate the current performance numbers.
    • Pull the most recent N .csv files through the artifact API
    • Compute a rolling average / trajectory / whatever we need to identify a regression
    • If there has been a regression, apply a label
    • Do whatever else we want in order to raise the saliency of the regression - auto-comment on the PR, block the merge ...
  • For long-term logging:
    • Run a nightly CI job that pulls all the artifacts from the current day
    • Aggregate into a daily log file
    • Generate a performance graph using GnuPlot
    • Commit the log and performance graph to the repo
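
Concretely, the first two items might look something like the sketches below. Everything specific here — the CSV columns, the `tests/*.c` glob, the bare `cn` invocation, the artifact and workflow names, and the 20% threshold — is an assumption, not something settled in this thread.

```bash
#!/usr/bin/env bash
# Sketch: per-example timing inside check.sh.
# Assumed CSV layout: commit,example,seconds,exit_code.
set -u

CSV_OUT="${CSV_OUT:-timings.csv}"
COMMIT=$(git rev-parse --short HEAD)
echo "commit,example,seconds,exit_code" > "$CSV_OUT"

for example in tests/*.c; do
  start=$(date +%s.%N)
  if cn "$example" > /dev/null 2>&1; then status=0; else status=$?; fi
  end=$(date +%s.%N)
  secs=$(awk -v a="$start" -v b="$end" 'BEGIN { printf "%.3f", b - a }')
  printf '%s,%s,%s,%s\n' "$COMMIT" "$example" "$secs" "$status" >> "$CSV_OUT"
done
```

And the regression check, using the `gh` CLI to fetch the previous runs' artifacts:

```bash
#!/usr/bin/env bash
# Sketch: flag a regression against the mean total of the last N runs.
# `gh run list` / `gh run download` are real gh-CLI commands; the workflow
# file (ci.yml), artifact name (timings), and threshold are assumptions.
set -euo pipefail
shopt -s nullglob

N=5
THRESHOLD=1.2   # fail if the new total is >20% above the rolling mean

mkdir -p previous
for run_id in $(gh run list --workflow ci.yml --branch master --limit "$N" \
                  --json databaseId --jq '.[].databaseId'); do
  gh run download "$run_id" --name timings --dir "previous/$run_id" || true
done

csvs=(previous/*/timings.csv)
[ "${#csvs[@]}" -gt 0 ] || exit 0   # nothing to compare against yet

current=$(awk -F, 'NR > 1 { s += $3 } END { print s }' timings.csv)
baseline=$(awk -F, 'FNR > 1 { s += $3 } END { print s / '"${#csvs[@]}"' }' "${csvs[@]}")

if awk -v c="$current" -v b="$baseline" -v t="$THRESHOLD" \
     'BEGIN { exit !(c > b * t) }'; then
  echo "Possible regression: ${current}s vs rolling mean of ${baseline}s"
  exit 1   # the workflow could instead apply a label or auto-comment here
fi
```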

@PeterSewell (Contributor)

Sounds good to me. For simplicity I might skip the aggregation - just keep the commit timestamp in the csv and ask gnuplot to regen the graph on each commit (with x axis either commit index or time).
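
For example, a sketch of that no-aggregation variant, assuming the rolling log keeps one row per commit as `epoch,commit,total_seconds` (the layout and filenames are made up):

```bash
#!/usr/bin/env bash
# Sketch: regenerate the performance graph straight from the rolling CSV
# on each commit, with commit timestamps on the x axis.
gnuplot <<'EOF'
set datafile separator ","
set terminal pngcairo size 800,480
set output "performance.png"
set xdata time
set timefmt "%s"              # Unix epoch timestamps in column 1
set format x "%Y-%m-%d"
set xlabel "commit time"
set ylabel "total runtime (s)"
plot "rolling-timings.csv" using 1:3 with linespoints title "CI total"
EOF
```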

@septract (Collaborator, Author)

GitHub artifacts are deleted after 90 days, so it depends whether we care about long-term retention of performance logs. If we do, we'll need some other approach to aggregation and storage.

@septract (Collaborator, Author)

Oh, maybe you mean we just keep a rolling csv artifact with all the previous logs? That could work.
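
A sketch of how that rolling artifact might work (the artifact and workflow names are hypothetical). One caveat: each run re-uploads the grown file as a fresh artifact, so the 90-day clock only keeps restarting while CI keeps running.

```bash
#!/usr/bin/env bash
# Sketch: extend a rolling CSV artifact on each run (names are assumptions).
set -euo pipefail

# Fetch the rolling log from the most recent successful master run, if any.
last=$(gh run list --workflow ci.yml --branch master --status success \
         --limit 1 --json databaseId --jq '.[0].databaseId' || true)
if [ -n "$last" ]; then
  gh run download "$last" --name rolling-timings --dir . || true
fi
touch rolling-timings.csv

# Append this run's summary row: epoch,commit,total_seconds (assumed layout).
total=$(awk -F, 'NR > 1 { s += $3 } END { print s }' timings.csv)
printf '%s,%s,%s\n' "$(date +%s)" "$(git rev-parse --short HEAD)" "$total" \
  >> rolling-timings.csv
# The workflow then re-uploads rolling-timings.csv as the new artifact.
```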

@PeterSewell (Contributor) commented Jul 26, 2024 via email

@cp526 (Collaborator) commented Jul 26, 2024

> I talked with my colleague Kevin Quick about how this should work. Here's one possible approach: [quotes the full plan above]

That sounds good to me.

And it would indeed be very useful to keep long-term data. However exactly we implement items 2 and 3 (regression detection and long-term logging), I think it's good to have the data from item 3 committed in the repository, to reduce the risk of losing it.

@jprider63

I've opened a PR that runs benchmarks on every update to master.

Next I'll work on running benchmarks on every PR so that we can detect potential regressions. Ideally this will add a comment with graphs of the updated performance numbers. If we can't get that working quickly, the alternative is to have the (optional) benchmarking CI job fail and add a comment if performance degradation exceeds a threshold (2x-3x slower?).
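
A minimal sketch of that fallback gate, assuming the PR and baseline totals are already available as CSVs (the filenames and the 2x factor are placeholders):

```bash
#!/usr/bin/env bash
# Sketch: fail the (optional) benchmark job if the PR's total runtime is
# more than FACTOR times the master baseline. Filenames are assumptions.
set -euo pipefail
FACTOR=2   # per the 2x-3x suggestion above

pr_total=$(awk -F, 'NR > 1 { s += $3 } END { print s }' pr-timings.csv)
base_total=$(awk -F, 'NR > 1 { s += $3 } END { print s }' master-timings.csv)

if awk -v p="$pr_total" -v b="$base_total" -v f="$FACTOR" \
     'BEGIN { exit !(p > b * f) }'; then
  echo "::error::PR total ${pr_total}s vs master ${base_total}s (> ${FACTOR}x)"
  exit 1
fi
```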

For the future, it would also be nice to compare CVC5 vs z3 timings.

dc-mak pushed a commit to rems-project/cerberus that referenced this issue Aug 27, 2024
This PR adds a CI workflow that generates performance graphs on every update to master. It stores the data as JSON in the `gh-pages` branch and renders the graphs on a GitHub Pages site. Here's [an example](https://galoisinc.github.io/cerberus/dev/bench/) of what this looks like with [dummy data](https://github.com/GaloisInc/cerberus/blob/gh-pages/dev/bench/data.js).

This currently runs `cn` on all the (successful) .c files in the test suite. Eventually, we probably want to make a proper benchmark suite using something like [Core_bench](https://blog.janestreet.com/core_bench-micro-benchmarking-for-ocaml/).

Implements part of rems-project/cn-tutorial/issues/54.
dc-mak pushed a commit to rems-project/cerberus that referenced this issue Aug 29, 2024
Same description as the Aug 27 commit above, with one addition: "It relies upon the existence of the (currently empty) `gh-pages` branch."