Conversation

@Elvis339 (Contributor) commented Nov 27, 2025

Why

Track C-Chain reexecution benchmark performance over time and catch regressions before they reach production.

Closes #1494

How

Firewood triggers the AvalancheGo benchmark → waits for completion → downloads results → publishes to GitHub Pages

Changes

  • scripts/bench-cchain-reexecution.sh - trigger, wait, download
  • .github/workflows/track-performance.yml - orchestrates benchmark + publish
  • justfile - bench, bench-status, bench-list, bench-help

Usage

just bench firewood-101-250k
FIREWOOD_REF=v0.1.0 just bench firewood-33m-40m
just bench-status <run_id>

Related

AvalancheGo: ava-labs/avalanchego#4650

Implements workflow to trigger C-Chain reexecution benchmarks in AvalancheGo
and track Firewood performance over time. Supports task-based and custom
parameter modes. Results stored in GitHub Pages via github-action-benchmark.
@Elvis339 Elvis339 changed the title Es/enable firewood dev workflow ci: Track Firewood Performance via AvalancheGo Benchmarks Nov 27, 2025
@Elvis339 Elvis339 changed the title ci: Track Firewood Performance via AvalancheGo Benchmarks ci(perf): Track Firewood Performance via AvalancheGo Benchmarks Nov 27, 2025
@Elvis339 Elvis339 marked this pull request as ready for review November 27, 2025 15:04
@Elvis339 Elvis339 marked this pull request as draft November 27, 2025 15:11
@Elvis339 Elvis339 self-assigned this Dec 1, 2025
@Elvis339 Elvis339 marked this pull request as ready for review December 2, 2025 20:32
@Elvis339 Elvis339 requested a review from alarso16 as a code owner December 4, 2025 17:17
Comment on lines 19 to 21
- c-chain-reexecution-firewood-101-250k
- c-chain-reexecution-firewood-33m-33m500k
- c-chain-reexecution-firewood-33m-40m
Contributor

Q: why do we have these predefined tasks?

Contributor Author

Without predefined tasks, you'd need to specify all the granular S3 parameters manually (block-dir-src, current-state-dir-src, etc.). These predefined tasks exist in AvalancheGo's Taskfile for convenience and reusability.
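For illustration, here is roughly what a custom run looks like without a predefined task, using the custom-mode environment variables supported in this PR (the S3 paths below are placeholders, not real AvalancheGo locations):

# Hypothetical custom-mode invocation: the block range and every S3
# source must be spelled out by hand instead of resolved from a task.
START_BLOCK=101 \
END_BLOCK=250000 \
BLOCK_DIR_SRC=s3://example-bucket/blocks/101-250k \
CURRENT_STATE_DIR_SRC=s3://example-bucket/state/at-block-100 \
just bench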

Member

Why did you pick these particular ones?

-f runner="${{ inputs.runner }}" \
$LIBEVM_FLAG
sleep 10
Contributor

Q: what's the purpose of this sleep here?

Member

Looks like there's a race between when you run 'gh workflow run' and 'gh run list', as the run might not be registered yet. 10s might work sometimes, but if no runners are available it might take longer. It seems like we should sleep a second and loop for up to 60 seconds until it shows up.

Contributor Author

Yes, exactly: sometimes the runners are busy, so I've increased this to 3 minutes.

for i in {1..18}; do

Contributor

Could we add a comment here explaining why we need to sleep here?

Contributor Author

Replaced the sleep with a more robust implementation, with an explanatory comment:

f2cf502
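
A sketch of the shape of that loop, assembled from the snippets quoted in this review (the workflow file name in the gh run list query is an assumption, not the actual name used by AvalancheGo):

# Wait for the GitHub API to register the workflow run.
# 18 attempts x 10s = up to 3 minutes before giving up.
RUN_ID=""
for i in {1..18}; do
  sleep 10
  RUN_ID=$(gh run list --repo ava-labs/avalanchego \
    --workflow c-chain-reexecution-benchmark.yml \
    --limit 1 --json databaseId --jq '.[0].databaseId') || RUN_ID=""
  if [ -n "$RUN_ID" ] && [ "$RUN_ID" != "null" ]; then
    break
  fi
done
if [ -z "$RUN_ID" ] || [ "$RUN_ID" == "null" ]; then
  echo "Workflow run never appeared after 3 minutes" >&2
  exit 1
fi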

workflow_dispatch:
inputs:
firewood:
description: 'Firewood commit/branch/tag to test (leave empty for current HEAD)'
Member

Q: Current HEAD of what? main? Or whatever is actually checked out?

Contributor Author (Dec 8, 2025)

The currently checked-out commit. I updated the docs to reduce the ambiguity:

description: 'Firewood commit/branch/tag to test (leave empty to use the commit that triggered the workflow)'

comment-on-alert: false

- name: Summary
if: always()
Member

nit: I don't think you need this

Contributor Author

You're right, it's not needed; removed.

id: download
shell: nix develop ./ffi --command bash {0}
run: |
just benchmark-download "${{ steps.trigger.outputs.run_id }}"
Member

Seems like it would be more readable if this command were the same as the name on line 132 -- "just download-benchmark-results"

Contributor Author

Good insight, I updated related commands to be more readable and intuitive:

  • benchmark-download → download-benchmark-results
  • benchmark-trigger → trigger-benchmark
  • benchmark-wait → wait-benchmark
  • benchmark-list → list-benchmarks

auto-push: true
gh-pages-branch: benchmark-data
benchmark-data-dir-path: ${{ steps.download.outputs.data-dir }}
# Don't fail the workflow if there's an issue with benchmark storage
Member

seems like we should know if this happens, otherwise we'll silently stop seeing results, right?

Contributor Author

Yes, good catch. Added a warning:

if [ "${{ steps.store.outcome }}" == "failure" ]; then
  echo "::warning::Benchmark storage failed - results were not saved to GitHub Pages"
fi

justfile Outdated
echo "LibEVM: {{ libevm }}"
fi

echo "Triggering reexecution benchmark in AvalancheGo..."
Member

Maybe move this before the others and indent the parameters?

Suggested change
echo "Triggering reexecution benchmark in AvalancheGo..."
echo "Triggering reexecution benchmark in AvalancheGo with parameters:"
echo " - Firewood: $FIREWOOD"
# ...

Contributor Author

echo "Triggering reexecution benchmark in AvalancheGo with parameters:"

justfile Outdated
echo "Triggering reexecution benchmark in AvalancheGo..."
RUN_ID=$(just benchmark-trigger "$FIREWOOD" "{{ avalanchego }}" "{{ task }}" "{{ runner }}" "{{ libevm }}")
echo " Run ID: $RUN_ID"
echo " URL: https://github.com/ava-labs/avalanchego/actions/runs/$RUN_ID"
Member

Let's be more verbose here so someone knows what the URL does. Maybe something like:

Suggested change
echo " URL: https://github.com/ava-labs/avalanchego/actions/runs/$RUN_ID"
echo " Monitor progress here: https://github.com/ava-labs/avalanchego/actions/runs/$RUN_ID"

Contributor Author

echo " Monitor progress here: https://github.com/ava-labs/avalanchego/actions/runs/$RUN_ID"

justfile Outdated
benchmark-wait run_id: check-nix
#!/usr/bin/env bash
set -euo pipefail
nix run ./ffi#gh -- run watch "{{ run_id }}" --repo ava-labs/avalanchego --exit-status
Member

This can hang if it never finishes, which is possible if someone introduces an infinite loop somewhere.

I think GitHub will let it run for a day, which is way too long. 2h seems like plenty, but perhaps a shorter time makes sense.

Suggested change
nix run ./ffi#gh -- run watch "{{ run_id }}" --repo ava-labs/avalanchego --exit-status
nix run ./ffi#gh -- timeout -k 1m 2h run watch "{{ run_id }}" --repo ava-labs/avalanchego --exit-status

Contributor Author

timeout -k 1m 2h nix run ./ffi#gh -- run watch "{{ run_id }}" --repo ava-labs/avalanchego --exit-status
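
Note the placement relative to the suggested change above: timeout has to wrap the whole nix run invocation; placed after the --, it would be passed to gh as an argument instead. With -k 1m 2h, gh run watch receives SIGTERM after two hours and SIGKILL a minute later if it still hasn't exited.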

justfile Outdated
echo " URL: https://github.com/ava-labs/avalanchego/actions/runs/$RUN_ID"
echo ""

echo "Waiting for benchmark completion..."
Member

👍 This looks polished. I probably would have just thrown a "set -x" here and called it a day but I like this better.

Elvis339 and others added 9 commits December 8, 2025 18:56
Co-authored-by: rodrigo <77309055+RodrigoVillar@users.noreply.github.com>
Signed-off-by: Elvis <43846394+Elvis339@users.noreply.github.com>
Co-authored-by: rodrigo <77309055+RodrigoVillar@users.noreply.github.com>
Signed-off-by: Elvis <43846394+Elvis339@users.noreply.github.com>
Co-authored-by: Ron Kuris <ron.kuris@avalabs.org>
Signed-off-by: Elvis <43846394+Elvis339@users.noreply.github.com>
@Elvis339 (Contributor Author) commented Dec 8, 2025

We need ava-labs/avalanchego#4650 to be merged before merging this PR.

DO NOT MERGE.

@Elvis339 Elvis339 added the DO NOT MERGE This PR is not meant to be merged in its current state label Dec 8, 2025
@ava-labs ava-labs deleted a comment from github-actions bot Dec 9, 2025
cp -rv docs/assets ./_site
- name: Include benchmark data
run: |
git fetch origin benchmark-data || true
Member

why are we ignoring errors here?

Comment on lines +37 to +39
git checkout origin/benchmark-data -- dev bench 2>/dev/null || true
cp -rv dev _site/ 2>/dev/null || true
cp -rv bench _site/ 2>/dev/null || true
Member

same here. Seems like if these fail, the build should fail
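
For illustration, one way to tolerate only the expected no-data-yet case while letting real failures break the build (a sketch, assuming the only acceptable failure is the benchmark-data branch not existing on the first run):

if git fetch origin benchmark-data; then
  # The branch exists, so failures from here on are real errors.
  git checkout origin/benchmark-data -- dev bench
  cp -rv dev bench _site/
else
  echo "::warning::benchmark-data branch not found; skipping benchmark data"
fi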

name: Track Performance

on:
workflow_dispatch:
Member

Why is this manually dispatched? Shouldn't this happen automatically on pushes to main, or is this just an intermediate step for testing purposes?

If the latter, please change the PR description to show the followup or create another PR/task and link it.

c-chain-benchmark:
runs-on: ubuntu-latest
permissions:
contents: write # Required for github-action-benchmark to push to gh-pages
Member

This comment was helpful as a good explanation of what this task does. IMO this would be better documented as a block comment or even better by changing the name from "c-chain-benchmark" to "record-bench-to-gh-pages".

-f runner="${{ inputs.runner }}" \
$LIBEVM_FLAG
# Wait for GitHub API to register the workflow run (retry up to 3min)
Member

Is this the same logic we use in just, or are we already out of sync with what just does?

id: download
shell: nix develop ./ffi --command bash {0}
run: |
just download-benchmark-results "${{ steps.trigger.outputs.run_id }}"
Member

curious as to what happens if the benchmark fails to start at all

- name: Summary
run: |
if [ "${{ steps.store.outcome }}" == "failure" ]; then
echo "::warning::Benchmark storage failed - results were not saved to GitHub Pages"
Member

looks like we keep going anyway. Is this expected?

Contributor Author

Yes, this is intentional. The continue-on-error: true on the store step lets us still show the summary even if publishing to GitHub Pages fails.

The idea is: if the benchmark ran and we got results, we want to surface what was tested and link to the AvalancheGo run, even if we couldn't save to gh-pages. The ::warning::Benchmark storage failed... annotation makes the failure visible without hiding the useful context.
The alternative would be failing the workflow, but then we'd lose the summary, which makes debugging harder.

Comment on lines 174 to 177
echo "## Firewood Performance Benchmark Results" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "**Configuration:**" >> $GITHUB_STEP_SUMMARY
Member

Group all these together (probably with the ones below too):

Suggested change
echo "## Firewood Performance Benchmark Results" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "**Configuration:**" >> $GITHUB_STEP_SUMMARY
{
echo "## Firewood Performance Benchmark Results"
echo
echo "**Configuration:**"
# ...
} > $GITHUB_STEP_SUMMARY

It's a little particular about spacing though. There must be a space after the opening { and a newline or a semicolon ; before the closing }.

justfile Outdated

# Wait for GitHub to register the run (retry up to 3min)
for i in {1..18}; do
sleep 10
Member

Can we consolidate this loop and reduce the possibility of these getting out of sync?

A great example might be lets assume we want to change this to check every 5 seconds. How many places have to be updated?
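
One possible shape for that consolidation (a sketch, not what this PR implements): keep the cadence in two variables and the loop in one shared helper, so changing the interval means editing a single place. lookup_run_id is a hypothetical stand-in for the gh query.

# Shared poll loop; the cadence lives in exactly one place.
POLL_INTERVAL="${POLL_INTERVAL:-10}"   # seconds between checks
POLL_ATTEMPTS="${POLL_ATTEMPTS:-18}"   # 18 x 10s = 3 minutes total

wait_for_run() {
  local run_id=""
  for _ in $(seq 1 "$POLL_ATTEMPTS"); do
    sleep "$POLL_INTERVAL"
    run_id=$(lookup_run_id) || run_id=""
    if [ -n "$run_id" ]; then
      echo "$run_id"
      return 0
    fi
  done
  return 1
}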

@Elvis339 Elvis339 marked this pull request as draft December 30, 2025 17:07
- Add bench-cchain-reexecution.sh script to trigger and monitor
  AvalancheGo benchmarks
- Add just commands: bench, bench-status, bench-list, bench-help
- Add track-performance.yml workflow to publish results to GitHub Pages
- Support predefined tests and custom block ranges via env vars
# just bench <test> - predefined test
# FIREWOOD_REF=v0.1.0 just bench firewood-101-250k - with specific ref
# Custom: START_BLOCK=101 END_BLOCK=250000 BLOCK_DIR_SRC=<s3> CURRENT_STATE_DIR_SRC=<s3> just bench
bench test="":
Contributor Author

just provides the golden path for developers to run benchmarks locally; the same commands work in CI.

fetch-depth: 0 # Needed for github-action-benchmark

- name: Trigger C-Chain Reexecution Benchmark
run: ./scripts/bench-cchain-reexecution.sh trigger "${{ inputs.test }}"
Contributor Author

Not using just here since it's a thin wrapper around this script; this avoids installing Nix/just in CI for no benefit.
