
Add report linearization #75

Merged
Liam-DeVoe merged 10 commits into Zac-HD:master from Liam-DeVoe:db-dataclasses
Apr 21, 2025

Conversation

@Liam-DeVoe
Collaborator

@Liam-DeVoe Liam-DeVoe commented Apr 16, 2025

Part of #3 (comment)

Comment thread src/hypofuzz/hypofuzz.py Outdated
@Liam-DeVoe Liam-DeVoe changed the title Add proper database classes and store more data Add report linearization Apr 21, 2025
@Liam-DeVoe
Collaborator Author

  • Linearized-history view is currently the only dashboard view, option to select runs-in-last-n-days coming later
  • Big PR; important bits are def linearize_reports and the dataclasses in database.py

Owner

@Zac-HD Zac-HD left a comment

looking good!

Comment thread tests/test_linearize.py
Comment on lines +40 to +42
# all of this min_size=len(uuids) etc is going to lead to terrible shrinking.
# But the alternative of while draw(st.booleans()) will generate too-small
# collections. Use `more` from hypothesis internals?
Owner

Let's decompose this: generate a worker_and_run() or something, and then have reports() draw a list of those. We can pass in a list of (db_key, nodeid) pairs for each report to sample from; and have the multi-worker reports strategy add whatever offset we need after the fact if overlap=False (noting that we probably want a smaller upper bound on timestamps).
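The "add whatever offset we need after the fact" step could be a plain function applied to already-drawn runs. A minimal sketch, assuming each run is just a list of timestamps; `offset_runs` and its `gap` parameter are hypothetical names for illustration, not part of hypofuzz or Hypothesis:

```python
def offset_runs(runs: list[list[float]], gap: float = 1.0) -> list[list[float]]:
    """Shift each run so that it starts after the previous non-empty run ends."""
    shifted: list[list[float]] = []
    next_start = 0.0  # earliest allowed timestamp for the next run
    for run in runs:
        if not run:
            shifted.append([])
            continue
        ordered = sorted(run)
        delta = next_start - ordered[0]
        moved = [t + delta for t in ordered]
        shifted.append(moved)
        # Leave `gap` between the end of this run and the start of the next.
        next_start = moved[-1] + gap
    return shifted
```

Since every run is shifted past the previous one, the final timestamps grow with the number of runs, which fits the suggestion to keep the upper bound on drawn timestamps small.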

Collaborator Author

ok if I come back to this one? I definitely want a strong strategy here, but also want to write the overlapping case first before making it smarter

Comment thread src/hypofuzz/hypofuzz.py Outdated
Owner

@Zac-HD Zac-HD left a comment

lgtm

@Liam-DeVoe Liam-DeVoe merged commit 48bbccd into Zac-HD:master Apr 21, 2025
13 checks passed
@Liam-DeVoe Liam-DeVoe deleted the db-dataclasses branch April 21, 2025 23:00
@Zac-HD
Owner

Zac-HD commented Apr 22, 2025

My bad for the hasty review, seriously, but this implementation is wrong once you have a restart - sorting by elapsed time will interleave all the runs, and then crash with assertion errors.

Having played around a bit in a branch (https://github.com/Zac-HD/hypofuzz/compare/zac/linearize), I think the solution is to change REPORTS to be a nested dictionary: nodeid -> worker_uuid -> list[Report] sorted by timestamp

  • With that structure, it's easy to derive the diffs just by iterating over the elements of each list
    • then linearize by concatenating all the per-worker-uuid lists, sorting by timestamp, and dropping not-at-the-start replay entries.
    • I actually think that computing diffs in the dashboard server kinda sucks; either we should take the space hit and put that in the database (ie denormalize a bit; our worker-identity mapping is already heavy-ish), or do it in the frontend.
  • we need all those individual lists anyway, since we want an option to plot them separately on the per-test pages
  • also, in every place we construct a Report or a Metadata loaded from the database, we need to handle 'parsing errors' due to invalid json, or missing/extra keys due to writes from an older or newer fuzzer version.
    • gosh that's going to be annoying, ugh, at least it's not too many places...
    • I think we just skip over those entries; I'm cautious about deleting stuff (imagine you update the fuzzer, have to roll back, and in the meantime we dropped all your metadata...) - we can do something sensible later.
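For illustration, the proposal above could be sketched roughly as follows. This is not the merged implementation: the `Report` here is a hypothetical, simplified stand-in for the dataclasses in database.py, the `phase` values `"replay"` and `"generate"` are assumed, and `parse_report`, `load_reports`, and `linearize` are made-up names. It also takes "not-at-the-start" to mean any replay entry that appears after the first non-replay entry in the merged history.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Report:
    # Hypothetical, simplified fields; the real Report carries more data.
    nodeid: str
    worker_uuid: str
    timestamp: float
    phase: str  # assumed values: "replay" or "generate"


def parse_report(raw: str) -> Optional[Report]:
    """Return a Report, or None for entries we cannot parse."""
    try:
        return Report(**json.loads(raw))
    except (ValueError, TypeError):
        # Invalid JSON, a non-object payload, or missing/extra keys
        # (e.g. written by an older or newer fuzzer version).
        # Skip the entry; nothing is deleted from the database.
        return None


# nodeid -> worker_uuid -> list[Report], each list sorted by timestamp
Reports = dict[str, dict[str, list[Report]]]


def load_reports(raw_entries: list[str]) -> Reports:
    reports: Reports = defaultdict(lambda: defaultdict(list))
    for raw in raw_entries:
        report = parse_report(raw)
        if report is not None:
            reports[report.nodeid][report.worker_uuid].append(report)
    for per_worker in reports.values():
        for worker_reports in per_worker.values():
            worker_reports.sort(key=lambda r: r.timestamp)
    return reports


def linearize(per_worker: dict[str, list[Report]]) -> list[Report]:
    """Merge all workers' reports for one test into a single history."""
    merged = sorted(
        (r for worker_reports in per_worker.values() for r in worker_reports),
        key=lambda r: r.timestamp,
    )
    linear: list[Report] = []
    seen_non_replay = False
    for report in merged:
        if report.phase == "replay":
            if seen_non_replay:
                continue  # replay after a restart: drop it
        else:
            seen_non_replay = True
        linear.append(report)
    return linear
```

Keeping the per-worker lists intact means the per-test pages can still plot each worker separately, with `linearize` producing the combined view on demand. Note that `parse_report` only checks the set of keys, not the value types, so a real version would likely need stricter validation.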

