-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat(fuzz): ast-seeded dictionary #12015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
.prop_flat_map(move |(use_ast_index, select_index)| { | ||
let dict = state_clone.dictionary_read(); | ||
|
||
// AST string literals available: use 30/70 allocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arbitrary value, we could change it if you are opinionated.
|
||
// Seed dict with AST literals if analysis is available. | ||
if let Some(literals) = analysis { | ||
dictionary.ast_values = Some(literals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO better to keep this simpler and just insert / reuse the existing fuzz dict samples - insert_sample_values
which stores values by type and use them during fuzz runs instead having new strategy / weights and ast_values
. We probably need to make the samples limit configurable and bump the default value
foundry/crates/evm/fuzz/src/strategies/state.rs
Lines 374 to 377 in 020d515
/// Insert sample values that are reused across multiple runs. | |
/// The number of samples is limited to invariant run depth. | |
/// If collected samples limit is reached then values are inserted as regular values. | |
pub fn insert_sample_values( |
Not blocking but I would recommend implementing constant folding to some degree i.e. evaluate |
thanks! One thing here - this means we should collect from tests too which we don't do in PR, is this correct? |
AFAIK neither Echidna or slither's printer filters tests out. I think not including forge-std makes sense. Also, I am not sure how the push/pop/log dictionary is managed currently in Foundry, but I think Echidna will always keep the constant pool around and eject the dynamically collected values after running a full sequence. For example, a user's balance that is emitted in one run may help within the same sequence but probably unlikely to help in a totally unrelated sequence. |
👍 @0xrusowsky let's include too
Please let us know if you see any redundant data / ways to improve the dict. Thank you! |
^ note that AST literals are injected into |
6f35d1b
to
fc4f3d4
Compare
a9373d6
to
3440429
Compare
thanks for the advise! will be tackled next on a follow-up PR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, looks good! left some comments / nits, pls check
.prop_flat_map(move |(use_ast_index, select_index)| { | ||
let dict = state_clone.dictionary_read(); | ||
|
||
// AST string literals available: use 30/70 allocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this should follow the sample rules, we already have logic / bias to select them
maybe we could reuse same and return DynSolValue
s from ast analyzed String / bytes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdym exactly?
bias
is a randomly generated bool
(50-50) but allocating 50% to ast seeded literals feels like a lot (before it was 0-100).
my idea was that by using Index
we can allocate a smaller pct to AST string literals, but we are already using them (30% of the time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to bool::weighted
as suggested in #12015 (comment)
let max_int_plus1 = U256::from(1).wrapping_shl(n - 1); | ||
let num = I256::from_raw(uint.wrapping_sub(max_int_plus1)); | ||
// Extract lower N bits | ||
let uint_n = U256::from_be_bytes(value.0) % U256::from(1).wrapping_shl(n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, need to make some more tests to see how this affects overall perf
DynSolType::String => { | ||
let state_clone = state.clone(); | ||
(any::<prop::sample::Index>(), any::<prop::sample::Index>()) | ||
.prop_flat_map(move |(use_ast_index, select_index)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use bool::weighted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dictionary.insert_db_values(accs); | ||
|
||
// Seed dict with AST literals if analysis is available. | ||
if let Some(compiler) = analysis { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer making it even lazier by cloning the compiler into the dictionary, and having OnceLock on the values. for tests you can call .set on the OnceLock and otherwise if let Some() = compiler { values.get_or_init(|| ) } else { None }
I guess it's a bit more complex because of the rwlock but you get the point, we want to initialize only when a value is requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in e7d3159
(#12015)
some impl notes:
- compiler is cloned into
LiteralsDictionary
, which hasOnceLock<LiteralMaps>
(as requested) - because of the
RwLock
limitations, i kinda used a two-stage lazy init:- AST traversal is deferred until first access via
samples()
,ast_strings()
, orast_bytes()
, which trigger the oncelock - merging literals into
sample_values
is deferred until first runtime sample insertion viaseed_samples()
- before merging:
samples()
returns AST literals directly (runtime samples are empty) - after merging:
samples()
returns from unifiedsample_values
- before merging:
- AST traversal is deferred until first access via
the trade-off is some memory overhead from keeping LiteralsDictionary
alive vs avoiding O(n) AST traversal unless fuzzer actually needs samples
}); | ||
|
||
forgetest_init!(should_fuzz_literals, |prj, cmd| { | ||
prj.clear_cache_dir(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont need to clear cache
485efa5
to
e7d3159
Compare
Motivation
closes #10233
Solution
solar::sema::Compiler
to collect all relevant AST literals found in the sources (excluding libs and scripts) and seed theFuzzerDictionary
with them at initialization.TODO
Future improvements
PR Checklist