svm: optimize local program cache creation#6036
Codecov Report
Coverage diff, master vs. #6036:
Coverage: 83.2% -> 83.2% (-0.1%)
Files: 796 -> 796
Lines: 361926 -> 361890 (-36)
Hits: 301468 -> 301345 (-123)
Misses: 60458 -> 60545 (+87)
moving this out of draft because @alessandrod demonstrated it does what it is supposed to performance-wise. i annotated a few places where i am unclear on program cache semantics. this can only go in if people who understand the program cache better think this is all safe to do, and im happy to rework around that
The Firedancer team maintains a line-for-line reimplementation of the |
      pub effective_slot: Slot,
      /// How often this entry was used by a transaction
    - pub tx_usage_counter: AtomicU64,
    + pub tx_usage_counter: Arc<AtomicU64>,
This could be one Arc for both counters, or we could remove the ix_usage_counter.
after searching through the codebase, it looks like nothing even uses it, it is only ever assigned to. ill remove it
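For readers following this thread, here is a minimal sketch of why putting the counter behind its own `Arc` helps. It is not the PR's code: `Entry` is a hypothetical stand-in for a program cache entry, and the sketch assumes that a replacement entry in the global cache reuses the counter allocation of the entry it replaces.

```rust
use std::sync::{
    atomic::{AtomicU64, Ordering},
    Arc,
};

/// Hypothetical stand-in for a program cache entry (not the real type).
pub struct Entry {
    pub tx_usage_counter: Arc<AtomicU64>,
}

/// Replaces the global entry while a local handle to the old one is alive,
/// bumps the counter through the old handle, and returns what the global
/// entry observes afterwards.
pub fn usage_seen_by_global_after_replacement() -> u64 {
    let mut global = Arc::new(Entry {
        tx_usage_counter: Arc::new(AtomicU64::new(0)),
    });
    // A batch-local cache holds its own handle to the current entry.
    let local = Arc::clone(&global);

    // The global entry is swapped out, but the new entry shares the counter
    // (assumption: this is how a replacement entry would be built).
    global = Arc::new(Entry {
        tx_usage_counter: Arc::clone(&local.tx_usage_counter),
    });
    debug_assert!(!Arc::ptr_eq(&global, &local));

    // The update goes through the stale local entry...
    local.tx_usage_counter.fetch_add(1, Ordering::Relaxed);
    // ...and is still visible from the entry now in the global cache.
    global.tx_usage_counter.load(Ordering::Relaxed)
}
```

With a plain `AtomicU64` field, the increment in this sketch would land on the old entry only and the replacement would never see it.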
had to squash-rebase due to conflicts in the import list, but i think this is ready for review again!
svm/src/transaction_processor.rs
Outdated
      )
      .expect("called load_program_with_pubkey() with nonexistent account");
    - program.tx_usage_counter.store(count, Ordering::Relaxed);
    + program.tx_usage_counter.store(0, Ordering::Relaxed);
Why reset the statistics here?
in master, usage counts of global cache misses are set to 2x what they should be. when a miss occurs, the executing thread creates an entry with the usage count from filter. but then the cooperative loading task is closed out by a new hit in extract, which means the usage count is added to the new entry a second time. we initialize new entries at 0 to fix this
b656970 actually this line is totally pointless, they all get initialized at 0 anyway
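The double counting described in this thread can be illustrated with a toy model. This is only a sketch of the arithmetic, not the cache code: `usage_after_miss`, `count_from_filter`, and `initial` are hypothetical names, and the model assumes the follow-up hit in extract adds the filter count to whatever value the new entry already holds.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Toy model of one global cache miss. `count_from_filter` is the number of
/// uses the filter step attributed to the program; `initial` is the value the
/// freshly loaded entry starts with. All names here are hypothetical.
pub fn usage_after_miss(count_from_filter: u64, initial: u64) -> u64 {
    // The loading thread creates the new entry with some starting count.
    let tx_usage_counter = AtomicU64::new(initial);
    // The follow-up hit in the extract step adds the filter count to it.
    tx_usage_counter.fetch_add(count_from_filter, Ordering::Relaxed);
    tx_usage_counter.load(Ordering::Relaxed)
}
```

Seeding the entry with the filter count, as described for master, gives `usage_after_miss(3, 3) == 6`, twice the real usage; starting at 0 gives `usage_after_miss(3, 0) == 3`.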
    &mut execute_timings,
    config.check_program_modification_slot,
    config.limit_to_load_programs,
    false, // increment_usage_counter
Why avoid counting the usage of built-ins?
they arent being used here, we just load them into the cache. the usage counter is incremented in filter if a transaction uses one, just like any other program
svm/src/transaction_processor.rs
Outdated
    // Create the batch-local program cache.
    let mut program_cache_for_tx_batch = {
    let program_cache = self.program_cache.read().unwrap();
Maybe rename it global_program_cache in the local scope and in the struct?
Also, I think we can drop |
Problem
presently, transaction processing constructs a per-batch local program cache before it does anything else. this is done in two steps:
- filter (filter_executable_program_accounts()): get all account keys on the batch, make a list of those owned by loaders, include all builtins on this list unconditionally
- replenish (replenish_program_cache()): build a ProgramCacheForTxBatch from this list of required programs, which acts as a view into already compiled bpf in the global cache, and triggers compilation of unseen/evicted programs
the filter step is notoriously incredibly slow because the account_matches_owners() call it uses is slow, and it must be called on every single account in the batch
Summary of Changes
the first version of this pr moved from a per-batch local cache to a set of per-transaction local caches, each populated after individual transaction loading. this solved the filter step but entailed additional locking on the global cache which was deemed undesirable
the new version of this pr creates an empty per-batch local cache and replenishes it with all builtins. then, after each transaction is loaded:
ProgramCacheForTxBatch
this has four major performance impacts:
we also include a handful of improvements and bugfixes incidental to this work:
- HashMap<Pubkey, (&'a Pubkey, u64)> to HashSet<Pubkey>. we no longer need program owners, as these are useless since the activation of feature_set::disable_account_loader_special_case. we also no longer need usage counts: because we replenish one transaction at a time, the usage can only ever be 1!
- AtomicU64. program cache entries are wrapped by Arc<_>. this means there are potential race conditions where updates to usage counters on a locally held program cache entry no longer propagate back to the global cache because the entry has been changed. we wrap usage counts in their own Arc<_>s to fix this
- ix_usage_counter. only tx_usage_counter is necessary for cache eviction
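The move from a counted map to a plain set in the first item can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Pubkey` is a hypothetical stand-in type, the owner half of the old map value is left out, and the batch-wide counting rule shown (each transaction counts a program at most once) is one plausible reading rather than something the description confirms.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a 32-byte account key.
pub type Pubkey = [u8; 32];

/// Batch-wide shape: count how many transactions reference each program.
pub fn programs_for_batch(batch: &[Vec<Pubkey>]) -> HashMap<Pubkey, u64> {
    let mut counts = HashMap::new();
    for tx_programs in batch {
        // Deduplicate within a transaction so each tx counts at most once.
        let unique: HashSet<&Pubkey> = tx_programs.iter().collect();
        for key in unique {
            *counts.entry(*key).or_insert(0) += 1;
        }
    }
    counts
}

/// Per-transaction shape: membership is all that is left to record,
/// because the usage contributed by a single transaction is always 1.
pub fn programs_for_tx(tx_programs: &[Pubkey]) -> HashSet<Pubkey> {
    tx_programs.iter().copied().collect()
}
```

Once programs are collected one transaction at a time, every count in the map would be 1, which is why the set carries the same information.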