svm: optimize local program cache creation by 2501babe · Pull Request #6036 · anza-xyz/agave

2501babe · 2025-04-30T08:47:27Z

Problem

presently, transaction processing constructs a per-batch local program cache before it does anything else. this is done in two steps:

the "filter" step (filter_executable_program_accounts()): get all account keys on the batch, make a list of those owned by loaders, include all builtins on this list unconditionally
the "replenish" step (replenish_program_cache()): build a ProgramCacheForTxBatch from this list of required programs, which acts as a view into already compiled bpf in the global cache, and triggers compilation of unseen/evicted programs

the filter step is very slow: it must call account_matches_owners() on every account in the batch, and each call goes to accounts-db
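To make the cost concrete, here is a minimal sketch of the per-batch filter described above. It is not the code in master: `Key`, `OwnerLookup`, `MapDb`, and `filter_batch` are hypothetical stand-ins, and the callback signature is an assumption modeled on account_matches_owners(). The point is that every distinct key in the batch triggers one owner lookup.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Pubkey.
type Key = [u8; 32];

// Assumed shape of the owner-check callback; the real trait may differ.
trait OwnerLookup {
    fn account_matches_owners(&self, account: &Key, owners: &[Key]) -> Option<usize>;
}

// Hypothetical in-memory "accounts-db": account key -> owner key.
struct MapDb(HashMap<Key, Key>);

impl OwnerLookup for MapDb {
    fn account_matches_owners(&self, account: &Key, owners: &[Key]) -> Option<usize> {
        let owner = self.0.get(account)?;
        owners.iter().position(|candidate| candidate == owner)
    }
}

// Per-batch filter: one owner lookup per distinct key, plus a usage count
// for keys that turn out to be owned by a loader.
fn filter_batch<'a, C: OwnerLookup>(
    callbacks: &C,
    batch_keys: &[Key],
    loaders: &'a [Key],
) -> HashMap<Key, (&'a Key, u64)> {
    let mut programs: HashMap<Key, (&'a Key, u64)> = HashMap::new();
    for key in batch_keys {
        // Already identified as a program: just count the extra use.
        if let Some((_, count)) = programs.get_mut(key) {
            *count += 1;
            continue;
        }
        // Otherwise pay for the (slow) owner lookup.
        if let Some(index) = callbacks.account_matches_owners(key, loaders) {
            programs.insert(*key, (&loaders[index], 1));
        }
    }
    programs
}
```

Note that non-program accounts are looked up again every time they appear, since only loader-owned keys are remembered in the result map.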

Summary of Changes

the first version of this pr moved from a per-batch local cache to a set of per-transaction local caches, each populated after individual transaction loading. this solved the filter step but entailed additional locking on the global cache which was deemed undesirable

the new version of this pr creates an empty per-batch local cache and replenishes it with all builtins. then, after each transaction is loaded:

the "filter" step: get all account keys on the transaction. if an account already has an entry in the local cache, increment its usage counter. otherwise, if the account is owned by a loader, add it to a list for replenish
the "replenish" step: add the new programs to the existing ProgramCacheForTxBatch
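The per-transaction filter above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Key`, `Entry`, `LoadedAccount`, and `filter_for_transaction` are hypothetical names, and a plain `HashMap` stands in for ProgramCacheForTxBatch. The sketch assumes the transaction's accounts are already loaded, so the owner is available without another accounts-db lookup.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for Pubkey.
type Key = [u8; 32];

// Hypothetical cache entry; only the usage counter matters here.
struct Entry {
    tx_usage_counter: Arc<AtomicU64>,
}

// Hypothetical view of an already-loaded transaction account.
struct LoadedAccount {
    key: Key,
    owner: Key,
}

// Filter step for one loaded transaction: bump counters for programs that are
// already in the local cache, and collect loader-owned accounts that are not.
fn filter_for_transaction(
    local_cache: &HashMap<Key, Entry>,
    loaders: &HashSet<Key>,
    accounts: &[LoadedAccount],
) -> HashSet<Key> {
    let mut missing = HashSet::new();
    for account in accounts {
        if let Some(entry) = local_cache.get(&account.key) {
            entry.tx_usage_counter.fetch_add(1, Ordering::Relaxed);
        } else if loaders.contains(&account.owner) {
            missing.insert(account.key);
        }
    }
    missing
}
```

The returned set is what the replenish step would then load into the existing local cache. Both branches are hash lookups on data already in memory.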

this has four major performance impacts:

we trivialize filter down to some hashmap lookups, when before it had to go to accounts-db for every account
replenish no longer has to go to accounts-db for anything except unloaded loaderv3 programdata, and once simd186 is active it will not need to go to accounts-db at all
the program cache is populated after transaction account loading and thus is capped by loaded account data size limits
we do not populate the cache at all for fee-only or discarded transactions

we also include a handful of improvements and bugfixes incidental to this work:

filter goes from returning HashMap<Pubkey, (&'a Pubkey, u64)> to HashSet<Pubkey>. we no longer need program owners, as these are useless since the activation of feature_set::disable_account_loader_special_case. we also no longer need usage counts: because we replenish one transaction at a time, the usage can only ever be 1!
in master, builtins are inserted into the replenish list after filter, so they never receive a usage count from filter and their usage counters are always recorded as 0. we now record builtin usage normally
replenish works by locking the global cache, loading global cache hits into the local cache, selecting one global miss to cooperatively load across threads, and unlocking. it does this in a loop until all missing programs are loaded. usage counts are stored as AtomicU64 inside program cache entries, and the entries are wrapped by Arc<_>. this creates a potential race: if the global cache replaces an entry while a thread still holds the old one locally, updates to the usage counter on the locally held entry no longer propagate back to the global cache. we wrap usage counts in their own Arc<_>s to fix this
related to the above, when we hit a tombstone in the global cache, it tends to not return its own entry, but a dummy entry, and any usage counts do not propagate. we copy over the Arc<_>s to fix this
in master, usage counts of global cache misses are set to 2x what they should be. when a miss occurs, the executing thread creates an entry with the usage count from filter. but then the cooperative loading task is closed out by a new hit in extract, which means the usage count is added to the new entry a second time. we initialize new entries at 0 to fix this
we remove ix_usage_counter. only tx_usage_counter is necessary for cache eviction
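The counter-sharing fix from the list above can be illustrated with a small sketch. `Entry`, its `version` field, and `replace_keeping_counter` are hypothetical; the sketch only assumes that a replacement entry is built with a clone of the old entry's `Arc<AtomicU64>`, so both entries update the same counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical cache entry. `version` stands in for the compiled program.
struct Entry {
    version: u32,
    tx_usage_counter: Arc<AtomicU64>,
}

impl Entry {
    // Build a replacement entry that shares the old entry's counter
    // instead of starting a fresh one.
    fn replace_keeping_counter(old: &Entry, version: u32) -> Entry {
        Entry {
            version,
            tx_usage_counter: Arc::clone(&old.tx_usage_counter),
        }
    }
}
```

With a bare `AtomicU64` field, an increment on a stale locally held entry would be lost once the global cache swapped in a new entry. With a shared `Arc<AtomicU64>`, the increment lands on the counter that the new entry also reads.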

codecov-commenter · 2025-04-30T09:40:27Z

Codecov Report

❌ Patch coverage is 94.61538% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (74de4ec) to head (e271445).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #6036     +/-   ##
=========================================
- Coverage    83.2%    83.2%   -0.1%     
=========================================
  Files         796      796             
  Lines      361926   361890     -36     
=========================================
- Hits       301468   301345    -123     
- Misses      60458    60545     +87

svm/src/transaction_processor.rs

2501babe · 2025-06-18T17:30:36Z

moving this out of draft because @alessandrod demonstrated it does what it is supposed to performance-wise. i annotated a few places where i am unclear on program cache semantics. this can only go in if people who understand the program cache better think this is all safe to do, and i'm happy to rework around that

mergify · 2025-08-04T10:48:45Z

The Firedancer team maintains a line-for-line reimplementation of the
native programs, and until native programs are moved to BPF, those
implementations must exactly match their Agave counterparts.
If this PR represents a change to a native program implementation (not
tests), please include a reviewer from the Firedancer team. And please
keep refactors to a minimum.

Lichtso · 2025-08-04T12:43:29Z

program-runtime/src/loaded_programs.rs

    pub effective_slot: Slot,
    /// How often this entry was used by a transaction
-    pub tx_usage_counter: AtomicU64,
+    pub tx_usage_counter: Arc<AtomicU64>,


This could be one Arc for both counters, or we could remove the ix_usage_counter.

after searching through the codebase, it looks like nothing even reads it; it is only ever assigned to. i'll remove it

2501babe · 2025-08-04T20:00:22Z

had to squash-rebase due to conflicts in the import list, but i think this is ready for review again!

Lichtso · 2025-08-05T08:08:19Z

svm/src/transaction_processor.rs

                    )
                    .expect("called load_program_with_pubkey() with nonexistent account");
-                    program.tx_usage_counter.store(count, Ordering::Relaxed);
+                    program.tx_usage_counter.store(0, Ordering::Relaxed);


~~Why reset the statistics here?~~

in master, usage counts of global cache misses are set to 2x what they should be. when a miss occurs, the executing thread creates an entry with the usage count from filter. but then the cooperative loading task is closed out by a new hit in extract, which means the usage count is added to the new entry a second time. we initialize new entries at 0 to fix this

b656970 actually this line is totally pointless, they all get initialized at 0 anyway
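The double counting described in this thread comes down to simple arithmetic, modeled below. `usage_after_miss` is a hypothetical helper, not a function in the codebase: it assumes the entry is created with some initial count and that the hit which closes the cooperative load then adds the filter count once.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Model one global-cache miss: the new entry starts at `initial`, then the
// closing hit adds the usage count computed by filter.
fn usage_after_miss(initial: u64, count_from_filter: u64) -> u64 {
    let counter = AtomicU64::new(initial);
    counter.fetch_add(count_from_filter, Ordering::Relaxed);
    counter.load(Ordering::Relaxed)
}
```

Under this model, seeding the entry with the filter count (the behavior attributed to master) gives `usage_after_miss(3, 3) == 6`, while starting at 0 gives `usage_after_miss(0, 3) == 3`.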

Lichtso · 2025-08-05T08:10:52Z

svm/src/transaction_processor.rs

                &mut execute_timings,
                config.check_program_modification_slot,
                config.limit_to_load_programs,
+                false, // increment_usage_counter


Why avoid counting the usage of built-ins?

they aren't being used here, we just load them into the cache. the usage counter is incremented in filter if a transaction uses one, just like any other program
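A rough sketch of that distinction, assuming a replenish-like helper that takes an `increment_usage_counter` flag as in the diff above. Everything else (`Key`, `Entry`, the map-based caches, and the `replenish` signature) is hypothetical and much simpler than the real function.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for Pubkey.
type Key = [u8; 32];

// Hypothetical cache entry; only the usage counter matters here.
struct Entry {
    tx_usage_counter: Arc<AtomicU64>,
}

// Copy entries for `keys` from the global map into the local map. Usage is
// only counted when the caller says the load corresponds to real use.
fn replenish(
    global: &HashMap<Key, Arc<Entry>>,
    local: &mut HashMap<Key, Arc<Entry>>,
    keys: &[Key],
    increment_usage_counter: bool,
) {
    for key in keys {
        if let Some(entry) = global.get(key) {
            if increment_usage_counter {
                entry.tx_usage_counter.fetch_add(1, Ordering::Relaxed);
            }
            local.insert(*key, Arc::clone(entry));
        }
    }
}
```

Preloading builtins with the flag set to `false` makes them available in the local cache without recording a use; a later transaction that references one would bump the counter through the filter step instead.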

Lichtso · 2025-08-05T08:13:59Z

svm/src/transaction_processor.rs

+
+        // Create the batch-local program cache.
+        let mut program_cache_for_tx_batch = {
+            let program_cache = self.program_cache.read().unwrap();


Maybe rename it global_program_cache in the local scope and in the struct?

variables: a2f59ad
struct: 115cb82

i left variables outside svm/ as-is since they have no concept of a non-global program cache

Lichtso · 2025-08-06T11:59:23Z

Also, I think we can drop TransactionProcessingCallback::account_matches_owners() from the interface / trait.
We can do that in a separate PR.

2501babe mentioned this pull request Apr 30, 2025

svm: optimize local program cache creation #4697

Closed

2501babe self-assigned this Apr 30, 2025

2501babe mentioned this pull request May 6, 2025

svm: AccountLoader::load_account() refactor #4680

Merged

2501babe force-pushed the 20250430_pcache_ultimate_fix branch 2 times, most recently from 9e9d722 to 640be82 Compare May 8, 2025 21:45

2501babe force-pushed the 20250430_pcache_ultimate_fix branch 4 times, most recently from 06ba1a0 to 86a8a1c Compare June 5, 2025 14:15

2501babe force-pushed the 20250430_pcache_ultimate_fix branch from 86a8a1c to 45b526f Compare June 18, 2025 15:35
