Skip to content

svm: optimize local program cache creation#6036

Merged
2501babe merged 5 commits intoanza-xyz:masterfrom
2501babe:20250430_pcache_ultimate_fix
Aug 7, 2025
Merged

svm: optimize local program cache creation#6036
2501babe merged 5 commits intoanza-xyz:masterfrom
2501babe:20250430_pcache_ultimate_fix

Conversation

@2501babe
Copy link
Member

@2501babe 2501babe commented Apr 30, 2025

Problem

presently, transaction processing constructs a per-batch local program cache before it does anything else. this is done in two steps:

  • the "filter" step (filter_executable_program_accounts()): get all account keys on the batch, make a list of those owned by loaders, include all builtins on this list unconditionally
  • the "replenish" step (replenish_program_cache()): build a ProgramCacheForTxBatch from this list of required programs, which acts as a view into already compiled bpf in the global cache, and triggers compilation of unseen/evicted programs

the filter step is notoriously incredibly slow because the account_matches_owners() call it uses is slow, and it must be called on every single account in the batch

Summary of Changes

the first version of this pr moved from a per-batch local cache to a set of per-transaction local caches, each populated after individual transaction loading. this solved the filter step but entailed additional locking on the global cache which was deemed undesirable

the new version of this pr creates an empty per-batch local cache and replenishes it with all builtins. then, after each transaction is loaded:

  • the "filter" step: get all account keys on the transaction. if a cache entry is already present in the local cache, increment the usage counter. otherwise, if the account is owned by a loader but not in the cache, add it to a list for replenish
  • the "replenish" step: add the new programs to the existing ProgramCacheForTxBatch

this has four major performance impacts:

  • we trivialize filter down to some hashmap lookups, when before it had to go to accounts-db for every account
  • replenish no longer has to go to accounts-db for anything except unloaded loaderv3 programdata, and once simd186 is active it will not need to go to accounts-db at all
  • the program cache is populated after transaction account loading and thus is capped by loaded account data size limits
  • we do not populate the cache at all for fee-only or discarded transactions

we also include a handful of improvements and bugfixes incidental to this work:

  • filter goes from returning HashMap<Pubkey, (&'a Pubkey, u64)> to HashSet<Pubkey>. we no longer need program owners, as these are useless since the activation of feature_set::disable_account_loader_special_case. we also no longer need usage counts: because we replenish one transaction at a time, the usage can only ever be 1!
  • in master, builtins are inserted to the replenish list after filter. this means all builtin usage counters were clamped to 0. we now record builtin usage normally
  • replenish works by locking the global cache, loading global cache hits into the local cache, selecting one global miss to cooperatively load across threads, and unlocking. it does this in a loop until all missing programs are loaded. usage counts are stored as AtomicU64. program cache entries are wrapped by Arc<_>. this means there are potential race conditions where updates to usage counters on a locally held program cache entry no longer propagate back to the global cache because the entry has been changed. we wrap usage counts in their own Arc<_>s to fix this
  • related to the above, when we hit a tombstone in the global cache, it tends to not return its own entry, but a dummy entry, and any usage counts do not propagate. we copy over the Arc<_>s to fix this
  • in master, usage counts of global cache misses are set to 2x what they should be. when a miss occurs, the executing thread creates an entry with the usage count from filter. but then the cooperative loading task is closed out by a new hit in extract, which means the usage count is added to the new entry a second time. we initialize new entries at 0 to fix this
  • we remove ix_usage_counter. only tx_usage_counter is necessary for cache eviction

@codecov-commenter
Copy link

codecov-commenter commented Apr 30, 2025

Codecov Report

❌ Patch coverage is 94.61538% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (74de4ec) to head (e271445).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #6036     +/-   ##
=========================================
- Coverage    83.2%    83.2%   -0.1%     
=========================================
  Files         796      796             
  Lines      361926   361890     -36     
=========================================
- Hits       301468   301345    -123     
- Misses      60458    60545     +87     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@2501babe 2501babe force-pushed the 20250430_pcache_ultimate_fix branch 2 times, most recently from 9e9d722 to 640be82 Compare May 8, 2025 21:45
@2501babe 2501babe force-pushed the 20250430_pcache_ultimate_fix branch 4 times, most recently from 06ba1a0 to 86a8a1c Compare June 5, 2025 14:15
@2501babe 2501babe force-pushed the 20250430_pcache_ultimate_fix branch from 86a8a1c to 45b526f Compare June 18, 2025 15:35
@2501babe
Copy link
Member Author

moving this out of draft because @alessandrod demonstrated it does what it is supposed to performance-wise. i annotated a few places where i am unclear on program cache semantics. this can only go in if people who understand the program cache better think this is all safe to do, and im happy to rework around that

@2501babe 2501babe marked this pull request as ready for review June 18, 2025 17:30
@2501babe 2501babe requested a review from a team as a code owner June 18, 2025 17:30
@2501babe 2501babe requested review from Lichtso and alessandrod June 18, 2025 17:31
@2501babe 2501babe marked this pull request as draft July 2, 2025 16:56
@2501babe 2501babe force-pushed the 20250430_pcache_ultimate_fix branch 3 times, most recently from 8f78140 to a6a0ec7 Compare July 11, 2025 07:56
@2501babe 2501babe force-pushed the 20250430_pcache_ultimate_fix branch from c8fd8ff to f53573f Compare July 21, 2025 19:21
@2501babe 2501babe force-pushed the 20250430_pcache_ultimate_fix branch 2 times, most recently from 36cb5ed to 1864d2a Compare August 4, 2025 10:48
@mergify
Copy link

mergify bot commented Aug 4, 2025

The Firedancer team maintains a line-for-line reimplementation of the
native programs, and until native programs are moved to BPF, those
implementations must exactly match their Agave counterparts.
If this PR represents a change to a native program implementation (not
tests), please include a reviewer from the Firedancer team. And please
keep refactors to a minimum.

pub effective_slot: Slot,
/// How often this entry was used by a transaction
pub tx_usage_counter: AtomicU64,
pub tx_usage_counter: Arc<AtomicU64>,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be one Arc for both counters, or we could remove the ix_usage_counter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after searching through the codebase, it looks like nothing even uses it, it is only ever assigned to. ill remove it

@2501babe 2501babe force-pushed the 20250430_pcache_ultimate_fix branch from 5313198 to f63e37a Compare August 4, 2025 15:36
@2501babe 2501babe marked this pull request as ready for review August 4, 2025 19:59
@2501babe
Copy link
Member Author

2501babe commented Aug 4, 2025

had to squash-rebase due to conflicts in the import list, but i think this is ready for review again!

)
.expect("called load_program_with_pubkey() with nonexistent account");
program.tx_usage_counter.store(count, Ordering::Relaxed);
program.tx_usage_counter.store(0, Ordering::Relaxed);
Copy link

@Lichtso Lichtso Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why reset the statistics here?

in master, usage counts of global cache misses are set to 2x what they should be. when a miss occurs, the executing thread creates an entry with the usage count from filter. but then the cooperative loading task is closed out by a new hit in extract, which means the usage count is added to the new entry a second time. we initialize new entries at 0 to fix this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b656970 actually this line is totally pointless, they all get initialized at 0 anyway

&mut execute_timings,
config.check_program_modification_slot,
config.limit_to_load_programs,
false, // increment_usage_counter
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why avoid counting the usage of built-ins?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they arent being used here, we just load them into the cache. the usage counter is incremented in filter if a transaction uses one, just like any other program


// Create the batch-local program cache.
let mut program_cache_for_tx_batch = {
let program_cache = self.program_cache.read().unwrap();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename it global_program_cache in the local scope and in the struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variables: a2f59ad
struct: 115cb82

i left variables outside svm/ as-is since they have no concept of a non-global program cache

Lichtso
Lichtso previously approved these changes Aug 5, 2025
@Lichtso
Copy link

Lichtso commented Aug 6, 2025

Also, I think we can drop TransactionProcessingCallback::account_matches_owners() from the interface / trait.
We can do that in a separate PR.

@Lichtso Lichtso added the automerge automerge Merge this Pull Request automatically once CI passes label Aug 7, 2025
@2501babe 2501babe removed the automerge automerge Merge this Pull Request automatically once CI passes label Aug 7, 2025
@2501babe 2501babe merged commit 5bf937d into anza-xyz:master Aug 7, 2025
42 checks passed
@2501babe 2501babe deleted the 20250430_pcache_ultimate_fix branch August 7, 2025 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants