Skip to content

accounts-db: relax intrabatch account locks#4253

Merged
2501babe merged 10 commits intoanza-xyz:masterfrom
2501babe:20250103_simd83locking
Jun 1, 2025
Merged

accounts-db: relax intrabatch account locks#4253
2501babe merged 10 commits intoanza-xyz:masterfrom
2501babe:20250103_simd83locking

Conversation

@2501babe
Copy link
Member

@2501babe 2501babe commented Jan 3, 2025

Problem

simd83 proposes removing the constraint that transactions in the same entry cannot have read/write or write/write contention on any account. a previous pr modified svm to be able to carry over state changes between such transactions while processing a transaction batch. this pr modifies consensus to remove the account locking rules that prevent such batches from being created or replayed

Summary of Changes

add a new function to AccountLocks, try_lock_transaction_batch, which only checks for locking conflicts with other batches, allowing any account overlap within the batch itself. modify Accounts and Bank to use it instead when the feature gate relax_intrabatch_account_locks is activated. also modify prepare_sanitized_batch_with_results to deduplicate transactions within a batch by signature to prevent replay attacks, such that two instances of the same transaction cause the first to lock out the second, in a similar manner to the non-simd83 behavior for this special case

since transaction results are used more extensively than previous, some functions with *_wiith_results variants have bene collapsed into wrappers around that variant. we also refactor several things to favor iterators over vectors, to avoid places where iters are collected and transformed back into iters

important code changes are confined to accounts-db/src/accounts.rs, accounts-db/src/account_locks.rs, and runtime/src/bank.rs. changes in core, ledger, and runtime transaction batch only affect tests. overall the large majority of changes are fixes or improvements to tests

Feature Gate Issue: https://github.com/anza-xyz/feature-gate-tracker/issues/76

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from 3e016b3 to f748a5f Compare January 3, 2025 10:46
@2501babe 2501babe self-assigned this Jan 3, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 4 times, most recently from 2fb8527 to 11d1dcb Compare January 7, 2025 11:23
@2501babe 2501babe changed the title accounts-db: only lock accounts across threads accounts-db: disable intrabatch account locks Jan 7, 2025
// HANA TODO the vec allocation here is unfortunate but hard to avoid
// we cannot do this in one closure because of borrow rules
// play around with alternate strategies, according to benches this may be up to
// 50% slower for small batches and few locks, but for large batches and many locks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bench using jemalloc? i'd think it would do a reasonable job of just keeping the mem in thread-local cache for re-use

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from 95b17b3 to d1ec289 Compare January 13, 2025 12:13
@2501babe 2501babe changed the title accounts-db: disable intrabatch account locks accounts-db: relax intrabatch account locks Jan 13, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from a11ef4d to c234cc6 Compare January 17, 2025 14:23
Comment on lines 13 to 16
#[derive(Debug, Default)]
pub struct AccountLocks {
write_locks: AHashSet<Pubkey>,
write_locks: AHashMap<Pubkey, u64>,
readonly_locks: AHashMap<Pubkey, u64>,
}
Copy link
Member Author

@2501babe 2501babe Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the read- and write-lock hashmaps have the same type now, all the functions that change them are basically the same. we could use an enum or hashmap reference to discriminate and delete half of the functions, but i left it like this for your review before butchering it

Comment on lines -565 to -577
relax_intrabatch_account_locks: bool,
) -> Vec<Result<()>> {
// Validate the account locks, then get iterator if successful validation.
let tx_account_locks_results: Vec<Result<_>> = txs
Copy link
Member Author

@2501babe 2501babe Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accounts::lock_accounts() could be reimpled as a wrapper on lock_accounts_with_results() or possibly deleted, it isnt really required anymore since all batch-building needs to have results. but we could leave it as-is, or could do some kind of refactor with TransactionAccountLocksIterator

Comment on lines +596 to +610
fn lock_accounts_inner(
fn lock_accounts_inner<'a>(
&self,
tx_account_locks_results: Vec<Result<TransactionAccountLocksIterator<impl SVMMessage>>>,
tx_account_locks_results: impl Iterator<
Item = Result<TransactionAccountLocksIterator<'a, impl SVMMessage + 'a>>,
>,
relax_intrabatch_account_locks: bool,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general where possible i changed vecs to iters, we eliminate several uses of collect() and just pass around the closure chains. this makes the type signatures look kind of stupid tho and there are possibly things we could refactor to be better (like maybe combining transactions with the transaction results instead of taking for granted they always have the same length). im undecided about style tho

Comment on lines +603 to +628
if relax_intrabatch_account_locks {
let validated_batch_keys = tx_account_locks_results.map(|tx_account_locks_result| {
tx_account_locks_result
.map(|tx_account_locks| tx_account_locks.accounts_with_is_writable())
});

account_locks.try_lock_transaction_batch(validated_batch_keys)
} else {
tx_account_locks_results
.map(|tx_account_locks_result| match tx_account_locks_result {
Ok(tx_account_locks) => account_locks
.try_lock_accounts(tx_account_locks.accounts_with_is_writable()),
Err(err) => Err(err),
})
.collect()
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main branch that the feature controls. we pass it in as a param so Accounts doesnt need FeatureSet, only Bank. the only other place we use the feature is to enable signature-based transaction deduplication

Comment on lines +3183 to +3247
// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect
let mut batch_signatures = AHashSet::with_capacity(transactions.len());
let transaction_results =
transaction_results
.enumerate()
.map(|(i, tx_result)| match tx_result {
Ok(())
if relax_intrabatch_account_locks
&& !batch_signatures.insert(transactions[i].signature()) =>
{
Err(TransactionError::AccountInUse)
}
Ok(()) => Ok(()),
Err(e) => Err(e),
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the dedupe step mentioned in a comment above. we could use a double for loop instead of a hashset but this seemed much more straightforward since the inner loop would have to abort based on the outer loop index

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess it's almost certainly faster to just do a brute-force since our batches are small (replay will be size 1 for unified-scheduler); but might get complicated with the current iterator interface. Fine to leave it as is.

@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from d378d3a to c6d7105 Compare January 24, 2025 09:04
@2501babe
Copy link
Member Author

2501babe commented Jan 24, 2025

recent sample of bench comparisons, master vs this branch with simd83 enabled

bench_lock_accounts/batch_size_1_locks_count_2
                        time:   [188.03 µs 188.11 µs 188.19 µs]
                        thrpt:  [5.4414 Melem/s 5.4438 Melem/s 5.4461 Melem/s]
bench_lock_accounts/batch_size_1_locks_count_2_old
                        time:   [194.10 µs 194.20 µs 194.32 µs]
                        thrpt:  [5.2696 Melem/s 5.2729 Melem/s 5.2757 Melem/s]

bench_lock_accounts/batch_size_32_locks_count_64
                        time:   [5.3985 ms 5.6560 ms 5.9096 ms]
                        thrpt:  [173.28 Kelem/s 181.05 Kelem/s 189.68 Kelem/s]
bench_lock_accounts/batch_size_32_locks_count_64_simd83
                        time:   [3.0115 ms 3.0123 ms 3.0132 ms]
                        thrpt:  [339.84 Kelem/s 339.94 Kelem/s 340.03 Kelem/s]

bench_lock_accounts/batch_size_64_locks_count_64_read_conflicts
                        time:   [2.8057 ms 2.8066 ms 2.8075 ms]
                        thrpt:  [364.74 Kelem/s 364.86 Kelem/s 364.97 Kelem/s]
bench_lock_accounts/batch_size_64_locks_count_64_read_conflicts_simd83
                        time:   [2.8175 ms 2.8181 ms 2.8188 ms]
                        thrpt:  [363.28 Kelem/s 363.36 Kelem/s 363.44 Kelem/s]

in general we perform slightly worse for tiny batches and as well or better for large batches. note these benches call code in Accounts and AccountLocks but not Bank

@2501babe 2501babe marked this pull request as ready for review January 24, 2025 12:19
@2501babe 2501babe requested a review from apfitzge January 24, 2025 12:19
Comment on lines +3179 to +3180
// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we must dedup because we check for already_processed in a batch, then process, then add to the status_cache.

Is that correct summary of why we need to do this now?

Copy link
Member Author

@2501babe 2501babe Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not fully confident in my understanding of the status cache but i believe the way it works is you execute the batch and the signatures of processed (non-dropped) transactions go in the status cache only after theyve all been run. there are no checks in svm (nor should there be) for duplicate transactions within a batch

if replay is going to single-batch everything i guess it would enforce this as a side effect but it seemed good to do it here, since this code already did enforce this constraint (a malicious block that put the same transaction in one entry multiple times would fail locking in replay, without anything involving status cache, because the transactions would take the same write lock on the fee-payer)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we should enforce it for sure


// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect
let mut batch_signatures = AHashSet::with_capacity(transactions.len());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use message_hash here instead of signature.

status-cache uses both, but signature is only necessary for RPC operation for fast signature lookup.
iirc reason to use message hash is because of signature malleability.

Copy link
Member Author

@2501babe 2501babe Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i used it because message_hash isnt provided by SVMMessage or SVMTransaction. would you like me to add it to SVMTransaction? its available on SanitizedTransaction but providing it from SanitizedMessage would require us to add it to LegacyMessage and v0::LoadedMessage

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably just change the trait bound to TransactionWithMeta on this function, that trait should provide it, and I'm fairly certain the things we actually call this with impl it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to squash the past because rebasing was getting ugly but this is bae70c0

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the function directly as part of TransactionWithMeta, just call message_hash() on the TransactionWithMeta tx.

as_sanitized_transaction (unless it IS a sanitized transaction) will create a sanitized transaction. and possibly do 100s of allocations.
That fn is only there because of legacy interfaces - we shouldn't use it anywhere it's not strictly necessary (should be only geyser rn)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, i see it now. i only looked at TransactionWithMeta rather than StaticMeta

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment on that fn. I'll update it to be more clear that basically no one should be using that, except for the couple places we call into geyser.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made comments on that trait better here - #4827

@joncinque
Copy link

This PR contains changes to the solana sdk, which will be moved to a new repo within the next week, when v2.2 is branched from master.

Please merge or close this PR as soon as possible, or re-create the sdk changes when the new repository is ready at https://github.com/anza-xyz/solana-sdk

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from d2cf375 to bae70c0 Compare February 5, 2025 15:03
@2501babe
Copy link
Member Author

2501babe commented Feb 5, 2025

@joncinque this is a core runtime pr, the only sdk change is adding the feature gate. i assume when the new repo is created the procedure is going to be like:

  • pr the new feature gate to sdk which will be accepted and merged without needing to see any outside code
  • keep this pr, rebase on new sdk dependency and remove changes to sdk/feature-set/ which will no longer exist
  • continue with code review as normal

right?

@2501babe 2501babe requested a review from apfitzge May 12, 2025 11:39
@2501babe
Copy link
Member Author

reopened a re-requested reviews now that rebatching removal (#4833) and transaction balance collection (#5588) have both landed. this required some rebasing given the age but the only change is i rerolled the feature gate key, because it has migrated repos a couple times and it would be safer to not have to worry about the complications of that

Ok(()) => validate_account_locks(tx.account_keys(), tx_account_lock_limit)
.map(|_| TransactionAccountLocksIterator::new(tx)),
Err(err) => Err(err),
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iterator will call into validate_account_locks on .next().

The iterator is passed into lock_accounts_inner and used inside the account locks - we should undo this change. I think it's better to have a vector collect here on validated account locks and pass that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually still pass the same iterator I think. We just shouldn't redo the validate_account_locks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i believe d0cabcc accomplishes the goal here. we keep the same outer iterator but collect the inner transaction results and writable bools before passing off to try_lock_accounts() or try_lock_transaction_batch()

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That changeset actually looks like it made things even worse because now both the collect's and the expensive lock validation are all done while the accounts lock is acquired.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, i understand your meaning now. i misinterpreted the talk about reducing work done inside "account locks" to mean minimizing time inside AccountLocks::try_lock_accounts() / AccountLocks::try_lock_transaction_batch(), but now i see you mean minimizing time after let account_locks = &mut self.account_locks.lock().unwrap();. will refactor again tomorrow

Copy link
Member Author

@2501babe 2501babe May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9e73e22 ok all work is collected before the mutex. i deleted the old lock_accounts() since nothing used it anymore except tests (since we need to pass results for message hash dedupe) and consolidated all the logic in one statement in a renamed lock_accounts_with_results(), which hopefully makes it easier to follow

edit: eliminated a level of nesting

Comment on lines +52 to +53
.collect();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.collect() into immediate into_iter, this seems unnecessary. Am I missing something?

Copy link
Member Author

@2501babe 2501babe May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed to prevent a double-borrow of self in the two maps (and two are needed because every can_lock_accounts() needs to be called before any lock_accounts())

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get by with only one collect for building the result vec. The intermediate vec for "available keys" isn't really needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do need it, unless theres something im missing in what you mean. if we were to write:

        validated_batch_keys
            .map(|validated_keys| match validated_keys {
                Ok(ref keys) => match self.can_lock_accounts(keys.clone()) {
                    Ok(_) => validated_keys,
                    Err(e) => Err(e),
                },
                Err(e) => Err(e),
            })
            .map(|available_keys| available_keys.map(|keys| self.lock_accounts(keys)))
            .collect()

the borrow checker fails us because the second map borrows &mut self while the first map still has &self. and this is actually good: if the code compiled, it would be wrong. every can_lock_accounts() call needs to complete before any lock_accounts() call, so if this was evaluated lazily it would essentially behave like the pre-simd83 version

if the cost of the allocation is a concern we could pass in a mut Vec<_> and .iter_mut().foreach() editing in-place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

37f5095 went ahead and did that, i think this makes the most sense

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had something different in mind but I like this better, nice!

.feature_set
.is_active(&feature_set::relax_intrabatch_account_locks::id());

// with simd83 enabled, we must deduplicate transactions by message hash
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 this is going to make our lives sooo much easier

Ok(()) => validate_account_locks(tx.account_keys(), tx_account_lock_limit)
.map(|_| TransactionAccountLocksIterator::new(tx)),
Err(err) => Err(err),
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That changeset actually looks like it made things even worse because now both the collect's and the expensive lock validation are all done while the accounts lock is acquired.

@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from 8f84ec7 to defe0eb Compare May 24, 2025 06:20
@2501babe 2501babe force-pushed the 20250103_simd83locking branch from defe0eb to 9e73e22 Compare May 24, 2025 16:30
}
}
/// NOTE this is the pre-SIMD83 logic and can be removed once SIMD83 is active.
pub fn try_lock_accounts(&mut self, keys: &[(&Pubkey, bool)]) -> TransactionResult<()> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a slice now? Seems like iterator was a better interface

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

169f1ea (same as below)

Comment on lines +52 to +53
.collect();

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get by with only one collect for building the result vec. The intermediate vec for "available keys" isn't really needed

.map(|_| {
TransactionAccountLocksIterator::new(tx)
.accounts_with_is_writable()
.collect::<Vec<_>>()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem necessary to collect here. Why not keep the same approach as before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apfitzge
apfitzge previously approved these changes May 28, 2025
} else {
assert_eq!(results, vec![Ok(()), Err(TransactionError::AccountInUse)]);
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to do in a separate PR. IMO we should add in an additional test that checks the "partial" w/ intrabatch conflicts.

i.e.

  • outstanding lock on A
  • batch containing tx1, tx2:
    • tx1 locks (A, B)
    • tx2 locks (B)

I fully expect this works correctly as is, but just want some additional testing on it since the SIMD83 path relies on this matching our expectations.

if batch_message_hashes.insert(transactions[i].message_hash()) {
Ok(())
} else {
Err(TransactionError::AccountInUse)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand why you used AccountInUse here but it feels confusing in the context that it's totally fine to re-use accounts between transactions intra batch. Seems a little more accurate to use already processed instead:

Suggested change
Err(TransactionError::AccountInUse)
Err(TransactionError::AlreadyProcessed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed this error as suggested and it turned out to have a runtime effect, as you can see from the test changes i made after: e2a0190. basically AccountInUse is retryable but AlreadyProcessed is not. i believe this is an unproblematic bug fix but please let me know if you think there could be any downstream ramifications. this is consensus-safe however because this match arm only triggers on the feature gate

depending on how many duplicate transactions make it through the scheduler, this could be a trivial or nontrivial optimization. im not familiar with the retry flow yet but it seems like because the current code in master dedupes using lock conflicts:

  • a batch with duplicates only weeds them out after the n^2 lock validation and after going partway through lock-taking
  • these duplicates then go back in the transaction queue to be retried, which means they will be validated again, take all locks, and then only get dropped when check_transactions() goes to the status cache

whereas with an AlreadyProcessed error they will get weeded out before lock validation and then not be retried. it doesnt help unless theyre in the same batch, which maybe is a rare or nonexistent case, but it seems like an unexpectedly good change!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, it seems like a good thing to drop them sooner rather than later. I'm not aware of any other downstream effects.

Comment on lines +46 to +53
let result = std::mem::replace(validated_keys, Err(TransactionError::AccountInUse));
*validated_keys = match result {
Ok(ref keys) => match self.can_lock_accounts(keys.clone()) {
Ok(_) => result,
Err(e) => Err(e),
},
Err(e) => Err(e),
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is a little hard to follow with the use of mem::replace, could we simplify to something like this?

Suggested change
let result = std::mem::replace(validated_keys, Err(TransactionError::AccountInUse));
*validated_keys = match result {
Ok(ref keys) => match self.can_lock_accounts(keys.clone()) {
Ok(_) => result,
Err(e) => Err(e),
},
Err(e) => Err(e),
};
if let Ok(keys) = validated_keys {
if let Err(e) = self.can_lock_accounts(keys.clone()) {
*validated_keys = Err(e);
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +3230 to +3231
// with simd83 enabled, we must deduplicate transactions by message hash
// previously, conflicting account locks would dedupe transactions as a side effect
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the same check on replay, this is only checking on the leader. Maybe we can move this into the TransactionBatch so that it's impossible to create a batch with duplicated transactions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think im going to move it to Bank::try_lock_accounts_with_results()... it cant go in TransactionBatch::new() as-written because the batch is constructed after locks are taken, and if you change any Ok to Err at that stage youll never fully unlock

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, that sounds better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch, thank you for this one 83bf900

Comment on lines +52 to +53
.collect();

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had something different in mind but I like this better, nice!


pub mod relax_intrabatch_account_locks {
solana_pubkey::declare_id!("EbAhnReKK8Sf88CvAfAXbgKji8DV48rsp4q2sgHqgWef");
solana_pubkey::declare_id!("B31HjiHtU5KCvzeGW2pa9DLDvrYazWdF6qFeKE5UQoqG");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use this key ENTRYnPAoT5Swwx73YDGzMp3XnNH1kxacyvLosRHza1i from the issue here: #1029

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1108 to +1114
let conflicting_transaction = sanitize_transactions(vec![system_transaction::transfer(
&Keypair::new(),
&pubkey,
1,
genesis_config.hash(),
)]);
bank.try_lock_accounts(&conflicting_transaction);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is supposed to be testing. Shouldn't we be inspecting the result of try_lock_accounts? Feels like that method should have a #[must_use] annotation too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isnt testing try_lock_accounts(), its just setup code so the test batch sees a lock conflict on the second transaction regardless of simd83. my read of this test was "a conflict happens, we dont care why, is the cost tracker right?" in contrast to the test_bank_process_and_record_transactions_account_in_use() test below which is testing the locking itself

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so before simd 83 we would always be failing on intra batch account lock conflict. But after simd 83 it wouldn't fail so you changed the test so that it would fail effectively on inter batch account lock conflict. The comments above should be updated to reflect this. And I think the conflicting transaction creation and locking should happen before the creation of the test transactions.

),
]);
assert_eq!(
transactions[0].signature() == transactions[1].signature(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically fine but can we check for message hash equality instead? Whenever testing or handling duplicate transactions we should make it clear that transactions are identified by message hash not signature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, this line and the comment below got left behind because originally we were using the signatures

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let consumer = Consumer::new(committer, recorder, QosService::new(1), None);

// with simd83 and no duplicate, we take a cross-batch lock on an account to create a conflict
// with a duplicate transaction and simd83 it comes from signature equality in the batch
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// with a duplicate transaction and simd83 it comes from signature equality in the batch
// with a duplicate transaction and simd83 it comes from message hash equality in the batch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1,
genesis_config.hash(),
)]);
bank.try_lock_accounts(&conflicting_transaction);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be checking the results right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above this is setup code and the real tests are on the return of consumer.process_and_record_transactions(&bank, &transactions);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, comments make more sense to me now, thanks!

Comment on lines +1295 to +1300
// with simd3, duplicate transactions are not retryable
if relax_intrabatch_account_locks && use_duplicate_transaction {
assert_eq!(retryable_transaction_indexes, Vec::<usize>::new());
} else {
assert_eq!(retryable_transaction_indexes, vec![1]);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. I would expect retryable indexes to be empty for relax_intrabatch_account_locks && !use_duplicate_transaction too, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the test that sets up a lock conflict in that case:

        // with simd83 and no duplicate, we take a cross-batch lock on an account to create a conflict
        // with a duplicate transaction and simd83 it comes from message hash equality in the batch
        // without simd83 the conflict comes from locks in batch
        if relax_intrabatch_account_locks && !use_duplicate_transaction {
            let conflicting_transaction =
                sanitize_transactions(vec![system_transaction::transfer(
                    &Keypair::new(),
                    &pubkey1,
                    1,      
                    genesis_config.hash(),
                )]);
            bank.try_lock_accounts(&conflicting_transaction);
        }

for context the orginal version of this test had one case: no simd83 (obviously) and the same transaction twice. i expanded it into four cases to cover all codepaths

bank.get_balance(&keypair2.pubkey()),
];

assert!(result.is_err());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we changed the error for message hash duplication to "already processed" we would be able to differentiate the error type here before/after the feature activation. Seems kinda useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm cool with this going in as is but think that one test could be cleaned up a bit and I think we should assert that try-lock in tests is actually locking successfully. That can all be done in a follow up since the tests look correct as is.

Comment on lines +1108 to +1114
let conflicting_transaction = sanitize_transactions(vec![system_transaction::transfer(
&Keypair::new(),
&pubkey,
1,
genesis_config.hash(),
)]);
bank.try_lock_accounts(&conflicting_transaction);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so before simd 83 we would always be failing on intra batch account lock conflict. But after simd 83 it wouldn't fail so you changed the test so that it would fail effectively on inter batch account lock conflict. The comments above should be updated to reflect this. And I think the conflicting transaction creation and locking should happen before the creation of the test transactions.

1,
genesis_config.hash(),
)]);
bank.try_lock_accounts(&conflicting_transaction);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, comments make more sense to me now, thanks!

@2501babe 2501babe merged commit 636f6c1 into anza-xyz:master Jun 1, 2025
58 checks passed
@2501babe 2501babe deleted the 20250103_simd83locking branch June 1, 2025 15:23
mircea-c pushed a commit to mircea-c/agave that referenced this pull request Jun 12, 2025
change locking rules in line with SIMD-0083 such that read/write and write/write overlap on accounts is allowed intra-batch, and thus intra-entry
mircea-c added a commit to mircea-c/agave that referenced this pull request Jun 12, 2025
change locking rules in line with SIMD-0083 such that read/write and write/write overlap on accounts is allowed intra-batch, and thus intra-entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants