A lot of channel.send()/channel.recv() calls become contended when the number of txs is high (conceptually this is the same problem as poh.record). This happens with the prio cache, with accountsdb.write_accounts_to, etc.
This is also easy to fix: increase batching and sprinkle in some spin looping. The point is to stop threads from going through "I go to sleep; [other thread] wakes up the sleeping thread" in a tight loop, which causes a storm of sleep/wake-up syscalls.
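To make the batching and spin-looping idea concrete, here is a minimal sketch using only `std::sync::mpsc`. It is not the code from the prio cache or accountsdb paths; `BATCH_SIZE`, `SPIN_LIMIT`, `send_batched`, `recv_spin_then_block` and `run_batched_demo` are hypothetical names and values chosen for illustration, and the right numbers would have to come from profiling.

```rust
use std::hint;
use std::sync::mpsc::{self, Receiver, Sender, TryRecvError};
use std::thread;

/// Hypothetical batch size (assumption; tune by measurement).
const BATCH_SIZE: usize = 64;
/// Hypothetical spin budget before falling back to a blocking recv.
const SPIN_LIMIT: u32 = 1_000;

/// Send `items` in chunks, so each channel operation (and each potential
/// wake-up of the receiver) is amortized over up to BATCH_SIZE items.
fn send_batched(tx: &Sender<Vec<u64>>, items: &[u64]) {
    for chunk in items.chunks(BATCH_SIZE) {
        // A send error only means the receiver is gone; ignored in this sketch.
        let _ = tx.send(chunk.to_vec());
    }
}

/// Poll with try_recv for a bounded number of iterations, then block.
/// Returns None once the channel is empty and all senders are dropped.
fn recv_spin_then_block(rx: &Receiver<Vec<u64>>) -> Option<Vec<u64>> {
    for _ in 0..SPIN_LIMIT {
        match rx.try_recv() {
            Ok(batch) => return Some(batch),
            Err(TryRecvError::Empty) => hint::spin_loop(),
            Err(TryRecvError::Disconnected) => return None,
        }
    }
    // Spin budget exhausted: go to sleep until a batch arrives.
    rx.recv().ok()
}

/// Spawn `producers` threads that each send 0..items_per_producer,
/// and return the sum of everything the receiver saw.
fn run_batched_demo(producers: usize, items_per_producer: usize) -> u64 {
    let (tx, rx) = mpsc::channel::<Vec<u64>>();
    let handles: Vec<_> = (0..producers)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                let items: Vec<u64> = (0..items_per_producer as u64).collect();
                send_batched(&tx, &items);
            })
        })
        .collect();
    // Drop the original sender so the receiver can observe disconnection.
    drop(tx);

    let mut total = 0u64;
    while let Some(batch) = recv_spin_then_block(&rx) {
        total += batch.iter().sum::<u64>();
    }
    for handle in handles {
        handle.join().unwrap();
    }
    total
}
```

The trade-off is the usual one: spinning burns CPU while the queue is empty, so the spin budget should stay small, and batching adds latency for the first item in each batch.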
Another "fun" contention pattern we have is channel.send(item) called from N threads while the receiver is sleeping, so it must be woken up. To wake it up, a mutex on the channel must be acquired, so multiple producer threads try to acquire it at the same time; only one succeeds and the others go to sleep. The one that succeeds issues the syscall to wake up the receiver and releases the mutex. All the other threads now race to acquire the mutex to wake up the receiver... which has already been woken up.
So they manage to lock, see that it's awake, and do nothing. They contend on the mutex for absolutely no reason. This happens with the prio cache, accountsdb, and pretty much anything that gets executed with many txs from the replay stage.
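One way to avoid the redundant wake-up race is to let producers decide with a single atomic swap whether a wake-up is needed at all, so only the producer that flips the flag pays for it. The sketch below is an illustration under assumptions, not how the channel implementation in use works internally: `WakeOnceQueue`, `receiver_sleeping`, `wakeups` and `run_wake_once_demo` are hypothetical names, and the item queue itself is still a plain `Mutex<VecDeque>`, so only the wake-up path is addressed here.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, Thread};

/// Hypothetical single-consumer queue where at most one producer
/// issues the wake-up for each time the receiver goes to sleep.
struct WakeOnceQueue {
    items: Mutex<VecDeque<u64>>,
    receiver_sleeping: AtomicBool,
    /// Counts unpark calls, only to make the effect observable.
    wakeups: AtomicUsize,
}

impl WakeOnceQueue {
    fn new() -> Self {
        Self {
            items: Mutex::new(VecDeque::new()),
            receiver_sleeping: AtomicBool::new(false),
            wakeups: AtomicUsize::new(0),
        }
    }

    fn try_pop(&self) -> Option<u64> {
        self.items.lock().unwrap().pop_front()
    }

    fn send(&self, item: u64, receiver: &Thread) {
        self.items.lock().unwrap().push_back(item);
        // Only the producer that flips true -> false wakes the receiver;
        // everyone else sees false and skips the wake-up path entirely.
        if self.receiver_sleeping.swap(false, Ordering::SeqCst) {
            self.wakeups.fetch_add(1, Ordering::Relaxed);
            receiver.unpark();
        }
    }

    /// Must be called from the thread whose handle producers pass to `send`.
    fn recv(&self) -> u64 {
        loop {
            if let Some(item) = self.try_pop() {
                return item;
            }
            // Announce the intent to sleep, then re-check the queue so an
            // item pushed just before the flag was set is not missed.
            self.receiver_sleeping.store(true, Ordering::SeqCst);
            if let Some(item) = self.try_pop() {
                self.receiver_sleeping.store(false, Ordering::SeqCst);
                return item;
            }
            // park() may return spuriously; the loop re-checks the queue.
            thread::park();
        }
    }
}

/// The calling thread acts as the receiver; returns (sum, wake-up count).
fn run_wake_once_demo(producers: usize, items_per_producer: usize) -> (u64, usize) {
    let queue = Arc::new(WakeOnceQueue::new());
    let receiver = thread::current();
    let handles: Vec<_> = (0..producers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let receiver = receiver.clone();
            thread::spawn(move || {
                for i in 0..items_per_producer as u64 {
                    queue.send(i, &receiver);
                }
            })
        })
        .collect();

    let mut sum = 0u64;
    for _ in 0..producers * items_per_producer {
        sum += queue.recv();
    }
    for handle in handles {
        handle.join().unwrap();
    }
    (sum, queue.wakeups.load(Ordering::Relaxed))
}
```

The number of wake-ups depends on scheduling, so it is not deterministic, but it can never exceed the number of sends, and under load it should be far lower because most producers find the flag already cleared.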
We could also consider moving to this: https://github.com/temporalxyz/que