Skip to content

Conversation

Aaronontheweb
Copy link
Member

Purpose

This is an experimental draft PR to test whether removing ConfigureAwait(false) from AsyncWriteJournal.ExecuteBatch is sufficient to fix the intermittent InMemoryEventsByTagSpec test failures, WITHOUT any of the locking or data structure changes.

What This Tests

This branch is based on commit 59b2cbb87 (before PR #7869 and #7887) and applies ONLY the ConfigureAwait fix.

No other changes:

  • ❌ No ReaderWriterLockSlim → Monitor conversion
  • ❌ No ConcurrentDictionary changes
  • ❌ No lock timing diagnostics
  • ✅ Only removes ConfigureAwait(false) from line 395 of AsyncWriteJournal.cs

Hypothesis

The ConfigureAwait(false) was causing async continuations to run on thread pool threads instead of the actor's dispatcher thread. This created race conditions when WriteMessagesAsync implementations tried to access actor state or when timing-sensitive operations occurred.

If this PR's tests pass consistently, it would suggest:

  1. The locking changes in PR Fix writer starvation in MemoryJournal #7887 were treating symptoms, not the root cause
  2. The original locking mechanism (ReaderWriterLockSlim) was fine
  3. The ConfigureAwait was the actual problem

Testing

Focus on: Akka.Persistence.Query.InMemory.Tests.InMemoryEventsByTagSpec.ReadJournal_live_query_EventsByTag_should_find_events_from_offset_exclusive

This test has been failing intermittently for a long time and was the motivation for both PR #7869 and PR #7887.

…e timing issues

This change removes ConfigureAwait(false) from the ExecuteBatch method to ensure
async continuations run on the actor's dispatcher thread instead of arbitrary
thread pool threads. This maintains proper synchronization context and prevents
potential race conditions and deadlocks when WriteMessagesAsync implementations
access actor state or context.

Testing hypothesis: ConfigureAwait(false) may have been the root cause of
intermittent MemoryJournal test failures, not the locking mechanism.
@Aaronontheweb
Copy link
Member Author

Closing this draft PR as investigation revealed the hypothesis was incorrect.

The concurrency specialist analysis showed that:

  1. ConfigureAwait(false) is redundant here since ExecuteBatch is fire-and-forget and the circuit breaker already uses ConfigureAwait(false)
  2. No synchronization context exists to preserve anyway
  3. The persistence framework was correctly designed from the start - all Context access happens before async boundaries
  4. The actual fixes were the locking changes in PR Fix race condition in MemoryJournal #7869 and Fix writer starvation in MemoryJournal #7887, not ConfigureAwait removal

This was a valuable investigation that confirmed the architecture is sound.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant