-
Notifications
You must be signed in to change notification settings - Fork 118
Fix race condition crash in LambdaRuntimeClient channel lifecycle (Bug #624) #632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@fabianfett I believe this fixes the problem #624 you reported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
Sources/AWSLambdaRuntime/HTTPClient/LambdaRuntimeClient.swift:152
- The
close()method checks onlyclosingConnections.isEmptybut doesn't checkchannelsBeingClosed.isEmpty. This could cause the continuation to resume prematurely when there are still channels being closed, leading to incomplete cleanup. The condition should beif self.closingConnections.isEmpty && channelsBeingClosed.isEmptyto be consistent with the other close completion checks.
if self.closingConnections.isEmpty {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| // Track channels that are in the process of closing to handle race conditions | ||
| // where an old channel's closeFuture fires after a new connection is established | ||
| private var channelsBeingClosed: Set<ObjectIdentifier> = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between closingConnections and channelsBeingClosed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmh, in my view, channelsBeingClosed provides early detection at the top of channelClosed() to identify old channels and handle them gracefully without touching the main state machine. closingConnections ensures we wait for all channels (both current and old) to fully close before completing shutdown.
But you're question makes me think more about this (thank you) and maybe, these can be merged to keep only one source of truth.
If we keep just closingConnections, we will write
// Instead of:
if channelsBeingClosed.contains(channelID) { ... }
// We could do:
if self.closingConnections.contains(where: { $0 === channel }) { ... }Let me test that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adam-fowler Thank you for the question. I simplified and now use only closingConnections to keep track of closingc onnections. One single source of truth and a simpler mental model
This PR fixes a race condition in
LambdaRuntimeClientthat causes a fatal crash when an old channel'scloseFuturecallback fires after a new connection has been established. The fix adds proper channel lifecycle tracking and replaces the fatal error with graceful handling.Problem
Crash Location:
LambdaRuntimeClient.swift:270inchannelClosed()Error Message:
Root Cause: Race condition where:
closeFuturecallback firesconnectionState = .connected)closingStateis.closedfrom a previous close operationfatalErrorThis can occur when:
Solution
Key Changes
Added channel identity tracking:
Tracks which channels are in the process of closing to distinguish old channels from the current one.
Enhanced
connectionWillClose():ObjectIdentifierRewrote
channelClosed()with defensive logic:fatalErrorwith warning log: The(_, .closed)case now logs a warning instead of crashingconnectionState = .disconnectedfor ANY channel close, now only for the current channelWhy This Fixes the Bug
The fix addresses the race condition by:
Changes
Modified Files
channelsBeingClosed: Set<ObjectIdentifier>propertyconnectionWillClose()with channel trackingchannelClosed()with defensive logic and identity checksfatalErrorwith warning log for unexpected statescloseFuturecallbackLines Changed: ~150 lines modified/added
Backward Compatibility: ✅ Fully compatible, no API changes
Testing
✅ All Existing Tests Pass
All original functionality is preserved with no regressions.
While we cannot reproduce the exact race condition from bug #624 in a deterministic test (it requires specific network timing), the fix:
Related Issues
Fixes #624