Fix NPE in ChannelPool metrics methods when pool is uninitialized#928
Conversation
|
Warning Review limit reached
More reviews will be available in 33 minutes and 53 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughTwo defensive fixes are applied to 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
When the initial TCP connection to a node fails, ChannelPool.connectFuture completes (so the pool is added to PoolManager.pools) while `channels` is still null — initialize() is only called on the success path. The four methods size(), getAvailableIds(), getInFlight(), and getOrphanedIds() called Arrays.stream(channels) without guarding against this, throwing NullPointerException whenever a Dropwizard Metrics reporter (JMX, Graphite, etc.) scraped the gauge values during the reconnection window. Fix: mirror the pattern already used by next(), which checks singleThreaded.initialized before touching channels. Using the volatile initialized flag is correct by the JMM: the volatile write at line 365 (initialized = true) happens-after all channels writes, so any thread that reads initialized == true is guaranteed to see channels as fully populated. Fixes: CUSTOMER-413
d6dc64c to
250cb9d
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a race where ChannelPool metric accessor methods could throw NullPointerException when a pool instance is published before its channels array is initialized (e.g., after an initial connection failure while reconnecting). This aligns the metric accessors with the existing safety pattern already used by next().
Changes:
- Guard
size(),getAvailableIds(),getInFlight(), andgetOrphanedIds()withsingleThreaded.initializedto safely return0when uninitialized. - Add a regression test ensuring those accessors return
0(and do not throw) whenchannelsis stillnullafter a failed initial connection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| core/src/main/java/com/datastax/oss/driver/internal/core/pool/ChannelPool.java | Adds initialized guards to prevent NPEs when channels is still null. |
| core/src/test/java/com/datastax/oss/driver/internal/core/pool/ChannelPoolInitTest.java | Adds a regression test covering metric accessor behavior for an uninitialized pool. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@nikagra , it looks good, but it is too narrow, there are other problems due to postponed initialization, please fix them too update pr and commit message |
If the initial pool connection attempt fails, ChannelPool completes while still uninitialized and starts a reconnect loop. A session close during that state can complete immediately because there are no active channels yet. When the pending reconnect later succeeds, the uninitialized reconnect path previously initialized the pool and kept the new channel open even though the pool had already been closed. This could leave a leaked channel and incorrectly publish channelOpened after shutdown. Guard that reconnect-success path with isClosing: force-close the new channel and complete the reconnect attempt without initializing the pool. Add coverage for both closeAsync and forceCloseAsync when the first successful channel arrives after shutdown.
92221eb to
fdcbf39
Compare
Summary
Fixes a
NullPointerExceptionwhen Dropwizard Metrics reporters scrape per-node gauge values while the driver is reconnecting to a node.Affects:
com.scylladb:java-driver-core(confirmed in4.18.0.1, same code present inscylla-4.x).Root Cause
ChannelPoolhas aChannelSet[] channelsfield that is not initialized at construction — it starts asnull:channelsis only assigned insideSingleThreaded.initialize(), which is called only when the first channel connection succeeds:When the initial connection fails,
makeInitialConnection()completesconnectFutureimmediately — withchannelsstillnull— and starts a reconnect in the background:PoolManager.onPoolsInit()unconditionally puts every completed pool into thepoolsmap (PoolManager.java:252). The Javadoc comment there even notes: "pool init always succeeds". This is intentional design — but it means a pool withchannels == nullcan be in the map.Why the existing null-check doesn't help
AbstractMetricUpdaterguards against a missing pool entry, but not against an uninitialized pool:The NPE
The three gauge methods — and
size()— callArrays.stream(channels)with no null guard:Arrays.stream(null)throwsNullPointerException. This is triggered by any Dropwizard reporter (JMX, Graphite, Prometheus bridge, etc.) that callsGauge.getValue()during the window between a failed initial connection and a successful reconnect.Full call chain
Same path applies to
getAvailableIds()andgetOrphanedIds().Fix
Mirror the guard already used by
next(), which correctly handles this case via theinitializedflag:Apply the same guard to all four affected methods:
Why
initializedand notchannels != nullinitializedis declaredvolatile(meaning the JMM guarantees that a thread readinginitialized == truewill also observe all writes that preceded the volatile write, includingchannels = new ChannelSet[...]). Thechannelsfield itself is notvolatile, so checking it directly would be a weaker guarantee. Usinginitializedis both JMM-correct and consistent with the existing pattern innext().Why
size()is includedsize()has the identical unguardedArrays.stream(channels)pattern. While it is not currently called from any metrics path, it is a public method and would throw the same NPE if called before initialization.Changed files
core/src/main/java/com/datastax/oss/driver/internal/core/pool/ChannelPool.javasize()— addinitializedguardgetAvailableIds()— addinitializedguardgetInFlight()— addinitializedguardgetOrphanedIds()— addinitializedguard