Fix Shepherd/JDO connection leaks causing stuck dbconnections entries #1553
Draft
JasonWildMe wants to merge 7 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1553 +/- ##
=======================================
Coverage 51.54% 51.54%
=======================================
Files 308 308
Lines 11952 11952
Branches 3842 3833 -9
=======================================
Hits 6161 6161
- Misses 5503 5510 +7
+ Partials 288 281 -7
Both Shepherds in scanEndApplet.jsp previously had their close calls
outside any try/finally. The `indShepherd` block (lines 603-786) was
the source of accumulating `scanEndApplet.jsp_displayNames:begin`
entries on Sharkbook dbconnections.jsp: any exception during
`xmlReader.read(file)` on a partially-written scan XML, or any
`getMarkedIndividual` failure, skipped the close, and the page
auto-refreshes every 15s during active scans.
Wraps both Shepherds with try { ... } finally { rollbackAndClose(); }
matching the pattern used elsewhere in this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
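The difference between the old and new shapes of the `indShepherd` block can be sketched as follows. This is a minimal illustration, not the JSP code itself: `StubShepherd`, its `OPEN` list (standing in for the entries shown on dbconnections.jsp) and `ScanPageSketch` are hypothetical names, and the simulated failure stands in for `xmlReader.read(file)` throwing on a partially written file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Wildbook's Shepherd. Only the calls named in the
// commit message are modeled, and their bodies are illustrative.
class StubShepherd {
    // Mimics the entries listed on dbconnections.jsp (assumption).
    static final List<String> OPEN = new ArrayList<>();
    private final String action;

    StubShepherd(String action) { this.action = action; }

    void beginDBTransaction() { OPEN.add(action + ":begin"); }

    void rollbackAndClose() { OPEN.remove(action + ":begin"); }
}

class ScanPageSketch {
    // Leaky shape: an exception before the close leaves the entry behind.
    static void renderLeaky(boolean failRead) {
        StubShepherd indShepherd = new StubShepherd("scanEndApplet.jsp_displayNames");
        indShepherd.beginDBTransaction();
        if (failRead) throw new IllegalStateException("partially written scan XML");
        indShepherd.rollbackAndClose(); // skipped whenever the read throws
    }

    // Fixed shape: the finally block runs on every exit path.
    static void renderSafe(boolean failRead) {
        StubShepherd indShepherd = new StubShepherd("scanEndApplet.jsp_displayNames");
        indShepherd.beginDBTransaction();
        try {
            if (failRead) throw new IllegalStateException("partially written scan XML");
        } finally {
            indShepherd.rollbackAndClose();
        }
    }
}
```

With a page that refreshes every 15s, each failed render in the leaky shape adds one more entry, whereas the safe shape leaves the list empty regardless of how the body exits.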
Most of the stuck "begin" entries on Sharkbook's dbconnections.jsp are
not individual Shepherd leaks but symptoms of one upstream cause:
threads waiting for a Postgres connection that is pinned by a
long-running operation. Two fixes:
1. Encounter.opensearchIndexPermissions(): the periodic permissions
   sweep was holding a Postgres tx open for the entire duration of
   per-encounter OpenSearch HTTP updates. On installs with hundreds of
   thousands of encounters this pinned a connection for tens of minutes
   per run, starving every concurrent request behind it. Refactored
   into two phases: phase 1 loads users/collab maps and all eligible
   encounter rows into in-memory structures under a short-lived
   Shepherd, which is then closed; phase 2 iterates the in-memory rows
   and issues OpenSearch updates with no DB connection held. Same
   OpenSearch updates, same filter logic.
2. jdoconfig.properties: datanucleus.connectionPool.maxWait was -1
   (wait forever). Threads blocked on the pool sat indefinitely and
   showed up as stuck "begin" entries. Set to 30000 ms so a request
   under contention fails fast with a clear error and pool slots free
   up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
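The two-phase shape described for the permissions sweep can be sketched as below. This is an assumption-laden illustration rather than the refactored method: `Session`, `Row`, `loadEligibleRows` and the `indexUpdater` callback are hypothetical names, and the hard-coded rows stand in for whatever the real query returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class TwoPhaseSweepSketch {
    // Hypothetical in-memory copy of one encounter row.
    static final class Row {
        final String encounterId;
        final List<String> viewerIds;

        Row(String encounterId, List<String> viewerIds) {
            this.encounterId = encounterId;
            this.viewerIds = viewerIds;
        }
    }

    // Hypothetical stand-in for a short-lived Shepherd/DB session.
    static final class Session implements AutoCloseable {
        boolean open = true;

        List<Row> loadEligibleRows() {
            if (!open) throw new IllegalStateException("session closed");
            List<Row> rows = new ArrayList<>();
            rows.add(new Row("enc-1", Arrays.asList("alice")));
            rows.add(new Row("enc-2", Arrays.asList("alice", "bob")));
            return rows;
        }

        @Override
        public void close() { open = false; }
    }

    // Returns the number of rows handed to the updater.
    static int sweep(Session session, Consumer<Row> indexUpdater) {
        List<Row> rows;
        // Phase 1: copy everything needed into memory, then release the session.
        try (Session s = session) {
            rows = new ArrayList<>(s.loadEligibleRows());
        }
        // Phase 2: slow per-row updates run with no DB session held.
        for (Row row : rows) {
            indexUpdater.accept(row);
        }
        return rows.size();
    }
}
```

The point of the split is that the time spent in phase 2 no longer counts against the connection pool, so a slow or hung index update cannot starve unrelated requests.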
Earlier commits on this branch saved Encounter.java, scanEndApplet.jsp
and jdoconfig.properties with CRLF line endings, which made the GitHub
PR diff display each file as entirely changed. main has these files as
LF, so this commit normalizes back so reviewers see only the actual
code changes from this branch. No functional change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The method opened a Shepherd at line 241 and only closed it on three
specific success/early-return paths (lines 290, 339, 375). Anything
that threw between begin and one of those closes leaked the PM:
- IBEISIAIdentificationMatchingState.allAsJSONArray() — a JDO query
- WildbookIAM.getIASpecies() and other lazy-loading getters in the
qanns/tanns iteration loops
- qanns.get(0).getMatchingSet() — runs an OpenSearch HTTP call while
the Postgres tx is still open; fails on slow/hung OpenSearch
- annotGetIndiv() — a JDO query
- Any unchecked Throwable
Wraps the body in try { ... } finally { rollbackAndClose() }, removes
the three explicit close pairs, and hoists the HashMap declaration
out of the try so the post-try RestClient.post() (which intentionally
runs after the DB connection is released) can still see it.
Same JSONObject results on every path; same "close DB before HTTP"
ordering preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
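The hoisting and ordering described in this commit message can be sketched as follows. The class, the `EVENTS` log and the payload key are hypothetical and exist only to make the order of operations observable. The `http:post` event stands in for the `RestClient.post()` call and does not reproduce its real signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CloseBeforeHttpSketch {
    // Records the order of operations so it can be inspected (illustrative only).
    static final List<String> EVENTS = new ArrayList<>();

    static Map<String, Object> buildAndSend(boolean failInQuery) {
        // Declared before the try so the code after the finally can still read it.
        Map<String, Object> payload = new HashMap<>();
        EVENTS.add("db:begin");
        try {
            // Any query or lazy-loading getter in here may throw.
            if (failInQuery) throw new IllegalStateException("JDO query failed");
            payload.put("annotationCount", 3); // hypothetical key and value
        } finally {
            // Single close point replacing the per-path close pairs.
            EVENTS.add("db:rollbackAndClose");
        }
        // Reached only when the body succeeded, and only after the DB is released.
        EVENTS.add("http:post");
        return payload;
    }
}
```

Had `payload` been declared inside the try, it would be out of scope at the post-try call, which is why the declaration moves out while the population stays in.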
eae68f9 to a6d9942
Summary
Fixes a set of PersistenceManager / connection leaks observed on production Sharkbook and Flukebook via `dbconnections.jsp` (stuck entries at `IBEISIA.processCallback:rollback`, `RestServlet:new`, plus in-flight `Encounter.opensearchDocumentSerializer:begin` and `api.Bulk.doGet:begin`).
The single commit (993a81e) touches five files with +184 / -150 lines (functional changes only, ignoring CRLF→LF noise in the rest of the tree):
- `Shepherd.closeDBTransaction` / `rollbackDBTransaction`: move the `ShepherdState` writes into `finally` blocks so state always advances past `"begin"`. When `pm.close()` or `rollback()` throws, the entry now stays as `"close-failed"` / `"rollback-failed"` so the dashboard preserves diagnostic evidence instead of either silently losing it or getting stuck at `"begin"`.
- `IBEISIA.processCallback`: wrap both Shepherds in `try { ... } finally { rollbackAndClose(); }`. The bare `return rtn;` on the no-log path (the source of the `IBEISIA.processCallback:rollback` stuck entries) now runs the finally. Setup moved inside the try for consistency.
- `RestServlet.doPost` / `doDelete` / `doHead`: outer `try`/`finally` around each method body removes the `RestServlet.class_<id>` state entry on every return path, including `doPost`'s empty-body 400 early return. Fixes the accumulating `RestServlet:new` entries.
- `BulkImport.doGet` / `doPost`: Shepherd construction / `setAction` / `beginDBTransaction` moved inside the try with a null-safe finally. Background-thread inner class captures `bgContext` as final.
- `OpenSearch.setPermissionsNeeded` / `setActive` / `unsetActive` / `updatePermissionsIndex` / `updateEncounterIndexes`: same try-with-null-safe-finally hardening.
Flagged as follow-up (not in this PR)
- `Executors.newFixedThreadPool(4)` in `Encounter` / `MarkedIndividual` / `Occurrence` `opensearchIndexDeep`, plus a nested Shepherd created per Jackson serialization in `Base.opensearchDocumentSerializer(jgen)`: this multiplies concurrent DB transactions during indexing storms. It is what drives the 70+ in-flight `Encounter.opensearchDocumentSerializer:begin` entries. Needs a design pass (shared bounded executor + pass the outer Shepherd through serialization).
- `commitDBTransaction` swallows exceptions: separate correctness bug. `IBEISIA.processCallback` can still claim `success=true` after a silently-failed commit. Not a leak, so out of scope here.
Test plan
- `mvn compile` clean (verified locally)
- `IBEISIA.processCallback`: exercise the no-log early-return path, the successful detect path, and the "no annotations suitable for identification" path; confirm no stuck `IBEISIA.processCallback:*` entries accumulate on `dbconnections.jsp`
- `RestServlet`: POST (empty body + normal), DELETE, HEAD requests while watching `dbconnections.jsp`; confirm `RestServlet:new` no longer accumulates
- `BulkImport` list + detail `doGet` and `doPost` / `doDelete` flows
- `pm.close()` failure: confirm the dashboard shows `close-failed` rather than vanishing
- `dbconnections.jsp` on staging for 24h post-deploy; confirm entry count stabilizes
Credits
🤖 Generated with Claude Code and reviewed with Codex CLI
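The `Shepherd.closeDBTransaction` change in the summary, where the state write moves into a `finally` block and a failed close is recorded as `"close-failed"`, can be sketched as below. Everything here is hypothetical scaffolding: `StateTrackingSketch`, its `STATES` map (standing in for `ShepherdState`) and the `Pm` interface are illustrative names. The sketch also assumes that a successful close removes the entry and that the exception propagates to the caller, neither of which the summary states.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StateTrackingSketch {
    // Hypothetical stand-in for ShepherdState: action name -> last recorded state.
    static final Map<String, String> STATES = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the PersistenceManager's close call.
    interface Pm {
        void close();
    }

    static void begin(String action) {
        STATES.put(action, "begin");
    }

    static void closeTracked(String action, Pm pm) {
        boolean closed = false;
        try {
            pm.close();
            closed = true;
        } finally {
            // The state write happens on every path, so the entry never stays at "begin".
            if (closed) {
                STATES.remove(action); // assumption: a clean close drops the entry
            } else {
                STATES.put(action, "close-failed"); // keep evidence for the dashboard
            }
        }
    }
}
```

A rollback variant would follow the same shape with `"rollback-failed"` as the recorded state.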