Fix FK violations when deleting annotations (bulk import delete + others)#1560
Closed
JasonWildMe wants to merge 2 commits intomainfrom
Closed
Fix FK violations when deleting annotations (bulk import delete + others)#1560JasonWildMe wants to merge 2 commits intomainfrom
JasonWildMe wants to merge 2 commits intomainfrom
Conversation
Bulk import deletion (and any flow that deletes an annotation that has been through identification) was failing with: ERROR: update or delete on table "ANNOTATION" violates foreign key constraint "MATCHRESULT_FK1" on table "MATCHRESULT" Multiple tables hold FK references to ANNOTATION.ID and JDO mappings do not cascade-delete on those references: MatchResult.queryAnnotation -> MATCHRESULT_FK1 (allows-null=false) MatchResultProspect.annotation -> FK (allows-null=false) Embedding.annotation -> EMBEDDING.ANNOTATION_ID FK Task.objectAnnotations -> TASK_OBJECTANNOTATIONS join FK MatchResult.candidates -> MATCHRESULT_CANDIDATES join FK (rarely populated) Fix: Shepherd.throwAwayAnnotation now cleans up all of these before deleting the annotation, with explicit pm.flush() between groups so the dependent rows commit to the DB before the annotation FK check runs. Five direct callers that bypassed throwAwayAnnotation (EncounterPatchValidator, EncounterRemoveAnnotation x2, AnnotationEdit) are migrated to use it. Also adds dependent-element="true" on MatchResult.prospects in the JDO mapping so any future code that deletes a MatchResult directly also cascade-deletes its prospects, preventing the same bug from re-emerging. Caller's only remaining responsibility is to detach the annotation from its owning Encounter (enc.removeAnnotation) before calling. That's a business decision about which encounter the annotation belongs to. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1560 +/- ##
==========================================
+ Coverage 51.54% 51.57% +0.03%
==========================================
Files 307 307
Lines 11942 11951 +9
Branches 3827 3833 +6
==========================================
+ Hits 6155 6164 +9
Misses 5509 5509
Partials 278 278
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Bulk-import deletion was extremely slow even after the FK-violation fix
landed. A 4-encounter import took 2-3 minutes; a 200-encounter import
would take hours. Three contributors: per-annotation cleanup queries,
unindexed FK columns on the IA tables, and no UI feedback during the
wait.
Backend (B+C):
* New Shepherd.throwAwayAnnotations(Collection<Annotation>) batches
the FK cleanup. One JDOQL query per dependent table (MatchResult,
MatchResultProspect, Embedding, Task.objectAnnotations, candidates
collection) instead of N x 6 queries. Existing throwAwayAnnotation
now delegates to this with a singleton list.
* ImportTask.deleteWithRelated refactored: collect all annotations in
the encounter loop, do per-encounter cleanup (occurrence, individual,
project, throwAwayEncounter) inside the loop, then call
throwAwayAnnotations once at the end. Removes the redundant per-
annotation Task.removeObject loop now that the batch helper handles
it. Util.mark timing logs around the encounter loop and the batch
cleanup phase.
* JDO indexes added on the FK columns the batch queries filter by:
MATCHRESULT.queryAnnotation -> MATCHRESULT_QUERYANNOTATION_idx
MATCHRESULTPROSPECT.annotation -> MATCHRESULTPROSPECT_ANNOTATION_idx
EMBEDDING.annotation -> EMBEDDING_ANNOTATION_idx
Postgres does not auto-index FK columns, so previously each cleanup
query was a sequential scan of (potentially huge) IA tables. With
the indexes plus batching, query count and per-query cost both
drop dramatically.
datanucleus.schema.autoCreateAll=true means these are created on
next startup automatically. For very large existing deployments the
blocking CREATE INDEX may need to be replaced with a manual
CREATE INDEX CONCURRENTLY before the upgrade.
Frontend (A):
* Spinner + disabled state on the Delete Import Task button while the
delete is in flight. New i18n key BULK_IMPORT_DELETE_TASK_IN_PROGRESS
added to all 5 locale files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
|
|
Collaborator
Author
|
Closing in preference to a better fix noted above. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related fixes for
BulkImport.doDeletein one PR:MATCHRESULT_FK1because annotations were deleted while still referenced fromMATCHRESULTand other IA tables.After both fixes: deletion is functionally correct AND fast (4-encounter import should drop from 2-3 minutes to seconds), and users get a spinner while it runs.
Commit 1 — Fix FK violations when deleting annotations
Five tables hold FK references to
ANNOTATION.IDand JDO mappings do not cascade-delete on any of them:MatchResult.queryAnnotationMATCHRESULT_FK1(allows-null=false)MatchResultProspect.annotationallows-null=falseEmbedding.annotationEMBEDDING.ANNOTATION_IDFKTask.objectAnnotationsTASK_OBJECTANNOTATIONSjoin FKMatchResult.candidatesMATCHRESULT_CANDIDATESjoin FK (rarely populated)Shepherd.throwAwayAnnotationnow cleans all five before deleting the annotation, withpm.flush()between groups so dependent rows commit to the DB before the annotation FK check.Five direct callers that bypassed
throwAwayAnnotationand calledpm.deletePersistent(ann)directly are migrated:EncounterPatchValidator.java:251EncounterRemoveAnnotation.java:107and:124AnnotationEdit.java:166MatchResult.prospectsis also declareddependent-element="true"defensively so future code that deletes aMatchResultcascades correctly.Caller contract: caller is still responsible for detaching from
Encounter.annotations(enc.removeAnnotation(ann)). Everything else is handled.Commit 2 — Speed up bulk-import deletion + add UI feedback
Backend (B+C)
Batched cleanup (
Shepherd.throwAwayAnnotations(Collection<Annotation>)): one JDOQL query per dependent table for an entire import, instead of N×6 per-annotation queries.throwAwayAnnotation(Annotation)now delegates to this with a singleton list.Refactored
ImportTask.deleteWithRelated: collects all annotations across all encounters in the encounter loop, does per-encounter cleanup (occurrence/individual/project/throwAwayEncounter) inside the same loop, then callsthrowAwayAnnotationsonce at the end. Removed the redundant per-annotationTask.removeObjectloop now that the batch helper handles it.Util.marktiming logs added around the encounter loop and the batch cleanup phase so future slowness can be diagnosed quickly.JDO indexes on the FK columns the batch queries filter by:
MATCHRESULT.queryAnnotation→MATCHRESULT_QUERYANNOTATION_idxMATCHRESULTPROSPECT.annotation→MATCHRESULTPROSPECT_ANNOTATION_idxEMBEDDING.annotation→EMBEDDING_ANNOTATION_idxPostgreSQL does not auto-index FK columns. Without these, every cleanup query was a sequential scan of (potentially huge) IA tables. With batching + indexes, query count and per-query cost both drop dramatically.
Frontend (A)
Spinner + disabled state on the Delete Import Task button while delete is in flight. New i18n key
BULK_IMPORT_DELETE_TASK_IN_PROGRESSin all 5 locale files (en/de/fr/es/it).datanucleus.schema.autoCreateAll=trueis enabled injdoconfig.properties, so the three new indexes will be created on next startup automatically. The defaultCREATE INDEXstatement is blocking — on very large existing IA tables (millions of rows inMATCHRESULT/MATCHRESULTPROSPECT/EMBEDDING) this can lock the tables for the duration of the build, blocking writes (and detection callbacks).For large deployments, run these once before the deploy, then deploy normally:
Verify column names against the actual schema first (DataNucleus generates them; sample query:
\d "MATCHRESULT"in psql). If indexes already exist with the same name, DataNucleus' auto-create will be a no-op on startup and there's no further action needed.For small/staging deployments, just deploy and let
autoCreateAllcreate them on startup.Test plan
mvn compile -DskipTestssucceeds with the new JDO mappings.Util.marklines for the encounter loop and batched cleanup phases, with reasonable durations.EncounterDeleteservlet) and an annotation via React encounter page (EncounterPatchValidatorpatch remove) — both should still work, no FK regressions.MATCHRESULT,MATCHRESULTPROSPECT,EMBEDDING, orTASK_OBJECTANNOTATIONS.Codex review trail
This PR was developed with Codex peer review at the diagnosis, design, implementation, and final-QA stages. Codex caught (and we addressed): a missed direct deletion caller in
AnnotationEdit.java, the latentEmbedding.annotationFK that would FK-fail next, the need forTask.objectAnnotationscleanup insidethrowAwayAnnotationitself, the need to verify the:ids.contains(...)JDOQL syntax (verified — already used inEncounterExport.java:114), and theCREATE INDEX CONCURRENTLYnote for large prod tables. Final response on the perf + UX iteration: "ship it".🤖 Generated with Claude Code