Fix FK violations when deleting annotations (bulk import delete + others) #1560

Closed
JasonWildMe wants to merge 2 commits into main from fix/bulk-import-delete-matchresult-fk

@JasonWildMe JasonWildMe commented Apr 27, 2026

Summary

Two related fixes for BulkImport.doDelete in one PR:

  1. FK violation fix (commit 1): bulk import deletion was throwing MATCHRESULT_FK1 because annotations were deleted while still referenced from MATCHRESULT and other IA tables.
  2. Performance + UX fix (commit 2): even after fix 1, a 4-encounter import took 2-3 minutes to delete (a 200-encounter import would take hours). Combined with no UI feedback, deletion was effectively unusable on real-world imports.

After both fixes: deletion is functionally correct AND fast (4-encounter import should drop from 2-3 minutes to seconds), and users get a spinner while it runs.


Commit 1 — Fix FK violations when deleting annotations

Five tables hold FK references to ANNOTATION.ID and JDO mappings do not cascade-delete on any of them:

| Reference | Constraint | Cleanup |
| --- | --- | --- |
| MatchResult.queryAnnotation | MATCHRESULT_FK1 (allows-null=false) | Delete the MR row |
| MatchResultProspect.annotation | allows-null=false | Delete the prospect row |
| Embedding.annotation | EMBEDDING.ANNOTATION_ID FK | Delete the embedding row |
| Task.objectAnnotations | TASK_OBJECTANNOTATIONS join FK | Remove join row |
| MatchResult.candidates | MATCHRESULT_CANDIDATES join FK (rarely populated) | Remove join row |

Shepherd.throwAwayAnnotation now cleans all five before deleting the annotation, calling pm.flush() between groups so the dependent-row deletes reach the DB before the annotation delete triggers its FK check.
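The ordering requirement can be illustrated with a minimal in-memory sketch. This is not Wildbook code: the class FkOrderSketch and its fields are hypothetical stand-ins for the ANNOTATION and MATCHRESULT tables, and the FK check is simulated with a plain lookup rather than a real database constraint.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the database; not Wildbook code.
class FkOrderSketch {
    // Ids of the rows in the parent table (ANNOTATION).
    final Set<String> annotations = new HashSet<>();
    // Child rows (MATCHRESULT): row id -> id of the annotation it references.
    final Map<String, String> matchResults = new HashMap<>();

    // Mimics the FK check: a parent row that is still referenced cannot be deleted.
    void deleteAnnotationRow(String annId) {
        if (matchResults.containsValue(annId)) {
            throw new IllegalStateException(
                "delete on ANNOTATION violates foreign key constraint MATCHRESULT_FK1");
        }
        annotations.remove(annId);
    }

    // Children first, parent last: the ordering described above.
    void throwAwayAnnotation(String annId) {
        matchResults.values().removeIf(annId::equals);
        deleteAnnotationRow(annId);
    }

    public static void main(String[] args) {
        FkOrderSketch db = new FkOrderSketch();
        db.annotations.add("ann-1");
        db.matchResults.put("mr-1", "ann-1");
        try {
            db.deleteAnnotationRow("ann-1"); // parent first: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        db.throwAwayAnnotation("ann-1"); // children first: succeeds
        System.out.println("annotations left: " + db.annotations.size());
    }
}
```

In the real code the same ordering has to hold at the SQL level, which is why the flush between groups matters: without it, the persistence layer may be free to issue the parent delete before the child deletes.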

Four direct call sites that bypassed throwAwayAnnotation and called pm.deletePersistent(ann) directly are migrated:

  • EncounterPatchValidator.java:251
  • EncounterRemoveAnnotation.java:107 and :124
  • AnnotationEdit.java:166

MatchResult.prospects is also declared dependent-element="true" defensively so future code that deletes a MatchResult cascades correctly.

Caller contract: caller is still responsible for detaching from Encounter.annotations (enc.removeAnnotation(ann)). Everything else is handled.


Commit 2 — Speed up bulk-import deletion + add UI feedback

Backend (B+C)

Batched cleanup (Shepherd.throwAwayAnnotations(Collection<Annotation>)): one JDOQL query per dependent table for an entire import, instead of N×6 per-annotation queries. throwAwayAnnotation(Annotation) now delegates to this with a singleton list.
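To make the query-count difference concrete, here is a rough sketch that counts simulated queries for the per-annotation and batched strategies. It is an illustration only: BatchCleanupSketch and its methods are hypothetical, each "query" is an in-memory filter rather than JDOQL, and only three of the dependent tables are modeled for brevity.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the cleanup; not the Shepherd implementation.
class BatchCleanupSketch {
    // Table name -> (row id -> id of the annotation that row references).
    final Map<String, Map<String, String>> tables = new LinkedHashMap<>();
    int queryCount = 0;

    BatchCleanupSketch() {
        for (String t : List.of("MATCHRESULT", "MATCHRESULTPROSPECT", "EMBEDDING")) {
            tables.put(t, new HashMap<>());
        }
    }

    // One simulated query: delete the rows of a table whose annotation id is in ids.
    int deleteWhereAnnotationIn(String table, Set<String> ids) {
        queryCount++;
        Map<String, String> rows = tables.get(table);
        int before = rows.size();
        rows.values().removeIf(ids::contains);
        return before - rows.size();
    }

    // Old shape: one query per table for every annotation.
    void cleanupOneByOne(Collection<String> annIds) {
        for (String id : annIds) {
            for (String t : tables.keySet()) {
                deleteWhereAnnotationIn(t, Set.of(id));
            }
        }
    }

    // New shape: one query per table for the whole collection.
    void cleanupBatched(Collection<String> annIds) {
        Set<String> ids = new HashSet<>(annIds);
        for (String t : tables.keySet()) {
            deleteWhereAnnotationIn(t, ids);
        }
    }
}
```

With 100 annotations and the three modeled tables, cleanupOneByOne issues 300 simulated queries while cleanupBatched issues 3; both leave the tables in the same state.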

Refactored ImportTask.deleteWithRelated: collects all annotations across all encounters in the encounter loop, does per-encounter cleanup (occurrence/individual/project/throwAwayEncounter) inside the same loop, then calls throwAwayAnnotations once at the end. Removed the redundant per-annotation Task.removeObject loop now that the batch helper handles it. Util.mark timing logs added around the encounter loop and the batch cleanup phase so future slowness can be diagnosed quickly.

JDO indexes on the FK columns the batch queries filter by:

  • MATCHRESULT.queryAnnotation -> MATCHRESULT_QUERYANNOTATION_idx
  • MATCHRESULTPROSPECT.annotation -> MATCHRESULTPROSPECT_ANNOTATION_idx
  • EMBEDDING.annotation -> EMBEDDING_ANNOTATION_idx

PostgreSQL does not auto-index FK columns. Without these, every cleanup query was a sequential scan of (potentially huge) IA tables. With batching + indexes, query count and per-query cost both drop dramatically.
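The per-query cost difference can be sketched by counting how many rows each lookup strategy has to examine. This is a simplified model under stated assumptions: FkIndexSketch is hypothetical, and a hash map stands in for the index, whereas PostgreSQL's default index type is a B-tree; the point is only that an indexed lookup touches the matching rows instead of the whole table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a child table with and without an index on its FK column.
class FkIndexSketch {
    // rows.get(i) is the annotation id referenced by child row i.
    final List<Integer> rows = new ArrayList<>();
    // Stand-in for an index on the FK column: annotation id -> row positions.
    final Map<Integer, List<Integer>> index = new HashMap<>();
    long rowsExamined = 0;

    void insert(int annotationId) {
        rows.add(annotationId);
        index.computeIfAbsent(annotationId, k -> new ArrayList<>()).add(rows.size() - 1);
    }

    // No index: every row is visited to find the matches.
    List<Integer> findBySequentialScan(int annotationId) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            rowsExamined++;
            if (rows.get(i) == annotationId) {
                hits.add(i);
            }
        }
        return hits;
    }

    // With the index: only the matching rows are visited.
    List<Integer> findByIndex(int annotationId) {
        List<Integer> hits = index.getOrDefault(annotationId, Collections.emptyList());
        rowsExamined += hits.size();
        return hits;
    }

    public static void main(String[] args) {
        FkIndexSketch table = new FkIndexSketch();
        for (int i = 0; i < 100_000; i++) {
            table.insert(i % 1_000); // 100 child rows per annotation
        }
        table.findBySequentialScan(42);
        System.out.println("sequential scan examined " + table.rowsExamined + " rows");
        table.rowsExamined = 0;
        table.findByIndex(42);
        System.out.println("index lookup examined " + table.rowsExamined + " rows");
    }
}
```

Both lookups return the same rows; only the amount of work differs, and that gap grows with the size of the table.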

Frontend (A)

Spinner + disabled state on the Delete Import Task button while delete is in flight. New i18n key BULK_IMPORT_DELETE_TASK_IN_PROGRESS in all 5 locale files (en/de/fr/es/it).


⚠️ Deployment note for large existing databases

datanucleus.schema.autoCreateAll=true is enabled in jdoconfig.properties, so the three new indexes will be created on next startup automatically. The default CREATE INDEX statement is blocking — on very large existing IA tables (millions of rows in MATCHRESULT/MATCHRESULTPROSPECT/EMBEDDING) this can lock the tables for the duration of the build, blocking writes (and detection callbacks).

For large deployments, run these once before the deploy, then deploy normally:

CREATE INDEX CONCURRENTLY "MATCHRESULT_QUERYANNOTATION_idx"
  ON "MATCHRESULT" ("QUERYANNOTATION_ID_OID");
CREATE INDEX CONCURRENTLY "MATCHRESULTPROSPECT_ANNOTATION_idx"
  ON "MATCHRESULTPROSPECT" ("ANNOTATION_ID_OID");
CREATE INDEX CONCURRENTLY "EMBEDDING_ANNOTATION_idx"
  ON "EMBEDDING" ("ANNOTATION_ID");

Verify column names against the actual schema first, since DataNucleus generates them (for example, run \d "MATCHRESULT" in psql). If indexes with the same names already exist, DataNucleus' auto-create will be a no-op on startup and no further action is needed.

For small/staging deployments, just deploy and let autoCreateAll create them on startup.


Test plan

  • mvn compile -DskipTests succeeds with the new JDO mappings.
  • Lint clean on the changed JS file.
  • Locale JSON files parse without errors.
  • Delete a 4-encounter bulk import that has been through identification → completes in seconds (was 2-3 minutes).
  • Delete a 200-encounter bulk import → completes in tens of seconds at most (was effectively impossible).
  • During delete: UI shows spinner with "Deleting..." label, button is disabled, no double-submit possible.
  • Catalina log shows Util.mark lines for the encounter loop and batched cleanup phases, with reasonable durations.
  • Delete a single encounter (EncounterDelete servlet) and an annotation via React encounter page (EncounterPatchValidator patch remove) — both should still work, no FK regressions.
  • Verify after each deletion that orphan rows aren't left in MATCHRESULT, MATCHRESULTPROSPECT, EMBEDDING, or TASK_OBJECTANNOTATIONS.

Codex review trail

This PR was developed with Codex peer review at the diagnosis, design, implementation, and final-QA stages. Codex caught (and we addressed): a missed direct deletion caller in AnnotationEdit.java, the latent Embedding.annotation FK that would FK-fail next, the need for Task.objectAnnotations cleanup inside throwAwayAnnotation itself, the need to verify the :ids.contains(...) JDOQL syntax (verified — already used in EncounterExport.java:114), and the CREATE INDEX CONCURRENTLY note for large prod tables. Final response on the perf + UX iteration: "ship it".

🤖 Generated with Claude Code

Bulk import deletion (and any flow that deletes an annotation that has
been through identification) was failing with:

  ERROR: update or delete on table "ANNOTATION" violates foreign key
  constraint "MATCHRESULT_FK1" on table "MATCHRESULT"

Multiple tables hold FK references to ANNOTATION.ID and JDO mappings do
not cascade-delete on those references:

  MatchResult.queryAnnotation -> MATCHRESULT_FK1 (allows-null=false)
  MatchResultProspect.annotation -> FK (allows-null=false)
  Embedding.annotation -> EMBEDDING.ANNOTATION_ID FK
  Task.objectAnnotations -> TASK_OBJECTANNOTATIONS join FK
  MatchResult.candidates -> MATCHRESULT_CANDIDATES join FK (rarely populated)

Fix: Shepherd.throwAwayAnnotation now cleans up all of these before
deleting the annotation, with explicit pm.flush() between groups so
the dependent rows commit to the DB before the annotation FK check
runs. Four direct call sites that bypassed throwAwayAnnotation
(EncounterPatchValidator, EncounterRemoveAnnotation x2, AnnotationEdit)
are migrated to use it.

Also adds dependent-element="true" on MatchResult.prospects in the
JDO mapping so any future code that deletes a MatchResult directly
also cascade-deletes its prospects, preventing the same bug from
re-emerging.

Caller's only remaining responsibility is to detach the annotation
from its owning Encounter (enc.removeAnnotation) before calling.
That's a business decision about which encounter the annotation
belongs to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JasonWildMe JasonWildMe requested a review from naknomum April 27, 2026 01:16
@JasonWildMe JasonWildMe self-assigned this Apr 27, 2026

codecov-commenter commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.57%. Comparing base (34de7cb) to head (d62d1f4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1560      +/-   ##
==========================================
+ Coverage   51.54%   51.57%   +0.03%     
==========================================
  Files         307      307              
  Lines       11942    11951       +9     
  Branches     3827     3833       +6     
==========================================
+ Hits         6155     6164       +9     
  Misses       5509     5509              
  Partials      278      278              
| Flag | Coverage Δ | |
| --- | --- | --- |
| backend | 51.57% <100.00%> (+0.03%) | ⬆️ |
| frontend | 51.57% <100.00%> (+0.03%) | ⬆️ |


Bulk-import deletion was extremely slow even after the FK-violation fix
landed. A 4-encounter import took 2-3 minutes; a 200-encounter import
would take hours. Three contributors: per-annotation cleanup queries,
unindexed FK columns on the IA tables, and no UI feedback during the
wait.

Backend (B+C):

* New Shepherd.throwAwayAnnotations(Collection<Annotation>) batches
  the FK cleanup. One JDOQL query per dependent table (MatchResult,
  MatchResultProspect, Embedding, Task.objectAnnotations, candidates
  collection) instead of N x 6 queries. Existing throwAwayAnnotation
  now delegates to this with a singleton list.

* ImportTask.deleteWithRelated refactored: collect all annotations in
  the encounter loop, do per-encounter cleanup (occurrence, individual,
  project, throwAwayEncounter) inside the loop, then call
  throwAwayAnnotations once at the end. Removes the redundant per-
  annotation Task.removeObject loop now that the batch helper handles
  it. Util.mark timing logs around the encounter loop and the batch
  cleanup phase.

* JDO indexes added on the FK columns the batch queries filter by:
    MATCHRESULT.queryAnnotation -> MATCHRESULT_QUERYANNOTATION_idx
    MATCHRESULTPROSPECT.annotation -> MATCHRESULTPROSPECT_ANNOTATION_idx
    EMBEDDING.annotation -> EMBEDDING_ANNOTATION_idx
  Postgres does not auto-index FK columns, so previously each cleanup
  query was a sequential scan of (potentially huge) IA tables. With
  the indexes plus batching, query count and per-query cost both
  drop dramatically.

  datanucleus.schema.autoCreateAll=true means these are created on
  next startup automatically. For very large existing deployments the
  blocking CREATE INDEX may need to be replaced with a manual
  CREATE INDEX CONCURRENTLY before the upgrade.

Frontend (A):

* Spinner + disabled state on the Delete Import Task button while the
  delete is in flight. New i18n key BULK_IMPORT_DELETE_TASK_IN_PROGRESS
  added to all 5 locale files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JasonWildMe JasonWildMe marked this pull request as draft April 27, 2026 18:55

naknomum commented May 5, 2026

⚠️ this is being left in draft temporarily, likely to be deleted.
it has been superseded by PR1563.

@JasonWildMe

Closing in favor of the better fix noted above.

@JasonWildMe JasonWildMe closed this May 7, 2026