
idea: saga compensation support across the casehub platform #238

@mdproctor

What This Is

An ideas-capture issue. Nothing here is a spec. These are concepts surfaced during discussion of WorkItem lifecycle alignment with WS-HumanTask, BPMN 2.0, and CMMN specs. Saga compensation is a platform-wide concern — this issue is filed in casehub-work as a starting point but the design will span casehub-engine, casehub-ledger, casehub-qhorus, and casehub-connectors.


The Problem

casehub has no compensation model. When a case fails mid-execution, completed steps cannot be undone in a structured, auditable way. Current workarounds:

  • Create a new WorkItem to manually undo previous work (loses the causal link)
  • CANCEL the case (loses the semantic distinction between stopping and compensating)
  • Leave completed steps as-is and note the failure in audit (leaves the system in an inconsistent state for the domain)

This is the Saga pattern gap. A saga is a sequence of steps where each step has a corresponding compensating transaction that undoes its effect if a later step fails or if an operator triggers rollback.


Why This Matters for casehub

casehub is positioned for regulated, compliance-first deployments (EU AI Act Art.12, GDPR Art.17/22). In these contexts:

  • A clinical trial modification approved by an IRB WorkItem may need compensating if a downstream protocol constraint is violated
  • A financial decision completed by an agent may need compensating if fraud is detected post-completion
  • A multi-step onboarding process may need unwinding if a KYC step fails at step 7 of 10

Compensation is not optional in these domains — it is a compliance requirement. The ledger must capture both the original action and the compensation as immutable, causally-linked entries.


BPMN 2.0 Reference

BPMN defines compensation as first-class:

  • Compensating — a completed UserTask being undone
  • Compensated — the compensation has been applied
  • Compensation boundary event — attached to an activity, fires when a compensate event is thrown
  • Compensate intermediate throwing event — triggers compensation of a specific activity or all activities in a scope

casehub-work currently has no equivalent states. WorkItemStatus needs COMPENSATING and COMPENSATED.


Scope Across Repos

casehub-work

WorkItemStatus gains:

  • COMPENSATING — a completed WorkItem is having its effects undone (a new compensating task is in progress or the original actor is reversing their decision)
  • COMPENSATED — compensation is complete; the original action has been reversed

New service method: compensate(UUID id, String triggeredBy, String reason)

  • Only callable from COMPLETED
  • Creates a compensation audit entry
  • Fires WorkItemLifecycleEvent(COMPENSATING) immediately and WorkItemLifecycleEvent(COMPENSATED) once the compensation finishes
  • The compensation may be automated (system undoes) or require a new human task (actor reverses their own decision)

The compensating task (if human-driven) is a new WorkItem linked to the original via a compensatesWorkItemId reference. This preserves the causal chain.
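
As a rough illustration of the guard and state transitions described above, here is a minimal in-memory sketch. Nothing here is the real casehub-work API: WorkItem is reduced to three fields, the audit trail is a plain list of strings, and markCompensated is a hypothetical name for whatever eventually completes the compensation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical subset of WorkItemStatus: only the states this sketch needs.
enum WorkItemStatus { PENDING, COMPLETED, COMPENSATING, COMPENSATED }

// Hypothetical, simplified WorkItem. compensatesWorkItemId is null unless
// this item is a human-driven compensating task for another WorkItem.
final class WorkItem {
    final UUID id;
    final UUID compensatesWorkItemId;
    WorkItemStatus status;

    WorkItem(UUID id, WorkItemStatus status, UUID compensatesWorkItemId) {
        this.id = id;
        this.status = status;
        this.compensatesWorkItemId = compensatesWorkItemId;
    }
}

// In-memory stand-in for the service; the audit log is a list of strings.
final class WorkItemCompensationSketch {
    private final Map<UUID, WorkItem> items = new HashMap<>();
    final List<String> auditLog = new ArrayList<>();

    void add(WorkItem item) {
        items.put(item.id, item);
    }

    // Only callable from COMPLETED; records an audit entry and moves to COMPENSATING.
    WorkItem compensate(UUID id, String triggeredBy, String reason) {
        WorkItem item = items.get(id);
        if (item == null) {
            throw new IllegalArgumentException("Unknown WorkItem: " + id);
        }
        if (item.status != WorkItemStatus.COMPLETED) {
            throw new IllegalStateException("Cannot compensate from " + item.status);
        }
        item.status = WorkItemStatus.COMPENSATING;
        auditLog.add("COMPENSATING " + id + " by " + triggeredBy + ": " + reason);
        return item;
    }

    // Hypothetical completion hook: COMPENSATING -> COMPENSATED.
    WorkItem markCompensated(UUID id) {
        WorkItem item = items.get(id);
        if (item == null || item.status != WorkItemStatus.COMPENSATING) {
            throw new IllegalStateException("WorkItem is not COMPENSATING: " + id);
        }
        item.status = WorkItemStatus.COMPENSATED;
        auditLog.add("COMPENSATED " + id);
        return item;
    }
}
```

A real implementation would fire lifecycle events where this sketch appends to auditLog, and would persist the transition rather than mutate an in-memory object.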

casehub-engine

  • CaseStatus gains COMPENSATING — a case in COMPENSATING state executes compensating bindings
  • Each Binding can optionally declare a compensate: binding (the step to run when this binding needs to be undone)
  • On compensation trigger, the engine runs compensating bindings in reverse-completion order for all COMPLETED PlanItems
  • CasePlanModel tracks completed PlanItems in order — the compensation sequence is deterministic
  • EventLog gains COMPENSATION_STARTED and COMPENSATION_COMPLETED event types
  • Sub-cases propagate compensation: compensating a parent case compensates child cases recursively
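
The reverse-completion ordering in the bullets above could look roughly like the following sketch. It assumes a Binding is just a name plus an optional Runnable compensator, and it models the event log as a list of strings; both are illustrative stand-ins, not casehub-engine types. Sub-case propagation is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical binding: a step name plus an optional compensating action.
final class Binding {
    final String name;
    final Runnable compensate; // null when no compensate: binding is declared

    Binding(String name, Runnable compensate) {
        this.name = name;
        this.compensate = compensate;
    }
}

final class CaseCompensationSketch {
    // Used as a stack, so the most recently completed binding is popped first.
    private final Deque<Binding> completed = new ArrayDeque<>();
    final List<String> eventLog = new ArrayList<>();

    // Called whenever a PlanItem completes, in completion order.
    void recordCompleted(Binding binding) {
        completed.push(binding);
    }

    // Runs declared compensators in reverse-completion order.
    void compensate() {
        eventLog.add("COMPENSATION_STARTED");
        while (!completed.isEmpty()) {
            Binding binding = completed.pop();
            if (binding.compensate != null) {
                binding.compensate.run();
                eventLog.add("COMPENSATED " + binding.name);
            }
        }
        eventLog.add("COMPENSATION_COMPLETED");
    }
}
```

Because the stack is filled in completion order, the unwind order is fully determined by what actually ran, which is the determinism property the CasePlanModel bullet asks for. A compensator that throws is not handled here; that case is one of the open questions in this issue.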

casehub-ledger

The ledger is immutable. Compensation does not delete or modify existing entries — it creates new forward entries with causal links:

  • Compensation entry points back to original via causedByEntryId (already in the ledger's hash chain)
  • New CompensationEntry type (or ComplianceSupplement extension) captures: original entry ID, compensating actor, reason, timestamp, regulatory basis (e.g., GDPR Art.17, EU AI Act Art.12)
  • LedgerErasureService (GDPR Art.17) is distinct from compensation — erasure suppresses PII while compensation records the intent to reverse an action

The hash chain continues forward: compensation entries are part of the tamper-evident record, not exceptions to it.
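
To make the "forward entries only" idea concrete, here is a small sketch of an append-only ledger in which a compensation is just another hashed entry pointing back at the original. The LedgerEntry fields, the integer IDs, and the exact hash input are assumptions chosen for illustration; only causedByEntryId comes from this issue, and the real casehub-ledger chain format may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable entry; causedByEntryId is null for non-causal entries.
final class LedgerEntry {
    final int id;
    final String action;
    final Integer causedByEntryId;
    final String previousHash;
    final String hash;

    LedgerEntry(int id, String action, Integer causedByEntryId, String previousHash, String hash) {
        this.id = id;
        this.action = action;
        this.causedByEntryId = causedByEntryId;
        this.previousHash = previousHash;
        this.hash = hash;
    }
}

final class LedgerSketch {
    private final List<LedgerEntry> entries = new ArrayList<>();

    // Appends a new entry whose hash covers the previous entry's hash.
    LedgerEntry append(String action, Integer causedByEntryId) {
        String previousHash = entries.isEmpty() ? "" : entries.get(entries.size() - 1).hash;
        int id = entries.size();
        String hash = sha256(previousHash + "|" + id + "|" + action + "|" + causedByEntryId);
        LedgerEntry entry = new LedgerEntry(id, action, causedByEntryId, previousHash, hash);
        entries.add(entry);
        return entry;
    }

    // Compensation never edits the original: it appends a causally linked entry.
    LedgerEntry compensate(int originalEntryId, String reason) {
        if (originalEntryId < 0 || originalEntryId >= entries.size()) {
            throw new IllegalArgumentException("Unknown entry: " + originalEntryId);
        }
        return append("COMPENSATION: " + reason, originalEntryId);
    }

    // Recomputes every hash to check that the chain is intact.
    boolean verify() {
        String previous = "";
        for (LedgerEntry e : entries) {
            String expected = sha256(previous + "|" + e.id + "|" + e.action + "|" + e.causedByEntryId);
            if (!e.previousHash.equals(previous) || !e.hash.equals(expected)) {
                return false;
            }
            previous = e.hash;
        }
        return true;
    }

    int size() {
        return entries.size();
    }

    static String sha256(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 unavailable", ex);
        }
    }
}
```

The compensating actor, timestamp, and regulatory basis listed above would be additional fields folded into the hash input in the same way.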

casehub-qhorus

When a case compensates and an agent was involved:

  • A FAILURE or HANDOFF message is sent on the relevant channel to notify the agent their completed work has been compensated
  • Qhorus Commitment lifecycle: a FULFILLED commitment can transition to a compensation state (not currently modelled — this is a gap in the Qhorus normative layer)
  • Speech act implications: compensation involves a HANDOFF or FAILURE after DONE — the normative layer needs a compensation speech act

casehub-connectors

External notification of compensation:

  • Email/Slack/Teams notification to affected parties when compensation is triggered
  • WebHook outbound to external systems if they need to be notified of the reversal
  • Follows the same casehub-work-notifications → casehub-connectors delegation pattern

The Two Compensation Modes

Automated compensation — the system undoes the effect without human involvement. An agent's output is rolled back programmatically. The compensating binding calls a capability that reverses the action.

Human-driven compensation — a new WorkItem is created asking a human (typically the original actor or a reviewer) to reverse their prior decision. This WorkItem has status COMPENSATING until complete, then COMPENSATED. The original WorkItem transitions to COMPENSATED when the new task completes.

Both modes must be supported. The trigger is always the engine (compensate event on a binding or explicit operator action), but the execution varies.
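
One way to picture the split between the two modes is a single trigger that either runs a reversal immediately or hands back the ID of a newly created compensating task. This is only a sketch: CompensationMode, CompensationState, and the dispatcher class are invented names, the maps stand in for real persistence, and only the original WorkItem's state is tracked.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical names for the two execution modes.
enum CompensationMode { AUTOMATED, HUMAN_DRIVEN }

// Hypothetical subset of states, tracked for the original WorkItem only.
enum CompensationState { COMPLETED, COMPENSATING, COMPENSATED }

final class CompensationDispatcherSketch {
    // original WorkItem id -> current state
    final Map<UUID, CompensationState> originals = new HashMap<>();
    // compensating task id -> original WorkItem id (the compensatesWorkItemId link)
    final Map<UUID, UUID> compensatingTasks = new HashMap<>();

    // Returns the compensating task id for HUMAN_DRIVEN, empty for AUTOMATED.
    Optional<UUID> trigger(UUID originalId, CompensationMode mode, Runnable reversal) {
        if (originals.get(originalId) != CompensationState.COMPLETED) {
            throw new IllegalStateException("Only COMPLETED work can be compensated");
        }
        originals.put(originalId, CompensationState.COMPENSATING);
        if (mode == CompensationMode.AUTOMATED) {
            // The system reverses the effect itself; no human task is created.
            reversal.run();
            originals.put(originalId, CompensationState.COMPENSATED);
            return Optional.empty();
        }
        // Human-driven: create a linked task; the original stays COMPENSATING.
        UUID taskId = UUID.randomUUID();
        compensatingTasks.put(taskId, originalId);
        return Optional.of(taskId);
    }

    // Completing the human task moves the original to COMPENSATED.
    void completeTask(UUID taskId) {
        UUID originalId = compensatingTasks.remove(taskId);
        if (originalId == null) {
            throw new IllegalArgumentException("Unknown compensating task: " + taskId);
        }
        originals.put(originalId, CompensationState.COMPENSATED);
    }
}
```

The trigger path is identical for both modes, which matches the point that the engine always triggers while the execution varies.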


Choreography vs Orchestration

Orchestrated saga (preferred for casehub): The engine (casehub-engine) drives the compensation sequence. It knows which steps completed, in what order, and coordinates compensating steps in reverse. This is consistent with casehub-engine's role as the orchestration layer.

Choreographic saga (fallback): Each service independently compensates when it receives a compensation CDI event. Lower coordination overhead, less deterministic. Appropriate for loosely coupled scenarios where the engine is not present.

casehub should support both — orchestrated when the engine is present, choreographic via CDI events when running casehub-work standalone.


Relationship to Existing Mechanisms

  • ProvenanceLink (#39, PROV-O causal graph across WorkItems, cases, and agent activities): causedByEntryId in the ledger is the natural hook for compensation causal chains. Saga compensation is a specific use of PROV-O's wasInvalidatedBy relationship.
  • PROV-O: wasInvalidatedBy links an Entity (here, the output of the original activity) to the Activity that invalidated it (the CompensationActivity), which maps directly onto casehub's ledger compensation entries
  • EventLog (casehub-engine) — already tracks WORK_SUBMITTED, WORK_COMPLETED. Needs COMPENSATION_STARTED, COMPENSATION_COMPLETED, COMPENSATION_FAULTED.
  • casehub-work-ledger: WorkItemLedgerEntry needs a COMPENSATION action type alongside CREATED, CLAIMED, COMPLETED, etc.
  • Structured progress (idea: structured progress — schema-validated, hierarchical, forward-only progress tracking #237) — a ProgressInstance for a compensating task would track "undo progress" separately from forward progress. Compensation is a distinct progress dimension.

Open Questions (not for now)

  • What triggers compensation? Error event only, or also explicit operator action?
  • Can compensation itself fail? What happens to a COMPENSATING case that cannot complete compensation (e.g., the compensating human task is rejected)?
  • Should compensation be partial? Can you compensate step 5 without compensating steps 1-4?
  • How does compensation interact with SUSPENDED cases?
  • For the ledger: should compensation entries be their own LedgerEntry subtype or a ComplianceSupplement on the original?
  • Cross-tenant compensation: can a compensation trigger in tenant A affect a WorkItem in tenant B?
  • Time limits on compensation: is there a window after which a completed WorkItem can no longer be compensated?
  • Agent compensation: if an AI agent's output has been used downstream (e.g., the agent's recommendation influenced subsequent steps), can compensation un-apply that influence?

Connection to WorkItem Lifecycle Gaps

This issue arose from the WorkItem lifecycle alignment discussion (comparing casehub-work to WS-HumanTask 1.1, OpenHumanTask, and BPMN 2.0). COMPENSATING/COMPENSATED are the most significant missing states from the BPMN perspective.

