idea: saga compensation support across the casehub platform #238

## What This Is

An ideas-capture issue. Nothing here is a spec. These are concepts surfaced during discussion of WorkItem lifecycle alignment with WS-HumanTask, BPMN 2.0, and CMMN specs. Saga compensation is a platform-wide concern — this issue is filed in casehub-work as a starting point but the design will span casehub-engine, casehub-ledger, casehub-qhorus, and casehub-connectors.

---

## The Problem

casehub has no compensation model. When a case fails mid-execution, completed steps cannot be undone in a structured, auditable way. Current workarounds:

- Create a new WorkItem to manually undo previous work (loses the causal link)
- CANCEL the case (loses the semantic distinction between stopping and compensating)
- Leave completed steps as-is and note the failure in audit (leaves the system in an inconsistent state for the domain)

This is the Saga pattern gap. A saga is a sequence of steps where each step has a corresponding compensating transaction that undoes its effect if a later step fails or if an operator triggers rollback.
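
As a rough illustration of the pattern (not casehub code), the sketch below pairs each step with a compensation and, when a step fails, runs the compensations of the already completed steps in reverse order. `SagaSketch`, `Step`, and `run` are hypothetical names introduced only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration of the saga pattern; not part of any casehub module. */
final class SagaSketch {

    /** One saga step: a forward action plus the compensation that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    /**
     * Runs the steps in order. If a step throws, the compensations of the
     * steps that already completed are run in reverse order of completion.
     * Returns true if every step completed, false if the saga was compensated.
     */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // most recently completed step sits on top
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failing step itself is not compensated here because it never completed. The sketch also assumes compensations cannot fail, which is one of the open questions below.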

---

## Why This Matters for casehub

casehub is positioned for regulated, compliance-first deployments (EU AI Act Art.12, GDPR Art.17/22). In these contexts:

- A clinical trial modification approved by an IRB WorkItem may need compensating if a downstream protocol constraint is violated
- A financial decision completed by an agent may need compensating if fraud is detected post-completion
- A multi-step onboarding process may need unwinding if a KYC step fails at step 7 of 10

Compensation is not optional in these domains — it is a compliance requirement. The ledger must capture both the original action and the compensation as immutable, causally-linked entries.

---

## BPMN 2.0 Reference

BPMN defines compensation as first-class:
- **Compensating** — a completed UserTask being undone
- **Compensated** — the compensation has been applied
- **Compensation boundary event** — attached to an activity, fires when a compensate event is thrown
- **Compensate intermediate throwing event** — triggers compensation of a specific activity or all activities in a scope

casehub-work currently has no equivalent states. WorkItemStatus needs COMPENSATING and COMPENSATED.

---

## Scope Across Repos

### casehub-work

WorkItemStatus gains:
- `COMPENSATING` — a completed WorkItem is having its effects undone (a new compensating task is in progress or the original actor is reversing their decision)
- `COMPENSATED` — compensation is complete; the original action has been reversed

New service method: `compensate(UUID id, String triggeredBy, String reason)`
- Only callable from COMPLETED
- Creates a compensation audit entry
- Fires `WorkItemLifecycleEvent(COMPENSATING)` and eventually `(COMPENSATED)`
- The compensation may be automated (system undoes) or require a new human task (actor reverses their own decision)

The compensating task (if human-driven) is a new WorkItem linked to the original via a `compensatesWorkItemId` reference. This preserves the causal chain.
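
A minimal sketch of how these transitions might fit together, assuming an in-memory store and the human-driven path only. Everything except the `compensate(UUID, String, String)` signature, the status names `COMPLETED`/`COMPENSATING`/`COMPENSATED`, and `compensatesWorkItemId` is hypothetical: the `PENDING` status, the return value, the `completeCompensation` method, and the audit strings are placeholders, not the casehub-work API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of the proposed WorkItem compensation transitions. */
final class WorkItemCompensationSketch {

    // Reduced status set for illustration; PENDING is a stand-in for "not yet done".
    enum Status { PENDING, COMPLETED, COMPENSATING, COMPENSATED }

    static final class WorkItem {
        final UUID id = UUID.randomUUID();
        Status status;
        UUID compensatesWorkItemId; // set only on a compensating task

        WorkItem(Status status) { this.status = status; }
    }

    private final Map<UUID, WorkItem> items = new HashMap<>();
    final List<String> audit = new ArrayList<>();

    WorkItem add(Status status) {
        WorkItem item = new WorkItem(status);
        items.put(item.id, item);
        return item;
    }

    /** Only callable from COMPLETED; returns the linked compensating task (an assumption). */
    WorkItem compensate(UUID id, String triggeredBy, String reason) {
        WorkItem original = items.get(id);
        if (original == null || original.status != Status.COMPLETED) {
            throw new IllegalStateException("compensate is only callable from COMPLETED");
        }
        original.status = Status.COMPENSATING;
        audit.add("COMPENSATING " + id + " by " + triggeredBy + ": " + reason);
        WorkItem task = add(Status.PENDING);
        task.compensatesWorkItemId = id; // preserves the causal link to the original
        return task;
    }

    /** Called when the compensating task finishes; closes out the original. */
    void completeCompensation(WorkItem task) {
        WorkItem original = items.get(task.compensatesWorkItemId);
        if (original == null || original.status != Status.COMPENSATING) {
            throw new IllegalArgumentException("not an open compensating task");
        }
        task.status = Status.COMPLETED;
        original.status = Status.COMPENSATED;
        audit.add("COMPENSATED " + original.id);
    }
}
```

In this sketch the original WorkItem carries COMPENSATING/COMPENSATED and the compensating task goes through an ordinary lifecycle. That split is one possible reading of the proposal, not a settled design.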

### casehub-engine

- `CaseStatus` gains `COMPENSATING` — a case in COMPENSATING state executes compensating bindings
- Each `Binding` can optionally declare a `compensate:` binding (the step to run when this binding needs to be undone)
- On compensation trigger, the engine runs compensating bindings in reverse-completion order for all COMPLETED PlanItems
- `CasePlanModel` tracks completed PlanItems in order — the compensation sequence is deterministic
- `EventLog` gains `COMPENSATION_STARTED`, `COMPENSATION_COMPLETED`, and `COMPENSATION_FAULTED` event types
- Sub-cases propagate compensation: compensating a parent case compensates child cases recursively
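
The ordering rules above could be sketched as follows. The `PlanItem` and `CaseRun` records and their fields are hypothetical stand-ins, not the engine's model. The sketch also makes two assumptions the bullets do not settle: a plan item without a `compensate:` binding is skipped, and a child case is compensated before its parent plan item's own compensating binding runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of deriving a deterministic compensation sequence. */
final class CompensationPlanSketch {

    /**
     * A COMPLETED plan item. compensateBinding is null when no compensate
     * binding was declared; subCase is null unless the item ran a child case.
     */
    record PlanItem(String name, String compensateBinding, CaseRun subCase) {}

    /** Completed plan items of one case, listed in completion order. */
    record CaseRun(List<PlanItem> completedInOrder) {}

    /** Returns the compensating bindings to run, in reverse-completion order. */
    static List<String> compensationSequence(CaseRun run) {
        List<String> sequence = new ArrayList<>();
        List<PlanItem> items = run.completedInOrder();
        for (int i = items.size() - 1; i >= 0; i--) {
            PlanItem item = items.get(i);
            if (item.subCase() != null) {
                // Assumption: unwind the child case before the parent item itself.
                sequence.addAll(compensationSequence(item.subCase()));
            }
            if (item.compensateBinding() != null) {
                sequence.add(item.compensateBinding());
            }
        }
        return sequence;
    }
}
```

Because the sequence depends only on the recorded completion order, the same case history always yields the same compensation plan.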

### casehub-ledger

The ledger is immutable. Compensation does not delete or modify existing entries — it creates new forward entries with causal links:

- Compensation entry points back to original via `causedByEntryId` (already in the ledger's hash chain)
- New `CompensationEntry` type (or `ComplianceSupplement` extension) captures: original entry ID, compensating actor, reason, timestamp, regulatory basis (e.g., GDPR Art.17, EU AI Act Art.12)
- `LedgerErasureService` (GDPR Art.17) is distinct from compensation — erasure suppresses PII while compensation records the intent to reverse an action

The hash chain continues forward: compensation entries are part of the tamper-evident record, not exceptions to it.
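
To make the "forward entries only" idea concrete, here is a minimal append-only hash chain in which a compensation is just another entry that points back at the original. It is a sketch under stated assumptions, not the casehub-ledger implementation: the `Entry` layout, integer entry IDs, the `|`-separated hash input, and SHA-256 are choices made for this example. Only the `causedByEntryId` name comes from the text above.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only ledger; entries are never modified or deleted. */
final class LedgerSketch {

    /** causedByEntryId is null for entries with no causal parent. */
    record Entry(int id, String action, Integer causedByEntryId, String previousHash, String hash) {}

    private final List<Entry> entries = new ArrayList<>();

    /** Appends a new entry whose hash covers the previous entry's hash. */
    Entry append(String action, Integer causedByEntryId) {
        String previousHash = entries.isEmpty() ? "" : entries.get(entries.size() - 1).hash();
        int id = entries.size();
        Entry entry = new Entry(id, action, causedByEntryId, previousHash,
                hashOf(previousHash, id, action, causedByEntryId));
        entries.add(entry);
        return entry;
    }

    /** Recomputes the chain; returns false if any entry no longer matches its hash. */
    boolean verify() {
        String previousHash = "";
        for (Entry e : entries) {
            String expected = hashOf(previousHash, e.id(), e.action(), e.causedByEntryId());
            if (!e.previousHash().equals(previousHash) || !e.hash().equals(expected)) {
                return false;
            }
            previousHash = e.hash();
        }
        return true;
    }

    private static String hashOf(String previousHash, int id, String action, Integer causedBy) {
        String input = previousHash + "|" + id + "|" + action + "|" + causedBy;
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%064x", new BigInteger(1, digest)); // zero-padded hex
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Recording a compensation is then `ledger.append("COMPENSATION", original.id())`: the original entry stays untouched, and the causal link is covered by the new entry's hash.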

### casehub-qhorus

When a case compensates and an agent was involved:
- A FAILURE or HANDOFF message is sent on the relevant channel to notify the agent their completed work has been compensated
- Qhorus `Commitment` lifecycle: a FULFILLED commitment can transition to a compensation state (not currently modelled — this is a gap in the Qhorus normative layer)
- Speech act implications: compensation involves a HANDOFF or FAILURE after DONE — the normative layer needs a compensation speech act

### casehub-connectors

External notification of compensation:
- Email/Slack/Teams notification to affected parties when compensation is triggered
- WebHook outbound to external systems if they need to be notified of the reversal
- Follows the same `casehub-work-notifications` → `casehub-connectors` delegation pattern

---

## The Two Compensation Modes

**Automated compensation** — the system undoes the effect without human involvement. An agent's output is rolled back programmatically. The compensating binding calls a capability that reverses the action.

**Human-driven compensation** — a new WorkItem is created asking a human (typically the original actor or a reviewer) to reverse their prior decision. This WorkItem has status COMPENSATING until complete, then COMPENSATED. The original WorkItem transitions to COMPENSATED when the new task completes.

Both modes must be supported. The trigger is always the engine (compensate event on a binding or explicit operator action), but the execution varies.
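
One way to picture the split is a single trigger with mode-specific execution. This is a sketch only: `Mode`, `Outcome`, `CompensateBinding`, and `trigger` are hypothetical names, and a plain list of strings stands in for creating a real compensating WorkItem.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one trigger, two execution modes. */
final class CompensationModesSketch {

    enum Mode { AUTOMATED, HUMAN_DRIVEN }

    /** State of the original step right after the trigger returns. */
    enum Outcome { COMPENSATED, COMPENSATING }

    /** reversal is the capability call used in AUTOMATED mode; ignored otherwise. */
    record CompensateBinding(String step, Mode mode, Runnable reversal) {}

    /** Stand-in for WorkItems created for human-driven compensation. */
    final List<String> openHumanTasks = new ArrayList<>();

    Outcome trigger(CompensateBinding binding) {
        if (binding.mode() == Mode.AUTOMATED) {
            binding.reversal().run();   // system undoes the effect immediately
            return Outcome.COMPENSATED;
        }
        // Human-driven: compensation stays open until the new task completes.
        openHumanTasks.add("Reverse decision for " + binding.step());
        return Outcome.COMPENSATING;
    }
}
```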

---

## Choreography vs Orchestration

**Orchestrated saga** (preferred for casehub): The engine (casehub-engine) drives the compensation sequence. It knows which steps completed, in what order, and coordinates compensating steps in reverse. This is consistent with casehub-engine's role as the orchestration layer.

**Choreographic saga** (fallback): Each service independently compensates when it receives a compensation CDI event. Lower coordination overhead, less deterministic. Appropriate for loosely coupled scenarios where the engine is not present.

casehub should support both — orchestrated when the engine is present, choreographic via CDI events when running casehub-work standalone.

---

## Relationship to Existing Mechanisms

- **ProvenanceLink (#39)** — `causedByEntryId` in the ledger is the natural hook for compensation causal chains. Saga compensation is a specific use of PROV-O's `wasInvalidatedBy` relationship.
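- **PROV-O**: `prov:wasInvalidatedBy` relates an Entity to the Activity that invalidated it, so the direct mapping is `Entity wasInvalidatedBy CompensationActivity`, where the Entity is the recorded outcome of the original step and the Activity is the ledger's compensation entry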
- **EventLog (casehub-engine)** — already tracks WORK_SUBMITTED, WORK_COMPLETED. Needs COMPENSATION_STARTED, COMPENSATION_COMPLETED, COMPENSATION_FAULTED.
- **casehub-work-ledger** — `WorkItemLedgerEntry` needs a `COMPENSATION` action type alongside CREATED, CLAIMED, COMPLETED, etc.
- **Structured progress (#237)** — a ProgressInstance for a compensating task would track "undo progress" separately from forward progress. Compensation is a distinct progress dimension.

---

## Open Questions (not for now)

- What triggers compensation? Error event only, or also explicit operator action?
- Can compensation itself fail? What happens to a COMPENSATING case that cannot complete compensation (e.g., the compensating human task is rejected)?
- Should compensation be partial? Can you compensate step 5 without compensating steps 1-4?
- How does compensation interact with SUSPENDED cases?
- For the ledger: should compensation entries be their own LedgerEntry subtype or a ComplianceSupplement on the original?
- Cross-tenant compensation: can a compensation trigger in tenant A affect a WorkItem in tenant B?
- Time limits on compensation: is there a window after which a completed WorkItem can no longer be compensated?
- Agent compensation: if an AI agent's output has been used downstream (e.g., the agent's recommendation influenced subsequent steps), can compensation un-apply that influence?

---

## Connection to WorkItem Lifecycle Gaps

This issue arose from the WorkItem lifecycle alignment discussion (comparing casehub-work to WS-HumanTask 1.1, OpenHumanTask, and BPMN 2.0). COMPENSATING/COMPENSATED are the most significant missing states from the BPMN perspective.

See also:
- casehub-work#237 (structured progress — compensation progress tracking)
- casehub-work#235 (S/XS sweep — lifecycle enrichment in progress)
- casehub-engine#398 (HumanTask JVM restart durability)
- casehubio/parent ProvenanceLink tracking issue (#39 in this repo)
