Overview
Platform-level epic tracking all foundation and application work identified in the cross-repo coherence audit and Gastown gap analysis. Organised by phase: each phase has a gate condition that must be met before the next phase can begin. P0 and P1 address correctness and scale; P2 addresses production quality; P3 expands capability.
Effort: S = 2–5 days · M = 1 week · L = 2–3 weeks · XL = months. Complexity is rated separately as Low / Medium / High / Very High.
Priority within phase: items are listed in recommended execution order.
P0 — Wiring (do first — breaks immediately with multiple agents)
Gate: Normative layer functional end-to-end. Trust accumulates from real agent behaviour. Same actor identity across all repos.
| # | Item | Repo | Complexity | Effort | Issue | Untracked deps |
|---|------|------|------------|--------|-------|----------------|
| 1 | Commitment outcomes → LedgerAttestation (FULFILLED→SOUND, FAILED→FLAGGED) | quarkus-qhorus | Low | S | qhorus#123 | none |
| 2 | ActorTypeResolver utility: unified ActorType derivation across all consumers | quarkus-ledger | Low | S | ledger#47 | none |
| 3 | InstanceActorIdProvider SPI: map Qhorus instanceId → ledger actorId (persona format) | quarkus-qhorus | Medium | S | qhorus#124 | ledger#47 done first |
| 4 | Normative→prescriptive wiring: CaseHub work assignments send Qhorus COMMAND | casehub-engine | High | L | engine#186 | qhorus#123 done first |
Why this order: 1 and 2 are standalone with no inter-dependencies and highest leverage. 3 depends conceptually on the identity model from 2. 4 is last because it depends on the commitment lifecycle being functional (1) and is the most invasive change — it touches CaseContextChangedEventHandler, WorkOrchestrator, and requires a WorkerResponseHandler.
Risk in P0: Item 4 (engine#186) requires CaseLedgerEntry to be on main for the ledger side to work. If the branch merge (P1.1) is not done first, item 4 must be implemented without ledger integration and revisited later. Consider pulling P1.1 into P0 if the branch is close to mergeable.
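To make P0 item 1 concrete, the outcome-to-attestation mapping can be sketched as below. This is a minimal illustration, not the qhorus or ledger API: the type names `CommitmentOutcome`, `AttestationVerdict`, and `AttestationMapper` are hypothetical, as is the `PENDING` state. Only the two mappings FULFILLED→SOUND and FAILED→FLAGGED come from the roadmap.

```java
import java.util.Optional;

// Hypothetical enum; only FULFILLED and FAILED are named in the roadmap.
// PENDING stands in for any non-terminal commitment state.
enum CommitmentOutcome { FULFILLED, FAILED, PENDING }

// Hypothetical verdict type carrying the two attestation values from the roadmap.
enum AttestationVerdict { SOUND, FLAGGED }

final class AttestationMapper {
    private AttestationMapper() {}

    /** Returns the verdict for a terminal outcome, or empty while the commitment is still open. */
    static Optional<AttestationVerdict> toVerdict(CommitmentOutcome outcome) {
        switch (outcome) {
            case FULFILLED:
                return Optional.of(AttestationVerdict.SOUND);
            case FAILED:
                return Optional.of(AttestationVerdict.FLAGGED);
            default:
                return Optional.empty(); // assumption: open commitments write no attestation
        }
    }
}
```

Returning an empty `Optional` for open commitments keeps the ledger free of attestations that would later need correcting; whether that matches the intended design is for qhorus#123 to decide.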
P1 — Scale (breaks at 10+ concurrent cases/agents)
Gate: Can run 10+ simultaneous cases without manual intervention, API exhaustion, or stuck agents. Trust scores drive routing decisions.
| # | Item | Repo | Complexity | Effort | Issue | Notes |
|---|------|------|------------|--------|-------|-------|
| 1 | Merge CaseLedgerEntry branch (feat/casehub-ledger-integration) | casehub-engine | Medium | M | (not tracked) | Resolve merge conflicts; verify OTel propagation via @EntityListeners inheritance; add invariant test |
| 2 | Agent concurrency throttling: SpawnThrottle in ClaudonyConfig (global + per-case ceiling, back-pressure queue) | claudony | Medium | M | (not tracked) | No inter-dependencies; pure Claudony addition |
| 3 | RecoveryPolicy SPI: detect stalled workers and take action (REPROVISION / ESCALATE / CANCEL / WAIT) | casehub-engine + claudony | Medium | M | (not tracked) | SPI in engine api/spi/; ReprovisioningRecoveryPolicy in claudony-casehub |
| 4 | Trust routing wired: WorkerSelectionStrategy injectable in CaseContextChangedEventHandler + TrustWeightedSelectionStrategy | casehub-engine | Medium | M | (not tracked) | Depends on P0.1 (trust scores must be computed before routing on them) |
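One possible shape for the throttle in P1 item 2 is a pair of counting semaphores, one global and one per case. The sketch below is an assumption, not the Claudony design: the class name `SpawnThrottleSketch`, its constructor parameters, and its methods are all hypothetical. The blocking `acquire` plays the role of the back-pressure queue, since fair semaphores serve waiters in arrival order.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a global + per-case spawn ceiling.
final class SpawnThrottleSketch {
    private final Semaphore global;
    private final int perCaseCeiling;
    private final Map<String, Semaphore> perCase = new ConcurrentHashMap<>();

    SpawnThrottleSketch(int globalCeiling, int perCaseCeiling) {
        this.global = new Semaphore(globalCeiling, true); // fair: blocked callers are served FIFO
        this.perCaseCeiling = perCaseCeiling;
    }

    private Semaphore slotsFor(String caseId) {
        return perCase.computeIfAbsent(caseId, id -> new Semaphore(perCaseCeiling, true));
    }

    /** Non-blocking attempt; returns false if either ceiling is currently reached. */
    boolean tryAcquire(String caseId) {
        Semaphore caseSlots = slotsFor(caseId);
        if (!caseSlots.tryAcquire()) {
            return false;
        }
        if (!global.tryAcquire()) {
            caseSlots.release(); // roll back so the per-case count stays accurate
            return false;
        }
        return true;
    }

    /** Blocking variant: the caller waits until both a per-case and a global slot are free. */
    void acquire(String caseId) throws InterruptedException {
        Semaphore caseSlots = slotsFor(caseId);
        caseSlots.acquire();
        try {
            global.acquire();
        } catch (InterruptedException e) {
            caseSlots.release(); // do not leak the per-case slot on interruption
            throw e;
        }
    }

    /** Must be called exactly once per successful acquire, when the agent terminates. */
    void release(String caseId) {
        Semaphore caseSlots = perCase.get(caseId);
        if (caseSlots == null) {
            throw new IllegalStateException("release without acquire for case " + caseId);
        }
        global.release();
        caseSlots.release();
    }
}
```

Taking the per-case permit before the global one means a case that is already at its ceiling never consumes global capacity while it waits.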
Why this order: 1 unblocks the compliance story and should be done first. 2 and 3 are independent and can run in parallel. 4 depends on P0.1 being complete — routing by trust is pointless if trust scores are never updated from behaviour.
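As an illustration of P1 item 4, trust-weighted selection could be implemented as a weighted random draw over the candidate workers. The names `WorkerSelectionStrategy` and `TrustWeightedSelectionStrategy` come from the roadmap, but every signature below is an assumption, as are the `WorkerCandidate` type and the convention that trust scores are non-negative numbers.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.DoubleSupplier;

// Hypothetical candidate type; the real engine model may differ.
final class WorkerCandidate {
    final String workerId;
    final double trustScore; // assumed non-negative; negative values are treated as 0

    WorkerCandidate(String workerId, double trustScore) {
        this.workerId = workerId;
        this.trustScore = trustScore;
    }
}

// Interface name from the roadmap; the method signature is an assumption.
interface WorkerSelectionStrategy {
    Optional<WorkerCandidate> select(List<WorkerCandidate> candidates);
}

final class TrustWeightedSelectionStrategy implements WorkerSelectionStrategy {
    private final DoubleSupplier random; // must supply values in [0, 1)

    TrustWeightedSelectionStrategy(DoubleSupplier random) {
        this.random = random;
    }

    @Override
    public Optional<WorkerCandidate> select(List<WorkerCandidate> candidates) {
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        double total = 0;
        for (WorkerCandidate c : candidates) {
            total += Math.max(0, c.trustScore);
        }
        if (total == 0) {
            return Optional.of(candidates.get(0)); // no trust data yet: fall back to the first candidate
        }
        // Walk the cumulative distribution until the drawn point is covered.
        double target = random.getAsDouble() * total;
        double cumulative = 0;
        for (WorkerCandidate c : candidates) {
            cumulative += Math.max(0, c.trustScore);
            if (target < cumulative) {
                return Optional.of(c);
            }
        }
        return Optional.of(candidates.get(candidates.size() - 1)); // guards against rounding error
    }
}
```

Injecting the random source keeps the strategy deterministic under test. In production it could be wired with `Math::random` or `ThreadLocalRandom.current()::nextDouble`. A strict "highest trust wins" rule is simpler, but it would starve new agents that have not yet accumulated trust.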
Risk in P1: Item 3 (RecoveryPolicy) requires careful design — what constitutes "stalled" at the casehub-engine level vs the qhorus Watchdog level needs to be clearly defined to avoid double-recovery. The three tiers (qhorus Watchdog → casehub-engine WorkerStatusListener → claudony fleet health) need coordinated stall detection thresholds.
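The double-recovery risk can be made tangible with a small sketch of the SPI plus a threshold check. Only the name `RecoveryPolicy` and the four actions come from the roadmap. The method signature, the `ThresholdRecoveryPolicy` implementation, and the convention that each outer tier waits strictly longer than the tier inside it are assumptions, offered as one way the tiers might be coordinated.

```java
import java.time.Duration;

// The four actions listed in the roadmap.
enum RecoveryAction { REPROVISION, ESCALATE, CANCEL, WAIT }

// SPI name from the roadmap; this signature is hypothetical.
interface RecoveryPolicy {
    RecoveryAction onStalled(String workerId, Duration silentFor, int previousReprovisions);
}

// Hypothetical policy: wait below the threshold, reprovision a bounded number of times, then escalate.
final class ThresholdRecoveryPolicy implements RecoveryPolicy {
    private final Duration stallThreshold;
    private final int maxReprovisions;

    ThresholdRecoveryPolicy(Duration stallThreshold, int maxReprovisions) {
        this.stallThreshold = stallThreshold;
        this.maxReprovisions = maxReprovisions;
    }

    @Override
    public RecoveryAction onStalled(String workerId, Duration silentFor, int previousReprovisions) {
        if (silentFor.compareTo(stallThreshold) < 0) {
            return RecoveryAction.WAIT; // not yet considered stalled at this tier
        }
        if (previousReprovisions < maxReprovisions) {
            return RecoveryAction.REPROVISION;
        }
        return RecoveryAction.ESCALATE; // retries exhausted: hand over to a human or outer tier
    }
}

final class StallTiers {
    private StallTiers() {}

    /**
     * Assumed convention: qhorus Watchdog fires first, then the engine listener, then fleet health.
     * Strictly increasing thresholds give each inner tier time to act before the next one reacts.
     */
    static boolean areOrdered(Duration watchdog, Duration engine, Duration fleet) {
        return watchdog.compareTo(engine) < 0 && engine.compareTo(fleet) < 0;
    }
}
```

A startup check such as `StallTiers.areOrdered(...)` would catch misconfigured thresholds early, although it does not by itself stop two tiers from acting on the same stall; that still needs an explicit ownership rule.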
P2 — Production quality (full observability, audit trail, cross-deployment trust)
Gate: Full audit trail complete. Case spans correlatable in Jaeger/Grafana. Compliance story holds end-to-end.
| # | Item | Repo | Complexity | Effort | Issue | Notes |
|---|------|------|------------|--------|-------|-------|
| 1 | OTel trace alignment: PropagationContext.traceId from LedgerTraceIdProvider at case creation | casehub-engine | Low | S | engine#185 | One-line change + fallback; quick win |
| 2 | Cross-deployment trust federation: TrustExportService / TrustImportService SPIs | quarkus-ledger | Medium | L | (not tracked) | Canonical format design is the hard part; transport (webhook/Kafka) is pluggable |
| 3 | Cross-repo causal chain: causedByEntryId at provisioning; CaseLineageQuery JPA implementation | claudony | High | L | claudony#94 | CaseLineageQuery JPA is non-trivial; requires casehub datasource configured in claudony |
Why this order: 1 is a quick win with no dependencies. 2 and 3 are both high-value but complex — run in parallel if capacity allows. 3 depends on CaseLedgerEntry being merged (P1.1).
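The "one-line change + fallback" in P2 item 1 might look like the sketch below. The interface name `LedgerTraceIdProvider` appears in the roadmap, but its method `currentTraceId()` and the `TraceIdResolver` helper are hypothetical. The fallback generates a 32-character lowercase hex string, which matches the length and alphabet of a W3C trace-id.

```java
import java.util.Optional;
import java.util.UUID;

// Interface name from the roadmap; the method is an assumed shape.
interface LedgerTraceIdProvider {
    Optional<String> currentTraceId();
}

final class TraceIdResolver {
    private TraceIdResolver() {}

    /** Prefers the ledger's trace id; otherwise generates a fresh 32-hex-character id. */
    static String resolve(LedgerTraceIdProvider provider) {
        return provider.currentTraceId()
                .filter(id -> !id.trim().isEmpty()) // treat blank ids as missing
                .orElseGet(() -> UUID.randomUUID().toString().replace("-", ""));
    }
}
```

At case creation the result would be assigned to `PropagationContext.traceId`, so that ledger entries and case spans share one id whenever the ledger already has an active trace.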
P3 — Capability expansion (new capabilities on a solid foundation)
Gate: P0 and P1 complete. Foundation is solid. Team has capacity for new work.
| # | Item | Repo | Complexity | Effort | Issue | Notes |
|---|------|------|------------|--------|-------|-------|
| 1 | Notification consolidation: quarkus-work-notifications delegates Slack/Teams to casehub-connectors | quarkus-work + casehub-connectors | Medium | M | parent#5 | Unblocks P3.3 and P3.5 |
| 2 | SLA propagation: case budget bounds child WorkItem and Commitment deadlines | casehub-engine + quarkus-work | Medium | M | parent#6 | Adapter-level change; no foundation changes needed |
| 3 | Critical event notifications: stalled obligations, case faults, escalations → casehub-connectors | qhorus + engine + work + connectors | Medium | M | (not tracked) | Depends on P3.1 (unified delivery pipeline first) |
| 4 | Human-in-the-loop end-to-end via casehub-work-adapter: WorkItem COMPLETED → CaseHubReactor.signal() → case continues | casehub-engine + quarkus-work | High | L | (not tracked) | Most important HITL integration; currently blocked on engine stability |
| 5 | casehub-assisteddev: AI-assisted development application (merge queue, code review orchestration) | new repo | Very High | XL | (not tracked; needs its own epic) | Separate repo; uses foundation primitives; needs domain design first |
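For P3 item 2, "case budget bounds child deadlines" can be read as clamping every child deadline to the case deadline. The helper below is a sketch under that reading. The class `SlaBounds` and its method are hypothetical, and treating a missing child deadline as "inherit the case deadline" is an assumption.

```java
import java.time.Instant;

final class SlaBounds {
    private SlaBounds() {}

    /**
     * Returns the deadline a child WorkItem or Commitment should receive:
     * the requested deadline, but never later than the case deadline.
     */
    static Instant boundDeadline(Instant caseDeadline, Instant requestedChildDeadline) {
        if (requestedChildDeadline == null) {
            return caseDeadline; // assumption: children without a deadline inherit the case's
        }
        return requestedChildDeadline.isAfter(caseDeadline) ? caseDeadline : requestedChildDeadline;
    }
}
```

Applying this in the adapter layer fits the note that no foundation changes are needed, since neither the WorkItem nor the Commitment model has to know about case budgets.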
Hypothesis test (parallel track — not a blocker for P0-P3)
| # | Item | Repo | Complexity | Effort | Issue | Notes |
|---|------|------|------------|--------|-------|-------|
| n/a | Normative layer interoperability experiment: LangChain4j vs CaseHub on production incident scenario | casehub-engine | High | L | engine#189 | Can proceed once P0.1 (qhorus#123) is done; generates external evidence for normative layer claims |
Untracked issues to create (P1–P3)
The following items are specified in the roadmap but not yet tracked as GitHub issues:
| Item | Recommended repo | Notes |
|------|------------------|-------|
| Merge CaseLedgerEntry branch | casehubio/engine | May warrant a PR rather than an issue |
| Agent concurrency throttling (SpawnThrottle) | casehubio/claudony | |
| RecoveryPolicy SPI | casehubio/engine | |
| Trust routing wired (injectable WorkerSelectionStrategy) | casehubio/engine | |
| Cross-deployment trust federation | casehubio/quarkus-ledger | |
| Critical event notifications | casehubio/casehub-parent (cross-repo) | |
| HITL end-to-end (casehub-work-adapter completion) | casehubio/engine | |
| casehub-assisteddev | new repo | Needs its own epic |
Summary
| Phase | Items | Estimated total effort | Gate condition |
|-------|-------|------------------------|----------------|
| P0: Wiring | 4 items | ~3–4 weeks | Normative layer functional; trust accumulates |
| P1: Scale | 4 items | ~4–5 weeks | 10+ agents; no manual intervention needed |
| P2: Quality | 3 items | ~4–6 weeks | Full audit trail; Jaeger correlation |
| P3: Expand | 5 items | ~3 months + XL | New capabilities; casehub-assisteddev is a separate product epic |
Total to production-quality foundation: ~3–4 months of focused engineering (the P0–P2 estimates sum to roughly 11–15 weeks if the phases run back to back).
casehub-assisteddev is a separate product investment beyond the foundation.