Problem
ActorTrustScore has three score types: GLOBAL, CAPABILITY (per capability tag), and DIMENSION (per quality dimension). A LedgerAttestation can carry both capabilityTag and trustDimension simultaneously, but the two are stored in independent rows, so the composite is lost.
The gap: you cannot express per-capability quality. An agent who is meticulous at security review but sloppy at architecture review looks identical to one who is uniformly mediocre at both.
Concrete use case (from casehub-devtown)
An attestation records: capabilityTag="security-review", trustDimension="review-thoroughness", dimensionScore=0.92.
Today this creates/updates two independent rows:
CAPABILITY row: scope_key="security-review" — binary trust for security review work
DIMENSION row: scope_key="review-thoroughness" — thoroughness averaged across ALL review types
When devtown routes a high-stakes security PR, it wants to ask: "is agent-7 thorough specifically when doing security reviews?"
Today's answer: impossible — the ledger stores "thoroughness across all capabilities" and "binary trust for security-review" as independent signals. An agent who is thorough on security (0.92) but careless on architecture (0.31) has a blended DIMENSION thoroughness score that misrepresents both.
The correct answer requires a (security-review, review-thoroughness) composite score.
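To make the blending concrete, here is a minimal sketch using the two figures from the example. It assumes an unweighted mean over one observation per capability, purely for illustration; the real DIMENSION score is decay-weighted, so the exact number would differ. `BlendDemo` is a hypothetical name, not part of the ledger.

```java
import java.util.List;

/** Illustration only: a per-dimension average hides per-capability differences. */
final class BlendDemo {

    private BlendDemo() {}

    /** Plain unweighted mean (a simplification of the ledger's decay-weighted average). */
    static double mean(List<Double> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("at least one score is required");
        }
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.size();
    }
}
```

`BlendDemo.mean(List.of(0.92, 0.31))` comes out at roughly 0.615: too low to reflect the agent's security work and too high to reflect its architecture work.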
Proposed solution
Add ScoreType.CAPABILITY_DIMENSION, with one row per (actor, capability tag, dimension).
scope_key format for CAPABILITY_DIMENSION: "{capabilityTag}:{dimensionName}" — simple, human-readable, queryable with a LIKE 'security-review:%' pattern.
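One way the scope_key format could be built and parsed is sketched below. `CompositeScopeKey` and its methods are hypothetical helpers, and the sketch assumes capability tags never contain a colon; the issue does not state that, but the key would be ambiguous to parse otherwise.

```java
/** Hypothetical helper for the proposed "{capabilityTag}:{dimensionName}" scope_key format. */
final class CompositeScopeKey {
    private static final char SEPARATOR = ':';

    private CompositeScopeKey() {}

    /** Builds the composite key, e.g. "security-review:review-thoroughness". */
    static String of(String capabilityTag, String dimensionName) {
        if (capabilityTag == null || capabilityTag.isEmpty()
                || dimensionName == null || dimensionName.isEmpty()) {
            throw new IllegalArgumentException("capabilityTag and dimensionName are required");
        }
        // Assumption: tags never contain ':'; reject them so parse() stays unambiguous.
        if (capabilityTag.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("capabilityTag must not contain ':'");
        }
        return capabilityTag + SEPARATOR + dimensionName;
    }

    /** Splits at the first ':' into {capabilityTag, dimensionName}. */
    static String[] parse(String scopeKey) {
        int idx = scopeKey.indexOf(SEPARATOR);
        if (idx <= 0 || idx == scopeKey.length() - 1) {
            throw new IllegalArgumentException("not a composite scope_key: " + scopeKey);
        }
        return new String[] { scopeKey.substring(0, idx), scopeKey.substring(idx + 1) };
    }

    /** LIKE pattern matching every dimension of one capability, e.g. "security-review:%". */
    static String likePrefix(String capabilityTag) {
        return capabilityTag + SEPARATOR + "%";
    }
}
```

One caveat with the LIKE approach: `%` and `_` are wildcards in SQL LIKE, so a capability tag containing either character would need escaping before being used in the pattern.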
TrustScoreJob updates CAPABILITY_DIMENSION rows when an attestation carries both capabilityTag (non-GLOBAL) and trustDimension (non-null), in addition to the existing CAPABILITY and DIMENSION updates.
TrustGateService also gains an overload for the new score type.
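The update rule can be sketched as a function from an attestation's two fields to the rows it touches. This is not the actual TrustScoreJob code: `CompositeUpdateSketch`, `scopeKeysToUpdate`, and the `"*"` sentinel for a global capability tag are all assumptions, and GLOBAL row handling is left out because the issue does not describe it.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of which (score_type, scope_key) rows one attestation touches. */
final class CompositeUpdateSketch {
    enum ScoreType { GLOBAL, CAPABILITY, DIMENSION, CAPABILITY_DIMENSION }

    // Assumption: a sentinel marks attestations that are not tied to one capability.
    static final String GLOBAL_TAG = "*";

    private CompositeUpdateSketch() {}

    static Map<ScoreType, String> scopeKeysToUpdate(String capabilityTag, String trustDimension) {
        Map<ScoreType, String> targets = new EnumMap<>(ScoreType.class);
        boolean hasCapability = capabilityTag != null && !GLOBAL_TAG.equals(capabilityTag);
        boolean hasDimension = trustDimension != null;

        // Existing behaviour (assumed): independent CAPABILITY and DIMENSION rows.
        if (hasCapability) {
            targets.put(ScoreType.CAPABILITY, capabilityTag);
        }
        if (hasDimension) {
            targets.put(ScoreType.DIMENSION, trustDimension);
        }
        // Proposed addition: the composite row, only when both fields are present.
        if (hasCapability && hasDimension) {
            targets.put(ScoreType.CAPABILITY_DIMENSION, capabilityTag + ":" + trustDimension);
        }
        return targets;
    }
}
```

For the attestation in the use case, this yields three targets: the two existing rows plus `security-review:review-thoroughness`.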
Flyway migration
The new score_type enum value requires a migration. Existing rows are unaffected, because CAPABILITY_DIMENSION rows are purely additive. Suggested version: V1005, or a number from the consumer-owned range, per the numbering convention.
Impact
casehub-devtown routing policies can specify per-capability quality floors, not just binary trust thresholds
Example: "route security-review only to agents whose security-review thoroughness ≥ 0.75, not just global thoroughness"
Composable with existing CAPABILITY score for full picture: binary trust AND per-capability quality
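A routing floor like the one in the example could be checked along the following lines. This is a sketch against an in-memory map, not the TrustGateService API: `CompositeGateSketch`, `record`, and `meetsThreshold` are hypothetical names, `architecture-review` is an illustrative tag, and failing closed when no composite row exists is an assumption the issue does not settle.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical gate over CAPABILITY_DIMENSION scores, backed by an in-memory map. */
final class CompositeGateSketch {
    // Stand-in for the composite rows: actorId -> (scope_key -> score).
    private final Map<String, Map<String, Double>> compositeScores = new HashMap<>();

    /** Stores a composite score under "{capabilityTag}:{dimension}". */
    void record(String actorId, String capabilityTag, String dimension, double score) {
        compositeScores.computeIfAbsent(actorId, k -> new HashMap<>())
                .put(capabilityTag + ":" + dimension, score);
    }

    /** True only if a composite score exists for the actor and meets the floor. */
    boolean meetsThreshold(String actorId, String capabilityTag, String dimension, double floor) {
        Map<String, Double> rows = compositeScores.getOrDefault(actorId, Map.of());
        Double score = rows.get(capabilityTag + ":" + dimension);
        // Assumption: no composite history means the gate fails closed.
        return score != null && score >= floor;
    }
}
```

With agent-7 recorded at 0.92 for (security-review, review-thoroughness) and 0.31 for (architecture-review, review-thoroughness), a floor of 0.75 admits the agent for security reviews and rejects it for architecture reviews.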
Notes
Does not change the statistical model: CAPABILITY_DIMENSION uses the same decay-weighted average as DIMENSION
ADR recommended: document the deliberate choice of decay-weighted average for continuous scores vs Bayesian Beta for binary scores (currently implicit)
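For readers unfamiliar with the two models named above, the contrast can be sketched as follows. The issue does not give the ledger's formulas, so the exponential half-life decay and the uniform Beta(1, 1) prior used here are assumptions chosen for illustration, and `ScoreModelsSketch` is a hypothetical name.

```java
/** Illustrative versions of the two scoring models; parameters are assumptions. */
final class ScoreModelsSketch {

    private ScoreModelsSketch() {}

    /**
     * Decay-weighted average for continuous scores.
     * Each score is weighted by 0.5^(ageDays / halfLifeDays), so older scores count less.
     */
    static double decayWeightedAverage(double[] scores, double[] ageDays, double halfLifeDays) {
        if (scores.length != ageDays.length) {
            throw new IllegalArgumentException("scores and ageDays must have the same length");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            double weight = Math.pow(0.5, ageDays[i] / halfLifeDays);
            weightedSum += weight * scores[i];
            weightSum += weight;
        }
        // With no observations there is nothing to average; 0.0 is an arbitrary default here.
        return weightSum == 0.0 ? 0.0 : weightedSum / weightSum;
    }

    /** Posterior mean of Beta(1 + positives, 1 + negatives), for binary outcomes. */
    static double betaMean(int positives, int negatives) {
        return (1.0 + positives) / (2.0 + positives + negatives);
    }
}
```

The first function consumes graded values such as dimensionScore=0.92, while the second only counts successes and failures, which is why the two kinds of score are modelled differently.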
References
ActorTrustScore.java: ScoreType enum
LedgerAttestation.java: capabilityTag + trustDimension fields
TrustGateService.java: current threshold API
TrustScoreJob: where the new composite update logic goes