
feat: CAPABILITY_DIMENSION composite trust score — per-capability quality dimensions #76

@mdproctor


Problem

ActorTrustScore has three score types: GLOBAL, CAPABILITY (per capability tag), and DIMENSION (per quality dimension). A LedgerAttestation can carry both a capabilityTag and a trustDimension, but the two are scored in independent rows, so their combination is lost.

The gap: you cannot express per-capability quality. An agent who is meticulous at security review but sloppy at architecture review looks identical to one who is uniformly mediocre at both.

Concrete use case (from casehub-devtown)

An attestation records: capabilityTag="security-review", trustDimension="review-thoroughness", dimensionScore=0.92.

Today this creates/updates two independent rows:

  • CAPABILITY row: scope_key="security-review" — binary trust for security review work
  • DIMENSION row: scope_key="review-thoroughness" — thoroughness averaged across ALL review types

When devtown routes a high-stakes security PR, it wants to ask: "is agent-7 thorough specifically when doing security reviews?"

Today's answer: impossible. The ledger stores "thoroughness across all capabilities" and "binary trust for security-review" as independent signals. An agent who is thorough on security (0.92) but careless on architecture (0.31) ends up with a single blended DIMENSION thoroughness score (about 0.62 if the two attestations carry equal weight) that misrepresents both.

The correct answer requires a (security-review, review-thoroughness) composite score.
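To make the blending concrete, here is a minimal sketch that aggregates the two attestations from the example twice: once keyed by dimension alone (today's DIMENSION row) and once keyed by the proposed "{capabilityTag}:{dimensionName}" composite. `ScoredAttestation` and `BlendingExample` are hypothetical names, and a plain unweighted mean stands in for the ledger's decay-weighted average so the arithmetic stays visible.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified attestation carrying only the fields this example needs.
record ScoredAttestation(String capabilityTag, String trustDimension, double dimensionScore) {}

final class BlendingExample {
    private BlendingExample() {}

    // Unweighted mean per key. The real TrustScoreJob uses a decay-weighted
    // average; equal weights are assumed here for illustration only.
    static Map<String, Double> meanBy(List<ScoredAttestation> atts, boolean composite) {
        Map<String, double[]> acc = new HashMap<>(); // key -> {sum, count}
        for (ScoredAttestation a : atts) {
            String key = composite
                    ? a.capabilityTag() + ":" + a.trustDimension() // CAPABILITY_DIMENSION
                    : a.trustDimension();                          // DIMENSION
            double[] sumCount = acc.computeIfAbsent(key, k -> new double[2]);
            sumCount[0] += a.dimensionScore();
            sumCount[1] += 1;
        }
        Map<String, Double> out = new HashMap<>();
        acc.forEach((k, s) -> out.put(k, s[0] / s[1]));
        return out;
    }

    public static void main(String[] args) {
        List<ScoredAttestation> agent7 = List.of(
                new ScoredAttestation("security-review", "review-thoroughness", 0.92),
                new ScoredAttestation("architecture-review", "review-thoroughness", 0.31));
        System.out.println("DIMENSION rows:            " + meanBy(agent7, false));
        System.out.println("CAPABILITY_DIMENSION rows: " + meanBy(agent7, true));
    }
}
```

The DIMENSION view collapses both attestations into one value near 0.62, while the composite view keeps 0.92 for security-review and 0.31 for architecture-review.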

Proposed solution

Add ScoreType.CAPABILITY_DIMENSION — one row per (actor, capability tag, dimension):

```java
public enum ScoreType {
    GLOBAL,              // scope_key = null
    CAPABILITY,          // scope_key = "security-review"
    DIMENSION,           // scope_key = "review-thoroughness"
    CAPABILITY_DIMENSION // scope_key = "security-review:review-thoroughness"
}
```

scope_key format for CAPABILITY_DIMENSION: "{capabilityTag}:{dimensionName}". It is simple, human-readable, and queryable with a LIKE 'security-review:%' pattern, provided capability tags never contain a colon.
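A minimal sketch of building and parsing that key, assuming the colon separator proposed above. `ScopeKeys` and its methods are hypothetical helpers, not existing ledger code. The builder rejects capability tags containing ':' because such a tag would make the key ambiguous and let a `LIKE 'tag:%'` prefix match rows belonging to a different capability; dimension names may still contain colons, since parsing splits on the first one.

```java
import java.util.Objects;

// Hypothetical helper (not in the codebase) for the proposed
// "{capabilityTag}:{dimensionName}" CAPABILITY_DIMENSION scope_key.
final class ScopeKeys {
    private static final char SEP = ':';

    private ScopeKeys() {}

    static String capabilityDimension(String capabilityTag, String dimension) {
        Objects.requireNonNull(capabilityTag, "capabilityTag");
        Objects.requireNonNull(dimension, "dimension");
        // A colon in the tag would make the split ambiguous and break prefix queries.
        if (capabilityTag.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("capabilityTag must not contain ':' : " + capabilityTag);
        }
        return capabilityTag + SEP + dimension;
    }

    /** Splits a composite key on its FIRST colon into {capabilityTag, dimension}. */
    static String[] split(String scopeKey) {
        int i = scopeKey.indexOf(SEP);
        if (i <= 0 || i == scopeKey.length() - 1) {
            throw new IllegalArgumentException("not a CAPABILITY_DIMENSION key: " + scopeKey);
        }
        return new String[] { scopeKey.substring(0, i), scopeKey.substring(i + 1) };
    }

    /** LIKE pattern selecting every dimension of one capability, e.g. "security-review:%". */
    static String likePrefix(String capabilityTag) {
        return capabilityDimension(capabilityTag, "%");
    }
}
```

This sketch does not escape the SQL wildcards `%` and `_`; if capability tags may contain them, the LIKE pattern would need an escape clause.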

TrustScoreJob updates CAPABILITY_DIMENSION rows when an attestation carries both capabilityTag (non-GLOBAL) and trustDimension (non-null), in addition to the existing CAPABILITY and DIMENSION updates.
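The update rule can be sketched as a pure function from one attestation to the score rows it touches. This is a minimal sketch, not the actual TrustScoreJob: `TaggedAttestation`, `ScoreRowKey`, and `ScoreTargets` are hypothetical, the GLOBAL update is left out because this issue does not change it, and it assumes a null or blank capabilityTag represents the GLOBAL case (the real code may mark that differently).

```java
import java.util.ArrayList;
import java.util.List;

enum ScoreType { GLOBAL, CAPABILITY, DIMENSION, CAPABILITY_DIMENSION }

// Hypothetical shapes for illustration only.
record TaggedAttestation(String capabilityTag, String trustDimension) {}
record ScoreRowKey(ScoreType type, String scopeKey) {}

final class ScoreTargets {
    private ScoreTargets() {}

    // Assumption: a null or blank capabilityTag stands for the GLOBAL case.
    private static boolean hasCapability(TaggedAttestation a) {
        return a.capabilityTag() != null && !a.capabilityTag().isBlank();
    }

    private static boolean hasDimension(TaggedAttestation a) {
        return a.trustDimension() != null && !a.trustDimension().isBlank();
    }

    /** Non-GLOBAL rows one attestation should update. */
    static List<ScoreRowKey> targetsFor(TaggedAttestation a) {
        List<ScoreRowKey> rows = new ArrayList<>();
        if (hasCapability(a)) {
            rows.add(new ScoreRowKey(ScoreType.CAPABILITY, a.capabilityTag()));
        }
        if (hasDimension(a)) {
            rows.add(new ScoreRowKey(ScoreType.DIMENSION, a.trustDimension()));
        }
        // New behaviour: the composite row is updated only when BOTH are present,
        // in addition to (not instead of) the two rows above.
        if (hasCapability(a) && hasDimension(a)) {
            rows.add(new ScoreRowKey(ScoreType.CAPABILITY_DIMENSION,
                    a.capabilityTag() + ":" + a.trustDimension()));
        }
        return rows;
    }
}
```

For the devtown example, `targetsFor(new TaggedAttestation("security-review", "review-thoroughness"))` yields three keys: the existing CAPABILITY and DIMENSION rows plus the new "security-review:review-thoroughness" composite.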

TrustGateService gains a new method alongside the existing threshold check:

```java
// Existing: capability-scoped binary trust
boolean meetsThreshold(String actorId, String capabilityTag, double minTrust);

// New: capability+dimension composite quality score
OptionalDouble qualityScore(String actorId, String capabilityTag, String dimension);
```
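One way to read the proposed contract is shown in this in-memory sketch. `InMemoryTrustGate` and its `put*` methods are hypothetical stand-ins for the ActorTrustScore table, not the real service. Returning `OptionalDouble.empty()` when no CAPABILITY_DIMENSION row exists lets callers distinguish "no evidence yet" from "evidence of low quality"; the sketch also assumes `meetsThreshold` fails for an actor with no CAPABILITY row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical in-memory stand-in, just enough to exercise both methods.
final class InMemoryTrustGate {
    private final Map<String, Double> capabilityScores = new HashMap<>();
    private final Map<String, Double> compositeScores = new HashMap<>();

    private static String key(String actorId, String scopeKey) {
        return actorId + "|" + scopeKey;
    }

    void putCapability(String actorId, String capabilityTag, double score) {
        capabilityScores.put(key(actorId, capabilityTag), score);
    }

    void putComposite(String actorId, String capabilityTag, String dimension, double score) {
        compositeScores.put(key(actorId, capabilityTag + ":" + dimension), score);
    }

    // Existing-style check; assumes a missing CAPABILITY row fails the gate.
    boolean meetsThreshold(String actorId, String capabilityTag, double minTrust) {
        Double s = capabilityScores.get(key(actorId, capabilityTag));
        return s != null && s >= minTrust;
    }

    // Proposed method: empty when no CAPABILITY_DIMENSION row exists yet.
    OptionalDouble qualityScore(String actorId, String capabilityTag, String dimension) {
        Double s = compositeScores.get(key(actorId, capabilityTag + ":" + dimension));
        return s == null ? OptionalDouble.empty() : OptionalDouble.of(s);
    }
}
```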

Flyway migration

The new score_type enum value requires a Flyway migration. Existing rows are unaffected: CAPABILITY_DIMENSION rows are purely additive. Suggested version: V1005, or a number in the consumer-owned range, per the numbering convention.

Impact

  • casehub-devtown routing policies can specify per-capability quality floors, not just binary trust thresholds
  • Example: "route security-review only to agents whose security-review thoroughness ≥ 0.75, not just global thoroughness"
  • Composes with the existing CAPABILITY score for the full picture: binary trust AND per-capability quality
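A devtown-style routing rule that combines both signals might look like the sketch below. `TrustGate` restates only the two TrustGateService methods from this proposal so the example compiles on its own; `SecurityReviewPolicy` and its `MIN_TRUST` value of 0.6 are hypothetical, while the 0.75 thoroughness floor comes from the example above. Treating a missing composite score as ineligible is one conservative choice; a policy could instead fall back to the DIMENSION score.

```java
import java.util.OptionalDouble;

// The two TrustGateService methods from this proposal, as a minimal interface.
interface TrustGate {
    boolean meetsThreshold(String actorId, String capabilityTag, double minTrust);
    OptionalDouble qualityScore(String actorId, String capabilityTag, String dimension);
}

// Hypothetical routing policy: binary trust AND a per-capability quality floor.
final class SecurityReviewPolicy {
    static final String CAPABILITY = "security-review";
    static final String DIMENSION = "review-thoroughness";
    static final double MIN_TRUST = 0.6;          // illustrative value, not from the issue
    static final double MIN_THOROUGHNESS = 0.75;  // floor from the routing example

    private SecurityReviewPolicy() {}

    static boolean eligible(TrustGate gate, String actorId) {
        if (!gate.meetsThreshold(actorId, CAPABILITY, MIN_TRUST)) {
            return false; // fails the existing binary trust check
        }
        // No composite row yet means no evidence; treated as ineligible here.
        OptionalDouble q = gate.qualityScore(actorId, CAPABILITY, DIMENSION);
        return q.isPresent() && q.getAsDouble() >= MIN_THOROUGHNESS;
    }
}
```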

Notes

  • Tracked on the devtown side in devtown#19, "dependency: CAPABILITY_DIMENSION composite trust score (ledger#76)" (will be updated)
  • Does not change the statistical model — CAPABILITY_DIMENSION uses the same decay-weighted average as DIMENSION
  • ADR recommended: document the deliberate choice of decay-weighted average for continuous scores vs Bayesian Beta for binary scores (currently implicit)
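To illustrate the distinction the ADR would document, here is a sketch of both estimators side by side. It is illustrative only: this issue does not state the ledger's actual decay function or Beta priors, so exponential half-life decay and a caller-supplied Beta(alpha, beta) prior are assumptions, and `ScoreModels` is a hypothetical name.

```java
import java.util.List;

// Illustrative only: exponential half-life decay and the Beta prior are assumptions,
// not the ledger's confirmed parameters.
final class ScoreModels {
    record Observation(double value, double ageDays) {}

    private ScoreModels() {}

    // Weight halves every halfLifeDays.
    static double weight(double ageDays, double halfLifeDays) {
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    /** Continuous scores (DIMENSION, CAPABILITY_DIMENSION): decay-weighted mean. */
    static double decayWeightedAverage(List<Observation> obs, double halfLifeDays) {
        double num = 0, den = 0;
        for (Observation o : obs) {
            double w = weight(o.ageDays(), halfLifeDays);
            num += w * o.value();
            den += w;
        }
        return den == 0 ? Double.NaN : num / den; // NaN: no observations at all
    }

    /** Binary outcomes (e.g. CAPABILITY trust): posterior mean of Beta(alpha + s, beta + f). */
    static double betaPosteriorMean(int successes, int failures, double alpha, double beta) {
        return (alpha + successes) / (alpha + beta + successes + failures);
    }
}
```

With a 30-day half-life, a fresh 0.92 and a 30-day-old 0.31 average to (0.92 + 0.5 × 0.31) / 1.5 ≈ 0.717, whereas 8 successes and 2 failures under a Beta(1, 1) prior give 9 / 12 = 0.75.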

References

  • ActorTrustScore.java: ScoreType enum
  • LedgerAttestation.java: capabilityTag + trustDimension fields
  • TrustGateService.java: current threshold API
  • TrustScoreJob: where the new composite update logic goes

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
