Proposal: contributor trust for open source PR routing #24

@mdproctor

Contributor Trust for PR Routing

The problem: AI-generated PRs are flooding open source projects. Reviewers can't keep up. Manual triage doesn't scale.

The idea: Route PRs by contributor reputation — built automatically from outcomes, not assigned manually.


How it works

Contributors earn a trust score from PR history:

| Event | Score impact |
| --- | --- |
| PR merged, first submission | Rises |
| PR returned for rework | Drops |
| PR rejected | Drops faster |

Score feeds directly into queue priority:

| Queue | Who lands here |
| --- | --- |
| Fast-track | High-trust contributors |
| Borderline | Promising but limited history |
| Triage | New or low-trust accounts, ordered by file-path risk |
| Security | Any PR touching sensitive paths, regardless of trust |

No history = triage. Always. A slop generator that creates a new account every PR can never escape triage — their score never builds because their PRs don't merge.
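The queue table above can be read as a small decision function. The sketch below is only an illustration: the thresholds, the `PullRequest` fields and the integer `path_risk` are hypothetical names and values, not part of the platform.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the proposal leaves real values to each project.
FAST_TRACK_SCORE = 0.85
BORDERLINE_SCORE = 0.60


@dataclass
class PullRequest:
    author_score: float           # estimated success rate, 0.0 to 1.0
    author_observations: int      # number of PR outcomes behind that estimate
    path_risk: int                # hypothetical scale: 0 = docs, higher = riskier
    touches_sensitive_path: bool


def assign_queue(pr: PullRequest, min_observations: int = 5) -> str:
    # Sensitive paths go to the security queue regardless of trust.
    if pr.touches_sensitive_path:
        return "security"
    # No history, or too little of it, always means triage.
    if pr.author_observations < min_observations:
        return "triage"
    if pr.author_score >= FAST_TRACK_SCORE:
        return "fast-track"
    if pr.author_score >= BORDERLINE_SCORE:
        return "borderline"
    return "triage"


def order_triage(prs: list[PullRequest]) -> list[PullRequest]:
    # Lowest file-path risk first, so low-risk PRs surface at the top.
    return sorted(prs, key=lambda pr: pr.path_risk)
```

A brand-new account has zero observations, so it cannot reach any branch below the triage check, whatever score it is assigned.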


The scoring model

  • Uses Bayesian Beta — the same model used in medical trial analysis and A/B testing
  • Tracks confidence alongside success rate — 2 PRs merged ≠ 200 PRs merged
  • Requires a configurable minimum number of observations before acting on a score
  • Bootstrapped from existing history on day one — GitHub's API gives us years of PR data before the system goes live

Vouching

Genuine newcomers have no history. Vouching solves this.

  • A high-trust contributor sponsors a newcomer → newcomer gets a score boost
  • The voucher's own score is at risk if the vouchee's PRs get returned
  • Uses EigenTrust (same family as PageRank) — a vouch from a 10-year committer carries more weight than one from a recent joiner
  • Can only vouch for contributors with lower trust than you — no inflation rings

How it fits DevTown

DevTown already routes PRs to the right reviewer by trust. This adds the same model to the intake side.

PR arrives → contributor trust → queue priority
                                       ↓
                            reviewer routing (existing)
                                       ↓
                            outcomes update both scores

One trust ledger. One pipeline. Not two apps.

Internal teams: set minimum observations to zero — everyone pre-trusted. System operates as reviewer routing only.
Open source: both sides active. Full pipeline.


What's already built

| Capability | Status |
| --- | --- |
| Bayesian Beta scoring model | ✅ In platform |
| EigenTrust propagation | ✅ In platform |
| Work queues + labels + SLA tracking | ✅ In platform |
| Cryptographic audit trail | ✅ In platform |
| GitHub event integration | 🔲 To build |
| History mining (bootstrap) | 🔲 To build |
| Vouching UI | 🔲 To build |
| Contributor score visibility | 🔲 To build |

Open questions

  • Score scope: per-project, per-org, or portable across projects?
  • Vouching: maintainers only, or any trusted contributor?
  • Score decay: does dormancy reduce a score over time?
  • Transparency: do contributors see their score? Can they dispute events?

📄 Full specification (deeper dive)

The problem

Open source maintainers are overwhelmed. GitHub makes submission free, and AI coding assistants have made it trivial to generate plausible-looking but low-quality PRs at scale. Projects like Quarkus are seeing a flood of submissions that require more reviewer time to reject than the code took to generate.

The standard fixes don't hold up:

  • Rate limiting — punishes prolific genuine contributors
  • Minimum account age — gamed in a day
  • AI detection — an arms race with no sustainable winning side
  • Stricter guidelines — read by humans, ignored by slop generators

The underlying problem isn't that we can't identify AI slop. It's that a first-time human contributor and a fresh slop account look identical — no history.


The proposal

Route PRs based on accumulated contributor reputation. Not blocking who submits, but controlling what gets reviewed first.

A contributor starts with no score. Outcomes build it:

  • PR merges cleanly on first submission → score rises
  • PR returned for rework → score drops
  • PR rejected outright → score drops faster

High-reputation contributors get fast-tracked. New or low-reputation contributors land in a triage queue. Nothing is blocked — the queue just has a different priority.

The key property: slop generators can't cheaply game this. A new GitHub account means zero score, which means triage. To build reputation, PRs have to actually get merged. The score is the barrier.
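One way to express "drops" versus "drops faster" is to give each outcome a different evidence weight. The sketch below assumes the score is kept as two pseudo-counts (good and bad evidence), in line with the Beta model this proposal relies on; the weights, outcome names and `Trust` class are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights: a rejection counts as more negative evidence than a rework.
REWORK_WEIGHT = 1.0
REJECT_WEIGHT = 2.0


@dataclass
class Trust:
    alpha: float = 1.0  # pseudo-count of good outcomes (uniform prior assumed)
    beta: float = 1.0   # pseudo-count of bad outcomes

    @property
    def score(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def record_outcome(trust: Trust, outcome: str) -> Trust:
    if outcome == "merged_first_submission":
        trust.alpha += 1.0
    elif outcome == "returned_for_rework":
        trust.beta += REWORK_WEIGHT
    elif outcome == "rejected":
        trust.beta += REJECT_WEIGHT
    else:
        raise ValueError(f"unknown outcome: {outcome}")
    return trust
```

Starting from the same prior, a single rejection leaves a lower score than a single rework, and only merges move the score up.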


How the score works: the Bayesian Beta model

A naive approach counts merged PRs divided by submitted PRs. The problem: 2 merged from 2 submitted gives the same percentage as 200 merged from 200 — but these are not equivalent. One might be luck.

The platform uses a Bayesian Beta model — the same probabilistic framework used in medical trial analysis and A/B testing — which tracks two things simultaneously: the estimated success rate and the confidence in that estimate. With 2 data points, the confidence is low. With 200, the estimate is reliable. The model knows the difference.

Before seeing any evidence, every contributor starts with a prior assumption — roughly "unknown." Each PR outcome updates that assumption. A merge shifts the distribution toward "reliable"; a rejection shifts it toward "unreliable." A small number of outcomes produces a wide, uncertain distribution. The model won't act confidently on thin evidence.
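For reference, this is the standard Beta update that the description above corresponds to. The symbols are notation introduced here, not taken from the platform: \alpha_0 and \beta_0 are the prior parameters, s is the number of good outcomes and f the number of bad ones.

```latex
\begin{aligned}
\text{prior:}\quad & \theta \sim \mathrm{Beta}(\alpha_0, \beta_0) \\
\text{posterior:}\quad & \theta \mid s, f \sim \mathrm{Beta}(\alpha_0 + s,\; \beta_0 + f) \\
\mathbb{E}[\theta \mid s, f] &= \frac{\alpha_0 + s}{\alpha_0 + \beta_0 + s + f} \\
\mathrm{Var}[\theta \mid s, f] &= \frac{(\alpha_0 + s)(\beta_0 + f)}{(\alpha_0 + \beta_0 + s + f)^2\,(\alpha_0 + \beta_0 + s + f + 1)}
\end{aligned}
```

Assuming a uniform prior (\alpha_0 = \beta_0 = 1), 2 merges out of 2 give a mean of 0.75 with a standard deviation of about 0.19, while 200 out of 200 give a mean of about 0.995 with a standard deviation of about 0.005. Same raw percentage, very different certainty.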

The practical consequence: an account with 1 merged PR does not jump the queue. The platform requires a minimum number of observations before the score carries weight — configurable per project. A small library might set this at 5; a large security-critical project at 20.

This is also why the score doesn't just track acceptance rate; it tracks confidence in that acceptance rate. A contributor at 80% from 5 PRs gets treated more cautiously than one at roughly 75% from 50. The raw percentage would rank them incorrectly. The Bayesian model ranks them correctly.
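The comparison can be checked numerically. This is a minimal sketch assuming a uniform Beta(1, 1) prior and a simple "mean minus one standard deviation" cautious score; both choices are illustrative and not necessarily what the platform does. Since 75% of 50 is not a whole number of PRs, the example uses 38 merged out of 50.

```python
import math


def posterior(successes: int, failures: int,
              prior_alpha: float = 1.0, prior_beta: float = 1.0):
    """Return (mean, std) of the Beta posterior after the given outcomes."""
    a = prior_alpha + successes
    b = prior_beta + failures
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def cautious_score(successes: int, failures: int, z: float = 1.0) -> float:
    """Hypothetical ranking key: posterior mean minus z standard deviations."""
    mean, std = posterior(successes, failures)
    return mean - z * std


small_history = cautious_score(4, 1)    # 80% raw rate from 5 PRs
large_history = cautious_score(38, 12)  # 76% raw rate from 50 PRs
```

The raw rate favours the 5-PR contributor, but both the posterior mean and the cautious score favour the 50-PR contributor, because the estimate behind it is much narrower.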


Bootstrapping from history

Most open source projects don't need to wait months for the system to accumulate evidence. The evidence already exists.

GitHub's API exposes the full PR history for any public repository. Before deploying, you mine that history: for every contributor account, count submissions, merges, rejections, and return-for-rework cycles going back years. Feed those outcomes into the Bayesian model. On day one, established contributors arrive with scores that reflect their actual track record — not a blank slate.
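As a rough illustration of the tallying step, the sketch below works on PR records that have already been fetched. It assumes each record is a dict shaped like a GitHub REST pull request object (`user.login`, `state`, `merged_at`); the `changes_requested` flag is a hypothetical field that would have to be derived separately from review data. Treating every closed-but-unmerged PR as a rejection is also a simplification, since authors sometimes close their own PRs.

```python
from collections import defaultdict


def mine_history(pull_requests):
    """Tally outcomes per contributor from already-fetched PR records."""
    tallies = defaultdict(lambda: {"merged_clean": 0, "reworked": 0, "rejected": 0})
    for pr in pull_requests:
        if pr["state"] != "closed":
            continue  # open PRs have no outcome yet
        login = pr["user"]["login"]
        if pr["merged_at"] is None:
            tallies[login]["rejected"] += 1
        elif pr.get("changes_requested", False):  # hypothetical derived flag
            tallies[login]["reworked"] += 1
        else:
            tallies[login]["merged_clean"] += 1
    return dict(tallies)
```

The resulting counts can then be fed into the scoring model as if the outcomes had been observed live.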

A project with years of PR history, such as Quarkus, can immediately differentiate between a contributor with 47 merged PRs over three years and an account created last week. New accounts start with no history, which is exactly right.

The history mining can go further: which PRs touched which file paths? A contributor with 30 merged PRs all touching documentation scores differently for a security-module change than one with 30 PRs spanning authentication, crypto, and networking. The platform can segment trust by domain if the project chooses.
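Domain-segmented trust could be as simple as keeping separate counts per area of the codebase. In this sketch the path prefixes, domain names and `DomainTrust` class are all hypothetical, and a uniform prior is assumed for each domain.

```python
from collections import defaultdict

# Hypothetical prefix-to-domain mapping; a real project would define its own.
DOMAIN_PREFIXES = {
    "docs/": "documentation",
    "security/": "security",
    "net/": "networking",
}


def domains_for(paths):
    """Return the set of domains touched by a list of file paths."""
    found = set()
    for path in paths:
        for prefix, domain in DOMAIN_PREFIXES.items():
            if path.startswith(prefix):
                found.add(domain)
    return found or {"general"}


class DomainTrust:
    def __init__(self):
        # Per-domain [alpha, beta] pseudo-counts, starting from a uniform prior.
        self.counts = defaultdict(lambda: [1.0, 1.0])

    def record(self, paths, merged: bool) -> None:
        for domain in domains_for(paths):
            self.counts[domain][0 if merged else 1] += 1.0

    def score(self, domain: str) -> float:
        alpha, beta = self.counts[domain]
        return alpha / (alpha + beta)
```

With this layout, 30 merged documentation PRs raise the documentation score only; the security score stays at its prior until the contributor has outcomes in that area.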


What the platform provides

Bayesian Beta scoring: already implemented in the platform's foundation, where it scores AI agent reliability. The Beta model is well established across decades of clinical trial design, A/B testing, and recommender systems. We're applying proven mathematics through new infrastructure, pointed at a new problem.

EigenTrust propagation: EigenTrust was published at WWW 2003, is widely used in peer-to-peer reputation systems, and belongs to the same family of ideas as Google's PageRank. When a maintainer with 8 years of commits vouches for a newcomer, that vouch carries more weight than one from someone who joined three months ago. Trust propagates through the vouching graph, attenuated at each step. Circular inflation rings collapse under the mathematics: a ring that receives no vouches from already-trusted contributors has no trust to circulate. This is also already implemented in the platform.
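To make the propagation concrete, here is a minimal power-iteration sketch in the spirit of EigenTrust. It is not the platform's implementation: the function name, the `pretrust_weight` value and the input shapes are assumptions, and it requires at least one pre-trusted peer.

```python
def eigentrust(vouches, pretrusted, pretrust_weight=0.15, iterations=50):
    """vouches: {voucher: {vouchee: weight}}; pretrusted: non-empty set of peers."""
    peers = set(vouches) | set(pretrusted)
    for targets in vouches.values():
        peers |= set(targets)
    # Pre-trust distribution: uniform over the pre-trusted peers.
    p = {peer: (1.0 / len(pretrusted) if peer in pretrusted else 0.0) for peer in peers}
    trust = dict(p)
    for _ in range(iterations):
        nxt = {peer: pretrust_weight * p[peer] for peer in peers}
        for src in peers:
            targets = vouches.get(src, {})
            total = sum(targets.values())
            if total > 0:
                # Pass trust along normalised vouch weights.
                for dst, weight in targets.items():
                    nxt[dst] += (1 - pretrust_weight) * trust[src] * weight / total
            else:
                # Peers that vouch for no one fall back to the pre-trust distribution.
                for peer in peers:
                    nxt[peer] += (1 - pretrust_weight) * trust[src] * p[peer]
        trust = nxt
    return trust


scores = eigentrust(
    vouches={"maria": {"newcomer": 1.0},
             "sybil_a": {"sybil_b": 1.0},
             "sybil_b": {"sybil_a": 1.0}},
    pretrusted={"maria"},
)
```

In this toy graph the two accounts that only vouch for each other end with zero trust, because nothing flows into the ring from the pre-trusted maintainer, while the vouched-for newcomer receives a share.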

Work queues and priority views: the trust score becomes a practical workflow tool through the platform's work queue layer. Queue labels can combine trust signals with static-analysis results. The queues also support SLA tracking and reviewer load balancing.

Cryptographic audit trail: every score update is recorded in a tamper-evident ledger. The records are cryptographically chained, so you can't retroactively alter a score history without breaking the chain. Article 12 of the EU AI Act requires high-risk AI systems to log events automatically for traceability; where obligations of that kind apply, the platform already has the record-keeping built in.
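The chaining idea can be illustrated with a few lines of hashing. This is a generic hash-chain sketch using SHA-256, not the platform's ledger format; the entry layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, event: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(ledger: list) -> bool:
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, editing any past score update changes its hash and invalidates everything recorded after it.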


Vouching: the bootstrapping problem

The scoring system handles established contributors and clear bad actors well. The gap is genuine newcomers — good contributors with no history.

Vouching solves this. A high-reputation contributor vouches for a newcomer, temporarily elevating their score. The voucher's own score is at risk: if the vouchee's PRs get returned, the voucher loses points too.

EigenTrust governs the weight of each vouch. A vouch from a 10-year core committer carries more than one from a contributor who merged their first PR last month. Constraints enforced by the system:

  • You can only vouch for contributors with lower reputation than you — no mutual inflation
  • Vouching capacity is limited — you can't sponsor hundreds of people simultaneously
  • The trust graph is visible — the chain of vouches is inspectable
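The vouching rules above could be enforced with checks along these lines. The capacity limit, the penalty size and the `Member` fields are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field

MAX_ACTIVE_VOUCHES = 3   # hypothetical capacity limit
STAKE_PENALTY = 0.02     # hypothetical score loss when a vouchee's PR is returned


@dataclass
class Member:
    login: str
    score: float
    active_vouches: set = field(default_factory=set)  # inspectable vouch edges


def can_vouch(voucher: Member, vouchee: Member):
    if voucher.login == vouchee.login:
        return False, "cannot vouch for yourself"
    if vouchee.score >= voucher.score:
        return False, "vouchee must have lower reputation than voucher"
    if len(voucher.active_vouches) >= MAX_ACTIVE_VOUCHES:
        return False, "vouching capacity exhausted"
    return True, "ok"


def vouch(voucher: Member, vouchee: Member) -> None:
    allowed, reason = can_vouch(voucher, vouchee)
    if not allowed:
        raise ValueError(reason)
    voucher.active_vouches.add(vouchee.login)


def on_vouchee_pr_returned(voucher: Member) -> None:
    # The voucher's own score is at risk when the vouchee's PR comes back.
    voucher.score = max(0.0, voucher.score - STAKE_PENALTY)
```

Requiring the vouchee to have strictly lower reputation means two contributors can never vouch for each other, which rules out the simplest mutual-inflation pattern.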

A day in the life

Maria is a Quarkus core maintainer. Monday morning, she opens her review dashboard.

Fast-track queue: 4 PRs. Contributors the system is confident in — combined, they have 300+ merged PRs and clean track records. She skims them efficiently. Three merge same-day. One needs a small change; she leaves a comment and it's back within the hour.

Borderline queue: 2 PRs. Contributors with enough history to be promising but not enough to be certain. She gives these a proper read. One is solid — she approves it and the contributor's score climbs. The other has an issue she flags; the contributor fixes it.

Triage queue: 31 PRs. The flood. But already ordered: PRs touching documentation at the top, PRs touching security or networking at the bottom. She works through the top 10, rejects 8 as slop, approves 2 from what look like genuine first-timers.

One of those newcomers messages her: they've been trying to contribute for weeks and don't understand the delay. She looks at their history: 3 merged PRs, clean, but the minimum observations threshold is 5. She vouches for them. They move to the borderline queue immediately.

By noon, Maria has processed 16 PRs (4 fast-track, 2 borderline, 10 from triage) without touching the riskier bottom of the triage queue. Before this system, she'd have spent the morning deciding which pile to ignore.


Identity

The weak point: reputation lives on GitHub account identity. A determined adversary creates new accounts.

But new accounts always start in triage. Building reputation requires PRs that actually get merged — real effort, even for adversaries. Vouching provides a social escape valve for genuine newcomers. The adversary who games this has to put in the work of a genuine contributor, at which point the incentive disappears.


What we'd still need to build

The scoring model, trust propagation, audit ledger, and work queue infrastructure are already in the platform. What's new:

  1. GitHub event integration — webhooks for PR lifecycle events feeding into contributor score updates
  2. History mining — a one-time import of existing PR history to bootstrap scores at deployment
  3. Routing logic — queue assignment at PR submission, driven by score and confidence threshold
  4. Vouching interface — a maintainer-facing tool for issuing vouches
  5. Contributor score visibility — so contributors understand their standing and can raise disputes
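For item 1, the integration largely comes down to mapping webhook deliveries to score events. The sketch below assumes GitHub's `pull_request` and `pull_request_review` webhook events and the payload fields shown; the outcome names and the function itself are hypothetical, and the exact payload shapes should be checked against GitHub's webhook documentation before relying on them.

```python
def classify_event(event_name: str, payload: dict):
    """Map a webhook delivery to a (login, outcome) pair, or None if irrelevant."""
    if event_name == "pull_request" and payload.get("action") == "closed":
        pr = payload["pull_request"]
        outcome = "merged" if pr.get("merged") else "closed_unmerged"
        return pr["user"]["login"], outcome
    if event_name == "pull_request_review" and payload.get("action") == "submitted":
        state = payload["review"].get("state", "").lower()
        if state == "changes_requested":
            return payload["pull_request"]["user"]["login"], "returned_for_rework"
    return None
```

Everything else (opened, labelled, commented) is ignored here; a fuller integration would also need to decide whether a merge that followed a rework request still counts as a clean merge.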


Labels: idea (Deferred idea worth revisiting)
