bledden commented on Jan 6, 2026

Summary

Adds detection of fundamental disagreements between models based on how they rank each other during Stage 2.

This helps distinguish between:

  • Wording differences: Models agree on substance but phrase things differently
  • Fundamental conflicts: Models genuinely disagree about response quality

Conflict Types

Mutual Opposition (High Severity)

Both models rank each other poorly while ranking themselves highly. This pattern strongly indicates fundamental disagreement rather than stylistic preferences.

Example: Model A ranks itself #1 and Model B #3, while Model B ranks itself #1 and Model A #3.

Ranking Swap (Medium Severity)

A large difference in how two models rank each other: one places the other high, while the other places it low.

Example: Model A ranks Model B #1, but Model B ranks Model A #4.
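
Written in the ranking-matrix shape the backend uses (ranker → ranked model → position), the two examples look roughly like this; the literal keys and model names are illustrative, not the PR's actual data:

```python
# Illustrative only: each ranker maps every model (including itself) to an
# ordinal position, where 1 is the best.
mutual_opposition = {
    "model_a": {"model_a": 1, "model_b": 3},  # A: itself #1, B #3
    "model_b": {"model_b": 1, "model_a": 3},  # B: itself #1, A #3
}

ranking_swap = {
    "model_a": {"model_b": 1},  # A places B at #1
    "model_b": {"model_a": 4},  # B places A at #4
}
```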

Changes

Backend (backend/council.py):

  • New detect_ranking_conflicts() function
  • Builds ranking matrix: ranker_rankings[ranker_model][ranked_model] = position
  • Detects mutual opposition when both models rank the other poorly while ranking themselves highly
  • Detects ranking swaps when the mutual position difference is ≥ (total_models - 1)
  • Returns: model_a, model_b, conflict_type, details, severity (see the sketch below)
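
A minimal Python sketch of this detection logic follows. The actual backend/council.py implementation is not shown in this PR, so the cutoffs for "highly"/"poorly" and the shape of the details field are assumptions:

```python
from itertools import combinations

def detect_ranking_conflicts(ranker_rankings):
    """Sketch of the conflict detector described above, not the shipped code.

    ranker_rankings[ranker][ranked] = ordinal position (1 = best).
    Assumes every model both ranks and is ranked, so the model count can be
    derived from the number of rankers.
    """
    severity_order = {"high": 0, "medium": 1, "low": 2}
    models = sorted(ranker_rankings)
    total_models = len(models)
    midpoint = total_models / 2  # "highly"/"poorly" cutoff is a guess
    conflicts = []

    for a, b in combinations(models, 2):
        a_on_b = ranker_rankings[a].get(b)
        b_on_a = ranker_rankings[b].get(a)
        a_self = ranker_rankings[a].get(a)
        b_self = ranker_rankings[b].get(b)
        if a_on_b is None or b_on_a is None:
            continue

        details = {"a_ranks_b": a_on_b, "b_ranks_a": b_on_a,
                   "a_self": a_self, "b_self": b_self}

        # Mutual opposition: each ranks the other in the bottom half while
        # placing itself in the top half.
        if (a_self is not None and b_self is not None
                and a_on_b > midpoint and b_on_a > midpoint
                and a_self <= midpoint and b_self <= midpoint):
            conflicts.append({"model_a": a, "model_b": b,
                              "conflict_type": "mutual_opposition",
                              "severity": "high", "details": details})
        # Ranking swap: mutual positions differ by at least total_models - 1.
        elif abs(a_on_b - b_on_a) >= total_models - 1:
            conflicts.append({"model_a": a, "model_b": b,
                              "conflict_type": "ranking_swap",
                              "severity": "medium", "details": details})

    # Highest-severity conflicts first.
    conflicts.sort(key=lambda c: severity_order[c["severity"]])
    return conflicts
```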

Frontend:

  • Stage2.jsx: Added rankingConflicts prop and display component
  • Stage2.css: Red-styled card (distinct from yellow minority opinions)
  • Severity-colored left border (high=red, medium=orange, low=yellow)
  • ChatInterface.jsx: Pass ranking_conflicts to Stage2

Validation

8 test cases in tests/test_ranking_conflicts.py:

| Test | Description |
| --- | --- |
| test_no_conflict_when_agreement | No false positives when models roughly agree |
| test_mutual_opposition_detected | Correctly detects mutual opposition pattern |
| test_ranking_swap_detected | Detects large position disagreements |
| test_empty_inputs | Handles edge cases gracefully |
| test_single_model_no_conflict | Single model has no conflicts |
| test_5_model_conflict_scenario | Realistic 5-model scenario |
| test_conflict_details_populated | Verifies detail fields are correct |
| test_severity_ordering | Conflicts sorted by severity |
$ python3 tests/test_ranking_conflicts.py
✓ No conflict when agreement - PASSED
✓ Mutual opposition detected - PASSED
✓ Ranking swap detected - PASSED
✓ Empty inputs - PASSED
✓ Single model no conflict - PASSED
  Found 3 conflicts in 5-model scenario
    gpt-4 vs grok: mutual_opposition (high)
    gpt-4 vs llama: mutual_opposition (high)
    grok vs llama: mutual_opposition (high)
✓ 5-model conflict scenario - PASSED
  Details: A ranks B=3, B ranks A=3
  Self ranks: A=1, B=1
✓ Conflict details populated - PASSED
✓ Severity ordering - PASSED

==================================================
All ranking conflict tests passed!
==================================================
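
For reference, a case like test_mutual_opposition_detected could be written roughly as follows against the sketch shown earlier; the real test file may construct its inputs differently, and the model names here are only illustrative:

```python
# Two models (gpt-4 and grok) each rank themselves #1 and the other #3,
# while claude's rankings are compatible with both. Only one conflict
# should be reported, and it should be the high-severity mutual opposition.
rankings = {
    "gpt-4":  {"gpt-4": 1, "grok": 3, "claude": 2},
    "grok":   {"grok": 1, "gpt-4": 3, "claude": 2},
    "claude": {"claude": 2, "gpt-4": 1, "grok": 3},
}

conflicts = detect_ranking_conflicts(rankings)
assert len(conflicts) == 1
assert conflicts[0]["conflict_type"] == "mutual_opposition"
assert conflicts[0]["severity"] == "high"
assert {conflicts[0]["model_a"], conflicts[0]["model_b"]} == {"gpt-4", "grok"}
```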

Dependencies

This PR builds on:

Test plan

  • Run python3 tests/test_ranking_conflicts.py - all 8 tests pass
  • Run python3 tests/test_minority_opinions.py - all 8 tests pass
  • Start the app and verify conflicts display correctly in Stage 2
  • Verify no conflicts shown when models have similar rankings

🤖 Generated with Claude Code

bledden and others added 4 commits January 5, 2026 18:26
Adds calculate_tournament_rankings() as an alternative to simple mean ranking.

Algorithm:
- Convert ordinal rankings to pairwise matchups
- For each pair of models, majority vote determines winner
- Ties awarded 0.5 points to each
- Final score = wins / total_matchups

Benefits over mean ranking:
- More robust to outlier rankings
- Theoretically principled (Condorcet-style)
- Handles cyclic preferences gracefully

Both ranking methods now included in metadata:
- aggregate_rankings: mean position (existing)
- tournament_rankings: pairwise win percentage (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
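
A minimal sketch of the pairwise algorithm that commit describes is below; the real calculate_tournament_rankings() signature and input shape are not shown here, so the names and structure are assumptions:

```python
from itertools import combinations

def calculate_tournament_rankings(all_rankings):
    """Sketch only: pairwise (Condorcet-style) win rate per model.

    all_rankings: list of {model: ordinal position} dicts, one per ranker,
    where 1 is the best position. Returns {model: fraction of matchups won}.
    """
    models = sorted({m for ranking in all_rankings for m in ranking})
    if len(models) < 2:
        return {m: 0.0 for m in models}

    wins = {m: 0.0 for m in models}
    # Each pair of models is one matchup, decided by majority vote across rankers.
    for a, b in combinations(models, 2):
        a_votes = sum(1 for r in all_rankings if a in r and b in r and r[a] < r[b])
        b_votes = sum(1 for r in all_rankings if a in r and b in r and r[b] < r[a])
        if a_votes > b_votes:
            wins[a] += 1.0
        elif b_votes > a_votes:
            wins[b] += 1.0
        else:  # tie on the majority vote: half a point to each side
            wins[a] += 0.5
            wins[b] += 0.5

    total_matchups = len(models) - 1  # every model faces each other model once
    return {m: wins[m] / total_matchups for m in models}
```
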
Documents the tournament-style pairwise comparison algorithm with:
- Explanation of why it's more robust than mean averaging
- Concrete example showing self-promotion bias scenario
- Tables comparing mean vs tournament results
- Outlier robustness validation (mean degrades 1.0→1.5, tournament stays 100%)
- Summary of validation test coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
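
One way the documented robustness numbers could arise (an illustrative scenario, not necessarily the exact example in the docs): with four rankers, three place a model at #1 and one outlier places it at #3, so its mean position degrades from 1.0 to 1.5, yet it still wins every pairwise matchup by majority vote and its tournament score stays at 100%.
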
Detects when ≥30% of rankers significantly disagree with the consensus
ranking for a model (placing it more than 1 position away from consensus).

Backend changes:
- Add detect_minority_opinions() function to council.py
- Uses tournament ranking as consensus baseline
- Reports dissent rate, positions, dissenters, and direction (overvalued/undervalued)
- Configurable threshold (default 30%) and position tolerance (default 1)
- Include minority_opinions in run_full_council metadata

Frontend changes:
- Add minorityOpinions prop to Stage2 component
- Display minority opinions in a warning-styled card
- Show direction badges (overvalued in red, undervalued in green)
- List consensus position, dissent positions, and dissenter models

Validation tests:
- 8 test cases covering consensus, dissent detection, direction,
  threshold filtering, tolerance, edge cases, and realistic scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
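
A minimal sketch of that detector, assuming rankings arrive as a ranker → positions map and the consensus position comes from the tournament ranking; the field names mirror the commit description, but everything else is a guess rather than the actual detect_minority_opinions() implementation:

```python
def detect_minority_opinions(ranker_rankings, consensus_positions,
                             threshold=0.3, tolerance=1):
    """Sketch only: flags models where a sizable minority of rankers disagrees
    with the consensus position by more than `tolerance` places.

    ranker_rankings: {ranker: {model: position}}, 1 = best.
    consensus_positions: {model: consensus position}, e.g. derived from the
    tournament ranking.
    """
    opinions = []
    for model, consensus in consensus_positions.items():
        dissent_positions, dissenters, total = [], [], 0
        for ranker, ranking in ranker_rankings.items():
            if model not in ranking:
                continue
            total += 1
            if abs(ranking[model] - consensus) > tolerance:
                dissent_positions.append(ranking[model])
                dissenters.append(ranker)
        if not total or not dissent_positions:
            continue
        dissent_rate = len(dissent_positions) / total
        if dissent_rate >= threshold:
            # Dissenters mostly placing the model lower (larger positions) than
            # the consensus suggests the consensus overvalues it, and vice versa.
            mean_dissent = sum(dissent_positions) / len(dissent_positions)
            direction = "overvalued" if mean_dissent > consensus else "undervalued"
            opinions.append({
                "model": model,
                "consensus_position": consensus,
                "dissent_rate": dissent_rate,
                "dissent_positions": dissent_positions,
                "dissenters": dissenters,
                "direction": direction,
            })
    return opinions
```
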
Detects fundamental disagreements between models based on how they rank
each other. Two types of conflicts are identified:

1. Mutual Opposition (high severity): Both models rank the other poorly
   while ranking themselves highly - indicates fundamental disagreement
   about response quality.

2. Ranking Swap (medium severity): Large position difference in how
   models rank each other - one places the other high, the other places
   them low.

Backend changes:
- Add detect_ranking_conflicts() function to council.py
- Builds ranking matrix showing how each model ranked every other model
- Detects mutual opposition and ranking swaps with configurable thresholds
- Returns conflict type, severity, and detailed ranking positions
- Include ranking_conflicts in run_full_council metadata

Frontend changes:
- Add rankingConflicts prop to Stage2 component
- Display conflicts in a red-styled card (distinct from yellow minority opinions)
- Severity badges (high=red, medium=orange, low=yellow)
- Show which models are in conflict and their mutual rankings

Validation tests:
- 8 test cases covering agreement, mutual opposition, ranking swaps,
  edge cases, 5-model scenarios, detail population, and severity ordering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>