Assertion generation for data-driven questions #42

ha2trinh · 2025-12-20T00:30:32Z

This PR adds assertion generation capabilities to the AutoQ pipeline for data-local and data-global questions. Assertions are factual statements that can be used to evaluate the accuracy of RAG system answers.

Key Changes

New Features

- Assertion Generation: Added LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with a map-reduce approach for processing claims into ranked assertions
- Assertion Validation: Optional validation step that scores assertions on grounding, relevance, and verifiability (enabled by default with min score 3/5)
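
As a rough illustration of the validation step, the sketch below filters candidate assertions by a minimum score. The `ScoredAssertion` class and `passes_validation` helper are hypothetical names, and the rule that every criterion must reach the threshold is an assumption; the PR description does not say how the three scores are combined.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a validated assertion; not the PR's actual class.
@dataclass
class ScoredAssertion:
    statement: str
    grounding: int
    relevance: int
    verifiability: int


def passes_validation(assertion: ScoredAssertion, min_score: int = 3) -> bool:
    """Assumed rule: each of the three criteria must reach min_score."""
    scores = (assertion.grounding, assertion.relevance, assertion.verifiability)
    return all(score >= min_score for score in scores)


candidates = [
    ScoredAssertion("The response should state that ...", 5, 5, 5),
    ScoredAssertion("The response should mention ...", 4, 2, 5),
]

# Keep only assertions that clear the default threshold of 3.
kept = [a for a in candidates if passes_validation(a)]
```

Under this assumed rule the second candidate is dropped because its relevance score falls below the threshold.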

Configuration

New assertions config section with:
- max_assertions: Maximum assertions per question (default: 10)
- enable_validation: Enable/disable validation (default: true)
- min_validation_score: Minimum score threshold (default: 3)
- batch_size: Batch size for processing (default: 50)
- max_data_tokens: Max tokens per batch (default: 32000)
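
The defaults above could be mirrored in code roughly as follows. This is a minimal sketch: `AssertionSettings` and `split_into_batches` are hypothetical names, not the PR's AssertionConfig, and only batch_size is illustrated, since the description does not explain how it interacts with max_data_tokens.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical mirror of the documented defaults; not the PR's AssertionConfig.
@dataclass(frozen=True)
class AssertionSettings:
    max_assertions: int = 10
    enable_validation: bool = True
    min_validation_score: int = 3
    batch_size: int = 50
    max_data_tokens: int = 32000


def split_into_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


settings = AssertionSettings()
claims = [f"claim {i}" for i in range(120)]  # placeholder claim texts
batches = split_into_batches(claims, settings.batch_size)
```

With the default batch size, 120 placeholder claims are split into three batches of 50, 50, and 20 items.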

AutoE Updates

Updated assertion-based scoring:
- Updated scoring prompt
- Added options to run scoring on top-k assertions
- Added utilities to run assertion evaluation for multiple question sets and RAG methods
- Added utilities to visualize assertion scoring results
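
Scoring on the top-k assertions can be pictured with the sketch below. The `top_k_assertions` helper is hypothetical, and it assumes that a lower rank value means a more important assertion (rank 1 first), which the output example in this PR suggests but does not state.

```python
from __future__ import annotations

from typing import Optional


def top_k_assertions(assertions: list[dict], k: Optional[int] = None) -> list[dict]:
    """Return assertions ordered by rank, truncated to the first k if k is given."""
    ordered = sorted(assertions, key=lambda a: a["rank"])
    return ordered if k is None else ordered[:k]


# Placeholder assertions using the rank field from the output format.
sample = [
    {"statement": "C", "rank": 3},
    {"statement": "A", "rank": 1},
    {"statement": "B", "rank": 2},
]

top_two = top_k_assertions(sample, k=2)
```

Passing `k=None` keeps every assertion, which corresponds to scoring against the full set.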

CLI Updates

Added assertion configuration options to settings.yaml

Documentation

- Updated CLI docs with assertion configuration options
- Updated autoq notebook with assertion settings
- Added assertion_gen notebook for generating assertions retrospectively

Output Format

Assertions are saved to assertions.json in each question output folder:

{
  "question_id": "...",
  "question_text": "...",
  "assertions": [
    {
      "statement": "The response should state that...",
      "source_count": 2,
      "score": 10,
      "rank": 1,
      "validation": {
        "is_valid": true,
        "scores": {
          "grounding": 5,
          "relevance": 5,
          "verifiability": 5,
          "reasoning": "..."
        }
      }
    }
  ]
}
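
A file in this shape can be read with the standard json module. The sketch below parses an inline copy of the example rather than a real assertions.json, and it treats an assertion without a validation block as valid, which is an assumption made because validation is optional.

```python
import json

# Inline copy of the example above; a real run would read assertions.json instead.
SAMPLE = """
{
  "question_id": "...",
  "question_text": "...",
  "assertions": [
    {
      "statement": "The response should state that...",
      "source_count": 2,
      "score": 10,
      "rank": 1,
      "validation": {
        "is_valid": true,
        "scores": {
          "grounding": 5,
          "relevance": 5,
          "verifiability": 5,
          "reasoning": "..."
        }
      }
    }
  ]
}
"""

data = json.loads(SAMPLE)

# Assumption: an assertion with no validation block counts as valid.
valid_statements = [
    a["statement"]
    for a in data["assertions"]
    if a.get("validation", {}).get("is_valid", True)
]
```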

Testing

- Tested CLI end-to-end with AP news dataset
- Verified assertion generation for both data-local and data-global questions
- Confirmed validation scoring works correctly

…h optional validation
- Add LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with map-reduce approach
- Add AssertionValidator for grounding, relevance, and verifiability scoring
- Add tqdm progress bar for assertion generation step
- Update CLI to show assertion generation status
- Add assertion prompts to config init command
- Set enable_validation: true as default
- Update notebooks with batch_size and max_data_tokens configs
- Fix G004 ruff linting errors (f-string logging to %-style)
- Update documentation with assertion generation notes

benchmark_qed/autoq/io/question.py

benchmark_qed/autoq/question_gen/data_questions/assertion_gen/base.py

benchmark_qed/autoq/question_gen/data_questions/assertion_gen/validator.py

Copilot

Pull request overview

This PR adds assertion generation capabilities to the AutoQ pipeline for data-local and data-global questions. Assertions are factual statements that can be used to evaluate the accuracy of RAG system answers.

Key Changes

- New Assertion Generation Framework: Introduced LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with a map-reduce approach for processing claims into ranked assertions
- Optional Validation: Added AssertionValidator that scores assertions on grounding, relevance, and verifiability (enabled by default with min score 3/5)
- Configuration System: New AssertionConfig and AssertionPromptConfig classes with configurable parameters for max assertions, validation settings, batch size, and token limits
- CLI Integration: Updated CLI to support assertion generation with new configuration options in settings.yaml
- Visualization Support: Added matplotlib and seaborn dependencies with utility functions for visualizing assertion scoring results

Reviewed changes

Copilot reviewed 43 out of 45 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
uv.lock, pyproject.toml	Added matplotlib>=3.10.5 and seaborn>=0.13.2 dependencies
benchmark_qed/autoq/question_gen/data_questions/assertion_gen/	New module with base classes, validators, generators, and ranking utilities
benchmark_qed/autoq/question_gen/data_questions/local_question_gen.py	Integrated assertion generation for local questions
benchmark_qed/autoq/question_gen/data_questions/global_question_gen.py	Integrated assertion generation for global questions
benchmark_qed/autoq/config.py	Added AssertionConfig and AssertionPromptConfig classes
benchmark_qed/autoq/cli.py	Added assertion configuration support to CLI
benchmark_qed/autoq/io/question.py	Added _save_assertions function to save assertions separately
docs/	Updated documentation and notebooks with assertion configuration examples


…ssertionConfig
- Convert Assertion to Pydantic BaseModel with model_dump() instead of asdict()
- Split AssertionConfig into nested LocalAssertionConfig and GlobalAssertionConfig
- Add separate max_concurrent_questions defaults: 8 for local, 2 for global
- Update assertion generators to use agenerate_assertions_for_questions() for parallel processing
- Update CLI, docs, and notebooks to use new nested config structure

ha2trinh added 10 commits August 4, 2025 16:44

add assertion generation for data questions

aea6fb2

assertion generation v2

5abf06a

assertion generation v3

1bd62b8

prompt updates

b3e4186

update autoe notebook

2b4236b

Fix formatting issues

abc5aa1

Remove temp PR body file

a8bfcbe

Fix formatting issues (ruff preview mode)

50461ad

Fix lint and type errors (FURB110, BLE001, pyright type annotations)

4209107

andresmor-ms reviewed Dec 22, 2025

benchmark_qed/autoq/io/question.py (outdated, resolved)

andresmor-ms reviewed Dec 22, 2025

benchmark_qed/autoq/io/question.py (outdated, resolved)

andresmor-ms reviewed Dec 22, 2025

benchmark_qed/autoq/question_gen/data_questions/assertion_gen/base.py (resolved)

andresmor-ms reviewed Dec 22, 2025

benchmark_qed/autoq/question_gen/data_questions/assertion_gen/validator.py (resolved)

andresmor-ms requested a review from Copilot December 22, 2025 13:45

Copilot started reviewing on behalf of andresmor-ms December 22, 2025 13:45

Copilot AI reviewed Dec 22, 2025

andresmor-ms approved these changes Dec 23, 2025

Merge branch 'main' into feat/assertion-extractor

603f612

ha2trinh merged commit facc5ad into main Dec 26, 2025
4 checks passed

ha2trinh deleted the feat/assertion-extractor branch December 26, 2025 17:49

3 participants
