@ha2trinh ha2trinh commented Dec 20, 2025

This PR adds assertion generation capabilities to the AutoQ pipeline for data-local and data-global questions. Assertions are factual statements that can be used to evaluate the accuracy of RAG system answers.

Key Changes

New Features

  • Assertion Generation: Added LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with a map-reduce approach for processing claims into ranked assertions
  • Assertion Validation: Optional validation step that scores assertions on grounding, relevance, and verifiability (enabled by default with min score 3/5)
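The map-reduce flow described above can be sketched roughly as follows. This is an illustration, not the PR's implementation: `batched`, `map_batch`, `reduce_assertions`, and `generate_assertions` are hypothetical names, the claim dictionaries are an assumed shape, and the map step (presumably an LLM call in the real generators) is replaced by plain string formatting so the example runs on its own.

```python
from collections import Counter
from itertools import islice


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def map_batch(claims):
    """Map step (stand-in for an LLM call): one candidate assertion per claim."""
    return [f"The response should state that {claim['text']}" for claim in claims]


def reduce_assertions(candidates, max_assertions=10):
    """Reduce step: merge duplicate candidates and rank by supporting-source count."""
    ranked = Counter(candidates).most_common(max_assertions)
    return [
        {"statement": statement, "source_count": count, "rank": rank}
        for rank, (statement, count) in enumerate(ranked, start=1)
    ]


def generate_assertions(claims, batch_size=50, max_assertions=10):
    """Run the map step per batch, then reduce all candidates into a ranked list."""
    candidates = []
    for batch in batched(claims, batch_size):
        candidates.extend(map_batch(batch))
    return reduce_assertions(candidates, max_assertions)


claims = [{"text": "a"}, {"text": "b"}, {"text": "a"}]
assertions = generate_assertions(claims, batch_size=2)
```

Ranking by duplicate count is only one possible heuristic, chosen here because the output format carries a `source_count` field; the PR's actual ranking logic may differ.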

Configuration

  • New assertions config section with:
    • max_assertions: Maximum assertions per question (default: 10)
    • enable_validation: Enable/disable validation (default: true)
    • min_validation_score: Minimum score threshold (default: 3)
    • batch_size: Batch size for processing (default: 50)
    • max_data_tokens: Max tokens per batch (default: 32000)
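To make the defaults above concrete, here is a small stand-in using a standard-library dataclass. `AssertionSettings` is a hypothetical name rather than the PR's `AssertionConfig` class, and the 1 to 5 score range is an assumption inferred from the "3/5" threshold.

```python
from dataclasses import dataclass


@dataclass
class AssertionSettings:
    """Hypothetical mirror of the `assertions` config section and its defaults."""

    max_assertions: int = 10          # maximum assertions per question
    enable_validation: bool = True    # run the validation step
    min_validation_score: int = 3     # minimum score threshold
    batch_size: int = 50              # batch size for processing
    max_data_tokens: int = 32000      # max tokens per batch

    def __post_init__(self):
        # Assumption: validation scores are on a 1-5 scale.
        if not 1 <= self.min_validation_score <= 5:
            raise ValueError("min_validation_score must be between 1 and 5")
        if self.max_assertions < 1 or self.batch_size < 1 or self.max_data_tokens < 1:
            raise ValueError("size limits must be positive")


defaults = AssertionSettings()
stricter = AssertionSettings(max_assertions=5, min_validation_score=4)
```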

AutoE Updates

  • Updated assertion-based scoring:
    • Updated scoring prompt
    • Added options to run scoring on top-k assertions
    • Added utilities to run assertion evaluation for multiple question sets and RAG methods
    • Added utilities to visualize assertion scoring results
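Scoring on the top-k assertions might look like the sketch below. The helper names and the `judge` callable are hypothetical: `judge` stands in for the LLM scoring prompt and simply returns whether an answer supports a statement. The pass-rate metric is an assumption, not necessarily the score AutoE reports.

```python
def top_k_assertions(assertions, k=None):
    """Return assertions ordered by rank, truncated to the first k when k is given."""
    ordered = sorted(assertions, key=lambda a: a["rank"])
    return ordered if k is None else ordered[:k]


def assertion_pass_rate(assertions, judge, k=None):
    """Fraction of the selected assertions that `judge` accepts (0.0 if none selected)."""
    selected = top_k_assertions(assertions, k)
    if not selected:
        return 0.0
    passed = sum(1 for a in selected if judge(a["statement"]))
    return passed / len(selected)


example = [
    {"statement": "covers topic B", "rank": 2},
    {"statement": "covers topic A", "rank": 1},
    {"statement": "mentions detail C", "rank": 3},
]
# Toy judge: accept only statements that start with "covers".
rate_top2 = assertion_pass_rate(example, lambda s: s.startswith("covers"), k=2)
rate_all = assertion_pass_rate(example, lambda s: s.startswith("covers"))
```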

CLI Updates

  • Added assertion configuration options to settings.yaml

Documentation

  • Updated CLI docs with assertion configuration options
  • Updated autoq notebook with assertion settings
  • Added assertion_gen notebook for generating assertions retrospectively

Output Format

Assertions are saved to assertions.json in each question output folder:

{
  "question_id": "...",
  "question_text": "...",
  "assertions": [
    {
      "statement": "The response should state that...",
      "source_count": 2,
      "score": 10,
      "rank": 1,
      "validation": {
        "is_valid": true,
        "scores": {
          "grounding": 5,
          "relevance": 5,
          "verifiability": 5,
          "reasoning": "..."
        }
      }
    }
  ]
}
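A file in this format can be read with the standard `json` module. The sketch below keeps only statements whose validation passed. The sample values are made-up placeholders, `passing_statements` is a hypothetical helper, and the rule "every numeric score must meet the threshold" is an assumption about how validity is decided.

```python
import json

# Placeholder document following the structure shown above.
SAMPLE = """
{
  "question_id": "q1",
  "question_text": "example question",
  "assertions": [
    {"statement": "A", "source_count": 2, "score": 10, "rank": 1,
     "validation": {"is_valid": true,
                    "scores": {"grounding": 5, "relevance": 5,
                               "verifiability": 5, "reasoning": "ok"}}},
    {"statement": "B", "source_count": 1, "score": 4, "rank": 2,
     "validation": {"is_valid": false,
                    "scores": {"grounding": 2, "relevance": 4,
                               "verifiability": 3, "reasoning": "weak"}}}
  ]
}
"""


def passing_statements(document, min_score=3):
    """Return statements that are unvalidated or pass validation at `min_score`."""
    kept = []
    for assertion in document["assertions"]:
        validation = assertion.get("validation")
        if validation is None:
            # Validation is optional, so unvalidated assertions are kept.
            kept.append(assertion["statement"])
            continue
        # "reasoning" is free text; only the numeric scores are compared.
        numeric = [
            value
            for value in validation["scores"].values()
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        ]
        if validation["is_valid"] and all(value >= min_score for value in numeric):
            kept.append(assertion["statement"])
    return kept


document = json.loads(SAMPLE)
kept = passing_statements(document)
```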

Testing

  • Tested CLI end-to-end with AP news dataset
  • Verified assertion generation for both data-local and data-global questions
  • Confirmed validation scoring works correctly

ha2trinh added 10 commits August 4, 2025 16:44
…h optional validation

- Add LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with map-reduce approach
- Add AssertionValidator for grounding, relevance, and verifiability scoring
- Add tqdm progress bar for assertion generation step
- Update CLI to show assertion generation status
- Add assertion prompts to config init command
- Set enable_validation: true as default
- Update notebooks with batch_size and max_data_tokens configs
- Fix G004 ruff linting errors (f-string logging to %-style)
- Update documentation with assertion generation notes
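For context on the G004 fix listed above: ruff's G004 rule flags f-strings passed to logging calls, and the usual remedy is %-style placeholders with the values passed as arguments so that formatting is deferred until the record is emitted. The logger name and `log_progress` function below are illustrative only, not code from the PR.

```python
import logging

logger = logging.getLogger("assertion_gen_example")  # hypothetical logger name


def log_progress(question_id, count):
    """Log progress using lazy %-style formatting instead of an f-string."""
    # Flagged by G004:
    #   logger.info(f"Generated {count} assertions for {question_id}")
    # Preferred form: the message template stays constant, arguments are passed separately.
    logger.info("Generated %d assertions for %s", count, question_id)
```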

Copilot AI left a comment

Pull request overview

This PR adds assertion generation capabilities to the AutoQ pipeline for data-local and data-global questions. Assertions are factual statements that can be used to evaluate the accuracy of RAG system answers.

Key Changes

  • New Assertion Generation Framework: Introduced LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with a map-reduce approach for processing claims into ranked assertions
  • Optional Validation: Added AssertionValidator that scores assertions on grounding, relevance, and verifiability (enabled by default with min score 3/5)
  • Configuration System: New AssertionConfig and AssertionPromptConfig classes with configurable parameters for max assertions, validation settings, batch size, and token limits
  • CLI Integration: Updated CLI to support assertion generation with new configuration options in settings.yaml
  • Visualization Support: Added matplotlib and seaborn dependencies with utility functions for visualizing assertion scoring results

Reviewed changes

Copilot reviewed 43 out of 45 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| uv.lock, pyproject.toml | Added matplotlib>=3.10.5 and seaborn>=0.13.2 dependencies |
| benchmark_qed/autoq/question_gen/data_questions/assertion_gen/ | New module with base classes, validators, generators, and ranking utilities |
| benchmark_qed/autoq/question_gen/data_questions/local_question_gen.py | Integrated assertion generation for local questions |
| benchmark_qed/autoq/question_gen/data_questions/global_question_gen.py | Integrated assertion generation for global questions |
| benchmark_qed/autoq/config.py | Added AssertionConfig and AssertionPromptConfig classes |
| benchmark_qed/autoq/cli.py | Added assertion configuration support to CLI |
| benchmark_qed/autoq/io/question.py | Added _save_assertions function to save assertions separately |
| docs/ | Updated documentation and notebooks with assertion configuration examples |

…ssertionConfig

- Convert Assertion to Pydantic BaseModel with model_dump() instead of asdict()
- Split AssertionConfig into nested LocalAssertionConfig and GlobalAssertionConfig
- Add separate max_concurrent_questions defaults: 8 for local, 2 for global
- Update assertion generators to use agenerate_assertions_for_questions() for parallel processing
- Update CLI, docs, and notebooks to use new nested config structure
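The per-type concurrency limits in this commit suggest bounded parallelism across questions. One common way to get that with `asyncio` is a semaphore, sketched below. This is not the body of `agenerate_assertions_for_questions()`; `generate_for_question` and `generate_for_questions` are hypothetical stand-ins, and only the defaults 8 and 2 come from the commit message.

```python
import asyncio

# Defaults quoted in the commit message above.
MAX_CONCURRENT_QUESTIONS = {"local": 8, "global": 2}


async def generate_for_question(question_id):
    """Hypothetical stand-in for generating assertions for one question."""
    await asyncio.sleep(0)  # placeholder for the real asynchronous work
    return {"question_id": question_id, "assertions": []}


async def generate_for_questions(question_ids, max_concurrent):
    """Process questions in parallel, with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(question_id):
        async with semaphore:
            return await generate_for_question(question_id)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(q) for q in question_ids))


results = asyncio.run(
    generate_for_questions(["q1", "q2", "q3"], MAX_CONCURRENT_QUESTIONS["global"])
)
```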
@ha2trinh ha2trinh merged commit facc5ad into main Dec 26, 2025
4 checks passed
@ha2trinh ha2trinh deleted the feat/assertion-extractor branch December 26, 2025 17:49