Assertion generation for data-driven questions #42
…h optional validation
- Add LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with map-reduce approach
- Add AssertionValidator for grounding, relevance, and verifiability scoring
- Add tqdm progress bar for assertion generation step
- Update CLI to show assertion generation status
- Add assertion prompts to config init command
- Set enable_validation: true as default
- Update notebooks with batch_size and max_data_tokens configs
- Fix G004 ruff linting errors (f-string logging to %-style)
- Update documentation with assertion generation notes
benchmark_qed/autoq/question_gen/data_questions/assertion_gen/validator.py
Pull request overview
This PR adds assertion generation capabilities to the AutoQ pipeline for data-local and data-global questions. Assertions are factual statements that can be used to evaluate the accuracy of RAG system answers.
Key Changes
- New Assertion Generation Framework: Introduced LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with a map-reduce approach for processing claims into ranked assertions
- Optional Validation: Added AssertionValidator that scores assertions on grounding, relevance, and verifiability (enabled by default with min score 3/5)
- Configuration System: New AssertionConfig and AssertionPromptConfig classes with configurable parameters for max assertions, validation settings, batch size, and token limits
- CLI Integration: Updated CLI to support assertion generation with new configuration options in settings.yaml
- Visualization Support: Added matplotlib and seaborn dependencies with utility functions for visualizing assertion scoring results
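The map-reduce flow named in the first item can be sketched roughly as below. This is a minimal illustration, not the PR's implementation: `Candidate`, `map_batch`, `reduce_candidates`, and `generate_assertions` are hypothetical names, the map step stands in for an LLM call, and merging duplicates by normalized statement text is an assumption. The defaults for `batch_size` and `max_assertions` follow the values given in the PR description.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical intermediate record; not the PR's Assertion model."""
    statement: str
    score: int
    source_count: int = 1


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def map_batch(claims):
    """Map step: turn one batch of claims into candidate assertions.

    Stand-in for an LLM call; here each claim simply becomes one candidate.
    """
    return [Candidate(statement=c["text"], score=c["score"]) for c in claims]


def reduce_candidates(candidates, max_assertions):
    """Reduce step: merge duplicate statements, then rank by score.

    Assumption: duplicates are detected by case-insensitive statement text.
    """
    merged = {}
    for cand in candidates:
        key = cand.statement.strip().lower()
        if key in merged:
            kept = merged[key]
            kept.source_count += cand.source_count
            kept.score = max(kept.score, cand.score)
        else:
            merged[key] = Candidate(cand.statement, cand.score, cand.source_count)
    # Highest score first; ties broken by how many sources support the statement.
    ranked = sorted(merged.values(), key=lambda c: (-c.score, -c.source_count))
    return [
        {
            "statement": c.statement,
            "source_count": c.source_count,
            "score": c.score,
            "rank": rank,
        }
        for rank, c in enumerate(ranked[:max_assertions], start=1)
    ]


def generate_assertions(claims, batch_size=50, max_assertions=10):
    """Run the map step per batch, then reduce all candidates into a ranked list."""
    candidates = []
    for batch in batched(claims, batch_size):
        candidates.extend(map_batch(batch))
    return reduce_candidates(candidates, max_assertions)
```

Batching keeps each map call within a bounded input size, while the reduce step is where cross-batch duplicates are merged and the final ranking is assigned.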
Reviewed changes
Copilot reviewed 43 out of 45 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| uv.lock, pyproject.toml | Added matplotlib>=3.10.5 and seaborn>=0.13.2 dependencies |
| benchmark_qed/autoq/question_gen/data_questions/assertion_gen/ | New module with base classes, validators, generators, and ranking utilities |
| benchmark_qed/autoq/question_gen/data_questions/local_question_gen.py | Integrated assertion generation for local questions |
| benchmark_qed/autoq/question_gen/data_questions/global_question_gen.py | Integrated assertion generation for global questions |
| benchmark_qed/autoq/config.py | Added AssertionConfig and AssertionPromptConfig classes |
| benchmark_qed/autoq/cli.py | Added assertion configuration support to CLI |
| benchmark_qed/autoq/io/question.py | Added _save_assertions function to save assertions separately |
| docs/ | Updated documentation and notebooks with assertion configuration examples |
…ssertionConfig
- Convert Assertion to Pydantic BaseModel with model_dump() instead of asdict()
- Split AssertionConfig into nested LocalAssertionConfig and GlobalAssertionConfig
- Add separate max_concurrent_questions defaults: 8 for local, 2 for global
- Update assertion generators to use agenerate_assertions_for_questions() for parallel processing
- Update CLI, docs, and notebooks to use new nested config structure
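One common way to cap the number of questions processed in parallel, as max_concurrent_questions suggests, is an asyncio semaphore. The sketch below assumes that pattern. It does not show the internals of agenerate_assertions_for_questions(), and `generate_for_question` and `generate_for_questions` are hypothetical stand-ins.

```python
import asyncio


async def generate_for_question(question):
    """Stand-in for per-question assertion generation (normally an LLM call)."""
    await asyncio.sleep(0)
    return {"question_id": question, "assertions": []}


async def generate_for_questions(questions, max_concurrent_questions):
    """Process all questions with at most max_concurrent_questions in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_questions)

    async def bounded(question):
        # Each task waits here until a slot is free.
        async with semaphore:
            return await generate_for_question(question)

    # gather returns results in the same order as the input questions.
    return await asyncio.gather(*(bounded(q) for q in questions))


# Example: a limit of 2, matching the stated default for global questions.
results = asyncio.run(
    generate_for_questions(["q1", "q2", "q3"], max_concurrent_questions=2)
)
```

A lower limit for global questions would be consistent with each global question doing more work per call, though the PR text does not state the reason for the different defaults.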
This PR adds assertion generation capabilities to the AutoQ pipeline for data-local and data-global questions. Assertions are factual statements that can be used to evaluate the accuracy of RAG system answers.
Key Changes
New Features
- LocalClaimAssertionGenerator and GlobalClaimAssertionGenerator with a map-reduce approach for processing claims into ranked assertions
Configuration
- assertions config section with:
  - max_assertions: Maximum assertions per question (default: 10)
  - enable_validation: Enable/disable validation (default: true)
  - min_validation_score: Minimum score threshold (default: 3)
  - batch_size: Batch size for processing (default: 50)
  - max_data_tokens: Max tokens per batch (default: 32000)
AutoE Updates
CLI Updates
Documentation
Output Format
Assertions are saved to assertions.json in each question output folder:

```json
{
  "question_id": "...",
  "question_text": "...",
  "assertions": [
    {
      "statement": "The response should state that...",
      "source_count": 2,
      "score": 10,
      "rank": 1,
      "validation": {
        "is_valid": true,
        "scores": {
          "grounding": 5,
          "relevance": 5,
          "verifiability": 5,
          "reasoning": "..."
        }
      }
    }
  ]
}
```

Testing
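As a rough illustration of consuming the assertions.json record format described above, the sketch below keeps only assertions that passed validation and returns them in rank order. The sample record is made up for the example. `valid_assertions` is a hypothetical helper, and two behaviors are assumptions rather than documented facts: that the threshold applies to each of the three criteria individually, and that assertions without a validation block (validation disabled) are kept.

```python
import json

# Made-up sample record following the documented field layout.
record = json.loads("""
{
  "question_id": "q-1",
  "question_text": "Example question?",
  "assertions": [
    {"statement": "Weakly grounded statement", "source_count": 1, "score": 6, "rank": 2,
     "validation": {"is_valid": false,
                    "scores": {"grounding": 2, "relevance": 4, "verifiability": 3,
                               "reasoning": "..."}}},
    {"statement": "Well grounded statement", "source_count": 2, "score": 10, "rank": 1,
     "validation": {"is_valid": true,
                    "scores": {"grounding": 5, "relevance": 5, "verifiability": 5,
                               "reasoning": "..."}}}
  ]
}
""")

CRITERIA = ("grounding", "relevance", "verifiability")


def valid_assertions(record, min_validation_score=3):
    """Return assertions that pass validation, sorted by rank."""
    kept = []
    for assertion in record["assertions"]:
        validation = assertion.get("validation")
        if validation is None:
            # Assumption: no validation block means validation was disabled.
            kept.append(assertion)
            continue
        # "reasoning" sits next to the numeric scores, so pick criteria by name.
        lowest = min(validation["scores"][name] for name in CRITERIA)
        if validation["is_valid"] and lowest >= min_validation_score:
            kept.append(assertion)
    return sorted(kept, key=lambda a: a["rank"])


statements = [a["statement"] for a in valid_assertions(record)]
```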