Add Google Gemini API Support to MEQ-Bench by heilcheng · Pull Request #11 · heilcheng/medexplain-evals

heilcheng · 2025-07-04T14:46:29Z

Summary

This PR enhances the MEQ-Bench framework by adding comprehensive support for Google Gemini models in the run_benchmark.py script. The implementation follows the existing patterns used for OpenAI and Anthropic APIs, ensuring consistency and maintainability.

✨ Key Features

New Gemini Integration: Complete _create_gemini_model() function with robust error handling
Factory Pattern Integration: Seamless integration with existing create_model_function() factory
Environment-Based Auth: Proper API key validation using GOOGLE_API_KEY environment variable
Resilient API Calls: Retry mechanism with exponential backoff for handling API failures
Safety Configuration: Built-in safety filters optimized for medical content evaluation
Comprehensive Logging: Detailed logging and error recovery throughout the pipeline
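
The retry behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `call_with_backoff` and its parameters (`max_retries`, `base_delay`, `sleep`) are hypothetical, and the real implementation may use different limits or catch narrower exception types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponentially growing delays.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error to the caller
            # Delay doubles on every failed attempt: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real waiting.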

🚀 Usage Examples

# Set up API key
export GOOGLE_API_KEY="your_api_key_here"

# Evaluate with Gemini Pro
python run_benchmark.py --model_name gemini:gemini-pro --max_items 100 --output_dir results/gemini/

# Evaluate with Gemini Pro Vision (future multimodal support)
python run_benchmark.py --model_name gemini:gemini-pro-vision --max_items 50

📋 Supported Gemini Models

gemini-pro - Standard text generation model
gemini-pro-vision - Multimodal model (text + images)
Any future Gemini model variants

🔧 Technical Implementation Details

Follows Established Patterns:

Consistent function signatures matching OpenAI/Anthropic implementations
Same error handling and retry patterns as existing API backends
Maintains the factory pattern architecture
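
To illustrate the factory pattern mentioned above, here is a small sketch of how a `backend:model` string such as `gemini:gemini-pro` might be dispatched to a backend-specific builder. Everything in it is an assumption made for illustration: the function names, the `openai` and `anthropic` prefixes, and the stub builders are hypothetical and do not reproduce the actual `create_model_function()` in run_benchmark.py.

```python
from typing import Callable, Dict, Tuple

ModelFn = Callable[[str], str]


def parse_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'backend:model' into its two parts (hypothetical helper)."""
    backend, sep, model = spec.partition(":")
    if not sep or not backend or not model:
        raise ValueError(f"Expected 'backend:model', got {spec!r}")
    return backend.lower(), model


def _make_stub(backend: str, model: str) -> ModelFn:
    """Stand-in for a real API client; it only echoes the prompt."""
    def generate(prompt: str) -> str:
        return f"[{backend}/{model}] {prompt}"
    return generate


# Registry of builders keyed by backend prefix (prefixes are assumptions).
_BUILDERS: Dict[str, Callable[[str], ModelFn]] = {
    "openai": lambda m: _make_stub("openai", m),
    "anthropic": lambda m: _make_stub("anthropic", m),
    "gemini": lambda m: _make_stub("gemini", m),
}


def create_model_function_sketch(spec: str) -> ModelFn:
    """Return a prompt -> text callable for the requested backend."""
    backend, model = parse_model_spec(spec)
    builder = _BUILDERS.get(backend)
    if builder is None:
        known = ", ".join(sorted(_BUILDERS))
        raise ValueError(f"Unknown backend {backend!r}; expected one of: {known}")
    return builder(model)
```

With a registry like this, adding a backend is a single new entry, which is one way a change can stay purely additive.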

Robust Error Handling:

Graceful handling of missing google-generativeai library
Clear error messages for missing API keys with setup instructions
API call failures handled with exponential backoff
Safety filter blocks handled appropriately for medical content
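
The first two checks above could look something like the sketch below. The helper names `load_gemini_sdk` and `require_api_key` and the exact error messages are hypothetical; only the package name, the `GOOGLE_API_KEY` variable, and the key URL come from this PR.

```python
import importlib
import os
from typing import Mapping, Optional


def load_gemini_sdk():
    """Import the Gemini SDK lazily so other backends work without it."""
    try:
        return importlib.import_module("google.generativeai")
    except ImportError as exc:
        raise ImportError(
            "google-generativeai is not installed. "
            "Install it with: pip install google-generativeai"
        ) from exc


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or fail with setup instructions if it is missing."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise EnvironmentError(
            "GOOGLE_API_KEY is not set. Create a key at "
            "https://makersuite.google.com/app/apikey and run: "
            'export GOOGLE_API_KEY="your_api_key_here"'
        )
    return key
```

Running both checks before the first request means a misconfigured environment fails immediately with an actionable message rather than partway through a benchmark run.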

Safety & Medical Content:

Configured safety settings appropriate for medical AI evaluation
Handles content blocking with informative error messages
Maintains response quality while respecting safety guidelines
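
As a rough sketch of what such a safety configuration and block handling might involve: the google-generativeai SDK accepts safety settings as a list of category/threshold entries, and a blocked response typically has no text to read. The threshold below is a placeholder, since this PR does not state which values it uses, and `build_safety_settings`, `ContentBlockedError`, and `extract_text` are hypothetical names.

```python
from typing import Any, Dict, List

# Harm categories exposed by the Gemini API for text generation.
GEMINI_HARM_CATEGORIES = (
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
)


def build_safety_settings(threshold: str = "BLOCK_MEDIUM_AND_ABOVE") -> List[Dict[str, str]]:
    """Apply one threshold to every category (the default is an assumption)."""
    return [{"category": c, "threshold": threshold} for c in GEMINI_HARM_CATEGORIES]


class ContentBlockedError(RuntimeError):
    """Raised when the model returned no usable text (hypothetical type)."""


def extract_text(response: Any) -> str:
    """Return response.text, converting a blocked response into a clear error."""
    try:
        return response.text
    except ValueError as exc:
        # Assumption: the SDK raises ValueError from .text when nothing was
        # generated, and reports the reason under prompt_feedback.block_reason.
        feedback = getattr(response, "prompt_feedback", None)
        reason = getattr(feedback, "block_reason", "unknown")
        raise ContentBlockedError(
            f"Gemini returned no text (block reason: {reason})"
        ) from exc


# Assumed usage with the real SDK (not executed here):
#   model = genai.GenerativeModel("gemini-pro", safety_settings=build_safety_settings())
#   text = extract_text(model.generate_content(prompt))
```

Raising a dedicated error type lets the benchmark loop log a blocked item and continue, instead of treating it like a transient API failure and retrying.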

📚 Documentation Updates

Updated main docstring with Gemini API examples
Enhanced create_model_function() documentation
Updated command-line help text and examples
Added environment variable requirements

📦 Dependencies

Required:

google-generativeai library: pip install google-generativeai
GOOGLE_API_KEY environment variable

API Key Setup:

Available from: https://makersuite.google.com/app/apikey
Free tier available for development and testing

🧪 Testing

The implementation has been tested for:

✅ Correct backend recognition in factory function
✅ Proper error handling for missing dependencies
✅ Environment variable validation
✅ Syntax and import validation
✅ Integration with existing codebase patterns

🔄 Backward Compatibility

Zero breaking changes to existing functionality
All existing model backends continue to work unchanged
New Gemini backend is purely additive

📝 Code Quality

Follows existing code style and patterns
Comprehensive type hints throughout
Detailed docstrings with examples
Consistent logging patterns
Error messages provide clear next steps

🎯 Benefits for MEQ-Bench Users

Extended Model Coverage: Access to Google's latest Gemini models for medical explanation evaluation
Cost-Effective Options: Gemini offers competitive pricing for large-scale evaluations
Multimodal Potential: Foundation for future image-based medical explanation evaluation
Production Ready: Robust implementation suitable for research and production workloads

This enhancement significantly expands MEQ-Bench's capabilities while maintaining the high standards of reliability and usability established by the existing codebase.

🤖 Generated with Claude Code

This commit enhances the MEQ-Bench framework by adding support for Google Gemini models in the run_benchmark.py script.

**Key Features:**
- New `_create_gemini_model()` function with comprehensive error handling
- Integration with existing `create_model_function()` factory pattern
- Proper API key validation (GOOGLE_API_KEY environment variable)
- Retry mechanism with exponential backoff for API calls
- Safety filter configuration for medical content
- Comprehensive logging and error recovery

**Usage:**
```bash
# Set API key
export GOOGLE_API_KEY="your_api_key_here"

# Run evaluation with Gemini Pro
python run_benchmark.py --model_name gemini:gemini-pro --max_items 100

# Run with Gemini Pro Vision (for future multimodal support)
python run_benchmark.py --model_name gemini:gemini-pro-vision --max_items 50
```

**Technical Implementation:**
- Follows existing code patterns from OpenAI and Anthropic integrations
- Uses google-generativeai library with proper import error handling
- Implements same retry and logging patterns as other API backends
- Maintains consistent function signatures and error handling
- Updates all documentation strings and help text

**Dependencies:**
- Requires `google-generativeai` library: `pip install google-generativeai`
- Requires GOOGLE_API_KEY environment variable
- API key available from: https://makersuite.google.com/app/apikey

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

heilcheng merged commit bdebd28 into main Jul 4, 2025
1 of 8 checks passed

heilcheng deleted the feat/project-enhancements-and-polish branch December 28, 2025 17:45

heilcheng merged 1 commit into main from feat/project-enhancements-and-polish
