
Add Google Gemini API Support to MEQ-Bench#11

Merged
heilcheng merged 1 commit into main from feat/project-enhancements-and-polish
Jul 4, 2025


@heilcheng
Owner

Summary

This PR enhances the MEQ-Bench framework by adding comprehensive support for Google Gemini models in the run_benchmark.py script. The implementation follows the existing patterns used for OpenAI and Anthropic APIs, ensuring consistency and maintainability.

✨ Key Features

  • New Gemini Integration: Complete _create_gemini_model() function with robust error handling
  • Factory Pattern Integration: Seamless integration with existing create_model_function() factory
  • Environment-Based Auth: Proper API key validation using GOOGLE_API_KEY environment variable
  • Resilient API Calls: Retry mechanism with exponential backoff for handling API failures
  • Safety Configuration: Built-in safety filters optimized for medical content evaluation
  • Comprehensive Logging: Detailed logging and error recovery throughout the pipeline

🚀 Usage Examples

# Set up API key
export GOOGLE_API_KEY="your_api_key_here"

# Evaluate with Gemini Pro
python run_benchmark.py --model_name gemini:gemini-pro --max_items 100 --output_dir results/gemini/

# Evaluate with Gemini Pro Vision (future multimodal support)
python run_benchmark.py --model_name gemini:gemini-pro-vision --max_items 50

📋 Supported Gemini Models

  • gemini-pro - Standard text generation model
  • gemini-pro-vision - Multimodal model (text + images)
  • Any future Gemini model variants

🔧 Technical Implementation Details

Follows Established Patterns:

  • Consistent function signatures matching OpenAI/Anthropic implementations
  • Same error handling and retry patterns as existing API backends
  • Maintains the factory pattern architecture

Robust Error Handling:

  • Graceful handling of missing google-generativeai library
  • Clear error messages for missing API keys with setup instructions
  • API call failures handled with exponential backoff
  • Safety filter blocks handled appropriately for medical content

Safety & Medical Content:

  • Configured safety settings appropriate for medical AI evaluation
  • Handles content blocking with informative error messages
  • Maintains response quality while respecting safety guidelines
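
The PR does not list the thresholds it configures, so the values below are assumptions chosen for illustration. The sketch shows the list-of-dicts form that google-generativeai accepts for `safety_settings`, and one way to turn a blocked response into an informative error. `BlockedResponseError` and `extract_text` are hypothetical names. The response is handled by duck typing, so the block runs without the library installed.

```python
from typing import Any, Dict, List

# Assumed thresholds; the merged code may use different ones.
SAFETY_SETTINGS_SKETCH: List[Dict[str, str]] = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]


class BlockedResponseError(RuntimeError):
    """Raised when a response carries no usable text (hypothetical class)."""


def extract_text(response: Any) -> str:
    """Return the response text, or raise with the block reason if present."""
    feedback = getattr(response, "prompt_feedback", None)
    reason = getattr(feedback, "block_reason", None)
    if reason:  # any truthy block reason means the prompt was rejected
        raise BlockedResponseError(f"Prompt blocked by safety filter: {reason}")
    try:
        return response.text
    except ValueError as exc:
        # The SDK's .text accessor raises ValueError when no text part exists.
        raise BlockedResponseError("Response contained no text") from exc
```

With the library installed, the settings list would be passed as `genai.GenerativeModel("gemini-pro", safety_settings=SAFETY_SETTINGS_SKETCH)`, and `extract_text` would wrap the result of `generate_content`.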

📚 Documentation Updates

  • Updated main docstring with Gemini API examples
  • Enhanced create_model_function() documentation
  • Updated command-line help text and examples
  • Added environment variable requirements

📦 Dependencies

Required:

  • google-generativeai library: pip install google-generativeai
  • GOOGLE_API_KEY environment variable
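
Both requirements can be checked up front so that a misconfigured run fails with a clear message. This is a sketch with hypothetical helper names (`require_google_api_key`, `load_genai`), and the wording of the error messages is an assumption.

```python
import os
from typing import Any, Mapping, Optional


def require_google_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise with setup instructions if it is missing."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise EnvironmentError(
            "GOOGLE_API_KEY is not set. Run: export GOOGLE_API_KEY='your_api_key_here'"
        )
    return key


def load_genai() -> Any:
    """Import google-generativeai lazily so other backends work without it."""
    try:
        import google.generativeai as genai
    except ImportError as exc:
        raise ImportError(
            "google-generativeai is required for the gemini backend. "
            "Install it with: pip install google-generativeai"
        ) from exc
    return genai


# Passing a mapping instead of os.environ keeps the check easy to test.
print(require_google_api_key({"GOOGLE_API_KEY": "demo-key"}))  # prints "demo-key"
```

A Gemini backend would then typically call `load_genai().configure(api_key=require_google_api_key())` before constructing a model.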

API Key Setup: obtain a key from https://makersuite.google.com/app/apikey and export it as GOOGLE_API_KEY.

🧪 Testing

The implementation has been tested for:

  • ✅ Correct backend recognition in factory function
  • ✅ Proper error handling for missing dependencies
  • ✅ Environment variable validation
  • ✅ Syntax and import validation
  • ✅ Integration with existing codebase patterns

🔄 Backward Compatibility

  • Zero breaking changes to existing functionality
  • All existing model backends continue to work unchanged
  • New Gemini backend is purely additive

📝 Code Quality

  • Follows existing code style and patterns
  • Comprehensive type hints throughout
  • Detailed docstrings with examples
  • Consistent logging patterns
  • Error messages provide clear next steps

🎯 Benefits for MEQ-Bench Users

  1. Extended Model Coverage: Access to Google's latest Gemini models for medical explanation evaluation
  2. Cost-Effective Options: Gemini offers competitive pricing for large-scale evaluations
  3. Multimodal Potential: Foundation for future image-based medical explanation evaluation
  4. Production Ready: Robust implementation suitable for research and production workloads

This enhancement significantly expands MEQ-Bench's capabilities while maintaining the high standards of reliability and usability established by the existing codebase.


🤖 Generated with Claude Code

This commit enhances the MEQ-Bench framework by adding support for Google Gemini models in the run_benchmark.py script.

**Key Features:**
- New `_create_gemini_model()` function with comprehensive error handling
- Integration with existing `create_model_function()` factory pattern
- Proper API key validation (GOOGLE_API_KEY environment variable)
- Retry mechanism with exponential backoff for API calls
- Safety filter configuration for medical content
- Comprehensive logging and error recovery

**Usage:**
```bash
# Set API key
export GOOGLE_API_KEY="your_api_key_here"

# Run evaluation with Gemini Pro
python run_benchmark.py --model_name gemini:gemini-pro --max_items 100

# Run with Gemini Pro Vision (for future multimodal support)
python run_benchmark.py --model_name gemini:gemini-pro-vision --max_items 50
```

**Technical Implementation:**
- Follows existing code patterns from OpenAI and Anthropic integrations
- Uses google-generativeai library with proper import error handling
- Implements same retry and logging patterns as other API backends
- Maintains consistent function signatures and error handling
- Updates all documentation strings and help text

**Dependencies:**
- Requires `google-generativeai` library: `pip install google-generativeai`
- Requires GOOGLE_API_KEY environment variable
- API key available from: https://makersuite.google.com/app/apikey

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@heilcheng heilcheng merged commit bdebd28 into main Jul 4, 2025
1 of 8 checks passed
@heilcheng heilcheng deleted the feat/project-enhancements-and-polish branch December 28, 2025 17:45