Fix: Make CI/CD pipeline robust and non-interactive by heilcheng · Pull Request #12 · heilcheng/medexplain-evals

heilcheng · 2025-07-04T15:03:12Z

Summary

This pull request resolves critical issues in the CI/CD pipeline that were causing the integration-test job to fail and makes the overall pipeline more robust by ensuring linting, security, and type checking failures properly fail the build.

🔧 Interactive Script Fix

Problem: The examples/basic_usage.py script was hanging during CI/CD execution due to input() calls waiting for user interaction.

Solution:

✅ Added os.isatty() check to detect non-interactive environments (CI/CD)
✅ Automatically uses dummy model by default in CI/CD environments
✅ Skips all user prompts and runs minimal evaluation in CI mode
✅ Preserves full interactive functionality when run manually
✅ Proper error handling and variable scope management

Before:

choice = input("Enter choice (1 or 2, default=1): ").strip()
run_full = input("\nRun full benchmark evaluation? (y/N): ").strip().lower()

After:

if is_interactive:
    choice = input("Enter choice (1 or 2, default=1): ").strip()
    run_full = input("\nRun full benchmark evaluation? (y/N): ").strip().lower()
else:
    # Non-interactive mode (CI/CD) - use dummy model by default
    choice = "1"
    run_full = 'y'
    print("Non-interactive environment detected, using dummy model...")

🛡️ CI/CD Workflow Improvements

Problem: The CI/CD pipeline was not failing builds when it should due to permissive flags that treated errors as warnings.

Changes Made:

1. Linting (flake8)

❌ Removed: --exit-zero flag that treated all errors as warnings
✅ Added: examples/ directory to linting scope
🎯 Result: Linting errors now properly fail the build

2. Security Scanning (bandit)

❌ Removed: || true that prevented security failures from failing the build
🎯 Result: Security vulnerabilities now properly fail the build

3. Type Checking (mypy)

❌ Removed: --ignore-missing-imports flag for stricter checking
✅ Added: --strict-optional flag for better type safety
🎯 Result: Type errors now properly fail the build with better coverage

📊 Impact

Before (Problematic)

❌ Integration tests hung indefinitely on input() calls
⚠️ Linting errors ignored with --exit-zero
⚠️ Security vulnerabilities ignored with || true
⚠️ Missing imports ignored in type checking

After (Robust)

✅ Integration tests run automatically without user interaction
✅ Linting errors fail the build appropriately
✅ Security vulnerabilities fail the build appropriately
✅ Type checking is more comprehensive and strict

🧪 Testing

Non-Interactive Mode Verification:

# Runs without hanging, uses dummy model automatically
python examples/basic_usage.py < /dev/null

Interactive Mode Preserved:

# Still prompts for user input when run manually
python examples/basic_usage.py

🎯 Files Changed

examples/basic_usage.py
- Added environment detection with os.isatty()
- Implemented non-interactive defaults
- Fixed variable scope issues
- Enhanced error handling for CI/CD mode
.github/workflows/ci.yml
- Removed permissive flags from linting, security, and type checking
- Enhanced coverage and strictness
- Ensured proper build failures on issues

✅ Verification Checklist

Integration tests no longer hang on user input
Script works in both interactive and non-interactive modes
Linting errors will fail the build
Security vulnerabilities will fail the build
Type checking is more comprehensive
All existing functionality preserved for manual use
No breaking changes to the public API

🚀 Benefits

Reliable CI/CD: Integration tests now run automatically without manual intervention
Better Code Quality: Stricter linting, security, and type checking standards
Faster Feedback: Developers get immediate feedback on code quality issues
Maintainability: Easier to catch and fix issues before they reach production
User Experience: Interactive script still works perfectly for manual testing

This PR ensures the MEQ-Bench CI/CD pipeline is robust, reliable, and maintains high code quality standards while preserving the excellent user experience for manual testing and development.

🤖 Generated with Claude Code

**Interactive Script Fix:** - Add os.isatty() check to detect non-interactive environments - Use dummy model by default in CI/CD environments - Skip user prompts and run minimal evaluation in CI mode - Preserve all interactive functionality for manual use - Ensures integration tests can run without hanging on input() **CI/CD Workflow Improvements:** - Remove --exit-zero flag from flake8 to fail build on linting errors - Remove || true from bandit to fail build on security vulnerabilities - Remove --ignore-missing-imports from mypy for stricter type checking - Add examples/ directory to flake8 linting scope - Add --strict-optional flag to mypy for better type safety These changes make the CI/CD pipeline more robust by ensuring that linting errors, security issues, and type checking problems will properly fail the build instead of being ignored.

- Set choice variable in non-interactive branch to prevent UnboundLocalError - Script now properly runs without hanging on input() calls - Maintains compatibility for both interactive and CI/CD environments

heilcheng added 2 commits July 4, 2025 23:00

Fix: Resolve variable scope issue in non-interactive mode

9e0ee1c

- Set choice variable in non-interactive branch to prevent UnboundLocalError - Script now properly runs without hanging on input() calls - Maintains compatibility for both interactive and CI/CD environments

heilcheng closed this Jul 4, 2025

heilcheng deleted the fix/cicd-pipeline branch December 28, 2025 17:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Make CI/CD pipeline robust and non-interactive#12

Fix: Make CI/CD pipeline robust and non-interactive#12
heilcheng wants to merge 2 commits intomainfrom
fix/cicd-pipeline

heilcheng commented Jul 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

heilcheng commented Jul 4, 2025

Summary

🔧 Interactive Script Fix

🛡️ CI/CD Workflow Improvements

1. Linting (flake8)

2. Security Scanning (bandit)

3. Type Checking (mypy)

📊 Impact

Before (Problematic)

After (Robust)

🧪 Testing

🎯 Files Changed

✅ Verification Checklist

🚀 Benefits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant