Conversation

@KyleKing (Owner)

Research Phase:

  • Thoroughly researched pydantic-ai and pydantic-evals
  • Analyzed official docs, GitHub examples, HackerNews discussions, blog posts
  • Compared frameworks (PydanticAI vs LangChain vs Instructor)
  • Investigated tool calling patterns, SQL agents, RAG architectures
  • Studied evaluation methodologies and testing patterns
  • Researched pgvector integration, cost tracking, observability

Claude Skills Created:

  • .claude/skills/pydantic-ai.md: Complete PydanticAI guide (core agent pattern sketched after this list)

    • Agent definition and execution patterns
    • Dependency injection with RunContext
    • Tool calling best practices
    • SQL generation and validation
    • Observability with Logfire
    • Framework comparisons
    • Production-ready patterns
  • .claude/skills/pydantic-evals.md: Evaluation framework guide

    • Dataset and case management
    • Built-in and custom evaluators
    • pytest integration patterns
    • VCR caching for deterministic tests
    • Versioning and regression testing
    • Cost tracking and quality metrics
    • A/B testing patterns
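
To make the skill's core pattern concrete, here is a minimal PydanticAI sketch showing agent definition, dependency injection via RunContext, and a tool call. The model string, the Deps fields, and the prompts are illustrative assumptions, not excerpts from the skill file.

```python
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext


@dataclass
class Deps:
    customer_name: str  # injected per run, never hard-coded into the prompt


agent = Agent(
    "openai:gpt-4o",  # any supported model string works here
    deps_type=Deps,
    system_prompt="You are a concise support assistant.",
)


@agent.tool
def get_customer_name(ctx: RunContext[Deps]) -> str:
    """Tool the model can call; dependencies arrive via RunContext."""
    return ctx.deps.customer_name


result = agent.run_sync("Greet the customer by name.", deps=Deps(customer_name="Ada"))
print(result.output)  # `.data` on older pydantic-ai releases
```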

Research Platform (agent-research-platform/):

  • Comprehensive architecture for agentic LLM research
  • Python 3.12 with uv dependency management
  • PostgreSQL 16 + pgvector for semantic search
  • Multi-tenant B2B schema with vector embeddings

Core Infrastructure:

  • Database models (B2B SaaS: Organizations → Users, Customers, Products, Orders, Support Tickets)
  • Vector embeddings for products, documents, support tickets
  • HNSW indexes for fast similarity search (model and index sketched after this list)
  • Alembic migrations with pgvector support
  • Session management with async SQLAlchemy 2.0
  • Embedding service with semantic search capabilities
  • Configuration management with pydantic-settings
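
A minimal sketch of the pgvector pattern above, assuming a 1536-dimension embedding (OpenAI-sized) and a hypothetical Product model; the repo's actual model, column, and index names may differ.

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy import Index, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Product(Base):
    __tablename__ = "products"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(200))
    # Dimension must match the embedding model in use (1536 here is an assumption).
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))

    __table_args__ = (
        # HNSW index with cosine distance for fast approximate nearest-neighbour search.
        Index(
            "ix_products_embedding_hnsw",
            "embedding",
            postgresql_using="hnsw",
            postgresql_with={"m": 16, "ef_construction": 64},
            postgresql_ops={"embedding": "vector_cosine_ops"},
        ),
    )
```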

Setup & Utilities:

  • scripts/setup_db.py: Database initialization with pgvector (sketched after this list)
  • scripts/seed_data.py: Generate realistic B2B demo data
  • Seed data includes embeddings for semantic search demo
  • Environment configuration (.env.example)
  • Comprehensive .gitignore
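
The database-initialization step typically boils down to enabling the extension before any Vector columns exist. A hedged sketch follows; the connection URL and workflow are assumptions, not the script's actual contents.

```python
import asyncio

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine


async def init_db(url: str) -> None:
    engine = create_async_engine(url)
    async with engine.begin() as conn:
        # The vector extension must exist before any Vector columns can be created.
        await conn.execute(text("CREATE EXTENSION IF NOT EXISTS vector"))
    await engine.dispose()
    # Table creation would then run via Alembic ("alembic upgrade head") or
    # Base.metadata.create_all, depending on the chosen workflow.


if __name__ == "__main__":
    asyncio.run(init_db("postgresql+asyncpg://postgres:postgres@localhost:5432/agents"))
```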

Documentation:

  • README.md: Platform overview and quick start
  • ARCHITECTURE.md: Detailed system architecture
  • QUICK_START.md: 5-minute setup guide
  • docs/RESEARCH_SUMMARY.md: Complete research findings
    • PydanticAI vs alternatives analysis
    • Tool calling best practices
    • Evaluation patterns
    • Cost tracking approaches
    • Multi-tenant patterns
    • Community feedback and reviews

Key Features Demonstrated:

  • Type-safe agent development patterns
  • Multi-tenant B2B data modeling
  • Semantic search with pgvector
  • Evaluation framework design (dataset sketch after this list)
  • Cost and quality monitoring
  • Pipeline versioning concepts
  • VCR-style caching for tests
  • Production-ready patterns
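
For the evaluation-framework item above, a minimal pydantic-evals sketch: a Dataset of Cases, a built-in evaluator, and a task function run with evaluate_sync. The case content and the task are placeholders rather than the platform's real suite.

```python
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import IsInstance


dataset = Dataset(
    cases=[
        Case(
            name="capital_question",
            inputs="What is the capital of France?",
            expected_output="Paris",
        ),
    ],
    evaluators=[IsInstance(type_name="str")],  # built-in check; custom evaluators can be appended
)


async def answer(question: str) -> str:
    # Placeholder task; in the platform this would invoke a PydanticAI agent.
    return "Paris"


report = dataset.evaluate_sync(answer)
report.print(include_input=True, include_output=True)
```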

Ready for Implementation:
The platform provides complete infrastructure for:

  • SQL agents with validation
  • RAG agents with semantic search
  • Multi-step reasoning agents
  • Customer support automation
  • Comprehensive evaluation suite
  • Experiment tracking and comparison

All core infrastructure is in place, following research-backed best practices from official docs, community examples, and production use cases.

Research sources documented in:

  • docs/RESEARCH_SUMMARY.md
  • .claude/skills/*.md

Complete implementation of medical domain agents with HIPAA-aware patterns, comprehensive testing, and custom evaluators.

Medical/Biomedical Schema (models_medical.py):
- Clinical data: Institution, Staff, Patient, Encounter, Diagnosis, Medication, LabResult
- Research data: ResearchProject, ClinicalTrial, Publication
- All with pgvector embeddings for semantic search
- Audit logging for HIPAA compliance

Medical Tools (tools/medical.py):
- find_patient: Lookup with audit logging and de-identification
- get_patient_diagnoses: ICD-10 coded diagnosis retrieval
- get_patient_medications: Active/historical medication lists
- get_lab_results: Lab values with abnormal flags
- search_patients_by_diagnosis: Cohort discovery by ICD-10
- get_encounter_summary: Visit history
- Multi-tenant isolation via institution_id (tool sketch after this list)
- All access logged for compliance
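
A hedged sketch of the tenant-isolation and audit-logging pattern these tools share. The import path, the Deps fields, the AuditLog constructor, and the de-identified fields returned are assumptions about models_medical.py and tools/medical.py, not their actual code.

```python
from dataclasses import dataclass

from pydantic_ai import RunContext
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from models_medical import AuditLog, Patient  # assumed import path and class names


@dataclass
class MedicalDeps:
    session: AsyncSession
    institution_id: int  # tenant scope applied to every query
    staff_id: int        # staff member whose access is audited


async def find_patient(ctx: RunContext[MedicalDeps], mrn: str) -> dict:
    """Tool: look up a patient by MRN, scoped to the caller's institution and audit-logged."""
    stmt = (
        select(Patient)
        .where(Patient.mrn == mrn)
        .where(Patient.institution_id == ctx.deps.institution_id)  # multi-tenant isolation
    )
    patient = (await ctx.deps.session.execute(stmt)).scalar_one_or_none()
    # Record the PHI access before returning anything to the model (field names assumed).
    ctx.deps.session.add(AuditLog(staff_id=ctx.deps.staff_id, action="find_patient", detail=mrn))
    await ctx.deps.session.commit()
    if patient is None:
        return {"found": False}
    # Return only de-identified fields to the LLM.
    return {"found": True, "age_years": patient.age_years, "sex": patient.sex}
```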

Research Tools (tools/research.py):
- search_publications: Semantic literature search
- search_clinical_trials: Trial discovery with phase/status filters
- find_publications_by_author: Author-based search
- get_research_projects: Active research tracking
- get_trial_enrollment_stats: Enrollment metrics
- Vector similarity search using pgvector (query sketched below)
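
The semantic search these tools rely on reduces to ordering by a pgvector distance operator. A sketch follows, assuming a Publication model with an embedding column and an injected embedding service; the Deps shape and import path are assumptions.

```python
from dataclasses import dataclass
from typing import Awaitable, Callable

from pydantic_ai import RunContext
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from models_medical import Publication  # assumed import path and model name


@dataclass
class ResearchDeps:
    session: AsyncSession
    embed: Callable[[str], Awaitable[list[float]]]  # embedding service, assumed interface


async def search_publications(ctx: RunContext[ResearchDeps], query: str, limit: int = 5) -> list[dict]:
    """Tool: return the publications most semantically similar to the query text."""
    query_embedding = await ctx.deps.embed(query)
    stmt = (
        select(Publication)
        # pgvector's SQLAlchemy comparator; lower cosine distance means more similar.
        .order_by(Publication.embedding.cosine_distance(query_embedding))
        .limit(limit)
    )
    rows = (await ctx.deps.session.execute(stmt)).scalars().all()
    return [{"title": p.title, "doi": p.doi, "pubmed_id": p.pubmed_id} for p in rows]
```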

Medical Records Agent (medical_agent.py):
- Patient lookup and demographics
- Diagnosis review with ICD-10 codes
- Medication analysis
- Lab result interpretation
- Encounter history summaries
- HIPAA-compliant with audit logging
- PHI access flagging
- Confidence scoring (output-model sketch after this list)
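
Structured output is what makes the confidence scoring and PHI flagging enforceable. Below is a sketch of how the agent's output model might look; the field names, model string, and prompt are assumptions, not the agent's actual definition.

```python
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class MedicalAnswer(BaseModel):
    summary: str
    phi_accessed: bool = Field(description="True if any protected health information was read")
    confidence: float = Field(ge=0.0, le=1.0, description="Self-reported answer confidence")


medical_agent = Agent(
    "openai:gpt-4o",
    output_type=MedicalAnswer,  # `result_type` on older pydantic-ai releases
    system_prompt=(
        "You are a medical records assistant. Use the provided tools, "
        "never reveal direct identifiers, and report your confidence."
    ),
)
```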

Biomedical Research Agent (research_agent.py):
- Semantic publication search
- Clinical trial discovery
- Author-based searches
- Research project tracking
- Proper academic citations (DOI, PubMed ID)
- Research area classification

Custom Evaluators (evaluators/medical_evaluators.py):
- PHILeakageEvaluator: Detects SSN, phone number, name, and DOB patterns (sketched after this list)
- MedicalAccuracyEvaluator: Validates ICD-10 codes, terminology
- AuditComplianceEvaluator: Verifies HIPAA audit logging
- ResponseCompletenessEvaluator: Answer quality and structure
- CostBudgetEvaluator: Cost efficiency tracking
- LatencyEvaluator: Response time requirements
- ResearchCitationEvaluator: Academic citation quality
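
A minimal custom evaluator in the spirit of PHILeakageEvaluator, built on pydantic-evals' Evaluator base class; the regexes and the pass/fail contract shown here are illustrative, not the repository's implementation.

```python
import re
from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")


@dataclass
class PHILeakage(Evaluator):
    """Passes (returns True) only when the agent output contains no obvious PHI patterns."""

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        text = str(ctx.output)
        return not (SSN_RE.search(text) or PHONE_RE.search(text))
```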

Comprehensive Test Suite:
- test_medical_agent.py: 15+ tests for clinical workflows
  * Patient lookup, diagnosis, medications, labs
  * Comprehensive summaries
  * PHI leakage prevention
  * Medical accuracy validation
  * Edge cases and error handling
- test_research_agent.py: Publication and trial search tests
  * Semantic search validation
  * Citation quality checks
  * Research area classification
- conftest.py: pytest configuration
  * VCR setup for LLM response caching (conftest sketch after this list)
  * Mock dependency fixtures
  * Sample data fixtures (MRNs, ICD-10 codes)
  * Evaluator fixtures
- All tests use pytest-recording for deterministic execution
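
The deterministic-execution piece typically looks like the sketch below: pytest-recording reads a vcr_config fixture and the vcr marker, records real LLM traffic once, and replays it on later runs. The fixture names, import path, and example assertion are assumptions, not conftest.py's actual contents.

```python
import pytest

from medical_agent import medical_agent  # assumed import path


@pytest.fixture(scope="module")
def vcr_config():
    # pytest-recording uses this fixture to configure the underlying VCR cassettes.
    return {
        "filter_headers": ["authorization", "x-api-key"],  # never record secrets
        "record_mode": "once",  # record on the first run, replay afterwards
    }


@pytest.mark.vcr
def test_patient_lookup_is_deterministic(medical_agent_deps):
    # medical_agent_deps is an assumed fixture providing the agent's dependencies.
    result = medical_agent.run_sync("Summarize patient MRN-0001", deps=medical_agent_deps)
    # Assumes a structured output with a phi_accessed flag, as sketched earlier.
    assert result.output.phi_accessed is True
```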

Key Innovations:
- HIPAA-aware agent design with audit logging
- Multi-tenant security with institution_id enforcement
- PHI detection and de-identification
- Semantic search for medical data (encounters, publications, trials)
- Privacy-first evaluation framework
- Deterministic medical tests with VCR caching
- ICD-10 and LOINC code support

Documentation:
- MEDICAL_IMPLEMENTATION.md: Complete implementation guide
  * Detailed feature documentation
  * HIPAA compliance patterns
  * Use case examples
  * Testing strategies
  * Important disclaimers

Ready for:
- Medical records processing
- Biomedical literature review
- Clinical trial discovery
- Population health analysis
- Clinical decision support (with human oversight)

IMPORTANT: This is a demonstration/research platform.
Production medical systems require proper HIPAA certification,
security audits, and regulatory approval.

KyleKing changed the title from "Research tool calling with pydantic-ai" to "feat: research tool calling with pydantic-ai" on Nov 30, 2025.