feat: research tool calling with pydantic-ai #4
Open

KyleKing wants to merge 2 commits into yak-shears-py from claude/research-pydantic-ai-tools-01EayLiuNnxcT2RKK7srFy3b
Conversation
Research Phase:
- Thoroughly researched pydantic-ai and pydantic-evals
- Analyzed official docs, GitHub examples, HackerNews discussions, blog posts
- Compared frameworks (PydanticAI vs LangChain vs Instructor)
- Investigated tool calling patterns, SQL agents, RAG architectures
- Studied evaluation methodologies and testing patterns
- Researched pgvector integration, cost tracking, observability

Claude Skills Created:
- .claude/skills/pydantic-ai.md: Complete PydanticAI guide
  * Agent definition and execution patterns
  * Dependency injection with RunContext
  * Tool calling best practices
  * SQL generation and validation
  * Observability with Logfire
  * Framework comparisons
  * Production-ready patterns
- .claude/skills/pydantic-evals.md: Evaluation framework guide
  * Dataset and case management
  * Built-in and custom evaluators
  * pytest integration patterns
  * VCR caching for deterministic tests
  * Versioning and regression testing
  * Cost tracking and quality metrics
  * A/B testing patterns

Research Platform (agent-research-platform/):
- Comprehensive architecture for agentic LLM research
- Python 3.12 with uv dependency management
- PostgreSQL 16 + pgvector for semantic search
- Multi-tenant B2B schema with vector embeddings

Core Infrastructure:
- Database models (B2B SaaS: Organizations → Users, Customers, Products, Orders, Support Tickets)
- Vector embeddings for products, documents, support tickets
- HNSW indexes for fast similarity search
- Alembic migrations with pgvector support
- Session management with async SQLAlchemy 2.0
- Embedding service with semantic search capabilities
- Configuration management with pydantic-settings

Setup & Utilities:
- scripts/setup_db.py: Database initialization with pgvector
- scripts/seed_data.py: Generate realistic B2B demo data
- Seed data includes embeddings for semantic search demo
- Environment configuration (.env.example)
- Comprehensive .gitignore

Documentation:
- README.md: Platform overview and quick start
- ARCHITECTURE.md: Detailed system architecture
- QUICK_START.md: 5-minute setup guide
- docs/RESEARCH_SUMMARY.md: Complete research findings
  * PydanticAI vs alternatives analysis
  * Tool calling best practices
  * Evaluation patterns
  * Cost tracking approaches
  * Multi-tenant patterns
  * Community feedback and reviews

Key Features Demonstrated:
- Type-safe agent development patterns
- Multi-tenant B2B data modeling
- Semantic search with pgvector
- Evaluation framework design
- Cost and quality monitoring
- Pipeline versioning concepts
- VCR-style caching for tests
- Production-ready patterns

Ready for Implementation:
The platform provides complete infrastructure for:
- SQL agents with validation
- RAG agents with semantic search
- Multi-step reasoning agents
- Customer support automation
- Comprehensive evaluation suite
- Experiment tracking and comparison

All core infrastructure is in place, following research-backed best practices from official docs, community examples, and production use cases.

Research sources documented in:
- docs/RESEARCH_SUMMARY.md
- .claude/skills/*.md
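To make the semantic-search design concrete: the platform stores vectors in PostgreSQL with pgvector and HNSW indexes, but the core ranking idea can be sketched in plain Python. This is an illustrative in-memory linear scan, not the platform's embedding service; the function names and toy 3-dimensional "embeddings" are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vec: list[float],
                    corpus: list[tuple[str, list[float]]],
                    top_k: int = 3) -> list[tuple[str, float]]:
    """Rank documents by similarity to the query embedding.

    In the platform this would be a pgvector distance query served by an
    HNSW index; here it is a brute-force scan over (doc_id, embedding) pairs.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy corpus with 3-dimensional "embeddings" (real ones have hundreds of dims)
corpus = [
    ("prod-1", [1.0, 0.0, 0.0]),
    ("prod-2", [0.0, 1.0, 0.0]),
    ("prod-3", [0.9, 0.1, 0.0]),
]
print(semantic_search([1.0, 0.0, 0.0], corpus, top_k=2))
# prod-1 ranks first (identical direction), prod-3 second (nearly parallel)
```

An HNSW index trades this O(n) scan for approximate nearest-neighbor lookup, which is why the schema builds those indexes up front.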
Complete implementation of medical domain agents with HIPAA-aware patterns, comprehensive testing, and custom evaluators.

Medical/Biomedical Schema (models_medical.py):
- Clinical data: Institution, Staff, Patient, Encounter, Diagnosis, Medication, LabResult
- Research data: ResearchProject, ClinicalTrial, Publication
- All with pgvector embeddings for semantic search
- Audit logging for HIPAA compliance

Medical Tools (tools/medical.py):
- find_patient: Lookup with audit logging and de-identification
- get_patient_diagnoses: ICD-10 coded diagnosis retrieval
- get_patient_medications: Active/historical medication lists
- get_lab_results: Lab values with abnormal flags
- search_patients_by_diagnosis: Cohort discovery by ICD-10
- get_encounter_summary: Visit history
- Multi-tenant isolation via institution_id
- All access logged for compliance

Research Tools (tools/research.py):
- search_publications: Semantic literature search
- search_clinical_trials: Trial discovery with phase/status filters
- find_publications_by_author: Author-based search
- get_research_projects: Active research tracking
- get_trial_enrollment_stats: Enrollment metrics
- Vector similarity search using pgvector

Medical Records Agent (medical_agent.py):
- Patient lookup and demographics
- Diagnosis review with ICD-10 codes
- Medication analysis
- Lab result interpretation
- Encounter history summaries
- HIPAA-compliant with audit logging
- PHI access flagging
- Confidence scoring

Biomedical Research Agent (research_agent.py):
- Semantic publication search
- Clinical trial discovery
- Author-based searches
- Research project tracking
- Proper academic citations (DOI, PubMed ID)
- Research area classification

Custom Evaluators (evaluators/medical_evaluators.py):
- PHILeakageEvaluator: Detects SSN, phone, names, DOB patterns
- MedicalAccuracyEvaluator: Validates ICD-10 codes, terminology
- AuditComplianceEvaluator: Verifies HIPAA audit logging
- ResponseCompletenessEvaluator: Answer quality and structure
- CostBudgetEvaluator: Cost efficiency tracking
- LatencyEvaluator: Response time requirements
- ResearchCitationEvaluator: Academic citation quality

Comprehensive Test Suite:
- test_medical_agent.py: 15+ tests for clinical workflows
  * Patient lookup, diagnosis, medications, labs
  * Comprehensive summaries
  * PHI leakage prevention
  * Medical accuracy validation
  * Edge cases and error handling
- test_research_agent.py: Publication and trial search tests
  * Semantic search validation
  * Citation quality checks
  * Research area classification
- conftest.py: pytest configuration
  * VCR setup for LLM response caching
  * Mock dependency fixtures
  * Sample data fixtures (MRNs, ICD-10 codes)
  * Evaluator fixtures
- All tests use pytest-recording for deterministic execution

Key Innovations:
- HIPAA-aware agent design with audit logging
- Multi-tenant security with institution_id enforcement
- PHI detection and de-identification
- Semantic search for medical data (encounters, publications, trials)
- Privacy-first evaluation framework
- Deterministic medical tests with VCR caching
- ICD-10 and LOINC code support

Documentation:
- MEDICAL_IMPLEMENTATION.md: Complete implementation guide
  * Detailed feature documentation
  * HIPAA compliance patterns
  * Use case examples
  * Testing strategies
  * Important disclaimers

Ready for:
- Medical records processing
- Biomedical literature review
- Clinical trial discovery
- Population health analysis
- Clinical decision support (with human oversight)

IMPORTANT: This is a demonstration/research platform. Production medical systems require proper HIPAA certification, security audits, and regulatory approval.
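The kind of check a PHILeakageEvaluator performs can be sketched with plain regex matching over an agent's response text. The patterns and function names below are illustrative assumptions, not the actual rules in evaluators/medical_evaluators.py (which also covers names, and would integrate with pydantic-evals rather than return a bare float).

```python
import re

# Illustrative PHI patterns: US-style SSN, phone number, and MM/DD/YYYY dates.
# The real evaluator's pattern set may differ.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "dob": re.compile(r"\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/(?:19|20)\d{2}\b"),
}

def detect_phi(text: str) -> dict[str, list[str]]:
    """Return every PHI-like match in an agent response, keyed by pattern name."""
    hits = {name: pat.findall(text) for name, pat in PHI_PATTERNS.items()}
    return {name: matches for name, matches in hits.items() if matches}

def phi_leakage_score(text: str) -> float:
    """Pass/fail style score: 1.0 when no PHI leaked, 0.0 on any match."""
    return 0.0 if detect_phi(text) else 1.0

print(detect_phi("Patient SSN is 123-45-6789, DOB 03/15/1987."))
print(phi_leakage_score("The patient has hypertension (ICD-10 I10)."))
```

A pass/fail score like this composes naturally with the rest of the evaluator suite: a response that leaks any identifier fails outright, while accuracy and completeness evaluators grade the surviving responses.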