AI Psychiatrist

LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews



Overview

AI Psychiatrist implements a research paper's methodology for automated depression assessment using a four-agent LLM pipeline. The system analyzes clinical interview transcripts to selectively infer PHQ-8 item scores when supported by transcript evidence, abstaining (N/A) when evidence is insufficient.

Task validity note: PHQ-8 is a 2-week frequency self-report instrument, while DAIC-WOZ transcripts are not structured as PHQ administration. Transcript-only item scoring is often underdetermined; interpret results with coverage-aware metrics (AURC/AUGRC). See docs/clinical/task-validity.md.
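The coverage-aware idea can be illustrated with a small sketch (not the repo's actual metrics code): sort predictions by confidence, trace risk (mean error) as coverage grows from the most confident item to all items, and take the area under that risk-coverage curve (AURC).

```python
def risk_coverage_curve(confidences, losses):
    """Risk (mean loss) at each coverage level, highest-confidence items first."""
    order = sorted(range(len(losses)), key=lambda i: -confidences[i])
    risks, total = [], 0.0
    for k, i in enumerate(order, start=1):
        total += losses[i]
        risks.append(total / k)  # mean loss over the k most confident items
    return risks

def aurc(confidences, losses):
    """Area under the risk-coverage curve (mean of per-coverage risks)."""
    risks = risk_coverage_curve(confidences, losses)
    return sum(risks) / len(risks)
```

A model that abstains on its least confident items trades coverage for lower risk; the curve makes that trade-off explicit.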

Key Features

  • Four-Agent Pipeline: Qualitative, Judge, Quantitative, and Meta-Review agents collaborate for comprehensive assessment
  • Embedding-Based Few-Shot Learning: Paper reports 22% lower item-level MAE vs zero-shot (0.796 → 0.619, Section 3.2); this repo tracks coverage-adjusted metrics (AURC/AUGRC/Cmax) in run artifacts
  • Iterative Self-Refinement: Judge agent feedback loop improves assessment quality
  • Engineering-Focused: Clean architecture, strict type checking, structured logging, 80%+ test coverage
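The self-refinement loop above can be sketched as follows. This is a minimal illustration with hypothetical `judge` and `reviser` callables, not the repo's agent interfaces; the threshold of 3 and cap of 10 iterations match the feedback-loop settings in the Configuration section (Section 2.3.1 of the paper).

```python
def refine(draft, judge, reviser, score_threshold=3, max_iterations=10):
    """Iteratively revise an assessment until the judge's score meets the
    threshold or the iteration cap is reached."""
    for _ in range(max_iterations):
        score, feedback = judge(draft)    # judge returns (score, feedback)
        if score >= score_threshold:
            break
        draft = reviser(draft, feedback)  # incorporate judge feedback
    return draft
```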

Paper Reference

Greene et al., "AI Psychiatrist Assistant: An LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews." OpenReview.

Clinical disclaimer: This repository is a research/engineering implementation intended for paper reproduction and experimentation. It is not a medical device and should not be used for clinical diagnosis or treatment decisions.


Quick Start

Prerequisites

  • Python 3.11+
  • Ollama installed and running
  • 16GB+ RAM (for 27B models)

Installation

# Clone repository
git clone https://github.com/The-Obstacle-Is-The-Way/ai-psychiatrist.git
cd ai-psychiatrist

# Install dependencies (uses uv)
make dev  # installs dev + docs + HuggingFace (recommended)

# Pull required models
ollama pull gemma3:27b-it-qat  # or gemma3:27b
ollama pull qwen3-embedding:8b

# Configure (uses validated baseline configuration)
cp .env.example .env

# Start server
make serve

Note (Embeddings backend): Chat and embeddings can use different backends:

  • LLM_BACKEND controls chat for agents (default: ollama)
  • EMBEDDING_BACKEND controls embeddings (default: huggingface)

For a pure-Ollama setup (no HuggingFace dependencies), set EMBEDDING_BACKEND=ollama in .env.

Why HuggingFace dependencies matter even when reference embeddings exist: few-shot retrieval embeds the query (participant evidence) at runtime in the same embedding space as the references. If EMBEDDING_BACKEND=huggingface, you still need the HF dependencies (make dev) to compute query embeddings, even when the reference *.npz files are already present.
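The retrieval step amounts to embedding the query and taking the top-k reference chunks by cosine similarity. A pure-Python sketch of that selection (the actual service uses the configured embedding backend and the NPZ artifacts; EMBEDDING_TOP_K_REFERENCES=2 is the default):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_references(query_vec, reference_vecs, k=2):
    """Indices of the k reference embeddings most similar to the query."""
    ranked = sorted(range(len(reference_vecs)),
                    key=lambda i: cosine(query_vec, reference_vecs[i]),
                    reverse=True)
    return ranked[:k]
```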

Optional (Appendix F): The paper evaluates MedGemma 27B as an alternative model for the quantitative agent. There is no official MedGemma model in the Ollama library; use the HuggingFace backend (make dev, LLM_BACKEND=huggingface, MODEL_QUANTITATIVE_MODEL=medgemma:27b) to load the official gated weights.

Run Your First Assessment

curl -X POST http://localhost:8000/full_pipeline \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_text": "Ellie: How are you doing today?\nParticipant: I have been feeling really down lately."
  }'
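The same request from Python using only the standard library (assumes the server started by make serve is listening on localhost:8000):

```python
import json
from urllib import request

def build_request(transcript_text, base_url="http://localhost:8000"):
    """Build the POST request for the /full_pipeline endpoint."""
    payload = json.dumps({"transcript_text": transcript_text}).encode("utf-8")
    return request.Request(
        f"{base_url}/full_pipeline",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_full_pipeline(transcript_text):
    """Send the request and return the parsed JSON response."""
    with request.urlopen(build_request(transcript_text)) as resp:
        return json.loads(resp.read())
```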

Documentation

  • Quickstart: Get running in 5 minutes
  • Architecture: System design and layers
  • Pipeline: How the four-agent pipeline works
  • PHQ-8: Understanding depression assessment
  • Configuration: All configuration options
  • API Reference: REST API documentation
  • Glossary: Terms and definitions
  • Reproduction Results: Current-state reproduction summary
  • Run History: Canonical timeline and per-run statistics

For Developers

  • CLAUDE.md: Development guidelines
  • Specs: Specs index (implemented specs are distilled into canonical docs)
  • Data Schema: Dataset format documentation

Project Structure

ai-psychiatrist/
├── src/ai_psychiatrist/
│   ├── agents/           # Four assessment agents
│   ├── domain/           # Entities, enums, value objects
│   ├── services/         # Business logic (feedback loop, embeddings)
│   ├── infrastructure/   # Ollama client, logging
│   └── config.py         # Pydantic settings
├── tests/
│   ├── unit/             # Unit tests
│   ├── integration/      # Integration tests
│   └── e2e/              # End-to-end tests
├── docs/
│   ├── getting-started/  # Quickstart and tutorials
│   ├── configs/          # Configuration reference + philosophy
│   ├── embeddings/       # Embeddings + retrieval documentation
│   ├── pipeline-internals/ # Feature wiring and internals
│   ├── preflight-checklist/ # Run checklists (zero-shot / few-shot)
│   ├── results/          # Reproduction results + run history
│   ├── statistics/       # Metrics + evaluation methodology (AURC/AUGRC)
│   └── _specs/           # Specs index (implemented specs distilled into docs)
└── data/                 # DAIC-WOZ dataset (gitignored)

Development

# Full CI pipeline
make ci

# Individual commands
make test           # Run all tests with coverage
make test-unit      # Fast unit tests only
make lint-fix       # Auto-fix linting issues
make typecheck      # mypy strict mode
make format         # Format code with ruff

# Development server with hot reload
make serve

Testing with Real Ollama

# Enable Ollama integration tests
AI_PSYCHIATRIST_OLLAMA_TESTS=1 make test-e2e

Configuration

All settings via environment variables or .env file:

# Models (recommended defaults; see `.env.example`)
MODEL_QUALITATIVE_MODEL=gemma3:27b-it-qat  # or gemma3:27b
MODEL_QUANTITATIVE_MODEL=gemma3:27b-it-qat  # or gemma3:27b
MODEL_EMBEDDING_MODEL=qwen3-embedding:8b

# Backends (chat vs embeddings)
LLM_BACKEND=ollama
EMBEDDING_BACKEND=huggingface

# Few-shot retrieval (Appendix D optimal)
EMBEDDING_DIMENSION=4096
EMBEDDING_CHUNK_SIZE=8
EMBEDDING_TOP_K_REFERENCES=2

# Reference embeddings selection (NPZ + JSON sidecar)
# Default: FP16 HuggingFace embeddings (paper-train)
EMBEDDING_EMBEDDINGS_FILE=huggingface_qwen3_8b_paper_train_participant_only
# Transcript source must match how embeddings were built
DATA_TRANSCRIPTS_DIR=data/transcripts_participant_only
# Alternative: legacy Ollama embeddings (paper-train)
# EMBEDDING_EMBEDDINGS_FILE=paper_reference_embeddings
# DATA_EMBEDDINGS_PATH=/absolute/or/relative/path/to/artifact.npz  # full-path override

# Chunk scoring (Spec 35; requires {name}.chunk_scores.json sidecar)
EMBEDDING_REFERENCE_SCORE_SOURCE=chunk

# Feedback loop (Section 2.3.1)
FEEDBACK_MAX_ITERATIONS=10
FEEDBACK_SCORE_THRESHOLD=3

# Appendix F (optional): use official MedGemma via HuggingFace backend
# LLM_BACKEND=huggingface
# MODEL_QUANTITATIVE_MODEL=medgemma:27b

See Configuration Reference for all options.
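Settings resolve with the usual precedence: process environment over .env entries over built-in defaults. A minimal stand-alone sketch of that precedence (an illustration, not the repo's Pydantic-based loader):

```python
import os

def load_settings(dotenv_text, defaults):
    """Resolve settings: os.environ wins over .env entries, which win over defaults."""
    settings = dict(defaults)
    for line in dotenv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    for key in settings:
        if key in os.environ:  # environment overrides everything
            settings[key] = os.environ[key]
    return settings
```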


Paper Results (Reported)

From the paper:

  • Quantitative (PHQ-8 item scoring, 0–3 per item): MAE 0.796 (zero-shot) vs 0.619 (few-shot)
  • Appendix F (optional): MedGemma 27B MAE 0.505, with lower coverage (“fewer predictions overall”)
  • Meta-review (binary classification): 78% accuracy (comparable to the human expert)

Note: The MAE values are item-level (per PHQ-8 item) and exclude items marked “N/A”.
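That exclusion can be made concrete with a short sketch (an illustration, not the repo's evaluation code): predictions are per-item 0-3 scores, with None standing in for N/A abstentions.

```python
def item_level_mae(predictions, gold):
    """Mean absolute error over PHQ-8 items, skipping abstained (None) predictions."""
    pairs = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    if not pairs:
        return None  # no covered items
    return sum(abs(p - g) for p, g in pairs) / len(pairs)

def coverage(predictions):
    """Fraction of items with a numeric prediction (not N/A)."""
    return sum(p is not None for p in predictions) / len(predictions)
```

Because abstentions shrink the denominator, a lower MAE at lower coverage (as with MedGemma in Appendix F) is not directly comparable to a higher-coverage result; hence the coverage-aware metrics.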


Technology Stack

  • uv: Package management
  • Ollama: Local LLM inference
  • FastAPI: REST API
  • Pydantic v2: Configuration and validation
  • structlog: Structured logging
  • pytest: Testing
  • Ruff: Linting and formatting
  • mypy: Type checking

License

Licensed under Apache 2.0. See LICENSE and NOTICE.

This repository is a clean-room, production-grade reimplementation of the paper’s method. It does not distribute the DAIC-WOZ dataset. The original research code is referenced in the paper under “Data and Code Availability”.


Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes following CLAUDE.md guidelines
  4. Run make ci to verify
  5. Submit a pull request