LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews
AI Psychiatrist implements a research paper's methodology for automated depression assessment using a four-agent LLM pipeline. The system analyzes clinical interview transcripts to selectively infer PHQ-8 item scores when supported by transcript evidence, abstaining (N/A) when evidence is insufficient.
Task validity note: PHQ-8 is a 2-week frequency self-report instrument, while DAIC-WOZ transcripts are not structured as a PHQ administration. Transcript-only item scoring is often underdetermined; interpret results with coverage-aware metrics (AURC/AUGRC). See docs/clinical/task-validity.md.
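To make the coverage-aware framing concrete, here is a minimal sketch of the standard AURC (area under the risk-coverage curve) definition for selective prediction; this is illustrative only, not the repo's actual evaluation code, which may differ in details such as tie handling:

```python
def aurc(confidences, losses):
    """Area under the risk-coverage curve: lower is better.

    Sort predictions from most to least confident; at coverage c = k/n,
    risk is the mean loss over the k most confident predictions.
    AURC averages that risk across all coverage levels.
    """
    order = sorted(range(len(losses)), key=lambda i: -confidences[i])
    risks, total = [], 0.0
    for k, i in enumerate(order, start=1):
        total += losses[i]
        risks.append(total / k)
    return sum(risks) / len(risks)

# A model that is confident exactly where it is correct scores lower
print(aurc([0.9, 0.8, 0.2], [0, 0, 1]))  # good ranking
print(aurc([0.2, 0.8, 0.9], [0, 0, 1]))  # bad ranking (confident but wrong)
```

A model that abstains (N/A) on hard items effectively trades coverage for risk, which is exactly what this curve captures.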
- Four-Agent Pipeline: Qualitative, Judge, Quantitative, and Meta-Review agents collaborate for comprehensive assessment
- Embedding-Based Few-Shot Learning: Paper reports 22% lower item-level MAE vs zero-shot (0.796 → 0.619, Section 3.2); this repo tracks coverage-adjusted metrics (AURC/AUGRC/Cmax) in run artifacts
- Iterative Self-Refinement: Judge agent feedback loop improves assessment quality
- Engineering-Focused: Clean architecture, strict type checking, structured logging, 80%+ test coverage
Greene et al., "AI Psychiatrist Assistant: An LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews" (OpenReview)
Clinical disclaimer: This repository is a research/engineering implementation intended for paper reproduction and experimentation. It is not a medical device and should not be used for clinical diagnosis or treatment decisions.
- Python 3.11+
- Ollama installed and running
- 16GB+ RAM (for 27B models)
```bash
# Clone repository
git clone https://github.com/The-Obstacle-Is-The-Way/ai-psychiatrist.git
cd ai-psychiatrist

# Install dependencies (uses uv)
make dev  # installs dev + docs + HuggingFace (recommended)

# Pull required models
ollama pull gemma3:27b-it-qat   # or gemma3:27b
ollama pull qwen3-embedding:8b

# Configure (uses validated baseline configuration)
cp .env.example .env

# Start server
make serve
```

Note (embeddings backend): chat and embeddings can use different backends:

- `LLM_BACKEND` controls chat for agents (default: `ollama`)
- `EMBEDDING_BACKEND` controls embeddings (default: `huggingface`)

If you want a pure-Ollama setup (no HuggingFace dependencies), set `EMBEDDING_BACKEND=ollama` in `.env`.
Why HF deps matter even if embeddings exist: few-shot retrieval embeds the query (participant evidence) at runtime in the same embedding space. If `EMBEDDING_BACKEND=huggingface`, you still need HF deps (`make dev`) to compute query embeddings, even when reference `*.npz` files are already present.
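The "same embedding space" requirement can be sketched as a plain cosine-similarity lookup; the function below is illustrative (the repo's retrieval service and its names are not shown here), but it shows why the query must be embedded with the same backend that produced the stored reference vectors:

```python
import math

def top_k_references(query, refs, k=2):
    """Rank reference chunks by cosine similarity to the query embedding.

    The query vector must live in the SAME space as the stored references;
    mixing embedding backends would make these similarities meaningless.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    ranked = sorted(range(len(refs)), key=lambda i: cosine(query, refs[i]), reverse=True)
    return ranked[:k]

# Toy 4-dim vectors (the real pipeline uses 4096-dim Qwen3 embeddings)
refs = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0]]
query = [1.0, 0.05, 0.0, 0.0]
print(top_k_references(query, refs, k=2))  # → [0, 2]
```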
Optional (Appendix F): The paper evaluates MedGemma 27B as an alternative model for the quantitative agent. There is no official MedGemma model in the Ollama library; use the HuggingFace backend (`make dev`, `LLM_BACKEND=huggingface`, `MODEL_QUANTITATIVE_MODEL=medgemma:27b`) to load the official gated weights.
```bash
curl -X POST http://localhost:8000/full_pipeline \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_text": "Ellie: How are you doing today?\nParticipant: I have been feeling really down lately."
  }'
```

| Document | Description |
|---|---|
| Quickstart | Get running in 5 minutes |
| Architecture | System design and layers |
| Pipeline | How the 4-agent pipeline works |
| PHQ-8 | Understanding depression assessment |
| Configuration | All configuration options |
| API Reference | REST API documentation |
| Glossary | Terms and definitions |
| Reproduction Results | Current-state reproduction summary |
| Run History | Canonical timeline + per-run statistics |
| Document | Description |
|---|---|
| CLAUDE.md | Development guidelines |
| Specs | Specs index (implemented specs are distilled into canonical docs) |
| Data Schema | Dataset format documentation |
```
ai-psychiatrist/
├── src/ai_psychiatrist/
│   ├── agents/              # Four assessment agents
│   ├── domain/              # Entities, enums, value objects
│   ├── services/            # Business logic (feedback loop, embeddings)
│   ├── infrastructure/      # Ollama client, logging
│   └── config.py            # Pydantic settings
├── tests/
│   ├── unit/                # Unit tests
│   ├── integration/         # Integration tests
│   └── e2e/                 # End-to-end tests
├── docs/
│   ├── getting-started/     # Quickstart and tutorials
│   ├── configs/             # Configuration reference + philosophy
│   ├── embeddings/          # Embeddings + retrieval documentation
│   ├── pipeline-internals/  # Feature wiring and internals
│   ├── preflight-checklist/ # Run checklists (zero-shot / few-shot)
│   ├── results/             # Reproduction results + run history
│   ├── statistics/          # Metrics + evaluation methodology (AURC/AUGRC)
│   └── _specs/              # Specs index (implemented specs distilled into docs)
└── data/                    # DAIC-WOZ dataset (gitignored)
```
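The iterative self-refinement between the Qualitative and Judge agents (Section 2.3.1) can be sketched roughly as follows; the function and agent names here are illustrative stand-ins, not the repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    score: int      # Judge's quality rating for the assessment
    feedback: str   # Suggestions the Qualitative agent should address

def run_feedback_loop(assessment, judge, refine,
                      max_iterations=10, score_threshold=3):
    """Refine a qualitative assessment until the judge is satisfied."""
    for _ in range(max_iterations):
        verdict = judge(assessment)
        if verdict.score >= score_threshold:
            break  # Judge approves; stop refining
        assessment = refine(assessment, verdict.feedback)
    return assessment

# Toy demo: a judge that approves once the assessment cites evidence
toy_judge = lambda a: Verdict(3, "") if "evidence" in a else Verdict(1, "cite evidence")
toy_refine = lambda a, fb: a + " (added evidence)"
print(run_feedback_loop("initial draft", toy_judge, toy_refine))
# → initial draft (added evidence)
```

The defaults (`max_iterations=10`, `score_threshold=3`) mirror the `FEEDBACK_MAX_ITERATIONS` and `FEEDBACK_SCORE_THRESHOLD` settings shown in the configuration section.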
```bash
# Full CI pipeline
make ci

# Individual commands
make test        # Run all tests with coverage
make test-unit   # Fast unit tests only
make lint-fix    # Auto-fix linting issues
make typecheck   # mypy strict mode
make format      # Format code with ruff

# Development server with hot reload
make serve

# Enable Ollama integration tests
AI_PSYCHIATRIST_OLLAMA_TESTS=1 make test-e2e
```

All settings via environment variables or a `.env` file:
```bash
# Models (recommended defaults; see `.env.example`)
MODEL_QUALITATIVE_MODEL=gemma3:27b-it-qat    # or gemma3:27b
MODEL_QUANTITATIVE_MODEL=gemma3:27b-it-qat   # or gemma3:27b
MODEL_EMBEDDING_MODEL=qwen3-embedding:8b

# Backends (chat vs embeddings)
LLM_BACKEND=ollama
EMBEDDING_BACKEND=huggingface

# Few-shot retrieval (Appendix D optimal)
EMBEDDING_DIMENSION=4096
EMBEDDING_CHUNK_SIZE=8
EMBEDDING_TOP_K_REFERENCES=2

# Reference embeddings selection (NPZ + JSON sidecar)
# Default: FP16 HuggingFace embeddings (paper-train)
EMBEDDING_EMBEDDINGS_FILE=huggingface_qwen3_8b_paper_train_participant_only
# Transcript source must match how embeddings were built
DATA_TRANSCRIPTS_DIR=data/transcripts_participant_only

# Alternative: legacy Ollama embeddings (paper-train)
# EMBEDDING_EMBEDDINGS_FILE=paper_reference_embeddings
# DATA_EMBEDDINGS_PATH=/absolute/or/relative/path/to/artifact.npz  # full-path override

# Chunk scoring (Spec 35; requires {name}.chunk_scores.json sidecar)
EMBEDDING_REFERENCE_SCORE_SOURCE=chunk

# Feedback loop (Section 2.3.1)
FEEDBACK_MAX_ITERATIONS=10
FEEDBACK_SCORE_THRESHOLD=3

# Appendix F (optional): use official MedGemma via HuggingFace backend
# LLM_BACKEND=huggingface
# MODEL_QUANTITATIVE_MODEL=medgemma:27b
```

See the Configuration Reference for all options.
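The repo reads these variables through Pydantic v2 settings (`src/ai_psychiatrist/config.py`); as a backend-free illustration of the defaulting behavior (the function name and dict layout below are hypothetical, not the repo's actual settings class), the retrieval knobs resolve like this:

```python
import os

def load_embedding_config(env=os.environ):
    """Resolve the few-shot retrieval knobs, falling back to Appendix D defaults."""
    return {
        "dimension": int(env.get("EMBEDDING_DIMENSION", "4096")),
        "chunk_size": int(env.get("EMBEDDING_CHUNK_SIZE", "8")),
        "top_k": int(env.get("EMBEDDING_TOP_K_REFERENCES", "2")),
    }

# Defaults apply when no .env or environment overrides are present
print(load_embedding_config({}))
# → {'dimension': 4096, 'chunk_size': 8, 'top_k': 2}
```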
From the paper:
- Quantitative (PHQ-8 item scoring, 0–3 per item): MAE 0.796 (zero-shot) vs 0.619 (few-shot)
- Appendix F (optional): MedGemma 27B MAE 0.505, with lower coverage (“fewer predictions overall”)
- Meta-review (binary classification): 78% accuracy (comparable to the human expert)
Note: The MAE values are item-level (per PHQ-8 item) and exclude items marked “N/A”.
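Concretely, item-level MAE with N/A exclusion can be computed as below (a minimal sketch, assuming abstentions are represented as `None`; the repo's metrics code is not shown here):

```python
def item_level_mae(pred, truth):
    """MAE over PHQ-8 items (0-3 each), skipping items the model abstained on.

    Abstentions (None) are excluded from the mean, so MAE must be read
    alongside coverage: a model that answers fewer items can look better.
    """
    pairs = [(p, t) for p, t in zip(pred, truth) if p is not None]
    return sum(abs(p - t) for p, t in pairs) / len(pairs)

# 8 PHQ-8 items; the model abstains (N/A) on two of them
pred  = [1, None, 2, 0, 3, None, 1, 2]
truth = [1, 2,    1, 0, 3, 1,    2, 2]
print(item_level_mae(pred, truth))  # MAE over the 6 scored items (coverage 6/8)
```

This coverage sensitivity is why the MedGemma result above (lower MAE, "fewer predictions overall") is reported with coverage-adjusted metrics in this repo.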
| Tool | Purpose |
|---|---|
| uv | Package management |
| Ollama | Local LLM inference |
| FastAPI | REST API |
| Pydantic v2 | Configuration & validation |
| structlog | Structured logging |
| pytest | Testing |
| Ruff | Linting & formatting |
| mypy | Type checking |
Licensed under Apache 2.0. See LICENSE and NOTICE.
This repository is a clean-room, production-grade reimplementation of the paper’s method. It does not distribute the DAIC-WOZ dataset. The original research code is referenced in the paper under “Data and Code Availability”.
- Fork the repository
- Create a feature branch
- Make changes following CLAUDE.md guidelines
- Run `make ci` to verify
- Submit a pull request