server test coverage (#141)
- server tests with a real server running and real calls to OpenAI, behind pytest flags `--use-server` and `--use-openai` so they don't get autorun in CI
- use a new `CSVHandler` with tempfiles to get results back to the caller in pytest, so you can assert on expected output
- use the `openai_responses` lib for mocking OpenAI
- fix server settings
- update README with settings and pytest instructions
- remove Python 3.8 from supported versions due to use of `asyncio.TaskGroup` in the server
- use VCR.py to mock OpenAI in local (non-server) pytests, replacing custom mock functions
- add test coverage for all skills
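The opt-in flag mechanics themselves are not shown in this diff. A minimal sketch of how such flags are conventionally wired up in a `conftest.py` (only the option names come from the commit message; the marker names and skipping logic are assumptions):

```python
# Hypothetical conftest.py sketch: opt-in pytest flags like --use-server /
# --use-openai. Tests tagged @pytest.mark.use_server / use_openai are skipped
# unless the matching flag is passed, so they never autorun in CI.
import pytest


def pytest_addoption(parser):
    parser.addoption("--use-server", action="store_true", default=False,
                     help="run tests that need a live kafka/redis/celery stack")
    parser.addoption("--use-openai", action="store_true", default=False,
                     help="run tests that call the real OpenAI API")


def pytest_collection_modifyitems(config, items):
    for item in items:
        if "use_server" in item.keywords and not config.getoption("--use-server"):
            item.add_marker(pytest.mark.skip(reason="needs --use-server"))
        if "use_openai" in item.keywords and not config.getoption("--use-openai"):
            item.add_marker(pytest.mark.skip(reason="needs --use-openai"))
```

Tests would then carry `@pytest.mark.use_server` or `@pytest.mark.use_openai` and stay skipped unless the flags are supplied on the command line.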
matt-bernstein authored Jul 3, 2024
1 parent 988aa2f commit bc69ce6
Showing 28 changed files with 20,377 additions and 772 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -54,7 +54,7 @@ jobs:
fail-fast: false
matrix:
os: [ ubuntu-latest, windows-latest ]
- python-version: ['3.8', '3.9', '3.10', '3.11']
+ python-version: ['3.9', '3.10', '3.11']

steps:
- uses: actions/checkout@v4
3 changes: 1 addition & 2 deletions docker-compose.yml
@@ -32,8 +32,7 @@ services:
condition: service_healthy
environment:
- REDIS_URL=redis://redis:6379/0
- - KAFKA_BOOTSTRAP_SERVERS=kafka:9093 # TODO pull from .env
- - KAFKA_RETENTION_MS=180000 # TODO pull from .env
+ - KAFKA_BOOTSTRAP_SERVERS=kafka:9093
command:
["poetry", "run", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
worker:
518 changes: 402 additions & 116 deletions poetry.lock

Large diffs are not rendered by default.

11 changes: 10 additions & 1 deletion pyproject.toml
@@ -12,7 +12,7 @@ classifiers = [
]

[tool.poetry.dependencies]
- python = ">=3.8.8,<3.12"
+ python = ">=3.9,<3.12"
pandas = "*"
openai = "^1.14.3"
guidance = "0.0.64"
@@ -37,6 +37,8 @@ uvicorn = "*"
pydantic-settings = "^2.2.1"
label-studio-sdk = "^0.0.32"
kafka-python = "^2.0.2"
# https://github.com/geerlingguy/ansible-role-docker/issues/462#issuecomment-2144121102
requests = "2.31.0"

[tool.poetry.dev-dependencies]
pytest = "^7.4.3"
@@ -52,6 +54,13 @@ jupyter = "^1.0.0"
jupyterlab = "^4.0.10"
jupyter-client = "8.4.0"
matplotlib = "^3.7.4"
fakeredis = "^2.23.2"
flower = "^2.0.1"
pytest-asyncio = "^0.23.7"
celery = {extras = ["pytest"], version = "^5.4.0"}
openai-responses = "^0.8.1"
pytest-recording = "^0.13.1"
mockafka-py = "^0.1.57"

[tool.poetry.group.label-studio]
optional = true
4 changes: 0 additions & 4 deletions server/.env.example

This file was deleted.

30 changes: 17 additions & 13 deletions server/README.md
@@ -6,6 +6,8 @@ Once running, see `http://localhost:30001/docs` for API documentation.

Run `poetry install` first if you have not already done so.

The default settings in `server.utils.Settings` should be ok for native use. Override any you want in a `server/.env` file or by setting them as environment variables in the shells where the app and workers are running.
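For instance, a `server/.env` might look like the following. This is an illustrative sketch, not a file from the repo: the setting names mirror fields in `server/utils.py` (pydantic-settings matches environment variables to fields case-insensitively), and the values shown are simply the defaults:

```
# server/.env - illustrative overrides (values shown are the defaults)
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_RETENTION_MS=180000
KAFKA_INPUT_CONSUMER_TIMEOUT_MS=1500
```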

## start kafka and redis

```bash
@@ -16,32 +18,20 @@ docker-compose -f docker-compose.native.yml up
## start app

```bash
# set env vars
cp .env.example .env
# set AWS creds (only if S3 access needed)
eval "$(aws configure export-credentials --profile <YOUR_PROFILE> --format env)"
# start app
poetry run uvicorn app:app --host 0.0.0.0 --port 30001
```

## start celery workers

```bash
# set AWS creds (only if S3 access needed)
eval "$(aws configure export-credentials --profile <YOUR_PROFILE> --format env)"
# start celery workers
cd tasks/
poetry run celery -A process_file worker --loglevel=info
```

# run in Docker

```bash
# set AWS creds (only if S3 access needed)
eval "$(aws configure export-credentials --profile <YOUR_PROFILE> --format env)"
```

- Edit the env variables in `docker-compose.yml` if needed - the defaults should be ok. There should be differences from the `.env` file for native use.
+ Edit the env variables in `docker-compose.yml` if needed - the defaults should be ok. Note that they intentionally differ from the defaults in `server.utils.Settings` used for native runs.

```bash
# from repo root
@@ -55,8 +45,22 @@ cd ui/
yarn start
```

## hosted instances

Settings are configured here: https://github.com/HumanSignal/infra/blob/master/base/charts/humansignal/adala/values.yaml

The only significant difference from the local setup, as of now, is that Kafka topic autocreation is enabled as a backup.

# development


## testing

Server pytests in `test/test_server.py` rely on a running server stack (Kafka, Redis, and Celery, but not the app itself), gated behind the `--use-server` flag, and on a real OpenAI key in the `OPENAI_API_KEY` env var, gated behind the `--use-openai` flag. These tests are not run by default. To run them, use:
```bash
poetry run pytest --use-server --use-openai
```

## rebuild on code change

```bash
29 changes: 16 additions & 13 deletions server/app.py
@@ -1,27 +1,23 @@
import logging
import pickle
from enum import Enum
- from typing import Any, Dict, Generic, List, Literal, Optional, TypeVar, Union
+ from typing import Any, Dict, Generic, List, Optional, TypeVar
import os
import json

import fastapi
from adala.agents import Agent
from aiokafka import AIOKafkaProducer
from aiokafka.errors import UnknownTopicOrPartitionError
- from fastapi import HTTPException
+ from fastapi import HTTPException, Depends
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, SerializeAsAny, field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic.functional_validators import AfterValidator
from typing_extensions import Annotated
import uvicorn
from redis import Redis

- from log_middleware import LogMiddleware
- from tasks.process_file import app as celery_app
- from tasks.process_file import streaming_parent_task
- from utils import get_input_topic_name, get_output_topic_name, Settings, delete_topic
+ from server.log_middleware import LogMiddleware
+ from server.tasks.process_file import app as celery_app
+ from server.tasks.process_file import streaming_parent_task
+ from server.utils import get_input_topic_name, get_output_topic_name, Settings, delete_topic
+ from server.handlers.result_handlers import ResultHandler


@@ -258,16 +254,23 @@ async def health():
return {"status": "ok"}


async def _get_redis_conn():
"""
needs to be in a separate function to allow dependency injection for testing
"""
redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
redis_conn = Redis.from_url(redis_url, socket_connect_timeout=1)
return redis_conn


@app.get("/ready")
- async def ready():
+ async def ready(redis_conn: Redis = Depends(_get_redis_conn)):
"""
Check if the app is ready to serve requests.
See if we can reach redis. If not, raise a 500 error. Else, return 200.
"""
try:
- redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
- redis_conn = Redis.from_url(redis_url, socket_connect_timeout=1)
redis_conn.ping()
except Exception as exception:
raise HTTPException(
37 changes: 35 additions & 2 deletions server/handlers/result_handlers.py
@@ -3,6 +3,8 @@
import json
from abc import abstractmethod
from pydantic import BaseModel, Field, computed_field, ConfigDict, model_validator
from pathlib import Path
import csv

from adala.utils.registry import BaseModelInRegistry

@@ -56,7 +58,7 @@ class DummyHandler(ResultHandler):
"""

def __call__(self, batch):
- logger.info(f"\n\nHandler received batch: {batch}\n\n")
+ logger.debug(f"\n\nHandler received batch: {batch}\n\n")


class LSEBatchItem(BaseModel):
@@ -131,7 +133,7 @@ def ready(self):
return self

def __call__(self, result_batch: list[LSEBatchItem]):
- logger.info(f"\n\nHandler received batch: {result_batch}\n\n")
+ logger.debug(f"\n\nHandler received batch: {result_batch}\n\n")

# coerce dicts to LSEBatchItems for validation
result_batch = [LSEBatchItem(**record) for record in result_batch]
@@ -153,3 +155,34 @@ def __call__(self, result_batch: list[LSEBatchItem]):
}
),
)


class CSVHandler(ResultHandler):
"""
Handler to write a batch of results to a CSV file
"""

output_path: str
columns: Optional[list[str]] = None

@model_validator(mode="after")
def write_header(self):
if self.columns is None:
self.columns = list(LSEBatchItem.model_fields.keys())

with open(self.output_path, "w") as f:
writer = csv.DictWriter(f, fieldnames=self.columns)
writer.writeheader()

return self

def __call__(self, result_batch: list[LSEBatchItem]):
logger.debug(f"\n\nHandler received batch: {result_batch}\n\n")

# coerce dicts to LSEBatchItems for validation
result_batch = [LSEBatchItem(**record) for record in result_batch]

# open and write to file
with open(self.output_path, "a") as f:
writer = csv.DictWriter(f, fieldnames=self.columns)
writer.writerows([record.dict() for record in result_batch])
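The tempfile round-trip described in the commit message (the handler appends rows to a CSV owned by the test, and the test reads them back to assert on output) can be sketched with just the stdlib. The field names below are illustrative, not the real `LSEBatchItem` schema:

```python
# Sketch of the CSVHandler write pattern plus the read-back a test would do.
import csv
import tempfile
from pathlib import Path

columns = ["task_id", "output"]  # illustrative fields, not the real schema

with tempfile.TemporaryDirectory() as tmpdir:
    output_path = Path(tmpdir) / "results.csv"

    # what the model_validator does once: write the header up front
    with open(output_path, "w", newline="") as f:
        csv.DictWriter(f, fieldnames=columns).writeheader()

    # what each __call__ does: append a batch of result rows
    with open(output_path, "a", newline="") as f:
        csv.DictWriter(f, fieldnames=columns).writerows(
            [{"task_id": 1, "output": "label-a"}]
        )

    # what the test does: read the results back and assert on them
    with open(output_path, newline="") as f:
        rows = list(csv.DictReader(f))
    assert rows == [{"task_id": "1", "output": "label-a"}]
```

Note that `csv.DictReader` yields strings, so a test asserting on numeric fields has to compare against their string form (or cast first).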
2 changes: 1 addition & 1 deletion server/utils.py
@@ -11,7 +11,7 @@ class Settings(BaseSettings):
https://docs.pydantic.dev/latest/concepts/pydantic_settings/#field-value-priority
"""

- kafka_bootstrap_servers: Union[str, List[str]]
+ kafka_bootstrap_servers: Union[str, List[str]] = "localhost:9092"
kafka_retention_ms: int = 180000 # 30 minutes
kafka_input_consumer_timeout_ms: int = 1500 # 1.5 seconds
kafka_output_consumer_timeout_ms: int = 1500 # 1.5 seconds