A robust and minimal Python API service that routes user prompts to different open-source Large Language Models (LLMs) and logs their performance and token usage. This project was built to fulfill the requirements of the PromptCue AI Engineer Internship assignment.
- Dynamic Model Switching: Route prompts to different models (e.g., `llama3`, `mistral`) using a simple URL query parameter.
- JSON API: Accepts prompts via a `POST` request and returns model responses in a clean JSON format.
- Performance & Quality Logging: Automatically logs the round-trip latency (in ms) and token count for every prompt and response to a `logs.csv` file.
- Simple & Robust: Built with modern, reliable tools like FastAPI and Poetry.
- Tested: Includes a simple test suite using `pytest` to ensure API reliability.
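The logging behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the CSV column names and the whitespace-based token count are placeholders (a real implementation might use the model's own tokenizer).

```python
import csv
import time
from pathlib import Path

LOG_PATH = Path("logs.csv")

def log_interaction(model: str, prompt: str, response: str, latency_ms: float) -> None:
    """Append one row of latency/token metrics to logs.csv."""
    write_header = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            # Column names are assumptions, not the project's actual schema.
            writer.writerow(["model", "latency_ms", "prompt_tokens", "response_tokens"])
        # Naive whitespace token count -- a stand-in for a real tokenizer.
        writer.writerow([model, round(latency_ms, 1),
                         len(prompt.split()), len(response.split())])

start = time.perf_counter()
# Stand-in for the actual call to the local model.
response = "The sky appears blue because of Rayleigh scattering."
latency_ms = (time.perf_counter() - start) * 1000
log_interaction("llama3", "Why is the sky blue?", response, latency_ms)
```

Appending (rather than overwriting) keeps a running history across server restarts, which is what makes the log useful for comparing models over time.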
- Language: Python
- Framework: FastAPI
- Dependency Management: Poetry
- Local LLM Hosting: Ollama
- Models Used: Llama 3, Mistral
- Testing: Pytest
Follow these instructions to get the project set up and running on your local machine.
Make sure you have the following software installed before you begin:
- Python (version 3.10+ recommended)
- Poetry (for managing Python packages)
- Ollama (for running the models locally)
- Clone the Repository

  ```bash
  git clone <your-github-repository-url>
  cd multi-model-chat
  ```
- Install Dependencies

  Use Poetry to install all the required Python packages from the `pyproject.toml` file.

  ```bash
  poetry install
  ```
- Download Local LLMs

  Use the Ollama CLI to download the language models. This may take some time depending on your internet connection.

  ```bash
  ollama pull llama3
  ollama pull mistral
  ```
- Ensure the Ollama application is running on your machine.
- Use Poetry to run the FastAPI server with Uvicorn. The `--reload` flag automatically restarts the server when you make code changes.

  ```bash
  poetry run uvicorn src.main:app --reload
  ```

  The API will now be live and accessible at `http://127.0.0.1:8000`.
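With the server live, a prompt can be sent from any HTTP client. The sketch below builds such a request using only the standard library; note that the `/chat` route name is an assumption made for illustration, since the actual path is defined in `src/main.py`.

```python
import json
from urllib import parse, request

BASE_URL = "http://127.0.0.1:8000"

def build_chat_request(prompt: str, model: str = "llama3") -> request.Request:
    # "/chat" is a hypothetical route name; check src/main.py for the real one.
    url = f"{BASE_URL}/chat?{parse.urlencode({'model': model})}"
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return request.Request(url, data=payload,
                           headers={"Content-Type": "application/json"},
                           method="POST")

req = build_chat_request("Why is the sky blue?", model="mistral")
# Actually sending it requires the server to be up:
#   with request.urlopen(req) as resp: print(resp.read())
print(req.get_method(), req.full_url)
# → POST http://127.0.0.1:8000/chat?model=mistral
```

The same call works with `model=llama3` to exercise the model-switching query parameter.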
To ensure all endpoints and logic are working correctly, run the test suite using `pytest`:

```bash
poetry run pytest
```
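For reference, a test in such a suite typically has the shape below. This is a hedged sketch, not the repository's actual tests: the response fields are assumptions, and a stub object stands in for the response FastAPI's `TestClient` would return, so the example runs without a live server.

```python
# In the real suite, `resp` would come from FastAPI's TestClient, e.g.:
#   client.post("/chat?model=llama3", json={"prompt": "Hello"})
class StubResponse:
    """Stand-in for an HTTP response so the test shape runs anywhere."""
    status_code = 200

    def json(self):
        return {"model": "llama3", "response": "Hello there!"}

def test_chat_returns_json():
    resp = StubResponse()
    assert resp.status_code == 200
    body = resp.json()
    assert "response" in body  # field name is an assumption

test_chat_returns_json()
```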