Multi-Model LLM Chat Service

A robust and minimal Python API service that routes user prompts to different open-source Large Language Models (LLMs) and logs their performance and token usage. This project was built to fulfill the requirements of the PromptCue AI Engineer Internship assignment.


✨ Key Features

  • Dynamic Model Switching: Route prompts to different models (e.g., llama3, mistral) using a simple URL query parameter.
  • JSON API: Accepts prompts via a POST request and returns model responses in a clean JSON format.
  • Performance & Quality Logging: Automatically logs the round-trip latency (in ms) and token count for every prompt and response into a logs.csv file (see the sketch after this list).
  • Simple & Robust: Built with modern, reliable tools like FastAPI and Poetry.
  • Tested: Includes a simple test suite using pytest to ensure API reliability.
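
To make these features concrete, here is a minimal sketch of what the core route handler might look like: it forwards the prompt to Ollama's local API, times the round trip, and appends a row to logs.csv. This is an illustration, not the repository's actual code: the /chat path, the ChatRequest schema, the log-column layout, and the use of httpx are assumptions (the real implementation lives in src/main.py), while the http://localhost:11434/api/generate endpoint and its prompt_eval_count/eval_count fields are Ollama's documented defaults.

    import csv
    import time

    import httpx
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local API


    class ChatRequest(BaseModel):
        prompt: str


    @app.post("/chat")  # illustrative path; check src/main.py for the real one
    async def chat(body: ChatRequest, model: str = "llama3"):
        """Route the prompt to the requested model, log latency and tokens."""
        start = time.perf_counter()
        async with httpx.AsyncClient(timeout=120.0) as client:
            r = await client.post(
                OLLAMA_URL,
                json={"model": model, "prompt": body.prompt, "stream": False},
            )
        r.raise_for_status()
        data = r.json()
        latency_ms = round((time.perf_counter() - start) * 1000)
        # Ollama reports prompt and completion token counts separately
        tokens = data.get("prompt_eval_count", 0) + data.get("eval_count", 0)
        with open("logs.csv", "a", newline="") as f:
            csv.writer(f).writerow([model, latency_ms, tokens])
        return {"model": model, "response": data.get("response", ""),
                "latency_ms": latency_ms, "tokens": tokens}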

🛠️ Tech Stack

  • Language: Python
  • Framework: FastAPI
  • Dependency Management: Poetry
  • Local LLM Hosting: Ollama
  • Models Used: Llama 3, Mistral
  • Testing: Pytest

🚀 Getting Started

Follow these instructions to get the project set up and running on your local machine.

Prerequisites

Make sure you have the following software installed before you begin:

  • Python (version 3.10+ recommended)
  • Poetry (for managing Python packages)
  • Ollama (for running the models locally)

Installation Guide

  1. Clone the Repository

    git clone <your-github-repository-url>
    cd multi-model-chat
  2. Install Dependencies. Use Poetry to install all the required Python packages from the pyproject.toml file.

    poetry install
  3. Download Local LLMs. Use the Ollama CLI to download the two language models; this may take some time depending on your internet connection. A quick verification step follows the commands below.

    ollama pull llama3
    ollama pull mistral
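
  Once both downloads finish, you can confirm that the models are available locally:

    ollama list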

▶️ Running the Application

  1. Ensure the Ollama application is running on your machine.
  2. Use Poetry to run the FastAPI server with Uvicorn. The --reload flag automatically restarts the server when you make code changes.

    poetry run uvicorn src.main:app --reload

The API will now be live and accessible at http://127.0.0.1:8000, and FastAPI's auto-generated interactive docs are served at http://127.0.0.1:8000/docs.
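
With the server up, you can exercise the model-switching behavior from any HTTP client. Here is a minimal Python example; it assumes the /chat path, model query parameter, and prompt body field from the sketch above, so adjust the names to match src/main.py:

    import requests

    # switch models per request via the query string ("llama3" or "mistral")
    resp = requests.post(
        "http://127.0.0.1:8000/chat",
        params={"model": "mistral"},
        json={"prompt": "Summarise FastAPI in one sentence."},
    )
    print(resp.json())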

✅ Running Tests

To ensure all endpoints and logic are working correctly, run the test suite using pytest.

poetry run pytest
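
For reference, tests for an endpoint like this are typically written with FastAPI's TestClient. A minimal sketch of the shape such a test might take (the /chat path and response field are the same illustrative assumptions as above, and the request goes through to Ollama unless the suite mocks the model call):

    from fastapi.testclient import TestClient

    from src.main import app  # the same app object that uvicorn serves

    client = TestClient(app)


    def test_chat_returns_json():
        resp = client.post(
            "/chat",
            params={"model": "llama3"},
            json={"prompt": "Hello"},
        )
        assert resp.status_code == 200
        assert "response" in resp.json()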
