This service uses llama.cpp and whisper.cpp to provide an API compatible with OpenAI's endpoints. It lets you run large language models and speech-recognition models locally, offering features such as text completion and audio transcription through familiar API calls.
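Because the API follows OpenAI's shape, a client can talk to it with nothing but the standard library. The sketch below builds a request for a completion endpoint; the `/v1/completions` path, the `model` name, and the payload fields are assumptions based on OpenAI's API, not confirmed details of this server:

```python
import json
import urllib.request

# Minimal sketch of a client request against the OpenAI-compatible API.
# Path, model name, and payload fields are assumptions; check /docs for
# the server's actual schema.
def build_completion_request(base_url, prompt, api_key=None, model="llama"):
    payload = {"model": model, "prompt": prompt, "max_tokens": 16}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_completion_request("http://127.0.0.1:8000", "Hello, world")
# With the server running, urllib.request.urlopen(req) would send it.
print(req.full_url)
```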
- Clone the repository

```shell
git clone https://github.com/EmanuelJr/cpp-backend.git
cd cpp-backend
```
- Set up your environment

This script creates a virtual environment, detects the appropriate build type, and installs the required dependencies.

```shell
./setup.sh
```
- Set your API key (optional)

To protect your API, create an API key and set it as an environment variable:

```shell
export API_KEY=your-api-key-here
```
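Server-side, validating a presented key against the `API_KEY` environment variable might look like the sketch below. This is illustrative only, not the service's actual implementation; the constant-time comparison is a common practice for secrets:

```python
import hmac
import os

# Sketch of key validation against the API_KEY environment variable.
# Illustrative only; the service's real check may differ.
def key_is_valid(presented: str) -> bool:
    expected = os.environ.get("API_KEY")
    if expected is None:
        return True  # no key configured: auth effectively disabled
    # constant-time comparison avoids leaking information via timing
    return hmac.compare_digest(presented, expected)

os.environ["API_KEY"] = "your-api-key-here"  # for demonstration only
print(key_is_valid("your-api-key-here"))     # True
print(key_is_valid("wrong-key"))             # False
```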
- Start the server

```shell
python3 run.py
```

By default, the server runs on http://127.0.0.1:8000.
You can also pass a host and port:

```shell
python3 run.py 127.0.0.1:8080
```
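Parsing a `HOST:PORT` argument like the one above could be sketched as follows; `run.py`'s actual argument handling may differ, and the defaults here simply mirror the README:

```python
# Sketch of HOST:PORT argument parsing (run.py's real logic may differ).
def parse_bind(arg=None, default_host="127.0.0.1", default_port=8000):
    if not arg:
        return default_host, default_port
    host, _, port = arg.partition(":")
    return host or default_host, int(port) if port else default_port

print(parse_bind("127.0.0.1:8080"))  # ('127.0.0.1', 8080)
print(parse_bind())                  # ('127.0.0.1', 8000)
```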
- Verify the server is running

Visit http://127.0.0.1:8000/health in your browser, or use curl:

```shell
curl http://127.0.0.1:8000/health
```
You should receive a response indicating the server is operational.
You can also explore the interactive API documentation (Swagger UI) at http://127.0.0.1:8000/docs.
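The health check above can also be scripted. The sketch below uses only the standard library; the `/health` path comes from this README, and nothing is assumed about the response body beyond the HTTP status code:

```python
import urllib.error
import urllib.request

# Scripted version of the health check: True if /health answers HTTP 200.
def server_is_up(base_url="http://127.0.0.1:8000", timeout=5.0):
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(server_is_up())  # True once the server from the previous step is running
```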
This project builds on:

- llama-cpp-python for the library and the base server
- whisper-cpp-python for the speech-recognition bindings