This is a FastAPI-based backend service that powers a Retrieval-Augmented Generation (RAG) pipeline using a local SQLite knowledge base and Pinecone vector search.
- Query a multimodal (text + image) knowledge base via the `/query` endpoint
- Uses Pinecone for vector similarity search (see the retrieval sketch after this list)
- Uses the GPT-4o-mini model via AIProxy (supports vision for image analysis)
- Auto-generated embeddings using an OpenAI-compatible embedding model
- Returns contextual answers along with source links
- Handles rate-limiting and retries on failures
- Health-check endpoint at `/health`
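The features above follow a standard RAG flow: embed the question, retrieve similar chunks from Pinecone, and ask the model to answer from that context. A minimal sketch, assuming the `openai` and `pinecone` client libraries; the AIProxy base URL, index name `knowledge-base`, and embedding model `text-embedding-3-small` are illustrative and may differ from the actual code:

```python
import os

from openai import OpenAI
from pinecone import Pinecone

# Assumptions: AIProxy exposes an OpenAI-compatible API at this base URL,
# and the Pinecone index is named "knowledge-base". Both are placeholders.
client = OpenAI(api_key=os.environ["API_KEY"], base_url="https://aiproxy.example.com/v1")
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("knowledge-base")


def answer_question(question: str, top_k: int = 5) -> str:
    # 1. Embed the query (embedding model name is an assumption).
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the most similar chunks from Pinecone.
    matches = index.query(vector=embedding, top_k=top_k, include_metadata=True).matches

    # 3. Ask GPT-4o-mini to answer using only the retrieved context.
    context = "\n\n".join(m.metadata.get("text", "") for m in matches)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```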
- Python 3.8+
- Environment variables:
  - `API_KEY` (for the AIProxy OpenAI-compatible API)
  - `PINECONE_API_KEY` (your Pinecone API key)
  - `PINECONE_ENV` (Pinecone environment, e.g., `us-east-1-aws`)
- Clone the repository
- Create and activate a virtual environment
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Create a `.env` file with the required keys (loaded at startup; see the sketch after this list):

  ```
  API_KEY=your_aipipe_api_key
  PINECONE_API_KEY=your_pinecone_key
  PINECONE_ENV=us-east-1-aws
  ```
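A minimal sketch of reading these keys at startup, assuming `python-dotenv` is available (the default region value is only illustrative):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the project root

API_KEY = os.environ["API_KEY"]                    # AIProxy / OpenAI-compatible key
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]  # Pinecone API key
PINECONE_ENV = os.getenv("PINECONE_ENV", "us-east-1-aws")  # default is illustrative
```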
Run the server:

```bash
uvicorn app:app --reload --port 8000
```

`/query`: Query the knowledge base with text and optionally an image (base64).

Request:
```json
{
  "question": "Explain cosine similarity",
  "image": "base64_string_if_any"
}
```

Response:
```json
{
  "data": {
    "answer": "Cosine similarity measures ...",
    "links": [
      {"url": "https://example.com", "text": "Relevant source"}
    ]
  }
}
```

`/health`: Checks DB connectivity and whether embeddings exist.
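For a quick smoke test of both endpoints against the local server started above, something like the following works (the image file name and questions are placeholders):

```python
import base64

import requests

BASE_URL = "http://localhost:8000"

# Text-only question
resp = requests.post(f"{BASE_URL}/query", json={"question": "Explain cosine similarity"})
print(resp.json()["data"]["answer"])

# Question with an attached image, sent as a base64 string
with open("screenshot.png", "rb") as f:  # placeholder file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")
resp = requests.post(
    f"{BASE_URL}/query",
    json={"question": "What does this chart show?", "image": image_b64},
)
print(resp.json()["data"]["links"])

# Health check
print(requests.get(f"{BASE_URL}/health").json())
```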
The SQLite knowledge base contains two tables (an illustrative schema sketch follows):

- `discourse_chunks`: stores forum chunks with embeddings
- `markdown_chunks`: stores markdown doc chunks with embeddings
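The exact column layout isn't documented here; a hypothetical shape for the two tables, with embeddings stored as serialized blobs (all column names and the database file name are assumptions):

```python
import sqlite3

# Illustrative only: the real column names/types in the repo may differ.
conn = sqlite3.connect("knowledge_base.db")  # database file name is a guess
conn.executescript(
    """
    CREATE TABLE IF NOT EXISTS discourse_chunks (
        id        INTEGER PRIMARY KEY,
        url       TEXT,   -- source forum thread
        content   TEXT,   -- chunked post text
        embedding BLOB    -- serialized embedding vector
    );
    CREATE TABLE IF NOT EXISTS markdown_chunks (
        id        INTEGER PRIMARY KEY,
        url       TEXT,   -- source documentation page
        content   TEXT,   -- chunked markdown text
        embedding BLOB
    );
    """
)
conn.close()
```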
- Handles rate limits and retries gracefully (see the retry sketch below)
- Image support via the GPT-4o-mini multimodal model
- Logs important events and errors with full tracebacks
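The retry behavior isn't spelled out beyond "graceful"; one common pattern is exponential backoff around the upstream chat call, e.g. with `tenacity` (an assumption; the repo may implement it differently, and the base URL is a placeholder):

```python
from typing import Dict, List

from openai import APIError, OpenAI, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

client = OpenAI(api_key="...", base_url="https://aiproxy.example.com/v1")  # placeholders


@retry(
    retry=retry_if_exception_type((RateLimitError, APIError)),
    wait=wait_exponential(multiplier=1, min=1, max=30),  # 1s, 2s, 4s, ... capped at 30s
    stop=stop_after_attempt(5),
)
def chat_with_retries(messages: List[Dict[str, str]]) -> str:
    # Retried automatically on rate-limit or transient API errors.
    completion = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return completion.choices[0].message.content
```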
License: MIT