This repository contains a Retrieval-Augmented Generation (RAG) based chatbot. The project utilizes a vector database for document retrieval, an embedding model for vectorizing text, and a large language model (LLM) for generation. The following instructions will help you set up the project on your local machine.
- Ollama - for hosting models via a REST API.
- Python 3.12.8 installed.
- Docker and Docker Compose installed.
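If you're unsure whether these are already available, a quick check from the terminal (commands assume the tools are on your PATH) looks like this:

```bash
python --version          # expects 3.12.x
docker --version
docker compose version    # or: docker-compose --version
ollama --version
```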
Install and configure Ollama following the official instructions. Once installed:
- Download the Embedding Model: Download the `all-minilm` model for embeddings.
- Download the Chat Generation Model: Download a lightweight generation model (e.g., `phi4-mini` or similar, as per your requirements).
Ensure that both models are available and accessible via the REST API endpoints specified later in the configuration.
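For example, with a default Ollama installation the models can be pulled from the command line (the model names below match the examples above; substitute your own choices):

```bash
# Pull the embedding and generation models (example names)
ollama pull all-minilm
ollama pull phi4-mini

# Verify that both models are available locally
ollama list
```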
Edit the `.env` file in the root directory of your project, setting the environment variables described in the configuration section below.
Create a folder named Guides in the root directory. Place all your PDF documents inside this folder. These documents will be used by the document parser to create embeddings and store them in the vector database.
```bash
mkdir Guides
```

Create and activate a new Python virtual environment, then install the necessary dependencies:
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment (Linux/Mac)
source venv/bin/activate

# Activate virtual environment (Windows)
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

This project uses Weaviate for storing document embeddings. Use the provided Docker Compose file to set up Weaviate:
```bash
docker-compose up -d
```

This command will start Weaviate on port 8080.
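To confirm the database is up before moving on, you can hit Weaviate's readiness endpoint on the same port:

```bash
# Returns HTTP 200 once Weaviate is ready to accept requests
curl -i http://localhost:8080/v1/.well-known/ready
```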
Run the document parser script to chunk PDFs from the Guides folder, generate embeddings using the embedding model, and store them in your vector database:
```bash
python docparser.py
```

Ensure that the document parser is properly configured to read from the `Guides` folder and use the vector store directory specified in the `.env` file.
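If you want to check that documents were actually ingested, Weaviate's REST API can list the stored schema and objects:

```bash
# Show the collections (classes) Weaviate currently knows about
curl http://localhost:8080/v1/schema

# Fetch one stored object as a spot check
curl "http://localhost:8080/v1/objects?limit=1"
```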
Finally, launch the chatbot UI by running:
```bash
python demo_experiment.py
```

This will start the UI in your browser, allowing you to interact with the chatbot. Ask questions and explore its capabilities.
The project uses several environment variables to control its configuration. Set these in your .env file:
- `COLLECTION_NAME`: Name of the collection in your vector database.
- `PERSIST_DIRECTORY`: Directory where the vector store is persisted.
- `LLM_URL`: URL endpoint for the chat generation model.
- `EMBED_MODEL_URL`: URL endpoint for the embedding model.
- `EMBEDDING_MODEL`: Model identifier for the embedding model (e.g., `all-minilm`).
- `CHAT_MODEL`: Model identifier for the generation model (e.g., `llama3.1:8b`).
- `RETRIEVAL`: Retrieval method type (options: `keyword`, `vector`, or `hybrid`).
- `LOG_LEVEL`: Log level for debugging (e.g., `DEBUG`, `INFO`).
Ensure these variables are correctly set for the project to run smoothly.
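For reference, a minimal `.env` might look like the following. The values are placeholders that assume Ollama on its default port 11434 and a local vector store directory; adjust them to your setup:

```env
COLLECTION_NAME=guides
PERSIST_DIRECTORY=./vector_store
LLM_URL=http://localhost:11434
EMBED_MODEL_URL=http://localhost:11434
EMBEDDING_MODEL=all-minilm
CHAT_MODEL=llama3.1:8b
RETRIEVAL=hybrid
LOG_LEVEL=INFO
```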
The provided docker-compose.yml file configures and launches a Weaviate instance with anonymous access enabled. This setup is necessary for storing and retrieving document embeddings:
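The actual file ships with the repository; for orientation, a minimal single-node Weaviate service of this kind typically looks like the sketch below (image tag, volume path, and vectorizer setting are illustrative, not taken from the project):

```yaml
services:
  weaviate:
    image: semitechnologies/weaviate:1.25.0   # illustrative tag; use the one pinned in the repo
    ports:
      - "8080:8080"
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"   # anonymous access, as described above
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "none"                 # embeddings are supplied externally by the parser
    volumes:
      - weaviate_data:/var/lib/weaviate

volumes:
  weaviate_data:
```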
In summary:

- Install and configure Ollama: Host models for embeddings and generation via a REST API.
- Configure Environment: Set up the `.env` file with correct endpoints and model identifiers.
- Prepare Documents: Place PDFs in the `Guides` directory.
- Virtual Environment: Create and activate the virtual environment, then install dependencies.
- Weaviate Setup: Start the Weaviate database with Docker Compose.
- Document Parsing: Run the document parser to embed and store documents.
- Launch Chatbot: Execute `demo_experiment.py` to start the chatbot UI in your browser.