- Secured Account Management: OpenID Connect-based account management (signup, login, recovery, API security) powered by Amazon Cognito.
- NoSQL Database: Powered by Amazon DynamoDB.
- CDN File Hosting: Utilizes Amazon S3 with secure, pre-signed, time-limited URLs for accessing documents (a minimal sketch of the mechanism follows this list).
- Backend: Developed with the Java Spring Boot framework and uses session token-based authentication.
- Frontend: Responsive interface built using React, TailwindCSS, and React Markdown. Features include document upload, view, and selection.
- Real-time Chat: WebSocket-based chat powered by STOMP.js, integrated with a large language model.
- AI Service: Built using the FastAPI web framework, leveraging LlamaIndex for interaction with Ollama server and Redis-based vector storage for RAG.
- Deployment: The entire stack can be efficiently deployed and self-hosted using Docker Compose.
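
The pre-signed URL sketch referenced in the file-hosting item above shows the mechanism using boto3. It is illustrative only (the project's backend is Java Spring Boot, so this is not the actual implementation); the bucket and key names are hypothetical, and credentials are assumed to come from the environment:

```python
# Illustrative only: mint a secure, time-limited, pre-signed S3 URL.
# Bucket and key are hypothetical; boto3 reads AWS credentials from the
# environment (e.g., the .env values described below).
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "khoj-documents", "Key": "uploads/report.pdf"},
    ExpiresIn=900,  # link expires after 15 minutes
)
print(url)
```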
- Create a `.env` file in the project root containing access keys and secrets for AWS services, the LLM, and the embedding models. A sample file (`.env.sample`) is provided for reference.
- Run the following command:

  ```bash
  docker compose up
  ```
  Note: For NVIDIA or AMD GPU users, follow the official Ollama instructions to install the necessary drivers and update the `docker-compose.yaml` file for improved inference speed during chats (a sketch follows below).
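
As a hedged sketch of the kind of update the note above refers to, the documented Compose syntax for reserving NVIDIA GPUs looks like the following. The `ollama` service name and everything outside the `deploy` section are assumptions about this project's `docker-compose.yaml`:

```yaml
# Hypothetical excerpt of docker-compose.yaml: reserve NVIDIA GPUs for the
# Ollama container. Only the `deploy` block is the documented Compose syntax;
# the service name is an assumption.
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

AMD GPUs are exposed differently (via ROCm device mappings and the `ollama/ollama:rocm` image), so consult the official Ollama documentation for that case.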
Local development is recommended, especially for using the Ollama desktop app, which supports Apple Silicon GPUs.
- Copy the `.env` file to the root directories of the services (`ollama`, `rag`, `web`). Retain a copy in the project root for the `khojapp` service.
- Run Redis Stack Server and Ollama locally. Ensure that the required LLM and embedding models are pulled into Ollama (a sketch of how these pieces fit together follows this list).
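
As a rough sketch of how these locally running services fit together, the following wires LlamaIndex to a local Ollama server for generation and embeddings, with Redis as the vector store for RAG. The model names, document path, and constructor arguments are illustrative assumptions (exact APIs vary across llama-index versions); this is not the project's actual `rag` service code:

```python
# Illustrative RAG wiring, assuming llama-index with the Ollama and Redis
# integration packages installed; model names and paths are hypothetical.
from llama_index.core import Settings, StorageContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.redis import RedisVectorStore

# Point LlamaIndex at the local Ollama server (default port 11434); both
# models must already be pulled into Ollama.
Settings.llm = Ollama(model="llama3", base_url="http://localhost:11434")
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text", base_url="http://localhost:11434"
)

# Store and retrieve document embeddings via the local Redis Stack server.
vector_store = RedisVectorStore(redis_url="redis://localhost:6379")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Index some documents, then answer a question with retrieved context.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(index.as_query_engine().query("Summarize the uploaded document."))
```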
- Conversation History: Currently, the user interface does not store earlier conversations with the LLM. Implementing this feature would allow users to resume old conversations with retained context and selected documents.
- Docker GPU Support: Add a `docker-compose-gpu.yaml` file for faster inference using GPUs.