diff --git a/README.md b/README.md index a156de1..58f5c91 100644 --- a/README.md +++ b/README.md @@ -1,91 +1 @@ -# LLM RAG Project - -## Overview - -This project builds on top of the main LLM Chat app. It provides RAG functionality on top of the standard chat. -It also features a Gradio chatbot interface for the RAG chat. - -This implementation is a baseline vector similarity-based RAG. - -## Setup - -Follow the installation steps described in the main branch, nothing extra is required. - -Retrieval and reranking models are specified in `rag_service/docker-compose.yaml` - don't forget to adjust the ENV at the bottom. -RAG-related parameters along with a prompt template are in `rag_sevice/src/config.py` - -## Parametrization - -The parameters set during the build: - -Retrieval/reranker models, in `rag_service/docker-compose.yaml`: -- `command:` pass a model as `--model-id` -- `environment:` `EMB_MODEL` - -If you need to switch to a gpu set TEI containers accordingly: -- `image:` https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker-images - -Retrieval parameters are in `rag_service/src/config.py` - -Rag-gradio model is in `rag_gradio_service/src/config.py`, `OPENAI_MODEL` - -The default parameters of a RAG request are in `rag_service/src/main.py`, `RAGRequest` - -## Usage - -RAG endpoints use the same authorization scheme as the main chat service. -To utilize RAG start by adding documents to your vector DB using the dedicated endpoint: - -```python -import requests -import os - -docs_path = "/path/to/your/docs/dir" - -for doc in os.listdir(docs_path): - doc_path = os.path.join(docs_path, doc) - if os.path.isfile(doc_path): - text = open(doc_path).read() - resp = requests.post( - "http://0.0.0.0:8001/add_to_rag_db/", - json={"text": text}, - headers={"Authorization": os.getenv("YOUR_TOKEN")} - ) - if resp.status_code != 200: - print(f"error uploading document {doc}") -``` -Documents are expected to be preprocessed (chunked [1](https://towardsdatascience.com/rag-101-chunking-strategies-fdc6f6c2aaec), [2](https://antematter.io/blogs/optimizing-rag-advanced-chunking-techniques-study) etc). - -If you have a lot of documents (>100k) you may benefit from indexing your document table (more on this [here]()). -You can do it as follows: -```commandline -curl -H "Authorization: " -X GET http://0.0.0.0:8001/reindex/ -``` - -On inference, you can send a request with the following parameters: -```commandline - query: str - the text query - chat_id: Optional[str] - chat id to maintain conversation history - model: str = "gpt-4o-mini" - name of an LLM - use_reranker: bool = True - whether to rerank after retreival - top_k_retrieve: int = 20 - num of items to retrieve - top_k_rank: int = 4 - num of items to return after reranking - max_out_tokens: int = 512 - max num of tokens. Used to drop retrieved pieces - of content if total prompt length goes beyond max contex length. -``` -Note you can optionally use a reranker ([3](https://www.pinecone.io/learn/series/rag/rerankers/)). -Both embedding and reranking models are light enough to run on a cpu. - -To access the Gradio interface, open `http://0.0.0.0:7860` in a browser. - -## Note - -Every query submitted via the gradio interface goes through the context retrieval stage, -which means multi-turn dialogue may not work as expected. E.g. for a query "and you?", what may make -sense given the conversation history, irrelevant context will likely be retrieved, which will negatively affect generation quality. - -Note that calls to RAG endpoints count toward the rate limit, as they use the chat endpoint under the hood. - -This project is licensed under the GNU GPL v3 License. See the [LICENSE](LICENSE) file for more information. - -© Nebius BV, 2024 +# This branch is under construction now diff --git a/docker-compose.yaml b/docker-compose.yaml index 2a49850..b08a431 100755 --- a/docker-compose.yaml +++ b/docker-compose.yaml @@ -92,6 +92,26 @@ services: ports: - "7860:7860" + jupyter: + image: jupyter/datascience-notebook + container_name: jupyter_notebook + ports: + - "8888:8888" + volumes: + - ./notebooks:/home/jovyan/work + - .:/home/jovyan/project + environment: + - JUPYTER_ENABLE_LAB=yes + - GRANT_SUDO=yes + user: root + command: start-notebook.sh --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.ip='0.0.0.0' + networks: + - default + volumes: postgres_data: driver: local + +networks: + default: + driver: bridge