
Video rag #63

Merged 5 commits on Jan 16, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -167,5 +167,6 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

*.json
.DS_Store
10 changes: 10 additions & 0 deletions examples/docker/video-rag/Dockerfile
@@ -0,0 +1,10 @@
FROM python:3.11

COPY app.py app.py
COPY .env .env
COPY requirements.txt requirements.txt
RUN pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cpu
COPY static/ static/
RUN pip install -r requirements.txt

ENTRYPOINT ["solara", "run", "app.py", "--host=0.0.0.0", "--port=80"]
73 changes: 73 additions & 0 deletions examples/docker/video-rag/README.md
@@ -0,0 +1,73 @@
## RAG from video

This is a full-stack project that takes a video as input, populates an Elasticsearch instance with vectors through an indexing pipeline, and serves a retriever pipeline through a Solara app.

A complete walkthrough of this application can be found [here](https://ploomber.io/blog/rag-video-app/).

### Requirements

- Python 3.11
- Docker
- OpenAI API key

### Installation

1. Clone the repository
2. Create a virtual environment with Python 3.11 and activate it
3. Install the requirements through the `requirements.txt` file
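Steps 2 and 3 can be sketched on Linux/macOS as follows (the `.venv` directory name is an assumption, not part of the project):

```shell
# Create a Python 3.11 virtual environment and activate it.
python3 -m venv .venv
. .venv/bin/activate

# Install the project requirements (guarded so the sketch also runs
# outside the repository checkout).
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi
```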

### If you are running a local Elasticsearch instance

Use Docker Compose to run the Elasticsearch instance:

```bash
docker-compose up -d
```

### If you are running a remote Elasticsearch instance

Create a `.env` file with the following variables:

```bash
elastic_search_host=<host>
elastic_username=<username>
elastic_password=<password>
OPENAI=<key>
```

Replace `host`, `username`, `password`, and `key` with the corresponding values for your cloud-based instance.

Modify the document store in `video_indexing.py` and `app.py` to use the remote instance.
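The app reads these variables with `python-dotenv`. As a rough standard-library sketch of what that loading step does (the `parse_env` helper below is hypothetical, not part of the project):

```python
import os
from pathlib import Path


def parse_env(path: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments ignored."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# Load the settings the app expects into the process environment,
# if a .env file is present in the working directory.
if Path(".env").exists():
    for key, value in parse_env(".env").items():
        os.environ.setdefault(key, value)
```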

### Execution of indexing pipeline

```bash
python indexing_pipeline.py
```

### Local execution of retriever pipeline through the Solara app

```bash
cd app/
solara run app.py
```

### Deploy app on Ploomber Cloud (assumes a cloud-based Elasticsearch instance)

Generate an API key on [Ploomber Cloud](https://www.platform.ploomber.io/applications) under Account, then provide it via the CLI:

```bash
ploomber-cloud key
```

Initialize your deployment environment:

```bash
ploomber-cloud init
```

Deploy your app:

```bash
ploomber-cloud deploy
```
212 changes: 212 additions & 0 deletions examples/docker/video-rag/app.py
@@ -0,0 +1,212 @@
from pathlib import Path
from dataclasses import dataclass

import solara

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import GPTGenerator
from elasticsearch_haystack.embedding_retriever import ElasticsearchEmbeddingRetriever
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
from dotenv import load_dotenv
import os

load_dotenv(".env")
openaikey = os.getenv("OPENAI")
elastic_search_cloud_id = os.getenv("elastic_search_cloud_id")
elastic_search_host = os.getenv("elastic_search_host")
elastic_username = os.getenv("elastic_username")
elastic_password = os.getenv("elastic_password")
# Build RAG pipeline
print("Initializing QA pipeline")
prompt_template = """\
Use the following context to answer the user's question in a friendly manner. \
If the context provided doesn't answer the question - \
please respond with: "There is no information in my knowledge base about this".

### CONTEXT
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

### USER QUESTION
{{query}}
"""

#document_store = ElasticsearchDocumentStore(hosts= "http://localhost:9200/")
document_store = ElasticsearchDocumentStore(
    hosts=elastic_search_host,
    basic_auth=(elastic_username, elastic_password),
)

prompt_builder = PromptBuilder(prompt_template)
############################################
query_embedder = SentenceTransformersTextEmbedder()
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)
llm = GPTGenerator(api_key=openaikey)

pipeline = Pipeline()
pipeline.add_component(instance=query_embedder, name="query_embedder")
pipeline.add_component(instance=retriever, name="retriever")
pipeline.add_component(instance=prompt_builder, name="prompt_builder")
pipeline.add_component(instance=llm, name="llm")

pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder", "llm")

###########################################
# Solara app

class State:
    input = solara.reactive("")

css = """
.main {
    width: 100%;
    height: 100%;
    max-width: 1200px;
    margin: auto;
    padding: 1em;
}

#app > div > div:nth-child(2) > div:nth-child(2) {
    display: none;
}
"""

chatbox_css = """
.message {
    max-width: 450px;
    width: 100%;
}

.user-message, .user-message > * {
    background-color: #f0f0f0 !important;
}

.assistant-message, .assistant-message > * {
    background-color: #9ab2e9 !important;
}

.avatar {
    width: 50px;
    height: 50px;
    border-radius: 50%;
    border: 2px solid transparent;
    overflow: hidden;
    display: flex;
}

.avatar img {
    width: 100%;
    height: 100%;
    object-fit: cover;
}
"""


@dataclass
class Message:
    role: str
    content: str


def ChatBox(message: Message) -> None:
    solara.Style(chatbox_css)

    align = "start" if message.role == "assistant" else "end"
    with solara.Column(align=align):
        with solara.Card(classes=["message", f"{message.role}-message"]):
            if message.content:
                with solara.Card():
                    solara.Markdown(message.content)

            # Image reference: https://www.flaticon.com/free-icons/bot;
            # https://www.flaticon.com/free-icons/use
            with solara.HBox(align_items="center"):
                image_path = Path(f"static/{message.role}-logo.png")
                solara.Image(str(image_path), classes=["avatar"])
                solara.Text(message.role.capitalize())

@solara.component
def Chat() -> None:
    solara.Style(
        """
        .chat-input {
            max-width: 800px;
        }
        """
    )

    messages, set_messages = solara.use_state(
        [
            Message(
                role="assistant",
                content="Welcome. Please post your queries! My knowledge base \
has been curated on a small collection of videos from NASA. \
This collection of videos consists of short clips about \
the Mars Perseverance Rover. \
Sample questions: \n\nWhat is the Mars Perseverance Rover? \
What is the Mars Perseverance Rover mission? \
Tell me about the helicopter on Mars.",
            )
        ]
    )
    input, set_input = solara.use_state("")

    def ask_rag(pipeline):
        try:
            input_text = State.input.value
            _messages = messages + [Message(role="user", content=input_text)]
            set_input("")
            State.input.value = ""
            set_messages(_messages)

            result = pipeline.run(
                data={
                    "query_embedder": {"text": input_text},
                    "prompt_builder": {"query": input_text},
                }
            )
            rag_response = result["llm"]["replies"][0]

            set_messages(_messages + [Message(role="assistant", content=rag_response)])
        except Exception as e:
            set_messages(
                _messages
                + [
                    Message(
                        role="assistant",
                        content=f"Cannot answer your current question. Please try again. ({e})",
                    )
                ]
            )

    with solara.VBox():
        for message in messages:
            ChatBox(message)

    with solara.Row(justify="center"):
        with solara.HBox(align_items="center", classes=["chat-input"]):
            solara.InputText(label="Query", value=State.input, continuous_update=False)

    if State.input.value:
        ask_rag(pipeline)

@solara.component
def Page():
    with solara.AppBarTitle():
        solara.Text("Deepen your understanding of our video collection through a Q&A AI assistant")

    with solara.Card(title="About", elevation=6, style="background-color: #f5f5f5;"):
        with solara.Row(justify="center"):
            solara.Image(image="static/nasa-logo.svg", width="100")  # Adjust width and height as needed

        solara.Markdown("Ask questions about our curated database of videos using advanced AI tools. \
This database is curated from the following list of videos: \
https://images.nasa.gov/search?q=nasa%20perseverance%20rover&page=1&media=video&yearStart=2023&yearEnd=2024")

    solara.Style(css)
    with solara.VBox(classes=["main"]):
        solara.HTML(
            tag="h3", style="margin: auto;", unsafe_innerHTML="Chat with the assistant to answer questions about the video topics"
        )

        Chat()

@solara.component
def Layout(children):
    route, routes = solara.use_route()
    return solara.AppLayout(children=children)
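The `ask_rag` handler above unpacks the nested dict returned by `pipeline.run`. A standalone sketch of that extraction with a mocked result (no Elasticsearch or OpenAI key needed; the `extract_reply` helper and fallback wiring are illustrative, not part of the project — though the fallback text mirrors the prompt's instruction):

```python
def extract_reply(result: dict) -> str:
    # pipeline.run(...) returns {component_name: outputs}; the generator
    # component publishes its completions under the "replies" key.
    replies = result.get("llm", {}).get("replies", [])
    if not replies:
        return "There is no information in my knowledge base about this"
    return replies[0]


# Mocked output with the same shape app.py consumes:
mocked = {"llm": {"replies": ["Ingenuity is the helicopter that flew on Mars."]}}
print(extract_reply(mocked))
```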
15 changes: 15 additions & 0 deletions examples/docker/video-rag/docker-compose.yml
@@ -0,0 +1,15 @@
services:
  elasticsearch:
    image: "docker.elastic.co/elasticsearch/elasticsearch:8.11.1"
    ports:
      - 9200:9200
    restart: on-failure
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    healthcheck:
      test: curl --fail http://localhost:9200/_cat/health || exit 1
      interval: 10s
      timeout: 1s
      retries: 10
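The compose healthcheck shells out to `curl`; an equivalent probe from Python (a sketch assuming the default local port mapping above, not code from the project) is:

```python
import urllib.request


def es_healthy(host: str = "http://localhost:9200", timeout: float = 1.0) -> bool:
    """Return True if Elasticsearch answers GET /_cat/health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{host}/_cat/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as unhealthy.
        return False


# Example: poll before running the indexing pipeline, e.g.
#   while not es_healthy(): time.sleep(2)
```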
8 changes: 8 additions & 0 deletions examples/docker/video-rag/requirements.txt
@@ -0,0 +1,8 @@
haystack-ai==2.0.0b3
elasticsearch-haystack==0.1.2
solara
python-dotenv
sentence-transformers>=2.2.0
moviepy
pydub
ploomber-cloud
Binary file added examples/docker/video-rag/static/user-logo.png