video rag example (#63)
* rag for video

* upgrade video from rag

* add complete app

* add deployment instructions
lfunderburk authored Jan 16, 2024
1 parent 4788b96 commit 6e23af4
Showing 15 changed files with 430 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -167,5 +167,6 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

*.json
.DS_Store
10 changes: 10 additions & 0 deletions examples/docker/video-rag/Dockerfile
@@ -0,0 +1,10 @@
FROM python:3.11

COPY app.py app.py
COPY .env .env
COPY requirements.txt requirements.txt
RUN pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cpu
COPY static/ static/
RUN pip install -r requirements.txt

ENTRYPOINT ["solara", "run", "app.py", "--host=0.0.0.0", "--port=80"]
73 changes: 73 additions & 0 deletions examples/docker/video-rag/README.md
@@ -0,0 +1,73 @@
## RAG from video

This is a full-stack project that uses a video as input, populates an ElasticSearch instance with vectors through an indexing pipeline, and initializes a retriever pipeline with a Solara app.

A complete walkthrough of this application can be found [here](https://ploomber.io/blog/rag-video-app/).

### Requirements

- Python 3.11
- Docker
- OpenAI Key

### Installation

1. Clone the repository
2. Create a virtual environment with Python 3.11 and activate it
3. Install the requirements through the `requirements.txt` file

### If you are running a local ElasticSearch instance

Use docker-compose to run the ElasticSearch instance

```bash
docker-compose up -d
```

### If you are running a remote ElasticSearch instance

Create a `.env` file with the following variables:

```bash
elastic_search_host=<host>
elastic_username=<username>
elastic_password=<password>
OPENAI=<key>
```

Replace host, username, and password with the corresponding values for your cloud-based instance.

Modify the document store in `video_indexing.py` and `app.py` to use the remote instance.
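
For reference, this is how `app.py` builds its document store from the `.env` credentials; the indexing script can be pointed at the remote instance the same way:

```python
import os

from dotenv import load_dotenv
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore

load_dotenv(".env")

# Local instance (docker-compose):
# document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200/")

# Remote instance, using the credentials from .env:
document_store = ElasticsearchDocumentStore(
    hosts=os.getenv("elastic_search_host"),
    basic_auth=(os.getenv("elastic_username"), os.getenv("elastic_password")),
)
```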

### Execution of indexing pipeline

```bash
python indexing_pipeline.py
```
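
The indexing pipeline is what populates ElasticSearch with embeddings of the video content. Below is a minimal sketch of the embed-and-write step, assuming the video audio has already been transcribed into text chunks; the component names follow the Haystack 2.0 beta API already used in `app.py`, but the actual `indexing_pipeline.py` may be organized differently:

```python
# Hypothetical sketch, not the exact contents of indexing_pipeline.py.
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200/")

# `chunks` stands in for text segments transcribed from the video collection.
chunks = ["...transcribed segment 1...", "...transcribed segment 2..."]
documents = [Document(content=chunk) for chunk in chunks]

# Embed each chunk with a sentence-transformers model (the same family the app
# uses for query embedding), then write the embedded documents to the index.
embedder = SentenceTransformersDocumentEmbedder()
embedder.warm_up()
documents_with_embeddings = embedder.run(documents=documents)["documents"]

document_store.write_documents(documents_with_embeddings)
```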

### Local execution of retriever pipeline through the Solara app

```bash
cd app/
solara run app.py
```

### Deploy app on Ploomber Cloud (assumes a cloud-based setup of an ElasticSearch instance)

Generate an API key on [Ploomber Cloud](https://www.platform.ploomber.io/applications) under Account, then paste it via the CLI:

```bash
ploomber-cloud key
```

Initialize your deployment environment

```bash
ploomber-cloud init
```

Deploy your app

```bash
ploomber-cloud deploy
```
212 changes: 212 additions & 0 deletions examples/docker/video-rag/app.py
@@ -0,0 +1,212 @@
from pathlib import Path
from dataclasses import dataclass

import solara

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import GPTGenerator
from elasticsearch_haystack.embedding_retriever import ElasticsearchEmbeddingRetriever
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
from dotenv import load_dotenv
import os

load_dotenv(".env")
openaikey = os.getenv("OPENAI")
elastic_search_cloud_id = os.getenv("elastic_search_cloud_id")
elastic_search_host = os.getenv("elastic_search_host")
elastic_username = os.getenv("elastic_username")
elastic_password = os.getenv("elastic_password")
#
# Build RAG pipeline
print("Initializing QA pipeline")
prompt_template = """\
Use the following context to answer the user's question in a friendly manner. \
If the context provided doesn't answer the question - \
please respond with: "There is no information in my knowledge base about this".
### CONTEXT
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
### USER QUESTION
{{query}}
"""

# document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200/")
document_store = ElasticsearchDocumentStore(
    hosts=elastic_search_host,
    basic_auth=(elastic_username, elastic_password),
)

prompt_builder = PromptBuilder(prompt_template)
############################################
query_embedder = SentenceTransformersTextEmbedder()
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)
llm = GPTGenerator(api_key=openaikey)

pipeline = Pipeline()
pipeline.add_component(instance=query_embedder, name="query_embedder")
pipeline.add_component(instance=retriever, name="retriever")
pipeline.add_component(instance=prompt_builder, name="prompt_builder")
pipeline.add_component(instance=llm, name="llm")

pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder", "llm")

###########################################
# Solara app

class State:
    input = solara.reactive("")

css = """
.main {
width: 100%;
height: 100%;
max-width: 1200px;
margin: auto;
padding: 1em;
}
#app > div > div:nth-child(2) > div:nth-child(2) {
display: none;
}
"""

chatbox_css = """
.message {
max-width: 450px;
width: 100%;
}
.user-message, .user-message > * {
background-color: #f0f0f0 !important;
}
.assistant-message, .assistant-message > * {
background-color: #9ab2e9 !important;
}
.avatar {
width: 50px;
height: 50px;
border-radius: 50%;
border: 2px solid transparent;
overflow: hidden;
display: flex;
}
.avatar img {
width: 100%;
height: 100%;
object-fit: cover;
}
"""


@dataclass
class Message:
    role: str
    content: str


def ChatBox(message: Message) -> None:
    solara.Style(chatbox_css)

    align = "start" if message.role == "assistant" else "end"
    with solara.Column(align=align):
        with solara.Card(classes=["message", f"{message.role}-message"]):
            if message.content:
                with solara.Card():
                    solara.Markdown(message.content)

            # Image reference: https://www.flaticon.com/free-icons/bot;
            # https://www.flaticon.com/free-icons/use
            with solara.HBox(align_items="center"):
                image_path = Path(f"static/{message.role}-logo.png")
                solara.Image(str(image_path), classes=["avatar"])
                solara.Text(message.role.capitalize())

@solara.component
def Chat() -> None:
    solara.Style(
        """
        .chat-input {
            max-width: 800px;
        }
        """
    )

    messages, set_messages = solara.use_state(
        [
            Message(
                role="assistant",
                content="Welcome. Please post your queries! My knowledge base "
                "has been curated on a small collection of videos from NASA. "
                "This collection consists of short clips that talk about the "
                "Mars Perseverance Rover. "
                "Sample questions: \n\nWhat is the Mars Perseverance Rover? "
                "What is the Mars Perseverance Rover mission? "
                "Tell me about the helicopter on Mars.",
            )
        ]
    )
    input, set_input = solara.use_state("")

    def ask_rag(pipeline):
        try:
            input_text = State.input.value
            _messages = messages + [Message(role="user", content=input_text)]
            set_input("")
            State.input.value = ""
            set_messages(_messages)

            result = pipeline.run(
                data={
                    "query_embedder": {"text": input_text},
                    "prompt_builder": {"query": input_text},
                }
            )
            rag_response = result["llm"]["replies"][0]

            set_messages(_messages + [Message(role="assistant", content=rag_response)])

        except Exception as e:
            set_messages(
                _messages
                + [
                    Message(
                        role="assistant",
                        content=f"Cannot answer your current question. Please try again. {e}",
                    )
                ]
            )

    with solara.VBox():
        for message in messages:
            ChatBox(message)

    with solara.Row(justify="center"):
        with solara.HBox(align_items="center", classes=["chat-input"]):
            solara.InputText(label="Query", value=State.input, continuous_update=False)

    if State.input.value:
        ask_rag(pipeline)

@solara.component
def Page():

    with solara.AppBarTitle():
        solara.Text("Deepen your understanding of our video collection through a Q&A AI assistant")

    with solara.Card(title="About", elevation=6, style="background-color: #f5f5f5;"):
        with solara.Row(justify="center"):
            solara.Image(image="static/nasa-logo.svg", width="100")  # Adjust width and height as needed

        solara.Markdown(
            "Ask questions about our curated database of videos using advanced AI tools. \n "
            "This database is curated from the following list of videos: \n "
            "https://images.nasa.gov/search?q=nasa%20perseverance%20rover&page=1&media=video&yearStart=2023&yearEnd=2024"
        )

    solara.Style(css)
    with solara.VBox(classes=["main"]):
        solara.HTML(
            tag="h3",
            style="margin: auto;",
            unsafe_innerHTML="Chat with the assistant to answer questions about the video topics",
        )

        Chat()


@solara.component
def Layout(children):
    route, routes = solara.use_route()
    return solara.AppLayout(children=children)
15 changes: 15 additions & 0 deletions examples/docker/video-rag/docker-compose.yml
@@ -0,0 +1,15 @@
services:
  elasticsearch:
    image: "docker.elastic.co/elasticsearch/elasticsearch:8.11.1"
    ports:
      - 9200:9200
    restart: on-failure
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    healthcheck:
      test: curl --fail http://localhost:9200/_cat/health || exit 1
      interval: 10s
      timeout: 1s
      retries: 10
8 changes: 8 additions & 0 deletions examples/docker/video-rag/requirements.txt
@@ -0,0 +1,8 @@
haystack-ai==2.0.0b3
elasticsearch-haystack==0.1.2
solara
python-dotenv
sentence-transformers>=2.2.0
moviepy
pydub
ploomber-cloud
Binary file added examples/docker/video-rag/static/user-logo.png