Verba

🐕 The Golden RAGtriever - with Leanovate extensions

Welcome to Verba: The Golden RAGtriever, an open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few easy steps, explore your datasets and extract insights with ease, either locally or through LLM providers such as OpenAI, Cohere, and HuggingFace.

pip install goldenverba

Verba
✨ Getting Started with Verba
🐍 Installing Python and Setting Up a Virtual Environment
- Installing Python
- Setting Up a Virtual Environment
🛠️ Quickstart: Build from Source with Leanovate extensions
🔑 API Keys
- Weaviate
- OpenAI
- Cohere
- HuggingFace
  - Llama2
- Unstructured
- Github
- Confluence
🐳 Quickstart: Deploy with Docker
- Large Language Model (LLM) Costs
💾 Importing Your Data into Verba
🛠️ Project Architecture
💖 Open Source Contribution

🎯 What Is Verba?

Verba is more than just a tool—it's a personal assistant for querying and interacting with your data, either locally or deployed via cloud. Have questions about your documents? Need to cross-reference multiple data points? Want to gain insights from your existing knowledge base? Verba empowers you with the combined capabilities of Weaviate's context-aware database and the analytical power of Large Language Models (LLMs). Interact with your data through an intuitive chat interface that refines search results by using the ongoing conversation context to deliver even more accurate and relevant information.

⚙️ Under the Hood

Verba is engineered with Weaviate's cutting-edge Generative Search technology at its core, extracting relevant context from your pool of documents to resolve queries with precision. By utilizing the power of Large Language Models, Verba doesn't just search for answers—it understands and provides responses that are contextually rich and informed by the content of your documents, all through an intuitive user interface designed for simplicity and efficiency.

💡 Effortless Data Import with Weaviate

Verba offers seamless data import functionality through its frontend, supporting a diverse range of file types including .txt, .md, .pdf and more. Before feeding your data into Weaviate, Verba handles chunking and vectorization to optimize it for search and retrieval. Together with collaborative partners we support popular libraries such as HuggingFace, Haystack, Unstructured and many more!

💥 Advanced Query Resolution with Hybrid Search

Experience the hybrid search capabilities of Weaviate within Verba, which merges vector and lexical search methodologies for even greater precision. This dual approach not only navigates through your documents to pinpoint exact matches but also understands the nuance of context, enabling the Large Language Models to craft responses that are both comprehensive and contextually aware. It's an advanced technique that redefines document retrieval, providing you with precisely what you need, when you need it.
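
As a rough illustration of how vector and lexical scores can be merged, the sketch below scales each score set to the range [0, 1] and blends them with a weight alpha. It is a simplified stand-in, not Weaviate's actual implementation; the function names, document ids, and example scores are made up for illustration.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as an equally good match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(vector_scores, keyword_scores, alpha=0.5):
    """Blend two score sets; alpha=1 is pure vector, alpha=0 is pure keyword."""
    v = normalize(vector_scores)
    k = normalize(keyword_scores)
    combined = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three documents.
vector_scores = {"doc_a": 0.91, "doc_b": 0.75, "doc_c": 0.40}
keyword_scores = {"doc_b": 7.2, "doc_c": 3.1}
ranking = hybrid_rank(vector_scores, keyword_scores, alpha=0.5)
print(ranking[0][0])  # doc_b
```

With equal weighting, doc_b wins because it scores well in both searches, while doc_a only appears in the vector results.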

🔥 Accelerate Queries with Semantic Cache

Verba enhances search efficiency with Weaviate's Semantic Cache, a sophisticated system that retains the essence of your queries, results, and dialogues. This proactive feature means that Verba anticipates your needs, using cached data to expedite future inquiries. With semantic matching, it quickly determines if your question has been asked before, delivering instant results, and even suggests auto-completions based on historical interactions, streamlining your search experience to be faster and more intuitive.
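
The idea of semantic matching can be sketched in a few lines. The example below is a toy model, not Weaviate's Semantic Cache: it assumes a bag-of-words vector in place of a real embedding model, and the class name, threshold value, and sample questions are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

    def get(self, query):
        """Return the cached answer of the most similar query, or None."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("What is hybrid search?", "It combines vector and keyword search.")
hit = cache.get("what is hybrid search")     # same words, different casing
miss = cache.get("How do I install Python?")  # unrelated question
print(hit, miss)
```

A cache hit skips retrieval and generation entirely, which is where the speed-up comes from; the threshold controls how close a new question must be to an earlier one.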

✨ Getting Started with Verba

Starting your Verba journey is super easy, with multiple deployment options tailored to your preferences. Follow these simple steps to get Verba up and running:

Deploy with pip (Quickstart)

pip install goldenverba

Build from Source (Quickstart)

git clone https://github.com/weaviate/Verba

cd Verba

pip install -e .

🐍 Installing Python and Setting Up a Virtual Environment

Before you can use Verba, you'll need to ensure that Python >=3.9.0 is installed on your system and that you can create a virtual environment for a safer and cleaner project setup.
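
If you are unsure which interpreter is active, a small check like the following can help. It is a minimal sketch; the function name is hypothetical and the minimum version is taken from the requirement stated above.

```python
import sys

MIN_PYTHON = (3, 9, 0)


def python_is_supported(version=None):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= MIN_PYTHON


if not python_is_supported():
    print(f"Python {'.'.join(map(str, MIN_PYTHON))} or newer is required, "
          f"found {sys.version.split()[0]}")
```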

Installing Python

Python is required to run Verba. If you don't have Python installed, follow these steps:

For Windows:

Download the latest Python installer from the official Python website. Run the installer and make sure to check the box that says Add Python to PATH during installation.

For macOS:

You can install Python using Homebrew, a package manager for macOS. If Homebrew is not installed yet, install it first with the following command in the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install Python:

brew install python

For Linux:

Python usually comes pre-installed on most Linux distributions. If it's not, you can install it using your distribution's package manager. You can read more about it here

Setting Up a Virtual Environment

It's recommended to use a virtual environment to avoid conflicts with other projects or system-wide Python packages.

Install the virtualenv package:

First, ensure you have pip installed (it comes with Python if you're using version 3.4 and above). Install virtualenv by running:

pip install virtualenv

Create a Virtual Environment:

Navigate to your project's directory in the terminal. Run the following command to create a virtual environment named venv (you can name it anything you like):

python3 -m virtualenv venv

Activate the Virtual Environment:

On Windows, activate the virtual environment by running:

venv\Scripts\activate.bat

On macOS and Linux, activate it with:

source venv/bin/activate

Once your virtual environment is activated, you'll see its name in the terminal prompt. Now you're ready to install Verba using the steps provided in the Quickstart sections.

Remember to deactivate the virtual environment when you're done working with Verba by simply running deactivate in the terminal.
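
To confirm from within Python that a virtual environment is active, you can compare the interpreter prefixes. This is a small sketch with a hypothetical function name; it relies only on attributes of the standard `sys` module.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("virtual environment active:", in_virtual_environment())
```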

Linting

We use ruff for automatic code formatting and linting. The process is automated with a pre-commit hook. To install the hook, run:

pre-commit install

or for shorthand:

make pre-commit

After that all your commits will be automatically linted and formatted. The linting will happen only on the files you changed.

make pre-commit formats all files in the repository and installs the hooks if needed.

🛠️ Quickstart: Build from Source with Leanovate extensions

1. Initialize a new Python environment and activate it

python3 -m virtualenv venv
source venv/bin/activate

2. Install Verba

pip install -e ".[dev,confluence,unstructured]"

3. Create a .env file and add environment variables

cp goldenverba/.env-example goldenverba/.env
xdg-open goldenverba/.env

To use the Leanovate extensions, add the CONFLUENCE and COHERE (or a different LLM provider) API keys.

4. Launch Verba

verba start

5. Access Verba

Visit localhost:8000

6. Add documents

Add Confluence space pages from the command line:

verba load --embedder CohereEmbedder --units 60 --overlap 10 --reader ConfluenceReader --path AL

Add PDF documents from the command line:

verba load --embedder CohereEmbedder --units 60 --overlap 10 --reader PDFReader --path ./data/leanovate/KanbanAndScrum-German.pdf
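
When several files need importing, the same command can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of Verba; it only reuses the flags shown in the commands above, and the file names are placeholders.

```python
import shlex


def build_load_command(reader, path, embedder="CohereEmbedder", units=60, overlap=10):
    """Assemble the argument list for one `verba load` call."""
    return [
        "verba", "load",
        "--embedder", embedder,
        "--units", str(units),
        "--overlap", str(overlap),
        "--reader", reader,
        "--path", str(path),
    ]


# Hypothetical file names; replace them with your own documents.
pdf_files = ["./data/handbook.pdf", "./data/team notes.pdf"]
for pdf in pdf_files:
    command = build_load_command("PDFReader", pdf)
    # shlex.join quotes arguments that contain spaces for safe copy-pasting.
    print(shlex.join(command))
    # To actually run it: subprocess.run(command, check=True)
```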

🔑 API Keys

Before diving into Verba's capabilities, you'll need to configure access to various components depending on your chosen technologies, such as OpenAI, Cohere, and HuggingFace. Start by obtaining the necessary API keys and setting them up through a .env file based on our provided example, or by declaring them as environment variables on your system. If you're building from source or using Docker, make sure your .env file is within the goldenverba directory.
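
To check what a .env file would actually set, a minimal parser such as the one below can be handy. It is an illustrative sketch that handles only simple KEY=VALUE lines (libraries such as python-dotenv cover more of the syntax); the function names are hypothetical and this is not necessarily how Verba reads the file.

```python
import os


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop a trailing " # comment" and surrounding quotes, if present.
            value = value.split(" #", 1)[0].strip().strip("\"'")
            values[key.strip()] = value
    return values


def load_env_file(path, override=False):
    """Copy parsed values into os.environ, keeping existing ones by default."""
    for key, value in parse_env_file(path).items():
        if override or key not in os.environ:
            os.environ[key] = value


if os.path.exists("goldenverba/.env"):
    load_env_file("goldenverba/.env")
```

Keeping existing environment variables by default means values exported in the shell take precedence over the file.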

Below is a comprehensive list of the API keys and variables you may require:

Weaviate

Verba provides flexibility in connecting to Weaviate instances based on your needs. By default, Verba opts for Weaviate Embedded if it doesn't detect the WEAVIATE_URL_VERBA and WEAVIATE_API_KEY_VERBA environment variables. This local deployment is the most straightforward way to launch your Weaviate database for prototyping and testing.

However, you have other compelling options to consider:

🌩️ Weaviate Cloud Service (WCS)

If you prefer a cloud-based solution, Weaviate Cloud Service (WCS) offers a scalable, managed environment. Learn how to set up a cloud cluster and get the API keys by following the Weaviate Cluster Setup Guide.

🐳 Docker Deployment

Another robust local alternative is deploying Weaviate using Docker. For more details, consult the Weaviate Docker Guide.

WEAVIATE_URL_VERBA=URL-TO-YOUR-WEAVIATE-CLUSTER

WEAVIATE_API_KEY_VERBA=API-KEY-OF-YOUR-WEAVIATE-CLUSTER
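
The choice between Weaviate Embedded and a remote instance can be pictured as follows. This is a sketch of the behaviour described above, not Verba's actual code; the function name is hypothetical, and it assumes that the URL alone decides the mode while the API key is optional.

```python
import os


def choose_weaviate_deployment(env=None):
    """Return ("remote", url, api_key) or ("embedded", None, None)."""
    env = os.environ if env is None else env
    url = env.get("WEAVIATE_URL_VERBA", "").strip()
    api_key = env.get("WEAVIATE_API_KEY_VERBA", "").strip()
    if url:
        # Assumption: a URL alone is enough, e.g. for a local Docker instance
        # without authentication; the key is passed along only when present.
        return ("remote", url, api_key or None)
    return ("embedded", None, None)


mode, url, api_key = choose_weaviate_deployment()
print("Weaviate deployment:", mode)
```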

OpenAI

Verba supports OpenAI models such as Ada, GPT-3, and GPT-4. To use them, you need to specify the OPENAI_API_KEY environment variable. You can get it from OpenAI.

OPENAI_API_KEY=YOUR-OPENAI-KEY

You can also set OPENAI_BASE_URL to use a proxy such as LiteLLM (https://github.com/BerriAI/litellm):

OPENAI_BASE_URL=YOUR-OPENAI_BASE_URL

Azure OpenAI

To use Azure OpenAI, you need to set the following.

The API type:

OPENAI_API_TYPE="azure"

The key and the endpoint:

OPENAI_API_KEY=<YOUR_KEY>
OPENAI_BASE_URL=https://XXX.openai.azure.com

The Azure OpenAI resource name, which is XXX if your endpoint is XXX.openai.azure.com:

AZURE_OPENAI_RESOURCE_NAME=<YOUR_AZURE_RESOURCE_NAME>

You also need to set the models for the embeddings and for the query:

AZURE_OPENAI_EMBEDDING_MODEL="text-embedding-ada-002"
OPENAI_MODEL="gpt-4"
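
Because the Azure setup involves several variables, it can help to check them before starting Verba. The sketch below uses the variable names listed above; the function name is hypothetical, and treating every variable as mandatory is an assumption.

```python
import os

AZURE_REQUIRED_VARIABLES = (
    "OPENAI_API_TYPE",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
    "AZURE_OPENAI_RESOURCE_NAME",
    "AZURE_OPENAI_EMBEDDING_MODEL",
    "OPENAI_MODEL",
)


def missing_azure_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in AZURE_REQUIRED_VARIABLES if not env.get(name)]


missing = missing_azure_variables()
if missing:
    print("Missing Azure OpenAI settings:", ", ".join(missing))
```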

Finally, because Azure enforces a per-minute quota, you might need to add a waiting time between chunk uploads. For example, with a limit of 240k tokens per minute and chunks of at most 400 tokens, you can send up to 600 chunks per minute, so 100 ms between queries should be fine. If you get error 429 from Weaviate, increase this value.

WAIT_TIME_BETWEEN_INGESTION_QUERIES_MS="100"
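
The waiting time follows directly from the quota and the chunk size. Here is a small sketch of that arithmetic; the function name is hypothetical, and it assumes the worst case in which every request carries a full-size chunk.

```python
import math


def min_wait_ms(tokens_per_minute, max_chunk_tokens):
    """Smallest delay between uploads that keeps worst-case usage under quota."""
    if tokens_per_minute <= 0 or max_chunk_tokens <= 0:
        raise ValueError("both arguments must be positive")
    # Worst case: every request sends max_chunk_tokens tokens.
    return math.ceil(60_000 * max_chunk_tokens / tokens_per_minute)


print(min_wait_ms(240_000, 400))  # 100
```

Halving the quota or doubling the chunk size doubles the required delay.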

Cohere

Verba supports Cohere models. To use them, you need to specify the COHERE_API_KEY environment variable. You can get it from Cohere.

COHERE_API_KEY=YOUR-COHERE-KEY

HuggingFace

Verba supports HuggingFace models, such as SentenceTransformers and Llama2. To use them you need the HF_TOKEN environment variable. You can get it from HuggingFace

HF_TOKEN=YOUR-HUGGINGFACE-TOKEN

Llama2

To use the Llama2 model from Meta, you first need to request access to it. Read more about accessing the Llama model here. To enable the Llama2 model for Verba, use:

LLAMA2-7B-CHAT-HF=True

Unstructured

Verba supports importing documents through Unstructured (e.g. .pdf). To use it, you need the UNSTRUCTURED_API_KEY environment variable. You can get it from Unstructured.

UNSTRUCTURED_API_KEY=YOUR-UNSTRUCTURED-KEY
UNSTRUCTURED_API_URL=YOUR-SELF-HOSTED-INSTANCE # If you are self hosting, in the form of `http://localhost:8000/general/v0/general`

Github

If you want to use the Github Reader, you need the GITHUB_TOKEN environment variable. You can get it from GitHub

GITHUB_TOKEN=YOUR-GITHUB-TOKEN

Confluence

If you want to use the ConfluenceReader, you need the following keys:

CONFLUENCE_API_KEY=XXXX
[email protected]
CONFLUENCE_URL=https://leanovate.atlassian.net

Status Page

Once configured, you can monitor your Verba installation's health and status via the 'Status Verba' page. This dashboard provides insights into your deployment type, libraries, environment settings, Weaviate schema counts, and more. It's also your go-to for maintenance tasks like resetting Verba, clearing the cache, or managing auto-complete suggestions.

🐳 Quickstart: Deploy with Docker

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries, and configuration files; they can communicate with each other through well-defined channels. All containers are run by a single operating system kernel and are thus more lightweight than virtual machines. Docker provides an additional layer of abstraction and automation of operating-system-level virtualization on Windows and Linux.

Docker's use of containers to package software means that the application and its dependencies, libraries, and other binaries are packaged together and can be moved between environments easily. This makes it incredibly useful for developers looking to create predictable environments that are isolated from other applications.

To get started with deploying Verba using Docker, follow the steps below. If you need more detailed instructions on Docker usage, check out the Docker Curriculum.

If you're unfamiliar with Docker, you can learn more about it here.

Clone the Verba repo

Ensure you have Git installed on your system. Then, open a terminal or command prompt and run the following command to clone the Verba repository:

git clone https://github.com/weaviate/Verba.git

Deploy using Docker

With Docker installed and the Verba repository cloned, navigate to the directory containing the Docker Compose file in your terminal or command prompt. Run the following command to start the Verba application in detached mode, which allows it to run in the background:

docker compose up -d

This command will download the necessary Docker images, create containers, and start Verba. Remember, Docker must be installed on your system to use this method. For installation instructions and more details about Docker, visit the official Docker documentation.

💾 Importing Your Data into Verba

With Verba configured, you're ready to import your data and start exploring. Follow these simple steps to get your data into Verba:

Initiate the Import Process
- Click on "Add Documents" to begin.
Select Your Data Processing Tools
- At the top, you'll find three tabs labeled Reader, Chunker, and Embedder, each offering different options for handling your data.
Choose a Reader
- The Reader is responsible for importing your data. Select from the available options:
  - SimpleReader: For importing .txt and .md files.
  - GitHubReader: For loading data directly from a GitHub repository by specifying the path (owner/repo/folder_path).
  - PDFReader: For importing .pdf files.
Select a Chunker
- Chunkers break down your data into manageable pieces. Choose a suitable chunker:
  - WordChunker: Chunks the text by words.
  - SentenceChunker: Chunks the text by sentences.
Pick an Embedder
- Embedders are crucial for integrating your data into Weaviate. Select one based on your preference:
  - AdaEmbedder: Utilizes OpenAI's ADA model for embedding.
  - MiniLMEmbedder: Employs Sentence Transformers for embedding.
  - CohereEmbedder: Uses Cohere for embedding.
Commence Data Ingestion
- After setting up your preferences, click on "Import" to ingest your data into Verba.

Now your data is ready to be used within Verba, enabling you to leverage its powerful search and retrieval capabilities.
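
To give a feel for what word-based chunking does, here is a minimal sketch. It is not Verba's WordChunker; it assumes that chunk size and overlap are counted in whitespace-separated words and that consecutive chunks share `overlap` words, mirroring the `--units 60 --overlap 10` values used on the command line.

```python
def chunk_words(text, units=60, overlap=10):
    """Split text into chunks of `units` words sharing `overlap` words."""
    if units <= 0 or not 0 <= overlap < units:
        raise ValueError("need units > 0 and 0 <= overlap < units")
    words = text.split()
    step = units - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + units]))
        if start + units >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(130))
chunks = chunk_words(sample, units=60, overlap=10)
print(len(chunks))  # 3
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of embedding some words twice.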

💰 Large Language Model (LLM) Costs

Verba uses LLMs through APIs. Usage costs for these models are billed to the account behind the API key you provide. Costs are incurred primarily during data embedding and answer generation.

💖 Open Source Contribution

Your contributions are always welcome! Feel free to contribute ideas, feedback, or create issues and bug reports if you find any! Before contributing, please read the Contribution Guide. Visit our Weaviate Community Forum if you need any help!

🛠️ Project Architecture

You can learn more about Verba's architecture and implementation in its technical documentation and frontend documentation. It's recommended to read them before making any contributions.

leanovate/ai_playground_rag_verba