RAG Content provides a shared codebase for generating vector databases. It serves as the core framework for Lightspeed-related projects (e.g., OpenShift Lightspeed, OpenStack Lightspeed, etc.) to generate their own vector databases that can be used for RAG.
The `lightspeed_rag_content` library is not available via pip, but it is:
- included in the base container image, or
- installable via uv.

To install the library via uv:

- Run the command:

  ```shell
  uv sync
  ```

- Test that the library can be imported (expect `lightspeed_rag_content` in the output):

  ```shell
  uv run python -c "import lightspeed_rag_content; print(lightspeed_rag_content.__name__)"
  ```
The base container image can be manually generated or pulled from a container registry.
There are two prebuilt images: one with CPU support only (approximately 3.7 GB) and one with GPU (CUDA) support (approximately 12 GB).
- Pull the CPU variant:

  ```shell
  podman pull quay.io/lightspeed-core/rag-content-cpu:latest
  ```

- Pull the GPU variant:

  ```shell
  podman pull quay.io/lightspeed-core/rag-content-gpu:latest
  ```
To build the image locally, follow these steps:
- Install the requirements: `make` and `podman`.

- Generate the container image:

  ```shell
  podman build -t localhost/lightspeed-rag-content-cpu:latest .
  ```

- The `lightspeed_rag_content` library and its dependencies are installed in the image. Verify the import (expect `lightspeed_rag_content` in the output):

  ```shell
  podman run localhost/lightspeed-rag-content-cpu:latest python -c "import lightspeed_rag_content; print(lightspeed_rag_content.__name__)"
  ```
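With the image available, you can also run the vector database generation inside the container instead of installing the library locally. The following is only a sketch under assumptions: it mounts the current directory as the working directory and invokes a processing script such as the `custom_processor.py` example shown later in this document; the mount point and paths are illustrative, not part of the image.

```shell
# Hypothetical invocation: mount your working directory (docs, embedding model,
# processing script) into the image and run the script with the image's Python.
podman run --rm -v "$PWD":/workdir:Z -w /workdir \
    localhost/lightspeed-rag-content-cpu:latest \
    python ./custom_processor.py -o ./vector_db/custom_docs/0.1 -f ./custom_docs/0.1/ \
    -md embeddings_model/ -mn sentence-transformers/all-mpnet-base-v2 -i custom_docs-0_1
```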
You can generate the vector database using any of the following (the `--vector-store-type` values used for each target are summarized after this list):
- Llama-Index Faiss Vector Store
- Llama-Index Postgres (PGVector) Vector Store
- Llama-Stack Faiss Vector-IO
- Llama-Stack SQLite-vec Vector-IO
- Llama-Stack Postgres (PGVector) Vector Store
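For orientation, the sections below select these targets via the `--vector-store-type` flag of the processing script. This summary is inferred from the examples later in this document; the first example omits the flag and produces the Llama-Index Faiss store.

```shell
# Values of --vector-store-type as used in the examples below
./custom_processor.py ... --vector-store-type postgres               # Llama-Index PGVector
./custom_processor.py ... --vector-store-type llamastack-faiss       # Llama-Stack Faiss
./custom_processor.py ... --vector-store-type llamastack-sqlite-vec  # Llama-Stack SQLite-vec
./custom_processor.py ... --vector-store-type llamastack-pgvector    # Llama-Stack PGVector
```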
The Llama-Index approaches require you to download the embedding model. We also recommend downloading it for the Llama-Stack targets, even though they can work without a manually downloaded model.
All cases require you to prepare the documentation in text format; it will be chunked and mapped to embeddings generated with the model:
- Download the embedding model (sentence-transformers/all-mpnet-base-v2) from HuggingFace as follows:

  ```shell
  mkdir ./embeddings_model
  uv run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
  ```
- Prepare dummy documentation:

  ```shell
  mkdir -p ./custom_docs/0.1
  echo "Vector Database is an efficient way how to provide information to LLM" > ./custom_docs/0.1/info.txt
  ```
- Prepare a custom script (`./custom_processor.py`) for populating the vector database. Below is an example of what such a script might look like using the `lightspeed_rag_content` library; in your case the script will differ:

  ```python
  from lightspeed_rag_content.metadata_processor import MetadataProcessor
  from lightspeed_rag_content.document_processor import DocumentProcessor
  from lightspeed_rag_content import utils


  class CustomMetadataProcessor(MetadataProcessor):

      def __init__(self, url):
          self.url = url

      def url_function(self, file_path: str) -> str:
          # Return a URL for the file, so it can be referenced when used
          # in an answer
          return self.url


  if __name__ == "__main__":
      parser = utils.get_common_arg_parser()
      args = parser.parse_args()

      # Instantiate custom Metadata Processor
      metadata_processor = CustomMetadataProcessor("https://www.redhat.com")

      # Instantiate Document Processor
      document_processor = DocumentProcessor(
          chunk_size=args.chunk,
          chunk_overlap=args.overlap,
          model_name=args.model_name,
          embeddings_model_dir=args.model_dir,
          num_workers=args.workers,
          vector_store_type=args.vector_store_type,
      )

      # Load and embed the documents, this method can be called multiple times
      # for different sets of documents
      document_processor.process(args.folder, metadata=metadata_processor)

      # Save the new vector database to the output directory
      document_processor.save(args.index, args.output)
  ```
Generate the documentation using the script from the previous section (Generating the Vector Database):

```shell
uv run ./custom_processor.py -o ./vector_db/custom_docs/0.1 -f ./custom_docs/0.1/ -md embeddings_model/ -mn sentence-transformers/all-mpnet-base-v2 -i custom_docs-0_1
```

Once the command is done, you can find the vector database at `./vector_db`, the embedding model at `./embeddings_model`, and the Index ID set to `custom-docs-0_1`.
To generate a vector database stored in Postgres (PGVector), run the following commands:
- Start Postgres with the pgvector extension by running:

  ```shell
  make start-postgres-debug
  ```
  The `data` folder of Postgres is created at `./postgresql/data`. This command also creates `./output`, the output directory in which the metadata is saved.
- Run:

  ```shell
  POSTGRES_USER=postgres \
  POSTGRES_PASSWORD=somesecret \
  POSTGRES_HOST=localhost \
  POSTGRES_PORT=15432 \
  POSTGRES_DATABASE=postgres \
  uv run python ./custom_processor.py \
      -o ./output \
      -f custom_docs/0.1/ \
      -md embeddings_model/ \
      -mn sentence-transformers/all-mpnet-base-v2 \
      -i custom_docs-0_1 \
      --vector-store-type postgres
  ```
  This generates the embeddings in PostgreSQL, where they can be used for RAG, and writes `metadata.json` to `./output`. The generated embeddings are stored in the `data_table_name` table:

  ```shell
  $ podman exec -it pgvector bash
  $ psql -U postgres
  psql (16.4 (Debian 16.4-1.pgdg120+2))
  Type "help" for help.

  postgres=# \dt
                 List of relations
   Schema |          Name          | Type  |  Owner
  --------+------------------------+-------+----------
   public | data_table_name        | table | postgres
  (1 row)
  ```
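As a quick sanity check, you can count the stored chunks directly. This is a minimal sketch assuming the defaults shown above: the `pgvector` container name and the `data_table_name` table.

```shell
# Count the rows (embedded chunks) written by the processor
podman exec -it pgvector psql -U postgres -c "SELECT count(*) FROM data_table_name;"
```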
When using Llama-Stack vector stores (Faiss or SQLite-vec), the embedding model path specified via the `-md` (or `--model-dir`) parameter is written into the generated `llama-stack.yaml` configuration file as an absolute path. This path is also registered in the llama-stack `kv_store` database.
When llama-stack later consumes the vector database, it reads the embedding model location from the `kv_store`. Therefore, the embedding model must be available at the exact same path that was specified during database creation.
Recommendation:
- Use absolute paths for the `-md` parameter to avoid ambiguity (e.g., `-md /app/embeddings` instead of `-md embeddings_model`).
- Alternatively, set `-md ''` (empty string) and use only the `-mn` flag with a HuggingFace model ID (e.g., `-md "" -mn sentence-transformers/all-mpnet-base-v2`). Setting `-md` to empty forces the tool to use the HuggingFace model ID instead of checking for a local directory. This allows llama-stack to download the model from HuggingFace automatically, making the vector database fully portable without path dependencies.
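For illustration, here is a sketch of the two recommended invocations, based on the Llama-Stack Faiss command shown later in this document; `...` stands for the remaining arguments such as `-o`, `-f`, and `-i`:

```shell
# Option 1: absolute model path, recorded in llama-stack.yaml and the kv_store
uv run ./custom_processor.py ... -md /app/embeddings -mn sentence-transformers/all-mpnet-base-v2 --vector-store-type=llamastack-faiss

# Option 2: no local model dir; llama-stack downloads the model from HuggingFace
uv run ./custom_processor.py ... -md "" -mn sentence-transformers/all-mpnet-base-v2 --vector-store-type=llamastack-faiss
```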
Important:
When using the `--auto-chunking` flag, chunking happens within llama-stack using the OpenAI-compatible Files API. This makes vector stores significantly larger than manual chunking because the Files API stores a redundant copy of the embeddings. Manual chunking results in smaller database files.
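A hedged sketch of how the flag would be added to a Llama-Stack invocation; `...` stands for the usual arguments shown in the examples below:

```shell
# Let llama-stack chunk the documents via its Files API instead of chunking locally
# (expect a noticeably larger vector store than with the default manual chunking)
uv run ./custom_processor.py ... --vector-store-type=llamastack-faiss --auto-chunking
```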
The process is essentially the same as for the Llama-Index Faiss Vector Store, except that you pass the `--vector-store-type` parameter. Generate the documentation using the `custom_processor.py` script from the earlier section (Generating the Vector Database):
```shell
uv run ./custom_processor.py \
    -o ./vector_db/custom_docs/0.1 \
    -f ./custom_docs/0.1/ \
    -md embeddings_model/ \
    -mn sentence-transformers/all-mpnet-base-v2 \
    -i custom_docs-0_1 \
    --vector-store-type=llamastack-faiss
```

Once the command is done, you can find the vector database (embedded with the registry metadata) at `./vector_db/custom_docs/0.1` under the name `faiss_store.db`, as well as a barebones llama-stack configuration file named `llama-stack.yaml`, provided for reference since it is not necessary for the final deployment.
The vector-io will be named `custom-docs-0_1`:

```yaml
providers:
  vector_io:
  - provider_id: custom-docs-0_1
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        namespace: null
        db_path: /home/<user>/rag-content/vector_db/custom_docs/0.1/faiss_store.db
```

Once we have a database, we can use the script `query_rag.py` to check some results:
```shell
python scripts/query_rag.py \
    -p vector_db/custom_docs/0.1 \
    -x custom-docs-0_1 \
    -m embeddings_model \
    -k 5 \
    -q "how can I configure a cinder backend"
```

The process is the same as for the Llama-Stack Faiss store, except that you pass a different value to the `--vector-store-type` parameter. Generate the documentation using the `custom_processor.py` script from the earlier section (Generating the Vector Database):
```shell
uv run ./custom_processor.py \
    -o ./vector_db/custom_docs/0.1 \
    -f ./custom_docs/0.1/ \
    -md embeddings_model/ \
    -mn sentence-transformers/all-mpnet-base-v2 \
    -i custom_docs-0_1 \
    --vector-store-type=llamastack-sqlite-vec
```

Once the command is done, you can find the vector database at `./vector_db/custom_docs/0.1` under the name `sqlitevec_store.db`, as well as a barebones llama-stack configuration file named `llama-stack.yaml`, provided for reference since it is not necessary for the final deployment.
The vector-io will be named `custom-docs-0_1`:

```yaml
providers:
  vector_io:
  - provider_id: custom-docs-0_1
    provider_type: inline::sqlite-vec
    config:
      db_path: /home/<user>/rag-content/vector_db/custom_docs/0.1/sqlitevec_store.db
```

Once we have a database, we can use the script `query_rag.py` to check some results:
```shell
python scripts/query_rag.py \
    -p vector_db/custom_docs/0.1 \
    -x custom-docs-0_1 \
    -m embeddings_model \
    -k 5 \
    -q "how can I configure a cinder backend"
```

To generate a vector database stored in Postgres (PGVector) for Llama-Stack, run the following commands:
- Start Postgres with the pgvector extension by running:

  ```shell
  make start-postgres-debug
  ```
  The `data` folder of Postgres is created at `./postgresql/data`. Note that this command also creates `./output`, which is used by the Llama-Index version but not by the Llama-Stack version.
- Run:

  ```shell
  POSTGRES_USER=postgres \
  POSTGRES_PASSWORD=somesecret \
  POSTGRES_HOST=localhost \
  POSTGRES_PORT=15432 \
  POSTGRES_DATABASE=postgres \
  uv run python ./custom_processor.py \
      -o ./output \
      -f custom_docs/0.1/ \
      -md embeddings_model/ \
      -mn sentence-transformers/all-mpnet-base-v2 \
      -i custom_docs-0_1 \
      --vector-store-type llamastack-pgvector
  ```
  This generates the embeddings in PostgreSQL, where they can be used for RAG.
- When you run `query_rag.py` to check some results, specify these environment variables for database access:

  ```shell
  POSTGRES_USER=postgres \
  POSTGRES_PASSWORD=somesecret \
  POSTGRES_HOST=localhost \
  POSTGRES_PORT=15432 \
  POSTGRES_DATABASE=postgres \
  uv run python scripts/query_rag.py \
      -p vector_db/custom_docs/0.1 \
      -x custom-docs-0_1 \
      -m embeddings_model \
      -k 5 \
      -q "how can I configure a cinder backend"
  ```
The `uv.lock` lock file is used in this repository.

The lock file needs to be regenerated when new dependency updates are available. Use the following commands to do so:

```shell
uv lock --upgrade
uv sync
```
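After regenerating the lock file, you can re-run the import check from the installation section to confirm the environment still resolves:

```shell
uv run python -c "import lightspeed_rag_content; print(lightspeed_rag_content.__name__)"
```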
Konflux builds run in hermetic mode (air-gapped from the internet), so all dependencies must be prefetched and locked. When you add or update dependencies, you need to regenerate the lock files.
Update these files when you:
- Add/remove/update Python packages in the project
- Add/remove/update RPM packages in the Containerfile
- Change the base image version
Quick command:

```shell
make konflux-requirements
```

This compiles Python dependencies from pyproject.toml using uv, splits packages by their source index (PyPI vs Red Hat's internal registry), and generates hermetic requirements files with pinned versions and hashes for Konflux builds.
Files produced:
- `requirements.hashes.source.txt` – PyPI packages with hashes
- `requirements.hashes.wheel.txt` – Red Hat registry packages with hashes
- `requirements.hashes.wheel.pypi.txt` – PyPI wheel packages with hashes
- `requirements-build.txt` – Build-time dependencies for source packages
The script also updates the Tekton pipeline configurations (`.tekton/lightspeed-stack-*.yaml`) with the list of pre-built wheel packages.
Prerequisites:
- Install rpm-lockfile-prototype
- Have an active RHEL subscription; get activation keys from the RH console
- Have `dnf` installed on your system
Steps:

- List your RPM packages in `rpms.in.yaml` under the `packages` field.
- If you changed the base image, extract its repo file:

  ```shell
  # UBI images
  podman run -it $BASE_IMAGE cat /etc/yum.repos.d/ubi.repo > ubi.repo

  # RHEL images, the current base image.
  podman run -it $BASE_IMAGE cat /etc/yum.repos.d/redhat.repo > redhat.repo
  ```

  If the repo file contains too many entries, you can filter them and keep only the required repositories. Use the following command to check the active repositories:

  ```shell
  dnf repolist
  ```

  Replace the architecture tag (`uname -m`) with `$basearch` so that rpm-lockfile-prototype can substitute the requested architecture names:

  ```shell
  sed -i "s/$(uname -m)/\$basearch/g" redhat.repo
  ```

- Generate the lock file:

  ```shell
  make konflux-rpm-lock
  ```

  This creates `rpms.lock.yaml` with pinned RPM versions.