Merged
Changes from 4 commits
4 changes: 4 additions & 0 deletions comps/dataprep/README.md
@@ -60,3 +60,7 @@ For details, please refer to this [readme](src/README_neo4j_llamaindex.md)
## Dataprep Microservice for financial domain data

For details, please refer to this [readme](src/README_finance.md)

## Dataprep Microservice with MariaDB Vector

For details, please refer to this [readme](src/README_mariadb.md)
24 changes: 24 additions & 0 deletions comps/dataprep/deployment/docker_compose/compose.yaml
@@ -15,6 +15,7 @@ include:
- ../../../third_parties/tei/deployment/docker_compose/compose.yaml
- ../../../third_parties/vllm/deployment/docker_compose/compose.yaml
- ../../../third_parties/arangodb/deployment/docker_compose/compose.yaml
- ../../../third_parties/mariadb/deployment/docker_compose/compose.yaml

services:

@@ -414,6 +415,29 @@ services:
      retries: 10
    restart: unless-stopped

  dataprep-mariadb-vector:
    image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
    container_name: dataprep-mariadb-vector
    ports:
      - "${DATAPREP_PORT:-5000}:5000"
    depends_on:
      mariadb-server:
        condition: service_healthy
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      DATAPREP_COMPONENT_NAME: "OPEA_DATAPREP_MARIADBVECTOR"
      MARIADB_CONNECTION_URL: ${MARIADB_CONNECTION_URL:-mariadb+mariadbconnector://dbuser:password@mariadb-server:3306/vectordb}
      LOGFLAG: ${LOGFLAG}
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 10
    restart: unless-stopped

networks:
  default:
    driver: bridge
1 change: 1 addition & 0 deletions comps/dataprep/src/Dockerfile
@@ -13,6 +13,7 @@ RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missin
libcairo2 \
libgl1-mesa-glx \
libjemalloc-dev \
libmariadb-dev \
libpq-dev \
libreoffice \
poppler-utils \
100 changes: 100 additions & 0 deletions comps/dataprep/src/README_mariadb.md
@@ -0,0 +1,100 @@
# Dataprep Microservice with MariaDB Vector

## 🚀1. Start Microservice with Docker

### 1.1 Build Docker Image

```bash
cd GenAIComps
docker build -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
```

### 1.2 Run Docker with CLI (Option A)

#### 1.2.1 Start MariaDB Server

Please refer to this [readme](../../third_parties/mariadb/src/README.md).

#### 1.2.2 Start the data preparation service

```bash

export HOST_IP=$(hostname -I | awk '{print $1}')
# If you've configured the server with the default env values then:
export MARIADB_CONNECTION_URL="mariadb+mariadbconnector://dbuser:password@${HOST_IP}:3306/vectordb"

docker run -d --rm --name="dataprep-mariadb-vector" -p 5000:5000 --ipc=host -e MARIADB_CONNECTION_URL=$MARIADB_CONNECTION_URL -e DATAPREP_COMPONENT_NAME="OPEA_DATAPREP_MARIADBVECTOR" opea/dataprep:latest
```
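
The connection URL above is assembled from a user, password, host, port, and database name. As a minimal sketch, a small shell function can compose it so that only the host has to be supplied. The function name `build_mariadb_url` is hypothetical, and the defaults simply mirror the values used in this README; credentials containing characters such as `@` or `/` would need URL-encoding first, which this sketch does not do.

```shell
# Hypothetical helper: compose the connection URL from its parts.
# Defaults mirror the values used in this README (dbuser/password/3306/vectordb).
# Assumes the user and password contain no characters that need URL-encoding.
build_mariadb_url() {
  local host="${1:?host is required}"
  local user="${2:-dbuser}"
  local password="${3:-password}"
  local port="${4:-3306}"
  local database="${5:-vectordb}"
  printf 'mariadb+mariadbconnector://%s:%s@%s:%s/%s\n' \
    "$user" "$password" "$host" "$port" "$database"
}
```

With this in place, `export MARIADB_CONNECTION_URL="$(build_mariadb_url "$HOST_IP")"` would set the variable before `docker run`.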

### 1.3 Run with Docker Compose (Option B)

```bash
cd comps/dataprep/deployment/docker_compose
docker compose -f compose.yaml up dataprep-mariadb-vector -d
```
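
The compose file defines a healthcheck against `/v1/health_check` on port 5000. As a rough sketch, the same endpoint can be polled from the host to wait until the service is ready before sending requests. The function name `wait_for_dataprep` and its retry and delay defaults are illustrative assumptions, not part of the service.

```shell
# Poll the dataprep health endpoint until it responds or the retries run out.
# The endpoint and port follow the compose healthcheck; the function name,
# retry count, and delay are illustrative choices.
wait_for_dataprep() {
  local url="${1:-http://localhost:5000/v1/health_check}"
  local retries="${2:-10}"
  local delay="${3:-10}"
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # -f makes curl exit non-zero on HTTP errors; the body is discarded.
    if curl -fsS -o /dev/null "$url"; then
      echo "dataprep is healthy"
      return 0
    fi
    echo "attempt ${attempt}/${retries} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "dataprep did not become healthy" >&2
  return 1
}
```

Calling `wait_for_dataprep` with no arguments checks the default local port; pass a different URL as the first argument if `DATAPREP_PORT` was changed.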

## 🚀2. Consume Microservice

### 2.1 Consume Upload API

Once the data preparation microservice for MariaDB Vector is running, use the command below to convert documents or links to embeddings and save them to the vector store.

```bash
export document="/path/to/document"
# Double quotes are required so that ${document} is expanded by the shell.
curl -X POST \
  -H "Content-Type: application/json" \
  -d "{\"path\":\"${document}\"}" \
  http://localhost:5000/v1/dataprep/ingest
```

### 2.2 Consume get API

To get the structure of the uploaded files, use the `get` API endpoint:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  http://localhost:5000/v1/dataprep/get
```

The response is JSON similar to the following:

```json
[
  {
    "name": "uploaded_file_1.txt",
    "id": "uploaded_file_1.txt",
    "type": "File",
    "parent": ""
  },
  {
    "name": "uploaded_file_2.txt",
    "id": "uploaded_file_2.txt",
    "type": "File",
    "parent": ""
  }
]
```
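
Each entry's `id` is the value that identifies an uploaded file or link. As a minimal sketch, the ids can be pulled out of the response with standard text tools. The function name `list_dataprep_ids` is hypothetical, and the pattern assumes the response shape shown above with ids that contain no double quotes; a real JSON parser is more robust where one is available.

```shell
# Print one id per line from a /v1/dataprep/get response read on stdin.
# Assumes the response shape shown above and ids without embedded double quotes.
list_dataprep_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}
```

For example, piping the output of the `get` request into `list_dataprep_ids` prints the ids of all uploaded files, one per line.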

### 2.3 Consume delete API

To delete uploaded files/links, use the `delete` API endpoint.

The `file_path` is the `id` returned by the `/v1/dataprep/get` API.

```bash
# delete link
curl -X POST "http://${HOST_IP}:5000/v1/dataprep/delete" \
  -H "Content-Type: application/json" \
  -d '{"file_path": "https://www.ces.tech/.txt"}'

# delete file
curl -X POST "http://${HOST_IP}:5000/v1/dataprep/delete" \
  -H "Content-Type: application/json" \
  -d '{"file_path": "uploaded_file_1.txt"}'

# delete all files and links
curl -X POST "http://${HOST_IP}:5000/v1/dataprep/delete" \
  -H "Content-Type: application/json" \
  -d '{"file_path": "all"}'
```
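
When several files need to be removed but not all of them, the delete request can be repeated once per `id`. The following is a sketch only: `delete_dataprep_ids` is a hypothetical helper, and it assumes the ids contain no double quotes or backslashes, because they are interpolated into the JSON body without escaping.

```shell
# Send one delete request per id, stopping at the first failure.
# Hypothetical helper; ids are placed into the JSON body without escaping,
# so they must not contain double quotes or backslashes.
delete_dataprep_ids() {
  local base_url="${1:?base URL is required}"
  shift
  local id
  for id in "$@"; do
    curl -fsS -X POST "${base_url}/v1/dataprep/delete" \
      -H "Content-Type: application/json" \
      -d "{\"file_path\": \"${id}\"}" || return 1
    echo  # keep each response on its own line
  done
}
```

A call such as `delete_dataprep_ids "http://${HOST_IP}:5000" uploaded_file_1.txt uploaded_file_2.txt` would remove the two files from the sample response above.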