101 changes: 101 additions & 0 deletions crates/qwen2-5-05b-instruct/README.md
@@ -0,0 +1,101 @@
# KServe LLM: Qwen2.5-0.5B-Instruct (vLLM CPU)

This example deploys an LLM service on OSCAR using KServe,
vLLM on CPU, and an OCI modelcar image that contains the
`Qwen/Qwen2.5-0.5B-Instruct` model.

## Example files

| File | Description |
|---|---|
| `fdl.yml` | OSCAR service definition with a KServe `llm_inference` block. |
| `docker/Dockerfile.vllm` | vLLM CPU runtime wrapper with user `uid=1010` for KServe modelcar compatibility. |
| `docker/Dockerfile.model` | Modelcar image that downloads the model from Hugging Face. |

## Requirements

- OSCAR cluster with KServe enabled.
- `oscar-cli` configured against your cluster.

## 1. Deploy the service

```bash
oscar-cli apply fdl.yml
```

Verify that the service was created:

```bash
oscar-cli service list
```

The service name in this example is `qwen2-5-05b-instruct`.

## 2. Test the OpenAI-compatible endpoint

Once the service is ready, the model is exposed at `https://<YOUR_CLUSTER>/system/services/<SERVICE_NAME>/models`. You can test it in either of the following ways:

### Direct request with `curl`

1. Open a terminal and try:

```bash
curl -X POST "https://<YOUR_CLUSTER>/system/services/qwen2-5-05b-instruct/models/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <TOKEN>" \
--data '{
"model": "qwen2-5-05b-instruct",
"messages": [
{
"role": "user",
"content": "Write a short explanation about KServe"
}
]
}'
```
> Replace `<TOKEN>` with your service token or your personal OIDC token.

> Note: If there is only one model, it will have the same name as the OSCAR service.
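
The same request can also be sent from Python. The following is a minimal sketch that uses only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of OSCAR or this example. The response parsing assumes the standard OpenAI chat completions shape.

```python
import json
import urllib.request


def build_chat_request(cluster, service, token, prompt):
    """Build (without sending) the POST request for the chat completions endpoint."""
    url = f"https://{cluster}/system/services/{service}/models/v1/chat/completions"
    payload = {
        # With a single model, its name matches the OSCAR service name.
        "model": service,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To try it, pass your cluster host, the service name `qwen2-5-05b-instruct`, and your token to `build_chat_request`, then hand the result to `send_chat_request`. Building and sending are kept separate so the URL and payload can be inspected before any network call is made.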

### Through Open WebUI

1. Install [Docker](https://www.docker.com)
2. Run Open WebUI:
```bash
docker run -d -p 3000:8080 -e WEBUI_AUTH=False -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
```
3. Go to [http://localhost:3000/](http://localhost:3000/)
4. Add a connection to the service:
`Top right corner → Admin Panel → Settings → Connections → OpenAI API`
5. Try it

## Build the images

### vLLM CPU runtime

```bash
docker buildx build --platform linux/amd64,linux/arm64 -t ghcr.io/grycap/kserve-vllm-openai-cpu:v0.22.1 -f Dockerfile.vllm . --push
```

### OCI modelcar (Qwen2.5 model)

```bash
docker buildx build --platform linux/amd64,linux/arm64 -t ghcr.io/grycap/kserve-qwen2-5-05b-instruct:latest -f Dockerfile.model . --push
```

If you use a local registry (for example `localhost:5001`), update the tags in
the commands above and in `fdl.yml` (`runtime_image` and `storage_uri`).

## Notes

- The first startup can take several minutes (model download and pod rollout).
- The current example defines modest resources (`cpu: 2`, `memory: 6Gi`); adjust them for your cluster.
- `fdl.yml` uses `--dtype=auto` and `--enforce-eager` for more stable CPU execution.
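
Because the first startup can take several minutes, a client may want to wait for the endpoint before sending prompts. The following is a minimal polling sketch. `wait_until_ready` and `models_endpoint_ready` are hypothetical helper names. The `/v1/models` route under the service prefix is an assumption based on vLLM's OpenAI-compatible server, not something this example documents.

```python
import time
import urllib.request


def wait_until_ready(check, attempts=30, delay=10.0, sleep=time.sleep):
    """Call check() until it returns True; return False if attempts run out."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Only pause when another attempt will follow.
            sleep(delay)
    return False


def models_endpoint_ready(cluster, service, token, timeout=10):
    """Return True if the (assumed) model-listing route answers with HTTP 200."""
    url = f"https://{cluster}/system/services/{service}/models/v1/models"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers URLError, HTTPError, timeouts, and refused connections.
        return False
```

A caller could wrap the second helper in a `lambda` and pass it as `check`. The default values (30 attempts, 10 seconds apart) are arbitrary and should be tuned to how long the rollout takes on your cluster.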

## Additional Resources

- [vLLM Documentation](https://docs.vllm.ai/en/latest/)
- [OSCAR Documentation](https://docs.oscar.grycap.net/)
- [KServe](https://kserve.github.io/website/)
- [API](https://docs.oscar.grycap.net/latest/api/)
- [OSCAR CLI](https://github.com/grycap/oscar-cli)
21 changes: 21 additions & 0 deletions crates/qwen2-5-05b-instruct/docker/Dockerfile.model
@@ -0,0 +1,21 @@
FROM --platform=$BUILDPLATFORM alpine:3.20 AS builder
ARG TARGETPLATFORM
ARG BUILDPLATFORM
RUN apk add --no-cache ca-certificates git git-lfs
RUN git lfs install --system
RUN git clone --depth 1 https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct /models
RUN cd /models && git lfs pull && rm -rf .git

FROM --platform=$BUILDPLATFORM busybox
ARG TARGETPLATFORM
ARG BUILDPLATFORM
# Create a non-root user and group, and set permissions for the /models directory.
# Necessary to avoid permission issues when KServe tries to access the model files.
# The default KServe modelcar uid is 1010.
RUN addgroup -g 1010 storage \
&& adduser -D -u 1010 -G storage storage \
&& mkdir -p /models \
&& chown -R 1010:1010 /models \
&& chmod 755 /models
COPY --from=builder --chown=1010:1010 /models/ /models/
USER 1010:1010
5 changes: 5 additions & 0 deletions crates/qwen2-5-05b-instruct/docker/Dockerfile.vllm
@@ -0,0 +1,5 @@
FROM --platform=$BUILDPLATFORM vllm/vllm-openai-cpu:v0.22.1
ARG BUILDPLATFORM
ARG TARGETPLATFORM
USER root
RUN useradd -u 1010 -m storage
18 changes: 18 additions & 0 deletions crates/qwen2-5-05b-instruct/fdl.yml
@@ -0,0 +1,18 @@
functions:
oscar:
- oscar-kserve-cluster:
name: qwen2-5-05b-instruct
image: ubuntu
kserve:
type: llm_inference
llm_inference:
runtime_image: ghcr.io/grycap/kserve-vllm-openai-cpu:v0.22.1
storage_uri: "oci://ghcr.io/grycap/kserve-qwen2-5-05b-instruct:latest"
cpu: '2.0'
memory: 6Gi
args:
- --dtype=auto
- --enforce-eager
env:
VLLM_CPU_KVCACHE_SPACE: "1"
log_level: CRITICAL
Binary file added crates/qwen2-5-05b-instruct/icon.png
113 changes: 113 additions & 0 deletions crates/qwen2-5-05b-instruct/ro-crate-metadata.json
@@ -0,0 +1,113 @@
{
"@context": [
"https://w3id.org/ro/crate/1.2/context"
],
"@graph": [
{
"@type": "CreativeWork",
"@id": "ro-crate-metadata.json",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.2"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"@type": [
"Dataset",
"Service",
"SoftwareApplication"
],
"datePublished": "2025-11-26",
"url": "https://github.com/grycap/oscar-hub/tree/main/crates/qwen2-5-05b-instruct",
"name": "OSCAR vLLM Qwen2-5-05b-Instruct",
"description": "OSCAR service that deploys a vLLM-based Qwen model for efficient large language model inference.",
"license": {
"@id": "https://www.apache.org/licenses/LICENSE-2.0"
},
"version": "0.1.0",
"applicationCategory": "OSCAR-KServe Service",
"memoryRequirements": "6 GiB",
"processorRequirements": [
"2 vCPU"
],
"serviceType": "exposed",
"isBasedOn": [
{
"@id": "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct"
}
],
"author": {
"@id": "#author"
},
"hasPart": [
{
"@id": "fdl.yml"
},
{
"@id": "script.sh"
},
{
"@id": "icon.png"
}
]
},
{
"@id": "fdl.yml",
"@type": [
"File",
"SoftwareSourceCode"
],
"name": "Service Definition (FDL)",
"encodingFormat": "text/yaml"
},
{
"@id": "script.sh",
"@type": [
"File",
"SoftwareSourceCode"
],
"name": "Service Execution Script",
"encodingFormat": "text/x-shellscript"
},
{
"@id": "icon.png",
"@type": [
"File",
"ImageObject"
],
"name": "Service Icon",
"encodingFormat": "image/png"
},
{
"@id": "#author",
"@type": "Person",
"affiliation": {
"@id": "UPV"
},
"name": "Robert Kazaryan"
},
{
"@id": "UPV",
"@type": "Organization",
"name": "Universitat Politècnica de València",
"url": "https://www.upv.es"
},
{
"@id": "https://www.apache.org/licenses/LICENSE-2.0",
"@type": "CreativeWork",
"name": "Apache License 2.0",
"identifier": "SPDX:Apache-2.0"
},
{
"@id": "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct",
"@type": "SoftwareApplication",
"name": "Qwen2.5-0.5B-Instruct",
"description": "Qwen 2.5 0.5B parameter model fine-tuned for instruction following.",
"url": "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct",
"version": "latest"
}
]
}
3 changes: 3 additions & 0 deletions crates/qwen2-5-05b-instruct/script.sh
@@ -0,0 +1,3 @@
#!/bin/bash

echo "Running Qwen2-5-05b-instruct script..."
42 changes: 42 additions & 0 deletions crates/rabbitmq-broker/AMQP Client/queue-publisher-amqp.py
@@ -0,0 +1,42 @@
import pika
import time

# RabbitMQ credentials: the service name is the username, the service token is the password
SERVICE_NAME = 'service-name'
SERVICE_TOKEN = 'token-service'
# Routing key (topic) the messages are published to
TOPIC = 'oscar.service-name'

delay = 5  # Delay in seconds between messages

credentials = pika.PlainCredentials(SERVICE_NAME, SERVICE_TOKEN)

# Local alternative:
# connection = pika.BlockingConnection(pika.ConnectionParameters('localhost', credentials=credentials))

# Replace the host and port below with the ones mapped for your cluster.
# If you're in the same environment as RabbitMQ: 'localhost'
# If it's remote: the domain without 'https://' or any path
host_cluster = 'cluster.im.grycap.net'
amqp_port = 30300  # Make sure this is the AMQP NodePort, not the HTTPS one

connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host=host_cluster,
        port=amqp_port,
        credentials=credentials
    )
)

channel = connection.channel()
number_message = 8
# Publish number_message messages in a row to test the accumulator
for i in range(1, number_message + 1):
    message = f"Message - {i}"
    channel.basic_publish(
        exchange='amq.topic',
        routing_key=TOPIC,  # topic
        body=message
    )
    print(f" [!] Sent: {message}")
    time.sleep(delay)

connection.close()
62 changes: 62 additions & 0 deletions crates/rabbitmq-broker/AMQP Client/queue-publisher-http.py
@@ -0,0 +1,62 @@
import requests
import json
import time

def send_burst_of_messages(number):
    # --- Configuration ---
    USER = "service-name"
    PASS = "service-token"
    URL = "http://cluster.im.grycap.net:30100/api/exchanges/%2f/amq.topic/publish"

    print(f"🚀 Starting to send {number} messages...")

    for i in range(1, number + 1):
        # Variable message body
        message = {
            "id": i,
            "timestamp": time.time(),
            "content": f"Burst message number {i}",
            "source": "python-script"
        }

        payload_api = {
            "properties": {
                "content_type": "application/json",
                "delivery_mode": 2
            },
            "routing_key": f"oscar.{USER}",
            "payload": json.dumps(message),
            "payload_encoding": "string"
        }

        try:
            response = requests.post(
                URL,
                auth=(USER, PASS),
                data=json.dumps(payload_api),
                allow_redirects=False,
                headers={"Content-Type": "application/json"}
            )

            if response.status_code == 200:
                result = response.json()
                if result.get("routed"):
                    print(f"✅ [{i}/{number}] Message successfully routed.")
                else:
                    print(f"⚠️ [{i}/{number}] Sent but NOT routed. Check routing_key.")
            else:
                print(f"❌ Error in message {i}: {response.status_code} - {response.text}")

        except Exception as e:
            print(f"❌ Connection error in message {i}: {e}")
            break

        # Optional: a short pause of 3s between messages for improved flow
        time.sleep(3)

    print("🏁 Process completed.")

if __name__ == "__main__":
    # Change this value as needed
    AMOUNT_TO_SEND = 10
    send_burst_of_messages(AMOUNT_TO_SEND)