- Detailed description
- Requirements
- Deploy
- Deploying the Quickstart Baseline (Step 1)
- Next Steps: Deploying and Securing (Steps 2 & 3)
- Delete
- References
- Technical details
- Tags
This QuickStart shows how to protect AI inference endpoints on Red Hat OpenShift AI using F5 Distributed Cloud (XC) Web App & API Protection (WAAP) + API Security. You’ll deploy a KServe/vLLM model service in OpenShift AI, front it with an F5 XC HTTP Load Balancer, and enforce API discovery, OpenAPI schema validation, rate limiting, bot defense, and sensitive-data controls—without changing your ML workflow. OpenShift AI’s single-model serving is KServe-based (recommended for LLMs), and KServe’s HuggingFace/vLLM runtime exposes OpenAI-compatible endpoints, which we’ll secure via F5 XC.
Key Components
- Red Hat OpenShift AI – Unified MLOps platform for developing and serving AI models at scale
- F5 Distributed Cloud API Security – Provides LLM-aware threat detection, schema validation, and sensitive data redaction
- Chat Assistant – AI-powered chat interface
- Direct Mode RAG – Retrieval-Augmented Generation without agent complexity
- Integration Blueprint – Demonstrates secure model inference across hybrid environments
- OpenShift cluster with RHOAI installed
- Helm CLI installed
- `oc` CLI logged into OpenShift
- Clone the repository:
```bash
git clone https://github.com/rh-ai-quickstart/F5-API-Security.git
cd F5-API-Security/deploy/helm
```
- Deploy the application:
```bash
make install NAMESPACE=<namespace>
```
- Access and configure:
```bash
# Get the route URL
oc get route -n <namespace>

# Open the application URL in your browser
# Configure LLM settings via the web UI:
# • XC URL: Set your chat completions endpoint
# • Model ID: Specify the model to use
# • API Key: Add authentication if required
```
Documents can be uploaded directly through the UI:
- PDF Documents: Upload security policies, manuals, and reports
- Text Files: Plain text documents
Navigate to Settings → Vector Databases to create vector databases and upload documents.
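If you prefer the CLI to copying the URL from the console, the route hostname can be pulled out with a jsonpath query. The route name below is a placeholder; the actual name depends on the chart, so list the routes first.
```bash
# List routes, then extract the hostname of the application route (name is a placeholder)
oc get route -n <namespace>
APP_URL="https://$(oc get route <route-name> -n <namespace> -o jsonpath='{.spec.host}')"
echo "$APP_URL"   # adjust the scheme if the route is not TLS-terminated
```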
| Layer/Component | Technology | Purpose/Description |
|---|---|---|
| Orchestration | OpenShift AI | Container orchestration and GPU acceleration |
| Framework | LLaMA Stack | Standardizes core building blocks and simplifies AI application development |
| UI Layer | Streamlit | User-friendly chatbot interface for chat-based interaction |
| LLM | Llama-3.2-3B-Instruct | Generates contextual responses based on retrieved documents |
| Embedding | all-MiniLM-L6-v2 | Converts text to vector embeddings |
| Vector DB | PostgreSQL + PGVector | Stores embeddings and enables semantic search |
| Retrieval | Vector Search | Retrieves relevant documents based on query similarity |
| Storage | S3 Bucket | Document source for enterprise content |
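To make the retrieval and vector DB rows concrete, the following is an illustrative sketch of a pgvector similarity query. The pod, database, table, and column names are hypothetical, and a real query would use a 384-dimension embedding produced by all-MiniLM-L6-v2 rather than the tiny placeholder vector shown here.
```bash
# Hypothetical example: nearest-neighbour search inside the pgvector pod.
# '<->' is pgvector's distance operator; rows closest to the query embedding come first.
oc exec -n <namespace> deploy/pgvector -- psql -U postgres -d rag -c \
  "SELECT document_id, content
     FROM embeddings
    ORDER BY embedding <-> '[0.1, 0.2, 0.3]'::vector
    LIMIT 5;"
```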
- OpenShift Client CLI - oc
- OpenShift Cluster 4.18+
- OpenShift AI
- Helm CLI - helm
- Regular user permission for default deployment
- Cluster admin required for advanced configurations
The instructions below will deploy this quickstart to your OpenShift environment.
- huggingface-cli (optional)
- Hugging Face Token
- Access to Meta Llama model
- Access to Meta Llama Guard model
- Some of the example scripts use `jq`, a JSON parsing utility, which you can acquire via `brew install jq`
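The verification commands later in this guide pipe curl responses through jq; a quick sanity check that it is installed and on your PATH:
```bash
echo '{"status": "ok"}' | jq -r '.status'   # should print: ok
```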
| Function | Model Name | Hardware | AWS Instance Type |
|---|---|---|---|
| Embedding | all-MiniLM-L6-v2 | CPU/GPU/HPU | |
| Generation | meta-llama/Llama-3.2-3B-Instruct | L4/HPU | g6.2xlarge |
| Generation | meta-llama/Llama-3.1-8B-Instruct | L4/HPU | g6.2xlarge |
| Generation | meta-llama/Meta-Llama-3-70B-Instruct | A100 x2/HPU | p4d.24xlarge |
| Safety | meta-llama/Llama-Guard-3-8B | L4/HPU | g6.2xlarge |
Note: the 70B model is NOT required for initial testing of this example. The safety/shield model Llama-Guard-3-8B is also optional.
The instructions below will deploy the core AI stack (pgvector, llm-service, llama-stack) to your OpenShift environment.
Log in to your OpenShift cluster using your token and API endpoint:
```bash
oc login --token=<your_sha256_token> --server=<cluster-api-endpoint>
```
Example: The observed deployment logged into `https://api.gpu-ai.bd.f5.com:6443` using a specific token and used project `z-ji` initially.
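To confirm the login took effect before proceeding, standard oc commands show the current identity, API server, and available projects:
```bash
oc whoami                # current user
oc whoami --show-server  # API endpoint the session is bound to
oc projects              # projects you can access
```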
Clone the F5-API-Security repository:
```bash
git clone https://github.com/rh-ai-quickstart/F5-API-Security
```
The repository was cloned into the local directory.
Change into the cloned repository and then into the deploy/helm folder:
```bash
cd F5-API-Security
cd deploy/helm
```
The deployment process navigated to `~/F5-API-Security/deploy/helm`.
First, configure your deployment values:
```bash
# Copy the example configuration file
cp rag-values.yaml.example rag-values.yaml

# Edit the configuration file to set your values
vim rag-values.yaml  # or use your preferred editor
```
Then deploy using the Makefile:
```bash
make install NAMESPACE=f5-ai-security
```
During installation, the make command (see the sketch after this list for a rough Helm equivalent):
- Checks required dependencies (`helm`, `oc`).
- Creates the namespace if it doesn't exist.
- Updates Helm dependencies.
- Downloads required charts (`pgvector`, `llama-stack`).
- Installs the Helm chart with your custom values from `rag-values.yaml`.
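For reference, the install target roughly wraps standard Helm commands along these lines. This is an approximation of what the Makefile does, not a copy of it:
```bash
# Approximate equivalent of `make install NAMESPACE=f5-ai-security`, run from deploy/helm
helm dependency update
helm upgrade --install rag . \
  --namespace f5-ai-security --create-namespace \
  -f rag-values.yaml
```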
A successful deployment will show:
```
[SUCCESS] All dependencies are installed.
[INFO] Creating namespace f5-ai-security...
[SUCCESS] Namespace f5-ai-security is ready
[INFO] Installing rag helm chart with rag-values.yaml...
NAME: rag
LAST DEPLOYED: Thu Dec 11 10:38:36 2025
NAMESPACE: f5-ai-security
STATUS: deployed
REVISION: 1
[SUCCESS] rag installed successfully
```
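Before running the endpoint checks below, it can help to confirm that the release and its pods are healthy (standard Helm and oc commands, using the example namespace):
```bash
helm list -n f5-ai-security        # the "rag" release should show STATUS: deployed
oc get pods -n f5-ai-security -w   # wait for the llama-stack, llm-service, and pgvector pods to be Running/Ready
```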
Once deployed, you can verify that the model endpoints are running correctly using curl.
```bash
curl -sS http://llamastack-f5-ai-security.apps.gpu-ai.bd.f5.com/v1/models
```
Expected output: Two models available — a large language model and an embedding model.
```bash
curl -sS http://llamastack-f5-ai-security.apps.gpu-ai.bd.f5.com/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "remote-llm/RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8",
  "messages": [{"role": "user", "content": "Say hello in one sentence."}],
  "max_tokens": 64,
  "temperature": 0
}' | jq
```
Example output:
```
"Hello, how can I assist you today?"
```
```bash
curl -sS http://your-xc-endpoint.com/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "your-model-id",
  "messages": [{"role": "user", "content": "Say hello in one sentence."}],
  "max_tokens": 64,
  "temperature": 0
}' | jq
```
This test against the dedicated vLLM endpoint also returned a successful response.
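As a rough illustration of the kind of traffic the F5 XC rate-limiting and API-discovery policies act on, a short burst of requests can be sent through the load balancer. The endpoint, model ID, and behaviour here are placeholders: whether 429s appear depends on the rate limit you configure in XC.
```bash
# Send 20 requests through the XC HTTP Load Balancer and print only the HTTP status codes.
# 200 means the request was served; 429 would indicate a rate limit being enforced.
for i in $(seq 1 20); do
  curl -sS -o /dev/null -w "%{http_code}\n" \
    http://your-xc-endpoint.com/v1/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "your-model-id", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
done
```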
The deployment successfully sets up the F5-API-Security QuickStart environment on OpenShift, installs the Helm chart, and exposes model endpoints that can be verified using standard API calls.
With the core AI baseline deployed, proceed to the detailed guides for configuring the F5 Distributed Cloud components and running security use cases:
Configure the F5 Distributed Cloud components and integrate the LLM endpoint.
➡️ Deployment and Configuration of F5 Distributed Cloud
Run security testing to demonstrate how F5 API Security protects the deployed model inference services.
➡️ Security Use Cases and Testing
To completely remove the F5-API-Security application from your OpenShift cluster:
```bash
cd F5-API-Security/deploy/helm
make uninstall NAMESPACE=f5-ai-security
```
This will (see the verification commands after this list):
- Uninstall the Helm release
- Delete all pods, services, and routes
- Remove the pgvector PVC (persistent volume claim)
- Clean up all resources in the namespace
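A quick way to confirm the namespace has been emptied (standard oc commands, using the example namespace):
```bash
oc get all -n f5-ai-security   # should return no workloads, services, or routes
oc get pvc -n f5-ai-security   # the pgvector PVC should be gone
```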
If you also want to delete the namespace itself:
```bash
oc delete project f5-ai-security
```
The Makefile provides the following targets:
```bash
make help            # Show all available commands
make install         # Deploy the application
make uninstall       # Remove the application
make clean           # Clean up all resources including namespace
make logs            # Show logs for all pods
make monitor         # Monitor deployment status
make validate-config # Validate configuration values
```
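In practice the targets are typically run with the namespace set, mirroring the install step above. Whether every target requires NAMESPACE is not confirmed here, so treat this as an assumed usage pattern:
```bash
make monitor NAMESPACE=f5-ai-security   # assumed pattern: monitor status in the target namespace
make logs NAMESPACE=f5-ai-security      # assumed pattern: show pod logs in the target namespace
```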