Overview
Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Built around Kubernetes orchestration, it automates LLM deployment, compute provisioning, and hardware configuration for fast, scalable, and secure inference, both on-premises and in cloud-native environments. Its OpenAI-compatible APIs make it easy to integrate with enterprise applications.
System Requirements
| Category | Details |
|---|---|
| Operating System | Ubuntu 22.04 |
| Hardware Platforms | 3rd, 4th, 5th, 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators |
| Gaudi Firmware | 1.22.0 |
| Network | Internet access required for deployment; open ports for Kubernetes and the container registry |
| Storage | Sized for the models being served plus observability data (at least 30 GB recommended for monitoring data) |
| Other | SSH key pair, SSL/TLS certificates, Hugging Face token (a token check is sketched below) |
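The Hugging Face token can be verified before deployment. A minimal sketch using the huggingface_hub Python package (not part of this project's tooling); the token value is a placeholder:

```python
# Minimal check that a Hugging Face token is valid before deployment.
# Requires: pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi(token="hf_xxx")   # placeholder -- substitute your own token
print(api.whoami()["name"])   # raises an error if the token is invalid
```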
Deployment Modes
- Single Node: Quick start for testing or lightweight workloads.
- Single Master, Multiple Workers: For higher throughput workloads.
- Multi-Master, Multiple Workers: Enterprise-ready high-availability (HA) cluster. A topology check for any of these modes is sketched below.
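After cluster setup in any of these modes, the node topology can be confirmed with the official Kubernetes Python client. A minimal sketch, assuming a working kubeconfig on the machine running it:

```python
# List each node and its roles to confirm the expected master/worker topology.
# Requires: pip install kubernetes
from kubernetes import client, config

config.load_kube_config()  # uses the current kubeconfig context
for node in client.CoreV1Api().list_node().items:
    roles = [
        label.split("/", 1)[1]
        for label in node.metadata.labels
        if label.startswith("node-role.kubernetes.io/")
    ] or ["worker"]  # worker nodes often carry no role label
    print(f"{node.metadata.name}: {', '.join(roles)}")
```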
Key Features
- Brownfield deployment support for Amazon EKS and Red Hat OpenShift.
- Seamless deployment of the Keycloak–APISIX integration in brownfield environments.
- Brownfield deployment compatibility for the GenAI Gateway across managed Kubernetes platforms.
- Note: Istio-based deployments are not supported on Red Hat OpenShift in this release.
- Support for OpenVINO Model Server (OVMS) on Intel® AI for Enterprise Inference (a model-listing sketch follows this list).
  - Documentation: docs/ovms-models
  - For OpenVINO models on Hugging Face and the full set of models supported by OVMS, refer to the OpenVINO documentation.
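Since models are exposed through OpenAI-compatible endpoints, the models actually being served (including OVMS-hosted ones) can be listed with a plain HTTP call. A minimal sketch; the URL and API key are placeholders for your deployment's values:

```python
# Query the OpenAI-compatible /v1/models route to see what is being served.
import requests

resp = requests.get(
    "https://your-cluster.example.com/v1/models",   # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=10,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```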
Getting Started
See the Quick Start Guide and Cluster Setup documentation to get started.
Post-Deployment
- Access deployed models via OpenAI-compatible API endpoints (see the example below).
- Use built-in observability dashboards for monitoring and troubleshooting.
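Because the endpoints follow the OpenAI API standard, any OpenAI client can talk to them. A minimal sketch using the official openai Python package; the base URL, API key, and model name are placeholders for your deployment's values:

```python
# Send a chat completion request to a model deployed on the cluster.
# Requires: pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://your-cluster.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                          # e.g. a token issued by Keycloak
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any model deployed on the cluster
    messages=[{"role": "user", "content": "Summarize Kubernetes in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```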
Supported Models
- View the Supported Model List.
- Deploy custom LLMs directly from Hugging Face (a pre-fetch sketch follows this list).
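For custom models, the weights can be pre-fetched from Hugging Face so the cluster serves them from local storage. A minimal sketch with huggingface_hub; the model ID is only an example, and gated models require a valid token:

```python
# Download a model snapshot from Hugging Face ahead of deployment.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",  # example model ID
    token="hf_xxx",                                # needed only for gated models
)
print(f"Model files downloaded to {local_path}")
```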
License
- Licensed under the Apache License 2.0.
Thank you for using Intel® AI for Enterprise Inference!