Release Notes: Intel® AI for Enterprise Inference – Version 1.0.0
Overview
Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Built around Kubernetes orchestration, it automates LLM deployment, compute provisioning, and hardware configuration for fast, scalable, and secure inference, both on-premises and in cloud-native environments. It exposes OpenAI-compatible APIs, making it easy to integrate with enterprise applications.
System Requirements
| Category | Details |
|---|---|
| Operating System | Ubuntu 22.04 |
| Hardware Platforms | 3rd, 4th, 5th, 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators |
| Gaudi Firmware | 1.20.0 |
| Network | Internet access for deployment; open ports for Kubernetes and the container registry |
| Storage | Sized to model weights plus observability data (at least 30 GB recommended for monitoring data) |
| Other | SSH key pair, SSL/TLS certificates, Hugging Face token (see the validation sketch below) |
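Since a Hugging Face token is a prerequisite, it can be worth validating before running any deployment steps. The sketch below uses the `huggingface_hub` Python package; the token and model ID are hypothetical placeholders, not a statement about which models this release ships with.

```python
from huggingface_hub import HfApi

# Hypothetical values; substitute your own token and a model ID you plan to deploy.
TOKEN = "hf_..."
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model ID

api = HfApi(token=TOKEN)

# Confirms the token is valid and shows which account it belongs to.
print("Authenticated as:", api.whoami()["name"])

# Confirms the token can see the model (gated models require accepted terms).
info = api.model_info(MODEL_ID)
print("Model reachable:", info.id)
```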
Deployment Modes
- Single Node: Quick start for testing or lightweight workloads.
- Single Master, Multiple Workers: For higher throughput workloads.
- Multi-Master, Multiple Workers: Enterprise-ready HA cluster.
Key Features
- Kubernetes Orchestration: Automates deployment, scaling, and management of AI inference clusters.
- Model Management: Automated deployment and lifecycle management of LLMs; supports pre-validated models from the Hugging Face Hub. Specified models are downloaded automatically from Hugging Face.
- Observability: Native Kubernetes monitoring (metrics, visualization, alerting) for apps and cluster health.
- Security & Access Control: Keycloak for authentication/authorization; APISIX and NGINX Ingress for secure API and traffic management.
- Hardware Optimization: Supports and manages Intel® Xeon® and Gaudi® devices via dedicated operators.
- OpenAI API Compatibility: Seamless integration with enterprise applications using standard APIs (see the sketch after this list).
- Flexible Configuration: Easily adapt cluster and inference settings via configuration files.
- Automation Scripts: End-to-end scripts for cluster setup, deployment, and model onboarding.
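Because the service exposes OpenAI-compatible APIs, the official `openai` Python client can talk to a deployment by pointing `base_url` at the cluster. A minimal sketch, assuming a hypothetical endpoint URL, API key, and model name:

```python
from openai import OpenAI

# Hypothetical endpoint and credentials; substitute the values from your deployment.
client = OpenAI(
    base_url="https://your-cluster.example.com/v1",  # assumed ingress URL
    api_key="your-api-key",  # token issued via the Keycloak/APISIX layer in this stack
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder: any deployed model
    messages=[{"role": "user", "content": "Summarize the benefits of on-prem inference."}],
)
print(response.choices[0].message.content)
```

The same pattern applies to any tooling built on the OpenAI client, which is what makes existing enterprise integrations drop-in.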
Getting Started
Refer to the Quick Start Guide and Cluster Setup documentation to get started.
Post-Deployment
- Access deployed models via API endpoints (OpenAI compatible); a smoke-test sketch follows this list.
- Use built-in observability dashboards for monitoring and troubleshooting.
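For a quick post-deployment smoke test, the standard OpenAI `GET /v1/models` route can confirm the endpoint is reachable and list the deployed models. A minimal sketch, assuming a hypothetical base URL and bearer token:

```python
import requests

# Hypothetical values; substitute your deployment's ingress URL and token.
BASE_URL = "https://your-cluster.example.com/v1"
TOKEN = "your-api-key"

# /v1/models is part of the OpenAI API standard the service exposes,
# so it doubles as a lightweight health check.
resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```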
Supported Models
- View the Supported Model List.
- Deploy custom LLMs directly from Hugging Face.
License
- Licensed under the Apache License 2.0.
Thank you for using Intel® AI for Enterprise Inference!