
Releases: opea-project/Enterprise-Inference

Release Notes: Intel® AI for Enterprise Inference – Version 1.4.0 (Latest)

12 Dec 10:02
856deff


Overview

Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Focused on Kubernetes orchestration, it automates deploying LLM models, provisioning compute, and configuring hardware for fast, scalable, and secure inference—both on-premises and in cloud-native settings. It provides compatibility with OpenAI standard APIs, making it easy to integrate with enterprise applications.


System Requirements

  • Operating System: Ubuntu 22.04
  • Hardware Platforms: 3rd, 4th, 5th, 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators
  • Gaudi Firmware: 1.22.0
  • Network: Internet access required for deployment; open ports for Kubernetes and container registry.
  • Storage: Allocate storage based on model size and observability tools (recommend at least 30GB for monitoring data).
  • Other: SSH key pair, SSL/TLS certificates, Hugging Face token.

Deployment Modes

  • Single Node: Quick start for testing or lightweight workloads.
  • Single Master, Multiple Workers: For higher throughput workloads.
  • Multi-Master, Multiple Workers: Enterprise-ready HA cluster.

Key Features

  • Brownfield Deployment Support for Managed Kubernetes Platforms
    • Added brownfield deployment support for Amazon EKS and Red Hat OpenShift.
    • Enabled seamless deployment of Keycloak–APISIX integration in brownfield environments.
    • Added brownfield deployment compatibility for GenAI Gateway across managed Kubernetes platforms.
    • Istio-based deployments are not supported on Red Hat OpenShift in this release.
  • Support for OpenVINO Model Server on Intel AI for Enterprise Inference

Getting Started

For the getting started guide, see the Quick Start Guide and Cluster Setup documentation.


Post-Deployment

  • Access deployed models via API endpoints (OpenAI compatible); a minimal request sketch follows this list.
  • Use built-in observability dashboards for monitoring and troubleshooting.
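
The endpoints above can be exercised with any OpenAI-compatible client. Below is a minimal sketch using the openai Python SDK; the base URL, access token, and model name are placeholders for your own deployment, not values shipped with the project.

```python
# Minimal sketch: query a deployed model through the OpenAI-compatible endpoint.
# Base URL, token, and model name are assumptions; use your deployment's values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # your inference endpoint (placeholder)
    api_key="<access-token>",               # token issued for your deployment
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any model deployed on your cluster
    messages=[{"role": "user", "content": "Summarize what Enterprise Inference provides."}],
)
print(response.choices[0].message.content)
```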

Supported Models


License

  • Licensed under the Apache License 2.0.

Thank you for using Intel® AI for Enterprise Inference!

Release Notes: Intel® AI for Enterprise Inference – Version 1.3.0

12 Nov 15:44
9ceb561


Overview

Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Focused on Kubernetes orchestration, it automates deploying LLM models, provisioning compute, and configuring hardware for fast, scalable, and secure inference—both on-premises and in cloud-native settings. It provides compatibility with OpenAI standard APIs, making it easy to integrate with enterprise applications.


System Requirements

  • Operating System: Ubuntu 22.04
  • Hardware Platforms: 3rd, 4th, 5th, 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators
  • Gaudi Firmware: 1.22.0
  • Network: Internet access required for deployment; open ports for Kubernetes and container registry.
  • Storage: Allocate storage based on model size and observability tools (recommend at least 30GB for monitoring data).
  • Other: SSH key pair, SSL/TLS certificates, Hugging Face token.

Deployment Modes

  • Single Node: Quick start for testing or lightweight workloads.
  • Single Master, Multiple Workers: For higher throughput workloads.
  • Multi-Master, Multiple Workers: Enterprise-ready HA cluster.

Key Features

  • Intel Gaudi Base Operator Stack Update

    • Intel Gaudi base operator updated to 1.22.0
    • vLLM images for Xeon and Gaudi are updated to support the updated stack
  • Updated Pre-validated Model list

    • meta-llama/Llama-4-Scout-17B-16E-Instruct - Gaudi
    • Qwen/Qwen2.5-32B-Instruct - Gaudi
    • meta-llama/Llama-3.2-3B-Instruct - Xeon
    • Qwen/Qwen3-1.7B - Xeon
    • Qwen/Qwen3-4B-Instruct-2507 - Xeon
    • Enabled support for multi-modal inference with Llama-4-Scout-17B-16E-Instruct.
  • Brownfield Deployment on Vanilla Kubernetes - Alpha

    • Ideal for clusters already running Kubernetes; no new cluster setup required
    • Preserves existing workloads and configurations while adding AI capabilities
  • Updated Tool Calling for Pre-validated Models (a request sketch follows this feature list)

    • Updated tool call parsers for the models
    • Updated tool call template files for the models
  • Code Refactoring

    • Code is modularized and components are version-controlled with a metadata config
  • GenAI Gateway Trace DNS Update

    • GenAI Gateway Trace is updated to use trace-domain.com instead of trace.domain.com
    • Updated the documentation to use SAN-based certificates, which eases the maintenance of SSL certificates across components
  • IBM DA Support for v1.3.0 Enabled

    • Added support for deploying Intel AI for Enterprise Inference through the IBM Cloud Deployable Architecture
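
As a companion to the tool-calling update above, here is a hedged sketch of a tool-calling request against the OpenAI-compatible endpoint. The endpoint, token, model name, and the get_weather tool are illustrative assumptions, not project defaults.

```python
# Hedged sketch of a tool-calling request via the OpenAI-compatible API.
# Endpoint, token, model, and the get_weather tool are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="<access-token>")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool definition
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # a pre-validated model (assumption)
    messages=[{"role": "user", "content": "What's the weather in Portland?"}],
    tools=tools,
)
# If the server's tool-call parser triggered, the structured call is returned here.
print(response.choices[0].message.tool_calls)
```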

Bug Fixes

  • Fixed the IOMMU check on Ubuntu 22.04 with kernel version 6.8
  • Fixed rerank model registration in the GenAI Gateway

Getting Started

For the getting started guide, see the Quick Start Guide and Cluster Setup documentation.


Post-Deployment

  • Access deployed models via API endpoints (OpenAI compatible).
  • Use built-in observability dashboards for monitoring and troubleshooting.

Supported Models


License

  • Licensed under the Apache License 2.0.

Thank you for using Intel® AI for Enterprise Inference!

Release Notes: Intel® AI for Enterprise Inference – Version 1.2.0

23 Sep 07:58
aa99bb2


Overview

Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Focused on Kubernetes orchestration, it automates deploying LLM models, provisioning compute, and configuring hardware for fast, scalable, and secure inference—both on-premises and in cloud-native settings. It provides compatibility with OpenAI standard APIs, making it easy to integrate with enterprise applications.


System Requirements

  • Operating System: Ubuntu 22.04
  • Hardware Platforms: 3rd, 4th, 5th, 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators
  • Gaudi Firmware: 1.21.0
  • Network: Internet access required for deployment; open ports for Kubernetes and container registry.
  • Storage: Allocate storage based on model size and observability tools (recommend at least 30GB for monitoring data).
  • Other: SSH key pair, SSL/TLS certificates, Hugging Face token.

Cluster Deployment Modes in Enterprise Inference

Enterprise inference workloads can be deployed in different cluster configurations depending on scale, performance needs, and availability requirements.
Below are the supported modes:

Single Node Cluster — Quick start for testing or lightweight workloads:

Best for: Quick testing, Proof-of-Concepts (POCs), and lightweight workloads.
Purpose: Ideal for fast bring-up and low-latency inference in small-scale scenarios.
Setup: Runs entirely on a single Gaudi3 node, which handles both control and data plane functions.
Benefits:

  • Minimal orchestration overhead
  • Fastest deployment time
  • Suitable for single-user scenarios or serving a limited number of users

Single Master, Multiple Workers:

Best for: Medium-scale deployments requiring higher throughput.
Purpose: Separates Kubernetes infrastructure management from model execution to improve performance.
Setup:

  • A master node (e.g., Xeon CPU) runs the Kubernetes control plane and infrastructure pods (for example, the Habana Operator)
  • Multiple Gaudi3 worker nodes are dedicated to running inference workloads (a scheduling sketch follows this section)

Benefits:

  • Maximizes compute utilization
  • Supports batch inference and concurrent model execution
  • Reduces resource contention by isolating infra from model workloads

Multi-Master, Multiple Workers — Enterprise-ready HA cluster:

Best for: Enterprises or production-grade deployments requiring high availability.
Purpose: Ensures fault tolerance and scalability for mission-critical inference workloads.
Setup:

  • Multiple master nodes manage the Kubernetes control plane with automatic failover
  • Gaudi3 worker nodes scale horizontally to handle complex models and high user concurrency

Benefits:

  • High availability and resilience
  • Optimized for load balancing and SLA-driven deployments
  • Supports sustained throughput and enterprise-grade reliability
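
To make the master/worker split above concrete, here is a minimal scheduling sketch using the official kubernetes Python client: an inference Pod is pinned to a Gaudi worker via a node selector and an accelerator resource request, while control-plane components stay on the master. The namespace, image, node label, and the habana.ai/gaudi resource name are assumptions about a typical Gaudi-enabled cluster, not the project's own deployment scripts.

```python
# Minimal scheduling sketch (not the project's automation): pin an inference Pod to a
# Gaudi worker node while the Kubernetes control plane runs on the Xeon master.
from kubernetes import client, config

config.load_kube_config()  # reads your cluster's kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-example", namespace="default"),
    spec=client.V1PodSpec(
        # Hypothetical label; use whatever label identifies your Gaudi workers.
        node_selector={"accelerator": "gaudi3"},
        containers=[
            client.V1Container(
                name="vllm",
                image="vllm-gaudi:example",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # Extended resource typically exposed by the Gaudi device plugin (assumption).
                    requests={"habana.ai/gaudi": "1"},
                    limits={"habana.ai/gaudi": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```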

Key Features

  • Integrated GenAI Gateway

    • Integrated GenAI Gateway with LiteLLM and Langfuse for advanced AI model management and observability.
  • Xeon Optimization

    • Optimized performance for Intel Xeon CPUs.
    • Read the detailed CPU optimization guide.
    • Dynamic memory allocation for efficient resource usage.
    • Automatic topology detection for improved deployment flexibility.
  • Integrated Ceph and Istio

    • Seamless integration with Ceph storage and Istio service mesh for enhanced scalability and resilience.
  • Enhanced Observability

    • Integration with Grafana Loki for advanced log management.
    • AWS S3 and Minio support for log storage and retrieval.
  • Documentation & Workflow Updates

    • Refactored and expanded documentation for better developer experience.
    • Added vault secret management for secure workflows.
    • Various workflow enhancements for stability and usability.
  • IBM Cloud Multi-Node Architecture

    • Added support for multi-node deployment for IBM Cloud deployable architecture.
    • Added GenAI Gateway integration with the IBM Cloud Deployable Architecture

Getting Started

For the getting started guide, see the Quick Start Guide and Cluster Setup documentation.


Post-Deployment

  • Access deployed models via API endpoints (OpenAI compatible).
  • Use built-in observability dashboards for monitoring and troubleshooting.

Supported Models


License

  • Licensed under the Apache License 2.0.

Thank you for using Intel® AI for Enterprise Inference!

Release Notes: Intel® AI for Enterprise Inference on IBM Cloud – Version 1.1.0

05 Aug 08:01
cdaeb99


What's New

Intel® AI for Enterprise Inference with IBM Cloud Deployable Architecture

We're excited to announce the release of Intel AI for Enterprise Inference as an IBM Cloud deployable architecture. This solution automates the deployment of OpenAI-compatible AI inference endpoints powered by Intel® Gaudi® 3 AI accelerators and Intel® Xeon® processors.

System Requirements

  • Operating System: Ubuntu 22.04
  • Hardware: 3rd-6th Gen Intel® Xeon® Scalable processors
  • AI Accelerators: Intel® Gaudi® 2 & 3 AI Accelerators
  • Gaudi Firmware: v1.21.0
  • Storage: Minimum 30GB (varies by model)

Key Features

  • Automated Infrastructure Deployment: Complete Kubernetes cluster, model serving, and authentication setup
  • Intel® Gaudi® 3 AI Accelerator Support: Optimized for high-performance AI workloads
  • OpenAI-Compatible API Endpoints: Seamless integration with existing applications
  • Two Deployment Patterns: Flexible options for different infrastructure requirements

Deployment Patterns

Quickstart Pattern

Standard Pattern

Supported Models

Model Name | Cards | Required Storage | Model ID
meta-llama/Llama-3.1-8B-Instruct | 1 | 20GB | 1
meta-llama/Llama-3.3-70B-Instruct | 4 | 150GB | 12
meta-llama/Llama-3.1-405B-Instruct | 8 | 900GB | 11

Prerequisites

Required for All Deployments

  • IBM Cloud API Key
  • IBM Cloud SSH Key
  • Hugging Face Token

Additional Requirements for Production Deployment

  • Custom domain name
  • TLS certificate

Documentation

Quick Links

Support

For technical support and documentation, please refer to the Enterprise-Inference GitHub repository or consult the comprehensive documentation guides listed above.


Thank you for using Intel® AI for Enterprise Inference!

Release Notes: Intel® AI for Enterprise Inference – Version 1.0.0

30 Jun 13:49
d9a309e


Overview

Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Focused on Kubernetes orchestration, it automates deploying LLM models, provisioning compute, and configuring hardware for fast, scalable, and secure inference—both on-premises and in cloud-native settings. It provides compatibility with OpenAI standard APIs, making it easy to integrate with enterprise applications.


System Requirements

  • Operating System: Ubuntu 22.04
  • Hardware Platforms: 3rd, 4th, 5th, 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators
  • Gaudi Firmware: 1.20.0
  • Network: Internet access required for deployment; open ports for Kubernetes and container registry.
  • Storage: Allocate storage based on model size and observability tools (recommend at least 30GB for monitoring data).
  • Other: SSH key pair, SSL/TLS certificates, Hugging Face token.

Deployment Modes

  • Single Node: Quick start for testing or lightweight workloads.
  • Single Master, Multiple Workers: For higher throughput workloads.
  • Multi-Master, Multiple Workers: Enterprise-ready HA cluster.

Key Features

  • Kubernetes Orchestration: Automates deployment, scaling, and management of AI inference clusters.
  • Model Management: Automated deployment and lifecycle management of LLM models; supports pre-validated models available on Hugging Face Hub. Referenced models are automatically downloaded from Hugging Face.
  • Observability: Native Kubernetes monitoring (metrics, visualization, alerting) for apps and cluster health.
  • Security & Access Control: Keycloak for authentication/authorization; APISIX and NGINX Ingress for secure API and traffic management (an authentication sketch follows this list).
  • Hardware Optimization: Supports and manages Intel® Xeon® and Gaudi® devices via dedicated operators.
  • OpenAI API Compatibility: Seamless integration with enterprise applications using standard APIs.
  • Flexible Configuration: Easily adapt cluster and inference settings via configuration files.
  • Automation Scripts: End-to-end scripts for cluster setup, deployment, and model onboarding.
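
To illustrate the Keycloak/APISIX flow noted above, here is a minimal sketch that obtains a token from a standard Keycloak OIDC token endpoint and uses it against the OpenAI-compatible API. The realm, client credentials, URLs, and model name are assumptions; substitute the values configured for your deployment.

```python
# Hedged sketch: fetch a Keycloak access token (client-credentials grant) and call the
# OpenAI-compatible endpoint behind APISIX. All names and URLs are placeholders.
import requests
from openai import OpenAI

token_url = "https://keycloak.example.com/realms/<realm>/protocol/openid-connect/token"
resp = requests.post(
    token_url,
    data={
        "grant_type": "client_credentials",
        "client_id": "<client-id>",
        "client_secret": "<client-secret>",
    },
    timeout=30,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

client = OpenAI(base_url="https://api.example.com/v1", api_key=access_token)
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```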

Getting Started

For the getting started guide, see the Quick Start Guide and Cluster Setup documentation.


Post-Deployment

  • Access deployed models via API endpoints (OpenAI compatible).
  • Use built-in observability dashboards for monitoring and troubleshooting.

Supported Models


License

  • Licensed under the Apache License 2.0.

Thank you for using Intel® AI for Enterprise Inference!