
Release Notes: Intel® AI for Enterprise Inference – Version 1.3.0


Released by @AhmedSeemalK on 12 Nov 15:44 (commit 9ceb561)

Overview

Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Built around Kubernetes orchestration, it automates LLM deployment, compute provisioning, and hardware configuration for fast, scalable, and secure inference, both on-premises and in cloud-native environments. It is compatible with OpenAI-standard APIs, making it easy to integrate with enterprise applications.
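
As a minimal sketch of that API compatibility (the base URL, API key, and chosen model are placeholders for your own deployment's values, not defaults shipped with the product):

```python
# Sketch: querying an Enterprise Inference endpoint with the standard
# OpenAI Python client. base_url, api_key, and the model ID below are
# placeholders -- substitute the values from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-gateway.example.com/v1",  # hypothetical endpoint
    api_key="your-api-key",                          # hypothetical key/token
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # any model ID your cluster serves
    messages=[{"role": "user", "content": "Summarize Kubernetes in one sentence."}],
)
print(response.choices[0].message.content)
```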


System Requirements

  • Operating System: Ubuntu 22.04
  • Hardware Platforms: 3rd, 4th, 5th, 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators
  • Gaudi Firmware: 1.22.0
  • Network: Internet access required for deployment; open ports for Kubernetes and the container registry.
  • Storage: Allocate storage based on model size and observability tools (at least 30 GB recommended for monitoring data).
  • Other: SSH key pair, SSL/TLS certificates, Hugging Face token (checked in the sketch below).
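
A small pre-flight sketch for the Hugging Face token prerequisite (assumes the `huggingface_hub` package is installed; the token value is a placeholder):

```python
# Sketch: verify the Hugging Face token required for model downloads
# before starting a deployment. Assumes `pip install huggingface_hub`;
# the token string is a placeholder.
from huggingface_hub import HfApi

api = HfApi(token="hf_your_token_here")  # placeholder token
user = api.whoami()  # raises an error if the token is invalid
print(f"Token OK, authenticated as: {user['name']}")
```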

Deployment Modes

  • Single Node: Quick start for testing or lightweight workloads.
  • Single Master, Multiple Workers: For higher throughput workloads.
  • Multi-Master, Multiple Workers: Enterprise-ready high-availability (HA) cluster.

Key Features

  • Intel Gaudi Base Operator Stack Update

    • Intel Gaudi base operator updated to 1.22.0
    • vLLM images for Xeon and Gaudi are updated to support the updated stack
  • Updated Pre-validated Model list

    • meta-llama/Llama-4-Scout-17B-16E-Instruct - Gaudi
    • Qwen/Qwen2.5-32B-Instruct - Gaudi
    • meta-llama/Llama-3.2-3B-Instruct - Xeon
    • Qwen/Qwen3-1.7B - Xeon
    • Qwen/Qwen3-4B-Instruct-2507 - Xeon
    • Enabled support for multi-modal inference with Llama-4-Scout-17B-16E-Instruct (see the multi-modal sketch after this feature list).
  • Brownfield Deployment on Vanilla Kubernetes - Alpha

    • Ideal for clusters already running Kubernetes; no new cluster setup required
    • Preserves existing workloads and configurations while adding AI capabilities
  • Updated Tool Calling for Pre-validated Models

    • Updated tool-call parsers for the models (see the tool-calling sketch after this feature list)
    • Updated tool-call template files for the models
  • Code Refactoring

    • Code is modularized, and components are version-controlled with a metadata configuration
  • GenAI Gateway Trace DNS Update

    • GenAI Gateway Trace is updated to use trace-domain.com instead of trace.domain.com
    • Updated the documentation to use SAN-based certificates, which simplifies maintaining SSL certificates across components
  • IBM DA Support for v1.3.0 Enabled

    • Added support for deploying Intel AI for Enterprise Inference through IBM Deployable Architecture
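
As a hedged illustration of the multi-modal capability noted above (the endpoint, API key, and image URL are placeholders; the request shape follows the standard OpenAI chat-completions convention for image inputs):

```python
# Sketch: multi-modal (image + text) inference against an OpenAI-compatible
# endpoint, as enabled for Llama-4-Scout-17B-16E-Instruct. base_url,
# api_key, and the image URL are placeholders for your deployment's values.
from openai import OpenAI

client = OpenAI(base_url="https://your-gateway.example.com/v1", api_key="your-api-key")

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```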

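And a minimal tool-calling sketch in the same OpenAI-compatible style (the `get_weather` function schema is purely illustrative, not a shipped default; the endpoint details are again placeholders):

```python
# Sketch: tool calling against an OpenAI-compatible endpoint. The tool
# schema below is illustrative; the updated parsers and templates for the
# pre-validated models govern how the model emits the structured call.
from openai import OpenAI

client = OpenAI(base_url="https://your-gateway.example.com/v1", api_key="your-api-key")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-32B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Portland?"}],
    tools=tools,
)
# If the model chose to call the tool, the parsed call appears here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```
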
Bug Fixes

  • Fixed the IOMMU check for Ubuntu 22.04 with kernel version 6.8
  • Fixed rerank model registration in the GenAI Gateway

Getting Started

See the Quick Start Guide and Cluster Setup documentation to get started.


Post-Deployment

  • Access deployed models via OpenAI-compatible API endpoints (see the verification sketch below).
  • Use built-in observability dashboards for monitoring and troubleshooting.
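
As a quick post-deployment check (a sketch, assuming the same placeholder endpoint as above), you can list the models the endpoint serves over the OpenAI-compatible API:

```python
# Sketch: verify a deployment by listing the models the endpoint serves.
# base_url and api_key are placeholders for your deployment's values.
from openai import OpenAI

client = OpenAI(base_url="https://your-gateway.example.com/v1", api_key="your-api-key")

for model in client.models.list():
    print(model.id)
```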

Supported Models

See the updated pre-validated model list under Key Features above.

License

  • Licensed under the Apache License 2.0.

Thank you for using Intel® AI for Enterprise Inference!