Triton Tutorials

For users accustomed to the "Tensor in, Tensor out" approach to deep learning inference, getting started with Triton can raise many questions. The goal of this repository is to familiarize users with Triton's features and to provide guides and examples that ease migration. For a feature-by-feature explanation, refer to the Triton Inference Server documentation.

Getting Started Checklist

Overview Video	Conceptual Guide: Deploying Models

Quick Deploy

The focus of these examples is to demonstrate deployment for models trained with various frameworks. These are quick demonstrations made with an understanding that the user is somewhat familiar with Triton.

Deploy a ...

PyTorch Model	TensorFlow Model	ONNX Model	TensorRT Accelerated Model	vLLM Model
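Each of these guides ends with a model placed in a Triton model repository. As a rough orientation, the sketch below creates the directory skeleton Triton expects: one folder per model, a `config.pbtxt` next to a numbered version folder, and the model file inside that version folder. The function name `create_model_skeleton`, the model name `my_onnx_model`, and the two-line placeholder config are assumptions made for illustration; a real `config.pbtxt` usually also declares inputs and outputs, as the individual guides describe.

```python
import tempfile
from pathlib import Path

# Default model file names Triton looks for inside a version directory,
# keyed by backend name. "model.savedmodel" is a directory, not a file.
DEFAULT_MODEL_FILES = {
    "pytorch": "model.pt",
    "onnxruntime": "model.onnx",
    "tensorflow": "model.savedmodel",
    "tensorrt": "model.plan",
}


def create_model_skeleton(repo_root, model_name, backend, version=1):
    """Create <repo_root>/<model_name>/<version>/ plus a placeholder config.

    Returns the path where the exported model file should be copied.
    Hypothetical helper for illustration, not part of Triton.
    """
    model_file = DEFAULT_MODEL_FILES[backend]  # KeyError for unknown backends
    model_dir = Path(repo_root) / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder only: a working config typically also lists inputs/outputs.
    config_text = f'name: "{model_name}"\nbackend: "{backend}"\n'
    (model_dir / "config.pbtxt").write_text(config_text)
    return version_dir / model_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = create_model_skeleton(root, "my_onnx_model", "onnxruntime")
        print("Copy the exported model to:", target)
```

After copying the exported model to the returned path, the repository root is the directory you point the server at when launching it.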

LLM Tutorials

The table below lists some popular models that are supported in our tutorials:

Example Models	Tutorial Link
Llama-2-7B	TensorRT-LLM Tutorial
Persimmon-8B	HuggingFace Transformers Tutorial
Falcon-7B	HuggingFace Transformers Tutorial

Note: This is not an exhaustive list of what Triton supports, just what is included in the tutorials.

What does this repository contain?

This repository contains the following resources:

Conceptual Guide: This guide focuses on building a conceptual understanding of the general challenges faced whilst building inference infrastructure and how to best tackle these challenges with Triton Inference Server.
Quick Deploy: This is a set of guides for deploying a model from your preferred framework to the Triton Inference Server. These guides assume a basic understanding of the Triton Inference Server; reviewing the getting started material first is recommended.
HuggingFace Guide: The focus of this guide is to walk the user through different methods in which a HuggingFace model can be deployed using the Triton Inference Server.
Feature Guides: This folder is meant to house Triton's feature-specific examples.
Migration Guide: Migrating from an existing solution to Triton Inference Server? Get an understanding of the general architecture that might best fit your use case.

Navigating Triton Inference Server Resources

The Triton Inference Server GitHub organization contains multiple repositories housing different features of the Triton Inference Server. The following is not a complete description of every repository, only a short guide to help build intuition.

Server is the main Triton Inference Server Repository.
Client contains the libraries and examples needed to create Triton clients.
Backend contains the core scripts and utilities to build a new Triton Backend. Any repository containing the word "backend" is either a framework backend or an example for how to create a backend.
Tools such as Model Analyzer and Model Navigator help you measure performance or simplify model acceleration.
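To make the role of the Client repository more concrete, here is a minimal sketch of the "tensor in" half of a request. Triton's HTTP endpoint follows the KServe v2 inference protocol, so a request is a JSON body listing named input tensors, posted to `/v2/models/<model>/infer`. The helper names, the model name `my_model`, and the tensor name `INPUT0` are hypothetical; the sketch uses only the standard library and does not send anything, whereas the client libraries wrap these steps for you.

```python
import json
from math import prod


def build_infer_request(input_name, shape, datatype, data):
    """Build a v2-protocol inference request body for a single input tensor.

    `data` is the tensor flattened in row-major order. Hypothetical helper.
    """
    if prod(shape) != len(data):
        raise ValueError("data length does not match the given shape")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }


def infer_url(host, model_name):
    """Return the HTTP inference endpoint for a model on the given host."""
    return f"http://{host}/v2/models/{model_name}/infer"


if __name__ == "__main__":
    body = build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
    # 8000 is Triton's default HTTP port; adjust if your server differs.
    print(infer_url("localhost:8000", "my_model"))
    print(json.dumps(body, indent=2))
```

The response mirrors this structure with an `outputs` list, which is the "tensor out" half of the exchange.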

Adding Requests

To request a new example, open an issue and describe what you would like to see. Want to make a contribution? Open a pull request and tag an admin.
