| Navigate to | Part 5: Building Model Ensembles | Part 7: Iterative Scheduling Tutorial | Documentation: BLS |
| ----------- | -------------------------------- | ------------------------------------- | ------------------- |
Watch this explainer video, which discusses the pipeline, before proceeding with the example. This example focuses on showcasing two of Triton Inference Server's features:
- Using multiple frameworks in the same inference pipeline. Refer to this for more information about supported frameworks.
- Using the Python Backend's Business Logic Scripting (BLS) API to build complex, non-linear pipelines.
Building a pipeline powered by deep learning models is a collaborative effort that often involves multiple contributors. Contributors often have differing development environments, which can lead to issues when assembling a single pipeline from their combined work. Triton users can solve this challenge by using the Python (or C++) backend together with the Business Logic Scripting (BLS) API to trigger model execution.
In this example, the models are being run on:
- ONNX Backend
- TensorRT Backend
- Python Backend
Both models deployed on framework backends can be triggered from the Python backend using the following BLS API:
encoding_request = pb_utils.InferenceRequest(
    model_name="text_encoder",
    requested_output_names=["last_hidden_state"],
    inputs=[input_ids_1],
)
response = encoding_request.exec()
text_embeddings = pb_utils.get_output_tensor_by_name(response, "last_hidden_state")
Refer to model.py in the pipeline model for a complete example.
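The call above typically sits inside the pipeline model's `execute()` method, between building the input tensor and unpacking the response. A hedged sketch of that surrounding code is shown below; the tensor names must match the `text_encoder` model's configuration, and the helper name and dtype here are illustrative assumptions rather than the repository's exact implementation.

```python
# Sketch of a BLS call as it might appear inside the Python backend model.
# "input_ids" / "last_hidden_state" must match the text_encoder model's config;
# the function name and the int32 dtype are illustrative assumptions.
import numpy as np
import triton_python_backend_utils as pb_utils


def encode_prompt(token_ids: np.ndarray) -> np.ndarray:
    # Wrap the tokenized prompt in a Triton tensor so BLS can pass it along.
    input_ids_1 = pb_utils.Tensor("input_ids", token_ids.astype(np.int32))

    encoding_request = pb_utils.InferenceRequest(
        model_name="text_encoder",
        requested_output_names=["last_hidden_state"],
        inputs=[input_ids_1],
    )
    response = encoding_request.exec()

    # Propagate failures from the downstream model instead of continuing.
    if response.has_error():
        raise pb_utils.TritonModelException(response.error().message())

    text_embeddings = pb_utils.get_output_tensor_by_name(response, "last_hidden_state")
    return text_embeddings.as_numpy()
```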
Before starting, clone this repository and navigate to the root folder. Use three different terminals for an easier user experience.
- First, run the Triton Inference Server Container.
# Replace yy.mm with the year and month of the release, e.g. 22.08
docker run --gpus=all -it --shm-size=256m --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}:/workspace/ -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:yy.mm-py3 bash
- Next, install all the dependencies required by the models running in the Python backend and log in with your Hugging Face token (a Hugging Face account is required).
# PyTorch & Transformers Lib
pip install torch torchvision torchaudio
pip install transformers ftfy scipy accelerate
pip install diffusers==0.9.0
pip install transformers[onnxruntime]
huggingface-cli login
Use the NGC PyTorch container to export and convert the models. A sketch of what the export step produces is included after the command block below.
docker run -it --gpus all -p 8888:8888 -v ${PWD}:/mount nvcr.io/nvidia/pytorch:yy.mm-py3
pip install transformers ftfy scipy
pip install transformers[onnxruntime]
pip install diffusers==0.9.0
huggingface-cli login
cd /mount
python export.py
# Accelerating VAE with TensorRT
trtexec --onnx=vae.onnx --saveEngine=vae.plan --minShapes=latent_sample:1x4x64x64 --optShapes=latent_sample:4x4x64x64 --maxShapes=latent_sample:8x4x64x64 --fp16
# Place the models in the model repository
mkdir model_repository/vae/1
mkdir model_repository/text_encoder/1
mv vae.plan model_repository/vae/1/model.plan
mv encoder.onnx model_repository/text_encoder/1/model.onnx
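For reference, the export step produces `encoder.onnx` and `vae.onnx` before the VAE is converted to a TensorRT plan. The sketch below outlines what a script along the lines of `export.py` might do; the repository's `export.py` is the authoritative version, and the checkpoint id, wrapper classes, and opset version here are assumptions.

```python
# Illustrative sketch of the ONNX export step (the repository's export.py is
# the source of truth). Checkpoint id, opset, and shapes are assumptions.
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTextModel

model_id = "CompVis/stable-diffusion-v1-4"  # assumption: any SD 1.x checkpoint


class TextEncoder(torch.nn.Module):
    """Return only last_hidden_state so the ONNX graph has a single output."""

    def __init__(self):
        super().__init__()
        self.encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

    def forward(self, input_ids):
        return self.encoder(input_ids, return_dict=False)[0]


class VAEDecoder(torch.nn.Module):
    """Only the decoder half of the VAE is needed to turn latents into images."""

    def __init__(self):
        super().__init__()
        self.vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")

    def forward(self, latent_sample):
        return self.vae.decode(latent_sample, return_dict=False)[0]


with torch.no_grad():
    # Text encoder -> encoder.onnx (served by the ONNX backend).
    torch.onnx.export(
        TextEncoder().eval(),
        torch.ones(1, 77, dtype=torch.int64),   # CLIP uses 77-token sequences
        "encoder.onnx",
        input_names=["input_ids"],
        output_names=["last_hidden_state"],
        dynamic_axes={"input_ids": {0: "batch"}},
        opset_version=14,
    )
    # VAE decoder -> vae.onnx (converted to vae.plan by trtexec above).
    torch.onnx.export(
        VAEDecoder().eval(),
        torch.randn(1, 4, 64, 64),              # matches latent_sample shapes
        "vae.onnx",
        input_names=["latent_sample"],
        output_names=["sample"],
        dynamic_axes={"latent_sample": {0: "batch"}},
        opset_version=14,
    )
```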
From the server container, launch the Triton Inference Server.
tritonserver --model-repository=/models
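Optionally, once the server is up you can confirm that it has loaded the models before sending inference requests. This quick check uses the Triton Python HTTP client (available in the SDK container used in the next step); the model name `pipeline` is an assumption, so substitute the name used in your model repository.

```python
# Quick readiness check (run where tritonclient is installed, e.g. the SDK container).
# "pipeline" is an assumed model name; use the one from your model repository.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready:", client.is_server_ready())
print("pipeline ready:", client.is_model_ready("pipeline"))
```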
Use the client container and run the client.
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:yy.mm-py3-sdk bash
# Client with no GUI
python3 client.py
# Client with GUI
pip install gradio packaging
python3 gui/client.py --triton_url="localhost:8001"
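If you want to send requests without the provided client.py, a minimal gRPC client could look like the sketch below. The model name (`pipeline`) and the input/output tensor names (`prompt`, `generated_image`) are assumptions; check the pipeline model's config.pbtxt for the actual names and datatypes.

```python
# Minimal hand-rolled client sketch; model and tensor names are assumptions,
# verify them against the pipeline model's config.pbtxt.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

prompt = np.array(["a photo of an astronaut riding a horse on mars"], dtype=np.object_)
text_input = grpcclient.InferInput("prompt", prompt.shape, "BYTES")
text_input.set_data_from_numpy(prompt)
output = grpcclient.InferRequestedOutput("generated_image")

result = client.infer(model_name="pipeline", inputs=[text_input], outputs=[output])
image = result.as_numpy("generated_image")  # e.g. an HxWx3 array you can save with PIL
print(image.shape)
```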
Note: The first inference query may take more time than subsequent queries.