chore: updates
peri044 committed Sep 24, 2024
1 parent c4f8945 commit d5246f9
Showing 4 changed files with 50 additions and 32 deletions.
2 changes: 2 additions & 0 deletions docsrc/index.rst
@@ -118,6 +118,8 @@ Tutorials
tutorials/_rendered_examples/distributed_inference/data_parallel_gpt2
tutorials/_rendered_examples/distributed_inference/data_parallel_stable_diffusion
tutorials/_rendered_examples/dynamo/mutable_torchtrt_module_example
tutorials/_rendered_examples/dynamo/torch_export_gpt2
tutorials/_rendered_examples/dynamo/torch_export_llama2

Python API Documentation
------------------------
31 changes: 24 additions & 7 deletions examples/dynamo/README.rst
@@ -1,19 +1,36 @@
.. _torch_compile:

Dynamo / ``torch.compile``
----------------------------
Torch-TensorRT Examples
====================================

Torch-TensorRT provides a backend for the new ``torch.compile`` API released in PyTorch 2.0. In the following examples we describe
a number of ways you can leverage this backend to accelerate inference.
Please refer to the following examples, which demonstrate the usage of different Torch-TensorRT features. We also provide
examples of Torch-TensorRT compilation of select computer vision and language models.

* :ref:`torch_compile_resnet`: Compiling a ResNet model using the Torch Compile Frontend for ``torch_tensorrt.compile``
* :ref:`torch_compile_transformer`: Compiling a Transformer model using ``torch.compile``
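
For instance, driving the TensorRT backend through ``torch.compile`` takes only a few lines. The following is a minimal illustrative sketch, not code from these examples: ``MyModel`` and the input shape are placeholders.

.. code-block:: python

    import torch
    import torch_tensorrt  # importing torch_tensorrt registers the "tensorrt" backend

    model = MyModel().eval().cuda()  # placeholder module
    inputs = torch.randn(1, 3, 224, 224).cuda()

    # The TensorRT engine is built lazily, on the first call
    optimized = torch.compile(model, backend="tensorrt")
    out = optimized(inputs)
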
Dependencies
------------------------------------

Please install the following external dependencies (assuming you already have ``torch_tensorrt`` installed):

.. code-block:: sh

    pip install -r requirements.txt

Compiler Features
------------------------------------
* :ref:`torch_compile_advanced_usage`: Advanced usage including making a custom backend to use directly with the ``torch.compile`` API
* :ref:`torch_compile_stable_diffusion`: Compiling a Stable Diffusion model using ``torch.compile``
* :ref:`torch_export_cudagraphs`: Using the Cudagraphs integration with ``ir="dynamo"``
* :ref:`custom_kernel_plugins`: Creating a plugin to use a custom kernel inside TensorRT engines
* :ref:`refit_engine_example`: Refitting a compiled TensorRT Graph Module with updated weights
* :ref:`mutable_torchtrt_module_example`: Compile, use, and modify a TensorRT Graph Module with ``MutableTorchTensorRTModule`` (a minimal sketch follows this list)
* :ref:`vgg16_fp8_ptq`: Compiling a VGG16 model with FP8 and PTQ using ``torch.compile``
* :ref:`engine_caching_example`: Utilizing engine caching to speed up compilation times
* :ref:`engine_caching_bert_example`: Demonstrating engine caching on BERT
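
As a small taste of one of these features, here is a minimal sketch of the mutable-module workflow; the ResNet18 module, input shape, and second ``state_dict`` are illustrative assumptions, not code taken from the example itself.

.. code-block:: python

    import torch
    import torch_tensorrt
    import torchvision.models as models

    model = models.resnet18(pretrained=True).eval().cuda()

    # Wrapping the module defers compilation to the first forward call
    mutable = torch_tensorrt.MutableTorchTensorRTModule(model, enabled_precisions={torch.float32})
    out = mutable(torch.randn(1, 3, 224, 224).cuda())

    # Loading new weights marks the module for refit rather than a full rebuild
    model2 = models.resnet18(pretrained=False).eval().cuda()
    mutable.load_state_dict(model2.state_dict())
    out = mutable(torch.randn(1, 3, 224, 224).cuda())
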

Model Zoo
------------------------------------
* :ref:`torch_compile_resnet`: Compiling a ResNet model using the Torch Compile Frontend for ``torch_tensorrt.compile``
* :ref:`torch_compile_transformer`: Compiling a Transformer model using ``torch.compile``
* :ref:`torch_compile_stable_diffusion`: Compiling a Stable Diffusion model using ``torch.compile``
* :ref:`torch_export_gpt2`: Compiling a GPT2 model using the AOT workflow (``ir=dynamo``)
* :ref:`torch_export_llama2`: Compiling a Llama2 model using the AOT workflow (``ir=dynamo``); a minimal AOT sketch follows this list
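
The AOT workflow referenced above is sketched below under the same kind of placeholder assumptions (``MyModel`` and the input are illustrative):

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()  # placeholder module
    inputs = [torch.randn(1, 3, 224, 224).cuda()]

    # Ahead-of-time compilation through the dynamo frontend
    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
    out = trt_model(*inputs)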
4 changes: 2 additions & 2 deletions examples/dynamo/requirements.txt
@@ -1,4 +1,4 @@
cupy==13.1.0
torch>=2.4.0.dev20240503+cu121
torch-tensorrt>=2.4.0.dev20240503+cu121
triton==2.3.0
diffusers==0.30.3
transformers==4.44.2
45 changes: 22 additions & 23 deletions examples/dynamo/torch_compile_gpt2.py
@@ -53,30 +53,29 @@
# Compilation with ``torch.compile`` using the TensorRT backend and generation of TensorRT outputs
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

with torch_tensorrt.logging.debug():
    # Compile the model and mark the input sequence length to be dynamic
    torch._dynamo.mark_dynamic(input_ids, 1, min=2, max=1023)
    model.forward = torch.compile(
        model.forward,
        backend="tensorrt",
        dynamic=None,
        options={
            "enabled_precisions": {torch.float32},
            "disable_tf32": True,
            "min_block_size": 1,
            "debug": True,
        },
    )
# Compile the model and mark the input sequence length to be dynamic
torch._dynamo.mark_dynamic(input_ids, 1, min=2, max=1023)
model.forward = torch.compile(
    model.forward,
    backend="tensorrt",
    dynamic=None,
    options={
        "enabled_precisions": {torch.float32},
        "disable_tf32": True,
        "min_block_size": 1,
        "debug": True,
    },
)

    # Auto-regressive generation loop for greedy decoding using TensorRT model
    # The first token generation compiles the model using TensorRT and the second token
    # encounters recompilation
    trt_gen_tokens = model.generate(
        inputs=input_ids,
        max_length=MAX_TOKENS,
        use_cache=False,
        pad_token_id=tokenizer.eos_token_id,
    )
# Auto-regressive generation loop for greedy decoding using TensorRT model
# The first token generation compiles the model using TensorRT and the second token
# encounters recompilation
trt_gen_tokens = model.generate(
    inputs=input_ids,
    max_length=MAX_TOKENS,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)

# %%
# Decode the output sentences of PyTorch and TensorRT
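# A typical decode step, as a hedged sketch only (assuming ``pyt_gen_tokens``
# holds the eager-PyTorch tokens generated earlier in the script):
print("PyTorch output:  ", tokenizer.decode(pyt_gen_tokens[0], skip_special_tokens=True))
print("TensorRT output: ", tokenizer.decode(trt_gen_tokens[0], skip_special_tokens=True))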
