chore: updates
peri044 committed Sep 24, 2024
1 parent c4f8945 commit d5246f9
Showing 4 changed files with 50 additions and 32 deletions.
2 changes: 2 additions & 0 deletions docsrc/index.rst
@@ -118,6 +118,8 @@ Tutorials
tutorials/_rendered_examples/distributed_inference/data_parallel_gpt2
tutorials/_rendered_examples/distributed_inference/data_parallel_stable_diffusion
tutorials/_rendered_examples/dynamo/mutable_torchtrt_module_example
tutorials/_rendered_examples/dynamo/torch_export_gpt2
tutorials/_rendered_examples/dynamo/torch_export_llama2

Python API Documentation
------------------------
31 changes: 24 additions & 7 deletions examples/dynamo/README.rst
@@ -1,19 +1,36 @@
.. _torch_compile:

Dynamo / ``torch.compile``
----------------------------
Torch-TensorRT Examples
====================================

Torch-TensorRT provides a backend for the new ``torch.compile`` API released in PyTorch 2.0. In the following examples we describe
a number of ways you can leverage this backend to accelerate inference.
Please refer to the following examples, which demonstrate the usage of different Torch-TensorRT features. We also provide
examples of Torch-TensorRT compilation of select computer vision and language models.

* :ref:`torch_compile_resnet`: Compiling a ResNet model using the Torch Compile Frontend for ``torch_tensorrt.compile``
* :ref:`torch_compile_transformer`: Compiling a Transformer model using ``torch.compile``
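
For instance, driving the TensorRT backend through ``torch.compile`` takes only a few lines. The following is a minimal illustrative sketch, not code from these examples: ``MyModel`` and the input shape are placeholders.

.. code-block:: python

    import torch
    import torch_tensorrt  # importing torch_tensorrt registers the "tensorrt" backend

    model = MyModel().eval().cuda()  # placeholder module
    inputs = torch.randn(1, 3, 224, 224).cuda()

    # The TensorRT engine is built lazily, on the first call
    optimized = torch.compile(model, backend="tensorrt")
    out = optimized(inputs)
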
Dependencies
------------------------------------

Please install the following external dependencies (assuming you already have ``torch_tensorrt`` installed):

.. code-block:: sh

    pip install -r requirements.txt

Compiler Features
------------------------------------
* :ref:`torch_compile_advanced_usage`: Advanced usage including making a custom backend to use directly with the ``torch.compile`` API
* :ref:`torch_compile_stable_diffusion`: Compiling a Stable Diffusion model using ``torch.compile``
* :ref:`torch_export_cudagraphs`: Using the Cudagraphs integration with ``ir="dynamo"``
* :ref:`custom_kernel_plugins`: Creating a plugin to use a custom kernel inside TensorRT engines
* :ref:`refit_engine_example`: Refitting a compiled TensorRT Graph Module with updated weights
* :ref:`mutable_torchtrt_module_example`: Compile, use, and modify a TensorRT Graph Module with ``MutableTorchTensorRTModule`` (a minimal sketch follows this list)
* :ref:`vgg16_fp8_ptq`: Compiling a VGG16 model with FP8 and PTQ using ``torch.compile``
* :ref:`engine_caching_example`: Utilizing engine caching to speed up compilation times
* :ref:`engine_caching_bert_example`: Demonstrating engine caching on BERT
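
As a small taste of one of these features, here is a minimal sketch of the mutable-module workflow; the ResNet18 module, input shape, and second ``state_dict`` are illustrative assumptions, not code taken from the example itself.

.. code-block:: python

    import torch
    import torch_tensorrt
    import torchvision.models as models

    model = models.resnet18(pretrained=True).eval().cuda()

    # Wrapping the module defers compilation to the first forward call
    mutable = torch_tensorrt.MutableTorchTensorRTModule(model, enabled_precisions={torch.float32})
    out = mutable(torch.randn(1, 3, 224, 224).cuda())

    # Loading new weights marks the module for refit rather than a full rebuild
    model2 = models.resnet18(pretrained=False).eval().cuda()
    mutable.load_state_dict(model2.state_dict())
    out = mutable(torch.randn(1, 3, 224, 224).cuda())
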

Model Zoo
------------------------------------
* :ref:`torch_compile_resnet`: Compiling a ResNet model using the Torch Compile Frontend for ``torch_tensorrt.compile``
* :ref:`torch_compile_transformer`: Compiling a Transformer model using ``torch.compile``
* :ref:`torch_compile_stable_diffusion`: Compiling a Stable Diffusion model using ``torch.compile``
* :ref:`torch_export_gpt2`: Compiling a GPT2 model using the AOT workflow (``ir=dynamo``)
* :ref:`torch_export_llama2`: Compiling a Llama2 model using the AOT workflow (``ir=dynamo``); a minimal AOT sketch follows this list
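
The AOT workflow referenced above is sketched below under the same kind of placeholder assumptions (``MyModel`` and the input are illustrative):

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()  # placeholder module
    inputs = [torch.randn(1, 3, 224, 224).cuda()]

    # Ahead-of-time compilation through the dynamo frontend
    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
    out = trt_model(*inputs)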
4 changes: 2 additions & 2 deletions examples/dynamo/requirements.txt
@@ -1,4 +1,4 @@
cupy==13.1.0
torch>=2.4.0.dev20240503+cu121
torch-tensorrt>=2.4.0.dev20240503+cu121
triton==2.3.0
diffusers==0.30.3
transformers==4.44.2
45 changes: 22 additions & 23 deletions examples/dynamo/torch_compile_gpt2.py
@@ -53,30 +53,29 @@
# Compilation with ``torch.compile`` using the TensorRT backend and generation of TensorRT outputs
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

with torch_tensorrt.logging.debug():
    # Compile the model and mark the input sequence length to be dynamic
    torch._dynamo.mark_dynamic(input_ids, 1, min=2, max=1023)
    model.forward = torch.compile(
        model.forward,
        backend="tensorrt",
        dynamic=None,
        options={
            "enabled_precisions": {torch.float32},
            "disable_tf32": True,
            "min_block_size": 1,
            "debug": True,
        },
    )
# Compile the model and mark the input sequence length to be dynamic
torch._dynamo.mark_dynamic(input_ids, 1, min=2, max=1023)
model.forward = torch.compile(
    model.forward,
    backend="tensorrt",
    dynamic=None,
    options={
        "enabled_precisions": {torch.float32},
        "disable_tf32": True,
        "min_block_size": 1,
        "debug": True,
    },
)

    # Auto-regressive generation loop for greedy decoding using TensorRT model
    # The first token generation compiles the model using TensorRT and the second token
    # encounters recompilation
    trt_gen_tokens = model.generate(
        inputs=input_ids,
        max_length=MAX_TOKENS,
        use_cache=False,
        pad_token_id=tokenizer.eos_token_id,
    )
# Auto-regressive generation loop for greedy decoding using TensorRT model
# The first token generation compiles the model using TensorRT and the second token
# encounters recompilation
trt_gen_tokens = model.generate(
    inputs=input_ids,
    max_length=MAX_TOKENS,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)

# %%
# Decode the output sentences of PyTorch and TensorRT
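# A typical decode step, as a hedged sketch only (assuming ``pyt_gen_tokens``
# holds the eager-PyTorch tokens generated earlier in the script):
print("PyTorch output:  ", tokenizer.decode(pyt_gen_tokens[0], skip_special_tokens=True))
print("TensorRT output: ", tokenizer.decode(trt_gen_tokens[0], skip_special_tokens=True))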
