
[Usage]: How to convert GPT-OSS models (20B/120B) using TRT Flow with convert_checkpoint.py #10568

@ASH29033

Description


System Info

System Information:

  • OS: Ubuntu 24.04
  • Python version: 3.12
  • CUDA version: 13.0
  • GPU model(s): 6000 pro
  • Driver version:
  • TensorRT-LLM version: 1.2.0rc6


How would you like to use TensorRT-LLM:

I want to run inference of GPT-OSS models (specifically 20B and 120B variants from Hugging Face). I don't know how to integrate them with TensorRT-LLM using the TRT Flow approach (similar to EXAONE models) or optimize them for my use case.

Specific questions:

  • Model:
    • GPT-OSS-20B, GPT-OSS-120B

What I've tried:

I'm familiar with the EXAONE example in the documentation which uses:

  1. convert_checkpoint.py from the LLaMA example
  2. trtllm-build to create the engine

However, there's no specific guide for GPT-OSS models. I'd like to know:

  1. Is GPT-OSS architecture compatible with TensorRT-LLM? Should I use the LLaMA convert script or a different one?

  2. What's the correct conversion flow for GPT-OSS models?

    # Is this the right approach?
    python examples/llama/convert_checkpoint.py \
        --model_dir $HF_MODEL_DIR \
        --output_dir trt_models/gpt-oss/fp16/1-gpu \
        --dtype float16

    trtllm-build \
        --checkpoint_dir trt_models/gpt-oss/fp16/1-gpu \
        --output_dir trt_engines/gpt-oss/fp16/1-gpu \
        --gemm_plugin auto
  3. For the 120B model, what tensor parallelism configuration do you recommend (e.g., 4-way TP)?
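For reference, here is a minimal sketch of what a 4-way tensor-parallel flow might look like for the 120B model, assuming the LLaMA-example converter turns out to accept GPT-OSS checkpoints (this is not a confirmed recipe; `--tp_size` and the `mpirun` launch pattern below are the standard LLaMA-example conventions):

```shell
# Hypothetical 4-GPU tensor-parallel flow, assuming the LLaMA
# converter is compatible with GPT-OSS (unconfirmed).
python examples/llama/convert_checkpoint.py \
    --model_dir $HF_MODEL_DIR \
    --output_dir trt_models/gpt-oss-120b/fp16/4-gpu \
    --dtype float16 \
    --tp_size 4

trtllm-build \
    --checkpoint_dir trt_models/gpt-oss-120b/fp16/4-gpu \
    --output_dir trt_engines/gpt-oss-120b/fp16/4-gpu \
    --gemm_plugin auto

# Launch inference across 4 GPUs with MPI:
mpirun -n 4 python examples/run.py \
    --engine_dir trt_engines/gpt-oss-120b/fp16/4-gpu \
    --tokenizer_dir $HF_MODEL_DIR \
    --input_text "Hello"
```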


Metadata

Labels

Model customization <NV> (Adding support for new model architectures or variants), question (Further information is requested)
