Description
System Info
System Information:
- OS: Ubuntu 24.04
- Python version: 3.12
- CUDA version: 13.0
- GPU model(s): NVIDIA RTX PRO 6000
- Driver version:
- TensorRT-LLM version: 1.2.0rc6
How would you like to use TensorRT-LLM:
I want to run inference of GPT-OSS models (specifically 20B and 120B variants from Hugging Face). I don't know how to integrate them with TensorRT-LLM using the TRT Flow approach (similar to EXAONE models) or optimize them for my use case.
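To make the goal concrete: after building an engine, I would expect to run something like the standard example runner shown below. This is only my guess at the end state — the paths are placeholders, and whether `examples/run.py` handles GPT-OSS at all is exactly what I'm unsure about:

```bash
# Placeholder paths; GPT-OSS support in the standard runner is the open question.
python examples/run.py \
    --engine_dir trt_engines/gpt-oss/fp16/1-gpu \
    --tokenizer_dir $HF_MODEL_DIR \
    --input_text "What is TensorRT-LLM?" \
    --max_output_len 128
```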
Specific questions:
- Model: GPT-OSS-20B, GPT-OSS-120B
What I've tried:
I'm familiar with the EXAONE example in the documentation, which uses:
- `convert_checkpoint.py` from the LLaMA example
- `trtllm-build` to create the engine
However, there's no specific guide for GPT-OSS models. I'd like to know:
- Is the GPT-OSS architecture compatible with TensorRT-LLM? Should I use the LLaMA convert script or a different one?
- What's the correct conversion flow for GPT-OSS models?
```bash
# Is this the right approach?
python examples/llama/convert_checkpoint.py \
    --model_dir $HF_MODEL_DIR \
    --output_dir trt_models/gpt-oss/fp16/1-gpu \
    --dtype float16

trtllm-build \
    --checkpoint_dir trt_models/gpt-oss/fp16/1-gpu \
    --output_dir trt_engines/gpt-oss/fp16/1-gpu \
    --gemm_plugin auto
```
- For the 120B model, what tensor parallelism configuration do you recommend? (e.g., MXFP4)
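If 4-way tensor parallelism turned out to be the recommendation, I'd guess the flow looks roughly like this — the `--tp_size 4` value and GPT-OSS compatibility with the LLaMA convert script are assumptions on my part, not something I've verified:

```bash
# Guess at a 4-GPU TP flow; --tp_size 4 is an assumption, not a recommendation.
python examples/llama/convert_checkpoint.py \
    --model_dir $HF_MODEL_DIR \
    --output_dir trt_models/gpt-oss/fp16/4-gpu \
    --dtype float16 \
    --tp_size 4

trtllm-build \
    --checkpoint_dir trt_models/gpt-oss/fp16/4-gpu \
    --output_dir trt_engines/gpt-oss/fp16/4-gpu \
    --gemm_plugin auto

# One MPI rank per GPU at runtime.
mpirun -n 4 python examples/run.py \
    --engine_dir trt_engines/gpt-oss/fp16/4-gpu \
    --tokenizer_dir $HF_MODEL_DIR \
    --input_text "Hello" \
    --max_output_len 64
```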