Issues: NVIDIA/TensorRT-LLM
#783 [Issue Template] Short one-line summary of the issue #270
Opened Jan 1, 2024 by juney-nvidia. Status: Open.
#2489 Issues with installing on Windows
Labels: bug (Something isn't working). Opened Nov 23, 2024 by PyroGenesis. 1 of 4 tasks.
#2488 In streaming output mode, some Chinese characters are decoded as garbled characters
Opened Nov 23, 2024 by HongfengDu.
#2487 int4 not faster than fp16 and fp8
Labels: bug (Something isn't working). Opened Nov 22, 2024 by ShuaiShao93. 4 tasks.
#2486 Inconsistency with penaltyKernels.cu
Labels: bug (Something isn't working). Opened Nov 22, 2024 by buddhapuneeth. 2 of 4 tasks.
#2483 QwenVL build failed
Labels: bug (Something isn't working). Opened Nov 22, 2024 by Wonder-donbury. 2 of 4 tasks.
#2482 Medusa performance degrades with batch size larger than 1
Labels: performance issue (Issue about performance number). Opened Nov 22, 2024 by SoundProvider.
#2481 How to install TensorRT-LLM on Python 3.11?
Labels: installation; question (Further information is requested). Opened Nov 22, 2024 by janelu9.
#2480 Can't uniquely locate model_spec module
Labels: installation; triaged (Issue has been triaged by maintainers). Opened Nov 21, 2024 by weizhi-wang.
#2479 error: make -C docker release_build : Command 'git submodule update --init --recursive' returned non-zero exit status 128
Labels: installation; triaged (Issue has been triaged by maintainers). Opened Nov 21, 2024 by xddun. 1 of 4 tasks.
#2475 undefined reference to `__libc_single_threaded'
Labels: bug (Something isn't working); installation; triaged (Issue has been triaged by maintainers). Opened Nov 21, 2024 by hoangvictor. 1 of 4 tasks.
#2472 Does TensorRT-LLM support serving a 4-bit quantized unsloth Llama model?
Labels: quantization (Issue about lower bit quantization, including int8, int4, fp8); question (Further information is requested); triaged (Issue has been triaged by maintainers). Opened Nov 20, 2024 by jayakommuru.
#2471 Error: convert_checkpoint in TensorRT-LLM for Llama3.2 3B when tested on multiple versions
Labels: Investigating; not a bug (Some known limitation, but not a bug); triaged (Issue has been triaged by maintainers). Opened Nov 20, 2024 by DeekshithaDPrakash. 2 of 4 tasks.
#2469 Building trtllm is very slow and raises an error
Labels: bug (Something isn't working); triaged (Issue has been triaged by maintainers). Opened Nov 20, 2024 by anaivebird. 2 of 4 tasks.
#2467 Error in convert_checkpoint in TensorRT-LLM 0.13.0 for Llama3.2 3B
Labels: Investigating; triaged (Issue has been triaged by maintainers). Opened Nov 19, 2024 by yspch2022.
#2466 Performance issue with batching
Labels: bug (Something isn't working); performance issue (Issue about performance number). Opened Nov 19, 2024 by ShuaiShao93. 1 of 4 tasks.
#2465 Upgrade transformers to 4.45.2
Labels: question (Further information is requested); triaged (Issue has been triaged by maintainers). Opened Nov 19, 2024 by cupertank.
#2463 Error when converting the DeepSeek-V2-Lite model
Labels: bug (Something isn't working); triaged (Issue has been triaged by maintainers). Opened Nov 19, 2024 by WhatGhost. 2 of 4 tasks.
#2462 How to make sure enable_kv_cache_reuse is working correctly?
Labels: bug (Something isn't working). Opened Nov 19, 2024 by chwma0. 2 of 4 tasks.
#2459 Is enableBlockReuse available for multimodal models?
Labels: question (Further information is requested); triaged (Issue has been triaged by maintainers). Opened Nov 19, 2024 by YSF-A.
#2458 Is it possible to load a quantized model from Hugging Face?
Labels: quantization (Issue about lower bit quantization, including int8, int4, fp8); question (Further information is requested); triaged (Issue has been triaged by maintainers). Opened Nov 19, 2024 by pei0033.
#2457 Windows C++ Executor on v0.10.0
Labels: bug (Something isn't working); triaged (Issue has been triaged by maintainers). Opened Nov 19, 2024 by rifkybujana. 2 of 4 tasks.
#2456 KV cache re-use impact on average sequence latency?
Labels: performance issue (Issue about performance number); question (Further information is requested); triaged (Issue has been triaged by maintainers). Opened Nov 18, 2024 by mkserge.