Pull requests: NVIDIA/TensorRT-LLM
Pull requests list
Create INT8 KV Cache on Qserve
triaged
Issue has been triaged by maintainers
#2446
opened Nov 14, 2024 by
dleunji
th::optional -> std::optional
triaged
Issue has been triaged by maintainers
#2397
opened Oct 31, 2024 by
r-barnes
attention mechanism toggle added
functionality issue
triaged
Issue has been triaged by maintainers
waiting for feedback
#2384
opened Oct 28, 2024 by
Aaryanverma
fix load_model_on_cpu on qwen/convert_checkpoint.py
feature request
New feature or request
triaged
Issue has been triaged by maintainers
#2382
opened Oct 27, 2024 by
lkm2835
Fix errors when using smoothquant to quantize Qwen2 model
quantization
Issue about lower bit quantization, including int8, int4, fp8
triaged
Issue has been triaged by maintainers
#2370
opened Oct 24, 2024 by
Missmiaom
README.md: Add 3rd Party Inference Speed Dashboard
documentation
Improvements or additions to documentation
triaged
Issue has been triaged by maintainers
#2244
opened Sep 22, 2024 by
matichon-vultureprime
Modify small-batched weight only quantization
quantization
Issue about lower bit quantization, including int8, int4, fp8
triaged
Issue has been triaged by maintainers
#2213
opened Sep 10, 2024 by
dasistwo
[examples/bert/build.py]: Load weights for BertModel and RobertaModel if --model_dir is provided
triaged
Issue has been triaged by maintainers
#2187
opened Sep 3, 2024 by
tkhanipov
Fix wrong buffer for oneShotAllReduceKernel under PUSH_MODE
#2099
opened Aug 8, 2024 by
YconquestY
decoder MMHA kernel support INT8 SCALE_Q_INSTEAD_OF_K and SCALE_P_INS…
#2085
opened Aug 5, 2024 by
lishicheng1996
fix wrong arg in Engine Building Command in docs/source/performance/perf-overview.md
documentation
Improvements or additions to documentation
#2057
opened Jul 30, 2024 by
RuibaiXu
Fix default min length
triaged
Issue has been triaged by maintainers
#1935
opened Jul 11, 2024 by
akhoroshev
Bump transformers from 4.36.2 to 4.38.0 in /examples/multimodal
bug
Something isn't working
dependencies
Pull requests that update a dependency file
triaged
Issue has been triaged by maintainers
waiting for feedback
#1689
opened May 28, 2024 by
dependabot (bot)
add cached generation buffer
triaged
Issue has been triaged by maintainers
waiting for feedback
#1685
opened May 28, 2024 by
michael200892458
Fix CUDA OOM when creating Mixtral checkpoint
triaged
Issue has been triaged by maintainers
waiting for feedback
#1629
opened May 19, 2024 by
VivekBits2210
[feat]: Support weight only gemm with 2bit
triaged
Issue has been triaged by maintainers
waiting for feedback
#1568
opened May 9, 2024 by
gavinchen430