Releases · triton-inference-server/triton_cli
0.0.11
0.0.10
0.0.9
What's Changed
- chore: Update TRT-LLM checkpoint scripts to v0.10 and fix GitHub Actions pipeline by @KrishnanPrash in #78
- test: Shorten genai-perf test time, fail fast on server startup, and upgrade to 24.06 by @rmccorm4 in #76
- Tag 0.0.9 and update versions to 24.06 by @rmccorm4 in #79
Full Changelog: 0.0.8...0.0.9
0.0.8
What's Changed
- Disable Echo (exclude input text from output text) in TRT-LLM by default by @nnshah1 in #58
- Enable calls to GenAI-Perf for profile subcommand by @dyastremsky in #52
- Fix wrong huggingface login command in readme by @matthewkotila in #60
- Tweak test timeouts to account for testing Llama 2 and Llama 3 models by @rmccorm4 in #61
- Add GitLab CI trigger in GitHub checks by @nvda-mesharma in #64
- test: Unit Tests for `triton {metrics, config, status}` by @KrishnanPrash in #66
- chore: Upgrade dependencies for 24.05 by @KrishnanPrash in #67
- refactor: Simplify testing with ScopedTritonServer instead of pytest fixtures by @KrishnanPrash in #68
- ci: Restrict numpy to version 1.x by @KrishnanPrash in #70
- refactor: Add TritonCLIException to denote expected vs unexpected errors by @rmccorm4 in #69 (see the sketch after this list)
- build: Update CLI version references to 0.0.8 and Triton references to 24.05 by @rmccorm4 in #72
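
PR #69 above introduces `TritonCLIException` to separate expected failures from genuine bugs. Below is a minimal sketch of that pattern, not the CLI's actual implementation: the `run_cli` entrypoint is a hypothetical stand-in for the real argument parsing and dispatch. Expected failures print a clean one-line message and exit non-zero; anything else keeps its full traceback.

```python
import sys


class TritonCLIException(Exception):
    """Raised for *expected* failures (bad arguments, missing models, failed builds)."""


def run_cli(argv):
    # Hypothetical stand-in for the real argument parsing / subcommand dispatch.
    if not argv:
        raise TritonCLIException("no subcommand given, try '--help'")
    print(f"running subcommand: {argv[0]}")


def main():
    try:
        run_cli(sys.argv[1:])
    except TritonCLIException as e:
        # Expected error: print a clean message, no traceback.
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(1)
    # Any other exception propagates with a full traceback, flagging a real bug.


if __name__ == "__main__":
    main()
```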
New Contributors
- @nnshah1 made their first contribution in #58
- @nvda-mesharma made their first contribution in #64
- @KrishnanPrash made their first contribution in #66
Full Changelog: 0.0.7...0.0.8
0.0.7
What's Changed
- Sync with Triton 24.04
- Bump TRT-LLM version to 0.9.0
- Add support for `llama-2-7b-chat`, `llama-3-8b`, and `llama-3-8b-instruct` for both vLLM and TRT-LLM
- Improve error checking and error messages when building TRT-LLM engines
- Log the underlying `convert_checkpoint.py` and `trtllm-build` commands for reproducibility/visibility
- Don't call `convert_checkpoint.py` if converted weights are already found
- Call `convert_checkpoint.py` via subprocess to improve total memory usage (see the sketch after this list)
- Attempt to clean up failed TRT-LLM models in the model repository if import or engine building fails, rather than leaving the model repository in an unfinished state
- Update tests to wait for both HTTP and GRPC server endpoints to be ready before testing
  - Fixes intermittent `ConnectionRefusedError` in CI tests
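
The `convert_checkpoint.py` items above describe a reusable pattern: skip the conversion when output already exists, log the exact command for reproducibility, and run it in a child process so its peak memory is returned to the OS when it exits. A minimal sketch follows; the script path and flags are illustrative assumptions, not the CLI's actual interface.

```python
import logging
import subprocess
import sys
from pathlib import Path

logger = logging.getLogger(__name__)


def convert_weights(script: Path, model_dir: Path, output_dir: Path):
    """Run convert_checkpoint.py in a child process, skipping work already done."""
    if output_dir.exists() and any(output_dir.iterdir()):
        # Converted weights already found: skip the expensive conversion.
        logger.info("Found converted weights at %s, skipping conversion", output_dir)
        return

    # Flags below are illustrative; the real script's arguments may differ.
    cmd = [
        sys.executable, str(script),
        "--model_dir", str(model_dir),
        "--output_dir", str(output_dir),
    ]
    # Log the exact command so users can reproduce the conversion by hand.
    logger.info("Running: %s", " ".join(cmd))

    # A subprocess (rather than an in-process import) means the conversion's
    # peak memory is released as soon as the child exits.
    subprocess.run(cmd, check=True)
```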
Full Changelog: 0.0.6...0.0.7
0.0.6
What's Changed
- GPT Engine Builder by @fpetrini15 in #24
- Modularize TRT LLM Builders by @fpetrini15 in #26
- Add --backend support to bench command and default to custom image by @rmccorm4 in #27
- Fix model infer on TRT LLM with negative ints, and minor cleanup by @rmccorm4 in #28
- Fix profile subcommand to account for offline (non-streaming) metrics and V1 batching by @rmccorm4 in #29
- Minor Repo Optimizations by @fpetrini15 in #30
- Bring back IFB default to TRT LLM models and bump to 24.01 by @rmccorm4 in #31
- Bump cli version to 0.0.3, bump trtllm version to 0.7.1, and bump vllm version to 0.3.0 by @rmccorm4 in #32
- Give GPT2 quicker build/load settings for demos, fix Dockerfile version syntax, bump CLI version to 0.0.4 by @rmccorm4 in #33
- Add note on MPI dependencies by @rmccorm4 in #34
- Add CLI subcommand tests to CI by @krishung5 in #35
- Bump to v0.0.5 - CI testing working for 24.01 by @rmccorm4 in #38
- Add extra tests for CLI by @krishung5 in #36
- CLI TRT LLM v0.8.0 Refresh by @fpetrini15 in #37
- Bump to v0.0.6 - CI testing working for 24.02 by @fpetrini15 in #39
- Flatten CLI Args by @fpetrini15 in #40
- Update README commands by @rmccorm4 in #42
- Enable CLI Concurrent Testing by @fpetrini15 in #41
- README Restructuring by @fpetrini15 in #43
- Address some documentation issues by @rmccorm4 in #50
New Contributors
- @krishung5 made their first contribution in #35
Full Changelog: 0.0.2...0.0.6
0.0.2
What's Changed
- Setup repo and package structure by @rmccorm4 in #1
- Add pre-commit hook to upgrade Python syntax by @dyastremsky in #2
- Add initial prototype by @rmccorm4 in #4
- Add README and update default image by @rmccorm4 in #5
- Add rough NGC CLI wrapper by @rmccorm4 in #6
- Basic MPI support by @fpetrini15 in #8
- Populate model repo with TRTLLM templates by @oandreeva-nv in #7
- Minor TRT-LLM tweaks by @rmccorm4 in #11
- Misc fixes by @rmccorm4 in #14
- Add profile subcommand to run perf analyzer by @matthewkotila in #13
- POC: Background Server by @fpetrini15 in #15 (see the sketch after this list)
- Fix high concurrency generation throughput calculation by @nv-hwoo in #16
- Add demo features for benchmarking LLMs by @rmccorm4 in #17
- Add copyrights and minor cleanup by @rmccorm4 in #19
- Automatic TRT LLM Detail Parsing by @fpetrini15 in #18
- Fix vLLM profiler bug, add fallback logic to server start, cleanup by @rmccorm4 in #20
- Add initial tests for repo subcommand by @rmccorm4 in #21
- Catch errors and improve logging in Profiler by @nv-hwoo in #23
- Bump version to 0.0.2 by @rmccorm4 in #22
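
PR #15 above prototypes running Triton as a background process. Here is a minimal sketch of that idea, combined with the dual-endpoint readiness check the 0.0.7 tests later adopted. The `tritonserver` launch flags and ports are the Triton defaults; the polling loop and timeout handling are assumptions, not the CLI's actual code.

```python
import subprocess
import time

import tritonclient.grpc
import tritonclient.http


def start_server(model_repository: str, timeout: float = 120.0) -> subprocess.Popen:
    """Launch tritonserver in the background and block until it is ready."""
    proc = subprocess.Popen(
        ["tritonserver", "--model-repository", model_repository]
    )
    http = tritonclient.http.InferenceServerClient(url="localhost:8000")
    grpc = tritonclient.grpc.InferenceServerClient(url="localhost:8001")

    deadline = time.time() + timeout
    while time.time() < deadline:
        if proc.poll() is not None:
            raise RuntimeError("tritonserver exited during startup")
        try:
            # Waiting on *both* endpoints avoids the intermittent
            # ConnectionRefusedError seen when only one was checked.
            if http.is_server_ready() and grpc.is_server_ready():
                return proc
        except Exception:
            pass  # Server not accepting connections yet; retry.
        time.sleep(1)

    proc.terminate()
    raise TimeoutError("tritonserver was not ready in time")
```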
New Contributors
- @dyastremsky made their first contribution in #2
- @matthewkotila made their first contribution in #13
- @nv-hwoo made their first contribution in #16
Full Changelog: https://github.com/triton-inference-server/triton_cli/commits/0.0.2