Releases · triton-inference-server/triton_cli
0.0.11
0.0.10
0.0.9
What's Changed
- chore: Update TRT-LLM checkpoint scripts to v0.10 and fix GitHub Actions pipeline by @KrishnanPrash in #78
- test: Shorten genai-perf test time, fail fast on server startup, and upgrade to 24.06 by @rmccorm4 in #76
- Tag 0.0.9 and update versions to 24.06 by @rmccorm4 in #79
Full Changelog: 0.0.8...0.0.9
0.0.8
What's Changed
- Disable Echo (exclude input text from output text) in TRT-LLM by default by @nnshah1 in #58
- Enable calls to GenAI-Perf for profile subcommand by @dyastremsky in #52
- Fix wrong huggingface login command in readme by @matthewkotila in #60
- Tweak test timeouts to account for testing Llama 2 and Llama 3 models by @rmccorm4 in #61
- Add GitLab CI trigger in GitHub checks by @nvda-mesharma in #64
- test: Unit Tests for `triton {metrics, config, status}` by @KrishnanPrash in #66
- chore: Upgrade dependencies for 24.05 by @KrishnanPrash in #67
- refactor: Simplify testing with ScopedTritonServer instead of pytest fixtures by @KrishnanPrash in #68
- ci: Restrict numpy to version 1.x by @KrishnanPrash in #70
- refactor: Add TritonCLIException to denote expected vs unexpected errors by @rmccorm4 in #69 (see the sketch after this list)
- build: Update CLI version references to 0.0.8 and Triton references to 24.05 by @rmccorm4 in #72
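
PR #69 above introduces `TritonCLIException` to separate expected failures from genuine bugs. Below is a minimal sketch of that pattern, not the CLI's actual implementation: the `run_cli` entrypoint is a hypothetical stand-in for the real argument parsing and dispatch. Expected failures print a clean one-line message and exit non-zero; anything else keeps its full traceback.

```python
import sys


class TritonCLIException(Exception):
    """Raised for *expected* failures (bad arguments, missing models, failed builds)."""


def run_cli(argv):
    # Hypothetical stand-in for the real argument parsing / subcommand dispatch.
    if not argv:
        raise TritonCLIException("no subcommand given, try '--help'")
    print(f"running subcommand: {argv[0]}")


def main():
    try:
        run_cli(sys.argv[1:])
    except TritonCLIException as e:
        # Expected error: print a clean message, no traceback.
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(1)
    # Any other exception propagates with a full traceback, flagging a real bug.


if __name__ == "__main__":
    main()
```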
New Contributors
- @nnshah1 made their first contribution in #58
- @nvda-mesharma made their first contribution in #64
- @KrishnanPrash made their first contribution in #66
Full Changelog: 0.0.7...0.0.8
0.0.7
What's Changed
- Sync with Triton 24.04
- Bump TRT-LLM version to 0.9.0
- Add support for `llama-2-7b-chat`, `llama-3-8b`, and `llama-3-8b-instruct` for both vLLM and TRT-LLM
- Improve error checking and error messages when building TRT-LLM engines
- Log the underlying `convert_checkpoint.py` and `trtllm-build` commands for reproducibility/visibility
- Don't call `convert_checkpoint.py` if converted weights are already found
- Call `convert_checkpoint.py` via subprocess to improve total memory usage (see the sketch after this list)
- Attempt to clean up failed TRT-LLM models in the model repository if import or engine building fails, rather than leaving the model repository in an unfinished state
- Update tests to wait for both HTTP and GRPC server endpoints to be ready before testing
  - Fixes intermittent `ConnectionRefusedError` in CI tests
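
The `convert_checkpoint.py` items above describe a reusable pattern: skip the conversion when output already exists, log the exact command for reproducibility, and run it in a child process so its peak memory is returned to the OS when it exits. A minimal sketch follows; the script path and flags are illustrative assumptions, not the CLI's actual interface.

```python
import logging
import subprocess
import sys
from pathlib import Path

logger = logging.getLogger(__name__)


def convert_weights(script: Path, model_dir: Path, output_dir: Path):
    """Run convert_checkpoint.py in a child process, skipping work already done."""
    if output_dir.exists() and any(output_dir.iterdir()):
        # Converted weights already found: skip the expensive conversion.
        logger.info("Found converted weights at %s, skipping conversion", output_dir)
        return

    # Flags below are illustrative; the real script's arguments may differ.
    cmd = [
        sys.executable, str(script),
        "--model_dir", str(model_dir),
        "--output_dir", str(output_dir),
    ]
    # Log the exact command so users can reproduce the conversion by hand.
    logger.info("Running: %s", " ".join(cmd))

    # A subprocess (rather than an in-process import) means the conversion's
    # peak memory is released as soon as the child exits.
    subprocess.run(cmd, check=True)
```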
Full Changelog: 0.0.6...0.0.7
0.0.6
What's Changed
- GPT Engine Builder by @fpetrini15 in #24
- Modularize TRT LLM Builders by @fpetrini15 in #26
- Add --backend support to bench command and default to custom image by @rmccorm4 in #27
- Fix model infer on TRT LLM with negative ints, and minor cleanup by @rmccorm4 in #28
- Fix profile subcommand to account for offline (non-streaming) metrics and V1 batching by @rmccorm4 in #29
- Minor Repo Optimizations by @fpetrini15 in #30
- Bring back IFB default to TRT LLM models and bump to 24.01 by @rmccorm4 in #31
- Bump cli version to 0.0.3, bump trtllm version to 0.7.1, and bump vllm version to 0.3.0 by @rmccorm4 in #32
- Give GPT2 quicker build/load settings for demos, fix Dockerfile version syntax, bump CLI version to 0.0.4 by @rmccorm4 in #33
- Add note on MPI dependencies by @rmccorm4 in #34
- Add CLI subcommand tests to CI by @krishung5 in #35
- Bump to v0.0.5 - CI testing working for 24.01 by @rmccorm4 in #38
- Add extra tests for CLI by @krishung5 in #36
- CLI TRT LLM v0.8.0 Refresh by @fpetrini15 in #37
- Bump to v0.0.6 - CI testing working for 24.02 by @fpetrini15 in #39
- Flatten CLI Args by @fpetrini15 in #40
- Update README commands by @rmccorm4 in #42
- Enable CLI Concurrent Testing by @fpetrini15 in #41
- README Restructuring by @fpetrini15 in #43
- Address some documentation issues by @rmccorm4 in #50
New Contributors
- @krishung5 made their first contribution in #35
Full Changelog: 0.0.2...0.0.6
0.0.2
What's Changed
- Setup repo and package structure by @rmccorm4 in #1
- Add pre-commit hook to upgrade Python syntax by @dyastremsky in #2
- Add initial prototype by @rmccorm4 in #4
- Add README and update default image by @rmccorm4 in #5
- Add rough NGC CLI wrapper by @rmccorm4 in #6
- Basic MPI support by @fpetrini15 in #8
- Populate model repo with TRTLLM templates by @oandreeva-nv in #7
- Minor TRT-LLM tweaks by @rmccorm4 in #11
- Misc fixes by @rmccorm4 in #14
- Add profile subcommand to run perf analyzer by @matthewkotila in #13
- POC: Background Server by @fpetrini15 in #15 (see the sketch after this list)
- Fix high concurrency generation throughput calculation by @nv-hwoo in #16
- Add demo features for benchmarking LLMs by @rmccorm4 in #17
- Add copyrights and minor cleanup by @rmccorm4 in #19
- Automatic TRT LLM Detail Parsing by @fpetrini15 in #18
- Fix vLLM profiler bug, add fallback logic to server start, cleanup by @rmccorm4 in #20
- Add initial tests for repo subcommand by @rmccorm4 in #21
- Catch errors and improve logging in Profiler by @nv-hwoo in #23
- Bump version to 0.0.2 by @rmccorm4 in #22
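
PR #15 above prototypes running Triton as a background process. Here is a minimal sketch of that idea, combined with the dual-endpoint readiness check the 0.0.7 tests later adopted. The `tritonserver` launch flags and ports are the Triton defaults; the polling loop and timeout handling are assumptions, not the CLI's actual code.

```python
import subprocess
import time

import tritonclient.grpc
import tritonclient.http


def start_server(model_repository: str, timeout: float = 120.0) -> subprocess.Popen:
    """Launch tritonserver in the background and block until it is ready."""
    proc = subprocess.Popen(
        ["tritonserver", "--model-repository", model_repository]
    )
    http = tritonclient.http.InferenceServerClient(url="localhost:8000")
    grpc = tritonclient.grpc.InferenceServerClient(url="localhost:8001")

    deadline = time.time() + timeout
    while time.time() < deadline:
        if proc.poll() is not None:
            raise RuntimeError("tritonserver exited during startup")
        try:
            # Waiting on *both* endpoints avoids the intermittent
            # ConnectionRefusedError seen when only one was checked.
            if http.is_server_ready() and grpc.is_server_ready():
                return proc
        except Exception:
            pass  # Server not accepting connections yet; retry.
        time.sleep(1)

    proc.terminate()
    raise TimeoutError("tritonserver was not ready in time")
```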
New Contributors
- @dyastremsky made their first contribution in #2
- @matthewkotila made their first contribution in #13
- @nv-hwoo made their first contribution in #16
Full Changelog: https://github.com/triton-inference-server/triton_cli/commits/0.0.2