Add TRT-LLM Gen. AI Autoscaling & Load Balancing Guide #95

Merged
merged 5 commits into from
Jun 12, 2024

Conversation

whoisj
Contributor

@whoisj whoisj commented May 28, 2024

This change adds a guide for deploying TensorRT-LLM Gen. AI models with autoscaling and load balancing.

Includes:

  • Guidance
  • Helm chart with multiple example model values files
  • YAML files necessary for setting up a Kubernetes cluster
  • Build files for required container images
  • Grafana dashboard configuration JSON file

@whoisj whoisj added documentation Improvements or additions to documentation enhancement New feature or request labels May 28, 2024
@whoisj whoisj requested review from nnshah1 and nealvaidya May 28, 2024 20:04
@whoisj whoisj force-pushed the jwyman/trtllm-aslb branch 3 times, most recently from 32213c7 to 8623def Compare May 28, 2024 20:49
@whoisj whoisj force-pushed the jwyman/trtllm-aslb branch 3 times, most recently from db38d43 to beddaf9 Compare May 28, 2024 21:11
@whoisj whoisj force-pushed the jwyman/trtllm-aslb branch 9 times, most recently from 103087d to e34523d Compare May 29, 2024 21:05
@whoisj whoisj force-pushed the jwyman/trtllm-aslb branch 2 times, most recently from 5d516f3 to 01c0842 Compare June 10, 2024 17:47
nealvaidya
nealvaidya previously approved these changes Jun 10, 2024
This change includes a number of improvements suggested by @nealvaidya.
nealvaidya
nealvaidya previously approved these changes Jun 10, 2024
@whoisj whoisj requested review from mc-nv and removed request for mc-nv June 10, 2024 21:08
nnshah1
nnshah1 previously approved these changes Jun 12, 2024
Contributor

@nnshah1 nnshah1 left a comment

Some sentence-level suggestions - could use some additional eyes to catch any other grammar / syntax errors - but overall looks great! We can continue refining in future iterations.

@harryskim - would be good to get your quick review.

@nnshah1 nnshah1 requested a review from harryskim June 12, 2024 08:09
@whoisj whoisj dismissed stale reviews from nnshah1 and nealvaidya via ae4a292 June 12, 2024 16:08
This change includes a number of improvements suggested by @nnshah1.

Co-authored-by: Neelay Shah <[email protected]>

@harryskim harryskim left a comment

Approving the applied changes requested by Neelay.

@whoisj whoisj merged commit d459ddd into triton-inference-server:main Jun 12, 2024
3 checks passed
Labels
documentation Improvements or additions to documentation enhancement New feature or request
5 participants