Fix linter for Inference branch #4081
Conversation
Problem: To support the full Gateway API Inference Extension, we need to be able to extract the model name from the client request body in certain situations. Solution: Add a basic NJS module to extract the model name. This module will be enhanced (I've added notes) to be included in the full solution. On its own, it is not yet used.
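The extraction itself can be as small as a JSON parse plus a field lookup. A minimal sketch, assuming an OpenAI-style request body with a top-level "model" field; the function name is illustrative, not the module's actual API:

```javascript
// Hypothetical sketch (not the actual NJS module): pull the model name out of
// an OpenAI-style request body, where it lives in a top-level "model" field.
// Returns an empty string for invalid JSON or bodies with no model name.
function extractModelName(body) {
  try {
    var parsed = JSON.parse(body);
    return typeof parsed.model === 'string' ? parsed.model : '';
  } catch (e) {
    return '';
  }
}
```

Inside NJS the body would come from the request object rather than a plain string argument, but the parsing logic is the same.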
This commit adds support for the control plane to watch InferencePools. A feature flag has been added to enable/disable processing these resources. By default, it is disabled. When an HTTPRoute references an InferencePool, we will create a headless Service associated with that InferencePool, and reference it internally in the graph config for that Route. This allows us to use all of our existing logic to get the endpoints and build the proper nginx config for those endpoints. In a future commit, the nginx config will be updated to handle the proper load balancing for the AI workloads, but for now we just use our default methods by proxy_passing to the upstream.
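A headless Service (clusterIP: None) gives the control plane endpoints to resolve without introducing a kube-proxy virtual IP. A minimal sketch of the shape of such a Service — all names here are illustrative; the actual name, selector, and port are derived from the InferencePool:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-pool-svc   # illustrative; derived from the InferencePool name
spec:
  clusterIP: None            # headless: endpoints only, no virtual IP
  selector:
    app: vllm-llama3         # illustrative; mirrors the InferencePool's selector
  ports:
    - port: 8000             # illustrative; the pool's target port
```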
Problem: In order for NGINX to get the endpoint of the AI workload from the EndpointPicker, it needs to send a gRPC request using the proper protobuf protocol. Solution: When the inference extension feature is enabled, a simple Go server is injected as an additional container. It listens for requests from our (upcoming) NJS module and forwards them to the configured EPP, returning the chosen endpoint in a response header.
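The HTTP-facing side of the shim can be sketched as below. This is a hypothetical illustration, not the actual module: the gRPC/protobuf exchange with the EPP is stubbed out, and the header and endpoint values are placeholders. The demo main exercises the handler against an in-process test server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// pickEndpoint stands in for the gRPC exchange with the configured
// EndpointPicker; the real shim speaks the protobuf protocol instead.
func pickEndpoint(body []byte) string {
	// ... gRPC request to the EPP would go here ...
	return "10.0.0.5:8000" // placeholder endpoint
}

// handler receives the client request body from NJS and returns the
// chosen endpoint in a response header (header name illustrative).
func handler(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	w.Header().Set("X-Endpoint", pickEndpoint(body))
	w.WriteHeader(http.StatusOK)
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(handler))
	defer srv.Close()
	resp, _ := http.Post(srv.URL, "application/json", strings.NewReader(`{"model":"llama3"}`))
	fmt.Println(resp.Header.Get("X-Endpoint")) // prints 10.0.0.5:8000
}
```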
Problem: We need to connect NGINX to the Golang shim that talks to the EndpointPicker, and then pass client traffic to the proper inference workload. Solution: Write an NJS module that queries the local Go server for the AI endpoint to route traffic to, then redirects the original client request to an internal location that proxies the traffic to the chosen endpoint. The location building gets a bit complicated, especially when using both HTTP matching conditions and inference workloads; it requires two layers of internal redirects. I added lots of comments to clarify how we build these locations to perform all the routing steps.
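The basic shape can be sketched roughly as follows. This is a simplified illustration with a single redirect (with HTTP matching conditions an additional match layer sits in front), and every location name, handler name, and variable here is illustrative rather than the generated config:

```nginx
# Layer 1: externally reachable location; NJS asks the local Go shim
# for the endpoint, then performs an internal redirect.
location /v1/completions {
    js_content inference.route;            # illustrative NJS handler name
}

# Layer 2: internal-only location that proxies to the endpoint chosen above.
location /_inference_internal {
    internal;
    proxy_pass http://$inference_endpoint; # variable set by the NJS handler
}
```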
Update the inference extension design doc to specify the different statuses that need to be set on InferencePools so their state can be understood.
…4006) Update the gateway inference extension proposal to document the inability to provide a secure TLS connection to the EPP.
Add status to InferencePools
Problem: Users want to see the current status of their InferencePools.
Solution: Add status for InferencePools.
Proposed changes Problem: Want to collect number of referenced InferencePools in cluster. Solution: Collect the count of referenced InferencePools. Testing: Unit tests and manually verified collection via debug logs.
uses: helm/chart-testing-action@0d28d3144d3a25ea2cc349d6e59901c4ff469b3b # v2.7.0
with:
  version: 3.14.0 # renovate: datasource=github-tags depName=helm/chart-testing
  yamale_version: "6.0.0" # renovate: datasource=github-tags depName=helm/chart-testing
Regarding this change, I kept the version so we have both. Also, adding another comment would be good:
with:
  version: 3.14.0 # renovate: datasource=github-tags depName=helm/chart-testing
  # v6.0.0 resolved the compatibility issue with Python > 3.13. May be removed after the action itself is updated
  yamale_version: "6.0.0"
Just a testing branch; closing it.
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

@@              Coverage Diff                @@
##    feat/inference-extension    #4081    +/-   ##
============================================================
+ Coverage    86.69%    86.74%    +0.04%
============================================================
  Files          128       131        +3
  Lines        16758     17649      +891
  Branches        62        74       +12
============================================================
+ Hits         14529     15310      +781
- Misses        2044      2145      +101
- Partials       185       194        +9
Proposed changes
Write a clear and concise description that helps reviewers understand the purpose and impact of your changes. Use the
following format:
Problem: Give a brief overview of the problem or feature being addressed.
Solution: Explain the approach you took to implement the solution, highlighting any significant design decisions or
considerations.
Testing: Describe any testing that you did.
Please focus on (optional): If you have any specific areas where you would like reviewers to focus their attention or provide
specific feedback, add them here.
Closes #ISSUE
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.