-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Adding RestrictedFeatures Support to the Python Frontend Bindings #7775
base: main
Are you sure you want to change the base?
Conversation
* Wait for HTTP connection when shutting down * Add test for shutdown with live HTTP connection * Use TRITONSERVER_ServerSetExitTimeout() API * Variable name update * Stop HTTP service immediately after all connections close * Remove unused include * Remove accept new connection check * Adjust test for 'Remove accept new connection check' and add testing for countdown restart * Fix non exit timeout supported endpoints * Improve existing shutdown test reliability on extra http shutdown delay * Fix gap between decided to close socket and actually close socket * Start rejecting new connections once shutdown polling start * Group checking logic
* Eanable autodocs for python client library * Fixing the format and spelling
* Deprecate dynamic log file * Update error message
* Add async execute decoupled test * Add decoupled bls async exec test * Enhance test with different durations for concurrent executes
Add trace_mode and trace_config to getTraceSettingsAPI --------- Co-authored-by: Ryan McCormick <[email protected]>
…nds (#7083) Validate that memory offset and byte size requested is not out of bounds of registered memory. Previously in #6914 we checked out of bounds offset for shared memory requests. This PR also adds more testing to verify the block of memory is in fact in bounds. Client change: triton-inference-server/client#565
* Fix state complete_ race condition * Add delay and error checking to StreamInferResponseComplete * Add test for gRPC decoupled infer complete flag
* Re-enable PA trace testing but remove setting trace file * Address feedback * Address feedback * Address feedback * Fix bug * Address feedback
Fixes windows build issues caused by: #7083
* Clarify instance group documentation for ensemble * Review comment
* Add metrics model namespacing label test * Add test for namespace off
* Fix TensorRT-LLM (#7142) * TRT-LLM build * Update versions * Remove statment, as unused * Remove cache * add cmake option to set CXX11 ABI * Mchornyi krish 24.04 (#7149) * Enable TensorRT-LLM build outside of CMake * TensorRT-LLM requires lower version of cuDNN * Format --------- Co-authored-by: krishung5 <[email protected]> * Update README and versions for 2.45.0 / 24.04 (#7096) * Update README and versions for 2.45.0 / 24.04 * Update ONNX Runtime version - 1.17.3 --------- Co-authored-by: krishung5 <[email protected]>
* Validate CUDA SHM region registration size * Add test * Refactor * Fix error message * Address comment * Address comment * Address comment * Make detailed error internal. Only pass the general error message to the client * Replace 64byte with DEFAULT_SHM_BYTE_SIZE * Fix L0_grpc * Update comments * Update comments
Fix python client Shm Leak
* Add test for sequence state after cancellation * Regroup infer calls * Remove unused variable
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Meenakshi Sharma <[email protected]> Co-authored-by: Kyle McGill <[email protected]>
qa/L0_python_api/test.sh
Outdated
# TODO: [DLIS-7215] Run tritonclient.grpc as separate process | ||
# Currently, when we run tritonclient.grpc with tritonserver in the same process, | ||
# When a fork() is called by tritonserver on model load, lack of support in cygrpc | ||
# causes the python process to abort/crash without being able to be caught by pytest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate or share references? I don't understand how the client is related to server/model fork, and what "lack of support in cygrpc" means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment providing more context into how cygrpc
relates to directly imported python packages. Added link for further reading and reference.
# it will non-deterministically abort/crash without being able to be caught by pytest. | ||
# This is because fork() is called by tritonserver on model load, | ||
# which attempts to fork the imported libraries and their internal states, | ||
# and cygrpc (dependency of tritonclient.grpc) does not officially support fork(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, if the application only instantiate gRPC Python objects after calling fork(), then fork() will work normally, since there is no C extension binding at this point.
If the claim is that we're calling fork()
during model load, aren't we instantiating grpc (client) after the fork()
if server/service have already started up?
Can you ever reproduce this by only running a single test case repeatedly? If not, it's possible it's coming from the threshold between test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From, my understanding it is an internal state set upon import:
import tritonclient.grpc as grpcclient
while True:
server = utils.startup_server()
sleep(1)
utils.teardown_server(server)
# This is because fork() is called by tritonserver on model load, | ||
# which attempts to fork the imported libraries and their internal states, | ||
# and cygrpc (dependency of tritonclient.grpc) does not officially support fork(). | ||
# Reference: https://github.com/grpc/grpc/blob/master/doc/fork_support.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you just set env var as mentioned here so that the tests work with fork? https://github.com/grpc/grpc/blob/master/doc/fork_support.md#current-status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attempted to set and run test suite locally, but the test cases were still failing.
rf.create_feature_group( | ||
key=42, value="Secret to the Universe", features=[Feature.HEALTH] | ||
) | ||
with pytest.raises(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if exception enclose informative message
src/python/examples/example_model_repository/identity/config.pbtxt
Outdated
Show resolved
Hide resolved
…btxt Co-authored-by: GuanLuo <[email protected]>
d2f0f9b
to
793d826
Compare
What does the PR do?
This PR provides Restricted Features support to the python frontend bindings, a.k.a
tritonfrontend
.This PR introduces 3 new classes to the
tritonfrontend
package:Feature
: 1-to-1 mapping of RestrictedCategory EnumFeatureGroup
: Pydantic@dataclass
that stores (key, value, features) information while performing type validation.RestrictedFeatures
: Store collections ofFeatureGroup
instances, apply an additional layer of validation to check for collisions (ensuring that each feature belongs to only one group), and serialize the data into a JSON string.With these changes, we can pass a
RestrictedFeatures
to theKServeHttp
andKServeGrpc
frontends, which looks along the lines of:A
RestrictedFeatures
instance can be created with:Where should the reviewer start?
Take a look at
api/_restricted_features.py
andtritonfrontend.h
Test plan:
In
L0_python_api/test_kserve.py
added:test_correct_rf_parameters()
andtest_wrong_rf_parameters()
: Tests for valid/invalid arguments passed/selected forFeature
,FeatureGroup
andRestrictedFeature
To test if restricted features are working correctly with the frontends:
test_restricted_features()
: Restricts inference. Then, sends an inference request with a correct and incorrect header to verify proper functionality.CI Pipeline ID: 20189649
Background
For more information about restricted features support in Triton, please take a look at this: [Link]
Additonal Changes Made in this PR
tritonfrontend.h
andtritonfrontend_pybind.cc
.@handle_triton_error
for non-void functions.send_and_test_inference_identity()
so that we can re-use utility function for restricted features testing.KServeGrpc
andtritonclient.grpc
because of lack offork()
support provided by pythoncygrpc
. More information in DLIS-7215.