-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Enable deferred unregistering of shared memory regions after inference #7743
base: main
Are you sure you want to change the base?
Conversation
@@ -72,6 +72,7 @@ class SharedMemoryManager { | |||
void* mapped_addr_; | |||
TRITONSERVER_MemoryType kind_; | |||
int64_t device_id_; | |||
bool awaiting_unregister_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: could this be done with smart pointer referencing counting instead of an explicit flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnshah1, yes, I believe we can implement this with weak_ptr
.
@GuanLuo has previously provided input on an alternative approach using weak_ptr
and also discussed the current method.
Here’s a summary of the weak_ptr
approach:
Currently, the shared memory (shm) manager uses a map of shared_ptr for shm_info. When the unregister API is called on a region still in use, we can convert it to a weak_ptr
to decrease the reference count. We would track these in a separate weak_ptr map
until fully unregistered, preventing new regions with the same name and allowing status check.
When the last reference to the shm_info
shared_ptr
is released (after inference completes), the shared_ptr
destructor will trigger. At that point, we can invoke the unregister logic and remove the associated weak_ptr
from the second map.
To support this approach, we would likely need to modify most of shm manager methods, such as register, status, unregister, and unregisterHelper, etc. to work with the weak_ptr map
.
For simplicity, I proceeded with the current approach. Please let me know if we’d like to explore the weak_ptr
approach, and I can work on it. Happy to connect and discuss if any optimizations are needed.
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed response - it was just that it seemed like a lot of similarity with reference counting and shared pointers - so was hoping it was something quick. But if this is the more practical short term - no issue,
@GuanLuo - would we want to pre-emptively put a refactor story in? (only if you think its a near term thing that needs to be addressed) -
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have raised the point that using smart pointer is a cleaner way to address this problem early on for this development, so you can have a refactor story as it is just a right thing to do. However, I believe that it is just going to be another form of TODOs at this point and what is in this PR will remain unchanged for a long time.
second_client.unregister_cuda_shared_memory() | ||
|
||
# Number of shared memory regions should be the same as the inference is not completed yet | ||
self._test_shm_found(shm_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the region will still be visible between unregister and completion of the inference, what would happen if another inference request is sent during that time, trying to reuse the unregistered region? Can you add a test for this case to demonstrate the behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the shm will remain accessible between unregister and the completion of the inference. When we run another inference request that reuses the unregistered region, the shm region will be extended and unregistered after the second request, as we check the reference count before unregister. Added a test case to verify this.
What does the PR do?
Currently, an error is returned if an attempt is made to unregister a shared memory region that is in use during inference (introduced in #7567). This PR introduces the following enhancement:
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type
box here and add the label to the github PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)