Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enable deferred unregistering of shared memory regions after inference #7743

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

pskiran1
Copy link
Member

@pskiran1 pskiran1 commented Oct 25, 2024

What does the PR do?

Currently, an error is returned if an attempt is made to unregister a shared memory region that is in use during inference (introduced in #7567). This PR introduces the following enhancement:

  • When the unregister API is called for a region still in use, the API will now mark the region for pending unregistration. The server will automatically remove the region once all in-progress inferences utilizing it have completed.
  • In the interim, users can query the server to check if the region is still retained, allowing for safe cleanup once the server has fully released the region.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: 19695563

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

@pskiran1 pskiran1 added the PR: feat A new feature label Oct 28, 2024
@pskiran1 pskiran1 changed the title feat: Enable deferred unregistration of shared memory regions post-inference feat: Enable deferred unregistering of shared memory regions post-inference Oct 28, 2024
@pskiran1 pskiran1 changed the title feat: Enable deferred unregistering of shared memory regions post-inference feat: Enable deferred unregistering of shared memory regions after inference Oct 28, 2024
@pskiran1 pskiran1 marked this pull request as ready for review October 28, 2024 10:27
qa/L0_cuda_shared_memory/cuda_shared_memory_test.py Outdated Show resolved Hide resolved
qa/L0_cuda_shared_memory/cuda_shared_memory_test.py Outdated Show resolved Hide resolved
src/http_server.h Outdated Show resolved Hide resolved
@pskiran1 pskiran1 requested a review from GuanLuo November 5, 2024 10:24
@@ -72,6 +72,7 @@ class SharedMemoryManager {
void* mapped_addr_;
TRITONSERVER_MemoryType kind_;
int64_t device_id_;
bool awaiting_unregister_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: could this be done with smart pointer referencing counting instead of an explicit flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnshah1, yes, I believe we can implement this with weak_ptr.
@GuanLuo has previously provided input on an alternative approach using weak_ptr and also discussed the current method.
Here’s a summary of the weak_ptr approach:
Currently, the shared memory (shm) manager uses a map of shared_ptr for shm_info. When the unregister API is called on a region still in use, we can convert it to a weak_ptr to decrease the reference count. We would track these in a separate weak_ptr map until fully unregistered, preventing new regions with the same name and allowing status check.

When the last reference to the shm_info shared_ptr is released (after inference completes), the shared_ptr destructor will trigger. At that point, we can invoke the unregister logic and remove the associated weak_ptr from the second map.

To support this approach, we would likely need to modify most of shm manager methods, such as register, status, unregister, and unregisterHelper, etc. to work with the weak_ptr map.

For simplicity, I proceeded with the current approach. Please let me know if we’d like to explore the weak_ptr approach, and I can work on it. Happy to connect and discuss if any optimizations are needed.
Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed response - it was just that it seemed like a lot of similarity with reference counting and shared pointers - so was hoping it was something quick. But if this is the more practical short term - no issue,

@GuanLuo - would we want to pre-emptively put a refactor story in? (only if you think its a near term thing that needs to be addressed) -

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have raised the point that using smart pointer is a cleaner way to address this problem early on for this development, so you can have a refactor story as it is just a right thing to do. However, I believe that it is just going to be another form of TODOs at this point and what is in this PR will remain unchanged for a long time.

qa/L0_cuda_shared_memory/cuda_shared_memory_test.py Outdated Show resolved Hide resolved
second_client.unregister_cuda_shared_memory()

# Number of shared memory regions should be the same as the inference is not completed yet
self._test_shm_found(shm_names)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the region will still be visible between unregister and completion of the inference, what would happen if another inference request is sent during that time, trying to reuse the unregistered region? Can you add a test for this case to demonstrate the behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the shm will remain accessible between unregister and the completion of the inference. When we run another inference request that reuses the unregistered region, the shm region will be extended and unregistered after the second request, as we check the reference count before unregister. Added a test case to verify this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: feat A new feature
Development

Successfully merging this pull request may close these issues.

3 participants