feat: Enable deferred unregistering of shared memory regions after inference #7743

pskiran1 · 2024-10-25T15:27:13Z

What does the PR do?

Currently, an error is returned if an attempt is made to unregister a shared memory region that is in use during inference (introduced in #7567). This PR introduces the following enhancement:

When the unregister API is called for a region still in use, the API will now mark the region for pending unregistration. The server will automatically remove the region once all in-progress inferences utilizing it have completed.
In the interim, users can query the server to check if the region is still retained, allowing for safe cleanup once the server has fully released the region.
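
To make the intended behavior concrete, here is a minimal sketch of the deferred-unregister bookkeeping described above. It is a toy model, not the code in this PR: `ToyShmManager`, `BeginInference`, `EndInference` and `IsRetained` are hypothetical names, and the real manager tracks far more state (mapped address, memory type, device ID).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Toy model of deferred unregistration. All names are hypothetical and do
// not mirror the real SharedMemoryManager interface.
class ToyShmManager {
 public:
  // Registers a region; fails if the name is still tracked.
  bool Register(const std::string& name)
  {
    std::lock_guard<std::mutex> lock(mu_);
    return regions_.emplace(name, Region{}).second;
  }

  // An inference starts using the region.
  bool BeginInference(const std::string& name)
  {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) {
      return false;
    }
    ++it->second.in_flight;
    return true;
  }

  // An inference finished. If removal was deferred and this was the last
  // user, the region is removed now.
  void EndInference(const std::string& name)
  {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end() || it->second.in_flight == 0) {
      return;
    }
    --it->second.in_flight;
    if (it->second.in_flight == 0 && it->second.awaiting_unregister) {
      regions_.erase(it);
    }
  }

  // Removes an idle region at once; marks a busy region for later removal.
  // Returns true only if the region was removed immediately.
  bool Unregister(const std::string& name)
  {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) {
      return false;
    }
    if (it->second.in_flight == 0) {
      regions_.erase(it);
      return true;
    }
    it->second.awaiting_unregister = true;
    return false;
  }

  // Status query: is the manager still holding the region?
  bool IsRetained(const std::string& name) const
  {
    std::lock_guard<std::mutex> lock(mu_);
    return regions_.count(name) != 0;
  }

 private:
  struct Region {
    std::size_t in_flight = 0;
    bool awaiting_unregister = false;
  };
  mutable std::mutex mu_;
  std::map<std::string, Region> regions_;
};

// Walks through: register, start inference, unregister (deferred),
// status check, finish inference, status check.
bool RunDeferredUnregisterDemo()
{
  ToyShmManager mgr;
  bool ok = mgr.Register("input0");
  ok = ok && mgr.BeginInference("input0");
  ok = ok && !mgr.Unregister("input0");  // busy, so removal is deferred
  ok = ok && mgr.IsRetained("input0");   // still visible to status queries
  mgr.EndInference("input0");            // last user done, region removed
  ok = ok && !mgr.IsRetained("input0");
  return ok;
}
```

A client would poll the status query (here `IsRetained`) after calling unregister and free its side of the region only once the query reports that the region is gone.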

Test plan:

CI Pipeline ID: 19695563

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Closes #7688 (Possible bug in reference counting with shared memory regions)

nnshah1 · 2024-11-08T14:46:46Z

src/shared_memory_manager.h

@@ -72,6 +72,7 @@ class SharedMemoryManager {
    void* mapped_addr_;
    TRITONSERVER_MemoryType kind_;
    int64_t device_id_;
+    bool awaiting_unregister_;


question: could this be done with smart pointer reference counting instead of an explicit flag?

@nnshah1, yes, I believe we can implement this with weak_ptr.
@GuanLuo has previously provided input on an alternative approach using weak_ptr and also discussed the current method.
Here’s a summary of the weak_ptr approach:
Currently, the shared memory (shm) manager uses a map of shared_ptr for shm_info. When the unregister API is called on a region still in use, we can replace the manager's shared_ptr with a weak_ptr, which drops the manager's own reference. We would track these in a separate weak_ptr map until the region is fully unregistered, which prevents new regions from being registered under the same name and still allows status checks.

When the last reference to the shm_info shared_ptr is released (after inference completes), the shared_ptr's deleter will run. At that point, we can invoke the unregister logic and remove the associated weak_ptr from the second map.

To support this approach, we would likely need to modify most of the shm manager methods (register, status, unregister, unregisterHelper, and so on) to work with the weak_ptr map.
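
For reference, the weak_ptr idea summarized above could look roughly like the sketch below. This is only an illustration under stated assumptions: `WeakPtrShmManager`, `Region`, `Acquire` and `IsTracked` are hypothetical names, the code is single-threaded (no locking), the manager must outlive every region it hands out, and it only hands out regions from the active map, leaving open whether a pending region should be reusable.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for shm_info.
struct Region {
  std::string name;
};

// Single-threaded sketch; a real manager would need locking and must not
// run the deleter while holding a lock that the deleter also takes.
class WeakPtrShmManager {
 public:
  // Fails while the name is active or still pending unregistration.
  bool Register(const std::string& name)
  {
    if (active_.count(name) != 0 || pending_.count(name) != 0) {
      return false;
    }
    // The custom deleter runs when the last shared_ptr goes away. It stands
    // in for the real unregister logic (unmapping and so on).
    std::shared_ptr<Region> region(new Region{name}, [this](Region* r) {
      pending_.erase(r->name);
      ++cleanup_count_;
      delete r;
    });
    active_.emplace(name, std::move(region));
    return true;
  }

  // An inference takes a strong reference for as long as it runs.
  std::shared_ptr<Region> Acquire(const std::string& name)
  {
    auto it = active_.find(name);
    return it == active_.end() ? nullptr : it->second;
  }

  void Unregister(const std::string& name)
  {
    auto it = active_.find(name);
    if (it == active_.end()) {
      return;
    }
    // Keep a weak_ptr so the name stays reserved and status checks work.
    pending_[name] = it->second;
    // Drop the manager's strong reference. If no inference holds one, the
    // deleter runs right here and erases the pending entry again.
    active_.erase(it);
  }

  bool IsTracked(const std::string& name) const
  {
    return active_.count(name) != 0 || pending_.count(name) != 0;
  }

  int cleanup_count() const { return cleanup_count_; }

 private:
  // Declared before active_ so they are still alive when active_ is
  // destroyed and its deleters run.
  int cleanup_count_ = 0;
  std::map<std::string, std::weak_ptr<Region>> pending_;
  std::map<std::string, std::shared_ptr<Region>> active_;
};

bool RunWeakPtrDemo()
{
  WeakPtrShmManager mgr;
  bool ok = mgr.Register("r0");
  std::shared_ptr<Region> in_flight = mgr.Acquire("r0");
  mgr.Unregister("r0");
  ok = ok && mgr.IsTracked("r0");   // pending: still visible
  ok = ok && !mgr.Register("r0");   // name is reserved while pending
  in_flight.reset();                // last reference gone, deleter runs
  ok = ok && !mgr.IsTracked("r0");
  ok = ok && mgr.cleanup_count() == 1;
  return ok;
}
```

The sketch shows why most manager methods would need to change: register and status both have to consult the second map, and the actual cleanup moves out of unregister and into the deleter.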

For simplicity, I proceeded with the current approach. Please let me know if we’d like to explore the weak_ptr approach, and I can work on it. Happy to connect and discuss if any optimizations are needed.
Thank you.

Thanks for the detailed response. It just seemed to have a lot in common with reference counting and shared pointers, so I was hoping it would be something quick. But if this is the more practical option in the short term, no issue.

@GuanLuo - would we want to pre-emptively put a refactor story in? (Only if you think it's a near-term thing that needs to be addressed.)

I raised the point early in this development that using a smart pointer is a cleaner way to address this problem, so you can add a refactor story, as it is just the right thing to do. However, I believe it is just going to be another form of TODO at this point, and what is in this PR will remain unchanged for a long time.

GuanLuo · 2024-11-13T20:21:25Z

qa/L0_cuda_shared_memory/cuda_shared_memory_test.py

+            second_client.unregister_cuda_shared_memory()
+
+        # Number of shared memory regions should be the same as the inference is not completed yet
+        self._test_shm_found(shm_names)


So the region will still be visible between unregister and completion of the inference. What would happen if another inference request is sent during that time, trying to reuse the unregistered region? Can you add a test for this case to demonstrate the behavior?

Yes, the shm region will remain accessible between unregister and the completion of the inference. When another inference request reuses the region during that window, the region's lifetime is extended and it is unregistered only after the second request completes, because we check the reference count before unregistering. Added a test case to verify this.
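
The timeline in this reply can be modeled with a small sketch. It is a simplified illustration with hypothetical names (`RegionState`, `VisibilityTimeline`), not the test added to the PR: a region pending unregistration stays visible until the last request using it completes, including a request that started after the unregister call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-region bookkeeping: a use count plus a pending flag.
struct RegionState {
  int in_flight = 0;
  bool awaiting_unregister = false;
  bool visible = true;

  // A request starts using the region (allowed while it is still visible).
  void Begin()
  {
    if (visible) {
      ++in_flight;
    }
  }

  // A request completes; the deferred removal happens with the last one.
  void End()
  {
    if (in_flight == 0) {
      return;
    }
    --in_flight;
    if (in_flight == 0 && awaiting_unregister) {
      visible = false;
    }
  }

  // Remove now if idle, otherwise defer until the use count reaches zero.
  void Unregister()
  {
    if (in_flight == 0) {
      visible = false;
    } else {
      awaiting_unregister = true;
    }
  }
};

// Records whether the region is visible at three points in the scenario.
std::vector<bool> VisibilityTimeline()
{
  RegionState region;
  std::vector<bool> seen;
  region.Begin();                  // request A starts
  region.Unregister();             // unregister while A is in flight
  seen.push_back(region.visible);  // after unregister
  region.Begin();                  // request B reuses the pending region
  region.End();                    // A completes, B is still running
  seen.push_back(region.visible);  // after A completes
  region.End();                    // B completes, region is removed
  seen.push_back(region.visible);  // after B completes
  return seen;
}
```

In this model the region is visible at the first two checkpoints and gone at the third, which matches the behavior described in the reply.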

pskiran1 added 6 commits October 24, 2024 18:34

Update

45f2d74

Update CI tests

34911b2

Remove debug changes

8a4676c

Remove debug changes

94b53c3

Update copyright

235d8f8

Fix pre-commit

dc2dcfc

pskiran1 added the PR: feat A new feature label Oct 28, 2024

pskiran1 added 3 commits October 28, 2024 14:18

Update variable name

febb029

Update

c0a22bd

Merge branch 'main' of https://github.com/triton-inference-server/server

f58ea61

pskiran1 changed the title ~~feat: Enable deferred unregistration of shared memory regions post-inference~~ feat: Enable deferred unregistering of shared memory regions post-inference Oct 28, 2024

pskiran1 changed the title ~~feat: Enable deferred unregistering of shared memory regions post-inference~~ feat: Enable deferred unregistering of shared memory regions after inference Oct 28, 2024

pskiran1 requested review from tanmayv25, rmccorm4 and GuanLuo October 28, 2024 10:27

pskiran1 marked this pull request as ready for review October 28, 2024 10:27

GuanLuo reviewed Oct 28, 2024

qa/L0_cuda_shared_memory/cuda_shared_memory_test.py (outdated)

qa/L0_cuda_shared_memory/cuda_shared_memory_test.py (outdated)

src/http_server.h (outdated)

pskiran1 added 4 commits November 4, 2024 13:35

Update

0541f33

Merge branch 'main' of https://github.com/triton-inference-server/server

8b3629b

Update

43c52f3

Update cuda shm test

4cbbb7b

pskiran1 requested a review from GuanLuo November 5, 2024 10:24

Merge branch 'main' of https://github.com/triton-inference-server/server

fa2b072

nnshah1 reviewed Nov 8, 2024

Merge branch 'main' of https://github.com/triton-inference-server/server

621c1b5

GuanLuo reviewed Nov 13, 2024

pskiran1 added 3 commits November 14, 2024 14:56

Merge branch 'main' of https://github.com/triton-inference-server/server

5f701a5

Merge branch 'main' of https://github.com/triton-inference-server/server

84c2c59

Update CI

ad9aca0

pskiran1 requested a review from GuanLuo November 18, 2024 13:03

pskiran1 added 3 commits November 18, 2024 18:42

Add TODO comment

0438755

Merge branch 'main' of https://github.com/triton-inference-server/server

99f0184

Merge branch 'main' of https://github.com/triton-inference-server/server

ec2e6a6

Review comments, in order: nnshah1 (Nov 8, 2024), pskiran1 (Nov 8, 2024), nnshah1 (Nov 8, 2024), GuanLuo (Nov 13, 2024), GuanLuo (Nov 13, 2024), pskiran1 (Nov 18, 2024).