Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate index state sharing into mem management #1545

Conversation

jmazanec15
Copy link
Member

Description

Last PR in series to fix #1507

Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state.

Additionally, there was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything.

Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types.

Note - the addition of these integration tests will increase the CI time. Its a tradeoff that can be discussed. In future, we can take some items to reduce this time (such as #1540 to decouple CI for lib tests and plugin tests)

Once merged, I will create a PR to merge the feature branch into main so it can be released in 2.13.

Issues Resolved

#1507

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Mar 15, 2024

Codecov Report

Attention: Patch coverage is 95.83333% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 85.32%. Comparing base (8fdeb55) to head (05c4530).

Files Patch % Lines
...arch/knn/index/memory/SharedIndexStateManager.java 90.62% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@                         Coverage Diff                          @@
##             feature/remove-redunant-allocs    #1545      +/-   ##
====================================================================
+ Coverage                             85.14%   85.32%   +0.18%     
- Complexity                             1289     1302      +13     
====================================================================
  Files                                   168      169       +1     
  Lines                                  5249     5308      +59     
  Branches                                495      499       +4     
====================================================================
+ Hits                                   4469     4529      +60     
+ Misses                                  573      571       -2     
- Partials                                207      208       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

*
* @param sharedIndexState to return to the system.
*/
public void release(SharedIndexState sharedIndexState) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we wrap this whole function in try and finally to ensure that if any exceptions comes from any of the method we are releasing the write lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that makes sense

Comment on lines 104 to 115
if (refCount <= 0) {
log.info("Evicting entry from shared index state cache for key {}", sharedIndexState.getModelId());
sharedIndexStateCache.remove(sharedIndexState.getModelId());
JNIService.freeSharedIndexState(sharedIndexState.getSharedIndexStateAddress(), sharedIndexState.getKnnEngine());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this logic it means that if some query is still using the SharedIndexState then we will not be able to release the memory. But if a user has done the clear cache api then in that case we will just skip over this release and return that we have cleared the cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The release portion is tied to an IndexAllocation not the query. When the index allocation is closed, we will decrease the reference count to this. Once it goes to 0, we free. If an index is loaded and needs shared state, it will need to obtain read lock and so will be blocked. So, after this completes, it will have to load again.

On clear cache, all IndexAllocations will be closed leading to refcount going down.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so this line will prevent https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNWeight.java#L273 the scenario where query is happening and we cannot release the memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes thats correct

Adds the ability to share index state amongst indices during index load
operations into the plugins memory management system. Introduces a
manager of the shared state that will properly manage the lifecycle of
the shared state.

There was a bug in clear cache that had to be fixed to get this change
working as well. Previously, only one index file per clear cache would
be freed. This fixes that logic to clear everything.

Added unit tests and an integration test to confirm functionality. In
addition, modified recall integration tests to get more coverage on the
different algo configs. Along with this, had to fix a few things around
the computation of recall for non-l2 space types.

Signed-off-by: John Mazanec <[email protected]>
* @return ShareModelContext
*/
public SharedIndexState get(long indexAddress, String modelId, KNNEngine knnEngine) {
this.readWriteLock.readLock().lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we don't add lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its possible that underlying shared index state gets freed in release and this returns a value with an invalid pointer

this.readWriteLock.writeLock().lock();

try {
if (!sharedIndexStateCache.containsKey(sharedIndexState.getModelId())) {
Copy link
Member

@VijayanB VijayanB Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why not use get and do null check? This will avoid get call at line no 107

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense let me add

* @return ++referenceCount
*/
private long incRef() {
return referenceCount.incrementAndGet();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may be void since you are not using incremented value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth. It is required for decRef() so I figured for symmetry I would make it return long

Comment on lines 277 to 279
if (StringUtils.isEmpty(modelId) || KNNEngine.FAISS != knnEngine) {
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking Engine is faiss or not is already covered in the JNIService.isSharedIndexStateRequired why we are adding it here again? We can avoid that check otherwise if we need to share state for some other engine then we will end up making changes here and also in JNIService.

public static boolean isSharedIndexStateRequired(long indexAddr, KNNEngine knnEngine) {
        if (KNNEngine.FAISS == knnEngine) {
            return FaissService.isSharedIndexStateRequired(indexAddr);
        }

        return false;
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. will update

@jmazanec15
Copy link
Member Author

BWC seems unrelated. Created an issue around it: #1556

Signed-off-by: John Mazanec <[email protected]>
@jmazanec15 jmazanec15 merged commit da8a936 into opensearch-project:feature/remove-redunant-allocs Mar 18, 2024
37 of 51 checks passed
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Mar 18, 2024
…#1545)

Adds the ability to share index state amongst indices during index load
operations into the plugins memory management system. Introduces a
manager of the shared state that will properly manage the lifecycle of
the shared state.

There was a bug in clear cache that had to be fixed to get this change
working as well. Previously, only one index file per clear cache would
be freed. This fixes that logic to clear everything.

Added unit tests and an integration test to confirm functionality. In
addition, modified recall integration tests to get more coverage on the
different algo configs. Along with this, had to fix a few things around
the computation of recall for non-l2 space types.

Signed-off-by: John Mazanec <[email protected]>
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Mar 18, 2024
…#1545)

Adds the ability to share index state amongst indices during index load
operations into the plugins memory management system. Introduces a
manager of the shared state that will properly manage the lifecycle of
the shared state.

There was a bug in clear cache that had to be fixed to get this change
working as well. Previously, only one index file per clear cache would
be freed. This fixes that logic to clear everything.

Added unit tests and an integration test to confirm functionality. In
addition, modified recall integration tests to get more coverage on the
different algo configs. Along with this, had to fix a few things around
the computation of recall for non-l2 space types.

Signed-off-by: John Mazanec <[email protected]>
jmazanec15 added a commit that referenced this pull request Mar 18, 2024
Adds the ability to share index state amongst indices during index load
operations into the plugins memory management system. Introduces a
manager of the shared state that will properly manage the lifecycle of
the shared state.

There was a bug in clear cache that had to be fixed to get this change
working as well. Previously, only one index file per clear cache would
be freed. This fixes that logic to clear everything.

Added unit tests and an integration test to confirm functionality. In
addition, modified recall integration tests to get more coverage on the
different algo configs. Along with this, had to fix a few things around
the computation of recall for non-l2 space types.

Signed-off-by: John Mazanec <[email protected]>
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Mar 18, 2024
…#1545)

Adds the ability to share index state amongst indices during index load
operations into the plugins memory management system. Introduces a
manager of the shared state that will properly manage the lifecycle of
the shared state.

There was a bug in clear cache that had to be fixed to get this change
working as well. Previously, only one index file per clear cache would
be freed. This fixes that logic to clear everything.

Added unit tests and an integration test to confirm functionality. In
addition, modified recall integration tests to get more coverage on the
different algo configs. Along with this, had to fix a few things around
the computation of recall for non-l2 space types.

Signed-off-by: John Mazanec <[email protected]>
jmazanec15 added a commit that referenced this pull request Mar 18, 2024
Adds the ability to share index state amongst indices during index load
operations into the plugins memory management system. Introduces a
manager of the shared state that will properly manage the lifecycle of
the shared state.

There was a bug in clear cache that had to be fixed to get this change
working as well. Previously, only one index file per clear cache would
be freed. This fixes that logic to clear everything.

Added unit tests and an integration test to confirm functionality. In
addition, modified recall integration tests to get more coverage on the
different algo configs. Along with this, had to fix a few things around
the computation of recall for non-l2 space types.

Signed-off-by: John Mazanec <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants