
Implemented the Streaming Feature to stream vectors from Java to JNI layer to enable creation of larger segments for vector indices #1604

Merged
navneet1v merged 1 commit into opensearch-project:main on Apr 10, 2024


@navneet1v (Collaborator) commented Apr 9, 2024

Description

Implemented the streaming feature, which streams vectors from the Java layer to the JNI layer so that larger segments can be created for vector indices.

Changes include:

  1. Add the interface for streaming vectors from the Java layer to the JNI layer with an initial capacity (#1586)
  2. Integrate the storeVectors interfaces with the createIndex and createIndexTemplate functions (#1588)
  3. Update the KNN80BinaryDocValues reader to count live docs and use that count as the initial capacity when initializing the vector address (#1595)
  4. Move freeing of vectorAddress from the Java layer to the JNI layer to reduce the memory footprint for Nmslib (#1602)

All of these changes have already been merged into the feature/stream-vectors branch of k-NN: https://github.com/opensearch-project/k-NN/tree/feature/stream-vectors

JNI Test

(base) 16:32 ~/workplace/k-NN/jni (stream-vectors)$ ./bin/jni_test 
Running main() from /Users/navneev/workplace/k-NN/jni/googletest-src/googletest/src/gtest_main.cc
[==========] Running 22 tests from 20 test suites.
[----------] Global test environment set-up.
[----------] 1 test from FaissCreateIndexTest
[ RUN      ] FaissCreateIndexTest.BasicAssertions
[       OK ] FaissCreateIndexTest.BasicAssertions (5 ms)
[----------] 1 test from FaissCreateIndexTest (5 ms total)

[----------] 1 test from FaissCreateIndexFromTemplateTest
[ RUN      ] FaissCreateIndexFromTemplateTest.BasicAssertions
[       OK ] FaissCreateIndexFromTemplateTest.BasicAssertions (3 ms)
[----------] 1 test from FaissCreateIndexFromTemplateTest (3 ms total)

[----------] 3 tests from FaissLoadIndexTest
[ RUN      ] FaissLoadIndexTest.BasicAssertions
[       OK ] FaissLoadIndexTest.BasicAssertions (3 ms)
[ RUN      ] FaissLoadIndexTest.HNSWPQDisableSdcTable
WARNING clustering 256 points to 16 centroids: please provide at least 624 training points
[       OK ] FaissLoadIndexTest.HNSWPQDisableSdcTable (368 ms)
[ RUN      ] FaissLoadIndexTest.IVFPQDisablePrecomputeTable
WARNING clustering 256 points to 16 centroids: please provide at least 624 training points
[       OK ] FaissLoadIndexTest.IVFPQDisablePrecomputeTable (379 ms)
[----------] 3 tests from FaissLoadIndexTest (751 ms total)

[----------] 1 test from FaissQueryIndexTest
[ RUN      ] FaissQueryIndexTest.BasicAssertions
[       OK ] FaissQueryIndexTest.BasicAssertions (4 ms)
[----------] 1 test from FaissQueryIndexTest (4 ms total)

[----------] 1 test from FaissQueryIndexWithFilterTest1435
[ RUN      ] FaissQueryIndexWithFilterTest1435.BasicAssertions
[       OK ] FaissQueryIndexWithFilterTest1435.BasicAssertions (10 ms)
[----------] 1 test from FaissQueryIndexWithFilterTest1435 (10 ms total)

[----------] 1 test from FaissQueryIndexWithParentFilterTest
[ RUN      ] FaissQueryIndexWithParentFilterTest.BasicAssertions
[       OK ] FaissQueryIndexWithParentFilterTest.BasicAssertions (5 ms)
[----------] 1 test from FaissQueryIndexWithParentFilterTest (5 ms total)

[----------] 1 test from FaissFreeTest
[ RUN      ] FaissFreeTest.BasicAssertions
[       OK ] FaissFreeTest.BasicAssertions (0 ms)
[----------] 1 test from FaissFreeTest (0 ms total)

[----------] 1 test from FaissInitLibraryTest
[ RUN      ] FaissInitLibraryTest.BasicAssertions
[       OK ] FaissInitLibraryTest.BasicAssertions (0 ms)
[----------] 1 test from FaissInitLibraryTest (0 ms total)

[----------] 1 test from FaissTrainIndexTest
[ RUN      ] FaissTrainIndexTest.BasicAssertions
[       OK ] FaissTrainIndexTest.BasicAssertions (0 ms)
[----------] 1 test from FaissTrainIndexTest (0 ms total)

[----------] 1 test from FaissCreateHnswSQfp16IndexTest
[ RUN      ] FaissCreateHnswSQfp16IndexTest.BasicAssertions
[       OK ] FaissCreateHnswSQfp16IndexTest.BasicAssertions (4 ms)
[----------] 1 test from FaissCreateHnswSQfp16IndexTest (4 ms total)

[----------] 1 test from FaissIsSharedIndexStateRequired
[ RUN      ] FaissIsSharedIndexStateRequired.BasicAssertions
[       OK ] FaissIsSharedIndexStateRequired.BasicAssertions (0 ms)
[----------] 1 test from FaissIsSharedIndexStateRequired (0 ms total)

[----------] 1 test from FaissInitAndSetSharedIndexState
[ RUN      ] FaissInitAndSetSharedIndexState.BasicAssertions
WARNING clustering 256 points to 16 centroids: please provide at least 624 training points
[       OK ] FaissInitAndSetSharedIndexState.BasicAssertions (353 ms)
[----------] 1 test from FaissInitAndSetSharedIndexState (353 ms total)

[----------] 1 test from IDGrouperBitMapTest
[ RUN      ] IDGrouperBitMapTest.BasicAssertions
[       OK ] IDGrouperBitMapTest.BasicAssertions (0 ms)
[----------] 1 test from IDGrouperBitMapTest (0 ms total)

[----------] 1 test from NmslibIndexWrapperSearchTest
[ RUN      ] NmslibIndexWrapperSearchTest.BasicAssertions
[       OK ] NmslibIndexWrapperSearchTest.BasicAssertions (0 ms)
[----------] 1 test from NmslibIndexWrapperSearchTest (0 ms total)

[----------] 1 test from NmslibCreateIndexTest
[ RUN      ] NmslibCreateIndexTest.BasicAssertions
[       OK ] NmslibCreateIndexTest.BasicAssertions (1 ms)
[----------] 1 test from NmslibCreateIndexTest (1 ms total)

[----------] 1 test from NmslibLoadIndexTest
[ RUN      ] NmslibLoadIndexTest.BasicAssertions
[       OK ] NmslibLoadIndexTest.BasicAssertions (1 ms)
[----------] 1 test from NmslibLoadIndexTest (1 ms total)

[----------] 1 test from NmslibQueryIndexTest
[ RUN      ] NmslibQueryIndexTest.BasicAssertions
[       OK ] NmslibQueryIndexTest.BasicAssertions (2 ms)
[----------] 1 test from NmslibQueryIndexTest (2 ms total)

[----------] 1 test from NmslibFreeTest
[ RUN      ] NmslibFreeTest.BasicAssertions
[       OK ] NmslibFreeTest.BasicAssertions (0 ms)
[----------] 1 test from NmslibFreeTest (0 ms total)

[----------] 1 test from NmslibInitLibraryTest
[ RUN      ] NmslibInitLibraryTest.BasicAssertions
[       OK ] NmslibInitLibraryTest.BasicAssertions (0 ms)
[----------] 1 test from NmslibInitLibraryTest (0 ms total)

[----------] 1 test from CommonsTests
[ RUN      ] CommonsTests.BasicAssertions
[       OK ] CommonsTests.BasicAssertions (0 ms)
[----------] 1 test from CommonsTests (0 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 20 test suites ran. (1148 ms total)
[  PASSED  ] 22 tests.

Issues Resolved

#1506

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

…layer to enable creation of larger segments for vector indices

Signed-off-by: Navneet Verma <[email protected]>
@navneet1v added the Enhancements, v2.14.0, and backport 2.x labels on Apr 9, 2024
@navneet1v (Collaborator, Author)

BWC fix PR: #1605. The BWC failures are not related to this PR.

@naveentatikonda (Member)

Build is failing on Windows. Can you please check?

@navneet1v (Collaborator, Author)

> Build is failing on Windows. Can you please check?

make[3]: *** [CMakeFiles/opensearchknn_nmslib.dir/build.make:120: release/opensearchknn_nmslib.dll] Error 1
make[2]: *** [CMakeFiles/Makefile2:287: CMakeFiles/opensearchknn_nmslib.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:294: CMakeFiles/opensearchknn_nmslib.dir/rule] Error 2
make: *** [Makefile:202: opensearchknn_nmslib] Error 2

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':buildJniLib'.
> Process 'command 'make'' finished with non-zero exit value 2

I have not seen the Windows CI succeed in a long time.

@navneet1v navneet1v merged commit c184854 into opensearch-project:main Apr 10, 2024
55 of 60 checks passed
@opensearch-trigger-bot (Contributor)

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1604-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c18485450dbbab32f01bc23e75e71b00c57549c8
# Push it to GitHub
git push --set-upstream origin backport/backport-1604-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1604-to-2.x.

navneet1v added a commit to navneet1v/k-NN that referenced this pull request Apr 10, 2024
…layer to enable creation of larger segments for vector indices (opensearch-project#1604)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit that referenced this pull request Apr 10, 2024
…layer to enable creation of larger segments for vector indices (#1604) (#1608)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Apr 11, 2024
…layer to enable creation of larger segments for vector indices (opensearch-project#1604) (opensearch-project#1608)

Signed-off-by: Navneet Verma <[email protected]>
Labels: backport 2.x, Enhancements, v2.14.0
Projects
Status: Done

4 participants