Add getters for precomputed constants for observability #508

Merged (2 commits) on Oct 22, 2024

@mbautin commented Oct 21, 2024

We would like to have access to these precomputed constants so we can display them to the database operator and compare them against similarly configured indexes with other HNSW implementations.

@ashvardanian ashvardanian merged commit 113a786 into unum-cloud:main-dev Oct 22, 2024
10 of 11 checks passed
mbautin added a commit to yugabyte/yugabyte-db that referenced this pull request Oct 22, 2024
Summary:
Fixing inconsistencies in index parameters that cause a performance discrepancy between Usearch and Hnswlib:
- Correctly specifying connectivity for Hnswlib as num_neighbors_per_vertex instead of max_neighbors_per_vertex.
- Passing the ef option into the Hnswlib configuration.

Adding internal statistics introspection to Usearch and Hnswlib index wrappers.

PR for hnswlib changes: nmslib/hnswlib#594.
PR for usearch changes: unum-cloud/usearch#508

Also allow passing multiple values of k as input, provided none of them exceeds the size of the precomputed ground truth result list.

Updating hnsw_tool to always convert uint8_t coordinates to float32 when using Hnswlib, so that the comparison with Usearch on the SIFT1B dataset is fair. Usearch does not currently support the uint8_t type natively.

The changes to src/inline-thirdparty will be pushed as separate commits generated by `build-support/thirdparty_tool --sync-inline-thirdparty`.

Test Plan:
Jenkins

Manual testing using hnsw_tool

- hnswlib: https://gist.githubusercontent.com/mbautin/d21580dcac0b51ad2d7bc9fc130c5f9e/raw

```
    Hnswlib index with 5 levels
    max_elements: 1000000
    M: 16
    maxM: 16
    maxM0: 32
    ef_construction: 128
    ef: 10
    mult: 0.360674
    Level 0: 1000000 nodes, 21613828 edges, 21.61 average edges per node
    Level 1: 62323 nodes, 885027 edges, 14.20 average edges per node
    Level 2: 3855 nodes, 50515 edges, 13.10 average edges per node
    Level 3: 238 nodes, 2543 edges, 10.68 average edges per node
    Level 4: 17 nodes, 244 edges, 14.35 average edges per node
    Totals: 1066433 nodes, 22552157 edges, 21.15 average edges per node

    i-recall @ 50, i=1..10:

    1-recall @ 50: 0.9695000052
    2-recall @ 50: 0.9645000100
    3-recall @ 50: 0.9604333043
    4-recall @ 50: 0.9568499923
    5-recall @ 50: 0.9541400075
    6-recall @ 50: 0.9504333138
    7-recall @ 50: 0.9467428327
    8-recall @ 50: 0.9435999990
    9-recall @ 50: 0.9406333566
    10-recall @ 50: 0.9377999902
```
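As a sanity check on the Hnswlib numbers above, the following sketch shows how the printed `mult` value relates to `M`, assuming the usual HNSW level-assignment rule (level = floor(-ln(u) * mult) with u uniform in (0, 1] and mult = 1 / ln(M)). The function names are hypothetical and are not part of hnswlib or hnsw_tool. Under that assumption, about N / M^l nodes are expected at level l, which is in line with the per-level node counts in the output.

```python
import math
import random


def level_multiplier(m: int) -> float:
    """Level multiplier, assumed to be 1 / ln(M)."""
    return 1.0 / math.log(m)


def expected_level_sizes(num_points: int, m: int, num_levels: int) -> list[float]:
    """Expected number of nodes present at each level: N / M**level."""
    return [num_points / m**level for level in range(num_levels)]


def sample_level(rng: random.Random, mult: float) -> int:
    """Draw the top level for one node with the assumed HNSW rule."""
    u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u) is defined
    return int(-math.log(u) * mult)


if __name__ == "__main__":
    mult = level_multiplier(16)
    print(f"mult: {mult:.6f}")
    for level, size in enumerate(expected_level_sizes(1_000_000, 16, 5)):
        print(f"Level {level}: about {size:.0f} nodes expected")
```

For M = 16 this gives expected sizes of 1000000, 62500, 3906, 244, and 15 nodes, compared with the observed 1000000, 62323, 3855, 238, and 17.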

- usearch: https://gist.githubusercontent.com/mbautin/74948b310780562e74831eb29e43cb13/raw

```
    Usearch index with 4 levels
    connectivity: 16
    connectivity_base: 32
    expansion_add: 128
    expansion_search: 10
    inverse_log_connectivity: 0.360674
    Level 0: 1000000 nodes, 20973352 edges, 20.97 average edges per node
    Level 1: 64036 nodes, 890428 edges, 13.91 average edges per node
    Level 2: 5090 nodes, 66295 edges, 13.02 average edges per node
    Level 3: 481 nodes, 5304 edges, 11.03 average edges per node
    Totals: 1069607 nodes, 21935379 edges, 20.51 average edges per node

    i-recall@50, i=1..10:

    1-recall @ 40: 0.9305999875
    2-recall @ 40: 0.9201999903
    3-recall @ 40: 0.9141333103
    4-recall @ 40: 0.9085000157
    5-recall @ 40: 0.9036399722
    6-recall @ 40: 0.8987166882
    7-recall @ 40: 0.8932142854
    8-recall @ 40: 0.8890249729
    9-recall @ 40: 0.8852999806
    10-recall @ 40: 0.8813199997
```
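The "Totals" line in the Usearch output can be reproduced from the per-level lines. The sketch below is only an illustration of that arithmetic, not the code used by hnsw_tool; the `summarize` helper and the tuple layout are assumptions made for this example.

```python
# Per-level (nodes, edges) pairs copied from the Usearch output above.
USEARCH_LEVELS = [
    (1_000_000, 20_973_352),
    (64_036, 890_428),
    (5_090, 66_295),
    (481, 5_304),
]


def summarize(levels: list[tuple[int, int]]) -> tuple[int, int, float]:
    """Return total nodes, total edges, and average edges per node."""
    total_nodes = sum(nodes for nodes, _ in levels)
    total_edges = sum(edges for _, edges in levels)
    return total_nodes, total_edges, total_edges / total_nodes


if __name__ == "__main__":
    for level, (nodes, edges) in enumerate(USEARCH_LEVELS):
        print(f"Level {level}: {edges / nodes:.2f} average edges per node")
    nodes, edges, average = summarize(USEARCH_LEVELS)
    print(f"Totals: {nodes} nodes, {edges} edges, {average:.2f} average edges per node")
```

Note that the totals count a node once per level it appears on, so the overall average is taken over 1069607 node-level entries rather than over 1000000 vectors.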

Reviewers: sergei, aleksandr.ponomarenko

Reviewed By: sergei, aleksandr.ponomarenko

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38977