Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Brute force search gives incorrect results #527

Open
3 tasks done
Stefano-t opened this issue Nov 4, 2024 · 0 comments
Open
3 tasks done

Bug: Brute force search gives incorrect results #527

Stefano-t opened this issue Nov 4, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@Stefano-t
Copy link

Describe the bug

Using usearch.index.search(..., exact=True) gives distances that are different from an exact search over and usearch.Index.

I'm using Tanimoto distance in example below, but same problem appears when using other distance functions.

I'm not sure if I'm doing something wrong.

Steps to reproduce

import numpy as np
from usearch import index

np.random.seed(0xDEADBEEF)

vec = np.random.randint(0, 2, (3, 64), dtype=np.uint8)
vec_packed = np.packbits(vec, axis=1)

# Brute force search
res_brute = index.search(vec_packed, vec_packed, len(vec_packed), metric=index.MetricKind.Tanimoto, exact=True)

# Exact index search
search_index = index.Index(ndim=vec.shape[1], metric=index.MetricKind.Tanimoto)
search_index.add(None, vec_packed)
res_index = search_index.search(vec_packed, len(vec_packed), exact=True)

# This assert fails. I'm expecting this assert to pass.
try:
  np.testing.assert_array_almost_equal(res_brute.distances, res_index.distances)
except AssertionError as e:
  print("--FAIL", e)

# *** Double check
# Computing Tanimoto distance by hand
tanimoto_dist = []
for v in vec:
  tanimoto_dist.append(1 - np.bitwise_and(v, vec).sum(axis=1) / np.bitwise_or(v, vec).sum(axis=1))
tanimoto_dist = np.array(tanimoto_dist)

# Arrange res_index
index_exact = []
for dist, keys in zip(res_index.distances, res_index.keys):
  tmp_v = np.zeros(len(keys))
  for d, k in zip(dist, keys):
    tmp_v[k] = d
  index_exact.append(tmp_v)
index_exact = np.array(index_exact)

# This test passes
np.testing.assert_array_almost_equal(tanimoto_dist, index_exact)

Expected behavior

I think that usearch.index.search(..., exact=True) should return the same results as exact search over a previously constructed index.

Furthermore, a simple double check suggests that usearch.index.search(..., exact=True) distances might be not correct.

USearch version

2.16.0

Operating System

Ubuntu 22.04

Hardware architecture

x86

Which interface are you using?

Python bindings

Contact Details

[email protected]

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Stefano-t Stefano-t added the bug Something isn't working label Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant