Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thresholding, overquery, and compression #254

Open
jbellis opened this issue Mar 19, 2024 · 1 comment
Open

Thresholding, overquery, and compression #254

jbellis opened this issue Mar 19, 2024 · 1 comment

Comments

@jbellis
Copy link
Owner

jbellis commented Mar 19, 2024

The interaction between threshold and topk is confusing, it's almost "stop the search whichever comes first", but not quite, because we don't actually start accumulating results towards topk until we clear the threshold.

I'd like to simplify this to

  1. Find a local maximum in the graph using an approach similar to TwoPhaseTracker
  2. Pull out the top k

However, this doesn't currently work because there's a ton of noise introduced by compression. We can fix this by performing step 1 using exact similarities for the top candidate.

We know that that introduces a lot of expensive full-resolution comparisons. But we might be able to make up for that if we push overquery into the search process. (So if user asks for 100 results we pass topk=100 into the search and let it deal with the noise.)

@jbellis
Copy link
Owner Author

jbellis commented Mar 19, 2024

Not sure what to do when PQ is so lossy that it doesn't ever really converge. Today you get garbage results, with my proposal you'd end up searching the entire graph instead, which is arguably worse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant