Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error Handling, Update Stability, Improved Java SDK #402

Open
wants to merge 71 commits into
base: main
Choose a base branch
from

Conversation

ashvardanian
Copy link
Contributor

@ashvardanian ashvardanian commented May 5, 2024

USearch implementation had 2 layers, the core HNSW structure implemented in index.hpp and the high-level wrapper for dense equidimensional vectors in index_dense.hpp. In this release, we've made the top layer thinner and cleaner, also making this APIs more similar, error-handling more consistent, and builds faster.

Reducing Dependencies & Accelerating Builds

Previously index_dense.hpp had the following STL includes:

#include <thread> // `std::thread`
#include <functional> // `std::function`
#include <vector> // `std::vector`
#include <numeric> // `std::iota`

Those are some of the most common includes in C++ projects, also the ones I like the least. Here are a couple of reasons to hate them, taken from the "C++ Compile Health Watchdog":

name compilation time lines of code binary size
<functional> 82 .. 228 ms 12.9 .. 27.4 kLoC 0 .. 141 kB
<vector> 32 .. 48 ms 7.1 .. 8.0 kLoC 0 .. 8.2 kB
<numeric> 7 .. 13 ms 1.6 .. 2.1 kLoC 0 .. 3.3 kB
<thread> 110 .. 189 ms 17.5 .. 20.3 kLoC 0 .. 153 kB

Improved Java, C#, and Swift SDK

Thanks to @mccullocht, the Java SDK has been extended with get, loadFromPath, and viewFromPath APIs.

Usage examples to come :)

Other SDK improvements include:

  • Better docs for C#
  • Better test coverage for Swift

Reduced Memory Consumption for DBMS-like Users

Most databases using USearch, would prefer to have a smaller index at the cost of some functionality. As mentioned by @rschu1ze, if enable_key_lookups is disabled, and some external DBMS is responsible for "key to vector" mappings, the memory consumption of the index_dense_gt can be further reduced.

Improving Stability

  • In update-heavy workloads, it's possible to encounter an undefined behavior, leading to a corrupt index state.
  • In aligned-alloc an integer overflow was resolved by @antonio2368.
  • Search is now exhaustive by default, so if you have ≥10 items in the index and ask for 10 closest vectors, you are expected to get 10, not ≤10.
  • On GCC 12 and older the #pragma region warning was resolved by @mbautin.
  • Casts to uint40_t slot IDs have been fixed by @Ngalstyan4.
  • Casts to b1x8_t vectors have been fixed by @Ngalstyan4.
  • Python type annotation has been added by @jamesbraza.
  • C++ example was patched by @SheldonFung98.
  • JavaScript duplicate exports patched by @eryue0220 & @johnhorsema.

jaysenmarais and others added 14 commits April 29, 2024 00:46
Relates to the #377 and the comment:
#377 (comment)

This temporarily disables the failing CI pipeline to generate
and update docs.
This commit drops `std::vector` dependency,
making compilation time shorter and error
handling universal across abstraction layers.
In the past, if we got "too lucky" traversing the graph,
we could exit early before accumulating K top matches,
even if the index had more than K entries. This patch
changes that behavior, making output more predicatable.
This patch addresses the issue #399, originally
observed in the Swift layer. Reimplementing it in
C++ helped locate the issue and lead to refactoring
the `update` procedure in the lowest-layer `index_gt`.

Now, `add` and `update` share less code. The `add`
is one branch shorter (not that it would be noticeable),
and `update` brings additional logic to avoid spilling
`updated_slot` into top-results and consequently
introducing self-loops.

Closes #399
ashvardanian and others added 15 commits May 8, 2024 02:49
Both `view` and `load` would `reset` the thread contexts.
After that, the very first `search` and `add` would fail, as
no thread-local contexts are initialized. It would require
a `reserve` call with a non-zero second arcgument to
define the number of concurrent threads, for which the
queues & buffers need to be allocated.

That design is counter-intuitive, so this patch re-inits the
same number of threads as before the `load` & `view` or
one, if none existed.
Co-authored-by: Ash Vardanian <[email protected]>
Co-authored-by: Adolfo Garcia <[email protected]>
As noted by Robert Schulze, we can avoid populating `slot_lookup_`
during insertions, if `enable_key_lookups` is not set. This would lead
to lower memory consumption for large indexes of tiny vectors,
particularly common in GIS.

Co-authored-by: Robert Schulze <[email protected]>
@rschu1ze
Copy link

@ashvardanian I am happy to see this PR getting merged but it is getting bigger and bigger :-) Question: could we get the stability fixes + the reduced mem consumption commits merged as separate PRs into Usearch's main branch? None of them seem to be breaking, and it would be nice to accelerate things a bit.

@ashvardanian
Copy link
Contributor Author

Agreed @rschu1ze, will merge soon.

@ashvardanian
Copy link
Contributor Author

@rschu1ze, there is a weird behavior on very large queries, where the distance to matches isn't monotonically decreasing in some cases. The new extended tests catch it, but I am not sure if that issue was present in the past. I'd really like at least that one issue to be resolved before merging.

Everyone is welcome to join the bug hunt 🤗

@Ngalstyan4
Copy link
Contributor

#422 might be related. It had caused similar symptoms in lantern - distances would not be monotonically decreasing in index search results.
This was caught in the development branch of lantern, once we added quantization support and tests:

-- create an 8-bit quantized index
CREATE INDEX ind8 ON sift_base1k USING lantern_hnsw (v_scaled) WITH (dim=128, M=8, quant_bits=8);
-- look for vectors closest to the vector with id 42
SELECT id, v_scaled <-> :'v_scaled42' as dist FROM sift_base1k ORDER BY 2 LIMIT 10;

Below is the diff with the bug fix

diff -u /lantern_shared/test/expected/hnsw_sq.out /tmp/lantern/tmp_output/results/hnsw_sq.out
--- /lantern_shared/test/expected/hnsw_sq.out   2024-05-24 04:43:03.172237129 +0000
+++ /tmp/lantern/tmp_output/results/hnsw_sq.out 2024-05-25 03:18:11.796896512 +0000
@@ -106,15 +106,15 @@
  id  |   dist
 -----+-----------
   42 |         0
- 285 |   16.7111
- 261 |   16.0935
- 195 | 17.061296
-  48 | 5.1038003
-  50 | 11.509201
   36 |     1.053
-  46 | 10.790699
- 216 | 18.992302
+  48 | 5.1038003
   39 |  5.626501
+ 886 |  7.163699
+ 402 | 7.7013006
+ 518 |  8.502399
+ 331 |  8.779598
+ 340 |  9.726098
+  46 | 10.790699
 (10 rows)

Previously we were down-casting floats to the target type (e.g. int8_t), and then
clamping to [-100, 100] range. This means that e.g. 129 would be cast to -127 and
then converted to -100, in stead of becoming 100

The fix does clamping first, and then casts the resulting number
(which is guaranteed to be in range [-100, 100], due to clamping)
from source type to target int8_t. Given the clamping, this will never overflow.

---------

Co-authored-by: Ash Vardanian <[email protected]>
@ashvardanian
Copy link
Contributor Author

ashvardanian commented May 25, 2024

Thanks for the patch, @Ngalstyan4! I've merged it, but it doesn't solve the issue we have with tests right now. We still have a failing assertion at cpp/test.cpp:426 where we check that the output distances are monotonically increasing for long-tail queries, like getting top-500 closest neighbors. Help always appreciated 🤗

ashvardanian and others added 13 commits May 25, 2024 23:36
When calling `index_dense_gt`, the thread lock was not propagating
with the `search_result_t`. That is a an error-prone API. When too many
threads are running in parallel (ideally, more than physical CPU cores)
another thread may start reusing the `context_t` before the original
caller finishes exporting entries with `dump_to`.

This solution is backwards compatible and passes the tests.
Tracing implicit conversions of `std::uint32_t` and other
primitive types isn't always easy in concurrent apps. This
commit adds support for `enum` types to be used for
safer implementation of `index_gt` specializations.
Adding type annotation for Python native modules
solves the `Skipping analyzing "usearch.index" module` 
warning due to `missing library stubs or py.typed marker`.

Closes #424
When converting floating point arrays to binary, we use bitwise OR
operations to set the relevant bits in the output buffer to 1. We do
nothing if the bit is zero, so we assume that the bit is zero to start
with. The `memset` statement makes sure this assumption holds.
In high-connectivity graphs, the number of distance
computations can be dominated by the number of "refine"
heuristic computations performed by the core structure.
The extended `add_result_t` now includes both:

- `computed_distances_in_refines`
- `computed_distances_in_reverse_refines`

This commit also extends the documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet