Releases: unum-cloud/usearch

v0.19.3

06 Jul 15:58

0.19.3 (2023-07-06)

Fix

  • Using same label types across bindings (ccdf18b)

Hashes

  • Source code (zip) : b028d45afc7a286591b5a31df5b102e6ce5b262c781a58cc6b771df688809c62
  • Source code (tar.gz) : d60c98708d08046dc1096a3a8c85a15282a7cba0e5ae9d54d5a8c5c9fe327951

v0.19.2

04 Jul 15:20

0.19.2 (2023-07-04)

Improve

  • Simpler JIT-ing interface (903af3c)

Hashes

  • Source code (zip) : 2322c8ec38d3968bfd17e4a9abd72402b88a2dc0eaeecc100f112ebdd52eed7d
  • Source code (tar.gz) : 868e3b8c9aae6689864351ede246b8a742a022f878b25242dec106f22733c9aa

v0.19.1

02 Jul 16:01

0.19.1 (2023-07-02)

Docs

Fix

  • reserve for ObjC/Swift (5e84162)
  • Avoid inadequate test configurations (60f81ca)
  • Measuring exact recall (71c1813)
  • Scalar kinds and dynamic type inference (b792f7c)
  • Using unsigned integers for .bbin files (c2c44e7)
  • Wrap the Swift test in XCTestCase (e38eda2)

Improve

  • Allow <f short style Numpy formats (2d4e4c9)
  • Auto-resolve save/load/view path (c1bbf50)
  • Handling GIL and keyboard interrupt (ca148a8)
  • Logging long-running add tasks (f6ac472)
  • User Defined Functions in C (e5d9a27)

Make

  • Disable OpenMP on MacOS (c9c8bca)
  • Enable OpenMP for MacOs and format (3bfaa6b)

Hashes

  • Source code (zip) : 870ebfb5df66c7d262afded4d704735bd3781f937ecfe2eaa1a4dd04e9ab2f0d
  • Source code (tar.gz) : fd88e0c4d373bac217d91fb63563f932ef6d025ea99f726d4fd24378ebe181f9

Scaling Vector Search: Numba, Cppyy, PeachPy, and SimSIMD

24 Jun 13:30

USearch has now been used in two internet-scale companies/projects 🎉

In one case, it's only 1 Billion vectors. The second is closer to 10 Billion per node. So it feels like a good time to enter the Big ANN competition track 🐎 Still, we aim for an even larger scale and introduce a few improvements, such as the new CompiledMetric interface that allows one to mix C/C++, Python, and Assembly code for different hardware architectures in the same USearch configuration file 🤯 It should come in handy, given the rise in the number of exotic AI hardware vendors 😉

SIMD and JIT

The common idea across Unum projects is to use SIMD for hardware acceleration when we hit the wall.
Writing SIMD can be challenging, but distributing it may be even more difficult. You are compiling for modern hardware and modern libraries, and all of a sudden, someone deploys it on an old machine...

illegal instruction

A good balance can be achieved by compiling the core of USearch for higher compatibility and leaving plugs for hardware-specific optimizations. That is where Just-in-Time Compilation comes in. USearch allows combining the pre-compiled HNSW structure with JIT-ed distance metrics, which would tune the resulting assembly for the current target architecture. Such functionality was already supported with Numba JIT, but now you have more flexibility to choose different backends and define functions with different signatures:

  • MetricSignature.ArrayArray, aka float distance(scalar *, scalar *).
  • MetricSignature.ArrayArraySize, aka float distance(scalar *, scalar *, size_t).
  • MetricSignature.ArraySizeArraySize, aka float distance(scalar *, size_t, scalar *, size_t).

This enum is a core part of CompiledMetric and is passed down to our CPython binding.

class CompiledMetric(NamedTuple):
    pointer: int
    kind: MetricKind
    signature: MetricSignature
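
To make the three signatures and the integer `pointer` field more concrete, here is a minimal sketch that uses only the standard-library ctypes module. The `MetricSignature` enum below is a local stand-in with placeholder values, not the one shipped in the usearch package, and the `PROTOTYPES` table and `inner_product_distance` helper are hypothetical names for illustration. A Python callback like this is far too slow for real indexing; it only shows what each signature looks like at the C level and how a function address becomes a plain integer.

```python
import ctypes
from enum import Enum


class MetricSignature(Enum):
    # Stand-in for illustration; the values are placeholders
    ArrayArray = 0
    ArrayArraySize = 1
    ArraySizeArraySize = 2


float_p = ctypes.POINTER(ctypes.c_float)
size_t = ctypes.c_size_t

# ctypes prototypes equivalent to the three C signatures, with scalar = float
PROTOTYPES = {
    MetricSignature.ArrayArray:
        ctypes.CFUNCTYPE(ctypes.c_float, float_p, float_p),
    MetricSignature.ArrayArraySize:
        ctypes.CFUNCTYPE(ctypes.c_float, float_p, float_p, size_t),
    MetricSignature.ArraySizeArraySize:
        ctypes.CFUNCTYPE(ctypes.c_float, float_p, size_t, float_p, size_t),
}


def inner_product_distance(a, b, n):
    # a and b arrive as float pointers, n as the number of dimensions
    return 1.0 - sum(a[i] * b[i] for i in range(n))


# Wrap the Python function as a C-callable and take its address as an int
prototype = PROTOTYPES[MetricSignature.ArrayArraySize]
callback = prototype(inner_product_distance)  # keep a reference alive
pointer = ctypes.cast(callback, ctypes.c_void_p).value

# Call back through the raw address, the way native code would
Vector = ctypes.c_float * 4
a = Vector(1.0, 0.0, 0.0, 0.0)
b = Vector(0.5, 0.0, 0.0, 0.0)
distance = prototype(pointer)(a, b, 4)
print(pointer, distance)
```

The integer obtained this way plays the same role as `inner_product.address` in the Numba example: it is the value that would go into the `pointer` field, paired with the matching `signature`.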

Numba JIT

Just to remind you, a minimal Numba + USearch example would look like this:

from numba import cfunc, types, carray
from usearch.index import Index, MetricKind, MetricSignature, CompiledMetric

ndim = 256
signature = types.float32(
    types.CPointer(types.float32),
    types.CPointer(types.float32))

@cfunc(signature)
def inner_product(a, b):
    a_array = carray(a, ndim)
    b_array = carray(b, ndim)
    c = 0.0
    for i in range(ndim):
        c += a_array[i] * b_array[i]
    return 1 - c

index = Index(ndim=ndim, metric=CompiledMetric(
    pointer=inner_product.address,
    kind=MetricKind.IP,
    signature=MetricSignature.ArrayArray,
))

No scary C++ or Assembly involved.

Cppyy

Similarly, you can use Cppyy with Cling to JIT-compile native C or C++ code and pass it to USearch, which may be a good idea if you want to explicitly request loop-unrolling or other low-level optimizations!

import cppyy
import cppyy.ll

cppyy.cppdef("""
float inner_product(float *a, float *b) {
    float result = 0;
#pragma unroll
    for (size_t i = 0; i != ndim; ++i)
        result += a[i] * b[i];
    return 1 - result;
}
""".replace("ndim", str(ndim)))

function = cppyy.gbl.inner_product
index = Index(ndim=ndim, metric=CompiledMetric(
    pointer=cppyy.ll.addressof(function),
    kind=MetricKind.IP,
    signature=MetricSignature.ArrayArray,  # the C function above takes two pointers and no size
))

PeachPy

Instead of writing in C or C++, you can go one level down and directly assemble a kernel for x86. Below is an example of constructing the "Inner Product" distance for 8-dimensional f32 vectors for x86 using PeachPy.

from peachpy import (
    Argument,
    ptr,
    float_,
    const_float_,
)
from peachpy.x86_64 import (
    abi,
    Function,
    uarch,
    isa,
    GeneralPurposeRegister64,
    LOAD,
    YMMRegister,
    VSUBPS,
    VADDPS,
    VHADDPS,
    VMOVUPS,
    VFMADD231PS,
    VPERM2F128,
    VXORPS,
    RETURN,
)

a = Argument(ptr(const_float_), name="a")
b = Argument(ptr(const_float_), name="b")

with Function(
    "inner_product", (a, b), float_, target=uarch.default + isa.avx2
) as asm_function:
    
    # Request two 64-bit general-purpose registers for addresses
    reg_a, reg_b = GeneralPurposeRegister64(), GeneralPurposeRegister64()
    LOAD.ARGUMENT(reg_a, a)
    LOAD.ARGUMENT(reg_b, b)

    # Load the vectors
    ymm_a = YMMRegister()
    ymm_b = YMMRegister()
    VMOVUPS(ymm_a, [reg_a])
    VMOVUPS(ymm_b, [reg_b])

    # Prepare the accumulator and a zeroed register
    # (XOR-ing a register with itself yields all zeros)
    ymm_c = YMMRegister()
    ymm_zero = YMMRegister()
    VXORPS(ymm_c, ymm_c, ymm_c)
    VXORPS(ymm_zero, ymm_zero, ymm_zero)

    # Accumulate A and B products into C
    VFMADD231PS(ymm_c, ymm_a, ymm_b)

    # Reduce the contents of a YMM register:
    # swap the 128-bit halves, add, then two horizontal adds
    ymm_c_permuted = YMMRegister()
    VPERM2F128(ymm_c_permuted, ymm_c, ymm_c, 1)
    VADDPS(ymm_c, ymm_c, ymm_c_permuted)
    VHADDPS(ymm_c, ymm_c, ymm_c)
    VHADDPS(ymm_c, ymm_c, ymm_c)

    # Negate the values from "similarity" to "distance": 0 - c.
    # Note that this returns -c, not 1 - c as in the Numba example.
    VSUBPS(ymm_c, ymm_zero, ymm_c)

    # A common convention is to return floats in XMM registers
    RETURN(ymm_c.as_xmm)

python_function = asm_function.finalize(abi.detect()).encode().load()
metric = CompiledMetric(
    pointer=python_function.loader.code_address,
    kind=MetricKind.IP,
    signature=MetricSignature.ArrayArray,
)
index = Index(ndim=8, metric=metric)  # the kernel above handles exactly 8 dimensions
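
The horizontal reduction in the kernel above is easier to follow when emulated on plain Python lists. The sketch below models each instruction as a small function acting on 8-element lists, with index 0 as the lowest lane. The helper names are hypothetical and only mimic the documented lane behaviour of the AVX instructions; this is not PeachPy code.

```python
def vperm2f128_swap(c):
    # VPERM2F128 with imm8 = 1 and both sources equal: swap the 128-bit halves
    return c[4:] + c[:4]


def vaddps(x, y):
    # Element-wise addition of two 8-lane registers
    return [p + q for p, q in zip(x, y)]


def vhaddps(x, y):
    # Horizontal add of adjacent pairs, done separately in each 128-bit half
    out = []
    for lane in (0, 4):
        out += [
            x[lane] + x[lane + 1], x[lane + 2] + x[lane + 3],
            y[lane] + y[lane + 1], y[lane + 2] + y[lane + 3],
        ]
    return out


# Pretend these are the eight per-lane products a[i] * b[i]
products = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

c = vaddps(products, vperm2f128_swap(products))  # both halves hold the same sums
c = vhaddps(c, c)                                # pairs collapsed once
c = vhaddps(c, c)                                # every lane holds the total
print(c)  # [36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0]
```

Since every lane ends up holding the full sum, returning the low XMM part of the register is enough to hand a single float back to the caller.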

SimSIMD

Last but not least, for the most common metrics and vector spaces, I am going to provide pre-compiled optimized CPython Capsules as part of SimSIMD. Let us know which embeddings you use the most: OpenAI, BERT, ViT, or maybe UForm?

Release Notes: 0.19.0 (2023-06-24)

Add

Fix

  • Re-indexing vectors on restart (529c137)

Improve

  • Preserve metric_kind_t in snapshots (5ec0b65)
  • Support Cppyy JIT compilation (80f99cd)
  • Support signatures for JIT-ed metrics (c89e145)

Hashes

  • Source code (zip): 19c2e9f41134c17ac5db20567b268ee1fce1b4d5dbb42a8b6e5a252ac9e242de
  • Source code (tar.gz): 34e5bd00639504d752f0ab5eb96e62bc34ab18537e6e31823f3e487a4d8139a3

v0.18.8

17 Jun 18:34

0.18.8 (2023-06-17)

Fix

Improve

  • Broader types support for Numba (1631638)

Hashes

  • Source code (zip) : 13495be36d302788278df0f0dc726cc1cf5708a2dbdc8b42b0e3f228e7a5ef7b
  • Source code (tar.gz) : 069072fa5ad6817f24fa1d2599dac41263c78096bccb97382d6447c733df279d

v0.18.7

17 Jun 17:47

0.18.7 (2023-06-17)

Fix

Hashes

  • Source code (zip) : 454b6c9f3dc0fbae5db62d58706e31c3125c67504850a14817fbb2eec80c3ce2
  • Source code (tar.gz) : db99edc9db3431fc881ca69eda0018733e67651d696b0aa47ab460e1d8196f60

v0.18.6

17 Jun 16:21

0.18.6 (2023-06-17)

Fix

  • Deprecation warning for np.alltrue (7a82bdb)

Hashes

  • Source code (zip) : 374517347750a708389f9d33c00ea188a01c5b5870311d898e142e8d85d62d5b
  • Source code (tar.gz) : 8b567cea32c86508c16f5b3eed3297c6e50c4444e47969195756166cab0b135a

v0.18.5

17 Jun 16:00

0.18.5 (2023-06-17)

Fix

  • Importing and exporting bit-vectors (45af308)

Hashes

  • Source code (zip) : 73c82224ff0d9dfd38661f758bd4c3424fb8f70e209843df488babce96eb6324
  • Source code (tar.gz) : f4b4e7e411fa83746f2c2a188b39def6c2a5f59b3b7f9382c811b363dc9db1b2

v0.18.4

17 Jun 15:08

0.18.4 (2023-06-17)

Chore

  • Connecting to the right tmux session (150a9be)

Fix

  • Dispatching SIMD pre-AVX512 and pre-Neon (da517d2)
  • Windows build for tsl::robin_map (d9f90b3)

Make

  • Degrade Python builds for compatibility (cc43eb3)

Hashes

  • Source code (zip) : 1c7f3b8d02d3685864bbccfa83f7891943d3b286843254194f407b21de16ba36
  • Source code (tar.gz) : 4ab478f91b16f9449373213210565e000ff44487d8a79014f1c7155cb94bd3e7

v0.18.3

17 Jun 12:35

0.18.3 (2023-06-17)

Make

  • Macros across Python and C++ (b857e66)

Hashes

  • Source code (zip) : e8a4017508a13bad954628625ed4672798370b79dbd8d3219b8d3e50c32236fb
  • Source code (tar.gz) : f8fbe4f32be2b8c65b17950804e4daa32846eca89bc1a3fd340655a03ec64aed