Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature fix definition of the inner product distance #419

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kmkolasinski
Copy link

@kmkolasinski kmkolasinski commented May 22, 2024

This PR is related to this thread #405

It proposed a simple fix for Inner Product distance definition

P = - sum(a[i] * b[i])

@kmkolasinski
Copy link
Author

Now, I think the problem is somewhere else, inside simsimd package

import simsimd

vf16 = np.array([5.0, 5.0, 5.0], dtype=np.float16)
vi8 = np.array([5, 5, 5], dtype=np.int8)

np.array(simsimd.dot(vf16, vf16)), np.array(simsimd.dot(vi8, vi8))
>> (array(75.), array(1.78813934e-07))

@kmkolasinski
Copy link
Author

another place which is probably causing confusion

#if SIMSIMD_TARGET_NEON
        if (viable & simsimd_cap_neon_k)
            switch (kind) {
            case simsimd_metric_dot_k: *m = (m_t)&simsimd_cos_i8_neon, *c = simsimd_cap_neon_k; return;
            case simsimd_metric_cos_k: *m = (m_t)&simsimd_cos_i8_neon, *c = simsimd_cap_neon_k; return;
            case simsimd_metric_l2sq_k: *m = (m_t)&simsimd_l2sq_i8_neon, *c = simsimd_cap_neon_k; return;
            default: break;
            }
#endif
#if SIMSIMD_TARGET_ICE
        if (viable & simsimd_cap_ice_k)
            switch (kind) {
            case simsimd_metric_dot_k: *m = (m_t)&simsimd_cos_i8_ice, *c = simsimd_cap_ice_k; return;
            case simsimd_metric_cos_k: *m = (m_t)&simsimd_cos_i8_ice, *c = simsimd_cap_ice_k; return;
            case simsimd_metric_l2sq_k: *m = (m_t)&simsimd_l2sq_i8_ice, *c = simsimd_cap_ice_k; return;
            default: break;
            }
#endif
#if SIMSIMD_TARGET_HASWELL
        if (viable & simsimd_cap_haswell_k)
            switch (kind) {
            case simsimd_metric_dot_k: *m = (m_t)&simsimd_cos_i8_haswell, *c = simsimd_cap_haswell_k; return;
            case simsimd_metric_cos_k: *m = (m_t)&simsimd_cos_i8_haswell, *c = simsimd_cap_haswell_k; return;
            case simsimd_metric_l2sq_k: *m = (m_t)&simsimd_l2sq_i8_haswell, *c = simsimd_cap_haswell_k; return;
            default: break;
            }
#endif

        if (viable & simsimd_cap_serial_k)
            switch (kind) {
            case simsimd_metric_dot_k: *m = (m_t)&simsimd_cos_i8_serial, *c = simsimd_cap_serial_k; return;
            case simsimd_metric_cos_k: *m = (m_t)&simsimd_cos_i8_serial, *c = simsimd_cap_serial_k; return;
            case simsimd_metric_l2sq_k: *m = (m_t)&simsimd_l2sq_i8_serial, *c = simsimd_cap_serial_k; return;
            default: break;
            }

        break;

cos is used for dot ?

@ashvardanian
Copy link
Contributor

Yes, that was historically the case, but that should probably
be changed. I was planning to look into it next week. It may require new kernels, but those should be easy to design by stripping the cosine implementations.

@kmkolasinski
Copy link
Author

dot - is not a strict distance metric in math terms, hence probably the confusion in various places in the code

@kmkolasinski
Copy link
Author

For int8, I see index.hardware_acceleration == "ice" and there is still something which I don't fully understand. I tested my changes locally by installing usearch from source code, by running

pip install . 

which definitely compiles the usearch code. Maybe the code is already fixed in the main branch as compared to usearch==2.12.0 ?

@kmkolasinski
Copy link
Author

kmkolasinski commented May 22, 2024

Nope to my question above. When running on main branch I get this
image
then switch to this branch and reinstall usearch
image

note also speed change for both runs, is it metric_ip_gt is used only for graph construction ?

@kmkolasinski
Copy link
Author

@ashvardanian do you know how it is possible ? I still don't undestand how the kernels are being picked, but it looks like simsimd should be used here, so my changes should not affect the search.

@ashvardanian
Copy link
Contributor

SimSIMD should be used in most Linux builds. But you are right to note, that there is a lot of stuff on the main-dev that hasn't been merged yet 🤗

ashvardanian added a commit to ashvardanian/SimSIMD that referenced this pull request May 26, 2024
Add: Swift bindings & `int8` dot-products

With these changes we can make USearch behavior
more predictable for `int8` inputs.

unum-cloud/usearch#405
unum-cloud/usearch#419
@kmkolasinski
Copy link
Author

hi, this probably can be closed, right ?

@ashvardanian
Copy link
Contributor

It will be automatically closed, as we merge the branch 🤗

@kmkolasinski
Copy link
Author

ok, thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants