Skip to content

Comments

feat: Generalised Minkowski Metric (L_p norm)#287

Open
cbueth wants to merge 40 commits intosdd:masterfrom
cbueth:feature/minkowski
Open

feat: Generalised Minkowski Metric (L_p norm)#287
cbueth wants to merge 40 commits intosdd:masterfrom
cbueth:feature/minkowski

Conversation

@cbueth
Copy link

@cbueth cbueth commented Feb 8, 2026

This PR builds on #286 and adds the power metric $\mathcal{L}_p^p$. If a user wants Minkowski distances, they can use $\mathcal{L}_p^p$ and then take the $\sqrt[p]{\text{dist}}$, similar to SquaredEuclidean which is $\mathcal{L}_2^2$. While this is not depending on the max-based changes from the previous PR, it uses the metric testing infrastructure to reassure correct behaviour.

The new metrics are:

  • Minkowski<const P: u32>: An implementation for integer powers using powi. It should be significantly faster than the floating-point version.
  • MinkowskiF64<const P_BITS: u64>: A flexible implementation for arbitrary floating-point powers (like $P=0.5$), utilizing a const-generic bit-representation to remain compatible with stable Rust.

I have written the docstrings for correct usage, guiding users toward Manhattan ($P=1$) and SquaredEuclidean ($P=2$) for common cases, as these might have a performance advantage. At compile time it's checked if the given P is 1 or 2. The new metrics easily satisfy the common_metric_tests suite from #286, verifying mathematical properties (non-negativity, identity, symmetry) across dimensions 1–5. They also work with the nearest-neighbor tests (Gaussian, ties, no-ties) for both integer and fractional powers. Furthermore, I added documentation aliases for "taxicab", "l1", "l2", and "euclidean" to help users find the correct metric via standard mathematical terminology.

depends on:

sdd and others added 22 commits December 8, 2025 07:07
fixes bug where nearest_n_within accessed self.content_items instead of
remainder_items for remainder elements, causing incorrect results when
dataset size % CHUNK_SIZE != 0. Also removed unnecessary unsafe code in
best_n_within.

Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
tests nearest_n_within with size-33 dataset to verify items in remainder
region are found correctly. Before the fix, this would access
self.content_items[0] instead of remainder_items[0], returning wrong items.

Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
If leaf_items.len() exceeds u32::MAX (~4.3 billion), this silently
truncates. For datasets with billions of points, this is realistic and
causes severe corruption.
* release-plz checkout depth fixed so that full changelogs are generated
* add commitlint with conventional commits config
…thin_unsorted_iter

within_unsorted_iter is modified to decouple the lifetime
of the iterator from that of the query by performing a
generally very cheap copy just once at the start of the query
@codecov
Copy link

codecov bot commented Feb 8, 2026

Codecov Report

❌ Patch coverage is 96.01911% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.03%. Comparing base (2056051) to head (77d5bfc).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/fixed/distance.rs 84.68% 15 Missing and 2 partials ⚠️
src/float/distance.rs 98.36% 3 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
+ Coverage   94.89%   95.03%   +0.13%     
==========================================
  Files          54       54              
  Lines        5705     6307     +602     
  Branches     5705     6307     +602     
==========================================
+ Hits         5414     5994     +580     
- Misses        273      288      +15     
- Partials       18       25       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

"MinkowskiF64<P as F64> with power that is basically integer. Consider using Minkowski<P as u32> instead.",
);
}
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding a check here that P is finite and not NaN when cast back to f64

#[inline]
#[allow(clippy::let_unit_value)]
fn dist(a: &[A; K], b: &[A; K]) -> A {
let _ = Self::CHECK_P;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a small comment to indicate that this is a compile-time-only check thanks to the const implementation

@sdd
Copy link
Owner

sdd commented Feb 8, 2026

Hi Carlson,

Another great PR 😄 Thanks for this one also. I agree, there are likely to be some optimisations that can be made for the more common L1 and L2 cases and so I like that you've kept separate definitions for those rather than them being re-implemented purely as type aliases to Minkowski<1> and Minkowski<2>. Regarding the float power variant, I'm going to experiment with some of these changes on the v6.x branch and see how well they fit in.

As with the other PR, can we raise this one targeting the v5.x.x branch rather than master as well if possible.

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants