feat: Generalised Minkowski Metric (L_p norm)#287
feat: Generalised Minkowski Metric (L_p norm)#287cbueth wants to merge 40 commits intosdd:masterfrom
Conversation
…ded in Rust 1.54 instead
fixes bug where nearest_n_within accessed self.content_items instead of remainder_items for remainder elements, causing incorrect results when dataset size % CHUNK_SIZE != 0. Also removed unnecessary unsafe code in best_n_within. Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
tests nearest_n_within with size-33 dataset to verify items in remainder region are found correctly. Before the fix, this would access self.content_items[0] instead of remainder_items[0], returning wrong items. Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
If leaf_items.len() exceeds u32::MAX (~4.3 billion), this silently truncates. For datasets with billions of points, this is realistic and causes severe corruption.
* release-plz checkout depth fixed so that full changelogs are generated * add commitlint with conventional commits config
…thin_unsorted_iter within_unsorted_iter is modified to decouple the lifetime of the iterator from that of the query by performing a generally very cheap copy just once at the start of the query
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #287 +/- ##
==========================================
+ Coverage 94.89% 95.03% +0.13%
==========================================
Files 54 54
Lines 5705 6307 +602
Branches 5705 6307 +602
==========================================
+ Hits 5414 5994 +580
- Misses 273 288 +15
- Partials 18 25 +7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| "MinkowskiF64<P as F64> with power that is basically integer. Consider using Minkowski<P as u32> instead.", | ||
| ); | ||
| } | ||
| }; |
There was a problem hiding this comment.
It may be worth adding a check here that P is finite and not NaN when cast back to f64
| #[inline] | ||
| #[allow(clippy::let_unit_value)] | ||
| fn dist(a: &[A; K], b: &[A; K]) -> A { | ||
| let _ = Self::CHECK_P; |
There was a problem hiding this comment.
Can we add a small comment to indicate that this is a compile-time-only check thanks to the const implementation
|
Hi Carlson, Another great PR 😄 Thanks for this one also. I agree, there are likely to be some optimisations that can be made for the more common L1 and L2 cases and so I like that you've kept separate definitions for those rather than them being re-implemented purely as type aliases to As with the other PR, can we raise this one targeting the Thanks again! |
- deprecate `rd_update` with `D::accumulate` for consistent handling of sum-based and max-based metrics - conditional logic for SIMD (L1/L2) and general L∞ - differentiate distance accumulation behaviour
- integration `nearest_n` tests (Chebyshev, Manhattan, SquaredEuclidean).
…doc, add Gaussian scenario to tests
… trait - improve test coverage
…fy usage scenarios
77d5bfc to
8970913
Compare
This PR builds on #286 and adds the power metric$\mathcal{L}_p^p$ . If a user wants Minkowski distances, they can use $\mathcal{L}_p^p$ and then take the $\sqrt[p]{\text{dist}}$ , similar to $\mathcal{L}_2^2$ . While this is not depending on the max-based changes from the previous PR, it uses the metric testing infrastructure to reassure correct behaviour.
SquaredEuclideanwhich isThe new metrics are:
Minkowski<const P: u32>: An implementation for integer powers usingpowi. It should be significantly faster than the floating-point version.MinkowskiF64<const P_BITS: u64>: A flexible implementation for arbitrary floating-point powers (likeI have written the docstrings for correct usage, guiding users toward$P=1$ ) and $P=2$ ) for common cases, as these might have a performance advantage. At compile time it's checked if the given P is 1 or 2. The new metrics easily satisfy the
Manhattan(SquaredEuclidean(common_metric_testssuite from #286, verifying mathematical properties (non-negativity, identity, symmetry) across dimensions 1–5. They also work with the nearest-neighbor tests (Gaussian, ties, no-ties) for both integer and fractional powers. Furthermore, I added documentation aliases for "taxicab", "l1", "l2", and "euclidean" to help users find the correct metric via standard mathematical terminology.depends on: