Problem
After adding UltraLogLog support to Apache Pinot, I've been looking at adding some of the MinHash variants, but to do this I need a reliable way to merge them when running SQL queries or merging rows.
Solution
I'd like the SimilarityHasher interface to also have a merge method that takes two byte[] and returns a byte[] that represents the merged state.
Merging is currently not supported. In general, the finalization steps in all similarity hashing algorithms can truncate information so that the resulting signatures cannot be further merged but require less memory. A possible solution would be to introduce an intermediate representation that can be merged.
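To illustrate what such an intermediate representation could look like, here is a minimal sketch for classic MinHash only. It is not the library's API: the class `MergeableMinHash`, its methods, and the SplitMix64-style `mix` function are all hypothetical names chosen for this example. The idea is to keep the full 64-bit minimum per component, which can be merged with an element-wise unsigned minimum, and to truncate to a compact signature only as a final step.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical mergeable state for classic MinHash (illustration only). */
final class MergeableMinHash {
    // One full 64-bit minimum per component; nothing is truncated yet.
    private final long[] minima;

    MergeableMinHash(int numComponents) {
        minima = new long[numComponents];
        Arrays.fill(minima, -1L); // all bits set = largest unsigned value = "empty"
    }

    // SplitMix64 finalizer, used here as a stand-in mixing function (assumption).
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Adds an element given by its 64-bit hash. */
    void add(long elementHash) {
        for (int i = 0; i < minima.length; i++) {
            long h = mix(elementHash + (i + 1) * 0x9E3779B97F4A7C15L);
            if (Long.compareUnsigned(h, minima[i]) < 0) {
                minima[i] = h;
            }
        }
    }

    /** Serializes the intermediate state: 8 bytes per component. */
    byte[] getState() {
        ByteBuffer out = ByteBuffer.allocate(minima.length * Long.BYTES);
        for (long m : minima) {
            out.putLong(m);
        }
        return out.array();
    }

    /** Merges two intermediate states by taking the element-wise unsigned minimum. */
    static byte[] merge(byte[] a, byte[] b) {
        if (a.length != b.length || a.length % Long.BYTES != 0) {
            throw new IllegalArgumentException("incompatible states");
        }
        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);
        ByteBuffer out = ByteBuffer.allocate(a.length);
        while (x.hasRemaining()) {
            long u = x.getLong();
            long v = y.getLong();
            out.putLong(Long.compareUnsigned(u, v) <= 0 ? u : v);
        }
        return out.array();
    }

    /** Lossy finalization: keeps one byte per component; the result is no longer mergeable. */
    static byte[] finalizeSignature(byte[] state) {
        ByteBuffer in = ByteBuffer.wrap(state);
        byte[] signature = new byte[state.length / Long.BYTES];
        for (int i = 0; i < signature.length; i++) {
            signature[i] = (byte) in.getLong();
        }
        return signature;
    }
}
```

With this layout, merging the states of two sets gives the same bytes as building the state of their union directly, which is the property a SQL aggregation or row merge would need. The cost is memory: 8 bytes per component here, versus 1 byte per component after finalization. Other similarity hashing algorithms would likely need a different intermediate state.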