v2.16: Mixed-Precision K-Means
Many data scientists embark on their journey by implementing K-Means clustering, much like app developers starting with a calculator. But despite K-Means’ popularity, most implementations overlook the power of SIMD on modern CPUs. Efficient vector math is hard to achieve with single- and double-precision floating-point vectors, where accuracy comes at a steep computational cost. Meanwhile, `float16`, `bfloat16`, and smaller types can fail under uneven distributions or when computing centroids for large clusters. So, what’s Unum’s solution? Mixed precision!
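
To see why, here's a minimal NumPy sketch (illustrative only, not part of the release): summing a large cluster's coordinates in `float16` stalls once the running total outgrows the type's resolution, producing a badly wrong centroid.

```python
import numpy as np

# 10,000 points, all equal to 1.0: the true sum is 10,000.
points = np.ones(10_000, dtype=np.float16)

total = np.float16(0.0)
for p in points:        # naive float16 accumulation
    total += p

# Above 2048 the gap between adjacent float16 values is 2.0,
# so adding 1.0 no longer changes the sum: it stalls at 2048.
print(total)                              # 2048.0
print(points.astype(np.float64).sum())    # 10000.0 with a float64 accumulator
```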
Thanks to strong community support and sponsorship from @sutoiku (LinkedIn, Website), we're introducing a high-performance K-Means implementation! It utilizes any numeric type for distance calculations, switching to `float64` for centroid updates, a technique that boosts performance and enables billion-scale clustering on a single machine.
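
For intuition, here is a rough NumPy sketch of the idea, not USearch's actual implementation: the assignment step runs in compact `float16`, while centroid updates accumulate in `float64`. All names, shapes, and parameters below are illustrative assumptions.

```python
import numpy as np

def mixed_precision_kmeans(points, k, iterations=20, seed=0):
    """Toy Lloyd's iteration: float16 for distances, float64 for centroid updates."""
    rng = np.random.default_rng(seed)
    low = points.astype(np.float16)   # compact copy used only for distance math
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(np.float64)

    for _ in range(iterations):
        # Assignment step in float16: half the memory traffic, twice the SIMD lanes.
        diffs = low[:, None, :] - centroids.astype(np.float16)[None, :, :]
        labels = (diffs * diffs).sum(axis=-1).argmin(axis=1)

        # Update step in float64: wide accumulators avoid the saturation shown above.
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centroids[c] = members.astype(np.float64).mean(axis=0)
    return centroids, labels

# Example usage: cluster 10,000 random 32-dimensional vectors into 8 groups.
data = np.random.rand(10_000, 32).astype(np.float32)
centroids, labels = mixed_precision_kmeans(data, k=8)
```

The split mirrors the workload: distance comparisons only need to rank candidates, so they tolerate low precision, while centroid means aggregate millions of values and need wide accumulators.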