About kmeans clustering #20

zheng-xing · 2024-04-28T03:02:04Z

Hi,

First, thanks for such great work and for making it open.

I noticed that in your paper you mention:

you can cluster 400 million samples into 1 million clusters within 10 minutes
In Table 5, three cluster counts are mentioned: 100K, 1M, and 10M

Could you add more details about which particular tools you used for this clustering step?
I am very curious, as k-means can usually only handle a small number of clusters.

Thanks very much.

anxiangsir · 2024-04-28T06:21:05Z

We utilized a cluster of 20 machines, each equipped with 8 V100 GPUs, for parallel hierarchical clustering. Each V100 was responsible for clustering 20 million images into 1 million cluster centroids. Subsequently, we aggregated the centroids from all 20 machines, each contributing 1 million centroids, into a final set of 1 million centroids.

The library employed for this operation was faiss-gpu.
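
For readers who want to see the shape of the two-stage scheme described above, here is a minimal sketch. It is not the authors' code, which is not included in this thread. To keep it dependency-free, a plain Lloyd's k-means in pure Python stands in for faiss-gpu; in a real run the `kmeans` function would presumably be replaced by something like `faiss.Kmeans(d, k, gpu=True)`. The names `kmeans`, `two_stage_kmeans`, and `make_shard`, the toy data, and all sizes are hypothetical. Note also that 20 shards of 20 million images add up to the 400 million samples in the paper, so the sketch assumes one shard per worker; that reading is an assumption.

```python
import random


def kmeans(points, k, n_iter=10, seed=0):
    """Plain Lloyd's k-means (stand-in for faiss-gpu); returns k centroids."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise centroids from k randomly chosen input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            # Assign the point to its nearest centroid (squared Euclidean distance).
            best = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            counts[best] += 1
            for i, v in enumerate(p):
                sums[best][i] += v
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids


def two_stage_kmeans(shards, k_local, k_final, n_iter=10):
    """Cluster each shard independently, then cluster the pooled centroids."""
    pooled = []
    # Stage 1: in a distributed setup each shard would run on its own worker.
    for i, shard in enumerate(shards):
        pooled.extend(kmeans(shard, k_local, n_iter, seed=i))
    # Stage 2: reduce the pooled local centroids to the final centroid set.
    return kmeans(pooled, k_final, n_iter, seed=len(shards))


def make_shard(n, seed):
    """Toy 2-D data drawn around three blob centres (illustrative only)."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (5.0, 5.0), (-5.0, 5.0)]
    return [
        [cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
        for cx, cy in (rng.choice(centres) for _ in range(n))
    ]


# 4 shards of 200 points each, 6 local centroids per shard, 3 final centroids.
shards = [make_shard(200, seed=100 + i) for i in range(4)]
final = two_stage_kmeans(shards, k_local=6, k_final=3)
```

The second stage only ever sees `len(shards) * k_local` points, which is what keeps the final reduction cheap compared with clustering the full dataset in one pass. The trade-off is that the result approximates, rather than matches, a single global k-means.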

zhangluustb · 2024-05-23T10:23:44Z

> We utilized a cluster of 20 machines, each equipped with 8 V100 GPUs, for parallel hierarchical clustering. Each V100 was responsible for clustering 20 million images into 1 million cluster centroids. Subsequently, we aggregated the centroids from all 20 machines, each contributing 1 million centroids, into a final set of 1 million centroids.
>
> The library employed for this operation was faiss-gpu.

Thank you for sharing. May I ask if this portion of the code can be made open source?
