About kmeans clustering #20

Open
zheng-xing opened this issue Apr 28, 2024 · 2 comments

@zheng-xing

Hi,

First, thanks for such great work and for making it open.

I noticed that your paper mentions:

  • you can cluster 400 million samples into 1 million clusters within 10 minutes
  • Table 5 lists three cluster counts: 100K, 1M, and 10M

Could you add more details about which particular tools you used for this clustering step?
I am very curious, as k-means can usually only handle a small number of clusters.

Thanks very much.

@anxiangsir
Collaborator

We utilized a cluster of 20 machines, each equipped with 8 V100 GPUs, for parallel hierarchical clustering. Each V100 was responsible for clustering 20 million images into 1 million cluster centroids. Subsequently, we aggregated the centroids from all 20 machines, each contributing 1 million centroids, into a final set of 1 million centroids.

The library employed for this operation was faiss-gpu.
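
For readers who want to try the idea, the two-stage scheme described above can be sketched roughly as follows. This is not the authors' code, which is not posted in this thread. It is a minimal single-process illustration, and the function names (`faiss_kmeans`, `two_stage_kmeans`), the `niter` and `seed` defaults, and the use of `np.array_split` for sharding are all assumptions made for the example. In a real setup each shard would be fitted on its own GPU or machine rather than in a Python loop, and the sketch does not depend on whether a shard maps to one GPU or one machine.

```python
import numpy as np


def faiss_kmeans(x, k, niter=20, seed=0, gpu=False):
    """Fit k-means with faiss and return the (k, d) centroid matrix.

    Hypothetical helper: niter and seed are illustrative defaults.
    Set gpu=True to use faiss-gpu when it is installed.
    """
    import faiss  # imported lazily so the sharding logic runs without faiss

    x = np.ascontiguousarray(x, dtype=np.float32)  # faiss expects float32
    km = faiss.Kmeans(x.shape[1], k, niter=niter, seed=seed, gpu=gpu)
    km.train(x)
    return km.centroids


def two_stage_kmeans(x, n_shards, k, fit=faiss_kmeans):
    """Cluster each shard to k centroids, then cluster the pooled centroids to k."""
    shards = np.array_split(x, n_shards)         # one shard per worker
    local = [fit(shard, k) for shard in shards]  # stage 1: parallel in practice
    pooled = np.concatenate(local, axis=0)       # shape (n_shards * k, d)
    return fit(pooled, k)                        # stage 2: final k centroids
```

A call such as `two_stage_kmeans(features, n_shards=20, k=1_000_000)` would mirror the shape of the numbers quoted above, although at that scale the data would not fit in one process. Note that stage 2 sees only `n_shards` points per final centroid, so faiss may warn that the training set is small relative to `k`.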

@zhangluustb

> We utilized a cluster of 20 machines, each equipped with 8 V100 GPUs, for parallel hierarchical clustering. Each V100 was responsible for clustering 20 million images into 1 million cluster centroids. Subsequently, we aggregated the centroids from all 20 machines, each contributing 1 million centroids, into a final set of 1 million centroids.
>
> The library employed for this operation was faiss-gpu.

Thank you for sharing. May I ask if this portion of the code can be made open source?
