why the medoids are outside of clusters #88
Comments
Visualizing 100-dimensional vectors is very tricky. I don't know how you reduced the dimension, but the curse of dimensionality makes the geometry unintuitive: informally, we often say that in very high dimension all the points are far from one another. Even if a point is at the center of its cluster, as you say, its projection is not necessarily at the center of the projected cluster. Using cosine distance does not help either, because cosine distance is less intuitive than Euclidean distance. Example:
I obtain the following figure, with the centroids in green.

EDIT: Note that it can also be a convergence problem. KMedoids is not very stable in high dimension, and really no clustering algorithm is. It could also be a bug, but we can't conclude that from what you have described alone.
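The informal claim above, that points in very high dimension are all far from one another, can be checked with a small experiment. This is only a sketch under assumptions that are not from the thread: uniform random points, 50 of them, Euclidean distance, and a hypothetical helper named `min_max_distance_ratio`. It compares the smallest and largest pairwise distance in 2 and in 100 dimensions.

```python
import math
import random

random.seed(0)  # arbitrary seed, only for reproducibility


def min_max_distance_ratio(n_points, n_dims):
    """Ratio of the smallest to the largest pairwise Euclidean distance
    among uniform random points in the unit cube (illustrative helper)."""
    pts = [[random.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return min(dists) / max(dists)


# Assumed sizes: 50 points, compared in 2 and in 100 dimensions.
ratio_2d = min_max_distance_ratio(50, 2)
ratio_100d = min_max_distance_ratio(50, 100)

print("min/max distance ratio in 2-D:  ", ratio_2d)
print("min/max distance ratio in 100-D:", ratio_100d)
```

A ratio close to 1 means the nearest and the farthest pair are almost equally far apart, so "center" and "edge" of a cluster become much less distinct than a 2-D picture suggests.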
You used cosine similarity, so the magnitude of the vectors does not matter, only the angle between them.
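To make the point about angle concrete, here is a minimal sketch in plain Python. The two vectors and the helper names `cosine_distance` and `euclidean_distance` are made up for illustration. The vectors point in the same direction but differ in length by a factor of ten.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = [1.0, 2.0, 3.0]
v = [10.0, 20.0, 30.0]  # same direction as u, ten times longer

d_cos = cosine_distance(u, v)
d_euc = euclidean_distance(u, v)

print("cosine distance:   ", d_cos)  # essentially zero
print("euclidean distance:", d_euc)  # clearly non-zero
```

Under cosine distance these two points are as good as identical, even though a scatter plot would draw them far apart. A medoid chosen with cosine distance can therefore look badly placed in a plot that shows Euclidean positions.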
For the Euclidean distance, we observe the same phenomenon.
`center1` and `center2` are indexes into the `y` and `~y` subsets in your code, not into the full data.
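The code being discussed is not reproduced here, so the following is only a sketch of the general pitfall, with made-up data and hypothetical names (`data`, `mask`, `medoid_position`). An index computed inside a masked subset has to be mapped back before it is used on the full data.

```python
# Hypothetical data: six 1-D points and a boolean mask splitting them in two groups.
data = [0.0, 10.0, 1.0, 11.0, 2.0, 12.0]
mask = [True, False, True, False, True, False]


def medoid_position(values):
    """Position within `values` of the element with the smallest
    total absolute distance to all the others."""
    costs = [sum(abs(v - w) for w in values) for v in values]
    return costs.index(min(costs))


# Original row numbers of the points selected by the mask.
subset_rows = [i for i, keep in enumerate(mask) if keep]
subset = [data[i] for i in subset_rows]

local_idx = medoid_position(subset)   # index into the subset only
global_idx = subset_rows[local_idx]   # index into the full data

print("wrong point:", data[local_idx])   # picks a point from the other group
print("right point:", data[global_idx])  # the actual medoid of the subset
```

Using `local_idx` directly on the full data selects a point from the wrong group, which in a plot would look like a center lying outside its cluster.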
Because you only plot 2 of the 100 dimensions. The medoid is not central in each dimension individually.
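A rough way to see this is to compute a medoid in the full space and then check where it sits along each single axis. The sketch below assumes one synthetic Gaussian cluster, Euclidean distance, and illustrative sizes (60 points, 100 dimensions). None of this comes from the original data.

```python
import math
import random

random.seed(0)  # arbitrary seed, only for reproducibility

N_POINTS, N_DIMS = 60, 100  # illustrative sizes, not from the issue

# One synthetic Gaussian cluster in 100 dimensions.
points = [[random.gauss(0.0, 1.0) for _ in range(N_DIMS)] for _ in range(N_POINTS)]

# The medoid is the data point with the smallest total distance to all points.
total_dist = [sum(math.dist(p, q) for q in points) for p in points]
medoid_idx = min(range(N_POINTS), key=total_dist.__getitem__)
medoid = points[medoid_idx]

# For each single coordinate, rank the medoid among all points (0 = smallest value).
ranks = [sum(1 for p in points if p[d] < medoid[d]) for d in range(N_DIMS)]

# A rank near N_POINTS / 2 means the medoid is central along that axis.
least_central_axis = max(range(N_DIMS), key=lambda d: abs(ranks[d] - N_POINTS / 2))

print("medoid index:", medoid_idx)
print("rank along its least central axis:", ranks[least_central_axis], "of", N_POINTS)
```

If the plotted pair of axes happens to include one where the rank is far from the middle, the medoid will appear near the edge of the cluster even though it is the most central point in the full space.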
Yes, that was exactly my point, thank you.
Hi
I have used sklearn_extra to cluster my data based on cosine similarity. The data are 100-dimensional vectors.
After clustering, I reduce the dimensionality to visualize the result. I am a bit confused about `kmedoids.cluster_centers_`: does it return the medoids of the clusters? Should the medoids be approximately in the middle of the clusters?
When I visualize the clusters, the points in `kmedoids.cluster_centers_` are outside of the clusters.