Is there a rule of thumb for the lower bound on the perplexity? #84

Open
NikTuzov opened this issue May 4, 2020 · 0 comments

NikTuzov commented May 4, 2020

Dear Dr. van der Maaten:

Could you help me better understand how the perplexity parameter works? I have two questions.

  1. Looking at the implementation, am I right that a reasonable upper bound on perplexity is 1/3 of the smallest expected cluster size? (For simplicity, assume we know what cluster sizes to expect.)

  2. On your home page, there is a question (“I get a strange ‘ball’ with uniformly distributed points”), and your suggestion is to reduce perplexity. Do you think the same “ball” effect can be seen when perplexity is too low? If so, how do you suggest we define a lower bound for perplexity?
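To make the arithmetic behind question 1 concrete, here is a minimal sketch. It assumes that the Barnes-Hut implementation builds its input similarities from roughly `3 * perplexity` nearest neighbors per point, which is my reading of the code and the reason for the factor 1/3. The function names are hypothetical and are not part of any t-SNE library.

```python
def neighbors_used(perplexity: float) -> int:
    """Nearest neighbors per point, ASSUMING k = floor(3 * perplexity)."""
    return int(3 * perplexity)


def perplexity_upper_bound(min_cluster_size: int) -> float:
    """Largest perplexity for which the assumed 3 * perplexity neighbors
    of a point could all come from its own (smallest) cluster."""
    return min_cluster_size / 3.0


if __name__ == "__main__":
    # With clusters of about 4000 points, the heuristic bound is about 1333.
    print(perplexity_upper_bound(4000))
    # The default perplexity of 30 would look at only 90 neighbors per point.
    print(neighbors_used(30))
```

Under this assumption, any perplexity above the bound forces each point's neighborhood to include points from other clusters.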

Regarding 2), I have a digit-image data set with 40,000 points that is supposed to contain 10 clusters of about the same size. When I subsample 2,000 points and run Rtsne with its defaults (its implementation is very similar to yours), the embedding looks nice. However, it is far worse on the full data set. I figured this was because the default perplexity of 30 was too low relative to the typical cluster size of 4,000, so I scaled it by the subsampling ratio, 40,000 / 2,000 = 20, to 30 × 20 = 600 and obtained a very nice embedding.
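The scaling I used can be written as a short sketch. It is only the heuristic described above (keep the ratio of perplexity to cluster size the same in the subsample and the full data set), not an established rule, and the function name is hypothetical.

```python
def scaled_perplexity(base_perplexity: float, n_subsample: int, n_full: int) -> float:
    """Scale a perplexity that worked on a subsample up to the full data set,
    keeping perplexity / (number of points) constant."""
    return base_perplexity * n_full / n_subsample


if __name__ == "__main__":
    n_clusters = 10
    n_sub, n_full = 2000, 40000

    perp_full = scaled_perplexity(30, n_sub, n_full)
    print(perp_full)  # 30 * 20 = 600.0

    # Perplexity relative to the typical cluster size is the same in both runs.
    print(30 / (n_sub / n_clusters))          # subsample: clusters of ~200 points
    print(perp_full / (n_full / n_clusters))  # full data: clusters of ~4000 points
```

With equally sized clusters, this keeps each point's effective neighborhood at the same fraction of its cluster (here 15%) in both runs.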

When the expected result is unknown, I guess one could use a similar subsampling approach to decide how much to increase perplexity. Do you know of a more analytical method or a rule of thumb?

Regards,
Nik Tuzov, PhD
