How to choose the best parameters for ensuring locally coherent structures with maximum spread #703
PranayMehta asked this question in Q&A
I am working on survey text data and want to use clustering to group survey responses into topics/clusters. For this, I use a BERT model to transform each record into a 768-dimensional vector. Since I want to apply dimensionality reduction, I am trying out UMAP to see whether a 2D projection retains the structure.
Sample data points look like this:
sentences = ['I am worried about the weather tomorrow' , ' I have a lot of homework', 'I am yet to take a covid19 vaccine']
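UMAP's `metric` parameter sets how distances between the 768-D input vectors are measured. It defaults to `'euclidean'`, and `'cosine'` is a common alternative for sentence embeddings because it compares only the direction of each vector. The NumPy sketch below uses random vectors as hypothetical stand-ins for BERT embeddings and checks two properties: cosine distance ignores vector length, and on L2-normalised vectors the squared Euclidean distance equals twice the cosine distance, so the two metrics rank neighbours the same way.

```python
import numpy as np


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity: depends only on the angle between u and v."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Straight-line distance: depends on both direction and length."""
    return float(np.linalg.norm(u - v))


rng = np.random.default_rng(0)
# Hypothetical stand-ins for two 768-D sentence embeddings (not real BERT output).
a = rng.normal(size=768)
b = rng.normal(size=768)

# Scaling a vector changes its Euclidean distance but not its cosine distance.
print(cosine_distance(a, 3 * a))     # approximately 0, up to floating-point rounding
print(euclidean_distance(a, 3 * a))  # equals 2 * ||a||, clearly non-zero

# On unit-length vectors: ||u - v||^2 == 2 * (1 - cos(u, v)).
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(euclidean_distance(a_unit, b_unit) ** 2, 2 * cosine_distance(a, b))
```

In umap-learn the choice is passed as `umap.UMAP(metric="cosine")`; whether it separates the topics better than the default is something to check on the actual survey data.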
In general, I have found this library and algorithm super helpful for identifying regions in the 2D plane. For example, a group of responses talks about topic A, others about topics B and C, and so on. My goal is to keep these topics as isolated as possible.
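One way to make "as isolated as possible" measurable is the silhouette coefficient, computed on the 2D embedding for a set of topic labels. This sketch assumes labels already exist for at least some responses (for example from a clustering step or a hand-labelled sample). The NumPy version follows the standard definition, which `sklearn.metrics.silhouette_score` also implements for the Euclidean metric, and is written out here only to stay self-contained. Values near 1 mean compact, well-separated topics; values near 0 mean overlapping ones.

```python
import numpy as np


def silhouette(points: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient of a labelled point cloud (needs >= 2 labels)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    if len(unique) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    # Full pairwise Euclidean distance matrix (memory grows as n^2, so keep n small).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    scores = np.zeros(len(points))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False
        if not same.any():
            continue  # singleton topic: score left at 0 by convention
        a = dist[i, same].mean()  # mean distance to the point's own topic
        # mean distance to the nearest other topic
        b = min(dist[i, labels == other].mean() for other in unique if other != own)
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


rng = np.random.default_rng(0)
# Two hypothetical, well-separated 2-D topic blobs.
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2)),
])
labels = np.array([0] * 20 + [1] * 20)
print(silhouette(points, labels))                   # close to 1
print(silhouette(points, rng.permutation(labels)))  # much lower: labels no longer match blobs
```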
However, I am a little confused by the number of parameters available in the UMAP method. @lmcinnes, it would be super helpful to know how to tune UMAP to achieve this. So far, I have tried several values for the parameters.
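Rather than trying values one at a time, a small grid search makes the comparison systematic. In this sketch, `reduce_fn` and `score_fn` are hypothetical callables supplied by the user: in real use `reduce_fn` could wrap `umap.UMAP(n_components=2, random_state=42, **params).fit_transform(embeddings)` and `score_fn` could be a silhouette score against known topic labels. A fake reducer stands in for UMAP so the sketch runs without umap-learn installed. The grid keys reuse UMAP's parameter names `n_neighbors`, `min_dist` and `spread`, and combinations with `min_dist > spread` are skipped because UMAP rejects them.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, List, Tuple

import numpy as np

Params = Dict[str, Any]


def sweep(
    reduce_fn: Callable[[Params], np.ndarray],
    score_fn: Callable[[np.ndarray], float],
    grid: Dict[str, Iterable[Any]],
) -> List[Tuple[Params, float]]:
    """Evaluate every parameter combination in `grid`; return (params, score), best first."""
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # UMAP raises an error when min_dist > spread, so skip those combinations
        # (umap-learn's default spread is 1.0 when it is not part of the grid).
        if params.get("min_dist", 0.0) > params.get("spread", 1.0):
            continue
        embedding = reduce_fn(params)
        results.append((params, score_fn(embedding)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical stand-in for UMAP: larger min_dist -> more spread-out points.
def fake_reduce(params: Params) -> np.ndarray:
    rng = np.random.default_rng(int(params["n_neighbors"]))
    return rng.normal(scale=params["min_dist"] + 0.1, size=(50, 2))


# Toy score for the demo only: tighter embeddings score higher.
def toy_score(embedding: np.ndarray) -> float:
    return -float(embedding.std())


grid = {"n_neighbors": [5, 15, 50], "min_dist": [0.0, 0.1, 0.5]}
for params, score in sweep(fake_reduce, toy_score, grid)[:3]:
    print(params, round(score, 3))
```

Fixing `random_state` in the real reducer keeps runs comparable, so score differences reflect the parameters rather than random initialisation.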
What I do not understand at the moment:
- How does the spread parameter work? What range of values can it take?
- … metric value passed?
Here are some of the plots I got by varying the parameters.
Some clusters form well, but one big cluster gets clumped in the middle. Any help is appreciated :-)
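On the `spread` question: in umap-learn's implementation (worth confirming against its source, e.g. `find_ab_params`), `min_dist` and `spread` are not used directly in the optimisation. UMAP instead fits the constants `a` and `b` of the low-dimensional similarity curve 1 / (1 + a·d^(2b)) so that it approximates a target curve that equals 1 for distances below `min_dist` and decays as exp(-(d - min_dist) / spread) beyond it. `spread` (default 1.0) therefore acts as the effective scale of the embedded points, and `min_dist` (default 0.1) sets how tightly points may pack relative to that scale; UMAP requires 0 <= `min_dist` <= `spread`. The NumPy sketch below evaluates only that target curve, not the fitted `a`, `b` curve, to show what changing `spread` does.

```python
import numpy as np


def target_similarity(d, min_dist: float = 0.1, spread: float = 1.0) -> np.ndarray:
    """Target curve that UMAP's a, b are fitted to (sketch, not the fitted curve itself).

    1.0 for distances below min_dist, then exponential decay with length scale spread.
    """
    if spread <= 0 or not 0.0 <= min_dist <= spread:
        raise ValueError("expected spread > 0 and 0 <= min_dist <= spread")
    d = np.asarray(d, dtype=float)
    return np.where(d < min_dist, 1.0, np.exp(-(d - min_dist) / spread))


distances = np.array([0.05, 0.5, 1.0, 2.0])
for spread in (0.5, 1.0, 3.0):
    # Larger spread -> slower decay, i.e. a larger effective scale for the embedding.
    print(spread, np.round(target_similarity(distances, min_dist=0.1, spread=spread), 3))
```

In umap-learn both values are passed as `umap.UMAP(min_dist=..., spread=...)`. A smaller `min_dist` relative to `spread` generally gives tighter clumps, while `n_neighbors` controls how much local versus global structure the embedding preserves.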