I'm using BERTopic with UMAP and HDBSCAN. I set the random_state of UMAP to a specific number, but I still get wildly different results from fit_transform if my input array (the corpus) is in another order. For example, when I have a collection of descriptions of size ~350 and I swap the first and last element, the number of clusters goes from 20 to 24. Is this normal? Which step in the BERTopic process creates this randomness?
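For context, the setup described above would look roughly like the following. This is a minimal sketch with illustrative parameter values and a hypothetical `load_corpus()` helper; it is not the original poster's code, which was not included here:

```python
from bertopic import BERTopic
from umap import UMAP
from hdbscan import HDBSCAN

# Hypothetical loader; stands in for the ~350 descriptions mentioned above.
docs = load_corpus()

# Illustrative parameter values only.
umap_model = UMAP(
    n_neighbors=15,
    n_components=5,
    min_dist=0.0,
    metric="cosine",
    random_state=42,  # fixed seed, as described in the question
)
hdbscan_model = HDBSCAN(
    min_cluster_size=10,
    metric="euclidean",
    cluster_selection_method="eom",
    prediction_data=True,
)

topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
topics, probs = topic_model.fit_transform(docs)
```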
I am frustrated by the same issue. The model gave 4 good topics from the data, but later on it only produces 2 topics.
JvdsReform changed the title from "BERTopic random results with random_state set" to "BERTopic random results with random_state set when input order is changed" on Jan 29, 2025
@JvdsReform That is likely a result of the underlying models (HDBSCAN or UMAP) rather than something in BERTopic itself. BERTopic is a modular framework and has no effect on the randomness of the results. If I remember correctly, input order can have an effect with UMAP. You could check out the issues page of UMAP, as I believe there are one or two issues about this.
Also note that without more information (code, BERTopic version, etc.) it is incredibly hard to say more about this particular case, which is why I advise opening an issue and providing the information suggested there. That said, this is most likely related to the underlying models.
@saipavankumar-muppalaneni As mentioned above, without more information it's hard for me to say why this is happening to you. Did you use a random_state? Did you run it in the exact same environment before and after? Which versions do you have? How did you initialize BERTopic? Etc.
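One rough way to narrow this down, following the suggestion above, is to take BERTopic out of the loop entirely and run UMAP directly on the same embedding matrix in two different row orders. The sketch below uses random vectors as a stand-in for real document embeddings; all parameter values are illustrative:

```python
import numpy as np
from umap import UMAP

# Stand-in for document embeddings; in practice these would come from whatever
# embedding model BERTopic is configured with (e.g. a sentence-transformers model).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(350, 384)).astype(np.float32)

# The same rows in two different orders.
order = rng.permutation(len(embeddings))
shuffled = embeddings[order]

umap_kwargs = dict(n_neighbors=15, n_components=5, min_dist=0.0,
                   metric="cosine", random_state=42)

reduced_original = UMAP(**umap_kwargs).fit_transform(embeddings)
reduced_shuffled = UMAP(**umap_kwargs).fit_transform(shuffled)

# Put the shuffled result back into the original row order and compare.
# If UMAP were fully order-independent with a fixed random_state, this
# difference would be (near) zero.
restored = reduced_shuffled[np.argsort(order)]
print("max abs difference:", np.abs(reduced_original - restored).max())
```

If the two reduced embeddings differ even with random_state fixed, the order sensitivity sits in the dimensionality-reduction step (and HDBSCAN can then flip borderline points between clusters), rather than in BERTopic itself.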