How to use BERTopic with very specific and large data #1048
Replies: 2 comments 10 replies
There are several options for doing this. First, you can
Hello, I'm also interested in the answer. I would like to do temporal batch clustering, but I don't want to use online topic modeling because the results I obtained were not satisfactory. What I want to do is divide my dataset into several batches by time interval, run BERTopic on each batch with exactly the same configuration for each algorithm (sentence model, UMAP, clustering), then merge topics across batches when they are close according to cosine similarity. Moreover, I would like to classify incoming data by checking the cosine similarity between new texts and the existing clusters, and create a new cluster if a text is too far from all of them. I was wondering whether I can do this using only BERTopic with the GPU versions of UMAP and HDBSCAN. Thank you for your help.
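To make the merge-then-classify idea above concrete, here is a minimal sketch in plain Python. It is only an illustration under several assumptions: the function names (`cosine`, `assign_or_create`, `merge_batches`) are hypothetical, the 0.8 threshold is arbitrary, the greedy "closest centroid or new topic" rule and the running-mean update are one possible choice, and the toy 2-D vectors stand in for real topic embeddings (for example, the `topic_embeddings_` of each batch's fitted model) and for embeddings of incoming texts produced by the same sentence model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_or_create(vector, centroids, counts, threshold=0.8):
    """Attach `vector` to the most similar centroid, or open a new topic.

    `centroids` and `counts` are mutated in place. Returns the topic index.
    The threshold is an assumption and would need tuning on real data.
    """
    best_idx, best_sim = -1, -1.0
    for i, centroid in enumerate(centroids):
        sim = cosine(vector, centroid)
        if sim > best_sim:
            best_idx, best_sim = i, sim

    if best_idx >= 0 and best_sim >= threshold:
        # Close enough: fold the vector into the centroid as a running mean.
        n = counts[best_idx]
        centroids[best_idx] = [
            (c * n + x) / (n + 1) for c, x in zip(centroids[best_idx], vector)
        ]
        counts[best_idx] = n + 1
        return best_idx

    # Too far from every existing topic: start a new one.
    centroids.append(list(vector))
    counts.append(1)
    return len(centroids) - 1


def merge_batches(batches, threshold=0.8):
    """Greedily merge per-batch topic vectors into one global topic list.

    Returns (centroids, counts, mapping), where mapping[b][t] is the global
    topic index of topic t in batch b.
    """
    centroids, counts, mapping = [], [], []
    for batch in batches:
        mapping.append(
            [assign_or_create(v, centroids, counts, threshold) for v in batch]
        )
    return centroids, counts, mapping


# Toy 2-D "topic embeddings" for two time batches (made-up numbers).
batch_1 = [[1.0, 0.0], [0.0, 1.0]]
batch_2 = [[0.9, 0.1], [-1.0, 0.0]]
centroids, counts, mapping = merge_batches([batch_1, batch_2])
# mapping == [[0, 1], [0, 2]]: the first topic of batch 2 merges into
# global topic 0, the second one becomes a new global topic.
```

The same `assign_or_create` call can then be reused for incoming texts: embed the new text, call the function with the current `centroids` and `counts`, and a new topic is opened whenever no existing centroid reaches the threshold. This sketch says nothing about whether the GPU versions of UMAP and HDBSCAN fit in memory for each batch; that part still depends on batch size and hardware.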
Hello,
I'm working on the classification of a large (2M documents, mainly in French with some English) and very noisy dataset from a specific domain (chats).
I have already tested BERTopic on a sample and felt that I could do interesting things with it.
Given that I would like to detect new topics in real time, which variant of BERTopic would you advise me to use? I also tested the GPU version, but because of memory limits I could not .fit the whole dataset.