
BERTopic is an open-source project that implements a topic modeling technique using pre-trained BERT models to generate embeddings for text data. It can be used to identify topics within a large corpus of text data. With easy-to-follow instructions, users can start using the algorithm in their own projects.

ozi-dev/BERTopic

BERTopic

BERTopic is an open-source project that implements the BERTopic algorithm, a topic modeling technique that uses pre-trained BERT models to generate embeddings for text data. The algorithm can be used to identify topics and subtopics within a large corpus of text.

Getting Started

To get started using BERTopic, first clone this repository to your local machine. You can do this by running the following command in your terminal:

git clone https://github.com/ozi-dev/BERTopic.git

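Next, install the necessary dependencies. The usage below imports the bertopic package, which is available from PyPI (pip install bertopic). Once the dependencies are installed, you can start using BERTopic. The BERTopic.ipynb notebook provides an example of how to use the algorithm.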

Usage

To use BERTopic, you can import the BERTopic class from the bertopic module:

from bertopic import BERTopic

You can then create an instance of the BERTopic class and fit it on your data, where data is a list of document strings:

model = BERTopic()
model.fit(data)
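
As a minimal sketch of the fitting step, assuming the bertopic package is installed and that the input is a plain list of document strings. The helper names prepare_documents and fit_topic_model are hypothetical and not part of this repository. The import is deferred so the cleaning helper can be used on its own.

```python
from typing import Iterable, List


def prepare_documents(raw_docs: Iterable[str]) -> List[str]:
    """Strip surrounding whitespace and drop empty documents."""
    cleaned = (doc.strip() for doc in raw_docs)
    return [doc for doc in cleaned if doc]


def fit_topic_model(raw_docs: Iterable[str]):
    """Fit a BERTopic model on cleaned documents (assumes bertopic is installed)."""
    from bertopic import BERTopic  # deferred import: only needed when fitting

    docs = prepare_documents(raw_docs)
    model = BERTopic()
    model.fit(docs)
    return model
```

In practice the model generally needs a reasonably large number of documents; a handful is usually not enough for the clustering step to find topics.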

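Once the model is trained, you can use the transform method to assign new documents to topics. In the bertopic package, transform returns the predicted topic for each document together with the associated probabilities:

topics, probs = model.transform(new_data)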
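
In the upstream bertopic package, transform returns a pair: one topic id per document and the associated probabilities, with -1 marking outliers. A short sketch of grouping documents by their assigned topic follows; group_by_topic is a hypothetical helper, and the sample documents and ids are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_topic(docs: Sequence[str], topics: Sequence[int]) -> Dict[int, List[str]]:
    """Map each topic id to the documents assigned to it."""
    if len(docs) != len(topics):
        raise ValueError("docs and topics must have the same length")
    grouped: Dict[int, List[str]] = defaultdict(list)
    for doc, topic in zip(docs, topics):
        grouped[int(topic)].append(doc)
    return dict(grouped)


# Illustrative values standing in for: topics, probs = model.transform(new_data)
new_data = ["cats purr", "dogs bark", "stocks fell"]
topics = [0, 0, 1]
grouped = group_by_topic(new_data, topics)
```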

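You can also use the get_topic method to get the top words for a given topic, each paired with its score:

topic_words = model.get_topic(topic_id)

To get the most representative documents for a given topic, the bertopic package provides the get_representative_docs method:

representative_docs = model.get_representative_docs(topic_id)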
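
In the upstream bertopic package, get_topic returns a list of (word, score) pairs for a topic rather than documents. A hedged sketch of trimming such a list to its top N words follows; top_words is a hypothetical helper, and the sample pairs are invented for illustration.

```python
from typing import List, Sequence, Tuple


def top_words(topic_words: Sequence[Tuple[str, float]], n: int = 10) -> List[str]:
    """Return the n highest-scoring words from (word, score) pairs."""
    ranked = sorted(topic_words, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Invented sample standing in for: model.get_topic(topic_id)
sample = [("market", 0.12), ("stocks", 0.31), ("trade", 0.08)]
words = top_words(sample, n=2)
```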

For more information on how to use BERTopic, please refer to the BERTopic.ipynb notebook.

Contributing

If you would like to contribute to BERTopic, please feel free to submit a pull request with your changes. Before doing so, please make sure to run the unit tests using pytest.
