
GSSOC-20 Extended- Based on Topic Modeling, Classification, Natural Language Processing


khushboogupta13/How_Many_topics

 
 


How Many Topics? (GirlScript Summer of Code Extended 2020)


About How Many Topics?

An open-source project based on topic modeling, classification, and natural language processing.

Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus.

We intend to work on a research paper where we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data.
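The stability idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's implementation: it assumes each topic is represented by a ranked list of its top terms, that agreement between two rankings is measured by average Jaccard similarity over growing prefixes, and that topics from two models are paired by the best one-to-one matching. The function names (`average_jaccard`, `agreement`, `stability`) are hypothetical, and the topic model that produces the term lists is left out.

```python
from itertools import permutations


def average_jaccard(rank_a, rank_b):
    """Mean Jaccard similarity over growing prefixes of two ranked term lists."""
    depth = min(len(rank_a), len(rank_b))
    if depth == 0:
        return 0.0
    total = 0.0
    for d in range(1, depth + 1):
        a, b = set(rank_a[:d]), set(rank_b[:d])
        total += len(a & b) / len(a | b)
    return total / depth


def agreement(topics_x, topics_y):
    """Score the best one-to-one pairing of topics from two models.

    Assumption: brute force over all permutations, so this is only
    practical for a small number of topics.
    """
    k = len(topics_x)
    if k == 0 or k != len(topics_y):
        raise ValueError("both models need the same, non-zero number of topics")
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(
            average_jaccard(topics_x[i], topics_y[j]) for i, j in enumerate(perm)
        ) / k
        best = max(best, score)
    return best


def stability(reference_topics, sampled_topic_sets):
    """Mean agreement between a reference model and models fit on perturbed data."""
    scores = [agreement(reference_topics, s) for s in sampled_topic_sets]
    return sum(scores) / len(scores)
```

Under these assumptions, one would fit a reference model on the full corpus and several models on random subsamples for each candidate number of topics, then prefer the candidate whose `stability` score is highest.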

The project pipeline has four stages: dataset collection, topic modeling, hyperparameter fine-tuning for the topic model, and classification.

My team worked on the COVID-19 research paper dataset.


Languages

  • Jupyter Notebook 100.0%