layout: default
title: Conclusion & Additional Resources
nav_order: 6

Congratulations

Congratulations! You've just finished this workshop.

You should now be able to:

  • Define topic modeling
  • Use at least one tool to perform topic modeling on a text corpus
  • Explain the limitations of topic modeling

Additional Resources

To learn more about any particular topic, take a look at the links below.

Diving Deeper into Topic Modeling

In the lesson, we briefly discussed how topic modeling works without getting into the mathematical basis for the practice. David Blei gives an overview of topic modeling, with a plain-language description of latent Dirichlet allocation (LDA), in the Winter 2012 issue of the Journal of Digital Humanities. The entire issue of the journal is dedicated to topic modeling and may also be of interest!

If you wish to read more about the specifics of LDA, the seminal article by David M. Blei, Andrew Y. Ng and Michael I. Jordan is a good place to start. Or, if you prefer to head off on a tangent that might enrich your understanding, Alexandra Schofield, Måns Magnusson and David Mimno (yes, the David Mimno who is the primary maintainer of MALLET) have written a provocative paper suggesting that removing stopwords after training a topic model is as effective as removing them beforehand.
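To make the idea of removing stopwords after training concrete, here is a minimal sketch in plain Python. It is only an illustration of post-hoc filtering, not the procedure from the paper: the `topics` dictionary, the tiny `STOPWORDS` set, and the function name `remove_stopwords_after_training` are all hypothetical, and the weights are made-up numbers standing in for what a trained model might report.

```python
# Hypothetical output of an already-trained topic model: each topic is a list
# of (word, weight) pairs sorted by descending weight. The numbers are invented.
topics = {
    0: [("the", 0.09), ("whale", 0.05), ("of", 0.04), ("ship", 0.03), ("sea", 0.02)],
    1: [("and", 0.08), ("love", 0.04), ("the", 0.04), ("heart", 0.03), ("letter", 0.02)],
}

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "of", "and", "a", "to", "in"}


def remove_stopwords_after_training(topic_words, stopwords, top_n=3):
    """Drop stopwords from a trained topic's word list, then keep the top_n words left."""
    kept = [(word, weight) for word, weight in topic_words if word.lower() not in stopwords]
    return kept[:top_n]


cleaned = {
    topic_id: remove_stopwords_after_training(words, STOPWORDS)
    for topic_id, words in topics.items()
}

for topic_id, words in cleaned.items():
    print(topic_id, [word for word, _ in words])
```

The model itself is untouched here; only the list of top words shown to the reader is filtered, which is the sense in which the removal happens "after" training.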

Building on your Topic Modeling Skills in Python

If you are already familiar with Python or have enjoyed working with it, William J.B. Mattingly, a historian and digital humanist, has developed a playlist of video tutorials that go into greater depth about topic modeling, including another package (Top2Vec) that Mattingly insists is the best way to do topic modeling in Python. The script in the "Topic Modeling with Python" part of the lesson owes a debt to Mattingly's experiments with Gensim, which may have been abandoned to pursue Top2Vec!

If you wish to continue using Gensim for topic modeling, however, the Gensim documentation is worth exploring further.

Topic Modeling in R

We used the Python programming language for topic modeling, but if you are more familiar or comfortable with the R programming language, which is popular in academic and data science contexts, there is an abundance of resources to guide you:

There is much more out there if R is your language of choice!

Alternatives to Visualizing Topic Modeling Values

Visualization is one modality for exploratory data analysis, but it privileges the visual sense and may not be accessible to all audiences. Shawn Graham has created a Programming Historian lesson on sonification, or the mapping of dataset features to sound. Graham demonstrates several tools for sonifying data in the lesson, and the part of the lesson that explores Sonic Pi uses the probabilistic weights of topics from a topic model, data that you will have available to try out from your experiments with Voyant, MALLET and Gensim.
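As a rough illustration of what "mapping dataset features to sound" can mean, the sketch below rescales a list of topic weights onto MIDI note numbers, so that heavier topics become higher pitches. This is not the method from Graham's lesson: the function name `weights_to_midi`, the note range of 48 to 84, and the sample weights are all assumptions chosen for the example.

```python
def weights_to_midi(weights, low=48, high=84):
    """Linearly rescale weights onto integer MIDI note numbers between low and high."""
    smallest, largest = min(weights), max(weights)
    if largest == smallest:
        # Every weight is identical, so use the middle of the range for each one.
        return [(low + high) // 2 for _ in weights]
    span = largest - smallest
    return [round(low + (w - smallest) / span * (high - low)) for w in weights]


# Hypothetical weights for one topic across five documents (invented values).
topic_weights = [0.10, 0.40, 0.25, 0.70, 0.10]
notes = weights_to_midi(topic_weights)
print(notes)
```

The resulting list of note numbers could then be pasted into a tool that plays MIDI pitches. A linear mapping is the simplest choice; other mappings (for example, snapping to a musical scale) may sound more pleasant but are beyond this sketch.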