computing doc.length with mallet #47

gpcoursera · 2015-10-13T06:36:43Z

Hi,
I previously had an older version of LDAvis (2014) installed with devtools.
In that version I would call createJSON as:
json <- createJSON(K, phi, term.frequency, vocab, topic.proportions)

Today I updated my R packages and now have a newer version of LDAvis (from CRAN), which calls createJSON as:
json <- createJSON(phi, theta, doc.length, vocab, term.frequency)

I'm using MALLET for the LDA. I can easily get the phi and theta matrices, as well as vocab and term.frequency, but not doc.length.
According to the LDAvis documentation, doc.length is a vector containing the number of tokens in each document of the corpus.

Question: how can I construct such a vector from a MALLET instance (mallet.import)?

Thanks!
G.

LalaNguyen · 2015-12-19T15:19:56Z

Just figured out that you can compute it yourself. Here is my take:

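doc.tokens <- data.frame(id = seq_len(nrow(doc.topics)), tokens = 0)
for (w in vocab) {
  # Count every whole-word occurrence of w in each document, not just presence.
  # \\Q...\\E matches w literally; tolower() assumes the vocab was lower-cased on import.
  hits <- gregexpr(paste0("\\b\\Q", w, "\\E\\b"), tolower(docs$text), perl = TRUE)
  doc.tokens$tokens <- doc.tokens$tokens + sapply(hits, function(h) sum(h > 0))
}
doc.length <- doc.tokens$tokens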
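
A possible alternative that avoids re-scanning the text: since the sampler assigns each token to exactly one topic, the row sums of the raw (unsmoothed, unnormalised) document-topic counts should equal the token count of each document. The sketch below uses a toy matrix so it runs without MALLET. The name doc.topic.counts is made up for illustration, and the mallet.doc.topics call in the comment is an assumption about how you would obtain the real counts.

```r
# Toy stand-in for the raw document-topic counts (2 documents x 3 topics).
# With the mallet package this would (assumption) come from something like:
#   doc.topic.counts <- mallet.doc.topics(topic.model, smoothed = FALSE, normalized = FALSE)
doc.topic.counts <- matrix(c(3, 1, 0,
                             0, 4, 2),
                           nrow = 2, byrow = TRUE)

# Each token belongs to exactly one topic, so summing across topics
# gives the number of tokens in each document.
doc.length <- rowSums(doc.topic.counts)
doc.length  # 4 6
```

Note that these counts only include tokens that survived stop-word removal and tokenisation at import time, which is probably what LDAvis wants here.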

aronlindberg · 2016-12-08T15:45:34Z

@gpcoursera: how do you get phi, theta, vocab, and term.frequency from a topic model defined by mallet in R?

Also, did the code by @LalaNguyen work for you?
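
For reference, a rough sketch of how the five createJSON inputs relate to each other, using toy count matrices so that it runs without MALLET. The names topic.word.counts and doc.topic.counts are hypothetical. With the mallet package the matrices would presumably come from mallet.topic.words(topic.model, ...) and mallet.doc.topics(topic.model, ...), and the vocabulary from topic.model$getVocabulary(), but treat those calls as assumptions and check them against your installed version.

```r
# Toy counts: 2 topics x 4 terms, and 3 documents x 2 topics.
vocab <- c("apple", "banana", "cherry", "date")
topic.word.counts <- matrix(c(5, 3, 0, 2,
                              1, 0, 6, 3),
                            nrow = 2, byrow = TRUE,
                            dimnames = list(NULL, vocab))
doc.topic.counts <- matrix(c(4, 2,
                             3, 5,
                             3, 3),
                           nrow = 3, byrow = TRUE)

# phi: one row per topic, each row a distribution over terms.
phi <- topic.word.counts / rowSums(topic.word.counts)

# theta: one row per document, each row a distribution over topics.
theta <- doc.topic.counts / rowSums(doc.topic.counts)

# term.frequency: how often each term occurs in the whole corpus.
term.frequency <- colSums(topic.word.counts)

# doc.length: number of tokens in each document.
doc.length <- rowSums(doc.topic.counts)

# Sanity check: both vectors must account for the same total number of tokens.
stopifnot(sum(term.frequency) == sum(doc.length))
```

With real MALLET output you may prefer smoothed estimates for phi and theta (so that no probability is exactly zero) while still taking doc.length from the unsmoothed counts. The result would then be passed as createJSON(phi, theta, doc.length, vocab, term.frequency), as in the original post.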
