Auto-correlation between samples (Binkley et al.) #201

jonaschn · 2021-05-31T20:17:26Z

I recently found this paper by Binkley et al.

A short extract from this paper follows:

b – the number of burn-in iterations
n – the number of samples (random variates)
si – the sampling interval

If si is large enough, the observations are practically independent. However, too small a value risks unwanted correlation. To summarize the effect of b, n, and si: if any of these settings are too low, then the Gibbs sampler will produce inaccurate or inadequate information; if any of these settings are too high, then the only penalty is wasted computational effort.
Unfortunately, as described in Section 6, support for extracting interval-separated observations is limited in existing LDA tools. For example, Mallet provides this capability but appears to suffer from a local maxima problem

with a footnote linking to http://www.cs.loyola.edu/~binkley/topic_models/additional-images/mallet-fixation/

Does this problem still exist?
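
To make the roles of b, n, and si concrete, here is a minimal sketch of how thinning affects auto-correlation. It does not use Mallet or LDA at all: `toy_chain` is a hypothetical AR(1) process standing in for a correlated sampler trace, and `thin` and `lag1_autocorr` are illustrative helper names of my own, not anything from the paper or from Mallet.

```python
import random


def toy_chain(length, rho=0.9, seed=0):
    """AR(1) process used as a stand-in for a correlated sampler trace (not LDA)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(length):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def thin(chain, b, si, n):
    """Discard the first b draws, then keep every si-th draw until n are collected."""
    kept = chain[b::si][:n]
    if len(kept) < n:
        raise ValueError("chain too short: need at least b + (n - 1) * si + 1 draws")
    return kept


def lag1_autocorr(xs):
    """Sample auto-correlation between consecutive kept observations."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(len(xs) - 1))
    return cov / var


chain = toy_chain(60_000)
dense = thin(chain, b=1_000, si=1, n=1_000)    # every draw after burn-in
sparse = thin(chain, b=1_000, si=50, n=1_000)  # every 50th draw after burn-in

print("si=1: ", lag1_autocorr(dense))
print("si=50:", lag1_autocorr(sparse))
```

On this toy chain the si=1 samples stay strongly correlated, while the si=50 samples are close to uncorrelated, at the cost of running the chain roughly 50 times longer for the same n. Whether Mallet's own interval-separated output behaves this way is exactly what the question above is asking.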

Reference:
Binkley, D., Heinz, D., Lawrie, D., & Overfelt, J. (2014). Understanding LDA in source code analysis. 22nd International Conference on Program Comprehension, ICPC 2014 - Proceedings, 26–36. https://doi.org/10.1145/2597008.2597150
