There appears to be an error when computing document length histograms. The workers collect histogram information on each iteration, but the counts are only reset when alpha statistics are collected. This results in counts that are optimizeInterval / saveSampleInterval times larger than they should be.
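To illustrate the size of the overcount, here is a minimal simulation. It is not the library's code. It assumes the histogram is incremented on every iteration where a sample is saved and is cleared only once per optimization interval. The function and variable names are hypothetical.

```python
from collections import Counter


def accumulated_histogram(doc_lengths, optimize_interval, save_sample_interval):
    """Simulate one optimization window with no reset between saved samples.

    Assumption: counts are added on each iteration that is a multiple of
    save_sample_interval and would only be cleared after optimize_interval.
    """
    histogram = Counter()
    for iteration in range(1, optimize_interval + 1):
        if iteration % save_sample_interval == 0:
            for length in doc_lengths:
                histogram[length] += 1
    return histogram


doc_lengths = [3, 5, 5, 8]            # toy corpus: token count per document
true_histogram = Counter(doc_lengths)  # what the histogram should be
observed = accumulated_histogram(
    doc_lengths, optimize_interval=50, save_sample_interval=10
)
factor = 50 // 10

print(observed[5], true_histogram[5])  # 10 2
```

Under these assumptions every bucket ends up `factor` times too large, which matches the ratio described above.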
Also, can't the document length information be calculated once? Document lengths don't change between iterations, so computing the histogram a single time and caching it would save some time and space.
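A rough sketch of the caching idea, assuming document lengths are fixed for the whole run. The class name `DocLengthHistogram` and its `builds` counter are hypothetical and exist only for illustration; this is not the library's API.

```python
from collections import Counter


class DocLengthHistogram:
    """Builds the document length histogram lazily and reuses it afterwards."""

    def __init__(self, doc_lengths):
        self._doc_lengths = list(doc_lengths)
        self._cached = None
        self.builds = 0  # how many times the histogram was actually computed

    def get(self):
        # Compute on first use only; later calls return the cached result.
        if self._cached is None:
            self._cached = Counter(self._doc_lengths)
            self.builds += 1
        return self._cached


cache = DocLengthHistogram([3, 5, 5, 8])
for _ in range(100):          # stand-in for 100 sampling iterations
    histogram = cache.get()

print(cache.builds)  # 1
```

With this arrangement there is nothing to reset between iterations, so the overcounting cannot occur for this histogram.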