Problem description
Word2Vec callbacks produce greatly different most_similar() results on the last end-of-epoch callback compared to the end of training. The expectation would be that the final end-of-epoch callback similarity results are either identical to, or a close approximation of, the post-training most_similar() results.
Steps/code/corpus to reproduce
I'm using gensim's Word2Vec for a recommendation-like task, with part of my evaluation being the use of callbacks and the most_similar() method. However, I am noticing a huge disparity between the final few epoch callbacks and the results obtained immediately post-training. In fact, the last epoch callback often appears worthless, while the post-training result is as good as could be desired.
My during-training tracking of most similar entries utilizes gensim's CallbackAny2Vec class. It follows the doc example fairly directly and roughly looks like:
```python
from gensim.models.callbacks import CallbackAny2Vec

class EpochTracker(CallbackAny2Vec):
    def __init__(self):
        self.epoch = 0

    def on_epoch_begin(self, model):
        print("Epoch #{} start".format(self.epoch))

    def on_epoch_end(self, model):
        print('Some diagnostics')
        # Multiple terms used in the below
        e = model.wv
        print(e.most_similar(positive=['some term'])[0:3])  # grab the top 3 examples for some term
        print("Epoch #{} end".format(self.epoch))
        self.epoch += 1
```
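For reference, a callback like this is passed to the model via the callbacks parameter. A minimal sketch of the training call, using gensim's bundled common_texts as a stand-in for the actual (work-related) corpus and with made-up hyperparameters:

```python
from gensim.models import Word2Vec
from gensim.test.utils import common_texts  # tiny bundled toy corpus, standing in for the real data

# Hypothetical training call; the hyperparameters are placeholders, not from the original report.
e_model = Word2Vec(
    sentences=common_texts,
    vector_size=100,
    min_count=1,
    epochs=10,
    callbacks=[EpochTracker()],  # EpochTracker fires on_epoch_begin/on_epoch_end each epoch
)
```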
As the epochs progress, the most_similar() results given by the callbacks seem to not indicate an advancement of learning and seem erratic. In fact, often the callback from the first epoch shows the best result.
Counterintuitively, I also have an additional process (not shown) built into the callback that does indicate gradual learning. Following the similarity print, I take the current model's vectors and evaluate them against a down-stream task. In brief, this process is a sklearn GridSearchCV logistic regression check against some known labels.
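That downstream check is not shown in the issue; as a rough illustration of the kind of evaluation described (the feature-building step, items, and labels here are hypothetical), it might look something like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def downstream_check(wv, items, labels):
    """Hypothetical sketch: score the current epoch's vectors on a labeled task.

    `items` are keys present in the vocabulary and `labels` are known classes;
    both are stand-ins for whatever the original evaluation used.
    """
    X = np.vstack([wv[item] for item in items])  # one row per item's current vector
    grid = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1, 10]},
        cv=3,
    )
    grid.fit(X, labels)
    return grid.best_score_  # cross-validated accuracy as a rough learning signal
```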
I find that often the last on_epoch_end callback indicates little learning in my particular use case. However, if directly after training the model I try the similarity call again:
```python
e = e_model.wv  # e_model was the variable assignment of the model overall
print(e.most_similar(positive=['some term'])[0:3])
```
I tend to get beautiful results that are in agreement with the downstream evaluation task also used in the callbacks, or that are at least vastly different from those of the final epoch end.
I suspect most_similar() has unusual behavior in during-training epoch-end callbacks, but I would be happy to learn instead that my approach is flawed.
I believe this is the same as, or related to #2260 - but the removal of the (at-risk-of-staleness) vectors_norm cache should have cleared that up.
There is still a much-smaller cache of each vector's own magnitude, in w2v_model.wv.norms, that might be contributing to the issue. @Joshkking, in the setup where the problem was otherwise showing, what if you add a line e.fill_norms(force=True) just before your most_similar() operation? Does that make the last-epoch-end results match the after-return-from-training results?
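Applied to the EpochTracker above, that suggestion would amount to something like the following sketch (only the on_epoch_end method shown):

```python
def on_epoch_end(self, model):
    e = model.wv
    e.fill_norms(force=True)  # discard and recompute the cached vector norms before querying
    print(e.most_similar(positive=['some term'])[0:3])
    print("Epoch #{} end".format(self.epoch))
    self.epoch += 1
```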
If there's truly still an issue in Gensim 4.0+, it'd be good to have a small fully self-contained example that vividly demonstrates it. That'd rule out something idiosyncratic in @Joshkking's setup, and likely point to some fix, or new workaround, or new warning we could show.
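A skeleton of the kind of self-contained reproduction being requested might look like this (toy corpus, hyperparameters, and query word chosen arbitrarily; whether it actually reproduces the discrepancy is exactly what would need checking):

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import common_texts

class LastEpochProbe(CallbackAny2Vec):
    """Record most_similar() at each epoch end so the last one can be compared post-training."""
    def __init__(self, word):
        self.word = word
        self.last = None

    def on_epoch_end(self, model):
        self.last = model.wv.most_similar(self.word, topn=3)

probe = LastEpochProbe('computer')  # 'computer' appears in the toy corpus
model = Word2Vec(common_texts, vector_size=20, min_count=1, epochs=5, callbacks=[probe])

print("last epoch-end :", probe.last)
print("post-training  :", model.wv.most_similar('computer', topn=3))
```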
@gojomo The instigating code is work-related and has moved on with an updated environment with that class stripped out. I'll see if I can get the chance to replicate this on a smaller, publicly accessible corpus, though it may be a while.
Versions
macOS-10.16-x86_64-i386-64bit
Python 3.9.7 (default, Sep 16, 2021, 08:50:36)
[Clang 10.0.0]
Bits 64
NumPy 1.21.2
SciPy 1.7.3
gensim 4.1.2
FAST_VERSION 1