Hello @TobiWeller,
First of all, thank you for sharing this implementation!
I'm observing some unexpected behaviour, possibly a bug. Could you take a look? Any help would be appreciated. Thank you!
Problem description
I ran the code in main without problems, but it seems that under the hood train() repeatedly uses the same walk from the beginning to the end of the training phase. More concretely, if I have 48475 extracted walks, then in one epoch/iteration the training loop runs 48475 times as expected, but it always uses the first walk of the first entity present in the walks list of lists.
I observed the behaviour when checking sample_batched at line 161 of Trainer.py: every sample is some variation of the first walk, as mentioned above. Checking further, it seems that in data_reader.py the nested for loops of Word2VecDataset only ever use the first line of data.walks and the first words of that line.
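To make the suspected failure mode concrete, here is a minimal, self-contained sketch of a map-style PyTorch Dataset whose __getitem__ ignores the index it receives. All names in it are hypothetical and it is not the repository's actual data_reader.py; it only illustrates how one epoch can run the expected number of iterations while every batch is still derived from the first walk:

```python
from torch.utils.data import Dataset, DataLoader

class BuggyWalkDataset(Dataset):
    """Illustration only: the index passed to __getitem__ is never used,
    so every sample is built from the first walk."""

    def __init__(self, walks, window_size=2):
        self.walks = walks                      # list of lists of tokens
        self.window_size = window_size

    def __len__(self):
        return len(self.walks)

    def __getitem__(self, idx):
        walk = self.walks[0]                    # bug: should be self.walks[idx]
        pairs = []
        for i, target in enumerate(walk):       # nested loops over one walk
            lo = max(0, i - self.window_size)
            hi = min(len(walk), i + self.window_size + 1)
            pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
        return pairs

walks = [["e1", "p1", "e2"], ["e3", "p2", "e4"], ["e5", "p3", "e6"]]
loader = DataLoader(BuggyWalkDataset(walks), batch_size=1,
                    collate_fn=lambda batch: batch[0])
for i, sample_batched in enumerate(loader):
    print(i, sample_batched)                    # every batch comes from walks[0]
```

Running this prints three batches that are all built from walks[0], which is the same pattern I see in sample_batched with the real data.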
Steps to reproduce with minimal code snippet
I haven't changed anything in the original code except batch_size and iterations, plus some print/log debugging statements not shown here.
```python
walks_obj = Word2VecWalks('./data/mutag/train.tsv', './data/mutag/test.tsv', 'label_mutagenic')
walks = walks_obj.get_walks(
    './data/mutag/mutag.owl',
    {'http://dl-learner.org/carcinogenesis#isMutagenic'},
    [['http://dl-learner.org/carcinogenesis#hasBond', 'http://dl-learner.org/carcinogenesis#inBond'],
     ['http://dl-learner.org/carcinogenesis#hasAtom', 'http://dl-learner.org/carcinogenesis#charge']])
w2v = Word2VecTrainer_Skipgram(walks=walks, batch_size=1, iterations=1, min_count=0)
w2v.train()
```
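To rule out the walk extraction itself, a quick look at the walks object (which, as far as I can tell, is a plain list of lists of tokens) shows whether the input already contains distinct walks. Continuing from the snippet above, this is just an illustrative check, not code from the repository:

```python
# Sanity check on the extracted walks before handing them to the trainer.
print("number of walks:", len(walks))
print("number of distinct walks:", len({tuple(w) for w in walks}))
for w in walks[:3]:
    print(w)
```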
Environment
Operating system: Windows 10
Python version: 3.10.2
Torch version: 1.11.0
P.S. If it would be of any help, I can send you the debugging output that led to this.
The issue really does seem to be in the Word2VecDataset class. If you confirm the problem, I have a possible solution that seems to be working for me; let me know if it could be of use to you.
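In case it is of use, the direction of the fix I'm testing is simply to let __getitem__ build each sample from the walk selected by the index, rather than always from the first one. Again, this is a hypothetical sketch in the same terms as the illustration above, not a patch against the actual data_reader.py:

```python
from torch.utils.data import Dataset

class FixedWalkDataset(Dataset):
    """Sketch of the change: build each sample from the walk selected by idx."""

    def __init__(self, walks, window_size=2):
        self.walks = walks
        self.window_size = window_size

    def __len__(self):
        return len(self.walks)

    def __getitem__(self, idx):
        walk = self.walks[idx]                  # advance through the walks
        pairs = []
        for i, target in enumerate(walk):
            lo = max(0, i - self.window_size)
            hi = min(len(walk), i + self.window_size + 1)
            pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
        return pairs
```

With a change of this shape, the samples I see during training are no longer all variations of the first walk.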