Possible issue with samples for training #3

Open
fjben opened this issue Aug 24, 2022 · 1 comment

Comments


fjben commented Aug 24, 2022

Hello @TobiWeller,

First of all, thank you for sharing this implementation!
I'm observing some unexpected behaviour, possibly a bug; could you please take a look? Any help would be appreciated. Thank you!

Problem description
I ran the code in main with no problems, but it seems that in the background train() is repeatedly using the same walk from the beginning to the end of the training phase. More concretely, if I have 48475 extracted walks, in one epoch/iteration train() runs 48475 times as expected, but it always uses the first walk of the first entity in the walks list of lists.

I observed the behaviour when checking sample_batched at line 161 of Trainer.py: every sample is some variation of the first walk, as mentioned above. On further inspection, it seems that in data_reader.py the nested for loops in Word2VecDataset only ever use the first line, and the first words of that first line, in data.walks.
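To illustrate the kind of pattern I suspect (just a sketch of my reading, not the actual code in data_reader.py; the class and argument names here are made up), a `__getitem__` whose nested loops return on the first pass will serve up the same first walk no matter which index the DataLoader asks for:

```python
from torch.utils.data import Dataset

class BuggyWalksDataset(Dataset):
    """Sketch of the behaviour I think I'm seeing -- not the repo's actual class."""

    def __init__(self, walks, window_size=2):
        self.walks = walks              # list of lists of tokens (one list per entity)
        self.window_size = window_size

    def __len__(self):
        return len(self.walks)

    def __getitem__(self, idx):
        # `idx` is never used: the nested loops start from the first walk and
        # the return fires on the very first (walk, word) combination, so every
        # call yields a variation of the same first walk.
        for walk in self.walks:
            for i, center in enumerate(walk):
                lo, hi = max(0, i - self.window_size), i + self.window_size + 1
                context = [w for w in walk[lo:hi] if w != center]
                return center, context
```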

Steps to reproduce with minimal code snippet

I haven't changed anything from the original code except batch_size and iterations, plus some print/log debugging statements not shown here.

```python
# Build the walks object from the MUTAG train/test splits and extract walks from the ontology
walks_obj = Word2VecWalks('./data/mutag/train.tsv', './data/mutag/test.tsv', 'label_mutagenic')

walks = walks_obj.get_walks('./data/mutag/mutag.owl',
                            {'http://dl-learner.org/carcinogenesis#isMutagenic'},
                            [['http://dl-learner.org/carcinogenesis#hasBond', 'http://dl-learner.org/carcinogenesis#inBond'],
                             ['http://dl-learner.org/carcinogenesis#hasAtom', 'http://dl-learner.org/carcinogenesis#charge']])

# Train the skip-gram model (batch_size and iterations are the only parameters I changed)
w2v = Word2VecTrainer_Skipgram(walks=walks, batch_size=1, iterations=1, min_count=0)

w2v.train()
```

Environment
Operating system: Windows 10
Python version: 3.10.2
Torch version: 1.11.0

P.S. If it would be of any help, I can send you the debugging output that led to this.


fjben commented Sep 5, 2022

Hello @TobiWeller,

The issue really does seem to be in the Word2VecDataset class. If you confirm the problem, I have a possible solution that seems to be working for me. Let me know if it would be of use to you.
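For reference, the change I have in mind is roughly the following (only a sketch under my assumptions about the class, not a tested patch; the names are made up): use the index that the DataLoader passes to __getitem__ to select the walk, instead of looping over all walks and returning from the first one.

```python
from torch.utils.data import Dataset

class FixedWalksDataset(Dataset):
    """Sketch of a possible fix -- pick the walk by index."""

    def __init__(self, walks, window_size=2):
        self.walks = walks
        self.window_size = window_size

    def __len__(self):
        return len(self.walks)

    def __getitem__(self, idx):
        # Select the walk by `idx`, so every walk is visited once per epoch.
        walk = self.walks[idx]
        pairs = []
        for i, center in enumerate(walk):
            lo, hi = max(0, i - self.window_size), i + self.window_size + 1
            for context in walk[lo:i] + walk[i + 1:hi]:
                pairs.append((center, context))
        return pairs  # a custom collate_fn would flatten these (center, context) pairs into a batch
```

With batch_size=1 this already gives a different walk per step; whether the surrounding Trainer/collate code would need adjusting as well is something I'd have to check against the repo.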
