instructions for generating vocab.pkl? #17
Hi. I'm sorry but I lost access to the resources. Does this issue help? #11
Hi Shion
Thanks for getting back to me! I checked that issue and it points to the ChEMBL24 dataset. I am interested in how to generate vocab.pkl from this dataset.
Actually, I just wish to run the pretrained model on a set of molecules I have, to generate their vector representations. If this is possible without the vocab.pkl file, please let me know!
Thanks
Regards,
Manav
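
(For reference, a minimal sketch of what the embedding step could look like. It assumes the pretrained checkpoint `trfm.pkl`, a compatible `vocab.pkl`, and that the repository exposes `TrfmSeq2seq` in `pretrain_trfm.py`, `WordVocab` in `build_vocab.py`, and a `split` tokenizer in `utils.py`; the class names, vocab attribute names, and the `encode` call are assumptions, not verified against the repo. A vocabulary matching the checkpoint is still needed, since the SMILES have to be mapped to the same token ids the model was trained on.)

```python
# Hypothetical sketch: turn a list of SMILES into vector representations with the
# pretrained encoder. Paths, class names, and vocab attribute names are assumptions.
import torch
from pretrain_trfm import TrfmSeq2seq   # assumed location of the model class
from build_vocab import WordVocab       # assumed location of the vocabulary class
from utils import split                 # repo-local SMILES tokenizer (not the PyPI 'utils' package)

vocab = WordVocab.load_vocab('vocab.pkl')

# The checkpoint discussed in this thread has embed.weight of shape [45, 256] and
# 4 encoder layers, so the constructor arguments have to match (4 = n_layers, not 3).
trfm = TrfmSeq2seq(len(vocab), 256, len(vocab), 4)
trfm.load_state_dict(torch.load('trfm.pkl', map_location='cpu'))
trfm.eval()

def smiles_to_ids(smiles, seq_len=220):
    """Tokenize one SMILES string and pad/truncate it to a fixed length."""
    tokens = split(smiles)
    ids = [vocab.stoi.get(t, vocab.unk_index) for t in tokens]   # attribute names assumed
    ids = [vocab.sos_index] + ids + [vocab.eos_index]
    return ids[:seq_len] + [vocab.pad_index] * max(0, seq_len - len(ids))

smiles_list = ['CCO', 'c1ccccc1']
x = torch.tensor([smiles_to_ids(s) for s in smiles_list])   # (batch, seq_len)
with torch.no_grad():
    vectors = trfm.encode(torch.t(x))   # encoder is assumed to take (seq_len, batch)
print(vectors.shape)                    # one fixed-size vector per input molecule
```

Without a vocab.pkl whose token-to-id mapping matches the one used during pretraining, the checkpoint cannot be used meaningfully, which is why the mismatch errors discussed below matter.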
Well, I meant to mention this comment. I hope it helps.
Hi Shion, I have tried to generate the vocab.pkl from ChEMBL 24. Using default parameters in the build_vocab.py file, I get a vocabulary size of 75. If I am not mistaken, this is not compatible with the pretrained model provided: `size mismatch for embed.weight: copying a param with shape torch.Size([45, 256]) from checkpoint, the shape in current model is torch.Size([75, 256])`. Thanks!
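
(One way to see what the checkpoint actually expects, independent of the repo code, is to inspect the saved state dict directly; `trfm.pkl` below is a placeholder name for the published checkpoint file.)

```python
# Read the vocabulary size the pretrained model was trained with straight from the checkpoint.
import torch

state_dict = torch.load('trfm.pkl', map_location='cpu')   # placeholder path for the checkpoint
print(state_dict['embed.weight'].shape)                    # torch.Size([45, 256]) per the error above
print('expected vocab size:', state_dict['embed.weight'].shape[0])
```

So a rebuilt vocabulary with 75 tokens cannot be loaded into this checkpoint; it has to be the 45-token vocabulary that was used during pretraining.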
Thanks for reporting.
Does it help?
Hi! Unfortunately the vocab.pkl file from #19 does not help either: `size mismatch for embed.weight: copying a param with shape torch.Size([45, 256]) from checkpoint, the shape in current model is torch.Size([50, 256])`.
I was able to reproduce the
PS: don't forget to change n_layers from 3 to 4. Thanks. Regards,
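
(Both things have to line up with the checkpoint before `load_state_dict` succeeds: the vocabulary size seen in `embed.weight`, and the number of encoder layers. A generic way to read the layer count off the checkpoint, rather than guessing, is sketched below; the key-name pattern is an assumption about how PyTorch names the submodules here.)

```python
# List the checkpoint's parameters and infer the encoder layer count from the key names.
import re
import torch

state_dict = torch.load('trfm.pkl', map_location='cpu')   # placeholder checkpoint path
for name, tensor in state_dict.items():
    print(f'{name:60s} {tuple(tensor.shape)}')

# Layer indices typically show up as numeric components like '...layers.3....',
# so the highest index plus one is the number of layers to pass to the constructor.
layer_ids = {int(m.group(1)) for key in state_dict for m in re.finditer(r'\.(\d+)\.', key)}
if layer_ids:
    print('construct the model with n_layers =', max(layer_ids) + 1)
```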
@dinabandhu50 have you solved this mismatch problem?
It indeed solves the mismatch problem.
Where could I find 01_data_prepare.ipynb? Thanks.
I found the data_prepare.ipynb; however, I still have a problem at the step of running build_corpus.py. At the beginning it says I don't have the utils module, so I installed it with `pip install utils`. However, when I run it again, it shows the error `cannot import name 'split' from 'utils'`. I use Python 3 to run this command; do you have any suggestions? Thanks.
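
(That error is almost certainly a name clash: `pip install utils` pulls in an unrelated package from PyPI, while build_corpus.py expects the repository's own `utils.py`, which is presumably where `split` is defined. A quick way to check which one Python is importing:)

```python
# Check which 'utils' module is actually being imported.
import utils

print(utils.__file__)            # should point into the smiles-transformer checkout, not site-packages
print(hasattr(utils, 'split'))   # True only for the repo-local utils.py
```

If the printed path points into site-packages, `pip uninstall utils` and run build_corpus.py from the directory that contains the repository's utils.py (or add that directory to `PYTHONPATH`).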
Where did you get the "01_data_prepare.ipynb"?
I think that is `prepare_data.ipynb` in the `experiments` folder.
Hi. Would it be possible for the authors to either upload vocab.pkl (for the pretrained model), or give instructions and code about how to generate the vocab.pkl file from the CHEMBL24 dataset (or any other dataset used)? Thanks
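
(For completeness, a rough sketch of what building vocab.pkl from a tokenized SMILES corpus could look like, assuming build_vocab.py exposes a `WordVocab` class with a `save_vocab` method and the constructor signature shown; none of this is verified against the repository. As the comments above show, a rebuilt vocabulary is only usable with the pretrained checkpoint if it ends up with the same 45 tokens and the same token-to-id mapping used in pretraining.)

```python
# Hypothetical sketch: build vocab.pkl from a corpus file with one tokenized SMILES per line.
# Class name, constructor signature, and paths are assumptions.
from build_vocab import WordVocab

with open('chembl24_corpus.txt') as f:          # placeholder path to the ChEMBL24 corpus
    lines = f.readlines()

vocab = WordVocab(lines, max_size=None, min_freq=1)
print('vocab size:', len(vocab))                # the checkpoint in this thread expects 45
vocab.save_vocab('vocab.pkl')
```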