Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem loading aromatic SMILES or kekulized molecules with RDKit #396

Open
stevenkbennett opened this issue Sep 20, 2021 · 3 comments
Open

Comments

@stevenkbennett
Copy link
Contributor

stevenkbennett commented Sep 20, 2021

There is an issue with loading aromatic SMILES or kekulized molecules from RDKit where bonds involved in aromatic rings are converted to single bonds.

Discovered by @joshkamm, @andrewtarzia and myself.

stk is able to handle to handle the aromatic bond order of 1.5, however, when converting back into the RDKit molecule, or writing to a .mol file, the aromatic bond information is lost.
When converting back into the RDKit molecule with to_rdkit_mol, I believe this occurs due to line 739 in https://github.com/lukasturcani/stk/blob/5987b0aa7ce73175701ba267d392b4fe5c651707/src/stk/molecular/molecules/molecule/molecule.py. When rdkit.BondType is called with a floating point bond order, it is converted into an int.
This behaviour can be seen just by executing

rdkit.BondType(1.6)

For writing to a .mol file, this issue appears due to line 82 in https://github.com/andrewtarzia/stk/blob/main/src/stk/molecular/molecules/molecule/utilities/writers/mdl_mol.py, as mentioned by @andrewtarzia on Discord as bond orders are all converted to ints.

To reproduce this, the following code can be used:

import stk
from rdkit.Chem import AllChem as rdkit

smiles = 'C1=CC=CC=C1'

mol = rdkit.MolFromSmiles(smiles)
rdkit.MolToMolFile(mol, 'rdkit.mol')
mol = rdkit.AddHs(mol)
rdkit.EmbedMolecule(mol)
stk.BuildingBlock.init_from_rdkit_mol(mol).write('prekc.mol')
rdkit.Kekulize(mol)
rdkit.MolToMolFile(mol, 'rdkit2.mol')
stk.BuildingBlock.init_from_rdkit_mol(mol).write('postkc.mol')

stk.BuildingBlock(smiles).write('stk.mol')

To fix, maybe an error can be raised if the user passes in non-integer bond types when initialising from an RDKit molecule, or some different handling of non-integer bond types within stk itself.

@lukasturcani
Copy link
Owner

stk does not deal in aromatic bonds. It is quite right for it to convert to integers. The problem, as I see it, is that it is not sanitizing user input. By which I mean, that when receiving an rdkit molecule, it does not kekulize it.

@stevenkbennett
Copy link
Contributor Author

Hey Lukas, just wanted to bring this issue up again as it has appeared quite a few times in our work. I was planning on submitting a PR in which a warning is issued to the user if an aromatic molecule is received in the init_from_rdkit function. What do you think of this?

@lukasturcani
Copy link
Owner

#407 (comment)

lukasturcani pushed a commit that referenced this issue Dec 28, 2021
Issues: #396

Added additional notes section in the docstring to explain that users
should kekulize their molecule prior to calling the
`init_form_rdkit_mol` method. It also explains the consequences of not
doing so (bond orders changed to integer values).

Co-authored-by: stevenbennett96 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants