-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use rdkit internal PDBResidueInfo for standard annotations #742
feat: use rdkit internal PDBResidueInfo for standard annotations #742
Conversation
@greglandrum, do you have a hint on the above by any chance? |
CodSpeed Performance ReportMerging #742 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of this refactoring. I left a few questions in the comments, but overall this looks very good to me. Regarding your two questions:
- I agree this is probably not ideal, but I suppose this interface will be mainly used to convert molecules with only a few atoms. Hence, the performance penalty of iterating over atoms should not be that big.
- Just to clarify: You mean this would be a problem for optional annotations, e.g. I would always get a value for
GetOccupancy()
independent of whether it was set byas_mol()
or not?
@Croydon-Brixton I would be happy to help, but I don't have time to wade through the issue discussion and your code to try and figure out what you want to do. If you have a specific RDKit question, I would be happy to try an answer it |
Re 1) I agree, it's not terribly slow and there doesn't seem to be an easy way around it. |
Thank you! Yes the specific question would be the following: Currently we for-loop iterate over numpy arrays (the Here's a minimal example snippet ...
for i in range(atoms.array_length()):
rdkit_atom = Atom(atoms.element[i].capitalize())
rdkit_atom.SetFormalCharge(atoms.charge[i].item())
rdkit_atom_res_info = AtomPDBResidueInfo(
atomName=atoms.atom_name[i].item(),
)
rdkit_atom.SetPDBResidueInfo(rdkit_atom_res_info)
mol.AddAtom(rdkit_atom)
... |
I think it would be OK then to always add these annotations to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final comment, but otherwise this is good to go from my side.
Sounds good -- I removed the atom naming part as suggested. Good to go from my side too now! |
Ah, the doctest needs to be updated as well, hence the current CI failure. The current doctest still uses |
From my perspective we can address potential performance improvements later. Feel free to merge, if you are ready |
Sounds good to me, merging now! |
Here's an attempt at grouping the standard annotations under the appropriate RDKit annotation as discussed here #736 (comment)
Let me know what you think -- what I'm not 100% happy with yet is that
(1) python-iterating over atoms seems inefficient, I wonder if there's an array-like API in RDKit and
(2) I'm currently not tracking whether an annotation was explicitly set because it was present in the original AtomArray or whether it's filling in the default value. Doing this seems non-trivial for rdkit methods which change the underlying mol -- unless maybe we add a bool property that tracks if an annotation exists?