Do not lookup bonds for hetero residues #820
Thanks for tackling this 👍. I think this can be solely fixed by changes in biotite.structure.io.pdb, as custom_bond_dict already supports the required flexibility. This way we would automatically respect the differences in PDB and PDBx bond definitions.
We could use the approach from here (see the Notes section). We could supply the custom_bond_dict only with hetero=False residues plus water. So in PDBFile, instead of calling connect_via_residue_names(array), we could do something like this:
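import itertools
import numpy as np
from biotite.structure import connect_via_residue_names
from biotite.structure.info import bonds_in_residue
# Look up bonds only for non-hetero residues, plus water
custom_bond_dict = {
    res_name: bonds_in_residue(res_name)
    for res_name in itertools.chain(np.unique(array.res_name[~array.hetero]), ["HOH"])
}
connect_via_residue_names(array, custom_bond_dict=custom_bond_dict)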
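To make the effect of such a restricted dictionary concrete, here is a minimal, library-free sketch. It does not call Biotite: TOY_CCD, toy_bonds_in_residue and toy_connect are hypothetical stand-ins for the compound dictionary, bonds_in_residue() and connect_via_residue_names(), and the toy bond data is made up. It also assumes that residues missing from the custom dictionary simply receive no bonds from the lookup.

```python
# Toy stand-in for the compound dictionary:
# residue name -> {(atom1, atom2): bond order}. Illustrative data only.
TOY_CCD = {
    "GLY": {("N", "CA"): 1, ("CA", "C"): 1, ("C", "O"): 2},
    "HOH": {("O", "H1"): 1, ("O", "H2"): 1},
    "LIG": {("C1", "C2"): 1},
}


def toy_bonds_in_residue(res_name):
    """Hypothetical analogue of bonds_in_residue(); unknown residues give {}."""
    return TOY_CCD.get(res_name, {})


def toy_connect(atoms, custom_bond_dict):
    """Return index pairs of bonded atoms, looked up per residue.

    atoms: list of (res_id, res_name, atom_name, hetero) tuples.
    Assumption: residues absent from custom_bond_dict get no bonds.
    """
    # Map each residue to {atom name: atom index}
    by_res = {}
    for i, (res_id, res_name, atom_name, _hetero) in enumerate(atoms):
        by_res.setdefault((res_id, res_name), {})[atom_name] = i
    bonds = set()
    for (_res_id, res_name), index_of in by_res.items():
        for a, b in custom_bond_dict.get(res_name, {}):
            if a in index_of and b in index_of:
                bonds.add((index_of[a], index_of[b]))
    return bonds


atoms = [
    (1, "GLY", "N", False), (1, "GLY", "CA", False),
    (1, "GLY", "C", False), (1, "GLY", "O", False),
    (2, "LIG", "C1", True), (2, "LIG", "C2", True),
    (3, "HOH", "O", True), (3, "HOH", "H1", True), (3, "HOH", "H2", True),
]

# Only non-hetero residue names, plus water, enter the dictionary
names = {res_name for _, res_name, _, hetero in atoms if not hetero} | {"HOH"}
custom_bond_dict = {name: toy_bonds_in_residue(name) for name in sorted(names)}
bonds = toy_connect(atoms, custom_bond_dict)
```

In this toy setup the ligand atoms (indices 4 and 5) stay unbonded by the lookup, so any bonds for them would have to come from the file itself, while the glycine and water atoms are connected as before.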
Thank you for your feedback. Unfortunately, I run into a 404 when I click your link. I think I understand your idea though and will start working on the refactor.
I pasted the wrong text 😅, now the URL in the link is working.
CodSpeed Performance Report: Merging #820 will not alter performance.
Looks good to me, can I merge this PR?
Yes, please feel free :) I would have done it, but, unsurprisingly, I do not have the permissions :D
The PR aims at resolving issue 818. The idea is to disable the bond lookup in the compound dictionary for hetero atoms. I tried to keep the changes minimal, but I'm not sure if that introduces too much special casing. I'm looking forward to your feedback.
My first, naive approach of globally disabling the bond lookup did not work because, as far as I understand, bond parsing works a bit differently for .pdb and .cif files. For PDB files, CONECT entries are parsed as bonds directly and then extended by bonds found in the dictionary lookup (see here). For CIF files, the whole _chem_comp_bond table is parsed as the compound dictionary, and all bonds (for both hetero and regular atoms) are looked up in it (see here). So I introduced the ignore_hetero flag, which is False by default (preserving the current behavior) and which I set to True for PDB parsing. Lastly, I introduced water special casing, because for the test file 1crr the PDB file does not contain CONECT entries for the water atoms, while the CIF _chem_comp_bond table does. I am not sure how common that scenario really is, or whether it would be possible to adapt the test case.
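The two parsing paths and the flag described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: lookup_bonds, WATER_NAMES and the toy bond table are hypothetical, and water is assumed to be recognizable by its residue name alone.

```python
WATER_NAMES = {"HOH"}  # assumption: water is identified by residue name only


def lookup_bonds(atoms, bond_table, ignore_hetero=False):
    """Toy dictionary lookup of intra-residue bonds.

    atoms: list of (res_id, res_name, atom_name, hetero) tuples.
    bond_table: residue name -> iterable of (atom1, atom2) name pairs.
    With ignore_hetero=True, hetero residues other than water are skipped.
    """
    by_res = {}
    for i, (res_id, res_name, atom_name, hetero) in enumerate(atoms):
        if ignore_hetero and hetero and res_name not in WATER_NAMES:
            continue  # leave these atoms to the bonds stated in the file
        by_res.setdefault((res_id, res_name), {})[atom_name] = i
    bonds = set()
    for (_res_id, res_name), index_of in by_res.items():
        for a, b in bond_table.get(res_name, ()):
            if a in index_of and b in index_of:
                bonds.add((index_of[a], index_of[b]))
    return bonds


atoms = [
    (1, "GLY", "N", False), (1, "GLY", "CA", False),
    (2, "LIG", "C1", True), (2, "LIG", "C2", True), (2, "LIG", "C3", True),
    (3, "HOH", "O", True), (3, "HOH", "H1", True),
]
bond_table = {
    "GLY": [("N", "CA")],
    "LIG": [("C1", "C2"), ("C2", "C3")],
    "HOH": [("O", "H1")],
}

# PDB-like path: bonds from CONECT records, extended by the lookup
conect_bonds = {(2, 3)}  # the toy file states only C1-C2 for the ligand
pdb_bonds = conect_bonds | lookup_bonds(atoms, bond_table, ignore_hetero=True)

# CIF-like path: every residue is looked up in the table (default behavior)
cif_bonds = lookup_bonds(atoms, bond_table)
```

With the flag set, the ligand bond C2-C3 that the toy CONECT records omit is not re-added by the lookup, while water still receives its bond without needing a CONECT entry. With the default, the result matches the table for every residue.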