Skip to content

Conversation

Emrys-Merlin
Copy link
Contributor

@Emrys-Merlin Emrys-Merlin commented Aug 1, 2025

The PR aims at resolving issue 818. The idea is to disable the bond lookup in the compound dictionary for hetero atoms. I tried to keep the changes minimal, but I'm not sure if that introduces too much special casing. I'm looking forward to your feedback.

My first, naive approach to globally disable the bond lookup did not work, because the bond parsing from .pdb files and .cif files works a bit differently, as far as I understand. While for pdbs CONECT entries are parsed as bonds directly and then extended by bonds found in the dictionary lookup (see here), for cifs, the whole _chem_comp_bond table is parsed as the compound dictionary and all the bonds (for both hetero and regular atoms) are looked up in it (see here). So, I introduced the ignore_hetero flag, which is False by default (preserving the current behavior) and I set to True for pdb parsing. Lastly, I introduced water special casing, because in the test file 1crr, the pdb does not contain CONECT entries for the water atoms, while the CIF _chem_comp_bond does. I am not sure how common that scenario really is or if it would be possible to adapt the test case.

@Emrys-Merlin Emrys-Merlin force-pushed the disable_hetatm_bond_lookup branch from a6ac3c5 to 8e54d9f Compare August 4, 2025 12:18
Copy link
Member

@padix-key padix-key left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this 👍. I think this can be solely fixed by changes in biotite.structure.io.pdb as custom_bond_dict already supports the required flexibility. This way we would automatically respect the differences in PDB and PDBx bond definitions.

We could use the approach from here (see the Notes section). We could supply the custom_bond_dict only with hetero=False residues plus water. So in PDBFile instead of calling connect_via_residue_names(array) we could do something like this:

custom_bond_dict = {
    custom_bond_dict[res_name]: bonds_in_residue(res_name)
    for res_name in itertools.chain(np.unique(array.res_name), ["HOH"])
}
connect_via_residue_names(array, custom_bond_dict=custom_bond_dict)

@Emrys-Merlin
Copy link
Contributor Author

Thank you for your feedback. Unfortunately, I run into a 404 when I click your link. I think, I understand your idea though and will start working on the refactor.

@Emrys-Merlin Emrys-Merlin requested a review from padix-key August 8, 2025 09:24
@padix-key
Copy link
Member

padix-key commented Aug 8, 2025

I pasted the wrong text 😅, now the URL in the link is working.

Copy link

codspeed-hq bot commented Aug 8, 2025

CodSpeed Performance Report

Merging #820 will not alter performance

Comparing Emrys-Merlin:disable_hetatm_bond_lookup (81dab9a) with main (a541322)

Summary

✅ 59 untouched benchmarks

@padix-key
Copy link
Member

padix-key commented Aug 8, 2025

Looks good to me, can I merge this PR?

@Emrys-Merlin
Copy link
Contributor Author

Emrys-Merlin commented Aug 8, 2025

Yes, please feel free :) I would have done it, but, unsurprisingly, I do not have the permissions :D

@padix-key padix-key merged commit 7474d90 into biotite-dev:main Aug 8, 2025
35 of 47 checks passed
@Emrys-Merlin Emrys-Merlin deleted the disable_hetatm_bond_lookup branch August 8, 2025 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants