Omit 'standard' bonds when writing struct_conn category #678

Merged
merged 1 commit into from
Oct 21, 2024

Member

@padix-key padix-key commented Oct 17, 2024

This PR speeds up reading PDBx files (structure + bonds) that were written by Biotite:

Currently, pdbx.set_structure(include_bonds=True) writes all inter-residue bonds to the struct_conn category, to be sure that all required bonds are written. The reason is that the PDBx dictionary describes rather vaguely which bonds belong in this category:

Nonstandard residue linkage. The LINK records specify connectivity between residues that is not implied by the primary structure.

This means that the struct_conn category can become much larger than in the original file from the PDB.

When the struct_conn category is parsed again, each of its rows is matched against each row in atom_site. This can be done in a fast, vectorized manner using (n_atoms x n_bonds) boolean matrices, but it does not scale well when struct_conn is 'verbose' as described above: the matrices then have approximately the shape (n_atoms x n_residues), because there is one bond for each residue linkage. As n_residues is approximately proportional to n_atoms, the size of the boolean matrices, and thus the time complexity, becomes O(n_atoms^2). Therefore, the time for parsing inter-residue bonds explodes for larger structures.
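To illustrate the matching step described above, here is a minimal NumPy sketch. It is not Biotite's actual parsing code: the function name `find_atom_indices` is hypothetical, and identifying an atom only by residue ID and atom name is a simplifying assumption (a real file also needs chain IDs, insertion codes, etc.). It shows how broadcasting produces the (n_atoms x n_bonds) boolean matrix.

```python
import numpy as np


def find_atom_indices(atom_res_id, atom_name, bond_res_id, bond_atom_name):
    """
    Hypothetical helper: for each bond partner, find the index of the
    matching atom. Atoms are identified by (res_id, atom_name) only,
    which is a simplification for this sketch.
    """
    # Broadcasting compares every atom (rows) with every bond partner
    # (columns), yielding an (n_atoms x n_bonds) boolean matrix
    matches = (
        (atom_res_id[:, np.newaxis] == bond_res_id[np.newaxis, :])
        & (atom_name[:, np.newaxis] == bond_atom_name[np.newaxis, :])
    )
    # Each bond partner must match exactly one atom
    if np.any(matches.sum(axis=0) != 1):
        raise ValueError("A bond partner does not match exactly one atom")
    # Row index of the single True value in each column
    return np.argmax(matches, axis=0)


# Toy 'atom_site': three residues with the atoms N, CA and C each
atom_res_id = np.repeat([1, 2, 3], 3)
atom_name = np.tile(["N", "CA", "C"], 3)

# Toy 'struct_conn': one C-N bond per residue linkage
first_partner = find_atom_indices(
    atom_res_id, atom_name, np.array([1, 2]), np.array(["C", "C"])
)
second_partner = find_atom_indices(
    atom_res_id, atom_name, np.array([2, 3]), np.array(["N", "N"])
)
```

With one bond per residue linkage, the number of columns grows together with the number of rows, which is where the quadratic cost comes from.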

The solution in this PR is to write fewer inter-residue bonds to struct_conn: While the specification of the category is not very precise, backbone bonds between adjacent canonical amino acids/nucleotides can definitely be excluded from the category. Filtering these out makes struct_conn much smaller and, more importantly, its size no longer scales with the number of atoms.
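The filtering idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code merged in this PR: the `Atom` tuple, the residue name sets, the function names, and the rule that "adjacent" means a residue ID difference of 1 within the same chain are all hypothetical simplifications.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    # Hypothetical minimal atom description for this sketch
    chain_id: str
    res_id: int
    res_name: str
    atom_name: str


# Assumed residue name sets for the canonical building blocks
CANONICAL_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
CANONICAL_NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


def is_standard_backbone_bond(a, b):
    """Check if the bond links the backbones of two adjacent canonical residues."""
    # Order the atoms so that 'a' belongs to the preceding residue
    if a.res_id > b.res_id:
        a, b = b, a
    # Simplification: adjacency means consecutive residue IDs in one chain
    if a.chain_id != b.chain_id or b.res_id - a.res_id != 1:
        return False
    if a.res_name in CANONICAL_AMINO_ACIDS and b.res_name in CANONICAL_AMINO_ACIDS:
        # Peptide bond
        return (a.atom_name, b.atom_name) == ("C", "N")
    if a.res_name in CANONICAL_NUCLEOTIDES and b.res_name in CANONICAL_NUCLEOTIDES:
        # Phosphodiester bond
        return (a.atom_name, b.atom_name) == ("O3'", "P")
    return False


def filter_struct_conn_bonds(bonds):
    """Keep only the bonds that are not implied by the primary structure."""
    return [(a, b) for a, b in bonds if not is_standard_backbone_bond(a, b)]


peptide = (Atom("A", 1, "ALA", "C"), Atom("A", 2, "GLY", "N"))
disulfide = (Atom("A", 3, "CYS", "SG"), Atom("A", 10, "CYS", "SG"))
modified = (Atom("A", 2, "GLY", "C"), Atom("A", 3, "MSE", "N"))
kept = filter_struct_conn_bonds([peptide, disulfide, modified])
```

In this toy input the peptide bond is dropped, while the disulfide bridge and the linkage to the non-canonical residue MSE are kept, so the number of remaining rows depends on the unusual linkages rather than on the chain length.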

@padix-key padix-key marked this pull request as ready for review October 21, 2024 12:29
@padix-key padix-key merged commit 737c1b6 into biotite-dev:main Oct 21, 2024
25 of 26 checks passed