Omit 'standard' bonds when writing `struct_conn` category #678
This PR increases the speed at which PDBx files with structure and bonds written by Biotite can be read.

Currently, `pdbx.set_structure(include_bonds=True)` writes all inter-residue bonds to the `struct_conn` category, to be sure that all required bonds are written, because the PDBx dictionary describes the bonds that are stored in this category rather vaguely. This means that the `struct_conn` category can become much larger than in the original from the PDB.

For parsing the `struct_conn` category again, each of its rows is matched against each row in `atom_site`. While this can be done in a fast vectorized manner using `(n_atoms x n_bonds)` boolean matrices, this does not scale well when `struct_conn` is 'verbose' as described above: these matrices then have approximately the shape `(n_atoms x n_residues)`, because there is one bond for each residue linkage. As `n_residues` is approximately proportional to `n_atoms`, the size of the boolean matrices, and thus the time complexity, becomes `O(n^2)`. Therefore, the time for parsing inter-residue bonds explodes for larger structures.

The solution of this PR is to write fewer inter-residue bonds to `struct_conn`: while the specification of the category is not very precise, backbone bonds between adjacent canonical amino acids/nucleotides can definitely be excluded from the category. Filtering these out makes `struct_conn` much smaller and, more importantly, its size no longer scales with the number of atoms.
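The scaling argument can be illustrated with a small NumPy sketch. It is not the code of this PR or of Biotite: the toy all-cysteine chain, the helper names (`build_toy_chain`, `is_standard_backbone_bond`, `match_partner_to_atoms`) and the tiny `CANONICAL_AMINO_ACIDS` set are assumptions made up for the example. It also only checks peptide bonds written in `C` to `N` order, and it ignores nucleotides.

```python
import numpy as np

# Hypothetical subset of canonical residue names, for illustration only
CANONICAL_AMINO_ACIDS = {"ALA", "CYS", "GLY", "SER"}


def build_toy_chain(n_residues):
    """Create a toy all-cysteine chain: atom table plus inter-residue bonds."""
    atom_names = ["N", "CA", "C", "SG"]
    # One (res_id, res_name, atom_name) tuple per atom
    atoms = [
        (res_id, "CYS", name)
        for res_id in range(1, n_residues + 1)
        for name in atom_names
    ]
    # One peptide bond (C -> N) per residue linkage ...
    bonds = [
        ((res_id, "CYS", "C"), (res_id + 1, "CYS", "N"))
        for res_id in range(1, n_residues)
    ]
    # ... plus a single disulfide bridge between the first and last residue
    bonds.append(((1, "CYS", "SG"), (n_residues, "CYS", "SG")))
    return atoms, bonds


def is_standard_backbone_bond(partner_1, partner_2):
    """True for a peptide bond between adjacent canonical amino acids."""
    (id_1, res_1, atom_1), (id_2, res_2, atom_2) = partner_1, partner_2
    return (
        res_1 in CANONICAL_AMINO_ACIDS
        and res_2 in CANONICAL_AMINO_ACIDS
        and id_2 == id_1 + 1
        and atom_1 == "C"
        and atom_2 == "N"
    )


def match_partner_to_atoms(atoms, partners):
    """Find each bond partner's atom index via an (n_atoms x n_bonds) matrix."""
    atom_keys = np.array([f"{res_id}:{name}" for res_id, _, name in atoms])
    bond_keys = np.array([f"{res_id}:{name}" for res_id, _, name in partners])
    # Broadcasting compares every atom with every bond partner
    match = atom_keys[:, np.newaxis] == bond_keys[np.newaxis, :]
    # Each column contains exactly one True in this toy data
    return np.argmax(match, axis=0), match.shape


atoms, bonds = build_toy_chain(5)
# 'Verbose' case: one matrix column per residue linkage plus the disulfide
_, verbose_shape = match_partner_to_atoms(atoms, [b[0] for b in bonds])
# Filtered case: only the non-standard bond is left
kept = [b for b in bonds if not is_standard_backbone_bond(*b)]
_, filtered_shape = match_partner_to_atoms(atoms, [b[0] for b in kept])
print(verbose_shape, filtered_shape)
```

For the five-residue toy chain the matrix shrinks from `(20, 5)` to `(20, 1)`. With the unfiltered bonds, the second dimension grows along with the number of residues, whereas after filtering it only depends on the number of non-standard linkages.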