-
Notifications
You must be signed in to change notification settings - Fork 129
Description
I have a dataset of pdb files that each contain a protein binding a ligand. The ligands differ from file to file. In each file the ligand always has the residue name "LIG". I want to load these files with biotite including the bonds. Sometimes, the loaded ligand has more bonds than specified by the "CONECT" entries in the PDB file. I have a minimal example below (containing only a ligand).
I traced the problem to this line in biotite.structure.io.pdb.file.PDBFile.get_structure
. The line calls connect_via_residue_names
, which then looks up residues by residue name. Unfortunately, "LIG" is a valid entry in the used CCD and the method adds bonds if the atom names (by chance, in my case) match pairs in the bonds_in_residue("LIG")
dictionary. This behavior was unexpected to me and I was wondering if it is intentional.
For my use case, it would be great if the bond lookup in the CCD were restricted to the amino acids of the peptide. However, I'm not sure if that is too restrictive. As a workaround, I checked which keys return an empty dictionary in bonds_in_residue
(at the time of writing, e.g., "BLU") and I could refactor the pdbs to use these keys as residue names. At the same time, that does not feel very stable to me. I would be happy about any input how to tackle my use case. If I am using the library wrong, I would also be happy to hear about it (and how to do it right 🙈).
I could think of two of solutions, if the observed behavior is undesired:
- A switch to disable the bond lookup for HETATM entries.
- A guaranteed residue name (or maybe even a range) that will never be looked up and that can be found in the docs.
Thank you for the great work and library!
Example
from pathlib import Path
import biotite.structure.io.pdb as pdb
def main():
path = Path("./minimal.pdb")
pdb_file = pdb.PDBFile.read(path)
structure = pdb.get_structure(pdb_file, model=1, include_bonds=True)
bond_list = structure.bonds
print(bond_list.as_array())
if __name__ == "__main__":
main()
Example PDB file (stub) - please ignore the identical coordinates
HETATM 704 C7 LIG B 101 -17.432 24.497 -0.918 1.00 26.28 C
HETATM 705 C8 LIG B 101 -17.432 24.497 -0.918 1.00 26.28 C
HETATM 706 C13 LIG B 101 -17.432 24.497 -0.918 1.00 26.28 C
HETATM 707 C14 LIG B 101 -17.432 24.497 -0.918 1.00 26.28 C
HETATM 708 C15 LIG B 101 -17.432 24.497 -0.918 1.00 26.28 C
CONECT 704 705
CONECT 705 704 706
CONECT 706 705 707
CONECT 707 706 708
CONECT 708 707
Expected output
array([[0, 1, 0],
[1, 2, 0],
[2, 3, 0],
[3, 4, 0]], dtype=uint32)
Actual output
array([[0, 1, 6],
[0, 4, 5],
[2, 4, 6],
[1, 2, 0],
[2, 3, 0],
[3, 4, 0]], dtype=uint32)
Relevant bonds_in_residue
entry
> pprint(bonds_in_residue("LIG"))
{('C10', 'HC10'): <BondType.SINGLE: 1>,
('C10', 'N12'): <BondType.AROMATIC_DOUBLE: 6>,
('C13', 'C15'): <BondType.AROMATIC_DOUBLE: 6>,
('C13', 'HC13'): <BondType.SINGLE: 1>,
('C15', 'HC15'): <BondType.SINGLE: 1>,
('C17', 'C20'): <BondType.SINGLE: 1>,
('C17', 'H171'): <BondType.SINGLE: 1>,
('C17', 'H172'): <BondType.SINGLE: 1>,
('C20', 'C21'): <BondType.AROMATIC_DOUBLE: 6>,
('C20', 'C28'): <BondType.AROMATIC_SINGLE: 5>,
('C21', 'C22'): <BondType.AROMATIC_SINGLE: 5>,
('C22', 'C24'): <BondType.AROMATIC_DOUBLE: 6>,
('C22', 'HC22'): <BondType.SINGLE: 1>,
('C24', 'C26'): <BondType.AROMATIC_SINGLE: 5>,
('C24', 'HC24'): <BondType.SINGLE: 1>,
('C26', 'C28'): <BondType.AROMATIC_DOUBLE: 6>,
('C26', 'HC26'): <BondType.SINGLE: 1>,
('C28', 'HC28'): <BondType.SINGLE: 1>,
('C4', 'C21'): <BondType.AROMATIC_SINGLE: 5>,
('C4', 'C5'): <BondType.AROMATIC_SINGLE: 5>,
('C5', 'C17'): <BondType.SINGLE: 1>,
('C5', 'C6'): <BondType.AROMATIC_DOUBLE: 6>,
('C6', 'C7'): <BondType.AROMATIC_SINGLE: 5>,
('C7', 'C15'): <BondType.AROMATIC_SINGLE: 5>,
('C7', 'C8'): <BondType.AROMATIC_DOUBLE: 6>,
('C8', 'C10'): <BondType.AROMATIC_SINGLE: 5>,
('C8', 'HC8'): <BondType.SINGLE: 1>,
('N1', 'C6'): <BondType.AROMATIC_SINGLE: 5>,
('N1', 'HN1'): <BondType.SINGLE: 1>,
('N1', 'N3'): <BondType.AROMATIC_SINGLE: 5>,
('N12', 'C13'): <BondType.AROMATIC_SINGLE: 5>,
('N3', 'C4'): <BondType.AROMATIC_DOUBLE: 6>}