Skip to content

connect_via_residue_names adds unwanted bonds to ligand while reading pdb file #818

@Emrys-Merlin

Description

@Emrys-Merlin

I have a dataset of pdb files that each contain a protein binding a ligand. The ligands differ from file to file. In each file the ligand always has the residue name "LIG". I want to load these files with biotite including the bonds. Sometimes, the loaded ligand has more bonds than specified by the "CONECT" entries in the PDB file. I have a minimal example below (containing only a ligand).

I traced the problem to this line in biotite.structure.io.pdb.file.PDBFile.get_structure. The line calls connect_via_residue_names, which then looks up residues by residue name. Unfortunately, "LIG" is a valid entry in the used CCD and the method adds bonds if the atom names (by chance, in my case) match pairs in the bonds_in_residue("LIG") dictionary. This behavior was unexpected to me and I was wondering if it is intentional.

For my use case, it would be great if the bond lookup in the CCD were restricted to the amino acids of the peptide. However, I'm not sure if that is too restrictive. As a workaround, I checked which keys return an empty dictionary in bonds_in_residue (at the time of writing, e.g., "BLU") and I could refactor the pdbs to use these keys as residue names. At the same time, that does not feel very stable to me. I would be happy about any input how to tackle my use case. If I am using the library wrong, I would also be happy to hear about it (and how to do it right 🙈).

I could think of two of solutions, if the observed behavior is undesired:

  1. A switch to disable the bond lookup for HETATM entries.
  2. A guaranteed residue name (or maybe even a range) that will never be looked up and that can be found in the docs.

Thank you for the great work and library!

Example

from pathlib import Path
import biotite.structure.io.pdb as pdb


def main():
    path = Path("./minimal.pdb")
    pdb_file = pdb.PDBFile.read(path)
    structure = pdb.get_structure(pdb_file, model=1, include_bonds=True)
    bond_list = structure.bonds
    print(bond_list.as_array())

if __name__ == "__main__":
    main()

Example PDB file (stub) - please ignore the identical coordinates

HETATM  704  C7  LIG B 101     -17.432  24.497  -0.918  1.00 26.28           C  
HETATM  705  C8  LIG B 101     -17.432  24.497  -0.918  1.00 26.28           C  
HETATM  706  C13 LIG B 101     -17.432  24.497  -0.918  1.00 26.28           C  
HETATM  707  C14 LIG B 101     -17.432  24.497  -0.918  1.00 26.28           C  
HETATM  708  C15 LIG B 101     -17.432  24.497  -0.918  1.00 26.28           C  
CONECT  704  705
CONECT  705  704  706
CONECT  706  705  707
CONECT  707  706 708
CONECT  708  707

Expected output

array([[0, 1, 0],
       [1, 2, 0],
       [2, 3, 0],
       [3, 4, 0]], dtype=uint32)

Actual output

array([[0, 1, 6],
       [0, 4, 5],
       [2, 4, 6],
       [1, 2, 0],
       [2, 3, 0],
       [3, 4, 0]], dtype=uint32)

Relevant bonds_in_residue entry

> pprint(bonds_in_residue("LIG"))
{('C10', 'HC10'): <BondType.SINGLE: 1>,
 ('C10', 'N12'): <BondType.AROMATIC_DOUBLE: 6>,
 ('C13', 'C15'): <BondType.AROMATIC_DOUBLE: 6>,
 ('C13', 'HC13'): <BondType.SINGLE: 1>,
 ('C15', 'HC15'): <BondType.SINGLE: 1>,
 ('C17', 'C20'): <BondType.SINGLE: 1>,
 ('C17', 'H171'): <BondType.SINGLE: 1>,
 ('C17', 'H172'): <BondType.SINGLE: 1>,
 ('C20', 'C21'): <BondType.AROMATIC_DOUBLE: 6>,
 ('C20', 'C28'): <BondType.AROMATIC_SINGLE: 5>,
 ('C21', 'C22'): <BondType.AROMATIC_SINGLE: 5>,
 ('C22', 'C24'): <BondType.AROMATIC_DOUBLE: 6>,
 ('C22', 'HC22'): <BondType.SINGLE: 1>,
 ('C24', 'C26'): <BondType.AROMATIC_SINGLE: 5>,
 ('C24', 'HC24'): <BondType.SINGLE: 1>,
 ('C26', 'C28'): <BondType.AROMATIC_DOUBLE: 6>,
 ('C26', 'HC26'): <BondType.SINGLE: 1>,
 ('C28', 'HC28'): <BondType.SINGLE: 1>,
 ('C4', 'C21'): <BondType.AROMATIC_SINGLE: 5>,
 ('C4', 'C5'): <BondType.AROMATIC_SINGLE: 5>,
 ('C5', 'C17'): <BondType.SINGLE: 1>,
 ('C5', 'C6'): <BondType.AROMATIC_DOUBLE: 6>,
 ('C6', 'C7'): <BondType.AROMATIC_SINGLE: 5>,
 ('C7', 'C15'): <BondType.AROMATIC_SINGLE: 5>,
 ('C7', 'C8'): <BondType.AROMATIC_DOUBLE: 6>,
 ('C8', 'C10'): <BondType.AROMATIC_SINGLE: 5>,
 ('C8', 'HC8'): <BondType.SINGLE: 1>,
 ('N1', 'C6'): <BondType.AROMATIC_SINGLE: 5>,
 ('N1', 'HN1'): <BondType.SINGLE: 1>,
 ('N1', 'N3'): <BondType.AROMATIC_SINGLE: 5>,
 ('N12', 'C13'): <BondType.AROMATIC_SINGLE: 5>,
 ('N3', 'C4'): <BondType.AROMATIC_DOUBLE: 6>}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions