Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dihedral Backbone Does Not Account for Gaps, Returns Incorrect Values #744

Open
cing opened this issue Jan 29, 2025 · 3 comments
Open

Dihedral Backbone Does Not Account for Gaps, Returns Incorrect Values #744

cing opened this issue Jan 29, 2025 · 3 comments
Labels

Comments

@cing
Copy link

cing commented Jan 29, 2025

I didn't see anything in the docs that says biotite.structure.dihedral_backbone() will return an incorrect value if there's a gap the protein backbone, i.e. it fails biotite.structure.check_res_id_continuity().

For example in the Ramachandran code sample, with PDB: 3VKH there are many gaps in that structure which would result in incorrect dihedrals at the boundaries when we pull atom coordinates between successive i and i+1 residues. In just chain A, there would be erroneous dihedrals produced at all the following positions:

Couldn't find next residue data for residue 1619
Couldn't find next residue data for residue 2060
Couldn't find next residue data for residue 2169
Couldn't find next residue data for residue 2452
Couldn't find next residue data for residue 2630
Couldn't find next residue data for residue 3211
Couldn't find next residue data for residue 3491
Couldn't find next residue data for residue 3736
Couldn't find next residue data for residue 4438
Couldn't find next residue data for residue 4465
Couldn't find next residue data for residue 4521
Couldn't find next residue data for residue 4581
Couldn't find next residue data for residue 4613
Couldn't find next residue data for residue 4624
Couldn't find next residue data for residue 4673
Couldn't find next residue data for residue 4729

Here's an example from the above PDB which isolates one of these gaps (with 2 residues on either side)
3vkh_gap.pdb.txt
In this case, phi/psi/omega, in degrees, are:

(array([        nan,   42.100914,  -94.83789 , -118.06954 ], dtype=float32), 
array([156.29105 , -47.083096, 123.67293 ,        nan], dtype=float32), 
array([ 179.54808 ,   31.235714, -179.63321 ,         nan], dtype=float32))

The phi value -94.837 and the psi/omega the values -47.084 and 31.235 would be incorrect due to the unphysical gap, both would be better as nan.


For comparison, MDAnalysis returns None when you try to get dihedrals across a gap, for example:

import MDAnalysis as mda
from MDAnalysis.analysis import dihedrals
import numpy as np
u = mda.Universe("3vkh_gap.pdb")
protein = u.select_atoms('protein')
print('There are {} residues in the protein'.format(len(protein.residues)))
omegas = [res.omega_selection() for res in protein.residues]
for omega in omegas:
    if omega:
        print(omega, omega.dihedral.value())

returns the correct non-nan angles only:

There are 4 residues in the protein
<AtomGroup [<Atom 2: CA of type C of resname LEU, resid 2059 and segid A and altLoc >, <Atom 3: C of type C of resname LEU, resid 2059 and segid A and altLoc >, <Atom 9: N of type N of resname LEU, resid 2060 and segid A and altLoc >, <Atom 10: CA of type C of resname LEU, resid 2060 and segid A and altLoc >]> 179.54809477851654
<AtomGroup [<Atom 18: CA of type C of resname ASN, resid 2064 and segid A and altLoc >, <Atom 19: C of type C of resname ASN, resid 2064 and segid A and altLoc >, <Atom 22: N of type N of resname ILE, resid 2065 and segid A and altLoc >, <Atom 23: CA of type C of resname ILE, resid 2065 and segid A and altLoc >]> -179.63322828758015
@padix-key
Copy link
Member

Thanks for the report, this is definitely an oversight in the current implementation dihedral_backbone(). However, I am not sure if residue ID continuity is the best way to find such gaps: There might be cases of proteins with inconsistent numbering, that would imply gaps but actually the chain is contiguous and also the other way around. Instead, I suggest to check the C-N distances and set angles to None where this distance is greater than some threshold value.

@padix-key padix-key added the bug label Jan 29, 2025
@cing
Copy link
Author

cing commented Jan 29, 2025

Right so you mean using biotite.structure.check_backbone_continuity()?

@padix-key
Copy link
Member

padix-key commented Jan 31, 2025

Yes, this should work. But since dihedral_backbone() already extracted the backbone coordinates, I assume it is more efficient to compute the backbone continuity directly in dihedral_backbone() in a similar manner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants