Adding pdb_addter #112

Open
wants to merge 4 commits into
base: master
Changes from 3 commits
275 changes: 275 additions & 0 deletions pdbtools/pdb_add_manual_ter.py
@@ -0,0 +1,275 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2021 Brian Andrews
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Adds TER records at user-specified positions in a PDB file. A starting residue
must be given to avoid inconsistent behavior. Residue numbers in the PDB file
are ignored: the first residue in the file counts as residue one (1), and TER
records are inserted at the requested interval using that convention. Existing
TER records are left untouched.

Usage:
    python pdb_add_manual_ter.py -<first residue>:[last residue]:[frequency] <pdb file>

Example:
    python pdb_add_manual_ter.py -1:10 1CTF.pdb    # Adds TER after every residue from residue 1 through residue 10
    python pdb_add_manual_ter.py -1::3 1CTF.pdb    # Adds TER after every 3rd residue starting from residue 1
    python pdb_add_manual_ter.py -1:10:5 1CTF.pdb  # Adds TER after every 5th residue from residue 1 to residue 10
    python pdb_add_manual_ter.py -4: 1CTF.pdb      # Adds TER after every residue starting at the end of residue 4

This program is part of the `pdb-tools` suite of utilities and should not be
distributed in isolation. The `pdb-tools` were created to quickly manipulate PDB
files using the terminal, and can be used sequentially, with one tool streaming
data to another. They are based on old FORTRAN77 code that was taking too much
effort to maintain and compile. RIP.
"""

import os
import sys

__author__ = ["Joao Rodrigues", "Brian Andrews"]
__email__ = ["[email protected]", "[email protected]"]


def check_input(args):
    """Checks whether to read from stdin/file and validates user input/options.
    """

    def return_integer(string):
        try:
            return int(string)
        except ValueError:
            emsg = 'ERROR!! Range values must be integers: \'{}\'\n'
            sys.stderr.write(emsg.format(string))
            sys.exit(1)

    # Defaults
    option = ':::'
    fh = sys.stdin  # file handle

    if not len(args):
        # Reading from pipe with default option
        if sys.stdin.isatty():
            sys.stderr.write(__doc__)
            sys.exit(1)

    elif len(args) == 1:
        # One of two options: option & Pipe OR file & default option
        if args[0].startswith('-'):
            option = args[0][1:]
            if sys.stdin.isatty():  # ensure the PDB data is streamed in
                emsg = 'ERROR!! No data to process!\n'
                sys.stderr.write(emsg)
                sys.stderr.write(__doc__)
                sys.exit(1)

        else:
            if not os.path.isfile(args[0]):
                emsg = 'ERROR!! File not found or not readable: \'{}\'\n'
                sys.stderr.write(emsg.format(args[0]))
                sys.stderr.write(__doc__)
                sys.exit(1)

            fh = open(args[0], 'r')

    elif len(args) == 2:
        # Two options: option & File
        if not args[0].startswith('-'):
            emsg = 'ERROR! First argument is not an option: \'{}\'\n'
            sys.stderr.write(emsg.format(args[0]))
            sys.stderr.write(__doc__)
            sys.exit(1)

        if not os.path.isfile(args[1]):
            emsg = 'ERROR!! File not found or not readable: \'{}\'\n'
            sys.stderr.write(emsg.format(args[1]))
            sys.stderr.write(__doc__)
            sys.exit(1)

        option = args[0][1:]
        fh = open(args[1], 'r')

    else:  # Whatever ...
        sys.stderr.write(__doc__)
        sys.exit(1)

    # Validate option
    if not (1 <= option.count(':') <= 2):
        emsg = 'ERROR!! Residue range must be in \'a:z:s\' where a is the '
        emsg += 'first residue, z is an optional last residue (defaults to '
        emsg += 'the last residue), and s is an optional step value (to add '
        emsg += 'a TER after every s-th residue).\n'
        sys.stderr.write(emsg)
        sys.exit(1)

    # One or two colons always split into two or three fields
    start, end, step = None, None, 1
    slices = [num if num.strip() else None for num in option.split(':')]
    if len(slices) == 3:
        start, end, step = slices
    elif len(slices) == 2:
        start, end = slices

    # residue range start
    if start is None:
        emsg = 'ERROR!! Please specify starting value: \'{}\'\n'
        sys.stderr.write(emsg.format(option))
        sys.exit(1)
    else:
        start = return_integer(start)

    if start < 1:
        emsg = 'ERROR!! Starting value must be 1 or greater: \'{}\'\n'
        sys.stderr.write(emsg.format(start))
        sys.exit(1)

    # residue range end
    if end is None:
        end = 1000000  # a value that presumably will not be reached
    else:
        end = return_integer(end)

    if start >= end:
        emsg = 'ERROR!! Start ({}) must be smaller than end ({})\n'
        sys.stderr.write(emsg.format(start, end))
        sys.exit(1)

    # residue range step
    if step is None:
        step = 1
    else:
        step = return_integer(step)

    if step <= 0:
        emsg = 'ERROR!! Step value must be a positive number: \'{}\'\n'
        sys.stderr.write(emsg.format(step))
        sys.exit(1)

    # + 2: the TER after residue `end` is written when residue `end + 1`
    # starts, and range() excludes its upper bound.
    resrange = set(range(start, end + 2))
    return (fh, resrange, step)


def run(fhandle, residue_range, step):
    """
    Add TER records within the residue range at the frequency given by step.

    This function is a generator.

    Parameters
    ----------
    fhandle : a line-by-line iterator of the original PDB file.

    residue_range : set, list, or tuple
        The residues describing the range to consider.

    step : int
        The step at which to insert a TER record.

    Yields
    ------
    str (line-by-line)
        All lines with added TER lines designated by inputs.
    """

    def make_TER(prev_line):
        """Creates a TER statement based on the last ATOM/HETATM line.
        """

        # Build the TER record from the fields of the preceding atom line
        serial = int(prev_line[6:11]) + 1
        rname = prev_line[17:20]
        chain = prev_line[21]
        resid = prev_line[22:26]
        icode = prev_line[26]

        return fmt_TER.format(serial, rname, chain, resid, icode)

    # TER     606      LEU A  75
    fmt_TER = "TER   {:>5d}      {:3s} {:1s}{:>4s}{:1s}" + " " * 53 + "\n"

    prev_line = None
    prev_res = None
    res_counter = 0
    no_more_atoms = False
    min_residue = min(residue_range)
    records = ('ATOM', 'HETATM', 'ANISOU')
    ignored = ('TER',)
    # records that mark the end of the coordinate section
    end = ('END', 'ENDMDL', 'CONECT')
    for line in fhandle:
        if line.startswith(records):

            res_id = line[21:26]  # include chain ID
            if res_id != prev_res:

                prev_res = res_id
                res_counter += 1
                # does not add a TER record if one already exists
                if res_counter - min_residue != 0 \
                        and (res_counter - min_residue) % step == 0 \
                        and res_counter in residue_range \
                        and not prev_line.startswith(ignored):
                    yield make_TER(prev_line)

        # On the first record that ends the ATOM records, check once whether
        # a TER is due after the last residue. The test is the same as in the
        # loop above, applied as if one more residue were about to start.
        if line.startswith(end) and not no_more_atoms:
            no_more_atoms = True
            next_res = res_counter + 1
            if next_res - min_residue != 0 \
                    and (next_res - min_residue) % step == 0 \
                    and next_res in residue_range \
                    and not prev_line.startswith(ignored):
                yield make_TER(prev_line)

        prev_line = line
        yield line

add_manual_ter = run

def main():
    # Check Input
    pdbfh, resrange, step = check_input(sys.argv[1:])

    # Do the job
    new_pdb = run(pdbfh, resrange, step)

    try:
        _buffer = []
        _buffer_size = 5000  # write N lines at a time
        for lineno, line in enumerate(new_pdb):
            if not (lineno % _buffer_size):
                sys.stdout.write(''.join(_buffer))
                _buffer = []
            _buffer.append(line)

        sys.stdout.write(''.join(_buffer))
        sys.stdout.flush()
    except IOError:
        # This is here to catch Broken Pipes
        # for example to use 'head' or 'tail' without
        # the error message showing up
        pass

    # last line of the script
    # We can close it even if it is sys.stdin
    pdbfh.close()
    sys.exit(0)


if __name__ == '__main__':
    main()
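The `start:end:step` option is easier to review with the counting rule written out on its own. The sketch below is not part of the PR: `ter_positions` is a hypothetical helper that mirrors the test `run` applies each time a new residue starts, assuming residues are counted from 1 in file order. It only covers residues that are followed by another residue; the final residue is handled separately by the END check in the script.

```python
def ter_positions(start, end, step, n_residues):
    """Return the 1-based residue counts after which a TER would be inserted.

    Hypothetical helper, for illustration only. A TER goes after residue r
    when residue r + 1 starts, (r + 1 - start) is a positive multiple of
    step, and r is not past `end`.
    """
    if end is None:
        end = n_residues  # open-ended range: run to the end of the file
    positions = []
    # nxt is the count of the residue that is just starting
    for nxt in range(2, n_residues + 1):
        offset = nxt - start
        if offset > 0 and offset % step == 0 and nxt <= end + 1:
            positions.append(nxt - 1)
    return positions


print(ter_positions(1, 10, 5, 20))   # option -1:10:5 -> [5, 10]
print(ter_positions(1, None, 3, 7))  # option -1::3   -> [3, 6]
print(ter_positions(4, None, 1, 7))  # option -4:     -> [4, 5, 6]
```

Keeping the rule as plain arithmetic makes it easy to compare the docstring examples against what the generator actually yields.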
75 changes: 75 additions & 0 deletions tests/data/add_manual_ter_existingTER.pdb
@@ -0,0 +1,75 @@
CRYST1 0.000 0.000 0.000 90.00 90.00 90.00 P 1 1
ATOM 1 N ALA X 1 -0.132 2.450 -0.112 0.80 0.00 1 N
ATOM 2 C ALA X 1 0.301 0.419 -1.423 0.80 0.00 1 C
ATOM 3 O ALA X 1 0.479 -0.783 -1.242 0.80 0.00 1 O
ATOM 4 CA ALA X 1 -0.701 1.208 -0.598 0.80 0.00 1 C
ATOM 5 CB ALA X 1 -1.903 1.542 -1.431 0.80 0.00 1 C
ATOM 6 HA ALA X 1 -0.992 0.670 0.193 0.80 0.00 1 H
ATOM 7 HT1 ALA X 1 0.220 3.108 -0.747 0.80 0.00 1 H
ATOM 8 HT2 ALA X 1 -0.767 3.067 0.352 0.80 0.00 1 H
ATOM 9 HB1 ALA X 1 -2.108 2.634 -1.418 0.80 0.00 1 H
ATOM 10 HB2 ALA X 1 -1.743 1.237 -2.489 0.80 0.00 1 H
ATOM 11 HB3 ALA X 1 -2.803 1.018 -1.044 0.80 0.00 1 H
TER
ATOM 12 N ALA X 2 0.969 1.064 -2.333 0.80 0.00 1 N
ATOM 13 C ALA X 2 3.021 -0.294 -2.323 0.80 0.00 1 C
ATOM 14 O ALA X 2 3.374 -1.452 -2.536 0.80 0.00 1 O
ATOM 15 CA ALA X 2 1.956 0.379 -3.172 0.80 0.00 1 C
ATOM 16 HA ALA X 2 1.468 -0.304 -3.715 0.80 0.00 1 H
ATOM 17 CB ALA X 2 2.588 1.376 -4.097 0.80 0.00 1 C
ATOM 18 HN ALA X 2 0.833 2.036 -2.485 0.80 0.00 1 H
ATOM 19 HB1 ALA X 2 3.693 1.256 -4.120 0.80 0.00 1 H
ATOM 20 HB2 ALA X 2 2.208 1.245 -5.134 0.80 0.00 1 H
ATOM 21 HB3 ALA X 2 2.362 2.413 -3.766 0.80 0.00 1 H
ATOM 22 N ALA X 3 3.552 0.402 -1.362 0.80 0.00 1 N
ATOM 23 C ALA X 3 4.097 -1.439 0.176 0.80 0.00 1 C
ATOM 24 O ALA X 3 4.783 -2.459 0.205 0.80 0.00 1 O
ATOM 25 CA ALA X 3 4.589 -0.170 -0.499 0.80 0.00 1 C
ATOM 26 HA ALA X 3 5.382 -0.373 -1.074 0.80 0.00 1 H
ATOM 27 CB ALA X 3 4.982 0.845 0.533 0.80 0.00 1 C
ATOM 28 HN ALA X 3 3.271 1.339 -1.184 0.80 0.00 1 H
ATOM 29 HB1 ALA X 3 5.946 0.573 1.014 0.80 0.00 1 H
ATOM 30 HB2 ALA X 3 5.101 1.849 0.070 0.80 0.00 1 H
ATOM 31 HB3 ALA X 3 4.208 0.912 1.328 0.80 0.00 1 H
ATOM 32 N ALA X 4 2.921 -1.408 0.730 0.80 0.00 1 N
ATOM 33 C ALA X 4 2.321 -3.779 0.466 0.80 0.00 1 C
ATOM 34 O ALA X 4 2.725 -4.888 0.812 0.80 0.00 1 O
ATOM 35 CA ALA X 4 2.374 -2.587 1.406 0.80 0.00 1 C
ATOM 36 HA ALA X 4 2.967 -2.790 2.185 0.80 0.00 1 H
ATOM 37 CB ALA X 4 0.998 -2.267 1.911 0.80 0.00 1 C
ATOM 38 HN ALA X 4 2.360 -0.588 0.710 0.80 0.00 1 H
ATOM 39 HB1 ALA X 4 0.950 -1.232 2.313 0.80 0.00 1 H
ATOM 40 HB2 ALA X 4 0.250 -2.351 1.092 0.80 0.00 1 H
ATOM 41 HB3 ALA X 4 0.709 -2.966 2.726 0.80 0.00 1 H
ATOM 42 N ALA X 5 1.827 -3.587 -0.721 0.80 0.00 1 N
ATOM 43 C ALA X 5 3.099 -5.306 -1.938 0.80 0.00 1 C
ATOM 44 O ALA X 5 3.260 -6.525 -1.934 0.80 0.00 1 O
ATOM 45 CA ALA X 5 1.736 -4.685 -1.687 0.80 0.00 1 C
ATOM 46 HA ALA X 5 1.112 -5.368 -1.309 0.80 0.00 1 H
ATOM 47 CB ALA X 5 1.164 -4.160 -2.971 0.80 0.00 1 C
ATOM 48 HN ALA X 5 1.500 -2.694 -1.008 0.80 0.00 1 H
ATOM 49 HB1 ALA X 5 1.507 -3.121 -3.167 0.80 0.00 1 H
ATOM 50 HB2 ALA X 5 1.482 -4.793 -3.829 0.80 0.00 1 H
ATOM 51 HB3 ALA X 5 0.053 -4.153 -2.930 0.80 0.00 1 H
ATOM 52 N ALA X 6 4.095 -4.500 -2.161 0.80 0.00 1 N
ATOM 53 C ALA X 6 5.914 -5.897 -1.272 0.80 0.00 1 C
ATOM 54 O ALA X 6 6.435 -6.992 -1.476 0.80 0.00 1 O
ATOM 55 CA ALA X 6 5.442 -5.017 -2.416 0.80 0.00 1 C
ATOM 56 HA ALA X 6 5.406 -5.547 -3.264 0.80 0.00 1 H
ATOM 57 CB ALA X 6 6.382 -3.863 -2.602 0.80 0.00 1 C
ATOM 58 HN ALA X 6 3.973 -3.514 -2.165 0.80 0.00 1 H
ATOM 59 HB1 ALA X 6 7.428 -4.158 -2.369 0.80 0.00 1 H
ATOM 60 HB2 ALA X 6 6.354 -3.500 -3.652 0.80 0.00 1 H
ATOM 61 HB3 ALA X 6 6.105 -3.022 -1.929 0.80 0.00 1 H
ATOM 62 N ALA X 7 5.751 -5.448 -0.063 0.80 0.00 1 N
ATOM 63 C ALA X 7 5.524 -7.601 1.108 0.80 0.00 1 C
ATOM 64 O ALA X 7 6.173 -8.625 1.312 0.80 0.00 1 O
ATOM 65 CA ALA X 7 6.183 -6.232 1.098 0.80 0.00 1 C
ATOM 66 HA ALA X 7 7.177 -6.331 1.042 0.80 0.00 1 H
ATOM 67 CB ALA X 7 5.836 -5.483 2.351 0.80 0.00 1 C
ATOM 68 HC1 ALA X 7 4.508 -7.641 0.940 0.80 0.00 1 H
ATOM 69 HN ALA X 7 5.331 -4.565 0.111 0.80 0.00 1 H
ATOM 70 HB1 ALA X 7 6.748 -5.080 2.841 0.80 0.00 1 H
ATOM 71 HB2 ALA X 7 5.163 -4.628 2.123 0.80 0.00 1 H
ATOM 72 HB3 ALA X 7 5.324 -6.153 3.076 0.80 0.00 1 H
END
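The TER already present after residue 1 in this file is a bare record, whereas the tool writes a fully populated one built from the preceding atom line. A minimal sketch of that layout follows, assuming the standard fixed-column PDB format (serial in columns 7-11, residue name in 18-20, chain in 22, residue number in 23-26, insertion code in 27). `ATOM_FMT`, `TER_FMT` and `make_ter` are hypothetical names used only for this illustration; the atom values are those of atom 21 above.

```python
# Fixed-column layouts (assumed standard PDB columns, not taken from the PR)
ATOM_FMT = ("ATOM  {:>5d} {:<4s}{:1s}{:3s} {:1s}{:>4d}{:1s}   "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}")
TER_FMT = "TER   {:>5d}      {:3s} {:1s}{:>4s}{:1s}"


def make_ter(atom_line):
    """Build an 80-column TER record from a fixed-column ATOM/HETATM line."""
    serial = int(atom_line[6:11]) + 1  # TER takes the next serial number
    res_name = atom_line[17:20]
    chain = atom_line[21]
    res_seq = atom_line[22:26]
    i_code = atom_line[26]
    record = TER_FMT.format(serial, res_name, chain, res_seq, i_code)
    return record.ljust(80) + "\n"


# Last atom of residue 2 in the test file, rebuilt with fixed columns
last_atom = ATOM_FMT.format(21, " HB3", " ", "ALA", "X", 2, " ",
                            2.362, 2.413, -3.766, 0.80, 0.00)
ter = make_ter(last_atom)
print(repr(ter.rstrip()))  # 'TER      22      ALA X   2'
```

Slicing by column only works when the spacing of the input lines is intact, so lines copied from a rendered diff may need their columns restored first.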