Skip to content

Commit

Permalink
⚡ Make release ready
Browse files Browse the repository at this point in the history
Arguments have been updated for full command line access and output
saving commands have been adapted to give full all useful information.
A new output file `energies.txt` will be created when `-a` is used, this
contains the energies of each alanine scan. The format of all output
files is now tab-separated, with blank places indicated with '--' to
ensure compatibility with most text parsing tools. The README has been
updated to include usage guidance and some references for the algorithm.
A full description of the algorithm is nto included at this stage.
  • Loading branch information
Jedd Bellamy-Carter committed May 30, 2020
1 parent 51f12af commit b4d17c5
Show file tree
Hide file tree
Showing 2 changed files with 169 additions and 72 deletions.
92 changes: 73 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,21 +1,75 @@
# chargePlacer: Python scripts for charge distribution in gas-phase molecular dynamics

This is a command line script to determine a reasonablly energy
minimised proton sequence for an input PDB file (**INPUT**) for a given
charge state (**CHARGE**). A search algorithm is used to sample proton
permutations across chargeable side-chains and termini represented as
point charges. This algorithm produces a reproducible output proton
sequence in far fewer steps than required for sampling all permutations.

The optimised proton sequence is saved to file (**OUTPUT**) with the format:
`<Residue Name> <Residue Number> <Chain Identifier>`

A choice of energies are calculated and used for determination. By
default, `E_tot` is used. This is the Coulomb energy minus the proton
binding energy (i.e. the summed proton affinities of protonated
residues). `Coulomb-only` is the alternative mode, where only the
Coulomb energy is taken into account.

This software also provides the option to perform in silico alanine
scanning, where each chargeable side-chain is removed and the minimised
proton sequence determined for each 'mutant'.
This is a command line script to determine a reasonably energy minimised proton sequence for an input PDB file (**input**) for a given charge state (**charge**). See [Basic Usage](#basic-usage).
A search algorithm is used to sample proton permutations across chargeable side-chains and termini represented as point charges. The algorithm is inspired by the method described by refs [1–3](#references), it produces a reproducible output proton sequence in far fewer steps than required for sampling all permutations.

A choice of energies are calculated and used for determination:
* By default, `E_tot` is used. This is the Coulomb energy minus the proton binding energy (i.e. the summed proton affinities of protonated residues). The values used herein are derived from simplified versions of each amino acid, per ref [4](#references). These are PA<sub>NT</sub>=886.6, PA<sub>ASP-</sub>=1453.5, PA<sub>GLU-</sub>=1448.5, PA<sub>HIS</sub>=958, PA<sub>LYS</sub>=918, PA<sub>ARG</sub>=1002, PA<sub>CT-</sub>=1430 kJ/mol.
* `Coulomb-only` is the alternative mode (activated by `-c`), where only the Coulomb energy is taken into account.

This software also provides the option to perform in silico alanine scanning (activated by `-a`), where each chargeable side-chain is removed and the minimised proton sequence determined for each 'mutant'.

Both default and `alanine_scan` modes output two tab-separated files `proton_sites.txt` and `charges.txt`.
* `proton_sites.txt` : A text file listing the side-chains and termini that are protonated in the energy minimised sequence. Each row has the form `<RESN> <CHAIN> <RESI>`. The energies calculated are included in the file header. This file only ever contains the data for the 'wild-type' protein sequence.
* `charges.txt` : A text file containing the charges of each residue and terminus in the energy minimised sequence. For the `alanine_scan` method, each subsequent row contains each mutant variant with the mutated residue charge indicated by `nan`.
An additional file `energies.txt` is generated in `alanine_scan` mode, this contains the calculated energies for each charge sequence in `charges.txt`.

## Basic Usage

In the simplest case, a PDB file **`input`** is provided along with a target charge state **`charge`**.
`chargePlacer` will then import the atom coordinates for pre-defined point charge atoms for all chargeable residues and termini.

```shell
python chargePlacer.py input charge
```

To minimise for Coulomb energy only, append `-c`:

```shell
python chargePlacer.py input charge -c
```

To perform an `alanine_scan`, append `-a`:

```shell
python chargePlacer.py input charge -a
```

If you would like to capture the full output of the program you can use piping:

```shell
python chargePlacer.py input charge > log.txt
```

this may be especially powerful when combined with the `verbose` option (`-v`), which will print the energies calculated at each step.

```shell
python chargePlacer.py input charge -v > log.txt
```

See [Command Line Options](#command-line-options) below for further details.

### Command Line Options

#### Required Arguments
| Parameter | Description |
|-----------|-----------------------------------------------|
| **input**, e.g. `input.pdb` | input PDB file for which to determine charges |
| **charge**, e.g. `7` | target charge state |

#### Optional arguments
| Option | Description |
|---------------------------|---------------------------------|
| `-h`, `--help` | show this help message and exit |
| `-v`, `--verbose` | verbose output |
| `-c`, `--coulomb_only` | minimise for Coulomb repulsion only, ignores proton affinity |
| `-r` , `--relative_permittivity` | relative permittivity to use (default: 1) |
| `-a`, `--alanine_scan` | perform in silico alanine scanning for all chargeable residues |
| `-o`, `--output` OUTPUT | prefix for output files (default: ""). Gives `*proton_sites.txt` and `*charges.txt` |


## References
1. V. Popa, D. A. Trecroce, R. G. McAllister and L. Konermann, J. Phys. Chem. B, 2016, 120, 5114–5124.
2. M. Bakhtiari and L. Konermann, J. Phys. Chem. B, 2019, 123, 1784–1796.
3. S. K. Fegan and M. Thachuk, J. Chem. Theory Comput., 2013, 9, 2531–2539.
4. A. Moser, K. Range and D. M. York, J. Phys. Chem. B, 2010, 114, 13911–13921.
149 changes: 96 additions & 53 deletions chargePlacer.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
# MIT License
"""chargePlacer: Python implementation of a charge positioning algorithm
This is a command line script to determine a reasonablly energy
This is a command line script to determine a reasonably energy
minimised proton sequence for an input PDB file (INPUT) for a given
charge state (CHARGE). A search algorithm is used to sample proton
permutations across chargeable side-chains and termini represented as
Expand Down Expand Up @@ -35,7 +35,6 @@
# Standard Python Modules
import argparse
import os
import sys
import time

# Additional Modules
Expand Down Expand Up @@ -103,14 +102,14 @@ class PDB():
def __init__(self, filename=None):
if not os.path.splitext(filename)[1] in ['.pdb', '.pdbqt']:
raise ValueError('Incorrect file extension, must be .pdb or .pdbqt')
else:
self.filename = filename

self.AA_3to1 = {'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D',
'CYS':'C', 'GLU':'E', 'GLN':'Q', 'GLY':'G',
'HIS':'H', 'ILE':'I', 'LEU':'L', 'LYS':'K',
'MET':'M', 'PHE':'F', 'PRO':'P', 'SER':'S',
'THR':'T', 'TRP':'W', 'TYR':'Y', 'VAL':'V'}

self.filename = filename

self.AA_3to1 = {'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D',
'CYS':'C', 'GLU':'E', 'GLN':'Q', 'GLY':'G',
'HIS':'H', 'ILE':'I', 'LEU':'L', 'LYS':'K',
'MET':'M', 'PHE':'F', 'PRO':'P', 'SER':'S',
'THR':'T', 'TRP':'W', 'TYR':'Y', 'VAL':'V'}
self.ATOM_STRING = "{}{:5d} {:4}{:.1}{:.3} {:.1}{:>4d}{:.1} {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f} {:.2} \n"

self.structure = [None]
Expand Down Expand Up @@ -161,7 +160,7 @@ def _parse(self):
elif record_type == 'MODEL ':
open_model = True
_model = int(entry.split()[1])
if not _model == model+1:
if _model != model+1:
print('MODEL records should be sequentially numbered beginning with 1.')
print('MODEL {} renumbered to {}'.format(_model, model+1))
model += 1
Expand Down Expand Up @@ -229,9 +228,9 @@ def distance_matrix(a, b):
distances : array of Euclidean distances (NxM)
"""

_a = np.asarray(a)[:,np.newaxis,:]
_b = np.asarray(b)[np.newaxis,:,:]
return np.sum((_a - _b)**2, axis = -1)**0.5
_a = np.asarray(a)[:, np.newaxis, :]
_b = np.asarray(b)[np.newaxis, :, :]
return np.sum((_a - _b)**2, axis=-1)**0.5

def symmetric_matrix(vector):
"""
Expand Down Expand Up @@ -397,7 +396,7 @@ def minimise_energy(deprot_charges, affinities, xyz, charge, coulomb_only=False,
shunt_min = current_min
counters = [time.process_time(), 0, 0]

while (shunt_min <= current_min):
while shunt_min <= current_min:
counters[1] += 1
if verbose:
print('Shunt={}'.format(counters[1]))
Expand All @@ -422,7 +421,7 @@ def minimise_energy(deprot_charges, affinities, xyz, charge, coulomb_only=False,
if verbose:
print('Shunt {} minimum energy {:.2f} kJ/mol'.format(counters[1], shunt_min))
# Update `proton_seq` to best values
if (shunt_min >= current_min):
if shunt_min >= current_min:
e_coulomb = coulomb_energy(proton_seq+deprot_charges, distances, mask)
e_proton = binding_energy(proton_seq, affinities)
counters[0] = time.process_time() - counters[0]
Expand Down Expand Up @@ -461,7 +460,7 @@ def alanine_scan(resn, resi, deprot_charges, affinities, xyz, charge, coulomb_on
-------
mutant_proton_seq : proton sequence after minimisation for each alanine mutant (MxN array)
"""
mutable = [i for i in range(len(resi)) if not resn[i] in ['NT', 'CT']] # 'NT' and 'CT' are immutable
mutable = [i for i in range(len(resi)) if not resn[i] in ['NT', 'CT']] # 'NT' and 'CT' are immutable
mutant_proton_seq = np.zeros((len(mutable), len(resn)))
ignore_mask = np.ones_like(deprot_charges, dtype=bool)
mutant_energies = []
Expand All @@ -479,24 +478,22 @@ def alanine_scan(resn, resi, deprot_charges, affinities, xyz, charge, coulomb_on
mutant_proton_seq[r, ignore_mask] = mutant_min_energy[0]
mutant_proton_seq[r, res] = np.nan
ignore_mask[res] = True
mutant_energies.append(mutant_min_energy[1:3])
mutant_energies.append(mutant_min_energy[2:4])

return mutant_proton_seq, mutant_energies, mutable
return mutable, mutant_proton_seq, mutant_energies

def save_charge_sequence(filename, wt_charge_sequence, wt_energy, resn, resi, mutable=None,
mutant_charge_sequence=None, mut_energies=None, pdb_file=None):
def save_charge_sequence(filename, wt_charge_sequence, resn, resi, mutable=None,
mutant_charge_sequence=None, pdb_file=None):
"""Save charge sequences to file.
Parameters
----------
filename : str
wt_charge_sequence : ndarray
wt_energy : float
resn : list
resi : list
mutable : list, optional
mut_charge_sequence : ndarray, optional
mut_energies : list, optional
pdb_file : str, optional
Returns
Expand All @@ -509,13 +506,44 @@ def save_charge_sequence(filename, wt_charge_sequence, wt_energy, resn, resi, mu
outfile.write('# This file was generated by chargePlacer.py, {}\n'.format(time.strftime("%d %b %Y %H:%M:%S")))
if pdb_file:
outfile.write('# From {}\n'.format(pdb_file))
outfile.write('\t'+'\t'.join(resn)+'\n') # Residue names
outfile.write('\t'+'\t'.join(str(resi))+'\n') # Residues numbers
data_str = ','.join(['{:3.0f}'.format(i) for i in wt_charge_sequence])
outfile.write('WT\t'+data_str+'\n')
outfile.write('--\t--\t'+'\t'.join(resn)+'\n') # Residue names
outfile.write('--\t--\t'+'\t'.join(map(str, resi))+'\n') # Residues numbers

data_str = '{}\t{}\t'+'\t'.join(['{:.0f}']*len(resn))+'\n'

outfile.write(data_str.format('WT','--',*wt_charge_sequence))

if mutable:
pass
#TODO: THIS!!!!!!!!!!!!!!!!!!!!!
for m, mut in enumerate(mutable):
outfile.write(data_str.format(resn[mut],resi[mut],*mutant_charge_sequence[m]))
print('Charge sequence(s) successfully saved to {}'.format(filename+'charges.txt'))

def save_energies(filename, wt_energies, mut_energies, mutable, resn, resi, pdb_file=None):
"""Save energies to file
Parameters
----------
filename : str
wt_energy : tuple
mut_energies : list of tuples
mutable : list of str
pdb_file : str
Returns
-------
outfile : file
Text file containing energies for alanine scanning
<RESN>\t<RESI>\t<COULOMB>\t<BINDING>
"""
with open(filename+'energies.txt', 'w') as outfile:
outfile.write('# This file was generated by chargePlacer.py, {}\n'.format(time.strftime("%d %b %Y %H:%M:%S")))
if pdb_file:
outfile.write('# From {}\n'.format(pdb_file))
data_str = '{}\t{}\t{:.2f}\t{:.2f}\n'
outfile.write(data_str.format('WT','--',*wt_energies))
for m, mut in enumerate(mutable):
outfile.write(data_str.format(resn[mut],resi[mut],*mut_energies[m]))
print('Energies successfully saved to {}'.format(filename+'energies.txt'))

def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, resi, pdb_file=None):
"""Save minimised proton sequence to file.
Expand Down Expand Up @@ -551,7 +579,7 @@ def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, r
if __name__ == '__main__':
argparser = argparse.ArgumentParser(description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter)
argparser.add_argument('input', metavar='INPUT', help='input PDB file to determine charges for')
argparser.add_argument('input', metavar='INPUT', help='input PDB file for which to determine charges')
argparser.add_argument('charge', metavar='CHARGE', help='target charge state', type=int)
argparser.add_argument('-v', '--verbose',
help='verbose output', action='store_true')
Expand All @@ -562,7 +590,7 @@ def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, r
help='relative permittivity to use (default: 1)',
default=1, type=float)
argparser.add_argument('-a', '--alanine_scan',
help='perform in silico alanine scanning for all chargeable residues',
help='perform in silico alanine scanning for all chargeable residues. Additional file energies.txt os generated',
action='store_true')
argparser.add_argument('-o', '--output',
help='prefix for output files (default: ""). Gives *proton_sites.txt and *charges.txt',
Expand All @@ -573,38 +601,53 @@ def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, r
E_CONST = _E_CONST / args.relative_permittivity

print('Opening {} and parsing coordinates...'.format(args.input))
resn, resi , deprot_charges, xyz, affinities = parse_coordinates(args.input)
resn, resi, deprot_charges, xyz, affinities = parse_coordinates(args.input)

print('Minimising energy of proton sequence...')
min_energy = minimise_energy(deprot_charges,
affinities,
xyz,
args.charge,
coulomb_only = args.coulomb_only,
verbose = args.verbose)

if args.alanine_scan:
print('Beginning in silico alanine scanning...')
alanine_scan(resn,
resi,
deprot_charges,
affinities,
xyz,
args.charge,
coulomb_only = args.coulomb_only,
verbose = args.verbose)
else:
save_charge_sequence(args.output,
min_energy[0],
min_energy[1:3],
resn,
resi,
args.input)

coulomb_only=args.coulomb_only,
verbose=args.verbose)

save_proton_sequence(args.output,
min_energy[0],
min_energy[2],
min_energy[3],
resn,
resi,
args.input)

if args.alanine_scan:
print('Beginning in silico alanine scanning...')
scanned_alas = alanine_scan(resn,
resi,
deprot_charges,
affinities,
xyz,
args.charge,
coulomb_only=args.coulomb_only,
verbose=args.verbose)

save_charge_sequence(args.output,
min_energy[0]+deprot_charges,
resn,
resi,
mutable=scanned_alas[0],
mutant_charge_sequence=scanned_alas[1]+deprot_charges,
pdb_file=args.input)
save_energies(args.output,
min_energy[2:4],
scanned_alas[2],
scanned_alas[0],
resn,
resi,
pdb_file=args.input)
else:
save_charge_sequence(args.output,
min_energy[0]+deprot_charges,
resn,
resi,
pdb_file=args.input)

0 comments on commit b4d17c5

Please sign in to comment.