From b4d17c5fb99e326168e72217b69d4d0c5c71bd0c Mon Sep 17 00:00:00 2001 From: Jedd Bellamy-Carter Date: Sat, 30 May 2020 15:39:41 +0100 Subject: [PATCH] :zap: Make release ready Arguments have been updated for full command line access and output saving commands have been adapted to give full all useful information. A new output file `energies.txt` will be created when `-a` is used, this contains the energies of each alanine scan. The format of all output files is now tab-separated, with blank places indicated with '--' to ensure compatibility with most text parsing tools. The README has been updated to include usage guidance and some references for the algorithm. A full description of the algorithm is nto included at this stage. --- README.md | 92 ++++++++++++++++++++++++------ chargePlacer.py | 149 +++++++++++++++++++++++++++++++----------------- 2 files changed, 169 insertions(+), 72 deletions(-) diff --git a/README.md b/README.md index 148e72b..4f02e72 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,75 @@ # chargePlacer: Python scripts for charge distribution in gas-phase molecular dynamics -This is a command line script to determine a reasonablly energy -minimised proton sequence for an input PDB file (**INPUT**) for a given -charge state (**CHARGE**). A search algorithm is used to sample proton -permutations across chargeable side-chains and termini represented as -point charges. This algorithm produces a reproducible output proton -sequence in far fewer steps than required for sampling all permutations. - -The optimised proton sequence is saved to file (**OUTPUT**) with the format: -` ` - -A choice of energies are calculated and used for determination. By -default, `E_tot` is used. This is the Coulomb energy minus the proton -binding energy (i.e. the summed proton affinities of protonated -residues). `Coulomb-only` is the alternative mode, where only the -Coulomb energy is taken into account. - -This software also provides the option to perform in silico alanine -scanning, where each chargeable side-chain is removed and the minimised -proton sequence determined for each 'mutant'. +This is a command line script to determine a reasonably energy minimised proton sequence for an input PDB file (**input**) for a given charge state (**charge**). See [Basic Usage](#basic-usage). +A search algorithm is used to sample proton permutations across chargeable side-chains and termini represented as point charges. The algorithm is inspired by the method described by refs [1–3](#references), it produces a reproducible output proton sequence in far fewer steps than required for sampling all permutations. + +A choice of energies are calculated and used for determination: +* By default, `E_tot` is used. This is the Coulomb energy minus the proton binding energy (i.e. the summed proton affinities of protonated residues). The values used herein are derived from simplified versions of each amino acid, per ref [4](#references). These are PANT=886.6, PAASP-=1453.5, PAGLU-=1448.5, PAHIS=958, PALYS=918, PAARG=1002, PACT-=1430 kJ/mol. +* `Coulomb-only` is the alternative mode (activated by `-c`), where only the Coulomb energy is taken into account. + +This software also provides the option to perform in silico alanine scanning (activated by `-a`), where each chargeable side-chain is removed and the minimised proton sequence determined for each 'mutant'. + +Both default and `alanine_scan` modes output two tab-separated files `proton_sites.txt` and `charges.txt`. +* `proton_sites.txt` : A text file listing the side-chains and termini that are protonated in the energy minimised sequence. Each row has the form ` `. The energies calculated are included in the file header. This file only ever contains the data for the 'wild-type' protein sequence. +* `charges.txt` : A text file containing the charges of each residue and terminus in the energy minimised sequence. For the `alanine_scan` method, each subsequent row contains each mutant variant with the mutated residue charge indicated by `nan`. +An additional file `energies.txt` is generated in `alanine_scan` mode, this contains the calculated energies for each charge sequence in `charges.txt`. + +## Basic Usage + +In the simplest case, a PDB file **`input`** is provided along with a target charge state **`charge`**. +`chargePlacer` will then import the atom coordinates for pre-defined point charge atoms for all chargeable residues and termini. + +```shell +python chargePlacer.py input charge +``` + +To minimise for Coulomb energy only, append `-c`: + +```shell +python chargePlacer.py input charge -c +``` + +To perform an `alanine_scan`, append `-a`: + +```shell +python chargePlacer.py input charge -a +``` + +If you would like to capture the full output of the program you can use piping: + +```shell +python chargePlacer.py input charge > log.txt +``` + +this may be especially powerful when combined with the `verbose` option (`-v`), which will print the energies calculated at each step. + +```shell +python chargePlacer.py input charge -v > log.txt +``` + +See [Command Line Options](#command-line-options) below for further details. + +### Command Line Options + +#### Required Arguments +| Parameter | Description | +|-----------|-----------------------------------------------| +| **input**, e.g. `input.pdb` | input PDB file for which to determine charges | +| **charge**, e.g. `7` | target charge state | + +#### Optional arguments +| Option | Description | +|---------------------------|---------------------------------| +| `-h`, `--help` | show this help message and exit | +| `-v`, `--verbose` | verbose output | +| `-c`, `--coulomb_only` | minimise for Coulomb repulsion only, ignores proton affinity | +| `-r` , `--relative_permittivity` | relative permittivity to use (default: 1) | +| `-a`, `--alanine_scan` | perform in silico alanine scanning for all chargeable residues | +| `-o`, `--output` OUTPUT | prefix for output files (default: ""). Gives `*proton_sites.txt` and `*charges.txt` | + + +## References +1. V. Popa, D. A. Trecroce, R. G. McAllister and L. Konermann, J. Phys. Chem. B, 2016, 120, 5114–5124. +2. M. Bakhtiari and L. Konermann, J. Phys. Chem. B, 2019, 123, 1784–1796. +3. S. K. Fegan and M. Thachuk, J. Chem. Theory Comput., 2013, 9, 2531–2539. +4. A. Moser, K. Range and D. M. York, J. Phys. Chem. B, 2010, 114, 13911–13921. diff --git a/chargePlacer.py b/chargePlacer.py index b2935e7..638f9e8 100644 --- a/chargePlacer.py +++ b/chargePlacer.py @@ -4,7 +4,7 @@ # MIT License """chargePlacer: Python implementation of a charge positioning algorithm -This is a command line script to determine a reasonablly energy +This is a command line script to determine a reasonably energy minimised proton sequence for an input PDB file (INPUT) for a given charge state (CHARGE). A search algorithm is used to sample proton permutations across chargeable side-chains and termini represented as @@ -35,7 +35,6 @@ # Standard Python Modules import argparse import os -import sys import time # Additional Modules @@ -103,14 +102,14 @@ class PDB(): def __init__(self, filename=None): if not os.path.splitext(filename)[1] in ['.pdb', '.pdbqt']: raise ValueError('Incorrect file extension, must be .pdb or .pdbqt') - else: - self.filename = filename - - self.AA_3to1 = {'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D', - 'CYS':'C', 'GLU':'E', 'GLN':'Q', 'GLY':'G', - 'HIS':'H', 'ILE':'I', 'LEU':'L', 'LYS':'K', - 'MET':'M', 'PHE':'F', 'PRO':'P', 'SER':'S', - 'THR':'T', 'TRP':'W', 'TYR':'Y', 'VAL':'V'} + + self.filename = filename + + self.AA_3to1 = {'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D', + 'CYS':'C', 'GLU':'E', 'GLN':'Q', 'GLY':'G', + 'HIS':'H', 'ILE':'I', 'LEU':'L', 'LYS':'K', + 'MET':'M', 'PHE':'F', 'PRO':'P', 'SER':'S', + 'THR':'T', 'TRP':'W', 'TYR':'Y', 'VAL':'V'} self.ATOM_STRING = "{}{:5d} {:4}{:.1}{:.3} {:.1}{:>4d}{:.1} {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f} {:.2} \n" self.structure = [None] @@ -161,7 +160,7 @@ def _parse(self): elif record_type == 'MODEL ': open_model = True _model = int(entry.split()[1]) - if not _model == model+1: + if _model != model+1: print('MODEL records should be sequentially numbered beginning with 1.') print('MODEL {} renumbered to {}'.format(_model, model+1)) model += 1 @@ -229,9 +228,9 @@ def distance_matrix(a, b): distances : array of Euclidean distances (NxM) """ - _a = np.asarray(a)[:,np.newaxis,:] - _b = np.asarray(b)[np.newaxis,:,:] - return np.sum((_a - _b)**2, axis = -1)**0.5 + _a = np.asarray(a)[:, np.newaxis, :] + _b = np.asarray(b)[np.newaxis, :, :] + return np.sum((_a - _b)**2, axis=-1)**0.5 def symmetric_matrix(vector): """ @@ -397,7 +396,7 @@ def minimise_energy(deprot_charges, affinities, xyz, charge, coulomb_only=False, shunt_min = current_min counters = [time.process_time(), 0, 0] - while (shunt_min <= current_min): + while shunt_min <= current_min: counters[1] += 1 if verbose: print('Shunt={}'.format(counters[1])) @@ -422,7 +421,7 @@ def minimise_energy(deprot_charges, affinities, xyz, charge, coulomb_only=False, if verbose: print('Shunt {} minimum energy {:.2f} kJ/mol'.format(counters[1], shunt_min)) # Update `proton_seq` to best values - if (shunt_min >= current_min): + if shunt_min >= current_min: e_coulomb = coulomb_energy(proton_seq+deprot_charges, distances, mask) e_proton = binding_energy(proton_seq, affinities) counters[0] = time.process_time() - counters[0] @@ -461,7 +460,7 @@ def alanine_scan(resn, resi, deprot_charges, affinities, xyz, charge, coulomb_on ------- mutant_proton_seq : proton sequence after minimisation for each alanine mutant (MxN array) """ - mutable = [i for i in range(len(resi)) if not resn[i] in ['NT', 'CT']] # 'NT' and 'CT' are immutable + mutable = [i for i in range(len(resi)) if not resn[i] in ['NT', 'CT']] # 'NT' and 'CT' are immutable mutant_proton_seq = np.zeros((len(mutable), len(resn))) ignore_mask = np.ones_like(deprot_charges, dtype=bool) mutant_energies = [] @@ -479,24 +478,22 @@ def alanine_scan(resn, resi, deprot_charges, affinities, xyz, charge, coulomb_on mutant_proton_seq[r, ignore_mask] = mutant_min_energy[0] mutant_proton_seq[r, res] = np.nan ignore_mask[res] = True - mutant_energies.append(mutant_min_energy[1:3]) + mutant_energies.append(mutant_min_energy[2:4]) - return mutant_proton_seq, mutant_energies, mutable + return mutable, mutant_proton_seq, mutant_energies -def save_charge_sequence(filename, wt_charge_sequence, wt_energy, resn, resi, mutable=None, - mutant_charge_sequence=None, mut_energies=None, pdb_file=None): +def save_charge_sequence(filename, wt_charge_sequence, resn, resi, mutable=None, + mutant_charge_sequence=None, pdb_file=None): """Save charge sequences to file. Parameters ---------- filename : str wt_charge_sequence : ndarray - wt_energy : float resn : list resi : list mutable : list, optional mut_charge_sequence : ndarray, optional - mut_energies : list, optional pdb_file : str, optional Returns @@ -509,13 +506,44 @@ def save_charge_sequence(filename, wt_charge_sequence, wt_energy, resn, resi, mu outfile.write('# This file was generated by chargePlacer.py, {}\n'.format(time.strftime("%d %b %Y %H:%M:%S"))) if pdb_file: outfile.write('# From {}\n'.format(pdb_file)) - outfile.write('\t'+'\t'.join(resn)+'\n') # Residue names - outfile.write('\t'+'\t'.join(str(resi))+'\n') # Residues numbers - data_str = ','.join(['{:3.0f}'.format(i) for i in wt_charge_sequence]) - outfile.write('WT\t'+data_str+'\n') + outfile.write('--\t--\t'+'\t'.join(resn)+'\n') # Residue names + outfile.write('--\t--\t'+'\t'.join(map(str, resi))+'\n') # Residues numbers + + data_str = '{}\t{}\t'+'\t'.join(['{:.0f}']*len(resn))+'\n' + + outfile.write(data_str.format('WT','--',*wt_charge_sequence)) + if mutable: - pass - #TODO: THIS!!!!!!!!!!!!!!!!!!!!! + for m, mut in enumerate(mutable): + outfile.write(data_str.format(resn[mut],resi[mut],*mutant_charge_sequence[m])) + print('Charge sequence(s) successfully saved to {}'.format(filename+'charges.txt')) + +def save_energies(filename, wt_energies, mut_energies, mutable, resn, resi, pdb_file=None): + """Save energies to file + + Parameters + ---------- + filename : str + wt_energy : tuple + mut_energies : list of tuples + mutable : list of str + pdb_file : str + + Returns + ------- + outfile : file + Text file containing energies for alanine scanning + \t\t\t + """ + with open(filename+'energies.txt', 'w') as outfile: + outfile.write('# This file was generated by chargePlacer.py, {}\n'.format(time.strftime("%d %b %Y %H:%M:%S"))) + if pdb_file: + outfile.write('# From {}\n'.format(pdb_file)) + data_str = '{}\t{}\t{:.2f}\t{:.2f}\n' + outfile.write(data_str.format('WT','--',*wt_energies)) + for m, mut in enumerate(mutable): + outfile.write(data_str.format(resn[mut],resi[mut],*mut_energies[m])) + print('Energies successfully saved to {}'.format(filename+'energies.txt')) def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, resi, pdb_file=None): """Save minimised proton sequence to file. @@ -551,7 +579,7 @@ def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, r if __name__ == '__main__': argparser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) - argparser.add_argument('input', metavar='INPUT', help='input PDB file to determine charges for') + argparser.add_argument('input', metavar='INPUT', help='input PDB file for which to determine charges') argparser.add_argument('charge', metavar='CHARGE', help='target charge state', type=int) argparser.add_argument('-v', '--verbose', help='verbose output', action='store_true') @@ -562,7 +590,7 @@ def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, r help='relative permittivity to use (default: 1)', default=1, type=float) argparser.add_argument('-a', '--alanine_scan', - help='perform in silico alanine scanning for all chargeable residues', + help='perform in silico alanine scanning for all chargeable residues. Additional file energies.txt os generated', action='store_true') argparser.add_argument('-o', '--output', help='prefix for output files (default: ""). Gives *proton_sites.txt and *charges.txt', @@ -573,34 +601,16 @@ def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, r E_CONST = _E_CONST / args.relative_permittivity print('Opening {} and parsing coordinates...'.format(args.input)) - resn, resi , deprot_charges, xyz, affinities = parse_coordinates(args.input) + resn, resi, deprot_charges, xyz, affinities = parse_coordinates(args.input) print('Minimising energy of proton sequence...') min_energy = minimise_energy(deprot_charges, affinities, xyz, args.charge, - coulomb_only = args.coulomb_only, - verbose = args.verbose) - - if args.alanine_scan: - print('Beginning in silico alanine scanning...') - alanine_scan(resn, - resi, - deprot_charges, - affinities, - xyz, - args.charge, - coulomb_only = args.coulomb_only, - verbose = args.verbose) - else: - save_charge_sequence(args.output, - min_energy[0], - min_energy[1:3], - resn, - resi, - args.input) - + coulomb_only=args.coulomb_only, + verbose=args.verbose) + save_proton_sequence(args.output, min_energy[0], min_energy[2], @@ -608,3 +618,36 @@ def save_proton_sequence(filename, proton_sequence, e_coulomb, e_proton, resn, r resn, resi, args.input) + + if args.alanine_scan: + print('Beginning in silico alanine scanning...') + scanned_alas = alanine_scan(resn, + resi, + deprot_charges, + affinities, + xyz, + args.charge, + coulomb_only=args.coulomb_only, + verbose=args.verbose) + + save_charge_sequence(args.output, + min_energy[0]+deprot_charges, + resn, + resi, + mutable=scanned_alas[0], + mutant_charge_sequence=scanned_alas[1]+deprot_charges, + pdb_file=args.input) + save_energies(args.output, + min_energy[2:4], + scanned_alas[2], + scanned_alas[0], + resn, + resi, + pdb_file=args.input) + else: + save_charge_sequence(args.output, + min_energy[0]+deprot_charges, + resn, + resi, + pdb_file=args.input) +