sugarMassesPredict

command line tool to calculate all possible glycan molecules and the m/z values of their ions given a set of input parameters

NOTE: I have now also added an 'R' version - if you use the reticulate package in R, you can source the file (sugarMassesPredict-r.py) and then use the function 'predict_sugars' directly in R to generate output there.

this tool is being frequently updated :) if you have any questions or issues please feel free to contact me at [email protected]

word of caution: this tool will also predict sugars that cannot exist. sugar chemistry imposes many constraints, and adding them all would take a long time!

e.g. to use the R version:

library(reticulate)
py_install(c("pandas", "numpy"))  # packages go in one character vector; a second argument would be read as the environment name
source_python("sugarMassesPredict-r.py")
dp1 = as.integer(1)
dp2 = as.integer(3)
ESI_mode = 'pos'
scan_range1 = as.integer(100)
scan_range2 = as.integer(800)
pent_option = as.integer(1)
modifications = list('sulphate', 'deoxy')
label = "procainamide"
df <- predict_sugars(dp1 = dp1, dp2 = dp2, ESI_mode = ESI_mode, scan_range1 = scan_range1, scan_range2 = scan_range2, pent_option = pent_option, modifications = modifications, label = label) 

dependencies

  • pandas
  • numpy
  • python 3

input parameters

required

  • dp (degree of polymerisation) range
  • whether pentose monomers should be used in addition to hexose
  • modifications - possible options are none, all, or any combination of:
    • sulphate
    • carboxyl
    • phosphate
    • deoxy
    • N-acetyl
    • O-acetyl
    • O-methyl
    • anhydrobridge
    • unsaturated
    • alditol
    • amino
    • dehydrated
  • maximum number of modifications per monomer on average
  • ionisation mode
  • scan range (m/z)

optional

  • label - current options are procainamide (added by reductive amination) and benzoic acid (added on free alcohol groups; glycans are calculated with every number of labels from none up to the maximum possible)
  • output file path - defaults to "predicted_sugars.txt"
  • options for calculating the number of possible structural isomers (note: this part of the tool still needs to be fixed)
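to make the dp range and the pentose option more concrete, here is a minimal Python sketch of how unmodified compositions could be enumerated. it is an illustration only, not the code used by sugarMassesPredict.py: the function name `enumerate_compositions` and the output tuple layout are assumptions, and modifications and labels are left out. the masses are standard monoisotopic values for anhydro-hexose (C6H10O5), anhydro-pentose (C5H8O4) and water.

```python
# Monoisotopic masses in Da (standard values, rounded to 6 decimals).
HEX_RESIDUE = 162.052824   # C6H10O5, a hexose minus one water
PENT_RESIDUE = 132.042259  # C5H8O4, a pentose minus one water
WATER = 18.010565          # H2O, added back once per linear chain


def enumerate_compositions(dp_min, dp_max, pent_option):
    """List (dp, n_hex, n_pent, mass) for every unmodified composition.

    Hypothetical helper: pent_option = 0 gives hexose-only chains,
    pent_option = 1 also allows pentose monomers.
    """
    rows = []
    for dp in range(dp_min, dp_max + 1):
        max_pent = dp if pent_option else 0
        for n_pent in range(max_pent + 1):
            n_hex = dp - n_pent
            mass = n_hex * HEX_RESIDUE + n_pent * PENT_RESIDUE + WATER
            rows.append((dp, n_hex, n_pent, round(mass, 4)))
    return rows


for row in enumerate_compositions(1, 3, pent_option=1):
    print(row)
```

with a dp range of 1 to 3 and pentose enabled this gives 9 compositions (2 + 3 + 4); with pentose disabled it gives 3. each allowed modification then multiplies the number of candidates further, which is why the maximum number of modifications per monomer matters for the size of the output.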

output

tab delimited text file with one row per molecule. m/z values outside the scan range are shown as "NA", and molecules with no ions with m/z values within the scan range are not returned. columns are as follows:

  • degree of polymerisation (dp)
  • name
  • monoisotopic mass (Da)
  • sum formula
  • columns with m/z values of possible ions given the input parameters.
    • positive mode:
      • [M+H]+
      • [M+Na]+
    • negative mode:
      • [M+Cl]-
      • [M+CHOO]-
      • [M+2Cl]2-
      • [M+2CHOO]2-
      • [M+Cl-H]2-
      • [M+CHOO-H]2-
      • [M+CHOO+Cl]2-
      • [M-nH]n-, where n is 1 to the maximum number of anionic groups that any single molecule in the table has
  • if parameters related to isomers were specified in the input, there is an additional column for the number of possible isomers per molecule
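the ion columns listed above follow from the monoisotopic mass by simple arithmetic: add the mass of the adduct(s), correct for the electrons gained or lost, and divide by the number of charges. the sketch below shows that arithmetic in Python. it is an illustration under assumptions, not the tool's own code: the names `ADDUCTS`, `ion_mz` and `deprotonated_mz` are hypothetical, the constants are standard monoisotopic values, and treating the scan range limits as inclusive is a guess.

```python
# Standard monoisotopic masses in Da.
PROTON = 1.007276467
ELECTRON = 0.000548580
SODIUM = 22.989769282
CHLORINE = 34.968852682
FORMATE = 44.997654272  # neutral CHOO: C + H + 2 O

CL_ION = CHLORINE + ELECTRON      # Cl-
FORMATE_ION = FORMATE + ELECTRON  # CHOO-

# adduct name -> (mass added to M, number of charges)
ADDUCTS = {
    "[M+H]+": (PROTON, 1),
    "[M+Na]+": (SODIUM - ELECTRON, 1),
    "[M+Cl]-": (CL_ION, 1),
    "[M+CHOO]-": (FORMATE_ION, 1),
    "[M+2Cl]2-": (2 * CL_ION, 2),
    "[M+2CHOO]2-": (2 * FORMATE_ION, 2),
    "[M+Cl-H]2-": (CL_ION - PROTON, 2),
    "[M+CHOO-H]2-": (FORMATE_ION - PROTON, 2),
    "[M+CHOO+Cl]2-": (FORMATE_ION + CL_ION, 2),
}


def ion_mz(mass, adduct, scan_range=None):
    """m/z of one adduct ion, or None if it falls outside scan_range."""
    shift, charges = ADDUCTS[adduct]
    mz = (mass + shift) / charges
    if scan_range is not None and not (scan_range[0] <= mz <= scan_range[1]):
        return None  # plays the role of "NA" in the output table
    return mz


def deprotonated_mz(mass, n):
    """m/z of [M-nH]n- for n lost protons."""
    return (mass - n * PROTON) / n


glucose = 180.063388  # C6H12O6
print(round(ion_mz(glucose, "[M+H]+"), 4))
print(ion_mz(glucose, "[M+H]+", scan_range=(200, 800)))
```

for a hexose with monoisotopic mass 180.063388 Da the first call gives about 181.0707, and the second returns None because that value lies below a scan range starting at 200.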

how to run

the help menu is accessed with:

sugarMassesPredict.py -h

returns the following:

usage: sugarMassesPredict.py [-h] -dp int int [-p int] -m str [str ...]
                             [-n int] [-ds int] [-ld str [str ...]] [-oh int]
                             [-b int] -i str [str ...] -s int int [-l label]
                             [-o filepath]

Script to predict possible masses of unknown sugars. Written by Margot Bligh.

optional arguments:
  -h, --help            show this help message and exit
  -dp int int, --dp_range int int
                        DP range to predict within: two space separated
                        numbers required (lower first)
  -p int, --pent_option int
                        should pentose monomers be considered as well as
                        hexose: 0 for no {default}, 1 for yes
  -m str [str ...], --modifications str [str ...]
                        space separated list of modifications to consider.
                        note that alditol and unsaturated are max once per
                        saccharide. allowed values: none OR all OR any
                        combination of carboxyl, phosphate, deoxy, nacetyl,
                        omethyl, anhydrobridge, oacetyl, unsaturated, alditol,
                        sulphate
  -n int, --nmod_max int
                        max no. of modifications per monomer on average
                        {default 1}. does not take into account unsaturated or
                        alditol.
  -ds int, --double_sulphate int
                        can monomers be double-sulphated: 0 for no {default},
                        1 for yes. for this you MUST give a value of at least
                        2 to -n/--nmod_max
  -ld str [str ...], --LorD_isomers str [str ...]
                        isomers calculated for L and/or D enantiomers {default
                        D only}. write space separated if both
  -oh int, --OH_stereo int
                        stereochem of OH groups considered when calculating
                        no. of isomers: 0 for no {default}, 1 for yes
  -b int, --bond_stereo int
                        stereochem of glycosidic bonds and reducing end
                        anomeric carbons considered when calculating no. of
                        isomers: 0 for no {default}, 1 for yes
  -i str [str ...], --ESI_mode str [str ...]
                        neg and/or pos mode for ionisation (space separated if
                        both)
  -s int int, --scan_range int int
                        mass spec scan range to predict within: two space
                        separated numbers required (lower first)
  -l label, --label label
                        name a label added to the oligosaccharide. if not
                        labelled do not include. options: procainamide OR
                        benzoic_acid.
  -o filepath, --output filepath
                        filepath to .txt file for output table {default:
                        predicted_sugars.txt}
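if you want to call the command line tool from another Python script (e.g. to run several parameter sets in a row), the flags from the help menu above can be assembled into an argument list. this is only a sketch: the helper name `build_command` is hypothetical, and it assumes sugarMassesPredict.py is in the current working directory. the example call mirrors the R example (dp 1 to 3, positive mode, scan range 100 to 800, pentose allowed, sulphate and deoxy modifications, procainamide label).

```python
import subprocess
import sys


def build_command(dp_range, modifications, esi_modes, scan_range,
                  pent_option=0, nmod_max=1, label=None, output=None,
                  script="sugarMassesPredict.py"):
    """Assemble the argument list using only flags shown in the help menu."""
    cmd = [
        sys.executable, script,
        "-dp", str(dp_range[0]), str(dp_range[1]),
        "-p", str(pent_option),
        "-m", *modifications,
        "-n", str(nmod_max),
        "-i", *esi_modes,
        "-s", str(scan_range[0]), str(scan_range[1]),
    ]
    if label is not None:      # leave out -l for unlabelled sugars
        cmd += ["-l", label]
    if output is not None:     # otherwise the tool's default path is used
        cmd += ["-o", output]
    return cmd


cmd = build_command((1, 3), ["sulphate", "deoxy"], ["pos"], (100, 800),
                    pent_option=1, label="procainamide")
print(" ".join(cmd[1:]))
# subprocess.run(cmd, check=True)  # uncomment to actually run the tool
```

the printed line is the equivalent shell command: sugarMassesPredict.py -dp 1 3 -p 1 -m sulphate deoxy -n 1 -i pos -s 100 800 -l procainamide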