SanXoT_advancedHelp.txt


SanXoT v1.00 is a program made in the Jesus Vazquez Cardiovascular Proteomics
Lab at Centro Nacional de Investigaciones Cardiovasculares, used to perform
integration of experimental data to a higher level (such as integration from
peptide data to protein data), while determining the variance between them.

SanXoT needs two input files:

     * the lower level input data file, a tab separated text file containing
     three columns: the first one with the unique identifiers of each lower
     level element (such as "RawFile05.raw-scan19289-charge2" for a scan, or
     "CGLAGCGLLK" for a peptide sequence, or "P01308" for the Uniprot accession
     number of a protein), the Xi which corresponds to the log2(A/B), and the
     Vi which corresponds to the weight of the measure). This data have to be
     pre-calibrated with a certain weight (see help of the Klibrate program).

     * the relations file, a tab separated text file containing a first column
     with the higher level identifiers (such as the peptide sequence, a Uniprot
     accession number, or a Gene Ontology category) and the lower level
     identifiers within the abovementioned input data file.
     
     NOTE: you must include a first line header in all your files.

And delivers five output file:

     * the output data file for the higher level, which has the same format as
     the lower level data file, but containing the ids of the higher level in
     the first column, the ratio Xj in the second column, and the weight Vj in
     the third column. By default, this file is suffixed as "_outHigher".
     
     * the lower level output file, containing three columns: the first one
     with the identifiers of the lower level, the second containing the
     Xinf - Xsup (i.e. the ratios of the lower level, but centered for each
     element they belong to), and the new weight Winf, contanining the variance
     of the integration. For example, integrating from scan to peptide, this
     file would contain firstly the scan identifiers, secondly the Xscan - Xpep
     (the ratios of each scan compared to the peptide they are identifying) and
     Wscan (the weight of the scan, taking into account the variance of the
     scan distribution). By default, this file is suffixed "_outLower".
     
     * a file useful for statistics, containing all the relations of the higher
     and lower level element present in the data file, with a copy of their
     ratios X and weights V, followed by the number of lower elements contained
     in the upper element (for example, the number of scans that identify the
     same peptide), the Z (which is the distance in sigmas of the lower level
     ratio X to the higher level weighted average), and the FDR (the false
     discovery rate, important to keep track of changes or outliers). By
     default, this file is suffixed "_outStats".
     
     * an info file, containing a log of the performed integrations. Its last
     line is always in the form of "Variance = [double]". This file can be used
     as input in place of the variance (see -v and -V arguments). By default,
     this file is suffixed "_outInfo".
     
     * a graph file, depicting the sigmoid of the Z column which appears in the
     stats file, compared to the theoretical normal distribution. By default,
     this file is suffixed "_outGraph".
     
Usage: sanxot.py -d[data file] -r[relations file] [OPTIONS]

   -h, --help          Display basic help and exit.
   -H, --advanced-help Display this help and exit.
   -a, --analysis=string
                       Use a prefix for the output files. If this is not
                       provided, then the prefix will be garnered from the data
                       file.
   -b, --no-verbose    Do not print result summary after executing.
   -C, --confluence    A modified version of the relations file is used, where
                       all the destination higher level elements are "1". If no
                       relations file is provided, the program gets the lower
                       level elements from the first column of the data file.
   -d, --datafile=filename
                       Data file with identificators of the lowel level in the
                       first column, measured values (x) in the second column,
                       and weights (v) in the third column.
   -f, --forceparameters
                       Use the parameters as provided, without using the
                       Levenberg-Marquardt algorithm.
   -g, --no-graph      Do not show the Zij vs rank / N graph.
   -G, --outgraph=filename
                       To use a non-default name for the graph file.
   -l, --graphlimits=integer
                       To set the +- limits of the Zij graph (default is 6). If
                       you want the limits to be between the minimum and
                       maximum values, you can use -l0.
   -L, --outinfo=filename
                       To use a non-default name for the info file.
   -m, --maxiterations=integer
                       Maximum number of iterations performed by the Levenberg-
                       Marquardt algorithm to calculate the variance. If
                       unused, then the default value of the algorithm is
                       taken.
   -o, --outhigher=filename
                       To use a non-default higher level output file name.
   -p, --place, --folder
                       To use a different common folder for the output files.
                       If this is not provided, the the folder used will be the
                       same as the input folder.
   -r, --relfile, --relationsfile=filename
                       Relations file, with identificators of the higher level
                       in the first column, and identificators of the lower
                       level in the second column.
   -R, --randomise, --randomize
                       A modified version of the relations file is used, where
                       the higher level elements (first column) are replaced by
                       numbers and randomly written in the first column. The
                       numbers range from 1 to the total number of elements.
                       The second column (containing the lowel level elements)
                       remains unchanged.
   -s, --no-steps      Do not print result summary and the steps of every
                       Levenberg-Marquardt iteration.
   -u, --outlower=filename
                       To use a non-default lower level output file name.
   -v, --var, --varianceseed=double
                       Seed used to start calculating the variance.
                       Default is 0.001.
   -V, --varfile=filename
                       Get the variance value from a text file. It must contain
                       a line (not more than once) with the text
                       "Variance = [double]". This suits the info file from a
                       previous integration (see -L).
   -z, --outstats=filename
                       To use a non-default stats file name.

examples (use "sanxot.py" if you are not using the standalone version):

* To calculate the variance starting with a seed = 0.02, using a datafile.txt
and a relationsfile.txt, both in C:\temp:

sanxot -dC:\temp\datafile.txt -rrelationsfile.txt -v0.02

* To get fast results of an integration forcing a variance = 0.02922:

sanxot -dC:\temp\datafile.txt -rrelationsfile.txt -f -v0.02922

* To get an integration forcing the variance reported in the info file at
C:\data\infofile.txt, and saving the resulting graph in C:\data\ instead
of C:\temp\:

sanxot -dC:\temp\datafile.txt -rrelationsfile.txt -f -VC:\data\infofile.txt -GC:\data\graphFile.png