-
Notifications
You must be signed in to change notification settings - Fork 1
/
SanXoT_advancedHelp.txt
executable file
·141 lines (120 loc) · 7.72 KB
/
SanXoT_advancedHelp.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
SanXoT v1.00 is a program made in the Jesus Vazquez Cardiovascular Proteomics
Lab at Centro Nacional de Investigaciones Cardiovasculares, used to perform
integration of experimental data to a higher level (such as integration from
peptide data to protein data), while determining the variance between them.
SanXoT needs two input files:
* the lower level input data file, a tab separated text file containing
three columns: the first one with the unique identifiers of each lower
level element (such as "RawFile05.raw-scan19289-charge2" for a scan, or
"CGLAGCGLLK" for a peptide sequence, or "P01308" for the Uniprot accession
number of a protein), the Xi which corresponds to the log2(A/B), and the
Vi which corresponds to the weight of the measure). This data have to be
pre-calibrated with a certain weight (see help of the Klibrate program).
* the relations file, a tab separated text file containing a first column
with the higher level identifiers (such as the peptide sequence, a Uniprot
accession number, or a Gene Ontology category) and the lower level
identifiers within the abovementioned input data file.
NOTE: you must include a first line header in all your files.
And delivers five output file:
* the output data file for the higher level, which has the same format as
the lower level data file, but containing the ids of the higher level in
the first column, the ratio Xj in the second column, and the weight Vj in
the third column. By default, this file is suffixed as "_outHigher".
* the lower level output file, containing three columns: the first one
with the identifiers of the lower level, the second containing the
Xinf - Xsup (i.e. the ratios of the lower level, but centered for each
element they belong to), and the new weight Winf, contanining the variance
of the integration. For example, integrating from scan to peptide, this
file would contain firstly the scan identifiers, secondly the Xscan - Xpep
(the ratios of each scan compared to the peptide they are identifying) and
Wscan (the weight of the scan, taking into account the variance of the
scan distribution). By default, this file is suffixed "_outLower".
* a file useful for statistics, containing all the relations of the higher
and lower level element present in the data file, with a copy of their
ratios X and weights V, followed by the number of lower elements contained
in the upper element (for example, the number of scans that identify the
same peptide), the Z (which is the distance in sigmas of the lower level
ratio X to the higher level weighted average), and the FDR (the false
discovery rate, important to keep track of changes or outliers). By
default, this file is suffixed "_outStats".
* an info file, containing a log of the performed integrations. Its last
line is always in the form of "Variance = [double]". This file can be used
as input in place of the variance (see -v and -V arguments). By default,
this file is suffixed "_outInfo".
* a graph file, depicting the sigmoid of the Z column which appears in the
stats file, compared to the theoretical normal distribution. By default,
this file is suffixed "_outGraph".
Usage: sanxot.py -d[data file] -r[relations file] [OPTIONS]
-h, --help Display basic help and exit.
-H, --advanced-help Display this help and exit.
-a, --analysis=string
Use a prefix for the output files. If this is not
provided, then the prefix will be garnered from the data
file.
-b, --no-verbose Do not print result summary after executing.
-C, --confluence A modified version of the relations file is used, where
all the destination higher level elements are "1". If no
relations file is provided, the program gets the lower
level elements from the first column of the data file.
-d, --datafile=filename
Data file with identificators of the lowel level in the
first column, measured values (x) in the second column,
and weights (v) in the third column.
-f, --forceparameters
Use the parameters as provided, without using the
Levenberg-Marquardt algorithm.
-g, --no-graph Do not show the Zij vs rank / N graph.
-G, --outgraph=filename
To use a non-default name for the graph file.
-l, --graphlimits=integer
To set the +- limits of the Zij graph (default is 6). If
you want the limits to be between the minimum and
maximum values, you can use -l0.
-L, --outinfo=filename
To use a non-default name for the info file.
-m, --maxiterations=integer
Maximum number of iterations performed by the Levenberg-
Marquardt algorithm to calculate the variance. If
unused, then the default value of the algorithm is
taken.
-o, --outhigher=filename
To use a non-default higher level output file name.
-p, --place, --folder
To use a different common folder for the output files.
If this is not provided, the the folder used will be the
same as the input folder.
-r, --relfile, --relationsfile=filename
Relations file, with identificators of the higher level
in the first column, and identificators of the lower
level in the second column.
-R, --randomise, --randomize
A modified version of the relations file is used, where
the higher level elements (first column) are replaced by
numbers and randomly written in the first column. The
numbers range from 1 to the total number of elements.
The second column (containing the lowel level elements)
remains unchanged.
-s, --no-steps Do not print result summary and the steps of every
Levenberg-Marquardt iteration.
-u, --outlower=filename
To use a non-default lower level output file name.
-v, --var, --varianceseed=double
Seed used to start calculating the variance.
Default is 0.001.
-V, --varfile=filename
Get the variance value from a text file. It must contain
a line (not more than once) with the text
"Variance = [double]". This suits the info file from a
previous integration (see -L).
-z, --outstats=filename
To use a non-default stats file name.
examples (use "sanxot.py" if you are not using the standalone version):
* To calculate the variance starting with a seed = 0.02, using a datafile.txt
and a relationsfile.txt, both in C:\temp:
sanxot -dC:\temp\datafile.txt -rrelationsfile.txt -v0.02
* To get fast results of an integration forcing a variance = 0.02922:
sanxot -dC:\temp\datafile.txt -rrelationsfile.txt -f -v0.02922
* To get an integration forcing the variance reported in the info file at
C:\data\infofile.txt, and saving the resulting graph in C:\data\ instead
of C:\temp\:
sanxot -dC:\temp\datafile.txt -rrelationsfile.txt -f -VC:\data\infofile.txt -GC:\data\graphFile.png