
Command line parameters

Pavel edited this page Jul 19, 2016 · 14 revisions

-h, --help
Displays the help page in the console.

-i, --in
Use: -i input.sdf
Path to the sdf-file containing single compounds.

-o, --out
Use: -o output.txt
Path to the output text file, where calculated descriptors will be saved.

-b, --output_format
Use: -b svm
Format of the output file with calculated descriptors (txt|svm). The txt format is an ordinary tab-separated text file with compounds in rows and descriptors in columns: the first line is a header containing the descriptor names, and the first column contains the compound names. The svm format is sparse; two additional files with the extensions .colnames and .rownames will be saved alongside it. Default: txt.
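To illustrate the txt layout, here is a minimal reader sketch. It is not part of the tool. It assumes the header row has a leading cell above the compound-name column (its content is not specified on this page) and that every descriptor value is numeric.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_descriptors_txt(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a tab-separated descriptor table in the txt layout described above.

    Assumption: header[0] labels the compound-name column, header[1:] are descriptor names.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    descriptor_names = header[1:]  # first column holds compound names
    rows: Dict[str, List[float]] = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows[record[0]] = [float(v) for v in record[1:]]
    return descriptor_names, rows
```

For example, `read_descriptors_txt("Compounds\tdesc1\tdesc2\nethanol\t1\t0\n")` returns `(["desc1", "desc2"], {"ethanol": [1.0, 0.0]})`; the column names here are placeholders.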

-a, --atoms_labeling
Use: -a elm none
List of atom labeling schemes separated by spaces. The built-in schemes are element (elm) and topology (none). To use other schemes, specify the name of the atomic property; it must be identical to the name of the SDF field that contains the calculated atomic properties. Field names are case-sensitive. For RDF/RXN input files only the built-in schemes can be used at the moment. Default: elm.

Since atomic characteristics can be continuous values (e.g. charges), they should be split into bins, and each bin receives its own label, starting from "A". To specify the bin thresholds, create a file "setup.txt" in the directory containing your input file; each line has the following format:

CHARGE=-0.5<0<0.5  

This means that four bins will be created for the property named CHARGE: atoms in the first bin (charge less than or equal to -0.5) will be labeled "A", atoms in the second bin (charge in the range (-0.5;0]) will be labeled "B", and so on.
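The binning rule can be sketched as follows. This is an illustration, not the tool's own code: the page only spells out the first two bins, so extending the same "upper bound inclusive" convention to the remaining bins is an assumption, and the helper names are hypothetical.

```python
from bisect import bisect_left
from string import ascii_uppercase
from typing import List, Tuple


def parse_setup_line(line: str) -> Tuple[str, List[float]]:
    """Split a line such as 'CHARGE=-0.5<0<0.5' into a property name and thresholds (assumed ascending)."""
    name, _, spec = line.strip().partition("=")
    thresholds = [float(t) for t in spec.split("<")]
    return name, thresholds


def bin_label(value: float, thresholds: List[float]) -> str:
    """Return 'A' for value <= thresholds[0], 'B' for thresholds[0] < value <= thresholds[1], etc."""
    # bisect_left counts thresholds strictly below value, so a value equal to a
    # threshold falls into the lower bin, matching the (-0.5;0] example above.
    return ascii_uppercase[bisect_left(thresholds, value)]
```

With `CHARGE=-0.5<0<0.5`, a charge of -0.5 gets "A", 0.0 gets "B", 0.5 gets "C" and 0.9 gets "D". The sketch does not handle more than 26 bins.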

--min_atoms
Use: --min_atoms 2
The minimum number of atoms in fragments. Default: 4

--max_atoms
Use: --max_atoms 6
The maximum number of atoms in fragments. Default: 4

--min_components
Use: --min_components 1
The minimum number of disconnected groups of atoms in a fragment. Default: 1 (fully connected fragments).

--max_components
Use: --max_components 2
The maximum number of disconnected groups of atoms in a fragment. Default: 2. Note: increasing this value may lead to a combinatorial explosion and slow calculations.

-q, --quasi_mix
Use: -q
Calculate quasi-mixture descriptors for single compounds.

-m, --mixtures
Use: -m mixtures.txt
Text file containing a list of mixtures with their components and ratios. Component names should be the same as in the input.sdf file.

File format (all values are tab-separated):

ethanol	acetone	1	1
n-butanol	acetone	1	2
i-propanol	butanone-2	1	3
i-propanol	cyclopentanone	butanone-2	2	1	1

You may specify mixtures with different numbers of components in a single file.
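Judging from the example lines, each line lists N component names followed by N ratios. A minimal parsing sketch under that assumption (the function name is hypothetical, not part of the tool):

```python
from typing import List, Tuple


def parse_mixture_line(line: str) -> Tuple[List[str], List[float]]:
    """Split 'name1<TAB>...<TAB>nameN<TAB>r1<TAB>...<TAB>rN' into names and ratios."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) % 2 != 0:
        raise ValueError(f"expected N names followed by N ratios, got {len(fields)} fields")
    n = len(fields) // 2
    names = fields[:n]
    ratios = [float(x) for x in fields[n:]]
    return names, ratios
```

Because the split point is derived from the field count, the same function handles binary and ternary mixtures in one file.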

--min_mix_components
Use: --min_mix_components 2
The minimum number of components that contribute to mixture fragments. Default: 2 (and cannot be less).

--max_mix_components
Use: --max_mix_components 3
The maximum number of components that contribute to mixture fragments. Default: 2 (only binary interactions in a mixture are taken into account).

--mix_type
Use: --mix_type rel
Possible values: abs|rel
abs: the amounts of components given in the mixture file are used as is.
rel: the amounts of components given in the mixture file are treated as relative amounts and scaled to sum to 1. Default: abs.
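The difference between the two modes amounts to an optional normalization. A minimal sketch (the helper name is hypothetical):

```python
from typing import List


def scale_amounts(amounts: List[float], mix_type: str = "abs") -> List[float]:
    """Return amounts unchanged for 'abs', or rescaled to sum to 1 for 'rel'."""
    if mix_type == "abs":
        return list(amounts)
    if mix_type == "rel":
        total = sum(amounts)
        if total == 0:
            raise ValueError("cannot rescale amounts that sum to zero")
        return [a / total for a in amounts]
    raise ValueError(f"unknown mix_type: {mix_type!r}")
```

For the line `i-propanol cyclopentanone butanone-2 2 1 1`, `rel` turns the amounts 2, 1, 1 into 0.5, 0.25, 0.25, while `abs` keeps them as 2, 1, 1.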

-r, --mix_ordered
Use: -r
If this flag is set, mixtures are considered ordered; otherwise they are considered unordered. In ordered mixtures the role of each component is known, and the position of a component in the mixture description file is taken into account. In unordered mixtures all components are equivalent, and their roles do not depend on their positions in the mixture description. Used only in combination with the -m key.

In ordered mixtures, ethanol:acetone (1:1) and acetone:ethanol (1:1) are two different mixtures and will result in different descriptors. In descriptor names, the index of a component in the mixture precedes the atom labels in the SMILES part (e.g. 1C-1C.2C=2O). This option can be useful if you are confident that the components have different roles, e.g. host-guest complexes, ligand-receptor complexes, etc.
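One way to picture the distinction is a key that identifies a mixture. The sketch below is a conceptual illustration, not how the tool represents mixtures internally: for unordered mixtures the (name, ratio) pairs are sorted, so ethanol:acetone and acetone:ethanol collapse to the same key; for ordered mixtures position is preserved.

```python
from typing import List, Tuple


def mixture_key(names: List[str], ratios: List[float], ordered: bool) -> Tuple[Tuple[str, float], ...]:
    """Build a hashable key identifying a mixture.

    Ordered mixtures keep component positions; unordered mixtures sort the
    (name, ratio) pairs so that the order of components in the file is irrelevant.
    """
    pairs = list(zip(names, ratios))
    if not ordered:
        pairs.sort()
    return tuple(pairs)
```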

--mix_self_association
Use: --mix_self_association
Calculates mixture descriptors between each component and itself in order to take the self-interaction of components into account. Default: false.
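For binary interactions (the default maximum of two contributing components), the effect of this flag can be pictured as which component pairs are combined. This is a conceptual sketch under that assumption, not the tool's enumeration code:

```python
from itertools import combinations, combinations_with_replacement
from typing import List, Tuple


def component_pairs(names: List[str], self_association: bool = False) -> List[Tuple[str, str]]:
    """List the component pairs considered for binary mixture fragments.

    Without self-association only pairs of different components are used;
    with it, each component is also paired with itself.
    """
    if self_association:
        return list(combinations_with_replacement(names, 2))
    return list(combinations(names, 2))
```

For an ethanol/acetone mixture this yields only (ethanol, acetone) by default, and additionally (ethanol, ethanol) and (acetone, acetone) when self-association is enabled.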

--reaction_diff
Use: --reaction_diff
If this flag is set, the difference between product and reactant descriptors will be calculated. By default these two feature vectors are concatenated.
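The two ways of combining reactant and product descriptors can be sketched with dictionaries mapping descriptor names to values. The `R_`/`P_` prefixes used for concatenation are hypothetical; this page does not document how concatenated columns are named.

```python
from typing import Dict


def reaction_features(reactant: Dict[str, float], product: Dict[str, float], diff: bool) -> Dict[str, float]:
    """Combine reactant and product descriptor vectors (descriptor name -> value).

    diff=True: product minus reactant for every descriptor present on either side.
    diff=False: concatenation, with hypothetical 'R_'/'P_' prefixes to keep names distinct.
    """
    if diff:
        keys = set(reactant) | set(product)
        return {k: product.get(k, 0.0) - reactant.get(k, 0.0) for k in keys}
    combined = {f"R_{k}": v for k, v in reactant.items()}
    combined.update({f"P_{k}": v for k, v in product.items()})
    return combined
```

The difference vector has one column per descriptor, while concatenation doubles the width of the feature set. That may matter when many descriptors are generated.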

--descriptors_transformation
Use: --descriptors_transformation rel
Possible values: num|rel|both.
num: numbers of fragments (for single compounds) or numbers of fragment combinations weighted by the molar ratios of the components (for mixtures).
prob: descriptor values are divided by the sum of all descriptor values to represent the relative frequency of occurrence of fragments or fragment combinations. Descriptors for single compounds and mixtures are normalized separately.
both: both types of descriptors will be generated. Default: num.
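The relative-frequency option amounts to dividing each value by the sum of the group it belongs to. A minimal sketch, assuming single-compound and mixture descriptors are held in separate dictionaries; returning zeros for an all-zero group is an assumption of this sketch:

```python
from typing import Dict, Tuple


def to_relative(counts: Dict[str, float]) -> Dict[str, float]:
    """Divide each descriptor value by the sum of all values in the group."""
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}  # assumption: an empty group stays all-zero
    return {k: v / total for k, v in counts.items()}


def relative_by_group(single: Dict[str, float], mixture: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Normalize single-compound and mixture descriptors separately."""
    return to_relative(single), to_relative(mixture)
```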

-v, --verbose
Use: -v
Progress will be printed out (this may slow down the calculation).

-x, --noH
Use: -x
If this flag is set, only non-hydrogen atoms are considered during descriptor calculation. This can significantly increase calculation speed at the cost of a somewhat lower discriminative ability of the descriptor set.

-f, --fragments
Use: -f fragments.txt
Path to a text file with information about fragments that should be excluded from the structure of a compound during descriptor calculation. File format:

cyclopentanone	carbocycle	1	2	3	4	5
cyclopentanone	carbonyl	5	6
cyclopentanone	oxygen	6

There is no header in the file. Each line is tab-separated and contains the compound name, the fragment name, and the indices (1-based) of the atoms belonging to that fragment, which will be excluded during descriptor generation. This option is useful if you want to estimate the contributions of individual fragments to the target property.
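A minimal sketch of reading this file into a lookup table, converting the 1-based atom indices to 0-based ones. The function name and the nested-dictionary layout are illustrative choices, not the tool's internals.

```python
from collections import defaultdict
from typing import Dict, Set


def parse_fragments(text: str) -> Dict[str, Dict[str, Set[int]]]:
    """Map compound name -> fragment name -> 0-based atom indices to exclude.

    Lines are tab-separated: compound, fragment, then 1-based atom indices.
    """
    result: Dict[str, Dict[str, Set[int]]] = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        compound, fragment, *indices = line.split("\t")
        result[compound][fragment] = {int(i) - 1 for i in indices}  # convert to 0-based
    return dict(result)
```

For the example above, the `oxygen` fragment of cyclopentanone maps to `{5}` and `carbocycle` maps to `{0, 1, 2, 3, 4}`.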

-w, --id_field_name
Use: -w Compound_ID
Field name of the unique ID for compounds (sdf) or reactions (rdf/rxn). If omitted, molecule titles (or auto-generated names) are used for sdf; for rdf, the $RIREG/$REREG/$MIREG/$MEREG field is used if it is not empty, otherwise auto-generated names.
