Commit 6bd6aef

Rename to used underscores
1 parent 850bf45 commit 6bd6aef

File tree

5 files changed

+412
-0
lines changed

Diff for: doc/orca_addmethod.md

+130
@@ -0,0 +1,130 @@
1+
# Adding a new method to ORCA
2+
3+
ORCA is designed to make adding new methods easy. You only need to add the new algorithm's class file to the `src/Algorithms` folder. After that, the method is available to the framework, and configuration files can be used to automate experiments.
4+
5+
## Method template
6+
7+
The code must follow this template, which satisfies the API defined in the `Algorithm` abstract class:
8+
9+
```MATLAB
10+
classdef NEWMETHOD < Algorithm
11+
properties
12+
description = 'my NEWMETHOD method description';
13+
% Parameters to optimize and default value
14+
parameters = struct('k', 5);
15+
end
16+
17+
methods
18+
function obj = NEWMETHOD(varargin)
19+
% Process key-values pairs of parameters
20+
obj.parseArgs(varargin);
21+
end
22+
23+
function [projectedTrain, predictedTrain]= privfit( obj, train, param)
24+
% fit the model and return prediction of train set. It is called by
25+
% super class Algorithm.fit() method.
26+
...
27+
% Save the model
28+
obj.model = model;
29+
end
30+
31+
function [projected, predicted] = privpredict(obj, testPatterns)
32+
% predict unseen patterns with 'obj.model' and return prediction and
33+
% projection of patterns (for threshold models)
34+
% It is called by super class Algorithm.predict() method.
35+
end
36+
end
37+
end
38+
```
39+
40+
Here, `train` is a structure in which `train.patterns` is a matrix of patterns and `train.targets` is a vector with the corresponding labels. The `model` class property stores the model built from the training data.
41+
42+
## Example: adding KNN to ORCA
43+
44+
To illustrate the one-step process of adding a new method, we will add the KNN classifier to ORCA. Just copy the file [KNN.m](KNN.m) to folder `src/Algorithms`:
45+
46+
```MATLAB
47+
classdef KNN < Algorithm
48+
%KNN Basic k-nearest neighbors algorithm based on Euclidean distance
49+
50+
properties
51+
description = 'k-nearest neighbors algorithm';
52+
% Parameters to optimize and default value
53+
parameters = struct('k', 5);
54+
end
55+
56+
methods
57+
function obj = KNN(varargin)
58+
%KNN constructs an object of the class KNN. Default k is 5
59+
%
60+
% OBJ = KNN('k', neighbours)
61+
% builds KNN with NEIGHBOURS as number of neighbours to consider
62+
% to label new patterns.
63+
obj.parseArgs(varargin);
64+
end
65+
66+
function [projectedTrain, predictedTrain]= privfit( obj, train, param)
67+
if(nargin == 3)
68+
obj.parameters.k = param.k;
69+
end
70+
71+
% save train data in the model structure
72+
obj.model.train = train;
73+
obj.model.parameters = obj.parameters;
74+
% Predict train labels
75+
[projectedTrain, predictedTrain] = predict(obj, train.patterns);
76+
end
77+
78+
function [projected, predicted] = privpredict(obj, testPatterns)
79+
% Variables aliases
80+
x = obj.model.train.patterns;
81+
xlabel = obj.model.train.targets;
82+
k = obj.model.parameters.k;
83+
84+
dist = pdist2(testPatterns,x);
85+
% indices of nearest neighbors
86+
[~,nearest] = sort(dist,2);
87+
% k nearest
88+
nearest = nearest(:,1:k);
89+
% mode of k nearest
90+
val = xlabel(nearest);
91+
predicted = mode(val,2);
92+
93+
% dummy value for projections
94+
projected = -1.*ones(size(testPatterns,1),1);
95+
end
96+
end
97+
end
98+
```
99+
100+
Then, you can define a configuration file such as [knntoy.ini](knntoy.ini) to describe experiments using KNN:
101+
102+
```INI
103+
;Experiment ID
104+
[knn-mae-toy]
105+
{general-conf}
106+
;Datasets path
107+
basedir = ../exampledata/30-holdout
108+
;Datasets to process (comma separated list)
109+
datasets = toy
110+
;Activate data standardization
111+
standarize = true
112+
;Number of folds for the parameters optimization
113+
num_folds = 5
114+
;Crossvalidation metric
115+
cvmetric = mae
116+
117+
;Method: algorithm and parameter
118+
{algorithm-parameters}
119+
algorithm = KNN
120+
121+
;Method's hyper-parameter values to optimize
122+
{algorithm-hyper-parameters-to-cv}
123+
k = 3,5,7
124+
```
125+
126+
To run experiments described in that file, from `src` folder type:
127+
128+
```MATLAB
129+
Utilities.runExperiments('../doc/addmethod/knntoy.ini')
130+
```

Diff for: doc/orca_condor.md

+31
@@ -0,0 +1,31 @@
1+
# Experiments parallelization with HTCondor
2+
3+
ORCA can be *easily* integrated with a High Throughput Computing (HTC) environment, such as [HTCondor](http://research.cs.wisc.edu/htcondor/), so that extensive experiments can be sped up. The parallelization is done at the dataset partition level, i.e. each partition of each dataset is run in a different slot of the cluster.
4+
5+
The [src/condor](../src/condor) folder contains a set of scripts that automate the use of ORCA with HTCondor. The script [condor-matlabExperiment.sh](../src/condor/condor-matlabExperiment.sh) runs a set of ORCA experiments from any of the configuration files. To use the script, the configuration file must be in the [src/condor](../src/condor) folder:
6+
```bash
7+
~/orca/orca$ cd src/condor/
8+
~/orca/orca/src/condor$ cp ../config-files/svorim.ini ./
9+
```
10+
Then, edit the configuration file so that the path of the experiments is correct relative to the current directory (replace `../example-data` with `../../example-data`). This can be done with a text editor or with the following `sed` command:
11+
```bash
12+
~/orca/orca/src/condor$ sed -i 's/\.\.\//\.\.\/\.\.\//g' svorim.ini
13+
```
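The substitution above can be previewed without touching the real file. The following is a minimal sketch that applies the same `sed` expression to a single sample line; the sample line is hypothetical, and the actual contents of `svorim.ini` may differ. Note that the expression rewrites every `../` it finds, so running it twice on the same file would double the prefix again.

```bash
# Hypothetical sample line; the real svorim.ini may contain different paths.
line='basedir = ../example-data/30-holdout'

# Same expression as above, but without -i, so nothing is modified on disk.
result=$(printf '%s\n' "$line" | sed 's/\.\.\//\.\.\/\.\.\//g')
echo "$result"
```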
14+
Now the script is ready to be used. The following command:
15+
```bash
16+
~/orca/orca/src/condor$ condor-matlabExperiment.sh svorim.ini
17+
```
18+
creates an HTCondor job and adds it to the HTCondor queue. Each job consists of a task that splits the work into independent configuration files, a train-test task for each dataset partition, and a final task that collects all the data and creates the reports. Most of the experimental results will be compressed, with the exception of the CSV files. To adapt the set of scripts to your HTCondor system, set up the environment variables corresponding to MATLAB's path, universe and requirements, and edit the `.sh` file.
19+
20+
Additionally, the [src/condor](../src/condor) folder includes the following files:
21+
- [condor-matlabFramework.dag](../src/condor/condor-matlabFramework.dag): this HTCondor `dag` file runs the `submit` files in the appropriate order, that is, `condor-createExperiments.submit`, `condor-runExperiments.submit` and `condor-joinResults.submit`.
22+
- [condor-createExperiments.sh](../src/condor/condor-createExperiments.sh): this is a `bash` script that invokes ORCA to split the configuration file into as many configuration files as the number of datasets multiplied by the number of partitions. The script receives two command line arguments, the name of the configuration file and the name of the working directory.
23+
- [condor-createExperiments.submit](../src/condor/condor-createExperiments.submit): this HTCondor `submit` file runs `condor-createExperiments.sh` on the HTCondor cluster.
24+
- [condor-runExperiments.sh](../src/condor/condor-runExperiments.sh): this is a `bash` script that invokes ORCA to run a single experiment. The script receives two command line arguments, the name of the working directory and the number of the experiment to be run.
25+
- [condor-runExperiments.submit](../src/condor/condor-runExperiments.submit): this HTCondor `submit` file runs `condor-runExperiments.sh` on the HTCondor cluster.
26+
- [condor-joinResults.sh](../src/condor/condor-joinResults.sh): this is a `bash` script that invokes ORCA to join the results of all the experiments that were run. The script receives one command line argument, the name of the working directory.
27+
- [condor-joinResults.submit](../src/condor/condor-joinResults.submit): this HTCondor `submit` file runs `condor-joinResults.sh` on the HTCondor cluster.
28+
29+
# Experiments parallelization with other cluster environments
30+
31+
The design of ORCA allows easy parallelization with other cluster environments, given that all the intermediate results of the different partitions are saved to disk. In this way, you can use the scripts [condor-createExperiments.sh](../src/condor/condor-createExperiments.sh), [condor-runExperiments.sh](../src/condor/condor-runExperiments.sh) and [condor-joinResults.sh](../src/condor/condor-joinResults.sh) to run the code in your own cluster environment.
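As a rough illustration of how the three stages fit together outside HTCondor, the sketch below chains them in order. It is only a sketch under stated assumptions: the three functions are hypothetical stand-ins that print what they would do, so the example runs anywhere; the working directory name, the number of experiments and the numbering starting at 1 are all assumptions. In a real setup each function body would call the corresponding `.sh` script with the arguments described above.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the three ORCA scripts (they only print their
# arguments). Replace the bodies with calls to the real .sh files.
create_experiments() { echo "create $1 $2"; }   # config file, working directory
run_experiment()     { echo "run $1 $2"; }      # working directory, experiment number
join_results()       { echo "join $1"; }        # working directory

config=svorim.ini
workdir=work        # assumed working directory name
n_experiments=3     # assumed: number of datasets times number of partitions

log=$(
    # Stage 1: split the configuration into one file per experiment.
    create_experiments "$config" "$workdir"
    # Stage 2: the experiments are independent, so a scheduler could
    # dispatch each iteration to a different node. Numbering from 1 is
    # an assumption.
    for i in $(seq 1 "$n_experiments"); do
        run_experiment "$workdir" "$i"
    done
    # Stage 3: collect the results once every experiment has finished.
    join_results "$workdir"
)
echo "$log"
```

Only the middle stage benefits from parallel execution; the first and last stages run once per configuration file.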

Diff for: doc/orca_install.md

+109
@@ -0,0 +1,109 @@
1+
# ORCA detailed build and troubleshooting
2+
3+
This is a detailed install guide. If you have not done so yet, please try the [Quick Install steps](orca-quick-install.md) before continuing. If you are here after that, some errors occurred when building the `mex` files.
4+
5+
## Building `mex` files in GNU/Linux
6+
7+
The `make.m` scripts can fail if the compiler is not correctly installed and configured. In that case, run `mex -setup` to choose a suitable compiler for `mex`. Make sure your compiler is accessible. Then type `make` to start the installation.
8+
9+
If neither `make.m` nor `mex -setup` works, the simplest way to compile all the algorithms is to use the [Makefile](../src/Makefile) included in ORCA. This will compile all the algorithms and clean intermediate object files.
10+
11+
12+
### Building `mex` files for MATLAB from the GNU/Linux terminal
13+
14+
To build the `mex` files for MATLAB, set the `MATLABDIR` variable in the [Makefile](../src/Makefile) so that it points to the MATLAB installation directory (for instance `MATLABDIR = /usr/local/MATLAB/R2017b`). Then, from the `bash` terminal:
15+
```bash
16+
$ cd src/
17+
$ make
18+
```
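Before running `make`, it can help to confirm that the directory assigned to `MATLABDIR` really is a MATLAB installation. The following is a minimal sketch, assuming that a MATLAB installation on GNU/Linux ships an executable `bin/mex` under its root directory; `check_matlabdir` is a hypothetical helper name, not part of ORCA.

```bash
# Hypothetical helper: succeeds if the given directory contains an
# executable bin/mex (assumed layout of a MATLAB installation).
check_matlabdir() {
    [ -x "$1/bin/mex" ]
}

MATLABDIR=/usr/local/MATLAB/R2017b   # example path from the text above
if check_matlabdir "$MATLABDIR"; then
    echo "mex found in $MATLABDIR"
else
    echo "no mex under $MATLABDIR; fix MATLABDIR before running make" >&2
fi
```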
19+
20+
### Building `mex` files for Octave from the GNU/Linux terminal
21+
To build the `mex` files for Octave, set the `OCTAVEDIR` variable in the [Makefile](../src/Makefile). This variable has to point to the Octave header files (for instance, `OCTAVEDIR = /usr/include/octave-4.0.0/octave/`). Then, from the `bash` terminal:
22+
```bash
23+
$ cd src/
24+
$ make octave
25+
```
26+
27+
## Building `mex` files in Windows
28+
29+
In Windows, we recommend compiling the `mex` files from the Octave/MATLAB console.
30+
31+
### Building `mex` files in Windows for Octave
32+
33+
The default Octave installation provides the `mex` command pre-configured with `MinGW`.
34+
35+
1. Inside Octave's console, run `make` in folder `src\Algorithms`
36+
1. From `src` run `runtestssingle` to check the installation.
37+
38+
### Building `mex` files in Windows for MATLAB
39+
40+
1. Install a [supported compiler](https://es.mathworks.com/support/compilers.html). The easiest way is to use the "Add-ons" assistant to download
41+
and install [MinGW](http://es.mathworks.com/help/matlab/matlab_external/install-mingw-support-package.html).
42+
1. Test [basic C example](https://es.mathworks.com/matlabcentral/fileexchange/52848-matlab-support-for-mingw-w64-c-c++-compiler) to ensure `mex` is properly working.
43+
1. From the MATLAB console, run `make` in `src\Algorithms`.
44+
1. Then run `runtestssingle` in `src` to check the installation.
45+
46+
We provide binaries and *dlls* for 'ORBoost', because building this method in Windows can be very *complex*. Running `make` will unpack all the binary files. If you need to compile your own binaries, follow these steps:
47+
48+
1. Install [w64-mingw32](https://mingw-w64.org).
49+
1. Open a terminal by pressing the Windows key and typing `cmd.exe`.
50+
1. Set Windows path to your `w64-mingw32` installation binaries dir, for instance:
51+
```
52+
set PATH=C:\Program Files\mingw-w64\x86_64-7.2.0-posix-seh-rt_v5-rev0\mingw64\bin;"%PATH%"
53+
```
54+
1. Move to directory `orca\src\Algorithms\orensemble\orensemble`.
55+
1. Run `mingw32-make.exe Makefile.win all`.
56+
57+
## Individually compiling the algorithms
58+
59+
If you are not able to compile all the algorithms using the above methods, we recommend compiling them individually. Under GNU/Linux, edit the files `src/Algorithms/libsvm-rank-2.81/matlab/Makefile`, `src/Algorithms/libsvm-weights-3.12/matlab/Makefile`, `src/Algorithms/SVOREX/Makefile` and `src/Algorithms/SVORIM/Makefile`. Make sure that the variables `MATLABDIR` or `OCTAVEDIR` point to the correct folders. For MATLAB, you can also make a symbolic link to your current MATLAB installation folder:
60+
```bash
61+
$ sudo ln -s /path/to/matlab /usr/local/matlab
62+
```
63+
The following subsections provide individual instructions for compiling each of the dependencies, in case the global [Makefile](../src/Algorithms/Makefile) still fails or you are working on another operating system.
64+
65+
### libsvm-weights-3.12
66+
67+
These instructions are adapted from the corresponding README of `libsvm`. First, you should open MATLAB/Octave console and then `cd` to the directory `src/Algorithms/libsvm-weights-3.12/matlab`. After that, try to compile the `MEX` files using `make.m` (from the MATLAB/Octave console):
68+
```MATLAB
69+
>> cd src/Algorithms/libsvm-weights-3.12/matlab
70+
>> make
71+
```
72+
73+
### libsvm-rank-2.81
74+
75+
To compile this dependency, the instructions are similar to those of `libsvm-weights-3.12` (from the MATLAB/Octave console):
76+
```MATLAB
77+
>> cd src/Algorithms/libsvm-rank-2.81/matlab
78+
>> make
79+
```
80+
81+
### SVOREX and SVORIM
82+
83+
For both algorithms, please use the `make.m` file included in them (from the MATLAB/Octave console):
84+
```MATLAB
85+
>> cd src/Algorithms/SVOREX
86+
>> make
87+
>> cd ..
88+
>> cd SVORIM
89+
>> make
90+
```
91+
92+
### orensemble
93+
94+
We have not prepared a proper MEX interface for ORBoost, so the binary files of this algorithm must be compiled and are then invoked directly from MATLAB. To compile the ORBoost algorithm, uncompress the file `orensemble.tar.gz` and compile the corresponding source code. In GNU/Linux, this can be done as follows (from the `bash` console):
95+
```bash
96+
$ cd src/Algorithms/orensemble
97+
$ tar zxf orensemble.tar.gz
98+
$ cd orensemble/
99+
$ make
100+
g++ -Ilemga-20060516/lemga -Wall -Wshadow -Wcast-qual -Wpointer-arith -Wconversion -Wredundant-decls -Wwrite-strings -Woverloaded-virtual -D NDEBUG -O3 -funroll-loops -c -o robject.o lemga-20060516/lemga/object.cpp
101+
...
102+
```
103+
Then, you should move the binary files to `..` folder and clean the folder (from the `bash` console):
104+
```bash
105+
$ mv boostrank-predict ../
106+
$ mv boostrank-train ../
107+
$ cd ..
108+
$ rm -Rf orensemble
109+
```

Diff for: doc/orca_parallel.md

+31
@@ -0,0 +1,31 @@
1+
# Parallelizing experiments with ORCA
2+
3+
ORCA can take advantage of MATLAB's parallel toolbox. The parallelism is done at dataset partition level. The method `runExperiments` of the class [Utilities](../src/Utilities.m) includes the following optional arguments, apart from the name of the configuration file:
4+
- `'parallel'`: *false* or *true* to activate parallel CPU processing of the dataset folds. Default value is *false*.
5+
- `'numcores'`: desired number of cores. By default, the maximum number of cores is used. If `'parallel'` is *true* and `numcores` is lower than 2, this parameter is set to the maximum number of cores.
6+
- `'closepool'`: whether or not to close the pool after the experiments. Default value is *true*. Disabling it can speed up consecutive calls to `runExperiments` by saving the time spent opening and closing pools.
7+
8+
The speed-up applies to model fitting and prediction. However, the reports have to be generated sequentially. Given that many metrics are computed in these reports, this non-parallelizable step is very costly.
9+
10+
In Octave, the `parfor` tool is not yet implemented. However, we have adapted the code to use the `parallel` package which provides similar functionality. If you want to parallelize experiments in Octave, you will have to install the corresponding package:
11+
```MATLAB
12+
pkg install -forge parallel
13+
```
14+
15+
These are some examples measuring the performance improvement:
16+
```MATLAB
17+
% Launch experiments sequentially
18+
tic;Utilities.runExperiments('tests/cvtests-30-holdout/kdlor.ini');toc
19+
...
20+
Elapsed time is 318.869864 seconds.
21+
22+
% Launch parallel experiments with maximum number of cores
23+
tic;Utilities.runExperiments('tests/cvtests-30-holdout/kdlor.ini', 'parallel', true);toc
24+
...
25+
Elapsed time is 190.453860 seconds.
26+
27+
% Run parallel folds with the maximum number of workers and do not close the pool
28+
Utilities.runExperiments('tests/cvtests-30-holdout/kdlor.ini', 'parallel', 1, 'closepool', false)
29+
Utilities.runExperiments('tests/cvtests-30-holdout/svorim.ini', 'parallel', 1, 'closepool', false)
30+
31+
```
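The speed-up implied by the two timings above can be worked out directly. This is a small sketch using `awk` for the division; the two values are the elapsed times reported above, and since the number of cores used is not stated, no per-core efficiency is derived.

```bash
# Elapsed times (seconds) taken from the sequential and parallel runs above.
sequential=318.869864
parallel=190.453860

# Speed-up = sequential time / parallel time, rounded to two decimals.
speedup=$(LC_ALL=C awk -v s="$sequential" -v p="$parallel" 'BEGIN { printf "%.2f", s / p }')
echo "speed-up: ${speedup}x"
```

The result, roughly 1.67x, is consistent with the note above that report generation remains sequential and limits the overall gain.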
