Possible bug: the random value generator used in svm_binary_svc_probability() function will not work well when training data size is large. #103

Open
yangguangfd opened this issue Sep 4, 2017 · 2 comments · May be fixed by #140

Comments

@yangguangfd

In the svm_binary_svc_probability() function, the training data is randomly shuffled before it is used in the 5-fold cross-validation. The shuffle is implemented by the following code:

for(i=0;i<prob->l;i++) perm[i]=i;
for(i=0;i<prob->l;i++)
{
    int j = i+rand()%(prob->l-i);
    swap(perm[i],perm[j]);
}

The C++ rand() function in this code returns a random number between 0 and RAND_MAX. RAND_MAX is only guaranteed to be at least 32767, and on some platforms it is exactly that (on my PC, Windows, x64-based processor, RAND_MAX is 32767). So if prob->l-i is larger than RAND_MAX, j can be at most i+RAND_MAX, which means the element that ends up at position i can only come from an original index no greater than i+RAND_MAX. I noticed that the training data passed to svm_binary_svc_probability() as svm_problem *prob has already been sorted by label (+1, -1 for binary classification), so the first part of prob->y[i] all has label +1. If the number of training samples with label +1 is well above RAND_MAX, then in the 5-fold cross-validation the first "predicting data set" will probably consist entirely of samples with label +1. This gives weird results when estimating probA and probB.
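To make the arithmetic above concrete, here is a minimal sketch that computes the largest swap partner j the loop can ever choose for a given position i. The function name max_swap_partner and its parameters are hypothetical and are not part of LIBSVM; rand_max stands in for the platform's RAND_MAX.

```cpp
#include <cassert>

// Hypothetical helper (not LIBSVM code): largest index j that
//   j = i + r % (n - i),  with 0 <= r <= rand_max,
// can produce for position i in an array of n elements.
long max_swap_partner(long i, long n, long rand_max) {
    assert(0 <= i && i < n);
    long span = n - i;  // number of candidates from position i onward
    // r % span can never exceed r itself, nor span - 1.
    long max_offset = (rand_max < span - 1) ? rand_max : span - 1;
    return i + max_offset;
}
```

With n = 100000 and rand_max = 32767, position 0 can only be swapped with indices up to 32767, so more than two thirds of the array is out of reach at that step. When n is at most rand_max + 1, the whole remaining range is reachable and the limitation does not appear.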

So I suggest using the random number generator from William H. Press et al., Numerical Recipes in C, which returns a random floating-point value between 0 and 1. Another question: why doesn't svm_binary_svc_probability() use a stratified shuffle, as svm_cross_validation() does?
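As one possible illustration of a shuffle whose range does not depend on RAND_MAX, the sketch below uses the C++11 <random> facilities (std::mt19937 and std::uniform_int_distribution) instead of the Numerical Recipes generator mentioned above. The function name shuffled_indices and the seed parameter are assumptions made for this example; this is not the code in LIBSVM or in any proposed fix.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not LIBSVM code): returns a shuffled permutation
// of 0..l-1 using the same swap loop, but drawing j from the full
// range [i, l-1] regardless of RAND_MAX.
std::vector<int> shuffled_indices(int l, unsigned seed) {
    assert(l >= 0);
    std::vector<int> perm(l);
    for (int i = 0; i < l; i++) perm[i] = i;

    std::mt19937 gen(seed);  // 32-bit Mersenne Twister engine
    for (int i = 0; i < l; i++) {
        // Uniform integer over the closed interval [i, l-1].
        std::uniform_int_distribution<int> pick(i, l - 1);
        int j = pick(gen);
        std::swap(perm[i], perm[j]);
    }
    return perm;
}
```

Calling shuffled_indices(prob->l, some_seed) would give a permutation in which any position can receive any original index, even when prob->l is far larger than 32767. It does not stratify by label, so it addresses only the range problem, not the stratification question.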

@cjlin1
Owner

cjlin1 commented Sep 4, 2017 via email

@smarie

smarie commented Mar 25, 2019

FYI, if you're still interested :) I submitted a fix in PR #140.
