Nested cross-validation #163

ziqianwang9 · 2020-02-20T15:56:18Z

Dear Lin,
thanks for providing this useful toolbox. I'm trying to use it to publish a paper, here I met some problem from the reviewer. He suggested me to use the nested cross-validation.
Here I list the script I used for my study:

clear all;
load median20190923.mat

%leave-one-out cross-validation
w = zeros(size(data_all));% weight
h = waitbar(0,'please wait..');

for i = 1:size(data_all,1)
    waitbar(i/size(data_all,1),h,[num2str(i),'/',num2str(size(data_all,1))])
    new_DATA = data_all;
    new_label  = label;
    test_data   = data_all(i,:); new_DATA(i,:) = []; train_data = new_DATA;
    test_label   = label(i,:);new_label(i,:) = [];train_label = new_label;
    
%  Data Normalization
    [train_data,PS] = mapminmax(train_data',0,1);
    test_data          = mapminmax('apply',test_data',PS);
    train_data = train_data';
    test_data   = test_data';
    
    % RFE feature selectioin
    step = 1;
    ftRank = SVMRFE(train_label,train_data, step,'-t 0');
    IX = ftRank(1:ceil(length(ftRank)*0.4));
    
    [bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1);
    cmd = ['-t 0 ', ' -c ',num2str(bestc),' -w1 2 -w-1 1'];
    
    model = svmtrain(train_label,train_data(:,IX),cmd);
    w(i,IX)   = model.SVs'*model.sv_coef; 
    [predicted_label, accuracy, deci] = svmpredict(test_label,test_data(:,IX),model);
    acc(i,1) = accuracy(1);
    deci_value(i,1) = deci;
%     clear  test_data  train_data test_label train_label model IX k
end
w_msk = double(sum(w~=0,1)==size(w,1));
w = mean(w,1).*w_msk;
acc_final = mean(acc);
disp(['accuracy - ',num2str(acc_final)]);

% ROC
[X,Y,T,AUC] = perfcurve(label,deci_value,1);
figure;plot(X,Y);hold on;plot(X,X,'-');
xlabel('False positive rate'); ylabel('True positive rate');

for i=1:length(X)
    Cut_off(i,1) = (1-X(i))*Y(i);
end
[~,maxind] = max(Cut_off);
Specificity = 1-X(maxind);
Sensitivty = Y(maxind);
disp(['Specificity= ', num2str(Specificity)]);
disp(['Sensitivty= ', num2str(Sensitivty)]);

fprintf('Permutation test ......\n');
Nsloop = 5000;
auc_rand = zeros(Nsloop,1);
for i=1:Nsloop
    label_rand = randperm(length(label));
    deci_value_rand = deci_value(label_rand);
    [~,~,~,auc_rand(i)] = perfcurve(label,deci_value_rand,1);
    clear label_rand
end
p_auc = (length(find((auc_rand > AUC)))+1)/(Nsloop+1);
disp(['Pvalue= ', num2str(p_auc)]);

Here, what I used is leave-one-out cross-valitaion. But the reviewer suggest me to use the neseted cross-valitaion(e.g. Varoquaux et al., Neuroimage, 2017) and K-fold.
Since I am not familiar with nested cross-validation. Is it any possible we perform it based on your libsvm? If it is, could you please give me some clue how to achieve this?

Best,
Ziqian

The text was updated successfully, but these errors were encountered:

cjlin1 · 2020-02-20T21:30:51Z

To implement CV in matlab what you need to do are - randomly permute data by randperm() - use a for loop to get each validation fold num_per_fold = ceil(num_data/num_fold); for i = 1 : num_fold range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold); - then use this "range" to extract the validation fold. The training fold can be get by a similar way - then do training/prediction, and aggregate results to get CV acuracy - for nested CV I think you mean 2-level CV. You can use a 2-level for loop on that

…

On 2020-02-20 23:56, ziqianwang9 wrote: Dear Lin, thanks for providing this useful toolbox. I'm trying to use it to publish a paper, here I met some problem from the reviewer. He suggested me to use the nested cross-validation. Here I list the script I used for my study: clear all; load median20190923.mat %leave-one-out cross-validation w = zeros(size(data_all));% weight h = waitbar(0,'please wait..'); for i = 1:size(data_all,1) waitbar(i/size(data_all,1),h,[num2str(i),'/',num2str(size(data_all,1))]) new_DATA = data_all; new_label = label; test_data = data_all(i,:); new_DATA(i,:) = []; train_data = new_DATA; test_label = label(i,:);new_label(i,:) = [];train_label = new_label; % Data Normalization [train_data,PS] = mapminmax(train_data',0,1); test_data = mapminmax('apply',test_data',PS); train_data = train_data'; test_data = test_data'; % RFE feature selectioin step = 1; ftRank = SVMRFE(train_label,train_data, step,'-t 0'); IX = ftRank(1:ceil(length(ftRank)*0.4)); [bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1); cmd = ['-t 0 ', ' -c ',num2str(bestc),' -w1 2 -w-1 1']; model = svmtrain(train_label,train_data(:,IX),cmd); w(i,IX) = model.SVs'*model.sv_coef; [predicted_label, accuracy, deci] = svmpredict(test_label,test_data(:,IX),model); acc(i,1) = accuracy(1); deci_value(i,1) = deci; % clear test_data train_data test_label train_label model IX k end w_msk = double(sum(w~=0,1)==size(w,1)); w = mean(w,1).*w_msk; acc_final = mean(acc); disp(['accuracy - ',num2str(acc_final)]); % ROC [X,Y,T,AUC] = perfcurve(label,deci_value,1); figure;plot(X,Y);hold on;plot(X,X,'-'); xlabel('False positive rate'); ylabel('True positive rate'); for i=1:length(X) Cut_off(i,1) = (1-X(i))*Y(i); end [~,maxind] = max(Cut_off); Specificity = 1-X(maxind); Sensitivty = Y(maxind); disp(['Specificity= ', num2str(Specificity)]); disp(['Sensitivty= ', num2str(Sensitivty)]); fprintf('Permutation test ......\n'); Nsloop = 5000; auc_rand = zeros(Nsloop,1); for i=1:Nsloop label_rand = randperm(length(label)); deci_value_rand = deci_value(label_rand); [~,~,~,auc_rand(i)] = perfcurve(label,deci_value_rand,1); clear label_rand end p_auc = (length(find((auc_rand > AUC)))+1)/(Nsloop+1); disp(['Pvalue= ', num2str(p_auc)]); Here, what I used is leave-one-out cross-valitaion. But the reviewer suggest me to use the neseted cross-valitaion(e.g. Varoquaux et al., Neuroimage, 2017) and K-fold. Since I am not familiar with nested cross-validation. Is it any possible we perform it based on your libsvm? If it is, could you please give me some clue how to achieve this? Best, Ziqian -- You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub [1], or unsubscribe [2]. [ { ***@***.***": "http://schema.org", ***@***.***": "EmailMessage", "potentialAction": { ***@***.***": "ViewAction", "target": "#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A", "url": "#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { ***@***.***": "Organization", "name": "GitHub", "url": "https://github.com" } } ] Links: ------ [1] #163?email_source=notifications&email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A [2] https://github.com/notifications/unsubscribe-auth/ABI3BHRYMTNKNGKRC3P4DTTRD2R2ZANCNFSM4KYRVGXQ

ziqianwang9 · 2020-02-28T10:19:38Z

Thank your for your reply. As the knowledge I have, the nested is not 2-level CV. This figure could illustrate what is nested CV: The nested CV has an inner loop CV nested in an outer CV. The inner loop is responsible for model selection/hyperparameter tuning (similar to validation set), while the outer loop is for error estimation (test set). My question is how is our '[bestacc,bestc] =SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1)’ working on hyperparameter tuning? Do we use the similar method? If not, can we combine it with SVMcgForClass_NoDisplay_linear? Any response will be helpful. Best, Ziqian

…

在 2020年2月20日，下午10:30，Chih-Jen Lin ***@***.***> 写道： To implement CV in matlab what you need to do are - randomly permute data by randperm() - use a for loop to get each validation fold num_per_fold = ceil(num_data/num_fold); for i = 1 : num_fold range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold); - then use this "range" to extract the validation fold. The training fold can be get by a similar way - then do training/prediction, and aggregate results to get CV acuracy - for nested CV I think you mean 2-level CV. You can use a 2-level for loop on that On 2020-02-20 23:56, ziqianwang9 wrote: > Dear Lin, > thanks for providing this useful toolbox. I'm trying to use it to > publish a paper, here I met some problem from the reviewer. He > suggested me to use the nested cross-validation. > Here I list the script I used for my study: > > clear all; > load median20190923.mat > > %leave-one-out cross-validation > w = zeros(size(data_all));% weight > h = waitbar(0,'please wait..'); > > for i = 1:size(data_all,1) > > waitbar(i/size(data_all,1),h,[num2str(i),'/',num2str(size(data_all,1))]) > new_DATA = data_all; > new_label = label; > test_data = data_all(i,:); new_DATA(i,:) = []; train_data = > new_DATA; > test_label = label(i,:);new_label(i,:) = [];train_label = > new_label; > > % Data Normalization > [train_data,PS] = mapminmax(train_data',0,1); > test_data = mapminmax('apply',test_data',PS); > train_data = train_data'; > test_data = test_data'; > > % RFE feature selectioin > step = 1; > ftRank = SVMRFE(train_label,train_data, step,'-t 0'); > IX = ftRank(1:ceil(length(ftRank)*0.4)); > > [bestacc,bestc] = > SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1); > cmd = ['-t 0 ', ' -c ',num2str(bestc),' -w1 2 -w-1 1']; > > model = svmtrain(train_label,train_data(:,IX),cmd); > w(i,IX) = model.SVs'*model.sv_coef; > [predicted_label, accuracy, deci] = > svmpredict(test_label,test_data(:,IX),model); > acc(i,1) = accuracy(1); > deci_value(i,1) = deci; > % clear test_data train_data test_label train_label model IX k > end > w_msk = double(sum(w~=0,1)==size(w,1)); > w = mean(w,1).*w_msk; > acc_final = mean(acc); > disp(['accuracy - ',num2str(acc_final)]); > > % ROC > [X,Y,T,AUC] = perfcurve(label,deci_value,1); > figure;plot(X,Y);hold on;plot(X,X,'-'); > xlabel('False positive rate'); ylabel('True positive rate'); > > for i=1:length(X) > Cut_off(i,1) = (1-X(i))*Y(i); > end > [~,maxind] = max(Cut_off); > Specificity = 1-X(maxind); > Sensitivty = Y(maxind); > disp(['Specificity= ', num2str(Specificity)]); > disp(['Sensitivty= ', num2str(Sensitivty)]); > > fprintf('Permutation test ......\n'); > Nsloop = 5000; > auc_rand = zeros(Nsloop,1); > for i=1:Nsloop > label_rand = randperm(length(label)); > deci_value_rand = deci_value(label_rand); > [~,~,~,auc_rand(i)] = perfcurve(label,deci_value_rand,1); > clear label_rand > end > p_auc = (length(find((auc_rand > AUC)))+1)/(Nsloop+1); > disp(['Pvalue= ', num2str(p_auc)]); > > Here, what I used is leave-one-out cross-valitaion. But the reviewer > suggest me to use the neseted cross-valitaion(e.g. Varoquaux et al., > Neuroimage, 2017) and K-fold. > Since I am not familiar with nested cross-validation. Is it any > possible we perform it based on your libsvm? If it is, could you > please give me some clue how to achieve this? > > Best, > Ziqian > > -- > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub [1], or unsubscribe > [2]. [ { ***@***.***": "http://schema.org", ***@***.***": "EmailMessage", > "potentialAction": { ***@***.***": "ViewAction", "target": > "#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A", > "url": > "#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A", > "name": "View Issue" }, "description": "View this Issue on GitHub", > "publisher": { ***@***.***": "Organization", "name": "GitHub", "url": > "https://github.com" } } ] > > Links: > ------ > [1] > #163?email_source=notifications&email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A > [2] > https://github.com/notifications/unsubscribe-auth/ABI3BHRYMTNKNGKRC3P4DTTRD2R2ZANCNFSM4KYRVGXQ — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#163?email_source=notifications&email_token=AH4SOUKYJP2I4QT47KCVPEDRD3ZAZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMQG5YI#issuecomment-589328097>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AH4SOUIRJU2YNGYBVC5OCHTRD3ZAZANCNFSM4KYRVGXQ>.

ziqianwang9 · 2020-03-02T11:34:32Z

Dear Lin, I found that this nested VC add grid search in every loop of inner loop. If it’s 5-fold, it calculate 5 best-c, then calculate arithmetic mean/geometric mean or power mean. Here is also a description in Chinese: 这个思想有两个循环（loop）：（1）外循环就是普通的cross validation （2）内循环相当于是一个子优化问题，通过grid search寻常当前子问题中模型对应的最优参数。grid search就相当于是遍历有限的空间点（每一个点对应于一组参数），每一组参数对应一个模型的performance，然后选取performance最好的模型。 cross validation用了几个fold最后就有几组模型参数，如果你的模型是stable的，那么这几组参数应该类似。 I don’t know if this is the state of art. But it should be a good way to solve the problem of information ‘leaking’. Could we manage to implement to your wonderful libsvm toolbox? Best, Ziqian

…

在 2020年2月28日，上午11:19，王子谦 ***@***.***> 写道： Thank your for your reply. As the knowledge I have, the nested is not 2-level CV. This figure could illustrate what is nested CV: <F1QgU.png> The nested CV has an inner loop CV nested in an outer CV. The inner loop is responsible for model selection/hyperparameter tuning (similar to validation set), while the outer loop is for error estimation (test set). My question is how is our '[bestacc,bestc] =SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1)’ working on hyperparameter tuning? Do we use the similar method? If not, can we combine it with SVMcgForClass_NoDisplay_linear? Any response will be helpful. Best, Ziqian > 在 2020年2月20日，下午10:30，Chih-Jen Lin ***@***.*** ***@***.***>> 写道： > > To implement CV in matlab what you need to do are > > - randomly permute data by randperm() > > - use a for loop to get each validation fold > > num_per_fold = ceil(num_data/num_fold); > for i = 1 : num_fold > range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold); > > - then use this "range" to extract the validation fold. The training > fold can be get by a similar way > > - then do training/prediction, and aggregate results to get CV acuracy > > - for nested CV I think you mean 2-level CV. You can use a 2-level for > loop on that > > > On 2020-02-20 23:56, ziqianwang9 wrote: > > Dear Lin, > > thanks for providing this useful toolbox. I'm trying to use it to > > publish a paper, here I met some problem from the reviewer. He > > suggested me to use the nested cross-validation. > > Here I list the script I used for my study: > > > > clear all; > > load median20190923.mat > > > > %leave-one-out cross-validation > > w = zeros(size(data_all));% weight > > h = waitbar(0,'please wait..'); > > > > for i = 1:size(data_all,1) > > > > waitbar(i/size(data_all,1),h,[num2str(i),'/',num2str(size(data_all,1))]) > > new_DATA = data_all; > > new_label = label; > > test_data = data_all(i,:); new_DATA(i,:) = []; train_data = > > new_DATA; > > test_label = label(i,:);new_label(i,:) = [];train_label = > > new_label; > > > > % Data Normalization > > [train_data,PS] = mapminmax(train_data',0,1); > > test_data = mapminmax('apply',test_data',PS); > > train_data = train_data'; > > test_data = test_data'; > > > > % RFE feature selectioin > > step = 1; > > ftRank = SVMRFE(train_label,train_data, step,'-t 0'); > > IX = ftRank(1:ceil(length(ftRank)*0.4)); > > > > [bestacc,bestc] = > > SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1); > > cmd = ['-t 0 ', ' -c ',num2str(bestc),' -w1 2 -w-1 1']; > > > > model = svmtrain(train_label,train_data(:,IX),cmd); > > w(i,IX) = model.SVs'*model.sv_coef; > > [predicted_label, accuracy, deci] = > > svmpredict(test_label,test_data(:,IX),model); > > acc(i,1) = accuracy(1); > > deci_value(i,1) = deci; > > % clear test_data train_data test_label train_label model IX k > > end > > w_msk = double(sum(w~=0,1)==size(w,1)); > > w = mean(w,1).*w_msk; > > acc_final = mean(acc); > > disp(['accuracy - ',num2str(acc_final)]); > > > > % ROC > > [X,Y,T,AUC] = perfcurve(label,deci_value,1); > > figure;plot(X,Y);hold on;plot(X,X,'-'); > > xlabel('False positive rate'); ylabel('True positive rate'); > > > > for i=1:length(X) > > Cut_off(i,1) = (1-X(i))*Y(i); > > end > > [~,maxind] = max(Cut_off); > > Specificity = 1-X(maxind); > > Sensitivty = Y(maxind); > > disp(['Specificity= ', num2str(Specificity)]); > > disp(['Sensitivty= ', num2str(Sensitivty)]); > > > > fprintf('Permutation test ......\n'); > > Nsloop = 5000; > > auc_rand = zeros(Nsloop,1); > > for i=1:Nsloop > > label_rand = randperm(length(label)); > > deci_value_rand = deci_value(label_rand); > > [~,~,~,auc_rand(i)] = perfcurve(label,deci_value_rand,1); > > clear label_rand > > end > > p_auc = (length(find((auc_rand > AUC)))+1)/(Nsloop+1); > > disp(['Pvalue= ', num2str(p_auc)]); > > > > Here, what I used is leave-one-out cross-valitaion. But the reviewer > > suggest me to use the neseted cross-valitaion(e.g. Varoquaux et al., > > Neuroimage, 2017) and K-fold. > > Since I am not familiar with nested cross-validation. Is it any > > possible we perform it based on your libsvm? If it is, could you > > please give me some clue how to achieve this? > > > > Best, > > Ziqian > > > > -- > > You are receiving this because you are subscribed to this thread. > > Reply to this email directly, view it on GitHub [1], or unsubscribe > > [2]. [ { ***@***.***": "http://schema.org <http://schema.org/>", ***@***.***": "EmailMessage", > > "potentialAction": { ***@***.***": "ViewAction", "target": > > "#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A <#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A>", > > "url": > > "#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A <#163?email_source=notifications\u0026email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A>", > > "name": "View Issue" }, "description": "View this Issue on GitHub", > > "publisher": { ***@***.***": "Organization", "name": "GitHub", "url": > > "https://github.com <https://github.com/>" } } ] > > > > Links: > > ------ > > [1] > > #163?email_source=notifications&email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A <#163?email_source=notifications&email_token=ABI3BHV62VSU7IEJTR5GH23RD2R2ZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IPBCY3A> > > [2] > > https://github.com/notifications/unsubscribe-auth/ABI3BHRYMTNKNGKRC3P4DTTRD2R2ZANCNFSM4KYRVGXQ <https://github.com/notifications/unsubscribe-auth/ABI3BHRYMTNKNGKRC3P4DTTRD2R2ZANCNFSM4KYRVGXQ> > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub <#163?email_source=notifications&email_token=AH4SOUKYJP2I4QT47KCVPEDRD3ZAZA5CNFSM4KYRVGX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMQG5YI#issuecomment-589328097>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AH4SOUIRJU2YNGYBVC5OCHTRD3ZAZANCNFSM4KYRVGXQ>. >

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nested cross-validation #163

Nested cross-validation #163

ziqianwang9 commented Feb 20, 2020

cjlin1 commented Feb 20, 2020 via email

ziqianwang9 commented Feb 28, 2020 via email

ziqianwang9 commented Mar 2, 2020 via email

Nested cross-validation #163

Nested cross-validation #163

Comments

ziqianwang9 commented Feb 20, 2020

cjlin1 commented Feb 20, 2020 via email

ziqianwang9 commented Feb 28, 2020 via email

ziqianwang9 commented Mar 2, 2020 via email