0.632 Bootstrap #828
Replies: 5 comments
-
I forgot to attach the code:
-
Thanks so much for this detailed note and experiments. Based on your experiments (and of course to adhere to the original definition), it makes absolute sense to update the current implementation. It's been a busy summer so far (which is also why my answer came with such a big delay), but I am planning to get to it in the upcoming weeks.
-
Dear Sebastian,
Thank you for your e-mail. I can imagine things are hectic, no worries. If I can be of assistance with the update, please let me know.
Also, I would like to ask whether you would like me to contribute a bolstered resubstitution estimator module to mlxtend. I could work with one of my students to make a pull request. We don't have a popular implementation of it yet (just some C code I distribute with the paper), and of course mlxtend is very popular.
Best Regards
Ulisses.
--
Ulisses M. Braga-Neto, Ph.D.
Professor of Electrical and Computer Engineering
Director of the TAMIDS Scientific Machine Learning Lab
Texas A&M University
https://sciml.tamids.tamu.edu/
https://braganeto.engr.tamu.edu/
"Only the educated are free." -- Epictetus
-
Finally got some time to address the issue, and it is merged into the main branch now via #844. I will also make a new release later today. Thanks again for pointing out the issue with internal training set use. Regarding the bolstered resubstitution estimator, this sounds like a useful method, and I would welcome a PR if it is feasible. I realize though that converting the C code into a form that follows the mlxtend/scikit-learn API might involve some extra work. But in case you decide to make a PR, I am happy to give feedback and help with the process.
-
Thanks Sebastian! I will ask my student to look into submitting a PR.
Best Regards
Ulisses.
-
First of all, congratulations to Sebastian on leading this important initiative. I came to learn about it when I was searching for a Python implementation of the 0.632 bootstrap classification error estimator, which is provided by mlxtend.
The purpose of this post is to discuss the proper definition of the 0.632 bootstrap. In Efron's original 1983 paper, the "zero bootstrap" error estimator is defined as

$$\hat{\epsilon}_0 \;=\; \frac{\sum_{b=1}^{B}\sum_{i=1}^{n} Q(x_i, \mathbf{x}^{*b})\, I\!\left[P_i^{*b} = 0\right]}{\sum_{b=1}^{B}\sum_{i=1}^{n} I\!\left[P_i^{*b} = 0\right]},$$

where Q(x_i, x^{*b}) is 1 if the class prediction for x_i made by the classifier trained on the bootstrap sample x^{*b} disagrees with the label of x_i, and P_i^{*b} = 0 indicates that x_i does not appear in the bootstrap sample x^{*b}. Then the 0.632 bootstrap estimator is defined by

$$\hat{\epsilon}_{.632} \;=\; 0.368\,\overline{\mathrm{err}} \;+\; 0.632\,\hat{\epsilon}_0,$$

where $\overline{\mathrm{err}}$ is the resubstitution error (i.e., the training error) made by the classifier designed on the entire original training data.
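To make the definition above concrete, here is a minimal sketch of the "external" 0.632 bootstrap exactly as written, with the resubstitution error computed once on the full training set and only the out-of-bag points (those with P_i^{*b} = 0) entering the zero bootstrap. This is not the mlxtend implementation; the function name `external_632_bootstrap` and its parameters are made up for illustration, and it assumes a scikit-learn-style classifier:

```python
import numpy as np
from sklearn.base import clone

def external_632_bootstrap(estimator, X, y, n_bootstrap=200, random_state=0):
    """Illustrative sketch of Efron's 0.632 bootstrap (not mlxtend's API)."""
    rng = np.random.RandomState(random_state)
    n = X.shape[0]

    # Resubstitution (apparent) error: classifier trained on the entire training data.
    est_full = clone(estimator).fit(X, y)
    err_resub = np.mean(est_full.predict(X) != y)

    # Zero bootstrap: pool errors over all bootstrap samples, counting only
    # the points left out of each bootstrap sample (P_i^{*b} = 0).
    n_wrong, n_oob = 0, 0
    for _ in range(n_bootstrap):
        idx = rng.randint(0, n, size=n)        # bootstrap sample x^{*b}
        oob = np.setdiff1d(np.arange(n), idx)  # indices with P_i^{*b} = 0
        if oob.size == 0:
            continue
        est_b = clone(estimator).fit(X[idx], y[idx])
        n_wrong += np.sum(est_b.predict(X[oob]) != y[oob])
        n_oob += oob.size
    eps0 = n_wrong / n_oob

    # 0.632 combination of resubstitution and zero bootstrap errors.
    return 0.368 * err_resub + 0.632 * eps0
```

Called as, e.g., `external_632_bootstrap(SVC(kernel="linear"), X, y)`, this returns an error estimate (not an accuracy score), matching the definitions above.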
In mlxtend, `bootstrap_point632_score()` computes equation (6.12) internally to each bootstrap sample, i.e., the training error is computed with respect to the bootstrap sample, not the entire training data.

I ran a comparison experiment with Gaussian synthetic data, in which I computed the RMS (which combines bias and variance; see the definitions below) at different sample sizes for the "external" (Efron's definition) and the "internal" (mlxtend's implementation) 0.632 bootstrap error estimators, along with the plain zero bootstrap, the plain resubstitution error on the original training data, 2-fold cross-validation (which Efron claims in the 1983 paper would be essentially the same as the zero bootstrap; denoted "hcv" here), and bolstered resubstitution (more on this one later). I used a linear SVM classification rule and 5-nearest neighbors, to see the behavior with linear and nonlinear decision boundaries.

If e and ê denote the true error (estimated here with a large independent test sample of size M = 400) and the estimated error, respectively, the definitions are

$$\mathrm{Bias} = E[\hat{e} - e], \qquad \mathrm{Var} = \mathrm{Var}(\hat{e} - e), \qquad \mathrm{RMS} = \sqrt{E\!\left[(\hat{e} - e)^2\right]},$$

so that RMS² = Bias² + Var. The expectations were approximated by the usual sample estimator using a large number N of independent training data sets at each sample size (here, N = 500). A small RMS indicates a superior compromise between bias and variance. I also display an estimate of the deviation distribution, which is the probability density of ê − e, obtained by fitting a beta density to the vector of N differences ê − e. The deviation distribution should be centered around zero, for low bias, and tall and thin, for low variance.
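As a rough illustration of the simulation protocol, the sketch below approximates the bias, deviation variance, and RMS of a given error estimator; the two shifted Gaussian classes and all parameter values are placeholders rather than the exact settings of my experiment, and `simulate_rms` is a made-up name. Any callable with the signature `(estimator, X, y) -> error estimate` can be plugged in, e.g., the `external_632_bootstrap` sketch above, or a small wrapper around mlxtend's `bootstrap_point632_score` (which, if I recall correctly, returns accuracy scores, so one would convert them to errors first).

```python
import numpy as np
from sklearn.svm import SVC

def simulate_rms(error_estimator, n_train=40, n_test=400, n_repeats=500,
                 dim=2, delta=1.0, seed=0):
    """Approximate Bias, Var, and RMS of an error estimator by simulation."""
    rng = np.random.RandomState(seed)

    def sample(n):
        # Two equally likely Gaussian classes with means 0 and delta (placeholder model).
        labels = rng.randint(0, 2, size=n)
        points = rng.randn(n, dim) + delta * labels[:, None]
        return points, labels

    deviations = []
    for _ in range(n_repeats):
        X_tr, y_tr = sample(n_train)
        X_te, y_te = sample(n_test)

        clf = SVC(kernel="linear").fit(X_tr, y_tr)
        true_err = np.mean(clf.predict(X_te) != y_te)   # "true" error via large test set
        est_err = error_estimator(SVC(kernel="linear"), X_tr, y_tr)
        deviations.append(est_err - true_err)

    d = np.asarray(deviations)
    return d.mean(), d.var(), np.sqrt(np.mean(d ** 2))  # Bias, Var, RMS
```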
These results show that the internal and external bootstrap error estimators are comparable in performance for the 5NN classification rule, but the external one performs better for the linear SVM: it is less biased and less variable. Also, for both classification rules, the internal bootstrap seems to be essentially equal, on average, to the plain zero bootstrap, and hence positively biased, as Efron pointed out in the 1983 paper. I have not tried to work out why they agree in detail, but it does not seem to be a coincidence, since I have seen the same behavior in other experiments I ran with other classification rules. Contrary to what Efron claims in his paper, 2-fold CV is not similar to the zero bootstrap: it has low bias but a very large variance, as is typical of all cross-validation estimators.
The bolstered resubstitution estimator does not strictly belong to the current discussion, but I have added it for context. It is a modified form of resubstitution that has smaller bias and variance. Like resubstitution, it does not require training any additional classifiers and is thus much faster to compute than all cross-validation and bootstrap error estimators. We can see that it did not work well in the 5NN case, essentially because resubstitution is too biased there (even the external bootstrap suffers from this), but it has the best performance in the linear SVM case, despite being much simpler to compute. Empirically, we have consistently observed that bolstered resubstitution tends to perform well with linear or piecewise-linear decision boundaries (the latter include CART decision trees and ReLU neural networks). The original bolstered resubstitution paper is: https://www.sciencedirect.com/science/article/abs/pii/S0031320303003327
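To make the idea concrete, here is a minimal Monte Carlo sketch of bolstered resubstitution under simplifying assumptions: spherical Gaussian bolstering kernels with a single user-chosen width `sigma`, and the misclassified kernel mass estimated by sampling. In the paper, the kernel widths are instead estimated from the data (roughly, from nearest-neighbor distances within each class), and for linear classifiers the Gaussian integrals can be computed exactly, so `bolstered_resub_mc` and its parameters are illustrative only, not the estimator we distribute with the paper.

```python
import numpy as np

def bolstered_resub_mc(fitted_clf, X, y, sigma=0.5, n_mc=100, seed=0):
    """Monte Carlo sketch of bolstered resubstitution (illustrative only)."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    errors = np.empty(n)
    for i in range(n):
        # Spread x_i with a spherical Gaussian bolstering kernel and measure
        # the fraction of the kernel mass that the classifier mislabels.
        samples = X[i] + sigma * rng.randn(n_mc, d)
        errors[i] = np.mean(fitted_clf.predict(samples) != y[i])
    return errors.mean()
```

Note that the classifier is fitted once on the training data and is never retrained, which is what makes the estimator essentially as cheap as plain resubstitution.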