Why doesn't identify_collinear consider the statistical significance of the Pearson coefficient? #39

Open
EugeniaKoKo opened this issue Mar 11, 2020 · 0 comments

In the identify_collinear method I noticed that the p-value of the Pearson coefficient is not taken into account.
That is, features can be removed even when their correlation is not statistically significant.

This could be done simply by adding a p-value check for each identified correlation:

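from scipy import stats
# pearsonr returns (correlation coefficient, two-sided p-value)
pvalue = stats.pearsonr(data[feat1], data[feat2])[1]
if pvalue < 0.05:
    ...  # correlation is significant, so the feature remains a removal candidate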

The significance level could also be exposed as a parameter, so that a stricter value such as 0.01 can be used instead of 0.05.
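To make the proposal concrete, here is a minimal standalone sketch of the combined check. It is not the library's implementation: the function name `significant_collinear_pairs` and the parameters `correlation_threshold` and `alpha` are hypothetical names chosen for illustration, and the sample data is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats


def significant_collinear_pairs(data, correlation_threshold=0.9, alpha=0.05):
    """Hypothetical helper: list feature pairs whose Pearson correlation is
    both above the threshold in magnitude and significant at level alpha."""
    cols = list(data.columns)
    pairs = []
    for i, feat1 in enumerate(cols):
        for feat2 in cols[i + 1:]:
            # pearsonr returns the coefficient and its two-sided p-value
            r, pvalue = stats.pearsonr(data[feat1], data[feat2])
            if abs(r) > correlation_threshold and pvalue < alpha:
                pairs.append((feat1, feat2, r, pvalue))
    return pairs


# Synthetic data: "b" is a noisy copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({
    "a": x,
    "b": x + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
pairs = significant_collinear_pairs(data, correlation_threshold=0.9, alpha=0.01)

# Tiny sample: the coefficient is high, but three points are too few
# for the correlation to be significant at the 0.05 level.
small = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [1.0, 2.0, 2.5]})
small_pairs = significant_collinear_pairs(small)
```

The second data set illustrates why the check matters: with very few observations a large coefficient can arise by chance, so a threshold on the coefficient alone would flag the pair while the p-value check would not.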
