Why doesn't identify_collinear consider the statistical significance of the Pearson coefficient? #39

EugeniaKoKo · 2020-03-11T15:01:21Z

In the identify_collinear method, I noticed that the p-value of the Pearson coefficient is not taken into account.
As a result, features can be removed on the basis of a correlation that is not statistically significant.

This could be fixed simply by adding a p-value check for each identified correlation:

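from scipy import stats

# pearsonr returns the correlation coefficient and its two-sided p-value
r, pvalue = stats.pearsonr(data[feat1], data[feat2])
if pvalue < 0.05:
    ...  # treat the pair as collinear only when the correlation is significant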

The significance threshold could also be exposed as a parameter, so that 0.01 can be used instead of 0.05.
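
To make the suggestion concrete, here is a minimal sketch of what such a check might look like as a standalone helper. It is not the library's implementation: the function name `significant_collinear_pairs` and the parameters `correlation_threshold` and `alpha` are hypothetical and chosen only for illustration. The only external call it relies on is `scipy.stats.pearsonr`, and `data` is assumed to be a pandas DataFrame with numeric columns.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def significant_collinear_pairs(data, correlation_threshold=0.9, alpha=0.05):
    """Return (feat1, feat2, r, pvalue) for every pair of columns whose
    absolute Pearson correlation exceeds correlation_threshold AND whose
    p-value is below alpha. Hypothetical helper, not part of the library."""
    pairs = []
    for feat1, feat2 in combinations(data.columns, 2):
        r, pvalue = stats.pearsonr(data[feat1], data[feat2])
        if abs(r) > correlation_threshold and pvalue < alpha:
            pairs.append((feat1, feat2, r, pvalue))
    return pairs


# Larger sample: "b" is almost a copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({
    "a": x,
    "b": x + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
pairs = significant_collinear_pairs(data)

# Tiny sample: the correlation is high, but three points are too few
# for it to be significant at alpha=0.05.
tiny = pd.DataFrame({"u": [1.0, 2.0, 3.0], "v": [1.0, 2.0, 4.0]})
tiny_pairs = significant_collinear_pairs(tiny)
```

With many rows, a strong correlation is almost always significant, so the extra check matters mainly for small datasets such as the three-row frame above.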
