Restrict clipping of DataFrame.corr only when cov=False #61214
@mroeschke here is my pull request for fixing the
Thanks! Would you be able to add a new unit test that covers this? It seems like we didn't have one that hit this edge case previously. I think no release note is necessary, since the original one made clear it's only for
val1 = df.cov()
val2 = df.dropna().cov()
- Could you call this result and expected?
- For expected, could you construct the result without using cov, i.e. DataFrame({"A": ..., "B": ...})
def test_cov_with_missing_values(self):
    df = DataFrame({"A": [1, 2, None, 4], "B": [2, 4, None, 9]})
    expected = DataFrame({"A": [1.0, 1.0], "B": [1.0, 1.0]})
Looks like the expected dataframe needs index=["A", "B"]: https://github.com/pandas-dev/pandas/actions/runs/14250783260/job/39942795933?pr=61214#step:5:45
And can you confirm that 1 is the expected value? If the 2.2.3 behavior is correct, then the values in #61154 (comment) were different:
Out[68]:
A B
A 2.333333 5.5
B 5.500000 13.0
I don't know whether it matters, but it might be worth testing both df.cov() and df.dropna().cov()
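As a sanity check on the values quoted above, here is a minimal sketch that recomputes the sample covariance by hand for the columns in the test, without calling cov. The helper name sample_cov is hypothetical and not part of pandas. The sketch assumes the unbiased estimator (dividing by n - 1) and drops the row where both values are missing.

```python
def sample_cov(x, y):
    """Unbiased sample covariance (divides by n - 1) of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Columns from the test under review; None marks the missing row.
a = [1, 2, None, 4]
b = [2, 4, None, 9]

# Keep only rows where both columns are observed.
rows = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
a_obs = [p for p, _ in rows]
b_obs = [q for _, q in rows]

cov_aa = sample_cov(a_obs, a_obs)  # variance of A
cov_ab = sample_cov(a_obs, b_obs)  # covariance of A and B
cov_bb = sample_cov(b_obs, b_obs)  # variance of B
print(cov_aa, cov_ab, cov_bb)
```

The three results come out to roughly 2.333, 5.5 and 13.0, matching the matrix quoted above rather than 1. Because the missing values sit in the same row of both columns, dropping that row first should not change the result in this particular case.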
Fixing now. Thanks!
Thanks @j-hendricks
Closes #61154
DataFrame.corr was clipped between -1 and 1 to handle numerical precision errors. However, this was done regardless of whether cov equals True or False, and should instead only be done when cov=False.
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
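The clipping rule described in the PR summary can be sketched in plain Python. This is an illustration only, not the pandas implementation: the function name pairwise_stat and its signature are hypothetical. The one thing it mirrors is the idea of a single routine that returns a covariance when cov=True and a correlation clipped to [-1, 1] when cov=False.

```python
import math


def pairwise_stat(x, y, cov=False):
    """Hypothetical helper: sample covariance if cov=True, else Pearson correlation.

    Only the correlation is clipped to [-1, 1]; a covariance is unbounded
    and must be returned as computed. Zero-variance inputs are not handled.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if cov:
        # Covariance branch: no clipping.
        return sxy / (n - 1)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Correlation branch: guard against rounding pushing r just outside [-1, 1].
    return max(-1.0, min(1.0, r))


a = [1, 2, 4]
b = [2, 4, 9]
cov_ab = pairwise_stat(a, b, cov=True)
corr_ab = pairwise_stat(a, b)
print(cov_ab, corr_ab)
```

With these inputs the covariance is about 5.5, which an unconditional clip would wrongly reduce to 1, while the correlation stays within [-1, 1].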