Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict clipping of DataFrame.corr only when cov=False #61214

Merged
merged 17 commits into from
Apr 3, 2025

Conversation

j-hendricks
Copy link
Contributor

Closes #61154 DataFrame.corr was clipped between -1 and 1 to handle numerical precision errors. However, this was done regardless of whether cov equals True or False, and should instead only be done when cov=False.

@j-hendricks j-hendricks requested a review from WillAyd as a code owner April 2, 2025 02:37
@j-hendricks
Copy link
Contributor Author

@mroeschke here is my pull request for fixing the dataframe.corr issue. Thanks!

@TomAugspurger
Copy link
Contributor

Thanks! Would you be able to add new unit test that covers this? It seems like we didn't have one that hit this edge case previously.

I think no release note is necessary, since the original one made clear it's only for corr.

@mroeschke mroeschke added this to the 3.0 milestone Apr 2, 2025
Comment on lines 503 to 504
val1 = df.cov()
val2 = df.dropna().cov()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could you call this result and expected?
  2. For expected, could you construct the result without using cov i.e. DataFrame({"A": ..., "B": ...})


def test_cov_with_missing_values(self):
df = DataFrame({"A": [1, 2, None, 4], "B": [2, 4, None, 9]})
expected = DataFrame({"A": [1.0, 1.0], "B": [1.0, 1.0]})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the the expected dataframe needs index=["A", "B"]: https://github.com/pandas-dev/pandas/actions/runs/14250783260/job/39942795933?pr=61214#step:5:45

And can you confirm that 1 is the expected value? If the 2.2.3 behavior is correct, then the values in #61154 (comment) were different:

Out[68]:
          A     B
A  2.333333   5.5
B  5.500000  13.0

I don't know whether it matters, but it might be worth testing both df.cov() and df.dropna().cov()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing now. Thanks!

@mroeschke mroeschke merged commit 5736b96 into pandas-dev:main Apr 3, 2025
42 checks passed
@mroeschke
Copy link
Member

Thanks @j-hendricks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants