[SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null #47105

Open: wants to merge 2 commits into base: master

wayneguow (Contributor)

What changes were proposed in this pull request?

This PR fixes a calculation bug in `RegrSlope` and `RegrIntercept` that occurs when the first parameter is null. A (y, x) tuple should be filtered out of the calculation whenever either the first parameter (y) or the second parameter (x) is null.
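As a minimal sketch of the intended semantics (not Spark's actual `RegrSlope`/`RegrIntercept` implementation), the Python below computes a population-based regression slope and intercept over (y, x) pairs and drops any pair in which either value is null. The names `regr_slope`, `regr_intercept`, `_complete_pairs`, and `_means` are hypothetical, `None` stands in for SQL NULL, and returning `None` when x has zero variance is an assumption of this sketch.

```python
from typing import List, Optional, Sequence, Tuple

# Each pair is (y, x); None stands in for SQL NULL.
Pair = Tuple[Optional[float], Optional[float]]


def _complete_pairs(pairs: Sequence[Pair]) -> List[Tuple[float, float]]:
    # Drop the whole pair if EITHER y or x is null (the rule this PR applies when y is null).
    return [(y, x) for y, x in pairs if y is not None and x is not None]


def _means(complete: List[Tuple[float, float]]) -> Tuple[float, float]:
    # Averages of y and x over the same filtered pairs.
    n = len(complete)
    mean_y = sum(y for y, _ in complete) / n
    mean_x = sum(x for _, x in complete) / n
    return mean_y, mean_x


def regr_slope(pairs: Sequence[Pair]) -> Optional[float]:
    complete = _complete_pairs(pairs)
    if not complete:
        return None  # no fully non-null pairs -> NULL
    mean_y, mean_x = _means(complete)
    sxx = sum((x - mean_x) ** 2 for _, x in complete)
    if sxx == 0:
        return None  # slope undefined when x has zero variance (assumption of this sketch)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in complete)
    # covar_pop(y, x) / var_pop(x); the 1/n factors cancel out
    return sxy / sxx


def regr_intercept(pairs: Sequence[Pair]) -> Optional[float]:
    complete = _complete_pairs(pairs)
    slope = regr_slope(complete)
    if slope is None:
        return None
    mean_y, mean_x = _means(complete)
    # avg(y) - slope * avg(x), computed over the same filtered pairs
    return mean_y - slope * mean_x


# Rows with k = 2 from the testRegression view, written as (y, x)
before = [(10, 11), (20, 22), (25, None), (30, 35)]
after = before + [(None, 40)]  # the row added by this PR: y is null
print(regr_slope(before), regr_slope(after))
print(regr_intercept(before), regr_intercept(after))
```

Assuming the `testRegression` view's columns are `(k, y, x)`, the `k = 2` rows reduce to the same three complete pairs with or without the added `(2, null, 40)` row, so the sketch returns a slope of 360/433 (about 0.8314) and an intercept of 500/433 (about 1.1547) in both cases. Filtering once in `_complete_pairs` and reusing that list for every term keeps the slope and intercept consistent with each other.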

Why are the changes needed?

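To fix a bug: `RegrSlope` and `RegrIntercept` return incorrect results when the first parameter (y) of a tuple is null.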

Does this PR introduce any user-facing change?

Yes. The results change when the first value (y) of a tuple is null; the new results are the correct ones.

How was this patch tested?

Passed GitHub Actions (GA) and tested with `build/sbt "~sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z linear-regression.sql"`.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label on Jun 26, 2024
@@ -1,7 +1,7 @@
 -- Automatically generated by SQLQueryTestSuite
 -- !query
 CREATE OR REPLACE TEMPORARY VIEW testRegression AS SELECT * FROM VALUES
-(1, 10, null), (2, 10, 11), (2, 20, 22), (2, 25, null), (2, 30, 35)
+(1, 10, null), (2, 10, 11), (2, 20, 22), (2, 25, null), (2, 30, 35), (2, null, 40)

wayneguow (Contributor Author):

This adds a tuple whose y value is null. Such a tuple should be filtered out during the calculation, so the `RegrSlope` and `RegrIntercept` results in the output remain unchanged.

wayneguow changed the title from "[SPARK-48719][SQL] Fix the calculation bug of RegrSlope&RegrIntercept` when the first parameter is null" to "[SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null" on Jun 26, 2024
wayneguow marked this pull request as ready for review on June 27, 2024 02:42

wayneguow (Contributor Author):

cc @beliefer

wayneguow (Contributor Author):

Gentle ping @HyukjinKwon, when you have time.
