[SPARK-51391][SQL][CONNECT] Fix SparkConnectClient to respect SPARK_USER and user.name #50159

Open
wants to merge 1 commit into
base: master

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Mar 5, 2025

What changes were proposed in this pull request?

This PR fixes SparkConnectClient to respect SPARK_USER and user.name, for feature parity with the PySpark Connect client.

BEFORE
Screenshot 2025-03-04 at 17 04 21

AFTER
Screenshot 2025-03-04 at 17 05 04

Why are the changes needed?

spark-shell and spark-connect-shell should use the same default user as pyspark when --user_id is not given.

$ bin/pyspark --remote sc://localhost:15002
>>> spark.version
'4.0.0'

// Spark Connect Server shows `userId: dongjoon`.
25/03/04 16:57:53 INFO SessionHolder: Session with userId: dongjoon and sessionId: 1fd1bc8f-233b-4a6b-9924-9f90af15f894 accessed,time 1741136273692 ms.
$ bin/spark-shell --remote sc://localhost:15002

// Spark Connect Server shows `userId: `.
25/03/04 16:58:54 INFO SessionHolder: Session with userId:  and sessionId: 00f7ac26-98c4-49ca-a027-5808e9e5b155 accessed,time 1741136334546 ms.
$ bin/spark-connect-shell --remote sc://localhost:15002

// Spark Connect Server shows `userId: `.
25/03/04 16:59:29 INFO SessionHolder: Session with userId:  and sessionId: 63afa7dd-8b95-41da-b90e-cd73918b33be accessed,time 1741136369060 ms.

Does this PR introduce any user-facing change?

Yes, this is a bug fix for feature parity across Spark Connect languages.

BEFORE (Apache Spark 4.0.0 RC2)

$ bin/spark-shell --remote sc://localhost:15002

// Spark Connect Server shows `userId: `.
25/03/04 17:01:03 INFO SessionHolder: Session with userId:  and sessionId: b9c7edd7-2209-4c53-b7b1-dfc3171d012a accessed,time 1741136463337 ms.

AFTER

$ bin/spark-shell --remote sc://localhost:15002

// Spark Connect Server shows `userId: dongjoon`.
25/03/04 17:01:35 INFO SessionHolder: Session with userId: dongjoon and sessionId: 3a9305ff-4dbf-4240-b3fd-edb8f3edab02 accessed,time 1741136495807 ms.
$ SPARK_USER=spark2005 bin/spark-shell --remote sc://localhost:15002

// Spark Connect Server shows `userId: spark2005`.
25/03/04 17:02:24 INFO SessionHolder: Session with userId: spark2005 and sessionId: 72142325-94d8-4f67-a777-1ede6e02bbf3 accessed,time 1741136544444 ms.
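
To compare the BEFORE and AFTER server output without reading it by eye, the `SessionHolder` lines above could be checked with a small script. This is only a sketch: the regular expression is derived from the log lines quoted in this description, and `parse_session_line` is a hypothetical helper name, not part of Spark.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the SessionHolder log lines quoted in this PR
# (an assumption; the server's log format may differ between versions).
SESSION_LINE = re.compile(
    r"Session with userId: (?P<user>.*?) and sessionId: "
    r"(?P<session>[0-9a-f-]+) accessed"
)


def parse_session_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (user_id, session_id) from a SessionHolder line, or None."""
    match = SESSION_LINE.search(line)
    if match is None:
        return None
    return match.group("user"), match.group("session")


if __name__ == "__main__":
    sample = (
        "25/03/04 17:01:35 INFO SessionHolder: Session with userId: dongjoon "
        "and sessionId: 3a9305ff-4dbf-4240-b3fd-edb8f3edab02 accessed,"
        "time 1741136495807 ms."
    )
    print(parse_session_line(sample))
```

An empty first element in the returned tuple corresponds to the BEFORE behavior, where the server logged `userId: ` with no value.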

How was this patch tested?

Pass the CIs and manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -423,7 +423,6 @@ object SparkConnectClient {
def configuration: Configuration = _configuration

def userId(id: String): Builder = {
// TODO this is not an optional field!
Member

We might also need to fix SparkConnectClient.__init__ here:

            self._user_id = os.getenv("USER", None)

to

            self._user_id = os.getenv("SPARK_USER", os.getenv("USER", None))
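
For illustration, the lookup order in the suggested line can be tried in isolation. This is a sketch, not the client code: `default_user_id` is a hypothetical helper, and taking the environment as a parameter is an assumption made only so the order is easy to exercise.

```python
import os
from typing import Mapping, Optional


def default_user_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper mirroring the suggested lookup order.

    SPARK_USER is consulted first, then USER; None if neither is set.
    """
    return env.get("SPARK_USER", env.get("USER", None))


if __name__ == "__main__":
    # SPARK_USER wins when both variables are present.
    print(default_user_id({"SPARK_USER": "spark2005", "USER": "dongjoon"}))
    # USER is used when SPARK_USER is absent.
    print(default_user_id({"USER": "dongjoon"}))
```

As with `os.getenv`, a variable that is set but empty is returned as an empty string rather than falling through to the next one.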

Member Author

Thank you for the review, @HyukjinKwon. I'll take a look at that part too.

@github-actions github-actions bot added the PYTHON label Mar 5, 2025
@dongjoon-hyun
Member Author

The comment is addressed, @HyukjinKwon. Thank you.

@dongjoon-hyun
Member Author

Could you review this once more, @HyukjinKwon?

@dongjoon-hyun
Member Author

Could you review this Spark Connect PR, @huaxingao?
