@zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Nov 13, 2025

What changes were proposed in this pull request?

Optimize pyspark.sql.tests.connect.test_connect_plan by removing the remote session creation.

Why are the changes needed?

Recently, I noticed that this test has become extremely slow, e.g. in https://github.com/apache/spark/actions/runs/19318125464/job/55253739424

before

Starting test(python3.11): pyspark.sql.tests.connect.test_connect_plan (temp output: /__w/spark/spark/python/target/763913e5-9ba0-46ed-a583-56bf5fa5f588/python3.11__pyspark.sql.tests.connect.test_connect_plan__nbrkokah.log)
Finished test(python3.11): pyspark.sql.tests.connect.test_connect_plan (1222s)

after

Starting test(python3.11): pyspark.sql.tests.connect.test_connect_plan (temp output: /__w/spark/spark/python/target/45831997-66c0-4c44-89cf-1ce85dc89ee7/python3.11__pyspark.sql.tests.connect.test_connect_plan__tyaymd_b.log)
Finished test(python3.11): pyspark.sql.tests.connect.test_connect_plan (1s)

The tests themselves are pretty fast, so I think the root cause is the remote session creation, which is not necessary here because this test only validates protobufs.
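For illustration only (this is not the actual diff), the idea can be sketched with plain `unittest`. Everything here is hypothetical: `FakeRemoteSession` stands in for an expensive remote session, and `build_plan` returns a dict standing in for a protobuf message. None of these names are PySpark APIs. The point is that a test which only inspects the generated plan never needs to construct a session at all.

```python
import io
import unittest


class FakeRemoteSession:
    """Hypothetical stand-in for an expensive remote session (not a PySpark class)."""

    created = 0  # counts how many sessions were ever constructed

    def __init__(self):
        FakeRemoteSession.created += 1


def build_plan(table, limit):
    """Hypothetical plan builder; a nested dict stands in for a protobuf message."""
    return {"limit": {"n": limit, "input": {"read": {"table": table}}}}


class PlanOnlyTest(unittest.TestCase):
    # No setUpClass that creates a session: the assertions below only
    # look at the structure of the plan, so no server round trip is needed.
    def test_limit_plan(self):
        plan = build_plan("t", 10)
        self.assertEqual(plan["limit"]["n"], 10)
        self.assertEqual(plan["limit"]["input"]["read"]["table"], "t")


def run_plan_only_tests():
    """Run the plan-only test case quietly and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlanOnlyTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Running `run_plan_only_tests()` and then checking `FakeRemoteSession.created` shows that the plan checks pass without any session being constructed, which is the property this PR relies on.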

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

No

@zhengruifeng zhengruifeng changed the title from "[PYTHON][TESTS] Optimize pyspark.sql.tests.connect.test_connect_plan" to "[SPARK-54331][PYTHON][TESTS] Optimize pyspark.sql.tests.connect.test_connect_plan" Nov 13, 2025
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

dongjoon-hyun pushed a commit that referenced this pull request Nov 13, 2025
…_connect_plan`

Closes #53032 from zhengruifeng/opt_test_connect_plan.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 602a4bd)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

Merged to master/4.1.

@zhengruifeng zhengruifeng deleted the opt_test_connect_plan branch November 14, 2025 03:39