
Conversation

@gaogaotiantian
Contributor

What changes were proposed in this pull request?

Add sanity check for number of configurations being passed.

Why are the changes needed?

This is helpful for recognizing a malformed message and avoiding a potential deadlock when the message does not conform to the protocol.

Does this PR introduce any user-facing change?

No

How was this patch tested?

This error should not happen and it should not break CI either.

Was this patch authored or co-authored using generative AI tooling?

No.

@gaogaotiantian marked this pull request as ready for review on December 5, 2025 23:50
def load(self, infile):
    num_conf = read_int(infile)
    if num_conf < 0 or num_conf > 10000:
        ...
    for i in range(num_conf):
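For context, a minimal self-contained sketch of what the full check could look like around the quoted lines; the holder class, error type, message, and key/value layout are illustrative assumptions, not the exact code in this PR:

from pyspark.serializers import UTF8Deserializer, read_int

utf8_deserializer = UTF8Deserializer()


class RunnerConfLoader:
    # Hypothetical holder class, only for illustration.
    def __init__(self):
        self._conf = {}

    def load(self, infile):
        num_conf = read_int(infile)
        # Sanity check: a negative or absurdly large count almost certainly means
        # the stream is misaligned with the protocol, so fail fast instead of
        # blocking on reads that will never be satisfied.
        if num_conf < 0 or num_conf > 10000:
            raise ValueError(
                "Invalid number of configurations: %d; the message likely does "
                "not conform to the protocol." % num_conf
            )
        for _ in range(num_conf):
            # Assumed entry layout: a UTF-8 key followed by a UTF-8 value.
            key = utf8_deserializer.loads(infile)
            value = utf8_deserializer.loads(infile)
            self._conf[key] = value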
Member


10,000 seems to be too small though, @gaogaotiantian.

Member


Hm, don't we already send only the allowed confs? e.g., ArrowPythonRunner.getPythonRunnerConfMap

Member


Ah, okay so this is adding a sanity check

Member

@dongjoon-hyun left a comment


Instead of introducing a hard-coded magic number, please provide an environment variable to control this, @gaogaotiantian. Also, it would be great if we could have a higher default value to be safe.

if num_conf < 0 or num_conf > 10000:
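As a sketch of this suggestion (not what was ultimately merged, since the discussion below kept a hard-coded bound), the limit could be read from an environment variable with a generous default; the variable name and default are hypothetical:

import os

# Hypothetical env var with a generous default; the discussion below kept a
# hard-coded bound instead of introducing a knob like this.
_MAX_NUM_CONF = int(os.environ.get("PYSPARK_MAX_RUNNER_CONF", "100000"))


def check_num_conf(num_conf):
    # Reject counts that cannot possibly be a valid runner_conf size.
    if num_conf < 0 or num_conf > _MAX_NUM_CONF:
        raise ValueError("Invalid number of configurations: %d" % num_conf)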

@dongjoon-hyun changed the title from [SPARK-54619] Add a sanity check for configuration numbers to [SPARK-54619][PYTHON] Add a sanity check for configuration numbers on Dec 6, 2025
@gaogaotiantian
Contributor Author

Hi @dongjoon-hyun, runner_conf is hand-picked by the daemon to pass to the worker on a need-to-know basis. For now, the maximum number is less than 10, so 10000 is a very safe upper limit. Also, we have only a few thousand Spark configs in total, so even if we passed everything, it would still be less than 10000.

I don't think this should be controllable by an env var because:

  1. It's a sanity check, not a run-time validation. We just want to make sure the number at least makes some sense.
  2. One of the reasons to introduce runner_conf is to avoid passing too many arguments through env vars. Hopefully we can put more environment setup in runner_conf. Having another env var to control it is kind of against that purpose.
  3. We will have more sanity checks in the protocol - with very loose bounds, but still helpful for ruling out crazy situations. Having an env var for each of them would explode our env var namespace.

If you really hate the magic number, we can check only that the number is positive. However, that leaves plenty of unreasonable values that the sanity check would miss.

Thanks!

@dongjoon-hyun
Member

Got it. Thank you for the details.

@dongjoon-hyun dismissed their stale review on December 6, 2025 19:10

Got the rationale.


def load(self, infile):
    num_conf = read_int(infile)
    for i in range(num_conf):
Member


Let's add a bit of background here in a comment. I was also wondering why we need this as we're already controlling the confs to send.

@gaogaotiantian
Contributor Author

I updated the comments; let's see if it's clearer.

The rationale behind this is: when we change something in the protocol (passing an extra integer at a random place, which is what we often do now), it's common for things to just get "stuck" somewhere. The test hangs and we don't know what happened - we don't even know where the message went wrong.

Having sanity checks in different places in our protocol can stop the communication early, so we know the message is already wrong at that point. It's helpful for debugging.

More than that, there could be communication errors in production (rare, but possible). There could be dark corners that we forgot to test. Raising an error explicitly is always better than hanging there.

That's why I think we should introduce more sanity checks and real runtime validation checks on the data passed in.

Of course, eventually we might just want a more dedicated RPC, but for now this is helpful.
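To illustrate the pattern described above (not code from this PR), a small bounded-read helper could make such sanity checks reusable across the protocol; the helper name, bounds, and error type are hypothetical:

from pyspark.serializers import read_int


def read_bounded_int(infile, lower, upper, what):
    # Read an int from the stream and fail fast when it falls outside the
    # expected range, which usually means the stream is misaligned with the
    # protocol; the situation that otherwise shows up as a silent hang.
    value = read_int(infile)
    if value < lower or value > upper:
        raise ValueError(
            "Unexpected %s read from stream: %d (expected between %d and %d); "
            "the message likely does not conform to the protocol."
            % (what, value, lower, upper)
        )
    return value


# Hypothetical usage when loading runner_conf:
# num_conf = read_bounded_int(infile, 0, 10000, "number of configurations")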

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.2.0. Thank you, @gaogaotiantian and all.
