-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54619][PYTHON] Add a sanity check for configuration numbers #53359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-54619][PYTHON] Add a sanity check for configuration numbers #53359
Conversation
| def load(self, infile): | ||
| num_conf = read_int(infile) | ||
| for i in range(num_conf): | ||
| if num_conf < 0 or num_conf > 10000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10,000 seems to be too small though, @gaogaotiantian .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, don't we already only send the allowed confs only? e.g., ArrowPythonRunner.getPythonRunnerConfMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay so this is adding a sanity check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of introducing a hard-coded magic number, please provide an environment variable to control this, @gaogaotiantian . Also, it would be great if we can have more higher default values to make it sure.
if num_conf < 0 or num_conf > 10000:
|
Hi @dongjoon-hyun , I don't think this should be controllable by an env var because:
If you really hate the magic number, we can only check the positivity of the number. However, that leaves plenty of unreasonable space for sanity check. Thanks! |
|
Got it. Thank you for the details. |
|
|
||
| def load(self, infile): | ||
| num_conf = read_int(infile) | ||
| for i in range(num_conf): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a bit of background here in a comment. I was also wondering why we need this as we're already controlling the confs to send.
|
I updated the comments, let's see if it's clearer. The rational behind this is - when we change something on the protocol (passing an extra integer at a random place, which is what we often do now), it's common to just "stuck" at somewhere. The test hangs and we don't know what happened - we don't even know where the message might go wrong. Having sanity check in different places in our protocols can stop the communication early so we know the message is already wrong at this place. It's helpful for debugging. More than that, there could be communication errors during production (rare, but possible). There could be dark corners that we forgot to test. Raising an error explicitly is always better than hanging there. That's why I think we should introduce more sanity check and real runtime validation checks on our data passed in. Of course, eventually we might just want a more dedicated RPC, but for now this is helpful. |
|
Merged to master for Apache Spark 4.2.0. Thank you, @gaogaotiantian and all. |
What changes were proposed in this pull request?
Add sanity check for number of configurations being passed.
Why are the changes needed?
This is helpful to recognize malformed message - avoid potential deadlock when the message does not conform to protocol.
Does this PR introduce any user-facing change?
No
How was this patch tested?
This error should not happen and it should not break CI either.
Was this patch authored or co-authored using generative AI tooling?
No.