[SPARK-50909][PYTHON] Setup faulthandler in PythonPlannerRunners #49592
Conversation
The failed test case succeeds on Java 17.0.13 but fails on 17.0.14. It seems that the behavior of 17.0.14 aligns with Java 21, but I'm not sure which specific changes in the new Java 17 release caused this issue. I have created a Jira ticket.
fixed by #49599
@LuciferYang Thanks for the fix! Let me merge it and rerun tests.
Thanks for adding this!
The remaining test failures are not related to this PR.
Thanks! Merging to master.
### What changes were proposed in this pull request?

Sets up `faulthandler` in `PythonPlannerRunner`s. It can be enabled by the same config as UDFs:

- SQL conf: `spark.sql.execution.pyspark.udf.faulthandler.enabled`
  - It falls back to the Spark conf `spark.python.worker.faulthandler.enabled`
  - `False` by default

### Why are the changes needed?

The `faulthandler` is not set up in `PythonPlannerRunner`s.

### Does this PR introduce _any_ user-facing change?

When enabled, if the Python worker crashes, the error message may include a thread dump from the Python process on a best-effort basis.

### How was this patch tested?

Added the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49592 from ueshin/issues/SPARK-50909/faulthandler.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Takuya Ueshin <[email protected]>
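A minimal sketch of how the flag can be exercised from PySpark, assuming the Python UDTF API; the null-pointer read via `ctypes` is illustrative and not necessarily the exact code of the added tests:

```python
import ctypes

from pyspark.sql import SparkSession
from pyspark.sql.functions import udtf

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Enable faulthandler for Python workers; this SQL conf falls back to the
# Spark conf spark.python.worker.faulthandler.enabled (False by default).
spark.conf.set("spark.sql.execution.pyspark.udf.faulthandler.enabled", "true")

@udtf(returnType="x: int")
class Crash:
    def eval(self):
        # Read from address 0 to force a segfault in the worker process.
        ctypes.string_at(0)
        yield (1,)

try:
    Crash().collect()
except Exception as e:
    # With the flag enabled, the error message may include a faulthandler
    # traceback dumped by the crashed Python worker, on a best-effort basis.
    print(e)
```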
### What changes were proposed in this pull request?

This is a backport of #49592.

Sets up `faulthandler` in `PythonPlannerRunner`s. It can be enabled by the same config as UDFs:

- SQL conf: `spark.sql.execution.pyspark.udf.faulthandler.enabled`
  - It falls back to the Spark conf `spark.python.worker.faulthandler.enabled`
  - `False` by default

### Why are the changes needed?

The `faulthandler` is not set up in `PythonPlannerRunner`s.

### Does this PR introduce _any_ user-facing change?

When enabled, if the Python worker crashes, the error message may include a thread dump from the Python process on a best-effort basis.

### How was this patch tested?

Added the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49635 from ueshin/issues/SPARK-50909/4.0/faulthandler.

Lead-authored-by: Takuya Ueshin <[email protected]>
Co-authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Hi, @ueshin , @allisonwang-db , @HyukjinKwon .
The newly added `test_udtf_segfault` seems to fail in the PyPy3 environment for the last 5 days. I also validated locally that the Python daemons are terminated and lost; the test case fails in the same way:
$ python/run-tests --testnames pyspark.sql.tests.test_udtf
...
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='eval', enabled=True)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2780, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='analyze', enabled=True)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2797, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='eval', enabled=False)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2780, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='analyze', enabled=False)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2797, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
----------------------------------------------------------------------
Ran 233 tests in 26.393s
FAILED (failures=4, skipped=119)
Had test failures in pyspark.sql.tests.test_udtf with pypy3; see logs.
Could you take a look at these failures?
As a side note, it seems that we need to check
Thanks for the report. I submitted the fix #49720.
Thank you, @ueshin !
### What changes were proposed in this pull request?

Disable segfault tests in `pypy`, same as in `test_udf`.

### Why are the changes needed?

In the PyPy environment, the segfault doesn't happen.

- #49592 (review)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49720 from ueshin/issues/SPARK-50909/pypy.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit 8dbf1dd)
Signed-off-by: Dongjoon Hyun <[email protected]>
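The skip guard can be a simple interpreter check; a hedged sketch assuming a `unittest`-style test class (the names here are illustrative, not necessarily the exact code of #49720):

```python
import platform
import unittest

# On PyPy, the ctypes-based null-pointer read used by the segfault tests
# does not reliably crash the process the way it does on CPython.
is_pypy = platform.python_implementation() == "PyPy"

class UDTFSegfaultTests(unittest.TestCase):
    @unittest.skipIf(is_pypy, "Segfault does not happen in the PyPy environment.")
    def test_udtf_segfault(self):
        ...
```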