[SPARK-50909][PYTHON] Setup faulthandler in PythonPlannerRunners #49592
Conversation
The failed test case succeeds on Java 17.0.13 but fails on 17.0.14. It seems that the behavior of 17.0.14 aligns with Java 21, but I'm not sure which specific changes in the new Java 17 release caused this issue. I have created a Jira ticket.
fixed by #49599
@LuciferYang Thanks for the fix! Let me merge it and rerun tests.
Thanks for adding this!
The remaining test failures are not related to this PR.
Thanks! Merging to master.
### What changes were proposed in this pull request?

Sets up `faulthandler` in `PythonPlannerRunner`s. It can be enabled by the same config as UDFs:

- SQL conf: `spark.sql.execution.pyspark.udf.faulthandler.enabled`
  - It falls back to the Spark conf `spark.python.worker.faulthandler.enabled`
  - `False` by default

### Why are the changes needed?

The `faulthandler` is not set up in `PythonPlannerRunner`s.

### Does this PR introduce _any_ user-facing change?

When enabled, if the Python worker crashes, the error message may include a thread dump from the Python process on a best-effort basis.

### How was this patch tested?

Added the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49592 from ueshin/issues/SPARK-50909/faulthandler.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Takuya Ueshin <[email protected]>
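A minimal sketch of how the flag can be exercised from PySpark, assuming the Python UDTF API; the null-pointer read via `ctypes` is illustrative and not necessarily the exact code of the added tests:

```python
import ctypes

from pyspark.sql import SparkSession
from pyspark.sql.functions import udtf

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Enable faulthandler for Python workers; this SQL conf falls back to the
# Spark conf spark.python.worker.faulthandler.enabled (False by default).
spark.conf.set("spark.sql.execution.pyspark.udf.faulthandler.enabled", "true")

@udtf(returnType="x: int")
class Crash:
    def eval(self):
        # Read from address 0 to force a segfault in the worker process.
        ctypes.string_at(0)
        yield (1,)

try:
    Crash().collect()
except Exception as e:
    # With the flag enabled, the error message may include a faulthandler
    # traceback dumped by the crashed Python worker, on a best-effort basis.
    print(e)
```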
### What changes were proposed in this pull request?

This is a backport of #49592.

Sets up `faulthandler` in `PythonPlannerRunner`s. It can be enabled by the same config as UDFs:

- SQL conf: `spark.sql.execution.pyspark.udf.faulthandler.enabled`
  - It falls back to the Spark conf `spark.python.worker.faulthandler.enabled`
  - `False` by default

### Why are the changes needed?

The `faulthandler` is not set up in `PythonPlannerRunner`s.

### Does this PR introduce _any_ user-facing change?

When enabled, if the Python worker crashes, the error message may include a thread dump from the Python process on a best-effort basis.

### How was this patch tested?

Added the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49635 from ueshin/issues/SPARK-50909/4.0/faulthandler.

Lead-authored-by: Takuya Ueshin <[email protected]>
Co-authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Hi, @ueshin , @allisonwang-db , @HyukjinKwon .
The newly added `test_udtf_segfault` seems to fail in the PyPy3 environment for the last 5 days. I also validated locally that the Python daemons are terminated and lost; the test case fails in the same way:
$ python/run-tests --testnames pyspark.sql.tests.test_udtf
...
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='eval', enabled=True)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2780, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='analyze', enabled=True)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2797, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='eval', enabled=False)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2780, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
======================================================================
FAIL: test_udtf_segfault (pyspark.sql.tests.test_udtf.UDTFTests) (method='analyze', enabled=False)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 2797, in test_udtf_segfault
self._check_result_or_exception(
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/test_udtf.py", line 711, in _check_result_or_exception
with self.assertRaisesRegex(err_type, expected):
AssertionError: Exception not raised
----------------------------------------------------------------------
Ran 233 tests in 26.393s
FAILED (failures=4, skipped=119)
Had test failures in pyspark.sql.tests.test_udtf with pypy3; see logs.
Could you take a look at these failures?
As a side note, it seems that we need to check
Thanks for the report. I submitted the fix #49720.
Thank you, @ueshin !
### What changes were proposed in this pull request?

Disable segfault tests in `pypy`, same as in `test_udf`.

### Why are the changes needed?

In the PyPy environment, the segfault doesn't happen.

- #49592 (review)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49720 from ueshin/issues/SPARK-50909/pypy.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit 8dbf1dd)
Signed-off-by: Dongjoon Hyun <[email protected]>
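The skip guard can be a simple interpreter check; a hedged sketch assuming a `unittest`-style test class (the names here are illustrative, not necessarily the exact code of #49720):

```python
import platform
import unittest

# On PyPy, the ctypes-based null-pointer read used by the segfault tests
# does not reliably crash the process the way it does on CPython.
is_pypy = platform.python_implementation() == "PyPy"

class UDTFSegfaultTests(unittest.TestCase):
    @unittest.skipIf(is_pypy, "Segfault does not happen in the PyPy environment.")
    def test_udtf_segfault(self):
        ...
```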