4 changes: 4 additions & 0 deletions python/run-tests.py
@@ -228,6 +228,10 @@ def get_default_python_executables():
def split_and_validate_testnames(testnames):
    testnames_to_test = []

    py4j_module_path = os.path.join(SPARK_HOME, "python/lib/py4j-0.10.9.9-src.zip")
Member

Hm, I believe we're already handling this via the bin/pyspark script (see run_individual_python_test).

Member

Does it fail on import? If that is the case, importlib.util.find_spec(name) is not None should work.

Contributor Author

@gaogaotiantian Dec 9, 2025

Yes, we do this in bin/pyspark, so we need it here as well. importlib.util.find_spec(name) fails because it can't find py4j. It can find pyspark, which is why we get these errors.

Basically, when the script tries to locate the test module, it can find pyspark but not pyspark.sql (because py4j is not there). It then believes the test should be split as pyspark sql.xxx. We need py4j in this specific script (which runs before bin/pyspark) to check the test module properly.
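
To illustrate the behavior described above, here is a minimal sketch that reproduces it without Spark. It assumes a hypothetical package `fakespark_demo` whose `__init__.py` imports a missing dependency (standing in for pyspark and py4j), and it assumes the `module_exists` helper catches `ModuleNotFoundError`, since the except clause is not visible in this hunk. For a dotted name, `importlib.util.find_spec` imports the parent package first, so the lookup of the submodule fails even though the top-level package is found.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def module_exists(module):
    # Assumed shape of the helper; the except clause is not shown in the diff.
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        return False


# Hypothetical stand-in for pyspark: a package whose __init__ needs a missing dependency.
tmp_root = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp_root, "fakespark_demo")
os.makedirs(os.path.join(pkg_dir, "sql"))
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("import missing_dep_demo_xyz\n")  # plays the role of py4j
with open(os.path.join(pkg_dir, "sql", "__init__.py"), "w") as f:
    f.write("")

sys.path.insert(0, tmp_root)
importlib.invalidate_caches()

# Top-level lookup only locates the package; it does not execute __init__.py.
top_level_found = module_exists("fakespark_demo")
# Dotted lookup imports the parent first, which raises ModuleNotFoundError.
submodule_found = module_exists("fakespark_demo.sql")

print(top_level_found, submodule_found)  # True False
```

With the missing dependency on sys.path, the second lookup would succeed, which is what appending the py4j zip is meant to achieve here.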

Member

Hm, the problem here is that the check might require other dependencies as well; e.g., numpy is a required dependency for ML, and the check will fail without it. Can we handle this case as well?
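
One possible way to handle this case, sketched here as an assumption rather than what this PR does, is to check for the module's file on disk instead of asking the import system, so that no package `__init__.py` is executed and no third-party dependency is needed. The helper name `module_file_exists` and the `search_root` parameter are hypothetical; in run-tests.py the root would presumably be the python directory under SPARK_HOME.

```python
import os
import tempfile


def module_file_exists(module, search_root):
    """Return True if `module` maps to a .py file or a regular package under search_root.

    Hypothetical helper: nothing is imported, so missing dependencies cannot fail the check.
    """
    base = os.path.join(search_root, *module.split("."))
    return os.path.isfile(base + ".py") or os.path.isfile(os.path.join(base, "__init__.py"))


# Demo with a hypothetical package whose __init__ would fail to import.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_ml_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("import numpy_missing_demo_xyz\n")  # plays the role of numpy
with open(os.path.join(pkg_dir, "test_feature.py"), "w") as f:
    f.write("")

found_pkg = module_file_exists("demo_ml_pkg", root)
found_mod = module_file_exists("demo_ml_pkg.test_feature", root)
found_missing = module_file_exists("demo_ml_pkg.nope", root)
```

The trade-off is that this sketch ignores namespace packages and modules installed outside `search_root`, so it only fits if all test modules live in the source tree.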

    if py4j_module_path not in sys.path:
        sys.path.append(py4j_module_path)

    def module_exists(module):
        try:
            return importlib.util.find_spec(module) is not None