Skip to content

Conversation

@gaogaotiantian
Copy link
Contributor

@gaogaotiantian gaogaotiantian commented Dec 10, 2025

What changes were proposed in this pull request?

This add ruff to our docker image and enables ruff check in our CI. Also this adds ruff check in dev/lint-python.

We want to have both ruff and flake8 run in CI for a while to confirm the compatibility then we will deprecate flake8.

It is intentional to leave ruff version blank - so it uses the latest version. I think the linter rule is pretty stable and the version upgrade should not affect our workflow too much. This is an experiment about using the latest version of devtool. Pinning the version has its advantage but upgrading the version is painful. If we can occasionally (per month?) fix a small amount of code for linter, I think that's acceptable.

After discussion with @Yicong-Huang I think it's a good idea to pin the version - so we can do all the code fix with devtool change in a single PR. We should avoid having multiple people trying to fix the code when there is a new rule.

Why are the changes needed?

flake8 is too slow and our pinned version is just too old. We should replace it.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Local ruff test passes.

Was this patch authored or co-authored using generative AI tooling?

No

@Yicong-Huang
Copy link
Contributor

Upvote for this switch! however, I think it is still better to pin the exact version of ruff to avoid future unexpected conflicts caused by linter upgrade

Comment on lines 81 to +82
FLAKE8_TEST=true
RUFF_TEST=true
Copy link
Contributor

@Yicong-Huang Yicong-Huang Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we enable both (flake8+black) and ruff in CI, will they have potential conflict on format? maybe better to have single one exist at a time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional - we want to make sure that ruff and flake8 are compatible at this point. We will eventually deprecate flake8.

@HyukjinKwon HyukjinKwon changed the title [SPARK-54632][FOLLOWUP] Enable ruff on our CI and lint-python [SPARK-54632][INFRA][FOLLOW-UP] Enable ruff on our CI and lint-python Dec 10, 2025
@HyukjinKwon
Copy link
Member

Merged to master.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to break the CI unfortunately via ruff checks failed, @gaogaotiantian . Could you take a look at that?

ruff checks failed:
F401 [*] `pyarrow` imported but unused
    --> python/pyspark/sql/pandas/serializers.py:1219:27
     |
1217 |         (one list per batch).
1218 |         """
1219 |         import pyarrow as pa
     |                           ^^
1220 |
1221 |         def process_group(batches: "Iterator[pa.RecordBatch]"):
     |
help: Remove unused import: `pyarrow`

Found 1 error.
[*] 1 fixable with the `--fix` option.
Screenshot 2025-12-10 at 16 46 18

@HyukjinKwon
Copy link
Member

Reverted for now

@HyukjinKwon
Copy link
Member

Let's file a new JIRA @gaogaotiantian

@dongjoon-hyun
Copy link
Member

Thank you for swift recovering!

@gaogaotiantian gaogaotiantian deleted the add-ruff-to-ci branch December 11, 2025 07:48
@gaogaotiantian
Copy link
Contributor Author

Okay this is a real lint issue - so I believe there is some racing on master, which made the PR in without being checked by ruff.

When 19a1da9 is merged, it removed the pa usage from that method so pa is not needed anymore. Notice that there is still a pa in type hint - that does not need pa here because pa is imported globally for type checking.

if TYPE_CHECKING:
    import pandas as pd
    import pyarrow as pa

This is not a false alarm, this is an issue that should've been caught by ruff if the order is correct.

@gaogaotiantian
Copy link
Contributor Author

I opened #53441 without a new JIRA because nothing changed - it's not the original PR that broke the CI, it's the other PR. I believe what happened was that the two PRs were merged almost simutaneously, so when the ruff PR pulls the repo, it somehow gets the change from the later PR - notice the line number is 1219 - that's the line number after 19a1da9 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants