[SPARK-54632][INFRA][FOLLOW-UP] Enable ruff on our CI and lint-python #53412
|
Upvote for this switch! However, I think it is still better to pin the exact version of ruff to avoid unexpected future conflicts caused by a linter upgrade. |
| FLAKE8_TEST=true | ||
| RUFF_TEST=true |
If we enable both (flake8+black) and ruff in CI, could they conflict on formatting? Maybe it is better to have only one of them active at a time?
This is intentional - we want to make sure that ruff and flake8 are compatible at this point. We will eventually deprecate flake8.
|
Merged to master. |
Unfortunately, this seems to break the CI because the ruff checks failed, @gaogaotiantian. Could you take a look at that?
ruff checks failed:
F401 [*] `pyarrow` imported but unused
--> python/pyspark/sql/pandas/serializers.py:1219:27
|
1217 | (one list per batch).
1218 | """
1219 | import pyarrow as pa
| ^^
1220 |
1221 | def process_group(batches: "Iterator[pa.RecordBatch]"):
|
help: Remove unused import: `pyarrow`
Found 1 error.
[*] 1 fixable with the `--fix` option.
|
Reverted for now |
|
Let's file a new JIRA @gaogaotiantian |
|
Thank you for the swift recovery! |
|
Okay, this is a real lint issue, so I believe there was a race on master, which let the PR in without being checked by ruff. When 19a1da9 was merged, it removed the following block:
if TYPE_CHECKING:
    import pandas as pd
    import pyarrow as pa
This is not a false alarm; it is an issue that should've been caught by ruff if the merge order had been correct. |
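For readers unfamiliar with the `if TYPE_CHECKING:` block mentioned above, here is a minimal sketch of the general pattern. It is not the code from `serializers.py`: the function name `count_batches` is hypothetical, and the standard-library `Decimal` stands in for `pyarrow` so the snippet runs without extra dependencies.

```python
from typing import TYPE_CHECKING, Iterator

if TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime,
    # because TYPE_CHECKING is False when the program actually runs.
    from decimal import Decimal


def count_batches(batches: "Iterator[Decimal]") -> int:
    """Count the items in an iterator.

    `Decimal` is referenced only inside the quoted annotation, so the
    guarded import above is enough for annotation purposes.
    """
    return sum(1 for _ in batches)


print(count_batches(iter([1, 2, 3])))  # prints 3
```

The guarded import keeps a heavy or optional dependency out of the runtime import path while still letting annotations refer to its types.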
|
I opened #53441 without a new JIRA because nothing changed - it's not the original PR that broke the CI, it's the other PR. I believe what happened was that the two PRs were merged almost simultaneously, so when the ruff PR pulled the repo, it somehow got the change from the later PR - notice the line number is
What changes were proposed in this pull request?
This adds `ruff` to our Docker image and enables the ruff check in our CI. It also adds the ruff check to `dev/lint-python`.
We want to have both `ruff` and `flake8` run in CI for a while to confirm compatibility; then we will deprecate `flake8`.
It is intentional to leave the ruff version blank - so it uses the latest version. I think the linter rules are pretty stable and a version upgrade should not affect our workflow too much. This is an experiment in using the latest version of a devtool. Pinning the version has its advantages, but upgrading the version is painful. If we can occasionally (per month?) fix a small amount of code for the linter, I think that's acceptable.
After discussion with @Yicong-Huang I think it's a good idea to pin the version - so we can do all the code fixes together with the devtool change in a single PR. We should avoid having multiple people trying to fix the code when there is a new rule.
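The pinning rationale above can be illustrated with a small guard that compares the installed linter version to a pin. This is only a sketch under stated assumptions: the helper names and the pinned value `0.0.0` are hypothetical placeholders, not what this PR uses. `importlib.metadata.version` is the standard-library call for reading an installed package's version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Hypothetical placeholder; the version actually pinned by the PR is not shown here.
PINNED_RUFF_VERSION = "0.0.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], pinned: str) -> bool:
    """Return True only when the installed version equals the pin exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_version("ruff")
    status = "ok" if matches_pin(found, PINNED_RUFF_VERSION) else "mismatch"
    print(f"ruff installed={found} pinned={PINNED_RUFF_VERSION}: {status}")
```

An exact-match check like this makes a linter upgrade an explicit change to the pin, so the rule changes and the code fixes they require can land in the same PR.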
Why are the changes needed?
`flake8` is too slow and our pinned version is just too old. We should replace it.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Local ruff test passes.
Was this patch authored or co-authored using generative AI tooling?
No