lint: add typos check #1888

Merged: 4 commits merged on Jul 17, 2024


@Borda (Contributor) commented Mar 31, 2024

Just a suggestion: add a check for typos, and maybe fix some of them without breaking the API.

README.md (outdated, resolved)
@Borda marked this pull request as draft on March 31, 2024 20:47
@EliahKagan (Contributor) commented Mar 31, 2024

I've noticed a considerable number of places in the diff where correct names are made incorrect. For example, "rela" stands for "relative", and I don't think there are any occurrences where it should be changed to "real"; there are other such cases as well. This is not limited to the GPG signature and project-name cases that you've identified. In addition, I'm not sure any changes should be made to files in test/fixtures that are used as test repository contents.

However, I've also noticed that you've marked this as a draft, and maybe you are aware of the other issues. If you think it would be helpful for me to leave a review with comments on the individual problematic cases, I'd be pleased to do so. Otherwise, I will assume that, as long as this is a draft, such a review might be more of a distraction than a help, and refrain from it.

There are also some areas where the fixes are clearly a huge improvement, particularly in test/test_index.py, where it had not even occurred to me that, because of the way pytest marks work, I could misspell raises and accidentally write xfail marks that don't enforce specific exception types. At minimum, that should certainly be fixed. Hopefully the idea here will work out in some form. Even if it doesn't, though I can't speak for you or for Byron, from my perspective the effort so far is worth it just for finding that.
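A simplified model of why a misspelled raises is so easy to miss. It assumes, consistent with the bug described in this thread, that an xfail mark's keyword arguments are only looked up by name, so an unrecognized key is never read and never rejected. classify_xfail and the misspelling "rasies" are hypothetical; this is not pytest's actual code, and it ignores details such as strict mode.

```python
from typing import Optional


def classify_xfail(mark_kwargs: dict, raised: Optional[BaseException]) -> str:
    """Simplified, hypothetical model of turning a test result into an xfail outcome.

    Only the "raises" key is read, via dict.get, so any other key (such as a
    misspelling of "raises") is silently ignored rather than rejected.
    """
    raises = mark_kwargs.get("raises")
    if raised is None:
        return "XPASS"  # the test unexpectedly passed (strict mode is not modeled)
    if raises is not None and not isinstance(raised, raises):
        return "FAILED"  # wrong exception type: reported as a real failure
    return "XFAIL"  # counted as the expected failure


# Correct keyword: a TypeError does not match ValueError, so this is a failure.
print(classify_xfail({"raises": ValueError}, TypeError("boom")))  # FAILED
# Hypothetical misspelling: the restriction is lost and any exception counts.
print(classify_xfail({"rasies": ValueError}, TypeError("boom")))  # XFAIL
```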

@EliahKagan (Contributor) commented Apr 1, 2024

To make sure it is not lost, and also to report the results of some manual testing (the affected xfail markings cover some cases that do not arise on CI), I've opened #1893 for the bug you've discovered here in test_index.py. This is the bug where three of the xfail markings pass misspelled keyword arguments that are meant, but fail, to make the test report an unexpected failure if the exception is not of the declared type. It differs from some of the other misspellings found here because it affects the behavior of the tests and can cause an unexpected failure to be wrongly reported as an expected failure.
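Since a spell checker only catches this kind of bug incidentally, a more targeted check is also conceivable. The sketch below is a hypothetical, stdlib-only scan (not part of GitPython, pytest, or this PR) that uses Python's ast module to flag xfail(...) calls whose keyword names fall outside pytest's documented xfail signature (condition, reason, raises, run, strict). That keyword set is an assumption worth re-checking against the pytest version in use.

```python
import ast

# Keyword names in pytest's documented signature:
# pytest.mark.xfail(condition=False, *, reason=None, raises=None, run=True, strict=...)
XFAIL_KEYWORDS = {"condition", "reason", "raises", "run", "strict"}


def find_bad_xfail_keywords(source: str) -> list:
    """Return sorted (line, keyword) pairs for xfail(...) calls with unknown keywords."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match "pytest.mark.xfail(...)" (attribute) as well as a bare "xfail(...)".
        if isinstance(func, ast.Attribute):
            name = func.attr
        elif isinstance(func, ast.Name):
            name = func.id
        else:
            continue
        if name != "xfail":
            continue
        for kw in node.keywords:
            # kw.arg is None for **kwargs unpacking, which cannot be checked statically.
            if kw.arg is not None and kw.arg not in XFAIL_KEYWORDS:
                problems.append((node.lineno, kw.arg))
    return sorted(problems)


sample = '''
import pytest

@pytest.mark.xfail(reason="known bug", rasies=OSError)
def test_something():
    raise OSError
'''
print(find_bad_xfail_keywords(sample))  # [(4, 'rasies')]
```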

Although those are definitely not the only typos found here that should be fixed, it seems to me that their elevated importance and their bearing on the correctness of the tests justify a separate PR to fix them, especially if such a PR would get them fixed sooner (after which they would no longer need to be a concern here).

If you are amenable to this idea, then I suggest opening that PR yourself, as you deserve the credit for it. But I would be pleased to open it instead if you prefer (I would list you in the Co-authored-by trailer).

While another option may be to wait for the change to come in with this PR, I think it is better that it not be delayed while figuring out if and how automated spell checking can be added safely and with an acceptably low rate of false positives.

@Byron (Member) commented Apr 2, 2024

Thanks for sharing this draft. I am happy it could already find a genuine issue (#1893) despite a high rate of false positives.
My feeling here is that, given that high rate and the known issues in the underlying engine, trying to make this work beyond what's here won't be worth it. But that will be for @Borda to decide, and I'd appreciate such a decision so the PR won't stay open for too long.

Thank you

@Borda (Contributor, Author) commented Apr 2, 2024

With my other projects, I have been using several typo-checking tools. This one seemed at first to be lower effort, but as mentioned, it produces a significant number of false positives, and with the next version there could be even more (I just opened crate-ci/typos#966 and crate-ci/typos#969).

So I'll open a separate PR for the fixes and most likely pivot this PR to use another typo-checking alternative :)

@Borda (Contributor, Author) commented May 7, 2024

@EliahKagan @Byron, would you mind having a look at the updated version?
Not sure what to do about "doesnt", which is used as a file name... 🐿️

@Borda marked this pull request as ready for review on May 7, 2024 17:40

@Byron (Member) left a comment

Thanks a lot for making it happen!

Now it looks like the tool is usable, and it's nice to see that it caught a couple of real errors.

I will wait for @EliahKagan's approval before merging, though, in case I am missing some more obscure aspects of the tool and how it integrates into GitPython's tooling.

git/objects/util.py (outdated, resolved)
git/util.py (outdated, resolved)

@EliahKagan (Contributor) left a comment

I think the keyword argument spelling fixes in test/test_index.py justify adding automated spell-checking, even though various cases remain where spell-checking seems to have led to incorrect or sub-optimal changes.

These can be fixed, and the risk that spell-checking would lead to such cases being introduced later is, in my opinion, outweighed by the benefits of catching misspellings that, due to the dynamic nature of Python and its idiomatic uses, may affect the behavior of GitPython or its tests.

I've looked at each change and commented about the ones that I think should not be done or otherwise still need improvement. Some comments cover multiple changes, so the absence of a comment on a specific change does not mean that I think it is correct.

I recommend that this PR be marked as fixing #1893.

Edit: If Cygwin tests fail with "dubious ownership" errors when more commits are pushed to this pull request, that is not the fault of this PR; it also happens without the changes here. I've opened #1916 to fix it. If that pull request is merged, then merging from main or (perhaps better) rebasing this PR's feature branch onto main should allow new Cygwin runs to pass here too.

    @@ -439,9 +439,9 @@ def raise_exc(e: Exception) -> NoReturn:
             # END glob handling
             try:
                 for root, _dirs, files in os.walk(abs_path, onerror=raise_exc):
    -                for rela_file in files:
    +                for relative_fpath in files:

@EliahKagan (Contributor):

Why is the name relative_fpath better than rela_file? Why was this chosen instead of relative_path? If the f in fpath is unimportant, then it should be removed. If it is important, then it should be spelled out. If explicitness is not required, then presumably rela_file is also okay, in which case it should not be changed just to make the spell checker happy.

This applies to most occurrences of relative_fpath, including in other files.

My guess is that this should be relative_path. The nonpublic _items_to_rela_paths method was renamed to _items_to_relative_paths. Assuming that change is good, which I think it is, it seems like relative_fpath should just be relative_path.

@Borda (Contributor, Author):

rela was flagged as a typo, so I found it easier to use the full name without affecting the API.

@EliahKagan (Contributor):

I don't think relative_fpath is a full name. It looks like a typo that is meant to be relative_path without the f.

Maybe it is not a typo. Maybe f is an abbreviation for something that should be spelled out (if important) or omitted (if unimportant).

The key point is that I do not know what relative_fpath means in the places where this PR has introduced it, and I have not been able to figure that out. (I have been able to guess that the f stands for "file," but I am not certain of this, and without knowing the old variable name rela_file, I would likely not even have been able to guess this.) I expect that other current or future readers may also not know what it means.

I recommend changing it, probably to relative_path.
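Part of what makes these names hard to choose is what the loop variable actually holds. With os.walk, each entry in files is a bare name relative to that iteration's root, not to the directory the walk started from. A minimal sketch, using a hypothetical helper (iter_relative_paths is illustrative only, not GitPython code):

```python
import os
import tempfile


def iter_relative_paths(top: str):
    """Yield every file under top as a path relative to top (hypothetical helper)."""
    for root, _dirs, files in os.walk(top):
        for file_name in files:
            # file_name is a bare name such as "a.txt"; it is relative to root only.
            yield os.path.relpath(os.path.join(root, file_name), top)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "a.txt"), "w").close()
    found = sorted(iter_relative_paths(tmp))
print(found)  # ['sub/a.txt'] on POSIX systems
```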

git/objects/util.py (outdated, resolved)
git/refs/symbolic.py (outdated, resolved)
git/remote.py (outdated)

    -# uptodate encoded in control character
    +# up-to-date encoded in control character

@EliahKagan (Contributor) commented May 26, 2024

This change may seem at first glance to be obviously correct, but I think it actually may be wrong, and that if it is to be kept then it requires a specific technical justification.

I think uptodate is a specific technical term in Git. In the Git source code, it often appears capitalized, but it also appears lower-case in multiple places, which also seems to be intentional. As one example, in fetch.c:

		/* uptodate lines are only shown on high verbosity level */
		if (verbosity <= 0 && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
			continue;

It seems like that specific technical meaning is the one relevant here. If this has been verified not to be the case, then the change here is okay. Otherwise, either the change should be undone and uptodate added as a correct spelling, or it should be investigated.

Although this feels minor, making technical terms harder to search for can accumulate and make a codebase difficult to work with. That is both potentially relevant to this specific change, and a potential risk of automated spell-checking.

@Borda (Contributor, Author):

I think in this case making it a command associate would be better
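For context on the term under discussion: the git-fetch documentation describes each updated ref as a line starting with a one-character flag, and "=" marks a ref that was up to date and did not need fetching; per the fetch.c comment quoted above, such lines are only shown at higher verbosity. The sketch below just encodes that documented table. FETCH_FLAGS and describe_fetch_flag are hypothetical names, not GitPython's API, and the flag is passed in directly rather than parsed out of real fetch output.

```python
# Flag characters as listed in the OUTPUT section of the git-fetch documentation.
FETCH_FLAGS = {
    " ": "fast-forward",
    "+": "forced update",
    "-": "pruned",
    "t": "tag update",
    "*": "new ref",
    "!": "rejected or failed to update",
    "=": "up to date",
}


def describe_fetch_flag(flag: str) -> str:
    """Map a single git-fetch flag character to its documented meaning."""
    if len(flag) != 1:
        raise ValueError(f"expected one flag character, got {flag!r}")
    try:
        return FETCH_FLAGS[flag]
    except KeyError:
        raise ValueError(f"unknown fetch flag {flag!r}") from None


print(describe_fetch_flag("="))  # up to date
```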

pyproject.toml (outdated)

    @@ -79,3 +79,9 @@ lint.unfixable = [
     "test/**" = [
         "B018", # useless-expression
     ]
    +
    +[tool.codespell]
    +skip = 'test/fixtures/reflog_*'

@EliahKagan (Contributor) commented May 26, 2024

I don't think any of the files in fixtures that represent test input or expected output should be spell-checked. I think the *.py files in fixtures, which are actually run as code, should be spell-checked, and that other files should not.

It seems to me that the question to ask is, if code appearing inside a fixture were found to have a logic error, should that bug be fixed? A number of fixture files have Ruby code or diffs thereof, but these are just test data. If logic errors in that code (which isn't run) shouldn't be fixed, then either the same files should not be spell-checked, or the justification for spell-checking them should be made clear. The issues with these kinds of changes are:

  1. Churn in test data may make it so that changes to test data that are actually done to improve the tests are hard to identify.
  2. Changes in test data need to be reviewed to evaluate whether they could have any impact on the tests. It is possible, in general, for a change to test data to keep a test passing, while preventing it from catching regressions that it would have caught before the test data changed.

The first concern is minor and may well be overcome by the slight readability improvement of avoiding typos. The second concern is less minor and it seems to me that this is not worth the risk, even if small. Tests can assert things that are affected by the presence or absence of specific strings or that involve specific lengths.
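One concrete way such a dependence can arise: Git names a blob by the SHA-1 of a short header plus its content (in a default SHA-1 repository), so any edit to fixture content, including a spelling fix, yields a different object ID. This sketch computes that ID with hashlib; it does not claim that any particular GitPython test asserts on these values.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)  # "blob <size>" followed by a NUL byte
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
# A "typo fix" in fixture content changes both its length and its object ID.
print(git_blob_sha1(b"uptodate\n") == git_blob_sha1(b"up-to-date\n"))  # False
```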

This also applies to all changes in test/fixtures/diff_mode_only. I have not posted separate comments there.

Edit: See also #1920 (review).
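The rule proposed above (check *.py files in fixtures, skip everything else there) can be sketched as a small predicate. should_spellcheck is a hypothetical helper, and it assumes fixtures live under test/fixtures; test/fixtures/some_script.py is an invented path used only to exercise the .py case.

```python
from pathlib import PurePosixPath


def should_spellcheck(path: str) -> bool:
    """Hypothetical rule: skip files under test/fixtures unless they are Python code."""
    p = PurePosixPath(path)
    in_fixtures = p.parts[:2] == ("test", "fixtures")
    return not in_fixtures or p.suffix == ".py"


for candidate in [
    "git/remote.py",
    "test/fixtures/diff_mode_only",
    "test/fixtures/some_script.py",  # invented path for the .py case
]:
    print(candidate, should_spellcheck(candidate))
```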

@Borda (Contributor, Author):

Yes, all of that sounds reasonable to me. So how about splitting this into two PRs?

  • add the typos check, excluding fixtures
  • revisit typos in fixtures and eventually stop excluding that folder from the typos check

@Byron (Member):

@EliahKagan Did you see this message?

@EliahKagan (Contributor):

This seems like a good approach to me.

pyproject.toml (outdated, resolved)
test/test_refs.py (outdated, resolved)

@Byron (Member) left a comment

I am now officially setting this PR to a state that indicates that some modifications are needed.

@Borda force-pushed the precommit/typos branch from b1aa63d to 2ce013c on July 17, 2024 10:20
git/remote.py (outdated, resolved)
    @@ -52,7 +52,7 @@

     _streams_n_substrings = (
         None,
    -    "steram",
    +    "stream",

@Borda (Contributor, Author):

This is probably linked to fixtures / test data.

@Borda requested review from EliahKagan and Byron on July 17, 2024 10:40

@Borda (Contributor, Author) commented Jul 17, 2024

@EliahKagan @Byron I reverted most of my additional changes, so this now just adds the check and fixes all flagged issues, while also excluding fixtures...

@Byron (Member) left a comment

Thanks a lot for slimming down the PR; it looks good to me!

@Byron merged commit 89822f8 into gitpython-developers:main on Jul 17, 2024
26 checks passed

@EliahKagan (Contributor) left a comment

Thanks!

@Borda deleted the precommit/typos branch on July 17, 2024 17:30
EliahKagan added a commit to EliahKagan/GitPython that referenced this pull request on Jul 24, 2024
Some of the CI tests use WSL. This switches the WSL distribution
from Debian to Alpine, which might be slightly faster. For the way
it is being used here, the main expected speed improvement would be
in how long the image takes to download, as Alpine is smaller.

(The reason for this is thus unrelated to the reason for the Alpine
docker CI test job added in gitpython-developers#1826. There, the goal was to test on a
wider variety of systems and environments, and that runs the whole
test suite in Alpine. This just changes the WSL distro, used by a
few tests on Windows, from Debian to Alpine.)

Two things have changed that, taken together, have unblocked this:

- Vampire/setup-wsl#50 was fixed, so the
  action we are using is able to install Alpine Linux. See:
  gitpython-developers#1917 (review)

- gitpython-developers#1893 was fixed in gitpython-developers#1888. So if switching the WSL distro from
  Debian to Alpine breaks any tests, including by making them fail
  in an unexpected way that raises the wrong exception, we are
  likely to find out.
Labels: none yet
3 participants