Handle encodings better; make the sum type "public"
Windows does not have a direct analogue of LANG=C or LC_ALL=C. Some programs give these variables special treatment, but they do not affect how the localized behavior of the Windows API operates. In particular, the bash.exe WSL wrapper, as well as wsl.exe and wslconfig.exe, still produce their own localized messages (for errors not originating in a running distribution) when these variables are set. Windows is also localized substantially through localized editions of Windows itself, so changing language settings, even system-wide, does not always reproduce what many or most Windows users of a particular language would experience.

Various encodings may appear when bash.exe is the WSL wrapper but gives its own error message. Such a message is often in UTF-16LE, which Windows uses internally, and preserves any localization. That is the better-behaved scenario and was already detected; this commit moves, but does not change, the code for it.

When the message is not UTF-16LE, it was previously treated as UTF-8. Because the default strict error handling was used, decoding could fail during test discovery on some localized setups, preventing every test in test_index from running, including the majority that are unrelated to hooks. This commit fixes that with better detection that should decode the messages correctly most of the time, should in practice decode them well enough to tell (by the aka.ms URL) whether the message is complaining that no distribution is installed all(?) of the time, and should avoid breaking unrelated tests even when neither can be done.

On GitHub Actions CI, an English non-UTF-16LE message appears when no distribution is installed. This situation was tested with other languages in a virtual machine on a development machine. When bash.exe produces the message, it is sometimes output in a narrow character set; this appears to be a limitation of the bash.exe wrapper.
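The decoding strategy described above can be illustrated with a minimal sketch. It is not the code from this commit: the function names, the use of a UTF-16LE-encoded CRLF as the sign of UTF-16LE output, and the default of `locale.getpreferredencoding(False)` as the narrow encoding are all assumptions made for illustration. Which narrow encoding bash.exe actually uses is not established here. What the sketch does follow from the text is the overall shape: prefer UTF-16LE when it is detected, otherwise try a narrow encoding, never raise, and look for the ASCII aka.ms URL, which survives even when localized characters have become literal "?" characters.

```python
import locale
from typing import Optional

# Assumption: a Windows line ending (CRLF) encoded as UTF-16LE is taken as the
# sign of UTF-16LE output. UTF-16LE output containing no CRLF is not detected.
UTF16LE_CRLF = "\r\n".encode("utf-16-le")  # b"\r\x00\n\x00"

# Per the text, this URL appears when no distribution is installed.
NO_DISTRO_URL = "https://aka.ms/wslstore"


def decode_bash_output(raw: bytes, narrow_encoding: Optional[str] = None) -> str:
    """Decode bash.exe output on a best-effort basis (hypothetical helper)."""
    if UTF16LE_CRLF in raw:
        # WSL's own messages are often UTF-16LE, which Windows uses internally.
        return raw.decode("utf-16-le", errors="replace")

    # Otherwise guess a narrow encoding. Defaulting to the locale's preferred
    # encoding (on Windows, usually the ANSI code page) is an assumption.
    encoding = narrow_encoding or locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Never raise: fall back to UTF-8, substituting U+FFFD for bad bytes.
        return raw.decode("utf-8", errors="replace")


def reports_no_distribution(raw: bytes) -> bool:
    """Guess whether the output says no distribution is installed."""
    # The URL is ASCII, so it survives even if localized text became "?".
    return NO_DISTRO_URL in decode_bash_output(raw)


utf16_msg = "(localized text)\r\nhttps://aka.ms/wslstore\r\n".encode("utf-16-le")
narrow_msg = b"????\r\nhttps://aka.ms/wslstore\r\n"
print(reports_no_distribution(utf16_msg))  # True
print(reports_no_distribution(narrow_msg))  # True
```

Falling back to UTF-8 with `errors="replace"` is what keeps a decoding problem from escaping into test discovery; the cost is that undecodable bytes show up as U+FFFD rather than as an error.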
As an example of the bash.exe wrapper's narrow-character-set limitation, with a zh-CN version of Windows (and with the language not changed to anything else), a localized message in Simplified Chinese was correctly printed when running wsl.exe, but running bash.exe produced literal "?" characters in place of Chinese characters. (This was not a display or font issue, and they were literal "?" characters, not Unicode replacement characters.) The change here does not overcome that; the literal "?" characters will be included. But "https://aka.ms/wslstore" is still present if it is an error about not having any distributions, so the correct status is still inferred.

For more information on code pages in Windows, see: https://serverfault.com/a/836221

The following alternatives to all or part of the approach taken here were considered but, at least for now, not adopted, because they would not clearly be simpler or more correct:

- Abandoning this "run bash.exe and see what it shows us" approach altogether and instead reimplementing the rules CreateProcessW uses, to determine whether the bash.exe the system finds is the one in System32 and, if so, checking that executable's metadata to determine whether it is the WSL wrapper. I believe that would be even harder to do correctly than it seems; the behavior noted in the WinBashStatus docstring and recent commit messages is not the whole story. The documented search order for CreateProcessW seems not to be followed in some circumstances. One is that the Windows Store version of Python seems always to skip the usual System32 search that precedes searching directories in PATH. There may also be some subtleties in which directories 32-bit builds search.
- Using chardet. Although the chardet library is excellent, it is not clear that the code needed to bridge this highly specific use case to it would be simpler than the code currently in use. Some of the work might still need to be done by other means; when I tried it for this, it did not detect the English UTF-16LE messages as UTF-16LE. (They are often also valid UTF-8, because interleaved null characters, while strange, are permitted.)
- Also calling wsl.exe and/or wslconfig.exe. It is still necessary to call bash.exe, because it may not be the WSL bash, even on a system with WSL fully set up. Furthermore, those tools' output seems to vary in some complex ways, too. Using only one subprocess for detection seemed simplest. Even using "wsl --list" would introduce significant additional logic: sometimes its output is a list of distributions, sometimes it is an error message, and if WSL is not set up it may be a help message.
- Using the Windows API to check for WSL distributions. https://learn.microsoft.com/en-us/windows/win32/api/wslapi/ does not currently include functions to list registered distributions.
- Attempting to get wsl.exe to produce an English message using Windows API techniques like those used in Locale Emulator. This would be complicated, somewhat unintuitive, and error-prone to do in Python, and I am not sure how well it would work on a system that does not have an English language pack installed.
- Checking on disk for WSL distributions in the places they are most often expected to be. This would intertwine WinBashStatus with deep details of how WSL actually operates, and such details seem likely to change in the future. However, there may be a more straightforward way to do this (one that is at least as correct and that remains transparent to debug). Especially if the WinBashStatus class remains in test_index for long (rather than just aiding in debugging existing test failures and informing subsequent design decisions for making commit hooks work more robustly on Windows in GitPython), this may be worth revisiting.
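The parenthetical in the chardet item, that UTF-16LE messages are often also valid UTF-8, can be checked directly. In this small sketch, the sample string and the `is_valid_utf8` helper are illustrative and are not taken from the commit. It encodes ASCII text as UTF-16LE and confirms that a strict UTF-8 decode accepts it, which is why a UTF-8 check alone cannot tell the two apart. For ASCII text this always holds.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 with strict error handling."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


text = "Sample text."
as_utf16le = text.encode("utf-16-le")

# Each ASCII character becomes its own byte followed by a NUL byte, and
# NUL (U+0000) is itself a valid one-byte UTF-8 sequence.
print(as_utf16le[:6])  # b'S\x00a\x00m\x00'
print(is_valid_utf8(as_utf16le))  # True

# Decoding "succeeds" but yields the text with NULs interleaved.
print(as_utf16le.decode("utf-8") == text)  # False
print(as_utf16le.decode("utf-8").replace("\x00", "") == text)  # True
```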
Thus, renaming _WinBashStatus to WinBashStatus is *not* meant to make it a "stable" part of the test suite. It is done instead because:

- Like HOOKS_SHEBANG, it makes sense to import it when figuring out how the tests work or when debugging them.
- When debugging, the intent is to import it, call check(), and examine the resulting `process` and `message` information, at least in the CheckError case.
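To make that debugging workflow concrete, here is a self-contained, hypothetical stand-in for a sum type of this kind, written with dataclasses. Only the `CheckError` case and its `process` and `message` information come from the text above; `check()` itself is not modeled. The other case names, the `describe` helper, and storing `process` as a `subprocess.CompletedProcess` are assumptions for illustration, and the real WinBashStatus may be structured differently.

```python
import subprocess
from dataclasses import dataclass
from typing import Union


# Hypothetical cases of a sum type: every status is exactly one of these.
@dataclass(frozen=True)
class Native:
    """bash.exe is not the WSL wrapper (hypothetical case)."""


@dataclass(frozen=True)
class WslNoDistro:
    """bash.exe is the WSL wrapper, but no distribution is installed (hypothetical)."""

    process: subprocess.CompletedProcess
    message: str


@dataclass(frozen=True)
class CheckError:
    """Running bash.exe or interpreting its output failed."""

    process: subprocess.CompletedProcess
    message: str


Status = Union[Native, WslNoDistro, CheckError]


def describe(status: Status) -> str:
    """Summarize a status, exposing the details kept for the error case."""
    if isinstance(status, CheckError):
        # Keeping the process around lets a debugger see exactly what ran.
        proc = status.process
        return f"check failed (exit {proc.returncode}): {status.message}"
    if isinstance(status, WslNoDistro):
        return "WSL wrapper with no installed distribution"
    return "not the WSL wrapper"


proc = subprocess.CompletedProcess(args=["bash.exe"], returncode=1, stdout=b"", stderr=b"")
print(describe(CheckError(proc, "could not decode output")))
# check failed (exit 1): could not decode output
```

In an actual debugging session one would import WinBashStatus from test_index and call its check() instead; the stand-in only shows what examining the `CheckError` case can look like.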