[fix] Reading utf8 encoded source code from windows stdin will throw UnicodeEncodeError #430

Open
wants to merge 7 commits into
base: main
from

Conversation

EmbolismSoil

The old version of the fix_code function uses stdin.encoding as the default input encoding, but that is not necessarily the encoding of the input file. For example, suppose you edit a UTF-8 encoded file in Vim on Windows and then pipe the contents of the Vim buffer to autopep8. Because stdin.encoding on Windows (gb2312 by default in my environment) differs from the encoding of the Vim buffer contents, a UnicodeEncodeError is thrown.
The fix is to use stdin only as a byte input pipe: autopep8 reads the source code to be formatted from stdin as bytes and then detects the encoding dynamically.
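
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the helper name `read_source_from_bytes` is hypothetical, and it assumes that the PEP 263 rules implemented by the standard library's `tokenize.detect_encoding` (BOM or coding cookie, otherwise UTF-8) are an acceptable way to detect the encoding.

```python
import io
import tokenize


def read_source_from_bytes(stream):
    """Read a binary stream and decode it as Python source.

    Hypothetical helper for illustration only. The encoding is taken
    from a BOM or a PEP 263 coding cookie, and defaults to UTF-8.
    """
    raw = stream.read()
    # detect_encoding expects a readline callable that yields bytes.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)
```

A caller would pass `sys.stdin.buffer` rather than `sys.stdin`, so that the console code page never takes part in decoding. Note that this only handles sources that are UTF-8 or that declare their encoding; a GBK file without a coding cookie would still fail to decode.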

@coveralls

Coverage Status

Coverage increased (+0.004%) to 98.693% when pulling 14442a6 on EmbolismSoil:master into 72821e7 on hhatto:master.

@hhatto hhatto self-requested a review September 19, 2018 07:45
@hhatto
Owner

hhatto commented Sep 19, 2018

Thanks for contribution.

Could you tell us more detailed information on the environment?

  • OS (detail of Windows version, etc...)
  • Python Version

@EmbolismSoil
Author


Windows-7-6.1.7601-SP1 64-bit
Python 3.6.6 64-bit

@AllanChain

AllanChain commented Jul 10, 2019

I got this error yesterday when using Vim ALE to fix a UTF-8 Python script and attempted to solve it, only to find that encoding is quite annoying. I tried EmbolismSoil's solution, but it isn't perfect. Here is my test; the result is annoying but interesting.

  • Before merging EmbolismSoil's changes
D:\Desktop>echo print("循环") > a.py

D:\Desktop>type a.py
print("循环")

D:\Desktop>autopep8 - < a.py
print("ѭ��")

# In MSYS2
 /d/Desktop  echo 'print("循环")' > b.py
 /d/Desktop  cat b.py
print("循环")
 /d/Desktop  autopep8 - < b.py
Traceback (most recent call last):
  File "d:\study\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\study\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Study\Python\Scripts\autopep8.exe\__main__.py", line 9, in <module>
  File "d:\study\python\lib\site-packages\autopep8.py", line 4192, in main
    fix_code(sys.stdin.read(), args, encoding=encoding))
UnicodeEncodeError: 'gbk' codec can't encode character '\udcaa' in position 8: illegal multibyte sequence
 ✘  /d/Desktop  autopep8 -i b.py
 /d/Desktop  cat b.py
print("循环")
 /d/Desktop  file a.py
a.py: ISO-8859 text, with CRLF line terminators
 /d/Desktop  file b.py
b.py: UTF-8 Unicode text

  • After merging EmbolismSoil's changes
D:\Desktop\Program\Gits\autopep8>autopep8.py - < D:\Desktop\a.py
Traceback (most recent call last):
  File "D:\Desktop\Program\Gits\autopep8\autopep8.py", line 4252, in <module>
    sys.exit(main())
  File "D:\Desktop\Program\Gits\autopep8\autopep8.py", line 4209, in main
    stdout.write(fix_code(stdin, args))
  File "D:\Desktop\Program\Gits\autopep8\autopep8.py", line 3354, in fix_code
    return fix_lines(source.readlines(), options=options)
  File "D:\Study\Python\lib\codecs.py", line 618, in readlines
    data = self.read()
  File "D:\Study\Python\lib\codecs.py", line 504, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 9: invalid start byte

D:\Desktop\Program\Gits\autopep8>autopep8.py - < D:\Desktop\b.py
print("ѭ��")

I'm using Windows 10 Version 1903 64-bit and Python 3.7.3 64-bit, with the latest autopep8 cloned.
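
The two failure modes in the transcripts can be reproduced from raw bytes, without a Windows console. The sketch below rests on assumptions: that cmd.exe wrote a.py in the GBK code page, that MSYS2 wrote b.py as UTF-8 (which matches the `file` output above), and that stdin was decoded with the `surrogateescape` error handler, as the `'\udcaa'` in the first traceback suggests. The helper `try_decode` is hypothetical.

```python
SOURCE = 'print("循环")'

# Assumption: a.py holds GBK bytes and b.py holds UTF-8 bytes.
gbk_bytes = SOURCE.encode("gbk")
utf8_bytes = SOURCE.encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# GBK bytes forced through the UTF-8 codec: the first two bytes of the
# Chinese text happen to form a valid UTF-8 sequence, the third does not.
gbk_as_utf8 = try_decode(gbk_bytes, "utf-8")

# UTF-8 bytes decoded as GBK with surrogateescape: undecodable bytes
# become lone surrogates, which a strict GBK encode later rejects.
utf8_as_gbk = utf8_bytes.decode("gbk", errors="surrogateescape")

print(repr(gbk_as_utf8))
print(repr(utf8_as_gbk))
```

The first result is a decode error at byte position 9, as in the "after merging" traceback for a.py. The second contains `'\udcaa'` at index 8, the character that the "before merging" traceback for b.py reports as unencodable.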

5 participants