Fix missing encoding for Windows with non-utf-8 code page #45

Open
wants to merge 6 commits into
base: master
from


anion0278

This change specifies the code page used for string decoding on Windows. When decode() is called without the encoding parameter on a Windows machine with a non-UTF-8 code page (the "Language for non-Unicode programs" setting under Region, for example cp852), the following exception occurs:

colcon.colcon_core.event_reactor ERROR Exception in event handler extension 'console_stderr': 'utf-8' codec can't decode byte 0xe1 in position 112: invalid continuation byte
Traceback (most recent call last):
  File "c:\opt\ros\foxy\x64\lib\site-packages\colcon_core\event_reactor.py", line 78, in _notify_observers
    retval = observer(event)
  File "c:\opt\ros\foxy\x64\lib\site-packages\colcon_output\event_handler\console_stderr.py", line 49, in __call__
    b''.join(
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 112: invalid continuation byte

The code page number is obtained once, in the constructor of ConsoleCohesionEventHandler. The change applies only to Windows.
On other platforms the encoding remains utf-8.
The code page is acquired through a subprocess because sys.stdout.encoding incorrectly returns utf-8 while the actual encoding is different.
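
To illustrate the failure mode described above, here is a minimal sketch, not the code from this PR. The helper name `decode_console_bytes` and the sample text are hypothetical; the sketch only assumes that the captured bytes were written in cp852 while the reader decodes them as UTF-8.

```python
def decode_console_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Decode captured process output with an explicitly chosen encoding.

    Hypothetical helper for illustration; not part of colcon.
    """
    return data.decode(encoding)


# Bytes as a console using code page 852 would emit them.
text = "Soubor nenalezen: á"
raw = text.encode("cp852")

# Decoding with the default (UTF-8) fails on the non-ASCII byte.
try:
    decode_console_bytes(raw)
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the code page that produced the bytes round-trips cleanly.
decoded = decode_console_bytes(raw, "cp852")
```

The point is only that the decoder has to be told which code page produced the bytes; how that code page is discovered is a separate question.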

This should be a simple fix for the following issues:
ms-iot/ROSOnWindows#214 (comment)
ros2/ros2_documentation#654
https://answers.ros.org/question/351567/ros2-colon-building-turtlesim-package-failed/

anion0278 added 2 commits May 5, 2023 12:24
Code page number is obtained once at ctor of the ConsoleCohesionEventHandler.
The change applies only to Windows OS.
For other OS the encoding remains the same "utf-8".
@Michaelmlh

Well done! I met the same problem and solved it with this method.
But the change requires more consideration. When the new function get_encoding gets the Active Code Page via chcp, the result will be different depending on the Active Code Page. So the replace('Active code page: ', '') might not work.

@anion0278
Author

@Michaelmlh could you please give some examples of what results chcp might produce and what alternative solutions are available? I'm basing the current solution on this StackOverflow answer, and it seems to do the job, or at least I'm not aware of any cases where it would return something unparseable.

@Michaelmlh

@anion0278
When I run chcp, I get "活动代码页: 936" instead of "Active code page: 936", which causes an error when I run colcon build --event-handlers console_cohesion+ with this method.
I think the different Windows display language causes this.

@Michaelmlh

Maybe you should extract only the numeric part of the returned string; that avoids depending on the first part of the string.

@anion0278
Author

Thanks for pointing out this case! I will try to solve the issue with a regex, something like .+: (.+), and then use the first group. I will make the changes now.
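
A locale-independent parse along the lines discussed above could look like the following sketch. It is an illustration only and may differ from what the PR actually does: the function name `parse_chcp_output`, the `default` parameter, and the choice to take the last run of digits (rather than the `.+: (.+)` pattern) are all assumptions made for this example.

```python
import re


def parse_chcp_output(output: str, default: str = "utf-8") -> str:
    """Return a Python codec name derived from the text printed by `chcp`.

    Hypothetical helper: it ignores the localized label and keeps only
    the last run of digits, so trailing punctuation does not matter.
    """
    numbers = re.findall(r"\d+", output)
    if not numbers:
        # Nothing that looks like a code page number; fall back.
        return default
    code_page = numbers[-1]
    # Code page 65001 is Windows' identifier for UTF-8.
    if code_page == "65001":
        return "utf-8"
    return "cp" + code_page


english = parse_chcp_output("Active code page: 852")
chinese = parse_chcp_output("活动代码页: 936")
```

Matching on digits alone avoids any assumption about the label text or the separator, which is the part that varies with the Windows display language.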

@anion0278
Author

@Michaelmlh I have incorporated changes to make the parsing more universal. Could you please test it on your machine?

@Michaelmlh

@anion0278
It works! And Thx for solving the problem. When I first learned colcon, it took me a lot of time.

2 participants