[BUG] Ignoring encoding error 'utf-8' codec can't decode byte 0xd4 in position 38: invalid continuation byte when playing file with russian file name #283

Open
soredake opened this issue Mar 19, 2024 · 6 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@soredake

Describe the bug

A clear and concise description of what the bug/error is.

Desktop (please complete the following information):

  • OS and Version: Windows 11 Pro
  • Python Version: Python 3.12.2
  • Player and Version: MPV.net 7.1.1.0

To Reproduce

Steps to reproduce the behavior:

  1. Open a file with Russian characters in its name (e.g. Физрук - S01E01 - Episode 1.mkv).
  2. Go to the trakt-scrobbler log.
  3. See the warning: Ignoring encoding error 'utf-8' codec can't decode byte 0xd4 in position 38: invalid continuation byte

Log file


2024-03-19 12:30:16,955 - DEBUG - mpv - file_info - Raw filepath 'C:\\Users\\user\\shared-unruhe\\non-anime\\������\\Season 1\\������ - S01E01 - Episode 1.mkv'
2024-03-19 12:30:16,956 - DEBUG - mpv - utils - System encoding scheme: 'cp1251'
2024-03-19 12:30:16,956 - DEBUG - mpv - utils - UTF8: b'C:\\Users\\user\\shared-unruhe\\non-anime\\\xd0\xa4\xd0\xb8\xd0\xb7\xd1\x80\xd1\x83\xd0\xba\\Season 1\\\xd0\xa4\xd0\xb8\xd0\xb7\xd1\x80\xd1\x83\xd0\xba - S01E01 - Episode 1.mkv'
2024-03-19 12:30:16,956 - DEBUG - mpv - utils - System: b'C:\\Users\\user\\shared-unruhe\\non-anime\\\xd4\xe8\xe7\xf0\xf3\xea\\Season 1\\\xd4\xe8\xe7\xf0\xf3\xea - S01E01 - Episode 1.mkv'
2024-03-19 12:30:16,957 - WARNING - mpv - utils - Ignoring encoding error 'utf-8' codec can't decode byte 0xd4 in position 38: invalid continuation byte
2024-03-19 12:30:16,957 - DEBUG - mpv - file_info - Matched whitelist entry 'C:\\Users\\user\\shared-unruhe\\non-anime'
2024-03-19 12:30:17,171 - DEBUG - mpv - file_info - Guess: MatchesDict({'title': '������', 'season': 1, 'episode': 1, 'episode_title': 'Episode 1', 'container': 'mkv', 'mimetype': 'video/x-matroska', 'type': 'episode'})
2024-03-19 12:30:17,171 - DEBUG - mpv - monitor - action=scrobble
2024-03-19 12:30:17,172 - DEBUG - mpv - monitor - {'state': 2, 'progress': 0.02, 'media_info': {'type': 'episode', 'title': '������', 'season': 1, 'episode': 1}, 'updated_at': 1710844217.1718879}
2024-03-19 12:30:17,172 - DEBUG - scrobbler - scrobbler - Scrobbling start at 0.02% for ������
2024-03-19 12:30:18,063 - INFO - scrobbler - scrobbler - Scrobble start successful for P. E. Teacher S01E01 at 0.02%
2024-03-19 12:30:25,598 - DEBUG - mpv - monitor - action=scrobble
2024-03-19 12:30:25,599 - DEBUG - mpv - monitor - {'state': 0, 'progress': 0.58, 'media_info': {'type': 'episode', 'title': '������', 'season': 1, 'episode': 1}, 'updated_at': 1710844225.598191}
2024-03-19 12:30:25,599 - DEBUG - mpv - mpv - Pipe closed.
2024-03-19 12:30:25,599 - DEBUG - scrobbler - scrobbler - Scrobbling stop at 0.58% for ������
2024-03-19 12:30:25,700 - INFO - mpv - mpv - Unable to connect to MPV. Check ipc path.
2024-03-19 12:30:25,780 - INFO - scrobbler - scrobbler - Scrobble pause successful for P. E. Teacher S01E01 at 0.58%

@iamkroot
Owner

Maybe related to #208 and #205? Try updating to the master branch.

Install from branch:

  1. Stop the scrobbler with trakts stop
  2. Run pipx install --force --pip-args='--force-reinstall' git+https://github.com/iamkroot/trakt-scrobbler.git@master
  3. Start scrobbler with trakts start

I don't have enough time right now to try to reproduce this locally.

@iamkroot iamkroot added bug Something isn't working help wanted Extra attention is needed labels Mar 20, 2024
@soredake
Author

I was already on master, so no changes for me.

@soredake
Author

Same with Ukrainian:

2024-03-24 12:36:54,413 - DEBUG - mpv - file_info - Raw filepath 'C:\\Users\\user\\OneDrive\\Âèäåî\\Captures\\Shin Megami Tensei - Persona 3 FES 2024-03-06 17-56-04.mp4'
2024-03-24 12:36:54,414 - DEBUG - mpv - utils - System encoding scheme: 'cp1251'
2024-03-24 12:36:54,415 - DEBUG - mpv - utils - UTF8: b'C:\\Users\\user\\OneDrive\\\xd0\x92\xd0\xb8\xd0\xb4\xd0\xb5\xd0\xbe\\Captures\\Shin Megami Tensei - Persona 3 FES 2024-03-06 17-56-04.mp4'
2024-03-24 12:36:54,415 - DEBUG - mpv - utils - System: b'C:\\Users\\user\\OneDrive\\\xc2\xe8\xe4\xe5\xee\\Captures\\Shin Megami Tensei - Persona 3 FES 2024-03-06 17-56-04.mp4'
2024-03-24 12:36:54,415 - WARNING - mpv - utils - Ignoring encoding error 'utf-8' codec can't decode byte 0xc2 in position 23: invalid continuation byte

@iamkroot
Owner

Pinging @Sp3EdeR to see if they have any insights on this.

@iamkroot
Owner

This is happening in mpv, though I'm guessing we have to do something similar to what we did for mpchc...

@Sp3EdeR
Contributor

Sp3EdeR commented Mar 27, 2024

The error is raised because D4 (11010100) has the bit pattern of a two-byte UTF-8 lead byte, yet the byte that follows it, E8 (11101000), does not begin with 10 in binary, as the UTF-8 standard requires of every continuation byte. So this is a clear hint that the path is not transmitted in UTF-8. The logs say that the system encoding is CP1251, the Cyrillic code page, which makes sense for this system. In this single-byte code page, the D4 byte represents the Ф character, which matches the file name @soredake pasted. This clearly shows that somewhere in the script we try to decode with UTF-8 instead of CP1251.
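
To make the byte-level argument concrete, here is a minimal Python sketch that uses only the folder-name bytes copied from the "System:" line of the first log. It is an illustration of the decoding failure, not code taken from trakt-scrobbler; the variable names are made up for this example.

```python
# Folder-name bytes copied from the "System:" line of the first log.
system_bytes = b"\xd4\xe8\xe7\xf0\xf3\xea"

# Decoding as UTF-8 fails: D4 looks like a two-byte lead byte, but the
# next byte (E8) does not carry the 10xxxxxx continuation prefix.
try:
    system_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)
print(error_message)

# Decoding with the code page reported in the log works.
decoded = system_bytes.decode("cp1251")
print(decoded)

# Bit patterns of the first two bytes.
bit_patterns = [f"{byte:02X} = {byte:08b}" for byte in system_bytes[:2]]
print(bit_patterns)
```

The same check on the second log's bytes (starting with C2 at position 23) fails in the same way.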

The attached logs do show both the UTF-8 and the System byte strings, though. In the second log, both lines correctly represent the string Видео in their respective encodings. So the logged data, at least, looks correct.
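
This can be checked directly with the bytes from the second log. The sketch below is only a verification aid with illustrative variable names. The last step is an observation rather than a diagnosis: reading the System bytes as Latin-1 happens to reproduce the garbled "Raw filepath" text from that log, which may hint at where the bytes are misread, but does not prove it.

```python
# Folder-name bytes copied from the "UTF8:" and "System:" lines of the second log.
utf8_bytes = b"\xd0\x92\xd0\xb8\xd0\xb4\xd0\xb5\xd0\xbe"
system_bytes = b"\xc2\xe8\xe4\xe5\xee"

# Each byte string decodes to the same text in its own encoding.
from_utf8 = utf8_bytes.decode("utf-8")
from_system = system_bytes.decode("cp1251")
print(from_utf8, from_system)

# Assumption for illustration: reading the CP1251 bytes as Latin-1
# yields the mojibake seen in the "Raw filepath" line of the same log.
mojibake = system_bytes.decode("latin-1")
print(mojibake)
```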

My first guess is that somewhere in the code we use MPV.net output that is in the encoding set in Windows. (Windows supports Unicode as UTF-16, but each program chooses whether to use UTF-16 or the encoding set by the system.) I think the easiest approach is to set the Windows code page to some legacy code page and reproduce the issue with MPV.net and a suitably named folder. Debugging from there should make it easy to find where UTF-8 is assumed instead of the correct system encoding.

The difference from MPC-HC is that MPC-HC specifically uses UTF-8 on its web API, so its encoding does not change from computer to computer. Here, every computer can be expected to have a different setup, so the encoding should be read (from the system or the MPV.net interface) instead of assumed.
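
As a rough sketch of what "read instead of assumed" could look like, the hypothetical helper below tries UTF-8 first and falls back to the system's preferred encoding. The function name `decode_player_bytes` and its `fallback_encoding` parameter are invented for this example and are not part of trakt-scrobbler; where the real fix belongs still has to be found by debugging.

```python
import locale
from typing import Optional


def decode_player_bytes(raw: bytes, fallback_encoding: Optional[str] = None) -> str:
    """Decode bytes received from a player (hypothetical helper).

    Tries UTF-8 first, then falls back to the given encoding, or to the
    system's preferred encoding when none is given.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        encoding = fallback_encoding or locale.getpreferredencoding(False)
        # errors="replace" keeps the scrobbler running on unexpected bytes.
        return raw.decode(encoding, errors="replace")


# Usage with the bytes from the first log, passing cp1251 explicitly so the
# result does not depend on the machine running the example.
title = decode_player_bytes(b"\xd4\xe8\xe7\xf0\xf3\xea", "cp1251")
print(title)
```

One caveat with this try-then-fall-back design: some byte sequences in a legacy code page are also valid UTF-8 by coincidence, so knowing the player's actual output encoding would be more reliable than guessing.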
