Encoding issue with mimic3-server (Latin-1 vs UTF-8) #52

Open
sulivanShu opened this issue Oct 23, 2023 · 0 comments
Labels: bug (Something isn't working)

Hi!

I have an encoding issue with mimic3-server:

$ mimic3 --remote --voice 'en_UK/apope_low' "I don’t speak English" | aplay --quiet
Reading text from stdin...
Traceback (most recent call last):
  File "mimic3.py", line 40, in <module>
  File "mimic3_tts/__main__.py", line 129, in main
  File "mimic3_tts/__main__.py", line 450, in process_lines
  File "mimic3_tts/__main__.py", line 397, in process_line
  File "mimic3_tts/__main__.py", line 587, in get_remote_wav_bytes
  File "requests/api.py", line 115, in post
  File "requests/api.py", line 59, in request
  File "requests/sessions.py", line 587, in request
  File "requests/sessions.py", line 701, in send
  File "requests/adapters.py", line 489, in send
  File "urllib3/connectionpool.py", line 703, in urlopen
  File "urllib3/connectionpool.py", line 398, in _make_request
  File "urllib3/connection.py", line 239, in request
  File "http/client.py", line 1255, in request
  File "http/client.py", line 1300, in _send_request
  File "http/client.py", line 164, in _encode
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 5: Body ('’') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
[582387] Failed to execute script 'mimic3' due to unhandled exception!
aplay: read_header:2931: erreur de lecture

However, there is no issue with mimic3:

$ mimic3 --voice 'en_UK/apope_low' "I don’t speak English" | aplay --quiet
Reading text from stdin...
INFO:mimic3_tts.tts:Loaded voice from /usr/share/mycroft/mimic3/voices/en_UK/apope_low

The error message says: “Use body.encode('utf-8') if you want to send it encoded in UTF-8”, but I don’t know how to do this. I simply run the server with this command:

$ mimic3-server --num-threads 6

I couldn’t find an option to tell the server that the input is UTF-8 encoded. Here are the versions of mimic3 and mimic3-server:

$ mimic3 --version
0.2.3
$ mimic3-server --version
0.1.1

Here are my locales and system:

$ env | grep LANG
LANG=fr_FR.utf8
GDM_LANG=fr_FR.utf8
$ lsb_release -a
LSB Version:    n/a
Distributor ID: Manjaro-ARM
Description:    Manjaro ARM Linux
Release:        23.02
Codename:       n/a

Here is a workaround for the issue:

$ echo "I don’t speak English" | iconv -f UTF-8 -t ISO-8859-1//TRANSLIT | mimic3 --remote --voice 'en_UK/apope_low' | aplay --quiet

This converts UTF-8 strings to ISO-8859-1 (i.e. Latin-1) and transliterates characters that have no Latin-1 equivalent, such as "’", to the closest available character.
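
The same idea can be sketched in Python for anyone who prefers to preprocess the text in a script. This is only an approximation of what `iconv //TRANSLIT` does: the mapping table `PUNCTUATION_MAP` and the function name `to_latin1_safe` are hypothetical, and the table covers just a few typographic characters.

```python
# Hypothetical, hand-picked mapping of typographic punctuation to
# Latin-1-safe replacements. It is NOT the table used by iconv.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (the one in "don’t")
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_latin1_safe(text: str) -> str:
    """Return text that can always be encoded as Latin-1."""
    replaced = text.translate(PUNCTUATION_MAP)
    # Any character still outside Latin-1 is replaced by "?".
    return replaced.encode("latin-1", errors="replace").decode("latin-1")


print(to_latin1_safe("I don’t speak English"))
```

Characters that already exist in Latin-1, such as "é", pass through unchanged, so French text with plain apostrophes is not affected.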

I think this is a bug: mimic3-server should accept UTF-8 input, as mimic3 does without any problem.
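
For reference, the hint in the error message can be sketched in Python. The traceback ends in `http/client.py`, reached from `get_remote_wav_bytes` through `requests`, which suggests that the text is passed as a `str` and encoded as Latin-1 on the client side before it is sent. The snippet below is a minimal sketch using only the standard library, not mimic3's actual code; the function name `build_tts_request` and the URL are assumptions made for illustration, and nothing is sent over the network.

```python
import urllib.request

TEXT = "I don’t speak English"


def build_tts_request(text: str, url: str) -> urllib.request.Request:
    """Build a POST request whose body is explicitly UTF-8 encoded bytes."""
    body = text.encode("utf-8")  # bytes are sent as-is, no Latin-1 step
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


# Encoding the str as Latin-1 (what http.client does for a str body)
# fails on U+2019, the typographic apostrophe in "don’t".
latin1_error = None
try:
    TEXT.encode("latin-1")
except UnicodeEncodeError as exc:
    latin1_error = exc

# Assumed URL, for illustration only; adjust to the actual server address.
request = build_tts_request(TEXT, "http://localhost:59125/api/tts")
print(latin1_error)
print(request.data)
```

With `requests`, the equivalent change would presumably be to pass `data=text.encode("utf-8")` instead of a `str`.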

sulivanShu added the "bug" label on Oct 23, 2023