Add extra Arabic diacritic and TTS models #141

Open
Kentoseth opened this issue Jun 4, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

Kentoseth commented Jun 4, 2024

Hi there,

Thanks again for this wonderful project. I think we previously discussed that the format of the models for your app should be .ort. Fortunately for us, we now have more of these for Arabic.

Here are 2 diacritic models in .onnx format:

https://github.com/nipponjo/arabic_vocalizer

(I found some limitations with libtashkeel and opened an issue with the author to clarify: mush42/libtashkeel#2 )

I think the only complicated part here will be in the selection option (not present) of the vocalizer model. Right now it seems to default to the only model available.

And here is the .onnx TTS model:

https://github.com/nipponjo/tts_arabic

I wasn't able to find the model file in the repo, though.

(I don't know if the .onnx format will be an issue, as it is an intermediate model and not the production option)

mkiol (Owner) commented Jun 5, 2024

I'm very happy that there is more Arabic support :) I will definitely check out these models.

> I think the only complicated part here will be in the selection option (not present) of the vocalizer model. Right now it seems to default to the only model available.

Yes, this is a missing part and has to be implemented.

> I don't know if the .onnx format will be an issue

No, it is not an issue. ONNX is used by Piper and Mimic 3, so all the needed libraries are already integrated and packed into the Flatpak package.
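For illustration only, the missing vocalizer selection option could look roughly like the sketch below. It is not Speech Note code: the `VocalizerModel` type, the `CATALOG` list, the model IDs, and `select_vocalizer` are all hypothetical names, and the catalog simply mirrors the repositories linked in this thread. The idea is to keep the current behaviour (use the default model) when nothing is selected, and to pick a specific model when the user asks for one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VocalizerModel:
    model_id: str  # hypothetical identifier, not a real Speech Note model ID
    source: str    # repository the model comes from


# Hypothetical catalog; entries only mirror the repositories linked in this thread.
CATALOG: List[VocalizerModel] = [
    VocalizerModel("libtashkeel", "https://github.com/mush42/libtashkeel"),
    VocalizerModel("arabic_vocalizer_a", "https://github.com/nipponjo/arabic_vocalizer"),
    VocalizerModel("arabic_vocalizer_b", "https://github.com/nipponjo/arabic_vocalizer"),
]


def select_vocalizer(
    catalog: List[VocalizerModel], requested_id: Optional[str] = None
) -> VocalizerModel:
    """Return the requested model, or the first catalog entry if none is requested."""
    if not catalog:
        raise ValueError("no vocalizer models available")
    if requested_id is None:
        # Same as the current behaviour: fall back to the default model.
        return catalog[0]
    for model in catalog:
        if model.model_id == requested_id:
            return model
    raise KeyError(f"unknown vocalizer model: {requested_id}")
```

Calling `select_vocalizer(CATALOG)` returns the default entry, while `select_vocalizer(CATALOG, "arabic_vocalizer_b")` returns that specific model. An unknown ID raises `KeyError` rather than silently falling back, so a stale setting is noticed.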

mkiol added the enhancement label Jun 5, 2024
Kentoseth (Author) commented

I've been discussing this with the libtashkeel author: mush42/libtashkeel#2 (comment)

He informed me that the model you are using from piper-phonemize is an MVP model, and that he has since replaced it with a better one.

It may be best to drop the MVP model entirely and use the .onnx available here:

https://github.com/mush42/libtashkeel/blob/main/libtashkeel_base/data/ort/model.onnx

To summarize, if you drop the MVP model, then there will be three new diacritics models available and one new Arabic TTS model for the app.

mkiol (Owner) commented Jun 7, 2024

Thanks a lot for all the insights!

Indeed, Speech Note currently uses a C++ re-implementation of tashkeel borrowed from the Piper project. This version doesn't work with the latest ONNX model. To enable it, I need to integrate the newest libtashkeel. The problem is that libtashkeel is written in Rust, so I need to introduce a new compiler into my toolchain. It is a lot of hassle, but it is perfectly doable. I will try to do something for the next version (or the one after that).

2 participants