Transliteration from Arabic not working for continuous text #7

Open
twardoch opened this issue Aug 11, 2021 · 4 comments

@twardoch
Collaborator

Transliteration from Arabic does not work for continuous text; it only works for single characters separated by spaces. The Arabic Wiktionary module is fairly complex, so we need to investigate and probably add some special processing.

@kbatsuren
Owner

Should I implement preprocessing and postprocessing functions in this case? That is, tokenize the continuous text in preprocessing and concatenate the transliteration results in postprocessing.
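For illustration, the split-then-concatenate idea could look roughly like the sketch below. It is not code from this project: `transliterate_continuous`, `translit_unit`, and the three-letter toy table are hypothetical names made up for the example, and `translit_unit` merely stands in for whatever single-character call currently works.

```python
import re
from typing import Callable

def transliterate_continuous(text: str, translit_unit: Callable[[str], str]) -> str:
    """Split text into characters, transliterate each one, and rejoin.

    `translit_unit` is a hypothetical stand-in for a transliterator that
    only handles one character at a time.
    """
    # Preprocessing: split on whitespace runs, keeping the runs themselves
    # (capturing group) so the original spacing survives.
    pieces = re.split(r"(\s+)", text)
    out = []
    for piece in pieces:
        if not piece or piece.isspace():
            # Empty strings and whitespace pass through unchanged.
            out.append(piece)
        else:
            # Feed each character separately, then concatenate (postprocessing).
            out.append("".join(translit_unit(ch) for ch in piece))
    return "".join(out)

# Toy table for demonstration only: kaf, ta, ba.
TOY_TABLE = {"\u0643": "k", "\u062a": "t", "\u0628": "b"}

def toy_unit(ch: str) -> str:
    # Unknown characters are returned unchanged.
    return TOY_TABLE.get(ch, ch)

result = transliterate_continuous("\u0643\u062a\u0628 \u0643\u062a\u0628", toy_unit)
print(result)  # ktb ktb
```

One caveat with this approach: splitting into single code points separates vowel diacritics from the letters they belong to, so a transliterator that relies on context would probably lose information this way.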

@twardoch
Collaborator Author

I think it’d be best to find out WHY it’s happening. There are multiple modules:

ar-translit has an unusual tr function: function export.tr(text, lang, sc, omit_i3raab, gray_i3raab, force_translit).

I could try to find out how to deal with this, or you might :)

We ought

@skalyan91

I would add that when the language is set as fas (Persian), even single letters are not transliterated.

@twardoch
Collaborator Author

Yeah, there are a few different Arabic-script transliterators, and Arabic script as a whole needs some special handling in our Python code.

