Combine hyphenation patterns for Serbian Cyrillic and Latin scripts #566
…ript Combine the patterns for Cyrillic and Latin scripts.
You mean the Latin one is currently completely absent I presume? As phrased it sounds a bit like you forgot to delete it. :-)
Pinging @strn @roshavagarga who contributed to #372 for thoughts and approval.
@poire-z I'd say @strn would be able to give a more valid opinion around whether this is something that should be done, as my understanding of Serbian and the cultural connotations of the above change are fairly basic. If it works out-of-the-box and there aren't any cultural reasons not to do this, I don't see an issue. I would note, however, that I'm not sure how the source(s) used for this compare to the one we currently use for Serbian, so possibly something to compare and/or test? (Taken from here)
You are right, they are now absent. When I read a Serbian book written in Latin script, I have to change the language to Croatian. That loads the Croatian patterns, which are based on the same Latin script. Otherwise, there is no hyphenation.
Those are the same patterns, made by Dejan Muhamedagić, used in TeX.
Pinging again @strn - please give us some feedback.
@poire-z, sorry for the late reply. Yes, if the patterns are the same, then they should be used for hyphenating texts in the Serbian language, regardless of how it is written now. However, let me just emphasize and remind you once again that only Serbian Cyrillic is a valid Serbian language alphabet. Usage of the Croatian Latin alphabet comes from the Yugoslav era and is best left there.
As I've already said, this is just a technical matter that removes the need to change languages when reading books typeset in the Latin script. @strn Can you please point to some valid reference that supports your claims?
Includes:
- Russian hyphenation: revert "allow hyphens after не" koreader/crengine#568
- Serbian hyphenation: combine patterns for Cyrillic and Latin scripts koreader/crengine#566
- writeNodeEx(): fix handling of multilines attribute values koreader/crengine#569
  See #12004 (comment).
- Add getBalancedHTML() helper

Also includes:
- kobo: add missing blitbuffer library koreader/koreader-base#1823
This pull request continues the work from pull request #372.
Because Serbian is written in two scripts whose letters occupy different codepoints, it is safe to combine the patterns into one file. That way it doesn't matter which script is used, and even texts that mix both scripts will be hyphenated properly. Only the main part of the language tag in (X)HTML should be consulted to load the appropriate patterns, so sr, sr-Cyrl, sr-Latn, and regional versions of these (like sr_RS) should all load the same pattern file.
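The tag handling described above can be sketched as follows. This is only an illustration, not crengine's actual code: the function names and the pattern file names (`Serbian.pattern`, `Croatian.pattern`) are assumptions made for the example.

```python
import re

# Hypothetical table: primary language subtag -> pattern file name.
# The file names are assumed for illustration only.
PATTERN_FILES = {
    "sr": "Serbian.pattern",   # combined Cyrillic + Latin patterns
    "hr": "Croatian.pattern",
}


def primary_subtag(lang_tag: str) -> str:
    """Return the lowercase main part of a tag such as 'sr-Latn' or 'sr_RS'."""
    # Split on the first '-' or '_' so both separator styles are accepted.
    return re.split(r"[-_]", lang_tag.strip(), maxsplit=1)[0].lower()


def pattern_file_for(lang_tag: str):
    """Look up the pattern file by the main part of the tag only.

    Script and region subtags are ignored; returns None for unknown languages.
    """
    return PATTERN_FILES.get(primary_subtag(lang_tag))
```

With this lookup, `pattern_file_for("sr-Cyrl")`, `pattern_file_for("sr-Latn")` and `pattern_file_for("sr_RS")` all resolve to the same file, so the reader no longer has to switch the book language to Croatian for Latin-script Serbian text.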
This approach is already successfully implemented in ConTeXt.
The patterns have been converted from https://devbase.net/dict-sr/, the same ones used in the LibreOffice extension Serbian Spellchecker.