Speller does not recognize words with touching full stop #3

Open
rueter opened this issue Jan 4, 2023 · 7 comments

@rueter
Member

rueter commented Jan 4, 2023

The Erzya speller does not recognize a word when a full stop immediately follows it.

This may have something to do with the Cyrillic letter range.

Лов куцятне палыть чеерь толкс...
Аволь вейке монгак правтынь сельведь,
Кода совинь Мавзолеенть потс.

Весе содыть, кие течи кулось,
Ансяк тенень кемемс пек стака.

Here a full stop directly after a word, whether a single ‹.› or ‹...›, causes the word not to be recognized.
If a space is inserted before the full stop, the word form is recognized.

@snomos
Member

snomos commented Jan 5, 2023

The comma does not seem to be an issue. Can you confirm that, @rueter?

@bbqsrc
Member

bbqsrc commented Jan 5, 2023

Tokeniser seems to be fine. Bug report needs to be more specific. Which platform and in which software are you seeing the issue?

     Running `target/debug/divvunspell tokenize 'Лов куцятне палыть чеерь толкс...
Аволь вейке монгак правтынь сельведь,
Кода совинь Мавзолеенть потс.

Весе содыть, кие течи кулось,
Ансяк тенень кемемс пек стака.'`
   0: "Лов"
   6: " "
   7: "куцятне"
  21: " "
  22: "палыть"
  34: " "
  35: "чеерь"
  45: " "
  46: "толкс"
  56: "."
  57: "."
  58: "."
  59: "
"
  60: "Аволь"
  70: " "
  71: "вейке"
  81: " "
  82: "монгак"
  94: " "
  95: "правтынь"
 111: " "
 112: "сельведь"
 128: ","
 129: "
"
 130: "Кода"
 138: " "
 139: "совинь"
 151: " "
 152: "Мавзолеенть"
 174: " "
 175: "потс"
 183: "."
 184: "
"
 185: "
"
 186: "Весе"
 194: " "
 195: "содыть"
 207: ","
 208: " "
 209: "кие"
 215: " "
 216: "течи"
 224: " "
 225: "кулось"
 237: ","
 238: "
"
 239: "Ансяк"
 249: " "
 250: "тенень"
 262: " "
 263: "кемемс"
 275: " "
 276: "пек"
 282: " "
 283: "стака"
 293: "."
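
The numbers in the output above are consistent with UTF-8 byte offsets rather than character counts: each Cyrillic letter takes two bytes, so the three-letter "Лов" is followed by a space at offset 6. The sketch below reproduces those offsets for the first line of the sample text. It assumes a simple regex split (runs of word characters, everything else one character at a time); this is an illustration only, not how divvunspell itself tokenises, and the function name `tokenize_with_byte_offsets` is hypothetical.

```python
import re


def tokenize_with_byte_offsets(text):
    """Return (byte_offset, token) pairs; offsets count UTF-8 bytes.

    Assumption: a plain regex split stands in for the real tokeniser.
    """
    pairs = []
    offset = 0
    # \w+ groups letters and digits into one word token; any other single
    # character (space, newline, punctuation) becomes its own token.
    for match in re.finditer(r"\w+|\W", text):
        token = match.group()
        pairs.append((offset, token))
        offset += len(token.encode("utf-8"))
    return pairs


first_line = "Лов куцятне палыть чеерь толкс..."
tokens = tokenize_with_byte_offsets(first_line)
for offset, token in tokens:
    print(f"{offset:4}: {token!r}")
```

With this split, the word "толкс" and the three full stops come out as separate tokens, matching the point made in the comment that tokenisation by itself keeps the full stop apart from the word.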

@rueter
Member Author

rueter commented Jan 6, 2023

> The comma does not seem to be an issue. Can you confirm that, @rueter?

Yes, I confirm that there is no problem with commas or even question marks. The screenshot shows the version of the speller.
[Screenshot 2023-01-06 at 5 09 06]

@rueter
Member Author

rueter commented Jan 6, 2023

> Tokeniser seems to be fine. Bug report needs to be more specific. Which platform and in which software are you seeing the issue?


I am using an M2 Mac running macOS Ventura 13.0.1, with LibreOffice. The language is Erzya (myv):
Version: 7.3.4.2 / LibreOffice Community
Build ID: 728fec16bd5f605073805c3c9e7c4212a0120dc5
CPU threads: 8; OS: Mac OS X 13.0.1; UI render: default; VCL: osx
Locale: myv-RU (myv_FI.UTF-8); UI: en-US
Calc: threaded

The problem occurs with the full stop, both ‹.› and ‹...›, when it directly follows a word.
The comma, question mark, exclamation mark, quotation marks, parentheses, semicolon and colon do NOT cause a problem.
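
To narrow the report down, it can help to paste one test line per punctuation mark into the application being checked. The sketch below only generates such lines; it does not call any speller. The word form is taken from the sample text in this issue, the expected results are copied from the report above, and the names `REPORTED` and `build_test_lines` are hypothetical.

```python
WORD = "потс"  # a word form from the sample text in this issue

# Marks named in this report, mapped to whether the report says they
# cause the preceding word to be rejected (True) or not (False).
REPORTED = {
    ".": True,
    "...": True,   # three U+002E full stops, not a single ellipsis character
    ",": False,
    "?": False,
    "!": False,
    '"': False,
    ")": False,
    ";": False,
    ":": False,
}


def build_test_lines(word, marks):
    """Return one test string per mark, with the mark touching the word."""
    return [f"{word}{mark}" for mark in marks]


lines = build_test_lines(WORD, REPORTED)
for line, mark in zip(lines, REPORTED):
    status = "reported as failing" if REPORTED[mark] else "reported as fine"
    print(f"{line}\t{status}")
```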

@Trondtr
Contributor

Trondtr commented Jan 6, 2023

I can report the same as Jack: for sme in my LO, word + "." gives an error message, whereas word + "…" does not. This behaviour has been there for some time. The Mac-internal speller (e.g. in Pages) does not show the same error; there everything works. I have Ventura as well, and a new LO. My speller versions:
se, desktop, version 4.3.2, 21.10.2022-1357 (LO 7.2.2.2)
se, version 4.2, 07.03.2020 (Pages 12.2.1). Why the speller is not automatically updated here perhaps deserves its own bug report.

@rueter
Member Author

rueter commented Jan 6, 2023

> I can report the same as Jack: for sme in my LO, word + "." gives an error message, whereas word + "…" does not. This behaviour has been there for some time. The Mac-internal speller (e.g. in Pages) does not show the same error; there everything works. I have Ventura as well, and a new LO. My speller versions:
> se, desktop, version 4.3.2, 21.10.2022-1357 (LO 7.2.2.2)
> se, version 4.2, 07.03.2020 (Pages 12.2.1). Why the speller is not automatically updated here perhaps deserves its own bug report.

Actually, the only problem I have is with word + "." in myv giving an error message. There is no problem with U+2026 HORIZONTAL ELLIPSIS; the myv texts literally contain three full stops (U+002E). Hence, this is the only problem, and it may be related to Trond's sme problem.
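
Since the two forms look almost identical on screen, it is worth checking which code points a text really contains. A minimal sketch using Python's standard `unicodedata` module (the helper name `describe` is hypothetical):

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


three_full_stops = "..."      # what the myv texts contain
single_ellipsis = "\u2026"    # the one-character ellipsis

print(describe(three_full_stops))
print(describe(single_ellipsis))
```

The first call lists U+002E FULL STOP three times; the second lists a single U+2026 HORIZONTAL ELLIPSIS.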

@bbqsrc
Member

bbqsrc commented Jan 6, 2023

Yeah, the second I hear LibreOffice I know it's just sending the wrong garbage to DivvunSpell. Will look into it soon.
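
The thread does not say what exactly LibreOffice passes to the speller. Assuming, purely for illustration, that the word arrives with its trailing full stop still attached, the sketch below shows one defensive approach: strip a trailing run of U+002E before the lookup. The toy lexicon and the names `split_trailing_full_stops` and `is_accepted` are hypothetical and are not part of DivvunSpell's API.

```python
def split_trailing_full_stops(token):
    """Split a token into (word, trailing run of U+002E full stops)."""
    word = token.rstrip(".")
    return word, token[len(word):]


def is_accepted(token, lexicon):
    """Look the token up after removing trailing full stops.

    Assumption: the lexicon is a plain set of word forms. A real speller
    would also need to keep entries that legitimately end in a full stop,
    such as abbreviations.
    """
    word, _ = split_trailing_full_stops(token)
    return word in lexicon


toy_lexicon = {"толкс", "потс", "стака"}  # word forms from the sample text

for token in ["потс", "потс.", "толкс..."]:
    naive = token in toy_lexicon
    stripped = is_accepted(token, toy_lexicon)
    print(f"{token!r}: naive lookup={naive}, after stripping={stripped}")
```

A naive lookup rejects "потс." and "толкс..." while the stripped lookup accepts them, which matches the symptom described in this issue under the stated assumption.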
