GDocs/MS Word extensions mix error with identical substring in earlier word #18

duomdaamaendra · 2020-09-11T11:55:37Z

duomdaamaendra · 2020-09-11T12:05:02Z

GramDivvun flags the misspelling "vátti", offers several suggestions, and shows the correct context: "Lea, vátti dat".
However, in the running text it highlights (by shading) the word "váttis" on the second line instead. It also "corrects" that word, turning it into the erroneous form "váttiss".

Here's the text, first as it looked after the correction was applied (note "váttiss"), then as originally written:

Sak 2: Vátti beassat dievdofámu čađa Govdageainnu dievdofápmu lea nanus. Nissonolbmuide lea váttiss beassat dan čađa, lohká Máret Rihttá Hætta.

Sak 2: Vátti beassat dievdofámu čađa Govdageainnu dievdofápmu lea nanus. Nissonolbmuide lea váttis beassat dan čađa, lohká Máret Rihttá Hætta.

Dat Lea nuuuu Vátti
Lea, vátti dat lea

Lea, Vátti dat lea

mon logan ahte vátti lea. It go gula: Vátti, Vátti!
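If the extension located the error by searching the document for the string "vátti" (an assumption; the thread does not say how it works), the first case-sensitive match would be inside "váttis" in the paragraph above, not the flagged word on the last line. Replacing that match with the suggestion "váttis" (also assumed) yields exactly the reported "váttiss". A minimal Python sketch contrasting that with replacing the span the checker reported; both helper functions are hypothetical:

```python
# Hypothetical reproduction. Assumptions (not confirmed in this thread):
# the extension finds the error by searching the document for the error
# string, and the accepted suggestion was "váttis".

def apply_by_search(text: str, form: str, suggestion: str) -> str:
    """Suspected buggy strategy: replace the FIRST occurrence of `form`."""
    i = text.find(form)  # case-sensitive, so capitalised "Vátti" is skipped
    if i < 0:
        return text
    return text[:i] + suggestion + text[i + len(form):]


def apply_by_offset(text: str, start: int, end: int, suggestion: str) -> str:
    """Replace exactly the span [start, end) reported by the checker."""
    return text[:start] + suggestion + text[end:]


text = (
    "Sak 2: Vátti beassat dievdofámu čađa Govdageainnu dievdofápmu lea nanus. "
    "Nissonolbmuide lea váttis beassat dan čađa, lohká Máret Rihttá Hætta.\n"
    "Lea, vátti dat lea"
)
form = "vátti"
# Where the misspelling actually is (last line); a checker would report this span.
start = text.index("Lea, vátti") + len("Lea, ")
end = start + len(form)

by_search = apply_by_search(text, form, "váttis")
by_offset = apply_by_offset(text, start, end, "váttis")

print(by_search)  # "... lea váttiss beassat ..." while the last line is untouched
print(by_offset)  # "... lea váttis beassat ..." and "Lea, váttis dat lea"
```

Working from the reported offsets never touches text outside the flagged span, however often the same string occurs earlier in the document.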

snomos · 2022-01-28T10:10:15Z

giellalt/lang-sme#45 shows that this is not restricted to GDocs.

unhammer · 2022-03-10T14:32:54Z

Trying to figure out the cause:

I can't see any problems in the back-end:

$ echo 'Jos dal vel Sámis leat sullasaš dilit go davviriikkain muđuid, de fuobmá árvvoštallamiin goit ovtta erenoamáš ášši mii earuha sámi árvvoštallamiid omd. dáža árvvoštallamiin. Girječálli birra, ja su ođđa girji ovddeš bargguiguin veardádallon, gávnnat hárve sámi árvvoštallamiin. Čiekŋaleabbo dieđu go ahte gos čálli lea riegádan ja gos ássá, gávnnat hárve. Oalle dábálaš lea dákkár diehtu lohkkái: «Mus eai leat obanassii sánitge rámidit nn čehppodaga, dajan dušše ahte áŋgirit ja čeahpit gultturbargi ii gávnna ohcaminge.» (Samefolket 1/89, s. 92). Fuobmá maiddái dán čállosa ovdamearkka vuosttas siiddus: «- rohkkes Láhpoluobbala gollenieida …» Čállái báhcá goit rápmi, jos dal ii čiekŋalit ággaduvvon' |divvun-checker -l se|jq
WARNING: Line 121: Some but not all main-readings of "<omd.>" had wordform-tags (not completely mwe-disambiguated?), not splitting.
divvun-suggest: WARNING: Broken MWE wordform in analyses: .
divvun-suggest: WARNING: Broken MWE wordform in analyses: omd
{
  "errs": [
    [
      "«Mus",
      397,
      401,
      "punct-aistton-left",
      "Boasttuaisttonmearkkat",
      [
        "”Mus"
      ],
      "Aisttonmearkkat"
    ],
    [
      "nn",
      437,
      439,
      "typo",
      "Ii leat sátnelisttus",
      [
        "nu",
        "ná",
        "in",
        "en",
        "on",
        "na",
        "no",
        "nan",
        "an",
        "ja"
      ],
      "Čállinmeattáhus"
    ],
    [
      "gultturbargi",
      488,
      500,
      "typo",
      "Ii leat sátnelisttus",
      [
        "kulttorbargi",
        "kulturbargi",
        "gotturbargi",
        "guottubargi",
        "sulttubargi"
      ],
      "Čállinmeattáhus"
    ],
    [
      ".»",
      520,
      522,
      "punct-aistton-right",
      "Boasttuaisttonmearkkat",
      [
        ".”"
      ],
      "Aisttonmearkkat"
    ],
    [
      "«-",
      606,
      608,
      "punct-aistton-left",
      "Boasttuaisttonmearkkat",
      [
        "”-"
      ],
      "Aisttonmearkkat"
    ],
    [
      "…»",
      643,
      645,
      "punct-aistton-right",
      "Boasttuaisttonmearkkat",
      [
        "…”"
      ],
      "Aisttonmearkkat"
    ]
  ],
  "text": "Jos dal vel Sámis leat sullasaš dilit go davviriikkain muđuid, de fuobmá árvvoštallamiin goit ovtta erenoamáš ášši mii earuha sámi árvvoštallamiid omd. dáža árvvoštallamiin. Girječálli birra, ja su ođđa girji ovddeš bargguiguin veardádallon, gávnnat hárve sámi árvvoštallamiin. Čiekŋaleabbo dieđu go ahte gos čálli lea riegádan ja gos ássá, gávnnat hárve. Oalle dábálaš lea dákkár diehtu lohkkái: «Mus eai leat obanassii sánitge rámidit nn čehppodaga, dajan dušše ahte áŋgirit ja čeahpit gultturbargi ii gávnna ohcaminge.» (Samefolket 1/89, s. 92). Fuobmá maiddái dán čállosa ovdamearkka vuosttas siiddus: «- rohkkes Láhpoluobbala gollenieida …» Čállái báhcá goit rápmi, jos dal ii čiekŋalit ággaduvvon"
}

In this text, the two occurrences of "gávnnat" start at offsets 242 and 341, whereas the first error the back end reports starts at 397.
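The manual check above (do the reported offsets point at the right words?) can be automated. A minimal Python sketch, assuming the JSON layout shown above, where each entry in "errs" begins with [form, start, end, ...] and offsets count Unicode code points (the South Sámi example further down is consistent with code-point counting). The function name is hypothetical:

```python
def mismatched_offsets(result: dict) -> list:
    """Return the errors whose text[start:end] slice differs from the error form.

    Assumes divvun-checker's JSON layout: result["text"] is the checked text
    and every entry of result["errs"] starts with [form, start, end, ...],
    with offsets in Unicode code points (the same units Python uses for str).
    """
    text = result["text"]
    return [err for err in result["errs"] if text[err[1]:err[2]] != err[0]]
```

Feed it the parsed checker output, e.g. `mismatched_offsets(json.load(sys.stdin))` at the end of the `divvun-checker` pipe; an empty list means every reported span matches its error string. Passing this check only shows the back end is consistent: an extension that re-searches for the string instead of using the offsets can still land on an earlier identical substring.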

@bbqsrc, what method do the GDocs/Word extensions use to locate the error to underline?

snomos · 2022-03-11T06:25:02Z

My initial hunch is that this bug is in the MS Office/GDocs plugins, not in the back end, so @bbqsrc et al. should look into it.

snomos · 2023-06-01T07:42:31Z

Here's another example, from South Sámi (SMA):

Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, eevre goh dejtie maahta aktene gïetesne ryöknedh.

The command-line tool correctly identifies the actual error:

echo 'Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, eevre goh dejtie maahta aktene gïetesne ryöknedh.' \
| divvun-checker -a tools/grammarcheckers/sma.zcheck

as seen in the output:

{
  "errs": [
    [
      "eah",
      113,
      116,
      "typo",
      "\"eah\"-baakoe vååjnoe båajhtode tjaalasovveme.",
      [
        "ih"
      ],
      "\"eah\"-baakoe båajhtode tjaalasovveme"
    ]
  ],
  "text": "Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, eevre goh dejtie maahta aktene gïetesne ryöknedh."
}

When the same string is checked in GDocs or Word, the substring "eah" at position 41 (inside the word "leah") is highlighted instead. That makes no sense linguistically, and it is simply wrong, since the actual error is at position 113.
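A quick Python check shows where the two positions come from. It assumes, as a hypothesis, that the extensions re-locate the error by searching for its string: the first match of "eah" is inside "leah" at 41, while the offsets in the JSON output above point at the standalone word at 113.

```python
# Illustration only: assumes the extensions search for the error string
# instead of using the offsets that divvun-checker returns.
text = (
    "Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste "
    "lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, "
    "eevre goh dejtie maahta aktene gïetesne ryöknedh."
)
# Leading fields of the error entry from the checker's JSON output.
form, start, end = "eah", 113, 116

naive = text.find(form)  # first match anywhere in the text
print(naive, text[naive - 1:naive + 3])  # 41 leah  (inside the word "leah")
print(start, text[start:end])            # 113 eah  (the word actually flagged)
```

Highlighting `text[start:end]` directly avoids the mix-up no matter how many times the string recurs earlier in the text.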

@zoomix, could you forward this to whoever is most appropriate? This bug is very annoying to users and should be fixed ASAP.

snomos · 2023-12-07T13:15:13Z

This is now fixed in the latest server update, closing.

bbqsrc added the "bug" label ("Something isn't working") on Sep 11, 2020.

snomos changed the title from "GramDivvun in GoogleDocs issues" to "GDocs mixes error with identical substring in earlier word" on Jan 26, 2022.

snomos mentioned this issue on Jan 26, 2022 in "Regressions/bugs in GDocs extension" #48 (open, 9 tasks).

divvun deleted a comment from duomdaamaendra Jan 26, 2022

This was referenced on Jan 26, 2022 by:

"GramDivvun speller suggestion regression - no suggestions where it used to be" giellalt/lang-sme#43 (closed)

"Marks particle as error instead of the preceding Err/Orth of the same mwe" giellalt/lang-sme#45 (open)

snomos changed the title from "GDocs mixes error with identical substring in earlier word" to "GDocs/MS Word extensions mix error with identical substring in earlier word" on Jan 28, 2022.

snomos mentioned this issue on Nov 12, 2023 in "Context, highlight wrong for typo found as substring" #52 (closed).

snomos closed this as completed Dec 7, 2023

