GDocs/MS Word extensions mix error with identical substring in earlier word #18

Closed
Tracked by #48
duomdaamaendra opened this issue Sep 11, 2020 · 6 comments
Labels
bug Something isn't working

Comments

Screenshot 2020-09-11 at 13 50 42

@duomdaamaendra
Member Author

duomdaamaendra commented Sep 11, 2020

GramDivvun marks the misspelling "vátti", gives several suggestions, and shows the right context: "Lea, vátti dat".
However, in the running text it highlights the word "váttis" on the second row instead. It also applies the correction to that word, producing the erroneous form "váttiss".

Here's the original text:

Sak 2: Vátti beassat dievdofámu čađa Govdageainnu dievdofápmu lea nanus. Nissonolbmuide lea váttiss beassat dan čađa, lohká Máret Rihttá Hætta.

Sak 2: Vátti beassat dievdofámu čađa Govdageainnu dievdofápmu lea nanus. Nissonolbmuide lea váttis beassat dan čađa, lohká Máret Rihttá Hætta.

Dat Lea nuuuu Vátti
Lea, vátti dat lea

Lea, Vátti dat lea

mon logan ahte vátti lea. It go gula: Vátti, Vátti!
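
The "váttiss" result is consistent with a correction being applied by searching for the error string rather than by its position. The Python sketch below illustrates that hypothesis only. The shortened sample text, the chosen suggestion "váttis", and the naive replace strategy are all assumptions made for illustration, not confirmed behaviour of the extensions.

```python
# Assumed, shortened sample: the correct word "váttis" precedes the real error "vátti".
text = "Nissonolbmuide lea váttis beassat dan čađa. Lea, vátti dat lea"
error_form = "vátti"
suggestion = "váttis"  # assumption: the report does not say which suggestion was picked

# Span of the real error, as a back-end would report it (start, end).
start = text.index("vátti dat")
end = start + len(error_form)

# Naive strategy: replace the first occurrence of the error string.
# It hits the substring inside "váttis" and yields "váttiss".
naive = text.replace(error_form, suggestion, 1)

# Offset-based strategy: replace exactly the reported span.
by_offset = text[:start] + suggestion + text[end:]

print(naive)      # Nissonolbmuide lea váttiss beassat dan čađa. Lea, vátti dat lea
print(by_offset)  # Nissonolbmuide lea váttis beassat dan čađa. Lea, váttis dat lea
```

Note that the capitalised "Vátti" earlier in the original text would not match a case-sensitive search for "vátti", which fits the observation that the first affected word is "váttis".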

@bbqsrc added the bug (Something isn't working) label on Sep 11, 2020
@snomos changed the title from "GramDivvun in GoogleDocs issues" to "GDocs mixes error with identical substring in earlier word" on Jan 26, 2022
@divvun deleted a comment from duomdaamaendra on Jan 26, 2022
@snomos
Member

snomos commented Jan 28, 2022

giellalt/lang-sme#45 shows that this is not restricted to GDocs.

@snomos changed the title from "GDocs mixes error with identical substring in earlier word" to "GDocs/MS Word extensions mix error with identical substring in earlier word" on Jan 28, 2022
@unhammer
Member

unhammer commented Mar 10, 2022

Trying to figure out lang-sme#45, I can't see any problems in the back-end:

$ echo 'Jos dal vel Sámis leat sullasaš dilit go davviriikkain muđuid, de fuobmá árvvoštallamiin goit ovtta erenoamáš ášši mii earuha sámi árvvoštallamiid omd. dáža árvvoštallamiin. Girječálli birra, ja su ođđa girji ovddeš bargguiguin veardádallon, gávnnat hárve sámi árvvoštallamiin. Čiekŋaleabbo dieđu go ahte gos čálli lea riegádan ja gos ássá, gávnnat hárve. Oalle dábálaš lea dákkár diehtu lohkkái: «Mus eai leat obanassii sánitge rámidit nn čehppodaga, dajan dušše ahte áŋgirit ja čeahpit gultturbargi ii gávnna ohcaminge.» (Samefolket 1/89, s. 92). Fuobmá maiddái dán čállosa ovdamearkka vuosttas siiddus: «- rohkkes Láhpoluobbala gollenieida …» Čállái báhcá goit rápmi, jos dal ii čiekŋalit ággaduvvon' |divvun-checker -l se|jq
WARNING: Line 121: Some but not all main-readings of "<omd.>" had wordform-tags (not completely mwe-disambiguated?), not splitting.
divvun-suggest: WARNING: Broken MWE wordform in analyses: .
divvun-suggest: WARNING: Broken MWE wordform in analyses: omd
{
  "errs": [
    [
      "«Mus",
      397,
      401,
      "punct-aistton-left",
      "Boasttuaisttonmearkkat",
      [
        "”Mus"
      ],
      "Aisttonmearkkat"
    ],
    [
      "nn",
      437,
      439,
      "typo",
      "Ii leat sátnelisttus",
      [
        "nu",
        "ná",
        "in",
        "en",
        "on",
        "na",
        "no",
        "nan",
        "an",
        "ja"
      ],
      "Čállinmeattáhus"
    ],
    [
      "gultturbargi",
      488,
      500,
      "typo",
      "Ii leat sátnelisttus",
      [
        "kulttorbargi",
        "kulturbargi",
        "gotturbargi",
        "guottubargi",
        "sulttubargi"
      ],
      "Čállinmeattáhus"
    ],
    [
      ".»",
      520,
      522,
      "punct-aistton-right",
      "Boasttuaisttonmearkkat",
      [
        ".”"
      ],
      "Aisttonmearkkat"
    ],
    [
      "«-",
      606,
      608,
      "punct-aistton-left",
      "Boasttuaisttonmearkkat",
      [
        "”-"
      ],
      "Aisttonmearkkat"
    ],
    [
      "…»",
      643,
      645,
      "punct-aistton-right",
      "Boasttuaisttonmearkkat",
      [
        "…”"
      ],
      "Aisttonmearkkat"
    ]
  ],
  "text": "Jos dal vel Sámis leat sullasaš dilit go davviriikkain muđuid, de fuobmá árvvoštallamiin goit ovtta erenoamáš ášši mii earuha sámi árvvoštallamiid omd. dáža árvvoštallamiin. Girječálli birra, ja su ođđa girji ovddeš bargguiguin veardádallon, gávnnat hárve sámi árvvoštallamiin. Čiekŋaleabbo dieđu go ahte gos čálli lea riegádan ja gos ássá, gávnnat hárve. Oalle dábálaš lea dákkár diehtu lohkkái: «Mus eai leat obanassii sánitge rámidit nn čehppodaga, dajan dušše ahte áŋgirit ja čeahpit gultturbargi ii gávnna ohcaminge.» (Samefolket 1/89, s. 92). Fuobmá maiddái dán čállosa ovdamearkka vuosttas siiddus: «- rohkkes Láhpoluobbala gollenieida …» Čállái báhcá goit rápmi, jos dal ii čiekŋalit ággaduvvon"
}

Start indices of "gávnnat" are 242 and 341; our first error index is at 397.

What methods do the GDocs/Word extensions use to find the error to underline, @bbqsrc?
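
One thing that may be worth checking, as a guess only: the reported form "nn" also occurs inside "gávnnat", so a client that locates an error by searching for its form from the start of the text would land on the earlier word. The Python helper below lists such earlier hits. The name `earlier_occurrences` is hypothetical and the sample sentence is abridged from the text above. It says nothing about how the extensions actually work.

```python
def earlier_occurrences(text, form, start):
    """Return indices where `form` occurs in `text` before the reported `start`."""
    hits = []
    pos = text.find(form)
    while pos != -1 and pos < start:
        hits.append(pos)
        pos = text.find(form, pos + 1)
    return hits


# Abridged sample (assumption): "gávnnat" precedes the real typo "nn".
excerpt = "gávnnat hárve. Mus eai leat obanassii sánitge rámidit nn čehppodaga"
reported_start = excerpt.index(" nn ") + 1  # where a back-end would place the error

print(earlier_occurrences(excerpt, "nn", reported_start))  # [3], inside "gávnnat"
```

Any non-empty result marks an error whose form cannot be located safely by string search alone, so the start and end indices from the back-end would be needed.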

@snomos
Member

snomos commented Mar 11, 2022

My initial hunch is that this bug is in the MS Office/GDocs plugins, and not in the backend. Thus @bbqsrc et al. should look into it.

@snomos
Member

snomos commented Jun 1, 2023

Here's another example, from SMA:

Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, eevre goh dejtie maahta aktene gïetesne ryöknedh.

The actual error is correctly captured using the command line tool:

echo 'Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, eevre goh dejtie maahta aktene gïetesne ryöknedh.' \
| divvun-checker -a tools/grammarcheckers/sma.zcheck

as seen in the output:

{
  "errs": [
    [
      "eah",
      113,
      116,
      "typo",
      "\"eah\"-baakoe vååjnoe båajhtode tjaalasovveme.",
      [
        "ih"
      ],
      "\"eah\"-baakoe båajhtode tjaalasovveme"
    ]
  ],
  "text": "Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, eevre goh dejtie maahta aktene gïetesne ryöknedh."
}

If the same string is checked in GDocs or Word, the substring "eah" starting at position 41 (inside the earlier word "leah") is highlighted as the error instead. That makes no sense linguistically and is simply wrong, since the actual error is elsewhere.
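
The two positions can be reproduced with a few lines of Python, using the text and the error triple from the output above. Treating the plugin behaviour as a first-occurrence string search is an assumption made here to illustrate the symptom, not a confirmed diagnosis.

```python
import unicodedata

# Text and error span copied from the divvun-checker output above.
# NFC normalisation keeps one code point per accented letter, so the indices line up.
text = unicodedata.normalize(
    "NFC",
    "Naan båeries saemieh Raarvihkesne guhth leah gïelem noerebaeleste "
    "lïereme, leah ennje væjkele soptsestidh, mohte eah leah man gallesh, "
    "eevre goh dejtie maahta aktene gïetesne ryöknedh.",
)
form, start, end = "eah", 113, 116

# The back-end span points exactly at the misspelled word.
print(text[start:end])  # eah

# Assumed faulty strategy: search for the form from the beginning of the text.
naive_start = text.find(form)
print(naive_start)  # 41
print(text[naive_start - 1:naive_start + len(form)])  # leah
```

Using the reported start and end indices directly, instead of searching for the form, avoids the mismatch in this example.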

@zoomix could you forward this to whomever is most appropriate? This bug is very annoying to users, and should be fixed ASAP.

@snomos
Member

snomos commented Dec 7, 2023

This is now fixed in the latest server update, closing.

@snomos closed this as completed on Dec 7, 2023