Commit

Including Cyrillic letters, both the usual set (taken from kpv) as well as the special Buryaad letters (taken from the phonology.twolc declaration).
Trondtr committed Jun 29, 2023
1 parent 2648166 commit f46344a
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions tools/tokenisers/tokeniser-disamb-gt-desc.pmscript
@@ -66,6 +66,8 @@ Define alphabet "a-z" !! * lower-case ASCII
|"A-Z" !! * upper-case ASCII
|Lst({àáâãāăȧäåǎȁȃąæǽǣèéêēĕėëěȅȇȩęìíîĩīīĭi̇ïǐįȉȋɨòóôõōŏȯöőǒȍȏơǫɵøǭǿœùúûũūŭüůűǔȕȗưųʉýŷȳÿƴɏÀÁÂÃĀĂȦÄÅǍȀȂĄÆǼǢÈÉÊĒĔĖËĚȄȆȨĘÌÍÎĨĪĪĬİÏǏĮȈȊƗÒÓÔÕŌŎȮÖŐǑȌȎƠǪƟØǬǾŒÙÚÛŨŪŬÜŮŰǓȔȖƯŲɄÝŶȲŸƳɎšžčđðíŋňŧñńŠŽČĐÐÍŊŇŦÑ})
!! * select extended latin symbols
+ | Lst({абвгдеёжӝзӟиӥйкАБВГДЕЁЖӜЗӞИӤЙКлмноөпрстуүфхһцчӵшЛМНОӨПРСТУҮФХҺЦЧӴШщъыьэюяіӧЩЪЫЬЭЮЯІӦ})
+ !! * extended cyrillic
| "0-9" !! ASCII digits
| Lst({_§°}) !! * select symbols
!! * Combining diacritics as individual symbols,
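
The effect of the two added lines can be illustrated outside the tokeniser. The following is a minimal Python sketch, not part of the commit and not how hfst-pmatch evaluates the definition: it only models `alphabet` as a plain set of characters. The names `BASE`, `CYRILLIC`, and `uncovered`, and the sample word, are hypothetical; the extended Latin list and the combining diacritics that follow in the file are left out for brevity.

```python
import string

# Cyrillic letters copied from the Lst({...}) line added in this commit.
CYRILLIC = "абвгдеёжӝзӟиӥйкАБВГДЕЁЖӜЗӞИӤЙКлмноөпрстуүфхһцчӵшЛМНОӨПРСТУҮФХҺЦЧӴШщъыьэюяіӧЩЪЫЬЭЮЯІӦ"

# Simplified stand-in for the alphabet before the change:
# "a-z", "A-Z", "0-9" and the symbols _ § ° (extended Latin omitted).
BASE = set(string.ascii_letters) | set(string.digits) | set("_§°")


def uncovered(text, alphabet):
    """Return the characters of text that are not in alphabet,
    in order of first appearance and without repeats."""
    missing = []
    for ch in text:
        if ch not in alphabet and ch not in missing:
            missing.append(ch)
    return missing


before = BASE
after = BASE | set(CYRILLIC)

sample = "үхэр"  # hypothetical test word in Cyrillic script
print(uncovered(sample, before))  # every letter is outside the old set
print(uncovered(sample, after))   # nothing is left uncovered
```

Under these assumptions, a Cyrillic word is entirely outside the alphabet before the change and entirely inside it afterwards, which is the coverage the commit message describes.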