ZH sentence-level TN (#112)
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <[email protected]>

* disable sv tests

Signed-off-by: Evelina <[email protected]>

* fix ar test

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <[email protected]>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.8 release (#79)

Signed-off-by: Evelina <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <[email protected]>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <[email protected]>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <[email protected]>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <[email protected]>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <[email protected]>

* Add __init__.py files

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <[email protected]>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <[email protected]>

* Update date

Signed-off-by: Anand Joseph <[email protected]>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update copyrights

Signed-off-by: Anand Joseph <[email protected]>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <[email protected]>

* add elec fallback

Signed-off-by: Evelina <[email protected]>

* update ci

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <[email protected]>

* add elec fallback

Signed-off-by: Evelina <[email protected]>

* update ci

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <[email protected]>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <[email protected]>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <[email protected]>
Signed-off-by: Linnea Pari Leaver <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <[email protected]>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <[email protected]>

* wrap some more pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add graph pieces

Signed-off-by: Jim O'Regan <[email protected]>

* delete junk

Signed-off-by: Jim O'Regan <[email protected]>

* my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add date verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add right tokens

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* moved to tagger

Signed-off-by: Jim O'Regan <[email protected]>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <[email protected]>

* now most tests pass

Signed-off-by: Jim O'Regan <[email protected]>

* electronic

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <[email protected]>

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <[email protected]>

* whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <[email protected]>

* copy from English

Signed-off-by: Jim O'Regan <[email protected]>

* overwrite with version from en

Signed-off-by: Jim O'Regan <[email protected]>

* add basic test case

Signed-off-by: Jim O'Regan <[email protected]>

* fix call

Signed-off-by: Jim O'Regan <[email protected]>

* swap tsv sides

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* add optional_era variable

Signed-off-by: Jim O'Regan <[email protected]>

* add test case

Signed-off-by: Jim O'Regan <[email protected]>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <[email protected]>

* also add lowercase versions

Signed-off-by: Jim O'Regan <[email protected]>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <[email protected]>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <[email protected]>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <[email protected]>

* put the full stops back

Signed-off-by: Jim O'Regan <[email protected]>

* add filler words

Signed-off-by: Jim O'Regan <[email protected]>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <[email protected]>

* single line only

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <[email protected]>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <[email protected]>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* add missing test tooling

Signed-off-by: Jim O'Regan <[email protected]>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <[email protected]>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <[email protected]>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <[email protected]>

* add country codes from hu

Signed-off-by: Jim O'Regan <[email protected]>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <[email protected]>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <[email protected]>

* first attempt

Signed-off-by: Jim O'Regan <[email protected]>

* add to t&c

Signed-off-by: Jim O'Regan <[email protected]>

* add to t&c

Signed-off-by: Jim O'Regan <[email protected]>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <[email protected]>

* Update __init__.py

Signed-off-by: Jim O’Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <[email protected]>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* slight changes to date

Signed-off-by: Jim O'Regan <[email protected]>

* tweak

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <[email protected]>

* problem with tusen

Signed-off-by: Jim O'Regan <[email protected]>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <[email protected]>

* add functions from hu

Signed-off-by: Jim O'Regan <[email protected]>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <[email protected]>

* try changing this year declaration

Signed-off-by: Jim O'Regan <[email protected]>

* add year + era

Signed-off-by: Jim O'Regan <[email protected]>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <[email protected]>

* expose variables

Signed-off-by: Jim O'Regan <[email protected]>

* extra param for itn mode

Signed-off-by: Jim O'Regan <[email protected]>

* change call

Signed-off-by: Jim O'Regan <[email protected]>

* change comment

Signed-off-by: Jim O'Regan <[email protected]>

* change comment

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading

Signed-off-by: Jim O'Regan <[email protected]>

* fix parens

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <[email protected]>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <[email protected]>

* some adapting

Signed-off-by: Jim O'Regan <[email protected]>

* insert kl. if absent

Signed-off-by: Jim O'Regan <[email protected]>

* fix comments

Signed-off-by: Jim O'Regan <[email protected]>

* the relative prefixed times

Signed-off-by: Jim O'Regan <[email protected]>

* + comments

Signed-off-by: Jim O'Regan <[email protected]>

* enable time

Signed-off-by: Jim O'Regan <[email protected]>

* space in both directions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* fix hours to

Signed-off-by: Jim O'Regan <[email protected]>

* split by before/after

Signed-off-by: Jim O'Regan <[email protected]>

* delete, not insert

Signed-off-by: Jim O'Regan <[email protected]>

* fix if

Signed-off-by: Jim O'Regan <[email protected]>

* kl. 9

Signed-off-by: Jim O'Regan <[email protected]>

* copy from en

Signed-off-by: Jim O'Regan <[email protected]>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <[email protected]>

* imports

Signed-off-by: Jim O'Regan <[email protected]>

* add trimmed file

Signed-off-by: Jim O'Regan <[email protected]>

* fix imports

Signed-off-by: Jim O'Regan <[email protected]>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <[email protected]>

* minutes/seconds

Signed-off-by: Jim O'Regan <[email protected]>

* suffix

Signed-off-by: Jim O'Regan <[email protected]>

* delete, not insert

Signed-off-by: Jim O'Regan <[email protected]>

* one optional

Signed-off-by: Jim O'Regan <[email protected]>

* export variable

Signed-off-by: Jim O'Regan <[email protected]>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <[email protected]>

* already disambiguated

Signed-off-by: Jim O'Regan <[email protected]>

* closure

Signed-off-by: Jim O'Regan <[email protected]>

* do not insert kl.

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <[email protected]>

* Delete measure.py

Signed-off-by: Jim O’Regan <[email protected]>

* Delete money.py

Signed-off-by: Jim O’Regan <[email protected]>

* remove unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused test pieces

Signed-off-by: Jim O'Regan <[email protected]>

* copy from es

Signed-off-by: Jim O'Regan <[email protected]>

* add SV ITN

Signed-off-by: Jim O'Regan <[email protected]>

* add/update __init__

Signed-off-by: Jim O'Regan <[email protected]>

* blank line

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* fix lang

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix

Signed-off-by: Jim O'Regan <[email protected]>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* space before, not after

Signed-off-by: Jim O'Regan <[email protected]>

* fix cardinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* spurious deletion

Signed-off-by: Jim O'Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <[email protected]>

* fix singulars

Signed-off-by: Jim O'Regan <[email protected]>

* add an export

Signed-off-by: Jim O'Regan <[email protected]>

* change integer graph

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <[email protected]>

* use cdrewrite

Signed-off-by: Jim O'Regan <[email protected]>

* just EOS/BOS

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <[email protected]>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <[email protected]>

* export

Signed-off-by: Jim O'Regan <[email protected]>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <[email protected]>

* move comment

Signed-off-by: Jim O'Regan <[email protected]>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <[email protected]>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <[email protected]>

* accept both

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <[email protected]>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <[email protected]>

* retry

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <[email protected]>

* replace

Signed-off-by: Jim O'Regan <[email protected]>

* arcmap

Signed-off-by: Jim O'Regan <[email protected]>

* version without ones

Signed-off-by: Jim O'Regan <[email protected]>

* add another test

Signed-off-by: Jim O'Regan <[email protected]>

* change graph

Signed-off-by: Jim O'Regan <[email protected]>

* simplify

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <[email protected]>

* add a test

Signed-off-by: Jim O'Regan <[email protected]>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <[email protected]>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* move definition

Signed-off-by: Jim O'Regan <[email protected]>

* simplify

Signed-off-by: Jim O'Regan <[email protected]>

* tweak

Signed-off-by: Jim O'Regan <[email protected]>

* another test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <[email protected]>

* more tests

Signed-off-by: Jim O'Regan <[email protected]>

* match verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix last two failing tests

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused variable

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <[email protected]>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <[email protected]>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <[email protected]>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <[email protected]>

* wrong place

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* export

Signed-off-by: Jim O'Regan <[email protected]>

* export

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused

Signed-off-by: Jim O'Regan <[email protected]>

* Update date.py

Signed-off-by: Jim O’Regan <[email protected]>

* Update time.py

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <[email protected]>

* trim comments

Signed-off-by: Jim O’Regan <[email protected]>

* remove commented line

Signed-off-by: Jim O’Regan <[email protected]>

* en halv

Signed-off-by: Jim O’Regan <[email protected]>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <[email protected]>

* fix electronic

Signed-off-by: Giacomo Cavallini <[email protected]>

* fix measure

Signed-off-by: Giacomo Cavallini <[email protected]>

---------

Signed-off-by: GiacomoLeoneMaria <[email protected]>
Signed-off-by: Giacomo Cavallini <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <[email protected]>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* Remove invalid tests

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <[email protected]>

* Cleanup

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <[email protected]>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <[email protected]>

* update for language import

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <[email protected]>

* a new class for whitelist

Signed-off-by: BuyuanCui <[email protected]>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <[email protected]>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <[email protected]>

* recreated due to format issue

Signed-off-by: BuyuanCui <[email protected]>

* caught duplicates, removed

Signed-off-by: BuyuanCui <[email protected]>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <[email protected]>

* updates according to the comments from the last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <[email protected]>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <[email protected]>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <[email protected]>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <[email protected]>

* updates

Signed-off-by: BuyuanCui <[email protected]>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <[email protected]>

* grammar for Fraction

Signed-off-by: BuyuanCui <[email protected]>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <[email protected]>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <[email protected]>

* updated according to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <[email protected]>

* arrangements

Signed-off-by: BuyuanCui <[email protected]>

* added whitelist grammar

Signed-off-by: BuyuanCui <[email protected]>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <[email protected]>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <[email protected]>

* updates according to last PR

Signed-off-by: BuyuanCui <[email protected]>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <[email protected]>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <[email protected]>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <[email protected]>

* adjustment on the weight

Signed-off-by: BuyuanCui <[email protected]>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <[email protected]>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <[email protected]>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <[email protected]>

* verbalizer for fraction

Signed-off-by: BuyuanCui <[email protected]>

* added for mandarin grammar

Signed-off-by: BuyuanCui <[email protected]>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <[email protected]>

* merge conflict

Signed-off-by: BuyuanCui <[email protected]>

* removed unused imports

Signed-off-by: BuyuanCui <[email protected]>

* deleted unused import os

Signed-off-by: BuyuanCui <[email protected]>

* deleted unused variables

Signed-off-by: BuyuanCui <[email protected]>

* removed unused imports

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <[email protected]>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <[email protected]>

* format issue, recreated

Signed-off-by: BuyuanCui <[email protected]>

* format issue recreated

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <[email protected]>

* fixed coding style and format

Signed-off-by: BuyuanCui <[email protected]>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <[email protected]>

* removed the comment

Signed-off-by: BuyuanCui <[email protected]>

* removed the comment

Signed-off-by: BuyuanCui <[email protected]>

* removing unnecessary comments

Signed-off-by: BuyuanCui <[email protected]>

* unnecessary comment removed

Signed-off-by: BuyuanCui <[email protected]>

* test file updated for more cases

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <[email protected]>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <[email protected]>

* added Mandarin as zh

Signed-off-by: BuyuanCui <[email protected]>

* removing for duplication

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <[email protected]>

* removed duplicates

Signed-off-by: BuyuanCui <[email protected]>

* removing unused imports

Signed-off-by: BuyuanCui <[email protected]>

* updates to fix test file failures

Signed-off-by: BuyuanCui <[email protected]>

* updates to fix file failures

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <[email protected]>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <[email protected]>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fixing pr checks

Signed-off-by: BuyuanCui <[email protected]>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <[email protected]>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Buyuan(Alex) Cui <[email protected]>
Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: Alex Cui <[email protected]>
Co-authored-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <[email protected]>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <[email protected]>

* Decimal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* fraction updated

Signed-off-by: BuyuanCui <[email protected]>

* money updated

Signed-off-by: BuyuanCui <[email protected]>

* ordinal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* punctuation grammar added

Signed-off-by: BuyuanCui <[email protected]>

* time grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* tokenizer updated

Signed-off-by: BuyuanCui <[email protected]>

* updates on certificate

Signed-off-by: BuyuanCui <[email protected]>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <[email protected]>

* cardinal updated

Signed-off-by: BuyuanCui <[email protected]>

* date grammar changed

Signed-off-by: BuyuanCui <[email protected]>

* decimal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* grammar added

Signed-off-by: BuyuanCui <[email protected]>

* grammar updates

Signed-off-by: BuyuanCui <[email protected]>

* test data added

Signed-off-by: BuyuanCui <[email protected]>

* test python file edits

Signed-off-by: BuyuanCui <[email protected]>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <[email protected]>

* test cases updated

Signed-off-by: BuyuanCui <[email protected]>

* coding style fixed

Signed-off-by: BuyuanCui <[email protected]>

* dates updated for init files

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <[email protected]>

* removed unused imports

Signed-off-by: BuyuanCui <[email protected]>

* removed comments

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <[email protected]>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <[email protected]>

* updated for tests reruns

Signed-off-by: BuyuanCui <[email protected]>

* updates

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <[email protected]>

---------

Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <[email protected]>

* fixed import error

Signed-off-by: BuyuanCui <[email protected]>

---------

Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <[email protected]>

* fix path

Signed-off-by: Evelina <[email protected]>

* fix pytest

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* pip 1.2.0

Signed-off-by: Evelina <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <[email protected]>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <[email protected]>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <[email protected]>

* coding style fix

Signed-off-by: BuyuanCui <[email protected]>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <[email protected]>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <[email protected]>

* added for tests on whitelist

Signed-off-by: BuyuanCui <[email protected]>

* added for test on word

Signed-off-by: BuyuanCui <[email protected]>

* added to run test on whitelist

Signed-off-by: BuyuanCui <[email protected]>

* added to run test on word

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

---------

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Buyuan(Alex) Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <[email protected]>

* text arg

Signed-off-by: Nikolay Karpov <[email protected]>

* Failed text

Signed-off-by: Nikolay Karpov <[email protected]>

* add logger

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <[email protected]>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <[email protected]>

* info level

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <[email protected]>

* verbose

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <[email protected]>

* Exception

Signed-off-by: Nikolay Karpov <[email protected]>

* verbose

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: Evelina <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <[email protected]>

* self.verbose

Signed-off-by: Nikolay Karpov <[email protected]>

---------

Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <[email protected]>

* clean up

Signed-off-by: Evelina <[email protected]>

* fix logging

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <[email protected]>

* fix format

Signed-off-by: Evelina <[email protected]>

* add IT TN to CI

Signed-off-by: Evelina <[email protected]>

* update patch

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <[email protected]>

---------

Signed-off-by: GiacomoLeoneMaria <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrote tokenizer

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* jenkins file update

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* tn bug

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* fixes and updates

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* adjustments

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* testing commit

Signed-off-by: Alex Cui <[email protected]>

* removing unused file

Signed-off-by: Alex Cui <[email protected]>

* updated test cases

Signed-off-by: Alex Cui <[email protected]>

* updating test cases

Signed-off-by: Alex Cui <[email protected]>

* updates adapting to graphs

Signed-off-by: Alex Cui <[email protected]>

* updated cases for SH tests

Signed-off-by: Alex Cui <[email protected]>

* updated cases

Signed-off-by: Alex Cui <[email protected]>

* added some sentences

Signed-off-by: Alex Cui <[email protected]>

* test cases update

Signed-off-by: Alex Cui <[email protected]>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <[email protected]>

* fixings according to the ci

Signed-off-by: Alex Cui <[email protected]>

* removed not used

Signed-off-by: Alex Cui <[email protected]>

* notused removing

Signed-off-by: Alex Cui <[email protected]>

* format issue

Signed-off-by: Alex Cui <[email protected]>

* format issue

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <[email protected]>

* removing unused files

Signed-off-by: Alex Cui <[email protected]>

* removing unused files

Signed-off-by: Alex Cui <[email protected]>

* removing unused files

Signed-off-by: Alex Cui <[email protected]>

* removing unused files

Signed-off-by: Alex Cui <[email protected]>

* added sentences as test cases

Signed-off-by: Alex Cui <[email protected]>

* added sentences as test cases

Signed-off-by: Alex Cui <[email protected]>

* removed commented-out tests

Signed-off-by: Alex Cui <[email protected]>

* updating dates

Signed-off-by: Alex Cui <[email protected]>

* attempts to fix bug

Signed-off-by: Alex Cui <[email protected]>

* in process of fixing the bug

Signed-off-by: Alex Cui <[email protected]>

* fixing existing issue

Signed-off-by: Alex Cui <[email protected]>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <[email protected]>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <[email protected]>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <[email protected]>

* far files

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <[email protected]>

* removing and combined into measure

Signed-off-by: Alex Cui <[email protected]>

* removing, not used

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <[email protected]>

* updates to fix space issue

Signed-off-by: Alex Cui <[email protected]>

* updates to fix space issue

Signed-off-by: Alex Cui <[email protected]>

* updates to solve the space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving sh issue

Signed-off-by: Alex Cui <[email protected]>

* resolving sh test issue

Signed-off-by: Alex Cui <[email protected]>

* adding anands updates

Signed-off-by: Alex Cui <[email protected]>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <[email protected]>

* updates

Signed-off-by: Alex Cui <[email protected]>

* updates

Signed-off-by: Alex Cui <[email protected]>

* updates

Signed-off-by: Alex Cui <[email protected]>

* removing fraction and math part

Signed-off-by: Alex Cui <[email protected]>

* removing comments

Signed-off-by: Alex Cui <[email protected]>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <[email protected]>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <[email protected]>

* updating zh date

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <[email protected]>

* modification to run zh tn separately

Signed-off-by: Alex Cui <[email protected]>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <[email protected]>

* adding lines to save far file

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing used graphs

Signed-off-by: Alex Cui <[email protected]>

* format and removing used comments

Signed-off-by: Alex Cui <[email protected]>

* removing this one, not used

Signed-off-by: Alex Cui <[email protected]>

* remove unused comments

Signed-off-by: Alex Cui <[email protected]>

* removing unused comments

Signed-off-by: Alex Cui <[email protected]>

* removing unused comments

Signed-off-by: Alex Cui <[email protected]>

* removing comments

Signed-off-by: Alex Cui <[email protected]>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* updates according to the github comments

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <[email protected]>

* punct grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update Dockerfile

Copied from main branch ( which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update utils.py

Addressing the bug raised in "bug in graph_utils.py of zh ITN and decimal tagger of ar TN" #162.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_ordinal.py

Fixing style. 

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <690…
21 people authored Apr 30, 2024
1 parent 0f67969 commit 8a05b51
Showing 65 changed files with 1,006 additions and 1,622 deletions.
36 changes: 24 additions & 12 deletions Jenkinsfile
@@ -22,7 +22,7 @@ pipeline {
RU_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
VI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
SV_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/07-27-23-0'
ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/04-30-24-0'
IT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-26-23-0'
HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
@@ -189,7 +189,7 @@ pipeline {
}
}

stage('L0: Create RU TN/ITN Grammars & SV & PT & ZH') {
stage('L0: Create RU TN/ITN Grammars & SV & PT') {
when {
anyOf {
branch 'main'
@@ -228,16 +228,6 @@ pipeline {
sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=pt --text="dez " --cache_dir ${PT_TN_CACHE}'
}
}
stage('L0: ZH TN grammars') {
steps {
sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=zh --text="你" --cache_dir ${ZH_TN_CACHE}'
}
}
stage('L0: ZH ITN grammars') {
steps {
sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=zh --text="二零零二年一月二十八日 " --cache_dir ${ZH_TN_CACHE}'
}
}
}
}

@@ -267,9 +257,31 @@ pipeline {
}
}
}
stage('L0: Create ZH TN/ITN Grammar') {
when {
anyOf {
branch 'main'
changeRequest target: 'main'
}
}
failFast true
parallel {
stage('L0: ZH ITN grammars') {
steps {
sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=zh --text="你" --cache_dir ${ZH_TN_CACHE}'
}
}
stage('L0: ZH TN grammars') {
steps {
sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=zh --text="6" --cache_dir ${ZH_TN_CACHE}'
}
}
}
}


// L1 Tests starts here

stage('L1: TN/ITN Tests CPU') {
when {
anyOf {
@@ -22,6 +22,8 @@
from pynini.export import export
from pynini.lib import byte, pynutil, utf8

from nemo_text_processing.inverse_text_normalization.zh.utils import load_labels

NEMO_CHAR = utf8.VALID_UTF8_CHAR
NEMO_DIGIT = byte.DIGIT
NEMO_HEX = pynini.union(*string.hexdigits).optimize()
14 changes: 14 additions & 0 deletions nemo_text_processing/inverse_text_normalization/zh/utils.py
@@ -60,3 +60,17 @@ def get_various_formats(text: str) -> List[str]:
result.append(t.upper())
result.append(t.capitalize())
return result


def load_labels(abs_path):
    """
    Loads a tab-separated file as a list of rows.
    Args:
        abs_path: absolute path to the TSV file
    Returns a list of rows, each row a list of column strings
    """
    with open(abs_path, encoding="utf-8") as label_tsv:
        labels = list(csv.reader(label_tsv, delimiter="\t"))
    return labels
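
The helper added above can be exercised on its own. What follows is a minimal sketch, not the repository's test code: it assumes `csv` is imported at module level, and the `demo` function, the temporary `units.tsv` file, and the two sample rows (borrowed from the measure data shown in this diff) are hypothetical and exist only for illustration.

```python
import csv
import os
import tempfile


def load_labels(abs_path):
    """Read a tab-separated file and return its rows as lists of strings."""
    with open(abs_path, encoding="utf-8") as label_tsv:
        return list(csv.reader(label_tsv, delimiter="\t"))


def demo():
    # Hypothetical two-column rows in the style of the measure TSV data.
    rows = [["kw", "千瓦"], ["ha", "公顷"]]
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "units.tsv")
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", encoding="utf-8", newline="") as tsv_file:
            csv.writer(tsv_file, delimiter="\t").writerows(rows)
        return load_labels(path)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Note that the function returns a list of rows rather than a dictionary, so a line with a single column (a unit with no spoken form, for instance) comes back as a one-element list and callers need to handle that case.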
@@ -70,3 +70,5 @@
<
>
@@ -5,3 +5,4 @@
×
÷
°
-
@@ -1,7 +1,5 @@
amu 原子质量
bar
°
º
°c 摄氏度
°C 摄氏度
ºc 摄氏度
@@ -40,23 +38,6 @@ kw 千瓦
kW 千瓦
lb
lbs
m2 平方米
平方米
m3 立方米
立方米
mbps 兆比特每秒
mg 毫克
mhz 兆赫兹
mi2 平方英里
mi² 平方英里
mi 英里
min 分钟哦
ml 毫升
mm2 平方毫米
mm² 平方毫米
mol 摩尔
mpa 兆帕
mph 英里每小时
ng 纳克
nm 纳米
ns 纳秒
@@ -80,13 +61,7 @@ gb 吉字节
gpa 吉帕斯卡
gy 戈瑞
ha 公顷
m
mm 毫米
ms 毫秒
mv 毫伏
mw 毫瓦
pg 皮克
ps 皮秒
s
ms 毫秒
g
211 changes: 0 additions & 211 deletions nemo_text_processing/text_normalization/zh/data/measure/units_zh.tsv

This file was deleted.

@@ -168,7 +168,6 @@ Ft 匈牙利福林
以色列谢克尔
J$ 牙买加元
лв 哈萨克斯坦腾格
朝鲜园
лв 吉尔吉斯斯坦索姆
老挝基普
ден 马其顿代纳尔
@@ -0,0 +1,9 @@
1
2
3
4
5
6
7
8
9