Zh tn bug 240712 (#187)
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <[email protected]>

* Add tests for fix

Signed-off-by: anand-nv <[email protected]>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <[email protected]>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <[email protected]>

* Update cache paths

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkins tests

Signed-off-by: Anand Joseph <[email protected]>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <[email protected]>

* docker build only if in test mode

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing variable

Signed-off-by: Anand Joseph <[email protected]>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <[email protected]>

* Fix commands not executing

Signed-off-by: Anand Joseph <[email protected]>

* Missing arguments

Signed-off-by: Anand Joseph <[email protected]>

* Missing quotes

Signed-off-by: Anand Joseph <[email protected]>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix paths

Signed-off-by: Anand Joseph <[email protected]>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <[email protected]>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <[email protected]>

* Undo path change

Signed-off-by: Anand Joseph <[email protected]>

* Fix intentional fail test

Signed-off-by: Anand Joseph <[email protected]>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <[email protected]>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <[email protected]>

* Update cache paths

Signed-off-by: Anand Joseph <[email protected]>

* Add interactive option

Signed-off-by: Anand Joseph <[email protected]>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <[email protected]>

* update isort version

Signed-off-by: Evelina <[email protected]>

* fix format

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <[email protected]>

* Added Armenian ITN

Signed-off-by: David Sargsyan <[email protected]>

* Added Armenian ITN

Signed-off-by: David Sargsyan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <[email protected]>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <[email protected]>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <[email protected]>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <[email protected]>

* deleted unnecessary data

Signed-off-by: David Sargsyan <[email protected]>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <[email protected]>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <[email protected]>
Co-authored-by: David Sargsyan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <[email protected]>

* clean up

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <[email protected]>

* shorten test

Signed-off-by: Evelina <[email protected]>

* fix jenkins

Signed-off-by: Evelina <[email protected]>

* cased for TN

Signed-off-by: Evelina <[email protected]>

* revert debug changes

Signed-off-by: Evelina <[email protected]>

* fix args default

Signed-off-by: Evelina <[email protected]>

* try parallel

Signed-off-by: Evelina <[email protected]>

* debug parallel

Signed-off-by: Evelina <[email protected]>

* rerun

Signed-off-by: Evelina <[email protected]>

* rerun

Signed-off-by: Evelina <[email protected]>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <[email protected]>

* enable all ci tests

Signed-off-by: Evelina <[email protected]>

* enable all ci tests

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <[email protected]>

* fixing conflicts

Signed-off-by: David Sargsyan <[email protected]>

* fixing some more conflicts

Signed-off-by: David Sargsyan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <[email protected]>

* fixed a minor issue

Signed-off-by: David Sargsyan <[email protected]>

* deleted unused imports

Signed-off-by: David Sargsyan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <[email protected]>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <[email protected]>

* added Armenian dot

Signed-off-by: David Sargsyan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <[email protected]>
Signed-off-by: Ara Yeroyan <[email protected]>
Signed-off-by: tbartley94 <[email protected]>
Co-authored-by: David Sargsyan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Chinmay Patil <[email protected]>
Signed-off-by: tbartley94 <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Travis Bartley <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <[email protected]>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <[email protected]>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <[email protected]>

* missing _init_ for python

Signed-off-by: Travis Bartley <[email protected]>

* mislabeled cache

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* r0.3.0 release (#151)

Signed-off-by: Evelina <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <[email protected]>

* Cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <[email protected]>

* disable sv tests

Signed-off-by: Evelina <[email protected]>

* fix ar test

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <[email protected]>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.8 release (#79)

Signed-off-by: Evelina <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <[email protected]>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <[email protected]>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <[email protected]>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <[email protected]>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <[email protected]>

* Add __init__.py files

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <[email protected]>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <[email protected]>

* Update date

Signed-off-by: Anand Joseph <[email protected]>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <[email protected]>

* Update copyrights

Signed-off-by: Anand Joseph <[email protected]>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <[email protected]>

* add elec fallback

Signed-off-by: Evelina <[email protected]>

* update ci

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <[email protected]>

* add elec fallback

Signed-off-by: Evelina <[email protected]>

* update ci

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <[email protected]>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <[email protected]>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <[email protected]>
Signed-off-by: Linnea Pari Leaver <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <[email protected]>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <[email protected]>

* wrap some more pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add graph pieces

Signed-off-by: Jim O'Regan <[email protected]>

* delete junk

Signed-off-by: Jim O'Regan <[email protected]>

* my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add date verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add right tokens

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <[email protected]>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* moved to tagger

Signed-off-by: Jim O'Regan <[email protected]>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <[email protected]>

* now most tests pass

Signed-off-by: Jim O'Regan <[email protected]>

* electronic

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <[email protected]>

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <[email protected]>

* whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <[email protected]>

* copy from English

Signed-off-by: Jim O'Regan <[email protected]>

* overwrite with version from en

Signed-off-by: Jim O'Regan <[email protected]>

* add basic test case

Signed-off-by: Jim O'Regan <[email protected]>

* fix call

Signed-off-by: Jim O'Regan <[email protected]>

* swap tsv sides

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* add optional_era variable

Signed-off-by: Jim O'Regan <[email protected]>

* add test case

Signed-off-by: Jim O'Regan <[email protected]>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <[email protected]>

* also add lowercase versions

Signed-off-by: Jim O'Regan <[email protected]>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <[email protected]>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <[email protected]>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <[email protected]>

* put the full stops back

Signed-off-by: Jim O'Regan <[email protected]>

* add filler words

Signed-off-by: Jim O'Regan <[email protected]>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <[email protected]>

* single line only

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <[email protected]>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <[email protected]>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* add missing test tooling

Signed-off-by: Jim O'Regan <[email protected]>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <[email protected]>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <[email protected]>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <[email protected]>

* add country codes from hu

Signed-off-by: Jim O'Regan <[email protected]>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <[email protected]>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <[email protected]>

* first attempt

Signed-off-by: Jim O'Regan <[email protected]>

* add to t&c

Signed-off-by: Jim O'Regan <[email protected]>

* add to t&c

Signed-off-by: Jim O'Regan <[email protected]>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <[email protected]>

* Update __init__.py

Signed-off-by: Jim O’Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <[email protected]>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* slight changes to date

Signed-off-by: Jim O'Regan <[email protected]>

* tweak

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <[email protected]>

* problem with tusen

Signed-off-by: Jim O'Regan <[email protected]>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <[email protected]>

* add functions from hu

Signed-off-by: Jim O'Regan <[email protected]>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <[email protected]>

* try changing this year declaration

Signed-off-by: Jim O'Regan <[email protected]>

* add year + era

Signed-off-by: Jim O'Regan <[email protected]>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <[email protected]>

* expose variables

Signed-off-by: Jim O'Regan <[email protected]>

* extra param for itn mode

Signed-off-by: Jim O'Regan <[email protected]>

* change call

Signed-off-by: Jim O'Regan <[email protected]>

* change comment

Signed-off-by: Jim O'Regan <[email protected]>

* change comment

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading

Signed-off-by: Jim O'Regan <[email protected]>

* fix parens

Signed-off-by: Jim O'Regan <[email protected]>

* move data loading

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <[email protected]>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <[email protected]>

* some adapting

Signed-off-by: Jim O'Regan <[email protected]>

* insert kl. if absent

Signed-off-by: Jim O'Regan <[email protected]>

* fix comments

Signed-off-by: Jim O'Regan <[email protected]>

* the relative prefixed times

Signed-off-by: Jim O'Regan <[email protected]>

* + comments

Signed-off-by: Jim O'Regan <[email protected]>

* enable time

Signed-off-by: Jim O'Regan <[email protected]>

* space in both directions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* fix hours to

Signed-off-by: Jim O'Regan <[email protected]>

* split by before/after

Signed-off-by: Jim O'Regan <[email protected]>

* delete, not insert

Signed-off-by: Jim O'Regan <[email protected]>

* fix if

Signed-off-by: Jim O'Regan <[email protected]>

* kl. 9

Signed-off-by: Jim O'Regan <[email protected]>

* copy from en

Signed-off-by: Jim O'Regan <[email protected]>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <[email protected]>

* imports

Signed-off-by: Jim O'Regan <[email protected]>

* add trimmed file

Signed-off-by: Jim O'Regan <[email protected]>

* fix imports

Signed-off-by: Jim O'Regan <[email protected]>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <[email protected]>

* minutes/seconds

Signed-off-by: Jim O'Regan <[email protected]>

* suffix

Signed-off-by: Jim O'Regan <[email protected]>

* delete, not insert

Signed-off-by: Jim O'Regan <[email protected]>

* one optional

Signed-off-by: Jim O'Regan <[email protected]>

* export variable

Signed-off-by: Jim O'Regan <[email protected]>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <[email protected]>

* already disambiguated

Signed-off-by: Jim O'Regan <[email protected]>

* closure

Signed-off-by: Jim O'Regan <[email protected]>

* do not insert kl.

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <[email protected]>

* Delete measure.py

Signed-off-by: Jim O’Regan <[email protected]>

* Delete money.py

Signed-off-by: Jim O’Regan <[email protected]>

* remove unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused test pieces

Signed-off-by: Jim O'Regan <[email protected]>

* copy from es

Signed-off-by: Jim O'Regan <[email protected]>

* add SV ITN

Signed-off-by: Jim O'Regan <[email protected]>

* add/update __init__

Signed-off-by: Jim O'Regan <[email protected]>

* blank line

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* fix lang

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix

Signed-off-by: Jim O'Regan <[email protected]>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* space before, not after

Signed-off-by: Jim O'Regan <[email protected]>

* fix cardinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* spurious deletion

Signed-off-by: Jim O'Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <[email protected]>

* fix singulars

Signed-off-by: Jim O'Regan <[email protected]>

* add an export

Signed-off-by: Jim O'Regan <[email protected]>

* change integer graph

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <[email protected]>

* use cdrewrite

Signed-off-by: Jim O'Regan <[email protected]>

* just EOS/BOS

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <[email protected]>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <[email protected]>

* export

Signed-off-by: Jim O'Regan <[email protected]>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <[email protected]>

* move comment

Signed-off-by: Jim O'Regan <[email protected]>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <[email protected]>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <[email protected]>

* accept both

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <[email protected]>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <[email protected]>

* retry

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <[email protected]>

* replace

Signed-off-by: Jim O'Regan <[email protected]>

* arcmap

Signed-off-by: Jim O'Regan <[email protected]>

* version without ones

Signed-off-by: Jim O'Regan <[email protected]>

* add another test

Signed-off-by: Jim O'Regan <[email protected]>

* change graph

Signed-off-by: Jim O'Regan <[email protected]>

* simplify

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <[email protected]>

* add a test

Signed-off-by: Jim O'Regan <[email protected]>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <[email protected]>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* move definition

Signed-off-by: Jim O'Regan <[email protected]>

* simplify

Signed-off-by: Jim O'Regan <[email protected]>

* tweak

Signed-off-by: Jim O'Regan <[email protected]>

* another test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <[email protected]>

* more tests

Signed-off-by: Jim O'Regan <[email protected]>

* match verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix last two failing tests

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused variable

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* fix comment

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <[email protected]>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <[email protected]>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <[email protected]>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <[email protected]>

* wrong place

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* export

Signed-off-by: Jim O'Regan <[email protected]>

* export

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused

Signed-off-by: Jim O'Regan <[email protected]>

* Update date.py

Signed-off-by: Jim O’Regan <[email protected]>

* Update time.py

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <[email protected]>

* trim comments

Signed-off-by: Jim O’Regan <[email protected]>

* remove commented line

Signed-off-by: Jim O’Regan <[email protected]>

* en halv

Signed-off-by: Jim O’Regan <[email protected]>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <[email protected]>

* fix electronic

Signed-off-by: Giacomo Cavallini <[email protected]>

* fix measure

Signed-off-by: Giacomo Cavallini <[email protected]>

---------

Signed-off-by: GiacomoLeoneMaria <[email protected]>
Signed-off-by: Giacomo Cavallini <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <[email protected]>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* Remove invalid tests

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <[email protected]>

* Cleanup

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <[email protected]>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <[email protected]>

* update for language import

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <[email protected]>

* a new class for whitelist

Signed-off-by: BuyuanCui <[email protected]>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <[email protected]>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <[email protected]>

* recreated due to format issue

Signed-off-by: BuyuanCui <[email protected]>

* caught duplicates, removed

Signed-off-by: BuyuanCui <[email protected]>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <[email protected]>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <[email protected]>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issue

Signed-off-by: BuyuanCui <[email protected]>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <[email protected]>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <[email protected]>

* updates

Signed-off-by: BuyuanCui <[email protected]>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <[email protected]>

* grammar for Fraction

Signed-off-by: BuyuanCui <[email protected]>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <[email protected]>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <[email protected]>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <[email protected]>

* arrangements

Signed-off-by: BuyuanCui <[email protected]>

* added whitelist grammar

Signed-off-by: BuyuanCui <[email protected]>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <[email protected]>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <[email protected]>

* updates according to last PR

Signed-off-by: BuyuanCui <[email protected]>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <[email protected]>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <[email protected]>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <[email protected]>

* adjustment on the weight

Signed-off-by: BuyuanCui <[email protected]>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <[email protected]>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <[email protected]>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <[email protected]>

* verbalizer for fraction

Signed-off-by: BuyuanCui <[email protected]>

* added for mandarin grammar

Signed-off-by: BuyuanCui <[email protected]>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <[email protected]>

* merge conflict

Signed-off-by: BuyuanCui <[email protected]>

* removed unused imports

Signed-off-by: BuyuanCui <[email protected]>

* deleted unused import os

Signed-off-by: BuyuanCui <[email protected]>

* deleted unused variables

Signed-off-by: BuyuanCui <[email protected]>

* removed unused imports

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <[email protected]>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <[email protected]>

* format issue, recreated

Signed-off-by: BuyuanCui <[email protected]>

* format issue recreated

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <[email protected]>

* fixed coding style and format

Signed-off-by: BuyuanCui <[email protected]>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <[email protected]>

* removed the comment

Signed-off-by: BuyuanCui <[email protected]>

* removed the comment

Signed-off-by: BuyuanCui <[email protected]>

* removing unnecessary comments

Signed-off-by: BuyuanCui <[email protected]>

* unnecessary comment removed

Signed-off-by: BuyuanCui <[email protected]>

* test file updated for more cases

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <[email protected]>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <[email protected]>

* added Mandarin as zh

Signed-off-by: BuyuanCui <[email protected]>

* removing for duplication

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <[email protected]>

* removed duplicates

Signed-off-by: BuyuanCui <[email protected]>

* removing unused imports

Signed-off-by: BuyuanCui <[email protected]>

* updates to fix test file failures

Signed-off-by: BuyuanCui <[email protected]>

* updates to fix file failures

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <[email protected]>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <[email protected]>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <[email protected]>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fix style

Signed-off-by: BuyuanCui <[email protected]>

* fixing pr checks

Signed-off-by: BuyuanCui <[email protected]>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <[email protected]>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Buyuan(Alex) Cui <[email protected]>
Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: Alex Cui <[email protected]>
Co-authored-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <[email protected]>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <[email protected]>

* Decimal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* fraction updated

Signed-off-by: BuyuanCui <[email protected]>

* money updated

Signed-off-by: BuyuanCui <[email protected]>

* ordinal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* punctuation grammar added

Signed-off-by: BuyuanCui <[email protected]>

* time grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* tokenizer updated

Signed-off-by: BuyuanCui <[email protected]>

* updates on certificate

Signed-off-by: BuyuanCui <[email protected]>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <[email protected]>

* cardinal updated

Signed-off-by: BuyuanCui <[email protected]>

* date grammar changed

Signed-off-by: BuyuanCui <[email protected]>

* decimal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* grammar added

Signed-off-by: BuyuanCui <[email protected]>

* grammar updates

Signed-off-by: BuyuanCui <[email protected]>

* test data added

Signed-off-by: BuyuanCui <[email protected]>

* test python file edits

Signed-off-by: BuyuanCui <[email protected]>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <[email protected]>

* test cases updated

Signed-off-by: BuyuanCui <[email protected]>

* coding style fixed

Signed-off-by: BuyuanCui <[email protected]>

* dates updated for init files

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <[email protected]>

* removed unused imports

Signed-off-by: BuyuanCui <[email protected]>

* removed comments

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <[email protected]>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <[email protected]>

* updated for tests reruns

Signed-off-by: BuyuanCui <[email protected]>

* updates

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <[email protected]>

---------

Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name change

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* file name

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <[email protected]>

* fixed import error

Signed-off-by: BuyuanCui <[email protected]>

---------

Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <[email protected]>

* fix path

Signed-off-by: Evelina <[email protected]>

* fix pytest

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* pip 1.2.0

Signed-off-by: Evelina <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <[email protected]>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <[email protected]>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <[email protected]>

* coding style fix

Signed-off-by: BuyuanCui <[email protected]>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <[email protected]>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <[email protected]>

* added for tests on whitelist

Signed-off-by: BuyuanCui <[email protected]>

* added for test on word

Signed-off-by: BuyuanCui <[email protected]>

* added to run test on whitelist

Signed-off-by: BuyuanCui <[email protected]>

* added to run test on word

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

---------

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Buyuan(Alex) Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Print warning instead exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <[email protected]>

* text arg

Signed-off-by: Nikolay Karpov <[email protected]>

* Failed text

Signed-off-by: Nikolay Karpov <[email protected]>

* add logger

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <[email protected]>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <[email protected]>

* info level

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <[email protected]>

* verbose

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <[email protected]>

* Exception

Signed-off-by: Nikolay Karpov <[email protected]>

* verbose

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: Evelina <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <[email protected]>

* self.verbose

Signed-off-by: Nikolay Karpov <[email protected]>

---------

Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <[email protected]>

* clean up

Signed-off-by: Evelina <[email protected]>

* fix logging

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <[email protected]>

* fix format

Signed-off-by: Evelina <[email protected]>

* add IT TN to CI

Signed-off-by: Evelina <[email protected]>

* update patch

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <[email protected]>

---------

Signed-off-by: GiacomoLeoneMaria <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrote tokenizer

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* jenkins file update

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* tn bug

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* fixes and updates

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* adjustments

Signed-off-by: BuyuanCui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* testing commit

Signed-off-by: Alex Cui <[email protected]>

* removing unused file

Signed-off-by: Alex Cui <[email protected]>

* updated test cases

Signed-off-by: Alex Cui <[email protected]>

* updating test cases

Signed-off-by: Alex Cui <[email protected]>

* updates adapting to graphs

Signed-off-by: Alex Cui <[email protected]>

* updated …
1 parent b8ae8f6 commit 5552cc6
Showing 14 changed files with 85 additions and 48 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -476,4 +476,4 @@ pipeline {
cleanWs()
}
}
}
}
@@ -1,32 +1,32 @@
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import pynini
from pynini.lib import pynutil

from nemo_text_processing.text_normalization.zh.graph_utils import NEMO_NOT_QUOTE, GraphFst, delete_space


class WordFst(GraphFst):
'''
tokens { char: "一" } -> 一
'''

def __init__(self, deterministic: bool = True, lm: bool = False):
super().__init__(name="char", kind="verbalize", deterministic=deterministic)

graph = pynutil.delete("name: \"") + NEMO_NOT_QUOTE + pynutil.delete("\"")
graph = pynini.closure(delete_space) + graph + pynini.closure(delete_space)
self.fst = graph.optimize()
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import pynini
from pynini.lib import pynutil

from nemo_text_processing.text_normalization.zh.graph_utils import NEMO_NOT_QUOTE, GraphFst, delete_space


class WordFst(GraphFst):
'''
tokens { char: "一" } -> 一
'''

def __init__(self, deterministic: bool = True, lm: bool = False):
super().__init__(name="char", kind="verbalize", deterministic=deterministic)

graph = pynutil.delete("name: \"") + NEMO_NOT_QUOTE + pynutil.delete("\"")
graph = pynini.closure(delete_space) + graph + pynini.closure(delete_space)
self.fst = graph.optimize()
14 changes: 14 additions & 0 deletions nemo_text_processing/text_normalization/en/taggers/electronic.py
@@ -52,6 +52,8 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True):
 
         cc_cues = pynutil.add_weight(pynini.string_file(get_abs_path("data/electronic/cc_cues.tsv")), MIN_NEG_WEIGHT,)
 
+        cc_cues = pynutil.add_weight(pynini.string_file(get_abs_path("data/electronic/cc_cues.tsv")), MIN_NEG_WEIGHT)
+
         accepted_symbols = pynini.project(pynini.string_file(get_abs_path("data/electronic/symbol.tsv")), "input")
         accepted_common_domains = pynini.project(
             pynini.string_file(get_abs_path("data/electronic/domain.tsv")), "input"
@@ -135,6 +137,18 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True):
             )
             graph |= cc_phrases
 
+        if deterministic:
+            # credit card cues
+            numbers = pynini.closure(NEMO_DIGIT, 4, 16)
+            cc_phrases = (
+                pynutil.insert("protocol: \"")
+                + cc_cues
+                + pynutil.insert("\" domain: \"")
+                + numbers
+                + pynutil.insert("\"")
+            )
+            graph |= cc_phrases
+
         final_graph = self.add_tokens(graph)
 
         self.fst = final_graph.optimize()
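As a rough illustration of what the credit-card-cue branch in this hunk tags, the sketch below uses Python's `re` module in place of pynini. Only the 4-to-16 digit bound and the `protocol`/`domain` field names come from the diff. The cue phrases are made-up stand-ins for `data/electronic/cc_cues.tsv` (its contents are not shown here), cues are assumed to map to themselves, and `tag_cc_phrase` is a hypothetical helper, not part of the library.

```python
import re
from typing import Optional

# Hypothetical cue phrases; the real ones live in data/electronic/cc_cues.tsv.
CC_CUES = ["card ending in ", "card number "]

# [0-9]{4,16} plays the role of pynini.closure(NEMO_DIGIT, 4, 16):
# at least 4 and at most 16 ASCII digits.
CC_PATTERN = re.compile(
    "(" + "|".join(re.escape(cue) for cue in CC_CUES) + ")([0-9]{4,16})"
)


def tag_cc_phrase(text: str) -> Optional[str]:
    """Return the tagged form, or None if text is not a cue followed by 4-16 digits."""
    match = CC_PATTERN.fullmatch(text)
    if match is None:
        return None
    cue, digits = match.groups()
    # Same field layout as the inserted strings in the diff.
    return f'protocol: "{cue}" domain: "{digits}"'


if __name__ == "__main__":
    print(tag_cc_phrase("card ending in 1234"))
    print(tag_cc_phrase("card ending in 123"))
```

The bounded repetition is what keeps the branch from matching arbitrary digit runs: three digits or seventeen digits after a cue are both rejected.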
13 changes: 6 additions & 7 deletions nemo_text_processing/text_normalization/zh/taggers/money.py
@@ -19,7 +19,6 @@
 from nemo_text_processing.text_normalization.zh.graph_utils import GraphFst
 from nemo_text_processing.text_normalization.zh.utils import get_abs_path
 
-# def get_quantity(decimal):
 suffix = pynini.union(
     "万",
     "十万",
@@ -107,7 +106,7 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True, lm: bool = Fa
         # larger money as decimals
         graph_decimal = (
             pynutil.insert('integer_part: \"')
-            + pynini.closure(
+            + (
                 pynini.closure(cardinal, 1)
                 + pynutil.delete('.')
                 + pynutil.insert('点')
@@ -117,14 +116,16 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True, lm: bool = Fa
         )
         graph_decimal_money = (
             pynini.closure(graph_decimal, 1)
-            + pynini.closure(pynutil.insert(' quantity: \"') + suffix + pynutil.insert('\"'))
+            + pynini.closure((pynutil.insert(' quantity: \"') + suffix + pynutil.insert('\"')), 0, 1)
             + pynutil.insert(" ")
             + pynini.closure(currency_mandarin_component, 1)
         ) | (
             pynini.closure(currency_component, 1)
             + pynutil.insert(" ")
             + pynini.closure(graph_decimal, 1)
-            + pynini.closure(pynutil.insert(" ") + pynutil.insert('quantity: \"') + suffix + pynutil.insert('\"'))
+            + pynini.closure(
+                (pynutil.insert(" ") + pynutil.insert('quantity: \"') + suffix + pynutil.insert('\"')), 0, 1
+            )
         )
 
         graph = (
@@ -134,7 +135,5 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True, lm: bool = Fa
             | pynutil.add_weight(graph_decimal_money, -1.0)
         )
 
-        final_graph = graph
-
-        final_graph = self.add_tokens(final_graph)
+        final_graph = self.add_tokens(graph)
         self.fst = final_graph.optimize()
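The key behavioral change in this file is the pair of bounds passed to `pynini.closure`: with no bounds it accepts zero or more repetitions, while `0, 1` accepts at most one. A regex analogy may make the difference concrete: `closure(x)` behaves like `x*`, `closure(x, 1)` like `x+`, and `closure(x, 0, 1)` like `x?`. The sketch below uses Python's `re` rather than pynini. The two suffixes are taken from the `suffix` union visible in the diff, but the decimal pattern and the `元` currency marker are simplifying assumptions, since the actual `cardinal` and currency components are not shown here.

```python
import re

# Two entries from the suffix union shown in the diff; the full list is longer.
SUFFIX = "(?:万|十万)"

# Assumed stand-ins: a plain decimal for graph_decimal, "元" for the currency.
UNBOUNDED = re.compile(rf"[0-9]+\.[0-9]+{SUFFIX}*元")     # like pynini.closure(x)
AT_MOST_ONCE = re.compile(rf"[0-9]+\.[0-9]+{SUFFIX}?元")  # like pynini.closure(x, 0, 1)


def accepts(pattern: "re.Pattern[str]", text: str) -> bool:
    """True when the whole string is matched, mirroring acceptance by an FST."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["1.5元", "1.5万元", "1.5万万元"]:
        print(text, accepts(UNBOUNDED, text), accepts(AT_MOST_ONCE, text))
```

Under these assumptions, a string with a repeated suffix is accepted by the unbounded form but rejected once the closure is limited to a single optional occurrence.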
@@ -82,4 +82,4 @@ testITNWord() {
 shift $#
 
 # Load shUnit2
-. /workspace/shunit2/shunit2
+. /workspace/shunit2/shunit2
@@ -82,4 +82,4 @@ testITNWord() {
 shift $#
 
 # Load shUnit2
-. /workspace/shunit2/shunit2
+. /workspace/shunit2/shunit2
@@ -119,4 +119,4 @@ testTNMath() {
 shift $#
 
 # Load shUnit2
-. /workspace/shunit2/shunit2
+. /workspace/shunit2/shunit2
4 changes: 3 additions & 1 deletion tests/nemo_text_processing/mr/test_cardinal.py
@@ -16,11 +16,13 @@
 from parameterized import parameterized
 
 from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
+from nemo_text_processing.text_normalization.normalize import Normalizer
 
 from ..utils import CACHE_DIR, parse_test_case_file
 
 
-class TestCardinal:
+class TestPreprocess:
+
     inverse_normalizer_mr = InverseNormalizer(lang='mr', cache_dir=CACHE_DIR, overwrite_cache=False)
 
     @parameterized.expand(parse_test_case_file('mr/data_inverse_text_normalization/test_cases_cardinal.txt'))
1 change: 1 addition & 0 deletions tests/nemo_text_processing/mr/test_date.py
@@ -16,6 +16,7 @@
 from parameterized import parameterized
 
 from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
+from nemo_text_processing.text_normalization.normalize import Normalizer
 
 from ..utils import CACHE_DIR, parse_test_case_file
 
@@ -4,4 +4,22 @@
 只有智商超过一定数值的人才能破解~只有智商超过一定数值的人才能破解
 这是由人工智能控制的系统~这是由人工智能控制的系统
 欧洲旅游目的地多到不知道怎么选~欧洲旅游目的地多到不知道怎么选
-马斯科卖掉豪宅住进折叠屋~马斯科卖掉豪宅住进折叠屋
+马斯科卖掉豪宅住进折叠屋~马斯科卖掉豪宅住进折叠屋
+免除GOOGLE在一桩诽谤官司中的法律责任。~免除GOOGLE在一桩诽谤官司中的法律责任。
+这对CHROME是有利的。~这对CHROME是有利的。
+这可能是PILde使用者。~这可能是PILde使用者。
+CSI侧重科学办案,也就是现场搜正和鉴识。~CSI侧重科学办案,也就是现场搜正和鉴识。
+我以前非常喜欢一个软体,DRAW。~我以前非常喜欢一个软体,DRAW。
+我爱你病毒。~我爱你病毒。
+微软举办了RACETOMARKETCHALLENGE竞赛。~微软举办了RACETOMARKETCHALLENGE竞赛。
+苹果销售量的复苏程度远超PC市场。~苹果销售量的复苏程度远超PC市场。
+第三季还有两款ANDROID手机亮相。~第三季还有两款ANDROID手机亮相。
+反而应试著让所有GOOGLE服务更加社交化。~反而应试著让所有GOOGLE服务更加社交化。
+GOOGLE已提供一项NATIVECLIENT软体。~GOOGLE已提供一项NATIVECLIENT软体。
+这些程式都支援PRE与ITUNES同步化。~这些程式都支援PRE与ITUNES同步化。
+可以推断此次NTT可能也会将同样的策略用在LTE上。~可以推断此次NTT可能也会将同样的策略用在LTE上。
+现今许多小型企业因成本考量被迫采用一般PC作为伺服器。~现今许多小型企业因成本考量被迫采用一般PC作为伺服器。
+部落格宣布GOOGLECHROMES的诞生。~部落格宣布GOOGLECHROMES的诞生。
+由ZIP订购机场接送或观光景点共乘服务。~由ZIP订购机场接送或观光景点共乘服务。
+PAQUE表示短时间应该还不会全面开放。~PAQUE表示短时间应该还不会全面开放。
+CBS是美国一家重要的广播电视网路公司。~CBS是美国一家重要的广播电视网路公司。
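Each line in this test-case file pairs an input with its expected output, separated by `~`; for these word cases the two sides are identical, so the text is expected to pass through unchanged. A minimal sketch of reading such lines follows. `parse_cases` is a hypothetical stand-in written for illustration, not the repository's `parse_test_case_file`, whose implementation is not shown in this diff.

```python
from typing import Iterable, List, Tuple


def parse_cases(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split 'input~expected' lines into (input, expected) pairs, skipping blanks."""
    cases = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # partition splits on the first "~" only.
        source, _, expected = line.partition("~")
        cases.append((source, expected))
    return cases


# Two lines copied from the test-case file above.
SAMPLE = [
    "我爱你病毒。~我爱你病毒。\n",
    "这对CHROME是有利的。~这对CHROME是有利的。\n",
]

if __name__ == "__main__":
    for source, expected in parse_cases(SAMPLE):
        print(source == expected, source)
```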
@@ -1,7 +1,7 @@
 #! /bin/sh
 
 GRAMMARS_DIR=${1:-"/workspace/sparrowhawk/documentation/grammars"}
-PROJECT_DIR=${2:-"/workspace/tests/en"}
+PROJECT_DIR=${2:-"/workspace/tests"}
 
 runtest () {
   input=$1
1 change: 0 additions & 1 deletion tools/text_processing_deployment/export_grammars.sh
@@ -107,4 +107,3 @@ else
   echo "done mode: $MODE"
   exit 0
 fi
-
4 changes: 4 additions & 0 deletions tools/text_processing_deployment/pynini_export.py
@@ -266,6 +266,10 @@ def parse_args():
         from nemo_text_processing.inverse_text_normalization.ja.verbalizers.verbalize import (
             VerbalizeFst as ITNVerbalizeFst,
         )
+        from nemo_text_processing.text_normalization.hy.taggers.tokenize_and_classify import (
+            ClassifyFst as TNClassifyFst,
+        )
+        from nemo_text_processing.text_normalization.hy.verbalizers.verbalize import VerbalizeFst as TNVerbalizeFst
     output_dir = os.path.join(args.output_dir, f"{args.language}_{args.grammars}_{args.input_case}")
     export_grammars(
         output_dir=output_dir,
2 changes: 1 addition & 1 deletion tools/text_processing_deployment/sh_test.sh
@@ -63,4 +63,4 @@ VERBALIZE_FAR=${CACHE_DIR}_${GRAMMARS}_${INPUT_CASE}/verbalize/verbalize.far
 CONFIG=${LANGUAGE}_${GRAMMARS}_${INPUT_CASE}
 
 cp $CLASSIFY_FAR /workspace/sparrowhawk/documentation/grammars_${CONFIG}/en_toy/classify/
-cp $VERBALIZE_FAR /workspace/sparrowhawk/documentation/grammars_${CONFIG}/en_toy/verbalize/
+cp $VERBALIZE_FAR /workspace/sparrowhawk/documentation/grammars_${CONFIG}/en_toy/verbalize/
