Jp itn update 240805 #208
Merged
@@ -16,11 +16,13 @@
from parameterized import parameterized

from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
from nemo_text_processing.text_normalization.normalize import Normalizer
Check notice
Code scanning / CodeQL
Unused import Note test
Import of 'Normalizer' is not used.
@@ -16,6 +16,7 @@
from parameterized import parameterized

from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
from nemo_text_processing.text_normalization.normalize import Normalizer
Check notice
Code scanning / CodeQL
Unused import Note test
Import of 'Normalizer' is not used.
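The notices above flag `Normalizer` as imported but never referenced. As a rough illustration of what such a check looks for, here is a minimal sketch using Python's standard `ast` module. It is not CodeQL's actual query; the helper name `unused_imports` and the sample test body in `SAMPLE` are assumptions made up for this example, and the sample is only parsed, never executed. The sketch also ignores cases such as names re-exported through `__all__`.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in `source` (simplified check)."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds only the top-level name `a`
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    # Every bare identifier in the module; `pkg.attr` contributes `pkg`
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


# Hypothetical test module mirroring the imports in the hunks above.
SAMPLE = '''
from parameterized import parameterized

from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
from nemo_text_processing.text_normalization.normalize import Normalizer

inverse_normalizer = InverseNormalizer()


@parameterized.expand([("input", "expected")])
def test_denorm(test_input, expected):
    assert inverse_normalizer.inverse_normalize(test_input, verbose=False) == expected
'''

print(unused_imports(SAMPLE))  # → ['Normalizer']
```

Under these assumptions, deleting the `Normalizer` import line from the test file would be enough to clear the notice.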
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
resolving conflicts Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]>
BuyuanCui force-pushed the jp_itn_update_240805 branch from a17a00d to e1d3d49 on August 20, 2024 17:09
Signed-off-by: Buyuan(Alex) Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
…NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
…-text-processing into jp_itn_update_240805
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
for more information, see https://pre-commit.ci
nemo_text_processing/inverse_text_normalization/ja/verbalizers/post_processing.py
Fixed
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Buyuan(Alex) Cui <[email protected]> Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
for more information, see https://pre-commit.ci
nemo_text_processing/inverse_text_normalization/ja/verbalizers/whitelist.py
Fixed
Signed-off-by: Alex Cui <[email protected]>
…-text-processing into jp_itn_update_240805
Signed-off-by: Alex Cui <[email protected]>
mgrafu approved these changes on Oct 1, 2024
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 24, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporary fixes to attempt to fix SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]>
* temporary changes, will change back Signed-off-by: Alex Cui <[email protected]>
* update jp tn date Signed-off-by: Alex Cui <[email protected]>
* resolving conflict Signed-off-by: Alex Cui <[email protected]>
* adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]>
* fixing ci test cases Signed-off-by: Alex Cui <[email protected]>
* updates on Jenkins Signed-off-by: Alex Cui <[email protected]>
* with pynini closure had errors, changing back to no closure version Signed-off-by: Alex Cui <[email protected]>
* jenkins update Signed-off-by: Alex Cui <[email protected]>
* changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]>
* adding one more test item Signed-off-by: Alex Cui <[email protected]>
* temporary fixes to attempt to fix SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]>
* adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]>
* fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]>
* with pynini closure had errors, changing back to no closure version Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]>
* resolving fraction space issue Signed-off-by: Alex Cui <[email protected]>
* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]>
* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]>
* fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]>
* removing unused grammar Signed-off-by: Alex Cui <[email protected]>
* removing unused grammar Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removing unused imports Signed-off-by: Alex Cui <[email protected]>
* removing unused import Signed-off-by: Alex Cui <[email protected]>
* changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]>
* imports error fixing Signed-off-by: Alex Cui <[email protected]>
* imports errors Signed-off-by: Alex Cui <[email protected]>
* Jenkins update for jp itn Signed-off-by: Alex Cui <[email protected]>
* update for fraction space issue Signed-off-by: Alex Cui <[email protected]>
* update for fraction space issue Signed-off-by: Alex Cui <[email protected]>
* update for fraction space issue Signed-off-by: Alex Cui <[email protected]>
* reverting Signed-off-by: Alex Cui <[email protected]>
* update for fraction space issue, changing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fixing style Signed-off-by: Alex Cui <[email protected]>
* fixing style Signed-off-by: Alex Cui <[email protected]>
* style fix Signed-off-by: Alex Cui <[email protected]>
* style fix Signed-off-by: Alex Cui <[email protected]>
* style fix Signed-off-by: Alex Cui <[email protected]>
* removing unused imports Signed-off-by: Alex Cui <[email protected]>
* jp tn date update Signed-off-by: Alex Cui <[email protected]>
* Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]>
* removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]>
* space issue Signed-off-by: Alex Cui <[email protected]>
* test order arrangement Signed-off-by: Alex Cui <[email protected]>
* resolve fraction space issue Signed-off-by: Alex Cui <[email protected]>
* style fix Signed-off-by: Alex Cui <[email protected]>
* fix style Signed-off-by: Alex Cui <[email protected]>
* space issue Signed-off-by: Alex Cui <[email protected]>
* update jp tn Signed-off-by:
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimiter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhawk tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix missing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimiter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhawk tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix missing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for alignment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * temporal changes will change back Signed-off-by: Alex Cui <[email protected]> * update jp tn date Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases Signed-off-by: Alex Cui <[email protected]> * updats on Jenkins Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * jenkinspdate Signed-off-by: Alex Cui <[email protected]> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <[email protected]> * adding one more test item Signed-off-by: Alex Cui <[email protected]> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <[email protected]> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <[email protected]> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <[email protected]> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * resolving fraction space issue Signed-off-by: Alex Cui <[email protected]> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <[email protected]> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <[email protected]> * fixed typo on decimaltext Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * removing unsed grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <[email protected]> * removing unused import Signed-off-by: Alex Cui <[email protected]> * changed regular space to narrow space Signed-off-by: Alex Cui <[email protected]> * imports error fixing Signed-off-by: Alex Cui <[email protected]> * imports errors Signed-off-by: Alex Cui <[email protected]> * Jekins update for jp itn Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * update for fraction space issue Signed-off-by: Alex Cui <[email protected]> * reverting Signed-off-by: Alex Cui <[email protected]> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <[email protected]> * fixng style Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * removing unsed imports Signed-off-by: Alex Cui <[email protected]> * jp tn date update Signed-off-by: Alex Cui <[email protected]> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * removing previously created nemo imports Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * test order arrangement Signed-off-by: Alex Cui <[email protected]> * resolve fraction space issue Signed-off-by: Alex Cui <[email protected]> * style fix Signed-off-by: Alex Cui <[email protected]> * fix style Signed-off-by: Alex Cui <[email protected]> * space issue Signed-off-by: Alex Cui <[email protected]> * update jp tn Signed-off-by: 
Alex Cui <[email protected]> * removing unsed import Signed-off-by: Alex Cui <[email protected]> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * empty file Signed-off-by: Alex Cui <[email protected]> * to delete Signed-off-by: Alex Cui <[email protected]> * removing Signed-off-by: Alex Cui <[email protected]> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * add Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add jenkins file (#23) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <[email protected]> * force ordinals to either have :a/:e or "." at the end Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * add minimal ordinal data Signed-off-by: Jim O'Regan <[email protected]> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * add // to symbols Signed-off-by: Jim O'Regan <[email protected]> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <[email protected]> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <[email protected]> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix language Signed-off-by: Jim O'Regan <[email protected]> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <[email protected]> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <[email protected]> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * test runner for measure, adapted from es Signed-off-by: Jim O'Regan <[email protected]> * fix a pair of test cases Signed-off-by: Jim O'Regan <[email protected]> * fix plurals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <[email protected]> * fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * add usd$ Signed-off-by: Jim O'Regan <[email protected]> * insert "komma" Signed-off-by: Jim O'Regan <[email protected]> * "pund" is neuter Signed-off-by: Jim O'Regan <[email protected]> * fix test cases Signed-off-by: Jim O'Regan <[email protected]> * towards proper graphs Signed-off-by: Jim O'Regan <[email protected]> * GBP Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <[email 
protected]> * make komma non-det Signed-off-by: Jim O'Regan <[email protected]> * more money tagger fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <[email protected]> * do a bit better with en/ett Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * fix more test cases Signed-off-by: Jim O'Regan <[email protected]> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <[email protected]> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <[email protected]> * add minimal tests Signed-off-by: Jim O'Regan <[email protected]> * expansions of era abbreviations Signed-off-by: Jim O'Regan <[email protected]> * use eras Signed-off-by: Jim O'Regan <[email protected]> * use eras in verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix examples in comment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <[email protected]> * fix separator Signed-off-by: Jim O'Regan <[email protected]> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <[email protected]> * load labels Signed-off-by: Jim O'Regan <[email protected]> * right first time Signed-off-by: Jim O'Regan <[email protected]> * missing space Signed-off-by: Jim O'Regan <[email protected]> * fix 
year in test cases Signed-off-by: Jim O'Regan <[email protected]> * getting closer to getting dates working Signed-off-by: Jim O'Regan <[email protected]> * add a (failing) test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <[email protected]> * also handle decades Signed-off-by: Jim O'Regan <[email protected]> * remove todo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add partially incomplete test data Signed-off-by: Jim O'Regan <[email protected]> * mostly fixed test cases Signed-off-by: Jim O'Regan <[email protected]> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <[email protected]> * missed wrapping Signed-off-by: Jim O'Regan <[email protected]> * no difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. 
Signed-off-by: Jim O'Regan <[email protected]> * telephone tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <[email protected]> * try adding more brackets Signed-off-by: Jim O'Regan <[email protected]> * fix another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <[email protected]> * move abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add in abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <[email protected]> * single digit Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <[email protected]> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <[email protected]> * Revert "this is not right; leading zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. 
Signed-off-by: Jim O'Regan <[email protected]> * ok, this seems to work Signed-off-by: Jim O'Regan <[email protected]> * drop the tests starting with comma Signed-off-by: Jim O'Regan <[email protected]> * decimal tagger works Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <[email protected]> * lower case Signed-off-by: Jim O'Regan <[email protected]> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <[email protected]> * add a very minimal test case for time Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <[email protected]> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <[email protected]> * add prompt Signed-off-by: Jim O'Regan <[email protected]> * copy the roman handling from es Signed-off-by: Jim O'Regan <[email protected]> * greek letters Signed-off-by: Jim O'Regan <[email protected]> * some fixes to the time tagger Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <[email protected]> * more work on time Signed-off-by: Jim O'Regan <[email protected]> * |=, not = Signed-off-by: Jim O'Regan <[email protected]> * adapt verbaliser a little Signed-off-by: Jim O'Regan <[email protected]> * add some test cases from module comments Signed-off-by: Jim O'Regan <[email protected]> * export some variables to check Signed-off-by: Jim O'Regan <[email protected]> * small fix Signed-off-by: Jim O'Regan <[email protected]> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <[email protected]> * try doing this here Signed-off-by: Jim O'Regan <[email protected]> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <[email protected]> * fix errors in tests Signed-off-by: Jim O'Regan <[email protected]> * minimal test cases for measure Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <[email protected]> * merge different tsvs Signed-off-by: Jim O'Regan <[email protected]> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <[email protected]> * export some variables for testing Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * need an en/ett split here too Signed-off-by: Jim O'Regan <[email protected]> * fix decimal subgraph Signed-off-by: Jim O'Regan <[email protected]> * remove todo, I've just done it Signed-off-by: Jim O'Regan <[email protected]> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * include greek letters in maths Signed-off-by: Jim O'Regan <[email protected]> * include greek here too Signed-off-by: Jim O'Regan <[email protected]> * minor sg/pl Signed-off-by: Jim O'Regan <[email protected]> * dedup Signed-off-by: Jim O'Regan <[email protected]> * fix a test case Signed-off-by: Jim O'Regan <[email protected]> * put these under if, too Signed-off-by: Jim O'Regan <[email protected]> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <[email protected]> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <[email protected]> * export variables to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * 
fix some test cases Signed-off-by: Jim O'Regan <[email protected]> * here is one error Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <[email protected]> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <[email protected]> * export a variable Signed-off-by: Jim O'Regan <[email protected]> * add a tesst case Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * . is not a cardinal separator Signed-off-by: Jim O'Regan <[email protected]> * fix case Signed-off-by: Jim O'Regan <[email protected]> * add yen Signed-off-by: Jim O'Regan <[email protected]> * final fixes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove English roman tagger Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * remove some unused pieces Signed-off-by: Jim O'Regan <[email protected]> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <[email protected]> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <[email protected]> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <[email protected]> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <[email protected]> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * warnings about missing whitelist Signed-off-by: Jim O'Regan <[email protected]> * add 
sv Signed-off-by: Jim O'Regan <[email protected]> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <[email protected]> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <[email protected]> * fix year Signed-off-by: Jim O'Regan <[email protected]> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <[email protected]> * address codeql comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <[email protected]> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <[email protected]> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <[email protected]> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <[email protected]> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <[email protected]> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <[email protected]> * remove broken duplicate Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <[email protected]> * time tests now pass Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <[email protected]> * import delete_preserve_order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <[email protected]> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <[email protected]> * move to the correct subdirectory Signed-off-by: Jim O'Regan <[email protected]> * add swedish Signed-off-by: Jim O'Regan <[email protected]> * remove pynini checks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * fix here also Signed-off-by: Jim O'Regan <[email protected]> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <[email protected]> * add a date case Signed-off-by: 
Jim O'Regan <[email protected]> * remove duplication Signed-off-by: Jim O'Regan <[email protected]> * boost n_tagged Signed-off-by: Jim O'Regan <[email protected]> * also copyright this year Signed-off-by: Jim O'Regan <[email protected]> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <[email protected]> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <[email protected]> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <[email protected]> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <[email protected]> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <[email protected]> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * days of the week Signed-off-by: Jim O'Regan <[email protected]> * add more abbreviations Signed-off-by: Jim O'Regan <[email protected]> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * remove blank line Signed-off-by: Jim O'Regan <[email protected]> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <[email protected]> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <[email protected]> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * CI setup (#25) * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci _cr Signed-off-by: ekmb <[email protected]> * revert setup tool Signed-off-by: ekmb <[email protected]> * remove pytest-runner from setup.py Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * fix jenkins Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email protected]> * update test dir Signed-off-by: ekmb <[email 
protected]> * update test dir Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <[email protected]> * wip el words Signed-off-by: ekmb <[email protected]> * wip Signed-off-by: ekmb <[email protected]> * electronic pass Signed-off-by: ekmb <[email protected]> * test pass Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * remove unused imports Signed-off-by: ekmb <[email protected]> * add deterministic option normalized options Signed-off-by: ekmb <[email protected]> * update jenkins grammar folder Signed-off-by: ekmb <[email protected]> * clean up, update for SH Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * clean up Signed-off-by: ekmb <[email protected]> * reduce cardinal graph Signed-off-by: ekmb <[email protected]> * jenkins dir Signed-off-by: ekmb <[email protected]> * add weight for sh Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <[email protected]> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <[email protected]> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <[email protected]> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <[email protected]> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <[email protected]> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <[email protected]> * Fix stage Signed-off-by: Anand Joseph <[email protected]> * Change cache folder Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <[email protected]> * add whitelist to export Signed-off-by: ekmb <[email protected]> * update docstrings Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <[email protected]> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <[email protected]> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <[email protected]> * Fix for measures Signed-off-by: Anand Joseph <[email protected]> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <[email protected]> --------- Signed-off-by: Larisa Kempbell <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <[email protected]> * added pynini install note Signed-off-by: Yang Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.6rc0 (#37) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <[email protected]> * Fix Jenkinsfile Signed-off-by: Anand Joseph <[email protected]> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <[email protected]> * Run language tests in stages Signed-off-by: Anand Joseph <[email protected]> * Update DE cache folder Signed-off-by: Anand Joseph <[email protected]> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <[email protected]> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <[email protected]> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <[email protected]> * fix 
telephone, ordinal Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * restarting ci Signed-off-by: ekmb <[email protected]> * update electronic Signed-off-by: ekmb <[email protected]> * review feedback, update whitelist Signed-off-by: ekmb <[email protected]> * rename capitalize func Signed-off-by: ekmb <[email protected]> * fix SH tests Signed-off-by: ekmb <[email protected]> * fix tests Signed-off-by: ekmb <[email protected]> * update jenkins folder name Signed-off-by: ekmb <[email protected]> * added cased arg to ITN Signed-off-by: ekmb <[email protected]> * add input_case arg to other lang Signed-off-by: ekmb <[email protected]> * jenkins dirs update Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * update test Signed-off-by: ekmb <[email protected]> * fix codeql errors Signed-off-by: ekmb <[email protected]> * fix sh Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> * update jenkins dir Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <[email protected]> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <[email protected]> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <[email 
protected]> * Add tests Signed-off-by: Anand Joseph <[email protected]> * Update cache folder for EN Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <[email protected]> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <[email protected]> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <[email protected]> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <[email protected]> * Update tests Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <[email protected]> * save Signed-off-by: Yang Zhang <[email protected]> * extend alignment for itn Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <[email protected]> * added test to pr doc Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci test Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix ci Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> --------- Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <[email protected]> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <[email protected]> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email 
protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * fix sv tests (#52) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * 0.1.7 release (#53) Signed-off-by: ekmb <[email protected]> Signed-off-by: Alex Cui <[email protected]> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <[email protected]> * add inflection for quantities Signed-off-by: Jim O'Regan <[email protected]> * add a test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <[email protected]> * change integer Signed-off-by: Jim O'Regan <[email protected]> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <[email protected]> * more test cases Signed-off-by: Jim O'Regan <[email protected]> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <[email protected]> * superscript to superessive Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <[email protected]> 
* add vowels Signed-off-by: Jim O'Regan <[email protected]> * fix var Signed-off-by: Jim O'Regan <[email protected]> * bare minimum electronic test Signed-off-by: Jim O'Regan <[email protected]> * add another test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <[email protected]> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <[email protected]> * add some alternative measure forms Signed-off-by: Jim O'Regan <[email protected]> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <[email protected]> * add very minimal time test Signed-off-by: Jim O'Regan <[email protected]> * will want cardinal here Signed-off-by: Jim O'Regan <[email protected]> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <[email protected]> * move two letters Signed-off-by: Jim O'Regan <[email protected]> * add my copyright Signed-off-by: Jim O'Regan <[email protected]> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <[email protected]> * small changes Signed-off-by: Jim O'Regan <[email protected]> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <[email protected]> * other ways of reading w Signed-off-by: Jim O'Regan <[email protected]> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <[email protected]> * 
currency Signed-off-by: Jim O'Regan <[email protected]> * more inflection Signed-off-by: Jim O'Regan <[email protected]> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <[email protected]> * working now, add a comment Signed-off-by: Jim O'Regan <[email protected]> * also integer, and preserve order Signed-off-by: Jim O'Regan <[email protected]> * also accept the full words Signed-off-by: Jim O'Regan <[email protected]> * deduplicate Signed-off-by: Jim O'Regan <[email protected]> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <[email protected]> * duplicate space Signed-off-by: Jim O'Regan <[email protected]> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <[email protected]> * actually saving the adaptations Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <[email protected]> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <[email protected]> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <[email 
protected]> * remove pynini checks from tests Signed-off-by: Jim O'Regan <[email protected]> * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <[email protected]> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <[email protected]> * fix cache dir Signed-off-by: Jim O'Regan <[email protected]> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add basic tests (native verified) Signed-off-by: Jim O'Regan <[email protected]> * add components for read digits Signed-off-by: Jim O'Regan <[email protected]> * add an example with a different separator Signed-off-by: Jim O'Regan <[email protected]> * start adapting Signed-off-by: Jim O'Regan <[email protected]> * add 2-digit area codes Signed-off-by: Jim O'Regan <[email protected]> * add another Signed-off-by: Jim O'Regan <[email protected]> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <[email protected]> * export var Signed-off-by: Jim O'Regan <[email protected]> * in progress Signed-off-by: Jim O'Regan <[email protected]> * country codes Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <[email protected]> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <[email protected]> * nominal digits Signed-off-by: Jim O'Regan <[email protected]> * add IP prompt Signed-off-by: Jim O'Regan <[email protected]> * add google copyright notice; probably meaningless 
Signed-off-by: Jim O'Regan <[email protected]> * more work on telephone Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * minor adaptation; more needed Signed-off-by: Jim O'Regan <[email protected]> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <[email protected]> * adapt more Signed-off-by: Jim O'Regan <[email protected]> * nearly there Signed-off-by: Jim O'Regan <[email protected]> * replace with version from sv Signed-off-by: Jim O'Regan <[email protected]> * extend tests Signed-off-by: Jim O'Regan <[email protected]> * some tweaks Signed-off-by: Jim O'Regan <[email protected]> * add an IP test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <[email protected]> * move variables Signed-off-by: Jim O'Regan <[email protected]> * filter ordinals Signed-off-by: Jim O'Regan <[email protected]> * basic fraction tests Signed-off-by: Jim O'Regan <[email protected]> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <[email protected]> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <[email protected]> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <[email protected]> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <[email protected]> * add another test, including spaces Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, not in reality Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <[email protected]> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <[email protected]> * add a test for that Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <[email protected]> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <[email protected]> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <[email protected]> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <[email protected]> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <[email protected]> * swapping order Signed-off-by: Jim O'Regan <[email protected]> * more swapping Signed-off-by: Jim O'Regan <[email protected]> * remove 
import Signed-off-by: Jim O'Regan <[email protected]> * add an example Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <[email protected]> * some things fixed Signed-off-by: Jim O'Regan <[email protected]> * more adjustments to time Signed-off-by: Jim O'Regan <[email protected]> * more todo, but working for this subset Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <[email protected]> * sort|uniq Signed-off-by: Jim O'Regan <[email protected]> * timezone can be inflected too Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <[email protected]> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <[email protected]> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <[email protected]> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <[email protected]> * fix the commented ITN part Signed-off-by: Jim O'Regan <[email protected]> * add hu Signed-off-by: Jim O'Regan <[email protected]> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <[email protected]> * fix measure cardinals Signed-off-by: Jim O'Regan <[email protected]> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <[email protected]> * missed removing preserver_order Signed-off-by: Jim O'Regan <[email protected]> * fix test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * codeql Signed-off-by: Jim O'Regan <[email protected]> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <[email protected]> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add öre (also for NOK) Signed-off-by: Jim O’Regan <[email protected]> * Comment line, for now Signed-off-by: Jim O’Regan <[email protected]> * try breaking this into pieces Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * add missing __init__.py Signed-off-by: Jim O'Regan <[email protected]> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim 
O'Regan <[email protected]> * fix a bug in cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <[email protected]> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <[email protected]> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <[email protected]> * add [be]os_or_space Signed-off-by: Jim O'Regan <[email protected]> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <[email protected]> * add extra spaced versions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <[email protected]> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. 
Signed-off-by: Jim O'Regan <[email protected]> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <[email protected]> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. Signed-off-by: Jim O'Regan <[email protected]> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <[email protected]> * see if this makes a difference Signed-off-by: Jim O'Regan <[email protected]> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <[email protected]> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <[email protected]> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <[email protected]> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <[email protected]> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <[email protected]> * try again Signed-off-by: Jim O'Regan <[email protected]> * move that thing, merge some lines Signed-off-by: Jim O'Regan <[email protected]> * at least it fails quickly Signed-off-by: Jim O'Regan <[email protected]> * export original Signed-off-by: Jim O'Regan <[email protected]> * move things around for no real reason Signed-off-by: Jim O'Regan <[email protected]> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <[email protected]> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <[email protected]> * try this again Signed-off-by: Jim O'Regan <[email protected]> * pretty sure this should work. 
As should the other Signed-off-by: Jim O'Regan <[email protected]> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <[email protected]> * ok, try here Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. Signed-off-by: Jim O'Regan <[email protected]> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * get rid of duplicate input print Signed-off-by: Jim O'Regan <[email protected]> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <[email protected]> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <[email protected]> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <[email protected]> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <[email protected]> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <[email protected]> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <[email protected]> * rearrange slightly Signed-off-by: Jim O'Regan <[email protected]> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add comment for 
testing multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports 
Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…
What does this PR do ?
Fixes the FRACTION grammar to match the 1.0 release accuracy.
The post_processing.py grammar in the ja ITN verbalizer removes all spaces between characters. Fractions, however, require a space between the integer component and the fraction component. This PR makes sure that all spaces are deleted except the one between the integer and the fraction component of a mixed fraction, e.g., <number + space + number/number>.
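The intended behavior can be illustrated with a small plain-Python sketch. This is not the pynini grammar from this PR; it is a regex approximation written only to show the expected input/output, and the function name `strip_spaces_keep_mixed_fraction` is hypothetical.

```python
import re

# First alternative: whitespace that sits between a digit and a following
# "number/number" (the integer and fraction parts of a mixed fraction).
# Second alternative: any other run of whitespace.
_SPACES = re.compile(r"(?<=\d)(\s+)(?=\d+/\d+)|\s+")


def strip_spaces_keep_mixed_fraction(text: str) -> str:
    """Delete all whitespace except the gap inside a mixed fraction.

    Illustration only: the real post-processing is an FST, not a regex.
    """
    # group(1) is set only when the first alternative matched, so that
    # whitespace is collapsed to one space; everything else is removed.
    return _SPACES.sub(lambda m: " " if m.group(1) else "", text)


if __name__ == "__main__":
    print(strip_spaces_keep_mixed_fraction("これ は 1 2/3 です"))
```

Under these assumptions, spaces between ordinary tokens are dropped while the single space in `1 2/3` survives; a bare fraction such as `2/3` or a digit sequence such as `12 3` contains no protected space.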
Before your PR is "Ready for review"
Pre checks:
* Have you signed your commits? Use git commit -s to sign.
* Do all unit tests pass? Run pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
* Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
* Have you added test cases for both pytest and Sparrowhawk here.
* Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
* Have you added the license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
* Where applicable, the second header line should be Copyright 2015 and onwards Google, Inc. See an example here.
* Remove import guards (try import: ... except: ...) if not already done.
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.