Jp itn update 240805 #208

Merged
merged 287 commits into from
Oct 1, 2024

Conversation

BuyuanCui
Collaborator

@BuyuanCui BuyuanCui commented Aug 15, 2024

What does this PR do ?

Fixes the FRACTION grammar to match the 1.0 release accuracy.
The post_processing.py grammar in the ja ITN verbalizer removes all spaces between characters. However, a fraction requires a space between its integer component and its fractional component. This PR makes sure that all spaces are deleted except the one between the integer and the fraction, e.g., <number + space + number/number>.
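
The intended behavior can be illustrated with a small standalone sketch. This is not the pynini grammar from this PR: it uses plain regular expressions, and the helper name `remove_spaces_keep_fraction` and the placeholder character are assumptions made up for illustration only. It also assumes ASCII spaces and that the input never contains the placeholder.

```python
import re

# Hypothetical placeholder used to protect the one space we want to keep.
# Assumption: this character never occurs in the input text.
_PLACEHOLDER = "\x00"

# <number + space(s) + number/number>, e.g. "1 1/2"
_MIXED_FRACTION = re.compile(r"(\d+) +(\d+/\d+)")


def remove_spaces_keep_fraction(text: str) -> str:
    """Delete every space except the one between an integer and a fraction."""
    # 1. Protect the space inside each mixed fraction.
    protected = _MIXED_FRACTION.sub(
        lambda m: m.group(1) + _PLACEHOLDER + m.group(2), text
    )
    # 2. Delete all remaining spaces.
    stripped = protected.replace(" ", "")
    # 3. Restore the protected space.
    return stripped.replace(_PLACEHOLDER, " ")


if __name__ == "__main__":
    print(remove_spaces_keep_fraction("これ は 1 1/2 です"))
```

The protect, delete, restore order mirrors the requirement described above: the space removal stays unconditional, and only the mixed-fraction pattern is exempted from it.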

  • [*] for yes, [-] for not relevant in the items below.

Before your PR is "Ready for review"

Pre checks:

  • [*] Have you signed your commits? Use git commit -s to sign.
  • [*] Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • [-] If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk here.
  • [*] Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • [-] Have you followed codeQL results and removed unused variables and imports (report is at the bottom of the PR in github review box) ?
  • [*] Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • [-] If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • [-] Remove import guards (try import: ... except: ...) if not already done.
  • [-] If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • [-] Have you added your language support to tools/text_processing_deployment/pynini_export.py.

PR Type:

  • New Feature
  • [*] Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

@@ -16,11 +16,13 @@
from parameterized import parameterized

from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
from nemo_text_processing.text_normalization.normalize import Normalizer

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'Normalizer' is not used.
@@ -16,6 +16,7 @@
from parameterized import parameterized

from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
from nemo_text_processing.text_normalization.normalize import Normalizer

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'Normalizer' is not used.
BuyuanCui and others added 16 commits August 20, 2024 09:58
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
resolving conflicts
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
BuyuanCui and others added 14 commits September 25, 2024 14:07
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Buyuan(Alex) Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
@mgrafu mgrafu merged commit 9d89fd8 into main Oct 1, 2024
5 checks passed
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 24, 2024
* temporal fixings attempt to fixn SH test errors, will fix back

Signed-off-by: Alex Cui <[email protected]>

* temporal changes will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updats on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* with pynini closure had errors chaing back to no closure version

Signed-off-by: Alex Cui <[email protected]>

* jenkinspdate

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporal fixings attempt to fixn SH test errors, will fix back

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <[email protected]>

* with pynini closure had errors chaing back to no closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unsed grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unsed grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unsed improts

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jekins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issuel chaing narrow space to regular normal space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixng style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unsed imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unsed import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimeter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a tesst case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhark tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to resolve SH test errors, will revert later

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fixes attempting to resolve SH test errors, will revert later

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue, added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure version had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <[email protected]>

* pynini closure version had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to resolve SH test errors, will revert later

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fixes attempting to resolve SH test errors, will revert later

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue; changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to resolve SH test errors, will revert later

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fixes attempting to resolve SH test errors, will revert later

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserver_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <[email protected]>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue; changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <[email protected]>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempting to resolve SH test errors; will revert

Signed-off-by: Alex Cui <[email protected]>

* temporary changes; will revert

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fix attempting to resolve SH test errors; will revert

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <[email protected]>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempting to fix SH test errors, will revert

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fix attempting to fix SH test errors, will revert

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <[email protected]>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <[email protected]>

* temporary changes, will change back

Signed-off-by: Alex Cui <[email protected]>

* update jp tn date

Signed-off-by: Alex Cui <[email protected]>

* resolving conflict

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

Signed-off-by: Alex Cui <[email protected]>

* updates on Jenkins

Signed-off-by: Alex Cui <[email protected]>

* pynini closure version had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update

Signed-off-by: Alex Cui <[email protected]>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <[email protected]>

* adding one more test item

Signed-off-by: Alex Cui <[email protected]>

* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <[email protected]>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <[email protected]>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <[email protected]>

* pynini closure version had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <[email protected]>

* resolving fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <[email protected]>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <[email protected]>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* removing unused grammar

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* changed regular space to narrow space

Signed-off-by: Alex Cui <[email protected]>

* imports error fixing

Signed-off-by: Alex Cui <[email protected]>

* imports errors

Signed-off-by: Alex Cui <[email protected]>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* reverting

Signed-off-by: Alex Cui <[email protected]>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* fixing style

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* removing unused imports

Signed-off-by: Alex Cui <[email protected]>

* jp tn date update

Signed-off-by: Alex Cui <[email protected]>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* removing previously created nemo imports

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* test order arrangement

Signed-off-by: Alex Cui <[email protected]>

* resolve fraction space issue

Signed-off-by: Alex Cui <[email protected]>

* style fix

Signed-off-by: Alex Cui <[email protected]>

* fix style

Signed-off-by: Alex Cui <[email protected]>

* space issue

Signed-off-by: Alex Cui <[email protected]>

* update jp tn

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <[email protected]>

* empty file

Signed-off-by: Alex Cui <[email protected]>

* to delete

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* add

Signed-off-by: Yang Zhang <[email protected]>

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* add jenkins file (#23)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix test case

Signed-off-by: Jim O'Regan <[email protected]>

* add // to symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix language

Signed-off-by: Jim O'Regan <[email protected]>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <[email protected]>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix plurals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add usd$

Signed-off-by: Jim O'Regan <[email protected]>

* insert "komma"

Signed-off-by: Jim O'Regan <[email protected]>

* "pund" is neuter

Signed-off-by: Jim O'Regan <[email protected]>

* fix test cases

Signed-off-by: Jim O'Regan <[email protected]>

* towards proper graphs

Signed-off-by: Jim O'Regan <[email protected]>

* GBP

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* make komma non-det

Signed-off-by: Jim O'Regan <[email protected]>

* more money tagger fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <[email protected]>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <[email protected]>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <[email protected]>

* add minimal tests

Signed-off-by: Jim O'Regan <[email protected]>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* use eras

Signed-off-by: Jim O'Regan <[email protected]>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* fix examples in comment

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <[email protected]>

* fix separator

Signed-off-by: Jim O'Regan <[email protected]>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <[email protected]>

* load labels

Signed-off-by: Jim O'Regan <[email protected]>

* right first time

Signed-off-by: Jim O'Regan <[email protected]>

* missing space

Signed-off-by: Jim O'Regan <[email protected]>

* fix year in test cases

Signed-off-by: Jim O'Regan <[email protected]>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <[email protected]>

* add a (failing) test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <[email protected]>

* also handle decades

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <[email protected]>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <[email protected]>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <[email protected]>

* missed wrapping

Signed-off-by: Jim O'Regan <[email protected]>

* no difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <[email protected]>

* telephone tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <[email protected]>

* try adding more brackets

Signed-off-by: Jim O'Regan <[email protected]>

* fix another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <[email protected]>

* move abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add in abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <[email protected]>

* single digit

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <[email protected]>

* ok, this seems to work

Signed-off-by: Jim O'Regan <[email protected]>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <[email protected]>

* decimal tagger works

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* lower case

Signed-off-by: Jim O'Regan <[email protected]>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <[email protected]>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <[email protected]>

* add prompt

Signed-off-by: Jim O'Regan <[email protected]>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <[email protected]>

* greek letters

Signed-off-by: Jim O'Regan <[email protected]>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <[email protected]>

* more work on time

Signed-off-by: Jim O'Regan <[email protected]>

* |=, not =

Signed-off-by: Jim O'Regan <[email protected]>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <[email protected]>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables to check

Signed-off-by: Jim O'Regan <[email protected]>

* small fix

Signed-off-by: Jim O'Regan <[email protected]>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <[email protected]>

* try doing this here

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <[email protected]>

* fix errors in tests

Signed-off-by: Jim O'Regan <[email protected]>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <[email protected]>

* merge different tsvs

Signed-off-by: Jim O'Regan <[email protected]>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <[email protected]>

* export some variables for testing

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <[email protected]>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <[email protected]>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <[email protected]>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* include greek letters in maths

Signed-off-by: Jim O'Regan <[email protected]>

* include greek here too

Signed-off-by: Jim O'Regan <[email protected]>

* minor sg/pl

Signed-off-by: Jim O'Regan <[email protected]>

* dedup

Signed-off-by: Jim O'Regan <[email protected]>

* fix a test case

Signed-off-by: Jim O'Regan <[email protected]>

* put these under if, too

Signed-off-by: Jim O'Regan <[email protected]>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <[email protected]>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <[email protected]>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* fix some test cases

Signed-off-by: Jim O'Regan <[email protected]>

* here is one error

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <[email protected]>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <[email protected]>

* export a variable

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <[email protected]>

* fix case

Signed-off-by: Jim O'Regan <[email protected]>

* add yen

Signed-off-by: Jim O'Regan <[email protected]>

* final fixes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove English roman tagger

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* remove some unused pieces

Signed-off-by: Jim O'Regan <[email protected]>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <[email protected]>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <[email protected]>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <[email protected]>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <[email protected]>

* add sv

Signed-off-by: Jim O'Regan <[email protected]>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <[email protected]>

* fix year

Signed-off-by: Jim O'Regan <[email protected]>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <[email protected]>

* address codeql comments

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <[email protected]>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <[email protected]>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <[email protected]>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <[email protected]>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <[email protected]>

* remove broken duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <[email protected]>

* time tests now pass

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <[email protected]>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <[email protected]>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <[email protected]>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <[email protected]>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <[email protected]>

* add swedish

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* fix here also

Signed-off-by: Jim O'Regan <[email protected]>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* add a date case

Signed-off-by: Jim O'Regan <[email protected]>

* remove duplication

Signed-off-by: Jim O'Regan <[email protected]>

* boost n_tagged

Signed-off-by: Jim O'Regan <[email protected]>

* also copyright this year

Signed-off-by: Jim O'Regan <[email protected]>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <[email protected]>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <[email protected]>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <[email protected]>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <[email protected]>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* days of the week

Signed-off-by: Jim O'Regan <[email protected]>

* add more abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* remove blank line

Signed-off-by: Jim O'Regan <[email protected]>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <[email protected]>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <[email protected]>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci _cr

Signed-off-by: ekmb <[email protected]>

* revert setup tool

Signed-off-by: ekmb <[email protected]>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* fix jenkins

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

* update test dir

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip el words

Signed-off-by: ekmb <[email protected]>

* wip

Signed-off-by: ekmb <[email protected]>

* electronic pass

Signed-off-by: ekmb <[email protected]>

* test pass

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* remove unused imports

Signed-off-by: ekmb <[email protected]>

* add deterministic option normalized options

Signed-off-by: ekmb <[email protected]>

* update jenkins grammar folder

Signed-off-by: ekmb <[email protected]>

* clean up, update for SH

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* clean up

Signed-off-by: ekmb <[email protected]>

* reduce cardinal graph

Signed-off-by: ekmb <[email protected]>

* jenkins dir

Signed-off-by: ekmb <[email protected]>

* add weight for sh

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <[email protected]>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <[email protected]>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <[email protected]>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <[email protected]>

* Fix stage

Signed-off-by: Anand Joseph <[email protected]>

* Change cache folder

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <[email protected]>

* add whitelist to export

Signed-off-by: ekmb <[email protected]>

* update docstrings

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <[email protected]>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <[email protected]>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <[email protected]>

* Fix for measures

Signed-off-by: Anand Joseph <[email protected]>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <[email protected]>

---------

Signed-off-by: Larisa Kempbell <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <[email protected]>

* added pynini install note

Signed-off-by: Yang Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <[email protected]>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <[email protected]>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <[email protected]>

* Run language tests in stages

Signed-off-by: Anand Joseph <[email protected]>

* Update DE cache folder

Signed-off-by: Anand Joseph <[email protected]>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <[email protected]>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <[email protected]>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <[email protected]>

* fix telephone, ordinal

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* restarting ci

Signed-off-by: ekmb <[email protected]>

* update electronic

Signed-off-by: ekmb <[email protected]>

* review feedback, update whitelist

Signed-off-by: ekmb <[email protected]>

* rename capitalize func

Signed-off-by: ekmb <[email protected]>

* fix SH tests

Signed-off-by: ekmb <[email protected]>

* fix tests

Signed-off-by: ekmb <[email protected]>

* update jenkins folder name

Signed-off-by: ekmb <[email protected]>

* added cased arg to ITN

Signed-off-by: ekmb <[email protected]>

* add input_case arg to other lang

Signed-off-by: ekmb <[email protected]>

* jenkins dirs update

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* update test

Signed-off-by: ekmb <[email protected]>

* fix codeql errors

Signed-off-by: ekmb <[email protected]>

* fix sh

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

* update jenkins dir

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <[email protected]>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <[email protected]>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <[email protected]>

* Add tests

Signed-off-by: Anand Joseph <[email protected]>

* Update cache folder for EN

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <[email protected]>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <[email protected]>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <[email protected]>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <[email protected]>

* Update tests

Signed-off-by: Anand Joseph <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <[email protected]>

* save

Signed-off-by: Yang Zhang <[email protected]>

* extend alignment for itn

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <[email protected]>

* added test to pr doc

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci test

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix ci

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

---------

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <[email protected]>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <[email protected]>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* fix sv tests (#52)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* 0.1.7 release (#53)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <[email protected]>

* Update Jenkinsfile

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for quantities

Signed-off-by: Jim O'Regan <[email protected]>

* add a test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <[email protected]>

* change integer

Signed-off-by: Jim O'Regan <[email protected]>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <[email protected]>

* more test cases

Signed-off-by: Jim O'Regan <[email protected]>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <[email protected]>

* superscript to superessive

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* add vowels

Signed-off-by: Jim O'Regan <[email protected]>

* fix var

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <[email protected]>

* add another test case

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <[email protected]>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <[email protected]>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <[email protected]>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <[email protected]>

* add very minimal time test

Signed-off-by: Jim O'Regan <[email protected]>

* will want cardinal here

Signed-off-by: Jim O'Regan <[email protected]>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <[email protected]>

* move two letters

Signed-off-by: Jim O'Regan <[email protected]>

* add my copyright

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* small changes

Signed-off-by: Jim O'Regan <[email protected]>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <[email protected]>

* other ways of reading w

Signed-off-by: Jim O'Regan <[email protected]>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <[email protected]>

* currency

Signed-off-by: Jim O'Regan <[email protected]>

* more inflection

Signed-off-by: Jim O'Regan <[email protected]>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* working now, add a comment

Signed-off-by: Jim O'Regan <[email protected]>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* also accept the full words

Signed-off-by: Jim O'Regan <[email protected]>

* deduplicate

Signed-off-by: Jim O'Regan <[email protected]>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <[email protected]>

* adapt comments

Signed-off-by: Jim O'Regan <[email protected]>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <[email protected]>

* duplicate space

Signed-off-by: Jim O'Regan <[email protected]>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <[email protected]>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <[email protected]>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <[email protected]>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix typo

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <[email protected]>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <[email protected]>

* fix cache dir

Signed-off-by: Jim O'Regan <[email protected]>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <[email protected]>

* add components for read digits

Signed-off-by: Jim O'Regan <[email protected]>

* add an example with a different separator

Signed-off-by: Jim O'Regan <[email protected]>

* start adapting

Signed-off-by: Jim O'Regan <[email protected]>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <[email protected]>

* add another

Signed-off-by: Jim O'Regan <[email protected]>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <[email protected]>

* export var

Signed-off-by: Jim O'Regan <[email protected]>

* in progress

Signed-off-by: Jim O'Regan <[email protected]>

* country codes

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <[email protected]>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <[email protected]>

* nominal digits

Signed-off-by: Jim O'Regan <[email protected]>

* add IP prompt

Signed-off-by: Jim O'Regan <[email protected]>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <[email protected]>

* more work on telephone

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <[email protected]>

* fix path

Signed-off-by: Jim O'Regan <[email protected]>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <[email protected]>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* adapt more

Signed-off-by: Jim O'Regan <[email protected]>

* nearly there

Signed-off-by: Jim O'Regan <[email protected]>

* replace with version from sv

Signed-off-by: Jim O'Regan <[email protected]>

* extend tests

Signed-off-by: Jim O'Regan <[email protected]>

* some tweaks

Signed-off-by: Jim O'Regan <[email protected]>

* add an IP test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <[email protected]>

* move variables

Signed-off-by: Jim O'Regan <[email protected]>

* filter ordinals

Signed-off-by: Jim O'Regan <[email protected]>

* basic fraction tests

Signed-off-by: Jim O'Regan <[email protected]>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <[email protected]>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <[email protected]>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <[email protected]>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <[email protected]>

* add another test, including spaces

Signed-off-by: Jim O'Regan <[email protected]>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <[email protected]>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <[email protected]>

* add a test for that

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <[email protected]>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <[email protected]>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <[email protected]>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <[email protected]>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <[email protected]>

* swapping order

Signed-off-by: Jim O'Regan <[email protected]>

* more swapping

Signed-off-by: Jim O'Regan <[email protected]>

* remove import

Signed-off-by: Jim O'Regan <[email protected]>

* add an example

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <[email protected]>

* some things fixed

Signed-off-by: Jim O'Regan <[email protected]>

* more adjustments to time

Signed-off-by: Jim O'Regan <[email protected]>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <[email protected]>

* sort|uniq

Signed-off-by: Jim O'Regan <[email protected]>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <[email protected]>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <[email protected]>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <[email protected]>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <[email protected]>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <[email protected]>

* add hu

Signed-off-by: Jim O'Regan <[email protected]>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <[email protected]>

* fix measure cardinals

Signed-off-by: Jim O'Regan <[email protected]>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <[email protected]>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <[email protected]>

* fix test

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* codeql

Signed-off-by: Jim O'Regan <[email protected]>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <[email protected]>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <[email protected]>

* Comment line, for now

Signed-off-by: Jim O’Regan <[email protected]>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* add missing __init__.py

Signed-off-by: Jim O'Regan <[email protected]>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <[email protected]>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <[email protected]>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <[email protected]>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <[email protected]>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <[email protected]>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <[email protected]>

* add extra spaced versions

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <[email protected]>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <[email protected]>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <[email protected]>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <[email protected]>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <[email protected]>

* see if this makes a difference

Signed-off-by: Jim O'Regan <[email protected]>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <[email protected]>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <[email protected]>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <[email protected]>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <[email protected]>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <[email protected]>

* try again

Signed-off-by: Jim O'Regan <[email protected]>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <[email protected]>

* at least it fails quickly

Signed-off-by: Jim O'Regan <[email protected]>

* export original

Signed-off-by: Jim O'Regan <[email protected]>

* move things around for no real reason

Signed-off-by: Jim O'Regan <[email protected]>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <[email protected]>

* try this again

Signed-off-by: Jim O'Regan <[email protected]>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <[email protected]>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <[email protected]>

* ok, try here

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <[email protected]>

* remove unused imports

Signed-off-by: Jim O'Regan <[email protected]>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <[email protected]>

* change the variable names

Signed-off-by: Jim O'Regan <[email protected]>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <[email protected]>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <[email protected]>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <[email protected]>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <[email protected]>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <[email protected]>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <[email protected]>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <[email protected]>

* rearrange slightly

Signed-off-by: Jim O'Regan <[email protected]>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <[email protected]>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <[email protected]>

* whitespace fixes

Signed-off-by: Jim O'Regan <[email protected]>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <[email protected]>

---------

Signed-off-by: Jim O'Regan <[email protected]>
Signed-off-by: Jim O’Regan <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* Remove unused imports

Signed-off-by: anand-nv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <[email protected]>

---------

Signed-off-by: ealbasiri <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Alex Cui <[email protected]>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…