Jp itn update 240805 #208

Merged
merged 287 commits into from
Oct 1, 2024
Commits
b6301ea
temporal fixings attempt to fixn SH test errors, will fix back
BuyuanCui Aug 16, 2024
3ee98f4
temporal changes will change back
BuyuanCui Aug 16, 2024
73ad43b
update jp tn date
BuyuanCui Aug 19, 2024
b0cc5e5
resolving conflict
BuyuanCui Aug 20, 2024
334dfa8
adding grammars back in the tokenizer
BuyuanCui Aug 19, 2024
fb912bb
fixing ci test cases
BuyuanCui Aug 20, 2024
6f158db
updats on Jenkins
BuyuanCui Aug 20, 2024
c86c2f4
with pynini closure had errors chaing back to no closure version
BuyuanCui Aug 20, 2024
223a724
jenkinspdate
BuyuanCui Aug 20, 2024
adac539
changing the data format, to align to the blind test data
BuyuanCui Aug 15, 2024
d508f02
adding one more test item
BuyuanCui Aug 16, 2024
90fc3cb
temporal fixings attempt to fixn SH test errors, will fix back
BuyuanCui Aug 16, 2024
c7be548
adding grammars back in the tokenizer
BuyuanCui Aug 19, 2024
c633665
fixing ci test cases
BuyuanCui Aug 20, 2024
1153511
with pynini closure had errors chaing back to no closure version
BuyuanCui Aug 20, 2024
e1d3d49
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 20, 2024
f7d3dc5
Merge branch 'main' into jp_itn_update_240805
BuyuanCui Aug 20, 2024
4395a5b
resolving fraction space issue
BuyuanCui Aug 24, 2024
c851223
resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE
BuyuanCui Aug 24, 2024
c5f3a61
resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and …
BuyuanCui Aug 24, 2024
4e3ba8a
fixed typo on decimaltext
BuyuanCui Aug 24, 2024
0fb1b0f
Merge branch 'jp_itn_update_240805' of https://github.com/NVIDIA/NeMo…
BuyuanCui Aug 24, 2024
3d9bc4a
removing unsed grammar
BuyuanCui Aug 24, 2024
8252a38
removing unsed grammar
BuyuanCui Aug 24, 2024
16976bf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 24, 2024
2855f75
removing unsed improts
BuyuanCui Aug 24, 2024
ed3686f
removing unused import
BuyuanCui Aug 24, 2024
3033ef7
changed regular space to narrow space
BuyuanCui Aug 24, 2024
961add2
Merge branch 'jp_itn_update_240805' of https://github.com/NVIDIA/NeMo…
BuyuanCui Aug 24, 2024
d1917cd
imports error fixing
BuyuanCui Aug 24, 2024
1d84fd4
imports errors
BuyuanCui Aug 24, 2024
7a29d75
Jekins update for jp itn
BuyuanCui Aug 24, 2024
85e1604
update for fraction space issue
BuyuanCui Sep 4, 2024
4832f56
update for fraction space issue
BuyuanCui Sep 4, 2024
de6438f
update for fraction space issue
BuyuanCui Sep 4, 2024
44cc40e
reverting
BuyuanCui Sep 4, 2024
f150122
update for fraction space issuel chaing narrow space to regular norma…
BuyuanCui Sep 4, 2024
76c9124
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 4, 2024
c34e140
fixing style
BuyuanCui Sep 4, 2024
76191f1
fixng style
BuyuanCui Sep 4, 2024
423e595
style fix
BuyuanCui Sep 4, 2024
4037ddf
style fix
BuyuanCui Sep 4, 2024
da58040
style fix
BuyuanCui Sep 4, 2024
a0b7d2e
removing unsed imports
BuyuanCui Sep 4, 2024
aa1396c
Merge branch 'jp_itn_update_240805' of https://github.com/NVIDIA/NeMo…
BuyuanCui Sep 4, 2024
2c4b7ef
jp tn date update
BuyuanCui Sep 4, 2024
1e18ad9
Update test_cases_fraction.txt
BuyuanCui Sep 4, 2024
d28245a
removing previously created nemo imports
BuyuanCui Sep 5, 2024
c0e9943
space issue
BuyuanCui Sep 5, 2024
e2f7443
test order arrangement
BuyuanCui Sep 5, 2024
abc2919
resolve fraction space issue
BuyuanCui Sep 5, 2024
dd47c5c
style fix
BuyuanCui Sep 5, 2024
265d562
fix style
BuyuanCui Sep 5, 2024
02f23cd
Merge branch 'jp_itn_update_240805' of https://github.com/NVIDIA/NeMo…
BuyuanCui Sep 5, 2024
0fb4c6c
space issue
BuyuanCui Sep 5, 2024
8a67a02
update jp tn
BuyuanCui Sep 5, 2024
15d1caa
removing unsed import
BuyuanCui Sep 5, 2024
6648f7e
Update post_processing.py
BuyuanCui Sep 12, 2024
528d734
empty file
BuyuanCui Sep 24, 2024
c6e0944
to delete
BuyuanCui Sep 24, 2024
1ceef2b
removing
BuyuanCui Sep 24, 2024
59f4619
resolving merge conflict
BuyuanCui Sep 24, 2024
915655c
add contributing (#21)
yzhang123 Jan 26, 2023
9e919ac
add jenkins file (#23)
ekmb Jan 30, 2023
3408bc2
Swedish TN (#12)
jimregan Jan 30, 2023
0c83664
CI setup (#25)
ekmb Jan 30, 2023
0bbbc3f
Merge EN riva release 22.10 (#26)
anand-nv Jan 30, 2023
224fd68
Eng TN - update urls to handle dictionary words (#27)
ekmb Feb 1, 2023
90cb0b4
Tn en astronomical no (#28)
anand-nv Feb 1, 2023
6724949
Add whitelist param to ITN (#30)
ekmb Feb 3, 2023
69956f3
Eng tn itn (#31)
anand-nv Feb 3, 2023
69ec623
Fix parse "None" as string (#33)
anand-nv Feb 6, 2023
26623b0
read double digits for telephone grammar (#32)
LarisaKe Feb 6, 2023
6f99186
Install (#35)
yzhang123 Feb 7, 2023
5253d28
Install (#36)
yzhang123 Feb 7, 2023
fd313be
0.1.6rc0 (#37)
ekmb Feb 8, 2023
aee5c04
Add ci (#39)
anand-nv Feb 10, 2023
83dc17d
support the use of phonetic superscript letters for ordinals, because…
jimregan Feb 14, 2023
9fce09b
update fr cache path for ci (#44)
mgrafu Feb 16, 2023
e29f877
update ITN to work after Punctuation capitalization model (#22)
ekmb Feb 16, 2023
5126bc3
En names (#42)
anand-nv Feb 19, 2023
ff51000
update doc and fix alignment for itn (#47)
yzhang123 Feb 27, 2023
beda4e0
Align ci test (#51)
yzhang123 Mar 6, 2023
493a123
Audio-based TN for Swedish (#49)
jimregan Mar 8, 2023
c184662
fix sv tests (#52)
ekmb Mar 9, 2023
3346861
0.1.7 release (#53)
ekmb Mar 9, 2023
943dbd8
En names (#56)
anand-nv Mar 27, 2023
deb4fec
fix bug for hh:mm:ss normalization (#57)
mgrafu Mar 29, 2023
5ce61c0
rewrite regex to silence deprecation warning (#55)
jimregan Apr 5, 2023
ed40e9d
Hungarian TN ✅ (#9)
jimregan Apr 5, 2023
95ede85
Es bugfix (#59)
mgrafu Apr 7, 2023
3f5181e
Store input_case in Normalizer (#65)
rlangman May 8, 2023
21ea4dc
Swedish telephone fix (#60)
jimregan May 11, 2023
c53b37e
log instead of print in graph_utils.py (#68)
eginhard May 17, 2023
e297bfb
CER estimation speedup for audio-based text normalization (#73)
vsl9 May 27, 2023
495837f
add measure coverage for TN and ITN (#62)
ealbasiri Jun 6, 2023
99a3328
upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)
mgrafu Jun 6, 2023
cc96211
add country codes from hu (#77)
jimregan Jun 8, 2023
0141a5a
fix electronic case for username (#75)
ekmb Jun 8, 2023
11bd0af
0.1.8 release (#79)
ekmb Jun 13, 2023
880629b
Codeswitched ES/EN ITN (#78)
anand-nv Jun 14, 2023
e8563aa
electronic verbalizer fallback (#81)
ekmb Jun 20, 2023
7c6eaab
minor normalize.py edit for usability (#84)
lleaver Jun 28, 2023
a017a84
Swedish ITN (#40)
jimregan Jun 29, 2023
7de1be4
Italian_TN (#67)
GiacomoLeoneMaria Jun 29, 2023
31a1a79
Zh itn (#74)
BuyuanCui Jun 30, 2023
8a48769
updated pynini_export.py file to create far files (#88)
BuyuanCui Jul 6, 2023
db4dbbf
readd Swedish (#87)
jimregan Jul 17, 2023
8f6fdcf
Zh tn 0712 (#89)
BuyuanCui Aug 7, 2023
42a906f
Zh tn char (#95)
BuyuanCui Aug 8, 2023
b8dee69
audio-based TN fix for empty pred_text/text (#92)
ekmb Aug 15, 2023
f1527d4
pip 1.2.0
ekmb Aug 15, 2023
e877542
French tn (#91)
mgrafu Aug 25, 2023
7e62762
Add whitelist_tech.tsv (#96)
anand-nv Aug 29, 2023
84732db
Zhitn 0727 (#93)
BuyuanCui Sep 4, 2023
59c77c1
Es tn romans fix (#98)
mgrafu Sep 6, 2023
8636fee
Change docker image (#102)
anand-nv Sep 7, 2023
43d8c5b
Print warning instead exception (#97)
karpnv Sep 27, 2023
0534c62
warning regardless of verbose flag (#107)
karpnv Oct 3, 2023
f31155c
Unpin setuptools (#106)
pplantinga Oct 4, 2023
66ff8f4
fixed warnings: File is not always closes. (#113)
XuesongYang Oct 10, 2023
73683d3
fix bug #111 (ar currencies) (#117)
mgrafu Oct 23, 2023
a829f0e
Logging clean up + IT TN fix (#118)
ekmb Oct 24, 2023
16cc2b7
Time_IT_TN (#105)
GiacomoLeoneMaria Oct 25, 2023
79823ae
IT TN improvement on tests (#120)
mgrafu Oct 26, 2023
cac6228
add single letter exception for roman numerals (#121)
mgrafu Oct 27, 2023
53cadca
fix broken path for nondet whitelist (#124)
mgrafu Nov 3, 2023
9ea397b
Increase weights for serial (en TN) (#128)
anand-nv Nov 21, 2023
6809c16
add measures file for FR TN (#131)
mgrafu Dec 8, 2023
39a57ba
Sh jenkins (#127)
anand-nv Jan 19, 2024
a8fccca
update isort - fix precommit (#138)
ekmb Feb 14, 2024
1afdf69
Armenian itn (#136)
davidks13 Feb 15, 2024
af645c8
Fix CI (#142)
ekmb Feb 29, 2024
8850ece
Armenian TN (#137)
davidks13 Mar 13, 2024
8594846
Marathi ITN (#134)
ChinmayPatil11 Mar 13, 2024
1f921e3
jenkins fix (#150)
tbartley94 Mar 13, 2024
c414db4
r0.3.0 release (#151)
ekmb Mar 13, 2024
7803763
Fix text=line[text] to text=line[text_field] (#153)
ssh-meister Mar 19, 2024
b137a2e
use real string on docstring (#157)
kevsan4 Mar 30, 2024
b2b81a3
Sh postprocess (#147)
anand-nv Apr 16, 2024
9e9227a
update run_evaluate script for cased itn (#164)
mgrafu Apr 25, 2024
1113d96
remove unused function from ar tn decimals (#165)
mgrafu Apr 25, 2024
aa002e4
ZH sentence-level TN (#112)
BuyuanCui Apr 30, 2024
a4b353c
preparing release, updating change log (#168)
tbartley94 May 3, 2024
4ac39e5
hotfix (#169)
ekmb May 3, 2024
6731b3a
hotfix (#170)
tbartley94 May 3, 2024
c1c792c
DE TN Fixes (#177)
zoobereq Jun 6, 2024
134456e
Tts en tech terms (#167)
mgrafu Jun 7, 2024
bb276da
Normalizes the '%' sign (#180)
zoobereq Jun 7, 2024
8cbb6a8
FR TN Fixes (#181)
zoobereq Jun 7, 2024
b62b8d8
EN TN fixes for Issue #166 (#185)
zoobereq Jul 17, 2024
40586aa
IT TN Fixes for #166 (#183)
zoobereq Jul 17, 2024
1394377
HU TN Fixes issue #166 (#184)
zoobereq Jul 18, 2024
8417eb0
Jp itn 20240221 (#141)
BuyuanCui Jul 19, 2024
254e5b6
update en tn folder to see if CI tests run - DO NOT MERGE (#199)
anand-nv Jul 24, 2024
8edf324
Reverts EN TN fixes for Issue #166 (#202)
zoobereq Aug 13, 2024
60addc6
es and es_en changes for unified models (#143)
mgrafu Aug 14, 2024
21cbeb0
ES TN Fixes for Issue #166 (#206)
zoobereq Aug 15, 2024
59b8845
Zh tn bug 240712 (#187)
BuyuanCui Aug 16, 2024
f02d3f9
EN TN Fixes for Issue 166 (#207)
zoobereq Aug 19, 2024
6c20fc3
Fix for nv-bug 4786175 (#213)
zoobereq Aug 21, 2024
5faeccc
Release commit r1.1.0 (#217)
tbartley94 Aug 21, 2024
261aab7
EN TN Fixes for nv-bug 4786225 (#218)
zoobereq Aug 22, 2024
b4d3edd
Applies fixes for nv-bug 4786263 (#220)
zoobereq Aug 22, 2024
4643185
Fix invalid escape sequences (#219)
TheKevJames Aug 23, 2024
218b529
IT TN Fixes for Issue #166 (#221)
zoobereq Aug 26, 2024
d5b188f
ES TN Fix for Issue #166 (#224)
zoobereq Sep 3, 2024
46f7611
Expands per/unit mappings and updates the cache (#227)
zoobereq Sep 11, 2024
f6f03d4
Cardinals up to a hundred trillions, timeFST and transliteration (#209)
kurt0cougar Sep 17, 2024
a6b03f0
Marathi ITN (#134)
ChinmayPatil11 Mar 13, 2024
9fa751c
ZH sentence-level TN (#112)
BuyuanCui Sep 25, 2024
c7106c5
ZH sentence-level TN (#112)
BuyuanCui Apr 30, 2024
edd3b13
Tts en tech terms (#167)
mgrafu Jun 7, 2024
d5c8295
Normalizes the '%' sign (#180)
zoobereq Jun 7, 2024
94019d4
FR TN Fixes (#181)
zoobereq Jun 7, 2024
938d4a5
EN TN fixes for Issue #166 (#185)
zoobereq Jul 17, 2024
ad5807c
IT TN Fixes for #166 (#183)
zoobereq Jul 17, 2024
11e438a
HU TN Fixes issue #166 (#184)
zoobereq Jul 18, 2024
3411f7f
Jp itn 20240221 (#141)
BuyuanCui Jul 19, 2024
963d7a5
update en tn folder to see if CI tests run - DO NOT MERGE (#199)
anand-nv Jul 24, 2024
5b084a4
Reverts EN TN fixes for Issue #166 (#202)
zoobereq Aug 13, 2024
9a8acb8
es and es_en changes for unified models (#143)
mgrafu Aug 14, 2024
9fe1b44
ES TN Fixes for Issue #166 (#206)
zoobereq Aug 15, 2024
6345d33
Zh tn bug 240712 (#187)
BuyuanCui Aug 16, 2024
b9fe44f
EN TN Fixes for Issue 166 (#207)
zoobereq Aug 19, 2024
7638018
for fraction upgrade on accuracy
BuyuanCui Aug 9, 2024
7aedf6d
for fraction upgrade on accuracy
BuyuanCui Aug 9, 2024
88fd572
update to go back to 1.0 release version
BuyuanCui Aug 15, 2024
8589934
weight adjustment
BuyuanCui Aug 15, 2024
62108a2
update to go back to 1.0 version for beter accuracy
BuyuanCui Aug 15, 2024
ff899e6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
d01dbf7
changing the data format, to align to the blind test data
BuyuanCui Aug 15, 2024
f04cc62
removing grammars difficult for SH tests
BuyuanCui Aug 16, 2024
3310645
adding one more test item
BuyuanCui Aug 16, 2024
c1cecfe
temporal fixings attempt to fixn SH test errors, will fix back
BuyuanCui Aug 16, 2024
e403edc
temporal changes will change back
BuyuanCui Aug 16, 2024
c675c3d
update jp tn date
BuyuanCui Aug 19, 2024
9868b18
resolving conflict
BuyuanCui Aug 20, 2024
375ea03
adding grammars back in the tokenizer
BuyuanCui Aug 19, 2024
ebe5394
fixing ci test cases
BuyuanCui Aug 20, 2024
9c64aa0
updats on Jenkins
BuyuanCui Aug 20, 2024
1ce907d
with pynini closure had errors chaing back to no closure version
BuyuanCui Aug 20, 2024
c70ab57
jenkinspdate
BuyuanCui Aug 20, 2024
931246e
changing the data format, to align to the blind test data
BuyuanCui Aug 15, 2024
7f25f7c
adding one more test item
BuyuanCui Aug 16, 2024
31ccb41
temporal fixings attempt to fixn SH test errors, will fix back
BuyuanCui Aug 16, 2024
63db65f
adding grammars back in the tokenizer
BuyuanCui Aug 19, 2024
04af911
fixing ci test cases
BuyuanCui Aug 20, 2024
622f20f
with pynini closure had errors chaing back to no closure version
BuyuanCui Aug 20, 2024
3f01e9e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 20, 2024
9f40d34
resolving fraction space issue
BuyuanCui Aug 24, 2024
febc49f
resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE
BuyuanCui Aug 24, 2024
c075f3d
resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and …
BuyuanCui Aug 24, 2024
5cd8400
fixed typo on decimaltext
BuyuanCui Aug 24, 2024
c309054
removing unsed improts
BuyuanCui Aug 24, 2024
c53ce21
removing unused import
BuyuanCui Aug 24, 2024
88d9fd2
changed regular space to narrow space
BuyuanCui Aug 24, 2024
c8fccad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 24, 2024
1c69ee0
imports error fixing
BuyuanCui Aug 24, 2024
c774e5d
imports errors
BuyuanCui Aug 24, 2024
ea591f1
Jekins update for jp itn
BuyuanCui Aug 24, 2024
b978a6c
update for fraction space issue
BuyuanCui Sep 4, 2024
9a11e63
update for fraction space issue
BuyuanCui Sep 4, 2024
c49ad89
update for fraction space issue
BuyuanCui Sep 4, 2024
622d536
reverting
BuyuanCui Sep 4, 2024
6ead948
update for fraction space issuel chaing narrow space to regular norma…
BuyuanCui Sep 4, 2024
6b172cd
fixing style
BuyuanCui Sep 4, 2024
c1f0dca
fixng style
BuyuanCui Sep 4, 2024
16973f0
style fix
BuyuanCui Sep 4, 2024
5241247
style fix
BuyuanCui Sep 4, 2024
1e1553d
style fix
BuyuanCui Sep 4, 2024
a69b3bb
removing unsed imports
BuyuanCui Sep 4, 2024
e4dc8bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 4, 2024
734759e
jp tn date update
BuyuanCui Sep 4, 2024
542c8b2
removing previously created nemo imports
BuyuanCui Sep 5, 2024
f1a3783
space issue
BuyuanCui Sep 5, 2024
4d418f3
test order arrangement
BuyuanCui Sep 5, 2024
b1f72b9
style fix
BuyuanCui Sep 5, 2024
68f7319
Update test_cases_fraction.txt
BuyuanCui Sep 4, 2024
3b482d8
space issue
BuyuanCui Sep 5, 2024
7c18a52
update jp tn
BuyuanCui Sep 5, 2024
ef99df2
removing unsed import
BuyuanCui Sep 5, 2024
03b073c
empty file
BuyuanCui Sep 24, 2024
65e7fcb
to delete
BuyuanCui Sep 24, 2024
9593df0
removing
BuyuanCui Sep 24, 2024
30e462f
resolving conflict, files copied from main if not ja
BuyuanCui Sep 26, 2024
6afcc2a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 26, 2024
e938b15
deleting merge conflict infos
BuyuanCui Sep 26, 2024
fcbe850
Merge branch 'jp_itn_update_240805' of https://github.com/NVIDIA/NeMo…
BuyuanCui Sep 26, 2024
fc98604
changing jp tn cache date to today just incase
BuyuanCui Sep 27, 2024
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -26,7 +26,7 @@ pipeline {
     IT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/08-22-24-0'
     HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
     MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
-    JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/07-15-24-0'
+    JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/09-27-24-0'
     DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
   }
   stages {
@@ -29,13 +29,14 @@

 NEMO_CHAR = utf8.VALID_UTF8_CHAR

+NEMO_NARROW_NON_BREAK_SPACE = "\u202F"
 NEMO_DIGIT = byte.DIGIT
 NEMO_LOWER = pynini.union(*string.ascii_lowercase).optimize()
 NEMO_UPPER = pynini.union(*string.ascii_uppercase).optimize()
 NEMO_ALPHA = pynini.union(NEMO_LOWER, NEMO_UPPER).optimize()
 NEMO_ALNUM = pynini.union(NEMO_DIGIT, NEMO_ALPHA).optimize()
 NEMO_HEX = pynini.union(*string.hexdigits).optimize()
-NEMO_NON_BREAKING_SPACE = u"\u00A0"
+NEMO_NON_BREAKING_SPACE = "\u00A0"
 NEMO_SPACE = " "
 NEMO_WHITE_SPACE = pynini.union(" ", "\t", "\n", "\r", u"\u00A0").optimize()
 NEMO_NOT_SPACE = pynini.difference(NEMO_CHAR, NEMO_WHITE_SPACE).optimize()
@@ -45,6 +46,7 @@
 NEMO_GRAPH = pynini.union(NEMO_ALNUM, NEMO_PUNCT).optimize()

 NEMO_SIGMA = pynini.closure(NEMO_CHAR)
+
 NEMO_NOT_ALPHA = pynini.difference(NEMO_SIGMA, NEMO_ALPHA).optimize()
 NEMO_LOWER_NOT_A = pynini.union(
     "b",
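The hunk above adds `NEMO_NARROW_NON_BREAK_SPACE` (U+202F) next to the existing `NEMO_NON_BREAKING_SPACE` (U+00A0) and drops the redundant `u` prefix. A minimal standard-library check of what those two code points are, and whether each falls inside `NEMO_WHITE_SPACE` as defined in this hunk (mirrored here as a plain Python set, an assumption of this sketch rather than the pynini object):

```python
import unicodedata

# Values copied from the graph_utils hunk; plain strings, so no pynini is needed here.
NEMO_NARROW_NON_BREAK_SPACE = "\u202F"
NEMO_NON_BREAKING_SPACE = "\u00A0"

# Plain-Python stand-in for NEMO_WHITE_SPACE, which the hunk builds with pynini.union.
WHITE_SPACE_CHARS = {" ", "\t", "\n", "\r", "\u00A0"}

for ch in (NEMO_NON_BREAKING_SPACE, NEMO_NARROW_NON_BREAK_SPACE):
    in_white_space = ch in WHITE_SPACE_CHARS
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}, in NEMO_WHITE_SPACE: {in_white_space}")

# In Python 3 the u"" prefix is a no-op, so dropping it does not change the constant.
assert u"\u00A0" == NEMO_NON_BREAKING_SPACE
```

This reports U+00A0 as `NO-BREAK SPACE` (a member of the white-space set) and U+202F as `NARROW NO-BREAK SPACE` (not a member), so under the definitions in this hunk `NEMO_NOT_SPACE` treats the narrow space as an ordinary character.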
@@ -37,13 +37,15 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
         decimal = decimal.just_decimal

         fraction_word = pynutil.delete("分の") | pynutil.delete(" 分 の ") | pynutil.delete("分 の ") | pynutil.delete("分 の")
-        integer_word = pynini.accep("と") | pynini.accep("荷")
-        optional_sign = (
+
+        integer_word = pynutil.delete("と") | pynutil.delete("荷")
+        root_word = pynini.accep("√") | pynini.cross("ルート", "√")
+
+        graph_sign = (
             pynutil.insert("negative: \"") + (pynini.accep("-") | pynini.cross("マイナス", "-")) + pynutil.insert("\"")
         )

-        root_word = pynini.accep("√") | pynini.cross("ルート", "√")
-        root_integer = (
+        graph_integer = (
             pynutil.insert("integer_part: \"")
             + (
                 (decimal | decimal + integer_word)
@@ -54,15 +56,16 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
             + pynutil.insert("\"")
         )

-        root_denominator = (
+        graph_denominator = (
             pynutil.insert("denominator: \"")
             + (
                 ((decimal) | (cardinal + root_word + cardinal) | (root_word + cardinal) | cardinal)
                 + pynini.closure(pynutil.delete(' '), 0, 1)
             )
             + pynutil.insert("\"")
         )
-        root_numerator = (
+
+        graph_numerator = (
             pynutil.insert("numerator: \"")
             + (
                 pynini.closure(pynutil.delete(' '))
@@ -71,26 +74,30 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
             + pynutil.insert("\"")
         )

-        graph_root_fraction = (
-            pynini.closure((optional_sign + pynutil.insert(" ")), 0, 1)
-            + root_denominator
+        graph_fraction_sign = (
+            graph_sign
+            + pynutil.insert(" ")
+            + graph_denominator
             + pynutil.insert(" ")
             + fraction_word
-            + root_numerator
+            + graph_numerator
         )

-        graph_root_with_integer = (
-            pynini.closure((optional_sign + pynutil.insert(" ")), 0, 1)
-            + root_integer
-            # + inetegr_word
+        graph_fraction_no_sign = graph_denominator + pynutil.insert(" ") + fraction_word + graph_numerator
+
+        graph_regular_fractions = graph_fraction_sign | graph_fraction_no_sign
+
+        graph_integer_fraction_sign = (
+            pynini.closure((graph_sign + pynutil.insert(" ")), 0, 1)
+            + pynutil.add_weight(graph_integer, 1.1)
             + pynutil.insert(" ")
-            + root_denominator
+            + graph_denominator
             + pynutil.insert(" ")
             + fraction_word
-            + root_numerator
+            + graph_numerator
         )

-        final_graph = graph_root_fraction | graph_root_with_integer
+        final_graph = graph_regular_fractions | graph_integer_fraction_sign

         final_graph = self.add_tokens(final_graph)
         self.fst = final_graph.optimize()
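The reworked tagger splits fractions into a signed branch (`graph_fraction_sign`), an unsigned branch (`graph_fraction_no_sign`) and a branch with an integer part (`graph_integer_fraction_sign`), and every branch emits the denominator before the numerator, matching the Japanese reading order (三分の二 is 2/3). The following is only a rough regex-based sketch of that token layout, not the pynini grammar. It assumes the numbers are already ASCII digits (the real graph reads them through the `cardinal` and `decimal` FSTs), treats と/荷 after the integer part as required, ignores the `ルート`/`√` handling, and assumes `add_tokens` wraps the fields as `fraction { ... }`; `tag_fraction` is a hypothetical name.

```python
import re

# Hypothetical, simplified approximation of the ja ITN fraction tagger.
# Assumes ASCII-digit numbers; the real grammar reads numerals via cardinal/decimal FSTs.
_JA_FRACTION = re.compile(
    r"^(?P<sign>マイナス|-)?"       # graph_sign: "-" or マイナス, emitted as "-"
    r"(?:(?P<integer>\d+)[と荷])?"  # integer part, then the deleted integer_word
    r"(?P<den>\d+) ?分 ?の ?"       # denominator, then the deleted fraction_word
    r"(?P<num>\d+)$"                # numerator
)


def tag_fraction(text: str) -> str:
    """Return a serialized fraction token with the denominator listed first."""
    match = _JA_FRACTION.match(text)
    if match is None:
        raise ValueError(f"not a fraction: {text!r}")
    fields = []
    if match.group("sign"):
        fields.append('negative: "-"')
    integer = match.group("integer")
    if integer:
        fields.append(f'integer_part: "{integer}"')
    denominator = match.group("den")
    numerator = match.group("num")
    fields.append(f'denominator: "{denominator}"')
    fields.append(f'numerator: "{numerator}"')
    return "fraction { " + " ".join(fields) + " }"


print(tag_fraction("3分の2"))
print(tag_fraction("マイナス1と3分の2"))
```

The spaced variants accepted by `fraction_word` (for example `3 分 の 2`) are covered by the optional spaces in the pattern.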

This file was deleted.

@@ -91,11 +91,11 @@ def __init__(

         classify = (
             pynutil.add_weight(whitelist_graph, 1.01)
-            | pynutil.add_weight(cardinal_graph, 1.0) # was -1.1
+            | pynutil.add_weight(cardinal_graph, 1.0)
             | pynutil.add_weight(ordinal_graph, 1.1)
             | pynutil.add_weight(date_graph, 1.1)
             | pynutil.add_weight(decimal_graph, 1.1)
-            | pynutil.add_weight(fraction_graph, 1.1)
+            | pynutil.add_weight(fraction_graph, 1.0)
             | pynutil.add_weight(time_graph, 1.0)
             | pynutil.add_weight(word_graph, 100)
             | pynutil.add_weight(punct_graph, 1.1)
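This hunk lowers the fraction weight from 1.1 to 1.0 and drops the `# was -1.1` comment on the cardinal line. Assuming the tagged lattice is resolved by a shortest-path search in pynini's default tropical semiring, where weights along a path add up and the smallest total wins, the effect can be sketched in plain Python. The weights are only those visible in this hunk (the union continues past it), `path_cost` and `cheapest` are hypothetical helpers, and which classes actually compete for a given input is not shown in this diff.

```python
# Weights visible in the classify union of this hunk, before and after the PR.
OLD_WEIGHTS = {
    "whitelist": 1.01, "cardinal": 1.0, "ordinal": 1.1, "date": 1.1, "decimal": 1.1,
    "fraction": 1.1, "time": 1.0, "word": 100, "punct": 1.1,
}
NEW_WEIGHTS = {**OLD_WEIGHTS, "fraction": 1.0}


def path_cost(classes, weights):
    """Tropical-semiring path cost: the per-token weights simply add up."""
    return sum(weights[c] for c in classes)


def cheapest(candidates, weights):
    """Pick the candidate tokenization with the lowest total cost.

    Each candidate is a tuple of class names covering the same input span.
    Ties go to the earlier candidate, which is an arbitrary rule of this sketch.
    """
    return min(candidates, key=lambda classes: path_cost(classes, weights))


candidates = [("decimal",), ("fraction",)]
print(cheapest(candidates, OLD_WEIGHTS))  # tie at 1.1, first candidate kept
print(cheapest(candidates, NEW_WEIGHTS))  # fraction is now strictly cheaper
```

Before the change a fraction parse only tied with the other 1.1-weighted classes; after it, a single fraction token is strictly cheaper than any one of them, while the word fallback (weight 100) stays far more expensive either way.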
@@ -12,10 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-
 from pynini.lib import pynutil

-from nemo_text_processing.inverse_text_normalization.ja.graph_utils import NEMO_NOT_QUOTE, GraphFst
+from nemo_text_processing.inverse_text_normalization.ja.graph_utils import NEMO_NOT_SPACE, GraphFst


 class WordFst(GraphFst):
@@ -26,5 +25,5 @@ class WordFst(GraphFst):

     def __init__(self):
         super().__init__(name="word", kind="classify")
-        word = pynutil.insert("name: \"") + NEMO_NOT_QUOTE + pynutil.insert("\"")
+        word = pynutil.insert("name: \"") + NEMO_NOT_SPACE + pynutil.insert("\"")
         self.fst = word.optimize()
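`WordFst` now wraps a single `NEMO_NOT_SPACE` character instead of a single `NEMO_NOT_QUOTE` character. A rough plain-Python comparison of the two classes follows: the `NEMO_NOT_SPACE` stand-in follows the `graph_utils` hunk above (any character except space, tab, newline, carriage return and U+00A0), while `NEMO_NOT_QUOTE` is not defined in this diff and is assumed here to be any character except the ASCII double quote. Both predicate names are hypothetical.

```python
# Mirrors NEMO_WHITE_SPACE from the graph_utils hunk.
WHITE_SPACE = {" ", "\t", "\n", "\r", "\u00A0"}


def is_not_space(ch: str) -> bool:
    """Stand-in for NEMO_NOT_SPACE: any single character outside WHITE_SPACE."""
    return len(ch) == 1 and ch not in WHITE_SPACE


def is_not_quote(ch: str) -> bool:
    """Stand-in for NEMO_NOT_QUOTE (assumed): any single character except '"'."""
    return len(ch) == 1 and ch != '"'


for ch in ["と", " ", "\u00A0", '"', "\u202F"]:
    print(repr(ch), "NOT_QUOTE:", is_not_quote(ch), "NOT_SPACE:", is_not_space(ch))
```

Under these definitions the new class rejects whitespace, which the old one accepted, but accepts the double quote, which the old one rejected; the narrow no-break space U+202F passes both.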
8 changes: 8 additions & 0 deletions nemo_text_processing/inverse_text_normalization/ja/utils.py
@@ -21,7 +21,15 @@ def get_abs_path(rel_path):

Args:
rel_path: relative path to this file
<<<<<<< HEAD
<<<<<<< HEAD

=======

>>>>>>> 0a4a21c (Jp itn 20240221 (#141))
=======

>>>>>>> 59f46198ab4c8880c6a5fb88f3cbee9530156498
Returns absolute path
"""
return os.path.dirname(os.path.abspath(__file__)) + '/' + rel_path
@@ -16,7 +16,11 @@
 import pynini
 from pynini.lib import pynutil

-from nemo_text_processing.inverse_text_normalization.ja.graph_utils import NEMO_NOT_QUOTE, GraphFst
+from nemo_text_processing.inverse_text_normalization.ja.graph_utils import (
+    NEMO_NON_BREAKING_SPACE,
+    NEMO_NOT_QUOTE,
+    GraphFst,
+)


 class FractionFst(GraphFst):
@@ -33,34 +37,29 @@ def __init__(self):
         """
         super().__init__(name="fraction", kind="verbalize")

-        sign_component = pynutil.delete("negative: \"") + pynini.closure("-") + pynutil.delete("\"")
+        sign_component = pynutil.delete("negative: \"") + pynini.closure("-", 1) + pynutil.delete("\"")

-        # integer_component = (
-        # pynutil.delete("integer_part: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")
-        # ) | (
-        # sign_component
-        # + pynutil.delete(" ")
-        # + pynutil.delete("integer_part: \"")
-        # + pynini.closure(NEMO_NOT_QUOTE)
-        # + pynutil.delete("\"")
-        # )
+        integer_component = (
+            pynutil.delete("integer_part: \"") + pynini.closure(NEMO_NOT_QUOTE, 1) + pynutil.delete("\"")
+        )

-        integer_component = pynutil.delete("integer_part: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")
         denominator_component = (
-            pynutil.delete("denominator: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")
+            pynutil.delete("denominator: \"") + pynini.closure(NEMO_NOT_QUOTE, 1) + pynutil.delete("\"")
         )
-        numerator_component = pynutil.delete("numerator: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")

-        final_graph = (
-            pynini.closure(sign_component, 0, 1)
-            + pynutil.delete(" ")
-            + pynini.closure(integer_component + pynutil.delete(" "))
-            # + pynini.closure(sign_component + pynutil.delete(" "))
+        numerator_component = (
+            pynutil.delete("numerator: \"") + pynini.closure(NEMO_NOT_QUOTE, 1) + pynutil.delete("\"")
+        )
+
+        regular_graph = (
+            pynini.closure((sign_component + pynutil.delete(" ")), 0, 1)
+            + pynini.closure(integer_component + pynutil.delete(" ") + pynutil.insert(NEMO_NON_BREAKING_SPACE))
             + numerator_component
             + pynutil.delete(" ")
             + pynutil.insert("/")
             + denominator_component
         )

-        final_graph = self.delete_tokens(final_graph)
+        final_graph = self.delete_tokens(regular_graph)
+
         self.fst = final_graph.optimize()

This file was deleted.
