Commit
* updates
* updates and fixes according to the national guideline document
* Decimal grammar added
* fraction updated
* money updated
* ordinal grammar added
* punctuation grammar added
* time grammar updated
* tokenizer updated
* updates on certificate
* data updated and added due to updates and changes to the existing grammar
* cardinal updated
* date grammar changed
* decimal grammar added
* grammar updated
* grammar updated
* grammar added
* grammar updates
* test data added
* test python file edits
* updates for tn1.0 and previous tn grammar from contribution
* test cases updated
* coding style fixed
* dates updated for init files
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* updated the date for zh
* removed unused imports
* removed comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added back the itn tests
* added back measure and math from previous TN
* updated for tests reruns
* updates
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* updated weights
---------
Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>
1 parent 68f482f, commit a9aa462
Showing 65 changed files with 2,943 additions and 718 deletions.
2 changes: 1 addition & 1 deletion
nemo_text_processing/text_normalization/zh/data/date/__init__.py
40 changes: 40 additions & 0 deletions
nemo_text_processing/text_normalization/zh/data/date/day.tsv
@@ -0,0 +1,40 @@
1 一
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
01 一
02 二
03 三
04 四
05 五
06 六
07 七
08 八
09 九
10 十
11 十一
12 十二
13 十三
14 十四
15 十五
16 十六
17 十七
18 十八
19 十九
20 二十
21 二十一
22 二十二
23 二十三
24 二十四
25 二十五
26 二十六
27 二十七
28 二十八
29 二十九
30 三十
31 三十一
21 changes: 21 additions & 0 deletions
nemo_text_processing/text_normalization/zh/data/date/months.tsv
@@ -0,0 +1,21 @@
1 一
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
10 十
11 十一
12 十二
01 一
02 二
03 三
04 四
05 五
06 六
07 七
08 八
09 九
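The day and month rows follow a regular pattern: each value is read as a Mandarin numeral, and single-digit values also appear in a zero-padded form. The sketch below regenerates both tables in plain Python to make that pattern explicit. The names `read_number` and `build_table` are hypothetical and are not part of `nemo_text_processing`; the commit ships the rows as static TSV data rather than generating them.

```python
DIGITS = "一二三四五六七八九"  # Mandarin digits for 1..9


def read_number(n: int) -> str:
    """Read 1..99 in Mandarin the way the rows above do, e.g. 21 -> 二十一."""
    if not 1 <= n <= 99:
        raise ValueError(f"expected 1..99, got {n}")
    tens, units = divmod(n, 10)
    out = ""
    if tens == 1:
        out = "十"  # 10..19 drop the leading 一, matching rows such as "10 十"
    elif tens > 1:
        out = DIGITS[tens - 1] + "十"
    if units:
        out += DIGITS[units - 1]
    return out


def build_table(max_value: int) -> dict[str, str]:
    """Map written forms ("7", "07", "21") to their readings for 1..max_value."""
    table = {}
    for n in range(1, max_value + 1):
        table[str(n)] = read_number(n)
        if n < 10:
            table[f"{n:02d}"] = read_number(n)  # zero-padded variant
    return table


days = build_table(31)
months = build_table(12)
print(len(days), len(months))    # 40 21
print(days["21"], months["09"])  # 二十一 九
```

The entry counts (40 and 21) match the number of added lines reported for `day.tsv` and `months.tsv`.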
16 changes: 16 additions & 0 deletions
nemo_text_processing/text_normalization/zh/data/date/suffix.tsv
@@ -0,0 +1,16 @@
ad 公元
AD 公元
a.d. 公元
A.D. 公元
ce 公元
CE 公元
c.e. 公元
C.E. 公元
bc 公元前
BC 公元前
b.c. 公元前
B.C. 公元前
bce 公元前
BCE 公元前
b.c.e. 公元前
B.C.E. 公元前
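The 16 rows above are four era abbreviations, each in four spellings (lowercase, uppercase, dotted lowercase, dotted uppercase). A minimal sketch of that expansion follows; `ERAS`, `era_variants`, and `build_era_table` are illustrative names assumed here, not identifiers from the repository.

```python
# Base abbreviations and their Mandarin readings, taken from the rows above.
ERAS = {"ad": "公元", "ce": "公元", "bc": "公元前", "bce": "公元前"}


def era_variants(abbr: str) -> list[str]:
    """Return the four spellings listed for each abbreviation."""
    dotted = ".".join(abbr) + "."  # "bce" -> "b.c.e."
    return [abbr, abbr.upper(), dotted, dotted.upper()]


def build_era_table() -> dict[str, str]:
    """Expand every base abbreviation into all of its spelling variants."""
    return {
        variant: reading
        for abbr, reading in ERAS.items()
        for variant in era_variants(abbr)
    }


table = build_era_table()
print(len(table))       # 16
print(table["B.C.E."])  # 公元前
```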
16 changes: 16 additions & 0 deletions
nemo_text_processing/text_normalization/zh/data/date/suffixes.tsv
@@ -0,0 +1,16 @@
ad 公元
AD 公元
a.d. 公元
A.D. 公元
ce 公元
CE 公元
c.e. 公元
C.E. 公元
bc 公元前
BC 公元前
b.c. 公元前
B.C. 公元前
bce 公元前
BCE 公元前
b.c.e. 公元前
B.C.E. 公元前
6 changes: 0 additions & 6 deletions
nemo_text_processing/text_normalization/zh/data/date/year_suffix.tsv
This file was deleted.
@@ -88,3 +88,5 @@ mw 毫瓦
pg 皮克
ps 皮秒
s 秒
ms 毫秒
g 克
2 changes: 1 addition & 1 deletion
nemo_text_processing/text_normalization/zh/data/money/__init__.py
52 changes: 52 additions & 0 deletions
nemo_text_processing/text_normalization/zh/data/money/currency_mandarin.tsv
@@ -0,0 +1,52 @@
美元
美金
欧元
英镑
加元
瑞士法郎
法郎
加拿大元
元
圆
韩元
墨西哥比索
比索
新西兰元
新加坡元
港元
港币
人民币
挪威克朗
克朗
韩元
土耳其里拉
里拉
印度卢比
卢比
俄罗斯卢布
卢布
巴西雷亚尔
雷亚尔
南非兰特
兰特
丹麦克朗
波兰兹罗提
罗提
新台币
台币
泰铢
马来西亚林吉特
印尼盾
盾
匈牙利福林
福林
捷克克朗
以色列新谢克尔
新谢克尔
智利比索
菲律宾披索
阿联酋迪拉姆
迪拉姆
哥伦比亚披索
马来西亚令吉
日元
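Several entries in this list are suffixes of longer entries (for example 法郎 and 瑞士法郎, or 元 and 加拿大元), so a consumer that matches currency names at the end of a string would probably want the longest match. The sketch below shows one way to do that in plain Python. It is an assumption about how such a list could be used, not the grammar from this commit; `CURRENCIES` is a small excerpt of the list and `split_currency` is a hypothetical helper.

```python
# A small excerpt of the currency names listed above.
CURRENCIES = ["美元", "瑞士法郎", "法郎", "加拿大元", "加元", "元", "人民币"]


def split_currency(text: str):
    """Split text into (amount, currency), preferring the longest currency name.

    Returns None when the text does not end with a known currency name.
    """
    # Try longer names first so "瑞士法郎" is not cut down to "法郎".
    for name in sorted(CURRENCIES, key=len, reverse=True):
        if text.endswith(name):
            return text[: -len(name)], name
    return None


print(split_currency("五十瑞士法郎"))  # ('五十', '瑞士法郎')
print(split_currency("十元"))          # ('十', '元')
```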
63 changes: 0 additions & 63 deletions
nemo_text_processing/text_normalization/zh/data/money/currency_symbol.tsv
This file was deleted.
2 changes: 1 addition & 1 deletion
nemo_text_processing/text_normalization/zh/data/number/__init__.py
3 changes: 1 addition & 2 deletions
...rmalization/zh/data/number/digit_teen.tsv → ...rmalization/zh/data/number/digit_tens.tsv
@@ -1,9 +1,8 @@
1
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
9 九
@@ -0,0 +1,8 @@
am
AM
a.m.
A.M.
am
AM
a.m.
A.M.
@@ -0,0 +1,8 @@
pm
p.m.
PM
P.M.
pm
p.m.
PM
P.M.