Zh tn 0712 #89

Merged
merged 39 commits into from
Aug 7, 2023
Changes from 34 commits
Commits
cf47489
updates
BuyuanCui Jul 13, 2023
d211444
updates and fixes according to document on national guideline
BuyuanCui Jul 13, 2023
a731165
Decimal grammar added
BuyuanCui Jul 13, 2023
3a69d63
fraction updated
BuyuanCui Jul 13, 2023
b9415e6
money updated
BuyuanCui Jul 13, 2023
fd9f9e8
ordinal grammar added
BuyuanCui Jul 13, 2023
0cb2443
punctuation grammar added
BuyuanCui Jul 13, 2023
dc4c57e
time grammar updated
BuyuanCui Jul 13, 2023
1e9a523
tokenizer updated
BuyuanCui Jul 13, 2023
3a51b51
updates on certificate
BuyuanCui Jul 13, 2023
0fa4908
data updated and added due to updates and changes to the existing gr…
BuyuanCui Jul 13, 2023
1ad9e4a
cardinal updated
BuyuanCui Jul 13, 2023
3c8758f
date grammar changed
BuyuanCui Jul 13, 2023
f465325
decimal grammar added
BuyuanCui Jul 13, 2023
4746770
grammar updated
BuyuanCui Jul 13, 2023
3468b9f
grammar updated
BuyuanCui Jul 13, 2023
007ea44
grammar added
BuyuanCui Jul 13, 2023
e24d5e9
grammar updates
BuyuanCui Jul 13, 2023
c626499
test data added
BuyuanCui Jul 13, 2023
daeafa2
test python file edits
BuyuanCui Jul 13, 2023
fc2a14a
updates for tn1.0 and previous tn grammar from contribution
BuyuanCui Jul 15, 2023
bbcc635
test cases updated
BuyuanCui Jul 15, 2023
ebd5a48
coding style fixed
BuyuanCui Jul 15, 2023
a90c4ff
dates updated for init files
BuyuanCui Jul 17, 2023
0a0fd53
Merge branch 'NVIDIA:main' into ZH_TN_0712
BuyuanCui Jul 17, 2023
21b6f88
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 17, 2023
bba15f8
updated the date for zh
BuyuanCui Jul 28, 2023
5adea95
Merge branch 'ZH_TN_0712' of github.com:BuyuanCui/NeMo-text-processin…
BuyuanCui Jul 28, 2023
00a3ea3
removed unused imports
BuyuanCui Jul 31, 2023
4d7828d
removed comments
BuyuanCui Jul 31, 2023
7def290
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 31, 2023
5279ac8
added back the itn tests
BuyuanCui Aug 1, 2023
0732419
Merge branch 'ZH_TN_0712' of github.com:BuyuanCui/NeMo-text-processin…
BuyuanCui Aug 1, 2023
c5ffce0
added back measure and math from previous TN
BuyuanCui Aug 2, 2023
7db0a30
updated for tests reruns
BuyuanCui Aug 7, 2023
bcca98e
updates
BuyuanCui Aug 7, 2023
3a91d58
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 7, 2023
23fc6a0
updated weights
BuyuanCui Aug 7, 2023
5a8f0ad
Merge branch 'ZH_TN_0712' of github.com:BuyuanCui/NeMo-text-processin…
BuyuanCui Aug 7, 2023
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -22,7 +22,7 @@ pipeline {
 RU_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
 VI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
 SV_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
-ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-29-23-0'
+ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/07-12-23-0'
 DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'

 }
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
40 changes: 40 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/day.tsv
@@ -0,0 +1,40 @@
1 一
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
01 一
02 二
03 三
04 四
05 五
06 六
07 七
08 八
09 九
10 十
11 十一
12 十二
13 十三
14 十四
15 十五
16 十六
17 十七
18 十八
19 十九
20 二十
21 二十一
22 二十二
23 二十三
24 二十四
25 二十五
26 二十六
27 二十七
28 二十八
29 二十九
30 三十
31 三十一
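Each row in day.tsv pairs a written form with its spoken form, so `1` and `01` both resolve to 一. The following is a minimal sketch of that lookup, assuming the two columns are tab-separated (the diff view does not show the separator) and using a plain dict rather than the grammar's own machinery. `parse_mapping` and `SAMPLE` are hypothetical names, and the sample holds only a few rows copied from above.

```python
# Illustrative only: a few rows copied from day.tsv, assuming tab separators.
SAMPLE = "1\t一\n01\t一\n10\t十\n21\t二十一\n31\t三十一\n"


def parse_mapping(text: str) -> dict:
    """Parse two-column rows into a written-form -> spoken-form dict."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        written, spoken = line.split("\t", 1)
        mapping[written] = spoken
    return mapping


days = parse_mapping(SAMPLE)
print(days["01"], days["21"])
```

Listing the zero-padded forms as separate rows keeps the lookup a pure table match, with no stripping of leading zeros before the lookup.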
21 changes: 21 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/months.tsv
@@ -0,0 +1,21 @@
1 一
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
10 十
11 十一
12 十二
01 一
02 二
03 三
04 四
05 五
06 六
07 七
08 八
09 九
16 changes: 16 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/suffix.tsv
@@ -0,0 +1,16 @@
ad 公元
AD 公元
a.d. 公元
A.D. 公元
ce 公元
CE 公元
c.e. 公元
C.E. 公元
bc 公元前
BC 公元前
b.c. 公元前
B.C. 公元前
bce 公元前
BCE 公元前
b.c.e. 公元前
B.C.E. 公元前
16 changes: 16 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/suffixes.tsv
@@ -0,0 +1,16 @@
ad 公元
AD 公元
a.d. 公元
A.D. 公元
ce 公元
CE 公元
c.e. 公元
C.E. 公元
bc 公元前
BC 公元前
b.c. 公元前
B.C. 公元前
bce 公元前
BCE 公元前
b.c.e. 公元前
B.C.E. 公元前
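As shown, suffix.tsv and suffixes.tsv contain the same 16 rows. Each group of four rows is one abbreviation in lowercase, uppercase, dotted lowercase, and dotted uppercase form. The sketch below shows how such a table could be derived from four base abbreviations. It is an illustration of the pattern, not how the PR produced the files, and `variants`, `BASE`, and `era` are hypothetical names.

```python
def variants(abbr: str) -> list:
    """Return the four spellings used in the table: ad, AD, a.d., A.D."""
    dotted = "".join(ch + "." for ch in abbr)
    return [abbr, abbr.upper(), dotted, dotted.upper()]


# Base abbreviations and spoken forms, taken from the rows above.
BASE = {"ad": "公元", "ce": "公元", "bc": "公元前", "bce": "公元前"}

# Expand every base abbreviation into its four written variants.
era = {form: spoken for abbr, spoken in BASE.items() for form in variants(abbr)}

for form, spoken in era.items():
    print(form, spoken)
```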

This file was deleted.

@@ -88,3 +88,5 @@ mw 毫瓦
pg 皮克
ps 皮秒
s 秒
ms 毫秒
g 克
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -88,11 +88,9 @@ SCR 塞舌尔卢比
 SGD 新加坡元
 SBD 所罗门群岛元
 SOS 索马里先令
-KRW 韩元
 ZAR 南非兰特
 LKR 斯里兰卡卢比
 SEK 瑞典克朗
-CHF 瑞士法郎
 SRD 苏里南元
 SYP 叙利亚镑
 TWD 新台币
@@ -115,3 +113,89 @@ JPY¥ 日元
HK$ 港元
A$ 澳元
CAD$ 加元
US$ 美元
€ 欧元
£ 英镑
Fr 瑞士法郎
₣ 法郎
¥ 圆
Kr 韩元
NXN$ 墨西哥比索
NZD$ 新西兰元
SGD$ 新加坡元
HKD$ 港元
NOKKr 挪威克朗
₩ 韩元
TRY₺ 土耳其里拉
₽ 俄罗斯卢布
BRLR$ 巴西雷亚尔
DKKKr 丹麦克朗
TWDNT$ 新台币
฿ 泰铢
RM 马来西亚林吉特
Rp 印尼盾
Kč 捷克克朗
₪ 以色列新谢克尔
CLP$ 智利披索
₱ 菲律宾披索
د.إ 阿联酋迪拉姆
COL$ 哥伦比亚披索
L 罗马尼亚列伊
JPY¥ 日元
¥ 人民币
Lek 阿尔巴尼亚列克
ƒ 阿鲁巴盾
Br 白俄罗斯卢布
BZ$ 伯利兹元
$b 玻利维亚玻利维亚诺
KM 波斯尼亚和黑塞哥维那可兑换马克
P 博茨瓦纳普拉
лв 保加利亚列弗
R$ 巴西雷亚尔
៛ 柬埔寨瑞尔
¥ 人民币
₡ 哥斯达黎加科隆
kn 克罗地亚库纳
₱ 古巴比索
kr 丹麦克朗
RD$ 多米尼加共和国比索
¢ 加纳塞地
Q 危地马拉格查尔
L 洪都拉斯伦皮拉
Ft 匈牙利福林
₹ 印度卢比
£ 英镑
₪ 以色列谢克尔
J$ 牙买加元
лв 哈萨克斯坦腾格
₩ 朝鲜园
лв 吉尔吉斯斯坦索姆
₭ 老挝基普
ден 马其顿代纳尔
₨ 毛里求斯卢比
₮ 蒙古图格里克
MT 莫桑比克梅蒂卡尔
C$ 尼加拉瓜科尔多瓦
₦ 尼日利亚奈拉
₨ 巴基斯坦卢比
B/. 巴拿马巴尔博亚
Gs 巴拉圭瓜拉尼
S/. 秘鲁索尔
₱ 菲律宾比索
zł 波兰兹罗提
lei 罗马尼亚列伊
₽ 卢布
Дин. 塞尔维亚第纳尔
S 索马里先令
R 南非兰特
CHF 瑞士法郎
NT$ 新台币
TT$ 特立尼达和多巴哥元
₺ 土耳其里拉
₴ 乌克兰格里夫纳
$ 美元
$U 乌拉圭比索
лв 乌兹别克斯坦索姆
Bs 委内瑞拉玻利瓦尔
₫ 越南东
Z$ 津巴布韦元
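Several symbols in the added rows appear more than once with different names, for example ₱ and ₩, while others such as £ repeat with the same name. How the grammar resolves such rows is not shown in this diff. The sketch below is a hypothetical check (`find_conflicts` is not part of the PR) that groups rows by symbol and reports only the symbols mapped to more than one distinct name. `ROWS` is a small sample copied from the rows as they appear above.

```python
from collections import defaultdict

# Sample (symbol, name) rows copied from the diff above.
ROWS = [
    ("₱", "菲律宾披索"),
    ("₱", "古巴比索"),
    ("₱", "菲律宾比索"),
    ("₩", "韩元"),
    ("₩", "朝鲜园"),
    ("£", "英镑"),
    ("£", "英镑"),
    ("$", "美元"),
]


def find_conflicts(rows):
    """Return {symbol: [names]} for symbols with more than one distinct name."""
    seen = defaultdict(list)
    for symbol, name in rows:
        if name not in seen[symbol]:
            seen[symbol].append(name)  # keep first-seen order, drop exact repeats
    return {s: names for s, names in seen.items() if len(names) > 1}


conflicts = find_conflicts(ROWS)
for symbol, names in conflicts.items():
    print(symbol, names)
```

Exact repeats such as the two £ rows are not reported, since they do not make the mapping ambiguous.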
@@ -0,0 +1,52 @@
美元
美金
欧元
英镑
加元
瑞士法郎
法郎
加拿大元
韩元
墨西哥比索
比索
新西兰元
新加坡元
港元
港币
人民币
挪威克朗
克朗
韩元
土耳其里拉
里拉
印度卢比
卢比
俄罗斯卢布
卢布
巴西雷亚尔
雷亚尔
南非兰特
兰特
丹麦克朗
波兰兹罗提
罗提
新台币
台币
泰铢
马来西亚林吉特
印尼盾
匈牙利福林
福林
捷克克朗
以色列新谢克尔
新谢克尔
智利比索
菲律宾披索
阿联酋迪拉姆
迪拉姆
哥伦比亚披索
马来西亚令吉
日元

This file was deleted.

@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,9 +1,8 @@
1
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
9 九
8 changes: 8 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/time/AM.tsv
@@ -0,0 +1,8 @@
am
AM
a.m.
A.M.
am
AM
a.m.
A.M.
8 changes: 8 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/time/PM.tsv
@@ -0,0 +1,8 @@
pm
p.m.
PM
P.M.
pm
p.m.
PM
P.M.