Zh tn 0712 (#89)
* updates

Signed-off-by: BuyuanCui <[email protected]>

* updates and fixes according to the national guideline document

Signed-off-by: BuyuanCui <[email protected]>

* Decimal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* fraction updated

Signed-off-by: BuyuanCui <[email protected]>

* money updated

Signed-off-by: BuyuanCui <[email protected]>

* ordinal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* punctuation grammar added

Signed-off-by: BuyuanCui <[email protected]>

* time grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* tokenizer updated

Signed-off-by: BuyuanCui <[email protected]>

* updates on certificate

Signed-off-by: BuyuanCui <[email protected]>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <[email protected]>

* cardinal updated

Signed-off-by: BuyuanCui <[email protected]>

* date grammar changed

Signed-off-by: BuyuanCui <[email protected]>

* decimal grammar added

Signed-off-by: BuyuanCui <[email protected]>

* grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* grammar updated

Signed-off-by: BuyuanCui <[email protected]>

* grammar added

Signed-off-by: BuyuanCui <[email protected]>

* grammar updates

Signed-off-by: BuyuanCui <[email protected]>

* test data added

Signed-off-by: BuyuanCui <[email protected]>

* test python file edits

Signed-off-by: BuyuanCui <[email protected]>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <[email protected]>

* test cases updated

Signed-off-by: BuyuanCui <[email protected]>

* coding style fixed

Signed-off-by: BuyuanCui <[email protected]>

* dates updated for init files

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <[email protected]>

* removed unused imports

Signed-off-by: BuyuanCui <[email protected]>

* removed comments

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <[email protected]>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <[email protected]>

* updated for test reruns

Signed-off-by: BuyuanCui <[email protected]>

* updates

Signed-off-by: BuyuanCui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <[email protected]>

---------

Signed-off-by: BuyuanCui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>
BuyuanCui and pre-commit-ci[bot] committed Sep 17, 2024
1 parent c91108f commit e4925eb
Showing 65 changed files with 2,943 additions and 718 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -22,7 +22,7 @@ pipeline {
RU_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
VI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
SV_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
-ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-29-23-0'
+ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/07-12-23-0'
DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'

}
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
40 changes: 40 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/day.tsv
@@ -0,0 +1,40 @@
1
2
3
4
5
6
7
8
9
01
02
03
04
05
06
07
08
09
10
11 十一
12 十二
13 十三
14 十四
15 十五
16 十六
17 十七
18 十八
19 十九
20 二十
21 二十一
22 二十二
23 二十三
24 二十四
25 二十五
26 二十六
27 二十七
28 二十八
29 二十九
30 三十
31 三十一
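The data files in this commit use a two-column, tab-separated layout: the written form on the left and the Chinese reading on the right. As a minimal sketch of how such a file could be read into a lookup table, the helper below parses that layout with the standard library only. The function name `load_mapping` and the inline sample are hypothetical and are not part of the commit; the sample repeats only rows whose readings are visible above.

```python
import csv
import io

# Hypothetical inline sample in the same two-column layout as day.tsv
# (written form, tab, Chinese reading). Only rows shown in full above are used.
SAMPLE_TSV = "11\t十一\n12\t十二\n20\t二十\n31\t三十一\n"


def load_mapping(stream):
    """Read a two-column TSV into a dict, skipping rows without a reading."""
    mapping = {}
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) >= 2 and row[0] and row[1]:
            mapping[row[0]] = row[1]
    return mapping


day_map = load_mapping(io.StringIO(SAMPLE_TSV))
print(day_map["31"])  # prints 三十一
```

Rows that lack a second column are skipped rather than mapped to an empty string, so an incomplete row never produces an empty reading.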
21 changes: 21 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/months.tsv
@@ -0,0 +1,21 @@
1
2
3
4
5
6
7
8
9
10
11 十一
12 十二
01
02
03
04
05
06
07
08
09
16 changes: 16 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/suffix.tsv
@@ -0,0 +1,16 @@
ad 公元
AD 公元
a.d. 公元
A.D. 公元
ce 公元
CE 公元
c.e. 公元
C.E. 公元
bc 公元前
BC 公元前
b.c. 公元前
B.C. 公元前
bce 公元前
BCE 公元前
b.c.e. 公元前
B.C.E. 公元前
16 changes: 16 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/date/suffixes.tsv
@@ -0,0 +1,16 @@
ad 公元
AD 公元
a.d. 公元
A.D. 公元
ce 公元
CE 公元
c.e. 公元
C.E. 公元
bc 公元前
BC 公元前
b.c. 公元前
B.C. 公元前
bce 公元前
BCE 公元前
b.c.e. 公元前
B.C.E. 公元前
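The sixteen rows above enumerate case and punctuation variants of four era abbreviations. As an illustrative alternative, and not how the commit's grammar consumes this file, the same mapping can be expressed by normalizing the suffix before lookup. The names `ERA_BY_KEY` and `era_to_chinese` are hypothetical.

```python
# Hypothetical condensed form of the era table: one key per abbreviation.
ERA_BY_KEY = {
    "ad": "公元",
    "ce": "公元",
    "bc": "公元前",
    "bce": "公元前",
}


def era_to_chinese(suffix):
    """Return the Chinese era word for an era suffix, or None if unknown."""
    # Drop periods and fold case so "B.C.E.", "BCE" and "bce" share one key.
    key = suffix.replace(".", "").lower()
    return ERA_BY_KEY.get(key)


print(era_to_chinese("B.C.E."))  # prints 公元前
```

Note that this sketch is more permissive than the table: it also accepts mixed-case forms such as `Ad`, which the TSV does not list.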

This file was deleted.

@@ -88,3 +88,5 @@ mw 毫瓦
pg 皮克
ps 皮秒
s
ms 毫秒
g
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -88,11 +88,9 @@ SCR 塞舌尔卢比
SGD 新加坡元
SBD 所罗门群岛元
SOS 索马里先令
KRW 韩元
ZAR 南非兰特
LKR 斯里兰卡卢比
SEK 瑞典克朗
CHF 瑞士法郎
SRD 苏里南元
SYP 叙利亚镑
TWD 新台币
@@ -115,3 +113,89 @@ JPY¥ 日元
HK$ 港元
A$ 澳元
CAD$ 加元
US$ 美元
欧元
£ 英镑
Fr 瑞士法郎
法郎
¥
Kr 韩元
NXN$ 墨西哥比索
NZD$ 新西兰元
SGD$ 新加坡元
HKD$ 港元
NOKKr 挪威克朗
韩元
TRY₺ 土耳其里拉
俄罗斯卢布
BRLR$ 巴西雷亚尔
DKKKr 丹麦克朗
TWDNT$ 新台币
฿ 泰铢
RM 马来西亚林吉特
Rp 印尼盾
捷克克朗
以色列新谢克尔
CLP$ 智利披索
菲律宾披索
د.إ 阿联酋迪拉姆
COL$ 哥伦比亚披索
L 罗马尼亚列伊
JPY¥ 日元
人民币
Lek 阿尔巴尼亚列克
ƒ 阿鲁巴盾
Br 白俄罗斯卢布
BZ$ 伯利兹元
$b 玻利维亚玻利维亚诺
KM 波斯尼亚和黑塞哥维那可兑换马克
P 博茨瓦纳普拉
лв 保加利亚列弗
R$ 巴西雷亚尔
柬埔寨瑞尔
¥ 人民币
哥斯达黎加科隆
kn 克罗地亚库纳
古巴比索
kr 丹麦克朗
RD$ 多米尼加共和国比索
¢ 加纳塞地
Q 危地马拉格查尔
L 洪都拉斯伦皮拉
Ft 匈牙利福林
印度卢比
英镑
以色列谢克尔
J$ 牙买加元
лв 哈萨克斯坦腾格
朝鲜园
лв 吉尔吉斯斯坦索姆
老挝基普
ден 马其顿代纳尔
毛里求斯卢比
蒙古图格里克
MT 莫桑比克梅蒂卡尔
C$ 尼加拉瓜科尔多瓦
尼日利亚奈拉
巴基斯坦卢比
B/. 巴拿马巴尔博亚
Gs 巴拉圭瓜拉尼
S/. 秘鲁索尔
菲律宾比索
波兰兹罗提
lei 罗马尼亚列伊
卢布
Дин. 塞尔维亚第纳尔
S 索马里先令
R 南非兰特
CHF 瑞士法郎
NT$ 新台币
TT$ 特立尼达和多巴哥元
土耳其里拉
乌克兰格里夫纳
$ 美元
$U 乌拉圭比索
лв 乌兹别克斯坦索姆
Bs 委内瑞拉玻利瓦尔
越南东
Z$ 津巴布韦元
@@ -0,0 +1,52 @@
美元
美金
欧元
英镑
加元
瑞士法郎
法郎
加拿大元
韩元
墨西哥比索
比索
新西兰元
新加坡元
港元
港币
人民币
挪威克朗
克朗
韩元
土耳其里拉
里拉
印度卢比
卢比
俄罗斯卢布
卢布
巴西雷亚尔
雷亚尔
南非兰特
兰特
丹麦克朗
波兰兹罗提
罗提
新台币
台币
泰铢
马来西亚林吉特
印尼盾
匈牙利福林
福林
捷克克朗
以色列新谢克尔
新谢克尔
智利比索
菲律宾披索
阿联酋迪拉姆
迪拉姆
哥伦比亚披索
马来西亚令吉
日元

This file was deleted.

@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -1,9 +1,8 @@
1
2
3
4
5
6
7
8
9
9
8 changes: 8 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/time/AM.tsv
@@ -0,0 +1,8 @@
am
AM
a.m.
A.M.
am
AM
a.m.
A.M.
8 changes: 8 additions & 0 deletions nemo_text_processing/text_normalization/zh/data/time/PM.tsv
@@ -0,0 +1,8 @@
pm
p.m.
PM
P.M.
pm
p.m.
PM
P.M.