Jp tn 20241017 #240

Merged
merged 36 commits into from
Oct 18, 2024
Changes from 28 commits
Commits
61d0fdc
ja tn
BuyuanCui Oct 17, 2024
3b6dc4e
adding ja
BuyuanCui Oct 17, 2024
f9808ff
removing
BuyuanCui Oct 17, 2024
9a904d4
updated tests
BuyuanCui Oct 17, 2024
60e221a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 17, 2024
34d6086
addressing comment
BuyuanCui Oct 17, 2024
a22ea97
addressing ci
BuyuanCui Oct 17, 2024
903f41e
addressing ci
BuyuanCui Oct 17, 2024
c187d74
Merge branch 'jp_tn_20241017' of https://github.com/NVIDIA/NeMo-text-…
BuyuanCui Oct 17, 2024
83828e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 17, 2024
31c7d68
addresing comment
BuyuanCui Oct 17, 2024
65f8fbc
removing
BuyuanCui Oct 17, 2024
af72af8
adresing comment
BuyuanCui Oct 17, 2024
99fd9cf
removing unused import
BuyuanCui Oct 17, 2024
b38212c
addressing comment
BuyuanCui Oct 17, 2024
f90b906
Merge branch 'jp_tn_20241017' of https://github.com/NVIDIA/NeMo-text-…
BuyuanCui Oct 17, 2024
fdec2d1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 17, 2024
7f9bade
addressing comment;
BuyuanCui Oct 18, 2024
1538988
Merge branch 'jp_tn_20241017' of https://github.com/NVIDIA/NeMo-text-…
BuyuanCui Oct 18, 2024
a5d36a7
addressing comment
BuyuanCui Oct 18, 2024
883991d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2024
28e2d24
date for ja
BuyuanCui Oct 18, 2024
6ae363c
Merge branch 'jp_tn_20241017' of https://github.com/NVIDIA/NeMo-text-…
BuyuanCui Oct 18, 2024
8104a8a
addresing comment
BuyuanCui Oct 18, 2024
ad74bdb
addressing comment
BuyuanCui Oct 18, 2024
f7c8357
jenkins
BuyuanCui Oct 18, 2024
7d1165c
addresing comment
BuyuanCui Oct 18, 2024
cfced66
addressing comment
BuyuanCui Oct 18, 2024
93d3e58
typo
BuyuanCui Oct 18, 2024
6693229
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2024
a99db7f
adressing comment
BuyuanCui Oct 18, 2024
f912973
addressing comment
BuyuanCui Oct 18, 2024
60f612d
Merge branch 'jp_tn_20241017' of https://github.com/NVIDIA/NeMo-text-…
BuyuanCui Oct 18, 2024
01076a7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2024
f70785d
ci
BuyuanCui Oct 18, 2024
a27f06a
Merge branch 'jp_tn_20241017' of https://github.com/NVIDIA/NeMo-text-…
BuyuanCui Oct 18, 2024
2 changes: 1 addition & 1 deletion Jenkinsfile
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ pipeline {
IT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/08-22-24-0'
HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
- JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/09-27-24-0'
+ JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-17-24-1'
DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
}
stages {
18 changes: 18 additions & 0 deletions nemo_text_processing/text_normalization/ja/__init__.py
@@ -0,0 +1,18 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from nemo_text_processing.text_normalization.ja.taggers.tokenize_and_classify import ClassifyFst
from nemo_text_processing.text_normalization.ja.verbalizers.verbalize import VerbalizeFst
from nemo_text_processing.text_normalization.ja.verbalizers.verbalize_final import VerbalizeFinalFst
13 changes: 13 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/__init__.py
@@ -0,0 +1,13 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
13 changes: 13 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/date/__init__.py
@@ -0,0 +1,13 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
31 changes: 31 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/date/day.tsv
@@ -0,0 +1,31 @@
1 一
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
10 十
11 十一
12 十二
13 十三
14 十四
15 十五
16 十六
17 十七
18 十八
19 十九
20 二十
21 二十一
22 二十二
23 二十三
24 二十四
25 二十五
26 二十六
27 二十七
28 二十八
29 二十九
30 三十
31 三十一
12 changes: 12 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/date/era.tsv
@@ -0,0 +1,12 @@
令和
平成
昭和
大正
明治
西暦
和暦
西洋暦
グレゴリオ暦
紀元前
紀元
紀元後
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
R. 令和
H. 平成
S. 昭和
T. 大正
M. 明治
12 changes: 12 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/date/month.tsv
@@ -0,0 +1,12 @@
1
2
3
4
5
6
7
8
9
10
11 十一
12 十二
15 changes: 15 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/date/week.tsv
@@ -0,0 +1,15 @@
月曜日
火曜日
水曜日
木曜日
金曜日
土曜日
日曜日
祝日
月曜日
火曜日
水曜日
木曜日
金曜日
土曜日
日曜日
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
1 一
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
10 changes: 10 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/numbers/teen.tsv
@@ -0,0 +1,10 @@
10
11 十一
12 十二
13 十三
14 十四
15 十五
16 十六
17 十七
18 十八
19 十九
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
2 二十
3 三十
4 四十
5 五十
6 六十
7 七十
8 八十
9 九十
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
0
23 changes: 23 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/symbol.tsv
@@ -0,0 +1,23 @@
& アンド
# ハッシュタグ
@ アット
§ セクション
™ トレードマーク
® 登録商標マーク
© 著作権
_ アンダースコア
% パーセント
* 星印
+ プラス
/ スラッシュ
= エコール
^ 曲折アクセント記号
| 縦棒
~ ティルダ
$ ドール
£ ポンド
€ ユーロ
₩ ウォン
¥
°
º
13 changes: 13 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/time/__init__.py
@@ -0,0 +1,13 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
23 changes: 23 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/time/division.tsv
@@ -0,0 +1,23 @@
今朝
今夜
今晩
午前
午後
夕方
夜中
夜半
早朝
明け方
深夜
毎朝
毎夜
毎晩
毎日
真夜中
翌日
未明
正午
真夜中の
24 changes: 24 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/time/hour.tsv
@@ -0,0 +1,24 @@
1
2
3
4
5
6
7
8
9
10
11 十一
12 十二
13 十三
14 十四
15 十五
16 十六
17 十七
18 十八
19 十九
20 二十
21 二十一
22 二十二
23 二十三
24 二十四
60 changes: 60 additions & 0 deletions nemo_text_processing/text_normalization/ja/data/time/minute.tsv
@@ -0,0 +1,60 @@
1 一
2 二
3 三
4 四
5 五
6 六
7 七
8 八
9 九
10 十
11 十一
12 十二
13 十三
14 十四
15 十五
16 十六
17 十七
18 十八
19 十九
20 二十
21 二十一
22 二十二
23 二十三
24 二十四
25 二十五
26 二十六
27 二十七
28 二十八
29 二十九
30 三十
31 三十一
32 三十二
33 三十三
34 三十四
35 三十五
36 三十六
37 三十七
38 三十八
39 三十九
40 四十
41 四十一
42 四十二
43 四十三
44 四十四
45 四十五
46 四十六
47 四十七
48 四十八
49 四十九
50 五十
51 五十一
52 五十二
53 五十三
54 五十四
55 五十五
56 五十六
57 五十七
58 五十八
59 五十九
60 六十
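
Note on the data files: the two-column `.tsv` files in this diff pair an Arabic numeral with its kanji form. The snippet below is only a minimal sketch of how such a table could be read into a lookup dict with the Python standard library. It is not the loader this package uses; `load_mapping` and `SAMPLE_TSV` are hypothetical names introduced for illustration, and the tab separator is an assumption based on the `.tsv` extension.

```python
import csv
import io

# A few rows copied from the minute.tsv hunk; the tab separator is an assumption.
SAMPLE_TSV = "1\t一\n10\t十\n21\t二十一\n59\t五十九\n"


def load_mapping(handle):
    """Read two-column TSV rows into a dict (hypothetical helper, not the package API).

    Rows that lack a non-empty second column are skipped rather than guessed at.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[0] and row[1]:
            mapping[row[0]] = row[1]
    return mapping


minute_readings = load_mapping(io.StringIO(SAMPLE_TSV))
print(minute_readings["21"])
```

Skipping incomplete rows is a deliberate choice in this sketch: a row with only one column yields no entry, so a missing reading surfaces as a `KeyError` at lookup time instead of a silently wrong value.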