Releases · wenet-e2e/WeTextProcessing

20 Jun 08:33

xingchensong

1.0.2

053507e

1.0.2 Latest

What's Changed

[fix] tn chinese, add punc by @xingchensong in #242
[tn] chinese, append traditional_to_simple by @xingchensong in #243
[itn] fix 八百一千=>800 1000 二十一千=>20 1000, 零千零万 by @weimeng23 in #246

Full Changelog: 1.0.1...1.0.2

Contributors

weimeng23 and xingchensong

06 Jun 16:24

xingchensong

1.0.1

d9f47ca

What's Changed

[fix] fix tn, week range by @xingchensong in #238
[fix] fix tn, punct with space by @xingchensong in #239
[fix] fix tn, remove useless mapping in whitelist by @xingchensong in #240
[wheel] disable global logging config by @xingchensong in #241 (disables the global logging configuration to avoid overriding the log level of other programs)
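
The change in #241 follows a common Python convention: a library should not configure the root logger on import, because doing so can override the host application's log level. A minimal sketch of that convention, assuming a hypothetical logger name `wetext_demo` (not the project's actual logger name or code):

```python
import logging

# Hypothetical library-side logger; the name "wetext_demo" is an assumption.
lib_logger = logging.getLogger("wetext_demo")
# A NullHandler keeps the library quiet by default without touching the root logger.
lib_logger.addHandler(logging.NullHandler())


def library_call():
    """Emit a log record the way a well-behaved library would."""
    lib_logger.info("building graph")
    return "done"


# Application side: the host program chooses its own root log level.
logging.getLogger().setLevel(logging.WARNING)
result = library_call()
root_level = logging.getLogger().level
```

Because the library never calls `logging.basicConfig`, the root level set by the application stays in effect after the library runs.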

Full Changelog: 1.0.0...1.0.1

Contributors

xingchensong

05 Jun 10:55

xingchensong

1.0.0

0f386d8

Breaking Changes

Support English TN (see #202). Most of the English rules were copied from NeMo, but we simplified them significantly. As a result:
- FST size comparison: 76M (NeMo) vs. 7M (Ours)
- Building time comparison (when you want to develop new rules): 777s (NeMo) vs. 41s (Ours)
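
To put the two comparisons in relative terms, the ratios can be computed directly from the figures quoted above. This is plain arithmetic on the reported numbers, not a new measurement; the variable names are illustrative:

```python
# Figures quoted in the release notes (size in the notes' "M" units, time in seconds).
nemo_size, wetext_size = 76, 7
nemo_build_s, wetext_build_s = 777, 41

size_ratio = nemo_size / wetext_size            # about 10.9
build_ratio = nemo_build_s / wetext_build_s     # about 19.0

print(f"FST size: {size_ratio:.1f}x smaller")
print(f"Build time: {build_ratio:.1f}x faster")
```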


[Figure: NeMo vs. WeText]

Support online building of FSTs (#230), so WeText works right after installation:

pip install wetextprocessing

from itn.chinese.inverse_normalizer import InverseNormalizer
from tn.chinese.normalizer import Normalizer as ZhNormalizer
from tn.english.normalizer import Normalizer as EnNormalizer

zh_tn_text = "你好 WeTextProcessing 1.0，船新版本儿，船新体验儿，简直666，9和10"
zh_itn_text = "你好 WeTextProcessing 一点零，船新版本儿，船新体验儿，简直六六六，九和六"
en_tn_text = "Hello WeTextProcessing 1.0, life is short, just use wetext, 666, 9 and 10"
# overwrite_cache=True rebuilds the graphs online with the given options.
zh_tn_model = ZhNormalizer(remove_erhua=True, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=False, overwrite_cache=True)
en_tn_model = EnNormalizer(overwrite_cache=True)
print("Chinese TN (remove erhua, rebuild the graph online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 are not converted, rebuild the graph online):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (no configurable options yet, to be added later...):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))

# overwrite_cache=False reuses the graphs compiled above.
zh_tn_model = ZhNormalizer(overwrite_cache=False)
zh_itn_model = InverseNormalizer(overwrite_cache=False)
en_tn_model = EnNormalizer(overwrite_cache=False)
print("Chinese TN (reuse the previously compiled graph):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (reuse the previously compiled graph):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (reuse the previously compiled graph):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))

# Rebuild again with the opposite options.
zh_tn_model = ZhNormalizer(remove_erhua=False, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=True, overwrite_cache=True)
print("Chinese TN (keep erhua, rebuild the graph online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 are also converted, rebuild the graph online):\n\t{} => {}\n".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
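
The `overwrite_cache` flag used above switches between rebuilding a graph and reusing a compiled one. The general build-or-reuse pattern behind such a flag can be sketched as follows. This is an illustration only: `load_or_build`, `fake_build`, and the pickle cache file are hypothetical stand-ins, not WeTextProcessing's actual implementation.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn, overwrite_cache=False):
    """Return a cached object, rebuilding it when asked or when it is missing."""
    if overwrite_cache or not os.path.exists(cache_path):
        obj = build_fn()                      # expensive step: build from scratch
        with open(cache_path, "wb") as f:
            pickle.dump(obj, f)
        return obj
    with open(cache_path, "rb") as f:         # cheap step: reuse the cached result
        return pickle.load(f)


build_calls = []


def fake_build():
    # Stand-in for graph compilation; records each invocation.
    build_calls.append(1)
    return {"rules": ["cardinal", "date"]}


cache_file = os.path.join(tempfile.mkdtemp(), "demo_cache.pkl")
first = load_or_build(cache_file, fake_build, overwrite_cache=True)    # builds
second = load_or_build(cache_file, fake_build, overwrite_cache=False)  # reuses
third = load_or_build(cache_file, fake_build, overwrite_cache=True)    # rebuilds
```

In this sketch the build function runs only when a rebuild is requested or no cache exists, which mirrors why the second group of calls in the example above is the fast path.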

Minor changes

[refactor] support building fst online by @xingchensong in #230
[fix] remove redundant mapping in whitelist by @xingchensong in #231
[tn] english tn, support range by @xingchensong in #233
[fix] fix itn 三四十万一万六七 by @xingchensong in #234
[fix] fix itn 洞>0,拐>7 by @xingchensong in #235
[fix] fix tn, remove useless mapping in english tn by @xingchensong in #236
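
The fix in #235 listed above concerns spoken digit aliases: per its title, 洞 maps to 0 and 拐 maps to 7. A toy character-level sketch of that mapping in plain Python follows. The project implements its rules as FSTs, so this is not its code; `replace_digit_aliases` is a hypothetical helper, and only the two aliases named in the title are included.

```python
# Aliases taken from the title of #235; nothing else is assumed.
DIGIT_ALIASES = {"洞": "0", "拐": "7"}


def replace_digit_aliases(text):
    """Replace each known alias character with its digit, leaving others as-is."""
    return "".join(DIGIT_ALIASES.get(ch, ch) for ch in text)


converted = replace_digit_aliases("洞拐")
print(converted)  # → 07
```

A blind character substitution like this would also rewrite 洞 in ordinary words, so a real rule needs context to decide when the character is being used as a digit.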

Full Changelog: 0.2.1...1.0.0

Contributors

xingchensong

05 Jun 05:23

xingchensong

0.2.1

385e35f

What's Changed

[itn] fix idcard number ends with X by @weimeng23 in #193
fix #190 by @pengzhendong in #194
fix #155 by @pengzhendong in #196
[itn] 帮我导航到中关村一百零一号 by @xingchensong in #197
[itn] 5- and 6-character license plate numbers, including zero by @weimeng23 in #198
feat(tn): [cr_id_skip] Support english tn, cardinal and word by @xingchensong in #203
[tn] english tn, support ordinal by @xingchensong in #204
[tn] english tn, support date by @xingchensong in #205
[tn] english tn, support decimal by @xingchensong in #207
[tn] english tn, support fraction by @xingchensong in #209
[tn] english tn, support time by @xingchensong in #210
[tn] english tn, support measure by @xingchensong in #211
[tn] english, support money by @xingchensong in #212
[tn] english, support telephone by @xingchensong in #213
[tn] english, support electronic by @xingchensong in #214
[tn] tn english, support roman by @xingchensong in #215
[tn] english tn, support whitelist by @xingchensong in #216
[format] add copyright by @xingchensong in #217
[tn] set whitelist weight = 1.0 by @xingchensong in #218
[runtime] support english tn by @xingchensong in #219
[runtime] fix english tn by @xingchensong in #220
[tn] simplify tn by @xingchensong in #221
[runtime] fix english tn order by @xingchensong in #222
[fix] english tn by @xingchensong in #224
[tn] support punct by @xingchensong in #225
[fix] remove punc in decimal by @xingchensong in #226
[fix] remove punc in measure by @xingchensong in #227
[fix] english tn, whitelist exclude punct by @xingchensong in #228
[cicd] update wheels by @xingchensong in #229