Commit

Merge pull request #5 from longbridgeapp/feat/make-update-data
Update OpenCC official data
huacnlee authored Nov 8, 2022
2 parents 7453947 + 90dba01 commit cf82c71
Showing 23 changed files with 995 additions and 247 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -12,4 +12,5 @@

# Project-local glide cache, RE: https://github.com/Masterminds/glide/issues/736
.glide/
.DS_Store
.DS_Store
tmp/
9 changes: 9 additions & 0 deletions Makefile
@@ -0,0 +1,9 @@
update\:data:
# Fetch to update data from https://github.com/BYVoid/OpenCC
mkdir -p ./tmp && rm -Rf tmp/OpenCC-master
wget https://github.com/BYVoid/OpenCC/archive/refs/heads/master.zip -O tmp/opencc.zip
unzip tmp/opencc.zip -d tmp/
sh ./merge-data.sh
test:
sh ./merge-data.sh
go test ./...
11 changes: 11 additions & 0 deletions README.md
@@ -60,6 +60,17 @@ func main() {
- `tw2t.json` Traditional Chinese (Taiwan standard) to Traditional Chinese 臺灣正體到繁體(OpenCC 標準)
- `s2hk-finance.json` Adds special supplementary entries for Hong Kong market financial data.

## Development Guides

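- dictionary - Mirrors the official OpenCC dictionaries. Do not edit it by hand; this folder is meant to be generated by a command.
- addition-dictionary - Holds dictionary fixes that this project applies ahead of upstream. When `make update:data` runs, its contents are appended to the files in dictionary.

Run `make update:data` to update the dictionaries from the official OpenCC repository.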

```bash
$ make update:data
```
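
The merge step described above (addition-dictionary entries appended to the synced dictionary files) can be sketched in Go. This is only an illustration: the actual work is done by `merge-data.sh`, whose contents are not shown in this commit view, and `appendAdditions` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// appendAdditions returns base with every non-empty line of additions
// appended at the end. Hypothetical helper for illustration only; the
// project's real merge is performed by merge-data.sh.
func appendAdditions(base, additions string) string {
	var b strings.Builder
	b.WriteString(strings.TrimRight(base, "\n"))
	for _, line := range strings.Split(additions, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // skip blank lines in the addition file
		}
		if b.Len() > 0 {
			b.WriteString("\n")
		}
		b.WriteString(line)
	}
	b.WriteString("\n")
	return b.String()
}

func main() {
	base := "厍 厙\n厕 廁\n" // stands in for dictionary/STCharacters.txt
	additions := "厘 厘\n"   // stands in for addition-dictionary/STCharacters.txt
	fmt.Print(appendAdditions(base, additions))
}
```

Appending at the end matches what the diff shows for `dictionary/STCharacters.txt`, where `厘 厘` lands on the last line after the update.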

## Benchmarks

See [benchmark_test.go](https://github.com/longbridgeapp/opencc/tree/master/tests/benchmark_test.go)
1 change: 1 addition & 0 deletions addition-dictionary/STCharacters.txt
@@ -0,0 +1 @@
厘 厘
7 changes: 7 additions & 0 deletions addition-dictionary/STPhrases.txt
@@ -0,0 +1,7 @@
高峰 高峰
什么 什麼
讲下 講吓
回流 回流
迴流 回流
公厘 公厘
厘米 厘米
53 changes: 22 additions & 31 deletions config/hk2s.json
@@ -7,36 +7,27 @@
"file": "TSPhrases.ocd2"
}
},
"conversion_chain": [
{
"dict": {
"type": "group",
"dicts": [
{
"type": "ocd2",
"file": "HKVariantsRevPhrases.ocd2"
},
{
"type": "ocd2",
"file": "HKVariantsRev.ocd2"
}
]
}
},
{
"dict": {
"type": "group",
"dicts": [
{
"type": "ocd2",
"file": "TSPhrases.ocd2"
},
{
"type": "ocd2",
"file": "TSCharacters.ocd2"
}
]
}
"conversion_chain": [{
"dict": {
"type": "group",
"dicts": [{
"type": "ocd2",
"file": "HKVariantsRevPhrases.ocd2"
}, {
"type": "ocd2",
"file": "HKVariantsRev.ocd2"
}]
}
}, {
"dict": {
"type": "group",
"dicts": [{
"type": "ocd2",
"file": "TSPhrases.ocd2"
}, {
"type": "ocd2",
"file": "TSCharacters.ocd2"
}]
}
]
}]
}
36 changes: 15 additions & 21 deletions config/s2hk.json
@@ -7,27 +7,21 @@
"file": "STPhrases.ocd2"
}
},
"conversion_chain": [
{
"dict": {
"type": "group",
"dicts": [
{
"type": "ocd2",
"file": "STPhrases.ocd2"
},
{
"type": "ocd2",
"file": "STCharacters.ocd2"
}
]
}
},
{
"dict": {
"conversion_chain": [{
"dict": {
"type": "group",
"dicts": [{
"type": "ocd2",
"file": "STPhrases.ocd2"
}, {
"type": "ocd2",
"file": "HKVariants.ocd2"
}
"file": "STCharacters.ocd2"
}]
}
}, {
"dict": {
"type": "ocd2",
"file": "HKVariants.ocd2"
}
]
}]
}
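
For readers unfamiliar with the `conversion_chain` layout being reformatted here, the following Go sketch decodes the part of the structure visible in this hunk. The type names (`Dict`, `Step`, `Config`) and `parseConfig` are assumptions made for illustration, not the library's own types, and the sample contains only the fields shown in the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dict is either a single dictionary file or a group of nested dictionaries.
// Hypothetical type for illustration; not taken from the library.
type Dict struct {
	Type  string `json:"type"`
	File  string `json:"file,omitempty"`
	Dicts []Dict `json:"dicts,omitempty"`
}

// Step is one entry of the conversion chain.
type Step struct {
	Dict Dict `json:"dict"`
}

// Config covers only the conversion_chain key shown in the diff.
type Config struct {
	ConversionChain []Step `json:"conversion_chain"`
}

// sample reproduces the new conversion_chain of config/s2hk.json.
const sample = `{
  "conversion_chain": [{
    "dict": {
      "type": "group",
      "dicts": [{
        "type": "ocd2",
        "file": "STPhrases.ocd2"
      }, {
        "type": "ocd2",
        "file": "STCharacters.ocd2"
      }]
    }
  }, {
    "dict": {
      "type": "ocd2",
      "file": "HKVariants.ocd2"
    }
  }]
}`

// parseConfig decodes a config document into Config.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		panic(err)
	}
	// Print each chain step: its type, file (if any), and nested dict count.
	for i, step := range cfg.ConversionChain {
		fmt.Println(i, step.Dict.Type, step.Dict.File, len(step.Dict.Dicts))
	}
}
```

Because the old and new layouts differ only in whitespace and brace placement, both decode to the same value, which is consistent with this part of the commit being a formatting change.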
45 changes: 19 additions & 26 deletions config/s2twp.json
@@ -7,33 +7,26 @@
"file": "STPhrases.ocd2"
}
},
"conversion_chain": [
{
"dict": {
"type": "group",
"dicts": [
{
"type": "ocd2",
"file": "STPhrases.ocd2"
},
{
"type": "ocd2",
"file": "STCharacters.ocd2"
}
]
}
},
{
"dict": {
"conversion_chain": [{
"dict": {
"type": "group",
"dicts": [{
"type": "ocd2",
"file": "TWPhrases.ocd2"
}
},
{
"dict": {
"file": "STPhrases.ocd2"
}, {
"type": "ocd2",
"file": "TWVariants.ocd2"
}
"file": "STCharacters.ocd2"
}]
}
}, {
"dict": {
"type": "ocd2",
"file": "TWPhrases.ocd2"
}
}, {
"dict": {
"type": "ocd2",
"file": "TWVariants.ocd2"
}
]
}]
}
3 changes: 2 additions & 1 deletion dictionary/HKPhrasesFinance.txt
@@ -81,4 +81,5 @@
除權日 除淨日
摘牌 除牌
追加保證金通知 追收孖展
被迫倉 被挾倉
被迫倉 被挾倉
高峰 高峰
8 changes: 4 additions & 4 deletions dictionary/JPVariants.txt
@@ -158,7 +158,7 @@
淨 浄
淺 浅
渴 渇
溌 潑
潑 溌
溪 渓
溫 温
溼 湿
@@ -216,7 +216,7 @@
穗 穂
穩 穏
穰 穣
竃 竈
竈 竃
竊 窃
粹 粋
糉 粽
@@ -229,7 +229,7 @@
縣 県
縱 縦
總 総
繋 繫
繫 繋
繡 繍
繩 縄
繪 絵
@@ -312,7 +312,7 @@
鄉 郷
酢 醋
醉 酔
醗 醱
醱 醗
醫 医
醬 醤
釀 醸
94 changes: 92 additions & 2 deletions dictionary/STCharacters.txt
@@ -342,7 +342,7 @@
厍 厙
厐 龎
厕 廁
厘 厘
厢 廂
厣 厴
厦 廈
@@ -455,7 +455,7 @@
坚 堅
坛 壇 罈
坜 壢
坝 壩
坝 壩
坞 塢
坟 墳
坠 墜
@@ -952,6 +952,7 @@
汉 漢
汤 湯
汹 洶
沄 澐
沈 沈 瀋
沟 溝
没 沒
@@ -3863,29 +3864,118 @@
𫢸 僤
𫧃 𣍐
𫧮 𪋿
𫫇 噁
𫬐 㘔
𫭟 塸
𫭢 埨
𫭼 𡑍
𫮃 墠
𫰛 娙
𫵷 㠣
𫶇 嵽
𫷷 廞
𫸩 彄
𬀩 暐
𬀪 晛
𬂩 梜
𬃊 櫍
𬇕 澫
𬇙 浿
𬇹 漍
𬉼 熰
𬊈 燖
𬊤 燀
𬍛 瓅
𬍡 璗
𬍤 璕
𬒈 礐
𬒗 𥗽
𬕂 篢
𬘓 紃
𬘘 紞
𬘡 絪
𬘩 綎
𬘫 綄
𬘬 綪
𬘭 綝
𬘯 綧
𬙂 縯
𬙊 纆
𬙋 纕
𬜬 蔄
𬜯 䓣
𬞟 蘋
𬟁 虉
𬟽 蝀
𬣙 訏
𬣞 詝
𬣡 諓
𬣳 詪
𬤇 諲
𬤊 諟
𬤝 譓
𬨂 軝
𬨎 輶
𬩽 鄩
𬪩 醲
𬬩 釴
𬬭 錀
𬬮 鋹
𬬱 釿
𬬸 鉥
𬬹 鉮
𬬻 鑪
𬬿 鉊
𬭁 鉧
𬭊 𨧀
𬭎 鋐
𬭚 錞
𬭛 𨨏
𬭤 鍭
𬭩 鎓
𬭬 鏏
𬭭 鏚
𬭯 䥕
𬭳 𨭎
𬭶 𨭆
𬭸 鏻
𬭼 鐩
𬮱 闉
𬮿 隑
𬯀 隮
𬯎 隤
𬱖 頔
𬱟 頠
𬳵 駓
𬳶 駉
𬳽 駪
𬳿 駼
𬴂 騑
𬴃 騞
𬴊 驎
𬶋 鮈
𬶍 鮀
𬶏 鮠
𬶐 鮡
𬶟 鯻
𬶠 鰊
𬶨 鱀
𬶭 鰶
𬶮 鱚
𬷕 鵏
𬸘 鶠
𬸚 鸑
𬸣 鶱
𬸦 鷟
𬸪 鷭
𬸯 鷿
𬹼 齘
𬺈 齮
𬺓 齼
𰬸 繐
𰰨 菕
𰶎 譅
𰾄 鋂
𰾭 鑀
𱊜 𪈼
厘 厘
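
Each dictionary line in these files pairs a source string with one or more candidate conversions (for example `坛 壇 罈`). A minimal Go sketch for reading one such line follows, assuming the fields are whitespace-separated as they appear in this view. `parseEntry` is a hypothetical name and not part of the library.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEntry splits one dictionary line into its key and candidate values.
// It assumes whitespace-separated fields and reports ok=false for lines
// that do not contain at least a key and one value.
// Hypothetical helper for illustration only.
func parseEntry(line string) (key string, values []string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", nil, false
	}
	return fields[0], fields[1:], true
}

func main() {
	for _, line := range []string{"坛 壇 罈", "厘 厘", ""} {
		key, values, ok := parseEntry(line)
		fmt.Println(key, values, ok)
	}
}
```

An entry such as `厘 厘` maps a character to itself, which is how the addition files in this commit pin a character so it is left unchanged.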