What does the 5 in pinyin-utf8.dat mean? #19

Open
zhustec opened this issue Jun 22, 2015 · 4 comments

Comments

@zhustec
Contributor

zhustec commented Jun 22, 2015

tone_index values 1, 2, 3, and 4 are the four tones, but what is 5? There are 91 entries that have a 5.

If tone_index is 5, the following code fails:

tone_index = pinyin[-1].to_i
pinyin = pinyin[0...-1]
%w(a o e i u v).each { |v|
  break if pinyin.tr! v, TONE_MARK[v.to_sym][tone_index]
}

The code is from https://github.com/flyerhzm/chinese_pinyin/blob/master/lib/chinese_pinyin.rb#L86-L90

Each TONE_MARK[v.to_sym] has only 5 elements (valid indices 0 through 4), so index 5 returns nil, and passing that nil to tr! raises an error.
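A minimal sketch of the failure, for anyone who wants to reproduce it outside the gem. The table below is an assumed stand-in (`TONE_MARK_SKETCH` and `mark_for` are hypothetical names, and the element order is a guess), not the library's actual constant; the only property it shares with the description above is that each entry has 5 elements.

```ruby
# Assumed layout for illustration only: index 0 is the unmarked vowel,
# indices 1-4 are the four tones. Not copied from chinese_pinyin.
TONE_MARK_SKETCH = {
  a: %w(a ā á ǎ à),
  e: %w(e ē é ě è)
}

# Hypothetical helper: look up the replacement character for a vowel and tone.
def mark_for(vowel, tone_index)
  TONE_MARK_SKETCH[vowel][tone_index]
end

p mark_for(:e, 4) # a valid tone mark
p mark_for(:e, 5) # past the end of the array, so nil

begin
  # Same call shape as the loop quoted above, with tone_index = 5.
  'bei'.dup.tr!('e', mark_for(:e, 5))
rescue TypeError => e
  puts e.message
end
```

Array indexing past the end returns nil rather than raising, which is why the error only surfaces later inside `tr!`.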

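For some characters the tone of the first pinyin reading is not 5, so nothing goes wrong. But quite a few characters do have tone 5 on their first reading: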

awk '$2~/5/{print $1,$2}' data/pinyin-utf8.dat
㟷 da5
了 le5
们 men5
們 men5
吗 ma5
吧 ba5
吶 ne5
呗 bei5
哟 yo5

Converting any of these characters to pinyin raises an error.

Pinyin.t('呗', tonemarks: true)

# : in `tr!': no implicit conversion of nil into String (TypeError)
@flyerhzm
Owner

@hongliang-goudou You did this conversion. Do you know what is causing this?

@zhustec
Contributor Author

zhustec commented Jun 22, 2015

@flyerhzm It is not a conversion problem; it is the 5 in Mandarin.dat.

@flyerhzm
Owner

@wittyfox 5 is probably the neutral tone (轻声): http://baike.baidu.com/view/1632699.htm

@zhustec
Contributor Author

zhustec commented Jun 22, 2015

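@flyerhzm Right, I just went back and brushed up on Hanyu Pinyin and realized the same thing. But the 5 would have to be changed to 0 for this to work, or TONE_MARK could be adjusted and then accessed with tone_index - 1.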
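A rough sketch of the second option (adjust the table, then index with tone_index - 1). The names `NEUTRAL_LAST_MARKS` and `add_tone_mark` are hypothetical, and the table layout is an assumption made for illustration; this is not necessarily what the referenced commit does.

```ruby
# Hypothetical table: the four tone marks come first and the unmarked
# (neutral tone) vowel comes last, so tone_index - 1 is valid for tones 1..5.
NEUTRAL_LAST_MARKS = {
  a: %w(ā á ǎ à a),
  o: %w(ō ó ǒ ò o),
  e: %w(ē é ě è e),
  i: %w(ī í ǐ ì i),
  u: %w(ū ú ǔ ù u),
  v: %w(ǖ ǘ ǚ ǜ ü)
}

# Hypothetical helper mirroring the loop quoted in the first comment:
# takes a numbered syllable such as "hao3" and returns it with a tone mark.
def add_tone_mark(numbered)
  tone_index = numbered[-1].to_i # trailing digit, 1..5
  pinyin = numbered[0...-1]      # syllable without the digit
  %w(a o e i u v).each do |v|
    # tone 5 maps to the last element, so the lookup is never nil
    break if pinyin.tr!(v, NEUTRAL_LAST_MARKS[v.to_sym][tone_index - 1])
  end
  pinyin
end

puts add_tone_mark('hao3') # marked third tone
puts add_tone_mark('bei5') # neutral tone, left unmarked
```

The first option (rewriting 5 as 0 in the data) would avoid touching the table, but it means editing all 91 entries and any future data import, which is why this sketch shows the indexing change instead.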

flyerhzm added a commit that referenced this issue Jun 22, 2015