Question about the code in subject extraction #4

Open
acmilannesta opened this issue Jun 20, 2019 · 4 comments

Comments


acmilannesta commented Jun 20, 2019

Hello, I'm new to NLP and don't quite understand the following lines of code. In the extract_entity function, why is _ps decreased by 10? Thanks!

if len(_t) == 1 and re.findall(u'[^\u4e00-\u9fa5a-zA-Z0-9\*]', _t) and _t not in additional_chars:           
    _ps1[i] -= 10
Owner

bojone commented Jun 29, 2019

It prevents the extracted entity from containing illegal characters.
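
To make the answer above concrete, here is a minimal sketch of the idea. It assumes the scores are model outputs in the range [0, 1] and that the entity start is chosen by argmax, neither of which is shown in this thread. The names `penalize_illegal`, `argmax`, the contents of `additional_chars`, and the sample scores are hypothetical and only serve the illustration. Subtracting 10 from a score in [0, 1] pushes it far below every other position, so an illegal character can no longer be picked as a boundary.

```python
import re

# Hypothetical whitelist of single symbols that are allowed inside an entity.
additional_chars = {"-", "(", ")"}

# Same character class as in the question: matches anything that is NOT
# a CJK ideograph, an ASCII letter, a digit, or '*'.
ILLEGAL = re.compile(r"[^\u4e00-\u9fa5a-zA-Z0-9*]")


def penalize_illegal(tokens, scores, penalty=10.0):
    """Return a copy of scores with illegal single-character tokens pushed down."""
    out = list(scores)
    for i, tok in enumerate(tokens):
        if len(tok) == 1 and ILLEGAL.search(tok) and tok not in additional_chars:
            out[i] -= penalty
    return out


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


tokens = ["“", "腾", "讯", "”", "违", "规"]
start_scores = [0.9, 0.8, 0.1, 0.2, 0.1, 0.1]  # made-up scores for illustration

masked = penalize_illegal(tokens, start_scores)
print(argmax(start_scores))  # 0: the opening quote would be chosen as the start
print(argmax(masked))        # 1: after the penalty, the start moves to "腾"
```

Any penalty larger than the score range would have the same effect; 10 is simply a comfortably large constant.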

@Hejp5665

I'm only a beginner, but I don't think your code is concise enough.

Owner

bojone commented Jul 19, 2019

I'd be glad to learn a more concise way to write it.

@natureLanguageQing

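You're really something to have pushed Mr. Su this far. The main reason for this code is that some data gets lost in transmission, or the data is saved with inconsistent methods when it is converted, so errors occur when the files are read. Some dataset-specific cleanup like this is indispensable.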
