ModuleNotFoundError when calling crfcut engine in sent_tokenize function #859

Open
pavaris-pm opened this issue Nov 7, 2023 · 4 comments
Labels
question asking questions/giving suggestions

Comments

@pavaris-pm
Contributor

I tried the crfcut engine in the sent_tokenize function with the stable release of PyThaiNLP, installed via

pip install --upgrade pythainlp

This is what I expected:

sent_tokenize(sentence_1, engine="crfcut")
# output: ['ฉันไปประชุมเมื่อวันที่ 11 มีนาคม']

However, I got this output instead:

sent_tokenize(sentence_1, engine="crfcut")

# ModuleNotFoundError: No module named 'pycrfsuite'

Since this is a missing-package problem, it can be solved with pip install python-crfsuite. However, would it be better to fix this so that users do not need an extra step to install python-crfsuite every time they want to use this engine, or should we leave it as it is? What do you think?
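One middle ground between the two options would be to keep the dependency optional but make the error name the package to install. The following is only a minimal sketch of that idea, not PyThaiNLP's actual code; `require_optional` is a hypothetical helper name, and the only facts taken from this report are the module name `pycrfsuite` and the pip package name `python-crfsuite`.

```python
import importlib


def require_optional(module_name: str, pip_name: str):
    """Import an optional dependency, or raise an error that names the pip package.

    Hypothetical helper for illustration; not part of PyThaiNLP.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise with an actionable hint instead of the bare module name.
        raise ModuleNotFoundError(
            f"No module named '{module_name}'. "
            f"Install it with: pip install {pip_name}"
        ) from exc
```

An engine that needs CRFsuite could then call `require_optional("pycrfsuite", "python-crfsuite")` at the point where the model is loaded, so users who never touch that engine are unaffected.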

@wannaphong
Member

python-crfsuite often runs into problems when a new Python version is released; see #655. We do not add python-crfsuite to the dependencies list.

@wannaphong
Member

I am looking for new models so that all crfsuite models can be removed, but these models are quite efficient and therefore not worth replacing. Deep learning models are not much better.

@fabswt

fabswt commented Nov 26, 2024

I'm confused. If PyThaiNLP uses pycrfsuite (and it does), why not just add it to the list of requirements?

@wannaphong
Member

> I'm confused. If PyThaiNLP uses pycrfsuite (and it does), why not just add it to the list of requirements?

Hello! Many functions work without pycrfsuite as long as they do not use a pycrfsuite model. The newest sent_tokenize engine works without pycrfsuite. We plan to remove all python-crfsuite models from PyThaiNLP (#655), but progress is slow. (The other main contributors and I do not have enough free time.)

We surveyed PyThaiNLP users and found that many of them use the word tokenizer as the main feature. The python-crfsuite dependency also fails to build under Python 3.10 (#626), so we removed pycrfsuite from the list of requirements and added it to the extra packages to avoid dependency problems.
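For users affected by this, a rough sketch of checking whether the optional module is importable before choosing an engine could look like the following. It is an illustration under assumptions rather than PyThaiNLP code: `pick_engine` is a hypothetical helper, and the fallback engine name is left to the caller because this thread does not name one.

```python
from importlib.util import find_spec


def pick_engine(preferred: str, fallback: str, needs_module: str = "pycrfsuite") -> str:
    """Return `preferred` if its backing module is importable, else `fallback`.

    Hypothetical helper for illustration; the engine names are supplied by the caller.
    """
    # find_spec returns None when a top-level module cannot be found.
    return preferred if find_spec(needs_module) is not None else fallback
```

The result could then be passed as the `engine` argument of `sent_tokenize`, so that "crfcut" is only requested when `pycrfsuite` is actually installed.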
