Issues with make_lemma_dictionary for treetagger engine #12

Open
JaySLee opened this issue Aug 5, 2020 · 0 comments

JaySLee commented Aug 5, 2020

Hi,

First, thanks for a great and useful package! I've been experimenting with the make_lemma_dictionary function and was wondering whether adding the following features would be helpful:

  1. Because the text is split into tokens before it is sent to treetag(), some of the context is lost. Would it make sense to have an option to keep the text as is, i.e., as full sentences? Here's an example: c("That food is really nice.", "That felt is really nice."). Because the other words have already appeared, the token 'felt' ends up on a line by itself, and TreeTagger falls back to its default interpretation of 'felt' as a verb. Passing the full sentences to treetag() allows it to be tagged correctly, as a noun in this case.

  2. I had some trouble getting the treetag() function itself to work; the potential bugs have been reported to the koRpus developer. Could a debug flag be passed through to treetag(), along with an option to stop suppressing its messages, so that users can diagnose problems?
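
To make both requests concrete, here is a minimal base R sketch. It does not call TreeTagger, koRpus, or textstem. The `tokenize()` helper is only an assumed simplification of the pre-processing described in point 1, and `run_tagger()`, `fake_tagger()`, and the `quiet` argument are hypothetical names used for illustration, not existing arguments of make_lemma_dictionary.

```r
# Illustration only: base R, no TreeTagger or koRpus required.
sentences <- c("That food is really nice.", "That felt is really nice.")

# Point 1: assumed pre-processing (a simplification, not textstem's actual code):
# lowercase, drop punctuation, split on whitespace, keep unique tokens.
tokenize <- function(x) {
  x <- tolower(gsub("[[:punct:]]", "", x))
  unlist(strsplit(x, "[[:space:]]+"))
}
unique_tokens <- unique(tokenize(sentences))
print(unique_tokens)
# "felt" is the only new token from the second sentence, so it reaches the
# tagger without "That ... is really nice" around it.

# Point 2: hypothetical wrapper. `tagger` stands in for a call such as
# treetag(); extra arguments (e.g. a debug flag) are forwarded via `...`,
# and `quiet = FALSE` lets the tagger's messages through.
run_tagger <- function(text, tagger, quiet = TRUE, ...) {
  if (quiet) {
    suppressMessages(tagger(text, ...))
  } else {
    tagger(text, ...)
  }
}

# Stand-in tagger that emits a diagnostic message, so the sketch runs anywhere.
fake_tagger <- function(text, debug = FALSE) {
  if (debug) message("tagging ", length(text), " input line(s)")
  text
}

silent_result <- run_tagger(sentences, fake_tagger, debug = TRUE)
noisy_result  <- run_tagger(sentences, fake_tagger, quiet = FALSE, debug = TRUE)
```

With `quiet = TRUE` the diagnostic message is swallowed; with `quiet = FALSE` it is shown, which is the kind of switch that would help when treetag() fails silently.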

Thanks!

Best,
Jay
