Train the Avg Perceptron POS model till converged. #188

Open
Ayushk4 opened this issue Jan 19, 2020 · 6 comments
Labels
help wanted good for beginners

Comments

@Ayushk4
Member

Ayushk4 commented Jan 19, 2020

As of now, the Avg Perceptron POS model added in #131 gives ~60% accuracy on CoNLL 2003, which is decent for the 30+ classes it addresses but still low compared to the claimed ~90% accuracy.

We can port weights from other libraries (depending on licence permissions). Alternatively, we can train the model ourselves.

But first, it would be better to know precisely what accuracy the Avg Perceptron POS tagger currently achieves on CoNLL 2000 and GMB as well (CorpusLoaders.jl provides APIs for these two datasets).

@Ayushk4 Ayushk4 added the help wanted good for beginners label Jan 20, 2020
@tejasvaidhyadev
Member

tejasvaidhyadev commented Jan 30, 2020

Hi @Ayushk4
The CorpusLoaders.jl APIs cannot be used directly because of the input datatype constraints of the Avg Perceptron POS model. I think we can add new methods to either the Avg Perceptron POS model or CorpusLoaders.jl to make them compatible with each other.
While working with the pretrained weights, I found that the predict function returns missing values. (Accuracy on the CoNLL test set loaded via CorpusLoaders.jl is 0.47 using the pretrained model.)
Training the model, meanwhile, gives an unexpected error.
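
For reference, here is a minimal sketch of how token-level accuracy could be computed when some predictions are missing. It is written in Python purely for illustration and does not use the TextAnalysis.jl or CorpusLoaders.jl APIs; the function name `token_accuracy` and the toy tag sequences are hypothetical. It assumes a missing prediction should count as an error, which may not match how the 0.47 figure above was obtained.

```python
from typing import Optional, Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[Optional[str]]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for gold_tag, pred_tag in zip(gold_sent, pred_sent):
            total += 1
            # Assumption: a missing prediction (None) is scored as wrong.
            if pred_tag is not None and pred_tag == gold_tag:
                correct += 1
    # Avoid division by zero on an empty corpus.
    return correct / total if total else 0.0


# Hypothetical toy data: one missing tag and one wrong tag out of five tokens.
gold_tags = [["DT", "NN", "VBZ"], ["PRP", "VBP"]]
pred_tags = [["DT", "NN", None], ["PRP", "VBD"]]
accuracy = token_accuracy(gold_tags, pred_tags)
print(f"token accuracy: {accuracy:.2f}")
```

Reporting the number of missing predictions separately from the number of wrong ones would also help tell a tagging problem apart from a data-format problem.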

@tejasvaidhyadev
Member

Hi @Ayushk4
The process is getting killed for unknown reasons during training on the CoNLL dataset.
I am attaching a gist with the code used (sorry, the code is a little messy).
I will update the status soon.

@Ayushk4
Member Author

Ayushk4 commented Feb 9, 2020

Hi, sorry for the late response.

Can you upload the code in notebook format, or fix the indentation in the gist file to make the code more readable?

@Ayushk4
Member Author

Ayushk4 commented Feb 9, 2020

It would be great if you could also share the notebook used for measuring the performance of the Avg. Perceptron tagger on CoNLL, and specify which CoNLL dataset you used.

I personally like the idea of writing new APIs to handle different data types and documents that TextAnalysis.jl provides.
Feel free to open an issue (or better yet send a PR) for the same.

@tejasvaidhyadev
Member

I have updated the gist with comments, and I will soon upload the notebook used for measuring the performance of the Avg. Perceptron tagger on the CoNLL test set.
Should I add the API to TextAnalysis.jl or CorpusLoaders.jl?

@Ayushk4
Member Author

Ayushk4 commented Feb 10, 2020

APIs related to CoNLL and other datasets go to CorpusLoaders.jl.

APIs related to the perceptron tagger that support new inputs go to TextAnalysis.jl.
