Train the Avg Perceptron POS model till converged. #188

Ayushk4 · 2020-01-19T09:52:07Z

As of now, the Avg Perceptron POS model added in #131 gives ~60% accuracy on CoNLL 2003, which is decent for the 30+ classes it addresses but still low compared to the claimed ~90% accuracy.

We can port weights from other libraries (depending on licence permissions). Alternatively, we can train the model ourselves until it converges.
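One way to make "train until converged" concrete is to stop when held-out accuracy stops improving. The sketch below is in Python purely for illustration and is not part of TextAnalysis.jl; `train_one_epoch`, `evaluate`, `patience` and `min_delta` are hypothetical names, and the two callables stand in for whatever training and evaluation routines the tagger ends up exposing.

```python
from typing import Callable, List, Tuple


def train_until_converged(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    max_epochs: int = 50,
    patience: int = 3,
    min_delta: float = 1e-3,
) -> Tuple[float, List[float]]:
    """Run epochs until held-out accuracy stops improving.

    Stops after `patience` consecutive epochs in which accuracy did not
    beat the best value seen so far by more than `min_delta`.
    Returns the best accuracy and the per-epoch accuracy history.
    """
    best = float("-inf")
    stale_epochs = 0
    history: List[float] = []

    for _ in range(max_epochs):
        train_one_epoch()          # one full pass over the training data
        accuracy = evaluate()      # accuracy on a held-out set
        history.append(accuracy)

        if accuracy > best + min_delta:
            best = accuracy
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break              # treated as converged

    return best, history
```

This only reports the best accuracy. A real run would presumably also snapshot the weights at the best epoch so they can be restored after stopping.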

Before doing either, it would be better to know precisely what accuracy the current Avg Perceptron POS tagger achieves on CoNLL 2000 and GMB as well (CorpusLoaders.jl provides APIs for both datasets).
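For reference, token-level accuracy can be computed as in the sketch below. It is written in Python for illustration only and assumes the gold and predicted tags are available as parallel lists of sentences; the function name `tagging_accuracy` is hypothetical. A per-tag breakdown is included because an aggregate number can hide which of the 30+ classes are failing.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[Optional[str]]],
) -> Tuple[float, Dict[str, float]]:
    """Return (overall token accuracy, accuracy per gold tag).

    A prediction of None (e.g. a missing value from the tagger) never
    equals a gold tag, so it is counted as an error.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred differ in number of sentences")

    total = 0
    correct = 0
    tag_total: Counter = Counter()
    tag_correct: Counter = Counter()

    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence length mismatch")
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            tag_total[g] += 1
            if g == p:
                correct += 1
                tag_correct[g] += 1

    overall = correct / total if total else 0.0
    per_tag = {tag: tag_correct[tag] / n for tag, n in tag_total.items()}
    return overall, per_tag
```

Reporting the same metric on CoNLL 2000, CoNLL 2003 and GMB would make the numbers directly comparable.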

tejasvaidhyadev · 2020-01-30T18:41:12Z

Hi @Ayushk4
The CorpusLoaders.jl APIs cannot be used directly because of the input datatype constraints of the Avg Perceptron POS model. I think we can add new methods to either the Avg Perceptron POS model or CorpusLoaders.jl to make them compatible with each other.
While I was working with the pretrained weights, the predict function was returning missing values. (Accuracy on the CoNLL test set from CorpusLoaders.jl is 0.47 using the pretrained model.)
Training the model, meanwhile, gives an unexpected error.
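To illustrate the kind of adapter meant here, the sketch below flattens a nested corpus (documents, then sentences, then tokens) into a flat list of sentences made of (word, tag) pairs. It is in Python for illustration only, and everything in it is an assumption: the `Token` type, its `word` and `pos` fields, and the target format are hypothetical and would need to be checked against what CorpusLoaders.jl actually returns and what the tagger actually accepts.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a corpus token carrying a POS tag."""
    word: str
    pos: str


def to_tagged_sentences(
    documents: Sequence[Sequence[Sequence[Token]]],
) -> List[List[Tuple[str, str]]]:
    """Flatten documents -> sentences -> tokens into a list of
    sentences, each a list of (word, tag) pairs.

    Empty sentences are dropped so the tagger never sees them.
    """
    sentences: List[List[Tuple[str, str]]] = []
    for document in documents:
        for sentence in document:
            pairs = [(token.word, token.pos) for token in sentence]
            if pairs:
                sentences.append(pairs)
    return sentences
```

Whether such a conversion lives on the loader side or as a new tagger method is a design choice; either way it keeps the nested corpus layout out of the training code.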

tejasvaidhyadev · 2020-02-06T19:32:58Z

Hi @Ayushk4
The process is getting killed for unknown reasons during training on the CoNLL dataset.
I am attaching a gist with the code used (sorry, the code is a little messy).
I will update the status soon.

Ayushk4 · 2020-02-09T07:52:37Z

Hi, sorry for the late response.

Can you upload the code as a notebook, or fix the indentation in the gist file to make the code more readable?

Ayushk4 · 2020-02-09T07:56:48Z

It would be great if you could also share the notebook used for measuring the performance of the Avg. Perceptron tagger on CoNLL, and specify which CoNLL dataset you used.

I personally like the idea of writing new APIs to handle different data types and documents that TextAnalysis.jl provides.
Feel free to open an issue (or better yet send a PR) for the same.

tejasvaidhyadev · 2020-02-10T11:28:30Z

I have updated the gist with comments, and I will soon upload the notebook used for measuring the performance of the Avg. Perceptron tagger on the CoNLL test set.
Should I add the API in TextAnalysis.jl or CorpusLoaders.jl?

Ayushk4 · 2020-02-10T16:15:37Z

APIs related to CoNLL and other datasets go in CorpusLoaders.jl.

APIs for the perceptron tagger that support new inputs go in TextAnalysis.jl.

Ayushk4 added the "help wanted" and "good for beginners" labels on Jan 20, 2020.
