CoNLL support #10

ksteimel · 2018-04-26T03:31:25Z

It'd be good to support CoNLL format in a generic sense (and then perhaps some of the more specific CoNLL formats as an offshoot). I'd be happy to work on this if this is something you think would be worth it.

oxinabox · 2018-04-26T06:08:36Z

I think it is worth it, yes.

@Evizero has support for it in MLDatasets.jl
https://github.com/JuliaML/MLDatasets.jl/blob/master/src/CoNLL.jl
which would be a starting point.

If that is ported across and enhanced to match the CorpusLoaders style:

- lazily loaded from disk
- using MultiResolutionIterators.jl

and it is working well, perhaps we can talk about deprecating it out of MLDatasets.jl.
Though there may be advantages to keeping two loaders, since the one in MLDatasets.jl is probably much simpler.
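
To make the "lazily loaded from disk" idea concrete, here is a minimal sketch of a generic CoNLL reader that yields one sentence at a time instead of reading the whole file up front. It is written in Python purely for illustration (the real loader would be in Julia), the function name `iter_conll_sentences` is hypothetical, and it does not use or imitate the MultiResolutionIterators.jl API. It assumes only the generic CoNLL layout: one token per line, whitespace-separated columns, and a blank line between sentences. The sample rows are illustrative data in a four-column layout.

```python
from typing import Iterable, Iterator, List


def iter_conll_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Lazily yield sentences; each token is the list of its column values."""
    sentence: List[List[str]] = []
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank line closes the current sentence (if there is one).
            if sentence:
                yield sentence
                sentence = []
        else:
            sentence.append(fields)
    if sentence:
        # The input may not end with a blank line.
        yield sentence


# Illustrative sample: word, POS tag, chunk tag, NER tag.
sample = """EU NNP B-NP B-ORG
rejects VBZ B-VP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

sentences = list(iter_conll_sentences(sample))
```

Passing an open file handle instead of `sample` would stream the corpus from disk line by line, so only the current sentence is held in memory.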

Ayushk4 · 2019-05-29T20:07:23Z

I am starting with the addition of CoNLL 2003 Corpus. The original files from the shared task are freely available.

To extract the required files from it, one needs the Reuters Corpus file rcv1.tar.xz and must build the original files with it. This is available from the Harvard Dataverse or the NIST website. However, obtaining the Reuters corpus requires a user agreement, and it may take some time to get approved.

Instead of doing this, there are files of CoNLL 2003 that have been built and are openly available.

I feel it will be very difficult to handle the downloading part with the former method, and that I should go with the latter approach. What do you suggest in this case?

Edit: I feel the latter approach will be simpler overall, as well as easier to replicate for other CoNLL datasets.

oxinabox · 2019-05-29T20:08:12Z

The latter sounds legit.

Ayushk4 mentioned this issue May 30, 2019

Support for CoNLL Corpora #20

Merged
