Dataset: CoNLL 2003 (English)

The English dataset was obtained from PapersWithCode. It was introduced by Tjong Kim Sang and De Meulder in their 2003 paper, Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. At the time of writing, the dataset consists of the following:

English Data	Articles	Sentences	Tokens	LOC	MISC	ORG	PER
Training set	946	14,987	203,621	7,140	3,438	6,321	6,600
Development set	216	3,466	51,362	1,837	922	1,341	1,842
Test set	231	3,684	46,435	1,668	702	1,661	1,617
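
The counts in the table can be sanity-checked with a short script. The following is a minimal sketch, not the official tooling: it assumes the common distribution format (one token per line, whitespace-separated columns with the NER tag last, blank lines between sentences, and `-DOCSTART-` lines marking article boundaries). The function name `count_conll` and the inline `SAMPLE` text are hypothetical and used only for illustration. Whether `-DOCSTART-` lines count as sentences is a convention choice, so totals computed this way may not match the table exactly.

```python
from collections import Counter


def count_conll(lines):
    """Count articles, sentences, tokens, and entity spans in CoNLL-2003-style lines.

    Assumes the NER tag is the last column and uses an IOB-style scheme
    (works for both IOB1 and IOB2 tags).
    """
    stats = Counter()
    entities = Counter()
    prev_tag = "O"       # NER tag of the previous token in the current sentence
    in_sentence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current sentence.
            in_sentence = False
            prev_tag = "O"
            continue
        fields = line.split()
        if fields[0] == "-DOCSTART-":
            # Article boundary marker; not counted as a sentence or token here.
            stats["articles"] += 1
            prev_tag = "O"
            continue
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
        stats["tokens"] += 1
        tag = fields[-1]
        if tag != "O":
            prefix, etype = tag.split("-", 1)
            # A new span starts on B-, after O, or when the entity type changes.
            if prefix == "B" or prev_tag == "O" or prev_tag.split("-", 1)[1] != etype:
                entities[etype] += 1
        prev_tag = tag
    return stats, entities


# Tiny hand-written sample in the assumed format (not real dataset statistics).
SAMPLE = """-DOCSTART- -X- -X- O

EU NNP I-NP I-ORG
rejects VBZ I-VP O
German JJ I-NP I-MISC
call NN I-NP O
. . O O

Peter NNP I-NP I-PER
Blackburn NNP I-NP I-PER
"""

stats, entities = count_conll(SAMPLE.splitlines())
print(dict(stats))
print(dict(entities))
```

To run it on a real split, pass an open file handle instead of `SAMPLE.splitlines()`; the file name depends on where the dataset was downloaded from.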

The leaderboard for Named Entity Recognition (NER) on this dataset can be found here. At the time of writing, the state of the art is an F1 score of 94.6, achieved by the ACE + document-context model, which is described in more detail in its authors' paper.
