A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Updated Nov 29, 2024 - Python
Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article, "Computing Machinery and Intelligence", that proposed a measure of machine intelligence now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Portuguese pre-trained BERT models
The hands-on NLTK tutorial for NLP in Python
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
A curated list of beginner resources in Natural Language Processing
Projects and useful articles / links
Chinese science fiction corpus for NLP: about 4,675 Chinese science fiction novels, usable as a Chinese science fiction text corpus or text database.
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
This is a continuously updated handbook for readers to easily track the latest NL2SQL (Text-to-SQL) techniques in the literature and provide practical guidance for researchers and practitioners.
A lexicon for Sudachi
A curated list of NLP resources for Hungarian
A Dutch RoBERTa-based language model
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
summaries of all the papers I read
Chinese science fiction corpus for NLP: an archive of the Ark Plan (方舟计划) of the Ula Science Fiction Website (乌拉科幻小说网), usable as a Chinese science fiction text corpus or text database.
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
An open information extraction system that provides compact extractions
Created by Alan Turing