📓 Update 01 October 2023: this collection is now available in arekit-ss, which samples contexts covering most subject-object relation mentions with a single script and exports them to `JSONL/CSV/SQLite`, with optional language transfer 🔥 [Learn more ...]
Release Notes:
- The list of synonyms has been expanded; it now covers all named entities extracted in `*.ann` files;
- A collection reader is provided.
The RuSentRel corpus [paper], version 1.1, consists of analytical articles from the Internet portal inosmi.ru. These are texts in the domain of international politics, obtained from authoritative foreign sources and translated into Russian. The collected articles contain both the author's opinion on the subject matter of the article and many mentions of relations between the participants of the described situations. In total, 73 large analytical texts were labeled with about 2000 relations.
The texts were processed by an automatic named entity (NE) recognizer based on the CRF method [paper].
NEs were categorized into four classes: Persons, Organizations, Places, and Geopolitical Entities (states, and capitals used to refer to states).
The automatic labeling contains a few errors that have not yet been corrected. A preliminary analysis showed that the F-measure for determining correct entity boundaries exceeds 95%.
Recognized NEs are stored in `*.ann` files.
For a detailed description, please see the References section.
For model application, please refer to the following repositories:
- Scikit-learn classifiers application
- Piecewise CNN application
The `reader` folder contains a collection reader (source file parsers) written in Python 3.6.
Please refer to read.py, which provides an example of how this collection can be parsed.
Parameter | Training collection | Test collection |
---|---|---|
Number of documents | 44 | 29 |
Sentences (avg./doc.) | 74.5 | 137 |
NE (avg./doc.) | 194 | 300 |
unique NE (avg./doc.) | 33.3 | 59.9 |
positive pairs of NE (avg./doc.) | 6.23 | 14.7 |
negative pairs of NE (avg./doc.) | 9.33 | 15.6 |
Share of attitudes expressed in a single sentence | 76.5% | 73% |
Statistics for the whole Collection:
Parameter | Collection |
---|---|
Avg. dist. between NE within a sentence in words | 10.2 |
Human labeling agreement (F1(P, N)) | 0.55 |
Contradiction (Acc.) | 0.01 |
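
As a rough illustration of the agreement metric in the table above, the sketch below assumes that F1(P, N) is the average of the F1 scores for the positive and negative classes, computed over entity pairs. The function names (`f1_for_class`, `f1_pn`) and the `"pos"`/`"neg"` label strings are hypothetical and are not part of the collection reader.

```python
def f1_for_class(gold, predicted, label):
    """F1 for one sentiment class over entity pairs.

    gold, predicted: dicts mapping (subject, object) -> label.
    The label strings are an assumption made for this sketch.
    """
    # Pairs that both annotations assign to this class.
    true_positives = sum(
        1 for pair, value in predicted.items()
        if value == label and gold.get(pair) == label
    )
    if true_positives == 0:
        return 0.0
    predicted_count = sum(1 for value in predicted.values() if value == label)
    gold_count = sum(1 for value in gold.values() if value == label)
    precision = true_positives / predicted_count
    recall = true_positives / gold_count
    return 2 * precision * recall / (precision + recall)


def f1_pn(gold, predicted):
    """Average of the positive-class and negative-class F1 scores."""
    return (f1_for_class(gold, predicted, "pos")
            + f1_for_class(gold, predicted, "neg")) / 2
```

To estimate agreement between two annotators, one annotation can be passed as `gold` and the other as `predicted`.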
Separately for the train and test collections, we compose these sets and group them by size; the resulting statistics for the first eight groups are presented in the table below.
We assign a sentiment to a context with a pair of entities when a related sentiment attitude can be found.
Collection | Total | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|
train-sent | 467 | 47% | 15% | 4.4% | 4.3% | 2.2% | 0.9% | 0.8% | 1.0% |
test-sent | 669 | 47% | 13% | 5.0% | 4.2% | 2.4% | 1.0% | 1.1% | 1.3% |
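
One possible reading of the table above is that attitudes are grouped by the number of contexts in which the entity pair occurs. Under that assumption, the shares per group could be computed roughly as follows. The function name and the input layout (a dict from entity pair to a list of contexts) are hypothetical and are not taken from the collection reader.

```python
from collections import Counter


def context_count_shares(contexts_per_attitude, max_size=8):
    """Share of attitudes that occur in exactly 1, 2, ..., max_size contexts.

    contexts_per_attitude: dict mapping (subject, object) -> list of contexts
    (an assumed layout used only for this sketch).
    """
    sizes = Counter(len(contexts) for contexts in contexts_per_attitude.values())
    total = sum(sizes.values())
    if total == 0:
        # No attitudes at all: every group has a zero share.
        return {size: 0.0 for size in range(1, max_size + 1)}
    return {size: sizes.get(size, 0) / total for size in range(1, max_size + 1)}
```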
In most cases we deal with single-context attitudes in the train and test collections. However, single-context sentiment attitudes account for 47%, which is only about half of all attitudes that occur. Given this, it is important to take into account the labels of several contexts when labeling attitudes.
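
The idea of combining labels from several contexts can be sketched as a simple majority vote over the context-level labels of each entity pair. This is only an illustrative assumption, not necessarily the aggregation used by the authors. The function name, the label strings, and the `"undecided"` outcome for ties are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_attitudes(context_labels):
    """Majority vote over context-level labels for each entity pair.

    context_labels: iterable of ((subject, object), label) items,
    where label is "pos" or "neg" (assumed label names).
    """
    votes = defaultdict(Counter)
    for pair, label in context_labels:
        votes[pair][label] += 1

    attitudes = {}
    for pair, counter in votes.items():
        positive, negative = counter["pos"], counter["neg"]
        if positive > negative:
            attitudes[pair] = "pos"
        elif negative > positive:
            attitudes[pair] = "neg"
        else:
            # Equal votes: leave the pair without a sentiment decision.
            attitudes[pair] = "undecided"
    return attitudes
```

For example, a pair labeled negative in two contexts and positive in one would receive a negative attitude under this scheme.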
@article{loukachevitch2018extracting,
Author = {Loukachevitch, N. and Rusnachenko, N.},
Title = {Extracting Sentiment Attitudes from Analytical Texts},
Journal = {In Proceedings of International conference Dialog-2018},
Year = {2018}
}