Skip to content

Dataset of Polish movie reviews used for document-level sentiment classification.

License

Notifications You must be signed in to change notification settings

narolski/filmwebplus

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Filmweb+

Filmweb+ is a dataset of Polish movie reviews used for the purpouse of a document-level sentiment classification task.

It contains 27202 reviews in total, which are divided into unlabelled, positive, neutral and negative classes based on rating given by the review's author.

Class (indice) # of reviews
Unlabelled (-1) 7710
Negative (0) 3085
Neutral (1) 9436
Positive (2) 6971

Reviews have been fetched from Filmweb, Film.org.pl, Kinoblog, FDB and blogfilmowy24. They contain an average of 514 words (or 3582 characters) per document.

The dataset has been pre-processed in order to remove unnecessary HTML tags or other redundant metadata, such as image descriptions, using the Spacey preprocessor. It is available in a CSV format.

Document-level Sentiment Classification Results

A ULMFiT-based document level sentiment classification system proposed in work "Sentiment Classification using Machine Learning" (Narolski, 2020) has achieved a state-of-the-art result of 94,19% for Polish document-level sentiment classification task for a corpora of long, real-world text documents, with over 500 words per document on average.

Citation

If you've found this dataset helpful for your work, please consider citing the work "Sentiment Classification using Machine Learning" (Narolski, 2020) or providing an appropriate authorship attribution.

About

Dataset of Polish movie reviews used for document-level sentiment classification.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published