This repository contains the code, manuscript, datasets, and other materials accompanying our research paper published at the International Joint Conference on Artificial Intelligence (IJCAI).
Consider the following exemplar news article:
Although the title may give us a coarse indication of the content of the article (e.g. Politics), a careful reading of the text reveals that about 66% of the article is about Diplomacy, 28% about Arms Control, 3% about Geopolitics, and 3% about Foreign Policy.
Such categorisation is valuable in areas such as information retrieval and recommendation, as it allows for finer-grained search and organisation than classification into a single category. Other examples include labelling the proportions of sentiments (e.g. surprise or joy), or labelling images in which multiple objects are present at the same time.
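As a small illustration of the difference, the proportions of the example article can be stored as a categorical distribution, from which a single-label classifier would keep only the dominant category. The document names and the `filter_by_proportion` helper below are hypothetical; they only serve to sketch what a proportion-based search could look like.

```python
# Proportions of the example article, as a categorical distribution (sums to 1).
article = {
    "Diplomacy": 0.66,
    "Arms Control": 0.28,
    "Geopolitics": 0.03,
    "Foreign Policy": 0.03,
}

# A single-label classifier would keep only the dominant category,
# discarding the remaining 34% of the content.
single_label = max(article, key=article.get)


def filter_by_proportion(docs, category, threshold):
    """Hypothetical helper: names of documents with at least `threshold` of `category`."""
    return [name for name, props in docs.items() if props.get(category, 0.0) >= threshold]


# Hypothetical collection: the example article plus an invented second document.
docs = {
    "example_article": article,
    "other_article": {"Diplomacy": 0.9, "Foreign Policy": 0.1},
}

print(single_label)                                       # Diplomacy
print(filter_by_proportion(docs, "Arms Control", 0.25))   # ['example_article']
```

A search for articles that are at least a quarter about Arms Control finds the example article, even though a single-label classification would have filed it under Diplomacy only.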
Despite years of advances in automated classification, humans still outperform machines at such tasks. As a result, crowdsourcing has become an increasingly popular way to leverage human annotators of varying ability and domain experience to perform tasks that would be too difficult or expensive to process computationally or with experts, but that require only simple instructions to complete.
However, collecting reliable judgments from unknown members of a crowd (also called workers) remains challenging. Crowdsourcing platforms are well known to suffer from malicious participants (also called spammers), who provide random judgments regardless of the document. Such spammers can constitute up to 45% of the worker population, which increases the cost of acquiring judgments and degrades the accuracy of the aggregation.
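To see why spammers degrade a naive aggregate, the sketch below simulates one document judged by 1,000 workers, 45% of whom are spammers, and simply averages their proportion judgments. It rests on assumptions chosen for illustration only: honest workers report the truth with small Dirichlet noise, and spammers report a vector drawn uniformly from the simplex regardless of the document. Under these assumptions the average is, in expectation, 0.55 times the true proportions plus 0.45 times the uniform vector, so it drifts toward uniform.

```python
import numpy as np

rng = np.random.default_rng(0)

true_props = np.array([0.66, 0.28, 0.03, 0.03])  # ground truth for one document
n_workers, spam_rate = 1000, 0.45
n_spammers = int(round(n_workers * spam_rate))

# Assumption: honest workers report the truth with small Dirichlet noise
# (concentration 200); spammers report a vector drawn uniformly from the
# simplex, independently of the document.
honest = rng.dirichlet(200 * true_props, size=n_workers - n_spammers)
spam = rng.dirichlet(np.ones(len(true_props)), size=n_spammers)
judgments = np.vstack([honest, spam])

# Naive aggregation: average the proportion vectors of all workers.
naive = judgments.mean(axis=0)

# In expectation: (1 - spam_rate) * truth + spam_rate * uniform.
expected = (1 - spam_rate) * true_props + spam_rate / len(true_props)
print("naive mean:", np.round(naive, 3))
print("expected:  ", np.round(expected, 3))
```

The dominant category drops from 0.66 to roughly 0.48, which is the kind of bias a spammer-aware model is meant to remove.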
In this work, we introduce a new method for aggregating judgments of proportions across multiple categories that, for the first time, accounts for spammers.
This repository uses a total of three datasets to evaluate accuracy, including two novel crowdsourced datasets totalling 796 judgments of the proportions of objects in images and of colours in countries' flags.
- SemEval-2007. Each worker was presented with a list of news headlines and asked to give a numeric judgment between zero and one hundred for each of six sentiments. A total of 1,000 judgments are available across 100 news headlines.
- IAPR-TC12. Each worker was presented with images and asked to estimate the proportion of each of six regions in them (e.g. landscape/nature or man-made). We collected a total of 336 judgments over a set of 16 images.
- Colours. Twenty-three participants were asked to judge the proportion of 10 colours in 20 countries' flags. We crowdsourced a total of 460 judgments of proportions.
Our proposed model (which we call multi-category independent Bayesian classifier combination, or MBCC for short) builds on the strengths of prior approaches to aggregating distributions while also accounting for spammers. In particular, we extend IBCC (independent Bayesian classifier combination) by associating with each document a categorical distribution representing the proportions of each category.
The factor graph below illustrates the generative process of our model (that is, the process by which our model assumes the workers' judgments of proportions were generated); inference in this model learns both the proportions per document and the accuracy of each worker. As in any factor graph, each node represents a random variable and each connection a conditional probabilistic dependency.
- We start by sampling a confusion matrix for each worker. Each row \(\pi\) of a confusion matrix is distributed according to a Dirichlet distribution with hyperparameter \(\alpha\).
- We then sample a categorical distribution \(\Lambda\) for each document, which represents the proportions aggregated over the judgments of all workers. Similarly, \(\Lambda\) is drawn from a Dirichlet prior with hyperparameter \(\epsilon\).
- We then repeatedly sample from this distribution \(\Lambda\), \(n\) times, to obtain multiple discrete categories \(z\).
- We use those samples \(z\) to index the rows of each worker's confusion matrix \(\pi\), and sample discrete judgments \(c\) from the corresponding row.
- Finally, we infer the most likely categorical distributions \(\Lambda\) that generated the samples \(c\) for all documents and workers.
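The generative process above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the Infer.NET implementation in src: the sizes `K`, `W`, `D`, `n` and the hyperparameter values are arbitrary, fresh samples \(z\) are drawn for every worker-document pair, four workers are overwritten with spammer confusion matrices (identical rows), and the final inference step is replaced by a much simpler EM estimate of each document's \(\Lambda\) that treats the confusion matrices as known, whereas MBCC learns them jointly.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes (assumptions): categories, workers, documents, samples per judgment.
K, W, D, n = 4, 10, 20, 200

# Illustrative hyperparameters: alpha adds mass to the diagonal so that a
# typical worker is better than chance; epsilon is a flat prior on Lambda.
alpha = np.ones((K, K)) + 9.0 * np.eye(K)
epsilon = np.ones(K)


def sample_rows(P, rng):
    """Draw one category per row of P (each row a probability vector) by inverse CDF."""
    u = rng.random((P.shape[0], 1))
    return np.minimum((u > np.cumsum(P, axis=1)).sum(axis=1), P.shape[1] - 1)


# Step 1: one confusion matrix per worker, row k ~ Dirichlet(alpha[k]).
pi = np.array([[rng.dirichlet(alpha[k]) for k in range(K)] for _ in range(W)])
# Assumption for illustration: the first 4 workers are spammers, i.e. all rows
# are identical, so their judgments do not depend on the true category.
pi[:4] = 1.0 / K

# Step 2: one categorical distribution Lambda per document ~ Dirichlet(epsilon).
Lambda = rng.dirichlet(epsilon, size=D)

# Steps 3-4: for each (document, worker) pair, draw n categories z ~ Cat(Lambda[d]),
# then one discrete judgment c ~ Cat(pi[w, z]) per sample.
c = np.empty((D, W, n), dtype=int)
for d in range(D):
    for w in range(W):
        z = rng.choice(K, size=n, p=Lambda[d])
        c[d, w] = sample_rows(pi[w, z], rng)


def estimate_lambda(c_d, pi, iters=500):
    """EM estimate of one document's Lambda, holding the confusion matrices fixed.

    A simplified stand-in for the last step: MBCC infers pi and Lambda jointly.
    """
    # like[w, j, k] = probability that worker w reports c_d[w, j] when the true category is k.
    like = np.stack([pi[w][:, c_d[w]].T for w in range(pi.shape[0])])
    lam = np.full(pi.shape[1], 1.0 / pi.shape[1])
    for _ in range(iters):
        resp = like * lam                           # E-step: unnormalised posterior over z
        resp /= resp.sum(axis=-1, keepdims=True)
        lam = resp.mean(axis=(0, 1))                # M-step: average responsibility
    return lam


est = np.array([estimate_lambda(c[d], pi) for d in range(D)])
naive = np.array([np.bincount(c[d].ravel(), minlength=K) / c[d].size for d in range(D)])

print("EM (known pi) mean abs error:", np.abs(est - Lambda).mean())
print("naive frequency mean abs error:", np.abs(naive - Lambda).mean())
```

Because a spammer's rows are identical, its judgments leave the EM update unchanged and effectively drop out, whereas the naive frequency estimate is pulled toward the uniform distribution.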
All source code used to generate the results and figures in the paper is in the src and scripts directories.
The data used in this study is provided in the data directory, and the sources for the manuscript text and figures are in the manuscript directory.
The poster and presentation can be found in poster/poster.pdf and poster/presentation.pdf, respectively.
You can download a copy of all the files in this repository by cloning the git repository:
git clone https://github.com/alexandry-augustin/mbcc.git
The model was developed on Ubuntu Linux using MonoDevelop as the IDE.
You’ll need a working Python environment and the Infer.NET 2.6 library to run the code.
If you use our code or dataset, please cite as follows:
@inproceedings{augustin2017mbcc,
  title={Bayesian aggregation of categorical distributions with applications in crowdsourcing},
  author={Augustin, Alexandry and Venanzi, Matteo and Hare, J and Rogers, A and Jennings, NR},
  year={2017},
  organization={AAAI Press/International Joint Conferences on Artificial Intelligence}
}
All source code is made available under the MIT license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE for the full license text.
The manuscript text is not open source. The authors reserve the rights to the article content, which has been published in the proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).