This repo is for the Mis2-KDD 2021 paper: Dataset of Propaganda Techniques of the State-Sponsored Information Operation of the People’s Republic of China
We present a dataset of propaganda techniques in Mandarin, built on Twitter's release of PRC state-linked information operations data (China, July 2019, set 1). The dataset provides multi-label propaganda-technique annotations for a sample of these tweets.
In total, it contains 9,950 labeled tweets covering 21 propaganda techniques.
```bibtex
@online{2106.07544,
Author = {Rong-Ching Chang and Chun-Ming Lai and Kai-Lai Chang and Chu-Hsing Lin},
Title = {Dataset of Propaganda Techniques of the State-Sponsored Information Operation of the People's Republic of China},
Year = {2021},
Eprint = {arXiv:2106.07544},
Organization = {Tunghai University},
url = {https://arxiv.org/abs/2106.07544}
}
```
- The list of propaganda techniques and their labels
- What the dataset looks like
- Downloading the dataset from Twitter
Label | Propaganda Techniques |
---|---|
1 | Presenting Irrelevant Data |
2 | Straw Man |
3 | Whataboutism |
4 | Oversimplification |
5 | Obfuscation |
6 | Appeal to authority |
7 | Black-and-white |
8 | Name Calling |
9 | Loaded Language |
10 | Exaggeration or Minimisation |
11 | Flag-waving |
12 | Doubt |
13 | Appeal to fear or prejudice |
14 | Slogans |
15 | Thought-terminating cliché |
16 | Bandwagon |
17 | Reductio ad Hitlerum |
18 | Repetition |
19 | Neutral Political |
20 | Non-Political |
21 | Meme humor |
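For programmatic use, the label table above can be kept as a lookup from label ID to technique name. The sketch below is a minimal Python example and is not part of the released code. It assumes, as in the example rows of the dataset, that a tweet's labels are stored as a comma-separated string of IDs such as `3,8,9`. The names `PROPAGANDA_TECHNIQUES` and `decode_labels` are hypothetical.

```python
from typing import Dict, List

# Label ID -> technique name, copied from the label table (IDs 1-21).
PROPAGANDA_TECHNIQUES: Dict[int, str] = {
    1: "Presenting Irrelevant Data",
    2: "Straw Man",
    3: "Whataboutism",
    4: "Oversimplification",
    5: "Obfuscation",
    6: "Appeal to authority",
    7: "Black-and-white",
    8: "Name Calling",
    9: "Loaded Language",
    10: "Exaggeration or Minimisation",
    11: "Flag-waving",
    12: "Doubt",
    13: "Appeal to fear or prejudice",
    14: "Slogans",
    15: "Thought-terminating cliché",
    16: "Bandwagon",
    17: "Reductio ad Hitlerum",
    18: "Repetition",
    19: "Neutral Political",
    20: "Non-Political",
    21: "Meme humor",
}


def decode_labels(label_field: str) -> List[str]:
    """Convert a comma-separated label string (e.g. "3,8,9") into technique names.

    Assumes the comma-separated format shown in the example rows; empty
    tokens are skipped and unknown IDs raise KeyError.
    """
    return [
        PROPAGANDA_TECHNIQUES[int(token)]
        for token in label_field.split(",")
        if token.strip()
    ]


print(decode_labels("3,8,9"))
# ['Whataboutism', 'Name Calling', 'Loaded Language']
```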
Tweetid | Propaganda Techniques |
---|---|
990189929836699648 | 3,8,9 |
1148798276281364480 | 8,9,13,14 |
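Because each tweet can carry several techniques, multi-label classifiers usually expect a fixed-length binary (multi-hot) vector per tweet rather than a label string. Below is a minimal sketch, assuming the label column holds comma-separated IDs in the range 1-21 as in the label table. `to_multi_hot` is a hypothetical helper; scikit-learn's `MultiLabelBinarizer` would be an off-the-shelf alternative.

```python
from typing import List

NUM_TECHNIQUES = 21  # label IDs run from 1 to 21


def to_multi_hot(label_field: str, num_labels: int = NUM_TECHNIQUES) -> List[int]:
    """Encode a comma-separated label string such as "3,8,9" as a 0/1 vector.

    Position i of the vector corresponds to label ID i + 1.
    """
    vector = [0] * num_labels
    for token in label_field.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas or empty fields
        label = int(token)
        if not 1 <= label <= num_labels:
            raise ValueError(f"label {label} is outside 1..{num_labels}")
        vector[label - 1] = 1
    return vector


vec = to_multi_hot("3,8,9")
print(len(vec))                                     # 21
print([i + 1 for i, bit in enumerate(vec) if bit])  # [3, 8, 9]
```

Raising on out-of-range IDs, rather than silently dropping them, makes malformed label fields visible early.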
To comply with Twitter's developer policy, we do not share the tweet content; we provide only the tweet IDs and their labels. Because Twitter has publicly released the underlying data, you can download the tweets from Twitter directly. To make full use of the dataset, follow the steps below:
- Go to the Twitter Information Operations website: https://transparency.twitter.com/en/reports/information-operations.html
- Go to "03. Download Archive" and enter your email address.
- You should then be granted access to download the released datasets.
- Go to "Datasets released in August 2019".
- Go to "China (July 2019, set 1) - 744 Accounts".
- Go to "Tweet Information (158 MB)".
- Download it.
You now have the Twitter dataset that our labels build on. Join it with our labeled dataset on the tweet ID, using whatever tool you prefer. The result will look like the table below, along with the other fields included in Twitter's release:
Tweetid | Tweet | Propaganda Techniques |
---|---|---|
990189929836699648 | (tweet text) | 3,8,9 |
1148798276281364480 | (tweet text) | 8,9,13,14 |
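The join described above can be sketched in plain Python, streaming the large tweet archive row by row instead of loading it all into memory. This is an illustration under assumptions, not the preprocessing notebook itself. It assumes both files are CSV, that the Twitter archive has `tweetid` and `tweet_text` columns, and that the label file has `tweetid` and `labels` columns. The `labels` column name and the helper names `load_labels` and `join_with_tweets` are hypothetical, so check the actual headers. Tweet IDs are kept as strings, because 18- and 19-digit IDs lose precision if a tool parses them as floating-point numbers. Label strings such as `3,8,9` contain commas, so they must be quoted inside a CSV file.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def load_labels(labels_file: TextIO) -> Dict[str, str]:
    """Read the (assumed CSV) label file into a dict: tweet ID string -> label string."""
    return {
        row["tweetid"].strip(): row["labels"].strip()
        for row in csv.DictReader(labels_file)
    }


def join_with_tweets(tweets_file: TextIO, labels: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Inner join: yield only archive rows whose tweet ID has labels."""
    for row in csv.DictReader(tweets_file):
        tweet_id = row["tweetid"].strip()
        if tweet_id in labels:
            yield {
                "tweetid": tweet_id,
                "tweet_text": row["tweet_text"],
                "labels": labels[tweet_id],
            }


# Tiny in-memory stand-ins for the real files (dummy text, not real tweets).
labels_csv = io.StringIO(
    'tweetid,labels\n'
    '990189929836699648,"3,8,9"\n'
    '1148798276281364480,"8,9,13,14"\n'
)
tweets_csv = io.StringIO(
    'tweetid,tweet_text\n'
    '990189929836699648,dummy text A\n'
    '123,dummy unlabeled text\n'
)

joined = list(join_with_tweets(tweets_csv, load_labels(labels_csv)))
print(joined)
# [{'tweetid': '990189929836699648', 'tweet_text': 'dummy text A', 'labels': '3,8,9'}]
```

For the real files, open each with `open(path, newline="", encoding="utf-8")` so that the `csv` module handles tweet text containing line breaks correctly. With pandas, reading both files with `dtype={"tweetid": str}` and calling `pd.merge(labels_df, tweets_df, on="tweetid", how="inner")` should give the same result.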
We also provide a Jupyter notebook that demonstrates the data preprocessing steps, for your reference.