---
layout: default
title: Ground Truth
---

# Ground Truth

Once we had the dataset, consisting of 591 tweets, we had to label it to build the ground truth. We divided it into three parts, and each of us labelled one part, classifying the tweets with the labels given by Google. The work was very time-consuming, but necessary to test our hand-crafted classifier.
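
The split itself takes only a few lines. This is a minimal sketch, assuming the tweets are held in a Python list and that the three parts are contiguous and as equal in size as possible (the page does not say how the dataset was actually divided); `split_into_parts` is a hypothetical helper name:

```python
from typing import List, TypeVar

T = TypeVar("T")


def split_into_parts(items: List[T], n_parts: int = 3) -> List[List[T]]:
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    parts: List[List[T]] = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


tweets = list(range(591))  # stand-in for the 591 tweets
parts = split_into_parts(tweets)
print([len(p) for p in parts])  # [197, 197, 197]
```

Since 591 is divisible by 3, each annotator gets exactly 197 tweets.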

This process was assisted by a Python script we wrote, which takes the dataset in JSON as input and goes through it tweet by tweet. With the help of another script that suggests some tags based on the entities found in each tweet, we were able to classify them without any problems (if you are interested in it, check it out here).
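
The labelling workflow described above can be sketched as follows. This is not the authors' script, which is only linked from this page. It assumes the JSON dataset is an array of objects with `id`, `text` and `entities` fields. The entity-to-label table `SUGGESTIONS`, the labels `Sports` and `Technology`, and the functions `suggest_tags`, `label_tweets` and `ask_on_console` are all hypothetical stand-ins for the tag-suggestion script and the manual review step:

```python
import json
from typing import Callable, Dict, List

# Hypothetical entity -> label hints; the real tag-suggestion script is not shown here.
SUGGESTIONS: Dict[str, List[str]] = {
    "Champions League": ["Sports"],
    "iPhone": ["Technology"],
}


def suggest_tags(entities: List[str]) -> List[str]:
    """Collect suggested labels for the given entities, without duplicates, in first-seen order."""
    tags: List[str] = []
    for entity in entities:
        for tag in SUGGESTIONS.get(entity, []):
            if tag not in tags:
                tags.append(tag)
    return tags


def label_tweets(json_text: str, choose: Callable[[dict, List[str]], str]) -> List[dict]:
    """Go through the dataset tweet by tweet and record the label picked by `choose`."""
    # Assumed input format: a JSON array of {"id": ..., "text": ..., "entities": [...]} objects.
    tweets = json.loads(json_text)
    labelled = []
    for tweet in tweets:
        hints = suggest_tags(tweet.get("entities", []))
        label = choose(tweet, hints)
        labelled.append({"id": tweet["id"], "text": tweet["text"], "label": label})
    return labelled


def ask_on_console(tweet: dict, hints: List[str]) -> str:
    """Interactive chooser: show the tweet and the hints; an empty answer accepts the first hint."""
    print(tweet["text"])
    print("Suggested:", ", ".join(hints) or "(none)")
    answer = input("Label: ").strip()
    return answer or (hints[0] if hints else "")


# Non-interactive usage: a lambda stands in for the human annotator.
sample = '[{"id": 1, "text": "What a match tonight!", "entities": ["Champions League"]}]'
rows = label_tweets(sample, lambda tweet, hints: hints[0] if hints else "Other")
print(rows)  # [{'id': 1, 'text': 'What a match tonight!', 'label': 'Sports'}]
```

Passing the chooser in as a function keeps the loop testable: `ask_on_console` is the interactive version a human would use, while a lambda can stand in for the annotator, as in the last lines.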

The output is a CSV file, which is easier to read.
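
Writing the labelled tweets to CSV needs only the standard `csv` module. This is a minimal sketch, assuming one row per tweet with hypothetical `id`, `text` and `label` columns; `write_ground_truth` is likewise a hypothetical name:

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Assumed column layout for the ground-truth file.
FIELDS = ["id", "text", "label"]


def write_ground_truth(rows: Iterable[Mapping[str, object]], out: TextIO) -> None:
    """Write a header line, then one CSV row per labelled tweet."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"id": 1, "text": "What a match tonight!", "label": "Sports"},
    {"id": 2, "text": "New phone, who dis?", "label": "Technology"},
]
buffer = io.StringIO()
write_ground_truth(rows, buffer)
print(buffer.getvalue())
```

`DictWriter` quotes fields that contain commas, which matters for free-form tweet text. To write a real file instead of an in-memory buffer, open it with `newline=""` as the `csv` documentation recommends, e.g. `with open("ground_truth.csv", "w", newline="", encoding="utf-8") as f: write_ground_truth(rows, f)` (the file name is hypothetical).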