Nathan Leiby edited this page Oct 27, 2013 · 2 revisions

Crowdsourced Crisis Reporting

In crisis situations like contested elections, natural disasters, and troubling humanitarian situations, there's an information gap between the information providers (the voters, disaster survivors, and victims) and the information responders (the election monitors, aid organizations, NGOs and journalists).

Crowdsourced crisis reporting platforms, like Ushahidi and others, aim to narrow this information gap. They provide centralized software to collect, curate, and publish reports coming from the ground.

Here's how the existing report review process works:

Review process

  1. First, a report is submitted, via SMS, e-mail, Twitter or the web.
  2. Next, admins (often volunteers) review the report. They typically perform a number of tasks:
  • Identify the language (to see if they have the language skills to process it, and if not route it to a different reviewer) and whether translation is needed.
  • Ensure the report hasn't already been submitted, to reduce duplicate work.
  • Categorize the message content, figure out the location, and remove any sensitive information like telephone numbers and names.
  3. The report is published in some format (e.g., on a map in the Ushahidi system).
  4. The publication leads to increased awareness for responders.

Crowdsourced crisis reporting is already a reality, but there's a big problem: currently, the review process is heavily manual. It's slow, tedious, requires domain experts, and doesn't scale up well for large volumes or fast-paced situations.

Our opportunity is to use computing to make this process scale. We've built open-source tools, using natural language processing and machine learning, to support and improve the human review process. No longer must the reviewers do everything from scratch -- now they have automated suggestions to help them. Sensitive information can be automatically flagged, via named entity recognition. Using text classification, categories can be simply confirmed instead of chosen from scratch.
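As a rough illustration of the automatic flagging mentioned above (not the project's actual code): a full system would use a trained named entity recognizer to catch person names, but even simple patterns can surface phone numbers and e-mail addresses for the reviewer. The patterns below are illustrative assumptions only.

```python
import re

# Hypothetical minimal PII flagger. A real pipeline would use a trained
# named entity recognizer for person names; this sketch only catches
# phone numbers and e-mail addresses with regular expressions.
PII_PATTERNS = {
    "phone": re.compile(r"\+?\d[\d\s\-]{7,}\d"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def flag_pii(text):
    """Return (kind, matched_text) spans to highlight for the reviewer."""
    flags = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            flags.append((kind, match.group()))
    return flags
```

The reviewer still makes the final call; the flags simply draw attention to likely-sensitive spans before publication.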

With these tools, crisis report reviewers can reduce the time and tedium they spend processing, and focus their energies on verifying and responding to the reports instead -- the part that really matters.

Terminology and User Stories

Terminology

Instance
A website which is running an Ushahidi map. It usually has a specific cause (e.g., "Zombie attacks in Washington DC") on which it seeks to gather data. It generally allows anyone to send a message via public interfaces (web form, email, SMS) and may also gather data from web streams (Twitter, RSS).
Crowdmap
A platform which allows anyone to host a free Ushahidi instance.
Message
A piece of incoming content, received by email, SMS, or Twitter (voice is also possible).
Report (=Incident)
An annotated message, including additional fields such as _categories_ and a location.
Category
A "tag" which describes a report (e.g., "Buildings on Fire"). In Ushahidi 2.x, categories are sometimes also used to describe workflow information for _annotators_ (e.g., "Needs to be translated"); we believe this is planned to change in Ushahidi 3.x.
"Parent"/"child" category
A "parent" category is the top level of the categorization hierarchy. For example, the "parent" category might be crime while the "child" categories might include assault or theft. In Ushahidi 2.x, a "parent" category cannot be directly selected.
Approve
Allow a _report_ to appear as a dot on the map, visible to the public.
Verify
Mark a _report_ as reliable/true information, because you are confident in it (from viewing multiple sources, seeing a news article, viewing in person, etc.).
Trusted
A reporter/user can be marked as trusted, which results in their _reports_ automatically being _approved_.

Users

Citizen
A person unaffiliated with the Ushahidi _instance_, who submits information to the system via a _message_.
Annotator
A volunteer or employee who takes incoming _messages_ and creates a _report_. They provide additional information (such as one or more _categories_, a location name, and latitude/longitude).
Admin
Runs the _instance_. Can modify the list of available _categories_. Has the ability to approve or verify a message. May also be an annotator.

Work Flow

See [[Crowdsourced Crisis Reporting]] for more.

How to Submit a Report

(screenshot: Submit Report)

Message Queue

(screenshot: Message Queue)

User Stories

Story #0: A Citizen should be assisted in choosing categories.

A citizen files a report from the public user interface. She types the message text, then some categories are automatically highlighted in bold, with a confidence percentage written in parentheses. Sometimes a "parent" category is highlighted, while other times a "child" category is highlighted.

Story #1: An Annotator should be assisted in choosing categories.

An annotator opens the message queue in the admin interface. She selects a message to annotate. Upon opening the message, several of the categories are already highlighted in bold, with a confidence percentage written in parentheses. The annotator is quickly able to check those boxes to confirm what the system has suggested.
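One way such confidence-scored suggestions could be produced is with a text classifier trained on previously annotated reports. A toy sketch (not the project's actual implementation; the categories and training texts below are made up) using a hand-rolled multinomial Naive Bayes:

```python
import math
from collections import Counter, defaultdict

# Toy category suggester: multinomial Naive Bayes over word counts,
# trained on previously annotated reports. Illustrative only.
class CategorySuggester:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> #documents
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def suggest(self, text):
        """Return (category, confidence%) pairs, highest confidence first."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_probs = {}
        for cat, counts in self.word_counts.items():
            total = sum(counts.values())
            lp = math.log(self.doc_counts[cat] / total_docs)
            for w in words:
                # Laplace smoothing so unseen words don't zero the score.
                lp += math.log((counts[w] + 1) / (total + len(self.vocab)))
            log_probs[cat] = lp
        # Normalize to percentages for display next to each checkbox.
        m = max(log_probs.values())
        exp = {c: math.exp(lp - m) for c, lp in log_probs.items()}
        z = sum(exp.values())
        return sorted(((c, 100 * v / z) for c, v in exp.items()),
                      key=lambda p: p[1], reverse=True)
```

The percentages are what the interface would render in parentheses next to each suggested category checkbox.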

Story #2: An Annotator should not have to waste time processing near-duplicate messages.

An annotator opens the message queue in the admin interface. Several messages have a tag next to them ("suspect duplicate"). Upon clicking this tag, the two messages appear side-by-side and the annotator is presented with a choice ("Is this message a duplicate? Yes/No"). The annotator sees that this new message is a re-tweet of a previous message. The Annotator chooses "Yes", and the new message is then linked to the previous message without any additional user input.

Alternatives:

  1. An annotator selects a new message from the queue. They see a listing of all existing reports whose content is a near duplicate. The annotator then knows whether this information has already been recorded on the map, making re-annotation unnecessary.

  2. Several annotators are facing hundreds of new messages in the queue. Twenty of these new, non-annotated messages are grouped together before the annotator reads them, because they are near duplicates. The annotator can then respond to the near duplicates (e.g., Twitter re-tweets) as a unit, saving time over annotating each of them individually.
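A minimal sketch of how "suspect duplicate" tags might be generated, using Jaccard similarity over word sets. The 0.8 threshold is an illustrative guess; a production system would tune it and likely use shingles or MinHash to scale to large queues.

```python
# Hypothetical near-duplicate check: Jaccard similarity over lowercased
# word sets. Threshold and function names are assumptions for this sketch.
def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def suspect_duplicates(new_message, existing_messages, threshold=0.8):
    """Return existing messages similar enough to queue for reviewer confirmation."""
    return [m for m in existing_messages
            if jaccard(new_message, m) >= threshold]
```

The annotator still confirms each match ("Is this message a duplicate? Yes/No"); the similarity score only decides which pairs get surfaced.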

Story #3: An admin should be assisted regardless of the scale of their instance.

An admin creates a new instance and begins receiving messages. When the admin (also playing the role of annotator) views the message queue and files reports, she notices that categories are already being suggested for the default category set. She is impressed that there's no warm-up period before the machine learning kicks in!

(We may be able to pre-train our classifier to work on specific non-vanilla Ushahidi instances, e.g., if there are 10 baseline categories to use for an Ushahidi instance with an environmental focus, we could guess those categories based on previous training external to this particular instance.)

Story #4 "Global Review": An admin should be able to quickly review and correct mis-labeled data.

A group of expert admins prepares to review all reports, updating their categories to strictly adhere to the UN OCHA guidelines. Because these guidelines are complex and specific, citizens and volunteer annotators sometimes miscategorize reports initially, despite their best intentions. The machine suggests to the experts a list of reports that are likely to be labeled incorrectly or to contain urgent content. The experts address these messages first, so the highest-risk (incorrect, urgent) items are fixed promptly.
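One plausible way to order that review queue (a sketch under assumptions, not the project's design): score each report by how little the classifier agrees with its human-assigned category, and surface the biggest disagreements first. Here `suggest` is an assumed classifier callable returning (category, confidence%) pairs, like the suggester described earlier.

```python
# Illustrative triage for global review. `suggest` is an assumed
# text-classifier callable: text -> [(category, confidence_percent), ...].
def review_priority(report, suggest):
    suggestions = dict(suggest(report["text"]))
    # Low classifier confidence in the human-assigned label -> high priority.
    return 100.0 - suggestions.get(report["category"], 0.0)

def triage(reports, suggest):
    """Return reports sorted most-suspect first for expert review."""
    return sorted(reports, key=lambda r: review_priority(r, suggest),
                  reverse=True)
```

Urgency scoring (the other signal the story mentions) could be folded into the same priority function, e.g. as a weighted sum with the disagreement score.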

Story #5 "PII": An admin should be able to quickly remove personally identifying information (PII), so that it never appears on the public map.