Web applications for human annotation on documents

alxdru/app-anno

annotate-app

A cloud-based web application that lets data annotators create text annotations for Natural Language Processing tasks. The project's main goal was to offer a better alternative to today's free annotation applications: a single cloud tool where annotators can receive, assign and collaborate on annotation tasks.

Inbound annotation tasks are received through message-queue consumer microservices.

Authentication is handled by redirecting to Auth0, using the Auth0 library.

Check out the other repositories for the microservices involved in the backend.

UI Platform Flow

Tasks tab

The user chooses an annotation task and proceeds to annotate it.

image

Annotate tab

Main tab

The user can read the task description and the labels available for annotation. To select text, click on individual words, or select several words at once by holding the mouse button, dragging and releasing.

image

Annotation Menu

The selected text is displayed and a label can be chosen. The menu can then either be closed or used to save the annotation.

image

Annotations tab

The user's annotations are displayed together with their associated task name.

image

Statistics tab

The user's statistics are displayed: how many tasks have been annotated out of the open tasks.

image

List Documents tab

In this tab, users can see annotations that conflict with those of other annotators and attempt to resolve them. Clicking Solve under the conflicting user redirects the current user to the annotated piece of text, where the label can be changed to resolve the conflict.

image

Managing annotations

The sections below map the JSON objects returned by REST calls to the corresponding features of the application.

Labels

These are the types of labels that can be assigned to annotated portions of text. The list can be extended with other, more specific labels.

"parameters": {
  "labels": [{
    "_id": {
      "$oid": "60d6fbbff059c100830017c6"
    },
    "name": "T_ORG",
    "display_name": "Organization"
  }, {
    "_id": {
      "$oid": "60d6fbbff059c100830017c7"
    },
    "name": "T_LOC",
    "display_name": "Location"
  }, {
    "_id": {
      "$oid": "60d6fbbff059c100830017c8"
    },
    "name": "T_PERS",
    "display_name": "Person"
  }]
}
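
As a small illustration of how a client might consume this list, the sketch below builds a lookup from the internal label name to its display name. The helper name `buildLabelLookup` is hypothetical and not part of the project; only the `name` and `display_name` fields come from the JSON above.

```javascript
// Hypothetical helper (not from the project): maps internal label names
// such as "T_ORG" to the display names shown to annotators.
function buildLabelLookup(labels) {
  const lookup = new Map();
  for (const label of labels) {
    lookup.set(label.name, label.display_name);
  }
  return lookup;
}

// Sample data copied from the label list above (the _id fields are omitted).
const labelParameters = {
  labels: [
    { name: "T_ORG", display_name: "Organization" },
    { name: "T_LOC", display_name: "Location" },
    { name: "T_PERS", display_name: "Person" },
  ],
};

const labelLookup = buildLabelLookup(labelParameters.labels);
console.log(labelLookup.get("T_LOC")); // prints "Location"
```

Extending the application with another label would then only mean adding one more entry to the `labels` array.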

Annotation Structure

At the end of the annotation process, the following JSON structure is saved:

{
  "_id": {
    "$oid": "60e76cd5df510e81a0c56f27"
  },
  "user": {
    "id": "auth0|60e485956c9def00704d7aa7",
    "name": "First Last",
    "email": "[email protected]"
  },
  "annotationProperties": [{ // List containing all the annotations made by the user
    "labels": ["Organization"],
    "_id": {
      "$oid": "60e76cd5df510e81a0c56f28"
    },
    "entity": "Organization ",
    "startPosition": "50", // Start position in the annotated text corpus
    "endPosition": "62" // End position in the annotated text corpus
  }, {
    "labels": ["Person"],
    "_id": {
      "$oid": "60e76cd5df510e81a0c56f29"
    },
    "entity": "Person ",
    "startPosition": "74",
    "endPosition": "80"
  }],
  "taskId": "60d6fbbaf059c100830017c0", // ID of the annotation task
  "taskText": "This is an annotation task",
  "__v": 0
}
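
Note that the positions are stored as strings. A minimal sketch of how a consumer of this structure might flatten it into spans with numeric offsets is shown below; the helper name `toSpans` and the output shape are assumptions made for illustration, not part of the project.

```javascript
// Hypothetical helper (not from the project): converts a saved annotation
// document into plain spans with numeric start and end offsets.
function toSpans(annotationDoc) {
  return annotationDoc.annotationProperties.map((prop) => ({
    labels: prop.labels,
    start: Number.parseInt(prop.startPosition, 10),
    end: Number.parseInt(prop.endPosition, 10),
  }));
}

// Reduced copy of the saved structure above.
const savedAnnotation = {
  taskId: "60d6fbbaf059c100830017c0",
  annotationProperties: [
    { labels: ["Organization"], startPosition: "50", endPosition: "62" },
    { labels: ["Person"], startPosition: "74", endPosition: "80" },
  ],
};

const spans = toSpans(savedAnnotation);
console.log(spans[0].end - spans[0].start); // prints 12
```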

Managing Conflicts

Conflicts arise when the same part of the text corpus is annotated by multiple users with different labels. These conflicts should be resolved before the annotations are used by an NLP system, where inconsistent labels could cause confusion.

The application can query for this kind of conflict. The JSON objects returned by these queries and rendered by the app look as follows:

{
  "values": [
    {
      "conflictedUser": {
        "name": "First Last",
        "id": "google-oauth2|101297217533613321464",
        "email": "[email protected]" // Email address of annotator the current user conflicted with. Helpful for working on a resolution.
      },
      "_id": "610e9a7b60f0f00083145078",
      "conflictedProperties": [
        {
          "_id": "610e9a7b60f0f00083145079",
          "conflictedEntity": "Watermelon Sugar",
          "conflictedStartPos": "167",
          "conflictedEndPos": "185"
        }
      ],
      "annotationId": "610e9a0760f0f0008314506f",
      "taskId": "6108222e8d1404008260e9ea",
      "taskText": "Annotate news texts with provided labels" // Provided task text for better clarification to annotators
    }
  ]
}
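
The backend logic that produces these conflict objects is not shown in this repository. As a rough sketch only, one way to detect such conflicts is to compare two annotation documents for the same task and report spans that match in position but differ in labels. The function name `findConflicts`, the exact-span matching rule, and the assumption that `entity` holds the selected text are all illustrative choices, not the project's confirmed behaviour.

```javascript
// Hypothetical sketch (not the project's implementation): two annotations
// conflict when they cover the same span of the same task but carry
// different label sets.
function findConflicts(mine, theirs) {
  if (mine.taskId !== theirs.taskId) return [];
  const conflicts = [];
  for (const a of mine.annotationProperties) {
    for (const b of theirs.annotationProperties) {
      const sameSpan =
        a.startPosition === b.startPosition && a.endPosition === b.endPosition;
      const sameLabels =
        a.labels.length === b.labels.length &&
        a.labels.every((label) => b.labels.includes(label));
      if (sameSpan && !sameLabels) {
        // Field names follow the conflictedProperties entries shown above.
        conflicts.push({
          conflictedEntity: a.entity,
          conflictedStartPos: a.startPosition,
          conflictedEndPos: a.endPosition,
        });
      }
    }
  }
  return conflicts;
}

// Two annotators label the same span of the same task differently.
const myAnnotation = {
  taskId: "6108222e8d1404008260e9ea",
  annotationProperties: [
    { labels: ["Organization"], entity: "Watermelon Sugar", startPosition: "167", endPosition: "185" },
  ],
};
const otherAnnotation = {
  taskId: "6108222e8d1404008260e9ea",
  annotationProperties: [
    { labels: ["Person"], entity: "Watermelon Sugar", startPosition: "167", endPosition: "185" },
  ],
};

const detectedConflicts = findConflicts(myAnnotation, otherAnnotation);
console.log(detectedConflicts.length); // prints 1
```

A real implementation might also treat partially overlapping spans as conflicts; this sketch only handles exact matches.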

Annotation Task

{
  "_id": {
    "$oid": "60bfa444b3dce54f1055b491"
  },
  "createdAt": "2021-06-08T16:53:52.530Z",
  "description": "This is an annotation task",
  "maxUsers": 1,
  "type": "Entity annotation",
  "parameters": {
    "labels": [{ // Labels that can be used in this annotation task
      "name": "T_ORG",
      "display_name": "Organization"
    }, {
      "name": "T_LOC",
      "display_name": "Location"
    }, {
      "name": "T_PERS",
      "display_name": "Person"
    }],
    "text": "This is task where you need to label text that is Organization, Location, Person..."
  }
}
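
Since each task carries its own list of allowed labels, a client could check an annotation against it before saving. The sketch below is one possible way to do that; `findUnknownLabels` is a hypothetical helper, and comparing against `display_name` is an assumption based on the sample annotation, whose `labels` array holds display names such as "Organization".

```javascript
// Hypothetical helper (not from the project): returns the labels used in an
// annotation that the task does not offer, compared by display_name.
function findUnknownLabels(task, annotationDoc) {
  const allowed = new Set(task.parameters.labels.map((l) => l.display_name));
  const used = annotationDoc.annotationProperties.flatMap((p) => p.labels);
  return [...new Set(used)].filter((label) => !allowed.has(label));
}

// Reduced copy of the task above.
const sampleTask = {
  parameters: {
    labels: [
      { name: "T_ORG", display_name: "Organization" },
      { name: "T_LOC", display_name: "Location" },
      { name: "T_PERS", display_name: "Person" },
    ],
  },
};

// "Date" is an invented label used only to trigger the check.
const candidateAnnotation = {
  annotationProperties: [
    { labels: ["Organization"], startPosition: "50", endPosition: "62" },
    { labels: ["Date"], startPosition: "74", endPosition: "80" },
  ],
};

const unknownLabels = findUnknownLabels(sampleTask, candidateAnnotation);
console.log(unknownLabels); // prints [ 'Date' ]
```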

High-level architecture

Deployment was done on SAP BTP, but any other cloud provider would work with minor changes to the messaging applications. The platform components are:

  • Microservices for communication with external platforms (Node.js + Event Mesh): annotation task ingestion
  • Microservices for internal communication (Node.js + Event Mesh): consumers that read task messages and bring them into the app's storage for consumption
  • RESTful APIs for the application's tabs (Node.js)
  • Application that serves as the main GUI (Vue.js)
  • Authentication system based on Auth0
  • SAP Event Mesh as the messaging service
  • Storage in a MongoDB (NoSQL) instance

image

Project setup

npm install

Compiles and hot-reloads for development

npm run serve

Compiles and minifies for production

npm run build

Lints and fixes files

npm run lint

Customize configuration

See Configuration Reference.