Annotation editor

Introduction

The core of tagtog is the text annotation editor for data augmentation. The editor is designed to make annotating text comfortable. Its minimalist user interface interferes as little as possible with the reading experience, which helps annotators stay focused and work efficiently during annotation tasks.

The annotation editor is used to annotate text manually and/or to train a machine learning model that annotates text automatically. Enabling automatic annotations opens up uses you may not have thought of at first.

This web editor includes features such as automatic annotations, overlapping text annotations, and support for full-text articles, which significantly reduce the time required to annotate text.

tagtog annotation editor with text span annotations and document labels. The editor is mainly divided into: Document area, Toolbar and Sidebar.

Type of annotations

You can annotate at the text span level or at the document level. Let's take a look at the types of annotations you can create using tagtog:

Entity

Span of text representing a named entity. It can be any span: a part of a word, a word, a group of words or a sentence. Each entity belongs to one or more entity classes (e.g. Barack Obama is a person and a politician). Overlapping annotations are supported.

Normalization

Id assigned to a named entity. These annotations help with disambiguation. Normalization, or canonicalization, is the process of assigning an id or unique name to data that has more than one possible representation. This process is supported by dictionaries. For example, in the automotive domain an air filter can refer to a cabin air filter or an engine air filter. With tagtog you can assign the correct reference to the entity. Each entity can be assigned one or more ids (e.g. an id from Wikipedia and an id from your internal database).

Entity label

Label (boolean, string, enum) assigned to a named entity. Each entity can be assigned one or more labels.

Let's say you are extracting technical issues from reports in a CRM. When annotating those reports, you can add extra information to those entities (technical issues), for example severity. You can use this metadata to build a statistical model that retrieves the severity given a particular technical issue in a specific context.

Relation

Relation between two named entities. Each relation belongs to one specific relation type. For example, in "BRCA2 is located on chromosome 13", the relation type is_located connects the gene BRCA2 to the location chromosome 13.

Currently tagtog supports bidirectional relationships (A relates to B, and B relates to A) to connect two entities. If you want to connect more than two entities you need to create more than one relation.

To set or see relations, you first need to define at least one Relation Type in Settings > Relations. Otherwise, the option to See or Add relations in the menu will be disabled.

Document label

Label (boolean, string, enum) assigned to a document or text. One or more labels can be assigned to a document. These annotations help in text or intent classification.

For example, if you are classifying emails in order to dispatch them to different departments, you can create a document label (enum) and classify emails as, for example, sales, technical support or legal. You can use the labeled data to train a text classifier model and classify emails automatically.
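
The annotation types above can be pictured as a small data model. The sketch below is illustrative only: the class and field names (Entity, Relation, AnnotatedDocument), the dictionary name "internal_db" and the id "G-001" are hypothetical, and the layout does not reflect tagtog's actual storage or export format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical model for illustration; not tagtog's real data format.
LabelValue = Union[bool, str]  # boolean, string, or enum value (stored as str)


@dataclass
class Entity:
    start: int          # character offset where the span begins
    end: int            # character offset just past the span
    entity_type: str    # e.g. "gene"
    normalizations: Dict[str, str] = field(default_factory=dict)  # dictionary name -> id
    labels: Dict[str, LabelValue] = field(default_factory=dict)   # entity labels


@dataclass
class Relation:
    relation_type: str  # e.g. "is_located"
    first: Entity
    second: Entity      # a relation connects exactly two entities


@dataclass
class AnnotatedDocument:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    labels: Dict[str, LabelValue] = field(default_factory=dict)   # document labels


text = "BRCA2 is located on chromosome 13"
gene = Entity(0, 5, "gene", normalizations={"internal_db": "G-001"})  # made-up id
location = Entity(20, 33, "location")
doc = AnnotatedDocument(
    text=text,
    entities=[gene, location],
    relations=[Relation("is_located", gene, location)],
    labels={"reviewed": False},
)
```

Because a relation in this sketch holds exactly two entities, connecting three entities takes at least two Relation objects, which mirrors the rule described under Relation.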

Hotkeys map

If you hover the mouse over the hotkeys icon, the list of hotkeys is displayed.

Hotkey    Description
[         Previous document in the pool
]         Next document in the pool
s         Save document
r         Start a new relation (only available when the annotation menu is visible)
d         Delete annotation (only available when the annotation menu is visible)

Components

The editor is mainly divided into: Document area, Toolbar and Sidebar.

Document area

The text is displayed in the document area. There you can read and annotate text.

Text annotations

Once a piece of text is annotated, it becomes an entity. In tagtog you can operate on entities, for example by normalizing them or relating them to each other.

The background color of each annotation depends on the color picked for the Entity Type. The font color changes based on the background color so that the contrast is high enough to read comfortably.


Gene names in green, mutations in red. The font color changes depending on the entity background color.
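
The source does not say how the editor picks the font color. As a minimal sketch of one common approach, the code below chooses black or white text by comparing contrast ratios computed with the WCAG 2.x relative luminance formula. The function names are hypothetical and this is not presented as tagtog's implementation.

```python
def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an sRGB color such as '#ffcc00' (WCAG 2.x formula)."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Convert each gamma-encoded channel to linear light.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def font_color_for(background: str) -> str:
    """Return black or white, whichever contrasts more with the background."""
    lum = relative_luminance(background)
    contrast_with_white = 1.05 / (lum + 0.05)
    contrast_with_black = (lum + 0.05) / 0.05
    return "#000000" if contrast_with_black >= contrast_with_white else "#ffffff"
```

With this rule, a bright background such as yellow gets black text and a dark background such as navy gets white text.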

Create new text annotations

A new text annotation is created by highlighting text with the mouse. Position the cursor at the beginning of the text you want to highlight. Press and hold your primary mouse button (commonly the left button). While holding the button, drag the cursor to the end of the text and release. All the text from the beginning to the end is then highlighted with the same Entity Type as the previous text annotation. Currently, the only way to change the entity type used for new annotations is to first change the entity type of an existing annotation.

Tips & tricks:

  • If you double-click, you annotate the word clicked.
  • If you try to annotate a word that starts or ends with a space, the space won't be annotated.
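
The two tips can be sketched as operations on character offsets. This is an illustration under stated assumptions, not the editor's code: the helper names trim_span and word_at are hypothetical, and treating a "word" as a run of non-whitespace characters is an assumption (the editor may tokenize differently).

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def trim_span(text: str, start: int, end: int) -> Optional[Span]:
    """Shrink a selection so it neither starts nor ends with whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    # A selection made only of whitespace yields no annotation.
    return (start, end) if start < end else None


def word_at(text: str, offset: int) -> Optional[Span]:
    """Span of the whitespace-delimited word under a double-click, if any."""
    if not 0 <= offset < len(text) or text[offset].isspace():
        return None
    start, end = offset, offset + 1
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end
```

For the text "the HER2 gene", selecting " HER2 " (offsets 3 to 9) and double-clicking inside "HER2" both resolve to the span (4, 8).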

Overlapping text annotations

Just create a new annotation that is contained within the span of an existing one or that overlaps only part of it. Overlapping text annotations are recognizable at a glance without distracting you from reading the text.

Example of a contained annotation: the car make Toyota is contained in the model Toyota Corolla.

Three entities annotated, two annotations are overlapping

Sample of customer feedback. Two annotations (first in pink, second in yellow) within the same span representing a vehicle part and the failing part.


Pre-annotations

Pre-annotations are automatic annotations triggered by the creation or removal of another equal annotation (same entity type and same text). They increase annotator efficiency because potential candidates for new or to-be-removed annotations are identified automatically.

There are two types of pre-annotation:

Pre-selection

Equal entities that are annotated upon manual annotation. For example, if you annotate HER2 as Entity Type Gene, all occurrences of the string "HER2" will be annotated as Entity Type Gene. Pre-selections are displayed with a yellow border and the background color of the Entity Type. If you click on one of these pre-annotations, it turns into a regular annotation.

Pre-deselection

Equal entities that are removed upon manual removal. For example, if you remove an existing annotation with the text "HER2" and Entity Type Gene, all annotations with the text "HER2" and the same Entity Type will be pre-deselected. Pre-deselections are displayed with a yellow border and a white background. If you click on one of these pre-deselections, the annotation is removed.
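
The candidate search behind both pre-annotation types can be sketched as plain string matching. This is a minimal sketch, assuming exact, case-sensitive matching of a non-empty string; the function names are hypothetical, and the sets passed as existing are assumed to hold only spans of the same Entity Type.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def preselection_candidates(text: str, annotated: Span, existing: Set[Span]) -> List[Span]:
    """Other occurrences of the annotated string that are not yet annotated."""
    needle = text[annotated[0]:annotated[1]]
    candidates = []
    pos = text.find(needle)
    while pos != -1:
        span = (pos, pos + len(needle))
        if span != annotated and span not in existing:
            candidates.append(span)
        pos = text.find(needle, pos + 1)
    return candidates


def predeselection_candidates(text: str, removed: Span, existing: Set[Span]) -> List[Span]:
    """Remaining annotations (same entity type) whose text equals the removed one."""
    needle = text[removed[0]:removed[1]]
    return sorted(s for s in existing if s != removed and text[s[0]:s[1]] == needle)
```

In the text "HER2 binds HER2 and HER3", annotating the first "HER2" makes the second one a pre-selection candidate, while "HER3" is left alone because its text differs.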

Annotation Menu

Click a text annotation with the primary mouse button (commonly the left button) to display the annotation menu.

These are the actions you can perform:

Delete (hotkey: d)

Delete the annotation.

Add relation (hotkey: r)

Start a relation if a Relation Type is defined for the Entity Type of this entity. Once the relation is initialized, the annotations you can relate your entity to are highlighted. Other annotations are faded to indicate that you cannot relate the entity to them.

Click on one of the available entities to set the relation. From that moment, both entities are connected and both display a relation icon on top.

See relations (no hotkey)

See the relations this entity is part of.

Change Type (no hotkey)

Change the Entity Type of the entity. If you hover the mouse over this menu item, the list of possible Entity Types shows up.

Normalizations

Each dictionary created for the entity type will appear as an input box. If the box is not empty, the entity is normalized to that value.

If you type at least 3 characters, a list of recommended dictionary entries will appear. To select a normalization, choose an entry and press the Enter key or click the ↵ icon.
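
The three-character threshold can be sketched as a guard in front of the lookup. The matching strategy (case-insensitive substring match) and the limit parameter are assumptions made for illustration; the source does not describe how entries are ranked or matched.

```python
from typing import List

MIN_QUERY_LENGTH = 3  # suggestions appear once at least 3 characters are typed


def suggest(query: str, entries: List[str], limit: int = 10) -> List[str]:
    """Return dictionary entries containing the query, ignoring case."""
    query = query.strip().lower()
    if len(query) < MIN_QUERY_LENGTH:
        return []  # too short: show nothing yet
    return [e for e in entries if query in e.lower()][:limit]
```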

Update dictionary from annotation editor

If you are using dictionaries, these are automatically updated upon manual normalization. If you add a new normalization, this will either add a new entry to the dictionary or update an existing entry with a new term. By design, the dictionary won't be updated when a normalization is removed.

You can always download the most up-to-date version of a dictionary at Settings > Dictionaries.
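
The update rule described in this section can be sketched with a dictionary that maps each entry id to its set of terms. The representation and the function names are hypothetical; only the behavior (add or extend on normalization, no change on removal) comes from the description above.

```python
from typing import Dict, Set

Dictionary = Dict[str, Set[str]]  # entry id -> terms normalized to that id


def record_normalization(dictionary: Dictionary, entry_id: str, term: str) -> None:
    """Add the term to an existing entry, or create the entry if it is new."""
    dictionary.setdefault(entry_id, set()).add(term)


def remove_normalization(dictionary: Dictionary, entry_id: str, term: str) -> None:
    """By design, removing a normalization leaves the dictionary untouched."""
    return None


parts: Dictionary = {}
record_normalization(parts, "PART-1", "cabin air filter")  # creates a new entry
record_normalization(parts, "PART-1", "cabin filter")      # adds a term to it
remove_normalization(parts, "PART-1", "cabin filter")      # dictionary unchanged
```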

Toolbar

The toolbar is located at the top of the document area. From it you can perform these actions:

Original source

If the document or text comes from a known provider, clicking this link takes you to the original source.

For example, if you upload a PubMed document by PubMed ID (PMID), tagtog recognizes the source. Clicking this button takes you to the article in PubMed.

Annotations from other users

Click on the user list to show all the project members. Click on the member you are interested in, and that user's version of the annotations will be displayed in the document area.

Depending on your permissions, you may or may not be able to edit the different versions of the annotations. A lock icon indicates that your permissions on that version are read-only.

More information on multi-user annotation

Import annotations from other versions

You might want to start from the annotations of another user or replace the master version with your annotations. For such cases, you can use the import option in the toolbar.

If you click on that option, a list of actions shows up:

Action            Description
Copy to master    Replace the master version of the annotations with the version displayed in the document area.
Copy to mine      Replace your annotations with the version displayed in the document area.

The availability of these options depends on the role permissions. More information on multi-user annotation

Pre-annotations

Here you can activate or deactivate the two kinds of pre-annotations: pre-selections and pre-deselections.

Each time you load a new document, the default settings from Settings > Annotations apply. Changes made in this menu affect only the current document and do not alter those defaults. For more information, see the Pre-annotations section under Document area.

Save a document

Each time a change is made in the document (e.g. a new annotation or relation is added), the Save button turns green to indicate there are changes to save. Click the button to save the changes.

Confirm a document

Users usually confirm a document once its annotations have been reviewed. Confirmation indicates that the document can be used as training data for AI, or simply that all annotations have been reviewed by a human. There are different annotation flows you can use for your project.

To confirm a document, click the confirm button in the toolbar.

Once you have confirmed a document, many actions are disabled. The button is a toggle: click it again to undo the confirmation.

View / output mode

Here you can select how to display or export the annotated document.

Annotated documents can be exported in various output formats.

tagtog Web Editor refers to the visualization of the annotated document in the annotation editor.


Clear annotations

Click the clear annotations button to remove all the annotations in the current document. This won't remove the document.


Remove document

Click the remove document button to remove the document from the document pool.


Document navigator

The document navigator has two buttons, one with an arrow pointing left and one with an arrow pointing right. Click the left arrow to load the previous document in the pool, or the right arrow to load the next one.


Sidebar

The sidebar's appearance changes depending on how you configured your project. It only displays actionable items for the entity types used in the project.

These are the components you can find in the sidebar:


Document labels

If you have any document labels configured at Settings > Document Labels, they appear in this section of the sidebar. Here you can set the value of a document label for the current document. Once a change is made, you can save the document as usual.

Click the reset icon next to a label to return it to its default value (shown as ?).


Entity tally

The entity tally displays statistics for each entity type in the current document.

At the top of this section you find a summary with the number of entities annotated and the number of entities not normalized. Below the header, you can find the statistics for the annotations in the current document:

Entities are classified under entity types. For each type, these statistics are displayed: number of entities, manually annotated entities, automatically annotated entities, and normalized entities.


To digest the status of the annotated entities as fast as possible and reduce the noise of repeated annotations, you can group entities by:

Normalization (default)

Group annotations by normalization. This is very useful for understanding which concepts are annotated in the current document.

Entities that are not normalized are highlighted so you can spot them at a glance.

Click the expand icon to open a view with the details of each individual annotation.

Text

Group annotations by text. The same entity is often repeated many times in a text. Sometimes it is more useful to know that only two unique entities have been identified, e.g. gene BRCA2 and gene HER2, than to get the total number of annotations, repeats included.

No group

Entities are not grouped. They appear one by one, in the same order as in the text. This is very handy if you need to review every single annotation. Soon we will enable hotkeys so you can navigate this menu quickly and easily.

Entities grouped by normalization. If you click on any of the annotations listed, the annotation will be highlighted in the text of the document area.
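
The three grouping modes can be sketched as follows. The Annotation record, the mode names passed as strings, and the ids such as "G-1" are hypothetical; the sketch assumes the annotations passed in already belong to one entity type, as in the tally.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Annotation(NamedTuple):
    text: str
    entity_type: str
    normalization: Optional[str]  # None when the entity is not normalized


def group_annotations(annotations: List[Annotation], by: str) -> Dict[object, List[Annotation]]:
    """Group annotations by 'normalization', by 'text', or not at all ('none')."""
    if by == "none":
        # One group per annotation, in document order.
        return {i: [a] for i, a in enumerate(annotations)}
    if by not in ("normalization", "text"):
        raise ValueError(f"unknown grouping: {by}")
    groups = defaultdict(list)
    for a in annotations:
        groups[getattr(a, by)].append(a)
    return dict(groups)


annotations = [
    Annotation("BRCA2", "gene", "G-1"),  # made-up normalization id
    Annotation("HER2", "gene", None),    # not normalized
    Annotation("BRCA2", "gene", "G-1"),
]
by_text = group_annotations(annotations, "text")
by_norm = group_annotations(annotations, "normalization")
ungrouped = group_annotations(annotations, "none")
```

Here three annotations collapse into two groups by text, and the group keyed by None collects the entities that still need normalization.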


Relation tally

The relation tally keeps the count of the relations defined in the current document. In this section you can also remove existing relations by clicking the remove button.

This tally only appears if you have relation types defined at Settings > Relations.