Welcome to tagtog
This is the official documentation for tagtog, an efficient text annotation tool ready to train AI. Served on the cloud or on-premises. Easy.
Annotate manually or automatically. Use already trained machine learning models or train your own model to annotate and work at scale. Simply annotate manually and tagtog will generate a model with no hassle in deployment or maintenance. Save time and costs.
Invite others to annotate in your projects and collaborate to build awesome things together: find custom insights in text automatically, augment or index your data, train your own AI, classify text, and more.
Below are some things you can do with tagtog. For more in-depth information or tutorials, use the navigation to the left to explore our documentation.
What can you achieve?
Manual text annotation
Use the text annotation editor to quickly annotate and normalize entities, relations and more. If you only need to annotate at document level, you can use labels to tag documents. The user interface adapts automatically so you only see what you need. No more complex or unintuitive annotation interfaces: engage your domain experts from the very beginning.
Too much work for a single person? Invite others and work as a group. Each annotator can annotate the same text to facilitate the review process. Once the annotations are complete, you can export them in JSON format.
Not fast enough? Use automatic annotations to speed up annotators' tasks. Use trained models or customize your own. Users see the automatic annotations and only need to act when predictions are wrong. Powered by NLP and active learning algorithms, tagtog uses the corrections to improve accuracy and continuously reduce annotators' workload.
Do you need to deal with PDFs? Use the PDF Annotation tool to annotate native PDFs within tagtog.
Get relevant insights from text automatically
Insights are just meta-information (annotations) on top of the analyzed text. You can either combine some of our already trained machine learning models or create a custom model adapted to your problem to generate this meta-information automatically, without the hassle of deployments, setup, etc. Just annotate: you don’t need to code, deal with complex installations or juggle data.
1. Provide tagtog with as many initial examples as possible. Bootstrap a model with pre-annotated data or dictionaries with the terminology you would like to identify automatically. If you don't have these resources, no worries. Just start from scratch.
2. Train a model: import text (cloud or on-premises) through the API or the user interface. The text is annotated automatically. You or a team just need to correct the wrong model predictions in the annotation editor. Repeat this step until you get the accuracy level required.
3. Congratulations! You get a machine learning model ready for production use. Simply send text to the API or user interface to retrieve relevant insights. If you detect any problem with the results, you can always continue annotating to fine tune the model.
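The annotate, correct, repeat cycle in the steps above can be illustrated with a toy sketch. It does not use tagtog or its API: the "model" is just a set of dictionary terms, the reviewer's corrections are simulated by a `gold` mapping, and every name here (`annotate`, `train_until`, the sample texts) is hypothetical.

```python
# Toy human-in-the-loop cycle (illustration only, not tagtog's algorithm):
# a dictionary "model" annotates a batch, a reviewer corrects the misses,
# and the corrected terms are available for the next batch.

def annotate(text, dictionary):
    """Return the set of dictionary terms found in the text (case-insensitive)."""
    lowered = text.lower()
    return {term for term in dictionary if term in lowered}

def accuracy(batch, gold, dictionary):
    """Fraction of texts whose predicted terms exactly match the reviewed terms."""
    hits = sum(annotate(text, dictionary) == gold[text] for text in batch)
    return hits / len(batch)

def train_until(batches, gold, seed_terms, target=0.9):
    """Process batches until one reaches the target accuracy."""
    dictionary = {term.lower() for term in seed_terms}
    history = []
    for batch in batches:
        score = accuracy(batch, gold, dictionary)
        history.append(score)
        if score >= target:
            break
        for text in batch:
            # Simulated reviewer: add the terms the model missed.
            dictionary |= gold[text] - annotate(text, dictionary)
    return dictionary, history

# Simulated reviewed annotations for four short texts.
gold = {
    "The brake pads squeal.": {"brake pads"},
    "Replace the alternator and brake pads.": {"alternator", "brake pads"},
    "The alternator failed again.": {"alternator"},
    "New brake pads fitted.": {"brake pads"},
}
texts = list(gold)
batches = [texts[:2], texts[2:]]

dictionary, history = train_until(batches, gold, seed_terms={"brake pads"})
print(history)  # accuracy per batch
```

The first batch exposes a missed term, the correction is folded into the dictionary, and the second batch is then annotated correctly, which is the shape of the loop described in step 2.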
Index your data
Use already trained models or simply train your own model within tagtog to annotate your data automatically. These annotations become the meta-information used to index your data. This data augmentation improves discoverability and is key to making search quicker and more intuitive for users.
1. Train a machine learning model as described in the previous use case or use an already trained model.
2. Send your text items to tagtog using the API. Get the annotations in JSON format and decide whether to store these in your own system or keep them at tagtog. Your data is indexed.
3. If you keep these annotations at tagtog, you can use the concept search via API as your search engine. Otherwise, you can store this data locally as meta-information for your own search engine.
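For the option of storing annotations locally as meta-information, a minimal sketch of an inverted index might look like the following. The JSON shape used here (`doc_id`, `entities`, `class`, `normalization`) is an assumption made for illustration and is not tagtog's actual export format; adapt the field names to the JSON you receive.

```python
import json
from collections import defaultdict

def build_index(annotated_docs):
    """Map (entity class, normalized value) -> sorted list of document ids."""
    index = defaultdict(set)
    for doc in annotated_docs:
        for entity in doc["entities"]:
            key = (entity["class"], entity["normalization"])
            index[key].add(doc["doc_id"])
    return {key: sorted(ids) for key, ids in index.items()}

# Hypothetical annotation payload; the field names are assumptions.
raw = """
[
  {"doc_id": "cv-1", "entities": [
    {"class": "softskill", "normalization": "time-management"},
    {"class": "language", "normalization": "python"}]},
  {"doc_id": "cv-2", "entities": [
    {"class": "language", "normalization": "python"}]},
  {"doc_id": "cv-3", "entities": [
    {"class": "softskill", "normalization": "time-management"}]}
]
"""

index = build_index(json.loads(raw))
print(index[("softskill", "time-management")])
```

Keying the index on the pair of entity class and normalized value lets your own search engine answer concept-level lookups without re-reading the documents.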
Train your own AI
With tagtog you can quickly create training data to feed your AI models.
1. Use tagtog to annotate text and create training data. This is a collaborative tool, so you can invite others to share the effort. Augment your data: label documents, annotate text, relations, etc. Export this data to your model to train it.
2. Import the new model's predictions to tagtog and use the annotation editor to correct any wrong predictions. The user interface is built to minimize annotators' effort while maximizing the input for your model. Once done, export the new training data to your model.
3. Repeat the last step until you have tuned your model.
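When exported entity annotations feed a token-based NER model, a common preparation step is converting character-offset spans into per-token BIO labels. The sketch below assumes entities arrive as `(start, end, entity_class)` tuples with an exclusive end offset; that shape, the `to_bio` helper and whitespace tokenization are all illustrative assumptions, not tagtog's export format.

```python
import re

def to_bio(text, entities):
    """Convert character-offset entity spans into per-token BIO labels.

    `entities` is a list of (start, end, entity_class), end exclusive.
    Tokens are whitespace-separated runs; a token is labeled only if it
    lies fully inside a span.
    """
    labeled = []
    for match in re.finditer(r"\S+", text):
        label = "O"
        for start, end, entity_class in entities:
            if start <= match.start() and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                label = prefix + entity_class
                break
        labeled.append((match.group(), label))
    return labeled

text = "Replace the brake pads today"
start = text.index("brake pads")
entities = [(start, start + len("brake pads"), "part")]

bio = to_bio(text, entities)
for token, label in bio:
    print(token, label)
```

Whitespace tokenization is a simplification: tokens with attached punctuation can fall outside a span, so a real pipeline would use the same tokenizer as the target model.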
Already trained, high-quality NER models are ready to extract entities of interest in your domain. You can combine several models to customize your solution. Find some examples below. More are coming.
|Vehicle Parts. Recognize automotive components in text. Perfect for vehicle forums, automotive reports, garages, CRMs, etc.|
|GGP (Gene or Gene Product). Recognize gene names in text.|
|Game of Thrones. Recognize more than 2000 characters of Game of Thrones in text. Perfect for forums or fans.|
NLP annotation editor
tagtog comes with an advanced text annotation tool for data augmentation. Its design is based on the feedback from annotator groups.
Focus is key to generating high-quality annotated corpora, and we know this. Every mistake in the annotations means a step backward and more effort to achieve the desired results. With this in mind, and to make the user feel comfortable while reading, we have created a minimalist user interface where the text is presented with natural spacing, the way any user is used to reading it. Protecting the reading experience makes the tool more accessible. This is especially important when you involve subject matter experts in your annotation projects.
Annotation tasks usually involve dealing with large volumes of text. Speed and efficiency are essential to minimize costs and time. tagtog includes features such as automatic annotations, overlapping text annotations and full-text annotation that significantly reduce the time required to annotate text.
Do you need to annotate or process PDFs? Use the PDF annotation tool, which is fully integrated with the tagtog web interface.
With the annotation editor you can add different types of annotations:
|Entity||Span of text representing a named entity. It can be any span: a part of a word, a word, a sentence or a group of words. Each entity belongs to one or more entity classes (e.g. Obama is a person and a politician).|
|Normalization||Id assigned to a named entity. These annotations help in disambiguation. For example an
|Entity label||Attribute (boolean, string, enum) assigned to a named entity. Let's say you are extracting technical issues from reports in a CRM. When annotating those reports, you can add extra information to those entities (technical issues), for example, the severity of each of them.
Entity labels can be used with specific entity groups (e.g. only can be set for technical issues) or with all entities (e.g. we can set comments for all entities)|
|Relation||Relation between two named entities. Each relation belongs to one specific relation type (e.g.
|Document label||Label (boolean, string, enum) assigned to a document or text. These annotations help in text or intent classification. For example, if you are classifying emails in order to dispatch them to different departments, you can create a document label (enum) and classify emails as, for example,
Define your annotation guidelines within the application and invite others to join your project. A group of users can annotate the same corpus together, and each document or piece of text can be annotated separately by each member to facilitate the review process. You can compare the annotations of each member and use the most suitable ones. Once annotations are ready, each document can be marked as completed to let other team members know that it is ready for review. There is an admin role with privileges to manage annotators' work.
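Comparing two members' annotations of the same document can be reduced to comparing sets of entity spans. The sketch below is one simple way to do that outside the tool, assuming each annotation is a `(start, end, entity_class)` tuple; the `span_agreement` helper and the data are hypothetical, and this is not how tagtog itself compares annotators.

```python
def span_agreement(annotations_a, annotations_b):
    """Compare two annotators' entity sets for one document.

    Returns (agreed, only_a, only_b, f1), where f1 is the pairwise F1
    score 2 * |A and B| / (|A| + |B|), and 1.0 when both sets are empty.
    """
    a, b = set(annotations_a), set(annotations_b)
    agreed = a & b
    f1 = 2 * len(agreed) / (len(a) + len(b)) if (a or b) else 1.0
    return agreed, a - b, b - a, f1

# Two annotators agree on the first span but disagree on the class of the second.
annotator_a = [(0, 5, "person"), (10, 20, "org")]
annotator_b = [(0, 5, "person"), (10, 20, "place")]

agreed, only_a, only_b, f1 = span_agreement(annotator_a, annotator_b)
print(f"agreed={len(agreed)} only_a={len(only_a)} only_b={len(only_b)} f1={f1}")
```

The `only_a` and `only_b` sets are the disagreements a reviewer would look at first when deciding which annotations to keep.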
It is easier than ever to collaborate and train machine learning models together. Each time a document is marked as completed, a model is trained automatically and can be used right away. You can of course use the tool only for manual annotation.
Cloud or On-premises
|On the Cloud. To use tagtog on the cloud, you don't need to install anything; just sign up, create a project, and start annotating. From these annotations you can create and use a machine learning model without worrying about hardware requirements, databases, scalability, deployments or any other hassle or cost related to setting up and maintaining a production environment.|
|On-premises. If you need to meet strict privacy regulations or legal requirements, or you simply want a custom installation within your own infrastructure or any public cloud (AWS, Google, Azure, etc.), tagtog is also served on-premises. This is a self-contained version of tagtog (no Internet connection is required), so no data leaves your infrastructure. To make the installation as easy as possible, we offer tagtog as a Docker image.|
Each fragment of text annotated or processed using tagtog is indexed within your project. The search engine makes it easier to discover patterns or find actionable insights. This is especially handy when you have trained a model that automatically annotates relevant information. For example, if you have trained a model that has extracted skills from thousands of CVs, those are indexed and you can search across them using normalizations, entity classes, etc. You can do things like:
entity:softskill:time-management to retrieve all CVs from people who are good at organizing their work time.
This search engine can be used through the user interface or the API. You can use the API directly as your search interface or simply augment your existing engine.
Annotators can use it to find texts that are not annotated yet or documents related to their specific annotation task.
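To make the meaning of a query such as `entity:softskill:time-management` concrete, here is a small local sketch that parses it and filters a set of annotated documents. The `entity:<class>:<value>` grammar is inferred from that single example and may not cover tagtog's full search syntax; the helpers and data below are hypothetical and do not call tagtog.

```python
from typing import NamedTuple

class EntityQuery(NamedTuple):
    entity_class: str
    value: str

def parse_entity_query(query):
    """Split 'entity:<class>:<value>' into its parts; reject other shapes."""
    parts = query.split(":", 2)
    if len(parts) != 3 or parts[0] != "entity" or not all(parts[1:]):
        raise ValueError(f"unsupported query: {query!r}")
    return EntityQuery(parts[1], parts[2])

def search(docs, query):
    """Return ids of documents holding an entity that matches the query."""
    wanted = parse_entity_query(query)
    return [doc_id for doc_id, entities in docs.items()
            if (wanted.entity_class, wanted.value) in entities]

# Hypothetical project: document id -> set of (entity class, value) pairs.
docs = {
    "cv-1": {("softskill", "time-management"), ("language", "python")},
    "cv-2": {("language", "python")},
    "cv-3": {("softskill", "time-management")},
}

matches = search(docs, "entity:softskill:time-management")
print(matches)
```

In practice the query would be sent to the search engine through the user interface or the API; the local filter only shows which documents such a query is expected to select.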
Import text from known sources
When you need training data, bring real content directly from the source with no effort. tagtog supports shortcuts and automation for these resources:
|PubMed is the largest and the most widely used database of life sciences and biomedical literature. It contains more than 26 million scientific or clinical articles going back to 1966.|
|PubMed Central (PMC) is a full-text archive of biomedical and life sciences journal literature with more than 4.7 million articles.|
|tagtog uses the Twitter API to import tweets into tagtog. Coming soon.|
|Reddit is one of the largest communities on the Internet. tagtog uses the Reddit API to import discussions into tagtog. Coming soon.|
|With more than 5 million pages, Wikipedia is the largest encyclopedia ever. Wikidata acts as a central storage repository for Wikipedia and consists mainly of uniquely identified items, each having a label, a description and any number of aliases. This information is typically used for normalization tasks. Coming soon.|
Access our repository of public corpora or share your own.
You can import these corpora into tagtog and build a machine learning model that generates similar annotations automatically. You can also download, reuse or extend these corpora.
|FlyBase||IDP4+||V300||Go to corpora full list|