Inputs & Outputs

Input formats

Raw

Input type Description
Text Plain text.
File See the Files section below.
URL Web address of any website, e.g. http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000245.v1.p1. This is an experimental feature: depending on the HTML being analyzed, the text may not render correctly.
PMID PubMed is a free online database of references on the life sciences. Each record in the PubMed database is assigned a unique identifier, the PMID. A PMID is just a number, e.g. 12781165; the prefixed form PMID12781165 is also accepted. You can enter a comma-separated list of PMIDs, e.g. 25821226,12781165, and each document will be uploaded. You can find the PMID at the bottom of the document's page on PubMed.
PMCID PubMed Central® (PMC) is a free archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). Each record in the PubMed Central database is assigned a unique identifier, the PMCID. A PMCID is a number with the PMC prefix, e.g. PMC165443. You can enter a comma-separated list of PMCIDs, e.g. PMC165443,PMC165213, and each document will be uploaded. The PMCID is usually shown at the top of the document's page on PubMed Central. This feature depends on the availability of the PubMed provider.
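To make the PMID and PMCID input rules above concrete, here is a minimal client-side sketch that normalizes such input before submitting it. The function names are hypothetical and not part of any tagtog library; tolerating spaces around the commas is an assumption, since the rows above only say the IDs are comma-separated.

```python
import re
from typing import List

# Hypothetical helpers mirroring the rules in the table above:
# - a PMID is a bare number, optionally written with a "PMID" prefix
# - a PMCID is a number with the mandatory "PMC" prefix
# - several IDs may be given as a comma-separated list
_PMID_RE = re.compile(r"^(?:PMID)?(\d+)$")
_PMCID_RE = re.compile(r"^PMC(\d+)$")


def parse_pmids(raw: str) -> List[str]:
    """Split a comma-separated PMID list and drop the optional 'PMID' prefix."""
    ids = []
    for token in raw.split(","):
        token = token.strip()  # assumption: surrounding spaces are tolerated
        if not token:
            continue
        match = _PMID_RE.match(token)
        if match is None:
            raise ValueError(f"not a valid PMID: {token!r}")
        ids.append(match.group(1))
    return ids


def parse_pmcids(raw: str) -> List[str]:
    """Split a comma-separated PMCID list; every entry must keep the 'PMC' prefix."""
    ids = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue
        if _PMCID_RE.match(token) is None:
            raise ValueError(f"not a valid PMCID: {token!r}")
        ids.append(token)
    return ids


if __name__ == "__main__":
    print(parse_pmids("25821226, PMID12781165"))  # ['25821226', '12781165']
    print(parse_pmcids("PMC165443,PMC165213"))   # ['PMC165443', 'PMC165213']
```

The two parsers are kept separate because the prefix rules differ: "PMID" is optional and stripped, while "PMC" is part of the identifier and kept.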

Files

You can import files into tagtog. The following formats are supported:

File extension Description
txt Any plain text file
pdf Two variants are available: NativePDF (supported on Cloud-Large and On-Premises ML only), to annotate directly on top of the PDF, and Simple, to annotate a plain-text representation extracted from the PDF.
xml

NCBI Journal Publishing Tag Set (versions JATS 1.0 and NLM 2.x and 3.0). This includes all PLOS journal articles and F1000Research articles.

BioMed Central format. This includes all articles in BioMed Central, ChemistryCentral, or SpringerOpen, among others.

html Sections are not recognized; currently, only the text content is extracted.
source code files Supported programming language extensions include: .java, .scala, .js, .py, .python, .jsx, .c, .h, .mm, .M, .cpp, .sql, .cs, .css, .r, .vb, .php, .swift, .go, .m, .sass, .less, .rb, .sh, .ts, .tsx, .shell
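As a pre-upload sanity check, the extensions listed in the table above can be turned into a simple lookup. This is a hypothetical helper, not tagtog's own detection logic (which happens server-side). Matching is case-sensitive here as an assumption, because the list names .m and .M separately.

```python
from pathlib import PurePath
from typing import Optional

# Source code extensions copied from the table above (case-sensitive on purpose).
SOURCE_CODE_EXTS = {
    ".java", ".scala", ".js", ".py", ".python", ".jsx", ".c", ".h", ".mm",
    ".M", ".cpp", ".sql", ".cs", ".css", ".r", ".vb", ".php", ".swift",
    ".go", ".m", ".sass", ".less", ".rb", ".sh", ".ts", ".tsx", ".shell",
}

# Document formats from the same table, mapped to a human-readable category.
DOCUMENT_EXTS = {".txt": "plain text", ".pdf": "pdf", ".xml": "xml", ".html": "html"}


def classify_upload(filename: str) -> Optional[str]:
    """Return the format category for a filename, or None if it is not listed."""
    suffix = PurePath(filename).suffix  # only the last suffix: "a.tar.gz" -> ".gz"
    if suffix in DOCUMENT_EXTS:
        return DOCUMENT_EXTS[suffix]
    if suffix in SOURCE_CODE_EXTS:
        return "source code"
    return None


if __name__ == "__main__":
    for name in ["paper.pdf", "script.py", "archive.tar.gz"]:
        print(name, "->", classify_upload(name))
```

Bundles such as .tar.gz fall through to None in this sketch, which matches their "coming soon" status.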

Bundle files

File extension Description
tar.gz Gzipped tarball: a bundle of files in any of the accepted formats. Coming soon.
zip Zip archive: a bundle of files in any of the accepted formats. Coming soon.

Annotation input formats

You can also import files that already contain annotations. This is useful if you want to import documents that were annotated outside tagtog (for example, by your own machine learning model) or to update the annotations of a specific document.

Format Description
anndoc

Use the anndoc format to upload, via the API, both a document's content (plain.html) and its annotations (ann.json).
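Since an anndoc upload always consists of two files per document, it can help to verify locally that each document has both halves before calling the API. The sketch below assumes, hypothetically, that the two files share a base name (for example doc1.plain.html and doc1.ann.json); the function name is illustrative, and the code does not call the tagtog API, whose endpoints and parameters are not covered here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed naming convention: <base>.plain.html + <base>.ann.json
CONTENT_SUFFIX = ".plain.html"
ANNOTATION_SUFFIX = ".ann.json"


def pair_anndoc_files(filenames: List[str]) -> List[Tuple[str, str]]:
    """Group files into (content, annotations) pairs by their shared base name.

    Raises ValueError if a document is missing one half of its pair.
    Files with other suffixes are ignored.
    """
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        if name.endswith(CONTENT_SUFFIX):
            groups[name[: -len(CONTENT_SUFFIX)]]["content"] = name
        elif name.endswith(ANNOTATION_SUFFIX):
            groups[name[: -len(ANNOTATION_SUFFIX)]]["annotations"] = name

    pairs = []
    for base in sorted(groups):  # sorted for a deterministic upload order
        parts = groups[base]
        if "content" not in parts or "annotations" not in parts:
            raise ValueError(f"incomplete anndoc pair for {base!r}: {parts}")
        pairs.append((parts["content"], parts["annotations"]))
    return pairs
```

Failing early on an incomplete pair keeps a document from being uploaded with content but no annotations (or the reverse).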

Output formats

Format Description
ann.json Annotations only; see the official ann.json documentation.
html, xml Content only; no annotations are included in these formats.
txt Plain text. Content only; no annotations are included in this format.
entitiestsv Entity annotations as tab-separated values; see the EntitiesTsv documentation.
pubannotation PubAnnotation format; see the official documentation. Coming soon.

Other formats?

We are currently experimenting with other formats to make your work easier. Stay tuned!