Dictionaries are used to normalize entities.
The dictionary file to upload must be a .tsv file (tab-separated values) or a compressed .tar.gz archive containing a single .tsv file.
The dictionary format should follow this pattern:
entity_1_id    recName1    recName2    ...
entity_2_id    recName1    recName2    @@@    altName1    altName2    ...
...
The syntax is simple:
Each entity is defined on a new line. All columns are separated by tabs.
The first column is the entity's unique id. It can be an internal id (e.g. from your own database) or a recognized one from a known source such as Wikipedia.
After the id comes a list of names. These are treated as different names (synonyms) of the same entity.
You can define recommended names and, optionally, alternative names. At least one recommended name must be given. Alternative names are those placed after the special delimiter @@@ (which is itself separated from the names by tabs). Use them when you know that some names appear less frequently than the standard ones. With this information, the system can handle synonyms better.
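The rules above can be sketched in a few lines of Python. This is only an illustration, not the system's own parser: the helper names format_entry and parse_entry, the id city_001, and the sample names are all invented for the example. It assumes that names contain no tab characters and that an entry with no recommended name should be rejected.

```python
ALT_DELIMITER = "@@@"  # special delimiter that introduces alternative names


def format_entry(entity_id, recommended, alternatives=()):
    """Build one tab-separated dictionary line (hypothetical helper)."""
    if not recommended:
        raise ValueError("at least one recommended name is required")
    columns = [entity_id, *recommended]
    if alternatives:
        # The delimiter is a column of its own, separated by tabs.
        columns += [ALT_DELIMITER, *alternatives]
    return "\t".join(columns)


def parse_entry(line):
    """Split one dictionary line into (id, recommended, alternatives)."""
    columns = line.rstrip("\n").split("\t")
    entity_id, names = columns[0], columns[1:]
    if ALT_DELIMITER in names:
        cut = names.index(ALT_DELIMITER)
        recommended, alternatives = names[:cut], names[cut + 1:]
    else:
        recommended, alternatives = names, []
    if not entity_id or not recommended:
        raise ValueError(f"invalid dictionary line: {line!r}")
    return entity_id, recommended, alternatives


# Made-up sample entry: one recommended name, two alternative names.
line = format_entry("city_001", ["Paris"], ["City of Light", "Paname"])
print(repr(line))
print(parse_entry(line))
```

Running a check like parse_entry over every line before uploading is one way to catch entries that are missing a recommended name or that use spaces instead of tabs.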
👉 Here are some sample, reference dictionaries.