Annotation platforms are tools that facilitate the process of adding external information to raw resources of any type.
The information we add varies from one case to another. Images can be annotated with metadata such as the date of creation or the names of the items depicted in them. Text documents can be enriched with linguistic knowledge such as part-of-speech tags or named entity tags. The annotation might be trivial to humans, but not to machines. For further information about linguistic annotation, you may refer to my previous post: “Annotation, Motivations and Methods”.
In this blog post, I will focus on a text annotation tool called “INCEpTION” [1], developed by researchers from the UKP Lab at TU Darmstadt. One of the use cases listed on the INCEpTION project website is the Impresso Project, which provides newspaper archives from the 18th century to the present in several languages.
The version of INCEpTION that I experimented with for this blog post is the standalone INCEpTION 0.13.0.
- Project Creation
Creating an annotation project in INCEpTION requires some pre-existing familiarity with the technical jargon of annotation projects (for example, terms like Layer or Tagset).
INCEpTION shows high adaptability toward new projects. This adaptability can be seen at the first step: importing the textual input data. A simple but important feature is the ability to import text documents in formats such as TXT, PDF, and CoNLL. This saves the project administrator the time of converting their input to a required format, in contrast to other annotation platforms that accept only one specific input format. These inputs may already contain previous annotations, as in the case of CoNLL files.
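As an illustration of why CoNLL import is convenient: such files store one token per line with tab-separated annotation columns, so existing annotations travel with the text. A minimal Python sketch of reading this kind of file (the token/POS/NER column layout below is illustrative, not a fixed INCEpTION requirement):

```python
# Minimal illustration of a CoNLL-style format that annotation tools can import:
# one token per line, blank lines separate sentences, columns carry annotations.
# The column layout here (token, POS tag, NER tag) is only an example.
SAMPLE = """\
Alan\tNNP\tB-PER
Turing\tNNP\tI-PER
worked\tVBD\tO
in\tIN\tO
Manchester\tNNP\tB-LOC

He\tPRP\tO
died\tVBD\tO
in\tIN\tO
1954\tCD\tO
"""

def read_conll(text):
    """Parse CoNLL-style text into a list of sentences, each a list of column tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(tuple(line.split("\t")))
    if current:                       # flush the last sentence
        sentences.append(current)
    return sentences

sentences = read_conll(SAMPLE)
print(len(sentences))        # prints 2
print(sentences[0][0])       # prints ('Alan', 'NNP', 'B-PER')
```

Because the annotations ride along in extra columns, a tool that reads this format gets both the raw text and its previous annotation layers in one import step.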
The user has the option of controlling the page size, font and other settings to be more comfortable annotating text data. INCEpTION does not allow creating folders to organize imported files inside a project. This might be an issue in projects dealing with a large number of files.
The adaptability can also be seen in the definition of annotation layers and tagsets. INCEpTION provides very common linguistic layers, such as Dependency, Lemma, Named Entity and Part of Speech. However, project administrators are not confined to these predefined layers: they can define their own layers, or edit the currently available layers and adapt them to their own projects, using the interface.
Compared to the brat annotation tool, INCEpTION has the advantage of an interface for defining users with different access levels: administrator, curator and annotator.
After the project is created, the annotation process will begin.
A manual annotation cycle consists of devising guidelines, labeling by annotators, evaluation based on inter-annotator agreement (IAA), and revision of the guidelines based on the annotators' labeling. This cycle continues until an acceptable IAA is reached.
I will briefly describe how INCEpTION contributes to each of these steps.
This first step, devising guidelines, is also taken care of by the platform. INCEpTION has an interface for the project administrator to upload the guidelines. The annotators can access the guidelines before and during the annotation process.
INCEpTION provides very practical features that facilitate the annotation process, such as human-in-the-loop learning, the possibility of integrating knowledge bases, and annotation recommenders.
1-1 Active Learning
Active learning is the capability of the system to learn a model while the annotators are annotating labels and relations on the data. This is quite an interesting feature which can accelerate the annotation process in projects where the data is large enough for the model to first learn the patterns in the data and then acquire the capability to suggest labels. However, the human annotator makes the final decision on whether to keep or discard each suggested annotation. The annotator's actions also improve the active learning process, hence this functionality is also referred to as Human-in-the-Loop Learning.
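The feedback idea can be sketched as a simple loop in which every human decision immediately updates the model. The toy simulation below uses a trivial frequency-based "model" purely for illustration; INCEpTION's actual active learning implementation is far more sophisticated:

```python
# Toy sketch of a human-in-the-loop cycle: the model suggests a label, the
# human accepts or corrects it, and the decision feeds back into the model.
# The frequency-based "model" and the data are illustrative only.
from collections import Counter

def simulate_annotation(pool, oracle):
    """Suggest the most frequent label seen so far; the 'annotator' (oracle)
    accepts or corrects each suggestion, and the model updates immediately."""
    seen = Counter()
    accepted = corrected = 0
    annotations = {}
    for token in pool:
        suggestion = seen.most_common(1)[0][0] if seen else None
        gold = oracle[token]           # the human's final decision
        if suggestion == gold:
            accepted += 1              # suggestion kept as-is
        else:
            corrected += 1             # human overrides the model
        annotations[token] = gold
        seen[gold] += 1                # the decision feeds back into the model
    return annotations, accepted, corrected

# Hypothetical annotation task: label place and person names.
oracle = {"Paris": "LOC", "Berlin": "LOC", "Rome": "LOC", "Alice": "PER"}
annotations, accepted, corrected = simulate_annotation(list(oracle), oracle)
print(accepted, corrected)   # prints 2 2
```

Even in this toy version, the model's suggestions become useful after only a few decisions, which is exactly why active learning can speed up large annotation projects.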
1-2 Knowledge Bases
As a semantic annotation platform, INCEpTION allows the use of knowledge bases. The user can create a knowledge base through the interface, import one from RDF files, or connect to an external knowledge base such as YAGO via a SPARQL endpoint. The annotators have access to the concepts defined in the knowledge base and can use them to annotate the raw data as an annotation layer.
1-3 External Recommenders
One of the most intriguing features of INCEpTION is the use of external recommenders. Using available trained NLP machine learning models may be a challenge for scholars who are less familiar with programming. Tools which provide an interface for the use of such models build bridges over the challenges of applying these models, such as platform dependency. If manual annotation is the purpose of the project, recommenders can help accelerate the process by suggesting tags to the human annotators. This process is called human-in-the-loop automatic annotation.
The developers of INCEpTION have provided the tools to connect external recommenders so that they can interact with the annotation tool. The output of these models has to be adapted to a standard format to be readable by INCEpTION. However, recommender servers for trained NLP pipelines such as spaCy or Stanford CoreNLP are already available on the GitHub page of the project and can easily be integrated into the software.
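At its core, an external recommender receives document text and returns labeled spans that the tool can display as suggestions. The sketch below substitutes a tiny gazetteer for a real model; the function name and span format are illustrative, not INCEpTION's actual recommender API:

```python
# Sketch of the core of an external recommender: given document text, return
# labeled character spans that an annotation tool could show as suggestions.
# A real recommender wraps an NLP model (e.g. spaCy) behind a web service;
# here a tiny hard-coded gazetteer stands in for the model, for illustration.
GAZETTEER = {"Darmstadt": "LOC", "Impresso": "MISC", "UKP": "ORG"}

def predict(text):
    """Return (start, end, label) character spans for known entities."""
    spans = []
    for name, label in GAZETTEER.items():
        start = text.find(name)
        while start != -1:                       # find every occurrence
            spans.append((start, start + len(name), label))
            start = text.find(name, start + 1)
    return sorted(spans)

spans = predict("The UKP Lab at TU Darmstadt develops INCEpTION.")
print(spans)
```

The human annotator would then accept, correct, or reject each suggested span, which is what makes the overall process human-in-the-loop rather than fully automatic annotation.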
The software does not yet support external recommendation for relation annotations such as co-reference annotation.
Use of the spaCy external recommender for named entity annotation
The INCEpTION interface provides a tool to evaluate the annotations based on different inter-annotator agreement metrics. This is a necessary and practical feature in an annotation platform: any annotation project needs to be evaluated with a quantifiable metric. Such a feature relieves the administrator of the burden of using another platform or a script to calculate the inter-annotator agreement coefficients of the annotation.
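For two annotators, one common such coefficient is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A self-contained sketch, with an invented toy label sequence:

```python
# Cohen's kappa for two annotators labeling the same items.
# kappa = (observed agreement - expected chance agreement) / (1 - expected)
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Compute Cohen's kappa for two equal-length label sequences."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum(c1[lab] * c2[lab] for lab in set(ann1) | set(ann2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical NER decisions by two annotators on six tokens.
ann1 = ["PER", "LOC", "O", "O", "PER", "O"]
ann2 = ["PER", "LOC", "O", "PER", "PER", "O"]
print(round(cohens_kappa(ann1, ann2), 3))   # prints 0.739
```

Here the annotators agree on 5 of 6 tokens (0.833 raw agreement), but after discounting chance agreement the coefficient drops to about 0.74, which is the kind of number an IAA-based revision cycle would track.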
4-Curation or Revision
Curation is needed to revise the annotators' annotations. Revision may be needed primarily in the first steps of the project, while the annotators are still learning the annotation process. INCEpTION also provides this functionality.
The INCEpTION project gives the administrator the tools to supervise the project at a glance. Using the monitoring feature, the project administrator can observe the progress of each annotator.
The INCEpTION project is an open-source platform for textual annotation. The tool is enriched with many remarkable features that make it a very practical annotation platform for any text annotation project. Its adaptability and flexibility make it a top choice among the available platforms, and its distinctive use of recommenders accelerates the annotation process.
The INCEpTION project offers many practical features for administering an annotation project from the beginning. However, it is a bit challenging to import already-annotated documents into the platform and then use the other functionalities, such as evaluation.
The INCEpTION project is still under development and keeps gaining new features. The development team is highly responsive through mailing lists and GitHub, addressing any issues that users may encounter.
Edited by Thomas Durlacher and Sytze van Herck
- Klie, J.-C., Bugert, M., Boullosa, B., Eckart de Castilho, R. and Gurevych, I. (2018): The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation. In Proceedings of System Demonstrations of the 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, New Mexico, USA.