TERMite Enrichment plugin

Description

The TERMite Enrichment plugin is a batch-transform* plugin that communicates with the TERMite API from SciBite to enrich data using their NLP service.

* More information about the different DISQOVER plugin types can be found here.

Download and installation

The plugin code can be downloaded from the table below and installed on your DISQOVER instance. To install the plugin, follow these steps:

  1. Unzip the downloaded plugin in the /disqover/data/plugins/ folder on your DISQOVER instance. 
  2. Restart your DISQOVER instance.

DISQOVER version        | TERMite version | Download plugin code
Version 6.10 and higher | 6.30            | TERMite Enrichment plugin

After the installation on the DISQOVER server, the TERMite Enrichment plugin will show up in the list of components under the Plugin tab.

The component is added to a pipeline in the same way as the default components of the Data Ingestion Engine.

Component fields

Basic section

  • Class: The pipeline class from which you want to select data to process in the plugin (required).
  • Filter: A filter applied to this class to select a subset of the data to process.

Input predicates section

  • TERMite server IP: The IP address or DNS name of the TERMite API service (required).
  • Port: The port on which the TERMite API is reachable. The default is None.
  • Text: The input predicate from the selected class to push to the TERMite service for NLP (required).
  • Entities List: The list of entity types that the TERMite service should look for in the input predicate. The list must be comma-separated, e.g., DRUG, GENE, INDICATION (required).
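
To make the role of these fields concrete, the following minimal Python sketch shows how the server address, the optional port and the comma-separated Entities List could be turned into a request target and query string. It is not the plugin's code: the function names, the "/termite" path and the parameter names "text" and "entities" are assumptions made for illustration, so check the TERMite API documentation for the actual request format.

```python
from typing import List, Optional
from urllib.parse import urlencode


def parse_entities(raw: str) -> List[str]:
    """Split a comma-separated Entities List into clean entity type names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_base_url(server: str, port: Optional[int] = None, scheme: str = "http") -> str:
    """Compose the service address; the port is left out when it is None."""
    host = server if port is None else f"{server}:{port}"
    # ASSUMPTION: the "/termite" path is illustrative, not confirmed by this page.
    return f"{scheme}://{host}/termite"


def build_query(text: str, entities: List[str]) -> str:
    """URL-encode hypothetical request parameters for one input text."""
    # ASSUMPTION: the parameter names "text" and "entities" are placeholders.
    return urlencode({"text": text, "entities": ",".join(entities)})


entities = parse_entities("DRUG, GENE, INDICATION")
url = build_base_url("termite.example.org")           # Port left at None
url_with_port = build_base_url("termite.example.org", 9090)
query = build_query("aspirin inhibits COX-1", entities)
```

Stripping whitespace around each entry means that "DRUG, GENE" and "DRUG,GENE" are treated the same way in this sketch.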

Advanced section

  • Number of Processes: The number of processing cores used by the plugin. A higher number speeds up processing but puts a higher load on both the DISQOVER server and the TERMite service.
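
The trade-off behind this setting can be sketched in Python as follows. This is an illustration only, not the plugin's implementation: it uses a thread pool as a simple stand-in for the plugin's worker processes, and enrich_all and annotate are hypothetical names. Each worker sends its own requests, which is why a higher number also increases the load on the TERMite service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def enrich_all(
    texts: List[str],
    annotate: Callable[[str], dict],
    num_processes: int = 1,
) -> List[dict]:
    """Apply annotate to every text with num_processes workers, keeping input order."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    # Stand-in: a thread pool; the plugin's real worker mechanism is not documented here.
    with ThreadPoolExecutor(max_workers=num_processes) as pool:
        return list(pool.map(annotate, texts))


def fake_annotate(text: str) -> dict:
    """Hypothetical annotator used instead of a real TERMite call."""
    return {"text": text, "length": len(text)}


results = enrich_all(["aspirin", "BRCA1", "asthma"], fake_annotate, num_processes=2)
```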

Output predicates section

  • Response: A newly defined predicate in which the output of the TERMite service will be stored (required).

TERMite fields

Output

The data stored in the response predicate will be JSON structured data: the response from the TERMite API service, without the API response information. Different fields can be further extracted from this output in the pipeline.

An example of a Transform Literal function to extract the external URI is: 

function GetEntityInfo(_entity)
ifthen(ListNotEmpty(DictGet(StrParseJSON($$output_predicate.lit), _entity)),
       Map(DictGet(StrParseJSON($$output_predicate.lit), _entity), _el,
       "{\"name\": \"" + DictGet(_el, "name") + "\", \"external_uri\": \"" + ifthen(DictHasKey(_el, "external_uri"), FillNull(DictGet(_el, "external_uri")), null) + "\"}"),
       []);

#Example to extract information from the DRUG entity
set @extracted_drug_info = GetEntityInfo("DRUG");

Here, $$output_predicate.lit is the output predicate defined in the Response field. For each recognised entity of the requested type, the function builds a JSON-formatted string containing the name and the external URI of that entity. The result can then be processed further, e.g., in an Extract Distinct component, and Synced/Merged with Public Data.
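
For readers who want to inspect a stored response outside the pipeline, the same extraction logic can be sketched in Python. The response layout used here (a dictionary keyed by entity type, each holding a list of hits with a "name" and an optional "external_uri") is inferred from the Transform Literal function above, and the sample data is invented for illustration. Unlike the Transform Literal version, this sketch returns dictionaries rather than JSON strings.

```python
import json
from typing import Dict, List, Optional


def get_entity_info(response_json: str, entity: str) -> List[Dict[str, Optional[str]]]:
    """Return the name and external URI of every hit for one entity type."""
    response = json.loads(response_json)
    hits = response.get(entity) or []  # missing or empty entity type gives []
    return [
        {"name": hit.get("name"), "external_uri": hit.get("external_uri")}
        for hit in hits
    ]


# Hypothetical, heavily simplified response; a real TERMite response has more fields.
sample = json.dumps({
    "DRUG": [
        {"name": "aspirin", "external_uri": "http://example.org/drug/1"},
        {"name": "ibuprofen"},
    ],
    "GENE": [],
})

drug_info = get_entity_info(sample, "DRUG")
```

A hit without an external URI yields None for that key, so downstream steps can filter such entries out before syncing or merging.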

Technical information and requirements

To limit the load on the TERMite service, the responses captured from the TERMite API are cached. The response data is stored locally on the DISQOVER server, so rerunning the plugin component in the Data Ingestion Engine retrieves the data faster and does not send the same requests to the TERMite service again.
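
The caching idea can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the cache key (a hash of the text and the entity list), the one-file-per-response layout and all function names are assumptions chosen to show the principle that a repeated request is answered from local storage.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cache_key(text: str, entities: List[str]) -> str:
    """Derive a stable key from the input text and the requested entity types."""
    payload = json.dumps({"text": text, "entities": sorted(entities)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_annotate(
    text: str,
    entities: List[str],
    fetch: Callable[[str, List[str]], dict],
    cache_dir: Path,
) -> dict:
    """Return the cached response if present; otherwise call fetch and store the result."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, entities)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    response = fetch(text, entities)
    path.write_text(json.dumps(response), encoding="utf-8")
    return response


calls = []  # records every time the (fake) service is actually contacted


def fake_fetch(text: str, entities: List[str]) -> dict:
    """Hypothetical stand-in for a real TERMite request."""
    calls.append(text)
    return {"DRUG": [{"name": text}]}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_annotate("aspirin", ["DRUG"], fake_fetch, Path(tmp))
    second = cached_annotate("aspirin", ["DRUG"], fake_fetch, Path(tmp))
```

In this sketch the second call is served from disk, so the stand-in service is contacted only once for the same text and entity list.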

For more information on this Asset, please reach out to us!