Pubtator pipeline

Description

Data model for ingesting the Pubtator data source.

Data set content

The Pubtator source contains annotated publications for PubMed abstracts and PMC full-text articles. The annotations identify biomedical concepts, including genes, diseases, cell lines, mutations, chemicals and species, with name and identifier information. The data source is public and freely available for download from the Pubtator web system via a RESTful API or FTP.

For more information, please reach out to us. 

Data model

The base pipeline maps the following data to canonical types:

  1. Publication
    The main category, containing the publications, with the annotations available as filters. This category is mapped to the federated data.

  2. Chemical
    Linked to the Publication category for identified chemicals; mapped to the federated data.

  3. Cell line
    Linked to the Publication category for identified cell lines; mapped to the federated data.

  4. Organism
    Linked to the Publication category for identified species; mapped to the federated data.

  5. Disease
    Linked to the Publication category for identified diseases; mapped to the federated data.

  6. Gene
    Linked to the Publication category for identified genes; mapped to the federated data.

  7. Variant
    Linked to the Publication category for identified mutations; mapped to the federated data.
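To illustrate how the annotation types relate to the canonical types listed above, here is a minimal Python sketch that groups Pubtator annotations per publication. It assumes the tab-delimited Pubtator text format (`PMID|t|title`, `PMID|a|abstract`, then one tab-separated annotation line per mention). The `CANONICAL_TYPE` mapping and the `parse_pubtator` function are hypothetical and only approximate the data model; the actual ingestion logic is defined in Pipeline_Pubtator.yml and runs inside DISQOVER.

```python
from collections import defaultdict

# Hypothetical mapping from Pubtator annotation types to the canonical
# types of the data model; the actual mapping is defined in
# Pipeline_Pubtator.yml and may differ.
CANONICAL_TYPE = {
    "Chemical": "Chemical",
    "CellLine": "Cell line",
    "Species": "Organism",
    "Disease": "Disease",
    "Gene": "Gene",
    "Mutation": "Variant",
    "DNAMutation": "Variant",
    "ProteinMutation": "Variant",
    "SNP": "Variant",
}


def _new_record():
    # One Publication record: text fields plus links to concept identifiers.
    return {"title": "", "abstract": "", "links": defaultdict(set)}


def parse_pubtator(text):
    """Group Pubtator-format annotations per publication (PMID).

    Assumes the tab-delimited Pubtator text format:
      PMID|t|title
      PMID|a|abstract
      PMID<TAB>start<TAB>end<TAB>mention<TAB>type<TAB>identifier
    """
    pubs = defaultdict(_new_record)
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1] in ("t", "a"):
            pmid, section, content = parts
            pubs[pmid]["title" if section == "t" else "abstract"] = content
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip relation lines and malformed rows
        pmid, _start, _end, _mention, ann_type, identifier = fields[:6]
        canonical = CANONICAL_TYPE.get(ann_type)
        if canonical is None or identifier in ("", "-"):
            continue  # unmapped type or annotation without an identifier
        # Link the publication to the concept by identifier, not by mention text.
        pubs[pmid]["links"][canonical].add(identifier)
    return dict(pubs)
```

Linking by identifier rather than by mention text means that different spellings of the same concept (for example "aspirin" and "acetylsalicylic acid") collapse onto one linked record, which matches the idea of each canonical type being linked to the Publication category.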

For more information, see the Data Ingestion Configuration in the Data tab of your DISQOVER instance.

Download and import

Download the pipeline .yml file(s) to your local computer. You can then import the file(s) into the DISQOVER Data Ingestion Engine (in a new or existing pipeline) by opening the pipeline, clicking its menu bar and choosing Import.

Pipeline_Pubtator.yml

Base pipeline to import Pubtator data.