What are the guidelines for building a new pipeline?

Follow the guidelines below to create a new pipeline using the Data Ingestion Engine.

General

  • All components should have a name.
  • All components should be tagged with a class.
  • All predicates should have a prefix related to their class.
  • Evaluate data quality using QC components.
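
The naming rules above can be checked mechanically before a pipeline is reviewed. The sketch below is only an illustration: it assumes a hypothetical export of the pipeline as a list of dictionaries with `name`, `class_tag` and `predicates` keys, and a hypothetical convention where a predicate prefix is the lowercase class name followed by an underscore. Neither the layout nor the convention comes from the Data Ingestion Engine itself.

```python
def check_component(component):
    """Return the guideline violations found in one component (hypothetical dict layout)."""
    problems = []
    name = component.get("name", "").strip()
    class_tag = component.get("class_tag", "").strip()
    shown_name = name or "<unnamed>"
    if not name:
        problems.append("component has no name")
    if not class_tag:
        problems.append(f"{shown_name}: no class tagged")
    else:
        prefix = class_tag.lower() + "_"  # assumed prefix convention
        for predicate in component.get("predicates", []):
            if not predicate.startswith(prefix):
                problems.append(f"{shown_name}: predicate '{predicate}' lacks prefix '{prefix}'")
    return problems


# Made-up example components, for illustration only.
components = [
    {"name": "Import drugs", "class_tag": "Drug", "predicates": ["drug_name", "drug_id"]},
    {"name": "", "class_tag": "Trial", "predicates": ["title"]},
]
report = [problem for c in components for problem in check_component(c)]
```

Here the first component passes, while the second is reported twice: once for the missing name and once for the unprefixed predicate.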

Pipeline structure

The pipeline is divided into three main steps. Each step is consolidated by No Operation components.

  • Step 1: Data Source Definition / Import / Instance setup
  • Step 2: Linkage / Inference / Synchronization
  • Step 3: Configuration / Indexing

Guidelines for Step 1

  • Each import gets a unique class assigned.
  • After import, actions are organized in 4 parallel branches:
    • Set the URI(s) [mandatory, except for RDF import]
    • Set the label(s) [mandatory]
    • Transform Literals for linkage [not mandatory]
    • Transform Literals for final display [not mandatory]
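
As an illustration of the two mandatory branches, the sketch below derives a URI and a label from one imported record. It is written in plain Python, not in the expression language of the Data Ingestion Engine. The base namespace, the URI pattern (`base/class/identifier`), the field names and the fallback to the identifier are all assumptions made for the example.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/resource"  # placeholder namespace, not a real one


def make_uri(class_name, identifier):
    """Build a URI from the class and the source identifier (assumed pattern)."""
    cleaned = str(identifier).strip()
    if not cleaned:
        raise ValueError("identifier must not be empty")
    # Percent-encode both parts so spaces and slashes cannot break the URI.
    return f"{BASE_URI}/{quote(class_name.lower(), safe='')}/{quote(cleaned, safe='')}"


def make_label(record, fields=("name", "title")):
    """Use the first non-empty candidate field as label, else fall back to the identifier."""
    for field in fields:
        value = str(record.get(field) or "").strip()
        if value:
            return value
    return str(record["id"])


# Made-up record, for illustration only.
record = {"id": "DB 0001", "name": "  Aspirin "}
uri = make_uri("Drug", record["id"])
label = make_label(record)
```

Keeping the URI and label logic in separate functions mirrors the parallel branches: each can be changed and tested without touching the other.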

Guidelines for Steps 1 and 2

  • Each Expression should have a unit test. For a Transform Literal component, this means a unit test for each transformed property.
  • Each Transform Literal component should be annotated in the comment section.
  • Transformations that apply to resources with the same filter and a similar context are grouped in the same “Transform” component.
  • Each Filter should have a unit test.
  • If two classes contain resources with equivalent URIs, merging before syncing is advised.
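
The idea of one unit test per transformed property can be sketched outside the tool as well. The example below uses Python's standard `unittest` module, not the unit test feature of the Data Ingestion Engine. The function `normalize_for_linkage` is a hypothetical transformation, assumed here to lowercase a literal and collapse its whitespace before linkage.

```python
import unittest


def normalize_for_linkage(value):
    """Lowercase, trim and collapse whitespace so equal names match regardless of formatting."""
    return " ".join(value.lower().split())


class NormalizeForLinkageTest(unittest.TestCase):
    """One test per behaviour the transformation is expected to have."""

    def test_case_and_whitespace(self):
        self.assertEqual(
            normalize_for_linkage("  Acetylsalicylic   ACID "),
            "acetylsalicylic acid",
        )

    def test_already_normalized(self):
        self.assertEqual(normalize_for_linkage("aspirin"), "aspirin")

    def test_empty_string(self):
        self.assertEqual(normalize_for_linkage(""), "")
```

Saved as a module, the tests can be run with `python -m unittest`. Covering the edge cases (empty value, already clean value) is what makes such a test useful when the expression is changed later.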

Guidelines for Step 3

  • Each Canonical Type is defined with a different icon and URI.
  • After publishing, at least one dashboard, one list template and one template pop-out for each canonical type are defined and set as default.
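
The requirement that every canonical type has its own icon and URI lends itself to a quick duplicate check. The sketch below assumes a hypothetical list of canonical type definitions with `name`, `uri` and `icon` keys; the values are invented and do not reflect an actual configuration format.

```python
from collections import Counter


def duplicated(values):
    """Return, in sorted order, the values that occur more than once."""
    return sorted(value for value, count in Counter(values).items() if count > 1)


# Made-up canonical type definitions, for illustration only.
canonical_types = [
    {"name": "Drug", "uri": "http://example.org/type/drug", "icon": "pill"},
    {"name": "Trial", "uri": "http://example.org/type/trial", "icon": "flask"},
    {"name": "Target", "uri": "http://example.org/type/target", "icon": "pill"},
]

duplicate_icons = duplicated(t["icon"] for t in canonical_types)
duplicate_uris = duplicated(t["uri"] for t in canonical_types)
```

In this example `duplicate_icons` contains `"pill"`, which signals that two canonical types share an icon, while `duplicate_uris` stays empty.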