Follow the guidelines below to create a new pipeline using the Data Ingestion Engine.
General
- All components should have a name.
- All components should be tagged with a class.
- All predicates should get a prefix related to that class (see the sketch after this list).
- Evaluate data quality using QC components.
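As an illustration of the predicate-prefix rule, here is a minimal sketch that derives a class-prefixed predicate URI. The namespace, class name, and helper function are hypothetical, not part of the Data Ingestion Engine:

```python
# Hypothetical sketch of the predicate-prefix convention: every predicate
# of a class carries a prefix derived from that class's name.

BASE = "https://example.org/ontology/"  # assumed namespace, for illustration only

def prefixed_predicate(class_name: str, prop: str) -> str:
    """Build a predicate URI whose local name is prefixed with the class."""
    prefix = class_name[0].lower() + class_name[1:]  # e.g. "Compound" -> "compound"
    return f"{BASE}{prefix}_{prop}"

print(prefixed_predicate("Compound", "smiles"))
# -> https://example.org/ontology/compound_smiles
```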
Pipeline structure
The pipeline is divided into three main steps; each step is consolidated by a No Operation component (a sketch of this layout follows the list below).
- Step 1: Data Source Definition / Import / Instance setup
- Step 2: Linkage / Inference / Synchronization
- Step 3: Configuration / Indexing
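To make the three-step layout concrete, the sketch below models the pipeline as ordered steps, each terminated by a No Operation component that consolidates its branches. This is an illustrative data structure, not the engine's API:

```python
# Illustrative model of the pipeline layout described above.

PIPELINE = [
    ("Step 1", ["Data Source Definition", "Import", "Instance setup"]),
    ("No Operation", []),  # consolidates Step 1
    ("Step 2", ["Linkage", "Inference", "Synchronization"]),
    ("No Operation", []),  # consolidates Step 2
    ("Step 3", ["Configuration", "Indexing"]),
    ("No Operation", []),  # consolidates Step 3
]

for name, actions in PIPELINE:
    print(name, "->", ", ".join(actions) if actions else "(consolidation point)")
```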
Guidelines for Step 1
- Each import is assigned a unique class.
- After import, actions are organized into four parallel branches (a sketch of the two mandatory branches follows this list):
  - Set the URI(s) [mandatory, except for RDF import]
  - Set the label(s) [mandatory]
  - Transform Literals for linkage [optional]
  - Transform Literals for final display [optional]
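The sketch below illustrates the two mandatory branches: deriving a stable URI and a human-readable label for each imported record. The field names, namespace, and fallback logic are assumptions for illustration:

```python
# Hypothetical sketch of the "Set the URI(s)" and "Set the label(s)" branches.

BASE = "https://example.org/resource/compound/"  # assumed namespace

def set_uri(record: dict) -> str:
    """Branch 1: build a stable, unique URI from the record's key field."""
    return BASE + str(record["id"])

def set_label(record: dict) -> str:
    """Branch 2: pick a display label, falling back to the key field."""
    return record.get("name") or f"compound {record['id']}"

record = {"id": 42, "name": "Aspirin"}
print(set_uri(record))    # https://example.org/resource/compound/42
print(set_label(record))  # Aspirin
```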
Guidelines for Steps 1 and 2
- Each Expression should have a unit test; for a Transform Literal component, this means a unit test for every transformed property (see the test sketch after this list).
- Each Transform Literal component should be annotated in the comment section.
- Transformations applied to resources with the same filter and a similar context should be grouped into the same “Transform” component.
- Each Filter should have a unit test.
- If two classes contain resources with equivalent URIs, merging them before synchronization is advised.
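As a sketch of the unit-test rule, the example below tests a single literal transformation the way each transformed property would be tested. The function, its behavior, and the sample values are hypothetical:

```python
import unittest

def normalize_code(value: str) -> str:
    """Example transformation: strip whitespace and uppercase an identifier."""
    return value.strip().upper()

class TransformLiteralTests(unittest.TestCase):
    def test_strips_whitespace_and_uppercases(self):
        self.assertEqual(normalize_code("  50-78-2a "), "50-78-2A")

    def test_leaves_clean_input_unchanged(self):
        self.assertEqual(normalize_code("50-78-2"), "50-78-2")

if __name__ == "__main__":
    unittest.main()
```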
Guidelines for Step 3
- Each Canonical Type is defined with a distinct icon and URI (a configuration sketch follows this list).
- After publishing, define at least one dashboard, one list template, and one pop-out template for each canonical type, and set them as defaults.
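The configuration sketch below represents the Step 3 rules as plain data: each canonical type gets a distinct URI and icon, plus default dashboard, list, and pop-out templates. All names and values are assumptions, not the engine's configuration format:

```python
# Illustrative (not engine-native) Step 3 configuration.

CANONICAL_TYPES = {
    "Compound": {
        "uri": "https://example.org/canonical/Compound",
        "icon": "flask",
        "defaults": {
            "dashboard": "compound-overview",
            "list_template": "compound-list",
            "popout_template": "compound-popout",
        },
    },
    "Disease": {
        "uri": "https://example.org/canonical/Disease",
        "icon": "stethoscope",
        "defaults": {
            "dashboard": "disease-overview",
            "list_template": "disease-list",
            "popout_template": "disease-popout",
        },
    },
}

# Sanity check: URIs and icons must be distinct across canonical types.
assert len({t["uri"] for t in CANONICAL_TYPES.values()}) == len(CANONICAL_TYPES)
assert len({t["icon"] for t in CANONICAL_TYPES.values()}) == len(CANONICAL_TYPES)
```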