General Guidelines
Use the following guidelines when creating a pipeline to find the optimal balance between disk space, memory and execution run time.
[+] net positive effect
[-] net negative effect
| Guideline | Disk Space | Memory | Time |
| --- | --- | --- | --- |
| Import: Import only the properties that will be used. | ++ |  | + |
| Import: Divide big files into batches within one importer. This decreases run time and the likelihood of errors. |  | ++ | + |
| Merge Classes: If multiple classes should be merged, merge them sequentially into the class with the highest number of instances. | + |  | + |
| Merge Classes vs. Infer: The Merge Classes component takes precedence over inferring components. |  | - | + |
| Create Relationship (by identifier) vs. Create Relationship (by label): Choose Create Relationship "by identifier" over "by label" whenever possible. |  |  | ++ |
| Remove Resources: Remove obsolete resources in a class as early as possible in the pipeline to increase the efficiency of subsequent components. | + | + | + |
| Create Compact Class: When many resources/predicates become obsolete, create a compact class as early as possible in the pipeline. This increases the efficiency of subsequent components. | - | + | + |
| Publish in DISQOVER: Toggle on “Automatically drop predicates” in the Publish in DISQOVER component Indexer. | + |  | + |
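The two Import guidelines above (import only the properties that are used, and divide big files into batches) can also be prepared outside the pipeline. The following is a minimal sketch, assuming the source data is a CSV file with a header row. The function name `split_csv`, the file names, and the column names are hypothetical and are not part of DISQOVER.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, keep_columns, batch_size):
    """Split `source` into batch files that contain only `keep_columns`.

    Hypothetical helper for illustration; returns the list of batch
    file paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    writer = None
    handle = None
    try:
        with open(source, newline="", encoding="utf-8") as src:
            reader = csv.DictReader(src)
            for index, row in enumerate(reader):
                if index % batch_size == 0:
                    # Start a new batch file every `batch_size` rows.
                    if handle is not None:
                        handle.close()
                    path = out_dir / f"batch_{index // batch_size:04d}.csv"
                    handle = open(path, "w", newline="", encoding="utf-8")
                    # extrasaction="ignore" drops every column that is
                    # not listed in `keep_columns`.
                    writer = csv.DictWriter(
                        handle, fieldnames=keep_columns, extrasaction="ignore"
                    )
                    writer.writeheader()
                    written.append(path)
                writer.writerow(row)
    finally:
        if handle is not None:
            handle.close()
    return written
```

A call such as `split_csv("big.csv", "batches", ["id", "name"], 100000)` would write files named `batch_0000.csv`, `batch_0001.csv`, and so on, each containing only the `id` and `name` columns. Whether pre-splitting is needed, and how the batch files are assigned to an importer, depends on the importer configuration.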
Importer Overview
The table below describes the different types of importers. It lists the advantages and disadvantages of each type, so you can choose the format that best fits the data and use case.
| Importer Type | Advantages | Disadvantages |
| --- | --- | --- |
| Import RDF |  |  |
| Import XML |  |  |
| Import JSON |  |  |
| Import Excel |  |  |
| Import CSV |  |  |
| Import Identifier Block |  |  |