Custom technical lineage JSON file examples
Warning The “single-file definition” option for custom technical lineage is now deprecated and will officially reach its end-of-life on July 31, 2026. We encourage you to transition to the “batch definition”, if you haven't already.
- Batch definition
- Single-file definition
In this section, we provide three detailed examples of how to configure your metadata, assets and lineage JSON files for the batch definition option.
Example 1 is the least complex. It shows a configuration for working with data sources that conform to the traditional (System) > Database > Schema > Table > Column hierarchy. Examples 2 and 3 are more complex: both involve a source system outside that hierarchy and therefore use the props property.
Helpful considerations to keep in mind
- Don't use assets files in the following scenarios:
  - Your data source consists of the traditional (System) > Database > Schema > Table > Column asset types and hierarchy. In that case, full names are automatically and correctly constructed.
  - You are working with assets that are not part of that traditional asset hierarchy (in which case, you need to use the props property to achieve stitching) and you define props in one or more lineage files.
- Don't use the props property for the traditional (System) > Database > Schema > Table > Column hierarchy.
Typical scenario in which you need to use the props property
First, let's examine why you don't need to use the props property for the traditional (System) > Database > Schema > Table > Column hierarchy. Let's say you have an assets file in which you define a leaf asset:
{
  "nodes": [
    { "name": "Snowflake", "type": "System" },
    { "name": "DB1", "type": "Database" },
    { "name": "PUBLIC", "type": "Schema" }
  ],
  "parent": { "name": "T1", "type": "Table" },
  "leaf": { "name": "COL1", "type": "Column" }
}
In this case, the full name of the leaf asset (here, a Column asset) is automatically and correctly constructed as: "snowflake>DB1>PUBLIC>T1>COL1".
However, for the following custom hierarchy, you can use the props property to specify the correct full name of the leaf asset, in this case a File asset.
{
  "nodes": [
    { "name": "gcs", "type": "GCS File System" },
    { "name": "bucket1", "type": "GCS Bucket" },
    { "name": "/", "type": "Directory" }
  ],
  "parent": { "name": "examples", "type": "Directory" },
  "leaf": { "name": "data.xls", "type": "File" },
  "props": {
    "fullname": "gcs > bucket1/examples/data.xls",
    "domain_id": "<domain in which the file asset resides>"
  }
}
If you don't provide the full name of the leaf asset, it is constructed using the default traditional formatting (System) > Database > Schema > Table > Column. The result would be the full name: "gcs > bucket1 > / > examples > data.xls". However, this is not the correct construction for File assets. The full name provided in the example above ensures the correct construction, so that stitching is achieved.
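The difference between the default construction and an explicit props full name can be sketched in a few lines of Python. This is an illustration only, not Collibra's implementation: the function names default_full_name and full_name, and the use of " > " as the separator, are assumptions made for this example.

```python
def default_full_name(entry, sep=" > "):
    """Join node, parent and leaf names in hierarchy order (assumed default rule)."""
    names = [node["name"] for node in entry.get("nodes", [])]
    for key in ("parent", "leaf"):
        if key in entry:
            names.append(entry[key]["name"])
    return sep.join(names)


def full_name(entry, sep=" > "):
    """Prefer an explicit props.fullname; otherwise fall back to the default."""
    props = entry.get("props", {})
    return props.get("fullname") or default_full_name(entry, sep)


file_entry = {
    "nodes": [
        {"name": "gcs", "type": "GCS File System"},
        {"name": "bucket1", "type": "GCS Bucket"},
        {"name": "/", "type": "Directory"},
    ],
    "parent": {"name": "examples", "type": "Directory"},
    "leaf": {"name": "data.xls", "type": "File"},
}

# Without props, the traditional joining rule is applied to the File asset.
print(default_full_name(file_entry))  # gcs > bucket1 > / > examples > data.xls

# With props, the explicit full name wins.
file_entry["props"] = {"fullname": "gcs > bucket1/examples/data.xls"}
print(full_name(file_entry))  # gcs > bucket1/examples/data.xls
```

The two printed values correspond to the incorrect and the correct full name discussed above.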
Examples
Example 1
Scenario
In this example, we are working with a single data source with the traditional (System) > Database > Schema > Table > Column asset types and hierarchy. Therefore, we don't need to include an assets file or use the props property.
Which files to include
We are including the following files:
__CUSTOM-LINEAGE__
├── lineage.json
├── metadata.json
└── source_codes
    ├── source_code_view_1.txt
    └── source_code_view_2.txt
- A metadata JSON file. Tip You always need exactly one metadata JSON file.
- One lineage JSON file, to define the lineage relationships and to show:
  - Column-level lineage.
  - Column-level lineage from the same source (src), which is achieved by adding another entry with the same source.
  - Indirect lineage, in other words the target (trg) is a parent asset.
  - Table-level lineage, in other words both the source and the target are a parent asset.
- Two source code files in a source_codes directory. Tip Source code files are always optional.
The metadata JSON file
{
"version": 3,
"application_name": "custom lineage batch example 1",
"asset_types": {
"System": {
"uuid": "00000000-0000-0000-0000-000000031302"
},
"Column": {
"uuid": "00000000-0000-0000-0000-000000031008"
},
"Table": {
"uuid": "00000000-0000-0000-0000-000000031007"
},
"Database": {
"uuid": "00000000-0000-0000-0000-000000031006"
},
"Schema": {
"uuid": "00000000-0000-0000-0001-000400000002"
}
}
}
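Before uploading, it can help to sanity-check the metadata file. The following Python sketch is not part of any Collibra tooling: the function name check_metadata is hypothetical, and treating the three top-level keys that appear in the example as mandatory is an assumption. The snippet uses a shortened copy of the file above.

```python
import json
import uuid

# Top-level keys that appear in the example; assumed here to be mandatory.
REQUIRED_KEYS = ("version", "application_name", "asset_types")


def check_metadata(metadata):
    """Return a list of problems found in a parsed metadata dictionary."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in metadata]
    for type_name, spec in metadata.get("asset_types", {}).items():
        try:
            uuid.UUID(spec["uuid"])  # raises if the string is not a well-formed UUID
        except (KeyError, TypeError, ValueError, AttributeError):
            problems.append(f"asset type {type_name!r} has no valid uuid")
    return problems


metadata = json.loads("""
{
  "version": 3,
  "application_name": "custom lineage batch example 1",
  "asset_types": {
    "System": {"uuid": "00000000-0000-0000-0000-000000031302"},
    "Column": {"uuid": "00000000-0000-0000-0000-000000031008"}
  }
}
""")

print(check_metadata(metadata))  # []
```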
The lineage JSON file
[{
"src": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T1",
"type": "Table"
},
"leaf": {
"name": "col1",
"type": "Column"
}
},
"trg": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "VIEW_1",
"type": "Table"
},
"leaf": {
"name": "col1",
"type": "Column"
}
}
}, {
"src": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T1",
"type": "Table"
},
"leaf": {
"name": "col1",
"type": "Column"
}
},
"trg": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "VIEW_1",
"type": "Table"
},
"leaf": {
"name": "col2",
"type": "Column"
}
},
"source_code": {
"path": "source_codes/source_code_view_1.txt",
"highlights": [{
"start": 0,
"len": 43
}],
"transformation_display_name": "view_1 creation"
}
}, {
"src": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T1",
"type": "Table"
},
"leaf": {
"name": "col2",
"type": "Column"
}
},
"trg": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "VIEW_1",
"type": "Table"
}
}
},
{
"src": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T2",
"type": "Table"
}
},
"trg": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "VIEW_2",
"type": "Table"
}
},
"source_code": {
"path": "source_codes/source_code_view_2.txt",
"highlights": [{
"start": 0,
"len": 39
}],
"transformation_display_name": "view_2 creation"
}
}
]
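Because every entry repeats the same System > Database > Schema prefix, a lineage file like the one above can be generated rather than written by hand. The following Python sketch shows one way to do that, assuming the same snowflake > DB1 > PUBLIC prefix as the example. The helper name endpoint is hypothetical, and the source_code objects are left out for brevity.

```python
import json

# Shared (name, type) prefix of every asset in this example.
PREFIX = [("snowflake", "System"), ("DB1", "Database"), ("PUBLIC", "Schema")]


def endpoint(table, column=None):
    """Build a src or trg object; omit the column to point at the parent asset."""
    result = {
        "nodes": [{"name": name, "type": type_} for name, type_ in PREFIX],
        "parent": {"name": table, "type": "Table"},
    }
    if column is not None:
        result["leaf"] = {"name": column, "type": "Column"}
    return result


lineage = [
    # Column-level lineage: both sides have a leaf.
    {"src": endpoint("T1", "col1"), "trg": endpoint("VIEW_1", "col1")},
    # Indirect lineage: the target is a parent asset.
    {"src": endpoint("T1", "col2"), "trg": endpoint("VIEW_1")},
    # Table-level lineage: both sides are parent assets.
    {"src": endpoint("T2"), "trg": endpoint("VIEW_2")},
]

print(json.dumps(lineage, indent=2))
```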
Example 2
Scenario
In this example, we are working with two data sources:
- A source system with the following asset types and hierarchy: GCS File System > GCS Bucket > Directory > File.
- A target system with the traditional (System) > Database > Schema > Table > Column asset types and hierarchy.
To achieve stitching, we need to use the props property to define the correct full-name construction for the File asset in the source system. We can define props in a lineage file. By doing so, we don't need to include any asset files.
Which files to include
We are including the following files:
__CUSTOM-LINEAGE__
├── lineage.json
└── metadata.json
- A metadata JSON file. Tip You always need exactly one metadata JSON file.
- One lineage JSON file, with the props property, to define the correct full-name construction for the File asset in the source system and achieve stitching. Tip By defining props in the lineage file, we don't need to include an assets file. If, for example, we had several lineage files, it would be easier to include an assets file and define props there once, rather than in each lineage file.
We are not including source code files. They are always optional.
The metadata JSON file
{
"version": 3,
"application_name": "custom lineage batch example 2",
"asset_types": {
"System": {
"uuid": "00000000-0000-0000-0000-000000031302"
},
"Column": {
"uuid": "00000000-0000-0000-0000-000000031008"
},
"Table": {
"uuid": "00000000-0000-0000-0000-000000031007"
},
"Database": {
"uuid": "00000000-0000-0000-0000-000000031006"
},
"Schema": {
"uuid": "00000000-0000-0000-0001-000400000002"
},
"File": {
"uuid": "00000000-0000-0000-0000-000000031304"
},
"Directory": {
"uuid": "00000000-0000-0000-0000-000000031303"
},
"GCS Bucket": {
"uuid": "00000000-0000-0000-0001-002700000002"
},
"GCS File System": {
"uuid": "00000000-0000-0000-0001-002700000001"
}
}
}
The lineage JSON file
[{
"src": {
"nodes": [{
"name": "gcs",
"type": "GCS File System"
}, {
"name": "catingestiontest",
"type": "GCS Bucket"
}, {
"name": "/",
"type": "Directory"
}, {
"name": "ingestion-test",
"type": "Directory"
}],
"parent": {
"name": "ingestion copy",
"type": "Directory"
},
"leaf": {
"name": "mytest.csv",
"type": "File"
},
"props": {
"fullname": "f13bf705-13a4-44c9-843e-f341feccfb6e > catingestiontest/ingestion-test/ingestion copy/mytest.csv/1611609340099809",
"domain_id": "fea1b0b0-705f-4e0d-b5eb-1f21132cc718"
}
},
"trg": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T1",
"type": "Table"
},
"leaf": {
"name": "col1",
"type": "Column"
}
}
}]
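When props is defined in the lineage file, as in this example, it is easy to forget it on one of the custom-hierarchy endpoints. The following Python sketch flags such endpoints. It is an illustration only: the function names are hypothetical, the set of traditional type names is taken from the hierarchy described above, and the sample entry uses a shortened nodes list.

```python
TRADITIONAL_TYPES = {"System", "Database", "Schema", "Table", "Column"}


def endpoint_types(endpoint):
    """Collect every asset type used by a src or trg object."""
    items = list(endpoint.get("nodes", []))
    items += [endpoint[key] for key in ("parent", "leaf") if key in endpoint]
    return {item["type"] for item in items}


def missing_props(lineage):
    """Return (entry index, side) pairs for custom-hierarchy endpoints without a fullname."""
    found = []
    for index, entry in enumerate(lineage):
        for side in ("src", "trg"):
            endpoint = entry[side]
            is_custom = not endpoint_types(endpoint) <= TRADITIONAL_TYPES
            if is_custom and "fullname" not in endpoint.get("props", {}):
                found.append((index, side))
    return found


lineage = [{
    "src": {
        "nodes": [{"name": "gcs", "type": "GCS File System"}],  # shortened
        "parent": {"name": "ingestion copy", "type": "Directory"},
        "leaf": {"name": "mytest.csv", "type": "File"},
    },
    "trg": {
        "nodes": [{"name": "snowflake", "type": "System"}],  # shortened
        "parent": {"name": "T1", "type": "Table"},
        "leaf": {"name": "col1", "type": "Column"},
    },
}]

print(missing_props(lineage))  # [(0, 'src')]
```

This check only applies when no assets file is used; if props is defined in an assets file instead, the lineage endpoints legitimately omit it.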
Example 3
Scenario
In this example, we are working with the same two data sources as in Example 2:
- A source system with the following asset types and hierarchy: GCS File System > GCS Bucket > Directory > File.
- A target system with the traditional (System) > Database > Schema > Table > Column asset types and hierarchy.
To achieve stitching, we need to use the props property to define the correct full-name construction for the File asset in the source system. However, this time we have multiple lineage files. Instead of defining props in each lineage file, let's include an assets file. That way, we only have to define props once.
Which files to include
We are including the following files:
__CUSTOM-LINEAGE__
├── assets.json
├── lineage-1.json
├── lineage-2.json
└── metadata.json
- A metadata JSON file. Tip You always need exactly one metadata JSON file.
- An assets JSON file, in which we provide the list of data objects we want to include in the technical lineage, and define props.
- Two lineage JSON files, to define the lineage relationships.
We are not including source code files. They are always optional.
Metadata JSON file
{
"version": 3,
"application_name": "custom lineage batch example 3",
"asset_types": {
"System": {
"uuid": "00000000-0000-0000-0000-000000031302"
},
"Column": {
"uuid": "00000000-0000-0000-0000-000000031008"
},
"Table": {
"uuid": "00000000-0000-0000-0000-000000031007"
},
"Database": {
"uuid": "00000000-0000-0000-0000-000000031006"
},
"Schema": {
"uuid": "00000000-0000-0000-0001-000400000002"
},
"File": {
"uuid": "00000000-0000-0000-0000-000000031304"
},
"Directory": {
"uuid": "00000000-0000-0000-0000-000000031303"
},
"GCS Bucket": {
"uuid": "00000000-0000-0000-0001-002700000002"
},
"GCS File System": {
"uuid": "00000000-0000-0000-0001-002700000001"
}
}
}
Assets JSON file
[{
"nodes": [{
"name": "gcs",
"type": "GCS File System"
}, {
"name": "catingestiontest",
"type": "GCS Bucket"
}, {
"name": "/",
"type": "Directory"
}, {
"name": "ingestion-test",
"type": "Directory"
}],
"parent": {
"name": "ingestion copy",
"type": "Directory"
},
"leaf": {
"name": "mytest.csv",
"type": "File"
},
"props": {
"fullname": "f13bf705-13a4-44c9-843e-f341feccfb6e > catingestiontest/ingestion-test/ingestion copy/mytest.csv/1611609340099809",
"domain_id": "fea1b0b0-705f-4e0d-b5eb-1f21132cc718"
}
}, {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T1",
"type": "Table"
},
"leaf": {
"name": "col3",
"type": "Column"
}
}]
Lineage JSON file (1 of 2)
[{
"src": {
"nodes": [{
"name": "gcs",
"type": "GCS File System"
}, {
"name": "catingestiontest",
"type": "GCS Bucket"
}, {
"name": "/",
"type": "Directory"
}, {
"name": "ingestion-test",
"type": "Directory"
}],
"parent": {
"name": "ingestion copy",
"type": "Directory"
},
"leaf": {
"name": "mytest.csv",
"type": "File"
}
},
"trg": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T1",
"type": "Table"
},
"leaf": {
"name": "col1",
"type": "Column"
}
}
}]
Lineage JSON file (2 of 2)
[{
"src": {
"nodes": [{
"name": "gcs",
"type": "GCS File System"
}, {
"name": "catingestiontest",
"type": "GCS Bucket"
}, {
"name": "/",
"type": "Directory"
}, {
"name": "ingestion-test",
"type": "Directory"
}],
"parent": {
"name": "ingestion copy",
"type": "Directory"
},
"leaf": {
"name": "mytest.csv",
"type": "File"
}
},
"trg": {
"nodes": [{
"name": "snowflake",
"type": "System"
}, {
"name": "DB1",
"type": "Database"
}, {
"name": "PUBLIC",
"type": "Schema"
}],
"parent": {
"name": "T1",
"type": "Table"
},
"leaf": {
"name": "col2",
"type": "Column"
}
}
}]
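The idea of defining props once in the assets file and reusing it from several lineage files can be sketched as a simple lookup. The following Python snippet is an assumption-laden illustration, not a description of how Collibra matches entries internally: the function names are hypothetical, and identifying an asset by the names and types of its nodes, parent and leaf is an assumption. The sample data uses a shortened nodes list and placeholder props values.

```python
def asset_key(endpoint):
    """Identify an asset by the (name, type) pairs of its nodes, parent and leaf."""
    items = list(endpoint.get("nodes", []))
    items += [endpoint[key] for key in ("parent", "leaf") if key in endpoint]
    return tuple((item["name"], item["type"]) for item in items)


def props_by_asset(assets):
    """Index the props defined in a parsed assets file by asset key."""
    return {asset_key(asset): asset["props"] for asset in assets if "props" in asset}


def resolve_props(endpoint, index):
    """Use props on the endpoint itself, else the ones from the assets file."""
    return endpoint.get("props") or index.get(asset_key(endpoint))


assets = [{
    "nodes": [{"name": "gcs", "type": "GCS File System"}],  # shortened
    "parent": {"name": "ingestion copy", "type": "Directory"},
    "leaf": {"name": "mytest.csv", "type": "File"},
    "props": {"fullname": "<full name of the File asset>", "domain_id": "<domain ID>"},
}]

# A lineage endpoint that describes the same asset but carries no props.
src = {key: assets[0][key] for key in ("nodes", "parent", "leaf")}

index = props_by_asset(assets)
print(resolve_props(src, index) == assets[0]["props"])  # True
```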
Single-file definition
This section shows some example lineage.json files for simple custom technical lineage and advanced custom technical lineage.
Each example can be used to generate technical lineage graphs in Collibra to represent the IOT_JSON and IOT_DEVICES_PER_COUNTRY tables with the following columns:
| IOT_JSON | IOT_DEVICES_PER_COUNTRY |
|---|---|
| CCA3 | COUNTRY |
| DEVICE_ID | NUMBER_DEVICES |
Example JSON file for a simple custom technical lineage
In the following example, the tree section defines the IOT_JSON and IOT_DEVICES_PER_COUNTRY tables and columns. The tables are in a schema named COLLIBRA. The COLLIBRA schema is in a database named COLLIBRA and a system named Databricks.
Important If you define the System asset in your lineage.json file, the useCollibraSystemName property in your lineage harvester configuration file (deprecated) must be set to true; otherwise, relations will not be created between the relevant assets in Collibra and stitching will fail.
To show the transformation code at the bottom of the technical lineage graph, specify the mapping and source_code properties in the lineages section.
{
"version": "1.0",
"tree": [
{
"name": "Databricks",
"type": "system",
"children": [
{
"name": "COLLIBRA",
"type": "database",
"children": [
{
"name": "COLLIBRA",
"type": "schema",
"children": [
{
"name": "IOT_JSON",
"type": "table",
"leaves": [
{
"name": "CCA3",
"type": "column"
},
{
"name": "DEVICE_ID",
"type": "column"
}
]
},
{
"name": "IOT_DEVICES_PER_COUNTRY",
"type": "table",
"leaves": [
{
"name": "COUNTRY",
"type": "column"
},
{
"name": "NUMBER_DEVICES",
"type": "column"
}
]
}
]
}
]
}
]
}
],
"lineages": [
{
"src_path": [
{
"system": "Databricks"
},
{
"database": "COLLIBRA"
},
{
"schema": "COLLIBRA"
},
{
"table": "IOT_JSON"
},
{
"column": "CCA3"
}
],
"trg_path": [
{
"system": "Databricks"
},
{
"database": "COLLIBRA"
},
{
"schema": "COLLIBRA"
},
{
"table": "IOT_DEVICES_PER_COUNTRY"
},
{
"column": "COUNTRY"
}
],
"mapping": "dev_no_bat_per_country_view",
"source_code": "INSERT INTO ... SELECT CCA3 AS COUNTRY...FROM IOT_JSON"
}
]
}
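In the single-file format, each src_path and trg_path is expected to point at an object defined in the tree section. The following Python sketch walks the tree and reports lineage paths that have no matching leaf. The function names are hypothetical, and the rule that every path must match a leaf exactly is an assumption made for this check. The sample document is a trimmed version of the file above that defines only the IOT_JSON table.

```python
def leaf_paths(nodes, prefix=()):
    """Yield the path of every leaf in the tree as a tuple of (type, name) pairs."""
    for node in nodes:
        path = prefix + ((node["type"], node["name"]),)
        for leaf in node.get("leaves", []):
            yield path + ((leaf["type"], leaf["name"]),)
        yield from leaf_paths(node.get("children", []), path)


def path_key(path):
    """Turn a src_path or trg_path list into the same tuple form."""
    return tuple(next(iter(step.items())) for step in path)


def undefined_paths(document):
    """Return lineage paths that do not match any leaf defined in the tree."""
    defined = set(leaf_paths(document["tree"]))
    return [
        path
        for lineage in document["lineages"]
        for path in (lineage["src_path"], lineage["trg_path"])
        if path_key(path) not in defined
    ]


document = {
    "tree": [{
        "name": "Databricks", "type": "system",
        "children": [{
            "name": "COLLIBRA", "type": "database",
            "children": [{
                "name": "COLLIBRA", "type": "schema",
                "children": [{
                    "name": "IOT_JSON", "type": "table",
                    "leaves": [{"name": "CCA3", "type": "column"}],
                }],
            }],
        }],
    }],
    "lineages": [{
        "src_path": [{"system": "Databricks"}, {"database": "COLLIBRA"},
                     {"schema": "COLLIBRA"}, {"table": "IOT_JSON"}, {"column": "CCA3"}],
        "trg_path": [{"system": "Databricks"}, {"database": "COLLIBRA"},
                     {"schema": "COLLIBRA"}, {"table": "IOT_DEVICES_PER_COUNTRY"},
                     {"column": "COUNTRY"}],
    }],
}

# The target table is missing from this trimmed tree, so one path is reported.
print(len(undefined_paths(document)))  # 1
```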
Example JSON file for an advanced custom technical lineage
In the following example, the tree section defines the IOT_JSON and IOT_DEVICES_PER_COUNTRY tables and columns. The tables are in a schema named COLLIBRA. The COLLIBRA schema is in a database named COLLIBRA and a system named Databricks.
Important If you define the System asset in your lineage.json file, the useCollibraSystemName property in your lineage harvester configuration file (deprecated) must be set to true; otherwise, relations will not be created between the relevant assets in Collibra and stitching will fail.
{
"version": "1.0",
"tree": [
{
"name": "Databricks",
"type": "system",
"children": [
{
"name": "COLLIBRA",
"type": "database",
"children": [
{
"name": "COLLIBRA",
"type": "schema",
"children": [
{
"name": "IOT_JSON",
"type": "table",
"leaves": [
{
"name": "CCA3",
"type": "column"
},
{
"name": "DEVICE_ID",
"type": "column"
}
]
},
{
"name": "IOT_DEVICES_PER_COUNTRY",
"type": "table",
"leaves": [
{
"name": "COUNTRY",
"type": "column"
},
{
"name": "NUMBER_DEVICES",
"type": "column"
}
]
}
]
}
]
}
]
}
],
"lineages": [
{
"src_path": [
{
"system": "Databricks"
},
{
"database": "COLLIBRA"
},
{
"schema": "COLLIBRA"
},
{
"table": "IOT_JSON"
},
{
"column": "CCA3"
}
],
"trg_path": [
{
"system": "Databricks"
},
{
"database": "COLLIBRA"
},
{
"schema": "COLLIBRA"
},
{
"table": "IOT_DEVICES_PER_COUNTRY"
},
{
"column": "COUNTRY"
}
],
"mapping_ref":
{
"source_code": "transforms.sql",
"mapping": "dev_no_bat_per_country_view",
"codebase_pos": [
{
"pos_start": 71, "pos_len": 69
}
]
}
}
],
"codebase_files":
{
"transforms.sql":
{
"mapping_refs":
{
"dev_no_bat_per_country_view":
{
"pos_start": 0,
"pos_len": 246
}
}
}
}
}
Example technical lineage graphs
Both example lineage.json files generate the following technical lineage graph, which contains 2 nodes and 1 edge.
The following technical lineage graph is generated by using the example lineage.json file for an advanced custom technical lineage. The bottom part shows the transformation code that generated the data flow.
In the lineages section, pos_start is set to 71 and pos_len is set to 69. These values indicate that the 69 characters of transformation code starting at position 71 are highlighted in blue. Line 2 in the technical lineage graph contains the highlighted transformation code.
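The relationship between a start position, a length and the highlighted text can be sketched as a plain substring operation. The Python snippet below is illustrative only: the function name highlighted is hypothetical, positions are assumed to be zero-based character offsets, and the code string is made up because the contents of transforms.sql are not shown in this section.

```python
def highlighted(code, pos_start, pos_len):
    """Return the pos_len characters of code beginning at offset pos_start."""
    return code[pos_start:pos_start + pos_len]


# Hypothetical source text; the real transforms.sql is not shown in this section.
code = "CREATE VIEW v AS\nSELECT a AS b FROM t;\n"

# Locate a fragment and highlight it by offset and length.
start = code.index("SELECT")
print(highlighted(code, start, len("SELECT a AS b")))  # SELECT a AS b
```

The same offset-and-length idea applies to the start and len values in the highlights property of the batch definition examples.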