Dataplex: Supported transformation details

Collibra Data Lineage visualizes lineage for Google Dataplex down to the column level. To view the technical lineage for Google Dataplex, ensure that you select Objects in the toolbar of your technical lineage graph.

Function scope

Collibra Data Lineage captures lineage for the following Google Cloud assets. Currently, only Column, Table, and File assets are processed and included in the technical lineage.

  • BigQuery.
  • Other Google Cloud services (GCS), only when they contribute lineage to BigQuery assets. Collibra Data Lineage does not collect metadata directly from these services. However, if they generate lineage for BigQuery assets, Dataplex captures that lineage and includes it in the exported lineage file. Collibra Data Lineage then ingests this exported lineage, so any indirect lineage created by these services is reflected in the technical lineage for BigQuery assets.
    Note: The column-level lineage generated in Collibra Data Lineage is subject to the limitations of the data lineage feature in Dataplex. For details, go to Limitations in the About data lineage topic of the Dataplex Universal Catalog documentation.
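The scoping rule above keeps lineage from other Google Cloud services only when it feeds a BigQuery asset. You can picture this rule as a reachability filter over lineage links. The following Python code is a minimal sketch under stated assumptions, not Collibra's implementation. It assumes that each link is a hypothetical (source, target) pair of fully qualified names and that a "bigquery:" prefix identifies BigQuery assets. The names in the usage example are illustrative.

```python
from collections import defaultdict, deque

# Assumption: BigQuery assets are identified by this name prefix.
BIGQUERY_PREFIX = "bigquery:"


def links_feeding_bigquery(links):
    """Return the links that lie on some path ending in a BigQuery asset.

    Each link is a hypothetical (source_fqn, target_fqn) tuple.
    """
    incoming = defaultdict(list)  # target -> links that write into it
    for link in links:
        incoming[link[1]].append(link)

    # Walk upstream from every BigQuery asset that appears as a target.
    queue = deque(t for t in incoming if t.startswith(BIGQUERY_PREFIX))
    visited = set(queue)
    kept = set()
    while queue:
        node = queue.popleft()
        for link in incoming.get(node, []):
            kept.add(link)
            source = link[0]
            if source not in visited:
                visited.add(source)
                queue.append(source)

    # Preserve the input order of the links that were kept.
    return [link for link in links if link in kept]


links = [
    ("gcs:raw-bucket.orders.csv", "bigquery:proj.staging.orders"),
    ("bigquery:proj.staging.orders", "bigquery:proj.mart.daily_sales"),
    ("bigquery:proj.mart.daily_sales", "gcs:export-bucket.report.csv"),
]
print(links_feeding_bigquery(links))
```

In this example, the link from the Cloud Storage file into BigQuery is kept because it feeds a BigQuery table. The final export back to Cloud Storage is dropped because it does not end in BigQuery.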

Lineage extraction mechanism

Collibra Data Lineage retrieves lineage metadata via the Google Data Lineage API to provide visibility into BigQuery and GCS data flows:

  • Technical lineage for Google Dataplex can start from GCS or BigQuery and end in BigQuery.
  • You can choose to create table-level lineage or column-level lineage for Google Dataplex when you synchronize the Technical Lineage for Google Dataplex capability.
  • Stitching works for column-level lineage, regardless of whether you integrated Dataplex Universal Catalog or registered Google BigQuery databases by using the BigQuery JDBC connector.
  • Transformations are ingested by calling the GCP Process and subsequently the GCP Jobs. Therefore, to ingest transformation details, the Service Account user defined in the Edge connection requires,
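When you synchronize the capability, you can choose table-level lineage instead of column-level lineage. Conceptually, this choice collapses column-to-column links into table-to-table links. The following minimal Python sketch illustrates that idea only. The `ColumnLink` record and its fields are hypothetical, and this page does not document how Collibra Data Lineage actually performs the aggregation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnLink:
    """Hypothetical column-level lineage link between two table columns."""

    source_table: str
    source_column: str
    target_table: str
    target_column: str


def to_table_level(column_links):
    """Collapse column-level links into distinct (source, target) table pairs."""
    seen = set()
    table_links = []
    for link in column_links:
        pair = (link.source_table, link.target_table)
        if pair not in seen:  # keep the first occurrence, drop duplicates
            seen.add(pair)
            table_links.append(pair)
    return table_links


column_links = [
    ColumnLink("proj.staging.orders", "order_id", "proj.mart.daily_sales", "order_id"),
    ColumnLink("proj.staging.orders", "amount", "proj.mart.daily_sales", "total"),
    ColumnLink("proj.mart.daily_sales", "total", "proj.mart.summary", "total"),
]
print(to_table_level(column_links))
```

In this example, the two column links between the same pair of tables become a single table-level link. Only the column-level view preserves which column feeds which.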

Differences between technical lineage for Google Dataplex and Google BigQuery

You can create technical lineage for Google BigQuery by using a JDBC connection or for Google Dataplex by using a Google Cloud Platform (GCP) connection. Consider the following differences to determine which data source and connection type to use.

| Feature | Technical lineage for Google Dataplex (column-level lineage) | Technical lineage for Google Dataplex (table-level lineage) | Technical lineage for Google BigQuery |
| --- | --- | --- | --- |
| SQL transformation code | Yes | No | Yes |
| Executed SQL in stored procedures | No (table-level only) | Yes | No |
| Ingests lineage from | BigQuery and other Google Cloud services supported by the data lineage feature in Dataplex | BigQuery and other Google Cloud services supported by the data lineage feature in Dataplex | BigQuery |
| BigQuery external tables | Yes | Yes | Yes |
| Stitching | Yes | No | Yes |
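If you evaluate these options against a list of requirements, you can encode the comparison table as data. The following Python sketch only restates the table. The option and feature keys are hypothetical labels, not Collibra identifiers. For the column-level Dataplex option, the entry "No (table-level only)" is treated as not supported. The "Ingests lineage from" row is reduced to whether other Google Cloud services are covered.

```python
# Support matrix restated from the comparison table.
# Keys are hypothetical labels, not Collibra identifiers.
SUPPORT = {
    "Dataplex (column-level lineage)": {
        "sql_transformation_code": True,
        "executed_sql_in_stored_procedures": False,
        "other_google_cloud_services": True,
        "bigquery_external_tables": True,
        "stitching": True,
    },
    "Dataplex (table-level lineage)": {
        "sql_transformation_code": False,
        "executed_sql_in_stored_procedures": True,
        "other_google_cloud_services": True,
        "bigquery_external_tables": True,
        "stitching": False,
    },
    "Google BigQuery (JDBC)": {
        "sql_transformation_code": True,
        "executed_sql_in_stored_procedures": False,
        "other_google_cloud_services": False,
        "bigquery_external_tables": True,
        "stitching": True,
    },
}


def options_supporting(*features):
    """Return the options that support every requested feature."""
    return [
        option
        for option, support in SUPPORT.items()
        if all(support[feature] for feature in features)
    ]


print(options_supporting("stitching", "other_google_cloud_services"))
```

For example, if you need both stitching and lineage from other Google Cloud services, only the column-level Dataplex option qualifies.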