Ways to integrate Google BigQuery data sources

In Collibra Platform, you can work with Google BigQuery data sources in the following ways:

  • Use the Dataplex Universal Catalog integration to integrate all metadata from BigQuery data sources.
  • Use the BigQuery JDBC connector to register individual BigQuery data sources.

It is important to understand the differences between these methods, as they produce different results in Collibra Platform.

Integrating BigQuery data sources via Dataplex Universal Catalog integration

When you integrate Dataplex Universal Catalog, metadata for entries of the BigQuery Entry type is also ingested into Collibra Platform. The resulting assets represent BigQuery databases, schemas, tables, and columns, along with their associated aspects.

BigQuery metadata integrated via Dataplex Universal Catalog supports sampling, profiling, and classification (in preview).

For more information, go to Steps: Integrate Google Dataplex Universal Catalog via Edge.

Registering a BigQuery data source via the BigQuery JDBC connector

When you register a specific BigQuery data source by using the BigQuery JDBC connector, the resulting assets represent the tables and columns in the database. Data sources registered this way also support sampling, profiling, and classification.

Note: When you register a BigQuery data source via the JDBC connector, BigQuery Aspects are not ingested. To ingest Aspects, use a Dataplex Universal Catalog integration.

For more information, go to Steps overview: Data source registration via Edge.

Combining the ways of working with BigQuery

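You can integrate BigQuery by using both the Dataplex Universal Catalog integration and the BigQuery JDBC connector. Using the two methods together lets you choose which information is shown in Collibra Platform. For example, the Dataplex Universal Catalog integration ingests BigQuery Aspects, which the JDBC connector does not.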

If you first register a BigQuery data source using the BigQuery JDBC connector, make sure to use the same System asset when integrating Dataplex Universal Catalog. This ensures that the Dataplex Universal Catalog integration skips any assets already registered via the JDBC connection.
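The effect of sharing a System asset can be pictured with a small toy model. The sketch below is only an illustration of the skip behavior described above, not Collibra's implementation: the `Asset` class, the `integrate_dataplex` function, and the example names ("GCP Prod", "sales.orders") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    system: str  # hypothetical: name of the parent System asset
    path: str    # hypothetical: dataset/table path in BigQuery
    source: str  # "jdbc" or "dataplex"


def integrate_dataplex(existing, incoming):
    """Add incoming Dataplex assets, skipping any that are already
    registered under the same System asset (toy model only)."""
    registered = {(a.system, a.path) for a in existing}
    added = [a for a in incoming if (a.system, a.path) not in registered]
    return list(existing) + added


# One table already registered via the JDBC connector.
jdbc_assets = [Asset("GCP Prod", "sales.orders", "jdbc")]

# Same System asset: the table that already exists is skipped.
same_system = integrate_dataplex(
    jdbc_assets,
    [Asset("GCP Prod", "sales.orders", "dataplex"),
     Asset("GCP Prod", "sales.customers", "dataplex")],
)

# Different System asset: nothing matches, so nothing is skipped.
other_system = integrate_dataplex(
    jdbc_assets,
    [Asset("GCP Other", "sales.orders", "dataplex")],
)
```

In this model, `same_system` holds the JDBC-registered table plus the one new table, while `other_system` holds the same table twice under two different System assets, which is the situation that using the same System asset is meant to avoid.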

Because the Dataplex Universal Catalog integration supports sampling, profiling, and classification (in preview), you can migrate to using only the Dataplex Universal Catalog integration if you previously combined it with BigQuery JDBC synchronization.

Migrating to use the Google Dataplex Universal Catalog integration only

If you have previously used both the Dataplex Universal Catalog integration and BigQuery JDBC synchronization for some data sources, and now want to use only the Dataplex Universal Catalog integration, complete the following steps:

  1. Edit your Dataplex Universal Catalog capability by adding the BigQuery JDBC connection:
    1. Open a site.
      1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
        The Settings page opens.
      2. In the tab pane, click Edge.
        The Sites tab opens and shows a table with an overview of your sites.
      3. In the table, click the name of the site whose status is Healthy.
        The site page opens.
    2. In the Capabilities section, click the name of your Dataplex Universal Catalog capability.
    3. Click Edit.
    4. Add your BigQuery JDBC connection in the JDBC GCP Connection (in preview) field.
    5. Click Save.
  2. Synchronize the Dataplex Universal Catalog integration again.
    You can now set up sampling, profiling, and classification for the integrated assets, and request sample data for them.

For more information about integrating Dataplex Universal Catalog and setting up sampling, profiling, and classification, go to Steps: Integrate Google Dataplex Universal Catalog via Edge.