About BigQuery’s Data Catalog and Why Aggua is a Better Option
BigQuery is one of the most popular big data and analytics platforms available today. It’s a powerful tool that offers a ton of benefits to data managers, but it also has some limitations that can be frustrating. These limitations are the reason why we decided to develop a better platform — Aggua.
BigQuery Data Catalog Overview
BigQuery is a serverless, scalable data warehouse solution built on the Google Cloud Platform (GCP). Its query engine can run SQL queries on terabytes of data in seconds and petabytes in only minutes. By removing the burden of having to maintain additional infrastructure and the need to develop or reload indexes, it dramatically improves an organization’s data management. Consequently, it facilitates improved data-driven decision making.
GCP’s data catalog provides organizations with a unified search index of all their GCP project data assets. These data assets include text files, CSV files, datasets, tables, spreadsheets, views, and data streams.
The data catalog uses the metadata of these assets (the asset name, description, and column definitions) to build and maintain the search index that makes your data easier to discover.
The data catalog also holds metadata for assets maintained on other GCP services, so users can easily retrieve details via its UI or API. Metadata is automatically created or updated whenever you index or store an asset, update it in its source system, or tag it with the Data Catalog.
Limitations of BigQuery’s Data Catalog
When our clients first started using BigQuery, they were impressed with many of its features. However, once they got more experience with it, they noticed some limitations.
Buggy User Experience
Google's data catalog is a tool for managing and accessing large sets of structured data, such as product or customer information. It supports tagging, preserving descriptions, and multiple ownerships.
However, the user experience is reportedly poor. Compared with some other platforms, Google places less emphasis on usability, and many users have encountered issues that hamper productivity.
Inability to List BigQuery Entries
In Google Data Catalog (GDC), there are two ways to retrieve entries: search and list entries. Listing retrieves entries from a specific entry group that the user establishes. Since BigQuery's data catalog assets are not part of such an entry group, there appears to be no way to list them.
Simply put, querying https://datacatalog.googleapis.com/v1/projects/myproject/location/europe-west1/entryGroups/@bigquery/ will not list BigQuery entries.
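Search is the other retrieval path mentioned above. The sketch below only builds a request for the Data Catalog catalog:search REST method, filtered to BigQuery assets with the query system=bigquery. It does not send anything, and authentication is left out. The function name, the page size, and the use of "myproject" as the project ID are illustrative assumptions, not part of any Google or Aggua tooling.

```python
import json

# REST endpoint of the Data Catalog search method (v1).
SEARCH_URL = "https://datacatalog.googleapis.com/v1/catalog:search"


def build_bigquery_search_request(project_id, page_size=100):
    """Return (url, JSON body) for a search limited to BigQuery assets.

    Hypothetical helper for illustration: it prepares the request
    but does not send it or handle credentials.
    """
    body = {
        # Search qualifier restricting results to the BigQuery system.
        "query": "system=bigquery",
        # Scope the search to a single project.
        "scope": {"includeProjectIds": [project_id]},
        "pageSize": page_size,
    }
    return SEARCH_URL, json.dumps(body)


url, payload = build_bigquery_search_request("myproject")
print(url)
print(payload)
```

The body would be sent as an authenticated POST. Results come back in pages, so a real client would also need to follow the page token returned with each response.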
Limited Tag Length
Tags make it easy to organize and assign a semantic meaning, data lineage, and other additional information to data assets. However, the tag can carry just 2000 bytes.
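One way to work with the size limit is to check a value before writing it to a tag. The following is a minimal sketch that assumes the 2000-byte figure quoted above applies to the UTF-8 encoded value. The helper names are hypothetical and are not part of any Google client library.

```python
MAX_TAG_BYTES = 2000  # limit quoted in the text; assumed to mean UTF-8 bytes


def fits_in_tag(value, limit=MAX_TAG_BYTES):
    """Return True if the UTF-8 encoding of value is within the limit."""
    return len(value.encode("utf-8")) <= limit


def truncate_to_bytes(value, limit=MAX_TAG_BYTES):
    """Shorten value so its UTF-8 encoding fits within limit bytes."""
    encoded = value.encode("utf-8")
    if len(encoded) <= limit:
        return value
    # Cutting at a byte offset can split a multi-byte character;
    # errors="ignore" drops the incomplete trailing bytes.
    return encoded[:limit].decode("utf-8", errors="ignore")


description = "Customer orders table, refreshed nightly. " * 100
print(fits_in_tag(description))
print(len(truncate_to_bytes(description).encode("utf-8")))
```

Measuring bytes rather than characters matters because non-ASCII text takes more than one byte per character, so a description that looks short enough can still exceed the limit.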
We noticed that users had to revisit the documentation each time they added new data and updated the metadata in order to reflect the changes. That can be particularly aggravating if you are heading a data team. It's difficult enough to get your data team to start documenting; adding another layer of difficulty means having to do it all over again. Even if you're taking precautions by regularly making backups of your data, it’s not an ideal experience.
The data catalog platform is designed to be used by data analysts, who can easily find and manipulate data in the cloud. However, if you want to use the platform for anything else—such as running a business off of its data—you're out of luck. The UI and API are difficult to work with, making it nearly impossible to onboard users, especially business users.
Aggua’s Data Catalog
Aggua was developed to be a centralized data management hub for teams whose architecture is based on BigQuery. It provides the key insights you need to make the most of your data, including cost, performance, popularity, and usage.
In order to facilitate better project collaboration between team members, our automated data catalog allows you to see all of your data in one location, and see how it is interconnected across platforms. Users can apply filters to data sets, or check for any items recommended by the platform. Aggua helps your teams find anything they need—jobs, dashboards, events, tables and views—all in one place.
Our platform's powerful modeling and testing capabilities will help your business confidently map out a data-driven strategy and rest assured that all information is accurate and up to date.
Additionally, you can rely on our usage monitoring tools to avoid overspending, enhance efficiency, and strictly implement governance principles.