BigLake: Unify data lakes & data warehouses | Google Cloud

BigLake

BigLake is a storage engine that provides a unified interface for analytics and AI engines to query multiformat, multicloud, and multimodal data in a secure, governed, and performant manner. Build a single-copy AI lakehouse designed to reduce management of and need for custom data infrastructure.

  • Continuous innovation, including the new research paper "BigQuery's Evolution toward a Multi-Cloud Lakehouse," to be presented at SIGMOD 2024.

  • Deploy a Google-recommended solution that unifies data lakes and data warehouses for storing, processing, and analyzing both structured and unstructured data

  • Store a single copy of structured and unstructured data and query using analytics and AI

  • Fine-grained access control and multicloud governance over distributed data

  • Fully managed experience with automatic data management for your open-format lakehouse

Benefits

Freedom of choice

Unlock analytics on distributed data regardless of where and how it's stored, while choosing the best analytics tools, open source or cloud native, over a single copy of data.

Secure and performant data lakes

Fine-grained access control across open source engines like Apache Spark, Presto and Trino, and open formats such as Parquet. Performant queries over data lakes powered by BigQuery.

Unified governance & management at scale

Integrates with Dataplex to provide management at scale, including logical data organization, centralized policy and metadata management, and quality and lifecycle management for consistency across distributed data.

Key features

Fine-grained security controls

BigLake eliminates the need to grant file-level access to end users. Apply table-, row-, and column-level security policies to object store tables, just as you would to existing BigQuery tables.
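As a rough illustration of row-level security on an object store table, the sketch below assembles a BigQuery `CREATE ROW ACCESS POLICY` statement as a string. The helper name `row_access_policy_ddl`, the table `my_project.sales.orders_biglake`, the group address, and the `region` column are all hypothetical placeholders, and the statement still has to be run in BigQuery by someone with the right permissions.

```python
def row_access_policy_ddl(policy, table, grantees, filter_expr):
    """Build a BigQuery CREATE ROW ACCESS POLICY statement.

    All arguments are assumed to be trusted values: this helper does plain
    string formatting and performs no escaping.
    """
    grantee_list = ", ".join(f"'{g}'" for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr});"
    )


# Hypothetical example: US analysts only see rows where region = 'US'.
ddl = row_access_policy_ddl(
    policy="us_only",
    table="my_project.sales.orders_biglake",
    grantees=["group:us-analysts@example.com"],
    filter_expr="region = 'US'",
)
print(ddl)
```

Because the policy is attached to the table rather than to the underlying files, users only need access to the table, not to the objects in the bucket.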

Multi-compute analytics

Maintain a single copy of structured and unstructured data and make it uniformly accessible across Google Cloud and open source engines, including BigQuery, Vertex AI, Dataflow, Spark, Presto, Trino, and Hive, using BigLake connectors. Manage security policies centrally in one place and have them enforced consistently across query engines through the API built into the connectors.

Multicloud governance

Discover all BigLake tables in Data Catalog, including those defined over Amazon S3 and Azure Data Lake Storage Gen2. Configure fine-grained access control and have it enforced across clouds when querying with BigQuery Omni.

Built for artificial intelligence (AI)

Object tables enable use of multimodal data for governed AI workloads. Easily build AI use cases using BigQuery SQL and its Vertex AI integrations. 

Built on open formats

Supports open table and file formats, including Parquet, Avro, ORC, CSV, and JSON. The API serves multiple compute engines through Apache Arrow. Table format natively supports Apache Iceberg, Delta, and Hudi via manifest.
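To make the file-format support concrete, here is a minimal sketch that builds the BigQuery DDL for a BigLake table over Cloud Storage files. The helper `biglake_table_ddl`, the project, dataset, connection, and bucket names are hypothetical; the format set simply mirrors the file formats listed above, and the generated statement would still need to be executed in BigQuery against a real connection.

```python
# File formats named in the text above; used only to validate input.
SUPPORTED_FORMATS = {"PARQUET", "AVRO", "ORC", "CSV", "JSON"}


def biglake_table_ddl(table, connection, file_format, uris):
    """Build a CREATE EXTERNAL TABLE ... WITH CONNECTION statement.

    Plain string formatting with no escaping, so pass trusted values only.
    """
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}]);"
    )


# Hypothetical names throughout.
table_ddl = biglake_table_ddl(
    table="my_project.sales.orders_biglake",
    connection="my_project.us.lake_connection",
    file_format="parquet",
    uris=["gs://my-bucket/orders/*.parquet"],
)
print(table_ddl)
```

The `WITH CONNECTION` clause is what separates a BigLake table from a plain external table: queries read the files through the connection's credentials instead of the end user's.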

bol.com
As a rapidly growing e-commerce company, we have seen rapid growth in data. BigLake allows us to unlock the value of data lakes by enabling access control on our views while providing a unified interface to our users and keeping data storage costs low. This in turn allows quicker analysis on our datasets by our users.

Documentation

Google Cloud Basics

Introduction to BigLake

Get an introduction to BigLake concepts and learn how it can simplify your analytics experience.

Quickstart

Getting started with BigLake

Learn how to create and manage BigLake tables, and how to query a BigLake table through BigQuery or open source engines using connectors.

Quickstart

Query Cloud Storage data in BigLake tables

Learn how to query data stored in a Cloud Storage BigLake table.

Pricing

BigLake pricing is based on querying BigLake tables, including:

1. BigQuery pricing applies for queries over BigLake tables defined on Google Cloud Storage. 

2. BigQuery Omni pricing applies for queries over BigLake tables defined on Amazon S3 and Azure Data Lake Storage Gen2.

3. Queries from open source engines using BigLake connectors: BigLake connectors use the BigQuery Storage API, and the corresponding prices apply, billed on bytes read and egress.

4. Additional costs apply for query acceleration using metadata caching, object tables, and BigLake Metastore.

Example: The first 1 TB of data processed with BigQuery each month is free.
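The free-tier arithmetic can be sketched as follows. This covers only the query-processing component from item 1, takes the 1 TB monthly allowance from the example above, and leaves the per-TB rate as a caller-supplied parameter; the `5.0` used below is a made-up placeholder, not a published price, and the function name is hypothetical.

```python
FREE_TB_PER_MONTH = 1.0  # monthly free allowance stated in the example above


def estimate_monthly_query_cost(tb_processed, price_per_tb):
    """Estimate the query charge for one month.

    price_per_tb must come from the current BigQuery price list; nothing
    here encodes a real rate. Storage API reads, egress, and metadata
    caching are billed separately and are not modeled.
    """
    billable_tb = max(0.0, tb_processed - FREE_TB_PER_MONTH)
    return billable_tb * price_per_tb


# Placeholder rate of 5.0 per TB, for illustration only.
cost = estimate_monthly_query_cost(tb_processed=3.5, price_per_tb=5.0)
print(cost)  # 2.5 billable TB at the placeholder rate
```

Usage below the free allowance yields a charge of zero, since the billable volume is clamped at 0.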
