As a leader in the AI Marketing Cloud, Zeta powers mission-critical, agentic marketing workloads for some of the world's most sophisticated enterprises. Today, 44 percent of the Fortune 100 rely on Zeta's platform to activate customer intelligence and deliver autonomous marketing outcomes. At this scale, the quality, readiness, and adaptability of our underlying data assets directly influence the business value we can generate for our clients. Ensuring those assets are AI-ready: clean, governed, semantically consistent, and immediately consumable, is fundamental to our mission.
Our landscape expanded significantly with the acquisitions of LiveIntent and Marigold. These additions brought tremendous strategic value but also introduced a much broader set of data sources, schemas, processing patterns, and operational models. As a result, we found ourselves operating in an inherently heterogeneous ecosystem where traditional warehouse-centric or single-format lake approaches would impose significant friction. We needed an architecture capable of absorbing diverse data assets, normalizing them efficiently, and making them usable across analytical, activation, and AI workloads, without compromising scale, cost efficiency, or governance.
The Lakehouse paradigm emerged as the architectural model best suited to address these needs. It provides a unified data layer that brings transactional guarantees, schema enforcement, and governance to low-cost open object storage. This combination allows us to integrate data from disparate systems, preserve fidelity, maintain consistency, and expose it through a single, high-performance analytical surface. Most importantly, it enables us to build durable, reusable, AI-ready data assets that power both real-time activation and advanced AI workloads across our expanding customer base.
Composable Data Platform: Our Core Design Principle
To support AI-driven marketing workloads at scale, we designed our foundational data platform as a set of modular, independently replaceable components.
The core layers include:
- Object storage as the universal persistence plane, with Parquet as the standard columnar file format.
- Lakehouse table formats providing transactional guarantees, schema evolution, and time-travel semantics.
- Metadata catalogs governing discoverability, versioning, and governance across the organization.
- Multiple compute engines, such as Spark, Snowflake, ClickHouse, Trino, and others, consuming these assets for diverse analytical and AI workloads.
Decoupling these layers ensures that each component can evolve without imposing downstream disruption. This composable design allows us to bring new engines online, adopt emerging table formats, enhance governance, or introduce specialized optimizations, all without re-architecting the entire platform. It is fundamental to supporting AI workflows that include streaming inference, real-time activation, large-scale batch computation, and exploratory analytics.
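As a concrete illustration of this decoupling, the minimal PySpark sketch below (the catalog name, bucket path, and table identifiers are hypothetical) shows one compute engine attaching to shared Iceberg tables purely through catalog configuration, so swapping the engine or catalog is a configuration change rather than a data migration.

```python
from pyspark.sql import SparkSession

# Hypothetical catalog name and warehouse location; the same Iceberg/Parquet
# data on object storage can be attached to other engines (Trino, Flink, ...)
# through their own catalog configuration, without rewriting any files.
spark = (
    SparkSession.builder
    .appName("composable-lakehouse-example")
    # Iceberg SQL extensions and a named catalog backed by AWS Glue
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-lake-bucket/warehouse/")
    .config("spark.sql.catalog.lake.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

# Query a shared table; Parquet files and Iceberg metadata stay in object storage.
spark.sql("SELECT count(*) FROM lake.marketing.engagement_events").show()
```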
Why We Chose S3 Tables
We selected Apache Iceberg as our Lakehouse table format because it offers strong transactional guarantees, schema evolution, time-travel semantics, and an open specification supported by a broad and growing ecosystem. Iceberg's emphasis on hidden partitioning, metadata scaling, and cross-engine interoperability aligns well with our composable platform strategy, where multiple compute engines (Spark, Flink, Trino, and others) must operate seamlessly over shared datasets. Iceberg provides a robust foundation for long-term compatibility and avoids the lock-in concerns associated with proprietary table formats.
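The sketch below, reusing the hypothetical catalog and table names from the earlier example, illustrates the Iceberg capabilities called out above: hidden partitioning via a column transform, in-place schema evolution, and time-travel reads.

```python
# Assumes the Spark session and "lake" catalog from the earlier sketch;
# table and column names are illustrative.

# Hidden partitioning: partition by a transform of a column, not a separate field.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.marketing.engagement_events (
        event_id   STRING,
        user_id    STRING,
        event_ts   TIMESTAMP,
        channel    STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE lake.marketing.engagement_events ADD COLUMN campaign_id STRING")

# Time travel: read the table as of an earlier point in time.
spark.sql("""
    SELECT * FROM lake.marketing.engagement_events
    TIMESTAMP AS OF '2025-01-01 00:00:00'
""").show()
```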
Once we aligned on Iceberg as the table format, we needed a scalable and operationally efficient mechanism to manage table state, metadata updates, and multi-tenant lifecycle operations. After evaluating multiple catalog and maintenance solutions, we chose S3 Tables paired with the AWS Glue Data Catalog as the control plane for managing Iceberg tables. Our prior work, published on the AWS Storage Blog, demonstrated how S3 Tables enables Zeta to scale multi-tenant ingestion pipelines with predictable performance and cost efficiency. In comparative testing, S3 Tables offered simpler operational semantics, faster metadata access paths, and more predictable behavior under high-concurrency workloads than alternative catalog-backed Iceberg implementations. This combination, Iceberg as the table format with S3 Tables and Glue as the management plane, gives us the flexibility of an open Lakehouse with the operational stability required for our AI-driven, multi-tenant platform.
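As a rough sketch of what this looks like from the engine side (the table-bucket ARN and names are placeholders, and the configuration assumes the client-side S3 Tables catalog for Apache Iceberg is available on the Spark classpath):

```python
from pyspark.sql import SparkSession

# Sketch of attaching Spark to an S3 table bucket; the catalog class and the
# table-bucket ARN below are illustrative and depend on the S3 Tables
# client-side catalog being on the classpath.
spark = (
    SparkSession.builder
    .appName("s3-tables-example")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3tables.catalog-impl",
            "software.amazon.s3tables.iceberg.S3TablesCatalog")
    .config("spark.sql.catalog.s3tables.warehouse",
            "arn:aws:s3tables:us-east-1:111122223333:bucket/example-table-bucket")
    .getOrCreate()
)

# Compaction, snapshot expiration, and unreferenced-file cleanup are handled
# as managed maintenance by S3 Tables rather than by per-table Spark jobs.
spark.sql("SELECT * FROM s3tables.marketing.engagement_events LIMIT 10").show()
```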
Multi-Account Federated Catalog Architecture
With the table format and management plane established, the next challenge was enabling secure, consistent access to these Iceberg tables across multiple AWS accounts without duplicating data or centralizing everything into a single environment. Unifying data from multiple business units and historical acquisitions into a single, centralized catalog often introduces significant delays in time-to-market. Each environment brings its own account boundaries, governance models, and operational constraints, and forcing full consolidation before delivering value slows down product integration and AI feature rollout.
To avoid that bottleneck, we adopted a multi-account federated catalog architecture. This approach allows each account (producer, governance hub, or consumer) to maintain autonomy over its own storage, metadata, and lifecycle policies while still enabling unified discovery and governed access to shared datasets. Rather than replicating or migrating datasets across accounts, data remains in the producer account, and consumers gain secure, on-demand access through well-defined catalog associations and permissions. This pattern aligns with our composable platform philosophy and minimizes both duplication and operational friction.
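One possible wiring of this pattern with the AWS Glue Data Catalog and Lake Formation is sketched below; the account IDs, database names, and role ARN are hypothetical, and in practice the two calls run under different accounts' credentials rather than in a single script.

```python
import boto3

# Hypothetical account IDs, database, and role names. The producer account
# owns the Iceberg data; the consumer account only gets a catalog-level link
# plus Lake Formation permissions, so no data is copied across accounts.
PRODUCER_ACCOUNT = "111122223333"
CONSUMER_ROLE_ARN = "arn:aws:iam::444455556666:role/analytics-consumer"

glue = boto3.client("glue")          # would use consumer-account credentials
lf = boto3.client("lakeformation")   # would use governance-account credentials

# 1) In the consumer account: create a resource link that points at the
#    producer's shared database instead of replicating its tables.
glue.create_database(
    DatabaseInput={
        "Name": "marketing_shared_link",
        "TargetDatabase": {
            "CatalogId": PRODUCER_ACCOUNT,
            "DatabaseName": "marketing_shared",
        },
    }
)

# 2) On the governance side: grant the consumer role read access to the
#    shared tables through Lake Formation.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": CONSUMER_ROLE_ARN},
    Resource={
        "Table": {
            "CatalogId": PRODUCER_ACCOUNT,
            "DatabaseName": "marketing_shared",
            "TableWildcard": {},
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)
```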
Future Work
As we continue to evolve our Lakehouse foundation, we are investing in three major areas to further strengthen scalability, governance, and developer productivity:
Establishing a Unified Data Product Framework
We are formalizing a unified data product development framework that standardizes how datasets are modeled, published, validated, and governed across domains. By defining consistent patterns for data creation and management, the framework promotes shared semantics, reduces unnecessary duplication, and improves interoperability across our AI and analytics ecosystem.
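As a purely illustrative example (not our actual framework), a standardized data product definition might capture ownership, contract, and quality expectations in a single typed descriptor:

```python
from dataclasses import dataclass, field

# Illustrative descriptor of what a data product definition could capture;
# field names and policies here are hypothetical.
@dataclass
class DataProduct:
    name: str                        # e.g. "marketing.engagement_events"
    owner: str                       # accountable domain team
    table_format: str = "iceberg"    # shared physical contract
    schema_version: str = "1.0"
    freshness_sla_minutes: int = 60
    quality_checks: list[str] = field(default_factory=list)
    consumers: list[str] = field(default_factory=list)

product = DataProduct(
    name="marketing.engagement_events",
    owner="activation-domain",
    quality_checks=["non_null:event_id", "unique:event_id"],
    consumers=["audience-ai", "reporting"],
)
```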
Deepening Lineage and Observability with OpenLineage Integration
Integrating OpenLineage across ingestion, transformation, and consumption layers will provide end-to-end visibility into data flows. This visibility enables proactive governance, faster incident resolution, improved auditability, and richer operational insights across all pipelines.
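A minimal sketch of what this could look like for a Spark pipeline, using the OpenLineage Spark listener (the collector endpoint and namespace values are placeholders):

```python
from pyspark.sql import SparkSession

# Sketch of enabling the OpenLineage Spark listener so each job emits
# lineage events to a collector; the endpoint and namespace are placeholders.
spark = (
    SparkSession.builder
    .appName("lineage-enabled-pipeline")
    .config("spark.extraListeners",
            "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "https://lineage.example.internal")
    .config("spark.openlineage.namespace", "lakehouse-pipelines")
    .getOrCreate()
)

# Reads and writes in this session (e.g. Iceberg table to Iceberg table) are
# reported as lineage events tying inputs, outputs, and job runs together.
```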
Engineering the Next 10x Leap in Platform Scale
As customer adoption accelerates, we are scaling the infrastructure by an order of magnitude and engineering the next wave of performance and capacity improvements across storage, catalog services, and compute orchestration.
If you're interested in building the next generation of agentic marketing platforms, apply on our careers page.