Lakehouse · Delta Lake · Apache Iceberg · Data Engineering · Spark · Unity Catalog

Lakehouse Architecture — Delta Lake vs Apache Iceberg and Unity Catalog

A deep-dive into open table formats for production lakehouses: Delta Lake’s transaction log and ACID guarantees, Apache Iceberg’s metadata layers and hidden partitioning, a direct format comparison, time travel, schema and partition evolution, Unity Catalog for cross-cloud governance, and delta-rs for Spark-free Delta Lake access.

2026-05-02

Why the Data Lake Promise Fell Short

The data lake was supposed to be the answer to the data warehouse's rigidity. Store everything in cheap object storage — S3, GCS, ADLS — in open formats like Parquet and ORC, then query it on demand with Spark or Presto. No schema-on-write, no expensive proprietary storage, no vendor lock-in.

The reality was different. Raw data lakes became data swamps. Without ACID guarantees, concurrent writes produced corrupted datasets. Without versioning, a bad pipeline run meant manually restoring Parquet files from S3 backups. Without schema enforcement, downstream consumers discovered at query time that a producer had silently changed a column type. Without fine-grained access control, data governance required bolting on external systems that never quite fit.

The Lakehouse architecture, introduced by Databricks researchers in 2021, addresses these failures by layering an open table format — a metadata and transaction management system — on top of the existing object store. The data files remain Parquet. The storage remains S3. But the table format adds ACID transactions, time travel, schema enforcement, and partition management without requiring a proprietary storage engine.

Two open table formats dominate the ecosystem: Delta Lake, open-sourced by Databricks, and Apache Iceberg, originally designed at Netflix. Understanding their architectural differences — not just their feature checklists — is the prerequisite for choosing the right one for your stack.

Delta Lake: The Transaction Log at the Center

Delta Lake's central design decision is the Delta transaction log (also called the Delta log or _delta_log). Every operation on a Delta table — INSERT, UPDATE, DELETE, MERGE, schema change — appends a JSON commit file to this log directory. The log is the source of truth for the current state of the table. Parquet data files are immutable; they are never modified in place. Mutations create new Parquet files, and the log records which files are active.

This architecture gives Delta Lake optimistic concurrency: two writers can read the current table state, compute their changes in parallel, and attempt to commit. If their changes touch overlapping files, one writer's commit fails and retries. If the changes are independent (different partitions, non-overlapping predicates), both commits succeed without coordination. The log is append-only and stored on the same object store as the data — no external coordination service is required.
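The commit-time conflict check can be sketched as a toy model in Python — simplified, since real Delta also considers read predicates, metadata changes, and the configured isolation level:

```python
def has_conflict(files_read: set, concurrent_commits: list) -> bool:
    """Simplified commit-time check: a transaction must retry if any commit
    that landed after its snapshot removed (rewrote) a file it read."""
    removed = set()
    for commit in concurrent_commits:
        removed |= set(commit.get("removed_files", []))
    return bool(files_read & removed)

# Writer A read only day=28; writer B rewrote a file in day=29 → independent,
# so both commits succeed without coordination
a_reads = {"year=2026/month=04/day=28/part-00000-abc123.parquet"}
b_commit = {"removed_files": ["year=2026/month=04/day=29/part-00000-old.parquet"]}
print(has_conflict(a_reads, [b_commit]))
```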

# Delta Lake table structure on S3
s3://my-lakehouse/tables/events/
├── _delta_log/
│   ├── 00000000000000000000.json   ← initial commit: schema + add actions
│   ├── 00000000000000000001.json   ← insert batch
│   ├── 00000000000000000002.json   ← update: remove old files, add new
│   ├── 00000000000000000003.json   ← schema change: add column
│   ├── 00000000000000000010.checkpoint.parquet  ← checkpoint every N commits
│   └── _last_checkpoint             ← pointer to latest checkpoint
├── year=2026/month=04/day=28/
│   ├── part-00000-abc123.parquet
│   └── part-00001-def456.parquet
├── year=2026/month=04/day=29/
│   └── part-00000-ghi789.parquet
└── year=2026/month=05/day=01/
    └── part-00000-jkl012.parquet

# Each commit file is newline-delimited JSON — one action per line.
# Example commit 2 (a MERGE): remove the old file, add the rewritten one.
{"commitInfo": {"timestamp": 1746144000000, "operation": "MERGE"}}
{"remove": {"path": "year=2026/month=04/day=29/part-00000-old.parquet",
            "deletionTimestamp": 1746144000000, "dataChange": true}}
{"add": {"path": "year=2026/month=04/day=29/part-00000-ghi789.parquet",
         "size": 204800, "stats": "{\"numRecords\":5000,...}",
         "dataChange": true}}

Reading a Delta table means reading the log to reconstruct the current file list, then reading only those Parquet files. Checkpoints (written every 10 commits by default) snapshot the full active file list as a Parquet file, so readers don't scan thousands of JSON log entries on large tables. Time travel is implemented by reading only the log entries up to a specific version number or timestamp.
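That log replay can be sketched in a few lines of plain Python — a toy model with hypothetical file paths and only add/remove actions (real commits also carry metaData, protocol, and commitInfo actions):

```python
def replay_log(commits: list) -> set:
    """Fold add/remove actions from ordered commits into the active file set."""
    active = set()
    for actions in commits:
        for action in actions:
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active

commits = [
    [{"add": {"path": "day=28/part-00000-abc123.parquet"}}],   # commit 0: insert
    [{"add": {"path": "day=29/part-00000-old.parquet"}}],      # commit 1: insert
    [{"remove": {"path": "day=29/part-00000-old.parquet"}},    # commit 2: update
     {"add": {"path": "day=29/part-00000-ghi789.parquet"}}],
]
print(sorted(replay_log(commits)))        # current table state
print(sorted(replay_log(commits[:2])))    # time travel = replay a log prefix
```

The second call is the essence of time travel: reading the table "as of version 1" is nothing more than replaying the first two commits.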

from delta import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

TABLE_PATH = "s3://my-lakehouse/tables/events"

# Create a Delta table with schema enforcement and partitioning
(
    spark.createDataFrame([], schema="event_id STRING, user_id STRING, event_type STRING, "
                                     "amount DOUBLE, occurred_at TIMESTAMP, year INT, month INT")
    .write
    .format("delta")
    .partitionBy("year", "month")
    .option("delta.enableChangeDataFeed", "true")
    .option("delta.minWriterVersion", "4")
    .option("delta.minReaderVersion", "1")
    .save(TABLE_PATH)
)

# MERGE (upsert) — the most common lakehouse write pattern
events_df = spark.read.format("delta").load(TABLE_PATH)
from datetime import datetime  # TimestampType columns need datetime objects, not strings

new_events = spark.createDataFrame([
    ("evt-001", "usr-42", "purchase", 149.99, datetime(2026, 5, 2, 10, 0), 2026, 5),
    ("evt-002", "usr-17", "refund",   -49.99, datetime(2026, 5, 2, 10, 1), 2026, 5),
], schema=events_df.schema)

dt = DeltaTable.forPath(spark, TABLE_PATH)

(
    dt.alias("target")
    .merge(new_events.alias("source"), "target.event_id = source.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Time travel — read the table as it was at version 1
df_v1 = spark.read.format("delta").option("versionAsOf", 1).load(TABLE_PATH)
df_v1.show()

# Or by timestamp
df_yesterday = (
    spark.read.format("delta")
    .option("timestampAsOf", "2026-05-01T00:00:00")
    .load(TABLE_PATH)
)

# Describe history
dt.history(10).select("version", "timestamp", "operation", "operationMetrics").show()

Note

Delta Lake's optimistic concurrency works well when writers operate on different partitions. When multiple writers update the same partition simultaneously, conflict rates rise. For high-concurrency write workloads — streaming sinks, multiple Spark jobs writing to the same table — partition your tables to match your write patterns, or use Delta's row-level concurrency (available in Delta 3.x with deletion vectors) to reduce conflicts at finer granularity than the file level.

Apache Iceberg: Metadata Layers and Hidden Partitioning

Iceberg was designed from scratch at Netflix to address problems that Hive-partitioned data lakes could not solve at scale: partition explosion (tens of thousands of Hive partitions crashing metastore queries), table scans that couldn't leverage predicate pushdown, and no safe concurrent write semantics. The result is an architecture with more metadata layers than Delta Lake — but those layers enable capabilities that are architecturally difficult in Delta.

An Iceberg table consists of four metadata layers: a catalog pointer to the current metadata file, the metadata file (table schema, partition spec, snapshot history), manifest lists (one per snapshot, listing all manifest files), and manifest files (each listing a subset of data files with per-file statistics). This hierarchy allows Iceberg to skip entire manifest files during planning — a query predicate is evaluated against per-file column statistics to eliminate files before any data scan begins.
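That stats-based elimination boils down to a bounds check per file — an illustrative sketch handling only an equality predicate, not Iceberg's actual planner:

```python
def can_skip(file_stats: dict, column: str, predicate_value) -> bool:
    """Skip a data file when an equality predicate falls outside the file's
    [lower_bound, upper_bound] range for that column."""
    lo = file_stats["lower_bounds"][column]
    hi = file_stats["upper_bounds"][column]
    return not (lo <= predicate_value <= hi)

stats = {"lower_bounds": {"amount": 0.01}, "upper_bounds": {"amount": 9999.99}}
print(can_skip(stats, "amount", 15000.0))  # file cannot contain the value
print(can_skip(stats, "amount", 149.99))   # file must be scanned
```

Because the same bounds are aggregated at the manifest level, a whole manifest — and every data file it lists — can be skipped with a single comparison.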

# Iceberg table structure on S3
s3://my-lakehouse/tables/events-iceberg/
├── metadata/
│   ├── v1.metadata.json          ← table metadata: schema, partition spec
│   ├── v2.metadata.json          ← after schema evolution
│   ├── snap-001.avro             ← manifest list for snapshot 1
│   ├── snap-002.avro             ← manifest list for snapshot 2
│   ├── manifest-001.avro         ← manifest: lists data files with column stats
│   ├── manifest-002.avro
│   └── version-hint.text         ← points to current metadata version
└── data/
    ├── event_type=purchase/
    │   ├── 00000-0-abc.parquet
    │   └── 00001-0-def.parquet
    └── event_type=refund/
        └── 00000-0-ghi.parquet

# Manifest file entry (Avro) — per-file statistics enable predicate pushdown:
{
  "status": 1,          # ADDED
  "data_file": {
    "file_path": "data/event_type=purchase/00000-0-abc.parquet",
    "file_format": "PARQUET",
    "record_count": 12500,
    "column_sizes": {"1": 102400, "2": 51200},
    "value_counts": {"1": 12500, "2": 12500},
    "null_value_counts": {"1": 0, "2": 14},
    "lower_bounds": {"1": "evt-00001", "3": 0.01},
    "upper_bounds": {"1": "evt-12500", "3": 9999.99}
  }
}

Iceberg's most operationally significant feature is hidden partitioning. In Hive and Delta Lake, partitioning is user-visible: writers must supply the correct partition column values, and readers must include partition predicates to get partition pruning. In Iceberg, a partition spec maps source columns to partition values via transform functions (year(ts), bucket(user_id, 16), truncate(category, 3)) — computed automatically at write time, invisible to readers. Readers query the raw column; Iceberg's planner maps the predicate to the partition values internally.
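The transforms can be approximated in plain Python to show the write-time mapping. These helpers are illustrative stand-ins, not part of any Iceberg API — in particular, Iceberg's real bucket transform uses a 32-bit Murmur3 hash, not CRC32:

```python
import zlib
from datetime import datetime

def day(ts: datetime) -> str:
    """days(ts): map a timestamp to its calendar-day partition value."""
    return ts.strftime("%Y-%m-%d")

def bucket(value: str, n: int) -> int:
    """bucket(n, col): stable hash modulo n (CRC32 stand-in for Murmur3)."""
    return zlib.crc32(value.encode("utf-8")) % n

def truncate(value: str, width: int) -> str:
    """truncate(col, width): prefix of the value."""
    return value[:width]

ts = datetime(2026, 5, 2, 10, 0, 0)
print(day(ts), bucket("usr-42", 16), truncate("purchase", 3))
```

The key property is determinism: the same source value always maps to the same partition value, so the planner can invert a predicate on the raw column into a predicate on partitions without the reader ever seeing them.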

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (
    SparkSession.builder
    .appName("iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-lakehouse/tables/")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .getOrCreate()
)

# Create Iceberg table with hidden partitioning
spark.sql("""
  CREATE TABLE IF NOT EXISTS glue_catalog.analytics.events (
    event_id    STRING  NOT NULL,
    user_id     STRING  NOT NULL,
    event_type  STRING,
    amount      DOUBLE,
    occurred_at TIMESTAMP NOT NULL
  )
  USING iceberg
  PARTITIONED BY (
    days(occurred_at),          -- hidden partition: extract day from timestamp
    bucket(16, user_id)         -- hidden partition: 16 hash buckets on user_id
  )
  TBLPROPERTIES (
    'write.format.default'          = 'parquet',
    'write.parquet.compression-codec' = 'zstd',
    'history.expire.max-snapshot-age-ms' = '604800000',  -- 7 days
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '10'
  )
""")

# Writers supply raw column values — no partition column to manage
spark.sql("""
  INSERT INTO glue_catalog.analytics.events VALUES
  ('evt-001', 'usr-42', 'purchase', 149.99, TIMESTAMP '2026-05-02 10:00:00'),
  ('evt-002', 'usr-17', 'refund',   -49.99, TIMESTAMP '2026-05-02 10:01:00')
""")

# Readers query raw columns — partition pruning is automatic and invisible
result = spark.sql("""
  SELECT user_id, SUM(amount) as total
  FROM glue_catalog.analytics.events
  WHERE occurred_at BETWEEN TIMESTAMP '2026-05-01' AND TIMESTAMP '2026-05-03'
  GROUP BY user_id
""")
result.show()

# Time travel
df_snapshot = spark.read.option("snapshot-id", "3051729675574597004").table("glue_catalog.analytics.events")
df_as_of    = spark.read.option("as-of-timestamp", "1746144000000").table("glue_catalog.analytics.events")

# Schema evolution — add column (backward compatible)
spark.sql("ALTER TABLE glue_catalog.analytics.events ADD COLUMN session_id STRING AFTER user_id")

# Partition evolution — change strategy without rewriting data
spark.sql("""
  ALTER TABLE glue_catalog.analytics.events
  REPLACE PARTITION FIELD days(occurred_at) WITH months(occurred_at)
""")

Note

Iceberg's partition evolution is one of its strongest production advantages. When a table outgrows its original partition strategy — daily partitions producing too many small files, hash buckets undersized for growth — you can evolve the partition spec without rewriting existing data. New files use the new spec; old files retain their original partitioning. The planner handles both transparently.
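Mechanically, each Iceberg data file records the partition spec it was written under, and the planner prunes each group of files using its own spec's rules. A toy sketch of that behavior, with hypothetical spec ids and file names:

```python
# Data files remember the partition spec they were written under
files = [
    {"path": "d1.parquet", "spec_id": 0, "partition": {"day": "2026-04-28"}},
    {"path": "d2.parquet", "spec_id": 0, "partition": {"day": "2026-04-29"}},
    {"path": "m1.parquet", "spec_id": 1, "partition": {"month": "2026-05"}},
]

def prune(files: list, day_filter: str) -> list:
    """Keep files whose partition could contain the requested day, per spec."""
    kept = []
    for f in files:
        if f["spec_id"] == 0:          # original spec: daily partitions
            match = f["partition"]["day"] == day_filter
        else:                          # evolved spec: monthly partitions
            match = f["partition"]["month"] == day_filter[:7]
        if match:
            kept.append(f["path"])
    return kept

print(prune(files, "2026-05-02"))   # both specs evaluated transparently
```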

Delta Lake vs Iceberg: Direct Comparison

Both formats solve the same core problems — ACID transactions, time travel, schema evolution — but make different architectural tradeoffs. The right choice depends on your existing infrastructure, query engine ecosystem, and operational complexity tolerance.

Transaction model

Delta Lake uses an append-only JSON log with optimistic concurrency. Conflicts are detected at commit time, and resolution is coarse-grained: file-level by default, row-level with deletion vectors in Delta 3.x. Iceberg uses snapshot isolation with optimistic concurrency, detecting conflicts at the manifest level. Both support serializable writes, but Iceberg's multi-level metadata enables finer-grained conflict detection at scale.

Partitioning

Delta Lake uses Hive-style physical partitioning — partition columns are stored as directory segments (year=2026/month=05/), visible to writers and readers, and cannot be changed without rewriting. Iceberg uses hidden partitioning with transform functions — computed at write time, invisible to readers, and evolvable without data rewrites.

Catalog and engine support

Delta Lake has first-class support in Databricks, Spark, and increasingly DuckDB and Polars (via delta-rs). Catalog support outside Databricks requires Unity Catalog or an open alternative. Iceberg has native support in Spark, Flink, Trino, Presto, Athena, BigQuery (via federation), DuckDB, Snowflake, and more. Iceberg has broader cross-engine adoption.

Metadata overhead and performance at scale

Delta Lake log scanning can become expensive at very high commit counts before checkpointing kicks in. Iceberg's manifest-level pruning allows planning to skip large portions of table history without reading individual file metadata, scaling better to tables with hundreds of thousands of data files and complex partition layouts.

Schema and column mapping

Both support adding columns, renaming columns, dropping columns, and changing nullability. Iceberg uses column IDs internally rather than column names — renaming a column does not require rewriting data files, and column statistics are preserved across renames. Delta Lake column mapping (introduced in Delta 2.0) achieves similar results by storing a physical name mapping in the log.
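The column-ID indirection can be modeled in a few lines — a toy illustration, not either format's actual metadata layout:

```python
# Data files reference columns by immutable field ID, never by name
data_file_columns = {1: ["evt-001", "evt-002"], 2: ["usr-42", "usr-17"]}

schema = {1: "event_id", 2: "user_id"}   # current id → name mapping

def read_column(name: str) -> list:
    """Resolve a name to its field ID, then fetch the stored values."""
    field_id = {v: k for k, v in schema.items()}[name]
    return data_file_columns[field_id]

schema[2] = "customer_id"                # rename: a metadata-only change
print(read_column("customer_id"))        # same stored bytes, new name
```

Because the data files only ever mention field ID 2, the rename touches one metadata entry and zero Parquet files.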

Time Travel in Production: Patterns and Pitfalls

Time travel — the ability to query a table at a historical point in time or version — is the feature that makes lakehouse tables genuinely different from file-based storage. It enables data debugging, incremental processing, regulatory audit trails, and ML reproducibility without maintaining separate snapshot tables.

The practical limit is retention. Both Delta Lake and Iceberg retain historical snapshots until you explicitly expire them. On high-volume tables, retaining all history indefinitely bloats storage. The right retention window depends on your use case: CDC consumers might need 7 days of incremental replay; ML training pipelines might need 90-day reproducibility; regulatory audits might require 7 years of point-in-time reads.

-- Delta Lake: time travel and history inspection (Spark SQL)

-- Query at specific version
SELECT COUNT(*) FROM delta.`s3://my-lakehouse/tables/events` VERSION AS OF 5;

-- Query at timestamp
SELECT * FROM delta.`s3://my-lakehouse/tables/events`
TIMESTAMP AS OF '2026-04-15T00:00:00';

-- Inspect full table history
DESCRIBE HISTORY delta.`s3://my-lakehouse/tables/events`;

-- VACUUM: remove files no longer referenced by any version
-- Default retention: 7 days (168 hours). Reduce carefully — breaks time travel before cutoff.
VACUUM delta.`s3://my-lakehouse/tables/events` RETAIN 168 HOURS;

-- OPTIMIZE: compact small files and optionally Z-ORDER by frequently queried columns
OPTIMIZE delta.`s3://my-lakehouse/tables/events`
ZORDER BY (user_id, event_type);


-- Iceberg: snapshot management (Spark SQL with Iceberg extensions)

-- List snapshots
SELECT * FROM glue_catalog.analytics.events.snapshots;

-- List history
SELECT * FROM glue_catalog.analytics.events.history;

-- Expire old snapshots (retain last 7 days)
CALL glue_catalog.system.expire_snapshots(
  table => 'analytics.events',
  older_than => TIMESTAMP '2026-04-25 00:00:00',
  retain_last => 5
);

-- Remove orphan files (data files not referenced by any snapshot)
CALL glue_catalog.system.remove_orphan_files(table => 'analytics.events');

-- Rewrite (compact) small files
CALL glue_catalog.system.rewrite_data_files(
  table => 'analytics.events',
  strategy => 'sort',
  sort_order => 'zorder(user_id, event_type)',
  options => map('target-file-size-bytes', '268435456')  -- 256 MB target
);

Note

Never run VACUUM on a Delta table with a retention period shorter than the longest-running streaming query reading from it. Structured Streaming with Delta records the latest committed offset in the checkpoint, but a vacuum with insufficient retention can remove files that the stream's checkpoint still references. The safe floor is keeping spark.databricks.delta.retentionDurationCheck.enabled = true and setting a retention period longer than your streaming job's maximum lag plus a couple of hours of buffer.
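Retention is configured per table via Delta table properties — a hedged example (the durations are illustrative; size them to your streaming lag and audit requirements):

```sql
-- Keep removed data files 7 days (the floor VACUUM respects, and the
-- window for timestamp-based time travel over deleted data) and log
-- entries 30 days (the floor for version-based time travel)
ALTER TABLE delta.`s3://my-lakehouse/tables/events` SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 7 days',
  'delta.logRetentionDuration'         = 'interval 30 days'
);
```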

Unity Catalog: Governance Across the Lakehouse

A lakehouse without governance is still a data swamp — just a well-organized one. Who can read the events table? Which columns contain PII? What is the lineage from raw events to the daily revenue summary? These questions require a catalog that does more than store table locations.

Databricks Unity Catalog is the governance layer for the Databricks lakehouse. It provides a three-level namespace (catalog → schema → table), fine-grained column-level access control, automated data lineage, centralized audit logging, and cross-cloud/cross-region table sharing via Delta Sharing. For teams not on Databricks, open alternatives include Open Unity Catalog (the recently open-sourced Unity Catalog OSS project) and DataHub with Iceberg catalog integration.

-- Unity Catalog: three-level namespace and fine-grained access control

-- Create catalog and schema
CREATE CATALOG IF NOT EXISTS prod_lakehouse
  COMMENT 'Production data lakehouse';

CREATE SCHEMA IF NOT EXISTS prod_lakehouse.analytics
  COMMENT 'Analytical datasets — finance-team-readable';

-- Register a Delta table in Unity Catalog
CREATE TABLE IF NOT EXISTS prod_lakehouse.analytics.daily_revenue
USING DELTA
LOCATION 's3://my-lakehouse/tables/daily-revenue'
COMMENT 'Daily aggregated revenue — source: events table via dbt'
TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'sensitivity' = 'confidential',
  'owner_team'  = 'finance-data'
);

-- Column-level access: tag PII columns and restrict access
ALTER TABLE prod_lakehouse.analytics.events
  ALTER COLUMN user_email SET TAGS ('pii' = 'true', 'gdpr_scope' = 'true');

-- Grant: only the analytics-readers group can read this table
GRANT SELECT ON TABLE prod_lakehouse.analytics.daily_revenue
  TO analytics_readers;

-- Mask the PII column for users outside the pii_authorized group.
-- Column masks are SQL UDFs that return the column value or a substitute;
-- (row filters, by contrast, return a BOOLEAN to include/exclude whole rows)
CREATE FUNCTION mask_user_email(user_email STRING)
RETURN CASE
  WHEN is_account_group_member('pii_authorized') THEN user_email
  ELSE NULL
END;

ALTER TABLE prod_lakehouse.analytics.events
  ALTER COLUMN user_email SET MASK mask_user_email;

-- Delta Sharing: share table read-only with an external recipient (no data copy)
CREATE RECIPIENT external_partner
  COMMENT 'External analytics partner — read-only revenue summary';

CREATE SHARE revenue_share
  COMMENT 'Shared revenue summary for partner';

ALTER SHARE revenue_share
  ADD TABLE prod_lakehouse.analytics.daily_revenue
  PARTITION (region = 'EU');   -- share only EU partition

GRANT SELECT ON SHARE revenue_share TO RECIPIENT external_partner;

delta-rs: Delta Lake Without Spark

Spark is the natural runtime for lakehouse workloads at scale, but it is heavyweight for microservices, lambda functions, and lightweight ETL scripts that need to read or write Delta tables. The delta-rs project is a Rust-native Delta Lake implementation with Python bindings. It can read and write Delta tables on S3, GCS, and ADLS with no JVM dependency. The Python API integrates with pandas and PyArrow, and increasingly with Polars for high-performance single-machine analytics.

# delta-rs: pip install deltalake
# No JVM, no Spark — reads/writes Delta tables via native Rust + PyArrow

import pyarrow as pa
import pyarrow.compute as pc
from deltalake import DeltaTable, write_deltalake

TABLE_URI = "s3://my-lakehouse/tables/events"

# S3 storage options — use IAM role in production, not static credentials
storage_options = {
    "AWS_REGION": "eu-central-1",
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",  # required for S3 (no atomic rename)
}

# Write a new Delta table (or append to existing) from a PyArrow table
schema = pa.schema([
    pa.field("event_id",   pa.string(),    nullable=False),
    pa.field("user_id",    pa.string(),    nullable=False),
    pa.field("event_type", pa.string()),
    pa.field("amount",     pa.float64()),
    pa.field("occurred_at", pa.timestamp("us", tz="UTC"), nullable=False),
])

batch = pa.table({
    "event_id":   ["evt-001", "evt-002", "evt-003"],
    "user_id":    ["usr-42",  "usr-17",  "usr-99"],
    "event_type": ["purchase","refund",  "purchase"],
    "amount":     [149.99,    -49.99,    299.00],
    "occurred_at": pa.array(
        ["2026-05-02T10:00:00Z", "2026-05-02T10:01:00Z", "2026-05-02T10:05:00Z"],
        type=pa.timestamp("us", tz="UTC"),
    ),
}, schema=schema)

write_deltalake(
    TABLE_URI,
    batch,
    mode="append",                        # or "overwrite", "error"
    partition_by=["event_type"],
    schema_mode="merge",                  # merge new schema if columns added
    storage_options=storage_options,
)

# Read with predicate pushdown — only reads relevant Parquet files
dt = DeltaTable(TABLE_URI, storage_options=storage_options)

# Filter pushed down to file selection via Delta log stats
df = dt.to_pyarrow_dataset().to_table(
    filter=pc.field("event_type") == "purchase"
)
print(f"Purchases: {len(df)}")

# Read as Polars LazyFrame — zero-copy, columnar, fast
import polars as pl
lf = pl.scan_delta(TABLE_URI, storage_options=storage_options)
result = (
    lf
    .filter(pl.col("amount") > 0)
    .group_by("event_type")
    .agg(pl.col("amount").sum().alias("total_revenue"))
    .collect()
)
print(result)

# Time travel
dt_v2 = DeltaTable(TABLE_URI, version=2, storage_options=storage_options)
df_v2 = dt_v2.to_pandas()

# Schema inspection
print(dt.schema().to_pyarrow())

# Table metadata and history
print(dt.metadata())
print(dt.history(5))

Note

The AWS_S3_ALLOW_UNSAFE_RENAME: true flag is required because S3 does not support atomic rename operations, which Delta Lake uses for log commits. delta-rs works around this with a conditional PUT + DELETE sequence that is safe in practice but not strictly atomic. In high-concurrency write scenarios with delta-rs, use a DynamoDB locking backend (AWS_S3_LOCKING_PROVIDER: dynamodb) to serialize commits and prevent write-write conflicts on S3.
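A sketch of the corresponding storage options — note that the lock-table key name has varied between delta-rs releases, so treat delta_log_lock and the exact key as placeholders to verify against your installed version:

```python
# delta-rs storage options for concurrent S3 writers — key names to verify
# against your delta-rs release (the lock-table key has changed over time)
storage_options = {
    "AWS_REGION": "eu-central-1",
    "AWS_S3_LOCKING_PROVIDER": "dynamodb",        # serialize log commits via DynamoDB
    "DELTA_DYNAMO_TABLE_NAME": "delta_log_lock",  # placeholder lock-table name
}
print(storage_options["AWS_S3_LOCKING_PROVIDER"])
```

With the locking provider configured, AWS_S3_ALLOW_UNSAFE_RENAME is no longer needed: commits are serialized through the DynamoDB table instead of racing on S3 PUTs.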

Choosing Between Delta Lake and Iceberg

The choice between Delta Lake and Iceberg is primarily an ecosystem decision, not a feature decision. Both formats are production-proven at hyperscale — Netflix runs Iceberg on tables with hundreds of petabytes; Databricks' customers run Delta Lake on similarly enormous volumes.

Choose Delta Lake if...

Your primary compute is Databricks or Spark on a single cloud. You want the tightest integration with Unity Catalog, Delta Live Tables, and Databricks Workflows. Your write patterns involve high-frequency small updates where deletion vectors give you row-level write efficiency without full file rewrites. Your team is already familiar with the Delta ecosystem.

Choose Apache Iceberg if...

You need true multi-engine access — Spark for ETL, Trino or Presto for ad-hoc SQL, Flink for streaming, Snowflake for sharing. You anticipate partition evolution as your table grows. You prefer a more open catalog ecosystem (Nessie, Polaris, AWS Glue, Hive Metastore). You operate across multiple clouds where format portability is a hard requirement.

Consider both if...

You are building a new lakehouse from scratch. The industry is converging on universal format readers — DuckDB, Polars, and Apache Arrow can read both formats natively. Starting with one format and later migrating via Delta-to-Iceberg conversion tools is feasible for tables under a few terabytes. For new greenfield lakehouses, Iceberg's broader engine support and partition evolution give it a slight architectural edge.

Production Readiness Checklist

Deploying a lakehouse table to production requires more than creating a Delta or Iceberg table and starting to write to it. The following practices prevent the most common failure modes: storage cost explosions from unmanaged file accumulation, query degradation from small file problems, broken time travel queries, and governance gaps that surface during audits.

Schedule regular compaction and snapshot expiration

Streaming writes and frequent small-batch appends produce many small Parquet files. Without scheduled compaction (OPTIMIZE/rewrite_data_files), query performance degrades as the planner opens thousands of tiny files. Run compaction on a daily schedule for high-ingestion tables, targeting 128–256 MB output files. Pair compaction with snapshot expiration — otherwise compaction creates new files but old ones remain referenced by retained snapshots.

Monitor write amplification and file counts per partition

A healthy partition has 1–20 Parquet files of 128–512 MB each. Partitions with thousands of small files indicate a write pattern mismatch — writes are too fine-grained for the partition granularity. Partitions with very few enormous files indicate over-wide partitions that can't benefit from predicate pushdown. Inspect file counts with Delta's DESCRIBE DETAIL or Iceberg's files metadata table (SELECT * FROM catalog.db.table.files) to detect both conditions before they impact query performance.
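A health check over per-partition file listings can be a few lines of Python — a sketch with thresholds matching the ranges above, operating on file sizes you would pull from the table's file metadata:

```python
def partition_health(files_by_partition: dict) -> dict:
    """Classify each partition by file count and average file size (bytes)."""
    MB = 1024 * 1024
    report = {}
    for partition, sizes in files_by_partition.items():
        avg = sum(sizes) / len(sizes)
        if len(sizes) > 20 and avg < 128 * MB:
            report[partition] = "small-file problem: schedule compaction"
        elif len(sizes) <= 2 and avg > 512 * MB:
            report[partition] = "over-wide: consider finer partitioning"
        else:
            report[partition] = "healthy"
    return report

report = partition_health({
    "day=2026-05-01": [256 * 1024 * 1024] * 8,   # 8 files of ~256 MB
    "day=2026-05-02": [2 * 1024 * 1024] * 500,   # 500 tiny files
})
print(report)
```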

Enforce schema on write, not just on read

Both Delta Lake and Iceberg support schema enforcement — reject writes that don't conform to the declared schema — and schema evolution — allow additive changes via explicit ALTER TABLE. Enable schema enforcement by default (mergeSchema = false in Delta, the default in Iceberg) and require explicit schema evolution operations. Producers that silently add or change columns should fail loudly at write time, not silently corrupt downstream consumers.

Test time travel before it is needed

Time travel is most valuable during incidents — recovering from a bad pipeline run, auditing a downstream anomaly, restoring a consumer to a known-good state. Validate that time travel works at your required retention depth in a staging environment before declaring it part of your SLA. Run a monthly test: query the table at a point 7 days ago and verify the row count matches your expectation. Silent VACUUM or expire_snapshots misconfigurations can silently break time travel before you need it.

Register tables in a catalog before exposing to consumers

A Delta or Iceberg table that only lives as a path on S3 has no discoverable owner, schema, SLA, or lineage. Register every production table in Unity Catalog, AWS Glue, or an equivalent before consumers depend on it. Catalog registration should be automated in CI/CD — not a manual step. Tags for data classification, PII fields, and retention policy must be set at registration time, not retroactively after a compliance finding.

Migrating from a data lake to a lakehouse or evaluating Delta Lake vs Iceberg?

We design and implement lakehouse architectures on AWS, GCP, and Azure — from table format selection and storage layout to Unity Catalog governance, streaming ingestion, and production query optimization. Let’s talk.

Get in Touch
