Ice‑keeper keeps Apache Iceberg tables performing well by identifying unhealthy partitions through metadata‑driven diagnostics and applying targeted rewrites that are fully visible in Superset dashboards.

At the Canadian Centre for Cyber Security (CCCS), we rely heavily on Iceberg tables to store, manage, and analyse high‑volume cybersecurity data. While Apache Iceberg provides the foundation for a structured data lake, achieving low-latency query performance and efficient metadata pruning requires proactive table management.

Our internal tool, Ice-keeper, automates essential maintenance tasks — including expiring snapshots, removing orphan files, and managing data lifecycles. However, its most critical role is optimizing physical data layout. This article focuses on how Ice‑keeper automates the rewrite_data_files process through a continuous feedback loop: diagnosing partitions, identifying fragmentation, and executing targeted rewrites.

By allowing table owners to choose between binpack, sort, or Z-order strategies based on their specific access patterns, we ensure our storage layer remains performant at scale. We will walk through how Ice‑keeper diagnoses partition health, applies these strategies, and uses Journal and Partition Health tables to provide operational visibility via Superset.

We're excited to share that Ice‑keeper is now open source and available for the community to explore and build on.

The Optimization Loop in One Picture

  1. Diagnose partitions by scanning file metadata, such as sizes, delete files, and lower and upper column bounds.
  2. Select partitions that fail health criteria.
  3. Rewrite those partitions using binpack, sort, or Z‑order.

Diagnostics are computed from Iceberg metadata, not full data scans. The per‑partition summary includes file size distribution, counts of data files and delete files, counts of files outside the target size band, and a correlation score for ordering. These are used to compute boolean flags such as should_binpack and should_sort that drive rewrites. Ice‑keeper is written in Python and leverages PyIceberg and Spark; the control logic remains lightweight, while Spark executes scalable Iceberg procedures.

Operationally, Ice‑keeper discovers tables through catalogs and schemas; it reads tblproperties to learn which maintenance actions are enabled and how they should run. Those settings are stored in a Maintenance Schedule table, which acts as the operational source of truth for what should run and how.

Superset Dashboards: Intent, Action, and Outcome

All operational data flows into three tables visualized in Superset:

  • Maintenance Schedule: per‑table configuration such as optimization strategy, target file size, and minimum or maximum age windows. This reflects tblproperties and stays in sync as cybersecurity tables are added, changed, or removed.
  • Journal: an append‑only log of every action executed, with start time, execution time, SQL invoked, status, number of files rewritten, and number of delete files removed. This makes actions auditable and measurable.
  • Partition Health: before‑and‑after state per partition, including file counts and sizes, delete file counts, and correlation score. This validates impact and explains why a rewrite happened.

Binpack: Control File‑Size Variance

Binpack optimizes table health by consolidating small, fragmented files into larger, uniform files. Ice-keeper identifies candidate partitions by detecting files whose sizes fall outside a defined threshold band centered on the target file size. Because this strategy ignores record ordering, it is the most resource-efficient way to reduce metadata overhead. Ice-keeper evaluates each partition based on the volume of files falling outside these size boundaries, triggering a rewrite once a specific unhealthy density is reached.

Example: Diagnostic SQL query generated by Ice-keeper

WITH scored AS (
  SELECT
    partition_key,
    COUNT(*) AS files,
    COUNT_IF(
      (file_size_in_bytes < 50331648 OR file_size_in_bytes > 120795955)
    ) AS num_files_targeted_for_rewrite,
    COUNT_IF(content > 0) AS n_delete_files
  FROM data_files
  GROUP BY partition_key
)
SELECT
  partition_key,
  files,
  num_files_targeted_for_rewrite,
  n_delete_files,
  (num_files_targeted_for_rewrite > 5 OR n_delete_files > 0) AS should_binpack
FROM scored;

How to interpret the results

  • The size thresholds represent the lower and upper bounds derived from the desired target file size.
  • num_files_targeted_for_rewrite counts files outside that band.
  • n_delete_files counts delete files, identified by content > 0 in the file metadata.
  • should_binpack is true when more than five files are out of band, or when any delete files exist.

These values populate the Partition Health table and corresponding Superset dashboards, allowing engineers to quickly locate partitions where binpack will meaningfully improve file layout.
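To make the numbers in the query concrete: assuming the band is derived from the target file size using Iceberg's default rewrite fractions (75% minimum, 180% maximum, which is our assumption rather than a confirmed Ice‑keeper setting), the 50331648 / 120795955 thresholds correspond to a 64 MiB target. A minimal Python sketch:

```python
# Sketch: derive the binpack size band from a target file size.
# The 75% / 180% fractions are assumed defaults, not confirmed settings.
MIN_FRACTION = 0.75
MAX_FRACTION = 1.80

def size_band(target_file_size_bytes: int) -> tuple[int, int]:
    """Return (lower, upper) byte thresholds around the target size."""
    return (int(target_file_size_bytes * MIN_FRACTION),
            int(target_file_size_bytes * MAX_FRACTION))

def should_binpack(file_sizes: list[int], n_delete_files: int,
                   target: int, max_out_of_band: int = 5) -> bool:
    """Mirror the SQL flag: rewrite when too many files fall outside
    the band, or when any delete files are present."""
    lo, hi = size_band(target)
    out_of_band = sum(1 for s in file_sizes if s < lo or s > hi)
    return out_of_band > max_out_of_band or n_delete_files > 0

print(size_band(64 * 1024 * 1024))  # (50331648, 120795955)
```

With a larger target, the same fractions simply yield a wider band; the out-of-band rule is unchanged.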

Example: Configure the catalog.telemetry.login table for binpack:

ALTER TABLE catalog.telemetry.login SET TBLPROPERTIES (
  'ice-keeper.should-optimize' = 'true',
  'ice-keeper.optimization-strategy' = 'binpack',
  'ice-keeper.optimization-target-file-size-bytes' = '268435456'
);

This configuration is reflected in the Maintenance Schedule table.

Sort Diagnostics: Rank Correlation as a Health Signal

Files can be near the target size, yet order can decay due to late data, backfills, or merges. Ice‑keeper detects drift by comparing, per partition, the rank ordering of each data file's lower and upper bounds for the sorted column, then computing a correlation between those two rank lists. Healthy partitions show correlation near 1.0; when the value drops below a threshold, the partition is selected for a sorted rewrite. The Spark corr function is used to evaluate the Pearson correlation coefficient.

Pearson correlation is computed as:

r = Σ((x − x̄)(y − ȳ)) / √(Σ(x − x̄)² · Σ(y − ȳ)²)

where x and y are the rank positions of a data file when ordered by lower and upper bounds, respectively.

Example: Diagnostic SQL query generated by Ice-keeper

WITH data_files_with_bounds AS (
  SELECT
    partition_key,
    file_id,
    readable_metrics.user_name.lower_bound AS lb,
    readable_metrics.user_name.upper_bound AS ub
  FROM iceberg_data_files_with_metrics
),
ranked AS (
  SELECT
    partition_key,
    file_id,
    ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY lb) AS rn1,
    ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY ub) AS rn2
  FROM data_files_with_bounds
)
SELECT
  partition_key,
  COUNT(*) AS files,
  corr(rn1, rn2) AS corr,
  (corr(rn1, rn2) < 0.97) AS should_sort
FROM ranked
GROUP BY partition_key
ORDER BY should_sort DESC, partition_key;

This query expresses the core diagnostic used to evaluate sorted layouts. If the ranking by lower bounds and the ranking by upper bounds diverge, the correlation decreases and the partition becomes a candidate for a sorted rewrite. rn1 and rn2 are the data file rankings by lower and upper bound respectively, and corr(rn1, rn2) computes the Pearson correlation coefficient between them.

Example: Configure the catalog.telemetry.login table for sorting:

ALTER TABLE catalog.telemetry.login SET TBLPROPERTIES (
  'ice-keeper.should-optimize' = 'true',
  'ice-keeper.optimization-strategy' = 'user_name ASC NULLS FIRST',
  'ice-keeper.optimization-target-file-size-bytes' = '268435456'
);

Z‑order Diagnostics: Multi‑Column Locality via Interleaved Bounds

When queries filter on multiple columns together, a single column sort does not preserve locality. Z‑order creates a single linear order by interleaving bits from multiple columns; therefore, rows that are close in multiple dimensions land near each other. Diagnostics reuse the rank correlation idea, but the bounds come from interleaved values. Because dimensions are blended, expected correlation is lower, so thresholds are softer and depend on file count.
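A toy illustration of the interleaving idea follows. This is a generic sketch of bit interleaving; the zorder2Tuple helper in the query below is Ice‑keeper's own function and is assumed to do something similar over column bounds.

```python
def interleave_bits(x: int, y: int, width: int = 32) -> int:
    """Interleave the low `width` bits of two unsigned ints into one
    Z-order key; x occupies even bit positions, y the odd ones."""
    z = 0
    for i in range(width):
        z |= ((x >> i) & 1) << (2 * i)        # bit i of x -> position 2i
        z |= ((y >> i) & 1) << (2 * i + 1)    # bit i of y -> position 2i+1
    return z

# Points close in both dimensions get nearby keys, while a point far
# away in either dimension gets a distant key:
print(interleave_bits(2, 2), interleave_bits(3, 3), interleave_bits(0, 255))
```

Sorting rows by such a key is what lets one linear order serve filters on either column, at the cost of weaker per-column ordering, which is why the correlation signal below is interpreted more loosely.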

Example: Diagnostic SQL query generated by Ice-keeper

WITH data_files_with_bounds AS (
  SELECT
    partition_key,
    zorder2Tuple(
      readable_metrics.src_ip.lower_bound, 
      readable_metrics.dst_ip.lower_bound) AS the_lower_bound,
    zorder2Tuple(
      readable_metrics.src_ip.upper_bound, 
      readable_metrics.dst_ip.upper_bound) AS the_upper_bound
  FROM iceberg_data_files_with_metrics
),
ranked AS (
  SELECT
    partition_key,
    ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY the_lower_bound) AS rn1,
    ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY the_upper_bound) AS rn2
  FROM data_files_with_bounds
)
SELECT
  partition_key,
  COUNT(*) AS files,
  corr(rn1, rn2) AS corr,
  CASE
    WHEN COUNT(*) < 40  THEN corr(rn1, rn2) < 0.80
    WHEN COUNT(*) < 100 THEN corr(rn1, rn2) < 0.85
    ELSE                     corr(rn1, rn2) < 0.88
  END AS should_zorder
FROM ranked
GROUP BY partition_key
ORDER BY should_zorder DESC, partition_key;

Unlike single column sort, where a fixed rule such as corr(rn1, rn2) < 0.97 is sufficient to identify drift, Z‑order uses a softer rule. Correlation is expected to be lower when multiple dimensions are interleaved, so Ice‑keeper evaluates Z‑order health with a file-count-aware CASE expression that adapts the threshold to the number of files in the partition. Small partitions tolerate lower correlation; large partitions require stronger agreement between the interleaved lower and upper bound rankings. This provides a more accurate signal of multi‑column locality.
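Restated as a small helper (the cutoffs are copied from the CASE expression in the query above; the function itself is just an illustration):

```python
def zorder_threshold(n_files: int) -> float:
    """Minimum acceptable rank correlation for a Z-ordered partition,
    relaxed for partitions with fewer files."""
    if n_files < 40:
        return 0.80
    if n_files < 100:
        return 0.85
    return 0.88

def should_zorder(n_files: int, corr: float) -> bool:
    """A partition is a rewrite candidate when its interleaved-bounds
    rank correlation falls below the file-count-aware threshold."""
    return corr < zorder_threshold(n_files)
```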

Example: Configure the catalog.telemetry.flow table for Z‑order strategy:

ALTER TABLE catalog.telemetry.flow SET TBLPROPERTIES (
  'ice-keeper.should-optimize' = 'true',
  'ice-keeper.optimization-strategy' = 'zorder(src_ip, dst_ip)',
  'ice-keeper.optimization-target-file-size-bytes' = '536870912',
  'ice-keeper.min-age-to-optimize' = '2',
  'ice-keeper.max-age-to-optimize' = '72'
);

This intent appears in Maintenance Schedule; diagnostics and rewrites show up in Partition Health and Journal.

Rewrite Mechanics: Turning Signals into Actions

When a partition is flagged by diagnostics, Ice‑keeper crafts a specific Iceberg rewrite procedure call and executes it with Spark. Rewrites are partition scoped, which reduces runtime, preserves ingestion parallelism, and makes retries and audits straightforward.

Example: Generated procedure for table catalog.telemetry.login

CALL catalog.system.rewrite_data_files(
  table => 'telemetry.login',
  where => "(
    ingestion_date >= timestamp('2026-02-15 11:00:00')
    and ingestion_date <  timestamp('2026-02-15 11:00:00') + interval 1 hour
  )",
  strategy => 'sort',
  sort_order => 'user_name ASC NULLS FIRST',
  options => map(
    'target-file-size-bytes', '268435456',
    'rewrite-all', 'true'
  )
);

Example: Generated procedure for table catalog.telemetry.flow

CALL catalog.system.rewrite_data_files(
  table => 'telemetry.flow',
  where => "(
    ingestion_date >= timestamp('2026-02-15 11:00:00')
    and ingestion_date <  timestamp('2026-02-15 11:00:00') + interval 1 hour
  )",
  strategy => 'sort',
  sort_order => 'zorder(src_ip, dst_ip)',
  options => map(
    'target-file-size-bytes', '536870912',
    'rewrite-all', 'true'
  )
);

Each execution is recorded in the Journal with the start time, status, execution time, SQL statement, and several additional attributes that describe the operation.

For time partitioned tables, Ice‑keeper evaluates rewrites within a configurable age window. Minimum and maximum age settings ensure the system skips the current ingestion partition and only targets partitions old enough to be stable. If a run is missed, the next run evaluates the same window and picks up any partitions that still fall within that range. This keeps maintenance predictable and prevents unnecessary work on very recent data.
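The windowing described above can be sketched as a pure function over timestamps. This is illustrative only, and it assumes the min/max age settings are expressed in hours, as the earlier flow-table configuration suggests.

```python
from datetime import datetime, timedelta

def eligible_window(now: datetime, min_age_hours: int, max_age_hours: int):
    """Inclusive window [now - max_age, now - min_age] of partition
    timestamps eligible for rewrite; anything newer is assumed to still
    be receiving data and is skipped."""
    return (now - timedelta(hours=max_age_hours),
            now - timedelta(hours=min_age_hours))

def is_eligible(partition_ts: datetime, now: datetime,
                min_age_hours: int, max_age_hours: int) -> bool:
    start, end = eligible_window(now, min_age_hours, max_age_hours)
    return start <= partition_ts <= end
```

Because eligibility depends only on the current time and the configured ages, a missed run needs no special handling: the next run evaluates the same sliding window.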

Future Work

A natural next step is to auto-tune the target file size based on recent partitions, using file size distributions, scan patterns, and rewrite efficiency to propose a per table or per partition target. This would reduce manual tuning and allow the system to adapt as workloads evolve.

Ice‑keeper already notifies table owners when the same maintenance task fails for three consecutive runs, ensuring issues do not linger. A related enhancement would be to surface usage surges when configuration changes cause too many partitions to become eligible at once. This would give teams advance warning before costs increase or maintenance backlogs form.

Conclusion

Iceberg delivers warehouse‑grade semantics on cloud object storage, but only when tables are kept healthy. Ice‑keeper provides the automation that makes maintenance reliable, repeatable, and observable. The diagnostic loop is grounded in metadata‑driven signals: file size distributions, delete file presence, and rank correlation from lower and upper bounds. This focuses rewrites where they matter most.

Operationally, Ice‑keeper makes every action auditable. The Maintenance Schedule defines intent; the Partition Health table explains why and whether work was needed; the Journal records exactly what ran. Visualized through Superset, you get a clear control plane to diagnose drift, confirm rewrites, and monitor long term table health.

The result is a practical, signal-driven system. Ice‑keeper turns Iceberg maintenance into a low‑touch background service, keeping the data layout consistent and performant.

If you'd like to dive deeper, Ice‑keeper is fully open source; feel free to explore the repo or get involved.