A feature platform is a critical component of machine learning (ML) systems: it provides feature registration, online feature serving, offline feature serving, and a feature store to manage and serve feature data efficiently. Using the real-time lakehouse stack of Apache Flink, Apache Paimon, and Apache Iceberg, together with high-performance query engines such as Apache Doris, StarRocks, or Presto, we can build a robust feature platform that supports both real-time and batch ML workloads.

Features of this setup:

  • Streaming-native with batch fallback
  • Unified compute path, multiple materializations
  • Fresh, fast features online with deep, consistent history offline
  • Zero vendor lock-in and high flexibility

Components

  • Feature Registration — A system to define, version, and manage feature definitions.
  • Feature Store — A centralized repository to store and manage feature data, supporting both online and offline use cases.
  • Online Feature Serving — Low-latency feature retrieval for real-time ML inference (e.g., recommendation systems).
  • Offline Feature Serving — High-throughput feature retrieval for training ML models or batch inference.

Feature Registration (Central Registry)

  • Metadata about features (name, data type, freshness, owner, tags, lineage)
  • Tracks what features are materialized where (Paimon, Iceberg, etc.)
  • Can be built as a lightweight service (metadata DB + REST API)
  • Or integrated with an open-source framework like Feast (Tecton is a managed, commercial alternative), backed by a metadata DB
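The "metadata DB + REST API" option can be sketched as a minimal in-process registry. This is an illustrative assumption, not Feast's or Tecton's actual API; the field names mirror the metadata listed above (name, type, freshness, owner, tags, materialization targets):

```python
from dataclasses import dataclass, field

# Hypothetical feature-registry sketch (names are illustrative, not a product API).
@dataclass
class FeatureDefinition:
    name: str
    dtype: str
    owner: str
    freshness_sla: str                                   # e.g. "5m" for near-real-time
    tags: list = field(default_factory=list)
    materialized_in: list = field(default_factory=list)  # e.g. ["paimon", "iceberg"]

class FeatureRegistry:
    def __init__(self):
        self._defs = {}

    def register(self, fd: FeatureDefinition, version: int = 1) -> str:
        key = f"{fd.name}:v{version}"
        self._defs[key] = fd
        return key

    def lookup(self, name: str, version: int = 1) -> FeatureDefinition:
        return self._defs[f"{name}:v{version}"]

registry = FeatureRegistry()
registry.register(FeatureDefinition(
    name="user_click_rate", dtype="float", owner="ranking-team",
    freshness_sla="5m", tags=["user"], materialized_in=["paimon", "iceberg"]))
print(registry.lookup("user_click_rate").materialized_in)  # ['paimon', 'iceberg']
```

In a real deployment the dict would be a database table and `register`/`lookup` would sit behind REST endpoints, with versioning enforcing immutable feature definitions.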

Feature Store

A centralized repository to store and manage feature data, split into an online store (Paimon) and an offline store (Iceberg) to cover both use cases.

Online Feature Store (Paimon)

Why Paimon? Paimon supports streaming writes and low-latency reads, making it ideal for online feature serving. It handles incremental updates and changelog-based processing, perfect for real-time feature updates.

  • Write feature vectors as Flink changelogs
  • Can integrate with Flink SQL or Table API

Supports:

  • Primary-key-based upserts
  • Time-versioned lookups
  • Low-latency serving via key-based retrieval
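The semantics of primary-key upserts with time-versioned lookups can be illustrated with a small in-memory model. This is a sketch of the behavior, not Paimon's actual storage engine:

```python
import bisect

# Illustrative model of primary-key upserts + time-versioned lookups (not Paimon internals).
class VersionedKVStore:
    def __init__(self):
        self._rows = {}  # key -> list of (ts, value), kept sorted by ts

    def upsert(self, key, value, ts):
        self._rows.setdefault(key, []).append((ts, value))
        self._rows[key].sort(key=lambda p: p[0])

    def get_latest(self, key):
        return self._rows[key][-1][1]

    def get_as_of(self, key, ts):
        # Latest version at or before ts (time-versioned lookup).
        versions = self._rows[key]
        ts_list = [t for t, _ in versions]
        i = bisect.bisect_right(ts_list, ts)
        return versions[i - 1][1] if i else None

store = VersionedKVStore()
store.upsert("user:42", {"click_rate": 0.10}, ts=100)
store.upsert("user:42", {"click_rate": 0.12}, ts=200)
print(store.get_latest("user:42"))      # {'click_rate': 0.12}
print(store.get_as_of("user:42", 150))  # {'click_rate': 0.1}
```

In the real system, Flink changelog writes play the role of `upsert`, and the serving layer performs the key-based `get_latest` reads.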

Offline Feature Store (Iceberg)

Why Iceberg? Iceberg's ACID transactions, schema evolution, and partitioning make it ideal for large-scale, historical feature data used in model training.

Store snapshots of features for:

  • Model training
  • Batch scoring
  • Backfills & re-computation

Supports:

  • Time-travel (for training/inference consistency)
  • Schema evolution
  • Large-scale batch analytics

Pipeline:

  • Flink or Spark aggregates Paimon data (e.g., daily or hourly aggregates) and writes to Iceberg.
  • Iceberg tables are partitioned by date or user_id for efficient querying.
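The aggregation step above can be mimicked in plain Python: roll raw events up into daily per-user aggregates and group the rows by date partition, the way an Iceberg writer would lay them out. This is a sketch of the dataflow with made-up data, not real Flink or Iceberg code:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Raw click events as they might arrive from a stream (illustrative data).
events = [
    {"user_id": "u1", "ts": 0,     "clicks": 3},
    {"user_id": "u1", "ts": 3600,  "clicks": 2},
    {"user_id": "u2", "ts": 90000, "clicks": 7},
]

def daily_aggregates(events):
    """Aggregate clicks per (date, user_id) -- the shape of a daily feature snapshot."""
    agg = defaultdict(int)
    for e in events:
        date = datetime.fromtimestamp(e["ts"], tz=timezone.utc).date().isoformat()
        agg[(date, e["user_id"])] += e["clicks"]
    # Group rows by date, mirroring Iceberg's date partitioning.
    partitions = defaultdict(list)
    for (date, user_id), clicks in agg.items():
        partitions[date].append({"user_id": user_id, "daily_clicks": clicks})
    return dict(partitions)

print(daily_aggregates(events))
```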

Online Feature Serving

For online feature serving, low-latency access is critical. Paimon serves as the primary store, with Redis as an optional in-memory cache for ultra-low-latency use cases (e.g., <10 ms).

Paimon for Online Serving

  • Paimon tables store precomputed features (e.g., user_click_rate, last_purchase_amount).
  • Flink continuously updates Paimon tables with fresh data from event streams.
  • A serving layer (e.g., a REST API or gRPC service) queries Paimon for features using key-value lookups.

Redis for Ultra-Low Latency (Optional)

  • For ultra-low-latency use cases, Redis can cache hot features in memory.
  • Flink writes to Redis for ephemeral features with a TTL, which are then flushed to Paimon for persistence.
  • Example: A bidding system retrieves user_bid_score from Redis.
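The Redis-with-TTL pattern can be sketched with a tiny in-process cache. A real deployment would use an actual Redis client (e.g. `SET` with an expiry), but the expiry logic is the same; the feature key below is illustrative:

```python
import time

# Sketch of a TTL cache for hot/ephemeral features (stand-in for Redis with expiry).
class TTLFeatureCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: a real system falls back to Paimon here
            return default
        return value

# Deterministic demo using a fake clock instead of wall time.
now = [0.0]
cache = TTLFeatureCache(clock=lambda: now[0])
cache.set("user_bid_score:u42", 0.87, ttl_seconds=30)
print(cache.get("user_bid_score:u42"))  # 0.87
now[0] = 31.0
print(cache.get("user_bid_score:u42"))  # None (expired)
```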

Implementation:

  • Deploy a Flink job to compute and update features in Paimon (and optionally Redis).
  • Use a lightweight API server (e.g., FastAPI, gRPC) to serve features from Paimon/Redis to ML models.
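The serving layer can be prototyped with the standard library alone; FastAPI or gRPC would replace this in production, and the Paimon lookup is faked here with an in-memory dict. The `/features/<entity_id>` path shape is an assumption for this sketch:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stand-in for key-value feature lookups against Paimon (illustrative data).
ONLINE_FEATURES = {
    "u42": {"user_click_rate": 0.12, "last_purchase_amount": 59.90},
}

class FeatureHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed path shape for this sketch: /features/<entity_id>
        _, _, entity_id = self.path.rpartition("/")
        features = ONLINE_FEATURES.get(entity_id)
        self.send_response(200 if features is not None else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(features or {}).encode())

    def log_message(self, *args):  # silence request logging for the demo
        pass

def start_server():
    # Port 0 lets the OS pick a free port.
    server = HTTPServer(("127.0.0.1", 0), FeatureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def fetch(port, entity_id):
    with urlopen(f"http://127.0.0.1:{port}/features/{entity_id}") as resp:
        return json.loads(resp.read())

server = start_server()
print(fetch(server.server_address[1], "u42"))
server.shutdown()
```

Swapping the dict for a Paimon key-based read (and adding Redis as a read-through cache) turns this into the serving path described above.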

Fast Offline Serving: Doris / StarRocks / Presto

  • Query feature tables via external table connectors (Iceberg/Paimon)
  • Vectorized OLAP for: Batch inference, Monitoring dashboards, Training set exploration
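In production the query below would run in Doris/StarRocks/Presto against an Iceberg external table; sqlite3 stands in here only to show the shape of a typical offline feature query (table and column names are illustrative):

```python
import sqlite3

# In-memory table playing the role of a date-partitioned Iceberg feature table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_features_daily (
        dt TEXT, user_id TEXT, daily_clicks INTEGER, click_rate REAL
    )
""")
conn.executemany(
    "INSERT INTO user_features_daily VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "u1", 5, 0.10),
        ("2024-01-01", "u2", 7, 0.20),
        ("2024-01-02", "u1", 3, 0.08),
    ],
)

# Typical offline query: training-set exploration over a date range.
rows = conn.execute("""
    SELECT user_id, SUM(daily_clicks) AS clicks, AVG(click_rate) AS avg_rate
    FROM user_features_daily
    WHERE dt BETWEEN '2024-01-01' AND '2024-01-02'
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(rows)
```

The date-range predicate is what partition pruning accelerates in the real stack: an engine reading Iceberg skips every partition outside the `dt` range.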

Materialization Paths

  • Online path: Flink computes features from event streams and upserts them into Paimon for key-based serving.
  • Offline path: Flink or Spark aggregates Paimon data and writes partitioned snapshots to Iceberg.
  • Analytics path: Doris/StarRocks/Presto query Paimon and Iceberg via external table connectors.

Feature Consistency Strategy

  • Unified compute path: the same Flink logic materializes features to both Paimon (online) and Iceberg (offline), reducing train/serve skew.
  • Point-in-time correctness: Iceberg time-travel lets training jobs read features as they existed at event time.
  • Freshness tracking: the central registry records where each feature is materialized and how fresh it is.
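Point-in-time correctness for training can be made concrete with a small as-of join: each training event is matched only to the latest feature value at or before its timestamp, which is what Iceberg time-travel enables at scale. A sketch with illustrative data:

```python
import bisect

# Feature history per user as (ts, value) pairs sorted by ts -- think Iceberg snapshots.
feature_history = {
    "u1": [(100, 0.10), (200, 0.15), (300, 0.20)],
}

def as_of(history, ts):
    """Latest feature value at or before ts; None if the feature did not exist yet."""
    ts_list = [t for t, _ in history]
    i = bisect.bisect_right(ts_list, ts)
    return history[i - 1][1] if i else None

# Training events must see only features available at event time (no leakage).
training_events = [("u1", 150), ("u1", 300), ("u1", 50)]
rows = [(uid, ts, as_of(feature_history[uid], ts)) for uid, ts in training_events]
print(rows)  # [('u1', 150, 0.1), ('u1', 300, 0.2), ('u1', 50, None)]
```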

Optional Enhancements

  • Feature Lineage UI → Show data flows and freshness across stores
  • Feature Testing Framework → Alert on drift/null spikes/value range anomalies
  • Monitoring Dashboard → Doris/StarRocks over Iceberg for model observability