Databricks released Lakewatch a few days ago, and this isn't just another SIEM launch. It feels like a fundamental shift in how security platforms are designed for the modern data and AI stack.

For years, security teams have been operating under a constraint that shouldn't exist in the first place.

The Core Problem with Traditional SIEMs

Legacy SIEM platforms were built in a different era — one where data volume was smaller, architectures were closed, and AI-driven threats didn't exist.

The biggest issue?

They price by ingestion volume: every byte you send in costs money.

That single design decision leads to cascading problems:

  • Teams sample logs instead of storing everything
  • Valuable telemetry gets deleted or ignored
  • Detection accuracy suffers because you're working with partial visibility

Meanwhile, attackers? They operate with full visibility and zero constraints.

This isn't just a cost issue — it's an architectural flaw.

Enter Lakewatch: Reimagining SIEM with Lakehouse Architecture & AI

What Databricks has done with Lakewatch is apply the same open lakehouse principles that disrupted data warehousing to security operations.

Instead of pushing your data into a proprietary SIEM:

  • Your security data lives in your lakehouse in open formats such as Delta Lake or Apache Iceberg
  • It sits alongside business and operational data
  • It's built on open security standards such as the Open Cybersecurity Schema Framework (OCSF)
  • It's governed via Unity Catalog

This changes everything:

  • No duplication
  • No vendor lock-in
  • No per-byte ingestion tax

You get full telemetry at scale, which is exactly what modern detection systems need.

Databricks Lakewatch: Reimagining SIEM for the Agentic Era

Why "Agentic SIEM" Actually Matters

Lakewatch isn't just about storage or cost — it's built for the agentic era of security.

With AI-powered workflows driven by Genie agents:

  • Threat detection becomes adaptive and contextual
  • Analysts can query data in natural language
  • Detection rules can be generated and improved automatically

Under the hood, Lakewatch uses models such as Anthropic's Claude, bringing strong reasoning capabilities to threat correlation and analysis.

This is a shift from:

Static dashboards → Intelligent, reasoning-driven security systems

What Problems Lakewatch Actually Solves

Let's break it down more concretely.

1. Eliminates Cost-Driven Blind Spots

You no longer have to choose between:

  • Cost
  • Coverage

You can retain and analyze everything.

2. Breaks Data Silos

Security data is no longer isolated:

  • Correlate security + product + infrastructure data
  • Enable cross-domain investigations

3. Democratizes Threat Hunting

With natural language interfaces:

  • Non-experts can participate in investigations
  • Querying petabytes of data becomes accessible

4. Enables AI-Native Security Operations

Instead of bolting AI on top:

  • AI is embedded into ingestion, detection, and response

Lakewatch Prerequisites

Before getting started, a few foundational components are required:

  • Unity Catalog
  • Serverless SQL warehouse
  • Serverless compute for jobs

Additional setup:

  • Installation must be performed by a Databricks account admin
  • OAuth configuration is required for:
      • Installation
      • Lakewatch Web UI access

How Lakewatch Works (Behind the Scenes)

This is where things get interesting — because Lakewatch is deeply aligned with Lakehouse architecture patterns.

1. Data Ingestion & Normalization

Lakewatch ingests security data from many sources:

  • Application logs
  • Infrastructure logs
  • Security telemetry

All data is normalized into the Open Cybersecurity Schema Framework (OCSF):

  • Standardized structure
  • Easier correlation across sources
  • Future-proof and open
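To make the "easier correlation" point concrete, here is a minimal Python sketch that maps two differently shaped authentication logs into one record shape. The shape is loosely modeled on OCSF's Authentication class (class_uid 3002) but flattened for brevity; real OCSF nests fields into objects such as `user` and `src_endpoint`. The sshd log line, the identity-provider JSON format, and all function names are illustrative assumptions, not Lakewatch's parsers.

```python
import re

# Illustrative regex for an OpenSSH-style auth message (assumed format).
SSHD_RE = re.compile(
    r"(?P<result>Accepted|Failed) password for (?P<user>\S+) from (?P<ip>\S+)"
)


def ocsf_auth(time, user, ip, success, product):
    """Build a simplified, flattened OCSF-style Authentication record."""
    return {
        "class_uid": 3002,                 # OCSF Authentication class
        "time": time,                      # epoch seconds in this sketch
        "user_name": user,                 # OCSF nests this as user.name
        "src_ip": ip,                      # OCSF nests this as src_endpoint.ip
        "status_id": 1 if success else 2,  # 1 = Success, 2 = Failure
        "source_product": product,
    }


def from_sshd(time, line):
    """Normalize one sshd log line; return None if it is not an auth message."""
    m = SSHD_RE.search(line)
    if m is None:
        return None
    return ocsf_auth(time, m["user"], m["ip"], m["result"] == "Accepted", "sshd")


def from_idp_json(event):
    """Normalize a hypothetical identity-provider event shaped like
    {"ts": ..., "actor": ..., "client_ip": ..., "outcome": "SUCCESS" | "FAILURE"}.
    """
    return ocsf_auth(
        event["ts"], event["actor"], event["client_ip"],
        event["outcome"] == "SUCCESS", "idp",
    )


records = [
    from_sshd(1000, "sshd[812]: Failed password for alice from 203.0.113.7 port 22 ssh2"),
    from_idp_json({"ts": 1005, "actor": "alice", "client_ip": "203.0.113.7", "outcome": "FAILURE"}),
]

# Both sources now share one shape, so a single pass can correlate them by IP.
failures_by_ip = {}
for r in records:
    if r["status_id"] == 2:
        failures_by_ip[r["src_ip"]] = failures_by_ip.get(r["src_ip"], 0) + 1
```

Without a shared schema, the same question ("how many failures came from this IP?") would need one query per source format.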

2. Medallion Architecture for Security Data

Lakewatch structures data using the familiar Bronze → Silver → Gold pattern:

  • Bronze → Raw ingested logs
  • Silver → Cleaned and enriched data
  • Gold → Normalized OCSF-compliant datasets

These are created as managed tables within Unity Catalog:

  • Organized into dedicated schemas
  • Governed centrally
  • Queryable via SQL
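A minimal sketch of the Bronze → Silver → Gold flow, written in plain Python rather than Spark so it runs anywhere. Lists of dicts stand in for what would be Unity Catalog managed tables, and the input is an assumed JSON login-event format. The role of each layer (keep raw, clean and enrich, then normalize) follows the description above; the specific cleaning and enrichment rules are illustrative choices, not Lakewatch's.

```python
import ipaddress
import json


def to_bronze(raw_lines, source, ingest_time):
    """Bronze: keep every raw record as-is, plus ingest metadata."""
    return [{"source": source, "ingest_time": ingest_time, "raw": line} for line in raw_lines]


def to_silver(bronze_rows):
    """Silver: parse, drop malformed rows, de-duplicate, and enrich."""
    seen, out = set(), []
    for row in bronze_rows:
        try:
            rec = json.loads(row["raw"])
            ip = ipaddress.ip_address(rec["client_ip"])
            key = (str(rec["ts"]), str(rec["actor"]), str(ip), str(rec["outcome"]))
        except (ValueError, KeyError, TypeError):
            continue  # malformed; a real pipeline might route it to a quarantine table
        if key in seen:
            continue  # exact duplicate of an earlier event
        seen.add(key)
        out.append({**rec, "src_is_private": ip.is_private})  # enrichment
    return out


def to_gold(silver_rows):
    """Gold: project onto a flattened, OCSF-inspired Authentication shape."""
    return [
        {
            "class_uid": 3002,  # OCSF Authentication class
            "time": r["ts"],
            "user_name": r["actor"],
            "src_ip": r["client_ip"],
            "status_id": 1 if r["outcome"] == "SUCCESS" else 2,  # 1 = Success, 2 = Failure
            "src_is_private": r["src_is_private"],
        }
        for r in silver_rows
    ]


raw = [
    '{"ts": 1000, "actor": "alice", "client_ip": "10.0.0.5", "outcome": "SUCCESS"}',
    '{"ts": 1000, "actor": "alice", "client_ip": "10.0.0.5", "outcome": "SUCCESS"}',  # duplicate
    '{"ts": 1001, "actor": "bob", "client_ip": "8.8.8.8", "outcome": "FAILURE"}',
    "not json at all",
    '{"ts": 1002, "actor": "carol", "client_ip": "999.1.1.1", "outcome": "SUCCESS"}',  # bad IP
]
bronze = to_bronze(raw, source="idp", ingest_time=2000)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze keeps all five lines, so nothing is lost if the parsing rules later change; Silver and Gold can always be rebuilt from it.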

3. Detection Engine

Detection rules are first-class citizens:

  • Written in SQL or PySpark
  • Used to identify suspicious behavior patterns

These rules:

  • Continuously scan incoming data
  • Trigger alerts based on defined conditions
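Since rules can be written in SQL, a detection rule can be pictured as a named query whose result rows are hits. The sketch below uses Python's built-in sqlite3 as a stand-in for a SQL warehouse. The flattened `authentication` table, the rule dict, the `run_rule` helper, and the threshold values are hypothetical; only the status_id convention (1 = Success, 2 = Failure) is taken from OCSF. A scheduler would re-run the rule over new data; here it runs once.

```python
import sqlite3

# Flattened, OCSF-inspired authentication table (hypothetical layout).
SCHEMA = """
CREATE TABLE authentication (
    time      INTEGER,  -- epoch seconds
    user_name TEXT,
    src_ip    TEXT,
    status_id INTEGER   -- OCSF convention: 1 = Success, 2 = Failure
)
"""

# A rule is a name plus a SQL query; every returned row is one hit.
EXCESSIVE_FAILED_LOGINS = {
    "name": "excessive_failed_logins",
    "sql": """
        SELECT user_name, src_ip, COUNT(*) AS failures,
               MIN(time) AS first_seen, MAX(time) AS last_seen
        FROM authentication
        WHERE status_id = 2 AND time >= :window_start
        GROUP BY user_name, src_ip
        HAVING COUNT(*) >= :threshold
    """,
}


def run_rule(conn, rule, **params):
    """Execute one rule and return its hits as dicts tagged with the rule name."""
    cur = conn.execute(rule["sql"], params)
    columns = [c[0] for c in cur.description]
    return [{"rule": rule["name"], **dict(zip(columns, row))} for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
events = [(1000 + i, "alice", "203.0.113.7", 2) for i in range(6)]  # 6 failures
events += [(1010, "alice", "203.0.113.7", 1), (1005, "bob", "198.51.100.2", 2)]
conn.executemany("INSERT INTO authentication VALUES (?, ?, ?, ?)", events)

hits = run_rule(conn, EXCESSIVE_FAILED_LOGINS, window_start=900, threshold=5)
```

With `threshold=5` only alice's burst is reported; lowering it to 1 also surfaces bob's single failure, which shows how a rule's parameters trade alert volume against sensitivity.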

4. Observability + Investigation Layer

Security teams can:

  • Query data directly using SQL
  • Apply filters and search patterns
  • Investigate across massive datasets in real time
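An investigation often starts from one observable and pivots outward. The sketch below, using sqlite3 as a stand-in for a SQL warehouse, pulls every authentication event from one IP in a time range and joins a hypothetical asset inventory table for infrastructure context, the kind of cross-domain correlation described earlier. Table names, column names, and data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authentication (time INTEGER, user_name TEXT, src_ip TEXT,
                             dst_host TEXT, status_id INTEGER);
CREATE TABLE assets (hostname TEXT PRIMARY KEY, owner_team TEXT, environment TEXT);
""")
conn.executemany("INSERT INTO authentication VALUES (?, ?, ?, ?, ?)", [
    (1000, "alice", "203.0.113.7", "web-01", 2),
    (1060, "alice", "203.0.113.7", "db-01", 1),
    (1200, "bob", "198.51.100.2", "web-01", 1),
    (1300, "svc", "203.0.113.7", "build-07", 1),  # host missing from the inventory
])
conn.executemany("INSERT INTO assets VALUES (?, ?, ?)", [
    ("web-01", "storefront", "prod"),
    ("db-01", "payments", "prod"),
])

# Pivot on one observable (an IP) over a time range and attach infrastructure
# context. LEFT JOIN keeps events that hit hosts absent from the inventory.
PIVOT_SQL = """
SELECT a.time, a.user_name, a.dst_host, a.status_id,
       COALESCE(s.owner_team, 'unknown')  AS owner_team,
       COALESCE(s.environment, 'unknown') AS environment
FROM authentication AS a
LEFT JOIN assets AS s ON s.hostname = a.dst_host
WHERE a.src_ip = :ip AND a.time BETWEEN :start AND :end
ORDER BY a.time
"""
timeline = conn.execute(PIVOT_SQL, {"ip": "203.0.113.7", "start": 0, "end": 2000}).fetchall()
```

The unknown `build-07` row is often the most interesting result of such a pivot: activity against a host nobody has claimed.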

5. AI-Powered Genie Workflows

This is the real differentiator.

Databricks Genie automates critical workflows:

  • Onboarding new log sources → automatic parsing into OCSF
  • Generating new detection rules from threat intelligence
  • Reducing false positives by refining rules
  • Translating natural language → SQL queries

With Genie Spaces, analysts can ask questions like:

"Show me suspicious login activity from new geolocations in the last 24 hours"

…and get results instantly.
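For intuition about what such a translation produces, here is one hand-written SQL interpretation of that prompt, run against sqlite3. It is not Genie's output. "New geolocation" is interpreted here as a country this user had never logged in from before the 24-hour window, and the `country` column assumes the logs were already enriched with geo-IP data (for example, in the Silver layer). The table layout and data are hypothetical.

```python
import sqlite3

# Hand-written interpretation (not Genie output): logins in the last 24 hours
# from a country the user had never logged in from before that window.
NEW_GEO_LOGINS_SQL = """
SELECT r.time, r.user_name, r.src_ip, r.country, r.status_id
FROM authentication AS r
WHERE r.time >= :now - 86400
  AND NOT EXISTS (
      SELECT 1 FROM authentication AS h
      WHERE h.user_name = r.user_name
        AND h.country = r.country
        AND h.time < :now - 86400
  )
ORDER BY r.time
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE authentication "
    "(time INTEGER, user_name TEXT, src_ip TEXT, country TEXT, status_id INTEGER)"
)
NOW = 1_000_000
conn.executemany("INSERT INTO authentication VALUES (?, ?, ?, ?, ?)", [
    (NOW - 10 * 86400, "alice", "198.51.100.10", "US", 1),  # baseline history
    (NOW - 3600,       "alice", "198.51.100.11", "US", 1),  # known country: ignored
    (NOW - 1800,       "alice", "203.0.113.50",  "BR", 1),  # new country: flagged
    (NOW - 5 * 86400,  "bob",   "192.0.2.20",    "DE", 1),  # outside the window
])
suspicious = conn.execute(NEW_GEO_LOGINS_SQL, {"now": NOW}).fetchall()
```

Under this interpretation, a user with no history before the window has every recent login flagged, so a production version would probably need a minimum-history condition. Ambiguities like this are why generated queries are worth reviewing.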

Key Concepts/Terms You Need to Know

If you're working with Lakewatch, a few terms are essential:

Observables

An observable is a piece of data that identifies an entity, such as:

  • IP address
  • Email
  • File hash
  • Username

Observables help:

  • Track behavior over time
  • Assign and adjust risk scores dynamically
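One way to picture "assign and adjust risk scores dynamically" is a per-observable score that rises each time the observable shows up in a detection and decays while it stays quiet. This is a hypothetical Python sketch: the `Observable` class, the `observe` helper, the one-week half-life, the weights, and the 100-point cap are illustrative assumptions, not Lakewatch's scoring model.

```python
from dataclasses import dataclass, field

HALF_LIFE_S = 7 * 24 * 3600  # assumed: risk halves after a week without new activity
MAX_RISK = 100.0             # assumed cap


@dataclass
class Observable:
    kind: str                     # e.g. "ip", "email", "file_hash", "username"
    value: str
    risk: float = 0.0
    last_update: float = 0.0      # epoch seconds of the last score change
    history: list = field(default_factory=list)

    def score_at(self, now):
        """Current risk after exponential decay since the last update."""
        elapsed = max(0.0, now - self.last_update)
        return self.risk * 0.5 ** (elapsed / HALF_LIFE_S)

    def record(self, now, weight, reason):
        """Decay the existing score to `now`, then add weight for a new sighting."""
        self.risk = min(MAX_RISK, self.score_at(now) + weight)
        self.last_update = now
        self.history.append((now, weight, reason))


registry = {}


def observe(kind, value, now, weight, reason):
    """Look up (or create) an observable by (kind, value) and record a sighting."""
    key = (kind, value)
    if key not in registry:
        registry[key] = Observable(kind, value)
    obs = registry[key]
    obs.record(now, weight, reason)
    return obs


ip = observe("ip", "203.0.113.7", now=0, weight=40, reason="excessive_failed_logins")
observe("ip", "203.0.113.7", now=0, weight=30, reason="login_from_new_country")
```

Keeping the `history` list alongside the score is what makes "track behavior over time" possible: the score says how worried to be, and the history says why.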

Notables

A Notable is generated when a detection rule is triggered.

Think of them as:

"Events that require attention"

Each Notable includes:

  • Summary of detected behavior
  • Timeline of activity
  • Associated observables
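The three parts of a Notable map naturally onto a small data structure. The Python sketch below turns the events matched by a rule into a summary, a time-ordered timeline, and a set of (kind, value) observables. The `Notable` class, the field-to-observable mapping, and the summary wording are hypothetical, not Lakewatch's schema.

```python
from dataclasses import dataclass

# Hypothetical mapping from event fields to observable kinds.
OBSERVABLE_FIELDS = {
    "src_ip": "ip",
    "user_name": "username",
    "email": "email",
    "file_hash": "file_hash",
}


@dataclass
class Notable:
    rule: str
    summary: str
    timeline: list       # matched events, oldest first
    observables: set     # {(kind, value), ...}


def build_notable(rule, events):
    """Assemble a Notable from the events a detection rule matched."""
    if not events:
        raise ValueError("a Notable needs at least one matched event")
    timeline = sorted(events, key=lambda e: e["time"])
    observables = {
        (kind, e[name])
        for e in timeline
        for name, kind in OBSERVABLE_FIELDS.items()
        if e.get(name)
    }
    summary = (
        f"{rule}: {len(timeline)} event(s) from {timeline[0]['time']} "
        f"to {timeline[-1]['time']} involving {len(observables)} observable(s)"
    )
    return Notable(rule, summary, timeline, observables)


notable = build_notable("excessive_failed_logins", [
    {"time": 1005, "user_name": "alice", "src_ip": "203.0.113.7"},
    {"time": 1000, "user_name": "alice", "src_ip": "203.0.113.7"},
])
```

Storing observables as a set of (kind, value) pairs means each Notable can be linked back to the same observable records that carry risk scores.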

Presets

A Preset is a specification that defines one or more ETL (Extract, Transform, Load) processes for transforming raw data into normalized datasets. Presets help you quickly create Datasources, which transform raw logs into output tables for downstream security analytics.

  • Lakewatch provides prebuilt presets for common sources of data. These presets are designed to help you create Datasources with minimal configuration.
  • You can also create, test and register custom presets for specific data sources.
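To illustrate the create, test, and register cycle, here is a hypothetical preset expressed as a Python dict: a regex that extracts fields from raw lines and a column mapping onto an output table. This is not Lakewatch's preset format. The dict keys, the `apply_preset` and `register_preset` helpers, and the output table name are all invented for this example.

```python
import re

# Hypothetical preset; not Lakewatch's actual preset format.
SSHD_PRESET = {
    "name": "sshd_auth",
    "output_table": "security.gold.authentication",  # hypothetical table name
    "pattern": r"(?P<result>Accepted|Failed) password for (?P<user>\S+) from (?P<ip>\S+)",
    "columns": {
        "user_name": {"from": "user"},
        "src_ip": {"from": "ip"},
        "status_id": {"from": "result", "lookup": {"Accepted": 1, "Failed": 2}},
    },
    "sample": "sshd[812]: Failed password for alice from 203.0.113.7 port 22 ssh2",
}


def apply_preset(preset, lines):
    """Extract: match each raw line. Transform: map regex groups to output columns.
    Load (writing rows to preset["output_table"]) is left out of this sketch."""
    pattern = re.compile(preset["pattern"])
    rows = []
    for line in lines:
        m = pattern.search(line)
        if m is None:
            continue  # line not handled by this preset
        row = {}
        for col, spec in preset["columns"].items():
            value = m[spec["from"]]
            row[col] = spec["lookup"][value] if "lookup" in spec else value
        rows.append(row)
    return rows


def register_preset(registry, preset):
    """Test the preset on its own sample line before making it available."""
    if not apply_preset(preset, [preset["sample"]]):
        raise ValueError(f"preset {preset['name']!r} does not parse its own sample")
    registry[preset["name"]] = preset


registry = {}
register_preset(registry, SSHD_PRESET)
rows = apply_preset(registry["sshd_auth"], [
    "sshd[901]: Accepted password for bob from 198.51.100.2 port 22 ssh2",
    "sshd[902]: Connection closed by 198.51.100.9",
])
```

Bundling a sample line with the preset gives registration a built-in smoke test, so a broken pattern is rejected before it silently drops real logs.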

Why This Matters Going Forward

We're entering a phase where:

  • Systems generate massive telemetry
  • Threats evolve faster than manual rule-writing
  • AI agents are both defenders and attackers

A SIEM that:

  • Stores all data
  • Is open and extensible
  • Uses AI as a core primitive

…is no longer optional — it's necessary.

Lakewatch feels like an early but important step in that direction.

Availability

Lakewatch is currently in private preview, with early customers including Adobe and Dropbox.

If you're interested in trying out Lakewatch, you'll need to reach out to your Databricks account team.

Final Thoughts

Lakewatch isn't just another feature in the Databricks ecosystem — it's a signal.

A signal that:

  • Security is becoming data-native
  • AI is becoming operational, not experimental
  • Open architectures are finally reaching the SOC

If this direction holds, the future of SIEM won't look like dashboards and alerts.

It will look like: Agents reasoning over your entire data universe in real time.

Next, I'm considering a hands-on guide that walks through Lakewatch installation, architecture setup, and a practical usage example in a SOC workflow. If that's something you'd find valuable, let me know in the comments and follow for updates.