First Look inside Databricks Lakewatch: A Modern, Open, Agentic SIEM for the Agentic Era


hitesh sahni

~5 min read · April 5, 2026 (Updated: April 5, 2026)

Databricks released Lakewatch a few days ago, and this isn't just another SIEM release. It feels like a fundamental shift in how security platforms are designed for the modern data + AI stack.

For years, security teams have been operating under a constraint that shouldn't exist in the first place.

The Core Problem with Traditional SIEMs

Legacy SIEM platforms were built in a different era — one where data volume was smaller, architectures were closed, and AI-driven threats didn't exist.

The biggest issue?

They charge per byte ingested.

That single design decision leads to cascading problems:

Teams sample logs instead of storing everything
Valuable telemetry gets deleted or ignored
Detection accuracy suffers because you're working with partial visibility

Meanwhile, attackers? They operate with full visibility and zero constraints.

This isn't just a cost issue — it's an architectural flaw.

Enter Lakewatch: Reimagining SIEM with Lakehouse Architecture & AI

What Databricks has done with Lakewatch is apply the same open lakehouse principles that disrupted data warehousing to security operations.

Instead of pushing your data into a proprietary SIEM:

Your security data lives in your lakehouse in open formats like Delta Lake or Apache Iceberg
It sits alongside business and operational data
It's built on open security standards (the Open Cybersecurity Schema Framework, OCSF)
It's governed via Unity Catalog

This changes everything:

No duplication
No vendor lock-in
No per-byte ingestion tax

You get full telemetry at scale, which is exactly what modern detection systems need.

Databricks Lakewatch: Reimagining SIEM for the Agentic Era

Why "Agentic SIEM" Actually Matters

Lakewatch isn't just about storage or cost — it's built for the agentic era of security.

With AI-powered workflows driven by Genie agents:

Threat detection becomes adaptive and contextual
Analysts can query data in natural language
Detection rules can be generated and improved automatically

Under the hood, Lakewatch leverages models like Anthropic's Claude — bringing strong reasoning capabilities into threat correlation and analysis.

This is a shift from:

Static dashboards → Intelligent, reasoning-driven security systems

What Problems Lakewatch Actually Solves

Let's break it down more concretely.

1. Eliminates Cost-Driven Blind Spots

You no longer have to choose between:

Cost
Coverage

You can retain and analyze everything.

2. Breaks Data Silos

Security data is no longer isolated:

Correlate security + product + infrastructure data
Enable cross-domain investigations

3. Democratizes Threat Hunting

With natural language interfaces:

Non-experts can participate in investigations
Querying petabytes of data becomes accessible

4. Enables AI-Native Security Operations

Instead of bolting AI on top:

AI is embedded into ingestion, detection, and response

Lakewatch Prerequisites

Before getting started, a few foundational components are required:

Unity Catalog
Serverless SQL Warehouse
Serverless Job Compute

Additional setup:

Lakewatch must be installed by a Databricks account admin
OAuth configuration is required for both installation and access to the Lakewatch web UI

How Lakewatch Works (Behind the Scenes)

This is where things get interesting — because Lakewatch is deeply aligned with Lakehouse architecture patterns.

1. Data Ingestion & Normalization

Lakewatch ingests multi-modal security data:

Application logs
Infrastructure logs
Security telemetry

All data is normalized into the Open Cybersecurity Schema Framework (OCSF):

Standardized structure
Easier correlation across sources
Future-proof and open
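To make the normalization step concrete, here is a minimal sketch in plain Python, not Lakewatch's actual parser. It maps two made-up sources (a VPN gateway and a web app) onto the same OCSF-style Authentication event. The source field names, the helper names, and the product names are all assumptions for illustration. The class and status constants follow my reading of the public OCSF schema, so verify them against the schema version you target.

```python
from datetime import datetime, timezone

# OCSF Authentication class constants, per my reading of the public schema.
# Verify them against the OCSF version your deployment targets.
CLASS_UID_AUTHENTICATION = 3002
CATEGORY_UID_IAM = 3
ACTIVITY_LOGON = 1
STATUS_SUCCESS, STATUS_FAILURE = 1, 2


def to_epoch_ms(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp to epoch milliseconds."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when unspecified
    return int(parsed.timestamp() * 1000)


def ocsf_logon(time_ms: int, user: str, src_ip: str, success: bool, product: str) -> dict:
    """Build a minimal (not schema-complete) OCSF-shaped logon event."""
    return {
        "class_uid": CLASS_UID_AUTHENTICATION,
        "category_uid": CATEGORY_UID_IAM,
        "activity_id": ACTIVITY_LOGON,
        "type_uid": CLASS_UID_AUTHENTICATION * 100 + ACTIVITY_LOGON,
        "time": time_ms,
        "status_id": STATUS_SUCCESS if success else STATUS_FAILURE,
        "user": {"name": user},
        "src_endpoint": {"ip": src_ip},
        "metadata": {"product": {"name": product}},
    }


def from_vpn(record: dict) -> dict:
    """Map a hypothetical VPN gateway log record."""
    return ocsf_logon(
        to_epoch_ms(record["ts"]), record["login"], record["client_ip"],
        record["result"] == "OK", "example-vpn",
    )


def from_webapp(record: dict) -> dict:
    """Map a hypothetical web application audit record."""
    return ocsf_logon(
        record["epoch_ms"], record["username"], record["remote_addr"],
        record["http_status"] == 200, "example-webapp",
    )


vpn_event = from_vpn(
    {"ts": "2026-04-01T10:00:00+00:00", "login": "alice",
     "client_ip": "203.0.113.7", "result": "FAIL"}
)
web_event = from_webapp(
    {"epoch_ms": 1775037600000, "username": "alice",
     "remote_addr": "203.0.113.7", "http_status": 200}
)

# Both sources now share one shape, so a single comparison works across them.
same_source_ip = vpn_event["src_endpoint"]["ip"] == web_event["src_endpoint"]["ip"]
```

Once both records share field names, the failed VPN logon and the successful web logon from the same address can be correlated without source-specific logic.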

2. Medallion Architecture for Security Data

Lakewatch structures data using the familiar Bronze → Silver → Gold pattern:

Bronze → Raw ingested logs
Silver → Cleaned and enriched data
Gold → Normalized OCSF-compliant datasets

These are created as managed tables within Unity Catalog:

Organized into dedicated schemas
Governed centrally
Queryable via SQL
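The three layers are easier to picture with a toy pipeline. The sketch below uses in-memory Python lists where Lakewatch would use managed tables in Unity Catalog. The function names, the sample log lines, and the simplified gold shape are my own assumptions, not Lakewatch's schema.

```python
import json

RAW_LINES = [
    '{"ts": 1775037600, "user": " Alice ", "ip": "203.0.113.7", "action": "login", "ok": false}',
    'not-json garbage line',
    '{"ts": 1775037660, "user": "bob", "ip": "198.51.100.4", "action": "login", "ok": true}',
]


def to_bronze(lines, source):
    """Bronze: keep every raw line untouched, plus where it came from."""
    return [{"raw": line, "source": source} for line in lines]


def to_silver(bronze_rows):
    """Silver: parse, skip unparseable rows, and clean obvious noise."""
    silver = []
    for row in bronze_rows:
        try:
            parsed = json.loads(row["raw"])
        except json.JSONDecodeError:
            continue  # the raw line still exists in bronze for later reprocessing
        parsed["user"] = parsed["user"].strip().lower()
        parsed["source"] = row["source"]
        silver.append(parsed)
    return silver


def to_gold(silver_rows):
    """Gold: project cleaned rows onto a simplified, OCSF-style shape."""
    return [
        {
            "time": row["ts"] * 1000,  # epoch seconds to epoch milliseconds
            "activity_name": row["action"],
            "status": "Success" if row["ok"] else "Failure",
            "user": {"name": row["user"]},
            "src_endpoint": {"ip": row["ip"]},
        }
        for row in silver_rows
    ]


bronze = to_bronze(RAW_LINES, source="auth-service")
silver = to_silver(bronze)
gold = to_gold(silver)
```

Here bronze keeps all three lines, including the malformed one, while silver and gold hold only the two that parsed. Keeping the raw copy is what lets you fix a parser later and replay history.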

3. Detection Engine

Detection rules are first-class citizens:

Written in SQL or PySpark
Used to identify suspicious behavior patterns

These rules:

Continuously scan incoming data
Trigger alerts based on defined conditions
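As an illustration of what a SQL detection rule can look like, here is a hedged sketch that runs against SQLite from Python's standard library rather than a Databricks SQL warehouse. The table name, columns, and threshold are assumptions of mine. Lakewatch's real rule format and scheduling are not covered in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_events (time INTEGER, user_name TEXT, src_ip TEXT, status TEXT)"
)

# Six failures for alice from one address within a minute, plus normal activity.
events = [(1775037600 + i * 10, "alice", "203.0.113.7", "Failure") for i in range(6)]
events += [
    (1775037605, "bob", "198.51.100.4", "Failure"),
    (1775037700, "bob", "198.51.100.4", "Success"),
]
conn.executemany("INSERT INTO auth_events VALUES (?, ?, ?, ?)", events)

# Hypothetical rule: 5 or more failed logons per user and source IP
# inside a single 5-minute (300 second) bucket.
BRUTE_FORCE_RULE = """
SELECT user_name,
       src_ip,
       (time / 300) * 300 AS window_start,
       COUNT(*)           AS failures
FROM auth_events
WHERE status = 'Failure'
GROUP BY user_name, src_ip, window_start
HAVING COUNT(*) >= 5
"""

alerts = conn.execute(BRUTE_FORCE_RULE).fetchall()
```

With this sample data the rule returns one row, for alice and 203.0.113.7. Fixed buckets are the simplest choice but can miss a burst that straddles a bucket boundary, so a production rule would more likely use a sliding window.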

4. Observability + Investigation Layer

Security teams can:

Query data directly using SQL
Apply filters and search patterns
Investigate across massive datasets in real time

5. AI-Powered Genie Workflows

This is the real differentiator.

Databricks Genie automates critical workflows:

Onboarding new log sources → auto parsing into OCSF
Generating new detection rules from threat intelligence
Reducing false positives by refining rules
Translating natural language → SQL queries

With Genie Spaces:

Analysts can ask:

"Show me suspicious login activity from new geolocations in the last 24 hours"

…and get results instantly.
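For a sense of what sits behind a question like that, here is a hand-written query of the kind it might translate to. This is not actual Genie output. It runs on SQLite from Python's standard library, and the `logins` table, its columns, and my reading of "new geolocation" (a country the user has no earlier login from) are all assumptions.

```python
import sqlite3

NOW = 1775124000  # fixed "current time" so the example is reproducible
DAY = 24 * 60 * 60

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (time INTEGER, user_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?)",
    [
        (NOW - 10 * DAY, "alice", "US"),
        (NOW - 3600, "alice", "US"),     # recent, but a known country
        (NOW - 1800, "alice", "BR"),     # recent and never seen before
        (NOW - 600, "bob", "DE"),        # first login ever, so no baseline to compare
        (NOW - 5 * DAY, "carol", "FR"),
        (NOW - 2 * DAY, "carol", "JP"),  # new country, but outside the 24-hour window
    ],
)

NEW_GEO_QUERY = """
SELECT e.user_name, e.country, e.time
FROM logins AS e
WHERE e.time >= :cutoff
  AND EXISTS (          -- the user has some earlier history
        SELECT 1 FROM logins AS p
        WHERE p.user_name = e.user_name AND p.time < e.time)
  AND NOT EXISTS (      -- but none of it from this country
        SELECT 1 FROM logins AS p
        WHERE p.user_name = e.user_name
          AND p.country = e.country
          AND p.time < e.time)
ORDER BY e.time
"""

suspicious = conn.execute(NEW_GEO_QUERY, {"cutoff": NOW - DAY}).fetchall()
```

Only alice's login from BR comes back. The example also shows why reviewing generated SQL still matters: whether a brand-new user's first login counts as "new geolocation" is a policy decision hidden inside the query.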

Key Concepts/Terms You Need to Know

If you're working with Lakewatch, a few terms are essential:

Observables

A piece of data that identifies an entity:

IP address
Email
File hash
Username

Observables help:

Track behavior over time
Assign and adjust risk scores dynamically
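One way to picture an observable with a dynamic risk score is a small record that accumulates sightings. The class below is a sketch under my own assumptions (field names, a 0 to 100 scale, additive scoring) and is not Lakewatch's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Observable:
    kind: str   # e.g. "ip", "email", "file_hash", "username"
    value: str
    risk_score: float = 0.0
    sightings: list = field(default_factory=list)  # (timestamp, note) pairs

    def record(self, timestamp: int, note: str, risk_delta: float) -> None:
        """Log a sighting and move the score, clamped to the range 0 to 100."""
        self.sightings.append((timestamp, note))
        self.risk_score = max(0.0, min(100.0, self.risk_score + risk_delta))


ip = Observable(kind="ip", value="203.0.113.7")
ip.record(1775037600, "5 failed logons in 5 minutes", risk_delta=40)
ip.record(1775037900, "listed on a threat intel feed", risk_delta=70)  # clamps at 100
ip.record(1775041500, "confirmed as corporate VPN egress", risk_delta=-60)
```

The score rises with each suspicious sighting and falls again when benign context arrives, while the sightings list preserves the history behind the number.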

Notables

Generated when detection rules are triggered.

Think of them as:

"Events that require attention"

Each Notable includes:

Summary of detected behavior
Timeline of activity
Associated observables
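The three parts of a Notable map naturally onto a small data structure. The following sketch folds the events matched by one rule into a single record. The `Notable` fields, the `build_notable` helper, and the event keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notable:
    rule_name: str
    summary: str
    timeline: tuple     # (timestamp, description) pairs, oldest first
    observables: tuple  # (kind, value) pairs


def build_notable(rule_name, matched_events):
    """Fold the events matched by one rule into a single Notable."""
    ordered = sorted(matched_events, key=lambda e: e["time"])
    users = sorted({e["user"] for e in ordered})
    ips = sorted({e["src_ip"] for e in ordered})
    return Notable(
        rule_name=rule_name,
        summary=f"{len(ordered)} events matched '{rule_name}' for {', '.join(users)}",
        timeline=tuple((e["time"], e["action"]) for e in ordered),
        observables=tuple([("username", u) for u in users] + [("ip", i) for i in ips]),
    )


notable = build_notable(
    "brute_force_logon",
    [
        {"time": 1775037620, "user": "alice", "src_ip": "203.0.113.7", "action": "logon failed"},
        {"time": 1775037600, "user": "alice", "src_ip": "203.0.113.7", "action": "logon failed"},
        {"time": 1775037640, "user": "alice", "src_ip": "203.0.113.7", "action": "logon succeeded"},
    ],
)
```

The analyst gets one item to triage instead of three raw events, with the timeline already sorted and the username and IP pulled out for pivoting.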

Presets

A Preset is a specification that defines one or more ETL (Extract, Transform, Load) processes for transforming raw data into normalized datasets. Presets help you quickly create Datasources, which transform raw logs into output tables for downstream security analytics.

Lakewatch provides prebuilt presets for common sources of data. These presets are designed to help you create Datasources with minimal configuration.
You can also create, test and register custom presets for specific data sources.
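I can't show the real preset format here, so treat the following as a purely hypothetical sketch of the idea: a declarative spec that says how to extract fields from raw lines and where the rows should land. The dictionary keys, the `apply_preset` helper, and the sample sshd lines are all invented for illustration.

```python
import re

# Hypothetical preset: a declarative description of how to turn raw lines
# from one source into rows of a normalized output table.
SSHD_PRESET = {
    "name": "example_sshd",
    "pattern": r"(?P<status>Accepted|Failed) password for (?P<user>\S+) from (?P<ip>\S+)",
    "output_table": "authentication",
    "columns": {"user_name": "user", "src_ip": "ip", "status": "status"},
    "value_maps": {"status": {"Accepted": "Success", "Failed": "Failure"}},
}


def apply_preset(preset, raw_lines):
    """Run the preset's extract and transform steps over raw log lines."""
    pattern = re.compile(preset["pattern"])
    output = []
    for line in raw_lines:
        match = pattern.search(line)
        if match is None:
            continue  # line is not covered by this preset
        row = {}
        for column, group in preset["columns"].items():
            value = match.group(group)
            # Translate source-specific values where a mapping is defined.
            row[column] = preset["value_maps"].get(column, {}).get(value, value)
        output.append(row)
    return preset["output_table"], output


table, parsed_rows = apply_preset(
    SSHD_PRESET,
    [
        "Apr  1 10:00:00 host sshd[812]: Failed password for alice from 203.0.113.7 port 52144 ssh2",
        "Apr  1 10:00:05 host sshd[812]: Connection closed by 203.0.113.7 port 52144",
        "Apr  1 10:01:00 host sshd[901]: Accepted password for bob from 198.51.100.4 port 40022 ssh2",
    ],
)
```

The point of keeping the spec as data is that onboarding a new source means writing a new dictionary, not new pipeline code.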

Why This Matters Going Forward

We're entering a phase where:

Systems generate massive telemetry
Threats evolve faster than manual rule-writing
AI agents are both defenders and attackers

A SIEM that:

Stores all data
Is open and extensible
Uses AI as a core primitive

…is no longer optional — it's necessary.

Lakewatch feels like an early but important step in that direction.

Availability

Lakewatch is currently in private preview, with early customers including Adobe and Dropbox.

If you're interested in trying out Lakewatch, you'll need to reach out to your Databricks account team.

Final Thoughts

Lakewatch isn't just another feature in the Databricks ecosystem — it's a signal.

A signal that:

Security is becoming data-native
AI is becoming operational, not experimental
Open architectures are finally reaching the SOC

If this direction holds, the future of SIEM won't look like dashboards and alerts.

It will look like: Agents reasoning over your entire data universe in real time.

Next, I'm considering a hands-on guide that walks through Lakewatch installation, architecture setup, and a practical usage example in a SOC workflow. If that's something you'd find valuable, let me know in the comments and follow for updates.

#security #databricks #ai #ai-agent #cybersecurity