When you're building your first Databricks Asset Bundle (DAB), the temptation is strong: copy-paste from your existing jobs, hardcode the IDs, and call it done. But this approach creates brittle, environment-locked configurations that break the moment anything changes.

Let's explore the evolution from hardcoded IDs to proper resource references, and why it matters for maintainable DAB projects.

If you are not yet a member of Medium, you can access the extended version on the SunnyData blog for free.


The Problem: hardcoded resource IDs

Here's what most teams start with when converting Databricks jobs to YAML:

- task_key: sql_query
  sql_task:
    file:
      path: ../sql/create_demo_table.dbquery.ipynb
    warehouse_id: 8648cb3bc8783615

This works, until it doesn't. The issues with hardcoded IDs include:

  • Environment lock-in: you can't promote this code to staging or production without manual modifications
  • Fragility: if the warehouse gets recreated or its ID changes, your job breaks
  • Maintenance overhead: every resource reference needs manual tracking and updating
  • Poor collaboration: team members can't easily tell which resources are being used

The same problem affects job_id, pipeline_id, compute_id, and virtually every other resource reference in your bundles.
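The same kind of hardcoding tends to creep into those fields as well. As a hypothetical illustration (the IDs below are made up), a task list might look like this:

```yaml
tasks:
  - task_key: trigger_pipeline
    pipeline_task:
      # hardcoded pipeline ID, valid only in one workspace
      pipeline_id: 11111111-2222-3333-4444-555555555555
  - task_key: trigger_job
    run_job_task:
      # hardcoded job ID, breaks if the job is recreated
      job_id: 123456789012345
```

Every one of these IDs ties the bundle to a single workspace.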

Solution #1: Lookups (The Stepping Stone)

The first improvement most teams make is to look up the resource by name:

variables:
  warehouse_id:
    lookup:
      warehouse: 'SQL Warehouse'

Then reference it as a variable:

warehouse_id: ${var.warehouse_id}

This is better than hardcoding, but it introduces new problems:

Why lookups fall short

1. Timing Issues

Lookups search for already-deployed resources, not resources defined in your bundle. During deployment, resources are created sequentially. If your job references a warehouse that hasn't been deployed yet, the lookup fails — even though the warehouse is defined in your bundle.

Consider this warehouse definition:

resources:
  sql_warehouses:
    small_warehouse:
      name: "SQL Warehouse"
      cluster_size: "2X-Small"
      auto_stop_mins: 1
      enable_serverless_compute: true

When you deploy the bundle, the lookup fails with an error saying that no warehouse named "SQL Warehouse" can be found.

The warehouse will exist, just not yet when the lookup runs.

2. Weak Dependencies

Lookups rely on string matching against resource names. If someone renames "SQL Warehouse" to "SQL Warehouse — Small" in the UI, your deployment breaks without warning. There's no true dependency relationship — just a fragile name-based search.

3. Duplicate Name Conflicts

If multiple warehouses share the same name within a workspace, the lookup is ambiguous, and there is no guarantee which resource it resolves to.

Solution #2: Resource References (The Right Way)

The robust solution is creating explicit references between bundle resources using the resource reference syntax.

How Resource References Work

Given this warehouse definition:

resources:
  sql_warehouses:
    small_warehouse:

You reference it using the bundle resource key (small_warehouse), not the display name:

warehouse_id: ${resources.sql_warehouses.small_warehouse.id}

Why This Approach Wins

1. Strong Dependencies

Databricks understands the relationship between your job and the warehouse. It automatically handles deployment ordering, ensuring the warehouse exists before creating dependent resources.

2. Refactoring Safety

Rename the warehouse display name? No problem. The reference uses the stable bundle key (small_warehouse), not the display name. Your deployments continue working.

3. Type Safety

If you mistype a resource reference, databricks bundle validate flags it immediately, rather than the problem surfacing mid-deployment.

4. Cross-Environment Portability

Resource references work identically across dev, staging, and production. The same YAML deploys everywhere without modification.
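A minimal sketch of a targets section (target names and hosts are illustrative) shows why: the reference resolves against whichever copy of the warehouse is deployed in the active target:

```yaml
targets:
  dev:
    mode: development
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
# In every target, ${resources.sql_warehouses.small_warehouse.id}
# resolves to that target's own deployment of the warehouse.
```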

Complete example: job with resource references

Here's a practical example showing multiple resource references:

resources:
  sql_warehouses:
    analytics_warehouse:
      name: "Analytics SQL Warehouse"
      cluster_size: "Medium"
      auto_stop_mins: 10
      enable_serverless_compute: true
  pipelines:
    data_ingestion_pipeline:
      name: "Data Ingestion Pipeline"
      target: "production"
      # ... pipeline configuration
  jobs:
    daily_analytics_job:
      name: "Daily Analytics Job"
      
      tasks:
        - task_key: run_ingestion
          pipeline_task:
            pipeline_id: ${resources.pipelines.data_ingestion_pipeline.id}
        
        - task_key: analytics_query
          depends_on:
            - task_key: run_ingestion
          sql_task:
            file:
              path: ../sql/daily_analytics.sql
            warehouse_id: ${resources.sql_warehouses.analytics_warehouse.id}
      
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: "UTC"

Notice how the job references both the pipeline and the warehouse using resource references. Databricks automatically understands the dependencies and deployment order.

Available resource types for references

Your DAB bundles can manage and reference these resource types:

Compute Resources:

  • clusters - Interactive and job clusters
  • sql_warehouses - SQL warehouses and endpoints

Data Pipeline Resources:

  • pipelines - Delta Live Tables pipelines
  • quality_monitors - Data quality monitoring

Data Assets:

  • catalogs - Unity Catalog catalogs
  • schemas - Database schemas
  • volumes - Unity Catalog volumes
  • tables - Managed and external tables

ML & Analytics:

  • experiments - MLflow experiments
  • models - MLflow models
  • registered_models - Model Registry entries
  • model_serving_endpoints - Model serving infrastructure

Jobs & Apps:

  • jobs - Workflow jobs
  • apps - Databricks Apps

Other:

  • dashboards - Lakeview dashboards
  • secret_scopes - Secret management
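The same ${resources.<type>.<key>.id} interpolation works across all of these types. For instance, a job task can trigger another bundle-managed job (reusing the daily_analytics_job key from the example above):

```yaml
tasks:
  - task_key: trigger_downstream
    run_job_task:
      # resolves to the deployed job's ID in the current target
      job_id: ${resources.jobs.daily_analytics_job.id}
```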

Best Practices for Resource References

  1. Always use resource references for bundle-managed resources instead of lookups
  2. Reserve lookups for external resources not managed by your bundle (e.g., shared warehouses managed by another team)
  3. Use descriptive bundle keys that clearly indicate the resource purpose (e.g., analytics_warehouse not warehouse_1)
  4. Organize resources in separate YAML files for better maintainability (e.g., warehouses.yml, jobs.yml, pipelines.yml)
  5. Document dependencies in comments for complex resource relationships
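Practice #4 pairs naturally with the include mapping in databricks.yml, which pulls resource definitions in from separate files (file names here are illustrative):

```yaml
bundle:
  name: my_bundle

include:
  - resources/warehouses.yml
  - resources/jobs.yml
  - resources/pipelines.yml
```

Resource references work across included files, so jobs.yml can freely reference keys defined in warehouses.yml.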

Migration Strategy: From Lookups to References

If you're currently using lookups, here's how to migrate:

Before (using lookup):

variables:
  warehouse_id:
    lookup:
      warehouse: 'SQL Warehouse'
resources:
  jobs:
    my_job:
      tasks:
        - warehouse_id: ${var.warehouse_id}

After (using reference):

resources:
  sql_warehouses:
    main_warehouse:
      name: "SQL Warehouse"
      # ... configuration
  jobs:
    my_job:
      tasks:
        - warehouse_id: ${resources.sql_warehouses.main_warehouse.id}

The migration is straightforward: define the resource in your bundle, then reference it by its bundle key.
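To inventory the lookups you still need to migrate, a small helper can scan your bundle YAML for lookup-defined variables. This is a rough sketch using only the Python standard library; it matches on indentation-sensitive text rather than doing a full YAML parse, so treat its output as a starting point:

```python
import re
from pathlib import Path

# Matches a two-space-indented variable name whose block starts with "lookup:",
# e.g.:
#   warehouse_id:
#     lookup:
LOOKUP_PATTERN = re.compile(r"^\s{2}(\w+):\s*\n\s+lookup:", re.MULTILINE)

def find_lookup_variables(yaml_text: str) -> list[str]:
    """Return names of variables that are defined via a lookup block."""
    return LOOKUP_PATTERN.findall(yaml_text)

def scan_bundle(root: str) -> dict[str, list[str]]:
    """Report lookup-defined variables for every .yml file under root."""
    report: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.yml")):
        hits = find_lookup_variables(path.read_text())
        if hits:
            report[str(path)] = hits
    return report
```

Each reported variable is a candidate for replacement with a resources definition plus a ${resources...} reference.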

Conclusion

Moving from hardcoded IDs to lookups is progress, but resource references are the mature solution for DAB projects. They provide strong dependencies, deployment ordering, refactoring safety, and true environment portability.

By referencing resources correctly within your bundles, you create configurations that are maintainable, reliable, portable across environments, and easy for teams to understand and modify.

Hubert Dudek (author)

If you like this blog post, consider buying me a coffee :-) https://ko-fi.com/hubertdudek