What are Lakeflow Jobs?
Lakeflow Jobs (formerly Databricks Workflows/Jobs) is the native orchestration service in Databricks. It allows you to define multi-task pipelines as directed acyclic graphs (DAGs) and schedule them to run automatically. Each job consists of one or more tasks with defined dependencies between them.
Task Types
Common task types in a Lakeflow Job:
- Notebook task: Run a Databricks notebook
- SQL query task: Execute a SQL statement
- Pipeline task: Run a Lakeflow Spark Declarative Pipelines pipeline (formerly Delta Live Tables)
- Dashboard task: Refresh a SQL dashboard

Tasks are connected as a DAG: each task declares which tasks it depends on. By default, a task runs only after all of its dependencies complete successfully.
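As a rough sketch, the task types and dependencies above could be written as a job payload. The field names (`task_key`, `depends_on`, `notebook_task`, `sql_task`, `pipeline_task`) are modeled on the Databricks Jobs API 2.1 as an assumption, every path and ID is a placeholder, and the two helper functions are hypothetical. Check the payload against the current API reference before relying on it.

```python
# Sketch of a job payload. Field names are assumed to follow the Jobs API 2.1;
# all paths and IDs are placeholders, not real resources.
job_settings = {
    "name": "example_job",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/Shared/ingest"},
        },
        {
            "task_key": "clean",
            "depends_on": [{"task_key": "ingest"}],
            "sql_task": {
                "query": {"query_id": "<query-id>"},
                "warehouse_id": "<warehouse-id>",
            },
        },
        {
            "task_key": "pipeline",
            "depends_on": [{"task_key": "clean"}],
            "pipeline_task": {"pipeline_id": "<pipeline-id>"},
        },
        {
            "task_key": "refresh_dashboard",
            "depends_on": [{"task_key": "pipeline"}],
            "sql_task": {
                "dashboard": {"dashboard_id": "<dashboard-id>"},
                "warehouse_id": "<warehouse-id>",
            },
        },
    ],
}


def upstream_of(settings: dict) -> dict[str, list[str]]:
    """Map each task_key to the task_keys it depends on (hypothetical helper)."""
    return {
        task["task_key"]: [dep["task_key"] for dep in task.get("depends_on", [])]
        for task in settings["tasks"]
    }


def undefined_dependencies(settings: dict) -> set[str]:
    """Return dependency names that do not match any task in the job."""
    graph = upstream_of(settings)
    referenced = {dep for deps in graph.values() for dep in deps}
    return referenced - set(graph)
```

Running `undefined_dependencies(job_settings)` before submitting a job is a cheap way to catch a misspelled `task_key` in a `depends_on` entry.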
Trigger Types
Three ways to trigger a Lakeflow Job:
1. Scheduled (cron): Run at fixed intervals (hourly, daily, etc.)
2. File arrival: Trigger when new files appear in a cloud storage location
3. Table update: Trigger when a Delta table is updated

Choosing the right trigger:
- Use scheduled triggers for regular, time-based processing
- Use file arrival triggers for event-driven ingestion
- Use table update triggers for dependency chains between pipelines
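The three trigger types can be sketched as small settings fragments. The field names (`schedule.quartz_cron_expression`, `trigger.file_arrival.url`, `trigger.table_update.table_names`) are assumed to match the Jobs API 2.1, the builder functions are hypothetical, and the storage URL and table name are placeholders.

```python
# Sketch: one settings fragment per trigger type.
# Field names are assumed to follow the Jobs API 2.1; values are placeholders.

def scheduled(cron: str, timezone: str = "UTC") -> dict:
    """Time-based trigger using a Quartz cron expression."""
    return {"schedule": {"quartz_cron_expression": cron, "timezone_id": timezone}}


def file_arrival(url: str) -> dict:
    """Event-driven trigger that fires when new files land under `url`."""
    return {"trigger": {"file_arrival": {"url": url}}}


def table_update(table_names: list[str]) -> dict:
    """Trigger that fires when one of the listed tables is updated."""
    return {"trigger": {"table_update": {"table_names": table_names}}}


# Quartz fields: seconds, minutes, hours, day-of-month, month, day-of-week.
daily_at_six = scheduled("0 0 6 * * ?")
on_new_files = file_arrival("s3://<bucket>/landing/")
on_silver_update = table_update(["<catalog>.<schema>.silver_orders"])
```

Each fragment would be merged into the job settings alongside the task list. Note that Quartz cron expressions start with a seconds field, unlike standard five-field Unix cron.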
Control Flow
Lakeflow Jobs support control flow patterns:
- Retries: Automatically retry failed tasks (configurable max retries and timeout)
- Conditional tasks: Branch execution based on task results (if/else)
- For-each loops: Iterate over a list of values, running a task for each

These patterns are essential for building robust production pipelines that handle failures gracefully.
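A minimal sketch of how the three control flow patterns might look as task definitions. The field names (`max_retries`, `min_retry_interval_millis`, `retry_on_timeout`, `timeout_seconds`, `condition_task`, `for_each_task`) and the `{{...}}` value references are assumed to follow the Jobs API 2.1. The task names, notebook paths, and the `row_count` task value are made up for illustration.

```python
import json

# Retries: retry a failed task up to 3 times, waiting 60 s between attempts.
retrying_task = {
    "task_key": "ingest",
    "notebook_task": {"notebook_path": "/Workspace/Shared/ingest"},
    "max_retries": 3,
    "min_retry_interval_millis": 60_000,
    "retry_on_timeout": True,
    "timeout_seconds": 1800,
}

# Conditional task: evaluates to "true" or "false" once `ingest` has finished.
# Assumes `ingest` publishes a task value named row_count (hypothetical).
condition = {
    "task_key": "has_new_rows",
    "depends_on": [{"task_key": "ingest"}],
    "condition_task": {
        "op": "GREATER_THAN",
        "left": "{{tasks.ingest.values.row_count}}",
        "right": "0",
    },
}

# The "if" branch: runs only when the condition evaluated to true.
transform_if_true = {
    "task_key": "transform",
    "depends_on": [{"task_key": "has_new_rows", "outcome": "true"}],
    "notebook_task": {"notebook_path": "/Workspace/Shared/transform"},
}

# For-each loop: run the nested task once per input, two iterations at a time.
per_region = {
    "task_key": "per_region",
    "depends_on": [{"task_key": "transform"}],
    "for_each_task": {
        "inputs": json.dumps(["emea", "amer", "apac"]),
        "concurrency": 2,
        "task": {
            "task_key": "per_region_iteration",
            "notebook_task": {
                "notebook_path": "/Workspace/Shared/report",
                "base_parameters": {"region": "{{input}}"},
            },
        },
    },
}

control_flow_tasks = [retrying_task, condition, transform_if_true, per_region]
```

An "else" branch would be a second task that depends on `has_new_rows` with `"outcome": "false"`.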
Job DAG Example:
[Ingest Bronze] → [Transform Silver] → [Aggregate Gold]
       ↓                                       ↓
[on failure: notify]                  [Refresh Dashboard]
                                               ↓
                                      [Send Report Email]
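The dependency semantics of this example can be simulated locally in a few lines. This is a simplified model, not how Databricks schedules tasks. It assumes the failure notification hangs off the ingest task and runs only when that task fails, roughly what a `run_if` setting such as `AT_LEAST_ONE_FAILED` would express in a real job. The task names and the `simulate` function are hypothetical.

```python
from graphlib import TopologicalSorter

# Upstream dependencies for each task in the diagram.
DEPENDS_ON = {
    "ingest_bronze": [],
    "transform_silver": ["ingest_bronze"],
    "aggregate_gold": ["transform_silver"],
    "refresh_dashboard": ["aggregate_gold"],
    "send_report_email": ["refresh_dashboard"],
    "notify_on_failure": ["ingest_bronze"],  # assumption: attached to ingest
}

# Assumption: these tasks run only when at least one upstream task failed.
RUN_ON_FAILURE = {"notify_on_failure"}


def simulate(failing: set[str]) -> dict[str, str]:
    """Return a status per task: 'success', 'failed', or 'skipped'."""
    status: dict[str, str] = {}
    # static_order() yields every task after all of its dependencies.
    for task in TopologicalSorter(DEPENDS_ON).static_order():
        upstream = [status[dep] for dep in DEPENDS_ON[task]]
        if task in RUN_ON_FAILURE:
            should_run = any(s == "failed" for s in upstream)
        else:
            should_run = all(s == "success" for s in upstream)
        if not should_run:
            status[task] = "skipped"
        elif task in failing:
            status[task] = "failed"
        else:
            status[task] = "success"
    return status


if __name__ == "__main__":
    for name, state in simulate({"ingest_bronze"}).items():
        print(f"{name}: {state}")
```

With `ingest_bronze` failing, only the notification task runs and everything downstream is skipped. With no failures, the main chain succeeds and the notification is skipped.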