Lakeflow Jobs Development
FIRST: Use the parent databricks-core skill for CLI basics, authentication, profile selection, and data exploration commands.
Lakeflow Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Asset Bundles (DABs), Python SDK, or CLI.
Reference Files
| Use Case | Reference File | |----------|----------------| | Configure task types (notebook, Python, SQL, dbt, pipeline, JAR, run_job, for_each) | references/task-types.md | | Set up triggers and schedules (cron, periodic, file arrival, table update, continuous) | references/triggers-schedules.md | | Configure notifications, health rules, retries, timeouts, queues | references/notifications-monitoring.md | | Complete worked examples (ETL, warehouse refresh, event-driven, ML training, multi-env, streaming, cross-job) | references/examples.md |
Scaffolding a New Job Project
Use databricks bundle init with a config file to scaffold non-interactively. This creates a project in the <project_name>/ directory:
databricks bundle init default-python --config-file <(echo '{"project_name": "my_job", "include_job": "yes", "include_pipeline": "no", "include_python": "yes", "serverless": "yes"}') --profile <PROFILE> < /dev/null
project_name: letters, numbers, underscores only
After scaffolding, create AGENTS.md in the project directory to provide agents with project guidance. Use this content:
# Declarative Automation Bundles Project
This project uses Declarative Automation Bundles (formerly Databricks Asset Bundles) for deployment.
## Prerequisites
Install the Databricks CLI (>= v0.288.0) if not already installed:
- macOS/Linux with Homebrew: `brew tap databricks/tap && brew install databricks`
- Linux without Homebrew: download the versioned archive from the [official releases](https://github.com/databricks/cli/releases), verify the published checksum, inspect its contents, and install the binary only after explicit approval
- Windows: `winget install Databricks.DatabricksCLI`
Never pipe a remote installer directly into a shell. Verify the installed binary with `databricks -v`.
## For AI Agents
Read the `databricks-core` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-jobs` skill for job-specific guidance.
If skills are not available, install them: `databricks aitools install`
Project Structure
my-job-project/
├── databricks.yml # Bundle configuration
├── resources/
│ └── my_job.job.yml # Job definition
├── src/
│ ├── my_notebook.ipynb # Notebook tasks
│ └── my_module/ # Python wheel package
│ ├── __init__.py
│ └── main.py
├── tests/
│ └── test_main.py
└── pyproject.toml # Python project config (if using wheels)
Quick Start
Asset Bundles (DABs) — recommended
# resources/jobs.yml
resources:
jobs:
my_etl_job:
name: "[${bundle.target}] My ETL Job"
tasks:
- task_key: extract
notebook_task:
notebook_path: ../src/notebooks/extract.py
Python SDK
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, Source
w = WorkspaceClient()
job = w.jobs.create(
name="my-etl-job",
tasks=[
Task(
task_key="extract",
notebook_task=NotebookTask(
notebook_path="/Workspace/Shared/etl/extract",
source=Source.WORKSPACE,
),
),
],
)
print(f"Created job: {job.job_id}")
CLI
databricks jobs create --json '{
"name": "my-etl-job",
"tasks": [{
"task_key": "extract",
"notebook_task": {
"notebook_path": "/Workspace/Shared/etl/extract",
"source": "WORKSPACE"
}
}]
}'
Core Concepts
Multi-Task Workflows
Jobs support DAG-based task dependencies:
tasks:
- task_key: extract
notebook_task:
notebook_path: ../src/extract.py
- task_key: transform
depends_on:
- task_key: extract
notebook_task:
notebook_path: ../src/transform.py
- task_key: load
depends_on:
- task_key: transform
run_if: ALL_SUCCESS # Only run if all dependencies succeed
notebook_task:
notebook_path: ../src/load.py
run_if conditions:
ALL_SUCCESS(default) — run when all dependencies succeedALL_DONE— run when all dependencies complete (success or failure)AT_LEAST_ONE_SUCCESS— run when at least one dependency succeedsNONE_FAILED— run when no dependencies failedALL_FAILED— run when all dependencies failedAT_LEAST_ONE_FAILED— run when at least one dependency failed
Task Types Summary
| Task Type | Use Case | Reference |
|-----------|----------|-----------|
| notebook_task | Run notebooks | references/task-types.md#notebook-task |
| spark_python_task | Run Python scripts | references/task-types.md#spark-python-task |
| python_wheel_task | Run Python wheels | references/task-types.md#python-wheel-task |
| sql_task | Run SQL queries/files/dashboards/alerts | references/task-types.md#sql-task |
| dbt_task | Run dbt projects | references/task-types.md#dbt-task |
| pipeline_task | Trigger SDP (formerly DLT) pipelines | references/task-types.md#pipeline-task |
| spark_jar_task | Run Spark JARs | references/task-types.md#spark-jar-task |
| run_job_task | Trigger other jobs | references/task-types.md#run-job-task |
| for_each_task | Loop over inputs | references/task-types.md#for-each-task |
Trigger Types Summary
| Trigger Type | Use Case | Reference |
|--------------|----------|-----------|
| schedule | Cron-based scheduling | references/triggers-schedules.md#cron-schedule |
| trigger.periodic | Interval-based | references/triggers-schedules.md#periodic-trigger |
| trigger.file_arrival | File arrival events | references/triggers-schedules.md#file-arrival-trigger |
| trigger.table_update | Unity Catalog table change events | references/triggers-schedules.md#table-update-trigger |
| continuous | Always-running jobs | references/triggers-schedules.md#continuous-jobs |
Compute Configuration
Job Clusters (recommended)
Define reusable cluster configurations shared across tasks:
job_clusters:
- job_cluster_key: shared_cluster
new_cluster:
spark_version: "15.4.x-scala2.12"
node_type_id: "i3.xlarge"
num_workers: 2
spark_conf:
spark.speculation: "true"
tasks:
- task_key: my_task
job_cluster_key: shared_cluster
notebook_task:
notebook_path: ../src/notebook.py
Autoscaling Clusters
new_cluster:
spark_version: "15.4.x-scala2.12"
node_type_id: "i3.xlarge"
autoscale:
min_workers: 2
max_workers: 8
Existing Cluster
tasks:
- task_key: my_task
existing_cluster_id: "0123-456789-abcdef12"
notebook_task:
notebook_path: ../src/notebook.py
Serverless Compute
For notebook and Python tasks, omit cluster configuration to use serverless:
tasks:
- task_key: serverless_task
notebook_task:
notebook_path: ../src/notebook.py
# No cluster config = serverless
Job Parameters
Parameters defined at job level are passed to ALL tasks (no need to repeat per task):
parameters:
- name: env
default: "dev"
- name: date
default: "{{start_date}}" # Dynamic value reference
Access in notebooks:
catalog = dbutils.widgets.get("env")
load_date = dbutils.widgets.get("date")
Pass to specific tasks:
tasks:
- task_key: my_task
notebook_task:
notebook_path: ../src/notebook.py
base_parameters:
env: "{{job.parameters.env}}"
custom_param: "value"
Common Operations
Python SDK
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# List jobs
jobs = w.jobs.list()
# Get job details
job = w.jobs.get(job_id=12345)
# Run job now
run = w.jobs.run_now(job_id=12345)
# Run with parameters
run = w.jobs.run_now(
job_id=12345,
job_parameters={"env": "prod", "date": "2024-01-15"},
)
# Cancel run
w.jobs.cancel_run(run_id=run.run_id)
# Delete job
w.jobs.delete(job_id=12345)
CLI
# List jobs
databricks jobs list
# Get job details
databricks jobs get 12345
# Run job
databricks jobs run-now 12345
# Run with parameters (must use --json with job_id inside)
databricks jobs run-now --json '{"job_id": 12345, "job_parameters": {"env": "prod"}}'
# Cancel run
databricks jobs cancel-run 67890
# Delete job
databricks jobs delete 12345
Asset Bundle Operations
# Validate configuration
databricks bundle validate --profile <profile>
# Deploy to a target
databricks bundle deploy -t dev --profile <profile>
# Run a job
databricks bundle run <job_name> -t dev --profile <profile>
# Check run status
databricks jobs get-run --run-id <id> --profile <profile>
# Destroy resources
databricks bundle destroy --auto-approve
Permissions (DABs)
resources:
jobs:
my_job:
name: "My Job"
permissions:
- level: CAN_VIEW
group_name: "data-analysts"
- level: CAN_MANAGE_RUN
group_name: "data-engineers"
- level: CAN_MANAGE
user_name: "admin@example.com"
Permission levels:
CAN_VIEW— view job and run historyCAN_MANAGE_RUN— view, trigger, and cancel runsCAN_MANAGE— full control including edit and delete
Unit Testing
Run unit tests locally:
uv run pytest
Development Workflow
- Validate:
databricks bundle validate --profile <profile> - Deploy:
databricks bundle deploy -t dev --profile <profile> - Run:
databricks bundle run <job_name> -t dev --profile <profile> - Check run status:
databricks jobs get-run --run-id <id> --profile <profile>
Common Issues
| Issue | Solution |
|-------|----------|
| Job cluster startup slow | Use job clusters with job_cluster_key for reuse across tasks |
| Task dependencies not working | Verify task_key references match exactly in depends_on |
| Schedule not triggering | Check pause_status: UNPAUSED and valid timezone |
| File arrival not detecting | Ensure path has proper permissions and uses cloud storage URL |
| Table update trigger missing events | Verify Unity Catalog table and proper grants |
| Parameter not accessible | Use dbutils.widgets.get() in notebooks |
| admins group error | Cannot modify admins permissions on jobs |
| Serverless task fails | Ensure task type supports serverless (notebook, Python) |
Related Skills
- databricks-dabs — DABs configuration patterns shared by jobs and pipelines
- databricks-pipelines — SDP (formerly DLT) pipelines triggered by
pipeline_task