CI/CD in Databricks
The Databricks CI/CD story has two main components:
1. Git Folders (formerly Repos): Version control integration in the workspace UI for developing notebooks
2. Declarative Automation Bundles (formerly DABs / Databricks Asset Bundles): Infrastructure-as-code for packaging and deploying Databricks assets across environments
Git Folders
Git Folders connect your Databricks workspace to a Git repository (GitHub, GitLab, Azure DevOps, etc.).
Capabilities:
- Clone repositories into the workspace
- Create branches, commit changes, and push
- Pull updates from the remote
- Create pull requests via the Git provider's interface
Git Folders are primarily for collaborative notebook development: multiple developers can work on the same notebooks with proper version control.
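Each of the workspace actions above corresponds to an ordinary Git operation. As a rough illustration only (this is not how Git Folders is implemented), the same flow from a terminal might look like the sketch below. It uses a local bare repository as a stand-in for the hosted remote, and the branch name, file name, and author identity are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Stand-in for the hosted repository (GitHub, GitLab, Azure DevOps, ...).
git init --quiet --bare "$workdir/remote.git"

# Clone the repository, as the workspace does when a Git Folder is added.
git clone --quiet "$workdir/remote.git" "$workdir/project"
cd "$workdir/project"

# Create a branch, commit a notebook change, and push it.
git checkout --quiet -b feature/etl-notebook
echo "# Databricks notebook source" > etl.py
git add etl.py
git -c user.name="Example Dev" -c user.email="dev@example.com" \
    commit --quiet -m "Add ETL notebook"
git push --quiet -u origin feature/etl-notebook

# Pull updates from the remote (a no-op here, since nothing else changed).
git pull --quiet
```

The pull request itself is still opened in the Git provider's interface once the branch has been pushed.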
Declarative Automation Bundles
Declarative Automation Bundles package Databricks assets (jobs, pipelines, notebooks, libraries) into a deployable unit that can be promoted across environments (dev → staging → production).
Key concepts:
- Bundle configuration: YAML files defining what to deploy
- Variables: Override settings per target environment
- Targets: Named deployment environments (dev, staging, production)
- Databricks CLI: The command-line tool to validate, deploy, and manage bundles
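To make these concepts concrete, the sketch below writes a minimal `databricks.yml` with one job, one variable, and two targets. It is an assumption-laden illustration rather than a production configuration: the bundle name, job name, notebook path, `catalog` variable, and workspace hosts are all hypothetical placeholders, and a real job would usually also need compute and permission settings.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal bundle configuration into the current directory.
# The quoted 'EOF' keeps the shell from expanding ${var.catalog}.
cat > databricks.yml <<'EOF'
bundle:
  name: my_bundle            # hypothetical bundle name

variables:
  catalog:
    description: Catalog the job writes to
    default: dev_catalog     # hypothetical default value

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/etl.py   # hypothetical path
            base_parameters:
              catalog: ${var.catalog}

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://dev.example.cloud.databricks.com    # placeholder host
  production:
    mode: production
    workspace:
      host: https://prod.example.cloud.databricks.com   # placeholder host
    variables:
      catalog: prod_catalog  # per-target override of the variable
EOF
```

With this layout, the same job definition is deployed to both targets, and only the workspace host and the `catalog` value differ between them.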
# Initialize a new bundle project
databricks bundle init
# Validate configuration
databricks bundle validate
# Deploy to dev environment
databricks bundle deploy -t dev
# Deploy to production
databricks bundle deploy -t production
# Run a deployed job
databricks bundle run my_job -t production
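
In a CI/CD pipeline these commands are typically chained so that a deploy only happens after validation succeeds. The following is a minimal sketch of such a step, not a prescribed workflow: `promote_bundle` is a hypothetical helper name, and it assumes the Databricks CLI is installed and authentication is configured outside the script (for example through the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical CI helper: validate, deploy, and optionally run a job
# for a single target. Usage: promote_bundle <target> [job_key]
promote_bundle() {
  local target="$1"
  local job="${2:-}"

  # Fail fast on configuration errors before touching the workspace.
  databricks bundle validate -t "$target"
  databricks bundle deploy -t "$target"

  # Run the named job only when one was passed in.
  if [ -n "$job" ]; then
    databricks bundle run "$job" -t "$target"
  fi
}
```

A pipeline might, for instance, call `promote_bundle dev` on pull requests and `promote_bundle production my_job` after a merge to the main branch.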