You Can Pass This Exam For Free
Choose Your Study Path
No prior Spark or data engineering experience. You'll build foundational knowledge from scratch over 3 weeks.
Exam Overview
Format
45 questions, 90 minutes. Multiple choice (single select and multiple select). SQL-first — questions prefer SQL syntax, falling back to Python when SQL isn't applicable.
Scoring
Pass/fail based on percentage score. Passing: 70%. No penalty for wrong answers — always guess if unsure.
Domains & Weights
- Databricks Intelligence Platform: 6%
- Data Ingestion and Loading: 21%
- Data Transformation and Modeling: 21%
- Working with Lakeflow Jobs: 12%
- Implementing CI/CD: 12%
- Troubleshooting, Monitoring, and Optimization: 15%
- Governance and Security: 12%
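The weights translate into a rough question budget. The sketch below multiplies each weight by the 45 questions and computes how many correct answers reach 70%. It assumes every question carries equal weight and that the 70% threshold applies to all 45 questions. The listed weights sum to 99%, so the per-domain counts are estimates only.

```python
# Rough study budget: turn the listed domain weights into expected question counts.
# Assumption: all 45 questions are scored equally and 70% applies to all of them.

TOTAL_QUESTIONS = 45
PASS_PERCENT = 70

# Weights (percent) as listed in the exam overview; they sum to 99, not 100.
DOMAIN_WEIGHTS_PCT = {
    "Databricks Intelligence Platform": 6,
    "Data Ingestion and Loading": 21,
    "Data Transformation and Modeling": 21,
    "Working with Lakeflow Jobs": 12,
    "Implementing CI/CD": 12,
    "Troubleshooting, Monitoring, and Optimization": 15,
    "Governance and Security": 12,
}


def expected_questions(weights_pct, total):
    """Approximate number of questions per domain: weight% of the total."""
    return {domain: pct * total / 100 for domain, pct in weights_pct.items()}


def min_correct_to_pass(total, pass_percent):
    """Smallest whole number of correct answers whose share is >= pass_percent.

    Integer ceiling division sidesteps floating-point products that may not be exact.
    """
    return (total * pass_percent + 99) // 100


if __name__ == "__main__":
    for domain, n in expected_questions(DOMAIN_WEIGHTS_PCT, TOTAL_QUESTIONS).items():
        print(f"{domain}: ~{n:.1f} questions")
    print("Correct answers needed:", min_correct_to_pass(TOTAL_QUESTIONS, PASS_PERCENT))  # 32
```

In other words, the two 21% domains are worth roughly nine questions each, and you can miss about 13 questions overall.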
Registration
$200 USD. Available through Kryterion testing centers or online proctored. Schedule at databricks.com/certification.
Topic Priority Table
Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.
Databricks Intelligence Platform
The smallest domain at ~6%, but it sets the foundation. Tests your understanding of the Databricks Lakehouse architecture, Delta Lake basics, Unity Catalog fundamentals, and compute services. These are conceptual questions — know the 'what' and 'why' rather than detailed syntax.
Must-Know Concepts
- The Lakehouse combines data lake flexibility (open formats, cheap storage) with data warehouse reliability (ACID transactions, schema enforcement)
- Delta Lake is the storage format that enables Lakehouse features: ACID, time travel, schema enforcement, and audit history
- Unity Catalog provides centralized governance: three-level namespace (catalog.schema.table), access controls, data lineage
- Compute types: All-Purpose Clusters (interactive development), Job Clusters (automated workloads), SQL Warehouses (SQL analytics)
- Serverless compute: near-instant startup, no cluster configuration required, billed per DBU; available for SQL Warehouses, notebooks, Jobs, and DLT pipelines
- Databricks Connect: allows running Spark code from a local IDE (VS Code, IntelliJ) against remote Databricks compute
Data Ingestion and Loading
A major domain at ~21% with 7 sub-objectives. Tests your ability to load data into Delta tables using COPY INTO, Auto Loader, Lakeflow Connect, and JDBC/ODBC. You must know the SQL syntax for each approach, when to use which, and how to handle schema evolution and error records.
Must-Know Concepts
- COPY INTO syntax: COPY INTO target_table FROM 'source_path' FILEFORMAT = format FORMAT_OPTIONS ('key' = 'value')
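- Auto Loader uses the cloudFiles format: spark.readStream.format('cloudFiles').option('cloudFiles.format', 'json').option('cloudFiles.schemaLocation', 'schema_path').load('path')
- Auto Loader infers the schema from the data and stores it at cloudFiles.schemaLocation, which is required when inferring. Values that don't match the schema, such as type mismatches, land in the _rescued_data column (name configurable with the rescuedDataColumn option) instead of failing the load. cloudFiles.schemaEvolutionMode controls how new columns are handled: addNewColumns (the default) or rescue
- COPY INTO is idempotent per file path: it tracks which files have been loaded and skips them on subsequent runs, unless COPY_OPTIONS ('force' = 'true') is set
- Lakeflow Connect provides managed connectors for SaaS applications (such as Salesforce) and databases, with built-in incremental ingestion and CDC support
- File formats supported: JSON, CSV, Parquet, Avro, ORC, and text. Know the common FORMAT_OPTIONS: header and delimiter (CSV) and multiLine (JSON and CSV)
- Lakehouse Federation: query external databases (PostgreSQL, Snowflake, BigQuery) in place through a Unity Catalog connection and foreign catalog, without copying data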
- trigger(availableNow=True): processes all available data then stops — combines streaming benefits (checkpoint tracking) with batch semantics
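COPY INTO's idempotency is easiest to see as bookkeeping over file paths. The sketch below is a toy in-memory model, not Databricks code: IdempotentFileLoader and its copy_into method are hypothetical names, and the real command keeps its own load history for the target table. The force flag mirrors COPY_OPTIONS ('force' = 'true'), which disables the skip.

```python
from typing import Dict, List, Set


class IdempotentFileLoader:
    """Toy model of COPY INTO load tracking (hypothetical class, not a Databricks API).

    Each source file path is loaded at most once unless force=True.
    """

    def __init__(self) -> None:
        self.loaded_paths: Set[str] = set()  # paths already ingested on earlier runs
        self.rows: List[dict] = []           # stands in for the target Delta table

    def copy_into(self, source_files: Dict[str, List[dict]], force: bool = False) -> List[str]:
        """Append rows from unseen files and return the paths loaded in this run."""
        newly_loaded = []
        for path in sorted(source_files):
            if path in self.loaded_paths and not force:
                continue  # already loaded: skipping makes re-runs safe
            self.rows.extend(source_files[path])
            self.loaded_paths.add(path)
            newly_loaded.append(path)
        return newly_loaded


if __name__ == "__main__":
    loader = IdempotentFileLoader()
    landing = {"/landing/day1.json": [{"id": 1}], "/landing/day2.json": [{"id": 2}]}
    print(loader.copy_into(landing))  # both files load
    print(loader.copy_into(landing))  # nothing new, nothing duplicated
    landing["/landing/day3.json"] = [{"id": 3}]
    print(loader.copy_into(landing))  # only day3 loads
```

Auto Loader reaches the same exactly-once file handling through its streaming checkpoint, which is why pairing it with trigger(availableNow=True) gives batch-style runs without reprocessing old files.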
Data Transformation and Modeling
Another major domain at ~21% with 7 sub-objectives. Covers data cleaning, joins, column manipulation, deduplication, Spark tuning for transformations, building Gold-layer aggregations, and data quality enforcement. Expect SQL-heavy questions on MERGE INTO, window functions, and data quality constraints.
Must-Know Concepts
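- MERGE INTO syntax for upserts: MERGE INTO target USING source ON condition WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT * (or list columns explicitly, e.g., UPDATE SET target.email = source.email)
- Deduplication using window functions: ROW_NUMBER() OVER (PARTITION BY key ORDER BY timestamp DESC) AS rn, then keep rows where rn = 1 (in a subquery or with QUALIFY) to retain the latest record per key. Deduplicate the source before a MERGE, because MERGE fails when multiple source rows match the same target row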
- Column manipulation: ALTER TABLE ADD/DROP COLUMN, CAST for type changes, COALESCE for null handling
- Data quality: CHECK constraints (ALTER TABLE ADD CONSTRAINT), NOT NULL constraints, and expectations in Delta Live Tables
- Gold layer design: pre-aggregated tables optimized for business reporting, built from Silver-layer cleaned data
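- Join types: INNER, LEFT, RIGHT, FULL OUTER, CROSS, LEFT SEMI, LEFT ANTI. SEMI keeps left-side rows that have a match (returning only left-side columns); ANTI keeps left-side rows with no match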
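The dedupe-then-upsert pattern can be tried locally. This sketch uses Python's built-in sqlite3 as a stand-in for Databricks SQL (it assumes SQLite 3.25 or newer, for window functions). The ROW_NUMBER query has the same shape you would write on Databricks; SQLite has no MERGE INTO, so INSERT ... ON CONFLICT DO UPDATE plays the role of WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT. Table and column names are hypothetical.

```python
import sqlite3

# Keep only the newest row per customer_id (same ROW_NUMBER pattern as Databricks SQL).
LATEST_PER_KEY = """
    SELECT customer_id, email, updated_at
    FROM (
        SELECT customer_id, email, updated_at,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY updated_at DESC
               ) AS rn
        FROM raw_customers
    ) AS ranked
    WHERE rn = 1
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a source table with a duplicate key and a target table to upsert into."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_customers (customer_id INTEGER, email TEXT, updated_at TEXT);
        INSERT INTO raw_customers VALUES
            (1, 'old@a.com', '2024-01-01'),
            (1, 'new@a.com', '2024-03-01'),
            (2, 'b@b.com',   '2024-02-15');
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT);
        INSERT INTO customers VALUES
            (1, 'stale@a.com', '2023-12-01'),
            (3, 'c@c.com',     '2023-11-11');
    """)
    return conn


def dedupe_and_upsert(conn: sqlite3.Connection) -> list:
    """Upsert the deduplicated source into the target and return the target's rows."""
    # ON CONFLICT stands in for MERGE: matched keys are updated, new keys are inserted.
    # The WHERE clause in LATEST_PER_KEY also avoids SQLite's ON-keyword parsing ambiguity.
    conn.execute(
        "INSERT INTO customers (customer_id, email, updated_at) "
        + LATEST_PER_KEY
        + " ON CONFLICT(customer_id) DO UPDATE SET"
        + " email = excluded.email, updated_at = excluded.updated_at"
    )
    return conn.execute(
        "SELECT customer_id, email FROM customers ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    print(dedupe_and_upsert(build_demo_db()))
    # [(1, 'new@a.com'), (2, 'b@b.com'), (3, 'c@c.com')]
```

On Databricks, the upsert step would instead be MERGE INTO customers USING (the deduplicated query) AS source ON customers.customer_id = source.customer_id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *.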
Working with Lakeflow Jobs
This domain covers ~12% with 4 sub-objectives. Tests your ability to configure and manage multi-task workflows including control flows (if/else conditions), task configuration, DAG dependencies between tasks, and job scheduling triggers. Know how to build, schedule, and troubleshoot production workflows.
Must-Know Concepts
- Jobs consist of one or more tasks with DAG dependencies: sequential, parallel, or conditional (if/else) execution
- Task types: Notebook, Python script, SQL, JAR, Delta Live Tables pipeline, dbt — know when to use each
- Control flow tasks: If/Else conditions based on task values, allowing dynamic pipeline branching
- Scheduling triggers: cron-based (periodic), continuous (runs immediately after previous completes), file arrival (triggers on new data)
- Job clusters vs all-purpose clusters: job clusters are created for the run and terminated after — more cost-effective for production
- Retry policies: configure max retries and timeout per task to handle transient failures
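- Delta Live Tables (DLT, now branded Lakeflow Declarative Pipelines): declarative ETL framework. Define transformations with @dlt.table (Python) or CREATE OR REFRESH STREAMING TABLE / MATERIALIZED VIEW (SQL; older syntax: CREATE STREAMING LIVE TABLE), and the system manages orchestration, dependencies, and data quality
- DLT expectations: data quality constraints that validate records during pipeline runs. @dlt.expect keeps invalid rows and records violations in the pipeline metrics, @dlt.expect_or_drop drops invalid rows, and @dlt.expect_or_fail stops the update. SQL equivalent: CONSTRAINT name EXPECT (condition) with optional ON VIOLATION DROP ROW or ON VIOLATION FAIL UPDATE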
- Repair Run: re-execute only failed and downstream tasks in a job, preserving results of successful tasks to avoid reprocessing
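Repair Run's selection rule (failed tasks plus everything downstream of them) is a walk over the job's DAG. The sketch below models it with the standard library's graphlib (Python 3.9+). repair_plan, the task names, and the status strings are hypothetical stand-ins, not the Jobs API.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def repair_plan(deps: Dict[str, Set[str]], statuses: Dict[str, str]) -> List[str]:
    """Return the tasks a repair run would re-execute, in a valid dependency order.

    deps maps each task to the set of upstream tasks it depends on.
    statuses maps each task to its last result, e.g. "SUCCESS" or "FAILED" (hypothetical labels).
    """
    # Invert the edges so we can walk from a task to its downstream dependents.
    downstream: Dict[str, Set[str]] = {task: set() for task in deps}
    for task, upstream in deps.items():
        for up in upstream:
            downstream.setdefault(up, set()).add(task)

    # Start from failed tasks and collect everything reachable downstream.
    to_rerun: Set[str] = set()
    stack = [task for task, status in statuses.items() if status == "FAILED"]
    while stack:
        task = stack.pop()
        if task not in to_rerun:
            to_rerun.add(task)
            stack.extend(downstream.get(task, ()))

    # Successful upstream tasks keep their results; order the rest topologically.
    return [t for t in TopologicalSorter(deps).static_order() if t in to_rerun]


if __name__ == "__main__":
    job = {
        "ingest": set(),
        "clean": {"ingest"},
        "audit": {"ingest"},
        "aggregate": {"clean"},
        "report": {"aggregate"},
    }
    last_run = {"ingest": "SUCCESS", "audit": "SUCCESS", "clean": "FAILED",
                "aggregate": "SKIPPED", "report": "SKIPPED"}
    print(repair_plan(job, last_run))  # ['clean', 'aggregate', 'report']
```

Note that "ingest" and the parallel "audit" branch are left alone: their successful results are reused, which is exactly the reprocessing a repair run avoids.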
Implementing CI/CD
This domain covers ~12% with 4 sub-objectives. Tests your ability to implement CI/CD workflows using Git Folders (formerly Repos), Declarative Automation Bundles (formerly DABs), the Databricks CLI, and environment configuration for dev/staging/prod promotion.
Must-Know Concepts
- Git Folders: clone remote repos, create branches, commit changes, push/pull — all within the Databricks workspace UI
- Declarative Automation Bundles (formerly Databricks Asset Bundles / DABs): define Databricks resources in databricks.yml, deploy across environments with 'databricks bundle deploy'
- Databricks CLI: authenticate, deploy bundles, manage workspace resources from the command line
- Environment promotion: use separate catalogs or schemas per environment (dev/staging/prod) in Unity Catalog
- Bundle targets: define dev, staging, and prod targets in databricks.yml with different workspace URLs and permissions
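Bundle targets work by overlaying a target's settings on the top-level ones, with the target winning on conflicts. The sketch below models that overlay on plain dicts shaped like a databricks.yml. deep_merge and resolve_target are hypothetical helpers, the host URLs are placeholders, and the real CLI applies additional rules for some mappings, so treat this as an illustration of the idea only.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied recursively; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_target(bundle: Dict[str, Any], target: str) -> Dict[str, Any]:
    """Effective settings for one target: top-level mappings overlaid with the target's."""
    targets = bundle.get("targets", {})
    if target not in targets:
        raise KeyError(f"unknown target: {target}")
    top_level = {k: v for k, v in bundle.items() if k != "targets"}
    return deep_merge(top_level, targets[target])


# Mirrors the shape of a databricks.yml; names and hosts are placeholders.
BUNDLE = {
    "bundle": {"name": "orders_pipeline"},
    "workspace": {"host": "https://default.example.com", "root_path": "/Shared/orders"},
    "targets": {
        "dev": {"mode": "development", "workspace": {"host": "https://dev.example.com"}},
        "prod": {"mode": "production", "workspace": {"host": "https://prod.example.com"}},
    },
}

if __name__ == "__main__":
    for name in ("dev", "prod"):
        settings = resolve_target(BUNDLE, name)
        print(name, settings["mode"], settings["workspace"])
```

With a real bundle, choosing one of these resolved configurations is what the -t flag does, e.g. databricks bundle deploy -t prod.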
Troubleshooting, Monitoring, and Optimization
This domain covers ~15% with 5 sub-objectives. Tests your ability to analyze job performance trends, monitor pipeline health, interpret Spark UI to find bottlenecks, use Liquid Clustering for optimization, and diagnose cluster issues. Expect scenario-based questions about fixing slow or failing jobs.
Must-Know Concepts
- Spark UI: understand stages, tasks, shuffle read/write, spill to disk, and executor timeline for diagnosing performance issues
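- Data skew: one partition holds far more data than the others, so a single task runs much longer than the rest (a straggler in the Spark UI). Fix with salting or repartitioning; Adaptive Query Execution (AQE) can also split skewed partitions in sort-merge joins automatically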
- Shuffle operations: wide transformations (joins, groupBy, distinct) require data redistribution across executors — minimize unnecessary shuffles
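- Liquid Clustering: replaces static partitioning and ZORDER with CLUSTER BY. Clustering keys can be changed with ALTER TABLE ... CLUSTER BY without rewriting existing data, OPTIMIZE clusters data incrementally, and CLUSTER BY AUTO lets Databricks choose keys based on query patterns
- OPTIMIZE: compacts small files into larger ones for better read performance. VACUUM: permanently removes data files that are no longer referenced by the table and are older than the retention threshold (default 7 days), which also limits how far back time travel can reach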
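Salting is easiest to see by counting rows per partition. In this sketch zlib.crc32 stands in for Spark's hash partitioner, and the row mix (900 rows for one hot key, 100 spread over ten others) is made up for illustration. Salting appends a suffix so the hot key becomes several keys; for a join, the other side must be replicated once per salt value so every salted key still finds its match.

```python
import zlib
from collections import Counter
from typing import List


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioning; crc32 is a stand-in for Spark's hash function."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys: List[str], num_partitions: int) -> List[int]:
    """Rows per partition when rows are routed by key."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys: List[str], num_salts: int) -> List[str]:
    """Large (skewed) side: append a round-robin salt so one hot key becomes num_salts keys."""
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


def replicate_for_salts(keys: List[str], num_salts: int) -> List[str]:
    """Small side of a join: emit every key once per salt so salted keys still match."""
    return [f"{k}_{s}" for k in keys for s in range(num_salts)]


if __name__ == "__main__":
    # Hypothetical skewed data: 900 rows for "US", 10 rows each for ten other countries.
    rows = ["US"] * 900 + [f"C{j}" for j in range(10) for _ in range(10)]
    print("before salting:", partition_sizes(rows, 8))
    print("after salting: ", partition_sizes(salt_keys(rows, 8), 8))
```

Before salting, every "US" row hashes to the same partition, so one task processes at least 900 of the 1,000 rows; after salting, no single key holds more than 113 rows, and the hot key's rows can land on different partitions.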
Governance and Security
This domain covers ~12% with 4 sub-objectives. Tests your knowledge of managed vs external tables, Unity Catalog access controls (GRANT/REVOKE), column masking for sensitive data, row-level security with row filters, and Attribute-Based Access Control (ABAC) policies. Know the exact SQL syntax for granting and revoking permissions.
Must-Know Concepts
- GRANT syntax: GRANT privilege ON object_type object_name TO principal (e.g., GRANT SELECT ON TABLE catalog.schema.orders TO analyst_group)
- REVOKE syntax: REVOKE privilege ON object_type object_name FROM principal
- Managed tables: data stored in Unity Catalog-managed location. DROP TABLE deletes both metadata and data files
- External tables: data stored in customer-managed location. DROP TABLE removes only metadata from the catalog
- Column masking: apply a SQL function to mask column values based on the querying user's identity or group membership
- Row filters: apply a SQL predicate to filter rows based on the querying user — users only see rows they're authorized to access
- Data lineage: Unity Catalog automatically captures column-level lineage — tracks how data flows between tables, notebooks, and jobs, viewable in Catalog Explorer
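Row filters and column masks are ordinary SQL functions evaluated per query against the caller's identity. On Databricks they are attached with statements such as ALTER TABLE ... SET ROW FILTER fn ON (col) and ALTER TABLE ... ALTER COLUMN col SET MASK fn, and the function body typically calls is_account_group_member(). The sketch below models only the semantics in plain Python; the group names admins and pii_readers and all function names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


def region_row_filter(row: Row, groups: FrozenSet[str]) -> bool:
    """Row filter: admins see every row; everyone else sees only US rows."""
    return "admins" in groups or row["region"] == "US"


def email_mask(value: object, groups: FrozenSet[str]) -> object:
    """Column mask: pii_readers see the raw email; others see only the domain."""
    if "pii_readers" in groups:
        return value
    text = str(value)
    if "@" not in text:
        return "***"
    return "***@" + text.split("@", 1)[1]


def query(table: List[Row], groups: FrozenSet[str],
          row_filter: Callable[[Row, FrozenSet[str]], bool],
          masks: Dict[str, Callable[[object, FrozenSet[str]], object]]) -> List[Row]:
    """Apply the row filter and column masks for one caller's group memberships."""
    visible = [row for row in table if row_filter(row, groups)]
    return [
        {col: masks[col](val, groups) if col in masks else val for col, val in row.items()}
        for row in visible
    ]


if __name__ == "__main__":
    orders = [
        {"id": 1, "region": "US", "email": "ann@corp.com"},
        {"id": 2, "region": "EU", "email": "bob@corp.com"},
    ]
    masks = {"email": email_mask}
    print(query(orders, frozenset({"analysts"}), region_row_filter, masks))
    print(query(orders, frozenset({"admins", "pii_readers"}), region_row_filter, masks))
```

The same table returns different results to different callers with no change to the query itself, which is the property exam questions about row filters and masks usually probe.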
Key Databricks Concepts Compared
These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.
Top Mistakes to Avoid
Exam-Ready Checklist
Recommended Resources
Free & Official Resources
Paid Courses & Practice Exams
These are recommended if you prefer a structured learning path. They can save time but are not required to pass.