You Can Pass This Exam For Free
Choose Your Study Path
You passed the DE Associate exam and have 6–12 months of Databricks experience. You need to level up on advanced topics like streaming internals, testing strategies, and production monitoring.
Exam Overview
Format
60 questions, 120 minutes. Multiple choice (single select and multiple select). Scenario-heavy — most questions present 3–5 sentence production scenarios requiring you to choose the best approach. Covers advanced topics not on the Associate exam.
Scoring
Pass/fail based on percentage score. Passing: 70%. No penalty for wrong answers — always guess if unsure. Questions are weighted equally across all domains.
Domains & Weights
- Data Modeling and Design16%
- Data Processing30%
- Data Governance and Security18%
- Monitoring, Logging, and Optimization16%
- Testing and Deployment20%
Registration
$200 USD. Available through Kryterion testing centers or online proctored. Schedule at databricks.com/certification. Prerequisite: None officially required, but Databricks recommends passing the Associate exam first and having 1+ years of production Databricks experience.
Topic Priority Table
Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.
Data Modeling and Design
Covers ~16% with focus on production data modeling patterns. Tests your ability to design star schemas, implement SCD Type 1 and Type 2 patterns, choose between views and materialized views, design fact and dimension tables, and apply the Medallion Architecture at scale. Expect scenario-based questions about trade-offs.
Key Topics
Must-Know Concepts
- SCD Type 1: overwrite current values using MERGE INTO with WHEN MATCHED THEN UPDATE SET * — no history preserved
- SCD Type 2: preserve history by closing the current record (set end_date, is_current=false) and inserting a new version with start_date and is_current=true
- Star schema design: central fact tables (transactions, events) surrounded by dimension tables (customers, products, dates) joined via foreign keys
- Views vs materialized views: views recompute on every query (always fresh), materialized views precompute results (faster reads, may be stale)
- Generated columns: automatically computed from other columns using GENERATED ALWAYS AS syntax — useful for derived fields like date parts or hashes
- Medallion Architecture at scale: Bronze (raw/append-only), Silver (cleaned/validated/deduplicated), Gold (business aggregates) — professional exam tests advanced Silver-to-Gold patterns
Common Exam Traps
Data Processing
The largest domain at ~30%. Tests advanced data processing patterns: complex Structured Streaming pipelines, CDC processing with MERGE and Change Data Feed, stream-static and stream-stream joins, watermark handling, incremental processing with trigger modes, batch optimization, and multi-hop pipeline design. Expect deep scenario questions.
Key Topics
Must-Know Concepts
- Watermarks define the maximum allowed lateness for event-time data: withWatermark('event_time', '10 minutes') drops data arriving more than 10 minutes late
- Stream-static joins: the static DataFrame is loaded once at stream start and NOT refreshed. Use for slowly-changing reference data only
- Stream-stream joins require watermarks on both streams to bound state and prevent unbounded memory growth
- foreachBatch pattern: apply batch operations (MERGE, JDBC writes) to each streaming micro-batch — standard for streaming-to-Delta upserts
- Change Data Feed: enable with delta.enableChangeDataFeed = true, read with spark.read.option('readChangeFeed', 'true').table('my_table') or table_changes('my_table', start_version)
- trigger(availableNow=True) processes all available data across multiple micro-batches then stops — preferred over trigger(once=True) for large backlogs
- CDC processing pattern: stage incoming changes, deduplicate by key taking the latest change, then MERGE into the target table
- Output modes: append (new rows only, default for non-aggregation), update (changed rows only), complete (full result table, only for aggregations)
Common Exam Traps
Data Governance and Security
Covers ~18% of the exam. Tests advanced governance: column masking with custom SQL functions, row-level security with row filters, information_schema for metadata queries, audit logging, secrets management, dynamic views for access control, data lineage, and compliance patterns like GDPR deletion.
Key Topics
Must-Know Concepts
- Column masking: CREATE FUNCTION mask_email(email STRING) RETURNS STRING RETURN CASE WHEN is_member('admin') THEN email ELSE '***@***' END; ALTER TABLE t ALTER COLUMN email SET MASK mask_email
- Row filters: CREATE FUNCTION region_filter(region STRING) RETURNS BOOLEAN RETURN (is_member('global_access') OR region = current_user_attribute('region'))
- Information schema: query system.information_schema.columns, tables, table_privileges to audit metadata programmatically
- Secrets management: dbutils.secrets.createScope(), dbutils.secrets.put(), dbutils.secrets.get() — secrets are redacted when printed
- Dynamic views: CREATE VIEW secure_view AS SELECT CASE WHEN is_member('admin') THEN ssn ELSE 'XXX-XX-XXXX' END AS ssn FROM table
- Data lineage in Unity Catalog: automatically tracked at column level — visible in Catalog Explorer, queryable via REST API
- GDPR deletion: use DELETE FROM + VACUUM to permanently remove personal data — verify with time travel that data is inaccessible after VACUUM
Common Exam Traps
Monitoring, Logging, and Optimization
Covers ~16% of the exam. Tests your ability to monitor production pipelines, interpret Spark UI metrics, diagnose performance issues (skew, spill, shuffle), configure alerts, optimize queries with AQE, analyze query plans, and manage Delta table maintenance (OPTIMIZE, VACUUM, file sizing).
Key Topics
Must-Know Concepts
- Spark UI stages view: identify data skew (one task much slower than others), shuffle spill to disk (memory insufficient), and straggler tasks
- Adaptive Query Execution (AQE): auto-coalesces shuffle partitions, converts sort-merge joins to broadcast joins at runtime, handles skewed joins automatically
- EXPLAIN command: analyze query plans to identify full table scans, missing predicates, and suboptimal join strategies
- Auto Compaction file sizing: adapts target file size based on write patterns — streaming MERGE jobs may produce 32–48 MB files instead of the default 1 GB
- Databricks SQL alerts: schedule queries and trigger notifications when results exceed thresholds — used for data quality monitoring and SLA tracking
- OPTIMIZE compacts small files for better read performance. VACUUM removes files older than the retention period. Both are essential for table maintenance
- Job monitoring: track run duration trends, failure rates, cluster utilization, and cost per job run. Use Databricks SQL dashboards for fleet-wide monitoring
Common Exam Traps
Testing and Deployment
Covers ~20% of the exam. Tests production deployment practices: unit testing PySpark code, integration testing with temporary tables, CI/CD with Databricks Asset Bundles, multi-environment promotion (dev/staging/prod), the Databricks REST API, notebook parameterization, job cluster configuration, and repair runs for partial failures.
Key Topics
Must-Know Concepts
- Unit testing PySpark: use pytest with a local SparkSession for testing transformation functions. Isolate logic into testable functions, not monolithic notebooks
- Integration testing: use temporary tables or dedicated test catalogs in Unity Catalog to validate end-to-end pipeline behavior without affecting production data
- Shallow clones for testing: CREATE TABLE test_table SHALLOW CLONE prod_table — provides a zero-copy snapshot for testing without duplicating production data
- Databricks Asset Bundles: define resources in databricks.yml with dev/staging/prod targets. Deploy with 'databricks bundle deploy -t production'
- Notebook parameterization: use dbutils.widgets.get() to retrieve parameters passed via the Jobs API or widget configuration
- REST API /jobs/create: creates a new job definition each call (not idempotent). Use /jobs/reset to update an existing job. /jobs/run-now triggers immediate execution
- Job repair runs: re-execute only failed and downstream tasks without re-running successful tasks — use the Repair Run feature from the Jobs UI or API
- Multi-environment promotion: use separate Unity Catalog catalogs (dev_catalog, staging_catalog, prod_catalog) and Asset Bundle targets for environment isolation
Common Exam Traps
Key Professional-Level Concepts Compared
These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.
Top Mistakes to Avoid
Exam-Ready Checklist
Recommended Resources
Free & Official Resources
Paid Courses & Practice Exams
These are recommended if you prefer a structured learning path. They can save time but are not required to pass.