General Exam Tips
- 1.Read ALL answer options before selecting — Databricks SQL questions often have two plausible answers where one is subtly more correct
- 2.The exam tests reasoning, not syntax memorization — 'when and why' matters more than 'exact command'
- 3.Flag difficult questions and return to them — with 45 questions in 90 minutes you have 2 minutes per question
- 4.Eliminate obviously wrong answers first — questions often have one clearly wrong option that simplifies the choice
- 5.Watch for qualifiers: 'most efficiently', 'best approach', 'minimal configuration' — these steer toward specific platform-native answers
- 6.When two answers seem correct, prefer the Databricks-native feature over a workaround or external tool
- 7.The exam is entirely conceptual — no access to a live environment. Questions test decision-making in described scenarios
- 8.SQL Programming is 29% of the exam — prioritize this domain above all others if time is short
- 9.Never guess early — read the full scenario before deciding. Context changes which answer is correct
Quick Navigation
SQL Programming
Must-Know Facts
- SQL logical execution order determines what is legal in each clause — FROM runs first (tables resolved), then WHERE (row filtering), then GROUP BY, HAVING, SELECT (aliases created here), ORDER BY, LIMIT. Exam questions exploit this: aliases defined in SELECT are not yet visible to WHERE or HAVING; use the full expression or a CTE to work around this
- Window functions require OVER clause and cannot appear directly in WHERE — use QUALIFY, a subquery, or a CTE instead
- QUALIFY is a Databricks-specific clause that filters on window function results without wrapping in a subquery: QUALIFY ROW_NUMBER() OVER (...) = 1
- RANK() skips numbers after ties (1,2,2,4). DENSE_RANK() does not skip (1,2,2,3). ROW_NUMBER() always assigns a unique number
- SUM() OVER (PARTITION BY x ORDER BY y) = running total. SUM() OVER (PARTITION BY x) without ORDER BY = partition-wide total
- GROUP BY collapses rows to one per group. Window functions (OVER) preserve every original row with additional computed columns
- INSERT INTO appends rows. INSERT OVERWRITE replaces all existing data — they are not interchangeable
- ROLLUP generates hierarchical subtotals following column order. CUBE generates ALL possible subtotal combinations. ROLLUP returns fewer rows than CUBE
- LEFT ANTI JOIN returns rows from the left table that have NO match on the right — the opposite of a SEMI JOIN
- EXPLODE flattens an array column into individual rows. ANY() or ARRAY_CONTAINS() tests membership without flattening
- PIVOT rotates rows to columns. UNPIVOT rotates columns to rows. Both are testable syntax in Databricks SQL
- CASE WHEN is evaluated top-to-bottom — the first matching condition wins. Subsequent conditions are not evaluated
Common Traps
Confusing Pairs
Scenario Tips
Question asks for 'exactly one row per group' with an aggregated metric
Use GROUP BY with an aggregate function (SUM, COUNT, AVG). This is the correct choice when you want one output row per distinct group value
Window function with OVER (PARTITION BY) — this keeps all original rows, not one per group
Question asks to keep all rows but add a ranking, running total, or comparison to previous row
Use a window function (ROW_NUMBER, RANK, LAG, SUM OVER). Window functions are designed to add computed columns without collapsing the row count
GROUP BY — this would remove the individual rows that the question specifically asks to preserve
Question asks to deduplicate records keeping the most recent row per ID
Use ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) with QUALIFY rn = 1, or wrap in a subquery and filter WHERE rn = 1
DISTINCT — DISTINCT removes duplicate rows only when all columns are identical, not 'most recent per group'
Question asks to generate sales subtotals at each level of a hierarchy (year, then year-quarter, then grand total)
Use GROUP BY year, quarter WITH ROLLUP. ROLLUP respects column order and generates the hierarchical subtotals
CUBE — CUBE generates all possible combinations including quarter-only subtotals that don't make sense in a strict hierarchy
Last-Minute Facts
Databricks SQL
Must-Know Facts
- Three SQL warehouse types: Classic (manual config, startup delay), Pro (adds governance features and system tables), Serverless (instant startup, auto-scale, no cluster management)
- Serverless SQL warehouses start in seconds — if a question mentions startup latency, intermittent workloads, or cost optimization, Serverless is almost always the answer
- Data Explorer: graphical metadata browser for tables, schemas, column info, permission management — no SQL required. Transfer ownership by clicking the Owner field
- Query History: read-only log of past queries with execution metrics, timing, and query plans — it cannot schedule queries or create workflows
- Query Profile: visual operator tree for a completed query — use it to find which step used the most memory or took longest (not just total runtime)
- DESCRIBE shows table schema. DESCRIBE EXTENDED reveals the Location field, which tells you if a table is managed or external
- Photon engine: vectorized execution using SIMD CPU operations — speeds up SQL aggregations and scans without code changes, but does not replace proper query optimization
- Delta Lake transaction log records every write operation — it is what enables ACID guarantees, time travel, and schema enforcement on the Lakehouse
Common Traps
Confusing Pairs
Scenario Tips
Question asks about reducing startup latency for a scheduled query that runs twice a day
Migrate to a Serverless SQL warehouse — it eliminates startup delays and auto-scales
Increasing cluster size on a Classic warehouse increases cost but does not reduce startup time
Question asks how to identify whether a table is managed or external without prior knowledge
Run DESCRIBE EXTENDED table_name and check the Location field. Managed tables show a Unity Catalog-managed path; external tables show your cloud storage path
DESCRIBE alone does not reveal the table type
Question asks a data analyst to view column types and who has access to a table, without writing SQL
Use Data Explorer — it shows metadata, data preview, and permission grants in a graphical interface
Query Editor requires writing SHOW GRANTS or DESCRIBE queries
Last-Minute Facts
Data Management
Must-Know Facts
- Managed table DROP: deletes both metadata AND data files permanently. External table DROP: removes only the metadata — data files remain in cloud storage
- Unity Catalog three-level namespace: catalog.schema.table. All SQL must use fully qualified names across catalog and schema boundaries
- GRANT SELECT alone is insufficient — users also need GRANT USE CATALOG on the catalog and GRANT USE SCHEMA on the schema before table access works
- Medallion Architecture: Bronze = raw data as-is (append-only, may contain duplicates), Silver = cleaned and deduplicated, Gold = business-ready aggregates. Analysts query Gold
- Delta Lake time travel: SELECT * FROM table VERSION AS OF 3 (by version number) or TIMESTAMP AS OF '2026-01-15' (by date). View history with DESCRIBE HISTORY
- MERGE INTO is the upsert command — update matching rows, insert non-matching rows in one atomic operation
- system.Certified tag: ALTER TABLE t SET TAGS ('system.Certified' = 'true') marks an asset as production-ready in Unity Catalog
- PII compliance requires three considerations together: jurisdiction where data was collected, local regulations of that jurisdiction, AND the organization's internal governance standards
Common Traps
Confusing Pairs
Scenario Tips
Question asks what happens when DROP TABLE is run on a table, and you must decide whether data is lost
Check if the table is managed or external. Managed = data deleted. External = data preserved. If the question does not specify, look for hints: tables created with LOCATION clause are external; tables created without it are managed
IF EXISTS clause prevents errors but does not change deletion behavior
Question asks why a user cannot query a table despite having SELECT granted on it
They are missing USE CATALOG on the catalog and/or USE SCHEMA on the schema. Unity Catalog requires all three levels of permission. Grant USE CATALOG, USE SCHEMA, then SELECT
Regranting SELECT on the table will not fix the issue — the hierarchy must be satisfied
Question asks which Medallion layer an analyst should use for daily KPI reporting
Gold layer — it contains pre-aggregated, business-ready data. It is optimized for read performance and analytical queries
Silver is tempting because it is 'cleaner than Bronze,' but Silver is still an engineering layer for data validation, not analyst-facing aggregations
Last-Minute Facts
Data Visualization and Dashboarding
Must-Know Facts
- Dashboard components: query-based visualizations, markdown text boxes, filter widgets, and parameterized queries — know all four
- Markdown text boxes are the correct way to add section headers and labels to dashboards — not SQL comments, not query result columns
- Parameterized queries use {{ parameter_name }} syntax to accept dynamic user input from dashboard filter widgets
- Dashboard refresh frequency is bounded by the data pipeline interval — a pipeline that batches every 5 minutes cannot produce dashboard updates faster than 5 minutes, regardless of the dashboard's refresh setting
- Dashboard sharing does NOT grant table access — viewers see data based on their own Unity Catalog permissions. Sharing a dashboard to someone without SELECT on the underlying tables will show errors
- Scheduled refresh requires an active SQL warehouse — with Classic warehouses, the startup delay can affect refresh timing. Serverless avoids this problem
- Visualization selection rules: trends over time = line chart; category comparison = bar chart; part-of-whole with fewer than 5 categories = pie chart; two-dimensional intensity = heatmap; geographic distribution = choropleth map; single KPI = counter
Common Traps
Confusing Pairs
Scenario Tips
Question asks which visualization is best for showing monthly revenue trend over 24 months
Line chart — it is purpose-built for trends over time and handles many data points cleanly
Bar chart — workable but less ideal for continuous time series with many points; often presented as a distractor
Question asks how to add a readable section header to a dashboard that business stakeholders can see
Add a markdown text box and use ## for headings or bold text
SQL comments (--) or query result columns are not visible to dashboard viewers as labels
Question asks why dashboard viewers see access errors even though the dashboard was shared with them
Sharing a dashboard does not grant Unity Catalog table permissions. The viewers need SELECT and USE grants on the underlying tables/schemas separately
Thinking the dashboard owner needs to re-share or that the dashboard permissions are broken
Question asks which chart type best shows the contribution of each sales region to total company revenue
Pie chart — this is a classic part-to-whole question. As long as there are fewer than 5 regions, a pie chart is appropriate. If there are 8+ regions, a bar chart is better because pie charts become unreadable with many slices
Bar chart — works technically but is not the best answer for part-to-whole when region count is small; the exam tests that you know pie chart's specific use case
Last-Minute Facts
Analytics Applications
Must-Know Facts
- Databricks SQL is complementary to BI tools, not a replacement — the exam tests this positioning explicitly. When a scenario asks 'should the team replace Tableau with Databricks SQL?' the answer is always no. Databricks SQL is the right answer for ad-hoc exploration, prototyping, and SQL-native dashboards; Tableau/Power BI win for enterprise distribution and pixel-perfect formatting
- Partner Connect is the exam's correct answer whenever a question asks about connecting a BI tool or ingestion tool with 'minimal configuration' or 'least effort.' It provisions the warehouse and connector in one workflow. The wrong answer is always the manual path: create warehouse → export credentials → configure connector in the third-party tool. If Partner Connect is listed as an option, pick it
- Genie Spaces respects Unity Catalog permissions — the exam will test this with a scenario like 'a non-technical user uses Genie Spaces but still cannot see certain data.' The answer is always that Genie Spaces runs queries under the user's own identity and permissions; it does not grant additional access. It is for business users who lack SQL skills, not a way to bypass governance
- Databricks Marketplace delivers data via Delta Sharing — the critical exam judgment is that this means NO data replication and NO ETL pipeline. Exam distractors will offer 'set up an ingestion pipeline to import the Marketplace dataset into your catalog,' which is wrong. The data is queried in-place, always current, and governed by the provider. You cannot write back to it
- SQL alert statuses: OK (threshold not exceeded), TRIGGERED (threshold condition was met), UNKNOWN (not yet evaluated or query execution failed)
- SQL alerts only work on queries that return a single numeric value — queries returning multiple rows or non-numeric types cannot be used for alerts
- Date-type parameter widgets are incompatible with SQL alerts — only dropdown-type parameters work with alerts
- Databricks SQL is not appropriate for ML model training, complex Python ETL pipelines, or deep Spark transformations — those belong in notebooks or Lakeflow Jobs
Common Traps
Confusing Pairs
Scenario Tips
Question asks how to connect Tableau to Databricks with minimal manual configuration
Use Partner Connect — it provisions the SQL warehouse and configures the connection automatically
Manually creating a SQL warehouse and configuring JDBC/ODBC settings — more steps and not the recommended approach
Question says an alert has UNKNOWN status — what does this mean?
The alert has not been evaluated yet (it was just created or the schedule hasn't run), OR the underlying query failed to execute. UNKNOWN is not an error with the alert configuration — it means no evaluation result exists yet
TRIGGERED (which means the condition was met) or that the alert is disabled
Question asks which tool a non-technical business analyst should use to explore data by asking questions in plain English
Genie Spaces — it translates natural language to SQL and returns results without requiring the analyst to write code
Data Explorer (used for metadata browsing) or Databricks Notebooks (requires Python/SQL coding)
Question asks which project is most appropriate for Databricks SQL
Ad-hoc analysis, dashboards, scheduled reports, threshold alerts on business metrics — anything that is SQL-driven and exploratory
ML model training, Python-based ETL pipelines, or Spark streaming — these belong outside Databricks SQL