CertPrepNow
DatabricksData Analyst Associate5 domains

Data Analyst Associate Exam Notes

Last-minute traps, must-know facts, and scenario tips for the Databricks Certified Data Analyst Associate exam.

General Exam Tips

  • 1.Read ALL answer options before selecting — Databricks SQL questions often have two plausible answers where one is subtly more correct
  • 2.The exam tests reasoning, not syntax memorization — 'when and why' matters more than 'exact command'
  • 3.Flag difficult questions and return to them — with 45 questions in 90 minutes you have 2 minutes per question
  • 4.Eliminate obviously wrong answers first — questions often have one clearly wrong option that simplifies the choice
  • 5.Watch for qualifiers: 'most efficiently', 'best approach', 'minimal configuration' — these steer toward specific platform-native answers
  • 6.When two answers seem correct, prefer the Databricks-native feature over a workaround or external tool
  • 7.The exam is entirely conceptual — no access to a live environment. Questions test decision-making in described scenarios
  • 8.SQL Programming is 29% of the exam — prioritize this domain above all others if time is short
  • 9.Never guess early — read the full scenario before deciding. Context changes which answer is correct
Domain 129% of exam

SQL Programming

Must-Know Facts

  • SQL logical execution order determines what is legal in each clause — FROM runs first (tables resolved), then WHERE (row filtering), then GROUP BY, HAVING, SELECT (aliases created here), ORDER BY, LIMIT. Exam questions exploit this: aliases defined in SELECT are not yet visible to WHERE or HAVING; use the full expression or a CTE to work around this
  • Window functions require OVER clause and cannot appear directly in WHERE — use QUALIFY, a subquery, or a CTE instead
  • QUALIFY is a Databricks-specific clause that filters on window function results without wrapping in a subquery: QUALIFY ROW_NUMBER() OVER (...) = 1
  • RANK() skips numbers after ties (1,2,2,4). DENSE_RANK() does not skip (1,2,2,3). ROW_NUMBER() always assigns a unique number
  • SUM() OVER (PARTITION BY x ORDER BY y) = running total. SUM() OVER (PARTITION BY x) without ORDER BY = partition-wide total
  • GROUP BY collapses rows to one per group. Window functions (OVER) preserve every original row with additional computed columns
  • INSERT INTO appends rows. INSERT OVERWRITE replaces all existing data — they are not interchangeable
  • ROLLUP generates hierarchical subtotals following column order. CUBE generates ALL possible subtotal combinations. ROLLUP returns fewer rows than CUBE
  • LEFT ANTI JOIN returns rows from the left table that have NO match on the right — the opposite of a SEMI JOIN
  • EXPLODE flattens an array column into individual rows. ANY() or ARRAY_CONTAINS() tests membership without flattening
  • PIVOT rotates rows to columns. UNPIVOT rotates columns to rows. Both are testable syntax in Databricks SQL
  • CASE WHEN is evaluated top-to-bottom — the first matching condition wins. Subsequent conditions are not evaluated

Common Traps

TrapUsing a window function in WHERE to filter rows
RealityWindow functions cannot appear in WHERE clauses. Either wrap the query in a subquery/CTE and filter the outer query, or use the Databricks-specific QUALIFY clause to filter window function results inline
TrapExpecting SUM() OVER (PARTITION BY x ORDER BY y) to return a per-group total
RealityAdding ORDER BY to a window aggregation makes it a running total, not a group total. To get a static per-group total, use SUM() OVER (PARTITION BY x) without ORDER BY — or just use GROUP BY
TrapConfusing RANK and DENSE_RANK when ties are present
RealityRANK jumps numbers after ties: if two rows tie at position 2, the next row is position 4. DENSE_RANK does not skip: the next row after two tied 2s is position 3. The exam will present output and ask you to identify the function
TrapThinking INSERT INTO replaces existing data like a truncate-and-load
RealityINSERT INTO always appends rows — it never deletes existing data. INSERT OVERWRITE is the destructive version that replaces all existing data. Mixing these up causes incorrect answers on data manipulation questions
TrapAssuming column aliases defined in SELECT can be used in WHERE
RealityWHERE is evaluated before SELECT in logical order. A column alias defined in SELECT does not exist yet when WHERE runs. You must use the full expression or a subquery/CTE to reference aliases in filters
TrapTreating ROLLUP and CUBE as interchangeable for subtotal generation
RealityROLLUP produces hierarchical subtotals following the column order you specify — useful for hierarchical data like year > quarter > month. CUBE generates every possible combination of grouping — more rows, more comprehensive. The exam tests which to use for a given reporting requirement
TrapUsing ARRAY_CONTAINS syntax for the ANY() test
RealityIn Databricks SQL, the correct syntax for checking array membership is `'value' = ANY(array_column)` or `ARRAY_CONTAINS(array_column, 'value')`. Using `array_column IN ('value')` does not work for array columns

Confusing Pairs

GROUP BY + aggregateWindow function (OVER)

GROUP BY collapses multiple rows into one per group — use it when you need summary totals (one row per region, one row per category). Window functions preserve every original row and add a calculated column alongside — use them when you need each row's value plus group context (e.g., each sale plus its region's total)

RANK()DENSE_RANK()

RANK() skips position numbers after ties — two rows tied at rank 2 means the next row is rank 4. DENSE_RANK() never skips — the next row after two tied rank-2s is rank 3. Choose based on whether gaps in the sequence matter for the business requirement

ROLLUPCUBE

ROLLUP = hierarchical subtotals following left-to-right column order (ideal for time hierarchies: year > quarter > month). CUBE = all possible group combinations regardless of order (ideal for cross-dimensional analysis). ROLLUP produces fewer rows; CUBE is more exhaustive

LEFT SEMI JOINLEFT ANTI JOIN

SEMI JOIN returns left rows that DO have a match on the right (like an existence filter). ANTI JOIN returns left rows that DO NOT have a match on the right (useful for finding unmatched records, e.g., orders without invoices)

QUALIFY clauseSubquery with WHERE

Both filter on window function results but QUALIFY is a Databricks-specific shortcut that filters inline without nesting. A subquery wraps the window function query and filters in the outer WHERE. QUALIFY is more concise; subqueries are more portable across SQL dialects

Scenario Tips

If the question asks about:

Question asks for 'exactly one row per group' with an aggregated metric

Answer:

Use GROUP BY with an aggregate function (SUM, COUNT, AVG). This is the correct choice when you want one output row per distinct group value

Distractor to avoid:

Window function with OVER (PARTITION BY) — this keeps all original rows, not one per group

If the question asks about:

Question asks to keep all rows but add a ranking, running total, or comparison to previous row

Answer:

Use a window function (ROW_NUMBER, RANK, LAG, SUM OVER). Window functions are designed to add computed columns without collapsing the row count

Distractor to avoid:

GROUP BY — this would remove the individual rows that the question specifically asks to preserve

If the question asks about:

Question asks to deduplicate records keeping the most recent row per ID

Answer:

Use ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) with QUALIFY rn = 1, or wrap in a subquery and filter WHERE rn = 1

Distractor to avoid:

DISTINCT — DISTINCT removes duplicate rows only when all columns are identical, not 'most recent per group'

If the question asks about:

Question asks to generate sales subtotals at each level of a hierarchy (year, then year-quarter, then grand total)

Answer:

Use GROUP BY year, quarter WITH ROLLUP. ROLLUP respects column order and generates the hierarchical subtotals

Distractor to avoid:

CUBE — CUBE generates all possible combinations including quarter-only subtotals that don't make sense in a strict hierarchy

Last-Minute Facts

1SQL execution order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT
2QUALIFY is Databricks-specific — not standard SQL
3RANK skips numbers after ties; DENSE_RANK does not
4INSERT INTO = append; INSERT OVERWRITE = replace all
5ROLLUP = hierarchical subtotals; CUBE = all combinations
6LEFT ANTI JOIN = rows without a match (opposite of SEMI JOIN)
Domain 222% of exam

Databricks SQL

Must-Know Facts

  • Three SQL warehouse types: Classic (manual config, startup delay), Pro (adds governance features and system tables), Serverless (instant startup, auto-scale, no cluster management)
  • Serverless SQL warehouses start in seconds — if a question mentions startup latency, intermittent workloads, or cost optimization, Serverless is almost always the answer
  • Data Explorer: graphical metadata browser for tables, schemas, column info, permission management — no SQL required. Transfer ownership by clicking the Owner field
  • Query History: read-only log of past queries with execution metrics, timing, and query plans — it cannot schedule queries or create workflows
  • Query Profile: visual operator tree for a completed query — use it to find which step used the most memory or took longest (not just total runtime)
  • DESCRIBE shows table schema. DESCRIBE EXTENDED reveals the Location field, which tells you if a table is managed or external
  • Photon engine: vectorized execution using SIMD CPU operations — speeds up SQL aggregations and scans without code changes, but does not replace proper query optimization
  • Delta Lake transaction log records every write operation — it is what enables ACID guarantees, time travel, and schema enforcement on the Lakehouse

Common Traps

TrapThinking all three SQL warehouse types perform equally for intermittent workloads
RealityClassic warehouses have startup delays of several minutes — they are inefficient for workloads that run occasionally. Pro is Classic plus governance features. Serverless eliminates startup delays entirely. Questions about cost efficiency for bursty workloads point to Serverless
TrapUsing Query History to schedule or automate queries
RealityQuery History is purely diagnostic and read-only. It shows what ran, when, and how long it took — it cannot trigger, schedule, or automate anything. Scheduling requires dashboard refresh, SQL alerts, or Lakeflow Jobs
TrapAssuming Data Explorer is just for viewing data
RealityData Explorer can also manage permissions, preview data, and transfer table ownership — all without writing SQL. It is the preferred UI-based governance tool for operations that don't require the Query Editor
TrapThinking Photon eliminates the need for query optimization
RealityPhoton accelerates vectorized operations but does not fix poor query design, missing partitioning, or bad join order. The exam will present Photon as a performance enhancement, not a magic fix

Confusing Pairs

Query HistoryQuery Profile

Query History is the list of all previously executed queries — use it to find a past query, review its status, or see aggregate execution metrics. Query Profile is the visual execution plan for a single query — use it to drill into operator-level performance and identify which step is the bottleneck

Classic SQL WarehouseServerless SQL Warehouse

Classic = manual configuration, full control, startup delays of minutes, suitable for steady continuous workloads. Serverless = instant startup, auto-scales without configuration, pay-per-query, best for intermittent or unpredictable workloads. The exam uses startup latency and cost optimization as the key differentiators

DESCRIBE tableDESCRIBE EXTENDED table

DESCRIBE shows column names, data types, and comments. DESCRIBE EXTENDED adds Location, Owner, Created time, table type, and other metadata. Use DESCRIBE EXTENDED to determine if a table is managed or external by checking its Location field

Scenario Tips

If the question asks about:

Question asks about reducing startup latency for a scheduled query that runs twice a day

Answer:

Migrate to a Serverless SQL warehouse — it eliminates startup delays and auto-scales

Distractor to avoid:

Increasing cluster size on a Classic warehouse increases cost but does not reduce startup time

If the question asks about:

Question asks how to identify whether a table is managed or external without prior knowledge

Answer:

Run DESCRIBE EXTENDED table_name and check the Location field. Managed tables show a Unity Catalog-managed path; external tables show your cloud storage path

Distractor to avoid:

DESCRIBE alone does not reveal the table type

If the question asks about:

Question asks a data analyst to view column types and who has access to a table, without writing SQL

Answer:

Use Data Explorer — it shows metadata, data preview, and permission grants in a graphical interface

Distractor to avoid:

Query Editor requires writing SHOW GRANTS or DESCRIBE queries

Last-Minute Facts

1Serverless warehouse startup time: near-instant (seconds)
2Classic warehouse startup time: several minutes
3Query Profile = single query operator tree; Query History = list of all past queries
4Data Explorer can transfer ownership — click the Owner field
5DESCRIBE EXTENDED reveals the Location property to identify managed vs external
6Photon = vectorized execution engine, enabled at warehouse level, no code changes needed
Domain 320% of exam

Data Management

Must-Know Facts

  • Managed table DROP: deletes both metadata AND data files permanently. External table DROP: removes only the metadata — data files remain in cloud storage
  • Unity Catalog three-level namespace: catalog.schema.table. All SQL must use fully qualified names across catalog and schema boundaries
  • GRANT SELECT alone is insufficient — users also need GRANT USE CATALOG on the catalog and GRANT USE SCHEMA on the schema before table access works
  • Medallion Architecture: Bronze = raw data as-is (append-only, may contain duplicates), Silver = cleaned and deduplicated, Gold = business-ready aggregates. Analysts query Gold
  • Delta Lake time travel: SELECT * FROM table VERSION AS OF 3 (by version number) or TIMESTAMP AS OF '2026-01-15' (by date). View history with DESCRIBE HISTORY
  • MERGE INTO is the upsert command — update matching rows, insert non-matching rows in one atomic operation
  • system.Certified tag: ALTER TABLE t SET TAGS ('system.Certified' = 'true') marks an asset as production-ready in Unity Catalog
  • PII compliance requires three considerations together: jurisdiction where data was collected, local regulations of that jurisdiction, AND the organization's internal governance standards

Common Traps

TrapDropping a managed table thinking the data files are safe
RealityManaged table DROP is irreversible — Databricks deletes both the catalog entry AND the underlying data files. External table DROP only removes the catalog entry; your cloud storage files survive. This is the single most-tested distinction in the data management domain
TrapGranting SELECT and expecting the user to immediately query the table
RealityUnity Catalog permissions are cumulative and hierarchical. A user needs USE CATALOG on the catalog, USE SCHEMA on the schema, AND SELECT on the table. Missing any one level blocks access. The exam frequently tests this three-level requirement
TrapThinking Bronze layer data is clean and ready for some analysis
RealityBronze is raw ingestion — it preserves data exactly as received, including duplicates, nulls, malformed records, and schema inconsistencies. It is an engineering layer, not an analyst layer. Analysts work with Gold; Silver is for data engineers cleaning the raw data
TrapConfusing data enrichment with data warehousing
RealityData enrichment (augmentation) means adding new attributes or joining external data to an existing dataset to make it more analytically valuable. It is not about architectural patterns or storage optimization — it is about increasing the informational content of a dataset
TrapAssuming PII handling only requires checking your organization's policy
RealityThe exam tests that PII governance must account for the geographic jurisdiction where data was collected (which law applies at source), the local regulations of the analysis region, AND internal standards. No single factor alone is sufficient

Confusing Pairs

Managed TableExternal Table

Managed = Databricks owns both the metadata and data files; DROP removes everything permanently. External = Databricks owns only the metadata pointer; DROP removes the catalog entry and leaves your cloud storage untouched. Choose external when you need data files to survive table lifecycle changes

ViewTemporary View

A regular view persists in the Unity Catalog metastore indefinitely and is accessible to other users (with permissions). A temporary view exists only for the current session — it disappears on logout or disconnect. Use temp views for intermediate steps in a long analysis session; use regular views for shared reusable queries

MERGE INTOINSERT OVERWRITE

MERGE INTO is a conditional upsert — it updates rows that match a condition and inserts rows that do not, in one atomic statement. INSERT OVERWRITE replaces ALL data in the target table unconditionally. Use MERGE for incremental updates; use INSERT OVERWRITE for full refreshes

Scenario Tips

If the question asks about:

Question asks what happens when DROP TABLE is run on a table, and you must decide whether data is lost

Answer:

Check if the table is managed or external. Managed = data deleted. External = data preserved. If the question does not specify, look for hints: tables created with LOCATION clause are external; tables created without it are managed

Distractor to avoid:

IF EXISTS clause prevents errors but does not change deletion behavior

If the question asks about:

Question asks why a user cannot query a table despite having SELECT granted on it

Answer:

They are missing USE CATALOG on the catalog and/or USE SCHEMA on the schema. Unity Catalog requires all three levels of permission. Grant USE CATALOG, USE SCHEMA, then SELECT

Distractor to avoid:

Regranting SELECT on the table will not fix the issue — the hierarchy must be satisfied

If the question asks about:

Question asks which Medallion layer an analyst should use for daily KPI reporting

Answer:

Gold layer — it contains pre-aggregated, business-ready data. It is optimized for read performance and analytical queries

Distractor to avoid:

Silver is tempting because it is 'cleaner than Bronze,' but Silver is still an engineering layer for data validation, not analyst-facing aggregations

Last-Minute Facts

1Managed table DROP = data deleted. External table DROP = metadata only, files safe
2Unity Catalog permissions: need USE CATALOG + USE SCHEMA + SELECT — all three
3Three-level namespace: catalog.schema.table
4DESCRIBE HISTORY shows version numbers and timestamps for time travel
5Bronze = raw, Silver = cleaned, Gold = aggregated (analysts use Gold)
6MERGE INTO = conditional upsert; INSERT OVERWRITE = full replacement
Domain 414% of exam

Data Visualization and Dashboarding

Must-Know Facts

  • Dashboard components: query-based visualizations, markdown text boxes, filter widgets, and parameterized queries — know all four
  • Markdown text boxes are the correct way to add section headers and labels to dashboards — not SQL comments, not query result columns
  • Parameterized queries use {{ parameter_name }} syntax to accept dynamic user input from dashboard filter widgets
  • Dashboard refresh frequency is bounded by the data pipeline interval — a pipeline that batches every 5 minutes cannot produce dashboard updates faster than 5 minutes, regardless of the dashboard's refresh setting
  • Dashboard sharing does NOT grant table access — viewers see data based on their own Unity Catalog permissions. Sharing a dashboard to someone without SELECT on the underlying tables will show errors
  • Scheduled refresh requires an active SQL warehouse — with Classic warehouses, the startup delay can affect refresh timing. Serverless avoids this problem
  • Visualization selection rules: trends over time = line chart; category comparison = bar chart; part-of-whole with fewer than 5 categories = pie chart; two-dimensional intensity = heatmap; geographic distribution = choropleth map; single KPI = counter

Common Traps

TrapThinking you can refresh a dashboard faster than the data pipeline produces new data
RealityThe dashboard's maximum effective refresh rate is constrained by the upstream data pipeline. If the ETL loads data every 10 minutes, a 1-minute dashboard refresh just re-queries the same stale data. The bottleneck is always the pipeline, not the dashboard setting
TrapAssuming sharing a dashboard grants access to the underlying data
RealityDashboard sharing controls who can see the dashboard interface — not who can access the underlying tables. Each viewer's Unity Catalog permissions determine what data they see. A viewer without SELECT on the underlying table will receive permission errors when the dashboard loads
TrapUsing SQL comments as dashboard labels for non-technical viewers
RealitySQL comments (-- or /* */) are invisible to dashboard viewers — they exist only in the query editor. Markdown text boxes are the correct way to add visible section headers, descriptions, and labels that stakeholders see when viewing the dashboard
TrapThinking pie charts are always appropriate for showing proportions
RealityPie charts work only for part-to-whole relationships with a small number of categories (under 5). With more categories, bar charts are more readable. The exam tests chart selection appropriateness, not just mechanics

Confusing Pairs

Filter widgetParameterized query

A filter widget is a dashboard UI element (dropdown, date picker) that lets viewers filter displayed data. A parameterized query uses {{ param }} syntax so the SQL itself accepts dynamic values. These work together: the filter widget passes its value into the parameterized query. You need both for fully interactive dashboards

Dashboard refresh scheduleAlert schedule

Dashboard refresh reruns all queries on the dashboard and updates all visualizations — it is about keeping displayed data current. Alert schedule evaluates a single query against a threshold condition — it is about triggering notifications when a metric crosses a boundary. Both use scheduled execution but serve different purposes

Bar chartLine chart

Bar chart = compare discrete categories side-by-side (region A vs region B vs region C). Line chart = show continuous change over time (revenue from Jan to Dec). The exam will present a scenario — 'monthly revenue trend over 24 months' means line chart; 'revenue by product category for last quarter' means bar chart. Both can use time on the x-axis, but line implies continuity and trend; bar implies distinct comparison

Scenario Tips

If the question asks about:

Question asks which visualization is best for showing monthly revenue trend over 24 months

Answer:

Line chart — it is purpose-built for trends over time and handles many data points cleanly

Distractor to avoid:

Bar chart — workable but less ideal for continuous time series with many points; often presented as a distractor

If the question asks about:

Question asks how to add a readable section header to a dashboard that business stakeholders can see

Answer:

Add a markdown text box and use ## for headings or bold text

Distractor to avoid:

SQL comments (--) or query result columns are not visible to dashboard viewers as labels

If the question asks about:

Question asks why dashboard viewers see access errors even though the dashboard was shared with them

Answer:

Sharing a dashboard does not grant Unity Catalog table permissions. The viewers need SELECT and USE grants on the underlying tables/schemas separately

Distractor to avoid:

Thinking the dashboard owner needs to re-share or that the dashboard permissions are broken

If the question asks about:

Question asks which chart type best shows the contribution of each sales region to total company revenue

Answer:

Pie chart — this is a classic part-to-whole question. As long as there are fewer than 5 regions, a pie chart is appropriate. If there are 8+ regions, a bar chart is better because pie charts become unreadable with many slices

Distractor to avoid:

Bar chart — works technically but is not the best answer for part-to-whole when region count is small; the exam tests that you know pie chart's specific use case

Last-Minute Facts

1Parameterized queries use {{ param_name }} syntax
2Markdown text boxes = only way to add visible section labels/headers to dashboards. SQL comments (-- text) are stripped from output and invisible to viewers; query result columns show data, not labels
3Dashboard refresh is bounded by the upstream data pipeline interval
4Sharing a dashboard does not grant underlying table access
5Pie chart limit: fewer than 5 categories for part-to-whole
6Heatmap = two categorical dimensions with intensity values
Domain 515% of exam

Analytics Applications

Must-Know Facts

  • Databricks SQL is complementary to BI tools, not a replacement — the exam tests this positioning explicitly. When a scenario asks 'should the team replace Tableau with Databricks SQL?' the answer is always no. Databricks SQL is the right answer for ad-hoc exploration, prototyping, and SQL-native dashboards; Tableau/Power BI win for enterprise distribution and pixel-perfect formatting
  • Partner Connect is the exam's correct answer whenever a question asks about connecting a BI tool or ingestion tool with 'minimal configuration' or 'least effort.' It provisions the warehouse and connector in one workflow. The wrong answer is always the manual path: create warehouse → export credentials → configure connector in the third-party tool. If Partner Connect is listed as an option, pick it
  • Genie Spaces respects Unity Catalog permissions — the exam will test this with a scenario like 'a non-technical user uses Genie Spaces but still cannot see certain data.' The answer is always that Genie Spaces runs queries under the user's own identity and permissions; it does not grant additional access. It is for business users who lack SQL skills, not a way to bypass governance
  • Databricks Marketplace delivers data via Delta Sharing — the critical exam judgment is that this means NO data replication and NO ETL pipeline. Exam distractors will offer 'set up an ingestion pipeline to import the Marketplace dataset into your catalog,' which is wrong. The data is queried in-place, always current, and governed by the provider. You cannot write back to it
  • SQL alert statuses: OK (threshold not exceeded), TRIGGERED (threshold condition was met), UNKNOWN (not yet evaluated or query execution failed)
  • SQL alerts only work on queries that return a single numeric value — queries returning multiple rows or non-numeric types cannot be used for alerts
  • Date-type parameter widgets are incompatible with SQL alerts — only dropdown-type parameters work with alerts
  • Databricks SQL is not appropriate for ML model training, complex Python ETL pipelines, or deep Spark transformations — those belong in notebooks or Lakeflow Jobs

Common Traps

TrapThinking TRIGGERED means the alert is currently active or enabled
RealityTRIGGERED specifically means the alert's threshold condition was met during evaluation — the metric crossed the defined boundary. UNKNOWN means the alert was never evaluated or the query failed. OK means the query ran successfully and the threshold was not exceeded. The exam uses precise terminology
TrapThinking Genie Spaces bypasses Unity Catalog permissions for non-technical users
RealityGenie Spaces translates plain English to SQL and executes it — but it runs under the querying user's identity and permissions. A user without SELECT on a table cannot access that data through Genie Spaces any more than they could through the Query Editor
TrapManually configuring a SQL warehouse and connector when connecting a BI tool
RealityPartner Connect automates the entire process — it provisions a dedicated SQL warehouse and configures the connector in one workflow. Manual configuration is always the less preferred answer when Partner Connect is available
TrapExpecting Databricks Marketplace to replicate data into your workspace
RealityMarketplace data is accessed via Delta Sharing as a live, read-only link — the data stays in the provider's storage and is not copied into your Unity Catalog. No ETL pipeline is needed, and the data is always current
TrapAssuming any SQL query can power a SQL alert
RealitySQL alerts require queries that return exactly one numeric value. Queries returning multiple rows, string values, or complex result sets cannot be used for threshold alerting. Alerts also do not support date-type parameter widgets — only dropdowns

Confusing Pairs

Genie SpacesDatabricks Assistant

Genie Spaces is a curated, governed environment where business users ask natural language questions about specific datasets — it is configured by data analysts and used by end consumers. Databricks Assistant is the AI coding helper in notebooks and the SQL editor that helps analysts write and debug SQL/code. Different audiences, different purposes

Partner ConnectManual BI tool connection

Partner Connect = single workflow that automatically provisions a SQL warehouse and creates the connection — minimal configuration, preferred exam answer. Manual connection = create warehouse, export credentials, configure connector in third-party tool — more steps, more error-prone. Always choose Partner Connect when it is listed as an option

Databricks SQLDatabricks Notebooks

Databricks SQL = SQL-only analytics environment for analysts (dashboards, alerts, ad-hoc queries). Notebooks = multi-language environment (Python, SQL, Scala, R) for data engineers and data scientists. The exam tests that analysts use Databricks SQL for their work and escalate to notebooks only for complex transformations or ML

Scenario Tips

If the question asks about:

Question asks how to connect Tableau to Databricks with minimal manual configuration

Answer:

Use Partner Connect — it provisions the SQL warehouse and configures the connection automatically

Distractor to avoid:

Manually creating a SQL warehouse and configuring JDBC/ODBC settings — more steps and not the recommended approach

If the question asks about:

Question says an alert has UNKNOWN status — what does this mean?

Answer:

The alert has not been evaluated yet (it was just created or the schedule hasn't run), OR the underlying query failed to execute. UNKNOWN is not an error with the alert configuration — it means no evaluation result exists yet

Distractor to avoid:

TRIGGERED (which means the condition was met) or that the alert is disabled

If the question asks about:

Question asks which tool a non-technical business analyst should use to explore data by asking questions in plain English

Answer:

Genie Spaces — it translates natural language to SQL and returns results without requiring the analyst to write code

Distractor to avoid:

Data Explorer (used for metadata browsing) or Databricks Notebooks (requires Python/SQL coding)

If the question asks about:

Question asks which project is most appropriate for Databricks SQL

Answer:

Ad-hoc analysis, dashboards, scheduled reports, threshold alerts on business metrics — anything that is SQL-driven and exploratory

Distractor to avoid:

ML model training, Python-based ETL pipelines, or Spark streaming — these belong outside Databricks SQL

Last-Minute Facts

1Three and only three alert statuses: OK (threshold not exceeded), TRIGGERED (threshold condition was met — not 'alert is enabled'), UNKNOWN (never evaluated yet OR query execution failed). Exam will try to get you to say TRIGGERED means activated
2SQL alerts require a query returning a single numeric value
3Date-type parameter widgets do not work with SQL alerts — use dropdown only
4Partner Connect = auto-provision warehouse + connector in one step
5Genie Spaces respects Unity Catalog permissions — no access bypass
6Databricks Marketplace = Delta Sharing (live, read-only, no ETL) — if exam asks 'how to get Marketplace data into your workspace,' the answer is NOT 'build an ingestion pipeline'; it is already accessible via the Delta Sharing protocol

Feeling confident?

Put your knowledge to the test with a timed Data Analyst Associate mock exam.