CertPrepNow
MicrosoftDP-60080 concepts

DP-600 Cheat Sheet

Quick reference for the Microsoft Fabric Analytics Engineer Associate exam.

Microsoft Fabric Architecture

OneLake
The single unified data lake for all Microsoft Fabric workloads — stores data in Delta Parquet format and is shared by Lakehouse, Warehouse, Eventhouse, and semantic models.
Lakehouse vs. Warehouse
Lakehouse: supports structured, semi-structured, and unstructured data with Spark; SQL analytics endpoint is READ-ONLY. Warehouse: structured data only with full T-SQL read-write (INSERT, UPDATE, DELETE, stored procedures, views).
Eventhouse
Fabric item for ingesting and querying large volumes of real-time and historical data using KQL; integrates with OneLake and is the data store for Real-Time Intelligence.
Dataflows Gen2 vs. Spark Notebooks
Dataflows Gen2: low-code Power Query interface with 100+ connectors, best for simple/moderate transforms. Notebooks: code-first PySpark/Spark SQL for complex ETL, large datasets, and cost-efficient production pipelines.
Data Pipelines
Orchestration engine using activities (Copy, Dataflow, Notebook, stored procedure) to schedule and chain data workflows; the orchestration layer above individual transformation tools.
Query Language by Data Store
Warehouse/Lakehouse SQL endpoint: T-SQL. Eventhouse: KQL. Semantic models: DAX. Never mix — KQL does not work on Warehouse and T-SQL does not work on Eventhouse.

Data Ingestion and Connections

Copy Activity (Data Pipeline)
Moves data between sources and Fabric destinations without transformation; fastest ingestion tool for bulk loads but does not apply business logic.
Dataflows Gen2 output destinations
Dataflows Gen2 can write to Fabric Lakehouse, Warehouse, Azure SQL, or other destinations — they are not self-contained; always require an output destination to be useful.
OneLake Catalog
Data discovery tool for browsing and finding data assets (lakehouses, warehouses, semantic models) across the Fabric tenant.
Real-Time Hub
Centralized catalog for discovering and subscribing to real-time data streams from internal and external sources within the Fabric tenant.
When to use Notebooks over Dataflows Gen2
Choose Notebooks for complex JSON schema inference, large datasets (500 GB+), incremental loading logic, or when compute cost is a concern — Dataflows Gen2 cost more CUs per operation.
Lakehouse Files vs. Lakehouse Tables
Files section: raw unstructured or semi-structured data not yet in Delta format. Tables section: Delta tables queryable via SQL analytics endpoint and Spark; only Tables are accessible via the SQL analytics endpoint.
Eventstream
Fabric item for capturing, transforming, and routing real-time events from sources (Event Hubs, IoT Hub, custom) into destinations such as Eventhouse, Lakehouse, or another Eventstream — the primary real-time ingestion path in Fabric.

T-SQL for Fabric Warehouse

CREATE VIEW v_SalesSummary AS SELECT ...
Creates a reusable view in the Warehouse — the Lakehouse SQL analytics endpoint is entirely READ-ONLY and does not support any DDL (CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION).
CREATE PROCEDURE usp_MonthlySales AS BEGIN SELECT ... END
Creates a stored procedure in the Warehouse — stored procedures cannot be created in the Lakehouse SQL analytics endpoint.
SELECT c.CustomerName, SUM(f.SalesAmount) AS Total FROM FactSales f JOIN DimCustomer c ON f.CustomerKey = c.CustomerKey GROUP BY c.CustomerName
Standard star schema query joining fact and dimension tables in the Warehouse using GROUP BY aggregation.
ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY OrderDate DESC)
Window function for ranking rows within partitions — use for deduplication, SCD Type 2 row identification, and top-N per group patterns.
WITH CTE AS (SELECT ...) SELECT * FROM CTE WHERE ...
Common Table Expression (CTE) for building readable multi-step queries and recursive patterns in the Warehouse.
MERGE target USING source ON condition WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...
MERGE statement for implementing SCD Type 1 upserts — updates existing rows and inserts new ones in a single atomic T-SQL statement.

KQL for Real-Time Intelligence (Eventhouse)

TableName | where Timestamp > ago(1h)
Filters rows from the last 1 hour — KQL uses ago() for relative time filtering instead of SQL's DATEADD or GETDATE().
TableName | project ColumnA, ColumnB
Selects specific columns (equivalent to SQL SELECT col1, col2) — project is the column selection operator in KQL.
TableName | summarize count() by Category
Groups and aggregates rows (equivalent to SQL GROUP BY) — summarize is KQL's aggregation operator.
TableName | where ResponseTime > 500 | summarize avg(ResponseTime) by ServiceName
Chains operators with the pipe (|) — KQL uses pipes to compose query steps, unlike SQL's nested subqueries.
TableA | join kind=leftouter (TableB) on $left.Id == $right.Id
Left outer join in KQL using join operator — specify join kind (inner, leftouter, rightouter, fullouter) explicitly.
TableName | top 100 by Timestamp desc
Returns the top N rows ordered by a column — equivalent to SQL SELECT TOP 100 ... ORDER BY ... DESC.
TableName | extend DurationSec = ResponseTime / 1000.0 | project Timestamp, ServiceName, DurationSec
extend adds computed columns to the result set; chain with project to select only needed columns — equivalent to SQL SELECT with computed expressions.

DAX for Semantic Models

VAR PriorYear = CALCULATE([Sales], SAMEPERIODLASTYEAR(DimDate[Date])) RETURN DIVIDE([Sales] - PriorYear, PriorYear)
Uses VAR/RETURN to store intermediate calculations — improves readability and performance by evaluating the sub-expression only once.
CALCULATE([Sales], DimRegion[Region] = "West")
CALCULATE modifies the filter context of a measure — the most important DAX function; always understand that it changes which rows are evaluated.
SUMX(FactSales, FactSales[Quantity] * FactSales[Price])
Iterator function that evaluates an expression row-by-row over a table — use SUMX, AVERAGEX, MAXX, MINX when the calculation must happen per row before aggregating.
CALCULATE([Sales], ALL(DimDate))
ALL() removes all filters from the specified table/column inside CALCULATE — used to compute totals ignoring current filter context (e.g., % of total measures).
CALCULATE([Sales], OFFSET(-1, ALLSELECTED('DimDate'), ORDERBY('DimDate'[Date])))
DAX windowing OFFSET(-1) references the previous row's context — used for period-over-period comparisons; must be wrapped in CALCULATE and requires a relation and ORDERBY clause.
ISBLANK([Sales])
Information function that returns TRUE if the measure or column value is blank — use with IF() for conditional logic to handle missing data gracefully.
EVALUATE SUMMARIZE('FactSales', 'DimCustomer'[CustomerName], "Total", [Sales])
DAX query using EVALUATE and SUMMARIZE to query a semantic model directly — used in DAX query view, XMLA tools, and automated testing.

Star Schema and Data Modeling

Fact Table vs. Dimension Table
Fact table: contains measurable events (sales amounts, quantities) with foreign keys to dimensions. Dimension table: contains descriptive attributes (customer name, product category) with a primary key.
SCD Type 1 — Overwrite
Overwrites the old attribute value with the new one; no history is preserved — use when historical accuracy is not required (e.g., correcting a typo in a name).
SCD Type 2 — Add Row with Effective Dates
Preserves history by adding a new row with StartDate/EndDate and an IsCurrent flag; old record is closed with an EndDate — use when tracking historical changes (e.g., customer address changes).
Bridge Table (for many-to-many relationships)
Resolves many-to-many relationships by introducing an intermediate table with foreign keys to both sides — the correct pattern for many-to-many in star schemas instead of a direct relationship.
Denormalization
Flattening normalized tables into wider dimension tables for query performance — preferred over normalized third-normal-form schemas in analytics because joins are expensive at query time.
Star Schema in Lakehouse vs. Semantic Model
Best practice: implement the physical star schema as Delta tables in the Lakehouse/Warehouse first, then reflect that structure in the semantic model through Direct Lake relationships.

Semantic Model Storage Modes

Import Mode
Copies data into the Power BI VertiPaq engine at scheduled refresh time — best for small-to-medium datasets; requires refresh to get new data; does not need Fabric capacity license.
DirectQuery Mode
Translates DAX queries to source SQL at report interaction time — always live data but slower due to query translation and network round trips; no data is stored in the model.
Direct Lake on OneLake
Reads Delta tables from OneLake directly into VertiPaq memory; can reference tables from multiple Fabric items (e.g., Lakehouse + Warehouse); supports composite models with Import tables; does NOT fall back to DirectQuery — guardrail violations cause refresh failure.
Direct Lake on SQL Endpoints
Reads Delta tables via the SQL analytics endpoint of a single Fabric data source; falls back to DirectQuery when data cannot be framed directly (SQL views, SQL-based RLS, guardrail breaches); does NOT support composite models with DirectQuery or Dual tables.
Direct Lake Framing
The refresh operation for Direct Lake — copies only Delta table metadata (file references), not data itself; completes in seconds compared to minutes/hours for Import refresh.
Incremental Refresh (Import Mode Only)
Refreshes only new or changed date-range partitions instead of the full dataset; requires RangeStart and RangeEnd DateTime parameters in Power Query and a DateTime column in the table — not supported for Direct Lake-only models.
Large Semantic Model Storage Format
Configuration that removes the default 10 GB per-dataset size limit, allowing enterprise-scale Import models — enable this in workspace settings before the model exceeds 10 GB.

Security and Access Controls

Workspace Roles: Admin > Member > Contributor > Viewer
Admin: full control. Member: publish, manage permissions. Contributor: create and edit content. Viewer: read-only access to content. Viewer cannot build reports on shared semantic models — Build permission required.
Row-Level Security (RLS)
Filters data rows based on DAX expressions assigned to roles — users see the same table and columns but only their permitted rows; configured in Power BI Desktop, managed in the service.
USERPRINCIPALNAME() in RLS
DAX function returning the current user's email — used in RLS DAX filter expressions to dynamically filter rows to the logged-in user's data (e.g., [SalesRep] = USERPRINCIPALNAME()).
Object-Level Security (OLS)
Hides entire tables or columns from the semantic model field list — objects are completely invisible to unauthorized users; requires Tabular Editor or another XMLA tool (cannot be configured in Power BI Desktop).
Column-Level Security (CLS)
Restricts access to specific columns at the data layer (Warehouse or Lakehouse SQL endpoint level) — different from OLS which operates at the semantic model layer.
Sensitivity Labels (Microsoft Purview)
Labels applied to Fabric items (Confidential, Internal, Public) that propagate downstream — if a Lakehouse is labeled Confidential, reports derived from it inherit the Confidential label.
Endorsement: Promoted vs. Certified
Promoted: any workspace member can endorse. Certified: requires specific permissions granted by a tenant admin — Certified signals organizational trust and quality standards.
Item-Level Permissions (override workspace roles)
Granular permissions set directly on a specific Lakehouse, Warehouse, or semantic model — override workspace role defaults; use to grant a user read access to one item without elevating them across the whole workspace.

Lifecycle Management and Governance

Git Integration (version control)
Connects a Fabric workspace to Azure DevOps or GitHub for commits, branches, and pull request workflows — enables team collaboration and source control for Fabric items.
Deployment Pipelines (environment promotion)
Promotes Fabric content through Development, Test, and Production stages with comparison and selective deployment — complementary to Git integration, not a replacement.
Git integration vs. Deployment Pipelines
Git integration = version control (what changed, by whom, and when). Deployment Pipelines = environment promotion (Dev to Test to Prod). Use both together: Git for source control, pipelines for staging.
PBIX vs. PBIP (Power BI Project)
PBIX: binary single-file format, not Git-friendly. PBIP: folder-based format storing components as human-readable JSON files — designed for Git version control and team collaboration.
PBIT (Power BI Template)
Contains report layout and data model structure but no data — used as a reusable starting point; PBIT differs from PBIX (has data) and PBIP (source control format).
XMLA Endpoint
Allows external tools (Tabular Editor, SSMS, ALM Toolkit) to connect and manage Fabric semantic models programmatically — required for OLS configuration, scripting model changes, and advanced ALM operations.
Impact Analysis
Tool for understanding downstream dependencies before making changes to a Lakehouse, Warehouse, Dataflow, or semantic model — shows which reports and models will be affected.

Advanced Semantic Model Features

Calculation Groups
Reusable DAX calculation patterns (YTD, QTD, prior year) applied across multiple measures via calculation items — affect ALL measures by default; explicitly exclude measures that should not be modified.
Dynamic Format Strings
DAX expressions that dynamically change how a measure is formatted based on filter context — allows a single measure to display as currency, percentage, or number depending on context.
Field Parameters
Allow report users to dynamically switch which fields or measures appear in visuals — enable flexible report interactivity (dynamic axis and measure switching) without creating multiple reports.
Composite Models (Direct Lake on OneLake)
A single semantic model mixing Direct Lake on OneLake tables with Import tables from any data source — the Import portion requires scheduled refresh while Direct Lake tables auto-sync via framing.
Direct Lake Automatic Updates
When enabled, the semantic model automatically reframes (updates metadata) when the underlying Delta tables change — near-real-time data availability without manual refresh scheduling.
Fixed Identity (Direct Lake + RLS)
A fixed identity cloud connection is strongly recommended when using semantic model RLS with Direct Lake — ensures consistent data access control regardless of the end user's identity.

Performance Optimization

V-Order optimization for Direct Lake
Write-time optimization applied to Delta Parquet files that improves Direct Lake columnar read speed — enabled by default for Fabric Lakehouse writes; can also be applied retroactively via OPTIMIZE ... VORDER on existing tables.
OPTIMIZE <lakehouse_table_name> VORDER
Spark SQL command that compacts small Parquet files and applies V-Order write optimization — run periodically on Lakehouse Delta tables to prevent Direct Lake guardrail violations and improve column read speed.
DAX variable best practice
Always use VAR to cache repeated sub-expressions; a VAR is evaluated once in the filter context where it is defined, preventing redundant recalculations inside iterators.
Avoid CALCULATE inside iterators
Nesting CALCULATE within SUMX or other iterators causes context transition for every row — refactor to calculate the value outside the iterator using a variable when possible.
Pre-aggregation tables
Pre-computing summary tables (e.g., daily totals from transaction-level data) in the Lakehouse or Warehouse and referencing them in the semantic model reduces DAX query time on large datasets.
Direct Lake guardrails (SKU-dependent)
F64+ allows 5,000 Parquet files per table and 1.5B rows per table; smaller SKUs (F2-F32) are limited to 1,000 files and 300M rows — exceed these and Direct Lake on OneLake refresh fails.

Key Exam Distinctions

Lakehouse SQL endpoint is read-only
The SQL analytics endpoint of a Lakehouse supports only SELECT — any question requiring DML (INSERT, UPDATE, DELETE) or DDL (CREATE PROCEDURE, CREATE VIEW) must use the Warehouse.
OLS requires external tools, not Power BI Desktop
Object-Level Security can only be configured via Tabular Editor (or another XMLA-compatible tool) — you can configure RLS natively in Power BI Desktop, but not OLS.
Direct Lake on OneLake: no fallback — Direct Lake on SQL: falls back
Direct Lake on OneLake does NOT fall back to DirectQuery when guardrails are exceeded — refresh fails and queries fail. Direct Lake on SQL endpoints DOES fall back to DirectQuery for SQL views, RLS enforcement, or guardrail breaches.
Deployment Pipelines vs. Git Integration
Deployment Pipelines handle environment promotion (Dev to Prod). Git Integration handles version control (branches, pull requests). They are complementary — not alternatives.
SCD Type 1 loses history; SCD Type 2 preserves it
SCD Type 1 overwrites the old value (no history). SCD Type 2 adds a new row with effective dates and preserves the old record — if the question mentions 'preserve history', the answer is Type 2.
Incremental refresh requires RangeStart and RangeEnd
These two DateTime Power Query parameters are reserved prerequisites for incremental refresh on Import models — without them, incremental refresh cannot be configured; not applicable to Direct Lake-only models.
Calculation groups affect ALL measures by default
When a calculation group is created, its items modify every measure in the model — explicitly exclude measures (using ISSELECTEDMEASURE or similar) that should not be affected.

Ready to test yourself?

Start a timed DP-600 mock exam or review practice questions by domain.