Quick Navigation
Microsoft Fabric ArchitectureData Ingestion and ConnectionsT-SQL for Fabric WarehouseKQL for Real-Time Intelligence (Eventhouse)DAX for Semantic ModelsStar Schema and Data ModelingSemantic Model Storage ModesSecurity and Access ControlsLifecycle Management and GovernanceAdvanced Semantic Model FeaturesPerformance OptimizationKey Exam Distinctions
Microsoft Fabric Architecture
- OneLake
- The single unified data lake for all Microsoft Fabric workloads — stores data in Delta Parquet format and is shared by Lakehouse, Warehouse, Eventhouse, and semantic models.
- Lakehouse vs. Warehouse
- Lakehouse: supports structured, semi-structured, and unstructured data with Spark; SQL analytics endpoint is READ-ONLY. Warehouse: structured data only with full T-SQL read-write (INSERT, UPDATE, DELETE, stored procedures, views).
- Eventhouse
- Fabric item for ingesting and querying large volumes of real-time and historical data using KQL; integrates with OneLake and is the data store for Real-Time Intelligence.
- Dataflows Gen2 vs. Spark Notebooks
- Dataflows Gen2: low-code Power Query interface with 100+ connectors, best for simple/moderate transforms. Notebooks: code-first PySpark/Spark SQL for complex ETL, large datasets, and cost-efficient production pipelines.
- Data Pipelines
- Orchestration engine using activities (Copy, Dataflow, Notebook, stored procedure) to schedule and chain data workflows; the orchestration layer above individual transformation tools.
- Query Language by Data Store
- Warehouse/Lakehouse SQL endpoint: T-SQL. Eventhouse: KQL. Semantic models: DAX. Never mix — KQL does not work on Warehouse and T-SQL does not work on Eventhouse.
Data Ingestion and Connections
- Copy Activity (Data Pipeline)
- Moves data between sources and Fabric destinations without transformation; fastest ingestion tool for bulk loads but does not apply business logic.
- Dataflows Gen2 output destinations
- Dataflows Gen2 can write to Fabric Lakehouse, Warehouse, Azure SQL, or other destinations — they are not self-contained; always require an output destination to be useful.
- OneLake Catalog
- Data discovery tool for browsing and finding data assets (lakehouses, warehouses, semantic models) across the Fabric tenant.
- Real-Time Hub
- Centralized catalog for discovering and subscribing to real-time data streams from internal and external sources within the Fabric tenant.
- When to use Notebooks over Dataflows Gen2
- Choose Notebooks for complex JSON schema inference, large datasets (500 GB+), incremental loading logic, or when compute cost is a concern — Dataflows Gen2 cost more CUs per operation.
- Lakehouse Files vs. Lakehouse Tables
- Files section: raw unstructured or semi-structured data not yet in Delta format. Tables section: Delta tables queryable via SQL analytics endpoint and Spark; only Tables are accessible via the SQL analytics endpoint.
- Eventstream
- Fabric item for capturing, transforming, and routing real-time events from sources (Event Hubs, IoT Hub, custom) into destinations such as Eventhouse, Lakehouse, or another Eventstream — the primary real-time ingestion path in Fabric.
T-SQL for Fabric Warehouse
- CREATE VIEW v_SalesSummary AS SELECT ...
- Creates a reusable view in the Warehouse — the Lakehouse SQL analytics endpoint is entirely READ-ONLY and does not support any DDL (CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION).
- CREATE PROCEDURE usp_MonthlySales AS BEGIN SELECT ... END
- Creates a stored procedure in the Warehouse — stored procedures cannot be created in the Lakehouse SQL analytics endpoint.
- SELECT c.CustomerName, SUM(f.SalesAmount) AS Total FROM FactSales f JOIN DimCustomer c ON f.CustomerKey = c.CustomerKey GROUP BY c.CustomerName
- Standard star schema query joining fact and dimension tables in the Warehouse using GROUP BY aggregation.
- ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY OrderDate DESC)
- Window function for ranking rows within partitions — use for deduplication, SCD Type 2 row identification, and top-N per group patterns.
- WITH CTE AS (SELECT ...) SELECT * FROM CTE WHERE ...
- Common Table Expression (CTE) for building readable multi-step queries and recursive patterns in the Warehouse.
- MERGE target USING source ON condition WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...
- MERGE statement for implementing SCD Type 1 upserts — updates existing rows and inserts new ones in a single atomic T-SQL statement.
KQL for Real-Time Intelligence (Eventhouse)
- TableName | where Timestamp > ago(1h)
- Filters rows from the last 1 hour — KQL uses ago() for relative time filtering instead of SQL's DATEADD or GETDATE().
- TableName | project ColumnA, ColumnB
- Selects specific columns (equivalent to SQL SELECT col1, col2) — project is the column selection operator in KQL.
- TableName | summarize count() by Category
- Groups and aggregates rows (equivalent to SQL GROUP BY) — summarize is KQL's aggregation operator.
- TableName | where ResponseTime > 500 | summarize avg(ResponseTime) by ServiceName
- Chains operators with the pipe (|) — KQL uses pipes to compose query steps, unlike SQL's nested subqueries.
- TableA | join kind=leftouter (TableB) on $left.Id == $right.Id
- Left outer join in KQL using join operator — specify join kind (inner, leftouter, rightouter, fullouter) explicitly.
- TableName | top 100 by Timestamp desc
- Returns the top N rows ordered by a column — equivalent to SQL SELECT TOP 100 ... ORDER BY ... DESC.
- TableName | extend DurationSec = ResponseTime / 1000.0 | project Timestamp, ServiceName, DurationSec
- extend adds computed columns to the result set; chain with project to select only needed columns — equivalent to SQL SELECT with computed expressions.
DAX for Semantic Models
- VAR PriorYear = CALCULATE([Sales], SAMEPERIODLASTYEAR(DimDate[Date])) RETURN DIVIDE([Sales] - PriorYear, PriorYear)
- Uses VAR/RETURN to store intermediate calculations — improves readability and performance by evaluating the sub-expression only once.
- CALCULATE([Sales], DimRegion[Region] = "West")
- CALCULATE modifies the filter context of a measure — the most important DAX function; always understand that it changes which rows are evaluated.
- SUMX(FactSales, FactSales[Quantity] * FactSales[Price])
- Iterator function that evaluates an expression row-by-row over a table — use SUMX, AVERAGEX, MAXX, MINX when the calculation must happen per row before aggregating.
- CALCULATE([Sales], ALL(DimDate))
- ALL() removes all filters from the specified table/column inside CALCULATE — used to compute totals ignoring current filter context (e.g., % of total measures).
- CALCULATE([Sales], OFFSET(-1, ALLSELECTED('DimDate'), ORDERBY('DimDate'[Date])))
- DAX windowing OFFSET(-1) references the previous row's context — used for period-over-period comparisons; must be wrapped in CALCULATE and requires a relation and ORDERBY clause.
- ISBLANK([Sales])
- Information function that returns TRUE if the measure or column value is blank — use with IF() for conditional logic to handle missing data gracefully.
- EVALUATE SUMMARIZE('FactSales', 'DimCustomer'[CustomerName], "Total", [Sales])
- DAX query using EVALUATE and SUMMARIZE to query a semantic model directly — used in DAX query view, XMLA tools, and automated testing.
Star Schema and Data Modeling
- Fact Table vs. Dimension Table
- Fact table: contains measurable events (sales amounts, quantities) with foreign keys to dimensions. Dimension table: contains descriptive attributes (customer name, product category) with a primary key.
- SCD Type 1 — Overwrite
- Overwrites the old attribute value with the new one; no history is preserved — use when historical accuracy is not required (e.g., correcting a typo in a name).
- SCD Type 2 — Add Row with Effective Dates
- Preserves history by adding a new row with StartDate/EndDate and an IsCurrent flag; old record is closed with an EndDate — use when tracking historical changes (e.g., customer address changes).
- Bridge Table (for many-to-many relationships)
- Resolves many-to-many relationships by introducing an intermediate table with foreign keys to both sides — the correct pattern for many-to-many in star schemas instead of a direct relationship.
- Denormalization
- Flattening normalized tables into wider dimension tables for query performance — preferred over normalized third-normal-form schemas in analytics because joins are expensive at query time.
- Star Schema in Lakehouse vs. Semantic Model
- Best practice: implement the physical star schema as Delta tables in the Lakehouse/Warehouse first, then reflect that structure in the semantic model through Direct Lake relationships.
Semantic Model Storage Modes
- Import Mode
- Copies data into the Power BI VertiPaq engine at scheduled refresh time — best for small-to-medium datasets; requires refresh to get new data; does not need Fabric capacity license.
- DirectQuery Mode
- Translates DAX queries to source SQL at report interaction time — always live data but slower due to query translation and network round trips; no data is stored in the model.
- Direct Lake on OneLake
- Reads Delta tables from OneLake directly into VertiPaq memory; can reference tables from multiple Fabric items (e.g., Lakehouse + Warehouse); supports composite models with Import tables; does NOT fall back to DirectQuery — guardrail violations cause refresh failure.
- Direct Lake on SQL Endpoints
- Reads Delta tables via the SQL analytics endpoint of a single Fabric data source; falls back to DirectQuery when data cannot be framed directly (SQL views, SQL-based RLS, guardrail breaches); does NOT support composite models with DirectQuery or Dual tables.
- Direct Lake Framing
- The refresh operation for Direct Lake — copies only Delta table metadata (file references), not data itself; completes in seconds compared to minutes/hours for Import refresh.
- Incremental Refresh (Import Mode Only)
- Refreshes only new or changed date-range partitions instead of the full dataset; requires RangeStart and RangeEnd DateTime parameters in Power Query and a DateTime column in the table — not supported for Direct Lake-only models.
- Large Semantic Model Storage Format
- Configuration that removes the default 10 GB per-dataset size limit, allowing enterprise-scale Import models — enable this in workspace settings before the model exceeds 10 GB.
Security and Access Controls
- Workspace Roles: Admin > Member > Contributor > Viewer
- Admin: full control. Member: publish, manage permissions. Contributor: create and edit content. Viewer: read-only access to content. Viewer cannot build reports on shared semantic models — Build permission required.
- Row-Level Security (RLS)
- Filters data rows based on DAX expressions assigned to roles — users see the same table and columns but only their permitted rows; configured in Power BI Desktop, managed in the service.
- USERPRINCIPALNAME() in RLS
- DAX function returning the current user's email — used in RLS DAX filter expressions to dynamically filter rows to the logged-in user's data (e.g., [SalesRep] = USERPRINCIPALNAME()).
- Object-Level Security (OLS)
- Hides entire tables or columns from the semantic model field list — objects are completely invisible to unauthorized users; requires Tabular Editor or another XMLA tool (cannot be configured in Power BI Desktop).
- Column-Level Security (CLS)
- Restricts access to specific columns at the data layer (Warehouse or Lakehouse SQL endpoint level) — different from OLS which operates at the semantic model layer.
- Sensitivity Labels (Microsoft Purview)
- Labels applied to Fabric items (Confidential, Internal, Public) that propagate downstream — if a Lakehouse is labeled Confidential, reports derived from it inherit the Confidential label.
- Endorsement: Promoted vs. Certified
- Promoted: any workspace member can endorse. Certified: requires specific permissions granted by a tenant admin — Certified signals organizational trust and quality standards.
- Item-Level Permissions (override workspace roles)
- Granular permissions set directly on a specific Lakehouse, Warehouse, or semantic model — override workspace role defaults; use to grant a user read access to one item without elevating them across the whole workspace.
Lifecycle Management and Governance
- Git Integration (version control)
- Connects a Fabric workspace to Azure DevOps or GitHub for commits, branches, and pull request workflows — enables team collaboration and source control for Fabric items.
- Deployment Pipelines (environment promotion)
- Promotes Fabric content through Development, Test, and Production stages with comparison and selective deployment — complementary to Git integration, not a replacement.
- Git integration vs. Deployment Pipelines
- Git integration = version control (what changed, by whom, and when). Deployment Pipelines = environment promotion (Dev to Test to Prod). Use both together: Git for source control, pipelines for staging.
- PBIX vs. PBIP (Power BI Project)
- PBIX: binary single-file format, not Git-friendly. PBIP: folder-based format storing components as human-readable JSON files — designed for Git version control and team collaboration.
- PBIT (Power BI Template)
- Contains report layout and data model structure but no data — used as a reusable starting point; PBIT differs from PBIX (has data) and PBIP (source control format).
- XMLA Endpoint
- Allows external tools (Tabular Editor, SSMS, ALM Toolkit) to connect and manage Fabric semantic models programmatically — required for OLS configuration, scripting model changes, and advanced ALM operations.
- Impact Analysis
- Tool for understanding downstream dependencies before making changes to a Lakehouse, Warehouse, Dataflow, or semantic model — shows which reports and models will be affected.
Advanced Semantic Model Features
- Calculation Groups
- Reusable DAX calculation patterns (YTD, QTD, prior year) applied across multiple measures via calculation items — affect ALL measures by default; explicitly exclude measures that should not be modified.
- Dynamic Format Strings
- DAX expressions that dynamically change how a measure is formatted based on filter context — allows a single measure to display as currency, percentage, or number depending on context.
- Field Parameters
- Allow report users to dynamically switch which fields or measures appear in visuals — enable flexible report interactivity (dynamic axis and measure switching) without creating multiple reports.
- Composite Models (Direct Lake on OneLake)
- A single semantic model mixing Direct Lake on OneLake tables with Import tables from any data source — the Import portion requires scheduled refresh while Direct Lake tables auto-sync via framing.
- Direct Lake Automatic Updates
- When enabled, the semantic model automatically reframes (updates metadata) when the underlying Delta tables change — near-real-time data availability without manual refresh scheduling.
- Fixed Identity (Direct Lake + RLS)
- A fixed identity cloud connection is strongly recommended when using semantic model RLS with Direct Lake — ensures consistent data access control regardless of the end user's identity.
Performance Optimization
- V-Order optimization for Direct Lake
- Write-time optimization applied to Delta Parquet files that improves Direct Lake columnar read speed — enabled by default for Fabric Lakehouse writes; can also be applied retroactively via OPTIMIZE ... VORDER on existing tables.
- OPTIMIZE <lakehouse_table_name> VORDER
- Spark SQL command that compacts small Parquet files and applies V-Order write optimization — run periodically on Lakehouse Delta tables to prevent Direct Lake guardrail violations and improve column read speed.
- DAX variable best practice
- Always use VAR to cache repeated sub-expressions; a VAR is evaluated once in the filter context where it is defined, preventing redundant recalculations inside iterators.
- Avoid CALCULATE inside iterators
- Nesting CALCULATE within SUMX or other iterators causes context transition for every row — refactor to calculate the value outside the iterator using a variable when possible.
- Pre-aggregation tables
- Pre-computing summary tables (e.g., daily totals from transaction-level data) in the Lakehouse or Warehouse and referencing them in the semantic model reduces DAX query time on large datasets.
- Direct Lake guardrails (SKU-dependent)
- F64+ allows 5,000 Parquet files per table and 1.5B rows per table; smaller SKUs (F2-F32) are limited to 1,000 files and 300M rows — exceed these and Direct Lake on OneLake refresh fails.
Key Exam Distinctions
- Lakehouse SQL endpoint is read-only
- The SQL analytics endpoint of a Lakehouse supports only SELECT — any question requiring DML (INSERT, UPDATE, DELETE) or DDL (CREATE PROCEDURE, CREATE VIEW) must use the Warehouse.
- OLS requires external tools, not Power BI Desktop
- Object-Level Security can only be configured via Tabular Editor (or another XMLA-compatible tool) — you can configure RLS natively in Power BI Desktop, but not OLS.
- Direct Lake on OneLake: no fallback — Direct Lake on SQL: falls back
- Direct Lake on OneLake does NOT fall back to DirectQuery when guardrails are exceeded — refresh fails and queries fail. Direct Lake on SQL endpoints DOES fall back to DirectQuery for SQL views, RLS enforcement, or guardrail breaches.
- Deployment Pipelines vs. Git Integration
- Deployment Pipelines handle environment promotion (Dev to Prod). Git Integration handles version control (branches, pull requests). They are complementary — not alternatives.
- SCD Type 1 loses history; SCD Type 2 preserves it
- SCD Type 1 overwrites the old value (no history). SCD Type 2 adds a new row with effective dates and preserves the old record — if the question mentions 'preserve history', the answer is Type 2.
- Incremental refresh requires RangeStart and RangeEnd
- These two DateTime Power Query parameters are reserved prerequisites for incremental refresh on Import models — without them, incremental refresh cannot be configured; not applicable to Direct Lake-only models.
- Calculation groups affect ALL measures by default
- When a calculation group is created, its items modify every measure in the model — explicitly exclude measures (using ISSELECTEDMEASURE or similar) that should not be affected.