How long should I study for the DP-800 exam?

It depends heavily on your SQL experience. Senior SQL developers who are comfortable with T-SQL, database design, and Azure SQL can prepare in 4-6 weeks by focusing on the newer AI capabilities (vector search, embeddings, RAG) and modern CI/CD tooling (SQL Database Projects, Data API builder). Developers with moderate SQL experience should plan 8 weeks. If you are new to SQL development, budget 10-12 weeks as you need to build foundational T-SQL skills before tackling the AI integration topics.

What prerequisites do I need for the DP-800 exam?

There are no mandatory prerequisites, but Microsoft strongly recommends hands-on experience writing T-SQL and developing databases on SQL Server, Azure SQL, or SQL databases in Microsoft Fabric. You should also be familiar with CI/CD practices in GitHub, AI-assisted development tools like GitHub Copilot, and AI concepts such as embeddings, vectors, and models. In practice, most successful candidates have at least 1-2 years of SQL development experience.

How does DP-800 differ from DP-300 (Azure Database Administrator)?

DP-300 focuses on administering and managing Azure SQL databases — backup, restore, high availability, disaster recovery, and monitoring from a DBA perspective. DP-800 focuses on developing AI-enabled database solutions — writing advanced T-SQL, integrating vector search and RAG, implementing CI/CD with SQL Database Projects, and exposing data through APIs with Data API builder. DP-300 is for DBAs who keep databases running; DP-800 is for developers who build the database layer for AI-powered applications.

How does DP-800 differ from DP-600 (Fabric Analytics Engineer)?

DP-600 focuses on analytics within Microsoft Fabric — semantic models, data transformation, Power BI integration, and analytics governance. DP-800 covers AI-enabled database development across SQL Server, Azure SQL, AND Fabric. DP-800 emphasizes T-SQL development, vector search, embeddings, RAG, CI/CD, and API generation. The overlap is minimal: DP-600 is analytics-focused, DP-800 is developer-focused with AI integration.

Is the DP-800 exam difficult?

DP-800 is considered one of the harder associate-level Microsoft exams because it tests both deep T-SQL expertise and cutting-edge AI integration concepts. The exam includes case study questions that take 4-6 minutes each and may include interactive simulation items. The AI capabilities domain (26%) covers very new features like vector search, DiskANN, and RAG in SQL that many candidates have not used in production. Candidates with strong SQL backgrounds but no AI experience should dedicate extra time to Domain 3.

Can I pass the DP-800 exam without Azure experience?

It would be very difficult. While the exam covers SQL Server and Fabric in addition to Azure SQL, many topics require Azure knowledge: Azure Monitor and Application Insights, Managed Identity for securing endpoints, Azure Functions with SQL trigger bindings, Data API builder deployment, and sp_invoke_external_rest_endpoint for calling Azure OpenAI. You need at least a working understanding of Azure SQL, Microsoft Entra ID (formerly Azure AD) authentication, and Azure OpenAI integration.

What score do I need to pass the DP-800 exam?

You need a scaled score of 700 out of 1000 to pass. The scaling means the exact number of correct answers varies, but aim for approximately 75% or higher to be safe. There is no penalty for wrong answers, so always answer every question even if you are guessing. Case study questions may be weighted differently from standard multiple-choice questions.

Do I need to know SQL Server 2025 features specifically?

Yes. Several key exam topics are new to SQL Server 2025: the native VECTOR data type, VECTOR_DISTANCE and VECTOR_SEARCH functions, DiskANN vector indexes, regular expression functions (REGEXP_LIKE, etc.), fuzzy string matching functions (EDIT_DISTANCE, JARO_WINKLER_DISTANCE), and enhanced JSON functions. These are not available in earlier SQL Server versions and represent a significant portion of the exam content.

Is hands-on practice required, or can I pass with theory alone?

Hands-on practice is strongly recommended. The exam includes scenario-based questions that require you to analyze T-SQL code, query execution plans, and configuration files. Candidates who have actually written vector search queries, configured Data API builder, and built SQL Database Projects will find these questions much more approachable than those who only studied documentation. At minimum, set up a SQL Server 2025 instance and practice the vector functions and RAG workflow.

What types of questions are on the DP-800 exam?

The exam includes multiple-choice (single answer), multiple-select (choose all that apply), case studies with multiple related questions based on a scenario, drag-and-drop ordering questions, and potentially interactive simulation items where you work in a real database environment. Case studies provide a detailed scenario and ask 4-6 questions about it. Budget about 2.5 minutes per standard question but more for case studies.

Is the DP-800 certification worth it for my career?

DP-800 positions you at the intersection of SQL development and AI, a rapidly growing field with relatively few certified professionals as of mid-2026. If you work with SQL Server, Azure SQL, or Fabric and want to integrate AI capabilities like semantic search, embeddings, and RAG into database solutions, this certification validates a highly marketable skill set. Because so few people hold the credential yet, it may also give you an edge in hiring and salary discussions.

How often do I need to renew the DP-800 certification?

Microsoft associate-level certifications expire annually. You can renew for free by passing an online renewal assessment on Microsoft Learn before the expiration date. The renewal assessment covers updated exam content and can be taken without scheduling a proctored exam. Microsoft sends email reminders as your renewal date approaches.

Microsoft Certified: SQL AI Developer Associate (DP-800) Free Study Guide 2026

You Can Pass This Exam For Free

The DP-800 exam is passable with free resources alone if you study consistently for 4-12 weeks, depending on your experience:

Microsoft Learn DP-800 official study guide and learning paths (free)
SQL Server 2025 documentation on vector search, embeddings, and AI functions (free)
Data API builder (DAB) open-source documentation and quickstarts (free)
SQL Database Projects documentation and CI/CD tutorials (free)
Azure SQL and Microsoft Fabric documentation (free)
500+ free practice questions on this site

DP-800 is a brand-new certification (March 2026) focused on emerging AI-in-SQL features. Microsoft Learn learning paths cover most exam objectives for free. Hands-on practice with SQL Server 2025 vector functions and Data API builder is essential since the exam tests practical T-SQL skills, not just theory.

Choose Your Study Path

This 12-week path is for candidates with limited SQL or database development experience. It builds foundational T-SQL skills and database design knowledge before tackling AI integration topics.

Weeks 1-2: Learn T-SQL fundamentals (SELECT, JOIN, subqueries, CTEs, window functions). Practice writing queries against sample databases like AdventureWorks or WideWorldImporters.

Week 3: Study database design: tables, data types, constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, DEFAULT), indexes (clustered, nonclustered, columnstore), and partitioning.

Week 4: Learn programmability objects (views, scalar functions, table-valued functions, stored procedures, and triggers) and write working examples of each.

Week 5: Study advanced T-SQL: JSON functions (JSON_OBJECT, JSON_ARRAY, OPENJSON, JSON_VALUE), regular expressions (REGEXP_LIKE, REGEXP_REPLACE), fuzzy matching (EDIT_DISTANCE), and graph queries with MATCH.

Week 6: Cover security and compliance: Always Encrypted, column-level encryption, Dynamic Data Masking, Row-Level Security, object-level permissions, passwordless access, and auditing.

Week 7: Study performance optimization: execution plans, DMVs, Query Store, Query Performance Insight, transaction isolation levels, blocking, and deadlocks.

Week 8: Learn CI/CD with SQL Database Projects: SDK-style projects, source control, branching, schema drift detection, deployment pipelines, secrets management, and testing strategies.

Week 9: Study AI capabilities: external models, embeddings, the vector data type, VECTOR_DISTANCE, VECTOR_SEARCH, semantic vs full-text vs hybrid search, and RAG patterns with sp_invoke_external_rest_endpoint.

Week 10: Learn Data API builder (DAB): configuration files, REST and GraphQL endpoints, entity configuration, caching, and pagination. Also cover change event streaming, CDC, and Azure Functions with SQL trigger binding.

Week 11: Work through practice questions across all domains and review the explanations carefully. Focus on Domains 1 and 2, which together make up 74% of the exam.

Week 12: Take full mock exams under timed conditions (120 minutes). Review weak areas and re-study any domain where you score below 75%.

Exam Overview

Format

Approximately 50 questions, 120 minutes. Multiple choice, multiple select, case studies, and interactive lab-style questions.

Scoring

Scaled score 1-1000. Passing: 700. No penalty for wrong answers, so always answer every question.

Domains & Weights

Design and Develop Database Solutions: 37%
Secure, Optimize, and Deploy Database Solutions: 37%
Implement AI Capabilities in Database Solutions: 26%

Registration

$165 USD in the United States (pricing varies by country). Available at Pearson VUE testing centers or online proctored from home.

Topic Priority Table

Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.

Tier 1: Must Know. You must understand these technologies deeply, know their syntax and configuration, and be able to apply them in scenario-based questions. These appear across multiple exam questions.

Tier 2: Should Know. Understand what these technologies are and their key characteristics. May appear in 2-5 questions each.

Tier 3: Recognize Only. Know what these are at a high level. Rarely more than 1-2 questions each.

Domain 1 (37% of exam)

Design and Develop Database Solutions

This domain covers the full breadth of database design and T-SQL development. You need to design tables with appropriate data types, indexes, and constraints, implement programmability objects like views and stored procedures, write advanced T-SQL including JSON functions, regular expressions, and graph queries, and use AI-assisted development tools like GitHub Copilot and MCP server connections.

Key Topics

Tables and Indexes, Stored Procedures, Views and Functions, JSON Functions, Regular Expressions, Graph Queries, CTEs and Window Functions, GitHub Copilot, MCP Server

Must-Know Concepts

Table design: choosing appropriate data types, sizes, columns, clustered and nonclustered indexes, and columnstore indexes for the workload type
Specialized table types: in-memory (MEMORY_OPTIMIZED), temporal (system-versioned with history table), external (referencing remote data), ledger (tamper-evident with cryptographic hashing), and graph (node and edge tables)
JSON column design: creating JSON columns, JSON indexes, and using JSON_OBJECT, JSON_ARRAY, JSON_ARRAYAGG, JSON_CONTAINS, OPENJSON, and JSON_VALUE for querying semi-structured data
Constraints: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT — know when to use each and how they enforce data integrity
SEQUENCES: how to create and use sequences for generating ordered numeric values independently of tables
Table and index partitioning: partition functions, partition schemes, and how partitioning improves manageability and query performance for large tables
Programmability objects: creating views, scalar functions, table-valued functions, stored procedures, and triggers — know the use cases and limitations of each
CTEs (Common Table Expressions): syntax, recursive CTEs for hierarchical data, and when CTEs are better than subqueries or temp tables
Window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, SUM/AVG/COUNT with OVER and PARTITION BY
JSON functions: constructing JSON (JSON_OBJECT, JSON_ARRAY), parsing JSON (OPENJSON, JSON_VALUE), aggregating JSON (JSON_ARRAYAGG), and testing containment (JSON_CONTAINS)
Regular expressions: REGEXP_LIKE (matching), REGEXP_REPLACE (replacement), REGEXP_SUBSTR (extraction), REGEXP_INSTR (position), REGEXP_COUNT (counting), REGEXP_MATCHES, and REGEXP_SPLIT_TO_TABLE
Fuzzy string matching: EDIT_DISTANCE (Levenshtein distance), EDIT_DISTANCE_SIMILARITY (percentage match), and JARO_WINKLER_DISTANCE (positional similarity) for approximate matching
Graph queries: creating node tables and edge tables, writing MATCH operator queries to traverse graph relationships
Error handling: TRY/CATCH blocks, THROW, RAISERROR, @@ERROR, and XACT_ABORT for transactional error management
AI-assisted tools: enabling GitHub Copilot, configuring instruction files, connecting to MCP server endpoints for SQL Server and Fabric lakehouse, and understanding the security implications of AI-assisted code generation
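To make the EDIT_DISTANCE idea concrete, here is a minimal Python sketch of Levenshtein distance (the count of single-character insertions, deletions, and substitutions needed to turn one string into another). It is a conceptual model, not SQL Server's implementation. The percentage helper assumes a similarity of (1 - distance / length of the longer string) x 100, which is our reading of what EDIT_DISTANCE_SIMILARITY reports; confirm the exact formula and rounding in the documentation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the first i-1 characters of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def edit_distance_similarity(a: str, b: str) -> float:
    """Assumed formula: 100 means identical, 0 means the distance equals the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return (1 - edit_distance(a, b) / longest) * 100


print(edit_distance("Smith", "Smyth"))     # 1
print(edit_distance("kitten", "sitting"))  # 3
```

A single substitution separates "Smith" from "Smyth", which is exactly the kind of near-match that fuzzy matching is meant to catch and that an equality predicate would miss.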

Common Exam Traps

Temporal tables require TWO tables: the current table and a history table. The history table is system-managed: while SYSTEM_VERSIONING is ON, you cannot insert, update, or delete its rows directly

Ledger tables come in TWO types: updatable (tracks changes) and append-only (no updates or deletes allowed). Know when to use each

JSON_ARRAYAGG aggregates multiple rows into a JSON array, while JSON_ARRAY constructs an array from individual values. They serve different purposes

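Graph MATCH queries use arrow syntax to traverse edges, for example MATCH(Person1-(Follows)->Person2). Direction matters: reversing the arrow asks a different question (whom a person follows versus who follows that person)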
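The direction point can be modeled in a few lines of Python: an edge table is essentially a set of ($from_id, $to_id) pairs, and a MATCH pattern decides which end you filter on. The Follows data and helper names below are hypothetical; the sketch mirrors the logic, not the engine.

```python
# Hypothetical Follows edge table: each tuple is ($from_id, $to_id).
FOLLOWS = {("Alice", "Bob"), ("Bob", "Carol"), ("Dave", "Alice")}


def outgoing(person: str, edges: set[tuple[str, str]]) -> list[str]:
    """Like MATCH(p1-(f)->p2) with a filter on p1: whom does `person` follow?"""
    return sorted(to for frm, to in edges if frm == person)


def incoming(person: str, edges: set[tuple[str, str]]) -> list[str]:
    """Like MATCH(p1<-(f)-p2) with a filter on p1: who follows `person`?"""
    return sorted(frm for frm, to in edges if to == person)


print(outgoing("Alice", FOLLOWS))  # ['Bob']
print(incoming("Alice", FOLLOWS))  # ['Dave']
```

Same node, same edge table, different answers: that is why the arrow direction in a MATCH clause must match the relationship the scenario describes.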

GitHub Copilot instruction files (.github/copilot-instructions.md) customize Copilot behavior for your project. They do NOT replace code review — AI-generated code still needs security review

Quick Check: Design and Develop Database Solutions

A company needs to store audit-proof financial transaction records where any tampering with historical data must be cryptographically detectable. Which table type should they use?

Domain 2 (37% of exam)

Secure, Optimize, and Deploy Database Solutions

This domain covers data security features, performance optimization, CI/CD pipeline implementation with SQL Database Projects, and integration with Azure services including Data API builder. Together with Domain 1, these two domains represent about 74% of the exam. You need both conceptual knowledge and practical ability to implement these features.

Key Topics

Always Encrypted, Dynamic Data Masking, Row-Level Security, Query Store, Execution Plans, SQL Database Projects, Data API Builder, Azure Monitor, CDC

Must-Know Concepts

Always Encrypted: deterministic vs randomized encryption, enclave-enabled Always Encrypted, column master keys vs column encryption keys, and the fact that the database engine never sees plaintext outside a secure enclave
Column-level encryption: encrypting specific columns using symmetric keys and certificates, different from Always Encrypted which is client-side
Dynamic Data Masking: default, email, random, and custom string masking functions. Users with UNMASK permission bypass masking
Row-Level Security: filter predicates (control which rows are returned by SELECT), block predicates (control which rows can be modified by INSERT/UPDATE/DELETE), and security policy objects
Object-level permissions: GRANT, DENY, REVOKE on tables, views, stored procedures, schemas. Understand permission inheritance and the principle of least privilege
Passwordless database access: Microsoft Entra ID (formerly Azure AD) authentication, Managed Identity, and connection strings that contain no passwords
Auditing: server-level and database-level auditing, audit action groups, writing to Azure Blob Storage or Log Analytics
Securing endpoints: Managed Identity for model endpoints, authentication/authorization for GraphQL, REST, and MCP endpoints
Query execution plans: reading actual vs estimated plans, identifying expensive operators (Table Scan, Key Lookup, Sort), and using plans to diagnose performance issues
Dynamic Management Views (DMVs): sys.dm_exec_query_stats, sys.dm_exec_requests, sys.dm_os_wait_stats for monitoring query and server performance
Query Store: enabling, configuring retention, identifying regressed queries, forcing plans, and comparing performance across time periods
Query Performance Insight: Azure SQL dashboard for identifying top resource-consuming queries
Blocking and deadlocks: identifying blocking chains, using sys.dm_exec_requests and sys.dm_tran_locks, implementing deadlock retry logic, and designing to minimize lock contention
Transaction isolation levels: READ UNCOMMITTED (dirty reads), READ COMMITTED (default), REPEATABLE READ, SERIALIZABLE, and SNAPSHOT — know which phenomena each prevents
SQL Database Projects: SDK-style .sqlproj files, building with dotnet build to produce a .dacpac, schema comparison, deploying the .dacpac with SqlPackage, and schema drift detection
CI/CD pipeline design: source control integration, branching policies, pull request workflows, deployment triggers, approval gates, code owners, secrets management, and unit/integration testing strategies
Data API builder (DAB): JSON configuration files, entity definitions for tables/views/stored procedures, REST and GraphQL endpoint configuration, pagination, filtering, searching, caching, and GraphQL relationships
Azure Monitor: Application Insights for application-level telemetry, Log Analytics for centralized log queries, and diagnostic settings
Change patterns: Change Data Capture (CDC), Change Tracking, Change Event Streaming (CES), Azure Functions with SQL trigger binding, and Azure Logic Apps for event-driven architectures
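For the deadlock retry logic listed above, a common client-side pattern is to catch error 1205 (the error SQL Server raises when a session is chosen as the deadlock victim), wait briefly, and re-run the whole transaction, because the victim's transaction has already been rolled back. The Python sketch below is driver-agnostic: DbError is a hypothetical stand-in for whatever exception your driver raises, and the attempt count and backoff values are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

DEADLOCK_VICTIM = 1205  # SQL Server error number for "chosen as deadlock victim"


class DbError(Exception):
    """Hypothetical driver exception that exposes the SQL Server error number."""

    def __init__(self, number: int, message: str = "") -> None:
        super().__init__(message)
        self.number = number


def run_with_deadlock_retry(txn: Callable[[], T], max_attempts: int = 3,
                            base_delay: float = 0.1) -> T:
    """Run txn (which should begin, do, and commit the WHOLE transaction), retrying on deadlocks."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DbError as exc:
            # Non-deadlock errors, or a deadlock on the final attempt, go to the caller.
            if exc.number != DEADLOCK_VICTIM or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so the competing sessions are less likely to collide again.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Retrying only error 1205 matters: an error such as a constraint violation will fail the same way on every attempt, so it is surfaced immediately.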

Common Exam Traps

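Without a secure enclave, Always Encrypted deterministic encryption allows equality comparisons but NOT range queries, and randomized encryption does not allow any comparisons at all. Enclave-enabled Always Encrypted extends what randomized columns support, including range comparisons and pattern matching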
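Why deterministic encryption supports equality while randomized does not can be shown with a toy model. In the Python sketch below, keyed HMAC tags stand in for ciphertext: they are not reversible and are not Always Encrypted's real algorithm (AEAD_AES_256_CBC_HMAC_SHA_256), and the key and card values are made up. The property that carries over is this: identical plaintexts give identical deterministic outputs, a fresh random value per call makes every randomized output different, and neither output preserves the ordering of the plaintext, which is why range predicates cannot be evaluated on ciphertext.

```python
import hashlib
import hmac
import os

KEY = b"hypothetical-column-encryption-key"


def deterministic(value: str) -> bytes:
    """Same key + same plaintext -> same output, so equality lookups can match."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).digest()


def randomized(value: str) -> bytes:
    """A fresh random IV per call -> a different output every time for the same plaintext."""
    iv = os.urandom(16)
    return iv + hmac.new(KEY, iv + value.encode(), hashlib.sha256).digest()


stored = [deterministic(card) for card in ("4111-1111", "5500-0004")]

# Equality search works: transform the parameter the same way and compare outputs.
print(deterministic("4111-1111") in stored)                # True
# With randomized encryption the "same" value never compares equal.
print(randomized("4111-1111") == randomized("4111-1111"))  # False
```

This is also why deterministic encryption is weaker: equal outputs reveal which rows share a value, so it is best reserved for columns that genuinely need equality lookups or joins.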

Dynamic Data Masking is NOT a security boundary: it only hides data in query results. A user who can run ad hoc queries can infer masked values with predicates such as WHERE Salary > 100000, because filters are evaluated against the real values. Use it for convenience, not for true data protection
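The inference risk is easy to demonstrate. In this Python sketch, masked_query mimics a user without UNMASK: the WHERE predicate is evaluated against the real Salary values, but the returned column shows the default mask (0 for numeric types). A plain binary search over the predicate then recovers the exact value. The table and figures are hypothetical.

```python
# Hypothetical Staff table: name -> real salary (stored unencrypted under DDM).
STAFF = {"alice": 83_500, "bob": 120_250}


def masked_query(min_salary: int) -> list[tuple[str, int]]:
    """Simulates SELECT Name, Salary FROM Staff WHERE Salary >= @min_salary
    for a user without UNMASK: the filter sees real data, the output is masked."""
    return [(name, 0) for name, salary in STAFF.items() if salary >= min_salary]


def infer_salary(name: str, low: int = 0, high: int = 10_000_000) -> int:
    """Binary search on the predicate; assumes the salary lies in [low, high]."""
    while low < high:
        mid = (low + high + 1) // 2
        if any(n == name for n, _ in masked_query(mid)):
            low = mid       # salary >= mid
        else:
            high = mid - 1  # salary < mid
    return low


print(masked_query(100_000))  # [('bob', 0)]  the value is hidden in the output...
print(infer_salary("bob"))    # 120250        ...but recovered in about two dozen queries
```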

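Query Store captures plans and runtime statistics automatically, but only once it is enabled. It is on by default in Azure SQL Database and for new databases in SQL Server 2022 and later; databases created on earlier versions may need it turned on explicitly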

Schema drift means the live database has diverged from the source-controlled project. SQL Database Projects can DETECT drift but you must decide how to resolve it — the tool does not automatically fix it
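Conceptually, drift detection compares the object definitions in the project with those in the live database. The Python sketch below reports differences and deliberately changes nothing, mirroring the point that resolving drift is a human decision. Real tooling compares a full schema model (for example a .dacpac against the target database); here each object is simply a name mapped to its CREATE text, and the whitespace normalization is a simplifying assumption.

```python
def normalize_sql(definition: str) -> str:
    # Collapse whitespace so formatting-only differences are not reported as drift.
    return " ".join(definition.split())


def detect_drift(project: dict[str, str], live: dict[str, str]) -> dict[str, list[str]]:
    """Compare object definitions; returns a report and never modifies either side."""
    shared = project.keys() & live.keys()
    return {
        "only_in_live": sorted(live.keys() - project.keys()),     # e.g. a hotfix applied directly
        "only_in_project": sorted(project.keys() - live.keys()),  # not deployed yet, or dropped in prod
        "changed": sorted(name for name in shared
                          if normalize_sql(project[name]) != normalize_sql(live[name])),
    }


project = {
    "dbo.Customer": "CREATE TABLE dbo.Customer (Id int PRIMARY KEY, Name nvarchar(100))",
    "dbo.usp_GetCustomer": "CREATE PROCEDURE dbo.usp_GetCustomer @Id int AS "
                           "SELECT * FROM dbo.Customer WHERE Id = @Id",
}
live = {
    "dbo.Customer": "CREATE TABLE dbo.Customer (Id int PRIMARY KEY,  Name nvarchar(200))",
    "IX_Customer_Name": "CREATE INDEX IX_Customer_Name ON dbo.Customer (Name)",
}
print(detect_drift(project, live))
```

Each bucket implies a different decision: pull the live change back into source control, redeploy the project, or investigate why the two diverged.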

Data API builder generates REST AND GraphQL endpoints simultaneously from the same config. Do not assume you must choose one or the other

CDC captures full row data (before and after images). Change Tracking only captures primary keys of changed rows. The exam tests this distinction in the context of embedding maintenance

Quick Check: Secure, Optimize, and Deploy Database Solutions

A database stores customer credit card numbers that must be encrypted so that even database administrators cannot view the plaintext values. Which feature should be implemented?

Domain 3 (26% of exam)

Implement AI Capabilities in Database Solutions

This domain covers integrating AI directly into SQL databases: evaluating and managing external AI models, generating and maintaining embeddings, implementing intelligent search (full-text, vector, and hybrid), and building RAG workflows using sp_invoke_external_rest_endpoint. While the smallest domain by weight, this is the most novel material and a key differentiator for the certification.

Key Topics

External Models, Embeddings, AI_GENERATE_EMBEDDINGS, Vector Search, Hybrid Search, RAG, VECTOR_DISTANCE, VECTOR_NORMALIZE, VECTORPROPERTY, VECTOR_SEARCH, DiskANN

Must-Know Concepts

Evaluating external models: multimodal capabilities, multilanguage support, model sizes, structured output support, and cost considerations when choosing an AI model for your database solution
Creating and managing external models: registering models with CREATE EXTERNAL MODEL, configuring credentials, and managing model lifecycle
Embedding maintenance methods: table triggers, Change Tracking, Azure Functions with SQL trigger binding, Azure Logic Apps, CDC, Change Event Streaming (CES), and Microsoft Foundry — know which to use for different freshness requirements
Choosing which columns to include in embeddings: identifying the most semantically meaningful columns, combining multiple columns for richer embeddings, and excluding irrelevant data
Designing and implementing chunks for embeddings: splitting large text into appropriately sized chunks with overlap to maintain context across chunk boundaries
Generating embeddings: using AI_GENERATE_EMBEDDINGS (the native T-SQL function that calls a registered external model) or sp_invoke_external_rest_endpoint (direct REST call to an AI API). AI_GENERATE_EMBEDDINGS requires a CREATE EXTERNAL MODEL registration and is the more concise, exam-preferred method
Full-text search: creating full-text indexes, full-text catalogs, and using CONTAINS, FREETEXT, CONTAINSTABLE, and FREETEXTTABLE for keyword search
Vector data type: declaring vector columns with dimensions, understanding the vector data type limitations, and storing embeddings
VECTOR_DISTANCE: syntax with three distance metrics (cosine, Euclidean, dot product), when to use each metric, and the performance characteristics of exact search
VECTOR_NORMALIZE: normalizes a vector to unit length using a specified norm type (norm1, norm2, norminf). Use before storing or comparing embeddings when magnitude should not affect similarity scores
VECTORPROPERTY: returns metadata about a vector column — Dimensions (the count of dimensions as an integer) or BaseType (the underlying data type name). Used for inspecting vector schema programmatically
VECTOR_SEARCH: using DiskANN indexes for approximate nearest neighbor search, understanding recall vs speed tradeoffs, and configuring vector index parameters
ANN vs ENN: Approximate Nearest Neighbor (VECTOR_SEARCH with indexes, fast but approximate) vs Exact Nearest Neighbor (VECTOR_DISTANCE scanning all vectors, slow but precise)
Hybrid search: combining full-text and vector search results for improved relevance using Reciprocal Rank Fusion (RRF) to merge ranked lists
Evaluating search performance: measuring recall, precision, latency, and relevance of vector and hybrid search implementations
RAG workflow: (1) receive user query, (2) generate query embedding, (3) search for relevant context using vector/hybrid search, (4) construct prompt with retrieved context, (5) send to language model via sp_invoke_external_rest_endpoint, (6) extract and return the response
Converting structured data to JSON for language model processing: using FOR JSON, JSON_OBJECT, and JSON_ARRAY to format query results as model input
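The six RAG steps can be sketched end to end in Python. This is an illustrative model, not T-SQL: embed and generate are hypothetical callables standing in for AI_GENERATE_EMBEDDINGS (or an embeddings REST call) and for the chat-completion call you would make with sp_invoke_external_rest_endpoint. Retrieval uses exact cosine distance over an in-memory list, and the prompt wording is an assumption. The retrieved rows are passed to the model as JSON, echoing the FOR JSON / JSON_OBJECT point above.

```python
import json
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def answer(question: str,
           chunks: list[dict],              # each: {"id", "text", "embedding"}
           embed: Callable[[str], Vector],  # hypothetical embedding call
           generate: Callable[[str], str],  # hypothetical language-model call
           top_k: int = 3) -> str:
    # (1) receive the user query, (2) generate its embedding
    query_vector = embed(question)
    # (3) retrieve the most similar chunks (exact search here; VECTOR_SEARCH at scale)
    context = sorted(chunks, key=lambda c: cosine_distance(query_vector, c["embedding"]))[:top_k]
    # (4) construct a prompt that carries the retrieved context as JSON
    context_json = json.dumps([{"id": c["id"], "text": c["text"]} for c in context])
    prompt = ("Answer using only the context below.\n"
              f"Context: {context_json}\n"
              f"Question: {question}")
    # (5) send the prompt to the model, (6) return its response
    return generate(prompt)


VOCAB = ["refund", "shipping", "warranty"]


def toy_embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: keyword counts (+0.01 so no vector is all zeros).
    return [text.lower().count(word) + 0.01 for word in VOCAB]


docs = [{"id": i, "text": t, "embedding": toy_embed(t)} for i, t in enumerate(
    ["Refunds are issued within 30 days.", "Shipping takes 3-5 business days."], start=1)]
# Echo "model" so the constructed prompt is visible.
print(answer("How do I get a refund?", docs, toy_embed, generate=lambda p: p, top_k=1))
```

Note that nothing in the pipeline touches the model's weights: the only thing that changes per question is the prompt.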

Common Exam Traps

VECTOR_DISTANCE does NOT use indexes — it scans all vectors for exact results. VECTOR_SEARCH uses DiskANN indexes for approximate results. Mixing these up is a common exam mistake
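The difference is easiest to see in what exact search has to do. A query shaped like SELECT TOP (k) ... ORDER BY VECTOR_DISTANCE('cosine', @q, embedding) must score every row. The Python sketch below models that full scan with made-up two-dimensional vectors (real embeddings have hundreds or thousands of dimensions); it illustrates the cost model, not SQL Server internals.

```python
import heapq
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def exact_top_k(query, rows, k):
    """ENN: compute the distance to EVERY stored vector and keep the k smallest.
    Cost grows linearly with the number of rows; no index is consulted."""
    return heapq.nsmallest(k, rows, key=lambda row: cosine_distance(query, row[1]))


rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print([row_id for row_id, _ in exact_top_k([1.0, 0.1], rows, k=2)])  # ['a', 'c']
```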

Cosine distance is the standard metric for text embeddings and NLP. Do not default to Euclidean unless the scenario specifically involves numeric/spatial data

Embedding maintenance is ongoing, not one-time. When source data changes, embeddings must be regenerated. The exam tests which maintenance method (triggers, CDC, CES, etc.) fits the update frequency and latency requirements

RAG does NOT retrain or fine-tune the model. It augments the prompt with retrieved context at query time. The model weights are unchanged

Chunking strategy matters for embedding quality. Chunks that are too small lose context; chunks that are too large dilute relevance. Overlap between chunks preserves continuity at boundaries
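A minimal sliding-window chunker shows how overlap works: each chunk starts chunk_size - overlap words after the previous one, so the last overlap words of one chunk reappear at the start of the next. This Python sketch splits on whitespace for simplicity; production pipelines usually count tokens with the embedding model's tokenizer, and the default sizes are illustrative assumptions rather than recommended values.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Raising overlap preserves more context at boundaries but produces more chunks, and therefore more embeddings to generate, store, and keep fresh.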

sp_invoke_external_rest_endpoint is a built-in SYSTEM stored procedure: you call it, you do not create it. Know its key parameters (@url, @method, @headers, @payload, @credential, and the @response output) for calling REST APIs from T-SQL

AI_GENERATE_EMBEDDINGS requires a CREATE EXTERNAL MODEL registration first. It is NOT the same as calling sp_invoke_external_rest_endpoint manually — AI_GENERATE_EMBEDDINGS is the higher-level, single-function approach that the exam emphasizes for embedding generation

VECTOR_NORMALIZE does not change what is stored in the database — it returns a new normalized vector as output. You must explicitly store the result if you want the normalized form. Also, dot product distance assumes pre-normalized vectors; using unnormalized vectors with dot product gives inaccurate similarity scores
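A Python sketch of the three norm types shows what VECTOR_NORMALIZE computes conceptually, and why the dot product of norm2-normalized vectors equals cosine similarity. It models the math only: like the T-SQL function, it returns a new vector and leaves its input untouched. It is not SQL Server's implementation.

```python
import math


def normalize(v: list[float], norm_type: str = "norm2") -> list[float]:
    """Return a NEW vector scaled to unit length under the chosen norm; v is not modified."""
    if norm_type == "norm1":
        length = sum(abs(x) for x in v)            # Manhattan length
    elif norm_type == "norm2":
        length = math.sqrt(sum(x * x for x in v))  # Euclidean length
    elif norm_type == "norminf":
        length = max(abs(x) for x in v)            # largest absolute component
    else:
        raise ValueError(f"unknown norm type: {norm_type}")
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(normalize([3.0, 4.0]))            # [0.6, 0.8]
print(dot(normalize(a), normalize(b)))  # same as cosine_similarity(a, b), up to rounding
print(cosine_similarity(a, b))
```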

Quick Check: Implement AI Capabilities in Database Solutions

A retail database has 10 million product descriptions and needs to support semantic search for finding products similar in meaning to a user's natural language query. The search must return results in under 100 milliseconds. Which approach should be used?

Technologies and Concepts You Must Not Confuse

These pairs are among the most common sources of exam traps. Learn the difference and you'll avoid many of them.

VECTOR_DISTANCE (ENN) vs VECTOR_SEARCH (ANN)

Use VECTOR_DISTANCE (ENN) when…

Performs exact nearest neighbor search by computing the distance between the query vector and every stored vector. Returns exact results but is slower on large datasets.

Use VECTOR_SEARCH (ANN) when…

Uses DiskANN vector indexes for approximate nearest neighbor search. Much faster on large datasets but returns approximate results that may miss some true nearest neighbors.

Exam trap

VECTOR_DISTANCE gives exact results but scans all vectors. VECTOR_SEARCH uses indexes for speed but sacrifices some accuracy. Choose based on dataset size and accuracy requirements. Small datasets can use ENN; production-scale embeddings need ANN.
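One common way to quantify how much accuracy ANN gives up is recall@k: on a sample of queries, treat the exact (ENN) results as ground truth and measure what fraction of them the approximate (ANN) results also returned. The Python helper below shows the calculation; the ID lists are hypothetical query results.

```python
def recall_at_k(exact_ids: list, approx_ids: list, k: int) -> float:
    """Fraction of the true top-k (from exact search) that the approximate top-k also found."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    return len(truth & set(approx_ids[:k])) / len(truth)


exact = [101, 205, 317, 422, 530]   # e.g. ordered by VECTOR_DISTANCE over all rows
approx = [101, 205, 422, 611, 702]  # e.g. returned by VECTOR_SEARCH on a DiskANN index
print(recall_at_k(exact, approx, k=5))  # 0.6
```

Measured alongside latency, this gives a concrete basis for deciding whether the ANN tradeoff is acceptable for a workload.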

Full-Text Search vs Semantic Vector Search

Use Full-Text Search when…

Keyword-based search that matches exact words and linguistic variations (stemming, inflections). Uses inverted indexes and ranking algorithms like BM25.

Use Semantic Vector Search when…

Meaning-based search that finds semantically similar content using vector embeddings. Can match conceptually related results even when no keywords overlap.

Exam trap

Full-text search matches words. Vector search matches meaning. Hybrid search combines both with Reciprocal Rank Fusion (RRF). The exam tests when each is appropriate: exact keyword lookups vs conceptual similarity.
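Reciprocal Rank Fusion needs only the rank positions from each list, which is why it can merge full-text and vector results whose raw scores are not comparable: each document earns 1 / (k + rank) from every list it appears in, and the sums are sorted. A Python sketch follows; k = 60 is the constant commonly used since the original RRF paper and is treated here as an adjustable assumption, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(*rankings: list[str], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked ID lists (best first) by summing 1 / (k + rank) across lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; documents found by both searches tend to rise to the top.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


full_text = ["A", "B", "C"]  # e.g. ordered by CONTAINSTABLE rank
vector = ["C", "A", "D"]     # e.g. ordered by vector distance
print([doc for doc, _ in reciprocal_rank_fusion(full_text, vector)])  # ['A', 'C', 'B', 'D']
```

Document C was only third by keywords but first by meaning, so fusion lifts it above B, which only the keyword search found.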

Always Encrypted vs Dynamic Data Masking

Use Always Encrypted when…

Encrypts data at rest and in transit with keys held only by the client. The database engine never sees plaintext. Protects against database admin and server compromise.

Use Dynamic Data Masking when…

Masks data at query time for non-privileged users but stores the actual data unencrypted. Users with UNMASK permission see real values. Does not protect stored data.

Exam trap

Always Encrypted protects data from everyone including DBAs: outside of a secure enclave, the engine never decrypts. Dynamic Data Masking only hides data in query results and is bypassed by anyone with the UNMASK permission. They solve fundamentally different threat models.

Row-Level Security (RLS) vs Dynamic Data Masking

Use Row-Level Security (RLS) when…

Controls which ROWS a user can see or modify by applying filter predicates and block predicates to table access. Non-visible rows are completely hidden from queries.

Use Dynamic Data Masking when…

Controls which COLUMN VALUES are visible by replacing sensitive data with masked values in query results. All rows are visible, but sensitive column data is obscured.

Exam trap

RLS hides entire ROWS. Dynamic Data Masking hides COLUMN VALUES. RLS means the row does not exist from the user's perspective. Masking means the row exists but sensitive values are replaced.
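A Python sketch makes the row-versus-column distinction concrete: the RLS-style filter removes rows, while the masking function returns every row with one column obscured. The table, the region-based predicate, and the mask format (similar in spirit to a partial() mask that keeps the last four digits) are illustrative assumptions; in SQL Server both are enforced by the engine, not by application code.

```python
ORDERS = [
    {"id": 1, "region": "East", "card": "4111111111111111"},
    {"id": 2, "region": "West", "card": "5500000000000004"},
]


def rls_filter(rows: list[dict], user_region: str) -> list[dict]:
    """Filter-predicate style: rows outside the user's region are not returned at all."""
    return [row for row in rows if row["region"] == user_region]


def mask_cards(rows: list[dict]) -> list[dict]:
    """Masking style: every row is returned, but the card column is obscured."""
    return [{**row, "card": "XXXX-XXXX-XXXX-" + row["card"][-4:]} for row in rows]


print(rls_filter(ORDERS, "East"))  # only order 1; order 2 "does not exist" for this user
print(mask_cards(ORDERS))          # both orders, with card numbers masked
```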

CDC (Change Data Capture) vs Change Tracking

Use CDC (Change Data Capture) when…

Captures full before-and-after images of changed rows in dedicated change tables. Records the actual data values that changed, enabling full audit trails.

Use Change Tracking when…

Tracks only that a row changed and the primary key of the changed row. Does not record what the old or new values were. Lighter weight than CDC.

Exam trap

CDC captures WHAT changed (full row data). Change Tracking only captures WHICH rows changed (primary keys only). Use CDC when you need the actual data values. Use Change Tracking for sync scenarios where you just need to know what to re-process.
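For embedding maintenance, the practical difference shows up when deciding which rows to re-embed. In this Python sketch the change records are simplified, hypothetical shapes: CDC rows carry an __$operation code (2 = insert, 3 = update before-image, 4 = update after-image, as in CDC change tables) plus column values, while Change Tracking rows carry only the key and an I/U/D operation. With key-only information the simple approach is to re-embed every inserted or updated row; with before/after images you can skip updates that did not touch the embedded column. Change Tracking's optional column tracking can also narrow this, which the sketch ignores.

```python
def reembed_keys_change_tracking(ct_rows: list[dict]) -> list[int]:
    """Key-only records: re-read and re-embed every inserted or updated row."""
    return sorted({r["product_id"] for r in ct_rows if r["op"] in ("I", "U")})


def reembed_keys_cdc(cdc_rows: list[dict], column: str = "description") -> list[int]:
    """Before/after images: re-embed inserts, and updates where `column` actually changed.
    Simplification: assumes at most one update per key in the batch."""
    before = {r["product_id"]: r[column] for r in cdc_rows if r["__$operation"] == 3}
    keys = set()
    for r in cdc_rows:
        if r["__$operation"] == 2:
            keys.add(r["product_id"])
        elif r["__$operation"] == 4 and before.get(r["product_id"]) != r[column]:
            keys.add(r["product_id"])
    return sorted(keys)


# A price-only UPDATE to product 42, seen through each mechanism.
ct_rows = [{"op": "U", "product_id": 42}]
cdc_rows = [
    {"__$operation": 3, "product_id": 42, "description": "Red ceramic mug", "price": 9.99},
    {"__$operation": 4, "product_id": 42, "description": "Red ceramic mug", "price": 7.99},
]
print(reembed_keys_change_tracking(ct_rows))  # [42]: must assume the text changed
print(reembed_keys_cdc(cdc_rows))             # []: description unchanged, no new embedding needed
```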

SQL Database Projects vs Direct Database Deployment

Use SQL Database Projects when…

Schema-as-code approach where the database model lives in source control. Changes go through CI/CD pipelines with build validation, schema drift detection, and approval gates.

Use Direct Database Deployment when…

Applying DDL changes directly to a production database without source control or pipeline validation. Fast but risky — no version history, no rollback, no drift detection.

Exam trap

The exam strongly favors SQL Database Projects with CI/CD. Know the full pipeline: source control, branching policies, build validation, schema drift detection, secrets management, and deployment with approval triggers.

REST Endpoints (DAB) vs GraphQL Endpoints (DAB)

Use REST Endpoints (DAB) when…

Resource-based API style with predictable URL patterns. Each entity gets its own endpoint. Simple to consume but may require multiple requests for related data.

Use GraphQL Endpoints (DAB) when…

Query-based API style where clients specify exactly which fields and relationships to return in a single request. Reduces over-fetching and under-fetching.

Exam trap

Data API builder generates BOTH from the same configuration file. Know when GraphQL is preferred (complex relationships, reducing round trips) vs REST (simple CRUD, caching, broad client support). Also know that DAB supports MCP endpoints.

Cosine Distance vs Euclidean Distance

Use Cosine Distance when…

Measures the angle between two vectors, ignoring magnitude. Best for text embeddings, NLP, and RAG where direction (meaning) matters more than scale.

Use Euclidean Distance when…

Measures the straight-line distance between two points in vector space. Best for numeric/spatial data where magnitude differences are meaningful.

Exam trap

VECTOR_DISTANCE supports three metrics: cosine, Euclidean, and dot product. Cosine is the usual choice for text/NLP workloads, Euclidean suits numeric data where magnitude is meaningful, and dot product is intended for pre-normalized vectors. The exam expects you to pick the right metric for the scenario.
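The three metrics can be compared side by side in Python. The sketch assumes, per our reading of the VECTOR_DISTANCE reference, that the dot metric returns the negative dot product so that, as with cosine and Euclidean, a smaller value means more similar; verify this against the documentation. The vectors are made up: the first pair points the same way at different magnitudes, and the second pair is already unit length.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    return math.dist(a, b)


def negative_dot(a, b):
    # Assumed convention: negate so that smaller means more similar.
    return -sum(x * y for x, y in zip(a, b))


# Same direction, double the magnitude: cosine treats them as identical, Euclidean does not.
short, long_ = [1.0, 2.0], [2.0, 4.0]
print(cosine_distance(short, long_))     # approximately 0.0
print(euclidean_distance(short, long_))  # approximately 2.236, i.e. sqrt(5)

# For unit-length vectors, negative dot equals cosine distance minus 1, so both rank alike.
u, v = [0.6, 0.8], [0.8, 0.6]
print(negative_dot(u, v), cosine_distance(u, v) - 1)  # both approximately -0.96
```

This is the reason dot product is recommended only for normalized vectors: once every vector has length 1, magnitude drops out and the ranking matches cosine, usually with less computation.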

Top Mistakes to Avoid

Confusing VECTOR_DISTANCE (exact nearest neighbor, no index, scans all vectors) with VECTOR_SEARCH (approximate nearest neighbor, uses DiskANN index, faster but approximate)

Using Euclidean distance for text embeddings when cosine distance is the standard metric for NLP and RAG workloads

Thinking Always Encrypted and Dynamic Data Masking provide the same level of protection — Always Encrypted prevents even DBAs from seeing data, while masking only hides data in query results

Confusing Row-Level Security (hides entire rows) with Dynamic Data Masking (hides column values) — they operate on different axes of access control

Mixing up CDC (captures full before/after row data) with Change Tracking (captures only which rows changed by primary key) when choosing an embedding maintenance method

Confusing AI_GENERATE_EMBEDDINGS (the native T-SQL function paired with CREATE EXTERNAL MODEL) with sp_invoke_external_rest_endpoint (a lower-level raw REST call) — the exam expects you to know both and when to use each

Assuming RAG fine-tunes or retrains the AI model — RAG only augments the prompt with retrieved context at query time without changing model weights

Confusing JSON_ARRAY (constructs an array from scalar values) with JSON_ARRAYAGG (aggregates multiple rows into a JSON array) — different purposes, different syntax

Not understanding that SQL Database Projects detect schema drift but do not automatically resolve it — the developer must decide how to handle the divergence

Forgetting that Data API builder generates both REST and GraphQL endpoints simultaneously from one configuration file — you do not have to choose one

Thinking temporal tables provide tamper detection — they track history but do not cryptographically verify integrity like ledger tables do
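Two of the mistakes above, exact versus approximate search and what RAG actually does, are easier to keep straight with a small model of the retrieval step. The following Python sketch is a stand-in, not SQL Server code: `exact_top_k` compares the query against every stored vector, which is what an exact (ENN) query with VECTOR_DISTANCE does, rather than consulting a DiskANN index as VECTOR_SEARCH does. `build_prompt` then shows that RAG only adds the retrieved text to the prompt; no model weights change. The document store, its contents, the toy 3-dimensional vectors, and both function names are hypothetical; real embeddings would come from an embedding model.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def exact_top_k(query, store, k):
    """Exact nearest neighbors: score every stored vector, keep the k closest.
    Cost grows linearly with the number of rows (a full scan, no index)."""
    scored = ((cosine_distance(query, vec), doc_id) for doc_id, vec in store.items())
    return [doc_id for _, doc_id in heapq.nsmallest(k, scored)]


def build_prompt(question, passages):
    """RAG augmentation: retrieved text is inserted into the prompt at query
    time. The model itself is not retrained or fine-tuned."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents and toy embeddings.
texts = {
    "doc1": "Ledger tables provide cryptographic tamper evidence.",
    "doc2": "Temporal tables keep row history.",
    "doc3": "Dynamic Data Masking hides column values in query results.",
}
store = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.7, 0.6, 0.1],
    "doc3": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    query_vec = [0.95, 0.05, 0.0]  # pretend embedding of the user question
    top = exact_top_k(query_vec, store, k=2)  # ['doc1', 'doc2']
    print(build_prompt("Which table type detects tampering?", [texts[d] for d in top]))
```

An approximate index avoids the full scan in `exact_top_k`, which is why VECTOR_SEARCH scales better but may occasionally miss a true nearest neighbor.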

Exam-Ready Checklist

Can explain all 3 exam domains and their relative weights (37%, 37%, 26%)

Know how to design tables with appropriate data types, indexes (clustered, nonclustered, columnstore), constraints, and partitioning strategies

Can write advanced T-SQL including CTEs, window functions, JSON functions, regular expressions, fuzzy matching, and graph queries

Understand all specialized table types: temporal, ledger, in-memory, external, and graph — and when to use each

Can implement and differentiate Always Encrypted, Dynamic Data Masking, and Row-Level Security

Know how to read query execution plans and use DMVs, Query Store, and Query Performance Insight to diagnose performance issues

Understand CI/CD with SQL Database Projects: SDK-style projects, source control, branching, schema drift detection, and deployment pipelines

Can configure Data API builder for REST and GraphQL endpoints including entity exposure, pagination, caching, and GraphQL relationships

Know how to generate embeddings with AI_GENERATE_EMBEDDINGS (paired with CREATE EXTERNAL MODEL) or sp_invoke_external_rest_endpoint, and store them in vector columns

Can distinguish VECTOR_DISTANCE (ENN) from VECTOR_SEARCH (ANN) and choose the right distance metric (cosine, Euclidean, dot product)

Understand hybrid search implementation and Reciprocal Rank Fusion (RRF) for merging keyword and semantic results

Can implement a complete RAG workflow: query embedding, context retrieval, prompt construction, model invocation, and response extraction

Know the embedding maintenance methods (triggers, CDC, Change Event Streaming/CES, Change Tracking, Azure Functions, and Logic Apps) and when to use each

Have scored 70% or higher on at least two full mock exams (the passing score is 700 out of 1000)
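Reciprocal Rank Fusion, mentioned in the hybrid-search item above, is simple enough to sketch in a few lines. Each document's fused score is the sum of 1 / (k + rank) over every result list it appears in, using 1-based ranks, so documents ranked well by both keyword and semantic search rise to the top. This Python version is illustrative only: k = 60 is a conventional default in many RRF implementations, not a value the exam or SQL Server prescribes, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs with RRF.

    Each list is ordered best-first. A document's fused score is the sum of
    1 / (k + rank) over the lists that contain it (rank is 1-based).
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. full-text search order
    semantic_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. vector search order
    print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
    # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only ranks, it sidesteps the problem that keyword relevance scores and vector distances are on completely different scales.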

Recommended Resources

Free & Official Resources

Microsoft DP-800 Official Study Guide

Official exam study guide with a complete breakdown of the skills measured, exam objectives, and links to relevant Microsoft Learn modules.

Official

Microsoft Learn: DP-800 Learning Path

Official self-paced learning paths covering all three exam domains with hands-on exercises and knowledge checks.

Official

SQL Server 2025 Vector Search Documentation

Official documentation for the vector data type, VECTOR_DISTANCE, VECTOR_SEARCH, and DiskANN indexes in SQL Server 2025.

Free

Data API Builder Documentation

Complete documentation for Data API builder including configuration, REST/GraphQL endpoints, and deployment guides.

Free

SQL Database Projects Documentation

Official tutorials for creating, building, and deploying SDK-style SQL Database Projects with CI/CD integration.

Free

Community Study Guide (GitHub)

Open-source community study guide with 60+ practice questions, mock exams, cheat sheets, hands-on labs, and spaced-repetition Anki deck.

Free

Paid Courses & Practice Exams

These are recommended if you prefer a structured learning path. They can save time but are not required to pass.

Microsoft Course DP-800T00: Develop AI-Enabled Database Solutions

Official 3-day instructor-led training course covering all exam domains with hands-on labs across SQL Server, Azure SQL, and Fabric.

Paid

Udemy: DP-800 Microsoft Certified SQL AI Developer Associate

Third-party video course covering exam objectives with practice questions and hands-on demonstrations.

Paid

Frequently Asked Questions