MicrosoftAI-1035 domains

AI-103 Exam Notes

Last-minute traps, must-know facts, and scenario tips for the Microsoft Certified: Azure AI Apps and Agents Developer Associate exam.

General Exam Tips

1.Read every scenario as a requirements ticket: identify the desired outcome first, then identify the constraint (cost, latency, update frequency, security), then eliminate wrong-layer answers.
2.Domain 2 (Generative AI and Agentic Solutions, 33%) plus Domain 1 (Plan and Manage, 28%) make up over 60% of the exam. Weight your study time accordingly — if you are weak on agents or RAG, fix that before anything else.
3.When a question involves authentication, the correct answer is almost always managed identity with RBAC, not API keys. This rule applies across all scenario types.
4.Hands-on experience with Azure AI Foundry changes how you answer scenario questions. Candidates who only study theory get tricked by plausible-sounding wrong answers. Build at least one RAG pipeline and one agent before exam day.
5.The exam is 40-60 questions in 120 minutes — about 2 minutes per question. For case study sections, read the scenario description once carefully, then answer each sub-question referring back to specific details.
6.There is no penalty for wrong answers. If you are unsure, eliminate two clearly wrong options and guess between the remaining ones. Never leave a question blank.
7.Custom Vision is almost never the correct answer in 2026 — Azure AI Vision (Image Analysis 4.0) is the preferred modern service. When you see Custom Vision as an option, treat it as a distractor.
8.Watch for questions that ask about 'the least-privilege role' — the answer is almost always Cognitive Services OpenAI User, not Contributor. Contributor grants management-plane access and violates least-privilege.
9.When a question involves both Azure AI Search and Foundry IQ, they are NOT interchangeable — Foundry IQ is the higher-level abstraction that uses Azure AI Search underneath. Choosing one when the other is correct costs you the question.
10.For agent tool questions: file_search = searching uploaded documents (supports multiple vector store IDs); code_interpreter = executing Python code in a sandbox; azure_ai_search = querying an existing Azure AI Search index; function calling = invoking external APIs with structured arguments; bing_grounding = real-time public web retrieval with inline citations.
11.Do NOT study old AI-102 dumps. AI-103 tests Foundry, agentic workflows, RAG evaluation, Content Understanding, and modern safety controls — topics that are underweighted or absent in AI-102 material.
12.When a question asks about Content Understanding modes: single-task (standard) mode covers ALL content types (documents, images, audio, video). Pro mode is DOCUMENTS ONLY but adds multi-step reasoning and multi-input document support.
13.Questions describe symptoms, not service names. A scenario about 'picking up daily updates without retraining' is asking about RAG. A scenario about 'consistent product name translation at scale' is asking about Azure Translator with custom glossaries.

Quick Navigation

Plan and Manage an Azure AI Solution Implement Generative AI and Agentic Solutions Implement Computer Vision Solutions Implement Text Analysis Solutions Implement Information Extraction Solutions

Domain 128% of exam

Plan and Manage an Azure AI Solution

Must-Know Facts

Managed identity with RBAC is the production authentication pattern — keyless credentials are explicitly preferred over API keys on every security question.
The least-privilege inference role is Cognitive Services OpenAI User, not Contributor. Cognitive Services OpenAI User = inference API calls only (cannot create/edit deployments, cannot fine-tune, cannot create guardrails). Cognitive Services OpenAI Contributor = inference + fine-tuning + creating deployments. Neither can create new Azure OpenAI resources or access quota — those require Cognitive Services Contributor or Usages Reader.
Foundry projects are the organizing unit for AI-103. Resources (models, agents, search indexes) live inside a project. CI/CD pipelines connect to the project, not to individual service deployments.
Three deployment options for models in Foundry: serverless (pay-per-token, shared capacity, variable latency), managed compute (dedicated VMs), and provisioned throughput (reserved token capacity, predictable latency, billed hourly even when idle). Provisioned throughput is only cost-effective for high-volume, predictable production workloads.
Content filters in Azure OpenAI are independently configurable for BOTH the prompt input side AND the completion output side. Each of the four harm categories (hate, self-harm, sexual, violence) has its own severity threshold per side. Know the category names and that each is configured per side, per deployment.
Guardrails in Microsoft Foundry are the broader enforcement policy framework that covers FOUR intervention points: user input, tool call (Preview), tool response (Preview), and final output. Guardrails are assigned at the model deployment or agent level. A critical exam rule: an agent's guardrail FULLY OVERRIDES the underlying model deployment's guardrail. If no guardrail is assigned to the agent, it inherits the model deployment's guardrail.
Prompt Shields protect against two distinct attack types: direct jailbreaks (user input attempts to bypass the system prompt) and indirect prompt injection (malicious instructions embedded in retrieved content, web pages, or image text).
Monitoring in Foundry covers five distinct signals: tracing (step-by-step execution logs), token analytics (usage and cost per deployment), safety signals (content filter trigger events), latency breakdowns (which processing step is slow), and grounding quality (relevance of retrieved context to the query).
Quota management: token quotas are set per model deployment in TPM (tokens per minute). Provisioned throughput reserves dedicated capacity measured in PTUs (provisioned throughput units) — billed hourly regardless of usage.
Agent governance oversight modes: autonomous (agent acts without human confirmation) vs. semi-autonomous (agent requests human approval before irreversible or high-risk actions). Risk level drives the oversight mode choice, not agent accuracy.
Foundry safety evaluators are MEASUREMENT tools used during testing, development, and red-teaming — they produce scores, not enforcement actions. Content filters are ENFORCEMENT mechanisms that block or flag content in real-time. These serve different purposes and are separate tools.

Common Traps

TrapChoosing Contributor role for an app that only needs to call OpenAI inference endpoints

RealityCognitive Services OpenAI User is the correct least-privilege role for inference. Contributor grants management-plane access and violates least-privilege. This is consistently the security anti-pattern wrong answer on the exam.

TrapConfiguring content moderation at the individual prompt level, or assuming one guardrail setting covers everything

RealityContent filters are configured at the Azure OpenAI resource level (deployment scope), not per-prompt in code. For Foundry guardrails: an agent's guardrail FULLY overrides the model deployment's guardrail — they do not merge. An agent without its own guardrail inherits the model's. An agent with a guardrail completely ignores the model's guardrail.

TrapTreating CI/CD integration as connecting to individual OpenAI model deployments

RealityCI/CD pipelines in Foundry connect at the project level, which manages model endpoints, agents, and evaluation runs as a coordinated unit. Connecting directly to individual deployments bypasses Foundry orchestration.

TrapThinking Foundry safety evaluators and content filters are the same tool

RealityFoundry safety evaluators score inputs/outputs after the fact — they are for testing and red-teaming. Content filters (Azure AI Content Safety) block content in real-time based on configured severity thresholds. Evaluators measure; filters enforce. They are invoked at different times and serve different purposes.

TrapMonitoring grounding quality and monitoring model performance as the same metric

RealityGrounding quality measures whether retrieved chunks are relevant to the query — a retrieval metric. Model performance measures whether the model generates accurate and coherent text given that context — a generation metric. They require different measurement approaches.

TrapChoosing provisioned throughput for dev/test workloads

RealityProvisioned throughput reserves dedicated capacity and is billed hourly even when idle. It is only cost-effective for high-volume, predictable production traffic. For dev/test, serverless pay-per-token is always the correct answer.

TrapAssuming content filters apply only to the input (user prompts)

RealityContent filters in Azure OpenAI are independently configurable for both prompt input AND completion output. Missing that output filtering is a separate, independently configured control is a common scoring mistake.

Confusing Pairs

Prompt Shields (Direct Injection)Prompt Shields (Indirect Injection)

Direct prompt injection = a user deliberately tries to override the system prompt via their own input (jailbreaks, 'ignore all previous instructions'). Indirect prompt injection = malicious instructions embedded in retrieved content (RAG documents, web search results, or image text) that the model processes as trusted data. Both are mitigated by Prompt Shields but detected and configured at different points in the pipeline.

Serverless DeploymentProvisioned Throughput

Serverless = pay-per-token, shared capacity, variable latency under load, zero upfront commitment — right for variable or low-volume workloads and dev/test. Provisioned Throughput = reserved token capacity (PTUs), predictable latency, billed hourly regardless of usage — right for high-volume production workloads with SLA requirements.

Guardrails (Foundry enforcement framework)Content Filters (Azure AI Content Safety)

Guardrails = the named enforcement policy framework in Microsoft Foundry that covers FOUR intervention points (user input, tool call, tool response, output) and is assigned at the model or agent level. Content Safety classification models are the underlying detection engine. Key override rule: agent guardrail fully overrides model deployment guardrail — they do not inherit or merge.

Foundry Safety EvaluatorsContent Filters

Safety evaluators = post-generation measurement tools used during development and testing, produce scores and reports for red-teaming. Content filters = real-time enforcement gates that block or allow requests and responses in production. Evaluators measure; filters enforce. Both are needed but used at different lifecycle stages.

Trace LoggingToken Analytics

Trace logging captures step-by-step execution of an agent or prompt flow — which tools were called, what their inputs/outputs were, and how long each step took. Token analytics tracks token consumption per model, per deployment, per time period for cost management and quota monitoring. Both are in Foundry observability but answer different operational questions.

Scenario Tips

If the question asks about:

When the question asks how to secure a Foundry application connecting to Azure OpenAI in production without storing credentials in code

Answer:

Use managed identity assigned to the compute resource (App Service, Container App, Function App) with Cognitive Services OpenAI User RBAC role. No API keys, no Key Vault lookup at runtime.

Distractor to avoid:

API keys stored in Key Vault is often presented as secure — but managed identity is the explicitly preferred production pattern. Key Vault + API keys is acceptable but managed identity is the better answer when both are options.

If the question asks about:

When the question asks what to do when an AI agent is being used for high-stakes actions like financial transactions or deleting records

Answer:

Implement semi-autonomous mode with approval workflows. Human-in-the-loop confirmation is required before irreversible high-risk actions.

Distractor to avoid:

Fully autonomous mode might seem correct if agent accuracy is high, but the exam expects risk level (not accuracy) to drive the oversight mode decision.

If the question asks about:

When the question asks how to configure the agent to use stricter content filtering than the underlying model deployment

Answer:

Assign a separate guardrail with stricter settings directly to the agent. The agent's guardrail fully overrides the model deployment's guardrail — the agent applies its own policy, not the model's.

Distractor to avoid:

Modifying the model deployment's guardrail — this changes the policy for all consumers of that deployment, not just the specific agent, and may not be the right scope.

If the question asks about:

When the question asks about blocking a specific category of harmful content from both user prompts and model responses

Answer:

Configure the specific harm category (hate, self-harm, sexual, or violence) with separate severity thresholds for both input (prompt) and output (completion) sides in the Azure OpenAI content filtering settings.

Distractor to avoid:

Enabling a generic 'harmful content' filter — content filters require specific category and severity threshold configuration per side, not a single binary switch.

Last-Minute Facts

1Passing score: 700 out of 1000 (scaled). No penalty for wrong answers — always answer every question.

2Cognitive Services OpenAI User = inference only. Cognitive Services OpenAI Contributor = inference + fine-tuning + managing deployments. Neither creates new OpenAI resources or manages quota.

3Guardrail override rule: agent guardrail FULLY overrides model deployment guardrail. No inheritance or merging. An agent without its own guardrail inherits the model's.

4Model deployment options: serverless (pay-per-token, dev/test), managed compute (dedicated VMs), provisioned throughput (reserved PTUs, hourly billing, for high-volume production).

5Prompt Shields: direct injection (user jailbreaks) vs. indirect injection (malicious instructions in retrieved content or image text).

6Content filter harm categories: hate, self-harm, sexual, violence. Each is independently configurable per side (input vs. output) per deployment.

7Protected material detection covers both text (copyrighted content) and code (licensed open-source code).

8Content filters are configured at RESOURCE level (deployment scope), not per-prompt. CI/CD connects at PROJECT level, not deployment level.

Domain 233% of exam

Implement Generative AI and Agentic Solutions

Must-Know Facts

RAG architecture: document ingestion → chunking → embedding generation → indexing in Azure AI Search → hybrid retrieval (keyword BM25 + vector HNSW) → semantic ranker re-ranking → prompt augmentation → LLM response. Know each stage and what failure at each stage means for retrieval quality.
Chunking strategies: fixed-size chunking is simple but may split context mid-sentence. Semantic chunking preserves contextual boundaries but is more compute-intensive. The exam tests which to use based on document content type and query patterns.
Hybrid search in Azure AI Search combines keyword search (BM25) with vector search (HNSW or KNN). Hybrid retrieval results are merged using Reciprocal Rank Fusion (RRF). After hybrid retrieval, the semantic ranker re-ranks results using a transformer model. Semantic ranking is a POST-retrieval re-ranking step, not a retrieval method itself.
Integrated vectorization: Azure AI Search can automatically generate embeddings during indexing using a configured embedding model, eliminating the need to manage external embedding pipelines.
Agent thread/run/message model: a Thread persists the full conversation history. Messages are individual user and agent turns within a thread. A Run is a single agent invocation on a thread that may call multiple tools before producing the final response.
Agent tools available in Foundry Agent Service: file_search (managed vector store, supports multiple vector_store_ids), code_interpreter (sandboxed Python execution), azure_ai_search (query an existing Azure AI Search index), function calling (invoke external APIs with structured JSON schemas), bing_grounding (real-time web retrieval with citations). Max 128 tools per agent.
Azure AI Search tool limitation: one index endpoint per tool configuration. To query multiple independent indexes, use connected agents (A2A) where each sub-agent owns its own index. File Search is different — it supports multiple vector_store_ids on a single tool instance.
Multi-agent orchestration: a primary orchestrator agent routes tasks to specialized sub-agents. Each sub-agent has its own tools, instructions, memory, and thread. A coordination layer passes shared context between agents — each agent's memory is not automatically visible to other agents.
Prompt Flow vs. Agents: Prompt Flow = deterministic, versioned, evaluated input-output pipeline where developer specifies every step. Foundry Agent Service = autonomous system where the model decides which tools to call based on reasoning. Use Prompt Flow when you need reproducible pipelines with formal evaluation artifacts. Use Agents when tasks require open-ended reasoning and dynamic tool selection.
Fabrication detection (groundedness checking) is a post-generation evaluation step using Foundry evaluators. It measures whether the model's response is supported by retrieved context. It is NOT a real-time guardrail — it does not prevent hallucinations during production inference.
Model selection: GPT-4o-mini for cost-sensitive high-volume tasks; GPT-4o for capability-first tasks; o1/o3-mini for complex multi-step reasoning that benefits from extended chain-of-thought; Phi models (SLMs) for edge deployments, cost constraints, or latency-sensitive scenarios.

Common Traps

TrapChoosing fine-tuning when the knowledge base is updated frequently or contains private external data

RealityFine-tuning modifies model weights and cannot incorporate real-time data updates. RAG retrieves data at query time and naturally handles daily updates. For dynamic knowledge bases, frequent updates, or private enterprise data, RAG is almost always the correct answer. Fine-tuning is for changing model behavior, style, or format — not for adding external facts.

TrapTreating function calling and prompt injection as related or equivalent risks

RealityFunction calling is a structured, authorized mechanism where the developer defines tool schemas the model can invoke. Prompt injection is a security attack where malicious users or retrieved content attempt to hijack model behavior. One is an intended feature; the other is a threat vector.

TrapAssuming vector search alone is sufficient for a production RAG system

RealityVector search finds conceptually similar content but can miss exact term matches. Hybrid search (keyword BM25 + vector HNSW) followed by semantic ranker re-ranking is the recommended production RAG retrieval pattern. Questions about best retrieval strategy for enterprise RAG expect hybrid search plus semantic ranker.

TrapBelieving fabrication detection prevents hallucinations in real-time production

RealityFoundry groundedness evaluators are evaluation tools used during testing and batch assessment. They produce scores and reports; they do not block hallucinations at runtime. Real-time grounding prevention comes from proper RAG design — ensuring the model has accurate, relevant retrieved context to work from.

TrapThinking each agent can query multiple independent Azure AI Search indexes through a single tool configuration

RealityThe Azure AI Search agent tool connects to one index endpoint per tool. To fan out across multiple independent indexes, use connected agents (A2A): sub-agents each configured with their own index, orchestrated by a primary agent. File Search supports multiple vector_store_ids on one instance — this difference is testable.

TrapConfusing Prompt Flow with agents for autonomous task execution

RealityPrompt Flow defines a fixed, versioned sequence of steps. It does not make autonomous decisions about which tools to use. Questions about 'dynamically selecting tools based on user intent at runtime' require agents, not Prompt Flow.

TrapTreating semantic ranker as a search method equivalent to vector search or keyword search

RealitySemantic ranker is a re-ranking step applied AFTER hybrid retrieval produces a candidate set. It does NOT retrieve documents itself. Removing semantic ranker from a hybrid pipeline removes only the re-ranking — keyword and vector search still run. This ordering is frequently tested.

Confusing Pairs

File Search (agent tool)Azure AI Search (agent tool)

File Search = searches files you upload to the agent's Foundry-managed vector store. Supports multiple vector_store_ids on one tool instance. Best for ad-hoc document Q&A where you control the documents. Azure AI Search = queries an existing Azure AI Search index you manage separately. One index per tool configuration. Best for enterprise data maintained outside Foundry.

Code Interpreter (agent tool)Function Calling (agent tool)

Code Interpreter = agent writes and executes Python code in a sandboxed environment to compute results, analyze uploaded data, or generate files. Self-contained computation. Function Calling = agent invokes external APIs and functions you have defined using structured JSON arguments. Reaches external systems and services.

RAG (Retrieval-Augmented Generation)Fine-Tuning

RAG = augments prompts with retrieved context at query time, no model weight changes, works with dynamic frequently updated data, cheaper and faster to implement. Fine-tuning = modifies model weights through additional training, bakes in knowledge permanently, expensive, requires retraining when knowledge changes. Use RAG for dynamic/external data. Use fine-tuning for style, tone, format, or behavior changes that do not depend on external facts.

Semantic RankerVector Search

Vector search = retrieves documents by cosine similarity between query embedding and document embeddings. It IS a retrieval step. Semantic ranker = a transformer model that re-ranks the top-k results from hybrid or keyword search using deeper linguistic understanding. It is a POST-retrieval re-ranking step, NOT a retrieval method. Both serve different roles in the same pipeline.

Groundedness (Evaluator)Relevance (Evaluator)

Groundedness = measures whether the model's response is SUPPORTED by the retrieved context (hallucination check — is the answer actually in the source documents?). Relevance = measures whether the retrieved context is PERTINENT to the query (retrieval quality check — did we retrieve the right documents?). Groundedness is a generation quality metric; relevance is a retrieval quality metric. A full RAG evaluation requires both.

Prompt FlowFoundry Agent Service

Prompt Flow = structured, versioned, evaluated pipeline with developer-defined inputs and outputs. Every step is specified. Best for reproducible, formally evaluated workflows. Foundry Agent Service = autonomous system where the model decides which tools to invoke. Best for open-ended, multi-turn tasks requiring adaptive reasoning. Agents are adaptive; Prompt Flow is deterministic.

Scenario Tips

If the question asks about:

When the question presents a knowledge base that is updated daily and asks how to keep AI responses current without retraining

Answer:

Implement RAG with Azure AI Search. Re-index updated documents when the knowledge base changes. The model retrieves fresh content at query time without any model retraining.

Distractor to avoid:

Fine-tuning the model on a schedule — this is expensive, slow, and cannot keep pace with daily updates. Fine-tuning is the consistently wrong answer for dynamic knowledge base scenarios.

If the question asks about:

When the question asks which agent tool to use when the agent needs to run data analysis on a spreadsheet uploaded by a user

Answer:

Code Interpreter — it allows the agent to write and execute Python code in a sandboxed environment to analyze uploaded data and return computed results.

Distractor to avoid:

Function calling — this invokes external APIs, not self-contained computation on uploaded files.

If the question asks about:

When the question describes needing a pipeline with formal versioning, evaluation runs, and reproducible execution for a specific workflow

Answer:

Use Prompt Flow — it provides versioned, evaluated, deployable pipelines with formal evaluation artifacts and deterministic step execution.

Distractor to avoid:

Foundry Agent Service — agents are autonomous and not suitable when you need deterministic, versioned execution with formal evaluation artifacts.

If the question asks about:

When the question asks which retrieval approach to use for a production RAG system where both keyword precision and semantic understanding matter

Answer:

Hybrid search (keyword BM25 + vector HNSW) combined with semantic ranker re-ranking. This is the recommended production RAG retrieval pattern in Microsoft Foundry.

Distractor to avoid:

Vector search alone — it misses exact keyword matches and is not the recommended production pattern when both precision and recall matter.

If the question asks about:

When the question asks about a multi-agent system and which pattern to use when one agent needs information from another agent's conversation context

Answer:

Implement a coordination/orchestration layer that explicitly passes shared context between agents. Each agent's memory is isolated — a sub-agent cannot access another sub-agent's thread without an explicit handoff.

Distractor to avoid:

Agents automatically share memory — this is incorrect. Agent memory isolation is a core design principle. Shared context requires explicit design.

If the question asks about:

When the question asks which model to choose for an edge deployment with strict cost and latency constraints

Answer:

Phi models (SLMs) — Microsoft's small language models designed for efficiency, low cost, and edge deployment while maintaining strong task performance.

Distractor to avoid:

GPT-4o — the most capable model but far too large and expensive for edge deployment scenarios.

Last-Minute Facts

1Agent Thread = persists conversation history. Run = single agent invocation on a thread (may call multiple tools). Message = one turn in the conversation.

2Max 128 tools per agent. Azure AI Search tool = one index per connection; multiple independent indexes = use A2A connected agents. File Search supports multiple vector_store_ids on one instance.

3Hybrid retrieval: keyword BM25 + vector HNSW merged via RRF. Semantic ranker then re-ranks the merged candidate set. Semantic ranker is POST-retrieval re-ranking, not retrieval itself.

4text-embedding-3-large = high-accuracy embedding model for Azure AI Search integrated vectorization. text-embedding-3-small = cost/speed tradeoff.

5Groundedness evaluator = checks if response is supported by retrieved context. NOT a real-time guardrail — it does not block hallucinations during production inference.

6Semantic chunking preserves context boundaries. Fixed-size chunking is simpler but may split mid-sentence.

7GPT-4o-mini = cost; GPT-4o = capability; o1/o3-mini = complex multi-step reasoning; Phi = SLM for edge and cost-constrained scenarios.

8ReAct loop: Think (select tool) → Act (call tool) → Observe (process result) → Repeat. Agents iterate this loop until goal achieved or max iterations reached.

Domain 313% of exam

Implement Computer Vision Solutions

Must-Know Facts

DALL-E operations: text-to-image (generate from prompt), inpainting (fill in a masked/damaged region of an existing image), variation (generate variations of an existing image), and mask-based editing (modify a precisely specified region while preserving everything outside the mask).
Content Understanding has two modes: single-task (standard) mode and pro mode. Single-task mode supports ALL content types — documents, images, audio, and video — with lower cost and latency. Pro mode supports DOCUMENTS ONLY but adds multi-step reasoning, multi-input document support, and reference data integration. This distinction is a recurring exam trap.
GPT-4o with Vision and GPT-4 Vision perform visual question answering — they can answer questions grounded in the content of submitted images. This is a multimodal capability distinct from traditional image classification.
Accessibility-compliant alt-text must describe the function and context of the image, not just visual appearance. Accessibility alt-text is more than a generic caption — it must convey meaning suitable for screen readers and align with WCAG guidelines.
Content Understanding pipelines for video analysis can extract: transcriptions, scene segmentation, object detection in frames, and topic identification. This is a different use case than Azure Video Indexer.
Indirect prompt injection via embedded image text: malicious instructions hidden as text within uploaded images can be processed by multimodal models as if they were trusted instructions. Prompt Shields must scan image text to detect this attack vector.
Image generation parameters: resolution (e.g., 1024x1024), quality (standard vs. HD), style (natural vs. vivid), and number of images generated per request.
Custom Vision is the legacy image classification service. Azure AI Vision (Image Analysis 4.0) is the current preferred service for classification, object detection, captioning, and OCR on images.

Common Traps

TrapSelecting Custom Vision for image classification or object detection tasks in 2026

RealityAzure AI Vision (Image Analysis 4.0) is the current preferred service. It has built-in pre-trained models for classification, object detection, and captioning. Custom Vision is legacy — when both appear as options, Image Analysis 4.0 is almost always correct.

TrapTreating inpainting and mask-based editing as the same operation

RealityInpainting fills in missing, corrupted, or removed regions of an image using AI generation. Mask-based editing uses a precisely defined mask to specify exactly which region to modify while preserving everything outside the mask. Inpainting is about restoration; mask-based editing is about targeted modification.

TrapThinking Content Understanding pro mode supports video and audio while single-task mode does not

RealityThis is exactly backwards. Single-task (standard) mode supports ALL content types including video, audio, images, and documents. Pro mode is DOCUMENTS ONLY. Use pro mode only when you need multi-step reasoning or multi-input document support with documents specifically.

TrapAssuming a generic image caption satisfies accessibility requirements

RealityAccessibility-compliant alt-text must describe the purpose and context of the image in a way that conveys meaning to screen reader users — not just describe visual appearance. Accessibility-focused questions in healthcare and public-sector scenarios specifically test this distinction.

TrapOverlooking embedded text in user-uploaded images as a security risk

RealityWhen users can upload images to a multimodal AI app, attackers can embed malicious instructions in text that appears in the image. The model may process this embedded text as trusted instructions. Prompt Shields must inspect image text content to detect indirect prompt injection.

TrapChoosing Azure Video Indexer for a Foundry-native agentic video processing pipeline

RealityAzure Video Indexer is a standalone service with its own portal and API — it is not the Foundry-native choice. For agent pipelines and RAG systems that process video content, use Content Understanding single-task mode pipelines.

Confusing Pairs

Azure AI Vision (Image Analysis 4.0)Custom Vision

Image Analysis 4.0 = current, pre-trained, multimodal-capable service for image classification, object detection, captioning, OCR, and visual question answering. No training required. Custom Vision = legacy service requiring custom training for classification and detection tasks. In 2026, always prefer Image Analysis 4.0 unless the scenario specifically requires domain-specific training that built-in models cannot address.

Content Understanding Single-Task ModeContent Understanding Pro Mode

Single-task (standard) mode = supports ALL content types (documents, images, audio, VIDEO) with lower cost and latency. Best for broad multimodal processing. Pro mode = supports DOCUMENTS ONLY but adds multi-step reasoning, multi-input document support, and reference data integration. More expensive and slower. Key trap: pro mode is NOT the video-capable mode — single-task mode is.

Content Understanding (Video Pipeline)Azure Video Indexer

Content Understanding video pipelines = Foundry-native Tool for extracting structured insights from video as part of a larger AI pipeline (RAG, agents). The expected answer for agent and Foundry-integrated video processing. Azure Video Indexer = standalone service with pre-built video insight extraction (faces, topics, keyframes). The expected answer for standalone video analytics without agent integration.

InpaintingMask-Based Editing

Inpainting = fill in damaged, missing, or removed regions of an image using AI generation — about restoration and completion. Mask-based editing = use a precise mask to specify which region to modify while keeping everything outside the mask unchanged — about targeted modification of a specific area.

Scenario Tips

If the question asks about:

When the question asks about a marketing team that needs to replace a specific product in an existing AI-generated image while keeping the background and other elements intact

Answer:

Use mask-based editing — define a mask over the product region, then use DALL-E to generate new content only in that masked area while preserving everything outside.

Distractor to avoid:

Generating a completely new image with a modified prompt — this does not preserve the original composition and produces a different image.

If the question asks about:

When the question asks about detecting malicious instructions in user-uploaded images that contain visible text

Answer:

Enable Prompt Shields with indirect prompt injection detection. The shield scans image text content as part of the content safety pipeline to catch embedded malicious instructions.

Distractor to avoid:

File size limits, rate limiting, or image resolution restrictions — these do not address the content of embedded text.

If the question asks about:

When the question asks about a healthcare application generating image descriptions for visually impaired users

Answer:

Configure extended image descriptions aligned to accessibility guidelines (WCAG). These provide contextual, function-describing alt-text suitable for screen readers — not just visual appearance descriptions.

Distractor to avoid:

Standard captions — too brief and focused on appearance rather than meaning and context. Wrong for accessibility use cases.

If the question asks about:

When the question asks which Content Understanding mode to use for a pipeline that extracts information from video recordings

Answer:

Single-task (standard) mode — it supports video content. Pro mode does NOT support video.

Distractor to avoid:

Pro mode — a very common trap because 'pro' sounds more capable, but pro mode is documents-only.

Last-Minute Facts

1DALL-E operations: text-to-image, inpainting (fill missing/damaged area), variations, mask-based editing (preserve outside mask, modify inside mask).

2Image generation parameters: resolution (1024x1024 etc.), quality (standard / hd), style (vivid / natural).

3Content Understanding SINGLE-TASK mode = all content types (docs, images, audio, VIDEO). PRO mode = documents ONLY + multi-step reasoning + multi-input.

4Custom Vision is legacy. Azure AI Vision Image Analysis 4.0 is the 2026 preferred service for classification, detection, and captioning.

5Indirect prompt injection via image text = attack vector for multimodal models. Prompt Shields detect it by scanning image text content.

6Accessibility alt-text must describe function and context, not just visual appearance. WCAG-compliant, not just a generic caption.

Domain 413% of exam

Implement Text Analysis Solutions

Must-Know Facts

Azure AI Language provides standardized NLP capabilities: named entity recognition (NER), sentiment analysis, key phrase extraction, language detection, personally identifiable information (PII) detection, and abstractive summarization.
LLM-based text analysis (prompting GPT-4o for NLP tasks) is more flexible and handles nuanced complex tasks, but is significantly more expensive than Azure AI Language for high-volume, standardized extraction at scale.
Structured JSON output from LLMs requires either JSON mode (response_format parameter set to json_object) or an explicit output schema defined in the system prompt with examples. LLMs do not produce structured output automatically.
Azure Translator supports 100+ languages with deterministic, consistent translation and custom glossaries for domain-specific product names and technical terms. It is the correct choice when terminology consistency across large volumes of requests is required.
LLM-powered translation (prompting GPT-4o to translate) is more flexible for rare language pairs or nuanced localization, but does not guarantee consistent terminology translation across requests.
Speech-to-text (STT) and text-to-speech (TTS) are both required for a complete voice-enabled agent interaction. STT converts spoken input to text for the agent to process. TTS converts the agent's response back to audio for the user. Without TTS, the agent can receive voice input but cannot produce spoken responses.
Custom speech models require actual domain-specific training data: audio recordings paired with accurate transcriptions. They are not a configuration toggle — training data and a training job are required.
Speech translation = convert spoken audio in one language to text or speech in another language, integrating STT and translation in one pipeline. This is distinct from text-only Azure Translator.

Common Traps

TrapUsing GPT-4o for every text extraction task regardless of volume and standardization

RealityFor high-volume, standardized NLP tasks (entity extraction, sentiment classification, PII detection, key phrase extraction), Azure AI Language is far more cost-effective and produces consistent output. GPT-4o is appropriate when flexibility, nuance, or non-standard output formats are required.

TrapAssuming LLM translation guarantees consistent handling of custom product terminology

RealityLLMs translate freely and cannot guarantee that a product name like 'Foundry' is translated the same way across thousands of requests. Azure Translator with custom glossaries enforces consistent terminology mapping across all translation requests.

TrapThinking STT alone creates a fully voice-capable agent

RealitySpeech-to-text converts spoken user input to text. Without text-to-speech, the agent can receive voice input but its responses remain text-only. Both STT and TTS are required for a complete bidirectional voice interaction.

TrapTreating custom speech models as a simple configuration option that can be toggled

RealityCustom speech models require real domain-specific audio recordings paired with accurate transcriptions as training data. A training job must be run. They are built, not configured.

TrapExpecting structured JSON output from an LLM without specifying the format

RealityLLMs produce unstructured text by default. To get reliable JSON, you must either enable JSON mode (response_format: json_object) or explicitly define the output schema in the system prompt. The format must be specified or the model will not produce it reliably.

Confusing Pairs

Azure AI LanguageAzure OpenAI (Prompting for NLP Tasks)

Azure AI Language = purpose-built NLP service with prebuilt models for NER, sentiment, PII detection, language detection, summarization. Fixed API, consistent deterministic output, cheaper per call. Azure OpenAI prompting = flexible, handles arbitrary NLP tasks including non-standard ones, but more expensive and less consistent at scale. Use Language for standardized high-volume tasks; use OpenAI prompting for nuanced or non-standard requirements.

Azure TranslatorLLM Translation (GPT-4o)

Azure Translator = deterministic, supports 100+ languages, custom glossaries for terminology consistency, optimized for high-volume translation throughput. The correct choice when terminology consistency at scale matters. LLM translation = flexible, context-aware, handles nuanced localization, but inconsistent with terminology and more expensive per request. The correct choice for rare language pairs or highly nuanced localization.

Speech TranslationAzure Translator

Speech Translation (Azure AI Speech) = converts spoken audio in one language directly to text or speech in another language — combines STT and translation in a single pipeline. If the input is spoken audio, use Speech Translation. Azure Translator = text-to-text translation only. If the input is already text, Azure Translator is the correct service.

Scenario Tips

If the question asks about:

When the question presents a company processing thousands of customer reviews daily for sentiment and entity extraction at minimal cost

Answer:

Use Azure AI Language (sentiment analysis + named entity recognition). Purpose-built service, cheaper per call than GPT-4o, consistent output format — the right tool for high-volume standardized NLP.

Distractor to avoid:

GPT-4o for analysis — capable but significantly more expensive per call at scale and unnecessary when standardized NLP tasks suffice.

If the question asks about:

When the question asks about enabling a customer service agent to conduct full bidirectional voice conversations

Answer:

Integrate Azure AI Speech for both STT (convert spoken customer input to text for the agent) and TTS (convert agent text responses to audio for the customer). Both directions are required.

Distractor to avoid:

Azure AI Language — handles text understanding and NLP, not audio input/output conversion.

If the question asks about:

When the question asks about translating customer support chats across 50+ languages while ensuring product names are always translated consistently

Answer:

Azure Translator with a custom glossary — glossaries enforce consistent translation of specific terms across all requests. Deterministic translation at scale.

Distractor to avoid:

GPT-4o translation — creative and flexible, but will not guarantee that product names are translated the same way across thousands of requests.

If the question asks about:

When the question asks how to get an LLM to return a structured response with specific fields for downstream processing

Answer:

Enable JSON mode (response_format: json_object) or define the explicit JSON schema in the system prompt with examples. The model requires explicit schema guidance to produce structured output reliably.

Distractor to avoid:

Just asking the model nicely to return JSON in the user message — without explicit format specification, output format is not guaranteed and will vary.

Last-Minute Facts

1JSON mode for structured LLM output: response_format parameter set to json_object. Without explicit format specification, output format is not guaranteed.

2Azure AI Language tasks: NER, sentiment analysis, key phrase extraction, PII detection, language detection, abstractive summarization.

3Azure Translator: 100+ languages, custom glossaries for domain terminology, text-to-text only. Not for audio input.

4Speech Translation (Azure AI Speech): spoken audio in → spoken or text output in another language. Combines STT and translation in one pipeline.

5Voice-enabled agent requires BOTH STT (spoken input to text) and TTS (text response to audio). STT alone creates a voice-in, text-out system — incomplete.

6Custom speech models require: actual domain-specific audio recordings + transcription pairs. A training job is required. Not a configuration toggle.

Domain 513% of exam

Implement Information Extraction Solutions

Must-Know Facts

RAG ingestion pipeline order: document ingestion → OCR (for scanned content) → layout analysis → text extraction → chunking → embedding generation → index population. Every step must succeed for retrieval to work. Scanned PDFs have no native text layer — OCR is not optional.
Vector search requires pre-computed embeddings. Raw text cannot be vector-searched directly. Embeddings must be generated during ingestion and stored in the index alongside source content. A RAG pipeline that skips embedding generation cannot support vector search.
Enrichment skills in Azure AI Search run during indexing (via skillsets attached to indexers), NOT at query time. Adding a new skill to an existing skillset does NOT retroactively process already-indexed documents — a full reindex or indexer reset is required to apply the new skill to all documents.
Integrated vectorization in Azure AI Search: configure an embedding model to automatically generate vectors during indexing. Eliminates external embedding pipeline management.
Incremental indexing = re-processes only changed, added, or deleted documents since the last run. Full reindexing = rebuilds the entire index from scratch. Use incremental for routine updates; use full reindexing when the index schema changes, enrichment skills change, or the embedding model changes.
Document Intelligence prebuilt models cover common document types: invoice, receipt, ID document, business card, W-2, 1098, pay stub, health insurance card. Use prebuilt when the document type matches — no training required.
Document Intelligence custom models require labeling examples of your proprietary document format and a training job. They are necessary only for non-standard document types that prebuilt models do not cover.
Content Understanding generates markdown output by default (preserving document structure). If downstream agents or pipelines need structured typed fields, the analyzer output format must be explicitly configured for JSON.
Foundry IQ is a knowledge layer built on top of Azure AI Search — it adds agentic retrieval, permission-aware multi-source knowledge bases, and automated chunking and embedding for agents. Foundry IQ is NOT the same service as Azure AI Search.
Document Intelligence applies to documents only (PDFs, images of forms, office documents). For video or audio content extraction, use Content Understanding (single-task mode supports all content types) or Azure AI Speech.

Common Traps

TrapUsing Document Intelligence for video or audio content extraction

RealityDocument Intelligence processes documents — PDFs, images of forms, office documents. For video or audio content, use Content Understanding (single-task mode supports video and audio) or Azure AI Speech. Choosing Document Intelligence for a video analysis scenario is a common wrong answer.

TrapAssuming vector search works on raw text without pre-generating embeddings

RealityVector search compares floating-point embedding vectors — not raw text. Embeddings must be generated during ingestion and stored in the search index. A pipeline without embedding generation cannot support vector search queries.

TrapThinking enrichment skills process documents at query time and apply automatically to existing content

RealityEnrichment skills execute during the indexing job. They do not run when a user submits a search query. Adding a new skill does not retroactively process already-indexed documents — a full reindex or reset is required.

TrapBuilding a custom Document Intelligence model when a prebuilt model already covers the document type

RealityPrebuilt invoice, receipt, and ID models handle varying layouts within those document categories without any training. Custom models require labeling, training, and validation — they are only necessary for proprietary or non-standard document types. Using custom when prebuilt exists is wasteful and the wrong answer.

TrapTreating Foundry IQ and Azure AI Search as interchangeable in agent scenarios

RealityAzure AI Search is the underlying retrieval infrastructure — an indexing and search service. Foundry IQ is a higher-level abstraction built on top of Azure AI Search that adds agentic retrieval, permission-aware multi-source knowledge bases, and automated embedding. In agent scenarios requiring permission-aware access across multiple knowledge sources, Foundry IQ is the correct abstraction.

TrapExpecting Content Understanding to produce JSON output by default without configuring the analyzer

RealityContent Understanding generates markdown output by default. Downstream agents or pipelines that need structured JSON fields must have the analyzer output format explicitly configured. The default is markdown, not JSON.

Confusing Pairs

Document IntelligenceContent Understanding

Document Intelligence = specialized, high-precision extraction from documents using prebuilt or custom trained models. Best for structured forms (invoices, receipts, contracts) where deterministic field extraction matters. Documents only. Content Understanding = broader multimodal extraction from documents, images, audio, and video using AI reasoning. Best when content spans multiple modalities or has variable structure. Broader but less precise for well-defined document fields.

Incremental IndexingFull Reindexing

Incremental indexing = re-processes only changed or new documents since the last run. Efficient for routine knowledge base updates where the schema has not changed. Full reindexing = rebuilds the entire index from scratch. Required when the index schema changes, enrichment skills are added or modified, or the embedding model changes and all documents need re-vectorization.

Foundry IQAzure AI Search

Azure AI Search = the managed search service providing indexing, vector search, hybrid search, semantic ranking, and skillset processing. Foundry IQ = a knowledge layer built on top of Azure AI Search that adds agentic retrieval, permission-aware multi-source knowledge bases, and automated chunking and embedding for agents. Foundry IQ uses Azure AI Search as its retrieval infrastructure but is a distinct, higher-level service abstraction.

Document Intelligence Prebuilt ModelsDocument Intelligence Custom Models

Prebuilt = Microsoft-trained models for common document types (invoice, receipt, ID, W-2, health insurance card). Zero training required — deploy and call immediately. Handles varying layouts within the covered document category. Custom = train your own model using labeled examples of your proprietary format. Requires a training job and labeled data. Only needed when prebuilt models do not cover the document type.

Scenario Tips

If the question asks about:

When the question describes indexing thousands of scanned PDFs with handwritten notes and tables for a RAG pipeline

Answer:

Configure an Azure AI Search indexer with an OCR skill (for scanned text), layout analysis skill (for tables and structure), text split skill (for chunking), and an embedding skill (for vector generation). All four are required for complete RAG ingestion of scanned PDFs.

Distractor to avoid:

Text-only extraction — scanned PDFs have no native text layer. Without OCR, the indexer extracts nothing from the pages.

If the question asks about:

When the question asks about extracting vendor name, invoice number, line items, and total from invoices with widely varying formats

Answer:

Use the Document Intelligence prebuilt invoice model — it is specifically designed to handle varying invoice layouts and reliably extract standard invoice fields without any custom training.

Distractor to avoid:

A custom Document Intelligence model — requires labeling and training time, and is unnecessary since the prebuilt invoice model already covers standard invoice extraction.

If the question asks about:

When the question asks how to connect an agent to a product manual knowledge base so it can answer customer support queries

Answer:

Index the manuals in Azure AI Search, then connect the index as an Azure AI Search agent tool (or via Foundry IQ knowledge base). The agent retrieves relevant sections at query time dynamically.

Distractor to avoid:

Embedding all manual content in the agent's system prompt — system prompts have context window limits and cannot accommodate large document collections. This approach fails at any non-trivial scale.

If the question asks about:

When the question asks what happens when you add a new enrichment skill to an existing Azure AI Search skillset that already has indexed documents

Answer:

The new skill applies only to documents indexed after the change. To apply the skill to all existing documents, trigger a full reindex or reset the indexer. Enrichment skills do not retroactively process already-indexed content.

Distractor to avoid:

The skill automatically applies to all existing documents — enrichment skills are applied during the indexing job, not retroactively.

If the question asks about:

When the question asks about extracting structured data from audio recordings for a downstream agent pipeline

Answer:

Use Content Understanding single-task mode — it supports audio as a content type and can produce structured output from audio recordings. Document Intelligence does not process audio.

Distractor to avoid:

Document Intelligence — it does not process audio content. This is a trap question specifically testing knowledge of which service handles which content type.

Last-Minute Facts

1Enrichment skills run at INDEX TIME only. Adding a new skill does NOT retroactively re-process already-indexed documents — you must trigger a full reindex or reset the indexer.

2Vector search requires pre-computed embeddings in the index. Raw text cannot be vector-searched.

3Integrated vectorization: Azure AI Search generates embeddings automatically during indexing using a configured model.

4Incremental indexing = changed documents only. Full reindex = all documents. Full reindex required after schema changes, skill changes, or embedding model changes.

5Content Understanding output format: markdown by default. Explicitly configure the analyzer for JSON output if downstream systems need typed fields.

6Document Intelligence prebuilt models: invoice, receipt, ID document, business card, W-2, health insurance card. Custom = non-standard proprietary document types only.

7Foundry IQ is NOT Azure AI Search. Foundry IQ is a higher-level knowledge layer that uses Azure AI Search as its underlying retrieval infrastructure.

8Document Intelligence = documents (PDFs, forms, office docs) ONLY. Content Understanding = documents + images + audio + video. Do not use Document Intelligence for audio or video.

Feeling confident?

Put your knowledge to the test with a timed AI-103 mock exam.