General Exam Tips
- 1.Read every scenario as a requirements ticket: identify the desired outcome first, then identify the constraint (cost, latency, update frequency, security), then eliminate wrong-layer answers.
- 2.Domain 2 (Generative AI and Agentic Solutions, 33%) plus Domain 1 (Plan and Manage, 28%) make up over 60% of the exam. Weight your study time accordingly — if you are weak on agents or RAG, fix that before anything else.
- 3.When a question involves authentication, the correct answer is almost always managed identity with RBAC, not API keys. This rule applies across all scenario types.
- 4.Hands-on experience with Azure AI Foundry changes how you answer scenario questions. Candidates who only study theory get tricked by plausible-sounding wrong answers. Build at least one RAG pipeline and one agent before exam day.
- 5.The exam is 40-60 questions in 120 minutes — about 2 minutes per question. For case study sections, read the scenario description once carefully, then answer each sub-question referring back to specific details.
- 6.There is no penalty for wrong answers. If you are unsure, eliminate two clearly wrong options and guess between the remaining ones. Never leave a question blank.
- 7.Custom Vision is almost never the correct answer in 2026 — Azure AI Vision (Image Analysis 4.0) is the preferred modern service. When you see Custom Vision as an option, treat it as a distractor.
- 8.Watch for questions that ask about 'the least-privilege role' — the answer is almost always Cognitive Services OpenAI User, not Contributor. Contributor grants management-plane access and violates least-privilege.
- 9.When a question involves both Azure AI Search and Foundry IQ, they are NOT interchangeable — Foundry IQ is the higher-level abstraction that uses Azure AI Search underneath. Choosing one when the other is correct costs you the question.
- 10.For agent tool questions: file_search = searching uploaded documents (supports multiple vector store IDs); code_interpreter = executing Python code in a sandbox; azure_ai_search = querying an existing Azure AI Search index; function calling = invoking external APIs with structured arguments; bing_grounding = real-time public web retrieval with inline citations.
- 11.Do NOT study old AI-102 dumps. AI-103 tests Foundry, agentic workflows, RAG evaluation, Content Understanding, and modern safety controls — topics that are underweighted or absent in AI-102 material.
- 12.When a question asks about Content Understanding modes: single-task (standard) mode covers ALL content types (documents, images, audio, video). Pro mode is DOCUMENTS ONLY but adds multi-step reasoning and multi-input document support.
- 13.Questions describe symptoms, not service names. A scenario about 'picking up daily updates without retraining' is asking about RAG. A scenario about 'consistent product name translation at scale' is asking about Azure Translator with custom glossaries.
Quick Navigation
Plan and Manage an Azure AI Solution
Must-Know Facts
- Managed identity with RBAC is the production authentication pattern — keyless credentials are explicitly preferred over API keys on every security question.
- The least-privilege inference role is Cognitive Services OpenAI User, not Contributor. Cognitive Services OpenAI User = inference API calls only (cannot create/edit deployments, cannot fine-tune, cannot create guardrails). Cognitive Services OpenAI Contributor = inference + fine-tuning + creating deployments. Neither can create new Azure OpenAI resources or access quota — those require Cognitive Services Contributor or Usages Reader.
- Foundry projects are the organizing unit for AI-103. Resources (models, agents, search indexes) live inside a project. CI/CD pipelines connect to the project, not to individual service deployments.
- Three deployment options for models in Foundry: serverless (pay-per-token, shared capacity, variable latency), managed compute (dedicated VMs), and provisioned throughput (reserved token capacity, predictable latency, billed hourly even when idle). Provisioned throughput is only cost-effective for high-volume, predictable production workloads.
- Content filters in Azure OpenAI are independently configurable for BOTH the prompt input side AND the completion output side. Each of the four harm categories (hate, self-harm, sexual, violence) has its own severity threshold per side. Know the category names and that each is configured per side, per deployment.
- Guardrails in Microsoft Foundry are the broader enforcement policy framework that covers FOUR intervention points: user input, tool call (Preview), tool response (Preview), and final output. Guardrails are assigned at the model deployment or agent level. A critical exam rule: an agent's guardrail FULLY OVERRIDES the underlying model deployment's guardrail. If no guardrail is assigned to the agent, it inherits the model deployment's guardrail.
- Prompt Shields protect against two distinct attack types: direct jailbreaks (user input attempts to bypass the system prompt) and indirect prompt injection (malicious instructions embedded in retrieved content, web pages, or image text).
- Monitoring in Foundry covers five distinct signals: tracing (step-by-step execution logs), token analytics (usage and cost per deployment), safety signals (content filter trigger events), latency breakdowns (which processing step is slow), and grounding quality (relevance of retrieved context to the query).
- Quota management: token quotas are set per model deployment in TPM (tokens per minute). Provisioned throughput reserves dedicated capacity measured in PTUs (provisioned throughput units) — billed hourly regardless of usage.
- Agent governance oversight modes: autonomous (agent acts without human confirmation) vs. semi-autonomous (agent requests human approval before irreversible or high-risk actions). Risk level drives the oversight mode choice, not agent accuracy.
- Foundry safety evaluators are MEASUREMENT tools used during testing, development, and red-teaming — they produce scores, not enforcement actions. Content filters are ENFORCEMENT mechanisms that block or flag content in real-time. These serve different purposes and are separate tools.
Common Traps
Confusing Pairs
Scenario Tips
When the question asks how to secure a Foundry application connecting to Azure OpenAI in production without storing credentials in code
Use managed identity assigned to the compute resource (App Service, Container App, Function App) with Cognitive Services OpenAI User RBAC role. No API keys, no Key Vault lookup at runtime.
API keys stored in Key Vault is often presented as secure — but managed identity is the explicitly preferred production pattern. Key Vault + API keys is acceptable but managed identity is the better answer when both are options.
When the question asks what to do when an AI agent is being used for high-stakes actions like financial transactions or deleting records
Implement semi-autonomous mode with approval workflows. Human-in-the-loop confirmation is required before irreversible high-risk actions.
Fully autonomous mode might seem correct if agent accuracy is high, but the exam expects risk level (not accuracy) to drive the oversight mode decision.
When the question asks how to configure the agent to use stricter content filtering than the underlying model deployment
Assign a separate guardrail with stricter settings directly to the agent. The agent's guardrail fully overrides the model deployment's guardrail — the agent applies its own policy, not the model's.
Modifying the model deployment's guardrail — this changes the policy for all consumers of that deployment, not just the specific agent, and may not be the right scope.
When the question asks about blocking a specific category of harmful content from both user prompts and model responses
Configure the specific harm category (hate, self-harm, sexual, or violence) with separate severity thresholds for both input (prompt) and output (completion) sides in the Azure OpenAI content filtering settings.
Enabling a generic 'harmful content' filter — content filters require specific category and severity threshold configuration per side, not a single binary switch.
Last-Minute Facts
Implement Generative AI and Agentic Solutions
Must-Know Facts
- RAG architecture: document ingestion → chunking → embedding generation → indexing in Azure AI Search → hybrid retrieval (keyword BM25 + vector HNSW) → semantic ranker re-ranking → prompt augmentation → LLM response. Know each stage and what failure at each stage means for retrieval quality.
- Chunking strategies: fixed-size chunking is simple but may split context mid-sentence. Semantic chunking preserves contextual boundaries but is more compute-intensive. The exam tests which to use based on document content type and query patterns.
- Hybrid search in Azure AI Search combines keyword search (BM25) with vector search (HNSW or KNN). Hybrid retrieval results are merged using Reciprocal Rank Fusion (RRF). After hybrid retrieval, the semantic ranker re-ranks results using a transformer model. Semantic ranking is a POST-retrieval re-ranking step, not a retrieval method itself.
- Integrated vectorization: Azure AI Search can automatically generate embeddings during indexing using a configured embedding model, eliminating the need to manage external embedding pipelines.
- Agent thread/run/message model: a Thread persists the full conversation history. Messages are individual user and agent turns within a thread. A Run is a single agent invocation on a thread that may call multiple tools before producing the final response.
- Agent tools available in Foundry Agent Service: file_search (managed vector store, supports multiple vector_store_ids), code_interpreter (sandboxed Python execution), azure_ai_search (query an existing Azure AI Search index), function calling (invoke external APIs with structured JSON schemas), bing_grounding (real-time web retrieval with citations). Max 128 tools per agent.
- Azure AI Search tool limitation: one index endpoint per tool configuration. To query multiple independent indexes, use connected agents (A2A) where each sub-agent owns its own index. File Search is different — it supports multiple vector_store_ids on a single tool instance.
- Multi-agent orchestration: a primary orchestrator agent routes tasks to specialized sub-agents. Each sub-agent has its own tools, instructions, memory, and thread. A coordination layer passes shared context between agents — each agent's memory is not automatically visible to other agents.
- Prompt Flow vs. Agents: Prompt Flow = deterministic, versioned, evaluated input-output pipeline where developer specifies every step. Foundry Agent Service = autonomous system where the model decides which tools to call based on reasoning. Use Prompt Flow when you need reproducible pipelines with formal evaluation artifacts. Use Agents when tasks require open-ended reasoning and dynamic tool selection.
- Fabrication detection (groundedness checking) is a post-generation evaluation step using Foundry evaluators. It measures whether the model's response is supported by retrieved context. It is NOT a real-time guardrail — it does not prevent hallucinations during production inference.
- Model selection: GPT-4o-mini for cost-sensitive high-volume tasks; GPT-4o for capability-first tasks; o1/o3-mini for complex multi-step reasoning that benefits from extended chain-of-thought; Phi models (SLMs) for edge deployments, cost constraints, or latency-sensitive scenarios.
Common Traps
Confusing Pairs
Scenario Tips
When the question presents a knowledge base that is updated daily and asks how to keep AI responses current without retraining
Implement RAG with Azure AI Search. Re-index updated documents when the knowledge base changes. The model retrieves fresh content at query time without any model retraining.
Fine-tuning the model on a schedule — this is expensive, slow, and cannot keep pace with daily updates. Fine-tuning is the consistently wrong answer for dynamic knowledge base scenarios.
When the question asks which agent tool to use when the agent needs to run data analysis on a spreadsheet uploaded by a user
Code Interpreter — it allows the agent to write and execute Python code in a sandboxed environment to analyze uploaded data and return computed results.
Function calling — this invokes external APIs, not self-contained computation on uploaded files.
When the question describes needing a pipeline with formal versioning, evaluation runs, and reproducible execution for a specific workflow
Use Prompt Flow — it provides versioned, evaluated, deployable pipelines with formal evaluation artifacts and deterministic step execution.
Foundry Agent Service — agents are autonomous and not suitable when you need deterministic, versioned execution with formal evaluation artifacts.
When the question asks which retrieval approach to use for a production RAG system where both keyword precision and semantic understanding matter
Hybrid search (keyword BM25 + vector HNSW) combined with semantic ranker re-ranking. This is the recommended production RAG retrieval pattern in Microsoft Foundry.
Vector search alone — it misses exact keyword matches and is not the recommended production pattern when both precision and recall matter.
When the question asks about a multi-agent system and which pattern to use when one agent needs information from another agent's conversation context
Implement a coordination/orchestration layer that explicitly passes shared context between agents. Each agent's memory is isolated — a sub-agent cannot access another sub-agent's thread without an explicit handoff.
Agents automatically share memory — this is incorrect. Agent memory isolation is a core design principle. Shared context requires explicit design.
When the question asks which model to choose for an edge deployment with strict cost and latency constraints
Phi models (SLMs) — Microsoft's small language models designed for efficiency, low cost, and edge deployment while maintaining strong task performance.
GPT-4o — the most capable model but far too large and expensive for edge deployment scenarios.
Last-Minute Facts
Implement Computer Vision Solutions
Must-Know Facts
- DALL-E operations: text-to-image (generate from prompt), inpainting (fill in a masked/damaged region of an existing image), variation (generate variations of an existing image), and mask-based editing (modify a precisely specified region while preserving everything outside the mask).
- Content Understanding has two modes: single-task (standard) mode and pro mode. Single-task mode supports ALL content types — documents, images, audio, and video — with lower cost and latency. Pro mode supports DOCUMENTS ONLY but adds multi-step reasoning, multi-input document support, and reference data integration. This distinction is a recurring exam trap.
- GPT-4o with Vision and GPT-4 Vision perform visual question answering — they can answer questions grounded in the content of submitted images. This is a multimodal capability distinct from traditional image classification.
- Accessibility-compliant alt-text must describe the function and context of the image, not just visual appearance. Accessibility alt-text is more than a generic caption — it must convey meaning suitable for screen readers and align with WCAG guidelines.
- Content Understanding pipelines for video analysis can extract: transcriptions, scene segmentation, object detection in frames, and topic identification. This is a different use case than Azure Video Indexer.
- Indirect prompt injection via embedded image text: malicious instructions hidden as text within uploaded images can be processed by multimodal models as if they were trusted instructions. Prompt Shields must scan image text to detect this attack vector.
- Image generation parameters: resolution (e.g., 1024x1024), quality (standard vs. HD), style (natural vs. vivid), and number of images generated per request.
- Custom Vision is the legacy image classification service. Azure AI Vision (Image Analysis 4.0) is the current preferred service for classification, object detection, captioning, and OCR on images.
Common Traps
Confusing Pairs
Scenario Tips
When the question asks about a marketing team that needs to replace a specific product in an existing AI-generated image while keeping the background and other elements intact
Use mask-based editing — define a mask over the product region, then use DALL-E to generate new content only in that masked area while preserving everything outside.
Generating a completely new image with a modified prompt — this does not preserve the original composition and produces a different image.
When the question asks about detecting malicious instructions in user-uploaded images that contain visible text
Enable Prompt Shields with indirect prompt injection detection. The shield scans image text content as part of the content safety pipeline to catch embedded malicious instructions.
File size limits, rate limiting, or image resolution restrictions — these do not address the content of embedded text.
When the question asks about a healthcare application generating image descriptions for visually impaired users
Configure extended image descriptions aligned to accessibility guidelines (WCAG). These provide contextual, function-describing alt-text suitable for screen readers — not just visual appearance descriptions.
Standard captions — too brief and focused on appearance rather than meaning and context. Wrong for accessibility use cases.
When the question asks which Content Understanding mode to use for a pipeline that extracts information from video recordings
Single-task (standard) mode — it supports video content. Pro mode does NOT support video.
Pro mode — a very common trap because 'pro' sounds more capable, but pro mode is documents-only.
Last-Minute Facts
Implement Text Analysis Solutions
Must-Know Facts
- Azure AI Language provides standardized NLP capabilities: named entity recognition (NER), sentiment analysis, key phrase extraction, language detection, personally identifiable information (PII) detection, and abstractive summarization.
- LLM-based text analysis (prompting GPT-4o for NLP tasks) is more flexible and handles nuanced complex tasks, but is significantly more expensive than Azure AI Language for high-volume, standardized extraction at scale.
- Structured JSON output from LLMs requires either JSON mode (response_format parameter set to json_object) or an explicit output schema defined in the system prompt with examples. LLMs do not produce structured output automatically.
- Azure Translator supports 100+ languages with deterministic, consistent translation and custom glossaries for domain-specific product names and technical terms. It is the correct choice when terminology consistency across large volumes of requests is required.
- LLM-powered translation (prompting GPT-4o to translate) is more flexible for rare language pairs or nuanced localization, but does not guarantee consistent terminology translation across requests.
- Speech-to-text (STT) and text-to-speech (TTS) are both required for a complete voice-enabled agent interaction. STT converts spoken input to text for the agent to process. TTS converts the agent's response back to audio for the user. Without TTS, the agent can receive voice input but cannot produce spoken responses.
- Custom speech models require actual domain-specific training data: audio recordings paired with accurate transcriptions. They are not a configuration toggle — training data and a training job are required.
- Speech translation = convert spoken audio in one language to text or speech in another language, integrating STT and translation in one pipeline. This is distinct from text-only Azure Translator.
Common Traps
Confusing Pairs
Scenario Tips
When the question presents a company processing thousands of customer reviews daily for sentiment and entity extraction at minimal cost
Use Azure AI Language (sentiment analysis + named entity recognition). Purpose-built service, cheaper per call than GPT-4o, consistent output format — the right tool for high-volume standardized NLP.
GPT-4o for analysis — capable but significantly more expensive per call at scale and unnecessary when standardized NLP tasks suffice.
When the question asks about enabling a customer service agent to conduct full bidirectional voice conversations
Integrate Azure AI Speech for both STT (convert spoken customer input to text for the agent) and TTS (convert agent text responses to audio for the customer). Both directions are required.
Azure AI Language — handles text understanding and NLP, not audio input/output conversion.
When the question asks about translating customer support chats across 50+ languages while ensuring product names are always translated consistently
Azure Translator with a custom glossary — glossaries enforce consistent translation of specific terms across all requests. Deterministic translation at scale.
GPT-4o translation — creative and flexible, but will not guarantee that product names are translated the same way across thousands of requests.
When the question asks how to get an LLM to return a structured response with specific fields for downstream processing
Enable JSON mode (response_format: json_object) or define the explicit JSON schema in the system prompt with examples. The model requires explicit schema guidance to produce structured output reliably.
Just asking the model nicely to return JSON in the user message — without explicit format specification, output format is not guaranteed and will vary.
Last-Minute Facts
Implement Information Extraction Solutions
Must-Know Facts
- RAG ingestion pipeline order: document ingestion → OCR (for scanned content) → layout analysis → text extraction → chunking → embedding generation → index population. Every step must succeed for retrieval to work. Scanned PDFs have no native text layer — OCR is not optional.
- Vector search requires pre-computed embeddings. Raw text cannot be vector-searched directly. Embeddings must be generated during ingestion and stored in the index alongside source content. A RAG pipeline that skips embedding generation cannot support vector search.
- Enrichment skills in Azure AI Search run during indexing (via skillsets attached to indexers), NOT at query time. Adding a new skill to an existing skillset does NOT retroactively process already-indexed documents — a full reindex or indexer reset is required to apply the new skill to all documents.
- Integrated vectorization in Azure AI Search: configure an embedding model to automatically generate vectors during indexing. Eliminates external embedding pipeline management.
- Incremental indexing = re-processes only changed, added, or deleted documents since the last run. Full reindexing = rebuilds the entire index from scratch. Use incremental for routine updates; use full reindexing when the index schema changes, enrichment skills change, or the embedding model changes.
- Document Intelligence prebuilt models cover common document types: invoice, receipt, ID document, business card, W-2, 1098, pay stub, health insurance card. Use prebuilt when the document type matches — no training required.
- Document Intelligence custom models require labeling examples of your proprietary document format and a training job. They are necessary only for non-standard document types that prebuilt models do not cover.
- Content Understanding generates markdown output by default (preserving document structure). If downstream agents or pipelines need structured typed fields, the analyzer output format must be explicitly configured for JSON.
- Foundry IQ is a knowledge layer built on top of Azure AI Search — it adds agentic retrieval, permission-aware multi-source knowledge bases, and automated chunking and embedding for agents. Foundry IQ is NOT the same service as Azure AI Search.
- Document Intelligence applies to documents only (PDFs, images of forms, office documents). For video or audio content extraction, use Content Understanding (single-task mode supports all content types) or Azure AI Speech.
Common Traps
Confusing Pairs
Scenario Tips
When the question describes indexing thousands of scanned PDFs with handwritten notes and tables for a RAG pipeline
Configure an Azure AI Search indexer with an OCR skill (for scanned text), layout analysis skill (for tables and structure), text split skill (for chunking), and an embedding skill (for vector generation). All four are required for complete RAG ingestion of scanned PDFs.
Text-only extraction — scanned PDFs have no native text layer. Without OCR, the indexer extracts nothing from the pages.
When the question asks about extracting vendor name, invoice number, line items, and total from invoices with widely varying formats
Use the Document Intelligence prebuilt invoice model — it is specifically designed to handle varying invoice layouts and reliably extract standard invoice fields without any custom training.
A custom Document Intelligence model — requires labeling and training time, and is unnecessary since the prebuilt invoice model already covers standard invoice extraction.
When the question asks how to connect an agent to a product manual knowledge base so it can answer customer support queries
Index the manuals in Azure AI Search, then connect the index as an Azure AI Search agent tool (or via Foundry IQ knowledge base). The agent retrieves relevant sections at query time dynamically.
Embedding all manual content in the agent's system prompt — system prompts have context window limits and cannot accommodate large document collections. This approach fails at any non-trivial scale.
When the question asks what happens when you add a new enrichment skill to an existing Azure AI Search skillset that already has indexed documents
The new skill applies only to documents indexed after the change. To apply the skill to all existing documents, trigger a full reindex or reset the indexer. Enrichment skills do not retroactively process already-indexed content.
The skill automatically applies to all existing documents — enrichment skills are applied during the indexing job, not retroactively.
When the question asks about extracting structured data from audio recordings for a downstream agent pipeline
Use Content Understanding single-task mode — it supports audio as a content type and can produce structured output from audio recordings. Document Intelligence does not process audio.
Document Intelligence — it does not process audio content. This is a trap question specifically testing knowledge of which service handles which content type.