CertPrepNow
MicrosoftAI-10371 concepts

AI-103 Cheat Sheet

Quick reference for the Microsoft Certified: Azure AI Apps and Agents Developer Associate exam.

Foundry SDK — Client Setup and Authentication

pip install azure-ai-projects azure-identity
Install the Microsoft Foundry SDK (v2.2.0+) and Azure identity library required for keyless Entra ID authentication.
from azure.ai.projects import AIProjectClient from azure.identity import DefaultAzureCredential with ( DefaultAzureCredential() as credential, AIProjectClient( endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"], credential=credential ) as project_client, ):
Initialize AIProjectClient using managed identity (DefaultAzureCredential) — the recommended keyless pattern for production deployments.
with project_client.get_openai_client() as openai_client: response = openai_client.responses.create( model=os.environ["FOUNDRY_MODEL_NAME"], input="Your prompt here", ) print(response.output_text)
Get an authenticated OpenAI client from the project client to run Responses, Conversations, Evaluations, and Fine-Tuning operations.
FOUNDRY_PROJECT_ENDPOINT=https://<ai-services-name>.services.ai.azure.com/api/projects/<project-name>
Environment variable format for the Foundry project endpoint — find it on the Microsoft Foundry Project home page.
Managed Identity vs. API Keys
Managed identity authenticates automatically without storing secrets in code; use it for all production deployments. API keys are acceptable only for local development.
DefaultAzureCredential() resolution order
Tries: environment variables → workload identity → managed identity → Azure CLI → Visual Studio Code — ensuring seamless auth in both local dev and deployed environments.

Azure OpenAI Service — Model Deployment and Prompting

Deployment types: Standard, Provisioned-Managed, Global-Standard
Standard uses shared capacity (pay-per-token); Provisioned-Managed reserves dedicated throughput (PTUs); Global-Standard routes globally for highest availability.
temperature (0.0–1.0)
Controls response randomness — use 0.0–0.3 for factual/RAG tasks, 0.7–1.0 for creative generation.
top_p, frequency_penalty, presence_penalty, max_tokens
top_p limits token sampling pool; frequency_penalty reduces repetition of frequent tokens; presence_penalty encourages new topics; max_tokens caps response length.
Zero-shot / Few-shot / Chain-of-Thought
Zero-shot: no examples; few-shot: include input-output examples in the prompt; chain-of-thought: instruct the model to reason step-by-step before answering.
System prompt
Instruction block passed before user input that defines the model's role, constraints, format, and safety boundaries — processed with higher priority than the user turn.
LLM vs. SLM selection
Use LLMs (GPT-4o, o1) for broad reasoning and multimodal tasks; use SLMs (Phi family) for cost-efficient, latency-sensitive, or edge deployment scenarios.
Content filtering (Azure OpenAI)
Configured at the Azure OpenAI resource level — not per-prompt — and applies hate, violence, sexual, and self-harm category filters with configurable severity thresholds.

RAG Architecture — Ingestion and Retrieval Pipeline

RAG flow: documents → chunking → embedding → index → retrieval → prompt augmentation → LLM → response
Each stage is independently configurable: chunking strategy affects retrieval precision, embedding model determines semantic accuracy, retrieval config controls grounding quality.
Chunking strategies: fixed-size, sentence/paragraph, recursive
Fixed-size is simple and predictable; sentence/paragraph preserves semantic units; recursive splits by hierarchy (paragraph → sentence → word) for complex documents.
RAG vs. Fine-tuning
RAG retrieves data at inference time without modifying model weights — use it for dynamic, frequently updated knowledge; fine-tuning modifies model weights and requires retraining.
Hybrid search (Azure AI Search)
Runs keyword (BM25) and vector search in parallel; results are merged using Reciprocal Rank Fusion (RRF) — typically outperforms either method alone for RAG retrieval.
from azure.search.documents import SearchClient from azure.search.documents.models import VectorizedQuery vector_query = VectorizedQuery( vector=query_vector, k_nearest_neighbors=10, fields="DescriptionVector" ) results = client.search( search_text="your keyword query", vector_queries=[vector_query], top=10 )
Python SDK pattern for hybrid search combining keyword and vector queries against Azure AI Search.
Semantic ranking (queryType: semantic)
AI-powered reranking step applied AFTER hybrid retrieval that rescores results by meaning — set k=50 when combining with semantic ranking to provide sufficient input documents.
Vector search vs. semantic search
Vector search finds conceptually similar content using embedding similarity; semantic search reranks keyword results using AI understanding — hybrid search combines both approaches.
Embedding model selection for RAG
Use text-embedding-3-large for highest retrieval accuracy; text-embedding-3-small for cost/speed tradeoffs — the embedding model must be consistent across ingestion and query time.

Foundry Agent Service — Agents and Tool Calling

Agent = role + goals + memory + tools + constraints
An agent is defined by its assigned role, the goals it pursues, its conversation memory, the tool schemas it can call, and the behavioral constraints/approval workflows applied to it.
from azure.ai.projects.models import FunctionTool def get_order_status(order_id: str) -> str: return f"Order {order_id}: Shipped" tool = FunctionTool(functions={get_order_status})
Define a function tool using FunctionTool from azure.ai.projects.models — the agent calls this function when it determines the tool is needed to satisfy user intent.
Function calling vs. prompt injection
Function calling is an authorized, structured mechanism for agents to invoke external APIs; prompt injection is a malicious attack where user input overrides agent instructions — they are not the same.
Agent tools: Azure AI Search, Code Interpreter, File Search, OpenAPI, Bing Grounding, Azure Functions, MCP
Foundry Agent Service supports a broad tool catalog — agents autonomously select which tool to invoke based on the task, unlike Prompt Flow where the developer defines the sequence.
Foundry Agent Service vs. Prompt Flow
Agents are autonomous — they decide which tools to use and when; Prompt Flow pipelines are deterministic with developer-defined sequences — use agents for open-ended tasks, Prompt Flow for repeatable workflows.
Conversation memory
Tracks dialogue history across turns so the agent maintains context — memory is per-agent and must be explicitly shared via a coordination layer in multi-agent orchestration.
Autonomous vs. semi-autonomous (approval workflows)
Fully autonomous agents act without human approval — only appropriate when risk is low and safeguards are in place; semi-autonomous agents pause for human-in-the-loop approval on high-risk actions.

Multi-Agent Orchestration and Foundry IQ

Multi-agent orchestration pattern
A routing/supervisor agent directs user requests to specialized sub-agents (e.g., sales agent, support agent, billing agent) and coordinates shared context between them.
Agent-to-Agent (A2A) protocol
Preview capability in Foundry Agent Service enabling agents to invoke other agents as tools — provides structured inter-agent communication for complex orchestration workflows.
Foundry IQ (knowledge layer)
A managed knowledge layer BUILT ON TOP OF Azure AI Search that adds agentic retrieval, permission-aware multi-source knowledge bases, and automated chunking and embedding for agents — not the same as Azure AI Search itself.
Foundry IQ vs. Azure AI Search
Azure AI Search is the underlying retrieval infrastructure; Foundry IQ is the higher-level abstraction that wraps it with agentic retrieval, multi-source knowledge bases, and permission-aware responses for agents.
ReAct loop: Think → Act → Observe → Repeat
Standard agent reasoning pattern: the LLM decides which tool to use (Think), executes it (Act), processes the result (Observe), and repeats until the goal is achieved or max iterations are reached.
Shared context in multi-agent systems
Each agent has its own memory; a coordination layer is required to pass information between agents — without it, sub-agents cannot access each other's conversation history.

Computer Vision — Image Generation and Multimodal Understanding

DALL-E: text-to-image, inpainting, mask-based editing
DALL-E generates images from text prompts; inpainting fills in missing/selected areas; mask-based editing uses an explicit mask to target specific regions — inpainting and mask-based editing are distinct, not interchangeable.
Video generation from text prompts and reference media
Generate video clips from text descriptions or reference images using Azure AI video generation models — distinct from video analysis, which processes existing video content.
Caption generation: concise vs. detailed captions
Azure AI Vision supports concise captions (one sentence) and detailed captions (dense description) for single or multiple images — use detailed captions for accessibility-focused or RAG grounding scenarios.
Content Understanding: single-task (standard) mode vs. pro mode
Single-task mode supports ALL content types (documents, images, audio, video) with lower cost and latency; pro mode is documents-ONLY and adds multi-step reasoning, multi-input document support, and cross-file analysis.
Alt-text generation (accessibility)
Extended image descriptions for accessibility must follow WCAG guidelines — not just describe the image but convey meaning and context appropriate for screen readers.
Indirect prompt injection via image text
Malicious instructions can be embedded as text inside user-uploaded images — scan image text content for injected instructions before passing it to the model.
GPT-4 Vision / multimodal models
Multimodal models accept image and text inputs simultaneously — use for visual question answering, image captioning, and analyzing visual content to ground AI responses.
Video analysis: Content Understanding pipeline vs. Azure Video Indexer
Use Content Understanding pipelines for agentic video processing (transcription, segment extraction, structured output); use Azure Video Indexer for pre-built video insight extraction (faces, topics, keyframes) without custom pipeline configuration.

Text Analysis — Language, Speech, and Translation

Azure AI Language (Foundry Tool): entity extraction, sentiment, key phrases, language detection
Use Foundry Tools for high-volume, standardized text analysis tasks — more cost-effective than LLM-based analysis at scale for predictable extraction workloads.
LLM-based text analysis vs. Foundry Tools
LLM-based analysis is more flexible and handles complex nuanced tasks but is significantly more expensive; use Foundry Tools for high-volume standardized extraction at scale.
Structured JSON output from LLMs
Requires explicit schema definition in the prompt or API call — models do not automatically produce structured output without guidance specifying the expected JSON format.
Azure AI Speech: STT (speech-to-text) + TTS (text-to-speech)
A voice-enabled agent requires BOTH speech-to-text for input AND text-to-speech for output — STT alone does not create a complete voice interaction.
Custom speech models
Require training with domain-specific audio and text pair data — they are not simple configuration changes and are used for specialized vocabulary or accent handling.
Azure Translator vs. LLM-powered translation
Azure Translator provides deterministic, high-quality translation with custom terminology support across 100+ languages; LLM translation is more flexible but less consistent for standardized terminology.
Speech translation (Azure AI Speech)
Converts spoken audio directly into translated text or speech in another language — combines speech-to-text and translation in a single pipeline, distinct from text-only Azure Translator workflows.

Information Extraction — Document Intelligence and Indexing

Document Intelligence prebuilt models: invoice, receipt, ID, business card, W-2, health insurance card
Prebuilt models handle standard document types (invoices, receipts) without training; custom models are needed for proprietary document formats with unique layouts.
Document Intelligence: prebuilt vs. custom vs. composed models
Prebuilt: common document types out of the box; Custom: trained on your specific layouts; Composed: chains multiple custom models to handle varied document types in a single API call.
RAG ingestion for scanned PDFs: OCR → layout analysis → table extraction → embedding → index
Scanned PDFs require OCR to extract text — without it, the indexer cannot read image-based content; layout analysis preserves structure for tables and multi-column documents.
Content Understanding output: structured JSON vs. markdown
Configure the analyzer schema to produce structured JSON for typed field extraction or markdown output for downstream LLM reasoning — the output format depends on analyzer configuration.
Vector search requires pre-computed embeddings
You cannot perform vector search on raw text — documents must be converted to embedding vectors during ingestion before they can be queried by semantic similarity.
Connect Azure AI Search index as agent tool
Register the search index as an agent tool so the agent can dynamically retrieve relevant information during conversations — do not embed all content in the system prompt (exceeds token limits).
Enrichment skills: run at indexing time, not query time
Enrichment skills (OCR, language detection, entity extraction) execute during the indexing pipeline — for real-time processing of new content, a separate streaming pipeline is required.

Responsible AI — Safety, Guardrails, and Evaluation

Safety filters (input side) vs. guardrails (output side)
Safety filters inspect and block harmful prompts BEFORE they reach the model; guardrails constrain and validate model outputs AFTER generation — they operate on opposite sides of the model.
Content moderation configuration scope
Safety filters are configured at the Azure OpenAI resource level, not at the individual prompt level — a single resource can have multiple deployments each with different content filtering policies.
Foundry evaluators: fabrication, relevance, quality, safety
Run evaluators on RAG outputs to measure hallucination rate (fabrication), whether the response addresses the query (relevance), overall quality, and safety compliance.
Fabrication detection vs. guardrails
Fabrication detection (hallucination checking) is an evaluation step that measures quality after generation; guardrails are constraints that actively filter or modify outputs — they serve different purposes.
Trace logging and provenance metadata
Capture full execution traces (inputs, outputs, tool calls, latencies) and provenance metadata (which documents grounded each response) for auditability and debugging.
Agent governance: oversight modes and tool-access controls
Configure agent oversight mode (autonomous vs. semi-autonomous), restrict which tools agents can access, and define behavioral constraints to limit the scope of autonomous actions.
project_client.beta.red_teams.create(...)
Run automated adversarial (red team) scans against your generative AI application to identify safety risks and policy violations before production deployment.

Plan and Manage — Security, Monitoring, and CI/CD

RBAC roles for Foundry: Azure AI Developer, Cognitive Services User, Search Index Data Reader, Search Index Data Contributor
Assign the minimum required RBAC role — never use owner/contributor for application identities; use Azure AI Developer for Foundry project access with managed identity.
Private networking: private endpoints + VNet integration
Isolate Azure OpenAI, AI Search, and Foundry resources behind private endpoints to prevent public internet access — required for enterprise security and compliance deployments.
Foundry observability: tracing + token analytics + safety signals + latency breakdowns
Configure all four observability dimensions in Foundry for complete visibility — monitoring only Azure OpenAI metrics misses agent behavior, search quality, and safety signals.
Grounding quality monitoring vs. model performance monitoring
Grounding quality measures whether retrieved documents are relevant to the query; model performance measures generation accuracy — these are distinct metrics requiring separate monitoring.
Quota management: token quotas, rate limits, PTU scaling
Manage TPM (tokens per minute) and RPM (requests per minute) quotas per deployment; use provisioned throughput (PTUs) for predictable workloads requiring guaranteed capacity.
CI/CD integration with Foundry projects
CI/CD pipelines must connect at the Foundry project level — not just the individual service — to orchestrate model version promotion, prompt updates, and agent deployment across environments.
Model deployment options: serverless, managed compute, provisioned throughput
Serverless: pay-per-token with shared capacity; Managed compute: dedicated container instances; Provisioned throughput: reserved PTU capacity for predictable high-volume workloads.
Azure Key Vault for secret storage
Store API keys and connection strings in Key Vault rather than in code or environment files — reference them via Key Vault references in app configuration, not by reading the secret value at deploy time.

Ready to test yourself?

Start a timed AI-103 mock exam or review practice questions by domain.