CertPrepNow
Microsoft | AI-300 | Updated 2026-06-13

AI-300 Study Guide

Everything you need to pass the Microsoft Certified: Machine Learning Operations Engineer Associate exam. Structured study plans, key services, common traps, and practice questions.

You Can Pass This Exam For Free

The AI-300 exam is passable with free resources if you study consistently for 6-10 weeks with hands-on Azure practice:

  • Microsoft Learn official study guide and learning paths for AI-300 (free)
  • Azure Machine Learning documentation and tutorials (free)
  • Microsoft Foundry (Azure AI Foundry) documentation (free)
  • MLflow official documentation and Azure ML integration guides (free)
  • Azure free account with $200 credit for hands-on labs
  • GitHub Actions documentation for CI/CD pipeline automation (free)
  • 500+ free practice questions on this site

This exam is heavily hands-on and scenario-based. While documentation covers the theory, you should use the Azure free account to build actual ML pipelines, deploy models to endpoints, and configure monitoring. Practical experience with Azure ML workspaces, MLflow tracking, and Microsoft Foundry is essential.

Choose Your Study Path

You have general programming or data science experience but limited hands-on Azure or MLOps knowledge. You need to build foundational cloud ML skills before tackling operations.

Week 1-2: Learn Azure Machine Learning fundamentals: create a workspace, understand compute targets (compute instances, compute clusters, serverless compute), datastores, and data assets. Practice through the Azure portal and CLI
Week 3: Study MLflow integration with Azure ML: experiment tracking, model logging, model registry, and how Azure ML workspaces act as MLflow tracking servers. Run a simple training experiment with MLflow tracking
Week 4: Learn model training orchestration: training pipelines, components, environments, automated ML (AutoML), hyperparameter tuning, and distributed training concepts
Week 5: Study model deployment: managed online endpoints vs batch endpoints, progressive rollout strategies, safe rollback, and endpoint testing and troubleshooting
Week 6: Learn production monitoring: data drift detection, prediction drift, model performance metrics, alert triggers, and retraining workflows using Azure Event Hubs or Logic Apps
Week 7: Deep dive into GenAIOps: Microsoft Foundry project setup, foundation model deployment (serverless API vs managed compute), provisioned throughput units, prompt versioning with Git
Week 8: Study GenAI quality assurance: evaluation metrics (groundedness, relevance, coherence, fluency), risk and safety evaluations, test datasets, and custom evaluation workflows
Week 9: Learn RAG optimization: chunk sizes, overlap strategies, hybrid search (semantic + keyword), embedding model selection, and fine-tuning methods including synthetic data generation
Week 10: Infrastructure as Code: learn Bicep templates and Azure CLI for deploying ML workspaces, GitHub Actions workflows for automation, and network security configurations. Take full practice exams
Week 11-12: Take multiple full mock exams and review all incorrect answers. Focus on weak areas, especially Domain 2 (28% weight) and Domain 3 (24% weight). Schedule your real exam when scoring 75%+

Exam Overview

Format

40-60 questions, 120 minutes. Multiple choice, multiple select, drag-and-drop, and case study scenarios. May include interactive lab components.

Scoring

Scaled score 100-1000. Passing: 700. No penalty for wrong answers — always answer every question even if unsure.

Domains & Weights

  • Design and Implement an MLOps Infrastructure: 18%
  • Implement Machine Learning Model Lifecycle and Operations: 28%
  • Design and Implement a GenAIOps Infrastructure: 24%
  • Implement Generative AI Quality Assurance and Observability: 15%
  • Optimize Generative AI Systems and Model Performance: 15%
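
One way to act on these weights is to split your study time in the same proportions. The sketch below is only an illustration: the 60-hour budget and the shortened domain names are assumptions, not recommendations from Microsoft.

```python
# Domain weights as listed above (percent of exam); names are shortened here.
DOMAIN_WEIGHTS = {
    "MLOps Infrastructure": 18,
    "ML Model Lifecycle and Operations": 28,
    "GenAIOps Infrastructure": 24,
    "GenAI Quality Assurance and Observability": 15,
    "Optimize GenAI Systems": 15,
}


def allocate_hours(total_hours: float, weights: dict[str, int]) -> dict[str, float]:
    """Split total_hours across domains in proportion to their exam weight."""
    weight_sum = sum(weights.values())
    return {name: round(total_hours * w / weight_sum, 1) for name, w in weights.items()}


if __name__ == "__main__":
    # 60 hours is a hypothetical budget chosen for illustration.
    for domain, hours in allocate_hours(60, DOMAIN_WEIGHTS).items():
        print(f"{domain}: {hours} h")
```

Treat the result as a starting point and shift hours toward whichever domain your mock exams show is weakest.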

Registration

$165 USD. Available at Pearson VUE testing centers or through online proctoring from home. AI-300 replaces the retiring DP-100 exam.

Topic Priority Table

Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.

Tier 1 (Must Know): You must understand these services and concepts deeply, know their configurations, and be able to apply them in scenario-based questions. These appear across multiple domains.
Tier 2 (Should Know): Understand these services and their key configurations. May appear in 2-5 questions each.
Tier 3 (Recognize Only): Know what these are at a high level. Rarely more than 1-2 questions each.
Domain 1: 18% of exam

Design and Implement an MLOps Infrastructure

This domain covers setting up the foundational Azure ML infrastructure for MLOps. You must know how to create and manage workspaces, compute targets, datastores, data assets, environments, and components. Also covers Infrastructure as Code using Bicep and Azure CLI, GitHub integration, and network security for ML workspaces.

Key Topics

Azure ML Workspace, Compute Targets, Datastores, Data Assets, Environments, Components, Registries, Bicep, Azure CLI, GitHub Actions

Must-Know Concepts

  • How to create and manage Azure ML workspaces including identity and access management (RBAC, managed identities)
  • Compute target types: compute instances (development), compute clusters (training), serverless compute, and inference compute — know when to use each
  • Datastores connect to Azure storage services (Blob, ADLS, SQL) without exposing credentials. Data assets are versioned references to data in datastores
  • Environments encapsulate Python packages and Docker images for reproducible training and deployment. Can be curated or custom, and are versioned
  • Components are reusable pipeline building blocks with defined inputs, outputs, code, and environment. Shared across workspaces via registries
  • Azure ML Registries enable cross-workspace sharing of models, environments, components, and data assets across an organization
  • Deploying workspaces and resources using Bicep templates and Azure CLI commands
  • GitHub integration for source control and GitHub Actions workflows for automating resource provisioning
  • Network security: restricting workspace access with private endpoints, VNets, and managed network isolation

Common Exam Traps

Datastores are NOT the same as data assets. Datastores define the CONNECTION to storage. Data assets define the specific DATA within a datastore
Compute instances are for interactive development (notebooks). Compute clusters are for training jobs. Do not confuse when each is appropriate
Registries share assets ACROSS workspaces. They are not the same as the workspace-level model registry which is local to a single workspace
Bicep templates are declarative — they define desired state. You do not write imperative provisioning scripts in Bicep. Azure CLI is the imperative alternative
GitHub Actions workflows for ML require secure authentication to Azure. The exam tests GitHub integration configuration, not just workflow YAML
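
The datastore vs data asset trap is easier to remember as a data model. The following is a toy sketch in plain Python, not the Azure ML SDK: the `Datastore` and `DataAsset` classes, the field names, and the example URL are all hypothetical and exist only to show that several versioned assets can point into one shared connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    """Connection to a storage service; knows nothing about specific datasets."""
    name: str
    account_url: str  # hypothetical storage endpoint


@dataclass(frozen=True)
class DataAsset:
    """Versioned reference to a specific path inside a datastore."""
    name: str
    version: int
    datastore: Datastore
    path: str

    def uri(self) -> str:
        """Resolve the asset to a full location through its datastore."""
        return f"{self.datastore.account_url.rstrip('/')}/{self.path.lstrip('/')}"


# One connection, two versions of the same logical dataset.
store = Datastore(name="training_blob", account_url="https://example.blob.core.windows.net/data")
v1 = DataAsset("churn", 1, store, "churn/2024/")
v2 = DataAsset("churn", 2, store, "churn/2025/")

if __name__ == "__main__":
    print(v1.uri())
    print(v2.uri())
```

Both versions reuse the same connection, which is the relationship the exam expects you to recognize.
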
Quick Check: Design and Implement an MLOps Infrastructure

A team has ML models, environments, and components in Workspace A that need to be shared with Workspace B in a different Azure region. What should they use?

Domain 2: 28% of exam

Implement Machine Learning Model Lifecycle and Operations

The heaviest domain at 28%. Covers the full ML model lifecycle from training through deployment to production monitoring. You must know MLflow tracking, AutoML, hyperparameter tuning, distributed training, model registration, endpoint deployment (online and batch), progressive rollout, data drift detection, and retraining triggers.

Key Topics

MLflow, AutoML, Training Pipelines, Model Registry, Managed Online Endpoints, Batch Endpoints, Data Drift Monitoring, Responsible AI

Must-Know Concepts

  • MLflow experiment tracking: configure tracking URI, log metrics, log artifacts, compare runs across experiments, and use the MLflow UI in Azure ML Studio
  • AutoML for exploring optimal models: configure classification, regression, and time-series tasks with automated feature engineering and algorithm selection
  • Hyperparameter tuning: configure sweep jobs with search spaces, sampling methods (random, grid, Bayesian), early termination policies, and primary metrics
  • Training pipelines: compose multi-step pipelines using components, configure data flow between steps, and schedule pipeline runs
  • Distributed training approaches for large models: data parallelism (split data across GPUs) and model parallelism (split model across GPUs)
  • MLflow model registration: register models with versioning, package feature retrieval specifications, manage model lifecycle stages (staging, production, archived)
  • Responsible AI evaluation: use the Responsible AI Dashboard for fairness, interpretability, error analysis, and causal inference assessment
  • Managed online endpoint deployment: deploy models as REST APIs, configure traffic splitting between deployments, implement blue-green and progressive rollout
  • Batch endpoint deployment: configure parallel inference on large datasets, manage compute allocation, and troubleshoot batch jobs
  • Data drift monitoring: configure monitoring signals (data drift, prediction drift, data quality, feature attribution drift), set thresholds, and configure alert triggers
  • Retraining triggers: connect monitoring alerts to Event Hubs, Azure Functions, or Logic Apps to trigger automated retraining pipelines
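
To make the hyperparameter tuning bullet concrete, here is a minimal sketch of random sampling over a search space with a primary metric to minimize. It uses only the standard library rather than Azure ML sweep jobs, and the search space format, `toy_validation_loss`, and the trial counts are assumptions made for illustration.

```python
import math
import random

# Hypothetical search space: a log-uniform range and a discrete choice.
SEARCH_SPACE = {
    "learning_rate": ("loguniform", 1e-4, 1e-1),
    "batch_size": ("choice", [16, 32, 64, 128]),
}


def sample(space: dict, rng: random.Random) -> dict:
    """Draw one random configuration from the search space."""
    params = {}
    for name, spec in space.items():
        if spec[0] == "choice":
            params[name] = rng.choice(spec[1])
        elif spec[0] == "loguniform":
            low, high = math.log(spec[1]), math.log(spec[2])
            params[name] = math.exp(rng.uniform(low, high))
        else:
            raise ValueError(f"unknown distribution: {spec[0]}")
    return params


def toy_validation_loss(params: dict) -> float:
    """Stand-in for a real training run; lowest near lr=0.01, batch_size=64."""
    lr_term = (math.log10(params["learning_rate"]) + 2) ** 2
    batch_term = abs(params["batch_size"] - 64) / 64
    return lr_term + batch_term


def random_sweep(space: dict, objective, max_trials: int, seed: int = 0):
    """Run max_trials random trials and return (best_loss, best_params)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        params = sample(space, rng)
        trials.append((objective(params), params))
    return min(trials, key=lambda t: t[0])


if __name__ == "__main__":
    loss, params = random_sweep(SEARCH_SPACE, toy_validation_loss, max_trials=20)
    print(f"best loss {loss:.4f} with {params}")
```

Grid sampling would enumerate every combination instead, and Bayesian sampling would use earlier trial results to pick the next configuration. Neither is shown here.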

Common Exam Traps

MLflow models registered in Azure ML can be deployed DIRECTLY to managed endpoints. You do not need to convert them to a different format first
Progressive rollout with managed online endpoints uses traffic SPLITTING — you route a percentage of traffic to the new deployment, not deploy to a subset of instances
Data collection for monitoring is AUTOMATIC with online endpoints but requires MANUAL setup for batch endpoints
AutoML does NOT replace the need for data preparation. It automates model selection and hyperparameter tuning, but data quality still matters
Archiving a model in the registry does NOT delete it. It marks it as no longer recommended for use while preserving the artifact for compliance
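
The traffic-splitting trap can be sketched as weighted routing: every request is assigned to a deployment according to percentage weights, and the rollout only changes those weights. This is a standalone simulation, not the Azure ML SDK. The deployment names `blue` and `green` and the stage percentages are assumptions.

```python
import random
from collections import Counter


def route(traffic: dict[str, int], rng: random.Random) -> str:
    """Pick a deployment for one request according to percentage weights."""
    names = list(traffic)
    return rng.choices(names, weights=[traffic[n] for n in names], k=1)[0]


def rollout_steps(old: str, new: str, percentages=(10, 25, 50, 100)):
    """Yield the traffic map for each stage of a progressive rollout."""
    for pct in percentages:
        yield {old: 100 - pct, new: pct}


if __name__ == "__main__":
    rng = random.Random(0)
    for traffic in rollout_steps("blue", "green"):
        # Simulate 10,000 requests at this stage and count where they land.
        counts = Counter(route(traffic, rng) for _ in range(10_000))
        print(traffic, dict(counts))
```

Both deployments stay fully provisioned throughout. Rolling back means setting the new deployment's weight to 0 again.
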
Quick Check: Implement Machine Learning Model Lifecycle and Operations

An ML engineer deploys a new model version to a managed online endpoint and wants to gradually increase traffic from 10% to 100% while monitoring for errors. Which deployment strategy should they use?

Domain 3: 24% of exam

Design and Implement a GenAIOps Infrastructure

This domain covers setting up and managing the infrastructure for generative AI operations using Microsoft Foundry. You must know how to create Foundry environments, configure security (RBAC, managed identities, networking), deploy foundation models (serverless vs managed compute), configure provisioned throughput, and implement prompt versioning with source control.

Key Topics

Microsoft Foundry, Serverless API Endpoints, Managed Compute, Provisioned Throughput Units, Bicep Templates, RBAC, Managed Identities, Prompt Versioning

Must-Know Concepts

  • Creating and configuring Microsoft Foundry resources and project environments, including hub-and-project architecture
  • Identity and access management: managed identities for secure authentication, RBAC for granular permissions on Foundry resources
  • Network security: private endpoints, private networking configurations, and VNet integration for Foundry environments
  • Infrastructure as Code: deploying Foundry resources using Bicep templates and Azure CLI
  • Foundation model deployment options: serverless API endpoints (pay-as-you-go, no GPU management) vs managed compute (dedicated resources, more control)
  • Selecting appropriate models for specific use cases from the Foundry model catalog
  • Model versioning and production deployment strategies for foundation models
  • Provisioned throughput units (PTUs) for guaranteeing consistent performance on high-volume workloads
  • Prompt design and development: creating prompts, building prompt variants, and comparing performance across variants
  • Prompt version control using Git repositories for tracking changes and enabling collaboration

Common Exam Traps

Serverless API endpoints support global, data zone, and regional deployment scopes. Global Standard routes across worldwide infrastructure; Data Zone restricts to a geographic boundary; Regional pins to a specific region for compliance
PTUs guarantee throughput but require upfront commitment. Pay-as-you-go serverless is more flexible but does not guarantee capacity during high demand
Microsoft Foundry is the current platform name — it replaced Azure AI Studio and Azure AI Foundry (classic). The exam references Microsoft Foundry; documentation may still use the older name in some places
Prompt versioning with Git is about tracking PROMPT changes, not model changes. Model versioning and prompt versioning are separate concerns
Managed compute deployments integrate with the full MLOps lifecycle. Serverless deployments are simpler but offer less pipeline integration
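
To keep prompt versions separate from model versions, one possible approach is to derive an identifier from the prompt content itself and record it next to, but independently of, the model version. The helper below is a hypothetical sketch; `prompt_version_id` and the parameter names are not part of Microsoft Foundry or Git.

```python
import hashlib
import json


def prompt_version_id(template: str, parameters: dict) -> str:
    """Derive a short, deterministic ID from the prompt text and its settings."""
    payload = json.dumps({"template": template, "parameters": parameters}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    v_a = prompt_version_id("Summarize: {text}", {"temperature": 0.2})
    v_b = prompt_version_id("Summarize in one sentence: {text}", {"temperature": 0.2})
    # The model version is tracked on its own, so either can change independently.
    run_record = {"prompt_version": v_a, "model_version": "hypothetical-model-v1"}
    print(v_a, v_b, run_record)
```

In practice the Git commit hash of the prompt file can serve the same purpose. The point is that an evaluation result should be traceable to one prompt version and one model version.
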
Quick Check: Design and Implement a GenAIOps Infrastructure

A company needs to deploy a foundation model for a customer-facing chatbot that handles 50,000 requests per hour with guaranteed response times. Which deployment configuration should they use?

Domain 4: 15% of exam

Implement Generative AI Quality Assurance and Observability

This domain covers evaluating and monitoring generative AI applications in production. You must know how to create test datasets, implement AI quality metrics (groundedness, relevance, coherence, fluency), configure risk and safety evaluations, set up automated evaluation workflows, and implement comprehensive observability including performance, cost, and debugging capabilities.

Key Topics

Evaluation Metrics, Test Datasets, Risk and Safety Evaluations, Observability, Tracing, Token Monitoring, Latency Tracking

Must-Know Concepts

  • Creating test datasets and data mapping for comprehensive evaluation of GenAI applications and agents
  • AI quality metrics: groundedness (factual accuracy based on source data), relevance (response addresses the query), coherence (logical consistency and flow), fluency (natural language quality)
  • Risk and safety evaluations: detecting harmful content, bias, and policy violations in model outputs
  • Automated evaluation workflows using built-in metrics and custom evaluation metrics
  • Continuous monitoring in Microsoft Foundry: setting up dashboards, configuring alerts, and tracking trends
  • Performance metrics: latency (time to first token, total response time), throughput (requests per second), and response time distribution
  • Cost metrics: token consumption (input tokens, output tokens), resource usage, and cost allocation across projects
  • Logging, tracing, and debugging: capturing detailed request/response logs, distributed tracing for multi-step GenAI applications, and debugging production issues

Common Exam Traps

Groundedness measures whether the response is factually supported by the provided SOURCE DATA, not general knowledge. A fluent and coherent response can still be ungrounded
Evaluation metrics can be computed on BOTH test datasets and production traffic. The exam tests whether you know when to use each approach
Token consumption tracks BOTH input tokens (prompt) and output tokens (response). Cost optimization may require reducing either or both
Tracing is especially important for multi-step GenAI applications (agents, RAG pipelines) where you need to identify which step caused an issue
Safety evaluations are separate from quality evaluations. A high-quality response can still be unsafe, and vice versa
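
The token consumption trap comes down to simple arithmetic: input and output tokens are counted separately and usually priced differently. The sketch below assumes made-up prices per 1,000 tokens. Real prices depend on the model and deployment type, so check current pricing before relying on any figure.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request."""
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1,000 tokens, for illustration only.
PRICE_PER_1K = {"input": 0.002, "output": 0.006}


def request_cost(usage: Usage, prices: dict[str, float] = PRICE_PER_1K) -> float:
    """Cost of one request: input and output tokens are billed separately."""
    return (
        usage.input_tokens / 1000 * prices["input"]
        + usage.output_tokens / 1000 * prices["output"]
    )


def total_cost(usages: list[Usage]) -> float:
    """Sum the cost over a batch of requests."""
    return sum(request_cost(u) for u in usages)


if __name__ == "__main__":
    batch = [Usage(1000, 500), Usage(4000, 200)]
    print(f"total: ${total_cost(batch):.4f}")
```

With prices like these, a long prompt with a short answer and a short prompt with a long answer can cost about the same, which is why cost optimization may target either side.
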
Quick Check: Implement Generative AI Quality Assurance and Observability

A GenAI application produces responses that are grammatically correct, well-structured, and directly address the user's question, but contain facts not supported by the underlying knowledge base. Which evaluation metric would detect this issue?

Domain 5: 15% of exam

Optimize Generative AI Systems and Model Performance

This domain covers optimizing both RAG systems and foundation models for production performance. You must know how to tune retrieval parameters (chunk sizes, similarity thresholds, search strategies), select and fine-tune embedding models, implement hybrid search, evaluate RAG quality, and implement advanced fine-tuning with synthetic data generation.

Key Topics

RAG Optimization, Chunk Size Tuning, Hybrid Search, Embedding Models, Fine-Tuning, Synthetic Data, A/B Testing

Must-Know Concepts

  • RAG retrieval optimization: tuning similarity thresholds to balance precision and recall, adjusting chunk sizes and overlap for optimal context windows
  • Chunk size strategies: smaller chunks (e.g., 512 tokens) provide more precise retrieval, larger chunks provide more context but dilute relevance. Overlap prevents information loss at boundaries
  • Embedding model selection: choosing models optimized for your domain, fine-tuning embeddings for domain-specific vocabulary and concepts
  • Hybrid search: combining semantic search (vector-based) with keyword search (BM25) to capture both conceptual similarity and exact matches
  • Evaluating RAG performance: using relevance metrics, A/B testing frameworks, and systematic parameter sweeps to find optimal configurations
  • Fine-tuning methods: supervised fine-tuning on labeled data, parameter-efficient fine-tuning (LoRA, QLoRA), and instruction tuning
  • Synthetic data generation: creating training data for fine-tuning when real labeled data is scarce, using LLMs to generate diverse examples
  • Monitoring fine-tuned model performance: comparing against base models, tracking quality degradation, and managing fine-tuned models through the development-to-production lifecycle
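
Chunk size and overlap are easier to reason about with a small example. The sketch below splits a token list into fixed-size chunks that share `overlap` tokens with their neighbor. It treats whitespace-separated words as tokens, which is a simplifying assumption because real pipelines count model tokens.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into chunks of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    words = "data drift is detected when production inputs diverge from training".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
        print(chunk)
```

A larger `overlap` reduces the chance that a sentence is cut at a chunk boundary, at the cost of storing and embedding more duplicated text.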

Common Exam Traps

Larger chunk sizes do NOT always improve RAG quality. They increase context but dilute retrieval precision. The optimal size depends on query patterns and data characteristics
Hybrid search usually outperforms pure vector search or pure keyword search. The exam expects you to know this
Fine-tuning changes model BEHAVIOR (style, format, task specialization). RAG adds KNOWLEDGE. If you need to add new facts, use RAG. If you need to change how the model responds, use fine-tuning
Synthetic data for fine-tuning must be diverse and representative. Low-quality synthetic data can degrade model performance rather than improve it
A/B testing for RAG configurations requires holding the LLM constant while varying retrieval parameters to isolate the impact of retrieval changes
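
Hybrid search needs a way to merge a keyword ranking and a vector ranking into one list. Reciprocal Rank Fusion (RRF) is one commonly used method. The sketch below is a generic implementation with the conventional constant `k = 60`; the document IDs are made up, and this is not a claim about how any particular search service is configured.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); earlier ranks contribute more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # e.g. from BM25
    vector_hits = ["c", "a", "d"]   # e.g. from embedding similarity
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that appear in both lists rise to the top, which is how hybrid search captures exact matches and conceptual similarity together.
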
Quick Check: Optimize Generative AI Systems and Model Performance

A RAG application retrieves relevant documents but the LLM often generates responses that include information NOT present in the retrieved documents. What should the team optimize first?

Services and Concepts You Must Not Confuse

These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.

Managed Online Endpoints vs Batch Endpoints

Use Managed Online Endpoints when…

Deploy models for real-time, low-latency inference via REST APIs. Always-on with auto-scaling, load balancing, and blue-green deployment support. Best for interactive applications.

Use Batch Endpoints when…

Run inference on large datasets in parallel batches. No always-on infrastructure required. Best for bulk scoring, scheduled processing, and offline predictions.

Exam trap

Online endpoints support progressive rollout with traffic splitting. Batch endpoints do not. Data collection for monitoring is automatic with online endpoints but requires manual configuration for batch endpoints.

Serverless API Endpoints (MaaS) vs Managed Compute Deployments

Use Serverless API Endpoints (MaaS) when…

Deploy foundation models from the Foundry catalog without managing GPU infrastructure. Pay-as-you-go billing based on token usage. Supports global, data zone, and regional deployment scopes. Quick to set up.

Use Managed Compute Deployments when…

Deploy models with dedicated compute resources you manage. More control over infrastructure, supports custom containers, and integrates with full MLOps lifecycle. Higher upfront commitment.

Exam trap

Serverless is Models-as-a-Service with no GPU management. Managed compute gives more control but requires infrastructure planning. Use provisioned throughput units (PTUs) on serverless for guaranteed high-volume performance.

MLflow Model Registry vs Azure ML Model Registry

Use MLflow Model Registry when…

Open-source model registry integrated into Azure ML workspaces. Register, version, and manage models using MLflow APIs. Portable across MLflow-compatible platforms.

Use Azure ML Model Registry when…

Azure-native model registry with additional features like responsible AI evaluation, deployment to managed endpoints, and cross-workspace sharing via registries.

Exam trap

In Azure ML, both are interconnected — registering an MLflow model also registers it in the Azure ML model registry. The exam tests whether you know the MLflow API for registration vs the Azure ML SDK approach.

Data Drift vs Prediction Drift

Use Data Drift when…

Detects when the statistical distribution of production INPUT data changes compared to training data. Indicates the real-world data no longer matches what the model was trained on.

Use Prediction Drift when…

Detects when the distribution of model PREDICTIONS changes over time, even if input data appears stable. May indicate concept drift or model degradation.

Exam trap

Data drift monitors INPUT changes. Prediction drift monitors OUTPUT changes. Both are monitoring signals in Azure ML, but they detect different problems. A model can have prediction drift without data drift if the underlying relationships change.
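
Both signals reduce to comparing two distributions: a baseline and a recent production window. The sketch below uses the Population Stability Index (PSI), one common drift metric, implemented with the standard library only. The bin count, the epsilon floor, and the sample data are assumptions, and the metric and thresholds you configure in Azure ML may differ.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a new sample."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant baselines

    def histogram(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        return [max(c / len(values), eps) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [x + 0.5 for x in baseline]
    # Feed feature values to check data drift, or model outputs to check prediction drift.
    print(f"no drift: {psi(baseline, baseline):.3f}")
    print(f"shifted:  {psi(baseline, shifted):.3f}")
```

The same function covers both cases. What changes is the input: feature values for data drift, model predictions for prediction drift.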

Azure ML Environments vs Azure ML Components

Use Azure ML Environments when…

Define the software dependencies (Python packages, Docker base images, conda specs) needed to run training scripts or deploy models. Versioned and reproducible.

Use Azure ML Components when…

Reusable, versioned pipeline steps that combine code, environment, and interface definitions (inputs/outputs). Building blocks for composing ML pipelines.

Exam trap

Environments define WHAT software is installed. Components define WHAT code runs and HOW it connects in a pipeline. A component references an environment but they are separate assets.

RAG (Retrieval-Augmented Generation) vs Fine-Tuning

Use RAG (Retrieval-Augmented Generation) when…

Augments LLM prompts with external data retrieved at query time from vector stores or search indexes. Does NOT modify model weights. Best for frequently changing or private data.

Use Fine-Tuning when…

Modifies model weights by training on domain-specific data. Changes model behavior, style, or task performance permanently. Best for specialized tasks or improving output format.

Exam trap

RAG adds KNOWLEDGE without changing the model. Fine-tuning changes the MODEL without necessarily adding knowledge. The exam tests when each is appropriate and how to optimize each approach.

Bicep Templates vs GitHub Actions Workflows

Use Bicep Templates when…

Declarative Infrastructure as Code for DEFINING Azure resources. Describes the desired state of ML workspaces, compute, networking, and Foundry resources.

Use GitHub Actions Workflows when…

CI/CD AUTOMATION that EXECUTES deployments, training jobs, and model lifecycle operations. Orchestrates when and how Bicep templates and ML operations run.

Exam trap

Bicep defines WHAT to deploy. GitHub Actions defines WHEN and HOW to deploy it. They work together: GitHub Actions workflows typically execute Bicep deployments as pipeline steps.

Top Mistakes to Avoid

Confusing datastores (connection to storage) with data assets (versioned references to specific data) — they are separate Azure ML concepts
Mixing up managed online endpoints (real-time REST APIs) with batch endpoints (offline bulk inference) — they have different deployment patterns and monitoring capabilities
Thinking serverless API endpoints and managed compute deployments are interchangeable — serverless is pay-as-you-go with no GPU management, managed compute gives more control but requires infrastructure planning
Confusing data drift (input distribution changes) with prediction drift (output distribution changes) — both are monitoring signals but detect different problems
Assuming RAG and fine-tuning serve the same purpose — RAG adds knowledge at query time without changing the model, fine-tuning modifies the model's behavior permanently
Not understanding the difference between Azure ML Environments (software dependencies) and Components (reusable pipeline steps) — environments define WHAT is installed, components define WHAT code runs
Thinking Bicep templates and GitHub Actions serve the same role — Bicep defines infrastructure declaratively, GitHub Actions automates WHEN and HOW to deploy it
Confusing groundedness (supported by source data) with relevance (addresses the query) in GenAI evaluation — a response can be relevant but ungrounded, or grounded but irrelevant
Forgetting that provisioned throughput units (PTUs) are needed for guaranteed performance on serverless endpoints — pay-as-you-go does not guarantee capacity
Assuming AutoML eliminates the need for data preparation and feature engineering — AutoML automates model selection and tuning, not data quality

Exam-Ready Checklist

Can explain all 5 exam domains and their relative weights (18%, 28%, 24%, 15%, 15%)
Know how to create and configure Azure ML workspaces including compute targets, datastores, data assets, environments, and components
Can deploy ML infrastructure using Bicep templates and automate with GitHub Actions workflows
Understand MLflow integration with Azure ML: experiment tracking, model logging, model registration, and deployment
Can configure AutoML, hyperparameter tuning, and training pipelines for model development
Know the difference between managed online endpoints and batch endpoints, including progressive rollout strategies
Can configure data drift monitoring, set alert thresholds, and connect to retraining triggers via Event Hubs or Logic Apps
Understand Microsoft Foundry architecture: project setup, model deployment options (serverless vs managed compute), and PTUs
Can implement prompt versioning with Git and compare prompt variant performance
Know all four GenAI quality metrics (groundedness, relevance, coherence, fluency) and when each applies
Can configure observability including latency tracking, token consumption monitoring, distributed tracing, and cost metrics
Understand RAG optimization: chunk sizes, overlap strategies, similarity thresholds, hybrid search, and embedding model selection
Can explain when to use fine-tuning vs RAG and how to create synthetic data for fine-tuning
Scored 70%+ on at least two full mock exams (700/1000 passing score)

Recommended Resources

Free & Official Resources

Paid Courses & Practice Exams

These are recommended if you prefer a structured learning path. They can save time but are not required to pass.

Frequently Asked Questions