CertPrepNow
Microsoft · AI-300 · 75 concepts

AI-300 Cheat Sheet

Quick reference for the Microsoft Certified: Machine Learning Operations Engineer Associate exam.

Azure ML Workspace — Setup and CLI

az ml workspace create -n <workspace-name> -g <resource-group>
Create an Azure ML workspace using CLI v2 — the workspace is the top-level resource for all ML assets (compute, data, environments, models, endpoints).
az extension add -n ml
az configure --defaults group=<rg> workspace=<ws> location=<loc>
Install the Azure ML CLI v2 extension and configure defaults to avoid repeating the --workspace-name and --resource-group flags on every command.
az ml workspace show -n <workspace-name> -g <resource-group>
Display workspace details including MLflow tracking URI, associated storage, key vault, and container registry.
Workspace-level model registry vs. Azure ML Registry
Workspace model registry stores models scoped to ONE workspace; Azure ML Registry shares models, environments, and components ACROSS multiple workspaces organization-wide.
Compute types: Compute Instance / Compute Cluster / Serverless Compute / Inference Compute
Compute Instance is for interactive dev (notebooks); Compute Cluster is for scalable training jobs; Serverless Compute auto-provisions for jobs; Inference Compute backs managed endpoints.
Datastore vs. Data Asset
A Datastore defines the CONNECTION to Azure storage (Blob, ADLS, SQL) without exposing credentials; a Data Asset is a versioned REFERENCE to specific data within that datastore.
az ml data create --name mydata --version 1 --type uri_folder --path azureml://datastores/<ds>/paths/<folder>
Register a versioned data asset pointing to a folder in a registered datastore — data assets are the recommended way to reference training and evaluation data.

Infrastructure as Code — Bicep and GitHub Actions

resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  name: workspaceName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    storageAccount: storageAccount.id
    keyVault: keyVault.id
    applicationInsights: appInsights.id
    containerRegistry: containerRegistry.id
  }
}
Bicep resource definition for an Azure ML workspace with system-assigned managed identity — declarative IaC for reproducible workspace deployments.
az deployment group create --resource-group <rg> --template-file main.bicep --parameters @params.json
Deploy a Bicep template to create Azure ML infrastructure — use parameter files to manage environment-specific (dev/staging/prod) configurations.
Bicep vs. GitHub Actions role distinction
Bicep defines WHAT Azure resources to deploy (declarative desired state); GitHub Actions defines WHEN and HOW to execute deployments (CI/CD orchestration) — they work together, not as alternatives.
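# .github/workflows/deploy-workspace.yml
permissions:
  id-token: write   # required so azure/login can request an OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # makes infra/main.bicep available on the runner
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - run: az deployment group create --resource-group <rg> --template-file infra/main.bicep
GitHub Actions workflow snippet authenticating to Azure with OIDC before deploying Bicep templates for ML infrastructure. No client secret is stored; the three values passed to azure/login are identifiers, not credentials.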
GitHub Actions OIDC vs. service principal secret auth
OIDC federated credentials are preferred for GitHub Actions — they eliminate stored secrets and use short-lived tokens, unlike service principal client secrets which must be rotated manually.
Private endpoint + VNet isolation for workspace
Restrict workspace access by deploying private endpoints that route traffic through an Azure VNet; to block public internet access as well, set the workspace's public network access to Disabled.

MLflow — Experiment Tracking and Model Registry

import mlflow

# Assumes `ws` is an azureml.core.Workspace (SDK v1) and `model` is a fitted scikit-learn estimator.
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment("my-experiment")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.sklearn.log_model(model, artifact_path="model")
Configure MLflow tracking URI from the Azure ML workspace and log parameters, metrics, and a scikit-learn model artifact — all within a single tracked run.
# Assumes `run` is the object returned by mlflow.start_run().
mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="my-registered-model",
)
Register a trained model from a run into the MLflow model registry — in Azure ML this simultaneously registers the model in the Azure ML model registry.
Model lifecycle stages: None → Staging → Production → Archived
MLflow model registry tracks lifecycle stages — Archived marks the model as deprecated but does NOT delete it; the artifact is retained for compliance and rollback.
mlflow.autolog()
Enable automatic logging of parameters, metrics, and artifacts for supported frameworks (scikit-learn, TensorFlow, PyTorch) — reduces boilerplate MLflow instrumentation code.
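MLflow tracking URI format: azureml://<region>.api.azureml.ms/mlflow/v1.0/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>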
Every Azure ML workspace exposes a unique MLflow tracking URI — point mlflow.set_tracking_uri() to this URI to store experiments directly in the workspace.
mlflow.evaluate(model_uri, data=test_df, targets="label", model_type="classifier")
Run MLflow model evaluation to compute classification metrics (accuracy, F1, ROC-AUC) against a test dataset — results are logged as run metrics for comparison.

Model Training — AutoML, Pipelines, and Hyperparameter Tuning

az ml job create --file sweep-job.yml
# sweep-job.yml defines: search_space, sampling_algorithm, limits, objective
Submit a hyperparameter sweep job using CLI v2 — the YAML file defines the parameter search space, sampling method (random/grid/Bayesian), and early termination policy.
Sweep job sampling methods: random / grid / Bayesian
Random sampling is fastest for broad exploration; Grid exhaustively tests all combinations; Bayesian uses prior results to guide the search — Bayesian is most efficient when evaluation is expensive.
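The difference between grid and random sampling can be sketched in plain Python. This is only an illustration of the two strategies, not how Azure ML implements them; the search space, the helper names, and the trial count are hypothetical.

```python
import itertools
import random

# Hypothetical search space: discrete choices only, so a grid can be enumerated.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32],
}

def grid_samples(space):
    """Enumerate every combination; the trial count is the product of option counts."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]

def random_samples(space, n_trials, seed=0):
    """Draw n_trials configurations, picking each parameter independently at random."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(options) for name, options in space.items()}
        for _ in range(n_trials)
    ]

grid = grid_samples(search_space)
rand = random_samples(search_space, n_trials=4)
print(len(grid), "grid trials;", len(rand), "random trials")
```

Grid cost grows multiplicatively with every added parameter, while the random budget is fixed by the caller, which is why random sampling suits broad exploration.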
Early termination policies: Bandit / Median Stopping / Truncation Selection
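Bandit terminates runs whose primary metric falls outside a slack factor (or slack amount) of the best run; Median Stopping cancels runs whose best metric is worse than the median of the running averages across all runs; Truncation Selection cancels the lowest-performing X% of runs at each evaluation interval.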
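A minimal sketch of two of these policies, assuming a primary metric that is being maximized. The Bandit cutoff used here, best / (1 + slack_factor), follows the worked example in the Azure ML documentation; the function names and run data are hypothetical.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """Bandit check for a maximized metric: cancel if below best / (1 + slack)."""
    allowed_minimum = best_metric / (1 + slack_factor)
    return run_metric < allowed_minimum

def truncation_selection(run_metrics, truncation_percentage):
    """Return the names of the lowest-performing runs to cancel this interval."""
    n_cancel = int(len(run_metrics) * truncation_percentage / 100)
    ranked = sorted(run_metrics, key=run_metrics.get)  # worst first
    return ranked[:n_cancel]

# Hypothetical metrics reported at one evaluation interval.
runs = {"run-a": 0.80, "run-b": 0.70, "run-c": 0.60, "run-d": 0.50}
best = max(runs.values())

bandit_cancelled = [
    name for name, metric in runs.items()
    if bandit_should_terminate(metric, best, slack_factor=0.2)
]
truncated = truncation_selection(runs, truncation_percentage=25)
print(bandit_cancelled, truncated)
```

With a best metric of 0.80 and a slack factor of 0.2 the cutoff is about 0.667, so the runs at 0.60 and 0.50 are cancelled while the run at 0.70 continues.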
AutoML vs. manual hyperparameter sweep
AutoML explores algorithms AND hyperparameters automatically for classification/regression/time-series; a sweep job tunes hyperparameters of ONE fixed algorithm — AutoML does not replace data preparation.
Distributed training: data parallelism vs. model parallelism
Data parallelism splits the DATASET across GPUs and synchronizes gradients — for large datasets with models that fit on one GPU; model parallelism splits the MODEL across GPUs — for models too large for one GPU.
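The gradient synchronization step in data parallelism can be illustrated without any GPU framework. The sketch below assumes a one-parameter linear model, equal-sized shards, and hypothetical data; real frameworks perform the averaging with an all-reduce across devices.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y_hat = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

# Hypothetical dataset split into two equal shards, one per worker.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[0:2], data[2:4]]

w = 0.0
local_gradients = [gradient(w, shard) for shard in shards]     # computed in parallel
synced_gradient = sum(local_gradients) / len(local_gradients)  # averaging step
full_gradient = gradient(w, data)                              # single-device reference

print(synced_gradient, full_gradient)
```

Because the shards are the same size, averaging the per-worker gradients reproduces the full-batch gradient, so every replica applies an identical update.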
az ml component create --file train-component.yml
az ml job create --file pipeline.yml
Register a reusable component, then submit a pipeline that composes components via YAML; in CLI v2, pipelines are submitted as jobs with az ml job create. Components define inputs, outputs, code, and environment; pipelines chain components with defined data flow.
Environment types: curated (Microsoft-maintained) vs. custom
Curated environments come pre-built with common ML frameworks (sklearn, PyTorch, TensorFlow); custom environments let you specify Docker base images and conda dependencies — both are versioned.

Model Deployment — Online and Batch Endpoints

az ml online-endpoint create --name my-endpoint -g <rg> -w <ws>
az ml online-deployment create --name blue --endpoint-name my-endpoint --file deployment.yml --all-traffic
Create a managed online endpoint and deploy a model to it with 100% traffic using the --all-traffic flag — the endpoint hosts the REST API URL.
az ml online-endpoint update --name my-endpoint --traffic "blue=90 green=10"
Split traffic between two deployments on the same endpoint for progressive rollout — gradually shift traffic from old (blue) to new (green) deployment while monitoring performance.
az ml online-endpoint update --name my-endpoint --traffic "blue=100"
Perform a safe rollback by routing 100% of traffic back to the stable deployment — the new deployment remains in place but receives no traffic until issues are resolved.
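The progressive rollout and rollback described above amount to a sequence of --traffic values. The helper below is a hypothetical sketch that only builds and validates those strings; it does not call Azure, and the step percentages are illustrative.

```python
def traffic_arg(split):
    """Format a deployment-to-percentage mapping as a --traffic value."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return " ".join(f"{name}={pct}" for name, pct in split.items())

def rollout_steps(old, new, new_percentages):
    """One --traffic value per rollout stage, shifting traffic from old to new."""
    return [traffic_arg({old: 100 - pct, new: pct}) for pct in new_percentages]

steps = rollout_steps("blue", "green", [10, 50, 100])
rollback = traffic_arg({"blue": 100, "green": 0})

for value in steps:
    print(f'az ml online-endpoint update --name my-endpoint --traffic "{value}"')
```

Keeping the rollback value precomputed makes the revert a single command if monitoring flags a regression at any stage.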
Managed online endpoints vs. Batch endpoints
Online endpoints are always-on REST APIs for low-latency real-time inference with auto-scaling and blue-green deployment; batch endpoints run parallel inference on large datasets with no always-on compute cost.
Data collection for monitoring: automatic (online) vs. manual (batch)
Online endpoints automatically collect input/output data for monitoring when data collection is enabled; batch endpoints require manual configuration to capture prediction data.
az ml batch-endpoint invoke --name my-batch-endpoint --input azureml:my-data-asset:1
Trigger a batch inference job by invoking the batch endpoint with an input data asset reference — the job runs across compute cluster nodes in parallel.

Production Monitoring — Drift Detection and Retraining Triggers

Data drift vs. Prediction drift
Data drift detects changes in the statistical distribution of INPUT features vs. training data; prediction drift detects changes in the OUTPUT distribution. A change in the data-label relationship (concept drift) can leave both signals flat while accuracy falls, so catching it requires ground-truth labels and model performance monitoring.
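One common way to quantify a distribution shift for a numeric feature is the Population Stability Index (PSI). The sketch below is a hand-rolled illustration, not the Azure ML implementation; the bin edges, sample values, and the 0.2 alert threshold are assumptions chosen for the example.

```python
import math

def bin_proportions(values, edges):
    """Share of values falling in each bin; assumes all values lie within the edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

def psi(reference, production, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    total = 0.0
    for p, q in zip(reference, production):
        p, q = max(p, eps), max(q, eps)  # avoid log(0) for empty bins
        total += (q - p) * math.log(q / p)
    return total

edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]    # reference window
production = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0]  # recent inference inputs

ref_bins = bin_proportions(training, edges)
prod_bins = bin_proportions(production, edges)
drift_score = psi(ref_bins, prod_bins)

ALERT_THRESHOLD = 0.2  # hypothetical threshold; tune per feature
print(drift_score > ALERT_THRESHOLD)
```

The same function applied to model outputs instead of inputs gives a prediction drift score against the reference predictions.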
Monitoring signals: data drift / prediction drift / data quality / feature attribution drift
Configure all four monitoring signals for comprehensive production visibility — data quality checks for nulls and schema violations; feature attribution drift detects changes in which features drive predictions.
Retraining trigger pipeline: Model Monitor alert → Event Grid → Logic Apps / Azure Functions → training pipeline
When a monitoring signal exceeds its configured threshold, the resulting event can be routed through Azure Event Grid to a handler such as Logic Apps or Azure Functions, which launches an automated retraining pipeline.
Responsible AI Dashboard components: fairness / interpretability / error analysis / causal inference
The Responsible AI Dashboard in Azure ML Studio provides a unified view of model fairness, feature explanations, error distributions, and causal impact — use before production deployment.
Feature retrieval specification
A specification packaged with the model artifact that describes how to retrieve features from feature stores at inference time — enables consistent feature engineering between training and serving.
Reference dataset for drift monitoring
Set the training dataset as the reference dataset in model monitoring — all drift calculations compare production data distribution against this reference, not against previous production windows.

Microsoft Foundry — GenAIOps Infrastructure

Microsoft Foundry hub-and-project architecture
A Foundry hub is the top-level governance resource (shared compute, networking, security); projects are isolated workspaces under the hub for individual teams or applications to deploy models and build GenAI apps.
Serverless API (MaaS) vs. Managed Compute deployment
Serverless API is pay-as-you-go with no GPU management and regional deployment scope; managed compute provides dedicated GPU infrastructure with more control and full MLOps lifecycle integration.
Serverless deployment scopes: Global Standard / Data Zone / Regional
Global Standard routes requests across worldwide Microsoft infrastructure for highest availability; Data Zone restricts to a geographic boundary for data residency; Regional pins to a specific Azure region for compliance.
Provisioned Throughput Units (PTUs)
PTUs reserve a fixed amount of model processing capacity upfront — choose PTUs over pay-as-you-go serverless when you need guaranteed throughput and consistent latency for high-volume production workloads.
az cognitiveservices account deployment create \
  --name <foundry-resource> \
  --resource-group <rg> \
  --deployment-name my-gpt4o \
  --model-name gpt-4o \
  --model-version 2024-08-06 \
  --model-format OpenAI \
  --sku-capacity 10 \
  --sku-name GlobalStandard
Deploy a foundation model to Microsoft Foundry using Azure CLI — sku-name specifies the deployment scope (GlobalStandard, DataZoneStandard, or Standard for regional).
Managed identity + RBAC for Foundry resources
Use system-assigned or user-assigned managed identities for credential-free authentication to Foundry resources — assign granular RBAC roles (e.g., Azure AI Developer) rather than owner/contributor.
Prompt versioning with Git repositories
Store prompts in Git repositories within Microsoft Foundry to track prompt changes, create and compare variants, and enable team collaboration — prompt versioning and model versioning are separate concerns.

GenAI Quality Metrics and Evaluation Workflows

Groundedness: response factually supported by the source data
Groundedness measures whether each claim in the response is backed by the provided context documents — a fluent and relevant response can still be ungrounded if it introduces facts not in the source.
Relevance: response directly addresses the user's query
Relevance measures whether the generated response answers what the user asked — a grounded and coherent response can still be irrelevant if it discusses the wrong topic.
Coherence: logical flow and consistency across the response
Coherence measures whether the response is logically consistent and well-structured from sentence to sentence — distinct from fluency which measures grammar and naturalness.
Fluency: grammatically correct and natural-sounding language
Fluency measures the linguistic quality of the response — a fluent response can still be incoherent, irrelevant, or ungrounded; fluency alone does not indicate quality.
Risk and safety evaluations vs. quality evaluations
Safety evaluations detect harmful content, bias, and policy violations in model outputs — they are separate from quality metrics (groundedness, relevance) since a high-quality response can still be unsafe.
Automated evaluation workflow: test dataset → run metrics → compare → gate deployment
Configure automated evaluation in Foundry to run built-in and custom metrics on a test dataset on every deployment — use metric thresholds as quality gates before promoting to production.
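A quality gate of this kind reduces to comparing evaluation scores with minimum thresholds. The sketch below is a generic illustration, not a Foundry API; the metric values and thresholds are hypothetical.

```python
def evaluate_gate(metrics, thresholds):
    """Return (passed, failures); a missing metric counts as a failure."""
    failures = {}
    for name, minimum in thresholds.items():
        score = metrics.get(name)
        if score is None or score < minimum:
            failures[name] = (score, minimum)
    return (not failures, failures)

# Hypothetical scores from an evaluation run on the test dataset.
metrics = {"groundedness": 4.3, "relevance": 3.6, "coherence": 4.5}
thresholds = {"groundedness": 4.0, "relevance": 4.0}

passed, failures = evaluate_gate(metrics, thresholds)
if not passed:
    print("Blocking promotion:", failures)
```

In a CI/CD workflow the failing case would typically end the job with a non-zero exit code so the deployment step never runs.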
Evaluation on test dataset vs. production traffic
Test dataset evaluation runs before deployment and catches regressions; production traffic evaluation monitors drift in quality metrics over time — both are necessary for complete quality assurance.

GenAI Observability — Latency, Cost, and Tracing

Distributed tracing for multi-step GenAI applications
Distributed tracing captures timing and execution details at each pipeline step (embedding, retrieval, LLM inference) — use it to identify which step is the latency bottleneck in RAG or agentic pipelines.
Token consumption: input tokens + output tokens
Monitor both input token count (prompt length) and output token count (response length) separately — cost optimization may target either side, and context window limits apply to the combined total.
Performance metrics: TTFT (time to first token) vs. total response time
Time to first token measures perceived latency in streaming responses; total response time measures complete generation — for streaming UIs, TTFT is the primary user-perceived latency metric.
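The two latency metrics can be measured around any token iterator. The sketch below uses a simulated stream in place of a real model client; the delays and function names are hypothetical.

```python
import time

def fake_token_stream(tokens, first_delay=0.05, per_token_delay=0.01):
    """Stand-in for a streaming LLM response: a pause, then tokens one by one."""
    time.sleep(first_delay)  # prompt processing before the first token
    for token in tokens:
        yield token
        time.sleep(per_token_delay)  # generation time between tokens

def measure_latency(stream):
    """Return (time to first token, total response time, tokens received)."""
    start = time.perf_counter()
    ttft = None
    received = []
    for token in stream:
        if ttft is None:
            ttft = time.perf_counter() - start
        received.append(token)
    total = time.perf_counter() - start
    return ttft, total, received

ttft, total, received = measure_latency(fake_token_stream(["Hello", ",", " world"]))
print(f"TTFT={ttft:.3f}s total={total:.3f}s tokens={len(received)}")
```

For a non-streaming call the two values coincide, which is why TTFT only becomes a separate metric once responses are streamed.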
Throughput: requests per second (RPS) / tokens per minute (TPM)
Monitor both RPS and TPM to understand system capacity — PTU limits are defined in TPM, not RPS, so high-context requests consume PTU capacity faster than short requests at the same RPS.
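The relationship between RPS and TPM is simple arithmetic, sketched below with hypothetical request sizes.

```python
def tokens_per_minute(requests_per_second, input_tokens, output_tokens):
    """Token throughput implied by a steady request rate and per-request sizes."""
    return requests_per_second * 60 * (input_tokens + output_tokens)

# Same request rate, very different token load (hypothetical sizes).
short_requests = tokens_per_minute(2, input_tokens=400, output_tokens=100)
long_context = tokens_per_minute(2, input_tokens=3600, output_tokens=400)

print(short_requests, long_context)
```

At an identical 2 RPS, the long-context workload consumes eight times the token capacity of the short one, so capacity planning by request rate alone would underestimate it.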
Foundry observability dashboard: latency / throughput / token usage / quality signals / safety signals
Configure all five observability dimensions in the Foundry monitoring dashboard — monitoring only Azure resource metrics misses GenAI-specific quality and safety signals.
Logging and tracing for debugging: full request/response capture
Enable detailed logging to capture the full prompt, retrieved documents, and generated response for each request — essential for debugging quality issues and auditing production behavior.

RAG Optimization — Chunking, Search, and Retrieval Tuning

Chunk size strategies: smaller (e.g., 512 tokens) = precision; larger = context
Smaller chunks improve retrieval precision by reducing noise per chunk but may miss cross-chunk context; larger chunks provide more context per result but dilute relevance scores — optimal size depends on query patterns.
Chunk overlap
Overlapping adjacent chunks by 10–20% prevents information at chunk boundaries from being lost — without overlap, sentences that span a boundary are split and may not be retrieved.
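Fixed-size chunking with overlap can be sketched as a sliding window. The example treats list items as tokens for simplicity; a real pipeline would use the embedding model's tokenizer, and the sizes here are hypothetical.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks

tokens = list(range(10))  # stand-in for a tokenized document
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
print(chunks)
```

Each chunk repeats the final token of the previous one, so content sitting on a boundary appears in both neighbors.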
Hybrid search = semantic (vector) + keyword (BM25) via Reciprocal Rank Fusion
Hybrid search with RRF merging often outperforms either vector or keyword search used alone, because keyword search captures exact terminology that embeddings may not preserve.
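Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over the result lists it appears in. The sketch below is a standalone illustration with hypothetical document IDs; k = 60 is the constant commonly used in the literature, assumed here rather than taken from any service configuration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists; higher fused score first, ties broken by document ID."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

vector_results = ["a", "b", "c"]   # semantic (embedding) ranking
keyword_results = ["b", "d", "a"]  # BM25 ranking

fused = reciprocal_rank_fusion([vector_results, keyword_results])
print(fused)
```

Documents found by both retrievers rise to the top, while a document found by only one still survives in the merged list.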
Similarity threshold tuning: precision vs. recall tradeoff
A high similarity threshold filters out loosely related chunks (high precision, lower recall); a low threshold returns more chunks including marginally relevant ones (high recall, lower precision) — tune based on hallucination vs. missed-answer tradeoff.
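The tradeoff can be made concrete by computing precision and recall at two thresholds. The similarity scores and relevance labels below are hypothetical.

```python
def precision_recall(scored_chunks, relevant_ids, threshold):
    """Precision and recall of the chunks whose similarity meets the threshold."""
    retrieved = {cid for cid, score in scored_chunks.items() if score >= threshold}
    hits = retrieved & relevant_ids
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Hypothetical similarity scores for one query, plus ground-truth relevant chunks.
scored_chunks = {"c1": 0.92, "c2": 0.81, "c3": 0.74, "c4": 0.55}
relevant_ids = {"c1", "c3"}

strict = precision_recall(scored_chunks, relevant_ids, threshold=0.9)
loose = precision_recall(scored_chunks, relevant_ids, threshold=0.5)
print(strict, loose)
```

The strict threshold returns only relevant chunks but misses one of them; the loose threshold finds both at the cost of passing two irrelevant chunks to the LLM.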
Embedding model selection for RAG
Choose embedding models optimized for your domain and language — a general-purpose embedding model may not capture domain-specific vocabulary; fine-tuning embeddings on domain data improves retrieval quality.
RAG vs. Fine-tuning decision
RAG adds KNOWLEDGE at inference time without changing model weights — use for dynamic, private, or frequently updated data; fine-tuning changes model BEHAVIOR permanently — use for specialized response style or task format.
A/B testing for RAG parameter optimization
Hold the LLM constant and vary ONE retrieval parameter at a time (chunk size, top-k, threshold) to isolate the impact of each change on end-to-end response quality metrics.

Fine-Tuning Foundation Models

Fine-tuning methods: supervised / parameter-efficient (LoRA, QLoRA) / instruction tuning
Supervised fine-tuning trains on labeled input-output pairs; LoRA and QLoRA are parameter-efficient methods that train a small set of adapter weights rather than all model parameters; instruction tuning aligns models to follow natural language instructions.
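The saving from LoRA comes from replacing an update to a d × k weight matrix with two low-rank factors of shapes d × r and r × k. The arithmetic below uses hypothetical dimensions for a single layer.

```python
def full_finetune_params(d, k):
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k

def lora_params(d, k, r):
    """Trainable parameters for rank-r adapters: B is d x r and A is r x k."""
    return r * (d + k)

d, k, r = 4096, 4096, 8  # hypothetical layer size and adapter rank
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)

print(f"full={full:,} lora={lora:,} ratio={lora / full:.4%}")
```

For this layer the adapters hold under half a percent of the trainable weights, which is what makes parameter-efficient fine-tuning feasible on smaller GPUs.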
Synthetic data generation for fine-tuning
When real labeled examples are scarce, use an LLM to generate diverse synthetic training examples based on a small seed set — synthetic data must be diverse and representative or it degrades model performance.
Fine-tuning deployment: serverless or managed compute in Microsoft Foundry
Fine-tuned models can be deployed to either serverless API endpoints (pay-as-you-go, less control) or managed compute deployments (dedicated GPU, full MLOps integration) within Microsoft Foundry.
Monitoring fine-tuned vs. base model performance
After deployment, compare fine-tuned model quality metrics against the base model on the same test dataset — fine-tuning can improve task performance but may degrade general capability (catastrophic forgetting).
Fine-tuning does NOT add knowledge — use RAG for that
Fine-tuning changes HOW the model responds (style, format, task specialization) but does not update the model's knowledge; if the goal is grounding responses in new facts, use RAG instead.

Security and Identity — RBAC, Managed Identities, and Networking

az role assignment create \
  --role "AzureML Data Scientist" \
  --assignee <principal-id> \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.MachineLearningServices/workspaces/<ws>
Assign an RBAC role scoped to a specific Azure ML workspace — use built-in ML roles (AzureML Data Scientist, AzureML Compute Operator) rather than Owner/Contributor for least-privilege access.
System-assigned managed identity vs. user-assigned managed identity
System-assigned identity is tied to the resource lifecycle (deleted with the resource); user-assigned identity is standalone and can be shared across multiple resources — use user-assigned for shared access scenarios.
Private endpoint for ML workspace: private access path through the VNet
Deploying a private endpoint gives the Azure ML workspace and Studio a private IP within the configured VNet; with the workspace's public network access set to Disabled, all access goes through that private IP.
Managed network isolation modes: Disabled / Allow Internet Outbound / Allow Only Approved Outbound
Workspace managed network isolation controls outbound traffic: Disabled allows all outbound; Allow Internet Outbound allows all outbound plus private endpoints; Allow Only Approved Outbound restricts outbound to configured rules only.
Azure ML Registries: cross-workspace asset sharing with governance
Registries share models, environments, components, and data assets ACROSS workspaces — assets promoted to a registry can be consumed by any workspace in the organization, enabling centralized governance.
