How long should I study for the Databricks ML Professional exam?

Most people need 4-8 weeks of focused study depending on experience. If you passed the ML Associate and work with Databricks ML daily, 2-3 weeks of targeted review may be enough. If you are coming from the Associate with limited production experience, plan for 6-8 weeks with heavy hands-on practice.

How difficult is the ML Professional compared to the ML Associate?

Significantly harder. The Associate tests syntax and concepts (e.g., 'what does VectorAssembler do?'). The Professional tests production judgment in complex scenarios (e.g., 'your model precision dropped — is it data drift or concept drift, and what should you do?'). Most questions present 3-5 sentence scenarios where multiple answers seem reasonable but one is optimal.

Do I need to pass the ML Associate before taking the ML Professional?

No formal prerequisite exists. However, Databricks strongly recommends passing the Associate first and having 1+ years of production ML experience on Databricks. The Professional assumes you already know SparkML, MLflow, and Feature Store fundamentals covered by the Associate exam.

What topics are new on the Professional exam that are not on the Associate?

Key additions include: distributed hyperparameter tuning with Optuna and Ray, Lakehouse Monitoring (drift detection, inference tables, statistical tests), Databricks Asset Bundles for CI/CD, ML pipeline testing strategies (unit/integration/end-to-end), deployment strategies (blue-green, canary), PyFunc custom models, online tables and on-demand features, and automated retraining with champion-challenger patterns.

How heavily is Lakehouse Monitoring tested?

Very heavily — it has 10 objectives in the MLOps domain alone, making it the single most densely tested sub-section. Expect questions on statistical tests (KS, Chi-squared, Jensen-Shannon), monitoring table types (snapshot, time series, inference), custom metrics, feature slicing, alert configuration, and endpoint health tracking. Do not underestimate this topic.

Should I focus on the September 2025 exam version or an older version?

Focus on the September 2025 version, which consolidated from 4 sections to 3 and introduced new topics: Optuna/Ray for distributed tuning, DABs, ML pipeline testing, and blue-green/canary deployments. The exam now uses Unity Catalog model aliases instead of legacy Model Registry Webhooks.

How much does the exam cost and how long is the certification valid?

The exam costs $200 USD per attempt. If you fail, you can retake after a waiting period. The certification is valid for 2 years from the date you pass. To renew, you must retake the current version of the exam.

Is hands-on experience required to pass?

Strongly recommended. The Professional exam tests production judgment, not just knowledge. Questions assume you have built and deployed ML models, configured monitoring, and handled model lifecycle management in practice. Use Databricks Community Edition or a trial workspace for hands-on practice if you do not have production access.

Databricks Certified Machine Learning Professional (ML Professional) Free Study Guide 2026

You Can Pass This Exam For Free

The Databricks Certified Machine Learning Professional exam is passable with free resources, but requires significant hands-on production ML experience (6-12 months minimum):

Databricks Academy free learning paths (Machine Learning Professional track)
Official Databricks documentation — MLflow, Feature Store, Lakehouse Monitoring, Model Serving
Databricks Community Edition for practicing SparkML pipelines, MLflow experiment tracking, and distributed tuning
500+ free practice questions on this site covering all 3 professional-level domains
The Big Book of MLOps (free eBook from Databricks) — covers end-to-end MLOps architecture
databricks/mlops-stacks GitHub repository for production ML pipeline templates

This is a professional-level exam. Unlike the Associate, most questions present complex production ML scenarios — you need real-world experience with distributed training, drift monitoring, Feature Store pipelines, and model deployment strategies. Book knowledge alone is insufficient.

Choose Your Study Path

You passed the ML Associate exam and have 6-12 months of Databricks ML experience. You need to level up on advanced topics like distributed tuning, Lakehouse Monitoring, Feature Store pipelines, deployment strategies, and MLOps testing patterns.

Week 1: Review the official Professional exam guide. Compare domains to the Associate exam and note what's new: distributed training with Ray/Optuna, Lakehouse Monitoring, Databricks Asset Bundles, blue-green/canary deployments, and ML pipeline testing

Week 2: SparkML deep dive: Pipeline construction with stages/estimators/transformers, distributed feature engineering, CrossValidator with ParamGrid, choosing distributed SparkML vs single-node models, batch and streaming inference patterns

Week 3: Distributed training and tuning: Optuna-MLflow integration, Ray framework for distributed tuning, pandas Function APIs for group-specific model training, vertical vs horizontal scaling, data parallelism vs model parallelism

Week 4: Advanced MLflow and Feature Store: nested runs, PyFunc custom models, point-in-time correctness, online tables for low-latency serving, streaming feature ingestion, on-demand features for training-serving consistency

Week 5: MLOps: Unity Catalog model management, model lifecycle (dev to staging to prod), CI/CD testing (unit tests with pytest, integration tests, end-to-end pipeline tests), Databricks Asset Bundles for multi-environment deployment

Week 6: Lakehouse Monitoring deep dive: drift detection (KS, Chi-squared, Jensen-Shannon), inference tables, data table types (snapshot/time series/inference), custom metrics, alert configuration, baseline vs time-window comparison

Week 7: Model Deployment: blue-green vs canary deployment strategies, Model Serving endpoints, PyFunc model registration in Unity Catalog, REST API querying, MLflow Deployments SDK, traffic splitting for rollout

Week 8: Full practice exams across all 3 domains. Review explanations carefully. Target 80%+ before scheduling the real exam. Focus on scenario-based questions that test judgment, not syntax recall

Exam Overview

Format

59 questions, 120 minutes. Multiple choice (single select and multiple select). Scenario-heavy — most questions present production ML scenarios requiring you to choose the best approach. Covers advanced topics not on the Associate exam including distributed hyperparameter tuning, Lakehouse Monitoring, Databricks Asset Bundles, and deployment strategies.

Scoring

Pass/fail based on percentage score. Passing: 70%. No penalty for wrong answers — always guess if unsure. Questions are weighted equally across all domains. May include unscored items for statistical analysis — these are not identified and do not impact your score.

Domains & Weights

Model Development: 44%
MLOps: 44%
Model Deployment: 12%
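
As a rough planning aid, the weights above can be turned into approximate question counts. This is a minimal sketch that assumes the 59-question total quoted in this guide and simple rounding; the real exam may distribute questions slightly differently.

```python
# Approximate questions per domain, using the weights and total quoted in this guide.
TOTAL_QUESTIONS = 59
WEIGHTS = {"Model Development": 0.44, "MLOps": 0.44, "Model Deployment": 0.12}

approx_questions = {name: round(TOTAL_QUESTIONS * w) for name, w in WEIGHTS.items()}
for name, count in approx_questions.items():
    print(f"{name}: ~{count} questions")
```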

Registration

$200 USD per attempt. Available through Kryterion testing centers or online proctored via WebAssessor. Schedule at databricks.com/certification. No formal prerequisites, but Databricks recommends the ML Associate certification and 1+ years of hands-on production ML experience on Databricks. Credential is valid for 2 years.

Topic Priority Table

Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.

Tier 1: Must Know. Deep understanding required. These topics appear across multiple domains and form the foundation of professional-level questions. Know internals, edge cases, and production patterns.

Tier 2: Should Know. Understand use cases, configuration, and key behaviors. May appear in 3-8 questions each.

Tier 3: Recognize Only. Know at a high level: what it does and when to use it. Rarely more than 1-2 questions each.

Domain 1: 44% of exam

Model Development

The largest domain at 44% (~26 questions). Tests advanced ML pipeline construction with SparkML, distributed training and hyperparameter tuning (Optuna, Ray, pandas Function APIs), advanced MLflow usage (nested runs, PyFunc custom models), and Feature Store concepts (point-in-time correctness, online tables, streaming features, on-demand features). This domain has 22 objectives across 4 sub-sections: SparkML (7), Scaling and Tuning (7), Advanced MLflow (3), and Advanced Feature Store (5).

Key Topics

SparkML Pipelines, Optuna, Ray, MLflow Tracking, MLflow PyFunc, Feature Store, Online Tables, Pandas Function APIs, CrossValidator, VectorAssembler

Must-Know Concepts

SparkML Pipeline construction: stages chain estimators and transformers. Pipeline.fit() trains all stages sequentially, PipelineModel.transform() applies them in order
Feature transformer selection: StringIndexer (strings to indices), OneHotEncoder (indices to binary vectors), VectorAssembler (columns to single Vector), StandardScaler (standardization to unit variance, optionally zero mean)
CrossValidator with ParamGridBuilder for hyperparameter tuning: CrossValidator trains k x n models (folds x combinations), then refits the best combination on the full training set. Use with RegressionEvaluator or classification evaluators
When to use SparkML vs single-node: SparkML for distributed data (millions+ rows). For single-node models at scale, distribute tuning with SparkTrials or Ray, not SparkML
Optuna-MLflow integration: Optuna's define-by-run API with MLflow callback for logging. Supports pruning (early stopping bad trials) and multi-objective optimization
Ray for distributed tuning: distributes independent Python functions across a cluster. Better than Spark for embarrassingly parallel compute-bound workloads
Pandas Function APIs: applyInPandas() for group-specific model training (e.g., one model per store), mapInPandas() for partition-level distributed inference
Vertical vs horizontal scaling: vertical = bigger machines (more RAM/CPU per node), horizontal = more machines. Vertical for memory-bound, horizontal for compute-bound workloads
Data parallelism vs model parallelism: data parallelism splits data across workers (each has full model), model parallelism splits the model across workers (each has part of model)
Nested MLflow runs: use mlflow.start_run(nested=True) to create parent-child hierarchies for organizing hyperparameter search results under a single parent run
PyFunc custom models: wrap custom inference logic (pre/post-processing, feature engineering) so it runs at prediction time. Ensures training-serving consistency for complex pipelines
Custom metric/parameter/artifact logging: log_metric() for numeric values, log_param() for configuration, log_artifact() for files (plots, data samples, configs)
Point-in-time correctness: Feature Store retrieves features as they existed at the prediction timestamp, preventing future data leakage during historical training
Online tables: low-latency feature serving synced from offline Feature Store tables. Required for real-time serving endpoints that need feature lookups during inference
On-demand features: computed at request time for features that depend on the prediction request itself (e.g., time since last login). Ensures training-serving consistency
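
To make the k x n count for CrossValidator concrete, the sketch below enumerates a hypothetical grid in plain Python. The parameter names maxDepth and numTrees and the fold count are illustrative assumptions, and no Spark API is called. It also counts the one extra fit CrossValidator performs when it refits the best combination on the full training set.

```python
from itertools import product

# Hypothetical grid: 3 x 2 = 6 hyperparameter combinations.
param_grid = {"maxDepth": [3, 5, 7], "numTrees": [50, 100]}
num_folds = 3  # k, chosen for illustration

combinations = list(product(*param_grid.values()))
cv_fits = num_folds * len(combinations)   # k x n fits during cross-validation
total_fits = cv_fits + 1                  # plus one refit of the best combination

print(f"{len(combinations)} combinations x {num_folds} folds = {cv_fits} fits (+1 refit)")
```

Because the count grows multiplicatively, adding one more value to any parameter list adds a full set of folds to the training cost.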

Common Exam Traps

SparkML requires all features in a single Vector column (VectorAssembler). Forgetting this step is the most common pipeline error — models will fail without it

OneHotEncoder requires numeric indices, not strings — apply StringIndexer FIRST. Direct string input to OneHotEncoder raises an error
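
The ordering constraint can be illustrated without Spark. The following is a plain-Python sketch of the idea only, not the SparkML API: the helper names fit_string_indexer and one_hot are hypothetical, and Spark's own encoders differ in details (for example, they emit sparse vectors and drop the last category by default).

```python
from collections import Counter

def fit_string_indexer(values):
    """Map each label to an index, most frequent label first (ties broken alphabetically)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: idx for idx, label in enumerate(ordered)}

def one_hot(index, num_categories):
    """Encode a numeric index as a dense 0/1 list; rejects non-integer input."""
    if not isinstance(index, int):
        raise TypeError("one_hot expects a numeric index; index the strings first")
    return [1 if i == index else 0 for i in range(num_categories)]

colors = ["red", "blue", "red", "green", "red", "blue"]
mapping = fit_string_indexer(colors)                       # step 1: strings to indices
encoded = [one_hot(mapping[c], len(mapping)) for c in colors]  # step 2: indices to vectors
```

Passing a raw string such as "red" straight to one_hot raises a TypeError, which mirrors why the indexing step has to come first.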

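Hyperopt's fmin always MINIMIZES, and Optuna minimizes by default. With Hyperopt, return -accuracy (negative) when maximizing a metric; with Optuna, either do the same or create the study with direction='maximize'. Otherwise the optimizer finds the worst model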
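
The sign trap is easy to reproduce with a toy minimizer. This sketch uses no tuning library at all: minimize is a stand-in for any minimizing optimizer, and the accuracy function is a made-up curve that peaks at max_depth = 6.

```python
def minimize(objective, candidates):
    """Toy optimizer: returns the candidate with the smallest objective value."""
    return min(candidates, key=objective)

def accuracy(max_depth):
    """Hypothetical validation accuracy, peaking at max_depth = 6."""
    return 0.90 - 0.01 * abs(max_depth - 6)

depths = list(range(1, 13))
wrong = minimize(accuracy, depths)                 # minimizes accuracy: picks the worst model
right = minimize(lambda d: -accuracy(d), depths)   # minimizes -accuracy: picks the best model

print(f"objective=accuracy  -> max_depth={wrong}")
print(f"objective=-accuracy -> max_depth={right}")
```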

Too much parallelism degrades Bayesian optimization — TPE/Optuna need completed trials to guide proposals. With max parallelism, it degenerates to random search

applyInPandas() is for grouped operations (after groupBy), mapInPandas() is for partition-level processing. Confusing the two yields incorrect results for group-specific training
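
The difference between the two patterns can be sketched in plain Python, with no Spark involved. The helpers train_per_group and train_per_chunk are hypothetical stand-ins for the grouped and partition-level patterns, and the "model" is just a mean so the effect is easy to see.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"store_id": "A", "units": 10}, {"store_id": "B", "units": 100},
    {"store_id": "A", "units": 14}, {"store_id": "B", "units": 120},
]

def train_per_group(rows, key):
    """Grouped pattern: every row for a key reaches the same function call."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["units"])
    return {k: mean(v) for k, v in groups.items()}   # one "model" (a mean) per store

def train_per_chunk(rows, chunk_size):
    """Partition pattern: chunks are arbitrary slices, so stores get mixed."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    return [mean(r["units"] for r in chunk) for chunk in chunks]

per_store = train_per_group(rows, "store_id")   # separate result for A and for B
per_chunk = train_per_chunk(rows, 2)            # each result blends stores A and B
```

The per-chunk results blend both stores, which is why partition-level processing is the wrong tool when each group needs its own model.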

PyFunc models run custom code at prediction time — ensure all dependencies are logged with the model via conda_env or pip_requirements, or serving will fail with import errors

Point-in-time lookups only work if the Feature Store table has a timestamp key. Without it, the latest feature value is always returned regardless of prediction time
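
A point-in-time lookup is essentially an "as of" search over a feature's history. The sketch below shows the logic in plain Python with a made-up feature history and integer timestamps; it is not the Feature Store API, and lookup_as_of and lookup_latest are hypothetical helper names.

```python
from bisect import bisect_right

# Hypothetical feature history for one account: (timestamp, avg_txn_amount), sorted by time.
feature_history = [(1, 20.0), (5, 35.0), (9, 400.0)]

def lookup_as_of(history, event_ts):
    """Return the latest feature value with timestamp <= event_ts, or None."""
    timestamps = [ts for ts, _ in history]
    pos = bisect_right(timestamps, event_ts)
    return history[pos - 1][1] if pos > 0 else None

def lookup_latest(history):
    """Timestamp-unaware lookup: always returns the newest value."""
    return history[-1][1]

# A training example observed at time 6 should only see features known by then.
as_of_value = lookup_as_of(feature_history, 6)
leaky_value = lookup_latest(feature_history)
```

Here the as-of lookup returns the value recorded at time 5, while the timestamp-unaware lookup returns the value from time 9, which did not exist yet when the training example occurred.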

Online tables have a sync delay from offline tables — real-time predictions may use slightly stale features. Design tolerance for this latency in serving architecture

Quick Check: Model Development

A data scientist needs to train a separate demand forecasting model for each of 500 retail stores using the same scikit-learn algorithm. The training data is stored in a single Spark DataFrame with a store_id column. What is the most efficient approach?

Domain 2: 44% of exam

ML Operations (MLOps)

Tied for the largest domain at 44% (~26 questions). Tests model lifecycle management (dev to staging to prod), validation testing strategies (unit, integration, end-to-end), environment architectures with Databricks Asset Bundles, automated retraining workflows, and — most heavily — Lakehouse Monitoring for drift detection. This domain has 20 objectives across 5 sub-sections: Model Lifecycle (2), Validation Testing (4), Environment Architectures (2), Automated Retraining (2), and Drift Detection/Lakehouse Monitoring (10).

Key Topics

Lakehouse Monitoring, Unity Catalog Models, Databricks Asset Bundles, pytest, Inference Tables, Databricks Workflows, Champion-Challenger Pattern

Must-Know Concepts

Model lifecycle architecture: deploy CODE (not models) across environments. Train in dev, validate in staging, serve in prod. The same pipeline code runs in each environment with different configurations
Unity Catalog model aliases replace legacy stage transitions. Assign aliases like 'champion' and 'challenger' to model versions for lifecycle management
Unit testing ML code: test individual transformation and feature engineering functions in isolation using pytest. Test data quality assertions, schema validation, and edge cases
Integration testing: test component interactions across environments — verify feature pipelines produce expected output types, model training completes, and predictions are within expected ranges
End-to-end pipeline testing: validate the full pipeline from feature computation through training, evaluation, and deployment. Use test datasets and temporary catalogs
Test organization: separate unit tests (fast, isolated) from integration tests (slower, require infrastructure). Run unit tests on every commit, integration tests on merge to main
Databricks Asset Bundles (DABs): define ML resources (jobs, pipelines, serving endpoints) as YAML. Deploy to dev/staging/prod with environment-specific overrides using targets
Infrastructure-as-code with DABs: version control ML pipeline configurations alongside code. Enables reproducible deployments and rollbacks across environments
Automated retraining triggers: monitor for data drift, prediction drift, or performance degradation. When thresholds are breached, trigger retraining workflows automatically
Champion-challenger pattern: train a new model (challenger), compare it against the current model (champion) on held-out data or A/B test in production, promote only if the challenger wins
Lakehouse Monitoring statistical tests: Kolmogorov-Smirnov (KS) for numerical drift, Chi-squared for categorical drift, Jensen-Shannon divergence for distribution comparison
Three monitoring table types: snapshot (point-in-time data quality), time series (temporal trends), inference (model inputs/outputs/performance)
Monitor creation and configuration: create monitors on Delta tables in Unity Catalog, configure refresh schedules, set baseline tables for comparison
Custom metrics in Lakehouse Monitoring: define business-specific metrics beyond built-in statistical tests. Use SQL expressions for custom metric computation
Feature slicing: analyze drift and performance for specific data segments (e.g., by region, customer type). Identifies localized issues that aggregate metrics miss
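
Of the statistical measures listed above, Jensen-Shannon divergence is straightforward to compute by hand from two binned distributions. The following is a textbook-style sketch in plain Python using base-2 logarithms; it is not how Lakehouse Monitoring itself is implemented, and the input histograms are assumed to be already normalized to sum to 1.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2): symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

baseline = [0.5, 0.3, 0.2]   # hypothetical binned feature distribution
current = [0.2, 0.3, 0.5]    # hypothetical distribution in the latest window
drift_score = js_divergence(baseline, current)
```

Identical distributions score 0 and fully disjoint ones score 1, which makes the value convenient for setting alert thresholds.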

Common Exam Traps

Deploy CODE, not models — the production environment runs the same training pipeline as dev, but with production data and configuration. Exporting a trained model file from dev to prod is an anti-pattern

Unity Catalog aliases are flexible — you can create any alias name (not limited to 'champion'/'challenger'). But the exam typically uses champion/challenger as the standard pattern

Unit tests should NOT test model accuracy — they test deterministic functions (data transformations, feature logic). Model accuracy varies with data and is validated in integration/E2E tests
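
A unit test in this sense targets a deterministic function with known inputs and outputs. The sketch below uses a hypothetical feature function, days_since_last_login, with pytest-style test functions built on plain assert statements; with pytest installed, the exception check would more commonly be written using pytest.raises.

```python
def days_since_last_login(last_login_day, request_day):
    """Deterministic feature logic: non-negative day gap, or None if login is unknown."""
    if last_login_day is None:
        return None
    if request_day < last_login_day:
        raise ValueError("request_day precedes last_login_day")
    return request_day - last_login_day

def test_days_since_last_login():
    assert days_since_last_login(10, 17) == 7
    assert days_since_last_login(10, 10) == 0        # edge case: same day
    assert days_since_last_login(None, 17) is None   # edge case: missing login

def test_rejects_future_login():
    try:
        days_since_last_login(20, 17)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Neither test touches a trained model or a dataset, so both run in milliseconds and give the same result on every commit.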

Lakehouse Monitoring drift metrics are computed on REFRESH, not continuously. If your refresh interval is daily, drift that occurs and resolves within a day may be missed

KS test is for numerical features ONLY. Chi-squared is for categorical features ONLY. Using the wrong test produces meaningless results — the exam tests which test to apply for each feature type
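
To see why each test fits its feature type, it helps to look at what each statistic actually measures. The sketch below computes both statistics in plain Python: KS compares ordered empirical CDFs, while chi-squared compares category counts. It returns the raw statistics only (no p-values), and it is an illustration of the definitions rather than the Lakehouse Monitoring implementation.

```python
from bisect import bisect_right
from collections import Counter

def ks_statistic(baseline, current):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)

    def ecdf(sample, x):
        return bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

def chi_squared_statistic(baseline, current):
    """Pearson chi-squared statistic of current category counts against the
    counts expected from baseline proportions. Categories that appear only
    in `current` are ignored in this simplified version."""
    base_counts, cur_counts = Counter(baseline), Counter(current)
    n_base, n_cur = len(baseline), len(current)
    stat = 0.0
    for category, base_count in base_counts.items():
        expected = n_cur * base_count / n_base
        stat += (cur_counts[category] - expected) ** 2 / expected
    return stat

numeric_drift = ks_statistic([1, 2, 3], [10, 11, 12])                        # numerical feature
categorical_drift = chi_squared_statistic(list("aabb"), list("aaaa"))        # categorical feature
```

The KS statistic relies on values having an order, which categories lack, and the chi-squared statistic relies on discrete counts, which continuous values do not provide without binning.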

Inference tables log raw request/response data — they are NOT drift metrics themselves. Lakehouse Monitoring analyzes inference tables to compute drift metrics and performance trends

Champion-challenger comparison must use the SAME evaluation dataset and metrics. Comparing models on different data subsets or with different metrics invalidates the comparison
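
The comparison rule can be expressed as a small promotion gate. This is a minimal sketch with hypothetical threshold-rule models and a made-up evaluation set; the min_gain margin is an assumed parameter, not a Databricks setting.

```python
def accuracy(predict, eval_set):
    """Fraction of (features, label) pairs the model predicts correctly."""
    return sum(predict(x) == y for x, y in eval_set) / len(eval_set)

def should_promote(champion, challenger, eval_set, min_gain=0.01):
    """Promote only if the challenger beats the champion by min_gain on the SAME data."""
    champ_score = accuracy(champion, eval_set)
    chall_score = accuracy(challenger, eval_set)
    return chall_score - champ_score >= min_gain, champ_score, chall_score

# Hypothetical models: threshold rules on a single transaction amount.
def champion(amount):
    return amount > 100

def challenger(amount):
    return amount > 80

eval_set = [(50, False), (90, True), (95, True), (120, True), (70, False)]
promote, champ_score, chall_score = should_promote(champion, challenger, eval_set)
```

Both models are scored by the same function on the same eval_set, so the difference in scores reflects the models rather than the data.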

DABs targets inherit from the default configuration — never duplicate the full config per target. Only specify environment-specific overrides (cluster sizes, catalog names, permissions)

Quick Check: ML Operations (MLOps)

An ML team notices their fraud detection model's precision has dropped 15% over the past month. Before retraining, they want to identify whether the drop is caused by data drift or concept drift. Which approach is most appropriate?

Domain 3: 12% of exam

Model Deployment

The smallest domain at 12% (~7 questions). Tests deployment strategies (blue-green, canary), custom model serving with PyFunc, REST API integration, and model rollout management. This domain has 5 objectives across 2 sub-sections: Deployment Strategies (2) and Custom Model Serving (3). Despite its low weight, these questions are often the most scenario-heavy and nuanced.

Key Topics

Model Serving Endpoints, PyFunc, REST API, MLflow Deployments SDK, Traffic Splitting, Blue-Green Deployment, Canary Deployment

Must-Know Concepts

Blue-green deployment: two identical environments. Traffic switches entirely from old (blue) to new (green) version. Instant rollback by switching back. Higher cost but zero-downtime
Canary deployment: gradual traffic routing to new version (e.g., 5% to 25% to 50% to 100%). Monitor metrics at each step. Lower risk but slower full rollout
Evaluate deployment strategy suitability: blue-green for high-traffic applications needing instant rollback, canary for gradual validation with real production traffic
PyFunc model registration in Unity Catalog: log custom models with mlflow.pyfunc.log_model(), register in Unity Catalog for governance and lineage tracking
REST API querying: send prediction requests to model serving endpoints via HTTP POST with JSON payloads. Handle authentication with Databricks personal access tokens
MLflow Deployments SDK: programmatic interface for creating, updating, and querying model serving endpoints. Alternative to REST API for Python-based workflows
Custom artifact management: log additional files (preprocessing pipelines, lookup tables, configuration) with the model so they are available at serving time
Model deployment methods: UI (click-based), REST API (programmatic), MLflow Deployments SDK (Python). Know when to use each based on automation needs
Traffic splitting for gradual rollout: configure percentage-based traffic routing between model versions on the same serving endpoint for A/B testing or canary deployment
Endpoint scaling and latency: configure auto-scaling, warm-up strategies, and appropriate instance types to meet latency SLAs for real-time serving
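
The staged canary rollout described above can be sketched as a simple control loop. This is an illustration of the decision logic only: run_canary, the error-rate callback, and the 2% threshold are hypothetical, and a real rollout would update endpoint traffic configuration and wait for metrics at each step.

```python
CANARY_STEPS = [5, 25, 50, 100]  # percent of traffic routed to the new version

def run_canary(error_rate_at, max_error_rate=0.02):
    """Advance through the steps while the observed error rate stays acceptable.
    Returns (final_traffic_percent, history); on a breach, traffic returns to 0."""
    history = []
    for percent in CANARY_STEPS:
        rate = error_rate_at(percent)
        history.append((percent, rate))
        if rate > max_error_rate:
            return 0, history   # roll back: the previous version takes all traffic
    return 100, history

# Hypothetical scenario: the new version degrades once it receives half the traffic.
final_percent, history = run_canary(lambda p: 0.01 if p < 50 else 0.05)
```

In this scenario the rollout stops at the 50% step, so only part of production traffic was ever exposed to the degraded version.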

Common Exam Traps

Blue-green deployment requires running TWO full environments simultaneously — double the infrastructure cost during deployment. If cost is a primary concern, canary uses fewer additional resources

Canary deployment does NOT provide instant rollback — rolling back requires gradually shifting traffic back, which takes time. If instant rollback is required, choose blue-green

PyFunc models must include ALL dependencies (conda_env or pip_requirements) when logged. Missing dependencies cause serving endpoint startup failures that are difficult to debug in production

REST API authentication requires a Databricks personal access token or service principal token. Anonymous access to serving endpoints is not supported by default
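
An authenticated scoring request might be assembled as follows. This sketch only builds the request with the standard library and does not send it; the workspace URL, endpoint name, token, and feature names are placeholders, and dataframe_records is one of several input formats that serving endpoints accept.

```python
import json
import urllib.request

def build_scoring_request(workspace_url, endpoint_name, token, records):
    """Build (but do not send) a POST request for a model serving endpoint."""
    url = f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations"
    payload = json.dumps({"dataframe_records": records}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",     # PAT or service principal token
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")

request = build_scoring_request(
    "https://example.cloud.databricks.com",   # hypothetical workspace URL
    "credit-scoring",                          # hypothetical endpoint name
    "dapi-REDACTED",                           # placeholder; load real tokens from a secret store
    [{"income": 52000, "age": 41}],            # hypothetical feature names
)
# Sending it would be: urllib.request.urlopen(request)
```

Without the Authorization header the same request would be rejected, which is the behavior the trap above describes.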

Traffic splitting percentages must sum to 100% across all model versions on an endpoint. The exam may present configurations that do not sum correctly as distractor answers
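
A quick way to spot the invalid distractors is to check the arithmetic. The helper below is a hypothetical validator written for illustration, not part of any Databricks SDK, and the served model names are made up.

```python
def validate_traffic_split(routes):
    """routes maps a served model name to its integer traffic percentage."""
    for name, percent in routes.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"{name}: percentage {percent} is outside 0-100")
    total = sum(routes.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")
    return True

valid = validate_traffic_split({"model-v1": 90, "model-v2": 10})   # canary-style split

try:
    validate_traffic_split({"model-v1": 90, "model-v2": 20})       # sums to 110
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```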

Quick Check: Model Deployment

A financial services company is deploying a new credit scoring model to production. The model serves 50,000 requests per hour and any incorrect predictions could result in regulatory fines. They need the ability to instantly roll back if the new model underperforms. Which deployment strategy is most appropriate?

Key ML Professional Concepts Compared

These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.

Optuna vs Hyperopt

Use Optuna when…

Modern hyperparameter optimization with native MLflow integration, define-by-run API, pruning support for early stopping unpromising trials, and multi-objective optimization. Preferred for new Databricks workloads.

Use Hyperopt when…

Legacy Bayesian hyperparameter optimization using TPE (Tree of Parzen Estimators) with SparkTrials for Spark-native distributed tuning. Still widely used but being superseded by Optuna on Databricks.

Exam trap

Optuna is the recommended approach on the current exam (September 2025 version). Hyperopt is still tested but Optuna-MLflow integration is the focus for distributed tuning questions. Hyperopt always minimizes and Optuna minimizes by default, so return negative values for metrics you want to maximize, or set direction='maximize' on the Optuna study.

Ray vs Spark (SparkTrials / pandas Function APIs)

Use Ray when…

Distributed Python-native computing for embarrassingly parallel workloads: training many independent models, running compute-heavy tuning trials. Works independently of Spark DataFrames. Best for task parallelism, where each worker runs a complete, independent job.

Use Spark (SparkTrials / pandas Function APIs) when…

Data-parallel distributed computing. SparkTrials distributes Hyperopt trials across Spark workers. Pandas Function APIs (applyInPandas/mapInPandas) distribute pandas-based operations across Spark partitions. Best for data parallelism.

Exam trap

Ray and Spark serve different parallelism needs. Ray is for distributing independent Python functions (task parallelism, not to be confused with model parallelism, which splits a single model across workers). Spark is for distributing data processing (data parallelism). The exam tests when to choose each based on whether the bottleneck is compute or data volume.

Blue-Green Deployment vs Canary Deployment

Use Blue-Green Deployment when…

Two full environments run simultaneously. Traffic switches entirely from old (blue) to new (green). Instant rollback by switching back. Higher infrastructure cost but zero-downtime, all-or-nothing deployment.

Use Canary Deployment when…

Gradual traffic routing — start with 5-10% traffic to the new model, increase as confidence grows. Lower risk per exposure but slower full rollout. Better for validating with real production traffic before committing.

Exam trap

Blue-green is best when you need instant, full rollback capability (e.g., critical financial models). Canary is best when you want to validate with real traffic incrementally. The exam tests which strategy fits specific production risk profiles.

Unity Catalog Model Aliases vs Legacy Model Registry Stages

Use Unity Catalog Model Aliases when…

Current approach — assign named aliases (e.g., 'champion', 'challenger') to model versions in Unity Catalog. Flexible, supports custom alias names, and integrates with Unity Catalog governance.

Use Legacy Model Registry Stages when…

Legacy approach — model versions transition through fixed stages (None, Staging, Production, Archived). Being deprecated in favor of Unity Catalog aliases.

Exam trap

The September 2025 exam tests Unity Catalog aliases, NOT legacy stage transitions. If a question mentions 'champion/challenger' patterns, think aliases. If it mentions Staging/Production stages, it is referencing the legacy approach — the correct modern answer uses aliases.

Point-in-Time Feature Lookup vs Standard Feature Lookup

Use Point-in-Time Feature Lookup when…

Retrieves feature values as they existed at a specific timestamp, preventing future data from leaking into historical training examples. Essential for time-sensitive ML tasks (fraud detection, demand forecasting).

Use Standard Feature Lookup when…

Retrieves the latest feature values without timestamp awareness. Simpler but risks data leakage — features computed after the prediction timestamp may contaminate training data.

Exam trap

Point-in-time correctness is the #1 Feature Store concept on the exam. Without it, a fraud model trained on historical transactions might use account features computed AFTER the fraud occurred — inflating accuracy during training but failing in production.

Snapshot Monitoring vs Time Series / Inference Monitoring

Use Snapshot Monitoring when…

Monitors point-in-time snapshots of a Delta table. Compares current data distribution against a baseline. Best for detecting data quality issues in static tables.

Use Time Series / Inference Monitoring when…

Monitors data over time windows. Time series monitoring tracks feature distributions across windows. Inference monitoring tracks model inputs, outputs, and performance metrics. Best for detecting drift in production models.

Exam trap

Lakehouse Monitoring supports three table types and each produces different metrics. Inference tables are specifically for model monitoring — they track prediction distributions, input drift, and performance trends. The exam tests which table type and monitoring approach to use for different scenarios.

Data Parallelism vs Model Parallelism

Use Data Parallelism when…

Distribute training data across workers, each training a copy of the full model on a data subset. Gradients are aggregated across workers. Best for large datasets with models that fit in a single worker's memory.

Use Model Parallelism when…

Distribute model layers or components across workers. Each worker holds part of the model. Best for models too large to fit in a single worker's memory (e.g., large deep learning models).

Exam trap

Most Databricks ML workloads use data parallelism (SparkML, SparkTrials). Model parallelism is needed only when the model itself exceeds single-node memory — rare for classical ML but common for large deep learning models. The exam tests when each is appropriate.
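
The gradient aggregation step in data parallelism can be checked numerically. The sketch below uses a one-parameter linear model and made-up data in plain Python; it assumes equal-size shards, in which case averaging the per-worker gradients reproduces the full-batch gradient.

```python
def gradient(w, shard):
    """Gradient of mean squared error for y ~ w * x over one shard of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 1.5

# Data parallelism: two workers hold equal-size shards and the same weight w.
shards = [data[:2], data[2:]]
worker_grads = [gradient(w, shard) for shard in shards]
averaged = sum(worker_grads) / len(worker_grads)

full_batch = gradient(w, data)   # what a single node would compute on all data
```

Each worker needs the full model (here, the single weight w) but only its own slice of data, which is why this approach stops working once the model no longer fits on one worker.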

Concept Drift vs Data Drift

Use Concept Drift when…

The relationship between input features and the target variable has changed (P(Y|X) shifts). Model performance degrades even though input distributions may look stable. Requires retraining on recent labeled data.

Use Data Drift when…

The distribution of input features has changed (P(X) shifts). Detected by comparing feature distributions against a baseline using KS test (numerical) or Chi-squared (categorical). May or may not affect model performance.

Exam trap

Concept drift can occur even when feature distributions are stable — always monitor prediction quality alongside feature distributions. Data drift may not cause performance issues if the model generalizes well to the shifted distribution.
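
The two kinds of drift can be told apart by watching a feature statistic and a performance metric side by side. The following is a deliberately tiny, hypothetical example in plain Python: a fixed threshold rule stands in for the deployed model, and the three labeled windows are made up.

```python
def model(amount):
    """Deployed rule, learned when fraud meant amount > 100."""
    return amount > 100

def accuracy(rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def feature_mean(rows):
    return sum(x for x, _ in rows) / len(rows)

baseline = [(40, False), (80, False), (120, True), (160, True)]
# Data drift: amounts shift upward, but fraud still means amount > 100.
data_drift = [(90, False), (130, True), (170, True), (210, True)]
# Concept drift: same amounts as baseline, but fraud now starts above 50.
concept_drift = [(40, False), (80, True), (120, True), (160, True)]

for name, rows in [("baseline", baseline), ("data drift", data_drift),
                   ("concept drift", concept_drift)]:
    print(f"{name}: feature mean={feature_mean(rows)}, accuracy={accuracy(rows)}")
```

In the data drift window the feature mean moves while accuracy holds, and in the concept drift window the feature mean is unchanged while accuracy falls, so monitoring only one of the two signals would miss one of the cases.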

Top Mistakes to Avoid

Confusing data drift (input feature distributions change) with concept drift (relationship between features and target changes) — each requires different detection and remediation strategies

Using KS test for categorical features or Chi-squared for numerical features — KS is for numerical distributions only, Chi-squared is for categorical distributions only

Deploying trained models instead of training code across environments — the professional pattern is to deploy the same pipeline code to each environment and train with environment-specific data

Forgetting point-in-time correctness in Feature Store lookups — without it, future data leaks into historical training examples, inflating offline metrics that fail to translate to production performance

Over-parallelizing Bayesian hyperparameter optimization — TPE/Optuna need completed trial results to guide proposals. Excessive parallelism degenerates to random search with no intelligent guidance

Confusing Unity Catalog model aliases with legacy Model Registry stages — the current exam tests the alias-based approach (champion/challenger), not the legacy Staging/Production/Archived stages

Testing model accuracy in unit tests — unit tests should validate deterministic functions (data transformations, feature logic). Model accuracy belongs in integration or end-to-end tests with real data

Using model serving endpoints for batch inference — serving endpoints are designed for real-time, low-latency predictions. Batch scoring should use spark_udf or dedicated batch inference jobs

Not logging dependencies with PyFunc models — missing conda_env or pip_requirements causes serving endpoint startup failures that are difficult to diagnose

Assuming Lakehouse Monitoring runs continuously — drift metrics are computed on scheduled REFRESH, not in real time. Configure refresh frequency based on your monitoring SLA requirements

Exam-Ready Checklist

Can explain all 3 exam domains, their weights (44/44/12), and the sub-sections within each domain

Know SparkML pipeline construction: stages, estimators, transformers, VectorAssembler, StringIndexer, OneHotEncoder, CrossValidator

Understand distributed tuning: Optuna-MLflow integration, Ray vs Spark trade-offs, pandas Function APIs (applyInPandas vs mapInPandas)

Can explain vertical vs horizontal scaling and data parallelism vs model parallelism — when to use each

Know advanced MLflow: nested runs, PyFunc custom models with pre/post-processing, custom metric/parameter/artifact logging

Understand Feature Store deeply: point-in-time correctness, online tables, streaming features, on-demand features, training-serving consistency

Know Lakehouse Monitoring: statistical tests (KS, Chi-squared, Jensen-Shannon), table types (snapshot, time series, inference), custom metrics, feature slicing, alert configuration

Can distinguish data drift vs concept drift vs prediction drift and know which monitoring approach detects each

Understand model lifecycle: deploy code (not models), Unity Catalog aliases (champion/challenger), environment promotion patterns

Know ML testing strategies: unit testing (pytest, isolated functions), integration testing (component interactions), end-to-end testing (full pipeline)

Can configure Databricks Asset Bundles: YAML configuration, targets with environment-specific overrides, infrastructure-as-code for ML resources

Understand automated retraining: drift-triggered retraining, champion-challenger comparison, model selection strategies

Know deployment strategies: blue-green (instant rollback, higher cost) vs canary (gradual rollout, lower risk) — when to use each

Can deploy custom PyFunc models: registration in Unity Catalog, REST API querying, MLflow Deployments SDK, traffic splitting

Scored 80%+ on at least two full practice exams covering all 3 domains

Reviewed all incorrect answers and understand why the right answer is right

Recommended Resources

Free & Official Resources

Databricks Academy — Machine Learning Professional Learning Path

Free official learning path covering all exam domains including distributed training, MLOps, and model deployment.

Official

The Big Book of MLOps (Databricks eBook)

Free comprehensive guide to MLOps architecture on Databricks — covers model lifecycle, monitoring, Feature Store, and deployment patterns.

Official

Databricks Documentation — Lakehouse Monitoring

Official documentation for Lakehouse Monitoring: drift detection, inference tables, custom metrics, and alert configuration.

Official

Databricks Documentation — MLflow on Databricks

Complete MLflow documentation covering tracking, PyFunc models, Unity Catalog model registry, and model serving.

Official

Databricks Documentation — Feature Store

Feature Store documentation covering point-in-time correctness, online tables, streaming features, and on-demand features.

Official

databricks/mlops-stacks (GitHub)

Production ML pipeline templates using Databricks Asset Bundles — reference implementation for ML CI/CD patterns tested on the exam.

Free

Databricks Certified ML Professional Exam Guide

Official exam guide with domain breakdown, objectives, and registration information.

Official

Paid Courses & Practice Exams

These are recommended if you prefer a structured learning path. They can save time but are not required to pass.

Udemy — Databricks Machine Learning Professional Practice Exams

Practice exam course with detailed explanations matching real exam difficulty and scenario-based questions.

Paid

Databricks Academy — Machine Learning at Scale (Instructor-Led)

Official instructor-led course covering distributed training, MLOps, and model serving with hands-on labs.

Paid

Databricks Academy — Advanced Machine Learning Operations

Advanced MLOps course covering Lakehouse Monitoring, deployment strategies, CI/CD, and production monitoring.

Paid
