You Can Pass This Exam For Free
Choose Your Study Path
You have general ML/data science experience but limited Snowflake-specific knowledge. You need to learn Snowflake's ML platform from the ground up.
Exam Overview
Format
70 questions, 115 minutes. Multiple choice and multiple select. English only.
Scoring
Scaled score 0-1000. Passing: 750. No penalty for wrong answers — always answer every question.
Domains & Weights
- Operationalize Data Preparation and Feature Engineering: 20%
- MLOps Infrastructure and Management: 24%
- Model Serving and Deployment Operations: 18%
- Pipeline Orchestration and Automation (CI/CD): 22%
- Governance, Security and Monitoring: 16%
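To turn the weights into a rough study budget, the sketch below multiplies each weight by the 70-question total. It assumes questions are spread in proportion to the published weights, which this guide does not state, so treat the counts as estimates only.

```python
# Domain weights as listed in this guide (percent of the exam).
DOMAIN_WEIGHTS = {
    "Operationalize Data Preparation and Feature Engineering": 20,
    "MLOps Infrastructure and Management": 24,
    "Model Serving and Deployment Operations": 18,
    "Pipeline Orchestration and Automation (CI/CD)": 22,
    "Governance, Security and Monitoring": 16,
}
TOTAL_QUESTIONS = 70  # from the Format section


def expected_questions(weights, total):
    """Return an approximate question count per domain.

    Assumption: questions are distributed in proportion to the weights.
    """
    return {name: round(total * pct / 100, 1) for name, pct in weights.items()}


estimates = expected_questions(DOMAIN_WEIGHTS, TOTAL_QUESTIONS)
for name, count in estimates.items():
    print(f"{count:>5} questions  {name}")
```

Under that assumption the heaviest domain accounts for roughly 17 questions and the lightest for roughly 11.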
Registration
$375 USD ($188 during the beta period, June 15 to July 13, 2026). Delivered online via Snowflake's certification portal. Requires SnowPro Core certification as a prerequisite.
Topic Priority Table
Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.
Operationalize Data Preparation and Feature Engineering
This domain covers building production-grade data pipelines and feature engineering workflows within Snowflake. You need to understand how to transform raw data into ML-ready features using Snowflake Feature Store, Dynamic Tables, Streams and Tasks, and Snowpark ML preprocessing APIs, with emphasis on automation, incremental refresh, and point-in-time correctness.
Key Topics
Must-Know Concepts
- Feature Store architecture: feature entities (business objects), feature views (transformation logic), managed vs external feature tables, and automated incremental refresh
- Point-in-time feature lookups: how Feature Store ensures training data does not leak future information when generating historical feature sets
- Online vs offline feature serving: online for low-latency real-time inference, offline for training dataset generation and batch scoring
- Dynamic Tables for feature pipelines: declarative SQL transformations with automatic refresh, target lag configuration, and DAG chaining for multi-step feature computation
- Streams and Tasks for complex feature pipelines: when to use CDC-based streams with task scheduling instead of Dynamic Tables (MERGE operations, stored procedures, custom error handling)
- Snowpark ML preprocessing: distributed scalers (StandardScaler, MinMaxScaler), encoders (OneHotEncoder, OrdinalEncoder), and Pipeline objects for chaining transformations
- Data quality monitoring with Data Metric Functions: freshness, completeness, and accuracy checks on feature tables
- ML Lineage for feature traceability: tracking data flow from source tables through feature definitions to training datasets and models
- Feature engineering patterns: group-by aggregations, window functions, pivot operations, and join strategies using Snowpark DataFrames
- Incremental feature computation: how Dynamic Tables and Streams enable efficient processing of only new or changed data
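The point-in-time lookup idea from the list above can be illustrated without any Snowflake code. The sketch below is plain Python, not the Feature Store API: for each label timestamp it picks the latest feature value that was already known at that time, so no future value leaks into training data. The feature history and dates are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical feature history for one entity: (effective_date, value),
# sorted by effective_date.
feature_history = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 2, 1), 12.5),
    (date(2024, 3, 1), 20.0),
]


def point_in_time_value(history, as_of):
    """Return the latest feature value effective on or before `as_of`, or None."""
    dates = [d for d, _ in history]
    idx = bisect_right(dates, as_of)  # number of rows with date <= as_of
    return history[idx - 1][1] if idx else None


# Each label row carries its own event timestamp.
label_events = [date(2024, 1, 15), date(2024, 2, 1), date(2023, 12, 31)]
training_rows = [(ts, point_in_time_value(feature_history, ts)) for ts in label_events]
```

A label dated before the first feature row gets no value at all rather than borrowing a later one, which is the behavior that prevents leakage.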
Common Exam Traps
MLOps Infrastructure and Management
This is the heaviest domain at 24%. It covers the infrastructure and management layer for ML operations in Snowflake: Snowpark Container Services (SPCS) compute pools, Model Registry, Experiment Tracking, Snowpark ML model training, hyperparameter tuning, and the Snowflake ML platform architecture. Master the relationship between warehouses, SPCS, and Container Runtime.
Key Topics
Must-Know Concepts
- SPCS compute pool architecture: instance families (CPU_X64_S/M/L for CPU, GPU_NV_S/M for A10G workloads, GPU_NV_L for A100 workloads), min/max node autoscaling, and workload-to-pool matching
- GPU support in SPCS: A10G and A100 NVIDIA GPUs for model training, fine-tuning, and inference. Know when GPU compute is necessary vs CPU-sufficient
- Model Registry as first-class schema objects: model versioning, default version designation, metadata storage, and deployment to warehouse or SPCS
- Experiment Tracking: logging hyperparameters, metrics, and artifacts during training runs for comparison and model selection
- Snowpark ML model training: built-in wrappers for scikit-learn, XGBoost, and LightGBM that run natively in Snowflake without UDF creation
- Distributed hyperparameter optimization: GridSearchCV and RandomizedSearchCV execution across multiple warehouse or SPCS nodes using UDTFs
- Container Runtime for Notebooks: preconfigured ML software stacks (PyTorch, TensorFlow, scikit-learn) on CPU or GPU compute pools, extensible with additional packages
- ML Jobs for production training: scheduling model retraining, integrating with external IDEs, and running on Container Runtime
- Warehouse compute vs SPCS: warehouse for SQL inference and supported framework training; SPCS for custom containers, arbitrary packages, GPU workloads, and HTTP serving endpoints
- Model artifact management: storing, versioning, and comparing trained models and their associated metadata in the Model Registry
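To make the difference between grid search and randomized search concrete, here is a minimal plain-Python sketch. It does not call Snowpark ML; the search space, parameter names, and `toy_score` function are hypothetical stand-ins for a real model and cross-validated metric.

```python
import itertools
import random

# Hypothetical search space; the parameter names are illustrative only.
SEARCH_SPACE = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
}


def toy_score(params):
    """Stand-in for a cross-validated metric; peaks at depth 5, rate 0.1."""
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


def grid_candidates(space):
    """Every combination of values: what a grid search evaluates."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]


def random_candidates(space, n_iter, seed=0):
    """A fixed-size random subset of the grid: what a randomized search evaluates."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(space), n_iter)


grid = grid_candidates(SEARCH_SPACE)
best = max(grid, key=toy_score)
sampled = random_candidates(SEARCH_SPACE, n_iter=4)
```

Each candidate is scored independently of the others, which is why this kind of search parallelizes well across nodes. Grid search cost grows with the product of the value counts, while randomized search cost is fixed by `n_iter`.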
Common Exam Traps
Model Serving and Deployment Operations
This domain covers deploying trained models to production for both batch and real-time inference. Key topics include Model Serving on SPCS, deployment automation, inference endpoint management, autoscaling configuration, A/B testing strategies, and the operational aspects of running models in production including rollback and blue-green deployment patterns.
Key Topics
Must-Know Concepts
- Model Serving architecture: Model Registry model deployed to SPCS as a managed HTTP endpoint. Snowflake automates container image building, deployment, and endpoint setup
- Batch vs real-time inference: batch runs on warehouse or SPCS jobs for bulk processing; real-time deploys as HTTP endpoints on SPCS for individual predictions with autoscaling
- Deployment process: register model in Model Registry, configure compute pool, deploy to SPCS, verify endpoint health, route traffic
- Autoscaling for Model Serving: configure min/max nodes in SPCS compute pools. Snowflake scales based on request load automatically
- GPU selection for inference: A10G GPUs for standard inference workloads, A100 GPUs for large models requiring more memory and compute
- Model version management: default version designation in Model Registry, deploying specific versions, and rolling back to previous versions
- Inference on warehouse vs SPCS: warehouse for SQL-integrated batch predictions with supported models; SPCS for custom containers, arbitrary packages, and HTTP endpoints
- Multi-modal batch inference: SPCS job-based batch inference supporting GPU acceleration across multimodal datasets
- Deployment patterns: blue-green deployments using multiple Model Serving endpoints, canary releases by splitting traffic between model versions
- Endpoint observability: monitoring inference latency, throughput, error rates, and resource utilization for deployed models
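The canary pattern from the list above comes down to sending a small, stable share of requests to the new version. The sketch below shows one common way a client or gateway might do that with hash buckets. It is an illustration only, not how Snowflake routes traffic; the version names `V1` and `V2` and the request IDs are hypothetical.

```python
import hashlib


def route_version(request_id, canary_percent, stable="V1", canary="V2"):
    """Pick a model version for a request using a stable hash bucket."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in 0..99
    return canary if bucket < canary_percent else stable


request_ids = [f"req-{i}" for i in range(1000)]
assignments = [route_version(rid, canary_percent=10) for rid in request_ids]
canary_share = assignments.count("V2") / len(assignments)
print(f"canary share: {canary_share:.1%}")
```

Because the bucket depends only on the request ID, the same request always reaches the same version. Setting `canary_percent` to 0 acts as an immediate rollback, and setting it to 100 completes the rollout.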
Common Exam Traps
Pipeline Orchestration and Automation (CI/CD)
This is the second-heaviest domain at 22%. It covers end-to-end ML pipeline orchestration and CI/CD automation. Topics include ML Jobs, Task Graphs, scheduled notebooks, Git integration, Snowflake CLI GitHub Actions, external orchestrators (Airflow, Prefect, Dagster), version control for ML artifacts, and automated deployment workflows.
Key Topics
Must-Know Concepts
- ML Jobs for pipeline orchestration: scheduling retraining pipelines, connecting multiple steps, and running on Container Runtime with support for external IDE development
- Task Graphs (DAGs): chaining Snowflake Tasks using AFTER clauses to create multi-step workflows with CRON scheduling and dependency management
- CI/CD with Git integration: connecting Snowflake Notebooks and code to Git repositories (GitHub, GitLab) for version control and collaborative development
- Snowflake CLI GitHub Actions: automating deployment pipelines triggered by Git events (merge to main, release tags) to deploy updated DAGs and ML artifacts
- External orchestrator integration: using Airflow, Prefect, or Dagster with ML Jobs when organizations have existing orchestration infrastructure
- Pipeline stages: data ingestion, feature engineering, model training, model validation, model registration, model deployment, and monitoring — each as an orchestrated step
- Version control for ML artifacts: tracking notebook code, model configurations, feature definitions, and pipeline definitions in Git
- Automated model retraining: scheduling periodic retraining pipelines triggered by data drift detection or calendar schedules
- Deployment automation: from code merge to production deployment using CI/CD pipelines that validate, test, and deploy ML models
- Error handling in pipelines: task retry policies, failure notifications, and graceful degradation patterns in production ML pipelines
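A task graph is a dependency graph: each task lists the tasks it must run after, and the scheduler derives a valid execution order. The sketch below models that idea with Python's standard `graphlib` module (Python 3.9+). The task names are hypothetical and no Snowflake objects are created.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it runs AFTER.
task_graph = {
    "ingest_data": set(),
    "build_features": {"ingest_data"},
    "train_model": {"build_features"},
    "validate_model": {"train_model"},
    "register_model": {"validate_model"},
    "deploy_model": {"register_model"},
    # A side branch that only needs the features, not the model.
    "profile_features": {"build_features"},
}

# static_order() yields tasks so that every dependency comes first.
execution_order = list(TopologicalSorter(task_graph).static_order())
print(" -> ".join(execution_order))
```

Only the root task has no predecessors, which mirrors the usual arrangement where the root carries the schedule and child tasks run when their predecessors finish. `profile_features` can run at any point after `build_features`, so its exact position in the order is not fixed.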
Common Exam Traps
Governance, Security and Monitoring
This domain covers securing, governing, and monitoring ML systems in production. Topics include RBAC for ML artifacts, dynamic data masking, Snowflake Horizon governance suite, ML Observability for drift detection and performance monitoring, ML Lineage for compliance, cost management for ML workloads, and audit logging for regulatory requirements.
Key Topics
Must-Know Concepts
- RBAC for ML artifacts: securing models, feature tables, compute pools, and inference endpoints using Snowflake's role hierarchy (access roles + functional roles)
- Dynamic data masking for ML: applying column-level masking policies to ensure sensitive training data is not exposed to unauthorized roles
- Snowflake Horizon governance suite: universal data discovery, Trust Center for security posture, data classification for sensitive data identification
- ML Observability: configuring model monitors for regression and binary classification models, tracking performance metrics, detecting data drift using Difference of Means, and setting alerts
- Drift detection: understanding data drift (input distribution changes) vs concept drift (relationship between inputs and outputs changes), and how Snowflake detects each
- ML Lineage for compliance: tracing data from source through features, datasets, and models for audit trails, reproducibility, and regulatory requirements
- ML Explainability with Shapley values: computing feature importance scores for model interpretability and transparency requirements
- Cost management for ML workloads: monitoring compute costs across warehouses, SPCS compute pools, and serverless functions. Understanding credit consumption patterns
- Audit logging: tracking who accessed which models, when inference was run, and what data was used for training — critical for regulated industries
- Data Clean Rooms: privacy-preserving collaboration environments for secure cross-organizational ML without exposing raw data
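As a rough illustration of the Difference of Means drift metric mentioned above, the sketch below compares the mean of one numeric feature in a baseline window against a current window. It is plain Python under simplifying assumptions: the exact formula and any normalization Snowflake applies may differ, and the sample values and threshold are hypothetical.

```python
from statistics import mean


def difference_of_means(baseline, current):
    """Return mean(current) - mean(baseline) for one numeric feature."""
    return mean(current) - mean(baseline)


def has_drifted(baseline, current, threshold):
    """Flag drift when the absolute shift in means exceeds `threshold`."""
    return abs(difference_of_means(baseline, current)) > threshold


baseline_ages = [30, 32, 34, 36, 38]  # training-time sample, mean 34
current_ages = [40, 42, 44, 46, 48]   # recent inference inputs, mean 44
shift = difference_of_means(baseline_ages, current_ages)
alert = has_drifted(baseline_ages, current_ages, threshold=5)
```

This metric only looks at input distributions, so it can signal data drift. Concept drift needs ground-truth labels, because it shows up as a drop in performance metrics rather than a shift in inputs.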
Common Exam Traps
Snowflake ML Concepts You Must Not Confuse
These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.
Top Mistakes to Avoid
Exam-Ready Checklist
Recommended Resources
Free & Official Resources
Paid Courses & Practice Exams
These are recommended if you prefer a structured learning path. They can save time but are not required to pass.