General Exam Tips
1. Every question is scenario-based, so do not look for a single keyword match. Read the full scenario to identify the CONSTRAINT that eliminates wrong answers.
2. Flag and skip questions where you are genuinely stuck. Come back after answering the questions you are confident about; seeing more questions often jogs your memory.
3. There is no penalty for wrong answers. Answer every single question, even if you have to guess.
4. The passing score is 750 on a scaled range up to 1000. Because scores are scaled, there is no exact percentage cutoff; as a rule of thumb, aim for roughly 75-80% correct.
5. Domains 2 (24%) and 4 (22%) together make up 46% of the exam. Getting those right is the single biggest lever on your score.
6. Multiple-select questions often have 2 or 3 correct answers. Select exactly as many as indicated; partial credit is not given.
7. When two answers both sound correct, look for a CONSTRAINT in the scenario (requires GPU, needs HTTP endpoint, needs MERGE, requires sub-second latency) that eliminates one.
Operationalize Data Preparation and Feature Engineering
Must-Know Facts
- Feature Store uses ENTITIES (business object identifiers, e.g., user_id) and FEATURE VIEWS (transformation logic as SQL or DataFrames) as its two core building blocks.
- Managed feature views are refreshed automatically by Snowflake on a schedule. External feature views are maintained by outside tools like dbt — Snowflake does NOT refresh them.
- Point-in-time lookups use ASOF JOIN under the hood to retrieve feature values that were available at each training example's timestamp, preventing data leakage. This is only relevant when you provide a spine_timestamp_col.
- Online feature serving is NOT enabled by default. It requires explicit setup and creates a separate low-latency serving infrastructure. Not all feature views automatically get an online store.
- Dynamic Tables use TARGET_LAG to define the acceptable staleness. Snowflake handles scheduling — you never write a Task to refresh a Dynamic Table.
- Streams + Tasks are required whenever the pipeline needs: MERGE statements, stored procedures, external function calls, custom retry logic, or explicit CRON scheduling with procedural branching.
- Snowpark ML preprocessing (StandardScaler, OneHotEncoder, Pipeline) runs DISTRIBUTED across warehouse nodes — it does NOT require SPCS.
- Data Metric Functions (DMFs) measure data quality: freshness, completeness, accuracy. They run on schedules and produce quality scores for feature tables, not model performance scores.
- ML Lineage automatically traces: source data -> feature views -> training datasets -> registered models. This traceability is what enables compliance audits and reproducibility.
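To make the point-in-time idea concrete, here is a minimal pure-Python sketch of an as-of lookup: for each spine row (entity, timestamp) it returns the most recent feature value whose timestamp is at or before the spine timestamp. It only illustrates the semantics; Snowflake performs this with ASOF JOIN inside the Feature Store, and the function name `asof_join` and the tuple layouts are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join(spine, features):
    """Point-in-time lookup (illustrative only).

    spine:    list of (entity_id, spine_ts)
    features: list of (entity_id, feature_ts, value)
    Returns (entity_id, spine_ts, value), where value is the latest feature
    value with feature_ts <= spine_ts, or None if none existed yet.
    """
    # Group feature rows per entity, sorted by feature timestamp.
    grouped = defaultdict(list)
    for entity_id, feature_ts, value in features:
        grouped[entity_id].append((feature_ts, value))
    index = {}
    for entity_id, rows in grouped.items():
        rows.sort(key=lambda row: row[0])
        index[entity_id] = ([ts for ts, _ in rows], [v for _, v in rows])

    result = []
    for entity_id, spine_ts in spine:
        timestamps, values = index.get(entity_id, ([], []))
        # Position of the last feature row at or before spine_ts.
        pos = bisect_right(timestamps, spine_ts) - 1
        result.append((entity_id, spine_ts, values[pos] if pos >= 0 else None))
    return result
```

With features `[("u1", 1, 10.0), ("u1", 5, 20.0)]`, a spine row `("u1", 4)` gets `10.0`, not the later `20.0`. Returning the latest value regardless of timestamp is exactly the leakage a point-in-time lookup prevents.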
Scenario Tips
When the question describes training data that accidentally includes future information (labels or features from after the training timestamp)...
The fix is point-in-time feature lookups via Feature Store with spine_timestamp_col. This uses ASOF JOIN to ensure only historically available feature values are included.
Dynamic Table target lag might seem relevant because it controls freshness, but it does not control WHICH point-in-time values are returned during training dataset generation.
When the question asks about a feature pipeline that must update a staging table using MERGE and then trigger downstream processing only when specific conditions are met...
Streams + Tasks with a stored procedure. Streams detect new/changed rows; Tasks execute the stored procedure that performs MERGE and conditional logic.
Dynamic Tables cannot perform MERGE. Even though they 'automatically refresh,' they only support SELECT transformations.
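The MERGE-plus-branching logic that pushes this scenario toward Streams + Tasks can be sketched in plain Python. Here `changes` stands in for the rows a stream exposes, simplified to hypothetical `(key, action, row)` tuples (real streams describe changes with metadata columns such as METADATA$ACTION); in Snowflake the same logic would live in a stored procedure called by a Task. The function names and the `min_changes` rule are illustrative assumptions.

```python
def merge_changes(target, changes):
    """Apply MERGE-style upserts/deletes to `target` (dict: key -> row).

    `changes` is a list of (key, action, row) tuples, a simplified,
    hypothetical stand-in for stream rows. Returns per-outcome counts.
    """
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    for key, action, row in changes:
        if action == "DELETE":
            if key in target:            # WHEN MATCHED AND delete THEN DELETE
                del target[key]
                counts["deleted"] += 1
        elif key in target:              # WHEN MATCHED THEN UPDATE
            target[key] = row
            counts["updated"] += 1
        else:                            # WHEN NOT MATCHED THEN INSERT
            target[key] = row
            counts["inserted"] += 1
    return counts


def should_trigger_downstream(counts, min_changes=1):
    """Procedural branching a Dynamic Table cannot express: only start
    downstream processing when enough rows actually changed."""
    return sum(counts.values()) >= min_changes
```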
When the question asks how to monitor whether input data to a production model has shifted distribution compared to training...
ML Observability drift detection. It compares inference-time input distributions against baseline (training) distributions using drift metrics such as Difference of Means.
DMFs check data quality but cannot compare distributions between training baseline and production inference data.
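A minimal sketch of what the Difference of Means metric measures for numeric features: compute each feature's mean in the baseline and in recent inference data, and flag features whose gap exceeds a threshold. The threshold and the function names are hypothetical; this is not how a model monitor is configured in Snowflake, only an illustration of the metric.

```python
from statistics import mean


def difference_of_means(baseline, current):
    """Drift score for one numeric feature: mean(current) - mean(baseline)."""
    return mean(current) - mean(baseline)


def drifted_features(baseline_cols, current_cols, threshold):
    """Return {feature: score} for features whose absolute difference of
    means exceeds `threshold` (a hypothetical, use-case-specific cutoff)."""
    flagged = {}
    for name, base_values in baseline_cols.items():
        score = difference_of_means(base_values, current_cols[name])
        if abs(score) > threshold:
            flagged[name] = score
    return flagged
```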
MLOps Infrastructure and Management
Must-Know Facts
- Model Registry models are SCHEMA-LEVEL OBJECTS. They are not stored in stages, not in external locations — they live in a Snowflake schema and have full RBAC applied.
- A model can have up to 1,000 versions. Each version is unique by name. System aliases DEFAULT, FIRST, and LAST are always available and cannot be overridden.
- USAGE privilege on a model = warehouse-only inference + no access to internal details or artifacts. READ privilege = SPCS inference + metadata visibility (comments, tags, metrics, artifacts).
- Model artifacts reside on internal stages accessible via snow:// URLs, not standard stage paths. Only model OWNERS can access artifacts.
- SPCS compute pool instance families: CPU_X64_S/M/L for CPU workloads. GPU_NV_S = 1x A10G, GPU_NV_M = 4x A10G (standard GPU). GPU_NV_L = 8x A100 (large model GPU). Know which GPU count and type for which workload size.
- Container Runtime for Notebooks is a curated ML environment running on SPCS. It is not the same as creating your own SPCS service. It runs notebooks; it does not run long-lived services.
- Warehouse compute supports Snowpark ML training for scikit-learn, XGBoost, and LightGBM. You do NOT need SPCS just because you are training an ML model.
- Distributed hyperparameter tuning (GridSearchCV, RandomizedSearchCV) uses UDTFs to parallelize each hyperparameter combination across multiple warehouse nodes.
- Experiment Tracking logs hyperparameters, metrics, and artifacts during training runs. It feeds into Model Registry for model selection — they work together.
- The target_platforms argument in log_model() determines WHERE a model can be deployed (WAREHOUSE vs SNOWPARK_CONTAINER_SERVICES). Setting the wrong target platform causes log_model() to fail if dependencies conflict.
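The fan-out behind distributed tuning is easiest to see in miniature: expand the grid into independent parameter combinations, score each one separately, and keep the best. In this pure-Python sketch a thread pool stands in for the parallel workers (Snowflake distributes the combinations with UDTFs, as described in the list above); the function names and the toy scoring function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_grid(param_grid):
    """Expand {'param': [values]} into one dict per combination,
    the way a grid search enumerates its candidates."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[n] for n in names))]


def parallel_search(param_grid, score_fn, max_workers=4):
    """Score every combination independently (each call stands in for one
    parallel worker) and return (best_params, best_score)."""
    candidates = expand_grid(param_grid)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score_fn, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

Because no combination depends on another, the work parallelizes cleanly; a 3 x 2 grid yields 6 independent training runs.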
Scenario Tips
When the question asks about training a PyTorch model requiring multiple A100 GPUs and packages not available in Snowflake's Anaconda channel...
SPCS compute pool with GPU_NV_L instance family (A100). Custom packages require a custom OCI image deployed as an SPCS service or job.
Container Runtime for Notebooks is tempting but is a curated environment. It does not support fully custom OCI images or multi-node distributed GPU training at production scale.
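The instance-family facts in this guide reduce to a small lookup: pick the smallest GPU family whose per-node GPU count meets the workload, and restrict to A100 nodes when the scenario demands them. The table mirrors the families listed above; availability may differ by cloud provider and region, and `pick_gpu_family` is a hypothetical helper, not a Snowflake API.

```python
# GPU instance families as described in this guide: (name, GPUs per node, GPU type).
GPU_FAMILIES = [
    ("GPU_NV_S", 1, "A10G"),
    ("GPU_NV_M", 4, "A10G"),
    ("GPU_NV_L", 8, "A100"),
]


def pick_gpu_family(gpus_needed, require_a100=False):
    """Return the smallest family with enough GPUs per node, optionally
    limited to A100 nodes; None if no single node is large enough."""
    for name, count, gpu_type in GPU_FAMILIES:
        if require_a100 and gpu_type != "A100":
            continue
        if count >= gpus_needed:
            return name
    return None
```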
When a question asks which privilege allows a data scientist to view model metrics and use the model for SPCS inference without revealing model weights...
READ privilege. It grants SPCS inference + metadata visibility (metrics, tags, comments). OWNERSHIP is not needed and would be overly permissive.
USAGE grants warehouse inference only. OWNERSHIP grants full control. READ is the specific privilege designed for this access pattern.
When a scenario describes registering a scikit-learn model and running batch predictions on a 50-million-row table...
Virtual warehouse inference (batch). Scikit-learn models registered in Model Registry can run inference on warehouses directly — no SPCS needed for batch scoring with supported frameworks.
Model Serving on SPCS adds unnecessary cost and complexity for batch workloads with standard packages.
Model Serving and Deployment Operations
Must-Know Facts
- Model Serving creates managed HTTP endpoints on SPCS. It ONLY runs on SPCS — there is no warehouse-based HTTP endpoint for real-time inference.
- Snowflake automates container image building and deployment when you deploy from Model Registry to Model Serving. You do NOT build Docker images manually.
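- Node autoscaling is configured at the COMPUTE POOL level (MIN_NODES, MAX_NODES), not on the model. The service behind the endpoint can also scale its instance count between MIN_INSTANCES and MAX_INSTANCES, but those instances can only run on nodes the pool provides.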
- SPCS SERVICES are long-running (for real-time Model Serving endpoints). SPCS JOBS are finite-duration runs (for batch inference). Know which to use for each pattern.
- Batch inference for large datasets uses virtual warehouses or SPCS jobs. Real-time inference uses Model Serving (SPCS service with HTTP endpoint).
- Model version rollback: set the previous version as default in Model Registry and redeploy. The Model Registry maintains multiple versions exactly for this purpose.
- Blue-green deployment: run two Model Serving endpoints simultaneously (one per version), then redirect traffic. Canary: split traffic between versions during gradual rollout.
- A10G GPUs: GPU_NV_S (1x A10G) for single-GPU inference, GPU_NV_M (4x A10G) for heavier concurrent workloads. A100 GPUs: GPU_NV_L (8x A100) for very large models requiring maximum memory. A10G is sufficient for most standard inference workloads.
- Cold-start latency exists when a Model Serving endpoint scales from zero. Plan for warm-up time or configure minimum node count to keep at least one node hot.
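Canary routing can be sketched as deterministic hashing: each request ID maps to a stable bucket from 0 to 99, and buckets below the canary percentage go to the new version. This assumes a router of your own in front of two Model Serving endpoints; the version labels and `route_version` are hypothetical and not a Snowflake API.

```python
import hashlib


def route_version(request_id, canary_percent, stable="V1", canary="V2"):
    """Send roughly canary_percent% of requests to the canary version.

    The same request_id always lands in the same bucket, so routing is
    repeatable across calls."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0..99
    return canary if bucket < canary_percent else stable
```

Hashing instead of random sampling keeps a given caller pinned to one version for the whole rollout, which makes the canary's metrics easier to compare; blue-green is the degenerate case of moving `canary_percent` from 0 to 100 in one step.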
Scenario Tips
When a question asks about deploying a model for sub-second predictions with variable traffic that can spike 10x during business hours...
Model Serving on SPCS with autoscaling compute pool (configure max_nodes to handle peak load). The HTTP endpoint handles variable traffic and autoscaling manages node count.
Virtual warehouse batch inference cannot provide sub-second latency for individual prediction requests. Scheduled tasks cannot respond to real-time traffic spikes.
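Sizing max_nodes is simple arithmetic once you know roughly how many requests one node handles. This sketch assumes a measured per-node throughput (a hypothetical input) and clamps the result to the pool's node range; Snowflake's actual scaling policy is not modeled.

```python
import math


def nodes_needed(requests_per_sec, per_node_capacity, min_nodes, max_nodes):
    """Node count needed for a given load, clamped to [min_nodes, max_nodes].

    per_node_capacity is an assumed, measured requests/sec per node."""
    wanted = math.ceil(requests_per_sec / per_node_capacity)
    return max(min_nodes, min(max_nodes, wanted))
```

For a 50 requests/s baseline that spikes 10x to 500 requests/s, with about 100 requests/s per node, this gives 1 node off-peak and 5 at peak, so max_nodes should be at least 5. With max_nodes set to 4, the pool stays at 4 nodes and cannot absorb the full spike.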
When a production model shows accuracy degradation after a new version was deployed and the team needs the fastest possible recovery...
Set the previous model version as default in Model Registry and redeploy. No retraining needed — the old version is already registered.
Retraining takes hours or days. Deleting the compute pool causes downtime. Increasing node count addresses capacity, not accuracy.
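Rollback is fast because switching versions only moves a pointer. The toy registry below models that: versions are kept in registration order, FIRST and LAST are derived from that order, and DEFAULT is a movable alias. The class is hypothetical, not the snowflake-ml-python Registry API, and keeping the first registered version as DEFAULT until it is changed is an assumption of the sketch.

```python
class ModelVersions:
    """Toy model registry: versions in registration order plus a movable
    DEFAULT alias. FIRST and LAST are derived from the order."""

    MAX_VERSIONS = 1000  # per-model limit quoted in this guide

    def __init__(self):
        self._versions = []
        self._default = None

    def register(self, name):
        if name in self._versions:
            raise ValueError(f"version {name!r} already exists")
        if len(self._versions) >= self.MAX_VERSIONS:
            raise RuntimeError("version limit reached")
        self._versions.append(name)
        if self._default is None:
            self._default = name  # assumption: first version starts as DEFAULT

    def set_default(self, name):
        """Rollback (or promotion) is just moving the DEFAULT pointer."""
        if name not in self._versions:
            raise KeyError(name)
        self._default = name

    def resolve(self, alias):
        """Resolve the system aliases FIRST, LAST and DEFAULT."""
        if alias == "FIRST":
            return self._versions[0]
        if alias == "LAST":
            return self._versions[-1]
        if alias == "DEFAULT":
            return self._default
        raise KeyError(alias)
```

Moving DEFAULT back does not by itself change what a running endpoint serves; the redeploy step in the answer above is still needed.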
When the question asks about running GPU-accelerated inference on a large custom PyTorch model serving 100 concurrent real-time requests...
Model Serving on SPCS with GPU_NV_S or GPU_NV_M compute pool (A10G GPUs). Real-time concurrent requests require the HTTP endpoint provided by Model Serving, not batch jobs.
SPCS Jobs handle batch inference but terminate after completion — they cannot serve concurrent real-time requests.
Pipeline Orchestration and Automation (CI/CD)
Must-Know Facts
- ML Jobs are designed for PRODUCTION ML pipeline orchestration on Container Runtime. They support external IDE integration, run on SPCS Container Runtime, and integrate with Task Graphs.
- Task Graphs (DAGs) chain Snowflake Tasks using AFTER clauses. Only the root task has a SCHEDULE (a CRON expression or an interval); child tasks have no schedule of their own and run when all of their predecessors have completed successfully.
- Snowflake CLI GitHub Actions run in GITHUB, not in Snowflake. They are triggered by Git events (push, merge, tag) and execute Snowflake CLI commands to deploy artifacts.
- Git integration for Notebooks enables version control and collaboration, but does NOT automatically deploy changes. CI/CD pipelines with GitHub Actions are required for automated deployment.
- External orchestrators (Airflow, Prefect, Dagster) connect TO Snowflake from OUTSIDE. They trigger ML Jobs or Tasks via Snowflake connectors but do not replace Snowflake's internal scheduling.
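- Task failure handling: when a task fails, the tasks that depend on it do not run in that graph run. A finalizer task (FINALIZE) runs after the graph finishes whether or not it succeeded, which makes it the place for cleanup or alerting. SUSPEND_TASK_AFTER_NUM_FAILURES automatically suspends a task after the specified number of consecutive failed runs.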
- Pipeline stages in a proper ML CI/CD: data ingestion → feature engineering → training → validation → registration → deployment → monitoring. Each step is an orchestrated task or job.
- ML Jobs vs Scheduled Notebooks: ML Jobs support multi-step production pipelines with Container Runtime and external IDEs. Scheduled Notebooks are simpler, single-notebook scheduling for lighter workloads.
Scenario Tips
When the question describes a scenario where code changes to a notebook should automatically deploy to production when merged to the main branch...
Git integration (to connect notebooks to GitHub) PLUS Snowflake CLI GitHub Actions (to trigger deployment on merge). Both are needed.
Git integration alone only provides version control. It does not trigger deployment. A scheduled Task cannot watch for Git events.
When the question describes a 5-step ML pipeline (ingest, feature compute, train, validate, deploy) where each step must only run after the previous succeeds...
Task Graph with AFTER clauses chaining 5 tasks. The root task has the CRON schedule. Each subsequent task runs only when its predecessor succeeds.
Dynamic Tables handle data transformation, not arbitrary pipeline steps. Scheduled Notebooks cannot chain multi-step dependencies with failure gates.
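The AFTER-clause gating can be simulated in a few lines: run tasks in dependency order and skip any task whose predecessors did not all succeed. The task names are hypothetical, and `graphlib` (Python 3.9+) only supplies the ordering; in Snowflake the graph itself is declared with CREATE TASK ... AFTER.

```python
from graphlib import TopologicalSorter


def run_task_graph(predecessors, actions):
    """Run tasks in dependency order; a task runs only if every predecessor
    succeeded, otherwise it is marked SKIPPED.

    predecessors: {task: set of tasks named in its AFTER clause}
    actions:      {task: zero-arg callable returning True on success}
    """
    status = {}
    for task in TopologicalSorter(predecessors).static_order():
        preds = predecessors.get(task, ())
        if all(status.get(p) == "SUCCEEDED" for p in preds):
            status[task] = "SUCCEEDED" if actions[task]() else "FAILED"
        else:
            status[task] = "SKIPPED"
    return status
```

For the five-step chain in this scenario, a failing validation step leaves deployment SKIPPED, which is the failure gate a Scheduled Notebook cannot express.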
When a team uses Airflow for all their orchestration but wants training to run on Snowflake's GPU Container Runtime...
Keep Airflow as the orchestrator. Use Airflow's Snowflake provider to trigger ML Jobs that execute on Container Runtime. No migration of Airflow DAGs is needed.
Converting all Airflow DAGs to Snowflake Task Graphs is unnecessary and disruptive. The exam favors integration over replacement.
Governance, Security and Monitoring
Must-Know Facts
- ML Observability ONLY supports regression and binary classification models. Multi-class classification, clustering, and other model types are NOT supported for automated monitoring.
- Data drift = INPUT feature distribution changed from training baseline. Concept drift = the RELATIONSHIP between inputs and outputs changed. Both degrade accuracy but need different responses.
- ML Observability detects drift by comparing inference-time distributions against the baseline with statistical drift metrics. Difference of Means is one of them; Wasserstein distance and Jensen-Shannon distance are also available.
- Model monitors have hard limits: 250 monitors per account maximum, 500 features monitored per model maximum, minimum 1-day aggregation window.
- Timestamp columns for model monitors MUST be TIMESTAMP_NTZ type. Prediction and actual columns MUST be NUMBER type. Wrong types cause monitor failure.
- RBAC for ML: access roles define object-level permissions (SELECT, USAGE, READ on models and feature tables). Functional roles map to job functions (data_scientist, ml_engineer). Best practice: custom functional roles grant to SYSADMIN, not ACCOUNTADMIN.
- Dynamic Data Masking applies to COLUMNS in feature tables and training data. It does NOT mask model weights or prediction outputs — those are governed via RBAC on the model object.
- ML Lineage traces from source tables through feature views through training datasets through registered models. If a model is trained EXTERNALLY and imported, lineage within Snowflake is partial.
- Snowflake Horizon is the governance suite umbrella: data discovery (catalog), Trust Center (security posture monitoring), compliance center, and data clean rooms.
- ML Explainability computes Shapley values (SHAP) to explain which features drive individual predictions. Used for governance and regulatory transparency requirements.
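Shapley values have a compact definition: a feature's attribution is its weighted average marginal contribution over all coalitions of the other features. This brute-force sketch computes them exactly for a small model, replacing features outside a coalition with baseline values (a common simplification). It illustrates the math only; it is not Snowflake's implementation, and its cost is exponential in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    predict:  callable taking a list of feature values
    x:        the instance to explain
    baseline: reference values used for features outside a coalition
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis
```

For a linear model such as `2*x0 + 3*x1 - x2 + 1` at `x = [1, 2, 3]` with an all-zero baseline, the attributions come out to 2, 6 and -3 (up to floating-point rounding), which sum to the prediction minus the baseline prediction (5).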
Scenario Tips
When a question asks about setting up automated monitoring for a multi-class classification model to detect when predictions degrade...
ML Observability does NOT support multi-class classification. The team would need custom monitoring using Snowpark or SQL-based alerting on prediction distributions.
The exam expects you to know the limitation. Do not select 'configure an ML Observability model monitor' for multi-class classification.
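Because these constraints often show up as trick answers, it can help to see them as a checklist. This hypothetical validator encodes the limits stated in this guide (supported task types, column types, feature count); the task-type labels and the simplified type names are assumptions, and it does not call any Snowflake API.

```python
SUPPORTED_TASKS = {"REGRESSION", "BINARY_CLASSIFICATION"}  # per this guide
MAX_FEATURES = 500                                         # per model, per this guide


def monitor_problems(task, column_types, timestamp_col, prediction_col,
                     actual_col, feature_cols):
    """List reasons a model monitor setup would be rejected under the
    constraints in this guide. An empty list means no problems found."""
    problems = []
    if task not in SUPPORTED_TASKS:
        problems.append(f"unsupported task type: {task}")
    if column_types.get(timestamp_col) != "TIMESTAMP_NTZ":
        problems.append(f"{timestamp_col} must be TIMESTAMP_NTZ")
    for col in (prediction_col, actual_col):
        if column_types.get(col) != "NUMBER":
            problems.append(f"{col} must be NUMBER")
    if len(feature_cols) > MAX_FEATURES:
        problems.append(f"too many features: {len(feature_cols)} > {MAX_FEATURES}")
    return problems
```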
When a question asks how to give an auditor read-only access to model metadata (tags, metrics, version history) and the ability to run SPCS inference without exposing model weights...
Grant READ privilege on the model to the auditor's role. READ grants SPCS inference + metadata visibility without revealing internal model weights or artifacts.
OWNERSHIP is too permissive. USAGE grants warehouse inference only and no metadata. READ is the exact privilege for this pattern.
When a scenario describes a production model that shows identical input feature distributions to training time, but prediction accuracy has dropped significantly after a major market event...
This is CONCEPT DRIFT — the relationship between inputs and outputs changed (the market event changed what the correct prediction should be), not the input distribution. The response is retraining on post-event data, not fixing data pipelines.
Data drift refers specifically to input distribution changes. If inputs look the same but outputs are wrong, that's concept drift.
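The triage rule in this scenario can be written as a small decision function: check the inputs first, and only if they look unchanged attribute an accuracy drop to concept drift. The thresholds and labels are hypothetical, and real diagnosis needs more than a comparison of means.

```python
from statistics import mean


def diagnose(baseline_inputs, current_inputs, baseline_accuracy,
             current_accuracy, drift_threshold, accuracy_drop_threshold):
    """Rough drift triage using the definitions in this guide.

    baseline_inputs / current_inputs: {feature: list of numeric values}
    """
    inputs_shifted = any(
        abs(mean(current_inputs[f]) - mean(values)) > drift_threshold
        for f, values in baseline_inputs.items()
    )
    accuracy_dropped = (baseline_accuracy - current_accuracy) > accuracy_drop_threshold
    if inputs_shifted:
        # Inputs moved: data drift (concept drift may also be present).
        return "DATA_DRIFT"
    if accuracy_dropped:
        # Same inputs, worse accuracy: the input-output relationship changed.
        return "CONCEPT_DRIFT"
    return "NO_DRIFT_DETECTED"
```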