You Can Pass This Exam For Free
Choose Your Study Path
You have SnowPro Core and 1-2 years of Snowflake experience but limited depth in Snowpark, Snowpipe Streaming, Dynamic Tables, or Cortex LLM functions. You need to systematically build advanced data engineering skills before tackling the DEA-C02.
Exam Overview
Format
65 questions in 115 minutes. Question types include multiple-choice, multi-select, and interactive formats (drag-and-drop, matching). Delivered via Pearson VUE, available as online proctored or at a Pearson VUE testing center. Closed-book with no external resources allowed.
Scoring
Scaled scoring from 0-1000. Passing score: 750/1000. No penalty for wrong answers — always answer every question. Score report provided immediately after exam. Prerequisite: active SnowPro Core certification required at time of exam.
Domains & Weights
- Data Movement26%
- Data Transformation25%
- Performance Optimization21%
- Storage and Data Protection14%
- Data Governance14%
Registration
$375 USD. Register at home.pearsonvue.com/snowflake. Exam fee is $375 USD. Certification is valid for 2 years. Retake policy: 7-day waiting period between attempts, maximum 4 retakes in any 12-month period at $375 each. Recertification options: DEA-R02 recertification exam ($188), equivalent higher-level certification, or Snowflake instructor-led training. A Pearson VUE account is required.
Topic Priority Table
Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.
Data Movement
The largest domain at 26%. Covers all aspects of getting data into and out of Snowflake: staging (internal and external), bulk loading with COPY INTO, continuous ingestion with Snowpipe (file-based and Snowpipe Streaming), Tasks and Streams for pipeline orchestration, Dynamic Tables for declarative transformation pipelines, connector-based ingestion (Kafka, Spark), secure data sharing, external and Iceberg tables, and data unloading.
Key Topics
Must-Know Concepts
- Internal stages: user stages (@~), table stages (@%table_name), named stages (@stage_name). Named stages are shareable and can have file format, encryption, and compression settings.
- External stages reference cloud storage (S3, Azure Blob, GCS) with storage integration objects for credential management. File format objects define parsing rules for CSV, JSON, Parquet, Avro, ORC.
- COPY INTO options: ON_ERROR (CONTINUE, SKIP_FILE, ABORT_STATEMENT), PURGE to remove files after load, FORCE to reload previously loaded files, FILE_FORMAT to specify parsing.
- Snowpipe: SQS/SNS auto-ingest for S3, Storage Queue for Azure, Pub/Sub for GCS. Uses COPY INTO statement internally. Serverless compute. REST API for manual triggering.
- Snowpipe Streaming: SDK-based (Java, Python, Go, and REST API) for row-level inserts. Uses channels for ordered, offset-tracked row delivery with exactly-once semantics. Sub-second latency. No staging files required.
- Stream types: standard (all DML changes: INSERT, UPDATE, DELETE), append-only (INSERT only), insert-only (for external tables). SYSTEM$STREAM_HAS_DATA() checks if stream has unconsumed changes.
- Task prerequisites: a task can have one or more predecessor tasks using AFTER clause. Root tasks are triggered by a schedule (CRON or minutes interval) or manually. Tasks require a running warehouse or serverless compute.
- Dynamic Table lag target: DOWNSTREAM (determined by consuming dynamic tables), explicit duration (e.g., LAG = '1 minute'). Refresh mode: AUTO (Snowflake decides), FULL, or INCREMENTAL.
- Data unloading: COPY INTO @stage SELECT ... with format options. Results can be split into multiple files with MAX_FILE_SIZE. Parallel unloading uses SINGLE=FALSE.
- Data sharing: CREATE SHARE > GRANT privilege on objects > ADD ACCOUNTS. Consumer creates a database FROM SHARE. No data movement — consumer queries provider's storage.
Common Exam Traps
Data Transformation
The second-largest domain at 25%. Covers all ways to transform data within Snowflake: SQL UDFs (scalar and tabular), JavaScript UDFs, Python UDFs, external functions, stored procedures (JavaScript, Python, Scala, Java), semi-structured and unstructured data processing, Snowflake Cortex LLM functions, Git repository integration, and Snowpark DataFrames for Python/Java/Scala.
Key Topics
Must-Know Concepts
- UDF types: SQL UDF (portable, no external dependencies), JavaScript UDF (built-in to Snowflake runtime, no imports), Python UDF (supports packages from Snowflake Anaconda channel), Java UDF (compiled JAR).
- Vectorized (batch) UDFs: process rows as a pandas DataFrame batch rather than row-by-row. Significantly faster for Python UDFs. Defined with the vectorized decorator.
- UDTFs (Table Functions): return multiple rows per input row. Used to explode or expand data. SQL: RETURN TABLE clause. Python: process() method returns iterator of rows.
- External functions: HTTPS endpoint integration via API integration object. Called from SQL like UDFs. Good for ML model endpoints, third-party lookups. Latency is significant vs native UDFs.
- Stored procedures: support JavaScript, Python, Scala, Java, Snowflake Scripting (SQL). Can execute DML and DDL, return values, use exception handling. Execute with EXECUTE AS CALLER or EXECUTE AS OWNER privilege.
- Semi-structured data: VARIANT column stores JSON, Avro, Parquet. Access fields with dot notation (v:field) or bracket notation (v['field']). PARSE_JSON() converts strings to VARIANT. FLATTEN() explodes arrays into rows with LATERAL.
- Unstructured data: STAGE files (PDFs, images) accessed via directory tables. BUILD_SCOPED_FILE_URL() and GET_PRESIGNED_URL() for accessing files from UDFs or Cortex.
- Snowflake Cortex: COMPLETE(model, prompt), SUMMARIZE(text), SENTIMENT(text), TRANSLATE(text, source_lang, target_lang), EXTRACT_ANSWER(question, context), EMBED_TEXT(model, text), CLASSIFY_TEXT(text, categories).
- Snowpark lazy evaluation: transformations build a logical plan, execution occurs at collect()/show()/write(). Pushdown optimization executes DataFrame operations as Snowflake SQL inside the engine.
- Git integration: CREATE OR REPLACE GIT REPOSITORY, fetch with SYSTEM$GIT_REPOSITORY_FETCH(), execute scripts with EXECUTE IMMEDIATE FROM @repo/path/file.sql.
Common Exam Traps
Performance Optimization
The third-largest domain at 21%. Covers query troubleshooting with the query profile, virtual warehouse sizing and multi-cluster configuration, micro-partition pruning and explicit clustering keys, materialized views, search optimization service, query acceleration service, the three-tier cache hierarchy, and credit cost management strategies.
Key Topics
Must-Know Concepts
- Query profile nodes: TableScan (partition pruning %), Aggregate, Join, Sort, Filter, ProjectionFilter. High 'partitions scanned vs total' indicates poor pruning.
- Spill to disk: occurs when a query's intermediate data exceeds the warehouse's memory. Shows as 'Bytes spilled to local/remote storage'. Fix by scaling up the warehouse.
- Warehouse sizing: X-Small (1 node) → Small (2) → Medium (4) → Large (8) → XL (16) → 2XL (32) → 3XL (64) → 4XL (128) → 5XL (256) → 6XL (512 nodes). Each size doubles credits per hour.
- Multi-cluster warehouse modes: Auto-scale (add clusters when queries queue, remove when idle) vs Maximized (all clusters always running). Auto-scale for variable concurrency, Maximized for consistent high concurrency.
- Minimum billed time: all warehouse sizes bill a minimum of 60 seconds each time the warehouse starts or resumes. After the first 60 seconds, billing is per-second while the warehouse remains running.
- Result cache: stores query results for 24 hours per unique SQL query hash + parameter values (each reuse resets the 24-hour window, up to a maximum of 31 days). Free — no compute credits consumed. Invalidated if the underlying table data changes (DML). Warehouse resizing does NOT invalidate the result cache — it only clears the local disk cache.
- Local disk cache (data cache): SSD cache on warehouse nodes that stores recently read micro-partitions. Improves performance for repeated scans. Cleared when the warehouse suspends.
- Remote disk cache: Snowflake's internal Foundation Data Cache layer in cloud storage. Faster than reading from original S3/Blob/GCS but slower than local disk cache.
- Clustering information: SYSTEM$CLUSTERING_INFORMATION(table, columns) returns clustering_ratio, clustering_depth. A clustering_ratio near 1.0 means excellent clustering; depth near 1 is ideal.
- Search optimization service: ENABLE SEARCH_OPTIMIZATION on table. Best for equality, IN predicates, ARRAY_CONTAINS, and substring/LIKE searches (when configured with SUBSTRING search method). Adds cost: additional storage + background maintenance compute.
Common Exam Traps
Storage and Data Protection
Covers 14% of the exam. Topics include Time Travel syntax and retention periods, Fail-safe behavior, cross-region and cross-cloud database replication, business continuity groups and failover, zero-copy cloning, and validating data changes with change tracking for rollback scenarios.
Key Topics
Must-Know Concepts
- Time Travel AT syntax: SELECT ... FROM table AT(TIMESTAMP => '2026-01-01 12:00:00'); AT(OFFSET => -3600); AT(STATEMENT => '<query_id>'); BEFORE(STATEMENT => '<query_id>').
- UNDROP TABLE/SCHEMA/DATABASE: recovers objects dropped within the Time Travel retention period. Must be run in the same session context (database/schema) as the dropped object.
- Time Travel retention defaults: Standard edition: 1 day (max 1 day). Enterprise edition: 1 day default, configurable up to 90 days per object. Set with DATA_RETENTION_TIME_IN_DAYS.
- Fail-safe: always 7 days for permanent tables. Begins immediately after Time Travel expires. Not accessible via SQL. Contact Snowflake Support only for catastrophic failure recovery.
- Transient and temporary tables: NO Fail-safe, Time Travel max 1 day. Lower storage cost. Use for staging or ephemeral data that does not need recovery guarantees.
- Zero-copy clone: CREATE TABLE clone_name CLONE source_table [AT(...)]. Clone shares physical micro-partitions with source. Storage cost is near-zero initially. Clone has its own Time Travel retention starting from clone creation time.
- Cloning includes: all micro-partitions, metadata, and object definition. Does NOT include: virtual warehouses, resource monitors, users, roles, shares. Clones are independent objects — grants on source do NOT transfer.
- Replication: ENABLE REPLICATION OF DATABASE to target accounts in other regions/clouds. ALTER DATABASE ... REFRESH to pull latest changes to secondary. Secondary is read-only.
- Failover groups (business continuity): a named group of Snowflake objects (databases, shares, tasks, resource monitors) that are replicated atomically and can be promoted to primary.
- Connection objects: provide a stable endpoint URL for failover. Redirected to the new primary account without requiring client connection string changes.
Common Exam Traps
Data Governance
Covers 14% of the exam. Topics include object tagging and automated data classification, data lineage, column-level security with dynamic data masking policies, row-level security with row access policies, tag-based masking, and Snowflake Data Clean Rooms for privacy-preserving data collaboration.
Key Topics
Must-Know Concepts
- Object tags: key-value pairs attached to databases, schemas, tables, views, columns. CREATE TAG pii_sensitivity; ALTER TABLE t ALTER COLUMN email SET TAG pii_sensitivity = 'high'.
- Automated data classification: SYSTEM$CLASSIFY() function or Snowsight UI auto-scans columns for PII patterns (email, phone, SSN, etc.) and suggests privacy category tags. Results must be reviewed and applied by a data steward.
- Masking policies: CREATE MASKING POLICY mask_email AS (val STRING) RETURNS STRING -> CASE WHEN IS_ROLE_IN_SESSION('ANALYST') THEN '***@***.com' ELSE val END. Applied with ALTER TABLE t MODIFY COLUMN c SET MASKING POLICY mask_email.
- Tag-based masking: assign a masking policy to a tag so that any column tagged with that tag automatically gets the masking policy applied — no per-column policy assignment needed.
- Row access policies: CREATE ROW ACCESS POLICY region_filter AS (region_val STRING) RETURNS BOOLEAN -> region_val = CURRENT_USER() OR IS_ROLE_IN_SESSION('ADMIN'). Applied with ALTER TABLE t ADD ROW ACCESS POLICY region_filter ON (region).
- Policy hierarchies: a table can have at most one row access policy. A column can have at most one masking policy. Tag-based policies can conflict with directly applied policies — direct assignment takes precedence.
- Data lineage: Snowsight's Data Lineage UI shows upstream and downstream object dependencies derived from query history. ACCESS_HISTORY view in ACCOUNT_USAGE shows per-query, per-column access with a lag of 2-3 hours.
- Data Clean Rooms: provider creates an enclave using Snowflake Native App Framework. Consumer queries are restricted to approved analyses within the enclave. Neither party sees the other's raw data — only aggregate results.
- Governance roles: USERADMIN (manage users), SYSADMIN (manage objects), SECURITYADMIN (manage grants), ACCOUNTADMIN (all). Use custom roles following least privilege. Avoid using ACCOUNTADMIN for daily operations.
- Column-level security vs row-level security: masking policies hide column VALUES but all rows are visible. Row access policies filter ROW VISIBILITY. Both can be applied simultaneously to the same table.
Common Exam Traps
Concepts You Must Not Confuse
These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.
Top Mistakes to Avoid
Exam-Ready Checklist
Recommended Resources
Free & Official Resources
Paid Courses & Practice Exams
These are recommended if you prefer a structured learning path. They can save time but are not required to pass.