CertPrepNow
SnowflakeDEA-C02Updated 2026-06-09

DEA-C02 Study Guide

Everything you need to pass the SnowPro Advanced: Data Engineer exam. Structured study plans, key services, common traps, and practice questions.

You Can Pass This Exam For Free

The SnowPro Advanced: Data Engineer exam is passable with free resources if you have 2+ years of hands-on Snowflake production experience and study consistently for 8-10 weeks:

  • Snowflake official documentation at docs.snowflake.com (free, comprehensive coverage of all exam topics)
  • SnowPro Advanced: Data Engineer exam guide on Snowflake's certification page (free, lists all domains and objectives)
  • Snowflake University free learning paths: Data Engineering with Snowflake, Snowpark for Python/Java (free with Snowflake account)
  • Snowflake quickstart tutorials on quickstarts.snowflake.com — hands-on labs for Snowpipe, Dynamic Tables, Snowpark, Cortex (free)
  • Snowflake Developer Guide and SQL Command Reference (free, essential for syntax and feature details)
  • Snowflake blog and release notes for Snowpark, Dynamic Tables, Cortex LLM functions, and Iceberg table updates
  • Snowflake community forums and Snowflake Summit session recordings (free on YouTube)
  • Free trial Snowflake account for hands-on practice with stages, Snowpipe, tasks, streams, and Snowpark
  • Free practice questions on this site

The DEA-C02 is a scenario-based multiple-choice exam requiring deep practical knowledge. An active SnowPro Core certification is a prerequisite. Free official Snowflake documentation and hands-on trial experience cover all exam domains, but budget extra time for Snowpark, Dynamic Tables, and Cortex LLM functions which are newer features heavily represented on the exam.

Choose Your Study Path

You have SnowPro Core and 1-2 years of Snowflake experience but limited depth in Snowpark, Snowpipe Streaming, Dynamic Tables, or Cortex LLM functions. You need to systematically build advanced data engineering skills before tackling the DEA-C02.

Week 1Review the DEA-C02 exam guide and self-assess against all five domains. Refresh SnowPro Core fundamentals: virtual warehouses, micro-partitions, clustering, Time Travel, and the Snowflake architecture. Set up a free Snowflake trial account for hands-on labs throughout your study period.
Week 2Deep dive into Domain 1: Data Movement. Practice bulk loading with COPY INTO using CSV, JSON, Parquet formats. Create internal and external stages. Configure Snowpipe with auto-ingest and SQS notifications. Understand the difference between Snowpipe and COPY INTO for batch vs streaming scenarios.
Week 3Continue Domain 1: set up Tasks with complex DAGs (predecessor tasks, task graphs), Streams for change data capture on tables, and Dynamic Tables. Understand Stream offset management, SYSTEM$STREAM_HAS_DATA(), and when to choose Dynamic Tables versus Streams+Tasks. Practice Kafka and Spark connector configurations.
Week 4Study Domain 2: Data Transformation. Write SQL UDFs (scalar and tabular), JavaScript UDFs, and Python UDFs. Understand external functions with API Gateway. Build stored procedures in JavaScript and Python. Learn how to process semi-structured data (VARIANT, OBJECT, ARRAY) using FLATTEN, LATERAL JOIN, and dot/bracket notation.
Week 5Continue Domain 2: learn Snowpark in depth. Understand the Snowpark DataFrame API for Python and Java. Write Snowpark Python UDFs and UDTFs. Explore Snowflake Cortex LLM functions (COMPLETE, SUMMARIZE, SENTIMENT, TRANSLATE, EXTRACT_ANSWER, EMBED_TEXT). Set up Git integration for deploying code via Snowflake Git repositories.
Week 6Study Domain 3: Performance Optimization. Learn to read query profiles and identify performance bottlenecks (full table scans, spills to disk, cartesian joins). Configure virtual warehouse sizes and multi-cluster warehouses with auto-suspend and auto-resume. Understand when to scale up vs scale out.
Week 7Continue Domain 3: study micro-partition pruning and manual vs automatic clustering keys. Configure materialized views and understand their refresh behavior. Learn search optimization service for point lookups. Understand query acceleration service for ad-hoc large scans. Learn the result cache, local disk cache, and remote disk cache and when each applies.
Week 8Study Domain 4: Storage and Data Protection. Practice Time Travel queries (AT/BEFORE clause) with timestamp and query ID offsets. Understand Fail-safe (7-day immutable, Snowflake-managed). Configure database/schema/table replication for cross-region and cross-cloud. Practice zero-copy cloning and understand clone metadata behavior.
Week 9Study Domain 5: Data Governance. Configure object tagging and automated data classification. Understand data lineage in Snowflake. Set up column-level security with masking policies and row-level security with row access policies. Learn Data Clean Rooms and the difference between dynamic data masking and tokenization. Practice granting and revoking object-level and column-level privileges.
Week 10Take full mock exams targeting 750/1000+. Review all incorrect answers. Focus extra time on Snowpark (heavily tested), Dynamic Tables, and Cortex LLM functions. The exam uses scenario-based questions — practice explaining WHY a solution is correct, not just WHAT it is.

Exam Overview

Format

65 questions in 115 minutes. Question types include multiple-choice, multi-select, and interactive formats (drag-and-drop, matching). Delivered via Pearson VUE, available as online proctored or at a Pearson VUE testing center. Closed-book with no external resources allowed.

Scoring

Scaled scoring from 0-1000. Passing score: 750/1000. No penalty for wrong answers — always answer every question. Score report provided immediately after exam. Prerequisite: active SnowPro Core certification required at time of exam.

Domains & Weights

  • Data Movement26%
  • Data Transformation25%
  • Performance Optimization21%
  • Storage and Data Protection14%
  • Data Governance14%

Registration

$375 USD. Register at home.pearsonvue.com/snowflake. Exam fee is $375 USD. Certification is valid for 2 years. Retake policy: 7-day waiting period between attempts, maximum 4 retakes in any 12-month period at $375 each. Recertification options: DEA-R02 recertification exam ($188), equivalent higher-level certification, or Snowflake instructor-led training. A Pearson VUE account is required.

Topic Priority Table

Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.

Tier 1: Must KnowYou must understand these features deeply, know their syntax and behavior, and apply them in complex scenario-based questions. These appear across multiple questions and multiple domains.
Tier 2: Should KnowUnderstand what these features are, their key characteristics, and how they integrate with the Must Know features. May appear in 2-5 questions each.
Tier 3: Recognize OnlyKnow what these are at a high level and their role in Snowflake's data engineering ecosystem. Rarely more than 1-2 questions each.
Domain 126% of exam

Data Movement

The largest domain at 26%. Covers all aspects of getting data into and out of Snowflake: staging (internal and external), bulk loading with COPY INTO, continuous ingestion with Snowpipe (file-based and Snowpipe Streaming), Tasks and Streams for pipeline orchestration, Dynamic Tables for declarative transformation pipelines, connector-based ingestion (Kafka, Spark), secure data sharing, external and Iceberg tables, and data unloading.

Key Topics

COPY INTOSnowpipeSnowpipe StreamingInternal StagesExternal StagesTasksStreamsDynamic TablesKafka ConnectorSpark ConnectorData SharingExternal TablesIceberg TablesGET/PUT Commands

Must-Know Concepts

  • Internal stages: user stages (@~), table stages (@%table_name), named stages (@stage_name). Named stages are shareable and can have file format, encryption, and compression settings.
  • External stages reference cloud storage (S3, Azure Blob, GCS) with storage integration objects for credential management. File format objects define parsing rules for CSV, JSON, Parquet, Avro, ORC.
  • COPY INTO options: ON_ERROR (CONTINUE, SKIP_FILE, ABORT_STATEMENT), PURGE to remove files after load, FORCE to reload previously loaded files, FILE_FORMAT to specify parsing.
  • Snowpipe: SQS/SNS auto-ingest for S3, Storage Queue for Azure, Pub/Sub for GCS. Uses COPY INTO statement internally. Serverless compute. REST API for manual triggering.
  • Snowpipe Streaming: SDK-based (Java, Python, Go, and REST API) for row-level inserts. Uses channels for ordered, offset-tracked row delivery with exactly-once semantics. Sub-second latency. No staging files required.
  • Stream types: standard (all DML changes: INSERT, UPDATE, DELETE), append-only (INSERT only), insert-only (for external tables). SYSTEM$STREAM_HAS_DATA() checks if stream has unconsumed changes.
  • Task prerequisites: a task can have one or more predecessor tasks using AFTER clause. Root tasks are triggered by a schedule (CRON or minutes interval) or manually. Tasks require a running warehouse or serverless compute.
  • Dynamic Table lag target: DOWNSTREAM (determined by consuming dynamic tables), explicit duration (e.g., LAG = '1 minute'). Refresh mode: AUTO (Snowflake decides), FULL, or INCREMENTAL.
  • Data unloading: COPY INTO @stage SELECT ... with format options. Results can be split into multiple files with MAX_FILE_SIZE. Parallel unloading uses SINGLE=FALSE.
  • Data sharing: CREATE SHARE > GRANT privilege on objects > ADD ACCOUNTS. Consumer creates a database FROM SHARE. No data movement — consumer queries provider's storage.

Common Exam Traps

Snowpipe DOES NOT use virtual warehouse credits — it uses Snowflake-managed serverless compute. COPY INTO DOES use virtual warehouse credits.
A stream's offset advances ONLY when the consuming transaction commits. If a task consuming a stream fails mid-execution, the offset stays at the previous position.
Dynamic Tables with DOWNSTREAM lag propagate target staleness through the pipeline — each downstream table waits for its upstream tables to refresh first.
External tables do NOT cache data — every query reads directly from cloud storage. Performance is slower than internal tables due to lack of micro-partition metadata and caching.
Snowpipe Streaming channels are per-table and per-client. Each channel has an independent offset token for exactly-once delivery. Multiple clients writing to the same table use separate channels. On recovery, call getLatestCommittedOffsetToken to resume without duplicates.
Quick Check: Data Movement

Question 1 of 3

A data engineering team needs to ingest log files from an S3 bucket into Snowflake within 2 minutes of file creation. Files arrive continuously throughout the day. Which approach provides the lowest operational overhead while meeting the latency requirement?

Domain 225% of exam

Data Transformation

The second-largest domain at 25%. Covers all ways to transform data within Snowflake: SQL UDFs (scalar and tabular), JavaScript UDFs, Python UDFs, external functions, stored procedures (JavaScript, Python, Scala, Java), semi-structured and unstructured data processing, Snowflake Cortex LLM functions, Git repository integration, and Snowpark DataFrames for Python/Java/Scala.

Key Topics

SQL UDFsJavaScript UDFsPython UDFsVectorized UDFsUDTFsExternal FunctionsStored ProceduresVARIANTFLATTENCortex FunctionsSnowpark DataFramesGit Repository Integration

Must-Know Concepts

  • UDF types: SQL UDF (portable, no external dependencies), JavaScript UDF (built-in to Snowflake runtime, no imports), Python UDF (supports packages from Snowflake Anaconda channel), Java UDF (compiled JAR).
  • Vectorized (batch) UDFs: process rows as a pandas DataFrame batch rather than row-by-row. Significantly faster for Python UDFs. Defined with the vectorized decorator.
  • UDTFs (Table Functions): return multiple rows per input row. Used to explode or expand data. SQL: RETURN TABLE clause. Python: process() method returns iterator of rows.
  • External functions: HTTPS endpoint integration via API integration object. Called from SQL like UDFs. Good for ML model endpoints, third-party lookups. Latency is significant vs native UDFs.
  • Stored procedures: support JavaScript, Python, Scala, Java, Snowflake Scripting (SQL). Can execute DML and DDL, return values, use exception handling. Execute with EXECUTE AS CALLER or EXECUTE AS OWNER privilege.
  • Semi-structured data: VARIANT column stores JSON, Avro, Parquet. Access fields with dot notation (v:field) or bracket notation (v['field']). PARSE_JSON() converts strings to VARIANT. FLATTEN() explodes arrays into rows with LATERAL.
  • Unstructured data: STAGE files (PDFs, images) accessed via directory tables. BUILD_SCOPED_FILE_URL() and GET_PRESIGNED_URL() for accessing files from UDFs or Cortex.
  • Snowflake Cortex: COMPLETE(model, prompt), SUMMARIZE(text), SENTIMENT(text), TRANSLATE(text, source_lang, target_lang), EXTRACT_ANSWER(question, context), EMBED_TEXT(model, text), CLASSIFY_TEXT(text, categories).
  • Snowpark lazy evaluation: transformations build a logical plan, execution occurs at collect()/show()/write(). Pushdown optimization executes DataFrame operations as Snowflake SQL inside the engine.
  • Git integration: CREATE OR REPLACE GIT REPOSITORY, fetch with SYSTEM$GIT_REPOSITORY_FETCH(), execute scripts with EXECUTE IMMEDIATE FROM @repo/path/file.sql.

Common Exam Traps

Python UDFs using libraries not in the Snowflake Anaconda channel require uploading the package to a stage and referencing it with IMPORTS — they cannot import arbitrary internet packages directly.
External functions are row-based and have network latency for every batch of rows. For high-volume transformations, Snowpark Python UDFs are significantly faster.
EXECUTE AS CALLER stored procedures run with the CALLER's privileges (can access objects the caller has access to). EXECUTE AS OWNER runs with the PROCEDURE OWNER's privileges (used for privilege escalation patterns).
Snowpark pushdown only applies to Snowpark DataFrame API operations. If you call a custom Python function that is NOT part of the Snowpark API inside a DataFrame transformation, that code runs CLIENT-SIDE and data must be transferred.
VARIANT columns store a maximum of 16MB per value. For larger semi-structured documents, splitting or chunking is required before loading.
Quick Check: Data Transformation

Question 1 of 3

A data engineer needs to apply a Python-based natural language processing function to classify 100 million rows of customer comments stored in Snowflake. The function uses pandas for batch processing. Which approach maximizes performance?

Domain 321% of exam

Performance Optimization

The third-largest domain at 21%. Covers query troubleshooting with the query profile, virtual warehouse sizing and multi-cluster configuration, micro-partition pruning and explicit clustering keys, materialized views, search optimization service, query acceleration service, the three-tier cache hierarchy, and credit cost management strategies.

Key Topics

Query ProfileVirtual WarehousesMulti-cluster WarehousesMicro-partitionsClustering KeysAutomatic ClusteringMaterialized ViewsSearch OptimizationQuery AccelerationResult CacheLocal Disk CacheRemote Disk Cache

Must-Know Concepts

  • Query profile nodes: TableScan (partition pruning %), Aggregate, Join, Sort, Filter, ProjectionFilter. High 'partitions scanned vs total' indicates poor pruning.
  • Spill to disk: occurs when a query's intermediate data exceeds the warehouse's memory. Shows as 'Bytes spilled to local/remote storage'. Fix by scaling up the warehouse.
  • Warehouse sizing: X-Small (1 node) → Small (2) → Medium (4) → Large (8) → XL (16) → 2XL (32) → 3XL (64) → 4XL (128) → 5XL (256) → 6XL (512 nodes). Each size doubles credits per hour.
  • Multi-cluster warehouse modes: Auto-scale (add clusters when queries queue, remove when idle) vs Maximized (all clusters always running). Auto-scale for variable concurrency, Maximized for consistent high concurrency.
  • Minimum billed time: all warehouse sizes bill a minimum of 60 seconds each time the warehouse starts or resumes. After the first 60 seconds, billing is per-second while the warehouse remains running.
  • Result cache: stores query results for 24 hours per unique SQL query hash + parameter values (each reuse resets the 24-hour window, up to a maximum of 31 days). Free — no compute credits consumed. Invalidated if the underlying table data changes (DML). Warehouse resizing does NOT invalidate the result cache — it only clears the local disk cache.
  • Local disk cache (data cache): SSD cache on warehouse nodes that stores recently read micro-partitions. Improves performance for repeated scans. Cleared when the warehouse suspends.
  • Remote disk cache: Snowflake's internal Foundation Data Cache layer in cloud storage. Faster than reading from original S3/Blob/GCS but slower than local disk cache.
  • Clustering information: SYSTEM$CLUSTERING_INFORMATION(table, columns) returns clustering_ratio, clustering_depth. A clustering_ratio near 1.0 means excellent clustering; depth near 1 is ideal.
  • Search optimization service: ENABLE SEARCH_OPTIMIZATION on table. Best for equality, IN predicates, ARRAY_CONTAINS, and substring/LIKE searches (when configured with SUBSTRING search method). Adds cost: additional storage + background maintenance compute.

Common Exam Traps

Result cache is shared across users — if user A runs a query and user B runs the IDENTICAL query within 24 hours, B gets the cached result for free. The result cache is invalidated when underlying table data changes (DML), not when the warehouse is resized. Each reuse extends the 24-hour window up to a maximum of 31 days.
Local disk cache is WAREHOUSE-SPECIFIC — it is cleared when the warehouse suspends. If you suspend and resume a warehouse, the next queries may be slower until the cache warms up.
Query acceleration service activates automatically for eligible queries when enabled on the warehouse. It does NOT require per-query configuration. You set QUERY_ACCELERATION_MAX_SCALE_FACTOR to limit cost.
Automatic clustering has an ongoing credit cost — it runs a background service to continuously reorganize micro-partitions. Evaluate whether the query performance improvement justifies the clustering service cost.
Search optimization service and automatic clustering can coexist but serve different scenarios. Apply search optimization for point lookups; use clustering for range scans and common filter patterns on large tables.
Quick Check: Performance Optimization

Question 1 of 3

A data engineer's query on a 10TB table shows in the query profile that only 3% of partitions were pruned (97% were scanned). The WHERE clause filters on a TRANSACTION_DATE column. The table was loaded in random order. What is the most cost-effective fix?

Domain 414% of exam

Storage and Data Protection

Covers 14% of the exam. Topics include Time Travel syntax and retention periods, Fail-safe behavior, cross-region and cross-cloud database replication, business continuity groups and failover, zero-copy cloning, and validating data changes with change tracking for rollback scenarios.

Key Topics

Time TravelFail-safeUNDROPDatabase ReplicationReplication GroupsFailover GroupsConnection ObjectsZero-copy CloningAT / BEFORE Clauses

Must-Know Concepts

  • Time Travel AT syntax: SELECT ... FROM table AT(TIMESTAMP => '2026-01-01 12:00:00'); AT(OFFSET => -3600); AT(STATEMENT => '<query_id>'); BEFORE(STATEMENT => '<query_id>').
  • UNDROP TABLE/SCHEMA/DATABASE: recovers objects dropped within the Time Travel retention period. Must be run in the same session context (database/schema) as the dropped object.
  • Time Travel retention defaults: Standard edition: 1 day (max 1 day). Enterprise edition: 1 day default, configurable up to 90 days per object. Set with DATA_RETENTION_TIME_IN_DAYS.
  • Fail-safe: always 7 days for permanent tables. Begins immediately after Time Travel expires. Not accessible via SQL. Contact Snowflake Support only for catastrophic failure recovery.
  • Transient and temporary tables: NO Fail-safe, Time Travel max 1 day. Lower storage cost. Use for staging or ephemeral data that does not need recovery guarantees.
  • Zero-copy clone: CREATE TABLE clone_name CLONE source_table [AT(...)]. Clone shares physical micro-partitions with source. Storage cost is near-zero initially. Clone has its own Time Travel retention starting from clone creation time.
  • Cloning includes: all micro-partitions, metadata, and object definition. Does NOT include: virtual warehouses, resource monitors, users, roles, shares. Clones are independent objects — grants on source do NOT transfer.
  • Replication: ENABLE REPLICATION OF DATABASE to target accounts in other regions/clouds. ALTER DATABASE ... REFRESH to pull latest changes to secondary. Secondary is read-only.
  • Failover groups (business continuity): a named group of Snowflake objects (databases, shares, tasks, resource monitors) that are replicated atomically and can be promoted to primary.
  • Connection objects: provide a stable endpoint URL for failover. Redirected to the new primary account without requiring client connection string changes.

Common Exam Traps

Transient tables have NO Fail-safe and a maximum Time Travel of 1 day. Temporary tables also have no Fail-safe and exist only for the duration of the session.
A clone created AT a specific timestamp clones the data AS OF that timestamp but starts its OWN Time Travel retention from the time the clone was created, not from the source's history.
Data replication replicates DATA objects (databases, schemas, tables) but NOT compute objects (virtual warehouses, resource monitors). Warehouses must be recreated manually on the secondary.
Reducing DATA_RETENTION_TIME_IN_DAYS to 0 eliminates Time Travel but does NOT eliminate Fail-safe — Fail-safe always runs its 7 days for permanent tables.
Database replication is one-directional: primary to secondary. Only the secondary can be promoted to primary (failover). Multiple secondaries are possible for read scaling or regional distribution.
Quick Check: Storage and Data Protection

Question 1 of 3

A data engineer accidentally runs DELETE FROM ORDERS WHERE ORDER_DATE < '2026-01-01', removing 2 million historical records. The table has a 7-day Time Travel retention period. The query ID is available. How can the deleted rows be recovered?

Domain 514% of exam

Data Governance

Covers 14% of the exam. Topics include object tagging and automated data classification, data lineage, column-level security with dynamic data masking policies, row-level security with row access policies, tag-based masking, and Snowflake Data Clean Rooms for privacy-preserving data collaboration.

Key Topics

Object TagsData ClassificationMasking PoliciesRow Access PoliciesTag-based MaskingData LineageColumn-level SecurityRow-level SecurityData Clean RoomsAccess History

Must-Know Concepts

  • Object tags: key-value pairs attached to databases, schemas, tables, views, columns. CREATE TAG pii_sensitivity; ALTER TABLE t ALTER COLUMN email SET TAG pii_sensitivity = 'high'.
  • Automated data classification: SYSTEM$CLASSIFY() function or Snowsight UI auto-scans columns for PII patterns (email, phone, SSN, etc.) and suggests privacy category tags. Results must be reviewed and applied by a data steward.
  • Masking policies: CREATE MASKING POLICY mask_email AS (val STRING) RETURNS STRING -> CASE WHEN IS_ROLE_IN_SESSION('ANALYST') THEN '***@***.com' ELSE val END. Applied with ALTER TABLE t MODIFY COLUMN c SET MASKING POLICY mask_email.
  • Tag-based masking: assign a masking policy to a tag so that any column tagged with that tag automatically gets the masking policy applied — no per-column policy assignment needed.
  • Row access policies: CREATE ROW ACCESS POLICY region_filter AS (region_val STRING) RETURNS BOOLEAN -> region_val = CURRENT_USER() OR IS_ROLE_IN_SESSION('ADMIN'). Applied with ALTER TABLE t ADD ROW ACCESS POLICY region_filter ON (region).
  • Policy hierarchies: a table can have at most one row access policy. A column can have at most one masking policy. Tag-based policies can conflict with directly applied policies — direct assignment takes precedence.
  • Data lineage: Snowsight's Data Lineage UI shows upstream and downstream object dependencies derived from query history. ACCESS_HISTORY view in ACCOUNT_USAGE shows per-query, per-column access with a lag of 2-3 hours.
  • Data Clean Rooms: provider creates an enclave using Snowflake Native App Framework. Consumer queries are restricted to approved analyses within the enclave. Neither party sees the other's raw data — only aggregate results.
  • Governance roles: USERADMIN (manage users), SYSADMIN (manage objects), SECURITYADMIN (manage grants), ACCOUNTADMIN (all). Use custom roles following least privilege. Avoid using ACCOUNTADMIN for daily operations.
  • Column-level security vs row-level security: masking policies hide column VALUES but all rows are visible. Row access policies filter ROW VISIBILITY. Both can be applied simultaneously to the same table.

Common Exam Traps

Automated data classification SUGGESTS tags — it does not automatically apply masking policies. A governance workflow must review suggestions and apply both the tag AND any associated policies.
Masking policies are evaluated at query time, not storage time. The underlying data is stored unmasked. Result cache does NOT share results across roles that see different masked values — masking bypasses the result cache for users with different masking outcomes.
Row access policies evaluated to FALSE return NO rows — they do not throw an error. A misconfigured row access policy that always returns FALSE will silently return zero rows to all users, which is easy to miss.
Data Clean Rooms do NOT share raw data — only aggregate query results leave the enclave. The provider controls which analyses are allowed. This is distinct from Secure Data Sharing where consumers can see row-level data.
ACCESS_HISTORY records column-level access but has a latency of 2-3 hours. Queries do NOT show up in ACCESS_HISTORY immediately after execution.
Quick Check: Data Governance

Question 1 of 3

A healthcare company stores patient data in Snowflake. Business analysts should see patient names and emails as masked (first initial only, domain masked). Data stewards with the STEWARD role should see the full values. A tag-based governance approach is required so new PII columns are automatically governed. Which sequence of steps is correct?

Concepts You Must Not Confuse

These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.

COPY INTO (Batch Loading) vs Snowpipe (Continuous Ingestion)

Use COPY INTO (Batch Loading) when…

Loads all staged files in a single batch operation using a virtual warehouse. Requires explicit execution (manual, scheduled task, or Airflow). Good for large periodic bulk loads where latency of minutes to hours is acceptable.

Use Snowpipe (Continuous Ingestion) when…

Continuously loads files as they arrive in a stage using event notifications. Serverless compute (no virtual warehouse required). Good for near-real-time data ingestion where data freshness within minutes matters.

Exam trap

COPY INTO uses virtual warehouse credits. Snowpipe uses serverless compute credits — it is NOT free. Classic file-based Snowpipe provides at-least-once delivery, not exactly-once. For sub-second latency row-level streaming, neither is appropriate — use Snowpipe Streaming (SDK-based, which provides exactly-once delivery via offset tokens) instead.

Snowpipe (File-based) vs Snowpipe Streaming (SDK-based)

Use Snowpipe (File-based) when…

Triggers on file arrival in a stage. Uses SQS/Event Grid/Pub-Sub notifications. Ingests files as atomic units. Latency: typically 1-5 minutes. Good for micro-batched file delivery from producers that write files.

Use Snowpipe Streaming (SDK-based) when…

SDK-based row-level streaming API. Inserts rows directly into Snowflake channels without staging files. Latency: sub-second to seconds. Exactly-once delivery via offset token tracking. Good for real-time CDC, IoT, and event streaming directly from applications.

Exam trap

Snowpipe Streaming uses CHANNELS, not stages. Each channel is an ordered stream of rows into a table. Channels provide offset token tracking enabling exactly-once delivery (unlike classic Snowpipe's at-least-once). Snowpipe Streaming does NOT use SQS notifications — it is SDK-driven from the application.

Streams + Tasks vs Dynamic Tables

Use Streams + Tasks when…

Streams capture CDC changes on source tables. Tasks execute SQL/stored procedures on a schedule to consume stream data. Flexible, supports complex multi-step logic, custom error handling, and non-SQL transformations via stored procedures.

Use Dynamic Tables when…

Declarative SQL-defined materialized tables with automatic incremental refresh. Simpler to define and maintain. Lag target controls maximum staleness. Snowflake manages refresh scheduling automatically without task scheduling logic.

Exam trap

Dynamic Tables are preferred when the transformation is purely SQL and you want declarative pipeline definition without scheduling complexity. Streams+Tasks are preferred when you need procedural logic, custom error handling, multi-target writes, or non-SQL transformations (e.g., calling external APIs, Snowpark code).

Scale Up (Larger Warehouse) vs Scale Out (Multi-cluster Warehouse)

Use Scale Up (Larger Warehouse) when…

Run a larger warehouse size (e.g., XL → 2XL). Increases compute per query — more memory, more parallelism within a single query. Helps with complex joins, large sorts, and queries that spill to disk.

Use Scale Out (Multi-cluster Warehouse) when…

Configure a multi-cluster warehouse with auto-scale. When queries queue because all clusters are busy, new clusters start automatically. Reduces query queuing under concurrent user load.

Exam trap

Scale up helps a SINGLE COMPLEX QUERY run faster. Scale out helps MANY CONCURRENT QUERIES avoid queuing. If users are waiting because a single query runs too slowly, scale up. If many users are waiting because all compute slots are occupied, scale out.

Search Optimization Service vs Query Acceleration Service

Use Search Optimization Service when…

Improves performance for highly selective point lookups: equality predicates (col = value), IN predicates, substring searches (LIKE with leading wildcard). Creates a persistent search access path maintained in the background.

Use Query Acceleration Service when…

Reduces wall-clock time for large ad-hoc analytical queries that scan many partitions. Offloads portions of eligible queries to additional serverless compute nodes. Best for unpredictable, large-scan analytics.

Exam trap

Search optimization and query acceleration target opposite query patterns. Search optimization: few matching rows (high selectivity). Query acceleration: many rows scanned (low selectivity, full or near-full table scans). Both have additional credit costs and must be enabled explicitly.

Time Travel vs Fail-safe

Use Time Travel when…

User-accessible historical data recovery. Query data as it existed at a past timestamp or before a specific DML statement. Restore dropped objects with UNDROP. Configurable retention: 0-1 day (Standard) or 0-90 days (Enterprise). Accessible via SQL (AT/BEFORE clauses).

Use Fail-safe when…

Snowflake-managed disaster recovery layer. 7 days immutable, always. Not accessible via SQL — only Snowflake Support can recover data. Activated only for catastrophic failures (Snowflake infrastructure issues). No user configuration.

Exam trap

Time Travel and Fail-safe are sequential, not overlapping. Data is in Time Travel for up to 90 days. After Time Travel expires, data enters Fail-safe for 7 days. After Fail-safe expires, data is permanently deleted. Reducing Time Travel retention to 0 skips Time Travel but Fail-safe always runs its 7 days.

Dynamic Data Masking vs Row Access Policies

Use Dynamic Data Masking when…

Applies to COLUMNS. Replaces the actual column value with a masked alternative (NULL, hash, partial mask, format-preserving mask) based on the querying role. All rows are visible but sensitive column values are hidden from unauthorized roles.

Use Row Access Policies when…

Applies to ROWS. Filters which rows are returned based on the querying role or user. A sales rep sees only their own region's data; a manager sees all regions. All columns are visible but row count may vary by role.

Exam trap

Masking policies and row access policies can be combined on the same table — you can hide sensitive column values AND restrict which rows are visible simultaneously. Both policies are applied at query time, not at storage time. Column-level masking does not prevent COUNT(*) from revealing how many rows exist.

Materialized Views vs Dynamic Tables

Use Materialized Views when…

Precomputed query result stored as a table. Automatically refreshed when source data changes using Snowflake-managed serverless compute. Long-standing feature, good stability. Restrictions: no UDFs, no non-deterministic functions, no CTEs, no LATERAL joins.

Use Dynamic Tables when…

Declarative incremental materialized table driven by a SQL query with a lag target. Newer feature with fewer restrictions, supports pipeline chaining, and incremental refresh for most SQL constructs. Preferred for new development.

Exam trap

Materialized views refresh automatically on source data change. Dynamic Tables refresh based on a lag target (how stale is acceptable). For new complex pipelines requiring UDFs or CTEs, Dynamic Tables are the correct choice. Materialized views are simpler but have significant SQL restrictions.

Top Mistakes to Avoid

Confusing COPY INTO (uses virtual warehouse credits, batch) with Snowpipe (serverless credits, continuous) — classic Snowpipe is not free and provides at-least-once delivery. Snowpipe Streaming provides exactly-once delivery via offset tokens — do not confuse the two
Mixing up scale up (larger warehouse = better single-query performance) with scale out (multi-cluster = better concurrency) — choose based on whether the problem is query complexity or user concurrency
Thinking Dynamic Tables use a fixed refresh schedule like Tasks — Dynamic Tables use a lag TARGET and Snowflake decides when to refresh to meet that staleness threshold
Believing stream offsets advance on task execution — stream offsets only advance when the consuming transaction COMMITS, so failed tasks leave the offset unchanged
Confusing search optimization (equality/IN point lookups) with query acceleration (large ad-hoc scans) — they target opposite selectivity patterns and can coexist on the same table
Assuming result cache is warehouse-specific — the result cache is ACCOUNT-level and shared across users, invalidated only by table DML (not by warehouse resize or suspend). Local disk cache is warehouse-specific and is cleared on warehouse suspend or resize
Forgetting that Time Travel and Fail-safe are sequential, not overlapping — Time Travel expires first (0-90 days configurable), then Fail-safe runs its fixed 7 days
Treating Snowpark as client-side Python that connects to Snowflake — Snowpark runs inside Snowflake via pushdown optimization. Only calling non-Snowpark Python functions breaks pushdown and runs client-side
Applying a masking policy and expecting the result cache to serve different masked values to different roles — masking bypasses result cache sharing across roles with different masking outcomes
Confusing automated data classification (suggests tags only) with automated governance (there is no automatic policy application — a data steward must apply the actual policies)
Cloning a database and assuming grants are inherited — zero-copy clones do NOT inherit privileges from the source object; grants must be re-applied on the clone
Misapplying row access policies that use CURRENT_USER() when CURRENT_ROLE() or a mapping table was intended, silently returning zero rows to all non-admin users

Exam-Ready Checklist

Can explain all 5 exam domains and their weights: 26%, 25%, 21%, 14%, 14%
Know when to use COPY INTO (batch, virtual warehouse) versus Snowpipe (continuous, serverless) versus Snowpipe Streaming (SDK, row-level, sub-second latency)
Understand stream types (standard, append-only, insert-only), stream offset behavior on COMMIT, and SYSTEM$STREAM_HAS_DATA() usage in task conditions
Can explain Dynamic Table lag targets, refresh modes (full vs incremental), and when Dynamic Tables are preferred over Streams+Tasks pipelines
Know all Snowflake Cortex LLM functions: COMPLETE, SUMMARIZE, SENTIMENT, TRANSLATE, EXTRACT_ANSWER, EMBED_TEXT, CLASSIFY_TEXT and which use case each addresses
Understand Snowpark lazy evaluation, pushdown optimization, when pushdown fails, vectorized UDFs, and the difference between EXECUTE AS OWNER and EXECUTE AS CALLER for stored procedures
Can explain scale up (larger warehouse, better single-query performance) vs scale out (multi-cluster, better concurrency) and when to apply each
Know all three cache tiers: result cache (24 hours, account-level, free, invalidated by DML not by warehouse changes, reusable up to 31 days), local disk cache (warehouse SSD, cleared on suspend/resize), remote disk cache (Snowflake Foundation layer)
Understand Time Travel AT/BEFORE syntax, retention period limits by edition, when to use UNDROP vs Time Travel queries, and the sequential Time Travel → Fail-safe recovery timeline
Know the difference between masking policies (hide column values, role-based) and row access policies (filter rows, role/user-based), and how tag-based masking enables scalable governance
Can explain zero-copy cloning storage model (shared micro-partitions, copy-on-write on modification) and that clones do NOT inherit source object grants
Understand Data Clean Rooms: privacy-preserving joint analysis where neither party sees the other's raw data — distinct from Secure Data Sharing
Know search optimization (equality/IN, point lookups) vs query acceleration (large scans, ad-hoc analytics) and their different use cases
Scored 750/1000+ on at least two full-length mock exams. Aim for 800+ in practice for a comfortable margin on exam day

Recommended Resources

Free & Official Resources

SnowPro Advanced: Data Engineer Exam Study Guide

Official Snowflake certification page with the exam guide, study materials, and candidate handbook for DEA-C02.

Official

Snowflake Documentation

Comprehensive reference for all Snowflake features. Essential reading for Snowpipe, Dynamic Tables, Snowpark, Cortex functions, masking policies, and replication.

Free

Snowflake Quickstart Tutorials

Free hands-on tutorials for Snowpipe, Snowpark Python, Dynamic Tables, Cortex LLM functions, data governance, and more. Strongly recommended for hands-on practice.

Free

Snowflake University — Data Engineering Learning Path

Free self-paced courses from Snowflake covering data loading, transformation, Snowpark, and advanced features. Requires a Snowflake account login.

Free

Snowflake Developer Guide

Deep technical documentation for Snowpark, UDFs, stored procedures, external functions, and Git repository integration.

Free

Dynamic Tables Documentation

Complete guide to Dynamic Tables including lag targets, refresh modes, pipeline design, and comparison with Streams+Tasks.

Free

Snowflake Cortex AI Documentation

Reference for all Cortex LLM functions with syntax, supported models, and usage examples.

Free

Snowflake Blog — Engineering Category

Technical blog posts on new features like Dynamic Tables, Snowpipe Streaming, Iceberg Tables, and Cortex. Good for staying current on exam-relevant features.

Free

Free DEA-C02 Practice Questions

Free practice questions on this site covering all DEA-C02 exam domains with detailed explanations.

Free

Paid Courses & Practice Exams

These are recommended if you prefer a structured learning path. They can save time but are not required to pass.

Frequently Asked Questions