General Exam Tips
- 1.Read ALL answer options before selecting — many questions have a 'most correct' answer, not just a correct one. Snowflake scenarios often have two plausible options; the distinction is usually cost efficiency or operational overhead.
- 2.Time budget: 65 questions in 115 minutes = ~1.75 min/question. Flag complex scenario questions and return to them. Never spend more than 3 minutes on one question on the first pass.
- 3.No penalty for wrong answers — always answer every question, even guesses. Leave nothing blank.
- 4.Scenario questions describe a business constraint (latency, cost, simplicity). Map the constraint to the correct Snowflake feature before reading the options. This prevents the options from anchoring you to wrong choices.
- 5.When two answers both seem right, ask: which is lower operational overhead? Which is more cost-efficient? The exam prefers native Snowflake features over external solutions when Snowflake has a built-in equivalent.
- 6.Multi-select questions (select two or three) are harder — partial credit is not given. If unsure, use elimination to narrow to the most defensible combination.
- 7.DEA-C02 was updated in February 2025. If you studied for DEA-C01, note that Domain 4 is now 'Data Governance' (not 'Security') and Iceberg tables, unstructured data ingestion, and Cortex LLM functions received increased coverage.
- 8.About 30% of questions test SQL syntax directly. The rest test architectural judgment about which Snowflake pattern fits a scenario. Practice both; do not over-index on memorizing syntax alone.
- 9.When a question describes spill-to-disk in the query profile, the answer is always 'scale up the warehouse'. When a question describes query queuing, the answer is always 'multi-cluster warehouse (scale out)'. These patterns are reliable.
- 10.SnowPro Core is a prerequisite. If Core concepts feel shaky, review micro-partitions, warehouses, roles, and basic SQL before going deep on Advanced topics.
Quick Navigation
Data Movement
Must-Know Facts
- COPY INTO uses VIRTUAL WAREHOUSE credits (your compute). Snowpipe uses SERVERLESS compute credits (Snowflake-managed, separate billing). This cost distinction appears on the exam explicitly.
- Snowpipe delivery semantics: AT-LEAST-ONCE. Duplicate rows are possible if a file notification is replayed. Downstream deduplication is required for exactly-once guarantees.
- Snowpipe Streaming (SDK-based) delivers EXACTLY-ONCE via offset token tracking per channel. It requires NO staging files and achieves sub-second latency. Classic Snowpipe and Snowpipe Streaming have fundamentally different delivery contracts.
- Snowpipe Streaming uses CHANNELS (not stages, not SQS). Each channel is an ordered sequence of rows into a table. On recovery, call getLatestCommittedOffsetToken() to resume without duplicates.
- Stream offset advances ONLY on a successful COMMIT of the consuming transaction. A task that starts but fails (without committing) does NOT advance the stream offset. The unchanged offset means the next run will see the same data.
- SYSTEM$STREAM_HAS_DATA('stream_name') returns TRUE if there are unconsumed changes. Used in task WHEN clauses to avoid wasted compute running tasks against empty streams.
- Stream types: standard (INSERT + UPDATE + DELETE), append-only (INSERT only, more efficient for insert-only pipelines), insert-only (for external tables only — no DML tracking, just new files).
- Dynamic Table lag target is a MAXIMUM STALENESS threshold, not a fixed schedule. Snowflake chooses when to refresh. DOWNSTREAM lag propagates staleness from consuming dynamic tables up the pipeline.
- Dynamic Table refresh modes: INCREMENTAL (default for supported SQL patterns), FULL (forced when SQL cannot support incremental refresh, e.g., certain window functions or multiple GROUP BYs). AUTO lets Snowflake decide.
- External stages use STORAGE INTEGRATION objects for credential management — not hardcoded access keys. Storage integrations require configuration on both the Snowflake side and the cloud provider IAM side.
- Kafka Connector for Snowflake can use either classic Snowpipe (file-based, at-least-once) or Snowpipe Streaming mode (row-level, exactly-once). The streaming mode is preferred for lower latency.
- Data unloading COPY INTO exports to a stage with FILE_FORMAT, MAX_FILE_SIZE, HEADER, and SINGLE options. SINGLE=FALSE (default) writes multiple parallel files; SINGLE=TRUE writes one file.
- Snowflake Secure Data Sharing: no data is copied or moved. Consumers query the provider's actual storage with their own warehouse. Changes to the source are immediately visible to consumers.
Common Traps
Confusing Pairs
Scenario Tips
Need to ingest files from S3 within 5 minutes of arrival with lowest operational overhead
Snowpipe with AUTO_INGEST = TRUE and S3 event notifications to SQS. It triggers automatically on file arrival, uses serverless compute, and requires no scheduling code.
A scheduled Task running COPY INTO every 5 minutes — this requires a virtual warehouse to be running and may queue, missing the SLA during periods of warehouse contention.
Ingest CDC events from a Java application at sub-second latency with exactly-once delivery
Snowpipe Streaming SDK with channels and offset token tracking. It bypasses file staging and SQS entirely, achieving sub-second latency with exactly-once semantics.
Classic Snowpipe — wrong because classic Snowpipe requires staging files and provides only at-least-once delivery. The scenario specifies exactly-once, which only Snowpipe Streaming provides.
Build a daily aggregation pipeline that processes only new orders added since the last run
Append-only stream on the ORDERS table + a Task consuming the stream with SYSTEM$STREAM_HAS_DATA() in the WHEN clause. Append-only stream is more efficient than standard stream for insert-only tables.
Standard stream — works but is heavier overhead when only INSERTs need to be captured. If the question specifies insert-only pattern and asks for efficiency, append-only is preferred.
A company wants to share live product catalog data with 5 partners. Partners must see updates within minutes and cannot copy the data.
Snowflake Secure Data Sharing via CREATE SHARE. Consumer queries the provider's actual storage in real time. No data movement occurs. Share secure views (not base tables) to prevent CTAS copying.
Database replication — creates actual copies that partners could then read from (and potentially extract). Replication also lags by the refresh interval, not real-time.
Last-Minute Facts
Data Transformation
Must-Know Facts
- Snowpark uses LAZY EVALUATION — DataFrame transformations build a logical plan and execute nothing until an action is called (collect(), show(), count(), write()). This is the same model as Spark.
- Pushdown optimization: Snowpark DataFrame operations (filter, select, join, groupBy, agg) compile to Snowflake SQL and execute INSIDE Snowflake's engine. The compute stays in Snowflake, data does not move to the client.
- Pushdown BREAKS when you call a plain Python function (not part of the Snowpark API) directly inside a DataFrame transformation. That function runs client-side and Snowflake must transfer rows to the client.
- Vectorized (batch) Python UDFs: use a pandas Series as input/output. Process multiple rows per function call instead of row-by-row. Significantly faster for large-volume Python transformations. Defined with @vectorized decorator or 'compute' method in a class.
- EXECUTE AS OWNER stored procedures: run with the PROCEDURE OWNER's privileges, not the caller's. Enables controlled privilege escalation — a caller without direct table access can run a procedure that touches privileged tables. EXECUTE AS CALLER runs with the caller's own privileges (caller needs direct access).
- Python UDF packages: only packages available in the Snowflake Anaconda channel can be referenced via PACKAGES = (...). Arbitrary pip packages require uploading to a stage and referencing via IMPORTS.
- FLATTEN exam judgment: use FLATTEN (LATERAL) to explode ARRAY elements into separate rows. Use dot/bracket notation (v:field::TYPE) to read fields within an object without adding rows. A common trap is applying FLATTEN to a VARIANT object expecting field access — FLATTEN operates on arrays, not objects. If the question asks about reading nested object fields, dot notation is the answer; if it asks about one-row-per-element from an array, FLATTEN is the answer.
- VARIANT column maximum value size is 16 MB. Documents exceeding 16 MB cannot be stored in a single VARIANT cell and must be chunked or split before loading.
- External functions: HTTPS endpoint integration requiring an API integration object. Called row-by-row (Snowflake batches rows, but network round-trips add significant latency). Use for ML model endpoints or third-party lookups where data volume is moderate.
- Cortex SENTIMENT returns a FLOAT between -1.0 (most negative) and 1.0 (most positive) — not a string label like 'positive'. EMBED_TEXT returns a VECTOR type, not an ARRAY. These syntax details appear as distractors.
- Git repository integration: CREATE OR REPLACE GIT REPOSITORY ... ORIGIN = '<repo_url>'. Changes in the remote repo are NOT automatically pulled — a SYSTEM$GIT_REPOSITORY_FETCH() call or ALTER REPOSITORY FETCH is required.
Common Traps
Confusing Pairs
Scenario Tips
Apply a Python NLP classification function to 500 million rows using a pandas-compatible library
Vectorized Snowpark Python UDF. It processes rows as pandas Series batches inside Snowflake's engine — no data movement, no per-row function call overhead.
External function calling an AWS Lambda — adds network latency for every row batch and requires maintaining Lambda infrastructure outside Snowflake. Only appropriate if the logic cannot run inside Snowflake.
A developer without direct access to the PII_CUSTOMERS table needs to run a GDPR deletion procedure on it
Store procedure with EXECUTE AS OWNER where the owner has privilege on PII_CUSTOMERS. The developer calls the procedure; it runs with the owner's privileges. The developer never gets direct table access.
EXECUTE AS CALLER procedure — this would fail because the developer does not have direct access to PII_CUSTOMERS. CALLER context means the caller's privileges are used.
Summarize 10,000 customer support tickets and then classify their sentiment in SQL
SELECT SNOWFLAKE.CORTEX.SENTIMENT(SNOWFLAKE.CORTEX.SUMMARIZE(ticket_text)) — chain SUMMARIZE (condense long text) then SENTIMENT (return float score -1.0 to 1.0).
EMBED_TEXT then CLASSIFY_TEXT — EMBED_TEXT produces vectors for semantic similarity search, not for sentiment. CLASSIFY_TEXT categorizes into user-defined buckets, not sentiment floats.
A team wants to version-control Snowpark Python scripts in GitHub and deploy directly into Snowflake
Snowflake Git Repository integration: CREATE OR REPLACE GIT REPOSITORY linking the GitHub repo, then execute scripts with EXECUTE IMMEDIATE FROM @repo/path/script.sql. Requires a FETCH to pull latest changes.
Uploading scripts to an internal stage manually — works but has no automatic version-control linkage and requires manual uploads for every change.
Last-Minute Facts
Performance Optimization
Must-Know Facts
- Result cache is ACCOUNT-LEVEL (shared across all users), free (zero compute credits), and valid for 24 hours per unique query hash. Each reuse resets the 24-hour window, up to a maximum of 31 days total.
- Result cache is invalidated ONLY by DML on source tables (INSERT/UPDATE/DELETE/MERGE) or 24-hour expiry. Warehouse resize does NOT invalidate result cache. Warehouse suspend does NOT invalidate result cache.
- Local disk cache (data cache) is WAREHOUSE-SPECIFIC SSD cache for recently scanned micro-partitions. Cleared when the warehouse SUSPENDS or is RESIZED. Warming up after suspend/resize takes 1-2 queries.
- Cache exam judgment: the result cache survives warehouse suspend/resize; local disk cache does NOT (cleared on both). When a question asks 'what is cleared when the warehouse is suspended?' the answer is local disk cache, NOT result cache. The two user-visible cache types are result cache (account-level, Cloud Services layer) and local disk/data cache (warehouse SSD). A deeper FDN/remote data cache layer exists internally but is not directly testable.
- Scale UP (larger warehouse size) = more memory and CPU per node = faster complex queries, fixes spill to disk, helps large sorts and joins. Scale OUT (multi-cluster warehouse) = more concurrent clusters = fewer queuing users. These solve different problems.
- Spill to local storage: warehouse memory exceeded by intermediate data. Fix: scale up the warehouse. Spill to remote storage: local SSD also exhausted. Fix: scale up more, or optimize the query (avoid cartesian products).
- Query profile key metrics: Partitions Scanned vs Total Partitions (pruning ratio), Bytes Spilled to Local Storage, Bytes Spilled to Remote Storage, join type (nested loop = no join key), row estimate accuracy.
- average_depth near 1 = IDEAL (micro-partitions barely overlap). High average_depth = POOR clustering. average_overlaps: lower = better. The deprecated SYSTEM$CLUSTERING_RATIO returned 0–100 (higher = better); SYSTEM$CLUSTERING_INFORMATION uses average_depth and average_overlaps (both: lower = better). These directions are counterintuitive.
- Search optimization service: creates a persistent search access path for equality predicates (col = val), IN predicates, ARRAY_CONTAINS, and LIKE with leading wildcards (SUBSTRING method). Best for point lookups returning very few rows.
- Query acceleration service: offloads portions of LARGE SCAN queries to additional serverless compute. Best for ad-hoc analytics queries that scan many partitions. Controlled by QUERY_ACCELERATION_MAX_SCALE_FACTOR.
- Automatic clustering has an ongoing credit cost — the background clustering service continuously reorganizes micro-partitions. Must weigh query performance improvement against clustering maintenance credits.
Common Traps
Confusing Pairs
Scenario Tips
Query profile shows 97% of partitions scanned on a 10TB table with a WHERE filter on ORDER_DATE
Define a clustering key on ORDER_DATE. Poor pruning (97% scanned) for a date filter means micro-partitions are not organized by date. Clustering reorganizes data so date-range queries skip irrelevant partitions.
Enable search optimization — search optimization helps with equality predicates (ORDER_DATE = '2026-01-01'), NOT range scans (ORDER_DATE BETWEEN ...). Date range filters benefit from clustering, not search optimization.
200 concurrent analysts see 5+ minute queue times on an XL warehouse. Individual queries are fast when they eventually run.
Enable multi-cluster warehouse (scale out) with auto-scale. The bottleneck is concurrency (too many queries queuing), not single-query performance. Adding clusters reduces queue depth.
Scale up to 2XL — this makes each query slightly faster but does nothing to reduce the number of concurrent queries waiting in the queue.
The same complex aggregation query on SALES runs 50 times per day by different analysts. The underlying data changes only once per day at 2 AM.
Result cache serves repeated identical queries for free. Since the SALES table only changes at 2 AM, the result cache remains valid for the entire business day after the data load. Zero compute credits for the repeated executions.
Create a materialized view — also correct but adds storage and maintenance overhead. Result cache is free and requires no setup, making it the more cost-efficient answer when the exam asks about eliminating redundant compute cost.
A customer lookup query WHERE email = 'user@example.com' scans the entire 5TB CUSTOMERS table on every execution
Enable search optimization with EQUALITY(email). This creates a persistent access path for equality predicates on the email column, allowing the lookup to skip irrelevant partitions without reading the full table.
Define a clustering key on email — clustering on email (high cardinality) is a poor choice. Each customer has a unique email, so micro-partitions still have overlapping ranges after clustering. Search optimization handles high-cardinality equality lookups more efficiently.
Last-Minute Facts
Storage and Data Protection
Must-Know Facts
- Time Travel retention: Standard edition = 0 to 1 day max. Enterprise edition = 0 to 90 days, default is 1 day per object. Set with DATA_RETENTION_TIME_IN_DAYS.
- Fail-safe: ALWAYS 7 days for permanent tables. Fixed, non-configurable. Begins immediately after Time Travel expires. NOT accessible by users via SQL — only Snowflake Support can recover data from Fail-safe.
- Time Travel and Fail-safe are SEQUENTIAL, not overlapping. Data is in Time Travel (user-accessible) for up to 90 days, then Fail-safe (Snowflake-managed, not user-accessible) for 7 days, then permanently deleted.
- Reducing DATA_RETENTION_TIME_IN_DAYS to 0 disables Time Travel for an object. Fail-safe still runs its 7 days for permanent tables regardless.
- Transient tables: NO Fail-safe, maximum 1 day Time Travel. Temporary tables: NO Fail-safe, max 1 day Time Travel, exist only for the session. Use transient/temporary for staging data that does not need recovery.
- Time Travel AT/BEFORE syntax: AT(TIMESTAMP => ...), AT(OFFSET => -3600), AT(STATEMENT => '<qid>'), BEFORE(STATEMENT => '<qid>'). OFFSET is in SECONDS and must be negative.
- UNDROP restores DROPPED objects (tables, schemas, databases) within their Time Travel period. UNDROP does NOT work for DML changes (INSERT/UPDATE/DELETE) — use AT/BEFORE clauses and INSERT...SELECT for row recovery.
- Zero-copy clone: shares physical micro-partitions via metadata pointers. Near-zero initial storage cost. Storage increases only as clone and source diverge (copy-on-write on modification). Clone does NOT inherit grants from the source.
- Cloning a database does NOT include virtual warehouses, resource monitors, users, or roles — only data objects. Grants are NOT transferred to the clone.
- Database replication: secondary is READ-ONLY. Changes must be pulled manually with ALTER DATABASE ... REFRESH. Replication does NOT auto-sync continuously unless a replication group with scheduled refresh is configured.
- Failover groups (business continuity): atomically replicate multiple objects (databases, shares, tasks, resource monitors). Connection objects provide a stable endpoint URL that can be redirected during failover without client config changes.
Common Traps
Confusing Pairs
Scenario Tips
A developer accidentally ran DELETE FROM ORDERS WHERE ORDER_DATE < '2026-01-01'. The query ID is available. The table has 7-day retention.
INSERT INTO ORDERS SELECT * FROM ORDERS BEFORE(STATEMENT => '<delete_query_id>') EXCEPT SELECT * FROM ORDERS; — uses Time Travel BEFORE to view data before the delete, then inserts only the missing rows.
UNDROP TABLE ORDERS — wrong because ORDERS was not dropped, it just had DML executed against it. UNDROP only works for DDL drops.
Clone the PRODUCTION database for a QA environment. The QA team needs to run destructive tests without affecting production.
CREATE DATABASE qa_clone CLONE production; — instant, near-zero storage cost at creation. QA modifications write new micro-partitions (copy-on-write) without touching production. Remember: clone does NOT inherit production grants — re-apply privileges to QA team on the clone.
Replicate the database to the QA account — replication creates a read-only secondary that cannot be written to. QA needs a writable environment, so cloning is correct.
A company needs disaster recovery for their Snowflake account with sub-hour RPO and automatic client failover without reconfiguring connection strings
Configure a Failover Group (business continuity group) with scheduled refresh, and use a Connection Object as the client endpoint. On failover, the connection object is redirected to the new primary without requiring client changes.
Database replication alone — handles data replication but requires manual REFRESH, manual failover promotion, and client reconfiguration after failover. Failover Groups with connection objects are the complete DR solution.
Last-Minute Facts
Data Governance
Must-Know Facts
- Masking policies are evaluated at QUERY TIME, not storage time. Data is stored unmasked in Snowflake storage. The mask is applied when results are returned to the querying user based on their role.
- Tag-based masking: assign a masking policy to a TAG (not directly to each column). Any column tagged with that tag automatically receives the masking policy. New columns tagged in the future also auto-receive the policy without manual reassignment.
- Automated data classification (SYSTEM$CLASSIFY) SUGGESTS tags — it does NOT automatically apply masking policies. A data steward must review the classification results and explicitly apply both the tag AND the masking policy to the relevant columns.
- Row access policies return BOOLEAN. A row evaluating to FALSE is simply not returned — no error is thrown. A misconfigured policy that always returns FALSE silently returns zero rows, which can be very difficult to debug.
- A table can have at most ONE row access policy. A column can have at most ONE masking policy (either directly applied or via tag). If both a direct column masking policy and a tag-based policy exist, the direct assignment takes precedence.
- Masking policies bypass result cache sharing across roles with different masking outcomes. User A (full data) and User B (masked data) do not share cached results even for identical SQL queries.
- EXECUTE AS OWNER stored procedures enable controlled privilege escalation — common governance pattern for sensitive tables (PII, financial). The procedure owner holds the elevated privilege; the developer only has EXECUTE on the procedure.
- Data Clean Rooms use the Snowflake Native App Framework. Neither the provider nor consumer sees the other's raw data — only approved aggregate query results leave the clean room enclave. Fundamentally different from Secure Data Sharing (which exposes row-level data to the consumer).
- ACCOUNTADMIN should NOT be used for daily operations. Custom roles should inherit from SYSADMIN for object creation. SECURITYADMIN manages grants. USERADMIN manages users and roles. ACCOUNTADMIN is for account-level admin only.
- ACCESS_HISTORY view in ACCOUNT_USAGE has a 2-3 hour latency. Queries do not appear in ACCESS_HISTORY immediately. Plan audit reports accordingly.
- Managed access schemas: object owners (SYSADMIN) cannot grant privileges on objects in managed access schemas. Only the schema owner (or SECURITYADMIN) grants privileges. Enables centralized access control — prevents object owners from sharing access without security team oversight.
Common Traps
Confusing Pairs
Scenario Tips
A healthcare org needs PII columns auto-governed as new tables are added. The governance team cannot manually track every new column.
Tag-based masking: create a PII tag, associate the masking policy with the tag (ALTER TAG pii SET MASKING POLICY mask). Data engineers tag PII columns during table creation. The masking policy applies automatically without the governance team needing to assign it per column.
Run SYSTEM$CLASSIFY periodically and apply masking policies based on its output — this requires a human review step and creates lag between column creation and governance coverage. Tag-based masking provides immediate coverage on tagging.
An analyst queries a table with a row access policy and sees zero results. Data definitely exists in the table.
Row access policy is misconfigured — it evaluates to FALSE for this analyst's role/user. Row access policies silently return zero rows (no error). Debug by reviewing the policy logic: check IS_ROLE_IN_SESSION(), CURRENT_USER(), and the mapping table (if any) to find why the policy evaluates to FALSE for this analyst.
The table is empty or the analyst is querying the wrong table — plausible but less likely given the stated scenario. Always check row access policy logic first when zero rows return unexpectedly for a known non-empty table.
Two competitors want to compare customer overlap statistics without revealing their actual customer lists to each other
Data Clean Rooms built on the Snowflake Native App Framework. Both parties' data stays inside the clean room enclave. Only aggregate overlap statistics (not individual customer records) are returned to either party.
Secure Data Sharing — would expose raw customer records to the other party, violating the business requirement. Secure Data Sharing gives row-level access to the consumer.
A developer needs access to run a procedure that writes to a PII table, but the developer must NOT have direct SELECT/INSERT on the PII table
Create a stored procedure with EXECUTE AS OWNER where the owner has INSERT privilege on the PII table. Grant EXECUTE on the procedure to the developer role. The developer can call the procedure but cannot directly access the PII table — all access is mediated through the procedure's defined logic.
Grant SELECT and INSERT on the PII table to the developer role — this gives the developer unrestricted direct access, violating the requirement for mediated access only.