SnowflakeDEA-C02102 concepts

DEA-C02 Cheat Sheet

Quick reference for the SnowPro Advanced: Data Engineer exam.

Quick Navigation

Snowflake Architecture Fundamentals Data Loading and Stages Snowpipe and Streaming Ingestion Tasks, Streams, and Dynamic Tables Data Transformation and UDFs Snowpark DataFrames Snowflake Cortex LLM Functions Performance Optimization and Warehouses Micro-partitions, Clustering, and Search Optimization Time Travel and Data Protection Data Governance and Security

Snowflake Architecture Fundamentals

Three-Layer Architecture: Cloud Services layer (auth, metadata, optimizer), Virtual Warehouse layer (compute — MPP query execution), Storage layer (S3/Blob/GCS micro-partitions). Each layer scales independently.
Micro-partitions: Snowflake automatically divides all table data into 50-500MB compressed columnar micro-partitions. Each micro-partition stores min/max metadata per column enabling partition pruning without a user-defined index.
Virtual Warehouse Sizes: X-Small (1 node) → Small (2) → Medium (4) → Large (8) → XL (16) → 2XL (32) → 3XL (64) → 4XL (128) → 5XL (256) → 6XL (512). Credits double with each size.
Auto-suspend / Auto-resume: ALTER WAREHOUSE my_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE; — suspends after 60 seconds of inactivity. Auto-resume starts the warehouse instantly on the next query without user action.
Multi-cluster Warehouse: ALTER WAREHOUSE my_wh SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 5 SCALING_POLICY = STANDARD; — auto-scales clusters for concurrency. Use for many concurrent users, not for single complex queries.
Warehouse Minimum Billing: ALL warehouse sizes (X-Small through 6XL) bill a minimum of 60 seconds per run. After 60 seconds, billing is per-second. Credits double with each size tier.
Snowflake Editions: Standard: 1-day Time Travel max. Enterprise: 90-day Time Travel, multi-cluster WH, materialized views, column-level security, row-level security. Business Critical: HIPAA, PCI, HITRUST compliance, PrivateLink.

Data Loading and Stages

User Stage: PUT file://local/file.csv @~/my_dir; — uploads to the current user's personal stage. Accessible only by the owning user. Path prefix: @~.
Table Stage: PUT file://local/file.csv @%my_table; — uploads to a table's dedicated stage. Only loads into that specific table. Path prefix: @%table_name.
Named Stage (Internal): CREATE STAGE my_stage FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"') ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE'); — reusable, shareable, configurable.
Named Stage (External S3): CREATE STAGE ext_stage URL = 's3://my-bucket/data/' STORAGE_INTEGRATION = my_s3_integration FILE_FORMAT = (TYPE = 'PARQUET'); — uses storage integration for credential management.
COPY INTO (Basic): COPY INTO my_table FROM @my_stage/data/ FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1) ON_ERROR = 'SKIP_FILE'; — loads all staged files. ON_ERROR options: CONTINUE, SKIP_FILE, ABORT_STATEMENT.
COPY INTO with Transformation: COPY INTO target(col1, col2) FROM (SELECT $1, $3 FROM @my_stage/file.csv); — select and transform columns from staged files during load. Avoids an intermediate staging table.
COPY INTO Options: PURGE = TRUE removes files after successful load. FORCE = TRUE reloads previously loaded files (bypasses load history). PATTERN = '.*\.csv' filters files by regex.
Data Unloading: COPY INTO @ext_stage/export/ FROM (SELECT * FROM orders WHERE status = 'COMPLETE') FILE_FORMAT = (TYPE = 'PARQUET') HEADER = TRUE MAX_FILE_SIZE = 524288000; — exports to external stage.
List and Remove Stage Files: LIST @my_stage; — shows all files in stage. REMOVE @my_stage/file.csv; — deletes specific file. REMOVE @my_stage/path/ PATTERN = '.*old.*'; — removes matching files.

Snowpipe and Streaming Ingestion

Create Snowpipe: CREATE PIPE my_pipe AUTO_INGEST = TRUE AS COPY INTO my_table FROM @ext_stage FILE_FORMAT = (TYPE = 'JSON'); — AUTO_INGEST uses SQS notification for S3. Copy the SQS ARN from SHOW PIPES for S3 event config.
Manual Snowpipe Trigger (REST): POST to https://<account>.snowflakecomputing.com/v1/data/pipes/<pipe>/insertFiles with a list of staged file paths. Use for non-cloud-native sources or when auto_ingest is not available.
Snowpipe Status and History: SELECT SYSTEM$PIPE_STATUS('my_pipe'); — returns JSON with pendingFileCount, lastIngestedEventTime. Use COPY_HISTORY view in INFORMATION_SCHEMA for per-file load status.
Snowpipe Streaming (Java SDK): SnowflakeStreamingIngestClient client = ... — channel.insertRow(rowData, offsetToken) — rows go directly to Snowflake table without staging files. Sub-second latency. No SQS needed.
Snowpipe vs Snowpipe Streaming: Snowpipe: file-based, SQS/Event Grid trigger, 1-5 min latency, serverless. Streaming: SDK row-level, channel-based offsets, <1 sec latency, no files. Use Streaming for CDC or IoT real-time ingestion.
Snowpipe Delivery Semantics: Snowpipe provides AT-LEAST-ONCE delivery — duplicate rows are possible if a file is retried. Implement deduplication logic downstream (unique constraints or QUALIFY ROW_NUMBER() = 1) if exactly-once is required.
Refresh Snowpipe (backfill): ALTER PIPE my_pipe REFRESH MODIFIED_AFTER = '2026-01-01'; — re-queues files in the stage for ingestion. Use to backfill files that arrived before the pipe was created.

Tasks, Streams, and Dynamic Tables

Create Task (Scheduled): CREATE TASK daily_transform WAREHOUSE = my_wh SCHEDULE = 'USING CRON 0 2 * * * UTC' AS INSERT INTO target SELECT * FROM source WHERE date = CURRENT_DATE - 1; ALTER TASK daily_transform RESUME;
Create Task (Serverless): CREATE TASK serverless_task USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'MEDIUM' SCHEDULE = '5 MINUTES' AS CALL my_procedure(); — uses Snowflake-managed compute, auto-scales to query size.
Task DAG (Predecessor): CREATE TASK child_task WAREHOUSE = my_wh AFTER parent_task AS ...; -- child_task runs after parent_task completes. Root task must have SCHEDULE; child tasks must NOT. Resume root last.
Stream on Table (Standard): CREATE STREAM orders_stream ON TABLE orders; -- captures all DML changes: INSERT, UPDATE, DELETE. Metadata columns: METADATA$ACTION (INSERT/DELETE), METADATA$ISUPDATE, METADATA$ROW_ID.
Stream on Table (Append-only): CREATE STREAM orders_append_stream ON TABLE orders APPEND_ONLY = TRUE; -- captures INSERT operations only. More efficient than standard streams for insert-only pipelines.
Check Stream for Data: SELECT SYSTEM$STREAM_HAS_DATA('orders_stream'); -- returns TRUE if stream has unconsumed changes. Use in task WHEN clause: CREATE TASK ... WHEN SYSTEM$STREAM_HAS_DATA('orders_stream') AS ...
Consume Stream in Task: CREATE TASK process_orders WAREHOUSE = my_wh SCHEDULE = '1 MINUTES' WHEN SYSTEM$STREAM_HAS_DATA('orders_stream') AS INSERT INTO orders_processed SELECT order_id, amount FROM orders_stream WHERE METADATA$ACTION = 'INSERT';
Dynamic Table (Basic): CREATE DYNAMIC TABLE customer_summary TARGET_LAG = '1 minute' WAREHOUSE = my_wh AS SELECT customer_id, SUM(amount) total FROM orders GROUP BY customer_id; -- auto-refreshes to stay within 1-minute lag.
Dynamic Table (Downstream Lag): CREATE DYNAMIC TABLE final_report TARGET_LAG = DOWNSTREAM WAREHOUSE = my_wh AS SELECT * FROM customer_summary JOIN products ...; -- DOWNSTREAM: lag is determined by any consuming dynamic table.
Dynamic Table Refresh: ALTER DYNAMIC TABLE customer_summary REFRESH; -- triggers immediate manual refresh. ALTER DYNAMIC TABLE customer_summary SUSPEND / RESUME; -- pause/resume automatic refresh.
Streams vs Dynamic Tables: Streams+Tasks: procedural, supports stored procedures + Snowpark, custom error handling, multi-target writes. Dynamic Tables: declarative SQL, simpler to maintain, automatic incremental refresh. Use DTs for pure SQL pipelines.

Data Transformation and UDFs

SQL Scalar UDF: CREATE FUNCTION usd_to_eur(amount FLOAT) RETURNS FLOAT AS $$ amount * 0.92 $$; -- called in SELECT: SELECT usd_to_eur(price) FROM products;
JavaScript UDF: CREATE FUNCTION parse_json_field(v VARIANT, key STRING) RETURNS STRING LANGUAGE JAVASCRIPT AS $$ return V[KEY] ? V[KEY].toString() : null; $$;
Python UDF: CREATE FUNCTION detect_language(text STRING) RETURNS STRING LANGUAGE PYTHON RUNTIME_VERSION='3.9' PACKAGES=('langdetect') HANDLER='detect' AS $$ from langdetect import detect def detect(text): return detect(text) $$;
Vectorized (Batch) Python UDF: CREATE FUNCTION clean_text_batch(text STRING) RETURNS STRING LANGUAGE PYTHON RUNTIME_VERSION='3.9' PACKAGES=('pandas') HANDLER='BatchUDF.compute' AS $$ import pandas as pd class BatchUDF: @staticmethod def compute(s: pd.Series) -> pd.Series: return s.str.strip().str.lower() $$;
SQL UDTF (Table Function): CREATE FUNCTION split_tags(tags STRING) RETURNS TABLE (tag STRING) AS $$ SELECT value FROM TABLE(SPLIT_TO_TABLE(tags, ',')) $$; -- returns multiple rows per input row.
External Function: CREATE EXTERNAL FUNCTION score_customer(customer_id NUMBER) RETURNS VARIANT API_INTEGRATION = my_api_integration AS 'https://api.example.com/score'; -- calls HTTPS endpoint. Requires API integration object.
Stored Procedure (Python): CREATE PROCEDURE load_and_validate() RETURNS STRING LANGUAGE PYTHON RUNTIME_VERSION='3.9' PACKAGES=('snowflake-snowpark-python') HANDLER='run' EXECUTE AS OWNER AS $$ def run(session): session.sql('COPY INTO ...').collect(); return 'done' $$;
EXECUTE AS CALLER vs OWNER: EXECUTE AS OWNER: procedure runs with owner's privileges (enables privilege escalation for controlled access). EXECUTE AS CALLER: runs with caller's privileges (caller must have direct object access).
Semi-structured: FLATTEN: SELECT f.value:product_id::STRING, f.value:qty::INT FROM orders o, LATERAL FLATTEN(input => o.items) f; -- explodes JSON array into rows. LATERAL FLATTEN is the standard way to unnest arrays.
Semi-structured: Dot / Bracket Notation: SELECT v:address.city::STRING, v['phone']::STRING FROM contacts; -- v:field casts VARIANT field. Use :: to cast to target type. Bracket notation for fields with special characters or dynamic keys.
PARSE_JSON and Object Construction: SELECT PARSE_JSON('{"id":1}') AS obj; SELECT OBJECT_CONSTRUCT('id', 1, 'name', 'Alice') AS obj; SELECT ARRAY_CONSTRUCT(1, 2, 3) AS arr; -- build VARIANT structures from SQL values.

Snowpark DataFrames

Snowpark Session (Python): from snowflake.snowpark import Session; session = Session.builder.configs(connection_params).create() -- creates a Snowpark session. All DataFrame operations push down to Snowflake SQL.
Load DataFrame: df = session.table('ORDERS') -- reads from Snowflake table (lazy). df = session.sql('SELECT * FROM orders WHERE status = \'OPEN\'') -- from SQL string.
DataFrame Transformations (Lazy): df2 = df.filter(col('STATUS') == 'OPEN').select('ORDER_ID', 'AMOUNT').sort('AMOUNT', ascending=False) -- builds a logical plan. NO execution until collect() or write().
Execute DataFrame: df.collect() -- returns list of Row objects (pulls data to client). df.show() -- prints first 10 rows. df.count() -- returns row count. df.to_pandas() -- converts to pandas DataFrame.
Write DataFrame: df.write.mode('overwrite').save_as_table('MY_RESULTS') -- writes to Snowflake table. df.write.copy_into_location('@my_stage/output/') -- writes to stage as files.
Snowpark UDF Registration: from snowflake.snowpark.functions import udf; @udf(return_type=StringType(), input_types=[StringType()]) def upper_udf(s): return s.upper() -- registers Python function as UDF in Snowflake.
Pushdown Optimization: Snowpark DataFrame operations (filter, select, join, groupBy) compile to Snowflake SQL and run inside Snowflake. Calling a plain Python function (not Snowpark API) on a DataFrame breaks pushdown — runs client-side.
Snowpark Join: orders.join(customers, orders['CUSTOMER_ID'] == customers['ID'], join_type='inner').select(orders['ORDER_ID'], customers['NAME']).show()
Snowpark GroupBy / Agg: from snowflake.snowpark.functions import sum, count; df.groupBy('REGION').agg(sum('AMOUNT').alias('total'), count('ORDER_ID').alias('cnt')).show()

Snowflake Cortex LLM Functions

COMPLETE (Text Generation): SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-7b', 'Write a product description for: ' || product_name) FROM products; -- generates text using a specified LLM model.
SUMMARIZE (Text Summarization): SELECT SNOWFLAKE.CORTEX.SUMMARIZE(ticket_text) AS summary FROM support_tickets; -- produces a concise summary of long-form text. No model parameter required.
SENTIMENT (Sentiment Analysis): SELECT SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment FROM reviews; -- returns a FLOAT between -1.0 (most negative) and 1.0 (most positive). No model parameter required. Not a string label.
TRANSLATE (Language Translation): SELECT SNOWFLAKE.CORTEX.TRANSLATE(description, 'en', 'fr') AS french_desc FROM products; -- translates text from source language to target language using ISO 639-1 codes.
EXTRACT_ANSWER (QA): SELECT SNOWFLAKE.CORTEX.EXTRACT_ANSWER(document_text, 'What is the contract end date?') AS answer FROM contracts; -- extracts specific answers from a passage of text.
EMBED_TEXT (Vector Embeddings): SELECT SNOWFLAKE.CORTEX.EMBED_TEXT('snowflake-arctic-embed-m', description) AS embedding FROM products; -- returns a VECTOR type (not ARRAY) for semantic similarity search. Store in a VECTOR(N, FLOAT) column for similarity queries.
CLASSIFY_TEXT (Classification): SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(ticket_text, ARRAY_CONSTRUCT('billing', 'technical', 'cancellation', 'other')) AS category FROM tickets; -- classifies text into provided categories.
Cortex — Data Stays in Snowflake: All Cortex LLM functions execute INSIDE Snowflake. Data does NOT leave your Snowflake account. Cortex uses serverless compute credits (separate from virtual warehouse credits).

Performance Optimization and Warehouses

Query Profile Navigation: Snowsight: Query History → click query → Profile tab. Look for: Partitions Scanned vs Total (pruning ratio), Bytes Spilled to Local/Remote Storage (memory pressure), Join type and row estimate accuracy.
Spill to Disk Fix: Spill to local storage = warehouse memory exhausted. Spill to remote storage = local SSD exhausted too. Fix: scale up warehouse (more memory per node). Also check for cartesian products or inefficient joins.
Scale Up vs Scale Out Decision: Scale UP (larger size): single complex query runs slowly, spills to disk, large sorts/joins. Scale OUT (multi-cluster): many concurrent users queue, warehouse busy with multiple queries simultaneously.
Result Cache Hit: SHOW TRANSACTIONS; / QUERY_HISTORY: look for 'cache result' in query type. Result cache is FREE (no warehouse credits), account-level, 24-hour TTL. Invalidated by: DML on source table, 24hr expiry, or non-deterministic functions in query. Warehouse suspend/resize does NOT clear result cache.
Local Disk Cache (Data Cache): Warehouse-local SSD cache for recently scanned micro-partitions. Warm cache speeds up repeated queries on same data. Cleared when warehouse SUSPENDS. Re-warms on first query after resume.
Cache Tier Order: Query execution checks: 1) Result Cache (free, 24hr, account-level) → 2) Local Disk Cache (warehouse SSD, cleared on suspend) → 3) Remote Disk Cache (Snowflake Foundation/FDN layer, persists across suspends) → 4) Cloud Storage (S3/Blob/GCS).
SHOW WAREHOUSES / ALTER WAREHOUSE: SHOW WAREHOUSES; -- lists all warehouses with size, state, credit usage. ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE' MAX_CONCURRENCY_LEVEL = 16;
Query Acceleration Service: ALTER WAREHOUSE my_wh SET ENABLE_QUERY_ACCELERATION = TRUE QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8; -- offloads large scan portions to serverless compute. Best for ad-hoc large scans.
Credit Usage Monitoring: SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY WHERE START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP) ORDER BY CREDITS_USED DESC; -- identify high-credit warehouses.

Micro-partitions, Clustering, and Search Optimization

Check Clustering Quality: SELECT SYSTEM$CLUSTERING_INFORMATION('my_table', '(transaction_date, region)'); -- returns JSON with clustering_ratio (1.0 = perfect), clustering_depth (1 = ideal), total_partition_count.
Define Clustering Key: ALTER TABLE large_orders CLUSTER BY (transaction_date, region); -- organizes micro-partitions by these columns for better pruning. Best for high-cardinality columns frequently used in WHERE filters.
Automatic Clustering: Once a CLUSTER BY key is defined, Snowflake's automatic clustering service continuously reorganizes micro-partitions. This runs in the background and consumes credits. Monitor with CLUSTERING_HISTORY view.
Drop Clustering Key: ALTER TABLE my_table DROP CLUSTERING KEY; -- stops automatic clustering and removes the clustering key. Existing data layout is preserved but no longer maintained.
Good vs Bad Clustering Keys: Good: date columns (ORDER_DATE), status columns with 5-50 distinct values used in WHERE. Bad: UUID/surrogate keys (too many distinct values, tiny pruning benefit), columns never in WHERE filters.
Enable Search Optimization: ALTER TABLE customers ADD SEARCH OPTIMIZATION ON EQUALITY(email, phone), SUBSTRING(name); -- creates access path for equality, IN, and substring searches. Check eligibility with SEARCH_OPTIMIZATION_HISTORY.
Search Optimization vs Clustering: Search Optimization: equality/IN/substring point lookups — few rows returned. Clustering Key: range scans, date-range filters, GROUP BY on clustered column — many rows returned. Solve different problems.
Materialized View Create: CREATE MATERIALIZED VIEW daily_sales_mv AS SELECT order_date, SUM(amount) total FROM orders GROUP BY order_date; -- Snowflake refreshes automatically when ORDERS changes using serverless compute.
Materialized View Restrictions: MV limitations: no non-deterministic functions (RANDOM, CURRENT_TIMESTAMP), no CTEs (WITH clause), no LATERAL joins, no UDFs, no UNION, no ORDER BY. Dynamic Tables have fewer restrictions.

Time Travel and Data Protection

Time Travel — Timestamp: SELECT * FROM orders AT(TIMESTAMP => '2026-01-15 09:00:00'::TIMESTAMP_LTZ); -- queries the table as it existed at a specific timestamp. Requires data within retention period.
Time Travel — Offset (Seconds): SELECT * FROM orders AT(OFFSET => -3600); -- queries state 1 hour (3600 seconds) before current time. OFFSET must be negative.
Time Travel — Statement ID: SELECT * FROM orders BEFORE(STATEMENT => '01b12345-0000-...'); -- queries data BEFORE a specific statement (e.g., before a DELETE). Get query ID from QUERY_HISTORY.
Restore Deleted Rows: INSERT INTO orders SELECT * FROM orders BEFORE(STATEMENT => '<delete_query_id>') EXCEPT SELECT * FROM orders; -- reinserts only the rows that were deleted, avoiding duplicates.
UNDROP TABLE: UNDROP TABLE orders; -- restores a dropped table within its Time Travel retention period. Must be run in the same database/schema context. Restores data, metadata, and grants.
UNDROP DATABASE / SCHEMA: UNDROP DATABASE my_db; UNDROP SCHEMA my_schema; -- same behavior as UNDROP TABLE. Restores all contained objects if dropped within their retention periods.
Set Time Travel Retention: ALTER TABLE large_staging SET DATA_RETENTION_TIME_IN_DAYS = 0; -- disables Time Travel for cost savings on staging tables. Standard: max 1 day. Enterprise: max 90 days. Fail-safe still runs 7 days.
Transient Tables (No Fail-safe): CREATE TRANSIENT TABLE staging_load (id NUMBER, data VARIANT); -- no Fail-safe period, max 1-day Time Travel. Lower storage cost. Use for ephemeral staging data that does not need recovery.
Zero-copy Clone: CREATE TABLE orders_dev CLONE orders; -- instant, shares physical micro-partitions via metadata. Near-zero initial storage. Clone diverges as data changes (copy-on-write). Grants NOT inherited.
Clone at Point in Time: CREATE DATABASE dev_db CLONE prod_db AT(TIMESTAMP => '2026-06-01 00:00:00'::TIMESTAMP_LTZ); -- clones the database as it existed at a specific historical moment. Full database/schema cloning supported.
Time Travel vs Fail-safe: Time Travel: user-accessible via SQL, 0-90 days configurable, for user error recovery. Fail-safe: 7 days fixed, Snowflake-managed, not SQL-accessible, only for catastrophic platform failures.

Data Governance and Security

Create Object Tag: CREATE TAG pii_sensitivity ALLOWED_VALUES 'high', 'medium', 'low'; ALTER TABLE customers ALTER COLUMN email SET TAG pii_sensitivity = 'high'; -- tags are key-value pairs on columns, tables, or databases.
Automated Data Classification: CALL SYSTEM$CLASSIFY('my_database.my_schema.customers', OBJECT_CONSTRUCT('auto_tag', true)); -- SYSTEM$CLASSIFY is a stored procedure (use CALL, not SELECT). Snowflake ML scans columns and suggests privacy_category and semantic_category tags.
Create Masking Policy: CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING -> CASE WHEN IS_ROLE_IN_SESSION('DATA_STEWARD') THEN val ELSE REGEXP_REPLACE(val, '.+@', '***@') END;
Apply Masking Policy to Column: ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask; -- policy evaluated at query time. Unmasked data is stored; masking is applied when data is returned.
Tag-based Masking Policy: ALTER TAG pii_sensitivity SET MASKING POLICY email_mask; -- any column with tag pii_sensitivity = 'high' automatically gets email_mask applied. Scales governance across new columns automatically.
Create Row Access Policy: CREATE ROW ACCESS POLICY region_access AS (region STRING) RETURNS BOOLEAN -> IS_ROLE_IN_SESSION('ADMIN') OR region = (SELECT region FROM user_region_map WHERE username = CURRENT_USER());
Apply Row Access Policy: ALTER TABLE sales ADD ROW ACCESS POLICY region_access ON (region); -- policy evaluated per row at query time. Returns FALSE = row not returned (no error). One policy per table.
Query ACCESS_HISTORY: SELECT query_id, user_name, direct_objects_accessed, base_objects_accessed FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY WHERE query_start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP); -- 2-3 hour lag.
Database Replication: ALTER DATABASE prod_db ENABLE REPLICATION TO ACCOUNTS myorg.us-west-2-account; -- on secondary: CREATE DATABASE prod_db_replica AS REPLICA OF myorg.us-east-1-account.prod_db; ALTER DATABASE prod_db_replica REFRESH;
Masking and Result Cache Interaction: Masking policies bypass result cache sharing across different roles. User A (unmasked) and User B (masked) running the same query do NOT share cached results — each query computes separately to avoid data leakage.
Data Clean Room (Provider Side): Data Clean Rooms use Snowflake Native App Framework. Provider packages allowed analyses as a Native App. Consumer installs the app and runs analyses without seeing raw data — only aggregate results exit the clean room.

Ready to test yourself?

Start a timed DEA-C02 mock exam or review practice questions by domain.