Quick Navigation
Three-Layer ArchitectureVirtual WarehousesTime Travel and Fail-safeZero-Copy CloningData Loading: Stages and COPY INTOSnowpipe, Connectivity, and Load HistorySemi-Structured Data (VARIANT)Performance Optimization and CachingStreams and Tasks (ETL Automation)Account Management and RBACData Governance: Masking and Row AccessSecure Data Sharing and CollaborationCOF-C03 New Topics: Cortex AI, Iceberg, Notebooks
Three-Layer Architecture
- Cloud Services Layer: - Authentication & authorization - Query parsing & optimization - Metadata management - Infrastructure management Compute Layer (Virtual Warehouses): - CPU, memory, local SSD cache - Executes queries and DML Storage Layer: - Centralized cloud object storage - Micro-partitions (columnar, compressed)
- Snowflake's three layers scale independently — storage costs are separate from compute costs, and multiple warehouses can query the same storage simultaneously.
- Micro-partitions: - 50–500 MB compressed columnar units - Immutable (never overwritten in-place) - Metadata tracks min/max values per column - Enable partition pruning without manual indexes
- Snowflake automatically creates and manages micro-partitions. The metadata cache in cloud services enables instant COUNT, MIN, MAX without scanning data.
- Snowflake Editions and key feature gates: - Standard: 1-day max Time Travel, single-cluster WH - Enterprise: 90-day Time Travel, multi-cluster WH, materialized views, dynamic data masking - Business Critical: customer-managed keys (Tri-Secret Secure), private connectivity (PrivateLink) - VPS: Dedicated Snowflake environment
- Edition determines which features are available. Multi-cluster warehouses, 90-day Time Travel, and materialized views all require Enterprise or higher.
- Table types and data protection: - Permanent: Time Travel 0–90 days (Enterprise+), 7-day Fail-safe - Transient: Time Travel 0–1 day, 0-day Fail-safe - Temporary: Time Travel 0–1 day, 0-day Fail-safe, session-scoped - External: No Time Travel, no Fail-safe, read-only
- Transient and temporary tables have ZERO Fail-safe days — once Time Travel expires, data is permanently lost. Use transient tables for staging or regenerable data.
- CREATE TABLE my_table ( id NUMBER, name STRING ) DATA_RETENTION_TIME_IN_DAYS = 30;
- Set Time Travel retention at table creation. Enterprise+ is required for values greater than 1 day. Alter any time with ALTER TABLE ... SET DATA_RETENTION_TIME_IN_DAYS.
- Data Encryption: - At rest: AES-256 encryption, always-on, not configurable - In transit: TLS 1.2 or higher, always-on - Key management: Standard/Enterprise: Snowflake-managed keys Business Critical: Customer-managed keys (CMK) Tri-Secret Secure: Dual key model — Snowflake key + customer key BOTH required to decrypt; data is inaccessible if customer revokes key
- All data in Snowflake is encrypted at rest and in transit by default — no configuration needed. Tri-Secret Secure (Business Critical+) ensures Snowflake itself cannot decrypt data without the customer's key.
Virtual Warehouses
- CREATE WAREHOUSE my_wh WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE;
- Create a warehouse that auto-suspends after 5 minutes of inactivity and auto-resumes on the next query. INITIALLY_SUSPENDED avoids billing before first use.
- ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE'; ALTER WAREHOUSE my_wh SUSPEND; ALTER WAREHOUSE my_wh RESUME;
- Resize a warehouse online (queries waiting in queue will use the new size). Manually suspend or resume to control costs.
- Warehouse credit consumption (per hour, running): XS = 1 S = 2 M = 4 L = 8 XL = 16 2XL = 32 3XL = 64 4XL = 128 5XL = 256 6XL = 512
- Credits double with each size increase. A 6XL costs 512x more than an XS per hour. Billing starts when the warehouse resumes and stops 60 seconds after it suspends.
- -- Multi-cluster warehouse (Enterprise+ required) CREATE WAREHOUSE my_wh WITH WAREHOUSE_SIZE = 'MEDIUM' MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 5 SCALING_POLICY = 'STANDARD'; -- or 'ECONOMY'
- Multi-cluster warehouses scale OUT (add clusters) for concurrency. STANDARD adds clusters immediately; ECONOMY waits until queue will last 6+ minutes.
- Scale UP → bigger warehouse size → complex, slow queries Scale OUT → more clusters (MCW) → many concurrent users Scaling policies: STANDARD: adds clusters immediately (performance priority) ECONOMY: waits ~6 min queue threshold (cost priority)
- Choose scale-up to speed up a single heavy query. Choose scale-out to handle more simultaneous users. Multi-cluster warehouses require Enterprise edition.
Time Travel and Fail-safe
- -- Query data at a specific timestamp SELECT * FROM orders AT(TIMESTAMP => '2026-06-01 09:00:00'::TIMESTAMP_TZ); -- Query data N seconds ago SELECT * FROM orders AT(OFFSET => -60 * 60); -- 1 hour ago -- Query data before a specific statement ran SELECT * FROM orders BEFORE(STATEMENT => '<query_id>');
- Three Time Travel reference modes: TIMESTAMP, OFFSET (seconds from now), and STATEMENT (before a specific query ID ran). Use AT for 'as of', BEFORE for 'just before'.
- -- Restore a dropped table, schema, or database UNDROP TABLE my_table; UNDROP SCHEMA my_schema; UNDROP DATABASE my_db; -- Clone a table to its state 1 hour ago CREATE TABLE orders_restored CLONE orders AT(OFFSET => -3600);
- UNDROP restores the most recently dropped object if within the Time Travel retention window. Cloning with AT creates an independent copy at a past point in time.
- Time Travel → USER-accessible, configurable (0–90 days) → Accessed via AT/BEFORE and UNDROP Fail-safe → SNOWFLAKE-ONLY, non-configurable 7 days → Only Snowflake Support can recover data → Begins AFTER Time Travel period ends → NOT available for transient or temporary tables
- The most-tested distinction on the exam: if YOU need to recover data, use Time Travel. Fail-safe is a last-resort Snowflake-internal recovery mechanism.
- ALTER TABLE my_table SET DATA_RETENTION_TIME_IN_DAYS = 0;
- Setting retention to 0 disables Time Travel for that table. This also means 0-day Fail-safe for transient/temporary tables — data is permanently lost on drop.
Zero-Copy Cloning
- -- Clone a database, schema, or table CREATE DATABASE dev_db CLONE prod_db; CREATE SCHEMA dev_schema CLONE prod_schema; CREATE TABLE dev_orders CLONE prod_orders; -- Clone at a past point in time CREATE TABLE orders_backup CLONE orders BEFORE(STATEMENT => '<query_id>');
- Zero-copy cloning creates a metadata-only pointer to existing micro-partitions. No data is duplicated — storage cost is zero until the clone diverges from the source.
- Cloning rules: - Initial storage cost = 0 bytes (metadata only) - Storage grows only as clone data diverges from source - Clones are independent objects — DML on clone does NOT affect source and vice versa - Clones inherit Time Travel history of the source - Can clone: databases, schemas, tables, streams, tasks, sequences, file formats, stages
- Cloning is commonly used for dev/test environments, protecting data before bulk DML, or restoring dropped tables. The exam tests that initial storage cost is zero.
- Cloning and data protection interaction: - A clone of a PERMANENT table inherits 7-day Fail-safe - A clone of a TRANSIENT table has 0-day Fail-safe - Cloning does NOT reset the Time Travel window — the clone can travel back to the source's history - TRUNCATE TABLE on source does NOT affect the clone - DROP TABLE on source does NOT affect the clone
- Clones are fully independent after creation. The exam tests that cloning a transient table still yields a transient clone with 0-day Fail-safe, not a protected permanent copy.
Data Loading: Stages and COPY INTO
- Stage types: @~ User stage (per-user, cannot be dropped) @%my_table Table stage (per-table, cannot be dropped) @my_stage Named internal stage (configurable) @my_ext_stage External stage (S3 / GCS / Azure Blob)
- User and table stages cannot be altered or dropped. Named internal stages are the most flexible. External stages reference your own cloud storage.
- -- Create a named internal stage CREATE STAGE my_stage FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1); -- Create an external stage (S3) CREATE STAGE my_s3_stage URL = 's3://my-bucket/path/' STORAGE_INTEGRATION = my_s3_integration;
- Use STORAGE_INTEGRATION objects instead of embedding credentials in stage definitions. Storage integrations store the cloud provider IAM configuration securely.
- -- Bulk load from a stage COPY INTO orders FROM @my_stage/orders/ FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) ON_ERROR = 'CONTINUE' PURGE = TRUE;
- COPY INTO is the primary bulk load command. PURGE removes files from the stage after successful load. ON_ERROR = CONTINUE loads valid rows and skips error rows.
- -- Validate files without loading any data COPY INTO orders FROM @my_stage/ FILE_FORMAT = (TYPE = CSV) VALIDATION_MODE = 'RETURN_ALL_ERRORS';
- VALIDATION_MODE performs a dry run — returns errors but loads ZERO rows. Use before a full load to catch format mismatches early.
- -- ON_ERROR options: -- CONTINUE → skip bad rows, load rest -- SKIP_FILE → skip whole file on any error -- ABORT_STATEMENT → halt entire load (default) -- SKIP_FILE_<n> → skip file after n errors -- SKIP_FILE_<n>% → skip file after n% error rate -- Force reload of previously loaded files COPY INTO orders FROM @my_stage FORCE = TRUE;
- COPY INTO skips files already loaded (tracked by load history for 64 days). Use FORCE = TRUE to reload files regardless of prior load status.
- -- PUT: upload local file to internal stage (SnowSQL only) PUT file:///tmp/orders.csv @my_stage; -- GET: download file from internal stage to local (SnowSQL only) GET @my_stage/orders.csv file:///tmp/;
- PUT and GET only work from SnowSQL CLI or JDBC/ODBC connections. They DO NOT work in the Snowsight web UI or via the REST API.
- -- Unload data to a stage COPY INTO @my_stage/exports/orders_ FROM (SELECT order_id, amount, region FROM orders) FILE_FORMAT = (TYPE = PARQUET) HEADER = TRUE OVERWRITE = TRUE;
- COPY INTO can also unload table data to a stage. Use a SELECT subquery to filter or transform. Output files are automatically partitioned into 250 MB chunks.
- -- Transformation during COPY INTO load COPY INTO orders_clean (order_id, region, amount_usd) FROM ( SELECT $1::INTEGER, -- reorder columns UPPER($3::STRING), -- transform on load $2::FLOAT * 1.1 -- compute new value FROM @my_stage/orders.csv ) FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
- COPY INTO supports inline transformation using a SELECT subquery. Supports column reordering, casting, and simple expressions. Complex joins and aggregations are NOT supported during load.
Snowpipe, Connectivity, and Load History
- -- Create a Snowpipe (auto-ingest from S3) CREATE PIPE my_pipe AUTO_INGEST = TRUE AS COPY INTO orders FROM @my_s3_stage FILE_FORMAT = (TYPE = JSON);
- Snowpipe with AUTO_INGEST uses S3 event notifications (SNS/SQS) or Azure Event Grid or GCS Pub/Sub to trigger loads when new files arrive — no warehouse required.
- Snowpipe vs COPY INTO: COPY INTO → requires active warehouse, warehouse credits, batch-triggered, best for large scheduled loads Snowpipe → serverless compute, per-GB billing (since Dec 2025), event-triggered, best for near-real-time ingestion
- Snowpipe billing changed to simplified per-GB pricing (0.0037 credits per GB ingested) in December 2025. COPY INTO still uses virtual warehouse credits billed per credit-hour.
- -- Snowpipe REST API call (when AUTO_INGEST is not used) SELECT SYSTEM$PIPE_STATUS('my_pipe'); -- Check Snowpipe load history (available 14 days) SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY( TABLE_NAME => 'ORDERS', START_TIME => DATEADD(HOUR, -1, CURRENT_TIMESTAMP) ));
- Use SYSTEM$PIPE_STATUS to check if a pipe is running and its queue depth. COPY_HISTORY in INFORMATION_SCHEMA tracks all loads (both COPY INTO and Snowpipe) for the last 14 days.
- Snowflake connectivity options: Python Connector → native Snowflake Python library Spark Connector → read/write Snowflake from Apache Spark Kafka Connector → stream Kafka topics into Snowflake JDBC Driver → Java/JVM applications ODBC Driver → BI tools (Tableau, Power BI, Excel) Node.js Driver → JavaScript/Node applications .NET Driver → .NET/C# applications SnowSQL CLI → interactive SQL + scripting REST API → programmatic access (no client library)
- Know which connector to use for each scenario: Spark for large-scale ETL, Kafka for streaming ingestion, JDBC/ODBC for BI tools, and Python Connector for data science workflows.
Semi-Structured Data (VARIANT)
- -- VARIANT column stores JSON, Avro, Parquet, ORC, XML CREATE TABLE events ( raw_data VARIANT ); -- Insert JSON INSERT INTO events SELECT PARSE_JSON('{"user":"alice","clicks":5}'); -- Dot notation to access nested fields SELECT raw_data:user::STRING, raw_data:clicks::INTEGER FROM events;
- VARIANT stores up to 16 MB of semi-structured data per value. Access nested keys with colon notation (raw_data:key) and cast with :: to convert to a SQL type.
- -- Flatten a JSON array into rows SELECT f.value:name::STRING AS product_name FROM orders, LATERAL FLATTEN(input => orders.items) f;
- LATERAL FLATTEN explodes arrays or objects within a VARIANT column into individual rows. Each element becomes a row with a VALUE column containing the array element.
- -- Load JSON with schema inference COPY INTO events FROM @my_stage/events.json FILE_FORMAT = (TYPE = JSON) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
- MATCH_BY_COLUMN_NAME maps source JSON fields to table columns by name (case-insensitive). Without it, JSON is loaded into a VARIANT column by default.
Performance Optimization and Caching
- Three caching layers: 1. Metadata cache (cloud services) → Instant COUNT(*), MIN, MAX, DISTINCT without warehouse → Always on, no configuration needed 2. Result cache (cloud services) → Reuses exact query result for up to 24 hours → FREE — zero warehouse compute cost → Invalidated by: data change, role change, query text change, certain param changes 3. Warehouse (data) cache (local SSD on compute nodes) → Caches raw micro-partition data on warehouse nodes → Requires a RUNNING warehouse → Lost when warehouse SUSPENDS
- The three caches serve different scenarios. Result cache is the most impactful cost saver for repeated identical queries. Warehouse cache is lost on suspension — balance cost vs warm cache.
- -- Clustering key for large tables with known filter patterns ALTER TABLE orders CLUSTER BY (region, order_date); -- Check clustering quality SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(region, order_date)');
- Clustering keys co-locate data by column values across micro-partitions, enabling pruning. Automatic Clustering (Enterprise+) maintains clustering over time for an ongoing compute cost.
- When clustering HELPS: - Multi-TB tables with consistent filter predicates - Date range queries on high-cardinality date columns When clustering HURTS (avoid): - Small tables (micro-partition pruning is already fast) - Tables with frequent random writes or full truncate+reload - No consistent filter pattern → clustering cost with no benefit
- Clustering is NOT always beneficial. It adds ongoing Automatic Clustering maintenance credits. Only cluster large tables with predictable, repetitive filter patterns.
- -- Diagnose a slow query with Query Profile -- Available in Snowsight: Activity → Query History → Profile Spilling indicators (fix by upsizing warehouse): Bytes spilled to local storage → memory insufficient Bytes spilled to remote storage → severely undersized WH Pruning indicator (good = fewer partitions scanned): Partitions scanned vs Partitions total
- Query Profile visualizes the execution plan with operators, data flow, and statistics. Spilling to remote storage is the worst symptom — a larger warehouse fixes it.
- -- Materialized view (Enterprise+ only) CREATE MATERIALIZED VIEW mv_daily_sales AS SELECT region, order_date, SUM(amount) AS total FROM orders GROUP BY region, order_date; -- Limitations: no JOINs, no UDFs, single base table only
- Materialized views are pre-computed and auto-refreshed by Snowflake. They incur background maintenance compute charges. Best for expensive aggregations on large, slowly changing tables.
- -- Search Optimization Service (Enterprise+ only) ALTER TABLE events ADD SEARCH OPTIMIZATION; -- Check search optimization status SELECT * FROM INFORMATION_SCHEMA.SEARCH_OPTIMIZATION_HISTORY WHERE TABLE_NAME = 'EVENTS'; Best for: - Point lookup queries (WHERE id = 12345) - High-cardinality column searches - Substring/regex searches on large text columns - Queries on VARIANT paths (semi-structured data)
- Search Optimization Service builds a persistent search access path that accelerates equality and IN predicates on large tables. It incurs a one-time build cost and ongoing storage for the search index.
Streams and Tasks (ETL Automation)
- -- Standard stream: captures INSERT, UPDATE, DELETE CREATE STREAM orders_stream ON TABLE orders; -- Append-only stream: captures INSERT only CREATE STREAM orders_append_stream ON TABLE orders APPEND_ONLY = TRUE; -- Query stream change records SELECT *, METADATA$ACTION, METADATA$ISUPDATE FROM orders_stream;
- Streams record a table's DML changes as a Change Data Capture (CDC) offset. METADATA$ACTION is INSERT or DELETE. Updates appear as a DELETE row (pre-image) and INSERT row (post-image).
- -- Task with warehouse compute CREATE TASK process_orders_task WAREHOUSE = my_wh SCHEDULE = '15 MINUTE' WHEN SYSTEM$STREAM_HAS_DATA('orders_stream') AS INSERT INTO orders_target SELECT * FROM orders_stream WHERE METADATA$ACTION = 'INSERT'; -- Start the task (tasks are suspended by default) ALTER TASK process_orders_task RESUME;
- WHEN SYSTEM$STREAM_HAS_DATA() skips the task run if the stream is empty, avoiding unnecessary warehouse usage. Tasks are created in SUSPENDED state and must be explicitly resumed.
- -- Serverless task (no warehouse needed) CREATE TASK serverless_task USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL' SCHEDULE = 'USING CRON 0 2 * * * UTC' AS CALL my_stored_procedure(); -- Task DAG: child task runs after parent CREATE TASK child_task AFTER parent_task AS MERGE INTO target USING source ON ...
- Serverless tasks use Snowflake-managed compute billed per second. AFTER creates DAG dependencies — child tasks execute only after parent completes successfully.
Account Management and RBAC
- System-defined role hierarchy: ACCOUNTADMIN ├── SECURITYADMIN │ └── USERADMIN └── SYSADMIN └── (custom roles should be granted here) PUBLIC → auto-granted to all users
- ACCOUNTADMIN is the top-level role combining SYSADMIN and SECURITYADMIN. Custom roles must be granted to SYSADMIN (or its hierarchy) so SYSADMIN can manage the objects they own.
- Role responsibilities: ACCOUNTADMIN → top-level, manages account settings, sees all grants, creates resource monitors. Limit to 2-3 users with MFA enabled. SECURITYADMIN → manages grants (GRANT/REVOKE on any object) USERADMIN → creates and manages users and roles SYSADMIN → creates and manages objects (warehouses, databases, schemas, tables) PUBLIC → default role for every user, minimal privs
- USERADMIN creates roles but cannot grant object privileges. SECURITYADMIN manages grants but typically delegates user creation. ACCOUNTADMIN should NOT be used for daily operations.
- -- Create a custom role and grant it to SYSADMIN CREATE ROLE analyst_role; GRANT ROLE analyst_role TO ROLE sysadmin; -- Grant object privileges to the role GRANT USAGE ON DATABASE analytics TO ROLE analyst_role; GRANT USAGE ON SCHEMA analytics.public TO ROLE analyst_role; GRANT SELECT ON ALL TABLES IN SCHEMA analytics.public TO ROLE analyst_role; -- Assign role to a user GRANT ROLE analyst_role TO USER alice;
- Always grant custom roles to SYSADMIN to prevent orphan roles. Object access requires cascading USAGE grants on database and schema before granting table privileges.
- -- Resource monitor with credit quota and actions CREATE RESOURCE MONITOR monthly_limit WITH CREDIT_QUOTA = 1000 TRIGGERS ON 75 PERCENT DO NOTIFY ON 100 PERCENT DO SUSPEND ON 110 PERCENT DO SUSPEND_IMMEDIATE; -- Assign to a specific warehouse ALTER WAREHOUSE my_wh SET RESOURCE_MONITOR = monthly_limit;
- Resource monitors track CREDIT usage (not storage). SUSPEND waits for current queries to finish. SUSPEND_IMMEDIATE cancels running queries. Only ACCOUNTADMIN can create resource monitors.
- -- Network policy: restrict by IP CREATE NETWORK POLICY corp_only ALLOWED_IP_LIST = ('192.168.1.0/24', '10.0.0.0/8') BLOCKED_IP_LIST = ('192.168.1.99'); -- Apply at account level (ACCOUNTADMIN) ALTER ACCOUNT SET NETWORK_POLICY = corp_only; -- Apply at user level ALTER USER alice SET NETWORK_POLICY = corp_only;
- Network policies control which IP addresses can connect to Snowflake. Applied at account level or per-user level. Blocked IPs override allowed IPs.
- -- Object tags: attach metadata labels to any Snowflake object CREATE TAG sensitivity_level ALLOWED_VALUES 'PUBLIC', 'CONFIDENTIAL', 'RESTRICTED'; ALTER TABLE customers SET TAG sensitivity_level = 'CONFIDENTIAL'; ALTER TABLE customers ALTER COLUMN email SET TAG sensitivity_level = 'RESTRICTED'; -- Query tag assignments SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES WHERE TAG_NAME = 'SENSITIVITY_LEVEL'; -- Data Classification: automatically tag columns with sensitivity categories SELECT EXTRACT_SEMANTIC_CATEGORIES('CUSTOMERS'); CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS('CUSTOMERS', EXTRACT_SEMANTIC_CATEGORIES('CUSTOMERS'));
- Object tags label any Snowflake object for governance tracking. Data classification uses built-in functions to automatically detect and tag PII columns (email, SSN, phone, etc.) — key COF-C03 governance topic.
- -- Secondary roles: use privileges from multiple roles simultaneously ALTER USER alice SET DEFAULT_SECONDARY_ROLES = ('ALL'); -- In a session, activate secondary roles USE SECONDARY ROLES ALL; -- MFA (multi-factor authentication) best practices: -- Required best practice for ACCOUNTADMIN role users -- Uses Duo Mobile app for TOTP-based 2FA -- Enforce MFA at account level with account policies
- Secondary roles let a session combine privileges from all roles assigned to a user without switching roles. MFA is required for ACCOUNTADMIN users and is enforced via Snowflake account policies.
Data Governance: Masking and Row Access
- -- Create a column masking policy CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING -> CASE WHEN current_role() IN ('ANALYST') THEN val ELSE '****@****.com' END; -- Apply to a table column ALTER TABLE customers ALTER COLUMN email SET MASKING POLICY email_mask;
- Masking policies apply at query time based on the current role. The column is visible in schema but values are conditionally hidden. Requires Enterprise edition or higher.
- -- Create a row access policy CREATE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN -> CASE WHEN current_role() = 'GLOBAL_ANALYST' THEN TRUE ELSE region = current_user() END; -- Apply to a table ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);
- Row access policies filter entire rows based on role or user context. Rows returning FALSE from the policy body are invisible to the querying user.
- Masking policy → COLUMN-level → hides/transforms values → Column still visible, data is masked Row access policy → ROW-level → filters out entire rows → Matching rows invisible in results Both apply at QUERY TIME based on current role. Both can coexist on the same table.
- Key exam distinction: masking policies operate on column values (present but transformed), row access policies remove entire rows from results. Both are Enterprise+ features.
Secure Data Sharing and Collaboration
- -- Provider: create a share CREATE SHARE my_share; GRANT USAGE ON DATABASE analytics TO SHARE my_share; GRANT USAGE ON SCHEMA analytics.public TO SHARE my_share; GRANT SELECT ON TABLE analytics.public.sales TO SHARE my_share; -- Add consumer account to the share ALTER SHARE my_share ADD ACCOUNTS = consumer_account_name; -- Consumer: create a database from the share CREATE DATABASE shared_analytics FROM SHARE provider.my_share;
- Secure Data Sharing requires no data movement — data remains in the provider's account. Consumers pay only for their own compute. Only ACCOUNTADMIN can create shares by default.
- Secure Data Sharing rules: - Shared data is READ-ONLY for consumers - No data is copied — consumers query provider's storage - Consumer pays their own warehouse compute costs - Only SECURE views, SECURE materialized views, and SECURE UDFs can be added to shares - Regular (non-secure) views CANNOT be shared - Cross-region sharing requires database replication
- The top exam traps: consumers cannot modify shared data, regular views cannot be shared, and cross-region sharing requires replication setup.
- Reader accounts: CREATE MANAGED ACCOUNT reader_corp ADMIN_NAME = 'admin' ADMIN_PASSWORD = '...'; - Created BY provider FOR non-Snowflake customers - Provider pays compute costs for the reader account - Consumer does NOT need their own Snowflake account - Data shared via standard share mechanism
- Reader accounts let providers share data with partners who have no Snowflake account. The provider bears all compute costs for the reader account's queries.
- Collaboration features: Marketplace → public/paid data listings (powered by sharing) Data Exchange → private, curated sharing hub for partners Data Clean Rooms → multi-party analysis without raw data exposure Native Apps → deploy code + data as installable apps Replication → copy databases across regions/cloud providers
- Snowflake Marketplace, Data Exchange, and Clean Rooms all use Secure Data Sharing under the hood. Native Apps extend this to bundled applications.
COF-C03 New Topics: Cortex AI, Iceberg, Notebooks
- -- Cortex AI SQL functions (no model hosting needed) SELECT SNOWFLAKE.CORTEX.SENTIMENT(review_text) FROM customer_reviews; SELECT SNOWFLAKE.CORTEX.SUMMARIZE(article_body) FROM news_articles; SELECT SNOWFLAKE.CORTEX.TRANSLATE(text, 'en', 'fr') FROM messages;
- Cortex AI provides SQL-callable LLM functions (SENTIMENT, SUMMARIZE, TRANSLATE, COMPLETE, etc.). Data stays in Snowflake — no external API calls or model management required.
- Cortex AI components (know at concept level): Cortex Functions → SQL-callable AI: sentiment, summarize, translate, complete, classify, embed_text Cortex Search → semantic/hybrid search over documents Cortex Analyst → natural language to SQL (BI on your data) Cortex Fine-tuning → fine-tune LLMs on your own data
- COF-C03 tests conceptual knowledge of Cortex AI capabilities, not API syntax. Know what each component does and that all processing occurs within Snowflake.
- Apache Iceberg Tables in Snowflake: Native Iceberg → Snowflake manages metadata + storage, full read/write with Snowflake performance External Iceberg → files in your cloud storage, Snowflake provides query layer only Key use case: multi-engine interoperability (Snowflake + Spark + Trino + other engines on same data)
- Iceberg tables enable open data lake access. Unlike native Snowflake tables, Iceberg tables use open formats other engines can read. The exam tests the WHY, not the operational details.
- Snowflake Notebooks: - Interactive cell-based environment (SQL, Python, Markdown) - Runs inside Snowflake (compute stays in your account) - Supports Snowpark for Python transformations - Version control via Git integration - Collaborative: share notebooks across team members
- Snowflake Notebooks replace external Jupyter/Databricks notebooks for Snowflake-native development. Know the use cases and that they run on Snowflake compute — no data leaves the platform.
- -- Git integration: link a Snowflake Git repository object CREATE GIT REPOSITORY my_repo API_INTEGRATION = github_integration ORIGIN = 'https://github.com/org/repo.git'; -- Fetch latest changes ALTER GIT REPOSITORY my_repo FETCH; -- Execute a file from the repo EXECUTE IMMEDIATE FROM @my_repo/branches/main/scripts/etl.sql;
- Git integration enables CI/CD workflows within Snowflake. Stored procedures, notebooks, and SQL scripts can be version-controlled and executed directly from a linked Git repository.
- Snowpark (Python / Java / Scala): - Developer framework to write data pipelines in code - Execution happens INSIDE Snowflake — data never leaves - Uses DataFrame API (similar to Spark/Pandas) - Runs on virtual warehouse compute or serverless - Supports Stored Procedures, UDTFs, and vectorized UDFs -- Simple Snowpark Python example from snowflake.snowpark import Session session = Session.builder.configs(conn_params).create() df = session.table('orders') df.filter(col('region') == 'US').write.save_as_table('us_orders')
- Snowpark keeps data transformation logic inside Snowflake rather than pulling data out to a client. The exam tests positioning (when to use Snowpark vs SQL), not the API syntax.