General Exam Tips
- 1.Read ALL answer options before selecting — many SnowPro Core questions have two answers that seem correct, and the distinction is in a single qualifier like 'user-accessible' vs 'Snowflake-only'
- 2.There is no penalty for wrong answers — flag difficult questions and return to them, but never leave any blank
- 3.Multiple-select questions explicitly state how many answers to choose (e.g., 'Choose 2'). If you select the wrong number, the entire question is marked wrong
- 4.At 69 seconds per question (115 minutes / 100 questions), you cannot spend more than 90 seconds on any single question without creating time pressure — flag and move on
- 5.Domain 1 (31%) and Domain 4 (21%) together cover more than half the exam — if you ran out of study time, these two domains are your triage priority
- 6.When stuck between two answers, ask: does the scenario involve ONE user's query performance (scale up) or MANY concurrent users queuing (scale out)?
- 7.Questions about recovery almost always want you to distinguish between Time Travel (user can do it) and Fail-safe (only Snowflake support can do it)
- 8.When a question mentions 'no data movement' or 'no data copying,' the answer is almost certainly related to Secure Data Sharing or Zero-Copy Cloning
- 9.Questions about billing traps: Snowpipe uses serverless credits per GB, COPY INTO uses warehouse credits, result cache costs NOTHING
- 10.After creating a TASK, remember it must be explicitly RESUMED with ALTER TASK ... RESUME before it executes — a common hands-on trap
Quick Navigation
Snowflake AI Data Cloud Features and Architecture
Must-Know Facts
- Three-layer architecture: cloud services layer (query optimization, metadata, auth, access control), compute layer (virtual warehouses — CPU/memory/temp storage), storage layer (micro-partitions in cloud object storage). Each layer scales INDEPENDENTLY — this is Snowflake's core differentiator
- Virtual warehouse credit consumption doubles with each size step: XS=1, S=2, M=4, L=8, XL=16, 2XL=32, 3XL=64, 4XL=128, 5XL=256, 6XL=512 credits per hour
- Multi-cluster warehouses require Enterprise edition minimum. Standard scaling policy starts additional clusters immediately when a query queues. Economy scaling policy waits until the estimated queue time exceeds 6 minutes
- Micro-partitions: 50–500 MB immutable, compressed, columnar storage units. Snowflake stores min/max metadata per partition per column — this metadata enables partition pruning to skip irrelevant micro-partitions at query time
- Table type data protection matrix: Permanent = Time Travel (0–90 days on Enterprise) + 7-day Fail-safe. Transient = Time Travel (0–1 day max) + 0-day Fail-safe. Temporary = Time Travel (0–1 day max) + 0-day Fail-safe + session-scoped. External = no Time Travel, no Fail-safe
- Time Travel AT and BEFORE clauses: AT queries AT the specified timestamp, BEFORE queries just before. UNDROP restores dropped objects within the Time Travel window
- Fail-safe: fixed 7-day period that begins AFTER Time Travel expires. Snowflake-internal only — support engineers can recover data, but YOU cannot access Fail-safe directly
- Zero-copy cloning: creates a metadata pointer, not a data copy. Storage cost is ZERO at clone creation. New costs only accrue when rows are modified in the clone (new micro-partitions branch off)
- Snowflake editions in ascending order: Standard, Enterprise, Business Critical, Virtual Private Snowflake (VPS). Enterprise adds multi-cluster warehouses, 90-day Time Travel, materialized views, dynamic data masking, search optimization. Business Critical adds customer-managed keys (Tri-Secret Secure), HIPAA/PCI compliance, private connectivity
- Cortex AI: built-in LLM-powered SQL functions (SENTIMENT, SUMMARIZE, TRANSLATE, COMPLETE), Cortex Search (semantic search on unstructured data), Cortex Analyst (natural language to SQL). Know the use cases, not API syntax
- Apache Iceberg Tables: open table format for interoperability with Spark, Trino, etc. Snowflake can act as the Iceberg catalog and read/write files in object storage. Key difference from native tables: data lives in your cloud storage, not Snowflake-managed storage
- Encryption is always on: AES-256 at rest, TLS 1.2 in transit. No option to disable encryption
Common Traps
Confusing Pairs
Scenario Tips
Question describes a data team where 50 analysts run queries simultaneously and experience long wait times. Should they resize to a 4XL or enable multi-cluster?
Enable multi-cluster warehouse (scaling out). The problem is concurrency — many queries are queuing. Multi-cluster adds additional warehouse clusters to handle parallel query demand
Resizing to a larger single warehouse (scaling up) helps with complex individual queries, not concurrent load. 50 simultaneous users is a concurrency problem, not a query complexity problem
A company drops a critical permanent table. 12 days have passed. Can they recover it?
No, unless Enterprise edition with 90-day Time Travel was configured AND they set the retention period to >12 days. Standard edition allows only 1 day. After Time Travel expires, Fail-safe runs for 7 days (days 2–8), then data is permanently gone by day 9 on Standard edition
Candidates often forget to check the edition AND the configured retention period. Always note: (1) which edition? (2) what was the TIME_TRAVEL_IN_DAYS setting? (3) how many days have passed?
A table must be accessible to Spark, Trino, AND Snowflake without data duplication. What table type?
Apache Iceberg Table. Iceberg is an open table format stored in object storage that multiple engines can read/write natively
External tables in Snowflake are read-only. Native Snowflake tables cannot be read by Spark/Trino natively. Iceberg is the correct answer for multi-engine interoperability
A business analyst needs to run a natural language question against their Snowflake sales data and get a SQL result — without writing SQL themselves. Which Cortex AI component fits?
Cortex Analyst. It takes natural language input and generates SQL queries against structured Snowflake data, functioning as a conversational BI layer. All processing occurs inside Snowflake
Cortex Search is for semantic/hybrid search over documents. Cortex COMPLETE is a general LLM completion function. Neither is purpose-built for natural language to SQL over structured tables — that is Cortex Analyst's specific use case
Last-Minute Facts
Account Management and Data Governance
Must-Know Facts
- System-defined role hierarchy from top: ACCOUNTADMIN (holds SYSADMIN + SECURITYADMIN). SECURITYADMIN manages grants globally. USERADMIN creates users and custom roles. SYSADMIN creates and owns database objects. PUBLIC is the base role every user automatically receives
- ACCOUNTADMIN best practices: limited to a small number of users, must use MFA, should NOT be used for daily work. It does not automatically see all objects — it still requires proper role hierarchy for object access
- Custom role best practice: always grant custom roles UP to SYSADMIN (directly or via hierarchy). Roles that are not connected to SYSADMIN become orphaned — no one in SYSADMIN's hierarchy can manage those objects
- SECURITYADMIN can manage grants on ANY securable object, even objects it does not own. USERADMIN creates users/roles but cannot grant object-level privileges
- PUBLIC role: every user in the account has this role automatically. Any privilege granted to PUBLIC is accessible by ALL users — be extremely careful
- Resource monitors: created by ACCOUNTADMIN. Track credit usage (NOT storage). Can be assigned to the account level or individual warehouses. Actions: NOTIFY (alert only), SUSPEND (finish current queries then stop), SUSPEND_IMMEDIATELY (kill queries instantly). Each warehouse can have one resource monitor
- Dynamic data masking: column-level security. The masking policy applies at QUERY TIME based on the executing role — not at write time. The same column returns different values to different roles
- Row access policies: row-level security. Entire rows are hidden from roles that do not meet the policy condition. Multiple policies can be applied but only one per table/view
- Object tags: key-value metadata labels applied to Snowflake objects (databases, schemas, tables, columns). Used for governance, data classification, and lineage tracking
- Network policies: IP allowlists/blocklists applied at account level or per user. Applied at connection time. Can override account-level policies with user-level policies
- Secondary roles: a session can use multiple roles simultaneously (primary + secondary). Secondary roles provide additional privileges without explicitly switching roles. COF-C03 new topic
- Data classification: automated scanning to detect and tag sensitive data categories (PII, financial) using CLASSIFY_SCHEMA or similar functions. Tags are applied based on classification results
- ACCOUNT_USAGE schema: historical data, up to 1-year retention, 45-minute to 3-hour latency. INFORMATION_SCHEMA: real-time data, shorter retention (7 days to 6 months depending on the view)
Common Traps
Confusing Pairs
Scenario Tips
An analyst role should see customer phone numbers as '***-***-1234' (last 4 digits only), while the data steward role sees the full number. What do you implement?
Dynamic data masking policy on the phone_number column. The policy function checks the current role and returns the full number for data stewards, masked value for all others
Row access policies would hide the entire row, not mask a column value. Secure views could achieve this but masking policies are the purpose-built Snowflake feature and the exam answer
A company needs to automatically stop all warehouse activity if monthly credit spend exceeds 10,000 credits to avoid budget overrun. What object and setting?
Create a resource monitor with a monthly quota of 10,000 credits and set the action to SUSPEND_IMMEDIATELY. Assign it at the account level so all warehouses are covered
Auto-suspend on a warehouse stops the warehouse after inactivity — it does not enforce a credit budget. Auto-suspend does not know anything about cumulative monthly spend
A data engineer creates a custom role ANALYST_ROLE. Users have the role but SYSADMIN cannot transfer ownership of the tables owned by ANALYST_ROLE. Why?
ANALYST_ROLE was not granted to SYSADMIN (or to a role in SYSADMIN's hierarchy). Orphaned roles create objects that fall outside the standard management hierarchy
Candidates assume ACCOUNTADMIN handles this automatically. Object ownership management goes through SYSADMIN's hierarchy, and that requires the custom role to be explicitly connected
Last-Minute Facts
Data Loading, Unloading, and Connectivity
Must-Know Facts
- COPY INTO <table>: bulk load from stages into tables. Requires an active virtual warehouse. Key options: ON_ERROR (CONTINUE/SKIP_FILE/ABORT_STATEMENT), VALIDATION_MODE (RETURN_ERRORS/RETURN_N_ROWS/RETURN_ALL_ERRORS — dry run only, no data loaded), PURGE (delete staged files after successful load), FORCE (reload already-loaded files)
- COPY INTO <location>: bulk unload data from a table to a stage. Supports partitioning output files, max file size, and all supported file formats
- Stage types: user stage (@~) — each user has one, cannot be altered or dropped, tied to the user. Table stage (@%tablename) — each table has one, cannot be altered or dropped, files not copied between stages. Named internal stage (@stagename) — the most flexible, can have file formats, encryption settings, can be managed. External stage — points to S3/GCS/Azure Blob, requires storage integration or credentials
- Snowpipe: serverless continuous ingestion. Does not use virtual warehouse compute. Triggered by cloud event notifications (S3 SQS, Azure Event Grid, GCS Pub/Sub) in auto-ingest mode, or by REST API calls. Billed on a simplified per-GB ingested model (as of December 2025)
- Snowpipe does NOT automatically delete staged files after loading. You must run REMOVE @stage/path to clean up loaded files
- PUT command: uploads local files to an INTERNAL stage. ONLY works from SnowSQL CLI, JDBC, or ODBC drivers. Does NOT work in the Snowsight web UI
- GET command: downloads files from an INTERNAL stage to a local machine. Same connectivity requirements as PUT — SnowSQL/JDBC/ODBC only
- VARIANT data type: stores semi-structured data (JSON, Avro, ORC, Parquet, XML). Access nested fields with colon notation (src:address:city) or bracket notation. VARIANT column holds the entire document
- FLATTEN function: explodes arrays or objects in VARIANT into individual rows. Use with LATERAL for correlated expansion. The output columns include KEY, VALUE, INDEX, PATH, SEQ
- Storage integrations: named objects that hold cloud provider credentials and trust relationships for external stage access. Preferred over embedding credentials directly in stage DDL
- File formats supported: CSV, JSON, Avro, ORC, Parquet, XML. File format objects can be created as reusable objects or defined inline in COPY statements
- Transformation during COPY INTO load: SELECT statement can reorder columns, cast types, omit columns, and apply simple expressions during the load process — no separate transformation step needed
Common Traps
Confusing Pairs
Scenario Tips
Files arrive in S3 every 30 seconds and must be loaded into Snowflake within a minute. Which approach?
Snowpipe with auto-ingest using S3 event notifications (SQS). Snowflake provisions an SQS queue per pipe and S3 delivers object-created events to it. Snowpipe is triggered by file arrival events and uses serverless compute — no warehouse needs to be running
A task with COPY INTO running every minute would work but introduces warehouse costs and 1-minute maximum latency. Snowpipe gives true near-real-time loading without a warehouse
A COPY INTO command with VALIDATION_MODE='RETURN_ALL_ERRORS' runs. The engineer then queries the target table and finds it empty. Is this a bug?
No — this is expected behavior. VALIDATION_MODE is a dry run. The command validates files but loads ZERO rows regardless of validation results. To actually load data, run COPY INTO without VALIDATION_MODE
Candidates expect partial loading (valid rows in, bad rows rejected). That behavior requires ON_ERROR = CONTINUE, not VALIDATION_MODE
A developer wants to upload a 500 MB CSV file from their laptop to a named internal stage. They log into Snowsight and try to use the PUT command. What happens?
The PUT command fails — it is not supported in the Snowsight web UI. The developer must use SnowSQL CLI, JDBC, or ODBC driver to execute PUT
Snowsight supports many SQL commands, so candidates assume PUT works there too. This is one of the more reliably tested traps in the loading domain
Last-Minute Facts
Performance Optimization, Querying, and Transformation
Must-Know Facts
- Three caching tiers: (1) Metadata cache — cloud services layer, instant responses for COUNT/MIN/MAX/DISTINCT on micro-partition metadata, no warehouse needed. (2) Result cache — cloud services layer, stores exact query results for 24 hours of inactivity (extended up to 31 days with repeated execution), no warehouse needed, free. (3) Warehouse cache — local SSD on virtual warehouse nodes, caches raw data from previous scans, requires a RUNNING warehouse
- Result cache conditions: exact query text match (whitespace matters), same session role, no underlying table data changes, no relevant session parameter changes. ANY change breaks the cache
- Warehouse cache is lost when the warehouse SUSPENDS. Auto-suspend that is too aggressive eliminates cache warmth and degrades repeated query performance
- Query Profile: visual DAG of query execution operators. Key things to look for: large table scans with no partition pruning (fix: clustering key), exploding joins, spilling to local or remote storage (fix: larger warehouse), inefficient remote I/O
- Spilling to disk: occurs when warehouse memory is exhausted. First spills to local disk (SSD), then to remote storage (much slower). Visible in Query Profile under SPILL columns. Fix: upsize warehouse or reduce data volume per query
- Partition pruning: Snowflake uses micro-partition metadata (min/max per column per partition) to skip irrelevant partitions at query time. Effective pruning is the single biggest driver of fast query performance on large tables
- Clustering keys: user-defined sort columns that organize data within micro-partitions for better pruning. Best for large tables (multi-TB+) with well-known, consistent filter patterns (e.g., always filtering on date_column). Automatic Clustering (Enterprise+) maintains cluster order over time — this incurs ongoing background compute costs
- Poor candidates for clustering: small tables, tables with random write patterns, tables with many equally-frequent query patterns. Clustering ALWAYS costs maintenance credits — only cluster when the query savings outweigh the cost
- Materialized views: pre-computed, Snowflake-auto-refreshed query results. Enterprise+ only. Limitations: base table must be a SINGLE table (no joins), no UDFs, no subqueries in the view, no nesting of materialized views. Auto-refresh incurs background compute costs
- Streams: CDC (change data capture) objects on tables. Standard stream: captures inserts, updates, and deletes. Append-only stream: captures inserts only (more efficient for append-only workloads). Streams include metadata columns: METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
- Tasks: scheduled SQL execution. Must be RESUMED with ALTER TASK ... RESUME after creation — tasks do not start automatically. Use SYSTEM$STREAM_HAS_DATA() in the WHEN clause to skip task runs when the stream is empty
- Search Optimization Service: Enterprise+ background service that accelerates point lookups and equality/substring searches on VARIANT fields. Applied per-table as a managed service — adds storage cost for the search access path
- Automatic Query Acceleration (AQA): automatically offloads eligible portions of a long-running query to serverless compute, helping warehouses that are occasionally overwhelmed by outlier queries
Common Traps
Confusing Pairs
Scenario Tips
A query runs in 2 seconds on Monday but 40 seconds on Tuesday with identical query text and unchanged data. Most likely cause?
The result cache expired (24-hour TTL). On Monday, the query hit the cache. On Tuesday, after >24 hours of inactivity, the cache expired and the full query re-executed with warehouse compute
Candidates may blame warehouse resizing or data changes, but the scenario specifies unchanged data — result cache expiration is the designed explanation
The Query Profile shows a table scan of 800M rows where only 50K rows match the WHERE clause filter. No partition pruning is occurring. What is the best fix?
Add a clustering key on the filter column. This co-locates related data within micro-partitions so the pruning metadata can skip irrelevant partitions, dramatically reducing rows scanned
Upsizing the warehouse processes the same 800M rows faster but does not reduce the data scanned. The root problem is no pruning — the solution is better data organization, not more compute
A pipeline uses a stream to capture changes and a task to process them on a 10-minute schedule. The task runs even when no changes have occurred. What should be added to avoid unnecessary runs?
Add WHEN SYSTEM$STREAM_HAS_DATA('<stream_name>') to the task definition. This causes the task to skip execution when the stream has no unconsumed change records
Increasing the task interval from 10 to 60 minutes reduces runs but still executes unnecessarily on empty streams. SYSTEM$STREAM_HAS_DATA is the correct mechanism to skip empty-stream runs
A MERGE operation on a 10TB table spills to remote storage in the Query Profile. The current warehouse is Large. What is the recommended fix?
Upsize to an XL or 2XL warehouse to provide more memory. Remote storage spilling indicates the warehouse memory is exhausted, severely degrading performance. More memory eliminates the spill
Enabling multi-cluster (scaling out) would help with concurrent queries, not with a single memory-hungry MERGE operation. The right fix for memory-exhausted single queries is scaling UP, not OUT
Last-Minute Facts
Data Collaboration
Must-Know Facts
- Secure Data Sharing: data stays in the PROVIDER's account — no data is copied or moved. Consumers access a live, read-only view of the provider's data. Consumers pay only for their own compute (their warehouse) to query shared data. No storage cost for the consumer
- Shared objects are ALWAYS read-only for consumers. There is no mechanism for consumers to write back to shared data
- Only SECURE views, SECURE materialized views, and SECURE UDFs can be added to a share. Regular (non-secure) views cannot be shared because their underlying query logic would be exposed
- Sharing within the SAME region works natively. Cross-region or cross-cloud sharing requires database replication to move data to a secondary account first
- Reader accounts: Snowflake accounts created and paid for by the data provider, for consumers who do not have their own Snowflake account. The provider pays ALL compute costs for reader accounts
- Snowflake Marketplace: public data marketplace where providers publish listings (datasets, apps). Powered by Secure Data Sharing under the hood. Listings can be free or paid. Consumers do not always get data for free
- Data Exchange: private, invitation-only data sharing hub for controlled sharing within an organization or with selected partners. More controlled than the public Marketplace
- Data clean rooms: privacy-preserving multi-party collaboration. Multiple organizations can analyze overlapping data without exposing raw records to each other. Uses cryptographic and access control techniques
- Native Apps Framework: build and distribute applications packaged with Snowflake data and logic. Consumers install the app into their own account. Apps run in the consumer's account and can access the provider's shared data
- Secure views hide the underlying DDL — consumers cannot see the view definition or base table structure. Essential when sharing derived data to protect intellectual property or source schema
- Database replication: replicate a database across Snowflake accounts (same or different clouds/regions). Used for disaster recovery, data locality compliance, and enabling cross-region sharing
Common Traps
Confusing Pairs
Scenario Tips
A hospital wants to share patient data with a research institution that does not have Snowflake. The hospital wants the institution to be able to query the data. What do they create?
Create a reader account for the research institution. The hospital (provider) creates and pays for the reader account. The institution queries shared data through this account without needing their own Snowflake subscription
Snowflake Marketplace is for public data distribution, not for sharing sensitive patient data with a specific partner. Direct share requires the consumer to have their own Snowflake account
A company shares a view of a sales table with a partner account. The partner reports seeing the full underlying SQL of the view. What went wrong?
A regular (non-secure) view was shared instead of a secure view. Only secure views hide their DDL from consumers. The fix is to recreate the view as a SECURE view and add it to the share instead
Network policies or masking policies would not prevent DDL exposure. The issue is the view type — only SECURE views can be safely shared without exposing the query logic
A European company wants to share data with a US partner's Snowflake account (different region). What extra step is required?
Database replication must be set up first — replicate the source database to a Snowflake account in the US region, then create a share from that US-region account
Creating a direct share from EU to US will fail — cross-region sharing is not supported without replication. The exam tests whether candidates know this limitation