AWS SAP-C02 (4 domains)

SAP-C02 Exam Notes

Last-minute traps, must-know facts, and scenario tips for the AWS Certified Solutions Architect – Professional exam.

General Exam Tips

1. Read the LAST sentence of each question first: it states what you are actually being asked. Then read the scenario for constraints.
2. Flag and skip: never spend more than about 2.5 minutes on any question during the first pass. With 75 questions in 180 minutes, you average 2.4 minutes per question.
3. Every question has 2–3 options that sound plausible. The exam is designed to test trade-offs, not trivia. Ask: which option best satisfies ALL stated constraints simultaneously?
4. Hunt for constraint keywords before picking an answer: 'least operational overhead', 'minimize cost', 'no downtime', 'compliance requirement', 'existing investment'. These words eliminate whole answer categories.
5. If an answer requires you to write custom code or manage infrastructure that a managed service already handles, it is almost always wrong at the Professional level.
6. Trust your first instinct: only change an answer if you can articulate a specific reason why it is wrong. Random second-guessing is an easy way to turn a passing score into a failing one.
7. Non-native English speakers: request ESL accommodation before exam day for an extra 30 minutes. It is free and widely available.
8. The exam has unscored pilot questions. Some questions that feel impossible may not count, so do not let them derail you emotionally.
9. You need a scaled score of 750 (on a 100–1000 scale). Of the 75 questions, 65 are scored and 10 are unscored pilots. Scoring is scaled (not a raw percentage), but as a rough rule you can miss about 16–17 scored questions and still pass. Pace accordingly and do not sacrifice a certain question for an uncertain one.
10. On multi-select questions, the number of required answers is stated. If it says 'choose 2', eliminate options until exactly 2 remain; partial credit does not apply.


Domain 2 (29% of exam)

Design for New Solutions

Must-Know Facts

DynamoDB Global Tables = multi-master active-active across regions, last-writer-wins conflict resolution, asynchronous replication typically under 1 second. NOT appropriate when strong consistency for concurrent writes is required.
Aurora Global Database = single writer, up to 5 read-only secondary regions, storage-level replication under 1 second lag, secondary can be promoted in under 1 minute. Use when write consistency is critical but you still want multi-region reads.
Step Functions Standard vs Express: Standard = up to 1 year duration, exactly-once semantics, auditable execution history, charged per state transition. Express = up to 5 minutes, at-least-once, high-volume (100K+ executions/sec), charged per request and duration. If the exam gives you a human-approval workflow, that requires Standard.
API Gateway's 29-second integration timeout for Lambda integrations is the number to know, and exam answers treat it as a hard limit. (Since June 2024, Regional and private REST APIs can request a higher timeout through a quota increase, which may lower the account's throttle quota; HTTP APIs cap at 30 seconds.) Any workflow that might run longer still needs an asynchronous design (SQS, Step Functions, EventBridge).
Lambda reserved concurrency guarantees capacity AND throttles above the limit. Provisioned concurrency eliminates cold starts but costs money at idle. These are different tools for different problems.
EventBridge can route events cross-account and cross-region. A central event bus in a security or logging account can aggregate events from all org accounts.
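Kinesis Data Streams retains records for 24 hours by default, configurable up to 365 days. Multiple consumers can independently process the same stream. Firehose is delivery-only: no custom consumers (it can only invoke a Lambda function to transform records in flight), minimum 60-second buffering.
SQS visibility timeout must be longer than your Lambda function timeout (AWS recommends at least 6x the function timeout when SQS is a Lambda event source), or a message can become visible again and be processed twice. A DLQ captures messages that still fail after maxReceiveCount receives.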
For strictly ordered message processing with exactly-once deduplication: SQS FIFO. For fan-out to multiple subscribers: SNS + SQS fan-out pattern. These are different problems.
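A minimal sketch of the visibility-timeout rule above, assuming AWS's 6x guidance and SQS's 12-hour maximum visibility timeout. The helper names are hypothetical, not part of boto3 or any AWS SDK.

```python
# Hypothetical helpers illustrating SQS visibility-timeout sizing for a Lambda consumer.
SIX_X_GUIDANCE = 6                    # AWS guidance: visibility timeout >= 6x function timeout
SQS_MAX_VISIBILITY_S = 12 * 60 * 60   # SQS caps the visibility timeout at 12 hours


def recommended_visibility_timeout(function_timeout_s: int) -> int:
    """Visibility timeout (seconds) that leaves room for Lambda retries."""
    return min(function_timeout_s * SIX_X_GUIDANCE, SQS_MAX_VISIBILITY_S)


def risks_duplicate_processing(visibility_timeout_s: int, function_timeout_s: int) -> bool:
    """True if a message can reappear while an invocation may still be working on it."""
    return visibility_timeout_s <= function_timeout_s


# Default 30 s visibility timeout with a 60 s function: the message can be picked up twice.
print(risks_duplicate_processing(30, 60))   # True
print(recommended_visibility_timeout(60))   # 360
```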

Common Traps

Trap: Choosing DynamoDB Global Tables for an order processing system that needs strong consistency

Reality: Global Tables use last-writer-wins conflict resolution with asynchronous replication. Two regions can accept writes to the same key simultaneously with no locking: the last write wins and the other is silently discarded. For order processing requiring consistency, use Aurora Global Database (single writer) or accept writes only in one region.
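The silent discard is easier to see in a toy model. This sketch is not DynamoDB's internal implementation; it assumes each region stamps its write with a timestamp and the merge keeps only the latest one. Region names, timestamps, and values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str
    value: str
    timestamp_ms: int  # hypothetical timestamp attached by the accepting region


def last_writer_wins(*writes: Write) -> Write:
    """Keep only the write with the latest timestamp; every other write is dropped."""
    return max(writes, key=lambda w: w.timestamp_ms)


# Two regions update the same order at nearly the same moment.
us = Write("us-east-1", "order-42: status=SHIPPED", 1_000)
eu = Write("eu-west-1", "order-42: status=CANCELLED", 1_005)

winner = last_writer_wins(us, eu)
print(winner.region, winner.value)  # eu-west-1 order-42: status=CANCELLED
# The SHIPPED write is gone, and no error was raised to either writer.
```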

Trap: Using Step Functions Express Workflow for a business process that requires human review

Reality: Express Workflows max out at 5 minutes, are at-least-once, and do not support the callback (.waitForTaskToken) integration pattern. Human review steps can take hours or days. Use a Standard Workflow with the Wait for Callback pattern (taskToken), which pauses until a callback is received, for up to the 1-year execution limit.

Trap: Thinking API Gateway can handle a Lambda integration that might take 2 minutes

Reality: API Gateway's integration timeout is 29 seconds by default. Only Regional and private REST APIs can raise it (by quota request, since June 2024), so the exam-safe answer for long-running operations is still an async design. The synchronous API should accept the request, return a 202 Accepted with a job ID, and hand the work to async processing (Step Functions, SQS + Lambda). A separate status-check endpoint lets clients poll for completion.

Trap: Choosing CloudFront Functions when the requirement involves calling an external API at the edge

Reality: CloudFront Functions run in a severely restricted environment: no network calls, no file system, sub-millisecond execution only. Lambda@Edge supports outbound network calls, up to 5 seconds (viewer) or 30 seconds (origin). If the scenario mentions auth checks against an external IdP or origin selection based on a database, it requires Lambda@Edge.

Trap: Using Kinesis Data Firehose when the question requires processing within seconds

Reality: Firehose has a minimum 60-second buffer window. It is near-real-time, not real-time. If sub-second or <10-second processing latency is required, use Kinesis Data Streams with a Lambda consumer or KCL application.

Trap: Assuming Aurora Serverless v2 always costs money at idle because it cannot scale to zero

Reality: Aurora Serverless v2 added scale-to-zero support in November 2024 (automatic pause and resume when the minimum capacity is set to 0 ACUs). While paused, you pay only for storage, not compute. This makes it viable for truly intermittent workloads. However, resume latency (seconds) is a real cost: if the question emphasizes sub-second availability for every request, DynamoDB or a provisioned database is still a better fit. Do not eliminate Aurora Serverless v2 just because the question mentions periods of zero traffic.

Confusing Pairs

DynamoDB Global Tables vs. Aurora Global Database

Global Tables = multi-master, any region accepts writes, eventual consistency, last-writer-wins. Use for globally distributed low-latency reads AND writes where occasional conflict overwrites are acceptable. Aurora Global = single writer region, other regions are read-only replicas, strong consistency for writes. Use when write consistency is non-negotiable but you want multi-region reads and fast DR failover.

Step Functions Standard Workflow vs. Step Functions Express Workflow

Standard = long-running (up to 1 year), exactly-once, full execution history visible in console, costs per state transition. Use for business workflows, human-in-the-loop, anything needing audit trails. Express = short-lived (max 5 min), at-least-once, high throughput, costs per execution duration. Use for IoT event processing, high-volume data pipelines, streaming transformations. The exam will give you a human approval step — that is always Standard.

Lambda Reserved Concurrency vs. Lambda Provisioned Concurrency

Reserved Concurrency = sets a hard cap on how many concurrent executions a function can have. Guarantees that capacity is available (no competition) but also acts as a throttle. Provisioned Concurrency = pre-warms execution environments to eliminate cold starts. Costs money even when no invocations are happening. If the question is about protecting downstream services from Lambda bursts, choose reserved. If the question is about eliminating cold start latency for latency-sensitive APIs, choose provisioned.

Amazon Kinesis Data Streams vs. Amazon Data Firehose

Kinesis Data Streams = real-time (<200ms), custom consumer logic, multiple independent consumers, configurable retention (1-365 days), requires shard management or on-demand mode. Firehose = fully managed delivery to S3/Redshift/OpenSearch/Splunk, minimum 60-second buffer, no custom consumer, auto-scales. If the question asks for real-time processing with custom logic, use Streams. If the question just needs data reliably delivered to a destination, Firehose is simpler.

CloudFront Functions vs. Lambda@Edge

CloudFront Functions = viewer request/response only, sub-millisecond, no network access, 10 KB code limit, lowest cost. Use for URL rewrites, header manipulation, cache key normalization. Lambda@Edge = all four CloudFront trigger points, up to 30 seconds (origin), full Node.js/Python, can make network calls. Use for auth at edge, A/B testing, dynamic origin selection. The key disqualifier for CloudFront Functions: any network call or computation beyond simple string manipulation.

SQS Standard vs. SQS FIFO

Standard = nearly unlimited throughput, at-least-once delivery, best-effort ordering. Messages may arrive out of order or be delivered more than once, so design consumers to be idempotent. FIFO = 300 API calls per second per action by default (up to 3,000 messages per second with batching; high-throughput mode raises this further), exactly-once processing, strict ordering within a message group. Use FIFO when order matters and duplicate processing would cause incorrect results (financial transactions, sequential state changes).
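For SQS Standard, "design consumers to be idempotent" usually means recording which business operations have already been applied. A minimal in-memory sketch follows; the message fields (payment_id, account, amount) are hypothetical, and a real consumer would keep processed IDs in a durable store (for example, a DynamoDB table written with a conditional put) rather than a Python set.

```python
from typing import Dict, Iterable, Set


def apply_payments(messages: Iterable[dict], processed_ids: Set[str],
                   balances: Dict[str, int]) -> None:
    """Apply each payment once, even if SQS delivers the same message more than once."""
    for msg in messages:
        if msg["payment_id"] in processed_ids:
            continue  # duplicate delivery: already applied, skip it
        balances[msg["account"]] = balances.get(msg["account"], 0) + msg["amount"]
        processed_ids.add(msg["payment_id"])


# Simulated at-least-once delivery: payment p1 arrives twice.
batch = [
    {"payment_id": "p1", "account": "A", "amount": 100},
    {"payment_id": "p1", "account": "A", "amount": 100},
    {"payment_id": "p2", "account": "A", "amount": 50},
]
seen: Set[str] = set()
balances: Dict[str, int] = {}
apply_payments(batch, seen, balances)
print(balances)  # {'A': 150}
```

Keying on a business ID rather than the SQS MessageId also absorbs duplicates created by producer retries, which arrive with new message IDs.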

Scenario Tips

If the question asks about:

Question describes a multi-region architecture where 'writes must be strongly consistent' and asks which database provides active-active with sub-second replication

Answer:

The answer is Aurora Global Database, NOT DynamoDB Global Tables. Aurora Global has a single writer — it is active-passive from the write perspective. The trap answer is Global Tables because 'active-active' sounds right, but Global Tables has last-writer-wins conflict resolution which violates strong consistency.

Distractor to avoid:

DynamoDB Global Tables is wrong because it uses eventual consistency with last-writer-wins. It is multi-master but that does not mean strongly consistent.

If the question asks about:

Question asks for a 'cost-effective' solution for a workload that is bursty, with traffic spikes 10x base load for short periods, needing to process messages reliably

Answer:

SQS + Lambda is almost always the answer for bursty workloads. SQS absorbs the burst, Lambda scales automatically. The question is testing whether you know that SQS acts as a buffer and Lambda scales with the queue depth without over-provisioning servers.

Distractor to avoid:

EC2 Auto Scaling is wrong because it takes minutes to scale and you'd need to over-provision to handle instantaneous spikes. Kinesis is wrong if the question says 'messages' not 'streaming data'.

If the question asks about:

Question describes a document workflow where some documents need human review before processing continues, taking anywhere from hours to days

Answer:

Step Functions Standard Workflow with the Wait for Callback (taskToken) pattern. The task sends a token to an external system (email, SQS, etc.), then pauses until the token comes back (up to the 1-year execution limit, or a TimeoutSeconds you set). When a human approves, their tooling calls SendTaskSuccess with the token, and the workflow resumes.

Distractor to avoid:

Express Workflow cannot wait more than 5 minutes. SQS-based approach loses the workflow state and audit trail.
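A sketch of the Standard Workflow callback state, written as a Python dict of Amazon States Language so it can be dumped to JSON. The queue URL, account ID, documentId field, and 7-day timeout are hypothetical; the .waitForTaskToken resource suffix and the $$.Task.Token context path are the documented callback mechanism.

```python
import json

# Hypothetical values: replace with your own queue URL and input field names.
REVIEW_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/review-requests"

state_machine = {
    "Comment": "Human review using the callback (.waitForTaskToken) pattern",
    "StartAt": "RequestHumanReview",
    "States": {
        "RequestHumanReview": {
            "Type": "Task",
            # Send the task token to a review queue, then pause until it is returned.
            "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
            "Parameters": {
                "QueueUrl": REVIEW_QUEUE_URL,
                "MessageBody": {
                    "documentId.$": "$.documentId",  # hypothetical input field
                    "taskToken.$": "$$.Task.Token",  # context object: this task's token
                },
            },
            "TimeoutSeconds": 604800,  # fail the task if nobody responds within 7 days
            "Next": "ContinueProcessing",
        },
        "ContinueProcessing": {"Type": "Pass", "End": True},
    },
}

# The JSON string you would pass as the definition when creating the state machine.
definition = json.dumps(state_machine, indent=2)
print(definition)

# Reviewer side, once approved (boto3):
# boto3.client("stepfunctions").send_task_success(
#     taskToken=token_from_message, output=json.dumps({"approved": True}))
```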

If the question asks about:

Question asks for near-real-time delivery of clickstream data to S3 for analysis, with no requirement for custom processing logic

Answer:

Amazon Data Firehose directly to S3. It handles buffering, batching, compression, and encryption automatically. No infrastructure to manage, no code to write.

Distractor to avoid:

Kinesis Data Streams is wrong for this scenario because it requires a consumer application to read and write to S3 — adding unnecessary complexity. Firehose is the managed delivery answer.

Last-Minute Facts

1. API Gateway integration timeout: 29 seconds by default (Regional and private REST APIs can request more since June 2024; treat it as a hard limit on the exam)

2. Step Functions Standard max duration: 1 year. Express max duration: 5 minutes

3. DynamoDB item size limit: 400 KB

4. Lambda max timeout: 15 minutes. Lambda max memory: 10,240 MB

5. SQS max message size: 256 KB. SQS max retention: 14 days. SQS default visibility timeout: 30 seconds

6. SNS max message size: 256 KB

7. Aurora Global Database secondary regions: up to 5. Promotion time: under 1 minute

8. DynamoDB Global Tables replication lag: typically under 1 second

9. Kinesis Data Streams default retention: 24 hours. Max retention: 365 days

10. EventBridge event size limit: 256 KB

Domain 1 (26% of exam)

Design Solutions for Organizational Complexity

Must-Know Facts

SCPs set the maximum permissions for accounts and OUs. They do NOT grant permissions: an action allowed by an SCP still needs an IAM policy that also allows it. An explicit Deny in an SCP overrides any Allow.
The management account of an AWS Organization is NEVER affected by SCPs: not even an explicit Deny in an SCP attached to the root OU restricts it.
SCP inheritance cascades downward: an SCP on a parent OU applies to all child OUs and all accounts within them. An account can be restricted by multiple SCPs simultaneously.
Control Tower preventive guardrails are implemented as SCPs. Detective guardrails are implemented as AWS Config rules. These are different mechanisms with different enforcement behavior.
Transit Gateway is Regional. Cross-region connectivity requires Transit Gateway peering between two TGWs in different regions.
VPC Peering is NOT transitive: if A peers with B and B peers with C, A cannot reach C through B. At scale, Transit Gateway replaces the n(n-1)/2 peering mesh.
RAM shared subnets: the owning account creates the VPC and subnets, shares specific subnets to participant accounts. Participants deploy resources into the shared subnets but do not control the VPC or subnets.
Direct Connect does NOT encrypt traffic. For encryption over Direct Connect, add an IPsec VPN tunnel (MACsec provides Layer 2 encryption on dedicated 10G+ connections).
IAM Identity Center (SSO) is the preferred approach for human access to multiple accounts. It creates IAM roles (permission sets) in each account — do not create individual IAM users per account.
For cross-account access by applications, always use IAM roles with trust policies and STS AssumeRole. Never use long-term access keys.

Common Traps

Trap: Thinking an SCP Allow is enough to grant a user permissions

Reality: SCPs define the maximum possible permissions but grant nothing on their own. A user still needs an IAM policy that explicitly allows the action. Effective permissions = IAM identity policy ∩ Permission Boundary (if set) ∩ SCPs. Every layer that applies must allow the action, and an explicit Deny in any of them wins.
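A simplified evaluator for the intersection rule above. It is a sketch under stated assumptions: exact action strings only (no wildcards), and it ignores resource-based policies, session policies, and the other policy types that real IAM evaluation also considers. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PolicyLayer:
    """One policy layer reduced to exact-match allowed and denied actions."""
    allow: Set[str] = field(default_factory=set)
    deny: Set[str] = field(default_factory=set)


def is_allowed(action: str, identity: PolicyLayer, scp: PolicyLayer,
               boundary: Optional[PolicyLayer] = None) -> bool:
    layers = [identity, scp] + ([boundary] if boundary is not None else [])
    # An explicit Deny in any layer overrides every Allow.
    if any(action in layer.deny for layer in layers):
        return False
    # Otherwise the action must be allowed by every layer that applies.
    return all(action in layer.allow for layer in layers)


scp = PolicyLayer(allow={"s3:GetObject", "ec2:RunInstances"})
identity = PolicyLayer(allow={"s3:GetObject"})

print(is_allowed("s3:GetObject", identity, scp))      # True
print(is_allowed("ec2:RunInstances", identity, scp))  # False: the SCP allows it, IAM does not
```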

Trap: Applying an SCP to the management account to restrict it

Reality: SCPs never apply to the management account. The management account is exempt by design, which is exactly why AWS recommends running no workloads in it. An SCP on the root OU will restrict all accounts except the management account.

Trap: Assuming Direct Connect provides encryption

Reality: Direct Connect is a private dedicated connection but traffic is not encrypted in transit by default. To encrypt traffic over Direct Connect, you either run a VPN tunnel over it (IPsec) or use MACsec for Layer 2 encryption on dedicated 10 Gbps/100 Gbps connections. If a question mentions compliance requiring encryption in transit on the private link, the answer involves adding VPN or MACsec.

Trap: Using VPC Peering for a hub-and-spoke network with 10+ VPCs

Reality: VPC Peering does not support transitive routing and does not scale. 10 VPCs fully meshed requires 45 peering connections. Transit Gateway is the answer for enterprise-scale hub-and-spoke: one TGW attachment per VPC, centralized route tables, supports segmentation. Nearly every SAP-C02 enterprise networking scenario uses Transit Gateway.
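The arithmetic behind the 45-connection figure: a full mesh needs n(n-1)/2 peering connections, while a Transit Gateway hub needs one attachment per VPC. A small sketch (the function names are hypothetical):

```python
def full_mesh_peering_connections(vpc_count: int) -> int:
    """Peering links so every VPC reaches every other VPC directly: n(n-1)/2."""
    return vpc_count * (vpc_count - 1) // 2


def transit_gateway_attachments(vpc_count: int) -> int:
    """Hub-and-spoke: one attachment per VPC to a single regional Transit Gateway."""
    return vpc_count


for n in (3, 10, 50):
    print(n, full_mesh_peering_connections(n), transit_gateway_attachments(n))
# 3 3 3
# 10 45 10
# 50 1225 50
```

The mesh grows quadratically while the hub grows linearly, which is why the gap becomes decisive past a handful of VPCs.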

Trap: Confusing Direct Connect Gateway with Transit Gateway for on-premises connectivity

Reality: Direct Connect Gateway lets one Direct Connect connection reach VPCs across multiple regions (a hub for the DX link itself). It does NOT enable VPC-to-VPC routing through the Direct Connect connection. For VPC-to-VPC routing AND on-premises connectivity through a single hub, associate the Direct Connect Gateway with a Transit Gateway over a transit virtual interface. The DXGW + TGW combination is the enterprise pattern.

Trap: Thinking Control Tower guardrails are all implemented the same way

Reality: Preventive guardrails use SCPs, which block actions before they happen. Detective guardrails use AWS Config rules, which detect non-compliant configurations after the fact. A question asking to PREVENT an action needs SCPs. A question asking to DETECT or AUDIT a configuration uses Config. The exam tests whether you know which type of guardrail is appropriate for the requirement.

Confusing Pairs

SCPs (Service Control Policies) vs. IAM Permission Boundaries

SCPs = applied at the account or OU level, restrict all principals in that account (except management account and service-linked roles). Set by the organization admin. Permission Boundaries = applied to a specific IAM user or role, restrict that entity's max permissions. Set by an account admin. Both restrict but neither grants. Use SCPs for org-wide governance, use permission boundaries for safe delegation (e.g., allowing a team to create roles but only up to a defined ceiling).

Transit Gateway vs. Direct Connect Gateway

Transit Gateway = regional network hub connecting VPCs, VPN connections, and Direct Connect. Handles VPC-to-VPC routing and on-premises routing in one place. Supports route table segmentation. Direct Connect Gateway = allows a single Direct Connect connection to access VPCs in multiple regions without a separate connection per region. Does NOT route VPC-to-VPC traffic. Choose TGW when you need transitive routing. Use DXGW when you just need one DX link to reach multiple regions.

AWS Control Tower vs. CloudFormation StackSets

Control Tower = a governance and account vending product. Sets up the landing zone, account factory, mandatory guardrails, centralized logging. Uses StackSets internally. CloudFormation StackSets = a deployment mechanism that pushes a CloudFormation stack to multiple accounts and regions. Use Control Tower to govern your org. Use StackSets to deploy specific infrastructure across accounts. They complement each other — Control Tower even supports deploying custom StackSets alongside its own guardrails.

AWS PrivateLink vs. VPC Peering

PrivateLink = one-directional, service-level exposure. A provider creates an Endpoint Service backed by an NLB; consumers create Interface VPC Endpoints to reach that specific service. No routing changes needed, works across overlapping CIDRs, provider cannot initiate connections. VPC Peering = bidirectional, full network-level connectivity between two VPCs. Any resource in one VPC can reach any resource in the other. Requires non-overlapping CIDRs. PrivateLink is more secure for service-to-service exposure; peering for when full network access is needed.

CloudTrail Organization Trail vs. CloudTrail Per-Account Trail

Organization Trail = created in the management account, automatically applies to all current and future member accounts, centrally managed, typically delivered to a central S3 bucket in a dedicated Log Archive account. Member accounts cannot modify or delete it. Per-Account Trail = created within each account independently, each account manages its own. For enterprise compliance, always use the Organization Trail: it provides tamper-resistant centralized audit logs.

Scenario Tips

If the question asks about:

Question describes 200 accounts and asks how to prevent EC2 launch in non-approved regions across all accounts with minimal ongoing effort

Answer:

SCP at the root OU with a Deny for all regions except approved ones, using NotAction for global services (IAM, STS, CloudFront, Route 53, Support) to prevent breaking those. The SCP cascades automatically to all accounts and new accounts added to the org.

Distractor to avoid:

AWS Config rules are wrong because they are detective (detect after the fact) and would require per-account deployment. IAM policies per account require managing 200 individual policies. Only SCPs can preventively enforce this at org scale.
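A sketch of the region-deny SCP described above, built as a Python dict and serialized to JSON. The approved regions are hypothetical, and the exempted actions are only the global services named in the answer; AWS's own published example of this policy exempts a longer list, so review it before use. Compact serialization also makes it easy to check the 5,120-character SCP size limit.

```python
import json

APPROVED_REGIONS = ["us-east-1", "eu-west-1"]  # hypothetical approved list

# Global services exempted via NotAction so the Deny does not break them.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "sts:*", "cloudfront:*", "route53:*", "support:*"]

region_deny_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideApprovedRegions",
            "Effect": "Deny",
            "NotAction": GLOBAL_SERVICE_ACTIONS,
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": APPROVED_REGIONS}
            },
        }
    ],
}

policy_text = json.dumps(region_deny_scp, separators=(",", ":"))  # compact form
print(len(policy_text), "characters (limit 5,120)")
```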

If the question asks about:

Question asks for network isolation between dev/staging/prod environments sharing a central logging VPC, across multiple accounts

Answer:

Transit Gateway with separate route tables per environment. Each environment's TGW route table has routes only to the shared logging VPC attachment, not to other environments. This provides isolation between environments while allowing all of them to reach shared services.

Distractor to avoid:

VPC Peering cannot provide this topology because it does not support transitive routing. A single shared VPC with security groups does not provide true account-level isolation.
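A toy model of the route-table segmentation above. It is not a Transit Gateway API: it assumes each attachment is associated with exactly one route table and that a route table only knows the attachments propagated into it. The attachment and route-table names are hypothetical.

```python
# Toy model of Transit Gateway segmentation (not an AWS API).
# Each attachment is associated with exactly one route table.
associations = {
    "dev-vpc": "rt-dev",
    "staging-vpc": "rt-staging",
    "prod-vpc": "rt-prod",
    "logging-vpc": "rt-shared",
}

# Each route table only knows routes propagated from these attachments.
propagations = {
    "rt-dev": {"logging-vpc"},
    "rt-staging": {"logging-vpc"},
    "rt-prod": {"logging-vpc"},
    "rt-shared": {"dev-vpc", "staging-vpc", "prod-vpc"},
}


def has_route(src: str, dst: str) -> bool:
    """True if the route table associated with src has a route to dst."""
    return dst in propagations[associations[src]]


def can_communicate(a: str, b: str) -> bool:
    """Two-way traffic needs a route in each direction."""
    return has_route(a, b) and has_route(b, a)


print(can_communicate("dev-vpc", "logging-vpc"))  # True
print(can_communicate("dev-vpc", "prod-vpc"))     # False
```

The same idea applies to any spoke-isolation requirement: spokes share one pattern of routes to the shared VPC, and only the shared route table learns every spoke.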

If the question asks about:

Question involves a partner vendor needing cross-account access to an S3 bucket in your account, with the concern about confused deputy attacks

Answer:

Create an IAM role in your account with a trust policy that allows the partner's account, and add an sts:ExternalId condition. The partner should generate a unique ExternalId for you (one per customer) and pass it on every AssumeRole call. Because another customer of the partner cannot make the partner supply your ExternalId, they cannot trick the partner's service into using your role even if they know your role ARN.

Distractor to avoid:

Resource-based S3 bucket policies alone (without ExternalId on the role) are vulnerable to confused deputy if the partner serves multiple clients.
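A sketch of the role's trust policy with the ExternalId condition, written as a Python dict. The partner account ID and ExternalId value are hypothetical placeholders (AWS recommends the partner generate a unique value per customer); on the partner's side, the boto3 STS client passes the same value through assume_role's ExternalId parameter.

```python
import json

PARTNER_ACCOUNT_ID = "111122223333"                     # hypothetical partner account
EXTERNAL_ID = "partner-generated-id-for-this-customer"  # hypothetical agreed value

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{PARTNER_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            # AssumeRole succeeds only if the caller supplies the matching ExternalId.
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

print(json.dumps(trust_policy, indent=2))

# Partner side (boto3), acting on this customer's behalf:
# boto3.client("sts").assume_role(RoleArn=customer_role_arn,
#     RoleSessionName="partner-session", ExternalId=EXTERNAL_ID)
```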

If the question asks about:

Question asks for the 'least operational overhead' way to ensure all new AWS accounts automatically have CloudTrail, GuardDuty, and Security Hub enabled

Answer:

AWS Control Tower with its mandatory guardrails enables an organization-wide CloudTrail automatically. For GuardDuty and Security Hub, designate a delegated administrator account (for example, a security tooling account) from the management account through the Organizations integration and turn on auto-enable, so new member accounts enroll automatically. This requires zero per-account configuration.

Distractor to avoid:

Lambda + EventBridge (CloudWatch Events) triggered on account creation works but requires custom code maintenance. Service-managed StackSets with automatic deployment can reach new accounts in target OUs, but you still author and maintain the templates yourself. The managed services (Control Tower, GuardDuty and Security Hub org-wide auto-enable) are always preferred for 'least operational overhead'.

If the question asks about:

Question asks for a hub-and-spoke network connecting 50 VPCs across 3 regions with on-premises data center, where spoke VPCs must not communicate with each other

Answer:

Transit Gateway per region with inter-region TGW peering. Multiple TGW route tables: one for shared services, separate tables for spoke VPCs. Configure route tables so spoke VPCs only have routes to shared services, not to each other. Attach Direct Connect to TGW via Direct Connect Gateway for on-premises connectivity.

Distractor to avoid:

VPC Peering is wrong at this scale: it has no transitive routing, so spokes cannot share a hub path to on-premises, and a 50-VPC, 3-region peering design is unmanageable. PrivateLink requires an NLB per service and cannot replace full VPC connectivity.

Last-Minute Facts

1. SCP maximum policy size: 5,120 characters per policy

2. Management account is NEVER restricted by SCPs

3. Maximum accounts per AWS Organization: a low default quota (10 for new organizations) that is adjustable through a quota increase; large enterprises run thousands

4. Maximum OUs per organization: 1,000

5. Transit Gateway: supports up to 5,000 VPC attachments per gateway

6. VPC Peering: non-transitive, no overlapping CIDR ranges, 50 active peering connections per VPC by default (adjustable up to 125)

7. Direct Connect connection speeds: 1, 10, 100, 400 Gbps (hosted: 50 Mbps – 10 Gbps)

8. IAM Identity Center permission sets are deployed as IAM roles in each member account

9. Control Tower preventive guardrails = SCPs. Detective guardrails = Config rules

10. AWS RAM resource sharing within an Organization does NOT require acceptance (enabled with one setting)

Domain 3 (25% of exam)

Continuous Improvement for Existing Solutions

Must-Know Facts

The question gives you a working architecture and asks for the BEST improvement. Your job is to identify the bottleneck first, then pick the fix. Do not add complexity that does not address the stated bottleneck.
RDS Multi-AZ standby does NOT serve read traffic. Adding Multi-AZ improves availability (HA), not read performance. If the database is the bottleneck on reads, the answer is Read Replicas or ElastiCache, not Multi-AZ.
ElastiCache write-through pattern keeps the cache always current but increases write latency. Lazy loading (cache-aside) is faster for writes but may serve stale data and has a cache miss penalty. Questions will specify whether stale data is acceptable.
CloudWatch detailed monitoring collects metrics every 1 minute (costs money per instance). Standard monitoring is every 5 minutes. High-resolution custom metrics can go to 1 second.
Auto Scaling cooldown period prevents repeated scale-out/scale-in. If scaling reacts too slowly or thrashes, review cooldown and warm-up settings before adding more capacity.
Cost Explorer shows historical spend. Compute Optimizer recommends rightsizing. Trusted Advisor flags idle resources. AWS Budgets alerts when thresholds are exceeded. These four tools are different and target different use cases.
The strangler fig pattern incrementally replaces a monolith: route specific traffic paths to new microservices while keeping the monolith running. The least-coupled component is extracted first.
S3 Lifecycle policies tier objects from Standard to Infrequent Access to Glacier based on age. Use Intelligent-Tiering for unknown/unpredictable access patterns — it monitors and moves objects automatically.
Systems Manager Session Manager replaces bastion hosts for operational access to EC2 instances. No SSH keys to manage, no open inbound ports, full audit logging. Always preferred over bastion hosts for 'least operational overhead' scenarios.

Common Traps

Trap: Adding more EC2 instances to solve a database bottleneck

Reality: If CloudWatch shows EC2 CPU is low (30-50%) but RDS CPU is at 90%+, the bottleneck is the database, not the application tier. Adding more EC2 instances increases the load on an already-saturated database and makes things worse. The correct fix is caching (ElastiCache), read replicas, or database rightsizing.

Trap: Choosing RDS Multi-AZ to improve database read performance

Reality: Multi-AZ standby is a hot standby for failover only; it does not serve reads. It improves availability (HA), not performance. To offload reads, use RDS Read Replicas or add ElastiCache in front of the database.

Trap: Confusing AWS Cost Explorer, Compute Optimizer, Trusted Advisor, and AWS Budgets

Reality: Cost Explorer = visualize and analyze past spending, trend forecasting. Compute Optimizer = ML-based right-sizing recommendations for EC2, ASG, Lambda, EBS, ECS on Fargate. Trusted Advisor = broader checklist (security, fault tolerance, performance, cost, service limits). Budgets = set spending alerts. Questions test whether you pick the right tool: if the question asks to 'identify over-provisioned resources', Compute Optimizer is the answer, not Cost Explorer.

Trap: Modernizing by containerizing a monolith onto ECS/EKS as the first step

Reality: Containerizing a monolith gives you portability but does not improve reliability or reduce blast radius. The better improvement pattern (when the question asks for reliability with least effort) is extracting one loosely-coupled component (like notifications) while adding ALB + Auto Scaling to the remaining monolith. Full containerization is more effort than required.

Trap: Using CloudWatch standard monitoring for Auto Scaling when sub-5-minute scaling response is needed

Reality: Standard CloudWatch metrics are published every 5 minutes. An Auto Scaling policy reacting to standard metrics can take up to 5 minutes to even see the breach. Enable detailed monitoring (1-minute metrics) for EC2 instances in ASGs when you need faster scaling response.

Confusing Pairs

ElastiCache Lazy Loading (Cache-Aside) vs. ElastiCache Write-Through

Lazy Loading = application checks cache first; on miss, reads from DB and writes to cache. Pro: only caches requested data. Con: cache miss adds latency, data can go stale until TTL expires. Write-Through = every write to DB also writes to cache. Pro: cache is always fresh. Con: write latency increases, cache fills with data that may never be read. If the scenario says 'some stale data is acceptable' or asks to 'minimize write latency', lazy loading wins. If 'data freshness is critical', write-through wins.
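A minimal in-memory sketch of both patterns, with Python dicts standing in for ElastiCache and the database. The class names are hypothetical, and an injectable clock makes TTL staleness easy to demonstrate.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LazyLoadingCache:
    """Cache-aside: read from cache, fall back to the DB on a miss, expire by TTL."""

    def __init__(self, db: Dict[str, Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.db = db
        self.ttl_s = ttl_s
        self.clock = clock
        self.cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # hit: may be stale if the DB changed since caching
        value = self.db[key]  # miss: extra DB round trip
        self.cache[key] = (value, self.clock() + self.ttl_s)
        return value

    def put(self, key: str, value: Any) -> None:
        self.db[key] = value  # writes touch only the DB, so they stay fast


class WriteThroughCache:
    """Write-through: every write updates the DB and the cache together."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self.cache:
            self.cache[key] = self.db[key]  # populate on first read
        return self.cache[key]

    def put(self, key: str, value: Any) -> None:
        self.db[key] = value
        self.cache[key] = value  # the extra write is the latency cost


now = [0.0]  # fake clock for the demo
lazy = LazyLoadingCache({"price": 10}, ttl_s=60, clock=lambda: now[0])
lazy.get("price")
lazy.put("price", 12)
print(lazy.get("price"))  # 10: stale until the TTL expires
now[0] = 61.0
print(lazy.get("price"))  # 12

wt = WriteThroughCache({"price": 10})
wt.put("price", 12)
print(wt.get("price"))  # 12: always fresh
```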

AWS Compute Optimizer vs. AWS Cost Explorer

Compute Optimizer = uses ML to analyze CloudWatch utilization data and recommend optimal instance types, memory settings (Lambda), volume types (EBS). Actionable rightsizing recommendations. Cost Explorer = financial analytics tool showing spend trends, forecasts, reserved instance coverage, savings plan recommendations based on spend patterns. For 'reduce cost by rightsizing infrastructure', choose Compute Optimizer. For 'analyze where money is being spent', choose Cost Explorer.

Pilot Light DR vs. Warm Standby DR

Pilot Light = only the data tier (database) is running in the DR region. Compute is pre-configured but stopped or not yet launched. RTO is tens of minutes (time to start/scale compute). RPO is minutes (last replication point). Warm Standby = a scaled-down but fully operational copy of the production environment runs in the DR region. Can serve traffic immediately at reduced capacity, then scale out. RTO is minutes. More expensive than Pilot Light. When a question gives you a specific RTO target and asks for the cheapest solution that meets it, Pilot Light works for RTO ~30 min. Warm Standby for RTO ~5-15 min.

S3 Lifecycle Policies vs. S3 Intelligent-Tiering

Lifecycle Policies = rule-based transitions based on object age (e.g., move to IA after 30 days, Glacier after 90 days). Best when access patterns are predictable and known upfront. Intelligent-Tiering = monitors access patterns per object and automatically moves between frequent/infrequent/archive tiers. No retrieval fees for frequent/infrequent. Small per-object monitoring fee (~$0.0025/1,000 objects/month). Minimum 128 KB object size for cost benefit. Use Intelligent-Tiering when access patterns are unknown or unpredictable.
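An age-based lifecycle rule matching the example above (IA after 30 days, Glacier after 90), written in the dict shape that boto3's put_bucket_lifecycle_configuration accepts. The rule ID and bucket name are hypothetical, and the API call is left commented out so the sketch runs offline.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-by-age",       # hypothetical rule name
            "Filter": {"Prefix": ""},  # empty prefix: applies to every object
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # S3 Glacier Flexible Retrieval
            ],
        }
    ]
}

# Sanity check: objects stay in Standard-IA for 90 - 30 = 60 days,
# which covers Standard-IA's 30-day minimum storage duration charge.
STANDARD_IA_MIN_DAYS = 30
transitions = lifecycle_configuration["Rules"][0]["Transitions"]
days_in_ia = transitions[1]["Days"] - transitions[0]["Days"]
print(days_in_ia >= STANDARD_IA_MIN_DAYS)  # True

# To apply (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_configuration)
```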

Scenario Tips

If the question asks about:

ALB → EC2 ASG → RDS architecture. Response times spike. CloudWatch shows EC2 CPU at 40%, RDS CPU at 95%

Answer:

The bottleneck is RDS. Add ElastiCache (Redis or Memcached) to cache frequently-read query results. This is the most impactful change because it reduces RDS load without changing the application significantly.

Distractor to avoid:

Scaling up the RDS instance (vertical scaling) is a temporary band-aid and expensive. Adding EC2 instances increases DB load further. Multi-AZ does not help performance.

If the question asks about:

Question asks for a modernization strategy that gives the BEST reliability improvement with the LEAST effort on a legacy monolith

Answer:

Extract the least-coupled component (often notifications, reporting, or email) to a managed service (Lambda + SNS/SES), then add ALB + Auto Scaling to the remaining monolith. This is the strangler fig pattern — incremental improvement without a full rewrite.

Distractor to avoid:

Refactoring to microservices on EKS is the highest-effort option and takes months. Containerizing the monolith alone does not improve reliability — multiple identical containers failing is the same as one failing.

If the question asks about:

Question asks for the cheapest solution with an RTO of 30 minutes and RPO of 15 minutes for a non-critical internal application

Answer:

Pilot Light DR: replicate the database to the DR region continuously, but keep compute stopped or captured as AMIs. At failover, launch instances from pre-tested AMIs and point them at the DR database. This meets the 30-minute RTO, and continuous database replication keeps the RPO (typically seconds to minutes of replication lag) within the 15-minute target.

Distractor to avoid:

Warm Standby meets the same RPO/RTO but costs more because a smaller version of the environment runs continuously. Backup-and-Restore is cheapest but RTO is hours, not 30 minutes.

If the question asks about:

Question asks how to identify EC2 instances that are over-provisioned (using less than 20% CPU consistently) across 5 accounts

Answer:

Enable AWS Compute Optimizer at the organization level. It analyzes CloudWatch metrics across all accounts and generates rightsizing recommendations showing which instance types to downsize and the estimated savings.

Distractor to avoid:

AWS Trusted Advisor can flag idle resources but does not give the same ML-based precision as Compute Optimizer. Cost Explorer shows spend but not which specific instances are underutilized.

Last-Minute Facts

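1. CloudWatch: EC2 basic monitoring publishes metrics every 5 minutes; detailed monitoring every 1 minute. Custom metrics default to standard resolution (1 minute); high-resolution custom metrics go down to 1 second

2. ElastiCache Redis supports data persistence through snapshots (backups) and replication; append-only files (AOF) are not supported on current engine versions. Memcached does not persist data

3. RDS Read Replicas: up to 15 for Aurora. For other engines the classic exam answer is 5, though RDS for MySQL, MariaDB, and PostgreSQL now allow up to 15

4. RDS Multi-AZ standby: synchronous replication (no data loss on failover). Read Replicas: asynchronous replication

5. Auto Scaling default cooldown: 300 seconds (5 minutes); it applies to simple scaling policies

6. S3 Intelligent-Tiering minimum object size for cost benefit: 128 KB

7. S3 IA minimum storage duration charge: 30 days. Glacier Instant Retrieval: 90 days. Glacier Deep Archive: 180 days

8. Compute Optimizer requires the CloudWatch agent for memory metrics (not collected by default)

9. Systems Manager Session Manager: no inbound port 22 required; session API calls are recorded in CloudTrail, and session activity can be logged to S3 or CloudWatch Logs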
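The 1-second resolution in fact 1 is opt-in per data point. As a sketch, the helper below builds one `MetricData` entry for CloudWatch's `put_metric_data`; setting `StorageResolution` to 1 requests high resolution, while 60 (the default when omitted) is standard resolution. The metric and namespace names are hypothetical.

```python
from datetime import datetime, timezone


def high_res_metric(name: str, value: float, unit: str = "Count") -> dict:
    """One MetricData entry stored at 1-second (high) resolution."""
    return {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": unit,
        "StorageResolution": 1,  # 60 (default) = standard 1-minute resolution
    }
```

Usage: `boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[high_res_metric("QueueDepth", 12)])`.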

Domain 4 (20% of exam)

Accelerate Workload Migration and Modernization

Must-Know Facts

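The 6 Rs in order of effort: Retire < Retain < Rehost < Replatform < Repurchase < Refactor. (Current AWS guidance lists 7 Rs, adding Relocate for moving VMware workloads to VMware Cloud on AWS without changes.) Questions describe a workload and ask which R applies; match the description to the definition.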
Rehost = lift-and-shift with no changes, use AWS MGN (Application Migration Service). Replatform = minor changes to optimize for cloud (e.g., swap self-managed MySQL for RDS). Refactor/Re-architect = redesign for cloud-native (microservices, serverless, managed services).
AWS Application Migration Service (MGN) replaced Server Migration Service (SMS). If SMS appears as an answer, it is outdated — treat it as a distractor.
DMS requires a source and target endpoint and a replication instance. For minimal downtime, use Full Load + CDC (Change Data Capture). CDC captures ongoing changes while the initial load runs, so the target catches up. Cutover happens when the target is current.
SCT (Schema Conversion Tool) is used for HETEROGENEOUS migrations (different source and target engines, e.g., Oracle to Aurora PostgreSQL). For same-engine migrations (MySQL to Aurora MySQL), SCT is not needed.
The network-vs-physical transfer calculation: at 1 Gbps dedicated, 200 TB ≈ 18.5 days (formula: TB × 8000 / Gbps / 3600 = hours, assuming the link is fully utilized; real throughput is usually lower). Always calculate before recommending a transfer method. If network transfer cannot meet the deadline, exam questions typically expect a Snow Family answer. Snow devices are now restricted to existing AWS customers (as of Nov 2025), but on the exam, treat them as valid options.
Application Discovery Service: agentless mode works only on VMware vCenter and collects basic host info. Agent-based mode works on any OS, collects detailed performance data and network dependencies. When the question says 'application dependencies are unknown', use agent-based.
Storage Gateway File Gateway: on-premises NFS/SMB applications read/write files that are backed by S3. Transparent to applications — they see a normal file share. Use for hybrid access where on-premises apps need to write to S3 without application changes.
For VMware environments, AWS MGN with the agentless replication option (vCenter integration) avoids installing agents on 500+ VMs.
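The transfer-time formula from the list above is easy to check with a helper. This is a sketch that applies exactly that formula (decimal TB, 1 TB = 8,000 gigabits); the optional `utilization` parameter is an added assumption for links that do not run at full speed.

```python
def transfer_hours(size_tb: float, bandwidth_gbps: float, utilization: float = 1.0) -> float:
    """Hours to move size_tb over a link: TB x 8000 / Gbps / 3600.

    size_tb * 8000 converts decimal TB to gigabits; dividing by the effective
    Gbps gives seconds; dividing by 3600 gives hours. utilization=1.0 is the
    formula's best case (a fully saturated link).
    """
    return size_tb * 8000 / (bandwidth_gbps * utilization) / 3600


def network_meets_deadline(size_tb: float, bandwidth_gbps: float,
                           deadline_days: float, utilization: float = 1.0) -> bool:
    """True if network transfer finishes within the deadline."""
    return transfer_hours(size_tb, bandwidth_gbps, utilization) / 24 <= deadline_days


if __name__ == "__main__":
    print(round(transfer_hours(200, 1) / 24, 1))  # 18.5 (days)
    print(network_meets_deadline(50, 1, 14))  # True (~4.6 days)
    print(network_meets_deadline(300, 0.5, 10))  # False (~56 days)
```

The last two calls reproduce the 50 TB and 300 TB cases discussed in this domain: the first favors DataSync, the second forces a physical transfer.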

Common Traps

Trap: Using SCT for a homogeneous database migration (MySQL to Aurora MySQL)

Reality: SCT is only needed for heterogeneous migrations where the source and target use different database engines. For same-engine migrations, use DMS directly without SCT. Questions describing Oracle to PostgreSQL, SQL Server to Aurora, or SQL Server to MySQL require both SCT and DMS. Questions describing MySQL to Aurora MySQL need only DMS.
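The SCT rule reduces to one comparison: do the source and target belong to the same engine family? The sketch below encodes that with a simplified, hypothetical engine-name mapping; it covers only the engines named in this trap and raises `KeyError` for anything else.

```python
from typing import List

# Simplified, hypothetical mapping from engine name to engine family.
ENGINE_FAMILY = {
    "mysql": "mysql",
    "aurora-mysql": "mysql",
    "postgresql": "postgresql",
    "aurora-postgresql": "postgresql",
    "oracle": "oracle",
    "sqlserver": "sqlserver",
}


def migration_tools(source: str, target: str) -> List[str]:
    """SCT + DMS when engine families differ (heterogeneous), DMS alone when they match."""
    if ENGINE_FAMILY[source] == ENGINE_FAMILY[target]:
        return ["DMS"]
    return ["SCT", "DMS"]
```

For example, `migration_tools("mysql", "aurora-mysql")` returns `["DMS"]`, while `migration_tools("oracle", "aurora-postgresql")` returns `["SCT", "DMS"]`.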

Trap: Recommending a Snowball when network transfer would complete within the deadline

Reality: Always calculate network transfer time before recommending physical transfer. If 50 TB over a 1 Gbps connection would take 4.6 days and the deadline is 2 weeks, network transfer (DataSync over Direct Connect or the internet) is faster and simpler. Snowball ordering, shipping, and loading also take time. Only recommend Snowball when network transfer cannot meet the time requirement.

Trap: Using AWS MGN (Application Migration Service) as a synonym for AWS Elastic Disaster Recovery (DRS)

Reality: MGN = migration tool for moving workloads from on-premises or other clouds TO AWS. After cutover, the source is decommissioned. DRS (Elastic Disaster Recovery) = ongoing DR tool that continuously replicates to a staging area for failover. If the source keeps running and you want DR, use DRS. If you are migrating and decommissioning the source, use MGN.

Trap: Choosing DataSync to migrate a live Oracle production database

Reality: DataSync is for file and object storage (NFS, SMB, HDFS, S3, EFS, FSx). It cannot migrate databases or capture database transactions. For database migration with minimal downtime, use DMS with Full Load + CDC.

Trap: Selecting 'Refactor' as the migration strategy when the question asks for minimal time/effort

Reality: Refactor (re-architect) provides the most cloud-native benefit but takes the most time, cost, and effort, because it means redesigning the application. Questions asking for 'fastest migration' or 'minimal code changes' point to Rehost or Replatform. Questions asking for 'maximum cloud benefit' or 'take advantage of cloud-native features' point to Refactor.

Confusing Pairs

AWS DMS (Database Migration Service) vs AWS DataSync

DMS = migrates and replicates DATABASES. Supports Full Load (bulk copy) + CDC (ongoing change replication). Handles heterogeneous migrations with SCT. Supported engines include Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, DynamoDB (as a target only), and more. DataSync = transfers FILE and OBJECT data at high speed with scheduling and integrity verification. Works with NFS, SMB, HDFS, S3, EFS, FSx. Cannot read database transaction logs or handle schema conversion. Never use DMS for files, never use DataSync for databases.

AWS MGN (Application Migration Service) vs AWS Elastic Disaster Recovery (DRS)

MGN = one-time migration. Replicates servers to AWS, you perform test migrations, then cutover and decommission the source. Purpose is to move workloads off the source environment permanently. DRS = ongoing DR. Continuously replicates production servers to a low-cost staging area in AWS. On a real disaster, launch the recovery instances. Source keeps running normally. Use MGN to migrate, use DRS for ongoing DR after you're on AWS.

Rehost vs Replatform

Rehost = move the application to AWS with NO changes to the application code or architecture. Uses MGN to replicate and launch the same server image on EC2. Fastest, lowest risk, lowest cloud benefit. Replatform = make targeted optimizations without changing core architecture. Example: swap self-managed MySQL on EC2 for Amazon RDS (same MySQL, managed service), swap self-managed Tomcat for Elastic Beanstalk. Some code or config changes required but the application logic is unchanged. Better cloud benefit, more effort than Rehost.

Application Discovery Service (agentless) vs Application Discovery Service (agent-based)

Agentless = deploys as a VMware appliance into vCenter. Collects basic VM info (IP, MAC, hostname, CPU/memory allocation) plus utilization data (CPU, RAM, disk I/O), but not running processes or network connections. Only works for VMware vCenter environments. Agent-based = installs a lightweight agent on each server (Windows/Linux). Collects detailed performance data, process info, and network connection data to map application dependencies. Works on any OS, physical or virtual. When the question says 'map application dependencies' or 'unknown dependencies', agent-based is required.

AWS Storage Gateway (File Gateway) vs AWS DataSync

File Gateway = permanent hybrid integration. On-premises applications continue to access files via NFS/SMB as if it were a local file share, but data is stored in S3. Local cache stores frequently accessed files. No migration timeline — it is an ongoing solution. DataSync = scheduled or one-time high-speed data transfer. Optimized for bulk transfers, supports scheduling, provides integrity verification. Use File Gateway when on-premises apps need continuous S3 access. Use DataSync for bulk migration or recurring scheduled transfers.

Scenario Tips

If the question asks about:

Question describes migrating a 100TB Oracle database to Aurora PostgreSQL with a requirement for less than 2 hours of downtime

Answer:

Use AWS SCT to convert the Oracle schema and stored procedures to PostgreSQL, then AWS DMS with Full Load + CDC. During Full Load, ongoing Oracle transactions are captured via CDC. Once Full Load completes, let CDC replication catch up until lag is near zero, then stop the application briefly, verify the target is current, and switch the connection string. Total downtime = seconds to minutes for the final cutover.

Distractor to avoid:

Using DMS Full Load alone (without CDC) requires a maintenance window covering the entire 100 TB load cycle, far longer than 2 hours. Exporting to S3 and re-importing requires even longer downtime.
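The Full Load + CDC answer maps to a single DMS task setting. As a sketch, this helper builds keyword arguments for boto3's `dms.create_replication_task` with `MigrationType="full-load-and-cdc"` and a selection rule in the DMS table-mapping JSON format. The ARNs and task ID in the usage line are placeholders, and the `%` wildcard schema is a simplification: a real Oracle migration would normally name the application schema. Schema conversion with SCT happens before this task runs.

```python
import json


def dms_task_params(task_id: str, source_arn: str, target_arn: str, instance_arn: str) -> dict:
    """Keyword arguments for dms.create_replication_task using Full Load + CDC."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all-tables",
                "object-locator": {"schema-name": "%", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Bulk-copy existing rows, then keep applying changes read from the source's logs.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),  # the API takes a JSON string
    }
```

Usage (placeholder ARNs): `boto3.client("dms").create_replication_task(**dms_task_params("oracle-to-aurora-pg", src_arn, tgt_arn, inst_arn))`. Cutover waits until CDC latency is near zero.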

If the question asks about:

Question asks which migration strategy applies when a company is moving from a self-managed Apache Kafka on EC2 to Amazon MSK

Answer:

Replatform. The application logic and Kafka workload remain unchanged, but the infrastructure is swapped for a managed AWS service (MSK). This is the classic 'lift-and-reshape' scenario where you keep the same technology but adopt the managed version.

Distractor to avoid:

Rehost would mean keeping self-managed Kafka but just moving it to different EC2 instances. Refactor would mean redesigning to use a completely different streaming architecture like Kinesis.

If the question asks about:

Question describes 500 on-premises VMware VMs to migrate with unknown application dependencies and a requirement for less than 30 minutes downtime per server

Answer:

Step 1: Deploy Application Discovery Service with agent-based collection to map dependencies (agents reveal network connections). Step 2: Use AWS MGN — it does continuous block-level replication, allows multiple test migrations before cutover, and cutover takes under 15 minutes.

Distractor to avoid:

Agentless Discovery only works with VMware and only gives basic VM inventory — it will not map application dependencies between servers. VM Import/Export requires VM shutdown during export.

If the question asks about:

Question asks for the fastest way to transfer 300 TB of archive data from on-premises NAS to S3 Glacier Deep Archive, with a 10-day deadline and a 500 Mbps internet connection

Answer:

Physical transfer via Snowball Edge Storage Optimized (210 TB per device, so 2 devices cover 300 TB). Load them in parallel on-premises and ship them to AWS, which imports the data into S3; an S3 Lifecycle rule can then transition the objects to Glacier Deep Archive. Network transfer calculation: 300 TB × 8000 / 0.5 Gbps / 3600 = ~1,333 hours (~56 days), far beyond the 10-day deadline.

Distractor to avoid:

DataSync or S3 Transfer Acceleration at 500 Mbps cannot meet the 10-day deadline for 300 TB. Direct Connect takes weeks to provision. Note: exam questions may still reference Snowball scenarios even though new customer availability ended November 2025 — treat it as a known exam pattern.

Last-Minute Facts

1. Snowball Edge Storage Optimized (current generation): 210 TB NVMe usable storage. 80 TB model discontinued November 2024. Note: as of November 2025, only available to existing customers; new customers should use DataSync or Data Transfer Terminal

2. Snowball Edge Compute Optimized (current generation): 28 TB NVMe usable, 104 vCPUs, 416 GB RAM. 52 vCPU/GPU model discontinued November 2024

3. Snowcone: DISCONTINUED November 2024. No longer available to new or existing customers. Was 8 TB HDD or 14 TB SSD

4. Snowmobile: 100 PB per truck (exabyte-scale migration)

5. Network transfer time formula: Size (TB) × 8000 / Bandwidth (Gbps) / 3600 = hours (assumes full link utilization)

6. DMS CDC (Change Data Capture) reads the source database transaction logs, so log retention must be enabled

7. SCT assessment report estimates the percentage of code that can be converted automatically vs what requires manual work

8. AWS MGN replaces SMS (Server Migration Service). If you see SMS in an answer, it is the wrong/outdated option

9. Application Discovery Service home region must match AWS Migration Hub home region

10. Transfer Family supports SFTP, FTPS, FTP, and AS2 protocols, for partner file exchange over existing protocols
