General Exam Tips
- 1. Read the LAST sentence of each question first: it states what you are actually being asked. Then read the scenario for constraints.
- 2. Flag and skip: never spend more than 2.5 minutes on a question on the first pass. You have 2.4 minutes per question on average.
- 3. Every question has 2–3 options that sound plausible. The exam is designed to test trade-offs, not trivia. Ask: which option best satisfies ALL stated constraints simultaneously?
- 4. Hunt for constraint keywords before picking an answer: 'least operational overhead', 'minimize cost', 'no downtime', 'compliance requirement', 'existing investment'. These words eliminate whole answer categories.
- 5. If an answer requires you to write custom code or manage infrastructure that a managed service already handles, it is almost always wrong at the Professional level.
- 6. Trust your first instinct: only change an answer if you can articulate a specific reason why it is wrong. Random second-guessing is the #1 way to drop from 780 to 730.
- 7. Non-native English speakers: request the ESL +30 accommodation before you register for the exam to get an extra 30 minutes. It is free and widely available.
- 8. The exam has unscored pilot questions. Some questions that feel impossible may not count, so do not let them derail you emotionally.
- 9. You need 750/1000 on a scaled score. Of the 75 questions, 65 are scored and 10 are unscored pilots. Scoring is scaled (not a raw percentage), but a rough rule: you can miss roughly 16–17 scored questions and still pass. Pace accordingly and do not sacrifice a certain question for an uncertain one.
- 10. On multi-select questions, the number of required answers is stated. If it says 'choose 2', eliminate options until exactly 2 remain. Partial credit does not apply.
Design for New Solutions
Must-Know Facts
- DynamoDB Global Tables = multi-master active-active across regions, last-writer-wins conflict resolution, asynchronous replication typically under 1 second. NOT appropriate when strong consistency for concurrent writes is required.
- Aurora Global Database = single writer, up to 5 read-only secondary regions, storage-level replication under 1 second lag, secondary can be promoted in under 1 minute. Use when write consistency is critical but you still want multi-region reads.
- Step Functions Standard vs Express: Standard = up to 1 year duration, exactly-once semantics, auditable execution history, charged per state transition. Express = up to 5 minutes, at-least-once, high-volume (100K+ executions/sec), charged by number of executions, duration, and memory. The exam will give you a human-approval workflow, and that requires Standard.
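- API Gateway has a default 29-second integration timeout for Lambda integrations. Treat it as a hard limit on the exam: it can now be raised for Regional and private REST APIs through a quota increase, but exam answers still assume 29 seconds. Any workflow that might exceed this needs asynchronous design (SQS, Step Functions, EventBridge).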
- Lambda reserved concurrency guarantees capacity AND throttles above the limit. Provisioned concurrency eliminates cold starts but costs money at idle. These are different tools for different problems.
- EventBridge can route events cross-account and cross-region. A central event bus in a security or logging account can aggregate events from all org accounts.
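- Kinesis Data Streams retains records for 24 hours by default, extendable up to 365 days. Multiple consumers can independently process the same stream. Firehose is delivery-only: no custom consumer logic, and it buffers before delivery (historically a 60-second minimum interval), so it is near-real-time rather than real-time.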
- SQS visibility timeout must be longer than your Lambda processing time or messages can be processed more than once; AWS recommends at least six times the function timeout. The DLQ captures messages that still fail after maxReceiveCount receives.
- For strictly ordered message processing with exactly-once deduplication: SQS FIFO. For fan-out to multiple subscribers: SNS + SQS fan-out pattern. These are different problems.
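The SQS visibility-timeout and DLQ facts above can be sketched as queue attributes. This is a minimal sketch, not a definitive configuration: the function name, the DLQ ARN, and the default `max_receive_count` of 5 are assumptions for illustration. `VisibilityTimeout` and `RedrivePolicy` are the real SQS attribute names, and the 6x multiplier follows AWS guidance for Lambda consumers.

```python
import json

def sqs_queue_attributes(lambda_timeout_s, dlq_arn, max_receive_count=5):
    """Build SQS attributes for a queue that is consumed by a Lambda function."""
    # AWS guidance: visibility timeout of at least 6x the function timeout,
    # so an in-flight message is not redelivered while Lambda is still retrying.
    visibility_timeout_s = 6 * lambda_timeout_s
    return {
        "VisibilityTimeout": str(visibility_timeout_s),
        # After max_receive_count failed receives the message moves to the DLQ.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }

# Hypothetical DLQ ARN and a 30-second function timeout.
attrs = sqs_queue_attributes(30, "arn:aws:sqs:us-east-1:111122223333:orders-dlq")
print(attrs["VisibilityTimeout"])  # 180
```

A dictionary in this shape could be passed as the `Attributes` argument when creating the queue with an SDK.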
Common Traps
Confusing Pairs
Scenario Tips
Question describes a multi-region architecture where 'writes must be strongly consistent' and asks which database provides active-active with sub-second replication
The answer is Aurora Global Database, NOT DynamoDB Global Tables. Aurora Global Database has a single writer, so it is active-passive for writes while every Region serves reads; that single writer is what keeps writes consistent. The trap answer is Global Tables because 'active-active' sounds right, but Global Tables uses last-writer-wins conflict resolution, which violates strong consistency.
DynamoDB Global Tables is wrong because it uses eventual consistency with last-writer-wins. It is multi-master but that does not mean strongly consistent.
Question asks for a 'cost-effective' solution for a workload that is bursty, with traffic spikes 10x base load for short periods, needing to process messages reliably
SQS + Lambda is almost always the answer for bursty workloads. SQS absorbs the burst, Lambda scales automatically. The question is testing whether you know that SQS acts as a buffer and Lambda scales with the queue depth without over-provisioning servers.
EC2 Auto Scaling is wrong because it takes minutes to scale and you'd need to over-provision to handle instantaneous spikes. Kinesis is wrong if the question says 'messages' not 'streaming data'.
Question describes a document workflow where some documents need human review before processing continues, taking anywhere from hours to days
Step Functions Standard Workflow with Wait for Callback (taskToken) pattern. The task sends a token to an external system (email, SQS, etc.), then pauses until the token is returned, up to the 1-year Standard execution limit or any timeout you configure. When a human approves, their application calls SendTaskSuccess with the token, and the workflow resumes.
Express Workflow cannot wait more than 5 minutes. SQS-based approach loses the workflow state and audit trail.
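To make the callback pattern concrete, here is a hedged sketch of a Standard Workflow definition in Amazon States Language, built as a Python dictionary. The state names, the queue URL, the `documentId` input field, and the 7-day timeout are assumptions for illustration. The `.waitForTaskToken` resource suffix and the `$$.Task.Token` context path are the documented mechanism.

```python
import json

# Hypothetical review queue; a reviewer-facing app would read from it.
REVIEW_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/review-requests"

state_machine = {
    "StartAt": "RequestHumanReview",
    "States": {
        "RequestHumanReview": {
            "Type": "Task",
            # The suffix tells Step Functions to pause until the token comes back.
            "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
            "Parameters": {
                "QueueUrl": REVIEW_QUEUE_URL,
                "MessageBody": {
                    "documentId.$": "$.documentId",
                    # The task token is read from the context object.
                    "taskToken.$": "$$.Task.Token",
                },
            },
            # Assumed business rule: give reviewers up to 7 days.
            "TimeoutSeconds": 7 * 24 * 3600,
            "Next": "ProcessDocument",
        },
        # Placeholder for the real processing step.
        "ProcessDocument": {"Type": "Pass", "End": True},
    },
}

definition_json = json.dumps(state_machine, indent=2)
print(definition_json)
```

When the reviewer approves, the approving application would call SendTaskSuccess with the same token (in boto3, `send_task_success(taskToken=..., output=...)` on the Step Functions client) and the execution continues at `ProcessDocument`.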
Question asks for near-real-time delivery of clickstream data to S3 for analysis, with no requirement for custom processing logic
Amazon Data Firehose directly to S3. It handles buffering, batching, compression, and encryption automatically. No infrastructure to manage, no code to write.
Kinesis Data Streams is wrong for this scenario because it requires a consumer application to read and write to S3 — adding unnecessary complexity. Firehose is the managed delivery answer.
Last-Minute Facts
Design Solutions for Organizational Complexity
Must-Know Facts
- SCPs set maximum permissions for accounts and OUs. They do NOT grant permissions. An allow SCP still requires an IAM policy that also allows. An SCP Deny overrides everything.
- The management account of an AWS Organization is NEVER affected by SCPs: not even an explicit Deny in an SCP attached to the organization root restricts the management account. (Do not confuse the management account with an account's root user.)
- SCP inheritance cascades downward: an SCP on a parent OU applies to all child OUs and all accounts within them. An account can be restricted by multiple SCPs simultaneously.
- Control Tower preventive guardrails are implemented as SCPs. Detective guardrails are implemented as AWS Config rules. These are different mechanisms with different enforcement behavior.
- Transit Gateway is Regional. Cross-region connectivity requires Transit Gateway peering between two TGWs in different regions.
- VPC Peering is NOT transitive: if A peers with B and B peers with C, A cannot reach C through B. At scale, Transit Gateway replaces the n(n-1)/2 peering mesh.
- RAM shared subnets: the owning account creates the VPC and subnets, shares specific subnets to participant accounts. Participants deploy resources into the shared subnets but do not control the VPC or subnets.
- Direct Connect does NOT encrypt traffic. For encryption over Direct Connect, add an IPsec VPN tunnel (MACsec provides Layer 2 encryption on dedicated 10G+ connections).
- IAM Identity Center (SSO) is the preferred approach for human access to multiple accounts. It creates IAM roles (permission sets) in each account — do not create individual IAM users per account.
- For cross-account access by applications, always use IAM roles with trust policies and STS AssumeRole. Never use long-term access keys.
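The n(n-1)/2 peering-mesh figure from the list above is easy to check numerically. The sketch below uses a hypothetical helper name and simply evaluates the formula for a few VPC counts, to show why a full mesh stops being manageable.

```python
def full_mesh_peerings(vpc_count):
    """Number of peering connections needed so every VPC reaches every other."""
    return vpc_count * (vpc_count - 1) // 2

for n in (5, 10, 50):
    # With a Transit Gateway the same connectivity needs only n attachments.
    print(f"{n} VPCs: {full_mesh_peerings(n)} peerings vs {n} TGW attachments")
# 5 VPCs: 10 peerings vs 5 TGW attachments
# 10 VPCs: 45 peerings vs 10 TGW attachments
# 50 VPCs: 1225 peerings vs 50 TGW attachments
```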
Common Traps
Confusing Pairs
Scenario Tips
Question describes 200 accounts and asks how to prevent EC2 launch in non-approved regions across all accounts with minimal ongoing effort
SCP at the root OU with a Deny for all regions except approved ones, using NotAction for global services (IAM, STS, CloudFront, Route 53, Support) to prevent breaking those. The SCP cascades automatically to all accounts and new accounts added to the org.
AWS Config rules are wrong because they are detective (detect after the fact) and would require per-account deployment. IAM policies per account require managing 200 individual policies. Only SCPs can preventively enforce this at org scale.
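A region-restriction SCP of the kind described above might look like the following sketch. The approved Regions and the `Sid` are assumptions, and the `NotAction` list covers only the global services named in the answer; a production policy usually exempts more. The `aws:RequestedRegion` condition key with `StringNotEquals` is the standard way to express the restriction.

```python
import json

# Assumed list of approved Regions for this illustration.
APPROVED_REGIONS = ["us-east-1", "eu-west-1"]

region_deny_scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideApprovedRegions",
        "Effect": "Deny",
        # Global services are exempted so the Deny does not break them.
        "NotAction": ["iam:*", "sts:*", "cloudfront:*", "route53:*", "support:*"],
        "Resource": "*",
        # Deny any request whose target Region is not in the approved list.
        "Condition": {
            "StringNotEquals": {"aws:RequestedRegion": APPROVED_REGIONS}
        },
    }],
}

print(json.dumps(region_deny_scp, indent=2))
```

Because the statement denies everything outside the approved Regions (not only EC2), attaching it at the organization root covers EC2 launches along with every other regional service.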
Question asks for network isolation between dev/staging/prod environments sharing a central logging VPC, across multiple accounts
Transit Gateway with separate route tables per environment. Each environment's TGW route table has routes only to the shared logging VPC attachment, not to other environments. This provides isolation between environments while allowing all of them to reach shared services.
VPC Peering cannot provide this topology because it does not support transitive routing. A single shared VPC with security groups does not provide true account-level isolation.
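The route-table isolation idea can be illustrated with a toy model. This is not an AWS API: the dictionaries and the `can_reach` helper are hypothetical and only mimic how each attachment is associated with one Transit Gateway route table that lists the destinations it may forward to.

```python
# Toy model: destinations each TGW route table can forward to.
ROUTE_TABLES = {
    "dev-rt": {"logging"},
    "staging-rt": {"logging"},
    "prod-rt": {"logging"},
    # The shared-services table needs return routes to every environment.
    "shared-rt": {"dev", "staging", "prod"},
}

# Which route table each VPC attachment is associated with.
ASSOCIATIONS = {
    "dev": "dev-rt",
    "staging": "staging-rt",
    "prod": "prod-rt",
    "logging": "shared-rt",
}

def can_reach(src, dst):
    """Two-way traffic needs a forward route and a return route."""
    forward = dst in ROUTE_TABLES[ASSOCIATIONS[src]]
    reverse = src in ROUTE_TABLES[ASSOCIATIONS[dst]]
    return forward and reverse

print(can_reach("dev", "logging"))  # True
print(can_reach("dev", "prod"))     # False
```

The design choice to notice is that isolation comes from what is absent: no environment table ever holds a route to another environment.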
Question involves a partner vendor needing cross-account access to an S3 bucket in your account, with the concern about confused deputy attacks
Create an IAM role in your account with a trust policy allowing the partner's account. Add an ExternalId condition to the trust policy and share that ExternalId only with the specific partner. This prevents another customer of the partner from tricking the partner's service into assuming your role, even if they know your role ARN.
Resource-based S3 bucket policies alone (without ExternalId on the role) are vulnerable to confused deputy if the partner serves multiple clients.
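A trust policy with an ExternalId condition could be sketched as below. The partner account ID and the ExternalId value are hypothetical placeholders. `sts:ExternalId` is the real condition key, and the partner supplies the value through the `ExternalId` parameter of AssumeRole.

```python
import json

PARTNER_ACCOUNT_ID = "999988887777"   # hypothetical partner account
EXTERNAL_ID = "example-unique-id-42"  # hypothetical value agreed with the partner

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        # Trust the partner's account as a whole.
        "Principal": {"AWS": f"arn:aws:iam::{PARTNER_ACCOUNT_ID}:root"},
        "Action": "sts:AssumeRole",
        # AssumeRole succeeds only when the caller passes the agreed ExternalId.
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}

print(json.dumps(trust_policy, indent=2))
```

The role's permissions policy (not shown) would then grant only the S3 actions the partner needs on the specific bucket.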
Question asks for the 'least operational overhead' way to ensure all new AWS accounts automatically have CloudTrail, GuardDuty, and Security Hub enabled
AWS Control Tower sets up organization-wide CloudTrail automatically and applies its mandatory and recommended guardrails to enrolled accounts. For GuardDuty and Security Hub, designate a delegated administrator account through the Organizations integration and turn on auto-enable, so new accounts enroll automatically. This requires zero per-account configuration.
Lambda + CloudWatch Events triggered on account creation works but requires custom code maintenance. StackSets work but require triggering on account creation events. The managed services (Control Tower, GuardDuty org-wide enrollment) are always preferred for 'least operational overhead'.
Question asks for a hub-and-spoke network connecting 50 VPCs across 3 regions with on-premises data center, where spoke VPCs must not communicate with each other
Transit Gateway per region with inter-region TGW peering. Multiple TGW route tables: one for shared services, separate tables for spoke VPCs. Configure route tables so spoke VPCs only have routes to shared services, not to each other. Attach Direct Connect to TGW via Direct Connect Gateway for on-premises connectivity.
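VPC Peering is wrong at this scale: it is not transitive, so it cannot act as a hub, and connecting 50 VPCs would mean managing a large set of individual peering connections. PrivateLink requires an NLB per service and cannot replace full VPC connectivity.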
Last-Minute Facts
Continuous Improvement for Existing Solutions
Must-Know Facts
- The question gives you a working architecture and asks for the BEST improvement. Your job is to identify the bottleneck first, then pick the fix. Do not add complexity that does not address the stated bottleneck.
- The standby in an RDS Multi-AZ DB instance deployment does NOT serve read traffic. Adding Multi-AZ improves availability (HA), not read performance. If the database is the bottleneck on reads, the answer is Read Replicas or ElastiCache, not Multi-AZ.
- ElastiCache write-through pattern keeps the cache always current but increases write latency. Lazy loading (cache-aside) is faster for writes but may serve stale data and has a cache miss penalty. Questions will specify whether stale data is acceptable.
- CloudWatch detailed monitoring publishes EC2 metrics every 1 minute (costs money per instance). Basic monitoring is every 5 minutes. High-resolution custom metrics can go down to 1 second.
- Auto Scaling cooldown period prevents repeated scale-out/scale-in. If scaling reacts too slowly or thrashes, review cooldown and warm-up settings before adding more capacity.
- Cost Explorer shows historical spend. Compute Optimizer recommends rightsizing. Trusted Advisor flags idle resources. AWS Budgets alerts when thresholds are exceeded. These four tools are different and target different use cases.
- The strangler fig pattern incrementally replaces a monolith: route specific traffic paths to new microservices while keeping the monolith running. The least-coupled component is extracted first.
- S3 Lifecycle policies tier objects from Standard to Infrequent Access to Glacier based on age. Use Intelligent-Tiering for unknown/unpredictable access patterns — it monitors and moves objects automatically.
- Systems Manager Session Manager replaces bastion hosts for operational access to EC2 instances. No SSH keys to manage, no open inbound ports, full audit logging. Always preferred over bastion hosts for 'least operational overhead' scenarios.
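The lazy-loading versus write-through trade-off in the list above can be shown with a small in-memory sketch. Nothing here is ElastiCache-specific: the `Database` class, the plain-dict cache, and the function names are stand-ins chosen for illustration.

```python
class Database:
    """Stand-in for RDS that counts how often it is read."""
    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value

def read_lazy(cache, db, key):
    """Cache-aside read: hit the database only on a cache miss."""
    if key in cache:
        return cache[key]
    value = db.get(key)   # cache miss penalty
    cache[key] = value
    return value

def write_db_only(db, key, value):
    """Lazy-loading write: fast, but any cached copy becomes stale."""
    db.put(key, value)

def write_through(cache, db, key, value):
    """Write-through: extra write latency, cache always current."""
    db.put(key, value)
    cache[key] = value

db, cache = Database(), {}
write_db_only(db, "price", 10)
first = read_lazy(cache, db, "price")    # miss, loads 10 from the database
second = read_lazy(cache, db, "price")   # hit, no database read
write_db_only(db, "price", 12)
stale = read_lazy(cache, db, "price")    # still 10: stale until evicted
write_through(cache, db, "price", 12)
fresh = read_lazy(cache, db, "price")    # 12
print(first, second, stale, fresh, db.reads)  # 10 10 10 12 1
```

In practice lazy loading is usually paired with a TTL so stale entries expire; the sketch omits that to keep the contrast visible.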
Common Traps
Confusing Pairs
Scenario Tips
ALB → EC2 ASG → RDS architecture. Response times spike. CloudWatch shows EC2 CPU at 40%, RDS CPU at 95%
The bottleneck is RDS. Add ElastiCache (Redis or Memcached) to cache frequently-read query results. This is the most impactful change because it reduces RDS load without changing the application significantly.
Scaling up the RDS instance (vertical scaling) is a temporary band-aid and expensive. Adding EC2 instances increases DB load further. Multi-AZ does not help performance.
Question asks for a modernization strategy that gives the BEST reliability improvement with the LEAST effort on a legacy monolith
Extract the least-coupled component (often notifications, reporting, or email) to a managed service (Lambda + SNS/SES), then add ALB + Auto Scaling to the remaining monolith. This is the strangler fig pattern — incremental improvement without a full rewrite.
Refactoring to microservices on EKS is the highest-effort option and takes months. Containerizing the monolith alone does not improve reliability — multiple identical containers failing is the same as one failing.
Question asks for the cheapest solution with an RTO of 30 minutes and RPO of 15 minutes for a non-critical internal application
Pilot Light DR: replicate the database to the DR region continuously, but keep compute stopped or as AMIs. At failover, launch instances from pre-tested AMIs and point them at the DR database. Continuous replication keeps the RPO well inside 15 minutes, and launching from pre-tested AMIs fits the 30-minute RTO.
Warm Standby meets the same RPO/RTO but costs more because a smaller version of the environment runs continuously. Backup-and-Restore is cheapest but RTO is hours, not 30 minutes.
Question asks how to identify EC2 instances that are over-provisioned (using less than 20% CPU consistently) across 5 accounts
Enable AWS Compute Optimizer at the organization level. It analyzes CloudWatch metrics across all accounts and generates rightsizing recommendations showing which instance types to downsize and the estimated savings.
AWS Trusted Advisor can flag idle resources but does not give the same ML-based precision as Compute Optimizer. Cost Explorer shows spend but not which specific instances are underutilized.
Last-Minute Facts
Accelerate Workload Migration and Modernization
Must-Know Facts
- The 6 Rs in order of effort: Retire < Retain < Rehost < Replatform < Repurchase < Refactor. Questions describe a workload and ask which R applies — match the description to the definition.
- Rehost = lift-and-shift with no changes, use AWS MGN (Application Migration Service). Replatform = minor changes to optimize for cloud (e.g., swap self-managed MySQL for RDS). Refactor/Re-architect = redesign for cloud-native (microservices, serverless, managed services).
- AWS Application Migration Service (MGN) replaced Server Migration Service (SMS). If SMS appears as an answer, it is outdated — treat it as a distractor.
- DMS requires a source and target endpoint and a replication instance. For minimal downtime, use Full Load + CDC (Change Data Capture). CDC captures ongoing changes while the initial load runs, so the target catches up. Cutover happens when the target is current.
- SCT (Schema Conversion Tool) is used for HETEROGENEOUS migrations (different source and target engines, e.g., Oracle to Aurora PostgreSQL). For same-engine migrations (MySQL to Aurora MySQL), SCT is not needed.
- The network-vs-physical transfer calculation: at 1 Gbps dedicated, 200 TB ≈ 18.5 days (formula: TB × 8000 / Gbps / 3600 = hours, assuming the link is fully utilized). Always calculate before recommending a transfer method. If network transfer cannot meet the deadline, exam questions typically expect a Snow Family answer. Snow devices are now restricted to existing AWS customers (as of Nov 2025), but on the exam treat them as valid options.
- Application Discovery Service: agentless mode works only on VMware vCenter and collects basic host info. Agent-based mode works on any OS, collects detailed performance data and network dependencies. When the question says 'application dependencies are unknown', use agent-based.
- Storage Gateway File Gateway: on-premises NFS/SMB applications read/write files that are backed by S3. Transparent to applications — they see a normal file share. Use for hybrid access where on-premises apps need to write to S3 without application changes.
- For VMware environments, AWS MGN with the agentless replication option (vCenter integration) avoids installing agents on 500+ VMs.
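The transfer-time formula from the list above is worth having as a two-line function. The helper names are hypothetical, and the calculation assumes decimal units (1 TB = 8000 gigabits) and 100% link utilization, so real transfers will take longer.

```python
def transfer_hours(terabytes, gbps):
    """Hours to move the data: TB x 8000 gigabits, divided by Gbps, then by 3600."""
    return terabytes * 8000 / gbps / 3600

def transfer_days(terabytes, gbps):
    return transfer_hours(terabytes, gbps) / 24

# The 200 TB at 1 Gbps figure quoted in the facts above.
print(round(transfer_days(200, 1.0), 1))  # 18.5
```

Comparing `transfer_days(...)` with the stated deadline is the quick test for choosing between a network transfer and a physical device.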
Common Traps
Confusing Pairs
Scenario Tips
Question describes migrating a 100TB Oracle database to Aurora PostgreSQL with a requirement for less than 2 hours of downtime
Use AWS SCT to convert the Oracle schema and stored procedures to PostgreSQL, then AWS DMS with Full Load + CDC. During Full Load, ongoing Oracle transactions are captured via CDC. Once Full Load completes, let CDC replication catch up until lag is near zero, then stop the application briefly, verify the target is current, and switch the connection string. Total downtime = seconds to minutes for the final cutover.
A DMS full-load-only task (no CDC) needs a maintenance window covering the entire 100 TB dump/load cycle, which is far longer than 2 hours. Exporting to S3 and re-importing requires even longer downtime.
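As a rough sketch, the Full Load + CDC task for this scenario could be described with parameters like these. The task identifier, the ARNs, and the `APP_SCHEMA` schema name are placeholders. The parameter names follow the DMS CreateReplicationTask API, where `full-load-and-cdc` is the migration type that combines the initial load with ongoing change capture.

```python
import json

# Selection rule: include every table in one (hypothetical) schema.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-app-schema",
        "object-locator": {"schema-name": "APP_SCHEMA", "table-name": "%"},
        "rule-action": "include",
    }]
}

# Placeholder ARNs; real values come from the endpoints and replication instance.
replication_task_params = {
    "ReplicationTaskIdentifier": "oracle-to-aurora-pg",
    "SourceEndpointArn": "arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE",
    "TargetEndpointArn": "arn:aws:dms:us-east-1:111122223333:endpoint:TARGET",
    "ReplicationInstanceArn": "arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",
    # Full load first, then continuous replication of ongoing changes.
    "MigrationType": "full-load-and-cdc",
    # The API expects the table mappings as a JSON string.
    "TableMappings": json.dumps(table_mappings),
}

print(replication_task_params["MigrationType"])  # full-load-and-cdc
```

With boto3 these would be passed as keyword arguments to the DMS client's `create_replication_task`; the schema conversion with SCT happens before the task is started.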
Question asks which migration strategy applies when a company is moving from a self-managed Apache Kafka on EC2 to Amazon MSK
Replatform. The application logic and Kafka workload remain unchanged, but the infrastructure is swapped for a managed AWS service (MSK). This is the classic 'lift-and-reshape' scenario where you keep the same technology but adopt the managed version.
Rehost would mean keeping self-managed Kafka but just moving it to different EC2 instances. Refactor would mean redesigning to use a completely different streaming architecture like Kinesis.
Question describes 500 on-premises VMware VMs to migrate with unknown application dependencies and a requirement for less than 30 minutes downtime per server
Step 1: Deploy Application Discovery Service with agent-based collection to map dependencies (agents reveal network connections). Step 2: Use AWS MGN — it does continuous block-level replication, allows multiple test migrations before cutover, and cutover takes under 15 minutes.
Agentless Discovery only works with VMware and only gives basic VM inventory — it will not map application dependencies between servers. VM Import/Export requires VM shutdown during export.
Question asks for the fastest way to transfer 300 TB of archive data from on-premises NAS to S3 Glacier Deep Archive, with a 10-day deadline and a 500 Mbps internet connection
Physical transfer via Snowball Edge Storage Optimized (210 TB per device, 2 devices cover 300 TB). Load them in parallel on-premises, ship to AWS. AWS imports the data into S3, and a lifecycle rule then transitions it to S3 Glacier Deep Archive. Network transfer calculation: 300 TB × 8000 / 0.5 Gbps / 3600 = ~1,333 hours (~56 days), far beyond the 10-day deadline.
DataSync or S3 Transfer Acceleration at 500 Mbps cannot meet the 10-day deadline for 300 TB. Direct Connect takes weeks to provision. Note: exam questions may still reference Snowball scenarios even though new customer availability ended November 2025 — treat it as a known exam pattern.