General Exam Tips
- 1. Read EVERY answer option before selecting. The distractor is always plausible: AWS writes options that are almost right.
- 2. Find the constraint keyword first: 'most cost-effective', 'minimum operational overhead', 'highest availability'. That keyword eliminates 2–3 answers immediately.
- 3. The passing score is 720/1000 on a scaled score, so you can miss a fair number of questions (roughly 18 of 65 as a rule of thumb) and still pass. Don't spiral on hard questions: flag and move on.
- 4. 15 of the 65 questions are unscored experimental questions. A question that seems completely out of scope may be one of them, so don't panic.
- 5. Eliminate vertical scaling (bigger instance) answers first. AWS almost always wants horizontal scaling.
- 6. When two answers both work technically, the one with LESS operational overhead wins. AWS favors managed services over custom solutions.
- 7. Multi-choice questions (select 2 or 3) often have two obvious correct answers and one that sounds right but violates a constraint in the question.
- 8. Pace yourself: 2 minutes per question. Flag anything taking more than 90 seconds. Return with fresh eyes.
- 9. When the question says 'MOST cost-effective', eliminate active-active multi-region setups immediately: those are always the most expensive.
- 10. IAM questions: if the scenario involves an EC2 instance or Lambda needing to access AWS, the answer is almost always an IAM role, never access keys.
Design Secure Architectures
Must-Know Facts
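- IAM policy evaluation order: an explicit Deny in any policy always wins. Otherwise the request must be allowed by each layer that applies: Organizations SCPs, then resource-based policies, then identity-based policies (capped by any permissions boundary). Anything not explicitly allowed falls through to the implicit Deny. Cross-account requires BOTH resource-based AND identity-based policies to allow.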
- Security Groups are stateful (return traffic auto-allowed) and can only ALLOW. NACLs are stateless (return traffic needs explicit allow) and can DENY. Use NACLs to block specific IPs.
- VPC Gateway endpoints (S3, DynamoDB) are FREE and keep traffic off the internet. VPC Interface endpoints (PrivateLink, used for most other services) cost money per hour + per GB.
- KMS key policies are the PRIMARY authorization for KMS keys — IAM policies alone are not sufficient without a key policy allowing the principal.
- SSE-KMS gives customer-managed encryption with CloudTrail audit visibility. SSE-S3 is simpler but no customer key control. SSE-C requires sending keys with every request.
- IAM roles use temporary STS credentials. Always use roles for EC2 instance profiles, Lambda, and cross-account — never embed long-term access keys.
- Permissions boundaries set the MAXIMUM permissions an identity policy can grant — they don't grant anything themselves.
- AWS Organizations SCPs restrict the maximum permissions available to member accounts — even root can be limited. They don't grant permissions.
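- For cross-account S3 access through role assumption, the IAM role in Account B (the bucket owner) needs a trust policy allowing Account A to assume it. If a principal in Account A accesses the bucket directly instead, the bucket policy must allow that principal in addition to the principal's own identity-based policy.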
- Shield Standard is free for all customers (Layer 3/4). Shield Advanced ($3,000/month, 1-year commitment) adds Layer 7 DDoS protection, 24/7 access to the Shield Response Team (SRT, formerly DRT), and cost protection.
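The policy evaluation order in the first bullet can be sketched as a small decision function. This is a deliberately simplified model, not the real IAM engine: the `evaluate` function and its parameter names are hypothetical, each layer is reduced to a single decision, and session policies and the finer permissions-boundary rules are ignored.

```python
ALLOW, DENY, NO_MATCH = "allow", "deny", "no_match"

def evaluate(identity, resource=NO_MATCH, scp=None, boundary=None,
             cross_account=False):
    """Return the final decision for one request (simplified model).

    identity / resource: decision of the identity-based and resource-based policy.
    scp / boundary: decision of the SCP and permissions boundary,
                    or None when that layer does not apply at all.
    """
    # 1. An explicit Deny in any layer always wins.
    if DENY in (identity, resource, scp, boundary):
        return "Deny (explicit)"
    # 2. Guardrail layers never grant anything; when present they must allow.
    if scp is not None and scp != ALLOW:
        return "Deny (implicit)"
    if boundary is not None and boundary != ALLOW:
        return "Deny (implicit)"
    # 3. Cross-account needs BOTH policies to allow; within one account
    #    an Allow in either policy is enough in this simplified model.
    if cross_account:
        allowed = identity == ALLOW and resource == ALLOW
    else:
        allowed = identity == ALLOW or resource == ALLOW
    # 4. Anything not explicitly allowed is an implicit Deny.
    return "Allow" if allowed else "Deny (implicit)"

print(evaluate(ALLOW, scp=DENY))            # explicit deny wins
print(evaluate(ALLOW, cross_account=True))  # bucket policy is missing
```

The useful exam habit the sketch encodes: look for a Deny first, then check that every guardrail layer allows, and only then ask which policy grants the permission.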
Scenario Tips
Question asks for private connectivity from EC2 to S3/DynamoDB without traversing the internet, and asks for 'most cost-effective'
VPC Gateway Endpoint — free, no data transfer charges, modifies route table. Works for S3 and DynamoDB only.
NAT Gateway — also works but has hourly + per-GB charges, making it wrong for 'most cost-effective'.
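To see why the NAT Gateway loses on cost, here is a rough monthly estimate. It is only a sketch: the rates are the us-east-1 figures this guide quotes for NAT Gateway ($0.045/hr and $0.045/GB), the 730-hour month is an assumption, and the function names are made up for illustration.

```python
NAT_HOURLY_USD = 0.045   # rate quoted in this guide
NAT_PER_GB_USD = 0.045   # rate quoted in this guide
HOURS_PER_MONTH = 730    # assumption: average month length

def nat_gateway_monthly_cost(gb_processed: float) -> float:
    """Hourly charge for the whole month plus the per-GB processing charge."""
    return NAT_HOURLY_USD * HOURS_PER_MONTH + NAT_PER_GB_USD * gb_processed

def gateway_endpoint_monthly_cost(gb_processed: float) -> float:
    """Gateway endpoints for S3/DynamoDB have no hourly or per-GB charge."""
    return 0.0

traffic_gb = 1000
print(round(nat_gateway_monthly_cost(traffic_gb), 2))
print(gateway_endpoint_monthly_cost(traffic_gb))
```

Even before any data moves, the NAT Gateway carries a fixed hourly cost, which is why the gateway endpoint is the 'most cost-effective' answer for S3 and DynamoDB traffic.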
Question describes an application on EC2 that needs to access another AWS service (S3, DynamoDB, Secrets Manager)
Attach an IAM role to the EC2 instance (instance profile). The application uses the instance metadata service to get temporary credentials automatically.
Creating an IAM user with access keys and configuring them in the application — violates least privilege and is a security anti-pattern.
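As an illustration of the role-based answer, the two policy documents behind an instance profile might look like the sketch below. The helper names and the bucket name are hypothetical; the JSON shapes follow the standard IAM policy grammar, and nothing here calls AWS.

```python
import json

def ec2_trust_policy() -> dict:
    """Trust policy: lets the EC2 service assume the role on the instance's behalf."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

def s3_read_policy(bucket: str) -> dict:
    """Permissions policy: least-privilege read access to a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }

# "example-app-bucket" is a placeholder bucket name.
print(json.dumps(ec2_trust_policy(), indent=2))
print(json.dumps(s3_read_policy("example-app-bucket"), indent=2))
```

No access key appears anywhere: the application picks up short-lived credentials from the instance metadata service once the role is attached.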
Question asks how to grant cross-account access to an S3 bucket 'without creating IAM users'
Create an IAM role in the bucket's account with S3 permissions and a trust policy that allows the other account's principal to AssumeRole. The other account's users assume the role.
Making the S3 bucket public or sharing access keys — both violate security best practices and will always be wrong answers.
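The cross-account role differs from an instance role mainly in its trust policy principal. A minimal sketch, assuming the whole trusted account is allowed to assume the role; the function name is hypothetical and the account ID is a documentation-style placeholder.

```python
def cross_account_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy for a role in the bucket owner's account.

    Principals in trusted_account_id may call sts:AssumeRole on the role,
    provided their own identity policies also allow it.
    """
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
        }],
    }

policy = cross_account_trust_policy("111122223333")  # placeholder account ID
print(policy["Statement"][0]["Principal"])
```

In practice the principal is often narrowed to a specific role or user ARN rather than the account root; the account-wide form is used here only to keep the example short.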
Question asks about 'encrypting S3 data with keys the company fully controls and requires annual key rotation audit trail'
SSE-KMS with a customer managed key (CMK) and automatic annual rotation enabled. CloudTrail logs all key usage.
SSE-S3 — AWS manages the key, company has no visibility or control. Wrong when question specifies 'full control'.
Question involves a web application exposed to the internet that needs protection from SQL injection and cross-site scripting
AWS WAF attached to the ALB or CloudFront distribution, with managed rule groups for SQL injection and XSS patterns.
Security Groups or NACLs — these are network-level controls that cannot inspect HTTP payloads.
Design Resilient Architectures
Must-Know Facts
- DR strategies by cost: Backup & Restore (hours RTO) < Pilot Light (minutes-hours RTO) < Warm Standby (minutes RTO) < Active-Active (near zero RTO). Match the cheapest strategy to the RTO/RPO requirement.
- Multi-AZ = high availability within a Region. Multi-Region = disaster recovery. Do not confuse them — a question about 'if the entire region fails' needs Multi-Region, not Multi-AZ.
- RDS Multi-AZ: synchronous replication, automatic failover 60–120 seconds, standby CANNOT serve read traffic. Read Replicas: asynchronous, can serve reads, manual promotion, cross-Region supported.
- Aurora: 6 copies across 3 AZs, automatic failover in <30 seconds, up to 15 read replicas. Aurora Global Database: <1 second RPO, secondary Region becomes primary in under 1 minute.
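- SQS visibility timeout MUST exceed the processing time, or the message becomes visible again and may be processed twice. A common rule of thumb is ~6x the expected processing time; for Lambda consumers, AWS recommends at least 6x the function timeout.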
- ALB sticky sessions break if the instance terminates — externalize session state to ElastiCache or DynamoDB instead.
- Auto Scaling across 2 AZs with min=2 means 1 instance per AZ. If one AZ fails, you have 1 instance until ASG recovers — design for n+1 redundancy.
- Route 53 failover routing requires health checks on the primary endpoint. If you forget health checks, failover never triggers.
- SQS FIFO guarantees exactly-once processing and ordering. Default throughput: 300 TPS (3,000 with batching). High-throughput mode (opt-in) supports up to 70,000 TPS in supported regions. Standard has unlimited throughput but at-least-once delivery (can duplicate).
- DynamoDB Global Tables = multi-Region, multi-active. Any Region can serve reads and writes. Conflict resolution is last-writer-wins.
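The n+1 point in the Auto Scaling bullet comes down to simple arithmetic. The sketch below assumes instances are spread evenly across AZs and that the design goal is to survive the loss of exactly one AZ; `min_asg_size` is a hypothetical helper, not an AWS API.

```python
import math

def min_asg_size(required_instances: int, az_count: int) -> int:
    """Smallest evenly balanced group that still has `required_instances`
    running after one whole AZ is lost."""
    if az_count < 2:
        raise ValueError("need at least 2 AZs to tolerate an AZ failure")
    # The surviving (az_count - 1) AZs must carry the full required load.
    per_az = math.ceil(required_instances / (az_count - 1))
    return per_az * az_count

print(min_asg_size(2, 2))  # 2 AZs: each AZ must hold the full load
print(min_asg_size(2, 3))  # 3 AZs: less over-provisioning needed
```

Spreading across more AZs lowers the over-provisioning needed for the same failure tolerance, which is worth remembering when a question combines 'highly available' with 'cost-effective'.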
Scenario Tips
Users report losing session data when instances are replaced during Auto Scaling events
Externalize session state to ElastiCache for Redis or DynamoDB. This makes instances truly stateless so any instance can serve any user.
Enable ALB sticky sessions — this only delays the problem, it doesn't fix it. If the instance terminates, session data is still lost.
Company needs DR with RPO of 1 hour and RTO of 15 minutes, minimize cost
Cross-Region Read Replica with a promote-on-failure procedure. Replication lag is typically seconds (well within 1-hr RPO), promotion takes minutes (within 15-min RTO).
Aurora Global Database — meets requirements but costs more than a Read Replica solution when cost minimization is the constraint.
Application processes orders and loses them during peak traffic spikes
SQS queue between the web tier and processing tier. SQS buffers the orders and they persist until processed — zero order loss with eventual processing.
Increasing Auto Scaling maximum capacity — may still drop requests at the extreme peak moment before scaling completes.
Company runs a database and asks for 'both high availability and improved read performance'
RDS Multi-AZ for automatic failover HA PLUS Read Replicas for read traffic distribution. Both are needed because Multi-AZ standby cannot serve reads.
RDS Multi-AZ alone — provides HA but zero read performance improvement since standby is idle.
Question requires exactly-once message processing for a payment system
SQS FIFO queue with exactly-once processing guarantees. Financial transactions require no duplicates.
SQS Standard queue — at-least-once delivery means duplicate processing is possible, which is catastrophic for payments.
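The duplicate-suppression behaviour of a FIFO queue can be modelled in a few lines. This is a toy in-memory model, not the SQS API: the class and method names are hypothetical, and the only real-world detail it relies on is the 5-minute deduplication interval keyed on a message deduplication ID.

```python
class FifoDedupSketch:
    """Toy model of FIFO-queue deduplication (in memory, single process)."""

    DEDUP_WINDOW_SECONDS = 300  # SQS FIFO deduplication interval is 5 minutes

    def __init__(self):
        self._first_seen = {}  # dedup_id -> time the message was first accepted
        self.messages = []     # bodies in send order (FIFO)

    def send(self, dedup_id: str, body: str, now: float) -> bool:
        """Return True if the message was enqueued, False if it was a duplicate."""
        first = self._first_seen.get(dedup_id)
        if first is not None and now - first < self.DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: dropped
        self._first_seen[dedup_id] = now
        self.messages.append(body)
        return True

queue = FifoDedupSketch()
queue.send("payment-42", "charge $10", now=0)
queue.send("payment-42", "charge $10", now=10)  # producer retry, suppressed
print(queue.messages)
```

A producer retry after a network timeout is the typical source of such duplicates; with a Standard queue the consumer would have to detect them itself.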
Design High-Performing Architectures
Must-Know Facts
- Caching layers from closest to farthest from user: CloudFront edge (global, HTTP/HTTPS content) → API Gateway cache (API responses) → ElastiCache (application-level, sub-ms) → DAX (DynamoDB only, in-memory). Each layer reduces load on the next.
- EBS gp3: 3,000 IOPS baseline regardless of size (unlike gp2 which ties IOPS to GB). gp3 is cheaper and more flexible than gp2 — default choice for most workloads.
- ElastiCache Redis cluster mode enables sharding for datasets larger than a single node's memory. Non-cluster mode = single primary, up to 5 replicas — limited to single-node RAM.
- DynamoDB on-demand mode scales instantly without capacity planning but costs ~2x provisioned at sustained load. Use on-demand for truly unpredictable/spiky traffic; provisioned + auto-scaling for consistent or predictable loads.
- S3 performance: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD per second PER PREFIX. Use multiple prefixes to scale beyond this. Multipart upload required for files >5 GB, recommended for >100 MB.
- EC2 placement groups: Cluster (same rack, lowest latency, 10 Gbps, all-or-nothing failure risk), Spread (different racks, max 7 per AZ, best fault isolation), Partition (large distributed systems like Hadoop, 7 partitions per AZ).
- Global Accelerator = static anycast IPs + AWS backbone routing for TCP/UDP. CloudFront = HTTP/HTTPS CDN with edge caching. Different tools for different problems.
- RDS Proxy: connection pooling for Lambda connecting to RDS. Lambda creates new connections per invocation — without proxy, database connection exhaustion is common at scale.
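The gp2 versus gp3 difference in the EBS bullet is easy to check numerically. The sketch below uses the published gp2 baseline formula (3 IOPS per GiB, floor of 100, cap of 16,000) and the flat gp3 baseline; it ignores gp2 burst credits and extra IOPS provisioned on gp3, and the function names are illustrative.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline scales with size: 3 IOPS/GiB, min 100, max 16,000."""
    return min(max(3 * size_gib, 100), 16_000)

def gp3_baseline_iops(size_gib: int) -> int:
    """gp3 baseline is 3,000 IOPS regardless of volume size."""
    return 3_000

for size in (20, 100, 1000):
    print(size, gp2_baseline_iops(size), gp3_baseline_iops(size))
```

A small gp2 volume has to be over-sized just to reach the IOPS that gp3 provides at any size, which is the reasoning behind treating gp3 as the default.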
Scenario Tips
RDS database at high CPU due to repeated reads of the same popular data records
ElastiCache for Redis as a read-through cache. Popular records are served from memory (sub-ms) without hitting the database at all.
RDS Read Replicas — these distribute different read queries but still execute SQL for every request, not solving the repeated identical-query problem.
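The caching answer can be illustrated with a lazy-loading (cache-aside) read path, which is one common way an application gets this behaviour. In this sketch a plain dictionary stands in for Redis and a callback stands in for the SQL query; the `ReadCache` class and the 60-second TTL are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

class ReadCache:
    """Check the cache first; query the database only on a miss or expiry."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 60.0):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.db_reads = 0  # counts how often the database was actually hit

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                    # cache hit: no database work
        value = self._load(key)                # cache miss: one database read
        self.db_reads += 1
        self._store[key] = (now + self._ttl, value)
        return value

cache = ReadCache(lambda key: f"row-for-{key}")
for _ in range(100):
    cache.get("popular-product")
print(cache.db_reads)
```

One hundred requests for the same record cost a single database read, whereas a read replica would still execute the query one hundred times.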
Application requires static IP addresses that customers can whitelist in their firewalls, with global performance
AWS Global Accelerator provides 2 static Anycast IPs that route traffic to your endpoints over the AWS backbone.
CloudFront — doesn't support static IPs. ALB — IPs can change. Neither satisfies the 'static IP for firewall whitelist' requirement.
Lambda functions connecting to RDS causing 'too many connections' errors at scale
RDS Proxy — pools connections between Lambda invocations. Lambda creates a new connection per invocation, which exhausts database connection limits. Proxy reuses pooled connections.
Increasing RDS max_connections — you can increase the limit but Lambda's connection-per-invocation pattern will still exhaust any limit at sufficient scale.
ML training job on EC2 needs lowest inter-node network latency for distributed training
EC2 Cluster Placement Group — places instances on the same physical rack, enabling 10 Gbps enhanced networking. Required for tightly coupled HPC/ML workloads.
Spread Placement Group — distributes instances across racks for fault tolerance, which INCREASES latency between nodes.
Company wants to run containerized microservices without managing EC2 servers and needs auto-scaling to zero at off-peak hours
ECS or EKS with Fargate launch type — serverless containers, no EC2 fleet to manage, scales to zero, pay per task. Fargate removes the operational overhead of patching and capacity planning.
ECS with EC2 launch type — you still manage and pay for EC2 instances even when no tasks are running. Not truly serverless and doesn't scale to zero on infrastructure.
Design Cost-Optimized Architectures
Must-Know Facts
- EC2 savings in order (highest to lowest cost): On-Demand > Standard RI 1yr > Convertible RI > Compute Savings Plans > Standard RI 3yr All Upfront > Spot. Spot can reach ~90% off; Standard RI 3yr all-upfront gives ~72% for committed workloads.
- Spot Instances: up to 90% savings, interrupted with 2-minute warning. ONLY for fault-tolerant, stateless workloads. Never for databases, master nodes, or time-sensitive single-instance work.
- Serverless services have zero cost at zero load: Lambda, Fargate, DynamoDB on-demand, Aurora Serverless, SQS, SNS. Perfect for sporadic or unpredictable workloads.
- Data transfer costs: same AZ = free between services. Cross-AZ = $0.01/GB each direction. Cross-Region = varies (~$0.02/GB). Internet egress = $0.09/GB (US). Minimize cross-AZ traffic in cost-sensitive systems.
- NAT Gateway charges hourly ($0.045/hr) PLUS per-GB processed ($0.045/GB). High-data-transfer applications through NAT can get very expensive. Use VPC Gateway Endpoints for S3/DynamoDB to bypass NAT.
- S3 storage class selection: Standard-IA has a 30-day minimum storage charge AND a retrieval fee. The Glacier classes have a 90-day minimum (180 days for Deep Archive) AND retrieval cost/time. Pick the right class or the 'savings' become penalties.
- Compute Savings Plans are more flexible than EC2 RIs — apply across EC2 (any family/size/region/OS), Fargate, and Lambda. EC2 Instance Savings Plans apply within a specific family+region.
- Savings Plans vs RIs: Standard RI gives the deepest discount for known, stable workloads. Savings Plans give flexibility (apply to Fargate, Lambda) at slightly less discount.
Scenario Tips
Company runs identical workload 24/7 for 3 years on specific EC2 instance type, asking for 'maximum cost savings'
3-year Standard Reserved Instance with All Upfront payment — maximum ~72% discount for predictable, committed workload.
Compute Savings Plans — slightly less savings (~66%) but more flexibility. If the question says 'maximum savings' with a known, stable workload, Standard RI wins.
Dev/test environment used only Monday–Friday 8am–6pm (50 hours/week)
On-Demand instances with scheduled Auto Scaling to stop instances outside business hours. Saves ~70% compared to running 24/7.
Reserved Instances — you pay for 24/7 even when instances are off, wasting ~70% of the cost.
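The ~70% figure follows directly from the schedule. A quick check, assuming cost is proportional to running hours (storage and other always-on charges are ignored); the function names are illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168

def running_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Share of the week the instances are actually running."""
    return hours_per_day * days_per_week / HOURS_PER_WEEK

def savings_vs_always_on(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the 24/7 compute bill avoided by stopping outside the schedule."""
    return 1.0 - running_fraction(hours_per_day, days_per_week)

# Monday to Friday, 8am to 6pm: 10 hours x 5 days = 50 hours per week
print(f"{running_fraction(10, 5):.1%} of the week running")
print(f"{savings_vs_always_on(10, 5):.1%} saved")
```

The same arithmetic explains the wrong answer: a Reserved Instance is billed for all 168 hours whether or not the instance is running.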
Nightly batch job processing large files, 2-hour window, tolerates restarts, needs cheapest compute
EC2 Spot Instances in an Auto Scaling Group — up to 90% savings. Batch job is fault-tolerant (can checkpoint/restart), exactly the Spot use case.
On-Demand — correct but overpriced. Reserved — doesn't make sense for nightly 2-hour jobs.
Company has 50 TB of log data: accessed daily for first 30 days, never accessed again but must be kept 5 years
Lifecycle policy: S3 Standard for 30 days, transition to S3 Glacier Deep Archive after 30 days (cheapest long-term storage, $0.00099/GB/month). Glacier Deep Archive satisfies the 5-year retention at minimal cost.
S3 Intelligent-Tiering — adds unnecessary monitoring cost when the access pattern is perfectly predictable.
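The lifecycle rule from the correct answer could be expressed roughly as follows. This sketch builds the configuration as a plain dictionary in the shape S3 lifecycle APIs accept and estimates the steady-state storage bill; the rule ID and helper names are hypothetical, the price is the one quoted in this guide, and 1 TB is taken as 1,024 GB.

```python
DEEP_ARCHIVE_USD_PER_GB_MONTH = 0.00099  # price quoted in this guide

def log_lifecycle_configuration(days_in_standard: int = 30) -> dict:
    """Keep objects in S3 Standard, then move them to Glacier Deep Archive."""
    return {
        "Rules": [{
            "ID": "logs-to-deep-archive",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},       # empty prefix: applies to every object
            "Transitions": [{
                "Days": days_in_standard,
                "StorageClass": "DEEP_ARCHIVE",
            }],
        }]
    }

def deep_archive_monthly_cost(terabytes: float) -> float:
    """Steady-state monthly storage cost once all data is in Deep Archive."""
    return terabytes * 1024 * DEEP_ARCHIVE_USD_PER_GB_MONTH

print(log_lifecycle_configuration()["Rules"][0]["Transitions"])
print(round(deep_archive_monthly_cost(50), 2))
```

No expiration rule is included because the scenario only states a minimum retention period; whether the data should be deleted after 5 years is left open.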