Study SAP-C02 Reliability Strategy for New Solutions: key concepts, common traps, and exam decision cues.

Reliability design is not only about adding more instances. SAP-C02 tests whether you can remove single points of failure, choose the right AWS-managed patterns, and keep recovery practical under real operational pressure.

| Failure concern | Strongest first fit | Why |
|---|---|---|
| one instance or AZ can fail | Multi-AZ plus load balancing | removes local single points of failure |
| request spikes or uneven demand | Auto Scaling and elastic managed services | absorbs variability |
| component dependency failures | loose coupling through Amazon SQS or Amazon EventBridge | reduces tight synchronous failure chains |
| unhealthy endpoints need traffic steering | Route 53 routing policies and health checks | health checks let DNS route traffic away from failed endpoints |
| account or service limits threaten scale | quota planning and monitoring | reliability includes capacity ceilings |
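
The Multi-AZ row rests on simple availability arithmetic. The sketch below is a minimal illustration, assuming each AZ-local copy of the workload fails independently (AZs are designed to be isolated, but correlated failures are still possible). It computes the chance that at least one of n copies is up as 1 - (1 - a)^n. The function name `parallel_availability` and the 99% per-AZ figure are hypothetical, not AWS APIs or published numbers.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Probability that at least one of `copies` independent replicas is up."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The workload is down only if every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    per_az = 0.99  # hypothetical availability of one AZ-local stack
    for azs in (1, 2, 3):
        print(f"{azs} AZ(s): {parallel_availability(per_az, azs):.6f}")
```

The gain depends entirely on the independence assumption: copies that share one database or one DNS record are not independent, so the formula overstates their availability.
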

| Trap | Better rule |
|---|---|
| adding more servers while a single point of failure remains in the data or DNS layer | remove SPOFs at every layer, not only compute |
| designing everything synchronously | loosely coupled services often fail more gracefully |
| forgetting quotas and limits | scale goals can still fail if quotas are ignored |
| using Multi-Region when Multi-AZ is enough | match the failure boundary to the actual requirement |
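
The first two traps follow from the same arithmetic. In a synchronous chain every layer must be up, so availabilities multiply, and a single database caps the total no matter how many web servers sit in front of it. This is a simplified model with hypothetical figures and helper names (`parallel`, `series`); it treats a Multi-AZ database as two independent copies and ignores failover time.

```python
from math import prod


def parallel(single: float, copies: int) -> float:
    """At least one of `copies` independent replicas is up."""
    return 1.0 - (1.0 - single) ** copies


def series(*layers: float) -> float:
    """Synchronous chain: every layer must be up, so availabilities multiply."""
    return prod(layers)


if __name__ == "__main__":
    web, db = 0.99, 0.995  # hypothetical per-instance availabilities

    two_web_single_db = series(parallel(web, 2), db)
    eight_web_single_db = series(parallel(web, 8), db)
    two_web_multi_az_db = series(parallel(web, 2), parallel(db, 2))

    print(f"2 web, 1 DB : {two_web_single_db:.5f}")
    print(f"8 web, 1 DB : {eight_web_single_db:.5f}")
    print(f"2 web, 2 DB : {two_web_multi_az_db:.5f}")
```

The three results come out at roughly 0.9949, 0.9950 and 0.9999: going from two to eight web servers barely moves the total, while making the data layer redundant does. The model does not capture decoupling; a queue between layers lets a producer keep accepting work while a consumer is down, which is why the synchronous-design trap is listed separately.
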

Reliability-design questions usually hinge on where the single points of failure are. If the requirement is that losing one AZ must not take down the workload, think Multi-AZ and load balancing. If synchronous call chains create fragility, think decoupling with queues or events. If growth or failover might hit service quotas, make quota planning part of the design. The strongest answers remove the realistic failure modes first.
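
Quota planning can be folded into the same capacity math. The sketch below assumes a statically stable design that pre-provisions enough capacity for the surviving AZs to carry peak load after one AZ fails, plus a hypothetical regional quota counted in instances. Amazon EC2 On-Demand instance quotas are counted in vCPUs, so a real check would convert first. `CapacityPlan`, `instances_to_survive_az_loss`, and `quota_headroom` are illustrative names, not AWS APIs.

```python
import math
from dataclasses import dataclass


@dataclass
class CapacityPlan:
    peak_instances: int   # instances needed for today's peak load
    growth_factor: float  # planned growth, e.g. 1.5 for +50% (hypothetical)
    az_count: int         # Availability Zones the workload spans
    regional_quota: int   # hypothetical per-Region quota, in instances


def instances_to_survive_az_loss(plan: CapacityPlan) -> int:
    """Total instances so the remaining AZs can carry grown peak load if one AZ fails."""
    if plan.az_count < 2:
        raise ValueError("surviving an AZ loss needs at least two AZs")
    needed = math.ceil(plan.peak_instances * plan.growth_factor)
    # Size each AZ so that az_count - 1 AZs together still cover `needed`.
    per_az = math.ceil(needed / (plan.az_count - 1))
    return per_az * plan.az_count


def quota_headroom(plan: CapacityPlan) -> int:
    """Spare quota if positive; a negative value means the design hits the quota."""
    return plan.regional_quota - instances_to_survive_az_loss(plan)


if __name__ == "__main__":
    plan = CapacityPlan(peak_instances=20, growth_factor=1.5, az_count=3, regional_quota=40)
    print("instances required:", instances_to_survive_az_loss(plan))
    print("quota headroom:", quota_headroom(plan))
```

With these example numbers the design needs 45 instances against a quota of 40, so the headroom is -5: the workload would hit the quota during growth or an AZ event unless the quota is raised in advance or the design changes.
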