AWS SOA-C03 Glossary: Key Terms

AWS SOA-C03 glossary of monitoring, reliability, recovery, automation, and networking terms.

Use this glossary when observability, remediation, automation, and continuity terms start to blur together. Keep it beside the cheat sheet and resources rather than treating it as a substitute for real incident-style reasoning.

| Term | Short meaning |
| --- | --- |
| Runbook | Step-by-step operational procedure for diagnosis or remediation |
| Remediation | Corrective action that restores or stabilizes service |
| Drift | Deviation between expected configuration and current deployed state |
| Health check | Automated signal used to determine whether a target is functioning correctly |
| Maintenance window | Controlled time period for patching or administrative actions |
| Multi-AZ | High-availability deployment pattern spread across multiple Availability Zones |
| RTO | Recovery Time Objective, the acceptable restoration time after a disruption |
| RPO | Recovery Point Objective, the acceptable data-loss window after a disruption |
| Metric math | CloudWatch expressions that derive a new time series from one or more metrics |
| Baseline | Normal expected operating behavior used as a comparison point |
| Rollback | Returning to a previously known-good configuration or release |
| Escalation | Handing an issue to a higher support tier or specialist team based on severity or blockers |
| StackSet | CloudFormation mechanism for deploying stacks across multiple accounts and Regions |
| EventBridge | AWS event-routing service that matches events to target actions |
| Secrets Manager | Managed service for storing and rotating secrets securely |
| Access Analyzer | IAM-focused tool that helps detect broad or external access exposure |
| Security Hub | Service that aggregates and organizes findings across AWS security tools |
| GuardDuty | Threat-detection service that surfaces suspicious or malicious behavior |
| Config rule | Compliance check against desired resource configuration state |
| Performance Insights | RDS performance visibility feature for identifying database bottlenecks |
| Transfer Acceleration | S3 feature that speeds up long-distance transfers by routing them through CloudFront edge locations |
| Policy simulator | IAM tool for testing whether a set of policies would allow or deny an action |
| Idempotent | Safe to run multiple times, because repeated runs leave the same end state as a single run |
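
The "Idempotent" entry is easier to remember with a concrete contrast. The sketch below is a minimal, hypothetical illustration in plain Python: `ensure_tag` and `append_tag` are invented names, and the dictionary stands in for a resource's tags rather than any real AWS API. It assumes a remediation step that may be triggered more than once for the same event.

```python
def ensure_tag(tags: dict[str, str], key: str, value: str) -> bool:
    """Idempotent: move tags toward the desired state, changing nothing if already there.

    Returns True only when a change was actually made.
    """
    if tags.get(key) == value:
        return False  # already in the desired state, so a rerun is a no-op
    tags[key] = value
    return True


def append_tag(tag_list: list[tuple[str, str]], key: str, value: str) -> None:
    """Not idempotent: every run adds another entry, so reruns accumulate duplicates."""
    tag_list.append((key, value))


# Hypothetical resource state, standing in for tags on a real resource.
resource_tags = {"env": "prod"}
first_run_changed = ensure_tag(resource_tags, "patched", "true")
second_run_changed = ensure_tag(resource_tags, "patched", "true")

tag_log: list[tuple[str, str]] = []
append_tag(tag_log, "patched", "true")
append_tag(tag_log, "patched", "true")

print(first_run_changed, second_run_changed, resource_tags, len(tag_log))
```

The idempotent version checks current state before acting, which is why a retried automation step does not leave the resource in a different state than a single successful run would.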

Commonly confused pairs

| Pair | Keep this distinction clear |
| --- | --- |
| alarm vs remediation | detection signal versus corrective response |
| backup vs replication | restore copy versus continuity mechanism for lower outage time |
| drift detection vs configuration deployment | identifying unapproved change versus pushing desired state |
| rollback vs failover | returning to a prior version versus moving service to another healthy target |
| metric vs log | numeric time-series signal versus event or record detail |
| CloudTrail vs Access Analyzer | audit history versus access-exposure analysis |
| EventBridge vs Systems Manager Automation | event routing versus operational runbook execution |
| Secrets Manager vs KMS | secret storage lifecycle versus key-management control |
| Security Hub vs GuardDuty | findings aggregation versus threat detection |
| Multi-AZ vs disaster recovery | high availability in-region versus broader recovery strategy |
| CloudFront vs Global Accelerator | content caching and delivery versus optimized global traffic pathing |
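
The backup vs replication pair is often easiest to reason about through RPO and RTO arithmetic. The following is a rough sketch with made-up numbers: the 15-minute RPO, 1-hour RTO, nightly backup interval, replication lag, and recovery estimates are all assumptions chosen for illustration, not AWS defaults or guarantees.

```python
from datetime import timedelta


def meets_objective(worst_case: timedelta, objective: timedelta) -> bool:
    """True when the worst-case value fits inside the stated objective."""
    return worst_case <= objective


# Hypothetical business objectives.
rpo = timedelta(minutes=15)  # acceptable data-loss window
rto = timedelta(hours=1)     # acceptable restoration time

# Assumed worst cases for two strategies (illustrative values only).
# With periodic backups, the worst-case data loss is roughly the backup interval.
backup_loss = timedelta(hours=24)
backup_restore_time = timedelta(hours=3)
# With replication, the worst-case data loss is roughly the replication lag.
replica_loss = timedelta(seconds=5)
replica_failover_time = timedelta(minutes=2)

backup_ok = meets_objective(backup_loss, rpo) and meets_objective(backup_restore_time, rto)
replica_ok = meets_objective(replica_loss, rpo) and meets_objective(replica_failover_time, rto)

print("nightly backup meets RPO/RTO:", backup_ok)
print("replication meets RPO/RTO:", replica_ok)
```

Under these assumed numbers the nightly backup misses both objectives while replication meets them. Backups remain the restore copy in either case, since a replica can faithfully copy a bad write or deletion.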

If three terms blur together

| Blur cluster | Keep this separation clear |
| --- | --- |
| alarm / remediation / recovery | detect / correct / restore |
| Multi-AZ / backup / disaster recovery | in-region availability / restore copy / broader continuity strategy |
| EventBridge / Lambda / Systems Manager | route event / run custom code / execute managed operational workflow |
| CloudTrail / Access Analyzer / policy simulator | audit what happened / analyze exposure / test policy evaluation |
| Security Hub / GuardDuty / Config | aggregate findings / detect threats / evaluate configuration compliance |
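
For the alarm / remediation / recovery cluster, the "detect" step usually starts from a metric, and sometimes from a derived one of the kind the "Metric math" entry describes. The sketch below models that idea in plain Python under stated assumptions: the sample counts, the 1% threshold, and the "last N datapoints all breaching" rule are hypothetical simplifications, not CloudWatch's actual evaluation logic or API.

```python
def error_rate_percent(errors: list[int], requests: list[int]) -> list[float]:
    """Derive an error-rate series from two raw series, one value per period."""
    return [100.0 * e / r if r else 0.0 for e, r in zip(errors, requests)]


def should_alarm(series: list[float], threshold: float, datapoints: int) -> bool:
    """Simplified rule: alarm only when the last `datapoints` values all exceed the threshold."""
    recent = series[-datapoints:]
    return len(recent) == datapoints and all(v > threshold for v in recent)


# Hypothetical per-minute samples for one service.
errors = [1, 2, 30, 40]
requests = [1000, 1000, 1000, 1000]

rates = error_rate_percent(errors, requests)
alarm_on_two = should_alarm(rates, threshold=1.0, datapoints=2)
alarm_on_three = should_alarm(rates, threshold=1.0, datapoints=3)

print(rates, alarm_on_two, alarm_on_three)
```

The alarm only detects. Whatever corrects the error rate is remediation, and restoring lost service or data afterward is recovery.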

If the confusion is really about…

| Topic family | Best page to revisit |
| --- | --- |
| service fit and operational heuristics | Cheat Sheet |
| current AWS facts and primary docs | Resources |
| pacing and review order | Study Plan |
| overall exam framing | Guide root |

Revised on Sunday, May 10, 2026