Study DVA-C02 Root Cause Analysis with Metrics, Logs and Traces: key concepts, common traps, and exam decision cues.
Root-cause questions on DVA-C02 reward candidates who follow evidence instead of guessing. AWS wants developers to use logs, metrics, traces, dashboards, and deployment output to narrow the failure path before making changes.
Trace: Linked record of how one request or event moved through multiple services and components.
Embedded metric format (EMF): Logging pattern that lets application logs emit structured metric data for CloudWatch extraction and analysis.
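
To make the EMF definition concrete, here is a minimal sketch of building one EMF-shaped log record in plain Python. The namespace `ExampleApp`, the `Service` dimension, and the `Latency` metric are illustrative assumptions, not names from any real application. In practice the JSON line would be written to a log stream that CloudWatch ingests, which this sketch does not do.

```python
import json
import time


def emf_record(namespace, service, metric_name, value, unit="Milliseconds"):
    """Build one log record shaped like the CloudWatch embedded metric format."""
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["Service"]],  # names of keys in this record
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # Dimension and metric values live at the top level of the record.
        "Service": service,
        metric_name: value,
    }


# Hypothetical values, for illustration only.
record = emf_record("ExampleApp", "checkout", "Latency", 182)
print(json.dumps(record))
```

The same line serves both evidence lanes: it stays searchable as a log event, and the `_aws` block describes which fields should be treated as metric data.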
AWS wants you to distinguish which evidence source answers which question:
| Question you need to answer | Strongest first evidence lane | Why |
|---|---|---|
| Where did one request fail across multiple services? | Traces | You need request path correlation, not only aggregate counts. |
| What exactly did the application emit at the moment of failure? | Logs | Logs preserve event detail and contextual fields. |
| Is error rate or latency trending over time? | Metrics and dashboards | This is about aggregate behavior and thresholds. |
| Did the problem begin immediately after a deployment? | Deployment logs, release output, and recent config changes | Release timing changes the likely fault boundary. |
| Is one integration failing while the rest of the app is healthy? | Logs plus traces around that dependency call | This isolates the failing downstream hop. |
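
The last row of the table pairs logs with traces. One way to sketch that pairing is to stamp each structured log line with the trace identifier of the request that produced it. The example below assumes an X-Ray style trace header of the form `Root=...;Parent=...;Sampled=...`, which Lambda exposes in the `_X_AMZN_TRACE_ID` environment variable. The dependency name `payments-api`, the error text, and the fallback header value are made-up samples.

```python
import json
import os


def trace_root(header):
    """Extract the Root value from an X-Ray style trace header, if present."""
    for part in header.split(";"):
        key, _, value = part.partition("=")
        if key.strip() == "Root":
            return value
    return None


def log_dependency_failure(dependency, error, header):
    """Return one JSON log line that carries the trace ID as a field."""
    return json.dumps(
        {
            "level": "ERROR",
            "dependency": dependency,
            "error": error,
            "trace_id": trace_root(header),
        }
    )


# Fallback header is a sample value so the sketch runs outside Lambda.
header = os.environ.get(
    "_X_AMZN_TRACE_ID",
    "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1",
)
line = log_dependency_failure("payments-api", "timeout after 3s", header)
print(line)
```

With the trace ID in the log line, a failing hop found in a trace can be matched to the exact log events emitted during that request.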
flowchart TD
A["User-visible failure"] --> B{"What changed?"}
B -->|"Recent release or config change"| C["Check deployment output, rollback path, and changed config"]
B -->|"No obvious release event"| D{"What evidence is missing?"}
D -->|"Need request path"| E["Use traces"]
D -->|"Need exact failure detail"| F["Use logs"]
D -->|"Need rate or trend"| G["Use metrics and dashboards"]
Strong answers usually do not start with a fix. They start with the smallest evidence lane that can quickly rule out wrong hypotheses.
| Trap | Better thinking |
|---|---|
| “High latency means add more instances first.” | Diagnose whether the delay is in code, dependency calls, or one downstream service. |
| “Logs alone are enough for every distributed failure.” | Traces are often stronger when you need cross-service request flow. |
| “Metrics tell me exactly which record failed.” | Metrics summarize behavior; they usually do not provide event-level detail. |
| “If the pipeline failed, production metrics are the first stop.” | Investigating a deployment failure should start with release logs and service output tied to that rollout. |
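
The third trap is easy to see with a small example. The sketch below uses a hypothetical list of request events (the `order_id`, `status`, and `latency_ms` fields are invented for illustration). It computes an aggregate error rate, which is what a metric would show, and then separately recovers the failing record, which only event-level data can provide.

```python
# Hypothetical request events; field names are illustrative assumptions.
events = [
    {"order_id": "A1", "status": 200, "latency_ms": 40},
    {"order_id": "A2", "status": 500, "latency_ms": 950},
    {"order_id": "A3", "status": 200, "latency_ms": 55},
    {"order_id": "A4", "status": 200, "latency_ms": 60},
]


def error_rate(records):
    """Metric-style view: fraction of requests that returned a 5xx status."""
    if not records:
        return 0.0
    failures = sum(1 for r in records if r["status"] >= 500)
    return failures / len(records)


def failed_ids(records):
    """Log-style view: which specific records failed."""
    return [r["order_id"] for r in records if r["status"] >= 500]


rate = error_rate(events)
ids = failed_ids(events)
print(f"error rate: {rate:.0%}")  # prints "error rate: 25%"
print("failed records:", ids)     # prints "failed records: ['A2']"
```

The error rate says that something is wrong and how often, but the number alone cannot identify order `A2`. That detail survives only in the individual events.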