DVA-C02 Root Cause Analysis with Metrics, Logs and Traces Guide

April 1, 2026

Study DVA-C02 Root Cause Analysis with Metrics, Logs and Traces: key concepts, common traps, and exam decision cues.


Root-cause questions on DVA-C02 reward candidates who follow evidence instead of guessing. AWS wants developers to use logs, metrics, traces, dashboards, and deployment output to narrow the failure path before making changes.

Trace: Linked record of how one request or event moved through multiple services and components.

Embedded metric format (EMF): JSON log format that embeds metric definitions inside log events so CloudWatch can extract metrics from them while the full event stays available as a log.

What AWS is really testing here

AWS wants you to distinguish:

code defects from infrastructure symptoms
logs from metrics from traces
noisy symptoms from the first real failure signal
deployment failure evidence from ordinary runtime noise
correlation from coincidence in multi-service failures

Pick the right evidence lane first

Question you need to answer	Strongest first evidence lane	Why
where did one request fail across multiple services?	traces	You need request path correlation, not only aggregate counts.
what exactly did the application emit at the moment of failure?	logs	Logs preserve event detail and contextual fields.
is error rate or latency trending over time?	metrics and dashboards	This is about aggregate behavior and thresholds.
did the problem begin immediately after deploy?	deployment logs, release output, and recent config changes	Release timing changes the likely fault boundary.
is one integration failing while the rest of the app is healthy?	logs plus traces around that dependency call	This isolates the failing downstream hop.

Root cause analysis order

    flowchart TD
        A["User-visible failure"] --> B{"What changed?"}
        B -->|"Recent release or config change"| C["Check deployment output, rollback path, and changed config"]
        B -->|"No obvious release event"| D{"What evidence is missing?"}
        D -->|"Need request path"| E["Use traces"]
        D -->|"Need exact failure detail"| F["Use logs"]
        D -->|"Need rate or trend"| G["Use metrics and dashboards"]

Strong answers usually do not start with a fix. They start with the smallest evidence lane that can disprove the wrong hypotheses quickly.
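The flowchart above can be read as a small decision function. The sketch below is only a study aid: the function name `first_evidence_lane` and the three `need` labels are hypothetical, chosen to mirror the branches of the diagram rather than any AWS API.

```python
from typing import Optional

# Hypothetical labels that mirror the three "What evidence is missing?" branches.
LANES = {
    "request path": "traces",
    "failure detail": "logs",
    "rate or trend": "metrics and dashboards",
}


def first_evidence_lane(recent_change: bool, need: Optional[str] = None) -> str:
    """Return the evidence source to check first, following the flowchart."""
    if recent_change:
        # A recent release or config change takes priority over everything else.
        return "deployment output, rollback path, and changed config"
    if need not in LANES:
        raise ValueError(f"unknown evidence need: {need!r}")
    return LANES[need]


print(first_evidence_lane(recent_change=False, need="request path"))
```

The point of writing it this way is that the release check comes before the choice between traces, logs, and metrics, which is the same ordering the diagram uses.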

Deployment context matters

The exam likes scenarios where the app looked healthy until a rollout happened.

If a problem starts right after:

a new Lambda version
a changed environment variable
a new API stage deployment
a pipeline promotion

then deployment context is part of troubleshooting, not a separate concern. A runtime symptom may still be caused by a release-path mistake.
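One way to make "starts right after" concrete is to compare the time of the first error with recent release timestamps. The following is a minimal sketch under stated assumptions: the function name `suspect_deploy`, the 15-minute default window, and the sample timestamps are all illustrative, and in practice the timestamps would come from your pipeline history and your logs or metrics.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def suspect_deploy(
    first_error_at: datetime,
    deploy_times: Iterable[datetime],
    window: timedelta = timedelta(minutes=15),  # assumed threshold, tune per system
) -> Optional[datetime]:
    """Return the latest deploy that happened at most `window` before the first error."""
    candidates = [
        t for t in deploy_times
        if timedelta(0) <= first_error_at - t <= window
    ]
    return max(candidates) if candidates else None


# Illustrative data only.
first_error = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
deploys = [first_error - timedelta(hours=3), first_error - timedelta(minutes=5)]
print(suspect_deploy(first_error, deploys))
```

A match does not prove the release caused the failure. It only tells you that deployment output and changed configuration belong at the front of the investigation.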

EMF and code-aware debugging

DVA-C02 also expects you to think like a developer instrumenting code, not just an operator reading charts.

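    import json


    def log_failure(request_id: str, action: str, status: str) -> None:
        # Emit one JSON object per line so each field can be filtered
        # and queried later instead of parsed out of free-form text.
        print(json.dumps({
            "requestId": request_id,  # ties the entry to a single request
            "action": action,
            "status": status,
        }))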

The exam lesson is not Python logging style itself. It is that good troubleshooting depends on contextual, queryable evidence rather than vague free-form messages.
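The function above produces structured logs, but it is not yet EMF: an EMF record also carries an `_aws` metadata object that tells CloudWatch which fields to turn into metrics. The sketch below shows roughly what that looks like. The namespace `ExampleApp`, the metric name `FailureCount`, and the function name `emit_failure_metric` are assumptions made for illustration, and the sketch assumes an environment such as Lambda where standard output is delivered to CloudWatch Logs.

```python
import json
import time


def emit_failure_metric(request_id: str, action: str, namespace: str = "ExampleApp") -> str:
    """Print one EMF-shaped log line and return it (names here are illustrative)."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["action"]],  # one dimension set, keyed by action
                    "Metrics": [{"Name": "FailureCount", "Unit": "Count"}],
                }
            ],
        },
        "action": action,         # dimension value, must be a string
        "FailureCount": 1,        # metric value, must be numeric
        "requestId": request_id,  # plain property: searchable in logs, not a dimension
    }
    line = json.dumps(record)
    print(line)
    return line


emit_failure_metric("req-123", "CreateOrder")
```

Keeping `requestId` as a plain property rather than a dimension is deliberate: a per-request value would create a separate metric for every request, while the log event still lets you search for that one request.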

High-yield review cues

If the issue is where the request failed across services, tracing is often central.
If the issue is what the application emitted during failure, logs matter first.
If the issue is trend, threshold, or rate, metrics and dashboards matter first.
If the issue started right after release, deployment logs and recent config changes matter immediately.
If multiple symptoms exist, find the first failing boundary, not the loudest downstream error.
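The last cue, finding the first failing boundary, can be sketched as picking the earliest error among correlated events. The event shape below (`service`, `timestamp_ms`, `level`) and the function name `first_failure` are hypothetical, and the sketch assumes the events share one request or trace ID and have comparable timestamps, which clock skew between services can undermine.

```python
from typing import Dict, List, Optional


def first_failure(events: List[Dict]) -> Optional[Dict]:
    """Return the earliest ERROR event, or None if nothing failed."""
    errors = [e for e in events if e["level"] == "ERROR"]
    return min(errors, key=lambda e: e["timestamp_ms"]) if errors else None


# Illustrative events for one request: the API reports the error last,
# but the payments service failed first.
events = [
    {"service": "api", "timestamp_ms": 60, "level": "ERROR"},
    {"service": "orders", "timestamp_ms": 55, "level": "ERROR"},
    {"service": "payments", "timestamp_ms": 50, "level": "ERROR"},
    {"service": "api", "timestamp_ms": 5, "level": "INFO"},
]
print(first_failure(events)["service"])  # prints "payments"
```

The user-facing service usually produces the most visible error, yet in this example it is the last to fail, which is why ordering by time is more useful than ordering by volume.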

Common traps

Trap	Better thinking
“High latency means add more instances first.”	Diagnose whether the delay is in code, dependency calls, or one downstream service.
“Logs alone are enough for every distributed failure.”	Traces are often stronger when you need cross-service request flow.
“Metrics tell me exactly which record failed.”	Metrics summarize behavior; they usually do not provide event-level detail.
“If the pipeline failed, production metrics are the first stop.”	Deployment failures should start with release logs and service output tied to that rollout.
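For the latency trap in the first row, a quick breakdown of where one slow request spent its time often settles the question before any scaling decision. This is a rough sketch with made-up numbers: the segment names and durations are hypothetical stand-ins for what a trace would report, and `slowest_segment` is not part of any AWS SDK.

```python
from typing import Dict, Tuple


def slowest_segment(durations_ms: Dict[str, float]) -> Tuple[str, float]:
    """Return the segment with the largest duration and its share of total time."""
    name = max(durations_ms, key=durations_ms.get)
    share = durations_ms[name] / sum(durations_ms.values())
    return name, share


# Illustrative timings for a single slow request.
timings = {"handler code": 40, "DynamoDB call": 60, "payment API": 900}
name, share = slowest_segment(timings)
print(f"{name}: {share:.0%} of request time")
```

If one downstream call dominates like this, adding instances of the calling service is unlikely to help, which is the reasoning the table asks for.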

Decision order that usually wins

First classify the evidence you need as metric trend, event-level log detail, or cross-service request path.
If you need to follow one request across services, think distributed tracing.
If you need the exact payload or exception for one failure, think application logs with context.
If the problem starts during a release, go to the deployment logs and service output first.
DVA-C02 rewards picking the narrowest evidence source that answers the actual question.


Revised on Monday, June 15, 2026
