Study MLA-C01 Model Monitoring, Drift, Data Quality and A/B Testing: key concepts, common traps, and exam decision cues.
This lesson is about knowing when the model has started to behave differently in production. AWS expects ML engineers to monitor data quality and inference behavior, detect drift, watch for workflow anomalies, and compare live variants safely before major rollout decisions.
Drift: Change in the incoming data or model behavior over time that can reduce inference quality.
A/B test: Controlled comparison where two variants receive different portions of traffic so their outcomes can be compared.
Shadow comparison: Evaluation pattern where a candidate model sees production-like traffic without becoming the primary live answer path.
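To make the drift definition concrete, here is a minimal sketch of one common way to quantify a shift in a single numeric feature: the Population Stability Index (PSI). The source does not prescribe PSI; it is used here only as an illustration. The function name `psi`, the bin count, and the rule-of-thumb thresholds in the comments are assumptions, not AWS defaults.

```python
import math

def psi(baseline, live, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = fractions(baseline), fractions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical data: the live feature has shifted upward by 50 units.
baseline_values = [float(x) for x in range(100)]
shifted_values = [x + 50.0 for x in baseline_values]

no_drift = psi(baseline_values, baseline_values)
drift = psi(baseline_values, shifted_values)
# A common informal convention (an assumption here, not an AWS rule):
# PSI below 0.1 is stable, above 0.25 deserves investigation.
```

A score like this is a trigger for analysis, not a verdict: it tells you the input pattern changed, not whether output quality has already suffered.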
AWS wants you to separate model and data signals, release-comparison decisions, and infrastructure or workflow problems:
```mermaid
flowchart TD
    A["Unexpected production behavior"] --> B{"Model output degraded?"}
    B -->|Yes| C["Check drift, data quality, and live model behavior"]
    B -->|No| D{"Release comparison needed?"}
    D -->|Yes| E["Use A/B or shadow comparison"]
    D -->|No| F["Check infrastructure or workflow lane first"]
```
The exam usually punishes candidates who answer with generic monitoring language before deciding whether the issue is drift, bad incoming data, or unsafe model promotion.
| If the stem is mainly about… | Strongest first lane |
|---|---|
| changes in feature distributions or live inference input shape | data-quality and drift monitoring |
| whether a new model variant is safer than the current one | A/B test or shadow comparison |
| detecting degrading output quality over time | model-behavior monitoring |
| strange failures in the pipeline or monitoring workflow itself | workflow anomaly detection |
| Lane | Main question |
|---|---|
| Data-quality monitoring | Are live inputs still well-formed and usable? |
| Drift detection | Has the input or output pattern shifted enough to threaten quality? |
| A/B or shadow testing | Is the new model actually better or safer than the current one? |
| Infra monitoring | Is the serving platform healthy? |
If the question is about why outputs are worsening even though the endpoint is up, it is usually not an infrastructure question first.
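The data-quality lane in the table above asks whether live inputs are still well-formed and usable. A minimal sketch of that check follows, assuming a hypothetical schema of `(type, min, max)` constraints per feature; the field names `price` and `age_days` and the helper names are invented for illustration and are not part of any AWS API.

```python
def check_record(record, schema):
    """Return a list of data-quality violations for one inference record."""
    problems = []
    for name, (expected_type, lo, hi) in schema.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not isinstance(value, expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
        elif not lo <= value <= hi:
            problems.append(f"{name}: {value} outside [{lo}, {hi}]")
    return problems

def violation_rate(records, schema):
    """Fraction of records with at least one violation."""
    bad = sum(1 for r in records if check_record(r, schema))
    return bad / len(records)

# Hypothetical baseline constraints captured when the model last performed well.
SCHEMA = {"price": (float, 0.0, 10000.0), "age_days": (int, 0, 3650)}

live_batch = [
    {"price": 9.5, "age_days": 3},    # valid
    {"price": -1.0, "age_days": 3},   # out of range
    {"age_days": 3},                  # missing feature
    {"price": "9.5", "age_days": 3},  # wrong type
]
rate = violation_rate(live_batch, SCHEMA)
```

An endpoint can return HTTP 200 for every one of these records, which is why a rising violation rate belongs to the data-quality lane rather than the infrastructure lane.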
| Symptom | What is usually going wrong | Fix first |
|---|---|---|
| every monitoring answer looks plausible | you are not separating model signals from platform signals | ask whether the issue is output quality, input quality, or host health |
| drift questions feel vague | you are not anchoring on change over time in production data or behavior | ask what changed since the model last performed well |
| A/B and shadow tests blur together | you are not deciding whether the candidate should influence live outcomes yet | if safe comparison is needed without full live impact, favor shadow first |
| you keep promoting too quickly | you are ignoring controlled validation before rollout | ask whether the candidate model has been compared under realistic live conditions |
| Trap | Better reading |
|---|---|
| “Endpoint is healthy, so the model is healthy.” | Infrastructure health does not prove output quality or data stability. |
| “Drift means the model should be replaced immediately.” | Drift is a signal for analysis, retraining, or controlled comparison, not blind promotion. |
| “A/B testing and shadow comparison are the same.” | A/B affects live outcomes for some traffic; shadow is safer when you only need comparison first. |
| “Monitoring is complete once the dashboard exists.” | MLA-C01 expects monitoring that leads to operational decisions, not just charts. |
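The difference between A/B testing and shadow comparison in the table above comes down to whose answer the user actually receives. The sketch below shows that difference with a hypothetical `serve` function and two stand-in models; it illustrates the routing idea only and does not represent how any AWS service implements it.

```python
import random

def serve(request, current, candidate, mode, candidate_share=0.1, rng=random):
    """Return (response_to_user, log_entry) for one request."""
    if mode == "ab":
        # A/B: a share of live traffic really receives the candidate's answer.
        use_candidate = rng.random() < candidate_share
        model = candidate if use_candidate else current
        answer = model(request)
        served_by = "candidate" if use_candidate else "current"
        return answer, {"served_by": served_by, "answer": answer}
    if mode == "shadow":
        # Shadow: the user always gets the current model's answer;
        # the candidate's output is only logged for offline comparison.
        answer = current(request)
        return answer, {
            "served_by": "current",
            "answer": answer,
            "shadow_answer": candidate(request),
        }
    raise ValueError(f"unknown mode: {mode}")

# Stand-in models (hypothetical): each just tags the request.
def current_model(request):
    return f"v1:{request}"

def candidate_model(request):
    return f"v2:{request}"

rng = random.Random(0)  # seeded so the example is repeatable
shadow_results = [
    serve(i, current_model, candidate_model, "shadow", rng=rng) for i in range(100)
]
ab_results = [
    serve(i, current_model, candidate_model, "ab", candidate_share=0.5, rng=rng)
    for i in range(1000)
]
candidate_served = sum(1 for _, log in ab_results if log["served_by"] == "candidate")
```

In shadow mode no user-facing response ever comes from the candidate, which is why it is the safer first step when the only goal is comparison.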
A recommendation model’s endpoint latency looks normal, but click-through performance has dropped over the last month as the incoming product catalog and user behavior shifted. A new candidate model is available, but the team does not want to expose all users to it immediately.
The strongest first response is to work in the drift and controlled comparison lane: inspect the live-data shift, then use a safe A/B or shadow-style comparison before a full cutover. The core issue is not host health. It is changing production behavior.
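Once a controlled A/B comparison like the one in this scenario has collected traffic, the click-through rates still have to be compared before any cutover. One possible approach, sketched here under stated assumptions, is a two-proportion z-test. The click and impression counts below are invented for illustration, and the source does not specify which statistical test to use.

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rate (B minus A)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: variant A is the current model, variant B the candidate.
z, p_value = two_proportion_z(clicks_a=400, n_a=10_000, clicks_b=460, n_b=10_000)
promote_candidate = z > 0 and p_value < 0.05  # 0.05 is an assumed threshold
```

A significant lift supports a wider rollout; an inconclusive result is a reason to keep collecting traffic rather than to promote.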