Study MLA-C01 ML CI/CD, Orchestration, Retraining and Rollback: key concepts, common traps, and exam decision cues.
This lesson is about keeping ML delivery repeatable after the first successful deployment. AWS expects you to know how CI/CD, orchestration, retraining triggers, tests, and rollback strategies fit together in a maintainable ML workflow.
Retraining trigger: Event or condition that causes a model-building workflow to run again, such as drift, new data arrival, or a scheduled pipeline.
Canary rollout: Release strategy that routes a small share of traffic to the new variant before a full cutover.
Rollback-safe delivery: Release path that allows a known-good model or configuration to be restored quickly if quality or behavior regresses.
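As a minimal sketch of the retraining-trigger idea, the function below reports which of the three trigger types named above (drift, new data, schedule) has fired. The `TriggerInputs` fields, the function name, and every threshold are hypothetical values chosen for illustration. They are not AWS defaults or service APIs.

```python
from dataclasses import dataclass


@dataclass
class TriggerInputs:
    drift_score: float         # hypothetical drift metric; higher means more drift
    new_rows: int              # rows that arrived since the last training run
    days_since_training: int   # age of the current model in days


def retraining_reasons(inputs, drift_threshold=0.2, min_new_rows=10_000, max_age_days=7):
    """Return the triggers that fired; an empty list means no retraining is needed."""
    reasons = []
    if inputs.drift_score > drift_threshold:
        reasons.append("drift")
    if inputs.new_rows >= min_new_rows:
        reasons.append("new_data")
    if inputs.days_since_training >= max_age_days:
        reasons.append("schedule")
    return reasons


# Only the drift threshold is exceeded in this example.
print(retraining_reasons(TriggerInputs(drift_score=0.35, new_rows=1_200, days_since_training=3)))
```

Returning the list of reasons, rather than a bare boolean, keeps the trigger observable: the pipeline run can record why it started.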
```mermaid
flowchart LR
    A["Code or data change"] --> B["Build and test pipeline"]
    B --> C["Train or retrain model"]
    C --> D["Register approved model version"]
    D --> E["Deploy candidate variant"]
    E --> F["Observe metrics and rollback if needed"]
```
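The flow above can be sketched as an ordered list of stages in which any failing stage stops everything after it. This is an illustrative toy, not a real orchestration service: the stage names and the `run_pipeline` helper are assumptions, and each stage is a placeholder that only returns pass or fail.

```python
def run_pipeline(stages):
    """Run stages in order; stop at the first one that fails and report where."""
    completed = []
    for name, step in stages:
        if not step():
            return {"status": "stopped", "failed_stage": name, "completed": completed}
        completed.append(name)
    return {"status": "succeeded", "failed_stage": None, "completed": completed}


# Each stage is a placeholder returning True (pass) or False (fail).
stages = [
    ("build_and_test", lambda: True),
    ("train", lambda: True),
    ("evaluate_and_register", lambda: False),  # simulate a model that misses its quality bar
    ("deploy_candidate", lambda: True),
]

result = run_pipeline(stages)
print(result["status"], result["failed_stage"])
```

Because the evaluation stage fails here, the deployment stage never runs, which is the behavior the registration step in the flowchart is meant to guarantee.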
AWS wants you to distinguish:
| If the question is mainly about… | Strongest first lane |
|---|---|
| sequencing the ML workflow end-to-end | orchestration and pipeline structure |
| deciding when a model should rebuild | retraining trigger and automation |
| limiting blast radius when a new model misbehaves | staged rollout and rollback control |
The exam usually punishes answers that equate “more automation” with “better automation.” Stronger answers keep automation observable and reversible.
| Symptom | What is usually going wrong | Fix first |
|---|---|---|
| CI/CD and retraining seem like the same problem | you are not separating code delivery from model refresh logic | ask whether the trigger is software change, data change, drift, or scheduled refresh |
| pipeline answers feel too DevOps-generic | you are not watching for model registry, validation, and rollback clues | look for the ML-specific control points in the stem |
| rollback seems secondary | you are assuming successful deployment means successful model behavior | treat rollback as necessary because model quality can regress even when the infrastructure deployment succeeds |
| you keep choosing full automation everywhere | you are undervaluing approval gates and controlled promotion | ask whether the organization needs review before model cutover |
| Trap | Better reading |
|---|---|
| “If drift is detected, promote the next model immediately.” | Drift detection usually triggers evaluation and controlled retraining, not blind promotion. |
| “CI/CD solved deployment, so rollback is less important.” | MLA-C01 repeatedly rewards safe reversal because new models can regress in subtle ways. |
| “Manual retraining is fine if it worked once.” | The exam usually prefers repeatable orchestration when the trigger pattern is predictable. |
| “Pipeline orchestration and endpoint selection are the same decision.” | Endpoint choice is about serving shape; orchestration is about workflow control. |
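The first trap can be made concrete with a small gate that sits between retraining and promotion. This is a hedged sketch: the function name, the metric convention (higher is better), and the `approved_by` parameter are hypothetical, and real approval workflows will carry more state than this.

```python
def promotion_decision(candidate_metric, baseline_metric, min_gain=0.0, approved_by=None):
    """Decide what happens to a retrained candidate; a higher metric is assumed better."""
    if candidate_metric < baseline_metric + min_gain:
        return "reject"             # candidate does not beat the approved model
    if approved_by is None:
        return "await_approval"     # passed evaluation, but no reviewer has signed off
    return "promote_to_canary"      # still a staged rollout, not a full cutover


print(promotion_decision(0.80, 0.85))
print(promotion_decision(0.90, 0.85))
print(promotion_decision(0.90, 0.85, approved_by="reviewer"))
```

Even the best outcome here is promotion to a canary, not a full cutover, so drift detection leads to evaluation and controlled promotion instead of blind replacement.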
A team retrains a model every week. The latest run passes build checks, but online quality degrades after deployment. The team has no easy way to compare the new model against the previously approved one and no simple return path to the earlier version.
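The strongest first answer is to tighten the model registry, staged rollout, and rollback-safe delivery path. Training is not the only failure here. The deeper problem is that the promotion path is not controlled enough for safe ML operations: nothing compares the candidate with the approved version, and nothing restores that version quickly.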
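One way to picture the missing controls in this scenario is a tiny in-memory registry combined with a canary check. Everything below is a hypothetical sketch: `ModelRegistry`, `canary_step`, the error values, and the tolerance are illustrative names and numbers, not a managed registry or endpoint API.

```python
class ModelRegistry:
    """Tiny in-memory registry that tracks approved versions in promotion order."""

    def __init__(self):
        self._approved = []

    def approve(self, version):
        self._approved.append(version)

    def current(self):
        return self._approved[-1]

    def rollback(self):
        """Drop the latest approved version and return the previous one."""
        if len(self._approved) < 2:
            raise RuntimeError("no earlier approved version to roll back to")
        self._approved.pop()
        return self._approved[-1]


def canary_step(registry, candidate, baseline_error, candidate_error, tolerance=0.01):
    """Promote the candidate only if its canary error stays within tolerance."""
    if candidate_error > baseline_error + tolerance:
        return registry.current()   # keep serving the known-good version
    registry.approve(candidate)
    return candidate


registry = ModelRegistry()
registry.approve("v1")

# v2 regresses on canary traffic, so v1 keeps serving.
serving_after_v2 = canary_step(registry, "v2", baseline_error=0.05, candidate_error=0.09)

# v3 matches the baseline and is promoted; a later rollback restores v1.
serving_after_v3 = canary_step(registry, "v3", baseline_error=0.05, candidate_error=0.05)
restored = registry.rollback()
print(serving_after_v2, serving_after_v3, restored)
```

The sketch addresses both gaps in the scenario: the candidate is compared with the approved baseline before it takes over, and the earlier version stays one call away.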