MLA-C01 Deployment and Orchestration of ML Workflows Guide

AWS MLA-C01 deployment guide covering endpoints, containers, networking, orchestration, and retraining decisions.

This chapter is where MLA-C01 tests whether you can get a model into production safely and keep the workflow repeatable. AWS expects ML engineers to choose the right endpoint type, provision compute sensibly, automate infrastructure, and wire CI/CD or retraining flows that still allow rollback.

Current weight in the exam guide

AWS currently weights Deployment and Orchestration of ML Workflows at 22% of scored content.

What this domain is really testing

This domain is testing whether your ML solution can survive contact with production. Strong answers here:

  • match the serving pattern to latency, throughput, and cost constraints
  • provision and scale the runtime intentionally
  • automate deployment and retraining without losing rollback control
  • separate model artifact delivery from infrastructure and pipeline orchestration
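The first bullet above can be sketched as a small decision helper. This is a minimal illustration, not an AWS API: the `Workload` fields, the function name, and both thresholds are assumptions chosen for the example, and the thresholds are not AWS service quotas.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an inference workload."""
    needs_immediate_response: bool      # caller blocks waiting for the prediction
    max_latency_seconds: float          # latency the caller can tolerate
    payload_mb: float                   # size of a single request payload
    requests_arrive_continuously: bool  # False for periodic bulk scoring


def choose_serving_pattern(w: Workload,
                           realtime_latency_budget_s: float = 1.0,
                           realtime_payload_limit_mb: float = 5.0) -> str:
    """Return 'batch', 'async', or 'real-time' for a workload.

    Both thresholds are illustrative assumptions, not AWS quotas.
    """
    # Bulk scoring with nobody waiting on a single result: batch is usually cheapest.
    if not w.requests_arrive_continuously and not w.needs_immediate_response:
        return "batch"
    # Large payloads, relaxed latency, or a non-blocking caller point to queued inference.
    if (w.payload_mb > realtime_payload_limit_mb
            or w.max_latency_seconds > realtime_latency_budget_s
            or not w.needs_immediate_response):
        return "async"
    # Small payload, tight latency, blocking caller: a persistent endpoint is justified.
    return "real-time"
```

The ordering mirrors the exam habit of classifying the request pattern before naming a service: the cheaper patterns are ruled in first, and real-time hosting is the fallback only when the workload actually demands it.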

Work this domain in order

Lesson | Focus
3.1 Endpoints & Containers | Learn how to match real-time, async, batch, multi-model, and container choices to the deployment requirement.
3.2 IaC, Autoscaling & Networking | Learn how provisioning, autoscaling, VPC placement, and endpoint resource controls shape production behavior.
3.3 ML CI/CD & Retraining | Learn how pipelines, retraining flows, tests, and rollback strategy keep ML delivery repeatable.

Fast routing inside this chapter

If the question is really about… | Go first to…
real-time vs async vs batch, CPU vs GPU, containers, SageMaker endpoints, ECS, EKS, Lambda, or edge optimization | 3.1 Endpoint Types, Containers, Deployment Targets & Trade-Offs
CloudFormation, CDK, VPC-hosted endpoints, autoscaling policies, or inference capacity sizing | 3.2 IaC, Autoscaling, VPC Hosting & Resource Provisioning
CodePipeline, CodeBuild, EventBridge retraining, tests, deployment flow, or rollback | 3.3 ML CI/CD, Orchestration, Retraining & Rollback

If you keep missing questions in this domain

Symptom | What is usually going wrong | Fix first
every serving option sounds plausible | you are not classifying the latency and request pattern first | rework 3.1 and decide real-time vs async vs batch before naming a service
autoscaling and provisioning answers blur together | you are mixing baseline capacity, network placement, and runtime elasticity | rework 3.2 and separate static provisioning choices from dynamic scaling behavior
CI/CD questions feel too DevOps-heavy | you are missing the ML-specific parts: validation, registry, retraining, and rollback | rework 3.3 and track what is unique about model delivery versus generic app delivery
you keep choosing complex orchestration | you are not rewarding repeatability and safe rollback enough | prefer the simpler repeatable path that still meets the production requirement

What strong answers usually do

  • choose the deployment target that matches the real latency and throughput requirement
  • keep provisioning and scaling explicit instead of relying on vague default behavior
  • automate repeatable ML delivery without hiding observability or rollback paths
  • separate endpoint strategy from CI/CD orchestration strategy
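Keeping provisioning explicit often comes down to simple arithmetic. The sketch below estimates a baseline instance count from peak load; the function name, the default utilization target, and the minimum of two instances are assumptions for illustration, and per-instance throughput would have to come from your own load test.

```python
import math


def baseline_instance_count(peak_requests_per_second: float,
                            requests_per_second_per_instance: float,
                            target_utilization: float = 0.7,
                            min_instances: int = 2) -> int:
    """Estimate how many instances are needed to serve peak load.

    target_utilization leaves headroom so instances are not run at 100%.
    All defaults are illustrative assumptions, not AWS recommendations.
    """
    if requests_per_second_per_instance <= 0:
        raise ValueError("per-instance throughput must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Throughput each instance is allowed to carry once headroom is reserved.
    usable_rps = requests_per_second_per_instance * target_utilization
    # Round up, and never go below the configured floor.
    return max(min_instances, math.ceil(peak_requests_per_second / usable_rps))
```

For example, 100 requests per second at peak, 20 requests per second per instance, and a 0.5 utilization target give a baseline of 10 instances. The floor is the static provisioning choice; an autoscaling policy then handles the dynamic part above it.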

Common MLA-C01 traps in this domain

  • forcing real-time hosting when async or batch is cheaper and good enough
  • focusing on container packaging while ignoring scaling or rollback behavior
  • treating retraining automation as always better than controlled scheduled updates
  • assuming the most cloud-native answer is best even when the stem rewards simplicity and predictable operations

Before you leave this domain

Make sure you can explain:

  1. what serving pattern the workload really needs
  2. how capacity and scaling are chosen
  3. how deployments or retraining are validated
  4. how the system rolls back if the new model or infrastructure is wrong
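Items 3 and 4 can be pictured as a promotion gate that compares a candidate model against the one currently serving. This is a minimal sketch under assumed names and thresholds (`ModelMetrics`, `deployment_decision`, and every default value are hypothetical); a real pipeline would pull these metrics from its own monitoring and evaluation steps.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    """Hypothetical summary of a model variant's observed behavior."""
    error_rate: float      # fraction of failed invocations
    p95_latency_ms: float  # 95th percentile latency
    accuracy: float        # offline or shadow-evaluated quality metric


def deployment_decision(current: ModelMetrics,
                        candidate: ModelMetrics,
                        max_error_rate: float = 0.01,
                        max_latency_regression: float = 1.10,
                        min_accuracy_gain: float = 0.0) -> str:
    """Return 'promote' or 'rollback' for a candidate model.

    The thresholds are illustrative assumptions, not exam-defined values.
    """
    # Operational health first: a failing endpoint is rolled back regardless of quality.
    if candidate.error_rate > max_error_rate:
        return "rollback"
    # Allow a bounded latency regression relative to the serving model.
    if candidate.p95_latency_ms > current.p95_latency_ms * max_latency_regression:
        return "rollback"
    # The candidate must not be worse than the model it replaces.
    if candidate.accuracy - current.accuracy < min_accuracy_gain:
        return "rollback"
    return "promote"
```

The point of writing the gate down is that rollback becomes an explicit, testable outcome rather than a manual afterthought.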

Then move to 4. Operations, where AWS expects you to operate the full system after it is live.

Revised on Sunday, May 10, 2026