Study MLA-C01 Observability, Cost, and Rightsizing: key concepts, common traps, and exam decision cues.
This lesson covers the serving platform rather than the model alone. AWS expects you to understand how latency, throughput, scaling, quotas, logging, and purchasing options affect the cost and stability of ML infrastructure in production.
Rightsizing: Choosing a more appropriate instance family or size based on observed workload behavior rather than guesswork.
Provisioned concurrency: Pre-initialized execution capacity that reduces cold-start latency on serverless serving paths, such as AWS Lambda or SageMaker Serverless Inference.
Quota bottleneck: Service limit or account-level cap that blocks expected scaling or throughput.
AWS wants you to distinguish which lane a question stem belongs to:
| If the stem is mainly about… | Strongest first lane |
|---|---|
| underused expensive instances | rightsizing and cost optimization |
| latency spikes, throttling, or saturation | observability and capacity tuning |
| scaling not happening when expected | autoscaling signal or quota review |
| broad logging gaps or weak runtime visibility | infrastructure observability |
The exam usually punishes “just make it bigger” answers when the real issue is poor visibility, a wrong scaling trigger, or idle waste.
| Lane | Main question |
|---|---|
| Observability | What evidence proves the platform is under stress or misbehaving? |
| Rightsizing | Is the current resource shape too large, too small, or the wrong family? |
| Cost optimization | Which usage pattern is driving avoidable spend? |
| Quota awareness | Is the platform hitting a scaling or provisioning ceiling before it can recover? |
You need the signal first. Then you choose the cost or capacity action that matches it.
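
The signal-first ordering can be sketched as a small triage function. This is only an illustration: the `EndpointSignals` fields, the 20% CPU threshold, and the lane labels are assumptions made for the example, not AWS-defined values or an official decision procedure.

```python
from dataclasses import dataclass


@dataclass
class EndpointSignals:
    """Observed signals for one serving endpoint (all fields are hypothetical)."""
    has_latency_metrics: bool    # is runtime telemetry actually being collected?
    avg_cpu_utilization: float   # 0.0 to 1.0, averaged over the review window
    p99_latency_ms: float        # observed tail latency
    latency_target_ms: float     # the serving requirement
    instances_in_use: int
    instance_quota: int          # account-level cap for this instance type


def first_lane(s: EndpointSignals) -> str:
    """Return the lane to investigate first, given the evidence available."""
    # Without telemetry there is no evidence to justify any change.
    if not s.has_latency_metrics:
        return "observability"
    over_target = s.p99_latency_ms > s.latency_target_ms
    # At the quota, the platform cannot scale further whatever the policy says.
    if over_target and s.instances_in_use >= s.instance_quota:
        return "quota review"
    # Below the quota, slow recovery points at scaling signals or capacity.
    if over_target:
        return "capacity tuning"
    # Illustrative threshold: latency on target and under 20% average CPU.
    if s.avg_cpu_utilization < 0.20:
        return "rightsizing"
    return "no change"


print(first_lane(EndpointSignals(True, 0.6, 250.0, 100.0, 8, 8)))
```

The function returns a lane to investigate, not an action to take. A "rightsizing" result still has to be checked against burst behavior before anything is resized.
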
| Symptom | What is usually going wrong | Fix first |
|---|---|---|
| cost questions feel generic | you are not tying spend to actual workload shape | ask what is consuming money: idle endpoints, wrong instance class, or burst mis-scaling |
| latency and quota issues blur together | you are not distinguishing saturation from hard service ceilings | ask whether the platform is scaling badly or cannot scale further |
| rightsizing seems like “just smaller is better” | you are ignoring latency and throughput requirements | optimize for fit, not just lower hourly price |
| logging and metrics feel like background detail | you are skipping the evidence step | ask what operational signal would justify the change |
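
"Optimize for fit, not just lower hourly price" can be made concrete with a little arithmetic. The sketch below assumes three made-up instance options with invented prices and per-instance throughput figures; none of them correspond to real AWS instance types or pricing. It picks the option with the lowest total hourly cost that still covers peak traffic.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceOption:
    name: str            # hypothetical label, not a real instance type
    hourly_price: float  # illustrative price per instance-hour
    max_rps: float       # requests/second one instance sustains within the latency target


def cheapest_fit(options, peak_rps):
    """Return (name, instance_count, total_hourly_cost) for the cheapest option
    that covers peak_rps while each instance stays within its latency target."""
    best = None
    for opt in options:
        count = math.ceil(peak_rps / opt.max_rps)  # instances needed at peak
        total = count * opt.hourly_price
        if best is None or total < best[2]:
            best = (opt.name, count, total)
    return best


# Invented numbers, for illustration only.
options = [
    InstanceOption("small", 0.10, 40),
    InstanceOption("medium", 0.20, 100),
    InstanceOption("large", 0.50, 220),
]
name, count, total = cheapest_fit(options, peak_rps=300)
print(name, count, round(total, 2))
```

With these assumed numbers, the option with the lowest hourly price ("small") needs eight instances at peak, so three "medium" instances end up cheaper overall. The comparison deliberately uses peak traffic rather than the average, since the latency commitment has to hold during bursts.
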
| Trap | Better reading |
|---|---|
| “Bigger instances solve observability issues.” | Capacity does not replace missing telemetry or poor debugging signals. |
| “Low utilization always means shrink immediately.” | Rightsizing still has to respect burst behavior and latency commitments. |
| “If latency is high, it must be the model.” | The bottleneck may be infrastructure, scaling, or quota-related instead. |
| “Cost optimization is separate from reliability.” | MLA-C01 expects cost decisions that preserve the serving requirement, not reckless downsizing. |
A team serves a model on large instances. Average CPU use is low, but latency spikes appear during traffic bursts, and occasional scale-outs stall because the account is near a service limit. Finance wants costs down, while operations wants stability.
The strongest first answer combines observability with quota-aware rightsizing rather than blind downsizing or blind upsizing. The team needs evidence about burst behavior and the real scaling limit before changing resource shape.
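
A quota-aware check for this kind of scenario might look like the following sketch. The scenario gives no numbers, so the burst rate, per-instance throughput, and quota below are purely hypothetical, as is the function name `burst_scale_check`.

```python
import math


def burst_scale_check(burst_rps, rps_per_instance, instance_quota):
    """Compare the instances needed at burst against the account quota.

    All inputs are assumed to come from observed metrics and the account's
    current service limit; none are supplied by the scenario itself.
    """
    needed = math.ceil(burst_rps / rps_per_instance)
    shortfall = max(0, needed - instance_quota)
    return {
        "needed": needed,
        "quota": instance_quota,
        "shortfall": shortfall,
        "blocked": shortfall > 0,  # True when scale-out would stall at the quota
    }


# Hypothetical figures: 900 req/s bursts, 100 req/s per instance, quota of 8.
print(burst_scale_check(burst_rps=900, rps_per_instance=100, instance_quota=8))
```

Under these assumed figures the burst needs nine instances against a quota of eight, so scale-out stalls one instance short. That points to a quota increase or a different instance shape, and it shows why downsizing alone would not resolve the latency spikes.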