Keep this page open while drilling questions. MLA‑C01 rewards “production ML realism”: data quality gates, repeatability, safe deployments, drift monitoring, cost controls, and least-privilege security.
Quick facts (MLA-C01)
| Item | Value |
| --- | --- |
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 28% • D2 26% • D3 22% • D4 24% |
Fast strategy (what the exam expects)
If the question says best-fit managed ML, the answer is often SageMaker (Feature Store, Pipelines, Model Registry, managed endpoints).
If the scenario is “data is messy,” think data quality checks, profiling, transformations, and feature consistency (train/serve).
If the scenario is “accuracy dropped in prod,” think drift, monitoring baselines, A/B or shadow, and retraining triggers.
If the scenario is “cost is spiking,” think right-sizing, endpoint type selection, auto scaling, Spot/Savings Plans, and budgets/tags.
If there’s “security/compliance,” include least-privilege IAM, encryption, VPC isolation, and audit logging.
Read the last sentence first to capture constraints: latency, cost, ops effort, compliance, auditability.
Domain weights (how to allocate your time)
| Domain | Weight | Prep focus |
| --- | --- | --- |
| Domain 1: Data Preparation for ML | 28% | Ingest/ETL, feature engineering, data quality and bias basics |
| Domain 2: ML Model Development | 26% | Model choice, training/tuning, evaluation, Clarify/Debugger/Registry |
| Domain 3: Deployment + Orchestration | 22% | Endpoint types, scaling, IaC, CI/CD for ML pipelines |
| Domain 4: Monitoring + Security | 24% | Drift/model monitor, infra monitoring + costs, security controls |
Final 20-minute recall (exam day)
Cue → best answer (pattern map)

| If the question says… | Usually best answer |
| --- | --- |
| Data is messy/inconsistent before training | Data Wrangler/DataBrew + quality checks |
| Train/serve feature mismatch | SageMaker Feature Store |
| Need systematic hyperparameter search | SageMaker Automatic Model Tuning |
| Need fairness/explainability evidence | SageMaker Clarify |
| Training instability / convergence issues | SageMaker Debugger |
| Accuracy degraded in production | SageMaker Model Monitor + drift triggers + retraining |
| Govern model promotion and rollback | SageMaker Model Registry + approval workflow |
| Constant low-latency traffic | Real-time endpoint |
| Spiky traffic with low idle tolerance | Serverless endpoint |
| Long-running or non-interactive inference | Async endpoint or batch transform |
Must-memorize MLA defaults
| Topic | Fast recall |
| --- | --- |
| First failure domain | Data quality and leakage before model changes |
| Metric selection | Match metric to business cost (precision vs recall trade-off) |
| Drift controls | Baselines, alerts, and versioned retraining pipeline |
| Cost controls | Right-size, auto scale, pick correct endpoint mode, use Spot where safe |
| Security baseline | Least-privilege IAM, KMS/TLS, VPC isolation, CloudTrail |
Last-minute traps
Chasing model complexity before fixing data quality.
Choosing real-time endpoints for workloads that are actually batch/async.
Treating accuracy as the only metric while ignoring latency/cost/compliance.
Deploying without monitoring baselines and rollback path.
0) SageMaker service map (high yield)
| Capability | What it’s for | MLA‑C01 “why it matters” |
| --- | --- | --- |
| SageMaker Data Wrangler | Data prep + feature engineering | Fast, repeatable transforms; reduces time-to-first-model |
| SageMaker Feature Store | Central feature storage | Avoid train/serve skew; feature reuse and governance |
| SageMaker Training | Managed training jobs | Repeatable, scalable training on AWS compute |
| SageMaker AMT | Hyperparameter tuning | Systematic search for better model configs |
| SageMaker Clarify | Bias + explainability | Responsible ML evidence + model understanding |
| SageMaker Model Debugger | Training diagnostics | Debug convergence and training instability |
| SageMaker Model Registry | Versioning + approvals | Auditability, rollback, safe promotion to prod |
| SageMaker Endpoints | Managed model serving | Real-time/serverless/async inference patterns |
| SageMaker Model Monitor | Monitoring workflows | Detect drift and quality issues in production |
| SageMaker Pipelines | ML workflow orchestration | Build-test-train-evaluate-register-deploy automation |
1) End-to-end ML on AWS (mental model)
```mermaid
flowchart LR
  S["Sources"] --> I["Ingest"]
  I --> T["Transform + Quality Checks"]
  T --> F["Feature Engineering + Feature Store"]
  F --> TR["Train + Tune"]
  TR --> E["Evaluate + Bias/Explainability"]
  E --> R["Register + Approve"]
  R --> D["Deploy Endpoint or Batch"]
  D --> M["Monitor Drift/Quality/Cost/Security"]
  M -->|Triggers| RT["Retrain"]
  RT --> TR
```
High-yield framing: MLA‑C01 is about the pipeline, not just the model.
2) Domain 1 — Data preparation (28%)
| You need… | Typical best-fit | Why |
| --- | --- | --- |
| Visual data prep + fast iteration | SageMaker Data Wrangler | Interactive + repeatable workflows |
| No/low-code transforms and profiling | AWS Glue DataBrew | Good for business-friendly prep |
| Scalable ETL jobs | AWS Glue / Spark | Production batch ETL at scale |
| Big Spark workloads (custom) | Amazon EMR | More control over Spark |
| Simple streaming transforms | AWS Lambda | Event-driven, lightweight |
| Streaming analytics | Managed Apache Flink | Stateful streaming at scale |
| Format | Why it shows up | Typical trade-off |
| --- | --- | --- |
| Parquet / ORC | Columnar analytics + efficient reads | Best for large tabular datasets |
| CSV / JSON | Interop + simplicity | Bigger + slower at scale |
| Avro | Schema evolution + streaming | Good for pipelines |
| RecordIO | ML-specific record formats | Useful with some training stacks |

Rule: choose formats based on access patterns (scan vs selective reads), schema evolution, and scale.
Data ingestion and storage (high yield)
Amazon S3: default data lake for ML (durable, cheap, scalable).
Amazon EFS / FSx: file-based access patterns; useful when training expects POSIX-like file semantics.
Streaming ingestion: use Kinesis/managed streaming where low-latency data arrival matters.
Common best answers:
Use AWS Glue / Spark on EMR for big ETL jobs.
Use SageMaker Data Wrangler for fast interactive prep and repeatable transformations.
Use SageMaker Feature Store to keep training/inference features consistent.
Feature Store: why it matters
Avoid train/serve skew: the feature used in training is the same feature served at inference.
Support feature reuse across teams and models.
Enable governance: feature definitions and versions.
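To make the train/serve consistency point concrete, here is a minimal sketch using the SageMaker Python SDK’s FeatureGroup API; the group name, role ARN, and S3 URI are placeholders:

```python
import time

import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

# Toy frame; real features come out of the prep pipeline.
df = pd.DataFrame(
    {
        "customer_id": pd.Series(["c1", "c2"], dtype="string"),
        "tenure_days": [120, 45],
        "event_time": [time.time()] * 2,  # required event-time feature
    }
)

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

feature_group = FeatureGroup(name="customer-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)  # infer feature types
feature_group.create(
    s3_uri="s3://my-ml-bucket/feature-store",  # offline store, for training
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,  # online store: low-latency lookups at inference
)
```

The same feature group then backs both training (offline store) and inference (online store), which is exactly the skew-avoidance argument above.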
Data integrity + bias basics (often tested)
| Problem | What to do | Tooling you might name |
| --- | --- | --- |
| Missing/invalid data | Add data quality checks + fail fast | Glue DataBrew / Glue Data Quality |
| Class imbalance | Resampling or synthetic data | (Conceptual) + Clarify for analysis |
| Bias sources | Identify selection/measurement bias | SageMaker Clarify (bias analysis) |
| Sensitive data | Classify + mask/anonymize + encrypt | KMS + access controls |
| Compliance constraints | Data residency + least privilege + audit logs | IAM + CloudTrail + region choices |
High-yield rule: don’t “fix” model issues before you verify data quality and leakage. A minimal fail-fast gate is sketched below.
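To make “fail fast” concrete, here is a tool-agnostic gate in plain pandas; column names and thresholds are illustrative, and Glue Data Quality or DataBrew rules express the same checks as managed services:

```python
import pandas as pd


def quality_gate(df: pd.DataFrame) -> None:
    """Raise before training if basic data-quality checks fail.

    Column names and thresholds are illustrative, not prescriptive.
    """
    problems = []

    # Missing/invalid data: cap per-column null rate at 5%.
    null_rate = df.isna().mean()
    too_null = null_rate[null_rate > 0.05]
    if not too_null.empty:
        problems.append(f"high null rate: {too_null.to_dict()}")

    # Range check on a known-bounded feature.
    if not df["age"].dropna().between(0, 120).all():
        problems.append("age outside [0, 120]")

    # Class imbalance check on the label.
    pos_rate = df["label"].mean()
    if not 0.01 <= pos_rate <= 0.99:
        problems.append(f"extreme imbalance: positive rate {pos_rate:.3f}")

    if problems:
        raise ValueError("Data quality gate failed: " + "; ".join(problems))
```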
3) Domain 2 — Model development (26%)
Choosing an approach
| If you need… | Typical best-fit |
| --- | --- |
| A standard AI capability with minimal ML ops | AWS AI services (Translate/Transcribe/Rekognition, etc.) |
| A custom model with managed training + deployment | Amazon SageMaker |
| A foundation model / generative capability | Amazon Bedrock (when applicable) |
Rule: don’t overbuild. If an AWS managed AI service solves it, it usually wins on time-to-value and ops.
Training and tuning (high yield)
Training loop terms: epoch, step, batch size.
Speedups: early stopping, distributed training.
Generalization controls: regularization (L1/L2, dropout, weight decay) + better data/features.
Hyperparameter tuning: random search vs Bayesian optimization; in SageMaker, use Automatic Model Tuning (AMT). A sketch follows this list.
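A hedged AMT sketch with the SageMaker Python SDK, using the built-in XGBoost image as the example; the role ARN and S3 paths are placeholders, and the validation:auc objective assumes the job is configured to emit that metric (eval_metric=auc):

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.7-1"
    ),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=200)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    strategy="Bayesian",         # alternative: "Random"
    max_jobs=20,
    max_parallel_jobs=4,
    early_stopping_type="Auto",  # stop unpromising jobs early
)
tuner.fit({"train": "s3://my-ml-bucket/train/", "validation": "s3://my-ml-bucket/val/"})
```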
Metrics picker (what to choose)
| Task | Common metrics | What the exam tries to trick you on |
| --- | --- | --- |
| Classification | Accuracy, precision, recall, F1, ROC-AUC | Class imbalance makes accuracy misleading |
| Regression | MAE/RMSE | Outliers and error cost (what matters more?) |
| Model selection | Metric + cost/latency | “Best” isn’t only accuracy |
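The imbalance trap in a few lines of scikit-learn: a degenerate model that always predicts the majority class looks 95% accurate while catching zero positives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# 95 negatives, 5 positives; the "model" always predicts the majority class.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_pred))                    # 0.95 -- looks great
print(recall_score(y_true, y_pred))                      # 0.0  -- misses every positive
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0
```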
Overfitting vs underfitting (signals)
| Symptom | Likely issue | Typical fix |
| --- | --- | --- |
| Train ↑, validation ↓ | Overfitting | Regularization, simpler model, more data, better features |
| Both low | Underfitting | More expressive model, better features, tune hyperparameters |
Clarify vs Debugger vs Model Monitor (common confusion)
| Tool | What it helps with | When to name it |
| --- | --- | --- |
| SageMaker Clarify | Bias + explainability | Fairness questions, “why did it predict X?” |
| SageMaker Model Debugger | Training diagnostics + convergence | Training instability, loss not decreasing, debugging training |
| SageMaker Model Monitor | Production monitoring workflows | Drift, data quality degradation, monitoring baselines |
Model Registry (repeatability + governance)
Track: model artifacts, metrics, lineage, approvals.
Enables safe promotion/rollback and audit-ready workflows.
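A boto3 sketch of gating promotion on approval status; the model package group name is hypothetical, and deployment automation (for example an EventBridge rule) can key off the status change:

```python
import boto3

sm = boto3.client("sagemaker")

# Newest version in a (hypothetical) model package group.
latest = sm.list_model_packages(
    ModelPackageGroupName="churn-models",
    SortBy="CreationTime",
    SortOrder="Descending",
)["ModelPackageSummaryList"][0]

# Promotion is an approval-status flip; CI/CD keys off the change.
sm.update_model_package(
    ModelPackageArn=latest["ModelPackageArn"],
    ModelApprovalStatus="Approved",  # or "Rejected" to block promotion
    ApprovalDescription="Passed eval thresholds and bias checks",
)
```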
4) Domain 3 — Deployment and orchestration (22%)
Endpoint types (must-know picker)
| Endpoint type | Best for | Typical constraint |
| --- | --- | --- |
| Real-time | Steady, low-latency inference | Cost for always-on capacity |
| Serverless | Spiky traffic, scale-to-zero | Cold starts + limits |
| Asynchronous | Long inference time, bursty workloads | Event-style patterns + polling/callback |
| Batch inference | Scheduled/offline scoring | Not interactive |
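As an example of how little separates the modes, a serverless endpoint is just an endpoint config that carries a ServerlessConfig block instead of instance counts. A boto3 sketch with placeholder names:

```python
import boto3

sm = boto3.client("sagemaker")

# The model ("churn-model") must already exist; names are placeholders.
sm.create_endpoint_config(
    EndpointConfigName="churn-serverless-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "churn-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,  # 1024-6144, in 1 GB increments
                "MaxConcurrency": 10,    # scales to zero when idle
            },
        }
    ],
)
sm.create_endpoint(
    EndpointName="churn-serverless",
    EndpointConfigName="churn-serverless-config",
)
```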
Scaling metrics (what to pick)
| Metric | Good when… | Watch out |
| --- | --- | --- |
| Invocations per instance | Request volume drives load | Spiky traffic can cause oscillation |
| Latency | You have a latency SLO | Noisy metrics require smoothing |
| CPU/GPU utilization | Compute-bound models | Not always correlated with request rate |
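A sketch of target tracking on the predefined invocations-per-instance metric via Application Auto Scaling; endpoint and variant names are placeholders:

```python
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/churn-endpoint/variant/AllTraffic"  # placeholder names

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)
aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,   # seconds; damps oscillation on spiky traffic
        "ScaleInCooldown": 300,
    },
)
```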
Multi-model / multi-container (why they exist)
Multi-model: multiple models behind one endpoint to reduce cost.
Multi-container: pre/post-processing plus model serving, or multiple frameworks.
IaC + containers (exam patterns)
IaC: CloudFormation or CDK for reproducible environments.
Containers: build/publish to ECR, deploy via SageMaker, ECS, or EKS.
CI/CD for ML (what’s different)
You version and validate more than code:
Code + data + features + model artifacts + evaluation reports
Promotion gates: accuracy thresholds, bias checks, smoke tests, canary/shadow validation
Typical services: CodePipeline/CodeBuild/CodeDeploy, SageMaker Pipelines, EventBridge triggers.
```mermaid
flowchart LR
  G["Git push"] --> CP["CodePipeline"]
  CP --> CB["CodeBuild: tests + build"]
  CB --> P["SageMaker Pipeline: process/train/eval"]
  P --> Gate{"Meets<br/>thresholds?"}
  Gate -->|yes| MR["Model Registry approve"]
  Gate -->|no| Stop["Stop + report"]
  MR --> Dep["Deploy (canary/shadow)"]
  Dep --> Mon["Monitor + rollback triggers"]
```
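A skeletal version of that quality gate as a SageMaker Pipelines ConditionStep. This is a sketch, not a full pipeline: process_step, train_step, eval_step, eval_report (a PropertyFile attached to eval_step), register_step, and role are assumed to be defined earlier in the script, and the JSON path depends on your evaluation report layout:

```python
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.pipeline import Pipeline

# Pull the evaluation metric out of the report written by eval_step.
auc = JsonGet(
    step_name=eval_step.name,
    property_file=eval_report,
    json_path="binary_classification_metrics.auc.value",  # your report's layout
)

gate = ConditionStep(
    name="CheckAUCThreshold",
    conditions=[ConditionGreaterThanOrEqualTo(left=auc, right=0.80)],
    if_steps=[register_step],  # promote only when the metric clears the bar
    else_steps=[],             # otherwise the pipeline stops and reports
)

pipeline = Pipeline(
    name="mla-demo-pipeline",
    steps=[process_step, train_step, eval_step, gate],
)
pipeline.upsert(role_arn=role)  # create or update, then pipeline.start()
```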
5) Domain 4 — Monitoring, cost, and security (24%)
Monitoring and drift (high yield)
Data drift: input distribution changed.
Concept drift: relationship between input and label changed.
Use baselines + ongoing checks; monitor latency/errors too.
Common services/patterns:
SageMaker Model Monitor for monitoring workflows.
A/B testing or shadow deployments for safe comparison.
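Model Monitor’s workflow is baseline-then-compare. A hedged SDK sketch, assuming data capture is already enabled on the endpoint; bucket, endpoint, and role values are placeholders:

```python
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Step 1: compute baseline statistics + suggested constraints from training data.
monitor.suggest_baseline(
    baseline_dataset="s3://my-ml-bucket/train/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-ml-bucket/monitoring/baseline",
)

# Step 2: compare captured live traffic against the baseline on a schedule.
monitor.create_monitoring_schedule(
    monitor_schedule_name="churn-data-quality",
    endpoint_input="churn-endpoint",  # endpoint must have data capture enabled
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```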
Monitoring checklist (what to instrument)
Inference quality: when ground truth is available later, compare predicted vs actual.
Data quality: nulls, ranges, schema changes, category explosion.
Distribution shift: feature histograms/summary stats vs baseline.
Ops signals: p50/p95 latency, error rate, throttles, timeouts.
Safety/security: anomalous traffic spikes, abuse patterns, permission failures.
Infra + cost optimization (high yield)
| Theme | What to do |
| --- | --- |
| Observability | CloudWatch metrics/logs/alarms; Logs Insights; X-Ray for traces |
| Rightsizing | Pick instance family/size based on perf; use Inference Recommender + Compute Optimizer |
| Spend control | Tags + Cost Explorer + Budgets + Trusted Advisor |
| Purchasing options | Spot / Reserved / Savings Plans where the workload fits |
Cost levers (common “best answer” patterns)
Choose the right inference mode first: batch (cheapest) → async → serverless → real-time (always-on, typically most expensive).
Right-size and auto scale; don’t leave endpoints overprovisioned.
Use Spot for fault-tolerant training/batch where interruptions are acceptable.
Use Budgets + tags early (before the bills surprise you).
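As an example of “Budgets + tags early,” a boto3 sketch that caps monthly spend for one tagged workload and alerts at 80%; the account ID, tag, and email are placeholders:

```python
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "ml-monthly-cap",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
        # Scope to resources tagged for this workload ("user:<key>$<value>").
        "CostFilters": {"TagKeyValue": ["user:project$churn-model"]},
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # alert at 80% of the cap
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "ml-team@example.com"}
            ],
        }
    ],
)
```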
Security defaults (high yield)
Least privilege IAM for training jobs, pipelines, and endpoints.
Encrypt at rest + in transit (KMS + TLS).
VPC isolation (subnets + security groups) for ML resources when required.
Audit trails (CloudTrail) + controlled access to logs and artifacts.
Common IAM/security “gotchas”
Training role can read S3 but can’t decrypt KMS key (KMS key policy vs IAM policy mismatch).
Endpoint role has broad S3 access (“*”) instead of a tight prefix.
Secrets leak into logs/artifacts (build logs, notebooks, environment variables).
No audit trail for model registry approvals or endpoint updates.
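To illustrate the first two gotchas, here is a least-privilege identity policy expressed as a Python dict; all ARNs are placeholders, and remember that the KMS key policy must independently allow the role or decryption still fails:

```python
import json

# Illustrative least-privilege policy for a training-job role: read one S3
# prefix and use one named KMS key (no "*" resources).
training_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-ml-bucket/training-data/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-ml-bucket",
            "Condition": {"StringLike": {"s3:prefix": ["training-data/*"]}},
        },
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
        },
    ],
}
print(json.dumps(training_role_policy, indent=2))
```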
Next steps
Use Resources to stay anchored to the official exam guide and SageMaker docs.
Use the FAQ to confirm expected depth and where the exam is more engineering than data science.
Turn weak deployment, monitoring, and security rows into timed scenario drills.