AWS MLA-C01 Glossary: ML Pipeline, Drift, and Endpoint Terms

AWS MLA-C01 glossary of ML pipeline, drift, endpoint, deployment, and monitoring terms.

Use this glossary when SageMaker and MLOps terms start to blur together. Keep it beside the cheat sheet and resources rather than treating it as a substitute for study.

High-yield terms

Term | Short meaning | Why it matters on MLA-C01
MLOps | Deployment, monitoring, versioning, and lifecycle discipline for ML systems | The exam leans more toward engineering and operations than pure modeling theory
Feature store | Managed store for reusable model features | Reduces train/serve skew and supports repeated feature use
Endpoint | Hosted inference interface for serving model predictions | Central to real-time, async, and multi-model serving choices
Batch transform | Offline inference over a dataset rather than real-time requests | Tested against real-time and async serving patterns
Model registry | Managed inventory of model versions and approval states | Critical for rollback, traceability, and safe promotion
Drift | Production data or behavior changing away from the training or expected pattern | Core monitoring concept in live ML systems
Clarify | SageMaker tool for explainability and bias analysis | Often appears in fairness and explainability questions
Model Monitor | SageMaker monitoring capability for production models | Strongest first answer for drift and data-quality monitoring
Pipeline | Orchestrated ML workflow such as prepare, train, validate, and deploy | Key to retraining, CI/CD, and repeatability
Shadow deployment | Sending a copy of production traffic to a new model and comparing its responses without serving them to callers | Safer comparison than blind promotion
Blue/green deployment | Rollout pattern that shifts traffic to a separate replacement environment while the old one stays available for rollback | Helps reduce blast radius during rollout
Inference Recommender | SageMaker guidance for deployment instance and configuration fit | Helps connect model serving choices to cost and capacity
Data Wrangler | Managed data-prep workflow for transformations and feature work | Strong answer in data preparation questions
Hyperparameter tuning | Systematic search across training settings | Distinct from model choice and deployment tuning
Multi-model endpoint | Shared endpoint that serves several low-traffic models | Cost and serving-fit concept, not a training concept
VPC isolation | Keeping inference resources inside private network boundaries | Common security and deployment control on MLA-C01
Train/serve skew | Difference between how features are built for training versus live inference | A classic feature-store and data-pipeline problem
Baseline | Reference dataset or statistics used to compare later inference behavior | Central to drift and monitoring questions
Ground truth | Actual observed outcome used later to assess prediction quality | Needed to reason about delayed quality evaluation
Lineage | Record of how data, code, training, and models relate across versions | Helps with auditability, rollback, and governance
Serverless inference | Managed inference that scales down to zero when idle | Often the right answer for spiky low-duty-cycle traffic
Async inference | Inference pattern where requests are queued and the result is delivered later rather than in the immediate response | Better fit for long-running or bursty jobs than always-on real-time serving
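The Baseline and Drift rows above can be made concrete with a small sketch. It is a toy illustration in plain Python, not how Model Monitor computes its statistics: the function names, the single-feature mean-shift test, and the threshold of 2.0 are all assumptions chosen for readability.

```python
from statistics import mean, stdev

def build_baseline(training_values):
    """Summarize one training-time feature as reference statistics."""
    return {"mean": mean(training_values), "std": stdev(training_values)}

def mean_shift(baseline, live_values):
    """Distance of the live mean from the baseline mean, in baseline std units."""
    return abs(mean(live_values) - baseline["mean"]) / baseline["std"]

def has_drifted(baseline, live_values, threshold=2.0):
    # The threshold is an illustrative choice, not a SageMaker default.
    return mean_shift(baseline, live_values) > threshold

# Hypothetical feature values captured at training time.
baseline = build_baseline([10.0, 11.0, 9.0, 10.5, 9.5])

stable_batch = [10.2, 9.8, 10.1, 9.9]     # looks like training data
shifted_batch = [14.0, 15.0, 13.5, 14.5]  # production input has moved

print(has_drifted(baseline, stable_batch))
print(has_drifted(baseline, shifted_batch))
```

The point to carry into the exam is the order of operations: the baseline is captured first, and drift is only meaningful as a comparison against it.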

Commonly confused pairs

Pair | Keep this distinction clear
online inference vs batch transform | low-latency serving versus offline dataset scoring
drift vs bias | changing production behavior versus unfair or skewed model behavior
registry vs endpoint | managed version catalog versus live serving target
monitoring vs rollback | detecting trouble versus returning to a safer version
feature store vs raw training data | reusable engineered features versus general source data
batch vs async inference | scheduled or offline scoring versus delayed-response online serving
model quality vs infra health | whether the predictions are still good versus whether the platform is still healthy
Clarify vs Model Monitor | fairness and explainability analysis versus production drift/data monitoring
feature engineering vs labeling | improving input signal versus creating target values
train/serve skew vs drift | inconsistent feature logic versus production behavior changing over time
lineage vs registry | end-to-end record of artifacts and steps versus governed model version catalog
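The "registry vs endpoint" and "monitoring vs rollback" pairs can be sketched with a small in-memory stand-in for a model registry. `ToyRegistry` and its methods are hypothetical and exist only for illustration. The status strings match the approval states used by SageMaker Model Registry, but nothing here calls AWS.

```python
from dataclasses import dataclass

@dataclass
class ModelVersion:
    version: int
    status: str = "PendingManualApproval"

class ToyRegistry:
    """Hypothetical in-memory catalog: tracks versions, serves nothing."""

    def __init__(self):
        self.versions = []

    def register(self):
        mv = ModelVersion(version=len(self.versions) + 1)
        self.versions.append(mv)
        return mv

    def set_status(self, version, status):
        self.versions[version - 1].status = status

    def latest_approved(self, before=None):
        """Newest approved version, optionally restricted to versions < before."""
        candidates = [
            v for v in self.versions
            if v.status == "Approved" and (before is None or v.version < before)
        ]
        return max(candidates, key=lambda v: v.version, default=None)

registry = ToyRegistry()
for _ in range(3):
    registry.register()
registry.set_status(1, "Approved")
registry.set_status(2, "Approved")
registry.set_status(3, "Approved")

live_version = registry.latest_approved().version             # what gets deployed
rollback_target = registry.latest_approved(before=3).version  # where to return
print(live_version, rollback_target)
```

Monitoring would be what tells you version 3 is misbehaving. The registry is what makes "go back to version 2" a lookup rather than a guess.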

If three terms blur together

Cluster | Fast separation
endpoint / registry / pipeline | endpoint serves, registry tracks versions, pipeline orchestrates workflow
drift / data quality / infra issue | drift means behavior changed over time, data quality means input is malformed or incomplete, infra issue means the platform is slow or unstable
Data Wrangler / Feature Store / Model Monitor | Data Wrangler prepares data, Feature Store serves reusable features, Model Monitor watches live inference data
real-time / async / batch | real-time answers now, async answers later, batch scores offline datasets
IAM / VPC isolation / encryption | IAM controls who, VPC controls where from, encryption protects the data itself
registry / lineage / approval | registry stores model versions, lineage shows how they were produced, approval decides whether they move forward
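The real-time / async / batch separation above, together with the serverless entry from the term list, can be written as a small decision helper. This is a study mnemonic under simplifying assumptions, not AWS guidance: the function name, the three boolean flags, and the order of the checks are all made up for illustration.

```python
def choose_serving_pattern(*, scores_whole_dataset_offline=False,
                           caller_can_wait=False,
                           idle_most_of_the_time=False):
    """Toy decision helper mirroring the real-time / async / batch split."""
    if scores_whole_dataset_offline:
        return "batch transform"        # no live caller at all
    if caller_can_wait:
        return "async inference"        # queued request, result arrives later
    if idle_most_of_the_time:
        return "serverless inference"   # spiky traffic, scales down when idle
    return "real-time endpoint"         # steady traffic, answer needed now

examples = {
    "nightly scoring of a full table":
        choose_serving_pattern(scores_whole_dataset_offline=True),
    "long-running document processing":
        choose_serving_pattern(caller_can_wait=True),
    "internal tool used a few times a day":
        choose_serving_pattern(idle_most_of_the_time=True),
    "checkout-page fraud score":
        choose_serving_pattern(),
}

for scenario, pattern in examples.items():
    print(f"{scenario}: {pattern}")
```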

One-sentence memory hooks

  • If the question is about reuse across training and inference, think Feature Store.
  • If the question is about comparing or approving model versions, think Model Registry.
  • If the question is about live drift or production input quality, think Model Monitor.
  • If the question is about fairness or explainability, think Clarify.
  • If the question is about costly always-on serving for light traffic, think endpoint type (serverless, async, or multi-model) before instance size.
  • If the question is about safe release, think staged rollout plus rollback, not just “deploy latest”.
  • If the question is about comparing now to earlier behavior, think baseline before you think “just add more metrics”.
  • If the question is about late-arriving actual outcomes, think ground truth collection before you claim you can measure live model quality immediately.
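The last hook, on late-arriving outcomes, is easier to remember with a sketch. The snippet assumes predictions are logged with a request ID and that actual labels arrive later under the same ID. The data and the function name are hypothetical.

```python
def accuracy_on_labeled(predictions, ground_truth):
    """Score only the predictions whose actual outcome has arrived.

    predictions:  {request_id: predicted_label}
    ground_truth: {request_id: actual_label}, typically a smaller, later set
    Returns (accuracy or None, number of matched records).
    """
    matched = [rid for rid in predictions if rid in ground_truth]
    if not matched:
        return None, 0  # nothing to measure yet
    correct = sum(predictions[rid] == ground_truth[rid] for rid in matched)
    return correct / len(matched), len(matched)

# Four predictions were served, but only two outcomes are known so far.
predictions = {"a": 1, "b": 0, "c": 1, "d": 0}
ground_truth = {"a": 1, "b": 1}

accuracy, n_labeled = accuracy_on_labeled(predictions, ground_truth)
print(accuracy, n_labeled)
```

Until labels arrive, the only things that can be monitored are inputs and prediction distributions, which is why baseline-based drift checks and ground-truth-based quality checks are separate ideas.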

Operational clusters worth keeping straight

Cluster | What it usually signals on the exam
data quality / leakage / train-serve skew | fix the pipeline and feature logic before changing the model
tuning / evaluation / approval | improve the model and decide whether it is promotable
endpoint fit / autoscaling / inference recommender | match serving shape and cost to traffic reality
drift / baseline / ground truth | decide whether the model is still behaving acceptably in production
IAM / VPC / KMS / secrets | separate identity, network boundary, encryption, and secret handling
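The first cluster above says to fix feature logic before touching the model. A minimal sketch of train/serve skew shows why. The feature names, the record, and the deliberately wrong serving function are all invented for illustration.

```python
import math

def build_features(raw):
    """Single source of feature logic, intended for both training and serving."""
    return {
        "log_amount": math.log1p(raw["amount"]),
        "is_weekend": 1 if raw["day_of_week"] in (5, 6) else 0,
    }

def skewed_serving_features(raw):
    """Deliberately inconsistent re-implementation: it skips the log transform."""
    return {
        "log_amount": raw["amount"],
        "is_weekend": 1 if raw["day_of_week"] in (5, 6) else 0,
    }

def find_skew(raw, train_fn, serve_fn, tol=1e-9):
    """Names of features whose training and serving values disagree."""
    train, serve = train_fn(raw), serve_fn(raw)
    return sorted(name for name in train if abs(train[name] - serve[name]) > tol)

record = {"amount": 120.0, "day_of_week": 6}

shared_logic = find_skew(record, build_features, build_features)
divergent_logic = find_skew(record, build_features, skewed_serving_features)
print(shared_logic, divergent_logic)
```

Nothing about the production data changed in this example, which is what separates skew from drift: the input is the same, but the two code paths build different features from it.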

If the confusion is really about…

Topic family | Best page to revisit
deployment and MLOps quick rules | Cheat Sheet
official AWS facts and service docs | Resources
pacing and review order | Study Plan
overall exam framing | Guide root
training, tuning, and model versioning | 2.2 Training, Tuning & Versions
drift, A/B testing, and live monitoring | 4.1 Monitoring, Drift & A/B
endpoint shapes and scaling | 3.1 Endpoints & Containers
Revised on Sunday, May 10, 2026