Use this when the question stem mixes features, splits, metrics, MLflow, registry, or deployment.
flowchart TD
S["Scenario"] --> D["Check data and feature lane"]
D --> T["Check training and metric lane"]
T --> R["Check MLflow tracking and reproducibility"]
R --> P["Check registry, versioning, and promotion"]
P --> O["Check deployment and monitoring fit"]
Feature and split rules

| If the question is mainly about… | Strongest first lane |
| --- | --- |
| inputs available only after prediction time | leakage risk |
| train and test data influencing each other | split contamination |
| repeated transformation mismatch in production | feature pipeline inconsistency |
| reproducible training inputs | tracked feature prep and stable splits |
| AutoML or feature-store question | Databricks ML platform workflow, not only raw algorithm choice |

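
One common way to get the "stable splits" named in the table is to derive each row's lane from a hash of a stable key, so the assignment does not change between runs or when rows are reordered. The sketch below is plain Python under stated assumptions: `assign_split`, the `customer-…` ids, and the 20% test fraction are all illustrative, not a Databricks or MLflow API.

```python
import hashlib


def assign_split(record_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a stable record id to 'train' or 'test'."""
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "test" if bucket < test_fraction else "train"


# Hypothetical ids, used only to exercise the function.
ids = [f"customer-{i}" for i in range(1000)]
first_pass = {rid: assign_split(rid) for rid in ids}
second_pass = {rid: assign_split(rid) for rid in reversed(ids)}

# Same id, same lane, regardless of run or row order.
print(first_pass == second_pass)
test_count = sum(lane == "test" for lane in first_pass.values())
print(test_count, "of", len(ids), "rows in test")
```

Because the lane depends only on the key, a row can never drift from train into test on a later run, which is the property the contamination and reproducibility rows both rely on.
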
Databricks ML platform picker

| If the question is really about… | Strongest first lane |
| --- | --- |
| faster model or feature exploration | AutoML |
| reproducible feature reuse across teams | feature tables in Unity Catalog |
| environment optimized for ML work | ML runtimes |
| experiment workflow and run comparison | MLflow |

Leakage and contamination table

| Risk | What it looks like | Safer approach |
| --- | --- | --- |
| feature leakage | feature uses future or unavailable information | use only information available at prediction time |
| label leakage | feature derives from or strongly encodes the target | remove or rebuild the feature |
| train/test contamination | transforms or statistics fit on the full dataset | fit transforms on train only and apply to test |

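
The train/test contamination row can be made concrete with a scaler whose statistics are learned from one set and reused on another. This is a minimal dependency-free sketch; `fit_scaler`, `apply_scaler`, and the numbers are hypothetical and stand in for whatever transformer the real pipeline uses.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn scaling statistics from the given values only."""
    return {"mean": mean(values), "std": pstdev(values) or 1.0}


def apply_scaler(scaler, values):
    """Apply already-fitted statistics; nothing is re-estimated here."""
    return [(v - scaler["mean"]) / scaler["std"] for v in values]


train = [10.0, 12.0, 14.0, 16.0]
test = [100.0, 120.0]  # held-out rows that look nothing like train

# Contaminated: statistics are fit on train and test together.
leaky_scaler = fit_scaler(train + test)

# Safer: statistics come from train only, then are reused on test.
clean_scaler = fit_scaler(train)
train_scaled = apply_scaler(clean_scaler, train)
test_scaled = apply_scaler(clean_scaler, test)

print("train-only mean:", clean_scaler["mean"])
print("full-data mean: ", leaky_scaler["mean"])
```

The gap between the two means is information from the test rows that the contaminated version would have passed into training.
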
Data-processing quick rules

| If the issue is mainly about… | Strongest first lane |
| --- | --- |
| broad summary of a Spark DataFrame | .summary() or built-in summary tools |
| extreme values harming training | outlier review using standard deviation or IQR logic |
| comparing two categorical or continuous features | choose the comparison and visualization that matches the data type |
| missing values | pick mean, median, or mode based on the feature and distribution |
| categorical encoding | use one-hot encoding only where it actually fits |
| skewed numeric feature | consider log transform where appropriate |

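
The missing-value and outlier rows can be sketched together on one small column. The example assumes the conventional 1.5 × IQR fences and a median fill; the helper name `iqr_bounds` and the sample values are made up for illustration. In a real pipeline the fill value and fences would be computed on the training split only.

```python
from statistics import median, quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences using the common k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


raw = [10, 12, None, 11, 13, 12, 11, 95]

observed = [v for v in raw if v is not None]
# The median is far less sensitive to the 95 than the mean would be.
fill_value = median(observed)
imputed = [fill_value if v is None else v for v in raw]

low, high = iqr_bounds(observed)
outliers = [v for v in observed if v < low or v > high]

print("median fill:", fill_value)
print("fences:", (low, high))
print("flagged for review:", outliers)
```

Flagged values are candidates for review, not automatic deletions; whether 95 is an error or a real extreme depends on the feature.
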
Metric chooser

| Task | Common metrics | What to watch |
| --- | --- | --- |
| classification | accuracy, precision, recall, F1, AUC | imbalance and false-positive/false-negative cost |
| regression | RMSE, MAE, R² | sensitivity to large errors and interpretability |

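
The "sensitivity to large errors" note for regression is easiest to see with two error profiles that share the same MAE. The sketch below uses invented residuals and two small helper functions, `mae` and `rmse`, written out so the arithmetic is visible.

```python
from math import sqrt


def mae(errors):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error: squaring weights large misses more heavily."""
    return sqrt(sum(e * e for e in errors) / len(errors))


steady_errors = [2.0, 2.0, 2.0, 2.0]  # always off by a little
spiky_errors = [0.0, 0.0, 0.0, 8.0]   # usually exact, one large miss

for name, errors in [("steady", steady_errors), ("spiky", spiky_errors)]:
    print(name, "MAE:", mae(errors), "RMSE:", rmse(errors))
```

Both profiles have an MAE of 2.0, but RMSE doubles for the spiky one, so RMSE is the better fit when a single large miss is costly.
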
Metric traps

| Trap | Better reading |
| --- | --- |
| using accuracy on a clearly imbalanced problem | think precision, recall, F1, or AUC depending on the trade-off |
| choosing one regression metric without error-context thinking | decide whether large errors should be penalized more heavily |
| trusting a very strong score immediately | check leakage, split quality, and feature pipeline consistency first |

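
The accuracy trap shows up clearly when metrics are computed from confusion-matrix counts. The counts below are invented for a 100-row problem with 5 positives; `classification_report` is a hypothetical helper, and returning 0.0 for an undefined precision is a convention chosen for this sketch.

```python
def classification_report(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # By convention here, undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 100 rows, only 5 positives.
always_negative = classification_report(tp=0, fp=0, fn=5, tn=95)
useful_model = classification_report(tp=4, fp=6, fn=1, tn=89)

print(always_negative)
print(useful_model)
```

The model that never predicts the positive class wins on accuracy (0.95 versus 0.93) while catching none of the positives, which is why recall, precision, or F1 is the safer reading on imbalanced data.
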
MLflow boundaries

| MLflow concept | What it stores | Why the exam cares |
| --- | --- | --- |
| run | one training or evaluation attempt | comparison and reproducibility |
| params | model and training configuration | explain how the run was produced |
| metrics | evaluation numbers | rank candidates consistently |
| artifacts | plots, files, models, reports | reproduce and inspect outputs |
| registry | named model versions and lifecycle management | controlled promotion and deployment |

Fast MLflow picker

| If the question is mainly about… | Strongest first lane |
| --- | --- |
| comparing experiments | runs with logged params and metrics |
| preserving the produced model and supporting files | artifacts |
| controlled model version promotion | registry |
| explaining how a result happened | params, metrics, artifacts, and data or version context together |
| promoting by champion or challenger pattern | aliases in the registry |

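
To keep the roles apart, it can help to model them as plain data structures. The following is a toy in-memory sketch, not the MLflow API: `Run`, `ToyRegistry`, the model name, file names, and metric values are all hypothetical and exist only to show which layer holds what.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training attempt: its configuration, its scores, and its outputs."""
    run_id: str
    params: dict
    metrics: dict
    artifacts: dict = field(default_factory=dict)


@dataclass
class ToyRegistry:
    """A named model with numbered versions and movable aliases."""
    name: str
    versions: dict = field(default_factory=dict)  # version number -> source run_id
    aliases: dict = field(default_factory=dict)   # alias -> version number

    def register(self, run: Run) -> int:
        version = len(self.versions) + 1
        self.versions[version] = run.run_id
        return version

    def set_alias(self, alias: str, version: int) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        self.aliases[alias] = version


runs = [
    Run("run-a", {"max_depth": 3}, {"f1": 0.71}, {"model": "model_a.pkl"}),
    Run("run-b", {"max_depth": 5}, {"f1": 0.78}, {"model": "model_b.pkl"}),
]

# Comparing experiments: rank runs by a logged metric.
best = max(runs, key=lambda r: r.metrics["f1"])

# Controlled promotion: register versions, then point aliases at them.
registry = ToyRegistry("churn_model")
v1 = registry.register(runs[0])
v2 = registry.register(best)
registry.set_alias("champion", v1)
registry.set_alias("challenger", v2)

print(best.run_id, registry.aliases)
```

Promoting the challenger is then a single alias move (`registry.set_alias("champion", v2)`); the versions themselves and their links back to runs do not change.
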
Reproducibility rules
- log params, metrics, and the model artifact
- keep track of the data or feature version when it materially affects the result
- avoid manual side notes as the only record of a training run
- treat reproducibility as part of the experiment, not a later cleanup task

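
One lightweight way to "keep track of the data version" is to store a fingerprint of the training rows next to the params and metrics. This sketch assumes the rows fit in memory and are JSON-serializable; `fingerprint`, the field names, the file name, and the metric value are placeholders rather than real results or a real tracking API.

```python
import hashlib
import json


def fingerprint(rows) -> str:
    """Hash the training rows so a run can name the exact data it saw."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


training_rows = [
    {"id": 1, "tenure": 4, "label": 0},
    {"id": 2, "tenure": 19, "label": 1},
]

run_record = {
    "params": {"model": "logistic_regression", "seed": 42},
    "metrics": {"f1": 0.78},     # placeholder number, not a real result
    "artifacts": ["model.pkl"],  # hypothetical file name
    "data_fingerprint": fingerprint(training_rows),
}

# A changed row yields a different fingerprint, so input drift is visible.
changed_rows = [dict(training_rows[0], tenure=5), training_rows[1]]
print(run_record["data_fingerprint"], fingerprint(changed_rows))
```

With the fingerprint logged alongside the run, "same code, different result" can be traced to changed inputs instead of being guessed at from side notes.
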
Training and evaluation quick rules

| Requirement | Strongest first lane |
| --- | --- |
| compare two candidate models fairly | same split discipline and comparable metrics |
| explain why a model improved | compare logged runs and feature or config differences |
| too-good-to-be-true performance | investigate leakage, split quality, and artifact consistency |
| offline result differs from production | check schema, preprocessing, feature availability, and serving consistency |
| choose search strategy | random, grid, or Bayesian search based on the search need and cost |
| estimate training count in grid search plus CV | multiply parameter combinations by fold count |

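
The grid-search arithmetic from the last row can be checked by enumerating the grid. The search space and fold count below are hypothetical. Some tools also refit the winning combination once on the full training set, so the sketch reports that count separately rather than assuming it.

```python
from itertools import product

# Hypothetical search space, chosen only to make the arithmetic visible.
param_grid = {
    "max_depth": [3, 5, 7],
    "n_estimators": [50, 100],
}
num_folds = 3

# Every combination of one value per parameter.
combinations = list(product(*param_grid.values()))
cv_fits = len(combinations) * num_folds

print("parameter combinations:", len(combinations))
print("model fits during cross-validation:", cv_fits)
print("including one final refit of the best combination:", cv_fits + 1)
```

Here 3 × 2 = 6 combinations times 3 folds gives 18 fits during cross-validation; read the question carefully to see whether it also counts the final refit.
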
Registry and deployment cues

| Step | What happens | Why it matters |
| --- | --- | --- |
| register model | create named model with versions | stable deployment reference |
| create new version | tie a version back to a run or model artifact | traceability |
| promote version | controlled movement toward production use | governance and rollout discipline |
| deploy or serve | expose the chosen version for inference | consistency matters more than novelty |
| split traffic between endpoints | compare live realtime inference behavior safely | rollout control |

Deployment traps

| Trap | Better reading |
| --- | --- |
| strong offline metrics mean production is solved | check preprocessing, schema, and feature parity |
| registry is just storage | registry adds versioning and promotion control |
| logging a metric is enough for reproducibility | params and artifacts matter too |

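
A schema and feature parity check can be as simple as comparing one serving payload against the columns and types used in training. This is a minimal sketch; `parity_problems`, the column names, and the payload are invented, and a production check would also cover value ranges and preprocessing steps.

```python
def parity_problems(training_schema: dict, serving_row: dict) -> list:
    """List mismatches between the training schema and one serving payload."""
    problems = []
    for column, expected_type in training_schema.items():
        if column not in serving_row:
            problems.append(f"missing feature: {column}")
        elif not isinstance(serving_row[column], expected_type):
            actual = type(serving_row[column]).__name__
            problems.append(
                f"type mismatch: {column} is {actual}, expected {expected_type.__name__}"
            )
    for column in serving_row:
        if column not in training_schema:
            problems.append(f"unexpected feature: {column}")
    return problems


# Hypothetical training schema and a serving request that drifted from it.
training_schema = {"tenure_months": int, "plan": str, "monthly_spend": float}
serving_row = {"tenure_months": "12", "plan": "basic", "signup_channel": "web"}

for problem in parity_problems(training_schema, serving_row):
    print(problem)
```

Each reported problem is a way for a model with strong offline metrics to behave differently in production without any change to the model itself.
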
High-confusion pairs

| Pair | Keep this distinction clear |
| --- | --- |
| params vs metrics | training configuration versus evaluation results |
| run vs registry version | experiment attempt versus promoted managed model version |
| leakage vs class imbalance | bad feature boundary versus data distribution problem |
| offline success vs production success | benchmark result versus operational consistency |
| estimator vs transformer | learning component versus data-transformation component |
| AutoML vs MLflow | automated model search aid versus lifecycle tracking system |
| feature table vs registered model | reusable features versus managed model artifact lineage |
| batch vs realtime vs streaming inference | bulk scoring versus endpoint serving versus continuous event-driven inference |

Last 15-minute review

| Recheck this | Because the miss often hides here |
| --- | --- |
| what information is available at prediction time | leakage questions often hinge on that boundary |
| metric choice for the real business risk | accuracy is not the default winner |
| what MLflow logs at each layer | runs, artifacts, and registry roles blur easily |
| reproducibility versus mere experimentation | the exam prefers the controlled workflow |
| registry and deployment consistency | versioning and inference parity matter |
| feature-store, AutoML, and registry roles | Databricks ML nouns blur easily under time pressure |

What strong ML-ASSOC answers usually do
- protect reproducibility before chasing model complexity
- catch leakage and bad metric choice early
- keep feature engineering, training, evaluation, and deployment roles separate
- understand what MLflow stores, compares, versions, and promotes at each layer