Databricks ML-ASSOC Glossary: Features, Training, and Serving Terms

April 13, 2026

Databricks ML-ASSOC glossary of notebooks, data prep, models, deployment, and monitoring terms.


Use this glossary when feature engineering, evaluation, and MLflow terms start to blur together. Keep it beside the cheat sheet and resources instead of using it as a substitute for study.

High-yield terms

Term	Short meaning	Why it matters on ML-ASSOC
ML runtime	Databricks runtime environment optimized for machine learning tasks	Important platform-fit concept
AutoML	Databricks tool that automatically trains and tunes candidate models and generates a notebook for each trial	High-yield Databricks ML feature
MLflow run	Logged experiment execution with parameters, metrics, and artifacts	Core experiment-tracking concept
Artifact	File output from a run such as a plot, model file, or dataset snapshot	Commonly confused with the model itself
Experiment	Collection of related MLflow runs	Comparison and organization layer
Registered model	Named model object tracked across versions and lifecycle stages	Core model-lifecycle concept
Alias	Stable name that points to a particular registered-model version	Important champion or challenger promotion concept
Feature engineering	Transforming raw data into model-usable inputs	High-yield feature-work concept
Feature table	Managed reusable feature storage object	Important Databricks ML platform concept
Online feature table	Feature storage designed for low-latency serving use	Commonly contrasted with offline feature tables
Offline feature table	Feature storage designed for training, analysis, or batch workflows	Commonly contrasted with online feature tables
Leakage	Information bleeding into training or evaluation from an invalid future or target-dependent source	One of the most tested evaluation failure modes
Baseline model	Simple comparison model used to judge whether a better approach adds value	Helps prevent “good-looking metric” mistakes
Precision	Share of predicted positives that are actually positive	Common classification metric
Recall	Share of actual positives captured by the model	Common classification metric
Cross-validation	Repeated training and validation across different data splits	Key validation concept
Hyperparameter	Training setting fixed before fitting rather than learned from the data, and varied across trials during model search	Common model-tuning term
Hyperopt	Open-source Python library for hyperparameter search, referenced in the exam outline	Important tuning tool concept
Estimator	ML component that learns from data and produces a model	Commonly contrasted with transformers
Transformer	Component that maps input data to output data without learning from it at that step; in Spark ML a fitted model is itself a transformer	Common pipeline concept
Inference	Using a trained model to produce predictions	Distinct from training and tracking
Train/validation/test split	Separation of data for fitting, tuning, and final evaluation	Core trustworthy-evaluation term
Artifact store	Backing storage for MLflow artifacts	Helps separate tracking metadata from stored outputs
Model version	Specific registered-model instance tracked through lifecycle changes	Common registry term
Reproducibility	Ability to rerun and explain the same experiment result reliably	Central operational ML concept
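
The cross-validation entry above can be sketched in plain Python. This is a minimal illustration, not a Databricks or Spark ML API: the helper name k_fold_indices is hypothetical, rows are split in order without shuffling, and real workflows would normally shuffle or stratify first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_rows: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs; each row is used for validation exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Hypothetical dataset of 10 rows split into 3 folds.
folds = list(k_fold_indices(n_rows=10, k=3))
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx))
```

Averaging a metric over the folds gives a steadier estimate than a single train/validation division, at the cost of training the model k times.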

Commonly confused pairs

Pair	Keep this distinction clear
parameter vs metric	configuration input versus measured output
artifact vs model	general logged file versus trained predictive object
validation set vs test set	model-tuning feedback set versus final held-out evaluation set
precision vs recall	hurt by false positives versus hurt by false negatives
experiment vs registered model	run-tracking workspace versus managed deployable model lineage
run vs model version	one tracked execution versus one numbered snapshot of a registered model
leakage vs class imbalance	bad information boundary versus skewed data distribution
cross-validation vs single split	repeated evaluation across splits versus one specific train/validation division
estimator vs transformer	learning component versus data-transformation component
online vs offline feature table	low-latency serving store versus training or batch-oriented store
AutoML vs MLflow	automated model search aid versus lifecycle tracking system
batch vs realtime vs streaming inference	bulk scoring versus endpoint serving versus continuous event-driven inference
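
The leakage row is easier to remember with a concrete case. The sketch below assumes a single made-up numeric feature and a time-ordered holdout; it shows one common leakage pattern, where scaling statistics are computed on all rows instead of on the training rows only. The data and the standardize helper are illustrative assumptions.

```python
from statistics import mean, pstdev

# Hypothetical feature values ordered by time; the last two rows are held out.
values = [10.0, 12.0, 11.0, 13.0, 50.0, 55.0]
train, test = values[:4], values[4:]


def standardize(rows, center, scale):
    """Shift and scale rows using statistics supplied by the caller."""
    return [(x - center) / scale for x in rows]


# Leaky: statistics come from every row, so held-out values shape the training inputs.
leaky_center, leaky_scale = mean(values), pstdev(values)

# Safe: statistics come from the training rows only and are reused for the held-out rows.
safe_center, safe_scale = mean(train), pstdev(train)

train_scaled = standardize(train, safe_center, safe_scale)
test_scaled = standardize(test, safe_center, safe_scale)

print(f"leaky center={leaky_center:.2f}, safe center={safe_center:.2f}")
```

Class imbalance is a different problem: the information boundary can be perfectly clean while the label distribution is still skewed.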

If three terms blur together

Cluster	Fast separation
run / experiment / registered model	one execution, collection of runs, or promoted model lineage
parameter / metric / artifact	config input, measured result, or stored output file
train / validation / test	fit the model, tune the model, or final held-out check
precision / recall / accuracy	false-positive control, false-negative control, or overall correctness
feature engineering / leakage / reproducibility	improve inputs, avoid invalid information flow, or ensure repeatable results
AutoML / feature table / registry	automate search, manage reusable features, or manage model lineage
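
For the precision / recall / accuracy cluster, a small worked example helps. The labels and predictions below are made up, and returning 0.0 when a denominator is zero is an assumed convention rather than a universal rule.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Assumed convention: report 0.0 when there is nothing to divide by.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical labels: 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here two of three predicted positives are correct (precision 2/3), two of four actual positives are found (recall 1/2), and seven of ten rows are right overall (accuracy 0.7), so the three numbers answer three different questions about the same predictions.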

One-sentence memory hooks

If the question is about what changes between attempts, think parameters.
If the question is about how well it performed, think metrics.
If the question is about files or outputs produced by the run, think artifacts.
If the result looks too good, think leakage or split quality before celebrating.
If the question is about promotion and managed lifecycle, think registered model and model version, not only runs.
If the question is about feature reuse across teams, think feature tables.
If the question is about which version is active in a role, think alias.
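
The alias hook can be pictured with a toy registry. This is a plain-Python stand-in written for illustration, not the MLflow API: the class ToyRegisteredModel, its methods, the model name, and the run ids are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyRegisteredModel:
    """Toy stand-in for a registered model; not the MLflow API."""
    name: str
    versions: Dict[int, str] = field(default_factory=dict)  # version -> source run id
    aliases: Dict[str, int] = field(default_factory=dict)   # alias -> version

    def register(self, run_id: str) -> int:
        """Create the next numbered version from a run."""
        version = len(self.versions) + 1
        self.versions[version] = run_id
        return version

    def set_alias(self, alias: str, version: int) -> None:
        """Point an alias at an existing version, replacing any earlier target."""
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        self.aliases[alias] = version

    def resolve(self, alias: str) -> int:
        return self.aliases[alias]


model = ToyRegisteredModel(name="churn_classifier")  # hypothetical model name
v1 = model.register(run_id="run-a")
v2 = model.register(run_id="run-b")
model.set_alias("champion", v1)
model.set_alias("challenger", v2)

# Promotion moves the alias; callers that ask for "champion" do not change.
model.set_alias("champion", v2)
print(model.resolve("champion"))
```

In MLflow itself, the client offers comparable alias operations on registered models (for example set_registered_model_alias); check the current MLflow documentation for exact signatures before relying on them.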

Operational clusters worth keeping straight

Cluster	What it usually signals on the exam
feature engineering / leakage / splits	trustworthy training and evaluation questions
metric choice / baseline model / imbalance	evaluation and model-selection questions
MLflow runs / params / metrics / artifacts	experiment-tracking questions
registered model / model version / deployment	model-lifecycle questions
reproducibility / artifact store / feature consistency	operational ML workflow questions
AutoML / feature tables / ML runtimes	Databricks-native ML platform questions
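
For the metric choice / baseline model / imbalance cluster, a majority-class baseline is one simple reference point. The sketch below uses made-up labels with a 90/10 class split and a hypothetical set of candidate predictions; it is an illustration of the idea, not a recommended evaluation procedure.

```python
from collections import Counter


def majority_class_baseline(y_train):
    """Return the most frequent training label; the baseline always predicts it."""
    return Counter(y_train).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced labels: about 10% positives.
y_train = [0] * 90 + [1] * 10
y_test = [0] * 18 + [1] * 2

# Hypothetical candidate model output: finds one of the two positives.
candidate_pred = [0] * 18 + [1, 0]

baseline_label = majority_class_baseline(y_train)
baseline_pred = [baseline_label] * len(y_test)

baseline_acc = accuracy(y_test, baseline_pred)
candidate_acc = accuracy(y_test, candidate_pred)
lift = candidate_acc - baseline_acc

print(f"baseline={baseline_acc:.2f} candidate={candidate_acc:.2f} lift={lift:.2f}")
```

A candidate accuracy of 0.95 sounds strong until it sits next to a do-nothing baseline at 0.90, which is why imbalanced problems usually call for precision or recall alongside accuracy.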

If the confusion is really about…

Topic family	Best page to revisit
MLflow and evaluation rules	Cheat Sheet
current Databricks facts and docs	Resources
pacing and review order	Study Plan
overall exam framing	Guide root

Revised on Monday, June 15, 2026
