Databricks ML-ASSOC Glossary: Features, Training, and Serving Terms

Databricks ML-ASSOC glossary of notebooks, data prep, models, deployment, and monitoring terms.

Use this glossary when feature engineering, evaluation, and MLflow terms start to blur together. Keep it alongside the Cheat Sheet and Resources pages; it is a reference, not a substitute for study.

High-yield terms

Term | Short meaning | Why it matters on ML-ASSOC
ML runtime | Databricks Runtime for Machine Learning: a cluster runtime with popular ML libraries preinstalled | Important platform-fit concept
AutoML | Databricks tool that automatically trains and tunes candidate models and generates editable notebooks, including a data exploration notebook | High-yield Databricks ML feature
MLflow run | Logged experiment execution with parameters, metrics, and artifacts | Core experiment-tracking concept
Artifact | File output from a run, such as a plot, model file, or dataset snapshot | Commonly confused with the model itself
Experiment | Collection of related MLflow runs | Comparison and organization layer
Registered model | Named model object tracked across versions; lifecycle is managed with stages in the Workspace Model Registry or with aliases in Unity Catalog | Core model-lifecycle concept
Alias | Stable, reassignable name (for example, champion) that points to one registered-model version | Important for champion/challenger promotion
Feature engineering | Transforming raw data into model-usable inputs | High-yield feature-work concept
Feature table | Managed, reusable table of features keyed by a primary key | Important Databricks ML platform concept
Online feature table | Feature storage designed for low-latency serving | Commonly contrasted with offline feature tables
Offline feature table | Feature storage designed for training, analysis, or batch workflows | Commonly contrasted with online feature tables
Leakage | Information from the future or from the target itself bleeding into training or evaluation | One of the most tested evaluation failure modes
Baseline model | Simple comparison model used to judge whether a more complex approach adds value | Helps prevent “good-looking metric” mistakes
Precision | Share of predicted positives that are actually positive | Common classification metric
Recall | Share of actual positives captured by the model | Common classification metric
Cross-validation | Repeated training and validation across different data splits (folds) | Key validation concept
Hyperparameter | Training setting fixed before fitting rather than learned from the data, often chosen through model search | Common model-tuning term
Hyperopt | Python library for hyperparameter optimization, referenced in the exam outline | Important tuning tool concept
Estimator | ML component that learns from data (fit) and produces a model | Commonly contrasted with transformers
Transformer | Component that maps one dataset to another (transform); feature steps are transformers, and in Spark ML a fitted model is one too | Common pipeline concept
Inference | Using a trained model to produce predictions | Distinct from training and tracking
Train/validation/test split | Separation of data for fitting, tuning, and final evaluation | Core trustworthy-evaluation term
Artifact store | Backing storage for MLflow artifacts | Helps separate tracking metadata from stored outputs
Model version | Specific registered-model instance tracked through lifecycle changes | Common registry term
Reproducibility | Ability to rerun and explain the same experiment result reliably | Central operational ML concept
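The alias row is easiest to see as a movable pointer. The sketch below is a hypothetical, plain-Python model of a registered model: the class ToyRegisteredModel and its methods are invented for illustration and are not the MLflow or Unity Catalog API. Versions are numbered and never change, while an alias such as champion can be moved from one version to another. In MLflow itself, the corresponding operation is MlflowClient.set_registered_model_alias, and a model can be loaded by alias with a URI of the form models:/<name>@<alias>.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegisteredModel:
    """Hypothetical stand-in for a registered model: versions plus aliases.

    NOT the MLflow or Unity Catalog API; it only models the idea that an
    alias is a movable pointer to one immutable version.
    """

    name: str
    versions: dict[int, str] = field(default_factory=dict)  # version -> source run
    aliases: dict[str, int] = field(default_factory=dict)   # alias -> version

    def register(self, run_id: str) -> int:
        # Each registration creates a new, higher version number.
        version = len(self.versions) + 1
        self.versions[version] = run_id
        return version

    def set_alias(self, alias: str, version: int) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        # Reassigning an alias moves the pointer; versions stay untouched.
        self.aliases[alias] = version

    def resolve(self, alias: str) -> int:
        return self.aliases[alias]


model = ToyRegisteredModel("churn_classifier")
v1 = model.register("run_a")
v2 = model.register("run_b")
model.set_alias("champion", v1)
model.set_alias("challenger", v2)

# Promotion: the challenger wins, so "champion" now points at version 2.
model.set_alias("champion", v2)
print(model.resolve("champion"))  # 2
print(model.versions)             # {1: 'run_a', 2: 'run_b'}
```

Promotion here changes only which version champion resolves to; no version is copied or retrained, which is why aliases suit champion/challenger workflows.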

Commonly confused pairs

Pair | Keep this distinction clear
parameter vs metric | configuration input versus measured output
artifact vs model | general logged file versus trained predictive object
validation set vs test set | model-tuning feedback set versus final held-out evaluation set
precision vs recall | hurt by false positives versus hurt by false negatives
experiment vs registered model | run-tracking workspace versus managed, deployable model lineage
run vs model version | one tracked execution versus one numbered registered-model instance
leakage vs class imbalance | bad information boundary versus skewed class distribution
cross-validation vs single split | repeated evaluation across several splits versus one fixed train/validation division
estimator vs transformer | learning component versus data-transformation component
online vs offline feature table | low-latency serving store versus training- or batch-oriented store
AutoML vs MLflow | automated model search aid versus lifecycle tracking system
batch vs real-time vs streaming inference | bulk scoring versus endpoint serving versus continuous event-driven scoring
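The estimator vs transformer pair follows the fit/transform pattern used by Spark ML, where an estimator's fit() returns a fitted model that is itself a transformer. As a minimal stdlib sketch, the two classes here borrow the names of Spark ML's StandardScaler and StandardScalerModel, but they are hypothetical re-implementations, not the pyspark.ml API.

```python
from statistics import mean, pstdev


class StandardScalerModel:
    """Transformer: applies already-learned parameters to new data."""

    def __init__(self, mu: float, sigma: float) -> None:
        self.mu = mu
        self.sigma = sigma

    def transform(self, values: list[float]) -> list[float]:
        # No learning happens here; it only reuses mu and sigma from fit().
        return [(v - self.mu) / self.sigma for v in values]


class StandardScaler:
    """Estimator: learns parameters from data and returns a transformer."""

    def fit(self, values: list[float]) -> StandardScalerModel:
        sigma = pstdev(values) or 1.0  # guard against zero variance
        return StandardScalerModel(mean(values), sigma)


train = [2.0, 4.0, 6.0]
scaler_model = StandardScaler().fit(train)   # estimator.fit -> fitted model
print(scaler_model.transform([4.0, 8.0]))    # 4.0 maps to 0.0 (the training mean)
```

The estimator is the only place learning happens; once fitted, the model can be applied to validation, test, or inference data without seeing those rows during fitting.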

If three terms blur together

Cluster | Fast separation
run / experiment / registered model | one execution, a collection of runs, or versioned model lineage
parameter / metric / artifact | config input, measured result, or stored output file
train / validation / test | fit the model, tune the model, or final held-out check
precision / recall / accuracy | false-positive control, false-negative control, or overall correctness
feature engineering / leakage / reproducibility | improve inputs, avoid invalid information flow, or ensure repeatable results
AutoML / feature table / registry | automate search, manage reusable features, or manage model lineage
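The precision / recall / accuracy row reduces to counting confusion-matrix cells. This minimal sketch assumes binary labels with 1 as the positive class; the function name classification_metrics is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not a standard.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Precision, recall, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms lower precision
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses lower recall
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs),
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # precision 1.0, recall ~0.667, accuracy 0.875
```

This model never raises a false alarm (precision 1.0) but misses one of the three positives (recall about 0.667), which is exactly the trade-off the precision vs recall pair describes.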

One-sentence memory hooks

  • If the question is about what changes between attempts, think parameters.
  • If the question is about how well it performed, think metrics.
  • If the question is about files or outputs produced by the run, think artifacts.
  • If the result looks too good, think leakage or split quality before celebrating.
  • If the question is about promotion and managed lifecycle, think registered model and model version, not only runs.
  • If the question is about feature reuse across teams, think feature tables.
  • If the question is about which version is active in a role, think alias.
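The leakage hook applies to preprocessing as well as to features: any statistic a pipeline learns (a mean, a scaling factor, an encoding) should come from training rows only and then be reused on validation and test rows. This is a minimal sketch with synthetic values, assuming a simple random 80/20 split; the fixed seed also makes the split reproducible.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed: the same split on every rerun (reproducibility)
rows = [float(x) for x in range(100)]  # synthetic feature values
random.shuffle(rows)
train, test = rows[:80], rows[80:]

# Leaky: the statistic is computed over ALL rows, so test rows influence training.
leaky_mean = mean(rows)

# Correct: learn the statistic from training rows only...
train_mean = mean(train)


def center(values: list[float], mu: float) -> list[float]:
    """Subtract a previously learned mean; no learning happens here."""
    return [v - mu for v in values]


# ...then reuse the training statistic on both splits.
train_centered = center(train, train_mean)
test_centered = center(test, train_mean)
print(f"leaky mean={leaky_mean:.2f}, train-only mean={train_mean:.2f}")
```

The gap between the two means is small here, but the principle is the same one that produces "too good" metrics when a scaler, encoder, or imputer is fitted before the split.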

Operational clusters worth keeping straight

Cluster | What it usually signals on the exam
feature engineering / leakage / splits | trustworthy training and evaluation questions
metric choice / baseline model / imbalance | evaluation and model-selection questions
MLflow runs / params / metrics / artifacts | experiment-tracking questions
registered model / model version / deployment | model-lifecycle questions
reproducibility / artifact store / feature consistency | operational ML workflow questions
AutoML / feature tables / ML runtimes | Databricks-native ML platform questions
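The metric choice / baseline model / imbalance cluster can be checked in a few lines: on imbalanced labels, a majority-class baseline already scores high accuracy while catching no positives. The labels below are synthetic, and majority_baseline is a hypothetical helper written for this sketch.

```python
from collections import Counter


def majority_baseline(train_labels: list[int]):
    """Return a predictor that always outputs the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda n: [majority] * n


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: list[int], y_pred: list[int]) -> float:
    positives = sum(1 for t in y_true if t == 1)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / positives if positives else 0.0


# Synthetic, imbalanced labels: 5% positives in both splits.
train_labels = [1] * 5 + [0] * 95
test_labels = [1] * 2 + [0] * 38

predict = majority_baseline(train_labels)
baseline_pred = predict(len(test_labels))
print(accuracy(test_labels, baseline_pred))  # 0.95
print(recall(test_labels, baseline_pred))    # 0.0
```

A candidate model that reports 95% accuracy on this data adds nothing over the baseline; comparing on recall or precision exposes the difference.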

If the confusion is really about…

Topic family | Best page to revisit
MLflow and evaluation rules | Cheat Sheet
current Databricks facts and docs | Resources
pacing and review order | Study Plan
overall exam framing | Guide root
Revised on Sunday, May 10, 2026