Databricks ML-PRO Glossary: Key Terms

April 13, 2026

Databricks ML-PRO glossary of SparkML, training, tuning, inference, and MLOps terms.

Use this glossary when SparkML, MLflow, feature-engineering, monitoring, and deployment terms start to blur together. Keep it beside the cheat sheet and resources, not in place of scenario practice.

High-yield terms

Term	Short meaning	Why it matters on ML-PRO
SparkML	Spark’s distributed ML library for pipelines, estimators, transformers, and scalable inference	core model-development term
Nested run	MLflow tracking pattern that groups child runs under a parent run, for example one child run per tuning trial	key advanced experimentation term
Alias	Stable label pointing to a chosen registered model version	key release-control term
Point-in-time correctness	Feature lookup behavior that prevents leakage by using only information available at that moment	one of the highest-value feature-engineering concepts
Online table	Databricks feature-serving storage for low-latency applications	key online-feature term
Lakehouse Monitoring	Databricks monitoring surface for data and model-quality signals	key drift and monitoring term
Drift metric	Statistical signal that tracks change in data or model behavior over time	key monitoring decision term
Data parallelism	Split data across workers while training the same model structure	key scaling strategy term
Model parallelism	Split model computation itself across resources	key large-model scaling term
Optuna	Hyperparameter tuning framework used in Databricks workflows and often paired with MLflow logging	key tuning term
Ray	Distributed compute framework often contrasted with Spark for ML workloads	key scaling trade-off term
Databricks Asset Bundle	Packaging and deployment structure for Databricks assets and environment promotion	key MLOps term
Blue-green deployment	Deployment strategy that shifts traffic between two environments with a clear cutover path	key rollout term
Canary deployment	Rollout strategy that exposes a small portion of traffic first	key blast-radius-control term
Custom PyFunc model	MLflow model packaged through the pyfunc interface for custom serving logic	key deployment-interface term
Deploy code strategy	Lifecycle approach that promotes training code, rather than a trained model artifact, through development, staging, and production, so the model is trained in each environment	key MLOps architecture term
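
Point-in-time correctness is easier to remember with a concrete lookup. The sketch below is a plain-Python illustration, not the Databricks Feature Engineering API: the feature history, the integer timestamps, and the helper name `point_in_time_lookup` are all assumptions made for this example. For each label event it returns the latest feature value recorded at or before that event, so no future value can leak into a training row.

```python
from bisect import bisect_right

# Hypothetical feature history for one entity: (timestamp, value), sorted by timestamp.
feature_history = [
    (1, 10.0),
    (5, 12.5),
    (9, 20.0),
]


def point_in_time_lookup(history, event_ts):
    """Return the latest feature value with timestamp <= event_ts, or None if none exists."""
    timestamps = [ts for ts, _ in history]
    # bisect_right counts how many recorded timestamps are <= event_ts.
    idx = bisect_right(timestamps, event_ts)
    if idx == 0:
        return None  # No feature value was known yet at this moment.
    return history[idx - 1][1]


# Hypothetical label events; each training row may only see the past.
label_events = [0, 5, 7, 100]
training_rows = [(ts, point_in_time_lookup(feature_history, ts)) for ts in label_events]
print(training_rows)
# [(0, None), (5, 12.5), (7, 12.5), (100, 20.0)]
```

The event at timestamp 7 gets 12.5, not 20.0, because the value 20.0 was only recorded at timestamp 9. Feature freshness is a separate question: it asks how old the value returned for the latest event is allowed to be.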

Commonly confused pairs

Pair	Keep this distinction clear
MLflow run vs registered model version	experiment record versus release artifact
alias vs serving endpoint	release pointer versus deployed inference interface
point-in-time correctness vs feature freshness	leakage prevention versus recency of values
drift vs rollout regression	gradual distribution or quality change versus bad release event
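SparkML vs single-node model	pipeline fit distributed across a cluster versus a model trained on one machine
Spark vs Ray	DataFrame-centric distributed processing and ML pipelines versus general-purpose task and actor parallelism for Python workloads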
retrain vs rollback	create a new candidate versus restore a known good state
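
Several of these pairs come down to the same mechanics: versions are fixed records, and an alias is a movable pointer. The toy registry below is an in-memory sketch written for this glossary, not MLflow code; the class `ToyRegistry`, the run IDs, and the alias name `champion` are hypothetical. In MLflow itself, the alias is moved with `MlflowClient.set_registered_model_alias`.

```python
class ToyRegistry:
    """In-memory stand-in for a model registry (illustration only)."""

    def __init__(self):
        self.versions = {}  # version number -> run ID that produced it
        self.aliases = {}   # alias name -> version number

    def register(self, run_id):
        """Create a new version from an experiment run and return its number."""
        version = len(self.versions) + 1
        self.versions[version] = run_id
        return version

    def set_alias(self, alias, version):
        """Point an alias at an existing version, replacing any earlier target."""
        if version not in self.versions:
            raise KeyError(f"unknown version: {version}")
        self.aliases[alias] = version

    def resolve(self, alias):
        """Return the version number the alias currently points to."""
        return self.aliases[alias]


registry = ToyRegistry()

v1 = registry.register("run_a")      # first run becomes version 1
registry.set_alias("champion", v1)   # release pointer selects version 1

v2 = registry.register("run_b")      # retrain: a new candidate version is created
registry.set_alias("champion", v2)   # promotion: the pointer moves forward

registry.set_alias("champion", v1)   # rollback: the pointer returns to a known good version
print(registry.resolve("champion"), sorted(registry.versions))
```

Rollback created nothing new: both versions still exist and only the pointer moved. Retraining is the opposite case, where a new version appears and the pointer stays put until someone promotes it.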

If three terms blur together

Cluster	Fast separation
run / version / alias	track the experiment, govern the releasable artifact, point a stable release label at the chosen version
drift / outage / rollout regression	gradual change, service failure, or bad deployment event
SparkML / Ray / single-node training	distributed Spark pipeline, alternative distributed framework, or training on one machine
blue-green / canary / rollback	cutover strategy, partial rollout, or revert to a trusted prior state
point-in-time correctness / leakage / online features	correct historical lookup, future-information contamination, or low-latency feature serving
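
The difference between a cutover and a partial rollout can be sketched as two routing functions. This is a generic illustration, not the Databricks Model Serving traffic configuration; the function names, the labels `current` and `candidate`, and the hash-based bucketing are assumptions chosen for this example.

```python
import hashlib


def bucket(request_id):
    """Map a request ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def canary_route(request_id, canary_percent):
    """Canary: send roughly canary_percent of requests to the candidate."""
    return "candidate" if bucket(request_id) < canary_percent else "current"


def blue_green_route(active_environment):
    """Blue-green: every request goes to whichever environment is active."""
    return active_environment


# Canary: the same request ID always lands on the same side for a given percentage.
sample_ids = [f"req-{i}" for i in range(1000)]
candidate_share = sum(canary_route(r, 10) == "candidate" for r in sample_ids) / len(sample_ids)
print(f"share routed to candidate at 10%: {candidate_share:.2f}")

# Blue-green: cutover and rollback are both a single switch of the active environment.
print(blue_green_route("green"))
```

Raising `canary_percent` only adds requests to the candidate side, so the blast radius grows in controlled steps. In the blue-green function there is no partial state: rolling back means passing the previous environment again.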

One-sentence memory hooks

If the model scored well offline, ask whether the feature path and release path are still safe.
If production gets worse, separate drift, feature bug, rollout regression, and serving failure before acting.
If scaling is the issue, choose the approach that fits the workload before adding cluster size.
If monitoring fires, decide whether the right action is retrain, rollback, block promotion, or fix upstream data.
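
Before monitoring can fire, something has to measure change. The sketch below computes the population stability index (PSI), one commonly used drift metric, over pre-binned counts. It is a plain-Python illustration and not how Lakehouse Monitoring is implemented; the bin counts and the `eps` floor for empty bins are assumptions made for this example.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions.

    Both lists must use the same bins. Shares are floored at eps so that
    an empty bin does not produce log(0) or a division by zero.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    score = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_share = max(expected / expected_total, eps)
        actual_share = max(actual / actual_total, eps)
        score += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return score


# Hypothetical bin counts for one feature: training baseline versus two later windows.
baseline = [25, 25, 25, 25]
stable_window = [25, 25, 25, 25]
shifted_window = [10, 20, 30, 40]

print(psi(baseline, stable_window))   # 0.0 because the distributions are identical
print(psi(baseline, shifted_window))  # positive because mass moved between bins
```

A rising score says the distribution changed. It does not say why, so the choice between retraining, rolling back, blocking promotion, and fixing upstream data still depends on separating the causes first.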

Revised on Monday, June 15, 2026