Study Databricks ML-ASSOC Custom Endpoints and Traffic Splits: key concepts, common traps, and exam decision cues.
The last deployment questions are about control. Databricks expects you to recognize custom endpoints, live querying patterns, and traffic splitting, but it still wants consistency between the training workflow and the served workflow.
| Need | Better first instinct |
|---|---|
| expose a custom model for realtime use | custom endpoint |
| compare or roll out models safely | traffic split between served models on one endpoint |
| explain why production differs from offline results | check preprocessing, schema, and feature consistency |

| If the problem is mainly about… | Stronger first lane |
|---|---|
| how to expose the model | endpoint design |
| how to roll out safely | traffic split or controlled live comparison |
| why live predictions look wrong | inference consistency and feature parity |
This matters because rollout control and inference-quality diagnosis are not the same question.
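
To make the rollout side concrete, here is a minimal sketch of an endpoint definition that splits traffic between two served models. The field names (`served_entities`, `traffic_config`, `routes`, `served_model_name`, `traffic_percentage`) follow the Databricks serving-endpoints REST payload as best understood here and should be checked against the current documentation. The endpoint name, model name, versions, and the helper `build_endpoint_config` are hypothetical. The code only builds and validates the payload; it does not call a workspace.

```python
import json


def build_endpoint_config(endpoint_name, candidates):
    """Build a serving-endpoint payload with a traffic split.

    candidates: list of (served_name, model_name, model_version, percent).
    Field names are assumed from the Databricks REST API; verify before use.
    """
    total = sum(percent for _, _, _, percent in candidates)
    if total != 100:
        raise ValueError(f"traffic percentages must sum to 100, got {total}")

    served_entities = [
        {
            "name": served_name,
            "entity_name": model_name,
            "entity_version": str(version),
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }
        for served_name, model_name, version, _ in candidates
    ]
    routes = [
        {"served_model_name": served_name, "traffic_percentage": percent}
        for served_name, _, _, percent in candidates
    ]
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": served_entities,
            "traffic_config": {"routes": routes},
        },
    }


# Hypothetical rollout: 90% to the current model, 10% to the candidate.
payload = build_endpoint_config(
    "churn-endpoint",
    [
        ("champion", "ml.models.churn", 3, 90),
        ("challenger", "ml.models.churn", 4, 10),
    ],
)
print(json.dumps(payload, indent=2))
```

Note that both candidates sit behind one endpoint name, so callers keep a single URL while the split changes. Nothing here records metrics or run history; that remains the job of experiment tracking.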
| Trap | Better rule |
|---|---|
| assuming a strong offline metric guarantees strong endpoint behavior | training and serving inputs still have to match |
| treating traffic splits like experiment tracking | traffic splits are live rollout control, not run logging |
| serving a model without checking feature parity | confirm that serving inputs match the training features before trusting live predictions |
Offline and production mismatch often comes from a schema, preprocessing, or feature inconsistency between training and serving. The better answer usually diagnoses the inconsistency before reaching for a new model.
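
A parity check of this kind can be sketched in plain Python. This is an illustration, not a Databricks API: the helper `check_parity`, the feature names, and the sample record are all hypothetical, and a real check would also compare preprocessing steps such as scaling or encoding.

```python
def check_parity(training_schema, record):
    """Return human-readable mismatches between a training schema and one live record."""
    problems = []
    # Every feature the model was trained on must be present with the expected type.
    for name, expected_type in training_schema.items():
        if name not in record:
            problems.append(f"missing feature: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"type mismatch for {name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    # Features the model never saw in training are also a warning sign.
    for name in record:
        if name not in training_schema:
            problems.append(f"unexpected feature: {name}")
    return problems


# Hypothetical schema captured at training time.
training_schema = {"tenure_months": int, "monthly_charge": float, "plan": str}

# Hypothetical live request: a stringified number, a missing column, an extra column.
live_record = {"tenure_months": "12", "monthly_charge": 70.5, "region": "EU"}

problems = check_parity(training_schema, live_record)
for problem in problems:
    print(problem)
```

Running a check like this on sampled live requests surfaces the mismatch before anyone retrains or replaces the model.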
| Scenario clue | Stronger answer shape |
|---|---|
| “team needs a callable custom model over HTTP” | custom endpoint |
| “team wants a safer production rollout between candidates” | traffic split |
| “offline validation looked good but live predictions are odd” | inspect schema, preprocessing, and feature parity |
| “need live queries against the deployed model” | realtime inference path |
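
For the live-query clue, the sketch below builds an HTTP request for a deployed endpoint without sending it. It assumes the `/serving-endpoints/<name>/invocations` path and the `dataframe_records` input format used by Databricks Model Serving; confirm both against the current documentation. The workspace URL, endpoint name, token placeholder, and the helper `build_invocation_request` are hypothetical.

```python
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, records):
    """Build (but do not send) a POST request that scores records on an endpoint."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; a real call needs a real workspace URL and access token.
request = build_invocation_request(
    "https://example.cloud.databricks.com/",
    "churn-endpoint",
    "<token>",
    [{"tenure_months": 12, "monthly_charge": 70.5, "plan": "basic"}],
)
print(request.get_method(), request.full_url)
# Sending would look like: urllib.request.urlopen(request)
```

The record keys must match the features the model was trained on, which is where the realtime path and the parity check meet.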
This lesson usually tests whether you can separate rollout safety from experiment tracking. Traffic splitting is about controlling live rollout and comparison in production. MLflow run comparison is about experiment history. If production behavior diverges from offline expectations, inspect schema, preprocessing, and feature consistency first. The weak answer usually blames live routing before checking training-serving parity.