Study Databricks ML-ASSOC Serving Patterns: key concepts, common traps, and exam decision cues.
The exam does not want one serving mode for everything. It wants you to match the inference pattern to the workload: bulk offline scoring, low-latency endpoint use, or continuous pipeline scoring.
| Need | Better first instinct |
|---|---|
| large scheduled scoring job | batch inference |
| immediate prediction for incoming request | realtime inference |
| continuous event or record flow | streaming inference |

| Trigger type | Stronger serving instinct |
|---|---|
| scheduled job over a dataset | batch |
| user or application request right now | realtime |
| endless event flow in a pipeline | streaming |
If you classify the trigger first, most wrong answers drop out quickly.
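The trigger-first rule can be written as a small lookup. This is a minimal sketch for study purposes only: the trigger labels (`scheduled_job`, `live_request`, `continuous_events`) and the `choose_serving_mode` helper are hypothetical names invented here, not part of any Databricks API.

```python
from enum import Enum


class ServingMode(Enum):
    BATCH = "batch"
    REALTIME = "realtime"
    STREAMING = "streaming"


# Hypothetical trigger labels that mirror the rows of the trigger table.
_TRIGGER_TO_MODE = {
    "scheduled_job": ServingMode.BATCH,           # scheduled job over a dataset
    "live_request": ServingMode.REALTIME,         # caller waits for the answer
    "continuous_events": ServingMode.STREAMING,   # endless event flow
}


def choose_serving_mode(trigger: str) -> ServingMode:
    """Return the serving mode suggested by the trigger type."""
    try:
        return _TRIGGER_TO_MODE[trigger]
    except KeyError:
        # An unrecognized trigger means the stem needs a closer read.
        raise ValueError(f"unknown trigger: {trigger!r}") from None


print(choose_serving_mode("scheduled_job").value)
```

The point of the sketch is the order of operations: name the trigger first, and only then look at the serving technology in the answer options.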
| Signal in the stem | Strong reading |
|---|---|
| batch scoring with a tabular workflow | batch inference; pandas can fit the scoring step |
| streaming pipeline with changing event volume | streaming inference, in a Databricks streaming or Delta Live Tables-style pattern |
| low-latency endpoint | realtime serving |
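Each mode optimizes for a different operational shape: batch for throughput over a large dataset, realtime for low latency on each request, and streaming for continuous scoring as events arrive.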
The exam usually rewards the option that fits the workload shape cleanly rather than the flashiest serving technology.
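The three operational shapes can be contrasted in plain Python. This is only an illustrative sketch: `predict` is a toy stand-in for a trained model, and `score_batch`, `score_request`, and `score_stream` are hypothetical helper names. On Databricks the same shapes would be built with the platform's batch, serving, and streaming tooling rather than with these functions.

```python
from typing import Iterable, Iterator, List


def predict(features: List[float]) -> float:
    """Toy stand-in for a trained model: a fixed weighted sum."""
    weights = [0.5, 0.25]
    return sum(w * x for w, x in zip(weights, features))


def score_batch(rows: List[List[float]]) -> List[float]:
    """Batch: score a whole dataset in one scheduled pass."""
    return [predict(row) for row in rows]


def score_request(payload: dict) -> dict:
    """Realtime: score one request and answer the caller immediately."""
    return {"prediction": predict(payload["features"])}


def score_stream(events: Iterable[List[float]]) -> Iterator[float]:
    """Streaming: score each event lazily as it arrives."""
    for event in events:
        yield predict(event)


rows = [[2.0, 4.0], [4.0, 8.0]]
batch_out = score_batch(rows)                        # whole dataset at once
request_out = score_request({"features": rows[0]})   # one caller, one answer
stream_out = list(score_stream(iter(rows)))          # consumed event by event
```

The model is identical in all three cases. What changes is how data reaches it and when the result is needed, which is the distinction the exam questions test.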
| Trap | Better rule |
|---|---|
| using realtime endpoints for everything | batch and streaming use cases often fit better elsewhere |
| ignoring compute behavior in streaming scenarios | continuous workloads need a pipeline-aware design |
| assuming offline success alone proves serving fit | serving mode still has to match the workload |

| Scenario clue | Stronger answer shape |
|---|---|
| “nightly or hourly scoring over many rows” | batch |
| “application waits on each prediction” | realtime |
| “events arrive continuously and compute should adapt” | streaming or Delta Live Tables-style pattern |
| “large tabular dataset needs a scored output file or table” | batch inference, often with pandas or bulk-workflow cues |
Deployment-pattern questions usually hinge on latency and workload shape. If the requirement is scheduled bulk scoring, think batch inference. If predictions are required per request, think realtime endpoints. If events arrive continuously and should be scored as they flow, think streaming. The weak answer usually chooses realtime serving for jobs that are really throughput-oriented.
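A rough back-of-the-envelope model shows why per-request serving is a poor fit for throughput-oriented jobs. Every number below is an assumption chosen for illustration (20 ms fixed overhead per call, 1 ms of compute per row, one million rows, sequential calls), not a measurement of any real endpoint, and both function names are hypothetical.

```python
def per_request_ms(n_rows: int, overhead_ms: int, compute_ms: int) -> int:
    """Total time if every row is sent as its own sequential request."""
    return n_rows * (overhead_ms + compute_ms)


def batched_ms(n_rows: int, overhead_ms: int, compute_ms: int,
               batch_size: int) -> int:
    """Total time if the fixed overhead is paid once per batch of rows."""
    n_batches = -(-n_rows // batch_size)  # ceiling division
    return n_batches * overhead_ms + n_rows * compute_ms


# Assumed, illustrative figures only.
N_ROWS = 1_000_000
OVERHEAD_MS = 20   # fixed cost per call (network, serialization, ...)
COMPUTE_MS = 1     # model compute per row

one_by_one = per_request_ms(N_ROWS, OVERHEAD_MS, COMPUTE_MS)
in_batches = batched_ms(N_ROWS, OVERHEAD_MS, COMPUTE_MS, batch_size=10_000)

print(f"per-request: {one_by_one / 1000:.0f} s")
print(f"batched:     {in_batches / 1000:.0f} s")
```

Under these assumptions the per-request path spends most of its time on fixed overhead, while the batched path pays that overhead only 100 times. Real systems add parallelism and other effects, so treat the ratio as intuition, not as a benchmark.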