Study Databricks DE-PRO Spark UI and Job Repair: key concepts, common traps, and exam decision cues.
Debugging questions are really about choosing the right signal and the least-disruptive recovery path. Professional answers rarely start with “rerun everything.”
| Requirement | Better first instinct |
|---|---|
| inspect stage and task behavior | Spark UI |
| inspect runtime or driver-level failure detail | cluster logs |
| rerun failed work after diagnosis | repair run |
| adjust run-time input for a rerun | parameter override |

| If you need to know… | Stronger first answer |
|---|---|
| what happened inside execution stages | Spark UI |
| what the runtime or driver reported | cluster logs |
| how to rerun only the failed slice | repair run |
| how to change rerun inputs safely | parameter override |

The order matters: pick the diagnostic signal first and the recovery tool second, because DE-PRO usually punishes blind reruns.
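To make "diagnose before rerunning" concrete, the sketch below reads task-level result states from a run description and picks out only the tasks that failed themselves. The field names (`tasks`, `task_key`, `state.result_state`) are modeled on the Jobs API 2.1 run description as I understand it; other API versions may shape this differently. The helper name, the `RERUN_STATES` policy, and the sample run are all hypothetical.

```python
from typing import Any

# Result states this sketch treats as "the task itself failed".
# This set is an assumption; adjust it to your own recovery policy.
RERUN_STATES = {"FAILED", "TIMEDOUT"}


def failed_task_keys(run: dict[str, Any]) -> list[str]:
    """Return the task keys whose own result state marks them as failed."""
    keys = []
    for task in run.get("tasks", []):
        result = task.get("state", {}).get("result_state")
        if result in RERUN_STATES:
            keys.append(task["task_key"])
    return keys


# Hypothetical run description, trimmed to the fields the helper reads.
sample_run = {
    "run_id": 1001,
    "tasks": [
        {"task_key": "ingest", "state": {"result_state": "SUCCESS"}},
        {"task_key": "transform", "state": {"result_state": "FAILED"}},
        {"task_key": "publish", "state": {"result_state": "UPSTREAM_FAILED"}},
    ],
}

print(failed_task_keys(sample_run))  # ['transform']
```

Here `publish` is skipped on purpose: it never ran because its upstream task failed, so the failure to investigate is `transform`, not the downstream task.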
| If the stem says… | Strong reading |
|---|---|
| “identify diagnostic information” | choose the signal that matches the failure layer |
| “remediate failed job runs” | repair and parameter control matter |
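| “Lakeflow pipeline debugging” | event logs and Spark UI may both be relevant, but for different reasons |

Professional recovery tries to diagnose before acting, rerun only the failed slice, and change run-time inputs deliberately rather than by guesswork. That is why repair runs and parameter overrides show up as distinct operational tools.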
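The split between "what to rerun" and "what to change" can be seen in a repair request body. The sketch below only builds the JSON payload and sends nothing. The field names (`run_id`, `rerun_tasks`, `rerun_all_failed_tasks`, `job_parameters`, `latest_repair_id`) follow the Jobs API 2.1 `runs/repair` request as I understand it, so check them against the current API reference before relying on them. The function name, run ID, task key, and parameter values are hypothetical.

```python
import json
from typing import Optional


def build_repair_payload(
    run_id: int,
    rerun_tasks: Optional[list[str]] = None,
    job_parameters: Optional[dict[str, str]] = None,
    latest_repair_id: Optional[int] = None,
) -> dict:
    """Build a repair-run request body (a sketch; nothing is sent)."""
    payload: dict = {"run_id": run_id}

    # Scope of the rerun: explicit task keys, or every failed task.
    # This sketch assumes the two options are not combined in one request.
    if rerun_tasks:
        payload["rerun_tasks"] = list(rerun_tasks)
    else:
        payload["rerun_all_failed_tasks"] = True

    # Parameter override: only included when the diagnosis calls for it.
    if job_parameters:
        payload["job_parameters"] = dict(job_parameters)

    # Assumed to be needed when the same run has been repaired before.
    if latest_repair_id is not None:
        payload["latest_repair_id"] = latest_repair_id

    return payload


# Hypothetical values: rerun one failed task with one overridden input.
payload = build_repair_payload(
    run_id=1001,
    rerun_tasks=["transform"],
    job_parameters={"run_date": "2024-01-31"},
)
print(json.dumps(payload, indent=2))
```

Keeping the rerun scope and the overrides as separate arguments mirrors the exam distinction: a repair run bounds what is replayed, while a parameter override changes what the replay receives.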
| Trap | Better rule |
|---|---|
| using a full rerun as the first answer to every failure | DE-PRO usually rewards bounded repair |
| changing parameters without understanding the failure | diagnosis should come first |
| treating Spark UI and cluster logs as identical | one focuses on execution behavior, the other on runtime log detail |

| Scenario clue | Stronger answer shape |
|---|---|
| “need stage/task execution detail” | Spark UI |
| “need driver/runtime failure detail” | cluster logs |
| “failed slice should rerun without full replay” | repair run |
| “rerun needs changed run-time value” | parameter override |

Debugging questions usually begin with blast radius: how much of the workload actually failed, and how much has to run again. If you need low-level task and stage detail, go to the Spark UI. If you have isolated the failed slice and need a controlled rerun, think repair run with parameter overrides. The weak answer replays the whole workload blindly; the professional move is bounded recovery after diagnosis.