Study Databricks DE-ASSOC Workflows and Jobs: key concepts, common traps, and exam decision cues.
This lesson covers the operations side of production data engineering on Databricks. The exam wants you to know how jobs are deployed, scheduled, repaired, rerun, and managed when a failure happens. It also wants you to recognize when serverless execution is the cleaner operational answer.
Repair: Recovering from a failed run by rerunning only the failed tasks and the tasks that depend on them, rather than restarting the whole job blindly.
Simple operational flow
flowchart LR
A["Deploy workflow"] --> B["Run scheduled job"]
B --> C{"Task fails?"}
C -- "No" --> D["Successful completion"]
C -- "Yes" --> E["Inspect task and run details"]
E --> F["Repair or rerun the right scope"]
What strong answers usually do
distinguish repairing failed scope from rerunning everything blindly
read task dependencies and schedule intent before changing the workflow
use serverless when the question rewards hands-off job operation
keep recovery targeted so the blast radius stays small
Operational decision map
| If the question is mainly about… | Strong lane |
| --- | --- |
| recurring execution on a schedule or dependency graph | workflow or job configuration |
| recovery after a failed task | repair or rerun judgment |
| reducing manual compute management for standard jobs | serverless jobs |
| debugging one failed run | inspect run and task evidence before replaying |
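To make the "schedule or dependency graph" lane concrete, here is a minimal sketch of a job definition and a check of its task order. It assumes the field names follow the Databricks Jobs API job settings (`schedule.quartz_cron_expression`, `timezone_id`, `tasks[].task_key`, `tasks[].depends_on`); the job name, task keys, and notebook paths are hypothetical, and `execution_order` is an illustrative helper, not part of any Databricks library.

```python
from graphlib import TopologicalSorter

# Hypothetical job definition; the name, task keys, and paths are made up.
job_settings = {
    "name": "nightly_orders",
    "schedule": {
        # Quartz fields: second, minute, hour, day-of-month, month, day-of-week.
        "quartz_cron_expression": "0 0 2 * * ?",  # every day at 02:00
        "timezone_id": "UTC",
    },
    "tasks": [
        {"task_key": "ingest",
         "notebook_task": {"notebook_path": "/Jobs/ingest"}},
        {"task_key": "clean",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Jobs/clean"}},
        {"task_key": "publish",
         "depends_on": [{"task_key": "clean"}],
         "notebook_task": {"notebook_path": "/Jobs/publish"}},
    ],
}


def execution_order(settings):
    """Return task keys in an order that respects every depends_on edge."""
    # Map each task to the set of tasks it must wait for.
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in settings["tasks"]
    }
    # A dependency on a task that is not defined is a configuration error.
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"depends_on references unknown tasks: {sorted(unknown)}")
    # TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(job_settings))
```

Reading the dependency edges this way before editing a workflow shows which tasks a change or a failure can actually reach.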
Repair versus rerun instinct
The exam often makes the wrong answer look simple: rerun everything. The better answer usually depends on scope:
if one task failed, investigate and repair that scope first
if upstream state changed or dependencies are invalid, a broader rerun may be justified
if the question emphasizes managed operations for routine jobs, serverless is often the cleaner compute lane
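The scope rule in the first two bullets can be sketched as a small function. This is an illustration under stated assumptions, not how Databricks computes repair scope internally: the task dictionary shape, the `"SUCCESS"`/`"FAILED"`/`"SKIPPED"` labels, and the `upstream_state_changed` flag are all hypothetical simplifications.

```python
def repair_scope(tasks, upstream_state_changed=False):
    """Return the set of task keys that should be rerun.

    tasks maps task_key -> {"result": str, "depends_on": [task_key, ...]}.
    This input shape is hypothetical, chosen to keep the example small.
    """
    # If upstream state changed, earlier successes can no longer be trusted,
    # so the broader rerun is justified.
    if upstream_state_changed:
        return set(tasks)

    # Start from every task that did not succeed (failed or skipped).
    scope = {key for key, task in tasks.items() if task["result"] != "SUCCESS"}

    # Pull in downstream tasks: anything that depends on a task in scope.
    changed = True
    while changed:
        changed = False
        for key, task in tasks.items():
            if key not in scope and scope.intersection(task["depends_on"]):
                scope.add(key)
                changed = True
    return scope


run_tasks = {
    "ingest": {"result": "SUCCESS", "depends_on": []},
    "clean": {"result": "FAILED", "depends_on": ["ingest"]},
    "publish": {"result": "SKIPPED", "depends_on": ["clean"]},
    "audit": {"result": "SUCCESS", "depends_on": ["ingest"]},
}

print(sorted(repair_scope(run_tasks)))
print(sorted(repair_scope(run_tasks, upstream_state_changed=True)))
```

In the default case only `clean` and `publish` are selected, so the successful `ingest` and `audit` work is left alone; flipping the flag widens the scope to all four tasks.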
Common traps
treating every failure as a full restart problem
forgetting that a job can be scheduled and productionized even if it started life as a notebook
choosing interactive compute because that is where the logic was authored
Harder scenario question
A scheduled workflow has five tasks. Only one downstream task failed because its source table was briefly unavailable, and the rest of the run state is still valid. Which instinct is strongest first?
A. Delete the entire workflow and recreate it
B. Inspect the failed task and choose the smallest repair or rerun scope that restores the run safely
C. Replace the workflow with a dashboard refresh
D. Grant more permissions to every user
Correct answer: B. This section rewards targeted recovery based on actual task state and dependency scope.
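As a rough illustration of answer B, the sketch below turns a run summary for this scenario into a targeted repair request body. It assumes the run summary resembles a Jobs API `runs/get` response (`tasks[].task_key`, `tasks[].state.result_state`) and that the repair request accepts `run_id` and `rerun_tasks`; the run ID and task keys are hypothetical, and no API call is made.

```python
import json

# Hypothetical summary of the five-task run: only "load_gold" failed.
run = {
    "run_id": 1234,
    "tasks": [
        {"task_key": "ingest_orders", "state": {"result_state": "SUCCESS"}},
        {"task_key": "ingest_customers", "state": {"result_state": "SUCCESS"}},
        {"task_key": "clean_orders", "state": {"result_state": "SUCCESS"}},
        {"task_key": "join_silver", "state": {"result_state": "SUCCESS"}},
        {"task_key": "load_gold", "state": {"result_state": "FAILED"}},
    ],
}


def build_repair_request(run):
    """Build a request body that reruns only the failed tasks of one run."""
    failed = [
        task["task_key"]
        for task in run["tasks"]
        if task["state"].get("result_state") == "FAILED"
    ]
    if not failed:
        raise ValueError("no failed tasks; nothing to repair")
    # Field names are assumed to match the Jobs API repair request.
    return {"run_id": run["run_id"], "rerun_tasks": failed}


print(json.dumps(build_repair_request(run), indent=2))
```

The four successful tasks never appear in the request, which is the point of the scenario: the valid run state is preserved and only the failed scope is replayed.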
Decision order that usually wins
Decide whether the question is about scheduling, dependency control, repair scope, or compute management.
Inspect task and run evidence before rerunning anything.
Repair the smallest failed scope that restores correctness safely.
Prefer serverless or job-oriented execution when the question stem rewards hands-off operations.
Keep authoring history separate from production-run design.