Use this for last-mile review. Keep it open while drilling mixed questions. DE-ASSOC is usually easier when you classify the stem in this order:
- Platform lane: workspace, compute, serverless, SQL, notebook, or Unity Catalog?
- Pipeline lane: ingestion, transformation, Lakeflow logic, workflow scheduling, or job recovery?
- Table/governance lane: Delta behavior, managed vs external storage, permissions, sharing, or lineage?
- Evidence lane: logs, Spark UI, failed task pattern, skew, tiny files, or bad join logic?
DE-ASSOC section map in one screen

| Official section | Best cheat-sheet focus |
| --- | --- |
| 1. Databricks Intelligence Platform | platform fit, compute choices, serverless and SQL warehouse cues |
| 2. Development and Ingestion | notebooks, Databricks Connect, Auto Loader, checkpoint thinking |
| 3. Data Processing and Transformations | medallion purpose, Delta rules, DDL/DML, DataFrame and SQL patterns |
| 4. Productionizing Data Pipelines | Asset Bundles, workflows, repair and rerun, serverless jobs, Spark UI |
| 5. Data Governance and Quality | managed vs external tables, Unity Catalog, lineage, Delta Sharing, federation |

flowchart TD
subgraph Platform["Platform Lane"]
WS["Workspace + Notebooks"] --> Compute["Interactive, Job, or SQL Compute"]
end
subgraph Pipeline["Pipeline Lane"]
Auto["Auto Loader / Ingest"] --> Bronze["Bronze"] --> Silver["Silver"] --> Gold["Gold"]
end
subgraph Production["Production + Governance"]
Gold --> Workflows["Lakeflow / Workflows / Jobs"]
Workflows --> Observe["Spark UI, Logs, Repair"]
UC["Unity Catalog + Lineage + Permissions"]
end
Compute --> Auto
UC -. governs .-> Bronze
UC -. governs .-> Silver
UC -. governs .-> Gold
DE-ASSOC answer sequence
Use this when the stem mixes workspace, compute, ingestion, governance, and production behavior.
flowchart TD
S["Scenario"] --> P["Classify the lane"]
P --> W["Workspace, compute, SQL, pipeline, or governance?"]
W --> D["Choose the right Databricks feature"]
D --> G["Check Unity Catalog, table type, and permissions"]
G --> O["Verify logs, Spark UI, lineage, or run recovery"]

| If the question is mainly about… | Strongest first lane |
| --- | --- |
| interactive exploration and ad hoc transformation work | notebook on the right compute |
| local IDE-driven development against Databricks | Databricks Connect |
| incremental file discovery and append-heavy ingestion | Auto Loader |
| declarative ETL pipeline structure | Lakeflow Declarative Pipelines |
| scheduled production execution and repair | Databricks Workflows |
| SQL-serving, dashboards, or analyst-facing queries | SQL warehouse |
| governance boundary, permissions, lineage, or sharing | Unity Catalog |

Compute and workload fit

| Workload signal | Interactive cluster | Job compute / serverless job | SQL warehouse |
| --- | --- | --- | --- |
| notebook exploration or development loop | strongest fit | weak | weak |
| scheduled batch pipeline | possible but less disciplined | strongest fit | weak |
| analyst SQL and BI path | weak | weak | strongest fit |
| exam trap | using interactive compute as permanent production runtime | forgetting repair/rerun and scheduling behavior | treating it like general ETL compute |

Compute traps

| Trap | Better reading |
| --- | --- |
| “it runs in a notebook, so it belongs on an interactive cluster forever” | separate development workflow from scheduled production execution |
| “serverless means every workload should move there” | first classify whether the question is about SQL serving, notebook work, or job execution |
| “workspace” and “compute” blur together | workspace is the operating environment; compute is the execution lane |

Ingestion and development picker

| Requirement | Strongest first lane | Why |
| --- | --- | --- |
| discover new files incrementally with less manual listing logic | Auto Loader | ingestion-first tool with checkpoint/state thinking |
| move local dev workflow toward Databricks execution | Databricks Connect | local IDE development against platform runtime |
| one-off file load from stage or source into a table | direct load pattern | simpler than inventing a streaming path |
| understand why an ingest step failed | logs, run details, and recent source/schema changes | evidence before redesign |

Auto Loader cues

| Cue | Fast recall |
| --- | --- |
| repeated file arrival over time | Auto Loader lane |
| checkpoint thinking | resume incremental processing safely |
| schema drift concern | classify whether schema should be enforced, rescued, or intentionally evolved |
| common trap | treating Auto Loader like a generic transformation framework instead of an ingestion lane |

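
To make "checkpoint thinking" concrete, here is a minimal pure-Python sketch of incremental file discovery with persisted state. It is a toy model, not Auto Loader itself: the `discover_new_files` helper, the JSON checkpoint file, and the `landing` directory are all hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def discover_new_files(source_dir: Path, checkpoint: Path) -> list[str]:
    """Return file names not seen in earlier runs and record them as seen."""
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    current = sorted(p.name for p in source_dir.iterdir() if p.is_file())
    new_files = [name for name in current if name not in seen]
    # This toy records state immediately; a real pipeline would commit it
    # only after the downstream write succeeds.
    checkpoint.write_text(json.dumps(sorted(seen | set(new_files))))
    return new_files


def demo() -> tuple[list[str], list[str], list[str]]:
    """Run three discovery passes against a temporary landing directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "landing"
        source.mkdir()
        checkpoint = root / "checkpoint.json"  # kept outside the landing directory
        (source / "a.json").write_text("{}")
        first = discover_new_files(source, checkpoint)   # sees a.json
        second = discover_new_files(source, checkpoint)  # nothing new
        (source / "b.json").write_text("{}")
        third = discover_new_files(source, checkpoint)   # sees only b.json
        return first, second, third
```

The point to carry into the exam is that the state lives outside the source data, so a restarted run resumes from where the last one stopped instead of relisting and reprocessing everything.
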

| If the question is really about… | Strongest first lane |
| --- | --- |
| ACID table behavior on the lake | Delta table |
| upsert or change-merge logic | MERGE |
| historical inspection or rollback reasoning | Delta history or time travel |
| incompatible write protection | schema enforcement |
| intentionally adding columns | schema evolution |
| transformation layer purpose | Bronze vs Silver vs Gold choice |

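
The upsert row can be pictured with a small pure-Python sketch. It only mimics the matched/not-matched idea behind `MERGE`; the `merge_upsert` function, the dict-based "table", and the decision to reject duplicated source keys up front are assumptions made for illustration, not Delta's implementation.

```python
def merge_upsert(target: dict, source_rows: list[dict], key: str) -> dict:
    """Toy MERGE: update rows whose key matches, insert rows whose key does not."""
    keys = [row[key] for row in source_rows]
    if len(keys) != len(set(keys)):
        # A duplicated source key makes the outcome ambiguous, so reject it here.
        raise ValueError("source has duplicate keys; deduplicate before merging")
    merged = dict(target)  # leave the caller's table untouched
    for row in source_rows:
        merged[row[key]] = row  # matched -> update, not matched -> insert
    return merged


target = {1: {"id": 1, "status": "old"}}
updates = [{"id": 1, "status": "new"}, {"id": 2, "status": "created"}]
result = merge_upsert(target, updates, key="id")
```

After the call, `result` holds the updated row for key 1 and a new row for key 2. The duplicate-key guard mirrors the troubleshooting habit of checking source uniqueness and the match condition before blaming the table.
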
Bronze / Silver / Gold

| Layer | Main purpose | Common exam reading |
| --- | --- | --- |
| Bronze | raw ingest, append-heavy, close to source | keep source fidelity and land data safely |
| Silver | cleaned, validated, joined, shaped | enforce quality and prepare reusable data |
| Gold | business-ready serving layer | curated output for BI, reporting, or stable consumption |

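
As a rough illustration of the three layer purposes, the sketch below pushes a handful of made-up order records through Bronze, Silver, and Gold steps in plain Python. The records, the `to_silver` and `to_gold` helpers, and the choice to drop invalid rows are all hypothetical; a real pipeline would use Spark tables and would more likely quarantine bad rows.

```python
from collections import defaultdict

# Bronze: raw rows as landed. Strings, a duplicate, and a bad value are all kept.
bronze = [
    {"order_id": "1", "region": "eu", "amount": "10.5"},
    {"order_id": "1", "region": "eu", "amount": "10.5"},
    {"order_id": "2", "region": "us", "amount": "oops"},
    {"order_id": "3", "region": "us", "amount": "4.5"},
]


def to_silver(rows):
    """Silver: cast types, skip unparseable amounts, drop repeated order_ids."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # invalid amount: excluded from Silver in this toy
        if row["order_id"] in seen:
            continue  # duplicate of an order already kept
        seen.add(row["order_id"])
        clean.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return clean


def to_gold(rows):
    """Gold: business-ready aggregate, here total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze keeps everything, Silver is where quality rules apply, and Gold is shaped for consumption. Blurring those purposes is the usual way answers go wrong.
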
High-confusion Delta pairs

| Pair | Keep this distinction clear |
| --- | --- |
| schema enforcement vs schema evolution | reject incompatible write versus intentionally allow structure change |
| managed vs external table | Databricks-managed storage location versus externally controlled storage path |
| batch transformation logic vs streaming or incremental ingest | processing lane versus arrival/discovery lane |
| medallion layer choice vs Unity Catalog object boundary | data refinement stage versus governance namespace |

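
The enforcement-versus-evolution pair is easier to remember with a toy write path. The sketch below is a simplified model in plain Python: `write_rows` and its `merge_schema` flag are hypothetical names (the flag is loosely modeled on the idea of opting in to added columns), and the real Delta rules for type changes are more nuanced than this.

```python
def write_rows(schema: dict, rows: list[dict], merge_schema: bool = False) -> dict:
    """Return the table schema after the write, or raise on an incompatible write."""
    new_schema = dict(schema)
    for row in rows:
        for column, value in row.items():
            if column not in new_schema:
                if not merge_schema:
                    # Enforcement: an unknown column is rejected by default.
                    raise ValueError(f"unexpected column {column!r}")
                # Evolution: the caller opted in, so the column is added.
                new_schema[column] = type(value)
            elif not isinstance(value, new_schema[column]):
                # In this toy, a wrong type fails even when evolution is enabled.
                raise TypeError(
                    f"column {column!r} expects {new_schema[column].__name__}"
                )
    return new_schema


table_schema = {"id": int}
evolved = write_rows(table_schema, [{"id": 1, "name": "Ann"}], merge_schema=True)
```

Without `merge_schema=True`, the same write raises. That is the distinction the table is drawing: enforcement is the default protection, evolution is a deliberate choice.
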
SQL and DataFrame quick rules

| Question pattern | Strongest reading |
| --- | --- |
| “keep all left-side rows” | left join |
| “find missing matches” | anti join or left-side missing-match logic |
| “top or latest row within each entity” | window function such as ROW_NUMBER() |
| “too much shuffle after join or aggregation” | wide transformation, possible skew, repartitioning or better key design |
| “slow query with excessive small reads” | file layout, compaction, pruning, and data skipping before brute-force scaling |

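
The first three patterns can be checked against a tiny pure-Python model. This is only a sketch of the semantics: the `customers` and `orders` data and the three helper functions are invented for illustration, and in practice the same logic would be written as a SQL or DataFrame left join, a left anti join, and a `ROW_NUMBER()` window filtered to the first row per partition.

```python
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]
orders = [
    {"customer_id": 1, "ts": 1, "amount": 5},
    {"customer_id": 1, "ts": 3, "amount": 7},
    {"customer_id": 2, "ts": 2, "amount": 9},
]


def left_join(left, right, left_key, right_key):
    """Keep every left row; pair it with each match, or with None if there is none."""
    out = []
    for left_row in left:
        matches = [r for r in right if r[right_key] == left_row[left_key]]
        for right_row in matches or [None]:
            out.append((left_row, right_row))
    return out


def anti_join(left, right, left_key, right_key):
    """Keep only left rows that have no match on the right."""
    right_keys = {r[right_key] for r in right}
    return [row for row in left if row[left_key] not in right_keys]


def latest_per_key(rows, key, order_by):
    """Keep the row with the highest order_by value within each key."""
    latest = {}
    for row in rows:
        best = latest.get(row[key])
        if best is None or row[order_by] > best[order_by]:
            latest[row[key]] = row
    return list(latest.values())


joined = left_join(customers, orders, "id", "customer_id")
no_orders = anti_join(customers, orders, "id", "customer_id")
latest_orders = latest_per_key(orders, "customer_id", "ts")
```

Here the customer with no orders survives the left join paired with `None`, is the only row returned by the anti join, and never appears in the latest-per-customer result.
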
Production, jobs, and repair cues

| Requirement | Strongest first lane |
| --- | --- |
| packaged deployable project structure | Databricks Asset Bundles |
| scheduled dependency-aware execution | Workflows |
| rerun only the failed work rather than everything | repair / rerun logic |
| performance evidence instead of guesswork | Spark UI and run diagnostics |
| smaller operational burden for scheduled jobs | serverless jobs when the stem points there |

Workflow traps

| Trap | Better reading |
| --- | --- |
| rerun the whole pipeline every time | repair the failed path when the question is about safe recovery |
| notebook success means production readiness | separate interactive proof from packaged, scheduled, observable workflow behavior |
| “optimize” with no evidence | inspect Spark UI, task skew, shuffle pattern, and recent code changes first |

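
The "repair the failed path" idea can be sketched as a small dependency walk. This is a toy model under stated assumptions: `tasks_to_repair`, the task names, and the status strings are hypothetical, and the rule used here (rerun anything that did not succeed, plus anything downstream of it) is a simplification of how a real repair run chooses tasks.

```python
def tasks_to_repair(depends_on: dict[str, list[str]], status: dict[str, str]) -> list[str]:
    """Pick tasks that did not succeed, plus every task downstream of them."""
    rerun = {task for task, state in status.items() if state != "success"}
    changed = True
    while changed:  # keep expanding until no new downstream task is added
        changed = False
        for task, parents in depends_on.items():
            if task not in rerun and any(parent in rerun for parent in parents):
                rerun.add(task)
                changed = True
    return sorted(rerun)


depends_on = {
    "ingest": [],
    "clean": ["ingest"],
    "report": ["clean"],
    "audit": ["ingest"],
}
status = {"ingest": "success", "clean": "failed", "report": "skipped", "audit": "success"}
repair_set = tasks_to_repair(depends_on, status)
```

In this example only `clean` and `report` are selected. `ingest` and `audit` already succeeded and do not depend on the failed task, so rerunning them would be wasted work.
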
Unity Catalog, sharing, and governance

| If the question is mainly about… | Strongest first lane |
| --- | --- |
| catalogs, schemas, tables, and privilege boundaries | Unity Catalog object model |
| who can access what | permissions and role boundary |
| where data lives and who manages the path | managed vs external table choice |
| auditability and downstream visibility | lineage and audit logs |
| sharing data to others without copying every object manually | Delta Sharing |
| querying external systems through a governed connection | federation |

Governance-boundary table

| Item | What it really answers | Do not confuse it with |
| --- | --- | --- |
| catalog | high-level namespace and governance boundary | a single physical data file path |
| schema | grouping inside a catalog | a medallion layer by itself |
| managed table | Databricks-managed storage lifecycle | external table storage ownership |
| lineage | upstream/downstream dependency evidence | permissions |
| sharing | controlled exposure to consumers | cloning or duplicating data pipelines |

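
The catalog and schema rows map onto the three-level `catalog.schema.table` name, and a read needs a privilege at each level. The sketch below is a deliberately simplified model: `missing_privileges` and the grant tuples are hypothetical, and it ignores ownership, privilege inheritance, and group membership, all of which matter in a real Unity Catalog setup.

```python
def missing_privileges(grants: set, table: str) -> list:
    """List the (privilege, securable) pairs still needed to read a table."""
    catalog, schema, _ = table.split(".")  # expects catalog.schema.table
    required = {
        ("USE CATALOG", catalog),
        ("USE SCHEMA", f"{catalog}.{schema}"),
        ("SELECT", table),
    }
    return sorted(required - grants)


def can_select(grants: set, table: str) -> bool:
    """True when nothing is missing at the catalog, schema, or table level."""
    return not missing_privileges(grants, table)


analyst_grants = {
    ("USE CATALOG", "main"),
    ("SELECT", "main.sales.orders"),
}
gap = missing_privileges(analyst_grants, "main.sales.orders")
```

Here the analyst holds `SELECT` on the table but still cannot read it, because the schema-level privilege is missing. That is the kind of boundary to check first when a permission denial looks unexpected.
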
Troubleshooting first look

| Symptom | Inspect first |
| --- | --- |
| duplicate records after upsert | MERGE condition and source uniqueness |
| write fails on mismatch | schema enforcement versus intended evolution |
| slow transformation after join or aggregate | shuffle, skew, partitioning, and Spark UI evidence |
| job failed after partial success | workflow run details, repair path, and failed task boundary |
| unexpected permission denial | Unity Catalog object boundary and granted privileges |
| too many tiny files | write pattern, compaction strategy, and table layout |

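
For the skew row, "evidence first" can be as simple as counting rows per join key before touching the cluster size. The sketch below is a hypothetical helper in plain Python: `skew_report` and its `threshold` of three times the mean are assumptions for illustration, not a Databricks or Spark rule.

```python
from collections import Counter


def skew_report(keys, threshold: float = 3.0) -> dict:
    """Compare the busiest key against the mean rows per key and flag hot keys."""
    counts = Counter(keys)
    if not counts:
        return {"max_over_mean": 0.0, "hot_keys": {}}
    mean = sum(counts.values()) / len(counts)
    hot = {key: n for key, n in counts.items() if n > threshold * mean}
    return {"max_over_mean": max(counts.values()) / mean, "hot_keys": hot}


# 96 rows share one key while four other keys have a single row each.
join_keys = ["a"] * 96 + ["b", "c", "d", "e"]
report = skew_report(join_keys)
```

A single key carrying most of the rows, as `"a"` does here, is the pattern that tends to show up in the Spark UI as one task running far longer than its siblings.
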
Last 15-minute review

| Recheck this | Because the exam often hides the miss here |
| --- | --- |
| development workflow vs production workflow | notebook comfort is not the same as job discipline |
| Auto Loader vs Lakeflow vs Workflows | ingestion, declarative pipeline logic, and scheduling are different lanes |
| Bronze / Silver / Gold purpose | many answers fail because the layer purpose is blurred |
| managed vs external tables | governance and storage ownership matter |
| Delta rules such as MERGE, time travel, and schema behavior | these are high-yield feature distinctions |

What strong DE-ASSOC answers usually do
- classify whether the question is about platform, pipeline, governance, or runtime evidence
- choose the more repeatable and observable production behavior over the more manual notebook habit
- separate ingestion, transformation, and scheduling instead of treating them as one tool choice
- keep Unity Catalog, table type, lineage, and sharing boundaries precise