Databricks DE-ASSOC glossary of ingestion, pipelines, governance, and platform terms.
Use this glossary when Spark and Delta terms start to blur together. Keep it beside the cheat sheet and resources, not as a substitute for them.

| Term | Short meaning | Why it matters on DE-ASSOC |
|---|---|---|
| Action | Spark operation that triggers execution | Core execution-behavior concept |
| Transformation | Spark operation that builds a plan without executing immediately | Commonly confused with actions |
| Shuffle | Data redistribution across partitions, often a performance cost | High-value performance and job-behavior term |
| Delta table | Table backed by Delta Lake with ACID transactions and schema handling | Central storage model for the exam |
| Schema enforcement | Write protection that rejects incompatible data | Core Delta safety concept |
| Schema evolution | Intentional table-schema update during write | Common distractor against enforcement |
| MERGE | Upsert-style operation that combines matched updates and new inserts | High-yield operational SQL concept |
| Time travel | Ability to query an earlier Delta table version | Distinct Delta capability |
| Bronze | Raw ingestion layer in a common lakehouse pattern | Medallion architecture term |
| Silver | Cleaned and conformed layer for operational use | Medallion architecture term |
| Gold | Curated business-ready layer for downstream consumption | Medallion architecture term |
| Idempotent | Safe to run repeatedly: a rerun leaves the same result instead of adding duplicates | Core pipeline-safety concept |
| Checkpoint | Saved processing state used to resume incremental work safely | Important in ingestion and workflow behavior |
| Auto Loader | Databricks ingestion feature for incremental file discovery | High-value ingestion feature |
| Unity Catalog | Databricks governance layer for catalogs, schemas, tables, permissions, and lineage | Central governance model |
| External table | Table whose data lives at a storage path outside Databricks-managed storage; dropping the table removes the metadata, not the files | Commonly contrasted with managed tables |
| Managed table | Table whose data files and lifecycle are both managed by the platform; dropping the table also removes the data | Governance and lifecycle distinction |
| Serverless job | Databricks-managed execution path that reduces compute management overhead | Operational compute-fit concept |
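
The Action and Transformation rows are the ones most often mixed up, so a toy model may help. The sketch below is plain Python, not Spark: `LazyPlan` is a hypothetical stand-in for a DataFrame, and its `filter`, `select`, and `collect` methods only borrow the names of the real operations. It shows transformations recording steps without running them, and an action triggering the whole plan.

```python
# Toy model of lazy evaluation. This is NOT Spark; LazyPlan is a hypothetical class.
class LazyPlan:
    """Records transformation steps and runs them only when an action is called."""

    def __init__(self, rows, steps=()):
        self.rows = rows
        self.steps = tuple(steps)

    # Transformations: return a new plan and touch no data.
    def filter(self, predicate):
        return LazyPlan(self.rows, self.steps + (("filter", predicate),))

    def select(self, func):
        return LazyPlan(self.rows, self.steps + (("select", func),))

    # Action: executes every recorded step in order.
    def collect(self):
        result = list(self.rows)
        for kind, fn in self.steps:
            if kind == "filter":
                result = [row for row in result if fn(row)]
            else:
                result = [fn(row) for row in result]
        return result


calls = []  # records every time the predicate actually runs


def is_even(n):
    calls.append(n)
    return n % 2 == 0


plan = LazyPlan(range(6)).filter(is_even).select(lambda n: n * 10)
calls_before_action = len(calls)  # nothing has executed yet
result = plan.collect()           # the action triggers execution
calls_after_action = len(calls)   # the predicate ran once per input row
```

Building `plan` never calls `is_even`; only `collect()` does. That is the same reason a chain of Spark transformations returns immediately while the first action starts a job.
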

| Pair | Keep this distinction clear |
|---|---|
| transformation vs action | plan building versus execution trigger |
| schema enforcement vs schema evolution | reject mismatch versus intentionally update schema |
| append vs overwrite | add new data versus replace existing target content |
| Delta table vs plain files | transactional table behavior versus raw file behavior |
| bronze vs silver | raw near-source ingestion versus cleaned and conformed data |
| managed vs external table | platform manages both data and lifecycle versus data stays at an external path |
| rerun vs repair | run the work again versus recover or rerun failed parts intentionally |
| shuffle cost vs simple filter/projection | wide data movement versus lighter local transformation behavior |
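
For the schema enforcement versus schema evolution pair, a small model can make the difference concrete. This is a plain-Python sketch, not Delta Lake: `ToyTable`, `SchemaMismatch`, and the `merge_schema` flag are hypothetical names chosen for illustration (the flag is loosely inspired by Delta's `mergeSchema` write option). By default the write is rejected as a whole; with the flag set, the table schema is updated on purpose.

```python
# Toy model of schema enforcement vs schema evolution. NOT Delta Lake itself.
class SchemaMismatch(Exception):
    """Raised when a write carries columns the table does not have."""


class ToyTable:
    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def append(self, new_rows, merge_schema=False):
        # Columns present in the incoming rows but missing from the table.
        extra = sorted({col for row in new_rows for col in row} - set(self.columns))
        if extra and not merge_schema:
            # Enforcement: reject the whole write before changing anything.
            raise SchemaMismatch(f"unexpected columns: {extra}")
        # Evolution: the caller opted in, so the schema grows.
        self.columns.extend(extra)
        self.rows.extend(new_rows)

    def read(self):
        # Older rows get None for columns added later.
        return [{col: row.get(col) for col in self.columns} for row in self.rows]


table = ToyTable(["id", "name"])
table.append([{"id": 1, "name": "a"}])

wider_batch = [{"id": 2, "name": "b", "email": "b@example.com"}]
try:
    table.append(wider_batch)
    rejected = False
except SchemaMismatch:
    rejected = True  # enforcement blocked the write

rows_after_rejection = len(table.rows)
table.append(wider_batch, merge_schema=True)  # evolution, chosen explicitly
```

The point to carry into exam questions: enforcement is the default protective behavior, and evolution happens only when the writer asks for it.
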

| Cluster | Fast separation |
|---|---|
| transformation / action / job execution | build the plan, trigger execution, or observe the resulting run |
| append / overwrite / merge | add rows, replace content, or upsert by matching logic |
| bronze / silver / gold | raw ingest, cleaned operational layer, or curated business layer |
| managed table / external table / files in storage | governed table lifecycle, external data path, or raw files without table semantics |
| checkpoint / rerun / repair | saved state, run again, or recover failed work intentionally |
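
The append / overwrite / merge cluster is easiest to separate by asking what a rerun does. The following is a minimal plain-Python sketch under stated assumptions: tables are lists of dicts, `id` is an assumed match key, and the three functions are hypothetical helpers, not Delta or Spark APIs.

```python
# Toy write modes over lists of dicts. Hypothetical helpers, not Delta APIs.
def append(target, batch):
    """Add the new rows after the existing ones."""
    return target + batch


def overwrite(target, batch):
    """Replace the existing content with the batch."""
    return list(batch)


def merge(target, batch, key="id"):
    """Upsert: update rows whose key matches, insert rows whose key is new."""
    by_key = {row[key]: row for row in target}
    for row in batch:
        by_key[row[key]] = row  # matched keys are updated, new keys are inserted
    return list(by_key.values())


target = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
batch = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

appended_twice = append(append(target, batch), batch)  # rerun duplicates rows
merged_once = merge(target, batch)
merged_twice = merge(merged_once, batch)               # rerun changes nothing
overwritten = overwrite(target, batch)
```

In this model, rerunning the append doubles the batch rows, while rerunning the merge leaves the table unchanged, which is the idempotent behavior the glossary describes.
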

| Cluster | What it usually signals on the exam |
|---|---|
| transformations / actions / shuffles | Spark execution and performance behavior |
| append / overwrite / merge / schema changes | Delta write-safety questions |
| bronze / silver / gold | medallion architecture and pipeline-shape questions |
| workflow / rerun / repair / serverless jobs | production operations questions |
| Unity Catalog / managed tables / external tables | governance and storage-boundary questions |

| Topic family | Best page to revisit |
|---|---|
| Spark execution and Delta quick rules | Cheat Sheet |
| official Databricks facts and docs | Resources |
| pacing and study order | Study Plan |
| exam framing | Guide root |