Databricks DE-ASSOC Glossary: Ingestion, Delta, and Catalog Terms

Databricks DE-ASSOC glossary of ingestion, pipelines, governance, and platform terms.

Use this glossary when Spark and Delta terms start to blur together. Keep it beside the Cheat Sheet and Resources pages, not as a substitute for them.

High-yield terms

Term | Short meaning | Why it matters on DE-ASSOC
Action | Spark operation that triggers execution of the plan | Core execution-behavior concept
Transformation | Spark operation that adds a step to the plan without executing it immediately (lazy evaluation) | Commonly confused with actions
Shuffle | Redistribution of data across partitions, for example during joins or aggregations; often a major performance cost | High-value performance and job-behavior term
Delta table | Table backed by Delta Lake, with ACID transactions and schema enforcement | Central storage model for the exam
Schema enforcement | Write-time check that rejects data whose schema does not match the table | Core Delta safety concept
Schema evolution | Intentional, opted-in update of the table schema during a write | Common distractor against enforcement
MERGE | Upsert operation that updates matched rows and inserts unmatched ones (it can also delete rows) | High-yield operational SQL concept
Time travel | Ability to query an earlier version of a Delta table | Distinct Delta capability
Bronze | Raw ingestion layer in the medallion lakehouse pattern | Medallion architecture term
Silver | Cleaned and conformed layer for operational use | Medallion architecture term
Gold | Curated, business-ready layer for downstream consumption | Medallion architecture term
Idempotent | Safe to run repeatedly: reruns leave the same end state instead of adding duplicates | Core pipeline-safety concept
Checkpoint | Saved processing state that lets incremental work resume where it stopped | Important in ingestion and workflow behavior
Auto Loader | Databricks feature that incrementally discovers and ingests new files from cloud storage | High-value ingestion feature
Unity Catalog | Databricks governance layer for catalogs, schemas, tables, permissions, and lineage | Central governance model
External table | Table whose data files live at an external storage location you manage; dropping it removes only the metadata | Commonly contrasted with managed tables
Managed table | Table whose metadata and data files are both managed by the metastore; dropping it also removes the data | Governance and lifecycle distinction
Serverless job | Job that runs on Databricks-managed serverless compute, so you do not configure or manage clusters | Operational compute-fit concept
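The time travel entry is easiest to remember as "every write creates a numbered version you can read back". The sketch below is a plain-Python toy, not the Delta Lake API. `VersionedTable`, `commit`, and `read` are hypothetical names. It also stores full snapshots for clarity, whereas real Delta keeps a transaction log. In Databricks you would read an earlier version with SQL such as `SELECT * FROM my_table VERSION AS OF 2`, where `my_table` is a placeholder.

```python
import copy
from typing import Dict, List, Optional

Row = Dict[str, object]


class VersionedTable:
    """Toy model of Delta-style time travel: each commit produces a new table version.

    Simplification (assumption): real Delta Lake records changes in a transaction
    log and reconstructs snapshots from it; this sketch just keeps full copies.
    """

    def __init__(self) -> None:
        self._snapshots: List[List[Row]] = [[]]  # version 0: empty table

    @property
    def latest_version(self) -> int:
        return len(self._snapshots) - 1

    def commit(self, rows: List[Row], mode: str = "append") -> int:
        """Write rows as a new version and return that version number."""
        if mode == "append":
            snapshot = self._snapshots[-1] + rows
        elif mode == "overwrite":
            snapshot = list(rows)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Deep-copy so later changes to the caller's dicts cannot alter history.
        self._snapshots.append(copy.deepcopy(snapshot))
        return self.latest_version

    def read(self, version: Optional[int] = None) -> List[Row]:
        """Read the latest version, or an earlier one (similar in spirit to VERSION AS OF)."""
        v = self.latest_version if version is None else version
        if not 0 <= v <= self.latest_version:
            raise ValueError(f"version {v} does not exist")
        return copy.deepcopy(self._snapshots[v])


table = VersionedTable()
table.commit([{"id": 1}])                    # version 1
table.commit([{"id": 2}])                    # version 2
table.commit([{"id": 9}], mode="overwrite")  # version 3 replaces the content
current = table.read()                       # latest state only
before_overwrite = table.read(2)             # history is still readable
```

The overwrite did not destroy version 2. That is the point the exam tends to probe: time travel reads history, while append and overwrite only decide what the *next* version contains.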

Commonly confused pairs

Pair | Keep this distinction clear
transformation vs action | building the plan versus triggering execution
schema enforcement vs schema evolution | reject a mismatched write versus intentionally update the schema
append vs overwrite | add new data versus replace the existing target content
Delta table vs plain files | transactional table behavior (ACID writes, enforced schema, versions) versus raw files with none of those guarantees
bronze vs silver | raw, near-source ingested data versus cleaned and conformed data
managed vs external table | platform manages both metadata and data files versus table metadata pointing at an external data path
rerun vs repair | run the whole job again versus rerun only the unsuccessful tasks and the tasks that depend on them
shuffle cost vs simple filter/projection | wide data movement across partitions versus narrow work done within each partition
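The schema enforcement vs schema evolution and append vs overwrite pairs above can be acted out with a small in-memory table. This is a hypothetical toy (`ToyTable` and its `merge_schema` flag are illustrative names, not Delta APIs). It checks only for unexpected columns, while real Delta enforcement also checks data types. In Spark, the real opt-in is the `mergeSchema` write option, for example `.option("mergeSchema", "true")`, together with `mode("append")` or `mode("overwrite")` on the DataFrame writer.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


class ToyTable:
    """Hypothetical in-memory table that mimics Delta-style write rules.

    Only unexpected columns are checked here; real Delta enforcement also
    checks data types and other schema details.
    """

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns: List[str] = list(columns)
        self.rows: List[Row] = []

    def write(self, rows: List[Row], mode: str = "append", merge_schema: bool = False) -> None:
        if mode not in ("append", "overwrite"):
            raise ValueError(f"unknown mode: {mode!r}")
        incoming = {col for row in rows for col in row}
        extra = sorted(incoming - set(self.columns))
        if extra and not merge_schema:
            # Schema enforcement: reject the whole write and leave the table unchanged.
            raise ValueError(f"schema mismatch, unexpected columns: {extra}")
        # Schema evolution (opt-in only): add the new columns to the table schema.
        self.columns.extend(extra)
        for row in self.rows:
            for col in extra:
                row.setdefault(col, None)  # existing rows get nulls for new columns
        new_rows = [{col: row.get(col) for col in self.columns} for row in rows]
        if mode == "overwrite":
            self.rows = new_rows        # replace existing content
        else:
            self.rows.extend(new_rows)  # add to existing content


t = ToyTable(["id", "name"])
t.write([{"id": 1, "name": "a"}])
t.write([{"id": 2, "name": "b"}])                      # append: table now has 2 rows
rejected = False
try:
    t.write([{"id": 3, "name": "c", "email": "c@x"}])  # no opt-in: enforcement rejects it
except ValueError:
    rejected = True
rows_after_reject = len(t.rows)                        # unchanged by the rejected write
t.write([{"id": 3, "name": "c", "email": "c@x"}], merge_schema=True)  # evolution
t.write([{"id": 9, "name": "z"}], mode="overwrite")   # replace existing content
```

The same row is rejected or accepted depending only on the explicit opt-in. That is why "enforcement" and "evolution" answers are distinct rather than interchangeable.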

If three terms blur together

Cluster | Fast separation
transformation / action / job execution | build the plan, trigger execution, or observe the resulting run
append / overwrite / merge | add rows, replace content, or upsert by a match condition
bronze / silver / gold | raw ingest, cleaned operational layer, or curated business layer
managed table / external table / files in storage | platform-governed table lifecycle, table over an external data path, or raw files without table semantics
checkpoint / rerun / repair | saved progress state, run everything again, or rerun only the failed parts
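The append / overwrite / merge cluster also explains why MERGE is the usual route to idempotent writes. `merge_upsert` below is a hypothetical plain-Python function that imitates the matched-update / not-matched-insert logic of a SQL `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED` statement. It is not Delta's implementation. It assumes the batch has unique keys.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_upsert(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Toy MERGE: update target rows whose key matches, insert the rest.

    Assumes source keys are unique; a real Delta MERGE can fail when several
    source rows match the same target row.
    """
    result = [dict(row) for row in target]  # never mutate the input
    position = {row[key]: i for i, row in enumerate(result)}
    for row in source:
        if row[key] in position:
            result[position[row[key]]].update(row)  # WHEN MATCHED THEN UPDATE
        else:
            position[row[key]] = len(result)
            result.append(dict(row))                # WHEN NOT MATCHED THEN INSERT
    return result


target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
batch = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]

once = merge_upsert(target, batch, "id")
twice = merge_upsert(once, batch, "id")  # replaying the same batch changes nothing
appended_twice = target + batch + batch  # naive append of the same batch twice
```

Replaying the batch is harmless here because matched rows are set to the source values rather than incremented. An update such as `qty = qty + source.qty` would not be idempotent even inside a MERGE.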

One-sentence memory hooks

  • If the question is about when Spark really starts work, think action.
  • If the question is about safe repeat execution, think idempotent writes (for example, MERGE by key) plus checkpointed state.
  • If the question is about historical Delta state, think time travel.
  • If the question is about governed object boundaries, think Unity Catalog.
  • If the question is about incremental file ingest, think Auto Loader before hand-rolled logic.
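The "safe repeat execution" and "incremental file ingest" hooks share one idea: remember what was already processed. The sketch below uses plain Python and assumes a local folder stands in for cloud storage and a JSON file stands in for the checkpoint. `ingest_new_files` and `run_demo` are hypothetical helpers. Auto Loader keeps its own record of discovered files under the checkpoint location you supply (the `checkpointLocation` option). This toy does not reproduce how Auto Loader stores that state.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def ingest_new_files(landing: Path, checkpoint: Path, sink: List[str]) -> List[str]:
    """Toy incremental ingest: load only files not yet recorded in the checkpoint.

    Hypothetical helper, not Auto Loader. Returns the file names processed in this run.
    """
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    new_files = sorted(p.name for p in landing.glob("*.csv") if p.name not in seen)
    for name in new_files:
        for line in (landing / name).read_text().splitlines():
            if line.strip():
                sink.append(line.strip())
        seen.add(name)
    # Record progress only after the batch reached the sink. A crash before this
    # line means the batch is re-read on the next run, which is why real systems
    # pair checkpoints with transactional or idempotent sinks.
    checkpoint.write_text(json.dumps(sorted(seen)))
    return new_files


def run_demo():
    """Run three ingest passes against a temporary landing directory."""
    with tempfile.TemporaryDirectory() as tmp:
        landing = Path(tmp) / "landing"
        landing.mkdir()
        checkpoint = Path(tmp) / "_checkpoint.json"
        sink: List[str] = []
        (landing / "a.csv").write_text("1\n2\n")
        first = ingest_new_files(landing, checkpoint, sink)  # picks up a.csv
        rerun = ingest_new_files(landing, checkpoint, sink)  # nothing new: no-op
        (landing / "b.csv").write_text("3\n")
        third = ingest_new_files(landing, checkpoint, sink)  # only b.csv
        return first, rerun, third, sink
```

The rerun is a no-op because the checkpoint, not the job schedule, decides what counts as new. That is the behavior to reach for before writing hand-rolled "list the folder and diff" logic.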

Operational clusters worth keeping straight

Cluster | What it usually signals on the exam
transformations / actions / shuffles | Spark execution and performance behavior
append / overwrite / merge / schema changes | Delta write-safety questions
bronze / silver / gold | medallion architecture and pipeline-shape questions
workflow / rerun / repair / serverless jobs | production operations questions
Unity Catalog / managed tables / external tables | governance and storage-boundary questions
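For the transformations / actions cluster, a toy plan object makes lazy evaluation visible. `LazyPlan` is a hypothetical class loosely modeled on RDD-style `filter`/`map`. It does not model Spark's optimizer or shuffles. In Spark, repeated actions likewise generally recompute the lineage unless the data is cached with `cache()` or `persist()`.

```python
from typing import Callable, Iterable, List, Tuple

Step = Tuple[str, Callable]


class LazyPlan:
    """Toy stand-in for a Spark RDD/DataFrame: transformations record steps, actions run them."""

    def __init__(self, source: Iterable[int], steps: Tuple[Step, ...] = ()) -> None:
        self._source = list(source)
        self._steps = steps

    # Transformations: return a new plan and do no work yet.
    def filter(self, predicate: Callable[[int], bool]) -> "LazyPlan":
        return LazyPlan(self._source, self._steps + (("filter", predicate),))

    def map(self, fn: Callable[[int], int]) -> "LazyPlan":
        return LazyPlan(self._source, self._steps + (("map", fn),))

    # Actions: execute every recorded step now.
    def collect(self) -> List[int]:
        rows = self._source
        for kind, fn in self._steps:
            rows = [r for r in rows if fn(r)] if kind == "filter" else [fn(r) for r in rows]
        return rows

    def count(self) -> int:
        return len(self.collect())


calls: List[int] = []


def double(x: int) -> int:
    calls.append(x)  # side effect shows when work actually happens
    return x * 2


plan = LazyPlan(range(5)).filter(lambda x: x % 2 == 0).map(double)
calls_before_action = len(calls)  # building the plan ran nothing
result = plan.collect()           # action: the plan executes now
total = plan.count()              # a second action re-runs the whole plan
```

Nothing runs until `collect()`, and `count()` repeats the work. Questions about "when does Spark actually start" and "why did this run twice" come back to that split.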

If the confusion is really about…

Topic family | Best page to revisit
Spark execution and Delta quick rules | Cheat Sheet
official Databricks facts and docs | Resources
pacing and study order | Study Plan
exam framing | Guide root
Revised on Sunday, May 10, 2026