Databricks DE-PRO Glossary: Sharing, Governance, and Federation Terms

Databricks DE-PRO glossary of ingestion, transformation, monitoring, sharing, and governance terms.

Use this glossary when Lakeflow, Delta, Unity Catalog, observability, and deployment terms start to blur together. Keep it alongside the cheat sheet and resources; it does not replace scenario practice.

High-yield terms

| Term | Short meaning | Why it matters on DE-PRO |
| --- | --- | --- |
| Lakeflow Declarative Pipelines | Databricks managed declarative pipeline layer for batch and streaming ETL | current pipeline terminology for a major production lane |
| Lakeflow Jobs | Databricks orchestration layer for scheduled or triggered runs | core repair, notification, and operational-control term |
| DLT | older Delta Live Tables name that still appears in legacy material | you need to map it to the current Lakeflow wording |
| Auto Loader | Databricks file-discovery ingestion mechanism | common source-ingestion decision term |
| Event log | pipeline lifecycle and quality event record | core declarative-pipeline monitoring term |
| System tables | Databricks telemetry tables for audit, cost, and workload visibility | core observability and governance signal source |
| Query profile | execution analysis view for a query or workload | important performance-diagnosis term |
| Deletion vector | Delta optimization feature that helps avoid expensive file rewrites in some update or delete patterns | common performance and maintenance tie-break term |
| Liquid clustering | flexible Delta layout strategy that improves pruning without rigid partition plans | common layout and optimization term |
| Row filter | policy that limits which records a user can see | key data-security boundary term |
| Column mask | policy that transforms or hides sensitive values | key sensitive-data control term |
| Delta Sharing | governed data sharing from Databricks to Databricks or external consumers | key sharing term |
| Lakehouse Federation | governed access to external systems through Databricks | key federation term |
| Asset Bundle | Databricks packaging and deployment structure for resources and code | core CI/CD and promotion term |
| Parameter override | run-time value change used when rerunning or repairing a job | key debugging and operational-control term |
| Repair run | targeted rerun of failed work rather than full reprocessing | low-blast-radius recovery term |
| Pseudonymization | replacing direct identifiers with alternative values while retaining useful structure | common compliance term |
| Anonymization | transforming data so individuals cannot reasonably be re-identified | stronger privacy term than masking alone |
| Medallion architecture | bronze, silver, and gold layering pattern for raw, refined, and serving-ready data | common modeling and pipeline-design term |
| Managed table | Databricks-managed Delta table with storage and governance handled through the platform | key maintenance and governance choice |
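
The pseudonymization and anonymization entries are easier to keep apart with a small example. The following is a minimal sketch in plain Python, not a Databricks API: the key, the field names, and the age-band rule are all hypothetical, and coarsening a single field like this does not by itself guarantee that re-identification is impossible.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would live in a secret store.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(email: str) -> str:
    """Replace a direct identifier with a keyed, repeatable token.

    The same input always yields the same token, so records stay linkable.
    """
    digest = hmac.new(SECRET_KEY, email.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(record: dict) -> dict:
    """Drop the identifier entirely and coarsen a quasi-identifier."""
    decade = (record["age"] // 10) * 10
    return {"age_band": f"{decade}-{decade + 9}", "region": record["region"]}


record = {"email": "Ada@example.com", "age": 37, "region": "EU"}

token_a = pseudonymize(record["email"])
token_b = pseudonymize("ada@example.com")  # same person, different casing
anon = anonymize(record)
```

Here `token_a` and `token_b` match, which is the "still-linkable" property of pseudonymization, while `anon` keeps no field that maps back to a single email.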

Commonly confused pairs

| Pair | Keep this distinction clear |
| --- | --- |
| Lakeflow Declarative Pipelines vs Lakeflow Jobs | pipeline logic versus orchestration and run control |
| event log vs system tables | pipeline-specific lifecycle detail versus broader telemetry and audit data |
| Delta Sharing vs Lakehouse Federation | governed sharing out of Databricks versus governed access into external systems |
| row filter vs column mask | hide rows versus transform or hide values |
| repair run vs retry | deliberate rerun after diagnosis versus automatic repeat attempt |
| liquid clustering vs partitioning | flexible layout strategy versus hard physical split |
| Asset Bundles vs Git folders | deployment package versus workspace source integration |
| pseudonymization vs anonymization | reversible or still-linkable transformation versus stronger de-identification target |
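
The row filter versus column mask distinction can be modeled in a few lines. This is a plain Python sketch of the idea only, not Unity Catalog syntax: the group names, the table contents, and the masking format are assumptions made for illustration.

```python
# Hypothetical table contents.
rows = [
    {"customer": "c1", "region": "EU", "ssn": "123-45-6789"},
    {"customer": "c2", "region": "US", "ssn": "987-65-4321"},
]


def row_filter(row: dict, user_groups: set) -> bool:
    """Row filter: decides whether the row is visible at all."""
    regional_group = f"{row['region'].lower()}_analysts"
    return "admins" in user_groups or regional_group in user_groups


def ssn_mask(value: str, user_groups: set) -> str:
    """Column mask: the row stays visible, but the value is transformed."""
    return value if "admins" in user_groups else "***-**-" + value[-4:]


def read_table(user_groups: set) -> list:
    """Apply the filter first, then the mask, to every surviving row."""
    return [
        {**row, "ssn": ssn_mask(row["ssn"], user_groups)}
        for row in rows
        if row_filter(row, user_groups)
    ]


eu_view = read_table({"eu_analysts"})
admin_view = read_table({"admins"})
```

In this model an EU analyst sees one row with a masked value, while an admin sees both rows unmasked: the filter changes how many rows come back, and the mask changes what a value looks like.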

If three terms blur together

| Cluster | Fast separation |
| --- | --- |
| checkpoint / watermark / trigger | recovery state, lateness boundary, and work cadence |
| event log / Spark UI / query profile | pipeline lifecycle, low-level execution, and workload bottleneck analysis |
| ACL / row filter / column mask | object access, row visibility, and value protection |
| Delta Sharing / Lakehouse Federation / Unity Catalog inheritance | data exchange, external querying, and permission flow inside Databricks |
| bundle target / parameter override / repair run | environment promotion, run-time change, and bounded rerun |
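
For the checkpoint / watermark / trigger cluster, a toy micro-batch loop shows where each idea lives. This is a simplified sketch in plain Python, not Structured Streaming code: `StreamState`, `run_trigger`, and the lateness bound are hypothetical, and the watermark is updated per event here for brevity, whereas streaming engines typically advance it between batches.

```python
from dataclasses import dataclass, field

ALLOWED_LATENESS = 10  # hypothetical lateness bound, in seconds


@dataclass
class StreamState:
    offset: int = 0          # checkpoint: how far into the source has been committed
    max_event_time: int = 0  # highest event time seen so far; drives the watermark
    accepted: list = field(default_factory=list)
    dropped: list = field(default_factory=list)


def run_trigger(source: list, state: StreamState, batch_size: int) -> StreamState:
    """One trigger firing: process the next micro-batch, then commit the offset."""
    batch = source[state.offset:state.offset + batch_size]
    for event_time, payload in batch:
        watermark = state.max_event_time - ALLOWED_LATENESS  # lateness boundary
        if event_time < watermark:
            state.dropped.append(payload)  # arrived too late to be counted
        else:
            state.accepted.append(payload)
            state.max_event_time = max(state.max_event_time, event_time)
    state.offset += len(batch)  # resuming later starts from this offset
    return state


# (event_time, payload) pairs; the fourth event arrives out of order.
source = [(100, "a"), (105, "b"), (120, "c"), (95, "late"), (118, "d")]

state = StreamState()
run_trigger(source, state, batch_size=3)  # first firing handles a, b, c
run_trigger(source, state, batch_size=3)  # second firing resumes at offset 3
```

The trigger is each call to `run_trigger`, the checkpoint is the saved `offset` that lets the second firing resume without reprocessing, and the watermark is the boundary that causes the out-of-order event to be dropped.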

One-sentence memory hooks

  • If the question is about operating a pipeline, separate logic, orchestration, and observability.
  • If the question is about data access, separate permissions, filters, masks, sharing, and federation.
  • If the question is about performance, separate layout, pruning, joins, and shuffle before touching compute size.
  • If the question is about compliance, separate masking, pseudonymization, anonymization, and retention.
Revised on Sunday, May 10, 2026