Databricks DE-PRO Cheat Sheet: Sharing, Governance, and Federation
April 13, 2026
Databricks DE-PRO cheat sheet for sharing, governance, federation, traps, and final review.
Use this for last-mile review. DE-PRO usually gets easier when you classify the stem first instead of trying to solve everything at once.
Fast lane picker
| If the question is mainly about… | Strongest first lane |
| --- | --- |
| project packaging, dependencies, tests, or deployment units | chapter 1 or chapter 9 |
| file discovery, message-bus input, or append-only vs streaming ingest | chapter 2 |
| joins, windows, quarantining, or quality rules | chapter 3 |
| data exchange across workspaces or external platforms | chapter 4 |
| what signal to inspect first | chapter 5 or chapter 9 |
| slow workloads, poor pruning, or bad join plans | chapter 6 |
| row visibility, masking, PII protection, or retention | chapter 7 |
| discoverability, metadata, or permission inheritance | chapter 8 |
| repair, parameter overrides, bundles, or CI/CD | chapter 9 |
| partitioning, table shape, medallion fit, or serving design | chapter 10 |
Production answer rules
| If you need to choose between… | Better DE-PRO instinct |
| --- | --- |
| fast once vs safe to rerun | safe to rerun |
| broad reprocessing vs bounded replay | bounded replay |
| manual notebook repair vs auditable job repair | auditable repair |
| bigger cluster vs measured bottleneck analysis | measured bottleneck analysis |
| vague permissions vs specific policy surface | specific policy surface |
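The "safe to rerun" instinct can be illustrated with a keyed upsert: applying the same batch twice leaves the target unchanged, whereas a blind append duplicates rows. The sketch below is plain Python, not Databricks code. The helpers `append_batch` and `upsert_batch` and the `order_id` key are hypothetical names chosen for illustration. On a Delta table the same effect is usually obtained with a keyed `MERGE INTO` rather than an in-memory dictionary.

```python
from typing import Dict, Iterable, List


def append_batch(target: List[dict], batch: Iterable[dict]) -> None:
    """Blind append: fast once, but a rerun duplicates every row."""
    target.extend(batch)


def upsert_batch(
    target: Dict[str, dict], batch: Iterable[dict], key: str = "order_id"
) -> None:
    """Keyed upsert: rerunning the same batch leaves the target unchanged."""
    for row in batch:
        target[row[key]] = row


# Hypothetical input batch used only for this illustration.
batch = [
    {"order_id": "a1", "amount": 10},
    {"order_id": "b2", "amount": 25},
]

appended: List[dict] = []
upserted: Dict[str, dict] = {}

# Simulate a job that runs, fails downstream, and is rerun with the same input.
for _ in range(2):
    append_batch(appended, batch)
    upsert_batch(upserted, batch)

print(len(appended))  # 4: the rerun duplicated both rows
print(len(upserted))  # 2: the rerun was a no-op
```

When a stem offers both options, the upsert-style answer is the one that survives a retry or a repair run without manual cleanup.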
DE-PRO answer sequence
Use this when the stem mixes packaging, quality, observability, governance, or recovery.
```mermaid
flowchart TD
    S["Scenario"] --> L["Find the main lane"]
    L --> P["Package, ingest, transform, govern, observe, or repair?"]
    P --> F["Choose the narrowest Databricks feature that fits"]
    F --> R["Check logs, event logs, query profile, or Spark UI"]
    R --> V["Verify rerun, recovery, or promotion behavior"]
```
Monitoring and debugging signal map
| Need | Better first signal |
| --- | --- |
| pipeline lifecycle, quality, and declarative run state | event log |
| query-level bottlenecks, joins, skew, or pruning | query profile |
| account or workspace cost, audit, and workload telemetry | system tables |
| low-level stage or task behavior | Spark UI |
| failed-run remediation path | Jobs UI, repair state, logs, and parameter overrides |
Performance triage table
| Symptom | Likely cause | Better first action |
| --- | --- | --- |
| one task runs much longer than peers | skew | inspect hot keys and shuffle distribution |
| scans read far too much data | weak pruning or bad layout | inspect clustering, partitioning, and filter selectivity |
| too many tiny files | write pattern or over-partitioning | compact and rethink layout, not just cluster size |
| repeated high-cost reprocessing | weak incremental or replay design | tighten boundaries and use targeted reprocessing |
| poor merge or update performance | table layout and file behavior | inspect clustering, pruning, and change pattern first |
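The first row of the triage table ("one task runs much longer than peers") can be turned into a small check. The sketch below is plain Python and flags tasks whose duration is far above the median of their stage. The function name `find_skewed_tasks`, the sample durations, and the 3x threshold are all assumptions for illustration. In practice the durations would be read off the stage view in the Spark UI.

```python
from statistics import median
from typing import List, Sequence


def find_skewed_tasks(durations_s: Sequence[float], ratio: float = 3.0) -> List[int]:
    """Return indexes of tasks whose duration exceeds `ratio` times the median."""
    if not durations_s:
        return []
    typical = median(durations_s)
    if typical <= 0:
        return []
    return [i for i, d in enumerate(durations_s) if d > ratio * typical]


# Hypothetical task durations (seconds) for one shuffle stage.
task_durations = [12.0, 11.5, 13.2, 12.8, 95.0, 12.1]

skewed = find_skewed_tasks(task_durations)
print(skewed)  # [4]: only the 95-second task stands out
```

A flagged task is a prompt to inspect hot keys and shuffle distribution, not a reason to pick a bigger cluster.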
Security, governance, and sharing boundaries
| If the question is about… | Keep this boundary clear |
| --- | --- |
| row filters | who can see which records |
| column masks | how sensitive values are transformed or hidden |
| ACLs or workspace permissions | who can access objects or actions |
| Delta Sharing | how live data is exposed to another Databricks deployment or external platform |
| Lakehouse Federation | querying external systems through governed access |
| Unity Catalog inheritance | how permissions flow from higher objects to lower objects |
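The row filter and column mask boundaries are easy to blur, so a toy comparison may help. The sketch below is plain Python that mimics the two behaviors on a list of dictionaries. It is not the Unity Catalog API, and the helper names, the `region` and `email` columns, and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def apply_row_filter(rows: List[Row], user_region: str) -> List[Row]:
    """Row filter: decide which records the caller may see at all."""
    return [r for r in rows if r["region"] == user_region]


def apply_column_mask(rows: List[Row], can_see_pii: bool) -> List[Row]:
    """Column mask: keep every row but transform the sensitive value."""
    if can_see_pii:
        return [dict(r) for r in rows]
    return [{**r, "email": "***"} for r in rows]


# Hypothetical table contents used only for this illustration.
rows = [
    {"id": "1", "region": "EU", "email": "ana@example.com"},
    {"id": "2", "region": "US", "email": "bo@example.com"},
]

filtered = apply_row_filter(rows, user_region="EU")
masked = apply_column_mask(rows, can_see_pii=False)

print(len(filtered))                 # 1: the US record is not visible
print([r["email"] for r in masked])  # ['***', '***']: all rows kept, values hidden
```

The row count changes only under the filter, and the values change only under the mask. That difference is usually what the stem is testing.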
High-confusion pairs
| Pair | Keep this distinction clear |
| --- | --- |
| Lakeflow Declarative Pipelines vs Lakeflow Jobs | declarative pipeline logic vs orchestration and run control |
| checkpoint vs watermark | recoverability state vs lateness boundary |
| event log vs system tables | pipeline lifecycle record vs broader platform telemetry |
| Delta Sharing vs Lakehouse Federation | governed data exchange vs governed access to external source systems |
| row filter vs column mask | hide rows vs transform or hide values |
| repair run vs retry | targeted rerun after diagnosis vs automatic repeat attempt |
| liquid clustering vs partitioning | flexible layout strategy vs hard physical split |
| Databricks Asset Bundles vs Git folders | deployment package and targets vs workspace source integration |
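The checkpoint vs watermark pair can be made concrete with a toy stream processor. In the plain-Python sketch below, the checkpoint is the last processed offset, so a restart does not reprocess input, and the watermark is the highest event time seen minus an allowed lateness, so very late events are dropped. This is a deliberately simplified model, not Structured Streaming itself: the `process` function, the state keys, and the 10-second lateness are assumptions, and the watermark here advances per event rather than between micro-batches.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # (offset, event_time_s)


def process(
    events: List[Event], state: Dict[str, int], allowed_lateness_s: int = 10
) -> List[Event]:
    """Process events past the checkpoint, dropping those behind the watermark."""
    accepted: List[Event] = []
    for offset, event_time in events:
        if offset <= state["checkpoint_offset"]:
            continue  # recoverability: already processed before the restart
        watermark = state["max_event_time"] - allowed_lateness_s
        if event_time >= watermark:
            accepted.append((offset, event_time))  # within the lateness boundary
        state["max_event_time"] = max(state["max_event_time"], event_time)
        state["checkpoint_offset"] = offset
    return accepted


# Hypothetical input: offset 3 arrives 16 seconds behind the newest event.
events = [(0, 100), (1, 105), (2, 120), (3, 104), (4, 118)]
state = {"checkpoint_offset": -1, "max_event_time": 0}

first_run = process(events, state)
replay = process(events, state)  # simulate a restart that re-reads the same input

print(first_run)  # [(0, 100), (1, 105), (2, 120), (4, 118)]
print(replay)     # []
```

The late event at offset 3 is dropped because of the watermark, while the empty replay is the checkpoint doing its job. The two mechanisms answer different questions.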
Last 15-minute recheck
Recheck this
Because the miss often hides here
package structure, dependencies, and target config