Databricks DE-PRO Cheat Sheet: Sharing, Governance, and Federation

April 13, 2026

Databricks DE-PRO cheat sheet for sharing, governance, federation, traps, and final review.

Use this sheet for last-mile review. DE-PRO questions usually get easier when you classify the question stem first instead of trying to solve everything at once.

Fast lane picker

If the question is mainly about…	Strongest first lane
project packaging, dependencies, tests, or deployment units	chapter 1 or chapter 9
file discovery, message-bus input, or append-only vs streaming ingest	chapter 2
joins, windows, quarantining, or quality rules	chapter 3
data exchange across workspaces or external platforms	chapter 4
what signal to inspect first	chapter 5 or chapter 9
slow workloads, poor pruning, or bad join plans	chapter 6
row visibility, masking, PII protection, or retention	chapter 7
discoverability, metadata, or permission inheritance	chapter 8
repair, parameter overrides, bundles, or CI/CD	chapter 9
partitioning, table shape, medallion fit, or serving design	chapter 10

Production answer rules

If you need to choose between…	Better DE-PRO instinct
fast once vs safe to rerun	safe to rerun
broad reprocessing vs bounded replay	bounded replay
manual notebook repair vs auditable job repair	auditable repair
bigger cluster vs measured bottleneck analysis	measured bottleneck analysis
vague permissions vs specific policy surface	specific policy surface
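
The first two rows can be illustrated with a minimal pure-Python sketch. This is not Databricks code: the function names `upsert` and `replay_window` and the record fields are hypothetical, chosen only to show why a keyed write is safe to rerun and how a date window keeps a replay bounded.

```python
from datetime import date


def upsert(target: dict, batch: list[dict]) -> dict:
    """Merge a batch into target keyed by 'id'; rerunning the same batch changes nothing."""
    for row in batch:
        target[row["id"]] = row  # last write wins per key, so no duplicates appear
    return target


def replay_window(rows: list[dict], start: date, end: date) -> list[dict]:
    """Select only rows whose event_date falls in [start, end] (a bounded replay)."""
    return [r for r in rows if start <= r["event_date"] <= end]


source = [
    {"id": 1, "event_date": date(2026, 1, 1), "amount": 10},
    {"id": 2, "event_date": date(2026, 1, 2), "amount": 20},
    {"id": 3, "event_date": date(2026, 1, 3), "amount": 30},
]

table: dict = {}
upsert(table, source)

# Replay only the affected day, twice, to show that the rerun is safe.
bad_day = replay_window(source, date(2026, 1, 2), date(2026, 1, 2))
upsert(table, bad_day)
upsert(table, bad_day)
```

The table still holds three rows after both replays. On Delta tables the analogous pattern is usually a `MERGE INTO` on a business key rather than a blind append.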

DE-PRO answer sequence

Use this when the stem mixes packaging, quality, observability, governance, or recovery.

    flowchart TD
      S["Scenario"] --> L["Find the main lane"]
      L --> P["Package, ingest, transform, govern, observe, or repair?"]
      P --> F["Choose the narrowest Databricks feature that fits"]
      F --> R["Check logs, event logs, query profile, or Spark UI"]
      R --> V["Verify rerun, recovery, or promotion behavior"]

Monitoring and debugging signal map

Need	Better first signal
pipeline lifecycle, quality, and declarative run state	event log
query-level bottlenecks, joins, skew, or pruning	query profile
account or workspace cost, audit, and workload telemetry	system tables
low-level stage or task behavior	Spark UI
failed-run remediation path	Jobs UI, repair state, logs, and parameter overrides

Performance triage table

Symptom	Likely cause	Better first action
one task runs much longer than peers	skew	inspect hot keys and shuffle distribution
scans read far too much data	weak pruning or bad layout	inspect clustering, partitioning, and filter selectivity
too many tiny files	write pattern or over-partitioning	compact and rethink layout, not just cluster size
repeated high-cost reprocessing	weak incremental or replay design	tighten boundaries and use targeted reprocessing
poor merge or update performance	table layout and file behavior	inspect clustering, pruning, and change pattern first
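
For the skew row, "inspect hot keys" can be made concrete with a small pure-Python sketch. It assumes you have already sampled the join or grouping keys into a list. The helper name `hot_keys` and the threshold of five times the median count are illustrative assumptions, not a Databricks rule.

```python
from collections import Counter
from statistics import median


def hot_keys(keys: list[str], ratio: float = 5.0) -> list[tuple[str, int]]:
    """Return (key, count) pairs whose count exceeds ratio * median count, largest first."""
    counts = Counter(keys)
    typical = median(counts.values())
    return [(k, c) for k, c in counts.most_common() if c > ratio * typical]


# Hypothetical sample: key "a" dominates the distribution.
sample_keys = ["a"] * 100 + ["b"] * 4 + ["c"] * 3 + ["d"] * 5
skewed = hot_keys(sample_keys)
```

Here only key `a` is flagged. The point is to measure the distribution before reaching for a bigger cluster.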

Sharing, governance, and federation boundaries

If the question is about…	Keep this boundary clear
row filters	who can see which records
column masks	how sensitive values are transformed or hidden
ACLs or workspace permissions	who can access objects or actions
Delta Sharing	how live data is exposed to another Databricks deployment or external platform
Lakehouse Federation	querying external systems through governed access
Unity Catalog inheritance	how permissions flow from higher objects to lower objects
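
The row-filter and column-mask boundary is easy to blur, so here is a minimal pure-Python illustration of the difference in effect. It only simulates the behavior. The functions `apply_row_filter` and `apply_column_mask`, the sample records, and the masking format are all hypothetical.

```python
def apply_row_filter(rows, user_region):
    """Row filter: decides which records the caller can see at all."""
    return [r for r in rows if r["region"] == user_region]


def apply_column_mask(rows, can_see_pii):
    """Column mask: every row stays visible, but the sensitive value is transformed."""
    return [
        {**r, "ssn": r["ssn"] if can_see_pii else "***-**-" + r["ssn"][-4:]}
        for r in rows
    ]


# Fictional sample data for illustration only.
customers = [
    {"id": 1, "region": "US", "ssn": "123-45-6789"},
    {"id": 2, "region": "EU", "ssn": "987-65-4321"},
]

filtered = apply_row_filter(customers, user_region="US")  # fewer rows, values intact
masked = apply_column_mask(customers, can_see_pii=False)  # same rows, values hidden
```

In Unity Catalog both controls are attached to a table as SQL functions, with `ALTER TABLE ... SET ROW FILTER` for rows and `ALTER TABLE ... ALTER COLUMN ... SET MASK` for values.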

High-confusion pairs

Pair	Keep this distinction clear
Lakeflow Declarative Pipelines vs Lakeflow Jobs	declarative pipeline logic vs orchestration and run control
checkpoint vs watermark	recoverability state vs lateness boundary
event log vs system tables	pipeline lifecycle record vs broader platform telemetry
Delta Sharing vs Lakehouse Federation	governed data exchange vs governed access to external source systems
row filter vs column mask	hide rows vs transform or hide values
repair run vs retry	targeted rerun after diagnosis vs automatic repeat attempt
liquid clustering vs partitioning	flexible layout strategy vs hard physical split
Databricks Asset Bundles vs Git folders	deployment package and targets vs workspace source integration
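
The checkpoint vs watermark pair can be sketched in a few lines of plain Python. This is a deliberate simplification, not how Structured Streaming is implemented: it advances the watermark per event rather than per micro-batch, and the `process` function, the `state` dictionary, and the lateness of 10 time units are assumptions for illustration.

```python
def process(events, state, allowed_lateness=10):
    """Process events from the checkpointed offset; drop events behind the watermark.

    state holds 'offset' (checkpoint: where to resume after a restart) and
    'max_event_time' (used to derive the watermark: how late an event may arrive).
    """
    accepted, dropped = [], []
    for offset in range(state["offset"], len(events)):
        event_time = events[offset]
        watermark = state["max_event_time"] - allowed_lateness
        if event_time < watermark:
            dropped.append(event_time)  # too late: behind the lateness boundary
        else:
            accepted.append(event_time)
            state["max_event_time"] = max(state["max_event_time"], event_time)
        state["offset"] = offset + 1  # record progress so a restart can resume here
    return accepted, dropped


log = [100, 105, 120, 95, 115, 130]  # event times in arrival order
state = {"offset": 0, "max_event_time": 0}

first = process(log[:4], state)  # a run that stops after four events
second = process(log, state)     # restart: resumes from the checkpoint, not from zero
```

The first run drops the event at time 95 because it arrives behind the watermark. The second run does not reprocess the first four events because the stored offset tells it where to resume. The two mechanisms answer different questions: where to restart, and how late is too late.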

Last 15-minute recheck

Recheck this	Because the miss often hides here
package structure, dependencies, and target config	deployment questions break here first
append-only vs streaming ingest boundary	ingestion stems often hinge on this choice
expectations, quarantine, and bad-data visibility	quality questions reward explicit handling
event logs, system tables, and query profile	observability questions punish guessing
liquid clustering, pruning, and shuffle evidence	performance questions punish “add compute” reflexes
row filters, masks, sharing mode, and inheritance	governance questions reward precise boundaries

One-sentence memory hooks

If replay safety matters, think idempotent boundary + targeted rerun, not “reprocess everything.”
If the workload is slow, think signal first, not “bigger cluster first.”
If the question mentions data exchange, separate sharing from federation.
If the question mentions PII, separate masking, filtering, anonymization, and retention.
If the question mentions deployment, think bundle targets, environment config, and auditable promotion.

Revised on Monday, June 15, 2026
