Databricks DA-ASSOC Cheat Sheet: Unity Catalog, Dashboards, and SQL

April 13, 2026

Databricks DA-ASSOC cheat sheet for Unity Catalog, dashboards, SQL, traps, and final review.

On this page

Use this for last-mile review. DA-ASSOC is usually easiest when you decide, in order:

Row grain: what should one row represent before any join or aggregate?
Workflow layer: Unity Catalog, import path, SQL Warehouse, dashboard, alert, Genie, or secure-sharing issue?
SQL correctness: join, filter, null-handling, grouping, time-travel, or window-frame issue?
Trust boundary: permissions, lineage, freshness, certified-dataset quality, or trusted-asset mismatch?

Workflow in one screen

    flowchart TD
	  Source["Source or Imported Data"] --> Catalog["Unity Catalog + Certified Dataset Discovery"]
	  Catalog --> Query["Databricks SQL Query Logic"]
	  Query --> Analyze["Warehouse, Query History, Profile, Cache"]
	  Analyze --> Consume["Dashboard, Alert, or Genie Consumption"]
	  Govern["Permissions, Sharing, Lineage, Ownership"] -. protects .-> Consume

DA-ASSOC answer sequence

Use this when the stem mixes grain, SQL logic, Databricks SQL, dashboards, alerts, or governance.

    flowchart TD
	  S["Scenario"] --> G["Check row grain first"]
	  G --> W["Classify the workflow layer"]
	  W --> Q["Fix SQL correctness or query analysis"]
	  Q --> C["Check trust boundary, permissions, and lineage"]
	  C --> V["Verify with query profile, history, or trusted asset"]

Fastest 10-minute review

If the question says…	Strongest first lane
dashboard number is too high	check row grain and duplicate joins before chart settings
rows disappeared after a `LEFT JOIN`	inspect filters placed after the join
“latest record per customer”	`ROW_NUMBER()` with partition and descending order
alert did not fire	inspect source query logic, alert condition, schedule, and destination
query is expensive or slow	warehouse fit, query profile, filter selectivity, and data scanned
federated join question	decide whether the need is on-demand querying versus ingestion first
user trusts Genie output too much or too little	inspect curated datasets, instructions, trusted assets, and governed access
“metric looks wrong”	lineage, freshness, and source-query correctness before dashboard cosmetics

DA-ASSOC review map

Exam area	Use this cheat-sheet section first
Unity Catalog and managed data	governed objects, table types, namespace, and discovery cues
query execution	row-grain checks, join chooser, filter-placement traps, warehouses, and time travel
query analysis	Photon, query history, query profile, caching, and liquid clustering
dashboards, alerts, and Genie	parameter, refresh, alert, trusted-asset, and sharing cues
governance and trusted analytics	permissions, lineage, ownership, certified datasets, and secure-consumption boundaries

Row grain first

Question to ask	Why it matters
what should one row represent?	almost every wrong metric starts here
is the dimension side unique on the chosen key?	non-unique dimensions cause row multiplication
should I aggregate before or after the join?	wrong ordering distorts totals
does the filter belong before or after the join or window?	placement changes semantics

Grain-diagnostic table

Symptom	Strongest first cause
counts doubled after enrichment	non-unique dimension key or many-to-many join
sums inflated after lookup join	repeated dimension rows or wrong grain
dashboard total differs from trusted report	query grain drift before visualization
query uses repeated `DISTINCT` to “fix” numbers	underlying join or grain problem, not a final fix
Genie answer sounds plausible but metric is off	trusted asset or source query may be tied to the wrong grain

Fast grain traps

Trap	Better reading
adjusting chart settings when the metric is wrong	repair the query grain first
aggregating detail after a many-to-many join	reduce or normalize before the join when needed
blaming warehouse size for obviously wrong totals	correctness comes before performance

Join chooser

Join type	Keeps rows from	Strong use
INNER	both sides with matches	enrichment when a match is required
LEFT	left side always	keep all facts and attach optional dimension data
FULL	both sides always	mismatch or reconciliation review
SEMI	left rows with a match	existence filtering without bringing columns
ANTI	left rows without a match	find missing or unmatched records

Join correctness table

Symptom	Strongest first check
expected unmatched rows are missing	LEFT join plus post-join filter may have become INNER logic
inflated sums after adding a lookup table	dimension uniqueness and join cardinality
output needs “records with no match”	ANTI join or LEFT join plus null check
unexpected nulls in dimension columns	LEFT join behavior may be correct, not broken
same data lives outside Databricks and inside Databricks	federated read may fit analysis, but ingestion may fit repeated production use

Filter-placement traps

Pattern	Why it matters
filter in `WHERE` after a LEFT join on right-table column	can remove unmatched left rows
filter before the join	can reduce row set early and preserve intended outer-join semantics
filter after aggregation	changes grouped result, not source row set

Window-function rules

Piece	What it really does	Common mistake
`PARTITION BY`	defines the group	forgetting the grouping boundary
`ORDER BY`	defines sequence	expecting running logic without order
frame	defines which ordered rows count	relying on defaults without checking fit

Window-function picker

If the question is mainly about…	Strongest first lane
latest row per entity	`ROW_NUMBER()` with partition and descending sort
running total	window aggregate with ordered frame
ranking with ties	`RANK()` or `DENSE_RANK()` based on tie behavior
compare prior or next row	`LAG()` or `LEAD()`
compare current data with prior table version	Delta Lake history or time travel, not a window function alone

High-confusion window pairs

Pair	Keep this distinction clear
partition vs order	group boundary versus sequence inside that group
row number vs rank	unique sequence versus tied-rank behavior
grouped aggregate vs window aggregate	one row per group versus value repeated beside detail rows
current-row comparison vs running total	`LAG` or `LEAD` logic versus ordered-frame accumulation

Table and storage-type cues

If the question is really about…	Strongest first lane
Databricks-managed storage lifecycle	managed table
data kept outside Databricks-managed storage	external table
precomputed query result for repeated analytics	materialized view
continuously arriving data	streaming table
querying old Delta table versions	time travel
governed sharing to outside systems	Delta Sharing

High-confusion data-object pairs

Pair	Keep this distinction clear
managed table vs external table	Databricks-managed lifecycle versus externally managed storage path
materialized view vs standard view	stored refreshed result versus logical saved query
materialized view vs streaming table	repeated query acceleration versus continuous ingestion pattern
imported data vs federated data	copied into Databricks versus queried in place

CTE, aggregation, and query-shape cues

If the question is really about…	Strongest first lane
query readability or staged transformation	CTE
reduce rows before expensive downstream work	early filter and aggregate discipline
one metric per chosen grain	aggregate at that grain before visualizing
de-duplicating after messy joins	fix the join shape before leaning on `DISTINCT`

Bad SQL smells

Smell	Better reading
many nested fixes after a wrong join	the grain is probably wrong earlier
grouping and windowing without a clear business grain	stop and define the row first
dashboard-ready query still returns unstable duplicates	lock down metric grain before formatting

Databricks SQL warehouse and query-analysis cues

Requirement	Strongest first lane
analyst-facing SQL and dashboard execution	Databricks SQL warehouse
explain why query is slow	query history, execution details, scanned data, and filter shape
reusable filtered analysis	parameters instead of many copied queries
trusted metric reused across dashboards	stable governed query logic
warehouse seems fine but answer is still wrong	fix SQL logic before tuning compute

Performance first look

Symptom	Inspect first
slow query	filter selectivity, join shape, and data scanned
high cost or long runtime	warehouse fit, repeated scans, and cache opportunity
dashboard looks stale	refresh behavior, upstream data readiness, and query schedule
query returns correct logic slowly	correctness is fine; then inspect warehouse and scan pattern
repeated filter on large tables is still costly	liquid clustering may fit the filter pattern

Dashboard, alert, and Genie picker

If the stem is mainly about…	Strongest first lane
visual presentation of governed metrics	dashboard
threshold-triggered notification	alert
natural-language exploration over curated governed data	Genie
reusable filters or comparable views	parameterized query and dashboard filter design
published analytics outside the workspace	dashboard sharing or embedding

Consumption-boundary table

Feature	Main job	Common miss
dashboard	present trusted metrics visually	expecting visualization to fix wrong SQL
alert	notify when threshold logic is met	forgetting schedule, query result shape, or destination behavior
Genie	natural-language access to curated data	assuming it replaces dataset curation and governance
query history	inspect performance and executions	treating it as the semantic source of truth

Dashboard and Genie traps

Trap	Better reading
changing the chart when the metric is wrong	fix grain and query logic first
blaming Genie when the source model is weak	inspect dataset quality, instructions, trusted assets, and governed access
alert fired on the wrong threshold	validate source query, alert schedule, and condition definition
assuming published dashboards inherit every object permission automatically	restate dashboard sharing mode and object access boundary

Governance and trusted analytics

Boundary	What it really answers
Unity Catalog permissions	who can access which governed object
lineage	where the metric or dataset came from
sharing	who can consume the output
freshness	whether the result reflects current data
ownership	who controls object changes and secure management

Governance traps

Trap	Better reading
user cannot see the result	inspect object permissions and sharing boundary before rewriting SQL
metric looks suspicious	inspect lineage and source-query semantics
data is present but not trusted	freshness, lineage, and documented metric meaning matter
security question sounds like storage only	namespace, roles, sharing settings, and object ownership all matter

Last 15-minute review

Recheck this	Because the miss usually hides here
one-row meaning before every join or aggregate	wrong numbers usually start here
managed, external, materialized, streaming, and shared object differences	Databricks nouns blur under time pressure
warehouse, profile, cache, and liquid-clustering cues	performance questions punish wrong-layer thinking
dashboard, alert, and Genie job boundaries	consumption features are related but not interchangeable
catalog, schema, object, owner, and sharing scope	security questions often hide inside naming and permission boundaries

What strong DA-ASSOC answers usually do

separate governed data discovery, query authoring, query analysis, consumption, and security
keep row grain and trusted-source quality ahead of cosmetic fixes
decide whether the question is asking for copied data, shared data, federated data, or governed internal data

classify dashboard, alert, and Genie behavior before choosing a feature

Recheck this	Because the miss often hides here
one-row-per-what	row-grain mistakes poison everything downstream
join type and filter placement	many wrong answers are logic errors, not syntax gaps
partition, order, and frame	window questions punish blurred boundaries
warehouse, dashboard, alert, and Genie roles	Databricks workflow features are distinct lanes
permissions, lineage, and freshness	trusted analytics is more than valid SQL

What strong DA-ASSOC answers usually do

preserve the correct row grain before joining or aggregating
separate SQL correctness, warehouse performance, dashboard behavior, and governance
treat alerts, lineage, freshness, and sharing as part of trusted analytics
fix the query logic before polishing the visual layer

Revised on Monday, June 15, 2026

Study Plan

Sample Questions

Browse Databricks Certification Guides

Databricks DA-ASSOC Cheat Sheet: Unity Catalog, Dashboards, and SQL

Workflow in one screen

DA-ASSOC answer sequence

Fastest 10-minute review

DA-ASSOC review map

Row grain first

Grain-diagnostic table

Fast grain traps

Join chooser

Join correctness table

Filter-placement traps

Window-function rules

Window-function picker

High-confusion window pairs

Table and storage-type cues

High-confusion data-object pairs

CTE, aggregation, and query-shape cues

Bad SQL smells

Databricks SQL warehouse and query-analysis cues

Performance first look

Dashboard, alert, and Genie picker

Consumption-boundary table

Dashboard and Genie traps

Governance and trusted analytics

Governance traps

Last 15-minute review

What strong DA-ASSOC answers usually do

What strong DA-ASSOC answers usually do