Use this when the stem mixes grain, SQL logic, Databricks SQL, dashboards, alerts, or governance.
flowchart TD
S["Scenario"] --> G["Check row grain first"]
G --> W["Classify the workflow layer"]
W --> Q["Fix SQL correctness or query analysis"]
Q --> C["Check trust boundary, permissions, and lineage"]
C --> V["Verify with query profile, history, or trusted asset"]
Fastest 10-minute review
If the question says…
Strongest first lane
dashboard number is too high
check row grain and duplicate joins before chart settings
rows disappeared after a LEFT JOIN
inspect filters placed after the join
“latest record per customer”
ROW_NUMBER() with partition and descending order
alert did not fire
inspect source query logic, alert condition, schedule, and destination
query is expensive or slow
warehouse fit, query profile, filter selectivity, and data scanned
federated join question
decide whether the need is on-demand querying versus ingestion first
user trusts Genie output too much or too little
inspect curated datasets, instructions, trusted assets, and governed access
“metric looks wrong”
lineage, freshness, and source-query correctness before dashboard cosmetics
DA-ASSOC review map
Exam area
Use this cheat-sheet section first
Unity Catalog and managed data
governed objects, table types, namespace, and discovery cues
query execution
row-grain checks, join chooser, filter-placement traps, warehouses, and time travel
query analysis
Photon, query history, query profile, caching, and liquid clustering
dashboards, alerts, and Genie
parameter, refresh, alert, trusted-asset, and sharing cues
governance and trusted analytics
permissions, lineage, ownership, certified datasets, and secure-consumption boundaries
Row grain first
Question to ask
Why it matters
what should one row represent?
almost every wrong metric starts here
is the dimension side unique on the chosen key?
non-unique dimensions cause row multiplication
should I aggregate before or after the join?
wrong ordering distorts totals
does the filter belong before or after the join or window?
placement changes semantics
Grain-diagnostic table
Symptom
Strongest first cause
counts doubled after enrichment
non-unique dimension key or many-to-many join
sums inflated after lookup join
repeated dimension rows or wrong grain
dashboard total differs from trusted report
query grain drift before visualization
query uses repeated DISTINCT to “fix” numbers
underlying join or grain problem, not a final fix
Genie answer sounds plausible but metric is off
trusted asset or source query may be tied to the wrong grain
Fast grain traps
Trap
Better reading
adjusting chart settings when the metric is wrong
repair the query grain first
aggregating detail after a many-to-many join
reduce or normalize before the join when needed
blaming warehouse size for obviously wrong totals
correctness comes before performance
Join chooser
Join type
Keeps rows from
Strong use
INNER
both sides with matches
enrichment when a match is required
LEFT
left side always
keep all facts and attach optional dimension data
FULL
both sides always
mismatch or reconciliation review
SEMI
left rows with a match
existence filtering without bringing columns
ANTI
left rows without a match
find missing or unmatched records
Join correctness table
Symptom
Strongest first check
expected unmatched rows are missing
LEFT join plus post-join filter may have become INNER logic
inflated sums after adding a lookup table
dimension uniqueness and join cardinality
output needs “records with no match”
ANTI join or LEFT join plus null check
unexpected nulls in dimension columns
LEFT join behavior may be correct, not broken
same data lives outside Databricks and inside Databricks
federated read may fit analysis, but ingestion may fit repeated production use
Filter-placement traps
Pattern
Why it matters
filter in WHERE after a LEFT join on right-table column
can remove unmatched left rows
filter before the join
can reduce row set early and preserve intended outer-join semantics
filter after aggregation
changes grouped result, not source row set
Window-function rules
Piece
What it really does
Common mistake
PARTITION BY
defines the group
forgetting the grouping boundary
ORDER BY
defines sequence
expecting running logic without order
frame
defines which ordered rows count
relying on defaults without checking fit
Window-function picker
If the question is mainly about…
Strongest first lane
latest row per entity
ROW_NUMBER() with partition and descending sort
running total
window aggregate with ordered frame
ranking with ties
RANK() or DENSE_RANK() based on tie behavior
compare prior or next row
LAG() or LEAD()
compare current data with prior table version
Delta Lake history or time travel, not a window function alone
High-confusion window pairs
Pair
Keep this distinction clear
partition vs order
group boundary versus sequence inside that group
row number vs rank
unique sequence versus tied-rank behavior
grouped aggregate vs window aggregate
one row per group versus value repeated beside detail rows
current-row comparison vs running total
LAG or LEAD logic versus ordered-frame accumulation
Table and storage-type cues
If the question is really about…
Strongest first lane
Databricks-managed storage lifecycle
managed table
data kept outside Databricks-managed storage
external table
precomputed query result for repeated analytics
materialized view
continuously arriving data
streaming table
querying old Delta table versions
time travel
governed sharing to outside systems
Delta Sharing
High-confusion data-object pairs
Pair
Keep this distinction clear
managed table vs external table
Databricks-managed lifecycle versus externally managed storage path
materialized view vs standard view
stored refreshed result versus logical saved query
materialized view vs streaming table
repeated query acceleration versus continuous ingestion pattern
imported data vs federated data
copied into Databricks versus queried in place
CTE, aggregation, and query-shape cues
If the question is really about…
Strongest first lane
query readability or staged transformation
CTE
reduce rows before expensive downstream work
early filter and aggregate discipline
one metric per chosen grain
aggregate at that grain before visualizing
de-duplicating after messy joins
fix the join shape before leaning on DISTINCT
Bad SQL smells
Smell
Better reading
many nested fixes after a wrong join
the grain is probably wrong earlier
grouping and windowing without a clear business grain
stop and define the row first
dashboard-ready query still returns unstable duplicates
lock down metric grain before formatting
Databricks SQL warehouse and query-analysis cues
Requirement
Strongest first lane
analyst-facing SQL and dashboard execution
Databricks SQL warehouse
explain why query is slow
query history, execution details, scanned data, and filter shape
reusable filtered analysis
parameters instead of many copied queries
trusted metric reused across dashboards
stable governed query logic
warehouse seems fine but answer is still wrong
fix SQL logic before tuning compute
Performance first look
Symptom
Inspect first
slow query
filter selectivity, join shape, and data scanned
high cost or long runtime
warehouse fit, repeated scans, and cache opportunity
dashboard looks stale
refresh behavior, upstream data readiness, and query schedule
query returns correct logic slowly
correctness is fine; then inspect warehouse and scan pattern
repeated filter on large tables is still costly
liquid clustering may fit the filter pattern
Dashboard, alert, and Genie picker
If the stem is mainly about…
Strongest first lane
visual presentation of governed metrics
dashboard
threshold-triggered notification
alert
natural-language exploration over curated governed data
Genie
reusable filters or comparable views
parameterized query and dashboard filter design
published analytics outside the workspace
dashboard sharing or embedding
Consumption-boundary table
Feature
Main job
Common miss
dashboard
present trusted metrics visually
expecting visualization to fix wrong SQL
alert
notify when threshold logic is met
forgetting schedule, query result shape, or destination behavior
Genie
natural-language access to curated data
assuming it replaces dataset curation and governance
query history
inspect performance and executions
treating it as the semantic source of truth
Dashboard and Genie traps
Trap
Better reading
changing the chart when the metric is wrong
fix grain and query logic first
blaming Genie when the source model is weak
inspect dataset quality, instructions, trusted assets, and governed access
alert fired on the wrong threshold
validate source query, alert schedule, and condition definition
assuming published dashboards inherit every object permission automatically
restate dashboard sharing mode and object access boundary
Governance and trusted analytics
Boundary
What it really answers
Unity Catalog permissions
who can access which governed object
lineage
where the metric or dataset came from
sharing
who can consume the output
freshness
whether the result reflects current data
ownership
who controls object changes and secure management
Governance traps
Trap
Better reading
user cannot see the result
inspect object permissions and sharing boundary before rewriting SQL
metric looks suspicious
inspect lineage and source-query semantics
data is present but not trusted
freshness, lineage, and documented metric meaning matter
security question sounds like storage only
namespace, roles, sharing settings, and object ownership all matter
Last 15-minute review
Recheck this
Because the miss usually hides here
one-row meaning before every join or aggregate
wrong numbers usually start here
managed, external, materialized, streaming, and shared object differences
Databricks nouns blur under time pressure
warehouse, profile, cache, and liquid-clustering cues
performance questions punish wrong-layer thinking
dashboard, alert, and Genie job boundaries
consumption features are related but not interchangeable
catalog, schema, object, owner, and sharing scope
security questions often hide inside naming and permission boundaries
What strong DA-ASSOC answers usually do
separate governed data discovery, query authoring, query analysis, consumption, and security
keep row grain and trusted-source quality ahead of cosmetic fixes
decide whether the question is asking for copied data, shared data, federated data, or governed internal data
classify dashboard, alert, and Genie behavior before choosing a feature
Recheck this
Because the miss often hides here
one-row-per-what
row-grain mistakes poison everything downstream
join type and filter placement
many wrong answers are logic errors, not syntax gaps
partition, order, and frame
window questions punish blurred boundaries
warehouse, dashboard, alert, and Genie roles
Databricks workflow features are distinct lanes
permissions, lineage, and freshness
trusted analytics is more than valid SQL
What strong DA-ASSOC answers usually do
preserve the correct row grain before joining or aggregating
separate SQL correctness, warehouse performance, dashboard behavior, and governance
treat alerts, lineage, freshness, and sharing as part of trusted analytics
fix the query logic before polishing the visual layer