Databricks GENAI-ASSOC Cheat Sheet: RAG, Agents, and Evaluation
April 13, 2026
Databricks GENAI-ASSOC cheat sheet for RAG, agents, evaluation, traps, and final review.
Use this for last-mile review. Keep it open while drilling mixed questions. GENAI-ASSOC usually gets easier when you classify the failure or design choice first:
Design lane: business goal, model task, chain component, agent tool order, or Agent Bricks fit?
Data prep lane: source quality, extraction, chunking, metadata, embeddings, or retrieval metrics?
Development lane: prompt augmentation, framework choice, guardrails, model fit, or Agent Framework pattern?
Use this when the stem mixes task fit, retrieval, grounding, agents, deployment, or governance.
flowchart TD
S["Scenario"] --> T["Clarify the business task"]
T --> D["Choose model task and chain design"]
D --> R["Check chunking, embeddings, and retrieval"]
R --> G["Add governance, guardrails, and permission checks"]
G --> V["Validate with tracing, logging, and evaluation"]
Fast lane picker
| If the question is mainly about… | Strongest first lane |
| --- | --- |
| the application solves the wrong business task | requirements, model task, or chain design |
| documents are split badly or context windows overflow | chunking strategy |
| the right documents are not returned | retrieval quality, filters, embeddings, reranking, or top-k |
| answers sound fluent but wrong | grounding, retrieval quality, model fit, or guardrails |
| latency or cost spikes | context length, top-k, serving path, vector search config, or model choice |
| cross-tenant leakage or unsafe output | metadata filters, governance, prompt-safety controls, masking, and evaluation |
| agent tooling or multi-step reasoning sounds overbuilt or underbuilt | Agent Bricks, tools, or multi-agent design fit |
Design and tool-choice cues
| If the question is really about… | Strongest first lane |
| --- | --- |
| what the AI pipeline should take in and produce | define inputs and outputs from the business use case |
| which model task fits the requirement | model task selection |
| which components belong in the chain | chain design and tool order |
| whether to use Agent Bricks | choose the Databricks packaged option that matches the problem |
| multi-stage reasoning or tool usage | define and order tools explicitly |
Design traps
| Trap | Better reading |
| --- | --- |
| starting with a framework before defining the business requirement | requirements first, tools second |
| using an agent pattern when a simpler chain fits | choose the least complex architecture that satisfies the task |
| treating Agent Bricks as a generic buzzword | each brick solves a specific type of problem |
Chunking and embeddings
| Decision | Trade-off | Rule of thumb |
| --- | --- | --- |
| chunk size | recall versus precision | big enough for meaning, small enough for focused retrieval |
| overlap | continuity versus redundancy/cost | use enough overlap to preserve context edges without flooding the index |
| metadata | filter precision versus ingestion effort | store source, version, tenant, sensitivity, and freshness if those affect retrieval |
| embedding choice | semantic quality versus cost/latency | choose the embedding path that matches the retrieval task and governance boundary |
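
The chunk-size and overlap trade-offs above can be made concrete with a minimal sketch. It assumes word-based windows rather than real tokenizer tokens, and the function name `chunk_words`, the default sizes, and the metadata keys are all hypothetical illustrations, not a Databricks API.

```python
def chunk_words(text, chunk_size=200, overlap=40, metadata=None):
    """Split text into word-based chunks that share `overlap` words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "chunk_id": len(chunks),
            "text": " ".join(window),
            # Copy document-level metadata onto every chunk so it can be filtered at query time.
            "metadata": dict(metadata or {}),
        })
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the document
    return chunks


# Hypothetical ten-word document, small sizes chosen only to make the overlap visible.
doc = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(doc, chunk_size=4, overlap=1,
                     metadata={"source": "handbook.pdf", "tenant": "acme"})
for c in chunks:
    print(c["chunk_id"], c["text"])
```

Each chunk repeats the last word of the previous one, which is the "context edge" the overlap row refers to. Raising `overlap` preserves more continuity but produces more chunks to embed and store.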
Chunking traps
| Trap | Better reading |
| --- | --- |
| chunks are huge so retrieval “has context” | oversized chunks dilute precision and waste context window |
| chunks are tiny because “more precise is better” | undersized chunks lose meaning and hurt answer quality |
| metadata is optional | metadata often controls tenant, document version, freshness, and policy boundaries |
| extraction package choice does not matter | OCR, PDF, HTML, and other source formats need the right extraction path |
Retrieval quality
| If the problem is mainly… | Strongest first explanation |
| --- | --- |
| irrelevant documents appear | weak chunking, weak embeddings, or missing filters |
| right documents exist but do not surface | top-k, index quality, ranking, or query formulation issue |
| wrong tenant or version shows up | missing metadata filters or governance boundary |
| latency is too high | candidate set too large, top-k too large, or unnecessary context packing |
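
The "wrong tenant or version" and "top-k too large" rows can be illustrated with a toy in-memory retriever. This is a sketch under stated assumptions: the `INDEX` rows, the two-dimensional vectors, and the `retrieve` helper are hypothetical, and a managed vector index would apply filters and similarity search on the service side rather than in Python.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, filters, top_k=3):
    """Apply metadata filters first, then rank the surviving rows by similarity."""
    candidates = [
        row for row in index
        if all(row["metadata"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return ranked[:top_k]


# Hypothetical index rows; note that b1 is an exact vector match but belongs to another tenant.
INDEX = [
    {"id": "a1", "vector": [1.0, 0.0], "metadata": {"tenant": "acme", "version": "v2"}},
    {"id": "a2", "vector": [0.6, 0.8], "metadata": {"tenant": "acme", "version": "v2"}},
    {"id": "a3", "vector": [0.9, 0.1], "metadata": {"tenant": "acme", "version": "v1"}},
    {"id": "b1", "vector": [1.0, 0.0], "metadata": {"tenant": "globex", "version": "v2"}},
]

hits = retrieve([1.0, 0.0], INDEX, {"tenant": "acme", "version": "v2"}, top_k=2)
print([h["id"] for h in hits])
```

Without the filter, `b1` would tie for first place on similarity alone, which is how cross-tenant leakage shows up even when the embeddings are fine.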
Retrieval quick rules
| Cue | Fast recall |
| --- | --- |
| tenant isolation | metadata filtering and governance boundary |
| most useful few documents | rank and top-k discipline |
| query meaning mismatch | reformulation or better retrieval strategy |
| repeated miss on same content family | source documents may be weak before the model is weak |
| poor ordering among good candidates | reranking can be the missing step |
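
For the "poor ordering among good candidates" cue, a second-stage rerank can be sketched as follows. The scorer here is a deliberately simple term-overlap function standing in for a real reranking model, and the names `overlap_score`, `rerank`, and the candidate texts are hypothetical.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query terms that also appear in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def rerank(query, candidates, keep=2, score_fn=overlap_score):
    """Re-order first-stage candidates with a second scorer and keep only the best few."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c["text"]), reverse=True)
    return ranked[:keep]


# Hypothetical first-stage results, in the order the retriever returned them.
candidates = [
    {"id": "c1", "text": "Billing overview for enterprise plans"},
    {"id": "c2", "text": "How to rotate an API token safely"},
    {"id": "c3", "text": "Token rotation schedule and API limits"},
]

top = rerank("rotate api token", candidates, keep=2)
print([c["id"] for c in top])
```

The design point is the two-stage shape: a cheap retriever produces a candidate set, and a more precise scorer decides the final order and the cut-off.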
Development and generation
| If the question is mainly about… | Strongest first lane |
| --- | --- |
| organizing retrieved evidence into the prompt | prompt assembly with grounded context |
| unsupported claims in the answer | grounding or retrieval weakness, not just prompt wording |
| style or output format | prompt instruction layer |
| model capability versus cost or latency | model selection and experiment signal |
| framework choice | LangChain or similar tooling fit for the application design |
| lifecycle tooling for agents | MLflow and Agent Framework |
| multi-agent use with Genie or conversational APIs | multi-agent pattern and Databricks-specific integration |
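
"Prompt assembly with grounded context" can look roughly like the sketch below. The instruction wording, the `build_grounded_prompt` name, and the chunk fields (`source`, `text`) are assumptions chosen for illustration. No particular framework's prompt template is implied.

```python
def build_grounded_prompt(question, chunks, max_chunks=3):
    """Assemble a prompt that numbers each retrieved chunk and asks the model to cite them."""
    selected = chunks[:max_chunks]  # assumes chunks arrive already ranked, best first
    context_lines = [
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(selected, start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer using only the numbered context below. "
        "Cite the context number for each claim. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved chunks.
retrieved = [
    {"source": "faq.md", "text": "Refunds are processed within 5 business days."},
    {"source": "policy.pdf", "text": "Refund requests require an order number."},
]
prompt = build_grounded_prompt("How long do refunds take?", retrieved)
print(prompt)
```

Numbering the evidence keeps the instruction layer (style, format, refusal rule) separate from the grounding layer, so a failure can be attributed to one or the other.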
Generation traps
| Trap | Better reading |
| --- | --- |
| clever prompting can fix missing source evidence | retrieval and source quality usually dominate |
| bigger context is always safer | too much weak context can increase noise, cost, and latency |
| model swap is the first response to every quality issue | first classify whether the miss is retrieval, context packing, or an evaluation blind spot |
| a safety issue only needs a prompt change | guardrails and policy controls are separate from prompt wording |
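
The "bigger context is always safer" trap can be countered with a context-packing step that enforces a budget and a relevance floor. This is a minimal sketch: token cost is approximated by whitespace word count, and `pack_context`, the `score` field, and the thresholds are hypothetical.

```python
def pack_context(scored_chunks, token_budget, min_score=0.0):
    """Keep the highest-scoring chunks that fit the budget and clear the relevance floor."""
    packed, used = [], 0
    for chunk in sorted(scored_chunks, key=lambda c: c["score"], reverse=True):
        if chunk["score"] < min_score:
            break  # everything after this point scores even lower
        cost = len(chunk["text"].split())  # rough stand-in for a tokenizer count
        if used + cost > token_budget:
            continue  # skip chunks that do not fit, but keep trying smaller ones
        packed.append(chunk)
        used += cost
    return packed


# Hypothetical scored chunks from a retriever.
scored = [
    {"id": "a", "score": 0.9, "text": "one two three four five"},
    {"id": "b", "score": 0.8, "text": "one two three four five six seven eight"},
    {"id": "c", "score": 0.7, "text": "one two three"},
    {"id": "d", "score": 0.2, "text": "one two"},
]
packed = pack_context(scored, token_budget=9, min_score=0.5)
print([c["id"] for c in packed])
```

Chunk `d` would fit the remaining budget, but it is excluded because weak context adds noise even when it is cheap.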
Deployment picker
| If the question is really about… | Strongest first lane |
| --- | --- |
| package a chain with pre- and post-processing | pyfunc model |
| retrieve from Databricks vector indexes | Vector Search |
| serve an LLM app on Databricks | model serving or Foundation Model APIs path |
| register the model or chain in the governed catalog | Unity Catalog plus MLflow registration |
| store intermediate memory or structured state | persistent datastore choice |
| batch inference against data | ai_query() where it fits |
| promote prompts or indexes across environments | CI/CD and prompt lifecycle controls |
| add tools via managed, external, or custom servers | MCP |
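
The "package a chain with pre- and post-processing" row describes a wrapper shape that can be sketched in plain Python. In MLflow this logic would typically live in the `predict` method of a `mlflow.pyfunc.PythonModel` subclass. The sketch below deliberately avoids importing MLflow so it runs anywhere, and `ChainWithProcessing`, `fake_chain`, and the input and output keys are hypothetical.

```python
class ChainWithProcessing:
    """Wrap a chain callable with input validation and output shaping."""

    def __init__(self, chain_fn):
        self.chain_fn = chain_fn  # any callable that maps a dict of inputs to a string

    def preprocess(self, raw):
        """Normalize the request and reject empty questions before calling the chain."""
        question = str(raw.get("question", "")).strip()
        if not question:
            raise ValueError("question must be a non-empty string")
        return {"question": question}

    def postprocess(self, answer):
        """Trim the raw model text and return a stable response schema."""
        return {"answer": answer.strip()}

    def predict(self, raw):
        return self.postprocess(self.chain_fn(self.preprocess(raw)))


def fake_chain(inputs):
    """Stand-in for a real retrieval-plus-LLM chain; returns padded text on purpose."""
    return f"  You asked: {inputs['question']}  "


model = ChainWithProcessing(fake_chain)
result = model.predict({"question": "  What is top-k? "})
print(result)
```

Keeping validation and response shaping inside the packaged model means the serving endpoint and any batch caller see the same contract.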
Evaluation loop
What to evaluate
Examples
retrieval quality
hit rate, useful-context rate, groundedness support