Study Databricks DE-PRO PII and Retention: key concepts, common traps, and exam decision cues.
Compliance questions are really workflow questions. DE-PRO favors privacy controls that still make operational sense across batch, streaming, and serving layers.
Privacy-choice map
Requirement | Better first instinct
preserve analytical usefulness while hiding direct identifiers | pseudonymization
reduce re-identification risk more aggressively | anonymization
ensure old data is removed according to policy | retention-aware purge workflow
enforce privacy in active pipelines | integrate masking and privacy logic into the pipeline design
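To make the pseudonymization row concrete, here is a minimal Python sketch of one common technique, keyed hashing with HMAC-SHA256. It assumes a hypothetical secret key (`PSEUDONYM_KEY`) and toy `customers` and `orders` lists; it is not a Databricks API. What it shows: the same identifier always yields the same token, so joins still work while the raw email is no longer stored.

```python
import hashlib
import hmac

# Hypothetical key for illustration; in practice load it from a secret store.
PSEUDONYM_KEY = b"example-key-do-not-use"


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a direct identifier."""
    normalized = value.strip().lower()  # so " ANA@example.com" matches "ana@example.com"
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


customers = [
    {"email": "ana@example.com", "segment": "gold"},
    {"email": "bo@example.com", "segment": "silver"},
]
orders = [
    {"email": "ANA@example.com ", "amount": 120},
    {"email": "bo@example.com", "amount": 35},
]

# Replace the raw identifier with its pseudonym in both datasets.
customers_p = [{"customer_key": pseudonymize(c["email"]), "segment": c["segment"]} for c in customers]
orders_p = [{"customer_key": pseudonymize(o["email"]), "amount": o["amount"]} for o in orders]

# The join still works on the pseudonymous key.
segment_by_key = {c["customer_key"]: c["segment"] for c in customers_p}
joined = [(segment_by_key[o["customer_key"]], o["amount"]) for o in orders_p]
print(joined)  # [('gold', 120), ('silver', 35)]
```

Anyone holding the key can hash candidate emails and match them, which is why this counts as pseudonymization rather than anonymization: key custody becomes part of the control.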
Start with the privacy goal, then the operational consequence
If the business needs… | Stronger first answer
hidden identifiers but still useful joins | pseudonymization
stronger reduction of re-identification risk | anonymization
data removed after a policy window | retention-aware purge workflow
privacy enforced during ongoing processing | pipeline-integrated controls
What the exam is really testing
If the stem says… | Strong reading
“hashing, tokenization, suppression, or generalization” | this is a privacy-method selection question
“PII in silver and gold” | privacy controls must fit downstream sharing and analytics use
“retention policies” | the answer needs a real purging design, not just a statement of intent
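The method names in the first row map to different mechanics. A minimal Python sketch, assuming a hypothetical in-memory `TokenVault` in place of a real tokenization service and toy field names: tokenization is reversible for holders of the vault, suppression drops a field outright, and generalization coarsens a value into a band or prefix. (Keyed hashing behaves like a one-way, deterministic token.)

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault; a real one is a separate, access-controlled service."""

    def __init__(self) -> None:
        self._forward: dict = {}
        self._reverse: dict = {}

    def tokenize(self, value: str) -> str:
        """Tokenization: swap a value for a random token and remember the mapping."""
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            while token in self._reverse:  # regenerate on the (unlikely) collision
                token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        """Reverse lookup; in a real system this is restricted to authorized callers."""
        return self._reverse[token]


def suppress(record: dict, fields: set) -> dict:
    """Suppression: drop the listed fields entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Generalization: keep only a zip-code prefix."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


vault = TokenVault()
raw = {"ssn": "123-45-6789", "name": "Ana Silva", "age": 34, "zip": "94107", "amount": 120}

protected = suppress(raw, {"name"})
protected["ssn"] = vault.tokenize(raw["ssn"])
protected["age"] = generalize_age(raw["age"])
protected["zip"] = generalize_zip(raw["zip"])
print(protected)
```

The choice among them follows downstream use: tokenize when an authorized process must recover the value, suppress when nobody downstream needs it, and generalize when analysts need the shape of the value but not its exact form.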
Why retention is operational
Retention questions are not solved by saying “follow policy.” They require a design that actually removes or expires sensitive data on time and does not leave old copies lingering indefinitely in downstream layers.
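A retention design has to reach every layer that holds a copy. The sketch below is a minimal, hypothetical in-memory model (the `layers` dict and the `event_date` field are assumptions) that drops expired rows from bronze, silver, and gold in one pass and returns per-layer counts for auditing. On real Delta tables, a `DELETE` does not by itself erase the underlying data files; older files stay reachable through time travel until cleanup such as `VACUUM` removes them, so the purge design should include that step.

```python
from datetime import date, timedelta

# Hypothetical in-memory medallion layers; event_date starts each row's retention clock.
layers = {
    "bronze": [
        {"id": 1, "email": "ana@example.com", "event_date": date(2023, 1, 10)},
        {"id": 2, "email": "bo@example.com", "event_date": date(2024, 6, 1)},
    ],
    "silver": [
        {"id": 1, "customer_key": "k1", "event_date": date(2023, 1, 10)},
        {"id": 2, "customer_key": "k2", "event_date": date(2024, 6, 1)},
    ],
    "gold": [
        {"id": 1, "customer_key": "k1", "event_date": date(2023, 1, 10)},
    ],
}


def purge_expired(layers: dict, retention_days: int, today: date) -> dict:
    """Drop rows older than the retention window from every layer.

    Returns rows purged per layer so each run leaves an audit trail.
    """
    cutoff = today - timedelta(days=retention_days)
    purged = {}
    for name, rows in list(layers.items()):
        kept = [r for r in rows if r["event_date"] >= cutoff]
        purged[name] = len(rows) - len(kept)
        layers[name] = kept  # every layer is processed, so no stale copy survives downstream
    return purged


report = purge_expired(layers, retention_days=365, today=date(2024, 12, 31))
print(report)  # {'bronze': 1, 'silver': 1, 'gold': 1}
```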
Common traps
Trap | Better rule
treating masking as the same thing as anonymization | they solve different privacy goals: masking hides values from viewers, while anonymization reduces re-identification risk in the data itself
ignoring retention policy after data lands | compliance includes purge behavior
choosing a privacy method without considering downstream use | protection and utility both matter
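The masking trap is easy to demonstrate. In this minimal Python sketch (toy rows, hypothetical helper names), masking the email hides it from a viewer, yet zip code plus birth year still single out every row. The check uses k-anonymity, one common measure: k is the size of the smallest group sharing the same quasi-identifier values. Generalizing those quasi-identifiers is what raises k, and k-anonymity on its own is not a complete anonymization guarantee.

```python
from collections import Counter


def mask_email(email: str) -> str:
    """Masking: hide most of the identifier for display."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


def min_group_size(rows: list, quasi_identifiers: list) -> int:
    """k in k-anonymity: size of the smallest group sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values())


rows = [
    {"email": "ana@example.com", "zip": "94107", "birth_year": 1991},
    {"email": "bo@example.com", "zip": "94110", "birth_year": 1994},
    {"email": "cy@example.com", "zip": "10001", "birth_year": 1985},
    {"email": "di@example.com", "zip": "10003", "birth_year": 1987},
]

# Masking hides the email, but zip + birth_year still make every row unique.
masked = [{**r, "email": mask_email(r["email"])} for r in rows]
print(min_group_size(masked, ["zip", "birth_year"]))  # 1

# Generalizing the quasi-identifiers (and dropping the email) raises k.
generalized = [
    {"zip": r["zip"][:3] + "**", "birth_decade": r["birth_year"] // 10 * 10} for r in rows
]
print(min_group_size(generalized, ["zip", "birth_decade"]))  # 2
```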
Scenario triage
Scenario clue | Stronger answer shape
“hide identifiers but preserve analytical linkage” | pseudonymization
“reduce re-identification risk as strongly as possible” | anonymization
“old sensitive data must be removed after a fixed period” | purge workflow
“privacy controls must work in active pipelines” | integrate privacy logic into pipeline design
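Integrating privacy logic into the pipeline means the transform runs inside the processing path, so raw identifiers never reach silver or gold. A minimal stdlib Python sketch with a hypothetical key, column policy, and list-based sink standing in for tables: each micro-batch passes through `protect_batch` before it is appended downstream. In Spark Structured Streaming the same per-batch function could be applied, for example, through `foreachBatch`; that wiring is not shown here.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Iterator, List

KEY = b"example-key-do-not-use"          # hypothetical; load from a secret store in practice
DIRECT_IDENTIFIERS = {"email", "phone"}  # hypothetical column policy


def protect_batch(batch: Iterable[Dict]) -> Iterator[Dict]:
    """Drop direct identifiers and add a pseudonymous key to every record."""
    for record in batch:
        out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        normalized = record["email"].strip().lower().encode("utf-8")
        out["customer_key"] = hmac.new(KEY, normalized, hashlib.sha256).hexdigest()
        yield out


def run_pipeline(batches: Iterable[List[Dict]], sink: List[Dict]) -> None:
    """Process micro-batches in order; only protected records reach the sink."""
    for batch in batches:
        sink.extend(protect_batch(batch))


silver: List[Dict] = []
incoming = [
    [{"email": "ana@example.com", "phone": "555-0100", "amount": 120}],
    [{"email": "bo@example.com", "phone": "555-0101", "amount": 35}],
]
run_pipeline(incoming, silver)
print([sorted(r) for r in silver])  # [['amount', 'customer_key'], ['amount', 'customer_key']]
```

Because the policy lives in one function on the write path, batch backfills and streaming runs apply the same rules instead of relying on a later cleanup job.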
Decision order that usually wins
Privacy questions usually hinge on whether the business still needs linkage utility. If analytics still needs joins but direct identifiers should be hidden, think pseudonymization. If old sensitive data must actually disappear, you need a real retention-enforcement and purge workflow, not just access control. DE-PRO usually rewards operational privacy controls instead of cosmetic policy answers.
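The decision order can be written as a small checklist. This is a hypothetical study aid, not an official rubric: the `PrivacyRequirement` fields are assumptions describing what a stem asks for, and linkage is checked first because it decides between pseudonymization and anonymization. Retention and pipeline controls are added on top of the method rather than substituted for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrivacyRequirement:
    """Hypothetical summary of what a scenario stem asks for."""
    needs_linkage: bool              # analytics still needs joins on the data subject
    minimize_reidentification: bool  # strongest possible reduction in re-identification risk
    must_delete_after_window: bool   # old data must actually disappear after a policy window
    applies_to_live_pipelines: bool  # controls must hold during ongoing processing


def recommend(req: PrivacyRequirement) -> List[str]:
    """Apply the decision order; several controls can be needed at once."""
    controls = []
    # Linkage utility decides the privacy method first.
    if req.needs_linkage:
        controls.append("pseudonymization")
    elif req.minimize_reidentification:
        controls.append("anonymization")
    # Retention and pipeline placement are operational layers on top of the method.
    if req.must_delete_after_window:
        controls.append("retention-aware purge workflow")
    if req.applies_to_live_pipelines:
        controls.append("pipeline-integrated privacy controls")
    return controls


print(recommend(PrivacyRequirement(
    needs_linkage=True,
    minimize_reidentification=False,
    must_delete_after_window=True,
    applies_to_live_pipelines=False,
)))  # ['pseudonymization', 'retention-aware purge workflow']
```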