Last-mile DE-PRO review: incremental pipelines, Structured Streaming (watermarks/checkpoints), Delta Live Tables concepts, performance tuning pickers (shuffle/skew/file layout), and production troubleshooting heuristics.
Pair this sheet with the Resources for coverage and with IT Mastery to harden production instincts.
CDC: Change data capture, where upstream inserts, updates, and deletes are represented as change events for downstream processing.
Checkpoint: Streaming state and progress location used to recover correctly after restart.
Watermark: Streaming rule that bounds how late data can arrive before the engine stops updating old state.
DLT: Delta Live Tables, Databricks framework for declarative, managed data pipelines.
If two answers “work,” choose the one that is:
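MERGE (CDC)
-- Apply change events from cdc (alias s) to the silver table (alias t).
MERGE INTO silver t
USING cdc s
ON t.id = s.id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
-- Skip deletes for keys that are not in the target, so a 'D' event is never inserted as a row.
WHEN NOT MATCHED AND s.op != 'D' THEN INSERT *;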
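The MERGE above only behaves predictably when each key appears once in the source. As a minimal sketch of that idea in plain Python (not Spark or Delta code), the example below keeps the newest event per `id` and then applies delete and upsert branches to an in-memory table. The `seq` ordering column, the `value` column, and both function names are hypothetical. The sketch also ignores delete events that match no target row, which is an assumption about the desired behavior.

```python
from typing import Dict, Iterable, List


def latest_per_key(events: Iterable[dict]) -> List[dict]:
    """Keep only the newest event per id, so each key appears once in the source."""
    latest: Dict[int, dict] = {}
    for ev in events:
        current = latest.get(ev["id"])
        if current is None or ev["seq"] > current["seq"]:
            latest[ev["id"]] = ev
    return list(latest.values())


def apply_cdc(target: Dict[int, dict], events: Iterable[dict]) -> Dict[int, dict]:
    """Apply deduplicated change events: delete on 'D', otherwise upsert."""
    result = dict(target)
    for ev in latest_per_key(events):
        if ev["op"] == "D":
            # Deletes for keys missing from the target are ignored (assumption).
            result.pop(ev["id"], None)
        else:
            result[ev["id"]] = {"id": ev["id"], "value": ev["value"]}
    return result


silver = {1: {"id": 1, "value": "a"}, 2: {"id": 2, "value": "b"}}
cdc = [
    {"id": 1, "op": "U", "value": "a2", "seq": 10},
    {"id": 1, "op": "U", "value": "a3", "seq": 11},  # newer update for the same key
    {"id": 2, "op": "D", "value": None, "seq": 12},
    {"id": 3, "op": "I", "value": "c", "seq": 13},
    {"id": 4, "op": "D", "value": None, "seq": 14},  # delete with no matching row
]
merged = apply_cdc(silver, cdc)
print(merged)
```

In a real pipeline the same deduplication would typically be done on the source before the MERGE, for example with a window function over the ordering column.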
Rules of thumb
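The ON condition must be unique on the source side: deduplicate the source on the join key first, because a MERGE can fail when several source rows match the same target row.

| Concept | Why it matters | Failure mode |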
|---|---|---|
| Checkpoint | enables exactly-once style recovery for sinks | deleting/moving checkpoint breaks correctness |
| Watermark | bounds state and handles late data | missing watermark → unbounded state |
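To make the checkpoint row concrete, here is a toy model in plain Python, not the Structured Streaming implementation. It stores a committed offset in a JSON file and resumes from it after a simulated crash; deleting the file makes the job reprocess everything and write duplicates. The `process` function, the `fail_at` parameter, and the file layout are all hypothetical, and the crash is simulated before a write so the toy never has to deal with a half-committed record.

```python
import json
import os
import tempfile


def process(source, sink, checkpoint_path, fail_at=None):
    """Append unprocessed records to sink, committing the next offset after each one."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]
    for i in range(offset, len(source)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        sink.append(source[i])
        with open(checkpoint_path, "w") as f:
            json.dump({"offset": i + 1}, f)


source = ["o1", "o2", "o3", "o4", "o5"]
with tempfile.TemporaryDirectory() as tmp:
    chk = os.path.join(tmp, "offsets.json")
    sink = []
    try:
        process(source, sink, chk, fail_at=3)  # crashes after three records
    except RuntimeError:
        pass
    process(source, sink, chk)  # restart resumes from the committed offset
    recovered = list(sink)

    os.remove(chk)  # losing the checkpoint loses the progress marker
    process(source, sink, chk)  # everything is processed again
    after_delete = list(sink)

print(recovered)
print(len(after_delete))
```

The second half mirrors the failure mode in the table: with the checkpoint gone, the job has no record of what it already wrote.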
(df
  .withWatermark("event_time", "10 minutes")    # tolerate events up to 10 minutes late
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/chk/orders")  # progress and state are stored here; keep the path stable
  .outputMode("append")
  .start("/delta/silver/orders"))
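The effect of a 10 minute watermark can be sketched with a simplified model in plain Python. It assumes the watermark for a micro-batch is the maximum event time seen in earlier batches minus the delay, and that events older than the watermark are dropped. The real engine applies this to stateful operators and gives weaker guarantees about exactly which late rows are dropped, so treat `run_batches` (a hypothetical name) as an illustration only.

```python
from datetime import datetime, timedelta


def run_batches(batches, delay):
    """Split event times into kept and dropped using a per-batch watermark."""
    max_seen = None
    kept, dropped = [], []
    for batch in batches:
        # The watermark is based on data seen before this batch started.
        watermark = None if max_seen is None else max_seen - delay
        for event_time in batch:
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)
            else:
                kept.append(event_time)
        if batch:
            batch_max = max(batch)
            if max_seen is None or batch_max > max_seen:
                max_seen = batch_max
    return kept, dropped


t0 = datetime(2024, 1, 1, 12, 0)
minutes = lambda m: t0 + timedelta(minutes=m)
batches = [
    [minutes(0), minutes(5)],
    [minutes(20), minutes(1)],   # 12:01 is late but still above the 11:55 watermark
    [minutes(9), minutes(15)],   # 12:09 is below the 12:10 watermark
]
kept, dropped = run_batches(batches, timedelta(minutes=10))
print(len(kept), dropped)
```

Without a watermark the model would never drop anything, which corresponds to the unbounded state failure mode listed in the table.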
Exam cues
DLT is about declarative pipelines with built-in operational structure:
flowchart LR
BR["Bronze (ingest)"] --> SI["Silver (clean + dedupe)"]
SI --> GO["Gold (metrics)"]
Operator mindset: treat expectations as guardrails; don’t silently pass bad data downstream.
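The guardrail idea can be sketched in plain Python. This is not the DLT API; it is a hypothetical `apply_expectations` helper that checks each row against named rules and routes failures to a quarantine list instead of passing them downstream. The rule names and the `_failed` column are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]


def apply_expectations(
    rows: List[dict], rules: Dict[str, Rule]
) -> Tuple[List[dict], List[dict]]:
    """Return (valid, quarantined); quarantined rows record which rules failed."""
    valid, quarantined = [], []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            quarantined.append({**row, "_failed": failed})
        else:
            valid.append(row)
    return valid, quarantined


rules = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}
rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]
valid, quarantined = apply_expectations(rows, rules)
print(valid)
print(quarantined)
```

Keeping the failed rule names next to each quarantined row makes it possible to count violations per rule, which is the kind of signal an operator would watch.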
| Symptom | Likely cause | Safe next step |
|---|---|---|
| Slow joins/aggregations | heavy shuffle | reduce data early; pick join strategy; tune partitions |
| One task runs forever | data skew | handle hot keys; split/skew hints (concept-level) |
| Lots of tiny files | write pattern | compaction/OPTIMIZE (concept-level) |
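For the data skew row, one common way to handle a hot key is salting. The sketch below is a plain Python model, not Spark code: it hashes keys to partitions with `zlib.crc32`, then appends a salt to the hot key so its rows spread across several partitions. The partition count, the salt bucket count, and the helper names are hypothetical.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key: str, row_index: int, hot_keys: set, salt_buckets: int) -> str:
    """Append a salt to hot keys only, so their rows map to several partitions."""
    if key in hot_keys:
        return f"{key}#{row_index % salt_buckets}"
    return key


# 800 rows share one hot key; 200 rows have distinct keys.
keys = ["hot"] * 800 + [f"k{i}" for i in range(200)]
num_partitions = 8

plain = Counter(partition_of(k, num_partitions) for k in keys)
salted = Counter(
    partition_of(salted_key(k, i, {"hot"}, 16), num_partitions)
    for i, k in enumerate(keys)
)
print("largest partition without salting:", max(plain.values()))
print("largest partition with salting:", max(salted.values()))
```

Salting changes the key, so an aggregation needs a second step that combines the partial results per original key, and a join needs the other side replicated once per salt value.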