Google Cloud PDE glossary of pipelines, storage, modeling, governance, and analytics terms.

Use this glossary when Google Cloud Professional Data Engineer (PDE) terms start to blur together. The goal is practical recognition, not encyclopedic coverage.

| Term | Exam meaning |
|---|---|
| BigQuery | Serverless analytical warehouse for SQL analytics at scale. |
| Pub/Sub | Asynchronous messaging service for event ingestion that decouples producers from consumers. |
| Dataflow | Managed stream and batch processing based on Apache Beam. |
| Dataproc | Managed Spark and Hadoop service. |
| Partitioning | Data layout technique that splits a table by a partition key so queries filtering on that key scan only the matching partitions. |
| Lineage | Record of where data came from and how it changed. |
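
The partitioning entry above can be made concrete with a small sketch. It is plain Python, not BigQuery code: the rows, the `event_date` key, and the helper names `build_partitions` and `query` are all hypothetical, chosen only to show why a filter on the partition key reduces the number of rows read.

```python
from collections import defaultdict
from datetime import date


def build_partitions(rows, key):
    """Group rows into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return partitions


def query(partitions, wanted_keys, predicate=lambda row: True):
    """Read only the requested partitions; return matching rows and rows scanned."""
    scanned = 0
    matches = []
    for k in wanted_keys:
        for row in partitions.get(k, []):
            scanned += 1
            if predicate(row):
                matches.append(row)
    return matches, scanned


# Hypothetical event rows: three days, four events per day.
events = [
    {"event_date": date(2024, 1, day), "user": f"u{n}", "amount": n * 10}
    for day in (1, 2, 3)
    for n in range(4)
]
partitions = build_partitions(events, "event_date")

# Filtering on the partition key touches one partition instead of all three.
matches, scanned = query(
    partitions, [date(2024, 1, 2)], lambda r: r["amount"] >= 20
)
print(f"scanned {scanned} of {len(events)} rows")  # scanned 4 of 12 rows
```

The same reasoning is what an exam scenario usually tests: a filter on the partition key limits scanned data, while a filter on any other column still reads every partition.
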

| Pair | How to separate them |
|---|---|
| Data ingestion vs storage and modeling | Ask which layer the scenario is testing, then match the answer to that layer only. |
| Control vs evidence | A control changes behavior; evidence proves behavior or supports investigation. |
| Managed service vs custom build | Managed services win on lower operational effort unless a requirement calls for customization the service does not support. |
| Prevention vs detection | Prevention blocks or reduces a bad event; detection finds or reports that it happened. |
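
The control vs evidence and prevention vs detection rows can be illustrated with a toy sketch. Everything here is hypothetical (the in-memory `store`, the `audit_log` list, and both function names); it does not model any Google Cloud service, only the distinction between blocking a bad write, finding a problem afterwards, and keeping a record that proves what happened.

```python
def write_record(store, audit_log, record):
    """Preventive control: reject a record with a missing id before it is stored."""
    if not record.get("id"):
        audit_log.append({"action": "rejected", "record": record})
        raise ValueError("record has no id")
    store.append(record)
    audit_log.append({"action": "written", "record": record})


def detect_duplicates(store):
    """Detective check: report ids that appear more than once, after the fact."""
    seen, dupes = set(), set()
    for record in store:
        if record["id"] in seen:
            dupes.add(record["id"])
        seen.add(record["id"])
    return sorted(dupes)


store, audit_log = [], []
write_record(store, audit_log, {"id": "a"})
write_record(store, audit_log, {"id": "a"})  # allowed in: nothing prevents duplicates
try:
    write_record(store, audit_log, {"id": ""})  # blocked by the preventive control
except ValueError:
    pass

duplicates = detect_duplicates(store)  # detection finds what prevention missed
actions = [entry["action"] for entry in audit_log]  # evidence of each attempt
```

In this sketch the id check changes behavior (a control), the duplicate scan only reports (detection), and the audit log changes nothing but supports later investigation (evidence).
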

Do not memorize terms in isolation. For each term, write one scenario where it is the best answer, one scenario where it is a distractor, and one signal that proves it worked.