AWS DEA-C01 Glossary: Lakehouse, Catalog, and Ingestion Terms

March 30, 2026

AWS DEA-C01 glossary of lakehouse, catalog, ingestion, transformation, and governance terms.

Use this glossary when streaming, lake, catalog, and warehouse terms start to blur together. Keep it beside the Cheat Sheet and Resources pages; it supports, but does not replace, studying the actual service trade-offs.

Term	Short meaning
Data lake	Centralized storage pattern that keeps raw and curated data available for many engines and consumers
Data warehouse	Analytics platform optimized for structured SQL reporting and performance-tuned query workloads
CDC	Change data capture, where inserts, updates, and deletes are emitted as downstream change events
Backfill	Reprocessing historical data to fill gaps or rebuild downstream tables
Schema evolution	Controlled change in table or event structure over time
Partition pruning	Query engine reading only the partitions needed for a specific filter
Glue Data Catalog	AWS metadata catalog used by services such as Athena, Glue, and Redshift Spectrum
Crawler	Glue process that discovers schema and partitions from source data
Job bookmark	Glue mechanism that tracks processed state so that later job runs skip data already handled
Checkpoint	Persisted processing position used to resume or replay safely
Lake Formation	AWS governance layer for permissions and controls on S3-backed data lakes
Dimensional model	Analytics modeling pattern built around facts and dimensions for reporting
UNLOAD	Redshift command that exports query results or table data to Amazon S3
TTL	Time to live setting that lets DynamoDB expire items automatically after a validity window
Skew	Uneven data distribution that makes one worker or partition handle far more work than others
Lineage	Record of where data came from and how it changed across the pipeline
Least privilege	Granting only the actions and resource scope a workload really needs
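
Partition pruning is easier to remember with a concrete picture. The sketch below is a minimal illustration in plain Python, not how Athena or any AWS engine is implemented: it assumes Hive-style `dt=YYYY-MM-DD` prefixes, and the bucket name, prefix list, and function names are all hypothetical. It shows the core idea that a filter on the partition key decides which prefixes are read at all.

```python
from datetime import date

# Hypothetical Hive-style partition prefixes under a table location (illustrative only).
PARTITIONS = [
    "s3://example-bucket/events/dt=2026-03-28/",
    "s3://example-bucket/events/dt=2026-03-29/",
    "s3://example-bucket/events/dt=2026-03-30/",
]


def partition_value(prefix: str, key: str = "dt") -> date:
    """Extract the date stored in the `key=value` segment of a prefix."""
    for segment in prefix.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == key:
            return date.fromisoformat(value)
    raise ValueError(f"no {key}= segment in {prefix!r}")


def prune(prefixes, start: date, end: date):
    """Keep only prefixes whose partition date falls within [start, end]."""
    return [p for p in prefixes if start <= partition_value(p) <= end]


# A filter such as "dt BETWEEN 2026-03-29 AND 2026-03-30" touches two prefixes, not three.
selected = prune(PARTITIONS, date(2026, 3, 29), date(2026, 3, 30))
```

The filter is evaluated against partition metadata before any data files are opened, which is why a query that filters on the partition key scans less data than one that does not.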

Commonly confused pairs

Pair	Keep this distinction clear
Athena vs Redshift	serverless SQL on S3 versus managed warehouse for broader analytical workloads
crawler vs explicit schema	automatic discovery versus manual metadata control
checkpoint vs bookmark	generic stream or pipeline progress marker versus Glue-specific processed-state tracking
CDC vs full load	incremental source changes versus complete dataset copy
governance vs encryption	access and audit control versus protection of data at rest or in transit
Lake Formation vs IAM	governed lake-data permissions versus baseline AWS identity and service permissions
Glue vs EMR	managed/serverless ETL path versus cluster-control big-data processing
masking vs encryption	obfuscating exposed values versus cryptographically protecting stored or transmitted data
EventBridge vs Step Functions	triggering or routing events versus coordinating multi-step workflow logic
versioning vs lifecycle	object-history recovery versus age-based storage-tiering or expiration rules
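
The CDC versus full load distinction can be sketched in a few lines. This is a simplified in-memory illustration, assuming a hypothetical event shape of (operation, primary key, row); real CDC tools emit richer records, and the function names here are made up for the example.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical change-event shape: (operation, primary_key, row_or_None).
ChangeEvent = Tuple[str, int, Optional[dict]]


def full_load(source: Dict[int, dict]) -> Dict[int, dict]:
    """Full load: copy the complete source dataset, ignoring prior target state."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(target: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """CDC: apply insert, update, and delete events in order to the existing target."""
    result = dict(target)
    for op, key, row in events:
        if op in ("insert", "update"):
            result[key] = row
        elif op == "delete":
            result.pop(key, None)  # tolerate deletes for keys that are already gone
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


target = full_load({1: {"name": "a"}, 2: {"name": "b"}})
events = [
    ("update", 1, {"name": "a2"}),
    ("delete", 2, None),
    ("insert", 3, {"name": "c"}),
]
incremental = apply_changes(target, events)
```

The full load moves every row each time, while the CDC path moves only three events and still ends with the current state. Event order matters in the CDC path, which is where checkpoints and bookmarks come in.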

If three terms blur together

Blur cluster	Keep this separation clear
Athena / Redshift / QuickSight	query engine on S3 / warehouse analytics engine / BI presentation layer
EventBridge / Step Functions / SNS	trigger / orchestrator / notification fan-out
KMS / Lake Formation / IAM	key management / governed lake access / baseline AWS permissions
crawler / catalog / business catalog	discovery mechanism / technical metadata layer / ownership-lineage-governance context

If the confusion is really about…

Topic family	Best page to revisit
service fit and high-yield trade-offs	Cheat Sheet
current AWS facts and primary docs	Resources
pacing and review order	Study Plan
overall exam framing	Guide root

Revised on Monday, June 15, 2026
