Study guide for DEA-C01, Choosing Data Stores for Access Patterns: key concepts, common traps, and exam decision cues.
Store choice is one of the highest-yield DEA-C01 decision types. The right store depends on access pattern, latency, schema flexibility, cost, and the difference between landing, serving, and warehouse-style analytics.
Access pattern: The way applications or analysts read, write, filter, update, or aggregate data in practice.
Lake storage: Durable object storage that keeps raw or curated files for broad analytical use.
Serving store: Store optimized for the way an application or workload consumes data, such as low-latency key lookups or relational transactions.
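To make the idea of an access pattern concrete, the sketch below contrasts two reads over the same records: a single-item lookup by key and an aggregate across many rows. It uses a Python dict and an in-memory SQLite database purely as local stand-ins (neither is an AWS service), and the `orders` records and field names are hypothetical.

```python
import sqlite3

# Hypothetical sample records; names and values are illustrative only.
orders = [
    {"order_id": "o-1", "customer": "alice", "amount": 30},
    {"order_id": "o-2", "customer": "bob", "amount": 20},
    {"order_id": "o-3", "customer": "alice", "amount": 50},
]

# Access pattern 1: fetch one item by its key (key-value style).
by_id = {o["order_id"]: o for o in orders}
single = by_id["o-2"]

# Access pattern 2: aggregate across many rows (analytical SQL style).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (:order_id, :customer, :amount)", orders
)
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
)
conn.close()

print(single["amount"])
print(totals)
```

The data is identical in both cases; only the way it is read differs, and that difference is what drives the store choice.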
DEA-C01 expects you to distinguish the following needs and match each to its strongest first-fit service:
| Need | Strongest first fit | Why |
|---|---|---|
| object-based lake storage | Amazon S3 | The requirement is durable object storage for raw or curated data |
| warehouse analytics | Amazon Redshift | DEA-C01 expects a warehouse answer for large analytical SQL workloads |
| managed relational pattern | Amazon RDS or Aurora | The need is relational structure, transactions, or familiar SQL application behavior |
| key-value / low-latency access | Amazon DynamoDB | The access pattern is high-scale lookup, not relational join logic |
| search-oriented text retrieval | Amazon OpenSearch Service | The need is search and indexing rather than warehouse analytics |
| If the stem emphasizes… | Think first | Why this fits |
|---|---|---|
| raw or curated files, broad durability, low-cost object storage | S3 | This is lake storage, not necessarily the serving layer |
| complex SQL over modeled analytical data | Redshift | The workload is warehouse analytics |
| transactional rows, joins, and application consistency | RDS or Aurora | This is a relational serving workload |
| high-scale key lookups or access by primary key | DynamoDB | The workload is low-latency key-value access |
| text search, ranked results, or indexing | OpenSearch | This is search, not warehouse or OLTP access |
```mermaid
flowchart LR
    A["Data workload"] --> B{"How is data consumed?"}
    B -->|Files and lake-style storage| C["S3"]
    B -->|Warehouse SQL analytics| D["Redshift"]
    B -->|Relational transactions and joins| E["RDS / Aurora"]
    B -->|Low-latency key access| F["DynamoDB"]
    B -->|Search and indexing| G["OpenSearch"]
```
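The same decision flow can be written as a small lookup, which may help when drilling the cues. This is only a study aid: the service names come from the tables in this guide, while the cue keys (such as `key_value_lookup`) and the `first_fit` function are hypothetical labels invented for this sketch.

```python
# First-fit mapping taken from this guide's tables; cue keys are hypothetical.
FIRST_FIT = {
    "lake_files": "Amazon S3",
    "warehouse_sql": "Amazon Redshift",
    "relational_transactions": "Amazon RDS or Aurora",
    "key_value_lookup": "Amazon DynamoDB",
    "text_search": "Amazon OpenSearch Service",
}


def first_fit(cue: str) -> str:
    """Return the first-fit store for a consumption cue; raise on unknown cues."""
    try:
        return FIRST_FIT[cue]
    except KeyError:
        raise ValueError(f"unknown cue: {cue!r}") from None


print(first_fit("key_value_lookup"))
```

A flat lookup is enough here because each cue maps to exactly one first thought; real designs weigh latency, cost, and schema flexibility together, as described at the start of this guide.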
| Question | Better reading |
|---|---|
| “Will this mostly be queried as files in a lake?” | Think S3-based storage plus the right query layer |
| “Do analysts need warehouse-style SQL performance on modeled data?” | Think Redshift |
| “Does the application need relational consistency and joins?” | Think RDS or Aurora |
| “Is the dominant need millisecond key-value access at scale?” | Think DynamoDB |
| “Do users need full-text or indexed search behavior?” | Think OpenSearch instead of forcing a warehouse to do search work |
| Role | Typical first thought |
|---|---|
| raw landing zone | S3 |
| application serving store | DynamoDB or RDS/Aurora depending on the access pattern |
| analytical serving store | Redshift or a lake query pattern |
DEA-C01 often rewards answers that keep these roles separate instead of forcing a single store to handle every access pattern badly.
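As a rough illustration of keeping roles separate, the sketch below flags any store that a design assigns to more than one role. The `overloaded_stores` helper and both example designs are hypothetical. A flagged store is a prompt to re-read the access patterns, not proof of a wrong answer, since one store can legitimately cover two roles (for example, S3 as both the landing zone and the base of a lake query pattern).

```python
from collections import defaultdict


def overloaded_stores(design: dict[str, str]) -> dict[str, list[str]]:
    """Return stores assigned to more than one role, with the roles they carry."""
    roles_by_store = defaultdict(list)
    for role, store in design.items():
        roles_by_store[store].append(role)
    return {s: r for s, r in roles_by_store.items() if len(r) > 1}


# Hypothetical design that gives each role its own store.
separated = {
    "raw landing zone": "S3",
    "application serving store": "DynamoDB",
    "analytical serving store": "Redshift",
}
# Hypothetical design that forces one store into every role.
single_store = {role: "S3" for role in separated}

print(overloaded_stores(separated))
print(overloaded_stores(single_store))
```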
| Trap | Better reading |
|---|---|
| “S3 is enough for every workload.” | S3 is foundational lake storage, not always the final serving store. |
| “Redshift is just another database.” | DEA-C01 expects you to read it as a warehouse analytics engine. |
| “DynamoDB is a relational replacement by default.” | DynamoDB fits key-value and high-scale access patterns, not every relational workload. |
| “RDS and Aurora are the same as analytical warehouses.” | They solve transactional relational needs, not the same warehouse-style problem as Redshift. |
When several stores could technically work, choose in this order:
A company lands raw clickstream files in a lake, serves user-profile lookups to an application with low latency, and also needs modeled analytical SQL for business reporting. What is the strongest reading first?
Correct answer: B. DEA-C01 expects you to match different stores to different access patterns instead of forcing one store to solve unrelated workloads.