Study DEA-C01 Ingestion Patterns, Sources and Triggers: key concepts, common traps, and exam decision cues.
Most DEA-C01 pipeline questions start with the ingestion pattern. The exam usually tests less whether you remember a service name and more whether you can classify the workload as batch, streaming, CDC, file-drop, or API-driven ingestion.
CDC: Change data capture, where inserts, updates, and deletes are captured from a source system as changes happen.
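
To make the definition concrete, here is a minimal sketch of applying a stream of change records to a keyed copy of a table. The record shape (`op`, `key`, `row`) and the function name `apply_changes` are assumptions chosen for illustration. They do not reproduce the output format of any particular CDC service.

```python
from typing import Any


def apply_changes(
    target: dict[int, dict[str, Any]],
    changes: list[dict[str, Any]],
) -> dict[int, dict[str, Any]]:
    """Apply ordered change records (hypothetical shape) to a keyed table copy."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Upsert: the most recent row image for a key wins.
            target[key] = change["row"]
        elif op == "delete":
            # The delete arrives as an explicit change record.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return target


orders = {1: {"status": "new"}}
changes = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 1},
]
result = apply_changes(orders, changes)
print(result)
```

Only the changed rows travel through the pipeline, and they must be applied in source order. This is what separates CDC from repeated full copies.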
DEA-C01 often hides the right ingestion answer behind target-store noise. The strongest first move is to classify the source behavior before you think about Glue, Redshift, Spark, or dashboards.
| Need | Strongest first fit |
|---|---|
| durable high-volume event stream | Amazon Kinesis or Amazon MSK pattern |
| scheduled or file-based bulk load | S3 landing plus scheduled processing |
| database changes over time | CDC with services such as AWS DMS |
| application-driven request/response data movement | API or service integration pattern |
| If the stem emphasizes… | Think first | Why this fits |
|---|---|---|
| nightly drops, periodic loads, or low-cost delayed processing | Batch ingestion | Time-windowed data movement is the center of gravity. |
| ordered events, replay, or near-real-time consumers | Streaming ingestion | Event stream semantics matter more than the final store. |
| inserts, updates, and deletes from a source database | CDC | The workload is about tracking changes, not moving full copies. |
| “when a file lands, start processing” | Event-triggered ingestion | The trigger path matters as much as the load itself. |
| one system calling another to submit data | API-driven ingestion | The pattern is service integration, not passive file landing. |
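
The "ordered events, replay" row is easier to recognize with a toy model of stream semantics in mind. The sketch below is an in-memory stand-in, not the Kinesis or MSK API. The class `EventLog` and its methods are hypothetical names used only to show that events keep their order and that reading does not remove them.

```python
class EventLog:
    """Append-only log: events keep arrival order and are not removed on read."""

    def __init__(self) -> None:
        self._events: list[str] = []

    def append(self, event: str) -> int:
        self._events.append(event)
        return len(self._events) - 1  # position assigned to the new event

    def read_from(self, position: int) -> list[str]:
        # Any consumer can re-read from any earlier position (replay).
        return self._events[position:]


log = EventLog()
for name in ["order-created", "order-paid", "order-shipped"]:
    log.append(name)

dashboard_view = log.read_from(0)  # a new consumer replays from the start
alerting_view = log.read_from(2)   # another consumer resumes from its own position
print(dashboard_view, alerting_view)
```

If a stem needs this behavior (several independent consumers, replay after a failure), that points to a streaming pattern whatever the final store is.
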
| Situation | Stronger first answer |
|---|---|
| scheduled overnight file movement | batch ingestion |
| continuous ordered event flow with replay needs | streaming ingestion |
| changed rows from a relational source | CDC |
| service submits records one request at a time | API-driven ingestion |
| new object arrival starts downstream work | event-triggered ingestion |
flowchart LR
A["Source system"] --> B{"What actually arrives?"}
B -->|Periodic files or exports| C["Batch ingestion"]
B -->|Continuous event stream| D["Streaming ingestion"]
B -->|DB changes only| E["CDC"]
B -->|Request/response submissions| F["API-driven ingestion"]
C --> G["S3 landing and scheduled processing"]
D --> H["Kinesis or MSK pattern"]
E --> I["DMS or CDC pipeline"]
F --> J["Service integration path"]
Choose the ingestion pattern before the transformation engine. If the stem emphasizes replay, ordering, late arrivals, or near-real-time processing, ingestion semantics matter more than the eventual target store.
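
The decision flow above can be sketched as a small lookup: classify what arrives first, then read off the first-fit landing path. The arrival labels (`periodic_files`, `continuous_events`, and so on) and the function name `classify` are assumptions made for this illustration. The patterns and first-fit descriptions are taken from this section.

```python
from enum import Enum


class Pattern(Enum):
    BATCH = "batch ingestion"
    STREAMING = "streaming ingestion"
    CDC = "CDC"
    API = "API-driven ingestion"


# Hypothetical labels for "what actually arrives?" mapped to a pattern.
ARRIVAL_TO_PATTERN = {
    "periodic_files": Pattern.BATCH,
    "continuous_events": Pattern.STREAMING,
    "db_changes": Pattern.CDC,
    "request_response": Pattern.API,
}

# First-fit landing path for each pattern, as listed in this section.
FIRST_FIT = {
    Pattern.BATCH: "S3 landing and scheduled processing",
    Pattern.STREAMING: "Kinesis or MSK pattern",
    Pattern.CDC: "DMS or CDC pipeline",
    Pattern.API: "Service integration path",
}


def classify(arrival: str) -> tuple[Pattern, str]:
    """Pick the pattern from source behavior only; the target store is not an input."""
    pattern = ARRIVAL_TO_PATTERN[arrival]
    return pattern, FIRST_FIT[pattern]


print(classify("db_changes"))
```

The function deliberately takes no parameter for the target store or the transformation engine, mirroring the advice to settle the ingestion pattern first.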
When two ingestion answers both look plausible, use this order:
One common DEA-C01 trap is assuming that only one pattern can apply to a workflow. A workflow can be batch ingestion at the source level and still use an event trigger when new batch files arrive. The deciding questions remain how the source behaves and how fresh the data must be.
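
As an illustration of batch files arriving with an event trigger, the sketch below shows a Lambda-style handler that reacts to an S3 object-created notification. The function `start_batch_job`, the bucket name, and the key are hypothetical placeholders. The handler reads the standard `Records` layout of S3 event notifications and decodes the object key, which S3 delivers URL-encoded.

```python
from urllib.parse import unquote_plus


def start_batch_job(bucket: str, key: str) -> str:
    # Hypothetical placeholder: a real pipeline might start a job or workflow here.
    return f"s3://{bucket}/{key}"


def handler(event, context=None):
    """Start downstream work for each newly created object in the event."""
    started = []
    for record in event.get("Records", []):
        # Ignore anything that is not an object-created notification.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded, with spaces encoded as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        started.append(start_batch_job(bucket, key))
    return started


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-landing-bucket"},
                "object": {"key": "exports/catalog+part+1.csv"},
            },
        }
    ]
}
started = handler(sample_event)
print(started)
```

The source still delivers batch files. Only the start of processing is event-driven rather than scheduled.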
| Trap | Better reading |
|---|---|
| “The target is Redshift, so the answer must start with Redshift.” | The first decision is still how data arrives: batch, streaming, CDC, or API-driven. |
| “Near-real-time means Lambda no matter what.” | If the main challenge is durable stream ingestion, you still need a real streaming pattern. |
| “We only need the latest rows from a database, so run full reloads very often.” | If the question is about changed rows over time, CDC is usually the stronger pattern. |
| “A new object in S3 should kick off work, so this is just batch.” | The stem may really be testing event-triggered ingestion behavior. |
A retail company exports a large product catalog every night, but order updates must also appear in downstream dashboards within seconds. The strongest answer usually separates the lanes: batch ingestion for the nightly catalog load and a streaming or CDC pattern for the live order changes. DEA-C01 rewards answers that do not force one ingestion model onto all data.