AWS AIP-C01 Cheat Sheet: Bedrock, RAG, Agents, and Evaluation

April 24, 2026

AWS AIP-C01 cheat sheet for Bedrock, RAG, agents, prompt governance, safety, evaluation, model routing, operations, optimization, and final review traps.

Use this cheat sheet for AWS Certified Generative AI Developer - Professional (AIP-C01) when you already understand the basic vocabulary and need faster scenario decisions. The exam is not about naming every AI service. It is about building production generative AI systems that are grounded, secure, observable, cost-aware, and recoverable.

Quick facts (AIP-C01)

I verified these current AWS exam facts on May 24, 2026.

Item	Value
Exam	AWS Certified Generative AI Developer - Professional
Exam code	AIP-C01
Level	Professional
Questions	75 total
Scored questions	65
Unscored questions	10, not identified on the exam
Time	180 minutes
Passing score	750, scaled 100-1000
Question types	Multiple choice and multiple response
Cost	300 USD

Domain weights and review priority

Domain	Weight	What to compress for final review
Foundation Model Integration, Data Management, and Compliance	31%	model fit, RAG design, vector stores, data prep, compliance boundaries
Implementation and Integration	26%	Bedrock APIs, agents, Knowledge Bases, app integration, orchestration, deployment
AI Safety, Security, and Governance	20%	guardrails, IAM, KMS, private access, responsible AI, audit evidence
Operational Efficiency and Optimization for GenAI Applications	12%	token cost, latency, throughput, caching, quotas, monitoring, rollback
Testing, Validation, and Troubleshooting	11%	model evaluation, regression tests, hallucination diagnosis, logs, traces, failure analysis

The first two domains are more than half of the scored blueprint. Do not study safety and operations as separate side topics; they are usually embedded inside model, RAG, and agent implementation scenarios.
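
As a rough planning aid, the weights above can be turned into approximate scored-question counts. This is only a sketch: it assumes the 65 scored questions are split in proportion to the published weights, which is not guaranteed for any single exam form, and the shortened domain labels are mine.

```python
# Domain weights from the table above (percent of scored content).
DOMAIN_WEIGHTS = {
    "FM integration, data, compliance": 31,
    "Implementation and integration": 26,
    "Safety, security, governance": 20,
    "Operational efficiency": 12,
    "Testing and troubleshooting": 11,
}
SCORED_QUESTIONS = 65  # from the quick facts table


def approx_questions(weights, scored):
    """Estimate scored questions per domain, assuming a proportional split."""
    return {name: round(scored * pct / 100, 1) for name, pct in weights.items()}


estimate = approx_questions(DOMAIN_WEIGHTS, SCORED_QUESTIONS)
first_two_share = sum(list(DOMAIN_WEIGHTS.values())[:2])

for name, count in estimate.items():
    print(f"{name}: ~{count} questions")
print(f"First two domains: {first_two_share}% of the scored blueprint")
```

Under that assumption the first two domains account for roughly 37 of the 65 scored questions, which is why they deserve the largest share of final review time.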

Production GenAI proof stack

AIP-C01 questions usually ask whether a GenAI application is production-ready, not whether a model can answer one prompt. Keep this stack in mind before choosing a Bedrock feature or architecture:

Task fit: direct inference, RAG, agent, extraction, evaluation, or workflow orchestration must match the business outcome.
Context path: source data, chunks, embeddings, metadata filters, prompt variables, and retrieved evidence must be accurate, current, and authorized.
Tool boundary: agents and function calls need scoped tools, IAM, input validation, state handling, idempotency, approvals, and failure behavior.
Safety and data boundary: guardrails, prompt-injection defenses, IAM, KMS, private access, log controls, retention, and responsible AI controls must protect prompts, context, tools, and outputs.
Operational evidence: metrics, traces, invocation logs, token/cost signals, retrieval quality, safety events, version comparisons, and alarms must show what is happening.
Evaluation and rollback: golden datasets, RAG tests, agent task completion, release gates, canaries, and rollback criteria must prove quality before and after release.

If an answer improves output quality but ignores authorization, observability, evaluation, or rollback, it is usually too demo-oriented for this professional exam.

Official task compression

Foundation Model Integration, Data Management, and Compliance covers requirements, FM selection, data validation, vector stores, retrieval, and prompt governance. Under time pressure, start with task fit, data path, retrieval quality, prompt lifecycle, and compliance boundary before choosing a model.

Implementation and Integration covers agents, model deployment, enterprise integration, FM APIs, application patterns, and developer tools. Production integration includes tool contracts, state, streaming, asynchronous paths, CI/CD, retry, fallback, and observability.

AI Safety, Security, and Governance covers input/output controls, privacy, governance, and responsible AI. Guardrails are one control, not the whole answer; pair them with IAM, KMS, data filtering, audit, and evaluation.

Operational Efficiency and Optimization covers token efficiency, model choice, caching, latency, throughput, and monitoring. Optimize from evidence: tokens, retrieval time, tool time, model latency, concurrency, quotas, and quality impact.

Testing, Validation, and Troubleshooting covers evaluation systems, regression gates, retrieval quality, agent performance, and troubleshooting. A prompt that worked once is not production proof; use datasets, quality gates, traces, and regression checks.

Read every GenAI scenario in this order

1. Identify the workload: direct model call, retrieval augmented generation (RAG), agent, content generation, extraction, evaluation, or operations.
2. Identify the failure risk: hallucination, stale context, unsafe output, data exposure, latency, cost, permissions, or poor observability.
3. Choose the smallest production pattern that fixes that risk.
4. Add the missing control: IAM, KMS, VPC endpoint, guardrail, logging, evaluation, retry, human review, or rollback.
5. Reject answers that improve model output but ignore data boundary, quality evidence, or operational ownership.

Question-type traps

Question type	Exam-day habit
Multiple choice	Find the deciding constraint before comparing services.
Multiple response	Count requirements in the stem; a correct-looking answer that misses one requirement is still wrong.
Sequencing distractors	Even without a formal ordering item, some answer choices describe steps in the wrong operational order. Validate permissions, data boundaries, deployment, and rollback before production exposure.
Capability distractors	Match the failure mode to the control: retrieval, guardrail, IAM, KMS, evaluation, monitoring, or workflow approval.

Unanswered questions are scored as incorrect, and there is no penalty for guessing. Mark long multi-requirement items, answer what you can, and return if time remains.

GenAI decision flow

Use this compact flow when the stem mixes model choice, grounding, safety, and operational controls.

    flowchart TD
        S["Scenario"] --> W["Workload: prompt, RAG, agent, extraction, eval, or ops?"]
        W --> R["Risk: hallucination, unsafe output, stale facts, exposure, cost, or latency?"]
        R --> P["Pattern: prompt, RAG, agent, guardrail, eval, or workflow?"]
        P --> C["Control: IAM, KMS, logging, review, retry, or rollback"]
        C --> E["Evidence: citations, telemetry, approval, or audit trail"]

Rule: if the answer choice names a real AWS GenAI feature but misses the named risk, it is the wrong answer.

AIP-C01 answer sequence

Use this when the stem mixes app integration, grounding, tools, governance, and evaluation.

    flowchart TD
        S["Scenario"] --> T["Task: answer, retrieve, act, extract, evaluate, or operate?"]
        T --> G["Need grounding, tools, workflow state, or human review?"]
        G --> P["Pattern: prompt, RAG, agent, pipeline, or workflow"]
        P --> C["Controls: IAM, KMS, VPC endpoint, guardrail, validation, logging"]
        C --> V["Evidence: citations, eval score, trace, alarm, approval, or rollback"]

Professional-level answers rarely stop at “call the model.” They include the data path, permissions, failure behavior, observability, and quality evidence that make the application safe to run in production.

Final answer stack

When two answers both name plausible AWS AI features, keep the one that satisfies the whole production loop:

Name the failure mode first. Is the issue stale knowledge, hallucination, unsafe output, overbroad tool access, token cost, latency, privacy exposure, or unmeasured quality?
Fix grounding before model size. Current or private facts usually point to RAG, source sync, metadata, retrieval quality, and citations before a larger model or customization.
Constrain actions before autonomy. Agentic answers need tool contracts, IAM, validation, approvals, state, and audit before sophistication.
Treat context as sensitive data. Source documents, chunks, embeddings, prompts, outputs, logs, and evaluation sets can all carry regulated or tenant data.
Optimize from measurement. Use token counts, retrieval latency, model latency, tool latency, concurrency, cache hit rate, and quality impact before buying capacity.
Validate every release. Compare prompt, model, index, parameter, and tool versions with representative evaluations, safety checks, and rollback criteria.

GenAI architecture chooser

Requirement in the stem	Start with	Reject answers that…
Answer questions from enterprise documents	RAG with governed source data, embeddings, retrieval, and citations where required	only tune the prompt or choose a larger model
Execute multi-step business tasks	Agent pattern with explicit tools, permissions, state, and failure handling	let the agent call broad tools without audit or constraints
Generate controlled marketing, support, or code output	Prompt template, guardrails, evaluation set, and human approval where impact is high	rely on one manual prompt test
Extract structured fields from text or documents	Purpose-built extraction pipeline, validation, and schema checks	ask a chat model to infer structure without verification
Improve answer quality over time	Evaluation dataset, groundedness checks, regression tests, and telemetry	change prompts randomly after user complaints
Reduce latency or cost	Model fit, prompt length, retrieval size, caching, batching, and token control	scale capacity before measuring the bottleneck

Production RAG and agent architecture map

Use this mental model when a stem mixes enterprise data, tool use, safety, and monitoring.

    flowchart LR
        U["User or business workflow"] --> A["Application/API layer"]
        A --> I["AuthZ and tenant context"]
        I --> R["Retriever with metadata filters"]
        R --> V["Vector store / knowledge base"]
        V --> C["Grounded context with source IDs"]
        C --> M["Foundation model"]
        A --> T["Agent tools when action is required"]
        T --> P["Scoped IAM role and input validation"]
        M --> G["Guardrails and output checks"]
        G --> O["Response with citations or safe refusal"]
        A --> L["Logs, traces, evals, and cost metrics"]

What to notice:

authorization happens before retrieval, not after the model generates an answer
tool permissions are separate from model permissions
guardrails reduce unsafe behavior but do not replace source filtering or IAM
evaluation and telemetry are part of the production path, not a postmortem luxury
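
The first point, authorization before retrieval, can be sketched in plain Python. This is a minimal illustration and not a Bedrock API: the `Chunk` type, the `authorized_retrieve` function, and the sample corpus are all hypothetical, and the similarity scores stand in for whatever a real vector search would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_id: str
    tenant: str
    allowed_groups: frozenset
    text: str
    score: float  # similarity score from a hypothetical vector search


def authorized_retrieve(chunks, tenant, user_groups, top_k=3):
    """Filter by tenant and group BEFORE ranking, so unauthorized text
    never reaches the prompt."""
    visible = [
        c for c in chunks
        if c.tenant == tenant and c.allowed_groups & set(user_groups)
    ]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


corpus = [
    Chunk("hr-001", "acme", frozenset({"hr"}), "Salary band policy", 0.95),
    Chunk("kb-010", "acme", frozenset({"support", "hr"}), "Password reset steps", 0.80),
    Chunk("kb-777", "globex", frozenset({"support"}), "Other tenant document", 0.99),
]

hits = authorized_retrieve(corpus, tenant="acme", user_groups=["support"])
print([c.source_id for c in hits])
```

The highest-scoring chunk belongs to another tenant and the second-highest to a group the user is not in, so neither is returned. Filtering after generation would be too late, because the model would already have seen the text.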

Foundation model selection cues

Stem clue	Strong first decision
low latency, high volume, simple extraction or summarization	smaller or faster model if quality target is still met
complex reasoning over long context	model with stronger reasoning/context fit, plus retrieval discipline
regulated content or brand-sensitive output	model fit plus guardrails, evaluation, and human review where impact is high
model availability varies by Region	deployment pattern with regional fit, fallback, or graceful degradation
cost is rising with tokens	compress prompts, reduce retrieved chunks, cache repeated work, and cap output length before scaling capacity
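
To see why the last row favors trimming tokens before scaling capacity, a back-of-the-envelope estimate helps. The per-token prices and token counts below are hypothetical placeholders, not real Bedrock pricing; only the arithmetic is the point.

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one request given per-1,000-token prices."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


# Hypothetical prices in USD per 1,000 tokens; not real Bedrock pricing.
PRICE_IN, PRICE_OUT = 0.003, 0.015

baseline = estimate_cost(6000, 800, PRICE_IN, PRICE_OUT)
# Fewer retrieved chunks and a capped output length.
trimmed = estimate_cost(2500, 400, PRICE_IN, PRICE_OUT)

print(f"baseline: ${baseline:.4f} per request")
print(f"trimmed:  ${trimmed:.4f} per request")
print(f"saving:   {100 * (1 - trimmed / baseline):.0f}%")
```

With these assumed numbers the trimmed request costs a little under half of the baseline. The saving only counts if quality holds, so a change like this still has to pass an evaluation gate.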

Model routing and deployment chooser

Requirement	Strong first fit	Watch for
simple on-demand model invocation	Bedrock API call from app, Lambda, container, or service integration	retries, timeouts, throttling, request validation, logging, and cost controls
predictable high-volume workload	provisioned throughput or capacity-planned endpoint where justified	do not buy capacity before measuring token volume, latency, and utilization
model fallback or provider switching	abstraction layer, AppConfig, API Gateway, Lambda, or routing logic	failover must preserve prompt contract, safety controls, and observability
long-running or asynchronous workflow	SQS, Step Functions, EventBridge, or callback pattern	idempotency, partial failure, replay, and user notification
real-time response UX	streaming APIs, WebSockets, or server-sent events where appropriate	streaming improves perceived latency but does not fix poor retrieval or unsafe output
domain-specific customized model	SageMaker AI endpoint or managed customization path where appropriate	lifecycle, versioning, rollback, registry, and evaluation gates still matter
multi-step model/tool orchestration	agent, Step Functions, Prompt Flows (now Amazon Bedrock Flows), or explicit workflow	state, stopping conditions, tool permissions, and circuit breakers are part of the answer
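
The fallback row can be sketched as a small routing function. This is an illustration under assumptions: `ThrottledError`, `invoke_with_fallback`, and the two stub model clients are hypothetical names, and the contract check is deliberately simplistic. The idea is that a fallback model's output must pass the same contract check as the primary's, and every attempt is recorded for observability.

```python
class ThrottledError(Exception):
    """Stand-in for a throttling or capacity error from a model provider."""


def invoke_with_fallback(prompt, models, validate):
    """Try each model in order; accept the first output that passes the
    shared contract check. `models` is a list of (name, callable) pairs."""
    attempts = []
    for name, call in models:
        try:
            output = call(prompt)
        except ThrottledError:
            attempts.append((name, "throttled"))
            continue
        if validate(output):
            attempts.append((name, "ok"))
            return output, attempts
        attempts.append((name, "contract_violation"))
    raise RuntimeError(f"all models failed: {attempts}")


def primary(prompt):      # hypothetical primary model client
    raise ThrottledError()


def secondary(prompt):    # hypothetical fallback model client
    return '{"answer": "42"}'


def is_json_object(text):
    # Crude contract check used only for this illustration.
    return text.strip().startswith("{") and text.strip().endswith("}")


result, trail = invoke_with_fallback(
    "What is the answer?", [("primary", primary), ("secondary", secondary)], is_json_object
)
print(result, trail)
```

In a real system the same guardrail path and logging would wrap both models, so switching providers does not silently drop a control.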

Bedrock and application integration map

Area	What to remember
Model access	Pick a model for task fit, latency, cost, context length, safety needs, and supported features.
Knowledge base	Use for managed retrieval when enterprise content must ground model output. Secure the source and the retrieval path.
Agent	Use when the model must decide between tools or steps. Tool contracts and permissions are part of the answer.
Guardrail	Use to enforce safety, denied topics, sensitive information handling, or output constraints.
Embeddings	Use for semantic retrieval and similarity, not as a substitute for authorization or data quality.
Application path	Treat GenAI calls like any production dependency: retries, timeouts, idempotency, logs, metrics, and alarms.

RAG decision rules

If the problem is…	Better fix
unsupported or invented answer	improve retrieval source, chunking, metadata, grounding, and evaluation before changing model size
stale answer	refresh ingestion, indexing, source synchronization, and cache behavior
answer from data the user should not see	enforce source permissions, retrieval filtering, and app-level authorization
too much irrelevant context	improve chunking, ranking, metadata filters, and prompt context budget
poor citations or traceability	preserve document IDs, source metadata, and response attribution
inconsistent quality after updates	run regression evaluations before promoting the new prompt, index, model, or pipeline
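
Two of the rows above, the context budget and citation traceability, fit in one short sketch. The `build_context` function and sample chunks are hypothetical, and the default token counter is a crude whitespace split used only for illustration.

```python
def build_context(ranked_chunks, max_tokens, count_tokens=lambda t: len(t.split())):
    """Pack the highest-ranked chunks into a token budget and keep source IDs.

    `ranked_chunks` is a list of (source_id, text) pairs, best first.
    The default token counter is a crude whitespace split; a real
    tokenizer would give different counts.
    """
    selected, used = [], 0
    for source_id, text in ranked_chunks:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            continue  # skip chunks that would exceed the budget
        selected.append((source_id, text))
        used += cost
    context = "\n".join(f"[{sid}] {text}" for sid, text in selected)
    return context, [sid for sid, _ in selected]


ranked = [
    ("policy-7", "Refunds are issued within 14 days of a return."),
    ("faq-2", "Contact support through the help portal for order issues."),
    ("blog-9", "A long marketing post " + "word " * 50),
]
context, citations = build_context(ranked, max_tokens=25)
print(context)
print("citations:", citations)
```

Carrying the source ID alongside each chunk is what makes later attribution possible. If IDs are dropped at prompt construction time, no amount of prompt wording can recover them.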

Agent decision rules

Agent design question	Strong answer pattern
What tools can the agent use?	Only the tools required by the task, with scoped IAM and input validation.
What if a tool call fails?	Return a controlled failure, retry safely when idempotent, log the event, and avoid hidden partial state.
What if a tool changes data?	Require explicit authorization, validation, audit logging, and rollback or compensation where possible.
What if the agent sees sensitive input?	Keep secrets out of prompts, restrict logs, apply data classification, and use approved storage paths.
What if output has business impact?	Add human review, policy checks, confidence thresholds, or workflow approval.
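
The first three rows can be combined into one guarded tool-call path. This is a sketch under assumptions, not an agent framework API: the tool registry, `call_tool`, and the in-memory idempotency store and audit log are all hypothetical stand-ins.

```python
class ToolError(Exception):
    pass


# Hypothetical tool registry: only these tools exist for this agent.
ALLOWED_TOOLS = {
    "get_order_status": {"order_id": str},
    "issue_refund": {"order_id": str, "amount": float},
}
_completed = {}  # idempotency key -> prior result
audit_log = []


def call_tool(name, args, idempotency_key, handlers):
    """Validate a model-proposed tool call before executing it."""
    if name not in ALLOWED_TOOLS:
        raise ToolError(f"tool not allowed: {name}")
    schema = ALLOWED_TOOLS[name]
    if set(args) != set(schema):
        raise ToolError(f"unexpected arguments for {name}: {sorted(args)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ToolError(f"{key} must be {expected.__name__}")
    if idempotency_key in _completed:
        return _completed[idempotency_key]  # safe retry, no duplicate side effect
    result = handlers[name](**args)
    _completed[idempotency_key] = result
    audit_log.append({"tool": name, "args": args, "key": idempotency_key})
    return result


refunds = []
handlers = {
    "get_order_status": lambda order_id: "shipped",
    "issue_refund": lambda order_id, amount: refunds.append((order_id, amount)) or "refunded",
}

call_tool("issue_refund", {"order_id": "A1", "amount": 20.0}, "req-1", handlers)
call_tool("issue_refund", {"order_id": "A1", "amount": 20.0}, "req-1", handlers)  # retry
print(len(refunds), len(audit_log))
```

The retried call returns the stored result instead of issuing a second refund. In production the IAM role behind each handler would be scoped separately, which this in-process sketch cannot show.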

Prompt and governance rules

Requirement	Better answer pattern
reusable prompt behavior across teams	Prompt Management, template versioning, approval workflow, and change history
multi-step prompt process	Prompt Flows (now Amazon Bedrock Flows) or Step Functions with conditional branches, validation, and failure handling
deterministic structured output	schema validation, constrained output instructions, post-processing checks, and retries where safe
prompt regression after update	compare prompt versions against a golden dataset before promotion
prompt injection risk	separate system instructions from user/retrieved text, validate tool inputs, apply guardrails, and avoid trusting retrieved content as instructions
compliance requires traceability	preserve prompt version, model version, retrieval sources, output, approval, and audit logs
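
For the structured-output row, one possible shape is to validate the model's reply against a schema and retry a bounded number of times. The field names, `parse_structured`, `generate_validated`, and the canned replies are hypothetical; `generate` stands in for any model call.

```python
import json

REQUIRED_FIELDS = {"invoice_id": str, "total": (int, float), "currency": str}


def parse_structured(raw):
    """Return (data, errors). An empty error list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"wrong type for field: {field}")
    return (data if not errors else None), errors


def generate_validated(generate, prompt, max_attempts=2):
    """Call a hypothetical `generate(prompt)` until its output validates."""
    errors = []
    for _ in range(max_attempts):
        data, errors = parse_structured(generate(prompt))
        if data is not None:
            return data
    raise ValueError(f"output failed validation: {errors}")


replies = iter([
    "Sure! Here is the JSON",
    '{"invoice_id": "INV-9", "total": 120.5, "currency": "USD"}',
])
result = generate_validated(lambda p: next(replies), "Extract the invoice fields.")
print(result)
```

The first canned reply fails parsing and the second passes, so the caller only ever sees validated data. Retrying is reasonable here because the call has no side effects; that would not be true for a tool call that changes data.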

Security and governance checklist

Control	Exam instinct
IAM	Scope model, data, tool, logging, and deployment access separately. Avoid broad application roles.
KMS	Check both encryption setting and key policy, especially cross-account or regulated data scenarios.
Network path	Prefer private access patterns when the requirement says private, internal, or no internet exposure.
Data retention	Know where prompts, retrieved chunks, embeddings, logs, and generated outputs are stored.
Guardrails	Use for policy enforcement, but do not confuse them with authorization, retrieval filtering, or audit.
Audit evidence	Logs, traces, evaluation reports, approval records, and model/version history matter in production scenarios.

Safety and privacy traps

Trap	Better exam instinct
Guardrail selected as the only privacy control	Add data classification, source authorization, log minimization, retention, KMS, and access controls.
PII detection happens after the response	Detect and minimize sensitive data before storage, retrieval, prompt construction, and output where required.
Prompt injection handled by better wording only	Treat user and retrieved text as untrusted input; constrain tools, validate parameters, and monitor adversarial behavior.
Hallucination fixed only by lower temperature	Use grounding, source attribution, evaluation, schema checks, and answer refusal when evidence is missing.
Compliance solved by a dashboard	Keep traceable source metadata, decision logs, model or prompt versions, approvals, and retention policy.
Responsible AI treated as final review only	Build fairness, transparency, human oversight, and policy checks into the workflow from the start.
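
As one example of minimizing sensitive data before storage, prompts can be redacted before they reach a log sink. The two regular expressions below are illustrative only and will miss many real-world PII formats; the `redact` and `log_prompt` helpers are hypothetical, and a managed detection service or guardrail policy would normally complement them.

```python
import re

# Illustrative patterns only; they will miss many real-world PII formats.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace matched sensitive values with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def log_prompt(logger, prompt):
    """Redact before the prompt reaches any log sink."""
    logger(redact(prompt))


captured = []
log_prompt(captured.append, "Email jane.doe@example.com about SSN 123-45-6789.")
print(captured[0])
```

Redaction at the logging boundary addresses only one storage path. Retrieved chunks, embeddings, and evaluation sets need their own handling.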

Evaluation and troubleshooting

Symptom	First things to check
hallucinations	retrieval quality, grounding instructions, evaluation set, source freshness, and citation behavior
slow responses	model choice, token volume, retrieval latency, tool latency, network path, and concurrency
high cost	model tier, prompt length, response length, retrieval size, retries, cache misses, and unused steps
unsafe output	guardrails, content policy, prompt injection defenses, human review, and red-team tests
access denied	application IAM role, resource policy, KMS key policy, VPC endpoint policy, and service permissions
inconsistent agent behavior	tool schema, tool errors, state handling, prompt instructions, and evaluation coverage

Evaluation gate chooser

Change under test	Minimum useful gate
prompt template change	regression dataset, expected format checks, safety checks, and reviewer approval if impact is high
retrieval index update	recall/precision sample, citation accuracy, source freshness, latency, and access-filter tests
model switch	quality, latency, token cost, safety, fallback behavior, and business outcome comparison
agent tool change	task completion, tool input validation, permission scope, failure behavior, and audit log checks
safety policy change	adversarial examples, blocked-topic tests, sensitive-data checks, false positive review, and monitoring
production rollout	canary or staged release with alarms, rollback, eval metrics, and trace coverage
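
A regression gate of the kind listed above can be sketched as a comparison between a baseline and a candidate on a golden dataset. Exact-match scoring, the thresholds, and the five-item dataset are simplifying assumptions; real gates would usually add groundedness, safety, and latency metrics.

```python
def exact_match_rate(predict, golden):
    """Share of golden examples where the prediction equals the expected answer."""
    hits = sum(1 for question, expected in golden if predict(question) == expected)
    return hits / len(golden)


def release_gate(baseline, candidate, golden, min_score=0.8, max_regression=0.0):
    """Promote only if the candidate meets the floor and does not regress."""
    base = exact_match_rate(baseline, golden)
    cand = exact_match_rate(candidate, golden)
    passed = cand >= min_score and (base - cand) <= max_regression
    return {"baseline": base, "candidate": cand, "promote": passed}


# Tiny hypothetical golden dataset of (question, expected answer) pairs.
golden = [("refund window?", "14 days"), ("support hours?", "9-5"),
          ("shipping cost?", "free"), ("warranty?", "1 year"), ("returns?", "yes")]

# Dictionary lookups stand in for two versions of a prompt, model, or index.
answers_v1 = dict(golden)
answers_v2 = {**answers_v1, "warranty?": "2 years", "returns?": "maybe"}

report = release_gate(answers_v1.get, answers_v2.get, golden)
print(report)
```

The candidate answers only three of five questions correctly, so the gate blocks promotion. The same comparison can run before a canary and again on production samples to decide whether to roll back.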

Common traps

Trap	Better instinct
Bigger model equals better production answer	Use model fit plus grounding, evaluation, latency, and cost evidence.
RAG means secure by default	RAG can leak data if retrieval ignores source permissions or metadata filters.
Guardrails solve all safety problems	Guardrails help enforce input and output policy; they do not replace IAM, data classification, or review workflow.
Prompt tests are enough	Use repeatable evaluation sets and regression checks.
Agents are just smarter prompts	Agents are app workflows with tools, permissions, errors, state, and audit requirements.
Optimization starts with capacity	Optimization starts with measurements: latency, tokens, retrieval time, tool time, and model behavior.
Bedrock feature name equals correct answer	The feature must fix the named failure mode and preserve data, safety, and operations controls.
Model routing without contract discipline	Fallback models must support the required prompt shape, output format, guardrail path, and monitoring.
Evaluation only after launch	Professional answers build evaluation into promotion, rollback, and troubleshooting.

Scenario eliminations

Stem clue	Eliminate first	Keep in play
private enterprise documents and citation requirement	prompt-only answer or bigger model	RAG with source governance, metadata, and evaluation
agent can update records or trigger workflows	broad tool access with no approval path	scoped tools, IAM, validation, audit, rollback or compensation
user input may contain malicious instructions	trusting retrieved or user text as system instructions	prompt-injection defenses, tool allowlists, guardrails, and validation
response quality regressed after index update	random prompt changes	regression evaluation, retrieval diagnostics, source freshness checks
latency and cost are high	provision more capacity first	token budget, model fit, retrieval size, caching, batching, and retry behavior
sensitive prompts appear in logs	only add output guardrails	data minimization, log redaction, retention policy, IAM, and KMS review

The exam often places a real Bedrock feature in a weak answer. Keep the feature only if it fixes the named risk and preserves the production controls.

Final 15-minute review

If the stem says…	Start here
enterprise knowledge, citations, or source documents	RAG, source governance, retrieval filtering, evaluation, and metadata
autonomous task, tool use, or multi-step workflow	agent tools, IAM scope, validation, audit, and failure handling
regulated, private, or sensitive data	data boundary, encryption, KMS policy, private access, retention, and logs
unsafe, biased, or prohibited output	guardrails, responsible AI policy, human review, and safety tests
poor answer quality	retrieval, prompt, model fit, evaluation set, and regression tests, checked in that order
production deployment	observability, rollback, alarms, retries, cost controls, and ownership

One-line decision rule

AIP-C01 answers should be production-grade: ground the output, secure every data path, constrain unsafe behavior, evaluate quality repeatedly, observe runtime behavior, and optimize from evidence.

Revised on Monday, June 15, 2026
