Confluent CCAAK Cheat Sheet: Kafka Admin, Cluster Ops, and Security

April 13, 2026

Confluent CCAAK cheat sheet for Kafka admin, cluster ops, security, traps, and final review.

Use this for last-mile review. CCAAK rewards the answer that protects durability, cluster safety, and reversible operations instead of treating Kafka like a pile of isolated configs.

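ISR: In-sync replicas, meaning the leader plus the followers that have kept up with it (within `replica.lag.time.max.ms`) and therefore count toward `acks=all` acknowledgements.

URP: Under-replicated partitions, meaning partitions whose ISR is smaller than the full replica set because one or more followers have fallen behind or are offline.

Controller: The broker (or, in KRaft mode, the active controller in the controller quorum) that owns cluster metadata and coordinates partition leader elections and state changes.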

CCAAK answer sequence

Use this when the stem mixes durability, security, topic behavior, or operations.

    flowchart TD
      S["Scenario"] --> D["Classify the durability lane"]
      D --> C["Check connectivity or security lane"]
      C --> T["Check topic behavior lane"]
      T --> O["Check operations lane and evidence"]

Read the question in this order

Durability lane: acks, ISR, min.insync.replicas, replication factor, and leader safety.
Connectivity or security lane: listeners, advertised.listeners, TLS, SASL, and ACL boundaries.
Topic behavior lane: partitions, retention, compaction, and cleanup intent.
Operations lane: lag, URP, offline partitions, maintenance safety, and recovery order.

Fastest 10-minute review

If the question says…	Strongest first lane
safe acknowledgements or data loss tolerance	`acks`, ISR, and `min.insync.replicas`
broker is reported healthy but clients still fail to connect	`advertised.listeners`, DNS, TLS, and SASL alignment
cluster feels unstable	controller health, broker availability, URP, and leader election
replay window or storage pressure	retention, compaction, and segment growth
consumer lag or throughput issue	partitions, consumer speed, broker health, disk, and network path
rolling restart or config change	smallest reversible change with health checks between steps

Durability control loop

    flowchart LR
      P["Producer"] --> A["acks policy"]
      A --> L["Leader replica"]
      L --> I["ISR members"]
      I --> M["min.insync.replicas gate"]
      M --> S["Safe acknowledgement or write failure"]

Write-safety matrix

Control	What it changes	Strong reminder	Common trap
`acks=0`	producer does not wait for any broker ack	no delivery confirmation and the lowest safety	treating it like normal production durability
`acks=1`	leader-only acknowledgement	better than no ack, still weaker than ISR-based safety	assuming it is enough for critical data
`acks=all`	leader waits for every current ISR member before acknowledging	strongest normal durability lane	forgetting it depends on healthy ISR and `min.insync.replicas`
`min.insync.replicas`	minimum ISR size required to accept an `acks=all` write	raises write safety when paired with `acks=all`	setting it high without considering write availability
replication factor	total copies of each partition	improves resilience	thinking it replaces correct producer or topic settings
`unclean.leader.election.enable`	whether an out-of-sync replica may lead	usually keep `false` to protect data	enabling it to hide availability problems

One-sentence rule

Replication factor improves resilience, but safe writes still depend on producer acks, ISR health, and min.insync.replicas.
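The rule above can be made concrete with a small model. This is a sketch of the decision only, not broker code: `write_outcome` and its return strings are hypothetical names invented for illustration, and the model ignores timeouts and retries.

```python
def write_outcome(acks, isr_size, min_insync_replicas, leader_alive=True):
    """Coarse model of what a producer observes for a single write."""
    if acks == "0":
        # The producer never waits, so it learns nothing about the write.
        return "not confirmed"
    if not leader_alive:
        return "failed: no leader"
    if acks == "1":
        # min.insync.replicas is not enforced for leader-only acks.
        return "acknowledged by leader only"
    if acks == "all":
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "acknowledged by full ISR"
    raise ValueError(f"unknown acks value: {acks!r}")


# Replication factor 3 in both cases; only ISR health differs.
print(write_outcome("all", isr_size=3, min_insync_replicas=2))
print(write_outcome("all", isr_size=1, min_insync_replicas=2))
```

The second call shows why replication factor alone is not the answer: three copies exist on paper, but with only one in-sync replica the `acks=all` write is refused rather than accepted unsafely.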

Topic behavior and storage chooser

Requirement	Strongest first fit	Why
more consumer parallelism	more partitions	consumer-group concurrency follows partition count
preserve full history for replay or audit	`cleanup.policy=delete` with a long enough retention	event log semantics
keep latest value per key	`cleanup.policy=compact`	changelog or latest-state semantics
preserve ordering for one entity	stable key	ordering is per partition only
reduce storage growth safely	retention and segment policy review	storage pressure is often a policy problem
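The "stable key" row can be sketched in a few lines. This is an illustration only: `partition_for` is a hypothetical helper, and CRC32 is a stand-in hash chosen because it ships with Python, whereas Kafka's default partitioner uses murmur2 on the serialized key.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32); the real default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [("order-17", "created"), ("order-42", "created"), ("order-17", "paid")]
for key, value in events:
    print(key, value, "-> partition", partition_for(key, 6))
```

Both `order-17` events land on the same partition, which is what preserves their relative order. The same formula also shows why adding partitions later can move a key to a different partition.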

Retention and compaction table

Topic intent	Better fit	Trap to avoid
immutable event stream	delete retention	using compaction and losing replay history
latest state per entity	compaction	expecting compacted topic to preserve full event history
long replay window	longer retention and capacity planning	shrinking retention as the first storage fix
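A toy log makes the difference between the two cleanup intents visible. The sketch below is a simplification under stated assumptions: the records and helper names are made up, real Kafka deletes whole segments rather than single records, and tombstones (null values) are not modeled.

```python
# Each record is (timestamp_ms, key, value); list position acts as the offset.
log = [
    (1000, "user-1", "email=a@example.com"),
    (2000, "user-2", "email=b@example.com"),
    (3000, "user-1", "email=c@example.com"),
]


def compacted_view(records):
    """Keep only the latest record per key, as compaction eventually would."""
    latest = {}
    for offset, (_ts, key, value) in enumerate(records):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    return sorted((offset, key, value) for key, (offset, value) in latest.items())


def delete_retention_view(records, now_ms, retention_ms):
    """Keep every record that is still inside the retention window."""
    return [
        (offset, key, value)
        for offset, (ts, key, value) in enumerate(records)
        if now_ms - ts <= retention_ms
    ]


print(compacted_view(log))
print(delete_retention_view(log, now_ms=3500, retention_ms=10_000))
```

The compacted view drops the first `user-1` record, so the history of that entity cannot be replayed. The delete-retention view keeps all three records until they age out.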

Listener and security stack

Control	What it really does	What the exam is usually testing
`listeners`	where brokers bind	interface and port availability
`advertised.listeners`	what clients are told to use	wrong endpoint breaks clients even when broker is healthy
`listener.security.protocol.map`	listener-to-protocol mapping	mismatched TLS or SASL path
`inter.broker.listener.name`	broker-to-broker communication path	replication and cluster communication health
TLS	encryption in transit	trust and certificate correctness
SASL	authentication	proving who the client is
ACLs	authorization	what authenticated principals may do
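The listener settings in the table have to agree with each other, and a small consistency check shows how. This is a sketch, not a Kafka tool: `parse_listeners` and `listener_problems` are hypothetical helpers, the host names and ports are invented, and the checks cover only the broker-side rules that advertised listener names must also be bound and that `0.0.0.0` cannot be advertised.

```python
def parse_listeners(value):
    """Parse 'NAME://host:port,NAME2://host:port' into {name: (host, port)}."""
    parsed = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        parsed[name] = (host, int(port))
    return parsed


def listener_problems(config):
    """Return human-readable inconsistencies between the listener settings."""
    bound = parse_listeners(config["listeners"])
    advertised = parse_listeners(config["advertised.listeners"])
    protocols = dict(
        pair.strip().split(":")
        for pair in config["listener.security.protocol.map"].split(",")
    )
    problems = []
    for name, (host, _port) in advertised.items():
        if name not in bound:
            problems.append(f"{name}: advertised but not bound in listeners")
        if host in ("", "0.0.0.0"):
            problems.append(f"{name}: advertised host {host!r} is not reachable by clients")
    for name in bound:
        if name not in protocols:
            problems.append(f"{name}: no entry in listener.security.protocol.map")
    if config["inter.broker.listener.name"] not in bound:
        problems.append("inter.broker.listener.name does not match a bound listener")
    return problems


# Hypothetical broker settings; host names are placeholders.
broker = {
    "listeners": "INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093",
    "advertised.listeners": "INTERNAL://broker1.internal:9092,EXTERNAL://broker1.example.com:9093",
    "listener.security.protocol.map": "INTERNAL:SASL_SSL,EXTERNAL:SASL_SSL",
    "inter.broker.listener.name": "INTERNAL",
}
print(listener_problems(broker))  # an empty list means no inconsistency was found
```

Binding to `0.0.0.0` is fine, but advertising it is not, because clients are handed the advertised address and must be able to resolve and reach it.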

Security boundary table

Pair	Keep this distinction clear
TLS vs SASL vs ACLs	encryption vs authentication vs authorization
`listeners` vs `advertised.listeners`	bind location vs client-facing endpoint
client connectivity vs broker replication path	external client path versus inter-broker path
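The authorization layer in the first row can be sketched as a lookup that runs only after TLS and SASL have established an encrypted, authenticated session. The model below is deliberately minimal and hypothetical: the `Acl` class and the principal names are invented, and it ignores wildcards, prefixed resource patterns, host restrictions, and super users. It assumes that a matching DENY wins over ALLOW and that no matching ACL means no access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str
    operation: str
    resource: str
    permission: str  # "ALLOW" or "DENY"


def authorize(acls, principal, operation, resource):
    """Return True only if an ALLOW matches and no DENY matches."""
    matching = [
        acl for acl in acls
        if (acl.principal, acl.operation, acl.resource) == (principal, operation, resource)
    ]
    if any(acl.permission == "DENY" for acl in matching):
        return False
    return any(acl.permission == "ALLOW" for acl in matching)


acls = [
    Acl("User:orders-service", "WRITE", "topic:orders", "ALLOW"),
    Acl("User:reporting", "READ", "topic:orders", "ALLOW"),
]
print(authorize(acls, "User:orders-service", "WRITE", "topic:orders"))
print(authorize(acls, "User:reporting", "WRITE", "topic:orders"))
```

The `reporting` principal is authenticated but still cannot write, which is the distinction between SASL and ACLs that the table asks you to keep clear.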

Cluster-health and lag triage

Symptom	First things to check	Common trap
URP climbing	broker health, disk I/O, inter-broker network, follower lag	blaming consumers for replication issues
offline partitions	leader availability, controller stability, multi-broker impact	treating it like ordinary lag
consumer lag rising	partition count, consumer throughput, rebalance churn, downstream bottleneck	changing retention before confirming the consumer is simply slow
frequent rebalances	long processing time, heartbeat/session timing, unstable membership	tuning brokers first
disk pressure	`log.dirs`, retention, segment growth, broker capacity	increasing replication factor during a storage incident
producer timeout under load	ISR health, broker saturation, network path, write safety requirements	disabling durability first
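For the consumer-lag row, the arithmetic is simply the log end offset minus the committed offset, per partition. The sketch below uses made-up offsets and a hypothetical `consumer_lag` helper. Treating a missing committed offset as 0 is an assumption for illustration, since real behavior depends on `auto.offset.reset`.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1480, 2: 900}
committed = {0: 1500, 1: 1200, 2: 100}

lag = consumer_lag(end_offsets, committed)
print(lag, "total:", sum(lag.values()))
```

Lag concentrated on one or two partitions points toward a hot key or a slow consumer instance, while lag spread evenly across partitions points toward overall consumer throughput.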

High-confusion admin pairs

Pair	Keep this distinction clear
replication factor vs ISR	total copies versus healthy caught-up subset
URP vs offline partition	lagging replica health issue versus partition lacking a working leader
broker config vs topic config	broker-level default versus per-topic override
lag vs durability issue	consumer speed problem versus replication-safety problem
compaction vs retention delete	latest-by-key store versus event history
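The URP versus offline distinction reduces to two checks on a partition's state. The function below is a sketch with hypothetical names and sample data. It represents a missing leader as `None`, whereas Kafka tooling typically reports it as leader `-1`.

```python
def classify_partition(replicas, isr, leader):
    """Classify one partition, most severe condition first."""
    if leader is None:
        return "offline"            # no working leader: reads and writes fail
    if len(isr) < len(replicas):
        return "under-replicated"   # still served, but with reduced redundancy
    return "healthy"


# Hypothetical partition states for a topic with replication factor 3.
partitions = {
    0: {"replicas": [1, 2, 3], "isr": [1, 2, 3], "leader": 1},
    1: {"replicas": [1, 2, 3], "isr": [2], "leader": 2},
    2: {"replicas": [1, 2, 3], "isr": [], "leader": None},
}
for partition_id, state in partitions.items():
    print(partition_id, classify_partition(**state))
```

Partition 1 is a health warning, while partition 2 is an outage for that partition, which is why the two should not be triaged the same way.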

Safe maintenance order

Operation	Safe first step	What strong answers protect
rolling restart	restart one broker at a time and recheck health between nodes	quorum and ISR safety
security change	validate with a test client before broad rollout	cluster-wide client outage
disk incident	free space, stabilize brokers, then repair replication	data safety before tuning
config change	smallest reversible change first	rollback path and blast radius
topic-setting update	confirm workload intent before changing compaction or retention	replay and durability behavior
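The rolling-restart row can be sketched as a loop that refuses to proceed while the cluster is unhealthy. Everything here is hypothetical scaffolding: `restart` and `cluster_healthy` stand in for whatever tooling and health signal (for example, zero URP and no offline partitions) an environment actually provides.

```python
def rolling_restart(broker_ids, restart, cluster_healthy, max_checks=30, wait=lambda: None):
    """Restart brokers one at a time, gating each step on cluster health."""
    restarted = []
    for broker_id in broker_ids:
        if not cluster_healthy():
            raise RuntimeError(f"cluster unhealthy before restarting broker {broker_id}")
        restart(broker_id)
        # Poll until the cluster recovers; give up after max_checks attempts.
        for _ in range(max_checks):
            if cluster_healthy():
                break
            wait()
        else:
            raise RuntimeError(f"cluster did not recover after restarting broker {broker_id}")
        restarted.append(broker_id)
    return restarted


# Dry run with stand-in callables that always report a healthy cluster.
restart_log = []
print(rolling_restart([1, 2, 3], restart=restart_log.append, cluster_healthy=lambda: True))
```

Stopping on the first failed health check keeps the blast radius at one broker. In practice `wait` would sleep between polls.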

Operations traps

Trap	Better reading
tuning throughput while the cluster is unhealthy	restore safety first, optimize later
enabling unsafe leader behavior to keep writes alive	availability without durability is often the wrong trade-off for exam answers
changing multiple controls at once	the exam favors contained, reversible maintenance

Last 15-minute review

Review this	Because it fixes…
`acks`, ISR, and `min.insync.replicas`	durability misses
`listeners` vs `advertised.listeners`	connectivity questions that look like generic networking
retention vs compaction	topic-behavior distractors
URP vs offline partitions	severity misclassification
TLS vs SASL vs ACLs	security-control confusion
rolling restart order	unsafe-operations distractors

What strong answers usually do

reason from durability first, then tuning
distinguish cluster-wide behavior from topic-specific behavior
protect safety during maintenance instead of optimizing too early
classify incidents correctly before reaching for commands
choose the least risky next step that restores health without widening the blast radius

Revised on Monday, June 15, 2026
