CertPrepNow
ConfluentCCDAK75 concepts

CCDAK Cheat Sheet

Quick reference for the Confluent Certified Developer for Apache Kafka exam.

Kafka CLI Commands

kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders --partitions 6 --replication-factor 3
Creates a topic with 6 partitions and replication factor 3.
kafka-topics.sh --bootstrap-server localhost:9092 --list
Lists all topics in the cluster.
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
Shows partition count, replication factor, leader, replicas, and ISR for a topic.
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic orders --property parse.key=true --property key.separator=:
Produces messages from the console with an explicit key:value split on the separator.
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic orders --from-beginning
Consumes messages from the earliest offset for ad hoc testing.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-service
Shows current offset, log-end-offset, and consumer lag per partition for a group.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group order-service --reset-offsets --to-earliest --topic orders --execute
Resets a consumer group's offsets to the earliest available offset; omit --execute to dry-run.
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name orders --add-config retention.ms=604800000
Dynamically alters a topic-level configuration without restarting brokers.

Producer Configuration

acks
Default is all (since Kafka 3.0) — requires acknowledgment from all in-sync replicas; other values are 0 (fire-and-forget) and 1 (leader only).
enable.idempotence
Default is true (since Kafka 3.0) — assigns a Producer ID and per-partition sequence number to eliminate duplicate writes on retry.
linger.ms
Default is 0 (send immediately); raising it batches more records per request, trading latency for throughput.
batch.size
Default is 16384 bytes — the maximum bytes of records batched together per partition before a send.
compression.type
Default is none; other options are gzip, snappy, lz4, and zstd — reduces network/disk usage at the cost of CPU.
max.in.flight.requests.per.connection
Default is 5 — the maximum unacknowledged requests per connection; must be <= 5 to preserve ordering with idempotence enabled.
delivery.timeout.ms
Default is 120000ms — the upper bound on time from send() to acknowledgment or failure, covering batching, retries, and the in-flight request.
transactional.id
A stable unique ID enabling transactions across producer restarts; required together with enable.idempotence=true for exactly-once writes.

Consumer Configuration

enable.auto.commit
Default is true — offsets are committed automatically every auto.commit.interval.ms (default 5000ms) on the next poll() call.
auto.offset.reset
Default is latest; earliest reads from the start of the topic; none throws an exception when no committed offset exists — applies only when there is no committed offset.
session.timeout.ms
Default is 45000ms — if no heartbeat is received within this window, the broker evicts the consumer and triggers a rebalance.
max.poll.records
Default is 500 — the maximum number of records returned in a single poll() call.
max.poll.interval.ms
Default is 300000ms (5 min) — the maximum time allowed between poll() calls before the consumer is considered failed and a rebalance is triggered.
partition.assignment.strategy
Default list is [RangeAssignor, CooperativeStickyAssignor]; CooperativeStickyAssignor enables incremental rebalancing without a stop-the-world pause.
isolation.level
Default is read_uncommitted; set to read_committed so the consumer only sees successfully committed transactional messages.
group.instance.id
Enables static group membership so a restarting consumer rejoins with the same identity, avoiding an unnecessary rebalance.

Replication and Durability

Leader / Follower Replicas
Every partition has one leader that handles all reads/writes and follower replicas that passively replicate the log.
ISR (In-Sync Replicas)
The set of replicas fully caught up with the leader within replica.lag.time.max.ms (default 30000ms); only ISR members are eligible for leader election by default.
min.insync.replicas
Default is 1 — the minimum ISR count required for a produce request with acks=all to succeed; set to 2+ for real durability with acks=all.
unclean.leader.election.enable
Default is false — prevents an out-of-sync replica from becoming leader, which would cause silent data loss.
High Watermark
The offset up to which all ISR replicas have replicated; consumers can only read up to the high watermark.
Replication Factor
The total number of copies of a partition across brokers (leader + followers); production standard is 3 for fault tolerance against one or two broker failures.

Exactly-Once Semantics and Transactions

EOS Chain
Exactly-once requires all three: idempotent producer (enable.idempotence=true), transactional producer (transactional.id set), and consumer with isolation.level=read_committed.
producer.initTransactions()
Registers the transactional.id with the transaction coordinator and fences off any previous zombie producer instance with the same ID.
producer.beginTransaction() / producer.send(...) / producer.commitTransaction()
Wraps one or more sends across multiple topics/partitions into a single atomic transaction; commitTransaction() makes all writes visible together.
producer.abortTransaction()
Rolls back all writes in the current transaction; consumers with read_committed will never see the aborted records.
Idempotence Scope
Idempotent producers deduplicate retries only within a single producer session; surviving a producer restart requires transactions with a stable transactional.id.
transaction.state.log.replication.factor
Default is 3 — replication factor of the internal __transaction_state topic that stores transaction metadata.

Kafka Streams — KStream, KTable, GlobalKTable

KStream
An unbounded stream of independent records (inserts); every event is treated as a new fact.
KTable
A changelog stream where each record updates the current value for its key; only the latest value per key is retained (like a materialized view).
GlobalKTable
Broadcasts the full dataset to every Streams instance; used for small reference/lookup data and does not require co-partitioning.
KStream-KStream Join
Requires a windowed join (JoinWindows) since both sides are unbounded; supports inner, left, and outer join variants.
KStream-KTable Join
A non-windowed lookup/enrichment join; requires co-partitioning (same partition count and partitioning strategy).
KTable-KTable Join
A non-windowed changelog merge join; requires co-partitioning for the primary-key join, but foreign-key joins do not.
StreamsBuilder builder = new StreamsBuilder(); KStream<String, String> source = builder.stream("input-topic");
Defines a processing topology by reading from a source topic into a KStream using the Streams DSL.

Kafka Streams — Windowing and State

Tumbling Window
Fixed-size, non-overlapping windows; each event belongs to exactly one window (e.g., count events per 5-minute block).
Hopping Window
Fixed-size, overlapping windows that advance by a hop interval smaller than the window size; a single event may fall in multiple windows.
Sliding Window
Windows are created based on the time difference between events, not fixed clock intervals; commonly used for join windows.
Session Window
Defined by an inactivity gap rather than a fixed duration; sessions merge when a new event arrives before the gap expires.
State Stores
Stateful operators (aggregate, reduce, count, join) persist local state in RocksDB by default, backed by a compacted internal changelog topic for fault tolerance.
num.stream.threads
Controls the number of stream processing threads per Streams instance; total application parallelism is capped by the source topic's partition count.
TopologyTestDriver
In-memory test harness for unit testing a Streams topology without a running Kafka cluster.

Schema Registry and Serialization

BACKWARD
Default compatibility mode — the new schema can read data written with the old schema; allows deleting fields and adding optional fields with defaults.
FORWARD
The old schema can read data written with the new schema; allows adding fields and deleting optional fields.
FULL
Combines BACKWARD and FORWARD — new and old schemas can both read each other's data; the strictest checked mode.
NONE
Disables compatibility checking entirely; any schema change is accepted.
Confluent Wire Format
Serialized Avro/Protobuf/JSON Schema messages are prefixed with a magic byte and a 4-byte schema ID that the deserializer uses to fetch the writer's schema from Schema Registry.
TopicNameStrategy
Default subject naming strategy — the subject is named <topic>-key or <topic>-value; alternatives are RecordNameStrategy and TopicRecordNameStrategy for multiple schemas per topic.

Kafka Connect

Source Connector
Reads data FROM an external system and writes it INTO Kafka (e.g., JDBC Source, Debezium CDC).
Sink Connector
Reads data FROM Kafka topics and writes it INTO an external system (e.g., JDBC Sink, S3 Sink).
Standalone vs Distributed Mode
Standalone runs a single worker with no fault tolerance (dev/test only); distributed runs a cluster of workers with automatic task rebalancing (production).
Converters
Serialize/deserialize record keys and values (JsonConverter, AvroConverter, StringConverter); configured independently for key.converter and value.converter.
Single Message Transforms (SMTs)
Lightweight per-record transforms such as InsertField, ReplaceField, MaskField, TimestampRouter, Cast, and RegexRouter; can be chained and are applied before Kafka write (source) or before sink write.
errors.tolerance=all errors.deadletterqueue.topic.name=dlq-topic
Routes records that fail conversion/transformation to a dead letter queue topic instead of stopping the connector.
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d @connector-config.json
Registers a new connector on a distributed Connect worker via its REST API.
tasks.max
Sets the maximum number of parallel tasks a connector can create; actual task count is also bounded by the data source's parallelism (e.g., table count).

Cluster Architecture, KRaft, and Topic Config

KRaft Mode
ZooKeeper-free consensus mode using a Raft-based controller quorum for metadata management; ZooKeeper was fully removed starting with Kafka 4.0.
process.roles=broker,controller
Server config that designates a node's role(s) in KRaft mode; controllers manage cluster metadata, brokers serve client traffic.
cleanup.policy
delete (default) removes segments older than retention.ms/retention.bytes; compact retains only the latest value per key, using a null value (tombstone) to mark deletion.
retention.ms
Default is 604800000 (7 days) — how long records are retained before eligible for deletion under cleanup.policy=delete.
Log Compaction Head/Tail
The head holds newly written records (may contain duplicate keys); the tail holds already-compacted records with unique keys per compaction pass.
Tiered Storage
Splits data into a fast local tier and a cheaper remote object storage tier; not supported for compacted topics.

Security and Monitoring

security.protocol
PLAINTEXT (no security), SSL (encryption only), SASL_PLAINTEXT (auth only, unencrypted), SASL_SSL (auth + encryption, recommended for production).
SASL Mechanisms
PLAIN, SCRAM (challenge-response), GSSAPI (Kerberos), and OAUTHBEARER are the standard authentication mechanisms available for SASL.
ACLs
Authorization rules controlling which principals can perform which operations (Read, Write, Create, Describe) on which resources; disabled/open by default without an authorizer configured.
Consumer Lag
The difference between a partition's log-end-offset and the consumer group's committed offset; rising lag with stable producer throughput signals a slow consumer.
Under-Replicated Partitions (URP)
A non-zero count means one or more replicas have fallen out of the ISR; a critical broker health metric indicating durability risk.

Ready to test yourself?

Start a timed CCDAK mock exam or review practice questions by domain.