CertPrepNow
Confluent CCDAK, updated 2026-06-17

CCDAK Study Guide

Everything you need to pass the Confluent Certified Developer for Apache Kafka exam. Structured study plans, key services, common traps, and practice questions.

You Can Pass This Exam For Free

The CCDAK exam is passable with free resources alone if you study consistently for 4-8 weeks with hands-on Kafka experience:

  • Confluent Developer portal free courses (Kafka 101, Kafka Streams 101, ksqlDB 101, Schema Registry 101)
  • Apache Kafka official documentation (kafka.apache.org)
  • Kafka: The Definitive Guide (free O'Reilly ebook from Confluent)
  • Confluent blog technical deep-dives on Kafka internals
  • GitHub CCDAK exam question repositories for practice
  • 500+ free practice questions on this site

Hands-on experience is critical for this exam. The questions are scenario-based and test real-world configuration knowledge. Running a local Kafka cluster and experimenting with producers, consumers, Kafka Streams, and Connect is far more valuable than memorizing theory.

Choose Your Study Path

This path assumes limited or no experience with event streaming or Apache Kafka. You need to build foundational knowledge of distributed systems and Kafka architecture before tackling advanced topics.

Week 1: Complete Confluent's free Kafka 101 course. Learn core concepts: events, topics, partitions, offsets, brokers, and how messages flow through a cluster
Week 2: Set up a local Kafka cluster using Docker Compose. Practice creating topics, producing and consuming messages from the CLI. Understand consumer groups and partition assignment
Week 3: Study producer internals: acks (0, 1, all), batching (linger.ms, batch.size), compression, idempotent producers (enable.idempotence), retries, and delivery guarantees
Week 4: Study consumer internals: offset management (auto vs manual commit), consumer group rebalancing, partition assignment strategies (Range, RoundRobin, Sticky, CooperativeSticky)
Week 5: Learn Schema Registry: Avro serialization, schema evolution, compatibility modes (BACKWARD, FORWARD, FULL, NONE). Practice registering and evolving schemas
Week 6: Study Kafka Connect: source vs sink connectors, distributed vs standalone mode, Single Message Transforms (SMTs), converter configuration, and dead letter queues
Week 7: Learn Kafka Streams: KStream vs KTable vs GlobalKTable, stateless operations (filter, map, branch), stateful operations (aggregate, reduce, count), windowing (tumbling, hopping, sliding, session)
Week 8: Study replication, fault tolerance (ISR, min.insync.replicas, unclean leader election), transactions, exactly-once semantics, and KRaft mode
Week 9: Take full practice exams. Focus on scenario-based questions. Review all incorrect answers and revisit weak areas
Week 10: Final review: configuration trade-offs (throughput vs latency vs durability), common exam traps, and confusable concepts. Take one more practice exam aiming for 80%+

Exam Overview

Format

60 questions, 90 minutes. Multiple-choice and multiple-response questions including scenario-based and code analysis items.

Scoring

Pass/fail with a 70% passing threshold; no scaled score is reported. There is no penalty for wrong answers, so answer every question.

Domains & Weights

  • Application Design: 40%
  • Development: 30%
  • Deployment, Testing, and Monitoring: 30%

Registration

The exam fee is $150 USD. The exam is proctored online via Honorlock and requires a webcam, microphone, Google Chrome, and a government-issued ID. The certification is valid for 2 years.

Topic Priority Table

Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.

Tier 1: Must Know. You must understand these concepts deeply; they appear across multiple questions and form the core of the exam. Know configurations, defaults, and trade-offs.
Tier 2: Should Know. Understand these concepts well. They appear in 2-5 questions each and often show up in scenario-based questions.
Tier 3: Recognize Only. Know what these are at a high level. Rarely more than 1-2 questions each.

Domain 1: 40% of exam

Application Design

The heaviest domain at 40% — expect roughly 24 questions covering Kafka architecture fundamentals, topic design, partitioning strategies, replication, producer and consumer configuration, Schema Registry, and exactly-once semantics. This domain tests your ability to design Kafka applications with the right trade-offs between performance, durability, and scalability.

Key Topics

Kafka Producers, Kafka Consumers, Topics and Partitions, Replication, Schema Registry, Exactly-Once Semantics, Consumer Groups

Must-Know Concepts

  • Kafka event structure: timestamp, key, value, headers, and how keys determine partition assignment via the default partitioner (murmur2 hash)
  • Producer configuration trade-offs: acks (0, 1, all), batch.size, linger.ms, compression.type, buffer.memory, max.in.flight.requests.per.connection
  • Idempotent producer: enable.idempotence=true assigns a Producer ID (PID) and sequence number per partition to deduplicate retries
  • Transactional producer: requires transactional.id, enables atomic writes across multiple topics and partitions via initTransactions(), beginTransaction(), commitTransaction(), abortTransaction()
  • Consumer offset management: auto-commit (enable.auto.commit, auto.commit.interval.ms) vs manual commit (commitSync, commitAsync). Offsets stored in __consumer_offsets topic
  • Consumer group rebalancing strategies: Range (contiguous partitions), RoundRobin (distributed), Sticky (minimizes reassignment), CooperativeSticky (incremental, no stop-the-world)
  • auto.offset.reset behavior: 'earliest' reads from beginning, 'latest' reads only new messages, 'none' throws exception if no committed offset exists
  • Replication: leader and follower replicas, ISR (In-Sync Replicas), min.insync.replicas, unclean.leader.election.enable, high watermark
  • Schema Registry compatibility modes: BACKWARD (delete fields, add optional), FORWARD (add fields, delete optional), FULL (both), NONE (no checks). Default is BACKWARD
  • Exactly-once semantics chain: idempotent producer + transactional producer + consumer with isolation.level=read_committed
  • Topic configuration: partition count (determines parallelism ceiling), replication factor (fault tolerance), retention.ms, cleanup.policy (delete or compact)
  • Key-based ordering: messages with the same key always go to the same partition, guaranteeing order for that key. Changing partition count breaks this guarantee
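The key-to-partition rule in the list above can be sketched in a few lines. This is only an illustration: zlib.crc32 is a stand-in hash chosen because it ships with Python, whereas Kafka's default partitioner uses murmur2, so real partition numbers will differ. The function names and the sample keys are hypothetical.

```python
import zlib


def partition_from_hash(key_hash, num_partitions):
    """Pick a partition as the key hash modulo the partition count."""
    return key_hash % num_partitions


def partition_for(key, num_partitions):
    """Map a string key to a partition.

    zlib.crc32 is a stand-in for illustration only; it is NOT the
    murmur2 hash used by Kafka's default partitioner.
    """
    return partition_from_hash(zlib.crc32(key.encode("utf-8")), num_partitions)


keys = [f"user-{i}" for i in range(20)]
before = {k: partition_for(k, 6) for k in keys}   # topic with 6 partitions
after = {k: partition_for(k, 8) for k in keys}    # same topic grown to 8

# Keys whose partition changed lose ordering relative to their older records.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed partition")
```

The hash of a key never changes; only the modulus does, which is why adding partitions to a keyed topic can scatter existing keys.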

Common Exam Traps

acks=all does NOT guarantee durability alone. If min.insync.replicas=1, data can still be lost if the leader is the only ISR member and it crashes
Idempotent producers only prevent duplicates within a single producer session. Across restarts, you need transactions with a stable transactional.id
Changing the number of partitions in a topic breaks key-based ordering guarantees because the partitioner computes hash(key) modulo the partition count, so the same key can map to a different partition afterward
auto.offset.reset only applies when there is NO committed offset for the consumer group. It does NOT affect consumers with existing committed offsets
CooperativeSticky rebalancing does NOT stop all consumers during rebalancing — only affected partitions are revoked. The older Eager protocol stops all consumers
BACKWARD compatibility (the default) means the NEW schema can read OLD data. Many candidates confuse the direction
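The idempotent-producer trap above is easier to remember with a toy model of broker-side deduplication. This is a simplified sketch, not the broker's actual implementation: the PartitionLog class is hypothetical, and a real broker also rejects out-of-order sequence numbers, which this model ignores.

```python
class PartitionLog:
    """Toy model of one partition that deduplicates by producer id and sequence."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer id -> last accepted sequence number

    def append(self, producer_id, seq, value):
        """Append the record unless this producer already wrote this sequence."""
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # duplicate from a retry: not appended again
        self.last_seq[producer_id] = seq
        self.records.append(value)
        return True


log = PartitionLog()
log.append(producer_id=7, seq=0, value="order-created")
log.append(producer_id=7, seq=1, value="order-paid")
retry = log.append(producer_id=7, seq=1, value="order-paid")      # retry after a lost ack
restarted = log.append(producer_id=8, seq=0, value="order-paid")  # new session, new producer id

print(retry)        # False
print(restarted)    # True
print(log.records)  # ['order-created', 'order-paid', 'order-paid']
```

The retry within the same session is dropped, but the restarted producer gets a new producer id and its resend is accepted as a new record. That gap is what a stable transactional.id is meant to close.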
Quick Check: Application Design

A producer sends messages to a topic with replication factor 3 and min.insync.replicas=2. The producer is configured with acks=all. One of the three replicas goes offline. What happens when the producer sends a message?

Domain 2: 30% of exam

Development

This domain covers Kafka Streams development, Kafka Connect configuration, and ksqlDB. Expect scenario-based questions on stream processing operations, join types, windowing, Connect pipeline configuration, and data integration patterns. Code analysis questions may present Kafka Streams DSL snippets to evaluate.

Key Topics

Kafka Streams DSL, KStream, KTable, GlobalKTable, Windowing, Kafka Connect, SMTs, ksqlDB

Must-Know Concepts

  • KStream vs KTable: KStream is an event stream (each record is an independent fact), KTable is a changelog (latest value per key is kept). GlobalKTable broadcasts all data to every instance
  • Stateless operations: filter(), map(), mapValues(), flatMap(), flatMapValues(), branch(), merge(), peek() — these do not maintain state stores
  • Stateful operations: aggregate(), reduce(), count(), join() — these use local state stores (RocksDB by default) backed by changelog topics
  • Windowing types: Tumbling (fixed, non-overlapping), Hopping (fixed, overlapping), Sliding (event-triggered, time-difference based), Session (inactivity gap based)
  • Join types and requirements: KStream-KStream requires a window, KStream-KTable is a non-windowed lookup, KTable-KTable is a non-windowed changelog merge. KStream-KStream, KStream-KTable, and KTable-KTable joins all require co-partitioning. GlobalKTable joins do NOT require co-partitioning
  • Co-partitioning requirement: both input topics must have the same number of partitions and use the same partitioning strategy on the join key. This applies to KStream-KStream, KStream-KTable, and KTable-KTable joins
  • Kafka Streams parallelism: maximum parallelism equals the number of input topic partitions. Each stream task processes one partition. num.stream.threads controls threads per instance
  • Kafka Connect source connectors ingest data INTO Kafka; sink connectors export data FROM Kafka. Converters serialize/deserialize (JsonConverter, AvroConverter). SMTs transform individual records
  • Dead letter queues in Connect: errors.tolerance=all makes the connector skip failed records instead of stopping. For sink connectors, setting errors.deadletterqueue.topic.name also routes those failed records to a DLQ topic
  • Kafka Streams state store changelog topics: stateful operations maintain local state in RocksDB, backed by compacted changelog topics in Kafka for fault tolerance. Standby replicas enable fast failover

Common Exam Traps

KStream-KStream joins REQUIRE a window — this is the most common Streams join mistake. Without a window, the join would need unbounded state
GlobalKTable loads ALL data to every Streams instance. It does NOT require co-partitioning but consumes more memory. Use it only for small reference data
mapValues() is preferred over map() when you do not need to change the key, because map() triggers a repartition (data shuffle) if followed by a key-based operation
Connect standalone mode is NOT fault tolerant. If the worker process dies, all connectors stop. Distributed mode provides automatic failover
SMTs run in a specific order: for source connectors, SMTs apply AFTER the connector produces the record but BEFORE it is written to Kafka. For sink connectors, AFTER reading from Kafka but BEFORE sending to the sink
Session windows are defined by INACTIVITY gaps, not fixed time intervals. Two events 1 second apart with a 5-minute gap setting belong to the SAME session
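The session-window rule above can be sketched as a small grouping function. This is a plain-Python illustration rather than Kafka Streams code; the sessionize name is hypothetical, and treating a difference exactly equal to the gap as the same session is an assumption of the sketch.

```python
def sessionize(timestamps_ms, gap_ms):
    """Group event timestamps into sessions split by inactivity longer than gap_ms."""
    sessions = []
    for ts in sorted(timestamps_ms):
        if sessions and ts - sessions[-1][-1] <= gap_ms:
            sessions[-1].append(ts)   # still within the inactivity gap: same session
        else:
            sessions.append([ts])     # gap exceeded (or first event): new session
    return sessions


FIVE_MINUTES = 5 * 60 * 1000

# Two events 1 second apart, then one more after a long pause.
print(sessionize([0, 1_000, 400_000], FIVE_MINUTES))  # [[0, 1000], [400000]]
```

Session length is driven entirely by the data: a session keeps growing as long as events keep arriving within the gap.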
Quick Check: Development

A Kafka Streams application needs to join a high-volume click stream (KStream) with a slowly-changing user profile table (KTable). The topics have different partition counts. What must be done before the join?

Domain 3: 30% of exam

Deployment, Testing, and Monitoring

This domain covers deploying Kafka applications to production, testing strategies, and monitoring. Expect questions on testing frameworks (TopologyTestDriver, EmbeddedKafka), monitoring metrics (consumer lag, throughput), security configuration (SSL, SASL, ACLs), and operational best practices for Kafka clusters.

Key Topics

TopologyTestDriver, EmbeddedKafka, Consumer Lag, JMX Metrics, SSL/TLS, SASL, ACLs, Log Compaction

Must-Know Concepts

  • TopologyTestDriver: in-memory test driver for Kafka Streams topologies. Does not require a running Kafka cluster. Fast and deterministic for unit testing stream processing logic
  • EmbeddedKafka: spins up an in-process Kafka broker for integration tests. Slower than TopologyTestDriver but tests real producer/consumer interactions
  • Consumer lag: the difference between the latest offset in a partition and the consumer's committed offset. High lag indicates the consumer cannot keep up with the producer rate
  • Key monitoring metrics: consumer lag per partition, request latency (produce/fetch), throughput (bytes/messages per second), under-replicated partitions, ISR shrink/expand rate
  • Security: SSL/TLS for encryption in transit, SASL for authentication (PLAIN, SCRAM, GSSAPI/Kerberos, OAUTHBEARER), ACLs for authorization (who can read/write/create topics)
  • Kafka security configuration: security.protocol (PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL), ssl.truststore.location, ssl.keystore.location, sasl.mechanism
  • Application deployment considerations: num.stream.threads for Kafka Streams parallelism, state store cleanup, graceful shutdown, rolling upgrades
  • Log retention: retention.ms (time-based), retention.bytes (size-based), cleanup.policy=delete (remove old segments) vs cleanup.policy=compact (keep latest per key)
  • Monitoring tools: JMX metrics, Confluent Control Center, consumer group describe command (kafka-consumer-groups.sh --describe), broker metrics via JMX
  • Graceful shutdown of Kafka Streams: calling close() triggers a final rebalance, commits offsets, and flushes state stores. Use shutdown hooks to ensure clean exit
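The consumer lag definition in the list above is simple arithmetic, sketched here with made-up offsets. The consumer_lag function and the sample numbers are hypothetical; in practice the two offset maps would come from a tool such as kafka-consumer-groups.sh --describe.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset.

    Partitions with no committed offset are treated as committed at 0,
    which is an assumption of this sketch.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


log_end = {0: 1500, 1: 1480, 2: 900}      # latest offset per partition
committed = {0: 1500, 1: 1200, 2: 900}    # group's committed offset per partition

lag = consumer_lag(log_end, committed)
print(lag)                # {0: 0, 1: 280, 2: 0}
print(sum(lag.values()))  # 280
```

Looking at lag per partition rather than only the total helps spot a single slow or stuck partition.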

Common Exam Traps

TopologyTestDriver does NOT require a running Kafka cluster. It is for unit testing Streams topologies only. For integration tests involving real brokers, use EmbeddedKafka
Consumer lag of 0 does not mean the system is healthy — it could mean the consumer has caught up OR that no messages are being produced. Check throughput metrics too
SSL/TLS encrypts data in transit. One-way TLS authenticates only the broker to the client, not the client itself. For client authentication, you need SASL or mutual TLS with client certificates
ACLs are disabled by default. Without an authorizer configured, any client can access any topic. This is a common production security oversight
Under-replicated partitions (URP) is one of the most critical broker health metrics. A non-zero URP count indicates potential data loss risk and should be investigated immediately
Rolling upgrades of Kafka Streams applications require careful handling of state store compatibility. Incompatible changes to the topology require a clean restart, not a rolling upgrade
Quick Check: Deployment, Testing, and Monitoring

A team wants to write fast, deterministic unit tests for their Kafka Streams topology without starting a Kafka cluster. Which testing approach should they use?

Kafka Concepts You Must Not Confuse

These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.

acks=1 vs acks=all

Use acks=1 when…

Producer waits for acknowledgment from the partition leader only. Faster but risks data loss if the leader crashes before followers replicate.

Use acks=all when…

Producer waits for acknowledgment from all in-sync replicas (ISR). Slowest but provides the strongest durability guarantee.

Exam trap

acks=all only guarantees durability if min.insync.replicas is set to 2 or higher. With min.insync.replicas=1 and acks=all, you still risk data loss if the leader is the only ISR member.
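The interaction between acks and min.insync.replicas can be modeled as a small decision function. This is a simplified sketch under stated assumptions: copies_at_ack and the NotEnoughReplicas class are hypothetical names, and the model ignores timing, retries, and leader election.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for the broker rejecting a write because the ISR is too small."""


def copies_at_ack(acks, isr_size, min_insync_replicas):
    """Return how many replicas are known to hold the record when the producer is acked."""
    if acks == "0":
        return 0  # producer does not wait for any broker response
    if acks == "1":
        return 1  # leader only; followers may not have fetched the record yet
    if acks == "all":
        # min.insync.replicas is only enforced for acks=all writes.
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR size {isr_size} is below min.insync.replicas={min_insync_replicas}"
            )
        return isr_size  # every current in-sync replica has the record
    raise ValueError(f"unknown acks value: {acks!r}")


# Replication factor 3 with one replica offline leaves an ISR of 2.
print(copies_at_ack("all", isr_size=2, min_insync_replicas=2))  # 2
# The trap: with min.insync.replicas=1, acks=all can be satisfied by the leader alone.
print(copies_at_ack("all", isr_size=1, min_insync_replicas=1))  # 1
```

With an ISR of 1 and min.insync.replicas=2 the function raises instead of acknowledging, which mirrors the broker refusing the write rather than accepting it with too few copies.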

KStream vs KTable

Use KStream when…

An unbounded, continuously updating stream of events. Each record is an independent event (insert). Think of it as an append-only log of facts.

Use KTable when…

A changelog stream where each record is an update to a key. Only the latest value per key is retained. Think of it as a materialized view or database table.

Exam trap

A KStream-KTable join is a lookup enrichment (non-windowed). A KStream-KStream join REQUIRES a window because both sides are unbounded. This is the most common Kafka Streams join trap.
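The difference between the two abstractions shows up when the same records are read both ways. The sketch below uses plain Python lists and dicts rather than the Kafka Streams API; as_stream, as_table, and the sample records are hypothetical.

```python
def as_stream(records):
    """Stream view: every record is kept as an independent event."""
    return list(records)


def as_table(records):
    """Table view: keep only the latest value per key; a None value deletes the key."""
    table = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)  # tombstone
        else:
            table[key] = value
    return table


records = [("alice", "home"), ("bob", "cart"), ("alice", "checkout")]

print(len(as_stream(records)))  # 3
print(as_table(records))        # {'alice': 'checkout', 'bob': 'cart'}
```

The stream view sees three events, while the table view sees two rows because the second record for alice overwrites the first.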

Source Connector vs Sink Connector

Use Source Connector when…

Reads data FROM an external system (database, file, API) and writes it INTO Kafka topics. Examples: JDBC Source, Debezium CDC connectors.

Use Sink Connector when…

Reads data FROM Kafka topics and writes it INTO an external system (database, Elasticsearch, S3). Examples: JDBC Sink, S3 Sink connectors.

Exam trap

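The direction is relative to Kafka: Source means data flows INTO Kafka. Sink means data flows OUT OF Kafka. For a source connector, SMTs run after the connector produces a record and before it is written to Kafka. For a sink connector, they run after the record is read from Kafka and before it is sent to the sink.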

BACKWARD Compatibility vs FORWARD Compatibility

Use BACKWARD Compatibility when…

New schema can read data written by the old schema. You can delete fields and add optional fields with defaults. Consumers using the new schema can process old data.

Use FORWARD Compatibility when…

Old schema can read data written by the new schema. You can add fields and delete optional fields. Consumers using the old schema can process new data.

Exam trap

BACKWARD = new code reads old data. FORWARD = old code reads new data. FULL = both directions. BACKWARD is the Schema Registry default. The exam tests whether you know which operations each mode allows.
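The direction of each mode can be checked with a deliberately simplified model. Here a schema is a dict mapping field name to whether the field has a default, and a reader can decode a writer's data if every reader field missing from the writer has a default. This is a sketch of the field add/remove rule only: it ignores type changes, and all function names are hypothetical rather than Schema Registry APIs.

```python
def can_read(reader, writer):
    """True if every reader field that the writer lacks has a default value."""
    return all(
        has_default
        for name, has_default in reader.items()
        if name not in writer
    )


def is_backward(new, old):
    """BACKWARD: the new schema can read data written with the old schema."""
    return can_read(reader=new, writer=old)


def is_forward(new, old):
    """FORWARD: the old schema can read data written with the new schema."""
    return can_read(reader=old, writer=new)


def is_full(new, old):
    """FULL: compatible in both directions."""
    return is_backward(new, old) and is_forward(new, old)


old = {"id": False, "email": False}                          # no defaults
deleted_field = {"id": False}                                # email removed
added_required = {"id": False, "email": False, "phone": False}
added_optional = {"id": False, "email": False, "phone": True}

print(is_backward(deleted_field, old), is_forward(deleted_field, old))    # True False
print(is_backward(added_required, old), is_forward(added_required, old))  # False True
print(is_full(added_optional, old))                                       # True
```

Deleting a field is backward compatible, adding a required field is forward compatible, and adding a field with a default works in both directions.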

enable.auto.commit=true vs Manual Offset Commit

Use enable.auto.commit=true when…

Consumer automatically commits offsets at a fixed interval (auto.commit.interval.ms, default 5 seconds). Simple but risks reprocessing if consumer crashes between commits.

Use Manual Offset Commit when…

Application explicitly calls commitSync() or commitAsync() after processing. Gives precise control but adds code complexity. commitSync blocks; commitAsync does not.

Exam trap

Auto-commit does NOT commit immediately on poll() — it commits the offsets from the previous poll() on the NEXT poll() call if the interval has elapsed. The risk is that auto-commit may commit offsets for messages that were fetched but whose processing subsequently failed, making those messages appear 'done' when they were not. With manual commit, you control exactly when offsets are committed — typically after successful processing.
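The order of committing and processing decides whether a crash loses a message or repeats it. The simulation below is a toy model with no Kafka client involved: run_with_crash is a hypothetical function that crashes once, between the two steps for one message, and then restarts from the committed offset.

```python
def run_with_crash(messages, commit_first, crash_at):
    """Consume messages with one crash between commit and process for index crash_at.

    Returns the list of messages that were actually processed, in order.
    """
    processed = []
    committed = 0      # next offset to read after a restart
    crashed = False
    position = 0
    while position < len(messages):
        msg = messages[position]
        if commit_first:
            committed = position + 1          # offset committed before processing
            if position == crash_at and not crashed:
                crashed = True
                position = committed          # restart skips the unprocessed message
                continue
            processed.append(msg)
        else:
            processed.append(msg)             # processed before the offset is committed
            if position == crash_at and not crashed:
                crashed = True
                position = committed          # restart re-reads the same message
                continue
            committed = position + 1
        position += 1
    return processed


msgs = ["m0", "m1", "m2"]
print(run_with_crash(msgs, commit_first=True, crash_at=1))   # ['m0', 'm2']
print(run_with_crash(msgs, commit_first=False, crash_at=1))  # ['m0', 'm1', 'm1', 'm2']
```

Committing first loses m1, and committing after processing handles m1 twice. The second case is why consumers that commit after processing are usually paired with idempotent processing logic.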

Standalone Mode (Connect) vs Distributed Mode (Connect)

Use Standalone Mode (Connect) when…

Single worker process runs all connectors and tasks. Configuration via properties files. No fault tolerance — if the worker dies, all connectors stop.

Use Distributed Mode (Connect) when…

Multiple workers form a Connect cluster. Configuration via REST API. Provides fault tolerance and automatic task rebalancing across workers.

Exam trap

Distributed mode is the recommended choice for production because it provides fault tolerance and scaling. Standalone mode is mainly for development and testing. The exam may present scenarios where you need to choose the appropriate mode.

Tumbling Window vs Hopping Window

Use Tumbling Window when…

Fixed-size, non-overlapping time windows. Each event belongs to exactly one window. Example: count events per 5-minute block.

Use Hopping Window when…

Fixed-size, overlapping time windows that advance by a specified hop interval. An event may belong to multiple windows. Example: 5-minute windows advancing every 1 minute.

Exam trap

A tumbling window is a hopping window where the hop size equals the window size. Session windows are different entirely — they are defined by inactivity gaps, not fixed time intervals.
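The relationship between tumbling and hopping windows can be checked with a little arithmetic. The sketch below computes which window start times contain a given timestamp, assuming windows are aligned to the epoch and cover the half-open interval [start, start + size). The windows_for function is a hypothetical plain-Python helper, not the Kafka Streams API.

```python
def windows_for(timestamp_ms, size_ms, advance_ms):
    """Return the start times of all windows [start, start + size) containing the timestamp."""
    # Earliest aligned window start that can still contain the timestamp.
    start = max(0, timestamp_ms - size_ms + advance_ms) // advance_ms * advance_ms
    starts = []
    while start <= timestamp_ms:
        starts.append(start)
        start += advance_ms
    return starts


MIN = 60_000
event_time = 7 * MIN

# Tumbling: advance equals size, so the event lands in exactly one window.
print(windows_for(event_time, size_ms=5 * MIN, advance_ms=5 * MIN))
# [300000]

# Hopping: 5-minute windows advancing every minute overlap, so the event is in five.
print(windows_for(event_time, size_ms=5 * MIN, advance_ms=1 * MIN))
# [180000, 240000, 300000, 360000, 420000]
```

The same function covers both cases, which is the point of the trap: a tumbling window is the special case where the advance equals the window size.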

At-Most-Once Delivery vs At-Least-Once Delivery

Use At-Most-Once Delivery when…

Messages may be lost but are never duplicated. Achieved with acks=0 or by committing offsets before processing. Suitable when occasional data loss is acceptable.

Use At-Least-Once Delivery when…

Messages are never lost but may be duplicated. Achieved with acks=all and committing offsets after processing. The default behavior for most Kafka applications.

Exam trap

Exactly-once semantics (EOS) requires idempotent producers + transactions + read_committed consumers. It is NOT a simple configuration toggle — multiple components must be coordinated.

Top Mistakes to Avoid

Confusing acks=all with full durability — acks=all only guarantees durability when min.insync.replicas is set to at least 2, otherwise the leader alone can acknowledge
Mixing up KStream (event stream, every record is independent) with KTable (changelog stream, latest value per key) — they have fundamentally different semantics
Thinking KStream-KTable joins require a window — only KStream-KStream joins require windows. KStream-KTable joins are non-windowed lookups
Confusing BACKWARD compatibility (new schema reads old data) with FORWARD compatibility (old schema reads new data) — BACKWARD is the Schema Registry default
Forgetting that GlobalKTable does NOT require co-partitioning but loads ALL data to every instance — it is only suitable for small reference datasets
Assuming enable.auto.commit commits offsets immediately after processing — auto-commit commits the previous batch's offsets on the next poll() call if the interval elapsed. If processing fails but the next poll() fires, those offsets are committed as done even though processing failed, causing silent data loss. Manual commit gives you control to commit only after confirmed processing
Not understanding that changing partition count breaks key-based ordering: the key hash is taken modulo the partition count, so existing keys can map to different partitions
Confusing source connectors (data INTO Kafka) with sink connectors (data OUT OF Kafka) — the direction is always relative to Kafka
Thinking TopologyTestDriver tests real Kafka interactions — it tests Streams topology logic only, without a broker. Use EmbeddedKafka for integration tests
Assuming Session windows have fixed time intervals — they are defined by inactivity gaps, not fixed durations, and merge when activity resumes

Exam-Ready Checklist

Can explain the 3 exam domains and their weights (Application Design 40%, Development 30%, Deployment/Testing/Monitoring 30%)
Know all producer acks settings (0, 1, all) and their trade-offs with min.insync.replicas and replication factor
Understand idempotent producers (PID, sequence numbers) and transactional producers (transactional.id, initTransactions, begin/commit/abort)
Can explain all 4 consumer group rebalancing strategies and when to use CooperativeSticky to avoid stop-the-world pauses
Know Schema Registry compatibility modes (BACKWARD, FORWARD, FULL, NONE) and which schema changes each allows
Can distinguish KStream, KTable, and GlobalKTable and know which join combinations require windowing and co-partitioning
Understand all 4 windowing types (Tumbling, Hopping, Sliding, Session) and their use cases
Know Kafka Connect architecture: source vs sink, standalone vs distributed, converters, SMTs, and dead letter queues
Can explain exactly-once semantics end-to-end: idempotent producer + transactions + read_committed consumer
Know testing frameworks: TopologyTestDriver for unit tests (no broker) vs EmbeddedKafka for integration tests (with broker)
Understand key monitoring metrics: consumer lag, under-replicated partitions, request latency, and throughput
Can configure Kafka security: SSL for encryption, SASL for authentication, ACLs for authorization, and their protocol combinations
Scored 80%+ on at least two full practice exams — the real exam is harder than most practice tests
