Message middleware enables reliable synchronous or asynchronous communication between distributed applications through message queues and messaging protocols. It facilitates platform-agnostic data exchange and supports system integration through decoupled, scalable communication models.
Apache Kafka is a distributed event streaming platform renowned for high throughput, durability, horizontal scalability, and real-time stream processing capabilities. Its core roles include:
- Messaging System: Provides decoupling, buffering, load leveling, and fault tolerance. Unlike traditional brokers, Kafka guarantees per-partition message ordering and supports replayable consumption.
- Storage System: Persists messages to disk with replication, reducing data loss risk. With log compaction or infinite retention policies, Kafka can serve as a durable storage layer.
- Stream Processing Platform: Supplies data streams to external frameworks and includes native stream processing APIs for transformations, aggregations, joins, and windowing.
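As an illustration of the native stream-processing API, here is a minimal Kafka Streams sketch; the topic names `input-topic` and `output-topic` and the application id are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(v -> v.toUpperCase()) // a stateless per-record transformation
              .to("output-topic");             // write results back to Kafka

        new KafkaStreams(builder.build(), props).start();
    }
}
```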
Kafka’s architecture comprises producers, consumers, brokers, and (historically) ZooKeeper—though newer versions use KRaft for metadata management. Key components:
- Producer: Publishes records to topics.
- Consumer: Subscribes to topics and processes records.
- Consumer Group: A set of consumers that jointly consume a topic; each partition is assigned to exactly one consumer per group.
- Broker: A server node storing and serving partitions.
- Topic: A logical category for messages, split into ordered partitions.
- Partition: An immutable, ordered sequence of records stored on a broker.
- Replica: Copies of a partition for fault tolerance. One acts as the leader (handles reads/writes); others are followers (synchronize from the leader).
Replica management uses three sets:
- AR (Assigned Replicas): All replicas for a partition.
- ISR (In-Sync Replicas): Replicas within acceptable lag of the leader.
- OSR (Out-of-Sync Replicas): Lagging replicas excluded from ISR.
Only ISR members are eligible for leader election upon failure.
Consumers track progress via offsets, the unique positions of records within a partition. Offset commits (to the internal `__consumer_offsets` topic) determine restart points. On first subscription, or when committed offsets are missing, behavior depends on `auto.offset.reset` (`earliest` or `latest`). Manual offset commits offer stronger delivery guarantees than auto-commit.
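A minimal consumer sketch illustrating the manual-commit pattern; the topic `orders`, the group id, and the `handleRecord` helper are hypothetical:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");   // take over offset management
        props.put("auto.offset.reset", "earliest"); // start from the beginning if no committed offset

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    handleRecord(record); // process first...
                }
                consumer.commitSync();    // ...then commit, for at-least-once semantics
            }
        }
    }

    static void handleRecord(ConsumerRecord<String, String> record) {
        System.out.printf("%d: %s%n", record.offset(), record.value()); // placeholder logic
    }
}
```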
Producers support three sending modes:
- Fire-and-forget: No acknowledgment; fastest but unreliable.
- Synchronous: Blocks until acknowledgment (via `Future.get()`).
- Asynchronous: Uses callbacks to handle send results without blocking.
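The three modes map onto the producer API roughly as follows; the broker address and the topic `demo-topic` are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendModes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("demo-topic", "key", "value");

            // 1. Fire-and-forget: send and ignore the result.
            producer.send(record);

            // 2. Synchronous: block on the returned Future for the broker ack.
            RecordMetadata meta = producer.send(record).get();
            System.out.printf("sync: partition=%d offset=%d%n", meta.partition(), meta.offset());

            // 3. Asynchronous: register a callback invoked when the send completes.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // handle the failure
                } else {
                    System.out.printf("async: partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```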
Partitioning strategies include:
- Round-robin (when key is null).
- Key-based hashing (ensures same-key messages land in same partition).
- Explicit partition assignment.
- Custom `Partitioner` implementations.
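A sketch of a custom `Partitioner` (registered via the `partitioner.class` producer property); the rule of pinning a hypothetical `audit` key to partition 0 is purely illustrative:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class AuditPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("audit".equals(key)) {
            return 0; // pin hypothetical audit traffic to a fixed partition
        }
        if (keyBytes == null) {
            return 0; // simplistic fallback; the default partitioner round-robins here
        }
        // same murmur2 hashing the default key-based strategy uses
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```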
Kafka does not support read-write separation. Reads always go to the leader to avoid replication lag and ensure consistency, aligning with its low-latency design goals.
Load balancing occurs at two levels:
- Producer side: Achieved via partitioning logic distributing messages across partitions.
- Consumer side: Handled by consumer group rebalancing, which redistributes partitions when group membership changes. Built-in assignors (e.g., Range, RoundRobin) govern allocation.
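For example, the consumer-side assignor can be chosen explicitly; the group id below is a placeholder:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;

public class AssignorConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics");
        // Replace the default range assignor with round-robin assignment.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                  RoundRobinAssignor.class.getName());
        // ...pass props to a KafkaConsumer as usual.
    }
}
```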
Despite these mechanisms, imbalance can arise from:
- Uneven initial partition distribution across brokers.
- Skewed producer traffic targeting specific leaders.
- Consumer skew due to uneven processing.
- Leader redistribution after failover or scaling.
Reliability is ensured through:
- `acks` setting (see the sketch after this list):
  - `0`: No acknowledgment.
  - `1`: Ack after the leader write.
  - `-1`/`all`: Ack after all ISR replicas confirm.
- Robust sending modes (sync/async with retries).
- Manual offset commits after successful message processing.
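A producer-side sketch combining these reliability settings; the config keys are standard Kafka client properties, and the values are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ReliableProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all");                // wait for every ISR replica
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE); // retry transient send failures
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);   // dedupe broker-side on retry
        // ...pass props to a KafkaProducer as usual.
    }
}
```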
Kafka uses a pull-based consumption model. Message delivery semantics depend on consumer-group membership:
- Point-to-point: Within a consumer group, each message is consumed once.
- Publish-subscribe: Across groups, all consumers receive all messages.
Partition reassignment addresses cluster imbalances caused by broker addition/removal. It migrates replicas to new brokers via controlled copying and cleanup, often with rate limiting to avoid performance degradation.
Leader election favors the preferred replica—the first replica in the AR list—to maintain balanced leadership across brokers. Without this, repeated failovers can concentrate leaders on fewer nodes.
While increasing partitions can boost throughput (by enabling parallelism), excessive counts introduce overhead:
- Higher memory usage (per-partition buffers on client and server).
- Increased file handles (each segment requires index and data files).
- Longer replication times and controller recovery delays (e.g., 10k partitions ≈ 20s metadata reload).
- Potential end-to-end latency increases due to shared replication threads.
To improve consumer throughput:
- Scale partitions and match consumer count (ideally 1:1).
- Use multi-threaded consumption within a single consumer instance (see the sketch after this list).
- Optimize business logic to reduce per-message processing time.
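A sketch of the multi-threaded option: one thread polls (since `KafkaConsumer` is not thread-safe) and hands records to a worker pool. The topic `jobs`, the group id, and the pool size are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PooledConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "workers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService pool = Executors.newFixedThreadPool(8);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("jobs"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Only this thread touches the consumer; workers get immutable records.
                    pool.submit(() -> process(record));
                }
                // Caveat: with auto-commit, offsets may be committed before workers
                // finish; production code needs careful manual offset tracking.
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value()); // placeholder business logic
    }
}
```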
The Kafka Controller (one per cluster) manages:
- Partition leadership elections.
- ISR updates.
- Partition reassignments.
Controller election leverages ZooKeeper (or KRaft in newer versions). Brokers compete to create an ephemeral znode; the first succeeds and becomes controller. On failure, ZooKeeper triggers re-election.
Kafka achieves high performance through:
- Sequential I/O: Appends to logs minimize disk seek.
- Page Cache: Leverages OS buffer cache instead of JVM heap.
- Zero-copy: Uses `sendfile()` to transfer data directly from disk to NIC, bypassing user space.
- Segmented logs + indexing: Partitions are split into segments with `.index` files for fast lookups.
- Batching: Messages are sent and written in batches to reduce network/syscall overhead.
- Compression: Batches are compressed (e.g., Snappy, LZ4) to save bandwidth and storage.
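A producer tuning sketch for the batching and compression points; the keys are standard client configs, and the values are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ThroughputTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);   // accumulate up to 64 KiB per partition batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);           // wait up to 10 ms to fill a batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress each batch before sending
        // ...pass props to a KafkaProducer as usual.
    }
}
```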
Message loss can occur in three scenarios:
- Producer: With `acks=0` or `acks=1`, if the leader fails before syncing to followers.
- Broker: If unflushed page cache data is lost during a crash (mitigated by `flush.messages`/`flush.ms`, though rarely used in practice).
- Consumer: Auto-committing offsets before successful processing leads to skipped messages on failure.
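A summary sketch mapping each scenario to its mitigation; the client keys are standard configs, and the broker/topic-level settings appear as comments:

```java
import java.util.Properties;

public class NoLossSettings {
    public static void main(String[] args) {
        // Producer: survive leader failover before follower sync.
        Properties producerProps = new Properties();
        producerProps.put("acks", "all");
        producerProps.put("retries", Integer.MAX_VALUE);

        // Consumer: commit only after successful processing.
        Properties consumerProps = new Properties();
        consumerProps.put("enable.auto.commit", "false");

        // Broker/topic side (set in server.properties or per-topic config):
        //   min.insync.replicas=2      -> acks=all must reach at least 2 replicas
        //   flush.messages / flush.ms  -> force fsync intervals (rarely used; replication preferred)
    }
}
```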