Kafka Configuration and Consumer Group Management

Installation

Download and extract Kafka:

wget --no-check-certificate https://dlcdn.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
tar -xzf kafka_2.13-3.0.0.tgz
cd kafka_2.13-3.0.0

Start ZooKeeper, then the Kafka server (run each in its own terminal, or add the -daemon flag):

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

By default, ZooKeeper listens on port 2181 and the Kafka broker on port 9092. To accept remote connections, set the broker's listener address in config/server.properties (and advertised.listeners if clients connect through a different address):

listeners=PLAINTEXT://your.machine.ip:9092

Consumer Group Rebalancing

Rebalancing is the protocol that distributes a topic's partitions among the consumers in a group. For example, when a group of 20 consumers subscribes to a topic with 100 partitions, each consumer is assigned exactly 100 / 20 = 5 partitions.
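The arithmetic can be sketched with a simplified round-robin assignment (illustrative only, not Kafka's actual assignor implementations):

```python
def assign_partitions(num_partitions, consumer_ids):
    """Distribute partition numbers across consumers round-robin."""
    assignment = {c: [] for c in consumer_ids}
    for p in range(num_partitions):
        owner = consumer_ids[p % len(consumer_ids)]
        assignment[owner].append(p)
    return assignment

consumers = [f"consumer-{i}" for i in range(20)]
plan = assign_partitions(100, consumers)
# every consumer owns 100 / 20 = 5 partitions
```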

Trigger Conditions

  1. The number of group members changes (a consumer joins, crashes, or leaves)
  2. The set of subscribed topics changes
  3. The partition count of a subscribed topic changes

During an eager rebalance, every consumer in the group stops consuming until the process completes; the cooperative-sticky assignor (KIP-429) softens this by letting consumers keep the partitions that are not being reassigned.

Rebalancing Process

The process consists of two phases:

  1. Join: All members send JoinGroup requests to the coordinator, which picks one member as group leader.
  2. Sync: Every member sends a SyncGroup request; the leader's request carries the assignment plan, and the coordinator returns each member its share in the SyncGroup response.

Scenarios

  • New member joins: Triggers rebalance
  • Member crashes: Coordinator detects after session timeout
  • Graceful leave: Initiates rebalance
  • Offset commits: A commit made during a rebalance or against a stale group generation is rejected; the consumer must rejoin before committing

Avoiding Unnecessary Rebalances

To prevent unwanted rebalances, tune these parameters:

  • session.timeout.ms: Default 10 seconds (raised to 45 seconds in Kafka 3.0 by KIP-735). Set to 6 seconds for faster failure detection
  • heartbeat.interval.ms: Default 3 seconds. Set to 2 seconds
  • max.poll.interval.ms: Default 5 minutes. Increase if processing takes longer

Ensure session.timeout.ms >= 3 * heartbeat.interval.ms for reliable heartbeats.
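As one illustration, these settings could be collected into a consumer configuration and the heartbeat rule checked programmatically (the broker address and group name below are placeholders):

```python
# Hypothetical consumer settings following the guidance above.
consumer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "example-group",            # placeholder group name
    "session.timeout.ms": 6000,
    "heartbeat.interval.ms": 2000,
    "max.poll.interval.ms": 300000,         # 5 minutes
}

# The rule from the text: allow at least 3 heartbeats per session timeout.
assert consumer_config["session.timeout.ms"] >= 3 * consumer_config["heartbeat.interval.ms"]
```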

Key Concepts

Group Coordinator

Each broker runs a Group Coordinator service that manages group metadata and stores offsets in the internal __consumer_offsets topic. The coordinator is determined by:

partition-id = Math.abs(groupId.hashCode() % groupMetadataTopicPartitionCount)

Where groupMetadataTopicPartitionCount defaults to 50.
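The lookup can be reproduced outside the JVM; the sketch below ports Java's String.hashCode to Python (the group name "test" is a made-up example):

```python
def java_string_hash_code(s):
    """Java's String.hashCode(): h = 31*h + c, in signed 32-bit arithmetic."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def coordinator_partition(group_id, num_partitions=50):
    """Partition of __consumer_offsets that hosts this group's coordinator."""
    return abs(java_string_hash_code(group_id)) % num_partitions

print(coordinator_partition("test"))  # -> 48, since "test".hashCode() == 3556498
```

The broker that leads that partition of __consumer_offsets acts as the group's coordinator.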

Performance Metrics

In a test with 10 partitions and 3 consumers, average rebalancing time was ~87ms.

Parameter Settings

Important configurations:

  • group.max.session.timeout.ms > session.timeout.ms
  • group.min.session.timeout.ms < session.timeout.ms
  • request.timeout.ms > session.timeout.ms + fetch.max.wait.ms
  • session.timeout.ms / 3 > heartbeat.interval.ms
  • session.timeout.ms > worst-case processing time
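A small self-check over these relationships (the values are illustrative defaults, worst_case_processing_ms stands in for a measured number, and the Java consumer's fetch.max.wait.ms name is assumed):

```python
def validate_timeouts(cfg, worst_case_processing_ms):
    """Return the constraints from the checklist above that cfg violates."""
    problems = []
    if not cfg["group.max.session.timeout.ms"] > cfg["session.timeout.ms"]:
        problems.append("group.max.session.timeout.ms")
    if not cfg["group.min.session.timeout.ms"] < cfg["session.timeout.ms"]:
        problems.append("group.min.session.timeout.ms")
    if not cfg["request.timeout.ms"] > cfg["session.timeout.ms"] + cfg["fetch.max.wait.ms"]:
        problems.append("request.timeout.ms")
    if not cfg["session.timeout.ms"] / 3 > cfg["heartbeat.interval.ms"]:
        problems.append("heartbeat.interval.ms")
    if not cfg["session.timeout.ms"] > worst_case_processing_ms:
        problems.append("processing time")
    return problems

cfg = {
    "group.max.session.timeout.ms": 1800000,
    "group.min.session.timeout.ms": 6000,
    "session.timeout.ms": 10000,
    "request.timeout.ms": 30000,
    "fetch.max.wait.ms": 500,
    "heartbeat.interval.ms": 3000,
}
print(validate_timeouts(cfg, worst_case_processing_ms=8000))  # -> []
```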

Message Handling

Kafka scales throughput through partitioning. Each topic can have multiple partitions, and each partition is stored on the broker as its own directory of log segment files. Producers route messages to partitions based on message keys via a partitioning strategy.

Consumers read from partitions within their group. Only one consumer in a group reads from a specific partition. Multiple groups can consume the same topic independently.
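A simplified illustration of key-based routing (Kafka's default partitioner actually uses a murmur2 hash of the key bytes; CRC32 is used here only to keep the sketch deterministic and dependency-free):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always maps to the same partition."""
    return zlib.crc32(key) % num_partitions

# Messages sharing a key land in one partition, preserving per-key ordering.
assert partition_for(b"user-42", 10) == partition_for(b"user-42", 10)
```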

Data Retention

Two policies exist:

  1. Time-based retention
  2. Size-based retention

Example configuration:

log.retention.hours=168
log.segment.bytes=1073741824

Practical Application

For multi-engine systems where different engines support varying license counts:

  • Engine count = consumer group count
  • License count = consumer count per group
  • Partitions = LCM of engine license capacities
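For example, with hypothetical engines licensed for 4, 6, and 10 concurrent consumers, the partition count would be their least common multiple, so each group's consumers receive an equal whole number of partitions:

```python
from math import lcm

engine_capacities = [4, 6, 10]  # hypothetical license counts per engine
num_partitions = lcm(*engine_capacities)
print(num_partitions)  # -> 60, divisible by each capacity
```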

Troubleshooting

Producers see a RecordTooLargeException when a record exceeds the broker's message.max.bytes. To handle large payloads, raise these limits together:

message.max.bytes=10485760
replica.fetch.max.bytes=10485760
fetch.message.max.bytes=10485760
socket.request.max.bytes=104857600

Raise the broker and replica limits together so that replicas can still fetch oversized records; on the consumer side, the Java client's counterpart of fetch.message.max.bytes is max.partition.fetch.bytes. socket.request.max.bytes caps the total request size to protect broker memory.

Tags: Kafka Consumer rebalancing configuration partitioning

Posted on Fri, 15 May 2026 17:56:58 +0000 by soianyc