# The Magical Rebalance Protocol of Apache Kafka

• Very useful (and dense) talk that goes into the details of Kafka consumer groups.

• Production/consumption always occurs against the leader replica (for a partition); all other replicas are simply standbys.

• The internal `__consumer_offsets` topic stores the committed offsets for each consumer group, which provides fault tolerance: offsets survive consumer restarts.

• Consumers in a group figure out what partitions to consume using a homegrown protocol.

• This first used ZooKeeper, which couldn't scale to the expected load (30-50k consumer groups, with 100+ consumers per group)

• v2 was a custom protocol in which a broker made all the assignments; this was unsuitable because every new assignment strategy would have required a broker-side change

• v3 (the current protocol) moves this responsibility to one consumer in each group

• Protocol steps:

• FindCoordinator: hash the group id modulo the number of partitions in `__consumer_offsets` to get a partition; the leader of that partition is the group coordinator (guaranteed to be live, since a dead leader is replaced by another replica)
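
This lookup can be sketched as follows — a toy model, not the real client code, assuming Java-style String.hashCode and the default of 50 `__consumer_offsets` partitions:

```python
def java_string_hash(s):
    """Java-style String.hashCode: h = 31*h + c, with signed 32-bit overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def coordinator_partition(group_id, num_offsets_partitions=50):
    # The leader of this __consumer_offsets partition is the group coordinator.
    return abs(java_string_hash(group_id)) % num_offsets_partitions
```

Because the hash is deterministic, every consumer in the group independently arrives at the same coordinator broker.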

• JoinGroup: agree on an assignment algorithm, receive a member id, and choose one consumer as the leader (the coordinator keeps the consumers in a list in the order they joined; the head of the list is the leader, and a consumer is removed when it dies)
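
The join-order bookkeeping can be modeled like this (a toy sketch, not the broker's actual implementation):

```python
class GroupMembership:
    """Tracks members in join order; the head of the list is the group leader."""

    def __init__(self):
        self.members = []  # member ids, oldest first

    def join(self, member_id):
        if member_id not in self.members:
            self.members.append(member_id)

    def leave(self, member_id):
        # Called on LeaveGroup, or when a member misses its heartbeats
        if member_id in self.members:
            self.members.remove(member_id)

    def leader(self):
        return self.members[0] if self.members else None
```

If the leader dies, the next-oldest member becomes leader automatically.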

• SyncGroup: the leader decides the partition assignment and sends it to the coordinator, which hands the assignment out to all the other consumers
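
What the leader computes here can be sketched as a simple round-robin assignor (a simplified stand-in, not Kafka's actual RoundRobinAssignor):

```python
def round_robin_assign(member_ids, partitions):
    """Deal partitions out to members in a fixed, deterministic order."""
    members = sorted(member_ids)  # deterministic ordering of members
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment
```

The leader includes this mapping in its SyncGroup request; the other consumers send empty SyncGroup requests and receive their slice of the assignment in the response.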

• Heartbeat: consumers periodically tell the coordinator they're still alive. The response is either just OK, or a command to rebalance, which is triggered when a consumer enters/leaves the group, topic metadata changes, etc.
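
One way to model the rebalance signal, assuming (as Kafka does) a generation counter that is bumped on every rebalance, so that members on a stale generation are told to rejoin. A toy sketch:

```python
class CoordinatorState:
    """Minimal model of generation-based rebalance signaling."""

    def __init__(self):
        self.generation = 0
        self.member_generation = {}

    def complete_sync(self, member_id):
        # Member finished JoinGroup/SyncGroup for the current generation
        self.member_generation[member_id] = self.generation

    def trigger_rebalance(self):
        # E.g. a member joined/left, or topic metadata changed
        self.generation += 1

    def heartbeat(self, member_id):
        if self.member_generation.get(member_id) == self.generation:
            return "OK"
        return "REBALANCE_IN_PROGRESS"  # member must rejoin the group
```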

• LeaveGroup: consumer deliberately leaves the group

• Can specify multiple assignment strategies when joining a group, in priority order. This is useful for upgrades, when each consumer may support a different set of assignment strategies. The coordinator settles on the highest-priority strategy that all consumers can support.
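
In the Java client this is the partition.assignment.strategy config, which takes an ordered list of assignor class names, e.g.:

```properties
# Prefer cooperative-sticky; fall back to range if some member doesn't support it
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor,org.apache.kafka.clients.consumer.RangeAssignor
```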

• As far as I can tell, consumers don't talk directly to each other

• Extend AbstractCoordinator to implement a custom coordination strategy at the client level, not the broker (essentially customize the consumer's behavior for each of the five steps above)

• Implement a new ConsumerPartitionAssignor to customize just the partition assignment strategy
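
For example, the built-in range strategy (contiguous chunks per topic, with earlier members getting the extras) boils down to something like this — a simplified single-topic sketch, not the real RangeAssignor:

```python
def range_assign(member_ids, num_partitions):
    """Split [0, num_partitions) into contiguous chunks, one per member."""
    members = sorted(member_ids)
    per, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, m in enumerate(members):
        count = per + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[m] = list(range(start, start + count))
        start += count
    return assignment
```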

• There was some explanation towards the end of ways this protocol is used outside the core use case, but it went by too fast for me to actually grasp:

• The schema registry overrides AbstractCoordinator to provide a leader election system (does this depend on a Kafka broker or is it entirely standalone?)

• Kafka Connect uses this to assign generic tasks (e.g. copy table X from a given source database) to workers, so there's some scope to use the protocol more generically, not just with partitions.

• Kafka Streams uses a custom assignor to have a consumer join data from two source topics. This is possible because the protocol allows arbitrary user metadata, which this strategy (ab)uses.

• RocksDB for local state, but I didn't really understand the context/motivation

• I'm curious about fault-tolerance strategies for when the coordinator goes down, which this talk didn't cover.
