Running Apache Kafka in KRaft mode

TL;DR

The controller role, which in ZooKeeper mode was held by one of the brokers in the cluster, is now held by the KRaft controller. It isn't a single node but an ensemble of them (typically 3 or 5 members, for resiliency), and a leader, called the active controller, is elected among them using the Raft algorithm. The cluster metadata now lives in an internal topic, which is managed by the active controller and replicated to all brokers.

Why this post

My first experience with managing Apache Kafka was a few years back now. I had just moved teams at Shopify and I was now managing the Kafka clusters for the whole company. Easy, right? My previous team had been in charge of the Apache Flink platform, so I wasn't entirely new to streaming, but it was nonetheless daunting.

I don't consider myself an expert nowadays, even though my team at New Relic (my employer as of this writing) manages dozens of Kafka clusters. But, as I said, there was a time when I was a total newbie.

How did I ramp up, you might ask?

Well, my onboarding consisted of leading a project to migrate our Kafka clusters to KRaft mode, and ditch ZooKeeper.

So I first had to learn about:

How Kafka works
What's ZooKeeper
How does Kafka use ZooKeeper
Why future versions of Kafka are getting rid of ZooKeeper
What does Kafka look like without ZooKeeper, i.e. what's KRaft mode

So I'm writing this for anyone out there who's struggling to understand what it means to run Kafka in KRaft mode.

Side note: when I started working on this (circa Jan 2024), we had just migrated to Kafka v3.6, and v3.7 was supposed to be the last Kafka version to support ZooKeeper mode. In the end, Kafka v3.9 (released Nov 2024) was the last one to do so, and starting with v4.0 (released March 2025), ZooKeeper mode is no longer supported.

What is Apache Kafka

There are many answers to this, and I think everyone uses the ones that resonate most with them.

For me, the simplest answer is "a distributed log". But that's just what stuck in my head, so let's try to explain how it works:

You produce and consume messages to/from topics
You can horizontally scale by dividing your topics into partitions (you can have just one, or hundreds)
In a Kafka cluster you have a few or many brokers, which are just "workers" processing produce/consume requests (and, typically, holding the data on disk)
You can horizontally scale the cluster by adding more brokers, so your topic partitions are spread across more brokers and processed in parallel
Each topic partition is replicated for resilience (typically with a replication factor, or RF, of 3), distributed amongst brokers (and ideally availability zones), so if a broker goes down a new partition leader is elected and you can continue processing data
You can have many independent consumers for the same topic (called consumer groups), each with its own offset tracking (i.e. how far along it has processed)
You can configure how long you want the topic data to live on disk — could be days or minutes
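To make the "distributed log" idea concrete, here's a toy, single-process sketch of the concepts in the list above: a topic split into append-only partitions, records with the same key landing in the same partition, and consumer groups that each track their own offsets. Everything here (ToyTopic, ToyConsumerGroup) is hypothetical and lives in memory; there's no replication or retention, and CRC32 is just a stand-in for the murmur2 hash Kafka's default partitioner uses for keyed records.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition. CRC32 stands in for Kafka's murmur2 hash.
        return zlib.crc32(key.encode()) % len(self.partitions)

    def produce(self, key: str, value: str) -> tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset of the new record)


class ToyConsumerGroup:
    """Hypothetical consumer group: keeps its own next offset per partition."""

    def __init__(self, group_id: str, topic: ToyTopic) -> None:
        self.group_id = group_id
        self.topic = topic
        self.offsets: dict[int, int] = defaultdict(int)

    def poll(self, partition: int, max_records: int = 10) -> list[tuple[str, str]]:
        start = self.offsets[partition]
        records = self.topic.partitions[partition][start:start + max_records]
        self.offsets[partition] += len(records)  # "commit" how far this group has read
        return records


topic = ToyTopic("orders", num_partitions=3)
for i in range(3):
    topic.produce("customer-42", f"order-{i}")  # same key, so same partition, in order

p = topic.partition_for("customer-42")
analytics = ToyConsumerGroup("analytics", topic)
billing = ToyConsumerGroup("billing", topic)
print(analytics.poll(p, max_records=2))  # [('customer-42', 'order-0'), ('customer-42', 'order-1')]
print(billing.poll(p))  # all 3 records: billing's offset is independent of analytics'
```

The point of the two groups is that reading doesn't remove anything from the log: each group just moves its own cursor forward.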

This lets you do things like decoupling producers and consumers, i.e. let Kafka act as a buffer to absorb bursts. Or distribute data amongst many services. Or both. It's great for high-throughput applications, but you can also tweak it for low-latency scenarios.

Worth mentioning: consuming (or, more accurately, fetching) the latest data is usually served straight from the page cache, with no disk read required, so it's fast. It also spares your disks, which are already busy writing incoming data plus the replica data fetched from other brokers.

How does Kafka work in ZooKeeper mode

Kafka brokers operate as part of a cluster. Within a cluster, one broker also functions as the cluster controller (elected automatically). That controller is responsible for administrative operations, including:

Assigning partition replicas to brokers
Monitoring for broker failures
Electing partition leaders
Creating/deleting topics

For that reason, it also manages the cluster metadata, which is stored in ZooKeeper.

ZooKeeper has another important function (apart from storing the cluster metadata): electing a controller (using the /controller znode).
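The /controller election boils down to a race: every broker tries to create the same ephemeral znode, only the first succeeds, and when that broker's ZooKeeper session dies the znode disappears and the race starts over. Below is a minimal simulation of that idea. ToyZooKeeper is hypothetical (it is not a real ZooKeeper client API), the znode payload is simplified to just the broker id, and the counter loosely models the persistent /controller_epoch znode Kafka uses to fence off stale controllers.

```python
from typing import Optional


class ToyZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper; NOT a real client API."""

    def __init__(self) -> None:
        # path -> (data, id of the session that owns the ephemeral znode)
        self.ephemeral: dict[str, tuple[str, int]] = {}
        # Loosely models the persistent /controller_epoch znode as a counter.
        self.controller_epoch = 0

    def create_ephemeral(self, path: str, data: str, session_id: int) -> bool:
        if path in self.ephemeral:
            return False  # someone else created it first
        self.ephemeral[path] = (data, session_id)
        return True

    def expire_session(self, session_id: int) -> None:
        # When a session dies, every ephemeral znode it owned is deleted.
        self.ephemeral = {
            path: node for path, node in self.ephemeral.items() if node[1] != session_id
        }


def try_become_controller(zk: ToyZooKeeper, broker_id: int) -> bool:
    """Each broker races to create /controller; only the first one succeeds."""
    # For simplicity, the session id is just the broker id.
    won = zk.create_ephemeral("/controller", str(broker_id), session_id=broker_id)
    if won:
        zk.controller_epoch += 1  # a higher epoch lets brokers ignore a stale controller
    return won


def current_controller(zk: ToyZooKeeper) -> Optional[int]:
    node = zk.ephemeral.get("/controller")
    return int(node[0]) if node else None


# Brokers 1, 2 and 3 start up; broker 1 wins the race.
zk = ToyZooKeeper()
for broker in (1, 2, 3):
    try_become_controller(zk, broker)
print(current_controller(zk), zk.controller_epoch)  # 1 1

# Broker 1's session expires. In real Kafka the other brokers learn about it
# through a watch on /controller; here we simply let them race again.
zk.expire_session(1)
for broker in (2, 3):
    try_become_controller(zk, broker)
print(current_controller(zk), zk.controller_epoch)  # 2 2
```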

While ZooKeeper's design works well in general, it comes with some limitations for Kafka:

The controller has to load all cluster metadata from ZooKeeper before it becomes active, which can take several seconds in clusters with a large number of partitions. This matters during controller failover (e.g. a node shuts down and a different broker becomes the controller).
When a broker leaves the cluster, the controller finds a new leader for all partitions that need one, and then must communicate the leadership changes to every broker that holds replicas for those partitions. Since every broker also maintains a MetadataCache (a map of all brokers and all replicas in the cluster), the controller ends up sending leadership-change updates to all brokers in the cluster.
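Here's a simplified sketch of the second limitation. It assumes a clean leader election that picks the first replica, in assignment order, that is both alive and still in the ISR (in-sync replicas). PartitionState and handle_broker_failure are hypothetical names, and -1 is used here to mark a partition left without a leader. The part to notice is the return value: because every broker caches metadata for the whole cluster, the set of brokers to notify is all live brokers, including ones that hold no replicas of the affected partitions.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    replicas: list[int]  # assigned replicas, in preference order
    isr: list[int]       # in-sync replicas
    leader: int


def handle_broker_failure(
    partitions: dict[str, PartitionState], failed: int, live_brokers: set[int]
) -> tuple[list[str], list[int]]:
    """Return (partitions whose state changed, brokers that must be told)."""
    changed = []
    for name, state in partitions.items():
        if failed not in state.isr and state.leader != failed:
            continue  # this partition doesn't involve the failed broker
        state.isr = [b for b in state.isr if b != failed]
        if state.leader == failed:
            candidates = [b for b in state.replicas if b in state.isr and b in live_brokers]
            state.leader = candidates[0] if candidates else -1  # -1: no leader available
        changed.append(name)
    # ZooKeeper mode: every broker keeps a MetadataCache of the whole cluster,
    # so the controller pushes the changes to ALL live brokers.
    return changed, sorted(live_brokers)


partitions = {
    "orders-0": PartitionState(replicas=[1, 2, 3], isr=[1, 2, 3], leader=1),
    "orders-1": PartitionState(replicas=[2, 3, 1], isr=[2, 3, 1], leader=2),
    "orders-2": PartitionState(replicas=[2, 3, 4], isr=[2, 3, 4], leader=3),
}
changed, notify = handle_broker_failure(partitions, failed=1, live_brokers={2, 3, 4})
print(changed, notify)  # ['orders-0', 'orders-1'] [2, 3, 4]
```

Broker 4 only hosts orders-2, which was untouched, yet it still receives the update. Multiply that by thousands of partitions and dozens of brokers and the fan-out adds up.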

How does Kafka work in KRaft mode

ZooKeeper is removed. That means two things need a new home:

the cluster metadata
the mechanism to elect a controller

The cluster metadata now lives in a Kafka log — a single-partition internal topic called __cluster_metadata, managed by the controller. This topic is not exposed for direct access or management through the admin client.
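Because __cluster_metadata is an append-only log, a node's view of the cluster can be rebuilt by replaying it from the start, and kept current by applying only the records after the last offset it has seen. The sketch below illustrates that idea. The record type names borrow from Kafka's metadata record schema (TopicRecord, PartitionRecord, PartitionChangeRecord), but the fields are simplified and ToyMetadataView is hypothetical; this is not how Kafka's own metadata classes are structured.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetadataView:
    """Hypothetical in-memory view built by replaying the metadata log."""

    topics: dict[str, str] = field(default_factory=dict)  # topic_id -> topic name
    leaders: dict[tuple[str, int], int] = field(default_factory=dict)  # (topic_id, partition) -> leader
    last_applied_offset: int = -1

    def apply(self, offset: int, record: dict) -> None:
        kind = record["type"]
        if kind == "TopicRecord":
            self.topics[record["topic_id"]] = record["name"]
        elif kind in ("PartitionRecord", "PartitionChangeRecord"):
            # Simplified: every partition record here carries the current leader.
            self.leaders[(record["topic_id"], record["partition"])] = record["leader"]
        self.last_applied_offset = offset

    def catch_up(self, log: list[dict]) -> int:
        """Apply only records past the last applied offset; return how many were applied."""
        start = self.last_applied_offset + 1
        for offset in range(start, len(log)):
            self.apply(offset, log[offset])
        return max(0, len(log) - start)


log = [
    {"type": "TopicRecord", "topic_id": "t1", "name": "orders"},
    {"type": "PartitionRecord", "topic_id": "t1", "partition": 0, "leader": 1},
]
view = ToyMetadataView()
print(view.catch_up(log))  # 2: a fresh node replays everything

log.append({"type": "PartitionChangeRecord", "topic_id": "t1", "partition": 0, "leader": 2})
print(view.catch_up(log))  # 1: afterwards, only the new record is applied
```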

The controller is now a Raft quorum: an ensemble of nodes with one active controller (the leader) and the rest as followers. For production workloads, these are typically dedicated controller nodes rather than brokers.

The KRaft controller quorum is formed by all nodes with a controller role — they can participate in leader election voting and propose themselves as leaders
The remaining nodes in the cluster are brokers, which participate as observers: the cluster metadata log is also replicated to every broker
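In practice, which role a node plays comes down to its configuration. The sketch below renders a minimal server.properties for a dedicated controller or a broker using a static quorum. process.roles, node.id, controller.quorum.voters, controller.listener.names, listener.security.protocol.map, listeners and log.dirs are real Kafka settings, but the hostnames, ports and paths are made up, and this is an illustration rather than a complete production config (no security, no advertised listeners). Newer versions (v3.9+) can also run a dynamic quorum configured differently, which this sketch ignores.

```python
def kraft_properties(node_id: int, role: str, voters: dict[int, str]) -> str:
    """Render an illustrative server.properties for a KRaft node.

    voters maps each controller's node.id to "host:port" (hypothetical hosts).
    """
    if role not in ("controller", "broker"):
        raise ValueError(f"unsupported role: {role!r}")
    if role == "controller" and node_id not in voters:
        raise ValueError("a controller must be one of the voters")
    # Static quorum: every node, broker or controller, lists all voters as id@host:port.
    quorum = ",".join(f"{vid}@{addr}" for vid, addr in sorted(voters.items()))
    props = {
        "process.roles": role,
        "node.id": str(node_id),
        "controller.quorum.voters": quorum,
        "controller.listener.names": "CONTROLLER",
        "listener.security.protocol.map": "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT",
    }
    if role == "controller":
        port = voters[node_id].rsplit(":", 1)[1]
        props["listeners"] = f"CONTROLLER://:{port}"
    else:
        props["listeners"] = "PLAINTEXT://:9092"  # brokers serve clients here
    props["log.dirs"] = f"/var/lib/kafka/{role}-{node_id}"  # hypothetical path
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


voters = {1: "controller-1:9093", 2: "controller-2:9093", 3: "controller-3:9093"}
print(kraft_properties(1, "controller", voters))
print(kraft_properties(4, "broker", voters))
```

Note that brokers also need to know the voters: that's how they find the active controller to fetch the metadata log from.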

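This directly addresses the limitations of ZooKeeper mode: controller failover no longer requires loading all metadata from scratch, because the standby controllers already hold a replicated copy of the metadata log, and brokers fetch metadata changes from the log instead of the controller pushing updates to each of them. As a result, the number of partitions in a cluster is much less of a bottleneck.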

About nomenclature

I've added this section because the terminology was very confusing when I first started, so I hope it helps others.

KRaft is short for Kafka-Raft — named after the Raft consensus algorithm used by the KRaft controller quorum to elect a leader.
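The "3 or 5 members" rule of thumb comes from Raft's majority rule: a candidate needs votes from a strict majority of the voters to become leader (and the leader needs a majority to commit a write). A small sketch of the arithmetic, with hypothetical helper names:

```python
def majority(voters: int) -> int:
    """Votes needed to win a Raft election (or commit a write) with `voters` members."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be down while the rest still form a majority."""
    return voters - majority(voters)


def wins_election(votes_received: int, voters: int) -> bool:
    # The candidate's vote for itself counts; it needs a strict majority of the voter set.
    return votes_received >= majority(voters)


for n in (1, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

This is why even-sized quorums are rarely used: 4 voters tolerate the same single failure as 3, while needing one more vote for every election and write.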

The KRaft controller is also referred to as:

KRaft controller quorum
Quorum controller
Metadata quorum

These all refer to the same thing. I prefer KRaft controller quorum or simply KRaft controller.

When referring to which mode Kafka is running in, the correct nomenclature (in my opinion) is:

Kafka running in ZooKeeper mode
Kafka running in KRaft mode

I've seen it referenced as "Kafka with ZooKeeper" or "Kafka with KRaft", but the above is less confusing and matches how it's officially referenced.
