Queues for Kafka (KIP-932): The Bridge between Event Streaming and Queuing


For years, architects and developers have adopted Apache Kafka as the standard for event streaming and distributed logs, yet continued to rely on systems like RabbitMQ for traditional queuing.

This separation has never been ideological but architectural: Kafka’s classic consumer-group model imposes a fundamental constraint, the 1:1 mapping between a partition and its active consumer.

If a topic has three partitions, for example, at most three consumers can cooperatively consume its messages; a fourth remains idle. This model guarantees per-partition ordering and efficient offset management, but it places a structural limit on concurrency and operational flexibility.

With KIP-932, introduced in preview in Apache Kafka 4.0 and officially released in Apache Kafka 4.2, this paradigm changes radically. It introduces Share Groups, a model that brings the concept of a queue natively into Kafka, decoupling message processing from storage and overcoming some of the historical limits of the distributed log as originally conceived in Kafka.


The Limits of the Conventional Consumption Model

To understand the value of Share Groups, it is necessary to analyze the limitations of Kafka’s classic model based on Consumer Groups.

Maximum Level of Parallelism

In the model based on Kafka Consumer Groups, the maximum degree of parallelism in message consumption is bounded by the number of partitions. This often leads to preemptive over-partitioning: companies create topics with hundreds of partitions merely to absorb load peaks (for example, user or order spikes during Black Friday), keeping the infrastructure oversized for the rest of the year.
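The partition ceiling is easy to see in a toy round-robin assignment. This is a hypothetical sketch for illustration, not Kafka’s actual partition assignor:

```python
# Toy sketch of classic consumer-group assignment: each partition goes to
# exactly one group member, so members beyond the partition count sit idle.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign(["p0", "p1", "p2"], ["c1", "c2", "c3", "c4"])
idle = [c for c, ps in a.items() if not ps]
print(idle)  # ['c4'] -- the fourth consumer receives no partition
```

However many consumers you add, only as many as there are partitions can ever do work; the rest are pure standby capacity.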

Head of Line (HOL) Blocking

A single consumer within a Consumer Group is assigned an entire partition, and its messages are processed in strict sequential order.

If a single message requires a call to a slow external system, performs a computationally heavy task or fails repeatedly, the entire partition remains blocked. This phenomenon is known as Head of Line (HOL) blocking. The result is a pipeline that stops because of one problematic event.
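The effect can be shown with a trivial timing model. This is an illustrative sketch of strictly sequential per-partition processing, not real Kafka behavior:

```python
# Sketch: sequential per-partition processing means one slow record delays
# every record queued behind it (head-of-line blocking).
def completion_times(process_times):
    """Completion time of each record when processed strictly in order."""
    done, t = [], 0.0
    for pt in process_times:
        t += pt
        done.append(t)
    return done

# One 10-second "problem" record at the head of the partition:
latencies = completion_times([10.0, 0.1, 0.1, 0.1])
print(latencies)  # every later record finishes after the slow one
```

Three records that each need a tenth of a second still take over ten seconds to complete, purely because of the one slow event in front of them.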

The Cost of Rebalancing

Rebalancing is Kafka’s fault-tolerance mechanism. However, especially in earlier versions, it could become a highly invasive event: during partition reassignment, consumption was interrupted, increasing latency and generating instability during peak periods. Recent versions of Kafka have introduced optimizations, but they have not eliminated the problem.


Kafka Share Groups: Record-Level Assignment

The innovation of KIP-932 lies in moving from the “one consumer per partition” logic to “multiple consumers cooperate on the same partition”. It is no longer the partition that is assigned exclusively, but individual records (or batches of records). This allows scaling the number of consumers beyond the number of partitions, eliminating the historical concurrency constraint.
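The shift from partition-level to record-level assignment can be sketched as follows. This is a toy round-robin dispatcher, not Kafka’s actual share-group protocol, but it shows why the consumer count is no longer capped:

```python
from collections import defaultdict
from itertools import cycle

# Sketch of record-level dispatch: records from a single partition are handed
# out individually, so any number of consumers can share one partition.
def share_dispatch(records, consumers):
    given = defaultdict(list)
    for record, consumer in zip(records, cycle(consumers)):
        given[consumer].append(record)
    return given

# Eight records from ONE partition spread across FOUR consumers:
d = share_dispatch(range(8), ["c1", "c2", "c3", "c4"])
print({c: recs for c, recs in d.items()})  # nobody is idle
```

Contrast this with the classic model, where a single partition would keep exactly one of these four consumers busy and leave the other three idle.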

How It Works: The Share-Partition Leader

In this new architecture, state management is no longer tied to a single sequential offset. A new component, the Share-Partition Leader, is introduced, co-located with the leader of the physical partition. Its role is to manage the state of so-called In-Flight Records, that is, messages currently being processed.

To keep performance high, Kafka uses a “sliding window” defined by two new markers:

  • SPSO (Share-Partition Start Offset): the offset of the first message not yet acknowledged.
  • SPEO (Share-Partition End Offset): the upper limit of messages available to be fetched by the Share Group.

This approach lets Kafka handle huge topics without needing to keep in memory the state of every single record in the topic’s entire history.
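A minimal sketch of how such a sliding window can advance, under the simplifying assumption that every fetched record is eventually acknowledged (the real broker also tracks lock timeouts and delivery counts):

```python
# Sketch of the SPSO/SPEO sliding window: only offsets in [spso, speo) need
# tracking; acknowledgements at the window's start advance SPSO.
class ShareWindow:
    def __init__(self):
        self.spso = 0          # first offset not yet acknowledged
        self.speo = 0          # upper bound of offsets made fetchable
        self.acked = set()     # acknowledged offsets still inside the window

    def fetch(self, n):
        start = self.speo
        self.speo += n
        return range(start, self.speo)

    def acknowledge(self, offset):
        self.acked.add(offset)
        while self.spso in self.acked:   # slide the window forward
            self.acked.discard(self.spso)
            self.spso += 1

w = ShareWindow()
w.fetch(3)            # offsets 0, 1, 2 are now in flight
w.acknowledge(1)      # out-of-order ack: SPSO cannot move yet
w.acknowledge(0)      # 0 and 1 are both acked, so SPSO jumps to 2
print(w.spso)  # 2
```

The key point is that state below SPSO and above SPEO never needs to be held in memory, which is what keeps the mechanism cheap even on very large topics.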


Record Lifecycle and Resilience

With KIP-932, every record has an associated state that evolves through a state machine:

  1. Available: the record is in the log and ready to be consumed.
  2. Acquired: the record has been sent to a consumer and “locked” for a defined duration (lock duration).
  3. Acknowledged: the consumer confirms successful processing.
  4. Archived: if a record fails repeatedly or its lock expires too many times, exceeding the delivery-attempt limit, it is automatically archived and will not be delivered again.

This logic natively integrates the handling of poison messages, preventing a single faulty record from blocking the system indefinitely and improving the overall robustness of the application.
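The four-state lifecycle above can be sketched as a tiny state machine. The state names mirror the article; `MAX_DELIVERIES` is an illustrative limit (in real deployments the threshold is broker configuration), and the actual broker logic is considerably more involved:

```python
MAX_DELIVERIES = 5  # illustrative delivery-attempt limit

class SharedRecord:
    def __init__(self):
        self.state = "Available"
        self.delivery_count = 0

    def acquire(self):
        # A consumer takes the record and it is locked for the lock duration.
        assert self.state == "Available"
        self.delivery_count += 1
        self.state = "Acquired"

    def ack(self):
        self.state = "Acknowledged"

    def lock_expired(self):
        # Lock timeout: the record becomes available again, or is archived
        # once it has failed too many times (poison-message handling).
        if self.delivery_count >= MAX_DELIVERIES:
            self.state = "Archived"
        else:
            self.state = "Available"

r = SharedRecord()
for _ in range(MAX_DELIVERIES):   # the record keeps timing out
    r.acquire()
    r.lock_expired()
print(r.state)  # Archived: it will never block the pipeline again
```

A record that always fails cycles Available → Acquired a bounded number of times and then drops out of circulation, which is exactly the guarantee the classic partition-bound model could not offer.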

Rebalancing Without Interruptions

Unlike classic Consumer Groups, rebalancing in Share Groups is much less invasive. Since records are not “owned” exclusively through partition assignment, adding or removing a consumer does not require a full stop of processing: the system simply continues distributing available records to active members.
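A toy event loop makes the difference concrete. This is an illustrative model of membership change under record-level dispatch, not the real group-coordination protocol:

```python
# Sketch: because records are not pinned to members, a membership change does
# not pause delivery; the dispatcher simply starts including the new member.
def run(events):
    members, deliveries, i = ["c1", "c2"], [], 0
    for e in events:
        if isinstance(e, str) and e.startswith("join:"):
            members.append(e.split(":")[1])  # no stop-the-world reassignment
        else:
            deliveries.append((e, members[i % len(members)]))
            i += 1
    return deliveries

log = run([0, 1, "join:c3", 2, 3, 4])
print(log)  # records keep flowing across the join; c3 starts receiving work
```

In the classic model, the join would trigger a partition reassignment and a consumption pause; here, delivery simply continues with one more member in rotation.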


When to Use Share Groups

Despite the obvious advantages in scalability and flexibility, adopting Share Groups requires careful evaluation of some fundamental architectural trade-offs. The first and most evident is the loss of partial ordering. With Share Groups, records may be processed out of sequence because of inherent concurrency among multiple consumers or retry mechanisms. If the application logic depends strictly on per-partition message sequencing, this model is not the correct choice.

Another significant limitation concerns network cost optimization: Follower Fetching is currently not supported. The state of locks (the "Acquired" records) resides exclusively in the memory of the Share-Partition Leader. Replicating this transient state to followers in real time is a complex challenge that, for now, prevents the use of Rack-Aware Fetching. In multi-zone environments, this can lead to higher network costs compared to the traditional model.

Finally, one must consider the absence of Exactly Once Semantics (EOS). Although it is possible to read transactionally written records, the current protocol does not include the ability to acknowledge message delivery within an atomic transaction. If the application requires strict end-to-end transactional guarantees, the classical consumer group remains the reference standard.

In practical terms there are however some scenarios where this technology can make a difference:

  • Long Running Tasks: dispatching complex tasks, like heavy data transformations on single events, without risking stalling other messages due to blocked partitions.
  • Cloud Cost Optimization: on platforms like Confluent Cloud, partitions have economic weight. With Share Groups, we can scale compute (consumers) independently from storage (partitions), handling message spikes without having to over-provision the entire Kafka infrastructure.

Conclusion

The introduction of Share Groups with KIP-932 dissolves the historical boundary between streaming and queuing in Apache Kafka. This evolution finally allows companies to decouple computing power from storage, eliminating critical bottlenecks such as Head of Line blocking and reducing the infrastructure costs tied to over-partitioning.

However, adopting this model requires a strategic analysis of trade-offs, especially regarding the loss of ordering and absence of Exactly-Once semantics. This is where Bitrock’s expertise becomes decisive: we don’t limit ourselves to technical implementation, but guide companies in an end-to-end digital transformation. Thanks to our deep knowledge of the Kafka ecosystem, we help partners balance innovation and architectural solidity, ensuring our clients obtain a concrete and sustainable competitive advantage.

Do you want to discover how Share Groups can optimize your architecture?

Contact us at Bitrock for a dedicated technical consultancy.


Main Author: Simone Esposito, Software Architect & Team Lead @ Bitrock

Do you want to know more about our services? Fill in the form and schedule a meeting with our team!