In this article, we delve into a concrete use case from the large-scale retail sector: a hub-and-spoke architecture that uses Apache Kafka to ensure local resilience and global visibility, enabling bidirectional data flows that range from receipt management to the dynamic update of promotions.
The Strategic Importance of Data
In the digital era, the ability to process and act on data in real time is a fundamental strategic necessity. In large-scale retail, data is a crucial strategic lever for leaders aiming to optimize operational efficiency and reinvent the shopping experience.
The sector is intrinsically dependent on efficient information management: data is essential to support commercial decisions and to produce more accurate demand forecasts. In-depth customer knowledge, derived from data analysis, enables personalized offers and a better shopping experience. Furthermore, timely access to data unlocks a whole range of innovations in services and customer experience that were previously impossible.
However, a crucial problem emerges: this data is often fragmented and scattered across heterogeneous systems. This is precisely where Apache Kafka comes into play, and its first fundamental role is transport: Kafka is designed to move data from where it resides to where it can be used. Once Kafka is introduced, real-time processing can be fully exploited, offering a significant competitive advantage thanks to the ability to react quickly to new information.
The Operational Challenge of Large-Scale Retail
For a large company operating in the large-scale retail world, the data architecture must reconcile three fundamental and often conflicting operational requirements:
- Local resilience: Individual stores must maintain full operational capacity regardless of the status of network or Internet connectivity. If the network goes down, essential operations, such as issuing receipts and managing the point of sale, must not be interrupted.
- Global visibility: The company’s central headquarters (HQ) needs to access all data issued by the peripheral systems (the stores) as soon as possible. This real-time visibility is crucial for central decision-making and management processes.
- Bidirectional synchronization: It is essential to ensure that locally generated operations and data flow towards the central headquarters, and at the same time, that centrally generated decisions, configurations, or updates flow quickly and reliably towards the stores, while maintaining the autonomy of the parties.
System Architecture
To respond to this threefold challenge, the proposed architecture is based on a hub-and-spoke model that fully leverages the capabilities of Apache Kafka and the Confluent Platform.
- Central headquarters (Hub): Runs the Confluent Platform and is responsible for the central processing of all aggregated data.
- Local stores (Spokes): Each point of sale hosts a local Kafka cluster. These local clusters are the key to resilience, as they allow stores to continue operating in isolation, even without a connection to the central headquarters.
- Synchronization (Replicator): Bidirectional data synchronization between the central cluster and the local clusters is handled by Replicator. This is fundamental because it ensures that, after a network partition or loss of connectivity, data synchronization resumes as soon as possible. This configuration keeps stores operational even offline, satisfying the principle of local resilience (see the configuration sketch after this list).
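To make the replication step concrete, here is a minimal sketch of how a store-to-HQ Replicator connector could be configured. Everything in it (hostnames, topic names, the store identifier) is a hypothetical example rather than the actual configuration of this use case; the keys shown are standard settings of Confluent Replicator running as a Kafka Connect connector.

```java
import java.util.Map;

public class StoreToHqReplicatorConfig {

    // All hostnames, topic names, and the store id below are hypothetical.
    public static Map<String, String> config() {
        return Map.of(
                "connector.class", "io.confluent.connect.replicator.ReplicatorSourceConnector",
                // Source: the store's local Kafka cluster.
                "src.kafka.bootstrap.servers", "store-42-kafka:9092",
                // Destination: the central HQ cluster.
                "dest.kafka.bootstrap.servers", "hq-kafka:9092",
                // Replicate only the topics that must reach headquarters.
                "topic.whitelist", "receipts",
                // Rename on arrival so each store lands in its own HQ topic.
                "topic.rename.format", "store-42.${topic}",
                "key.converter", "io.confluent.connect.replicator.util.ByteArrayConverter",
                "value.converter", "io.confluent.connect.replicator.util.ByteArrayConverter");
    }
}
```

In practice, a configuration like this would be submitted to the Kafka Connect REST API; a mirror-image connector on the HQ side would cover the outbound direction, such as the promotion flow described later.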
Data Flows
Within the large-scale retail sector, there are numerous critical data flows, whose correct management and optimization can lead to notable advantages:
Inbound Data Flow: From the Store to the Central Headquarters
When a customer finalizes their purchase, a receipt containing a wealth of information is generated. The cash registers act as Kafka producers: they publish this data directly to the store's local Kafka cluster. The Replicator then forwards the data from the local cluster to the central headquarters.
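The sketch below illustrates what the producer side might look like on a cash register. The topic name, record key, and payload are hypothetical placeholders, not the real schema of this project.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReceiptPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The register talks only to the store's local cluster,
        // so it keeps working even when the link to HQ is down.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        String receiptJson = "{\"store\":\"42\",\"register\":\"3\",\"total\":27.50}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by store + register so receipts from one till stay ordered.
            producer.send(new ProducerRecord<>("receipts", "42-3", receiptJson));
            producer.flush();
        }
    }
}
```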
Once the central headquarters receives the receipt data, a series of processes is triggered, starting with both syntactic and semantic validation of the information. After validation, the data is written to another central topic, which becomes the fundamental source and trigger for a multitude of processes within the HQ. This consolidated topic acts as a data contract for the receipt, and any component in the system can subscribe to it to implement its own logic.
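As an illustration of this consume-validate-produce step, the following sketch reads the replicated raw topic, applies a placeholder validation, and republishes valid receipts to the consolidated topic. Topic names, the consumer group, and the validation rule are assumptions made purely for the example.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReceiptValidator {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "hq-kafka:9092");
        consumerProps.put("group.id", "receipt-validator");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "hq-kafka:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            // Raw receipts replicated from the store land here (hypothetical topic name).
            consumer.subscribe(List.of("store-42.receipts"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    if (isValid(record.value())) {
                        // The validated topic is the "data contract" downstream processes subscribe to.
                        producer.send(new ProducerRecord<>("receipts-validated", record.key(), record.value()));
                    }
                }
            }
        }
    }

    // Placeholder for the syntactic and semantic checks described above.
    private static boolean isValid(String receiptJson) {
        return receiptJson != null && !receiptJson.isBlank();
    }
}
```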
Once centralized, the data can be leveraged in various ways, including:
- Invoicing and merchandise procurement
- Optimized supply chain
- Analysis of products and future consumption
- Analysis and elaboration of predictive models
Outbound Data Flow: From the Central Headquarters to the Store
The architecture is not unidirectional: it also allows communication from the central headquarters to the stores. A fundamental example is the promotion flow. Promotions are generated by the central headquarters: this complex process aggregates various data, including data coming from Kafka, from relational tables on products and brands, and from information on local promotions.
Once the promotion information is present in the local Kafka cluster, the cash registers connect to the cluster and consume this data to apply the correct rules at the time of purchase. This flow potentially enables near real-time communication between the central headquarters and the cash registers. However, the technological limits of the local hardware must be taken into account. For this reason, promotion updates are typically scheduled around business hours, with releases taking place overnight. At night, the cash registers are free from sales activity and have the time needed to process and update the rules.
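On the consuming side, a register could pick up the replicated promotion rules with a plain Kafka consumer, as in the sketch below. The topic name, consumer group, and the drain-and-stop behavior of the nightly window are illustrative assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PromotionRulesUpdater {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The register reads promotions from the store's local cluster,
        // where Replicator has already copied them from HQ.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "register-3-promotions");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("promotions"));
            // Run during the nightly window: drain whatever is pending, then stop.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<String, String> record : records) {
                applyPromotionRule(record.key(), record.value());
            }
        }
    }

    // Placeholder: update the register's local pricing rules.
    private static void applyPromotionRule(String promotionId, String ruleJson) {
        System.out.printf("Updated promotion %s: %s%n", promotionId, ruleJson);
    }
}
```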
Advantages Beyond Technology: Agility, Synergy, and Organizational Vision
The introduction of a data streaming architecture based on Kafka does not only offer purely technical benefits (transport, speed, etc.); it also brings deep organizational and business advantages. Attention often focuses on Kafka's technical capabilities within the software architecture; however, the true value lies in the organizational benefits it enables.
Firstly, the architecture is highly agile and scalable. As the company grows, adding new points of sale is relatively simple: it is enough to replicate the architecture, installing a new local Kafka cluster and Replicator. Every new store, with its cash registers connected to the local cluster, is seamlessly integrated into the central system. Even within a single point of sale, the local Kafka cluster allows scaling by introducing new consumers or applications that work on local data, provided the cluster is sized appropriately.
Furthermore, the architecture guarantees excellent data visibility in near real-time. This timeliness is valuable for analytics and for procurement calculation logic.
Finally, a less technical aspect, but one that is highly appreciated by Software Engineers, is that Kafka acts as an enabler. Once introduced into the system, it begins to surface business use cases that had not been contemplated before. The immediate and structured availability of data makes it easier to discover, together with product and business owners, new opportunities and functionalities that can be implemented.
Future Evolutions
An architecture of this type, based on data streaming, is not static; on the contrary, it can be extended to deliver new functionality built on the data collected:
- Customer loyalty through AI: A crucial area of development is customer loyalty. The goal is to reason on customer behavior and spending habits, offering ad hoc promotions and targeted logic that encourage the customer to return and purchase. These mechanisms pair naturally with Artificial Intelligence applied to customer data, in order to extract value and find insights for the business.
- Migration from Replicator to Flink: A significant technical evolution currently being studied is the migration of some synchronization functionalities from Replicator to Flink. Replicator is an effective tool that performs a precise job: copying data from one cluster to another. Flink, however, opens up different possibilities, as it is a stream processing tool.
- Synchronization improvement (Filtering): Flink also allows processing the data on Kafka before it leaves the store. For example, synchronization can be improved by sending only the essential subset of data to the central headquarters, avoiding overloading the central infrastructure with micro-data that does not need to be visualized and that can drive up costs (see the sketch after this list).
- Edge Computing: Flink’s ability to perform stream processing even on the Edge side (i.e., on the individual store) enables the concept of Edge Computing. This makes it possible to directly provide information to the store and enable a series of local logic even in case of prolonged lack of connection with the central headquarters.
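The following sketch shows how such a filtered store-to-HQ flow could look with Flink's Kafka source and sink. The topic names, hostnames, and the filtering predicate are hypothetical and stand in for whatever "essential data" criterion the business defines.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FilteredStoreSync {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read the full receipt stream from the store's local cluster.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("store-42-kafka:9092")
                .setTopics("receipts")
                .setGroupId("hq-sync")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Write only the filtered subset to the HQ cluster.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("hq-kafka:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("store-42.receipts-essential")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "store-receipts")
                // Forward only what headquarters actually needs, dropping local micro-events.
                .filter(json -> !json.contains("\"type\":\"local-only\""))
                .sinkTo(sink);

        env.execute("store-to-hq-filtered-sync");
    }
}
```

Because the same Flink job runs on the store side, this is also a first step towards the Edge Computing scenario described above: the local processing keeps working even when the link to headquarters is down.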
Conclusions
The implementation of an event-driven architecture based on the Confluent Platform and Apache Kafka solves some of the main historical challenges of the large-scale retail sector – including balancing local resilience with global visibility – and enables powerful bidirectional data synchronization.
In this use case, we have seen how this architecture supports critical business processes, from immediate invoicing to merchandise procurement, ensuring that even in case of isolation (if the data does not reach the central headquarters), the store maintains full operational capacity. The environment is agile and ready to scale, and the data streaming approach offers a unique synergy, where a single event (the receipt) feeds multiple business processes.
Bitrock, as a consulting company specializing in high-end technology and a leader in areas such as DevOps, Kafka, Confluent, and event-driven architectures, supports its customers in the design and development of event-driven and AI-based data streaming solutions that overcome complex operational challenges.
Contact our team of professionals for a dedicated consultation.
Authors: Daniele Bonelli and Simone Esposito, Team Lead and Software Engineer @ Bitrock