Polymorphic Messages in Kafka Streams


Things usually start simple…

You are designing a Kafka Streams application that must read commands and produce the corresponding business events.
The Avro models you’re expecting to read look like this:
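For instance, a command protocol for a single, hypothetical asset type (names and fields are purely illustrative) could be defined as:

```avdl
@namespace("com.example.assets")
protocol AssetCommands {

  record AssetCommand {
    string commandId;
    string assetId;
    string description;
  }
}
```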

While the output messages you’re required to produce look like this:
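A matching event schema, again purely illustrative, might be:

```avdl
@namespace("com.example.assets")
protocol AssetEvents {

  record AssetEvent {
    string eventId;
    string assetId;
    string description;
  }
}
```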

You know you can leverage the sbt-avrohugger plugin to generate the corresponding Scala class for each Avro schema, so that you can focus only on designing the business logic.

Since the messages themselves are pretty straightforward, you decide to create a monomorphic function to map properties between each command and the corresponding event.
The resulting topology ends up looking like this:
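A minimal sketch of that topology, assuming the hypothetical AssetCommand/AssetEvent classes generated from the schemas above and the Kafka Streams Scala DSL, could be:

```scala
import org.apache.kafka.common.serialization.Serde
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder

object AssetTopology {

  // Monomorphic mapping: one command type in, one event type out (field names are illustrative).
  def toEvent(command: AssetCommand): AssetEvent =
    AssetEvent(
      eventId = command.commandId,
      assetId = command.assetId,
      description = command.description
    )

  def buildTopology(commandsTopic: String, eventsTopic: String)(
      implicit stringSerde: Serde[String],
      commandSerde: Serde[AssetCommand],
      eventSerde: Serde[AssetEvent]
  ): Topology = {
    val builder = new StreamsBuilder()
    builder
      .stream[String, AssetCommand](commandsTopic) // Consumed derived from the implicit Serdes
      .mapValues(toEvent _)
      .to(eventsTopic)                             // Produced derived from the implicit Serdes
    builder.build()
  }
}
```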


…But then the domain widens

Today new functional requirements have emerged: your application must now handle multiple types of assets, each with its own unique properties.
You are pondering how to implement this requirement and make your application more resilient to further changes in behavior.


Multiple streams

You could split both commands and events into multiple topics, one per asset type, so that the corresponding Avro schema stays consistent and its compatibility is ensured.
This solution, however, would have you replicate pretty much the same topology multiple times, so it’s not recommended unless the business logic has to be customized for each asset type.


“All-and-none” messages

Avro doesn’t support inheritance between records, so any OOP strategy to have assets inherit properties from a common ancestor is unfortunately not viable.
You could, however, create a “Frankenstein” object with all the properties of each and every asset and fill in only those required for each type of asset.
This is definitely the worst solution from an evolutionary and maintainability point of view.


Union types

Luckily for you, Avro offers an interesting feature named union types: you could express the diversity in each asset’s properties via a union of multiple payloads, while still relying on a single message as a wrapper.
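For example, the command protocol could be reworked so that each asset type gets its own payload record, combined in a union (again, names and fields are purely illustrative):

```avdl
@namespace("com.example.assets")
protocol AssetCommands {

  record BuildingCommandPayload {
    string address;
    int floors;
  }

  record VehicleCommandPayload {
    string plateNumber;
    long mileage;
  }

  record AssetCommand {
    string commandId;
    string assetId;
    union { BuildingCommandPayload, VehicleCommandPayload } payload;
  }
}
```

The event protocol would be widened the same way, with one event payload record per asset type.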


Enter polymorphic streams

Objects with no shape

To cope with this advanced polymorphism, you leverage the shapeless library, which introduces the Coproduct type, the perfect companion for the Avro union type.
First of all, you update the custom types mapping of sbt-avrohugger, so that it generates an additional sealed trait for each Avro protocol containing multiple records:
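A sketch of the relevant build.sbt setting, assuming a recent sbt-avrohugger version and its Standard (plain case class) format, might be:

```scala
// build.sbt -- illustrative; key and type names may differ across sbt-avrohugger versions
avroScalaCustomTypes in Compile :=
  avrohugger.format.Standard.defaultTypes.copy(
    protocol = avrohugger.types.ScalaADT // one sealed trait per protocol with multiple records
  )
```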

The generated command class ends up looking like this:
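Roughly, and assuming the union is rendered as a shapeless Coproduct (avrohugger can also map unions to Option/Either combinations depending on its union type settings), the generated code would resemble:

```scala
package com.example.assets

import shapeless.{:+:, CNil}

sealed trait AssetCommands extends Product with Serializable

final case class BuildingCommandPayload(address: String, floors: Int) extends AssetCommands

final case class VehicleCommandPayload(plateNumber: String, mileage: Long) extends AssetCommands

final case class AssetCommand(
  commandId: String,
  assetId: String,
  payload: BuildingCommandPayload :+: VehicleCommandPayload :+: CNil
) extends AssetCommands
```

The event side is analogous: a hypothetical AssetEvent would wrap a BuildingEventPayload :+: VehicleEventPayload :+: CNil payload mirroring the command payloads.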


Updating the business logic

Thanks to shapeless’ Poly1 trait you then write the updated business logic in a single class:
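A sketch of such a mapper, assuming the hypothetical payload classes above and their event counterparts, could be:

```scala
import shapeless.Poly1

// One case per command payload type; each returns the corresponding event payload.
object CommandToEvent extends Poly1 {

  implicit def atBuilding = at[BuildingCommandPayload] { cmd =>
    BuildingEventPayload(cmd.address, cmd.floors)
  }

  implicit def atVehicle = at[VehicleCommandPayload] { cmd =>
    VehicleEventPayload(cmd.plateNumber, cmd.mileage)
  }
}
```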

Changes to the topology are minimal, as you’d expect:
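The builder sketched earlier barely changes: instead of a hard-coded conversion, the value mapper applies the Poly1 to the payload Coproduct (again assuming the illustrative classes above):

```scala
import org.apache.kafka.common.serialization.Serde
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import shapeless.syntax.coproduct._ // brings .map(Poly) into scope for Coproduct values

object AssetTopology {

  def buildTopology(commandsTopic: String, eventsTopic: String)(
      implicit stringSerde: Serde[String],
      commandSerde: Serde[AssetCommand],
      eventSerde: Serde[AssetEvent]
  ): Topology = {
    val builder = new StreamsBuilder()
    builder
      .stream[String, AssetCommand](commandsTopic)
      .mapValues { command =>
        AssetEvent(
          eventId = command.commandId, // illustrative: derive the event id from the command id
          assetId = command.assetId,
          payload = command.payload.map(CommandToEvent) // Coproduct in, Coproduct out
        )
      }
      .to(eventsTopic)
    builder.build()
  }
}
```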


A special kind of Serde

Now for the final piece of the puzzle: Serdes. Enter the avro4s library, which takes Avro GenericRecords above and beyond.
You create a type class that extends a plain old Serde with a brand-new method:
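One way to sketch it, assuming Confluent's GenericAvroSerde, avro4s' RecordFormat (whose exact signatures vary slightly across avro4s versions), Kafka 2.x clients and illustrative helper names, is an extension method that re-types a Serde[GenericRecord] into a Serde for a specific case class:

```scala
import java.util.Collections

import com.sksamuel.avro4s.RecordFormat
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde
import org.apache.avro.generic.GenericRecord
import org.apache.kafka.common.serialization.{Deserializer, Serde, Serdes, Serializer}

object AvroSerdes {

  // Adds a method to any Serde[GenericRecord] that converts to/from a case class T
  // through avro4s' RecordFormat.
  implicit class RichGenericSerde(underlying: Serde[GenericRecord]) {

    def forCaseClass[T](format: RecordFormat[T]): Serde[T] = {
      val serializer = new Serializer[T] {
        override def serialize(topic: String, data: T): Array[Byte] =
          underlying.serializer().serialize(topic, format.to(data))
      }
      val deserializer = new Deserializer[T] {
        override def deserialize(topic: String, bytes: Array[Byte]): T =
          format.from(underlying.deserializer().deserialize(topic, bytes))
      }
      Serdes.serdeFrom(serializer, deserializer)
    }
  }

  // Convenience: a Schema Registry-aware Serde for a specific case class.
  def serdeFor[T](format: RecordFormat[T], schemaRegistryUrl: String, isKey: Boolean = false): Serde[T] = {
    val generic = new GenericAvroSerde()
    generic.configure(Collections.singletonMap("schema.registry.url", schemaRegistryUrl), isKey)
    generic.forCaseClass(format)
  }
}
```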

Now each generated class has its own Serde, tailored to the corresponding Avro schema.


Putting everything together

Finally, the main program where you combine all ingredients:
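A minimal wiring of the pieces sketched so far (topic names, application id and URLs are placeholders, and it assumes avro4s can derive a RecordFormat for the generated classes, including their shapeless Coproduct fields) might look like:

```scala
import java.util.Properties

import com.sksamuel.avro4s.RecordFormat
import org.apache.kafka.common.serialization.{Serde, Serdes}
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

object PolymorphicStreamsApp extends App {

  val schemaRegistryUrl = "http://localhost:8081"

  // One Serde per generated class, tailored to its Avro schema via avro4s.
  implicit val stringSerde: Serde[String] = Serdes.String()
  implicit val commandSerde: Serde[AssetCommand] =
    AvroSerdes.serdeFor(RecordFormat[AssetCommand], schemaRegistryUrl)
  implicit val eventSerde: Serde[AssetEvent] =
    AvroSerdes.serdeFor(RecordFormat[AssetEvent], schemaRegistryUrl)

  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "polymorphic-assets-app")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  val streams = new KafkaStreams(AssetTopology.buildTopology("asset-commands", "asset-events"), props)
  streams.start()

  sys.addShutdownHook(streams.close())
}
```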



Conclusions

When multiple use cases share (almost) the same business logic, you can build a stream processing application with ad-hoc polymorphism, reducing code duplication to a minimum while making your application even more future-proof.

Turning Data at REST into Data in Motion with Kafka Streams


From Confluent Blog

Another great achievement for our Team: we are now featured on the official Confluent blog with one of our R&D projects based on Event Stream Processing.

Event stream processing continues to grow among business cases that have relied primarily on batch data processing. In recent years, it has proven especially prominent when the decision-making process must take place within milliseconds (e.g., in cybersecurity and artificial intelligence), when the business value is generated by computations on event-based data sources (e.g., in Industry 4.0 and home automation applications), and, last but not least, when the transformation, aggregation, or transfer of data residing in heterogeneous sources involves serious limitations (e.g., in legacy systems and supply chain integration).

Our R&D team decided to start an internal POC based on Kafka Streams and the Confluent Platform (primarily Confluent Schema Registry and Kafka Connect) to demonstrate the effectiveness of these components in four specific areas:

1. Data refinement: filtering the raw data in order to serve it to targeted consumers, scaling the applications through I/O savings

2. System resiliency: using the Apache Kafka® ecosystem, including monitoring and streaming libraries, in order to deliver a resilient system

3. Data update: getting the most up-to-date data from sources using Kafka

4. Machine resource optimization: decoupling data processing pipelines and exploiting parallel data processing and non-blocking I/O in order to maximize hardware capacity

These four areas can impact data ingestion and system efficiency by improving system performance and limiting operational risks as much as possible, which increases profit margin opportunities by providing more flexible and resilient systems.

At Bitrock, we tackle software complexity through domain-driven design, borrowing the concept of bounded contexts and ensuring a modular architecture through loose coupling. Whenever necessary, we commit to a microservice architecture.

Due to their immutable nature, events are a great fit as our unique source of truth. They are self-contained units of business facts and also represent a perfect implementation of a contract amongst components. The Team chose the Confluent Platform for its ability to implement an asynchronous microservice architecture that can evolve over time, backed by a persistent log of immutable events ready to be independently consumed by clients.

This inspired our Team to create a dashboard that uses the practices above to clearly present processed data to an end user: specifically, air traffic data, which provides an open, near-real-time stream of ever-updating information.

If you want to read the full article and discover all project details, architecture, findings and roadmap, click here: https://bit.ly/3c3hQfP.
