Processing Kafka messages

Apache Kafka is an open source project that provides a messaging service based on a distributed commit log. It lets you publish data to, and subscribe to, streams of data records (messages). IBM® Integration Bus provides built-in input and output nodes for processing Kafka messages.

The Kafka messaging protocol is a TCP-based protocol that provides a fast, scalable, and durable method for exchanging data between applications. Messaging providers are typically used in IT architectures to decouple the processing of messages from the applications that produce them. IBM Integration Bus already supports many kinds of messaging, including IBM MQ, JMS messaging providers, and the MQTT messaging protocol. The Kafka nodes extend this support so that IBM Integration Bus can interact with Kafka-literate applications.

For information about the supported versions of Kafka, see IBM Integration Bus system requirements. For more information about Kafka version compatibility, see the Apache Kafka documentation.

Kafka is a popular choice for cloud-based architectures, where the number of connected clients can change frequently without affecting the scaling characteristics. Common use cases for Kafka as a messaging transport include:

General messaging
You can decouple producer and consumer applications from each other. Data flowing between multiple processing stages in a distributed application could use Kafka as a means of connecting the steps. The built-in features that Kafka provides for partitioning, replication, and fault tolerance can make it a good choice for this type of large-scale messaging.
Web site activity tracking
When Kafka was first developed, it was used to track page views, searches, and other actions taken on a web site. This activity was published to a set of central topics, one for each activity type. Subscribers could then use the data for real-time processing and monitoring, and for persisting to data warehouse systems.
Logging and metrics
You can use Kafka to aggregate operational data from multiple sources.

Apache Kafka maintains feeds of messages in categories called topics. IBM Integration Bus provides two built-in nodes for processing Kafka messages, which use the Apache Kafka Java™ client:
  • KafkaConsumer node, which subscribes to a Kafka topic and propagates the feed of published messages to nodes connected downstream in the flow
  • KafkaProducer node, which publishes messages to a Kafka topic.

IBM Integration Bus acts as a Kafka client, and communicates with your Kafka implementation by sending messages over the network to the Kafka cluster. A Kafka cluster is made up of one or more servers, known as Kafka brokers. Kafka brokers replicate data between themselves so that the cluster can tolerate the failure of individual servers without losing any messages that have been committed to the Kafka log. Each instance of a Kafka node is configured with the name of a topic. For each topic, the Kafka brokers maintain a partitioned commit log, as shown in the following diagram:
Diagram showing a partitioned commit log maintained by Kafka.

Each partition is an ordered sequence of messages, which cannot be altered after they have been created. Each message in a partition is assigned a sequential ID number, called the offset, which uniquely identifies that message within the partition. All messages are retained for a configurable period of time, regardless of whether they have been consumed by an application, after which they are discarded to free up space. This approach is very different from that of traditional message queuing products such as IBM MQ, where a message is typically removed from the queue when it is consumed.
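
The behavior described above can be illustrated with a small in-memory model. This is only a sketch for explanation, not Kafka's implementation: the names PartitionLog, Record, append, read, and discard_expired are hypothetical, and real brokers apply retention to on-disk log segments rather than to individual records.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a record cannot be altered once created
class Record:
    offset: int
    timestamp: float
    value: str


class PartitionLog:
    """Toy model of one partition: append-only and addressed by offset."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._records: List[Record] = []
        self._next_offset = 0

    def append(self, value: str, now: Optional[float] = None) -> int:
        """Append a message and return its sequential offset."""
        now = time.time() if now is None else now
        record = Record(self._next_offset, now, value)
        self._records.append(record)
        self._next_offset += 1
        return record.offset

    def read(self, offset: int) -> Optional[Record]:
        """Return the record at an offset, or None if it is not in the log.

        Reading never removes a record, so any number of consumers can
        read the same offset.
        """
        for record in self._records:
            if record.offset == offset:
                return record
        return None

    def discard_expired(self, now: Optional[float] = None) -> None:
        """Drop records older than the retention period, consumed or not."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r.timestamp >= cutoff]
```

In this model, a record that has been read is still available to other readers, and it disappears only when discard_expired finds it older than the retention period. Offsets are never reused, so later messages keep increasing offsets even after older ones are discarded.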

Message ordering is preserved only within a partition, not across all partitions in a topic. Therefore, if message order is important, ensure that you either use a single partition per topic, or associate a message key with each message published. A hash of the message key is used to select the partition to which the message is sent, so all messages published with the same key are stored on the same partition.
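
The key-based partition selection can be sketched as follows. This is a simplified illustration under stated assumptions: the function names select_partition and route are hypothetical, and CRC32 is used only as a stand-in hash (the Apache Kafka Java client's default partitioner uses a murmur2 hash of the serialized key), so the partition numbers computed here will not match those chosen by a real client.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def select_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Stand-in hash; the same key always produces the same index.
    return zlib.crc32(key) % num_partitions


def route(messages: Iterable[Tuple[bytes, str]],
          num_partitions: int) -> Dict[int, List[str]]:
    """Group (key, value) messages by partition, keeping publish order."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for key, value in messages:
        partitions[select_partition(key, num_partitions)].append(value)
    return dict(partitions)
```

For example, if every message about one customer is published with the customer ID as its key, those messages all land on the same partition and keep their relative order, although they might be interleaved with messages for other keys that hash to the same partition.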

For more information about Kafka messaging, see the Apache Kafka documentation.