Introduction

Developed by LinkedIn and open sourced in 2010.

Apache Kafka is a publish/subscribe messaging system designed to solve this problem. It is often described as a “distributed commit log” or, more recently, as a “distributing streaming platform.”

A filesystem or database commit log is designed to provide a durable record of all transactions so that they can be replayed to build the state of a system consistently.

Messages

The unit of data within Kafka is called a message. In the DB world, it's similar to a Row.

The message is simply an array of bytes and does not have a specific format or meaning.

Topic

The topic is a category or feed name to which messages are published, like a TV Channel / Radio station.

Partitions

Topics are additionally broken down into several partitions.

Messages are written to it in an append-only fashion.

Note that as a topic typically has multiple partitions, message time-ordering is not guaranteed across the entire topic, just within a single partition.

Partitions provide redundancy and scalability. The partition can be hosted on a different server.

Keys

A message can have an optional bit of metadata, a key. As with the message, the key is also a byte array with no specific meaning to Kafka. Keys are used when messages are to be written to partitions in a more controlled manner.

Remember, MongoDB data has _id field. Similarly, keys can be used to relay messages.

TV Shows you have Season/Episode

Offset

Offset values in Kafka are long integers uniquely identifying each message within a partition. They start at 0 for the first message in a partition and increment by 1 for each subsequent message. For example, if a partition has five messages, the offsets would be 0, 1, 2, 3, and 4.

Batches

A batch is a collection of messages on the same topic and partition.

Producers

Producers create new messages. In general, a message will be produced on a specific topic.

In general, a message will be produced to a specific topic. By default, the producer does not care what partition a particular message is written to and will evenly balance messages over all partitions of a topic.

In some cases, the producer will direct messages to specific partitions. This is typically done using the message key and a partitioner to generate a hash of the key and map it to a particular partition.

Consumers

Consumers work as part of a consumer group, one or more consumers working together to consume a topic. The group ensures that each partition is only consumed by one member.

Broker

A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk.

Kafka brokers are designed to operate as part of a cluster.

One broker will also function as the cluster controller within a cluster of brokers.

A partition may be assigned to multiple brokers, which will result in Replication.

At least Once Delivery

Automatic offset commits provide "at least once" delivery semantics because if the consumer fails after processing a message but before the next auto-commit, it may re-process some messages after restarting.
This behavior ensures no message loss but can lead to duplicate processing.

At most Delivery

The message is sent a maximum of only one time.

PreviousKafka NextKafka Use Cases

Last updated 1 year ago