Introduction

Developed by LinkedIn and open sourced in 2010.

Apache Kafka is a publish/subscribe messaging system designed to solve this problem. It is often described as a “distributed commit log” or, more recently, as a “distributing streaming platform.”

A filesystem or database commit log is designed to provide a durable record of all transactions so that they can be replayed to build the state of a system consistently.

Messages

The unit of data within Kafka is called a message. In the DB world, it's similar to a Row.

The message is simply an array of bytes and does not have a specific format or meaning.

Topic

The topic is a category or feed name to which messages are published, like a TV Channel / Radio station.

Partitions

Topics are additionally broken down into several partitions.

Messages are written to it in an append-only fashion.

Note that as a topic typically has multiple partitions, message time-ordering is not guaranteed across the entire topic, just within a single partition.

Partitions provide redundancy and scalability. The partition can be hosted on a different server.

Src: Oreilly Kafka Book

Keys

A message can have an optional bit of metadata, a key. As with the message, the key is also a byte array with no specific meaning to Kafka. Keys are used when messages are to be written to partitions in a more controlled manner.

Remember, MongoDB data has _id field. Similarly, keys can be used to relay messages.

TV Shows you have Season/Episode

Offset

Offset values in Kafka are long integers uniquely identifying each message within a partition. They start at 0 for the first message in a partition and increment by 1 for each subsequent message. For example, if a partition has five messages, the offsets would be 0, 1, 2, 3, and 4.

Batches

A batch is a collection of messages on the same topic and partition.

Producers

Producers create new messages. In general, a message will be produced on a specific topic.

In general, a message will be produced to a specific topic. By default, the producer does not care what partition a particular message is written to and will evenly balance messages over all partitions of a topic.

In some cases, the producer will direct messages to specific partitions. This is typically done using the message key and a partitioner to generate a hash of the key and map it to a particular partition.

Consumers

Consumers work as part of a consumer group, one or more consumers working together to consume a topic. The group ensures that each partition is only consumed by one member.

Src: Oreilly Kafka Book

Broker

A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk.

Kafka brokers are designed to operate as part of a cluster.

One broker will also function as the cluster controller within a cluster of brokers.

A partition may be assigned to multiple brokers, which will result in Replication.

Src: Oreilly Kafka Book

At least Once Delivery

  • Automatic offset commits provide "at least once" delivery semantics because if the consumer fails after processing a message but before the next auto-commit, it may re-process some messages after restarting.

  • This behavior ensures no message loss but can lead to duplicate processing.

At most Delivery

  • The message is sent a maximum of only one time.

Last updated