RabbitMQ VS Kafka

[Reference Materials]

Video: [10 Minutes Tech Talk] RabbitMQ vs Kafka
Blog:

Introduction

In modern server application architectures, a messaging system is all but unavoidable.

The purposes vary: decoupling logic to improve response times, making sure work is executed reliably, or communicating with other application servers.
Whatever the purpose, many development teams end up agonizing over which technology to use.

I too started with the question, "Why does our team use RabbitMQ instead of Kafka?"
In that process, I found my own answers that I wanted to share.

RabbitMQ (Smart Broker & Dumb Consumer)

RabbitMQ is a message broker that implements AMQP (Advanced Message Queuing Protocol) and also supports MQTT through a plugin.
The core idea is that the broker (RabbitMQ) handles most of the routing, message storage, and delivery logic.

AMQP?
Advanced Message Queuing Protocol: A standard protocol for message-oriented middleware.
It ensures reliable messaging between different systems and applications.

Interoperability: Enables communication between systems developed on different languages and platforms.

Reliability: Supports delivery without message loss through message acknowledgments (ACK), persistence, and transactions.

Flexible Routing: Routing through Exchange - producers send to Exchange, and Exchange distributes messages to the appropriate queue according to set rules.

Producer: Creates messages and sends them to the Exchange.
Exchange: Applies routing rules to decide which Queue(s) should receive each message sent by the Producer (it does not store messages itself).
- Direct Exchange: Sends messages to the Queues whose binding key exactly matches the routing key (unicast).
- Topic Exchange: Sends messages to the Queue matching a certain pattern in the routing key (multicast).
- Fanout Exchange: Sends messages to all queues bound to it (broadcast).
- Headers Exchange: Routes based on message header attributes.
Queue: A buffer that stores messages until the consumer retrieves them.
Binding: A rule connecting Exchange and Queue. ("This Exchange sends messages to this Queue according to the routing rules").
Consumer: Retrieves messages from the Queue and processes them.

Core Flow: Producer → Exchange → (Binding Rule) → Queue → Consumer
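
To make the flow above concrete, here is a minimal in-memory sketch of direct and fanout routing. It is a toy model, not the RabbitMQ client API: the `ToyBroker` class, the exchange and queue names, and the routing keys are all made up for illustration.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for a broker; this is not the RabbitMQ API."""

    def __init__(self):
        self.queues = defaultdict(deque)    # queue name -> buffered messages
        self.bindings = defaultdict(list)   # exchange name -> [(binding key, queue)]
        self.exchange_types = {}            # exchange name -> "direct" or "fanout"

    def declare_exchange(self, name, kind):
        self.exchange_types[name] = kind

    def bind(self, exchange, queue, binding_key=""):
        self.bindings[exchange].append((binding_key, queue))

    def publish(self, exchange, routing_key, body):
        kind = self.exchange_types[exchange]
        for binding_key, queue in self.bindings[exchange]:
            # fanout ignores the key; direct requires an exact match
            if kind == "fanout" or binding_key == routing_key:
                self.queues[queue].append(body)

    def consume(self, queue):
        q = self.queues[queue]
        return q.popleft() if q else None


broker = ToyBroker()

# Direct: only the queue bound with the matching key receives the message.
broker.declare_exchange("orders", "direct")
broker.bind("orders", "billing", binding_key="order.paid")
broker.bind("orders", "shipping", binding_key="order.shipped")
broker.publish("orders", "order.paid", "order #1 paid")

# Fanout: every bound queue receives its own copy.
broker.declare_exchange("alerts", "fanout")
broker.bind("alerts", "billing")
broker.bind("alerts", "shipping")
broker.publish("alerts", "", "maintenance at 02:00")
```

Note that the producer only names an exchange and a routing key. Which queues end up with the message is decided entirely by the bindings held on the broker side.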

Smart Broker

Complete control over message flow: The broker decides where to send messages and routes them according to Exchange & Binding rules.
Tracks consumer state: Continuously tracks which consumers are connected to each queue and whether each delivered message has been acknowledged.
Sends messages to consumers: Pushes messages to consumers; consumers adjust the amount they can handle with prefetch.
Offers various features: Provides functionalities such as Dead Letter Exchanges, message TTL, priority queues.
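
The push-with-prefetch behavior can be sketched with a small simulation. This is only a toy model under my own assumptions (a single consumer, a hypothetical `ToyQueue` class, integer delivery tags); it is not how the RabbitMQ server is implemented.

```python
from collections import deque


class ToyQueue:
    """Pushes messages to one consumer, never exceeding `prefetch` unacked."""

    def __init__(self, prefetch):
        self.prefetch = prefetch
        self.ready = deque()    # messages still waiting in the broker
        self.unacked = {}       # delivery tag -> message in flight at the consumer
        self.next_tag = 1

    def publish(self, message):
        self.ready.append(message)

    def push(self):
        """Deliver as many messages as the prefetch window allows."""
        delivered = []
        while self.ready and len(self.unacked) < self.prefetch:
            tag, self.next_tag = self.next_tag, self.next_tag + 1
            self.unacked[tag] = self.ready.popleft()
            delivered.append((tag, self.unacked[tag]))
        return delivered

    def ack(self, tag):
        # The broker forgets a message only after the consumer confirms it.
        del self.unacked[tag]


queue = ToyQueue(prefetch=2)
for i in range(5):
    queue.publish(f"task-{i}")

first_batch = queue.push()      # the window is full after two deliveries
queue.ack(first_batch[0][0])    # finishing one task frees one slot
second_batch = queue.push()     # so exactly one more message is pushed
```

The broker is the one keeping the per-message bookkeeping here (`ready` versus `unacked`), which is the "smart" part. The consumer only says how much it can take and when it is done.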

Dumb Consumer

Focus on simple roles: The consumer connects to a designated queue to receive and process messages and only sends completion signals.
No need to know about routing: The consumer doesn't need to consider how the message arrived in its queue.

=> The post office (broker) sorts every letter by address, while the mail carrier (consumer) only delivers the letters assigned to their zone.

Kafka (Dumb Broker & Smart Consumer)

Kafka is a distributed streaming platform that treats messages as a continuous stream of records in an immutable log.
The broker focuses on storing and serving data, while the consumer takes on the more complex logic of deciding what to read and tracking its own progress.

Broker: Kafka server instance that stores and manages messages and forms a Cluster when gathered with other brokers.
- Receives messages from Producers, assigns offsets, and stores messages on disk.
- Responds to partition read requests from consumers and sends messages recorded on disk.
- One broker serves as the cluster's controller, assigning partitions to brokers and monitoring that they operate properly.
Cluster: Composed of multiple brokers, providing data replication, fault tolerance, and high availability.
- Adding brokers to the cluster increases its capacity for receiving and delivering messages.
- Brokers can be added online without interrupting the running system (easy to scale from a small setup to one that handles large traffic).

Topic: A category or feed name to separate messages. Similar in role to Exchange in RabbitMQ, but directly stores messages.
- Similar to a DB table or a folder in a file system.
- A Topic is composed of multiple partitions.
Partition: An append-only log that allows for distributed storage of Topic data across multiple brokers and increases throughput.
- Order within a partition is guaranteed.
- No order guarantee between different partitions.
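
A small sketch of why per-partition ordering is usually enough in practice. It assumes the producer attaches a key to each record and that the partition is chosen by hashing that key. CRC32 is used here only as a deterministic stand-in hash, and `choose_partition` and `send` are hypothetical helpers, not Kafka client functions.

```python
import zlib

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]  # each list is one append-only log


def choose_partition(key):
    # CRC32 is a stand-in hash, picked so the sketch is deterministic.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def send(key, value):
    """Append a record and return (partition, offset)."""
    p = choose_partition(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-2", "logout"),
    ("user-1", "checkout"),
]
placements = [send(key, value) for key, value in events]

# All records for one key land in one partition, so their order is preserved.
user1_log = [v for k, v in partitions[choose_partition("user-1")] if k == "user-1"]
```

Events for the same key keep their relative order because they share a partition. Nothing is promised about how `user-1` events interleave with `user-2` events if those end up in different partitions.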

Producer: Creates messages (records) and sends them to a specific Topic.
Consumer: Retrieves and processes messages from the Topic.
- Subscribes to one or more topics and reads messages in the order they were created, keeping track of message locations via offset on a partition basis.
Offset: A unique serial number (ID) each message has within a partition. Consumers use offsets to track and control where they have read up to.
- Committed Offset: The offset the consumer has recorded to mark that it has processed "up to here."
- Current Offset: The offset marking how far the consumer has read so far, whether or not that work has been committed.
Consumer Group: A group of one or more consumers where each partition of a Topic is allocated to only one consumer within the group.
- Each consumer can read messages from different partitions of the topic they are responsible for.
- Adding consumers to the group increases consumption throughput.

Adding more consumers to a group than the topic has partitions brings no benefit: each partition is assigned to only one consumer, so the extra consumers sit idle.
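
The partition limit is easy to see with a simplified round-robin assignment. The `assign` function below is a hypothetical helper written for this post. Real clients use configurable assignment strategies, but the one-consumer-per-partition rule is the same.

```python
def assign(partitions, consumers):
    """Round-robin assignment: each partition gets exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]

two = assign(partitions, ["c1", "c2"])
four = assign(partitions, ["c1", "c2", "c3", "c4"])
six = assign(partitions, ["c1", "c2", "c3", "c4", "c5", "c6"])

# Consumers that received no partition have nothing to read.
idle = [c for c, owned in six.items() if not owned]
```

With four partitions, going from two to four consumers halves the load per consumer, while the fifth and sixth consumers receive nothing.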

Core Flow: Producer → Topic (Partition) → Consumer (Consumer Group)

Dumb Broker

Focuses on being high-performance log storage: Brokers quickly append messages to the end of topic partitions.
No message state tracking: Brokers do not track which consumer has read which message; they simply keep messages for the configured retention period.
No complex routing: Brokers store messages in the topic/partition designated by the producer without redistributing messages on their own.

Smart Consumer

Manages its own read position: Consumers record and manage how far they have read in each partition of the topic.
Actively fetches data (pull): Consumers request and fetch data from brokers themselves.
Responsible for partition assignment logic: The consumer group's client library handles decision logic on which partition each consumer will manage.
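
Here is a minimal sketch of what "managing the read position" means, and why the current and committed offsets are tracked separately. `ToyConsumer` and the in-memory `log` list are assumptions made for illustration; a real consumer stores its committed offsets durably rather than in a Python attribute.

```python
log = ["m0", "m1", "m2", "m3", "m4"]  # one partition, stored by the broker


class ToyConsumer:
    def __init__(self, committed=0):
        self.committed = committed   # durable: survives a restart
        self.position = committed    # in memory: next offset to read

    def poll(self, max_records=2):
        batch = log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        # Declare everything before `position` as processed.
        self.committed = self.position


consumer = ToyConsumer()
consumer.poll()              # reads m0, m1
consumer.commit()            # committed offset becomes 2
consumer.poll()              # reads m2, m3, then "crashes" before committing
saved = consumer.committed   # only the committed offset survives the crash

restarted = ToyConsumer(committed=saved)
replayed = restarted.poll()  # resumes from the committed offset
```

Because the restart resumes from the committed offset, m2 and m3 are delivered a second time. The broker did nothing special here; the consumer's own bookkeeping decides what gets read again.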

=> A large library (broker) only continuously shelves books (messages) in bookcases (partitions), while readers (consumers) visit the library, remember what they have read (offset), and take the next book to read.

The Gemini analogy is good...!

Common Misunderstandings

It is easy to assume that the two systems have similar structures and differ only in performance (or that RabbitMQ is a messaging system nobody uses anymore). Clearing up the following points is essential to understanding how the architectural philosophies of the two systems differ.

Fanout: A pattern where one message is delivered as identical copies to several independent consumers.

1. Kafka's 'Fanout' is not Simple Broadcast

One of the most common misconceptions is the binary view that Kafka is Pub/Sub and RabbitMQ is Work-Queue.
In fact, Kafka elegantly integrates these two models through the concept of Consumer Groups.

Inter-Group: Pub/Sub (Fanout/Broadcast):
Different consumer groups can still independently consume the entire message stream even if they subscribe to the same topic.
Example: Given a topic called order-events, the Inventory Management Service (Group A) and the Data Analysis Team (Group B) can each subscribe to it as a separate group.
In this case, both groups can receive and process all messages from order-events from start to finish independently.
Intra-Group: Work-Queue (Distributed Processing):
Within a single consumer group, the story changes. Consumers within the group divide and process the partitions of the topic. (A partition is assigned to only one consumer within the group).
If there are 4 partitions in a topic and 4 consumers in the group, each consumer will be responsible for one partition and process messages.
-> This distributes workloads and increases throughput, matching the 'work queue' model.

Core: Kafka supports both models through Consumer Groups: broadcast of the same data to different systems (across groups), and distributed task processing within a single system to maximize throughput (within a group).
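
Both behaviors can be shown in one toy simulation. The `ToyGroup` class, the member names, and the modulo-based ownership rule are assumptions for illustration only. The point is that each group keeps its own offsets, while the members of one group split the partitions between them.

```python
topic = {0: ["a0", "a1"], 1: ["b0", "b1"]}  # partition -> records


class ToyGroup:
    def __init__(self, name, members):
        self.name = name
        self.members = members
        self.offsets = {p: 0 for p in topic}  # every group tracks its own offsets

    def owner(self, partition):
        # Within a group, one partition maps to exactly one member.
        return self.members[partition % len(self.members)]

    def poll_all(self):
        """Return {member: records} for everything this group has not read yet."""
        received = {m: [] for m in self.members}
        for p, records in topic.items():
            received[self.owner(p)].extend(records[self.offsets[p]:])
            self.offsets[p] = len(records)
        return received


inventory = ToyGroup("inventory", ["inv-1", "inv-2"])  # Group A: two workers
analytics = ToyGroup("analytics", ["ana-1"])           # Group B: one worker

inventory_view = inventory.poll_all()   # work is split between inv-1 and inv-2
analytics_view = analytics.poll_all()   # still sees every record
```

Group A reading the topic has no effect on what Group B sees (pub/sub across groups), while inside Group A each record goes to only one worker (work queue within a group).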

2. Philosophy on Message Retention: 'Log' or 'Queue'?

The fundamental difference between the two systems lies in how they handle messages.

Kafka: Data is an 'Immutable Log'
Kafka does not delete messages when they are consumed. Messages stay in the topic until the retention period (e.g., 7 days) expires or the configured size limit is reached.
Consumers only manage an Offset that indicates 'where they have read up to'.
- Message Replay & Time Travel:
  - If a bug occurs in a consumer? Correct the code, then rewind the offset to reprocess all data.
  - Introducing a new system using the messages? Retrieve all events from the beginning of the topic to reconstruct state.
- Multipurpose Data Hub: A single event stream can be consumed multiple times by various consumers for different purposes such as real-time dashboards, batch analysis, or model training, at their own pace.
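
A rough sketch of the log model: reading never removes data, and only retention does. `ToyLog` is a hypothetical class, timestamps are plain seconds passed in by hand, and the 7-day figure simply mirrors the example above.

```python
class ToyLog:
    """Append-only partition with time-based retention."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []       # (offset, timestamp, value)
        self.next_offset = 0

    def append(self, value, now):
        self.records.append((self.next_offset, now, value))
        self.next_offset += 1

    def enforce_retention(self, now):
        # Age decides deletion; whether anyone has read the record does not.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]

    def read_from(self, offset):
        return [(o, v) for o, _, v in self.records if o >= offset]


day = 24 * 3600
log = ToyLog(retention_seconds=7 * day)
log.append("order-created", now=0 * day)
log.append("order-paid", now=3 * day)
log.append("order-shipped", now=6 * day)

first_pass = log.read_from(0)   # consuming does not remove anything
replay = log.read_from(0)       # rewinding the offset reads the same data again

log.enforce_retention(now=8 * day)    # the day-0 record is now older than 7 days
after_retention = log.read_from(0)    # offsets of surviving records do not change
```

The replay returns exactly what the first pass returned, which is what makes "fix the bug, rewind, reprocess" possible as long as the data is still within retention.
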
RabbitMQ: Data is a 'Transient Task'
In traditional RabbitMQ, messages are 'tasks to be processed.' Once a consumer retrieves a message and confirms it has been processed (ack), the message is permanently removed from the queue.

This method features:
- Optimized for Task Queues: Efficiently manages tasks like 'Send an email,' 'Generate an image,' or 'Optimize an image,' which do not need to be preserved after being processed.
- Error-focused Retention: Message TTL (Time-To-Live) and Dead Letter Exchange (DLX) deal with messages that expire or cannot be processed successfully.
  (Here, keeping a message around serves exception handling and retry logic; it is not the core purpose.)
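
For contrast with the log model, here is a toy version of the consume-remove pattern with a dead-letter path. `ToyWorkQueue` and its `dead_letters` list are stand-ins I made up; in RabbitMQ the dead-lettered message would be republished to a configured exchange rather than appended to a list.

```python
from collections import deque


class ToyWorkQueue:
    def __init__(self):
        self.ready = deque()
        self.in_flight = {}       # delivery tag -> message
        self.dead_letters = []    # stands in for a queue bound to a DLX
        self.next_tag = 1

    def publish(self, message):
        self.ready.append(message)

    def deliver(self):
        tag, self.next_tag = self.next_tag, self.next_tag + 1
        self.in_flight[tag] = self.ready.popleft()
        return tag, self.in_flight[tag]

    def ack(self, tag):
        # Success: the message is gone for good.
        del self.in_flight[tag]

    def reject(self, tag):
        # Failure without requeue: hand the message to the dead-letter side.
        self.dead_letters.append(self.in_flight.pop(tag))


queue = ToyWorkQueue()
queue.publish("send-email:alice")
queue.publish("resize-image:broken.png")

tag, _ = queue.deliver()
queue.ack(tag)       # processed successfully, removed permanently

tag, _ = queue.deliver()
queue.reject(tag)    # processing failed, kept only for error handling
```

After both deliveries the queue itself is empty. The only message that still exists anywhere is the failed one, which matches the idea that retention here is about errors, not history.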

Note: RabbitMQ, adapting to the times, introduced a new queue type called Streams.
It provides offset-based non-destructive consumption similar to Kafka, operating like logs.
However, RabbitMQ is still primarily used for the 'consume-remove' pattern.

3. RabbitMQ's Fanout Relies on 'Exchange'

RabbitMQ fanout operates differently from Kafka’s. The core of RabbitMQ's routing capabilities lies in the Exchange.

Producers send messages to Exchanges rather than directly to a Queue. Based on its type and rules, the Exchange determines which queue to send messages to.

fanout Exchange: Copies and sends messages to all queues bound to it. It's the purest form of broadcast model.
topic Exchange: Matches routing keys with binding patterns using wildcards (*, #) and sends messages to queues meeting the criteria (multicast).
direct Exchange: Sends messages only to queues whose binding key exactly matches the routing key (unicast).
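
The wildcard rules for a topic exchange are the least obvious of the three, so here is a small matcher as a sketch. It follows the AMQP convention that routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. The function name `topic_matches` and the sample keys are my own.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated routing key matches a binding pattern."""

    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        # '*' stands for exactly one word; anything else must match literally.
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


one_word = topic_matches("order.*", "order.created")         # True
too_many = topic_matches("order.*", "order.eu.created")      # False
any_depth = topic_matches("order.#", "order.eu.created")     # True
zero_words = topic_matches("order.#", "order")               # True
```

A queue bound with `order.*` only sees keys with exactly one word after `order`, while a queue bound with `order.#` sees the whole `order` subtree, including the bare key `order`.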

Core: In RabbitMQ, the smart broker (Exchange) interprets routing rules to distribute messages.
In contrast, in Kafka, producers only specify a topic, and smart consumers organize into groups to fetch messages themselves.