Where the Problem Started
Kafka looks complicated at first because it introduces many terms at once. The core idea is simpler: place a durable event log between services that would otherwise be coupled directly, and separate producer and consumer responsibilities. This post organizes the first structure I needed to understand Kafka.
Start by installing Kafka.
Implementation Path
brew install kafka
Homebrew installs the dependencies Kafka needs along with it. For ZooKeeper-based Kafka versions, that includes ZooKeeper.
Before using Kafka, it helps to define pub/sub.
Pub/sub is short for publish/subscribe, a messaging paradigm used in software architecture. A message is data passed from a publisher to a subscriber in response to an event, command, or processing flow. A rough analogy is React's useEffect.
import { useEffect, useState } from "react";

function ExampleComponent() {
  const [count, setCount] = useState(0);

  // useEffect hook: re-runs whenever count changes
  useEffect(() => {
    // Update the document title using the browser API
    document.title = `You clicked ${count} times`;
  }, [count]);

  return (
    <div>
      <p>You clicked {count} times</p>
      <button onClick={() => setCount(count + 1)}>
        Click me
      </button>
    </div>
  );
}
Conceptual Takeaway
If Kafka is treated as only a message queue, the meaning of partitions, offsets, and consumer groups becomes easy to miss. Kafka is a log-based system that stores events, lets them be replayed, and allows consumers to process them at their own pace. That perspective is what makes failure handling and scaling strategies easier to understand later.
The useEffect example above is deliberately simple: when the count value changes, the page reacts. Read through a pub/sub lens, the state change is the message that triggers the subscriber.
In this analogy, the React component is the subscriber, and useEffect is its subscription to state changes. Each count update becomes a message that the subscriber processes.
When the button is clicked, setCount changes count, which can be seen as "publishing" a message.
Pub/sub is this pattern: the side that subscribed handles each published message according to its own logic.
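To make the pattern concrete outside React, here is a minimal in-memory sketch. The createPubSub helper and the "count" topic name are hypothetical and chosen only for illustration; this is not how Kafka is implemented, just the publish/subscribe shape described above.

```javascript
// Minimal in-memory pub/sub sketch (hypothetical helper, not a Kafka API).
function createPubSub() {
  const subscribers = new Map(); // topic name -> array of handler functions

  return {
    // Register a handler for a topic; returns a function that unsubscribes it.
    subscribe(topic, handler) {
      if (!subscribers.has(topic)) subscribers.set(topic, []);
      subscribers.get(topic).push(handler);
      return () => {
        const handlers = subscribers.get(topic);
        const index = handlers.indexOf(handler);
        if (index !== -1) handlers.splice(index, 1);
      };
    },

    // Deliver the message to every handler currently subscribed to the topic.
    publish(topic, message) {
      // Copy the list so a handler that unsubscribes mid-delivery does not skip others.
      const snapshot = [...(subscribers.get(topic) || [])];
      for (const handler of snapshot) handler(message);
      return snapshot.length; // number of subscribers notified
    },
  };
}

// Usage: two subscribers both react to each "count" message.
const bus = createPubSub();
const titles = [];
const seenCounts = [];
bus.subscribe("count", (n) => titles.push(`You clicked ${n} times`));
const stopLogging = bus.subscribe("count", (n) => seenCounts.push(n));

bus.publish("count", 1); // both subscribers run
stopLogging();           // the second subscriber leaves
bus.publish("count", 2); // only the first subscriber runs
```

The publisher never refers to a specific subscriber. It only names a topic, which is the decoupling the useEffect analogy is pointing at.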
The next concept is the message queue. It looks similar to pub/sub, but the difference is in how consumers receive messages.
In a message queue, each message goes from a producer to a single consumer. The message waits in the queue until a consumer is ready, and then one consumer takes it and processes it. Examples include RabbitMQ, ZeroMQ, Amazon SQS, and IBM MQ.
In pub/sub, a message is published to a topic without the producer knowing which subscribers exist, and every subscriber receives that message.
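The queue behavior can be sketched in a few lines. The WorkQueue class and the worker names below are hypothetical; real brokers add acknowledgements, retries, and persistence, which this sketch leaves out on purpose.

```javascript
// Minimal in-memory work queue sketch (hypothetical; no acks, retries, or persistence).
class WorkQueue {
  constructor() {
    this.messages = []; // pending messages in FIFO order
  }

  // Producer side: append a message to the tail of the queue.
  send(message) {
    this.messages.push(message);
  }

  // Consumer side: remove and return the oldest message, or undefined if empty.
  receive() {
    return this.messages.shift();
  }
}

const queue = new WorkQueue();
["job-1", "job-2", "job-3"].forEach((job) => queue.send(job));

// Two competing consumers take turns pulling from the same queue.
const handled = { workerA: [], workerB: [] };
const workerNames = ["workerA", "workerB"];
let turn = 0;
let job;
while ((job = queue.receive()) !== undefined) {
  handled[workerNames[turn % workerNames.length]].push(job);
  turn += 1;
}
```

Because receive removes the message, a job taken by one worker is never seen by the other. That is the key difference from pub/sub, where every subscriber gets its own copy.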
Examples include Apache Kafka, Google Pub/Sub, and MQTT.
In short, a message queue suits work where each message should be handled by a single consumer, often in order, while pub/sub suits environments where scalability and near-real-time delivery to many subscribers matter.
It may feel inefficient that every message in pub/sub fans out to all subscribers. The trade-off is that producers can publish without knowing who the subscribers are, and any number of subscribers can attach and process the same messages independently.
Kafka's core concepts are as follows.
Topic: a named stream/category for messages with a specific purpose.
Partition (leader): Like a subfolder under a topic, and the unit a topic is divided into for distribution and scalability. By default, both reads and writes go through the partition's leader.
Replica (follower): A copy of the leader's partition, kept up to date by pulling data from the leader. It is somewhat like a replica in a primary/replica database setup.
Producer: The entity that publishes data.
Consumer: The entity that subscribes to data.
Offset: The position of a record within a partition. A consumer's current offset acts like a bookmark that records how far it has read.
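The list above can be tied together with a toy model: a topic as a set of append-only partitions, a producer that picks a partition from the record key, and a consumer that tracks its own offset. Everything here (ToyTopic, ToyConsumer, the "orders" topic, the hash function) is a hypothetical sketch, not the Kafka client API, and it ignores replication entirely.

```javascript
// Toy model of a topic as partitioned append-only logs (hypothetical names; not the Kafka client API).
class ToyTopic {
  constructor(name, partitionCount) {
    this.name = name;
    // Each partition is an array; a record's index in it is its offset.
    this.partitions = Array.from({ length: partitionCount }, () => []);
  }

  // Pick a partition from the key with a simple character-code hash.
  // Assumption: Kafka's default partitioner uses a different hash (murmur2); only the idea is the same.
  partitionFor(key) {
    let hash = 0;
    for (const ch of key) hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
    return hash % this.partitions.length;
  }

  // Producer side: append to the chosen partition and report where the record landed.
  append(key, value) {
    const partition = this.partitionFor(key);
    const offset = this.partitions[partition].push({ key, value }) - 1;
    return { partition, offset };
  }

  // Consumer side: read records from `offset` onward without removing them.
  read(partition, offset, maxRecords = 10) {
    return this.partitions[partition].slice(offset, offset + maxRecords);
  }
}

class ToyConsumer {
  constructor(topic) {
    this.topic = topic;
    // One bookmark per partition: the next offset to read.
    this.positions = topic.partitions.map(() => 0);
  }

  // Fetch new records from one partition and advance the bookmark.
  poll(partition) {
    const records = this.topic.read(partition, this.positions[partition]);
    this.positions[partition] += records.length;
    return records.map((record) => record.value);
  }

  // Move the bookmark back to replay records that are still in the log.
  seek(partition, offset) {
    this.positions[partition] = offset;
  }
}

const orders = new ToyTopic("orders", 2);
const first = orders.append("user-1", "created");
const second = orders.append("user-1", "paid"); // same key, so same partition

const consumer = new ToyConsumer(orders);
const firstRead = consumer.poll(first.partition);  // both records
const secondRead = consumer.poll(first.partition); // nothing new yet
consumer.seek(first.partition, 0);
const replayed = consumer.poll(first.partition);   // both records again
```

Reading does not delete anything, so moving the offset back replays the same records. Records with the same key land in the same partition, which is what keeps them in order relative to each other.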