Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
162 views
in Technique[技术] by (71.8m points)

Why do Kafka consumers connect to zookeeper, and producers get metadata from brokers?

Why is it that consumers connect to zookeeper to retrieve the partition locations? And kafka producers have to connect to one of the brokers to retrieve metadata.

My point is, what exactly is the use of zookeeper when every broker already has all the necessary metadata to tell producers the location to send their messages? Couldn't the brokers send this same information to the consumers?

I can understand why brokers have the metadata, to not have to make a connection to zookeeper each time a new message is sent to them. Is there a function that zookeeper has that I'm missing? I'm finding it hard to think of a reason why zookeeper is really needed within a kafka cluster.

question from:https://stackoverflow.com/questions/27918065/why-do-kafka-consumers-connect-to-zookeeper-and-producers-get-metadata-from-bro

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

First of all, zookeeper is needed only for high level consumer. SimpleConsumer does not require zookeeper to work.

The main reason zookeeper is needed for a high level consumer is to track consumed offsets and handle load balancing.

Now in more detail.

Regarding offset tracking, imagine following scenario: you start a consumer, consume 100 messages and shut the consumer down. Next time you start your consumer you'll probably want to resume from your last consumed offset (which is 100), and that means you have to store the maximum consumed offset somewhere. Here's where zookeeper kicks in: it stores offsets for every group/topic/partition. So this way next time you start your consumer it may ask "hey zookeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in zookeeper, but in other storages as well (for now only zookeeper and kafka offset storages are available and i'm not sure kafka storage is fully implemented).

Regarding load balancing, the amount of messages produced can be quite large to be handled by 1 machine and you'll probably want to add computing power at some point. Lets say you have a topic with 100 partitions and to handle this amount of messages you have 10 machines. There are several questions that arise here actually:

  • how should these 10 machines divide partitions between each other?
  • what happens if one of machines die?
  • what happens if you want to add another machine?

And again, here's where zookeeper kicks in: it tracks all consumers in group and each high level consumer is subscribed for changes in this group. The point is that when a consumer appears or disappears, zookeeper notifies all consumers and triggers rebalance so that they split partitions near-equally (e.g. to balance load). This way it guarantees if one of consumer dies others will continue processing partitions that were owned by this consumer.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...